
Intelligent Virtual Environments - A State-of-the-art Report

Ruth Aylett and Marc Cavazza

University of Salford, CVE, Salford, M5 4WT, UK; University of Teesside, School of Computing and Mathematics, Middlesbrough, TS1 3BA, UK

[email protected], [email protected]

Keywords: Artificial intelligence, ALife, virtual agents

Abstract: The paper reviews the intersection of AI and VEs. It considers the use of AI as a component of a VE and Intelligent Virtual Agents as a major application area, covering movement, sensing, behaviour and control architectures. It surveys work on emotion and natural language interaction, and considers interactive narrative as a case-study. It concludes by assessing the strengths and weaknesses of the current state-of-the-art and what might take it forward.

1. Why and how VEs and AI/ALife technologies are converging

In this report we consider a new field we call Intelligent Virtual Environments, formed by the increasing overlap of the technologies involved in 3D real-time interactive graphics environments and Artificial Intelligence/Artificial Life technologies.

The impetus for this field comes from a number of directions. Firstly, as the amount of processing power available for rendering has increased, it has now become feasible to devote a little of it to aspects of a VE beyond visual realism. However visually appealing a VE is, if it is static and empty of change and behaviour, the immersive experience is of limited interest. Cities without people cannot make the user feel a real sense of presence. While animation can create dynamic interest, its pre-scripted nature runs against the interactional freedom a VE gives users and can only hold their interest for a limited time.

Alongside the availability of processing power, and the desire to make VEs more dynamic and interesting, has come their application to industrial domains in which the direct interaction metaphor of immersion runs out of steam. For example, a model of the RAF Tornado, created in the late 1990s by the company Virtual Presence, has thousands of components, far too many for the realistic visualisation that has been developed to be all a user needs in understanding what each does or how particular tasks need to be carried out.

The growth of e-commerce on the web is producing a similar requirement for user support. While 3D content is still rare, it is clear that users require intelligent assistance in many cases, and 2D interface agents are beginning to give way to 3D 'Talking Heads', which are also being extended to new media such as interactive digital television. In the same way, large-scale distributed chat environments are moving to 3D graphics and finding the need for intelligence to support their user avatar populations. In turn this technology is being applied to long-standing research in Computer-Supported Cooperative Work.

The growth of processing power has also affected the computer games industry, making it much more difficult to compete on visual realism alone. As a result the incorporation of AI technology into computer games – albeit in ad hoc and piecemeal fashion – is now widespread, with Creatures and The Sims examples of new genres entirely dependent on the use of AI and ALife. Given the impetus that computer games have given to the development of 3D graphics in general, it will be interesting to see if the industry has the same accelerating effect on the combination of AI and ALife with 3D graphics.

The incorporation of virtual humans in particular also opens the way to the use of VEs for applications which up to now have used 2D simulations. Evacuation of buildings in emergencies and crowd control in urban spaces are obvious examples, while traffic simulation is also moving towards 3D and intelligent virtual cars. The concept of Digital Biota – 3D graphically based ecology – is also under active investigation. ALife technology is also being applied to VEs by artists, from the online ecology of TechnoSphere to the genetic algorithm-generated Feeping Creatures.

On the side of AI and ALife, groups of researchers are coming to recognise VEs as a powerful testbed for their technologies. Real-time interaction in a visually compelling virtual world is both more motivating than the text-based interaction of earlier research and – by engaging the user's senses – also more supportive, grounding natural language interaction, for example, in a shared visual context. Intelligent tuition systems can move into multi-modal interaction and incorporate an embodied tutor. Work on narrative and story-telling is also able to extend from the novelistic and purely text-based into drama and virtual theatre.

VEs also allow experimentation with complex embodied agents without all the problems of dry joints, sticking wheels and limited battery time that often frustrate researchers in robotics. With less effort required to develop basic control architectures – often adapted directly from robotics – it becomes possible with intelligent virtual agents, or synthetic characters, to investigate the modelling of emotion and the basis for agent social behaviour. This work can be carried out both at the behavioural level, incorporating body language and gesture, and at the cognitive level, using planning and emotionally-based inferencing.

AI technologies are also being used in some cases to deal with complexity or processing overhead. Once physics is incorporated into a VE, the overhead of solving large sets of equations by analytical methods may severely impact the rendering cycle. The use of neural nets can allow the functional mappings required to be learned off-line and effectively held in a dynamic lookup table. In the same way, synthetic characters with complex skeletal structures can use AI learning technology to acquire relevant movement abilities where explicitly programming them would be a difficult task.

In discussing these areas in the report that follows, one should note that though the convergence we have just discussed is real, researchers from each of the main communities involved had different starting positions and still have rather different perspectives. Thus the state-of-the-art is far from homogeneous or universally agreed.

2. Intelligent Virtual Reality Systems

The term Intelligent Virtual Environments, as this review illustrates, encompasses a broad range of topics and research areas. However, systems that aim at integrating Artificial Intelligence (AI) techniques into the virtual environment itself constitute a specific kind of application, which we propose to name Intelligent Virtual Reality Systems (IVRS).

In these systems, intelligence is embedded in the system architecture itself, by incorporating AI algorithms into the virtual reality system.

The inclusion of an AI layer in a virtual environment can be justified from several perspectives:

- adding a problem-solving component to the virtual environment, for instance in configuration, scheduling or interactive design applications

- building a knowledge level supporting conceptual scene representation, which can support high-level processing of the graphic scene itself, or interface with natural language processing systems

- describing causal behaviours in the virtual environment, as an alternative to physical simulation

- enhancing interactivity, for instance by recognising user interaction in terms of high-level actions to determine adaptive behaviour from the system

Though IVRS have been in development for the past few years, they are still to be considered an emerging technology and many research problems remain to be solved.

One key aspect from the user-centred perspective is that current AI techniques are often much less interactive than would be required for a complete integration into virtual environments, due to the difficulty of developing real-time (and/or reactive) AI techniques. In other words, the Holy Grail of IVRS research would be to have anytime AI techniques whose result production granularity is compatible with the sampling rate, not so much of the visualisation, but of the user interaction with the virtual world objects, which itself depends on the application.

To date, very few IVRS have been described. The first is the incorporation of the Oz programming system into the DIVE VR software [1]. Though Oz supports constraint programming, the implementation reported is used as a generic high-level programming language for system behaviour rather than for embedding problem-solving abilities into the virtual environment. More recently, Codognet [20] has developed a generic constraint package, VRCC, which is fully integrated into VRML and used to define high-level behaviours for agents through the specification of constraints. Both systems rely on Constraint Logic Programming (CLP).

Current research has put forward CLP as a good candidate technique to support IVRS, essentially for two reasons: it computes solutions fast enough to fit the interaction loop and it can accommodate incremental solutions. Even though recent implementations of CLP can be used in dynamic environments, CLP is not, strictly speaking, a reactive technique. However, it permits fast solutions even for large-scale problems. This makes it possible in many cases to have a response time which matches the user interaction loop. As a knowledge representation formalism, CLP systems are only moderately expressive, being based on finite domains, though they retain the declarative aspect of other formalisms. As a result, knowledge acquisition may be more tedious than with rule-based systems. Fortunately, the type of knowledge to be represented in IVRS often includes a strong spatial component, which can be captured by the constraint formalism. Knowledge representation using CLP in IVRS is, however, only just emerging as a research topic, and significant advances in the near future should not be ruled out.
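As a minimal illustration of the finite-domain style of constraint in play here, the sketch below places two objects along a discretised wall so that they do not overlap and one sits left of the other. The dimensions and constraints are invented for illustration, and the naive enumeration stands in for the propagation a real CLP system would use:

```python
from itertools import product

# Finite domains: candidate x-positions along a 40-unit wall.
# Widths and constraints are illustrative assumptions only.
WALL = 40
WIDTH = {"cabinet": 12, "desk": 16}
domain = {name: range(0, WALL - w + 1) for name, w in WIDTH.items()}

def satisfies(cabinet, desk):
    no_overlap = cabinet + WIDTH["cabinet"] <= desk or desk + WIDTH["desk"] <= cabinet
    cabinet_left_of_desk = cabinet < desk      # a spatial design constraint
    return no_overlap and cabinet_left_of_desk

solutions = [
    {"cabinet": c, "desk": d}
    for c, d in product(domain["cabinet"], domain["desk"])
    if satisfies(c, d)
]
print(solutions[0])   # one consistent configuration, e.g. {'cabinet': 0, 'desk': 12}
```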

Other AI approaches are possible, such as heuristic repair and local search, which are alternatives to the basic mechanisms used in constraint programming, such as arc-consistency. For instance, heuristic repair works on a non-consistent allocation of variables by "repairing" the variable that causes the greatest conflict. This could be exploited in IVRS in the following fashion: when a pre-existing solution is disrupted by acting on a single object, heuristic repair is a good candidate and can provide a quick solution. This new direction has been suggested recently by Codognet [21].
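A sketch of the idea, reusing the toy placement problem above: starting from a configuration broken by a user action, repeatedly reassign the most conflicted variable to its least-conflicting value (the classic min-conflicts heuristic, named here as one concrete form of heuristic repair; all numbers are invented):

```python
def conflicts(assign, constraints):
    """Count, per variable, how many of its constraints currently fail."""
    counts = {v: 0 for v in assign}
    for vars_, pred in constraints:
        if not pred(*(assign[v] for v in vars_)):
            for v in vars_:
                counts[v] += 1
    return counts

def heuristic_repair(assign, domains, constraints, max_steps=1000):
    assign = dict(assign)
    for _ in range(max_steps):
        counts = conflicts(assign, constraints)
        if not any(counts.values()):
            return assign                      # all constraints satisfied
        worst = max(counts, key=counts.get)    # most conflicted variable
        # reassign it to the value minimising its own conflict count
        def cost(val):
            return conflicts(dict(assign, **{worst: val}), constraints)[worst]
        assign[worst] = min(domains[worst], key=cost)
    return None                                # give up; no repair found

# Example: the desk was dragged onto the cabinet; repair moves the cabinet.
constraints = [(("cabinet", "desk"),
                lambda c, d: c + 12 <= d or d + 16 <= c)]
domains = {"cabinet": range(0, 29), "desk": range(0, 25)}
print(heuristic_repair({"cabinet": 10, "desk": 12}, domains, constraints))
```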

IVRS applications integrate real-time problem-solving algorithms into a VE. They rely on the close integration between the natural interactivity of the VE, in terms of user-centred visualisation and object manipulation, and the interactive aspects of the problem-solving AI algorithms. For those applications in which there is an isomorphism between the spatial layout and the problem space, like configuration problems, the VE can actually be considered as a visual interface to the AI system. This is most often the case when the constraints to be satisfied by object configuration have a major spatial component.

We have ourselves developed an IVRS using GNU Prolog (which includes finite domain constraints) and the game engine Unreal Tournament [13]. The system is an interactive configuration system to be used in interactive building design. Constraints on the placement of building elements are described using finite domain constraints, and a constraint solver written in GNU Prolog produces a solution in terms of object positions. This solution is immediately displayed in the virtual environment, i.e. 3D objects are created matching the solution. As the objects are part of an interactive environment, they can be manipulated by the user, who can displace them to explore new configurations and design solutions visually. User interaction changes the nature of the data and disrupts some of the constraints, so the new configuration created by the user must be directly analysed by the system to produce a new solution. The new position information is passed in real-time to the constraint solving programme via a TCP socket, and the programme computes a new solution satisfying the new set of constraints.
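The real-time coupling can be as simple as a socket exchange. The sketch below shows what the VE side of such a loop might look like in Python; the message format, host and port are invented for illustration and are not those of the GNU Prolog/Unreal Tournament system just described:

```python
import json
import socket

SOLVER_ADDR = ("localhost", 4000)        # hypothetical solver endpoint

def resolve(moved_objects):
    """Send the user's new object positions to the constraint solver
    and return the re-computed configuration for the other objects."""
    with socket.create_connection(SOLVER_ADDR) as sock:
        sock.sendall(json.dumps(moved_objects).encode() + b"\n")
        reply = sock.makefile().readline()   # solver answers one JSON line
    return json.loads(reply)

# Called from the VE's event loop whenever the user drops an object, e.g.:
#   new_layout = resolve({"desk": {"x": 14, "y": 3}})
# ...after which the VE re-positions the other 3D objects to match.
```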

As a result, seen from the user's perspective, interaction with some of the objects that are part of the solution configuration results in a new solution being computed and other objects being re-configured by the system to maintain a coherent solution.

This form of interactivity naturally relies on the interactivity of the solution itself. However, constraint programming in itself is not a reactive technique: it emulates reactivity because it can produce a solution quickly enough. The interaction cycle is determined by the speed at which new solutions are computed. In other words, the sampling rate of object manipulation in the virtual environment must be compatible with the result production granularity of the problem-solving algorithm. We are currently experimenting with more complex configuration problems. As constraint programming has proven able to solve problems involving a large number of variables efficiently, it should be able to support an appropriate level of interactivity.

3. Virtual humans and synthetic characters

The biggest single area in which AI and VEs overlap is undoubtedly that of virtual humans and synthetic characters. Such characters need not be human - they could be abstract [59], mechanical [52] or fictional like Creatures [27], Woggles [37] or Teletubbies [2]; they could be animals such as birds [53], fish [6], dolphins [40] or dogs [8]. As virtual humans, they might vary from the highly naturalistic Virtual Marilyn [63] to the physically stylistic but cognitively endowed pedagogical agent [54,55].

From an AI perspective, these are embodied agents, and research in robotics is therefore highly relevant. However, the graphics perspective starts from animated figures and often results in a slightly different emphasis. In this report we adopt the term intelligent virtual agent (IVA) to cover these entities. Note that the term avatar, which originally referred to the user's representation in a VE - and therefore to a graphical representation with no autonomy and limited actuation capabilities - is currently sometimes used to refer to any humanoid agent in a VE. We do not extend the term like this: here avatar and IVA are used as separate terms. A basic yardstick for IVAs is often taken to be believability [7], an ill-defined term for evaluation purposes which might be considered the degree to which an IVA supports or undermines the user's overall sense of presence. It is important to understand that this is not the same thing as naturalism; indeed, naturalism and believability may conflict under some circumstances.

If we consider the different aspects of an IVA, a wide range of issues emerge, which we will consider in succeeding sections. First, an IVA has a body which must move in a physically convincing manner in order to support believability. Thus it must have both a body surface and a body geometry, equivalent to a skeleton (though usually very much simpler than in the real-world case), and some behaviour using that body.

Secondly, in order for an IVA to appear responsive to its environment, there must be some kind of coupling between the IVA's behaviour and the state of the VE, whether through virtual sensing or some other means. The IVA's behavioural repertoire may vary depending on the application: in some cases focussing very much on physical interaction with the environment and in other cases more on cognitive behaviour, usually expressed through speech or at least natural language. Some IVAs can be characterised as 'talking heads': their body, nearly always humanoid, stops at the shoulders, and their behaviour is therefore almost entirely language-based. Finally, the IVA's behaviour must be directed by some means, requiring some control architecture, within which the degree of autonomy displayed by the IVA is an issue.

Conceptually, these aspects of an IVA are brought together in personality. As Martinho [40] points out, one may view the problem of creating an IVA personality from the perspective of ALife, as one of creating the conditions under which personality will emerge bottom-up. Alternatively one may view it from a more artistic point of view as an authoring or design issue, where the IVA is to be designed with respect to a pre-defined personality. In the current state-of-the-art both approaches are tried by different researchers.

Pragmatically, the different aspects would be united in an authoring tool which incorporated them all, if such a thing existed. But one measure of the relative immaturity of this area is that no such tool does exist, and researchers and developers are forced to integrate a number of disparate systems anew for each application.

3.1 Moving the body

Creating a visually compelling body for an IVA still in general requires the designer to work with a 3D modelling package such as 3D Studio Max or Maya. Specialist packages support the creation of humanoids, ranging from Poser at the low end to the Boston Dynamics DI Guy or Transom Jack at the top end. These packages support editing at the low level of size and shape; there is no package that might modify a body according to high-level semantic categories ('menacing', 'shy'), though one can conceive of such a thing.

IVA movement comes in a variety of forms, and though it can be viewed from a purely functional perspective as determining what an IVA can do in a VE, it may be at least as significant as a contribution towards personality and expressiveness. This of course has been known to animators since the origin of cartoons, and animation technology, entirely prescribed and therefore requiring no intelligence, is still widely used, not least because it is widely available through standard 3D modelling packages as well as through specialist packages such as Poser. However, in a VE, as distinct from a film, animation has two major disadvantages.

Firstly, it is extremely time-consuming and therefore expensive, even using key-frame techniques with some level of intelligence for interpolation. It has been said, for example, that the character Woody in the film Toy Story, which had a geometry with 700 degrees of freedom, 200 of these for the face and 50 for the mouth alone, required the effort of 150 people at Pixar for a week to generate 3 minutes of animation. A film seen by millions of paying customers can recoup this cost; a Virtual Environment cannot under current market conditions. Secondly, the fixed nature of animation means that its novelty is limited - again, unimportant in a film viewed as a linear spectacle, but a major problem in a Virtual Environment where the user wanders and interacts at will.

The issue of time and effort has been tackled initially using motion capture (mocap), especially as mocap studios have dropped in price and become more widely available, partly due to their extensive use in computer games. The ability to construct novel sequences from a mocap library, as discussed below in 3.4, may then be incorporated. Aside from the issues raised by scripting, however, mocap also has obvious limitations in terms of coverage - mainly humanoids, certain ranges of movement, and almost always flat surfaces.

It usually concentrates on movement for mobility - walking, crawling, running - where generic motion is likely to succeed, and avoids agent-agent interaction (embracing, shaking hands) and agent-object interaction (grasping, lifting, throwing), which vary much more according to the actual situation. For obvious reasons it does not extend to socially coordinated movement in groups of agents. Finally, while in principle such a library could also contain expressive movement (stumbling along wearily, striding along confidently), the combinatorial and indexing issues make this impractical.

AI and ALife technologies become more applicable where self-animation or behavioural animation [53] is applied. In self-animation, movement is driven directly by the agent's control system, much as the movement of a robot would be in the real world. This normally involves the creation of a physical model of the virtual mechanism and the use of inverse kinematics to produce the correct movement in the mechanism for the position desired at each rendering cycle. The physical model can be driven by input from the VE, giving movement which is appropriate to the given moment rather than pre-scripted.

However, producing fluid movement in a mechanism with a large number of degrees of freedom is a correspondingly complex task, while the standard method of calculating inverse kinematics involves inverting a matrix, which is both computationally demanding and affected by singularities in the solution space where motion becomes undefined. Where dynamics are also important, as in the artificial fish of Terzopoulos and Tu [61] - see Figure 1 - the model may be as elaborate as a linked spring-mass system and thus even more computationally demanding, as well as usually non-linear.
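To make the matrix-inversion point concrete, here is a minimal damped-least-squares IK step for a two-link planar arm - a standard textbook formulation, not any particular system from the literature, with arbitrary link lengths. The damping term is precisely what guards against the singular configurations mentioned above:

```python
import numpy as np

L1, L2 = 1.0, 0.8            # link lengths (arbitrary)

def end_effector(t1, t2):
    return np.array([L1*np.cos(t1) + L2*np.cos(t1+t2),
                     L1*np.sin(t1) + L2*np.sin(t1+t2)])

def jacobian(t1, t2):
    return np.array([[-L1*np.sin(t1) - L2*np.sin(t1+t2), -L2*np.sin(t1+t2)],
                     [ L1*np.cos(t1) + L2*np.cos(t1+t2),  L2*np.cos(t1+t2)]])

def ik_step(theta, target, damping=0.1):
    """One iteration: move joint angles towards the target position.
    The damping keeps the step finite near singular (outstretched) poses."""
    err = target - end_effector(*theta)
    J = jacobian(*theta)
    # Damped least squares: (J^T J + lambda^2 I)^-1 J^T err
    dtheta = np.linalg.solve(J.T @ J + damping**2 * np.eye(2), J.T @ err)
    return theta + dtheta

theta = np.array([0.3, 0.5])
for _ in range(50):                      # iterate once per rendering cycle
    theta = ik_step(theta, np.array([1.2, 0.9]))
print(end_effector(*theta))              # close to the target [1.2, 0.9]
```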

Good results have been obtained in reducing the computational demand by applying learning algorithms. The computational demands of the physical model can be reduced by learning its behaviour, especially given that approximation is normally quite adequate. Neural net technology is an appropriate choice [29] since it is able to learn an arbitrary functional mapping from a training set generated by the physical model off-line. A two-step process can be applied [30] in which one neural net learns the forward model, from agent actions to the state transitions these produce in the physical model, and then this neural net is used to train the backwards system which maps state transitions back onto actions. More radically, the agent's control system itself can be learned, as discussed below.
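The two-step scheme might look like the following sketch, where a toy one-dimensional "physics" function stands in for the expensive physical model and scikit-learn's MLPRegressor stands in for the neural nets; everything here is illustrative rather than taken from [29,30]:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def physics(state, action):
    """Stand-in for the expensive physical model (1-D, nonlinear)."""
    return state + 0.1 * np.tanh(action) - 0.01 * state**3

# Step 1: learn the forward model off-line from sampled transitions.
s = rng.uniform(-2, 2, 5000)
a = rng.uniform(-3, 3, 5000)
forward = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000)
forward.fit(np.column_stack([s, a]), physics(s, a))

# Step 2: use the forward model to train the inverse, mapping desired
# transitions back onto actions ("what action gets me there?").
pred_next = forward.predict(np.column_stack([s, a]))
inverse = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000)
inverse.fit(np.column_stack([s, pred_next]), a)

# At run time the learned lookup replaces solving the physical model:
print(inverse.predict([[0.5, 0.55]]))   # action estimated to move 0.5 -> 0.55
```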

Behavioural animation was a term coined by Reynolds in his seminal work on flocking and herding [53]. He demonstrated with his boids system that complex social movement could be modelled by equipping each agent with a set of simple rules, such as keeping a minimum distance from neighbours and obstacles, matching velocity (speed and direction) with neighbours, and flying towards the centre of mass. Note that if one wishes to be truly agent-centred, the centre of mass ought to be defined as the perceived centre of mass rather than being calculated globally as in some applications of this work. In the current state-of-the-art, this approach has been applied successfully to stampeding animals on film, as in The Lion King and Jurassic Park, as well as to the schooling of fish [61] and other animal social movement. Taking it further might involve extending the rules to include formations, internal state and a wider range of environmental stimuli.
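The three rules fit in a few lines. This sketch (weights and radii are assumed tuning values) computes one steering update per boid, using only neighbours within a perception radius so that the centre of mass is the perceived one, as recommended above:

```python
import numpy as np

RADIUS, MIN_DIST = 5.0, 1.0          # perception range and separation distance
W_SEP, W_ALIGN, W_COH = 1.5, 1.0, 1.0

def steer(i, pos, vel):
    """One boid's acceleration from Reynolds' three rules."""
    d = np.linalg.norm(pos - pos[i], axis=1)
    near = (d < RADIUS) & (d > 0)            # perceived neighbours only
    if not near.any():
        return np.zeros(2)
    sep = np.zeros(2)
    too_close = near & (d < MIN_DIST)
    if too_close.any():                      # 1. keep a minimum distance
        sep = (pos[i] - pos[too_close]).sum(axis=0)
    align = vel[near].mean(axis=0) - vel[i]  # 2. match neighbours' velocity
    coh = pos[near].mean(axis=0) - pos[i]    # 3. seek perceived centre of mass
    return W_SEP * sep + W_ALIGN * align + W_COH * coh

pos = np.random.default_rng(1).uniform(0, 10, (20, 2))
vel = np.random.default_rng(2).uniform(-1, 1, (20, 2))
for _ in range(100):                         # simple Euler integration, dt = 0.1
    acc = np.array([steer(i, pos, vel) for i in range(len(pos))])
    vel += 0.1 * acc
    pos += 0.1 * vel
```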

FIGURE 1: Self-animating fish

Work has also taken place on social movement among humans, although here internal state is clearly of much greater importance unless very large crowds are being modelled, as for example in egress design for stadia, where particle systems have sometimes been used [10] rather than drawing on ALife or AI. In dealing with crowds in urban landscapes, rule-based systems have been used to generate goal trees influencing the movement of individuals [57] - for example an agent with a goal to catch a train must have a subgoal of moving to the station and buying a ticket in the ticket office. This is not incompatible with flocking but can be used to motivate it. Note however that the application of a global rule-set to individual agents is an alternative to an integrated architecture for an agent that includes motivations, of the type discussed below.

Once humanoid IVAs are involved, movement clearly requires more than low-level calculation on the geometry or physical modelling, since human movement merges the voluntary and goal-directed [33] with the involuntary movement that is driven by physiology or biology. If movement is to be generated from goals - sometimes described as task-based animation - a representation is needed that mediates between the level of symbolic goals and natural language and the executed movement.

Badler's Parameterized Action Representation or PAR [3] is one such representation. A PAR holds twelve fields which mix the temporal and spatial control of a movement with specifications of action decomposition, applicability conditions and expected effects familiar from AI planning. A PAR also references any objects involved and the agent executing the movement, and allows the agent's manner to be defined, with adverbs such as carefully and quickly being translated into low-level motion. A PAR can be thought of as an interface to a high-level scripting system, and has in fact been used like that, but is not incompatible with an autonomous agent. Note however that its very flexibility is likely to put a substantial burden upon the designer, who must specify all these parameters.
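A rough Python rendering of the flavour of such a representation follows; the field names here are abridged and invented for illustration (the actual PAR defines twelve fields, for which see [3]):

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class PARSketch:
    """Loosely modelled on Badler's PAR; field names are illustrative only."""
    name: str                                  # e.g. "open-door"
    agent: str                                 # who executes the movement
    objects: List[str]                         # participating objects
    applicability: Callable[[], bool]          # preconditions, as in AI planning
    effects: List[str]                         # expected postconditions
    subactions: List["PARSketch"] = field(default_factory=list)  # decomposition
    manner: Optional[str] = None               # adverb: "carefully", "quickly"
    duration: float = 1.0                      # temporal control, in seconds

open_door = PARSketch(
    name="open-door", agent="agent1", objects=["door7"],
    applicability=lambda: True,                # stub: door reachable and unlocked
    effects=["open(door7)"], manner="carefully",
)
```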

Work in conversational agents [Cassell 00] identified a class of human movements related specifically to communicative acts, drawing on observational work in psychology [43] in which the gestures used by real subjects during conversation or story-telling are noted. This work identified categories of gesture such as iconics, which represent features of topics such as the size and shape of objects; beats, which occur with accented words and turn-taking behaviour; and emblems, gestures carrying (culturally variable) known meanings, such as thumbs-up or obscenity. Categorising movement like this with a coarser granularity allows it to be linked directly into natural language via an annotation system, though this is still usually hand-coded. At least one talking head [48] has taken this approach, and has tried to automate the annotation step through the use of Rhetorical Structure Theory, as shown in Figure 2.

However, the expressive results of synthetic systems using this approach are still less convincing than the output of skilled animators [Chi et al 00], and this has led to an alternative approach based on Laban Movement Analysis [34]. Here, movement is characterised at a much lower level by effort and shape components, thus avoiding the issue of what counts as a gesture and what does not. This approach has been taken up by Badler and his group and is neatly incorporated into the PAR format discussed above [18].

Once agent-object and agent-agent interaction is included, the problems of generating movement from first principles become very much more complex. The approach that has been very widely taken to this problem is that of smart objects [49], in which the interaction knowledge is part of the specification of an object in the VE rather than a demand made upon the IVA. Thus an IVA which passes through a door does not have to perform complex reasoning in order to accommodate its hand to the shape of the doorknob - a predefined handshape can be stored with the door object.


This means that objects must be defined by more than a collection of polygons in the scene graph, and are for example annotated with interaction features which are associated with the object's functionality [31]. This technique is effective but has clear limitations, since in its basic form it requires every agent to interact in the same way - not a problem with doorknobs, but difficult to generalise. Consider agents with hands of different sizes and different skin tones, or some wearing gloves. Then the interaction either requires a transformation to be passed from the interaction feature to the agent geometry, or the geometry of the hand to be passed to the interaction feature.
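In code, a smart object is little more than geometry plus queryable interaction knowledge. This sketch (classes and the single "grasp" feature are invented, not from [31,49]) shows an agent asking the door for the hand pose rather than reasoning it out:

```python
from dataclasses import dataclass

@dataclass
class InteractionFeature:
    """Interaction knowledge stored on the object, not the agent."""
    name: str
    hand_shape: str          # key into the agent's posture library
    approach_point: tuple    # where to stand, in object-local coordinates

@dataclass
class SmartDoor:
    polygons: object = None  # the geometry the scene graph already holds
    grasp = InteractionFeature("turn-knob", "wrap-grip", (0.4, 0.0, 1.0))

    def interaction(self, feature_name):
        return self.grasp if feature_name == "turn-knob" else None

def pass_through(agent_name, door):
    f = door.interaction("turn-knob")
    # The agent applies a stored posture instead of reasoning from geometry;
    # scaling it to different hand sizes is exactly the hard case in the text.
    print(f"{agent_name}: move to {f.approach_point}, apply '{f.hand_shape}'")

pass_through("agent1", SmartDoor())
```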

3.2 Responding to the environment

It is noticeable that there is far more work in IVA movement than in virtual sensing. While movement is an indispensable attribute, the nature of a VE, in which the designer can produce agent movement from the top, with a god-like perspective, and use the scene graph as an omniscient source on events and attributes, means that virtual sensing need not be included at all. In computer games, for example, monsters can usually detect a player through walls by accessing his or her position from the data structures, and the application of AI algorithms such as A* to path planning in such environments typically depends on a global map. This can adversely impact gameplay by making opponents unrealistically persistent or difficult to combat.

However, once the requirement for believable sensing is established, it seems that it is actually more difficult to work out what an agent should be able to perceive from an omniscient position than it is to equip the agent with virtual sensors which deliver the appropriate information as a direct result [50]. Gaze is also a potent communicator of attentional focus, and an IVA with virtual eyes will use gaze in order to direct its virtual sensing in a natural and realistic manner.

The extent to which virtual sensing is actually applied varies. At one extreme, the artificial fish discussed above were equipped with a model of the primate visual system [62]. This involved a binocular projection of the 3D world onto the 2D virtual retinas of the fish, which were modelled with high-resolution foveas and lower-resolution peripheries and moved by a set of virtual motor lenses, implemented as four virtual coaxial cameras. The system applies an active vision principle (in the sense used in the AI vision community): using incoming colour data, the fish control system sends control signals to stabilise the visual system during movement. Interesting objects in the peripheral vision are identified by their colour histogram and the eyes saccade to locate the object first in the fovea and then in the centre of the eye. Stabilisation was carried out for small displacements by computing the overall translational displacement (u,v) of light patterns between the current foveal image and that from the previous time instant, and updating the gaze angles to compensate. Large displacements produced re-foveation.

FIGURE 2: From Rhetorical Structure Theory to facial expression

The virtual dog Silas [9] incorporated a less complex artificial vision system that used optical flow to control navigation, again very much an active vision idea. Again, the 3D virtual world was mapped onto 2D as if through an RGB camera, with false colouring used to render the current object of interest. A motion energy calculation was carried out separately on the left and right halves of the 2D field, using pixel movement between frames and a pseudo-mass weighting. The difference in total flow energy between the left and right fields was then input into the control system for steering, with an absolute threshold to stop Silas crashing head-on into walls.
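A crude version of that steering signal can be had from plain frame differencing. The sketch below (invented thresholds, and omitting the pseudo-mass weighting of the actual system) turns the left/right energy imbalance into a turn command:

```python
import numpy as np

TURN_GAIN, STOP_THRESHOLD = 0.001, 5000.0    # invented tuning constants

def steering_command(prev_frame, frame):
    """Approximate motion energy per half-field by frame differencing.
    Returns (turn, stop): positive turn steers right; stop avoids walls."""
    energy = (frame.astype(float) - prev_frame.astype(float)) ** 2
    half = energy.shape[1] // 2
    left, right = energy[:, :half].sum(), energy[:, half:].sum()
    turn = TURN_GAIN * (left - right)        # flee the side with more flow
    stop = (left + right) > STOP_THRESHOLD   # about to hit something head-on
    return turn, stop

rng = np.random.default_rng(0)
a, b = rng.integers(0, 255, (2, 64, 128))    # two fake 64x128 camera frames
print(steering_command(a, b))
```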

At the other end of the spectrum, the STEVE pedagogical agent [54,55] has a sensorimotor system which monitors the communication bus in a distributed system linking the agent to the virtual world for messages describing changes in terms of objects and attributes. These messages are used to update a symbolic world model on which STEVE's reasoning operates. Gaze is then unnecessary for perception, since the agent has 'eyes in the back of his head' - which is, however, useful in a pedagogical process, as many real teachers would agree. Gaze still turns out to be important for believability, and is incorporated as a planned action within the reasoning system, so that STEVE looks at objects that his bodily representation is interacting with and also at the user when interacting using speech.

Somewhere between these two extremes are virtual sensors inspired by robotics rather than by living things, modelling active sensing systems such as infra-red and ultrasound rather than passive ones like vision. This is an obvious step where a VE is being used for virtual robotics, but is easy to carry over into other IVAs [2]. An infra-red sensor can be modelled very simply by a line or cone attached to the geometry of the agent which gives the distance of any object with which it intersects within its range. This can be used for obstacle avoidance in the same way as for a real robot, but can also be used to give the agent information about the identity of objects which would not normally be available in the real world without a great deal of processing. In the same way, the bounding boxes available in most VE toolkits can be used so that when two agent bounding boxes intersect they exchange information about identity and possibly internal state.
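Such a sensor reduces to a ray-intersection query against the scene. This 2D sketch (circular obstacles and ranges are assumptions) returns both the distance a real IR sensor would give and the identity the VE can hand back for free:

```python
import math

OBSTACLES = [("pillar", (5.0, 0.0), 1.0),    # (id, centre, radius)
             ("crate", (3.0, 4.0), 0.5)]
SENSOR_RANGE = 10.0

def ir_sensor(origin, heading):
    """Cast a ray from the agent; return (distance, object id) or None."""
    ox, oy = origin
    dx, dy = math.cos(heading), math.sin(heading)
    best = None
    for obj_id, (cx, cy), r in OBSTACLES:
        # Solve |origin + t*dir - centre| = r for the nearest t > 0.
        fx, fy = ox - cx, oy - cy
        b = 2 * (fx * dx + fy * dy)
        disc = b * b - 4 * (fx * fx + fy * fy - r * r)
        if disc < 0:
            continue                         # ray misses this obstacle
        t = (-b - math.sqrt(disc)) / 2
        if 0 < t <= SENSOR_RANGE and (best is None or t < best[0]):
            best = (t, obj_id)
    return best

print(ir_sensor((0.0, 0.0), 0.0))   # -> (4.0, 'pillar'): centre 5.0 away, radius 1.0
```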

The state-of-the-art covers the spectrum discussed above for virtual vision and active robot-like sensors, but very much less work has been carried out on any other kind of virtual sensing. Though sound models exist within various VE toolkits, there is little evidence of virtual hearing being modelled; one exception occurs in the computer game Thief!, where the user tries to steal goods from locations guarded by virtual characters who will only hear noises generated by the player within a certain range. Smell has been investigated by a few groups in relation to the human user of a VE, but there seems to be little work as yet (but see [23]) using virtual smell or virtual noses, though this would fit well into some of the more ALife-oriented IVAs.

Outside the scope of this particular report is the use of vision technology to provide a link between IVAs and the human user. Since a VE is being rendered in relation to a user's position and orientation, that information is available to IVAs, but the non-intrusive capturing of gesture and facial expression is an active field where much remains to be done.

3.3 Intelligent Behaviours

For an IVA, intelligence comprises several components, such as sensing, learning, natural language communication and reasoning, all of which must be integrated. In the remainder of this section, we will essentially discuss reasoning abilities and their integration with physical action. The integration principles we introduce are also valid for virtual sensing, just described. Natural language communication in conjunction with high-level reasoning will be discussed in the next section.

From a generic perspective, intelligent behaviour consists in determining the best sequence of actions to be executed in the virtual environment, taking into consideration the agent's goals and the environment's resources. For these reasons, AI planning techniques are a good formalism for analysing the problem as well as a generic technique for implementing agent behaviour. Planning, as a hierarchical approach, is also in a good position to co-ordinate lower levels of control and animation of virtual humans. Advanced research in the field is indeed mostly based on planning techniques, though in some cases the actual implementation might resort to simpler formalisms that appear as compiled plans.

Planning techniques support a “cognitive” level that should control lower-level animation. This is a generalisation of techniques developed in the area of virtual human animation. For instance, Perlin & Goldberg [49] explicitly introduce a distinction between low-level animation control and high-level behaviour modules in their Improv system architecture. Similarly, Magnenat-Thalmann & Thalmann [39] introduce a distinction between general motion control and more complex behavioural patterns.

They resort to “Displacement Local Automata” (DLA) for the high-level control of motion from a goal-oriented perspective. Because of their direct relation to Scripts [56], DLAs can also be considered a form of precompiled plans, without the flexibility that plans normally support. A sophisticated architecture for animation control has been developed by Badler et al. [4], which will be described in the next sections.

To state the problem in the simplest possible terms, we suggest three levels of description for an agent's behaviour, which deal with motion, action and intentions. Motion corresponds to the actual physical simulation in the virtual world, action to the patterns of movements required to execute specific actions, and intentions to the high-level goals that the agent is pursuing.

Integrating the various levels of description, starting with the AI level, is thus a major endeavour in the development of intelligent virtual humans. We will first illustrate the integration problem with a “historical” example, before pointing to the most recent research in the field.

Early forms of integration used to take place directly between an AI algorithm and the animation primitives, as can be illustrated with the case of path planning.

Path planning consists in finding an optimal path (generally the shortest one) between a starting point and a destination point in a virtual environment, avoiding obstacles.

Traditionally, path planning has been solved using a heuristic search algorithm such as A* [3,6] directly coupled with the low-level animation of the agent. The use of A* for path planning is based on a two-step process. The virtual environment is first discretised to produce a grid of cells. This grid is formally equivalent to a connectivity tree of branching factor eight, as each cell in the discretised environment has eight neighbours. Searching this connectivity tree with A* using a distance-based heuristic (Euclidean distance or Manhattan distance) produces the shortest path to the destination point (Figure 3).

FIGURE 3: Path planning

This path is calculated offline, as A* is not a real-time algorithm, and the agent is subsequently animated along this path. As a consequence, this method cannot be applied to dynamic environments. This direct integration of A* with low-level animation primitives faces a number of limitations. For instance, it is not connected to high-level decision making, nor to the agent's perception. Monsieurs et al. [45] have proposed to use A*-based path planning in conjunction with synthetic vision for more human-like behaviour, while Badler et al. [4] have incorporated path planning into generic high-level behaviour planning.
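The grid-plus-search recipe described above is compact enough to show in full; a sketch of A* over an 8-connected grid with a Euclidean heuristic (grid contents invented):

```python
import heapq, itertools, math

def astar(grid, start, goal):
    """A* over an 8-connected grid of 0 (free) / 1 (blocked) cells."""
    h = lambda c: math.dist(c, goal)                 # Euclidean heuristic
    tie = itertools.count()                          # avoids comparing cells
    frontier = [(h(start), next(tie), 0.0, start, None)]
    came_from, best_g = {}, {start: 0.0}
    while frontier:
        _, _, g, cell, parent = heapq.heappop(frontier)
        if cell in came_from:
            continue                                 # already expanded
        came_from[cell] = parent
        if cell == goal:
            path = [cell]
            while came_from[path[-1]] is not None:   # walk back to the start
                path.append(came_from[path[-1]])
            return path[::-1]
        x, y = cell
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):                    # the eight neighbours
                nx, ny = x + dx, y + dy
                if (dx, dy) == (0, 0) or not (0 <= nx < len(grid) and 0 <= ny < len(grid[0])):
                    continue
                if grid[nx][ny]:
                    continue                         # blocked cell
                ng = g + math.hypot(dx, dy)          # diagonal steps cost sqrt(2)
                if ng < best_g.get((nx, ny), float("inf")):
                    best_g[(nx, ny)] = ng
                    heapq.heappush(frontier, (ng + h((nx, ny)), next(tie), ng, (nx, ny), cell))
    return None                                      # goal unreachable

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (3, 3)))   # e.g. [(0,0), (1,0), (2,1), (3,2), (3,3)]
```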

Though the example of path planning is based on an AI algorithm, the integration achieved is rather weak and it clearly does not properly cover the levels of description we have introduced.

3.4 IVA control architectures

Typically there is still a gap between the methods adopted by researchers from graphics backgrounds for controlling an IVA and those favoured by researchers from AI and ALife backgrounds. The dividing issue is often one of artistic or directorial control versus agent autonomy, with many researchers who have moved from animation still favouring various kinds of scripting, where AI and ALife researchers often think in terms of sensor-driven behavioural control or of goal-driven action supported by symbolic reasoning. There are many definitions of autonomy in different fields, from psychology and philosophy to animation. Here we adopt the robotic definition that an autonomous agent has a sense-reflect-act cycle of its own, operating in real-time in interaction with its environment. The amount of autonomy possessed by an IVA is therefore related to its control architecture.

Scripting has been a very popular method of allowing developers and authors to control IVAs at a higher level than that of conventional animation. One of the best known of such scripting systems was IMPROV [49], which was intended to support directorial-level control of virtual actors. Though not a 3D system, the scripting language of the Microsoft Agent toolkit has also been widely used, and the humanoid toolkits Transom JACK and Boston Dynamics' DI Guy both supply scripting languages.

Scripting languages act upon libraries of behaviours, but it is important for the sake of clarity to understand that behaviour here does not correspond to the use of the term in AI and ALife. In this context it usually refers to a pre-scripted animation, or sometimes a chunk of motion capture, such as walk, run or look-left.

In contrast, the term behaviour in AI refers to a sensor-driven control system [11,12] in which incoming stimulus is tightly coupled to outgoing reaction. An emergent behaviour such as run would be produced by the reaction of the physical system to the strength and type of the control signal generated at any particular moment by the sensory stimulus. Scripting supposes an author or director who determines the behaviour of agents, not an agent-centred mechanism where the agent autonomously controls itself.

The important issues in scripting languages are then those of making smooth transitions from one animation to another and deciding which animations can be combined and which cannot, allowing an agent to 'walk and chew gum' but not crawl and run simultaneously. One model for controlling graphical representations in this flexible way is that of Parallel Transition Networks or PaT-Nets [3]; other human animation systems have adopted similar approaches. In a PaT-Net, network nodes represent processes while arcs contain predicates, conditions, rules, or other functions that cause transitions to other process nodes. Synchronisation across processes or networks is effected through message-passing or global variable blackboards.

The conditional structure used in PaT-Nets gives them more power than just the parallel organisation and execution of low-level motor skills. It provides a non-linear animation model, since movements can be triggered, modified, or stopped by transition to other nodes rather than being laid out linearly along the timeline as in traditional animation. This is a step toward autonomous behaviour since it enables an agent to react to the environment and could support autonomous decision-making capabilities. However, it is a very low-level representation which is encoded in a programming language and is therefore not very accessible to the non-expert. It was for this reason that the PAR formalism discussed in 3.1 above was developed at a higher level.
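The essence of such a network, shorn of the parallelism and blackboard synchronisation of real PaT-Nets, is a process per node with predicate-guarded arcs. A toy rendering (node set invented) might be:

```python
# Each node runs a process (here just a name); each arc carries a predicate
# that, when true of the world state, causes the transition.
NET = {
    "idle":  [("walk", lambda w: w["target_visible"])],
    "walk":  [("reach", lambda w: w["near_target"]),
              ("idle",  lambda w: not w["target_visible"])],
    "reach": [("idle",  lambda w: w["grasped"])],
}

def step(node, world):
    for nxt, predicate in NET[node]:
        if predicate(world):
            return nxt          # first satisfied arc wins
    return node                 # no transition: keep running this process

state = "idle"
for world in [{"target_visible": True, "near_target": False, "grasped": False},
              {"target_visible": True, "near_target": True,  "grasped": False},
              {"target_visible": True, "near_target": True,  "grasped": True}]:
    state = step(state, world)
    print(state)                # walk, reach, idle
```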

Scripting is one method of dealing with the problem known in AI as action selection: which of the many things an agent can do at any moment is the right thing to do? A solution at the level of symbolic representation might involve AI planning, the use of reasoning mechanisms such as the situation calculus, or lower-level mechanisms such as the rule-based production system, in which rules which match a current world condition may fire, subject to conflict resolution among competing alternatives. The important issue, as discussed in 3.3, is how these levels can be smoothly integrated with the rest of an agent system right down to the animation level.

Recent research at the University of Pennsylvania has produced the most comprehensive framework for intelligent virtual agents to date. This framework integrates all the components for high-level reasoning, action patterns and physical simulation of motion. Most importantly, it is strongly based on up-to-date AI planning techniques. The starting point for this research was the observation, reported above, that realistic behaviour could not be based on procedural animation only, as an embodied intelligent agent has to deal with a changing environment. Simultaneously, another motivation for the development of high-level behaviour techniques was the investigation of natural language control of character animation.

Some of the components of this integrated approach have already been discussed, as we will see. For a more detailed presentation, we refer the reader to the original literature from the project [4,66].

The objective of the agent's architecture is to support the integration of sensing, planning and acting. It comprises three control levels:

- An AI planner based on the agent's intentions, using incremental symbolic reasoning

- The Parallel Transition Networks (PaT-Nets) discussed above, as an action formalism

- A Sense-Control-Act (SCA) loop performing low-level, reactive control involving sensor feedback and motor control

The SCA or behavioural loop is analogous to a complete behaviour in a subsumption architecture [11,12]: it is mainly used for locomotion reasoning.

PaT-Nets are used to ground actions into parametrised motor commands for the embodied agents, which constitute the lowest level of description (the "motion" level). They are invoked or generated by the planner.

The planner is in charge of high-level reasoning and decision making. It is based on the "intentional" planner ItPlans [26]. ItPlans is a hierarchical planner in which expansion takes place incrementally, only to the degree necessary to determine the next action to be carried out. This makes it possible to interleave planning with execution, taking into account the dynamic nature of the environment. The theoretical analysis behind the planner is based on the notions of intentions and expectations [67]. Geib and Webber [25] have demonstrated that the use of intentions in planning for embodied agents is not compatible with the traditional definition of preconditions in planning. They suggested replacing traditional preconditions with situated reasoning, i.e., reasoning about the effects of performing an action in the agent's environment. This apparently theoretical point actually has important consequences for the effective coupling of planning and action performance: situated reasoning enables the planner to make predictions about the results of executing an action without having to maintain a complex model of the world. An example using this approach is discussed below [Funge 98].

Several demonstrators have been implemented with this framework, such as a "hide-and-seek" simulator based on intelligent agents [4] and a military training system for checkpoint control featuring virtual soldiers, the latter also accepting natural language instructions to update the knowledge of virtual actors (Figure 4).

FIGURE 4: Checkpoint control application

At the other end, plans can also interface with abstract specifications of behaviours through intentions and high-level goals: this, in particular, makes it possible to accept behavioural instructions in natural language. The determination of an agent's behaviour through natural language will be discussed as part of the next section.

An example that uses both planning and a rule-based inference engine is the STEVE system already mentioned in 3.2 [54,55]. The cognitive component was implemented in Soar [35], a mature generic AI architecture. A STEVE agent acts as a tutor or demonstrator in a domain where the student is trying to learn correct procedures, that is, sequences of actions, for operating machinery.

STEVE thus requires a representation of a plan, a correct sequence of actions. A standard AI planning formalism is used for this [68], with actions represented as nodes in a partial ordering over the causal links between them. A causal link occurs when the effect of one action (the goal that it achieves) is the precondition for the execution of the next. Actions themselves can be primitive, that is, executable by the agent, or expandable, that is, reducible by more planning to a set of causally-linked actions.
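The data structure being described is small. A hedged sketch (class names and the machinery-domain literals are invented, not STEVE's actual representation) of causally-linked actions in a partial-order plan:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Action:
    name: str
    preconditions: List[str]
    effects: List[str]
    primitive: bool = True            # False => expandable by further planning
    expansion: List["Action"] = field(default_factory=list)

@dataclass
class CausalLink:
    producer: Action                  # its effect...
    condition: str                    # ...supplies this condition...
    consumer: Action                  # ...as a precondition of this action

press = Action("press-start", ["power-on"], ["machine-running"])
power = Action("flip-breaker", [], ["power-on"])
link = CausalLink(power, "power-on", press)
# A plan is a set of actions plus causal links plus ordering constraints;
# here the link forces flip-breaker to precede press-start.
print(f"{link.producer.name} -> {link.condition} -> {link.consumer.name}")
```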

Primitive actions are passed to a sensorimotor component, which also monitors execution, although as commented above this is some way from true virtual sensing. STEVE is also very simple graphically, represented either by a floating torso or by a simple detached hand, so that the motor actions required are relatively straightforward.

This is less true of a second example, the Merman of Funge [24]. Here, a different formalism, the situation calculus [42], is used as a cognitive layer over the sensing and motor systems developed for the artificial fish already discussed [61]. The situation calculus is a first-order state-based logic using situations (snapshots of the current state) and fluents (world properties that can change, taking a situation as their argument).

In the Merman system, a fast-swimming shark whose behaviour is driven by a few simple rules tries to catch a slower-swimming merman in an underwater scene containing rocks behind which the merman can hide. The merman's cognitive layer allows him to reason about possible future situations: he can predict the shark's behaviour and pick a rock which conceals him. In a second scenario, the merman has a pet, and the cognitive layer also allows him to predict whether an attempt to distract the shark from eating the pet would prove fatal and should not be attempted.

Intelligent actors based on AI techniques other than planning have been described: for instance, rule-based systems like the Soar system have been used to develop intelligent "Quakebots" [36]. The main advantage of rule-based systems is that they can benefit from efficient implementations and support fast reaction times for virtual humans. Rule-based systems can implement hierarchical behaviour, though some explicit representational properties are often lost in the process. Plans, on the other hand, tend to provide a readily visible representation that can serve as a resource for various types of actions. And, as we have seen, their hierarchical nature facilitates their integration with action patterns and physical simulations.

Both STEVE and the Merman are examples of agents with goal-driven cognitive components which allow them to select their own actions on the basis of a projected future state. However, ALife has been more concerned with insects and animals than with humans, and with concepts such as drives, which are more relevant than goals for these creatures.

Behavioural architectures are subsymbolic, and a behaviour can be thought of as a coupling - or functional mapping - between sensory stimulus and motor response. It is the motor response which then animates the body. However, identical sensory inputs may produce different behaviours according to internal drives, which have the effect of activating different parts of the behavioural repertoire. Activation and inhibition relationships between behaviours will also come into play. The architecture used on Silas the dog [8] applied these ideas along with the ability to learn new sensorimotor couplings.

Similar ideas were explored in the Virtual Teletubbies [2]. Drives included hunger, fatigue and curiosity. The problem of producing coherent sequences of behaviour was tackled by linking different repertoires of behaviours together, with the level of internal drives allowing such a sequence to be activated.
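One way to read "drives activating parts of the repertoire" in code is to weight each behaviour's sensory activation by the relevant drive and let the strongest win. Everything below (drives, behaviours, weights) is an invented toy, not the Silas or Teletubby architecture:

```python
# Behaviours: (name, relevant drive, sensory function -> activation 0..1)
BEHAVIOURS = [
    ("feed",    "hunger",    lambda s: s["food_visible"]),
    ("rest",    "fatigue",   lambda s: s["near_bed"]),
    ("explore", "curiosity", lambda s: 0.3),   # weak, always available
]

def select(drives, stimulus):
    """Winner-take-all: identical stimuli pick different behaviours
    depending on the current internal drive levels."""
    scored = [(drives[d] * sense(stimulus), name)
              for name, d, sense in BEHAVIOURS]
    return max(scored)[1]

stimulus = {"food_visible": 1.0, "near_bed": 1.0}
print(select({"hunger": 0.9, "fatigue": 0.2, "curiosity": 0.5}, stimulus))  # feed
print(select({"hunger": 0.1, "fatigue": 0.8, "curiosity": 0.5}, stimulus))  # rest
```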

3.5 Putting in Emotion

The discussion of the use of drives as a means of motivating behaviour brings us to the related topic of affect, an area which has become much more active in the recent period. This may be partly because an embodied agent has many more channels for expressing it - facial expression, bodily posture, gesture - and partly because the richness of a virtual world provides much more scope for interaction between IVAs and between an IVA and a user. One should remember that in AI, where affect has been investigated for some time, the standard experimental environment had consisted entirely of typed text interaction.

Just as the control architectures discussed above seem to fall into behavioural systems using internal drives, and reasoning systems using symbolic planning and inferencing, so work in affect tends to split along similar dimensions. This division mirrors a long-standing debate within psychology [51] and can be traced right back to the Cartesian mind-body split. In behavioural architectures, affect is a fundamental part of the control process [14]; in the more cognitive architectures it is the interaction between affect and reasoning that is of interest.

The most popular model of affect in the more cognitive architectures is based on the work of Ortony, Clore and Collins - the OCC theory [47]. They based their model on the three concepts of appraisal, valence and arousal. Here appraisal is the process of assessing events, objects or actions; valence is the nature of the resulting emotion - positive or negative - and arousal is the degree of physiological response. Emotions can then be grouped on these dimensions.

For example, the well-being group simply contains joy and distress, where the former is a positive emotion arising from the appraisal of a situation as an event, and the latter is a negative emotion arising from the same appraisal. Given an agent with goals, it is easy to translate this into a positive emotion where an event supports the agent's current goal and a negative one where it runs counter to a current goal. The fortunes-of-others group starts from a different appraisal - of a situation as an event affecting another. It then includes happy-for, sorry-for, gloating and resentment, where, for example, happy-for is a positive emotion for an event causing a positive emotion in the other, and resentment is a negative emotion for an event causing a positive emotion in another. The definitional system can also be expanded to compound various emotions, so that shame + distress = remorse.




This produces a taxonomy containing a total of 22 related emotions.
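As an illustration, the following sketch codes a fragment of this appraisal scheme; the goal and attitude parameters are hypothetical simplifications rather than the full OCC model:

```python
# A minimal sketch of OCC-style appraisal against an agent's goals. The event
# and attitude parameters are hypothetical; only the emotion definitions
# follow the OCC groupings described above.

def appraise_for_self(event_effect: float) -> str:
    """Well-being group: joy if the event supports a current goal,
    distress if it runs counter to one (event_effect > 0 = support)."""
    return "joy" if event_effect > 0 else "distress"

def appraise_for_other(my_attitude: float, other_effect: float) -> str:
    """Fortunes-of-others group: combines the effect of the event on the
    other with the agent's attitude towards the other."""
    if other_effect > 0:
        return "happy-for" if my_attitude > 0 else "resentment"
    return "sorry-for" if my_attitude > 0 else "gloating"

# Compound emotions extend the definitional system, e.g. shame + distress = remorse.
COMPOUNDS = {frozenset(["shame", "distress"]): "remorse"}

print(appraise_for_self(+1.0))                       # joy
print(appraise_for_other(-0.8, +1.0))                # resentment
print(COMPOUNDS[frozenset(["shame", "distress"])])   # remorse
```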

While this has been characterised by some as a 'folk-theory' of emotion, it has been widely implemented, for example in the Oz project 'Edge of Intention', where graphically simple creatures called Woggles reflected their emotional responses on a very simple set of features [7]. While classically this approach was employed with natural language interaction, so that appraisal was a simple assessment of what had been said, it is quite possible to apply it to a VE in which sensed events are appraised and applied to behaviour.

This was seen in the virtual dolphins [40] referred to above. In order to map the 22 emotions onto the behaviour of the dolphins, the emotions derived from the appraisal process were aggregated along four dimensions. These were the total positive (pleased) and negative (displeased) valence involved, and two dimensions called passionate and frighten - the former the sum of all love emotions in the taxonomy, and the latter of all fear emotions.
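A sketch of this aggregation step might look as follows; the mapping of individual emotions to the four dimensions is illustrative, not the published table:

```python
# A sketch of the aggregation described for the virtual dolphins: emotions
# produced by appraisal are summed along four dimensions. The
# emotion-to-dimension mapping here is invented for illustration.

PLEASED    = {"joy", "happy-for", "gloating"}      # positive valence
DISPLEASED = {"distress", "sorry-for", "resentment"}
PASSIONATE = {"love", "liking"}                    # the 'love' emotions
FRIGHTEN   = {"fear", "fears-confirmed"}           # the 'fear' emotions

def aggregate(active: dict) -> dict:
    """active maps emotion names to intensities from the appraisal step;
    the result drives behaviour along the four dimensions."""
    total = lambda group: sum(v for e, v in active.items() if e in group)
    return {"pleased": total(PLEASED), "displeased": total(DISPLEASED),
            "passionate": total(PASSIONATE), "frighten": total(FRIGHTEN)}

print(aggregate({"joy": 0.6, "fear": 0.8, "resentment": 0.3}))
# {'pleased': 0.6, 'displeased': 0.3, 'passionate': 0, 'frighten': 0.8}
```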

The creation of a characteristic set of emotional responses was used to produce personality. For example, of the two dolphins produced, the male reacted with hate and fear to a crashed aircraft inserted into the scene, as this was a human-constructed object, while the female reacted to the same object with curiosity. Figure 5 gives some indication of the way each aggregation of emotions changed the behaviour exhibited.

Figure 5: Dolphin emotions

A further interesting use of this type of emotional/cognitive response system can be seen in a development of the STEVE system already discussed. This is an immersive training system which models a US soldier carrying out a peace-keeping mission in Bosnia [28]. The scenario tries to model the conflicting situations such a soldier might face through an example in which an army truck runs over a local child at the same time as reinforcements are needed elsewhere. Here the emotional system embodied in the virtual character playing the mother of the child is used to communicate strong emotion to the trainee, who has in turn to manage the combination of emotion and reason so as to arrive at a choice of action.

The mother's emotional state changes from fear and shock, expressed as a crouched posture, arms around herself, and little language behaviour, to anger if it appears that surrounding troops are being moved off as reinforcements rather than helping to evacuate the injured child. Anger is expressed as standing, arms actively out (see Figure 6), and angry language behaviour. Thus this type of emotional theory can be integrated into planning as well as into expressive behaviour.

Figure 6: Bosnia scenario

4. Language Technologies in Virtual Reality

There has been a sustained interest in the use of natural language technologies for virtual environment applications. The rationale for the use of NL initially came from an interface perspective: speech does not disrupt visualisation and presence, and is compatible with other I/O devices in VR. It is thus a convenient way of accessing information and controlling VE systems [32]. However, language is also a convenient way to describe spatial configurations. As such, it can be used to provide a high-level description of scenes to be created in a VE.

This specific approach has been described in several military applications [16,65]. In these applications, situation assessment can be assisted by inputting natural language descriptions of the battlefield [16] or by querying the corresponding database. NL can also serve as a transparent interface to virtual agents in a distributed simulation [38]. This makes it possible for the user to address other entities in the simulation in a transparent fashion, without being able to tell the synthetic forces from the manned simulators, a distinction which would otherwise influence his behaviour.

Clay and Wilhelms [19] have described a NL interface for the design of 3D scenes in virtual environments. In this system, NL descriptions replace traditional interfaces for constructing a scene from 3D objects, or at least make it possible to explore design ideas quickly, to be subsequently refined with more traditional methods. Not surprisingly, there is a strong emphasis on the processing of spatial expressions, with reference to cognitive linguistics for the specific semantics of spatial expressions (not only prepositions but also Lakoff's semantic theory of space).

The incorporation of a NLU layer in a virtual environment system is based on well-described principles, which are common to most of the systems described so far [67,70,44]. One aspect consists in the linguistic analysis step (parsing), and the other in the identification of discourse objects in relation to the virtual world contents.

The implementation of the parser and its linguistic resources (semantic lexicon and grammar) is dictated by the way users express themselves and by the target functions of the VE system. The parsers used in IVEs tend to target sub-language applications, though in some cases large-coverage parsers have been customised to the requirements of the application. Linguistic input is strongly biased towards spatial expressions, as it tends to describe object configurations to be reproduced in the VE, or spatial instructions for navigation in the environment. Another very common linguistic phenomenon observed is the use of definite descriptions to verbally designate objects in the virtual world having salient perceptual properties, for instance “the ship with the admiral flag” or “the door at the end of the room with an 'exit' sign on it”.

These sorts of expressions are known to carry specific syntactic and semantic problems. At the syntactic level, they are organised around spatial prepositions, which generate syntactic attachment ambiguities. Consider for instance the expression “open the red door with the key”. From a purely syntactic perspective, the prepositional phrase “with the key” could relate either to “door” or to “open”. Semantic information stating that a key is an instrument for an opening action can be used to disambiguate parsing.
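A toy version of this semantic disambiguation could be coded as follows, with an invented instrument lexicon standing in for a real semantic resource:

```python
# A toy illustration of using semantic information to resolve the attachment
# of "with the key" in "open the red door with the key". The instrument and
# part lexicons are invented for the example.

INSTRUMENTS_OF = {"open": {"key", "crowbar"}}   # a key is an instrument of opening
PARTS_OF = {"door": {"handle", "window"}}       # attachments a door may have

def attach_pp(verb: str, noun: str, pp_object: str) -> str:
    """Decide whether the prepositional phrase modifies the verb
    (instrument reading) or the noun (part/attribute reading)."""
    if pp_object in INSTRUMENTS_OF.get(verb, set()):
        return "verb"    # "with the key" is the instrument of "open"
    if pp_object in PARTS_OF.get(noun, set()):
        return "noun"    # e.g. "the door with the window"
    return "ambiguous"

print(attach_pp("open", "door", "key"))     # verb
print(attach_pp("open", "door", "window"))  # noun
```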

Semantic processing is mostly concerned with identifying entities in the VE: this particular step is known as reference resolution. It consists in identifying objects in the virtual world from their linguistic expressions. Objects can be directly referred to or designated through definite descriptions.
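A minimal sketch of reference resolution against a world model might look like this, assuming hypothetical object records and a parsed description in attribute-value form:

```python
# A sketch of reference resolution: matching a definite description against
# the properties of objects in the world model. The world objects and the
# parsed description format are hypothetical.

world = [
    {"id": "ship1", "type": "ship", "flag": "admiral"},
    {"id": "ship2", "type": "ship", "flag": "plain"},
    {"id": "door7", "type": "door", "sign": "exit"},
]

def resolve(description: dict) -> list:
    """Return the world objects whose properties match the description,
    e.g. {'type': 'ship', 'flag': 'admiral'} for
    'the ship with the admiral flag'. A felicitous definite description
    should yield exactly one candidate."""
    return [o for o in world
            if all(o.get(k) == v for k, v in description.items())]

print(resolve({"type": "ship", "flag": "admiral"}))   # [{'id': 'ship1', ...}]
```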

The above applications are mostly concerned with scene descriptions and navigation within the environment. However, NL is also a convenient way to instruct virtual actors. The principles behind NL control of artificial actors are somewhat different from those governing NL interfaces to VR systems. One way of illustrating the difference between the use of NL input for VR system control and its use to instruct intelligent characters is to note that in the former case the emphasis is more on the identification of objects, as the NL input will trigger the creation of these objects in the virtual world, while in the latter it is more on the detailed semantics of action, as the ultimate system interpretation will consist in producing a parametrised action representation.



Further, communication with virtual actors essentially falls under two different paradigms: the instruction of virtual actors to carry out autonomous tasks in the VE [67], and communication with virtual actors for information exchange, assistance and explanations [55,44].

The use of NL to instruct artificial actors has been described, for instance, in [66] and [70]. Most of the fundamental principles were laid out in the “AnimNL” project [67], in particular the classification of instructions according to their interpretation in terms of the plans to be generated. For instance, doctrine statements (e.g. “remain in a safe position at all times while checking the driver's ID”) convey generic statements about an agent's behaviour that are to remain valid throughout the simulation. This has been illustrated in a checkpoint training simulation (Figure 4). In this system, virtual actors control a checkpoint passed by various vehicles, some of which might contain a hostile individual. Whenever the actors fail in the procedure, their behaviour can be improved by high-level NL instructions, such as “watch the driver at all times”.
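One plausible reading of how such a doctrine statement could be operationalised is as a standing constraint that filters every candidate action for as long as the governed activity lasts; the predicates below are invented for illustration:

```python
# A sketch of how a doctrine statement such as "remain in a safe position at
# all times while checking the driver's ID" might be compiled into a standing
# constraint applied to every candidate action. The predicates and state
# format are hypothetical, not those of the AnimNL system.

def safe_position(state: dict) -> bool:
    # 'Safe' here is invented: either in cover or far enough away.
    return state.get("cover", False) or state.get("distance", 0.0) > 3.0

# (activity the doctrine governs, constraint that must hold throughout)
DOCTRINE = [("checking_id", safe_position)]

def permitted(resulting_state: dict, activity: str) -> bool:
    """An action is ruled out if the state it leads to would violate a
    doctrine constraint applying to the agent's current activity."""
    return all(constraint(resulting_state)
               for governed, constraint in DOCTRINE if governed == activity)

print(permitted({"cover": True, "distance": 1.0}, "checking_id"))   # True
print(permitted({"cover": False, "distance": 1.0}, "checking_id"))  # False
```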

Cavazza and Palmer [70] have described a NL interface to a semi-autonomous actor in a “Doom-like” computer game (Figure 7). In this system, agent control takes place at a lower level, the agent only generating scripted action sequences in response to NL instructions, such as “run for the plasma gun near the stairs on the left”.

The target actions correspond to a small set of possible actions within the game. The emphasis is more on the processing of complex spatial expressions, fast response times, and the possibility of receiving new instructions while previous actions are still being executed (e.g. “go to the blue door at the end of the room”, “turn right after the door”).

The “Lokutor” system [44] is an intelligent presentation agent embodied in a 3D environment, communicating with the user through dialogue. Its current task is to present a car to a user, as a kind of virtual sales assistant. The NL input is analysed by a parser and then passed to a deliberative system in charge of the high-level goals of the agent. These in turn control a behavioural module, which determines the next action to be taken by the agent. Simple directives can, however, directly determine actions to be taken by Lokutor. Reference resolution takes advantage of the agent's embodiment to resolve context-dependent references and indexical expressions. The dialogue aspects of the system are strongly task-dependent, as language generation by Lokutor is essentially based on the car's user manual, rather than on a full dialogue model.

There is more emphasis on the dialogue aspects in the Steve system [55], as Steve was developed as a tutoring agent. Several scenarios have been implemented, including the maintenance procedure for a high-pressure air compressor aboard a ship. Steve presents the procedure on a step-by-step basis, pointing at the corresponding parts of the 3D device and explaining the various maintenance steps through spoken output (e.g. “open cut-out valve three”). Steve can shift between various modes in which it either demonstrates the procedure or monitors the trainee performing the maintenance task herself. The system always remains interactive through its dialogue capabilities. The demonstration mode can be interrupted by the user for explanations or to request permission to complete the current task (“let me finish”).

In the monitoring mode, Steve is available to assist the student, who can turn to him for advice while operating on the compressor: if she is unsure about the procedure, she can ask “what should I do next?”. Steve will, in response, provide advice such as “I suggest that you press the function test button” [55]. Human-computer dialogue in this context also requires rules for the display of multimodal information (e.g., gestures pointing at the relevant parts of the compressor). Finally, the actions to be taken by Steve are associated with relevant communicative acts, and are represented with Augmented Transition Networks (though these are implemented with Soar rules).

Figure 7: A NL Interface to a Computer Game (courtesy of Ian Palmer)

To summarise, the integration between NL instructions and the spatial properties of VEs is a major characteristic of the inclusion of language technologies in IVEs. It relates to active research areas of NLP which deal with the processing of spatial expressions. The other specific aspect is the semantics of action. This is of specific relevance to the control of virtual actors, where the semantics of action has to be related to a sophisticated action representation, in which actions can be parametrised according to the detailed meaning of NL instructions.

5. A Case Study in IVE: Virtual Interactive Storytelling

The development of intelligent characters naturally leads to a new kind of entertainment system, in which artificial actors drive the emergence of a story. This should make real-time story generation possible, and hence user intervention in the storyline. User involvement can take various forms: the user can participate in the story, playing the role of an actor, or interfere with the course of action from a spectator perspective.

Interactive storytelling is strongly AI-based, and integrates many of the techniques discussed above for virtual actors, such as planning, emotional modelling and natural language processing. For these reasons, it constitutes a good illustration of many of the concepts introduced so far. We will restrict the discussion to those storytelling applications which make reference to some kind of storyline. Other forms of storytelling, in which the user is the principal source of interpretation through processes of “storyfication” or emotional communication, have been discussed in previous sections.

Interactive storytelling also faces the same kind of alternative approaches as other IVE systems: intelligence can be embedded primarily in the system or in the virtual actors that populate it. In the former case, explicit plot representations govern the dynamic behaviour, while in the latter case the agents' plans represent their roles, from which the actual story will emerge.

Explicit plot structures can dictate virtual actors' behaviours from the strict perspective of the storyline. Maintaining such an explicit plot representation supports the causality of narrative actions, but might require alternative choices and decision points to be made explicit to the user. In an approach called user-centred resolution, Sgouros et al. [58] have based their narrative representation on an Assumption-based Truth Maintenance System (ATMS).

Considering the duality between character and plot, an alternative option is to follow a character-based approach. In this case, planning is once again the main technique for implementing actors' behaviour [69,41,17]. There is indeed some continuity between generic behavioural techniques based on planning and the storytelling approach. However, even though the formalisms might be similar, there is a significant difference in perspective. The planning ontology for the narrative approach differs from the one used in a more traditional “cognitive” approach: plan-based representations for storytelling are based on narrative concepts instead of generic beliefs and desires.

The various sub-goals in the plan correspond to a set of narrative functions such as “gaining affection”, “betraying”, etc. This is more than just a difference in ontology; it is a difference in granularity as well. Though artificial actors' behaviours could in theory be derived from generic high-level beliefs and intentions, there is no guarantee that the actions derived would be narratively relevant. Nor could long-term narrative goals or multiple storylines be derived from abstract principles alone. In that respect, artificial actors can be said to be playing a role rather than improvising on the basis of very generic concepts. The relation between cognitive approaches and narrative approaches can be seen as one of specification, and is essentially dictated by representational and relevance issues. This point is not always covered in the technical literature, owing to the relative immaturity of the field and the absence of large-scale narrative implementations based on a storyline. Other justifications for the use of narrative representations have been discussed by Szilas [60].

Figure 8: An Interactive Storytelling Environment

We have implemented such an interactive storytelling system (Figure 8), in which a generic storyline played by artificial actors can be influenced by user intervention. The final applications we are addressing consist in being able to alter the ending of stories that have an otherwise well-defined narrative structure. Ideally, this would make it possible to alter the otherwise dramatic ending of a classical play towards a merrier conclusion, while remaining within the same generic genre of the story. Our own solution to this problem consists (in accordance with our final objectives stated above) in limiting the user's involvement in the story, though interaction should be allowed at any time. This is achieved by driving the plot with autonomous characters' behaviours, and allowing the user to interfere with the characters' plans. The user can interact either by physical intervention on the set or by passing information to the actors (e.g., through speech input).

The storyline for our experiments is based on a simple sitcom-like scenario, where the main character (“Ross”) wants to invite the female character (“Rachel”) out on a date. This scenario tests a narrative element (will he succeed?) as well as situational elements (the actual episodes of this overall plan that can have dramatic significance, e.g., how he will manage to talk to her in private if she is busy, etc.). Our system is driven by characters' behaviours. However, rather than being based on generic “cognitive” principles, these actually “compile” narrative content into characters' behaviours, by defining a superset of all possible behaviours. Dynamic choice of an actual course of action within this superset is the basis for plot instantiation [69]. However, beyond the definition of actor roles, no explicit plot representation is maintained in the system. In this regard, there is no external narrative control over the actors' behaviour, as this is made unnecessary by the approach chosen. Each character is controlled by an autonomous planner that generates behavioural plans, whose terminal actions are low-level animations in the graphic environment we have been using (the game engine Unreal Tournament).

We have adopted this framework to define the respective behaviours of our two leading characters. We started with the overall narrative properties imposed by the story genre; sitcoms offer a light perspective on the difficulties of romance: the female character is often not aware of the feelings of the male character. In terms of behaviour definition, this amounts to defining an “active” plan for the Ross character (oriented towards inviting Rachel) and a generic pattern of behaviour for Rachel (her day-to-day activities). To illustrate Ross' plan: in order to invite Rachel, he must for instance acquire information on her preferences, find a way to talk to her, gain her affection and finally formulate his request (or have someone act on his behalf, etc.). These goals can be broken down into many different sub-goals, corresponding to potential courses of action, each having a specific narrative significance.

An essential aspect is that planning and execution should be interleaved: this is actually a prerequisite for interactivity and user intervention. The character does not plan a complete solution: rather, it executes actions as part of a partial plan, and these constitute the on-going story in the eyes of the user. From his understanding of the story, the user might then decide to interfere with the plan. When (for whatever reason, whether internal causes or user intervention) the character's actions fail, it re-plans a solution. We have achieved this by implementing a search-based planner. This planner is based on a real-time variant of the AO* algorithm, which searches the task network corresponding to an agent's role. Within a narrative framework, sub-goals for an agent corresponding to different scenes tend to be independent: this makes it possible to compute a solution by directly searching the task network [64]. We have extended the AO* algorithm to interleave planning and execution: we use a depth-first real-time variant that selects and executes the best action rather than computing an entire solution plan, which would be unlikely to remain valid in a dynamic environment where the agents interact with one another and with the user.
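The following much-simplified sketch conveys the flavour of this interleaving: the agent repeatedly descends its task network to the next primitive action, executes it, and falls back on re-planning when execution fails. The toy network is invented for illustration; the actual system searches an AND/OR task network with a real-time AO* variant:

```python
# A much-simplified sketch of interleaving planning and execution over a
# task network. Rather than computing a whole solution plan, the agent
# expands its role down to the next primitive action, executes it, and
# re-plans on failure. The network below is hypothetical.

import random

NETWORK = {  # task -> list of alternative decompositions
    "invite_rachel": [["learn_preferences", "talk_in_private", "ask_out"]],
    "talk_in_private": [["wait_for_rachel_alone"], ["send_note"]],
}

def expand(task):
    """Depth-first descent: return the next primitive action for a task,
    plus the tasks that remain after it (the first decomposition stands in
    for the heuristically best one)."""
    if task not in NETWORK:                 # primitive action
        return task, []
    first, *rest = NETWORK[task][0]
    action, pending = expand(first)
    return action, pending + rest

agenda = ["invite_rachel"]
while agenda:
    action, pending = expand(agenda.pop(0))
    agenda = pending + agenda
    if random.random() < 0.8:               # execution can fail (world change, user)
        print("executed:", action)
    else:
        print("failed:", action, "- re-planning")
        # A real planner would backtrack to an alternative decomposition;
        # here we simply retry the action.
        agenda.insert(0, action)
```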

This system has been fully implemented and is able to generate simple but relevant episodes, the outcome of which is not predictable from the initial conditions. A more detailed description can be found in [17].

6. Conclusion

It should be clear from the range of work discussed in this report that the combination of ideas and technologies from VEs and AI is a very active field indeed, with many different research groups concerned with its different aspects. We have necessarily given a partial account, with perhaps less emphasis on ALife virtual ecologies and the incorporation of AI technologies into multi-user shared environments than they deserve. The increase in believability as well as functionality that AI technologies are able to contribute seems to have a positive impact on users, insofar as this has been evaluated [46]. Education, training and entertainment seem to provide the most active application areas, including, in the latter, the somewhat piecemeal uptake of AI technology in computer games.

The amount of activity and energy revealed in this field is a plus; however, the lack of agreement on the approaches taken, the absence of widely used common tools or methodologies, and the inconsistent use of terms such as avatar, autonomy and behaviour are negatives. While MPEG-4 may offer a way out of the political and technical pitfalls of 3D interactive graphics, its facilities for 3D and facial animation are extremely low-level. In the same way, the H-Anim proposals for humanoid animation lie entirely at the level of basic geometrical manipulation of the virtual skeleton. If this report contributes towards a greater understanding of both the potential of AI technologies in VEs, and the facilities and integration needed to make them widely exploitable, then it will have achieved its objective.

Acknowledgements

We express our thanks to Norman Badler, Demetri Terzopoulos and Carlos Martinho for the use of some of the illustrations.

References

1. Axling, T; Haridi, S. & Fahlen, L. Virtual reality programming in Oz. Proceedings, 3rd EUROGRAPHICS Workshop on Virtual Environments, Monte Carlo, 1996.
2. Aylett, R; Horrobin, A; O'Hare, J.J; Osman, A. & Polshaw, M. Virtual Teletubbies: reapplying robot architecture to virtual agents. Proceedings, 3rd International Conference on Autonomous Agents, ACM Press, pp338-9, 1999.
3. Badler, N; Phillips, C. & Webber, B. Simulating Humans: Computer Graphics Animation and Control. Oxford University Press, New York, NY, 1993.
4. Badler, N; Webber, B; Becket, W; Geib, C; Moore, M; Pelachaud, C; Reich, B. & Stone, M. Planning for animation. In: N. Magnenat-Thalmann & D. Thalmann (eds), Interactive Computer Animation, Prentice-Hall, pp235-262, 1996.
5. Badler, N; Palmer, M. & Bindiganavale, R. Animation control for real-time virtual humans. Communications of the ACM 42(8), pp64-73, 1999.
6. Bandi, S. & Thalmann, D. Space Discretization for Efficient Human Navigation. Computer Graphics Forum 17(3), pp195-206, 1998.
7. Bates, J. The role of emotion in believable agents. Communications of the ACM 37(7), pp122-125, 1994.
8. Blumberg, B. & Galyean, T. Multi-level control for animated autonomous agents: Do the right thing... oh no, not that. In: Trappl, R. & Petta, P. (eds), Creating Personalities for Synthetic Actors, Springer-Verlag, pp74-82, 1997.
9. Blumberg, B. Go with the Flow: Synthetic Vision for Autonomous Animated Creatures. Proceedings, 1st International Conference on Autonomous Agents, ACM Press, pp538-9, 1998.
10. Bouvier, E; Cohen, E. & Najman, L. From crowd simulation to airbag deployment: particle systems, a new paradigm of animation. Journal of Electronic Imaging 6(1), pp94-107, 1997.
11. Brooks, R. A Robust Layered Control System for a Mobile Robot. IEEE Journal of Robotics and Automation, pp14-23, 1986.
12. Brooks, R. Intelligence Without Representation. Artificial Intelligence 47, pp139-159, 1991.
13. Calderon, C. & Cavazza, M. Intelligent Virtual Environments for Spatial Configuration Tasks. Proceedings of the Virtual Reality International Conference 2001 (VRIC 2001), Laval, France, 2001.
14. Canamero, D. Modelling motivations and emotions as a basis for intelligent behaviour. Proceedings, 1st International Conference on Autonomous Agents, ACM Press, pp148-155, 1998.
15. Cassell, J. Not just another pretty face: Embodied conversational interface agents. Communications of the ACM 43(4), pp70-78, 2000.
16. Cavazza, M; Bonne, J.-B; Pernel, D; Pouteau, X. & Prunet, C. Virtual Environments for Control and Command Applications. Proceedings of the FIVE'95 Conference, London, 1995.
17. Cavazza, M; Charles, F; Mead, S.J. & Strachan, A.I. Virtual Actors' Behaviour for 3D Interactive Storytelling. Proceedings of the Eurographics 2001 Conference (Short Paper), to appear, 2001.
18. Chi, D; Costa, M; Zhao, L. & Badler, N. The EMOTE model for Effort and Shape. Proceedings, ACM SIGGRAPH '00, New Orleans, LA, pp173-182, 2000.
19. Clay, S.R. & Wilhelms, J. Put: Language-Based Interactive Manipulation of Objects. IEEE Computer Graphics and Applications 16(2), 1996.
20. Codognet, P. Animating Autonomous Agents in Shared Virtual Worlds. Proceedings of DMS'99, IEEE International Conference on Distributed Multimedia Systems, Aizu, Japan, IEEE Press, 1999.
21. Codognet, P. Behaviours for virtual creatures by Constraint-based Adaptive Search. Working Notes of the AAAI Spring Symposium on Artificial Intelligence and Interactive Entertainment, Stanford, USA, pp25-30, 2001.
22. Damasio, A. Descartes' Error. Avon Books, 1994.
23. Delgado, C. & Aylett, R.S. Do virtual sheep smell emotion? Proceedings, Intelligent Virtual Agents 2001, Madrid, Springer-Verlag, to be published, 2001.
24. Funge, J. Making Them Behave: Cognitive Models for Computer Animation. PhD thesis, University of Toronto, 1998.
25. Geib, C. & Webber, B. A consequence of incorporating intentions in means-end planning. Working Notes, AAAI Spring Symposium Series: Foundations of Automatic Planning: The Classical Approach and Beyond, AAAI Press, 1993.
26. Geib, C. The Intentional Planning System: ItPlans. Proceedings of the 2nd Artificial Intelligence Planning Conference (AIPS-94), pp55-64, 1994.
27. Grand, S. & Cliff, D. Creatures: Entertainment software agents with artificial life. Autonomous Agents and Multi-Agent Systems 1(1), pp39-57, 1998.
28. Gratch, J; Rickel, J. & Marsella, S. Tears and Fears. Proceedings, 5th International Conference on Autonomous Agents, pp113-118, 2001.
29. Grzeszczuk, R; Terzopoulos, D. & Hinton, G. NeuroAnimator: Fast neural network emulation and control of physics-based models. Proceedings, ACM SIGGRAPH 98, Orlando, FL, Computer Graphics Proceedings, Annual Conference Series, pp9-20, 1998.
30. Jordan, M. & Rumelhart, D. Supervised learning with a distal teacher. Cognitive Science 16, pp307-354, 1992.
31. Kallmann, M. & Thalmann, D. A behavioural interface to simulate agent-object interactions in real-time. Proceedings, Computer Animation '99, IEEE Computer Society Press, 1999.
32. Karlgren, J; Bretan, I; Frost, N. & Jonsson, L. Interaction Models, Reference, and Interactivity for Speech Interfaces to Virtual Environments. Proceedings, 2nd Eurographics Workshop on Virtual Environments, Realism and Real Time, Monte Carlo, 1995.
33. Koga, Y; Kondo, K; Kuffner, J. & Latombe, J. Planning motions with intentions. Proceedings, SIGGRAPH '94, pp395-408, 1994.
34. Laban, R. The Mastery of Movement. Plays Inc, Boston, 1971.
35. Laird, J; Newell, A. & Rosenbloom, P. Soar: An architecture for general intelligence. Artificial Intelligence 33(1), pp1-64, 1987.
36. Laird, J. It Knows What You're Going To Do: Adding Anticipation to a Quakebot. Working Notes of the AAAI Spring Symposium on Artificial Intelligence and Interactive Entertainment, Technical Report SS-00-02, AAAI Press, 2000.
37. Loyall, A. & Bates, J. Real-time control of animated broad agents. Proceedings, 15th Conference of the Cognitive Science Society, Boulder, CO, pp664-9, 1993.
38. Luperfoy, S. Tutoring versus Training: A Mediating Dialogue Manager for Spoken Language Systems. Proceedings, Twente Workshop on Language Technology 11 (TWLT 11): Dialogue Management in Natural Language Systems, Twente, The Netherlands, 1996.
39. Magnenat-Thalmann, N. & Thalmann, D. Digital Actors for Interactive Television. Proceedings of the IEEE, Special Issue on Digital Television, 1995.
40. Martinho, C; Paiva, A. & Gomes, M. Emotions for a Motion: Rapid Development of Believable Pathematic Agents in Intelligent Virtual Environments. Applied Artificial Intelligence 14, pp33-68, 2000.
41. Mateas, M. An Oz-Centric Review of Interactive Drama and Believable Agents. Technical Report CMU-CS-97-156, Department of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1997.
42. McCarthy, J. & Hayes, P. Some philosophical problems from the standpoint of artificial intelligence. In: B. Meltzer & D. Michie (eds), Machine Intelligence 4, pp463-502, Edinburgh University Press, 1969.
43. McNeill, D. Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press, 1992.
44. Milde, J-T. The Instructable Agent Lokutor. Working Notes, Autonomous Agents 2000 Workshop on Communicative Agents in Intelligent Virtual Environments, Barcelona, Spain, 2000.
45. Monsieurs, P; Coninx, K. & Flerackers, E. Collision Avoidance and Map Construction Using Synthetic Vision. Proceedings of Virtual Agents 1999, Salford, United Kingdom, 1999.
46. Nass, C; Moon, Y; Fogg, B.J; Reeves, B. & Dryer, D.C. Can computer personalities be human personalities? International Journal of Human-Computer Studies 43, pp223-239, 1995.
47. Ortony, A; Clore, G. & Collins, A. The Cognitive Structure of Emotions. Cambridge University Press, 1988.
48. Pelachaud, C. & Poggi, I. Performative facial expressions in animated faces. In: Cassell et al (eds), Embodied Conversational Characters, MIT Press, 1999.
49. Perlin, K. & Goldberg, A. Improv: A system for scripting interactive actors in virtual worlds. Proceedings, ACM SIGGRAPH '96, pp205-216, 1996.
50. Petta, P. & Trappl, R. Why to create personalities for synthetic actors. In: R. Trappl & P. Petta (eds), Creating Personalities for Synthetic Actors, Springer-Verlag, pp1-8, 1997.
51. Picard, R. Affective Computing. MIT Press, 1997.
52. Prophet, J. Technosphere. Interpretation 2(1), 1996.
53. Reynolds, C. Flocks, Herds and Schools: A Distributed Behavioral Model. Proceedings, SIGGRAPH '87, Computer Graphics 21(4), pp25-34, 1987.
54. Rickel, J. & Johnson, W.L. Integrating pedagogical capabilities in a virtual environment agent. Proceedings, 1st International Conference on Autonomous Agents, ACM Press, pp30-38, 1997.
55. Rickel, J. & Johnson, W.L. Task-oriented Collaboration with Embodied Agents in Virtual Worlds. In: J. Cassell, J. Sullivan & S. Prevost (eds), Embodied Conversational Agents, MIT Press, Boston, 2000.
56. Schank, R.C. & Abelson, R.P. Scripts, Plans, Goals and Understanding: an Inquiry into Human Knowledge Structures. Lawrence Erlbaum, Hillsdale, New Jersey, 1977.
57. Schweiss, E; Musse, R; Garat, F. & Thalmann, D. An architecture to guide crowds based on rule-based systems. Proceedings, 3rd International Conference on Autonomous Agents, ACM Press, pp334-5, 1999.
58. Sgouros, N.M; Papakonstantinou, G. & Tsanakas, P. A Framework for Plot Control in Interactive Story Systems. Proceedings, AAAI'96, Portland, AAAI Press, 1996.
59. Sims, K. Evolving 3D morphology and behaviour by competition. Artificial Life 1, pp353-372, 1995.
60. Szilas, N. Interactive Drama on Computer: Beyond Linear Narrative. AAAI Fall Symposium on Narrative Intelligence, Technical Report FS-99-01, AAAI Press, 1999.
61. Terzopoulos, D; Tu, X. & Grzeszczuk, R. Artificial fishes: Autonomous locomotion, perception, behavior, and learning in a simulated physical world. Artificial Life 1(4), pp327-351, 1994.
62. Terzopoulos, D; Rabie, T. & Grzeszczuk, R. Perception and learning in artificial animals. Proceedings, Artificial Life V, pp313-20, 1996.
63. Thalmann, N.M. & Thalmann, D. The Virtual Humans Story. IEEE Annals of the History of Computing 20(2), pp50-51, 1998.
64. Tsuneto, R; Nau, D. & Hendler, J. Plan-Refinement Strategies and Search-Space Size. Proceedings of the European Conference on Planning, pp414-426, 1997.
65. Wauchope, K; Everett, S; Perzanowski, D. & Marsh, E. Natural Language in Four Spatial Interfaces. Proceedings of the Fifth Conference on Applied Natural Language Processing, pp8-11, 1997.
66. Webber, B. & Badler, N. Animation through Reactions, Transition Nets and Plans. Proceedings of the International Workshop on Human Interface Technology, Aizu, Japan, 1995.
67. Webber, B; Badler, N; Di Eugenio, B; Geib, C; Levison, L. & Moore, M. Instructions, Intentions and Expectations. Artificial Intelligence 73, pp253-269, 1995.
68. Weld, D. An introduction to least-commitment planning. AI Magazine 15(4), pp27-61, 1994.
69. Young, R.M. Creating Interactive Narrative Structures: The Potential for AI Approaches. AAAI Spring Symposium on Artificial Intelligence and Computer Games, AAAI Press, 2000.
70. Cavazza, M. & Palmer, I. Natural Language Control of Interactive 3D Animation and Computer Games. Virtual Reality 4, pp85-102, 1999.