Probablistic Control of Human Robot Interaction: Experiments ...jpineau/files/jpineau-iarp02.pdf1...


Probabilistic Control of Human Robot Interaction: Experiments with a Robotic Assistant for Nursing Homes

Joelle Pineau, Michael Montemerlo, Martha Pollack, Nicholas Roy and Sebastian Thrun

Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, 15232, USA
Dept. of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, 48019, USA

http://www.cs.cmu.edu/~nursebot

This paper describes an implemented robot system designed to assist nurses and elderly persons in institutionalized settings. The robot Pearl has been developed as a multi-functional robotic assistant. Its primary task involves guiding people through a nursing facility, reminding them of upcoming events and the need to take medication, and providing them with information, very much like a mobile information kiosk. In the past 24 months, the robot has been deployed more than six times in an elderly care facility in Oakmont, PA. In this paper, we present three software modules relevant to ensuring successful robot performance in a dynamic and interactive task domain: an automated reminder system; a people-tracking and detection system; and finally a high-level robot controller which performs planning under uncertainty by incorporating knowledge from low-level modules and selecting appropriate courses of action.

1. Introduction

The US population is aging at an alarming rate. At present, 12.5% of the US population is of age 65 or older (33). It is widely recognized that this ratio will increase as the baby-boomer generation moves into retirement age. Meanwhile, the nation faces a significant shortage of nursing professionals. The Federation of Nurses and Health Care Professionals has projected a need for 450,000 additional nurses by the year 2008.

This acute need provides significant opportunities for roboticists and AI researchers to develop assistive technology that can improve the quality of life of our aging population, and help nurses become more effective in their activities. The Nursebot Project was conceived in response to this challenge. It is formed by a multi-disciplinary team of investigators from the fields of health-care, HCI/psychology, and AI/robotics. The overall goal of the project is to develop mobile robotic assistants that can assist nurses and elderly people in their daily activities.

To this aim, the team has developed two prototype autonomous mobile robots, shown in Figure 1 (25). These robots primarily interact with the world through speech, visual displays, facial expressions and physical motion. They differ from earlier workplace robots in that they go beyond simply interacting with an (often static) environment, to interacting with human users and bystanders. Thus we leverage earlier technology for navigation, localization and mapping, and specifically focus on developing new algorithmic approaches to track people, predict their behavior, and react appropriately.

The idea of building robotic companions for the elderly is not new (11; 12; 16; 20; 21; 34). From the many services a nursing-assistant robot could provide, the work reported here considers the tasks of reminding people of events and guiding them through their environments. Both of these tasks are particularly relevant to the elderly community. Decreased memory is a common effect of age-related cognitive decline, which often leads to forgetfulness about routine daily activities (e.g. taking medications, attending appointments, eating, drinking, bathing, toileting), thus the need for a robot that can offer cognitive reminders. In addition, nursing staff in assisted living facilities frequently need to escort elderly people walking, either to get exercise, or to attend meals, appointments or social events. The fact that many elderly people move at extremely slow speeds (e.g. 5 cm/sec) makes this one of the most labor-intensive tasks in assisted living facilities. It is also important to note that the help provided is often not strictly of a physical nature, as many elderly people select walking aids over physical assistance by nurses. Rather, nurses often provide important cognitive help, in the form of reminders, guidance and motivation, in addition to valuable social interaction.

From an AI point of view, several factors make this task a challenging one for a robot to accomplish successfully, particularly because of the prevalence of uncertainty in the task domain. The type of uncertainty relevant to robot decision-making is two-fold. First, we are concerned with estimating the effects of a robot's actions. For example, a robot travelling at great speed may quickly tire an elderly person following it. Second, we are concerned with handling partial or erroneous sensor measurements. For example, when escorting a person through busy hallways the robot faces the risk of losing that individual or confusing him/her with another. Our approach explicitly considers these forms of uncertainty when optimizing a control strategy. Our approach also considers the costs of suboptimal control actions, which can vary widely, from the cost of unnecessarily asking a clarification question to that of incorrectly moving to a remote location.

The work presented focuses on three key software components of our robotic architecture: an automated reminder system that combines knowledge of a person's typical schedule with observations of recent activities, and issues pertinent reminders about upcoming events; a module which uses efficient particle filter techniques to detect and track people; and finally a high-level robot controller which uses probabilistic reasoning techniques to arbitrate between information-gathering and performance-related actions, as well as to incorporate information obtained through both navigation sensors (e.g. laser range-finder) and interaction sensors (e.g. speech recognition and touchscreen).

In systematic experiments conducted at a nursing home, we found the combination of techniques to be highly effective in dealing with elderly test subjects. In particular, during a sequence of one-on-one scenarios between Pearl and residents of the nursing home, the robot demonstrated the ability to contact a resident, remind them of an appointment, accompany them to that appointment, as well as provide information of interest to that person, for example weather reports or television schedules.

Figure 1. Nursebots Flo (left) and Pearl (right)

2. Hardware and Software Description

Figure 1 shows images of the robots Flo (first prototype, now retired) and Pearl (the present robot). Each robot is equipped with a differential drive system, two on-board PCs, wireless ethernet, laser range finders, sonar sensors, microphones for speech recognition, speakers for speech synthesis, touch-sensitive graphical displays, actuated head units, and stereo camera systems. As a result of feedback from nurses and medical experts following deployment of the first robot, Flo, the second robot Pearl also features an improved visual appearance, two sturdy handle-bars, a more compact design that allows for cargo space and a removable tray, doubled battery capacity, a second laser range finder, and a significantly more sophisticated head unit.

On the software side, both robots feature an off-the-shelf autonomous mobile robot navigation system (4; 31), speech recognition software (27), speech synthesis software (3), fast image capture and compression software for online video streaming, face detection tracking software (28), as well as the three major new software modules described in this paper. These modules are principally concerned with people interaction and control. They overcome important deficiencies of the work described in (4; 31), which had only rudimentary abilities to interact with people.

3. Plan Management with Autominder

The Autominder software component is designed as an intelligent cognitive orthotic system, providing elderly people with reminders about their daily activities (26). The idea of using computer technology to enhance the performance of cognitively disabled people dates back nearly forty years (13). More recently, cognitive orthotics have enabled reminders to be provided using the telephone (14), personal digital assistants (10), and pagers (17). Work has also been done on improved modelling of users' activities (19; 23), where in one case a hand-held device uses AI planning technology to model the user's plans, and provide visual and audible cues about their execution.

In the Nursebot project, the goal of this software system is to make principled decisions about what reminders to issue and when, balancing the following potentially competing objectives: (i) ensure that the user is aware of activities s/he is expected to perform, (ii) increase the likelihood that s/he will perform at least the required activities (e.g. taking medicine), (iii) avoid annoying the user, and (iv) avoid making the user overly reliant on the system. To attain these goals, the system must be flexible and adaptive, responding to the actions taken by the user.

The Autominder architecture is shown in Figure 2. As depicted, the system maintains an accurate model of a user's daily schedule, monitors performance of activities, and plans reminders accordingly. The three main components are: a Plan Manager (PM), which stores the user's plan of daily activities in the Client Plan, and is responsible for updating it and identifying any potential conflicts in it; a Client Modeler (CM), which uses information about the user's observable activities to track the execution of the plan, storing its beliefs about the execution status in the Client Model; and a Personal Cognitive Orthotic (PCO), which reasons about any disparities between what the user is supposed to do and what s/he is doing, and makes decisions about when to issue reminders.

Figure 2. Autominder Architecture

To initialize the system, the caregiver for an elderly user inputs a description of the user's daily activities, as well as any constraints on, or preferences regarding, the time or manner of their performance. This plan may then be changed in one of four ways: (i) the user or caregiver may add new activities; (ii) the user or caregiver may modify or delete activities already in the plan; (iii) the user may execute one of the planned activities; or (iv) the simple passage of time may cause automatic changes to be made in the plan. Whenever a change occurs, the PM updates the user plan, performing plan merging and constraint propagation as needed. To adequately represent user plans, it is essential to support a rich set of temporal constraints; we achieve this goal by modelling user plans as Disjunctive Temporal Problems (DTPs) and reasoning about them using efficient algorithms (32).
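To make the DTP representation concrete, the following is a minimal sketch and not Autominder's implementation. It assumes a DTP is a conjunction of constraints, each of which is a disjunction of bounds of the form lo <= times[y] - times[x] <= hi. It only checks whether a candidate schedule satisfies the constraints; the efficient reasoning algorithms of (32) are not shown. The activity names, the minute-based clock and the function names are illustrative assumptions.

```python
# Hypothetical sketch: checking a candidate daily schedule against a
# Disjunctive Temporal Problem (DTP). Names and numbers are illustrative only.

# A disjunct (x, y, lo, hi) means: lo <= times[y] - times[x] <= hi (minutes).
# A constraint is a list of disjuncts; at least one of them must hold.
# A DTP is a list of constraints; all of them must hold.


def disjunct_holds(times, disjunct):
    """Return True if a single temporal bound is met by the schedule."""
    x, y, lo, hi = disjunct
    return lo <= times[y] - times[x] <= hi


def satisfies(times, dtp):
    """Return True if every constraint has at least one satisfied disjunct."""
    return all(
        any(disjunct_holds(times, d) for d in constraint)
        for constraint in dtp
    )


PLAN = [
    # Medication 60-120 min after breakfast, OR within 30 min before lunch.
    [("breakfast", "medication", 60, 120), ("medication", "lunch", 0, 30)],
    # Lunch between 3 and 5 hours after breakfast.
    [("breakfast", "lunch", 180, 300)],
]

# Times are minutes after midnight.
on_time = {"breakfast": 480, "medication": 560, "lunch": 720}
before_lunch = {"breakfast": 480, "medication": 700, "lunch": 720}
too_late = {"breakfast": 480, "medication": 620, "lunch": 720}

print(satisfies(on_time, PLAN))       # first disjunct holds
print(satisfies(before_lunch, PLAN))  # second disjunct holds
print(satisfies(too_late, PLAN))      # neither disjunct holds
```

The disjunction is what lets a plan say "either after breakfast or just before lunch", which a simple conjunction of interval bounds cannot express.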

Throughout the day, sensor information is gathered by the robot and sent to the CM, which uses this information to try to infer what activities the user is performing. If the likelihood is high that a planned activity has been executed, the CM reports this to the PM, which can then update the user's plan by recording the time of execution, and propagate any affected constraints accordingly. The user model is represented using a Quantitative Temporal Bayes Net (QTBN), which was developed to handle the need to reason both about fluents and about probabilistic temporal constraints (5).

The final component of the Autominder is the PCO (24), which uses both the user plan and the user model to determine what reminders should be issued and when. The PCO identifies activities that may require reminders based on their importance and their likelihood of being executed on time as modeled in the CM. It also determines the most effective times to issue each required reminder, taking account of the expected user behavior, and any preferences explicitly provided by the user and the caregiver. Finally, the PCO provides justifications as to why particular activities warrant a reminder. The PCO treats the generation of a reminder plan as a satisficing problem and uses a local-search approach called Planning-by-Rewriting (PbR) (1) to produce a high-quality reminder plan that takes into account the user's expected behavior, preferences, and interactions amongst planned activities.

4. Locating People

In order to issue reminders and, when appropriate, guide users to their activities, it is necessary to interact with people spatially, and most specifically to be able to locate people in their living environment. The problem of locating people is the problem of determining their $x$-$y$ location relative to the robot. Previous approaches to people tracking in robotics are feature-based: they analyze sensor measurements (images, range scans) for the presence of features (15; 29) as the basis of tracking. In our case, the diversity of the environment mandates a different approach. Pearl detects people using map differencing: the robot learns a map, and people are detected by significant deviations from the map. Figure 3 shows an example map acquired using preexisting software (31).
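The idea of map differencing can be illustrated with a small sketch. This is not Pearl's detector: it assumes the robot pose is known, that the range each laser beam should return has already been predicted from the learned map (the ray casting is not shown), and that a beam returning much less than the predicted range was blocked by something absent from the map. The 0.3 m threshold and both function names are hypothetical.

```python
# Hypothetical sketch of map differencing on a single laser scan.
# expected_ranges: per-beam range predicted from the learned map (metres).
# measured_ranges: per-beam range actually returned by the laser (metres).


def deviating_beams(expected_ranges, measured_ranges, threshold=0.3):
    """Indices of beams that stop well short of the mapped obstacle."""
    if len(expected_ranges) != len(measured_ranges):
        raise ValueError("scans must have the same number of beams")
    return [
        i
        for i, (expected, measured) in enumerate(
            zip(expected_ranges, measured_ranges)
        )
        if expected - measured > threshold
    ]


def group_adjacent(indices):
    """Group consecutive beam indices; each group is one candidate object."""
    groups = []
    for i in indices:
        if groups and i == groups[-1][-1] + 1:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


expected = [4.0] * 8
measured = [4.0, 4.0, 2.1, 2.0, 4.0, 3.9, 1.5, 4.0]

beams = deviating_beams(expected, measured)
candidates = group_adjacent(beams)
print(beams)       # beams that deviate from the map
print(candidates)  # adjacent deviating beams grouped into candidates
```

Because the comparison depends on the predicted ranges, any error in the robot's own pose estimate shifts the prediction, which is why the people estimates must be conditioned on the robot's location.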

Mathematically, the problem of people tracking is a combined posterior estimation problem and model selection problem. Let $N$ be the number of people near the robot. The posterior over the people's positions is given by

$$p(y_{1,t}, \ldots, y_{N,t} \mid z^t, u^t, m) \qquad (1)$$

where $y_{n,t}$ with $1 \le n \le N$ is the location of a person at time $t$, $z^t$ the sequence of all sensor measurements, $u^t$ the sequence of all robot controls, and $m$ is the environment map. However, to use map differencing, the robot has to know its own location. The location and total number of nearby people detected by the robot are clearly dependent on the robot's estimate of its own location and heading direction. Hence, Pearl estimates a posterior of the type:

$$p(y_{1,t}, \ldots, y_{N,t}, x^t \mid z^t, u^t, m) \qquad (2)$$

where $x^t$ denotes the sequence of robot poses (the path) up to time $t$. If $N$ were known, estimating this posterior would be a high-dimensional estimation problem, with complexity cubic in $N$ for Kalman filters (2), or exponential in $N$ with particle filters (8). Neither of these approaches is, thus, applicable: Kalman filters cannot globally localize the robot, and particle filters would be computationally prohibitive.

Luckily, under mildly restrictive conditions (discussed below) the posterior (2) can be factored into $N+1$ conditionally independent estimates:

$$p(x^t \mid z^t, u^t, m) \prod_{n=1}^{N} p(y_{n,t} \mid x^t, z^t, u^t, m) \qquad (3)$$

This factorization opens the door for a particle filter that scales linearly in $N$. Our approach is similar (but not identical) to the Rao-Blackwellized particle filter described in (9). First, the robot path $x^t$ is estimated using a particle filter, as in the Monte Carlo localization (MCL) algorithm for mobile robot localization (6). Each particle in this filter is associated with a set of $N$ particle filters, each representing one of the people position estimates $p(y_{n,t} \mid x^t, z^t, u^t, m)$. These conditional particle filters represent people position estimates conditioned on robot path estimates, hence capturing the inherent dependence of people and robot location estimates. The data association between measurements and people is done using maximum likelihood, as in (2). Under the (false) assumption that this maximum likelihood estimator is always correct, our approach can be shown to converge to the correct posterior, and it does so with update time linear in $N$. In practice, we found that the data association is correct in the vast majority of situations. The nested particle filter formulation has a secondary advantage that the number of people $N$ can be made dependent on individual robot path particles. Our approach for estimating $N$ uses the classical AIC criterion for model selection, with a prior that imposes a complexity penalty exponential in $N$.
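The model selection step can be sketched as follows. This is an illustration of the standard AIC score, AIC = 2k - 2 ln L, and not the exact criterion used on Pearl. It assumes each hypothesised person contributes two parameters (an x and a y coordinate), and the log-likelihood values are invented for the example. Minimizing AIC is equivalent to maximizing ln L - k, which corresponds to a prior proportional to exp(-k) and therefore a penalty exponential in the number of people.

```python
# Hypothetical sketch: choosing the number of people N with an AIC-style score.
# log_likelihoods[n] is the best log-likelihood of the scan assuming n people.
# The values below are made up for illustration.


def aic(log_likelihood, num_params):
    """Classical Akaike information criterion (lower is better)."""
    return 2 * num_params - 2 * log_likelihood


def select_num_people(log_likelihoods, params_per_person=2):
    """Return the n that minimizes AIC, with params_per_person * n parameters."""
    return min(
        range(len(log_likelihoods)),
        key=lambda n: aic(log_likelihoods[n], params_per_person * n),
    )


log_likelihoods = [-50.0, -20.0, -18.5, -18.0]  # for N = 0, 1, 2, 3

best = select_num_people(log_likelihoods)
unpenalized = select_num_people(log_likelihoods, params_per_person=0)
print(best)         # the small likelihood gains for N = 2, 3 do not pay off
print(unpenalized)  # without a penalty, the largest N always wins
```

Without the penalty, adding a person can never lower the likelihood, so the filter would keep hypothesising extra people to explain sensor noise.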

Figure 3. (a)-(d) Evolution of the conditional particle filter from global uncertainty to successful localization and tracking. (d) The tracker continues to track a person even as that person is occluded repeatedly by a second individual.

Figure 3 shows results of the filter in action. In Figure 3a, the robot is globally uncertain, and the number and location of the corresponding people estimates vary drastically. As the robot reduces its uncertainty, the number of modes in the robot pose posterior quickly becomes finite, and each such mode has a distinct set of people estimates, as shown in Figure 3b. Finally, as the robot is localized, so is the person (Figure 3c). When guiding people, the localization estimate of the person is used to determine the velocity of the robot, so that the robot maintains a roughly constant distance to the person. In our experiments in the target facility, we found the adaptive velocity control to be absolutely essential for the robot's ability to cope with the huge range of walking paces found in the elderly population. Initial experiments with fixed velocity almost always led to frustration on the people's side, in that the robot was either too slow or too fast.
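One simple way to maintain a roughly constant distance is a proportional rule on the tracked distance to the person. The text does not specify the control law used on Pearl, so the sketch below is an assumption throughout: the function name, the target distance, the cruise speed, the gain and the speed limit are all hypothetical values chosen for illustration.

```python
# Hypothetical sketch: proportional speed adaptation while leading a person.
# The robot drives ahead; the person follows at distance_to_person (metres).


def guidance_velocity(
    distance_to_person,
    target_distance=1.0,  # desired gap in metres (assumed)
    cruise_speed=0.3,     # speed when the gap is exactly right, m/s (assumed)
    gain=0.5,             # speed change per metre of gap error (assumed)
    max_speed=0.6,        # safety limit, m/s (assumed)
):
    """Slow down when the person falls behind, speed up when they close in."""
    error = target_distance - distance_to_person
    speed = cruise_speed + gain * error
    return min(max_speed, max(0.0, speed))


for gap in (0.5, 1.0, 1.5, 2.0):
    print(gap, round(guidance_velocity(gap), 3))
```

With these values the robot stops entirely once the person is 1.6 m or more behind, which is the behavior a fixed-velocity controller cannot provide for slow walkers.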

Finally, Figure 3d illustrates the robustness of the filter to interfering people. Here another person steps between the robot and its target subject. The filter obtains its robustness to occlusion from a carefully crafted probabilistic model of people's motion $p(y_{n,t} \mid y_{n,t-1})$. This enables the conditional particle filters to maintain tight estimates while the occlusion takes place, as shown in Figure 3d. During in-lab experiments involving 31 tracking instances with up to five people at a time, the error in determining the number of people was 9.6%. The error in the robot position was [illegible] cm, and the people position error was as low as [illegible] cm, when compared to measurements obtained with a carefully calibrated static sensor with [illegible] cm error.

5. High Level Robot Control and Dialog Management

The most central module in Pearl's software is a probabilistic algorithm for high-level control and dialog management. This module integrates observations from lower-level modules (e.g. the Autominder, the people tracker, the speech recognizer, etc.) and uses this information to select appropriate behaviors and responses.

Pearl's high-level control architecture is a hierarchical variant of a partially observable Markov decision process (POMDP) (18). The POMDP is a model for calculating optimal control actions under uncertainty. The control decision is based on a probabilistic belief over possible states.
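The belief that the control decision is based on is maintained with the standard POMDP belief update, b'(s') proportional to O(o | s', a) times the sum over s of T(s' | s, a) b(s), as described in (18). The sketch below applies it to a toy two-state problem; the state, action and observation names and all probabilities are invented for illustration and are not Pearl's model.

```python
# Minimal sketch of the standard POMDP belief update on a toy problem.
# T[a][s][s_next] = P(s_next | s, a);  O[a][s_next][o] = P(o | s_next, a).


def belief_update(belief, action, observation, T, O):
    """Return the posterior belief after taking `action` and seeing `observation`."""
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        predicted = sum(T[action][s][s_next] * belief[s] for s in states)
        unnormalized[s_next] = O[action][s_next][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: value / total for s, value in unnormalized.items()}


# Toy model (all numbers are assumptions): asking a question does not change
# what the user wants, and the recognizer hears "abc" 80% of the time when
# the user wants abc, but also 30% of the time when the user wants nbc.
T = {"ask_which_station": {
    "want_abc": {"want_abc": 1.0, "want_nbc": 0.0},
    "want_nbc": {"want_abc": 0.0, "want_nbc": 1.0},
}}
O = {"ask_which_station": {
    "want_abc": {"heard_abc": 0.8, "heard_nbc": 0.2},
    "want_nbc": {"heard_abc": 0.3, "heard_nbc": 0.7},
}}

prior = {"want_abc": 0.5, "want_nbc": 0.5}
posterior = belief_update(prior, "ask_which_station", "heard_abc", T, O)
print(posterior)
```

A noisy recognizer shifts the belief toward one hypothesis without making it certain, which is what leaves room for a further clarification action.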

In Pearl's case, this distribution is defined over a collection of multi-valued state variables:

- robot location (discrete approximation)
- person's location (discrete approximation)
- person's status (inferred from speech recognizer)
- motion goal (where to move)
- reminder goal (what to inform the user of)
- user initiated goal (e.g., an information request)

The value of the person's location variable is observed through the people tracker, and similarly the reminder goal variable is set by the Autominder module. Overall, there are 516 possible states. The input to the POMDP is a factored probability distribution over these states, generated by a state estimator, such as in Equation (2). Uncertainty over the current state arises predominantly from the localization modules and the speech recognition system. The consideration of uncertainty is especially important in this domain, as the costs of giving a reminder to the wrong person, or unnecessarily sending the robot to a location, can be large.

Unfortunately, POMDPs of the size encountered here are an order of magnitude larger than today's best exact POMDP algorithms can tackle (18). However, Pearl's domain is highly structured, since certain actions are only applicable in certain situations. To exploit this structure, we developed a hierarchical version of POMDPs, which breaks down the decision making problem into a collection of smaller problems that can be solved more efficiently. Our approach is similar to the MAX-Q decomposition for MDPs (7), but defined over POMDPs (where states are unobserved).

Figure 4. Dialog Problem Action Hierarchy. [Figure not reproduced. Node labels: Act, Assist, Remind, Contact, Rest, Inform, Move, RemindPhysio, PublishStatus, RingBell, GotoRoom, VerifyRelease, SayTime, SayWeather, VerifyRequest, Recharge, GotoHome, BringtoPhysio, CheckUserPresent, DeliverUser, VerifyBring.]

The basic idea of the hierarchical POMDP is to partition the action space (not the state space, since the state is not fully observable) into smaller chunks. For Pearl's guidance task the action hierarchy is shown in Figure 4, where abstract actions (shown in circles) are introduced to subsume logical subgroups of lower-level actions. This action hierarchy induces a decomposition of the control problem, where at each node all lower-level actions, if any, are considered in the context of a local sub-controller. At the lowest level, the control problem is a regular POMDP, with a reduced action space. At higher levels, the control problem is also a POMDP, yet involves a mixture of physical and abstract actions (where abstract actions correspond to lower level POMDPs).
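The way control descends through such a hierarchy can be sketched as below. This is only an illustration of the dispatch structure: the grouping of actions under Act, Inform and Rest is an assumed toy subset rather than the hierarchy of Figure 4, the belief is over three made-up states, and the hand-written threshold rules stand in for the sub-controller policies that would in practice be computed by solving each local POMDP.

```python
# Hypothetical sketch: descending an action hierarchy from the root to a
# primitive action. Each abstract action has a local policy that maps the
# current belief to one of its children.

HIERARCHY = {  # assumed toy subset, not the full hierarchy of Figure 4
    "Act": ["Inform", "Rest"],
    "Inform": ["SayTime", "SayWeather", "VerifyRequest"],
    "Rest": ["Recharge", "GotoHome"],
}


def act_policy(belief):
    return "Rest" if belief["no_request"] > 0.5 else "Inform"


def inform_policy(belief):
    responses = {"want_time": "SayTime", "want_weather": "SayWeather"}
    best = max(responses, key=lambda state: belief[state])
    # Answer only when confident enough; otherwise ask for clarification.
    return responses[best] if belief[best] >= 0.8 else "VerifyRequest"


def rest_policy(belief):
    return "Recharge"


POLICIES = {"Act": act_policy, "Inform": inform_policy, "Rest": rest_policy}


def select_primitive(belief, root="Act"):
    """Follow local policies downward until a primitive action is reached."""
    node = root
    while node in HIERARCHY:
        child = POLICIES[node](belief)
        if child not in HIERARCHY[node]:
            raise ValueError(f"{child} is not a child of {node}")
        node = child
    return node


confident = {"want_time": 0.9, "want_weather": 0.05, "no_request": 0.05}
unsure = {"want_time": 0.5, "want_weather": 0.4, "no_request": 0.1}
idle = {"want_time": 0.1, "want_weather": 0.1, "no_request": 0.8}
print(select_primitive(confident), select_primitive(unsure), select_primitive(idle))
```

Each local policy only has to choose among a handful of children, which is the source of the computational savings over planning across the full action set at once.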

It is important to notice that such a decomposition is especially appropriate when the optimal control descends along a single path in the hierarchy to receive its reward. This is approximately the case in the Pearl domain, where goals are satisfied upon successfully delivering a person, or successfully gathering information through communication.

Using the hierarchical POMDP, the high-level decision making problem in Pearl is tractable, and a near-optimal control policy can be computed off-line. Thus, during execution time the controller simply monitors the state (calculates the posterior) and looks up the appropriate control. Table 1 shows an example dialog between the robot and a test subject. Because of the uncertainty management in POMDPs, the robot chooses to ask a clarification question on three occasions. The number of such questions depends on the clarity of a person's speech, as detected by the Sphinx speech recognition system.

Table 1. Sample dialog demonstrating the role of clarification actions. Actions in bold font are clarification actions, chosen by the POMDP because of high uncertainty in the speech signal.

| Observation | True State | Action | Reward |
| pearl hello | request begun | say hello | 100 |
| pearl what is like | start meds | ask repeat | -100 |
| pearl what time is it for will the | want time | say time | 100 |
| pearl was on abc | want tv | ask which station | -1 |
| pearl was on abc | want abc | say abc | 100 |
| pearl what is on nbc | want nbc | confirm channel nbc | -1 |
| pearl yes | want nbc | say nbc | 100 |
| pearl go to the that pretty good what | send robot | ask robot where | -1 |
| pearl that that hello be | send robot bedroom | confirm robot place | -1 |
| pearl the bedroom any i | send robot bedroom | go to bedroom | 100 |
| pearl go it eight a hello | send robot | ask robot where | -1 |
| pearl the kitchen hello | send robot kitchen | go to kitchen | 100 |

An important remaining question concerns the importance of handling uncertainty in high-level control. To investigate this, we ran a series of comparative experiments, all involving real data collected in our lab. In one series of experiments, we investigated the importance of considering the uncertainty arising from the speech interface. In particular, we compared Pearl's performance to a system that ignores that uncertainty, but is otherwise identical. The resulting approach is an MDP, similar to the one described in (30). Figure 5 shows results for three different performance measures, and three different users (in decreasing order of speech recognition performance). For poor speakers, the MDP requires less time to "satisfy" a request due to the lack of clarification questions (Figure 5a). However, its error rate is much higher (Figure 5b), which negatively affects the overall reward received by the robot (Figure 5c). These results clearly demonstrate the importance of considering uncertainty at the highest robot control level, specifically with poor speech recognition.

Figure 6. Empirical comparison between uniform and non-uniform cost models. Results are an average over 10 tasks. Depicted are 3 example users, with varying levels of speech recognition accuracy. Users 2 & 3 had the lowest recognition accuracy, and consequently more errors when using the uniform cost model. [Plot not reproduced. Title: "User Data -- Error Performance"; vertical axis: errors per task (0 to 2); horizontal axis: User 1, User 2, User 3; legend: uniform cost model, non-uniform cost model; bar labels as extracted: 0.3, 1.2, 1.6, 0.2, 0.6, 0.1.]

In the second series of experiments, we investigated the importance of uncertainty management in the context of highly imbalanced costs and rewards. For example, in Pearl's case, asking a clarification question is in fact much cheaper than accidentally delivering a person to a wrong location, or guiding a person who does not want to be walked. We therefore compared performance using two POMDP models which differed only in their cost models. One model assumed uniform costs for all actions, whereas the second model assumed a more discriminative cost model in which the cost of verbal questions was lower than the cost of performing the wrong motion actions. A POMDP policy was learned for each of these models, and then tested experimentally in our laboratory. The results presented in Figure 6 show that the non-uniform model makes more judicious use of confirmation actions, thus leading to a significantly lower error rate, especially for users with low recognition accuracy.
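Why the cost model changes the use of confirmation actions can be seen in a deliberately simplified two-step calculation. This is not the POMDP policy computation used in the experiments: it assumes a single confirmation question fully resolves the uncertainty, that a wrong action must be followed by the correct one, and that costs simply add up. The function names and the cost values (1 for a question or a correct action, 1 or 100 for a wrong action) are illustrative assumptions.

```python
# Hypothetical two-step sketch: when is it worth confirming before acting?
# p_correct is the belief that the robot's best guess about the goal is right.


def cost_act_now(p_correct, act_cost, wrong_cost):
    """Expected cost of acting on the best guess immediately.

    A wrong guess incurs wrong_cost and is then followed by the right action.
    """
    return p_correct * act_cost + (1.0 - p_correct) * (wrong_cost + act_cost)


def cost_confirm_first(ask_cost, act_cost):
    """Cost of asking one (assumed fully informative) question, then acting."""
    return ask_cost + act_cost


def should_confirm(p_correct, ask_cost, act_cost, wrong_cost):
    return cost_confirm_first(ask_cost, act_cost) < cost_act_now(
        p_correct, act_cost, wrong_cost
    )


UNIFORM = {"ask_cost": 1.0, "act_cost": 1.0, "wrong_cost": 1.0}
NON_UNIFORM = {"ask_cost": 1.0, "act_cost": 1.0, "wrong_cost": 100.0}

for p in (0.6, 0.8, 0.999):
    print(p, should_confirm(p, **UNIFORM), should_confirm(p, **NON_UNIFORM))
```

Under these assumptions the uniform model never finds a question worthwhile, because a wrong action costs no more than the question that would have prevented it, whereas the non-uniform model confirms whenever the belief in its best guess is below 0.99.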

6. Results

Following integration of the three software modules onto Pearl, the robot was deployed in a retirement community located near Pittsburgh, PA. This section describes experiments involving elderly residents of this facility, with mild cognitive, perceptual, or physical limitations.

We tested the robot in five separate experiments, each lasting one full day. The first three days focused on open-ended interactions with a large number of elderly users, during which the robot interacted verbally and spatially with elderly people with the specific task of delivering sweets. This allowed us to gauge people's initial reactions to the robot.

Following this, we performed two days of formal experiments during which the robot autonomously led 12 full guidances, involving 6 different elderly people. Figure 7 shows an example guidance experiment, involving an elderly person who uses a walking aid. The sequence of images illustrates the major stages of a successful delivery: contacting the person, delivering the reminder, walking her through the facility, and providing information after the successful delivery (in this case on the weather).

In all trials, the task was performed to completion. Post-experimental debriefings illustrated a uniformly high level of excitement on the side of the elderly. Overall, only a few problems were detected during the operation. None of the test subjects showed difficulties understanding the major functions of the robot. They all were able to operate the robot after less than five minutes of introduction. Earlier trials with a poorly adjusted speech recognition system, and fixed-velocity robot motion, both caused difficulties. These were addressed early on by increasing the role of the touchscreen, and including adaptable velocities.

7. Discussion

Figure 5. Empirical comparison between POMDPs (with uncertainty, shown in gray) and MDPs (no uncertainty, shown in black) for high-level robot control, evaluated on data collected in the assisted living facility. Shown are the average time to task completion (a), the average number of errors (b), and the average user-assigned (not model assigned) reward (c), for the MDP and POMDP. The data is shown for three users, with good, average and poor speech recognition. [Plots not reproduced. Panel (a): "User Data -- Time to Satisfy Request", vertical axis: average # of actions to satisfy request (0 to 3), bar labels as extracted: 2.0, 2.5, 1.63, 1.9, 1.86, 1.42. Panel (b): "User Data -- Error Performance", vertical axis: average errors per action (0 to 0.9), bar labels as extracted: 0.1, 0.1, 0.18, 0.55, 0.825, 0.36. Panel (c): "User Data -- Reward Accumulation", vertical axis: average reward per action (0 to 60), bar labels as extracted: 52.2, 36.95, 49.72, 24.8, 6.19, 44.95. Legend in each panel: POMDP policy, conventional MDP policy.]

This paper described a mobile robotic assistant for nurses and elderly residents in assisted living facilities. The system has been tested successfully in experiments in an assisted living facility. The experiments were successful in two main dimensions. First, they provided some evidence towards the feasibility of using autonomous mobile robots as assistants to nurses and institutionalized elderly. Second, they demonstrated that various probabilistic tracking and planning techniques are well-suited to solve problems pertaining to human-robot interactions.

One of the key lessons learned while developing this robot is that the elderly population requires techniques that can cope with individual differences (e.g. walking speed), age-related decline (e.g. memory loss) and noisy perception (e.g. poor speech recognition). We view the area of assistive technology as a prime source for great AI problems in the future.

Figure 7. Example of a successful guidance experiment: a) Pearl picks up the patient outside her room, b) reminds her of a physiotherapy appointment, c) guides the person to the physiotherapy department, d) enters the department, e) satisfies a request for the weather report, and f) terminates the interaction and leaves.

References

[1] J.L. Ambite and C.A. Knoblock. Planning by rewriting. JAIR, 15:207-261. 2001.

[2] Y. Bar-Shalom and T.E. Fortmann. Tracking and Data Association. Academic Press. 1998.

[3] A.W. Black, P. Taylor, and R. Caley. The Festival Speech Synthesis System. University of Edinburgh. 1999.

[4] W. Burgard, A.B. Cremers, D. Fox, D. Hahnel, G. Lakemeyer, D. Schulz, W. Steiner, and S. Thrun. The interactive museum tour-guide robot. AAAI-98.

[5] D. Colbry, B. Peintner and M.E. Pollack. Execution monitoring with quantitative temporal bayesian networks. AIPS-02.

[6] F. Dellaert, D. Fox, W. Burgard, and S. Thrun. Monte Carlo localization for mobile robots. ICRA-99.

[7] T. Dietterich. The MAXQ method for hierarchical reinforcement learning. ICML-98.

[8] A. Doucet, N. de Freitas, and N.J. Gordon, editors. Sequential Monte Carlo Methods In Practice. Springer. 2001.

[9] A. Doucet, N. de Freitas, K. Murphy, and S. Russell. Rao-Blackwellised particle filtering for dynamic bayesian networks. UAI-2000.

[10] M.M. Dowds and K. Robinson. Assistive technology for memory impairment: Palmtop computers and TBI. In Braintree Hospital Traumatic Brain Injury Neurorehabilitation Conference. 1996.

[11] S. Dubowsky, F. Genot, S. Godding, H. Kozono, A. Skwersky, H. Yu, and L. Yy. PAMM: Aid to the elderly for mobility assistance and monitoring: A helping hand for the elderly. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 570-576, 2000.

[12] G. Engelberger. Services. In Handbook of Industrial Robotics. John Wiley and Sons. 1999.

[13] D.C. Engelbart. A conceptual framework for the augmentation of man's intellect. In Vistas in Information Handling. Spartan Books. 1963.

[14] R.H. Friedman. Automated telephone conversation to assess health behavior and deliver behavioral interventions. Journal of Medical Systems. 22:95-101. 1998.

[15] D.M. Gavrila. The visual analysis of human movement: A survey. Computer Vision and Image Understanding. 73(1). 1999.

[16] B. Graf. Reactive navigation of an intelligent robotic walking aid. In Proceedings IRS-2000, pages 252-259, Montreal, CA, 2000.

[17] N.A. Hersh and L. Treadgold. Neuropage: The rehabilitation of memory dysfunction by prosthetic memory and cueing. NeuroRehabilitation. 4:187-197. 1994.

[18] L.P. Kaelbling, M.L. Littman, and A.R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence. 101. 1998.

[19] N. Kirsch, S.P. Levine, M. Fallon-Kureger, L.A. Jaros. The microcomputer as an 'orthotic' device for patients with cognitive deficits. Journal of Head Trauma Rehabilitation. 2:77-86.

[20] G. Lacey. Adaptive shared control of a robot mobility aid. In Field and Service Robotics, pages 25-30. Springer Verlag, 1999.

[21] G. Lacey and K.M. Dawson-Howe. The application of robotics to a mobility aid for the elderly blind. Robotics and Autonomous Systems. 23. 1998.

[22] G. Lakemeyer, editor. Notes Second International Workshop on Cognitive Robotics. Berlin. 2000.

[23] R. Levinson. PEAT - the planning and execution assistant and trainer. Robotics and Autonomous Systems. 23:245-252.

[24] C.E. McCarthy and M. Pollack. A Plan-Based Personalized Cognitive Orthotic. AIPS-2002.

[25] M. Montemerlo, J. Pineau, N. Roy, S. Thrun and V. Verma. Experiences with a Mobile Robotic Guide for the Elderly. AAAI-2002.

[26] M. Pollack. Planning technology for intelligent cognitive orthotics. AIPS-2002.

[27] M. Ravishankar. Efficient algorithms for speech recognition. 1996. Internal Report.

[28] H.A. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 20(1). 1998.

[29] D. Schulz, W. Burgard, D. Fox, and A. Cremers. Tracking multiple moving targets with a mobile robot using particle filters and statistical data association. ICRA-2001.

[30] S. Singh, M. Kearns, D. Litman, and M. Walker. Reinforcement learning for spoken dialogue systems. NIPS-2000.

[31] S. Thrun, M. Beetz, M. Bennewitz, W. Burgard, A.B. Cremers, F. Dellaert, D. Fox, D. Hahnel, C. Rosenberg, N. Roy, J. Schulte, and D. Schulz. Probabilistic algorithms and the interactive museum tour-guide robot Minerva. International Journal of Robotics Research. 19(11). 2000.

[32] I. Tsamardinos. Constraint-Based Temporal Reasoning Algorithms with Applications to Planning. Ph.D. Dissertation. University of Pittsburgh Intelligent Systems Program. 2001.

[33] US Department of Health and Human Services. Health, United States, 1999. Health and aging chartbook, 1999.

[34] G. Wasson, J. Gunderson, S. Graves, and R. Felder. An assistive robotic agent for pedestrian mobility. In Proceedings of the International Conference on Autonomous Agents, pages 169-173, 2001.
