
Group Perception Methods to Support Human-Robot Teaming

Angelique Taylor and Laurel D. Riek
Computer Science and Engineering, Univ. of California San Diego

{amt062, lriek}@eng.ucsd.edu

I. INTRODUCTION

The field of robotics is growing at a rapid pace, with robot deployments in everyday environments such as hospitals, schools, and malls. On average, 70% of people in these environments are in groups [1]: they walk, work, and interact in groups¹. Therefore, robots need a high-level understanding of groups, which reveals many exciting technical and socio-technical challenges to enable them to fluently assist and interact with groups. Yet much prior work in human-robot interaction (HRI) focuses on dyadic interaction, which prevents robots from understanding how teams work together, as well as how to work alongside them. Thus, the goal of this paper is to design perception (computer vision) methods that enable robots to work seamlessly in a group. Here, we discuss our work to date as well as our future work on a group detection and tracking system, group motion forecasting, and a robotic system that enables robots to navigate and interact among groups.

As a first step toward this goal, we conducted an in-depth analysis of state-of-the-art group perception methods within the computer vision and social signal processing (SSP) fields. This analysis identified open challenges that need to be addressed in order for robots to effectively collaborate and work with groups in real-world settings [2]. For example, most group perception methods employ fixed, overhead cameras (i.e., an exo-centric, or third-person, perspective) to sense groups of people, rendering them impractical for mobile robots. Instead, perception methods based on an ego-centric (i.e., first-person) perspective are more suitable for mobile robots, enabling them to enter any environment and accomplish their goals without external sensing requirements.

Next, we proposed a theoretical framework that enables robots to perceive groups of people and their level of affiliation (a sense of belonging), inspired by teamwork models from the social sciences [3]. This framework reflects four principles which robots can leverage to behave appropriately in groups: 1) the proximity principle, which states that people tend to join groups in close proximity to them; 2) the elaboration principle, which states that groups are dynamic systems that grow in complexity; 3) the similarity principle, which states that people tend to stay in groups longer when they share common goals and interests; and 4) the complementarity principle, which states that people tend to stay in groups longer when they have mutually beneficial characteristics [4]. Computationally, this framework is reflected in three stages: group perception, group detection and tracking, and path planning among groups.

¹ Groups are defined as two or more people in close proximity to each other with a common motion goal.

Following this, we designed, implemented, and evaluated a new ego-centric, unsupervised group detection system, the Robot Group Estimation Model (RoboGEM) [5], [6]. Historically, prior group perception work has several limitations: it tends to (1) focus on exo-centric perspective approaches, (2) use data captured in well-controlled environments, and therefore cannot support real-world situations, and (3) use supervised learning methods, which may fail when robots encounter new situations. In contrast, RoboGEM is unsupervised and works well on ego-centric, real-world data, where both pedestrians and the robot are in motion at the same time. RoboGEM outperforms the current top-performing method by 10% in accuracy and 50% in recall, and it can be used in real-world environments to enable robots to work in teams [6].
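To make the intuition behind unsupervised, ego-centric group detection concrete, the sketch below clusters pedestrians by proximity and velocity similarity, consistent with our definition of a group as two or more people in close proximity with a common motion goal. This is an illustrative simplification, not RoboGEM itself; the thresholds and the union-find clustering are assumptions made for the example.

```python
import numpy as np

def detect_groups(positions, velocities, dist_thresh=1.5, vel_thresh=0.5):
    """Cluster pedestrians into groups with a simple unsupervised rule:
    two people are linked when they are close together AND moving with
    similar velocities (a proxy for a shared motion goal). Union-find
    then merges pairwise links into connected components."""
    n = len(positions)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for i in range(n):
        for j in range(i + 1, n):
            close = np.linalg.norm(positions[i] - positions[j]) < dist_thresh
            similar = np.linalg.norm(velocities[i] - velocities[j]) < vel_thresh
            if close and similar:
                union(i, j)

    # Collect components; a group requires at least two members.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [sorted(g) for g in groups.values() if len(g) >= 2]

# Example: pedestrians 0 and 1 walk together; pedestrian 2 heads elsewhere.
pos = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
vel = np.array([[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]])
print(detect_groups(pos, vel))  # → [[0, 1]]
```

No labels are needed at any point, which is what lets an approach like this generalize to situations a supervised model was never trained on.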

In subsequent work, we expanded the scope of RoboGEM by adding a new tracking algorithm and improving its group detection method (RoboGEM 2.0). RoboGEM 2.0 performs well in crowded environments using a tracking-by-detection approach and, to the best of our knowledge, is the first group tracking method to leverage deep learning. This provides a more robust affinity representation than prior work (which employs hand-crafted feature representations), and can be used for group data association. RoboGEM 2.0 is based on the intuition that pedestrians are most likely in groups when they have similar trajectories, ground plane coordinates, and proximities. It also includes new methods for group tracking that employ Convolutional Neural Network (CNN) feature maps for group data association, and Kalman filters to track group states over time. This work achieved state-of-the-art performance, and will be submitted for publication this summer.
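As an illustration of the Kalman filtering step, the sketch below tracks a group's centroid with a constant-velocity model. The CNN-based affinity and data association stages of RoboGEM 2.0 are omitted, and all noise parameters and the state layout are illustrative assumptions, not the system's actual configuration.

```python
import numpy as np

class GroupKalmanTracker:
    """Constant-velocity Kalman filter over a group's centroid.
    State x = [px, py, vx, vy]; only the centroid position is observed."""

    def __init__(self, init_pos, dt=0.1):
        self.x = np.array([init_pos[0], init_pos[1], 0.0, 0.0])
        self.P = np.eye(4)                                    # state covariance
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt  # motion model
        self.H = np.eye(2, 4)                                 # observe position only
        self.Q = 0.01 * np.eye(4)                             # process noise
        self.R = 0.1 * np.eye(2)                              # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = z - self.H @ self.x                       # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]

tracker = GroupKalmanTracker(init_pos=(0.0, 0.0), dt=1.0)
for t in range(1, 6):  # group centroid moves at +1 m/s along x
    tracker.predict()
    est = tracker.update(np.array([float(t), 0.0]))
print(est)  # estimate converges toward the true centroid at (5, 0)
```

Because the filter carries a velocity estimate, it can coast through the brief detection dropouts that self-occlusion within a group tends to cause.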

II. METHOD

One pressing challenge with deploying robots in real-world environments is ensuring the safety of the pedestrians around them. Acquiring a high-level understanding of social dynamics enables robots to effectively predict the future trajectories of pedestrians using group motion inferences. This will enable robots to navigate in heavily populated environments and can potentially address the "freezing robot" problem: when a robot halts due to high levels of uncertainty in the future trajectories of pedestrians. However, it is challenging to predict the future trajectory of groups due to the chaotic motion trajectories of individual pedestrians, self-occlusions within groups, and dynamic sensor motion. Unlike prior group forecasting methods that rely on exo-centric sensing and use trajectory clusters to better predict the motion intentions of pedestrians, our approach infers the intended motion of groups from an ego-centric perspective [7].

Fig. 1. The trajectory of contributions of our work, which includes a group detection algorithm, a group tracking algorithm, a group motion forecasting algorithm, and a robotic system that enables robots to navigate and interact among groups.

We developed a new group motion forecasting method, drawing inspiration from characteristic crowd movement patterns in the computer vision literature (c.f. [7]). Our system generates a high-level representation of group trajectories that enables robots to navigate more safely and effectively in real-world environments. It uses a Long Short-Term Memory (LSTM) Recurrent Neural Network to perform group motion forecasting over long periods of time. Furthermore, we employed a hidden spatial pooling layer to model the interactions between pedestrians in order to predict the future motion of groups. This enables robots to better understand the trajectory of social interactions in human-centered environments. We will evaluate our method on a real-world dataset and compare it to state-of-the-art methods.
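To illustrate what a spatial pooling layer computes, the sketch below implements a social-pooling grid in the style of the Social LSTM literature: each pedestrian's neighbors' hidden states are summed into the cells of a grid centered on that pedestrian, and the flattened result is what an LSTM layer would consume. This is a generic illustration, not our exact architecture; the grid and cell sizes are assumptions, and the LSTM itself is omitted.

```python
import numpy as np

def social_pooling(positions, hidden, grid_size=4, cell=0.5):
    """Pool neighbors' hidden states into a per-pedestrian spatial grid.
    positions: (N, 2) planar coordinates in meters.
    hidden:    (N, D) per-pedestrian hidden states.
    Returns:   (N, grid_size * grid_size * D) flattened pooled features."""
    n, d = hidden.shape
    half = grid_size * cell / 2.0
    pooled = np.zeros((n, grid_size, grid_size, d))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx, dy = positions[j] - positions[i]
            if abs(dx) >= half or abs(dy) >= half:
                continue  # neighbor falls outside the pooling window
            gx = int((dx + half) / cell)  # grid cell indices
            gy = int((dy + half) / cell)
            pooled[i, gx, gy] += hidden[j]
    return pooled.reshape(n, -1)  # flatten for the recurrent layer

pos = np.array([[0.0, 0.0], [0.4, 0.0], [10.0, 0.0]])  # two near, one far
h = np.ones((3, 8))
out = social_pooling(pos, h)
print(out.shape)  # (3, 128): 4x4 grid x 8-dim hidden state
```

The pooled tensor is position-relative, so the same layer works unchanged whether the camera is fixed or mounted on a moving robot, which is what makes this family of models compatible with ego-centric sensing.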

We will design a system that enables robots to navigate among groups and socially interact with them. Our system will enable real-time deployment of mobile robots in real-world environments and enable them to predict the future motion intentions of groups to ensure pedestrian safety. Furthermore, it will enable robots to join and participate in group interactions, which will generate data that can be used to build artificial intelligence (AI) systems. These AI systems will inspire robot designers to design methods to infer the affiliative state of groups and to begin generating social behavior for robots.

Our system will have four modules for: 1) group detection and tracking, 2) group motion forecasting, 3) navigation among groups, and 4) group interaction. We will use our aforementioned systems for the group detection, tracking, and group motion forecasting modules. The navigation module will employ the Robot Operating System (ROS) navigation stack, and the group interaction module will employ several robot behaviors, such as telling jokes, sharing fun facts, or discussing recent events.
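The four-module design above can be sketched as a simple perception-to-action loop. The class, module names, and callable interfaces below are hypothetical stand-ins, not our actual APIs; in practice the navigation callable would wrap the ROS navigation stack.

```python
class GroupAwareRobot:
    """Hypothetical sketch of the four-module pipeline: each module is a
    pluggable callable, so individual components can be swapped or tested
    in isolation."""

    def __init__(self, detect_track, forecast, navigate, interact):
        self.detect_track = detect_track
        self.forecast = forecast
        self.navigate = navigate
        self.interact = interact

    def step(self, frame):
        groups = self.detect_track(frame)     # 1) detect and track groups
        futures = self.forecast(groups)       # 2) forecast group motion
        cmd = self.navigate(groups, futures)  # 3) plan a safe path
        behavior = self.interact(groups)      # 4) choose a social behavior
        return cmd, behavior

# Toy stand-ins that show only the control flow:
robot = GroupAwareRobot(
    detect_track=lambda frame: [["p0", "p1"]],
    forecast=lambda groups: [{"goal": (2.0, 0.0)}],
    navigate=lambda groups, futures: "veer_right",
    interact=lambda groups: "fun_fact",
)
print(robot.step(frame=None))  # → ('veer_right', 'fun_fact')
```

Keeping the modules behind plain callable interfaces also makes the iterative testing described below straightforward, since each stage can be exercised with recorded data before full deployment.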

III. EVALUATION

We will employ mixed methods to evaluate the performance of our system. First, we will use computational metrics to evaluate the robot's ability to navigate in a natural and safe manner. These metrics include: the robot's ability to avoid obstacles, the smoothness of its trajectory, and the robot's ability to avoid interrupting or disturbing groups. Next, we will use qualitative metrics to evaluate human perception of the robot and its level of safety. These metrics include: people's level of comfort, how they perceive the robot's behavior, and their perceived level of safety. To ensure that our system performs robustly in real-world settings, we will perform iterative testing throughout its development. Additionally, we will conduct several experiments with people from various educational, age, and cultural backgrounds to ensure that our findings reflect diverse, real-world conditions and human perspectives.
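Two of the computational metrics above can be made concrete: the sketch below scores trajectory smoothness as mean jerk magnitude and safety as the minimum clearance to any pedestrian. These particular formulations are illustrative assumptions; other smoothness and clearance measures would serve equally well.

```python
import numpy as np

def smoothness(traj, dt=0.1):
    """Mean jerk magnitude of a robot trajectory (lower = smoother).
    traj: (T, 2) planar positions sampled every `dt` seconds."""
    vel = np.diff(traj, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    jerk = np.diff(acc, axis=0) / dt
    return float(np.mean(np.linalg.norm(jerk, axis=1)))

def min_clearance(traj, pedestrians):
    """Smallest distance the robot came to any pedestrian (higher = safer).
    pedestrians: (T, N, 2) positions aligned with the robot's T samples."""
    d = np.linalg.norm(pedestrians - traj[:, None, :], axis=2)
    return float(d.min())

t = np.linspace(0.0, 1.0, 11)
straight = np.stack([t, np.zeros_like(t)], axis=1)  # constant-velocity path
jerky = straight.copy(); jerky[5, 1] = 0.3          # sudden sidestep
peds = np.tile(np.array([[0.5, 1.0]]), (11, 1, 1))  # one static pedestrian

assert smoothness(straight) < smoothness(jerky)
print(min_clearance(straight, peds))  # ≈ 1.0 m at the closest approach
```

Metrics like these can be logged on every test run, giving a quantitative complement to the qualitative measures of comfort and perceived safety.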

IV. DISCUSSION

This research contributes novel perception methods that enable robots to effectively identify groups, track them over time, infer their future trajectories, and navigate and interact among them in real-world settings. Our work will provide transformative advances in HRI and robotics, enable more robust, realistic HRI, and support the safe operation of mobile robots in human-centered environments [8].

REFERENCES

[1] M. Moussaïd, N. Perozo, S. Garnier, D. Helbing, and G. Theraulaz, “The walking behaviour of pedestrian social groups and its impact on crowd dynamics,” PLoS ONE, 2010.

[2] A. Taylor and L. D. Riek, “Robot perception of human groups in the real world: State of the art,” in 2016 AAAI Fall Symp. Series.

[3] ——, “Robot affiliation perception for social interaction,” in 2017 Workshop on Robots in Groups and Teams at CSCW.

[4] D. R. Forsyth, Group Dynamics. Cengage Learning, 2009.

[5] A. Taylor and L. D. Riek, “Robot-centric human group detection,” in 2018 Workshop on Social Robots in the Wild at ACM/IEEE Intern. Conf. on HRI.

[6] ——, “Robot-centric perception of human groups,” ACM Trans. on HRI (THRI), 2019. In review.

[7] S. Wu, H. Yang, S. Zheng, H. Su, Y. Fan, and M. Yang, “Crowd behavior analysis via curl and divergence of motion trajectories,” Intern. J. of Computer Vision, 2017.

[8] D. M. Chan, A. Taylor, and L. D. Riek, “Faster robot perception using salient depth partitioning,” in IEEE Intern. Conf. on Intell. Robots and Sys. (IROS), 2017.