



3rd HAND FP7-ICT-2013-10-610878

1 October 2013 (48 months)

D4.1:

Scientific report on Modeling and Prediction of

Human Intent for Primitive Activation

TUDa

[email protected], [email protected]〉Due date of deliverable: M12Actual submission date: M13Lead Partner: InriaPartners: TUDa, Inria, USTTRevision: draftDissemination level: PU

This deliverable reports the modeling of the human-robot collaborative behavior used to recognize actions and activate robot primitives. This report describes a new framework based on probabilistic interaction primitives for creating such a collaborative model. The assumption that human-robot trajectories can be linearly correlated in an appropriate lower dimensional space forms the basis of our approach. We have investigated multiple patterns of interaction with a mixture of models, and also how to generalize probabilistic primitives when manipulating objects of different sizes and shapes. We also discuss how this work is currently being integrated with the work in deliverable D3.1.

1 Tasks, objectives, results 3
1.1 Planned work 3
1.2 Actual work performed 3
1.3 Relation to the state-of-the-art 5

2 Annexes 7




Executive Summary

This report discusses the first year achievements in human action recognition and activation of the corresponding robot movement primitive for collaborative tasks. Throughout the first year we have introduced Interaction Probabilistic Movement Primitives [MEL+14, ENL+15] (in short, interaction primitives), as a framework aimed at probabilistically modeling human-robot collaborative tasks. Interaction primitives are used to predict the human intent (action recognition), which is then used to activate and coordinate the movement primitive of the robot assistant. To generalize the robot grasping to different objects, we have also investigated how parameters extracted from point-cloud images can be used to warp known objects into new objects [BKP14], which are then used to modify the movement primitives. Currently we are addressing the close connections between the interaction primitives and the higher level plans of deliverable D3.1. Specifically, we are considering generalization to new scenarios by making use of relational policies that satisfy an underlying reward with the work of [MPG+15], and the pre-triggering of robot primitives with the recognition of assembly phases as proposed in [BML+15].

So far, progress has been evaluated by experiments with Darias, the robotic platform based at TU Darmstadt, consisting of a 7-DoF KUKA lightweight arm with a DLR-HIT five-fingered hand. We have shown proof-of-concept experiments where a toolbox is assembled with the help of Darias acting as a third hand.

Role of Modeling and Prediction of Human Intent for Primitive Activation in 3rdHand

This is an essential part of an assistive robot framework as, ultimately, the action that the robot takes must not only be correlated to the human observations, but must also be represented as a primitive that can be suitably used by a low-level control loop. Action recognition for primitive activation is useful to inform the logical learning method in [MPG+15] in order to improve higher level decisions. Although not addressed during the first year, future work will leverage the interplay between the detection of pair-wise interaction phases and situational awareness [BML+15, MMRL15], described in D3.1, and action recognition at the primitive level. Combined with D3.1, our role is to provide algorithms for assistive/collaborative robot policies at both the relational and the movement primitive level of representation.




Contribution to the 3rdHand scenario

In this deliverable we have developed collaborative movement primitives in multi-task scenarios, using both supervised [MEL+14] and unsupervised [ENL+15] learning, and we have also proposed the generalization of primitives for grasping different objects based on warping functions [BKP14]. The control accuracy of Darias under the policies generated by the proposed methods was also evaluated with handovers of different objects during the assembly of a toolbox.

1 Tasks, objectives, results

1.1 Planned work

As a deliverable specified within Work Package 4 (WP4) (Learning Primitives for Cooperative Manipulation), the planned work is largely motivated by the following requirements:

• Our method must be able to model the mutual dependencies of two agents, particularly the interaction between a human and a robot.

• The robot must be proactive: it must predict and execute a task by reasoning about the next, most probable human action.

• In the case of an unsatisfactory response, the robot skills must be modified on-demand by means of natural human instruction, ideally by simply observing a demonstration from the human coworker, rather than by direct kinesthetic teaching.

• The robot must be able to compose complex actions based on simpler collaborative primitives.

• The robot must optimize its own primitives in order to increase the efficiency of the interaction.

1.2 Actual work performed

During the first year we have been able: (1) to develop and validate the use of interaction primitives for the handover of different parts during assembly tasks [MEL+14], (2) to generalize probabilistic primitives during grasping using warping functions [BKP14], (3) to address multiple interaction patterns between the human coworker and the robot assistant [ENL+15] using unlabeled data, and (4) to record trajectories and poses of objects with high-precision magnetic trackers (USTT), motion capture (Inria), and off-the-shelf Kinect and cameras (UIBK, TUDa).




(a) Handing over a plate (b) Handing over a screw (c) Holding the screw driver

Figure 1: Current results where the robot acts as a third hand by handing over a plate or a screw depending on the action recognized. The robot is also able to hold and release a tool such that the coworker can have both hands free.

We evaluated the physical capabilities and control accuracy of the Darias platform by proposing handover experiments during the assembly of a toolbox. We could accomplish grasping of small and delicate parts, such as the head of a five millimeter screw, by using auxiliary stands for objects.

Figure 1 shows the three subtasks that were used during the experiments. The bottom row shows the demonstrated pairs of human-robot trajectories. Videos of our current results in collaborative interaction primitives are available from the two links: http://youtu.be/2Ok6KQQQDNQ related to the published material [MEL+14], and http://youtu.be/9XwqW_V0bDw for the submitted publication [ENL+15].

There are many problems to be solved in order to fulfill all requirements of Work Package 4 of the 3rdHand proposal. A great part of future work relates to the integration of several of the works described in D3.1. While we have been able to recognize a task by observing the current human movement, we have not yet implemented preemptive action in the interaction primitive framework. In the short term, we expect to add anticipative/preemptive robot behavior to the interaction primitive framework, in the sense that primitives can be pre-triggered by learning a sequence of correlated tasks. This can be achieved by identifying such a sequence using conditional random fields with the work on temporal segmentation of pair-wise interactions [BML+15].

The results of this deliverable are constrained to a specific scenario, and the assembly of a toolbox cannot be generalized since it is based on a pure imitation learning approach. We will incorporate Relational Inverse Reinforcement Learning [MPG+15], reported in D3.1, to provide generalization to interaction primitives such that different tasks can be accomplished from the same demonstrations.

Interaction primitives must also seamlessly integrate with optimal trajectories provided by the Open Robot Simulator of USTT (described in D5.1). We have been working on the parsing of whole trajectories into segments encoded in a skill library by using the algorithms of [LMPN15], described in D3.1.

Also, current work is addressing the use of motion capture data from two collaborators to generate feasible trajectories for Darias. The outcome of this work will free the coworker from doing kinesthetic teaching. Instead, the worker is expected to simply execute the task, while the vision/motion capture system captures his/her measurements.

1.3 Relation to the state-of-the-art

Action recognition based on data analysis and classification of interactions between multiple persons has long been addressed within the computer vision community, in particular due to interest in visual surveillance tasks, e.g., tracking of multiple pedestrians. Action recognition has motivated several works based on hidden Markov models (HMMs) and graphical models in general, such as Oliver et al. [ORP00], [LON10], and [BAVE+13], for example. Inference on the graphical model allows a robot to anticipate human activity and choose a corresponding, preprogrammed robot response. Wang et al. [WMD+13] propose the intention-driven dynamics model. Koppula et al. [KS13] use a conditional random field with sub-activities, human poses, object affordances and object locations over time.

Also motivated by a robot assistant, Tanaka et al. [TKSK12] use a Markov model to predict the positions of a worker in an assembly line. The space in which the worker moves is discretized into different regions. A Gaussian mixture model relates positions to procedures. Using this information a robot then delivers tools and parts to a human worker along the assembly line. Llorens et al. [LBA] present a hybrid design for a two-arm robot to be used on the shoulder. Colored Petri Nets have been used to model high level transitions, e.g., the robot holds the panel in position or the screw is in position, etc. Petri Nets account for discrete control transitions, while at the motion level Partial Least Squares Regression has been used to find the best action of the robot at future time steps.

Interaction dynamics need to be specified in a way that allows for robust reproduction of the collaborative task under different external disturbances, and a common approach is based on direct force sensing or emulation. Rozo et al. [RCC+13] proposed a framework for haptic collaboration between a human and a robot manipulator. Given a set of kinesthetic demonstrations, their method learns a mapping between measured forces and the impedance parameters used for actuating the robot, e.g., the stiffness of virtual springs governing the collaborative task. In another force-based approach, Lawitzky et al. [LMLH12] proposed learning physical assistance in a collaborative transportation task. In the early learning phase, the robot uses the measured force values to follow the human guidance during the task. Recorded force and motion patterns are then used to learn a Hidden Markov Model (HMM) which can predict the human's next action, and over time the robot learns to take over a more active role in the interaction. Kulvicius et al. [KBA+13] also address a transportation task where the two agents are modeled as two point particles coupled by a spring. The forces applied by the other agent tell the robot how to adapt its own trajectory.

Although graphical models and HMMs have a history of successful applications for action and intention recognition at a discretized symbolic level, the generation of the commands that determine the interaction dynamics at the low level is usually addressed by a different representation, e.g., a lower-level HMM [LON10] or DMPs [GNIU14]. This need is clear in the work of [LBA], where Petri nets are used to control the state of the actions in a higher-level representation while the actions themselves are predicted by partial least-squares regression. As shown in [MEL+14], the principal distinction of our method in relation to the vast bodies of work on discrete graphical models and continuous interaction dynamics is that it provides solutions to both problems with a single representation that fully correlates the human-robot interaction and that can be easily acquired by imitation learning.

References

[BAVE+13] Heni Ben Amor, David Vogt, Marco Ewerton, Erik Berger, Bernhard Jung, and Jan Peters. Learning responsive robot behavior by imitation. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3257–3264, 2013.

[BKP14] S. Brandl, O. Kroemer, and J. Peters. Generalizing manipulations between objects using warped parameters. In Proceedings of the International Conference on Humanoid Robots (HUMANOIDS), 2014.

[BML+15] Andrea Baisero, Yoan Mollard, Manuel Lopes, Marc Toussaint, and Ingo Lutkebohle. Temporal segmentation of pair-wise interaction phases in sequential manipulation demonstrations. In Submitted to: Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015.

[ENL+15] M. Ewerton, G. Neumann, R. Lioutikov, H. Ben Amor, J. Peters, and G. Maeda. Learning multiple collaborative tasks with a mixture of interaction primitives. In Submitted to: Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015.

[GNIU14] A. Gams, B. Nemec, A. J. Ijspeert, and A. Ude. Coupling movement primitives: Interaction with the environment and bimanual tasks. IEEE Transactions on Robotics, 30(4):816–830, Aug 2014.

[KBA+13] Tomas Kulvicius, Martin Biehl, Mohamad Javad Aein, Minija Tamosiunaite, and Florentin Worgotter. Interaction learning for dynamic movement primitives used in cooperative robotic tasks. Robotics and Autonomous Systems, 61(12):1450–1459, 2013.

[KS13] Hema Swetha Koppula and Ashutosh Saxena. Anticipating human activities using object affordances for reactive robotic response. In Robotics: Science and Systems, 2013.

[LBA] Baldin Llorens-Bonilla and H. Harry Asada. A robot on the shoulder: Coordinated human-wearable robot control using coloured petri nets and partial least squares predictions.

[LMLH12] M. Lawitzky, J. R. Medina, Dongheui Lee, and S. Hirche. Feedback motion planning and learning from demonstration in physical robotic assistance: differences and synergies. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 3646–3652, Oct 2012.




[LMPN15] R. Lioutikov, G. Maeda, J. Peters, and G. Neumann. A parser for constructing movement primitive libraries. In preparation, 2015.

[LON10] Dongheui Lee, Christian Ott, and Yoshihiko Nakamura. Mimetic communication model with compliant physical contact in human-humanoid interaction. The International Journal of Robotics Research, 29(13):1684–1704, 2010.

[MEL+14] G. Maeda, M. Ewerton, R. Lioutikov, H. Ben Amor, J. Peters, and G. Neumann. Learning interaction for collaborative tasks with probabilistic movement primitives. In Proceedings of the International Conference on Humanoid Robots (HUMANOIDS), 2014.

[MMRL15] Yoan Mollard, Thibaut Munzer, Pierre Rouanet, and Manuel Lopes. Learning and representing object assembly tasks. Under preparation, 2015.

[MPG+15] Thibaut Munzer, Bilal Piot, Mathieu Geist, Olivier Pietquin, and Manuel Lopes. Inverse reinforcement learning in relational domains. Submitted to AAAI'15, 2015.

[ORP00] Nuria M. Oliver, Barbara Rosario, and Alex P. Pentland. A Bayesian computer vision system for modeling human interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):831–843, 2000.

[RCC+13] L. Rozo, S. Calinon, D. G. Caldwell, P. Jimenez, and C. Torras. Learning collaborative impedance-based robot behaviors. In AAAI Conference on Artificial Intelligence, Bellevue, Washington, USA, 2013.

[TKSK12] Yasufumi Tanaka, Jun Kinugawa, Yusuke Sugahara, and Kazuhiro Kosuge. Motion planning with worker's trajectory prediction for assembly task partner robot. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 1525–1532. IEEE, 2012.

[WMD+13] Z. Wang, K. Muelling, M. P. Deisenroth, H. Ben Amor, D. Vogt, B. Schoelkopf, and J. Peters. Probabilistic movement modeling for intention inference in human-robot interaction. (7):841–858, 2013.

2 Annexes

• Learning Interaction for Collaborative Tasks with Probabilistic Movement Primitives.
Maeda, G.; Ewerton, M.; Lioutikov, R.; Amor, H.; Peters, J.; Neumann, G.
Accepted for publication: Proceedings of the International Conference on Humanoid Robots (HUMANOIDS), 2014

This paper introduces the use of Probabilistic Movement Primitives (ProMPs) for interaction. This paper uses the TUDa-based experimental platform as a third hand to perform the assembly of a toolbox.

Abstract: This paper proposes the formulation of a probabilistic interaction model based on movement primitives for robots that work in collaboration with a human partner. Since the human partner can execute a variety of unforeseen tasks, a requirement of our system is that the assistant robot must be able to adapt and learn new skills on-demand, without the need of an expert programmer. Thus, this paper leverages on the framework of imitation learning and also on its application to human-robot interaction using the recently introduced concept of Interaction Primitives (IPs). We present the use of Probabilistic Movement Primitives (ProMPs) as an interaction framework that both recognizes the action of a human and generates the appropriate movement primitive of the assistant robot. We evaluate our method on experiments using a lightweight arm interacting with a human partner and also using motion capture trajectories of two humans assembling a box. The advantages of ProMPs in relation to the original formulation for interaction are exposed.

• Learning Multiple Collaborative Tasks with a Mixture of Interaction Primitives.
Ewerton, M.; Neumann, G.; Lioutikov, R.; Amor, H.; Peters, J.; Maeda, G.
Submitted to: 2015 IEEE International Conference on Robotics and Automation (ICRA)

This paper proposes a mixture of interaction primitives to learn different collaborative tasks from unlabeled training data. This paper uses the TUDa-based experimental platform as a third hand to perform the assembly of a toolbox.

Abstract: Learning human-robot interaction from demonstration presents a number of open issues. This paper addresses the problem of learning interaction from unlabeled demonstrations. Our proposed method builds upon the framework of Interaction Primitives to encode the parameters that describe human-robot interaction in a lower dimensional space. Due to the fact that one single Interaction Primitive does not apply across different interaction scenarios, it is usually assumed that the training data is labeled and one model is learned for each label. Our contribution is to propose unsupervised learning of models of different human-robot interactions. This work endows Interaction Primitives with the capability of performing clustering and conditioning on non-linear distributions using Gaussian Mixture Models (GMMs). We validate our algorithm by assembling a box with the help of a KUKA lightweight arm. We record the movements of a human using a motion capture system and the movements of the robot, while being driven by kinesthetic teaching. Given a number of unlabeled demonstrations of different interactions, our algorithm learns a number of probabilistic interaction models. In the test phase, we apply conditioning to infer the reaction of the robot given the movement of the human.

• Generalizing Manipulations Between Objects using Warped Parameters.
Brandl, S.; Kroemer, O.; Peters, J.
Proceedings of the International Conference on Humanoid Robots (HUMANOIDS), 2014

This paper proposes grasping generalization by extracting parameters of point-cloud data in the form of warping functions. The method is evaluated with a pouring task using the 3rdHand experimental platform at TU Darmstadt.




Abstract: One of the key challenges for learning manipulation skills is generalizing between different objects. The robot should adapt both its actions and the task constraints to the geometry of the object being manipulated. In this paper, we propose computing geometric parameters of novel objects by warping known objects to match their shape. We refer to the parameters computed in this manner as warped parameters, as they are defined as functions of the warped object's point cloud. The warped parameters form the basis of the features for the motor skill learning process, and they are used to generalize between different objects. The proposed method was successfully evaluated on a pouring task both in simulation and on a real robot.



Learning Interaction for Collaborative Tasks with Probabilistic Movement Primitives

Guilherme Maeda1, Marco Ewerton1, Rudolf Lioutikov1, Heni Ben Amor2, Jan Peters1,3, Gerhard Neumann1

Abstract— This paper proposes a probabilistic framework based on movement primitives for robots that work in collaboration with a human coworker. Since the human coworker can execute a variety of unforeseen tasks a requirement of our system is that the robot assistant must be able to adapt and learn new skills on-demand, without the need of an expert programmer. Thus, this paper leverages on the framework of imitation learning and its application to human-robot interaction using the concept of Interaction Primitives (IPs). We introduce the use of Probabilistic Movement Primitives (ProMPs) to devise an interaction method that both recognizes the action of a human and generates the appropriate movement primitive of the robot assistant. We evaluate our method on experiments using a lightweight arm interacting with a human partner and also using motion capture trajectories of two humans assembling a box. The advantages of ProMPs in relation to the original formulation for interaction are exposed and compared.

I. INTRODUCTION

While the traditional use of robots is to replace humans in dangerous and repetitive tasks, we motivate this paper by semi-autonomous robots that assist humans. Semi-autonomous robots have the ability to physically interact with the human in order to achieve a task in a collaborative manner. The assembly of products in factories, the aiding of the elderly at home, the control of actuated prosthetics, and the shared control in tele-operated repetitive processes are just a few examples of application.

Only recently, physical human-robot interaction became possible due to advances in robot design and safe, compliant control. As a consequence, algorithms for collaborative robots are still in the early stages of development. Assistance poses a variety of challenges related to the human presence. For example, Fig. 1 illustrates a robot assistant that helps a human to assemble a box. The robot must not only predict what is the most probable action to be executed based on the observations of the worker (to hand over a screw driver or to hold the box), but also the robot movement must be coordinated with the worker movement. Pre-programming a robot for all possible tasks that a worker may eventually need assistance with is unfeasible. A robot assistant must be able

1 Intelligent Autonomous Systems Lab, Technische Universitaet Darmstadt, 64289 Darmstadt, Germany. Correspondence should be addressed to [email protected]

2 Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30332-0280, USA.

3 Max Planck Institute for Intelligent Systems, Spemannstr. 38, 72076 Tuebingen, Germany

Fig. 1. Illustration of two collaborative tasks where a semi-autonomous robot helps a worker assembling a box. The robot must predict what is the action to execute, to hand over the screw driver or to hold the box. Its movement must also be coordinated relative to the location at which the human worker executes the task.

to learn the interaction and to adapt to a variety of unforeseen tasks without the need of an expert programmer.

Motivated by the described scenario, this work proposes the use of imitation learning [1] in the context of collaboration. Imitation learning has been widely used as a method to overcome the expensive programming of autonomous robots. Only recently, however, its application to physical interaction has been introduced under the concept of Interaction Primitives (IP) by Lee et al. in [2], defined as skills that allow robots to engage in collaborative activities with a human partner by Ben Amor et al. in [3].

Leveraging on the framework of [3], our approach is based on probabilistically modeling the interaction using a distribution of observed trajectories. We propose using Probabilistic Movement Primitives (ProMPs) [4] for modeling such a distribution. In a manufacturing scenario such a distribution of trajectories can be obtained by observing how two co-workers assemble a product, several times throughout the day, providing a rich data set for imitation learning. Such a collection of trajectories is used to create a prior model of the interaction in a lower dimensional weight space. The model is then used to recognize the intention of the observed agent and to generate the movement primitive of the unobserved agent given the same observations. The movement primitive of the unobserved agent can then be used to control a robot assistant.

The main contribution of this paper is the introduction of the Probabilistic Movement Primitives [4] in the context of imitation learning for human-robot interaction and action recognition. We will show how Interaction ProMPs can be applied to address the three main problems previously illustrated in Fig. 1, that is: (a) learning a collaborative model by imitation learning and thus avoiding expert programming, (b) the ability to recognize a task by observing the worker, and (c) the coordination of the assistant movement in relation to the worker movement. We also show the advantages of ProMPs over the original DMP-based framework [3], and present an algorithm for aligning data using local optimization in order to avoid the issue of slope constraints typical of dynamic time warping.

II. RELATED WORK

The data-driven analysis and classification of interactions between multiple persons has long been addressed within the computer vision community. In particular visual surveillance tasks, e.g., tracking of multiple pedestrians, require methods for identifying the occurrence and type of person-to-person interactions. In a seminal paper, Oliver et al. [5] show that hidden Markov models (HMMs), and more generally graphical models, are suited for representing the mutual dependencies of the behaviors between interacting agents. Graphical models have gained popularity in the field of human-robot interaction as they naturally include temporal information into the inference process and the Bayesian semantics provides a simple way to encode prior knowledge. In [6], Lee et al. use a hierarchical HMM to learn and represent responsive robot behaviors. In their approach, a high-level HMM identifies the current state of the interaction and triggers low-level HMMs which correspond to the robot's motor primitives. In order to ensure that the robot adapts to the movement of the human partner, virtual springs are attached between markers on the human body and corresponding positions on the robot. In a similar vein, Ben Amor et al. [7] use a path-map HMM to model interactions in cooperative tasks. In their approach, a backbone of shared hidden states correlates the actions of the interacting agents.

Tanaka et al. [8] use a Markov model to predict the positions of a worker in an assembly line. The space in which the worker moves is discretized into different regions. A Gaussian mixture model relates positions to procedures. Using this information a robot, then, delivers tools and parts to a human worker along the assembly line. Besides HMMs, other probabilistic graphical models have also been used to address interaction tasks. Koppula et al. [9] use a conditional random field with sub-activities, human poses, object affordances and object locations over time. Inference on the graphical model allows a robot to anticipate human activity and choose a corresponding, preprogrammed robot response. Wang et al. [10] propose the intention-driven dynamics model, which models human intentions as latent states in a graphical model. Intentions can be modeled as discrete variables, e.g., action labels, or continuous variables, e.g., an object's final position. The transitions between latent states and the mapping from latent states to observations are modeled via Gaussian Processes. As evidenced by these works, graphical models can be very powerful tools in classifying interactions. However, this often requires a substantial set of training data. In particular for humanoid motion generation with many degrees-of-freedom, it is often challenging to acquire sufficiently large and general data sets.

For more efficient learning and generalization, various authors investigated the projection of the original trajectories into a new, low-dimensional space where correlations between the agents are easier to unravel. Llorens et al. [11] show how such a low-dimensional interaction space can be used to implement an assistive robot arm. Similarly in [7], probabilistic principal component analysis is used to find a shared latent space. Dynamic Movement Primitives (DMPs) [12] allow for a low-dimensional, adaptive representation of a trajectory. The general idea is to encode a recorded trajectory as a dynamical system, which can be used to generate different variations of the original movement. In the context of interaction, Prada et al. [13] present a modified version of DMPs that adapts the trajectory of one agent to a time-varying goal. By setting the goal to the wrist of another agent, the method can be used to generate handover motions.

Although graphical models and HMMs have been successfully used for action and intention recognition at a discretized symbolic level, the generation of trajectories for the continuous dynamic control of the robot is usually addressed by a different level of representation (e.g. a lower-level HMM [6] or DMPs). In relation to the previously cited works, here, we propose a framework based solely on a continuous movement representation that is used to both recognize actions and generate trajectories in the form of movement primitives, mainly leveraging on DMP-based Interaction Primitives [3] and Probabilistic Movement Primitives (ProMPs) [4]. By using ProMPs rather than DMPs, our proposed method naturally correlates different agents directly in the same space in which observations are made, since observations of a task are usually given by their trajectories. This is an advantage in relation to the original framework of [3], since the representation of collaboration in the space of accelerations/forces due to the use of DMPs obfuscates the algorithm and increases its sensitivity to noise in the observations.

III. PROPOSED METHOD

This section briefly introduces Probabilistic Movement Primitives for a single degree of freedom as presented in [4] and proposes its extension for interaction and collaboration. Although not covered in this work, in its original proposition the design of a feedback controller that tracks the distribution of trajectories is also part of ProMPs, and the interested reader is referred to [4] for details; here we assume the existence of a human-safe standard feedback controller such as a low-gain PD controller. This section also exposes the main characteristics of the interaction framework based on DMPs in [3] and its relation to the approach of this paper. Finally, a simple local optimization algorithm is proposed for aligning several demonstrations provided by a human.


A. ProMPs for a Single DOF

For the purposes of the following derivations we generically refer to each joint or Cartesian coordinate of a human or robot simply as a degree of freedom (DOF) with position q and velocity q̇. Starting with the case of a single DOF, we denote y(t) = [q(t), q̇(t)]^T and a trajectory as a sequence τ = {y(t)}, t = 0, ..., T. We adopt linear regression with n Gaussian basis functions ψ. The state vector y(t) can then be represented by an n-dimensional column vector of weights w as

y(t) = [q(t), q̇(t)]^T = [ψ(t), ψ̇(t)]^T w + ε_y,    (1)

where Ψ_t = [ψ(t), ψ̇(t)]^T is a 2×n dimensional time-dependent basis matrix and ε_y ∼ N(0, Σ_y) is zero-mean i.i.d. Gaussian noise. The probability of observing the whole trajectory is then

p(τ | w) = ∏_{t=0}^{T} N(y(t) | Ψ_t w, Σ_y).    (2)

Similar to DMPs, the speed of the execution of the movement is decoupled from the speed of the original trajectory by using a phase variable z(t). The phase variable replaces the time in order to control the location of the basis functions with ψ(z(t)). For simplicity we will use z(t) = t such that ψ(t) = ψ(z(t)), while remembering that any monotonically increasing function can be used [4].

Each trajectory is now represented by a low-dimensional vector w since the number of basis functions is usually much smaller than the number of time steps. Trajectory variations obtained by different demonstrations are captured by defining the distribution over the weights p(w | θ), where θ is the learning parameter. The probability of the trajectory becomes

p(τ | θ) = ∫ p(τ | w) p(w | θ) dw.    (3)

So far θ captures the correlation among the weights within the trajectory and between demonstrations of the same DOF.
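As a concrete illustration of Eqs. (1)-(3), the following minimal Python sketch fits the weight vector w of a single DOF with ridge-regularized least squares and estimates the weight distribution from several demonstrations. The basis width, the number of basis functions, the regularization constant, and the toy sine-wave demonstrations are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

def gaussian_basis(t, n_basis, width=0.02):
    """Normalized Gaussian basis functions evaluated at phase values t in [0, 1]."""
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-0.5 * (t[:, None] - centers[None, :]) ** 2 / width)
    return phi / phi.sum(axis=1, keepdims=True)          # shape (T, n_basis)

def fit_weights(q, n_basis=20, reg=1e-6):
    """Project one demonstrated position trajectory q onto the weight space (Eqs. 1-2)."""
    t = np.linspace(0.0, 1.0, len(q))                    # phase variable z(t) = t
    Phi = gaussian_basis(t, n_basis)
    A = Phi.T @ Phi + reg * np.eye(n_basis)              # ridge-regularized normal equations
    return np.linalg.solve(A, Phi.T @ q)                 # weight vector w, shape (n_basis,)

# Distribution over weights p(w | theta) estimated from D demonstrations of the same DOF (Eq. 3)
demos = [np.sin(np.linspace(0, np.pi, 100)) + 0.01 * np.random.randn(100) for _ in range(10)]
W = np.stack([fit_weights(q) for q in demos])            # shape (D, n_basis)
mu_w, Sigma_w = W.mean(axis=0), np.cov(W, rowvar=False)
```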

B. ProMPs for Collaboration

The key aspect for the realization of the interaction primitives is the introduction of a parameter θ that captures the correlation of all DOFs of multiple agents. Assuming that the distribution of trajectories of different agents is normal, then p(w; θ) = N(w | µ_w, Σ_w). Under this assumption we redefine the vector of weights w to account for all degrees of freedom of multiple agents. Following the definitions in [3] we will refer to the assisted human as the observed agent, and assume that he/she provides the observed DOFs of the model (e.g., by motion capture). The robot will be referred to as the controlled agent.

For an observed agent with P DOFs and a controlled agent with Q DOFs, we construct a row weight vector by concatenating the trajectory weights

w_d = { [w_1^T, ..., w_p^T, ..., w_P^T]^o, [w_1^T, ..., w_q^T, ..., w_Q^T]^c },    (4)

where w_d is the augmented weight vector corresponding to the d-th demonstration, w_p is the n-dimensional column vector of weights of the p-th DOF of the observed agent, and w_q is the vector of weights of the q-th DOF of the controlled agent. The mean and covariance are then computed by stacking all demonstration weights

µ_w = mean([w_1, ..., w_d, ..., w_D]^T),
Σ_w = Cov([w_1, ..., w_d, ..., w_D]^T),    (5)

where D is the number of demonstrations.
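A minimal sketch of how the augmented weight vector of Eq. (4) and the prior of Eq. (5) could be assembled, reusing the fit_weights helper from the previous sketch; the toy demonstration data, the numbers of DOFs, and the number of basis functions are placeholders and not taken from the paper.

```python
import numpy as np

def augmented_weights(human_traj, robot_traj, n_basis=20):
    """Concatenate the per-DOF weight vectors of the observed (human, P DOFs) and the
    controlled (robot, Q DOFs) agent into one augmented vector w_d, as in Eq. (4)."""
    w_obs = [fit_weights(human_traj[:, p], n_basis) for p in range(human_traj.shape[1])]
    w_ctl = [fit_weights(robot_traj[:, q], n_basis) for q in range(robot_traj.shape[1])]
    return np.concatenate(w_obs + w_ctl)                 # shape ((P + Q) * n_basis,)

# Toy data: D = 12 pairs of time-aligned demonstrations, P = 3 human DOFs, Q = 7 robot DOFs
rng = np.random.default_rng(0)
human_demos = [rng.standard_normal((100, 3)).cumsum(axis=0) for _ in range(12)]
robot_demos = [rng.standard_normal((100, 7)).cumsum(axis=0) for _ in range(12)]

# Stack the augmented weights of all demonstrations and compute the prior of Eq. (5)
W = np.stack([augmented_weights(h, r) for h, r in zip(human_demos, robot_demos)])
mu_w = W.mean(axis=0)                                    # mean of the augmented weights
Sigma_w = np.cov(W, rowvar=False)                        # covariance across all agents' DOFs
# (in practice a small diagonal term is often added when D is smaller than the weight dimension)
```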

Gaussian conditioning can then be applied on-line as each new observation is made, using recursive updates of the form

µ_w^+ = µ_w^- + K (y*(t) − H_t^T µ_w^-),
Σ_w^+ = Σ_w^- − K (H_t^T Σ_w^-),
K = Σ_w^- H_t (Σ_y* + H_t^T Σ_w^- H_t)^{-1},    (6)

where K is the Kalman gain matrix, y*(t) is the observed value at time t, Σ_y* is the measurement noise, and the superscripts − and + indicate the values before and after the update. The observation matrix H_t is block diagonal and each diagonal entry contains the 2×n basis [ψ(t), ψ̇(t)]^T for each observed joint,

H_t = diag(Ψ_t, ..., Ψ_t).    (7)

In the collaboration case only measurements of the observed agent are provided. By maintaining consistency with definition (4), where the entries of the observed agent come before the controlled agent, the mean is then µ_w = [µ_w^o, µ_w^c]^T and the observation matrix H_t is partitioned as

H_t = diag((Ψ_t^o)_(1,1), ..., (Ψ_t^o)_(P,P), 0^c_(1,1), ..., 0^c_(Q,Q)),    (8)

where each zero entry is of dimension 2×n. Note that if only positions of the observed agent are provided, (Ψ_t^o)_(p,p) = [ψ(t), 0]^T.

In general, since (6) is a full-state linear estimator, any partial combination of observations (for example when y* only contains positions, or velocities, or a mixture of both) provides the optimal estimate of the states µ_w^+ and their uncertainty Σ_w^+.
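The conditioning step of Eqs. (6)-(8) can be sketched as follows, building on the prior (mu_w, Sigma_w) and the gaussian_basis helper from the earlier sketches. The sketch assumes position-only observations of the observed agent, an isotropic measurement noise sigma_y, and writes the observation matrix H so that H w yields the predicted observations (a transpose convention that differs superficially from the H_t^T notation above); all names and values are illustrative.

```python
import numpy as np

def condition_on_positions(mu_w, Sigma_w, psi_t, y_obs, obs_dofs, n_dofs, sigma_y=1e-4):
    """One recursive update of the weight distribution (Eq. 6), observing only the
    positions of the DOFs listed in obs_dofs (the observed agent); the remaining
    blocks of the observation matrix are zero, as in Eq. (8).
    psi_t: basis row of length n_basis at time t; y_obs: observed positions."""
    n_basis = len(psi_t)
    H = np.zeros((len(obs_dofs), n_dofs * n_basis))
    for row, d in enumerate(obs_dofs):
        H[row, d * n_basis:(d + 1) * n_basis] = psi_t    # position block for DOF d
    S = sigma_y * np.eye(len(obs_dofs)) + H @ Sigma_w @ H.T
    K = Sigma_w @ H.T @ np.linalg.inv(S)                 # Kalman gain
    mu_new = mu_w + K @ (y_obs - H @ mu_w)
    Sigma_new = Sigma_w - K @ H @ Sigma_w
    return mu_new, Sigma_new

# Example: condition the prior from the previous sketch on the human's 3 wrist coordinates
psi_t = gaussian_basis(np.array([0.25]), 20)[0]          # basis row at phase t = 0.25
mu_c, Sigma_c = condition_on_positions(mu_w, Sigma_w, psi_t,
                                       y_obs=np.array([0.1, -0.2, 0.3]),
                                       obs_dofs=[0, 1, 2], n_dofs=10)
```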

C. Action Recognition for Primitive Activation

Here we use the ProMP framework in a multi-task scenario where each task is encoded by one interaction primitive.

Consider a specific task s ∈ {1, ..., K} and assume that for each task an Interaction ProMP has been generated as proposed in Section III-B. Using the recursive notation of (6), the superscript (·)^- refers to the parameters of the Interaction ProMP updated up to the previous observation, that is θ_s^- = {µ_w^-, Σ_w^-}_s. The probability of the observation at a subsequent time t given the parameters θ_s^- of one of the tasks is

p(y*(t); θ_s^-) = ∫ p(y*(t) | Ψ_t w, Σ_y*) p(w | θ_s^-) dw    (9)
              = N(y*(t) | Ψ_t µ_w^-, Ψ_t Σ_w^- Ψ_t^T + Σ_y*).    (10)

The task s can now be recognized by applying Bayes' rule

p(s | y*(t)) = p(y*(t) | θ_s^-) p(s) / Σ_{k=1}^{K} p(y*(t) | θ_k^-) p(k),    (11)

where p(s) is the initial probability of the task (e.g., p(s) = 1/K for a uniform distribution). We will evaluate Eqs. (9)-(11) using real collaboration data in the experimental section of this paper.
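A sketch of the task recognition rule of Eqs. (9)-(11), under the same observation convention as the conditioning sketch above; the Gaussian likelihood is evaluated with scipy, and the noise level and the prior over tasks are assumed, not taken from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def task_posterior(y_obs, H, tasks, sigma_y=1e-4, prior=None):
    """Posterior over tasks (Eq. 11). tasks is a list of (mu_w, Sigma_w) pairs, one
    Interaction ProMP per task; H maps weights to the currently observed quantities."""
    K = len(tasks)
    prior = np.full(K, 1.0 / K) if prior is None else np.asarray(prior)
    likelihood = np.empty(K)
    for k, (mu_w_k, Sigma_w_k) in enumerate(tasks):
        mean = H @ mu_w_k                                          # Eq. (10): marginal mean
        cov = H @ Sigma_w_k @ H.T + sigma_y * np.eye(len(y_obs))   # marginal covariance
        likelihood[k] = multivariate_normal.pdf(y_obs, mean=mean, cov=cov)
    posterior = likelihood * prior                                 # Bayes' rule, Eq. (11)
    return posterior / posterior.sum()
```

The returned vector corresponds to p(s | y*(t)); in a multi-task library the primitive of the most probable task would then be activated.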

D. Relation to Interaction DMPs

It is now straightforward to relate our proposed method with the previous interaction primitives based on DMPs [3]. The principal difference is that in the framework of interaction DMPs the weights are mapped from the forcing function f(t) as opposed to the positions q(t), using the linear basis-function model

f(t) = ψ(t)^T w,    (12)

where ψ(t) are the normalized Gaussian basis functions. Similarly to the ProMP case, a distribution of weights p(w) is learned based on several demonstrations of a task.

For each DOF, the forcing function adds a nonlinear behavior to the movement which complements a linear and stable spring-damper system

q̈ = [α_y (β_y (g − q) − q̇/τ) + f(t)] τ²,    (13)

where g is the goal attractor, α_y, β_y are user-defined parameters that characterize the spring-damper behavior, and τ controls the speed of execution. For details on DMPs please refer to [12] and references therein.

When using imitation learning a demonstration is executed and measurements are usually given in the form of positions, which must be differentiated twice such that the forcing function can be computed

f(t) = q̈/τ² − α_y (β_y (g − q) − q̇/τ).    (14)

Referring back to (6), the Gaussian conditioning is now based on the observation of forces or accelerations, that is y*(t) = f(q, (·), t)*. As our evaluations will show, the fact that forces are usually computed using second derivatives of the position can be restrictive for applications with asynchronous or sparse measurements, as the observed accelerations needed for conditioning are hard to obtain in this case. In contrast, in the ProMP framework it is possible to directly condition on the observed quantities, i.e., the position of the agent.
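The noise sensitivity discussed above can be made concrete with a small numerical sketch that recovers the forcing term of Eq. (14) from positions via finite differences; the gain values alpha_y and beta_y, the sampling time, and the noise level are illustrative and not the ones used in the paper.

```python
import numpy as np

def forcing_from_positions(q, dt, alpha_y=25.0, beta_y=6.25, tau=1.0):
    """Recover the DMP forcing term from a recorded position trajectory (Eq. 14).
    The double numerical differentiation is what amplifies measurement noise."""
    qd = np.gradient(q, dt)                       # first derivative
    qdd = np.gradient(qd, dt)                     # second derivative (noise-sensitive)
    g = q[-1]                                     # goal attractor taken as the final position
    return qdd / tau**2 - alpha_y * (beta_y * (g - q) - qd / tau)

dt = 0.01
t = np.arange(0.0, 3.0, dt)
clean = np.sin(t)
noisy = clean + np.sqrt(0.04) * np.random.randn(len(t))
f_clean = forcing_from_positions(clean, dt)
f_noisy = forcing_from_positions(noisy, dt)
print(np.std(f_noisy - f_clean))                  # error blow-up caused by differentiation
```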

E. Time Warping with Local Optimization

One issue of imitation learning for trajectories is that multiple demonstrations provided by humans are usually, sometimes severely, warped in time. Demonstrations must be unwarped or time-aligned before the distribution of the weights can be computed. Here we propose aligning trajectories by taking one of the demonstrations as a reference y_r and using local optimization of the time warping function with

t_w^{j+1} = v_0^j + g(v^j) t_w^j,    (15)

where t_w^j represents a vector containing the warped time of demonstration y_w at the j-th iteration of the optimization.

We propose g as a smooth, linear Gaussian-basis model with P weights v^j = [v_1^j, ..., v_P^j] as the parameters to be optimized. The extra parameter v_0^j is used to shift the time, which is useful when the reference and warped trajectories are, in fact, identical but start at different instants. The optimization is initialized with v_0^j = 0 and t_w^j = t_r for j = 1. The parameters v^j are optimized with gradient descent to decrease the absolute cumulative distance between the reference and warped trajectories

v = argmin_v Σ_{k=0}^{K} | y_r(t_r(k)) − y_w(v_0^j + g(v^j) t_w^j) |.    (16)

While Dynamic Time Warping (DTW) [14] is widely used for such problems, our local method forces alignment without "jumping" the indexes of the warped time vector, which is a usual outcome of DTW and renders unrealistic and non-smooth trajectories. While this problem is usually minimized by imposing a slope constraint [14], the use of a smooth function g not only avoids the tuning of this parameter but also preserves the overall shape of the trajectory.
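A sketch of the local time-warping alignment of Eqs. (15)-(16), using a generic derivative-free optimizer (Nelder-Mead) in place of the gradient descent described above; the number of warp parameters, the basis width, and the toy signals are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def warp_cost(v, t_ref, y_ref, y_warp, centers, width=0.1):
    """Cumulative absolute distance of Eq. (16) for the smooth warp of Eq. (15):
    t_w = v0 + g(v) * t, with g a normalized Gaussian-basis model weighted by v[1:]."""
    phi = np.exp(-0.5 * (t_ref[:, None] - centers[None, :]) ** 2 / width)
    phi /= phi.sum(axis=1, keepdims=True)
    t_w = np.clip(v[0] + (phi @ v[1:]) * t_ref, t_ref[0], t_ref[-1])
    y_w = np.interp(t_w, t_ref, y_warp)                  # resample the warped demonstration
    return np.abs(y_ref - y_w).sum()

def align(t_ref, y_ref, y_warp, n_params=5):
    """Optimize the warp parameters v = [v0, v1, ..., vP] starting from the identity warp."""
    centers = np.linspace(t_ref[0], t_ref[-1], n_params)
    v0 = np.concatenate(([0.0], np.ones(n_params)))      # v0 = 0 and g(v) = 1 initially
    res = minimize(warp_cost, v0, args=(t_ref, y_ref, y_warp, centers), method="Nelder-Mead")
    return res.x

# Toy example: the second demonstration is a time-stretched copy of the reference
t = np.linspace(0.0, 1.0, 200)
y_ref = np.sin(2 * np.pi * t)
y_warp = np.sin(2 * np.pi * t ** 1.3)
v_opt = align(t, y_ref, y_warp)
```

Because the warp is a smooth function of time, the indexes of the warped time vector cannot "jump" as they can in DTW, which is the property motivating this local formulation.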

IV. EXPERIMENTS

This section presents results on a simple simulated scenario to compare the differences between the original work of Interaction DMPs and Interaction ProMPs. Next, we evaluate the accuracy of Interaction ProMPs for generating reference trajectories for an anthropomorphic robot arm conditioned on the movement of a human. Finally, we will show experimental results with Interaction ProMPs used in a collaborative scenario of a box assembly to both recognize and predict the action of two collaborators.

A. Comparison with Interaction DMPs

In a typical interaction task the observations of a coworker might arrive asynchronously, at irregular periods of time, for example, when the measurement signal is prone to interruption (a typical case is occlusion in motion capture systems). Fig. 2 (a) illustrates a simple case where both observed and controlled agents have a single joint each. The training data was created by sketching two sets of trajectories on a PC screen using a computer mouse. We then use these two sets as proxies of the observed and controlled agents, resulting in the initial distribution of trajectories (in blue). The upper

Page 14: D4.1: Scienti c report on Modeling and Prediction of Human ... · D4.1: Modeling and Prediction of Human Intent TUDa Contribution to the 3rdHand scenario In this deliverable we have

Fig. 2. Two scenarios where the Interaction ProMPs are advantageous over Interaction DMPs. (a) Sparse and asynchronous observations. (b) Noisy stream of observed data (σ² = 0.04). The patches represent the ±2σ deviations from the mean. [Plots of X amplitude vs. time for Agent A (observed) and Agent B (controlled), showing the initial distribution, the observations, and the current prediction.]

Fig. 3. Root-mean-square prediction error of the movement of collaborator B as a function of the number of observed samples of the trajectory of collaborator A. The different bars indicate the amount of noise added to the observation of the position of collaborator A (variances 0, 0.01, 0.02, 0.03, 0.04), for the Interaction DMP and the Interaction ProMP.

plot shows the prediction (in green) of the observed agent after observing four measurements. Note that following (4) the predicted mean µ_w^+ has the dimension of the augmented weight vector, that is, if each single-DOF agent trajectory is encoded by n basis functions, µ_w^+ is a column vector of size 2n. The bottom figure depicts the predicted distribution of the controlled agent. Note that the same experiment cannot be reproduced with Interaction DMPs, as the second derivative of such sparse measurements is hard to compute and introduces inaccuracies in the representation of the true force.

In Fig. 2 (b) the ProMP is being conditioned on a constant synchronous stream of noisy position measurements. The plot shows the case where the true trajectory is corrupted with Gaussian noise of variance σ² = 0.04. Interaction DMPs suffer from noisy position measurements as the observation must be differentiated twice to compute the forcing function. While low-pass filters alleviate this problem, the introduction of phase lag is an issue that can be potentially

Fig. 4. An interactive task where the robot has to point at the same position previously pointed at by the human. The robot, however, has no exteroceptive sensors and its predicted trajectory is based solely on the correlation with the observed human movement. (a) The nine positions used to create the Interaction ProMP (dot markers, training) and the extra nine positions used to test the method (cross markers, test). (b) An example where the human points at the test position #1 and the robot points to the same position.

avoided with ProMPs.

Fig. 3 compares the prediction error over the whole trajectory of Interaction DMPs and ProMPs given the same noisy observed data. With DMPs the error is greatly influenced by the amount of noise, while ProMPs show much less sensitivity. For the case where the full trajectory of collaborator A is observed (indicated by the arrow), the prediction error increased by a factor of five using the Interaction DMPs when noise ranged from a clean signal to a signal with noise variance 0.04. In contrast, the error deteriorates by a factor of two with Interaction ProMPs.

B. Robot Control with Interaction ProMPs

We evaluated the ability of Interaction ProMPs in generating the appropriate movement primitive for controlling a robot based on observations of a human partner. The experiment consisted in measuring the [x, y, z] trajectory coordinates of the wrist of an observed agent via motion capture¹ while pointing at a certain position on a table placed in front of our robot (a 7-DOF anthropomorphic arm with a 5-finger hand). Then, the robot was moved in gravity compensation mode to point with its index finger at the same position on the table while its joint positions were being recorded (kinesthetic teaching). This pair of

¹ All human positions were measured in relation to the world reference frame located at the torso of the robot (as shown in Fig. 4(b)).


Fig. 5. The upper row shows the human Cartesian coordinates of the wrist. The bottom row shows the first four joints of the 7-DOF robotic arm. (a) Conditioned results of the test position #6. (b) Conditioned results of the test position #8. [Plots of wrist coordinates x, y, z vs. time and robot joint angles vs. time, showing the initial distribution, the observed data, the prediction, and the predicted robot trajectory.]

trajectories formed by the Cartesian positions of the wrist and the joint positions of the arm were then mapped into the weight space and concatenated as in Eq. (4). In total, nine different positions were pointed at to collect training data, sparsely covering an approximate circular area of diameter 30 cm. The pointed positions are shown in Fig. 4(a) by the dots.

After creating the Interaction ProMPs, as described in Section III-B, we defined nine extra marked positions, shown in Fig. 4(a) by the crosses. The human then pointed at one of the crosses while motion capture was used to measure the trajectory of the wrist. These observations were then used to condition the Interaction ProMP to predict trajectories for each joint of the robot, whose mean values were used as reference for a standard trajectory tracking inverse dynamics feedback controller with low gains.
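For concreteness, a minimal sketch of how a conditioned mean could be decoded into joint reference trajectories for such a low-gain tracking controller, reusing the gaussian_basis helper from the earlier sketch; P, Q, the number of basis functions, and the trajectory length are placeholders, and the ordering follows Eq. (4), with the observed agent's weights first.

```python
import numpy as np

def robot_reference(mu_w_conditioned, P, Q, n_basis=20, n_steps=200):
    """Decode the robot (controlled agent) part of the conditioned augmented mean into
    per-joint reference trajectories by evaluating the basis model at each phase step."""
    t = np.linspace(0.0, 1.0, n_steps)
    Phi = gaussian_basis(t, n_basis)                     # (n_steps, n_basis)
    w_robot = mu_w_conditioned[P * n_basis:].reshape(Q, n_basis)
    return Phi @ w_robot.T                               # (n_steps, Q) joint references
```

The resulting mean trajectories are what would be handed as references to the low-gain feedback controller mentioned above.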

Fig. 4(b) shows one example of the interactive task where the human pointed to the position marked by the cross #1, which was not part of the training; the robot was capable of pointing to the same position. Note that the robot was not provided with any exteroceptive feedback, such as cameras, to reveal the location of the pointed position. Although the robot was not directly "aware" of the position of the pointed cross, the interaction primitive provides the robot the capability to predict what movement to make based solely on the observed trajectories of the partner.

Figure 5 shows two examples of the conditioned interaction primitives when the human pointed at positions #6 in (a) and #8 in (b) (refer back to Fig. 4(a) for the physical position of the crosses). The first row in each subplot shows the [x, y, z] coordinates of the wrist. The second row shows the first four joints of the robot, starting from the shoulder joint. Since we are only interested in the final pointing position, the interaction primitive was conditioned on the final measurements of the wrist position. As positions #6 and #8 were physically distant from each other, the difference between their predicted trajectories was quite large, roughly covering the whole span of the prior

Fig. 6. Accuracy of the positions pointed at by the robot when using the test positions given by the cross markers. The error was computed by taking the actual pointed position and the true position of the markers on the table. [Plot of XY pointing error (cm) for test numbers #1-#9.]

distribution (in blue) for certain DOFs of the arm.

Figure 6 shows the distance error on the plane of the table between the position pointed at by the robot and its true position. The robot was able to reasonably point even at locations at the limits of the training data such as positions #1, #7, and #8 (see Fig. 4). The maximum error was 3 cm, or 10% in relation to the total area covered by the training points (approximately a circle of diameter 30 cm). The experiments show that the physical movement of the robot is clearly conditioned by the position indicated by the human (see the accompanying video²).

Note that this is not a precision positioning experiment; the markers on the wrist were not fixed in a rigid, repeatable manner, nor could the finger of the robot be positioned with millimetric precision during the kinesthetic teaching phase. The framework of Interaction ProMPs allows, however, to seamlessly integrate additional sensing to increase accuracy in precision tasks. This is naturally achieved by adjusting the observation vector y* in (6) and the zero entries in (8) to include new sensory information such as the reference position of a hole in which the robot must insert a peg.

C. Action Recognition for Primitive Activation

While in the previous experiments interaction primitives were evaluated for the case of a single task, here we show

² also available from http://youtu.be/2Ok6KQQQDNQ


(a) Hand over (b) Fastening (c) Plate insertion

Fig. 7. Three collaborative tasks involved when assembling a box by two co-workers. From left to right, the photos show the hand over of a screw, the fastening of the screw where one agent grasps the screw driver while the other holds the box steadily, and the insertion of a plate, which requires one agent to flip the box such that the slot becomes accessible to the other agent. The distributions of aligned demonstrations for each task are shown under their respective photos. The plot shows the covariance in the x-y plane at each corresponding z height.

This capability allows the robot to maintain a library of several tasks encoded as Interaction ProMPs and to activate the appropriate primitive based on the observation of the current task.

As shown in the photos of Fig. 7, we collected collaborative data in the form of the Cartesian positions of the wrists of two humans assembling a box. As in the previous experiment of Section IV-B, all measurements were taken in relation to the torso of the robot. The collaborator on the right plays the role of the observed agent, while the collaborator on the left plays the role of the controlled agent. The controlled agent will be referred to as the predicted agent, since he/she cannot actually be controlled.

In the “hand over” task shown in Fig. 7(a), the observed agent stretches his hand as a gesture to request a screw. The predicted agent then grasps a screw sitting on the table and hands it over to the collaborator. In the “fastening” task shown in Fig. 7(b), the observed agent grasps an electric screwdriver. The predicted agent reacts by holding the box firmly while the observed agent fastens the screw. In the “plate insertion” task shown in Fig. 7(c), the observed agent grasps the bottom plate of the box. The predicted agent then flips the box such that the slots into which the plate slides are directed towards the observed agent.

Each task is repeated 40 times. All trajectories are aligned using the method described in Section III-E. The aligned trajectories are shown in Fig. 7 under their respective photos as three-dimensional plots for each of the tasks, where the covariance in the x-y directions is shown at the corresponding heights (z direction) of the movement.

[Figure 8 — two panels of trajectory plots, (a) Interaction model: fastening task and (b) Interaction model: hand over task, showing x, y and z (cm) against time (s); legend: observed, initial distribution, current prediction.]
Fig. 8. Action recognition based on conditioning the movement primitives of the observed agent. In this example the observations of the fastening task also overlap with the primitives of the hand over task.

Interaction ProMPs are created for each task using the distribution of aligned trajectories.

We evaluated action recognition using Eqs. (9)-(11) on the three presented tasks for box assembly. Fig. 8 shows one evaluation as an example. Note from the figure that the majority of observations indicate that the fastening task is taking place. The last five observations (surrounded by the ellipse), however, fit both tasks and could be a potential source of ambiguity in task recognition.


[Figure 9 — three panels plotting the probability/likelihood of each task (hand_over, fastening, plate_insertion) against the number of observations.]
Fig. 9. Action recognition given three different Interaction ProMPs, one for each task involved in assembling the box. The three Interaction ProMPs are conditioned on the same observations of the observed agent. Probabilities of tasks are shown as a function of the number of observations along the trajectory of the observed agent. (a) Recognition of the hand over task. (b) Recognition of the fastening task. (c) Recognition of the plate insertion task.

Even in those cases, ProMPs can clearly distinguish among tasks, as shown by the plots in Fig. 9, where the probabilities of the tasks are given as a function of the number of observations. Subplots (a), (b) and (c) show the task recognition for the fastening, hand over and plate insertion tasks, respectively. In general, we observed that 3-5 observations are required to achieve 100% certainty for each task. (The last part of the accompanying video shows our method controlling the robot assistant to assemble a box with recognition of two different handover tasks.)
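As a hedged illustration of this recognition step, the sketch below computes normalized task probabilities from a library of Interaction ProMPs by evaluating the marginal likelihood of the observed human positions under each task's Gaussian prior over weights. The names (TaskModel, obs_matrices, sigma_obs) are illustrative assumptions introduced here, not the deliverable's actual code; the observation matrices are assumed to map the full weight vector to the observed human DOFs, with zero columns for the robot weights.

# Minimal sketch of task recognition (cf. Fig. 9): the probability of a task
# is proportional to the marginal likelihood of the observed human positions
# under that task's Gaussian prior over the stacked human-robot weights.
import numpy as np
from scipy.stats import multivariate_normal


class TaskModel:
    """Gaussian prior over the stacked human-robot weight vector."""
    def __init__(self, name, mu_w, Sigma_w):
        self.name = name
        self.mu_w = mu_w          # (D,) prior mean of the weights
        self.Sigma_w = Sigma_w    # (D, D) prior covariance of the weights


def observation_log_likelihood(task, obs, obs_matrices, sigma_obs=1e-4):
    """log p(D | task) for a sequence of observed human positions y_t."""
    log_lik = 0.0
    for y_t, H_t in zip(obs, obs_matrices):
        mean = H_t @ task.mu_w
        cov = sigma_obs * np.eye(len(y_t)) + H_t @ task.Sigma_w @ H_t.T
        log_lik += multivariate_normal.logpdf(y_t, mean=mean, cov=cov)
    return log_lik


def task_probabilities(tasks, obs, obs_matrices):
    """Normalized task probabilities, assuming a uniform prior over tasks."""
    log_liks = np.array([observation_log_likelihood(t, obs, obs_matrices)
                         for t in tasks])
    log_liks -= log_liks.max()            # for numerical stability
    probs = np.exp(log_liks)
    return probs / probs.sum()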

V. CONCLUSION

This paper introduced a method for collaboration suited for new applications using semi-autonomous robots whose movements must be coordinated with the movements of a human partner. By leveraging the original framework of Interaction Primitives [3], we proposed the use of ProMPs for the realization of primitives that capture the correlation between trajectories of multiple agents. This work compared the main differences between DMPs and ProMPs for interaction and advocates the latter for applications where measurements are noisy and/or prone to interruption. Using a 7-DOF lightweight arm, we evaluated the capability of Interaction ProMPs in generating the appropriate primitive for controlling the robot in an interactive task involving a human partner. We also proposed a method for task recognition that naturally fits the ProMP framework.

Our current work addresses the use of mixture models to automatically generate different Interaction ProMPs without a priori hand labeling of multiple tasks. We are also investigating tasks in which some of the involved DOFs do not correlate linearly, and also tasks that do not induce correlation at all. The latter is especially true for tasks where the movements of the agents are not related by causality.

VI. ACKNOWLEDGMENTS

The research leading to these results has received funding from the European Community's Seventh Framework Programmes (FP7-ICT-2013-10) under grant agreement 610878 (3rdHand) and (FP7-ICT-2009-6) under grant agreement 270327 (ComPLACS). The authors would like to acknowledge Filipe Veiga, Tucker Hermans and Serena Ivaldi for their assistance during the preparation of this manuscript.

REFERENCES

[1] S. Schaal, “Is imitation learning the route to humanoid robots?” Trends in Cognitive Sciences, vol. 3, no. 6, pp. 233–242, 1999.

[2] D. Lee, C. Ott, and Y. Nakamura, “Mimetic communication model with compliant physical contact in human-humanoid interaction,” The International Journal of Robotics Research, vol. 29, no. 13, pp. 1684–1704, 2010.

[3] H. Ben Amor, G. Neumann, S. Kamthe, O. Kroemer, and J. Peters, “Interaction primitives for human-robot cooperation tasks,” in Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014.

[4] A. Paraschos, C. Daniel, J. Peters, and G. Neumann, “Probabilistic movement primitives,” in Advances in Neural Information Processing Systems (NIPS), 2013, pp. 2616–2624.

[5] N. Oliver, B. Rosario, and A. Pentland, “A Bayesian computer vision system for modeling human interactions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 831–843, Aug. 2000.

[6] D. Lee, C. Ott, and Y. Nakamura, “Mimetic communication model with compliant physical contact in human-humanoid interaction,” International Journal of Robotics Research, vol. 29, no. 13, pp. 1684–1704, Nov. 2010.

[7] H. Ben Amor, D. Vogt, M. Ewerton, E. Berger, B. Jung, and J. Peters, “Learning responsive robot behavior by imitation,” in Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013, pp. 3257–3264.

[8] Y. Tanaka, J. Kinugawa, Y. Sugahara, and K. Kosuge, “Motion planning with worker's trajectory prediction for assembly task partner robot,” in Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012, pp. 1525–1532.

[9] H. S. Koppula and A. Saxena, “Anticipating human activities using object affordances for reactive robotic response,” in Robotics: Science and Systems, 2013.

[10] Z. Wang, K. Mülling, M. P. Deisenroth, H. B. Amor, D. Vogt, B. Schölkopf, and J. Peters, “Probabilistic movement modeling for intention inference in human-robot interaction,” The International Journal of Robotics Research, vol. 32, no. 7, pp. 841–858, 2013.

[11] B. Llorens-Bonilla and H. H. Asada, “A robot on the shoulder: Coordinated human-wearable robot control using coloured Petri nets and partial least squares predictions,” in Proceedings of the 2014 IEEE International Conference on Robotics and Automation, 2014.

[12] A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal, “Dynamical movement primitives: learning attractor models for motor behaviors,” Neural Computation, vol. 25, no. 2, pp. 328–373, 2013.

[13] M. Prada, A. Remazeilles, A. Koene, and S. Endo, “Dynamic movement primitives for human-robot interaction: comparison with human behavioral observation,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013, pp. 1168–1175.

[14] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26, no. 1, pp. 43–49, 1978.


Learning Multiple Collaborative Tasks with a Mixture of Interaction Primitives

Marco Ewerton1, Gerhard Neumann1, Rudolf Lioutikov1, Heni Ben Amor2, Jan Peters1,3 and Guilherme Maeda1

Abstract— Robots that interact with humans must learn to not only adapt to different human partners but also to new interactions. Such a form of learning can be achieved by demonstrations and imitation. A recently introduced method to learn interactions from demonstrations is the framework of Interaction Primitives. While this framework is limited to representing and generalizing a single interaction pattern, in practice, interactions between a human and a robot can consist of many different patterns. To overcome this limitation, this paper proposes a Mixture of Interaction Primitives to learn multiple interaction patterns from unlabeled demonstrations. Specifically, the proposed method uses Gaussian Mixture Models of Interaction Primitives to model non-linear correlations between the movements of the different agents. We validate our algorithm with two experiments involving interactive tasks between a human and a lightweight robotic arm. In the first, we compare our proposed method with conventional Interaction Primitives in a toy problem scenario where the robot and the human are not linearly correlated. In the second, we present a proof-of-concept experiment in which the robot assists a human in assembling a box.

I. INTRODUCTION

Robots that can assist us in industry, in the household, in hospitals, etc., can be of great benefit to society. The variety of tasks in which a human may need assistance is, however, practically unlimited. Thus, it is very hard (if not impossible) to program a robot in the traditional way to assist humans in scenarios that have not been exactly prespecified.

Learning from demonstrations is therefore a promising idea. Based on this idea, Interaction Primitives (IPs) form a framework that has recently been proposed to alleviate the problem of programming a robot for physical collaboration and assistive tasks [1], [2]. At their core, IPs are primitives that capture the correlation between the movements of two agents, usually a human and a robot. Then, by observing one of the agents, say the human, it is possible to infer the controls for the robot such that collaboration can be achieved.

A main limitation of IPs is the assumption that the movements of the human and the movements of the robot assistant are linearly correlated. This assumption is reflected in the underlying Gaussian distribution that is used to model the demonstrations.

1Intelligent Autonomous Systems Lab, Department of Computer Science, Technical University Darmstadt, Hochschulstr. 10, 64289 Darmstadt, Germany {ewerton, neumann, lioutikov, peters, maeda}@ias.tu-darmstadt.de

2Institute of Robotics and Intelligent Machines, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30332-0280, USA [email protected]

3Max Planck Institute for Intelligent Systems, Spemannstr. 38, 72076 Tuebingen, Germany [email protected]

[Figure 1 — panels of human and robot trajectories for three interaction patterns: plate handover, holding tool, screw handover.]
Fig. 1. Illustration of a task consisting of multiple interaction patterns, where each can be represented as an Interaction Primitive. In this work, we want to learn multiple interaction patterns from an unlabeled data set of interaction trajectories.

While this assumption holds for tasks that cover a small region of the workspace (a high-five task in [1] or handover of objects in [2]), it limits the use of IPs in two aspects. First, as illustrated in Fig. 1, a task such as the assembly of a toolbox consists of several interaction patterns that differ significantly from each other and therefore cannot be captured by a single Gaussian. Moreover, even within a single interaction pattern, the correlation between the two agents may not be linear, for example, if the movements of the human are measured in Cartesian space, while the movements of the robot are measured in joint space.

Manually labeling each subtask (e.g. “plate handover”, “screw handover”, “holding screw driver”) is one way to model interactions with multiple subtasks. Ideally, however, robots should be able to identify different subtasks by themselves. Moreover, it may not be clear to a human how to separate a number of demonstrated interactions into different, linearly correlated groups. Thus, a method to learn multiple interaction patterns from unlabeled demonstrations is necessary. The main contribution of this paper is the development of such a method. In particular, this work uses Gaussian Mixture Models (GMMs) to create a Mixture of Interaction Probabilistic Movement Primitives [2].


The remainder of this paper is organized as follows. Section II presents related work. In Section III, Probabilistic Movement Primitives (ProMPs) and Interaction ProMPs are briefly introduced, followed by the main contribution of this paper: a Mixture of Interaction ProMPs based on Gaussian Mixture Models (GMMs). Section IV evaluates the proposed method, first on a toy problem that is useful to clarify the characteristics of the method, and then on a practical application of a collaborative toolbox assembly. Section V presents conclusions and ideas for future work.

II. RELATED WORK

Physical human-robot interaction poses the problem of both action recognition and movement control. Interaction dynamics need to be specified in a way that allows for robust reproduction of the collaborative task under different external disturbances, and a common approach is based on direct force sensing or emulation. Rozo et al. [3] proposed a framework for haptic collaboration between a human and a robot manipulator. Given a set of kinesthetic demonstrations, their method learns a mapping between measured forces and the impedance parameters used for actuating the robot, e.g., the stiffness of virtual springs governing the collaborative task. In another force-based approach, Lawitzky et al. [4] proposed learning physical assistance in a collaborative transportation task. In the early learning phase, the robot uses the measured force values to follow the human guidance during the task. Recorded force and motion patterns are then used to learn a Hidden Markov Model (HMM) which can predict the human's next action, and over time the robot learns to take over a more active role in the interaction. Kulvicius et al. [5] also address a transportation task where the two agents are modeled as two point particles coupled by a spring. The forces applied by the other agent tell the robot how to adapt its own trajectory.

Our work differs significantly from the cited works in the sense that our method neither uses nor emulates force signals, but instead learns the correlation between the trajectories of two agents. Correlating trajectories not only simplifies the problem in terms of hardware and planning/control but also allows us to correlate multi-agent movements that do not generate force during the interaction, for example, the simple gesture of asking for and receiving an object.

Graphical models have also been used to describe interaction dynamics. In the computer vision community, HMMs have been widely adopted to model interaction dynamics from input video streams [6], [7]. As a result, graphical models have also gained considerable attention in the field of human-robot interaction. In [8], Hawkins and colleagues use a Bayes network to improve the fluency in a joint assembly task. The Bayes network learns to infer the current state of the interaction, as well as task constraints and the anticipated timing of human actions. Tanaka et al. [9] use a Markov model to predict the positions of a worker in an assembly line. Wang et al. [10] propose the Intention-Driven Dynamics Model (IDDM), a probabilistic graphical model with observations, latent states and intentions, where the transitions between latent states and the mapping from latent states to observations are modeled as Gaussian Processes. Koppula et al. [11] use a conditional random field with sub-activities, human poses, object affordances and object locations over time. Inference on the graphical model allows a robot to anticipate human activity and choose a corresponding, preprogrammed robot response. Lee et al. [12] learn a hierarchical HMM which triggers action primitives in response to observed behaviors of a human partner.

While very successful at classifying actions, graphical models may not be the best option when it comes to generating motions. In [13], for example, the use of an HMM with discrete states, although very successful in action classification, introduces artifacts into the motion generation part that hinder motion generalization. Therefore, a clear problem in physical human-robot interaction is that while graphical models may be suitable in the action recognition domain, motion generation at the continuous level must also be taken into account. Llorens et al. [14] present a hybrid design for a robot to be used on the shoulder. In their work, Petri Nets account for discrete control transitions, while at the motion level Partial Least Squares Regression is used to find the best action of the robot at future time steps.

Perhaps the principal distinction of our method is the use of Interaction Primitives (IPs), introduced by Ben Amor et al. [1], initially based on dynamical movement primitives [15] and later extended to Probabilistic Movement Primitives [16] with action recognition in the work of Maeda et al. [2]. As shown in [2], Interaction Primitives can be used not only to recognize the action of an agent, but also to coordinate the actions of a collaborator at the movement level, thus overcoming in a single framework both layers of discrete action recognition and continuous movement control. Differently from [2], where different interaction patterns must be hand-labeled, our contribution is the unsupervised learning of a Mixture of Interaction Primitives.

III. MIXTURE OF INTERACTION PRIMITIVES

In this section, we briefly discuss the Interaction Primitive framework based on Probabilistic Movement Primitives [2], [16], followed by the presentation of the proposed method, which is based on Gaussian Mixture Models.

A. Probabilistic Movement Primitives

A Probabilistic Movement Primitive (ProMP) [16] is a representation of movement based on a distribution over trajectories. The probabilistic formulation of a movement primitive allows operations from probability theory to seamlessly combine primitives, specify via-points, and correlate joints via conditioning. Given a number of demonstrations, ProMPs are designed to capture the variance of the positions q and velocities q̇ as well as the covariance between different joints.

For simplicity, let us first consider only the positions q for one degree of freedom (DOF). The position q_t at time step t can be approximated by a linear combination of basis functions,

q_t = ψ_t^T w + ε,     (1)

where ε is Gaussian noise. The vector ψ_t contains the N basis functions ψ_i, i ∈ {1, 2, 3, ..., N}, evaluated at time step t, where we use standard normalized Gaussian basis functions.

The weight vector w is a compact representation of a trajectory1. Having recorded a number of trajectories of q, we can infer a probability distribution over the weights w. Typically, a single Gaussian distribution is used to represent p(w). While a single w represents a single trajectory, we can obtain a distribution p(q_{1:T}) over trajectories q_{1:T} by integrating w out,

p(q_{1:T}) = ∫ p(q_{1:T} | w) p(w) dw.     (2)

If p(w) is Gaussian, p(q_{1:T}) is also Gaussian. The distribution p(q_{1:T}) is called a Probabilistic Movement Primitive (ProMP).
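The weight-space representation in (1)-(2) can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it fits a weight vector per demonstration by regularized least squares on normalized Gaussian basis functions and then estimates the Gaussian p(w). The basis width, ridge factor and function names are assumptions introduced here for illustration.

# Minimal ProMP sketch: fit one weight vector per demonstrated trajectory
# (Eq. 1) and estimate a Gaussian distribution over the weights (Eq. 2).
# Basis width, regularization and names are illustrative choices.
import numpy as np


def gaussian_basis(T, n_basis=20, width=0.02):
    """Normalized Gaussian basis functions evaluated at T time steps."""
    z = np.linspace(0, 1, T)[:, None]            # normalized phase
    centers = np.linspace(0, 1, n_basis)[None, :]
    phi = np.exp(-0.5 * (z - centers) ** 2 / width)
    return phi / phi.sum(axis=1, keepdims=True)  # shape (T, n_basis)


def fit_weights(trajectory, n_basis=20, ridge=1e-6):
    """Least-squares fit of q_t ≈ psi_t^T w for a single-DOF trajectory."""
    Phi = gaussian_basis(len(trajectory), n_basis)
    A = Phi.T @ Phi + ridge * np.eye(n_basis)
    return np.linalg.solve(A, Phi.T @ trajectory)


def fit_promp(demonstrations, n_basis=20):
    """Estimate p(w) = N(mu_w, Sigma_w) from a list of demonstrations."""
    W = np.stack([fit_weights(d, n_basis) for d in demonstrations])
    mu_w = W.mean(axis=0)
    Sigma_w = np.cov(W, rowvar=False)
    return mu_w, Sigma_w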

B. Interaction ProMP

An Interaction ProMP builds upon the ProMP formulation, with the fundamental difference that we use a distribution over the trajectories of all agents involved in the interaction. Hence, q is multi-dimensional and contains the positions in joint angles or Cartesian coordinates of all agents. In this paper, we are interested in the interaction between two agents, here defined as the observed agent (human) and the controlled agent (robot). Thus, the vector q is now given as q = [q_o, q_c]^T, where (·)_o and (·)_c refer to the observed and controlled agent, respectively.

Let us suppose we have observed a sequence of positions q_t^o at m specific time steps t, m ≤ T. We will denote this sequence by D. Given those observations, we want to infer the most likely remaining trajectory of both the human and the robot.

Defining w = [w_o^T, w_c^T]^T as an augmented vector that contains the weights of the human and of the robot for one demonstration, we write the conditional probability over trajectories q_{1:T} given the observations D of the human as

p(q_{1:T} | D) = ∫ p(q_{1:T} | w) p(w | D) dw.     (3)

We compute a normal distribution over n demonstrations by stacking several weight vectors [w_1, ..., w_n]^T, one for each demonstration, such that w ∼ N(µ_w, Σ_w). A posterior distribution can be obtained after observing D with

µ_w^new = µ_w + K(D − H_t^T µ_w),
Σ_w^new = Σ_w − K(H_t^T Σ_w),     (4)

1In order to cope with the different speeds of execution during demonstration, the trajectories must be time-aligned before parameterization. The interested reader is referred to [2] for details.

where K = Σ_w H_t^T (Σ_D + H_t^T Σ_w H_t)^{-1}, Σ_D is the observation noise, and

H_t = diag( (ψ_t^o)_(1,1), ..., (ψ_t^o)_(P,P), 0^c_(1,1), ..., 0^c_(Q,Q) )     (5)

is the observation matrix, where the unobserved states of the robot are filled with zero bases. Here, the human and the robot are assumed to have P and Q DOFs, respectively.

Now, by combining (1), (3) and (4), we can compute the probability distribution over the trajectories q_{1:T} given the observation D. For a detailed implementation the interested reader is referred to [2].
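As a hedged illustration of the conditioning step in (4)-(5), the sketch below updates the Gaussian over the stacked human-robot weights from one observed (zero-padded) position vector. It uses the convention that the observation matrix maps weights to positions (y_t ≈ H_t w), which flips some transposes relative to the reconstructed text; the function and variable names (build_H_t, condition_on_observation, sigma_obs) are assumptions, not the authors' code.

# Sketch of Interaction ProMP conditioning (Eqs. 4-5): a Kalman-style update
# of the prior N(mu_w, Sigma_w) over the stacked human-robot weights, given
# an observation of the human DOFs only (robot entries padded with zeros).
import numpy as np


def build_H_t(phi_t, n_human_dofs, n_robot_dofs, n_basis):
    """Block-diagonal observation matrix: a basis row for each human DOF,
    zero rows for the (unobserved) robot DOFs."""
    blocks = [phi_t] * n_human_dofs + [np.zeros(n_basis)] * n_robot_dofs
    H = np.zeros((len(blocks), len(blocks) * n_basis))
    for i, b in enumerate(blocks):
        H[i, i * n_basis:(i + 1) * n_basis] = b
    return H                                   # shape (P+Q, (P+Q)*n_basis)


def condition_on_observation(mu_w, Sigma_w, y_t, H_t, sigma_obs=1e-4):
    """Posterior over weights given the padded observation y_t."""
    Sigma_D = sigma_obs * np.eye(H_t.shape[0])
    S = Sigma_D + H_t @ Sigma_w @ H_t.T        # innovation covariance
    K = Sigma_w @ H_t.T @ np.linalg.inv(S)     # gain
    mu_new = mu_w + K @ (y_t - H_t @ mu_w)
    Sigma_new = Sigma_w - K @ (H_t @ Sigma_w)
    return mu_new, Sigma_new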

C. Mixture of Interaction ProMPs

The goal of our method is to learn several interaction patterns given the weight vectors that parameterize our unlabeled training trajectories. For this purpose, we learn a GMM in the weight space, using the Expectation-Maximization algorithm (EM) [17].

Assume a training set with n vectors w representing the concatenated vectors of human-robot weights as defined in Section III-B. In order to implement EM for a GMM with a number K of Gaussian mixture components, we need to implement the Expectation step and the Maximization step and iterate over those steps until convergence of the probability distribution over the weights, p(w | α_{1:K}, µ_{1:K}, Σ_{1:K}), where α_{1:K} = {α_1, α_2, ..., α_K}, µ_{1:K} = {µ_1, µ_2, ..., µ_K} and Σ_{1:K} = {Σ_1, Σ_2, ..., Σ_K}. Here, α_k = p(k), µ_k and Σ_k are the prior probability, the mean and the covariance matrix of mixture component k, respectively. We initialize the parameters α_{1:K}, µ_{1:K} and Σ_{1:K} using k-means clustering before starting the Expectation-Maximization loop. The number K of Gaussian mixture components is found by leave-one-out cross-validation.

The mixture model can be formalized as

p(w) = Σ_{k=1}^{K} p(k) p(w | k) = Σ_{k=1}^{K} α_k N(w | µ_k, Σ_k).     (6)

Expectation step: Compute the responsibilities r_{ik}, where r_{ik} is the probability of cluster k given weight vector w_i,

r_{ik} = p(k | w_i) = α_k N(w_i | µ_k, Σ_k) / Σ_{l=1}^{K} α_l N(w_i | µ_l, Σ_l).     (7)

Maximization step: Update the parameters α_k, µ_k and Σ_k of each cluster k, using

n_k = Σ_{i=1}^{n} r_{ik},     α_k = n_k / n,     (8)

µ_k = ( Σ_{i=1}^{n} r_{ik} w_i ) / n_k,     (9)

Σ_k = (1 / n_k) Σ_{i=1}^{n} r_{ik} (w_i − µ_k)(w_i − µ_k)^T.     (10)


Finally, we want to use our model to infer the trajectories of the controlled agent, given observations from the observed agent. We need to find the posterior probability distribution over trajectories q_{1:T} given the observations D, as in Section III-B.

In order to compute this posterior using our GMM prior, we first find the most probable cluster k∗ given the observation D, using Bayes' theorem. The posterior over the clusters k given the observation D is given by

p(k | D) ∝ p(D | k) p(k),     (11)

where

p(D | k) = ∫ p(D | w) p(w | k) dw     and     p(w | k) = p(w | µ_k, Σ_k).

Thus the most probable cluster k∗ given the observation D is

k∗ = argmax_k p(k | D).     (12)

The output of our algorithm is the posterior probability distribution over trajectories q_{1:T}, using cluster k∗ to model our prior in the weight space,

p(q_{1:T} | k∗) = ∫ p(q_{1:T} | w) p(w | k∗) dw,     (13)

where p(w | k∗) = p(w | µ_{k∗}, Σ_{k∗}).

Algorithms 1 and 2 provide a compact description of the methods we propose for training and inference, respectively.

The mixture model, given observation D, can be written as

p(w | D) = Σ_{k=1}^{K} p(k | D) p(w | k, D),     (14)

where p(w | k, D) is the conditioned mixture component k. In the case of mixture models, we have a combination of linear models gated by p(k | D), and hence we can model non-linear correlations.

IV. EXPERIMENTS

This section presents experimental results in two different scenarios using a 7-DOF KUKA lightweight arm with a 5-finger hand2.

The goal of the first scenario is to expose the issue of the original Interaction Primitives [1], [2] when dealing with trajectories that have a clearly multi-modal distribution. In the second scenario we propose a real application of our method where the robot assistant acts as a third hand of a worker assembling a toolbox (please refer to the accompanying video3).

2Regarding the control of the robot, the design of a stochastic controller capable of reproducing the distribution of trajectories is also part of ProMPs and the interested reader is referred to [16] for details. Here we use a compliant, human-safe, standard inverse-dynamics based feedback controller.

3Also available from http://youtu.be/9XwqW_V0bDw

Algorithm 1 Training
1) Parameterize demonstrated trajectories: find a vector of weights w for each trajectory, such that q_t ≈ ψ_t^T w.
2) Find the GMM in parameter space, using EM: initialize the GMM parameters α_{1:K}, µ_{1:K} and Σ_{1:K} with k-means clustering.
repeat
    E step:
        r_{ik} = p(k | w_i) = α_k N(w_i | µ_k, Σ_k) / Σ_{l=1}^{K} α_l N(w_i | µ_l, Σ_l)
    M step:
        n_k = Σ_{i=1}^{n} r_{ik},     α_k = n_k / n
        µ_k = ( Σ_{i=1}^{n} r_{ik} w_i ) / n_k
        Σ_k = (1 / n_k) Σ_{i=1}^{n} r_{ik} (w_i − µ_k)(w_i − µ_k)^T
until p(w | α_{1:K}, µ_{1:K}, Σ_{1:K}) converges

Algorithm 2 Inference
1) Find the most probable cluster given the observation:
    p(k | D) ∝ p(D | k) p(k),     k∗ = argmax_k p(k | D)
2) Condition on the observation, using cluster k∗ as prior:
    p(q_{1:T} | k∗) = ∫ p(q_{1:T} | w) p(w | k∗) dw

A. Non-Linear Correlations between the Human and the Robot on a Single Task

To expose the capability of our method to deal with multi-modal distributions, we propose a toy problem where a human specifies a position on a table and the robot must point at the same position. The robot is not provided with any form of exteroceptive sensing; the only way it can generate the appropriate pointing trajectory is by correlating its movement with the trajectories of the human. As shown in Fig. 2, however, we placed a pole in front of the robot such that the robot can only achieve the position specified by the human by moving either to the right or to the left of the pole. This scenario forces the robot to assume quite different configurations, depending on which side of the pole its arm is moving around.

During demonstrations, the robot was moved by kinesthetic teaching to point at the same positions indicated by the human (tracked by motion capture) without touching the pole.


[Figure 2 — photos (a) and (b) of the experimental setup.]
Fig. 2. Experimental setup of a toy problem used to illustrate the properties of the Mixture of Interaction Primitives. The robot is driven by kinesthetic teaching to point at the positions specified by the human (pointed with the wand). Certain pointed positions can be achieved by either moving the arm to the right (a) or to the left (b) of the pole placed on the table. Other positions, such as the one indicated by the arrow, can only be achieved by one interaction pattern.

[Figure 3 — two panels (a) and (b) of predicted vs. ground-truth robot trajectories; legend: ground truth, prediction.]
Fig. 3. Results of the predictions of the robot trajectories in Cartesian space. Both subplots show the same ground truth trajectories generated by driving the robot in kinesthetic teaching. The predictions are generated by leave-one-out cross-validation on the whole data set comprised of 28 demonstrations. (a) Prediction using the conventional Interaction ProMPs with a single Gaussian. (b) Prediction using the proposed method with a mixture of Gaussians.

For certain positions, such as the one indicated by the arrow in Fig. 2(a), only one demonstration was possible. For other positions, both right and left demonstrations could be provided, as shown in Fig. 2(a) and 2(b). The demonstrations, totaling 28 pairs of human-robot trajectories, resulted in a multi-modal distribution of right and left trajectory patterns moving around the pole.

In this scenario, modeling the whole distribution over the parameters of the trajectories with one single Gaussian (as in the original Interaction Primitive formulation) cannot generalize the movements of the robot to other positions in a way that resembles the training, as the original framework is limited by the assumption of a single pattern. This limitation is clearly shown in Fig. 3(a), where several trajectories generated by a single-cluster GMM (as in the original Interaction Primitive) cross over the middle of the demonstrated trajectories, which, in fact, represents the mean of the single Gaussian distribution.

Fig. 3(b) shows the predictions using the proposed method with a mixture of Gaussians. By modeling the distribution over the parameters of the trajectories using GMMs as described in Section III-C, a much better performance could be achieved.

[Figure 4 — plot of RMS error (m) against the number of clusters.]
Fig. 4. Root Mean Square Error with models using up to 17 Gaussians.

The GMM assumption that the parameters are only locally linearly correlated seemed to represent the data much more accurately. As shown in Fig. 4, this improvement is quantified in terms of the Root Mean Square (RMS) error of the predicted trajectory in relation to the ground truth, using leave-one-out cross-validation over the whole data set. The same figure also shows that there is a sharp decrease in the RMS error up to six clusters, especially when taking into account the variance among the 28 tests. Beyond seven clusters it is observed that the prediction error fluctuates around 4 cm. The experiments previously shown in Fig. 3(b) were done with eight clusters.
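The leave-one-out procedure used to produce curves like the one in Fig. 4 can be sketched as follows; the helper names (fit_gmm, predict_robot_trajectory) stand for the training and conditioning steps described above and are assumptions for illustration, not the authors' code.

# Sketch of leave-one-out cross-validation over the number of mixture
# components K.  `fit_gmm` and `predict_robot_trajectory` are placeholders
# for the training (Algorithm 1) and inference (Algorithm 2) steps.
import numpy as np


def loo_rmse(weight_vectors, human_obs, robot_truth, candidate_K,
             fit_gmm, predict_robot_trajectory):
    """Mean prediction RMSE for each candidate number of clusters."""
    n = len(weight_vectors)
    scores = []
    for K in candidate_K:
        errors = []
        for i in range(n):                       # hold out demonstration i
            train = [w for j, w in enumerate(weight_vectors) if j != i]
            model = fit_gmm(train, K)
            pred = predict_robot_trajectory(model, human_obs[i])
            errors.append(np.sqrt(np.mean((pred - robot_truth[i]) ** 2)))
        scores.append(np.mean(errors))
    return scores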

B. Assembling a Box with a Robot Assistant

In this experiment, we recorded a number of demonstrations of different interaction patterns between a human and the robot cooperating to assemble a box. We used the same robot described in the previous experiment. During demonstrations the human wore a bracelet with markers whose trajectories in Cartesian coordinates were recorded by motion capture. Similar to the first scenario, the robot was moved in gravity compensation mode by another human during the training phase, and the trajectories of the robot in joint space were recorded.

There are three interaction patterns. Each interaction pattern was demonstrated several times to reveal the variance of the movements. In one of them, the human extends his/her hand to receive a plate. The robot fetches a plate from a stand and gives it to the human. In a second interaction the human fetches the screwdriver and the robot grasps and gives a screw to the human, as a pre-emptive collaborator would do. The third type of interaction consists of giving/receiving a screwdriver. Each interaction of plate handover, screw handover and holding the screwdriver was demonstrated 15, 20, and 13 times, respectively. The pairs of trajectories of each interaction are shown in Fig. 5 (see footnote 4).

As described in Section III, all training data is fed to the algorithm, resulting in 48 human-robot pairs of unlabeled demonstrations, as shown in the upper row of Fig. 7. The presented method parameterizes the trajectories and performs clustering in the parameter space in order to encode the mixture of primitives.

4Due to the experimental setup, for the sub-tasks of plate and screw handover we added an initial hand-coded trajectory that runs before the kinesthetic teaching effectively starts. These trajectories are used to make the robot grasp and remove the plate or screw from their respective stands. This is reflected in the figure as the deterministic part at the beginning of the trajectory of the robot. This initial trajectory, however, has no effect on the proposed method itself.


[Figure 5 — photos and trajectory plots for (a) handing over a plate, (b) handing over a screw, and (c) holding the screwdriver, with human and robot trajectories.]
Fig. 5. Demonstrations of the three different interactions and their respective trajectories. For the case of plate and screw handover, the beginning of the robot trajectory shows a deterministic part that accounts for the fact that the robot has to remove objects from their respective stands, which is not part of the kinesthetic teaching, and which does not affect the algorithm in any sense.

[Figure 6 — plot of joint RMS prediction error (deg) against the number of clusters (1 to 8).]
Fig. 6. Root-mean-square error of the joint trajectories (averaged over all tests) using a leave-one-out cross-validation as a function of the number of clusters (mixture components). The plateau after three clusters seems to be consistent with the training data, since it consists of three distinct interaction patterns.

In the figure, the human is represented by the (x, y, z) Cartesian coordinates, while the robot is represented by the seven joints of the arm. Due to lack of space, the figure only shows up to the first four joints (starting from the base).

Figure 6 shows the RMS prediction error averaged over all tests as the number of mixture components increases. The prediction is obtained by leave-one-out cross-validation over the whole set of 48 demonstrations. As one would expect, since the unlabeled data contains three distinct interaction patterns, the improvement is clearly visible up to three mixture components. No significant improvement is obtained afterwards; thus the GMM with three mixture components was selected for the experiments.

In the inference/execution phase, the algorithm first computes the most probable Interaction Primitive mixture component based on the observation of the position of the wrist of the human with (12). Using the same observation, we then condition the most probable Interaction Primitive, which allows computing a posterior distribution over trajectories for all seven joints of the robot arm as in (13). Finally, the mean of each joint posterior distribution is fed to a standard inverse dynamics feedback tracking controller.

The lower row of Fig. 7 depicts the posterior distribution for one test example where a three-cluster GMM was trained with the other 47 trajectories. The GMM prior is shown in gray where the patches of different clusters overlap. The observation consists only of the final position of the wrist, shown as asterisks in the figure. The black lines are the ground truth trajectories of each degree of freedom. The posterior, in red, is represented by its mean and by the region inside ± two standard deviations. The mean of this posterior is the most probable trajectory for each degree of freedom given the observed end position of the wrist of the human.

We assembled the toolbox, consisting of seven parts and 12 screws, twice. The experiments demanded more than 40 executions of the Interaction Primitives. The selection of the right mixture component was 100% correct (please refer to the accompanying video).

We evaluated the precision of the interactions by computing the final position of the hand of the robot with forward kinematics. The forward kinematics was fed with the conditioned robot trajectories predicted by leave-one-out cross-validation. The interactions of plate handover and holding the screwdriver resulted in a mean error with two standard deviations (mean error ± 2σ) of 3.2 ± 2.6 cm and 2.1 ± 2.3 cm, respectively. We did not evaluate the precision of the handover of the screw, as the position at which the robot hands over the screw is not correlated with the human (please refer to the accompanying video).


Fig. 7. Upper row: Mixture components represented by their mean trajectories and the region inside two standard deviations (µ ± 2σ). Obs.: The plots show only the part of the trajectories generated by kinesthetic teaching. Lower row: Posterior probability distribution given the observation depicted by the blue asterisks.

Fig. 8. Handover of a plate. Conditioning on three different positions of the wrist (using motion capture) of a human coworker.

As an example, Fig. 8 shows the robot executing the plate handover at three different positions based on the location of the wrist marker. Note that the postures of the arm are very different, although they are all captured by the same Interaction Primitive.

V. CONCLUSIONS

In this paper we presented a Mixture of Interaction Primitives where Gaussian Mixture Models are used to model multiple interaction patterns from unlabeled data. The multi-modal prior probability distribution is obtained over parameterized demonstration trajectories of two agents working in collaboration. During execution, the algorithm selects the mixture component with the highest probability given the observation of the human, which is then conditioned to infer the appropriate robot reaction. The proposed method is able to learn and recognize multiple human-robot collaboration tasks from an arbitrary number of demonstrations consisting of unlabeled interaction patterns, which is not possible with the current Interaction Primitive framework.

In the context of human-robot interaction we are currently addressing the estimation of the phase of the execution of the primitive for switching tasks in real time. Also, we are addressing the use of the stochastic feedback controller provided by the original ProMP work in [16]. Although this work focused on human-robot trajectories, we are currently considering extensions of our work where the human is replaced by other variables of interest. For example, the same framework can be used to correlate joint and end-effector trajectories of the same robot to learn nonlinear forward/inverse kinematic models. Similarly, the Mixture of Interaction Primitives can be used to correlate the interaction between motor commands and joint trajectories to learn inverse dynamics models.

VI. ACKNOWLEDGMENTS

The research leading to these results has received funding from the European Community's Seventh Framework Programmes (FP7-ICT-2013-10) under grant agreement 610878 (3rdHand) and (FP7-ICT-2009-6) under grant agreement 270327 (ComPLACS).

REFERENCES

[1] H. Ben Amor, G. Neumann, S. Kamthe, O. Kroemer, and J. Peters, “Interaction primitives for human-robot cooperation tasks,” in Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014.

[2] G. Maeda, M. Ewerton, R. Lioutikov, H. Ben Amor, J. Peters, and G. Neumann, “Learning interaction for collaborative tasks with probabilistic movement primitives,” in Proceedings of the International Conference on Humanoid Robots (HUMANOIDS), 2014.

[3] L. Rozo, S. Calinon, D. G. Caldwell, P. Jimenez, and C. Torras, “Learning collaborative impedance-based robot behaviors,” in AAAI Conference on Artificial Intelligence, Bellevue, Washington, USA, 2013.

[4] M. Lawitzky, J. Medina, D. Lee, and S. Hirche, “Feedback motion planning and learning from demonstration in physical robotic assistance: differences and synergies,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct 2012, pp. 3646–3652.

[5] T. Kulvicius, M. Biehl, M. J. Aein, M. Tamosiunaite, and F. Wörgötter, “Interaction learning for dynamic movement primitives used in cooperative robotic tasks,” Robotics and Autonomous Systems, vol. 61, no. 12, pp. 1450–1459, 2013.

[6] M. Brand, N. Oliver, and A. Pentland, “Coupled hidden Markov models for complex action recognition,” in Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR '97), Washington, DC, USA: IEEE Computer Society, 1997, pp. 994–.

[7] N. Oliver, B. Rosario, and A. Pentland, “A Bayesian computer vision system for modeling human interactions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 831–843, Aug 2000.

[8] K. Hawkins, N. Vo, S. Bansal, and A. F. Bobic, “Probabilistic human action prediction and wait-sensitive planning for responsive human-robot collaboration,” in Proceedings of the International Conference on Humanoid Robots (HUMANOIDS), 2013.

[9] Y. Tanaka, J. Kinugawa, Y. Sugahara, and K. Kosuge, “Motion planning with worker's trajectory prediction for assembly task partner robot,” in Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012, pp. 1525–1532.

[10] Z. Wang, K. Mülling, M. P. Deisenroth, H. Ben Amor, D. Vogt, B. Schölkopf, and J. Peters, “Probabilistic movement modeling for intention inference in human-robot interaction,” The International Journal of Robotics Research, vol. 32, no. 7, pp. 841–858, 2013.

[11] H. S. Koppula and A. Saxena, “Anticipating human activities using object affordances for reactive robotic response,” in Robotics: Science and Systems, 2013.

[12] D. Lee, C. Ott, Y. Nakamura, and G. Hirzinger, “Physical human robot interaction in imitation learning,” in 2011 IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 3439–3440.

[13] H. Ben Amor, D. Vogt, M. Ewerton, E. Berger, B. Jung, and J. Peters, “Learning responsive robot behavior by imitation,” in Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013, pp. 3257–3264.

[14] B. Llorens-Bonilla and H. H. Asada, “A robot on the shoulder: Coordinated human-wearable robot control using coloured Petri nets and partial least squares predictions,” in Proceedings of the 2014 IEEE International Conference on Robotics and Automation, 2014.

[15] A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal, “Dynamical movement primitives: learning attractor models for motor behaviors,” Neural Computation, vol. 25, no. 2, pp. 328–373, 2013.

[16] A. Paraschos, C. Daniel, J. Peters, and G. Neumann, “Probabilistic movement primitives,” in Advances in Neural Information Processing Systems (NIPS), 2013, pp. 2616–2624.

[17] C. M. Bishop, Pattern Recognition and Machine Learning. Springer New York, 2006, vol. 1.


Generalizing Pouring Actions Between Objects using Warped Parameters

Sascha Brandl, Oliver Kroemer, and Jan Peters

Abstract— One of the key challenges for learning manipulation skills is generalizing between different objects. The robot should adapt both its actions and the task constraints to the geometry of the object being manipulated. In this paper, we propose computing geometric parameters of novel objects by warping known objects to match their shape. We refer to the parameters computed in this manner as warped parameters, as they are defined as functions of the warped object's point cloud. The warped parameters form the basis of the features for the motor skill learning process, and they are used to generalize between different objects. The proposed method was successfully evaluated on a pouring task both in simulation and on a real robot.

I. INTRODUCTION

In order to perform tasks in everyday environments, robots will need to be capable of manipulating a wide range of different objects. As objects of the same type may have different shapes and sizes, the robot will have to adapt its actions to the geometry of the specific object that it is manipulating. The shape of objects is particularly important when manipulating liquids, e.g., pouring a glass of water, as liquids conform to the shape of their container. The robot must therefore take a container's geometry into consideration when using it in a pouring task.

Although containers come in a wide variety of shapes and sizes, the important differences can usually be defined by a few geometric parameters [1], [2]. For example, the volume of a container indicates how much fluid it can hold, regardless of whether it has a spherical or cylindrical shape. A robot can generalize pouring actions between different containers by using these geometric parameters. However, the robot will not be provided with the geometric parameters for most of the novel objects that it encounters. While a human may annotate the geometric information for a couple of objects, the robot will usually need to compute these parameters on its own.

In this paper, we investigate using warped parameters to generalize pouring skills between different objects. A warped parameter is defined as a function on the points of a known object's point cloud. For example, a warped parameter may compute the volume of a set of points' convex hull. When the robot encounters a novel object, it warps the point cloud of the known object to the new object's shape. As a result of the warping, the value of the warped parameter changes to match the geometry of the new object. Once the geometric parameters have been computed, the robot can use them to generalize actions and task constraints between different objects.

All authors are members of the IAS group at the TU Darmstadt, Germany. Jan Peters is also a member of the MPI for Intelligent Systems, Germany. {brandl, kroemer, peters}@ias.tu-darmstadt.de

Fig. 1. The robot performs a pouring task with two previously unknown objects. The pouring action was learned from human demonstrations using a taller cup and a wider container to pour into.

In Section II, we explain the process of computing the warped parameters. In Section III, we describe how the robot can learn pouring actions and task constraints that generalize between objects using the warped parameters. The proposed method was successfully evaluated both in simulation and on the robot shown in Fig. 1. The results of the experiments are detailed in Section IV.

Related Work

Several previous works have used warping to generalize manipulations between objects. Hillenbrand et al. [3], [4] used warping to map contact points onto novel objects, in order to transfer grasps between objects. A similar approach was used by Rainer et al. [5], [6] for transferring coordinate frames of task constraints between objects. However, the size and shape of the constraint regions were not adapted to the new object's geometry. Rather than warping only the points on the object, Schulman et al. [7] computed warping functions for the entire scene. The warping was then applied to the demonstrated trajectory of the source scene in order to obtain a trajectory for the current scene. These approaches focus on mapping specific points from the source scene to the target scene, and are therefore especially well-suited for contact-based manipulations. Warped parameters can be used to model more general features of the objects, such as areas and volumes.

Several methods have also been proposed for learning to perform pouring tasks. Pastor et al. [8] learned dynamic movement primitives (DMPs) for pouring from human demonstrations, and used these to generalize to different cup placements. Similarly, Muehlig et al. [9] encoded demonstrated bimanual pouring trajectories using Gaussian mixture models. Rozo et al. [10] proposed learning a controller for pouring tasks based on the observed forces. The work on learning pouring from demonstration has mainly focused on learning with the same set of objects. In comparison, we propose learning in a feature space defined by the warped parameters, in order to automatically generalize between objects.

Some work has also been done on generalizing pouring actions between different objects using reinforcement learning. Kroemer et al. [11] learned a pouring DMP from human demonstrations, and then used a trial-and-error approach to learn the location of a novel container's opening. The opening was detected using a shape-similarity kernel. Tamosiunaite et al. [12] used reinforcement learning to learn the shape of the pouring DMP, as well as the goal point. Reinforcement learning was also used to adapt the learned motion to novel objects, without explicitly considering the differences in geometry.

II. GENERALIZATION WITH WARPED PARAMETERS

In this section, we describe how a robot can compute geometric parameters of an object by warping a known object to match its shape. The object models and the warping process used in this paper are described in Sections II-A to II-C. The computation of the warped parameters for pouring tasks is described in Section II-D.

A. Geometric Object Models

In order to generalize manipulations to a novel object, the robot first computes correspondences between a known source object O_s and the unknown target object O_t. An object O_i is modeled as a set of c_i points located at positions p_ij ∈ R^3 with corresponding normals n_ij ∈ R^3, where j ∈ {1, ..., c_i}.

Objects often consist of multiple parts, and a manipulation may only depend on the shape of a part of an object. Hence, geometric parameters often describe the shape of a part rather than the whole object. We therefore also assign each point p_ij a vector l_ij of length ρ with binary labels, which indicate which of the ρ object parts the point corresponds to. The labels of the target object O_t are initially unknown, but can be computed using the warping process.

An example of an annotated cup can be seen in Fig. 2. The first part is the CONTAINER, which holds the liquids. The second part is the RIM around the opening. We also label the HANDLE as a dummy part. As not all containers have handles, it is not used to define any warped parameters for the pouring task, and is only included to help align objects during the warping process.

[Figure 2 — point-cloud panels: annotated source object (handle, container, rim parts), initial part labeling of two target objects, and warped part labeling.]
Fig. 2. The top row shows the point cloud of the source object, annotated by a human user. The middle row shows the point clouds of two target objects. The points were labelled using a classifier based on local features. This initial estimate is only used to compute a coarse alignment with the source object. The point clouds were pre-aligned for this figure to show more clearly how the labels change during the warping process. The bottom row shows the final results of the label mapping approach.

B. Warping

Given a source object and a target object, the robot can compute correspondences between the two objects. These correspondences are determined by warping the shape of the source object onto that of the target object. There are various methods for computing 3D warpings between objects [13], [14], and the proposed approach does not depend on a specific warping algorithm. We therefore employ a basic warping algorithm for finding correspondences between the containers. The warping process consists of two stages: 1) object alignment, and 2) point mapping.

In the first stage, the source object is coarsely aligned with the target object, such that their corresponding object parts are close together. This alignment is accomplished by computing a coordinate system based on the objects' parts. The origin of the coordinate frame is the mean of the container points. The first axis is given by the direction to the mean of the rim points, and the second axis is the orthogonal direction to the mean of the handle points. The third axis is computed by the cross product of the first two axes. As the part labels of the target object l_t are unknown, an initial estimate of the labels is computed using logistic regression. One classifier is trained for each of the three object parts. Each point p_ti is classified based on the local distribution of points in its neighborhood. The features used to describe the local distribution of points include the eigenvalues of the covariance matrix, and the distance from the point p_ti to the mean of the neighborhood points. The classifiers were trained on the labelled points of the source object. An example of the initial labeling can be seen in Fig. 2. The coordinate frame of the object is estimated using this initial labeling of points. Once the two objects are aligned, the source object is scaled in each direction such that the variances of its container part match those of the target object. We denote the aligned source object and target object by O_s and O_t, respectively.

In the second stage of the warping algorithm, the points from the source object O_s are mapped onto the target object O_t. This step is similar to the approach proposed by Hillenbrand [15]. Each point of the aligned source object is mapped to the mean of the k nearest neighbors in the aligned target object. In our experiments, we set k = 1. Hence, the warped source point p_wi, with corresponding normal n_wi and labels l_wi, is given by

p_wi = p_tj,   n_wi = n_tj,   and   l_wi = l_si,
s.t.   j = arg min ‖ p_si − p_tj ‖   and   n_si^T n_tj > 0.

Thus, each source point is mapped to the closest target point with a normal pointing in the same direction. The warped object and its point cloud are denoted by O_w.
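A minimal sketch of this point-mapping step is given below, assuming the aligned point clouds are available as numpy arrays; it uses a brute-force nearest-neighbor search with the normal-compatibility check, and all names are illustrative rather than the authors' implementation.

# Sketch of the point-mapping stage: map each aligned source point to its
# nearest aligned target point whose normal points in the same direction
# (n_s^T n_t > 0), carrying the source labels over to the warped cloud.
import numpy as np


def point_mapping(src_pts, src_normals, src_labels, tgt_pts, tgt_normals):
    """Return warped points, normals and labels (one per source point)."""
    warped_pts = np.empty_like(src_pts)
    warped_normals = np.empty_like(src_normals)
    for i, (p_s, n_s) in enumerate(zip(src_pts, src_normals)):
        compatible = tgt_normals @ n_s > 0           # same-direction normals
        dists = np.linalg.norm(tgt_pts - p_s, axis=1)
        dists[~compatible] = np.inf                  # exclude incompatible points
        j = int(np.argmin(dists))
        warped_pts[i] = tgt_pts[j]
        warped_normals[i] = tgt_normals[j]
    return warped_pts, warped_normals, src_labels.copy()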

C. Point Mapping vs. Label Mapping

The warping process defines a new position and normal for each of the c_s points of the source object O_s. The location of these new points can be used to define warped parameters, as detailed in the next section. We refer to this approach as point mapping, as the points of the source object are mapped onto the target object.

However, if the source object has considerably fewer points than the target object, then some details of the target object may not be captured by the warped object. This issue can be addressed by warping the target object to match the source object. The alignment and scaling of the objects is performed as before. However, the points of the target object are mapped onto the source object. The label of each of the target points is then determined using a k-nearest neighbors classifier. In our experiments, we again used k = 1, such that

\[ p_{wi} = p_{ti}, \qquad n_{wi} = n_{ti}, \qquad l_{wi} = l_{sj}, \]
\[ \text{s.t.} \quad j = \arg\min_j \lVert p_{sj} - p_{ti} \rVert \quad \text{and} \quad n_{sj}^{T} n_{ti} > 0. \]

We refer to this approach as label mapping, as the labels of the source object are mapped onto the target object. When using multiple neighbors, k > 1, a point is assigned to a part if the majority of its k neighbors belong to that part.
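The label mapping step could be sketched as follows; the function name, the array layout, and the use of collections.Counter for the majority vote are illustrative assumptions under the same conventions as the point mapping sketch above.

import numpy as np
from collections import Counter

def label_mapping(src_pts, src_nrm, src_lbl, tgt_pts, tgt_nrm, k=1):
    # Keep the target points and assign each one the label of its k
    # nearest source points with compatible normals; for k > 1 the label
    # is chosen by majority vote.
    warped_lbl = np.empty(len(tgt_pts), dtype=object)
    for i, (p, n) in enumerate(zip(tgt_pts, tgt_nrm)):
        valid = src_nrm @ n > 0
        cand_pts, cand_lbl = src_pts[valid], src_lbl[valid]
        nearest = np.argsort(np.linalg.norm(cand_pts - p, axis=1))[:k]
        warped_lbl[i] = Counter(cand_lbl[nearest]).most_common(1)[0][0]
    return tgt_pts, tgt_nrm, warped_lbl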

The benefit of using the label mapping approach is that it guarantees that all of the points of the target object are used for computing the warped parameters. However, when using label mapping, points can only be referred to by their label and not as individual points. In comparison, when using

point mapping, one can refer to individual points, e.g., p_w72, which correspond to specific points on the source object. The bottom row of Fig. 2 shows an example of using label mapping.

D. Warped Parameters

Having computed the correspondences between the known source object and the novel target object, the robot can compute the warped parameters for the target object. A warped parameter is defined as a function on the warped point cloud, f(O_w). Warped parameters can be used to define geometric reference parameters, such as lengths, areas, and volumes, of an object's part. Warped parameters can also be used to define task frames.

For pouring, the task frame is defined by the lip point of the first container and the center of the second container's opening. The center of the opening is defined as the mean of the rim points. The lip point is defined as the rim point that is closest to the other container. A pouring motion is defined by the trajectory of the held container's lip point relative to the center of the second container's opening. The trajectory includes the relative 3D position and the tilt of the first container about its lip point. The other two rotation dimensions are usually assumed to be zero. If there is no second container, the lowest rim point is defined as the lip point.
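A small sketch of how these task-frame anchors could be extracted from a labelled point cloud, assuming the points are already expressed in the part-based frame with the first coordinate pointing upwards; the helper names are hypothetical.

import numpy as np

def opening_center(points, labels):
    # Center of a container's opening: the mean of its rim points.
    return points[labels == "rim"].mean(axis=0)

def lip_point(points, labels, other_center=None):
    # Lip point: the rim point closest to the other container, or the
    # lowest rim point (along the first, vertical axis) if there is no
    # second container.
    rim = points[labels == "rim"]
    if other_center is None:
        return rim[np.argmin(rim[:, 0])]
    return rim[np.argmin(np.linalg.norm(rim - other_center, axis=1))]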

The geometric reference parameters for pouring include the radius of the opening, the volume of the container, the height of the container, and a reference angle for tilting the cup. The radius of the opening is given by the mean distance between the rim points and the center of the opening. The volume of the container is given by the volume of the container points' convex hull. The height of the container is given by the range of all of the points along the first dimension. A tilt reference angle is defined by the amount that the cup must be rotated about the lip point such that half of the container's volume is above the lip point. As the warping process reshapes the points of the source object, the estimates of the reference parameters will change accordingly. In this manner, the warped parameter function defines how the parameter's value is grounded in the object's geometry.
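These reference parameters can be read directly off the (warped) point cloud. The sketch below, using SciPy's convex hull, is an illustrative reconstruction under the same assumptions as above; the tilt reference angle is omitted, as it would additionally require a search over rotations about the lip point.

import numpy as np
from scipy.spatial import ConvexHull

def reference_parameters(points, labels):
    # Geometric reference parameters evaluated on a (warped) point cloud,
    # assuming the points are expressed in the part-based frame.
    rim = points[labels == "rim"]
    container = points[labels == "container"]
    center = rim.mean(axis=0)
    radius = np.linalg.norm(rim - center, axis=1).mean()   # opening radius
    volume = ConvexHull(container).volume                  # container volume
    height = np.ptp(points[:, 0])                          # range along the first axis
    return {"radius": radius, "volume": volume, "height": height}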

As the above examples show, warped parameters can be used to define various object properties, and can even build on each other. These parameters can then be automatically computed for new objects using the warping process.

III. LEARNING WITH WARPED PARAMETERS

In this section, we describe how a robot can learn pouring actions and task constraints that generalize to new objects using the warped parameters.

A. Learning Task Constraints

When performing a pouring task, the liquid should remain in the cup while it is being transported, and it should only be poured out if it will be transferred to another container. These task constraints correspond to phase transitions [16] and can be fulfilled by learning to predict when the held container will start to pour and when the poured liquid will fill the second container.



Fig. 3. The figure shows the ROC curves (true positive rate vs. false positive rate) for the learned classifiers for both the pouring experiment and the filling experiment. The dashed lines indicate the performance when the classifier is applied to data from the same object that was used for training the classifier. The solid lines indicate the performance when the classifiers are applied to novel objects, for which they had no training data. A classifier is generally considered to perform better if it gets closer to the top left corner. Classifiers were trained using features based on the warped parameters computed using both the label mapping and point mapping approaches. The standard features approach did not use the reference values given by the warped parameters.

The conditions for pouring and filling are learned by training a classifier for each condition. The classification is performed using logistic regression, which is a form of probabilistic classifier. The probability of pouring (y_p = 1) from the first container is given by

\[ p(y_p = 1 \mid x) = \left(1 + \exp\left(-\omega^{T} \varphi(x)\right)\right)^{-1}, \]

where φ(x) is a vector of features describing the state of the container x, and the weight vector ω is computed from training data using iteratively reweighted least squares. The features φ(x) are of the form α/α_r, where α is a variable and α_r is a reference value defined by a warped parameter. For predicting pouring, the features include the tilt angle of the cup divided by the tilt reference angle, and the fluid volume divided by the volume of the container. The resulting features are dimensionless quantities that automatically adapt to the geometry of the container.

For predicting when the poured liquid increases the fluid volume in the second container (y_f = 1), we expand the set of features to include both objects and their relative positions. The vertical distance between the containers is divided by the height of the first container. The horizontal distances between the containers are divided by the radius of the second container. These features allow the robot to learn when the poured liquid will miss the second container, as well as predict when the container will overflow.
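The following sketch illustrates the idea of training such a classifier on dimensionless features α/α_r. The reference values, the toy training triples, and the use of scikit-learn's LogisticRegression (rather than the iteratively reweighted least squares mentioned above) are our own assumptions for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical warped reference parameters of the held container.
params = {"tilt_ref": 1.2, "volume": 3.0e-4}   # rad, m^3

def pouring_features(tilt, fluid_volume):
    # Dimensionless features of the form alpha / alpha_r.
    return [tilt / params["tilt_ref"], fluid_volume / params["volume"]]

# Toy training data: (tilt angle, fluid volume, pouring label) triples.
trials = [(0.3, 1.0e-4, 0), (0.6, 2.5e-4, 0), (0.9, 0.5e-4, 0),
          (1.1, 2.8e-4, 1), (1.3, 2.0e-4, 1), (1.4, 1.5e-4, 1)]
X = np.array([pouring_features(t, v) for t, v, _ in trials])
y = np.array([label for _, _, label in trials])
clf = LogisticRegression().fit(X, y)

# Probability that the held container starts pouring in a new state.
p_pour = clf.predict_proba([pouring_features(1.15, 2.2e-4)])[0, 1]
print("p(y_p = 1 | x) =", round(p_pour, 2))

The filling classifier would be built in the same way, with the relative distances divided by the height of the first container and by the radius of the second container added as further features.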

B. Learning Motor Primitives in Warped Spaces

The proposed warping approach can also be used to learn motor primitives that adapt to the shape of the objects being manipulated. Motor primitives are often used to define desired trajectories that can be easily adapted to different situations. In order to model distributions of trajectories, we

use probabilistic movement primitives (ProMPs) [17]. These movement primitives encode correlations between the different dimensions of the trajectory, and can be conditioned on the initial state of the objects.

The learned motor primitive defines a desired trajectory in the task space described in Section II-D. Similar to the features used to generalize task constraints, the trajectories are defined as dimensionless quantities. The vertical distance between the objects is divided by the height of the held container, and the tilt angle is divided by the reference tilt angle. The horizontal distances are divided by the radius of the second container.

The motor primitives are learned by scaling the demonstrated trajectories according to the warped parameters of the objects used in the demonstrations. In order to execute a pouring action, the robot samples a trajectory from the ProMP and rescales it according to the current objects' warped parameters.
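The scaling into and out of the dimensionless space could look roughly as follows; the column ordering, the numeric warped parameters, and the synthetic demonstration are hypothetical, and the ProMP fitting itself is not shown.

import numpy as np

def scale_vector(cup, bowl):
    # Per-dimension reference values: horizontal distances by the bowl's
    # opening radius, vertical distance by the cup's height, tilt by the
    # cup's tilt reference angle.  Columns are assumed to be [x, y, z, tilt].
    return np.array([bowl["radius"], bowl["radius"],
                     cup["height"], cup["tilt_ref"]])

def to_dimensionless(traj, cup, bowl):
    return traj / scale_vector(cup, bowl)

def from_dimensionless(traj, cup, bowl):
    return traj * scale_vector(cup, bowl)

# Hypothetical warped parameters of the demonstration and execution objects.
cup_demo, bowl_demo = {"height": 0.10, "tilt_ref": 1.2}, {"radius": 0.07}
cup_new, bowl_new = {"height": 0.08, "tilt_ref": 1.0}, {"radius": 0.12}

# Synthetic demonstration: trajectory of the lip point relative to the
# second container's opening center, plus the tilt angle.
demo = np.column_stack([np.linspace(0.10, 0.02, 50), np.zeros(50),
                        np.linspace(0.15, 0.09, 50), np.linspace(0.0, 1.4, 50)])
demo_dimless = to_dimensionless(demo, cup_demo, bowl_demo)
# ... a ProMP would be fit to many such dimensionless demonstrations and a
# trajectory sampled from it; here the demonstration is reused directly.
executed = from_dimensionless(demo_dimless, cup_new, bowl_new)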

IV. EXPERIMENTS

The proposed method was implemented and evaluated both in simulation and on a real robot. The robot, shown in Fig. 1, consists of two Kuka lightweight robot arms, each equipped with a five-fingered DLR hand [18]. The robot observes the table-top scene from above using a Microsoft Kinect camera. Ten different cups and bowls were scanned from multiple views. 3D mesh models were generated using an implicit surface representation and marching cubes [19].

A. Simulated Pouring and Filling Experiments

In the first experiment, we evaluated how well task constraints generalize between objects when using warped parameters. The objects were simulated using the Bullet physics engine [20] together with Fluids 2 for incorporating smoothed particle hydrodynamics [21].



Fig. 4. The plots show the distribution over trajectories learned by the ProMPs in the generalized space. The blue line indicates the mean trajectory, and the shaded regions correspond to +/- two standard deviations. The black horizontal lines indicate when the value is one. The tilt is one when the cup is tilted such that half of the container's volume is above the lip point. The X and Y values are one when the lip point is one radius away from the second container's center. The Z value is one when the vertical distance between the cup and the container is the same as the height of the cup. The red region indicates when the X-Y position of the cup's lip point is within one radius of the container's center.

Each object was filled 1000 times with a random amount of liquid, and tilted by a random angle around the lip point. If the volume of the fluid in the cup decreased, the trial was labelled as pouring (y_p = 1). Otherwise it was labelled as not pouring (y_p = 0). The classifiers were trained on sets of 50 samples. The classifiers were tested on two test sets: the 950 other samples from the same object, and the 9000 samples from the other objects. The latter dataset is used to test how well the classifiers generalize between different objects.

A similar procedure was used for the filling experiment. However, the cup used for pouring always contained 10 particles at the start of the trial, and the second container was filled by a random amount. The cup was always tilted by 120°. The relative positions of the cups were varied between trials. A trial was considered successful (y_f = 1) iff none of the particles ended up outside of the second container.

For each training set, three classifiers were computed. The first two classifiers were trained using the warped parameters from the point mapping and the label mapping approaches, respectively. The features used for training the classifiers were described in Section III-A. As a benchmark, we also evaluated the classifiers without using the warped parameters. In this case, all of the reference values α_r were set to one, regardless of the objects being manipulated, and the relative positions of the objects were defined by their centers.

The results of the pouring and filling experiments can be seen in Fig. 3. As one would expect, the classifiers generally achieved similar levels of performance when evaluated on the

training object. The standard features performed considerably worse in the filling experiment, as different cups were used for pouring even though the second container remained the same. The ROC curves show that the performance of all three classifiers decreases when generalizing to novel objects. However, the drop in performance is considerably smaller when using the warped parameters. The features based on the warped parameters are therefore better at separating the positive and negative examples across different objects. While the two warping methods performed similarly well in the filling experiment, the label mapping approach performed better in the pouring experiment, detecting more than 50% of the true positives with almost no false positives. The results show that the warped parameters can be used to reliably generalize the constraints of the pouring task between different containers.

B. Robot Pouring Experiment

In the second experiment, the robot used warped parameters to generalize pouring actions between different objects. The robot was provided with ten demonstrations of a pouring task using kinaesthetic teaching. All of the demonstrations were performed with the same two objects, shown in the left picture of Fig. 5. For safety reasons, the task was performed with gel balls rather than an actual liquid. The cup was half full at the start of each trial. Using the ten demonstrations, the robot learned a ProMP for pouring, as described in Section III-B. The learned distribution over trajectories is shown in Fig. 4. The robot was then given the task of pouring with different objects. The robot successfully learned to pour from a shorter cup into a bigger bowl, a smaller cup, and a square bowl, as shown in Fig. 5. Only a couple of gel balls were spilled during the experiments. A video of the robot performing the pouring task is also provided in the supplementary material.

As the cups were half full, pouring usually commenced when the tilt value went above one. Fig. 4 shows that the distribution over trajectories remains safely below this value until the lip point is above the opening. When moving the cup back, most of the liquid has been poured out, and hence the cup can be tilted more. The pictures in Fig. 5 show that the cup was often placed close to the rim of the second container, which indicates that the robot was able to adapt the learned trajectory to the geometry of the object being manipulated.

C. Future Work

The autonomy of the robot can be increased by learning warped parameters. The points could be labelled by using an unsupervised approach to segmenting the objects into parts. A set of generic geometric functions could then be applied to each part in order to generate a library of warped parameters. Feature selection methods could then be applied to select a suitable set of warped parameters for the given task.

The focus of this paper was on learning pouring skills from a single object. The generalization between objects therefore relies on using the warped parameters to construct dimensionless features for the robot.


[Fig. 5 panels, left to right: Demonstration (Pouring Task); Autonomous Generalization (Larger Container, Smaller Container, Square Container).]
Fig. 5. The pictures show the key results of the real robot experiment. The robot was provided with multiple demonstrations of the pouring task using kinaesthetic teaching, as shown on the left. Using the warped parameters approach, the robot successfully generalized the demonstrated actions to novel objects with different shapes and sizes, as shown on the right.

However, given data from multiple objects, the robot can also learn how to generalize between objects. In this case, the warped parameters could be treated as separate features that describe the object. For example, ProMPs can be used to learn the correlations between the trajectory parameters and the warped object parameters. Object-specific trajectories can be obtained by conditioning on the current object parameters. This approach would even allow the robot to learn that only some segments of the trajectory depend on the object parameters. However, learning ProMPs in this manner would require additional training trajectories with different objects. These trajectories could be obtained from human demonstrations, or by adapting trajectories using reinforcement learning [12].

V. CONCLUSION

We proposed using warped parameters to generalize pouring skills between different objects. Warped parameters are functions defined on the point cloud of a known object. The parameters can be computed for a novel object by warping the known object's point cloud to match the geometry of the novel object.

The proposed method was successfully evaluated both in simulation and on a real robot pouring task. The experiments showed that the warped parameters can be used to generalize task constraints and motor primitives between containers of different shapes and sizes.

ACKNOWLEDGEMENTS

The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7-ICT-2013-10) under grant agreements 248273 (GeRT) and 610878 (3rdHand). The authors would also like to thank Ullrich Hillenbrand for the insightful discussions.

REFERENCES

[1] G. Bartels, I. Kresse, and M. Beetz, "Constraint-based movement representation grounded in geometric features," in International Conference on Humanoid Robots, (Atlanta, Georgia, USA), 2013.
[2] M. Tenorth, S. Profanter, F. Balint-Benczedi, and M. Beetz, "Decomposing CAD models of objects of daily use and reasoning about their functional parts," in International Conference on Intelligent Robots and Systems, pp. 5943–5949, 2013.
[3] U. Hillenbrand and M. Roa, "Transferring functional grasps through contact warping and local replanning," in International Conference on Intelligent Robots and Systems, pp. 2963–2970, Oct 2012.
[4] H. Ben Amor, O. Kroemer, U. Hillenbrand, G. Neumann, and J. Peters, "Generalization of human grasping for multi-fingered robot hands," in International Conference on Intelligent Robots and Systems, 2012.
[5] R. Jäkel, Learning of Generalized Manipulation Strategies in Service Robotics. PhD thesis, Institut für Anthropomatik, Karlsruhe, 2013.
[6] R. Jäkel, S. Schmidt-Rohr, S. Rühl, A. Kasper, Z. Xue, and R. Dillmann, "Learning of planning models for dexterous manipulation based on human demonstrations," International Journal of Social Robotics, vol. 4, no. 4, pp. 437–448, 2012.
[7] J. Schulman, J. Ho, C. Lee, and P. Abbeel, "Learning from demonstrations through the use of non-rigid registration," in International Symposium on Robotics Research, 2013.
[8] P. Pastor, H. Hoffmann, T. Asfour, and S. Schaal, "Learning and generalization of motor skills by learning from demonstration," in International Conference on Robotics and Automation, 2009.
[9] M. Muehlig, M. Gienger, S. Hellbach, J. J. Steil, and C. Goerick, "Task-level imitation learning using variance-based movement optimization," in International Conference on Robotics and Automation, (Kobe, Japan), pp. 1177–1184, IEEE, May 2009.
[10] L. Rozo, P. Jimenez, and C. Torras, "Force-based robot learning of pouring skills using parametric hidden Markov models," in Robot Motion and Control, pp. 227–232, July 2013.
[11] O. Kroemer, E. Ugur, E. Oztop, and J. Peters, "A kernel-based approach to direct action perception," in International Conference on Robotics and Automation, 2012.
[12] M. Tamosiunaite, B. Nemec, A. Ude, and F. Wörgötter, "Learning to pour with a robot arm combining goal and shape learning for dynamic movement primitives," Robotics and Autonomous Systems, 2011.
[13] A. W. F. Lee, D. Dobkin, W. Sweldens, and P. Schröder, "Multiresolution mesh morphing," in Annual Conference on Computer Graphics and Interactive Techniques, (New York, NY, USA), pp. 343–350, 1999.
[14] F. Steinke, B. Schölkopf, and V. Blanz, "Learning dense 3D correspondence," in Advances in Neural Information Processing Systems, pp. 1313–1320, MIT Press, 2006.
[15] U. Hillenbrand, "Non-parametric 3D shape warping," in International Conference on Pattern Recognition, pp. 2656–2659, IEEE, 2010.
[16] O. Kroemer, H. van Hoof, G. Neumann, and J. Peters, "Learning to predict phases of manipulation tasks as hidden states," in International Conference on Robotics and Automation, 2014.
[17] A. Paraschos, C. Daniel, J. Peters, and G. Neumann, "Probabilistic movement primitives," in Advances in Neural Information Processing Systems, 2013.
[18] Z. Chen, N. Y. Lii, T. Wimboeck, S. Fan, M. Jin, C. Borst, and H. Liu, "Experimental study on impedance control for the five-finger dexterous robot hand DLR-HIT II," in International Conference on Intelligent Robots and Systems, pp. 5867–5874, IEEE, 2010.
[19] S. Dragiev, M. Toussaint, and M. Gienger, "Gaussian process implicit surfaces for shape estimation and grasping," in International Conference on Robotics and Automation, 2011.
[20] "Bullet physics library," October 2013.
[21] "Bullet-fluids project," October 2013.