
Conditional Neural Movement Primitives

M. Yunus Seker∗, Mert Imre∗, Justus Piater† and Emre Ugur∗

∗Computer Engineering Department, Bogazici University, Istanbul, Turkey
Email: {yunus.seker1, mert.imre, emre.ugur}@boun.edu.tr

†Department of Computer Science, Universität Innsbruck, Austria
Email: [email protected]

Abstract—Conditional Neural Movement Primitives (CNMPs) is a learning-from-demonstration framework designed as a robotic movement learning and generation system built on top of a recent deep neural architecture, namely Conditional Neural Processes (CNPs). Following CNPs, CNMPs extract prior knowledge directly from the training data by sampling observations from it, and use this knowledge to predict a conditional distribution over any other target points. CNMPs specifically learn complex temporal multi-modal sensorimotor relations in connection with external parameters and goals; produce movement trajectories in joint or task space; and execute these trajectories through a high-level feedback control loop. Conditioned with an external goal that is encoded in the sensorimotor space of the robot, the CNMP generates the sensorimotor trajectory that is expected to be observed during successful execution of the task, and the corresponding motor commands are executed. In order to detect and react to unexpected events during action execution, the CNMP is further conditioned with the actual sensor readings at each time step. Through simulations and real robot experiments, we show that CNMPs can learn the non-linear relations between low-dimensional parameter spaces and complex movement trajectories from few demonstrations, and that they can also model the associations between high-dimensional sensorimotor spaces and complex motions using a large number of demonstrations. The experiments further show that even when the task parameters were not explicitly provided to the system, the robot could learn their influence by associating the learned sensorimotor representations with the movement trajectories. The robot, for example, learned the influence of object weights and shapes by exploiting its sensorimotor space, which includes proprioception and force measurements, and was able to change the movement trajectory on the fly when one of these factors was changed through external intervention.

I. INTRODUCTION

Acquiring an advanced robotic skill set requires a robot to learn complex temporal multi-modal sensorimotor relations in connection with external parameters and goals. The learning from demonstration (LfD) framework [1] has been proposed in robotics as an efficient and intuitive way to teach such skills to robots: the robot observes, learns, and reproduces the demonstrated behavior. While teaching a skill, how the task is influenced by different factors is generally obvious to humans, yet this knowledge is mostly hidden in the robot's experienced sensorimotor data and difficult to extract autonomously. Developing feature extraction and learning methods that are sufficiently general and flexible for a broad range of robotic tasks therefore remains an important challenge in robotics. In order to deal with the large variety of tasks that are influenced by factors defined at different levels of abstraction, we require a single framework that can learn feature-movement and raw-data-movement associations from small and large numbers of demonstrations, respectively, while automatically filtering out irrelevant information.

Another challenge in LfD is dealing with multiple trajectories: multiple demonstrations might be required to teach a skill either because different action trajectories are required in different situations, or simply because there are multiple ways to achieve the same task even in the same setting. The robot, in turn, needs to capture the important characteristics of the observations coming from several demonstrations, and it must be able to reproduce the learned skill in new configurations while reacting to unexpected events, such as external perturbations, on the fly. For this, the robot needs to develop an understanding of whether and how the sensorimotor data and externally provided parameters are related to each other and to the motor commands.

While one or more of the above properties were addressed by existing movement frameworks [14, 15, 6, 2, 16, 4, 13], none of these approaches handles all of these requirements in one framework. In this paper, we propose Conditional Neural Movement Primitives (CNMP), a robotic framework that is built on top of a recent neural network architecture, namely conditional neural processes (CNP), to encode movement primitives with the identified functionalities. Given multiple demonstrations, CNMP encodes the statistical regularities by learning about the distributions of the observations from reasonably few input data. A conditioning mechanism is used to predict a sensorimotor trajectory given external goals and the current sensorimotor information. Conditioning can be applied over the learned sensorimotor distribution using any set of variables, such as joint positions, visual features, or haptic feedback, at any time point. For example, for a learned grasp skill, the system might be queried to predict and generate hand and finger motion trajectories conditioned on the color of the object, the weight measured at the wrist joint, and the target aperture width of the fingers. Given low- or high-dimensional input, CNMP can utilize simple MLPs or convolution operations, respectively, to encode the correlations. Such networks allow the system to automatically extract the features that are relevant to the task. Finally, the predicted trajectory is realized by generating the corresponding actuator commands. CNMP can also learn multiple modes of operation from multiple demonstrations of a movement primitive. Importantly, CNMP specifically produces movement trajectories in joint or task space, and generates the corresponding motor commands


to achieve these trajectories through a high-level feedback control loop. The encoded sensorimotor representations are continuously conditioned on the feedback coming from the sensory readings, enabling the system to ensure a match between predicted and actual trajectories and to respond to unexpected events in its sensorimotor space.

II. RELATED WORK

LfD [1] has been applied to various robotic learning problems including object grasping and manipulation [6, 2, 16, 4, 13]. Among others, learning methods based on dynamical systems [19] and statistical modeling [5] have been popular in recent years. Dynamic Movement Primitives (DMPs) [19] encode the demonstrated trajectory as a set of differential equations and offer advantages such as one-shot learning of non-linear movements, real-time stability and robustness under perturbations with guarantees on reaching the goal state, generalization of the movement to different goals, and linear combination of parameters. The parameters of the system can be learned with algorithms such as Locally Weighted Regression (LWR) [3] and Locally Weighted Projection Regression [22]. Statistical modeling techniques, on the other hand, use probabilistic models such as Gaussian Mixture Models [6] or Hidden Markov Models [12] to encode the motion from multiple demonstrations, where the statistical regularities are learned from the spatiotemporal data.

Zhou et al. [23] used a simple 2D obstacle avoidance task to show that the standard DMP approach is not designed to learn from multiple trajectories and therefore cannot encode the important parts of multiple demonstrations. They developed a method that performs LWR with a new objective function that depends on the environment restrictions and parameters, and that uses a model-switching algorithm to be able to train over the whole parametric space. Ude et al. [20] used LWR and GPR to compute new DMP-based trajectories given parameters that describe the characteristics of the action, such as the goal point. Their method generalizes to complex trajectories for both periodic and discrete movements. Pervez et al. [18] learned a separate GMM for each demonstration and mixed the separately learned GMMs to generate trajectories for new task parameter values.

In order to enable physical collaboration, the robot is required to learn haptic feedback models that encode the default force feedback expected during execution of the corresponding skill. Memorized force and tactile profiles have already been utilized successfully to modulate learned DMPs in difficult manipulation tasks with high perceptual noise, such as grasping and in-hand manipulation of objects from incorrect positions, or flipping boxes using chopsticks [17]. HMMs were used to learn multi-modal models from temperature, pressure, and fingertip information for exploratory object classification tasks [7]; and PHMMs were used to learn haptic feedback trajectory models to adapt actions in response to external perturbations [10, 21]. Deep networks [8] were used to learn multi-modal models from different sensory information such as temperature, pressure, fingertip, contacts, proprioception, and speech; however, these models were used only to categorize the sensory data, without any effect on action execution. More recently, the same problem, generalization of force/torque profiles, was investigated for contact tasks [11], where these profiles were modeled by Locally Weighted Regression (LWR), which has local generalization capabilities and is hence successful at intermediate query points.

III. METHOD

Fig. 1: Structure of the CNP, adapted from [9].

A. Background

Until recently, most neural networks in deep learning were trained to approximate a single output function. However, when the data follow a distribution that can be represented by statistical models such as Gaussians, networks that approximate a single function cannot represent the underlying model. Instead, the network can be built as a probabilistic approximator that predicts the distribution parameters, namely mean and variance, rather than producing one single output value. Garnelo et al. [9] proposed a new family of neural models that processes the training data following Bayesian inference principles by combining the advantages of neural networks with Gaussian Processes. The proposed method is able to extract prior knowledge directly from the training data by sampling observations from it, and to use this information to predict a conditional distribution over any other target points. The general structure of the model is shown in Figure 1. The model consists of three parts: (1) A parameter-sharing Encoder Network E is used to encode all sampled observations into fixed-size representations ri in a latent space. Notice that the number and order of the incoming observations can change during both training and test time, because empirical observations can change dynamically. (2) The individual representations are aggregated into one general representation r that covers all the observations sampled at the beginning. Therefore, according to the prior knowledge r, the model can predict a distribution over the target input(s) by conditioning on the observations. (3) Finally, a Query Network Q produces distribution parameters (µq, σq) over the query input xq by using the general representation r as a prior. In summary, the CNP model can be formulated as

µq, σq = Qθ( xq ⊕ (1/n) Σi Eφ((xi, yi)) )

where {(xi, yi)}, i = 0, ..., n, are observation pairs sampled from the data, Eφ is the encoder network with parameters φ, and xq is the target input, i.e., the query over which we want the model to predict a Gaussian distribution.


Fig. 2: Training and test procedures of Conditional Neural Movement Primitives. Refer to Section III for more details.

Here ⊕ is the concatenation operator, and Qθ is the query network with parameters θ that outputs the Gaussian distribution parameters (µq, σq). The whole model is trained as a stochastic process: at each training iteration, n random observation pairs {(xi, yi)}, with n ∈ [1, nmax], and one query point (xq, yq) are sampled from the training data. nmax is a hyperparameter giving the maximum number of observations that can be used during the training process. At the end of each iteration, the neural network parameters (θ and φ) of both the Encoder and the Query network are optimized according to the following loss function:

L(θ, φ) = − log P(yq | µq, softmax(σq))   (1)

where µq and σq are the outputs of the CNP, yq is the ground-truth output corresponding to xq in this training iteration, and P is the Gaussian probability function that returns the conditional probability of yq for the given mean and variance parameters.
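For concreteness, a minimal sketch of this encode-aggregate-query structure and of the loss in Eq. 1 is given below. It is an illustrative PyTorch implementation rather than the authors' exact code: the layer sizes are assumptions, and a softplus is used here as the positivity transform on the predicted standard deviation.

# Illustrative sketch of a CNP: parameter-sharing encoder E, mean aggregation,
# and query network Q, plus the negative log-likelihood loss of Eq. (1).
# Layer sizes and the softplus positivity transform are assumptions.
import torch
import torch.nn as nn

class CNPSketch(nn.Module):
    def __init__(self, x_dim=1, y_dim=1, r_dim=128):
        super().__init__()
        # Encoder E_phi: (x_i, y_i) -> r_i
        self.encoder = nn.Sequential(
            nn.Linear(x_dim + y_dim, 128), nn.ReLU(),
            nn.Linear(128, r_dim))
        # Query network Q_theta: (x_q concatenated with r) -> (mu_q, sigma_q)
        self.query = nn.Sequential(
            nn.Linear(x_dim + r_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * y_dim))

    def forward(self, x_obs, y_obs, x_q):
        r_i = self.encoder(torch.cat([x_obs, y_obs], dim=-1))   # (n, r_dim)
        r = r_i.mean(dim=0, keepdim=True)                        # aggregate to r
        h = torch.cat([x_q, r.expand(x_q.size(0), -1)], dim=-1)
        mu, sigma = self.query(h).chunk(2, dim=-1)
        return mu, sigma

def cnp_loss(mu_q, sigma_q, y_q):
    # Eq. (1): negative log-probability of y_q under N(mu_q, std^2);
    # softplus keeps the predicted standard deviation positive.
    std = nn.functional.softplus(sigma_q) + 1e-6
    return -torch.distributions.Normal(mu_q, std).log_prob(y_q).mean()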

B. Proposed Method: Conditional Neural Movement Primitives

Conditional Neural Movement Primitives (CNMP) is a learning from demonstration framework designed as a robotic movement learning and generation system built on top of CNPs. It specifically produces movement trajectories in joint or task space and executes these trajectories through a high-level feedback control loop. The encoded sensorimotor representations are conditioned externally to constrain the movement and set goals. Additionally, as detailed later, CNMP is also continuously conditioned on the current sensory readings, predicting the desired sensorimotor values based on these readings. This allows the system to detect unexpected events and to respond to them by changing the movement trajectory on the fly.

A CNMP is trained for each skill that is taught to the robot using one or more demonstrations in possibly different configurations. In each demonstration, the robot stores the actuator positions, the perceptual features, and the corresponding time points. Therefore, each demonstration Dj is stored as a trajectory of sensorimotor features and the corresponding time points, i.e. {(t, SM(t))} for t = 0, ..., τ, where SM(t) is the observed sensorimotor data at time t. SM can encode any information, including task- and joint-space positions of the robot arm and gripper, force/torque measurements, low-dimensional environment properties such as the size of manipulated objects, or a high-dimensional world state such as a top-down 2D raw image mask of the environment. In the rest of this section, for simplicity and clarity, SM is described as a one-dimensional vector that might be flexibly referred to as sensory or motor data at different points.
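As a small illustration of this storage scheme, a demonstration recorded at a uniform rate could be packed as below, with the time column normalized to [0, 1] as described in the training procedure; the function name and array layout are our own and not part of the paper.

# Sketch of storing one demonstration D_j as {(t, SM(t))}: times are scaled
# to [0, 1] and paired with the sensorimotor vector recorded at each step.
import numpy as np

def make_demo(sm_trajectory):
    """sm_trajectory: array of shape (T, d_SM), one SM vector per time step."""
    T = len(sm_trajectory)
    t = np.linspace(0.0, 1.0, T).reshape(-1, 1)       # normalized time column
    return np.hstack([t, np.asarray(sm_trajectory)])   # shape (T, 1 + d_SM)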

CNMP training and trajectory reproduction steps are illustrated in Fig. 2 and detailed in the next subsections. Recall that although SM is displayed as a one-dimensional vector, it is normally a multi-dimensional vector that includes any sensory or motor information. Therefore, CNMP is designed to enable learning from, and constraining in, any set of dimensions in the SM space.

Learning from Demonstrations using CNMP: For each demonstration, time invariance is first ensured by scaling the time values into [0, 1]. In each training iteration, a demonstration Dj is selected randomly from a uniform distribution. From the chosen demonstration, n observation tuples {(ti, SM(ti))} ∈ Dj and one target query (tq, SM(tq)) ∈ Dj are sampled uniformly. n is a random number in [0, nmax], where nmax is a hyper-parameter that sets the maximum number of samples collected from the chosen demonstration by CNMP. This allows our framework to be flexible with respect to the number of observations available at test time. Fig. 2.1c illustrates the neural network operations for an example of four observation points.


Fig. 3: CNMP execution loop with feedback

Each observation tuple goes through the parameter-sharing Encoder Network (E), which produces the representation of the corresponding observation in a fixed-size latent space. Using a mean operator, the individual representations are merged into one general representation r. The general representation r and the target query time tq are concatenated and given to the Query Network (Q) as input. As the final step of the neural network, the Query Network predicts the distribution parameters (µq, σq) of SM for the target time tq. Figure 2.1d shows the predicted Gaussian distribution and the actual SM(tq) of the robot at the queried time. The error between the predicted and actual SM values is calculated using the loss function in Eq. 1 for each query, and the computed errors are back-propagated to train the weights of the network. Task-specific external parameters (γ) can also be included in the encoding of the movement primitive by concatenating them directly with the observation and query data and providing them to the Encoder and Query Networks. Such external parameters can be regarded as part of the sensory data; therefore, for simplicity, we will not provide a separate explanation for handling these parameters in the rest of this section.
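A sketch of how one such training iteration might sample its inputs is given below; the function and variable names are illustrative, n is drawn here between 1 and nmax, and demonstrations are assumed to be stored as arrays with the normalized time in the first column.

# Illustrative sampling for one CNMP training iteration: pick a random
# demonstration, draw n observation tuples and one target query from it.
import numpy as np

def sample_training_case(demos, n_max=5):
    d = demos[np.random.randint(len(demos))]            # random demonstration D_j
    n = np.random.randint(1, n_max + 1)                 # number of observations
    obs_idx = np.random.choice(len(d), n, replace=False)
    q_idx = np.random.randint(len(d))                   # one target query
    obs = d[obs_idx]                                    # rows of (t_i, SM(t_i))
    t_q, sm_q = d[q_idx, :1], d[q_idx, 1:]              # query time, target SM
    return obs, t_q, sm_q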

Reproducing movement primitives: After learning from demonstrations, the CNMP is used to generate the SM trajectory that is predicted to satisfy the externally provided goals given the current sensorimotor information, where both the goals and the sensorimotor information are represented by sets of observation tuples. The external goals are set to the desired sensorimotor values at the desired time points ({(ti, SM(ti))}). The means of identifying and querying the goals is extremely flexible (2.2e). For example, the goals can refer to the starting, intermediate, or final positions of the movement trajectory, and be set once to generate and execute the desired trajectory. As another example, the goals can be set as the desired force/torque readings expected to be observed during movement generation. The current sensorimotor information, on the other hand, corresponds to the SM at the current time point tc: ({(tc, SM(tc))}). Continuously querying with the current SM enables the system to detect and respond to unpredicted events, such as external perturbations, that generate a discrepancy between the actual and predicted sensorimotor values. In case such a prediction error is detected, the movement trajectory is changed on the fly and executed as detailed in the next subsection. This effectively corresponds to autonomously reacting to external perturbations by exploiting the encoded multi-modal distributions.¹

Figure 2.2f illustrates an example case where the goal is set as two test-time observations that correspond to the desired start and end positions of the movement. These goal observations are given to the trained CNMP model, and the general representation r is produced after the Encoding and Averaging operations in the neural network (Fig. 2.2g). This general representation r holds the compact information of all the given goal observations and thus guides the Query network to produce movement primitive distributions that satisfy all the desired goals/constraints at once. To extract the whole movement trajectory, all time steps ti in [0, 1] are concatenated with the pre-computed general representation r, and the set of mean and variance values for the whole trajectory is predicted using the Query network. Figure 2.2h shows the predicted trajectory distribution that satisfies the given goals. The predicted trajectory is sent to the controller of the robot to be executed as detailed next.

Execution and Online Adaptation of the generated movement: The CNMP-driven control loop, which aims to bring the robot to the sensorimotor state predicted by the CNMP at each time step, is shown in Fig. 3. In each time step, the environment is perceived through the sensors of the robot, and the current SM is provided, along with the externally defined goals, to the CNMP for generation of the sensorimotor values that are expected to be observed while achieving the task. In an online control loop, the CNMP predicts the SM(t) configurations for the next time steps, t = tc+1, ..., tn, and continuously produces motor commands to achieve the predictions. In case the predicted SM and the current SM do not match, i.e., the current SM does not match the SM that is predicted for the corresponding skill, time stops advancing until the error drops below a threshold. With this constantly updated online observation loop, and thanks to the low computational complexity of the CNMP at test time, our system can react to perturbations in the sensor measurements.

Note that the CNMP does not impose any strict requirements on the controller side. In its current form, the motor variables in the generated SM are set as the desired motor commands at each time step and executed using a PD controller. The fact that the output of the CNMP is a distribution allows the robot to move freely within the boundaries of the predicted distribution. This flexibility can potentially be exploited to achieve the given task with alternative trajectories. Another option is to use the variance information to reason about which segments of the trajectory should be followed stiffly and accurately, automatically adjusting the compliance based on the variance of the distribution.
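As an illustration of this last point, a PD command whose stiffness is attenuated where the predicted variance is high could look like the following; the gains and the scaling rule are assumptions for illustration, not part of the paper.

# Sketch of PD tracking of the predicted motor values, with an optional
# variance-based compliance term; gains and scaling rule are illustrative.
import numpy as np

def pd_command(q_des, q, q_dot, sigma=None, kp=40.0, kd=5.0):
    if sigma is not None:
        kp = kp / (1.0 + np.asarray(sigma))   # softer where the prediction is uncertain
    return kp * (q_des - q) - kd * q_dot      # motor command toward the predicted target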

¹ The complexity of trajectory generation is O(no + nq), where no and nq are the numbers of observations and queried time points, respectively. A single query takes 0.9 ms on an Intel Core i7 processor with a GeForce GTX 1050 GPU, so the robot can quickly adapt to changes in the environment at test time.


Fig. 4: 2D object avoidance experiment and results. The demonstrations are shown as colored trajectories. The conditioning points and the generated movement distributions are shown as black dots and gray areas, respectively.

IV. EXPERIMENTAL RESULTS

The performance, capabilities, and limitations of the CNMP were studied in two simulated and two real robot experiments. In Section IV.A, the CNMP and another widely used method that encodes the distribution of the demonstrations, namely ProMP [15], were compared in learning a simulated 2D obstacle avoidance task where different modes of avoidance were demonstrated. In Section IV.B, the capacity of the CNMP to learn in a high-dimensional SM space was investigated in a 2D obstacle avoidance task where the system learns to plan movement trajectories that avoid two obstacles of varying sizes, using top-down image masks of the environment. In Section IV.C, the performance of the CNMP in learning non-linear environment-trajectory relations on a real robot manipulator was investigated by analyzing its generalization capability to environment configurations inside and outside the range of the demonstrations. In Section IV.D, the capability of the CNMP model to react to external perturbations detected in the proprioception and force data was verified on the real robot, exploiting the learned multi-modal information.

A. Learning of different modes of operation of the same skill

The aim of this experiment is to investigate whether, and under which conditions, the models can automatically discover the categories inherent to the demonstrated primitive and reproduce new avoidance trajectories taking this knowledge into account. For this purpose, the capabilities of the CNMP and ProMP models were studied in a 2D object avoidance task where the demonstrated trajectories passed around either one side or the other side of the obstacle, and the side being passed was not explicitly provided to the system. The task and the demonstrated trajectories are shown in Fig. 4. Each trajectory started from the left side (t=0, (x=0, y=y0)), passed around the upper or lower side of the obstacle at time point t=0.5, and ended up on the right side at a symmetric position (t=1, (x=1, y=y0)). The 6 trajectories shown in the

Fig. 5: Left: the 2D obstacle avoidance setup used in the experiments on learning from top-down images; an example environment and demonstration trajectory are shown. Right: the structure of the Convolutional CNMP.

figure, each with 300 time steps, were used to train both models. The ProMP model² was trained with 31 basis functions, setting the σ value to 0.05. The CNMP, on the other hand, was trained by sampling a number of observations {(ti, (xi, yi))}, with n ∈ [1, 5], from a randomly selected demonstration in each iteration. The Encoder and Query networks were structured as 2 fully connected layers with 128 neurons each. The neural network was trained using the Adam optimizer with an empirically found learning rate of 0.0001.

After training, new avoidance trajectories were generated by setting goals, i.e., by conditioning the models at different points. The obtained results are presented in Fig. 4. The upper and lower rows in this figure correspond to the results obtained by ProMP and CNMP, respectively. In each plot, the black dot shows the conditioned position, and the black line and the shaded area correspond to the mean and variance of the predicted movement distribution computed for that conditioning. As shown in the first column, when the models were conditioned at a position that ensures avoidance, both models generated distributions and trajectories that enabled the system to avoid the obstacle. On the other hand, when the conditioning was set to other positions, such as an intermediate point (t=0.2, (x=0.2, y=-0.37)) or an end point (t=1, (x=1, y=0.13)), ProMP started failing whereas CNMP continued generating movement trajectories that avoid the invisible obstacle. These results show that CNMP learned to produce multiple distributions by observing different categories of the same primitive from the data itself, without any explicit parameterization. From this, we conclude that CNMP allows learning and encoding different modes of operation of the same movement primitive, and reliable and flexible reproduction of the trajectories, automatically following one of the distinct modes.

B. Movement primitive learning in high-dimensional sensorimotor space from top-down raw image masks

In this section, the capability of CNMPs to learn directly from top-down raw image masks is studied in a simulated experimental setting.

² Our implementation and training data for the experiments can be found at https://github.com/rssCNMP/CNMP. To compare our framework with ProMP, the ProMP library of the Flowers INRIA Laboratory was used for learning and generating the movement primitives; see https://github.com/flowersteam/promplib.


Fig. 6: C-CNMP conditioned with (a) start and final positions, (b) intermediate and final positions, and (c) one intermediate position.

Two obstacles of varying sizes were attached to opposite sides of a 2D room at different positions, as shown in Fig. 5. For each room configuration, a motion trajectory that moves the robot from the left to the right side of the room was generated with a simple heuristic. The motion trajectories were presented as demonstrations, along with the raw images, to train the CNMP model. A convolution operation was included in the model (Fig. 5) to deal with the raw images. Convolution was performed by a neural network that consists of 4 convolution layers, each followed by a max-pooling operation, with channel numbers 8, 16, 32, and 64. In the end, the SM vector was composed of the 2D position, the image mask encoding (m), and the corresponding time point.
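A sketch of such an image-mask encoder is given below: four convolution layers with 8, 16, 32, and 64 channels, each followed by max-pooling, flattened into a compact encoding m that is concatenated with time and position in the SM vector. The kernel sizes and the final embedding dimension are assumptions, not reported values.

# Illustrative convolutional encoder for the top-down image masks.
import torch
import torch.nn as nn

class MaskEncoder(nn.Module):
    def __init__(self, m_dim=32):
        super().__init__()
        layers, in_ch = [], 1
        for out_ch in (8, 16, 32, 64):
            layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1),
                       nn.ReLU(), nn.MaxPool2d(2)]
            in_ch = out_ch
        self.conv = nn.Sequential(*layers)
        self.fc = nn.LazyLinear(m_dim)        # flatten to the mask encoding m

    def forward(self, img):                    # img: (B, 1, H, W) top-down mask
        h = self.conv(img)
        return self.fc(h.flatten(1))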

The CNMP model, after being trained with 1000 trajectories (in configurations with random start/end positions and randomly placed and sized obstacles), was queried to generate trajectories under different conditions. The conditions were encoded in SM vectors that include time, position, and image information. Fig. 6 shows the trajectories generated by the CNMP conditioned on (a) start and end positions, (b) intermediate and end positions, and (c) an intermediate position, for a number of test configurations that were not experienced before. As shown, the trained CNMP could successfully generate movement trajectories avoiding the red walls in all configurations. Interestingly, when the CNMP was provided only an intermediate pose at a specified time, it changed the start and end positions, shifting the movement if necessary (Fig. 6c). These results indicate that the CNMP framework was successful in movement primitive learning in a high-dimensional sensorimotor space and that it is a promising framework for end-to-end learning of complex trajectories.

Fig. 7: Experimental setup and snapshots from demonstrations in the obstacle avoidance task, where heights were provided as parameters.

Fig. 8: Qualitative results for three selected joints in novel interpolation (a, b) and extrapolation (c) cases.


C. Task Parameterization and Generalization in a Real Robot

This experiment investigates the capability of the CNMP

in learning non-linear environment-movement relations in an object pick-and-place task that requires obstacle avoidance. A UR10 robotic arm equipped with a Robotiq gripper and a wrist-mounted force/torque sensor was used for this purpose. After the robot was provided with a number of demonstrations in different environments, we analyzed its generalization capabilities in novel configurations. The heights of the manipulated object and of the static obstacle were provided as task parameters (γ), and the joint angles were included in the SM vector. The task consisted of first moving the gripper to a suitable position to pick up the object according to its height, and then moving it towards the target position following an arc that avoids the obstacle. The experimental setup and snapshots from an example demonstration are provided in Fig. 7.


Fig. 9: The experimental setup for the pick-and-place task, where the placement position depends on the weight and shape of the container. On the left, the movement trajectories for different configurations are shown: the container is reached and picked up (1), lifted (2), and carried toward the target position (3). On the right: demonstration SM data.

The CNMP was trained with 8 movement trajectories obtained in different object-obstacle configurations. The heights of the object and the obstacle were provided as parameters: γo = (2, 6), γh = (2, 4, 6, 8). The non-linear relationship between the joint values and γ can be observed in Fig. 8, which shows the demonstrated trajectories in gray; it is especially visible in the elbow and shoulder joints.

The generalization capacity of the system to novel object-obstacle settings was tested in two different conditions, where the test configurations were sampled from inside and from outside the range of the demonstrations. Both conditions were tested in three different environments. In the interpolation condition, the robot generated a movement trajectory in joint space that successfully picked and placed the object while avoiding the obstacle in all three tasks. The predicted and executed trajectories were compared against ground-truth trajectories, and the results of two example test cases are provided in Fig. 8(a, b). The gray lines indicate the demonstrations used for training, the blue dashed line indicates the ground truth, and the red line is the prediction of the system. The average absolute errors, measured in degrees, are summarized in Table I, where a star indicates a novel condition. The maximum average absolute error along the trajectories was 0.28 degrees. Considering these results, we conclude that the prediction mechanism of the CNMP can generalize to novel situations sampled inside the training distribution.

TABLE I: Average joint errors (in degrees) between predicted and ground-truth trajectories. The upper three and lower three rows show the errors in the interpolation and extrapolation conditions, respectively.

Task Parameters       Base   Shoulder  Elbow   Wrist1  Wrist2  Wrist3
Obj: 3*  Obs.: 3*     0.23   0.271     0.173   0.232   0.064   0.254
Obj: 3*  Obs.: 6      0.158  0.167     0.145   0.163   0.061   0.126
Obj: 4*  Obs.: 7*     0.099  0.101     0.269   0.280   0.055   0.093
Obj: 7*  Obs.: 9*     0.719  0.895     3.569   3.687   0.156   0.665
Obj: 8*  Obs.: 10*    1.038  2.325     8.879   8.094   0.360   0.830
Obj: 9*  Obs.: 11*    1.861  5.380     17.978  14.894  0.590   1.386

In the extrapolation cases, our motivation was to gradually increase the difficulty and evaluate the limits of the system. For this purpose, the heights of the object and of the obstacle were gradually increased. The task was successfully executed in the γ = (7, 9) parameter configuration, but the robot started failing when the object and wall heights exceeded γ = (8, 10). Fig. 8(c) shows the prediction for one of the generated trajectories, and the bottom three rows of Table I present the average absolute errors for the extrapolation cases. As the results show, the performance of trajectory prediction in novel extrapolation cases decreased significantly, and increasingly so as the cases moved further outside the boundaries of the training range.

D. Movement Primitive Classification and Adaptation to Perturbations in a Real Robot

This experiment aimed to demonstrate the movement generation capability of the CNMP, taking into account the categorical semantics included in the demonstrations, and its capacity to adapt in response to external perturbations. The UR10 robot arm, Robotiq gripper, and force/torque sensor system was used to teach a pick-and-place task applied to a container placed on a table. The task was designed with the following schedule: the object was picked up from the middle of the table and placed at one of four target locations depending on the type of the container and its weight. Two different containers were used with different amounts of weight, as shown in Fig. 9, resulting in four different types (2 containers x 2 weight categories) of demonstrations. 5 different weights were used for each weight category; therefore, 4x5=20 demonstrations were provided to the system for training, following the trajectories shown with the dashed lines in the same figure. The SM vector included the 3D position of the gripper, the joint positions of the fingers, and the force/torque readings. In testing, given a container (a cube with a low weight) unknown to the robot, the robot started the execution without any information on the conditions and updated its predictions online. Before interacting with the container, all four target points were equally likely for the system; therefore, the variance of the prediction started high, as shown in the middle-left plot in Fig. 11.


Fig. 10: Movement adaptation experiment. The robot generates the required movement trajectories in its end-effector and finger position spaces, automatically perceiving task-related features such as the weight and shape of the object through its sensors (proprioception and force/torque readings). https://youtu.be/cPKOIaf0mUc

The robot next reached the object and started interacting with it: as soon as it closed its fingers and started lifting the object, the feedback received from the container through proprioception and force/torque measurements automatically conditioned the robot to generate, from a low-variance distribution, a trajectory that ends up in the lower-bottom corner (the bottom-left plot in Fig. 11). During the movement of the robot, we stepped in and put more marbles inside the box held by the gripper, effectively increasing the weight of the cube carried by the robot. The updated SM conditioning automatically updated the distribution of the movement trajectory, setting the final point to the bottom-right region of the table (see the bottom-right plot in Fig. 11) and changing the motion of the end-effector towards that position through the PD controller. The advance of the time variable was suspended until the error between the predicted and actual SM vanished. Finally, the cube with the increased weight was placed at the position that was demonstrated for high-weight boxes. This experiment showed that although the task parameters (type and weight of the container) were not explicitly provided as parameters, the association between the movement trajectory and these parameters was learned in the SM space of the robot by associating finger proprioception values and force/torque readings with the movement trajectories of the different types. Furthermore, thanks to conditioning the system with the current SM in each time step, the system was shown to react effectively to changes that influence the task execution. Such a reaction was possible as the robot inferred that the target position depended only on the sensor measurements and was invariant to the current location of the robot.

V. CONCLUSION

Our CNMP method was shown to learn the non-linear relationships between environment parameters and complex action trajectories, to deal with raw input through convolution operations, to learn multiple modes of operation of the same primitive,

Fig. 11: Online conditioning and adaptation to unexpected changes during motion execution. Final position estimates are shown at 4 timestamps with a black dot and gray shading. The container is approached (1), grasped but not lifted (2), lifted and carried (3), and made heavier (4).

and to generalize to novel configurations if they are sampled from the experienced range. A wide range of relationships was shown to be automatically discovered and learned. In the first real robot pick-and-place task, the robot learned a continuous non-linear relation between the parameters and the action trajectories, whereas in the second robot experiment it learned that containers with different weights were required to be placed in the same final positions. In the second robot experiment, the factors that influenced task execution, such as the weight and shape of the object, were never provided to the robot. Still, based on its interaction experience, the robot successfully learned to generate the required movement trajectory by exploiting the learned encoding in its low-level sensorimotor space, which included proprioception and force readings. Furthermore, our continuous online conditioning with the current sensorimotor values enabled the robot to detect and respond to changes in the factors that influence task execution, on the fly. The PD controller enabled reacting to such external perturbations, though without any convergence or goal-achievement guarantees. In the future, we plan to study mechanisms that search in the execution history and in the encoding space of the sensorimotor representations, possibly rewinding the taken steps in order to successfully reach the new environment state by exploiting the learned multi-modal relations.

ACKNOWLEDGMENTS

This research has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 731761, IMAGINE. Also, thanks to Ahmet E. Tekden for early discussions.


REFERENCES

[1] Brenna D Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5):469–483, 2009.

[2] Tamim Asfour, Pedram Azad, Florian Gyarfas, and Rüdiger Dillmann. Imitation learning of dual-arm manipulation tasks in humanoid robots. International Journal of Humanoid Robotics, 5(02):183–202, 2008.

[3] Christopher G Atkeson, Andrew W Moore, and Stefan Schaal. Locally weighted learning for control. In Lazy Learning, pages 75–113. Springer, 1997.

[4] H. Ben Amor, Oliver Kroemer, Ulrich Hillenbrand, Gerhard Neumann, and Jan Peters. Generalization of human grasping for multi-fingered robot hands. In IROS, pages 2043–2050. IEEE, 2012.

[5] Sylvain Calinon. A tutorial on task-parameterized movement learning and retrieval. Intelligent Service Robotics, 9(1):1–29, 2016.

[6] Sylvain Calinon, Paul Evrard, Elena Gribovskaya, Aude Billard, and Abderrahmane Kheddar. Learning collaborative manipulation tasks by demonstration using a haptic interface. In International Conference on Advanced Robotics (ICAR), pages 1–6. IEEE, 2009.

[7] Vivian Chu, Ian McMahon, Lorenzo Riano, Craig G McDonald, Qin He, J. Martinez Perez-Tejada, Michael Arrigo, Naomi Fitter, John C Nappo, Trevor Darrell, et al. Using robotic exploratory procedures to learn the meaning of haptic adjectives. In ICRA, pages 3048–3055, 2013.

[8] Alain Droniou, Serena Ivaldi, and Olivier Sigaud. Deep unsupervised network for multimodal perception, representation and classification. Robotics and Autonomous Systems, 71:83–98, 2015.

[9] Marta Garnelo, Dan Rosenbaum, Christopher Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo Rezende, and S. M. Ali Eslami. Conditional neural processes. In ICML, pages 1704–1713, 2018.

[10] Hakan Girgin and Emre Ugur. Associative skill memory models. In IROS, pages 6043–6048, 2018.

[11] Aljaž Kramberger, Andrej Gams, Bojan Nemec, Dimitrios Chrysostomou, Ole Madsen, and Aleš Ude. Generalization of orientation trajectories and force-torque profiles for robotic assembly. Robotics and Autonomous Systems, 98(C), 2017.

[12] Dongheui Lee and Christian Ott. Incremental kinesthetic teaching of motion primitives using the motion refinement tube. Autonomous Robots, 31(2-3):115–131, 2011.

[13] Manuel Mühlig, Michael Gienger, and Jochen J Steil. Interactive imitation learning of object movement skills. Autonomous Robots, 32(2):97–114, 2012.

[14] R. Pahic, A. Gams, A. Ude, and J. Morimoto. Deep encoder-decoder networks for mapping raw images to dynamic movement primitives. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 1–6. IEEE, 2018.

[15] Alexandros Paraschos, Christian Daniel, Jan Peters, and Gerhard Neumann. Probabilistic movement primitives. In NIPS, pages 2616–2624, 2013.

[16] Peter Pastor, Ludo Righetti, Mrinal Kalakrishnan, and Stefan Schaal. Online movement adaptation based on previous sensor experiences. In IROS, pages 365–371, 2011.

[17] Peter Pastor, Heiko Hoffmann, Tamim Asfour, and Stefan Schaal. Learning and generalization of motor skills by learning from demonstration. In ICRA, pages 763–768, 2009.

[18] Affan Pervez and Dongheui Lee. Learning task parameterized dynamic movement primitives using mixture of GMMs. Intelligent Service Robotics, 11:61–78, 2018.

[19] Stefan Schaal. Dynamic movement primitives: a framework for motor control in humans and humanoid robotics. In Adaptive Motion of Animals and Machines, pages 261–280. Springer, 2006.

[20] Aleš Ude, Andrej Gams, Tamim Asfour, and Jun Morimoto. Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Transactions on Robotics, 26(5):800–815, 2010.

[21] Emre Ugur and Hakan Girgin. Compliant parametric dynamic movement primitives. Robotica, 2019. In press.

[22] Sethu Vijayakumar and Stefan Schaal. Locally weighted projection regression: Incremental real time learning in high dimensional space. In ICML, pages 1079–1086, 2000.

[23] Y. Zhou and T. Asfour. Task-oriented generalization of dynamic movement primitive. In IROS, pages 3202–3209, 2017.