Accelerating Synchronization of Movement Primitives: Dual-Arm … · 2020. 9. 25. · Accelerating...

Accelerating Synchronization of Movement Primitives: Dual-ArmDiscrete-Periodic Motion of a Humanoid Robot

Andrej Gams1, Aleš Ude2 and Jun Morimoto3

Abstract— Human-demonstrated motion transferred to arobotic platform often needs to be adapted to the currentstate of the environment or to modified task requirements.Adaptation, i. e. learning of a modified behavior, needs to befast to enable quick utilization of the robot either in industryor in future household-assistant tasks. In this paper we showhow to accelerate trajectory adaptation based on learningof coupling terms in the framework of dynamic movementprimitives (DMPs). Our method applies ideas from feedbackerror learning to iterative learning control (ILC). By takinginto account the actual physical constraints of the synchronousmotion – through synchronization of both positions (or forces)and velocities – it is not only a more faithful representation ofactual real-world processes, but it also accelerates the speedof convergence. To show the applicability of the approachin the framework of DMPs, we tested it on a formulationwhich encodes an initial discrete motion, followed by a periodicbehavior, all in a single system. Modifications of the originaldiscrete-periodic formulation now also allow for the use of DMPtemporal scaling property. In the paper we also show how theDMP coupling can be implemented in joint space, whereas themeasured forces and previous approaches always remained inthe task space. We applied our approach to an example dual-arm synchronization task on Sarcos humanoid robot CB-i.

I. INTRODUCTION

Robot programming by demonstration has been exten-sively used for the generation of robotic motions [1], includ-ing full body motions [2]. Neurological findings motivatedalso the use of action primitives [3]. Amongst the variousmethods of encoding demonstrated motion using primitives,dynamic movement primitives (DMP) have emerged for bothdiscrete point-to-point motion as well as for periodic motion[4]. Ernesti et al. [5] presented a combined DMP repre-sentation for both discrete transient motion and subsequentperiodic behavior, in a single system. In our study we buildon the same, but slightly modified computational model thatcombines discrete and rhythmic movement primitives in aunified representation. In this paper we show how the initially

*This work was supported by EU Seventh Framework Programme grant270273, Xperience. A.G. was also supported by JSPS-ARRS grant; A.U.was supported by NICT, Japan Trust International Research CooperationProgram; J.M. was supported by MEXT KAKENHI 23120004; by MEXTSRPBS; by MIC-SCOPE; by JST-SICP; by ImPACT Program of Councilfor Science, Technology and Innovation.

1Andrej Gams is with Humanoid and Cognitive Robotics Lab, Dept. ofAutomatics, Biocybernetics and Robotics, Jožef Stefan Institute, Ljubljana,Slovenia [email protected]

2Aleš Ude is with Humanoid and Cognitive Robotics Lab, Dept. ofAutomatics, Biocybernetics and Robotics, Jožef Stefan Institute, Ljubljana,Slovenia and with Department of Brain Robot Interface (BRI), ATR Com-putational Neuroscience Labs, Kyoto, Japan [email protected]

3Jun Morimoto is with Department of Brain Robot Interface (BRI) ATRComputational Neuroscience Labs, Kyoto, Japan [email protected]

trained "discrete-transition-into-periodic" behaviour can bemodified to account for the constraints of the task. Themain contribution of the paper is in accelerating the speedof convergence through the modification of the error signal,based on the feedback error learning approach.

Combining discrete and periodic motion has been studiedfrom the perspective of common neural control systemsin humans [6], [7]. Besides the approach of Ernesti etal. [5] within the DMP framework, other approaches havebeen suggested for robotics applications. Motor primitivesand central pattern generators were utilized for a combinedrhythmic-discrete motion representation by Degallier et al.[8], who applied their framework to infant-like crawling anddrumming on a humanoid robot. As an underlying structurethey used nonlinear oscillators, which allowed the transitionbetween different types of behaviors. Oscillators were alsoused in a control structure by Ajallooeian et al. [9], whoshowed how nonlinear phase oscillators can be morphed toan arbitrary limit cycle, including an initial transient motion.

Despite the rich and favorable properties of DMPs forrobotic control, direct replication of the demonstrated move-ment on the robot usually does not produce the desired taskbehavior. Different methods were previously reported formodification and synchronization of trajectories encoded asmotor primitives. Direct demonstration using DMPs can bemodified by external feedback [10], [11], or can serve as aninitial approximation for learning of the correct signal, e. g.using reinforcement learning [12], [13]. In order to reduce theerror of task adaptation through a feedback controller, Gamset al. [14] proposed using iterative learning control (ILC)to learn task-appropriate coupling signals, which minimizedfeedback errors. The nature of ILC makes it applicable todiscrete processes [15], and consequently repetitive control(RC) was proposed for periodic DMPs [16]. One of thecontributions of this paper is in showing how couplingand ILC can be used for combined discrete-periodic tasks,represented by a single DMP system. In this paper we namethis kind of a DMP a discrete-periodic dynamic movementprimitive (DP-DMP). Periodic DMPs were modified usingfeedback error learning also in [17]. The authors learned thefeed-forward accelerations provided to an inverse dynamicsmodel in order to minimize the feedback control effort ofan additional stabilizing control law. DMPs have also beenextended to Interaction Primitives [18], incorporating toolsfor synchronizing, adapting, and correlating motor primitivesbetween cooperating agents. Interaction forces were learnedfrom human demonstrations in [19], [20].

In this paper we show how the method of synchroniz-

2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)Congress Center HamburgSept 28 - Oct 2, 2015. Hamburg, Germany

978-1-4799-9993-4/15/$31.00 ©2015 IEEE 2754

ing DMP trajectories based on learning of task-appropriatecoupling signals [14] can be 1) accelerated by incorporatingthe ideas of feedback error learning and 2) applied to jointspace motion in contrast to the previous applications thatwere limited to the task space. Through the first contribu-tion, the modification of the feedback error to include bothposition and velocity components, we also depict a morefaithful representation of actual-real world behavior, wheresynchronized dual-arm behavior requires both. The secondcontribution enables the modification of trajectories that wereinitially acquired in joint space. This has the advantage ofpreserving the configuration of the robot, which might beimportant for redundant robots such as humanoids.

Two additional contributions show 1) how our acceleratedsynchronization of DMPs through the modified feedbackerror can be applied to the combined discrete-periodic DMPs,i. e. a DP-DMP; and 2) a modification of the original DP-DMP formulation to implement the basic DMP property ofmodulating the speed of execution.

The paper is organized as follows. In Section II we showhow to modify the feedback error based on the ideas fromfeedback error learning framework initially proposed byKawato [21]. The application of the DMP modulation injoint space is provided in Section III. Section IV explainsthe structure of the DP-DMP, including the modificationsof the original contribution of Ernesti et al. [5]. Section Vdiscusses simulated and real-world results. Discussion andconclusions follow.

II. ACCELERATING THE SYNCHRONIZATION OFMOVEMENT PRIMITIVES

When two independently controlled robots or agents, i. e.,without a central controller, have to work in a cooperativetask, their motion needs to be synchronized. Synchronizingtwo independent motion primitives can be achieved by con-necting them, that is by coupling them so that the forcesexhibited through the actions of one have an effect of thecontrol of the other. In this Section we summarize how wecan couple two DMPs as was described in [14] and mostimportantly, how we accelerate the learning of a feedforwardpart of the coupling term uff by using ideas from feedbackerror learning [21]. We first provide basics of the DMPs.

A. DMP Basics

Standard dynamic movement primitives (DMPs) are basedon a damped spring model [4]

z = Ω (αz (βz (g − y)− z) + f) , (1)y = Ωz, (2)

where Ω is a constant governing the speed of execution (thefrequency) and αz = 4βz are positive constants that makethe system critically damped. The parameter y is one of thedegrees of freedom used to control the robot and z is anauxiliary variable. The goal parameter g defines the uniqueattractor point, i. e. the point to which y converges if theforcing term f tends to zero as a function of time. The robotis controlled by integrating system (1) – (2) with the given

initial parameters y = y0, z = y0/Ω. The choice of theforcing function f determines whether the movement encodedby a DMP is periodic or discrete.

To couple a DMP to an external signal we add the controlsignal u to Eq. (2)

y = Ωz + u, (3)

The control signal, also referred to as the coupling term,is defined as a combination of a feedback and a feedforwardterm u = ufb+uff . ILC can be used to learn the feedforwardcomponent, as proposed in [14]. ILC takes a few iterationsto learn. In the next Sections we show how we can acceleratethe learning and apply it to joint space.

B. Modification of Feedback Error

Lets consider the task of synchronizing the motion oftwo robot arms in the context of dual arm manipulation.We considered the task of lifting an unknown object withtwo arms in such a way that the relative object positionand orientation between the two arms remain constant. Thishappens if the robot holds the object so that the forceacting on the arms is constant. If the movement of thetwo arms is specified by two DMPs given by (1) – (2)and an appropriate canonical system (see Section IV for theDP-DMP), synchronous arm behavior can be achieved bymodifying one of the two DMPs with (3) so that the forcesand torques acting on both arms are constant, i. e. F(t) = Fdand M(t) = Md.

If the initial, not synchronized arm movements are spec-ified in task coordinates, then the sensory feedback is pro-vided by measuring gravity-compensated forces and torquesacting on both arms. If the task is to modify the movementof the second arm so that it lifts the object together with thefirst arm, then the feedback signal can be defined as

e(t) =

[Fd − F2(t)Md −M2(t)

], (4)

where Fd, Md are the desired forces and torques and F2(t)and M2(t) the actual forces and torques acting on the secondarm. In a simulated setting, we can model the forces andtorques with a spring, i. e.,

e(t) =

[Kp(xd(t)− x2(t))

Ko2 log(qd(t) ∗ q2(t)

) ] , (5)

where Kp,Ko stand for the translational and rotationalpositive definite stiffness matrices, respectively. The valuescan differ for separate directions. x is the position vec-tor and q the orientation quaternion. Note that the term2 log

(qd ∗ q2(t)

)corresponds to the angular velocity that

rotates quaternion q(t) to qd(t) in unit time, just like xd(t)−x2(t) corresponds to the linear velocity that translates x2(t)to xd(t) in unit time. Note also that standard DMP equations(1) – (2) need to be modified to enable the integration oforientational motion represented by quaternions. See [20],[22] for more details and formal definition of the logarithmicmap log and quaternion DMPs. For simplicity we assumethat everything is in the world coordinate system.

2755

Ideally, one would use the full inverse dynamic modelto calculate the motor command error given the task spaceerror. Since accurate inverse dynamics models are difficultto obtain, feedback error learning approach [21] uses theoutput of a simple controller consisting of proportional, dif-ferentiation, and acceleration feedback to iteratively improvethe motor command. The proportional part is provided by(5). We propose to improve the coupling performance byadding a differentiation term. Such an improvement is also amore faithful representation of the physical behavior. If thepositions are to be synchronized, the velocities have to be aswell. We therefore use (for a position-based error learningin simulated setting)

e(t) = Ksk1

[xd − x2(t)

2 log(qd(t) ∗ q2(t)

) ]+

Ksk2

[x1(t)− x2(t)ω1(t)− ω2(t)

], (6)

where Ks comprises the elements of both Kp and Ko onits diagonal. The terms k1, k2 are positive constants andrepresent the gains of the separate position and velocitycomponents. The terms k1, k2 originate from the notion ofa simple controller in feedback error learning. We definedthe values empirically. The error as defined in (6) is nowused instead of (5) to update the coupling term. Note that anadditional acceleration term is not used because the couplingof the DMPs, as shown in (3), is at the first derivative of theDMP, leaving only one level in the second order DMPs weuse. The DP-DMP we use is explained in Section IV.

C. Iterative Learning Control

Iterative learning control (ILC) provides an effective wayto minimize feedback error. It uses feedback error to improvethe performance in the next repetition of the same behavior.We use it to update the coupling term u after every attemptat a task, i. e., an epoch. We propose to apply the current-iteration ILC, which is given by the formula [15]

uj+1(k) = Q(uj(k) + Lej(k + 1))︸︷︷︸feedforward term

+ Cej+1(k)︸︷︷︸feedback term

, (7)

where u is the coupling term (or control signal), k denotes thek-th time sample, j denotes the learning iteration, and Q andL are the learning parameters. The value of Q determines therobustness, with Q = 1 the most accurate but the least robustimplementation. We used Q = 0.99. L is the gain of the errorof the previous cycle and is often the same as the proportionalgain of the feedback controller. ILC is distinguished fromsimple feedback control by the prediction of the error e(k+1, j) in the j + 1 iteration, which serves to anticipate theerror caused by the action taken at the k-th time step. ILCmodifies the control input in the next iteration based on thecontrol input and feedback error in the previous iteration.

III. DP-DMP COUPLING AT THE JOINT LEVELWhile demonstrations may be in task space, transfer of

motion to the robot can also include joint space trajecto-ries, acquired by kinesthetic guiding. By maintaining the

movement representation in the joint space, we preservethe information about the robot configuration, which isspecifically important if the robot is redundant with respectto the task. Previously modulating the DMP trajectory usingexternal (virtual) force feedback considered only task spacetrajectories [14]. In the following we show how it can beapplied to joint trajectories.

To map the task-space feedback into the joint space, weneed to separately map both the position and the velocityfeedback. In the case of modifying only the motion of thesecond arm with respect to the first, we obtain the followingerror terms:

• position-based error learning

e(t) = Ksk1JT2

[xd − x2(t)

2 log(qd(t) ∗ q2(t)

) ]+

Ksk2J+2

[x1(t)− x2(t)ω1(t)− ω2(t)

], (8)

• force-based error learning

e(t) = Kfk1JT2

[Fd − F2(t)Md −M2(t)

]+

Ksk2J+2

[x1(t)− x2(t)ω1(t)− ω2(t)

]. (9)

Here J+2 denotes the pseudoinverse of the task Jacobian of

the second arm. The second term in the above equation isbased on the fact that the relative motion of the two armsholding an object should be equal to zero.

IV. UNIFIED REPRESENTATION FOR DISCRETEAND PERIODIC MOVEMENTS

To show an extended applicability of the modulation ap-proach, we apply it to the DMP formulation that allows us toencode a periodic motion and its initial discrete transient part,i. e. a DP-DMP. The formulation of DP-DMPs is based onthe original work described in [5], with a changed canonicalsystem to allow for temporal scaling, a basic DMP propertywhich was not possible with the original formulation.

The structure of a DP-DMP follows the standard DMPformulation as given by Eq. (1) – (2). To encode the initialdiscrete movement and subsequent periodic movement, theforcing term f is defined as a function of two phase variablesφ and r, i. e. f(φ, r). The evolution of the two-dimensionalphase variable (r, φ) is governed by the following canonicalsystem

φ = Ω, (10)r = Ωη (µα − rα) rβ . (11)

In contrast to [5], we added parameter Ω also to Eq. (11),enabling temporal scaling. Ω is defined as Ω = 2π/p, wherep is equal to the period of rhythmic movement in seconds.The parameters α, β, and η are positive constants with whichthe speed of convergence of r to µ can be adjusted. Weused η = 6, α = 1/6, β = 0.001 and µ = 1 as constants.To determine the initial phase (rinit, φinit) from where theintegration of (10) – (11) should be started, we first define

2756

the phase at which the discrete movement transitions intothe periodic movement. In our experiments this transitionwas defined to occur at the phase (µ1, 0). The initial phasewas then calculated by back-integrating Eq. (10) – (11) forthe duration Td of the discrete part of the movement. Thisduration can be obtained from the training data.

Note that both the transformation system (1) – (2) and thecanonical system (10) – (11) are only indirectly dependent ontime, which is essential to ensure many favourable propertiesof DMPs, e. g. scale and temporal invariance and easymodulation of control parameters.

Using the above phase, f can be defined as [5]

f(φ, r) =

∑Nj=1 vjψj(r, φ) +

∑Mi=1 wjξj(r, φ)∑N

j=1 ψj(r, φ) +∑Mi=1 ξj(r, φ)

, (12)

where ψj and ξj are the forcing terms specifying the discreteand periodic parts of movement, respectively:

ψj(r, φ) = b(r) exp (−lj(cos(φ− cj)− 1)) , (13)

ξj(r, φ) = a(r) exp

(−hj

∥∥∥∥[ r cos(φ)r sin(φ)

]− qj

∥∥∥∥2).

(14)

The centres of periodic forcing terms cj = (j − 1)2π/N +π/N are uniformly distributed with constant widths lj =1.25N, j = 1, . . . , N . The centers of discrete forcing termsare taken at qj = [rj cos(φj), rj sin(φj)]

T, where the phasecenters (rj , φj) are determined so that approximately thesame number of integration steps is needed to integrate (10)– (11) from (rj−1, φj−1) to (rj , φj), for all j. The firstcenter point (r1, φ1) is equal to the initial phase of themovement, (r1, φ1) = (rinit, φinit), while (rM , φM ) is equalto the phase where discrete movement transitions into theperiodic movement, (rM , φM ) = (µ1, 0). The widths of thekernels are determined as hj = 0.5/(‖qj+1 − qj‖2), j =1, . . . ,M − 1, hM = hM−1.

Kernel functions a and b implement the transition fromdiscrete to period movement. We defined them using atricube kernel

b(r) =

1, r < µ1(

1−(r − µ1µ2 − µ1

)3)3

, µ1 ≤ r ≤ µ2

0, r > µ2

(15)

a(r) =

0, r < µ1(

1−(µ2 − rµ2 − µ1

)3)3

, µ1 ≤ r ≤ µ2

1, r > µ2

(16)

Note that in [5] Gaussian kernels were used to define a andb. The advantage of our definition is that the above kernelfunctions really become equal to zero and not just tend tozero as Gaussian kernels do. In our experiments we usedµ1 = 1.2µ and µ2 = 1.4µ as constants.

A. Learning of Coupling Terms in DP-DMPs

In the case of DP-DMP, the learning needs to apply both tothe discrete and periodic part. However, ILC was originallydeveloped for discrete processes only. Repetitive control(RC) was proposed instead of ILC to modify periodic DMPs[16], but this method cannot be applied to the initial discretepart. However, as stated in [23], ILC and RC have distinctdifferences, but their essential features are nearly equivalent,and ILC has been applied to processes with periodic inputsas no-reset ILC [24]. We therefore apply no-rest ILC to adaptboth parts of the DP-DMP. An attempt to apply RC to theperiodic part and ILC to the discrete part of motion wouldresult in a discontinuity at the transition point from discreteto periodic movement because different learning algorithmsresult in different learning processes even if the underlyingcost function is the same. One attempt at a trajectory, i. e.one epoch, now consists of one instance of the discrete part,followed by ν periods of the periodic part.

A DP-DMPs can be coupled to an external signal as inEq. (3). We define u as follows

u(φ, r) =

∑Nj=1 vjψj(r, φ) +

∑Mi=1 wjξj(r, φ)∑N

j=1 ψj(r, φ) +∑Mi=1 ξj(r, φ)

. (17)

Thus u is defined using the same kernel functions (13 – 14)as the forcing term (12). It contains adjustable parameters[v, w]T, v = [v1, . . . , vN ], w = [w1, . . . , wM ], which aremapped to the control signal uj = [uj(1), . . . , uj(T )] duringthe execution. Similarly, mapping from the phase dependentcontrol signal u to [v, w]T is accomplished with regression[4]. Thus learning the feedforward signal u involves itsadaptation using formula (7), followed by regression tocalculate the parameters [v, w]T.

V. RESULTS

A. Experimental Setup

In our experiments we considered dual arm manipulation,where the trajectories of the arms were encoded as DP-DMPs in joint space. Our example task was to synchronizethe movement of both arms in 3D space so that the robotcould perform the pre-defined object manipulation motion.The trajectory of the right arm was predefined, while the jointtrajectories of the left arm had to adapt using our approachso that the two arms moved in synchrony, keeping the boxheld by both arms at constant relative position between thetwo arms. In the experiments we used a full-size humanoidrobot Sarcos CB-i [25]. The setup and the resulting motionafter learning is presented in the image sequence of Fig. 5.The accompanying video shows that the robot was able tohold the box between the two arms.

B. Dual Arm Manipulation

As described above, the joint trajectories of the arms aresynchronized to minimize the feedback received in the taskspace. The final goal was that the arms move with constantdistance for the complete discrete-periodic trajectory. Oneepoch of motion consists of a discrete part, followed by 4

2757

periods of the periodic part. The robot then returns to theoriginal position, which is also depicted in the plots.

Figure 1 shows the trajectories in z (up-down) directionof the task space. The other two task-directions were alreadysynchronized so that the robot could manipulate the object.We can see in the top plot the discrete-periodic nature ofthe motion, where four periods of periodic motion wereexecuted in every epoch. The top plot shows the trajectoryof the right arm in blue. The red and the green trajectoriesshow the trajectory of the left arm – the one that adapts itsmotion. Two trajectories are shown for different feedbackerrors. Results show a clear acceleration of adaptation whenusing (6) instead of (5) for iterative learning control of thecoupling term in (7). The bottom plot shows the relative errorof motion in z direction at the end of each epoch. Exponentialconvergence can be observed. We can see that the proposedextension makes adaptation considerably faster, convergingin roughly 4 epochs, as opposite to roughly 10 epochs withthe previous method. Acceleration is observable also in thevelocities, as shown in Fig 2.

C. Adaptation in Joint Space

The actual adaptation of trajectories was implemented inthe joint space, as proposed by (8). The adaptation of thejoint space trajectories for the same experiment is shownin Fig. 3. These joint trajectories correspond to task-spacetrajectories shown in Figs. 1 and 2. In Fig. 3 the blue lineshows the original trajectories of the first 4 joints of therobot’s left arm – the top three approximating the shoulderjoint and the last one the elbow. The green lines show theadapted trajectories of motion for the same joints. Wristjoints were not modulated.

It is important to note that since the adaptation of theDMP occurs in joint space, they in fact do not match fromthe right arm to the left arm, but the task-space motionis synchronized. This is clearly seen in Fig. 4, where the

0 20 40 60 80 100 120 140

0.1

0.2

t [s]

z[m

]

0 2 4 6 8 10 12 14 16 18 200

0.1

0.2

0.3

0.4

iteration

relative

error

oldmodified

Fig. 1. The top plot shows the location of the lead, right arm in z directionin blue, and the location of the left arm as it is adapting over time. Thegreen plot shows the modified approach from this paper using (6), while thered plot the original approach of using (5). We can clearly see acceleratedadaptation. The bottom plot shows the relative error of motion in z directionat the end of each epoch. Accelerated adaptation is clearly seen.

0 50 100 150−1

−0.5

0

0.5

1

t [s]

z[m

/s]

right new old

Fig. 2. Results of adaptation of velocity when using (6) instead of (5) forthe feedback error. Accelerated adaptation is shown with the new method.

trajectories of the left and the right arm joints are shown.Since we used 4 DOFs for the robot’s arm and only 3 areneeded to control the task space position trajectory, the robotarm is redundant for the task. Therefore the joint trajectoriesof the left arm are not symmetrical to the joint trajectoriesof the right arm, which is clearly visible when comparingshoulder abduction-adduction (SAA) and elbow joints (EB).

D. Modulation of Execution Frequency

We improved the original formulation of the DP-DMPcanonical system from [5] to allow for temporal scalingby modifying (11). Without this change DP-DMPs do notinclude one of the basic DMP properties – modulation ofexecution (playback) speed with only one parameter. Fig. 6shows the results of reducing the parameter Ω from 4π to2π. In the top plot we can see that when using the originalcanonical system the execution is not stable, given in red.The blue trajectory shows the behavior of the system with

0 10 20 30 40 50 60 700

0.2

0.4

LSFE

0 10 20 30 40 50 60 70−0.4

−0.2

0

LSAA

0 10 20 30 40 50 60 70−0.4

−0.2

0

LHR

0 10 20 30 40 50 60 701.2

1.4

1.6

1.8

2

LEB

t [s]

Fig. 3. Adaptation of the robotic arm in the joint space, where the modula-tion actually takes place. The blue lines show the original joint trajectoriesand the green lines the adapted trajectories for left shoulder flexion extension(LSFE), abduction-adduction (LSAA), left humerus rotation (LHR) and leftelbow (LEB).

2758

Fig. 5. Dual arm box manipulation. The movement consists of initial point-to-point movement (first three images), followed by periodic box shaking.

0 10 20 30 40 50 60 70

0

0.2

0.4

SFE

0 10 20 30 40 50 60 70−0.4

−0.2

0

SAA

0 10 20 30 40 50 60 70−0.4

−0.2

0

HR

0 10 20 30 40 50 60 701

1.5

2

EB

t [s]

Fig. 4. Trajectories of the left arm joints as they adapt, in green. The rightarm joint trajectories are depicted in blue in all four plots.

Ω = 4π. The bottom plot shows the behavior of the systemwhen using the modified canonical system. Note that theadaptation of the trajectories is not affected.

VI. DISCUSSION AND CONCLUSION

Before the onset of the periodic movement pattern, rhyth-mic movements are often started in a non-periodic way. Apractical example is walking, where the first step is differentfrom the following steps. In running, this condition is evenmore prominent because the initial motion acceleration mighttake a few steps. Treating the whole discrete-periodic process

0 10 20 30 40 50 60 700

0.2

0.4

0.6

0.8

z[m

]

orig. can. sysmod. can. sys

0 10 20 30 40 50 60 700

0.1

0.2

0.3

0.4

t [s]

z[m

]

R L

Fig. 6. Top: The results of changing the frequency of the execution tofrom Ω = 4π to Ω = 2π when using the original canonical system from[5], shown in red. The blue trajectory shows the behavior of the system atΩ = 4π. Bottom: Results of changing Ω = 4π to Ω = 2π when usingour proposed canonical system. We can also see that trajectory adaptationis not affected.

in a uniform system simplifies the structure of the controlsystem for such tasks.

As stated in [7], rhythmic and discrete movements arenot the same from the neural point of view. This differencecomes into play also when adapting trajectories as proposedin our paper. For example, the convergence of coupling termlearning by ILC is different for each part. Furthermore, theiterative nature of the algorithm has led to the notion ofrepetitive control and no-reset ILC [24].

In the paper we proposed a modified feedback within thecurrent-iteration iterative learning control framework, takingthe clue from the theory of feedback error learning [21],where position, velocity and acceleration error terms are usedin a simple controller to model the behavior of a system.As evident from our results, this allows for much fasteradaptation, opening up possibilities for practical implemen-tation of different tasks. Could the same be achieved by onlychanging the L and Q parameters of the ILC? While it is

2759

true that increasing these parameters will increase the adap-tation speed, one must take into consideration that highervalues will make the system less stable. For a thoroughstability analysis of ILC and DMPs see [14]. In our casethese parameters can be set lower to ensure stability. Evenwith significantly increased parameters, which resulted inapparent stability, i. e., the behavior initially seems stablebut eventually diverges after some iterations, we could notachieve as fast adaptation as with including the velocity term.On top of that, when including the velocity term as in (6),the behavior remained stable.

The developed approach maintains a small set of tuningparameters, as effectively k1, k2 in the feedback term andQ, L and C have to be specified. While the former aretask-specific, the latter follow a well established ILC theory[15], where decreasing the Q (from the value of 1) willincrease robustness but also the steady-state error. Therefore,L needs to be calculated in order to maintain the stability[15], while C is again a task-specific feedback gain. Theproposed implementation of the algorithm in joint spaceallows for maintaining the demonstrated joint space posture,which is especially practical for redundant robots such ashumanoids.

Our example task has demonstrated the applicability ofthe proposed approach in the real-world, but it only servesas a test for more demanding tasks. Lets imagine applyingthe approach to walking. The initial motion imitation from ahuman demonstration could result in the robot tipping over.Hence the movements need to be adapted very fast so thatthe robot does not break in the attempts. We have shownthat adaptation can take as little as 3 or 4 repetitions. Thebiggest advantage of our system is that it is designed toallow for modulation of trajectories based on some other,more complex feedback variable. An example of such ismodifying (8) or (9) to include a posture-stability criterion,which would update the trajectory of walking so that therobot would remain (more) stable. Extensions of the systemthat include such feedback errors remain a subject of ourfuture work.

REFERENCES

[1] R. Dillmann, “Teaching and learning of robot tasks via observation ofhuman performance,” Robotics and Autonomous Systems, vol. 47, no.2-3, pp. 109–116, 2004.

[2] D. Kulic, C. Ott, D. Lee, J. Ishikawa, and Y. Nakamura, “Incrementallearning of full body motion primitives and their sequencing throughhuman motion observation,” The International Journal of RoboticsResearch, vol. 31, no. 3, pp. 330 – 345, 2012.

[3] D. Kulic, D. Kragic, and V. Krüger, “Learning action primitives,” inVisual Analysis of Humans - Looking at People., 2011, pp. 333–353.

[4] A. Ijspeert, J. Nakanishi, P. Pastor, H. Hoffmann, and S. Schaal,“Dynamical movement primitives: Learning attractor models for motorbehaviors,” Neural Computation, vol. 25, no. 2, pp. 328–373, 2013.

[5] J. Ernesti, L. Righetti, M. Do, T. Asfour, and S. Schaal, “Encoding ofperiodic and their transient motions by a single dynamic movementprimitive,” in 2012 IEEE-RAS International Conference on HumanoidRobots (Humanoids), Osaka, Japan, 2012, pp. 57–64.

[6] A. d. Rugy and D. Sternad, “Interaction Between Discrete and Rhyth-mic Movements: Reaction Time and Phase of Discrete MovementInitiation During Oscillatory Movements,” Brain Research, vol. 994,no. 2, pp. 160–174, 2003.

[7] S. Schaal, D. Sternad, R. Osu, and M. Kawato, “Rhythmic MovementIs Not Discrete,” Nature Neuroscience, vol. 7, no. 10, pp. 1137–1144,2004.

[8] S. Degallier, L. Righetti, S. Gay, and A. Ijspeert, “Toward simplecontrol for complex, autonomous robotic applications: Combiningdiscrete and rhythmic motor primitives,” Autonomous Robots, vol. 31,no. 2-3, pp. 155–181, 2011.

[9] M. Ajallooeian, J. van den Kieboom, A. Mukovskiy, M. A. Giese, andA. J. Ijspeert, “A general family of morphed nonlinear phase oscillatorswith arbitrary limit cycle shape,” Physica D: Nonlinear Phenomena,vol. 263, pp. 41–56, 2013.

[10] P. Pastor, L. Righetti, M. Kalakrishnan, and S. Schaal, “Onlinemovement adaptation based on previous sensor experiences,” in 2011IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS), San Francisco, CA, 2011, pp. 365–371.

[11] T. Kulvicius, M. Biehl, M. J. Aein, M. Tamosiunaite, and F. Wörgöt-ter, “Interaction learning for dynamic movement primitives used incooperative robotic tasks,” Robotics and Autonomous Systems, vol. 61,no. 12, pp. 1450–1459, 2013.

[12] J. Kober, D. Bagnell, and J. Peters, “Reinforcement learning inrobotics: A survey,” International Journal of Robotics Research,vol. 32, no. 11, pp. 1238–1274, 2013.

[13] F. Stulp and O. Sigaud, “Robot skill learning: From reinforcementlearning to evolution strategies,” Paladyn. Journal of BehavioralRobotics, vol. 4, no. 1, pp. 49–61, September 2013.

[14] A. Gams, B. Nemec, A. Ijspeert, and A. Ude, “Coupling movementprimitives: Interaction with the environment and bimanual tasks,”IEEE Transactions on Robotics, vol. 30, no. 4, pp. 816–830, 2014.

[15] D. Bristow, M. Tharayil, and A. Alleyne, “A survey of iterativelearning control,” IEEE Control Systems Magazine, vol. 26, no. 3,pp. 96–114, 2006.

[16] A. Gams, J. van den Kieboom, M. Vespignani, L. Guyot, A. Ude, andA. Ijspeert, “Rich periodic motor skills on humanoid robots: Ridingthe pedal racer,” in 2014 IEEE International Conference on Roboticsand Automation (ICRA), Hong Kong, 2014, pp. 2326–2332.

[17] N. Gopalan, M. Deisenroth, and J. Peters, “Feedback error learning forrhythmic motor primitives,” in 2013 IEEE International Conferenceon Robotics and Automation (ICRA), Karlsruhe, Germany, 2013, pp.1317–1322.

[18] H. Ben Amor, G. Neumann, S. Kamthe, O. Kroemer, and J. Peters,“Interaction primitives for human-robot cooperation tasks,” in 2014IEEE International Conference on Robotics and Automation (ICRA),Hong Kong, 2014, pp. 2831–2837.

[19] V. Koropouli, D. Lee, and S. Hirche, “Learning interaction controlpolicies by demonstration,” in 2011 IEEE/RSJ International Confer-ence on Intelligent Robots and Systems (IROS), 2011, pp. 344–349.

[20] F. J. Abu-Dakka, B. Nemec, J. A. Jørgensen, T. R. Savarimuthu,N. Krüger, and A. Ude, “Adaptation of manipulation skills in physicalcontact with the environment to reference force profiles,” AutonomousRobots, vol. 39, no. 2, pp. 199–217, 2015.

[21] M. Kawato, “Feedback-errror-learning neural network for supervisedmotor learning,” in Advanced Neural Computers, R. Eckmiller, Ed.North-Holland: Elsevier, 1990, pp. 365–372.

[22] A. Ude, B. Nemec, T. Petric, and J. Morimoto, “Orientation in Carte-sian space dynamic movement primitives,” in 2014 IEEE InternationalConference on Robotics and Automation (ICRA), Hong Kong, 2014,pp. 2997–3004.

[23] Y. Wang, F. Gao, and F. J. Doyle III, “Survey on iterative learningcontrol, repetitive control, and run-to-run control,” Journal of ProcessControl, vol. 19, no. 10, pp. 1589–1600, 2009.

[24] L. Sison and E. Chong, “No-reset iterative learning control,” in 35thIEEE Conference on Decision and Control, Kobe, Japan, 1996, pp.3062–3063.

[25] G. Cheng, S.-H. Hyon, J. Morimoto, A. Ude, J. G. Hale, G. Colvin,W. Scroggin, and S. C. Jacobsen, “CB: A humanoid research platformfor exploring neuroscience,” Advanced Robotics, vol. 21, no. 10, pp.1097–1114, 2007.

2760

Accelerating Synchronization of Movement Primitives: Dual-Arm … · 2020. 9. 25. · Accelerating...

Documents

Transcript of Accelerating Synchronization of Movement Primitives: Dual-Arm … · 2020. 9. 25. · Accelerating...