Real-time behaviour synthesis for dynamic...

Real-time behaviour synthesis for dynamic Hand-Manipulation

Vikash Kumar, Yuval Tassa, Tom Erez, Emanuel Todorov

Abstract— Dexterous hand manipulation is one of the mostcomplex types of biological movement, and has proven verydifficult to replicate in robots. The usual approaches to roboticcontrol – following pre-defined trajectories or planning onlinewith reduced models – are both inapplicable. Dexterous manip-ulation is so sensitive to small variations in contact force andobject location that it seems to require online planning withoutany simplifications. Here we demonstrate for the first timeonline planning (or model-predictive control) with a full physicsmodel of a humanoid hand, with 28 degrees of freedom and48 pneumatic actuators. We augment the actuation space withmotor synergies which speed up optimization without removingdexterity. Most of our results are in simulation, showing non-prehensile object manipulation as well as typing. In both casesthe input to the system is a high level task description, whileall details of the hand movement emerge online from fullyautomated numerical optimization.

I. INTRODUCTION

Dexterous manipulation is one of the most complex andexpressive classes of biological movement. Unlike otherdynamic movements such as locomotion – which have beendiscovered by evolution in many different forms – manip-ulation appears to be challenging even for biology. Indeeda large part of the primate brain is devoted to controllingthe hand and its complex musculo-tendon network. Manualdexterity is perhaps the most distinguishing sensorimotorcapability of humans. Therefore, if we aspire to integraterobots into our human-centered world, we must equip themwith manipulators and control strategies of similar dexterity.

Dexterous manipulation not only involves effective co-ordination of a high degree-of-freedom (dof) manipulator,but also requires stabilizing an object that only becomescontrollable through contacts. High dof, including mutuallycoupled dof, operate in a very compact space co-inhabited bythe object being manipulated. This results in a large numberof contacts and dynamic phenomena such as rolling, slidingand deformation. Movement of objects within the workspacecauses contacts to appear and disappear often, which greatlycomplicates motion planning.

In addition to high dof, a dexterous manipulator mustalso exhibit a well balanced combination of capabilities suchas speed, strength, low loop latency, and compliance. Highnumber of actuated dof in a small space makes it hard todesign such manipulators. Dexterous robotic hands exist [1][2] [3], but the difficulty in controlling them has motivatedmany researchers to focus on more limited mechanisms. Ourgoal instead is to develop general-purpose control solutions

The authors are with the Departments of Computer Science & Engineer-ing and Applied Mathematics, University of Washington, WA 98195, USA

E-mail: {vikash, tassa, etom, todorov }@cs.washington.edu

that can be applied to mechanisms as complex as a humanhand.

Most prior work on hand control has focused on achievingstable grasp. While a lot of progress has been made on thetheoretical side, practical applications are largely limited tousing task-specific manipulators in carefully controlled en-vironments. Part of the difficulty in deploying manipulationin less constrained environments comes from challenges insensing and state estimation – due to visual occlusions andlimited tactile sensing. But control is also challenging. Evenif we assume a perfect model, most existing methods forgrasp synthesis are too slow to operate in real-time, makingit impossible to adapt to an uncertain environment.

For applications of dexterous manipulation in uncon-strained human environments, we need the capability toplan in real time. This is because no pre-computed solutionor simplifying set of assumptions are likely to work forsuch a complex behavior, that is so exquisitely sensitive tosmall variations in contact force or object position. Planningneeds to be error tolerant and capable of quickly generatingcorrective maneuvers that stabilize the object in the faceof unexpected disturbances. This is particularly challengingin non-prenehsile manipulation, i.e. in the absence of astatically-stable grasp. Numerical optimization is the onlygeneral approach we are aware of that is at least in principlecapable of ”inventing” complex movements fully automati-cally, without intervention from a human. This is why ourapproach is centered on optimization.

The challenges of legged locomotion bare resemblance tothose of dexterous manipulation: in both cases, an effectivesolution must deal with the forces acting between the robot’sbody, over which it exerts full control, and the unactuateddynamics – the root joint for locomotion, and the objectin manipulation. After decades of research in legged loco-motion, we are now capable of producing careful walkingwith humanoid robots [4] [5]. However these developmentshave not yet had an impact on hand movement control, inpart because they rely on domain-specific abstractions suchas ZMP, capture points, inverted pendulum approximations,etc. In fact this may be why manipulation is so much harder:in manipulation, constructing a reduced model that capturesthe essence of the behavior seems impossible. Any suchmodel will clearly need to include the object and all the handsurfaces that can interact with it, along with the kinematiclimitations of the manipulator – so it is not really a reduction.

From a computational perspective, hand movement syn-thesis is challenging because high dof (both actuated andunactuated) results in high dimensionally of the search space.Inter-finger contacts and object-finger interactions produce

large number of contacts that impose discontinuities in thesearch space. The optimization landscape is further compli-cated by active involvement of dynamic phenomenon suchas rolling, sliding and large number of contacts that makeand break often.

In this paper we describe our ongoing work in real-timeplanning (or model-precitive control, MPC) of dexterousmanipulation. The most interesting results are obtained insimulation, we present these results in light of a hardwareplatform we are developing – a ShadowHand skeleton thatwe have equipped with faster and more compliant actuation[6]. This hardware platform called ”ADROIT” is described insection IV. MPC control of object manipulation in simulationis presented in section VI. We also introduce an elementwhich is often considered in hand manipulation but is novelto the MPC setting: namely hand synergies. Synergy-spaceplanning techniques are described in section VI-E.

Our movement synthesis for dynamic hand manipulationbuilds upon our previous work [7] [8] which was ableto synthesize full body movements. However, it was un-able to scale up for motion synthesis in case of dexter-ous hand manipulating owing to difference in the natureof the optimization manifold. After carefully investigatingthe challenges, we introduce synergy-space planning anddemonstrate for the first time online synthesis of dexterousobject manipulation. The input to the planner is a high-leveltask description: we only specify the desired position of theobject being manipulated, and the entire hand movement isthen synthesised automatically. Similarity, in case of typing,we specify the sequence of desired key presses and thebehavior emerges automatically. Instead of imposing thesespecification as hard constraints, we pose them as costs andmix them with other intuitive costs such as being gentle andmoving at a nominal speed.

II. RELATED WORK

Grasping has received more attention from the roboticscommunity than dexterous manipulation. Researchers haveapproached grasping primarily from two different directions.One approach focuses on improving the design of the manip-ulator so as to improve its capabilities in terms of producingstable grasps. The other approach exploits optimization basedtechniques to evaluate the stability of the grasp using graspmetrics [9]. Various grasp metrics has been proposed with noclear advantage of using one over the other. These methodsare less likely to generalize for dexterous manipulation butare based on important ideas. One such idea is force closurewhich can be treated as the ZMP equivalent in case ofgrasping. Another idea is position the manipulator in pre-grasp configuration and close the finger while relying on thecompliance of the joints to grasp the object. Graspit [10]is a tool for simulating and evaluating grasp for differentshapes which is quite popular in the community. Whilegrasping has always been the center of hand research, onecommon approach towards reactive manipulation is tele-operation using data gloves. In tele-operation the hard part ofplanning and decision making is left to the intelligent user

while the controllers blindly follows the joint trajectories.Contact invariant trajectory optimization [11] is an offlinemethod capable of synthesising hand manipulation but itsslow and trades physics for visually expressive manipulationbehavior. Such approaches are hard to generalizable and areless likely to scale in real applications. Also, see review paper[12] [13]

A. Motion capture techniquesThe graphics community has long utilized motion capture

system and dimensionality reduction techniques (see below)to synthesize movements for animated characters. While thefocus has always been on full body movements, motioncapture techniques have been used for hand movementsynthesis [14] [15]. These approaches have not been widelysuccessful in the robotics research owing to the real worldphysics constraints (that can be violated in animation), itsinability to generalize and computationally expensive postprocessing step. An up-to-date discussion on motion capturebased hand manipulation can be found in [16]

B. Dimensionality reduction and synergy based controlAlthough rich and diverse, animal forms exhibit charac-

teristic movements that can be attributed to its morphology,neural system and habitat. Researchers have ascribed these tobio-muscular and neural factors [17] [18]. Low dimensionalembedding has been found at the level of kinematics [19],instantaneous muscle activity [20], spatio-temporal muscleactivity [21] and feedback control law [22]. These lowdimensional embeddings have long been exploited in thegraphics community to reduce the dimensionality of searchspaces to synthesize full body movements. It has also beendemonstrated that such embeddings also exists in handmovements. Approximately 95% of the postural varianceassociated with hand grasping can be explained using fourprincipal components [23]. Such low dimensional embeddingand synergy spaces have been exploited by the roboticscommunity to accelerate the pace of grasping research [10].However, it has restricted the capabilities of present roboticdevices to simple grasps. Similar to biological systems,present day robots have many dofs. While synergies and lowdimensional spaces have helped us control and emulate someof the functionalities; they have restricted the behavioursto simple movements that conceal the expressiveness anddexterity of these robots.

III. ONLINE TRAJECTORY OPTIMIZATIONThe hand manipulation behaviors presented here are gener-

ated autonomously via numerical optimization. At any givenmoment, the controller has a plan that predicts the system’sstate over some finite horizon into the future. This planis constantly optimized in a receding-horizon fashion, andthe initial state is set according to the estimated currentstate. At every iteration, our trajectory optimization algorithmemploys a model of the system’s dynamics to solve a finite-horizon optimal control problem, an approach known asModel Predictive Control (MPC). Here we use a first-orderversion of the Differential Dynamic Programming algorithm

[24]. For brevity, here we present only an outline of thealgorithm; for more details on our optimization approach,see [7], [8].

A. Finite-Horizon Optimal ControlWe formulate the dynamics in discrete time; given the

state xi and control signal ui at time i, we use the modelxi+1 = f(xi,ui) to compute the next state: The optimizationis driven by a cost function `(xi,ui) that depends on theinstantaneous state and control. We also define a final costfunction `f (xN ) that depends only on the final state of thetrajectory.

Given an initial state x0 and a sequence of controlsU ≡ {u0,u1 . . . ,uN−1}, we compute the resulting statetrajectory by repeatedly applying the dynamics f for Ntimes. The solution of the optimal control problem is thecontrol sequence that minimizes the total cost along atrajectory for a given initial state x0:

U∗(x) ≡ argminU

N−1∑i=0

`(xi,ui) + `f (xN ).

B. Trajectory OptimizerOur optimization algorithm relies on the principle of

dynamic programming: the Value at time i indicates theminimum cost one can hope to obtain for the remainingN − i steps when acting optimally. The final value is equalto the final cost: V (x, N) = `N (x), and the rest is computedrecursively going backwards in time along the trajectory:

V (x, i) = minu

[`(x,u) + V (f(x,u), i+1)].

At every iteration, we compute a second-order approximationof the derivatives of V WRT xi, and derive from it a searchdirection ∆U. We then perform a line search, rolling outmultiple trajectories with control sequences U0 + α∆U fordifferent values of α, and select the best one.

Our algorithm is the control analog of the Gauss-Newtonmethod for nonlinear least-squares optimization: we usea first-order expansion of the dynamics and second-orderexpansion of the cost. For linear dynamics and quadraticcost we can compute the global optimum in a single iteration.However, the cost functions we use are not quadratic in statespace (sec. VII), and the hand dynamics is highly-nonlinear,mostly due to the effects of contact making and breaking aswell as due to the synergy-based actuation model. Therefore,converging to the optimal control U∗ requires multipleiterations of local approximation. However, the system is inmotion while the optimization is being computed. Thereforewe find it more computationally advantageous to update x0after every iteration than to find the optimal control sequencefor a stale initial state.

IV. ADROIT: MANIPULATION PLATFORM

ADROIT is a reconfigurable manipulation platform beingdeveloped for exploring and addressing challenges in dy-namic and dexterous manipulation. The physics model usedin our simulations is based on the ADROIT platform. Westarted our search for an ideal platform from a pneumatic

robot ShadowHand [1], well known for its dexterity andhuman-like morphology. We soon attributed its speed andcompliance bottlenecks to its actuation system consisting ofMcKibben muscles and low flow-rate binary valves.

In [6] we developed a general purpose universal pneumaticactuation for tendons driven systems. Our actuation systemallows us to move the ShadowHand skeleton faster thana human hand (70 msec limit-to-limit movement) with 30msec overall reflex latency. The system is almost frictionsless and so compliant that only 6 grams of external force atthe fingertip displaces it when the system is powered. Thiscombination of speed, force and compliance is a prerequisitefor dexterous manipulation, yet it has never before beenachieved with a tendon-driven system, let alone a systemwith 24 dof and 40 tendons.

Fig. 1. The ADROIT platform (ceiling mount)

ADROIT (figure 1) is a reconfigurable platform with 24dof hand mounted on a 4 dof arm. All the joints canbe actuated independently by its two exclusive opposingcylinders; with the exception of DIP and PIP joints of thefour fingers that are mutually coupled.

The skeleton is mostly dominated by the Shadowhandskeleton with modifications to accommodate our pneumaticactuation system. Other major modifications include, thereinforcement of base joint to support upside down mounting.A low level driver written in ’C’ talks to the system. Thesystem is capable of sampling all the tendons length andpressure sensors at 9000Hz. A PIC microprocessor embed-ded in the palm samples all the joint angle and reports dataat 500Hz via a CAN channel. The pneumatic valves can becommanded at 125Hz.

The entire platform is in a stage of constant evolutionwith the iterative improvements between (a) the hardwarerequirements, as demanded by the controller while synthe-sising dynamic manipulation behaviors in simulation and(b) manipulation capabilities, given the development in ourcontrol strategies and hardware limitations.

V. MODELLING

In order to evaluate the hardware and test our controlstrategies, the ADROIT simulator (figure 2) is developedusing the Mujoco Physics engine. Mujoco [25] is a newphysics engine that works with generalised co-ordinates andsupports a number of unique features including tendons

actuation. Let q denote the vector of joint angles. Mujocorepresents tendons L(q) as path via routing points (i.e. sites),such that it does not penetrate any geometric wrappingobjects (sphere and cylinders). At each time step Mujocoautomatically computes the tendon’s moment arms ∂L(q)

∂q

which links tendon velocities L to the joint velocities q andjoint torques τ to the scalar tension f applied on the tendonby the corresponding linear actuator.

The upper bound on tension f is determined by the typeof actuator connected to the tendons. The lower bound fslackneeds to be carefully maintained to avoid tendon slack.While the upper bound is strict, lower bound depends on thephysical properties of the actuator (like friction, damping,stiction of the actuator and corresponding joints) and thenature of task being performed.

**

COUPLER

EXTENSORFLEXOR

DISTAL

DIP-JOINT

MIDDLE

PIP-JOINT

PROXIMAL

MCP-JOINT

KNUCKLE

2625

45

Fig. 2. ADROIT Simulator visualizations (table mount) Left: A full CADmodel of our 28-DOF platform. Center: An equivalent simplified modelusing capsules to simplify contact detection. The kinematic tree is thesame in both models. Right: Tendon schematics representing DIP-PIP jointcoupling.

Mujoco’s tendon length constraints are exploited to modelPIP and DIP joint coupling. These joints are coupled using atendon network that constraints the DIP flexion to be greaterthat PIP flexion. Figure 2 illustrates how this coupling ismodelled in Mujoco using a soft tendon length constrain.

The simulator supports two different actuator mechanisms.First, a linear actuator that directly acts on a tendon toproduce tension. Second, third order pneumatic actuator thatuses pressure dynamics to produce tendon tension. [26]presents the modelling results of pressure dynamics of ourmuscle assembly (pneumatic cylinder, pipeline and valves).We present a generic parametric model of pressure dynamicsthat allow us to predict pressure upto 5 seconds in thefuture (given the current state and future voltage trajectories).The linear actuator provides a nice abstraction secludingpneumatics from the rest of the robot. Since our pres-sure dynamics are 3rd order, secluding pneumatics removespressure from the list of state variables, thus reducing thedimensionality of the state space. This reduction is extremelyhelpful while exploring hardware capabilities independent ofpneumatic actuation.

The dynamic manipulation results and the typing resultsmentioned in the paper were generated using linear actua-tors.

VI. BEHAVIOR SYNTHESIS

Our approach for synthesising dynamic hand manipulationbehaviors is to use MPC, also known as online trajectory-optimization or receding-horizon control. Each MPC itera-tion begins with updating the state of the system using thelatest estimates from the estimator. An estimator processesreadings from all the sensors and maintains a coherentestimate of the state at all times. A single iteration of atrajectory-optimization algorithm is applied starting from thisstate, and the initial part of the resulting policy is applied tothe system until the process can be repeated and the policyupdated again. For MPC to succeed, large updates to thepolicy must be made in each step so that the optimizercan “keep-up” with the changing state. This means that theregularization parameter µ, which slows the optimizer shouldbe small and carefully chosen.

A. StateADROIT platform has 28 dof and is actuated using 48

pneumatics cylinders. Depending on wether linear or 3rdorder pneumatic actuators are in use, our state space is either56 (q + q for 28 joints) or 104 (q + q for 28 joints + p for48 tendons) dimensional. Manipulation example adds another12 dimensions for the object state and typing on keyboardadds extra 24 dimensions for the keyboard state. WhileAdroit is equipped with effective sensing capabilities (joint,touch, pressure and tendon length sensors) which makes itsstate observable, additional sensing options are required forsensing the bottle and the keyboard. We use the PhaseSpace3D tracking system.

B. Trajectory optimizationWe have demonstrated earlier [7] [8] that iLQG can be

used under MPC setting to generate full body movements inreal time, while the very same approach failed in producingdynamic hand manipulation behaviors.

After careful investigation we attributed its inability tothe nature of optimization space to be navigated and tothe difference in the morphology of hardware and actuationmechanism other than direct torque control. Unlike fullbody movements, large number of dof exists in a verycompact workspace in hand manipulation, resulting in acompact high dimensional optimization landscape. High dofto workspace volume ratio results in large number of contactsthat make and break too often embedding discontinuities inthe already compact high dimensional space. High dexterityallows multiple solution to co-exists inviting optimizers tobe stuck in local minima.

In following section we will address these challenges oneby one and discuss ways to mitigate them.

C. Contact pruningIn MuJoCo, the most intensive part of computing a single

step is the handling of contacts. Unlike full body movementswhere the number of active contacts are small with lowvariance, manipulation with high dof manipulator involveslarge number of active contacts. Self contacts in case of fullbody movements are rarely active and few when active. For

dexterous hands like ours, dof to workspace volume ratio isvery high which results in most of the contacts being activeall the time. Number of contacts with the environment in theformer case is mostly small (feet-ground contacts). Howeverin the later case, object under manipulation interacts withalmost all the links of the hand resulting in large number ofactive contacts. Moreover these contacts make and break toooften producing huge dynamic non-linearities in the searchspace. Computation time is severely affected as numberof contacts (inter-finger and finger-object) increases. Highvariability in the number of contacts introduces variabilityin computation timings and hence policy lags.

For real-time behaviours synthesis, mesh collisions werenever an option. Even simple geometric collision modelsusing capsule resulted in too many active contacts (40 onaverage) for real time behaviours. To speed up the computa-tions, contact space was very carefully populated consideringkinematic constraints and joint couplings. Furthermore, na-ture of these contacts were carefully picked. Mujoco supports3 types of contacts: 1-D contacts, 3-D contacts, 6-D contacts.1-D contacts are only capable of producing normal forces.All inter-finger contacts are 1-D contacts. 3-D contacts arecapable of producing normal and surface friction. In additionto normal and surface friction, 6D contacts produces rollingand torsional friction. All object finger contacts are 3Dcontacts, except finger tip-object contacts which are 6D.

It should be noted that we never made compromises whilepruning the contact space. If violations were ever found, weimmediately add the respective contacts back.

D. Passive springs and armature inertiaLow-weight finger segments, strong and low friction ac-

tuators provide ADROIT its unique combination of speed,strength and compliance [6]. Such a combination is desirablefor any robot hardware but not-so-desirable for numericaloptimizers and naive controllers. Numerical optimizers enjoyaccelerating the light weight segments producing behavioursthat are either unsafe or look unnatural. Weak joint springswere added around the resting configuration of the hand (asshown in Figure 2) in the model that was available to theoptimizers for planning. These springs acted as a passiveattractors towards the resting configuration of the hand andprevented optimizers from choosing unnatural behaviours.Behavioural artifacts resulting from high acceleration weremitigated by adding armature inertia to the joints propor-tional to the link masses. It not only removed the artifactsbut also improved convergence thereby improving the qualityof the solution.

E. Synergy spacesAnimal life form exhibits complex and diverse set of

behaviours. These behaviours are produced by numerousbio-mechanical muscles that transmit force using skeletaltendons. Unlike the rest of the body, hands exhibit com-plicated tendon network that span across multiple jointsthat introduce substantial coupling between joints. For fullbody movements, its possible to have individual controlover each joints. Kinematics constraints and a regulariser

around default body posture have long been exploited tosynthesize movements close to animal life forms. However,these tricks were not enough for synthesis of human like handmovements. Optimization rarely produces desired behaviorsand if it does, it produces a solution that is too rich to beexhibited by human hands.

To make the optimization space more tractable for numer-ical optimizers, we introduced five hand synergy dimension[23] on top of individual control dimensions and made themslightly cheaper for the optimizer to choose them. Firstfour synergies individually influence the curl of the fourfingers and the fifth synergy influences the spread of thefour fingers. Addition of synergy dimensions increased thespace of control dimensions by five but made the problemmore tractable for the optimizer, without compromising thecontrol over individual joints (hence dexterity) of the hand.Synergies help the optimizer to quickly find a pre-graspingpose, after which individual joints dominate to produceexpressive behaviours.

This is for the first time that iLQG has been demonstratedto be amenable in synergy spaces. Traditionally synergyspaces have always been exploited as a mechanism for di-mensionality reduction to make a high dimensional problemtrackable. However, here we are adding synergy dimensionswithout removing existing dimensions, thus expanding thedimensionality of an already high dimensional problem.This is a little counterintuitive. We found that the optimizerexploits synergy spaces to quickly navigate through the spacefull of nonlinearities and discontinuities to localize itselfin the correct neighbourhood, where dimensions other thansynergies (i.e. individual joint actuation) dominate to produceexpressive behaviours.

VII. COST TERMS

Synthesis of behaviours involves specification of highlevel cost function that encodes the task objectives. Allbehaviours presented in the paper were generated usingvery simple high level cost functions. These cost functionare composed of few simple terms with intuitive meaningassociated with each term that talks about the task. Notethat no cost terms outlining the movement of the hand (orgrasp metric) are provided. We focused our efforts on genericbehaviors like dynamic manipulation and object relocationwhich constitutes significant portion of our daily activities.Once the framework was in place it was easily extendiblefor specific tasks like typing. In addition to the regularcontrol penalization, which was common for all behaviors,the following other cost terms were used:

Dynamic manipulation:Dynamic manipulation behaviours include dynamic catch-

ing of a falling objects, grasping and stabilizing of unstableobjects. Cost terms penalizes

• Control synergies. These bear slightly lower penaltythan the rest of the independent actuators.

• Velocity of the object being manipulated.• Distance of the object to its desired goal configuration

(position and orientation)

• The last cost term, with a small coefficient penalizes thedistance of the object from the center of hand’s graspenvelop.

Ideally, if the MPC horizon is long enough, we will not needthe last cost term. In practice, real-time constraints restrictsus from using very long horizon. In absence of this term,if the object is far away, the optimizer will not be able tofind an improvement within the given time horizon. In suchcases, this term acts as hint term for the hand to make aninitial approach towards the object. We used a static pointequidistant from - the palm base, finger’s (Index, middle, ringand little) middle segments and thumb’s distal segment as thecenter of the grasp envelop. Once the initial approach hasbeen achieved, the first three terms dominate. Note that weused no grasp-specific costs promoting force closure or othersupposedly desirable grasp properties. All the behaviors seenin the movie [goo.gl/WPTjcS]emerged from these simplecost terms.

Object relocation:

All costs for relocation were same as for dynamic ma-nipulation except object desired configuration cost. Desiredconfiguration was exposed as a parameter to the user. Userswere allowed to change this cost by dynamically changingdesired configuration in middle of the simulation, resultingin interactive grasping and relocation.

Object relocation starts as a dynamic stabilization se-quence, but with a pre-specified intermediate goal configu-ration, called the ‘hold configuration’. Any relaxed positionof the hand where it can stably hold the object away fromall the obstacles can be used as the hold configuration.The hand waits at the hold configuration for the user tospecify the final desired relocation configuration, after whichthe final relocation maneuver is attempted. If the desiredconfiguration has been preemptively specified by the user, thehand immediately exits the hold configuration and attemptsto position the object at the desired configuration. Once theuser is satisfied with the final configuration of the objecthe/she triggers the release, which switches off the grasp costresulting in hand gracefully leaving and moving away fromthe object once its stable.

Typing:

This behavior include interactively typing specific num-bers on a number pad. Keys on number pad are spring loadedto make the problem challenging. Key press is not detecteduntil the key is fully pressed. Typing results exploit keyboardheat maps to pre-allocate key-finger pair. One rational behindthis assumption is that although every individual has his/herown typing preferences, key-finger pairs are almost static.Downside of this approach is that if fingers get stuck inlocal minima, alternative options will never be explored bythe optimizer. Behaviors include cost on

• Desired key pressed• Desired key approach by the assigned finger’s tip.• Next key hover by the next assigned finger: This cost

encouraged the next assigned finger, to make the initial

approach towards its key if not being used thus speedingup the typing.

• Auto-correct. This terms kicks-in when accidental typ-ing mistakes are made. A back space is pressed beforemoving ahead.

Our approach towards typing is event driven. Once a keypress is detected, the typing sequences increments and theentire plan is recomputed for the next assignment. In otherwords, optimizer is not able to plan through the transitions.To allow the optimizer to plan through the transition, anadditional cost term ‘next key press’ is appended. Smoothstep-up Sup and step-down Sdn = (1 − Sup) functions areused as cost coefficients for ‘desired key press’ and ‘next keypress’ respectively. When key event is detected the steps areswitched which allows the cost to smoothly transition fromkey hover to key press.

Sup(t) =1

2(

(t− a1)√(t− a1)2 + b21

− (t− a2)√(t− a2)2 + b22

)

where t is time, a1 and a2 are the step locations, and b1 andb2 are the step smoothness.

VIII. RESULTS AND DISCUSSIONS

Fig. 4. ADROIT while typing the key-1 on the numeric keypad

We will now present the real time interactive behavioursynthesis results for ADROIT in general tasks like dynamicmanipulation, object relocation and specific tasks like typing.All the results mentioned in this paper are generated usinga Intel Xeon X5690 @3.47 Ghz processor with 12GB ofmemory running Windows7. Typical timings for differentparts of the computation can be found in table I. Optimizationparameters can be found in table II. To better appreciateour results we highly recommend watching the video at-tachment of our paper. Our latest results can be found at[goo.gl/WPTjcS].

A. Hand manipulation behaviorsCharacteristic behaviours in the dynamic hand manipu-

lation sequence involve catching falling objects, graspingobjects from different configurations and stabilization ofunstable objects. Figure 3 shows frames (150ms apart) oftwo agile recovery maneuvers.

Fig. 3. Two sequences from the accompanying movie. The time between consecutive frames is 150ms. Both sequences show an agile recovery manoeuver.The top sequence begins with a back-handed fumble of the object, which is then caught and brought into the desired position. In the second sequence theinitial grip on the object is not robust and the object begins to slip. The controller releases the slipping object and re-grips it in mid-flight.

Use of carefully picked contacts pairs of different naturehelped reduce the average number of active contacts from 40to 20 which provided a major boost in simulation timings.For dynamic grasping, we observed that our optimizers enjoyslightly smoother contacts over the hard ones. Large numberof frequently changing contacts introduce discontinuities inthe search space. Smoother contacts makes these disconti-nuities better behaved for optimisers to plan through them.Use of armature inertia to tame large accelerations workedmuch better than trying to force it using velocity cost. Thesuccess rate for dynamic manipulation was around 80%. Forunsuccessful cases, either the hand gets stuck in some localminima and never emerges from it or is unable to make agood grasp due to faulty initial approach.

B. Typing on a numeric keypadFigure 4 shows a snapshot from the typing sequence while

ADROIT is pressing the first key. Average number of activecontacts for typing stayed relatively low (5 on average,mostly inter finger). Our typing worked fairly well. Thoughrare, we did observe the fingers getting stuck in local minimafor some sequences. Due to preassigned key-finger pairs,optimizers don’t even explore alternative approaches to makeprogress, when stuck. Naive approach to get out of the localminima will be to momentarily pause typing and relax allfingers before moving ahead. We leave key-finger assignmentproblem as future work. Mistyping due to accidental contactof finger with the edge of the adjacent key is also observed.

IX. FUTURE DIRECTIONS

Our motivations and goals are to implement dynamic handmanipulation on our Adroit hardware platform. To the bestof our capabilities, utmost measures have being taken to notmake any assumptions in our simulation results that will notbe valid on the real hardware. We are in process of iteratively

TABLE IMPC TIMINGS (SEC) FOR INTEL XEON X5690 @3.47 GHZ, 12GB

MEMORY, WINDOWS7.

Timings Dynamic Manipulation TypingTotal 0.059 0.047Policy lag 0.050 0.038Rollout 0.006 0.004Derivatives 0.036 0.025Cost computations 0.001 0.001Backpass 0.010 0.010Line search 0.007 0.006

TABLE IIOPTIMIZATION PARAMETERS

Specifications Dynamic Manipulation TypingSimulation dt (sec) 0.005 0.001Optimizer dt (sec) 0.005 0.020Horizon (sec) 0.300 0.640Mindist(m) 0.010 0.001Contact softness(m) 0.005 0.001

adapting our simulation results to the capabilities of thehardware and work on the hardware bottlenecks identifiedfrom the simulation results. Strength of the shoulder jointwas one of the major hardware bottleneck that was identifiedand we are currently addressing it.

In future, we intend to carefully investigate application ofMPC and other control strategy on ADROIT. Initial testingof our behavior synthesis framework on the actual hardwareindicates sensitivity to modelling errors. We are performingcareful system identification to obtain a good model of thesystem. Such a model will never be perfect and we mighthave to explore options with better tolerance to modelingerrors.

ADROIT has the capability to emulate biological muscles.

We want to experiment with muscles models from the bio-mechanics literate to check if they have something to offerto the robotics community.

With respect to individual behaviors, our approach towardstyping is shortsighted as the optimizer only sees and plansfor present and the next key in the typing sequence, evenif the entire sequence is specified in advance. One option isto incorporate a phase variable (encoding the developmentmade by the trajectory) in the optimization and let theoptimizer choose the number of key presses and plan throughit. Piano playing is another problem with temporal objectivethat will require specific attention to key-finger assignmentproblem.

X. ACKNOWLEDGEMENT

This work was supported by the NSF and the NIH.

REFERENCES

[1] Shadow Robot Company, www.shadowrobot.com.[2] S. Haidacher, J. Butterfass, M. Fischer, M. Grebenstein, K. Joehl,

K. Kunze, M. Nickl, N. Seitz, and G. Hirzinger, “DLR hand ii:hard- and software architecture for information processing,” Roboticsand Automation, 2003. Proceedings. ICRA ’03. IEEE InternationalConference on, vol. 1, pp. 684–689 vol.1, Sept. 2003.

[3] A. Deshpande, Z. Xu, M. Weghe, L. Chang, B. Brown, D. Wilkinson,S. Bidic, and Y. Matsuoka, “Mechanisms of the anatomically correcttestbed (act) hand,” IEEE/ASME Trasactions on Mechatronics, 2011.

[4] C. Joel, M. Lau, G. Cheung, J. Kuffner, J. Hodgins, and T. Kanade,“Footstep planning for the honda asimo humanoid,” in IEEE Interna-tional Conference on Robotics and Automation, 2005.

[5] “Simulation and control of biped walking robots,” http://mediatum.ub.tum.de/doc/997204/997204.pdf.

[6] V. Kumar, Z. Xu, and E. Todorov, “Fast, strong and compliantpneumatic actuation for dexterous tendon-driven hands,” in IEEEInternational Conference on Robotics and Automation, 2013.

[7] Y. Tassa, T. Erez, and E. Todorov, “Synthesis and stabilization of com-plex behaviors through online trajectory optimization,” in Conferenceon Intelligent Robots and Systems, 2012, pp. 4906–4913.

[8] T. Erez, K. Lowrey, Y. Tassa, V. Kumar, S. Kolev, and E. Todorov, “Anintegrated system for real-time model predictive control of humanoidrobots,” in International Conference on Humanoid Robots, 2013.

[9] “Grasp quality measures,” http://personalrobotics.ri.cmu.edu/courses/papers/SuarezEtal06.pdf.

[10] A. T. Miller and P. K. Allen, “Graspit! a versatile simulator for roboticgrasping,” Robotics & Automation Magazine, IEEE, vol. 11, no. 4, pp.110–122, 2004.

[11] I. Mordatch, Z. Popovic, and E. Todorov, “Contact-invariant opti-mization for hand manipulation,” in Proceedings of the ACM SIG-GRAPH/Eurographics Symposium on Computer Animation, 2012.

[12] A. Bicchi and V. Kumar, “Robotic grasping and contact: A review,”in Robotics and Automation, 2000. Proceedings. ICRA’00. IEEEInternational Conference on, vol. 1. IEEE, 2000, pp. 348–353.

[13] A. M. Okamura, N. Smaby, and M. R. Cutkosky, “An overview of dex-terous manipulation,” in Robotics and Automation, 2000. Proceedings.ICRA’00. IEEE International Conference on, vol. 1. IEEE, 2000, pp.255–262.

[14] K. Yamane, J. J. Kuffner, and J. K. Hodgins, “Synthesizing animationsof human manipulation tasks,” in ACM Transactions on Graphics(TOG).

[15] A. Safonova and J. K. Hodgins, “Construction and optimal search ofinterpolated motion graphs,” in ACM Transactions on Graphics (TOG).

[16] Y. Ye and C. K. Liu, “Synthesis of detailed hand manipulations usingcontact sampling,” ACM Transactions on Graphics (TOG).

[17] J. J. Kutch and F. J. Valero-Cuevas, “Challenges and new approachesto proving the existence of muscle synergies of neural origin,” PLoScomputational biology, vol. 8, no. 5, p. e1002434, 2012.

[18] M. C. Tresch and A. Jarc, “The case for and against muscle synergies,”Current opinion in neurobiology, 2009.

[19] M. Santello, M. Flanders, and J. F. Soechting, “Postural hand synergiesfor tool use,” The Journal of Neuroscience, 1998.

[20] M. C. Tresch, P. Saltiel, and E. Bizzi, “The construction of movementby the spinal cord,” Nature neuroscience, 1999.

[21] A. d’Avella, P. Saltiel, and E. Bizzi, “Combinations of musclesynergies in the construction of a natural motor behavior,” Natureneuroscience, 2003.

[22] D. B. Lockhart and L. H. Ting, “Optimal sensorimotor transformationsfor balance,” Nature neuroscience, 2007.

[23] E. Todorov and Z. Ghahramani, “Analysis of the synergies underlyingcomplex hand manipulation,” in Engineering in Medicine and BiologySociety, 2004. IEMBS’04. 26th Annual International Conference of theIEEE. IEEE, 2004.

[24] D. H. Jacobson and D. Q. Mayne, Differential Dynamic Programming.Elsevier, 1970.

[25] E. Todorov, T. Erez, and Y. Tassa, “Mujoco: A physics engine formodel-based control,” in Intelligent Robots and Systems (IROS), 2012IEEE/RSJ International Conference on. IEEE, 2012.

[26] T. Yuval, T. Wu, J. Movellan, and E. Todorov, “Modeling andidentification of pneumatic actuators,” in International Conference onMechatronics anf Automation (ICMA), 2013.

Real-time behaviour synthesis for dynamic...

Documents

Transcript of Real-time behaviour synthesis for dynamic...