Replication of Human Two-Fingered Grasping Motions in Robotic Manipulation of Household Objects
Andrew Vo, Stephen Moseson, and Matthew Cong
Abstract— In our project, we consider the problem of replicating human two-fingered grasping motions on the AX-12 Smart Robotic Arm. Two-fingered grasping motions are among the most common motions used in everyday life. Examples include picking up a plate from a dishwasher, moving a chess piece, and, as we will demonstrate, opening a drawer. In fact, many more complicated motions, such as holding a pen or pencil and writing, involve a two-fingered grasping component. To accomplish this goal, we designed a sensor mount positioned behind the robotic arm that accommodates a variety of sensors, including a 720p high-definition webcam. Then, using SIFT features, we locate in the webcam image the feature associated with the drawer that we want to open and close, defined by a point in the image. Using prior calibration and bilinear interpolation, we convert this coordinate into global Cartesian space. This coordinate is subsequently fed into the analytical inverse kinematics service, which replies with the joint coordinates. These are then given to the controller, which moves the arm to the desired location. Finally, using a series of movements, we can perform our task. In particular, this project takes advantage of ROS's service/client and publisher/subscriber models. These paradigms form the basis for communication between the different components of the project.
I. INTRODUCTION
Two-fingered grasping motions involving the thumb and the index finger are some of the most common motions used in everyday tasks such as picking up a plate from a dishwasher, moving a chess piece, and opening a drawer. Moreover, the two-fingered grasp usually plays a fundamental part in more complicated motions such as
Authors are enrolled at Cornell University in the CS 4758 Robot Learning class taught by Professor Ashutosh Saxena in the Department of Computer Science, Cornell University, Ithaca, NY 14853, [email protected]
Andrew Vo is a Master of Engineering student in Electrical and Computer Engineering [email protected]
Stephen Moseson is a senior in Mechanical Engineering with a minor in Computer Science [email protected]
Matthew Cong is a junior in Computer Science and Engineering Physics [email protected]
Bilinear Interpolation and Camera Clicker code written by Daniel Cabrini-Hauagge, Ph.D. student in the Department of Computer Science at Cornell University [email protected]
Robot and software platform set up and maintained by Jigar J. Shah, senior in the School of Electrical and Computer Engineering [email protected]
writing, using chopsticks, and even pitching a baseball. It is easy to see that many of these motions apply to everyday household tasks, which are often repetitive and could, technical challenges aside, be successfully performed by a robot. However, the technical problems that present themselves in the design of such a robot are extremely varied and difficult.
Consider what occurs when you move a rook during a game of chess when the only other pieces left on the board are the opposing kings. Using information from two eyes, the brain processes the images that are focused on each retina and obtains the location of the rook. The brain subsequently, through experience gained during childhood and throughout life, instructs your arm to move to that position. However, this position is reached by changing the angles within each of our numerous joints. The shoulder, elbow, wrist, and the other more minor joints must be at the correct locations to ensure a successful grasp of the object.
In a robot, this first involves obtaining a Cartesian coordinate in global space. The analogous robot problem takes place in an environment without obstacles where the robot has some knowledge about the coordinates of its surroundings. This is much more difficult because with a single camera, as in our setup, we can only obtain a projection of the three-dimensional world. Thus, we must extrapolate based on known points in order to obtain a three-dimensional global coordinate from the two-dimensional image. Obtaining the 2D coordinate on which the object is located is an additional problem. We must find the features that are unique to the object in the image and then obtain a representative point, usually the center. This is an extremely difficult problem, but fortunately, algorithms such as SIFT have been developed that provide solutions which can be adapted for our needs. Then, we must determine the joint angles for each joint in the robot that will move the end-effector to the desired position. This requires solving the inverse kinematics problem, which, unlike the forward kinematics problem, is non-linear and usually has multiple solutions depending on the robot and the different constraints. Subsequently, we must use
a controller to send the correct instruction bits to each individual servo in order to move to the desired point.
Thus, we can see that a simple everyday task for a human is extremely challenging for a robot to perform, even in a structured environment with no obstacles. This paper outlines the related work and describes our approach to the problem. This framework is tested in the experiments described in Section IV. Lastly, we summarize the results of the experiments and possible extensions.
II. RELATED WORK
Grasping objects is a widely studied area in robotics. However, the specific task of opening drawers has not been studied extensively. The paper “Learning to Grasp Novel Objects Using Vision” [1] provides a good approach to the challenge of determining where to grasp unknown objects using only a webcam. In our project we also rely only on a webcam; however, instead of using machine learning to classify grasping points, we rely on the known structure of the drawers and use the SIFT algorithm to determine their location.
The control of our arm is also similar to the control of the 5-DOF Katana arm used on the STAIR platform. The authors describe two classes of grasps that the arm can take: “downward” and “outward”. The downward grasp is taken when objects are close, and the outward grasp is for objects that are farther away from the arm. We have adopted this control scheme with our arm and choose to always take an outward grasp because the drawers must be located away from the arm in order to open them once they are grasped.
III. APPROACH
A. Hardware
The AX-12 Smart Robotic Arm (Figure 1) with a dual gripper has six degrees of freedom consisting of a base swivel, shoulder pivot, elbow pivot, wrist swivel, and two independently controlled gripping fingers. It also has an angled fixed joint connected to the base swivel. The AX-12 servos have been a major issue in this project because of torque limitations: if the arm is in a horizontal, fully extended position, the shoulder joint is unable to lift the arm up. This is a major problem when working with drawers because we need the arm in a fully extended position in order to reach a drawer.
The only sensor we use is a 720p Microsoft webcam positioned above the arm looking down at a 45-degree angle, as shown in Figure 2. It is attached to a custom mount designed to hold it rigidly in place with a clear view of the workspace.
Fig. 1. AX 12+ Smart Robotic Arm with Dual Gripper
B. Controller
The low-level control is based upon a ROS package developed by the University of Arizona Robotics Research Group. It provides low-level commands for controlling the joint servos on the AX-12 Smart Robotic Arm. Additional work was done in modifying the controller architecture to accommodate smoother movements of the robotic arm.
Given a set of waypoints generated by the path planner, the controller is responsible for smoothly moving the arm between each waypoint. In order to generate smooth movements, the joints move simultaneously rather than sequentially. The two controller architectures investigated are the joint-space controller and the resolved motion-rate controller.
The joint-space controller uses feedback to control each joint independently. As shown in the system diagram (Figure 3), the joint-space controller incorporates an independent controller for each joint. In turn, each of these controllers can be implemented as a PD controller that drives the joint angle error to zero.
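The per-joint PD loop can be sketched as a minimal simulation. The gains and the double-integrator joint model below are illustrative placeholders, not the actual AX-12 servo dynamics:

```python
import numpy as np

def pd_joint_step(theta, theta_des, theta_dot, kp=4.0, kd=1.0):
    """One PD step per joint: command proportional to the joint angle
    error, damped by the joint velocity (gains are illustrative)."""
    return kp * (theta_des - theta) - kd * theta_dot

# Each of the five joints is driven independently toward its setpoint.
theta = np.zeros(5)
theta_dot = np.zeros(5)
theta_des = np.array([0.3, -0.2, 0.5, 0.0, 0.1])
dt = 0.01
for _ in range(3000):  # 30 s of simulated time
    u = pd_joint_step(theta, theta_des, theta_dot)
    theta_dot += u * dt  # double-integrator stand-in for the servo
    theta += theta_dot * dt
```

Because each joint runs its own loop, tuning is simple, but the joints know nothing about each other; coordination comes only from feeding all joints their waypoint targets simultaneously.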
The resolved motion-rate controller relies on the inverse of the kinematics model to drive the joint state to the desired position. The Jacobian of the arm can be derived from the forward kinematics equation. The following equation expresses how variations in the joint state map to variations in the end-effector state.
δx = J(θ)δθ (1)
Since the dimension of the joint space is greater than the dimension of the end-effector state, one can solve for δθ by applying the pseudo-inverse.
δθ = J†δx (2)
Fig. 2. Webcam is located above and behind the robot arm looking down at a 45-degree angle.
The resulting pseudo-inverse solution is the minimum-norm joint-state change that produces the desired end-effector variation. A simple control law can be derived based upon this result.
θ(new) = θ(old) + J†δx (3)
where δx = xd − x is the error between the desired end-effector position and the measured end-effector position.
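The update law θ(new) = θ(old) + J†δx can be sketched on a stand-in kinematic model. The planar 3R arm and link lengths below are illustrative, not the real AX-12 chain, and the 0.5 damping factor is added for numerical robustness:

```python
import numpy as np

def fk(theta, l):
    """End-effector (x, y) of a planar 3R arm -- a stand-in model,
    not the actual AX-12 kinematics."""
    a = np.cumsum(theta)
    return np.array([np.sum(l * np.cos(a)), np.sum(l * np.sin(a))])

def jacobian(theta, l):
    """2x3 Jacobian: how joint variations map to end-effector variations."""
    a = np.cumsum(theta)
    J = np.zeros((2, len(theta)))
    for i in range(len(theta)):
        # column i: effect of joint i on (x, y) via all distal links
        J[0, i] = -np.sum(l[i:] * np.sin(a[i:]))
        J[1, i] = np.sum(l[i:] * np.cos(a[i:]))
    return J

# theta(new) = theta(old) + J^+ dx, iterated with a damping factor
l = np.array([1.0, 0.8, 0.5])        # placeholder link lengths
theta = np.array([0.2, 0.3, -0.1])   # placeholder start configuration
x_des = np.array([1.2, 1.0])
for _ in range(200):
    dx = x_des - fk(theta, l)
    theta = theta + 0.5 * np.linalg.pinv(jacobian(theta, l)) @ dx
```

Since the joint space (3) is larger than the task space (2), `np.linalg.pinv` picks the minimum-norm joint change at each step, mirroring Eq. (2).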
C. Inverse Kinematics
In the AX-12 Smart Robotic Arm, we have a cylindrically rotating base with a two-degree-of-freedom manipulator. This arrangement allows the inverse kinematics problem to be solved analytically. Consider a problem similar to obtaining joint coordinates from the global Cartesian coordinates: obtaining joint coordinates from the cylindrical coordinates in the global frame. Looking at the problem from this angle, we can see that ρ, the radial length, is only determined by the 2R
Fig. 3. Joint-space controller system diagram
Fig. 4. Resolved motion-rate controller system diagram
section of the arm through a projection based on the fixed 45° angle of the fixed joint. A similar projection allows us to determine the z-coordinate contribution from the 2R section of the arm. In addition, the ratio of the x coordinate to the y coordinate is determined by φ, the azimuthal angle. This provides us with one of the joint coordinates: the base angle θ1 = φ. The process for obtaining the other two joint angles is more involved.
Thus, our strategy is as follows. Using the given x and y coordinates, we can easily determine ρ = √(x² + y²) and φ = arctan(y/x). Then we can
subtract the contribution to the z coordinate from the base section of the arm and the section of the arm from the fixed joint to the shoulder joint. This leaves us with the z_adj coordinate, which corresponds to the contribution to the z coordinate from the 2R section of the arm. Then, we can obtain ρ_adj by subtracting the contribution of the diagonal section of the base. This effectively decouples our problem. All that is left to determine is θ2, the angle of the shoulder joint, and θ3, the angle of the elbow joint.
To simplify the problem, we rotate the coordinate system by 45° counter-clockwise. This allows the equilibrium position of the arm to lie on the horizontal axis of the two-dimensional (ρ′, z′) space. Applying the law of cosines to the 2R section of the arm (similar to the approach in [2] with different angle conventions), we obtain
cos θ3 = (ρ′² + z′² − l1² − l2²) / (2 l1 l2)    (4)
which gives us a closed-form expression for θ3, where l1 and l2 represent the lengths of the shoulder and elbow links respectively. However, because of the numerical inaccuracy of the arccos operation near ±1, we instead employ the half-angle formula tan²(θ/2) = (1 − cos θ)/(1 + cos θ). Thus, we obtain
θ3 = ±2 arctan √[ (−ρ′² − z′² + (l1 + l2)²) / (ρ′² + z′² − (l1 − l2)²) ]    (5)
where the plus-minus sign represents the “elbow-up” and “elbow-down” solutions. Then, by subtracting the contribution of the elbow section of the arm, we can obtain

θ2 = arctan[ (z′(l1 + l2 cos θ3) − ρ′ l2 sin θ3) / (ρ′(l1 + l2 cos θ3) + z′ l2 sin θ3) ]    (6)
Based on a flag, we can choose either the “elbow-up” or the “elbow-down” solution to suit the task we are performing.
One of the largest advantages of this approach is its speed: we do not need an iterative solution, which takes significantly more time to run. This allows us to recalculate joint angles frequently when needed. However, this does not take into account the position of the gripper in the end-effector. In the experiments described in Section IV, we treat this as a small perturbation of the inverse kinematics problem due to the limited surface area of the gripper arm.
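The 2R portion of the solution, Eqs. (4)-(6), can be sketched directly in code. The default link lengths below are placeholder values rather than the measured AX-12 dimensions, and `elbow_up` is our own name for the branch-selection flag:

```python
import numpy as np

def ik_2r(rho, z, l1=0.15, l2=0.15, elbow_up=True):
    """Closed-form 2R inverse kinematics in the rotated (rho', z') plane.
    Uses the half-angle form, Eq. (5), for theta3, then Eq. (6) for theta2.
    Link lengths default to placeholder values (meters)."""
    # tan^2(theta3/2) = ((l1+l2)^2 - rho^2 - z^2) / (rho^2 + z^2 - (l1-l2)^2)
    half = np.sqrt((-(rho**2) - z**2 + (l1 + l2)**2) /
                   (rho**2 + z**2 - (l1 - l2)**2))
    theta3 = 2 * np.arctan(half) if elbow_up else -2 * np.arctan(half)
    # Eq. (6): subtract the elbow link's contribution to recover theta2
    c3, s3 = np.cos(theta3), np.sin(theta3)
    theta2 = np.arctan2(z * (l1 + l2 * c3) - rho * l2 * s3,
                        rho * (l1 + l2 * c3) + z * l2 * s3)
    return theta2, theta3
```

A quick consistency check is to run the 2R forward kinematics, ρ′ = l1 cos θ2 + l2 cos(θ2 + θ3) and z′ = l1 sin θ2 + l2 sin(θ2 + θ3), on the returned angles and confirm the original (ρ′, z′) is recovered.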
D. SIFT features for Drawer Identification
The SIFT algorithm is used to locate the pixel location of a given drawer in the webcam image. Each drawer has a unique image taped to its front, which can be located by matching SIFT features between the target image and the camera image. The SIFT algorithm was chosen because it works under different lighting conditions and camera angles.
The algorithm was implemented in ROS using the SIFT binaries created by David Lowe. These were run by a C++ wrapper that first created the SIFT features for both the image key and the camera output and then checked for matches between the images. If even one match was found between the images, the program would use the location of the match in each image, along with the dimensions of the key, to determine the center of the key in the image, as shown in Figure 5. It would then
Fig. 5. Image keys were compared with the camera image to findmatching features using the SIFT algorithm.
Fig. 6. Mapping from Image Space to Cartesian Space
publish the pixel x and y coordinates so that other ROS nodes could use the drawer coordinates.
The SIFT image keys used on the drawers were scenes in nature that were free of patterns or regular shapes. These types of images work well with SIFT because they generate many unique features that can be matched to the camera feed. The matching tolerance was set strictly so that no false matches were identified; however, this also made it more difficult to ensure a match in difficult lighting or at an oblique angle.
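The center-recovery step from a single match reduces to simple pixel arithmetic. A minimal sketch (assuming the key appears in the camera image at roughly the same scale and orientation, as with our fixed camera mount) is:

```python
def key_center_in_scene(key_pt, scene_pt, key_size):
    """Estimate the pixel center of a drawer key in the camera image from
    one SIFT correspondence: key_pt is the matched feature's (x, y) in the
    key image, scene_pt is its (x, y) in the camera image, and key_size is
    the key's (width, height). Assumes no scale or rotation change."""
    (kx, ky), (sx, sy) = key_pt, scene_pt
    w, h = key_size
    # the key's center sits at offset (w/2 - kx, h/2 - ky) from the match
    return (sx + (w / 2.0 - kx), sy + (h / 2.0 - ky))
```

With more than one match available, averaging the per-match estimates would reduce the effect of any single mislocalized feature.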
E. Camera Calibration using Bilinear Interpolation
The camera needs to be calibrated in order to convert image pixel coordinates to Cartesian coordinates in 3D space. Since all of the tasks we consider are performed on a plane, bilinear interpolation provides a good solution. Consider the mapping of a point in the image space to a point in the Cartesian space shown in Figure 6.
We assume the image points P0^(i), …, P3^(i) and the corresponding Cartesian points P0^(c), …, P3^(c) both form quadrilaterals. Using bilinear interpolation, any point within the boundary can be expressed as a linear combination of these calibration points. We can express the coordinates of the image point P^(i) as a function of two parameters 0 ≤ s, t ≤ 1.
x^(i) = (1 − s)((1 − t) x0^(i) + t x2^(i)) + s((1 − t) x1^(i) + t x3^(i))    (7)

y^(i) = (1 − s)((1 − t) y0^(i) + t y2^(i)) + s((1 − t) y1^(i) + t y3^(i))    (8)
Solving for t yields

t = [(1 − s)(x0^(i) − x^(i)) + s(x1^(i) − x^(i))] / [(1 − s)(x0^(i) − x2^(i)) + s(x1^(i) − x3^(i))]    (9)

t = [(1 − s)(y0^(i) − y^(i)) + s(y1^(i) − y^(i))] / [(1 − s)(y0^(i) − y2^(i)) + s(y1^(i) − y3^(i))]    (10)
Eliminating t from these two equations yields a quadratic equation in s. The second-order polynomial is written in Bernstein form for numerical stability when solving the system of equations.
A(1 − s)² + 2B(1 − s)s + C s² = 0    (11)
where the constants A,B,C are given by
A = (P0^(i) − P^(i)) × (P0^(i) − P2^(i))    (12)

B = ½ ((P0^(i) − P^(i)) × (P1^(i) − P3^(i)) + (P1^(i) − P^(i)) × (P0^(i) − P2^(i)))    (13)

C = (P1^(i) − P^(i)) × (P1^(i) − P3^(i))    (14)
where × denotes the 2D scalar cross product (e.g. P0 × P1 = x0 y1 − y0 x1). The solution to the quadratic equation is given by
s = ((A − B) ± √(B² − AC)) / (A − 2B + C)    (15)
When A − 2B + C = 0, the quadratic equation reduces to a linear one and the solution is given by

s = A / (A − C)    (16)
With s and t known, the coordinates of the desired Cartesian point P^(c) are given by

x^(c) = (1 − s)((1 − t) x0^(c) + t x2^(c)) + s((1 − t) x1^(c) + t x3^(c))    (17)

y^(c) = (1 − s)((1 − t) y0^(c) + t y2^(c)) + s((1 − t) y1^(c) + t y3^(c))    (18)

z^(c) = (1 − s)((1 − t) z0^(c) + t z2^(c)) + s((1 − t) z1^(c) + t z3^(c))    (19)
Fig. 7. Specifying the calibration points.
The user calibrates the camera by specifying the four points in the image space, as shown in Figure 7. The user also provides the corresponding Cartesian coordinates for each of these four points. The program then takes an image pixel coordinate as input and produces a Cartesian coordinate. This Cartesian coordinate can be passed to the inverse kinematics service to produce the joint angles that move the arm to the desired drawer position.
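The whole pixel-to-Cartesian step, Eqs. (9)-(19), can be sketched as one routine. Corner ordering follows Eqs. (7)-(8): P0 at (s, t) = (0, 0), P1 at (1, 0), P2 at (0, 1), and P3 at (1, 1):

```python
import numpy as np

def cross2(a, b):
    """2D scalar cross product used in Eqs. (12)-(14)."""
    return a[0] * b[1] - a[1] * b[0]

def image_to_cartesian(p, img_pts, cart_pts):
    """Map image pixel p to a Cartesian point, given the four image
    calibration points img_pts and their Cartesian counterparts cart_pts."""
    p = np.asarray(p, float)
    P0, P1, P2, P3 = (np.asarray(q, float) for q in img_pts)
    # Eqs. (12)-(15): solve the Bernstein-form quadratic for s
    A = cross2(P0 - p, P0 - P2)
    B = 0.5 * (cross2(P0 - p, P1 - P3) + cross2(P1 - p, P0 - P2))
    C = cross2(P1 - p, P1 - P3)
    denom = A - 2 * B + C
    if abs(denom) < 1e-12:
        s = A / (A - C)                   # Eq. (16): linear case
    else:
        root = np.sqrt(B * B - A * C)
        s = ((A - B) + root) / denom
        if not 0.0 <= s <= 1.0:
            s = ((A - B) - root) / denom  # take the root inside [0, 1]
    # Eq. (9) for t, falling back to Eq. (10) if the x-denominator vanishes
    den = (1 - s) * (P0[0] - P2[0]) + s * (P1[0] - P3[0])
    if abs(den) > 1e-12:
        t = ((1 - s) * (P0[0] - p[0]) + s * (P1[0] - p[0])) / den
    else:
        den = (1 - s) * (P0[1] - P2[1]) + s * (P1[1] - P3[1])
        t = ((1 - s) * (P0[1] - p[1]) + s * (P1[1] - p[1])) / den
    # Eqs. (17)-(19): evaluate the Cartesian bilinear map at (s, t)
    Q0, Q1, Q2, Q3 = (np.asarray(q, float) for q in cart_pts)
    return (1 - s) * ((1 - t) * Q0 + t * Q2) + s * ((1 - t) * Q1 + t * Q3)
```

Note that a rectangular calibration quadrilateral makes the map affine, so A − 2B + C = 0 and the Eq. (16) branch is taken.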
F. Communication of Components
In ROS, there are two main paradigms for communication between nodes (modular processes that perform computation): publisher/subscriber and service/client. In the publisher/subscriber paradigm, nodes send out a message by publishing it to a given topic. A topic has a name, specified through a string, which identifies the content of the message. Other nodes can subsequently subscribe to the topic and obtain the message. This paradigm extends to many concurrent publishers and subscribers on the same topic.
The service/client paradigm differs from the publisher/subscriber paradigm in that it involves two-way request/reply communication as opposed to the one-way publishing of a message. A node offers a service under a certain name, and a client can send this service a request and receive the appropriate reply.
The controller is encapsulated in a publish/subscribe model. This fits its purpose because it sends a message to each joint in the robotic arm. These messages do not need to receive a reply, and the publish/subscribe model allows many concurrent subscribers, letting other nodes listen in on the arm's current position. In addition, we can specify the frequency and duration of the published command, which is useful in planning sequential movements of the arm. In contrast, the inverse kinematics calculations use a service/client model. This model is appropriate because, given global Cartesian coordinates as input, we output the corresponding joint-space coordinates. Thus, the client can send a request to the service with the global coordinates and the service will respond with the joint-space coordinates. This is important because we only need to make the calculation when a new position is determined for the robot, and this can be handled with a single request instead of a series of published messages.
Then, using a separate program, we obtain the location of the object in the image using SIFT feature matching. Once we have obtained the image coordinate, we continually publish it. The central module which unifies the components subscribes to the image coordinate. Using bilinear interpolation, it obtains the 3D global coordinate. From this, a request is sent to the inverse kinematics service and the corresponding joint-space coordinates are received. Then, we call the controller to publish the joint-space coordinates of each joint to the servos, which moves the robot to the desired position as seen in the image.
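The contrast between the two paradigms can be illustrated with a toy in-process analogue. This is an illustration of the pattern only; the real system uses rospy/roscpp publishers, subscribers, and services over the network, and the service reply below is a placeholder, not real joint angles:

```python
class TopicBus:
    """Toy stand-in for ROS publish/subscribe: one-way, many-to-many."""
    def __init__(self):
        self._subs = {}

    def subscribe(self, topic, callback):
        self._subs.setdefault(topic, []).append(callback)

    def publish(self, topic, msg):
        # every subscriber on the topic receives the message; no replies
        for cb in self._subs.get(topic, []):
            cb(msg)

def ik_service(request):
    """Toy stand-in for a ROS service: two-way request/reply.
    Echoes a placeholder reply rather than computing real joint angles."""
    return {"joints_for": request}

bus = TopicBus()
log_a, log_b = [], []
bus.subscribe("/drawer_pixel", log_a.append)  # two concurrent listeners,
bus.subscribe("/drawer_pixel", log_b.append)  # as pub/sub allows
bus.publish("/drawer_pixel", (320, 240))
reply = ik_service((0.10, 0.20, 0.05))        # one reply per request
```

The publish call fans out to every listener with no reply, while the service call is a single request/reply exchange, which is exactly the distinction that drove our choice of model for the controller versus the inverse kinematics.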
IV. EXPERIMENTS
A. Control through Specification of Global Coordinates
Using the controller and inverse kinematics, we were able to specify a global coordinate and subsequently move there. In addition, using the Camera Clicker code, we were able to click on a point in an image from the webcam and the arm would subsequently move there.
In the first experiment, we tested the robustness of the inverse kinematics and the controller. Using a series of preprogrammed commands that specified the global coordinate to move to and whether the fingers of the AX-12 Smart Robotic Arm should open or close, we aimed to pick up four bolts at different parts of the plane and drop them into a drawer. The robot arm performed well on this test and was able to pick up each of the four bolts and drop them into the box, whose opening was approximately 2 cm by 4 cm, from a height of 10 cm.
This demonstrates that the inverse kinematics and the controller were working correctly and had reasonable accuracy. This is important because robotic problems are generally phrased in global coordinates. In addition, the controller is extremely important so that actions prescribed by the underlying algorithms and transformations can actually be executed in real life. In
Fig. 8. Before Execution of the Camera Clicker code
addition, many problems of picking up objects have the combination of inverse kinematics and control as a fundamental component.
Using the Camera Clicker code, we were able to move the arm to a position specified in an image, as shown in Figures 8 and 9. Once we clicked on a certain number of points, these points on the 2D plane were published. Using bilinear interpolation, we subsequently transformed them into 3D global Cartesian coordinates. This allowed us to use the controller and the inverse kinematics service as described above to move the arm to the desired position.
The success of this test is shown in the images: comparing before and after, the arm has moved extremely close to the clicked position. This further demonstrates the robustness of the controller and the inverse kinematics service.
Overall, this experiment provided us with the chance to test the basic components, along with the bilinear interpolation code, that would be extremely important in achieving our final goal of opening a drawer.
B. SIFT Feature Identification
The SIFT algorithm is designed to be lighting- and scale-invariant; however, under very different lighting or when the drawer is greatly angled, the algorithm is not able to find the drawer image key (which was cropped out of the camera source image for improved accuracy). It is difficult to determine the bounds for exactly how much angle or what kind of lighting causes the algorithm to fail, both because of the difficulty of quantifying changes in lighting and because some keys worked better than others. When the same lighting was used for both generating the key and
Fig. 9. After Execution of the Camera Clicker code
Fig. 10. Left: An uninformative key with low contrast and few details.Right: An ideal key with high contrast and lots of unique features.
finding the drawer, and a good key was used, the drawer could be angled up to around 37° and SIFT could still find the image accurately.
The SIFT images were tested for accuracy by checking for the same key in the same place multiple times and comparing the results. It was found that the image keys that worked best were ones with high contrast and sharp detail, as shown in Figure 10. The number of features generated for the keys varied from as few as 11 to as many as 53. With good keys chosen, a match was almost always ensured; from running the algorithm 55 times with five different good keys, the average number of matches was 6.8 with no incorrect matches found. As stated previously, the algorithm only needs one match to correctly find the drawer's pixel location. This means the algorithm found the drawer 100% of the time when using good keys. The pixel coordinate precision was found to be within 3 pixels, and most of the time the algorithm would output the exact same coordinates if the drawers were in the same place.
C. Drawer Manipulations
Using the camera and SIFT, the drawers were identified and the optimal grasping point was determined heuristically. Although the container has to remain at a fixed known location, the drawers themselves can move around within the container. Given a successful
TABLE I
SUCCESS RATE FOR BOTH ALGORITHMS USING DIFFERENT DEFLECTION ANGLES.

Algorithm | Success Rate (0°) | Success Rate (15°) | Success Rate (30°)
1         | 4/5               | 4/5                | 4/5
2         | 5/5               | 4/5                | 3/5
SIFT identification of the drawers, we were able to successfully open the drawers.
We developed two drawer-opening algorithms that allowed us to successfully complete the task. Once the drawer has been identified using SIFT, the first algorithm moves the end-effector to a position one inch from the drawer grasping point along its normal. The fingers then open halfway and the end-effector moves an inch in along the normal. At this point, the fingers close and the end-effector moves three inches back along the normal to open the drawer. In order to smooth this movement, an interpolated trajectory along the normal is computed. The second algorithm interpolates a sequence of waypoints so that the end-effector slides in from the side of the container. The remaining steps are the same as in the first algorithm.
By moving the end-effector along the normal, we were able to open drawers that were not parallel to the base. We found the second algorithm to be less robust when the drawers were not parallel to the base: the end-effector would sometimes collide with the sides before reaching the drawer. However, the second algorithm had a higher success rate than the first when the drawers were parallel to the base. For each algorithm, we ran 5 trials at 3 different drawer deflection angles (0° indicates that the drawer is parallel to the base); the results are given in Table I.
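The first algorithm's approach-grasp-pull sequence can be sketched as a waypoint generator. Distances are in meters (one inch ≈ 0.0254 m in, three inches ≈ 0.0762 m out), and the step count n is an illustrative smoothing parameter, not a value from our implementation:

```python
import numpy as np

def drawer_waypoints(grasp_pt, normal, pre_dist=0.0254, pull_dist=0.0762, n=10):
    """Interpolated trajectory for the first drawer-opening algorithm:
    start pre_dist out along the drawer normal, move in to the grasp
    point (where the fingers close), then pull back pull_dist along the
    normal to open the drawer."""
    grasp_pt = np.asarray(grasp_pt, float)
    normal = np.asarray(normal, float)
    normal = normal / np.linalg.norm(normal)
    pre = grasp_pt + pre_dist * normal    # one inch out from the drawer
    pull = grasp_pt + pull_dist * normal  # three inches out: drawer opened
    approach = [pre + (grasp_pt - pre) * k / n for k in range(n + 1)]
    retreat = [grasp_pt + (pull - grasp_pt) * k / n for k in range(1, n + 1)]
    return approach + retreat
```

Because every waypoint lies on the grasp-point normal, the same generator handles deflected drawers: only the `normal` argument changes.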
V. CONCLUSIONS
In this project we have developed a hardware and software platform for arm manipulation using the AX-12 Smart Robotic Arm. On the hardware side, we have developed an adjustable mounting system for the sensors. On the software side, we have developed ROS nodes for the controller, the inverse kinematics, and the drawer manipulations. By using SIFT and heuristic algorithms, we were able to successfully open drawers at various deflection angles. Through 15 trials with 2 different algorithms, we have empirically demonstrated that the robotic arm can robustly perform the task.
Additional work on this project could include developing mechanisms for opening drawers in unstructured environments. We would also like to incorporate visual servoing so we can handle unexpected collisions between the arm and the environment; a path planner would be needed to implement the collision avoidance. We would also like to expand the project to other manipulation tasks such as loading dishes and folding towels. We hope to address these challenges in future work.
VI. ACKNOWLEDGEMENTS
The authors would like to acknowledge Professor Ashutosh Saxena for his mentorship and help on the project. They would also like to thank Jonathan Diamond, Yue Gao, Mark Verheggen, Daniel Cabrini-Hauagge, and Jigar J. Shah for their help in CS 4758 and contributions to the project.
Additional components of the project include code commits to the Personal Robotics Repository and videos/pictures submitted through Professor Saxena.
REFERENCES
[1] Ashutosh Saxena, Justin Driemeyer, Justin Kearns, Chioma Osondu, and Andrew Y. Ng, “Learning to Grasp Novel Objects Using Vision”, in 10th International Symposium on Experimental Robotics (ISER), Rio de Janeiro, Brazil, 2006.
[2] Reza N. Jazar, Theory of Applied Robotics: Kinematics, Dynamics, and Control. Springer, 1st Edition, 2007.