Caixia Cai, Nikhil Somani, Alois Knoll · Orthogonal Image Features for Visual Servoing of a 6 DOF...
Orthogonal Image Features for Visual Servoing of a 6 DOF Manipulator with Uncalibrated Stereo Cameras
Caixia Cai, Nikhil Somani, Alois Knoll
Abstract—We present an approach to control a 6 DOF manipulator using an uncalibrated visual servoing (VS) approach that addresses the challenges of choosing proper image features for target objects and designing a VS controller to enhance the tracking performance. The main contribution of this article is the definition of a new virtual visual space (image space). A novel stereo camera model employing virtual orthogonal cameras is used to map 6D poses from Cartesian space to this virtual visual space. Each component of the 6D pose vector defined in this virtual visual space is linearly independent, leading to a full-rank 6×6 image Jacobian matrix which avoids classical problems such as image space singularities and local minima. Furthermore, the controls for the rotational and translational motion of the robot are decoupled due to the diagonal image Jacobian. Finally, simulation results with an eye-to-hand robotic system confirm the improvement in controller stability and motion performance with respect to conventional VS approaches. Experimental results on a 6 DOF industrial robot are provided to illustrate the effectiveness of the proposed method and the feasibility of using this method in practical scenarios.
Index Terms—Orthogonal Image Features, Visual Servoing.
I. INTRODUCTION
Visual Servoing (VS) has been used in a wide spectrum of applications, from fruit picking to robotized surgery, and especially in industrial fields for tasks such as assembly, packaging, drilling and painting [1], [2]. According to the features used as feedback in minimizing the positioning error, visual servoing is classified into three categories [1]: Position-Based Visual Servoing (PBVS), Image-Based Visual Servoing (IBVS) and Hybrid Visual Servoing (HYVS).
In general, a PBVS system has a good 3D trajectory but is sensitive to calibration errors. Compared to PBVS, IBVS is known to be robust to camera model errors [3], and the image feature point trajectories are controlled to move approximately along straight lines [4]. However, one of the main drawbacks of IBVS is that image singularities and image local minima may exist, leading to IBVS failure. The selection of visual features is the key to solving the problem of image singularities. A great deal of effort has been dedicated to determining decoupled visual features that deliver a triangular or diagonal Jacobian matrix [5], [6].
In IBVS, geometric features in the image such as points, segments or straight lines [1] are usually chosen as image features and used as the inputs of controllers. Several novel features such as laser points [7], the polar signatures of an object contour [8], and image moments [6], [9], [10] have been developed to track objects which do not have enough detectable geometric features and to enhance the robustness of visual servoing systems. The image interaction matrix (image
Authors' affiliation: Robotics and Embedded Systems Group, Technische Universität München, Germany. Email: {caica,somani,knoll}@in.tum.de
Jacobian) can be computed using direct depth information [11], [12], by approximation via on-line estimation of the depth of the features [13], [14], [15], [16], or using a depth-independent image Jacobian matrix [17], [18]. Additionally, many papers directly estimate the complete image Jacobian on-line in different ways [19], [20], [21], [22]. However, all these methods generally use redundant image point coordinates to define a non-square image Jacobian, leading to well-known problems such as image singularities.
It is also possible to combine the advantages of 2D and 3D visual servoing while avoiding their respective drawbacks. This approach is called 2-1/2D visual servoing because the used input is expressed partly in the 3D Cartesian space and partly in the 2D image space [23]. Contrary to PBVS, the 2-1/2D approach does not need any geometric 3D model of the object. In comparison to IBVS, the 2-1/2D approach ensures the convergence of the control law in the whole task space.
In this paper, we propose a new 2-1/2D visual servoing scheme (coined 6DVS) which extracts new orthogonal image features and decouples the translational and rotational control of a robot under visual feedback from fixed stereo cameras. More precisely, instead of using the classical visual features, we define a new virtual visual space (image space), where a 3D position vector is extracted as a feature. Each principal component of the position vector is linearly independent and orthogonal. We compute the orientation through a rotation matrix with Euler angles representation. We thus obtain a diagonal interaction matrix with very satisfactory decoupling properties. It is interesting to note that this Jacobian matrix has no singularity in the whole task space and the controls for the position and orientation are independent. Simulations and experimental results confirm that this new formulation (6DVS) is more efficient than existing classic VS approaches and the errors in both the virtual visual space and Cartesian space converge without local minima¹. Moreover, it is less sensitive to image noise than classical 2-1/2D visual servoing.
In Section II we formulate the classical 2-1/2D VS approach, highlight its shortcomings and state the core issues. In Section III we introduce a new camera model to construct a virtual visual space and define a visual pose vector whose elements are chosen as image features. Using this 6D visual pose, a square full-rank image Jacobian is obtained, which is used in Section IV to simulate an adaptive 6D visual servoing controller and evaluate its properties. Section V presents two real-world experiments (Fig. 1) and shows the results obtained in a dynamic environment. Finally, Section VI presents the
¹Parts of this work have already been presented at IROS'14 [24]. In this paper, quantitative validations of the approach and comparisons to classical methods in terms of steady state errors, transient system performance and robustness to uncertainties have been added.
Fig. 1. Description of robotic experimental setup with fixed stereo cameras.
conclusions of our work and directions for future work.
II. PROBLEM FORMULATION
A. The Problem of Classical Methods
Suppose that the robot end-effector is moving with angular velocity $\omega = [\omega_x, \omega_y, \omega_z]^T$ and translational velocity $v = [v_x, v_y, v_z]^T$, both w.r.t. the camera frame in a fixed camera system (eye-to-hand configuration, see Fig. 1). Let $P_c$ be a point rigidly attached to the end-effector with $X = [x, y, z]^T$.
In classical 2-1/2D visual servoing, as described by Malis et al. [23], the selected feature vector is $h = [X_h^T, \theta U^T]^T$, where $X_h$ is the position vector, and $\theta$ and $U$ are the rotation angle and axis of the rotation matrix $R$. The corresponding image Jacobian is an upper block-triangular matrix given by

$$\dot{h} = J \begin{bmatrix} v \\ \omega \end{bmatrix}, \qquad J = \begin{bmatrix} \frac{1}{z} J_v & J_{v\omega} \\ 0_3 & J_\omega \end{bmatrix} \qquad (1)$$

The position vector $X_h = [u, v, \ln(z)]^T$ is defined in extended image coordinates, where $u, v$ are the image features, 2D data (pixels), and $z$ is the depth of the considered point, 3D data (meters). Moreover, according to the image Jacobian (1), the translational velocity of the point $P_c$ is affected by both position and orientation errors.
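The coupling implied by the upper block-triangular structure of (1) can be illustrated numerically. The matrices below are arbitrary placeholders (not the paper's actual interaction matrices), chosen only to show that with a triangular Jacobian a purely rotational motion still moves the translational features, while a block-diagonal Jacobian does not:

```python
import numpy as np

# Placeholder blocks standing in for Jv, Jvw and Jw of eq. (1).
z = 0.5
Jv = np.eye(3)
Jvw = np.array([[0.0, -1.0,  0.2],
                [1.0,  0.0, -0.3],
                [0.0,  0.0,  0.0]])
Jw = np.eye(3)

# Upper block-triangular Jacobian (classical 2-1/2D) vs. block-diagonal one.
J_coupled = np.block([[(1.0 / z) * Jv, Jvw],
                      [np.zeros((3, 3)), Jw]])
J_decoupled = np.block([[Jv, np.zeros((3, 3))],
                        [np.zeros((3, 3)), Jw]])

# Pure rotation: v = 0, w = [0.1, 0, 0].
twist = np.hstack([np.zeros(3), np.array([0.1, 0.0, 0.0])])
h_dot_coupled = J_coupled @ twist
h_dot_decoupled = J_decoupled @ twist

print(h_dot_coupled[:3])    # nonzero: rotation leaks into the position features
print(h_dot_decoupled[:3])  # zero: position features are unaffected
```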
$u$ and $v$ are two orthogonal principal axes in image coordinates. If we can find a third, normalized $z_s$ component for image coordinates which is orthogonal to $u, v$ and measured in pixels, then all the points in the image coordinates can be decomposed into 3 principal components in such a way that all the elements of the position vector can be controlled in a linearly independent way.
B. Design of proposed VS Features
Motivated by the desire to find a $z_s$ component of the position vector which is also in the image plane (pixels) and decouples the control of the translational and the rotational motion, we define a new virtual visual space (image space), where a 3D pixel position $X_s$ is extracted as a feature. All elements of this 3D position vector are linearly independent and orthogonal to each other. We solve the orientation using a rotation matrix with ZYX Euler angles representation, denoted by $\theta$. Thus, the new feature vector is $W_s = [x_s, y_s, z_s, \alpha, \beta, \gamma]^T$ and the new mapping is given by

$$\dot{W}_s = \begin{bmatrix} \dot{X}_s \\ \dot{\theta} \end{bmatrix} = \underbrace{\begin{bmatrix} J_v & 0_3 \\ 0_3 & J_\omega \end{bmatrix}}_{J_{img}} \begin{bmatrix} v \\ \omega \end{bmatrix} \qquad (2)$$

where the new image Jacobian ($J_{img} \in \mathbb{R}^{6\times 6}$) is a block-diagonal matrix that decouples the translational and rotational control.
III. 6D ADAPTIVE VISUAL SERVOING
Consider the motion of a plane $\pi$ attached to the end-effector of a robot that rotates and translates through space in order to obtain a desired position and orientation of the end-effector. We define four target points on $\pi$ denoted by $P_i$, $\forall i = 1,2,3,4$. In this work, we investigate the translational and rotational motion of the end-effector of a robot under visual feedback from a fixed stereo camera system. By assuming knowledge of the camera intrinsic parameters, we obtain the pixel translational motion using triangulation on the center of the four points, while recovering the rotational information of the end-effector through the motion of the four tracked points.
The image Jacobian $J_{img}$ has a decoupled structure, which is divided into a position image Jacobian ($J_v \in \mathbb{R}^{3\times 3}$) and an orientation image Jacobian ($J_\omega \in \mathbb{R}^{3\times 3}$).
A. Image Jacobian for 3D Position Jv
Since $X_s \in \mathbb{R}^{3\times 1}$ represents the position in the image feature space, the maximum number of independent elements for position is 3. Hence, in this work we construct a virtual visual space using the information generated from the stereo vision system, where 3 linearly independent elements can be extracted to get a full-rank image Jacobian ($J_v$).
$J_v \in \mathbb{R}^{3\times 3}$ describes the relationship between the velocities of the 3D Cartesian position $X_b$ (meters) and the 3D visual position $X_s$ (pixels). The key idea of this model is to combine the stereo camera model with a virtual composite camera model to get a full-rank image Jacobian, see Fig. 2.
This new 3D visual model can be computed in two steps:
• The standard stereo vision model [25] is used to analytically recover the 3D relative position ($X_{C_l}$) of an object with respect to the reference frame of the stereo system $O_{C_l}$.
• The Cartesian position $X_{C_l}$ is projected into two virtual cameras $O_{v_1}$ and $O_{v_2}$.
1) Stereo Vision Model: Defining the observed image points in each camera as $p_l = [u_l, v_l]^T$ and $p_r = [u_r, v_r]^T$, we can use triangulation [25] to compute the relative position $X_{C_l} = [x_c, y_c, z_c]^T$ with respect to the left camera $O_{C_l}$. The position $X_{C_l}$ is related to the world-frame position $X_b$ through

$$X_{C_l} = R^b_{C_l} X_b + t^b_{C_l} \qquad (3)$$

where $T^b_{C_l} = [R^b_{C_l}, t^b_{C_l}]$ is the transformation matrix between the coordinate frames $O_{C_l}$ and $O_b$.

Before integrating the stereo camera model with the virtual composite model, a re-orientation of the coordinate frame $O_{C_l}$ to a new coordinate frame $O_V$ with the same origin is required. The projected position $X_V = [x_V, y_V, z_V]^T$, using (3), is (Fig. 2)

$$X_V = R^{C_l}_V X_{C_l} = R^{C_l}_V (R^b_{C_l} X_b + t^b_{C_l}) \qquad (4)$$

where $R^{C_l}_V$ is the orientation of the reference frame² $O_V$ with respect to $O_{C_l}$.

²This reference frame is fixed to the left camera coordinate frame and is defined by the user, therefore $R^{C_l}_V$ is assumed to be known.
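The two steps above can be sketched as follows. This is a minimal illustration of (3)–(4) assuming a rectified stereo pair with known baseline and identical intrinsics; all parameter values are placeholders, not the paper's calibration:

```python
import numpy as np

# Placeholder intrinsics and baseline for a rectified stereo pair:
# focal length f (pixels), principal point (cx, cy), baseline b (meters).
f, cx, cy, b = 800.0, 320.0, 240.0, 0.12

def triangulate(pl, pr):
    """Recover X_Cl = [xc, yc, zc]^T in the left-camera frame from the
    pixel correspondences pl = [ul, vl], pr = [ur, vr]."""
    disparity = pl[0] - pr[0]
    zc = f * b / disparity
    xc = (pl[0] - cx) * zc / f
    yc = (pl[1] - cy) * zc / f
    return np.array([xc, yc, zc])

# Re-orientation into the user-defined frame O_V, eq. (4); R_Cl_V is chosen
# by the user, here simply the identity.
R_Cl_V = np.eye(3)
X_Cl = triangulate(np.array([400.0, 260.0]), np.array([352.0, 260.0]))
X_V = R_Cl_V @ X_Cl
print(X_V)  # [0.2, 0.05, 2.0]
```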
Fig. 2. Image projections: (a) The figure depicts the different coordinate frames used to obtain a general 3D virtual visual space. $X_b \in \mathbb{R}^{3\times 1}$ is the position in meters [m] of an object with respect to the world coordinate frame (wcf) denoted by $O_b$. Moreover, $O_{C_l}$ and $O_{C_r}$ are the coordinate frames for the left and right cameras, respectively. $R^b_{C_l} \in SO(3)$ represents the orientation of the wcf with respect to the left camera. $O_V$ is a reference coordinate frame for the virtual orthogonal cameras $O_{v_{1,2}}$, where $R^{C_l}_V \in SO(3)$ is its orientation with respect to $O_{C_l}$. $\lambda$ is the distance from $O_{v_i}$ to $O_V$ along each optical axis $i$. The vectors $p_l, p_r \in \mathbb{R}^{2\times 1}$ are the projections of the point $X_b$ in the left and right cameras. Finally, $p_{v_i} \in \mathbb{R}^{2\times 1}$ represents the projection of the object in the virtual cameras $O_{v_i}$. (b) Placement of the composite camera model with respect to the left camera.
2) Virtual Composite Camera Model: In order to compute the 3D virtual visual space, we define two virtual cameras attached to the stereo camera system using the coordinate frame $O_V$ (Fig. 2 (b)). We use the pinhole camera model [25] to project the relative position $X_V$ to each of the virtual cameras $O_{v_1}$ and $O_{v_2}$.
The model for virtual camera 1 is given by

$$p_{v_1} = \begin{bmatrix} u_{v_1} \\ v_{v_1} \end{bmatrix} = \frac{1}{-y_V + \lambda}\, \alpha R(\phi) \begin{bmatrix} x_V - o_{11} \\ z_V - o_{12} \end{bmatrix} + \begin{bmatrix} c_x \\ c_y \end{bmatrix} \qquad (5)$$

where $\phi$ is the rotation angle of the virtual camera about its optical axis, $O_1 = [o_{11}, o_{12}]^T$ is the projected position of the optical center with respect to the coordinate frame $O_V$, $C_1 = [c_x, c_y]^T$ is the position of the principal point in the image plane, and $\lambda$ is the distance from the virtual camera coordinate frame $O_{v_1}$ to the reference frame $O_V$ along its optical axis. $\alpha$ and the rotation matrix $R(\phi)$ are defined as

$$\alpha = \begin{bmatrix} f\beta & 0 \\ 0 & f\beta \end{bmatrix}, \qquad R(\phi) = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \qquad (6)$$

where $f$ is the focal length of the lens used and $\beta$ is the magnification factor of the camera.

Since this model represents a user-defined virtual camera, all its parameters³ are known in the defined configuration of the virtual cameras, $\phi = 0$⁴ (Fig. 2 (b)).
Similarly, the model for virtual camera 2 is defined as

$$p_{v_2} = \begin{bmatrix} u_{v_2} \\ v_{v_2} \end{bmatrix} = \frac{1}{x_V + \lambda}\, \alpha R(\phi) \begin{bmatrix} y_V - o_{21} \\ z_V - o_{22} \end{bmatrix} + \begin{bmatrix} c_x \\ c_y \end{bmatrix} \qquad (7)$$

In order to construct the 3D virtual visual space $X_s \in \mathbb{R}^{3\times 1}$, we combine both virtual camera models.

³Since the virtual cameras are user-defined, we can set the same intrinsic parameters and $\lambda$ values for both cameras.
⁴The reason to introduce the auxiliary coordinate frame $O_V$ is to simplify the composite camera model by rotating the coordinate frame $O_{C_l}$ to a specific orientation such that $\phi = 0$.
Using properties of the rotation matrix $R(\phi)$ and the fact that $\alpha$ is a diagonal matrix, from (5), $u_{v_1}$ can be written in the form

$$u_{v_1} = \gamma_1 \frac{x_V - o_{11}}{-y_V + \lambda} - \gamma_2 v_{v_1} + \gamma_3 \qquad (8)$$

where the constant parameters $\gamma_1, \gamma_2, \gamma_3 \in \mathbb{R}$ are explicitly defined as

$$\gamma_1 = \frac{f\beta}{\cos\phi}, \qquad \gamma_2 = \tan\phi, \qquad \gamma_3 = c_x + c_y\gamma_2. \qquad (9)$$
Based on (7) and (8), we define a visual camera model ($O_s$) representation $X_s = [x_s, y_s, z_s]^T$ using the orthogonal elements $[u_{v_1}, u_{v_2}, v_{v_2}]^T$ as

$$X_s = \begin{bmatrix} u_{v_1} \\ u_{v_2} \\ v_{v_2} \end{bmatrix} = \underbrace{\begin{bmatrix} \gamma_1 & 0_{1\times 2} \\ 0_{2\times 1} & \alpha R(\phi) \end{bmatrix}}_{R_\alpha} \begin{bmatrix} \frac{x_V - o_{11}}{-y_V + \lambda} \\[4pt] \frac{y_V - o_{21}}{x_V + \lambda} \\[4pt] \frac{z_V - o_{22}}{x_V + \lambda} \end{bmatrix} + \rho \qquad (10)$$

where $\rho = [\gamma_3 - \gamma_2 v_{v_1}, c_x, c_y]^T$. The pixel position $X_s$ constructs the 3D virtual visual space.

Given that $\phi = 0$, then $\gamma_1 = f\beta$, $\gamma_2 = 0$, $\gamma_3 = c_x$, which implies that $\rho = [c_x, c_x, c_y]^T$ and $R_\alpha = \mathrm{diag}(f\beta) \in \mathbb{R}^{3\times 3}$. Therefore, the mapping in (10) can be simplified as

$$X_s = \mathrm{diag}(f\beta) \begin{bmatrix} \frac{x_V - o_{11}}{-y_V + \lambda} \\[4pt] \frac{y_V - o_{21}}{x_V + \lambda} \\[4pt] \frac{z_V - o_{22}}{x_V + \lambda} \end{bmatrix} + \begin{bmatrix} c_x \\ c_x \\ c_y \end{bmatrix}. \qquad (11)$$
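A minimal sketch of the simplified projection (11). All parameters ($f\beta$, $\lambda$, the projected optical centers and the principal point) are user-defined in this model; the values below are illustrative placeholders:

```python
import numpy as np

# Placeholder user-defined parameters of the virtual composite camera model.
f_beta, lam = 500.0, 2.0
o11, o21, o22 = 0.0, 0.0, 0.0
cx, cy = 320.0, 240.0

def virtual_visual_position(X_V):
    """Map X_V = [xV, yV, zV] (meters, frame O_V) to the pixel position
    Xs = [xs, ys, zs] in the virtual visual space, eq. (11) with phi = 0."""
    xV, yV, zV = X_V
    return f_beta * np.array([(xV - o11) / (-yV + lam),
                              (yV - o21) / (xV + lam),
                              (zV - o22) / (xV + lam)]) + np.array([cx, cx, cy])

Xs = virtual_visual_position(np.array([0.2, -0.3, 0.4]))
print(Xs)
```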
The velocity mapping can be obtained with the time derivative of (11) as follows:

$$\dot{X}_s = R_\alpha J_o \dot{X}_V = J_\alpha \dot{X}_V \qquad (12)$$

where the Jacobian matrix $J_o \in \mathbb{R}^{3\times 3}$ is defined as

$$J_o = \begin{bmatrix} \frac{1}{-y_V + \lambda} & \frac{x_V - o_{11}}{(-y_V + \lambda)^2} & 0 \\[4pt] -\frac{y_V - o_{21}}{(x_V + \lambda)^2} & \frac{1}{x_V + \lambda} & 0 \\[4pt] -\frac{z_V - o_{22}}{(x_V + \lambda)^2} & 0 & \frac{1}{x_V + \lambda} \end{bmatrix}. \qquad (13)$$
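The analytic Jacobian (13) can be checked against central finite differences of the normalized projection appearing in (11); the parameter values below are placeholders:

```python
import numpy as np

# Placeholder user-defined parameters.
lam = 2.0
o11, o21, o22 = 0.0, 0.0, 0.0

def g(X):
    """Normalized projection vector of eq. (11), without the f*beta scaling."""
    xV, yV, zV = X
    return np.array([(xV - o11) / (-yV + lam),
                     (yV - o21) / (xV + lam),
                     (zV - o22) / (xV + lam)])

def Jo(X):
    """Analytic Jacobian of g, eq. (13)."""
    xV, yV, zV = X
    return np.array([
        [1.0 / (-yV + lam), (xV - o11) / (-yV + lam) ** 2, 0.0],
        [-(yV - o21) / (xV + lam) ** 2, 1.0 / (xV + lam), 0.0],
        [-(zV - o22) / (xV + lam) ** 2, 0.0, 1.0 / (xV + lam)],
    ])

X = np.array([0.2, -0.3, 0.4])
eps = 1e-6
J_num = np.column_stack([(g(X + eps * e) - g(X - eps * e)) / (2 * eps)
                         for e in np.eye(3)])
print(np.allclose(Jo(X), J_num, atol=1e-6))  # True
```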
Taking the time derivative of (4), (12) can be rewritten as

$$\dot{X}_s = J_\alpha (R^{C_l}_V R^b_{C_l}) \dot{X}_b = J_v \dot{X}_b \qquad (14)$$

where we define $J_v \in \mathbb{R}^{3\times 3}$ as the position image Jacobian.

Remark 1: Virtual Cameras. The two virtual cameras are selected in such a way that their optical axes intersect at 90 degrees. Since the cameras are virtual, they have an infinite field of view, and the pixel positions $X_s$ can be either negative or positive.
B. Image Jacobian for 3D Orientation $J_\omega$

Let $\theta = [\alpha, \beta, \gamma]^T$ be a vector of ZYX Euler angles, which is a minimal representation for the orientation of the end-effector frame relative to the robot base frame. Then, the definition of the angular velocity $\omega$ is given by [26]

$$\omega = T(\theta)\dot{\theta}. \qquad (15)$$

If the rotation matrix $R_{ef} = R_{z,\gamma} R_{y,\beta} R_{x,\alpha}$ is the Euler angle transformation, then

$$T(\theta) = \begin{bmatrix} \cos\gamma\cos\beta & -\sin\gamma & 0 \\ \sin\gamma\cos\beta & \cos\gamma & 0 \\ -\sin\beta & 0 & 1 \end{bmatrix} \qquad (16)$$

Singularities of the matrix $T(\theta)$ are called representational singularities. It can easily be shown that $T(\theta)$ is invertible provided $\cos\beta \neq 0$. Therefore,

$$\dot{\theta} = T^{-1}(\theta)\omega = J_\omega \cdot \omega \qquad (17)$$
where $J_\omega \in \mathbb{R}^{3\times 3}$ is defined as the orientation image Jacobian. Combining (14) and (17), we have the full expression

$$\dot{W}_s = \begin{bmatrix} \dot{X}_s \\ \dot{\theta} \end{bmatrix} = \begin{bmatrix} J_v & 0 \\ 0 & J_\omega \end{bmatrix} \begin{bmatrix} v \\ \omega \end{bmatrix} = J_{img} \cdot V \qquad (18, 19)$$

where the matrix $J_{img} \in \mathbb{R}^{6\times 6}$ is defined as the new image Jacobian, which is a block-diagonal Jacobian matrix.
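The orientation mapping (15)–(17) can be sketched directly. This assumes the ZYX Euler convention stated above and a configuration away from the representational singularity $\cos\beta = 0$:

```python
import numpy as np

def T(theta):
    """T(theta) of eq. (16) for ZYX Euler angles theta = [alpha, beta, gamma]."""
    alpha, beta, gamma = theta
    return np.array([
        [np.cos(gamma) * np.cos(beta), -np.sin(gamma), 0.0],
        [np.sin(gamma) * np.cos(beta),  np.cos(gamma), 0.0],
        [-np.sin(beta),                 0.0,           1.0],
    ])

theta = np.array([0.1, 0.4, -0.2])        # away from beta = +-pi/2
J_w = np.linalg.inv(T(theta))             # orientation image Jacobian, eq. (17)
omega = T(theta) @ np.array([0.05, 0.02, -0.01])  # eq. (15)
theta_dot = J_w @ omega                   # recovers the Euler angle rates
print(theta_dot)
```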
C. Control Scheme

Substituting the robot differential kinematics $V = J(q)\dot{q}$, equation (19) can be rewritten in the form

$$\dot{W}_s = J_{img} J(q) \cdot \dot{q} = J_s \cdot \dot{q} \qquad (20)$$

where $J(q) \in \mathbb{R}^{6\times 6}$ is the Jacobian matrix of the robot manipulator and the matrix $J_s \in \mathbb{R}^{6\times 6}$ is defined as the visual Jacobian.

According to (20), the corresponding control law is

$$\dot{q}_r = J_s^{-1}\dot{W}_{sr} \qquad (21)$$

where $\dot{q}_r$ is the joint velocity nominal reference, which is used in an adaptive second-order sliding mode controller, described in detail in [27].
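A sketch of the control law (20)–(21). The matrices below are placeholders standing in for $J_{img}$ and the robot Jacobian $J(q)$, and a linear solve is used instead of forming $J_s^{-1}$ explicitly:

```python
import numpy as np

def joint_velocity_reference(J_img, J_q, Ws_dot_ref):
    """q_dot_r = Js^{-1} Ws_dot_ref with Js = J_img J(q), eqs. (20)-(21)."""
    Js = J_img @ J_q
    return np.linalg.solve(Js, Ws_dot_ref)  # avoids an explicit inverse

# Placeholder block-diagonal image Jacobian (eq. (18)) and robot Jacobian.
J_img = np.diag([500.0, 500.0, 500.0, 1.0, 1.2, 0.9])
J_q = np.eye(6)

# Desired velocity of the visual pose W_s (pixels/s and rad/s).
Ws_dot_ref = np.array([5.0, -2.0, 1.0, 0.01, 0.0, -0.02])
qr_dot = joint_velocity_reference(J_img, J_q, Ws_dot_ref)
print(qr_dot)
```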
Remark 2: Singularity-free $J_{img}$. From (18), we can see that $\det(J_{img}) = \det(J_v)\det(J_\omega)$. Hence, the set of singular configurations of $J_{img}$ is the union of the set of position configurations satisfying $\det(J_v) = 0$ and the set of orientation configurations satisfying $\det(J_\omega) = 0$.

From (14), we can see that $J_v^{-1} = {R^b_{C_l}}^{-1} {R^{C_l}_V}^{-1} J_o^{-1} R_\alpha^{-1}$. The matrices $R^b_{C_l}, R^{C_l}_V \in SO(3)$ and $R_\alpha = \mathrm{diag}(f\beta) \in \mathbb{R}^{3\times 3}$ are non-singular. Then, $\det(J_v) = 0$ if and only if $\det(J_o) = 0$. This condition occurs only when: 1) $o_{11} + \lambda = 0$ and $o_{21} - \lambda = 0$, or 2) $x_V = -\lambda$ and $y_V = o_{21}$, or 3) $y_V = \lambda$ and $x_V = o_{11}$. However, $o_{11}$, $o_{21}$ and $\lambda$ are all defined by the user. Hence, a non-singular $J_v$ can be obtained by enforcing the condition $o_{11} = o_{21}$, $\lambda > \max(x_{V_{max}}, y_{V_{max}})$, where $x_{V_{max}}$ and $y_{V_{max}}$ are delimited by the robot workspace defined with respect to $O_V$. Therefore, $\det(J_v) \neq 0$ and cannot become infinite.
Provided $\det(T(\theta)) \neq 0$, $J_\omega^{-1}$ always exists. Therefore, the singularities of $J_s$ are defined only by the singularities of $J(q)$.
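The claim of Remark 2 can be checked numerically. With placeholder values satisfying $o_{11} = o_{21}$ and $\lambda$ larger than the sampled workspace extent, $\det(J_o)$ stays bounded away from zero over the whole sampled workspace:

```python
import numpy as np

# Placeholder user-defined parameters with o11 = o21 and lam > workspace extent.
lam, o11, o21, o22 = 2.0, 0.0, 0.0, 0.0

def det_Jo(X):
    """Determinant of the Jacobian J_o of eq. (13) at X = [xV, yV, zV]."""
    xV, yV, zV = X
    Jo = np.array([
        [1.0 / (-yV + lam), (xV - o11) / (-yV + lam) ** 2, 0.0],
        [-(yV - o21) / (xV + lam) ** 2, 1.0 / (xV + lam), 0.0],
        [-(zV - o22) / (xV + lam) ** 2, 0.0, 1.0 / (xV + lam)],
    ])
    return np.linalg.det(Jo)

# Sample a workspace with |xV|, |yV| < lam, as required by Remark 2.
rng = np.random.default_rng(0)
workspace = rng.uniform(-0.9, 0.9, size=(1000, 3))
dets = np.array([det_Jo(X) for X in workspace])
print(np.all(np.abs(dets) > 1e-6))  # True: no singular configuration sampled
```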
IV. SIMULATION
We simulate a 6 DOF industrial robot with real robot parameters in closed loop with the control approach. Real camera parameters are used to simulate the camera projections. Our simulation platform is identical to the real experiments, except that we simulate the 6D desired pose. The robot motions are visualized in a 3D visualization system (Section V-A3).
Simulation tests have been carried out with four feature points, which give a 16×6 interaction matrix in classical IBVS with a stereo vision system and a 6×6 Jacobian matrix in our algorithm (6DVS). In the proposed 6DVS method, the image features $s$ are mapped to the virtual visual space to get $X_s$, which is used to design the error function for the control scheme. For classical stereo IBVS, the image features $s$ are directly used to design the error function, and the real $z$ obtained from the stereo vision system is used to compute the interaction matrix. The 3D Cartesian position from the stereo vision system is used to perform PBVS. Classical 2-1/2D visual servoing (2.5DVS) with Euler angle representation is also used in the comparisons.
TABLE I
INITIAL (I) AND DESIRED (D) LOCATIONS OF FEATURE POINTS IN THE IMAGE PLANE (PIXELS) OF THE LEFT CAMERA

            Point 1    Point 2    Point 3    Point 4
            (u v)      (u v)      (u v)      (u v)
Test 1  I   (207 254)  (194 212)  (213 200)  (225 238)
        D   (422 379)  (406 307)  (471 290)  (486 360)
Test 2  I   (207 254)  (194 212)  (213 200)  (225 238)
        D   (422 379)  (406 307)  (471 290)  (486 360)
Test 3  I   (318 224)  (304 191)  (312 177)  (325 208)
        D   (226 228)  (200 184)  (244 159)  (269 203)
Test 4  I   (207 254)  (194 212)  (213 200)  (225 238)
        D   (291 444)  (304 382)  (367 376)  (354 434)

The initial (I) and desired (D) configurations of the image features ($s = [u_1, v_1, u_2, v_2, u_3, v_3, u_4, v_4]^T$)⁵ in the left camera for each of the tests are shown in Table I.
1) Test 1: In this test, we examine the convergence of each image feature point and the pose errors when the desired location is far away from the initial one. The Cartesian and image trajectories of the proposed 6D visual servoing (6DVS) and the conventional approaches (IBVS, PBVS and 2.5DVS) are compared in Fig. 3.

⁵In the simulation figures, only image features in the left camera are illustrated, since the results in the right camera are similar.
Fig. 3. Simulation Test 1: Large translational and rotational motion. (1) 3D robot end-effector trajectory in Cartesian space ($X_b$); (2) image plane trajectories of 6DVS, 2.5DVS, IBVS and PBVS ($s$); (3) feature errors in the image plane; (4) robot end-effector pose errors in 3D Cartesian space.
As shown in Fig. 3 (1), PBVS results in a straight-line end-effector Cartesian trajectory. Since there is no control of the image features, the image trajectory (as shown in Fig. 3 (2)) is unpredictable and may leave the camera field of view. In IBVS, a straight-line image trajectory is observed while the end-effector Cartesian trajectory is not controlled. For 2.5DVS, the trajectory of the reference point in the image is a straight line and the Cartesian trajectory is also well behaved. However, the other points in the image plane have curved trajectories.
Contrary to the previous approaches, the proposed 6DVS has a straight-line Cartesian trajectory (similar to that of PBVS) and all the image features are "indirectly" controlled to move approximately along straight-line trajectories like IBVS, see Fig. 3 (1), (2). Moreover, both the feature errors and the Cartesian pose errors converge to zero very smoothly without any overshoot (Fig. 3 (3), (4)). Although 2.5DVS has a similar trade-off between these properties, the proposed 6DVS is more efficient and has better performance than 2.5DVS. Hence, 6DVS combines the advantages of PBVS in terms of controlling straight trajectories in Cartesian space, and the advantages of IBVS in terms of controlling image trajectories.
2) Test 2: For this test we compare the robustness of the proposed 6DVS, the classical IBVS and PBVS to camera errors. These errors are formulated as:
• Camera intrinsic parameter errors: $\hat{f} = 1.1 f$.
• Camera extrinsic parameter errors: $\hat{T}^b_c = 1.1\, T^b_c$.
The results from this test show that despite the camera errors, the controllers for each of the evaluated approaches do not become unstable. The resulting Cartesian and image trajectories have some notable differences (see Fig. 4), but are still well behaved.
Due to the camera errors, the end-effector Cartesian trajectory of PBVS deviates from the original straight-line trajectory to a circular motion, and slight effects on the image trajectories can also be observed (see Fig. 4 (c)). Slight differences in the end-effector Cartesian trajectory of IBVS are shown in Fig. 4 (b). As expected, the image trajectories for IBVS are robust to camera errors. Fig. 4 (a) illustrates that the effects on both the Cartesian and image trajectories in the proposed 6DVS are minor. Hence, 6DVS is as robust to camera calibration errors as IBVS.
3) Test 3: In this test we evaluate a common problem in classical IBVS: local minima. By definition, local minima are cases where $V = 0$ and $s \neq s_d$. So, a local minimum is reached when the point velocity of the robot end-effector is zero while its final position is far away from the desired position. At that position, the errors $s - s_d$ in the image plane do not completely vanish (the residual error is approximately two pixels on each $u$ and $v$ coordinate). Introducing noise in the image measurement leads to the same results.

Reaching such a local minimum is illustrated in Fig. 5 (b) for IBVS. Each component of the feature errors $e$ has an exponential convergence but is not exactly zero ($s \neq s_d$) while
Fig. 5. Simulation Test 3: Reaching (or not) local minima. Row (a) is reaching a global minimum using the 6DVS algorithm: (1) feature trajectories ($s$) in the image plane, (2) feature errors $e$, (3) the evolution of the six components of the point velocity of the robot end-effector, (4) the trajectory of the robot end-effector in 3D Cartesian space. Row (b) is reaching a local minimum using classical IBVS.
Fig. 4. Simulation Test 2: Comparison of the robustness of the controllers under the effects of camera errors. Comparison of 6DVS, IBVS and PBVS in terms of (1) image feature trajectories and (2) 3D Cartesian trajectories.
the robot velocity is close to zero in Fig. 5 (b).(3). It is clear from Fig. 5 (b).(4) that the system has been attracted to a local minimum far away from the desired configuration.
In the proposed scheme (6DVS), the image Jacobian $J_{img}$ has full rank 6, which implies there are no local minima. The global minimum is correctly reached from the same initial position if the proposed $J_{img}$ is used in the control scheme
(Fig. 5 (a)). In this case, the trajectories in the image plane are straight and each component of the errors $e$ has an exponential convergence to zero without local minima. Moreover, when the velocity reaches zero, the errors in the image plane and Cartesian space are both close to zero ($V \to 0$, $\Delta s \to 0$, $\Delta X_b \to 0$).
4) Test 4: Motion decoupling is compared between the proposed 6DVS and conventional 2.5DVS in this test. Both approaches receive the same desired Cartesian position and orientation. First the desired position is given, and both methods perform equally well. As shown in Fig. 6, the trajectories in both Cartesian space and the image plane are smooth for 6DVS and 2.5DVS in the position task.
At around t = 30 s, the desired orientation is modified. In the standard 2-1/2D method, a triangular interaction matrix is used for motion control, as defined in (1). Hence, the rotational error can affect the translation, as shown in Fig. 6 (b). The visual signals are coupled, and both position and orientation in the Cartesian space change when only the desired orientation is updated. For the proposed 6DVS, we introduce the virtual visual space to decouple the control features and get a diagonal image Jacobian. This decoupling of rotational and translational motions allows a better control design. Fig. 6 (a) demonstrates the decoupled performance of the proposed method.
The simulation results of the four different tests demonstrate the novel properties and better performance of the proposed 6DVS algorithm over conventional VS approaches. 6DVS has a reliable straight 3D Cartesian trajectory like PBVS, and straight feature trajectories like IBVS. Moreover, 6DVS avoids local minima (unlike IBVS) and is robust to camera calibration errors. Contrary to classical 2-1/2D visual servoing, 6DVS decouples the control of the translational and rotational motion.
V. EXPERIMENTS
Two experiments were performed to validate and evaluatethis work on a standard industrial manipulator in a realistic
Fig. 6. Simulation Test 4: Decoupling analysis of position and orientation. Row (a) is the decoupled result for 6DVS: (1) the desired Cartesian pose, (2) Cartesian pose errors, (3) 3D Cartesian trajectory of the robot end-effector, (4) image feature trajectory of the reference point. Row (b) is the coupled result using classical 2.5DVS.
human-robot interaction scenario. In the first experiment, we control the robot without environment constraints to better illustrate the stability of the control scheme and the convergence of the 6D visual trajectory error. The second experiment uses the 6DVS scheme for tracking a moving target in real time. It is also an example of how the proposed algorithm is used in a practical human-robot interaction scenario. To this effect, several other features such as singularity avoidance, self-collision avoidance, and obstacle detection and avoidance are implemented to ensure the safety of the robot and human.
A. System Overview
The experimental setup consists of three sub-systems:

1) Visual Stereo Tracker: The stereo system is composed of two USB cameras fixed on a tripod, in an eye-to-hand configuration (Fig. 1). The stereo rig is uncalibrated with respect to the robot base frame and can be moved manually. The parameters of the virtual cameras (see Section III) are selected such that Jα is always non-singular. In order to compute the torque (τ) and avoid a multiple-sampling system, an extended Kalman filter (EKF) is used to estimate the visual position (sampling period 4 ms), whereas the reference is updated every 30 ms with the real visual data from both cameras.
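The two-rate scheme above, predictions at the 4 ms control period with corrections whenever fresh visual data arrives, can be sketched with a simplified linear Kalman filter for a single coordinate. The paper uses a full EKF; the constant-velocity model, noise levels, and the 8-step (≈30 ms) measurement interval below are illustrative assumptions.

```python
import numpy as np

# Constant-velocity Kalman filter for one coordinate of the visual position.
# Predict runs every 4 ms (control rate); correct runs when a stereo
# measurement arrives (here every 8th step, i.e. ~30 ms).
DT = 0.004                              # control sampling period (4 ms)
F = np.array([[1.0, DT], [0.0, 1.0]])   # state transition for [pos, vel]
H = np.array([[1.0, 0.0]])              # only position is measured
Q = np.diag([1e-6, 1e-4])               # process noise (assumed)
R = np.array([[1e-3]])                  # measurement noise (assumed)

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def correct(x, P, z):
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (np.array([z]) - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Demo: the true position moves at 0.1 m/s; the camera delivers a
# measurement only every ~30 ms and the filter interpolates in between.
x, P = np.zeros(2), np.eye(2)
for k in range(1, 151):                 # 150 steps of 4 ms = 0.6 s
    x, P = predict(x, P)
    if k % 8 == 0:                      # fresh visual data (noiseless demo)
        x, P = correct(x, P, 0.1 * k * DT)
```

After 0.6 s the state estimate tracks both the position (≈0.06 m) and the velocity (≈0.1 m/s) despite only seeing position samples at the slower visual rate.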
2) Robot Control System: The robot system comprises a Staubli TX90 industrial robot arm (6 DOF), a CS8C control unit and a workstation running a GNU/Linux OS with a real-time kernel (Fig. 1). The robot is controlled in torque mode using a Low Level Interface (LLI) library.
3) 3D Visualization System: This module performs OpenGL-based real-time rendering of the workspace in 3D, using the Robotics Library6. The system updates the configuration of the robot arm and the position of the target in real time, which is achieved by means of TCP/IP communication.
B. Experiment 1: 6D Visual Tracking
2D image features are extracted from AR markers using a stereo vision system. We use the ArUco library based on
6www.roboticslibrary.org
OpenCV to detect markers. Every marker provides 2D image features for its 4 corner points. The 3D position and rotation with respect to the camera frame are obtained from the image features using the camera intrinsic parameters.
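The recovery of a marker's 3D position from its corner pixels can be illustrated with a simplified pinhole model. ArUco actually solves a full PnP problem; the intrinsics, marker size, and fronto-parallel assumption below are ours, not the paper's, so depth follows directly from the apparent side length.

```python
import numpy as np

# Illustrative camera intrinsics (fx, fy, cx, cy) and marker size.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
MARKER_SIZE = 0.05                      # marker side length in meters

def marker_position(corners):
    """corners: 4x2 pixel coordinates, ordered around the square.
    Assumes a fronto-parallel marker: depth z = f * L / l."""
    side_px = np.mean([np.linalg.norm(corners[i] - corners[(i + 1) % 4])
                       for i in range(4)])
    z = K[0, 0] * MARKER_SIZE / side_px          # depth from apparent scale
    center = corners.mean(axis=0)
    x = (center[0] - K[0, 2]) * z / K[0, 0]      # back-project the center
    y = (center[1] - K[1, 2]) * z / K[1, 1]
    return np.array([x, y, z])

# Synthetic check: project a marker centered at (0.1, 0, 0.5) m and
# recover its position from the four projected corners.
true_pos = np.array([0.1, 0.0, 0.5])
half = MARKER_SIZE / 2
corners_3d = true_pos + np.array([[-half, -half, 0], [half, -half, 0],
                                  [half,  half, 0], [-half,  half, 0]])
corners_2d = np.array([(K @ p)[:2] / p[2] for p in corners_3d])
est = marker_position(corners_2d)
```

For a tilted marker the side-length heuristic breaks down, which is why a full PnP solver is used in practice.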
Fig. 7. Snapshot of the 6D visual tracking: (a) Teaching interface: the human uses the AR marker to show the desired trajectories; (b) Automatic execution: the robot executes a 6D visual tracking and shows identical linear and angular motions as taught.
This experiment consists of two phases: teaching and execution.
1) Teaching Interface: We provide a teaching interface (Fig. 7 (a)), where the user holds an AR marker that is detected by the stereo camera system to provide 2D image features. A red square and a marker ID (cyan) in the image show the detection. In this task, the user moves the marker, creating visual trajectories, e.g., two orthogonal straight lines on the table and two smooth curves on the surface of the globe. These trajectories include both translational and rotational motions. During the movement, the 2D features for the four corner points of the marker are recorded and saved. At points where the marker is lost or cannot be detected, the last available detection is stored, thereby guaranteeing that the desired pose can be reached and is safe for the robot to execute.
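The last-available-detection fallback described above can be sketched as follows; the function name and frame format are hypothetical, not taken from the paper's implementation.

```python
# Teaching-phase recorder sketch: each frame either yields a marker
# detection (four 2D corners) or None when the marker is lost.  On a
# lost frame the last available detection is stored instead, so the
# recorded trajectory contains no gaps the robot cannot follow.
def record_trajectory(detections):
    recorded, last = [], None
    for corners in detections:
        if corners is not None:
            last = corners               # update the last good detection
        if last is not None:             # skip frames before any detection
            recorded.append(last)
    return recorded

# Two detections with dropouts in between and at the end.
frames = [[(0, 0)] * 4, None, [(1, 1)] * 4, None, None]
traj = record_trajectory(frames)
```

The dropout frames are filled with the preceding detection, so the final recorded pose is always one the marker actually reached.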
2) Automatic Execution: After the teaching phase, the robot can automatically execute the recorded visual trajectories. Another AR marker of the same size is attached to the robot end-effector. The current robot position (Ws) is obtained from the visual features tracked from this marker. The target inputs to the visual servoing are the 2D image features recorded in the teaching phase. From the recorded features we extract the desired visual feature vector Wsd = [xsd, ysd, zsd, αd, βd, γd]^T, which is used to create the error function (see Section III).

Fig. 8. Experiment results for 6D visual tracking: (1) 3D visual position (pixel) of the robot end-effector and the desired one in the virtual visual space, (2) 3D position (meter) in Cartesian space, (3) Euler angles (degree) of the robot end-effector and the desired target.
Visual servoing is accomplished by driving the error function to zero. In our case, the error function is e = Ws − Wsd. The position components of e come from the new position vector Xs as in (11), and the rotational components are obtained from the 3D data. According to the properties of our image Jacobian and the control scheme, when the errors in the virtual visual space converge to zero, the errors in Cartesian space also converge to zero without local minima. Therefore, during execution, the AR marker on the robot end-effector shows identical linear and angular motions as instructed in the teaching phase (Fig. 7). This experiment illustrates how the visual servoing system can track a given desired trajectory.
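The effect of the diagonal image Jacobian can be illustrated with a minimal closed-loop sketch: with e = Ws − Wsd and a proportional law, each of the six error components decays independently, so translation and rotation never disturb each other. The gain and Jacobian entries below are illustrative values, not the paper's.

```python
import numpy as np

lam = 2.0                                              # control gain (assumed)
# Diagonal image Jacobian of the virtual orthogonal cameras
# (px/m for translation, rad/rad for rotation; illustrative values).
J_alpha = np.diag([400.0, 400.0, 400.0, 1.0, 1.0, 1.0])

def control_step(Ws, Wsd):
    e = Ws - Wsd                                       # error in virtual visual space
    return -lam * np.linalg.solve(J_alpha, e)          # Cartesian velocity command

# Closed-loop simulation: the features evolve as dWs = J_alpha @ v * dt,
# so each component of e decays exponentially and independently.
Ws = np.array([10.0, -5.0, 8.0, 0.3, -0.2, 0.1])       # current feature vector
Wsd = np.zeros(6)                                      # desired feature vector
dt = 0.004
for _ in range(5000):                                  # 20 s of simulated time
    v = control_step(Ws, Wsd)
    Ws = Ws + J_alpha @ v * dt
```

Because J_alpha is diagonal and full rank, the six loops are scalar first-order systems: there is no coupling term through which a rotational command could perturb the position error, which is the decoupling property shown in Fig. 6.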
Experimental results are depicted in Fig. 8. Plot (1) shows the 3D linear visual tracking in the virtual visual space, while plot (2) depicts the target trajectory tracking in Cartesian space. Plot (3) illustrates the rotational motion tracking. The red lines in plots (1) and (2) are the target trajectories, which exhibit some noise and chattering due to the unsteady movement of the user. However, the blue lines, which show the trajectories of the robot end-effector, are smooth and chatter-free.
C. Experiment 2: 6D Uncalibrated VS in HRI Scenario
In this experiment, we integrate the proposed visual servoing system in a Human-Robot Interaction (HRI) scenario. The robot manipulator follows, in real time, an AR marker manipulated by a user while avoiding environment constraints such as robot singularities and collisions with itself or obstacles. The artificial potential field approach [28] is used to model these environment constraints. A coarse on-line estimation of the camera parameters is computed using the real-time information generated by the robot (more details are given in [24]).
Interaction results: This experiment demonstrates real-time tracking, by the robot end-effector, of a moving target held by the user. Both translational and rotational motions are tracked in this system (Fig. 9 (a)). The system proves to be stable and safe for HRI scenarios, even in situations where the target is lost (due to occlusions by the robot or the human), Fig. 9 (b).
To demonstrate stability, we test our system under several environmental constraints. Fig. 9 (c) illustrates the results of singularity avoidance, where the robot does not reach the singular condition (q3 = 0), even when the user tries to force it. Fig. 9 (d) depicts table avoidance, where the motion of the robot is constrained along the zb-axis by the height of the table (the end-effector is not allowed to go under the table) but it can still move along the xb- and yb-axes, and Fig. 9 (e) shows how the robot handles self-collisions. Fig. 9 (f) shows obstacle avoidance while continuing to track the target.
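The artificial potential field of [28] can be sketched with a classical repulsive potential that activates only within an influence distance of a constraint surface, such as the table. The gains and distances below are illustrative, not the paper's.

```python
import numpy as np

def repulsive_force(x, obstacle, d0=0.2, eta=0.01):
    """Gradient of the repulsive potential U = 0.5*eta*(1/d - 1/d0)^2,
    pushing the end-effector away from the closest constraint point.
    d0 is the influence distance; eta a repulsion gain (both assumed)."""
    diff = x - obstacle
    d = np.linalg.norm(diff)
    if d >= d0:
        return np.zeros(3)                   # outside the influence region
    return eta * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)

# End-effector 5 cm above a table point vs. well away from it.
table_point = np.array([0.0, 0.0, 0.0])
x_near = np.array([0.0, 0.0, 0.05])
x_far  = np.array([0.0, 0.0, 0.5])
f_near = repulsive_force(x_near, table_point)
f_far  = repulsive_force(x_far, table_point)
```

Near the table the force points along +z (away from the surface) and grows without bound as the distance shrinks, while beyond the influence distance the constraint contributes nothing, so tracking is unaffected when the robot is far from all constraints.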
A video illustrating more details for all these experimentalresults can be seen at: http://youtu.be/zqmapL51g9I
VI. CONCLUSIONS
In this paper, we have investigated the control of translational and rotational motion of the end-effector of a robotic manipulator under visual feedback from fixed stereo cameras. We have proposed a new virtual visual space (measured in pixels) for visual servoing using an uncalibrated stereo vision system in combination with virtual orthogonal cameras. Using a 6D visual pose vector defined in this virtual space, we obtain a new full-rank image Jacobian that avoids well-known problems such as image space singularities and local minima. Moreover, the rotational and translational motions of the robot end-effector are decoupled due to the diagonal image Jacobian. According to the simulation results, these new features perform better than classical ones, since the system combines the advantages of 2D and 3D visual servoing. Furthermore, the proposed algorithm was integrated in a practical human-robot interaction scenario with environmental and kinematic constraints to generate a robot dynamic system with a trajectory free of collisions and singularities.
Future work includes further analysis of the limitations of the proposed scheme and more applications utilizing this new virtual visual space. The choice of the controller was not a major contribution of this work, and further research could be done to evaluate the effects of using different controllers with the proposed VS approach. We also plan to apply this approach in more real-world applications by fusing this technique with different sensors, e.g., depth cameras and force sensors.
Fig. 9. System behaviors: (a) Position and orientation tracking, (b) case when the target is lost, (c) case with singularity avoidance, (d) case with table collision avoidance, (e) case with self-collision avoidance and (f) obstacle avoidance.
REFERENCES
[1] S. Hutchinson, G. Hager, and P. Corke, "A tutorial on visual servo control," IEEE Trans. Robot. Autom., vol. 12, no. 5, pp. 651–670, Oct. 1996.
[2] J.-K. Oh, S. Lee, and C.-H. Lee, "Stereo vision based automation for a bin-picking solution," Int. J. Control, Autom. Syst., vol. 10, no. 2, pp. 362–373, 2012.
[3] B. Espiau, "Effect of camera calibration errors on visual servoing in robotics," in The 3rd Int. Symp. Experimental Robotics. London, UK: Springer-Verlag, 1993, pp. 182–192.
[4] F. Chaumette, "Potential problems of stability and convergence in image-based and position-based visual servoing," in The Confluence of Vision and Control. LNCIS Series, No. 237, Springer-Verlag, 1998, pp. 66–78.
[5] J. Wang and H. Cho, "Micropeg and hole alignment using image moments based visual servoing method," IEEE Trans. Ind. Electron., vol. 55, no. 3, pp. 1286–1294, Mar. 2008.
[6] F. Chaumette, "Image moments: a general and useful set of features for visual servoing," IEEE Trans. Robot., vol. 20, no. 4, pp. 713–723, Aug. 2004.
[7] A. Krupa, J. Gangloff, M. de Mathelin, C. Doignon, G. Morel, L. Soler, J. Leroy, and J. Marescaux, "Autonomous retrieval and positioning of surgical instruments in robotized laparoscopic surgery using visual servoing and laser pointers," in IEEE Int. Conf. Robot. Autom., vol. 4, May 2002, pp. 3769–3774.
[8] C. Collewet and F. Chaumette, "A contour approach for image-based control on objects with complex shape," in IEEE/RSJ Int. Conf. Intell. Robots Syst., vol. 1, Nov. 2000, pp. 751–756.
[9] O. Tahri and F. Chaumette, "Point-based and region-based image moments for visual servoing of planar objects," IEEE Trans. Robot., vol. 21, no. 6, pp. 1116–1127, Dec. 2005.
[10] Y. Zhao, W.-F. Xie, and S. Liu, "Image-based visual servoing using improved image moments in 6-DOF robot systems," Int. J. Control, Autom. Syst., vol. 11, no. 3, pp. 586–596, 2013.
[11] J. Feddema, C. Lee, and O. Mitchell, "Model-based visual feedback control for a hand-eye coordinated robotic system," Computer, vol. 25, no. 8, pp. 21–31, Aug. 1992.
[12] Y. Mezouar and F. Chaumette, "Optimal camera trajectory with image-based control," Int. J. Robot. Res., vol. 22, no. 10, pp. 781–804, 2003.
[13] F. Chaumette and S. Hutchinson, "Visual servo control I: basic approaches," IEEE Robot. Autom. Mag., vol. 13, no. 4, pp. 82–90, Dec. 2006.
[14] F. Janabi-Sharifi, L. Deng, and W. Wilson, "Comparison of basic visual servoing methods," IEEE/ASME Trans. Mechatronics, vol. 16, no. 5, pp. 967–983, Oct. 2011.
[15] N. Papanikolopoulos and P. Khosla, "Adaptive robotic visual tracking: theory and experiments," IEEE Trans. Autom. Control, vol. 38, no. 3, pp. 429–445, Mar. 1993.
[16] E. Nematollahi and F. Janabi-Sharifi, "Generalizations to control laws of image-based visual servoing," Int. J. Optomechatronics, vol. 3, no. 3, pp. 167–186, 2009.
[17] D. Kim, A. Rizzi, G. Hager, and D. Koditschek, "A robust convergent visual servoing system," in IEEE/RSJ Int. Conf. Intell. Robots Syst., vol. 1, Aug. 1995, pp. 348–353.
[18] Y. Liu, H. Wang, C. Wang, and K. K. Lam, "Uncalibrated visual servoing of robots using a depth-independent interaction matrix," IEEE Trans. Robot., vol. 22, no. 4, pp. 804–817, Aug. 2006.
[19] B. Yoshimi and P. Allen, "Alignment using an uncalibrated camera system," IEEE Trans. Robot. Autom., vol. 11, no. 4, pp. 516–521, Aug. 1995.
[20] K. Hosoda and M. Asada, "Versatile visual servoing without knowledge of true Jacobian," in IEEE/RSJ Int. Conf. Intell. Robots Syst., vol. 1, Sep. 1994, pp. 186–193.
[21] J. Piepmeier, G. McMurray, and H. Lipkin, "Uncalibrated dynamic visual servoing," IEEE Trans. Robot. Autom., vol. 20, no. 1, pp. 143–147, Feb. 2004.
[22] A. Shademan, A.-M. Farahmand, and M. Jagersand, "Robust Jacobian estimation for uncalibrated visual servoing," in IEEE Int. Conf. Robot. Autom., May 2010, pp. 5564–5569.
[23] E. Malis, F. Chaumette, and S. Boudet, "2-1/2-D visual servoing," IEEE Trans. Robot. Autom., vol. 15, no. 2, pp. 238–250, Apr. 1999.
[24] C. Cai, E. Dean-Leon, N. Somani, and A. Knoll, "6D image-based visual servoing for robot manipulators with uncalibrated stereo cameras," in IEEE/RSJ Int. Conf. Intell. Robots Syst., Sep. 2014.
[25] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2004.
[26] M. W. Spong, S. Hutchinson, and M. Vidyasagar, Robot Dynamics and Control, 2nd ed., 2004.
[27] C. Cai, E. Dean-Leon, D. Mendoza, N. Somani, and A. Knoll, "Uncalibrated 3D stereo image-based dynamic visual servoing for robot manipulators," in IEEE/RSJ Int. Conf. Intell. Robots Syst., Nov. 2013.
[28] C. Cai, N. Somani, S. Nair, D. Mendoza, and A. Knoll, "Uncalibrated stereo visual servoing for manipulators using virtual impedance control," in Int. Conf. Control, Autom., Robot. Vision, Dec. 2014.