
1896 IEEE/ASME TRANSACTIONS ON MECHATRONICS, VOL. 19, NO. 6, DECEMBER 2014

Error Analysis for Visual Odometry on Indoor, Wheeled Mobile Robots With 3-D Sensors

Joshua Fabian and Garrett M. Clayton

Abstract—The objective of this paper is to improve visual odometry performance through the analysis of the sensor noise and the propagation of that error through the entire visual odometry system. The visual odometry algorithm is implemented on an indoor, wheeled mobile robot (WMR) constrained to planar motion, and uses an integrated color-depth (RGB-D) camera and a one-point (1-pt), 3 degree-of-freedom inverse kinematic solution, enabling a closed-form bound on the propagated error. There are three main contributions of this paper. First, feature location errors for the RGB-D camera are quantified. Second, these feature location errors are propagated through the entire visual odometry algorithm. Third, the visual odometry performance is improved by using the predicted error to weight individual 1-pt solutions. The error bounds and the improved visual odometry scheme are experimentally verified on a WMR. Using the error-weighting scheme, the proposed visual odometry algorithm achieves approximately 1.5% error, without the use of iterative, outlier-rejection tools.

Index Terms—Mobile robots, robot vision systems.

I. INTRODUCTION

THE objective of this paper is to improve the visual odometry performance through the analysis of the sensor noise and the propagation of error through the feature tracking process to the visual odometry output. This paper focuses specifically on visual odometry with 3-D sensor systems and leverages work by the authors on the use of integrated color and depth (RGB-D) cameras for visual odometry [1], [2]. This paper is motivated by the proliferation of vision-based sensing and control solutions for mobile robots across a broad range of platforms, from small, indoor robots [3], [4], to vehicle-sized, outdoor robots [5]–[7], to interplanetary rovers [8], [9]. In this paper, we focus on indoor, planar robots, and use the Microsoft Kinect RGB-D camera to generate the relative 3-D locations of visual features [10]. These 3-D features are tracked over time, and a one-point (1-pt), 3 degree-of-freedom (DOF) inverse kinematic solution is used to calculate the visual odometry of the robot between frames. This enables a closed-form solution for the propagation of the camera error through the entire mechatronic system, from the RGB-D sensor to the visual odometry output. This study is generally applicable to a range of indoor, planar platforms, including differential-drive and skid-steered wheeled mobile robots (WMRs).

Manuscript received February 12, 2013; revised June 6, 2013, August 2, 2013, and November 6, 2013; accepted January 14, 2014. Date of publication February 20, 2014; date of current version June 13, 2014. Recommended by Technical Editor S. Verma.

The authors are with the Department of Mechanical Engineering and the Center for Nonlinear Dynamics and Control, Villanova University, Villanova, PA 19083 USA (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMECH.2014.2302910


A. Visual Odometry

In many visual odometry algorithms, visual features are tracked over successive video frames. The relative pose of the camera is determined from the motion of these tracked features. Many algorithms implement some form of the random sample consensus (RANSAC) outlier-rejection method [11] to augment the feature tracking process [12], [13], since these feature-based visual odometry algorithms are sensitive to incorrect feature matches [14]. There are also algorithms in which prior probabilistic [15] or estimated egomotion [4] information is used to guide the search for feature matches, including previous work by the authors [1], [2].

Once a set of matched features has been determined, the final step in visual odometry is to calculate the relative pose and orientation of the robot from the set of matched features. There are a significant number of solution approaches implemented in feature-based visual odometry algorithms. Most of these approaches estimate the relative pose and orientation of the camera between frames through projective geometric analysis (i.e., homography or epipolar geometry). The general, unconstrained, 6-DOF camera motion can be resolved using five individual feature correspondences (5-pt solutions) [5], [16]. In many mobile robotics applications, the motion can be constrained in order to reduce the number of required features and, as a consequence, reduce the computational load [1], [2], [6], [13].

B. Three-Dimensional Vision Systems

Vision-based robotics algorithms, like visual odometry, have been implemented with a wide range of 3-D sensor systems, including stereo and trinocular camera systems [4], [5]. Since their introduction, RGB-D sensors, like the Microsoft Kinect, have also been implemented in visual odometry and scene reconstruction applications. These devices combine a VGA-quality video camera and a depth sensor, along with several other peripheral sensors, into an integrated low-power, low-cost package [17], [18]. Current research in RGB-D cameras for visual odometry and localization and mapping has investigated the use of the depth sensor, without integrating information from the RGB camera [19]–[21]. There are also methods in which both cameras are used [3], [22]–[24]. In [3], [22], and [23], the color and depth images are fused to build dense, 3-D maps of the environment. In [24], 2-D visual features are converted to 3-D coordinates to reduce the number of points required to solve the relative pose problem. Previous work by the authors has used the 3-D information from the depth sensor and an inverse kinematic approach to develop a 1-pt, 3-DOF visual odometry solution [1], [2].


C. Error Propagation Through Visual Odometry

There has been significant work on error and uncertainty analysis in visual odometry and structure from motion algorithms. Much of this study has focused on the propagation of camera calibration uncertainties and feature localization errors through the algorithms. Methods for propagating approximately additive random perturbations through various computer vision algorithms have been developed [25]. There has also been research done to obtain covariance matrices for structure from motion estimators with uncalibrated cameras [26], and to relate the error covariance of feature tracks to the error covariance of the structure-from-motion output [27]. Both of these works deal with the propagation of error and uncertainty once a matched set of correspondences is determined.

There has also been research investigating the effects of feature location errors on visual odometry solutions. In [28], errors in camera intrinsic and extrinsic parameters are propagated through a planar motion model for a WMR measuring visual odometry with a monocular camera directed at the ceiling. In [4], a methodology for propagating 2-D visual feature location uncertainty, which is estimated from triangulated 2-D features, into 3-D feature location covariance is proposed. This 3-D feature location uncertainty is used to build and maintain a map of feature locations. In [8], a method for using feature location uncertainties from stereo images to weight the relative pose solution by developing an error covariance matrix for each feature is proposed. The covariance matrices for all feature correspondences form the basis of a nonlinear cost function that is optimized in the least-squares sense. Both [4] and [8] match features in the 2-D image plane with gradient-based descriptors and assume that outliers have been removed.

D. Contributions of This Work

This paper leverages previous work by the authors on the use of RGB-D cameras for visual odometry [1] in which visual features are tracked in 3-D based on spatial proximity. The position and orientation of the robot are calculated using a 1-pt inverse kinematic solution, rather than projective geometry. A benefit of this approach is that it enables a complete, closed-form visual odometry solution from a single feature correspondence. By matching features in 3-D and implementing a 3-DOF, 1-pt solution, this approach also enables a direct, closed-form solution for the error propagation of each individual feature correspondence, from the sensor, through the matching process, to the visual odometry output. In contrast to other visual odometry methods, the proposed method also quantifies the error from both correct and incorrect feature matches.

There are three primary contributions of this paper. First, accurate RGB-D sensor error propagation models are used to bound the error in measured 3-D feature location. Second, the feature location error bounds are propagated through the feature tracking process to the visual odometry output. Due to the nature of 3-D feature matching, this analysis will also bound the visual odometry error introduced by incorrect feature matches. Third, this study develops a method for using the error analysis to improve the overall visual odometry performance. All of the error models and performance bounds developed in this paper are experimentally verified on a small WMR.

Fig. 1. Experimental differential drive WMR.

It is important to note that the sensor error models and the analysis of error propagation through the feature matching process are directly applicable to any 3-D sensor system (e.g., stereo vision, sonar, etc.) and 3-D-to-3-D feature matching method, respectively. The proposed analysis of the error propagation through the visual odometry solution is predicated on the use of an inverse kinematic solution.

The remainder of this paper is organized as follows. Section II describes the experimental testbed used for all evaluation. The performance of the RGB-D sensor is modeled and verified in Section III. Section IV summarizes the 3-D feature tracking algorithm and the 1-pt visual odometry solution. Section V presents the error propagation analysis from the RGB-D sensor through the visual odometry output, as well as experimental verification of the method. Section VI presents the method for using the error analysis to improve the visual odometry performance, along with experimental verification. Conclusions are presented in Section VII.

II. MOBILE ROBOTICS TESTBED

The error analyses and visual odometry algorithms presented in this paper are all experimentally verified, with the results presented and discussed in the relevant sections of this paper. The experimental testbed used for verification is comprised of a small WMR and an overhead motion capture system.

A. Wheeled Mobile Robot

The WMR, shown in Fig. 1, is a two-wheeled, differential-drive robot that operates autonomously. It uses a Pandaboard single-board computer running a Linux operating system for computation, and an Arduino microcontroller board for motor control. The robot uses only the Kinect for sensor information. The control program, including the VU-Kinect interface to the Kinect, is built in the MathWorks Simulink environment [17]. Through the Simulink Real-Time Workshop, the program is compiled and built as a real-time executable for the Pandaboard target.


B. Overhead Camera Motion Capture System

An overhead motion capture system is used in all experimental trials to measure the actual motion of the WMR. The performance of the visual odometry system is quantified by comparing the actual motion with the visual odometry output. This motion capture system is calibrated by measuring the 3-D locations of a grid of points spanning the camera's field of view. The pixel locations of the grid points are then measured in the camera image. Through linear interpolation, the 3-D world coordinates of each individual pixel location in the image are generated. The location and orientation of the WMR in the overhead camera field of view are tracked via three markers affixed to the top of the WMR.
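The paper does not detail the implementation of this pixel-to-world map. The sketch below shows one way to realize it with a piecewise-linear interpolant over the surveyed grid; the grid spacing, the 0.5 cm/pixel scale, and the marker pixel locations are placeholder assumptions, not values from the paper.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Surveyed calibration grid (placeholder values): pixel coordinates of grid
# points in the overhead image and measured world coordinates (cm) of the
# same points on the floor plane.
u, v = np.meshgrid(np.linspace(0, 640, 9), np.linspace(0, 480, 7))
grid_pixels = np.column_stack([u.ravel(), v.ravel()])
grid_world = np.column_stack([0.5 * u.ravel(), 0.5 * v.ravel()])  # assumed scale

# Piecewise-linear map from pixel location to world location, as described for
# the overhead motion capture system.
pixel_to_world = LinearNDInterpolator(grid_pixels, grid_world)

# Pixel centers of the three markers on top of the WMR (hypothetical values).
marker_px = np.array([[412.0, 233.0], [430.0, 251.0], [398.0, 255.0]])
marker_xy = pixel_to_world(marker_px)

position = marker_xy.mean(axis=0)      # WMR position: centroid of the markers
d = marker_xy[1] - marker_xy[0]        # heading from a marker pair
heading = np.arctan2(d[1], d[0])
```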

The three primary sources of error in the overhead motion capture system are quantization error, related to the resolution of the camera; nonlinear lens distortion, which affects the accuracy of linear interpolation between measured grid points; and errors in identifying the centers of the markers on the WMR. Based on these factors, the overhead motion capture system has a maximum error in measuring position of ±0.28 cm and a maximum error in measuring orientation of ±0.1 rad.

III. QUANTIFYING SENSOR PERFORMANCE

The visual odometry algorithm uses the SURF feature detector to extract features from the RGB image [29]. These 2-D visual features are converted into 3-D coordinates using information from the depth sensor. In order to analyze the propagation of error through the visual odometry system, the noise and uncertainty in the RGB-D camera must first be characterized. The effect of this noise and uncertainty is the introduction of an error in the 3-D location of visual features relative to the camera. As in [4], it is assumed that the x-, y-, and z-coordinates of a given feature are independent and that the individual errors are uncorrelated. Therefore, the total error covariance in the 3-D feature location is reduced to \Delta_{xyz} = [d_x^2, d_y^2, d_z^2]^T.

A. Three-Dimensional Feature Location Error

The primary sources of 3-D location errors are the inherent noise in the depth camera, estimation errors in the RGB-D camera calibration process, and uncertainty in the feature location measured by the SURF feature detector.

1) Error in the z-Direction: The inherent noise in the depth sensor manifests as a fluctuation in the raw depth value measured by the sensor. This raw depth value is an integer value that is inversely proportional to the depth of an object [1]. The noise in the depth camera is experimentally quantified by measuring the fluctuation of individual pixels while the camera is stationary. Over several hundred video frames in a representative indoor environment, pixels in the depth image fluctuated across at most four consecutive raw depth values (i.e., an average fluctuation of ±1.5 pixel intensity values).
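A minimal sketch of this noise measurement is shown below, assuming a stack of raw depth frames recorded with the camera stationary; the synthetic frames stand in for the several hundred recorded frames and are not the paper's data.

```python
import numpy as np

def depth_fluctuation(raw_depth_frames):
    """Per-pixel spread of the raw depth value over a stack of frames captured
    with the camera stationary.

    raw_depth_frames: integer array of shape (num_frames, height, width).
    Returns the per-pixel (max - min) span and the corresponding +/- fluctuation.
    """
    span = raw_depth_frames.max(axis=0) - raw_depth_frames.min(axis=0)
    return span, span / 2.0

# Synthetic stand-in for several hundred recorded frames of a static scene.
rng = np.random.default_rng(0)
frames = 800 + rng.integers(0, 4, size=(300, 60, 80))   # spans 4 raw values
span, fluct = depth_fluctuation(frames)
print(span.max(), fluct.mean())   # -> 3 and about 1.5 raw depth values
```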

In addition to the inherent noise in the depth sensor, there is also a nonlinear quantization error that is well modeled by

Q_z = 1.4 \times 10^{-5} z^2    (1)

where z is the depth of an object in centimeters [1].

We are interested in quantifying the maximum error in the depth measurements. Therefore, the quantization error of the sensor, Q_z, and the pixel fluctuation noise are combined into a single upper bound on the error in depth measurements, d_z, given by

d_z = 4 Q_z = 5.6 \times 10^{-5} z^2.    (2)

It is important to note that structured light depth sensors, like the one in the Kinect, also experience difficulty with planar surfaces parallel to and close to the optical axis (e.g., the floor, if the Kinect is close to the floor) and near the edges of objects, where the IR pattern may be obstructed. In these cases, depth values for affected pixels are generally not available. These types of anomalies are not considered in the present analysis since visual features without corresponding depth values are ignored.

2) Errors in the x- and y-Directions: In order to determine the bounds on variability in the x- and y-directions, the underlying errors from the camera calibration process are propagated through the governing projective geometric equations. From the pinhole camera model [30], the x- and y-locations of an image point are given by

\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} z(u - u_o)/f_x \\ z(v - v_o)/f_y \end{bmatrix}    (3)

where u and v are the image coordinates of a feature in the horizontal and vertical directions, u_o and v_o are the horizontal and vertical coordinates of the principal point, and f_x and f_y are the focal lengths in the x- and y-directions, respectively.

The error in a dependent variable, g = f(i, j, k), is modeled by the first-order propagation approximation

d_g^2 = \left(\frac{\partial f}{\partial i}\right)^2 d_i^2 + \left(\frac{\partial f}{\partial j}\right)^2 d_j^2 + \left(\frac{\partial f}{\partial k}\right)^2 d_k^2    (4)

where d_i denotes the error in variable i [31].

The RGB-D camera is calibrated through an iterative estimation process described in [32]. There are estimation errors associated with all of the intrinsic camera parameters. These errors are designated as d_{u_o}, d_{v_o}, d_{f_x}, and d_{f_y}. For the RGB-D camera used in this paper, the principal point coordinates, u_o ± d_{u_o} and v_o ± d_{v_o}, are estimated as 324.2 ± 3.6 and 232.6 ± 4.5 pixels, respectively. The focal lengths, f_x ± d_{f_x} and f_y ± d_{f_y}, are estimated as 590.5 ± 4.9 and 582.8 ± 7.7 pixels, respectively. Note that the estimation ranges shown are three times the parameter standard deviations, making them approximate upper bounds. Additionally, the variability in the location of a feature in the image plane measured by the SURF detector is experimentally measured as d_u = d_v = ±1 pixel.


By applying (4) to (3), the location errors in the x- and y-directions are given, respectively, by

d_x = \sqrt{ \frac{x^2 d_z^2}{z^2} + \frac{x^2 d_{f_x}^2}{f_x^2} + \frac{z^2 (d_{u_o}^2 + d_u^2)}{f_x^2} }    (5)

and

d_y = \sqrt{ \frac{y^2 d_z^2}{z^2} + \frac{y^2 d_{f_y}^2}{f_y^2} + \frac{z^2 (d_{v_o}^2 + d_v^2)}{f_y^2} }.    (6)

These equations, along with (2), define bounds on the 3-D feature location error.
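The sketch below collects (2), (5), and (6) into a single helper, using the calibration values reported above; the function name and the sample feature location are illustrative, not from the paper.

```python
import numpy as np

# Calibration values reported in the paper (pixels); the second entry of each
# pair is the corresponding uncertainty d_uo, d_vo, d_fx, d_fy.
U0, D_U0 = 324.2, 3.6
V0, D_V0 = 232.6, 4.5
FX, D_FX = 590.5, 4.9
FY, D_FY = 582.8, 7.7
D_U = D_V = 1.0          # SURF feature localization variability (pixels)

def location_error_bounds(x, y, z):
    """Upper bounds d_x, d_y, d_z (cm) on the 3-D location error of a feature
    at (x, y, z) cm, following (2), (5), and (6)."""
    dz = 5.6e-5 * z**2                                    # (2)
    dx = np.sqrt((x * dz / z)**2 + (x * D_FX / FX)**2
                 + (z / FX)**2 * (D_U0**2 + D_U**2))      # (5)
    dy = np.sqrt((y * dz / z)**2 + (y * D_FY / FY)**2
                 + (z / FY)**2 * (D_V0**2 + D_V**2))      # (6)
    return dx, dy, dz

# A feature 20 cm right, 10 cm up, and 200 cm in front of the camera.
print(location_error_bounds(20.0, 10.0, 200.0))
```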

B. Experimental Verification of 3-D Feature Location Uncertainties

An experiment was conducted to measure the actual uncertainties in 3-D feature location. The objective of this experiment is to verify that the 3-D location error bounds in (2), (5), and (6) are accurate. In this experiment, a stationary RGB-D sensor was used to record the 3-D locations of features over time. The SURF feature detector was used to extract features from the RGB video, which were then converted to 3-D locations with the depth sensor. These 3-D features were tracked over time in order to calculate 3-D location statistics for comparison against the error approximations developed in (2), (5), and (6).

In order to establish expected bounds on the 3-D location errors, the camera calibration parameters and uncertainties are substituted into (5) and (6), yielding

d_x = \sqrt{ 3.1 \times 10^{-9} z^2 x^2 + 6.9 \times 10^{-5} x^2 + 4.0 \times 10^{-5} z^2 }    (7)

and

d_y = \sqrt{ 3.1 \times 10^{-9} z^2 y^2 + 1.8 \times 10^{-4} y^2 + 6.3 \times 10^{-5} z^2 }.    (8)

Fig. 2 shows the measured maximum variability for the individual features that were tracked, as well as the bounds on error (shown as circles in the top two figures and a dashed line in the bottom figure). In these figures, the measured d_x, d_y, and d_z values are calculated as the maximum deviations from the average 3-D feature locations. The results in Fig. 2 demonstrate that the predicted feature location error bounds are reasonable.

IV. THREE-DIMENSIONAL VISUAL ODOMETRY ALGORITHM

There are two primary components of the proposed visual odometry system that are relevant to the error propagation analysis. First, a 1-pt visual odometry solution is implemented. Second, features are matched deterministically based on their 3-D spatial proximity.

A. Assumptions and Constraints

There are several important assumptions and constraints involved in the visual odometry algorithm. First, the motion of the WMR is assumed to be planar, which is realistic for indoor mobile robots and enables the 3-DOF motion model. Second, the motion of the WMR is assumed to be instantaneously circular. This assumption is accurate for differential-drive robots in flat, indoor environments, as well as a variety of skid-steered robots at typical video frame rates, and enables the 1-pt inverse kinematic solution.

Fig. 2. Three-dimensional feature location uncertainties in the x-, y-, and z-directions. The predicted error bounds are calculated by (2), (7), and (8).

B. 1-pt, 3-DOF, Inverse Kinematic Solution

The motion of the robot between subsequent frames is calculated from a single feature correspondence using an inverse kinematic solution. The WMR is modeled as a 3-DOF system with coordinates θ1, θ2, and L, as shown in Fig. 3(a). Note that the angles θ1 and θ2 in the figure are positive in the counterclockwise direction. Using the Denavit–Hartenberg (DH) convention [33], the relationship between locations of a feature in frames k and (k − 1) is given by

x_{k-1} = \begin{bmatrix} \cos\theta_{12} & -\sin\theta_{12} \\ \sin\theta_{12} & \cos\theta_{12} \end{bmatrix} x_k + L \begin{bmatrix} \cos\theta_1 \\ \sin\theta_1 \end{bmatrix}    (9)

where x_k = [x_k, z_k]^T, x_{k-1} = [x_{k-1}, z_{k-1}]^T, and θ12 = θ1 + θ2 is the change in heading of the robot between frames.

Fig. 3. Illustrations showing examples of (a) the actual motion of the WMR with the estimated feature location and the matching region, (b) the estimated motion of the WMR, and (c) a detailed illustration of the matching region. Note that, by convention, angles are positive in the counterclockwise direction. (a) Actual motion with estimated feature location and matching region (B). (b) Estimated motion. (c) Matching region detail.

For observed feature locations, x_{k-1} and x_k, (9) can be solved for L, θ1, and θ12 by modeling the motion of the robot as instantaneously circular, as in [6]. This circular motion constraint accurately models the actual motion of a differential-drive WMR since the wheel velocities remain constant between video frames. The constraint is derived by noting that the angles, γ, in Fig. 3(a) are identical for the circular motion. These angles can be written in terms of the DH variables, θ1 and θ2, and equated as

\gamma = \theta_1 - \frac{\pi}{2} = \theta_2 + \frac{\pi}{2}.    (10)

Equation (10) is used to derive the circular motion constraint in terms of the DH angles, θ1 and θ2, by

\theta_2 = \theta_1 - \pi.    (11)

By solving (9) and (11) simultaneously, the motion of the robot between frames can be calculated from a single feature match by

\begin{bmatrix} \theta_{12} \\ L \end{bmatrix} = \begin{bmatrix} 2\theta_1 - \pi \\ \csc\theta_1 \left( z_{k-1} - \sin\theta_{12}\, x_k - \cos\theta_{12}\, z_k \right) \end{bmatrix}    (12)

where

\theta_1 = \arctan\left( \frac{z_{k-1} + z_k}{x_{k-1} - x_k} \right).    (13)

Note that in cases where the WMR has no rotation between frames (i.e., θ1 = π), an alternate solution for the translation is

L = \sec\theta_1 \left( x_{k-1} + \sin\theta_{12}\, z_k - \cos\theta_{12}\, x_k \right).    (14)

In this manner, the visual odometry of the WMR at each consecutive frame is calculated individually for each feature correspondence. The cumulative 3-DOF orientation and position of the WMR at time k in the world coordinate system, W, is denoted as [\Theta_k^W, X_k^W, Z_k^W]^T, and is calculated by

\begin{bmatrix} \Theta_k^W \\ X_k^W \\ Z_k^W \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{k} \theta_{12,i} \\ X_{k-1}^W - L_k \sin\left(\Theta_{k-1}^W + \frac{\theta_{12,k}}{2}\right) \\ Z_{k-1}^W + L_k \cos\left(\Theta_{k-1}^W + \frac{\theta_{12,k}}{2}\right) \end{bmatrix}    (15)

where \theta_{12,k}, L_k, and \theta_{1,k} are the visual odometry coordinates at time k, calculated from (12) and (13), respectively.

It is important to note that with the 3-D information provided by the RGB-D camera, a 6-DOF inverse kinematic solution is possible with two feature correspondences. For the present application, the 1-pt solution was favored over the increased dimensionality, given that the WMR is well modeled as a 3-DOF system.

C. Three-Dimensional Feature Tracking

In order to implement the 1-pt solution, feature correspondences between frames must be identified. The proposed feature tracking algorithm matches features between frames based on 3-D spatial proximity, rather than gradient-based feature descriptors such as SIFT [34] or SURF [29]. Many typical visual odometry methods use some form of iterative, outlier-rejection tool, like RANSAC [11], because gradient-based descriptors are prone to gross matching errors [5], [6], [35]. The typical RANSAC process requires

N = \frac{\log(1 - P_s)}{\log(1 - P_i^n)}    (16)

iterations to determine the set of inliers, where P_s is the designated probability of successfully finding the set of inliers, P_i is the probability that an individual data point is an inlier, and n is the number of data points required to instantiate the motion model [11]. It is important to note that the criterion for determining when the process has successfully found all of the inliers is generally based on an estimated number of inliers, not a physical quantity associated with the model.

In contrast, a key benefit of the proposed 3-D matching method is that the errors in the matching process are bounded by a user-defined, physical threshold; therefore, accurate visual odometry is achieved without the use of iterative outlier-rejection tools, which represents a potential computational savings of N RANSAC iterations.
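For a sense of scale, the sketch below evaluates (16); the success probability, inlier probability, and model sizes used here are illustrative values, not figures from the paper.

```python
import math

def ransac_iterations(p_success, p_inlier, n_model_points):
    """Number of RANSAC iterations N from (16)."""
    return math.log(1.0 - p_success) / math.log(1.0 - p_inlier**n_model_points)

# Illustrative values: 99% confidence, 70% inlier rate, and the 5
# correspondences of a 5-pt solution versus 1 for a 1-pt solution.
print(ransac_iterations(0.99, 0.7, 5))   # ~25 iterations
print(ransac_iterations(0.99, 0.7, 1))   # ~4 iterations
```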

There are three steps in the 3-D feature tracking process (a code sketch of these steps follows the list).

1) The 3-D location of a feature from the previous frame, x_{k-1}, is projected into the current frame, k. This projection is accomplished by estimating the motion of the robot between frames in order to estimate the expected location of the feature in the current frame, x̂_k, as shown in Fig. 3(b). The estimated location of a feature is derived from the 3-DOF kinematic model by first solving (9) for the current frame feature location, x_k,

x_k = \begin{bmatrix} \cos\theta_{12} & \sin\theta_{12} \\ -\sin\theta_{12} & \cos\theta_{12} \end{bmatrix} \begin{bmatrix} x_{k-1} - L\cos\theta_1 \\ z_{k-1} - L\sin\theta_1 \end{bmatrix}.    (17)

Then, the visual odometry coordinates L and θ12 are replaced with the estimates L̂ and θ̂, as shown in Fig. 3(b). Given the circular motion constraint (11) and the definition θ12 = θ1 + θ2, θ1 is estimated as θ̂1 = (θ̂ + π)/2. Therefore, (17) is written as

\hat{x}_k = \begin{bmatrix} \hat{x}_k \\ \hat{z}_k \end{bmatrix} = \begin{bmatrix} \cos\hat{\theta} & \sin\hat{\theta} \\ -\sin\hat{\theta} & \cos\hat{\theta} \end{bmatrix} \begin{bmatrix} x_{k-1} + \hat{L}\sin\frac{\hat{\theta}}{2} \\ z_{k-1} - \hat{L}\cos\frac{\hat{\theta}}{2} \end{bmatrix}.    (18)

In most mobile robotics applications, an estimate of the robot motion is available. This motion estimate may come from wheel odometry, as in [4], [6], and [9], or from an inertial measurement unit, as in [36]. In this paper, the estimate is taken from the trajectory planning system, where the desired trajectory serves as the estimate.

2) A spatial matching region, B_{x̂_{k,i}}, is defined around each estimated 3-D feature location in the current frame, as shown by the dotted line in Fig. 3(a). This spatial matching region is used to compensate for errors in the estimated robot motion between frames and the uncertainties in the 3-D locations of features. The spatial matching region also inherently defines a bound on the error from incorrect matches.

3) The actual features detected in the current frame, x_k, are compared to the estimated feature locations, x̂_k, to find matches. A match is declared if an actual feature in the current frame is within the spatial matching region of a previous frame feature. If multiple features are within the region, the current frame feature, x_{k,j}, closest to the projected feature, x̂_{k,i}, is declared as the match. In the case that no matches are found in successive frames, the motion estimate [L̂, θ̂] can be used in place of visual odometry. In addition to the set of matched features, there are also, typically, features that are not matched for a variety of reasons, including features moving out of the field of view of the camera, sensor noise, and inconsistency of the feature detector. These features are not used in the visual odometry calculation.
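The sketch referenced at the start of the list is given here. It illustrates steps 1–3 in the xz-plane using the projection (18) and a nearest-feature rule inside the matching region; it approximates the quadrilateral region of Fig. 3(c) with an axis-aligned box, and all names are illustrative.

```python
import numpy as np

def project_feature(x_prev, L_hat, theta_hat):
    """Project a previous-frame feature [x, z] into the current frame using
    the motion estimate (L_hat, theta_hat), following (18)."""
    x, z = x_prev
    p = np.array([x + L_hat * np.sin(theta_hat / 2.0),
                  z - L_hat * np.cos(theta_hat / 2.0)])
    c, s = np.cos(theta_hat), np.sin(theta_hat)
    return np.array([c * p[0] + s * p[1], -s * p[0] + c * p[1]])

def match_features(prev_feats, curr_feats, L_hat, theta_hat, B):
    """Declare matches by 3-D spatial proximity (steps 1-3, xz-plane only).

    prev_feats, curr_feats: arrays of [x, z] feature positions (cm).
    B: [B_x, B_z] half-extents of the matching region (cm), e.g., from (22).
    Returns (i, j) index pairs into prev_feats / curr_feats.
    """
    curr = np.asarray(curr_feats, dtype=float)
    matches = []
    for i, xp in enumerate(np.asarray(prev_feats, dtype=float)):
        proj = project_feature(xp, L_hat, theta_hat)
        inside = np.all(np.abs(curr - proj) <= B, axis=1)   # within the region
        if inside.any():
            dists = np.linalg.norm(curr - proj, axis=1)
            dists[~inside] = np.inf
            matches.append((i, int(np.argmin(dists))))      # closest candidate
    return matches
```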

1) Design of Spatial Matching Region: The spatial matching region, B_x, is designed to compensate for errors in the robot motion estimate, [θ̂, L̂], and 3-D feature location errors, [d_x, d_y, d_z]. The robot motion estimate error is given by

\begin{bmatrix} \delta \\ \alpha \end{bmatrix} = \begin{bmatrix} L \\ \theta_{12} \end{bmatrix} - \begin{bmatrix} \hat{L} \\ \hat{\theta} \end{bmatrix}.    (19)

The robot motion estimate error induces an error in the projected feature locations (18), as shown in Fig. 3(a). The feature motion estimate error, e_f = [e_x, e_z]^T, is related to the robot motion estimate error, [δ, α]^T, by

e_x = \cos\alpha \left( \hat{L}\sin\frac{\hat{\theta}}{2} + \hat{x}_k + (\hat{L}+\delta)\sin\left(\alpha - \frac{\hat{\theta}}{2}\right) \right) + \sin\alpha \left( \hat{L}\cos\frac{\hat{\theta}}{2} + \hat{z}_k - (\hat{L}+\delta)\cos\left(\alpha - \frac{\hat{\theta}}{2}\right) \right) - \hat{x}_k    (20)

and

e_z = -\sin\alpha \left( \hat{L}\sin\frac{\hat{\theta}}{2} + \hat{x}_k + (\hat{L}+\delta)\sin\left(\alpha - \frac{\hat{\theta}}{2}\right) \right) + \cos\alpha \left( \hat{L}\cos\frac{\hat{\theta}}{2} + \hat{z}_k - (\hat{L}+\delta)\cos\left(\alpha - \frac{\hat{\theta}}{2}\right) \right) - \hat{z}_k.    (21)

Fig. 3(c) illustrates the extent of possible feature motion estimate errors, e_f, for an example projected feature location. In order to compensate for errors in both the motion estimate and in the feature locations, the extent of the spatial matching region is defined as

B_x = K_B \begin{bmatrix} d_x + e_{x,\max} \\ d_z + e_{z,\max} \end{bmatrix}    (22)

where K_B is a constant used to parameterize the size of the matching region, and e_{x,max} and e_{z,max} are the maximum values of the feature motion estimate errors in the x- and z-directions, respectively. For this study, the spatial matching region is a 3-D region approximated by a quadrilateral in the xz-plane, as depicted in Fig. 3(c), with a vertical dimension y = K_B d_y, which allows for vertical measurement error. Fig. 3(c) also shows the parameters ε_x and ε_z that define the maximum extent of the matching region in the x- and z-directions, respectively.
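A sketch of this design step is given below: it evaluates the reconstructed forms of (20) and (21) over a grid of motion estimate errors to obtain e_{x,max} and e_{z,max}, then forms (22). The sample feature location and the location error bounds are illustrative; the motion parameters follow the values quoted for Fig. 9.

```python
import numpy as np

def motion_error_extent(L_hat, theta_hat, x_hat, z_hat, delta_max, alpha_max, n=21):
    """Maximum feature motion estimate errors |e_x|, |e_z| from (20)-(21),
    evaluated over a grid of motion estimate errors |delta| <= delta_max (cm)
    and |alpha| <= alpha_max (rad)."""
    delta, alpha = np.meshgrid(np.linspace(-delta_max, delta_max, n),
                               np.linspace(-alpha_max, alpha_max, n))
    half = theta_hat / 2.0
    u = L_hat * np.sin(half) + x_hat + (L_hat + delta) * np.sin(alpha - half)
    w = L_hat * np.cos(half) + z_hat - (L_hat + delta) * np.cos(alpha - half)
    ex = np.cos(alpha) * u + np.sin(alpha) * w - x_hat        # (20)
    ez = -np.sin(alpha) * u + np.cos(alpha) * w - z_hat       # (21)
    return np.abs(ex).max(), np.abs(ez).max()

def matching_region(d_x, d_z, ex_max, ez_max, K_B=1.4):
    """Extent of the spatial matching region B_x from (22)."""
    return K_B * np.array([d_x + ex_max, d_z + ez_max])

# Feature estimated at (x, z) = (20, 150) cm, with the motion parameters quoted
# for Fig. 9 (L = 1.2 cm, theta = 0.02 rad, alpha = 0.03 rad, delta = 0.8 cm,
# K_B = 1.4) and location error bounds of roughly d_x = 1.0 cm and d_z = 1.3 cm
# at that range (from (5) and (2)).
ex_max, ez_max = motion_error_extent(1.2, 0.02, 20.0, 150.0, 0.8, 0.03)
print(matching_region(1.0, 1.3, ex_max, ez_max))
```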

V. ERROR IN VISUAL ODOMETRY

Visual odometry is predicated on tracking visual features over successive video frames. Regardless of the matching scheme used, there are generally two possible outcomes from a given feature match declaration. First, it is possible that the declared match is, in fact, a correct feature correspondence between frames. In this case, there is an error in the visual odometry solution, regardless of the relative orientation solution method, caused by uncertainties in the actual location of the features. Second, it is possible that the declared match is an incorrect feature correspondence. In this case, there is also an error introduced into the visual odometry output.

A. Visual Odometry Error From Correct Matches

In order to determine how feature location errors derived in Section III-A propagate through the 3-DOF inverse kinematic solution, the first-order approximation for error propagation described by (4) is applied to the visual odometry solution in (12).


This approximation gives expected errors of

d_{\theta_1} = \frac{|x_{k-1} - x_k| \sqrt{d_{z_{k-1}}^2 + d_{z_k}^2 + \tan^2\theta_1 \left(d_{x_{k-1}}^2 + d_{x_k}^2\right)}}{(x_{k-1} - x_k)^2 + (z_{k-1} + z_k)^2}    (23)

and

d_L = |\csc\theta_1| \left[ d_{z_{k-1}}^2 + s_{12}^2 d_{x_k}^2 + c_{12}^2 d_{z_k}^2 + (s_{12} z_k - c_{12} x_k)^2 d_{\theta_{12}}^2 + c_1^2 L^2 d_{\theta_1}^2 \right]^{\frac{1}{2}}    (24)

where s_1 = \sin\theta_1, c_1 = \cos\theta_1, s_{12} = \sin\theta_{12}, and c_{12} = \cos\theta_{12}. From the circular motion constraint, the expected error in heading, d_{\theta_{12}}, is calculated by

d_{\theta_{12}} = 2 d_{\theta_1}.    (25)

Equations (23)–(25) approximate the error in the visual odometry calculated from a single feature correspondence between frames (k − 1) and k. Individual error values accumulate over time as

d_{L,\mathrm{final}} = \sqrt{\sum_{k=1}^{n} d_{L,k}^2}    (26)

and

d_{\theta_{12},\mathrm{final}} = \sqrt{\sum_{k=1}^{n} d_{\theta_{12},k}^2}    (27)

where n is the total number of video frames. In this manner, the errors in the final visual odometry solution can be accumulated for any time history of matched features.
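These per-match bounds and their accumulation can be collected as in the sketch below, which evaluates (23)–(25) for one correspondence and (26)–(27) over a trajectory; the function names are illustrative.

```python
import numpy as np

def single_match_error(x_prev, x_curr, d_prev, d_curr, theta_1, theta_12, L):
    """Error bounds d_theta1, d_theta12, d_L from (23)-(25) for one match.

    x_prev, x_curr: feature positions [x, z] in frames k-1 and k (cm).
    d_prev, d_curr: feature location error bounds [d_x, d_z] (cm), from (5), (2).
    theta_1, theta_12, L: the 1-pt solution for this match, from (12)-(13).
    """
    (xk1, zk1), (xk, zk) = x_prev, x_curr
    (dx1, dz1), (dxk, dzk) = d_prev, d_curr
    denom = (xk1 - xk) ** 2 + (zk1 + zk) ** 2
    d_t1 = (abs(xk1 - xk)
            * np.sqrt(dz1**2 + dzk**2
                      + np.tan(theta_1)**2 * (dx1**2 + dxk**2)) / denom)   # (23)
    d_t12 = 2.0 * d_t1                                                     # (25)
    s12, c12 = np.sin(theta_12), np.cos(theta_12)
    d_L = abs(1.0 / np.sin(theta_1)) * np.sqrt(
        dz1**2 + (s12 * dxk)**2 + (c12 * dzk)**2
        + (s12 * zk - c12 * xk)**2 * d_t12**2
        + (np.cos(theta_1) * L * d_t1)**2)                                 # (24)
    return d_t1, d_t12, d_L

def accumulate_error(frame_dL, frame_dtheta12):
    """Accumulate per-frame (averaged) error bounds over a trajectory, (26)-(27)."""
    return (np.sqrt(np.sum(np.square(frame_dL))),
            np.sqrt(np.sum(np.square(frame_dtheta12))))
```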

B. Visual Odometry Error From Incorrect Matches

The 3-D feature tracking algorithm matches features by searching a region around the estimated feature location, as described in Section IV-C. Given feature location errors, the potential inconsistency of features between frames, and errors in the robot motion estimate, it is possible that an incorrect feature match will be declared. However, a benefit of the 3-D matching approach is that errors from incorrect matches are bounded, and therefore, accurate visual odometry is possible without iterative outlier-rejection tools [1].

For a given pair of incorrectly matched features, x_{k-1,i} and x_{k,j}, the visual odometry error is defined by

e_L = |L_{ii} - L_{ij}|    (28)

and

e_\theta = |\theta_{12,ii} - \theta_{12,ij}|    (29)

where the subscripts ii and ij denote the solutions computed from the correct and incorrect matches, respectively (e.g., L_{ii} is the translation computed from the correct feature match), and L and θ12 are computed from (12).

An expression for e_L as a function of the size of the spatial matching region is derived by substituting the solutions from (12) into (28), giving

e_L = \left| \left[\csc\theta_1\left(z_{k-1} - \sin\theta_{12}\, x_{k,i} - \cos\theta_{12}\, z_{k,i}\right)\right] - \left[\csc\theta_1\left(z_{k-1} - \sin\theta_{12}\, x_{k,j} - \cos\theta_{12}\, z_{k,j}\right)\right] \right|.    (30)

Given the size of the spatial matching region, ε_x and ε_z in Fig. 3(c), the maximum distance that the incorrect feature, x_{k,j}, can be from the correct feature, x_{k,i}, can be expressed as

|x_{k,j} - x_{k,i}| \le 2 \begin{bmatrix} \varepsilon_x \\ \varepsilon_z \end{bmatrix}.    (31)

Substituting (31) into (30) and reducing yields

e_L = \left|\csc\theta_1\left(\sin\theta_{12}(x_{k,j} - x_{k,i}) + \cos\theta_{12}(z_{k,j} - z_{k,i})\right)\right| \le |\csc\theta_1| \left(|\sin\theta_{12}|\,|x_{k,j} - x_{k,i}| + |\cos\theta_{12}|\,|z_{k,j} - z_{k,i}|\right) \le 2|\csc\theta_1| \left(|\sin\theta_{12}|\varepsilon_x + |\cos\theta_{12}|\varepsilon_z\right).    (32)

For typical indoor WMR motion and video frame rates, \theta_{12} \approx O(10^{-2}). Additionally, the circular motion constraint, (11), can be used to express θ1 in terms of θ12 as

\theta_1 = \theta_{12} - \theta_2 = \frac{\theta_{12} + \pi}{2}    (33)

and therefore, \sin\theta_1 = \cos(\theta_{12}/2) \approx 1, \csc\theta_1 \approx 1, and \cos\theta_1 = -\sin(\theta_{12}/2) \approx -\theta_{12}/2. With this small-angle approximation, the expression for the maximum error in L from an incorrect match can be reduced to

e_{L,\max} \approx 2\left(|\theta_{12}|\varepsilon_x + \varepsilon_z\right).    (34)

In the case of a small change in heading between frames, |\theta_{12}|\varepsilon_x \ll \varepsilon_z, and the error in the distance traveled (L) calculated from an incorrect match is bounded by 2\varepsilon_z.

The derivation of e_\theta follows similarly. Let

u = \frac{u_1}{u_2} = \frac{z_{k-1,i} + z_{k,i}}{x_{k-1,i} - x_{k,i}} \quad \text{and} \quad v = \frac{v_1}{v_2} = \frac{z_{k-1,i} + z_{k,j}}{x_{k-1,i} - x_{k,j}};

then the error in heading from an incorrect feature match is written as

e_\theta = 2\left|\arctan(u) - \arctan(v)\right| = 2\arctan\left|\frac{u - v}{1 + uv}\right| \le 2\left|\frac{u - v}{1 + uv}\right| = 2\left|\frac{u_1 v_2 - u_2 v_1}{u_1 v_1 + u_2 v_2}\right|.    (35)

In practice, z ≥ 50 cm and |x_{k-1} - x_k| \sim O(10^0) cm [as predicted by (9) with typical values of θ12 = 0.01 rad and L = 1 cm]. Therefore, u_1 v_1 = (z_{k-1,i} + z_{k,i})(z_{k-1,i} + z_{k,j}) \gg u_2 v_2 = (x_{k-1,i} - x_{k,i})(x_{k-1,i} - x_{k,j}), and from (35)

2\left|\frac{u_1 v_2 - u_2 v_1}{u_1 v_1 + u_2 v_2}\right| \approx 2\left|\frac{u_1 v_2 - u_2 v_1}{u_1 v_1}\right|.    (36)

Expanding the numerator and denominator, respectively, yields

z_{k-1,i}(x_{k,i} - x_{k,j}) + x_{k-1,i}(z_{k,i} - z_{k,j}) - z_{k,i} x_{k,j} + x_{k,i} z_{k,j}    (37)

and

z_{k-1,i}^2 + z_{k-1,i}(z_{k,i} + z_{k,j}) + z_{k,i} z_{k,j}.    (38)

Applying the bound in (31) and recognizing that 2\varepsilon_z(|z_{k-1,i}| + |z_{k,i}|) \ll (|z_{k-1,i}| + |z_{k,i}|)^2 (which is reasonable since \varepsilon_z \sim O(10^0) cm), the error in heading is

e_{\theta,\max} \approx \frac{4\varepsilon_x(|z_{k-1,i}| + |z_{k,i}|) + \varepsilon_z(|x_{k-1,i}| + |x_{k,i}|)}{(|z_{k-1,i}| + |z_{k,i}|)^2 - 2\varepsilon_z(|z_{k-1,i}| + |z_{k,i}|)}.    (39)

This bound can be simplified using (9), which projects x_{k-1,i} into the x_{k,i} coordinate system, and the small-angle approximation. The numerator and denominator of (39) are then

4\varepsilon_x\left(|\theta_{12} x_{k,i}| + 2|z_{k,i}| + L\right) + \varepsilon_z\left(2|x_{k,i}| + |\theta_{12} z_{k,i}| + \frac{\theta_{12}}{2} L\right)    (40)

and

A^2 - 2\varepsilon_z A    (41)

where A = |\theta_{12} x_{k,i}| + 2|z_{k,i}| + L, respectively. Finally, the bound can be reduced through an order-of-magnitude approximation. Specifically, substituting \theta_{12} \sim O(10^{-2}) rad, L \sim O(10^0) cm, z ≥ 50 cm, and x \sim O(10^0 - 10^2) cm yields

e_{\theta,\max} \approx \frac{2\varepsilon_x + \varepsilon_z\left|\frac{x_{k,i}}{z_{k,i}}\right|}{|z_{k,i}| - \varepsilon_z} \approx \frac{2\varepsilon_x}{|z_{k,i}|}.    (42)

Therefore, the maximum error in heading from an incorrect match is approximately proportional to the angular separation between the correct and incorrect feature, \varepsilon_x / z_k.

C. Experimental Verification of Visual Odometry Error Propagation

In order to verify the accuracy of the error predictions, a series of experimental trials was conducted. In these trials, the WMR moved in an oval trajectory within the motion capture system's field of view. Although the motion capture system imposes constraints on the trajectory of the WMR, it provides the precision measurements of the WMR position and orientation required for this experimental verification. An example experimental trajectory is shown in Fig. 4, as seen from the overhead camera. This figure also shows representative scenes from the WMR's perspective. The total trajectory length was approximately 10 m.

At each frame of the trajectory, the WMR makes three calculations for each individual feature match:

1) the visual odometry solution, [θ12, L], from (12);

2) the predicted bound on error in distance traveled, dL, from (24);

3) the predicted bound on error in heading, dθ, from (23).

The calculations from each feature match are averaged over all matches in the frame. In postprocessing, the averaged error predictions from each frame, dL and dθ, are propagated over the entire trajectory by (26) and (27), yielding the total error bounds on the visual odometry solution. In this manner, predicted error bounds for the entire trial can be calculated, based on the specific features that were tracked in each trial. These predicted error bounds are compared to the actual visual odometry performance measured by the overhead camera system for each individual trial.

Fig. 4. Scene from the overhead motion capture camera, including the trajectory followed by the robot. The insets depict the views of the environment from the indicated locations.

Fig. 5. Unweighted visual odometry output from Trial 9 in (a) cumulative distance traveled, ΣL, and (b) cumulative change in heading, Σθ12. The dashed lines are linear regression fits forced to intersect the origin.

In Fig. 5, the measured errors in distance traveled, edt, and heading, eh, are shown for an individual trial, Trial 9. In order to smooth the higher frequency variations over the trajectory and generate representative metrics for each trial, linear regression fits, shown by the dashed lines in Fig. 5, are used to quantify the percent error in each trial.

Fig. 6 shows the percent errors for each experimental trial (calculated by the linear regression fits). The figure also shows the average of all ten trials, and the error bounds predicted by (24) and (25).

Fig. 6. Visual odometry results compared to predicted error bounds. (a) Error in the distance traveled, edt, and (b) error in heading, eh, both measured as slopes of linear regression fits.

D. Discussion of Experimental Results

Fig. 6(a) shows that the predicted error, dL, is a reasonable upper bound on the visual odometry performance, since all of the individual trials are below the bound. On average, there is approximately a 50% margin between the error bound and the error in the measured distance traveled. Fig. 6(b) indicates that the predicted error in heading, dθ, does not bound one of the ten trials, although the average of all the trials is below the bound. As expected from the coupling of the variables L and θ12, there is a correlation between the error in distance traveled and heading.

The predicted bounds on the performance, dL and dθ, are derived assuming that all features are correctly matched between frames. Given the conservative nature of the error bounds, it is hypothesized that the individual trials that are near or above the error bounds are contaminated with incorrect matches. To show that incorrect matches could account for the deviations in measured heading, consider Trial 3 in Fig. 6. In this trial, the total error in heading relative to the average is Σeh = 0.23 rad. Assuming an equal error in each of the 800 frames in the trajectory, accumulated in quadrature as in (27), this total error corresponds to a per-frame error in heading of eh = 0.008 rad. The average number of matches per frame was nΣ = 15. Finally, assume that eθ = 0.1 is an average value of the eθ contour calculated by (35) (note that this assumption will be validated in Section VI). Then, the number of incorrect matches, ni, required to produce the measured error with an unweighted average is given by

n_i = \frac{e_h}{e_\theta} n_\Sigma = (0.08) n_\Sigma = 1.2.    (43)

Therefore, it is reasonable that incorrect matches are the cause of the errors in Fig. 6.

VI. ERROR-WEIGHTED SOLUTION

A logical next step is to use the error information to improve the visual odometry performance. The visual odometry between frames, and the associated errors, are calculated individually for each pair of matched features. Rather than averaging all individual solutions as in Section V, a weighted average can be implemented to give higher value to individual solutions with lower predicted errors.

Fig. 7. Error-weighted visual odometry output from Trial 9 in (a) cumulative distance traveled, ΣL, and (b) cumulative change in heading, Σθ12. The dotted lines are the linear regression fits forced to intersect the origin, and the dashed lines are the results from the unweighted average.

For this paper, the individual solution weights are designed to be inversely proportional to the errors in the L and θ12 coordinates. Specifically, the weights for each individual solution, i, are calculated by

w_{L_i} = \frac{d_{L_i}^{-1} - d_{L_{\max}}^{-1}}{d_{L_{\min}}^{-1} - d_{L_{\max}}^{-1}}    (44)

and

w_{\theta_i} = \frac{d_{\theta_i}^{-1} - d_{\theta_{\max}}^{-1}}{d_{\theta_{\min}}^{-1} - d_{\theta_{\max}}^{-1}}    (45)

where d_{L_i} and d_{\theta_i} are calculated by (24) and (25), and d_{L_{\min}}, d_{L_{\max}}, d_{\theta_{\min}}, and d_{\theta_{\max}} are the minimum and maximum errors in the L and θ12 coordinates, respectively, for a given frame, k.

A. Experimental Evaluation of the Error-Weighted Averaging Solution

By using the weights presented in (44) and (45) and the experimental data from Section V-C, the visual odometry was calculated using a weighted average. Experimental results for an individual trial, Trial 9, are shown in Fig. 7. The aggregate results of all ten trials are shown in Fig. 8.

B. Discussion of Weighted-Averaging Results

The slopes of the linear regression fits shown in Fig. 7 illustrate the improvement of the error-weighted averaging method, relative to the unweighted solution. These figures indicate that there is a significant improvement in L, but that the results in θ12 are essentially unchanged. As shown in Fig. 8, this trend holds for the entire dataset. Fig. 8(a) shows that, for all ten trials, the error-weighted average provides an approximate 50% improvement over the unweighted average in the L coordinate. Fig. 8(b) shows that the error-weighted average does not significantly improve the performance in measuring heading. In fact, not only is the average of the trials nearly identical, each individual trial is also nearly identical to the unweighted solution. It is important to note that, with both weighting schemes, the performance is generally good (<1.5%, which is comparable to other visual odometry systems [5], [13]).

Fig. 8. Visual odometry with weighted-averaging of individual results compared to predictions. (a) Error in the distance traveled, edt, normalized by the total trajectory length. (b) Error in heading, eh, normalized by the total change in heading.

Fig. 9. (a) Error in L from feature location error, dL, (b) error in L from incorrect matches, eL, (c) error in θ12 from feature location error, dθ, and (d) error in θ12 from incorrect matches, eθ, over the RGB-D camera's field of view, calculated by (25), (24), (32), and (35). For these figures, L = 1.2 cm, θ12 = 0.02 rad, α = 0.03 rad, δ = 0.8 cm, and KB = 1.4.

In order to better understand this lack of improvement, the predicted visual odometry error bounds from correct and incorrect feature matches over the range of feature locations are shown in Fig. 9. Fig. 9(a) and (b) shows the errors in L from correct matches, dL, and incorrect matches, eL, from (24) and (32), respectively. The steep gradient in the z-direction of Fig. 9(a) indicates that the error-weighting method will highly value features at close range. Since the gradient in Fig. 9(b), the error from incorrect matches, follows the same general trend, the error-weighting method will also inherently reduce the error induced by incorrect matches. This result is evidenced in Fig. 8(a) by the decrease in error in distance traveled, relative to the unweighted average, and by the relatively small deviations of the individual trials from the average.

In contrast to the results in distance traveled, Fig. 9(c) and (d) shows that there are no significant gradients in the error in heading from correct matches, dθ (25), or incorrect matches, eθ (35). Therefore, weighting the individual solutions based on dθ has no significant effect on the overall performance in measuring heading, compared to an unweighted average. Furthermore, the error in heading from incorrect matches is also relatively constant over the entire sensor range. Therefore, it is to be expected that the performance in measuring heading is largely insensitive to the weighting of individual solutions, since all individual solutions will have approximately the same errors.

VII. CONCLUSION

Analytical expressions for the 3-D location error in an RGB-D sensor were derived and experimentally verified. Through the use of the 3-D information from the RGB-D sensor, a novel, spatial feature matching algorithm was implemented. A detailed error analysis of this algorithm was conducted, quantifying error bounds in the feature tracking process. This feature tracking algorithm was implemented on a WMR, along with a 1-pt inverse kinematic visual odometry solution. The sensor and feature tracking error bounds were analytically propagated through the visual odometry solution to provide expected bounds on the visual odometry performance. These bounds were experimentally verified. Finally, the predicted error information was used to improve the visual odometry performance by applying the highest weighting to the most accurate information. Using this approach, visual odometry errors of ∼1.5% were achieved, without the use of iterative outlier-rejection tools.

REFERENCES

[1] J. Fabian and G. Clayton, "One-point visual odometry using a RGB-depth camera pair," in Proc. ASME Dyn. Syst. Control Conf., 2012, pp. 491–498.

[2] J. Fabian and G. Clayton, "Spatial feature matching for visual odometry: A parametric study," presented at the 2013 ASME Dyn. Syst. Control Conf.

[3] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, "RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments," Int. J. Robot. Res., vol. 31, no. 5, pp. 647–663, 2012.

[4] S. Se, D. Lowe, and J. Little, "Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks," Int. J. Robot. Res., vol. 21, no. 8, pp. 735–758, 2002.

[5] D. Nister, O. Naroditsky, and J. Bergen, "Visual odometry," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recog., 2004, vol. 1, pp. I-652–I-659.

[6] D. Scaramuzza, "Performance evaluation of 1-point-RANSAC visual odometry," J. Field Robot., vol. 95, no. 1, pp. 792–811, Jul. 2011.

[7] B. Kitt, A. Geiger, and H. Lategahn, "Visual odometry based on stereo image sequences with RANSAC-based outlier rejection scheme," in Proc. IEEE Intell. Veh. Symp. (IV), 2010, pp. 486–492.

[8] M. Maimone, Y. Cheng, and L. Matthies, "Two years of visual odometry on the Mars exploration rovers," J. Field Robot., vol. 24, no. 3, pp. 169–186, 2007.

[9] D. Helmick, Y. Cheng, D. Clouse, L. Matthies, and S. Roumeliotis, "Path following using visual odometry for a Mars rover in high-slip environments," in Proc. IEEE Aerosp. Conf., vol. 2, 2004, pp. 772–789.


[10] E. Guizzo and T. Deyle, "Robotics trends for 2012," IEEE Robot. Autom. Mag., vol. 19, no. 1, pp. 119–123, Mar. 2012.

[11] M. Fischler and R. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Commun. ACM, vol. 24, no. 6, pp. 381–395, Jun. 1981.

[12] D. Nister, "Preemptive RANSAC for live structure and motion estimation," Mach. Vis. Appl., vol. 16, no. 5, pp. 321–329, 2005.

[13] J. Civera, O. Grasa, A. Davison, and J. Montiel, "1-point RANSAC for EKF-based structure from motion," Intell. Robots Syst., vol. 27, no. 5, pp. 609–631, Apr. 2010.

[14] F. Fraundorfer and D. Scaramuzza, "Visual odometry: Part II: Matching, robustness, optimization, and applications," IEEE Robot. Autom. Mag., vol. 19, no. 2, pp. 78–90, Jun. 2012.

[15] M. Chli and A. Davison, "Active matching for visual tracking," Robot. Auton. Syst., vol. 57, no. 12, pp. 1173–1187, 2009.

[16] H. Stewenius, C. Engels, and D. Nister, "Recent developments on direct relative orientation," J. Photogrammetry Remote Sens., vol. 60, no. 4, pp. 284–294, 2006.

[17] J. Fabian, T. Young, J. P. Jones, and G. Clayton, "Integrating the Microsoft Kinect with Simulink: Real-time object tracking example," IEEE/ASME Trans. Mechatronics, to be published.

[18] B. Betts, "The teardown," Eng. Technol., vol. 6, no. 6, pp. 94–95, 2011.

[19] P. Benavidez and M. Jamshidi, "Mobile robot navigation and target tracking system," in Proc. 6th Int. Conf. Syst. Syst. Eng., Jun. 2011, pp. 299–304.

[20] E. Georgiou, J. Dai, and M. Luck, "The KCLBOT: Exploiting RGB-D sensor for navigation environment building and mobile robot localization," Int. J. Adv. Robot. Syst., vol. 8, no. 4, pp. 194–202, 2011.

[21] J. Stowers, M. Hayes, and A. Bainbridge-Smith, "Altitude control of a quadrotor helicopter using depth map from Microsoft Kinect sensor," in Proc. IEEE Int. Conf. Mechatronics, 2011, pp. 358–362.

[22] J. Tran, A. Ufkes, M. Fiala, and A. Ferworn, "Low-cost 3D scene reconstruction for response robots in real-time," in Proc. IEEE Int. Symp. Safety, Security, Rescue Robot., 2011, pp. 161–166.

[23] H. Andreasson and A. Lilienthal, "6D scan registration using depth-interpolated local image features," Robot. Auton. Syst., vol. 58, no. 2, pp. 157–165, 2010.

[24] M. Fiala and A. Ufkes, "Visual odometry using 3-dimensional video input," in Proc. Can. Conf. Comput. Robot Vis., 2011, pp. 86–93.

[25] R. Haralick, "Propagating covariance in computer vision," in Proc. 12th IAPR Int. Conf. Pattern Recog. Comput. Vis. Image Process., 1994, vol. 1, pp. 493–498.

[26] E. Grossmann and J. Santos-Victor, "Uncertainty analysis of 3D reconstruction from uncalibrated views," Image Vis. Comput., vol. 18, no. 9, pp. 685–696, 2000.

[27] A. Chowdhury and R. Chellappa, "Statistical error propagation in 3D modeling from monocular video," in Proc. Comput. Vis. Pattern Recog. Workshop, 2003, vol. 8, pp. 89–89.

[28] D. Xu, L. Han, M. Tan, and Y. F. Li, "Ceiling-based visual positioning for an indoor mobile robot with monocular vision," IEEE Trans. Ind. Electron., vol. 56, no. 5, pp. 1617–1628, May 2009.

[29] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, "Speeded-up robust features (SURF)," Comput. Vis. Image Understand., vol. 110, no. 3, pp. 346–359, Sep. 2008.

[30] J. Heikkila and O. Silven, "A four-step camera calibration procedure with implicit image correction," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recog., 1997, pp. 1106–1112.

[31] P. Bevington and D. Robinson, Data Reduction and Error Analysis for the Physical Sciences. New York, NY, USA: McGraw-Hill, 2003.

[32] D. Herrera, J. Kannala, and J. Heikkila, "Accurate and practical calibration of a depth and color camera pair," in Computer Analysis of Images and Patterns. New York, NY, USA: Springer-Verlag, 2011, pp. 437–445.

[33] M. Spong and M. Vidyasagar, Robot Dynamics and Control. New York, NY, USA: Wiley, 1989.

[34] D. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Jan. 2004.

[35] K. Mikolajczyk and C. Schmid, "An affine invariant interest point detector," in Proc. 7th Eur. Conf. Comput. Vision, 2002, pp. 128–142.

[36] K. Konolige, M. Agrawal, and J. Sola, "Large-scale visual odometry for rough terrain," in Robotics Research. New York, NY, USA: Springer-Verlag, 2011, pp. 201–212.

Joshua Fabian received the B.S. degrees in physics and mechanical engineering from Franklin and Marshall College, Lancaster, PA, USA, and the University of Pittsburgh, Pittsburgh, PA, USA, in 1998 and 2000, respectively, the M.S.E. degree in mechanical engineering from the University of Pennsylvania, Philadelphia, PA, USA, in 2003, and the Ph.D. degree in engineering from Villanova University, Villanova, PA, USA, in 2013.

He currently works for Lockheed Martin Corporation, Valley Forge, PA, USA, primarily on advanced research, development, and experimentation for system-of-systems concepts.

Garrett M. Clayton received the B.S. degree in mechanical engineering from Seattle University, Seattle, WA, USA, in 2001, and the M.S. and Ph.D. degrees in mechanical engineering from the University of Washington, Seattle, WA, USA, in 2003 and 2008, respectively.

He is currently an Assistant Professor of mechanical engineering at Villanova University, Villanova, PA, USA, and a member of the Center for Nonlinear Dynamics and Controls (CENDAC). His current research interests include image-based control in a broad range of applications, including scanning probe microscopy and unmanned vehicles.