
3-Point RANSAC for Fast Vision-based Rotation Estimation using GPU Technology

Danial Kamran1, Mohammad T. Manzuri1, Ali Marjovi2 and Mahdi Karimian1

Abstract— In many sensor fusion algorithms, the vision-based RANdom Sample Consensus (RANSAC) method is used for estimating motion parameters for autonomous robots. Usually such algorithms estimate both translation and rotation parameters together, which makes them inefficient when only the rotation is needed. This paper presents a novel 3-point RANSAC algorithm for estimating only the rotation between two camera frames, which can be utilized as a high-rate source of information for a camera-IMU sensor fusion system. The main advantage of our proposed approach is that it performs fewer computations and requires fewer iterations to reach the best result. Unlike many previous works that validate each hypothesis against all data points and count its inliers, we use a voting-based scheme for selecting the best rotation among all primary answers. This methodology is much faster than the traditional inlier-based approach and is better suited to a parallel implementation of the RANSAC iterations. We also investigate a parallel implementation of the proposed 3-point RANSAC using CUDA technology, which greatly reduces the processing time of the estimation algorithm. We have used real datasets to evaluate our algorithm and compared it with the well-known 8-point algorithm in terms of accuracy and speed. The results show that the proposed approach is up to 150 times faster than the 8-point algorithm with similar accuracy.

I. INTRODUCTION

The angular velocity of a robot is crucial information used as input to the Inertial Navigation System (INS) for accurately estimating the robot's location in the real world. The gyroscope is one of the main sensors used for robot navigation in the INS process and provides high-rate angular velocity information; however, its measurements are accurate only for a short period of time [1], after which they are corrupted by sensor errors such as biases and scale-factor errors that are not necessarily constant over time. Moreover, other inertial sensors such as the accelerometer or compass have their own drawbacks in specific situations.

Because of the drawbacks of gyroscopes and other inertial sensors for accurately estimating angular velocity, researchers have focused on using camera images as an additional source of information for improving the accuracy of robot motion estimation. In several works, visual and inertial sources are combined in a sensor fusion system in order to obtain an accurate estimate of the robot's rotation (e.g., [1]–[6]).

1Department of Computer Engineering, Sharif University of Technology,Tehran, Iran

2School of Architecture, Civil and Environmental Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

However, the visual part of the rotation estimation algorithm still needs improvement. In some articles, the rotation estimation algorithm is only applicable in indoor environments with a sufficient number of vanishing points (e.g., [4]–[6]), and is therefore not usable in outdoor or unknown environments. On the other hand, feature-based estimation methods [1]–[3] usually rely on geometric techniques that estimate translation and rotation jointly, at extra processing cost. Therefore, even feature-based methods are not efficient for fast and accurate rotation estimation, because they spend computation on estimating parameters beyond the robot's rotation. These methods ultimately seek only 3 rotation parameters (roll, pitch and yaw); however, most of them require more than 3 points (5, 6 or 8 points), which makes them inefficient for rotation-only estimation. This leads to extra execution time and power consumption in the overall system.

Since most feature-based estimation methods apply the RANSAC technique [7] to remove outliers and compute the final answer from all inlier correspondences, the minimum number of data points needed to form a primary hypothesis has a great impact on the number of iterations required to find the best answer. This relationship is shown in Fig. 1, which plots the number of required iterations for different minimum sample sizes and different proportions of outliers among the correspondences. According to this figure, requiring extra points per RANSAC hypothesis also increases the number of iterations needed to find the best answer among the different hypotheses, which results in longer execution time and wasted processing resources in time-critical applications. The sketch below illustrates this relationship with the standard iteration bound.
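To make the relationship in Fig. 1 concrete, the following sketch (our illustration; the helper name and the success probability p = 0.99 are assumptions, not values taken from [8]) evaluates the standard RANSAC iteration bound N = log(1 − p) / log(1 − (1 − ε)^k):

import math

def ransac_iterations(k, eps, p=0.99):
    """Iterations N needed so that, with probability p, at least one
    k-point sample is outlier-free when the outlier ratio is eps."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - (1.0 - eps) ** k))

# With 50% outliers, a 3-point sample needs far fewer iterations than an 8-point one:
print(ransac_iterations(3, 0.5))  # 35
print(ransac_iterations(8, 0.5))  # 1177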

The main contribution of this paper is a 3-point RANSAC algorithm for fast and efficient rotation estimation that can be used as a high-rate source of information for sensor fusion applications. This 3-point algorithm requires fewer RANSAC iterations than similar methods because it separates rotation from translation and concentrates solely on rotation estimation. Moreover, we present a novel parallel implementation of the proposed RANSAC algorithm, based on a voting approach for selecting the best rotation and parallel evaluation of different hypotheses on CUDA cores. This methodology improves the speed of the overall RANSAC algorithm. The efficiency of the proposed algorithm is evaluated on several real datasets from the Karlsruhe vision benchmark [9], [10], and it is compared with the well-known 8-point algorithm [11] in terms of both accuracy and speed.


Fig. 1. Number of required RANSAC iterations (N) for different numbers of minimum points (k) and different ratios of outliers in the correspondences (ε) [8].


The outline of the paper is as follows. In Section II, we review previous related works. In Section III, we introduce our 3-point RANSAC and explain its advantages for a fast parallel implementation using GPU technology. In Section IV, we describe the dataset and hardware platform used in our experiments and present our experimental results. Finally, Section V draws the conclusions of this article.

II. RELATED WORKS

The most well-known minimal solutions for rotation estimation between camera images are the 5-point methods by Nistér [12], Stewénius et al. [13] and Kukelova et al. [14]. The 8-point algorithm proposed by Longuet-Higgins [11] is also among the famous algorithms. The main problem with these approaches is that they couple the rotation and translation parameters by working with the fundamental or essential matrix [15].

There are other rotation estimation algorithms that try to decouple the effects of rotation and translation from each other in order to concentrate solely on the rotation estimation part. The 6-point algorithm proposed by Kneip et al. [15] estimates the rotation between two camera frames independently of the translation. The algorithm shows great accuracy on synthetic data; however, it has not been validated on real datasets. Moreover, this method produces up to 20 candidates for each rotation estimation and therefore needs extra assumptions in order to pick the best solution among them. An optimization algorithm for computing an accurate rotation between two frames has also been proposed in [16]. The main drawback of this work is that it needs an initial value as a primary estimate of the rotation. According to our experiments, this algorithm is very sensitive to the initial value and cannot produce accurate results without a good initial estimate.

All of the algorithms discussed above require more than 3 points for estimating the rotation between two frames. Hence, they lead to more RANSAC iterations, extra processing, and ultimately longer execution time. On the other hand, a few previous works have tried to reduce the minimum number of points required for motion estimation in order to decrease the number of RANSAC iterations. In [17], Scaramuzza et al. proposed a 1-point RANSAC for estimating the rotation and translation of ground vehicles by restricting the motion model and parameterizing the motion with only 1 feature correspondence. This method is only applicable to on-road vehicles because of the assumptions it makes about the vehicle's motion model. Other works have proposed 2-point [18] and 3-point [19] estimation algorithms that use information from other sensors such as an IMU or accelerometer in order to estimate some of the motion parameters. The main defect of these methods is that they depend on accurate information from an inertial sensor.

III. PROPOSED METHOD

The method we propose in this article for fast rotation estimation between two camera frames is derived from the rigid body transformation estimation algorithm developed by Arun et al. [20]. Arun's algorithm decouples the effect of translation on 3D point locations by subtracting the mean point of each corresponding set from all of its points. The main limitation of this technique is that it applies to 3D correspondences; however, we will show that, under some assumptions about the relationship between 2D points, the algorithm is applicable even to 2D correspondences without knowing the 3D locations of the matched points. In the remainder of this section, we first explain our assumptions about the relationship between 2D correspondences, which allow us to use Arun's decoupling algorithm to estimate only the rotation between them. We then introduce our 3-point RANSAC approach for fast rotation estimation with a minimum number of required iterations, as well as the methodology for the parallel implementation of the RANSAC iterations.

A. Decoupling Rotation from Translation

According to Arun et al. [20], if we define the mean point of a 3D point cloud as the average of all its 3D locations, the effect of translation can be removed from the 3D correspondences by subtracting the mean point of each set from all of its points. This is done using the equations below for two point clouds p_i and p'_i:

q_i = p_i - \frac{1}{N} \sum_{k=1}^{N} p_k \qquad (1)

q'_i = p'_i - \frac{1}{N} \sum_{k=1}^{N} p'_k \qquad (2)

where p_i and p'_i are the corresponding 3D points in each point cloud and N is the size of each point cloud. Using the above equations, we can assume that only a rotation transformation relates the two 3D point clouds:

q'_i = R \, q_i \qquad (3)

Now, in order to estimate the rotation matrix R between the two sets, Arun's algorithm defines H as the covariance matrix between the q_i and q'_i points (as follows), and then derives the rotation matrix from the Singular Value Decomposition (SVD) of H:

H = \sum_{i=1}^{N} q_i \, {q'_i}^T \qquad (4)

H = U \Lambda V^T \qquad (5)

R = V U^T \qquad (6)
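As a concrete illustration of Eqs. (1)–(6), the following minimal NumPy sketch recovers R from two corresponding 3D point sets; the function name is ours, and the determinant check is a common safeguard against reflections rather than something prescribed by this paper:

import numpy as np

def rotation_from_3d_points(p, p_prime):
    """Arun-style rotation estimation (Eqs. 1-6).
    p, p_prime: (N, 3) arrays of corresponding 3D points."""
    q = p - p.mean(axis=0)                     # Eq. (1): remove translation
    q_prime = p_prime - p_prime.mean(axis=0)   # Eq. (2)
    H = q.T @ q_prime                          # Eq. (4): sum_i q_i q'_i^T
    U, _, Vt = np.linalg.svd(H)                # Eq. (5): H = U Lambda V^T
    R = Vt.T @ U.T                             # Eq. (6): R = V U^T
    if np.linalg.det(R) < 0:                   # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    return R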

As expressed previously, the only limitation of Arun's method that prevents us from applying it to our 2D-2D correspondences is that it was developed for 3D points. However, if we compute the bearing vectors of the 2D correspondences and assume that all of these bearing vectors point to unknown 3D locations at a fixed depth from the camera center, then we can apply Arun's algorithm to decouple the effect of translation and estimate the rotation matrix. The main reason this assumption is permissible is that the rotation matrix in Arun's algorithm is computed from the SVD of the H matrix and is independent of the depth of the real 3D points.

To prove our hypothesis, we first compute the bearing vectors from the 2D correspondences [21]:

b_i = \frac{K^{-1} (u_i, v_i, 1)^T}{\lVert K^{-1} (u_i, v_i, 1)^T \rVert} \qquad (7)

where K is the calibration matrix of the camera and u_i and v_i are the x-y pixel coordinates of each 2D point. Now, in order to apply Arun's algorithm to the bearing vectors, we assume a fixed but unknown depth for all bearing vectors pointing to their real 3D points:

p_i = \lambda \, b_i \qquad (8)

The value of λ is unknown to us, since we do not know the real 3D locations; however, we will see that λ has no effect on the computation of the rotation matrix:

H = \sum_{i=1}^{N} (\lambda b_i)(\lambda b'_i)^T \qquad (9)

H = \lambda^2 \sum_{i=1}^{N} b_i \, {b'_i}^T \qquad (10)

The rotation matrix, following Arun's algorithm, is again derived from the SVD of H:

H = U (\lambda^2 \Lambda) V^T \qquad (11)

R = V U^T \qquad (12)
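Putting Eqs. (7)–(12) together, a minimal NumPy sketch of rotation estimation directly from 2D correspondences could look as follows. The function names are ours, K and the pixel coordinates are assumed inputs, and the mean subtraction corresponds to applying Eqs. (1)–(2) to p_i = λ b_i, which is why the unknown common depth cancels:

import numpy as np

def bearing_vectors(uv, K):
    """Eq. (7): unit bearing vectors from pixel coordinates.
    uv: (N, 2) pixel coordinates, K: 3x3 camera calibration matrix."""
    uv1 = np.hstack([uv, np.ones((uv.shape[0], 1))])    # homogeneous (u, v, 1)
    rays = (np.linalg.inv(K) @ uv1.T).T                  # K^-1 (u, v, 1)^T
    return rays / np.linalg.norm(rays, axis=1, keepdims=True)

def rotation_from_bearings(b, b_prime):
    """Eqs. (9)-(12): the common depth lambda only scales H, so R can be
    computed from the mean-subtracted bearing vectors alone."""
    q = b - b.mean(axis=0)
    q_prime = b_prime - b_prime.mean(axis=0)
    H = q.T @ q_prime                      # the lambda^2 factor does not change U or V
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # reflection guard, as before
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    return R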

The above equations prove that Arun's algorithm is applicable even to 2D-2D correspondences, but only when the points all lie in the same plane parallel to the image plane, i.e., when all matched 2D points have the same depth from the camera center. Usually, however, we cannot make this assumption for all matched points between two camera frames, because they belong to different objects in the images at different depths from the camera center. In the next part of this section, we introduce a methodology for finding 2D points with the same actual depth by applying a RANSAC scheme over all matched points.

B. 3-point RANSAC

1) Feature Detection and Matching: One of the important steps in all feature-based motion estimation algorithms is locating key points in two camera frames and matching them together. Several types of feature detectors and descriptors have been proposed for this purpose; however, the combination of the FAST detector [22] and the BRIEF descriptor [23] has been shown in previous work to be an efficient procedure for feature detection and matching [24]. Hence, we also use this combination to perform feature detection and matching and to produce the 2D-2D correspondences that serve as the input to our 3-point RANSAC estimator; an illustrative sketch of this step follows.
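As an illustration of this step (not the paper's exact configuration), the following sketch uses OpenCV's FAST detector and BRIEF descriptor (the latter lives in the opencv-contrib package) with brute-force Hamming matching; the threshold and cross-check settings are placeholder choices:

import cv2
import numpy as np

def match_features(img1, img2):
    """FAST keypoints + BRIEF descriptors + brute-force Hamming matching.
    Returns two (N, 2) arrays of matched pixel coordinates."""
    fast = cv2.FastFeatureDetector_create(threshold=25)
    brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()
    kp1 = fast.detect(img1, None)
    kp2 = fast.detect(img2, None)
    kp1, des1 = brief.compute(img1, kp1)
    kp2, des2 = brief.compute(img2, kp2)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    return pts1, pts2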

2) Decoupled Rotation Estimation for Matched Points with Different Depths: To handle the depth variation among matched bearing vectors, which prevents us from assuming a fixed depth for all of them, we propose a 3-point RANSAC scheme that estimates the rotation for different random triples of bearing vectors and finally chooses the best rotation by maximum consensus. Each random triple, which defines a plane in 3D, produces a true rotation if its plane is parallel to the image plane (i.e., the three points have similar depths). Therefore, in the proposed RANSAC we first generate many random triples in order to cover different combinations of matched points with various actual depths. Then, in the motion selection step, we estimate the rotation for each of these random triples. Each rotation matrix estimated from a random set is guaranteed to be a true rotation matrix only if all three bearing vectors satisfy our assumption of having similar depths from the camera center.

3) Selecting the Best Rotation using Histogram Voting: The main drawback of the traditional method of outlier rejection is that each hypothesis must be validated against all matched points. The execution time of this process grows with both the number of iterations and the number of matched points. To address this, we use histogram voting to find the rotation with the maximum number of votes among all estimates. A similar approach has been used in the 1-point RANSAC method [17], which requires far fewer iterations. We follow a similar methodology and build the histogram over the yaw rotations between images. In our method, however, many more iterations are needed in order to cover a large proportion of the triples among the matched points. A serial sketch of the resulting procedure is given below.
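The following serial sketch summarizes Sections III-B.2 and III-B.3 under our illustrative assumptions: rotation_from_bearings is the helper sketched in Section III-A, the bin width and iteration count are placeholder values, and the yaw is extracted with a Z-Y-X Euler convention that may need adapting to the camera's axis convention:

import numpy as np

def three_point_ransac_yaw(b1, b2, iterations=15000, bin_deg=0.1, rng=None):
    """Estimate a rotation for random triples of matched bearing vectors,
    vote their yaw angles into a histogram, and return the winning bin."""
    rng = np.random.default_rng() if rng is None else rng
    n = b1.shape[0]
    yaws = np.empty(iterations)
    for it in range(iterations):
        idx = rng.choice(n, size=3, replace=False)            # random triple
        R = rotation_from_bearings(b1[idx], b2[idx])          # true rotation only if the 3 points share a depth
        yaws[it] = np.degrees(np.arctan2(R[1, 0], R[0, 0]))   # yaw of R (Z-Y-X convention)
    n_bins = max(1, int(np.ceil((yaws.max() - yaws.min()) / bin_deg)))
    hist, edges = np.histogram(yaws, bins=n_bins)             # histogram voting
    best = int(np.argmax(hist))
    return 0.5 * (edges[best] + edges[best + 1])              # center of the most-voted bin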


The proposed 3-point RANSAC methodology for estimating the rotation between 2D correspondences has the following advantages over other rotation estimation algorithms:
• Fewer points for a primary estimation: The minimum number of points required for estimating the rotation is 3, which is much lower than in the 5, 6, 7, or 8-point algorithms [11]–[13], [15].
• Fewer RANSAC iterations: Since only 3 points are needed for a primary estimation, the RANSAC algorithm needs far fewer iterations to cover a majority of the 3-point combinations among all correspondences.
• Voting-based selection: The histogram voting approach greatly reduces the execution time, since there is no need to count the inliers of each hypothesis. We will also see that it enables an independent, parallel implementation of the RANSAC iterations.

C. GPU Implementation

As expressed previously, one of the main challenges of the RANSAC algorithm is the serial execution of its iterations, which leads to a long processing time for the rotation estimation process. In this paper, we introduce a novel parallel implementation of RANSAC using CUDA technology, which performs each RANSAC iteration in its own CUDA thread. This methodology was also used in [8] for the 8-point RANSAC algorithm, without any focus on rotation estimation. Moreover, because the best rotation among all iterations is selected with the voting scheme, the methodology proposed in this paper does not need to count inliers in each CUDA core, so all parallel threads execute completely independently of each other, as the sketch below illustrates.
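The paper's implementation is written in native CUDA; purely to illustrate the thread independence that the voting scheme enables, the sketch below uses Numba's CUDA JIT (our substitution) and shows only the voting step, assuming each thread has already computed the yaw of its own hypothesis:

import numpy as np
from numba import cuda

@cuda.jit
def yaw_vote_kernel(yaws, yaw_min, bin_deg, hist):
    """One thread per RANSAC hypothesis: cast a vote into a shared yaw
    histogram. No inlier counting, so threads never exchange data."""
    i = cuda.grid(1)
    if i < yaws.shape[0]:
        b = int((yaws[i] - yaw_min) / bin_deg)
        if 0 <= b < hist.shape[0]:
            cuda.atomic.add(hist, b, 1)   # atomic add keeps concurrent votes safe

# Illustrative launch for 15000 hypotheses with 256 threads per block:
# d_yaws = cuda.to_device(yaws); d_hist = cuda.to_device(np.zeros(n_bins, np.int32))
# yaw_vote_kernel[(15000 + 255) // 256, 256](d_yaws, yaw_min, bin_deg, d_hist)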

IV. EXPERIMENTAL RESULTS

In this section, we first introduce the datasets and hardware platform used in our experiments. Then we explain the procedure of detecting key points and matching them, which is needed to generate the 2D correspondences between two camera views. Finally, we report the results of applying our method to one of the datasets and compare its accuracy and speed with the 8-point algorithm.

A. Datasets and Hardware

We have used the Karlsruhe datasets [9], [10] in our experiments, a well-known benchmark for evaluating vision-based motion estimation algorithms (e.g., [25]). Fig. 2 shows a sample frame from the Karlsruhe dataset and gives a general impression of the placement of the camera, which is mounted on top of a ground vehicle. The angular rotation rate from a high-quality IMU is taken as the ground truth for this dataset and is used for the evaluation of our algorithm. Moreover, since the ground vehicle only has significant changes in yaw, we only examine the yaw rotation when comparing our method with the ground truth data and with the 8-point algorithm.

Fig. 2. A sample frame from the Karlsruhe benchmark which we have used in our experiments.


The system on which the proposed RANSAC has been evaluated is equipped with a dual-core Pentium CPU running at 2.8 GHz and 4 GB of RAM. The CUDA device used for the GPU implementation is an NVIDIA GeForce GT 730 with a maximum frequency of 902 MHz and 2 multiprocessors containing 384 CUDA cores. The GPU code was developed using CUDA version 7.5 [26].

B. Accuracy of 3-point RANSAC

To evaluate the accuracy of our 3-point RANSAC, we first compare it with the well-known 8-point algorithm, which estimates both translation and rotation between two camera frames using the fundamental matrix method [11]. For this purpose, we compared the yaw rotation of the proposed method with that of the 8-point algorithm on two datasets of the Karlsruhe benchmark. The result for one of these datasets is shown in Fig. 3, which depicts the yaw angle estimated by our 3-point RANSAC and by the 8-point RANSAC for different numbers of RANSAC iterations. According to these diagrams, the proposed 3-point RANSAC does not reach the accuracy of the 8-point method for a small number of iterations (N=300). The main reason is that we use the voting scheme for selecting the best rotation, which requires a relatively large number of iterations in order to make a comprehensive selection. However, for larger iteration counts (N=15000) the proposed algorithm has comparable accuracy, since more samples are drawn. Therefore, we conclude that the proposed 3-point RANSAC algorithm requires more iterations because of the voting scheme used for selecting the best rotation. Although more iterations can lead to longer execution times (which is considered a drawback in previous works), we will show that, by using the GPU implementation and thanks to the independent execution of iterations under the voting scheme, we achieve a great improvement in the performance of the 3-point RANSAC.

C. Parallel Implementation using CUDA

One of the main advantages of our method over other RANSAC-based estimation algorithms is the use of the voting scheme for selecting the best rotation among several hypotheses. This approach spares the parallel threads from computing the number of inliers for their estimated solutions and leads to a much faster implementation. This is shown in Fig. 4, which depicts a great improvement in the execution time of our 3-point RANSAC.


Fig. 3. Estimated yaw rotation for one of the Karlsruhe datasets using the 3-point RANSAC (our method) and the 8-point RANSAC for different numbers of RANSAC iterations (panels: N=300, N=2000, N=15000; axes: yaw degree (deg) vs. frame number; curves: ground truth, 8-point GPU, 3-point GPU).

Fig. 4. Mean computation time (ms) of the parallel 3-point (our method) and 8-point implementations using CUDA for different numbers of RANSAC iterations.

According to this figure, the execution time of the 8-point algorithm grows directly with the number of RANSAC iterations, whereas the GPU implementation of our 3-point approach shows no significant dependence on this parameter. This is because of the independent execution of the parallel threads in our method. We have also computed the speed-up of the GPU implementation over the serial implementation for the 8-point and our 3-point RANSAC algorithms, depicted in Fig. 5 and Fig. 6, respectively. Comparing these two diagrams (note the different scales of the vertical axes), we conclude that the GPU implementation improves the speed of the 3-point algorithm more, because it performs far fewer computations and also uses the voting scheme for selecting the best rotation.

D. The Impact of the Number of Features

One of the main parameters with a great impact on the speed of a motion estimation algorithm is the number of features detected in the camera images. Although using more feature points can increase the accuracy of the estimated values, it also leads to more processing and hence longer execution times for both the feature matching and RANSAC steps. In the RANSAC algorithm, inlier selection is the main part whose execution time depends strongly on the number of matched points.

Fig. 5. Speed-up in execution time of the GPU implementation compared with the serial implementation for the 8-point RANSAC algorithm, for different numbers of RANSAC iterations.

Fig. 6. Speed-up in execution time of the GPU implementation compared with the serial implementation for our 3-point RANSAC algorithm, for different numbers of RANSAC iterations.

However, thanks to the voting approach for finding the best rotation, the performance of our proposed 3-point RANSAC is not adversely affected by the number of detected features. This is shown in Fig. 7, which plots the execution time of the overall motion estimation for the serial 8-point, parallel 8-point and parallel 3-point algorithms for different numbers of matched points. Comparing these diagrams, we conclude that as the number of matched points increases, the execution time of both the serial and parallel 8-point algorithms increases, whereas the execution time of our 3-point algorithm shows no dependence on the number of points.


Fig. 7. The impact of different numbers of matched points (100, 500, 1000, 2000) on the execution time (ms) of the serial 8-point, parallel 8-point and parallel 3-point algorithms. Blue shows the execution time of the point matching part of the algorithm, while yellow shows the execution time of the RANSAC part. While the point matching and 8-point RANSAC (serial and parallel) execution times increase with the number of points, the execution time of the 3-point parallel RANSAC (our method) remains almost constant for various numbers of points.

On the other hand, the processing time of the feature matching step depends strongly on the number of matched points, which we cannot control. Therefore, we conclude that the overall speed of the estimation algorithm is limited by the feature extraction and matching process, since the parallel 3-point RANSAC has a relatively constant execution time for different numbers of key points.

V. CONCLUSIONS

In this paper, we showed that by decoupling the effects of translation and rotation from each other, a 3-point RANSAC algorithm can be used for rotation-only estimation between camera views. We also used a voting-based approach for choosing the best rotation among several hypotheses. Moreover, we explained the methodology of the parallel implementation of the proposed 3-point algorithm using GPU technology. One of the main advantages of the proposed 3-point method is that it allows the parallel threads on the GPU to execute completely independently of each other. We compared the accuracy and speed of our approach with the 8-point algorithm and showed that it makes the rotation estimation up to 150 times faster than the parallel implementation of the 8-point algorithm, while retaining similar accuracy.

REFERENCES

[1] C. Hide, T. Botterill, and M. Andreotti, “Low cost vision-aided imu forpedestrian navigation,” in Ubiquitous Positioning Indoor Navigationand Location Based Service (UPINLBS), 2010, Oct 2010, pp. 1–7.

[2] K.-H. Yang, W.-S. Yu, and X.-Q. Ji, “Rotation estimation for mobilerobot based on single-axis gyroscope and monocular camera,” Inter-national Journal of Automation and Computing, vol. 9, no. 3, pp.292–298, 2012.

[3] D. Hong, H. Lee, H. Cho, Y. Park, and J. H. Kim, “Visual gyroscope:Integration of visual information with gyroscope for attitude measure-ment of mobile platform,” in Control, Automation and Systems, 2008.ICCAS 2008. International Conference on, 2008, pp. 503–507.

[4] C. Chen, W. Chai, S. Wang, and H. Roth, “A single frame depthvisual gyroscope and its integration for robot navigation and mappingin structured indoor environments,” in Autonomous Robot Systemsand Competitions (ICARSC), 2014 IEEE International Conference on,2014, pp. 73–78.

[5] L. Ruotsalainen, “Visual gyroscope and odometer for pedestrian indoornavigation with a smartphone,” in Proceedings of the 25th Interna-tional Technical Meeting of The Satellite Division of the Institute ofNavigation (ION GNSS 2012), 2012, pp. 2422–2431.

[6] C. Beall, D.-N. Ta, K. Ok, and F. Dellaert, “Attitude heading referencesystem with rotation-aiding visual landmarks,” in Information Fusion(FUSION), 2012 15th International Conference on, 2012, pp. 976–982.

[7] M. A. Fischler and R. C. Bolles, “Random sample consensus: Aparadigm for model fitting with applications to image analysis andautomated cartography,” Commun. ACM, vol. 24, no. 6, pp. 381–395,1981.

[8] J. Roters and X. Jiang, “Festgpu: a framework for fast robustestimation on gpu,” Journal of Real-Time Image Processing, pp.1–14, 2014. [Online]. Available: http://dx.doi.org/10.1007/s11554-014-0439-5

[9] A. Geiger, J. Ziegler, and C. Stiller, “Stereoscan: Dense 3d reconstruc-tion in real-time,” in Intelligent Vehicles Symposium (IV), 2011, pp.963–968.

[10] A. Geiger, M. Roser, and R. Urtasun, “Efficient large-scale stereomatching,” in Asian Conference on Computer Vision (ACCV), 2010,pp. 25–38.

[11] H. C. Longuet-Higgins, "A computer algorithm for reconstructing a scene from two projections," in Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, M. A. Fischler and O. Firschein, Eds. San Francisco, CA, USA, 1987, pp. 61–62. [Online]. Available: http://dl.acm.org/citation.cfm?id=33517.33523

[12] D. Nistér, "An efficient solution to the five-point relative pose problem," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 26, no. 6, pp. 756–770, 2004.

[13] H. Stewénius, C. Engels, and D. Nistér, "Recent developments on direct relative orientation," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 60, no. 4, pp. 284–294, 2006.

[14] Z. Kukelova, M. Bujnak, and T. Pajdla, “Polynomial eigenvaluesolutions to the 5-pt and 6-pt relative pose problems.” in BMVC, vol. 2,2008, p. 2008.

[15] L. Kneip, R. Siegwart, and M. Pollefeys, “Finding the exact rotationbetween two images independently of the translation,” in ComputerVision ECCV 2012, 2012, vol. 7577, pp. 696–709.

[16] L. Kneip and S. Lynen, “Direct optimization of frame-to-frame rota-tion,” in Computer Vision (ICCV), 2013 IEEE International Confer-ence on, 2013, pp. 2352–2359.

[17] D. Scaramuzza, F. Fraundorfer, and R. Siegwart, “Real-time monocularvisual odometry for on-road vehicles with 1-point ransac,” in Roboticsand Automation, 2009. ICRA ’09. IEEE International Conference on,May 2009, pp. 4293–4299.

[18] C. Troiani, A. Martinelli, C. Laugier, and D. Scaramuzza, “2-point-based outlier rejection for camera-imu systems with applications tomicro aerial vehicles,” in 2014 IEEE International Conference onRobotics and Automation (ICRA), May 2014, pp. 5530–5536.

[19] F. Fraundorfer, P. Tanskanen, and M. Pollefeys, “A minimal casesolution to the calibrated relative pose problem for the case of twoknown orientation angles,” in Proceedings of the 11th EuropeanConference on Computer Vision: Part IV, ser. ECCV’10. Berlin,Heidelberg: Springer-Verlag, 2010, pp. 269–282. [Online]. Available:http://dl.acm.org/citation.cfm?id=1888089.1888110

[20] K. Arun, T. Huang, and S. Blostein, “Least-squares fitting of two3-d point sets,” Pattern Analysis and Machine Intelligence, IEEETransactions on, vol. PAMI-9, no. 5, pp. 698–700, 1987.

[21] L. Kneip, “Real-time scalable structure from motion: From funda-mental geometric vision to collaborative mapping,” Ph.D. dissertation,ETH, 2012.

[22] E. Rosten and T. Drummond, “Machine learning for high-speed cornerdetection,” in Computer Vision ECCV 2006, 2006, vol. 3951, pp. 430–443.

[23] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “Brief: Binary robustindependent elementary features,” in Computer Vision ECCV 2010,2010, vol. 6314, pp. 778–792.

[24] L. Kneip, M. Chli, and R. Siegwart, "Robust real-time visual odometry with a single camera and an IMU," in Proceedings of the British Machine Vision Conference, 2011, pp. 16.1–16.11.

[25] A. Geiger, J. Ziegler, and C. Stiller, “Stereoscan: Dense 3d reconstruc-tion in real-time,” in Intelligent Vehicles Symposium (IV), 2011 IEEE,2011, pp. 963–968.

[26] NVIDIA, CUDA Toolkit Documentation (version 7.5), 2015.