Pose Optimization in Edge Distance Field for Textureless 3D Object Tracking
Bin Wang, School of Computer Science and Technology, Shandong University
Fan Zhong, School of Computer Science and Technology, Shandong University
Xueying Qin, School of Computer Science and Technology, Shandong University
ABSTRACT
This paper presents a monocular model-based 3D tracking approach for textureless objects. Instead of explicitly searching for 3D-2D correspondences as previous methods do, which unavoidably generates individual outlier matches, we aim to minimize the holistic distance between the predicted object contour and the query image edges. We propose a method that can directly solve the 3D pose parameters in an unsegmented edge distance field. We derive the differentials of the edge matching distance with respect to the pose parameters, and search for the optimal 3D pose parameters using standard gradient-based non-linear optimization techniques. To avoid being trapped in local minima and to deal with potential large inter-frame motions, a particle filtering process with a first-order autoregressive state dynamics is exploited. Occlusions are handled by a robust estimator. The effectiveness of our approach is demonstrated in comparative experiments on real image sequences with occlusions, large motions and cluttered backgrounds.
CCS CONCEPTS
• Computing methodologies → Mixed / augmented reality; Tracking;
KEYWORDS
3D tracking, Pose optimization, Distance field, Particle filter
ACM Reference format:
Bin Wang, Fan Zhong, and Xueying Qin. 2017. Pose Optimization in Edge Distance Field for Textureless 3D Object Tracking. In Proceedings of CGI '17, Yokohama, Japan, June 27-30, 2017, 6 pages.
https://doi.org/10.1145/3095140.3095172
1 INTRODUCTION
3D object tracking is a fundamental computer vision task with a variety of applications in augmented reality and robotics.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CGI '17, June 27-30, 2017, Yokohama, Japan
© 2017 Association for Computing Machinery.
ACM ISBN 978-1-4503-5228-4/17/06 . . . $15.00
https://doi.org/10.1145/3095140.3095172
3D tracking systems are expected to estimate the six degrees of freedom (6DoF) pose parameters of an object relative to the camera in unknown and dynamic environments.
Thanks to robust keypoint extractors and descriptors [1, 14, 19], keypoint-based 3D tracking methods [13, 17, 23] have been proposed in the last decades. Although these methods achieve impressive performance for textured objects, they are not applicable to textureless objects due to the lack of reliable feature matches.
For textureless objects, edges or contours are the vital visual cue that can be detected in most situations. Therefore, edges or contours are exploited in edge-based 3D tracking methods. RAPID [7] is the first edge-based 3D tracker; it projects the sampled 3D model edge points to the 2D image and aligns the projected edge points with the image edge points. A 1D search for the image edge point is performed at each projected edge point along the direction perpendicular to the projected model edge, and the 2D pixel position with maximum gradient is considered as the correspondence of the sampled 3D model point. Several improvements [5, 15, 20, 22, 24, 25] have since been proposed for better 3D-2D correspondences. These methods are shown to be effective in some situations. However, in order to obtain 3D-2D correspondences, all of these methods perform a 1D local search perpendicular to the object contours within a limited extent. Since individual contour points are usually indistinctive, incorrect 3D-2D correspondences are unavoidable, especially in the more complicated cases of occlusions, large inter-frame motions and background clutter. On the other hand, the search extent is difficult to determine: a large extent may result in more incorrect correspondences, while a small extent leads to sensitivity to large inter-frame motions.
In this paper, we propose an edge-based 3D tracking approach without explicit 3D-2D correspondences. We formulate 3D object tracking as a contour matching problem by fitting the 3D object contour to the image edge distance field. The distance between the predicted object contour and the query image edges in the distance field is minimized by direct optimization of the 3D pose parameters. The differentials of this energy with respect to the pose parameters are derived, and the pose parameters are optimized iteratively with the Levenberg-Marquardt (L-M) algorithm. The image edges are extracted by an edge detector, with no need for object segmentation or edge filtering. Cluttered backgrounds can be handled because of the holistic matching energy function. For better tracking performance, a particle filtering process
with a first-order autoregressive state dynamics is exploited to deal with potential large inter-frame motions, and a robust estimator is adopted to handle occlusions. Comparative experiments demonstrate that the proposed method is effective on real image sequences with occlusions, large motions and cluttered backgrounds.
2 RELATED WORK
The literature on 3D object tracking is vast. Given a 3D model of the target, 3D-2D correspondences between 3D features of the model and 2D measurements in the image are exploited for 3D tracking. According to the type of 2D measurements, 3D-2D correspondence based methods are classified as fiducial-based [10], keypoint-based [13, 17, 23] and edge-based [5, 7, 11, 15, 25]. We refer the reader to [12, 16] for more details. Here we restrict ourselves to monocular edge-based methods with a 3D model of the target available.
As the vital visual cue of textureless objects, edges or contours are employed by edge-based trackers. To construct 3D-2D correspondences, [5, 15] adopted a precomputed convolution kernel function of the contour orientation to find the image edge point whose orientation is similar to the projected contour orientation, rather than the edge point with maximum gradient on the scanline. [25] proposed multiple edge hypotheses, treating all local extrema of the gradient along the scanline as potential correspondences. Multiple hypotheses prevent a wrong gradient maximum from being assigned as a correspondence, but increase the computational cost. [20, 24] exploited local region knowledge of the foreground and background, and the affinity of adjacent image edge points, to search for the optimal correspondences. These improvements are impressive; however, their robustness decreases when ambiguities between different edges occur in the scene. All of these methods assume that edge correspondences can be determined by local search within a limited extent based on a prior pose. If the prior pose is sufficiently incorrect, tracking probably fails, especially when the object moves fast.
To ensure a good prior pose, multiple pose hypotheses have been proposed that propagate the pose using particle filters. Since Isard et al. [9] applied particle filters to 2D edge tracking, various edge-based 3D tracking methods have been implemented in a particle filter framework. [11] tracked complex 3D objects by utilizing the GPU to calculate edges and evaluate pose likelihoods. [4] employed keypoint correspondences for particle initialization, and then refined the estimated pose by aligning the projected model edges and the image edges using explicit 3D-2D correspondences. Our approach is similar to [3] in that both use the edge distance field. [3] is a tracking-by-detection framework that uses the distance field for chamfer matching between offline 2D edge templates and the scene image, and the coarse pose of the matched template is used for initializing particles. It employs a standard edge-based tracker to establish the correspondences and predict the final pose, whereas our approach directly optimizes the pose in the distance field. These methods can achieve impressive results, especially for large inter-frame motions.
Figure 1: 3D contour matching. (a) A color image $I$ of the target (a CAT). (b) Canny edge map of $I$. (c) Projected 3D contour points are evolved under Equation 4 from a prior pose (in red) to the optimal pose (in green) in the edge distance field of $I$. (d) The optimal pose is visualized by the green wireframe overlaid on $I$.
3 POSE PARAMETERIZATION
3D tracking aims to estimate the 6DoF pose of an object relative to the camera, given the camera intrinsic matrix $K \in \mathbb{R}^{3\times 3}$, the image $I$, and the 3D model $M$. A 3D model point $X \in \mathbb{R}^3$ is projected to an image pixel $x \in \mathbb{R}^2$ using the standard pinhole camera model by

$\tilde{x} = K \cdot [R(r) \mid t] \cdot \tilde{X}$  (1)

where $t$ and $R(r)$ are respectively the translation vector and the rotation matrix parameterized by the Rodrigues rotation vector $r$. $\tilde{x}$ and $\tilde{X}$ are respectively the homogeneous representations of $x$ and $X$. The 6DoF pose is parameterized by $p = (r, t)$ in this paper.
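The projection of Equation 1 can be sketched in a few lines of numpy; this is a minimal illustration under our own naming (the paper does not provide code), with the Rodrigues formula written out explicitly:

```python
import numpy as np

def rodrigues(r):
    """Rotation matrix R(r) from a Rodrigues vector r (axis * angle)."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0.0, -k[2], k[1]],       # cross-product matrix [k]x
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def project(X, r, t, K_mat):
    """Project a 3D point X to pixel coordinates under pose p = (r, t),
    following Equation 1: x~ = K [R(r) | t] X~."""
    Xc = rodrigues(r) @ X + t      # transform into the camera frame
    xh = K_mat @ Xc                # homogeneous image coordinates
    return xh[:2] / xh[2]          # de-homogenize
```

For example, with identity rotation and zero translation, a point on the optical axis at depth 2 projects to the principal point of the intrinsic matrix.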
4 3D TRACKING IN EDGE DISTANCE FIELD
This section describes our 3D tracking approach in the edge distance field in detail. We begin with an energy function that defines the contour matching in the distance field. Then, in order to overcome local minima in the 3D pose optimization, we introduce a particle filtering process with a first-order autoregressive state dynamics. Finally, the optimization is made more immune to occlusions by employing a robust estimator.
4.1 Pose optimization as 3D contour matching
We formulate the pose optimization as a contour matching process by fitting the 3D contour points $\Phi$ to the edge distance field $D$ of the image $I$.
To generate $D$, we use the Canny edge detector [2] to extract the edge map, then apply a fast distance transform [6] to the edge map. For each image pixel $x$, $D(x)$ indicates the distance to its nearest image edge point. Figures 1(a), 1(b), 1(c) illustrate the procedure of generating the edge distance field.
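The distance-field construction can be illustrated with a brute-force numpy sketch; this toy stand-in computes exact Euclidean distances on a small binary edge map (a real pipeline would use a Canny detector and the linear-time transform of [6]):

```python
import numpy as np

def edge_distance_field(edge_map):
    """Brute-force Euclidean distance transform of a binary edge map.
    D(x) is the distance from pixel x to its nearest edge pixel."""
    ys, xs = np.nonzero(edge_map)
    edges = np.stack([ys, xs], axis=1).astype(float)    # (M, 2) edge pixels
    h, w = edge_map.shape
    gy, gx = np.mgrid[0:h, 0:w]
    grid = np.stack([gy.ravel(), gx.ravel()], axis=1).astype(float)
    # distance from every pixel to every edge pixel, keep the minimum
    d = np.sqrt(((grid[:, None, :] - edges[None, :, :]) ** 2).sum(-1)).min(1)
    return d.reshape(h, w)
```

This O(pixels × edges) version is only for clarity; the transform in [6] achieves the same field in linear time.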
Given a prior pose $p_p$ and the 3D model $M$, we can render the depth map and extract the 2D contour points on it. Then $\Phi$ can easily be obtained by back-projecting the 2D contour points to the 3D model $M$.
For a 3D contour point $X_i \in \Phi$, the matching cost $d_i$ is defined as follows:

$d_i = D(\pi(K \cdot [R(r) \mid t] \cdot \tilde{X}_i))$  (2)
Figure 2: Particle filtering between two consecutive frames of the target (a CAT). We take $N = 10$ for example. (a) $I_{t-1}$ with pose $p_{t-1}$. (b) $I_t$ with a relatively large motion from $I_{t-1}$. (c) The estimated optimal pose using our method is visualized by the green wireframe overlaid on the target. (d) The estimated pose with $p_{t-1}$ as the single prior pose is visualized; it obviously converged to a local minimum. (e) The prior particles $S^-_t$ sampled from $S^+_{t-1}$ are visualized by red projected contours. (f) The updated particles $S^+_t$ given by Equation 4 are visualized by green projected contours. (g) The projected 3D contour of the best particle is evolved using our method from a prior pose (in red) to the optimal pose (in green) in the distance field of $I_t$. (h) The projected 3D contour is evolved from the single prior pose $p_{t-1}$ (in red) to the optimal pose (in green); it got stuck in a local minimum.
where $\pi$ transforms homogeneous coordinates into their non-homogeneous representation. The whole matching cost $E$ between $\Phi$ and $D$ is then defined by the following objective energy function:

$E(r, t) = \sum_{X_i \in \Phi} d_i^2.$  (3)

Starting from $p_p$, the optimal pose $p_o$ is calculated by iteratively minimizing Equation 3 using the L-M algorithm:

$p_o = \arg\min_{r, t} E(r, t).$  (4)

Figure 1(c) shows the evolution of the projected 3D contour points from $p_p$ to $p_o$ by Equation 4. Figure 1(d) shows the optimal pose $p_o$ by overlaying the green wireframe on the target.

We can differentiate Equation 3 with respect to the pose parameters $p$ to get the Jacobian required by L-M:

$J = \sum_i \frac{\partial d_i}{\partial x_i} \cdot \frac{\partial x_i}{\partial p}$  (5)

where $x_i$ is the projection of $X_i$. The differential $\frac{\partial d_i}{\partial x_i} \in \mathbb{R}^{1\times 2}$ can be computed using centered finite differences in $D$, and $\frac{\partial x_i}{\partial p} \in \mathbb{R}^{2\times 6}$ can be derived from Equation 1 analytically.
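The finite-difference term of the Jacobian is straightforward to sketch; the following numpy snippet (function name ours, rounding to the nearest pixel assumed for simplicity) evaluates the $1 \times 2$ gradient of the distance field at a projected contour point:

```python
import numpy as np

def distance_field_gradient(D, x):
    """Centered finite difference of the edge distance field D at a
    pixel x = (col, row): returns the 1x2 row vector [dD/du, dD/dv]
    needed for the Jacobian in Equation 5."""
    u, v = int(round(x[0])), int(round(x[1]))
    dDdu = (D[v, u + 1] - D[v, u - 1]) / 2.0   # horizontal difference
    dDdv = (D[v + 1, u] - D[v - 1, u]) / 2.0   # vertical difference
    return np.array([dDdu, dDdv])
```

A practical implementation would use bilinear interpolation at sub-pixel projections; the centered difference above captures the idea.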
4.2 Particle filtering
Generally, 3D tracking starts from a prior pose $p_t$ at frame $t$. Many edge-based tracking methods [5, 15, 20, 24] initialize $p_t$ using the estimated pose $p_{t-1}$ of frame $t-1$ under a small inter-frame motion assumption. If large inter-frame motions occur, these methods with a single pose hypothesis fail inevitably, as the initial pose is not close to the global minimum. In this paper we exploit a particle filtering framework with a first-order autoregressive dynamical model to deal with large inter-frame motions. Figures 2(a), 2(b) give two consecutive frames with a relatively large inter-frame motion. Figure 2(c) shows the pose optimized with our particle filtering method, in contrast to the pose estimated using only a single prior pose from the previous frame, as in Figure 2(d).
In our particle filtering framework, the posterior distribution (denoted $+$) at $t-1$ is represented as a set of $N$ particles $S^+_{t-1}$ associated with normalized weights $\Pi^+_{t-1}$ by

$S^+_{t-1} = \{p^{(0)}_{t-1}, \ldots, p^{(N-1)}_{t-1}\}, \quad \Pi^+_{t-1} = \{\pi^{(0)}_{t-1}, \ldots, \pi^{(N-1)}_{t-1}\}$  (6)

where the particle $p^{(i)}_{t-1}$ is the $i$-th sample in the 6DoF pose space with an associated weight $\pi^{(i)}_{t-1}$. For the next frame $t$, particles $S^-_t$ are resampled according to the weights $\Pi^+_{t-1}$ and transited by a motion model to form the prior distribution (denoted $-$) of frame $t$:

$S^-_t = \{p^{(0)}_t, \ldots, p^{(N-1)}_t\}, \quad \Pi^-_t = \{1/N, \ldots, 1/N\}$  (7)

where $\Pi^-_t$ indicates that each particle has a uniform weight. $S^-_t$ is updated to $S^+_t$ using 3D contour matching as described in Section 4.1, and $\Pi^+_t$ is evaluated according to the contour matching cost.
For each particle $p^{(i)}_t \in S^-_t$ at frame $t$, the transition is processed as:

$p^{(i)}_t = p^{(i)}_t + \lambda_v v^{(i)}_{t-1} + \lambda_n n^{(i)}_t, \quad v^{(i)}_{t-1} = p^{(i)}_{t-1} - p^{(i)}_{t-2}$  (8)

where $v^{(i)}_{t-1}$ denotes the velocity of the $i$-th particle between $p^{(i)}_{t-1} \in S^+_{t-1}$ and $p^{(i)}_{t-2} \in S^+_{t-2}$. $n^{(i)}_t \in \mathbb{R}^6$ is Gaussian noise drawn from $\mathcal{N}(0, \Sigma)$ with zero mean and covariance $\Sigma \in \mathbb{R}^{6\times 6}$. $\lambda_v$ and $\lambda_n$ are the weights balancing the autoregressive motion and the random motion respectively.
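The transition of Equation 8 is easy to vectorize over all particles; a minimal numpy sketch under our own naming (particles stored as rows of an $(N, 6)$ array) could read:

```python
import numpy as np

def transit_particles(S_prev, S_prev2, lam_v, lam_n, Sigma, rng):
    """First-order autoregressive transition (Equation 8) for 6DoF pose
    particles, each a vector p = (r, t) in R^6.
    S_prev, S_prev2: (N, 6) particle arrays from frames t-1 and t-2."""
    v = S_prev - S_prev2                         # per-particle velocity
    noise = rng.multivariate_normal(np.zeros(6), Sigma, size=len(S_prev))
    return S_prev + lam_v * v + lam_n * noise    # autoregressive + random
```

With the covariance set to zero the transition becomes purely deterministic, which is handy for checking the velocity term in isolation.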
Once each particle is transited, it is employed in Equation 3 as the initial pose. The update from $p^{(i)}_t \in S^-_t$ to $p^{(i)}_t \in S^+_t$ is accomplished by optimizing Equation 3. Figures 2(e), 2(f) illustrate the particles updated from $S^-_t$ to $S^+_t$, and Figure 2(g) shows the evolution of the best particle, in contrast to the evolution using a single prior pose as given in Figure 2(h). We denote the matching cost $d^{(i)}_t$ of $p^{(i)}_t$ as the residual after optimization; the corresponding weight $\pi^{(i)}_t$ of $p^{(i)}_t \in S^+_t$ is then evaluated using the residual $d^{(i)}_t$ as follows:

$\pi^{(i)}_t = \exp(-d^{(i)}_t / \sigma_s)$  (9)

where the positive $\sigma_s$ is a control parameter for scaling the residual. After updating all the particles, the weight $\pi^{(i)}_t \in \Pi^+_t$ of each particle $p^{(i)}_t \in S^+_t$ is normalized by

$\pi^{(i)}_t = \frac{\pi^{(i)}_t}{\sum_{j=1}^{N} \pi^{(j)}_t}.$  (10)

We consider the particle $p^{(i)}_t \in S^+_t$ with the highest weight as the optimal pose at frame $t$.
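Equations 9 and 10 together amount to a small weighting routine; a minimal numpy sketch (function name ours) is:

```python
import numpy as np

def evaluate_weights(residuals, sigma_s):
    """Map contour-matching residuals to normalized particle weights
    (Equations 9 and 10): a smaller residual yields a larger weight."""
    w = np.exp(-np.asarray(residuals, dtype=float) / sigma_s)
    return w / w.sum()   # normalize so the weights sum to 1
```

The particle with the largest weight (i.e. the smallest residual) is then taken as the pose estimate for the frame.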
When the update is done, we obtain the posterior distribution at frame $t$, which is used to generate the prior distribution at the next frame $t+1$ by importance resampling. Each particle $p^{(i)}_{t+1}$ in the prior particle set $S^-_{t+1}$ is randomly drawn from $S^+_t$ according to the weights $\Pi^+_t$. After resampling is done, we can start the next particle filtering process.
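The importance resampling step can be sketched in one call to a weighted random choice; the following numpy snippet (naming ours) draws the prior set for the next frame:

```python
import numpy as np

def resample_particles(S_post, weights, N, rng):
    """Importance resampling: draw N particles from the posterior set
    S_post (shape (N, 6)) with probability proportional to the
    normalized weights, forming the prior set for frame t+1."""
    idx = rng.choice(len(S_post), size=N, p=weights)
    return S_post[idx]
```

Multinomial resampling as above matches the description in the text; variants such as systematic resampling would reduce variance at the same cost.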
4.3 Occlusion handling
We assume that occluded 3D contour points tend to have a large distance to the nearest image edge. The simple quadratic error in Equation 3 is sensitive to such occluded contour points. Therefore, an alternative weight function $w$ can be incorporated by generalizing Equation 3:

$E(r, t) = \sum_i w(d_i)\, d_i^2.$  (11)

We can still apply the L-M algorithm to solve the resulting iteratively re-weighted least squares (IRLS) problem. In order to strongly suppress the occluded 3D contour points by assigning them zero weights, in this paper we choose the Tukey estimator:

$w(d) = \begin{cases} [1 - (d/\tau)^2]^2 & \text{if } |d| \le \tau \\ 0 & \text{otherwise} \end{cases}$  (12)

where $\tau$ is the maximum valid distance from a projected 3D contour point to its nearest image edge.
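The Tukey weight of Equation 12 is a one-liner in practice; a minimal numpy sketch (function name ours):

```python
import numpy as np

def tukey_weight(d, tau):
    """Tukey estimator weight (Equation 12): down-weights contour points
    whose distance d approaches tau, and assigns zero weight beyond it."""
    d = np.abs(np.asarray(d, dtype=float))
    w = (1.0 - (d / tau) ** 2) ** 2
    return np.where(d <= tau, w, 0.0)
```

Points right at the edge ($d = 0$) keep full weight 1, the weight falls smoothly to 0 at $d = \tau$, and every point farther than $\tau$ is ignored entirely, which is what makes the estimator robust to occlusion.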
4.4 Implementation details
This section elaborates the complete framework of our ap-proach and the details of parameter settings. Given an imagesequence โ, the camera intrinsic matrix ๐พ and an initialpose ๐๐, our 3D tracking approach estimates the pose ๐๐ก
at each frame ๐ก with the 3D model ๐ . The framework ofour approach is summarized into Algorithm 1. To minimize
Algorithm 1: 3D Tracking in Edge Distance Field
Input:  $\mathcal{I} = \{I_0, I_1, \cdots, I_{n-1}\}$, $M$, $K$, $p_s$
Param:  $N$, $\tau$, $\Sigma$, $\lambda_v$, $\lambda_n$, $\sigma_s$
Output: $\mathcal{P} = \{p_0, p_1, \cdots, p_{n-1}\}$
1  for $t \leftarrow 0$ to $n-1$ do
2      $Edgemap_t \leftarrow$ Canny($I_t$)
3      $D_t \leftarrow$ DistanceTransform($Edgemap_t$)
4      if $t = 0$ then
5          $S^-_t \leftarrow$ InitializeParticles($N$, $p_s$)
6          $\Pi^-_t \leftarrow$ InitializeWeightsUniform($N$)
7      $S^-_t \leftarrow$ TransitParticles($S^-_t$, $\lambda_v$, $\lambda_n$, $\Sigma$)
8      for $i \leftarrow 0$ to $N-1$ do
9          $\Phi_i \leftarrow$ Extract3DContour($K$, $p^{(i)}_t \in S^-_t$, $M$)
10         $(p^{(i)}_t \in S^+_t,\ d^{(i)}_t) \leftarrow$ IRLS($D_t$, $\Phi_i$, $\tau$, $\Sigma$)
11         $\pi^{(i)}_t \in \Pi^+_t \leftarrow$ EvaluateWeights($d^{(i)}_t$, $\sigma_s$)
12     $\Pi^+_t \leftarrow$ NormalizeWeights($\Pi^+_t$)
13     $p_t \in \mathcal{P} \leftarrow$ SelectBestParticle($S^+_t$, $\Pi^+_t$)
14     $S^-_{t+1} \leftarrow$ ResampleParticles($S^+_t$, $\Pi^+_t$, $N$)
15 return $\mathcal{P}$
Figure 3: Tracking results showing the effectiveness of particle filtering. The rows are, respectively, the tracking results with 1, 10 and 100 particles. Our method with 100 particles works well, while the others fail.
Equation 11, the L-M algorithm is employed and terminates after a maximum number of iterations (100). The number of particles $N$ is set to different values from {1, 10, 100} so as to evaluate the efficiency and effectiveness of particle filtering. The covariance matrix $\Sigma \in \mathbb{R}^{6\times 6}$ in Equation 8 is diagonal, with $diag(\Sigma) = (0.1, 0.1, 0.1, 0.1, 0.1, 0.1)$. In Equation 8, $\lambda_v$ is 0.1 and $\lambda_n$ is 1. $\sigma_s$ in Equation 9 is set to the number of 3D contour points (i.e. $|\Phi|$) to normalize the matching residual. The threshold $\tau$ of Equation 12 is set to 20.
5 EXPERIMENTS
We validate our method in several comparative experiments. First, we demonstrate the effectiveness of particle filtering and occlusion handling. Then we compare our method with the pixel-wise tracker PWP3D [18] and the state-of-the-art tracker GOS [24], which uses a single pose hypothesis. Finally, we adopt the marker-based tracker [10] as the baseline and compare our method against it for quantitative evaluation.
Our system is implemented in C++ and runs on an Intel i5 CPU with 8GB RAM. The test sequences are captured by a camera at 640×480 resolution, and both the object and the camera are movable.
Many edge-based methods assume that the motion of the camera or object is small and smooth, so that the prior pose is close to the global minimum. If large inter-frame motions occur, tracking fails and the estimated pose converges to a local minimum. In order to deal with large inter-frame motions, we employ 1, 10 and 100 particles respectively to track the object under fast camera or object motion. Figure 3 gives the comparison results for 1, 10 and 100 particles in the case of fast camera motion; with 100 particles our method tracks the BUNNY even under slight motion blur.
Most edge-based methods construct 3D-2D edge correspondences explicitly. When the target object is occluded, they find wrong edge correspondences, or even no edge correspondences at all. As described in Section 4.3, we employ the Tukey estimator to suppress the importance of occluded contour points. Our method can work if the occlusion is not
Figure 4: Tracking results showing the effectiveness of occlusion handling. The DUCK is tracked using 10 particles while it is occluded by a card or a hand.
Figure 5: Comparison of our method ($N = 100$) with PWP3D and GOS. The first row shows the tracking results of our method ($N = 100$). The second row shows the results of PWP3D; the tracking drifted due to fast camera motion and the white background. The third row shows the results of GOS; the edge correspondences of the white BUNNY are disturbed by white edges from the background.
very severe. Figure 4 shows that the DUCK is tracked successfully using 10 particles with occlusion handling while it is occluded by a card or a hand.
We compare our method with the pixel-wise tracker PWP3D [18]. PWP3D proposed a probabilistic framework for simultaneous 2D image segmentation and 3D object tracking without building 3D-2D correspondences explicitly. It employs the color statistics of foreground and background based on a prior pose, so tracking drifts when the target object has color statistics similar to those of the environment. Figure 5 compares the tracking results obtained by PWP3D (second row) with our method ($N = 100$, first row). PWP3D drifted due to the fast camera motion and the white background.
As a representative edge-based tracker, GOS [24] constructs 3D-2D edge correspondences explicitly. The image edge correspondences are determined by a 1D local search with a limited extent based on a prior pose. Although GOS exploits the region knowledge around each edge point and the affinity of adjacent image edge points, it still suffers from erroneous correspondences caused by similar background edges. Figure 5 compares the tracking results of GOS (third row) and our method ($N = 100$, first row). For the GOS tracker, the edge correspondences of the white BUNNY are disturbed by white edges from the background.
Table 1: Performance evaluation on 4 sequences. R for rotation error, T for translation error, and AD for average distance.

Seq(#)        Method                  Time(ms)  R(°)   T(cm)  AD(cm)
BUNNY (1657)  PWP3D                   79.8      201.8  10.2   11.9
              GOS                     148.6     5.7    1.0    0.9
              Our method (N = 1)      16.0      8.0    1.9    2.0
              Our method (N = 10)     90.7      2.9    1.6    1.5
              Our method (N = 100)    934.5     2.4    1.6    1.3
CAT (1427)    PWP3D                   90.0      208.8  6.0    7.9
              GOS                     154.7     1.8    1.3    1.3
              Our method (N = 1)      16.7      3.6    1.9    1.9
              Our method (N = 10)     110.6     1.5    2.0    2.0
              Our method (N = 100)    1015.8    1.3    1.9    1.9
DUCK (1677)   PWP3D                   73.5      189.2  7.8    9.1
              GOS                     143.8     75.8   3.2    3.5
              Our method (N = 1)      15.9      6.5    1.9    2.0
              Our method (N = 10)     104.5     6.5    1.8    1.9
              Our method (N = 100)    960.9     6.2    1.8    2.0
LEGO (1466)   PWP3D                   52.3      235.8  16.9   22.6
              GOS                     147.6     16.2   3.5    3.3
              Our method (N = 1)      15.9      9.3    1.5    1.9
              Our method (N = 10)     87.4      4.6    1.5    1.7
              Our method (N = 100)    830.7     4.4    1.3    1.7
AVG (1556)    PWP3D                   73.9      208.9  10.2   12.9
              GOS                     148.7     24.9   2.3    2.3
              Our method (N = 1)      16.1      6.9    1.8    2.0
              Our method (N = 10)     98.3      3.9    1.7    1.8
              Our method (N = 100)    935.5     3.6    1.7    1.7
In order to evaluate the tracking accuracy and time performance of our method, we adopt the marker-based tracking method [10] as the baseline. The coordinate system of the object is predefined and fixed relative to the marker, so the ground-truth pose of the object can be transformed from the marker. We captured 4 sequences, respectively for the BUNNY, CAT, DUCK and LEGO, with a hand-held camera, as Figure 6 shows. Table 1 gives the accuracy and time performance of our method. We use two criteria to evaluate the accuracy: rotation and translation error [21], and average distance [8]. The proposed method runs at 62 fps with 1 particle, 10 fps with 10 particles and 1 fps with 100 particles on average. For tracking accuracy, with 100 particles it achieves on average a rotation error of 3.6°, a translation error of 1.7 cm, and an average distance of 1.7 cm between all vertices of the 3D model in the estimated pose and in the ground-truth pose.
6 LIMITATIONS AND FUTURE WORK
Our method directly optimizes the pose parameters in the edge distance field, and thus depends on the edge map of the query image. When the color of the background is similar to that of the object, or severe motion blur occurs in the scene, we cannot obtain adequate contours of the target, and our method will fail because it exploits only contour information. In future work, we will consider the inner pixel information of the object to enhance tracking robustness.
For symmetrical objects, multiple poses may result in the same contour, which makes it ambiguous for our method to estimate the pose correctly. Moreover, the computational cost
Figure 6: Tracking results of our method ($N = 100$) for BUNNY, CAT, DUCK and LEGO.
is also a critical problem for fast object tracking using many particles. We will speed up the particle filtering using GPU techniques.
7 CONCLUSIONS
This paper proposes a monocular model-based 3D tracking approach for textureless objects. We minimize the holistic distance between the predicted object contour and the query image edges in the distance field via direct optimization of the 3D pose parameters. We derive the differentials of this energy with respect to the pose parameters, and search for the optimal pose parameters using the L-M algorithm. We employ a particle filtering framework to avoid being trapped in local minima. Occlusions are handled by a robust estimator. We demonstrated the effectiveness of our method in comparative experiments on real image sequences with occlusions, large motions and cluttered backgrounds.
ACKNOWLEDGMENTS
The authors gratefully thank the anonymous reviewers for their comments, which helped us improve the paper. This work is supported by the National Key Research and Development Program of China (No. 2016YFB1001501).
REFERENCES
[1] Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua. 2010. BRIEF: Binary Robust Independent Elementary Features. In European Conference on Computer Vision. 778–792.
[2] John Canny. 1986. A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8, 6 (1986), 679–698.
[3] Changhyun Choi and Henrik I. Christensen. 2012. 3D Textureless Object Detection and Tracking: An Edge-Based Approach. In International Conference on Intelligent Robots and Systems. 3877–3884.
[4] Changhyun Choi and Henrik I. Christensen. 2012. Robust 3D Visual Tracking using Particle Filtering on the Special Euclidean Group: A Combined Approach of Keypoint and Edge Features. International Journal of Robotics Research 33, 4 (2012), 498–519.
[5] Andrew I. Comport, Eric Marchand, and Francois Chaumette. 2003. A Real-Time Tracker for Markerless Augmented Reality. In IEEE International Symposium on Mixed and Augmented Reality. 36–45.
[6] Pedro F. Felzenszwalb and Daniel P. Huttenlocher. 2004. Distance Transforms of Sampled Functions. Theory of Computing 8, 19 (2004), 415–428.
[7] Chris Harris and Carl Stennett. 1990. RAPiD - A Video-Rate Object Tracker. In British Machine Vision Conference. 73–77.
[8] Stefan Hinterstoisser, Vincent Lepetit, Slobodan Ilic, Stefan Holzer, Gary Bradski, Kurt Konolige, and Nassir Navab. 2012. Model Based Training, Detection and Pose Estimation of Texture-less 3D Objects in Heavily Cluttered Scenes. In Asian Conference on Computer Vision. 548–562.
[9] Michael Isard and Andrew Blake. 1998. CONDENSATION - Conditional Density Propagation for Visual Tracking. In International Journal of Computer Vision. 5–28.
[10] Hirokazu Kato and Mark Billinghurst. 1999. Marker Tracking and HMD Calibration for a Video-Based Augmented Reality Conferencing System. In IEEE and ACM International Workshop on Augmented Reality. 85–94.
[11] Georg Klein and David W. Murray. 2006. Full-3D Edge Tracking with a Particle Filter. In British Machine Vision Conference. 1119–1128.
[12] Vincent Lepetit and Pascal Fua. 2005. Monocular Model-Based 3D Tracking of Rigid Objects: A Survey. Foundations and Trends in Computer Graphics and Vision 1, 1 (2005), 1–89.
[13] Manolis Lourakis and Xenophon Zabulis. 2013. Model-Based Pose Estimation for Rigid Objects. In International Conference on Computer Vision Systems. 83–92.
[14] David G. Lowe. 2004. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60, 2 (2004), 91–110.
[15] Eric Marchand, Patrick Bouthemy, and Francois Chaumette. 2001. A 2D-3D Model-Based Approach to Real-time Visual Tracking. Image and Vision Computing 19, 13 (2001), 941–955.
[16] Eric Marchand, Hideaki Uchiyama, and Fabien Spindler. 2015. Pose Estimation for Augmented Reality: A Hands-On Survey. IEEE Transactions on Visualization and Computer Graphics 22, 12 (2015), 2633–2651.
[17] Youngmin Park, Vincent Lepetit, and Woontack Woo. 2008. Multiple 3D Object Tracking for Augmented Reality. In IEEE International Symposium on Mixed and Augmented Reality. 117–120.
[18] Victor A. Prisacariu and Ian D. Reid. 2012. PWP3D: Real-Time Segmentation and Tracking of 3D Objects. International Journal of Computer Vision 98, 3 (2012), 335–354.
[19] Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 2011. ORB: An Efficient Alternative to SIFT or SURF. In IEEE International Conference on Computer Vision. 2564–2571.
[20] Byung Kuk Seo, Hanhoon Park, Jong Il Park, Stefan Hinterstoisser, and Slobodan Ilic. 2014. Optimal Local Searching for Fast and Robust Textureless 3D Object Tracking in Highly Cluttered Backgrounds. IEEE Transactions on Visualization and Computer Graphics 20, 1 (2014), 99–110.
[21] Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon. 2013. Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images. In IEEE Conference on Computer Vision and Pattern Recognition. 2930–2937.
[22] Luca Vacchetti, Vincent Lepetit, and Pascal Fua. 2004. Combining Edge and Texture Information for Real-Time Accurate 3D Camera Tracking. In IEEE International Symposium on Mixed and Augmented Reality. 48–56.
[23] Luca Vacchetti, Vincent Lepetit, and Pascal Fua. 2004. Stable Real-Time 3D Tracking using Online and Offline Information. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 10 (2004), 1385–1391.
[24] Guofeng Wang, Bin Wang, Fan Zhong, Xueying Qin, and Baoquan Chen. 2015. Global Optimal Searching for Textureless 3D Object Tracking. The Visual Computer 31, 6 (2015), 979–988.
[25] Harald Wuest, Florent Vial, and Didier Stricker. 2005. Adaptive Line Tracking with Multiple Hypotheses for Augmented Reality. In IEEE International Symposium on Mixed and Augmented Reality. 62–69.