Pose Optimization in Edge Distance Field for Textureless 3D Object Tracking
Bin Wang, School of Computer Science and Technology, Shandong University
Fan Zhong, School of Computer Science and Technology, Shandong University
Xueying Qin, School of Computer Science and Technology, Shandong University
ABSTRACT
This paper presents a monocular model-based 3D tracking approach for textureless objects. Instead of explicitly searching for 3D-2D correspondences as previous methods do, which unavoidably generates individual outlier matches, we aim to minimize the holistic distance between the predicted object contour and the query image edges. We propose a method that can directly solve the 3D pose parameters in an unsegmented edge distance field. We derive the differentials of the edge matching distance with respect to the pose parameters, and search for the optimal 3D pose parameters using standard gradient-based non-linear optimization techniques. To avoid being trapped in local minima and to deal with potential large inter-frame motions, a particle filtering process with a first-order autoregressive state dynamics is exploited. Occlusions are handled by a robust estimator. The effectiveness of our approach is demonstrated in comparative experiments on real image sequences with occlusions, large motions and cluttered backgrounds.
CCS CONCEPTS
• Computing methodologies → Mixed / augmented reality; Tracking;
KEYWORDS
3D tracking, Pose optimization, Distance field, Particle filter
ACM Reference format:
Bin Wang, Fan Zhong, and Xueying Qin. 2017. Pose Optimization in Edge Distance Field for Textureless 3D Object Tracking. In Proceedings of CGI '17, Yokohama, Japan, June 27-30, 2017, 6 pages.
https://doi.org/10.1145/3095140.3095172
1 INTRODUCTION
3D object tracking is a fundamental computer vision task with a variety of applications in augmented reality and robotics.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CGI '17, June 27-30, 2017, Yokohama, Japan
© 2017 Association for Computing Machinery.
ACM ISBN 978-1-4503-5228-4/17/06 . . . $15.00
https://doi.org/10.1145/3095140.3095172
3D tracking systems are expected to estimate the six degrees of freedom (6DoF) pose parameters of an object relative to the camera in unknown and dynamic environments.
Thanks to robust keypoint extractors and descriptors [1, 14, 19], keypoint-based 3D tracking methods [13, 17, 23] have been proposed in the last decades. Although these methods achieve impressive performance for textured objects, they are not applicable to textureless objects due to the lack of reliable feature matches.
For textureless objects, edges or contours are the vital visual cue that can be detected in most situations. Therefore, edges or contours are exploited in edge-based 3D tracking methods. RAPID [7] is the first edge-based 3D tracker; it projects the sampled 3D model edge points to the 2D image and aligns the projected edge points with the image edge points. A 1D search for the image edge point is performed at each projected edge point along the direction perpendicular to the projected model edge, and the 2D pixel position with maximum gradient is considered as the correspondence of the sampled 3D model point. Several improvements [5, 15, 20, 22, 24, 25] have since been proposed for better 3D-2D correspondences. These methods are shown to be effective in some situations. However, in order to obtain 3D-2D correspondences, all of these methods perform a 1D local search perpendicular to the object contours within a limited extent. Since individual contour points are usually indistinctive, incorrect 3D-2D correspondences are unavoidable, especially in the more complicated cases of occlusions, large inter-frame motions and background clutter. On the other hand, the search extent is difficult to determine: a large extent may result in more incorrect correspondences, while a small extent leads to sensitivity to large inter-frame motions.
In this paper, we propose an edge-based 3D tracking approach without explicit 3D-2D correspondences. We formulate 3D object tracking as a contour matching problem by fitting the 3D object contour to the image edge distance field. The distance between the predicted object contour and the query image edges in the distance field is minimized by direct optimization of the 3D pose parameters. The differentials of this energy with respect to the pose parameters are derived, and the pose parameters are optimized iteratively with the Levenberg-Marquardt (L-M) algorithm. The image edges are extracted by an edge detector, with no need for object segmentation or edge filtering. Cluttered backgrounds can be handled because of the holistic matching energy function. For better tracking performance, a particle filtering process
with a first-order autoregressive state dynamics is exploited to deal with potential large inter-frame motions, and a robust estimator is adopted to handle occlusions. Comparative experiments demonstrate that the proposed method is effective on real image sequences with occlusions, large motions and cluttered backgrounds.
2 RELATED WORK
The literature on 3D object tracking is vast. Given a 3D model of the target, 3D-2D correspondences between 3D features of the model and 2D measurements in the image are exploited for 3D tracking. According to the type of 2D measurements, 3D-2D correspondence based methods are classified as fiducial-based [10], keypoint-based [13, 17, 23] and edge-based [5, 7, 11, 15, 25]. We refer the reader to [12, 16] for more details. Here we restrict ourselves to monocular edge-based methods with a 3D model of the target available.
As the vital visual cue of textureless objects, edges or contours are employed by edge-based trackers. To construct 3D-2D correspondences, [5, 15] adopted a precomputed convolution kernel function of the contour orientation to find the image edge point whose orientation is similar to the projected contour orientation, rather than the edge point with maximum gradient on the scanline. [25] proposed multiple edge hypotheses, treating all local extrema of the gradient along the scanline as potential correspondences. Multiple hypotheses prevent a wrong gradient maximum from being assigned as a correspondence, but increase the computational cost. [20, 24] exploited local region knowledge of the foreground and background, and the affinity of adjacent image edge points, to search for the optimal correspondences. These improvements are impressive; however, their robustness decreases when ambiguities between different edges occur in the scene. All of these methods assume that edge correspondences can be determined by local search within a limited extent based on a prior pose. If the prior pose is sufficiently incorrect, tracking probably fails, especially when the object moves fast.
To ensure a good prior pose, multiple pose hypotheses have been proposed that propagate the pose using particle filters. Since Isard et al. [9] applied particle filters to 2D edge tracking, various edge-based 3D tracking methods have been implemented in a particle filter framework. [11] tracked complex 3D objects by utilizing the GPU to calculate edges and evaluate pose likelihoods. [4] employed keypoint correspondences for particle initialization, and then refined the estimated pose by aligning the projected model edges and the image edges using explicit 3D-2D correspondences. Our approach is similar to [3] in that both use the edge distance field. [3] is a tracking-by-detection framework that uses the distance field for chamfer matching between offline 2D edge templates and the scene image, and the coarse pose of the matched template is used for initializing particles. It employs a standard edge-based tracker to establish the correspondences and predict the final pose, whereas our approach directly optimizes the pose in the distance field. These methods can achieve impressive results, especially for large inter-frame motions.
Figure 1: 3D contour matching. (a) A color image $I$ of the target (a CAT). (b) Canny edge map of $I$. (c) Projected 3D contour points are evolved under Equation 4 from a prior pose (in red) to the optimal pose (in green) in the edge distance field of $I$. (d) The optimal pose is visualized by the green wireframe overlaid on $I$.
3 POSE PARAMETERIZATION
3D tracking aims to estimate the 6DoF pose of an object relative to the camera, given the camera intrinsic matrix $K \in \mathbb{R}^{3\times 3}$, the image $I$, and the 3D model $M$. A 3D model point $X \in \mathbb{R}^3$ is projected to an image pixel $x \in \mathbb{R}^2$ using the standard pinhole camera model by

$\tilde{x} = K \cdot [R(r) \mid t] \cdot \tilde{X}$  (1)

where $t$ and $R(r)$ are respectively the translation vector and the rotation matrix parameterized by the Rodrigues rotation vector $r$. $\tilde{x}$ and $\tilde{X}$ are respectively the homogeneous representations of $x$ and $X$. The 6DoF pose is parameterized by $p = (r, t)$ in this paper.
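The projection of Equation 1 can be sketched in a few lines of numpy; this is a minimal illustration under our own naming (the paper does not provide code), with the Rodrigues formula written out explicitly:

```python
import numpy as np

def rodrigues(r):
    """Rotation matrix R(r) from a Rodrigues vector r (axis * angle)."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0.0, -k[2], k[1]],       # cross-product matrix [k]x
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def project(X, r, t, K_mat):
    """Project a 3D point X to pixel coordinates under pose p = (r, t),
    following Equation 1: x~ = K [R(r) | t] X~."""
    Xc = rodrigues(r) @ X + t      # transform into the camera frame
    xh = K_mat @ Xc                # homogeneous image coordinates
    return xh[:2] / xh[2]          # de-homogenize
```

For example, with identity rotation and zero translation, a point on the optical axis at depth 2 projects to the principal point of the intrinsic matrix.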
4 3D TRACKING IN EDGE DISTANCE FIELD
This section describes our 3D tracking approach in the edge distance field in detail. We begin with an energy function that defines the contour matching in the distance field. Then, in order to overcome local minima in the 3D pose optimization, we introduce a particle filtering process with a first-order autoregressive state dynamics. Finally, the optimization is made more immune to occlusions by employing a robust estimator.
4.1 Pose optimization as 3D contour matching
We formulate the pose optimization as a contour matching process by fitting the 3D contour points $\Phi$ to the edge distance field $D$ of the image $I$.
To generate $D$, we use the Canny edge detector [2] to extract the edge map, then apply a fast distance transform [6] to the edge map. For each image pixel $x$, $D(x)$ indicates the distance to its nearest image edge point. Figures 1(a), 1(b), 1(c) illustrate the procedure of generating the edge distance field.
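The distance-field construction can be illustrated with a brute-force numpy sketch; this toy stand-in computes exact Euclidean distances on a small binary edge map (a real pipeline would use a Canny detector and the linear-time transform of [6]):

```python
import numpy as np

def edge_distance_field(edge_map):
    """Brute-force Euclidean distance transform of a binary edge map.
    D(x) is the distance from pixel x to its nearest edge pixel."""
    ys, xs = np.nonzero(edge_map)
    edges = np.stack([ys, xs], axis=1).astype(float)    # (M, 2) edge pixels
    h, w = edge_map.shape
    gy, gx = np.mgrid[0:h, 0:w]
    grid = np.stack([gy.ravel(), gx.ravel()], axis=1).astype(float)
    # distance from every pixel to every edge pixel, keep the minimum
    d = np.sqrt(((grid[:, None, :] - edges[None, :, :]) ** 2).sum(-1)).min(1)
    return d.reshape(h, w)
```

This O(pixels × edges) version is only for clarity; the transform in [6] achieves the same field in linear time.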
Given a prior pose $p_p$ and the 3D model $M$, we can render the depth map and extract the 2D contour points on it. Then $\Phi$ can easily be obtained by back-projecting the 2D contour points to the 3D model $M$.
For a 3D contour point $X_i \in \Phi$, the matching cost $d_i$ is defined as follows:

$d_i = D(\pi(K \cdot [R(r) \mid t] \cdot \tilde{X}_i))$  (2)
Figure 2: Particle filtering between two consecutive frames of the target (a CAT). We take $N = 10$ for example. (a) $I_{t-1}$ with pose $p_{t-1}$. (b) $I_t$ with a relatively large motion from $I_{t-1}$. (c) The estimated optimal pose using our method is visualized by the green wireframe overlaid on the target. (d) The estimated pose with $p_{t-1}$ as the single prior pose is visualized; it obviously converged to a local minimum. (e) The prior particles $S^-_t$ sampled from $S^+_{t-1}$ are visualized by red projected contours. (f) The updated particles $S^+_t$ given by Equation 4 are visualized by green projected contours. (g) The projected 3D contour of the best particle is evolved using our method from a prior pose (in red) to the optimal pose (in green) in the distance field of $I_t$. (h) The projected 3D contour is evolved from the single prior pose $p_{t-1}$ (in red) to the optimal pose (in green); it got stuck in a local minimum.
where $\pi$ transforms homogeneous coordinates into their non-homogeneous representation. The whole matching cost $E$ between $\Phi$ and $D$ is then defined by the following objective energy function:

$E(r, t) = \sum_{X_i \in \Phi} d_i^2.$  (3)

Starting from $p_p$, the optimal pose $p_o$ is calculated by iteratively minimizing Equation 3 using the L-M algorithm:

$p_o = \arg\min_{r, t} E(r, t).$  (4)

Figure 1(c) shows the evolution of the projected 3D contour points from $p_p$ to $p_o$ by Equation 4. Figure 1(d) shows the optimal pose $p_o$ by overlaying the green wireframe on the target.

We can differentiate Equation 3 with respect to the pose parameters $p$ to get the Jacobian required by L-M:

$J = \sum_i \frac{\partial d_i}{\partial x_i} \cdot \frac{\partial x_i}{\partial p}$  (5)

where $x_i$ is the projection of $X_i$. The differential $\frac{\partial d_i}{\partial x_i} \in \mathbb{R}^{1\times 2}$ can be computed using centered finite differences in $D$, and $\frac{\partial x_i}{\partial p} \in \mathbb{R}^{2\times 6}$ can be derived from Equation 1 analytically.
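The finite-difference term of the Jacobian is straightforward to sketch; the following numpy snippet (function name ours, rounding to the nearest pixel assumed for simplicity) evaluates the $1 \times 2$ gradient of the distance field at a projected contour point:

```python
import numpy as np

def distance_field_gradient(D, x):
    """Centered finite difference of the edge distance field D at a
    pixel x = (col, row): returns the 1x2 row vector [dD/du, dD/dv]
    needed for the Jacobian in Equation 5."""
    u, v = int(round(x[0])), int(round(x[1]))
    dDdu = (D[v, u + 1] - D[v, u - 1]) / 2.0   # horizontal difference
    dDdv = (D[v + 1, u] - D[v - 1, u]) / 2.0   # vertical difference
    return np.array([dDdu, dDdv])
```

A practical implementation would use bilinear interpolation at sub-pixel projections; the centered difference above captures the idea.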
4.2 Particle filtering
Generally, 3D tracking starts from a prior pose $p_t$ at frame $t$. Many edge-based tracking methods [5, 15, 20, 24] initialize $p_t$ using the estimated pose $p_{t-1}$ of frame $t-1$ under a small inter-frame motion assumption. If large inter-frame motions occur, these methods with a single pose hypothesis fail inevitably, as the initial pose is not close to the global minimum. In this paper we exploit a particle filtering framework with a first-order autoregressive dynamical model to deal with large inter-frame motions. Figures 2(a), 2(b) give two consecutive frames with a relatively large inter-frame motion. Figure 2(c) shows the pose optimized with our particle filtering method, in contrast to the pose estimated using only a single prior pose from the previous frame, as in Figure 2(d).
In our particle filtering framework, the posterior distribution (denoted $+$) at $t-1$ is represented as a set of $N$ particles $S^+_{t-1}$ associated with normalized weights $\Pi^+_{t-1}$ by

$S^+_{t-1} = \{p^{(0)}_{t-1}, \ldots, p^{(N-1)}_{t-1}\}, \quad \Pi^+_{t-1} = \{\pi^{(0)}_{t-1}, \ldots, \pi^{(N-1)}_{t-1}\}$  (6)

where the particle $p^{(i)}_{t-1}$ is the $i$-th sample in the 6DoF pose space with an associated weight $\pi^{(i)}_{t-1}$. For the next frame $t$, particles $S^-_t$ are resampled according to the weights $\Pi^+_{t-1}$ and transited by a motion model to form the prior distribution (denoted $-$) of frame $t$:

$S^-_t = \{p^{(0)}_t, \ldots, p^{(N-1)}_t\}, \quad \Pi^-_t = \{1/N, \ldots, 1/N\}$  (7)

where $\Pi^-_t$ indicates that each particle has a uniform weight. $S^-_t$ is updated to $S^+_t$ using 3D contour matching as described in Section 4.1, and $\Pi^+_t$ is evaluated according to the contour matching cost.
For each particle $p^{(i)}_t \in S^-_t$ at frame $t$, the transition is processed as:

$p^{(i)}_t = p^{(i)}_t + \lambda_v v^{(i)}_{t-1} + \lambda_n n^{(i)}_t, \quad v^{(i)}_{t-1} = p^{(i)}_{t-1} - p^{(i)}_{t-2}$  (8)

where $v^{(i)}_{t-1}$ denotes the velocity of the $i$-th particle between $p^{(i)}_{t-1} \in S^+_{t-1}$ and $p^{(i)}_{t-2} \in S^+_{t-2}$. $n^{(i)}_t \in \mathbb{R}^6$ is Gaussian noise drawn from $\mathcal{N}(0, \Sigma)$ with zero mean and covariance $\Sigma \in \mathbb{R}^{6\times 6}$. $\lambda_v$ and $\lambda_n$ are the weights balancing the autoregressive motion and the random motion respectively.
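The transition of Equation 8 is easy to vectorize over all particles; a minimal numpy sketch under our own naming (particles stored as rows of an $(N, 6)$ array) could read:

```python
import numpy as np

def transit_particles(S_prev, S_prev2, lam_v, lam_n, Sigma, rng):
    """First-order autoregressive transition (Equation 8) for 6DoF pose
    particles, each a vector p = (r, t) in R^6.
    S_prev, S_prev2: (N, 6) particle arrays from frames t-1 and t-2."""
    v = S_prev - S_prev2                         # per-particle velocity
    noise = rng.multivariate_normal(np.zeros(6), Sigma, size=len(S_prev))
    return S_prev + lam_v * v + lam_n * noise    # autoregressive + random
```

With the covariance set to zero the transition becomes purely deterministic, which is handy for checking the velocity term in isolation.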
Once each particle is transited, it is employed in Equation 3 as the initial pose. The update from $p^{(i)}_t \in S^-_t$ to $p^{(i)}_t \in S^+_t$ is accomplished by optimizing Equation 3. Figures 2(e), 2(f) illustrate the particles updated from $S^-_t$ to $S^+_t$, and Figure 2(g) shows the evolution of the best particle, in contrast to the evolution using a single prior pose as given in Figure 2(h). We denote the matching cost $d^{(i)}_t$ of $p^{(i)}_t$ as the residual after optimization; the corresponding weight $\pi^{(i)}_t$ of $p^{(i)}_t \in S^+_t$ is then evaluated using the residual $d^{(i)}_t$ as follows:

$\pi^{(i)}_t = \exp(-d^{(i)}_t / \sigma_s)$  (9)

where the positive $\sigma_s$ is a control parameter for scaling the residual. After updating all the particles, the weight $\pi^{(i)}_t \in \Pi^+_t$ of each particle $p^{(i)}_t \in S^+_t$ is normalized by

$\pi^{(i)}_t = \frac{\pi^{(i)}_t}{\sum_{j=1}^{N} \pi^{(j)}_t}.$  (10)

We consider the particle $p^{(i)}_t \in S^+_t$ with the highest weight as the optimal pose at frame $t$.
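Equations 9 and 10 together amount to a small weighting routine; a minimal numpy sketch (function name ours) is:

```python
import numpy as np

def evaluate_weights(residuals, sigma_s):
    """Map contour-matching residuals to normalized particle weights
    (Equations 9 and 10): a smaller residual yields a larger weight."""
    w = np.exp(-np.asarray(residuals, dtype=float) / sigma_s)
    return w / w.sum()   # normalize so the weights sum to 1
```

The particle with the largest weight (i.e. the smallest residual) is then taken as the pose estimate for the frame.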
When the update is done, we obtain the posterior distribution at frame $t$, which is used to generate the prior distribution at the next frame $t+1$ by importance resampling. Each particle $p^{(i)}_{t+1}$ in the prior particle set $S^-_{t+1}$ is randomly drawn from $S^+_t$ according to the weights $\Pi^+_t$. After resampling is done, we can start the next particle filtering process.
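The importance resampling step can be sketched in one call to a weighted random choice; the following numpy snippet (naming ours) draws the prior set for the next frame:

```python
import numpy as np

def resample_particles(S_post, weights, N, rng):
    """Importance resampling: draw N particles from the posterior set
    S_post (shape (N, 6)) with probability proportional to the
    normalized weights, forming the prior set for frame t+1."""
    idx = rng.choice(len(S_post), size=N, p=weights)
    return S_post[idx]
```

Multinomial resampling as above matches the description in the text; variants such as systematic resampling would reduce variance at the same cost.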
4.3 Occlusion handling
We assume that occluded 3D contour points tend to have a large distance to the nearest image edge. The simple quadratic error in Equation 3 is sensitive to such occluded contour points. Therefore, an alternative weight function $w$ can be incorporated by generalizing Equation 3:

$E(r, t) = \sum_i w(d_i)\, d_i^2.$  (11)

We can still apply the L-M algorithm to solve the resulting iteratively re-weighted least squares (IRLS) problem. In order to strongly suppress the occluded 3D contour points by assigning them zero weights, in this paper we choose the Tukey estimator:

$w(d) = \begin{cases} [1 - (d/\tau)^2]^2 & \text{if } |d| \le \tau \\ 0 & \text{otherwise} \end{cases}$  (12)

where $\tau$ is the maximum valid distance from a projected 3D contour point to its nearest image edge.
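The Tukey weight of Equation 12 is a one-liner in practice; a minimal numpy sketch (function name ours):

```python
import numpy as np

def tukey_weight(d, tau):
    """Tukey estimator weight (Equation 12): down-weights contour points
    whose distance d approaches tau, and assigns zero weight beyond it."""
    d = np.abs(np.asarray(d, dtype=float))
    w = (1.0 - (d / tau) ** 2) ** 2
    return np.where(d <= tau, w, 0.0)
```

Points right at the edge ($d = 0$) keep full weight 1, the weight falls smoothly to 0 at $d = \tau$, and every point farther than $\tau$ is ignored entirely, which is what makes the estimator robust to occlusion.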
4.4 Implementation details
This section elaborates the complete framework of our ap-proach and the details of parameter settings. Given an imagesequence โ, the camera intrinsic matrix ๐พ and an initialpose ๐๐, our 3D tracking approach estimates the pose ๐๐ก
at each frame ๐ก with the 3D model ๐ . The framework ofour approach is summarized into Algorithm 1. To minimize
Algorithm 1: 3D Tracking in Edge Distance Field
Input:  $\mathcal{I} = \{I_0, I_1, \cdots, I_{n-1}\}$, $M$, $K$, $p_s$
Param:  $N$, $\tau$, $\Sigma$, $\lambda_v$, $\lambda_n$, $\sigma_s$
Output: $\mathcal{P} = \{p_0, p_1, \cdots, p_{n-1}\}$
1  for $t \leftarrow 0$ to $n-1$ do
2      $Edgemap_t \leftarrow$ Canny($I_t$)
3      $D_t \leftarrow$ DistanceTransform($Edgemap_t$)
4      if $t = 0$ then
5          $S^-_t \leftarrow$ InitializeParticles($N$, $p_s$)
6          $\Pi^-_t \leftarrow$ InitializeWeightsUniform($N$)
7      $S^-_t \leftarrow$ TransitParticles($S^-_t$, $\lambda_v$, $\lambda_n$, $\Sigma$)
8      for $i \leftarrow 0$ to $N-1$ do
9          $\Phi_i \leftarrow$ Extract3DContour($K$, $p^{(i)}_t \in S^-_t$, $M$)
10         $(p^{(i)}_t \in S^+_t,\ d^{(i)}_t) \leftarrow$ IRLS($D_t$, $\Phi_i$, $\tau$, $\Sigma$)
11         $\pi^{(i)}_t \in \Pi^+_t \leftarrow$ EvaluateWeights($d^{(i)}_t$, $\sigma_s$)
12     $\Pi^+_t \leftarrow$ NormalizeWeights($\Pi^+_t$)
13     $p_t \in \mathcal{P} \leftarrow$ SelectBestParticle($S^+_t$, $\Pi^+_t$)
14     $S^-_{t+1} \leftarrow$ ResampleParticles($S^+_t$, $\Pi^+_t$, $N$)
15 return $\mathcal{P}$
Figure 3: Tracking results showing the effectiveness of particle filtering. The rows are, respectively, the tracking results with 1, 10 and 100 particles. Our method with 100 particles works well, while the others fail.
Equation 11, the L-M algorithm is employed and terminates after a maximum number of iterations (100). The number of particles $N$ is set to different values from {1, 10, 100} so as to evaluate the efficiency and effectiveness of particle filtering. The covariance matrix $\Sigma \in \mathbb{R}^{6\times 6}$ in Equation 8 is diagonal, with $diag(\Sigma) = (0.1, 0.1, 0.1, 0.1, 0.1, 0.1)$. In Equation 8, $\lambda_v$ is 0.1 and $\lambda_n$ is 1. $\sigma_s$ in Equation 9 is set to the number of 3D contour points (i.e. $|\Phi|$) to normalize the matching residual. The threshold $\tau$ of Equation 12 is set to 20.
5 EXPERIMENTS
We validate our method in several comparative experiments. First, we demonstrate the effectiveness of particle filtering and occlusion handling. Then we compare our method with the pixel-wise tracker PWP3D [18] and the state-of-the-art tracker GOS [24], which uses a single pose hypothesis. Finally, we adopt the marker-based tracker [10] as the baseline and compare our method against it for quantitative evaluation.
Our system is implemented in C++ and runs on an Intel i5 CPU with 8GB RAM. The test sequences are captured by a camera at 640×480 resolution, and both the object and the camera are movable.
Many edge-based methods assume that the motion of the camera or object is small and smooth, so that the prior pose is close to the global minimum. If large inter-frame motions occur, tracking fails and the estimated pose converges to a local minimum. In order to deal with large inter-frame motions, we employ 1, 10 and 100 particles respectively to track the object under fast camera or object motion. Figure 3 gives the comparison results for 1, 10 and 100 particles in the case of fast camera motion; with 100 particles our method tracks the BUNNY even under slight motion blur.
Most edge-based methods construct 3D-2D edge correspondences explicitly. When the target object is occluded, they find wrong edge correspondences, or even no edge correspondences at all. As described in Section 4.3, we employ the Tukey estimator to suppress the importance of occluded contour points. Our method can work if the occlusion is not
Figure 4: Tracking results showing the effectiveness of occlusion handling. The DUCK is tracked using 10 particles while it is occluded by a card or a hand.
Figure 5: Comparison of our method ($N = 100$) with PWP3D and GOS. The first row shows the tracking results of our method ($N = 100$). The second row shows the results of PWP3D; the tracking drifted due to fast camera motion and the white background. The third row shows the results of GOS; the edge correspondences of the white BUNNY are disturbed by white edges from the background.
very severe. Figure 4 shows that the DUCK is tracked successfully using 10 particles with occlusion handling while it is occluded by a card or a hand.
We compare our method with the pixel-wise tracker PWP3D [18]. PWP3D proposed a probabilistic framework for simultaneous 2D image segmentation and 3D object tracking without building 3D-2D correspondences explicitly. It employs the color statistics of foreground and background based on a prior pose, so tracking drifts when the target object has color statistics similar to those of the environment. Figure 5 compares the tracking results obtained by PWP3D (second row) with our method ($N = 100$, first row). PWP3D drifted due to the fast camera motion and the white background.
As a representative edge-based tracker, GOS [24] constructs 3D-2D edge correspondences explicitly. The image edge correspondences are determined by a 1D local search with a limited extent based on a prior pose. Although GOS exploits the region knowledge around each edge point and the affinity of adjacent image edge points, it still suffers from erroneous correspondences caused by similar background edges. Figure 5 compares the tracking results of GOS (third row) and our method ($N = 100$, first row). For the GOS tracker, the edge correspondences of the white BUNNY are disturbed by white edges from the background.
Table 1: Performance evaluation on 4 sequences. R for rotation error, T for translation error, and AD for average distance.

Seq(#)        Method                  Time(ms)  R(°)   T(cm)  AD(cm)
BUNNY (1657)  PWP3D                   79.8      201.8  10.2   11.9
              GOS                     148.6     5.7    1.0    0.9
              Our method (N = 1)      16.0      8.0    1.9    2.0
              Our method (N = 10)     90.7      2.9    1.6    1.5
              Our method (N = 100)    934.5     2.4    1.6    1.3
CAT (1427)    PWP3D                   90.0      208.8  6.0    7.9
              GOS                     154.7     1.8    1.3    1.3
              Our method (N = 1)      16.7      3.6    1.9    1.9
              Our method (N = 10)     110.6     1.5    2.0    2.0
              Our method (N = 100)    1015.8    1.3    1.9    1.9
DUCK (1677)   PWP3D                   73.5      189.2  7.8    9.1
              GOS                     143.8     75.8   3.2    3.5
              Our method (N = 1)      15.9      6.5    1.9    2.0
              Our method (N = 10)     104.5     6.5    1.8    1.9
              Our method (N = 100)    960.9     6.2    1.8    2.0
LEGO (1466)   PWP3D                   52.3      235.8  16.9   22.6
              GOS                     147.6     16.2   3.5    3.3
              Our method (N = 1)      15.9      9.3    1.5    1.9
              Our method (N = 10)     87.4      4.6    1.5    1.7
              Our method (N = 100)    830.7     4.4    1.3    1.7
AVG (1556)    PWP3D                   73.9      208.9  10.2   12.9
              GOS                     148.7     24.9   2.3    2.3
              Our method (N = 1)      16.1      6.9    1.8    2.0
              Our method (N = 10)     98.3      3.9    1.7    1.8
              Our method (N = 100)    935.5     3.6    1.7    1.7
In order to evaluate the tracking accuracy and time performance of our method, we adopt the marker-based tracking method [10] as the baseline. The coordinate system of the object is predefined and fixed relative to the marker, so the ground-truth pose of the object can be transformed from the marker. We captured 4 sequences, respectively for the BUNNY, CAT, DUCK and LEGO, with a hand-held camera, as Figure 6 shows. Table 1 gives the accuracy and time performance of our method. We use two criteria to evaluate the accuracy: rotation and translation error [21], and average distance [8]. The proposed method runs at 62 fps with 1 particle, 10 fps with 10 particles and 1 fps with 100 particles on average. For tracking accuracy, with 100 particles it achieves on average a rotation error of 3.6°, a translation error of 1.7 cm, and an average distance of 1.7 cm between all vertices of the 3D model in the estimated pose and in the ground-truth pose.
6 LIMITATIONS AND FUTURE WORK
Our method directly optimizes the pose parameters in the edge distance field, and thus depends on the edge map of the query image. When the color of the background is similar to that of the object, or severe motion blur occurs in the scene, we cannot obtain adequate contours of the target, and our method will fail because it exploits only contour information. In future work, we will consider the inner pixel information of the object to enhance tracking robustness.
For symmetrical objects, multiple poses may result in the same contour, which makes it ambiguous for our method to estimate the pose correctly. Moreover, the computational cost
Figure 6: Tracking results of our method ($N = 100$) for BUNNY, CAT, DUCK and LEGO.
is also a critical problem for fast object tracking using many particles. We will speed up the particle filtering using GPU techniques.
7 CONCLUSIONS
This paper proposes a monocular model-based 3D tracking approach for textureless objects. We minimize the holistic distance between the predicted object contour and the query image edges in the distance field via direct optimization of the 3D pose parameters. We derive the differentials of this energy with respect to the pose parameters, and search for the optimal pose parameters using the L-M algorithm. We employ a particle filtering framework to avoid being trapped in local minima. Occlusions are handled by a robust estimator. We demonstrated the effectiveness of our method in comparative experiments on real image sequences with occlusions, large motions and cluttered backgrounds.
ACKNOWLEDGMENTS
The authors gratefully thank the anonymous reviewers for their comments, which helped us improve the paper. This work is supported by the National Key Research and Development Program of China (No. 2016YFB1001501).
REFERENCES
[1] Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua. 2010. BRIEF: Binary Robust Independent Elementary Features. In European Conference on Computer Vision. 778–792.
[2] John Canny. 1986. A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8, 6 (1986), 679–698.
[3] Changhyun Choi and Henrik I. Christensen. 2012. 3D Textureless Object Detection and Tracking: An Edge-Based Approach. In International Conference on Intelligent Robots and Systems. 3877–3884.
[4] Changhyun Choi and Henrik I. Christensen. 2012. Robust 3D Visual Tracking using Particle Filtering on the Special Euclidean Group: A Combined Approach of Keypoint and Edge Features. International Journal of Robotics Research 33, 4 (2012), 498–519.
[5] Andrew I. Comport, Eric Marchand, and Francois Chaumette. 2003. A Real-Time Tracker for Markerless Augmented Reality. In IEEE International Symposium on Mixed and Augmented Reality. 36–45.
[6] Pedro F. Felzenszwalb and Daniel P. Huttenlocher. 2004. Distance Transforms of Sampled Functions. Theory of Computing 8, 19 (2004), 415–428.
[7] Chris Harris and Carl Stennett. 1990. RAPiD - A Video-Rate Object Tracker. In British Machine Vision Conference. 73–77.
[8] Stefan Hinterstoisser, Vincent Lepetit, Slobodan Ilic, Stefan Holzer, Gary Bradski, Kurt Konolige, and Nassir Navab. 2012. Model Based Training, Detection and Pose Estimation of Texture-less 3D Objects in Heavily Cluttered Scenes. In Asian Conference on Computer Vision. 548–562.
[9] Michael Isard and Andrew Blake. 1998. CONDENSATION - Conditional Density Propagation for Visual Tracking. In International Journal of Computer Vision. 5–28.
[10] Hirokazu Kato and Mark Billinghurst. 1999. Marker Tracking and HMD Calibration for a Video-Based Augmented Reality Conferencing System. In IEEE and ACM International Workshop on Augmented Reality. 85–94.
[11] Georg Klein and David W. Murray. 2006. Full-3D Edge Tracking with a Particle Filter. In British Machine Vision Conference. 1119–1128.
[12] Vincent Lepetit and Pascal Fua. 2005. Monocular Model-Based 3D Tracking of Rigid Objects: A Survey. Foundations and Trends in Computer Graphics and Vision 1, 1 (2005), 1–89.
[13] Manolis Lourakis and Xenophon Zabulis. 2013. Model-Based Pose Estimation for Rigid Objects. In International Conference on Computer Vision Systems. 83–92.
[14] David G. Lowe. 2004. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60, 2 (2004), 91–110.
[15] Eric Marchand, Patrick Bouthemy, and Francois Chaumette. 2001. A 2D-3D Model-Based Approach to Real-time Visual Tracking. Image and Vision Computing 19, 13 (2001), 941–955.
[16] Eric Marchand, Hideaki Uchiyama, and Fabien Spindler. 2015. Pose Estimation for Augmented Reality: A Hands-On Survey. IEEE Transactions on Visualization and Computer Graphics 22, 12 (2015), 2633–2651.
[17] Youngmin Park, Vincent Lepetit, and Woontack Woo. 2008. Multiple 3D Object Tracking for Augmented Reality. In IEEE International Symposium on Mixed and Augmented Reality. 117–120.
[18] Victor A. Prisacariu and Ian D. Reid. 2012. PWP3D: Real-Time Segmentation and Tracking of 3D Objects. International Journal of Computer Vision 98, 3 (2012), 335–354.
[19] Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 2011. ORB: An Efficient Alternative to SIFT or SURF. In IEEE International Conference on Computer Vision. 2564–2571.
[20] Byung Kuk Seo, Hanhoon Park, Jong Il Park, Stefan Hinterstoisser, and Slobodan Ilic. 2014. Optimal Local Searching for Fast and Robust Textureless 3D Object Tracking in Highly Cluttered Backgrounds. IEEE Transactions on Visualization and Computer Graphics 20, 1 (2014), 99–110.
[21] Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon. 2013. Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images. In IEEE Conference on Computer Vision and Pattern Recognition. 2930–2937.
[22] Luca Vacchetti, Vincent Lepetit, and Pascal Fua. 2004. Combining Edge and Texture Information for Real-Time Accurate 3D Camera Tracking. In IEEE International Symposium on Mixed and Augmented Reality. 48–56.
[23] Luca Vacchetti, Vincent Lepetit, and Pascal Fua. 2004. Stable Real-Time 3D Tracking using Online and Offline Information. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 10 (2004), 1385–1391.
[24] Guofeng Wang, Bin Wang, Fan Zhong, Xueying Qin, and Baoquan Chen. 2015. Global Optimal Searching for Textureless 3D Object Tracking. The Visual Computer 31, 6 (2015), 979–988.
[25] Harald Wuest, Florent Vial, and Didier Stricker. 2005. Adaptive Line Tracking with Multiple Hypotheses for Augmented Reality. In IEEE International Symposium on Mixed and Augmented Reality. 62–69.