In Defense of 3D-Label Stereo

Carl Olsson, Johannes Ulén, Yuri Boykov

Centre for Mathematical Sciences, Lund University, Sweden
Department of Computer Science, University of Western Ontario, Canada

Overview

It is commonly believed that higher order smoothness should be modeled using higher order interactions. For example, 2nd-order derivatives for deformable (active) contours are represented by triple cliques. Similarly, the 2nd-order regularization methods in stereo predominantly use MRF models with scalar (1D) disparity labels and triple clique interactions. In this paper we advocate a largely overlooked alternative approach to stereo where 2nd-order surface smoothness is represented by pairwise interactions with 3D-labels, e.g. tangent planes. This general paradigm has been criticized due to perceived computational complexity of optimization in higher-dimensional label space. Contrary to popular beliefs, we demonstrate that representing 2nd-order surface smoothness with 3D labels leads to simpler optimization problems with (nearly) submodular pairwise interactions. Our theoretical and experimental results demonstrate advantages over state-of-the-art methods for 2nd-order smoothness stereo.

Code available at http://www.maths.lth.se/~ulen/

Compared to other works

| Woodford et al. [1] | Li and Zucker [2] | This paper |
|---|---|---|
| 1D-labels (disparity/depth) | 3D-labels (tangent planes) | 3D-labels (tangent planes) |
| Triple cliques approximate 2nd-order derivatives | Local precomputed tangents 2nd-order regularization | Pairwise cliques approximate 2nd-order derivative |
| Reduction to QPBO fusion moves | Belief propagation | QPBO fusion moves |
| Hard QPBO problem | No solution guarantees | Submodularity properties lead to simpler problems |

• Generalization to higher order interactions

Background

We assign each pixel p a tangent plane. From the tangent planes it is straightforward to extract a corresponding disparity or depth estimate. The underlying energy function is optimized by performing fusion moves on proposed solutions (proposals).
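A fusion move lets every pixel decide, in a binary choice, whether to keep its current label or to take the label offered by the proposal. The sketch below illustrates the idea on a four-pixel scanline by exhaustive search. It is only an illustration under stated assumptions: the poster solves the binary problem with QPBO, whereas this example swaps in brute-force enumeration, and the names `fuse`, `unary` and `pairwise` as well as the toy energy are hypothetical.

```python
from itertools import product

def fuse(current, proposal, unary, pairwise, edges):
    """Return the lowest-energy per-pixel mix of `current` and `proposal`.

    Brute force over all binary selections, so only feasible for a handful
    of pixels. This stands in for QPBO, which is what the poster uses.
    """
    n = len(current)
    best_labels, best_energy = None, float("inf")
    for choice in product((0, 1), repeat=n):  # 0 = keep current, 1 = take proposal
        labels = [proposal[i] if c else current[i] for i, c in enumerate(choice)]
        energy = sum(unary(i, labels[i]) for i in range(n))
        energy += sum(pairwise(i, j, labels[i], labels[j]) for i, j in edges)
        if energy < best_energy:
            best_labels, best_energy = labels, energy
    return best_labels, best_energy

# Toy scanline with scalar labels (assumed data, not from the poster).
observed = [1.0, 1.0, 5.0, 5.0]
current = [1.0, 1.0, 1.0, 1.0]
proposal = [5.0, 5.0, 5.0, 5.0]
edges = [(0, 1), (1, 2), (2, 3)]
unary = lambda i, label: abs(label - observed[i])      # hypothetical data term
pairwise = lambda i, j, a, b: min(abs(a - b), 1.0)     # hypothetical truncated smoothness

fused, energy = fuse(current, proposal, unary, pairwise, edges)
print(fused, energy)  # [1.0, 1.0, 5.0, 5.0] 1.0
```

Because keeping all current labels and taking all proposal labels are both among the enumerated selections, the fused energy can never exceed the energy of either input.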

Definition 1. Let $D(p)$ be the disparity at pixel $p$. Furthermore, let $T_pD : I \to \mathbb{R}$ define the tangent at the point $p$ seen as a function of the whole image, that is

$$T_pD(x) = D(p) + \nabla D(p)^T (x - p). \tag{1}$$

We define a regularization between neighboring pixels as

$$V_{pq} = |T_pD(q) - D(q)|. \tag{2}$$
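Equations (1) and (2) can be evaluated directly once the disparity and its gradient at p are known. The following is a minimal sketch under assumptions: the function names and the two test surfaces (a plane and a paraboloid with hand-derived gradients) are hypothetical and serve only to show that the term vanishes on planar surfaces and is positive on curved ones.

```python
def tangent_value(d_p, grad_p, p, x):
    """Evaluate T_p D(x) = D(p) + grad D(p)^T (x - p), equation (1)."""
    return d_p + grad_p[0] * (x[0] - p[0]) + grad_p[1] * (x[1] - p[1])

def v_pq(d_p, grad_p, p, d_q, q):
    """Regularization V_pq = |T_p D(q) - D(q)|, equation (2)."""
    return abs(tangent_value(d_p, grad_p, p, q) - d_q)

# Hypothetical disparity surfaces (assumptions for illustration only).
plane = lambda x, y: 2.0 * x - y + 3.0   # gradient is (2, -1) everywhere
bowl = lambda x, y: x * x + y * y        # gradient is (2x, 2y)

p, q = (1.0, 2.0), (2.0, 2.0)            # horizontal neighbours

v_plane = v_pq(plane(*p), (2.0, -1.0), p, plane(*q), q)
v_bowl = v_pq(bowl(*p), (2.0, 4.0), p, bowl(*q), q)
print(v_plane)  # 0.0, a plane coincides with its own tangent
print(v_bowl)   # 1.0, the paraboloid curves away from its tangent
```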

$V_{pq}$ measures how far the disparity at $q$ deviates from the tangent plane at $p$. Using the Taylor expansion

$$D(q) \approx D(p) + \nabla D(p)^T (q - p) + \tfrac{1}{2} (q - p)^T \nabla^2 D(p)\, (q - p), \tag{3}$$

where $\nabla^2 D(p)$ is the Hessian at $p$, we see that

$$V_{pq} \approx \left| \tfrac{1}{2} (q - p)^T \nabla^2 D(p)\, (q - p) \right|. \tag{4}$$

That is, $V_{pq}$ measures the second derivative at $p$ in the direction $q - p$ of the underlying disparity function.
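For a quadratic disparity function the Taylor expansion (3) is exact, so (4) holds with equality and can be checked numerically. The sketch below does this for one hypothetical quadratic; the coefficients `c`, `g`, `H` and all function names are assumptions chosen for illustration.

```python
def quad(c, g, H, x):
    """Quadratic disparity D(x) = c + g^T x + 0.5 * x^T H x, with H symmetric."""
    hx = (H[0][0] * x[0] + H[0][1] * x[1], H[1][0] * x[0] + H[1][1] * x[1])
    return c + g[0] * x[0] + g[1] * x[1] + 0.5 * (x[0] * hx[0] + x[1] * hx[1])

def quad_grad(g, H, x):
    """Gradient of the quadratic: g + H x."""
    return (g[0] + H[0][0] * x[0] + H[0][1] * x[1],
            g[1] + H[1][0] * x[0] + H[1][1] * x[1])

def v_pq(c, g, H, p, q):
    """Left-hand side of (4): |T_p D(q) - D(q)|."""
    grad = quad_grad(g, H, p)
    tangent = quad(c, g, H, p) + grad[0] * (q[0] - p[0]) + grad[1] * (q[1] - p[1])
    return abs(tangent - quad(c, g, H, q))

def second_derivative_term(H, p, q):
    """Right-hand side of (4): |0.5 * (q - p)^T H (q - p)|."""
    d = (q[0] - p[0], q[1] - p[1])
    return abs(0.5 * (d[0] * (H[0][0] * d[0] + H[0][1] * d[1])
                      + d[1] * (H[1][0] * d[0] + H[1][1] * d[1])))

# Hypothetical coefficients (assumptions).
c, g, H = 1.0, (0.5, -0.25), ((2.0, 1.0), (1.0, 4.0))
p = (3.0, 5.0)
for q in [(4.0, 5.0), (3.0, 6.0)]:   # right and lower neighbour of p
    print(q, v_pq(c, g, H, p, q), second_derivative_term(H, p, q))
```

For the horizontal neighbour both sides equal half of `H[0][0]`, and for the vertical neighbour half of `H[1][1]`, which is the directional second derivative that (4) refers to.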

Fig. 1: Left, rectified cameras: geometric interpretation of the smoothness term for parallel viewing rays. Right, regular cameras: smoothness term when the viewing rays are not parallel.

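To make the energy discontinuity preserving we add a threshold $t$ to the interaction,

$$E_{pq}(D, P) := \min(V_{pq}(D, P), t). \tag{5}$$

Here the two arguments name the solutions that supply the labels of $p$ and $q$ respectively, so that $V_{pq}(D, P) = |T_pD(q) - P(q)|$.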
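With 3D labels the truncated interaction in (5) needs only the two plane labels and the position of q. The sketch below assumes, for illustration, that a label is a triple `(a, b, c)` describing the disparity plane d(x, y) = a·x + b·y + c, so the tangent of p evaluated at q is simply p's plane evaluated at q. The parameterization, the function names and the threshold value are assumptions, not taken from the poster.

```python
def plane_disparity(label, x):
    """Disparity of the plane label (a, b, c) at image position x = (x, y)."""
    a, b, c = label
    return a * x[0] + b * x[1] + c

def v_pq(label_p, label_q, q):
    """|T_p(q) - D(q)|: p's plane extrapolated to q versus q's own disparity."""
    return abs(plane_disparity(label_p, q) - plane_disparity(label_q, q))

def e_pq(label_p, label_q, q, t):
    """Truncated interaction of equation (5)."""
    return min(v_pq(label_p, label_q, q), t)

q = (11.0, 20.0)
t = 1.5  # hypothetical threshold

same_plane = e_pq((0.5, 0.0, 2.0), (0.5, 0.0, 2.0), q, t)
small_bend = e_pq((0.5, 0.0, 2.0), (0.25, 0.0, 5.25), q, t)
depth_jump = e_pq((0.0, 0.0, 2.0), (0.0, 0.0, 10.0), q, t)
print(same_plane, small_bend, depth_jump)  # 0.0 0.5 1.5
```

The last case shows the effect of the threshold: a large jump costs no more than `t`, which is what keeps the energy from smoothing over depth discontinuities.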

Theoretical results

Proposition 2. If the proposal $P$ is a plane then the fusion with any function $D$ is a submodular move for both $E_{pq}$ and $V_{pq}$.

Proof. Since $P$ is a plane we have

$$T_pP(q) = P(q) \tag{6}$$

and therefore $V_{pq}(P, P) = 0$. Furthermore,

$$\begin{aligned}
V_{pq}(D, D) &= |T_pD(q) - D(q)| && (7)\\
&= |T_pD(q) - P(q) + T_pP(q) - D(q)| && (8)\\
&\le |T_pD(q) - P(q)| + |T_pP(q) - D(q)| && (9)\\
&= V_{pq}(D, P) + V_{pq}(P, D) && (10)
\end{aligned}$$

which shows that submodularity,

$$V_{pq}(D, D) + V_{pq}(P, P) \le V_{pq}(P, D) + V_{pq}(D, P), \tag{11}$$

holds. The proof for $E_{pq}$ is given in the paper. $\square$
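Inequality (11) can also be probed numerically. The sketch below draws random values for D(p), its gradient at p and D(q), pairs them with random planar proposals, and records the smallest value of V(P,D) + V(D,P) − V(D,D) − V(P,P), using V_pq(A, B) = |T_pA(q) − B(q)| as in steps (9) and (10). The sampling ranges, pixel positions and function names are assumptions; a small negative tolerance would only reflect floating-point rounding.

```python
import random

def tangent_at(value_p, grad_p, p, q):
    """T_p A(q) = A(p) + grad A(p)^T (q - p)."""
    return value_p + grad_p[0] * (q[0] - p[0]) + grad_p[1] * (q[1] - p[1])

def submodular_gap(d_p, d_grad_p, d_q, plane, p, q):
    """V(P,D) + V(D,P) - V(D,D) - V(P,P); non-negative means (11) holds."""
    a, b, c = plane
    proposal = lambda x: a * x[0] + b * x[1] + c
    t_d = tangent_at(d_p, d_grad_p, p, q)         # T_p D(q)
    t_p = tangent_at(proposal(p), (a, b), p, q)   # T_p P(q), equals P(q) up to rounding
    v_dd = abs(t_d - d_q)
    v_pp = abs(t_p - proposal(q))
    v_dp = abs(t_d - proposal(q))
    v_pd = abs(t_p - d_q)
    return v_pd + v_dp - v_dd - v_pp

rng = random.Random(0)                # fixed seed so the experiment is repeatable
p, q = (4.0, 7.0), (5.0, 7.0)         # hypothetical neighbouring pixels
worst_gap = min(
    submodular_gap(
        rng.uniform(-10, 10),                        # D(p)
        (rng.uniform(-3, 3), rng.uniform(-3, 3)),    # grad D(p)
        rng.uniform(-10, 10),                        # D(q)
        (rng.uniform(-3, 3), rng.uniform(-3, 3), rng.uniform(-10, 10)),  # plane P
        p, q,
    )
    for _ in range(10_000)
)
print("smallest gap over all trials:", worst_gap)
```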

Proposition 3. If both $D$ and $P$ are convex (or alternatively both concave) between $p$ and $q$ then the interactions $V_{pq}$ and $V_{qp}$ are submodular for the fusion move.
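A one-dimensional experiment illustrates Proposition 3. The sketch below takes random convex quadratics for D and P, evaluates the submodularity gap in both directions (V_pq and V_qp), and then shows one convex/concave pair for which the gap is negative. The restriction to 1D quadratics, the sampling ranges and the function names are assumptions made to keep the example short.

```python
import random

def quadratic(a, b, c):
    """Return (f, f') for f(x) = a*x^2 + b*x + c; convex when a >= 0."""
    f = lambda x: a * x * x + b * x + c
    df = lambda x: 2 * a * x + b
    return f, df

def v(first, second, p, q):
    """V_pq(A, B) = |T_p A(q) - B(q)|: p takes its tangent from A, q its value from B."""
    (f, df), (g, _) = first, second
    return abs(f(p) + df(p) * (q - p) - g(q))

def gap(D, P, p, q):
    """V(D,P) + V(P,D) - V(D,D) - V(P,P); non-negative means submodular."""
    return v(D, P, p, q) + v(P, D, p, q) - v(D, D, p, q) - v(P, P, p, q)

rng = random.Random(1)  # fixed seed so the experiment is repeatable
worst_gap = float("inf")
for _ in range(5000):
    D = quadratic(rng.uniform(0, 3), rng.uniform(-5, 5), rng.uniform(-5, 5))
    P = quadratic(rng.uniform(0, 3), rng.uniform(-5, 5), rng.uniform(-5, 5))
    p = float(rng.randint(0, 10))
    q = p + 1.0
    # Check both V_pq (tangent at p) and V_qp (tangent at q).
    worst_gap = min(worst_gap, gap(D, P, p, q), gap(D, P, q, p))

# One convex and one concave function: D(x) = x^2, P(x) = 1 - x^2.
mixed_gap = gap(quadratic(1.0, 0.0, 0.0), quadratic(-1.0, 0.0, 1.0), 0.0, 1.0)
print(worst_gap, mixed_gap)
```

In the mixed case each function's tangent at p happens to hit the other function's value at q, so both cross terms vanish while V(D,D) = V(P,P) = 1, giving a gap of −2.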

Generalization to higher dimensional labels

| Label | Pairwise interaction | Unary term | Submodular proposals |
|---|---|---|---|
| Depth | 1st derivative | Depth | Constant functions |
| Tangent plane | 2nd derivative | Depth, 1st derivative | Constant 1st derivative |
| 2nd-order approximation | 3rd derivative | Depth, 1st, 2nd derivative | Constant 2nd derivative |
| ... | ... | ... | ... |

Fig. 2: Characterization of pairwise interactions, unary terms and submodular proposals for different types of labels.

Results

Fig. 3: Result using regular cameras, picture of Skansen Lejonet in Gothenburg, Sweden. Panels: image, only data term, with regularization.

Fig. 4: Result using regular cameras, picture of Örebro castle, Sweden. Panels: image, only data term, with regularization.

Fig. 5: Result using rectified cameras. Panels: (a) Image, (b) Our, (c) Woodford, (d) Woodford 1op, (e) Ground truth, (f) Our unlabelled, (g) Woodford unlabelled, (h) Woodford 1op unlabelled. (b-d) are estimated disparity maps after fusing the 14 SegPln proposals. In (f-h) we present the unlabelled variables summed over all 14 proposals, scaled 0–14. A white pixel would mean that fusing a proposal for this pixel failed for every single proposal.

| | Tsukuba | Venus | Teddy | Cones |
|---|---|---|---|---|
| Our | 0.065 % | 0.0264 % | 0.127 % | 0.0847 % |
| Woodford | 30.0 % | 30.6 % | 27.6 % | 27.3 % |
| Woodford 1op | 0 % | 0 % | 0 % | 0.0411 % |

Fig. 6: Unlabelled variables for the 14 SegPln proposals on Middlebury.

| | Tsukuba | Venus | Teddy | Cones | Average |
|---|---|---|---|---|---|
| Our | 21.3 | 25.5 | 29.4 | 36.5 | 28.2 |
| Woodford | 106 | 139 | 143 | 181 | 142 |
| Woodford / Our | 4.96 | 5.47 | 4.87 | 4.96 | 5.07 |

Fig. 7: Running time (s) using the convergence criteria in Woodford [1].

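| Dataset | Class | Our | Woodford |
|---|---|---|---|
| Tsukuba | Non occ | 4.49 | 4.83 |
| Tsukuba | All | 5.52 | 5.99 |
| Tsukuba | Disc | 12.3 | 13.9 |
| Venus | Non occ | 0.298 | 0.536 |
| Venus | All | 0.648 | 0.921 |
| Venus | Disc | 3.99 | 6.39 |
| Teddy | Non occ | 7.71 | 8.16 |
| Teddy | All | 11.2 | 11.8 |
| Teddy | Disc | 17.8 | 19.3 |
| Cones | Non occ | 9.78 | 9.74 |
| Cones | All | 15.4 | 15.6 |
| Cones | Disc | 18.3 | 18.4 |
| Average | | 8.95 | 9.63 |

Fig. 8: Scores on Middlebury using the same proposals, lower is better. All values are the percentage of pixels that are ≥ 1 pixel incorrect, for each of the three classes. The classes are non-occluded regions (Non occ), all pixels (All) and regions near depth discontinuities (Disc).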
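The Average column of Fig. 8 appears to be the plain mean of the twelve per-class scores. This is a reading of the table rather than something the poster states; the short check below reproduces the two reported averages from the listed values under that assumption.

```python
# Scores copied from Fig. 8, ordered Tsukuba, Venus, Teddy, Cones,
# each as (Non occ, All, Disc).
scores = {
    "Our": [4.49, 5.52, 12.3, 0.298, 0.648, 3.99,
            7.71, 11.2, 17.8, 9.78, 15.4, 18.3],
    "Woodford": [4.83, 5.99, 13.9, 0.536, 0.921, 6.39,
                 8.16, 11.8, 19.3, 9.74, 15.6, 18.4],
}

# Assumption: the Average column is the unweighted mean of all twelve entries.
averages = {name: sum(values) / len(values) for name, values in scores.items()}

for name, average in averages.items():
    print(f"{name}: {average:.2f}")
# Our: 8.95
# Woodford: 9.63
```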

References

[1] O. Woodford, P. Torr, I. Reid, and A. Fitzgibbon, “Global stereo reconstruction under second order smoothness priors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.

[2] G. Li and S. Zucker, “Differential geometric inference in surface stereo,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 72–86, 2010.

IEEE Conference on Computer Vision and Pattern Recognition, Portland, 2013