
2044 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 5, MAY 2013

Constrained Optical Flow Estimation as a Matching Problem

Mikhail G. Mozerov, Member, IEEE

Abstract— In general, discretization in the motion vector domain yields an intractable number of labels. In this paper, we propose an approach that reduces general optical flow to a constrained matching problem by pre-estimating a 2-D disparity labeling map of the desired discrete motion vector function. One of the goals of this paper is to estimate a coarse distribution of motion vectors and then utilize this distribution as a global constraint for discrete optical flow estimation. This pre-estimation is done with a simple frame-to-frame correlation technique also known as the digital symmetric-phase-only-filter (SPOF). We discover a strong correlation between the output of the SPOF and the motion vector distribution of the related optical flow. A two-step matching paradigm for optical flow estimation is applied: pixel-accuracy (integer flow) and subpixel-accuracy estimation. The matching problem is solved by global optimization. Experiments on the Middlebury optical flow datasets confirm our intuitive assumption of a strong correlation between the motion vector distribution of optical flow and the maximal peaks of the SPOF output. The overall performance of the proposed method is promising and achieves state-of-the-art results on the Middlebury benchmark.

Index Terms— Digital symmetric-phase-only-filter (SPOF), discrete energy minimization, optical flow estimation.

I. INTRODUCTION

RECENTLY proposed optical flow (OF) estimation methods [1]–[11] show an impressive level of accuracy in terms of the criteria proposed in [12]. However, this progress is achieved at the cost of a significant complication of the optical flow estimation paradigm, which is now far from the original formulations of Horn and Schunck [13] or Lucas and Kanade [14]. As an example, the general concept behind the method presented in [8], which ranks at the top of the Middlebury optical flow benchmark, is to mix the best OF estimation techniques in such a manner that the final estimation scheme is a tradeoff between the advantages and drawbacks of each individual approach. It seems that this trend is the only way to achieve state-of-the-art results in OF estimation. Results on the epipolar-constrained stereo matching problem seem to have reached their natural limits since global optimization methods were proposed. One of the papers [15]–[17] that rank at the top of the Middlebury stereo benchmark was published in 2006 [15].

Manuscript received July 10, 2012; revised January 22, 2013; accepted January 24, 2013. Date of publication January 30, 2013; date of current version March 19, 2013. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Anthony Vetro.

The author is with the Universitat Autonoma de Barcelona, Department of Computer Vision Center, Barcelona 08193, Spain (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2013.2244221

And most of the top methods are somehow derivatives of the MRF energy minimization approach proposed in [18] and effectively solved in [19].

Stereo matching as well as optical flow estimation aims to recover a dense map of displacement vectors, which establishes the global correspondence between every pixel of two considered images. Stereo matching techniques assume the epipolar constraint, which makes the problem feasible for discrete MRF optimization. In contrast, even the integer discretization in the motion vector domain for optical flow yields considerably more labels due to the 2D nature of the domain, thereby significantly increasing the computational complexity of the estimation.

These observations motivate us to search for a mechanism that can sufficiently restrict the initial search space for the integer OF (pixel accuracy level). If we are able to obtain a reasonable number of dominant motion vectors of a sequence by pre-estimating a motion vector distribution, the OF estimation problem can be reduced to a constrained matching task similar to stereo matching.

Our research is motivated by work in the field of pattern localization with the symmetric-phase-only-filter (SPOF) [20], [21]. If we convolve two consecutive frames of a sequence with the SPOF, we can obtain the global motion (a single shift vector for all pixels in the image plane). This observation implies that if there are several regions in a scene, each possessing a different shift vector relative to the reference frame, there will be the same number of peaks in the output domain of the SPOF between two consecutive frames.

It is important to understand the difference between our technique and conventional correlation-based methods [22], [23], which were popular in stereo matching. The core of these methods is sliding-window matching: the desired displacement vector value is obtained at each point (or at a sparse set of points) of the image plane by choosing one maximal peak in the output of a cross-correlation filter. The problem of this technique is over-smoothing and the possible loss of several values in the integer displacement space. Application of the SPOF can improve the robustness of OF estimation, due to the sharpness of the filter output peaks [24]. However, the main concept remains within the general framework of correlation-based methods. In contrast, our constraint pre-estimation method obtains all non-zero values of the distribution in the integer OF vector space directly, by taking a reasonable number of maximal peaks in the output of the SPOF. In other words, one can get the desired distribution without estimating OF in the image plane. Most modern OF algorithms use an image pyramid and in this way cannot deal with large motion

1057-7149/$31.00 © 2013 IEEE



differences that exceed the size of the object. Thus, one of the advantages of the SPOF application is that the method has no limit on the range of the estimated displacement vectors: in principle, any value that does not exceed the size of a frame can be detected. Hence, the initial idea is to find the constraint parameters globally by convolving two consecutive frames of a video sequence. The number of maximal peaks to be chosen is an open problem, and our final constraint pre-estimation method is a tradeoff between global and local SPOF application.

In Fig. 1 an overview of the optical flow method proposed in this paper is given. The main idea of our paper is that we reduce the OF estimation problem to a labeling problem which can be solved accurately by global optimization during each step of the estimation. This is achieved as follows. First, the 2D disparity labeling map is pre-estimated. Then we estimate a dense integer OF field using the stereo disparity matching paradigm. We also introduce a multidirectional matching approach to improve occlusion handling. Further subpixel accuracy is achieved by the same global optimization approach, with different levels of accuracy tuning. During each step of the coarse-to-fine procedure our algorithm chooses one of nine corrective vectors (labels) by optimizing an MRF under the global smoothness constraint. Finally, a cascade of post-processing filters is applied to achieve higher accuracy in OF estimation. Note that the general scheme of the constrained optical flow matching (COFM) in Fig. 1 is a combination of three successive problems that can be solved in separate modules: constraints pre-estimation; integer OF estimation; subpixel accuracy OF estimation.

The paper is organized as follows: in Section II the related work is discussed; in Section III the method of disparity labeling map estimation with the SPOF is explained; in Section IV the COFM problem is formulated and the general scheme of the algorithm is described; computer experiments are discussed in Section V; and Section VI summarizes our conclusions.

II. RELATED WORK

Several methods that use discrete optimization algorithms for OF estimation have been reported in recent years [2], [25]–[27]. In [25] the quantization problem is solved by computing candidate flow fields using the Horn and Schunck [13] and Lucas and Kanade [14] approaches, and in [27] SIFT matching is used for the same purpose. Note that in both papers constraint estimation is done by known optical flow estimation techniques, and these techniques involve the pyramidal multi-resolution approach (with a decreasing number of image grid nodes on each level), which potentially leads to the loss of some significant motion vectors.

In [26] the multi-resolution approach is combined with OF estimation on a sparse set of points, which allows the algorithm to start from the full range of the search space and then successively restrict the initial search space to a reasonable number of dominant motion vectors through the multi-resolution levels. A similar approach is proposed in [28], but in that work a more flexible region-tree scaling technique for different accuracy levels is proposed. Both approaches may

suffer from the loss of significant dominant motion vectors. For example, application of the method of [26] fails to detect important values of the integer OF for some test sequences of the Middlebury benchmark (e.g., Urban). In contrast, our constraint estimation algorithm does not use the multi-resolution approach for the integer flow estimation. Also, our algorithm estimates the constraints globally (previous works use local constraints or a non-constrained search space). Global constraints make the problem definition clearer, stricter, and more flexible, as in stereo matching. Application of the SPOF for global constraint estimation can be considered a novelty: previous works use the SPOF for local OF estimation, which severely increases computational complexity and decreases accuracy.

Another problem of discrete optical flow estimation is the submodularity constraint on the regularization term. This difficulty especially arises in coarse-to-fine optical flow estimation, where the same neighboring labels map to different motion vectors. For example, the method of [26] achieves subpixel accuracy by using the discrete optical flow estimation approach. However, the global energy minimization algorithm used in [26] does not allow non-submodular prior matrices. To overcome this difficulty the authors propose a morphing algorithm [29]: on each level of accuracy the algorithm transforms the initial image and a temporary flow into a motion-interpolated image and then matches it with the target frame. To obtain the final OF the method performs multiple flow unwrappings with multiple interpolations. Thus, this approach complicates the calculation scheme and limits the achievable accuracy. In contrast, our subpixel accuracy module uses the sequential tree-reweighted max-product message passing (TRW-S) optimization [30], which allows non-submodular matrices (as long as they formally belong to a part of the general submodular matrix). Thus all corrections are directly accumulated in every pixel of the initial grid. Consequently, this module can work with accuracy up to 0.01 pixel, and it works so well that the final post-processing step only polishes the solution and has much less impact than in many other methods; see for example [8] and Table II.

III. DISPARITY LABELING MAP PRE-ESTIMATION

The SPOF is a method for rigid image registration that exploits the Fourier transform. We chose this approach because the SPOF algorithm is fast and accurate [21].

Consider two X1 × X2 images, f1(x1, x2) and f2(x1, x2). Let F1(k1, k2) and F2(k1, k2) be the 2D discrete Fourier transforms of the two images. In the definition of the POF, which uses the spectral phase of f2(x1, x2) as the filter's transfer function, the output r_SPOF(x1, x2) is the 2D inverse discrete Fourier transform of R_SPOF(k1, k2), which is given by

R_SPOF(k1, k2) = F1(k1, k2) F2*(k1, k2) / (|F1(k1, k2)| |F2(k1, k2)|),   (1)

where F2*(k1, k2) is the complex conjugate of F2(k1, k2). Let X = {X1/2, X2/2} be the image half-size vector and x = {x1, x2} the image coordinate vector. Also suppose that the two compared images are shifted one relative to another by a constant motion vector v such that

f1(x) = f2(x + v).   (2)


Fig. 1. General scheme of the COFM method illustrated by resultant flow images of the Teddy motion sequence. [Figure: the input frames I_t and I_{t+1} feed three successive modules (constraints pre-estimation, integer OF estimation, subpixel accuracy), each driven by MRF energy minimization; intermediate results include the label set {v^l}, the confidence map η(x), and the flow fields v(x).]

Then a sharp peak appears at the coordinate x_max, and the shift vector v is equal to

v = mod(x_max + X, 2X) − X,   x_max = arg max_x (r_SPOF(x)).   (3)

The operator mod() in (3) is the modulo operation with the divisor 2X.
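As an illustration, the SPOF surface of (1) and the peak-to-vector mapping of (3) can be sketched in a few lines of NumPy. The `eps` guard and the sign convention of the recovered vector are implementation assumptions: with NumPy's DFT conventions, the code below recovers the circular shift `s` such that `f1 == np.roll(f2, s)`.

```python
import numpy as np

def spof_surface(f1, f2, eps=1e-12):
    """SPOF output r_SPOF(x) of two equally sized images, per (1)."""
    F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
    R = F1 * np.conj(F2) / (np.abs(F1) * np.abs(F2) + eps)  # eps guards division by zero
    return np.real(np.fft.ifft2(R))

def global_shift(f1, f2):
    """Locate the maximal peak and wrap it into [-X, X) as in (3)."""
    r = spof_surface(f1, f2)
    x_max = np.array(np.unravel_index(np.argmax(r), r.shape))
    X = np.array(r.shape) // 2                  # half-size vector X
    return (x_max + X) % (2 * X) - X            # v = mod(x_max + X, 2X) - X

# toy check: a circular shift of a random image is recovered exactly
rng = np.random.default_rng(0)
f2 = rng.random((64, 64))
f1 = np.roll(f2, shift=(3, -5), axis=(0, 1))
print(global_shift(f1, f2))                     # -> [ 3 -5]
```

The wrap in the last line of `global_shift` is what lets negative shifts survive the periodicity of the DFT output grid.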

Suppose that the first image f1(x) is composed of L regions f1^l(x^l), each with its own shift vector v^l such that

f1^l(x^l) = f2^l(x^l + v^l).   (4)

In this case the output function of the SPOF includes L peaks, which can be detected as follows:

v^l = mod(x_max^l + X, 2X) − X,   x_max^l = arg max_x^l (r_SPOF(x)),   (5)

where the arg max^l function returns the coordinate x_max^l of the l-th maximum of the function r_SPOF(x). The operator mod() in (5) is the modulo operation with the divisor 2X. Note that our coarse heuristic model, formalized in (5), does not take occlusion into account; the impact of occlusion is discussed in more detail later in this section. The set of vectors {v^l} in (5) is considered as a constraint for the matching problem.

For the SPOF application, usually the sum of the color values is taken to encode the input image function f(x):

f(x) = ( Σ_{c ∈ R,G,B} I_c(x) ) exp(iϕ),   (6)

where I_c(x) are the pixel-wise values of the image color channels and the phase variable ϕ is supposed to be zero. Thus the color information is lost. We propose a heuristic encoding that preserves the color information without increasing the computational complexity of the approach. In our algorithm the phase depends on the color as follows:

ϕ = arg(I_G − I_R, I_G − I_B).   (7)

In other words, the phase is equal to the argument of the complex number z = I_G − I_R + i(I_G − I_B). Thereby an arbitrary color vector can be mapped onto the complex plane. To avoid uncertainty we define arg(0, 0) = 0. Consequently, for a grayscale image the phase value automatically becomes zero. The proposed encoding decreases the number of false positives for images with rich color texture, and thus potentially decreases the computational time of the OF estimation process. The experimental difference between the color and the grayscale encoding is shown in Table I; compare the columns "False Detections" and "False Detections for the Grayscale Version of Images" (FDGS).

For accuracy reasons, the output function r_SPOF(x) is averaged by a Gaussian filter with an averaging radius equal to one pixel.

The number L is an open problem. We did not find a strong correlation between the number of ground truth (GT) labels and the statistical characteristics of the SPOF output function r_SPOF(x). However, we found a solution that makes the distribution map estimation adaptive: the image plane is partitioned into S overlapping regions (of size 128 × 128, with an overlapping period of 32), and the first m = 65 maxima of every local SPOF output are taken. Then the general distribution for the full frame is accumulated, and the 10% least repeated values are truncated. In this case any set of the GT discrete vectors belongs to the set of pre-estimated motion vectors, {v^l}_GT ⊂ {v^l}_PE, for the Middlebury datasets. For example, the resultant disparity labeling maps for the two test sequences Urban-2 and Grove-3 together with the GT are shown in Fig. 2. These are two problematic sequences with a large range of disparity vectors: L = 65 for the Urban-2 sequence and L = 106 for the Grove-3 sequence. In Fig. 2 the colored bins represent the detected GT vectors, and the gray bins represent


Fig. 2. Disparity labeling maps and the GT. (a) GT of the Urban-2 motion sequence and its mutual disparity labeling map. (b) GT of the Grove-3 motion sequence and its mutual disparity labeling map. The colored bins represent the detected GT vectors; the gray bins represent the false detected vectors. See Fig. 3 for the color coding of the flow.

the false detected vectors. More detailed statistics are given in Table I.
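The adaptive tiled accumulation described above can be sketched as follows. The tile size, step, m = 65, and the 10% truncation follow the text; the helper names, the use of the m largest values as a proxy for the m highest peaks, and the small test parameters are illustrative assumptions of this sketch.

```python
import numpy as np
from collections import Counter

def spof_surface(f1, f2, eps=1e-12):
    """SPOF output per (1); eps guards division by zero."""
    F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
    return np.real(np.fft.ifft2(F1 * np.conj(F2) / (np.abs(F1) * np.abs(F2) + eps)))

def wrap(idx, shape):
    """Wrap a peak coordinate into [-X, X) as in (5)."""
    X = np.array(shape) // 2
    return tuple(int(c) for c in (np.array(idx) + X) % (2 * X) - X)

def candidate_labels(f1, f2, tile=128, step=32, m=65, keep=0.9):
    """Accumulate the m largest SPOF values over overlapping tiles, then keep
    the most repeated candidate vectors (keep=0.9 truncates the 10% least
    repeated), giving the pre-estimated label set {v_l}."""
    counts = Counter()
    H, W = f1.shape
    for y in range(0, H - tile + 1, step):
        for x in range(0, W - tile + 1, step):
            r = spof_surface(f1[y:y + tile, x:x + tile], f2[y:y + tile, x:x + tile])
            for idx in np.argsort(r, axis=None)[::-1][:m]:   # m largest values
                counts[wrap(np.unravel_index(idx, r.shape), r.shape)] += 1
    ranked = [v for v, _ in counts.most_common()]
    return set(ranked[:max(1, round(len(ranked) * keep))])

# small synthetic check: a (2, 3) shift survives tiling and truncation
rng = np.random.default_rng(1)
f2 = rng.random((64, 64))
f1 = np.roll(f2, shift=(2, 3), axis=(0, 1))
labels = candidate_labels(f1, f2, tile=32, step=16, m=5)
print((2, 3) in labels)  # True
```

Because the true shift repeats across tiles while spurious peaks do not, ranking by repetition count is what makes the 10% truncation safe.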

To illustrate our experimental results we use the HSV color wheel proposed in [12]. The color coding is depicted in Fig. 3.

Our algorithm considerably reduces the initial search space. However, the new space is still redundant due to false detections, especially for sequences with a small range of motion vectors, which means that some labels are useless. On the other hand, the proposed multidirectional matching approach assumes that the pre-defined label space structure is used at least three times: to estimate the forward and backward flows and for the final integer flow merging procedure. The forward integer flow estimation is able to detect unused labels, so a further reduction of the label space is possible. Thus the variable L in our method possesses two values: L0, the number of labels after the SPOF pre-estimation, and L1, the reduced number of labels after the forward integer flow estimation. The results of the secondary reduction of the label space are given in Table I as the numbers in parentheses, and the symbols L0, L1 are also introduced in Figs. 7-8.

The presence of large occlusions in the considered scene is a real problem for any matching technique, because several regions have no visual correspondence. If occluded patches are relatively large, they can produce false detections in the output of the SPOF, raising the redundancy of the solution search space and consequently increasing the computational complexity of the estimation process. Nevertheless, our experiments with the Middlebury OF datasets show that there are no lost bins even in the presence of large occlusions (urban sequences); thus we can conclude that, at least for the considered datasets, the constraints pre-estimation module does not increase OF estimation errors due to occlusion.

It is interesting to see how the SPOF pre-estimation reacts to large patches with multiple or even continuously changing motions. The simplest model of such a motion field is an image zoom. Several experiments were designed to test the algorithm under this model, and they show reasonable pre-estimation results up to a 15% zoom, meaning that the algorithm does not lose the GT vectors in the pre-estimation process. Fig. 3 illustrates the result of the labeling map pre-estimation for a synthetic sequence with 10% zoom. This particular sequence is generated on the basis of the RubberWhale image and possesses a large range of GT disparity vectors: L = 513. In Fig. 3 the colored bins represent the detected GT vectors, and the gray bins represent the false detected vectors. Note that in this experiment we get only 198 false positives, and the relative detection redundancy is even lower than for the Middlebury OF datasets; see Table I.

In this paper we aim to solve the problem of constrained OF estimation. The experiments with the Middlebury datasets confirm our intuitive assumption that the integer OF space can be restricted to a reasonable number of non-zero bins. Actually, for all the 24 sets the number of labels does not exceed 300. An open question is whether the same approach can be applied in the case where all vectors are equally distributed or there is no restriction of the search space. The image frame domain might then be partitioned into several overlapping regions with a reasonable number of non-zero possible vectors, and OF could be estimated for each region independently.

IV. PROBLEM DEFINITION AND SOLUTION

In this paper the general OF estimation problem is branched into two levels: the integer OF estimation level and the subpixel adjustment level. Let us define the symbols v(x), v̄(x), and v̂(x) as the desired solution vector, the solution of the integer OF estimation problem, and the corrective motion vector of the subpixel accuracy step, respectively. Then the mentioned two-step strategy can be formalized by the following expression:

v(x) = v̄(x) + η(x) Σ_{k=1}^{K} δ_k v̂_k(x) + o(δ_k),   (8)

where η(x) is a confidence map and the term o(δ_k) defines an approximation accuracy level. The confidence map factor η(x) masks the occlusion regions.

The meaning of the intrinsic parameters of the subpixel adjustment level, δ_k, K, and v̂, will be explained later.

A. Integer OF Estimation

The general stereo and motion matching approach aims to find correspondences between pixels of images I_t(x) and I_{t+1}(x), where x is the coordinate of a pixel in the image plane and t is the index of the considered image in the sequence. A vector


TABLE I
PRE-ESTIMATION MISSED AND FALSE DETECTION STATISTICS BASED ON THE GT ANALYSIS OF THE MIDDLEBURY DATASETS. "FALSE DETECTIONS FOR THE GRAYSCALE VERSION OF IMAGES" IS ABBREVIATED AS FDGS

Dataset     | GT Labels | Pre-estimated Labels | Missed Detections | False Detections | FDGS
Dimetrodon  |    21     |       111 (37)       |         0         |      90 (16)     |  90
Grove 2     |     9     |       124 (32)       |         0         |     115 (21)     | 116
Grove 3     |   104     |       225 (137)      |         0         |     111 (33)     | 113
Hydrangea   |    91     |       180 (120)      |         0         |      89 (29)     |  88
RubberWhale |    39     |       127 (57)       |         0         |      88 (18)     |  95
Urban 2     |    61     |       269 (167)      |         0         |     208 (106)    | 229
Urban 3     |    44     |       257 (107)      |         0         |     213 (63)     | 218
Venus       |    17     |       172 (55)       |         0         |     155 (38)     | 156

Fig. 3. GT of the synthetic 10% zoom sequence and its mutual disparity labeling map. The colored bins represent the detected GT vectors; the gray bins represent the false detected vectors.

v(x_t) in Fig. 4(a) denotes the disparity of two corresponding pixels x_t and x_{t+1}. The disparity vector v usually has the same dimensionality as the domain of the image, except in certain special cases: for instance, if stereo matching is considered, the disparity vector domain becomes one-dimensional due to the additional epipolar constraints. Simply expressed, if the stereo or motion matching problem is considered, a dense disparity map v(x) has to be obtained. In stereo matching the most appropriate way to solve such a problem is global optimization [15]–[17], [31], [32].

Note that in the case of the distribution-constrained OF the desired motion vectors belong to a set of discrete vectors v ∈ {v^l}, and this set can be pre-estimated by the technique described in the previous section.

The global energy minimization approach intends to find the desired disparity function v(x), which minimizes the energy function E(v(x)) in the disparity space image (DSI) C(x, v); see Fig. 4(b). The DSI (sometimes called the correlation volume or the cost volume) is the 4D discrete space that is mapped to the problem solution domain. The DSI represents a discrete collection of correspondence costs. For example, if two compared pixels (x_1)_t and (x_1 + v_1)_{t+1} have the same luminance value (which means that these pixels are a potential match), the cost value C(x_1, v_1) might be minimal. Vice versa, if the luminance values differ, the related cost value increases. The global energy usually contains two terms, the data term and the smoothness term:

E(v(x)) = Σ_{x∈Ω} C(x, v(x)) + Σ_{x∈Ω} G(v(x)),   (9)

where G is a smoothness function and Ω is the domain of the vector x. The domain of the vector v, composed of L motion vector values, is pre-defined through the disparity labeling algorithm described in Section III. The cost values C(x, v) that form the DSI are computed as follows:

C(x, v) = |I_{t+1}(x + v) − I_t(x)| + α(x) |∇I_{t+1}(x + v) − ∇I_t(x)|,   (10)

where I_{t+1}, ∇I_{t+1}, I_t, ∇I_t are the luminance and gradient-of-luminance values of two neighboring images in a dynamic sequence, respectively. The local weight parameter α(x) is calculated as

α(x) = 1 − ρ(|∇I_t(x)|),   (11)

where ρ() is the cumulative distribution of the gradient modulus of the image luminance I_t. The goal of introducing the function α(x) in (10) is to augment the matching robustness in the texture-less regions of an image by increasing the weight of the gradient in the cost function. In (11) we slightly modify the idea that is proposed in [27].
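A hedged sketch of the data term: the DSI of (10) with the adaptive weight α(x) of (11), where ρ is implemented as the empirical CDF of the gradient modulus. The circular shift for out-of-frame samples and all helper names are assumptions of this sketch, not the paper's implementation.

```python
import numpy as np

def grad_mag(I):
    """Gradient modulus |grad I| via central differences."""
    gy, gx = np.gradient(I.astype(np.float64))
    return np.hypot(gy, gx)

def alpha_map(It):
    """alpha(x) = 1 - rho(|grad I_t(x)|), rho = empirical CDF, per (11)."""
    g = grad_mag(It)
    ranks = g.ravel().argsort().argsort()           # rank of each pixel's gradient
    return 1.0 - ((ranks + 1) / g.size).reshape(g.shape)

def cost_volume(It, It1, labels):
    """DSI C(x, v) of (10), restricted to the pre-estimated label set {v_l}."""
    a, gt, gt1 = alpha_map(It), grad_mag(It), grad_mag(It1)
    C = {}
    for v in labels:
        s = (-v[0], -v[1])                          # roll so entry x holds I_{t+1}(x + v)
        C[v] = (np.abs(np.roll(It1, s, axis=(0, 1)) - It)
                + a * np.abs(np.roll(gt1, s, axis=(0, 1)) - gt))
    return C

# check: for a purely shifted pair, the true label has the lowest cost
rng = np.random.default_rng(2)
It = rng.random((32, 32))
It1 = np.roll(It, shift=(1, 2), axis=(0, 1))        # I_{t+1}(x) = I_t(x - (1, 2))
C = cost_volume(It, It1, labels=[(1, 2), (0, 0)])
print(C[(1, 2)].mean() < C[(0, 0)].mean())          # True
```

Restricting the loop to the pre-estimated labels is exactly what keeps this volume tractable compared with an unconstrained 2D search range.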

The function G in the smoothness term of (9) is given by

G (v(x)) = ω (x)∑

d∈D

r∈R

f (|vr (xd + 1)− vr (xd)|), (12)

where ω(x) is a locally adaptive function used to penalize motion vector discontinuities, and R and D are the dimensionalities of the motion vector and image spaces, respectively. Later on in this paper the shortcut

f(|v(x_d + 1) − v(x_d)|) = Σ_{r∈R} f(|v_r(x_d + 1) − v_r(x_d)|)   (13)

is used for the distance measure. A positive definite increasing function f is usually proportional to the gradient of the motion vector or its squared value:

G(v(x)) = ω(x) Σ_{d∈D} |v(x_d + 1) − v(x_d)|.   (14)

To prevent over-penalizing discontinuities, a more flexible smoothness function is used:

G(v(x)) = ω(x) min( Σ_{d∈D} f(|v(x_d + 1) − v(x_d)|), f(g) ),    (15)

where g is a truncation threshold, and the function ω(x) can be expressed as

ω(x) = { 2λ   if ρ(|∇I_t(x)|) < 0.7
       { λ    otherwise,    (16)




Fig. 4. (a) Scheme of correspondence matching between two images. (b) DSI as a collection of correspondence costs C(x, v), and a desired function of disparity values v(x).

where λ is an experimental constant. In other words, if the value of the local gradient of the image I_t(x) is less than a threshold calculated from the cumulative distribution ρ(), then ω(x) = 2λ. This kind of adaptive function was proposed in [19].

In our experiments we set the threshold g equal to 3 and choose the most popular truncated linear prior f(|a|) = |a| in (15). Also, the parameter λ of the prior function in (15) is taken proportional to the mean value ⟨C(x, v)⟩ of the DSI cost:

λ = 3⟨C(x, v)⟩ / g.    (17)

The desired discrete function v(x) is a solution of the global energy minimization problem

v(x) = arg min_v E(v(x)),    (18)

where E is introduced in (9).

In general, the energy minimization problem is NP-hard, so approximate minimization algorithms have to be chosen. Following the analysis given in [33], we apply the TRW-S algorithm described in [30]. TRW-S was developed within the belief propagation framework; its sequential message-passing schedule makes the algorithm convergent and fast. For truncated linear and quadratic priors the method usually reaches 1% approximation accuracy in a few iterations, outperforming the popular graph-cut expansion algorithm in both accuracy and speed. An additional advantage of TRW-S is that it requires half as much memory as traditional message-passing approaches.

B. Occlusion Handling

The solution v(x) of (18) can suffer from inaccuracy, especially in occluded regions of sequences with a considerable range of integer motion vectors. On the other hand, pixels invisible in frame I_t become visible in frame I_{t+1}. Thus, one can expect that analyzing the results of two integer OF estimations v(x_t) and v(x_{t+1}) helps to handle the occlusion problem, as was done for stereo matching in [34]. For this purpose, the forward and backward flows are first estimated with the procedure described in the previous subsection. These pre-flows are then used for a more accurate estimation of the desired integer flow with the algorithm described in this subsection.

Let us consider an unwrapped mirror integer OF

v_2(x_t) = −v(x_{t+1} + v(x_t)),    (19)

where x_{t+1} ∈ Ω_{t+1} and x_t ∈ Ω_t are the domains of two consecutive frames in a video sequence, and v(x_{t+1}) and v(x_t) are the respective results of integer OF estimation with the two matching directions t → t + 1 and t ← t + 1. If we define v_1(x_t) ≡ v(x_t), then the two integer OFs v_1 and v_2 have to be equal for all non-occluded pixels. Thus, the confidence map in (8) can be defined as

η(x) = (1 + |v_2(x) − v_1(x)|)^{−1},    (20)

where the function η(x) achieves its maximal value at pixels with strict equality of the two different integer OFs, v_2(x) = v_1(x). Occlusion-based confidence measures similar to (20) were introduced in [35].
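A minimal sketch of the forward-backward confidence map (20); the L1 distance between the two flows is an assumption here, since the paper writes |v_2 − v_1| without specifying the norm:

```python
def confidence(v1, v2):
    """Eq. (20): eta = 1 / (1 + |v2 - v1|), with the flow difference
    measured by the L1 distance (an assumed convention)."""
    d = abs(v2[0] - v1[0]) + abs(v2[1] - v1[1])
    return 1.0 / (1.0 + d)
```

Agreement of the two flows gives eta = 1; the larger the forward-backward mismatch (a sign of occlusion), the smaller the confidence.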

If more than two images in a sequence are available, it is useful to estimate another integer OF v_3(x) relative to the previous frame, t − 1 ← t, and its mirror integer OF v_4(x):

v_4(x_t) = −v(x_{t−1} + v_3(x_t)).    (21)

Our motivation for introducing these additional integer OFs is the assumption that v_3(x) = v_4(x) ≈ v_1(x) = v_2(x), and also that the reciprocal occlusion regions of the image pair I_t and I_{t+1} in general do not coincide with the reciprocal



TABLE II
AVERAGE AND MEDIAN (IN BRACKETS) EE FOR DIFFERENT STEPS OF THE PROPOSED ALGORITHM AND DIFFERENT ACCURACY LEVELS o(δk). D: DIMETRODON. G2: GROVE 2. H: HYDRANGEA. RW: RUBBERWHALE. U2: URBAN 2. IOF: INTEGER OF. PPOF: POST-PROCESS OF. IOF+LK: INTEGER OF PLUS LUCAS-KANADE REFINEMENT

Step      | o(δk) | D             | G2            | H             | RW            | U2
IOF v1    | 0.5   | 0.399 (0.395) | 0.467 (0.413) | 0.291 (0.187) | 0.301 (0.239) | 0.867 (0.486)
IOF v     | 0.5   | 0.395 (0.397) | 0.447 (0.407) | 0.302 (0.180) | 0.291 (0.239) | 0.668 (0.507)
OF v1     | 0.25  | 0.215 (0.196) | 0.280 (0.232) | 0.276 (0.180) | 0.211 (0.174) | 0.439 (0.255)
OF v12    | 0.002 | 0.097 (0.068) | 0.112 (0.035) | 0.156 (0.034) | 0.072 (0.029) | 0.228 (0.066)
PPOF vPP  | 0.002 | 0.088 (0.061) | 0.109 (0.031) | 0.154 (0.034) | 0.065 (0.028) | 0.211 (0.058)
IOF+LK    | 0.002 | 0.251 (0.186) | 0.387 (0.182) | 0.271 (0.121) | 0.187 (0.086) | 0.604 (0.229)

Fig. 5. Quantitative evaluation of the Gaussian noise impact. (a) Lost bins (%) versus the Gaussian noise Std. Dev. (b) Average EE versus the Gaussian noise Std. Dev. (c) Redundancy (%) versus the Gaussian noise Std. Dev.

occlusion regions of the image pair I_{t−1} and I_t. Similar temporal continuity constraints were researched in [36].

To estimate the final integer OF we apply the same global optimization approach described in the previous subsection. The difference is that the merging algorithm uses the previously estimated backward and forward flows as an observation. Thus the cost term C(x, v) in (9) is now not a color-matching similarity measure, but a likelihood estimate based on local statistics of the previously calculated flows. If, for example, all previously calculated flow values in a local neighborhood centered at a pixel x are equal to v_l, then the probability of the event that (x)_t corresponds to (x + v_l)_{t+1} is 1. In this particular case we expect the cost function to be C(x, v = v_l) = 0 and C(x, v ≠ v_l) = ∞. The image colors now appear only in the bilateral kernel, which defines the local neighborhood of a pixel x. Consequently, the cost function is calculated as follows:

C(x, v) = −log( Σ_{t=1..T} φ(t) Σ_{|r|<R} η²(x) Ψ(x + r) 1_v(v_t(x)) ),    (22)

where R is a neighborhood parameter, T is the number of processed multidirectional integer OFs v_t(x), φ(t) is a weight function associated with v_t(x), 1_v() is the indicator function that equals 1 if v_t(x) = v and 0 otherwise, and Ψ(x + r) is the bilateral Gaussian kernel. The weight function φ(t) is defined explicitly as φ(t) = {1, 1, 0.65, 0.65}; these values reflect the fact that the most important input flows are v_1(x) and v_2(x).
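A loose sketch of the vote-counting behind (22). It interprets the indicator as applying to each flow sample in the neighborhood, which is one reading of the formula; the data layout (dicts keyed by neighborhood offsets) and all names are illustrative assumptions:

```python
import math

def merged_cost(v, flows, phi, eta, kernel):
    """Sketch of Eq. (22): C(x, v) = -log of the weighted votes that the
    previously estimated flows cast for the candidate vector v.
    flows[t] maps a neighborhood offset r to the t-th flow value there,
    phi[t] is the per-flow weight, eta the confidence at x, and
    kernel[r] the bilateral weight at offset r."""
    votes = 0.0
    for t, flow_t in enumerate(flows):
        for r, vt in flow_t.items():
            if vt == v:                      # indicator 1_v(v_t)
                votes += phi[t] * eta ** 2 * kernel[r]
    return math.inf if votes == 0.0 else -math.log(votes)
```

Unanimous agreement in a neighborhood whose weights sum to 1 gives zero cost, matching the C(x, v = v_l) = 0 case above, while a candidate with no votes gets infinite cost.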

The bilateral kernel Ψ(x + r) is expressed as follows:

Ψ(x + r) = exp( −|r|²/(2R²) − |I(x + r) − I(x)|²/(2σ_c²) ),    (23)

where σ_c is the standard deviation of the signal I(x), and R in our experiments is equal to four times the standard deviation of the signal v_1(x).
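A sketch of the kernel (23); note that the exponent is written with minus signs here, since a Gaussian kernel must decay with spatial and intensity distance (the signs were lost in the extracted formula). The argument names are illustrative:

```python
import math

def bilateral_weight(r2, dI, R, sigma_c):
    """Eq. (23): product of a spatial Gaussian (squared offset r2,
    scale R) and a range Gaussian (intensity difference dI, scale sigma_c)."""
    return math.exp(-r2 / (2.0 * R * R) - (dI * dI) / (2.0 * sigma_c * sigma_c))
```

The weight is 1 at the center pixel and shrinks both with spatial distance and across intensity edges, which is what makes the kernel edge-preserving.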

Our motivation for introducing the cost function C(x, v) in the form of (22) comes from the general MAP framework, where the cost function is proportional to the negative logarithm of the likelihood estimate, C(x, v) ∝ −log(P(v|x)). The choice of the bilateral kernel in (23) for the likelihood statistics estimation is in turn motivated by the concept of edge-preserving filtering [37]. Kernels of this kind are used in filters that consider not only pixel-location correlation but also pixel-intensity correlation.

Finally, the rectified integer OF function v(x) is a solution of the global energy minimization problem in (18) with respect to the energy function E in (9) and the cost function in (22).

C. Subpixel Accuracy Matching

The accuracy level of the integer OF is equal to 0.5 in terms of the endpoint error [12], which is not sufficient for state-of-the-art results in OF estimation. Consequently, a subpixel accuracy step is highly attractive.

The desired OF solution v(x) in (8) is given in series-approximation form. However, it is convenient to represent this equation in an iterative form:

v_k(x) = v_{k−1}(x) + δ_k v(x),    (24)

where v_0(x) ≡ v(x); in our experiments δ_k = 2^{−2(k+1)/3} and the number of iterations is K = 12. One might expect δ_k = 2^{−k}, because it is faster and more convenient, but experiments with the training datasets showed that the proposed value increases the accuracy of the approach.
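The step schedule can be written as a one-line function (the exponent 2^{−2(k+1)/3} is taken from the text above; the function name is illustrative):

```python
def delta(k):
    """Step size delta_k = 2**(-2*(k+1)/3) used in the iteration (24)."""
    return 2.0 ** (-2.0 * (k + 1) / 3.0)
```

With K = 12 the final step is delta(12) = 2^{−26/3} ≈ 0.0024, consistent with the finest accuracy level o(δk) = 0.002 reported in Table II.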

Also, for the discrete domain of the vector v, which is a parameter of our algorithm, we choose 8 directions in



Fig. 6. Screenshot of the Middlebury public endpoint error (EE) table. The proposed method is COFM (red frame highlight). Layer++ (yellow frame highlight) was published at the time of paper submission.

the displacement space: v = [v_x, v_y] ⇐ v_x, v_y ∈ {−1, 0, 1}. Thus, the full variety of this domain can be mapped into a new label space with L = 9. Consequently, any intermediate solution v_k(x) belongs to the discrete vector space v_k(x, l) with set cardinality L × |Ω|, and this solution can be obtained in the framework of the MRF energy minimization described in Subsection IV-A. However, we have to point out four essential differences when the formulas of Subsection IV-A are used at the subpixel accuracy level: 1) while calculating the cost values C(x, l), the target coordinate x + v(l) in (10) does not belong to the image grid domain Ω, so bicubic interpolation is necessary; 2) in (10) we put α(x) = 1; 3) the prior matrix G(l_x, l_{x+1}) in (15) is no longer fixed and depends on the coordinate x, so it is not a sub-modular matrix, and graph-cut techniques become inapplicable to this energy minimization problem; 4) in (15) we put ω(x) = λ.

Finally, we make the iterative solution of (24) consistent with (8) w.r.t. the confidence function η(x):

v̄(x) = v(x) + η(x)(v_K(x) − v(x)).    (25)
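The confidence-weighted blend of (25) in one line; tuples stand in for flow vectors, and the names are illustrative:

```python
def blend_with_confidence(v_int, v_refined, eta):
    """Eq. (25): move from the integer flow toward the refined flow v_K
    proportionally to the confidence eta(x)."""
    return tuple(a + eta * (b - a) for a, b in zip(v_int, v_refined))
```

At fully confident pixels (eta = 1) the refined flow is accepted outright; at suspected occlusions (eta near 0) the integer flow is kept.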

D. OF Estimation Post-Processing

The result of OF estimation in (25) can be further improved by applying post-processing filters. For this purpose we apply the bilateral filter, a robust edge-preserving filter introduced by Tomasi and Manduchi [37]. This filter is used in many computer vision and image processing tasks; a general overview of its applications can be found in [38]. In this work the idea of applying the bilateral filter is taken from [39], where bilateral filtering is applied to regularize the optical flow computation. Thus, the final OF estimate is

Algorithm 1 COFM Algorithm for OF Estimation

1: Pre-estimate the disparity labeling map {v_l}_PE with (5).
2: Estimate the multidirectional integer OFs v_t by minimizing the energy in (18) w.r.t. (9).
3: Estimate the refined integer OF v by minimizing the energy in (18) w.r.t. (9) and the cost in (22).
4: On the subpixel accuracy level, iteratively estimate the OF v in (8) by minimizing the energy in (18) w.r.t. (9) for each iteration in (24).
5: Estimate the resultant OF v_PP by applying the post-processing filter in (26).

calculated as follows:

v_PP(x) = (1/W) Σ_{|r|<R} v(x + r) Ψ(x + r),    (26)

where Ψ() is the bilateral Gaussian kernel described in (23) and W is the kernel normalization factor, calculated as

W = Σ_{|r|<R} Ψ(x + r).    (27)

The kernel parameter R is equal to 7.
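A small sketch of the post-processing filter (26)-(27). Flow and image are represented as dicts keyed by (x, y) pixel coordinates, sigma_c is given an assumed fixed value rather than the image statistic the paper uses, and all names are illustrative:

```python
import math

def postprocess_flow(flow, image, R=7, sigma_c=25.0):
    """Eqs. (26)-(27): bilateral-weighted average of the flow field
    over a disc of radius R, normalized by the total kernel weight W."""
    out = {}
    for x in flow:
        wx = wy = wsum = 0.0
        for y, v in flow.items():
            r2 = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
            if r2 >= R * R:
                continue  # outside the |r| < R neighborhood
            dI = image[y] - image[x]
            w = math.exp(-r2 / (2.0 * R * R) - dI * dI / (2.0 * sigma_c ** 2))
            wx += w * v[0]
            wy += w * v[1]
            wsum += w
        out[x] = (wx / wsum, wy / wsum)
    return out
```

Because the weights depend on intensity differences, the flow is smoothed within objects while motion boundaries aligned with image edges are preserved.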

E. COFM Algorithm

The steps involved in our optical flow estimation are outlined in Algorithm 1.

V. EXPERIMENTAL RESULTS AND DISCUSSION

In this section we consider two different aspects of the proposed technique. First, we point out the advantages of



[Fig. 7 image panels: rows Army, Backyard, Basketball, Dumptruck, Evergreen, and Grove; columns I_t(x), l(v), v_1(x), v(x), and the final flow; each row also reports its initial (L_0) and reduced (L_1) label-space sizes.]

Fig. 7. Results of OF estimation for the first six sets of the Middlebury test datasets. See Fig. 3 for the color coding of the flow.

applying the COFM scheme to OF estimation. Then our results of estimation with the Middlebury datasets are discussed.

The skeleton of the COFM method is described in Algorithm 1 and illustrated in Fig. 1. The COFM aims to solve three separate problems: constraints pre-estimation, integer OF estimation, and subpixel accuracy OF estimation.

Crucial for the success of posing OF as a labeling optimization problem is finding a constraint set of OF vectors. The results of experiments with the Middlebury datasets are given in Table I. From Table I we can see that for all the sets there is no loss of the constraint integer values that belong to the GT of a set. The false detections term in Table I means that the method finds useless values, which increase the final size of the label space. However, the number of labels is reasonable and the restriction of the initial space is considerable. Analyzing the GT data we found that the maximum motion for the Middlebury training datasets is about 22 pixels (Urban2). The initial space in this case is about 2000 labels (45 × 45), and for the Urban2 set this value is 269 in Table I. For the Urban test set the initial size is about 4000 labels if we assume the maximum motion for this set is 32 (this value was obtained manually), and in our experiments this space is constrained to 298 labels. The multidirectional matching approach assumes that the pre-defined label space is used several times: once for each v_t. Thus, a further restriction of



[Fig. 8 image panels: rows Mequon, Schefflera, Teddy, Urban, Wooden, and Yosemite; columns I_t(x), l(v), v_1(x), v(x), and the final flow; each row also reports its initial (L_0) and reduced (L_1) label-space sizes.]

Fig. 8. Results of OF estimation for the second six sets of the Middlebury test datasets. See Fig. 3 for the color coding of the flow.

the label space is practical, and the first step of the integer OF estimation yields reliable statistics for it. The results of this secondary reduction of the label space are given in Table I as the numbers in brackets.

The second and third problems can also be solved in separate modules. Table II, based on our experiments with the training sets, analyzes the impact of each intrinsic OF estimation step on the estimation accuracy. To represent the impact of the subpixel refinement algorithm we include in Table II the results after the first (v1) and after the last (v12) iteration of the module. For the result evaluation in Table II, along with the conventional average endpoint error (EE), we introduce a new error measure called the median endpoint error (MEE): the value such that the point-wise EE is less than it for at least 50% of image pixels. It is intuitively clear that the MEE characterizes the limits of the subpixel accuracy level. In fact, if we assume that pixel-wise integer errors affect no more than 50% of all image pixels (which holds for the Middlebury datasets), then the MEE characterizes pure subpixel accuracy.
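The MEE as defined above can be computed directly; the lower-median convention for even-sized inputs is an assumption, since the text only requires that at least 50% of pixels fall below the returned value:

```python
def median_endpoint_error(errors):
    """MEE: the smallest per-pixel endpoint error e such that at least
    50% of pixels have EE <= e (lower-median convention assumed)."""
    s = sorted(errors)
    return s[(len(s) - 1) // 2]
```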

In Table II we can see that the occlusion handling step v1 → v described in Subsection IV-B considerably improves accuracy only for the Urban2 set (from 0.867 to 0.668).



This result is expected, because this sequence has a large range of integer motion vectors and includes significant occluded regions. However, if we omit this step for the other test sequences, the final results are worse: the occlusion handling step suppresses fatal errors that cannot be corrected by the subpixel accuracy adjustment. Results in Table II show that the most important accuracy improvement is achieved with the subpixel accuracy OF estimation. For example, for the Dimetrodon and RubberWhale sets a four-fold improvement is achieved (0.395 to 0.097 and 0.291 to 0.072, respectively). The final MEEs for the Grove 2 and RubberWhale sets are 0.031 and 0.028 respectively, which means that the accuracy of the method is comparable with 1/30 of the pixel size. To illustrate the advantage of the proposed global refinement approach, an extra entry with the results of simple fine-scale Lucas-Kanade refinement (based on the algorithm presented in [40]) is included in Table II. We can see that our algorithm considerably outperforms this simple and fast method in accuracy.

Note that our scheme is flexible and, in principle, the three core blocks of the COFM can be improved, replaced by another approach, or used for the solution of another problem. For example, the output of the constraints pre-estimation block is applicable in action recognition tasks for global descriptor estimation. Also, the subpixel accuracy block can be used in stereo matching without constraints pre-estimation.

The basic OF equation is not robust, since the intensity of objects at the same point is not constant in most image sequences [41]. Under noisy conditions the quality of OF estimation algorithms decreases, and current state-of-the-art matching algorithms fail on noisy images. Several robust matching costs that can potentially increase the performance of matching techniques have been proposed; the application of such robust similarity measures is discussed, for example, in [42]. Note that under low-noise conditions the accuracy of the methods usually decreases due to the application of the robust costs, so a tradeoff has to be found in each particular case. However, for us it was more interesting to see the performance of the SPOF implementation under noise. We find that the SPOF pre-estimation is sufficiently robust in comparison with the estimation algorithm itself. In our experiments we added Gaussian noise with mean 0 and various standard deviations (Std. Dev.) to all sequences of the Middlebury training datasets with known GT. Then three evaluation characteristics were estimated: Lost bins (%), the percentage of missed motion vectors; EE, the endpoint error with the same meaning as in the previous experiments; and Redundancy (%), the percentage of falsely detected motion vectors. The characteristics were averaged over the training set. The results of the evaluation are illustrated in Fig. 5. We can see in Fig. 5(a) that for Gaussian noise with Std. Dev. less than 5 the SPOF pre-estimation algorithm does not lose motion vectors, and thus does not increase OF estimation errors. In contrast, the OF estimation module of our algorithm fails almost completely by Std. Dev. equal to 5, see Fig. 5(b). We find that the redundancy at this noise level is even less than for images without noise, see Fig. 5(c).

TABLE III
TIME IN SECONDS NEEDED FOR THE COMPLETION OF EACH INTERMEDIATE TASK OF THE METHOD FOR THE URBAN SET

Task     | l(v) | v1(x) | v(x) | v̄(x) | vPP(x)
Time (s) | 4    | 70    | 460  | 910  | 930

To evaluate the performance of our method we calculated the flow images for the Middlebury OF benchmark. Fig. 6 shows the results of the first 15 methods of the Middlebury public endpoint error table as a screenshot taken at the time of submission. The average rank of our method for the most relevant EE measure was 11.1 (top 5) at the time of the result submission, and only one published method, Layer++ [8], shows a better result: 7.8. An executable program file with a demo version of the proposed COFM algorithm is available at [43].

In this paper we do not aim to optimize the time needed for the completion of each intermediate task of the method. Nevertheless, Table III gives an impression of the impact of each estimation step on the integral computational complexity of the method. We can see that our subpixel refinement module uses almost half of the full processing time. For tasks and datasets where subpixel accuracy is not very important, our refinement module can be omitted, speeding up OF estimation by a factor of two. In our experiments we use a C++ implementation on a Core Duo processor with a clock speed of 3.16 GHz.

VI. CONCLUSION

The main idea of the paper is to find constraints that can be used in OF estimation as an equivalent of the epipolar constraint in stereo. An approach that reduces general optical flow to a constrained matching problem by pre-estimating a 2D disparity labeling map is proposed. This pre-estimation is done with a simple frame-to-frame correlation technique also known as the digital symmetric phase-only filter (SPOF). The proposed method is a combination of three independently solvable problems: constraints pre-estimation, integer OF estimation, and subpixel accuracy OF estimation. Experiments with the Middlebury optical flow datasets confirm our intuitive assumption of a strong correlation between the motion vector distribution of optical flow and the maximal peaks of SPOF outputs. Furthermore, the overall performance of the proposed COFM scheme shows potential and achieves state-of-the-art results.

REFERENCES

[1] S. Roth and M. J. Black, "On the spatial statistics of optical flow," Int. J. Comput. Vis., vol. 47, no. 1, pp. 1–10, 2007.
[2] W. Trobin, T. Pock, D. Cremers, and H. Bischof, "Continuous energy minimization via repeated binary fusion," in Proc. Eur. Conf. Comput. Vis., 2008, pp. 677–690.
[3] A. Wedel, T. Pock, C. Zach, D. Cremers, and H. Bischof, "An improved algorithm for TV-L1 optical flow," in Proc. Dagstuhl Motion Workshop, 2008, pp. 23–45.
[4] A. Wedel, T. Pock, and D. Cremers, "Structure and motion-adaptive regularization for high accuracy optic flow," in Proc. 12th Int. Conf. Comput. Vis., 2009, pp. 1663–1668.
[5] M. Werlberger, W. Trobin, T. Pock, A. Wedel, D. Cremers, and H. Bischof, "Anisotropic Huber-L1 optical flow," in Proc. Brit. Mach. Vis. Conf., 2009, pp. 1–11.
[6] H. Zimmer, A. Bruhn, J. Weickert, L. Valgaerts, A. Salgado, B. Rosenhahn, and H. P. Seidel, "Complementary optic flow," in Proc. 7th Int. Conf. Energy Minimization Methods Comput. Vis. Pattern Recognit., 2009.
[7] D. Sun, S. Roth, and M. J. Black, "Secrets of optical flow estimation and their principles," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 2432–2439.
[8] D. Sun, E. Sudderth, and M. Black, "Layered image motion with explicit occlusions, temporal consistency, and depth ordering," in Proc. Neural Inf. Process. Syst., vol. 23, 2010, pp. 2226–2234.
[9] C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz, "Fast cost-volume filtering for visual correspondence and beyond," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2011, pp. 3017–3024.
[10] S. Volz, A. Bruhn, L. Valgaerts, and H. Zimmer, "Modeling temporal coherence for optical flow," in Proc. 12th Int. Conf. Comput. Vis., 2011, pp. 1–8.
[11] H. Rashwan, D. Puig, and M. Garcia, "On improving the robustness of differential optical flow," in Proc. 12th Int. Conf. Comput. Vis. Artemis Workshop, 2011, pp. 1–8.
[12] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. J. Black, and R. Szeliski, "A database and evaluation methodology for optical flow," in Proc. 12th Int. Conf. Comput. Vis., 2007, pp. 1–8.
[13] B. Horn and B. Schunck, "Determining optical flow," Artif. Intell., vol. 17, nos. 1–2, pp. 185–203, 1981.
[14] B. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proc. Int. Joint Conf. Artif. Intell., 1981, pp. 674–679.
[15] A. Klaus, M. Sormann, and K. Karner, "Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure," in Proc. 18th Int. Conf. Pattern Recognit., 2006, pp. 15–18.
[16] Z. Wang and Z. Zheng, "A region based stereo matching algorithm using cooperative optimization," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8.
[17] Q. Yang, L. Wang, R. Yang, H. Stewenius, and D. Nister, "Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 3, pp. 492–504, Mar. 2009.
[18] S. Roy and I. J. Cox, "A maximum-flow formulation of the N-camera stereo correspondence problem," in Proc. 12th Int. Conf. Comput. Vis., 1998, pp. 492–499.
[19] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 11, pp. 1222–1239, Nov. 2001.
[20] Q. Chen, M. Defrise, and F. Deconinck, "Symmetric phase-only matched filtering of Fourier-Mellin transforms for image registration and recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 12, pp. 1156–1168, Dec. 1994.
[21] V. Kumar, A. Mahalanobis, and R. D. Juday, Correlation Pattern Recognition. Cambridge, U.K.: Cambridge Univ. Press, 2006.
[22] H. Nishihara and P. Crossley, "Measuring photolithographic overlay accuracy and critical dimensions by correlating binarized Laplacian of Gaussian convolutions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 10, no. 1, pp. 17–30, Jan. 1988.
[23] T. Kanade and M. Okutomi, "A stereo matching algorithm with an adaptive window: Theory and experiment," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 9, pp. 920–932, Sep. 1994.
[24] M. P. Wernet, "Symmetric phase only filtering: A new paradigm for DPIV data processing," Meas. Sci. Technol., vol. 16, no. 3, pp. 601–618, 2005.
[25] V. Lempitsky, S. Roth, and C. Rother, "FusionFlow: Discrete-continuous optimization for optical flow estimation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8.
[26] B. Glocker, N. Paragios, N. Komodakis, G. Tziritas, and N. Navab, "Optical flow estimation with uncertainties through dynamic MRFs," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8.
[27] L. Xu, J. Jia, and Y. Matsushita, "Motion detail preserving optical flow estimation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 1293–1300.
[28] C. Lei and Y. H. Yang, "Optical flow estimation on coarse-to-fine region-trees using discrete optimization," in Proc. 12th Int. Conf. Comput. Vis., 2009, pp. 1562–1569.
[29] B. Glocker, N. Komodakis, N. Paragios, G. Tziritas, and N. Navab, "Inter and intra-modal deformable registration: Continuous deformations meet efficient optimal linear programming," Inf. Process. Med. Imag., LNCS 4584, pp. 408–420, Jul. 2007.
[30] V. Kolmogorov, "Convergent tree-reweighted message passing for energy minimization," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 10, pp. 1568–1583, Oct. 2006.
[31] J. Kim, V. Kolmogorov, and R. Zabih, "Visual correspondence using energy minimization and mutual information," in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2003, pp. 1033–1040.
[32] Q. Yang, L. Wang, R. Yang, H. Stewenius, and D. Nister, "Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Mar. 2009, pp. 2347–2354.
[33] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwal, M. Tappen, and C. Rother, "A comparative study of energy minimization methods for Markov random fields," in Proc. Eur. Conf. Comput. Vis., 2006, pp. 16–29.
[34] J. Sun, Y. Li, S. Kang, and H.-Y. Shum, "Symmetric stereo matching for occlusion handling," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2005, pp. 399–406.
[35] L. Alvarez, R. Deriche, T. Papadopoulo, and J. Sanchez, "Symmetrical dense optical flow estimation with occlusions detection," Int. J. Comput. Vis., vol. 75, no. 3, pp. 371–385, 2007.
[36] A. S. Volz, L. Valgaerts, and H. Zimmer, "Modeling temporal coherence for optical flow," in Proc. 12th Int. Conf. Comput. Vis., 2011, pp. 1116–1123.
[37] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jan. 1998, pp. 839–846.
[38] S. Paris, P. Kornprobst, J. Tumblin, and F. Durand, "Bilateral filtering: Theory and applications," Found. Trends Comput. Graph. Vis., vol. 4, no. 1, pp. 1–73, 2008.
[39] J. Xiao, H. Cheng, H. Sawhney, C. Rao, and M. Isnardi, "Bilateral filtering based optical flow estimation with occlusion detection," in Proc. Eur. Conf. Comput. Vis., 2006, pp. 211–224.
[40] J. Y. Bouguet, "Pyramidal implementation of the Lucas-Kanade feature tracker description," Microprocessor Research Labs., Intel Corporation, Santa Clara, CA, USA, Tech. Rep., 2000, pp. 1–9.
[41] F. Alter, Y. Matsushita, and X. Tang, "An intensity similarity measure in low-light conditions," in Proc. Eur. Conf. Comput. Vis., 2006, pp. 267–280.
[42] Y. S. Heo, K. M. Lee, and S. U. Lee, "Simultaneous depth reconstruction and restoration of noisy stereo images using non-local pixel distribution," in Proc. 12th Int. Conf. Comput. Vis., 2007, pp. 1–8.
[43] M. Mozerov. (2012). Additional materials: Test images and an executable program file with a demo version of the proposed COFM algorithm [Online]. Available: http://www.cvc.uab.es/~mozerov/results/

Mikhail G. Mozerov (M'12) received the M.S. degree in physics from Moscow State University, Moscow, Russia, in 1982, and the Ph.D. degree in digital image processing from the Institute of Information Transmission Problems, Russian Academy of Sciences, Moscow, in 1995. He is currently a Project Director with the Computer Vision Center, Universitat Autonoma de Barcelona (UAB), Barcelona, Spain.

His current research interests include signal and image processing, pattern recognition, and digital holography.