
Eigenspace-Based Tracking for Feature Points

Chen PENG*, Qian CHEN, and Wei-xian QIAN

Jiangsu Key Laboratory of Spectral Imaging and Intelligent Sense, Nanjing 210094, China
*E-mail address: [email protected]

(Received September 24, 2013; revised February 21, 2014; Accepted March 24, 2014)

Feature point tracking deals with image streams that change over time. Most existing feature point tracking algorithms only consider two adjacent frames at a time, and forget the feature information of previous frames. In this paper, we present a new eigenspace-based tracking method that learns an eigenspace representation of training features online, and finds the target feature point with a Gauss–Newton style search method. A coarse-to-fine processing strategy is introduced to handle large affine transformations. Several simulations and experiments on real images indicate the effectiveness of the proposed feature tracking algorithm under the conditions of large pose changes and temporary occlusions. © 2014 The Japan Society of Applied Physics

Keywords: feature points, visual tracking, eigenspace methods, occlusion

1. Introduction

Feature points have long been used in the context of motion, stereo, and tracking problems. Given an image sequence, feature point tracking aims to find the correspondences between the feature points in the images that arise from the same object in the real world at different time instants.1) However, due to pose, illumination, or appearance changes of the object and to occlusions, feature points may disappear, enter, and leave the field of view. In this situation, feature trajectories are always incomplete in typical feature tracking methods.

In this paper, a robust eigenspace-based tracking algorithm for feature points is proposed. Its main advantage is that it can track feature points while training the eigenspaces, and keep feature trajectories complete even in situations involving occlusion, background clutter, and noise.

The remainder of this paper is organized as follows. The next section reviews related work on feature point tracking and applications of principal component analysis (PCA) in visual tracking. The details of our algorithm are described in Sect. 3. The results of numerous experiments and a performance evaluation are presented in Sects. 4 and 5.

2. Related Work

There is a rich literature on feature point tracking, and KLT3–5) is one of the most widely used tracking methods for feature points. The KLT tracking method makes use of the spatial intensity gradient of the images to iteratively track feature points between frames. The main shortcoming of KLT is that it only considers two adjacent frames at a time and forgets the feature information of previous frames, so KLT usually fails when occlusions happen. Some algorithms6–8) based on feature trajectory optimization have been proposed to solve this problem. However, the effectiveness of these algorithms is limited by the accuracy of the point tracking method.

If the appearance of the interest points in each frame were learned and memorized, feature points could be relocated after occlusions by comparing new samples with the trained database.

Black and Jepson proposed the technique of "eigentracking",2) which tracks the object by optimizing the transformation between the eigenspace and the image. They used a pre-trained view-based eigenbasis representation, which required learning a set of view-based eigenbases before the tracking task began. However, for feature point tracking, we need an online training method which can update the eigenbases during the point tracking process.

We use the Karhunen–Loeve (KL) transform9) to build the eigenbases of the feature point. The KL transform is a preferred method for approximating a set of images by a low-dimensional subspace. Levy and Lindenbaum proposed an efficient online update procedure for the KL transform named sequential Karhunen–Loeve (SKL).10) Ross et al.11) extended the SKL algorithm, presenting a new incremental learning algorithm for robust visual tracking (IVT) that correctly updates the eigenbases as well as the mean.

Our work is motivated in part by the eigenspace representation,2,12,13) the on-line update scheme,9–11) and the coarse-to-fine processing strategy.2) In contrast to the KLT method, our algorithm is more robust in that it can keep tracking when the feature point reappears after occlusions. Compared with the eigentracking algorithm,2) our algorithm does not require a training phase but learns the eigenbases on-line during the tracking process. The IVT method uses a particle filter for motion parameter estimation, which is time-consuming for multiple feature points. Compared with the IVT method, we present an iterative optimization method which computes the affine transform efficiently.

3. Sequential Eigenspace-Based Tracking for Feature Points

3.1 Eigen representation

Given a set of training feature point images, we have a $K \times n$ matrix $A = \{I_1, \ldots, I_n\}$, where each column $I_i$ is a $K$-dimensional vector obtained by scanning the input image in the standard lexicographic order. We assume that $n \ll K$, and the sample mean of the training images is $\bar{I}_A = \frac{1}{n}\sum_{i=1}^{n} I_i$. We use Singular Value Decomposition (SVD) to decompose $(A - \bar{I}_A)$ as


$$(A - \bar{I}_A) = U\Sigma V^T, \qquad (1)$$

where $\Sigma$ is a diagonal matrix with eigenvalues $\sigma_1, \ldots, \sigma_n$ computed from $(A - \bar{I}_A)(A - \bar{I}_A)^T$ and listed in descending order, $U$ is an orthogonal matrix whose columns are the eigenvectors computed from $(A - \bar{I}_A)(A - \bar{I}_A)^T$ corresponding to the eigenvalues in $\Sigma$, and $V$ is an orthogonal matrix whose columns are the eigenvectors computed from $(A - \bar{I}_A)^T(A - \bar{I}_A)$. (The notation $(A - \bar{I}_A)$ is shorthand for the matrix $[(I_1 - \bar{I}_A) \ \cdots \ (I_n - \bar{I}_A)]$.)

If the singular values $\sigma_i$ are small for $i > L$ for some $L$, then we can keep only the first $L$ columns of $U$, giving the truncated transformation14)

$$Y_L = U_L^T(A - \bar{I}_A), \qquad (2)$$

which transforms the data to a new coordinate system such that the greatest variance of any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. Figure 1 shows 5 samples of an $11 \times 11$ feature point in a sequence of 50 frames (top row), and the first 5 columns of $U$ reshaped as eigenimages (bottom row).
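As a concrete illustration of Eqs. (1) and (2), a minimal NumPy sketch of building the truncated eigenbasis from a set of feature patches could look as follows; the function name, the patch handling, and the default choice of L are our own assumptions, not code from the paper.

```python
import numpy as np

def build_eigenbasis(patches, L=16):
    """Build a truncated eigenbasis from training feature patches.

    patches : list of 2-D arrays (e.g., 11x11 windows around the feature point).
    L       : number of leading components to keep.
    """
    # Stack each patch as a column vector (lexicographic scan): the K x n matrix A.
    A = np.column_stack([p.reshape(-1).astype(float) for p in patches])
    I_bar = A.mean(axis=1)                       # sample mean of the training images
    # SVD of the mean-centred data, Eq. (1): (A - I_bar) = U S V^T.
    U, s, Vt = np.linalg.svd(A - I_bar[:, None], full_matrices=False)
    U_L = U[:, :L]                               # keep only the first L columns of U
    eigvals = s[:L] ** 2                         # eigenvalues of (A - I_bar)(A - I_bar)^T
    # Truncated transform, Eq. (2): coordinates of the training data in the subspace.
    Y_L = U_L.T @ (A - I_bar[:, None])
    return I_bar, U_L, eigvals, Y_L
```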

3.2 Updating of eigenbases and mean

While tracking feature points, the training data are incremental. A modified SKL algorithm11) is introduced to update the eigenbases as well as the mean when more data arrive. We provide a brief review of this algorithm.

Assume a $K \times m$ matrix $B$ of new images is available, forming a larger $K \times (n+m)$ matrix $C = [A \ B]$, in which $A$ is the $K \times n$ matrix defined in Sect. 3.1. The goal is to calculate the SVD $(C - \bar{I}_C) = U'\Sigma'V'^T$. Ross et al. have proved that the SVD of $(C - \bar{I}_C)$ is equal to the SVD of the horizontal concatenation of $(A - \bar{I}_A)$, $(B - \bar{I}_B)$, and one additional vector $\sqrt{nm/(n+m)}\,(\bar{I}_B - \bar{I}_A)$.11) Letting

$$\hat{B} = \bigl[\,(I_{n+1} - \bar{I}_B) \ \cdots \ (I_{n+m} - \bar{I}_B) \ \ \sqrt{nm/(n+m)}\,(\bar{I}_B - \bar{I}_A)\,\bigr],$$

where $I_{n+1}, \ldots, I_{n+m}$ denote the columns of $B$, the concatenation of $(A - \bar{I}_A)$ and $\hat{B}$ can be expressed in a partitioned form as follows:

$$\bigl[\,A - \bar{I}_A \ \ \hat{B}\,\bigr] = \bigl[\,U \ \ \tilde{B}\,\bigr]\begin{bmatrix} \Sigma & U^T\hat{B} \\ 0 & \tilde{B}^T\hat{B} \end{bmatrix}\begin{bmatrix} V^T & 0 \\ 0 & I \end{bmatrix}, \qquad (3)$$

where $[\,U \ \ \tilde{B}\,]$ is the column-orthonormal matrix obtained by performing an orthonormalization process on the matrix $[\,U \ \ \hat{B}\,]$. Let $R = \begin{bmatrix} \Sigma & U^T\hat{B} \\ 0 & \tilde{B}^T\hat{B} \end{bmatrix}$, and let the SVD of $R$ be $R = \tilde{U}\tilde{\Sigma}\tilde{V}^T$. The SVD of $[\,A - \bar{I}_A \ \ \hat{B}\,]$ can now be expressed as

$$\bigl[\,A - \bar{I}_A \ \ \hat{B}\,\bigr] = \bigl(\bigl[\,U \ \ \tilde{B}\,\bigr]\tilde{U}\bigr)\,\tilde{\Sigma}\,\Bigl(\tilde{V}^T\begin{bmatrix} V^T & 0 \\ 0 & I \end{bmatrix}\Bigr). \qquad (4)$$

Finally, the updating process is $U' = [\,U \ \ \tilde{B}\,]\tilde{U}$, $\Sigma' = \tilde{\Sigma}$, and $\bar{I}_C = \frac{n}{n+m}\bar{I}_A + \frac{m}{n+m}\bar{I}_B$.
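A minimal sketch of this update is given below. It is our own illustration, not the authors' code: the QR factorization used for the orthonormalization step and the truncation back to L components afterwards are implementation choices we assume, not details specified by the paper.

```python
import numpy as np

def skl_update(U, s, I_bar_A, n, B_new):
    """Incrementally update the eigenbasis U, singular values s, and mean.

    U, s    : current K x L basis and its singular values, from n previous images.
    I_bar_A : current mean (length K).
    B_new   : K x m matrix of new vectorized feature images.
    """
    K, m = B_new.shape
    I_bar_B = B_new.mean(axis=1)
    # Concatenate the centred new images and the extra mean-correction vector (B_hat).
    B_hat = np.column_stack([B_new - I_bar_B[:, None],
                             np.sqrt(n * m / (n + m)) * (I_bar_B - I_bar_A)])
    # Component of B_hat orthogonal to the current subspace, then orthonormalize it.
    B_proj = U.T @ B_hat
    B_tilde, _ = np.linalg.qr(B_hat - U @ B_proj)
    # Small matrix R from Eq. (3) and its SVD, Eq. (4).
    R = np.block([[np.diag(s), B_proj],
                  [np.zeros((B_tilde.shape[1], s.size)), B_tilde.T @ B_hat]])
    U_tilde, s_new, _ = np.linalg.svd(R, full_matrices=False)
    U_new = np.column_stack([U, B_tilde]) @ U_tilde
    # Truncate back to the leading L components before the next update.
    L = s.size
    U_new, s_new = U_new[:, :L], s_new[:L]
    # Updated mean: weighted combination of the old and new means.
    I_bar_C = (n * I_bar_A + m * I_bar_B) / (n + m)
    return U_new, s_new, I_bar_C, n + m
```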

3.3 Feature point tracking

Assume $J(\mathbf{x}+\mathbf{u})$ is the feature image in the new frame, and $\mathbf{u}$ is the affine motion of the feature at point $\mathbf{x} = [x \ y]^T$, which can be expressed as

$$\mathbf{u} = D\mathbf{x} + \mathbf{d}, \qquad (5)$$

where $D$ is the $2 \times 2$ deformation matrix, $D = \begin{bmatrix} d_{xx} & d_{xy} \\ d_{yx} & d_{yy} \end{bmatrix}$, and $\mathbf{d}$ is the translation of the feature window's center, $\mathbf{d} = [d_x \ d_y]^T$. Feature tracking means determining the six parameters that appear in the deformation matrix $D$ and the displacement vector $\mathbf{d}$.

Convert $J(\mathbf{x}+\mathbf{u})$ into a column vector $\mathbf{J}(\mathbf{x}+\mathbf{u})$ in column-major order by vectorization:

$$\mathbf{J}(\mathbf{x}+\mathbf{u}) = \mathrm{vec}(J(\mathbf{x}+\mathbf{u})). \qquad (6)$$

The columns of $U_L$ defined in Eq. (2) span an eigenspace centered at $\bar{I}$, where $\bar{I}$ is the mean of the training images. Given a feature sample $\mathbf{J}(\mathbf{x}+\mathbf{u})$, we define $\varepsilon$ as the distance from the feature sample to the eigenspace center. Since the sample is assumed to be generated from the eigenspace, the smaller $\varepsilon$ is, the more likely the sample is the target feature. This idea is similar in spirit to Ref. 15. In this view, $\varepsilon$ attains a local minimum when the sample is exactly correct, so we minimize $\varepsilon$ to track the feature point in the new frame. $\varepsilon$ can be expressed as

$$\varepsilon^2 = \frac{1}{s^2}\underbrace{\bigl\|(\mathbf{J}(\mathbf{x}+\mathbf{u}) - \bar{I}) - U_L U_L^T(\mathbf{J}(\mathbf{x}+\mathbf{u}) - \bar{I})\bigr\|^2}_{d_t^2} + \underbrace{\bigl\|\Sigma_L'^{-1} U_L^T(\mathbf{J}(\mathbf{x}+\mathbf{u}) - \bar{I})\bigr\|^2}_{d_w^2}. \qquad (7)$$

The distance $\varepsilon$ consists of two components. The first is the distance-to-subspace, $d_t$, which is the distance from the sample to its projection onto the eigenspace. It has the same form as the energy function mentioned in Refs. 15 and 16. The second is the distance-within-subspace, $d_w$, which is the Mahalanobis distance from the projected sample to the subspace center. $\Sigma_L'$ is a diagonal matrix with values $\sqrt{\sigma_1}, \ldots, \sqrt{\sigma_L}$, the square roots of the eigenvalues in $\Sigma_L$. $d_t$ is scaled by $s^2$; as $s^2$ increases, the contribution of $d_t$ is reduced. The maximum-likelihood estimator for $s^2$ is given by Tipping and Bishop,17)

$$s^2 = \frac{1}{K - L}\sum_{i=L+1}^{K} \sigma_i, \qquad (8)$$

which has a clear interpretation as the eigenvalues "lost" in the projection, averaged over the lost dimensions.

Fig. 1. Sample images of the feature point are shown along with the first five columns of $U$.
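A direct transcription of Eqs. (7) and (8) into NumPy might look like the following sketch; the function and argument names are ours, and the full eigenvalue spectrum is assumed to be available for estimating s².

```python
import numpy as np

def eigenspace_distance(J_vec, I_bar, U_L, eigvals_all, L):
    """Squared distance of a vectorized sample to the eigenspace, Eqs. (7) and (8).

    J_vec       : vectorized candidate patch J(x + u).
    I_bar       : eigenspace center (mean of the training images).
    U_L         : K x L truncated eigenbasis.
    eigvals_all : eigenvalues in descending order (length >= L), sigma_i in the text.
    """
    K = U_L.shape[0]
    diff = J_vec - I_bar
    coeff = U_L.T @ diff                          # coordinates within the subspace
    # Distance-to-subspace d_t^2: residual after projection onto the eigenspace.
    d_t2 = np.sum((diff - U_L @ coeff) ** 2)
    # Distance-within-subspace d_w^2: Mahalanobis distance to the subspace center,
    # i.e. ||Sigma'_L^{-1} U_L^T (J - I_bar)||^2 with Sigma'_L = diag(sqrt(sigma_i)).
    d_w2 = np.sum(coeff ** 2 / eigvals_all[:L])
    # Eq. (8): average of the eigenvalues lost by the truncation (small floor added).
    s2 = max(np.sum(eigvals_all[L:]) / (K - L), 1e-12)
    return d_t2 / s2 + d_w2
```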

Assuming the motion $\mathbf{u}$ to be small, the term $J(\mathbf{x}+\mathbf{u})$ can be approximated by its Taylor series expansion truncated to the linear term:

$$J(\mathbf{x}+\mathbf{u}) = J(\mathbf{x}) + \mathbf{g}^T(\mathbf{x})\,\mathbf{u}, \qquad (9)$$

where

$$\mathbf{g} = \begin{bmatrix} \dfrac{\partial J}{\partial x} & \dfrac{\partial J}{\partial y} \end{bmatrix}^T. \qquad (10)$$

Define

$$\mathbf{d} = [\,d_{xx} \ \ d_{yx} \ \ d_{xy} \ \ d_{yy} \ \ d_x \ \ d_y\,]^T, \qquad (11)$$

which is a vector that collects the unknown entries of the deformation $D$ and the displacement $\mathbf{d}$. Then we define $h(\mathbf{x})$ (see Appendix) as

$$h(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial J(\mathbf{x})}{\partial x}x & \dfrac{\partial J(\mathbf{x})}{\partial y}x & \dfrac{\partial J(\mathbf{x})}{\partial x}y & \dfrac{\partial J(\mathbf{x})}{\partial y}y & \dfrac{\partial J(\mathbf{x})}{\partial x} & \dfrac{\partial J(\mathbf{x})}{\partial y} \end{bmatrix}, \qquad (12)$$

which satisfies the equation

$$h(\mathbf{x})\,\mathbf{d} = \mathbf{g}^T(\mathbf{x})\,\mathbf{u}. \qquad (13)$$
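For concreteness, the rows of h(x) in Eq. (12) can be assembled from the image gradients of the feature window, one row per pixel, so that the stacked matrix linearizes the whole window at once. The sketch below is ours (the helper name, the centered pixel coordinates, and the use of np.gradient are assumptions).

```python
import numpy as np

def affine_jacobian(patch):
    """Assemble h(x) of Eq. (12) for every pixel of a feature window.

    Returns an (N_pixels x 6) matrix whose rows are [Jx*x, Jy*x, Jx*y, Jy*y, Jx, Jy],
    matching the parameter order d = [d_xx, d_yx, d_xy, d_yy, d_x, d_y]^T.
    """
    h, w = patch.shape
    Jy, Jx = np.gradient(patch.astype(float))   # Jy = dJ/dy (rows), Jx = dJ/dx (cols)
    # Pixel coordinates relative to the window center, matching x = [x y]^T.
    ys, xs = np.mgrid[0:h, 0:w]
    xs = (xs - (w - 1) / 2.0).ravel()
    ys = (ys - (h - 1) / 2.0).ravel()
    Jx, Jy = Jx.ravel(), Jy.ravel()
    return np.column_stack([Jx * xs, Jy * xs, Jx * ys, Jy * ys, Jx, Jy])
```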

Then

$$J(\mathbf{x}+\mathbf{u}) = J(\mathbf{x}) + h(\mathbf{x})\,\mathbf{d}. \qquad (14)$$

Combining Eq. (14) with Eq. (7) yields the following equation:

$$\varepsilon^2 = \frac{1}{s^2}\bigl\|(\mathbf{J}(\mathbf{x}) - \bar{I}) - U_L U_L^T(\mathbf{J}(\mathbf{x}) - \bar{I}) + [h(\mathbf{x}) - U_L U_L^T h(\mathbf{x})]\,\mathbf{d}\bigr\|^2 + \bigl\|\Sigma_L'^{-1} U_L^T(\mathbf{J}(\mathbf{x}) - \bar{I}) + \Sigma_L'^{-1} U_L^T h(\mathbf{x})\,\mathbf{d}\bigr\|^2. \qquad (15)$$

Define

$$E = h(\mathbf{x}) - U_L U_L^T h(\mathbf{x}), \qquad (16)$$
$$F = (\mathbf{J}(\mathbf{x}) - \bar{I}) - U_L U_L^T(\mathbf{J}(\mathbf{x}) - \bar{I}), \qquad (17)$$
$$M = \Sigma_L'^{-1} U_L^T h(\mathbf{x}), \qquad (18)$$
$$N = \Sigma_L'^{-1} U_L^T(\mathbf{J}(\mathbf{x}) - \bar{I}), \qquad (19)$$

then Eq. (7) can be simplified as

$$\varepsilon^2 = \frac{1}{s^2}\|F + E\mathbf{d}\|^2 + \|N + M\mathbf{d}\|^2. \qquad (20)$$

The tracking problem can be defined as finding the $\mathbf{d}$ that minimizes $\varepsilon^2$, which we solve by means of a Gauss–Newton style iterative search for the optimal values. Setting the derivative to zero yields the matrix equation

$$\frac{\partial \varepsilon^2}{\partial \mathbf{d}} = \frac{2}{s^2}E^T(F + E\mathbf{d}) + 2M^T(N + M\mathbf{d}) = 0. \qquad (21)$$

Then we obtain $\mathbf{d}$ by solving

$$\Bigl(\frac{1}{s^2}E^T E + M^T M\Bigr)\mathbf{d} = -\Bigl(\frac{1}{s^2}E^T F + M^T N\Bigr). \qquad (22)$$

Because of the linearization in Eq. (9), the solution $\mathbf{d}$ is only approximate. However, Eq. (22) can be solved iteratively. Set $D_0 = 0$, $\mathbf{d}_0 = 0$, and $J_0 = J(\mathbf{x})$ as the initial estimates. At the $i$-th iteration, we solve Eq. (22) combined with Eqs. (16)–(19), updated by $J_i = J_{i-1}(\mathbf{x} + D_{i-1}\mathbf{x} + \mathbf{d}_{i-1})$.
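Each Gauss–Newton step then reduces to the small linear solve of Eq. (22). The sketch below assumes the affine_jacobian helper above and the eigenspace quantities of Sect. 3.1; it is our own illustration, not the authors' implementation.

```python
import numpy as np

def gauss_newton_step(patch, I_bar, U_L, eigvals_L, s2):
    """Solve Eq. (22) for the parameter update at the current estimate."""
    J_vec = patch.reshape(-1).astype(float)
    H = affine_jacobian(patch)                   # h(x), one row per pixel
    proj = lambda A: A - U_L @ (U_L.T @ A)       # apply (I - U_L U_L^T)
    inv_sqrt = 1.0 / np.sqrt(eigvals_L)          # diagonal of Sigma'_L^{-1}
    E = proj(H)                                  # Eq. (16)
    F = proj(J_vec - I_bar)                      # Eq. (17)
    M = inv_sqrt[:, None] * (U_L.T @ H)          # Eq. (18)
    N = inv_sqrt * (U_L.T @ (J_vec - I_bar))     # Eq. (19)
    A = E.T @ E / s2 + M.T @ M                   # left-hand side of Eq. (22)
    b = -(E.T @ F / s2 + M.T @ N)                # right-hand side of Eq. (22)
    return np.linalg.solve(A, b)                 # [d_xx, d_yx, d_xy, d_yy, d_x, d_y]

# The tracker repeats this step, warping the window by the accumulated D and d
# (J_i = J_{i-1}(x + D_{i-1} x + d_{i-1})) until the update is small or the
# iteration limit is reached.
```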

During tracking, the affine deformation $D$ of the feature window is likely to be small. It is effective to reduce the number of parameters in $D$; $D$ can then be simplified to $D = \begin{bmatrix} d_{xx} & -d_{yx} \\ d_{yx} & d_{xx} \end{bmatrix}$ with two parameters, which supports a nonreflective similarity transformation (rotation and isotropic scaling). In this situation, we define (see Appendix)

$$h_{ns}(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial J(\mathbf{x})}{\partial x}x + \dfrac{\partial J(\mathbf{x})}{\partial y}y & \dfrac{\partial J(\mathbf{x})}{\partial y}x - \dfrac{\partial J(\mathbf{x})}{\partial x}y & \dfrac{\partial J(\mathbf{x})}{\partial x} & \dfrac{\partial J(\mathbf{x})}{\partial y} \end{bmatrix}, \qquad (23)$$

to get

$$\mathbf{d}_{ns} = [\,d_{xx} \ \ d_{yx} \ \ d_x \ \ d_y\,]^T. \qquad (24)$$

In the most simplified form, $D$ is set to zero for pure translation. Similarly, we define

$$h_{pt}(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial J(\mathbf{x})}{\partial x} & \dfrac{\partial J(\mathbf{x})}{\partial y} \end{bmatrix}, \qquad (25)$$

to get

$$\mathbf{d}_{pt} = [\,d_x \ \ d_y\,]^T. \qquad (26)$$

3.4 Multi-scale tracking

At every tracking step, we set $J_0 = J(\mathbf{x})$ as the initial estimate, where $\mathbf{x}$ is the location obtained by tracking in the previous frame. If the feature motion between adjacent frames is too large, the initial estimate will be far from the real value, and the Gauss–Newton style search method then often gets stuck in local minima or is distracted by outliers, as described in Ref. 11.

To solve this problem, a coarse-to-fine tracking strategy is used. At the coarse level, large and discontinuous feature motions are smoothed into small and continuous ones, so the feature points can be located roughly at the coarse level and adjusted accurately at the fine level.

In the coarse-to-fine strategy, we construct a pyramid of images by spatial filtering and subsampling for each image in the training set. Then, for each level of the pyramid, we construct an eigenspace description, as illustrated in Fig. 2. During the tracking process, we first track the feature points at the coarse level, where the physical distance between the initial guess and the real location is small. The estimated affine transformation parameters are then projected to the next level as initial estimates, and tracking at that level refines them. The process continues down to the finest level.
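A rough sketch of the pyramid construction and the coarse-to-fine loop follows; the Gaussian filter, the subsample factor of 2, and the placeholder per-level tracking routine are our own assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_pyramid(image, levels=2, factor=2):
    """Spatially filter and subsample the image for each pyramid level (index 0 = finest)."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        smoothed = gaussian_filter(pyramid[-1], sigma=1.0)
        pyramid.append(smoothed[::factor, ::factor])
    return pyramid

def coarse_to_fine_track(pyramid, x_init, track_at_level, factor=2):
    """Track at the coarsest level first, then project the estimate to finer levels.

    track_at_level(image, x) is a placeholder for the per-level eigenspace tracker
    of Sect. 3.3, built on that level's own eigenbasis.
    """
    x = np.asarray(x_init, dtype=float) / factor ** (len(pyramid) - 1)
    for level in reversed(range(len(pyramid))):
        x = track_at_level(pyramid[level], x)   # refine the estimate at this level
        if level > 0:
            x = x * factor                      # project to the next (finer) level
    return x
```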

Figure 3 shows an example of the tracking process in the coarse-to-fine strategy. The initial location of the feature point [yellow window in Fig. 3(a)] is guessed far away from the real value [blue window in Fig. 3(a)] because of occlusions in the previous frames. In Figs. 3(b), 3(c), and 3(d), the gray value of each pixel stands for the probability that the feature point is located at the corresponding pixel; the brighter an image region is, the more likely it is the location of the feature point. As shown by the red plot in Fig. 3(b), the original Gauss–Newton method gets stuck in a local minimum. With the coarse-to-fine model, however, our tracking method overcomes the local minimum at the coarse level, as shown by the green plot in Fig. 3(c), and converges to the right result at the fine level, as shown by the red plot in Fig. 3(d).

3.5 Summary of the feature tracking algorithm

We now provide a summary of the proposed feature tracking algorithm in Table 1. At the very beginning, when the eigenbases are empty (i.e., before the first update), our tracker works as a KLT tracker. At every tracking step, we estimate the likelihood of the tracking result. According to Ref. 15, the likelihood of an input feature image $\mathbf{J}$ being generated from the eigenspace can be expressed as

$$P(\mathbf{J}\,|\,\mathbf{x}+\mathbf{u}) = \exp\Bigl(-\frac{1}{s^2}\bigl\|(\mathbf{J}(\mathbf{x}+\mathbf{u}) - \bar{I}) - U_L U_L^T(\mathbf{J}(\mathbf{x}+\mathbf{u}) - \bar{I})\bigr\|^2\Bigr)\exp\Bigl(-\bigl\|\Sigma_L'^{-1} U_L^T(\mathbf{J}(\mathbf{x}+\mathbf{u}) - \bar{I})\bigr\|^2\Bigr). \qquad (27)$$

If the likelihood $P(\mathbf{J}\,|\,\mathbf{x}+\mathbf{u})$ is small, we believe some occlusion has happened, and we do not make use of the tracking result until we trust the likelihood again.
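In code, the occlusion test based on Eq. (27) can be sketched as follows, with the 0.6 threshold from the paper and function names of our own choosing:

```python
import numpy as np

def tracking_likelihood(J_vec, I_bar, U_L, eigvals_L, s2):
    """Likelihood of the tracked sample being generated from the eigenspace, Eq. (27)."""
    diff = J_vec - I_bar
    coeff = U_L.T @ diff
    d_t2 = np.sum((diff - U_L @ coeff) ** 2)     # distance-to-subspace term
    d_w2 = np.sum(coeff ** 2 / eigvals_L)        # distance-within-subspace term
    return np.exp(-d_t2 / s2) * np.exp(-d_w2)

def accept_result(J_vec, I_bar, U_L, eigvals_L, s2, threshold=0.6):
    """Store the result only if the likelihood is trusted; otherwise assume occlusion."""
    return tracking_likelihood(J_vec, I_bar, U_L, eigvals_L, s2) > threshold
```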

After feature point tracking, point pairs are matched between two frames. Based on these point pairs, we use the Least Median of Squares (LMS) algorithm18) to estimate the geometric transformation of the whole object. The advantage of LMS is that it requires no setting of thresholds or a priori knowledge of the variance of the error.
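A minimal sketch of a Least-Median-of-Squares fit over the matched point pairs is shown below; the two-point similarity model and the fixed number of random trials are our own choices, not the paper's, and Ref. 18 should be consulted for the full method.

```python
import numpy as np

def estimate_similarity(src, dst):
    """Similarity transform (scale, rotation, translation) from two point pairs."""
    # Represent points as complex numbers: dst = a*src + b with complex a, b.
    s = src[:, 0] + 1j * src[:, 1]
    d = dst[:, 0] + 1j * dst[:, 1]
    a = (d[1] - d[0]) / (s[1] - s[0])
    return a, d[0] - a * s[0]

def lms_transform(src, dst, n_trials=200, seed=None):
    """Least Median of Squares: keep the candidate with the smallest median residual."""
    rng = np.random.default_rng(seed)
    best, best_med = None, np.inf
    src_c = src[:, 0] + 1j * src[:, 1]
    dst_c = dst[:, 0] + 1j * dst[:, 1]
    for _ in range(n_trials):
        idx = rng.choice(len(src), size=2, replace=False)   # minimal sample
        a, b = estimate_similarity(src[idx], dst[idx])
        residuals = np.abs(a * src_c + b - dst_c) ** 2
        med = np.median(residuals)
        if med < best_med:
            best, best_med = (a, b), med
    return best
```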

4. Simulations

In this section, we present simulation results to compare our tracking algorithm with the KLT tracker and the IVT method. The IVT method is modified so that it stops if the likelihood expressed in Eq. (27) is smaller than 0.6.


Fig. 3. (Color online) (a) The feature for tracking. (b) Searching process by the original Gauss–Newton method in the probability map. (c) Searching process in the probability map at the coarse level. (d) Searching process in the probability map at the fine level.

Fig. 2. Example of a multi-scale eigenspace representation. The top row shows four sample images of the feature point in a three-level pyramid; the bottom row shows the first four eigenbases reshaped as images in each eigenpyramid.


Table 1. A summary of the proposed feature tracking algorithm.

The eigenspace-based feature tracking algorithm:
1. Locate the feature point in the first frame, either manually or by using a feature detector (e.g., the Harris interest point detector).
2. Initialize the eigenbases $U$ to be empty, and the mean $\bar{I}$ to be the appearance of the feature point in the first frame.
3. If the eigenbases $U$ are empty, track the feature point with the KLT tracker; otherwise track it with the coarse-to-fine iterative algorithm. For each level in the pyramid, from coarse to fine: (i) perform $k$ iterations of the update to produce an updated estimate of the transformation $D$ and $\mathbf{d}$; (ii) project $D$ and $\mathbf{d}$ to the next level in the pyramid; (iii) warp the input image by $D$ and $\mathbf{d}$; (iv) repeat.
4. Estimate the likelihood of the tracking result being generated from the eigenspace. If the likelihood is larger than some threshold (set to 0.6 in our experiments), store the tracking result; otherwise assume occlusion happened and discard the result.
5. When the desired number of stored new images has been accumulated, perform an incremental update of the eigenbases and mean. In our experiments, the update is performed every third frame.
6. Estimate the geometric transformation of the whole object by LMS.
7. Go to step 3.

Fig. 4. (Color online) Simulation results. The red window stands for the results of our proposed tracker, the green one for the KLT tracker, and the yellow one for the IVT tracker.


The $71 \times 71$ target image, shown as the first frame (#1) in Fig. 4, is warped with simulated affine parameters generated with random noise (we set $D \sim \begin{bmatrix} N(1, 1/20^2) & N(0, 1/20^2) \\ N(0, 1/20^2) & N(1, 1/20^2) \end{bmatrix}$ and $\mathbf{d} \sim \begin{bmatrix} N(0, 1) \\ N(0, 1) \end{bmatrix}$ in our experiment). We track the $11 \times 11$ feature point at the target center and compare with the true parameters. A two-level pyramid is used in both our tracking algorithm and the KLT tracker. For each level, the maximal iteration number is set to 10 and the subsample factor is set to 4. In our algorithm and the IVT method, the number of eigenvectors is set to 16, and the update of the eigenbases and mean is performed every third frame. In the IVT method, the template size is set to the default $32 \times 32$ and the number of particles is set to 600.
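The simulated warp parameters can be drawn as in the following short sketch of the stated distributions (the warping of the target image itself is omitted):

```python
import numpy as np

def sample_affine(seed=None):
    """Draw D and d from the distributions used in the simulation."""
    rng = np.random.default_rng(seed)
    # N(mu, sigma^2) with sigma = 1/20 for the deformation entries.
    D = np.array([[rng.normal(1.0, 1 / 20), rng.normal(0.0, 1 / 20)],
                  [rng.normal(0.0, 1 / 20), rng.normal(1.0, 1 / 20)]])
    d = rng.normal(0.0, 1.0, size=2)            # translation ~ N(0, 1) per component
    return D, d
```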

The total number of frames is set to 60, and at the 40th, 41st, and 42nd frames we darken the target image to simulate an occlusion. Figure 4 shows the tracking results, where the red window stands for the results of our proposed tracker, the green one for the KLT tracker, and the yellow one for the IVT tracker. The errors of the estimated affine parameters are defined as $\|\Delta D\|$ and $\|\Delta \mathbf{d}\|$, where $\Delta D$ and $\Delta \mathbf{d}$ are the differences between the estimates and the real data. Figure 5 plots $\|\Delta D\|$ and $\|\Delta \mathbf{d}\|$ as a function of the frame number.

As shown in Fig. 5, all of the algorithms have similar performance on translation estimation, while our algorithm and the KLT tracker estimate the deformation parameters more accurately. When the simulated occlusion happens at the 40th frame, our algorithm computes the likelihood of the result being generated from the eigenspace and stops tracking since the likelihood is small, as shown in Fig. 6. The target image resumes at the 43rd frame; our algorithm then relocates the feature point, whose likelihood is trusted again, and goes on tracking. The modified IVT method handles occlusions by the same process as our algorithm; however, its deformation estimation is wrong after the occlusions, as shown by the yellow window in Fig. 4 (#43, #50, #60). The KLT tracker fails because of the occlusions.

Figure 7 shows the computation time of each tracker (implemented in Matlab). Our algorithm is more computationally efficient than the IVT tracker, costing only a little more time than the KLT tracker.

5. Experiments on Real Data

In the experiments, the proposed feature tracker is tested in different scenarios. During tracking, we track $11 \times 11$ features using a two-level pyramid with the subsample factor set to 2. For each level, the maximal iteration number is set to 10. The number of eigenvectors is set to 16. The batch size for the eigenbases update is set to 3, and the likelihood threshold of the tracking result is set to 0.6. For multiple feature point tracking, we use LMS to estimate the geometric transformation of the whole object.

We first tested our algorithm using a challenging video studied in Ref. 11. Figure 8 shows the empirical results using the proposed method. Since the affine deformation of the features in this video is small, we employ the pure-translation form described by Eq. (26). Our tracker tracks the features with red windows, while the KLT tracker tracks them with green ones for comparison. The face is tracked by a red dashed window.

Fig. 6. (Color online) The history plot of the likelihood of the result being generated from the eigenspace in our algorithm.

Fig. 5. (Color online) The history plot of tracking errors: (a) $\|\Delta D\|$; (b) $\|\Delta \mathbf{d}\|$.

Fig. 7. (Color online) The history plot of the computation time.


Fig. 8. (Color online) Tracking results of a human face undergoing temporary occlusion. The red window stands for the results of our proposed tracker, while the green one stands for the KLT tracker. The yellow window means the result is not trusted in our algorithm. The face is tracked by a red dashed window.

Fig. 9. (Color online) Tracking results of a box undergoing temporary occlusion and a large pose change. The red window stands for the results of our proposed tracker, while the green one stands for the KLT tracker. The yellow window means the result is not trusted in our algorithm. The symbol on the box is tracked by a red dashed window.


When the red window turns yellow, it means the result is not trusted. The yellow window stays put since no prediction algorithm is used. Note that our method stops tracking under temporary occlusion (#29, #86, #174) and structured appearance change such as winks (#59), and is able to keep tracking when the feature points reappear (#67, #146, #238). Because of the robustness of our feature tracking algorithm, the LMS method is able to track the face under occlusions, even when there are only two pairs of feature points (#86).

The second image sequence, shown in Fig. 9, contains a box moving in different poses and under temporary occlusions. We employ the nonreflective similarity transformation described by Eq. (24). Our algorithm is able to track the features as the target box rotates by a large angle, and to keep tracking after occlusions (#265). The colors of the tracking windows have the same meaning as in Fig. 8.

Figure 10 shows the results of tracking feature points on a moving vehicle, captured using an infrared (IR) camera, as it passes by trees. Although there are occlusions and the IR images have low resolution and poor contrast, our algorithm is able to track the feature points well.

6. Conclusion

In this paper, we have proposed a robust feature point tracking algorithm that incrementally learns a low-dimensional eigenbasis representation. The affine transform of the feature point is computed iteratively with a coarse-to-fine processing strategy. Our experiments demonstrate the effectiveness of the proposed tracker in indoor and outdoor environments where the target features undergo large pose changes and temporary occlusions.

Although our feature tracker performs well, it occasionally fails if the feature is poor. Since our feature tracker works on each point individually, a more robust global optimization method such as RANSAC19) for removing outliers could further enhance the robustness of the proposed algorithm, and an efficient prediction technique could help our tracker relocate the feature quickly and accurately. We aim to address these issues in our future work.

Fig. 10. (Color online) Tracking results of a vehicle captured using an IR camera and undergoing temporary occlusions. The red window stands for the results of our proposed tracker. The yellow window means the result is not trusted. The car is tracked by a red dashed window.


Acknowledgements

This work was supported by General Armament Department Pre-Research Foundation, China (40405050303). We thank the reviewers and editors for their comments and suggestions.

Appendix

For $D = \begin{bmatrix} d_{xx} & d_{xy} \\ d_{yx} & d_{yy} \end{bmatrix}$, we have

$$\mathbf{g}^T(\mathbf{x})\,\mathbf{u} = \begin{bmatrix} \dfrac{\partial J(\mathbf{x})}{\partial x} & \dfrac{\partial J(\mathbf{x})}{\partial y} \end{bmatrix}\left(\begin{bmatrix} d_{xx} & d_{xy} \\ d_{yx} & d_{yy} \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} d_x \\ d_y \end{bmatrix}\right) = \frac{\partial J(\mathbf{x})}{\partial x}x\,d_{xx} + \frac{\partial J(\mathbf{x})}{\partial x}y\,d_{xy} + \frac{\partial J(\mathbf{x})}{\partial x}d_x + \frac{\partial J(\mathbf{x})}{\partial y}x\,d_{yx} + \frac{\partial J(\mathbf{x})}{\partial y}y\,d_{yy} + \frac{\partial J(\mathbf{x})}{\partial y}d_y = h(\mathbf{x})\,\mathbf{d},$$

where

$$h(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial J(\mathbf{x})}{\partial x}x & \dfrac{\partial J(\mathbf{x})}{\partial y}x & \dfrac{\partial J(\mathbf{x})}{\partial x}y & \dfrac{\partial J(\mathbf{x})}{\partial y}y & \dfrac{\partial J(\mathbf{x})}{\partial x} & \dfrac{\partial J(\mathbf{x})}{\partial y} \end{bmatrix}, \qquad (A\cdot1)$$
$$\mathbf{d} = [\,d_{xx} \ \ d_{yx} \ \ d_{xy} \ \ d_{yy} \ \ d_x \ \ d_y\,]^T. \qquad (A\cdot2)$$

Similarly, for $D = \begin{bmatrix} d_{xx} & -d_{yx} \\ d_{yx} & d_{xx} \end{bmatrix}$, we have

$$\mathbf{g}^T(\mathbf{x})\,\mathbf{u} = \begin{bmatrix} \dfrac{\partial J(\mathbf{x})}{\partial x} & \dfrac{\partial J(\mathbf{x})}{\partial y} \end{bmatrix}\left(\begin{bmatrix} d_{xx} & -d_{yx} \\ d_{yx} & d_{xx} \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} d_x \\ d_y \end{bmatrix}\right) = \left(\frac{\partial J(\mathbf{x})}{\partial x}x + \frac{\partial J(\mathbf{x})}{\partial y}y\right)d_{xx} + \left(\frac{\partial J(\mathbf{x})}{\partial y}x - \frac{\partial J(\mathbf{x})}{\partial x}y\right)d_{yx} + \frac{\partial J(\mathbf{x})}{\partial x}d_x + \frac{\partial J(\mathbf{x})}{\partial y}d_y = h_{ns}(\mathbf{x})\,\mathbf{d}_{ns},$$

where

$$h_{ns}(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial J(\mathbf{x})}{\partial x}x + \dfrac{\partial J(\mathbf{x})}{\partial y}y & \dfrac{\partial J(\mathbf{x})}{\partial y}x - \dfrac{\partial J(\mathbf{x})}{\partial x}y & \dfrac{\partial J(\mathbf{x})}{\partial x} & \dfrac{\partial J(\mathbf{x})}{\partial y} \end{bmatrix}, \qquad (A\cdot3)$$
$$\mathbf{d}_{ns} = [\,d_{xx} \ \ d_{yx} \ \ d_x \ \ d_y\,]^T. \qquad (A\cdot4)$$

For $D = 0$, $\mathbf{g}^T(\mathbf{x})\,\mathbf{u} = \begin{bmatrix} \dfrac{\partial J(\mathbf{x})}{\partial x} & \dfrac{\partial J(\mathbf{x})}{\partial y} \end{bmatrix}\begin{bmatrix} d_x \\ d_y \end{bmatrix} = h_{pt}(\mathbf{x})\,\mathbf{d}_{pt}$, where

$$h_{pt}(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial J(\mathbf{x})}{\partial x} & \dfrac{\partial J(\mathbf{x})}{\partial y} \end{bmatrix}, \qquad (A\cdot5)$$
$$\mathbf{d}_{pt} = [\,d_x \ \ d_y\,]^T. \qquad (A\cdot6)$$

References

1) A. Yilmaz, O. Javed, and M. Shah: ACM Comput. Surv. 38 (2006) 13.
2) M. J. Black and A. D. Jepson: Int. J. Comput. Vision 26 (1998) 63.
3) J. Shi and C. Tomasi: presented at IEEE Computer Society Conf. Computer Vision and Pattern Recognition, 1994.
4) C. Tomasi and T. Kanade: Detection and Tracking of Point Features (Carnegie Mellon University Press, Pittsburgh, PA, 1991) p. 4.
5) B. D. Lucas and T. Kanade: presented at Int. Joint Conf. Artificial Intelligence, 1981.
6) V. Salari and I. K. Sethi: IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 87.
7) D. Chetverikov and J. Verestoi: Computing 62 (1999) 321.
8) C. Andrey and L. Andrey: presented at IEEE Congr. Evolutionary Computation, 2009.
9) K. Fukunaga: Introduction to Statistical Pattern Recognition (Elsevier, Amsterdam, 1990) p. 35.
10) A. Levy and M. Lindenbaum: presented at IEEE Int. Conf. Image Processing, 1998.
11) D. A. Ross, J. Lim, R. S. Lin, and M. H. Yang: Int. J. Comput. Vision 77 (2008) 125.
12) H. Murase and S. K. Nayar: Int. J. Comput. Vision 14 (1995) 5.
13) P. N. Belhumeur and D. J. Kriegman: Int. J. Comput. Vision 28 (1998) 245.
14) I. Jolliffe: Principal Component Analysis (Wiley, Hoboken, NJ, 2005) Chap. 1, p. 2.
15) B. Moghaddam and A. Pentland: presented at IEEE Int. Conf. Computer Vision, 1995.
16) F. De La Torre and M. J. Black: Int. J. Comput. Vision 54 (2003) 117.
17) M. E. Tipping and C. M. Bishop: J. R. Stat. Soc., B 61 (1999) 611.
18) R. Hartley and A. Zisserman: Multiple View Geometry in Computer Vision (Cambridge University Press, Cambridge, U.K., 2003) 2nd ed., p. 122.
19) M. A. Fischler and R. C. Bolles: Commun. ACM 24 (1981) 381.
