
[IEEE 2011 International Symposium on Computer Science and Society (ISCCS) - Kota Kinabalu, Malaysia (2011.07.16-2011.07.17)] 2011 International Symposium on Computer Science and Society

Real-Time Visual Tracking Using a New Weight Distribution

Hua Shi, Cuihua Li, Taisong Jin

School of Information Science and Technology,

Xiamen University, Xiamen 361005, Fujian, P.R.China

[email protected]

Abstract—This paper presents a real-time visual tracking algorithm that uses a new weight distribution for color space. First, a first-order Kalman filter model is introduced to update the video background and detect the targets. The HSV color space is used to measure the similarity between candidate targets and the target model. In this process, a weighting function based on pixel confidence and pixel position is proposed to weight the pixel values in the tracked rectangle area. The experimental results show that the algorithm is robust to scale changes, partial occlusion, and interactions of non-rigid objects, especially similar objects. The proposed algorithm is computationally efficient and satisfies the real-time requirements of visual tracking.

Keywords-Visual tracking; Weight distribution; Kalman filter; Partial occlusion

I. INTRODUCTION

Visual tracking is a critical step in many computer vision applications such as video surveillance and image understanding systems. The essence of tracking is to predict hypothetical states from the current state of the objects; the main challenges of visual tracking are illumination changes, shape deformation of the target object, occlusions, etc. As research interest in visual analysis grows, visual tracking is becoming an increasingly important key technology.

With the development of pattern recognition and machine learning, several algorithms based on statistical learning have been proposed. In [1], the Kalman filter is introduced to estimate the internal state of a linear dynamic system from a series of noisy measurements, and it handles linear tracking problems well. Dorin Comaniciu et al. [2] proposed a new method for real-time tracking of non-rigid objects, in which mean shift iterations are used to find the most probable target position in the current frame. Allen et al. [3] extended the mean shift algorithm into the CamShift (continuously adaptive mean shift) algorithm, an adaptation of mean shift for object tracking. In [4], particle filtering proved very successful for non-linear and non-Gaussian estimation problems and is robust for tracking non-rigid objects. David Lowe [5] proposed the scale-invariant feature transform (SIFT), which usually involves fine scale selection, rotation correction, and intensity normalization; the method can be applied to visual tracking-by-detection. In [6], an effective online algorithm is presented: the tracking problem is formulated as a state inference problem within a Markov Chain Monte Carlo framework, and a particle filter is incorporated to propagate sample distributions over time. In [7-9], tracking is treated as a binary classification problem, where an ensemble of weak classifiers or tracking features is trained online to distinguish the object from the background. In [10], by adapting a class-specific object detector to the target, the target can be separated from the background and from other instances.

Assuming a stationary video camera, we employ a tracking algorithm based on a color distribution model to perform multi-object tracking. First, the moving objects are detected accurately, and then a color distribution model is established for each detected area. To improve computational efficiency, a weighting function over the rectangle area is proposed that takes into account both pixel confidence and pixel position. Our goal is a robust, adaptive tracking system for non-rigid objects, so the color distribution model of the foreground object is updated to adapt to appearance changes.

The paper is organized as follows. Section 2 briefly describes moving object detection; Section 3 presents the tracking algorithm and establishes the weight distribution; Section 4 presents experimental results and comparisons; Section 5 summarizes our conclusions.

II. MOVING OBJECT DETECTION

Before tracking, the tracked target must be labeled (this can also be done by manual initialization). In this paper, background subtraction methods [11] are exploited for moving object detection in video sequences.

In the first stage, a pixel-wise median filter over time is applied to N consecutive images during the training period to construct the initial background model. Let μ(x) and σ(x) be the median value and standard deviation of the intensities at pixel location x over the N consecutive images. The initial background model B(x) for a pixel location x is then obtained as B(x) = μ(x).

However, the background model cannot be expected to remain the same for a long period of time. There could be illumination changes, moving scene clutter, multiple moving objects, and other arbitrary changes to the observed scene. To handle these changes and track the moving objects, a robust, adaptive background model must be created.

We use the first-order Kalman filter model [1] to update the background. The model updating equations are:

B_{t+1}(x) = (1 − α) B_t(x) + α I_t(x)   (1)

σ²_{t+1}(x) = (1 − α) σ²_t(x) + α (I_t(x) − B_t(x))²   (2)

where B_t(x) and σ_t(x) stand for the background value and standard deviation at location x at time t, I_t(x) is the pixel value of the current frame at location x at time t, and α is the updating parameter.

978-0-7695-4443-4/11 $26.00 © 2011 IEEE
DOI 10.1109/ISCCS.2011.47
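The background model of Eqs. (1)-(2) can be sketched in a few lines. The following is a minimal Python/NumPy illustration (the function names and the training-set size are ours, not from the paper); the initial model uses the pixel-wise median and variance over the N training frames, as described above.

```python
import numpy as np

# Sketch of the first-order background update of Eqs. (1)-(2), assuming
# grayscale frames stored as float arrays; alpha is the updating parameter.
def update_background(B, sigma2, frame, alpha=0.05):
    """Return the updated background B_{t+1} and variance sigma^2_{t+1}."""
    B_next = (1.0 - alpha) * B + alpha * frame                       # Eq. (1)
    sigma2_next = (1.0 - alpha) * sigma2 + alpha * (frame - B) ** 2  # Eq. (2)
    return B_next, sigma2_next

# Initial model: median and variance over N training frames (Section II).
frames = np.random.rand(10, 240, 320)   # N = 10 synthetic training frames
B = np.median(frames, axis=0)
sigma2 = np.var(frames, axis=0)
B, sigma2 = update_background(B, sigma2, frames[-1])
```

A small alpha makes the background adapt slowly, which suits gradual illumination change; a larger alpha absorbs scene changes faster but risks absorbing slow-moving targets into the background.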

According to the equations above, after the background model is created, the difference image D_t(x) between the present frame I_t(x) and the background model B_t(x) can be obtained, which distinguishes the foreground target from the background. D_t(x) is used in later sections and is defined by:

D_t(x) = |I_t(x) − B_t(x)|   (3)

From the current difference image we obtain a binary mask image M(x): if M(x) equals 1, x is a foreground pixel; otherwise it is a background pixel. This is formulated as follows:

M(x) = 1 if D_t(x) > 2.5 σ_t(x), and M(x) = 0 otherwise.   (4)

However, thresholding alone is not sufficient to obtain clean foreground regions. When the difference between the pixel values of a foreground region and those of the background is very small, tiny cavities or ruptures may appear in the foreground region, so post-processing is required. First, a dilation operator is applied to fill these cavities and ruptures; then region-based noise cleaning eliminates noise regions: if a separated foreground region contains fewer than 20 pixels, it is eliminated. Finally, each object of interest is enclosed by a rectangle.
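The thresholding step of Eqs. (3)-(4) and the dilation can be sketched as follows. This is an illustrative NumPy version (function names are ours); the 4-neighbour dilation via np.roll is a simplification that wraps at image borders, and the connected-component cleaning of small regions is omitted.

```python
import numpy as np

# Sketch of the detection step of Section II.
def foreground_mask(frame, B, sigma):
    D = np.abs(frame - B)      # Eq. (3): difference image
    return D > 2.5 * sigma     # Eq. (4): True = foreground pixel

def dilate(M):
    # One step of 4-neighbour binary dilation (np.roll wraps at the borders,
    # which a real implementation would handle explicitly).
    out = M.copy()
    for axis in (0, 1):
        for shift in (-1, 1):
            out |= np.roll(M, shift, axis=axis)
    return out
```

In practice, the labeling pass that removes regions under 20 pixels would follow the dilation, before fitting the bounding rectangles.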

III. MOVING OBJECT TRACKING

In this paper, a color distribution model is used for moving object tracking. In general, the RGB space does not match human color perception: the distance between two points in RGB space does not indicate their perceptual similarity. Therefore the HSV color space is applied to object tracking, and each pixel is described by the 3-D model I(i) = [H(i), S(i), V(i)]^T, where H is hue, S is saturation, and V is the pixel value.
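The RGB-to-HSV mapping is standard; for illustration, the [H, S, V]^T description of a pixel can be obtained with Python's stdlib colorsys (the helper name is ours):

```python
import colorsys

# Sketch: convert an RGB pixel (channel values in [0, 1]) to the 3-D model
# I(i) = [H(i), S(i), V(i)]^T used for tracking.
def pixel_hsv(r, g, b):
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return [h, s, v]
```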

A. Weight Distribution Model

Suppose that the distributions are discretized into m bins. In our experiments we use 8×8×8 bins (8×8×4 bins, with less sensitivity to V, could be used instead). h(i) is the histogram function that assigns the color at pixel i to the corresponding bin.
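A simple way to realize h(i) for the 8×8×8 quantization is to quantize each HSV channel independently and flatten the three indices into one bin number in [0, m). The sketch below assumes H, S, V are already scaled to [0, 1); the function name is ours.

```python
# Sketch of the histogram function h(i) for an m-bin HSV quantization
# (8x8x8 bins by default, 8x8x4 for less sensitivity to V).
def bin_index(h, s, v, bins=(8, 8, 8)):
    bh = min(int(h * bins[0]), bins[0] - 1)
    bs = min(int(s * bins[1]), bins[1] - 1)
    bv = min(int(v * bins[2]), bins[2] - 1)
    # Flatten the three channel indices into a single bin in [0, m).
    return (bh * bins[1] + bs) * bins[2] + bv
```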

To reduce computational complexity, we make use of the rectangle area S = {(x, y), (Δx, Δy), (Hx, Hy)}, where (x, y) is the coordinate of the current pixel, (Δx, Δy) is the acceleration of the pixel, and (Hx, Hy) are the lengths of the half axes of the rectangle. In our experiments the goal is to track people, so (Hx, Hy) is constrained accordingly.

During tracking, a weighting function is proposed to assign heavier weights to the reliable pixels:

w(i) = (1 − r_i²) · D(i)   (5)

r_i = max{ Δx / (1.1 Hx), Δy / (1.1 Hy) }   (6)

D(i) = log(D_t(i) / THRE) if D_t(i) > THRE, and D(i) = 0 otherwise   (7)

where r_i is the normalized distance, Δx and Δy are the horizontal and vertical distances from pixel i to the rectangle center, and D_t(i) is given by Equation (3). In Equation (5), the first factor (1 − r_i²) depresses the weight of boundary pixels, since they may belong to the background or be occluded; the second factor D(i) is related to the difference image: the larger the value of the difference-image pixel, the larger the probability that it belongs to the foreground. In our experiments, THRE is a noise threshold set to 8~16; it can also be set to 1.5σ_t(x) ~ 2.5σ_t(x) to improve the adaptability of the algorithm, where σ_t(x) is given by Equation (2).
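A per-pixel sketch of Eqs. (5)-(7), as reconstructed above, follows; the function name and the default THRE are ours, and Δx, Δy are passed in as precomputed distances to the rectangle center.

```python
import numpy as np

# Sketch of the weighting function of Eqs. (5)-(7): dx, dy are the distances
# from pixel i to the rectangle center, Hx, Hy the half-axis lengths, Dt the
# difference-image value of Eq. (3), and THRE the noise threshold.
def pixel_weight(dx, dy, Dt, Hx, Hy, THRE=12.0):
    r = max(abs(dx) / (1.1 * Hx), abs(dy) / (1.1 * Hy))   # Eq. (6)
    D = np.log(Dt / THRE) if Dt > THRE else 0.0           # Eq. (7)
    return (1.0 - r ** 2) * D                             # Eq. (5)
```

Note how both factors act as intended: a pixel at the rectangle center with a strong difference value gets full weight, while pixels at the boundary (r_i near 1) or below the noise threshold contribute nothing.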

According to the weighting function above, the color distribution is calculated as

p_u = (1/f) Σ_{i∈I} w(i) δ[h(i) − u]   (8)

where 1/f is the normalization factor with f = Σ_{i∈I} w(i), I is the set of pixels in the region, and δ is the Kronecker delta function.

To measure the similarity between the estimated state and the current state, the Bhattacharyya coefficient is introduced [12]. The hypothesized state of the target is predicted in the neighborhood of its position in the previous frame, given the limited speed of motion. A 7×7 search module is adopted, and target scaling is taken into account by computing the Bhattacharyya coefficient for three different sizes (same scale, ±5% change) and choosing the size that gives the highest similarity to the target model. For discrete densities such as our color histograms p = {p^(u)}_{u=1,...,m} and q = {q^(u)}_{u=1,...,m}, the coefficient is defined as [4]:

ρ[p, q] = Σ_{u=1}^{m} √(p^(u) q^(u))   (9)

The coefficient ρ indicates the similarity of the color distributions; for two identical normalized histograms we obtain ρ = 1, indicating a perfect match. The state with the largest coefficient is therefore the best match and is taken as the target state for the following frame.
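Eqs. (8) and (9) together form the matching step, which can be sketched as follows (function names are ours; the bin indices are those produced by h(i)):

```python
import numpy as np

# Sketch of Eq. (8): weighted color distribution over m bins, where weights
# are the pixel weights w(i) of Eq. (5) and bin_ids the bin indices h(i).
def color_distribution(weights, bin_ids, m):
    p = np.zeros(m)
    for w, u in zip(weights, bin_ids):
        p[u] += w                      # Kronecker delta selects bin h(i) = u
    f = np.sum(weights)                # normalization factor f = sum_i w(i)
    return p / f if f > 0 else p

# Sketch of Eq. (9): Bhattacharyya coefficient between normalized histograms.
def bhattacharyya(p, q):
    return float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))
```

During tracking, candidate states in the 7×7 neighborhood (at the three scales) would each be scored with bhattacharyya() against the target model, and the highest-scoring one kept.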

B. Updating the Target Model

The color model of a non-rigid object cannot be expected to remain the same for a long period of time; it may even change over a short time, so the target model must be updated. At the same time, if the target is occluded, the color distribution must stay stable. Therefore the color distribution is computed as

q_{t+1}^(u) = α p^(u)_{max(ρ)} + (1 − α) q_t^(u) if max(ρ) > T1, and q_{t+1}^(u) = q_t^(u) otherwise   (10)

where q_t is the target model at time t, whose initial value is obtained by the detection in Section 2; P_{max(ρ)} is the optimal predicted state at time t; max(ρ) is the corresponding (optimal) match ratio; T1 is a threshold, set to 0.97 in our experiments; and α is the updating parameter, set to 0.01~0.05.
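The guarded update of Eq. (10) can be sketched in one function (the name is ours): the model is blended toward the best predicted histogram only when the best Bhattacharyya match exceeds T1, so an occluded target leaves the model untouched.

```python
import numpy as np

# Sketch of Eq. (10): update the target model only when the best match is
# reliable (max rho > T1); alpha blends in the optimal predicted histogram.
def update_model(q_t, p_best, rho_best, T1=0.97, alpha=0.03):
    if rho_best > T1:
        return alpha * p_best + (1.0 - alpha) * q_t
    return q_t   # occlusion or poor match: keep the model stable
```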

IV. EXPERIMENTAL RESULTS

To evaluate the performance of the algorithm, several experiments were carried out. The image sequences shown in Fig.1 and Fig.2 were captured at 320×240 resolution by a digital video camera fixed on a tripod; Fig.3 and Fig.4 are standard MPEG-4 video sequences. At the beginning of tracking, we assume that a person is separated from the others without clutter or partial occlusion. The proposed method was applied to these sequences to track the persons marked by the detected rectangles.

The tracking results for the football sequence are presented in Fig.1. During tracking, the edge of the image is treated as the detection region, and the system starts tracing a new target as soon as it is detected. Occlusion and overlapping occur twice in the whole sequence. Fig.1 shows that the tracking performance is remarkable when two objects differ in their color distributions.

Another example of object intersection is shown in Fig.2: the difference in color distribution between the person labeled by the red rectangle and the person labeled by the blue rectangle is very small, but the system still works well.

A typical example is shown in the Hall-monitor sequence in Fig.3, demonstrating the capability of the tracker to adapt to shape transformation and scale change.

Fig.1 Football Sequence 1: The color distributions are discrepant. Frames 221, 242, 249, 255, 273, 279, 285, 343 are shown (left-right, top-down).

Fig.4 presents four frames from the Laboratory sequence, showing the tracking of a person from entering the laboratory until leaving. The system begins tracking after some distinctive part of the target (for example, the head) is detected. In frame 311, the background model changes suddenly and the difference between the head and the background becomes very small when the person opens the cabinet. Although the tracking result in frame 311 deviates somewhat from the actual state, it is satisfactory given the similarity of the color distributions. This local tracking behavior is particularly effective for targets with a distinctive local color character.

Compared with the mean shift algorithm [2], the proposed algorithm is computationally efficient and avoids computing complex weights for every predicted sample; in addition, it tracks the targets accurately. Particle filtering [4] is one of the most distinctive approaches based on sequential Monte Carlo methods. Compared with particle filtering, which uses recursive sampling to approximate the complex posterior probability distribution, our approach has the clear advantage of keeping track of the initial target when several similar targets are present during tracking. Because the target state is predicted in the neighborhood of the target in the preceding frame, and the weight of each pixel in the tracking region depends on pixel confidence and pixel location, the method can track similar targets. An example of tracking similar targets is shown in Fig.2.

Fig.2 Football Sequence 2: The color distributions are similar. The frames 305, 317, 349, 368 are shown.


Fig. 3 Hall-Monitor sequence: The frames 1, 39, 99, 159 are shown.

Fig. 4 Laboratory sequence: Tracking the head of the person. The frames 280, 293, 311, 372 are shown.

V. CONCLUSIONS

This paper proposes a visual tracking algorithm that uses a new weight distribution for color space. The HSV model, which better matches color perception, is employed to measure the similarity between weighted color distributions. To handle occlusion and overlapping, a weighting function based on pixel confidence and pixel position is proposed to search for the optimal match. The system works well when targets are partially (but not fully) occluded, even when the target colors are very similar or several targets have similar color distributions. The experimental results show that the tracking algorithm is computationally efficient and robust to scale changes, partial occlusion, and interactions of non-rigid objects.

ACKNOWLEDGMENT

This work was supported by the National Program on Basic Research Project of China (No. B1420110155), the Fundamental Research Funds for the Central Universities (No. 2010121066 and No. 2010121067), and the Natural Science Foundation of China (Grant No. 61001013).

REFERENCES

[1] G. Welch and G. Bishop, “An introduction to the Kalman filter,” University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3175, 2001.

[2] D. Comaniciu, V. Ramesh, and P. Meer, “Real-time tracking of non-rigid objects using mean shift,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR'00), pp. 142-149, 2000.

[3] J. G. Allen, R. Y. D. Xu, and J. S. Jin, “Object Tracking Using CamShift Algorithm and Multiple Quantized Feature Spaces.”

[4] K. Nummiaro, E. K. Meier, and L. V. Gool, “An adaptive color-based particle filter,” Image and Vision Computing, Vol. 21, No. 1, pp. 99-110, 2003.

[5] D. G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, Vol. 60, No. 2, pp. 91-110, 2004.

[6] J. Lim, D. Ross, R. Lin, and M. Yang, “Incremental Learning for Visual Tracking,” In Advances in Neural Information Processing Systems, pp.793-800, 2004.

[7] S. Avidan, “Support Vector Tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 8, pp. 1064-1072, 2004.

[8] S. Avidan, “Ensemble Tracking,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05), Vol. 2, pp. 1-8, 2005.

[9] R. T. Collins, Y. Liu, and M. Leordeanu, “Online Selection of Discriminative Tracking Features,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 10, pp. 1631-1643, 2005.

[10] J. Gall, N. Razavi, and L. V. Gool, “On-line Adaption of Class-specific Codebooks for Instance Tracking,” British Machine Vision Conference, Vol. 1, pp. 1-12, 2010.

[11] E. Stringa and C. S. Regazzoni, “Real-Time Video-Shot Detection for Scene Surveillance Applications,” IEEE Transactions On Image Processing, Vol. 9, No.1, pp. 69-79, 2000.

[12] F. Aherne, N. Thacker and P. Rockett, “The Bhattacharyya metric as an absolute similarity measure for frequency coded data,” Kybernetika, Vol.34, No. 4, pp.363-368, 1998.
