[IEEE 2011 Annual IEEE India Conference (INDICON) - Hyderabad, India (2011.12.16-2011.12.18)] 2011...




Robust Real-Time Object Tracking Under Varying Illumination Condition

Deepak Kumar Panda∗, Sukadev Meher†
Department of Electronics and Communication Engineering

National Institute of Technology, Rourkela, India
[email protected]

[email protected]

Abstract—Object tracking is the problem of finding the successive positions of an object in each frame of a video. A novel method is proposed here for extracting target objects from real-time video and then tracking them under varying illumination conditions. It operates on video imagery taken from multiple video cameras. Each camera is connected to an infrared sensor, and the cameras are distributed around the site to get a complete picture of the suspicious activities of intruders. By using an array of infrared sensors, each video camera is made event-driven: a camera starts recording only after being triggered by the output of a nearby infrared sensor. Thus it becomes a compressed sensing system. Our proposed method uses homomorphic filtering to overcome varying illumination conditions. The illumination component of an image has slow spatial variations, so a homomorphic filter is employed to reject the very low frequency components that usually represent illumination variation.

Index Terms—Video surveillance, smart sensor, sensor network, compressed sensing, object tracking.

I. INTRODUCTION

Object tracking [1], [2] consists in estimating the trajectory of moving objects in a sequence of images. Automatic detection and tracking of moving objects is a very important task for human-computer interfaces, video communication/expression, security and surveillance systems, and so on. Applications such as event detection, human action recognition, and semantic indexing of video are being developed in order to automate the task of video surveillance. This paper uses a vision system to monitor activity in a place over an extended period of time. It provides a robust mechanism for finding suspicious activities in and around the site and is very beneficial to defence personnel in detecting intruders. Video surveillance can be used to monitor the safe custody of crucial data, arms and ammunition in defence establishments. It can provide security to key installations and monuments.

The system uses infrared sensors [3] to locate and track the direction of a moving object and sends a trigger signal to a video camera for tracking. The video cameras do not acquire imagery of the site continuously; a camera starts acquiring images only when it gets a trigger signal from its infrared sensor. In this way the system uses less data for storing the video imagery. The infrared sensors are connected to a sensor node, which is in turn connected to a gateway as shown in Fig. 1. The interconnection can be wired or wireless. The video surveillance cameras are connected to the gateway. The gateway is used as the communication and information processing unit to process the information obtained from the infrared sensors and video cameras. The gateway is the medium through which the video cameras and infrared sensors communicate with the central database system, using the Internet and a WLAN (Wireless Local Area Network).

Fig. 1. Video surveillance cameras, infrared sensors, sensor node and gateway in a sensor network for intelligent tracking
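The event-driven acquisition described above can be sketched as follows. This is an illustrative model only; names such as `EventDrivenCamera` and `on_ir_trigger` are hypothetical and not from the paper, and a real deployment would interface with actual sensor and camera hardware through the gateway.

```python
class EventDrivenCamera:
    """Hypothetical model of a camera that stores frames only while its
    paired infrared sensor reports motion (a sketch, not real hardware I/O)."""

    def __init__(self, camera_id):
        self.camera_id = camera_id
        self.recording = False
        self.frames_stored = 0

    def on_ir_trigger(self):
        # The IR sensor detected motion: start acquisition.
        self.recording = True

    def on_ir_clear(self):
        # No motion reported: stop acquisition to save storage.
        self.recording = False

    def capture(self):
        # Called once per frame interval; a frame is stored only
        # while the camera is in the triggered state.
        if self.recording:
            self.frames_stored += 1
        return self.recording


cam = EventDrivenCamera("cam1")
for _ in range(10):          # idle period: nothing is stored
    cam.capture()
cam.on_ir_trigger()
for _ in range(5):           # triggered period: 5 frames are stored
    cam.capture()
cam.on_ir_clear()
```

Only the 5 frames captured during the triggered interval are stored, which is the storage saving the paper attributes to the IR-triggered design.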

Every object tracking system starts by detecting moving objects in video streams. Motion segmentation not only separates moving regions from the rest of the image; the detected moving objects are also used for recognition, classification and activity analysis, making these later steps more efficient. The detection of motion in many tracking systems relies on the techniques of frame differencing and background subtraction. Frame differencing [4], [5] is a pixel-wise differencing between two or three consecutive frames in an image sequence to detect regions corresponding to moving objects such as humans and vehicles. The threshold function determines change, and it depends on the speed of object motion. It is hard to maintain the quality of segmentation if the speed of the object changes significantly. Frame differencing is very adaptive to dynamic environments, but it does not detect the entire object, and very often holes develop inside moving entities.

On the other hand, background subtraction [6]-[8] detects moving regions in an image by taking the difference between the current image and a reference background image captured from a static background over a period of time. The subtraction leaves only non-stationary or new objects, which include the entire silhouette region of an object. Background subtraction can provide more reliable information about moving objects, but it requires more complex processing to adapt the background to changes in illumination conditions, extraneous events, etc. Therefore it is highly dependent on a good background maintenance model.

The major problems encountered in object tracking are changes in illumination, the complex shapes of the objects being tracked, and occlusion when tracking multiple people. Some of the difficulties in tracking moving objects can be summarized as follows [2].

• Loss of information in projecting the 3-D world onto a 2-D image plane,
• Noise in the video, resulting in loss of information,
• Difficulty in finding the exact position of the moving object in each frame,
• Partial and full object occlusions,
• Complex object shapes,
• No unobstructed view of the background being available,
• Motion in the background,
• Scene illumination changes,
• Need for real-time processing.

In this paper, a robust object tracking algorithm is proposed to overcome the problem of illumination variation. It is proposed to detect and track a moving object using frame differencing, one of the simplest methods for detecting and tracking objects in real time.

There are three key steps in video analysis:

• Detection of suspicious moving objects
• Tracking of objects from frame to frame
• Analysis of object tracks to recognize their behaviour

Many researchers have developed methods for object detection. Changes in scene lighting can cause problems for object detection. Stauffer and Grimson [7] modeled each pixel as a mixture of Gaussians and used on-line approximations to update the model. This can deal with lighting changes, motion in the background, and long-term scene changes. Maddalena and Petrosino [8] proposed SOBS, based on self-organization through artificial neural networks, which can handle background clutter, gradual illumination variations and camouflage, has no bootstrapping limitations, overcomes the problem of shadows cast by moving objects, and achieves robust detection for different types of videos taken from still cameras. Toyama et al. [9] discussed the problems of changing illumination, background clutter, camouflage and shadows using their proposed three-component system for background maintenance: the pixel-level component, the region-level component and the frame-level component.

In this paper, a robust object tracking method that can handle scene illumination changes is proposed. The rest of this paper is organized as follows. In the next section, the proposed algorithm is described in detail. Section III gives the experimental setup, and the experimental results are described in Section IV.

Fig. 2. Steps in object tracking under varying illumination condition: input video → homomorphic filtering → gamma correction → frame differencing → morphological operation → image labeling → object representation → output video

Fig. 3. Homomorphic filtering: f(x, y) → ln → DFT → H(u, v) → inverse DFT → exp → g(x, y)

Finally, Section V concludes this paper.

II. PROPOSED METHOD

Object detection and tracking in each frame of the video is performed by a six-stage process shown in Fig. 2.

The various sub-processes are described below.

A. Homomorphic Filtering

The input video is decomposed into frames f(x, y, n) and each frame is converted into a grayscale image. Here x, y are the spatial coordinates and n is the frame number that represents discrete time. A video image may be modelled as:

f(x, y, n) = i(x, y, n)× r(x, y, n) (1)

where i(x, y, n) and r(x, y, n) denote the illumination and reflectance components, respectively. The nature of i(x, y, n) is determined by the illumination source, while r(x, y, n) is determined by the characteristics of the object being imaged. The components have the ranges:

0 < i(x, y, n) < ∞ (2)

and

0 < r(x, y, n) < 1 (3)

To overcome the problem of varying illumination conditions, homomorphic filtering [10] is employed here. This filtering process is depicted in Fig. 3. The process is performed in the log domain, and hence a log transformation at the beginning and an exponentiation at the end are essential. A high-pass filter, characterized by its frequency-domain transfer function H(u, v), is employed to reject the very low frequency components that usually represent illumination variation. To perform this frequency-domain filtering, the signal is transformed to the frequency domain; hence the DFT and inverse-DFT stages shown in the figure.
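The ln → DFT → filter → inverse DFT → exp chain of Fig. 3 can be sketched in one dimension. This is an illustrative reimplementation (the paper's own code is MATLAB and not given), using a deliberately naive DFT so the sketch is self-contained; a real system would use an FFT on each 2-D frame.

```python
import cmath
import math

def dft(x):
    """Naive forward DFT (for illustration; an FFT would be used in practice)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(xf):
    """Naive inverse DFT, returning the real part."""
    n = len(xf)
    return [sum(xf[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)).real / n
            for m in range(n)]

def homomorphic_filter(f, h):
    """Apply ln -> DFT -> H -> inverse DFT -> exp to a 1-D signal f.
    h holds the transfer-function values H(k), one per DFT bin."""
    z = [math.log(v) for v in f]             # ln: turns i*r into ln i + ln r
    zf = dft(z)
    gf = [hk * zk for hk, zk in zip(h, zf)]  # filtering in the log-frequency domain
    g = idft(gf)
    return [math.exp(v) for v in g]          # exp: back to the image domain

f = [1.0, 2.0, 3.0, 4.0]
identity = homomorphic_filter(f, [1.0] * 4)         # H = 1: exact round trip
suppressed = homomorphic_filter(f, [0.5, 1, 1, 1])  # attenuate only the DC bin
```

With H ≡ 1 the chain reproduces the input exactly. Attenuating only the DC bin subtracts a constant in the log domain, i.e. it divides every sample by a common factor, which is precisely how a small γL suppresses the slowly varying illumination component.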

The illumination component of an image has slow spatial variations, while the reflectance component tends to vary rapidly. The problem with low-cost surveillance cameras is that their video imagery is affected by changing illumination conditions. A good control over the illumination and reflectance components can be achieved with the homomorphic filter.

The homomorphic filter function H(u, v, n) is given by

H(u, v, n) = (γH − γL)[1 − e^(−c·D²(u, v, n)/D₀²)] + γL (4)

where

D(u, v, n) = [(u− P/2)2 + (v −Q/2)2]1/2 (5)

Here u and v denote frequency-domain variables, and typically P = 2M and Q = 2N are chosen for a frame size of M × N. The constant c controls the sharpness of the slope of the function as it makes the transition between γL and γH, and D₀ is the cut-off frequency. Choosing γL < 1 and γH > 1 attenuates the contribution made by the low frequencies (illumination) and amplifies the contribution made by the high frequencies (reflectance). As a result, the dynamic range is compressed and the contrast is enhanced.
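Equations (4) and (5) can be evaluated directly. The sketch below uses illustrative parameter values (γL = 0.5, γH = 2, c = 1, D₀ = 10; the paper does not state its values) and shows that H equals γL at the centre frequency, where D = 0, and approaches γH far from it:

```python
import math

def homomorphic_tf(u, v, M, N, gamma_l=0.5, gamma_h=2.0, c=1.0, d0=10.0):
    """H(u, v) from Eqs. (4)-(5) for one frame, with P = 2M and Q = 2N."""
    P, Q = 2 * M, 2 * N
    d2 = (u - P / 2) ** 2 + (v - Q / 2) ** 2   # D^2(u, v), Eq. (5)
    return (gamma_h - gamma_l) * (1.0 - math.exp(-c * d2 / d0 ** 2)) + gamma_l

# At the centre (u, v) = (P/2, Q/2), D = 0 and H = gamma_l (illumination
# attenuated); far from the centre H saturates at gamma_h (reflectance boosted).
h_centre = homomorphic_tf(64, 64, 64, 64)   # D = 0
h_corner = homomorphic_tf(0, 0, 64, 64)     # D large
```

Intermediate frequencies fall smoothly between the two limits, with c setting how fast the transition happens around the cut-off D₀.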

B. Gamma Correction

Gamma correction [10] is a nonlinear operation used to code and decode luminance in video or still-image systems. Gamma correction is defined by the following power-law expression:

s = crγ (6)

where c and γ are positive constants. A gamma value γ < 1 is sometimes called an encoding gamma, and the process of encoding with this compressive power-law nonlinearity is called gamma compression. Conversely, a gamma value γ > 1 is called a decoding gamma, and the application of the expansive power-law nonlinearity is called gamma expansion.
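Equation (6) is a one-line transform; the sketch below applies it to a normalized intensity (the choice c = 1 and the sample gamma values are illustrative, not the paper's):

```python
def gamma_correct(r, gamma, c=1.0):
    """Power-law transform s = c * r**gamma of Eq. (6);
    r is a normalized intensity in [0, 1]."""
    return c * r ** gamma

# An encoding gamma (< 1) lifts dark values; a decoding gamma (> 1) darkens them.
encoded = gamma_correct(0.25, 0.5)   # compressive: 0.25**0.5 = 0.5
decoded = gamma_correct(0.25, 2.0)   # expansive:   0.25**2   = 0.0625
```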

C. Frame Differencing

Frame differencing [4] is a pixel-wise difference between two consecutive frames. Each current frame is subtracted from the previous frame to detect the moving object. This is used to detect regions corresponding to moving objects such as humans and vehicles. Frame differencing is very adaptive to changing environments, but very often holes are left inside moving entities. It depends on a good threshold to segment the moving foreground from the background, so the threshold T must be judiciously selected. If the difference is greater than the threshold T, then the pixel is considered to be part of the moving object; otherwise, it is considered to be background. Here the threshold T is chosen by Otsu's method [11], which is optimum in the sense that it maximizes the between-class variance of the background and foreground.

D(x, y, n) = { 1, if |f(x, y, n) − f(x, y, n − 1)| ≥ T
               0, otherwise (7)

Frame differencing works only when there is no camera motion, the moving object is not stationary, and the object is not occluded.
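Eq. (7) together with an Otsu-selected threshold can be sketched as follows. The tiny 2×2 frames and the plain-Python Otsu implementation are illustrative only (the paper works on 640 × 480 grayscale frames in MATLAB):

```python
def otsu_threshold(values, levels=256):
    """Pick the level T that maximizes the between-class variance of
    the histogram, in the sense of Otsu [11]."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]                 # weight of the class at or below t
        if w0 == 0:
            continue
        w1 = total - w0               # weight of the class above t
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def frame_difference(prev, curr, t):
    """Binary motion mask D(x, y, n) of Eq. (7)."""
    return [[1 if abs(c - p) >= t else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

prev = [[10, 10], [10, 10]]
curr = [[10, 200], [10, 10]]        # one pixel changed between frames
mask = frame_difference(prev, curr, 30)
```

In the full method the Otsu threshold would be computed over the absolute-difference values of a frame pair and then fed to `frame_difference` as T.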

D. Morphological Operation

Morphological operations apply a structuring element to an input image, creating an output image of the same size. A morphological operation is performed to fill small gaps inside the moving object and to reduce the noise remaining in the moving objects [12]. The morphological operators implemented are dilation followed by erosion. In dilation, each background pixel that touches an object pixel is changed into an object pixel. Dilation adds pixels to the boundary of the object and closes isolated background pixels. Dilation [10] of a set A by a structuring element B is defined as:

A ⊕ B = ⋃_{b∈B} (A)_b (8)

In erosion, each object pixel that touches a background pixel is changed into a background pixel. Erosion removes isolated foreground pixels. Erosion [10] of a set A by a structuring element B is defined as:

A ⊖ B = ⋂_{b∈B} (A)_{−b} (9)

The number of pixels added to or removed from the objects in an image depends on the size and shape of the structuring element used to process the image. The morphological operation eliminates background noise and fills small gaps inside an object. This property makes it well suited to our objective, since we are interested in generating masks which preserve the object boundary. There is no fixed limit on the number of times dilation and erosion are performed; in the proposed algorithm, dilation and erosion are used iteratively until the foreground object is completely segmented from the background.
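The dilation-then-erosion step (morphological closing) can be sketched on a binary mask; the 3×3 structuring element and the toy 5×5 mask are illustrative:

```python
# Full 3x3 structuring element, as offsets (dy, dx) from the centre pixel.
FULL_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def dilate(img, offsets=FULL_3X3):
    """Binary dilation per Eq. (8): a pixel becomes 1 if any pixel under
    the structuring element is 1 (out-of-bounds treated as 0)."""
    h, w = len(img), len(img[0])
    return [[1 if any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy, dx in offsets) else 0
             for x in range(w)] for y in range(h)]

def erode(img, offsets=FULL_3X3):
    """Binary erosion per Eq. (9): a pixel stays 1 only if every pixel
    under the structuring element is 1."""
    h, w = len(img), len(img[0])
    return [[1 if all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy, dx in offsets) else 0
             for x in range(w)] for y in range(h)]

ring = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],      # one-pixel hole inside the moving object
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
closed = erode(dilate(ring))  # dilation then erosion fills the hole
```

After closing, the interior hole is filled while the object stays a compact 3×3 block, which is exactly the mask-repair behaviour the text describes.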

E. Image Labeling

After foreground region detection, a binary connected-component analysis is applied to the foreground pixels to assign a unique label to each foreground object. Connected-component labeling is performed to label each moving object emerging in the background. Connected-component labeling [10] groups pixels into components based on pixel connectivity. It is done by comparing each pixel with the pixels in its four-neighborhood; if a pixel has at least one foreground neighbor that is already labeled, it receives the same label as that neighbor.
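A 4-connected labeling pass can be sketched with a flood fill; the paper describes a neighbor-scan variant, but the flood-fill version below produces the same component partition and keeps the sketch short:

```python
def label_components(mask):
    """Assign a distinct positive label to each 4-connected foreground
    component of a binary mask (flood-fill sketch of connected-component
    labeling; background pixels keep label 0)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                count += 1                       # new component found
                stack = [(y, x)]
                labels[y][x] = count
                while stack:
                    cy, cx = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            stack.append((ny, nx))
    return labels, count

labels, n = label_components([[1, 1, 0],
                              [0, 0, 0],
                              [0, 1, 1]])       # two separate moving objects
```

Each moving object then carries its own label into the object-representation stage.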

F. Object Representation

Once the morphological operations are over, the detected foreground object is fully separated from the background, and there is less chance of misdetection of the object. The segmented object is represented through a centroid and a rectangular shape enveloping the object. The following formulae are used to determine the centroid:

Cx(n) = l(n)/2 (10)

Cy(n) = b(n)/2 (11)

where l(n) and b(n) are derived as follows:

l(n) = x(n)max − x(n)min (12)


b(n) = y(n)max − y(n)min (13)

Here l(n) and b(n) are the length and breadth of the rectangular region that describes the detected foreground, and x(n)max, x(n)min, y(n)max, y(n)min are the maximum and minimum spatial coordinates of the detected foreground region.
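Equations (10)-(13) can be computed directly from the pixel coordinates of one labeled foreground region. Note that, as written in the paper, Cx and Cy give the centroid relative to the corner of the bounding box; the function name and the sample coordinates below are illustrative:

```python
def represent_object(pixels):
    """Bounding-box dimensions and centroid of Eqs. (10)-(13);
    `pixels` is the list of (x, y) coordinates of one labeled region."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    l = max(xs) - min(xs)          # length,  Eq. (12)
    b = max(ys) - min(ys)          # breadth, Eq. (13)
    cx = l / 2                     # Eq. (10): centroid relative to the box corner
    cy = b / 2                     # Eq. (11)
    return l, b, cx, cy

# Foreground pixels spanning x in [2, 6] and y in [3, 8]:
l, b, cx, cy = represent_object([(2, 3), (6, 3), (2, 8), (6, 8), (4, 5)])
```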

III. EXPERIMENTAL SETUP

The proposed algorithm has been implemented with multiple video cameras. In real-time processing there is every chance that the view of a moving object can be lost, so to avoid object occlusion and to get a complete depth analysis of the moving object, a multiple-camera detection scheme is deployed.

The proposed algorithm is implemented on an Intel® Core™2 Duo 2.40 GHz machine in MATLAB® under Microsoft® Windows 7™. Video is shot from webcams at a resolution of 640 × 480 and 20 frames per second, and a total of 100 frames are taken.

IV. EXPERIMENTAL RESULTS

The algorithm can detect and track a moving object and has been tested under a variety of illumination conditions in indoor and outdoor environments. Fig. 4 shows a person moving in the corridor, subjected to varying illumination conditions. Using the homomorphic filtering process, the change in illumination is sufficiently reduced, and the proposed algorithm is able to detect and track the person. In Fig. 4(d), the centroid coordinates are plotted against the frame number. The curve clearly shows that the algorithm is able to track the person effectively in each frame. The effectiveness of the proposed scheme is further demonstrated with the tracking video shown in Fig. 5. The simulation results obtained for tracking the person under bright and dim light conditions are shown in Figs. 5(a) and 5(b) respectively. The rates of miss-detection and false detection are very low even under a large change in illumination.

Fig. 4. Tracking object from input video under changing illumination condition. (a) Input video for object tracking. (b) Object tracking after morphological operation. (c) Tracking object in input video. (d) Centroid coordinates vs. number of frames.

V. CONCLUSION

Here we have proposed a novel method for multi-camera detection and tracking using a smart sensor network. The algorithm performs well on all of the video data sets taken. The system uses no color information and works on grayscale video imagery. Each video camera is connected to a separate sensor and starts acquisition only when the sensor sends a trigger signal to the camera, so the system requires less storage for the video files.

The proposed system is implemented using low-cost webcams. Thus our system is a very cost-effective as well as robust and efficient object tracking system.

The proposed method is tested and demonstrated to perform well with 2-D imagery. The limitation of the scheme is its inability to perfectly judge the distance between the background and the foreground. We plan to implement a 3-D version with multi-camera stereo-vision imagery to overcome this limitation.

REFERENCES

[1] Hanna Goszczynska, Object Tracking, InTech.
[2] A. Yilmaz, O. Javed, and M. Shah, "Object Tracking: A Survey," ACM Comput. Surv., vol. 38, no. 4, Article 13, Dec. 2006, 45 pages. DOI: 10.1145/1177352.1177355.
[3] C. Zhang, J. Wu, and G. Tu, "Object Tracking and QoS Control Using Infrared Sensor and Video Cameras," in Proc. IEEE International Conference on Networking, Sensing and Control, pp. 974-979, 2006.
[4] A. J. Lipton, H. Fujiyoshi, and R. S. Patil, "Moving Target Classification and Tracking from Real-Time Video," in Proc. Fourth IEEE Workshop on Applications of Computer Vision, pp. 8-14, 1998.
[5] R. Collins, A. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enamoto, and Hasegawa, "A System for Video Surveillance and Monitoring," Technical Report CMU-RI-TR-00-12, Robotics Institute, Carnegie Mellon University, 2000.
[6] I. Haritaoglu, D. Harwood, and L. S. Davis, "W4: Who? When? Where? What? A Real-Time System for Detecting and Tracking People," in Proc. Third IEEE International Conference on Automatic Face and Gesture Recognition, pp. 222-227, 1998.
[7] C. Stauffer and W. E. L. Grimson, "Learning Patterns of Activity Using Real-Time Tracking," IEEE Trans. Pattern Analysis and Machine Intelligence, pp. 747-757, 2000.
[8] L. Maddalena and A. Petrosino, "A Self-Organizing Approach to Background Subtraction for Visual Surveillance Applications," IEEE Transactions on Image Processing, vol. 17, no. 7, pp. 1168-1177, July 2008.
[9] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, "Wallflower: Principles and Practice of Background Maintenance," in Proc. Seventh IEEE International Conference on Computer Vision, pp. 255-261, 1999.
[10] Rafael Gonzalez and Richard Woods, Digital Image Processing, Pearson.
[11] N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.
[12] B. Sugandi, H. S. Kim, J. K. Tan, and S. Ishikawa, "Tracking of Moving Object by Using Low Resolution Image," in Proc. International Conference on Innovative Computing, Information and Control, p. 408, 2007.


Fig. 5. Tracking object from input video under changing illumination conditions. (a) Person moving in the corridor (outdoor environment, bright-light condition). (b) Person moving inside the laboratory (indoor environment, dim-light condition).