
[IEEE 2012 3rd IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC 2012) - Beijing, China, 2012.09.21-2012.09.23]

Proceedings of IC-NIDC 2012

HIERARCHICAL CODEBOOK BACKGROUND MODEL USING HAAR-LIKE FEATURES

Pengxiang Zhao 1, Yanyun Zhao 1,2, Anni Cai 1,2

1 Multimedia Communication and Pattern Recognition Laboratory, 2 Beijing Key Laboratory of Network System and Network Culture

Beijing University of Posts and Telecommunications, Beijing 100876, China
[email protected], [email protected], [email protected]

Abstract: Background subtraction is one of the most popular methods to detect moving objects in videos. In this paper, we propose an efficient hierarchical background subtraction method with block-based and pixel-based codebooks (CBs) using haar-like features for foreground detection. In the block-based stage, four haar-like features and a block average value, which can be calculated rapidly using the integral image and are not sensitive to dynamic backgrounds, are used to represent a block. Through the block-based stage we can remove most of the background without reducing the true positive rate. To overcome the low precision of the block-based stage, a pixel-based stage is adopted to increase the precision. Experimental results show that our approach provides faster computation than the related approaches while ensuring a high correct detection rate.

Keywords: Hierarchical codebook; Haar-like features; Background subtraction; Foreground detection.

1 Introduction

Moving object detection is often the first task in many video-based applications, such as object classification, tracking and action recognition. Therefore, its performance can have a great impact on the performance of higher-level tasks. Among the various methods used to detect moving objects, background subtraction is the most popular one.

The basic idea of background subtraction is that moving objects (foreground) are regarded as the difference between the current frame and the scene's statistically static parts (background). However, there are many challenges in developing a good background subtraction algorithm [1]. First, it must be robust against changes in illumination (gradual or sudden). Second, it should avoid foreground misdetections caused by non-stationary background objects such as waving leaves, fluttering flags, rippling water, etc. Finally, the background model should react quickly to changes in the background, such as the starting and stopping of vehicles.

A large number of background subtraction methods have been proposed in the last few years [2-4]. Meanwhile, many different features have been utilized for modeling the background. Most of these methods use only the pixel color or intensity information as the feature. A few studies [5-7] have also utilized discriminative texture features in dealing with the problem. However, these methods are time-consuming due to the use of texture features at the pixel level.

978-1-4673-2204-1/12/$31.00 ©2012 IEEE

In this paper, we propose a hierarchical codebook model for foreground detection using haar-like features. By employing block-based and pixel-based CBs to model the background, we can implement highly efficient foreground detection. Using the block-based background model, efficient foreground detection is achieved without reducing the true positive rate, though the precision is rather low. Then a pixel-based background model is introduced to increase the precision while maintaining the high speed of the algorithm. One specific problem for background subtraction is permanent changes of background geometry, such as the starting and stopping of vehicles. A particular parameter is employed in the background model to solve this problem. As shown in the experimental results, the proposed method can effectively detect foreground with fast computation speed and a high correct detection rate even in non-stationary background environments.

The rest of this paper is organized as follows. In Section 2, we briefly review related work. The details of model initialization and foreground detection are introduced in Section 3 and Section 4 respectively. Experimental results are shown in Section 5, and Section 6 gives the conclusions.

2 Previous work

Among the various background subtraction methods, the simplest one used to detect foreground is the (weighted) running average [8], which describes the background model using an average of the gray value or color intensities at each pixel. More advanced background modeling methods like the single Gaussian model [9-11] are density based, where the background model for each pixel is defined by a probability density function. All the above techniques use a single-mode pdf, which can lead to satisfactory results in stationary environments. However, more sophisticated methods are necessary when dealing with videos captured in complex environments where dynamic backgrounds, camera ego-motion, and high sensor noise are encountered.
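The (weighted) running average of Ref. [8] fits in a few lines. The sketch below is an illustrative NumPy version; the parameter names rho and thresh and the bootstrap-with-first-frame choice are our assumptions, not taken from the cited paper:

```python
import numpy as np

def running_average_subtract(frames, rho=0.05, thresh=25):
    """(Weighted) running-average background subtraction.

    frames : iterable of 2-D grayscale images
    rho    : learning rate of the running average
    thresh : absolute-difference threshold for foreground
    Yields a boolean foreground mask per frame.
    """
    bg = None
    for frame in frames:
        f = frame.astype(np.float32)
        if bg is None:
            bg = f                       # bootstrap with the first frame
        mask = np.abs(f - bg) > thresh   # foreground where difference is large
        bg = (1.0 - rho) * bg + rho * f  # B <- (1 - rho) * B + rho * I
        yield mask
```

Because the model is a single running mean per pixel, it works only for nearly static scenes, which is exactly the limitation the paragraph above points out.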



The Gaussian Mixture Model (GMM) [2, 12-13] is the most popular among the various complex pixel-level algorithms proposed in past years. In this model, a mixture of Gaussian distributions is used to represent various background appearances. Thus it can cope with situations where repetitive background motions are encountered. However, GMM assumes that one or several Gaussian models can represent the distribution of a pixel value, which is debatable, as some researchers claim that natural images exhibit non-Gaussian statistics.
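To make the per-pixel GMM idea concrete, here is a heavily simplified single-pixel sketch in the spirit of Stauffer and Grimson [2]. The class name, the scalar-intensity restriction, the variance floor, and all default parameter values are our assumptions, not the cited formulation:

```python
import numpy as np

class PixelGMM:
    """Per-pixel mixture of K Gaussians over scalar intensity (simplified)."""

    def __init__(self, k=3, alpha=0.05, var0=30.0, t_bg=0.7):
        self.w = np.full(k, 1.0 / k)          # component weights
        self.mu = np.linspace(0.0, 255.0, k)  # component means
        self.var = np.full(k, var0)           # component variances
        self.alpha, self.var0, self.t_bg = alpha, var0, t_bg

    def update(self, x):
        """Fold intensity x into the mixture; True if x looks like background."""
        d = np.abs(x - self.mu)
        matched = d < 2.5 * np.sqrt(self.var)     # per-component gate
        self.w *= (1.0 - self.alpha)              # decay all weights
        if not matched.any():
            i = np.argmin(self.w)                 # replace the weakest component
            self.mu[i], self.var[i], self.w[i] = x, self.var0, self.alpha
            self.w /= self.w.sum()
            return False                          # unseen value -> foreground
        i = np.argmin(np.where(matched, d, np.inf))  # closest matching component
        self.w[i] += self.alpha
        self.mu[i] += self.alpha * (x - self.mu[i])
        self.var[i] = max(self.alpha * ((x - self.mu[i]) ** 2 - self.var[i])
                          + self.var[i], 4.0)     # floor keeps the gate sane
        self.w /= self.w.sum()
        # components with large weight and small variance form the background
        order = np.argsort(-(self.w / np.sqrt(self.var)))
        n_bg = int(np.searchsorted(np.cumsum(self.w[order]), self.t_bg)) + 1
        return bool(i in order[:n_bg])
```

The mixture lets one pixel keep several background modes (e.g. leaf and sky) alive at once, which is what distinguishes it from the single-mode models above.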

In the codebook (CB) algorithm [4], each pixel is represented by a codebook, which is a compressed form of the background model for a long image sequence. Each codebook is composed of a few codewords which represent various background appearances. This method is believed to be able to capture background motion over a long period of time with a limited amount of memory. However, the codebook update mechanism proposed in Ref. [4] does not allow the creation of new codewords, and this can be problematic if permanent structural changes occur in the background.

Region-based algorithms have also been proposed for background subtraction, in which specific block features are calculated to represent each block. However, using only region-based background subtraction, the detected foreground objects are coarse. Thus a two-level mechanism must be introduced to solve this problem. In Ref. [12], the authors combine pixel-based and block-based approaches into a single framework by using GMM at both levels. In Ref. [14], a hierarchical scheme with block-based and pixel-based codebooks for foreground detection is presented. However, the calculation of the block features used in Refs. [12, 14] is time-consuming.

Motivated by the work of Refs. [14-15], we propose a coarse-to-fine foreground detection method in this paper, in which haar-like features are used to represent each block. Since haar-like features can be rapidly calculated using the integral image, the block-based stage provides a faster processing speed than the approaches in Refs. [14-15]. The background modeling of the two proposed CBs is similar to the original CB [4]. Although the original CB provides high efficiency in background modeling, updating and foreground detection, it still involves many redundancies. By using the two-layer framework we further speed up the classification of foreground and background while preserving the advantages of the CB. Figure 1 shows the structure of the proposed method.

Figure 1 Structure of the proposed method

Figure 2 Templates of haar-like features

3 Background modeling

3.1 Features used in block-based background subtraction

Four haar-like features and an average value are extracted to represent one block. The haar-like features are extracted from an N*N rectangular region at each location in the image. First, the rectangular region is divided into four even parts. Four haar-like features Hi, i = 1, 2, 3, 4, and an average value are then calculated with a weighting factor on each part, as shown in Figure 2. Each block can be represented as vb = {H1, H2, H3, H4, Avg}. All five features can be calculated rapidly through the integral image.

3.2 Block-based CB initialization

During the training phase, the features of a specific block can be represented as a vector vb = {H1, H2, H3, H4, Avg}. A CB for a block can be represented as Cb = {cbi | 1 <= i <= Lb}, consisting of Lb codewords and describing the possible backgrounds of the block. Each codeword cbi consists of the 5-D feature vector vbi and additional auxiliary information auxbi = {numbi, lfbi}, where numbi indicates how many times the codeword has been matched and lfbi indicates the last time the codeword was matched.
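Since the exact templates of Figure 2 are not spelled out in the text, the sketch below uses four illustrative quadrant combinations (two edge templates, one diagonal template, and one weighted corner template) simply to show how all five features come out of a single integral image in O(1) per block; the particular weights are our assumptions:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column pad for O(1) box sums."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] in O(1) using the integral image."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def block_vector(ii, y, x, n):
    """5-D descriptor {H1..H4, Avg} of the n*n block at (y, x).

    The block is split into four even parts A B / C D; the combinations
    below stand in for the templates of Figure 2 (illustrative choice).
    """
    h = n // 2
    a = box_sum(ii, y,     x,     h, h)
    b = box_sum(ii, y,     x + h, h, h)
    c = box_sum(ii, y + h, x,     h, h)
    d = box_sum(ii, y + h, x + h, h, h)
    s = n * n
    h1 = (a + b - c - d) / s          # horizontal edge
    h2 = (a + c - b - d) / s          # vertical edge
    h3 = (a + d - b - c) / s          # diagonal
    h4 = (3 * a - b - c - d) / s      # weighted corner (weights 3,-1,-1,-1)
    avg = (a + b + c + d) / s
    return np.array([h1, h2, h3, h4, avg])
```

Each feature costs only a handful of table lookups regardless of block size, which is why the block-based stage is so cheap.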



The proposed algorithm of the block-based CB initialization is illustrated in Algorithm 1, where N records the number of current codewords and the parameter α denotes the learning rate. The objective of this initialization algorithm is to find multiple cbi with different features for describing the entire block contents in the training phase. In order to judge whether a block feature vb has appeared or not, a match function [14], which is also used at the pixel level, is employed to measure the correlation with the vectors in the CB:

    match(v1, v2) = ||d|| / dim(d)                              (1)

    find(v, C) = ci, where i = min{k : match(v, ck) < Th};
    find(v, C) = null, if no match(v, ck) < Th

where dim(·) denotes the dimension of the input vector, Th denotes the threshold for determining whether the two compared vectors match, and

    d = v1 - v2,    ||d|| = Σ_{i=1}^{dim(d)} |di|
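The match and find functions of Eq. (1) translate directly to code; this is a minimal sketch in which the vector representation and the threshold value are left to the caller:

```python
import numpy as np

def match(v1, v2):
    """Mean absolute difference between two feature vectors, Eq. (1)."""
    d = np.asarray(v1, dtype=np.float64) - np.asarray(v2, dtype=np.float64)
    return np.abs(d).sum() / d.size   # ||d|| / dim(d)

def find(v, codebook, th):
    """Index of the first codeword matching v, or None (the 'null' case)."""
    for i, cw in enumerate(codebook):
        if match(v, cw) < th:
            return i
    return None
```

Scanning codewords in order and stopping at the first match keeps the common case (a background block matching its first codeword) very cheap.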

It is obvious that not all the codewords in the CB can represent the background; only the codewords which are matched enough times may be actual background codewords. Suppose that after the training period the CB denoted by Cinit = {ci | 1 <= i <= N} is obtained. We choose the final CB as Cfinal = {ci | ci ∈ Cinit, numi / T > β}, where T is the number of training frames and β denotes the threshold for determining whether the codeword ci is an actual background codeword.

Algorithm 1: CB Initialization

1. v = {H1, H2, H3, H4, Avg} or v = {val}.
2. For all codewords in C, find a codeword ci matching v according to (1).
3. If found: ci = αv + (1 - α)ci, numi = numi + 1, lfi = nFrm.
   Else: N = N + 1, cN = v, numN = 1, lfN = nFrm.
4. If nFrm = T: check the CB to delete the false background codewords.
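Algorithm 1 can be sketched as follows. The mean-absolute-difference match test mirrors Eq. (1) and the final filter mirrors the num/T > β step; the Codeword class and its concrete data layout are our choice, not the paper's:

```python
import numpy as np

class Codeword:
    def __init__(self, v, frame):
        self.v = np.asarray(v, dtype=np.float64)  # feature vector
        self.num = 1       # times this codeword has been matched
        self.lf = frame    # last frame in which it was matched

def train_codebook(vectors, th=5.0, alpha=0.05, beta=0.1):
    """Algorithm 1: build a CB from T training feature vectors of one block/pixel."""
    cb = []
    n_frm = 0
    for n_frm, v in enumerate(vectors, start=1):
        v = np.asarray(v, dtype=np.float64)
        hit = next((c for c in cb
                    if np.abs(v - c.v).mean() < th), None)  # match fn of Eq. (1)
        if hit is not None:
            hit.v = alpha * v + (1.0 - alpha) * hit.v       # running update
            hit.num += 1
            hit.lf = n_frm
        else:
            cb.append(Codeword(v, n_frm))                   # new codeword
    # keep only codewords matched often enough: num / T > beta
    return [c for c in cb if c.num / n_frm > beta]
```

With th = 5 and beta = 0.1 (the values used in the experiments), a transient appearance seen in fewer than 10% of the training frames is discarded from the final CB.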

3.3 Pixel-based CB initialization

The algorithm for the pixel-based CB initialization is similar to that of the block-based CB. Each pixel is represented as vp = {val}, where val is the gray value of the pixel in our implementation, and the CB for a pixel can be represented as Cp = {cpi | 1 <= i <= Lp}. Each codeword cpi consists of a vector vpi and auxiliary information auxpi = {numpi, lfpi}. The training process is also described by Algorithm 1.


4 Hierarchical foreground detection

After training the background models as described above, the constructed block-based and pixel-based CBs are applied to foreground detection. A sequence of frames denoted x^(T+1), x^(T+2), ... is fed into the proposed system for detecting moving objects. Most of the background regions of an input frame can be filtered out with the block-based CB. In this process, the block-based and pixel-based background CBs are continuously updated to accommodate dynamic environments. Subsequently, the coarse foreground obtained by the block-based CB is refined by the pixel-based detection mechanism.

4.1 Foreground detection with the block-based CB

In the detection phase, each input frame x^t is first divided into non-overlapping blocks and a 5-D vector vb is calculated for each block for block-based foreground detection. As illustrated in Algorithm 2, the input vector vb is compared with each codeword cbi until a match is found, where the match function defined in Eq. (1) is applied. When vb is classified as background, the corresponding block is used to update both the block-based and the pixel-based CBs. This ensures that the pixel-based CB also adapts to background changes. However, this strategy increases the processing time of foreground detection. To cope with this, a parameter Thupdate is utilized to gate the pixel-based updating in this stage. In other words, the pixel-level updating is performed only every Thupdate frames.

Algorithm 2: Foreground Detection

1. vb = {H1, H2, H3, H4, Avg}.
2. For all codewords in Cb, find a codeword cbi matching vb according to (1).
3. If found:
     cbi = αvb + (1 - α)cbi, numbi = numbi + 1, lfbi = nFrm, fglfb = 0.
     If nFrm % Thupdate = 0: update each pixel CB in this block.
   Else:
     fglfb = fglfb + 1. If fglfb > Thadd: add a new codeword using vb.
     Detect foreground at the pixel level in this block.
4. If nFrm % Thcheck = 0: check the CBs to delete useless codewords.
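The block-level step of Algorithm 2 can be sketched as below. Codewords are plain dicts, the state dict carries the per-block fglf counter between frames, the pixel-CB update is a placeholder, and resetting fglf after absorbing a new codeword is our assumption (the paper does not specify it):

```python
import numpy as np

def detect_block(vb, block_cb, state, n_frm,
                 th_b=5.0, alpha=0.05, th_update=3, th_add=200):
    """One block-level step of Algorithm 2.

    Returns True if the block is foreground (to be refined at pixel level).
    Codewords are dicts {'v': vector, 'num': int, 'lf': int}.
    """
    vb = np.asarray(vb, dtype=np.float64)
    hit = next((c for c in block_cb
                if np.abs(vb - c['v']).mean() < th_b), None)  # Eq. (1) match
    if hit is not None:                           # background block
        hit['v'] = alpha * vb + (1.0 - alpha) * hit['v']
        hit['num'] += 1
        hit['lf'] = n_frm
        state['fglf'] = 0
        if n_frm % th_update == 0:
            pass  # here the pixel CBs inside this block would be updated
        return False
    state['fglf'] += 1                            # consecutive-foreground count
    if state['fglf'] > th_add:                    # permanent scene change:
        block_cb.append({'v': vb.copy(), 'num': 1, 'lf': n_frm})
        state['fglf'] = 0                         # reset (our choice)
    return True                                   # refine at pixel level
```

A block that keeps failing to match for more than Thadd frames is absorbed as new background, which is how a stopped vehicle eventually disappears from the foreground.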

4.2 Foreground detection with the pixel-based CB

In a block that has been detected as foreground by the block-based CB, the input pixel vp is compared in turn with the Lp pixel-based codewords cpi. The match function defined in Eq. (1) is applied to determine whether vp belongs to the foreground or not. Notably, the block threshold Thb is selected according to the principle of detecting as much of the true foreground as possible, even though this may allow some background to be incorrectly classified as foreground. However, these false detections can be eliminated through pixel-based foreground detection with the threshold Thp in the pixel match function.

In addition, in order to adapt to changes in the background geometry such as the starting and stopping of vehicles, we check the block-based and pixel-based CBs every Thcheck frames and delete the codewords that have not appeared for a long time (nFrm - lf > Thdel). Meanwhile, a parameter fglf, which counts the number of consecutive frames in which a block is detected as foreground, is added to the block-based CB. When fglf > Thadd, a new codeword is added to the block-based CB.
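The periodic clean-up rule (nFrm - lf > Thdel) is essentially a one-liner; this sketch assumes codewords stored as dicts with an lf field, as in the detection sketch above:

```python
def prune_stale(codebook, n_frm, th_del=200):
    """Delete codewords not matched for more than th_del frames
    (nFrm - lf > Thdel), so removed or stopped objects can leave the model."""
    codebook[:] = [c for c in codebook if n_frm - c['lf'] <= th_del]
    return codebook
```

Running this every Thcheck frames bounds the codebook size and lets the model forget background appearances that no longer occur.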

5 Experiments

False positive (FP) rate, true positive (TP) rate, precision, and similarity [14] are employed to measure the performance of the proposed approach. The test datasets Fountain and Watersurface used in our experiments are publicly available from Ref. [16]; these sequences test the robustness of the model to non-stationary backgrounds. Another test sequence is PETS2001 DATA2, which tests the adaptability of the model to gradual background changes. The proposed approach was compared with the Gaussian Mixture Model (GMM) [2] and the original pixel-level CB [4].

The parameters used in our experiments are set as follows: the thresholds Thb = 5 and Thp = 10, β = 0.1, the training time T = 100, the pixel update gap Thupdate = 3, the block size 8*8, the learning rate α = 0.05, the check gap Thcheck = 3, the delete threshold Thdel = 200, and the add threshold Thadd = 200. The algorithms are implemented in C++ and run on a standard PC with a 2.5 GHz CPU, 2.0 GB memory and Windows XP SP3.

Figure 3, Figure 4 and Figure 5 show the results of the different background modeling methods. In each figure, (b) is the ground truth, (c) and (d) are the results of GMM and the original pixel-based CB respectively, and (e) is the result obtained by the proposed method. From the results, we can see that the proposed method provides better performance than GMM and the original CB. In addition, compared with the other two methods, the proposed method has a lower computational cost: 300 frames of 352*288 pixels can be processed per second on average. The statistical results are shown in Table I.

Figure 3 Fountain: (a) Frame; (b) Ground truth; (c) GMM; (d) Original CB; (e) Proposed method

Figure 4 Watersurface: (a) Frame; (b) Ground truth; (c) GMM; (d) Original CB; (e) Proposed method

Figure 5 PETS2001: (a) Frame; (c) GMM; (d) Original CB; (e) Proposed method

Table I Performance on the test sequences and the average running frame rate

Sequence      Method  TP rate(%)  FP rate(%)  Precision(%)  Similarity(%)  FPS
Fountain      PRO         73.4       0.9         75.3          59.6        364
Fountain      GMM         70.9       1.7         60.0          48.4         98
Fountain      CBs         70.5       3.4         42.9          36.1         84
WaterSurface  PRO         82.9       0.3         96.6          80.5        370
WaterSurface  GMM         71.5       1.4         81.9          61.8        100
WaterSurface  CBs         82.7       0.1         98.3          81.5         84

(PRO: the proposed method; CBs: the original codebook model.)




6 Conclusions

In this paper we propose an effective hierarchical codebook background model using haar-like features, which consists of a block-based CB and a pixel-based CB. Because haar-like features can be rapidly extracted using the integral image, the block-based foreground detection stage provides high processing speed. Furthermore, the pixel-based foreground detection stage improves the precision of the detected foreground objects. By using the parameters Thdel and Thadd we solve the problem of static objects leaving the background or moving objects stopping in the scene. As documented in the experimental results, the proposed method can effectively detect foreground with fast computation speed and a high correct detection rate even in non-stationary background environments.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (90920001 and 61101212), the National High Technology Research and Development Program of China (863 Program, No. 2012AA012505), the Fundamental Research Funds for the Central Universities, and the National Science and Technology Major Project of the Ministry of Science and Technology of China (2012ZX03005008).

References

[1] M. Piccardi. "Background subtraction techniques: A review", in Proc. IEEE Int. Conf. Syst., Man Cybern., The Hague, The Netherlands, Oct. 2004, vol. 4, pp. 3099-3104.

[2] C. Stauffer and W. E. L. Grimson. "Adaptive background mixture models for real-time tracking", in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 2, Jun. 1999, pp. 246-252.

[3] A. Elgammal, D. Harwood, et al. "Non-parametric model for background subtraction." Computer Vision - ECCV 2000, pp. 751-767.

[4] K. Kim, T. H. Chalidabhongse, et al. "Real-time foreground-background segmentation using codebook model." Real-Time Imaging, 11(3): 172-185, 2005.

[5] B. Han and L. Davis. "Density-Based Multi-Feature Background Subtraction with Support Vector Machine." IEEE Transactions on Pattern Analysis and Machine Intelligence, (99): 1-1, 2011.

[6] M. Heikkila and M. Pietikainen. "A texture-based method for modeling the background and detecting moving objects." IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4): 657-662, 2006.

[7] S. Liao, G. Zhao, et al. "Modeling pixel process with scale invariant local patterns for background subtraction in complex scenes", in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 2010.

[8] A. Cavallaro and T. Ebrahimi. "Video object extraction based on adaptive background and statistical change detection", in Proc. Vis. Commun. Image Process., Jan. 2001, pp. 465-475.

[9] J. Cezar Silveira Jacques, C. Rosito Jung, et al. "A background subtraction model adapted to illumination changes", in Proc. IEEE Int. Conf. Image Processing, 2006, pp. 1817-1820.

[10] J. Davis and V. Sharma. "Robust background-subtraction for person detection in thermal imagery." IEEE Int. Wkshp. on Object Tracking and Classification Beyond the Visible Spectrum, 2004.

[11] C. R. Wren, A. Azarbayejani, et al. "Pfinder: Real-time tracking of the human body." IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7): 780-785, 1997.

[12] Y. T. Chen, C. S. Chen, et al. "Efficient hierarchical method for background subtraction." Pattern Recognition, 40(10): 2706-2715, 2007.

[13] Z. Zivkovic and F. van der Heijden. "Efficient adaptive density estimation per image pixel for the task of background subtraction." Pattern Recognition Letters, 27(7): 773-780, 2006.

[14] J. M. Guo, Y. F. Liu, et al. "Hierarchical method for foreground detection using codebook model." IEEE Transactions on Circuits and Systems for Video Technology, 21(6): 804-815, 2011.

[15] B. Han and L. Davis. "Density-Based Multi-Feature Background Subtraction with Support Vector Machine." IEEE Transactions on Pattern Analysis and Machine Intelligence, (99): 1-1, 2011.

[16] L. Li, W. Huang, I. Gu, and Q. Tian. "Foreground object detection from videos containing complex background", in Proceedings of the eleventh ACM International Conference on Multimedia, pp. 2-10, New York, USA, 2003. http://perception.i2r.a-star.edu.sg.