
2014 22nd International Conference on Pattern Recognition

Anomaly Detection through Spatio-Temporal Context Modeling in Crowded Scenes

Tong Lu, Liang Wu, Xiaolin Ma
National Key Laboratory of Novel Software Technology
Nanjing University, Nanjing, China 210023
Email: [email protected]

Chew Lim Tan
School of Computing, National University of Singapore
Email: [email protected]

Palaiahnakote Shivakumara
Faculty of Computer Science and Information Technology
University of Malaya, Kuala Lumpur, Malaysia
Email: [email protected]

1051-4651/14 $31.00 © 2014 IEEE. DOI 10.1109/ICPR.2014.383

Abstract-A novel statistical framework for modeling the intrinsic structure of crowded scenes and detecting abnormal activities is presented in this paper. The proposed framework splits the anomaly detection process into two parts, namely motion pattern representation and crowded context modeling. During the first stage, we evenly divide the spatio-temporal volume into atomic blocks. Considering that mutual interference of several human body parts may occur in the same block, we propose an atomic motion pattern representation using the Gaussian Mixture Model (GMM) to distinguish the motions inside each block in a refined way. Usual motion patterns can thus be defined as a certain type of steady motion activity appearing at specific scene positions. During the second stage, we further use the Markov Random Field (MRF) model to characterize the joint label distributions over all the adjacent local motion patterns inside the same crowded scene, aiming at accurately modeling the severely occluded situations in a crowded scene. By combining the determinations from the two stages, a weighted scheme is proposed to automatically detect anomaly events in crowded scenes. The experimental results on several different outdoor and indoor crowded scenes illustrate the effectiveness of the proposed algorithm.

I. INTRODUCTION

Automatic analysis of densely crowded environments such as subways, universities, railway stations and stadiums has recently attracted interest in pattern recognition. Methods for crowd modeling [1], crowd size estimation [2] and individual behavior prediction [3] have been proposed. Most crowd monitoring systems aim at assisting public security, so these efforts face the common problem of detecting deviations in crowded environments. The problem is referred to as crowded anomaly motion detection, which provides much more valuable hints than detecting normal behaviors. Anomaly motions are defined here as unusual events that differ from the steady ones that frequently happen at particular positions inside a scene.

Unlike anomaly detection in non-crowded scenes, a crowded environment generally requires monitoring an excessive number of individuals and their activities through video surveillance devices. As a result, computational approaches to anomaly detection in densely crowded scenes face additional difficulties in both scene modeling and anomalous behavior detection, in two respects. First, multiple individuals in a crowded scene generally lead to severe occlusions, which make object tracking fairly difficult, and sometimes a significant challenge even for human observers. Second, highly irregular pedestrian behaviors within the same crowded scene make explicit modeling of anomaly deviations difficult, so it may be hard to clearly distinguish tiny deviations from normal behaviors.

In this paper, we present a novel spatio-temporal framework for modeling the intrinsic structure of crowded scenes and detecting abnormal activities in them by using the Gaussian Mixture Model (GMM) and the Markov Random Field (MRF). Since each crowded scene essentially consists of a large number of pedestrian activities together with their interrelations, we split the anomaly detection process into two parts, namely motion pattern representation and crowded context modeling. During the first stage, we evenly divide the spatio-temporal volume of the scene into atomic blocks. Considering that mutual interference of several human body parts may occur in the same block, we propose an atomic motion pattern representation using GMM to distinguish the motions inside each block in a refined way. Usual motion patterns can thus be defined as a certain type of steady motion activity appearing at specific scene positions. During the second stage, we further use the MRF model to characterize the joint label distributions over all the adjacent local motion patterns inside the same crowded scene, aiming at accurately modeling the severely occluded situations in the crowded scene. By combining the determinations from the two stages, a weighted scheme is finally proposed to automatically detect anomaly events in crowded scenes.

The main contribution of our approach is a statistical approach that couples spatio-temporal crowded context modeling with anomaly event detection in crowded scenes. The context model is suited to characterizing the relations between spatially or temporally adjacent pedestrian activities, thus enforcing local consistency on them to assist in detecting anomaly motion patterns. The experimental results on several real-life outdoor and indoor scenes illustrate the effectiveness of the proposed method, which outperforms an existing anomaly detection algorithm.


The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 gives the representation of atomic motion patterns of a crowded scene using GMM, and Section 4 models the context of the activities in a crowded scene with an MRF. Experiments and discussions are given in Section 5. Finally, Section 6 concludes the paper.

II. RELATED WORK

A number of methods have been proposed in past years, and the existing methods for anomaly detection can be roughly classified into three categories: trajectory learning, motion modeling, and unsupervised approaches.

Generally, a trajectory learning approach first tracks each object in a scene and then learns a model from the object tracks to identify unusual activities by detecting deviations [4]. Every scene object is tracked over a dynamic sequence of inferred tracking states $S^T = \{s_1, s_2, \ldots, s_T\}$, where $T$ is a frame sequence and $s_t$ describes properties such as appearance, position, shape, velocity and direction. In [5], an anomaly detection system is proposed to automatically learn motion patterns by tracking multiple objects, in which growing and prediction of the cluster centroids of foreground pixels ensure the accuracy of tracking a moving object in the scene. Unfortunately, it is not very promising for tracking objects in densely crowded scenes. The method in [6] is one of the first algorithms for tracking objects in a crowded environment; it uses articulated ellipsoids to model human appearance and a Gaussian distribution to model the background for segmentation. More recently, Ali and Shah [3] presented a force model with three floor fields inspired by research in evacuation dynamics, namely the Static Floor Field (SFF), the Dynamic Floor Field (DFF) and the Boundary Floor Field (BFF). However, it is still challenging to track individuals in a crowded scene by simply using trajectory learning techniques.

Motion modeling is an alternative approach that models motion patterns to avoid object tracking [7]. This approach focuses on distinguishing unusual events from other stationary behaviors and has proved suitable for anomaly detection, especially in crowded scenes. In this approach, optical flow is a popular low-level representation for describing motion patterns in surveillance video. Andrade et al. [8] apply spectral clustering to feature prototypes obtained by performing PCA on the optical flow fields, and train an MOHMM model for each class during anomaly detection. Adam et al. [9] calculate the probabilities of optical flows in local regions. Unfortunately, optical flow, together with other similar low-level descriptors such as pixel change histograms and background subtraction operations, is not reliable enough for detecting abnormal events in crowded scenes, in which occlusions always exist [10].

Since no prior assumptions about what unusual events may look like are required, unsupervised anomaly detection can be considered a complementary and, at the same time, more natural approach to detecting unusual events. It is especially suitable for situations such as the lack of sufficient training data and the volatility of defining normality and abnormality. Xiang and Gong [11] propose natural grouping of behavior patterns through an unsupervised model, which selects eigenvector features from a normalized affinity matrix. Zhao et al. [12] propose a fully unsupervised dynamic sparse coding approach for detecting unusual events in videos. Most unsupervised methods exploit the "hard to describe" but "easy to verify" property of unusual events, without building an explicit model for normal events. Their advantage is that each event can be compared with all the other observed events to determine whether it is abnormal. However, they still share the same problems, namely how to improve the accuracy and how to efficiently measure the similarity of the detected events, especially on the relatively large datasets that most unsupervised methods require.

III. REPRESENTATION OF ATOMIC MOTION PATTERNS USING GMM

Due to the relatively large number of individuals in a crowded scene, segmentation according to the contours of pedestrians in the scene is not always reliable, as discussed above. Evenly dividing the spatio-temporal volume into atomic spatio-temporal blocks of a fixed size is a popular way to model motions in a local spatio-temporal block. However, it is still difficult to avoid discordant motions within the same block. Namely, a single block may contain discordant motions that are generally produced by different body parts of the same pedestrian, or by body parts of different pedestrians that happen to fall into the same block. As a result, a uniform representation of a single spatio-temporal block generally has difficulty characterizing the multiple motions inside it, especially considering the mutual interference of body parts from the same pedestrian or from several neighboring pedestrians.

In order to better characterize each spatio-temporal volume block, we define the motion pattern within each block as a collection of motions from multiple body parts. For this purpose, we first extract optical flow vectors and then fit a Gaussian Mixture Model to the optical flow vectors in each block. As a result, every motion component within a spatio-temporal block can be captured, and the interference between components is avoided, since the distribution of the optical flows located in that block is modeled by the GMM. The Gaussian Mixture Model of all the optical flows within the same block is accordingly named its motion pattern. In this way, we obtain the atomic motion representation for every block in the whole spatio-temporal volume.

Specifically, we first divide the volume into spatio-temporal blocks of a fixed size as in [13]. We set the size of each block to the average width of pedestrians. For every feature point $(x, y)$ in an input video frame, the optical flow $v$ is computed using the pyramidal Lucas-Kanade algorithm:

$$v = (u_x, u_y) \qquad (1)$$

where $u_x$ and $u_y$ are the velocities along the $x$ and $y$ directions, respectively. Flow vectors whose magnitude is too large are considered noise and are discarded directly. Following the described spatio-temporal volume representation, all the optical flow vectors are then assigned to spatio-temporal blocks according to their spatio-temporal locations.
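The filtering and block assignment described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `assign_flows_to_blocks`, the `(x, y, t, ux, uy)` tuple layout, and the parameters `block_size`, `block_frames` and `max_magnitude` are assumptions introduced here, and the flow vectors are assumed to have been computed beforehand (for example by a pyramidal Lucas-Kanade tracker).

```python
import math
from collections import defaultdict


def assign_flows_to_blocks(flows, block_size, block_frames, max_magnitude):
    """Group optical flow vectors by spatio-temporal block.

    flows: iterable of (x, y, t, ux, uy), where (x, y) is the pixel position,
    t the frame index and (ux, uy) the flow vector (hypothetical layout).
    Returns a dict mapping (row, col, time_block) to a list of (ux, uy).
    """
    blocks = defaultdict(list)
    for x, y, t, ux, uy in flows:
        # Vectors with an implausibly large magnitude are treated as noise.
        if math.hypot(ux, uy) > max_magnitude:
            continue
        key = (int(y // block_size), int(x // block_size), int(t // block_frames))
        blocks[key].append((ux, uy))
    return dict(blocks)


flows = [
    (5, 5, 0, 1.0, 0.0),     # lands in block (0, 0, 0)
    (12, 3, 1, 0.5, 0.5),    # lands in block (0, 0, 0)
    (40, 5, 0, 0.0, 2.0),    # lands in block (0, 2, 0)
    (6, 6, 0, 30.0, 40.0),   # magnitude 50, discarded as noise
    (5, 5, 7, -1.0, 0.0),    # lands in block (0, 0, 1)
]
blocks = assign_flows_to_blocks(flows, block_size=16, block_frames=5, max_magnitude=10.0)
```

With these sample values the noisy vector is dropped and the remaining four vectors fall into three blocks; each block's list is what a per-block mixture model would later be fitted to.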

Then, for a block at position $(i, j)$ and time moment $t$, we calculate the Gaussian Mixture Model $G^t_{ij}(x^t_{ij} \mid \theta^t_{ij})$ for the block by:

$$G^t_{ij}(x^t_{ij} \mid \mu^t_{ij}, \Sigma^t_{ij}) = \sum_{k=1}^{k^t_{ij}} w_k\, g(x^t_{ij} \mid \mu^t_{ij,k}, \Sigma^t_{ij,k}) \qquad (2)$$

where $x^t_{ij}$ is a 2-dimensional optical flow vector in the block, $w_k$ ($k = 1, \ldots, k^t_{ij}$) denotes the corresponding mixture weight, and $g(x^t_{ij} \mid \mu^t_{ij,k}, \Sigma^t_{ij,k})$, $k = 1, \ldots, k^t_{ij}$, are the component Gaussian densities.

There are altogether $k^t_{ij}$ components in $G^t_{ij}$, each denoting the motion of a particular body part from the same pedestrian or from several different pedestrians. To determine the value of $k^t_{ij}$, we apply a mean shift clustering algorithm to the optical flows assigned to the block at $(i, j)$, and use the number of resulting clusters to initialize $k^t_{ij}$. Then, given the training optical flow vectors in any spatio-temporal block, we model the motion pattern for it by a GMM, namely by the parameter $\theta$ that best matches the distribution of the training optical flow vectors. We use the Maximum Likelihood (ML) algorithm to estimate the parameter. Since the likelihood $P(x \mid \theta)$ is a non-linear function of the parameter $\theta$ and a direct maximization is difficult to calculate, we use the Expectation-Maximization (EM) algorithm to find the maximum likelihood solution. Suppose the model parameters $w$, $\mu$ and $\Sigma$ are denoted by $\theta$:

$$P(x \mid \theta) = P(x \mid w, \mu, \Sigma) = \prod_{n=1}^{N} \sum_{k=1}^{k^t_{ij}} w_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \qquad (3)$$

The cluster centers and the number of clusters $k^t_{ij}$ described above are passed to the GMM training process as the initialization. On each EM iteration (M-step), the following re-estimation formulas are used, which guarantee a monotonic increase of the likelihood value of the model:

$$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n \qquad (4)$$

where

$$N_k = \sum_{n=1}^{N} \gamma(z_{nk}) \qquad (7)$$

$$\gamma(z_{nk}) = \frac{w_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{k^t_{ij}} w_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \qquad (8)$$

$\mu_k$, $\Sigma_k$ and $w_k$ are the means, covariances and mixing coefficients (of ...), respectively, and $x_n$ is an optical flow vector belonging to the spatio-temporal block. In this way, we obtain the model parameters $\theta$ once the convergence criterion is satisfied.
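As a rough sketch of one EM iteration for a 2-dimensional mixture, the code below computes the responsibilities $\gamma(z_{nk})$ and then re-estimates the means as in Eq. (4). The covariance and mixing-coefficient updates shown are the standard textbook forms and are an assumption here, since they are not legible in this text; the small `reg` term added to the covariance diagonal is likewise an addition for numerical stability. All function names are hypothetical.

```python
import math


def gauss2d(x, mu, cov):
    """Density of a 2-D Gaussian with mean mu and 2x2 covariance cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def em_step(xs, weights, means, covs, reg=1e-6):
    """One EM iteration; returns the updated (weights, means, covs)."""
    n, k = len(xs), len(weights)
    # E-step: responsibility of component j for each sample.
    gamma = []
    for x in xs:
        row = [weights[j] * gauss2d(x, means[j], covs[j]) for j in range(k)]
        total = sum(row)
        gamma.append([r / total for r in row])
    # M-step: re-estimate every component from its weighted samples.
    new_w, new_mu, new_cov = [], [], []
    for j in range(k):
        g = [row[j] for row in gamma]
        nk = sum(g)  # effective number of samples, N_k
        mx = sum(gi * x[0] for gi, x in zip(g, xs)) / nk
        my = sum(gi * x[1] for gi, x in zip(g, xs)) / nk
        # Assumed standard covariance update; reg is not from the paper.
        sxx = sum(gi * (x[0] - mx) ** 2 for gi, x in zip(g, xs)) / nk + reg
        syy = sum(gi * (x[1] - my) ** 2 for gi, x in zip(g, xs)) / nk + reg
        sxy = sum(gi * (x[0] - mx) * (x[1] - my) for gi, x in zip(g, xs)) / nk
        new_w.append(nk / n)  # assumed standard mixing-coefficient update
        new_mu.append((mx, my))
        new_cov.append(((sxx, sxy), (sxy, syy)))
    return new_w, new_mu, new_cov


def log_likelihood(xs, weights, means, covs):
    """Logarithm of the likelihood in Eq. (3)."""
    return sum(
        math.log(sum(w * gauss2d(x, m, c) for w, m, c in zip(weights, means, covs)))
        for x in xs
    )


# Two opposite motion directions inside one block (made-up flow vectors).
xs = [(1.0, 0.1), (0.9, -0.1), (1.1, 0.0), (-1.0, 0.1), (-0.9, -0.1), (-1.1, 0.0)]
identity = ((1.0, 0.0), (0.0, 1.0))
params = ([0.5, 0.5], [(0.5, 0.0), (-0.5, 0.0)], [identity, identity])
ll_before = log_likelihood(xs, *params)
params = em_step(xs, *params)
ll_after = log_likelihood(xs, *params)
for _ in range(30):
    params = em_step(xs, *params)
```

In this toy run the two component means drift toward the two flow directions, which is the behavior the per-block motion pattern relies on. In practice the initial means would come from the clustering step rather than from hand-picked values.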

We further name the cluster of the motion patterns inside the same block a motion prototype, which is considered a usual event prototype in the block. Since the mean of a set of GMMs does not always remain a GMM, and directly computing the mean of GMMs is infeasible, we define the medoid $M^t_{ijk}$ of a motion prototype by

where $G_{M^t_{ijk}}$ is the motion pattern that is associated with the prototype $M^t_{ijk}$ at position $(i, j, k)$. Since each prototype essentially represents a certain type of steady motion activity that appears in a specific local region of the scene, we treat it as the usual motion pattern at that position. We then determine whether a given motion pattern is usual or unusual by measuring its deviation from the calculated prototypes through the following confidence:

$$C_{basic} = \max\left(p(G^t_{ijk} \mid P^t_{ijk})\right) \qquad (10)$$

Fig. 1. The spherical polar coordinate space for labeling motion vectors, where the optical flow in the colored region is assigned a label $l_i$.
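The idea of replacing a mean by a medoid can be illustrated generically: a medoid is the member of a set with the smallest summed dissimilarity to all other members, so it is always a valid element of the set. The sketch below is an assumption-laden illustration, not the paper's definition. The dissimilarity between two GMMs used by the authors is not recoverable from this text, so `distance` is left as a caller-supplied function, and the example stands in for motion patterns with plain 2-D mean vectors compared by Euclidean distance.

```python
import math


def medoid(items, distance):
    """Return the item with the smallest summed distance to all items."""
    return min(items, key=lambda a: sum(distance(a, b) for b in items))


def euclidean(a, b):
    """Placeholder dissimilarity; a real system would compare whole GMMs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical stand-ins for motion patterns: one mean flow vector each.
patterns = [(1.0, 0.0), (1.1, 0.0), (0.9, 0.0), (1.0, 0.1), (1.0, -0.1)]
prototype = medoid(patterns, euclidean)
```

Here the central vector is selected because its summed distance to the other four is the smallest.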

IV. CONTEXT MODELING FOR CROWDED SCENES

Our next purpose is to improve the accuracy of crowded anomaly detection by characterizing the context of motion patterns. Inspired by the work of Galleguillos et al. [14] on object categorization, we use the MRF model to learn the joint label distributions of adjacent local motion patterns as the context of a crowded scene. Each component $C_i$ in a GMM is a constituent of the local motion pattern representation and is defined by a Gaussian distribution together with a weight $w_i$. Thus we can assign a discrete motion label by transforming the mean vector of $C_i$ from the original video spatio-temporal coordinate space to the planar polar coordinate space, which is divided into a set of discrete radius and angle intervals of a fixed size, as shown in Fig. 1. Namely, each cluster is assigned the label of the specific bin into which its centroid vector falls. To decide the proper size of the intervals, we choose the maximum speed value $R_s$ of pedestrians in the scene $s$, and correspondingly divide the planar polar coordinate space $(r, \theta)$, $r \in (0, R_s)$, $\theta \in (0, 360)$, into fixed radius and angle intervals.
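The labeling step can be sketched as a simple polar binning of a component's mean flow vector. The function name `motion_label`, the bin counts, and the way a radius bin and an angle bin are combined into a single integer label are assumptions made for illustration; the text only states that the polar space is divided into fixed radius and angle intervals up to the maximum speed $R_s$.

```python
import math


def motion_label(mean, max_speed, n_radius_bins, n_angle_bins):
    """Map a 2-D mean flow vector to a discrete polar-bin label."""
    ux, uy = mean
    # Speeds above the scene maximum are clamped into the outermost ring.
    r = min(math.hypot(ux, uy), max_speed)
    theta = math.degrees(math.atan2(uy, ux)) % 360.0
    r_bin = min(int(r / max_speed * n_radius_bins), n_radius_bins - 1)
    a_bin = min(int(theta / 360.0 * n_angle_bins), n_angle_bins - 1)
    # Assumed encoding: one integer per (radius ring, angle sector) pair.
    return r_bin * n_angle_bins + a_bin


slow_up = motion_label((-0.1, 0.5), max_speed=4.0, n_radius_bins=4, n_angle_bins=8)
walk_right = motion_label((1.0, 0.2), max_speed=4.0, n_radius_bins=4, n_angle_bins=8)
fast_down = motion_label((0.3, -3.0), max_speed=4.0, n_radius_bins=4, n_angle_bins=8)
too_fast = motion_label((10.0, 1.0), max_speed=4.0, n_radius_bins=4, n_angle_bins=8)
```

Applying this to every component mean of a block's mixture, and keeping the component weights as probabilities, gives a discrete label distribution for that motion pattern.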

For a specific motion pattern that is composed of $k^t_{ij}$ components, we obtain a discrete distribution of labels as shown in Table I.

TABLE I. THE LABEL DISTRIBUTION OF A MOTION PATTERN. $l_i$ ($1 \le i \le k^t_{ij}$) ARE THE LABELS FOR THE CLUSTERS OF THE MOTION PATTERN, $p_i$ ($1 \le i \le k^t_{ij}$) ARE THE CORRESPONDING PROBABILITIES, AND $k^t_{ij}$ IS THE TOTAL NUMBER OF CLUSTERS RELATED TO THE MOTION PATTERN (GMM).

Next, to describe the spatio-temporal context of motion patterns, we define a set of relation matrices $\Phi^{mn}_x$, $\Phi^{mn}_y$, $\Phi^{mn}_z$, $\Phi^{mn}_t$, each capturing the interactions between the label distributions of adjacent blocks along one of the 4 directions. Taking the $x$ direction as an example, each entry $(i, j)$ in the matrix $\Phi^{mn}_x$ represents the virtual number of times that a motion pattern with label $l_i$ appears in the training volume together with another motion pattern with label $l_j$. The term virtual indicates that the co-occurrence count is a floating-point value rather than an integer. To simplify the computation of virtual counts between two adjacent motion patterns, we assume that their label distributions are independent. Thus, the joint distribution of the labels of any two adjacent motion patterns can be calculated as shown in Table II:

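TABLE II. THE JOINT LABEL DISTRIBUTION OF TWO ADJACENT MOTION PATTERNS $m$ AND $n$. $p^m_i$ IS THE PROBABILITY OF LABEL $l_i$ IN THE LABEL DISTRIBUTION $L_m$, AND $k_m$ IS THE TOTAL NUMBER OF LABELS WITHIN THE MOTION PATTERN $m$.

$$\begin{array}{cccc}
p^m_1 p^n_1 & p^m_1 p^n_2 & \cdots & p^m_1 p^n_{k_n} \\
p^m_2 p^n_1 & p^m_2 p^n_2 & \cdots & p^m_2 p^n_{k_n} \\
\vdots & \vdots & \ddots & \vdots \\
p^m_{k_m} p^n_1 & p^m_{k_m} p^n_2 & \cdots & p^m_{k_m} p^n_{k_n}
\end{array}$$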

After obtaining the probabilities extracted from each specific volume, we can empirically count the virtual co-occurrence times between any two adjacent motion patterns. Basically, for one label pair $(l_i, l_j)$, supposing the probability of $l_i$ in the label distribution $L_1$ is $p^1(l_i)$ and the probability of $l_j$ in the label distribution $L_2$ is $p^2(l_j)$, we add $p^1(l_i)\,p^2(l_j)$ to the entry $(i, j)$ of the relation matrix that corresponds to the two adjacent motion patterns. Therefore, the entry $(i, j)$ of the matrix $\Phi^{mn}_x$ can be calculated as follows:

$$\Phi^{mn}_x(i, j) = \sum_{k=1}^{|V|} p^m_k(l_i)\, p^n_k(l_j) \qquad (11)$$

where $|V|$ is the total number of volumes obtained from the training image sequence, and $p^m_k(l_i)$ stands for the probability of label $l_i$ in the discrete label distribution corresponding to block $m$ of the $k$th training volume. Similarly, we can obtain the relation matrices $\Phi^{mn}_y(i, j)$ and $\Phi^{mn}_z(i, j)$ for the $y$ and $z$ directions, respectively.
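The accumulation of virtual co-occurrence counts can be sketched directly from the description above: for every training volume, the product of the two label probabilities is added to the matching matrix entry. The function name `accumulate_relation` and the representation of a label distribution as a dict from label index to probability are assumptions for illustration.

```python
def accumulate_relation(volumes, n_labels):
    """Build a relation matrix of virtual co-occurrence counts.

    volumes: list of (dist_m, dist_n) pairs, one per training volume, where
    each dist maps a label index to its probability (hypothetical layout).
    """
    phi = [[0.0] * n_labels for _ in range(n_labels)]
    for dist_m, dist_n in volumes:
        for li, p_i in dist_m.items():
            for lj, p_j in dist_n.items():
                # Independence assumption: joint probability is the product.
                phi[li][lj] += p_i * p_j
    return phi


volumes = [
    ({0: 0.5, 1: 0.5}, {1: 1.0}),          # first training volume
    ({0: 1.0}, {1: 0.25, 2: 0.75}),        # second training volume
]
phi_x = accumulate_relation(volumes, n_labels=3)
total_count = sum(sum(row) for row in phi_x)
```

Because each volume contributes a joint distribution that sums to one, the entries of the matrix sum to the number of training volumes, which is why the counts are fractional rather than integer.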

With the MRF model, the probability of a spatially or temporally adjacent label pair can be defined as follows:

$$p(\langle l_i, l_j \rangle; \phi^{mn}_\alpha) = \frac{1}{Z(\phi^{mn}_\alpha)} \exp\left(\phi^{mn}_\alpha(i, j)\right) \qquad (12)$$

where $Z(\cdot)$ is the partition function, $\phi^{mn}_\alpha(i, j)$ are the interaction potentials learned from the training data, $\alpha$ stands for the category of relation, which is one of $x$, $y$, $z$ or $t$, and $m$ and $n$ denote the spatial positions of the two motion patterns from which $l_i$ and $l_j$ are extracted. Using the joint probability as the weight, we compute the consistent probability of the two adjacent motion patterns as a weighted sum:

$$p(\langle M_m, M_n \rangle; \phi^{mn}_\alpha) = \sum_{i=1}^{|C_{M_m}|} \sum_{j=1}^{|C_{M_n}|} p^{M_m}(l_i)\, p^{M_n}(l_j)\, p(\langle l_i, l_j \rangle; \phi^{mn}_\alpha) \qquad (13)$$

where $M_m$ and $M_n$ are the two motion patterns located at spatial positions $m$ and $n$, respectively, $p^{M_m}(\cdot)$ and $p^{M_n}(\cdot)$ are the discrete distributions corresponding to the motion pattern pair, and $\phi_\alpha$ is the parameter matrix of the interaction potential learned from the training data.
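Eqs. (12) and (13) can be sketched in a few lines. The sketch assumes that the partition function sums $\exp(\cdot)$ over all label pairs of one relation, which the text does not spell out, and it takes the potentials as given rather than learning them. The names `pair_probabilities` and `consistency` and the toy potential values are hypothetical.

```python
import math


def pair_probabilities(potentials):
    """Eq. (12) style: exp(potential) normalized over all label pairs."""
    z = sum(math.exp(v) for row in potentials for v in row)  # assumed Z
    return [[math.exp(v) / z for v in row] for row in potentials]


def consistency(dist_m, dist_n, pair_prob):
    """Eq. (13) style: pair probabilities weighted by both label distributions."""
    return sum(
        p_m * p_n * pair_prob[i][j]
        for i, p_m in enumerate(dist_m)
        for j, p_n in enumerate(dist_n)
    )


# Toy potentials that favor neighbors sharing the same motion label.
potentials = [[2.0, 0.0], [0.0, 2.0]]
pair_prob = pair_probabilities(potentials)
agree = consistency([1.0, 0.0], [1.0, 0.0], pair_prob)
disagree = consistency([1.0, 0.0], [0.0, 1.0], pair_prob)
```

Two neighboring blocks that carry the same label obtain a higher consistent probability than two blocks with conflicting labels, which is the local-consistency effect the context model is meant to enforce.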

Thus, given an input labeled dataset $D$, the likelihood function for relation $R^{mn}_\alpha$ is calculated by

$$p(D; \phi^{mn}_\alpha) = \frac{1}{Z(\phi^{mn}_\alpha)^{|V|}} \exp\left( \sum_{i=1}^{|C|} \sum_{j=1}^{|C|} \Phi^{mn}_\alpha(i, j)\, \phi^{mn}_\alpha(i, j) \right) \qquad (14)$$

where $\Phi^{mn}_\alpha(i, j)$ is the entry $(i, j)$ of the frequency matrix for spatial relation $R^{mn}_\alpha$ learned from the training data set $V$, which counts the virtual number of times that the label pair $(l_i, l_j)$ appears in the training lattices at spatial positions $m$ and $n$, respectively, and $|V|$ is the total number of volumes in the training set $V$.

With the spatio-temporal context modeling that captures the co-occurrence relations among adjacent motion patterns, we define a measurement that indicates the consistency between a specific motion pattern and its spatial and temporal neighbors:

where $\beta$ is an adjacent spatio-temporal position that has one of the following relations relative to the current position $m$: spatial up, spatial down, spatial left, spatial right, spatial before, spatial after, temporal before or temporal after.

V. ANOMALY DETECTION

With the given discussions, we can now integrate the static and the contextual measurements for detecting anomaly motions in a crowded scene. Using the spatio-temporal context learned from training data, we evaluate the confidence measure for each usual motion pattern, simultaneously taking into account the confidence $C_{basic}$, which indicates the consistency with the particular prototype, and the confidence $C_{context}$, which denotes the consistency of the spatio-temporal relations with the neighbors in the scene. Specifically, we use a weighted combination of the two factors as follows:

(16)

In this way, we finally use the summarized confidence $C_{total}$ to detect anomaly events in an input image sequence of a crowded scene. When the total confidence is lower than a specific threshold, we consider the motion an anomaly event.
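The final decision rule can be sketched as below. The exact form of the weighted combination is not legible in this text, so the sketch assumes a convex combination of the two confidences with a single weight; the names `total_confidence`, `is_anomaly`, `weight` and `threshold`, as well as the sample values, are hypothetical.

```python
def total_confidence(c_basic, c_context, weight=0.5):
    """Assumed convex combination of the basic and contextual confidences."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * c_basic + (1.0 - weight) * c_context


def is_anomaly(c_basic, c_context, threshold, weight=0.5):
    """Flag a motion pattern when its total confidence is below the threshold."""
    return total_confidence(c_basic, c_context, weight) < threshold


combined = total_confidence(0.8, 0.4, weight=0.5)
unusual = is_anomaly(0.2, 0.1, threshold=0.3)
usual = is_anomaly(0.9, 0.8, threshold=0.3)
```

Under this assumption the weight trades off how much the prototype match counts against the agreement with neighboring blocks, and the threshold sets the operating point on the ROC curve.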


TABLE III. PERFORMANCE EVALUATION OF THE PROPOSED METHOD BY BOTH GMM AND MRF, GMM ONLY, AND THE STCOG APPROACH [16] (TP AND FP DENOTE THE TRUE POSITIVE AND THE FALSE POSITIVE, RESPECTIVELY).

Crowded Scene Categories   GMM with MRF           GMM only               STCOG [16]
                           True Pos   False Pos   True Pos   False Pos   True Pos   False Pos
Outdoor Building Gate      0.81%      0.19%       0.78%      0.22%       0.78%      0.22%
Indoor Classroom           0.80%      0.19%       0.72%      0.27%       0.7~%      0.27%
Outdoor Street             0.68%      0.31%       0.59%      0.42%       0.58%      0.43%

Fig. 2. Outdoor abnormal detection (Building Gate): (a) the ROC curves of the proposed method by both GMM and MRF, GMM only, and the STCOG approach; (b) and (c) the detected abnormal events of walking along a different direction.

VI. EXPERIMENTAL RESULTS AND DISCUSSIONS

We evaluate the performance of our anomaly detection on the dataset of [15], which consists of over 25000 images with a resolution of 1024 x 768 pixels. The dataset contains samples of three crowded environments: two outdoor scenes, namely (i) a Building Gate scene and (ii) a Street scene, and an indoor Classroom scene. Each scene category contains several abnormal cases such as stopping, turning around, suddenly changing speed and going backwards. For each abnormal event, we collect on average 250 images for testing, while leaving about 1500 images for training. To evaluate the effectiveness of our anomaly detection method, all the query sequences are hand-labeled and used as the ground truth. Note that [15] presented a model for abnormal detection from depth videos, in which context modeling is not considered.

Fig. 3. Outdoor abnormal detection (Street): (a) the ROC curves of the proposed method by both GMM and MRF, GMM only, and STCOG; (b) the detected abnormal event of suddenly stopping; and (c) the detected abnormal event of changing direction.

Table III shows the comparison results of three different approaches on the same dataset: the proposed algorithm with both GMM and MRF modeling, the proposed algorithm with GMM only, and another approach, STCOG [16]. STCOG is also a block-level algorithm for detecting abnormal events, in which a GMM is built on every two neighboring blocks. The performance of the proposed algorithm with GMM only is very similar to that of STCOG. However, after context modeling using MRF, the average true positive rate of our method increases while the average false positive rate decreases.
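For readers who want to reproduce this kind of comparison, the sketch below shows one common way to obtain true positive and false positive rates from per-sample confidences and hand-labeled ground truth by sweeping the decision threshold. It is a generic evaluation sketch, not the authors' evaluation code; the function name `roc_points` and the sample values are made up for illustration.

```python
def roc_points(confidences, is_abnormal, thresholds):
    """Return (false positive rate, true positive rate) for each threshold.

    A sample is flagged as an anomaly when its confidence is below the
    threshold, matching the decision rule described in the text.
    """
    positives = sum(is_abnormal)
    negatives = len(is_abnormal) - positives
    points = []
    for th in thresholds:
        flagged = [c < th for c in confidences]
        tp = sum(f and a for f, a in zip(flagged, is_abnormal))
        fp = sum(f and not a for f, a in zip(flagged, is_abnormal))
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical confidences with ground-truth labels (True = abnormal).
confidences = [0.1, 0.2, 0.6, 0.7, 0.9]
is_abnormal = [True, True, False, True, False]
points = roc_points(confidences, is_abnormal, thresholds=[0.0, 0.5, 1.0])
```

Plotting such points for a fine grid of thresholds yields an ROC curve of the kind shown in the figures.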

Fig. 2 shows the ROC curves of the three methods and several detected abnormal event examples from the outdoor Building Gate scene. The ROC curves in Fig. 2(a) illustrate the effectiveness of anomaly detection by combining GMM and MRF modeling in such a crowded scene. Fig. 2(b) and Fig. 2(c) show two detected cases of the abnormal event of walking along a different direction.

Fig. 3(a) further shows the ROC curves of the three methods in another outdoor scene, the Street scene. Similarly, Fig. 3(b) and Fig. 3(c) give two examples of the detected anomaly motions. The remaining abnormal events, such as unexpected turning back, sudden speed changes, and walking along wrong directions, can all be successfully detected. Note that the ROC curves of the Street scene in Fig. 3(a) are lower than those in Fig. 2(a). This is because the motions in a Street scene are in general much more complex than those in the Building Gate scene. For instance, pedestrians and vehicles such as cars and bicycles may simultaneously exist in the same scene, making it difficult to clearly define the steady motion patterns. Moreover, pedestrians appear smaller in such a street scene, which makes the detection of their anomaly motions more easily disturbed.

Fig. 4 gives the comparisons using the image sequences from an indoor Classroom scene. For such an indoor scene, we define the motions of entering the classroom as usual events. Generally, the accurate calculation of optical flows plays an important role in our integrated GMM and MRF anomaly detection. We find that errors may occur when calculating optical flows in an indoor scene, potentially caused by low-quality illumination or by confusion between scene objects and the scene background. Fig. 4(b) shows the correctly detected anomaly motions together with a false positive caused by inaccurate optical flow estimation.

[Figure 4(a) plot: ROC of the Classroom scene; horizontal axis: False Positive Rate. Figure 4(b): image panel.]

Fig. 4. Indoor scene detection: (a) the ROC curves of the Classroom scene for the proposed method using both GMM and MRF, GMM only, and STCOG, and (b) the detected abnormal motion in the indoor Classroom scene.

VII. CONCLUSION

A novel statistical framework for modeling the intrinsic structure of crowded scenes and detecting abnormal activities is presented in this paper. We propose an atomic motion pattern representation using the Gaussian Mixture Model to distinguish the motions inside each block in a refined way. Usual motion patterns can thus be defined as a certain type of steady motion activities appearing at specific scene positions. We further use the Markov Random Field model to characterize the joint label distributions over all the adjacent local motion patterns inside the same crowded scene, aiming at accurately modeling the severely occluded situations in a crowded scene. By combining the determinations from the two stages, we propose a weighted scheme to automatically detect anomaly events from crowded scenes. The experimental results on several different outdoor and indoor crowded scenes illustrate the effectiveness of the proposed algorithm. In the future, we will focus on incorporating stronger scene context information to further improve the performance, and on making the calculation of optical flows more robust.
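The idea of combining the two determinations with a weight can be illustrated by the short Python sketch below. It is only an assumption-laden illustration: the convex combination, the function names, the default weight, and the decision threshold are all hypothetical and are not the weighting formula defined by the proposed method.

```python
def fused_anomaly_score(gmm_score, mrf_score, weight=0.5):
    """Blend a local (GMM) score and a contextual (MRF) score.

    Both scores are assumed to lie in [0, 1], with higher meaning more
    abnormal. The convex combination is an illustrative assumption.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * gmm_score + (1.0 - weight) * mrf_score


def is_anomalous(gmm_score, mrf_score, weight=0.5, threshold=0.5):
    """Flag a block when the fused score exceeds a hypothetical threshold."""
    return fused_anomaly_score(gmm_score, mrf_score, weight) > threshold


if __name__ == "__main__":
    # A block that looks unusual locally but fits its neighborhood context.
    print(fused_anomaly_score(0.8, 0.2, weight=0.75))
    print(is_anomalous(0.8, 0.2, weight=0.75))
```

Sweeping the threshold in such a rule is what produces the operating points of an ROC curve, and the weight controls how much the contextual term can override the local one.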

ACKNOWLEDGMENT

The work described in this paper was supported by the Natural Science Foundation of China under Grant No. 61272218 and No. 61321491, the 973 Program of China under Grant No. 2010CB327903, and the Program for New Century Excellent Talents under NCET-11-0232.

REFERENCES

[1] S. Ali and M. Shah, A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis, CVPR, pp. 1-6, 2007.

[2] D. Kong, D. Gray, and H. Tao, Counting pedestrians in crowds using viewpoint invariant training, BMVC, pp. 1-6, 2005.

[3] S. Ali and M. Shah, Floor fields for tracking in high density crowd scenes, ECCV, pp. 1-14, 2008.

[4] B.T. Morris and M.M. Trivedi, A survey of vision-based trajectory learning and analysis for surveillance, IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 8, pp. 1114-1127, 2008.

[5] W. Hu, D. Xie, and T. Tan, A system for learning statistical motion patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 9, pp. 1450-1464, 2004.

[6] T. Zhao and R. Nevatia, Tracking multiple humans in crowded environment, CVPR, pp. 406-413, 2004.

[7] W.M. Hu, X.J. Xiao, Z.Y. Fu, D. Xie, T.N. Tan, and S. Maybank, A system for learning statistical motion patterns, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 28, no. 9, pp. 1450-1464, 2006.

[8] E.L. Andrade, S. Blunsden, and R.B. Fisher, Modelling crowd scenes for event detection, ICPR, pp. 175-178, 2006.

[9] A. Adam, E. Rivlin, I. Shimshoni, and D. Reinitz, Robust real-time unusual event detection using multiple fixed-location monitors, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 555-560, 2008.

[10] V. Mahadevan, W.X. Li, V. Bhalodia, and N. Vasconcelos, Anomaly detection in crowded scenes, CVPR, pp. 1975-1981, 2010.

[11] T. Xiang and S. Gong, Video behaviour profiling and abnormality detection without manual labelling, ICCV, pp. 1238-1245, 2005.

[12] B. Zhao, F.F. Li, and E.P. Xing, Online detection of unusual events in videos via dynamic sparse coding, CVPR, pp. 3313-3320, 2011.

[13] L. Kratz and K. Nishino, Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models, CVPR, pp. 1446-1453, 2009.

[14] C. Galleguillos, A. Rabinovich, and S. Belongie, Object categorization using co-occurrence, location and appearance, CVPR, pp. 1-8, 2008.

[15] X.L. Ma, T. Lu, F.M. Xu, and F. Su, Anomaly detection with spatio-temporal context using depth images, ICPR, pp. 2590-2593, 2012.

[16] Y. Shi, Y. Gao, and R. Wang, Real-time abnormal event detection in complicated scenes, ICPR, pp. 3653-3656, 2010.
