[IEEE 2008 19th International Conference on Pattern Recognition (ICPR) - Tampa, FL, USA...
Learning Motion Patterns in Crowded Scenes Using Motion Flow Field
Min Hu, Saad Ali and Mubarak Shah
Computer Vision Lab, University of Central Florida
{mhu,sali,shah}@eecs.ucf.edu
Abstract
Learning typical motion patterns or activities from videos of crowded scenes is an important visual surveillance problem. To detect typical motion patterns in crowded scenarios, we propose a new method which utilizes the instantaneous motions of a video, i.e., the motion flow field, instead of long-term motion tracks. The motion flow field is a union of independent flow vectors computed in different frames. Detecting motion patterns in this flow field can therefore be formulated as a clustering problem over its flow vectors, where each motion pattern consists of a group of flow vectors participating in the same process or motion. We first construct a directed neighborhood graph to measure the closeness of flow vectors. A hierarchical agglomerative clustering algorithm is then applied to group flow vectors into the desired motion patterns.
1. Introduction
Learning typical motion patterns of moving objects in the scene from videos is an important visual surveillance task. A significant amount of effort has been put into this area. In [8], the system first tracks moving objects, learns the motion patterns, and finally uses these motion patterns for abnormal event detection. Johnson and Hogg [3] use neural networks to model motion paths from trajectories. In [2], trajectories are iteratively merged into a path. Wang et al. [5] proposed a trajectory similarity measure to cluster trajectories and then learn the scene models from trajectory clusters. Porikli [1] represents trajectories in the HMM parameter space; the affinity matrices are then computed and eigenvalue decomposition is applied to find the clusters. Instead of using trajectories of individual objects, Vaswani et al. [6] model the motion of all the moving objects performing the same activity by analyzing the temporal deformation of the “shape” which is constructed by joining the locations of objects at any time t. These methods are based on long-term motion tracks of moving objects and are applicable to scenes where moving objects are not densely crowded and motion tracks of objects are reliable and readily available. All these methods require tracking algorithms to generate long-term locations of individual objects. Little attention has been paid to learning motion patterns in crowds, where reliable tracks are harder to obtain.
To deal with this special scenario, instead of using long-term motion tracks of moving objects we propose a new method for learning typical motion patterns using the motion flow field. The motion flow field is a set of flow vectors representing the instantaneous motions in a video. Each flow vector is a four-dimensional vector comprising the location and instantaneous velocity of a point detected in a frame. We first use an existing optical flow method to compute flow vectors in each frame and then combine them into a global motion field. Given the motion flow field, we detect the typical motion patterns by clustering flow vectors. Without loss of generality, we assume each flow vector is associated with only one motion pattern, i.e., no flow vector belongs to two different motion patterns. Thus, each motion pattern is a cluster of flow vectors. To make the proposed method applicable to general motions or activities, we make no assumption regarding the models of motion patterns. To cluster flow vectors, we first need to define what a motion pattern is. According to the Gestalt theory of human visual perception, the main factors used in grouping are proximity, similarity, closure, simplicity and common fate (elements with the same moving direction are seen as a unit). According to these laws, a motion pattern should be smooth, and neighboring flow vectors should be similar and close. Note that even flow vectors in the same motion pattern may be very different. For example, the “U-turn” pattern contains flow vectors whose velocities are opposite. We use the neighborhood graph to measure the similarity and proximity of flow vectors. A summary of the proposed method is given in Fig. 1.
2. Motion Flow Field Generation
Given an input video, for each frame we compute sparse optical flows (instantaneous velocities) on interest points ([4]) or dense optical flows for all pixels ([7]) in the image. The location $X_i = (x_i, y_i)$ and the velocity $V_i = (v_{x_i}, v_{y_i})$ of a point $i$ in the image are then combined into a vector $p_i = (X_i, V_i)$. After the flow vectors in all frames are computed, we gather them into a single global flow field. This flow field may contain thousands of flow vectors, and it is therefore computationally expensive to obtain the shortest paths over such a large amount of data. These flow vectors also generally contain a lot of redundant information and noise. We use Gaussian ART (the unsupervised version of Gaussian ARTMAP, see [9]) to reduce the number of flow vectors from thousands to hundreds. The reduced set of flow vectors still maintains the geometric structure of the flow field without affecting the detection of motion patterns. Fig. 1-(d) shows the reduction result for the motion flow field in Fig. 1-(c). The number of flow vectors was reduced from 1417 to 744.

[Figure 1, panels: (a) Input Video; (b) Frame to Frame Flow Vectors, plotted over axes x, y and t; (c) Original Motion Flow Field; (d) Reduced Motion Flow Field; (e) Motion Patterns]
Figure 1. The main steps involved in the proposed motion pattern detection method. Given an input video (a), frame-to-frame flow vectors are computed (b) and combined into a single global motion flow field (c). The original motion flow field (containing thousands of flow vectors) is reduced to the new motion flow field in (d), containing only hundreds of flow vectors. Finally, typical motion patterns are detected (e).

978-1-4244-2175-6/08/$25.00 ©2008 IEEE
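The assembly of the global flow field described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `build_flow_field` and `reduce_by_grid` and the parameters `cell` and `n_dirs` are hypothetical, the per-frame flows are assumed to be already computed by some optical flow method, and the reduction step is a plain grid-and-direction averaging used only as a stand-in. It is not Gaussian ART.

```python
import math
from collections import defaultdict

def build_flow_field(per_frame_flows):
    """Gather per-frame (location, velocity) pairs into one global flow field.

    per_frame_flows: one list per frame of ((x, y), (vx, vy)) pairs.
    Returns a flat list of 4-D flow vectors p = (x, y, vx, vy).
    """
    field = []
    for frame_flows in per_frame_flows:
        for (x, y), (vx, vy) in frame_flows:
            field.append((x, y, vx, vy))
    return field

def reduce_by_grid(field, cell=20.0, n_dirs=8):
    """Stand-in reduction (an assumption, NOT Gaussian ART).

    Flow vectors that fall in the same spatial cell and the same direction
    sector are replaced by their mean, so opposite flows are kept apart.
    """
    bins = defaultdict(list)
    for x, y, vx, vy in field:
        angle = math.atan2(vy, vx) % (2 * math.pi)
        sector = int(angle / (2 * math.pi) * n_dirs) % n_dirs
        bins[(int(x // cell), int(y // cell), sector)].append((x, y, vx, vy))
    reduced = []
    for members in bins.values():
        k = len(members)
        reduced.append(tuple(sum(m[c] for m in members) / k for c in range(4)))
    return reduced

# Two frames: two right-moving points, then one left-moving point nearby.
frames = [
    [((0.0, 0.0), (1.0, 0.0)), ((1.0, 1.0), (1.0, 0.0))],
    [((2.0, 2.0), (-1.0, 0.0))],
]
field = build_flow_field(frames)
reduced = reduce_by_grid(field)
```

In this toy input the two right-moving vectors share a cell and a direction sector and are averaged, while the left-moving vector stays separate even though it lies in the same cell.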
3. Neighborhood Graph of the Flow Field
[Figure 2, panels (a), (b), (c): the spatial distance is marked as $d_p$ in (a), $d_p(p, q)$ in (b) and $d_p(q, p)$ in (c)]
Figure 2. Distances in different cases: (a) flow vectors $p$ and $q$ are on parallel curves; (b, c) $p$ and $q$ are on the same curve, and $q$ follows $p$. (b) shows the distance $D(p, q)$, whereas (c) shows the distance $D(q, p)$.

To exploit the proximity of flow vectors and the geometric structure of high-dimensional data, we construct a neighborhood graph of flow vectors in the flow field. The general neighborhood graph is an undirected graph where edges connect neighboring points. The distance between a pair of points is then given by the “shortest path”, i.e., the geodesic distance, between them on the neighborhood graph. In the flow field, as described earlier, each point is represented by a vector $p = (X, V)$, where $X = (x, y)$ is the location and $V = (v_x, v_y)$ is the velocity. To capture the spatial structure and direction of the flow field, we extend the original undirected neighborhood graph to a directed graph. The forward distance from flow vector $p$ to $q$ is defined as
$$D(p, q) = \left(d_p(p, q)\, d_s(p, q)\right)^2 \qquad (1)$$
The reason for using the square is given below. The spatial distance $d_p(p, q)$ and the directional difference $d_s(p, q)$ are defined under two hypotheses (see Fig. 2):
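1. $p$ and $q$ are on two parallel curves. In this case,

$$d_p(p, q) = \|X_p - X_q\|, \qquad (2)$$

$$d_s(p, q) = \left(\frac{2}{1 + \epsilon + \hat{V}_p \cdot \hat{V}_q}\right)^2, \qquad (3)$$

where $\epsilon = 10^{-6}$ and $\hat{V} = V/\|V\|$.

2. $p$ and $q$ are on the same curve, and $q$ follows $p$. In this case,

$$d_p(p, q) = \|X_p + V_p - X_q\|, \qquad (4)$$

$$d_s(p, q) = \frac{2}{1 + \epsilon + \cos\theta_p} \cdot \frac{2}{1 + \epsilon + \cos\theta_q}, \qquad (5)$$

$$\cos\theta_p = \hat{V}_p \cdot \widehat{X_q - X_p}, \qquad (6)$$

$$\cos\theta_q = \hat{V}_q \cdot \widehat{X_q - X_p}, \qquad (7)$$

where the hat again denotes normalization to unit length.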
The final distance is chosen as the minimum of the distances under the above two hypotheses. After the distance matrix is computed, we find the neighbors among the vectors to reduce the computational burden in later steps. $B$ is a neighbor of $A$ if and only if $D(A, B)$ equals the length of the shortest path (obtained by Floyd's algorithm) from $A$ to $B$. Because we use the square in computing $D(A, B)$, $D(A, B)$ is not necessarily less than $D(A, C) + D(C, B)$. If the directional difference factor were not included, so that $D(A, B) = \|X_A - X_B\|^2$, the law of cosines gives $\|X_A - X_B\|^2 = \|X_A - X_C\|^2 + \|X_C - X_B\|^2 - 2\|X_A - X_C\|\,\|X_C - X_B\| \cos\angle ACB$. Hence, when the angle $\angle ACB > \pi/2$, $D(A, B) > D(A, C) + D(C, B)$, and $B$ is not a neighbor of $A$, because $C$ lies roughly in the middle. Note that if $B$ is a neighbor of $A$, $A$ is not necessarily a neighbor of $B$. If $B$ is not a neighbor of $A$, the edge from $A$ to $B$ is removed, i.e., $D(A, B)$ is set to infinity. This step removes many edges from further consideration.
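The forward distance of Eqs. (1) to (7) can be sketched as follows. This is one possible reading under stated assumptions, not a definitive implementation: the function names are hypothetical, $D$ is evaluated under each hypothesis and the smaller value is returned, all direction vectors (including $X_q - X_p$) are normalized to unit length, and a zero-length vector is mapped to the zero vector, a choice the paper does not discuss.

```python
import math

EPS = 1e-6  # epsilon from Eq. (3)

def _unit(v):
    """Return v scaled to unit length (zero vector stays zero: an assumption)."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n) if n > 0 else (0.0, 0.0)

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def forward_distance(p, q):
    """Directed distance D(p, q) for flow vectors p, q = (x, y, vx, vy)."""
    xp, vp = (p[0], p[1]), (p[2], p[3])
    xq, vq = (q[0], q[1]), (q[2], q[3])
    up, uq = _unit(vp), _unit(vq)

    # Hypothesis 1: p and q lie on parallel curves, Eqs. (2)-(3).
    dp1 = math.hypot(xp[0] - xq[0], xp[1] - xq[1])
    ds1 = (2.0 / (1.0 + EPS + _dot(up, uq))) ** 2
    d1 = (dp1 * ds1) ** 2

    # Hypothesis 2: q follows p on the same curve, Eqs. (4)-(7).
    dp2 = math.hypot(xp[0] + vp[0] - xq[0], xp[1] + vp[1] - xq[1])
    disp = _unit((xq[0] - xp[0], xq[1] - xp[1]))
    cos_p = _dot(up, disp)
    cos_q = _dot(uq, disp)
    ds2 = (2.0 / (1.0 + EPS + cos_p)) * (2.0 / (1.0 + EPS + cos_q))
    d2 = (dp2 * ds2) ** 2

    # Eq. (1) under each hypothesis; keep the smaller value.
    return min(d1, d2)

p = (0.0, 0.0, 1.0, 0.0)   # at the origin, moving right
q = (1.0, 0.0, 1.0, 0.0)   # exactly where p's velocity points
s = (0.0, 1.0, 1.0, 0.0)   # parallel to p, same direction
r = (0.0, 1.0, -1.0, 0.0)  # parallel to p, opposite direction
```

With these toy vectors, `forward_distance(p, q)` is smaller than `forward_distance(q, p)`, which illustrates why the graph is directed, and the opposite-direction neighbor `r` is much farther from `p` than the same-direction neighbor `s`.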
4. Detecting Motion Patterns
In the flow field, each motion pattern consists of a group of flow vectors participating in the same process or motion. Detecting motion patterns is therefore the problem of clustering flow vectors. We use a hierarchical agglomerative clustering algorithm, namely the single-linkage clustering algorithm, to cluster flow vectors. The complete detection algorithm is given in Algorithm 1.

[Figure 3, panels (a), (b), (c), (d)]
Figure 3. Hierarchical clustering based on the neighborhood graph: (a) one image of the input video; different clustering results: (b) 9 clusters, (c) 12 clusters, (d) 7 clusters with noise removed.
According to the law of common fate, flow vectors with similar directions should be considered as a unit. This leads to the idea of iteratively grouping the nearest neighbors into clusters using agglomerative hierarchical clustering. One important reason for using hierarchical clustering is that its computational complexity is polynomial in the number of edges in the neighborhood graph. Note that although the distance matrix computed in the previous step can identify the neighbors very efficiently and produce accurate results, it does not necessarily serve as a good measure for our clustering algorithm. The reason is that if both $B$ and $C$ are neighbors of $A$, we may prefer the one whose direction is similar to that of $A$, regardless of the spatial distance. Therefore, we provide the option to replace the finite distances with the directional difference defined in (3).

Algorithm 1: Motion Pattern Detection Algorithm.
Input: a flow field F consisting of a set of flow vectors {p1, p2, ..., pN}, and the number of clusters n.
Output: a set of motion patterns {M1, M2, ...}.
    Compute the distance matrix D according to (1).
    Compute the shortest path lengths L using Floyd's method.
    for i ← 1 to N do
        for j ← 1 to N do
            if D(i, j) > L(i, j) then
                D(i, j) ← ∞
            end
        end
    end
    Apply the hierarchical agglomerative clustering algorithm.

Algorithm 2: Hierarchical Agglomerative Clustering Algorithm.
Input: distance matrix (adjacency matrix of the neighborhood graph or geodesic distance matrix) of size N × N of a flow field, and n, the number of clusters.
Output: n clusters {G1, G2, ..., Gn}.
    for m ← N down to n + 1 do
        Identify the two nearest points p, q and combine q into p.
        for i ← 1 to N do
            D(p, i) ← min{D(p, i), D(q, i)},
            D(i, p) ← min{D(i, p), D(i, q)},
            D(q, i) ← ∞,
            D(i, q) ← ∞.
        end
    end
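The neighbor-pruning part of Algorithm 1 can be sketched in a few lines. This is a minimal sketch assuming the distance matrix is stored as a list of lists with a zero diagonal; the function names are hypothetical, and the toy matrix uses squared spatial distances of three collinear points rather than the full distance of Eq. (1).

```python
import math

INF = math.inf

def floyd_shortest_paths(D):
    """All-pairs shortest path lengths on the directed graph with weights D."""
    n = len(D)
    L = [row[:] for row in D]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if L[i][k] + L[k][j] < L[i][j]:
                    L[i][j] = L[i][k] + L[k][j]
    return L

def prune_non_neighbors(D):
    """Keep edge (i, j) only if the direct distance is itself the shortest path.

    Every other entry is set to infinity, as in the inner loop of Algorithm 1.
    """
    L = floyd_shortest_paths(D)
    n = len(D)
    return [[D[i][j] if D[i][j] <= L[i][j] else INF for j in range(n)]
            for i in range(n)]

# Collinear points A, C, B at positions 0, 1, 2 (indices 0, 1, 2 below),
# with squared spatial distances: D(A,C) = D(C,B) = 1, D(A,B) = 4.
D = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 1.0],
    [4.0, 1.0, 0.0],
]
pruned = prune_non_neighbors(D)
```

Because the path through the middle point costs 1 + 1 = 2, which is less than the direct squared distance 4, the edge between the two end points is removed in both directions while the two short edges survive.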
The clustering algorithm works as follows. Initially every vector is treated as a cluster. We use a greedy approach to merge the clusters efficiently: in each step, we choose the shortest edge and merge the corresponding vectors (say $p$ and $q$) by setting the distance $D(p, i)$ to $\min\{D(p, i), D(q, i)\}$ and $D(i, p)$ to $\min\{D(i, p), D(i, q)\}$ for each vector $i$, and by removing the edges connecting $q$ (i.e., setting $D(q, i) = D(i, q) = \infty$ for each $i$). When the number of clusters reaches a specified number $n$, we stop merging. Since noisy flow vectors may result in independent clusters, we need to remove the clusters with very few vectors; in our experiments we therefore set $n$ slightly higher than the desired number of clusters. To have better control over the final number of clusters, we could start noise removal when the current number of clusters is close to the desired one. Fig. 3 shows the results with different numbers of clusters (9 and 12) and after noise removal. From this figure, we can see that the clustering results are similar. The difference lies in some small groups (of size less than 5) generated by increasing the number of clusters. These small groups are deleted by the noise removal step. After noise removal, seven clusters are detected (Fig. 3-(d)).
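The greedy merging and noise removal described above can be sketched as follows. This is a hedged illustration rather than the authors' code: the name `cluster_flow_vectors` and the `min_size` parameter are hypothetical, self-distances are ignored by setting the diagonal to infinity, and merging stops early if no finite edge remains, a case the paper does not discuss.

```python
import math

INF = math.inf

def cluster_flow_vectors(D, n_clusters, min_size=1):
    """Greedy single-linkage merging on a directed distance matrix D.

    D is a list of lists; INF marks a removed edge. Clusters with fewer than
    min_size members are dropped at the end (noise removal).
    """
    n = len(D)
    D = [row[:] for row in D]          # work on a copy
    for i in range(n):
        D[i][i] = INF                  # never merge a point with itself
    members = {i: [i] for i in range(n)}
    while len(members) > n_clusters:
        # Find the shortest remaining edge between two cluster representatives.
        best, p, q = INF, -1, -1
        for i in members:
            for j in members:
                if i != j and D[i][j] < best:
                    best, p, q = D[i][j], i, j
        if best == INF:
            break                      # assumption: stop when no edges remain
        # Merge q into p: p inherits q's shortest edges, q's edges are removed.
        for i in range(n):
            D[p][i] = min(D[p][i], D[q][i])
            D[i][p] = min(D[i][p], D[i][q])
            D[q][i] = INF
            D[i][q] = INF
        D[p][p] = INF
        members[p].extend(members.pop(q))
    return [sorted(g) for g in members.values() if len(g) >= min_size]

# Toy graph: a chain 0 -> 1 -> 2, a chain 3 -> 4, and an isolated noisy point 5.
N = 6
D = [[INF] * N for _ in range(N)]
D[0][1] = D[1][2] = D[3][4] = 1.0
D[2][3] = 10.0
D[4][5] = 5.0
clusters = cluster_flow_vectors(D, n_clusters=3, min_size=2)
```

Asking for three clusters and then discarding groups with fewer than two members leaves the two chains, which mirrors the idea of setting $n$ slightly higher than the desired number of clusters and removing the small leftover groups.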
5. Experiments
We have tested the method on videos of several complex scenes. Fig. 4 shows a crossing on a Hong Kong street. In this video, people cross the road from two opposite sides and intersect in the middle. As can be seen from the detected motion flow field, the intersection of their flow vectors makes the clustering problem hard. By using the directed neighborhood graph, these intersecting flow vectors are correctly distinguished and the corresponding two major motion patterns are correctly detected. More results are shown in Fig. 5. The first row shows the results for a traffic scene. The vehicles in the two left lanes and those in the two right lanes move in opposite directions. Our method detects seven motion patterns. Four of them correspond to those in the ground truth. Of the remaining three, one is caused by a short side road on the right, and two extra clusters at the top are falsely classified due to perspective distortion. The second row shows the results for another traffic scene. Three motion patterns corresponding to three lanes are detected. Note that when the motions in the two right lanes merge, it is impossible and unnecessary to distinguish them. In the results shown in the third row, there are two typical motions. In one, at the top of the image, people move in one direction inside the building. In the other, people move around a center in the image, which results in some false classification. The results for a more complex crowded scene are shown in the last row, where people move in different directions.

[Figure 4, panels (a), (b), (c), (d)]
Figure 4. Motion pattern detection in a video of a street in Hong Kong. (a) Images of the input video (frame numbers are 1, 100, 180 and 248, respectively, from left to right); (b) motion flow field; (c) detected motion patterns compared to (d) manually generated ground truth.
To measure the performance of our method, we compare its results with the ground truth on the test data. The ground truth is manually generated from the detected motion flow field. Table 1 shows the comparison in terms of the number of motion patterns and the misclassification rate.
[Figure 5, columns (a), (b), (c), (d)]
Figure 5. Motion pattern detection. (a) Input videos; (b) motion flow field; (c) detected motion patterns compared to (d) manually generated ground truth. Each row demonstrates the results for one video.

video            number of motion patterns      misclassification rate
                 detected      ground truth     among flow vectors
high-way         7             5                2.4%
3-way traffic    3             3                0
Hajj005          3             2                2.4%
old street       5             7                5.3%
Hajj009          3             3                0
Hong Kong St     2             2                0.7%

Table 1. Comparison results between our method and the ground truth.

6. Conclusions
In this paper, we propose a new method for learning typical motion patterns in challenging crowded scenes using the motion flow field, which represents the instantaneous motions in a video. A neighborhood graph is constructed to measure the similarity and proximity of flow vectors. An agglomerative clustering algorithm is then applied to cluster flow vectors into motion patterns.
Acknowledgements: This research was funded by the US Government VACE program.
References
[1] F. M. Porikli, Trajectory Pattern Detection by HMM Parameter Space Features and Eigenvector Clustering, ECCV, 2004.
[2] D. Makris and T. Ellis, Path Detection in Video Sequence, IVC, Vol. 30, 2002.
[3] N. Johnson et al., Learning the Distribution of Object Trajectories for Event Recognition, IVC, 14, 1996.
[4] B. D. Lucas and T. Kanade, An Iterative Image Registration Technique with an Application to Stereo Vision, IJCAI, 1981.
[5] X. Wang et al., Learning Semantic Scene Models by Trajectory Analysis, ECCV, 2006.
[6] N. Vaswani et al., Activity Recognition Using the Dynamics of the Configuration of Interacting Objects, CVPR, 2003.
[7] R. Gurka et al., Computation of Pressure Distribution Using PIV Velocity Data, Workshop on Particle Image Velocimetry, 1999.
[8] W. E. L. Grimson et al., Using Adaptive Tracking to Classify and Monitor Activities in a Site, CVPR, 1998.
[9] J. R. Williamson, Gaussian ARTMAP: A Neural Network for Fast Incremental Learning of Noisy Multidimensional Maps, Neural Netw., 1996.