Tracking Pedestrians Using Local Spatio-Temporal Motion Patterns in Extremely Crowded Scenes
Louis Kratz and Ko Nishino
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012
Outline
- Motivation
- Introduction
- Proposed method
- Experimental results
- Conclusion
Motivation
Goal: tracking single or multiple pedestrians in crowded scenes
Solve conventional tracking problems
- Occlusion
- Pedestrians moving in different directions
- Appearance change
Introduction (1)
Observe a phenomenon
Observation: instantaneous motions within a small area tend to repeat
- Temporally
- Spatially
Introduction (2)
Spatio-temporal motion pattern
- Describes crowd motion
- Builds a spatial and temporal statistical model
- Is used to predict the movement of individuals
Spatio-temporal motion pattern
(Figure: a video cuboid with axes x, y, and t)
3D gradient vector: ∇I = [∂I/∂x, ∂I/∂y, ∂I/∂t]^T
Calculate the mean motion vector, or build a statistical model, in each cuboid
Introduction (3)
Hidden Markov Model:
- States are not directly visible
- Composed of three components
① observation probabilities
② transition probabilities
③ initial probabilities
Introduction (4)
Posterior distribution: given observed evidence X, find the probability of the parameters
Introduction (5)
Particle filter: a filter that can be used to predict the next state
- Different from the Kalman filter: robust to nonlinear systems and can handle non-Gaussian noise
- Measurement:
Proposed method
Flow chart
(a) Divide the training video into spatio-temporal cuboids, calculate motion vectors, and then build a statistical model for each motion pattern
(b) Train a collection of hidden Markov models
(c) Use the observed local motion patterns to predict the motion pattern at each location
(d) Use the predicted motion patterns to trace individuals
Step (a): statistical model for motion patterns
1. First, calculate the motion vector at each pixel using the 3D gradient vector
2. Next, build a statistical model using a 3D Gaussian distribution
3. Define the local spatio-temporal motion pattern at location n and frame t
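The three steps above can be sketched for a single cuboid as follows. This is a minimal illustration, not the authors' implementation: the function name `cuboid_motion_model` and the (t, y, x) array layout are assumptions made here, and `np.gradient` stands in for whatever derivative filter the paper uses.

```python
import numpy as np

def cuboid_motion_model(cuboid):
    """Fit a 3D Gaussian to the spatio-temporal gradients of one cuboid.

    cuboid: grayscale intensities with shape (T, H, W) (assumed layout).
    Returns the mean gradient, ordered [x, y, t], and its 3x3 covariance.
    """
    cuboid = np.asarray(cuboid, dtype=float)
    # np.gradient returns one derivative array per axis, here (t, y, x).
    d_t, d_y, d_x = np.gradient(cuboid)
    # One row per pixel: the 3D gradient vector [dI/dx, dI/dy, dI/dt].
    grads = np.stack([d_x.ravel(), d_y.ravel(), d_t.ravel()], axis=1)
    mean = grads.mean(axis=0)
    cov = np.cov(grads, rowvar=False)
    return mean, cov

# Toy input: a random 10 x 10 x 10 cuboid (the cuboid size used in the experiments).
toy_cuboid = np.random.default_rng(0).random((10, 10, 10))
pattern_mean, pattern_cov = cuboid_motion_model(toy_cuboid)
```

The pair (mean, covariance) is what the later steps treat as the local motion pattern at one location and frame.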
Step (b): train hidden Markov models
1. Using a clustering algorithm, divide the motion patterns into S clusters
2. Define states {s = 1, …, S}, where S is the number of clusters
3. For a specific hidden state s, the probability of an observed motion pattern is:
Calculate the divergence between the two distributions
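One way to compare two Gaussian motion patterns is the Kullback-Leibler divergence, which has a closed form for Gaussians. The slide does not name the measure it uses, so treating it as KL divergence is an assumption of this sketch, as is the helper name `gaussian_kl`.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL divergence KL(N(mu0, cov0) || N(mu1, cov1)) between two Gaussians."""
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    cov0, cov1 = np.asarray(cov0, float), np.asarray(cov1, float)
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    # slogdet is numerically safer than log(det(...)).
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff
                  - k + logdet1 - logdet0)

identity = np.eye(3)
kl_same = gaussian_kl(np.zeros(3), identity, np.zeros(3), identity)
kl_shift = gaussian_kl(np.zeros(3), identity, np.array([1.0, 0.0, 0.0]), identity)
```

KL divergence is not symmetric, so a symmetric variant (for example, the sum of both directions) may be preferable when it is used as a distance.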
Step (c): predict motion patterns
Take the expected value of the predictive distribution:
Solve with the forward-backward algorithm
Reference: [23] L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
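As a rough illustration of the prediction step, the sketch below runs only the forward pass of the forward-backward algorithm to get the distribution over the next hidden state, then takes the expected value of the per-state mean patterns. The function name, the argument layout, and the use of cluster means as state patterns are assumptions of this example, not details taken from the slides.

```python
import numpy as np

def predict_next_pattern(pi, A, obs_lik, state_means):
    """Predict the next motion pattern from an HMM.

    pi:          (S,)   initial state probabilities
    A:           (S, S) transitions, A[i, j] = P(next = j | current = i)
    obs_lik:     (T, S) obs_lik[t, s] = p(observation at t | state s)
    state_means: (S, D) representative motion pattern of each state
    """
    pi, A = np.asarray(pi, float), np.asarray(A, float)
    obs_lik = np.asarray(obs_lik, float)
    state_means = np.asarray(state_means, float)
    # Forward recursion with normalization at each step for stability.
    alpha = pi * obs_lik[0]
    alpha /= alpha.sum()
    for t in range(1, obs_lik.shape[0]):
        alpha = (alpha @ A) * obs_lik[t]
        alpha /= alpha.sum()
    # One more transition gives the predictive state distribution.
    next_state = alpha @ A
    # Expected value of the predicted pattern.
    return next_state, next_state @ state_means

next_state, expected_pattern = predict_next_pattern(
    pi=[0.5, 0.5],
    A=[[0.9, 0.1], [0.2, 0.8]],
    obs_lik=[[0.8, 0.2], [0.7, 0.3]],
    state_means=[[1.0, 0.0], [0.0, 1.0]],
)
```

In this toy case both observations favor the first state, so the prediction leans toward that state's pattern.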
Step (d): trace individuals
Use a particle filter to maximize the posterior distribution:
Compare to: posterior ∝ likelihood × prior
P(xf)
x_{f-1} = [x, y, w, h]^T in frame f-1
Figure: the state vector x_{f-1} defines a target window in frame f-1
Past and current measurements:
z_f is the frame at time f
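With the state vector and measurements defined, one predict-weight-resample cycle of a generic bootstrap particle filter might look like the sketch below. It is not the paper's tracker: `transition` and `likelihood` are placeholder callables supplied by the caller, and the toy versions at the bottom are invented purely to make the example run.

```python
import numpy as np

def particle_filter_step(particles, weights, transition, likelihood, rng):
    """One bootstrap particle filter cycle over states [x, y, w, h]."""
    # Predict: move every particle with the transition model.
    predicted = transition(particles, rng)
    # Weight: score each predicted state against the current measurement.
    new_weights = weights * np.array([likelihood(p) for p in predicted])
    new_weights /= new_weights.sum()
    # Point estimate: weighted mean of the predicted states.
    estimate = new_weights @ predicted
    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(len(predicted), size=len(predicted), p=new_weights)
    uniform = np.full(len(predicted), 1.0 / len(predicted))
    return predicted[idx], uniform, estimate

def toy_transition(particles, rng):
    # Hypothetical motion model: small isotropic Gaussian jitter.
    return particles + rng.normal(0.0, 0.5, size=particles.shape)

def toy_likelihood(state):
    # Hypothetical measurement model: the target is near x = 5.
    return np.exp(-0.5 * (state[0] - 5.0) ** 2)

pf_rng = np.random.default_rng(1)
init_particles = pf_rng.normal(0.0, 1.0, size=(500, 4))
init_weights = np.full(500, 1.0 / 500)
resampled, new_weights, estimate = particle_filter_step(
    init_particles, init_weights, toy_transition, toy_likelihood, pf_rng)
```

The weighted mean is used as the estimate here for simplicity; taking the highest-weight particle is another common choice.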
Priors
We use the motion pattern at the center of the tracked target to estimate the prior on the distribution of the next state x_f
Transition distribution
P(x_f | x_{f-1}) is the transition distribution. We model it with a normal distribution:
- Mean: the 2D optical flow vector from the predicted motion pattern [27]
- Covariance: the covariance matrix from the predicted motion pattern distribution
Reference:
[27] J. Wright and R. Pless, “Analysis of Persistent Motion Patterns Using the 3D Structure Tensor,” Proc. IEEE Workshop Motion and Video Computing, pp. 14-19, 2005.
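Sampling from such a transition distribution could be sketched as below. The example assumes the flow vector and its 2x2 covariance are already available from the predicted motion pattern, and that only the window position moves while width and height stay fixed; both the function name and that simplification are assumptions of this sketch.

```python
import numpy as np

def propagate_particles(particles, flow, flow_cov, rng):
    """Draw next states from a normal transition distribution.

    particles: (N, 4) rows [x, y, w, h]
    flow:      (2,)   predicted 2D optical flow (mean displacement)
    flow_cov:  (2, 2) covariance of the predicted displacement
    """
    particles = np.asarray(particles, dtype=float)
    shifts = rng.multivariate_normal(mean=flow, cov=flow_cov, size=len(particles))
    moved = particles.copy()
    moved[:, :2] += shifts  # only the window position changes
    return moved

start = np.tile([50.0, 80.0, 20.0, 40.0], (5000, 1))
moved = propagate_particles(start, flow=np.array([2.0, -1.0]),
                            flow_cov=0.25 * np.eye(2),
                            rng=np.random.default_rng(0))
```

A larger covariance spreads the particles more widely, which suits locations where the predicted motion is uncertain.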
Likelihood distribution
- T: template of the human object
- R: region of the bounding box at frame f
- Z: constant
- Variance with respect to appearance change
Define distance measure:
- t_i: template gradient vector
- r_i: region gradient vector
- M: number of pixels in the template
If the distance is large, the likelihood is small
If the distance is small, the likelihood is large
Add per-pixel weights to adapt to appearance change
Error attributed to appearance change
- Pixels in an occluded region have a large angle between t_i and r_i, so the error E_i is large
- When E_i is large, the weight becomes small
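The likelihood described above can be sketched under some stated assumptions: the per-pixel error is taken to be the angle between the template and region gradient vectors, the weight is a hypothetical exp(-E_i), and the likelihood is exp(-d / (2 sigma^2)) / Z. The slides do not give these exact forms, and the names `sigma` and `z` are chosen here for illustration.

```python
import numpy as np

def angular_errors(template_grads, region_grads, eps=1e-12):
    """Angle in radians between corresponding gradient vectors, shape (M,)."""
    t = np.asarray(template_grads, dtype=float)
    r = np.asarray(region_grads, dtype=float)
    t_unit = t / (np.linalg.norm(t, axis=1, keepdims=True) + eps)
    r_unit = r / (np.linalg.norm(r, axis=1, keepdims=True) + eps)
    cos = np.clip(np.sum(t_unit * r_unit, axis=1), -1.0, 1.0)
    return np.arccos(cos)

def appearance_likelihood(template_grads, region_grads, sigma=1.0, z=1.0):
    """Likelihood that a region matches the template (assumed form)."""
    errors = angular_errors(template_grads, region_grads)
    # Hypothetical weighting: a large error yields a small weight.
    weights = np.exp(-errors)
    distance = np.sum(weights * errors) / np.sum(weights)
    return np.exp(-distance / (2.0 * sigma ** 2)) / z

grads = np.random.default_rng(2).normal(size=(100, 3))
lik_match = appearance_likelihood(grads, grads)      # identical gradients
lik_mismatch = appearance_likelihood(grads, -grads)  # opposite gradients
```

Identical gradients give a distance near zero and a likelihood near 1/Z, while opposing gradients give the largest angle and a much smaller likelihood.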
Experimental results
Implementation:
- Intel Xeon X5355 2.66 GHz processor
- 10 frames per second
- Cuboid size 10 × 10 × 10
Datasets
From the UCF Crowd data set; 300, 350, 300, and 120 frames, respectively
(a) train station’s concourse
(b) ticket gate
(c) sidewalk
(d) intersection
Experiment 1
White indicates high error.
High error indicates areas with little texture or noise.
The intersection scene has high error due to the small amount of training data.
Experiment 2
When occlusion is severe, the variance of the likelihood increases, as at frames 56, 112, and 201
Experiment 3
Experiment 4
Errors are caused by the initial states not containing this direction
Experiment 5
Experiment 6
Conclusion
We proposed an efficient method for tracking individuals in crowded scenes.
We address the errors caused by occlusion, appearance change, and movement in different directions.