Interactive Surveillance Event Detection - NIST · 2012. 12. 7.
Interactive Surveillance Event Detection
VIVA Research Lab
uOttawa:
Chris Whiten, Robert Laganière,
Ehsan Fazl-Ersi, Feng Shi
CBSA Science & Engineering Directorate:
Dmitry Gorodnichy, Jean-Philippe Bergeron,
Ehren Choy, David Bissesser
Ecole Polytechnique Montreal:
Guillaume-Alexandre Bilodeau
Background
• First participation in the SED task
• Limited submission results
– Person-runs event detection
• Work in progress…
• uOttawa works on the automatic event detection part
• CBSA works on the interactive part
Design objectives
• Problem of high relevance to CBSA
• To improve computational performance
• Traditional framework:
– to work with spatiotemporal features
• Feature detector
• Feature descriptor
– Bag of words
– SVM classifier
• Inspiration from MoSIFT (from CMU)
• Inspiration from recent fast image matching techniques
– Fast feature detector
– Binary descriptor
Operational need
• Surveillance cameras are heavily used by CBSA (in particular, in airports)
• Two modes of operation:
– Real-time: e.g., to send a traveler to secondary examination
– Post-event: e.g., evidence extraction
• In either mode, the decision (to trigger or not trigger an alarm) needs to be made within a limited amount of time
Machine-Human approach
• Current Video Analytics algorithms produce a lot of false alarms
• Filtering this many false alarms requires efficient Visual Analytics tools (GUI) that make use of human visual recognition power for fast processing of large quantities of data
Event detection by Video Analytics
• Most Video Analytics approaches are based on space-time points
• Historically, spatiotemporal descriptors have used gradient-based features (SIFT, Histogram of Oriented Gradients, etc.)
– Slow to detect/compute/match
– Difficult for the massive scale of surveillance data
• MoSIFT is a good example of such a space-time descriptor
MoSIFT Approach
Frames → SIFT keypoints + Optical flow → MoSIFT descriptors → Bag of Words → SVM → Event
MoSIFT space-time descriptor
• Image gradient → HOG → 128-float vector
• Optical flow → HOG → 128-float vector
Another approach
Frames → ? → Bag of Words → SVM → Event
Our Approach
Frames → Keypoints + Image Difference → Binary descriptors → Bag of Words → SVM → Event
Extracting space-time descriptors
• We elect to use the recently proposed FREAK descriptor
– Represents a local keypoint with a binary string
– Efficient to detect/compute/match
• The bytes in the FREAK descriptor follow a coarse-to-fine ordering
– First 16 bytes correspond to a human’s peripheral vision
– Remaining 48 bytes encode finer details
FREAK descriptor
• Intensity comparisons → 512 bits
Extracting space-time descriptors
• At frame t, we compute the difference image between frame t and t – 5
– Implicitly encode motion in the difference image
– Avoid costly optical flow computations
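The frame-differencing step above could be sketched as follows. This is a minimal illustration, not the authors' code: frames are assumed to be grayscale images stored as nested lists of integers, the absolute (rather than signed) difference is an assumption, and the names `abs_difference` and `difference_images` are hypothetical.

```python
from collections import deque

LAG = 5  # frame offset from the slide: compare frame t with frame t - 5


def abs_difference(frame_a, frame_b):
    """Per-pixel absolute difference of two equally sized grayscale frames."""
    return [
        [abs(a - b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def difference_images(frames, lag=LAG):
    """Yield (t, difference image) for every frame t that has a frame t - lag."""
    history = deque(maxlen=lag)  # holds frames t - lag .. t - 1
    for t, frame in enumerate(frames):
        if len(history) == lag:
            # history[0] is the oldest buffered frame, i.e. frame t - lag
            yield t, abs_difference(frame, history[0])
        history.append(frame)
```

Only the last five frames are buffered, so the cost per frame is one pass over the pixels, with no optical flow estimation.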
Extracting space-time descriptors
• For event recognition, we want to learn the action, not the actor
– Avoid “finer detail” bytes
• We choose to keep only the first 8 bytes of the FREAK descriptor
– Compact
– Efficient
– Encodes action in a more generic way
• 64 bits
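As a rough sketch of the truncation described above, assuming a descriptor is available as a 64-byte `bytes` object in the coarse-to-fine order the slides describe (the function name `truncate_descriptor` is hypothetical):

```python
FULL_BYTES = 64  # full FREAK descriptor: 512 bits
KEPT_BYTES = 8   # coarse prefix kept on the slide: 64 bits


def truncate_descriptor(descriptor: bytes, kept: int = KEPT_BYTES) -> bytes:
    """Keep only the leading (coarsest) bytes of a full-length descriptor."""
    if len(descriptor) != FULL_BYTES:
        raise ValueError(f"expected {FULL_BYTES} bytes, got {len(descriptor)}")
    return descriptor[:kept]
```

The result is one eighth the size of the full descriptor, which shrinks both storage and the cost of every later distance computation.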
MoFREAK Approach
Frames → Keypoints + Image Difference → FREAK descriptors → Bag of Words → SVM → Event
Bag of Words in Hamming space
• We work with a binary descriptor
– which allows us to avoid Euclidean distance
– and instead use more efficient Hamming distance
• In addition, we use random clusters
– They perform as well as K-means
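A bag-of-words histogram in Hamming space might look like the sketch below. The slides do not say how the random clusters are chosen; here it is assumed that cluster centres are training descriptors drawn at random, and all function names are hypothetical.

```python
import random


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


def random_codebook(descriptors, k, seed=0):
    """Pick k training descriptors at random to act as cluster centres."""
    return random.Random(seed).sample(descriptors, k)


def bow_histogram(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda i: hamming(d, codebook[i]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

The distance is an XOR followed by a bit count, with no floating-point arithmetic, and building the codebook needs no iterative clustering.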
Automated Event Detection
• Each bag-of-words feature is fed into an SVM
– The SVM uses the histogram intersection kernel
• The SVM returns a floating-point response for each classified BOW feature
• The set of all classifications gives a distribution with many peaks and valleys
• Sufficiently large local maxima = event
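The two ingredients above, the histogram intersection kernel and peak picking on the SVM responses, can be sketched as follows. The `threshold` parameter stands in for "sufficiently large" and is a hypothetical value that would need tuning; the slides do not give it, and the function names are illustrative only.

```python
def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def detect_events(responses, threshold):
    """Indices where the response is a local maximum above the threshold.

    On a plateau of equal values only the first index is reported.
    """
    events = []
    for i in range(1, len(responses) - 1):
        r = responses[i]
        if r > threshold and r > responses[i - 1] and r >= responses[i + 1]:
            events.append(i)
    return events
```

Raising the threshold trades recall for fewer false alarms, which determines how much work is left for the human filtering stage.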
Manually Filtering False Positives
• The event detection system yields many false positives
– Requires human feedback to know which detected events are legitimate
• Visual analytics system:
– Events are presented in order of SVM response, allowing a user to efficiently navigate detected events and separate true positives from false positives
CBSA VAP platform
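Ordering detections for review could be as simple as the sketch below. The slides say only that events are presented "in order of SVM response"; strongest-first ordering is an assumption, and the `Detection` type and `review_order` function are hypothetical.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    start_frame: int  # frame index where the detected event begins
    response: float   # SVM response for this detection


def review_order(detections):
    """Return detections sorted by SVM response, strongest first (assumed)."""
    return sorted(detections, key=lambda d: d.response, reverse=True)
```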
VAP Browser interface
• Using this visual analytics platform, a human operator is able to process over 600 detected events in a 25-minute window (24 events per minute)
TRECVID submission
• We submitted results for the person-runs event
– Events were detected using the MoFREAK approach
– Events were filtered using the VAP browser
– 15 events were extracted
Person-runs Detections
http://www.site.uottawa.ca/~laganier/video/runs.avi
Conclusion
• Using recent advances in binary descriptors, rather than gradient-based descriptors, makes processing surveillance footage much more feasible
– Currently 3 times faster
• A machine-human approach should nevertheless prevail:
– The Video Analytics component detects alarms automatically
– The Visual Analytics interface is critical for efficient filtering of false alarms