Interactive Event Detection in Video and Audio
Rahul Sukthankar
Intel Research Pittsburgh & Carnegie Mellon University
Intel Research, Rahul Sukthankar – ICML 2006 Workshop
Contributors
• Diamond team: L. Huston, Satya, L. Mummert, C. Helfrich, L. Fix
• Forensic video retrieval: J. Campbell, P. Pillai, Diamond team
• Volumetric video analysis: Y. Ke, M. Hebert
• Sound object detection in soundtracks: D. Hoiem, Y. Ke
• Interactive search-assisted diagnosis for breast cancer: Y. Liu, R. Jin, B. Zheng, D. Jukic
Why Interactive Event Detection?
• Events of interest are often not known a priori
  – Data exploration: “find me more things like this”
• User’s requirements change based on partial results
  – Surveillance: “Alert me if you see X… hmm… actually I want Y”
• Challenges:
  – Limited training data
    • can we still learn good event detectors?
  – Efficiency
    • how best to organize/index/pre-process the data?
Outline
• Event detection in audio
  – sound object detection from a few examples
• Diamond
  – efficient search of non-indexed data
• Event detection in video
  – forensic video surveillance
  – volumetric analysis for action detection
Example: Sound Object Detection
• Applications of sound object detection
  – “Alert me if you hear a gunshot.” (monitoring)
  – “Fast forward to the next swordfight in LotR” (search and retrieval)
• Approach:
  – Learn boosted classifier from ~5-10 examples of the object
  – Scan windowed classifier over all possible locations
[Figure: the audio stream is split into clips (Clip 1 … Clip N); the clip classifier classifies each clip as object or non-object and returns the locations of the detected sound object.]
[D. Hoiem, Y. Ke, R. Sukthankar, ICASSP 2005]
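The scanning step above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the window length, the hop size, and the `loud_clip` function (a stand-in for the learned boosted classifier) are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def scan_audio(
    samples: Sequence[float],
    window: int,
    hop: int,
    is_object: Callable[[Sequence[float]], bool],
) -> List[Tuple[int, int]]:
    """Slide a fixed-length window over the stream and return the
    (start, end) sample ranges that the clip classifier accepts."""
    detections = []
    for start in range(0, len(samples) - window + 1, hop):
        clip = samples[start:start + window]
        if is_object(clip):
            detections.append((start, start + window))
    return detections


def loud_clip(clip: Sequence[float]) -> bool:
    # Hypothetical stand-in for the learned classifier: mean-energy threshold.
    return sum(x * x for x in clip) / len(clip) > 0.5


# Toy stream: silence, a short loud burst, silence.
stream = [0.0] * 8 + [1.0] * 4 + [0.0] * 8
hits = scan_audio(stream, window=4, hop=2, is_object=loud_clip)
```

With this toy stream, only the window that fully covers the burst passes the threshold. A real detector would replace `loud_clip` with the trained classifier and would usually merge overlapping detections.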
Sound Object Detection: Clip Classifier
• Feature extraction
• Weak classifier – small decision trees on features
• Learn classifier cascade using Adaboost
[Figure: 138 features feed small decision trees made of decision nodes and leaf nodes.]
[D. Hoiem, Y. Ke, R. Sukthankar, ICASSP 2005]
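To make the boosting step concrete, here is a minimal AdaBoost sketch. It rests on simplifying assumptions: the weak learners are depth-1 trees (stumps) rather than the small decision trees named on the slide, the data is a one-feature toy set rather than the 138 audio features, and no cascade is built. All function names are illustrative.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: List[float]) -> int:
    j, thr, pol = stump
    return pol if x[j] > thr else -pol


def best_stump(X, y, w) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                stump = (j, thr, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds: int):
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x) -> int:
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score > 0 else -1


# Toy problem that no single stump can solve: positives on both ends.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
```

Three rounds are enough for the weighted vote to label every toy example correctly, even though each individual stump misclassifies two of them.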
Sound Object Detection: Results
Rows are ordered from best performance (top) to worst performance (bottom).

             stage 1         stage 2         stage 3
             pos     neg     pos     neg     pos     neg
meow         0.0%    1.4%    0.0%    1.2%    2.2%    0.8%
phone        0.0%    0.4%    4.3%    0.1%    5.9%    0.0%
car horn     0.0%    3.9%    0.6%    2.2%    3.6%    1.3%
door bell    1.4%    2.1%    2.1%    0.4%    6.3%    0.1%
swords       6.1%    1.3%    6.7%    0.1%    6.7%    0.0%
scream       0.3%    5.5%    2.7%    1.4%    5.3%    1.1%
dog bark     0.7%    1.0%    6.0%    0.3%    7.7%    0.2%
laser gun    0.0%    6.8%    4.4%    5.1%    6.7%    0.9%
explosion    4.1%    5.2%    7.5%    1.5%    12.0%   0.5%
light saber  4.8%    6.8%    9.7%    1.0%    13.9%   0.2%
gunshot      8.1%    6.1%    12.5%   2.3%    14.5%   1.1%
close door   7.9%    7.8%    14.5%   4.8%    17.6%   2.3%
male laugh   4.3%    14.7%   9.5%    9.7%    13.3%   7.0%
average      2.9%    4.4%    6.0%    2.2%    8.5%    1.1%
Framework for Interactive Event Detection
• Interactive event detection =?= non-indexed search
• Search and indexing:
  – If queries can be predicted in advance, indexing is possible (e.g., Google for text data)
  – Alternative is brute-force search through non-indexed data
• How to perform efficient non-indexed search?
  – May need to execute arbitrary code (learned event detector)
Brute-Force Search
• Event detection: vast majority of the data is useless
• BFS scales poorly with storage volume
[Figure: the user sends a query to the search app; the search app reads from storage, discards data, and returns results to the user.]
Diamond: Early Discard
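• Reject as close to storage as possible
• Reduce volume of data transferred
• Scales much better!
[Figure: the user sends a query to the search app, which sends query’ to storage; early discard happens at storage, late discard happens at the search app, and results return to the user.]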
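The difference between the two designs can be illustrated by counting how many objects leave storage. This is a toy sketch under stated assumptions, not Diamond code: the object layout, the `cheap` predicate (standing in for query’), and the `full` predicate (standing in for the complete query) are hypothetical, and `cheap` is assumed never to reject an object that `full` would accept.

```python
from typing import Callable, Dict, List, Tuple

Obj = Dict[str, int]
Pred = Callable[[Obj], bool]


def brute_force(storage: List[Obj], full: Pred) -> Tuple[List[Obj], int]:
    """Ship every object to the search app, then discard there."""
    transferred = list(storage)               # everything leaves storage
    results = [o for o in transferred if full(o)]
    return results, len(transferred)


def early_discard(storage: List[Obj], cheap: Pred, full: Pred) -> Tuple[List[Obj], int]:
    """Run a cheap filter next to storage; finish filtering at the app."""
    survivors = [o for o in storage if cheap(o)]    # early discard, at storage
    results = [o for o in survivors if full(o)]     # late discard, at the app
    return results, len(survivors)


objects = [{"id": i, "score": i % 10} for i in range(1000)]
cheap = lambda o: o["score"] >= 5    # hypothetical stand-in for query'
full = lambda o: o["score"] == 7     # hypothetical stand-in for the full query

bf_results, bf_moved = brute_force(objects, full)
ed_results, ed_moved = early_discard(objects, cheap, full)
```

Both paths return the same results, but the early-discard path moves half as many objects in this toy setup. The saving depends entirely on how selective the storage-side filter is.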
Diamond Architecture
[Figure: a search application on a Linux host sits on top of the Searchlet API and the host runtime. Several storage nodes each contain a storage runtime, a searchlet, the Filter API, and an "Assoc DMA" component. Legend: Diamond code (open); app code (proprietary or open); Diamond API (open); storage access protocol (open).]
Diamond is a collaborative project between Intel Research & CMU
Anatomy of a Diamond Searchlet
• Sequence of partially-ordered “filters”
  – each filter can pass or drop an object
  – filters share state through attributes
• Diamond determines an optimal filter order
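The searchlet structure described above can be sketched as follows. This is an illustration only and does not use the real Diamond Filter API: the filter names, the attribute keys, and `run_searchlet` are hypothetical, and the sketch runs the filters in any order that respects the dependencies rather than modeling how Diamond chooses an optimal order.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List

# A filter reads and writes the shared attribute dict and returns
# True to pass the object or False to drop it.
Filter = Callable[[dict], bool]


def run_searchlet(filters: Dict[str, Filter],
                  depends_on: Dict[str, List[str]],
                  attrs: dict) -> bool:
    """Run filters in a dependency-respecting order; stop at the first drop."""
    graph = {name: depends_on.get(name, []) for name in filters}
    for name in TopologicalSorter(graph).static_order():
        if not filters[name](attrs):
            return False          # object dropped, later filters never run
    return True


def decode(attrs: dict) -> bool:
    attrs["pixels"] = list(attrs["raw"])      # bytes -> list of ints
    return True


def mean_level(attrs: dict) -> bool:
    attrs["mean"] = sum(attrs["pixels"]) / len(attrs["pixels"])
    return True


def is_bright(attrs: dict) -> bool:
    return attrs["mean"] > 100                # uses state left by mean_level


filters = {"is_bright": is_bright, "mean_level": mean_level, "decode": decode}
depends_on = {"mean_level": ["decode"], "is_bright": ["mean_level"]}

bright_passes = run_searchlet(filters, depends_on, {"raw": bytes([200, 220, 180])})
dark_passes = run_searchlet(filters, depends_on, {"raw": bytes([10, 20, 30])})
```

Because filters communicate only through the attribute dict, the runtime is free to reorder independent filters, for example to run cheap, highly selective ones first.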
Example Application: Forensic Video Surveillance
• Timely reconstruction of a crime scene
  – large quantities of video surveillance data
  – current practice: gather & manually scan video tapes
  – obvious optimization: transfer data to central site
• Better solution: send your detector to the data
[Figure: a host application connected to several cameras (cam).]
[J. Campbell et al., VSSN 2004]
Video Action Detection: Goal
Idea: Treat Video as a Volume
[Figure: video drawn as a volume with axes X, Y, and T.]
Related work: Recognition using SVMs on Space-Time Interest Points
Space-time interest points
Figures courtesy: [Schuldt et al., ICPR 2004]
Problem with Space-Time Interest Points: Too Sparse
Two examples of smooth motions where no stable space-time interest points are detected.
Volumetric Features on Optical Flow
Our Features: 3D Extension of Viola-Jones
[Figure: volumetric features and the integral volume, drawn on X, Y, T axes; the integral volume is shown at a point (x, y, t), and the corners of a box are labeled a to h.]
Volumetric features can be efficiently computed using integral volumes, with only 8 memory accesses per feature. The sum of the volume is e – a – f – g + b + c + h – d.
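The 8-lookup box sum can be sketched in plain Python. This is a generic integral-volume illustration, not the authors' code: the `[t][y][x]` indexing, the zero border, and the half-open box convention are assumptions, and since the assignment of the letters a to h to corners is not recoverable from the transcript, the corners are written out by index instead.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as [t][y][x]


def integral_volume(v: Volume) -> Volume:
    """iv[t][y][x] holds the sum of v over all t' < t, y' < y, x' < x.
    The extra zero border keeps lookups at index 0 valid."""
    T, Y, X = len(v), len(v[0]), len(v[0][0])
    iv = [[[0.0] * (X + 1) for _ in range(Y + 1)] for _ in range(T + 1)]
    for t in range(T):
        for y in range(Y):
            for x in range(X):
                iv[t + 1][y + 1][x + 1] = (
                    v[t][y][x]
                    + iv[t][y + 1][x + 1] + iv[t + 1][y][x + 1] + iv[t + 1][y + 1][x]
                    - iv[t][y][x + 1] - iv[t][y + 1][x] - iv[t + 1][y][x]
                    + iv[t][y][x]
                )
    return iv


def box_sum(iv: Volume, t0: int, y0: int, x0: int,
            t1: int, y1: int, x1: int) -> float:
    """Sum over the half-open box [t0,t1) x [y0,y1) x [x0,x1): 8 lookups,
    four added and four subtracted (3D inclusion-exclusion)."""
    return (
        iv[t1][y1][x1]
        - iv[t0][y1][x1] - iv[t1][y0][x1] - iv[t1][y1][x0]
        + iv[t0][y0][x1] + iv[t0][y1][x0] + iv[t1][y0][x0]
        - iv[t0][y0][x0]
    )


# Toy 3 x 4 x 5 volume with distinct values.
v = [[[float(t * 100 + y * 10 + x) for x in range(5)]
      for y in range(4)] for t in range(3)]
iv = integral_volume(v)
fast = box_sum(iv, 1, 1, 2, 3, 3, 5)
slow = sum(v[t][y][x] for t in range(1, 3) for y in range(1, 3) for x in range(2, 5))
```

Here `fast` and `slow` agree, but `fast` costs 8 lookups regardless of box size, which is what makes dense evaluation of many box features affordable.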
Classifier cascade learned using Direct Feature Selection, Wu et al., NIPS, 2002
An example of the features learned by the classifier to recognize the hand-wave action in a detection volume
Millions of potential features for selection, so Adaboost is too slow.
Detection
• Use a sliding volume over video sequence
• Model true event as a cluster of detections with Gaussian distribution.
Generic Volumetric Features
• Processing non-indexed video is slow – lots of data
• Are there application-independent representations for video?
• Goal: pre-process video once, support multiple video event apps.
[Y. Ke, unpublished 2006]
Related work: Space-Time Behavior Based Correlation
Figures courtesy: [Shechtman & Irani, CVPR 2005]
Interactive Search-Assisted Diagnosis
[Figure: ISAD results. Query: suspicious mass. Rank 1: benign biopsy. Rank 2: benign biopsy. Rank 3: malignant biopsy. Annotation on the figure: "CLOSE?"]
Collaborators: B. Zheng, D. Jukic, L. Yang, R. Jin
Query-adaptive Local Distance Learning
• Previously:
  – Various Lp norms: Euclidean distance is typically not the best
  – Global metric learning:
    • Learn metric that best satisfies user-given pairwise data constraints
    • Fares poorly with multimodal data
  – Local metric learning:
    • Learn metric that does the above, but weights nearby constraints more heavily
    • Chicken & egg problem
• What’s new:
  – Learn a metric for the given query based on neighborhood
Summary
• Many real applications require interactive event detection
• Good for ML algorithms that:
  – operate with limited training data
  – train quickly/incrementally
  – exploit unlabeled data
• Diamond – infrastructure for efficient non-indexed search
  http://diamond.cs.cmu.edu/
• Interactive event detection in video is still painful
  – Good general-purpose representation for event detection?