Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research...


Page 1: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.

Intel Research

Interactive Event Detection in Video and Audio

Rahul Sukthankar

Intel Research Pittsburgh & Carnegie Mellon University


Intel Research, Rahul Sukthankar, ICML 2006 Workshop

Contributors

• Diamond team: L. Huston, Satya, L. Mummert, C. Helfrich, L. Fix

• Forensic video retrieval: J. Campbell, P. Pillai, Diamond team

• Volumetric video analysis: Y. Ke, M. Hebert

• Sound object detection in soundtracks: D. Hoiem, Y. Ke

• Interactive search-assisted diagnosis for breast cancer: Y. Liu, R. Jin, B. Zheng, D. Jukic


Why Interactive Event Detection?

• Events of interest are often not known a priori
  – Data exploration: “find me more things like this”

• User’s requirements change based on partial results
  – Surveillance: “Alert me if you see X… hmm… actually I want Y”

• Challenges:
  – Limited training data
    • Can we still learn good event detectors?
  – Efficiency
    • How best to organize, index, and pre-process the data?


Outline

• Event detection in audio
  – sound object detection from a few examples

• Diamond
  – efficient search of non-indexed data

• Event detection in video
  – forensic video surveillance
  – volumetric analysis for action detection


Example: Sound Object Detection

• Applications of sound object detection
  – “Alert me if you hear a gunshot.” (monitoring)
  – “Fast forward to the next swordfight in LotR” (search and retrieval)

• Approach:
  – Learn boosted classifier from ~5-10 examples of the object
  – Scan windowed classifier over all possible locations

[Figure: the audio stream is cut into clips 1 to N. A clip classifier classifies each clip as object or non-object, and the locations of the detected sound object are returned.]

[D. Hoiem, Y. Ke, R. Sukthankar, ICASSP 2005]
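The scanning step above can be sketched in a few lines. This is only an illustration, not the authors' implementation: `detect_sound_object`, `clip_len`, `hop`, and the energy-threshold `energy_classifier` are hypothetical names, and the threshold rule merely stands in for the learned boosted classifier.

```python
import numpy as np

def detect_sound_object(signal, clip_len, hop, classify_clip):
    """Slide a fixed-length window over `signal` and return the start index
    of every clip that `classify_clip` labels as the object."""
    hits = []
    for start in range(0, len(signal) - clip_len + 1, hop):
        clip = signal[start:start + clip_len]
        if classify_clip(clip):
            hits.append(start)
    return hits

def energy_classifier(clip, threshold=0.5):
    # Hypothetical stand-in for the learned clip classifier:
    # flags a clip whose mean energy exceeds a fixed threshold.
    return float(np.mean(clip ** 2)) > threshold

# Toy stream: silence with one loud burst between samples 400 and 500.
signal = np.zeros(1000)
signal[400:500] = 1.0
hits = detect_sound_object(signal, clip_len=100, hop=50,
                           classify_clip=energy_classifier)
```

A smaller `hop` scans more candidate locations at a higher cost; the slide's "all possible locations" corresponds to the densest setting.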


Sound Object Detection: Clip Classifier

• Feature extraction

• Weak classifier: small decision trees on features

• Learn classifier cascade using AdaBoost …

[Figure: a small decision tree over the 138 features, showing decision nodes and leaf nodes.]

[D. Hoiem, Y. Ke, R. Sukthankar, ICASSP 2005]
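As a rough illustration of boosting weak classifiers, here is a minimal discrete AdaBoost sketch. It makes simplifying assumptions that are not from the slides: the weak learners are depth-1 trees (stumps) rather than small decision trees, there is a single boosted stage rather than a cascade, and the one-feature toy data replaces the 138 audio features. All function names are hypothetical.

```python
import numpy as np

def stump_predict(X, j, thr, pol):
    # Predict +1 where pol * (x_j - thr) >= 0, else -1.
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Return (error, feature, threshold, polarity) of the stump with the
    lowest weighted error. A threshold of -inf yields constant stumps."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        thresholds = np.concatenate(([-np.inf], np.unique(X[:, j])))
        for thr in thresholds:
            for pol in (1, -1):
                err = w[stump_predict(X, j, thr, pol) != y].sum()
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost(X, y, rounds):
    w = np.full(len(y), 1.0 / len(y))       # uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this stump
        w = w * np.exp(-alpha * y * stump_predict(X, j, thr, pol))
        w /= w.sum()                            # emphasize the mistakes
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(X, j, thr, pol)
                for a, j, thr, pol in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy problem: positives lie in an interval, so no single stump suffices.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, 1, 1, -1, -1])
model = adaboost(X, y, rounds=3)
```

On this toy set, three boosted stumps together carve out the interval that no single threshold can represent.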


Sound Object Detection: Results

Rows run from best performance (top) to worst performance (bottom).

                stage 1           stage 2           stage 3
                pos      neg      pos      neg      pos      neg
meow            0.0%     1.4%     0.0%     1.2%     2.2%     0.8%
phone           0.0%     0.4%     4.3%     0.1%     5.9%     0.0%
car horn        0.0%     3.9%     0.6%     2.2%     3.6%     1.3%
door bell       1.4%     2.1%     2.1%     0.4%     6.3%     0.1%
swords          6.1%     1.3%     6.7%     0.1%     6.7%     0.0%
scream          0.3%     5.5%     2.7%     1.4%     5.3%     1.1%
dog bark        0.7%     1.0%     6.0%     0.3%     7.7%     0.2%
laser gun       0.0%     6.8%     4.4%     5.1%     6.7%     0.9%
explosion       4.1%     5.2%     7.5%     1.5%    12.0%     0.5%
light saber     4.8%     6.8%     9.7%     1.0%    13.9%     0.2%
gunshot         8.1%     6.1%    12.5%     2.3%    14.5%     1.1%
close door      7.9%     7.8%    14.5%     4.8%    17.6%     2.3%
male laugh      4.3%    14.7%     9.5%     9.7%    13.3%     7.0%
average         2.9%     4.4%     6.0%     2.2%     8.5%     1.1%


Framework for Interactive Event Detection

• Interactive event detection =?= non-indexed search

• Search and indexing:
  – If queries can be predicted in advance, indexing is possible (e.g., Google for text data)
  – Alternative is brute-force search through non-indexed data

• How to perform efficient non-indexed search?
  – May need to execute arbitrary code (learned event detector)


Brute-Force Search

• Event detection: vast majority of the data is useless

• Brute-force search (BFS) scales poorly with storage volume

[Figure: brute-force search. Labels: user, query, search app, storage, discard, results.]


Diamond: Early Discard

• Reject as close to storage as possible

• Reduce volume of data transferred

• Scales much better!

[Figure: search with early discard. Labels: user, query, search app, query’, storage, early discard, late discard, results.]


Diamond Architecture

[Figure: Diamond architecture. Host-side labels: search application, searchlet API, host runtime, Linux, Assoc DMA. Each of three storage nodes is labeled: storage runtime, searchlet, filter API, Assoc DMA. Legend: Diamond code (open), app code (proprietary or open), Diamond API (open), storage access protocol (open).]

Diamond is a collaborative project between Intel Research & CMU.


Anatomy of a Diamond Searchlet

• Sequence of partially-ordered “filters”
  – each filter can pass or drop an object
  – filters share state through attributes

• Diamond determines an optimal filter order
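The searchlet idea can be illustrated with a small sketch. It is not the Diamond API: the `Filter` class, its `cost` and `pass_rate` fields, and the greedy ordering rule (cheapest cost per dropped object first, subject to dependencies) are all assumptions made for illustration, and the slides do not say how Diamond actually chooses its order.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class Filter:
    name: str
    run: Callable[[dict], bool]     # True = pass, False = drop; may add attributes
    cost: float                     # assumed average cost per object
    pass_rate: float                # assumed fraction of objects that pass
    depends_on: Tuple[str, ...] = ()

def order_filters(filters):
    """Greedy order that respects the partial order: among filters whose
    dependencies are already placed, pick the smallest cost / drop rate."""
    remaining = {f.name: f for f in filters}
    placed, order = set(), []
    while remaining:
        ready = [f for f in remaining.values() if set(f.depends_on) <= placed]
        if not ready:
            raise ValueError("cyclic filter dependencies")
        best = min(ready, key=lambda f: f.cost / max(1e-9, 1.0 - f.pass_rate))
        order.append(best)
        placed.add(best.name)
        del remaining[best.name]
    return order

def run_searchlet(filters, objects):
    ordered = order_filters(filters)
    results = []
    for obj in objects:
        attrs = dict(obj)                       # shared attribute state
        if all(f.run(attrs) for f in ordered):  # all() stops at the first drop
            results.append(attrs)
    return results

def decode(attrs):
    attrs["mean"] = sum(attrs["pixels"]) / len(attrs["pixels"])
    return True

filters = [
    Filter("decode", decode, cost=1.0, pass_rate=1.0),
    Filter("bright", lambda a: a["mean"] > 0.5, cost=1.0, pass_rate=0.5,
           depends_on=("decode",)),
    Filter("size", lambda a: a["size"] >= 100, cost=0.1, pass_rate=0.2),
]
objects = [
    {"id": 1, "size": 200, "pixels": [0.9, 0.8]},
    {"id": 2, "size": 50, "pixels": [0.9, 0.9]},
    {"id": 3, "size": 300, "pixels": [0.1, 0.2]},
]
results = run_searchlet(filters, objects)
```

Here the cheap, selective `size` filter runs first, so most objects are dropped before the more expensive `decode` step ever touches them.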


Example Application: Forensic Video Surveillance

• Timely reconstruction of a crime scene
  – large quantities of video surveillance data
  – current practice: gather & manually scan video tapes
  – obvious optimization: transfer data to central site

• Better solution: send your detector to the data

[Figure: a host running the application, connected to several cameras (cam).]

[J. Campbell et al., VSSN 2004]


Video Action Detection: Goal


Idea: Treat Video as a Volume

[Figure: a video shown as a volume with axes X, Y, and T.]


Related work: Recognition using SVMs on Space-Time Interest Points

Space-time interest points

Figures courtesy: [Schuldt et al., ICPR 2004]


Problem with Space-Time Interest Points: Too Sparse

Two examples of smooth motions where no stable space-time interest points are detected.


Volumetric Features on Optical Flow


Our Features: 3D Extension of Viola-Jones

[Figure: volumetric features and the integral volume, drawn in volumes with axes X, Y, and T. Labels: the point (x, y, t) and the corners a through h.]

Volumetric features can be efficiently computed using integral volumes, with only 8 memory accesses per feature. The sum of the volume is e - a - f - g + b + c + h - d.
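The 8-lookup box sum can be sketched with NumPy. This is a generic integral-volume illustration, not the authors' code: the function names and the (T, Y, X) axis order are assumptions, and because the corner labels a through h cannot be mapped to coordinates from the transcript, the sketch uses explicit indices instead.

```python
import numpy as np

def integral_volume(video):
    """`video` has shape (T, Y, X). Returns a zero-padded volume `iv` of shape
    (T+1, Y+1, X+1) where iv[t, y, x] equals video[:t, :y, :x].sum()."""
    iv = np.zeros(tuple(s + 1 for s in video.shape), dtype=np.float64)
    iv[1:, 1:, 1:] = video.cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)
    return iv

def box_sum(iv, t0, t1, y0, y1, x0, x1):
    """Sum of video[t0:t1, y0:y1, x0:x1] from exactly 8 lookups
    (inclusion-exclusion over the box corners)."""
    return (iv[t1, y1, x1]
            - iv[t0, y1, x1] - iv[t1, y0, x1] - iv[t1, y1, x0]
            + iv[t0, y0, x1] + iv[t0, y1, x0] + iv[t1, y0, x0]
            - iv[t0, y0, x0])

# Check the 8-lookup sum against a direct sum on random data.
rng = np.random.default_rng(0)
video = rng.integers(0, 10, size=(6, 5, 4)).astype(np.float64)
iv = integral_volume(video)
fast = box_sum(iv, 1, 4, 0, 3, 2, 4)
slow = video[1:4, 0:3, 2:4].sum()
```

The integral volume is built once per video, after which every box-shaped feature costs the same 8 reads regardless of its size.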


Classifier cascade learned using Direct Feature Selection [Wu et al., NIPS, 2002]

[Figure: an example of the features learned by the classifier to recognize the hand-wave action in a detection volume, with axes X, Y, and T.]

There are millions of potential features to select from, so AdaBoost is too slow.


Detection

• Use a sliding volume over video sequence

• Model true event as a cluster of detections with Gaussian distribution


Generic Volumetric Features

• Processing non-indexed video is slow: lots of data

• Are there application-independent representations for video?

• Goal: pre-process video once, support multiple video event apps.

[Y. Ke, unpublished 2006]


Related work: Space-Time Behavior Based Correlation

Figures courtesy: [Shechtman & Irani, CVPR 2005]


Interactive Search-Assisted Diagnosis

[Figure: ISAD results. Query: suspicious mass. Rank 1: benign biopsy. Rank 2: benign biopsy. Rank 3: malignant biopsy. Annotation: CLOSE?]

Collaborators: B. Zheng, D. Jukic, L. Yang, R. Jin


Query-adaptive Local Distance Learning

• Previously:
  – Various Lp norms: Euclidean distance is typically not the best
  – Global metric learning:
    • Learn metric that best satisfies user-given pairwise data constraints
    • Fares poorly with multimodal data
  – Local metric learning:
    • Learn metric that does above, but weighs nearby constraints higher
    • Chicken & egg problem

• What’s new:
  – Learn a metric for the given query based on neighborhood
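To make the point about Lp norms concrete, the sketch below shows that the nearest neighbor of a query can change with the choice of p. It illustrates only the fixed-norm baselines listed above, not the query-adaptive method; the function names and the toy points are made up for illustration.

```python
import numpy as np

def lp_distance(u, v, p):
    """Minkowski (Lp) distance; p = inf gives the maximum-coordinate distance."""
    diff = np.abs(np.asarray(u, dtype=float) - np.asarray(v, dtype=float))
    if np.isinf(p):
        return float(diff.max())
    return float((diff ** p).sum() ** (1.0 / p))

def nearest(query, candidates, p):
    """Index of the candidate closest to `query` under the Lp norm."""
    return int(np.argmin([lp_distance(query, c, p) for c in candidates]))

query = [0.0, 0.0]
candidates = [[3.0, 3.0], [0.0, 5.0]]

nn_l1 = nearest(query, candidates, p=1)        # distances 6 and 5
nn_l2 = nearest(query, candidates, p=2)        # distances about 4.24 and 5
nn_linf = nearest(query, candidates, p=np.inf) # distances 3 and 5
```

Under L1 the second candidate is closest, while under L2 and L-infinity the first one is, which is why a single fixed norm may rank retrieved cases poorly.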


Summary

• Many real applications require interactive event detection

• Good for ML algorithms that:
  – operate with limited training data
  – train quickly/incrementally
  – exploit unlabeled data

• Diamond: infrastructure for efficient non-indexed search
  http://diamond.cs.cmu.edu/

• Interactive event detection in video is still painful
  – Good general-purpose representation for event detection?