Sound Detection

Sound Detection Derek Hoiem Rahul Sukthankar (mentor) August 24, 2004



Page 1: Sound Detection

Sound Detection

Derek Hoiem

Rahul Sukthankar (mentor)

August 24, 2004

Page 2: Sound Detection

Objective

Learn a model of a sound object from a few (10-20) examples and distinguish it from all other sounds

Examples of sound classes: gunshots, screams, laughter, car horns, meows, dog barks, etc.

Page 3: Sound Detection

Applications

“Tell me if you hear a gunshot.” (monitoring)

“Get me video clips containing dogs barking.” (search and retrieval)

“What’s going on?” (scene understanding)

Page 4: Sound Detection

Why it's difficult

Sound classes have large variations

Sounds are often ambiguous without context

Overlaid “noise” obscures sound

Page 5: Sound Detection

Sound or not?

Car horn

Laser gun

Dog bark

Which of these sounds are not from their named classes?

Page 6: Sound Detection

Previous work

Sound Classification (Wold 1996, Casey 2001, etc.)

Categorize short sound clips

Reasonable accuracy (5-20% error)

Sound Detection (Defaux 2000, Piamsa-nga 1999)

Localize and recognize sound objects in long clips

Poor performance or assumption of unrealistic conditions (e.g., very quiet background)

Page 7: Sound Detection

Detection via Windowed Search

[Diagram: a long audio track is split into Clip 1 through Clip N, and each clip is passed to a clip classifier]

Break the audio track into short, overlapping clips

Independently classify each clip as object or non-object

Return the locations of detected sound objects
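The windowed search above can be sketched as follows. This is a minimal illustration, not the authors' code. The clip length, hop size, and classifier interface are hypothetical. Merging overlapping positive clips into one detection is also an assumption, since the slide only says that locations are returned.

```python
from typing import Callable, List, Sequence, Tuple


def sliding_clips(n_samples: int, clip_len: int, hop: int) -> List[Tuple[int, int]]:
    """(start, end) sample indices of overlapping clips; hop < clip_len gives overlap."""
    if clip_len <= 0 or hop <= 0:
        raise ValueError("clip_len and hop must be positive")
    return [(s, s + clip_len) for s in range(0, n_samples - clip_len + 1, hop)]


def windowed_detect(track: Sequence[float],
                    classify: Callable[[Sequence[float]], bool],
                    clip_len: int, hop: int) -> List[Tuple[int, int]]:
    """Classify each clip independently and return (start, end) of detections.

    Assumption: consecutive positive clips that overlap are merged into one detection.
    """
    detections: List[Tuple[int, int]] = []
    for start, end in sliding_clips(len(track), clip_len, hop):
        if not classify(track[start:end]):
            continue
        if detections and start <= detections[-1][1]:
            # Overlaps the previous detection: extend it instead of reporting a duplicate.
            detections[-1] = (detections[-1][0], end)
        else:
            detections.append((start, end))
    return detections
```

For example, with a silent track that contains a short loud burst and a toy "loudness" classifier, `windowed_detect(track, lambda c: max(abs(x) for x in c) > 0.5, 40, 20)` reports a single merged region around the burst.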

Page 8: Sound Detection

Representation

[Figure: example time-frequency representations of meows and phone rings]

Raw representation: time-frequency analysis using a windowed Fourier transform

Extract the percentage of power in each frequency band over time, and the total power over time

Compute the features used for classification
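A rough sketch of the raw representation follows. It applies a windowed Fourier transform frame by frame, then computes the percentage of power in each band and the total power per frame. The Hann window, the equal-width linear bands, and the frame and hop sizes are all assumptions; the slides do not specify them. A naive DFT keeps the example dependency-free, although `numpy.fft.rfft` would be the practical choice.

```python
import cmath
import math
from typing import List, Tuple


def frame_power_spectrum(frame: List[float]) -> List[float]:
    """Power |X[k]|^2 for k = 0..N/2 of a Hann-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        xk = sum(w * cmath.exp(-2j * math.pi * k * i / n) for i, w in enumerate(windowed))
        spectrum.append(abs(xk) ** 2)
    return spectrum


def band_power_track(signal: List[float], frame_len: int, hop: int,
                     n_bands: int) -> List[Tuple[List[float], float]]:
    """Per frame: (percentage of power in each of n_bands equal-width bands, total power)."""
    out = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = frame_power_spectrum(signal[start:start + frame_len])
        total = sum(spec)
        # Hypothetical band layout: split the spectrum bins into equal-width groups.
        edges = [round(b * len(spec) / n_bands) for b in range(n_bands + 1)]
        bands = [sum(spec[edges[b]:edges[b + 1]]) for b in range(n_bands)]
        pct = [100.0 * p / total if total > 0 else 0.0 for p in bands]
        out.append((pct, total))
    return out
```

For a pure tone, nearly all of the power lands in the band that contains the tone's frequency. A little leaks into the neighbouring bins because of the window.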

Page 9: Sound Detection

Classification Features

Diverse feature set: different sound classes are distinctive in different ways

Means and standard deviations of power at different frequencies

Bandwidth, peaks, loudness, etc.

138 features in all
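The sketch below computes a handful of clip-level features of the kind listed above. It takes the means and standard deviations of each band's power percentage, plus a loudness value and a power range in decibels. These definitions are illustrative guesses, not the 138 features used in the work. The input format (per-frame band percentages and total power) is also an assumption.

```python
import math
import statistics
from typing import Dict, Sequence


def clip_features(band_pct: Sequence[Sequence[float]],
                  total_power: Sequence[float]) -> Dict[str, float]:
    """Illustrative clip-level features from per-frame band percentages and total power."""
    features: Dict[str, float] = {}
    for b in range(len(band_pct[0])):
        series = [frame[b] for frame in band_pct]
        features[f"band{b}_mean"] = statistics.fmean(series)
        features[f"band{b}_std"] = statistics.pstdev(series)
    # Loudness proxy: mean total power in dB (tiny floor avoids log10(0)).
    db = [10 * math.log10(p + 1e-12) for p in total_power]
    features["loudness_db"] = statistics.fmean(db)
    # Dynamic range of the power over the clip, in dB.
    features["power_range_db"] = max(db) - min(db)
    return features
```

Stacking such statistics for many bands and time segments quickly yields a feature vector of the size mentioned on the slide.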

Page 10: Sound Detection

Classification by Decision Trees

Try to find simple rules that discriminate object from non-object

Each decision is based on a threshold on a feature value

At each leaf node, assign a confidence based on the likelihood of the data for the object and non-object classes

[Figure: example tree with decision nodes and leaf nodes]
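A small sketch of such a tree follows. Decision nodes threshold a single feature, and each leaf stores a confidence. The leaf confidence used here is half the log-ratio of the (weighted) object and non-object examples that reach the leaf. This is one common choice, used in confidence-rated ("real") AdaBoost, but the slides do not give the exact formula. The feature indices in the usage example are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Decision node (feature/threshold set, both children present) or leaf (confidence set)."""
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None   # taken when x[feature] < threshold
    right: Optional["Node"] = None  # taken when x[feature] >= threshold
    confidence: float = 0.0         # used only at leaves


def leaf_confidence(w_object: float, w_nonobject: float, eps: float = 1e-6) -> float:
    """Half log-likelihood ratio of object vs. non-object weight reaching a leaf."""
    return 0.5 * math.log((w_object + eps) / (w_nonobject + eps))


def evaluate(node: Node, x: Sequence[float]) -> float:
    """Follow threshold decisions down to a leaf; > 0 favours object, < 0 non-object."""
    while node.left is not None and node.right is not None:
        node = node.right if x[node.feature] >= node.threshold else node.left
    return node.confidence
```

Usage, with a hypothetical two-level tree (x[0] and x[1] stand in for two of the features):

```python
tree = Node(feature=0, threshold=0.2,
            left=Node(confidence=leaf_confidence(9, 1)),
            right=Node(feature=1, threshold=3.0,
                       left=Node(confidence=leaf_confidence(1, 9)),
                       right=Node(confidence=leaf_confidence(5, 5))))
evaluate(tree, [0.1, 0.0])  # positive: mostly object examples reached this leaf
```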

Page 11: Sound Detection

Boosted Trees

Problem: One decision tree by itself may not be a great classifier

Solution: Use several trees, with each one focusing on the mistakes of previously learned trees

AdaBoost:

Weight the training data uniformly

Learn a decision tree classifier on the weighted data

Re-weight the data, giving more weight to incorrectly classified examples

Final classification is based on a linear combination of the confidences from all learned decision trees
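The boosting steps above can be sketched as confidence-rated ("real") AdaBoost. For brevity, the weak learners here are depth-1 trees (stumps); the slides use general decision trees. Labels are +1 (object) and -1 (non-object). The stump split criterion and leaf formula follow the standard real-AdaBoost choice and are assumptions, not details confirmed by the slides.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, conf_below, conf_at_or_above)


def _leaf_conf(wp: float, wn: float, eps: float = 1e-6) -> float:
    return 0.5 * math.log((wp + eps) / (wn + eps))


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int], w: Sequence[float]) -> Stump:
    """Depth-1 tree minimizing Z = 2 * sum over leaves of sqrt(W+ * W-)."""
    best, best_z = None, math.inf
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            wp, wn = [0.0, 0.0], [0.0, 0.0]  # index 0: below thr, 1: at/above thr
            for xi, yi, wi in zip(X, y, w):
                side = 1 if xi[f] >= thr else 0
                if yi > 0:
                    wp[side] += wi
                else:
                    wn[side] += wi
            z = 2 * sum(math.sqrt(wp[s] * wn[s]) for s in (0, 1))
            if z < best_z:
                best_z = z
                best = (f, thr, _leaf_conf(wp[0], wn[0]), _leaf_conf(wp[1], wn[1]))
    return best


def stump_predict(s: Stump, x: Sequence[float]) -> float:
    f, thr, conf_below, conf_above = s
    return conf_above if x[f] >= thr else conf_below


def adaboost(X: Sequence[Sequence[float]], y: Sequence[int], rounds: int) -> List[Stump]:
    n = len(X)
    w = [1.0 / n] * n  # step 1: uniform weights
    stumps: List[Stump] = []
    for _ in range(rounds):
        s = fit_stump(X, y, w)  # step 2: learn a tree on the weighted data
        stumps.append(s)
        # Step 3: misclassified examples (y * h(x) < 0) gain weight, correct ones lose it.
        w = [wi * math.exp(-yi * stump_predict(s, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return stumps


def boosted_score(stumps: List[Stump], x: Sequence[float]) -> float:
    """Final score: sum of all weak-learner confidences; > 0 means object."""
    return sum(stump_predict(s, x) for s in stumps)
```

Because each confidence already encodes how reliable its leaf is, the final combination can be a plain sum, with no separate per-tree weight.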

Page 12: Sound Detection

Examples of Decision Trees

[Figure: example decision trees. Node labels include "low percentage of power in low frequencies in mid-time of sound", "very high power amplitude range", and "high power amplitude range"; leaf labels include Meow and Gunshot. A second, more complex tree focuses on examples misclassified by the first.]

Page 13: Sound Detection

Cascade of Classifiers

Goal: eliminate false positives with few false negatives in early stages

Advantages: allows the use of a large set of negative training examples, and improves classification speed

Dangers: cannot recover from false negatives

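[Diagram: each sound clip passes through Stage 1, Stage 2, and Stage 3 in turn; failing any stage rejects the clip, and passing all three yields a detection. Pass labels: 5% (Stage 1), 2% (Stage 2), 0.005% (Stage 3)]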
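A minimal sketch of evaluating a cascade follows. Stages run in order, and a clip is rejected at the first stage whose score falls below that stage's threshold, so cheap early stages filter out most non-object clips. The per-stage threshold rule is an assumption borrowed from Viola-Jones-style cascades: pick the highest threshold that still passes a target fraction of positive training clips. The stage functions and rates are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Stage = Tuple[Callable[[Sequence[float]], float], float]  # (score function, pass threshold)


def cascade_classify(stages: List[Stage], clip: Sequence[float]) -> Tuple[bool, int]:
    """Return (is_object, number_of_stages_evaluated)."""
    for i, (score, threshold) in enumerate(stages, start=1):
        if score(clip) < threshold:
            return False, i  # early rejection: later stages never run, and it cannot be undone
    return True, len(stages)


def threshold_for_detection_rate(pos_scores: Sequence[float], rate: float) -> float:
    """Highest threshold that still passes at least `rate` of the positive training scores."""
    ranked = sorted(pos_scores, reverse=True)
    keep = max(1, math.ceil(rate * len(ranked)))
    return ranked[keep - 1]
```

Because a rejected clip is never reconsidered, the overall detection rate is roughly the product of the per-stage rates. For example, three stages that each keep 99% of positives keep about 0.99³ ≈ 97% overall. This is why the danger noted above matters.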

Page 14: Sound Detection

Results: Classification Error

[Chart: Average Error vs. Stages in Cascade. Y-axis: error, 0-10%; x-axis: stage 1, stage 2, stage 3; series: pos error, neg error. Annotations: Best Performance, Worst Performance]

Class         Stage 1         Stage 2         Stage 3
              pos     neg     pos     neg     pos     neg
meow          0.0%    1.4%    0.0%    1.2%    2.2%    0.8%
phone         0.0%    0.4%    4.3%    0.1%    5.9%    0.0%
car horn      0.0%    3.9%    0.6%    2.2%    3.6%    1.3%
door bell     1.4%    2.1%    2.1%    0.4%    6.3%    0.1%
swords        6.1%    1.3%    6.7%    0.1%    6.7%    0.0%
scream        0.3%    5.5%    2.7%    1.4%    5.3%    1.1%
dog bark      0.7%    1.0%    6.0%    0.3%    7.7%    0.2%
laser gun     0.0%    6.8%    4.4%    5.1%    6.7%    0.9%
explosion     4.1%    5.2%    7.5%    1.5%    12.0%   0.5%
light saber   4.8%    6.8%    9.7%    1.0%    13.9%   0.2%
gunshot       8.1%    6.1%    12.5%   2.3%    14.5%   1.1%
close door    7.9%    7.8%    14.5%   4.8%    17.6%   2.3%
male laugh    4.3%    14.7%   9.5%    9.7%    13.3%   7.0%
average       2.9%    4.4%    6.0%    2.2%    8.5%    1.1%

pos = error on positive (object) clips; neg = error on negative (non-object) clips

Page 15: Sound Detection

Results: ROC curves

Note: to approximate the negative error rate, divide the number of false positives (FP) by 25,000
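The note amounts to the arithmetic below. It presumably reflects a test set of about 25,000 negative clips; that is an inference, not stated on the slide. The FP count of 250 is a made-up example.

```latex
\text{negative error rate} \approx \frac{\text{FP}}{25{,}000},
\qquad \text{e.g. } \text{FP} = 250 \;\Rightarrow\; \frac{250}{25{,}000} = 0.01 = 1\%
```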

Page 16: Sound Detection

Results: Anecdotal

[Figure: anecdotal detection results for gunshots, female laugh, male laugh, swords, and scream]