Sound Detection
![Page 1: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/1.jpg)
Sound Detection
Derek Hoiem
Rahul Sukthankar (mentor)
August 24, 2004
![Page 2: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/2.jpg)
Objective
Learn model of sound object from few (10-20) examples and distinguish from all other sounds
Examples of sound classes: gunshots, screams, laughter, car horns, meows, dog barks, etc.
![Page 3: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/3.jpg)
Applications
“Tell me if you hear a gunshot.” (monitoring)
“Get me video clips containing dogs barking.” (search and retrieval)
“What’s going on?” (scene understanding)
![Page 4: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/4.jpg)
Why it's difficult
Sound classes have large variations
Sounds are often ambiguous without context
Overlaid “noise” obscures sound
![Page 5: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/5.jpg)
Sound or not?
Car horn
Laser gun
Dog bark
Which of these sounds are not from their named classes?
![Page 6: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/6.jpg)
Previous work
Sound Classification (Wold 1996, Casey 2001, etc.)
Categorize short sound clips
Reasonable accuracy (5-20% error)
Sound Detection (Defaux 2000, Piamsa-nga 1999)
Localize and recognize sound objects in long clips
Poor performance or assumption of unrealistic conditions (e.g., very quiet background)
![Page 7: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/7.jpg)
Detection via Windowed Search
Long Track → Clip 1, Clip 2, …, Clip N → Clip Classifier
Break audio track into short overlapping clips
Independently classify short clips as object or non-object
Return locations of detected sound object
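The three steps above can be sketched as a sliding-window loop. This is a minimal illustration, not the implementation from the talk: the function name `windowed_search`, the parameters `clip_len`, `hop`, and `threshold`, and the toy `loudness_score` classifier are all assumptions made for the example.

```python
def windowed_search(track, clip_len, hop, classify_clip, threshold=0.0):
    """Slide a fixed-length window over `track` and collect detections.

    Returns a list of (start, end, score) for every clip whose score
    exceeds `threshold`. Clips overlap whenever hop < clip_len.
    """
    detections = []
    for start in range(0, len(track) - clip_len + 1, hop):
        clip = track[start:start + clip_len]
        score = classify_clip(clip)  # each clip is scored independently
        if score > threshold:
            detections.append((start, start + clip_len, score))
    return detections


def loudness_score(clip):
    """Hypothetical stand-in classifier: mean absolute amplitude minus 0.5."""
    return sum(abs(x) for x in clip) / len(clip) - 0.5


# A silent track with one loud burst at samples 8..11.
track = [0.0] * 8 + [1.0] * 4 + [0.0] * 8
hits = windowed_search(track, clip_len=4, hop=2, classify_clip=loudness_score)
```

Only the window that fully covers the burst scores above the threshold, so `hits` contains a single detection spanning samples 8 to 12.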
![Page 8: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/8.jpg)
Representation
Raw Representation (examples shown: meows, phone rings)
Time-frequency analysis: windowed Fourier transform
Extract power percentage in each band over time and total power over time
Features
Compute features used for classification
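As a rough sketch of the raw representation described above, the following computes a windowed Fourier transform and then the fraction of power in each frequency band per frame, plus the total power per frame. The frame length, hop size, number of bands, Hann window, and all function names are assumptions for illustration; the slides do not specify them. A naive DFT is used so the example needs only the standard library.

```python
import cmath
import math


def power_spectrum(frame):
    """Power in the non-negative frequency bins of a Hann-windowed frame."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def band_power_features(signal, frame_len=64, hop=32, n_bands=4):
    """Return (percentages, totals): band power fractions and total power per frame."""
    percentages, totals = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        power = power_spectrum(signal[start:start + frame_len])
        # Split the spectrum into n_bands contiguous groups of bins.
        edges = [round(b * len(power) / n_bands) for b in range(n_bands + 1)]
        bands = [sum(power[edges[b]:edges[b + 1]]) for b in range(n_bands)]
        total = sum(bands)
        totals.append(total)
        percentages.append([p / total if total > 0 else 0.0 for p in bands])
    return percentages, totals


# A pure tone centred on DFT bin 4 of a 64-sample frame.
tone = [math.sin(2 * math.pi * 4 * i / 64) for i in range(256)]
pct, tot = band_power_features(tone)
```

For this low-frequency tone, nearly all of the power in every frame falls in the lowest band.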
![Page 9: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/9.jpg)
Classification Features
Diverse feature set: different sound classes are distinctive in different ways
Means and standard deviations of power at different frequencies
Bandwidth, peaks, loudness, etc.
138 features in all
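A few features of the kind listed above can be sketched as summary statistics over per-frame band power. This shows only a handful of illustrative features, not the 138 used in the talk; the feature names and the `summary_features` helper are hypothetical.

```python
from statistics import mean, pstdev


def summary_features(band_power, total_power):
    """Summarise per-frame band power fractions and total power for one clip.

    band_power: list of frames, each a list of per-band power fractions.
    total_power: list of total power values, one per frame.
    """
    features = {}
    n_bands = len(band_power[0])
    for b in range(n_bands):
        series = [frame[b] for frame in band_power]
        features[f"band{b}_mean"] = mean(series)
        features[f"band{b}_std"] = pstdev(series)
    features["loudness_mean"] = mean(total_power)
    features["loudness_range"] = max(total_power) - min(total_power)
    # Index of the band that carries the most power on average.
    features["peak_band"] = max(range(n_bands),
                                key=lambda b: features[f"band{b}_mean"])
    return features


band_power = [[0.8, 0.2], [0.6, 0.4]]  # two frames, two bands
total_power = [1.0, 3.0]
feats = summary_features(band_power, total_power)
```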
![Page 10: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/10.jpg)
Classification by Decision Trees
Try to find simple rules that discriminate object from non-object
Each decision is based on a threshold of a feature value
Assign confidence based on likelihood of data for object and non-object classes at each leaf node
Decision nodes
Leaf Nodes
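The simplest case of such a tree is a single decision node with two leaves. The sketch below picks a threshold on one feature and assigns each leaf a confidence from the weighted mass of object and non-object examples that reach it. The half-log-ratio confidence and the square-root split criterion are assumptions borrowed from confidence-rated boosting; the slides do not give the exact formula used. `fit_stump` and its parameters are hypothetical names.

```python
import math


def fit_stump(xs, ys, weights, eps=1e-6):
    """Fit a one-node tree on a single feature.

    xs: feature values; ys: labels (+1 object, -1 non-object);
    weights: non-negative example weights.
    Returns (threshold, confidence_low_leaf, confidence_high_leaf).
    """
    candidates = sorted(set(xs))
    if len(candidates) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    for lo, hi in zip(candidates, candidates[1:]):
        thr = (lo + hi) / 2
        # Weighted mass of each class in each leaf (0 = low side, 1 = high side).
        mass = {(leaf, label): eps for leaf in (0, 1) for label in (-1, 1)}
        for x, y, w in zip(xs, ys, weights):
            mass[(int(x > thr), y)] += w
        # Lower is better: small when each leaf is dominated by one class.
        impurity = 2 * sum(math.sqrt(mass[(leaf, 1)] * mass[(leaf, -1)])
                           for leaf in (0, 1))
        if best is None or impurity < best[0]:
            conf = [0.5 * math.log(mass[(leaf, 1)] / mass[(leaf, -1)])
                    for leaf in (0, 1)]
            best = (impurity, thr, conf[0], conf[1])
    return best[1:]


xs = [0.1, 0.2, 0.8, 0.9]
ys = [-1, -1, 1, 1]
threshold, conf_low, conf_high = fit_stump(xs, ys, [0.25] * 4)
```

On this toy data the split lands between 0.2 and 0.8, the low leaf gets a negative confidence (non-object) and the high leaf a positive one (object).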
![Page 11: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/11.jpg)
Boosted Trees
Problem: One decision tree by itself may not be a great classifier
Solution: Use several trees, with each one focusing on the mistakes of previously learned trees
Adaboost:
Weight training data uniformly
Learn a decision tree classifier on weighted data
Re-weight data giving more weight to incorrectly classified examples
Final classification based on linear combination of confidences from all learned decision trees
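The loop above can be sketched with the textbook discrete AdaBoost update. To keep the example short, one-threshold stumps stand in for the decision trees, and each stump votes +1 or -1 scaled by its weight `alpha` rather than emitting per-leaf confidences, so this is a simplification of what the slides describe. All function names are hypothetical.

```python
import math


def best_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) with minimal error."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[j] > thr else -polarity) != yi)
                if err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def adaboost(X, y, rounds):
    n = len(X)
    w = [1.0 / n] * n  # weight training data uniformly
    ensemble = []
    for _ in range(rounds):
        err, j, thr, pol = best_stump(X, y, w)  # learn on weighted data
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        # Re-weight: misclassified examples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * (pol if row[j] > thr else -pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict_score(ensemble, row):
    """Linear combination of the weak classifiers' votes."""
    return sum(alpha * (pol if row[j] > thr else -pol)
               for alpha, j, thr, pol in ensemble)


# The positives sit in the middle, so no single threshold separates them.
X = [[1], [2], [3], [4], [5], [6]]
y = [-1, -1, 1, 1, -1, -1]
model = adaboost(X, y, rounds=3)
predictions = [1 if predict_score(model, row) > 0 else -1 for row in X]
```

No single stump classifies this toy set correctly, but the weighted combination of three does, which is the point of focusing each round on earlier mistakes.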
![Page 12: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/12.jpg)
Examples of Decision Trees
Low percentage of power in low frequencies in mid-time of sound
Very high power amplitude range
Meow Gunshot
High power amplitude range
More complex tree that focuses on examples misclassified by tree above
Gunshot
![Page 13: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/13.jpg)
Cascade of Classifiers
Goal: eliminate false positives with few false negatives in early stages
Advantages: allows use of a large set of negative training examples; improves classification speed
Dangers: cannot recover from false negatives
Sound Clip → Stage 1 → Stage 2 → Stage 3 → Pass
Pass labels in the diagram: 5% (stage 1), 2% (stage 2), 0.005% (stage 3)
Fail at any stage: clip is rejected
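The cascade's control flow can be sketched as an early-exit loop: a clip is rejected at the first stage it fails and is accepted only if it passes every stage. The stage scoring functions and thresholds below are toy stand-ins, and `cascade_classify` is a hypothetical name.

```python
def cascade_classify(features, stages):
    """Run a clip through the stages in order.

    stages: list of (score_fn, threshold) pairs.
    Returns (passed, stages_evaluated). A clip failing any stage is
    rejected immediately and later stages are never run on it.
    """
    for index, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(features) < threshold:
            return False, index
    return True, len(stages)


# Toy stages over a dict of made-up feature values.
stages = [
    (lambda f: f["loudness"], 0.2),       # cheap first check
    (lambda f: f["low_band"], 0.5),       # stricter second check
    (lambda f: f["amplitude_range"], 0.7),
]

quiet_clip = {"loudness": 0.1, "low_band": 0.9, "amplitude_range": 0.9}
strong_clip = {"loudness": 0.8, "low_band": 0.6, "amplitude_range": 0.9}

quiet_result = cascade_classify(quiet_clip, stages)
strong_result = cascade_classify(strong_clip, stages)
```

The quiet clip is rejected after one stage, which is where the speed advantage comes from. It is also why a false negative in an early stage cannot be recovered later.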
![Page 14: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/14.jpg)
Results: Classification Error
Figure: Average Error vs Stages in Cascade (y-axis: error, 0.0% to 10.0%; x-axis: stage 1, stage 2, stage 3; series: pos error, neg error)
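Classes are listed from best performance (top) to worst performance (bottom).

| Sound class | Stage 1 pos | Stage 1 neg | Stage 2 pos | Stage 2 neg | Stage 3 pos | Stage 3 neg |
|---|---|---|---|---|---|---|
| meow | 0.0% | 1.4% | 0.0% | 1.2% | 2.2% | 0.8% |
| phone | 0.0% | 0.4% | 4.3% | 0.1% | 5.9% | 0.0% |
| car horn | 0.0% | 3.9% | 0.6% | 2.2% | 3.6% | 1.3% |
| door bell | 1.4% | 2.1% | 2.1% | 0.4% | 6.3% | 0.1% |
| swords | 6.1% | 1.3% | 6.7% | 0.1% | 6.7% | 0.0% |
| scream | 0.3% | 5.5% | 2.7% | 1.4% | 5.3% | 1.1% |
| dog bark | 0.7% | 1.0% | 6.0% | 0.3% | 7.7% | 0.2% |
| laser gun | 0.0% | 6.8% | 4.4% | 5.1% | 6.7% | 0.9% |
| explosion | 4.1% | 5.2% | 7.5% | 1.5% | 12.0% | 0.5% |
| light saber | 4.8% | 6.8% | 9.7% | 1.0% | 13.9% | 0.2% |
| gunshot | 8.1% | 6.1% | 12.5% | 2.3% | 14.5% | 1.1% |
| close door | 7.9% | 7.8% | 14.5% | 4.8% | 17.6% | 2.3% |
| male laugh | 4.3% | 14.7% | 9.5% | 9.7% | 13.3% | 7.0% |
| average | 2.9% | 4.4% | 6.0% | 2.2% | 8.5% | 1.1% |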
![Page 15: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/15.jpg)
Results: ROC curves
Note: to approximate the negative error rate, divide FP by 25,000
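The conversion in the note is a single division. The sketch below assumes that 25,000 is roughly the number of negative clips behind each curve, which the slide implies but does not state; the function name and the example count of 250 are made up for illustration.

```python
def approx_negative_error_rate(false_positives, n_negative=25_000):
    """Approximate the negative error rate from a false-positive count."""
    return false_positives / n_negative


# A hypothetical operating point with 250 false positives.
rate = approx_negative_error_rate(250)
```

Here 250 false positives corresponds to a negative error rate of about 1%.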
![Page 16: Sound Detection](https://reader036.fdocuments.in/reader036/viewer/2022062315/5681527d550346895dc0a980/html5/thumbnails/16.jpg)
Results: Anecdotal
Gunshots Female Laugh Male Laugh
Swords Scream