Clap Detection and Discrimination for Rhythm Therapydpwe/talks/claps-2005-03.pdf · 2005. 3....
Transcript of Clap Detection and Discrimination for Rhythm Therapydpwe/talks/claps-2005-03.pdf · 2005. 3....
Clap Detection - Lesser & Ellis 2005-03-22 p. /141
1. “Rhythm Therapy”2. Clap Range Estimation3. Experiments4. Conclusions
Clap Detection and Discrimination for Rhythm Therapy
Nathan Lesser & Dan EllisLaboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Engineering, Columbia University, NY USA
{nathan,dpwe}@ee.columbia.edu
Clap Detection - Lesser & Ellis 2005-03-22 p. /142
1. “Rhythm Therapy”• Rhythmic clapping may
help neural developmentsensori-motor planningfocus and attention
• “Interactive metronome”devicesgive feedback on synchronysensor-based
• Classroom deployment?acoustic-based?for multiple simultaneous users??
from interactivemetronome.com
Clap Detection - Lesser & Ellis 2005-03-22 p. /143
Clap Discrimination
• Scenario: Many students in same classroomeach clapping in time to their own laptopstudents wear headphones (but no sensor)computer hears neighbors
• Goal: Discriminate between ‘near-field’and ‘far-field’ claps‘near-field’ = ~1 meter, on-axis‘far-field’ = > 2 meters, maybe off-axis
Clap Detection - Lesser & Ellis 2005-03-22 p. /144
Data Collection
• Record isolated claps at various locationscan superimpose them later...
• Grid of seats:claps from locations 0..9record at locations 5 & 9 only
• Multiple roomspilot: 1 room, 2 x 5 claps/locationmain data: 2 (+2) rooms, 1 x 50 farfield claps/location + 300 nearfield claps/rec.loc. = 1500 claps/room
0
1 2 3
4 5 6
7 8 9
5
9
Classroom Plan View
Clappinglocations
Recordinglocations
Front of Room
Clap Detection - Lesser & Ellis 2005-03-22 p. /145
2. Clap Range Estimation
• Task:Discriminate claps from in front of rig from all others (more distant)main perceptual cue to distance (range):direct-to-reverberant ratio (DRR)how to differentiate direct and reverb?
• Novel problem: Acoustic range estimationdefine correlates of DRRexploit properties of claps (wideband, compact).. then just feed to classifier
Clap Detection - Lesser & Ellis 2005-03-22 p. /146
Clap Examples
• Absolute level varies
• Decay slopes ~ samereverberation(RT60 ~ 900ms)
• Initial burst for near-field“direct sound”
-0.5
-0.25
0
0.25
0.5Near-field (327MUDD nf50:4)
am
plit
ud
e
Far-field (327MUDD ff50:4)
-80
-60
-40
-20
0
en
erg
y (
4m
s)
/ d
B
time / s time / s
fre
q /
kH
x
0 0.1 0.2 0.3 0.4 0.5 0.6 0.70
2
4
6
8
10
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
Clap Detection - Lesser & Ellis 2005-03-22 p. /147
Processing
• Detection → Features → Classifier
!"#$#%& '()*++*, !
!"#$%&'&(&
)*#+&,%"%-"./0
-.+*/01.*,230
*45678
9:,*+:;5<=.2
12/0"&304
>=5/*,0?6.@0
A=.<;B=.2
&CDB=.<;B+5#6&784./
!"#$%&''(&
)#"%$/2.9#"./0
:%"&1%#"82%;
E5;7*0
4*./*,0;F0G6++
<2#.0.0$=<%;"
H)E4
IF*6/J,*0K*L/;,M0C0N;<*5
9*+/
;,0
9,6=.=.2
9,6=.=.2
9*+/
45670O.<=L*+
<2#.0.0$&>#?%*;
G;<*5
5%;8*";4,;++04;,,*56/=;.
Clap Detection - Lesser & Ellis 2005-03-22 p. /148
Clap Detection
• Simple transient detector limits feature calculation to ‘clap events’
• Adjust thresholdon Δ(Energy20ms) to get desired number of clapsknown for our data
• Backup from maxima to find precise onsetFielded system will need to adapt thresholdand reject non-claps
time / s
freq / k
Hz
Far-field (327MUDD ff50:1-5)
0
2
4
6
8
10
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
0
20
40
ratio / d
B
Clap Detection - Lesser & Ellis 2005-03-22 p. /149
Range Features
• Paper: Ctr. of Mass, Slope in 0..20 , 0..100ms
•• New: Slope in 0..20ms , 20..100ms
+ Energy Ratio 0..20ms / 20..100ms
-60
-40
-20
Near-field (327MUDD nf50:4)
dB
0 0.05 0.1 0.15
CoM20ms
-80
-60
-40
Far-field (327MUDD ff50:4)
0 0.05 0.1 0.15time / s
CoM100ms
slope20ms
slope100ms
Near-field (327MUDD nf50:4)
0 0.05 0.1 0.15
-60
-40
-20
energy ratio
dB
Far-field (327MUDD ff50:4)
0 0.05 0.1 0.15
-80
-60
-40
time / s
slope20:100ms
Clap Detection - Lesser & Ellis 2005-03-22 p. /1410
Range Feature Behavior
• Original 4 featuresgood separation except CoM20
• New featuresEratio excellentslope20:100 useless...
• Range estimation?CoM20, slope20 show promise
(each plot shows 4-8 kHz band vs. 2-4 kHz band)
0 2 4 6 80
2
4
6CoM20
0 0.5 1 1.5 20
1
2
3CoM100
-40 -20 0 20 40-40
-20
0
20slope20
-40 -30 -20 -10 0-60
-40
-20
0
20slope100
0 5 10 15 20 25-10
0
10
20
30
Eratio
4-8kHz band
2-4
kH
z b
an
d
-30 -20 -10-30
-20
-10
0
10
slope20:100
327MUDD loc 5
NF p6p4
p8 p9p7
p2
p0
p3p1
Clap Detection - Lesser & Ellis 2005-03-22 p. /1411
3. Experiments
• Build and test actual near/far-field classifier
• Feature experimentsquantitative feature comparisonbest combinations
• Data experimentstraining data: amount, locationstest data: same/different room/location
• Regularized Least-Squares Classifier (RLSC)find a hyperplane in (expanded) feature space~ simplified Support Vector Machine - no QP
Clap Detection - Lesser & Ellis 2005-03-22 p. /1412
Feature Comparisons
• Train on room 327Mudd; Test on 627Mudd
• Eratio alone (9/1500 = 0.6% errors) beatsbest combination of rest: (CoM20+ CoM100+ slo20 = 0.9% errors)
difference of ~0.5% required for signficance
CoM_20 CoM_100 slo_20 slo_100 slo_20:100 Eratio0
10
20
30
40
50Feature comparison: All 3 bands, train on all M327, test on all M627
cla
p e
rror
rate
/ %
feature set
Clap Detection - Lesser & Ellis 2005-03-22 p. /1413
Generalizing Location, Room
• Matrix of 2 rooms x 2 recording locations
627Mudd loc5 is hard data; 327Mudd loc9 is easy!Cross-room (shaded) cases generalize better !?Plenty of data: 5 claps/loc (20%) just as good
CER%Test
M627L5 M627L9 M327L5 M327L9
Train
M627L5 2.0 0.5 0.4 0.0
M627L9 3.7 0.4 0.7 0.0
M327L5 1.5 0.5 0.4 0.0
M327L9 0.1 0.7 0.4 0.0
Clap Detection - Lesser & Ellis 2005-03-22 p. /1414
4. Conclusions
• Discriminating isolated near- and far-field claps is feasible (use Eratio 0..20/20..100ms)
• Detection of candidate claps likely to limit accuracy in practicebut have ‘rhythmic’ expectations...
• Applicability to general range estimation?Eratio relies on short-duration direct-sound..but other sounds have clicks (e.g. speech bursts)CoM20, slope20 closer to proportional to range
Clap Detection - Lesser & Ellis 2005-03-22 p. /1415
Azimuth Features
• Cross-correlation of L and R for azimuth:
nearby locations distinguished - usefuldistant locations (p2) give random resultsneeds nonlinear feature space expansion!
-1.5 -1 -0.5 0 0.5 1 1.5-1.5
-1
-0.5
ITD scatter vs. source (for MUDD327 pos 5)
4-8kHz
2-4kHz
NF p6p4
p8 p9p7
p2
p0
p3p1
Clap Detection - Lesser & Ellis 2005-03-22 p. /1416
Error Analysis
• 627Mudd (record loc 5) is the tough set; look at classifier margins:
50 51 52 53 54 56 57 58 59 5n 5n 5n 5n 5n 5n 90 91 92 93 94 95 96 97 98 9n 9n 9n 9n 9n-2
-1
0
1
2
m4[14:16] vs f627
50 51 52 53 54 56 57 58 59 5n 5n 5n 5n 5n 5n 90 91 92 93 94 95 96 97 98 9n 9n 9n 9n 9n-2
-1
0
1
2
m6[14:16] vs f627
0 5 10 15 20 25 30 35 40 45 50-2
-1
0
1
2
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2
-1
-0.5
0
0.5
1
Claps 33 and 34 from 627M:nf90
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2-40
-20
0
20
Time
Fre
qu
en
cy
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.80
0.5
1
1.5
2
x 104
false accepts for loc 6
a few solidfalse rejects...
... really look like far-field???(ambiguous
Eratio)
01 2 34 5 67 8 9
Clap Detection - Lesser & Ellis 2005-03-22 p. /1417
Usefulness of Each Position
• Train on 50 near-field claps + 50 far-field claps from a single location:
all recorded at location 5‘behind’ (p7-p9) less usefulright-side (p3, p6) most useful !?
p0p1 p2 p3p4 p5 p6p7 p8 p9
p0 p1 p2 p3 p4 p6 p7 p8 p9 all0
1
2
3
4
5
6
7Location comparison (Erat ftrs): train M327L5 one loc, test on all M627L5
cla
p e
rror
rate
/ %
Far field training examples location