MediaEval 2014: A Multimodal Approach to Drop Detection in Electronic Dance Music
The “Emotion in Music” organizers' endeavor at the Crowdsourcing task: A Multimodal Approach to Drop Detection in Electronic Dance Music
Anna Aljanaki²*, Mohammad Soleymani¹*, Frans Wiering², Remco C. Veltkamp²
¹University of Geneva, Switzerland; ²Utrecht University, Netherlands
* Equal technical contributions
Problem definition
• Given an electronic music excerpt, its timed comments, and labels from MTurk, automatically identify whether the excerpt fully contains a drop, partially contains a drop, or contains no drop.
Material
• 15-second excerpts with timed comments that include the term “drop”
• MPEG Layer 3 (MP3) files
• Metadata, including the comments
• Labels from the crowd
• 164 excerpts with full agreement (105: full drop; 4: partial drop; 54: no drop)
• 70 excerpts with no agreement
Solutions
• Labels from crowdsourcing
  1. Majority vote (MV)
  2. Dawid-Skene (DS)
• Labels from crowdsourcing + comments
  3. Naïve Bayesian classifier
• Labels from crowdsourcing + content
  4. Logistic regression
Using labels (wisdom of the crowd)
Aashish Sheshadri and Matthew Lease. SQUARE: A Benchmark for Research on Computing Crowd Consensus. In Proceedings of the 1st AAAI Conference on Human Computation (HCOMP), 2013
Solution 1: Majority vote
• 3 labels per excerpt
• Take the majority label
• If there is no agreement, the estimated label is 2 (partial drop)
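The rule above fits in a few lines of Python. This is a minimal sketch, assuming each excerpt's labels arrive as a list of integer codes (1 = full drop, 2 = partial drop, 3 = no drop); the function name `majority_vote` is hypothetical.

```python
from collections import Counter

# Label codes used on the slides: 1 = full drop, 2 = partial drop, 3 = no drop.
PARTIAL_DROP = 2


def majority_vote(labels):
    """Return the majority label among the workers' labels for one excerpt.

    If no label has a strict majority (with three workers: three different
    labels), fall back to 2 (partial drop).
    """
    label, count = Counter(labels).most_common(1)[0]
    if count > len(labels) / 2:
        return label
    return PARTIAL_DROP


print(majority_vote([1, 1, 3]))  # 1
print(majority_vote([1, 2, 3]))  # 2
print(majority_vote([3, 3, 3]))  # 3
```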
Solution 2: Dawid-Skene
• Dawid and Skene (1979) proposed a method for combining several uncertain decisions (originally, clinicians' assessments of patients)
• The method estimates a confusion matrix for every labeler with Expectation-Maximization (EM), initialized by majority vote; the matrix entries are the probabilities of each given label conditioned on the true label
• We then look at the probability of the true label given each worker's label, for all three workers, and pick the highest one
• Get-Another-Label toolbox: https://github.com/ipeirotis/Get-Another-Label
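A from-scratch sketch of the Dawid-Skene EM loop follows. It is not the Get-Another-Label code used for the run: the data layout (a dict from excerpt id to `(worker, label)` pairs), the additive smoothing constant, and the fixed iteration count are assumptions. It also decides with the usual joint posterior over all of an excerpt's labels, which may differ from the per-worker rule described above.

```python
# Label codes from the slides: 1 = full drop, 2 = partial drop, 3 = no drop.
CLASSES = (1, 2, 3)


def dawid_skene(annotations, n_iter=50, smoothing=0.01):
    """Dawid-Skene EM over a dict: item id -> list of (worker id, label).

    Returns (posteriors, confusion): posteriors[item][k] is the estimated
    probability that the item's true class is k, and confusion[w][k][l] is
    the estimated probability that worker w answers l when the truth is k.
    """
    # Initialisation: soft majority vote (share of workers giving each label).
    post = {
        item: {k: sum(l == k for _, l in pairs) / len(pairs) for k in CLASSES}
        for item, pairs in annotations.items()
    }
    workers = {w for pairs in annotations.values() for w, _ in pairs}

    for _ in range(n_iter):
        # M-step: class priors and one smoothed confusion matrix per worker.
        prior = {k: sum(p[k] for p in post.values()) / len(post) for k in CLASSES}
        counts = {w: {k: {l: smoothing for l in CLASSES} for k in CLASSES}
                  for w in workers}
        for item, pairs in annotations.items():
            for w, l in pairs:
                for k in CLASSES:
                    counts[w][k][l] += post[item][k]
        confusion = {
            w: {k: {l: counts[w][k][l] / sum(counts[w][k].values()) for l in CLASSES}
                for k in CLASSES}
            for w in workers
        }

        # E-step: posterior over each item's true class given all its labels.
        for item, pairs in annotations.items():
            scores = {}
            for k in CLASSES:
                s = prior[k]
                for w, l in pairs:
                    s *= confusion[w][k][l]
                scores[k] = s
            total = sum(scores.values())
            post[item] = {k: scores[k] / total for k in CLASSES}

    return post, confusion


# Toy usage: the estimated label is the most probable true class (argmax).
annotations = {
    "a": [("w1", 1), ("w2", 1), ("w3", 3)],
    "b": [("w1", 3), ("w2", 3), ("w4", 3)],
    "c": [("w1", 1), ("w3", 2), ("w4", 3)],
}
post, confusion = dawid_skene(annotations)
estimated = {item: max(p, key=p.get) for item, p in post.items()}
print(estimated)
```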
Solution 3: Majority Vote + comments
• For excerpts with full or partial agreement, we keep the MV labels unchanged
• For the remaining 70 excerpts:
  – Features:
    • Labels from the workers
    • Number of times the comments contain the term “drop” (we did not normalize by the number of comments; that was a mistake!)
  – Naïve Bayesian classifier trained on the samples with partial or full agreement
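The classifier step can be sketched with a small categorical naive Bayes written out by hand. The feature encoding here (how many of the three workers chose each label, plus the raw count of comments mentioning “drop”), the Laplace smoothing, and the toy training data are assumptions for illustration, not the exact features used in the run.

```python
import math
from collections import Counter, defaultdict


def features(worker_labels, drop_mentions):
    """Hypothetical encoding: how many workers chose labels 1, 2 and 3,
    plus the raw (unnormalized) number of comments mentioning "drop"."""
    counts = Counter(worker_labels)
    return (counts[1], counts[2], counts[3], drop_mentions)


class CategoricalNB:
    """Naive Bayes over discrete feature values with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.class_count = Counter(y)
        self.n = len(y)
        # value_count[c][j][v]: how often feature j took value v in class c.
        self.value_count = {c: defaultdict(Counter) for c in self.class_count}
        self.values = defaultdict(set)  # distinct values seen per feature
        for x, c in zip(X, y):
            for j, v in enumerate(x):
                self.value_count[c][j][v] += 1
                self.values[j].add(v)
        return self

    def log_scores(self, x):
        """Unnormalized log posterior for each class."""
        scores = {}
        for c, n_c in self.class_count.items():
            s = math.log(n_c / self.n)
            for j, v in enumerate(x):
                # +1 reserves probability mass for values unseen in training.
                den = n_c + self.alpha * (len(self.values[j]) + 1)
                s += math.log((self.value_count[c][j][v] + self.alpha) / den)
            scores[c] = s
        return scores

    def predict(self, x):
        scores = self.log_scores(x)
        return max(scores, key=scores.get)


# Toy training data standing in for excerpts with (partial) agreement.
X_train = [features([1, 1, 1], 5), features([1, 1, 3], 3),
           features([3, 3, 3], 0), features([3, 3, 1], 1)]
y_train = [1, 1, 3, 3]
model = CategoricalNB().fit(X_train, y_train)

# Excerpts with no agreement: one vote for each label, only the count differs.
print(model.predict(features([1, 2, 3], 5)))
print(model.predict(features([1, 2, 3], 0)))
```

Under this encoding, three workers with no agreement always produce one vote per label, so the “drop” count does most of the work. Because the raw count is treated as a categorical value, a count never seen in training falls back to the smoothing term alone; binning it, or normalizing it by the number of comments as the note above suggests, would make the feature more informative.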
Solution 4: MV + acoustic (1)
• Again, only the labels of the samples with no agreement were changed
• Trained on the samples with full agreement (164 samples)
• Assumption: there is a moment of silence or a quieter segment right after the drop
• Energy is extracted from 100 ms segments and smoothed
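The energy step might look like the following. The slides do not say which energy measure or smoother was used: mean-square energy over non-overlapping 100 ms frames and a 5-frame centered moving average are assumptions, and decoding the MP3 files into samples is left out. A synthetic signal with a half-second gap stands in for a real excerpt.

```python
import math


def frame_energy(samples, sample_rate, frame_ms=100):
    """Mean-square energy of consecutive, non-overlapping frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len  # an incomplete final frame is dropped
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def smooth(values, window=5):
    """Centered moving average; the window is truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Synthetic stand-in for a decoded excerpt: 1 s of a 440 Hz tone,
# 0.5 s of silence, then 1 s of tone again, at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
signal = tone + [0.0] * (sr // 2) + tone

energy = smooth(frame_energy(signal, sr))
quietest = min(range(len(energy)), key=energy.__getitem__)
print(len(energy), quietest)  # prints: 25 12
```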
Solution 4: MV + acoustic (2)
Solution 4: MV + acoustic (3)
• Features:
  – The value of the biggest local minimum in the excerpt
  – The ratio of the biggest minimum to the average minimum
  – The number of potential drop events, detected as decreases in loudness larger than a threshold
  – The dynamic range of the excerpt
• Logistic regression for binary classification (we did not consider class 2 because there were not enough samples)
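The four features and the binary classifier can be sketched on top of a smoothed energy curve. Several details are assumptions: “biggest local minimum” is read as the lowest local minimum, the drop threshold of 0.1 is arbitrary, dynamic range is taken as max minus min of the curve, and the logistic regression is a plain batch-gradient-descent fit rather than whatever library was actually used. Targets are 1 for full drop and 0 for no drop, matching the binary setup above; the curves are toy data.

```python
import math


def drop_features(energy, threshold=0.1):
    """Four excerpt-level features from a smoothed energy curve (hypothetical versions)."""
    minima = [energy[i] for i in range(1, len(energy) - 1)
              if energy[i] < energy[i - 1] and energy[i] <= energy[i + 1]]
    if not minima:  # no interior local minimum: fall back to the global minimum
        minima = [min(energy)]
    deepest = min(minima)
    mean_min = sum(minima) / len(minima)
    ratio = deepest / mean_min if mean_min > 0 else 0.0
    n_drops = sum(1 for a, b in zip(energy, energy[1:]) if a - b > threshold)
    dynamic_range = max(energy) - min(energy)
    return [deepest, ratio, n_drops, dynamic_range]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.1, epochs=2000):
    """Binary logistic regression fitted by batch gradient descent on the log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = predict_proba(w, b, x) - t  # d(log-loss)/d(logit)
            grad_w = [g + err * xj / n for g, xj in zip(grad_w, x)]
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy smoothed curves: 1 = full drop (sharp dip), 0 = no drop (flat).
curves = [
    ([0.5, 0.5, 0.5, 0.05, 0.6, 0.6, 0.6], 1),
    ([0.6, 0.55, 0.1, 0.5, 0.55, 0.6, 0.6], 1),
    ([0.4, 0.42, 0.41, 0.43, 0.42, 0.44, 0.43], 0),
    ([0.3, 0.31, 0.3, 0.32, 0.31, 0.3, 0.31], 0),
]
X = [drop_features(c) for c, _ in curves]
y = [t for _, t in curves]
w, b = train_logreg(X, y)
print([round(predict_proba(w, b, x), 2) for x in X])
```

With no regularization the weights keep growing on separable data like this toy set; on real excerpts, standardizing the features and adding an L2 penalty would be the usual choices.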
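Results

| Run | Method | F1-score | Full drop (1) | Partial drop (2) | No drop (3) |
|---|---|---|---|---|---|
| 1 | Majority vote | 0.69 | 0.72 | 0.31 | 0.75 |
| 2 | Dawid-Skene | 0.69 | 0.72 | 0.31 | 0.75 |
| 3 | MV + comments | 0.70 | 0.73 | 0.28 | 0.76 |
| 4 | MV + acoustic | 0.71 | 0.72 | 0.27 | 0.79 |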
No significant improvement compared to majority vote
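For reference, per-class F1 is 2TP / (2TP + FP + FN), the harmonic mean of precision and recall. The slides do not state how the overall F1 column was aggregated; the sketch below assumes a support-weighted average over the three classes, which is one common choice. A plain macro average of the per-class values in the table would give lower numbers (for run 1, (0.72 + 0.31 + 0.75) / 3 ≈ 0.59), so it does not match the reported overall score.

```python
def per_class_f1(y_true, y_pred, classes=(1, 2, 3)):
    """F1 per class: 2TP / (2TP + FP + FN), i.e. the harmonic mean of precision and recall."""
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred, classes=(1, 2, 3)):
    """Support-weighted average of the per-class F1 scores (an assumed aggregation)."""
    scores = per_class_f1(y_true, y_pred, classes)
    support = {c: sum(t == c for t in y_true) for c in classes}
    return sum(scores[c] * support[c] for c in classes) / sum(support.values())


# Toy example, not the task data.
y_true = [1, 1, 2, 3, 3, 3]
y_pred = [1, 2, 2, 3, 3, 1]
print(per_class_f1(y_true, y_pred))  # {1: 0.5, 2: 0.6666666666666666, 3: 0.8}
print(weighted_f1(y_true, y_pred))
```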
Lessons learned
• In the presence of non-malicious workers and enough labels, majority vote is very hard to beat
• The scarcity of samples from the second class (partial drop) reduced our performance
• In the future, separate development and evaluation sets would be beneficial
Summary
• We primarily used the labels from MTurk, since we believed they would be superior to the other sources of information
• We proposed possible approaches that take advantage of the metadata and content when MV is indecisive
• As expected, we did not beat the majority vote