
Source (ISMIR 2007 poster): ismir2007.ismir.net/posters/ISMIR2007_p353_moreau_poster.pdf

Drum transcription in polyphonic music using non-negative matrix factorization

Arnaud Moreau (2), Arthur Flexer (1,2)

(1) Institute of Medical Cybernetics and Artificial Intelligence, Center for Brain Research, Medical University of Vienna, Austria

(2) The Austrian Research Institute for Artificial Intelligence, Freyung 6/6, A-1010 Vienna, Austria

Introduction

- Drum transcription is a prerequisite for genre classification or beat/meter detection
- Transcription is more difficult in polyphonic music
- The system is based on source separation
- It extends the work of Helen and Virtanen from drum/non-drum classification and separation to full polyphonic drum transcription

Overview

[Figure: block diagram of the system. Blocks: input signal, drum samples, STFT ([X]_{f,t}), NMF separation ([S]_{f,c}, [A]_{c,t}), feature extraction, SVM classification, peak picking, transcription.]

- Input audio is divided into 5-second excerpts
- Magnitude spectrogram representation (window size 2048, hop size 512)
- Non-negative matrix factorisation (NMF) algorithm gives source spectra and time-varying gains of c components
- The c components are classified by a Support Vector Machine (SVM)
- Peak-picking algorithm
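The first two steps can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the Hann window, the 44.1 kHz sample rate, the synthetic test tone, and the helper names `split_into_excerpts` and `magnitude_spectrogram` are assumptions. Only the 5-second excerpt length, the window size of 2048, and the hop size of 512 come from the poster.

```python
import numpy as np

def split_into_excerpts(signal, sample_rate, seconds=5.0):
    """Cut a mono signal into consecutive excerpts (the last one may be shorter)."""
    step = int(round(seconds * sample_rate))
    return [signal[i:i + step] for i in range(0, len(signal), step)]

def magnitude_spectrogram(signal, window_size=2048, hop_size=512):
    """Return |STFT| with shape (window_size // 2 + 1, n_frames).

    Assumes the signal is at least one window long.
    """
    window = np.hanning(window_size)  # window type is an assumption
    n_frames = 1 + (len(signal) - window_size) // hop_size
    frames = np.stack([
        signal[i * hop_size:i * hop_size + window_size] * window
        for i in range(n_frames)
    ])
    # rfft along each frame, then transpose to frequency x time
    return np.abs(np.fft.rfft(frames, axis=1)).T

sample_rate = 44100  # assumed; the poster does not state it
t = np.arange(12 * sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440.0 * t)  # stand-in for real input audio

excerpts = split_into_excerpts(audio, sample_rate)
X = magnitude_spectrogram(excerpts[0])
print(len(excerpts), X.shape)
```

Under these assumptions each full 5-second excerpt gives a 1025 x 427 matrix [X]_{f,t}, which is the input to the NMF step.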

Results and Discussion

The algorithm is evaluated on 60-second excerpts from 4 multi-channel recordings, which are labelled manually, containing a total of 1019 drum onsets.

BD = bass drum, SD = snare drum, HH = hi-hat.

                   BD       SD       HH     mean
Song 1, 242 onsets
  Rp %          88.66    54.93    41.03    60.85
  Rr %          93.33    63.64    98.97    85.31
  Rh %          78.89     5.45   -43.30    13.68
Song 2, 224 onsets
  Rp %          33.33    69.57    34.76    45.89
  Rr %          13.75    69.05    93.33    58.71
  Rh %         -11.25    35.71  -135.00   -36.85
Song 3, 206 onsets
  Rp %          36.84    50.68    81.77    56.43
  Rr %          20.51    88.24    99.25    69.33
  Rh %         -10.26   -17.65    74.44    15.51
Song 4, 347 onsets
  Rp %          80.00    31.25    76.63    62.63
  Rr %          50.00     6.33    63.24    39.85
  Rh %          37.50    -7.59    42.16    24.02
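The poster does not define the measures in the table. If Rp and Rr are read as precision and recall (an assumption), onset-level scores of that kind could be computed roughly as below. The 50 ms tolerance, the greedy one-to-one matching, the example onset times, and the function names are all hypothetical; Rh is left out because its definition cannot be recovered from the poster.

```python
def count_hits(detected, reference, tolerance=0.05):
    """Greedily match each reference onset to at most one unused detection."""
    detected = sorted(detected)
    used = [False] * len(detected)
    hits = 0
    for ref in sorted(reference):
        for i, det in enumerate(detected):
            if not used[i] and abs(det - ref) <= tolerance:
                used[i] = True
                hits += 1
                break
    return hits

def precision_recall(detected, reference, tolerance=0.05):
    """Return (precision, recall) for onset times given in seconds."""
    hits = count_hits(detected, reference, tolerance)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall

reference = [0.5, 1.0, 1.5, 2.0]   # hand-labelled onsets (made up)
detected = [0.51, 1.02, 1.70]      # transcribed onsets (made up)
precision, recall = precision_recall(detected, reference)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

In this toy case two of the three detections fall within the tolerance of a labelled onset, so precision is 2/3 and recall is 2/4.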

- Most errors are already made at the classification stage, which harms the subsequent drum transcription
- Results are not comparable to other work, as there is no publicly available data set
- Remaining research questions (among others):
  - What is the optimal feature subset?
  - What are the optimal thresholds for peak-picking?
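The threshold question can be made concrete with a very simple peak picker applied to one gain curve A_i(t). The sketch below is not the poster's algorithm, whose details are not given: the local-maximum rule, the fixed threshold, the `min_distance` parameter, the example gain values, and the 44.1 kHz sample rate are assumptions. Only the hop size of 512 is taken from the poster.

```python
import numpy as np

def pick_peaks(gain, threshold, min_distance=1):
    """Return frame indices of local maxima of `gain` that exceed `threshold`."""
    peaks = []
    for n in range(1, len(gain) - 1):
        is_local_max = gain[n] > gain[n - 1] and gain[n] >= gain[n + 1]
        if is_local_max and gain[n] > threshold:
            # keep the peak only if it is far enough from the previous one
            if not peaks or n - peaks[-1] >= min_distance:
                peaks.append(n)
    return peaks

def frames_to_seconds(frames, hop_size=512, sample_rate=44100):
    """Convert frame indices to onset times (sample rate is assumed)."""
    return [f * hop_size / sample_rate for f in frames]

gain = np.array([0.0, 0.1, 0.9, 0.2, 0.1, 0.4, 0.1, 0.0, 1.0, 0.3])
low = pick_peaks(gain, threshold=0.3)    # also accepts the weak peak at frame 5
high = pick_peaks(gain, threshold=0.5)   # keeps only the two strong peaks
print(low, high, frames_to_seconds(high))
```

Lowering the threshold admits the weak peak and would tend to raise recall at the cost of precision, which is why the choice of threshold is left as an open question.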

Acknowledgement

Helmut Schönleitner of the cultural center AKKU (http://www.akku-steyr.at) provided the multichannel recordings that have been used to evaluate our algorithm. The Austrian Research Institute for Artificial Intelligence acknowledges support from the ministries BMUKK and BMVIT.

System

Features

spectral features               temporal features
spectral centroid               temporal centroid
spectral kurtosis               temporal kurtosis
spectral skewness               temporal skewness
spectral rolloff                crest factor
spectral flatness               peak time
spectral contrast               peak fluctuation
noise likeness                  percussiveness
standard deviation              periodicity
10 MFCCs
20 dynamic MFCCs (mean+std)
20 dynamic ΔMFCCs (mean+std)
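A few of the listed features have widely used textbook definitions, sketched below for one component spectrum S_i and one gain curve A_i(t). Whether the poster uses exactly these formulas is not stated, so the definitions, the function names, and the toy inputs should be read as assumptions.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)

def spectral_centroid(spectrum, freqs):
    """Magnitude-weighted mean frequency of a component spectrum."""
    return float(np.sum(freqs * spectrum) / (np.sum(spectrum) + EPS))

def spectral_flatness(spectrum):
    """Geometric mean divided by arithmetic mean (near 1 for flat spectra)."""
    s = spectrum + EPS
    return float(np.exp(np.mean(np.log(s))) / np.mean(s))

def temporal_centroid(gain, times):
    """Gain-weighted mean time of a component's gain curve."""
    return float(np.sum(times * gain) / (np.sum(gain) + EPS))

def crest_factor(gain):
    """Peak value divided by RMS value of the gain curve."""
    rms = np.sqrt(np.mean(gain ** 2))
    return float(np.max(np.abs(gain)) / (rms + EPS))

freqs = np.arange(8) * 100.0           # toy frequency axis in Hz
flat = np.ones(8)                      # noise-like spectrum
peaky = np.zeros(8)
peaky[3] = 1.0                         # single spectral line at 300 Hz
impulse = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # one short hit

print(spectral_flatness(flat), spectral_flatness(peaky))
print(spectral_centroid(peaky, freqs), crest_factor(impulse))
```

Features such as these are computed per NMF component and concatenated into the feature vector that the classifier receives.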

The NMF algorithm

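One short-time spectrum vector x(t) is modelled as a sum of c components, each having a constant spectrum S_i and a time-varying gain A_i(t):

\[
x(t) \approx \sum_{i=1}^{c} S_i A_i(t) \quad \text{or} \quad X \approx SA .
\]

The components are estimated using the update rules

\[
S \leftarrow S \mathbin{.*} \frac{(X \mathbin{./} SA)\, A^{T}}{\mathbf{1} A^{T}}
\quad \text{and} \quad
A \leftarrow A \mathbin{.*} \frac{S^{T} (X \mathbin{./} SA)}{S^{T} \mathbf{1}} ,
\]

where .* and ./ denote element-wise multiplication and division, the fractions are taken element-wise, and 1 is an all-ones matrix of the same size as X. With [S]_{f,c} and [A]_{c,t}, each numerator and denominator has the same shape as the matrix being updated.

This is a suitable representation for drum instruments, because their spectra don't change over time.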
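The multiplicative updates can be sketched in NumPy as below, with the matrix products ordered so that the shapes agree with [S]_{f,c} and [A]_{c,t}. This is an illustrative version of the divergence-based updates from Lee and Seung, not the authors' code: the random initialisation, the iteration count, the small constant `eps`, the function names, and the synthetic test matrix are assumptions.

```python
import numpy as np

def nmf_kl(X, n_components, n_iter=200, eps=1e-9, seed=0):
    """Factorise a non-negative matrix X (freq x time) as S @ A.

    S holds one spectrum per column, A one gain curve per row.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = X.shape
    S = rng.random((n_freq, n_components)) + eps
    A = rng.random((n_components, n_time)) + eps
    ones = np.ones_like(X)
    for _ in range(n_iter):
        ratio = X / (S @ A + eps)
        S *= (ratio @ A.T) / (ones @ A.T + eps)   # update spectra
        ratio = X / (S @ A + eps)
        A *= (S.T @ ratio) / (S.T @ ones + eps)   # update gains
    return S, A

def kl_divergence(X, Y, eps=1e-9):
    """Generalised Kullback-Leibler divergence between X and its model Y."""
    return float(np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y))

# Synthetic rank-3 "spectrogram" standing in for a real excerpt
rng = np.random.default_rng(1)
X_demo = rng.random((20, 3)) @ rng.random((3, 30))

S_early, A_early = nmf_kl(X_demo, n_components=3, n_iter=5)
S_late, A_late = nmf_kl(X_demo, n_components=3, n_iter=200)
print(kl_divergence(X_demo, S_early @ A_early),
      kl_divergence(X_demo, S_late @ A_late))
```

Because every factor in the updates is non-negative, S and A stay non-negative throughout, and the divergence between X and SA does not increase from one iteration to the next.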

The classifier

- One SVM for classes drum/non-drum, 2580 feature vectors
- One SVM for classes BD, SD, HH, 3145 feature vectors
- Implemented in WEKA (www.cs.waikato.ac.nz/ml/weka/)
- Training data: ENST-Drums (perso.enst.fr/~gillet/ENST-drums/) and various drum samples
- Cross-validation results inside training set: 86.28% and 92.94%

Selected References

S. Dixon. Onset detection revisited. In Proc. of the DAFx, pages 133–137, Montreal, Quebec, Canada, Sept. 18–20, 2006.

O. Gillet and G. Richard. ENST-Drums: an extensive audio-visual database for drum signals processing. In Proceedings of the 7th International Conference on Music Information Retrieval, pages 156–159, Victoria, BC, Canada, October 2006.

M. Helen and T. Virtanen. Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine. In Proc. of the 13th EUSIPCO, Antalya, Turkey, September 2005.

D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, pages 556–562, 2000.

J. Paulus and T. Virtanen. Drum transcription with non-negative spectrogram factorisation. In Proc. of the 13th EUSIPCO, Antalya, Turkey, September 2005.

K. Tanghe, S. Degroeve, and B. De Baets. An algorithm for detecting and labeling drum events in polyphonic music. In Proc. of the first MIREX, London, UK, September 11–15, 2005.

C. Uhle, C. Dittmar, and T. Sporer. Extraction of drum tracks from polyphonic music using independent subspace analysis. In Proc. of the 4th ICA, Nara, Japan, April 2003.
