Timbre and Modulation Features for Music Genre/Mood Classification

J.-S. Roger Jang & Jia-Min Ren
Multimedia Information Retrieval Lab, Dept. of CSIE, National Taiwan University



Outline
Audio features and modulation spectral analysis
MIREX 2011 method and its improvement
Experimental setup and results
Conclusions and future work


Introduction – music genres/moods

*pictures from www.playonradio.com, brainpickings.org & mpac.ee.ntu.edu.tw

Descriptions of music contents


Motivation
Rapid growth of digital music: Apple iTunes offers 28 million songs and 7digital 20 million tracks.
Organizing such large collections of audio music is important but challenging, and manual labeling with tags is labor-intensive and time-consuming.
Thus, machine learning for automatic classification is called for.

Classification framework (figure):
Training: music clips for training → feature extraction → classifier training
Test: music clip for test → feature extraction → classifier → evaluation result
Features: short-term (MFCC, OSC); long-term (beat, tempo, pitch)
Classifiers: KNN, GMM, SVM


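System overview (figure)
Feature extraction consists of (1) frame-based timbre feature extraction and summarization and (2) long-term modulation-based feature extraction; the two resulting feature vectors are concatenated.
Training stage: music clips for training → feature extraction → concatenation → SVM training
Test stage: music clip for testing → feature extraction → concatenation → classification with the trained SVMs → result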
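The training/test flow above can be sketched as follows. This is a minimal illustration, not the authors' code: `clip_vector` and `standardize` are hypothetical helpers, and z-score normalization with training-set statistics is an assumed preprocessing step that the slide does not mention. The resulting vectors would then be passed to the SVMs.

```python
import numpy as np

def clip_vector(timbre_vec, modulation_vec):
    """Concatenate one clip's summarized timbre features and its modulation features."""
    return np.concatenate([timbre_vec, modulation_vec])

def standardize(train, test):
    """Z-score both sets using training-set statistics only (assumed preprocessing)."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)      # avoid division by zero for constant dimensions
    return (train - mu) / sd, (test - mu) / sd

# Example: 116 timbre dims + 296 modulation dims (112 + 92 + 92) = 412 dims per clip.
v = clip_vector(np.zeros(116), np.ones(296))
print(v.shape)  # (412,)
```

Using only training-set statistics keeps the test clips from influencing the normalization.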


Performance evaluation
The evaluation protocol depends on the dataset:
GTZAN: 10-fold cross-validation
ISMIR2004Genre: holdout test, identical to the split used in the ISMIR 2004 Genre Classification Contest (729 clips for training and 729 clips for testing)


Audio features – short-term timbre features
Statistical spectrum descriptors (SSD): spectral centroid (SC), spectral flux (SF), spectral rolloff (SR), spectral skewness (SS), and spectral kurtosis (SK)
MFCC: models the subjective (perceptual) frequency content of audio signals; 21 dimensions (including energy)
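A minimal NumPy sketch of the five SSD values for one frame, assuming common textbook definitions that the slide does not spell out: the magnitude spectrum is treated as a distribution over frequency, spectral flux is the squared difference from the previous frame, and rolloff uses an 85% energy threshold. `ssd_features` and `rolloff_pct` are hypothetical names.

```python
import numpy as np

def ssd_features(mag, prev_mag, sr, rolloff_pct=0.85):
    """Return (SC, SF, SR, SS, SK) for one frame.

    mag, prev_mag: magnitude spectra (n_fft // 2 + 1 bins) of the current and previous frame.
    rolloff_pct: assumed energy threshold for the rolloff frequency.
    """
    freqs = np.linspace(0.0, sr / 2.0, len(mag))
    p = mag / (mag.sum() + 1e-12)                  # spectrum as a distribution over frequency
    centroid = float(np.sum(freqs * p))            # SC: first moment
    dev = freqs - centroid
    spread = np.sqrt(np.sum(dev ** 2 * p)) + 1e-12
    skewness = float(np.sum(dev ** 3 * p) / spread ** 3)   # SS: third standardized moment
    kurtosis = float(np.sum(dev ** 4 * p) / spread ** 4)   # SK: fourth standardized moment
    flux = float(np.sum((mag - prev_mag) ** 2))    # SF: squared change from previous frame (assumed form)
    energy = np.cumsum(mag ** 2)
    idx = int(np.searchsorted(energy, rolloff_pct * energy[-1]))
    rolloff = float(freqs[idx])                    # SR: frequency below which rolloff_pct of energy lies
    return centroid, flux, rolloff, skewness, kurtosis
```

For a spectrum with a single nonzero bin, the centroid and the rolloff both fall on that bin's frequency.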


Audio features – short-term timbre features
Spectral contrast & valley (SCV)
Measures spectral contrast and valley in octave-based subbands.
Processing: audio frame → FFT → 8 octave-based frequency subbands (Hz): 1: [0,100), 2: [100,200), 3: [200,400), 4: [400,800), 5: [800,1600), 6: [1600,3200), 7: [3200,6400), 8: [6400,11025]
For each subband, the peak (valley) is computed by averaging the largest (smallest) α = 20% of the spectral values in that subband.
Peak: harmonic components; valley: non-harmonic/noise components.
Contrast = peak - valley, which reflects the relative distribution of harmonic and non-harmonic components.
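The per-frame SCV computation can be sketched as follows, using the eight octave subbands and α = 20%. This assumes a 22,050 Hz sampling rate (so the top band ends at 11,025 Hz) and a spectrum fine enough that every subband contains at least one bin; whether the authors average magnitude or log-magnitude values is not stated, so the function simply works on whatever values it is given. `scv` is a hypothetical name.

```python
import numpy as np

# Octave-based subband edges in Hz, as listed on the slide (sr = 22050 Hz assumed).
EDGES = [0, 100, 200, 400, 800, 1600, 3200, 6400, 11025]

def scv(spec, sr=22050, alpha=0.2):
    """Return (contrast, valley), each of length 8, for one frame's spectrum."""
    freqs = np.linspace(0.0, sr / 2.0, len(spec))
    contrast, valley = [], []
    for a, (lo, hi) in enumerate(zip(EDGES[:-1], EDGES[1:])):
        last = a == len(EDGES) - 2                    # top band is closed: [6400, 11025]
        mask = (freqs >= lo) & ((freqs <= hi) if last else (freqs < hi))
        band = np.sort(spec[mask])
        k = max(1, int(round(alpha * len(band))))     # number of values to average
        peak = band[-k:].mean()                       # mean of the largest alpha fraction
        vall = band[:k].mean()                        # mean of the smallest alpha fraction
        contrast.append(peak - vall)
        valley.append(vall)
    return np.array(contrast), np.array(valley)
```

The 8 contrasts plus 8 valleys give the 16 SCV values per frame used in the dimension count.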


Audio features – short-term timbre features
Spectral flatness measure (SFM): measures the noisiness of the spectrum within a subband.
SFM ≈ 1: a similar amount of power is distributed over all spectral bins of the subband; SFM ≈ 0: spectral power is concentrated in a relatively small number of bins.
Spectral crest measure (SCM)

$$\mathrm{SFM}_a = \frac{\left(\prod_{i=1}^{N_a} B_{a,i}\right)^{1/N_a}}{\frac{1}{N_a}\sum_{i=1}^{N_a} B_{a,i}}, \qquad \mathrm{SCM}_a = \frac{\max_{i=1,\dots,N_a} B_{a,i}}{\frac{1}{N_a}\sum_{i=1}^{N_a} B_{a,i}}$$

where $B_{a,i}$ is the i-th magnitude spectrum value in the a-th subband and $N_a$ is the number of spectral values in the a-th subband.
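Both formulas translate directly into code. This sketch computes the geometric mean in the log domain to avoid overflow or underflow in the product, and adds a small epsilon (an assumption, not from the slide) so that zero-valued bins do not produce log(0). `sfm_scm` is a hypothetical name.

```python
import numpy as np

def sfm_scm(band, eps=1e-12):
    """Return (SFM_a, SCM_a) for the magnitude values B_{a,1..N_a} of one subband."""
    b = np.asarray(band, dtype=float) + eps         # eps (assumed) avoids log(0) and 0/0
    arith = b.mean()                                # (1/N_a) * sum_i B_{a,i}
    geo = np.exp(np.mean(np.log(b)))                # (prod_i B_{a,i})^(1/N_a), via logs
    return geo / arith, b.max() / arith
```

A flat subband gives SFM = SCM = 1, while a subband with a single strong peak gives SFM near 0 and SCM equal to N_a.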


Audio features – short-term timbre features
For each dimension of the frame-based features, we compute its mean and standard deviation. The total dimension of the short-term timbre features is
2 × (5 + 21 + 16 + 16) = 116,
where 5, 21, 16, and 16 are the dimensions of SSD, MFCC, SCV, and SFM/SCM, respectively (SCV and SFM/SCM are computed in octave-based subbands).
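The summarization step amounts to stacking per-dimension means and standard deviations. In this sketch the frame matrix is random placeholder data and 1,292 frames is an arbitrary clip length; only the resulting dimension (116) comes from the slide. `mustd` is a hypothetical name.

```python
import numpy as np

def mustd(frame_feats):
    """Summarize an (n_frames, n_dims) matrix as [per-dim means, per-dim stds]."""
    return np.concatenate([frame_feats.mean(axis=0), frame_feats.std(axis=0)])

# Frame-level dimensions from the slide: SSD, MFCC, SCV, SFM/SCM.
dims = {"SSD": 5, "MFCC": 21, "SCV": 16, "SFM/SCM": 16}
# Placeholder data: random values stand in for real frame features.
frames = np.random.default_rng(0).normal(size=(1292, sum(dims.values())))
vec = mustd(frames)
print(vec.shape)  # (116,)
```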


Modulation spectral analysis
MFCC, SC, and SFM/SCM capture only short-time spectral properties of audio signals.
Modulation spectral analysis captures long-term spectral dynamics within audio signals: it computes a spectrogram, then creates a modulation spectrogram by applying the FFT again along the time axis of the spectrogram.
Low/high modulation frequencies correspond to slow/fast spectral changes.
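A minimal sketch of the second FFT: given a sequence of frame-level vectors (spectrogram columns, or features such as MFCC), take the magnitude FFT of each dimension along the time axis. With 256 frames this yields 129 modulation-frequency bins, matching the 129 bins in the flowchart. `modulation_spectrogram` is a hypothetical helper.

```python
import numpy as np

def modulation_spectrogram(feature_frames, n_mod=256):
    """feature_frames: (n_frames, n_dims) frame-level values.

    Returns |FFT| along the time axis, shape (n_mod // 2 + 1, n_dims):
    row k is modulation frequency k * frame_rate / n_mod.
    """
    return np.abs(np.fft.rfft(feature_frames, n=n_mod, axis=0))

# A dimension oscillating 8 times per 256 frames peaks at modulation bin 8;
# a constant dimension has all its energy at bin 0 (no spectral change).
t = np.arange(256)
x = np.stack([np.cos(2 * np.pi * 8 * t / 256), np.ones(256)], axis=1)
M = modulation_spectrogram(x)
```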


Modulation spectral analysis of timbre features
Flowchart (shown for OSC; the same process is applied to MFCC and SFM/SCM):
1. OSC extraction from the music clip (hop size 23 ms), giving a 16-dim feature vector per frame.
2. Segmentation into texture windows of 256 frames (≈ 6 s).
3. FFT of each feature dimension (along the time axis) within each texture window, giving 129 modulation-frequency bins.
4. Averaging of the modulation spectra over all texture windows.
5. Modulation frequency decomposition into 7 modulation subbands (Hz): [0,0.33), [0.33,0.66), [0.66,1.32), [1.32,2.64), [2.64,5.28), [5.28,10.56), [10.56,21.03).
6. Computation of the modulation spectral peak/valley (MSP/MSV) in each subband, giving 16 × 7 matrices; the modulation spectral contrast is MSC = MSP - MSV.
7. Mean/std computation of the MSC and MSV matrices along the feature and subband dimensions, giving a 92-dim feature vector (= 16*2 + 7*2 + 16*2 + 7*2).
MSP/MSV reflect the strength of rhythm in music.
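The seven steps can be sketched end to end in NumPy for the 16-dim OSC features (step 1 is assumed already done). Several details are assumptions rather than facts from the slides: texture windows do not overlap, MSP/MSV are taken as the maximum/minimum of the averaged modulation spectrum within each modulation subband, modulation-bin frequencies follow from the 23 ms hop, and bins above 21.03 Hz are ignored. `msc_msv_features` is a hypothetical name.

```python
import numpy as np

HOP_S = 0.023   # hop size from the slide (23 ms)
WIN = 256       # texture window length in frames (about 6 s)
MOD_EDGES = [0, 0.33, 0.66, 1.32, 2.64, 5.28, 10.56, 21.03]   # modulation subbands (Hz)

def msc_msv_features(feats):
    """feats: (n_frames, 16) frame-level OSC features with n_frames >= WIN.
    Returns the 92-dim vector of mean/std statistics of the MSC and MSV matrices."""
    n_win = feats.shape[0] // WIN                          # step 2 (non-overlapping: assumption)
    windows = feats[: n_win * WIN].reshape(n_win, WIN, -1)
    mod = np.abs(np.fft.rfft(windows, axis=1))             # step 3: (n_win, 129, 16)
    mod = mod.mean(axis=0)                                 # step 4: average over windows
    mod_freqs = np.fft.rfftfreq(WIN, d=HOP_S)              # Hz of each of the 129 bins
    msp, msv = [], []
    for lo, hi in zip(MOD_EDGES[:-1], MOD_EDGES[1:]):      # step 5: 7 modulation subbands
        band = mod[(mod_freqs >= lo) & (mod_freqs < hi)]
        msp.append(band.max(axis=0))                       # step 6: peak assumed = max
        msv.append(band.min(axis=0))                       #         valley assumed = min
    msp, msv = np.array(msp).T, np.array(msv).T            # each (16, 7)
    msc = msp - msv                                        # MSC = MSP - MSV
    stats = []
    for m in (msc, msv):                                   # step 7: mean/std along both axes
        stats += [m.mean(axis=1), m.std(axis=1), m.mean(axis=0), m.std(axis=0)]
    return np.concatenate(stats)                           # 2 * (16*2 + 7*2) = 92
```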


Modulation spectral analysis of timbre features
Reference:
C.-H. Lee, J.-L. Shih, K.-M. Yu, and H.-S. Lin, “Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features,” IEEE Trans. Multimedia, vol. 11, no. 4, pp. 670-682, June 2009.


Proposed joint acoustic frequency and modulation frequency features
Motivation: averaging and mean/std computation smooth out MD information.
Computation of the proposed joint frequency features:
Compute the modulation spectrogram (FFT along the time axis) from the entire music clip.
Compute SCV (spectral contrast/valley) and SFM/SCM (spectral flatness/crest measure) within each joint acoustic-modulation (AM) frequency subband, giving AMSCV and AMSFM/AMSCM.
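The joint AM features can be sketched by tiling the whole-clip modulation spectrogram into an 8 × 7 grid of acoustic and modulation subbands and applying the SFM/SCM formulas in each cell, which gives 8*7*2 = 112 values. The sampling rate (22,050 Hz), the 23 ms hop, and the epsilon are assumptions, and `am_sfm_scm` is a hypothetical name; AMSCV would follow the same tiling with contrast/valley in place of flatness/crest.

```python
import numpy as np

AC_EDGES = [0, 100, 200, 400, 800, 1600, 3200, 6400, 11025]    # acoustic subbands (Hz)
MOD_EDGES = [0, 0.33, 0.66, 1.32, 2.64, 5.28, 10.56, 21.03]     # modulation subbands (Hz)

def am_sfm_scm(spectrogram, sr=22050, hop_s=0.023):
    """spectrogram: (n_frames, n_freq_bins) magnitude spectrogram of the whole clip.
    Returns 8*7*2 = 112 values: SFM of every joint AM subband, then SCM of every subband."""
    n_frames, n_bins = spectrogram.shape
    mod = np.abs(np.fft.rfft(spectrogram, axis=0))       # modulation spectrogram of whole clip
    mod_f = np.fft.rfftfreq(n_frames, d=hop_s)           # modulation frequency of each row
    ac_f = np.linspace(0.0, sr / 2.0, n_bins)            # acoustic frequency of each column
    eps = 1e-12                                          # assumed guard against log(0)
    sfm, scm = [], []
    for a_lo, a_hi in zip(AC_EDGES[:-1], AC_EDGES[1:]):
        top = a_hi == AC_EDGES[-1]                       # top acoustic band is closed
        a_mask = (ac_f >= a_lo) & ((ac_f <= a_hi) if top else (ac_f < a_hi))
        for m_lo, m_hi in zip(MOD_EDGES[:-1], MOD_EDGES[1:]):
            m_mask = (mod_f >= m_lo) & (mod_f < m_hi)
            cell = mod[np.ix_(m_mask, a_mask)].ravel() + eps
            arith = cell.mean()
            sfm.append(np.exp(np.log(cell).mean()) / arith)   # geometric / arithmetic mean
            scm.append(cell.max() / arith)                    # max / arithmetic mean
    return np.concatenate([sfm, scm])
```

Unlike the MIREX 2011 features, no texture-window averaging or mean/std summarization is applied here, which is the point of the proposed method.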


Audio features used in our study
All candidate audio features:
MuStd: extract SSD, MFCC, SCV, and SFM/SCM from audio frames, then compute mean/std; dim = 2*(5+21+16+16) = 116
Modulation spectral analysis on MFCC, OSC, and SFM/SCM:
MMFCC: dim = 2*(21*2+7*2) = 112
MSCV: dim = 2*(16*2+7*2) = 92
MSFM/MSCM: dim = 2*(16*2+7*2) = 92
SCV and SFM/SCM computed within acoustic-modulation (AM) frequency subbands:
AMSCV: dim = 8*7*2 = 112
AMSFM/AMSCM: dim = 8*7*2 = 112


Audio feature sets and classifier
Audio feature sets:
MIREX 2011 method: MuStd + MMFCC + MSCV + MSFM/MSCM, dim = 116 + 112 + 92 + 92 = 412
Improved method: MuStd + MMFCC + AMSCV + AMSFM/AMSCM, dim = 116 + 112 + 112 + 112 = 452
Classifier: SVMs with an RBF kernel; hyper-parameters are tuned by three-fold inner cross-validation on the training data.
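The dimension bookkeeping for the two feature sets can be checked with a few lines of arithmetic; the numbers are copied from the slides, and the dictionary and helper names are illustrative only.

```python
# Feature-set dimensions as listed on the slides.
DIMS = {
    "MuStd": 2 * (5 + 21 + 16 + 16),       # 116
    "MMFCC": 2 * (21 * 2 + 7 * 2),         # 112
    "MSCV": 2 * (16 * 2 + 7 * 2),          # 92
    "MSFM/MSCM": 2 * (16 * 2 + 7 * 2),     # 92
    "AMSCV": 8 * 7 * 2,                    # 112
    "AMSFM/AMSCM": 8 * 7 * 2,              # 112
}
MIREX2011 = ["MuStd", "MMFCC", "MSCV", "MSFM/MSCM"]
IMPROVED = ["MuStd", "MMFCC", "AMSCV", "AMSFM/AMSCM"]

def total_dim(feature_set):
    """Sum the dimensions of the named feature groups."""
    return sum(DIMS[name] for name in feature_set)

print(total_dim(MIREX2011), total_dim(IMPROVED))  # 412 452
```

The improved method swaps the two 92-dim modulation blocks for the two 112-dim AM blocks, adding 40 dimensions in total.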


Experimental setup and results of MIREX 2011 genre/mood classification tasks
Datasets:
Genre classification: 10 genres, 700 30-sec clips per genre
Mood classification: 5 categories, 120 30-sec clips per category
Evaluation: three-fold cross-validation; metric: classification accuracy
Results (JR1 is ours):
[Figure: accuracy (%) of MIREX 2011 submissions, y-axis from 30 to 70. Genre classification panel: WR1, TCCP4, SSKS2, JR1, SSPK1, WR2, TCCP3, JR2, ES2, ES1, DM1, GDC2, EP2, GKC4. Mood classification panel: JR1, TCCP4, WR1, TCCP3, SSKS2, ES2, SSPK1, WR2, ES1, JR2, DM4, DM1, GDC1, EP2, GDC2, GKC4.]


Experimental results of MIREX 2008-2013 genre/mood classification tasks

Participants         Task (year)    Accuracy (%)   Rank (# of submissions)
Wu and Jang          Genre (2013)   76.23          1 (13)
Wu and Jang          Genre (2012)   76.13          1 (16)
Wu and Ren           Genre (2011)   75.57          1 (15)
Our submission       Genre (2011)   74.23          4 (15)
Seyerlehner et al.   Genre (2010)   73.64          1 (24)
Cao and Li           Genre (2009)   73.33          1 (31)
Tzametalis           Genre (2008)   67.83          1 (13)
Wu and Jang          Mood (2013)    68.33          1 (23)
Panda and Paiva      Mood (2012)    67.83          1 (20)
Our submission       Mood (2011)    69.50          1 (17)
Wang et al.          Mood (2010)    64.17          1 (36)
Cao and Li           Mood (2009)    65.67          1 (33)
Peeters              Mood (2008)    63.67          1 (13)


Extended experiments
Four datasets:

Dataset       Category   # of classes   Min/max # of clips per class   Total # of clips   Duration of each clip
GTZAN         Genre      10             100/100                        1,000              30 s
Unique        Genre      14             26/766                         3,115              ~30 s
Soundtracks   Mood       6              30/30                          180                18 s to 30 s
MIR-Mood      Mood       4              464/619                        2,223              ~30 s or ~60 s

Performance evaluation: randomly stratified 10-fold cross-validation, repeated 10 times; the reported result is the average over the 10 repetitions.
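Randomly stratified 10-fold cross-validation repeated 10 times can be sketched with the standard library alone. Each class is shuffled and dealt round-robin over the folds, so every fold keeps roughly the class proportions. `evaluate` is a hypothetical callback standing in for SVM training and testing on one split, and this sketch reports the average over all folds of all repetitions.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for idx in by_class.values():
        rng.shuffle(idx)                          # random order within each class
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)     # deal clips round-robin over the folds
        offset += len(idx)                        # continue where the last class stopped
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

def repeated_cv_score(labels, evaluate, k=10, repeats=10):
    """Mean of evaluate(train, test) over all folds of `repeats` independently shuffled runs."""
    scores = [evaluate(train, test)
              for r in range(repeats)
              for train, test in stratified_kfold(labels, k, seed=r)]
    return sum(scores) / len(scores)
```

With GTZAN's 10 classes of 100 clips each, every test fold receives exactly 10 clips of each class. The round-robin offset also keeps fold sizes balanced for imbalanced sets such as Unique.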


Extended experiments
Averaged classification accuracy (%) obtained by combining different feature sets on the four datasets


Extended experiments
Comparison of our methods with other recent work


Conclusions
Timbre & modulation features: won 1st place in the MIREX 2011 mood classification task.
Timbre & improved modulation features: compared with the MIREX 2011 method, accuracy improves by 2.47%/2.08% on GTZAN/Unique and by 2.50%/0.14% on Soundtracks/MIR-Mood.


Thank you for listening. Questions and comments are welcome!
