
Music Genre Classification
Derek Huang, Eli Pugh, Arianna Serafini

Stanford University

Data

We used the GTZAN genre collection dataset, which features 1000 samples of raw 30 s audio. However, since this raw audio was sampled at 22050 Hz, we could reasonably use 2 seconds of data at most to keep our feature space relatively small (44100 features). To compromise, we augmented our data by randomly sampling four 2-second windows to produce 8000 samples. While this dataset has its flaws, its widespread use makes it easy to compare our work across the field.
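As a rough illustration of the windowing described above, here is a minimal sketch; the file path, helper name, and the use of librosa for loading are assumptions, not details given on the poster.

import numpy as np
import librosa

SR = 22050        # GTZAN sampling rate
WIN = 2 * SR      # 2-second window -> 44100 samples

def random_windows(path, n_windows=4, win=WIN, sr=SR):
    """Load one 30 s GTZAN clip and return n_windows random 2-second excerpts."""
    y, _ = librosa.load(path, sr=sr)
    starts = np.random.randint(0, len(y) - win, size=n_windows)
    return np.stack([y[s:s + win] for s in starts])

# Hypothetical usage: one clip becomes four 44100-sample rows.
# windows = random_windows("genres/blues/blues.00000.wav")   # shape (4, 44100)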

Data Processing

- Initially ran our models on our raw audio data (amplitudes), which take the form of 44100-length arrays, but found that preliminary accuracy was lower than hoped for in all models.

- Decided to use mel-spectrograms, which are time vs. mel-scaled frequency graphs. Similar to short-time Fourier transform representations, but frequency bins are scaled non-linearly in order to more closely mirror how the human ear perceives sound.

- We chose 64 mel bins and a window length of 512 samples with an overlap of 50% between windows. We then log-scale the spectrograms, based on previous academic success. We used the Librosa library; a sketch of the feature extraction follows this list.
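A minimal sketch of this feature extraction with Librosa; the parameter names follow the current librosa API, and the exact calls the authors used are an assumption.

import numpy as np
import librosa

def log_mel(window, sr=22050):
    """64 mel bins, 512-sample FFT windows, 50% overlap (hop of 256), then log scaling."""
    S = librosa.feature.melspectrogram(y=window, sr=sr,
                                       n_fft=512, hop_length=256, n_mels=64)
    return librosa.power_to_db(S, ref=np.max)   # log-scaled (dB) mel-spectrogram

# A 2-second window (44100 samples) becomes roughly a 64 x 173 "image".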

Motivation

Genre classification is an important task with many real-world applications. As the quantity of music being released on a daily basis continues to skyrocket, especially on internet platforms such as SoundCloud and Spotify, the need for accurate metadata required for database management and search/storage purposes climbs in proportion. Being able to instantly classify songs in any given playlist or library by genre is an important functionality for any music streaming/purchasing service, and the capacity for statistical analysis that music labeling provides is essentially limitless.

Models

- Support Vector Machine: For the sake of computational efficiency, we first perform PCA on our data to reduce our feature space to 15 dimensions. Then we create an SVM model with an RBF kernel. This model offers us a baseline accuracy with which to compare our more complicated deep-learning models (a sketch of both classical baselines follows this list).

- K-Nearest Neighbors: We first perform PCA to reduce our feature space to 15 dimensions. We use k = 10 and distance weighting. Computation is deferred until prediction time.

- Feed-forward Neural Network: Our standard feed-forward neural network contains six fully-connected layers, each using ReLU activation. We use softmax output with cross-entropy loss, and Adam optimization.

- Convolutional Neural Network: As before, we use Adam optimization and ReLU activation. The structure is as illustrated in the poster figure; a rough sketch follows the equations below.
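A hedged scikit-learn sketch of the two classical baselines described above; any hyperparameter not stated on the poster (e.g. the RBF kernel's C and gamma) is left at library defaults and is an assumption.

from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Both pipelines first project the flattened features down to 15 dimensions.
svm_model = make_pipeline(PCA(n_components=15), SVC(kernel="rbf"))
knn_model = make_pipeline(PCA(n_components=15),
                          KNeighborsClassifier(n_neighbors=10, weights="distance"))

# Hypothetical usage, with X as (n_samples, n_features) and y as genre labels:
# svm_model.fit(X_train, y_train); svm_model.score(X_test, y_test)
# knn_model.fit(X_train, y_train); knn_model.score(X_test, y_test)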

Convolutional layer: $z_{k,l} = \sum_{j=1}^{n} \sum_{i=1}^{m} \theta_{i,j}\, x_{i+ks,\, j+ls}$ (here $s$ is the convolution stride)
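Written out directly, this sum is a valid (un-padded) strided correlation; the NumPy sketch below is illustrative, and the filter size and stride value are assumptions.

import numpy as np

def conv2d_naive(x, theta, s=1):
    """z[k, l] = sum over i, j of theta[i, j] * x[i + k*s, j + l*s]."""
    m, n = theta.shape
    K = (x.shape[0] - m) // s + 1
    L = (x.shape[1] - n) // s + 1
    z = np.zeros((K, L))
    for k in range(K):
        for l in range(L):
            z[k, l] = np.sum(theta * x[k * s:k * s + m, l * s:l * s + n])
    return z

# e.g. a 64 x 173 log-mel "image" with a 3 x 3 filter at stride 1 gives a 62 x 171 map.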

Loss function: $\mathrm{CE} = -\sum_{x \in X} y(x)\, \log \hat{y}(x)$, where $y(x)$ is the true label distribution and $\hat{y}(x)$ the softmax output
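The layer configuration itself appears only in the poster figure, so the following Keras sketch assumes a plausible small architecture; only the ReLU activations, softmax output, cross-entropy loss, and Adam optimizer are taken from the text, and the framework choice is ours.

import tensorflow as tf
from tensorflow.keras import layers

# Input: one log-mel spectrogram per 2-second window, treated as a 1-channel image.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 173, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),   # 10 GTZAN genres
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_cv, y_cv), epochs=20)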

Results

The confusion matrix in the poster figure visualizes results from our CNN.

Model accuracy:

Model                          Train   CV    Test
Support Vector Machine          .97    .60   .60
K-Nearest Neighbors            1.00    .52   .54
Feed-forward Neural Network     .96    .55   .54
Convolutional Neural Network    .95    .84   .82

Discussion

For this project, we used traditional machine learning methods as well as more advanced deep learning methods. While the more complex models took far longer to train, they provided significantly higher accuracy. In real-world applications, however, the cost/benefit of this tradeoff needs to be analyzed more closely.

We also noticed that log-transformed mel-spectrograms provided much better results than raw amplitude data. Whereas amplitude only provides information on intensity, or how "loud" a sound is, the frequency distribution over time provides information on the content of the sound. Additionally, mel-spectrograms are image-like representations, and CNNs are well suited to image data.

Future Work

While we are generally happy with the performance of our models, especially the CNN, there are always more models to test out. Given that this is time-series data, some sort of RNN model may work well (GRU or LSTM, for example). We are also curious about generative aspects of this project, including some sort of genre conversion (in the same vein as generative adversarial networks that repaint photos in the style of Van Gogh, but specifically for music). Additionally, we suspect that we may have opportunities for transfer learning, for example in classifying music by artist or by decade.

References

Mingwen Dong. Convolutional Neural Network Achieves Human-level Accuracy in Music Genre Classification. CoRR, Feb 2018. http://arxiv.org/abs/1802.09697

Bob L. Sturm. The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use. CoRR, Jun 2013. http://arxiv.org/abs/1306.1461

Piotr Kozakowski & Bartosz Michalak. Music Genre Recognition. Oct 2016. http://deepsound.io/music_genre_recognition.html

Fall 2018 CS229 Poster Session. Emails: huangda@stanford.edu, epugh@stanford.edu, aserafini@stanford.edu