
Music recommendation has emerged as an important problem as competition in the digital music market intensifies. Music streaming companies are striving to attract customers by offering recommendations that accurately match their tastes. However, conventional recommendation algorithms are known to have limitations. Recommendation based on collaborative filtering often suffers from popularity bias: only popular items are recommended most of the time, which makes it difficult to discover new music that matches a listener's taste. To address this issue, we propose a content-based method for music recommendation. In addition, we provide visualizations and explanations of the results using various methods.

DEEP CONTENT-BASED MUSIC RECOMMENDATION

Changyoung Koh¹, Taeyoung Ko²

¹ School of Computing, KAIST
² Department of Industrial & Systems Engineering, KAIST
{lucent, tyko}@kaist.ac.kr

We propose a densely connected 1-D convolutional neural network architecture (each convolution layer is connected to every other layer; omitted in the figure) that predicts the latent factors of a song from its mel-spectrogram.
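A minimal sketch of such a network is given below in PyTorch. The hyperparameters (128 mel bands, growth rate 32, four convolution layers, 40-dimensional latent factors) and the global-average-pooling head are illustrative assumptions; only the dense 1-D convolutional structure and the mel-spectrogram input come from the poster.

```python
import torch
import torch.nn as nn

class DenseConv1D(nn.Module):
    """Sketch of a densely connected 1-D CNN mapping a mel-spectrogram
    (n_mels x n_frames) to an item latent-factor vector. Hyperparameter
    values here are assumptions, not the poster's actual configuration."""

    def __init__(self, n_mels=128, growth=32, n_layers=4, latent_dim=40):
        super().__init__()
        self.blocks = nn.ModuleList()
        channels = n_mels
        for _ in range(n_layers):
            self.blocks.append(nn.Sequential(
                nn.Conv1d(channels, growth, kernel_size=3, padding=1),
                nn.BatchNorm1d(growth),
                nn.ReLU(inplace=True),
            ))
            channels += growth  # dense connectivity: next layer sees all earlier feature maps
        self.head = nn.Linear(channels, latent_dim)

    def forward(self, x):                         # x: (batch, n_mels, n_frames)
        feats = [x]
        for block in self.blocks:
            feats.append(block(torch.cat(feats, dim=1)))  # concatenate all previous outputs
        pooled = torch.cat(feats, dim=1).mean(dim=2)      # global average over time
        return self.head(pooled)                          # (batch, latent_dim)

# Example on a batch of 600-frame spectrograms; in training, the output would be
# regressed (e.g. with MSE loss) against the WALS item latent factors.
model = DenseConv1D()
spec = torch.randn(8, 128, 600)
latent = model(spec)   # (8, 40)
```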

• Million Song Dataset
  ○ 20,000 users, (first 30 seconds of) 9,960 songs.
  ○ Train / Test / Validation: 70% / 20% / 10%

• Our Dataset
  ○ 430,000 users, (first 3,000 frames of) 29,959 songs.

• Our proposed network uses only 600 frames of each song. However, adopting sequential architectures such as a CRNN would be more suitable for capturing temporal features.

• Advanced visualization methods such as gradient ascent may help interpret the relation between content and the latent space.

[Figure: WALS factorization of the User-Item Implicit Feedback matrix into User Latent Factors and Item Latent Factors]

In this work, we use each user's total play count of each song as implicit feedback. Using a weighted matrix factorization method (WALS), we decompose the feedback matrix into two sets of latent factors: user and item.
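The sketch below shows one way such a weighted factorization (in the style of Hu, Koren & Volinsky's implicit-feedback ALS) could be implemented. The confidence weight alpha, rank k, and regularization are assumed values, and the dense-matrix loops are written for clarity rather than scale.

```python
import numpy as np

def wals(R, k=40, reg=0.1, alpha=40.0, iters=10, seed=0):
    """Minimal weighted ALS sketch.
    R: user x item play-count matrix (dense NumPy array for brevity).
    Returns user factors U (n_users x k) and item factors V (n_items x k)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.01 * rng.standard_normal((n_users, k))
    V = 0.01 * rng.standard_normal((n_items, k))
    P = (R > 0).astype(float)   # binary preference
    C = 1.0 + alpha * R         # confidence grows with play count
    I = reg * np.eye(k)

    for _ in range(iters):
        # Fix item factors, solve a ridge regression per user.
        for u in range(n_users):
            Cu = np.diag(C[u])
            U[u] = np.linalg.solve(V.T @ Cu @ V + I, V.T @ Cu @ P[u])
        # Fix user factors, solve a ridge regression per item.
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            V[i] = np.linalg.solve(U.T @ Ci @ U + I, U.T @ Ci @ P[:, i])
    return U, V
```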

[t-SNE visualization: clusters labeled Country Artists, DJ Artists, Hip-Hop Artists, and Female Pop Artists]

We projected each artist's average predicted latent factors onto a 2D plane using t-SNE and confirmed that the model captures musical characteristics reasonably well. Some representative clusters are colored.
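A sketch of this projection step, assuming the predicted item factors are available as a NumPy array and each song is tagged with an artist name (item_factors and song_artist are hypothetical variable names):

```python
import numpy as np
from sklearn.manifold import TSNE

def artist_tsne(item_factors, song_artist, perplexity=30, seed=0):
    """Average each artist's predicted song factors, then embed the
    artist-level vectors in 2D with t-SNE for plotting/coloring."""
    artists = sorted(set(song_artist))
    means = np.stack([
        item_factors[[i for i, a in enumerate(song_artist) if a == name]].mean(axis=0)
        for name in artists
    ])
    coords = TSNE(n_components=2, perplexity=perplexity,
                  random_state=seed).fit_transform(means)
    return dict(zip(artists, coords))   # artist -> (x, y)
```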

• Million Song Dataset
  ○ The AUC (area under the receiver operating characteristic curve) score is used to evaluate the prediction ability of the model.
  ○ Using per-user AUC, we compared the Song × User matrix reconstructed from our trained model with the original Song × User implicit feedback matrix.
  ○ We achieved an AUC score of about 70.23 on the test dataset, which is very close to the AUC score of popularity-based recommendation (70.29).
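A sketch of the per-user AUC computation described above, assuming dense NumPy matrices for brevity (feedback and scores are hypothetical names for the binary implicit-feedback matrix and the reconstructed score matrix):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mean_per_user_auc(feedback, scores):
    """feedback: binary Song x User implicit-feedback matrix (1 = listened).
    scores: reconstructed Song x User matrix, e.g. predicted item factors
    dotted with the WALS user factors. Returns the mean per-user AUC."""
    aucs = []
    for u in range(feedback.shape[1]):
        labels = feedback[:, u]
        if labels.min() == labels.max():   # AUC is undefined without both classes
            continue
        aucs.append(roc_auc_score(labels, scores[:, u]))
    return float(np.mean(aucs))
```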

Filters from lower layers. Left: Rock Guitar, Right: Intense Beats

Filters from the last layer. Left: Electronica, Right: HipHop + Buddhist

We examined songs that maximize each filter’s activation and found that lower layers capture simple musical characteristics like tempo and timbre, while higher layers capture complex and multi-modal features such as genre.
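One way to carry out this inspection is to hook a convolution layer, run the spectrograms through the network, and rank songs by each filter's mean activation. The sketch below assumes a PyTorch model and is not necessarily the authors' exact procedure:

```python
import torch

def top_songs_per_filter(model, spectrograms, layer, top_k=5):
    """model: trained network; spectrograms: (n_songs, n_mels, n_frames) tensor;
    layer: the nn.Conv1d module to probe. Returns, for every filter in that
    layer, the indices of the songs with the highest mean activation."""
    activations = []
    hook = layer.register_forward_hook(
        lambda mod, inp, out: activations.append(out.detach()))
    with torch.no_grad():
        model(spectrograms)
    hook.remove()
    per_song = activations[0].mean(dim=2)           # (n_songs, n_filters)
    return per_song.topk(top_k, dim=0).indices.T    # (n_filters, top_k)
```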

Seed song: Every Teardrop Is A Waterfall - Coldplay

Collaborative Filtering Recommendation
1. Speed Of Sound - Coldplay
2. Charlie Brown - Coldplay
3. Clocks - Coldplay
4. A Sky Full Of Stars - Coldplay

Content-based Recommendation
1. Cool (Feat. Roy English) - Alesso
2. Don't Look Down (Feat. Usher) - Martin Garrix
3. Freefall (Feat. BullySongs) - Armin van Buuren
4. Airplanes (Feat. Hayley Williams of Paramore) - B.o.B
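The poster does not spell out how the content-based list is generated; one plausible reading, sketched below, is to rank songs by cosine similarity between their predicted latent factors and the seed song's factors:

```python
import numpy as np

def similar_songs(predicted_factors, seed_idx, top_k=4):
    """predicted_factors: (n_songs, latent_dim) array of factors predicted by
    the CNN; seed_idx: index of the seed song. Both are assumed inputs for
    this illustration. Returns the top_k most similar song indices."""
    norms = predicted_factors / np.linalg.norm(predicted_factors, axis=1, keepdims=True)
    sims = norms @ norms[seed_idx]          # cosine similarity to the seed
    ranked = np.argsort(-sims)
    return [i for i in ranked if i != seed_idx][:top_k]
```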