
Music recommendation has emerged as an important problem as competition in the digital music market intensifies. Music streaming companies are striving to attract customers by offering recommendations that accurately match their tastes. However, conventional recommendation algorithms are known to have limitations. Recommendation based on collaborative filtering often suffers from popularity bias: only popular items are recommended most of the time, which makes it difficult to discover new music that matches a listener's taste. To address this issue, we propose a content-based method for music recommendation. In addition, we provide visualizations and explanations of the results using various methods.

DEEP CONTENT-BASED MUSIC RECOMMENDATION

Changyoung Koh¹, Taeyoung Ko²

¹ School of Computing, KAIST
² Department of Industrial & Systems Engineering, KAIST
{lucent, tyko}@kaist.ac.kr

We propose a densely connected 1-D convolutional neural network architecture (each convolution layer is connected to every other layer; omitted in the figure) that predicts the latent factors of a song from its mel-spectrogram.
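A minimal sketch of such a network is given below in PyTorch. The hyperparameters (128 mel bands, growth rate 32, four convolution layers, 40-dimensional latent factors) and the global-average-pooling head are illustrative assumptions; only the dense 1-D convolutional structure and the mel-spectrogram input come from the poster.

```python
import torch
import torch.nn as nn

class DenseConv1D(nn.Module):
    """Sketch of a densely connected 1-D CNN mapping a mel-spectrogram
    (n_mels x n_frames) to an item latent-factor vector. Hyperparameter
    values here are assumptions, not the poster's actual configuration."""

    def __init__(self, n_mels=128, growth=32, n_layers=4, latent_dim=40):
        super().__init__()
        self.blocks = nn.ModuleList()
        channels = n_mels
        for _ in range(n_layers):
            self.blocks.append(nn.Sequential(
                nn.Conv1d(channels, growth, kernel_size=3, padding=1),
                nn.BatchNorm1d(growth),
                nn.ReLU(inplace=True),
            ))
            channels += growth  # dense connectivity: next layer sees all earlier feature maps
        self.head = nn.Linear(channels, latent_dim)

    def forward(self, x):                         # x: (batch, n_mels, n_frames)
        feats = [x]
        for block in self.blocks:
            feats.append(block(torch.cat(feats, dim=1)))  # concatenate all previous outputs
        pooled = torch.cat(feats, dim=1).mean(dim=2)      # global average over time
        return self.head(pooled)                          # (batch, latent_dim)

# Example on a batch of 600-frame spectrograms; in training, the output would be
# regressed (e.g. with MSE loss) against the WALS item latent factors.
model = DenseConv1D()
spec = torch.randn(8, 128, 600)
latent = model(spec)   # (8, 40)
```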

• Million Song Dataset
  ○ 20,000 users, (first 30 seconds of) 9,960 songs.
  ○ Train / Test / Validation: 70% / 20% / 10%

• Our Dataset
  ○ 430,000 users, (first 3,000 frames of) 29,959 songs.

• Our proposed network uses only 600 frames of each song. However, adopting sequential architectures such as a CRNN would be more suitable for capturing temporal features.

• Advanced visualization methods such as gradient ascent may help interpret the relation between content and the latent space.

[Figure: WALS factorization of the User-Item Implicit Feedback matrix into User Latent Factors and Item Latent Factors]

In this work, we use each user's total play count of each song as implicit feedback. Using a weighted matrix factorization method (WALS), we decompose the feedback matrix into two sets of latent factors: user and item.
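The sketch below shows one way such a weighted factorization (in the style of Hu, Koren & Volinsky's implicit-feedback ALS) could be implemented. The confidence weight alpha, rank k, and regularization are assumed values, and the dense-matrix loops are written for clarity rather than scale.

```python
import numpy as np

def wals(R, k=40, reg=0.1, alpha=40.0, iters=10, seed=0):
    """Minimal weighted ALS sketch.
    R: user x item play-count matrix (dense NumPy array for brevity).
    Returns user factors U (n_users x k) and item factors V (n_items x k)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.01 * rng.standard_normal((n_users, k))
    V = 0.01 * rng.standard_normal((n_items, k))
    P = (R > 0).astype(float)   # binary preference
    C = 1.0 + alpha * R         # confidence grows with play count
    I = reg * np.eye(k)

    for _ in range(iters):
        # Fix item factors, solve a ridge regression per user.
        for u in range(n_users):
            Cu = np.diag(C[u])
            U[u] = np.linalg.solve(V.T @ Cu @ V + I, V.T @ Cu @ P[u])
        # Fix user factors, solve a ridge regression per item.
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            V[i] = np.linalg.solve(U.T @ Ci @ U + I, U.T @ Ci @ P[:, i])
    return U, V
```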

[t-SNE visualization: clusters labeled Country Artists, DJ Artists, Hip-Hop Artists, and Female Pop Artists]

We projected each artist's average predicted latent factors onto a 2D plane using t-SNE and confirmed that the model captures musical characteristics reasonably well. Some representative clusters are colored.
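A sketch of this projection step, assuming the predicted item factors are available as a NumPy array and each song is tagged with an artist name (item_factors and song_artist are hypothetical variable names):

```python
import numpy as np
from sklearn.manifold import TSNE

def artist_tsne(item_factors, song_artist, perplexity=30, seed=0):
    """Average each artist's predicted song factors, then embed the
    artist-level vectors in 2D with t-SNE for plotting/coloring."""
    artists = sorted(set(song_artist))
    means = np.stack([
        item_factors[[i for i, a in enumerate(song_artist) if a == name]].mean(axis=0)
        for name in artists
    ])
    coords = TSNE(n_components=2, perplexity=perplexity,
                  random_state=seed).fit_transform(means)
    return dict(zip(artists, coords))   # artist -> (x, y)
```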

• Million Song Dataset
  ○ The AUC (area under the receiver operating characteristic curve) score is used to evaluate the prediction ability of the model.
  ○ Using per-user AUC, we compared the Song × User matrix reconstructed from our trained model with the original Song × User implicit feedback matrix.
  ○ We achieved an AUC score of about 70.23 on the test dataset, which is very close to the AUC score of popularity-based recommendation (70.29).
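A sketch of the per-user AUC computation described above, assuming dense NumPy matrices for brevity (feedback and scores are hypothetical names for the binary implicit-feedback matrix and the reconstructed score matrix):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mean_per_user_auc(feedback, scores):
    """feedback: binary Song x User implicit-feedback matrix (1 = listened).
    scores: reconstructed Song x User matrix, e.g. predicted item factors
    dotted with the WALS user factors. Returns the mean per-user AUC."""
    aucs = []
    for u in range(feedback.shape[1]):
        labels = feedback[:, u]
        if labels.min() == labels.max():   # AUC is undefined without both classes
            continue
        aucs.append(roc_auc_score(labels, scores[:, u]))
    return float(np.mean(aucs))
```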

Filters from lower layers. Left: Rock Guitar, Right: Intense Beats

Filters from the last layer. Left: Electronica, Right: HipHop + Buddhist

We examined songs that maximize each filter’s activation and found that lower layers capture simple musical characteristics like tempo and timbre, while higher layers capture complex and multi-modal features such as genre.
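One way to carry out this inspection is to hook a convolution layer, run the spectrograms through the network, and rank songs by each filter's mean activation. The sketch below assumes a PyTorch model and is not necessarily the authors' exact procedure:

```python
import torch

def top_songs_per_filter(model, spectrograms, layer, top_k=5):
    """model: trained network; spectrograms: (n_songs, n_mels, n_frames) tensor;
    layer: the nn.Conv1d module to probe. Returns, for every filter in that
    layer, the indices of the songs with the highest mean activation."""
    activations = []
    hook = layer.register_forward_hook(
        lambda mod, inp, out: activations.append(out.detach()))
    with torch.no_grad():
        model(spectrograms)
    hook.remove()
    per_song = activations[0].mean(dim=2)           # (n_songs, n_filters)
    return per_song.topk(top_k, dim=0).indices.T    # (n_filters, top_k)
```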

Seed song: Every Teardrop Is A Waterfall - Coldplay

Collaborative Filtering Recommendation
1. Speed Of Sound - Coldplay
2. Charlie Brown - Coldplay
3. Clocks - Coldplay
4. A Sky Full Of Stars - Coldplay

Content-based Recommendation
1. Cool (Feat. Roy English) - Alesso
2. Don't Look Down (Feat. Usher) - Martin Garrix
3. Freefall (Feat. BullySongs) - Armin van Buuren
4. Airplanes (Feat. Hayley Williams of Paramore) - B.o.B
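The poster does not spell out how the content-based list is generated; one plausible reading, sketched below, is to rank songs by cosine similarity between their predicted latent factors and the seed song's factors:

```python
import numpy as np

def similar_songs(predicted_factors, seed_idx, top_k=4):
    """predicted_factors: (n_songs, latent_dim) array of factors predicted by
    the CNN; seed_idx: index of the seed song. Both are assumed inputs for
    this illustration. Returns the top_k most similar song indices."""
    norms = predicted_factors / np.linalg.norm(predicted_factors, axis=1, keepdims=True)
    sims = norms @ norms[seed_idx]          # cosine similarity to the seed
    ranked = np.argsort(-sims)
    return [i for i in ranked if i != seed_idx][:top_k]
```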