The Data Mining
IIIA - CSIC
Mining Music Social Networks
for Automating Social Music Services
Claudio Baccigalupo – Enric Plaza
IIIA-CSIC – September 2007
The Goal
To automatically program the music for the channels of a Web radio, with a selection process that emulates the knowledge of an expert (DJ)
This requires domain knowledge about musical associations (which songs and artists should be played one after the other?)
We show how we obtain this knowledge by mining a large collection of playlists gathered from the Web
1. The Data Set: gathering playlists from Web users
2. The Data Mining: extracting musical knowledge from playlists
3. The Evaluation: comparing with other similar measures
4. The Application: programming a social Web radio
5. Conclusions
The Data Set: why playlists?
Playlists are sequences of songs compiled by humans for some purpose, with cultural and social aspects that cannot be found with other sources of musical knowledge (e.g., acoustic-based)
Playlists are part of the user-created content that is increasingly available thanks to the social Web phenomenon
Playlists are easy to gather, analyse, store, and understand
Playlists have a sequential nature, and the ordering of songs is a relevant feature since our goal is to programme a radio channel
The Data Set: which playlists?
We have collected 599,565 user-compiled playlists from the Web-based music community MyStrands (http://www.mystrands.com)
Playlists are published either using a Web browser or using the MyStrands plug-in
Playlists can be obtained with the Web API called OpenStrands
Playlists have an average length of 16.8 songs
Users are 65% male and 32 years old on average
MyStrands includes more than 5M songs
The Data Mining: what to look for?
While a song X is playing on a radio channel, we wish to know which songs are musically associated with X and are good candidates to be played after X on the channel
We mine the playlists to learn the song association for any pair of songs and the artist association for any pair of artists
Diagram: Data Mining Process
Input: a pair of songs (X, Y) by artists (A, B), that is, Song X (Artist A) and Song Y (Artist B)
Output: s(X, Y) ∈ [0, 1] and s′(A, B) ∈ [0, 1]
Example: I Spy (Pulp), Trash (Suede): s(X, Y) = 0.9, s′(A, B) = 0.7
Example: I Spy (Pulp), T.N.T. (AC/DC): s(X, Y) = 0.3, s′(A, B) = 0.2
The Data Mining: what to consider?
We count the co-occurrences of pairs of songs in the playlists
We normalise against the popularity of the songs in the playlists
We assign stronger associations when the distance between songs is small and when the ordering is preserved
Example (co-occurrence): I Spy (Pulp) and Trash (Suede) occur together in 4 playlists
Example (popularity): I Spy (Pulp) and Basket Case (Green Day) also co-occur 4 times, but this value is not as relevant, since Basket Case (Green Day) occurs in 14,897 playlists, 219 times more than Trash (Suede)
Example (distance and ordering), Playlist #1: Song 2 (Blur), Wonderwall (Oasis), I Spy (Pulp), Trash (Suede), Uno (Muse): contiguous post-occurrence between songs, strong association
Example (distance and ordering), Playlist #2: Basket Case (Green Day), Trouble (Coldplay), Vertigo (U2), I Spy (Pulp): distant pre-occurrence between songs, weak association
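The three rules above (count co-occurrences, normalise by popularity, favour short distances and preserved ordering) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formula used in the talk: the function name `song_associations`, the `1 / distance` weighting, the `max_distance` window, the `reverse_penalty` factor and the square-root normalisation are all hypothetical choices.

```python
from collections import Counter, defaultdict


def song_associations(playlists, max_distance=3, reverse_penalty=0.5):
    """Return {(x, y): score in [0, 1]} estimating how well y follows x.

    All weights below are illustrative assumptions, not the talk's values.
    """
    popularity = Counter()         # number of playlists containing each song
    weighted = defaultdict(float)  # distance- and order-weighted co-occurrences

    for playlist in playlists:
        popularity.update(set(playlist))
        for i, x in enumerate(playlist):
            for j, y in enumerate(playlist):
                distance = abs(j - i)
                if x == y or distance == 0 or distance > max_distance:
                    continue
                weight = 1.0 / distance   # closer songs count more
                if j < i:                 # y was played before x
                    weight *= reverse_penalty
                weighted[(x, y)] += weight

    # Normalise by the popularity of both songs, then rescale into [0, 1].
    raw = {
        (x, y): w / (popularity[x] * popularity[y]) ** 0.5
        for (x, y), w in weighted.items()
    }
    top = max(raw.values(), default=1.0)
    return {pair: value / top for pair, value in raw.items()}
```

Because the weight depends on which song comes first, the resulting measure is asymmetric: the score for (X, Y) generally differs from the score for (Y, X).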
The Data Mining: song associations
We filter out statistically insignificant associations, and co-occurrences between songs from the same artist
From the MyStrands playlists we obtain a set of 112,238 songs that have a song association degree with at least one other song
Top associated tracks for Strangers In The Night (Frank Sinatra): Up, Up and Away (The 5th Dimension); Message To Michael (Dionne Warwick); Whatever happens, I Love You (Morrissey); Sugar Baby Love (Rubettes); Move It On Over (Ray Charles); It Serves You Right To Suffer (John Lee Hooker); Blue Angel (Roy Orbison)
Top associated tracks for Smoke On The Water (Deep Purple): Space Truckin’ (AA.VV.); Cold Metal (Iggy Pop); Iron Man (Black Sabbath); China Grove (The Doobie Brothers); Crossroads (Eric Clapton); Sunshine Of Your Love (Cream); Wild Thing (Jimi Hendrix)
The Data Mining: artist associations
With the same technique, we estimate the artist association degree for 25,881 artists from the playlists of MyStrands
We count the co-occurrences of pairs of artists in the playlists, normalise by their popularity and consider their distances
Top associated artists for Abba: Agnetha Faltskog, A-Teens, Chic, Gloria Gaynor, The 5th Dimension, Andy Gibb, Olivia Newton-John
Top associated artists for Frank Sinatra: Dean Martin, Sammy Davis Jr., Judy Garland, Bing Crosby, The California Raisins, Tony Bennett, Louis Prima
Top associated artists for Destiny’s Child: Kelly Rowland, City High, Ciara, Fantasia, Christina Milian, Beyoncé, Ashanti
Top associated artists for John Williams: Meco, Danny Elfman, John Carpenter, London Theatre Orchestra, John Barry, Hollywood Studio Orchestra, Elmer Bernstein
The Evaluation: preamble
We compare the top associated tracks and artists we found with the most similar tracks and artists proposed by other Web sites
The results are expected to differ, since we do not look for similarity (a symmetric measure) but aim to build a good sequence of songs (an asymmetric measure, where the ordering matters)
Still, some observations can be made
MusicSeer
The Evaluation: song association
We assign the highest rankings to less popular songs
If one of these songs is in the radio library, it will be played, so listeners will probably discover new music
Otherwise, a less associated but more popular song will be played
Top associated songs for: Strangers In The Night (Frank Sinatra)
Up, Up and Away (The 5th Dimension); Message To Michael (Dionne Warwick); Whatever happens, I Love You (Morrissey); Sugar Baby Love (Rubettes); Move It On Over (Ray Charles); It Serves You Right To Suffer (John Lee Hooker); Blue Angel (Roy Orbison)
Yahoo!: Mr. Tambourine Man (The Byrds); Don’t You Want Me (Human League); I’m a Believer (The Monkees); Good Vibrations (The Beach Boys); Stay (Shakespeare’s Sister); The House of The Rising Sun (The Animals); Oh Pretty Woman (Roy Orbison)
The Evaluation: artist association
Some high-ranked associations are common, although inferred with different methods (human experts, playlists, listening habits)
We are able to rank in first place one of the most associated artists
Top associated artists for: Abba
Agnetha Faltskog, A-Teens, Chic, Gloria Gaynor, The 5th Dimension, Andy Gibb, Olivia Newton-John
Donna Summer, Madonna, Gloria Gaynor, Cyndi Lauper, Kool & The Gang
MyStrands
AMG
Blondie
Ace of Base, Gemini, Maywood, Bananarama, Lisa Stansfield, Gary Wright, Roxette
Yahoo! The Bee Gees, The Carpenters, The Beatles, Foreigner, Whitney Houston, Madonna
Last.fm Madonna, Cher, Kylie Minogue, Boney M., Michael Jackson, Elton John, The Bee Gees
MusicSeer Playlists Blondie, Cyndi Lauper, Queen, Cat Stevens, Cher, The Bee Gees, The Beach Boys
The Application: what is Poolcasting?
The collection of songs (Music Pool) is open and dynamic
The music played on each channel cannot be pre-programmed: every channel is automatically scheduled in real time
The Application: song scheduling
Diagram: Last song played X → Retrieval (from the Music Pool, using the Song and Artist Associations) → Subset of candidates musically associated with X
The Application: retrieval process
The best candidates are songs either associated with X, or associated with songs by A, or associated with songs from artists associated with A, or whose artist is associated with A
Diagram: Last song X (A), here I Spy (Pulp) → Retrieval (from the Music Pool, using the Song and Artist Associations s(X, Y) and s′(A, B)) → Candidates
Music Pool: Trash (Suede), Go (Moby), Uno (Muse), Drive (R.E.M.), Pilgrim (Enya), Nikita (Elton John), T.N.T. (AC/DC), Noon (Eric Serra), Roxanne (Sting), Cody (Mogwai)
Candidates: Trash (Suede), Go (Moby), Uno (Muse), Drive (R.E.M.)
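The four retrieval criteria can be read as a filter over the Music Pool. The sketch below is a minimal illustration, assuming the associations are stored as dictionaries keyed by ordered pairs and that any positive degree counts as "associated". The function name `retrieve_candidates` and the parameters `artist_of`, `song_assoc` and `artist_assoc` are hypothetical and do not come from Poolcasting itself.

```python
def retrieve_candidates(last_song, music_pool, artist_of, song_assoc, artist_assoc):
    """Return the pool songs that satisfy any of the four criteria, in pool order.

    artist_of:    {song: artist} for every known song (pool songs included)
    song_assoc:   {(x, y): s(x, y)}   assumed storage format
    artist_assoc: {(a, b): s'(a, b)}  assumed storage format
    """
    last_artist = artist_of[last_song]

    # Artists associated with A.
    related_artists = {
        b for (a, b), degree in artist_assoc.items()
        if a == last_artist and degree > 0
    }

    # Seed songs: X itself, songs by A, and songs by artists associated with A.
    seed_artists = related_artists | {last_artist}
    seeds = {song for song, artist in artist_of.items() if artist in seed_artists}
    seeds.add(last_song)

    # Songs associated with any seed song (criteria 1 to 3).
    followers = {
        y for (x, y), degree in song_assoc.items()
        if x in seeds and degree > 0
    }

    # Criterion 4: the candidate's own artist is associated with A.
    return [
        song for song in music_pool
        if song != last_song
        and (song in followers or artist_of[song] in related_artists)
    ]
```

With I Spy (Pulp) as the last song, made-up association values can be chosen so that the filter returns Trash, Go, Uno and Drive from the pool shown in the diagram, one song per criterion.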
The Application: reuse process
The best candidates are then ranked according to the music preferences of the current listeners, and the best song is played
Listeners’ preferences are inferred by analysing their music libraries
Diagram: Last song X (A), here I Spy (Pulp) → Retrieval (from the Music Pool, using the Song and Artist Associations s(X, Y) and s′(A, B)) → Candidates → Ranking (using Listeners’ Preferences and Feedback) → the best ranked candidate is played next
Candidates: Trash (Suede), Go (Moby), Uno (Muse), Drive (R.E.M.)
The Application: more details
The higher the rating and the higher the play count of a song in a user’s library (iTunes), the higher the inferred listener preference
Listeners can interact via the Web interface to state their explicit preferences for the songs played or to rate the next candidates
When listeners have diverging preferences in the same channel, fairness is achieved by favouring at each moment those listeners who were less satisfied by the last songs played
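One way to picture preference inference and fairness is the following sketch. It is an assumption-laden illustration rather than the Poolcasting implementation: the equal weighting of rating and play count, the 0 to 5 rating scale, the neutral default of 0.5 for unknown songs, and the names `inferred_preference` and `pick_next_song` are all hypothetical.

```python
def inferred_preference(rating, play_count, max_play_count):
    """Map a 0-5 rating and a play count to a preference in [0, 1].

    Equal weighting of the two signals is an illustrative assumption.
    """
    rating_part = rating / 5.0
    play_part = play_count / max_play_count if max_play_count else 0.0
    return 0.5 * rating_part + 0.5 * play_part


def pick_next_song(candidates, preferences, satisfaction):
    """Pick the candidate with the best fairness-weighted group preference.

    preferences:  {listener: {song: preference in [0, 1]}}
    satisfaction: {listener: satisfaction with the last songs played, in [0, 1]}
    """
    # Listeners who were less satisfied recently get a larger say now.
    weights = {
        listener: 1.0 - satisfaction[listener] + 1e-9
        for listener in preferences
    }
    total = sum(weights.values())

    def group_score(song):
        # Unknown songs get a neutral preference of 0.5 (assumption).
        return sum(
            weights[listener] * preferences[listener].get(song, 0.5)
            for listener in preferences
        ) / total

    return max(candidates, key=group_score)
```

In this sketch the same two listeners can produce different winners depending on who was less satisfied by the previous songs, which is the behaviour the fairness rule describes.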
Conclusions
We use knowledge discovered from a Web-based music community to provide a group-customised Web service
Domain knowledge about which songs and artists are musically associated originates from the data mining of patterns of songs in a large set of playlists compiled by MyStrands users
The result is a social Web radio where channels are automatically programmed in real time to match both musical association criteria and the preferences of the current listeners
Future work: evaluate the quality of the associations, and extend the data mining process to include patterns of three or more songs
ANY QUESTIONS?