The Data Mining

IIIA - CSIC Mining Music Social Networks for Automating Social Music Services Claudio Baccigalupo – Enric Plaza IIIA-CSIC – September 2007


Page 1: The Data Mining

IIIA - CSIC

Mining Music Social Networks

for Automating Social Music Services

Claudio Baccigalupo – Enric Plaza

IIIA-CSIC – September 2007

Page 3: The Data Mining

The Goal

To automatically program the music for the channels of a Web radio, with a selection process that emulates the knowledge of an expert (DJ)

This requires domain knowledge about musical associations (which songs and artists should be played one after the other?)

We present how we obtain such knowledge through a data mining process on a large collection of playlists gathered from the Web

Page 4: The Data Mining

1. The Data Set: gathering playlists from Web users

2. The Data Mining: extracting musical knowledge from playlists

3. The Evaluation: comparing with other similar measures

4. The Application: programming a social Web radio

5. Conclusions

Page 5: The Data Mining

The Data Set: why playlists?

Playlists are sequences of songs compiled by humans for some purpose, with cultural and social aspects that cannot be found in other sources of musical knowledge (e.g., acoustic-based)

Playlists are part of the user-created content that is increasingly available thanks to the social Web phenomenon

Playlists are easy to gather, analyse, store, and understand

Playlists have a sequential nature, and the ordering of songs is a relevant feature since our goal is to programme a radio channel

Page 6: The Data Mining

The Data Set: which playlists?

We have collected 599,565 user-compiled playlists from the Web-based music community MyStrands (http://www.mystrands.com)

[Screenshots: a playlist published using a Web browser; a playlist published using the MyStrands plug-in]

Page 7: The Data Mining

The Data Set: which playlists?

Playlists can be obtained with the Web API called OpenStrands

Playlists have an average length of 16.8 songs

Users are 65% male and 32 years old on average

MyStrands includes more than 5M songs

Page 9: The Data Mining

The Data Mining: what to look for?

While a song X is playing on a radio channel, we wish to know which songs are musically associated with X and are good candidates to be played after X on the channel

We mine the playlists to learn the song association s(X, Y) ∈ [0, 1] for any pair of songs (X, Y) and the artist association s'(A, B) ∈ [0, 1] for any pair of artists (A, B)

[Diagram: the Data Mining Process takes Song X (Artist A) and Song Y (Artist B) and returns s(X, Y) and s'(A, B)]

I Spy (Pulp), Trash (Suede): s(X, Y) = 0.9, s'(A, B) = 0.7

I Spy (Pulp), T.N.T. (AC/DC): s(X, Y) = 0.3, s'(A, B) = 0.2

Page 10: The Data Mining

The Data Mining: what to consider?

We count the co-occurrences of pairs of songs in the playlists

I Spy (Pulp) and Trash (Suede) occur together in 4 playlists

We normalise against the popularity of the songs in the playlists

I Spy (Pulp) and Basket Case (Green Day) also co-occur 4 times, but this value is not as relevant, since Basket Case (Green Day) occurs in 14,897 playlists, 219 times more than Trash (Suede)

We assign stronger associations when the distance between songs is small and when the ordering is preserved

Playlist #1 (includes I Spy (Pulp), Song 2 (Blur), Wonderwall (Oasis), Trash (Suede), Uno (Muse)): contiguous post-occurrence between songs, strong association

Playlist #2 (Basket Case (Green Day), Trouble (Coldplay), Vertigo (U2), I Spy (Pulp)): distant pre-occurrence between songs, weak association
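The three ideas on this slide (count co-occurrences, normalise by popularity, weight by distance and ordering) can be sketched as follows. The slides do not give the actual formula, so everything specific here is an assumption made for illustration: the function name `song_associations`, the 1/d distance weight, the `reverse_penalty` factor for pre-occurrences, the `max_distance` window, and the square-root popularity normalisation. The resulting scores are not guaranteed to lie in [0, 1].

```python
from collections import Counter, defaultdict
from math import sqrt


def song_associations(playlists, max_distance=3, reverse_penalty=0.5):
    """Return {(x, y): score}, read as "how well y follows x".

    Hypothetical scoring, not the authors' formula:
    - each co-occurrence at distance d contributes 1/d (closer is stronger);
    - a pre-occurrence (y appears before x) is scaled by reverse_penalty;
    - the total is divided by sqrt(popularity(x) * popularity(y)).
    """
    popularity = Counter()         # number of playlists containing each song
    weighted = defaultdict(float)  # distance- and order-weighted co-occurrences

    for playlist in playlists:
        popularity.update(set(playlist))
        for i, x in enumerate(playlist):
            for d in range(1, max_distance + 1):
                if i + d >= len(playlist):
                    break
                y = playlist[i + d]
                if x == y:
                    continue
                weighted[(x, y)] += 1.0 / d              # y occurs after x
                weighted[(y, x)] += reverse_penalty / d  # x occurs before y

    return {
        (x, y): w / sqrt(popularity[x] * popularity[y])
        for (x, y), w in weighted.items()
    }


# Toy data loosely modelled on the two playlists of this slide.
playlists = [
    ["I Spy", "Trash"],
    ["Basket Case", "Trouble", "Vertigo", "I Spy"],
]
scores = song_associations(playlists)
```

With this toy data, the contiguous post-occurrence (I Spy, Trash) scores higher than the distant pre-occurrence (I Spy, Basket Case), which is the qualitative behaviour the slide describes.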

Page 11: The Data Mining

The Data Mining: song associations

We filter out statistically insignificant associations, and co-occurrences between songs from the same artist

We obtain from the playlists of MyStrands a set of 112,238 songs that have a song association degree with some other song

Top associated tracks for Strangers In The Night (Frank Sinatra):

Up, Up and Away (The 5th Dimension)
Message To Michael (Dionne Warwick)
Whatever Happens, I Love You (Morrissey)
Sugar Baby Love (Rubettes)
Move It On Over (Ray Charles)
It Serves You Right To Suffer (John Lee Hooker)
Blue Angel (Roy Orbison)

Top associated tracks for Smoke On The Water (Deep Purple):

Space Truckin’ (AA.VV.)
Cold Metal (Iggy Pop)
Iron Man (Black Sabbath)
China Grove (The Doobie Brothers)
Crossroads (Eric Clapton)
Sunshine Of Your Love (Cream)
Wild Thing (Jimi Hendrix)
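The filtering step can be illustrated with a minimal sketch. The slides do not say which significance test is used, so a plain minimum co-occurrence count (`min_count`) stands in for it here; that threshold, the function name `filter_associations`, and the toy counts are all assumptions.

```python
def filter_associations(cooccurrences, artist_of, min_count=3):
    """Keep only song pairs that pass two checks.

    cooccurrences: {(x, y): number of playlists where x and y co-occur}
    artist_of:     {song: artist}
    min_count:     hypothetical stand-in for a statistical significance test
    """
    return {
        (x, y): n
        for (x, y), n in cooccurrences.items()
        if n >= min_count and artist_of[x] != artist_of[y]
    }


# Toy counts, invented for illustration.
artist_of = {"I Spy": "Pulp", "Disco 2000": "Pulp", "Trash": "Suede", "Uno": "Muse"}
cooccurrences = {
    ("I Spy", "Trash"): 4,       # kept
    ("I Spy", "Disco 2000"): 9,  # dropped: same artist
    ("I Spy", "Uno"): 1,         # dropped: too few co-occurrences
}
kept = filter_associations(cooccurrences, artist_of)
```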

Page 12: The Data Mining

The Data Mining: artist associations

With the same technique, we estimate the artist association degree for 25,881 artists from the playlists of MyStrands

We count the co-occurrences of pairs of artists in the playlists, normalise by their popularity and consider their distances

Top associated artists for Abba:

Agnetha Faltskog, A-Teens, Chic, Gloria Gaynor, The 5th Dimension, Andy Gibb, Olivia Newton-John

Top associated artists for Frank Sinatra:

Dean Martin, Sammy Davis Jr., Judy Garland, Bing Crosby, The California Raisins, Tony Bennett, Louis Prima

Top associated artists for Destiny’s Child:

Kelly Rowland, City High, Ciara, Fantasia, Christina Milian, Beyoncé, Ashanti

Top associated artists for John Williams:

Meco, Danny Elfman, John Carpenter, London Theatre Orchestra, John Barry, Hollywood Studio Orchestra, Elmer Bernstein

Page 14: The Data Mining

The Evaluation: preamble

We compare the top associated tracks and artists we found with the most similar tracks and artists proposed by different Web sites

The results are expected to differ, since we do not look for similarity (a symmetric measure) but for a good sequence of songs (an asymmetric measure, where the ordering matters)

Still, some observations can be made

[Logo: MusicSeer]

Page 15: The Data Mining

The Evaluation: song association

We assign the highest rankings to songs which are less popular

If one of these songs is contained in the radio library, it will be played, thus the listeners will probably discover new music

Otherwise, a less associated/more popular song will be played

Top associated songs for Strangers In The Night (Frank Sinatra):

Up, Up and Away (The 5th Dimension)
Message To Michael (Dionne Warwick)
Whatever Happens, I Love You (Morrissey)
Sugar Baby Love (Rubettes)
Move It On Over (Ray Charles)
It Serves You Right To Suffer (John Lee Hooker)
Blue Angel (Roy Orbison)

Yahoo!:

Mr. Tambourine Man (The Byrds)
Don’t You Want Me (Human League)
I’m a Believer (The Monkees)
Good Vibrations (The Beach Boys)
Stay (Shakespeare’s Sister)
The House of the Rising Sun (The Animals)
Oh Pretty Woman (Roy Orbison)

Page 16: The Data Mining

The Evaluation: artist association

Some high-ranked associations are common, although inferred with different methods (human experts, playlists, listening habits)

We are able to rank first one of the most strongly associated artists

Top associated artists for Abba:

Agnetha Faltskog, A-Teens, Chic, Gloria Gaynor, The 5th Dimension, Andy Gibb, Olivia Newton-John

MyStrands: Donna Summer, Madonna, Gloria Gaynor, Cyndi Lauper, Kool & The Gang

Blondie

AMG: Ace of Base, Gemini, Maywood, Bananarama, Lisa Stansfield, Gary Wright, Roxette

Yahoo!: The Bee Gees, The Carpenters, The Beatles, Foreigner, Whitney Houston, Madonna

Last.fm: Madonna, Cher, Kylie Minogue, Boney M., Michael Jackson, Elton John, The Bee Gees

MusicSeer Playlists: Blondie, Cyndi Lauper, Queen, Cat Stevens, Cher, The Bee Gees, The Beach Boys

Page 18: The Data Mining

The Application: what is Poolcasting?

Page 19: The Data Mining

The Application: song scheduling

The collection of songs (Music Pool) is open and dynamic

The music played on each channel cannot be pre-programmed; every channel is automatically scheduled in real time

[Diagram: given the last song played X, the Retrieval step uses the Song and Artist Associations to select from the Music Pool the subset of candidates musically associated with X]

Page 20: The Data Mining

The Application: retrieval process

The best candidates are songs either associated with X, or associated with songs by A, or associated with songs from artists associated with A, or whose artist is associated with A

[Diagram: given the last song X (A), the Retrieval step uses the Song and Artist Associations, s(X, Y) and s'(A, B), to select the Candidates from the Music Pool]

Last song X (A): I Spy (Pulp)

Music Pool: Trash (Suede), Go (Moby), Uno (Muse), Drive (R.E.M.), Pilgrim (Enya), Nikita (Elton John), T.N.T. (AC/DC), Noon (Eric Serra), Roxanne (Sting), Cody (Mogwai)

Candidates: Trash (Suede), Go (Moby), Uno (Muse), Drive (R.E.M.)
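The four retrieval criteria can be sketched as a simple filter over the Music Pool. This is an illustration under stated assumptions, not the Poolcasting implementation: the function name `retrieve_candidates`, the dictionary layout of the associations, the "score greater than zero" reading of "associated", and the toy scores are all hypothetical.

```python
def retrieve_candidates(x, pool, artist_of, song_assoc, artist_assoc):
    """Return the songs of the pool that qualify as candidates after x.

    song_assoc:   {(song, next_song): score}     (hypothetical layout)
    artist_assoc: {(artist, next_artist): score} (hypothetical layout)
    """
    a = artist_of[x]
    # Artists associated with A, the artist of the last song.
    related_artists = {b for (src, b), s in artist_assoc.items() if src == a and s > 0}

    candidates = []
    for y in pool:
        if y == x:
            continue
        # Songs that y is known to follow well.
        sources = [z for (z, tgt), s in song_assoc.items() if tgt == y and s > 0]
        if (
            x in sources                                             # associated with X
            or any(artist_of.get(z) == a for z in sources)           # with songs by A
            or any(artist_of.get(z) in related_artists for z in sources)  # with songs by artists associated with A
            or artist_of.get(y) in related_artists                   # artist associated with A
        ):
            candidates.append(y)
    return candidates


# Toy data with invented scores.
artist_of = {
    "I Spy": "Pulp", "Disco 2000": "Pulp", "Trash": "Suede",
    "Go": "Moby", "T.N.T.": "AC/DC",
}
song_assoc = {("I Spy", "Trash"): 0.9, ("Disco 2000", "Go"): 0.4}
artist_assoc = {("Pulp", "Suede"): 0.7}
pool = ["Trash", "Go", "T.N.T."]
candidates = retrieve_candidates("I Spy", pool, artist_of, song_assoc, artist_assoc)
```

Here Trash qualifies because it is directly associated with the last song, Go because it is associated with another song by the same artist, and T.N.T. matches none of the four criteria.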

Page 21: The Data Mining

The Application: reuse process

The best candidates are then ranked according to the music preferences of the current listeners, and the best song is played

Listeners' preferences are inferred by analysing their music libraries

[Diagram: given the last song X (A), the Retrieval step uses the Song and Artist Associations, s(X, Y) and s'(A, B), to select the Candidates from the Music Pool; the Ranking step orders them using the Listeners' Preferences and Feedback; the best ranked candidate is played next]

Last song X (A): I Spy (Pulp)

Candidates: Trash (Suede), Go (Moby), Uno (Muse), Drive (R.E.M.)

Page 22: The Data Mining

The Application: more details

The higher the rating and the higher the play count of a song in a user library (iTunes), the higher the inferred listener preference

Listeners can interact via the Web interface to state their explicit preferences for the songs played or to rate the next candidates

When listeners have diverging preferences in the same channel, fairness is achieved by favouring at each moment those listeners who were less satisfied by the last songs played
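A minimal sketch of these two ideas, preference inference and fairness-aware ranking, is given below. The slides only state the qualitative behaviour, so the formulas are assumptions chosen for illustration: `inferred_preference` averages a capped rating and a capped play count, and `pick_next` weights each listener by one minus a satisfaction value in [0, 1]. All names, caps, and numbers are hypothetical.

```python
def inferred_preference(rating, play_count, max_rating=5, max_play_count=50):
    """Map a library rating and play count to a preference in [0, 1].

    Hypothetical formula: it only guarantees that a higher rating or a
    higher play count never lowers the inferred preference.
    """
    r = min(rating, max_rating) / max_rating
    p = min(play_count, max_play_count) / max_play_count
    return (r + p) / 2


def pick_next(candidates, preferences, satisfaction):
    """Pick the candidate with the highest fairness-weighted group score.

    preferences:  {listener: {song: preference in [0, 1]}}
    satisfaction: {listener: satisfaction with the last songs, in [0, 1]}
    Listeners who were less satisfied recently get a larger weight.
    """
    def group_score(song):
        return sum(
            (1.0 - satisfaction[listener]) * prefs.get(song, 0.0)
            for listener, prefs in preferences.items()
        )
    return max(candidates, key=group_score)


# Two listeners with diverging tastes (invented values).
preferences = {
    "ann": {"Trash": 0.9, "Go": 0.1},
    "bob": {"Trash": 0.2, "Go": 0.8},
}
# Ann was happy with the last songs, Bob was not, so Bob is favoured now.
next_song = pick_next(["Trash", "Go"], preferences, {"ann": 0.9, "bob": 0.2})
```

Swapping the two satisfaction values makes the same call return the other song, which is the turn-taking effect the fairness rule is meant to produce.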

Page 24: The Data Mining

Conclusions

We use knowledge discovered from a Web-based music community to provide a group-customised Web service

Domain knowledge about which songs and artists are musically associated comes from mining patterns of songs in a large set of playlists compiled by MyStrands users

The result is a social Web radio where channels are automatically programmed in real time to match both musical association criteria and the preferences of the current listeners

Future work: evaluate the quality of the associations, and extend the data mining process to include patterns of three or more songs

Page 25: The Data Mining

Any questions?