Data science-2013-heekim

A Unified Music Recommender System Using Users’ Listening Habits and Semantics of Tags

Hyon Hee Kim

Department of Statistics and Information Science,

Dongduk Women’s University

Outline

• Motivation & Objectives

• Overview of the System

• Generation of User Profiles

• A Unified Music Recommendation

• Performance Evaluation

• Related Work

• Conclusions and Future Work

Motivation (1/3)

• In a Social Music Site – Music recommendation is essential.

– Music recommendation is different from other product recommendation

• Explicit information : Rating system

• Implicit information : the number of plays

• Listening habits-based User Profiling – Cold Start Problem

• New users with little information

• New items with only a few ratings

– Data Sparsity Problem

• The available data is very small compared with the number of music items

Motivation (2/3)

• Collaborative Tagging – A tool for users to represent their preferences about web resources

– Users add freely chosen keywords to web resources

– Tag data can be used for user profiling in personalized recommender systems

– Example tags: classic rock, british, pop, rock

• Tag-based User Profiling – Tags are easily added, even without listening to the music

– Tags are semantically meaningful

Motivation (3/3)

• In the case of last.fm

• Factual Tags – 85% of tags

– genre, region, instrumentation

• Emotional Tags – 10% of tags

– opinion, sentiment, mood

• Personal Tags – 5% of tags

– to organize, to browse, etc.

Objectives

• A Novel Approach to Music Recommendation – Combining listening habits and semantics of tags

• Using a Tag Ontology and an Emotion Ontology – UniTag: Resolving semantic ambiguity of tags

– UniEmotion: Assigning weighted values to the emotional tags

→ Semantically Enhanced Music Recommendation

Overview of the System

Outline

• Tag-based User Profiling – Preprocessing of tags

– Algorithms for generating user profiles

– Preliminary experimental results

Preprocessing of Tags (1/3)

• A tag has no pre-defined terms or term hierarchy

• Problems of tag data – Synonymy

• Different words represent the same meaning

• E.g., hiphop, hip-hop, hip hop / R & B, Rhythm and Blues, Blues

– Polysemy

• A single word has multiple meanings

• E.g., French => French rock, French pop, French artist

– Spelling variants

• Misspellings

• Foreign-language spellings


• Tag Ontology – Tags, users, items

• UniTag Ontology – uniTag:Users

• uniTag:userID, uniTag:hasAdded, uniTag:hasAddedTo

– uniTag:Items

• uniTag:itemID

– uniTag:Tags

• uniTag:tagID, uniTag:tagName, uniTag:RTag, uniTag:subTag,

• uniTag:Rtags {rock, hiphop, electronic, metal, jazz, rap, funk, folk, blues, reggae}

• uniTag:classifiedAs, uniTag:isKindOf, uniTag:istheSameAs, uniTag:tagVariation


• Rules for reasoning prefix – French rock, progressive rock, post rock => rock

Tag(?t) ^ tagPrefix(?t, ?p) ^ Prefix(?p) ^ subTag(?t, ?s) ^ Rtags(?s) -> classifiedAs(?t, ?s)

• Rules for reasoning expert knowledge – Soul => rhythm and blues, rhythm and blues => blues, therefore Soul => blues

Tag(?t) ^ isKindOf(?t, ?A) ^ isKindOf(?A, ?B) -> isKindOf(?t, ?B)

• Rules for reasoning synonym – Hip-hop, hiphop => hip hop

Tag(?t) ^ tagVariation(?t, ?R) ^ istheSameAs(?t, ?s) -> tagVariation(?s, ?R)
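The three rule families can be illustrated with a minimal Python sketch. The lookup tables `SYNONYMS` and `IS_KIND_OF` and the function `find_rtag` are hypothetical stand-ins for the ontology and its reasoner, not the UniTag implementation; the canonical synonym form is assumed here to be the representative-tag spelling `hiphop`.

```python
# Hypothetical miniature of the UniTag knowledge; the real ontology is far larger.
RTAGS = {"rock", "hiphop", "electronic", "metal", "jazz",
         "rap", "funk", "folk", "blues", "reggae"}
SYNONYMS = {"hip-hop": "hiphop", "hip hop": "hiphop", "r & b": "rhythm and blues"}
IS_KIND_OF = {"soul": "rhythm and blues", "rhythm and blues": "blues"}


def find_rtag(tag):
    """Map a raw tag to a representative tag, or None if no rule applies."""
    t = tag.strip().lower()
    # Synonym rule: collapse spelling variants onto one canonical form.
    t = SYNONYMS.get(t, t)
    # Expert-knowledge rule: follow isKindOf links transitively.
    seen = set()
    while t in IS_KIND_OF and t not in seen:
        seen.add(t)
        t = IS_KIND_OF[t]
    if t in RTAGS:
        return t
    # Prefix rule: "french rock" -> "rock" when the last word is a representative tag.
    words = t.split()
    last = words[-1] if words else ""
    return last if last in RTAGS else None


print(find_rtag("Hip-Hop"), find_rtag("Soul"), find_rtag("French rock"))
```

Tags that match no rule, such as personal tags, come back as `None` in this sketch and would simply not be counted toward any representative tag.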

Algorithm for Generating User Profiles (1/2)

Algorithm 1. Generation of a Tag-based Profile

Input: set of representative tags Tr, set of a user’s tags Tu

Output: set of frequencies of each representative tag for the user, FTr

var RTags[] = {rock, hiphop, electronic, metal, jazz, rap, funk, folk, blues, reggae}

var tagFrequency[] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

var RTag = null

while ∃ next tag t in Tu do

    RTag = FindRTag(t)

    for each index i of RTags do

        if RTag == RTags[i] then

            tagFrequency[i] = tagFrequency[i] + 1

    endfor

endwhile

return tagFrequency

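| User | rock | hiphop | electronic | metal | jazz | rap | funk | folk | blues | reggae |
|---|---|---|---|---|---|---|---|---|---|---|
| user1 | 6 | 2 | 2 | 3 | 2 | 4 | 3 | 1 | 1 | 1 |
| user2 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| user3 | 2 | 2 | 1 | 1 | 1 | 1 | 2 | 0 | 0 | 1 |
| user4 | 10 | 1 | 0 | 1 | 2 | 0 | 2 | 3 | 3 | 1 |
| user5 | 1 | 4 | 0 | 0 | 0 | 4 | 1 | 0 | 0 | 0 |

Table 1. An example of tag-based profiles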
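The counting step of Algorithm 1 can be sketched in Python as follows. The function name `build_profile` is hypothetical, and the input is assumed to be the list of labels already returned by FindRTag for one user's tags (with `None` for tags that map to no representative tag).

```python
from collections import Counter

RTAGS = ["rock", "hiphop", "electronic", "metal", "jazz",
         "rap", "funk", "folk", "blues", "reggae"]


def build_profile(labels, rtags=RTAGS):
    """Count how often each representative tag occurs among already-mapped labels."""
    counts = Counter(label for label in labels if label in rtags)
    return [counts[r] for r in rtags]


# user2 from Table 1: five tags mapped to rock and one to blues.
user2_labels = ["rock"] * 5 + ["blues"]
profile = build_profile(user2_labels)
print(profile)  # [5, 0, 0, 0, 0, 0, 0, 0, 1, 0]
```

The same counting would apply to a track-based profile if the labels were the genres of a user's tracks instead of the representative tags of the user's tags.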

Algorithm for Generating User Profiles (2/2)

Algorithm 2. Generation of a Track-based Profile

Input: set of tracks of a user TRu, set of representative tags Tr

Output: set of the number of the user’s tracks for each representative musical genre, Tn

var RTags[] = {rock, hiphop, electronic, metal, jazz, rap, funk, folk, blues, reggae}

var numTrack[] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}

var RTrack = null

while ∃ next track t in TRu do

    RTrack = FindGenre(t)

    for each index i of RTags do

        if RTrack == RTags[i] then

            numTrack[i] = numTrack[i] + 1

    endfor

endwhile

return numTrack

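| User | rock | hiphop | electronic | metal | jazz | rap | funk | folk | blues | reggae |
|---|---|---|---|---|---|---|---|---|---|---|
| User1 | 65 | 176 | 5 | 4 | 0 | 168 | 0 | 3 | 0 | 0 |
| User2 | 411 | 8 | 11 | 109 | 3 | 5 | 8 | 1 | 0 | 0 |
| User3 | 157 | 7 | 11 | 10 | 6 | 2 | 1 | 39 | 4 | 2 |
| User4 | 257 | 20 | 9 | 18 | 2 | 5 | 0 | 9 | 0 | 0 |
| User5 | 110 | 277 | 15 | 8 | 6 | 85 | 10 | 3 | 2 | 7 |

Table 2. An example of track-based profiles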

Preliminary Experimental Results (1/3)

• 1,000 user data set from Last.fm – Users, tags, music items

• Standardization – To remove extensive preference

• K-Means clustering algorithm – Canopy Clustering

– 6 centroid points and 6 clusters
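The slides do not say which standardization was used. The sketch below assumes a column-wise z-score (each genre column rescaled to mean 0 and standard deviation 1), which is consistent with the negative values among the cluster centers; the function name `standardize` is hypothetical, and the clustering step itself is not shown.

```python
from statistics import mean, pstdev


def standardize(profiles):
    """Column-wise z-scores; a constant column maps to zeros."""
    columns = list(zip(*profiles))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [
        [(x - mu) / sd if sd > 0 else 0.0 for x, mu, sd in zip(row, mus, sigmas)]
        for row in profiles
    ]


# The five tag-based profiles from Table 1 (rows: users, columns: representative tags).
profiles = [
    [6, 2, 2, 3, 2, 4, 3, 1, 1, 1],
    [5, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [2, 2, 1, 1, 1, 1, 2, 0, 0, 1],
    [10, 1, 0, 1, 2, 0, 2, 3, 3, 1],
    [1, 4, 0, 0, 0, 4, 1, 0, 0, 0],
]
z = standardize(profiles)
```

After this step every column has mean 0, so a user's heavy overall tagging or listening activity in one genre no longer dominates the distance computations used by the clustering.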


| Cluster | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Cluster1 | 0.241 | 1.472 | 0.626 | 0.130 | 1.267 | 1.621 | 2.168 | 0.274 | 1.078 | 0.381 |
| Cluster2 | 2.171 | 0.032 | 0.517 | 3.052 | 0.011 | -0.030 | 0.328 | 1.533 | 1.245 | 0.162 |
| Cluster3 | -0.206 | -0.273 | -0.517 | -0.178 | -0.180 | -0.294 | -0.233 | -0.171 | -0.204 | -0.136 |
| Cluster4 | -0.341 | 0.660 | -0.459 | -0.284 | -0.208 | 1.178 | -0.179 | -0.321 | -0.166 | 0.273 |
| Cluster5 | -0.074 | -0.155 | 1.320 | -0.230 | -0.115 | -0.261 | -0.209 | -0.070 | -0.172 | -0.071 |
| Cluster6 | 2.815 | 7.640 | 5.168 | -0.136 | 9.254 | 6.135 | 7.000 | 4.286 | 4.421 | 5.254 |

Table 3. Values of Centers of Tag-based Profiles

| Cluster | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Cluster1 | -0.411 | 0.495 | 0.406 | -0.338 | 1.565 | 0.131 | 1.632 | -0.135 | 0.147 | 0.812 |
| Cluster2 | 0.200 | -0.444 | 0.007 | -0.341 | 0.907 | -0.468 | -0.288 | 2.617 | 1.097 | 0.020 |
| Cluster3 | -0.897 | 1.651 | -0.539 | -0.442 | -0.213 | 1.836 | 0.059 | -0.507 | -0.415 | 0.034 |
| Cluster4 | 1.925 | -0.590 | -0.404 | 0.852 | -0.264 | -0.491 | 0.655 | -0.002 | 2.850 | -0.108 |
| Cluster5 | 0.914 | -0.557 | -0.216 | 0.794 | -0.296 | -0.511 | -0.297 | 0.014 | -0.157 | -0.147 |
| Cluster6 | -0.472 | -0.327 | 0.380 | -0.373 | -0.184 | -0.371 | -0.241 | -0.205 | -0.300 | -0.093 |

Table 4. Values of Centers of Track-based Profiles

• Clustering Validity – Inter-cluster distances

– Distances between all pairs of centroids using cosine distance measure


– T-test

• Mean of inter-cluster distances of tag-based profiles

• Mean of inter-cluster distances of track-based profiles

| Profiles | N | Mean | Std Dev | t | p-value |
|---|---|---|---|---|---|
| Tag-based profiles | 15 | 0.8325 | 0.6834 | 2.55 | 0.0165 |
| Track-based profiles | 15 | 0.3785 | 0.0885 | | |

Table 5. T-test result for the means of inter-cluster distances
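As a check on the validity test, the following Python sketch computes cosine distances between all pairs of centroids and recomputes the t statistic from the summary values in Table 5. The function names are hypothetical, and the unpooled standard error is an assumption; with equal group sizes it gives the same t value as the pooled form.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(a, b):
    """1 - cosine similarity of two centroid vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def pairwise_distances(centroids):
    """Distances between all unordered pairs of centroids."""
    return [cosine_distance(a, b) for a, b in combinations(centroids, 2)]


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic from summary statistics (unpooled standard error)."""
    return (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Six centroids give 6 * 5 / 2 = 15 pairs, matching N = 15 in Table 5.
t = t_from_summary(0.8325, 0.6834, 15, 0.3785, 0.0885, 15)
print(round(t, 2))  # 2.55
```

The recomputed value agrees with the reported t = 2.55, so the larger mean inter-cluster distance of the tag-based profiles is what drives the significant result.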

Outline

• A Unified Music Recommendation – UniEmotion Ontology

– Generation of User Profiles

– Music Recommendation Algorithm

UniEmotion Ontology (1/5)

[Plutchik’s model]


• Definition of the intensity of emotional tags – SentiWordNet, http://sentiwordnet.isti.cnr.it/

– Example scores (P: positive, O: objective, N: negative)

• P: 0.625, O: 0.25, N: 0.125

• P: 0.375, O: 0.625, N: 0

• P: 1.0, O: 0, N: 0

• Intensity of emotional tags

– Strong • Positive value >= 0.75 or Negative value >= 0.75

– Middle • 0.25 <= Positive value < 0.75 or

• 0.25 <= Negative value < 0.75

– Weak • Positive value < 0.25 and Negative value < 0.25


• Assigning the weights to the tags

– Factual tags: 1

– Positive tags • Strong: 2.5

• Middle: 2

• Weak: 1.5

– Negative tags • Strong: -2.5

• Middle: -2

• Weak: -1.5

• Final score of an item => sum of the weights
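The weighting scheme can be sketched in Python as below. This is an illustration under assumptions: the function names are hypothetical, the polarity of a tag (`positive` or `negative`) is assumed to be supplied by the ontology class, and the intensity is taken from the larger of the two SentiWordNet scores, with a score of exactly 0.75 counted as strong.

```python
WEIGHTS = {"strong": 2.5, "middle": 2.0, "weak": 1.5}


def intensity(positive, negative):
    """Intensity class from SentiWordNet-style scores in [0, 1]."""
    strongest = max(positive, negative)
    if strongest >= 0.75:
        return "strong"
    if strongest >= 0.25:
        return "middle"
    return "weak"


def tag_weight(kind, positive=0.0, negative=0.0):
    """Weight of one tag; kind is 'factual', 'positive', or 'negative'."""
    if kind == "factual":
        return 1.0
    if kind not in ("positive", "negative"):
        raise ValueError(f"no weight defined for tag kind {kind!r}")
    w = WEIGHTS[intensity(positive, negative)]
    return w if kind == "positive" else -w


def item_score(tags):
    """Final score of an item: the sum of the weights of its tags."""
    return sum(tag_weight(*t) for t in tags)


# One factual tag, one middle positive tag, one strong negative tag.
tags = [("factual",), ("positive", 0.625, 0.125), ("negative", 0.0, 0.8)]
print(item_score(tags))  # 0.5
```

In this example the score is 1 + 2 - 2.5 = 0.5: a single strongly negative tag outweighs a moderately positive one.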


• Two classes

– UniEmotion:Positive • Emotional tags belonging to the positive emotional categories

• trust, surprise, anticipation, and happiness

– UniEmotion:Negative • Emotional tags belonging to the negative emotional categories

• disgust, anger, fear, and sadness

• Two properties

– UniEmotion:Intensity • Specifying the intensity of tags

– UniEmotion:Weight • Specifying the weight of tags

Generation of User Profiles (1/2)

1. Listening habits-based User Profiles – U1 = {u1, u2, …, um}, I1 = {i1, i2, …, in}

– <u, i, n> • n: number of plays

2. Tag score-based User Profiles – U2 = {u1, u2, …, um}, I2 = {i1, i2, …, in}

– <u, i, s> • s: score of the tags, assigned by the UniEmotion ontology

3. Hybrid User Profiles – U3 = {u1, u2, …, um}, I3 = I1 ∩ I2

– <u, i, m> • m = α * n + (1 - α) * s; α = 0.5
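A minimal Python sketch of the hybrid combination follows. The dictionary layout `{(user, item): value}` and the function name `hybrid_profile` are assumptions made for illustration, and the intersection is taken over (user, item) pairs present in both profiles.

```python
def hybrid_profile(plays, tag_scores, alpha=0.5):
    """Combine two {(user, item): value} maps on the pairs present in both."""
    shared = plays.keys() & tag_scores.keys()
    return {k: alpha * plays[k] + (1 - alpha) * tag_scores[k] for k in shared}


plays = {("u1", "i1"): 10, ("u1", "i2"): 4}
tag_scores = {("u1", "i1"): 3.5, ("u1", "i3"): 2.0}
print(hybrid_profile(plays, tag_scores))  # {('u1', 'i1'): 6.75}
```

Only item i1 appears in both profiles, so it is the only entry in the hybrid profile, with value 0.5 * 10 + 0.5 * 3.5 = 6.75.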

Generation of User Profiles (2/2)

1. Listening habits-based user profiles

2. Tag score-based user profiles

3. Hybrid user profiles

Music Recommendation Algorithm (1/2)

• Finding Similar Users

– Pearson Correlation Similarity

• Calculating scores of items

– Considering the similar users’ ratings

• Recommending top n items

Music Recommendation Algorithm (2/2)

Input: a set of user profiles UP

Output: a set of recommended items RI

1. For all yi ∈ U

Compute a similarity s between x and yi.

2. Sort by similarity

3. Select top n neighbors

4.

5. For all

Compute a similarity t between x and

For all

preference +=t * pref

6. Rank by preference

7. Select top n items
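The overall flow (Pearson similarity, nearest neighbors, similarity-weighted preferences, top n items) can be sketched as a generic user-based collaborative filter in Python. This is not the Apache Mahout implementation used in the system; the function names, the profile layout, and the choice to skip neighbors with non-positive similarity are assumptions.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the items both users have a value for."""
    common = a.keys() & b.keys()
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def recommend(target, profiles, n_neighbors=2, n_items=3):
    """profiles: {user: {item: preference}}; returns top items unseen by target."""
    x = profiles[target]
    sims = [(pearson(x, prefs), user) for user, prefs in profiles.items() if user != target]
    neighbors = sorted(sims, reverse=True)[:n_neighbors]
    scores = {}
    for sim, user in neighbors:
        if sim <= 0:
            continue  # assumption: ignore uncorrelated or negatively correlated users
        for item, pref in profiles[user].items():
            if item not in x:
                scores[item] = scores.get(item, 0.0) + sim * pref
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:n_items]


profiles = {
    "x": {"a": 5, "b": 3, "c": 1},
    "y1": {"a": 4, "b": 3, "c": 2, "d": 5},
    "y2": {"a": 1, "b": 3, "c": 5, "e": 4},
}
print(recommend("x", profiles))  # ['d']
```

Here y1 correlates perfectly with x and y2 is perfectly anti-correlated, so only y1's unseen item d is recommended.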

Outline






• Related Work


Performance Evaluation

• Implementation Environment: Apache Web Server

– User database : MySQL 5.0

– Listening habits collector, tag score generator: PHP

– Recommendation Engine: Apache Mahout

– UniTag and UniEmotion Ontology: JDK6.0

• Experimental Data

– Data for 1,000 users from last.fm [http://mir.dcs.gla.ac.uk/]

– Containing 18,700 artists and 12,600 tags

– 70% training data, 30% test data

Performance Evaluation

• Evaluation Model

– Recommended items • Items which users are interested in (True Positive, TP)

• Items which users are not interested in (False Positive, FP)

– Items which are not recommended • Items which users are interested in (False Negative, FN)

• Items which users are not interested in (True Negative, TN)

– Precision P = TP / (TP + FP) • # of correct recommendations / # of all recommended items

– Recall R = TP / (TP + FN) • # of correct recommendations / # of preferred items

– F-measure F = 2 * P * R / (P + R) • Harmonic mean of precision and recall
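The three measures can be computed for one user's recommendation list with a short Python sketch; the function name `precision_recall_f` and the set-based inputs are illustrative assumptions.

```python
def precision_recall_f(recommended, relevant):
    """Precision, recall, and F-measure for one user's recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)  # correct recommendations
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# 10 recommended items, 15 preferred items, 3 of them in common.
recommended = range(10)
relevant = [0, 1, 2] + list(range(100, 112))
p, r, f = precision_recall_f(recommended, relevant)
```

With TP = 3, FP = 7, and FN = 12, this gives P = 3/10 = 0.3, R = 3/15 = 0.2, and F = 2 * 0.3 * 0.2 / 0.5 = 0.24.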

Experimental Results (1/3)

• Precisions

[Figure axes: number of similar users; number of recommended items]

A: Listening habits-based approach

B: Tag-based approach

C: Hybrid approach

• Recalls

• F-measure

Statistical Validation

• One-way ANOVA about three groups

– Method1: listening habits-based approach

– Method2: tag-based approach

– Method3: hybrid approach

• Tukey Multiple Comparison Test

– Asymmetric distributions • Log transformation

– Groups marked with different letters differ significantly

| Method | 1 | 2 | 3 | F |
|---|---|---|---|---|
| Mean of log(prec) | -3.962 B | -4.036 B | -2.879 A | 34.27*** |
| Mean Precision (SD) | 0.020 (0.006) | 0.020 (0.009) | 0.068 (0.040) | |
| N | 24 | 24 | 24 | |

<Table1. test for precision> ***: p<0.001

| Method | 1 | 2 | 3 | F |
|---|---|---|---|---|
| Mean of log(recall) | -3.285 B | -4.099 C | -2.635 A | 26.80*** |
| Mean Recall (SD) | 0.044 (0.023) | 0.019 (0.010) | 0.093 (0.056) | |
| N | 24 | 24 | 24 | |

<Table2. test for recall> ***: p<0.001

| Method | 1 | 2 | 3 | F |
|---|---|---|---|---|
| Mean of log(F-measure) | -3.748 B | -4.117 C | -2.894 A | 41.31*** |
| Mean F-measure (SD) | 0.024 (0.006) | 0.018 (0.008) | 0.06 (0.034) | |
| N | 24 | 24 | 24 | |

<Table3. test for F-measure> ***: p<0.001
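For reference, the F statistic of a one-way ANOVA on log-transformed values can be sketched in Python as below. The per-run measurements behind the tables are not included in the slides, so the input values here are made up purely to exercise the function; the natural logarithm is an assumption, and the Tukey comparison is not shown.

```python
from math import log


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Made-up precision values for three methods, not the experimental data.
raw = [[0.018, 0.020, 0.022], [0.015, 0.020, 0.025], [0.050, 0.068, 0.090]]
logged = [[log(x) for x in g] for g in raw]
f_stat = one_way_anova_f(logged)
```

A large F value indicates that the differences between the method means are large relative to the variation within each method, which is the pattern the reported values 34.27, 26.80, and 41.31 reflect.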

Related Work

• MusicBox – A personalized music recommender system based on social tags

– Third-order tensor model

– The method improves the recommendation quality

• Foafing the music – Collecting music information in a semantic web environment

– User information, music information, concert information

– Recommendation of similar music items

• OntoEmotions – An ontology of emotional categories covering the basic emotions

– Armeteo art portal

– New relations can be inferred by reasoning on the ontology of emotions

Conclusions

• Solution to Cold Start Problem – It takes time to collect users’ listening habits.

– Adding tags is easily done

– Tags look like word-of-mouth

• Performance Enhancement – Precision, Recall, F-measure

– Hybrid approach > listening habits-based approach, tag-based approach

Future Work

• Elaborating UniEmotion Ontology – Emerging Internet slang

• Item Selection – Product Network Analysis Considering Tags

– Analyzing short description
