Transcript
Page 1: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Simon Dooms (@sidooms)

Page 2: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Rating Datasets

What are ratings? Explicit user preference information

Why ratings? Recommender systems

Outline: Intro, Social Sharing, Results, Cross-Domain, Conclusion

Apr. 08, 2014 Simon Dooms - Ghent University - MSM 2014 2

Page 4: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Ratings Scarcity in Research

Ratings = private data
Public datasets to the rescue?
– MovieLens 100K (1998)
– MovieLens 1M (2000)
– MovieLens 10M (2008)
– More on recsyswiki.com

Old, Synthetic Datasets


Page 6: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Social Sharing = Ratings Goldmine

Previous research: MovieTweetings
– Movie rating dataset built from IMDb ratings shared on Twitter
– https://github.com/sidooms/MovieTweetings

What about other domains? Websites?

Well, let’s try it out!


Page 7: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Target Websites - Goodreads

Twitter user - Rating - Book title - Book author - Goodreads URL - Time
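The fields listed above could be pulled out of a shared tweet with a single regular expression. The sketch below is only an illustration: the tweet template ("<rating> of <scale> stars to <title> by <author> <url>"), the `BookRating` type, and the function name `parse_goodreads_tweet` are assumptions made for this example and are not taken from the slides or from the author's code.

```python
import re
from typing import NamedTuple, Optional


class BookRating(NamedTuple):
    """One rating record; the field layout is hypothetical."""
    user: str
    rating: int
    scale: int
    title: str
    author: str
    url: str
    time: str


# Assumed tweet template: "<rating> of <scale> stars to <title> by <author> <url>".
# The greedy title group makes the split happen at the LAST " by ", so titles
# that themselves contain " by " are kept whole.
PATTERN = re.compile(
    r"^(?P<rating>\d) of (?P<scale>\d) stars to (?P<title>.+) by (?P<author>.+?) "
    r"(?P<url>https?://\S+)$"
)


def parse_goodreads_tweet(user: str, text: str, time: str) -> Optional[BookRating]:
    """Return a BookRating if the tweet follows the assumed template, else None."""
    match = PATTERN.match(text.strip())
    if match is None:
        return None
    return BookRating(
        user=user,
        rating=int(match["rating"]),
        scale=int(match["scale"]),
        title=match["title"],
        author=match["author"],
        url=match["url"],
        time=time,
    )
```

Tweets that do not follow the template return `None`, so free-form mentions of a book are skipped rather than guessed at.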

Page 8: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Target Websites - Pandora

Twitter user - Song - Pandora URL - Time

Page 9: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Target Websites - YouTube

Twitter user - (Video uploader) - YouTube URL - Time

Page 10: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Mining Experiment

But words are wind…
– 2-week experiment
– 4 online platforms


Page 12: Mining Cross-Domain Rating Datasets from Structured Data on Twitter


Python code + Task Scheduler = Dataset files
https://github.com/sidooms/Twitter-ratings
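One way to picture the "scheduled script produces dataset files" idea is a function that a scheduler calls on every run and that appends only the records it has not stored yet. This is a minimal sketch, not the code from the repository: the CSV layout, the `Record` alias, and the name `append_new_records` are hypothetical, and fetching tweets is left out entirely.

```python
import csv
import os
from typing import Iterable, Tuple

# Hypothetical record layout: (user, item_url, rating, time), all as strings.
Record = Tuple[str, str, str, str]


def append_new_records(path: str, records: Iterable[Record]) -> int:
    """Append records not yet present in the dataset file; return how many were added."""
    seen = set()
    if os.path.exists(path):
        # Load rows written by earlier runs so repeated runs stay idempotent.
        with open(path, newline="", encoding="utf-8") as handle:
            seen = {tuple(row) for row in csv.reader(handle)}
    added = 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for record in records:
            if record not in seen:
                writer.writerow(record)
                seen.add(record)
                added += 1
    return added
```

Because duplicates are filtered against the file itself, overlapping search results between two scheduled runs do not inflate the dataset.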

Page 13: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

The Numbers

One more thing …


Page 14: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Cross-Domain Rating Dataset
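Assuming that "cross-domain" here means the same Twitter user appearing in the datasets of more than one website, the overlap could be computed roughly as follows. The function name `cross_domain_users`, the `(user, item)` row shape, and the `min_domains` parameter are hypothetical choices for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def cross_domain_users(
    datasets: Dict[str, Iterable[Tuple[str, str]]],
    min_domains: int = 2,
) -> Dict[str, Set[str]]:
    """Map each user seen in at least `min_domains` datasets to those domain names.

    `datasets` maps a domain name (e.g. "books") to rows of (user, item).
    """
    domains_per_user = defaultdict(set)
    for domain, rows in datasets.items():
        for user, _item in rows:
            domains_per_user[user].add(domain)
    return {
        user: domains
        for user, domains in domains_per_user.items()
        if len(domains) >= min_domains
    }
```

With made-up toy rows, `cross_domain_users({"books": [("alice", "b1")], "music": [("alice", "s1"), ("bob", "s2")]})` keeps only `alice`, who appears in both domains.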


Page 15: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Applications

Collect ratings for recsys research / input
Cross-domain recsys research
Trend detection, analytics, ...
Applicable to all social sharing websites


Page 16: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Conclusions

Ratings scarcity in research
Public datasets are old and synthetic
Social sharing = ratings goldmine
2-week experiment, 4 major websites
Python code & datasets on GitHub
True cross-domain ratings dataset


Page 17: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Simon Dooms (@sidooms)

Mining Cross-Domain Rating Datasets from Structured Data on Twitter