Social Media Data Mining in Tagging Data - GBV · 2011. 5. 24.


Social Media: Data Mining in Tagging Data

Prof. Andreas Hotho, Universität Würzburg & Universität Kassel

29.12.2009

[Figure: example bookmark entry "Knowledge and Data Engineering" (Knowledge and Data Engineering Group at the University of Kassel), tagged "knowledge management data engineering" by jaeschke and 1 other person on 2006-01-27 10:39:07]

My research "tag cloud"

Trend Detection, Tag Recommender, Spam, LogSonomies, Semantic, Ranking, Graph Structures, Tag Similarity, Community Detection, Ontology Learning, Information Retrieval, Data Mining, Semantic Web, Social Network Analysis, Statistical Physics, Web 2.0, Machine Learning, Business Intelligence

Definition: Web 2.0

"Web 2.0 is a buzzword that stands for a set of interactive and collaborative elements of the Internet, specifically the WWW, and that, by analogy with the version numbers of software products, postulates a distinction from earlier kinds of use."

Wikipedia, http://de.wikipedia.org/wiki/Web_2.0

Tim O'Reilly shaped the term with the article "What is Web 2.0" (30 September 2005).

A Web 2.0 map

artwork by R. Munroe, http://xkcd.com/

Agenda

Introduction
Tagging
Ranking
Tag similarities
Recommender

[Figure: rank (0.05 to 0.4) per month (0 to 14) for the tags "blog", "css", "design", "linux", "music", "news", "programming", "software", "web"]

Bookmarks on the web

[Figure labels: Tags, User, Resource]

Bookmarks for audio streams

[Figure labels: Tags, Users, Resource]

Bookmarks for photos

[Figure labels: Tags, User, Resource]

Bookmarks for videos

[Figure labels: Tags, User, Resource]

Our system: BibSonomy

BibSonomy for sharing bookmarks and for managing literature lists

for researchers, for research groups, for projects, ...

http://www.bibsonomy.org

Folksonomies

Folksonomies allow users to assign tags to resources.

A folksonomy is a tuple $F := (U, T, R, Y, \prec)$ where $U$, $T$, and $R$ are finite sets, whose elements are called users, tags, and resources; $Y \subseteq U \times T \times R$ is called the set of tag assignments; and $\prec \subseteq U \times T \times T$ is a user-specific sub-tag/super-tag relation.

The personomy $P_u$ of user $u$ is the restriction of $F$ to $u$.
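The definition above can be sketched as a small data structure. This is only an illustration under stated assumptions: the sub-tag/super-tag relation is left out, and the names `TagAssignment`, `build_folksonomy`, `personomy` and the example data are hypothetical, not taken from the slides.

```python
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One element of Y: a user assigned a tag to a resource."""
    user: str
    tag: str
    resource: str


def build_folksonomy(assignments):
    """Derive the sets U, T, R and Y from (user, tag, resource) triples."""
    Y = {TagAssignment(*a) for a in assignments}
    U = {y.user for y in Y}
    T = {y.tag for y in Y}
    R = {y.resource for y in Y}
    return U, T, R, Y


def personomy(Y, user):
    """Restrict the tag assignments to a single user (the personomy P_u)."""
    Y_u = {y for y in Y if y.user == user}
    T_u = {y.tag for y in Y_u}
    R_u = {y.resource for y in Y_u}
    return T_u, R_u, Y_u


# Hypothetical example data.
example = [
    ("alice", "python", "http://example.org/a"),
    ("alice", "web", "http://example.org/b"),
    ("bob", "web", "http://example.org/a"),
]
U, T, R, Y = build_folksonomy(example)
T_alice, R_alice, Y_alice = personomy(Y, "alice")
```

Storing Y as a set of triples keeps the sketch close to the formal definition; a real system would index the triples by user, tag, and resource.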

Everyone is tagging…

a simple way to organize resources
immediately useful

However, the vocabulary is uncontrolled.

Evidence for a converging, shared vocabulary (emergent semantics):
shared implicit knowledge
mutual influence among users
underlying network (folksonomy)

[Figure labels: tag, user, resource]

http://xkcd.com/

Dataset

Data from the del.icio.us folksonomy site, obtained in July 2005 (14 monthly dumps, June 2004 – July 2005). It consists of

|U| = 75,242 users
|T| = 533,191 tags
|R| = 3,158,297 resources
|Y| = 17,362,212 triples

Power-law distribution in del.icio.us

The tag "unlabeled" occurs 415,950 times.
The tag "web" occurs 238,891 times.
About 40% of the tags occur exactly once.
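Figures of this kind come from counting how often each tag occurs in the tag assignments. The following is a minimal sketch on toy data; the function names and the sample triples are hypothetical and do not reproduce the del.icio.us numbers.

```python
from collections import Counter


def tag_frequencies(assignments):
    """Count how often each tag occurs in (user, tag, resource) triples."""
    return Counter(tag for _user, tag, _resource in assignments)


def share_of_singletons(freq):
    """Fraction of distinct tags that occur exactly once."""
    if not freq:
        return 0.0
    once = sum(1 for count in freq.values() if count == 1)
    return once / len(freq)


# Hypothetical toy data: a few frequent tags and a tail of rare ones.
toy = [
    ("u1", "web", "r1"), ("u2", "web", "r2"), ("u3", "web", "r3"),
    ("u1", "design", "r1"), ("u2", "design", "r3"),
    ("u3", "xkcd", "r2"), ("u1", "todo", "r3"),
]
freq = tag_frequencies(toy)
ranked = freq.most_common()          # tags sorted by decreasing frequency
singleton_share = share_of_singletons(freq)
```

Plotting the counts in `ranked` against their rank on log-log axes is the usual way to inspect such a distribution for a power law.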

Small World

Milgram coined the term "small world" (Stanley Milgram. The small world problem. Psychology Today, 67(1):61–67, 1967.)

Practical experiment in the USA: any two people in the USA are connected by a very short chain, the "six degrees of separation".

Folksonomies exhibit the so-called "small world" properties:
short characteristic path length
high clustering in the graph
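Both properties can be measured on a graph. The sketch below uses the standard definitions for an ordinary undirected graph (average shortest-path length and average local clustering coefficient). This is an assumption: the slides do not say how the measures were adapted to the user-tag-resource structure of a folksonomy, and the toy graph is hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def characteristic_path_length(adj):
    """Average shortest-path length over all connected ordered node pairs."""
    total = pairs = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average over all nodes of the fraction of linked neighbour pairs."""
    coefficients = []
    for node, neighbours in adj.items():
        nbs = list(neighbours)
        k = len(nbs)
        if k < 2:
            coefficients.append(0.0)
            continue
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbs[j] in adj[nbs[i]]
        )
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients) if coefficients else 0.0


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
path_length = characteristic_path_length(graph)
clustering = clustering_coefficient(graph)
```

A small-world graph combines a path length close to that of a random graph of the same size with a clustering coefficient that is much higher.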


Search in Folksonomies

Search engines need
1. to compute the hits for a query
2. and rank them.

The PageRank algorithm is very successful on the web (see Google).

Authority values are propagated along the hyperlinks according to

$x \leftarrow dAx + (1-d)p$

where $A$ is the row-stochastic adjacency matrix of the web graph (each row of $A$ is normalized to 1), $x$ is the rank vector, $p$ is the random surfer component (may be used as preference vector), and $d \in [0,1]$ is a weighting factor.

If $\|A\|_1 := \|p\|_1 := 1$ and there are no rank sinks, then the computation of a fixed point equals the computation of the first eigenvector of the matrix $dA + (1-d)p\mathbf{1}^T$.
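The fixed point can be approximated by repeating the update until the rank vector stops changing. The following sketch makes several assumptions that are not in the slides: the graph is given as a dictionary of out-links, every node passes its weight along its out-links in equal shares (one reading of the update with a row-normalized $A$), and the value `d=0.85` and the tiny example graph are purely illustrative.

```python
def pagerank(out_links, d=0.85, preference=None, tol=1e-12, max_iter=1000):
    """Power iteration for x <- d*A*x + (1-d)*p on a directed graph.

    out_links maps each node to the list of nodes it links to.
    preference is an optional dict of non-negative weights (the vector p).
    """
    nodes = sorted(out_links)
    n = len(nodes)
    if preference is None:
        p = {v: 1.0 / n for v in nodes}           # uniform random surfer
    else:
        total = sum(preference.values())
        p = {v: preference.get(v, 0.0) / total for v in nodes}

    x = dict(p)
    for _ in range(max_iter):
        new = {v: (1 - d) * p[v] for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if not targets:
                raise ValueError(f"rank sink: {v!r} has no out-links")
            share = d * x[v] / len(targets)       # equal share per out-link
            for t in targets:
                new[t] += share
        delta = sum(abs(new[v] - x[v]) for v in nodes)
        x = new
        if delta < tol:
            break
    return x


# Hypothetical three-page web graph.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
```

Passing a non-uniform `preference` biases the ranking towards the preferred nodes, which is the mechanism FolkRank builds on.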

Folksonomies have a different structure than the web graph.

[Figure: web graph vs. folksonomy, drawn with users (User 1 to User 4), tags (Tag 1 to Tag 3), and resources (Res 1 to Res 3)]

What could a ranking algorithm for this structure look like?

First Approach: Adapted PageRank

1. Split each hyperedge into six directed edges.

2. Iterative weight propagation according to PageRank:

$x \leftarrow dAx + (1-d)p$.

[Figure: the hyperedge (User 1, Tag 1, Res 1) before and after the split]
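Step 1 can be sketched as follows: every tag assignment (user, tag, resource) contributes the three node pairs user-tag, tag-resource, and user-resource, each in both directions. Weighting an edge by the number of tag assignments in which its two endpoints co-occur is an assumption of this sketch, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def hyperedges_to_graph(assignments):
    """Turn (user, tag, resource) hyperedges into a weighted directed graph.

    Returns weight[a][b] = number of tag assignments in which a and b co-occur.
    Nodes are (kind, name) pairs so that a user and a tag with the same
    name stay distinct.
    """
    weight = defaultdict(lambda: defaultdict(int))
    for user, tag, resource in set(assignments):
        u, t, r = ("user", user), ("tag", tag), ("res", resource)
        for a, b in ((u, t), (t, r), (u, r)):
            weight[a][b] += 1   # directed edge a -> b
            weight[b][a] += 1   # directed edge b -> a
    return weight


# Hypothetical toy data: one user tags two resources with the same tag.
graph = hyperedges_to_graph([("u1", "t1", "r1"), ("u1", "t1", "r2")])
```

Because every edge exists in both directions, the resulting graph is effectively undirected, which is why weight flows back during propagation.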

Ranking in Folksonomies: FolkRank

Problems of folksonomy-adapted PageRank:
dominated by graph structure
undirected: weight flows back (PageRank $\approx$ edge degree)

Differential approach:
compute rank with and without preferences
FolkRank = difference between those rankings normalized to [0,1]

Let $R_{AP}$ be the fixed point with $p = \mathbf{1}$.
Let $R_{pref}$ be the fixed point with $p$ representing the high weights for the preferred items.
$R := R_{pref} - R_{AP}$ is the final weight vector.
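The differential approach can be sketched end to end on toy data. Everything specific here is an assumption rather than the authors' implementation: the co-occurrence edge weights, the damping value `d=0.7`, the fixed number of iterations, the size of the preference boost (the number of nodes), and the example data. Both preference vectors are scaled to sum to 1 so that the two rankings are comparable.

```python
from collections import defaultdict


def build_graph(assignments):
    """Split each (user, tag, resource) hyperedge into weighted edges."""
    graph = defaultdict(lambda: defaultdict(int))
    for user, tag, resource in set(assignments):
        u, t, r = ("user", user), ("tag", tag), ("res", resource)
        for a, b in ((u, t), (t, r), (u, r)):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def propagate(graph, p, d=0.7, iterations=100):
    """Iterate x <- d*A*x + (1-d)*p with p scaled to sum to 1."""
    total = sum(p.values())
    p = {v: p[v] / total for v in graph}
    x = dict(p)
    for _ in range(iterations):
        new = {v: (1 - d) * p[v] for v in graph}
        for v, neighbours in graph.items():
            degree = sum(neighbours.values())
            for nb, w in neighbours.items():
                new[nb] += d * x[v] * w / degree   # weight-proportional share
        x = new
    return x


def folkrank(assignments, preferred, d=0.7):
    """R = R_pref - R_AP for the given preferred nodes."""
    graph = build_graph(assignments)
    uniform = {v: 1.0 for v in graph}              # p = 1
    pref = dict(uniform)
    for v in preferred:
        pref[v] += len(graph)                      # assumed size of the boost
    r_ap = propagate(graph, uniform, d)
    r_pref = propagate(graph, pref, d)
    return {v: r_pref[v] - r_ap[v] for v in graph}


# Hypothetical toy data with two unrelated topics.
toy = [
    ("alice", "python", "r1"), ("alice", "code", "r1"),
    ("bob", "python", "r2"),
    ("carol", "music", "r3"), ("carol", "radio", "r3"),
    ("dave", "music", "r4"),
]
scores = folkrank(toy, preferred=[("tag", "python")])
top = max(scores, key=scores.get)
```

Nodes close to the preferred tag gain weight relative to the baseline, while nodes in the unrelated part of the graph end up with negative scores.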

Results for: "Semantic Web"

[Figure columns: PageRank without preference, PageRank with preference, FolkRank with preference]

Rankings for "semanticweb"

for discovering semantic relationships, user communities, and web pages

29.12.2009, Gerd Stumme

Trends with respect to tag “politics”

US elections in Nov. 2004


Association rules

If a user has tagged resources with tag $t_i$, then he has often also used $t_j$ for them.

Application:
recommending tags
learning dependencies (tag hierarchy)

Task: find all rules of the form: customers who bought $i_1, \ldots, i_n$ also bought $j_1, \ldots, j_m$.
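For the simplest case of one tag on each side, such rules can be mined by counting. The sketch below treats every post (all tags one user gave one resource) as a transaction and keeps rules $t_i \rightarrow t_j$ that reach a minimum support and confidence. The choice of posts as transactions, the thresholds, and the toy data are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import permutations


def tag_rules(assignments, min_support=2, min_confidence=0.6):
    """Mine rules t_i -> t_j from (user, tag, resource) triples.

    Returns (t_i, t_j, support, confidence) tuples, best rules first.
    """
    # One transaction per post: the tags a user assigned to a resource.
    posts = defaultdict(set)
    for user, tag, resource in assignments:
        posts[(user, resource)].add(tag)

    single = defaultdict(int)   # number of posts containing a tag
    pair = defaultdict(int)     # number of posts containing both tags
    for tags in posts.values():
        for t in tags:
            single[t] += 1
        for a, b in permutations(tags, 2):
            pair[(a, b)] += 1

    rules = []
    for (a, b), count in pair.items():
        confidence = count / single[a]
        if count >= min_support and confidence >= min_confidence:
            rules.append((a, b, count, confidence))
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical toy data.
toy = [
    ("u1", "ajax", "r1"), ("u1", "javascript", "r1"),
    ("u2", "ajax", "r1"), ("u2", "javascript", "r1"), ("u2", "web", "r1"),
    ("u3", "javascript", "r2"), ("u3", "web", "r2"),
    ("u1", "web", "r3"),
]
rules = tag_rules(toy)
```

Here every post tagged "ajax" is also tagged "javascript", so the rule ajax → javascript has confidence 1.0 and could be used to suggest "javascript" when a user enters "ajax".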

Folksonomy Dataset

Del.icio.us crawl 2006:
|U| = 667,128
|T| = 2,454,546
|R| = 18,782,132
|Y| = 140,333,714

Excerpt: 10,000 most popular tags:
|U| = 476,378
|T| = 10,000
|R| = 12,660,470
|Y| = 101,491,722

In the following: tag rank = position in most-popular list:
1: design
2: software
3: blog
4: web
…

social similarity

Semantic relationships between tags in bookmarking systems

Most related tags per measure:

context
art: graphic, creative, print, portfolios, nice
web2.0: web2, web-2.0, webapp, “web, web_2.0
news: blogs, people, weblog, culture, future
howto: how-to, guide, tutorials, help, how_to
video: entertainment, awesome, fun, cool, random
ajax: dhtml, dom, js, ecmascript, webdev
tutorial: tutorials, tips, coding, code, examples
javascript: webdevelopment, webdev, example, examples, webprogramming

freq
art: design, photography, illustration, blog, graphics
web2.0: ajax, web, tools, blog, webdesign
news: blog, technology, politics, media, daily
howto: tutorial, reference, tips, linux, programming
video: music, funny, tv, software, media
ajax: javascript, web2.0, web, programming, webdesign
tutorial: howto, programming, reference, design, css
javascript: ajax, programming, css, web, webdesign

Ciro Cattuto, Dominik Benz, Andreas Hotho, and Gerd Stumme. Semantic Grounding of Tag Relatedness in Social Bookmarking Systems. The Semantic Web - ISWC 2008, 615–631, 2008.
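The difference between a frequency-based and a context-based measure can be illustrated on toy data. The sketch assumes that "freq" means the number of posts in which two tags co-occur and that "context" means the cosine similarity between the co-occurrence vectors of two tags; these readings, the function names, and the data are assumptions, not a reproduction of the cited paper.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence(assignments):
    """cooc[a][b] = number of posts in which tags a and b appear together."""
    posts = defaultdict(set)
    for user, tag, resource in assignments:
        posts[(user, resource)].add(tag)
    cooc = defaultdict(lambda: defaultdict(int))
    for tags in posts.values():
        for a, b in combinations(sorted(tags), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def cosine(cooc, a, b):
    """Cosine similarity between the co-occurrence vectors of two tags."""
    va, vb = cooc[a], cooc[b]
    dot = sum(w * vb[t] for t, w in va.items() if t in vb)
    norm = sqrt(sum(w * w for w in va.values())) * sqrt(sum(w * w for w in vb.values()))
    return dot / norm if norm else 0.0


def most_related_by_freq(cooc, tag):
    """Tag that co-occurs most often with the given tag."""
    return max(cooc[tag], key=cooc[tag].get)


def most_related_by_cosine(cooc, tag):
    """Tag whose co-occurrence vector is most similar to that of the given tag."""
    candidates = [t for t in cooc if t != tag]
    return max(candidates, key=lambda t: cosine(cooc, tag, t))


# Hypothetical toy data: "java" and "python" never co-occur but share a context.
toy = [
    ("u1", "java", "r1"), ("u1", "programming", "r1"), ("u1", "code", "r1"),
    ("u2", "java", "r2"), ("u2", "programming", "r2"),
    ("u3", "python", "r3"), ("u3", "programming", "r3"), ("u3", "code", "r3"),
    ("u4", "python", "r4"), ("u4", "programming", "r4"),
    ("u5", "programming", "r5"), ("u5", "web", "r5"),
]
cooc = cooccurrence(toy)
by_freq = most_related_by_freq(cooc, "java")
by_cosine = most_related_by_cosine(cooc, "java")
```

On this data the frequency measure picks the more general tag, while the cosine measure picks a tag that is used in the same way.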

Example of semantic grounding

Original tag: "java"

Most similar tag:
Freq, FolkRank: "programming"
Cosine: "python"

[Fig.: WordNet synset hierarchy with the nodes computers, programming, languages, design_patterns, java, python]

[Figure: shortest paths in WordNet; length of the shortest path to the most related tag; labels: siblings, random]

Results for delicious together with similarity pruning


Personalized Tag Recommendation


Datasets

Pruning the graph based on the post degree (compute the p-core at level k, cf. Batagelj and Zaversnik 2002)

Characteristics of the p-cores at level k.
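The pruning step can be sketched as an iterative filter. The sketch assumes that the p-core at level k is the largest subset of the data in which every user, every tag, and every resource occurs in at least k posts, where a post is the set of tags one user assigned to one resource. That reading, the function name, and the toy data are assumptions; the cited algorithm may be organised differently.

```python
from collections import Counter


def post_core(assignments, k):
    """Prune (user, tag, resource) triples until every user, tag and
    resource occurs in at least k posts."""
    Y = set(assignments)
    while True:
        posts = {(u, r) for u, _t, r in Y}
        user_posts = Counter(u for u, _r in posts)
        res_posts = Counter(r for _u, r in posts)
        # Triples are unique, so counting triples per tag counts posts per tag.
        tag_posts = Counter(t for _u, t, _r in Y)
        kept = {
            (u, t, r)
            for u, t, r in Y
            if user_posts[u] >= k and res_posts[r] >= k and tag_posts[t] >= k
        }
        if kept == Y:            # nothing was removed: the core is stable
            return Y
        Y = kept


# Hypothetical toy data.
toy = [
    ("u1", "a", "r1"), ("u1", "b", "r1"), ("u1", "a", "r2"),
    ("u2", "a", "r1"), ("u2", "a", "r2"), ("u2", "c", "r2"),
    ("u3", "a", "r3"),
]
core = post_core(toy, 2)
```

The loop is needed because removing one element can push another below the threshold.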

Results delicious: post core precision/recall plot at level 10

[Figure: precision (0 to 1) over recall (0 to 1) for FolkRank, Collaborative Filtering UT, most popular tags by resource, Collaborative Filtering UR, adapted PageRank, most popular tags]
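A precision/recall curve of this kind can be computed by recommending the top n tags for a held-out post and varying n. The sketch uses the simplest baseline from the legend, most popular tags by resource; the evaluation protocol, the function names, and the toy data are assumptions and do not reproduce the reported results.

```python
from collections import Counter


def most_popular_by_resource(train, resource, n):
    """Recommend the n tags most often assigned to the resource in train."""
    counts = Counter(t for _u, t, r in train if r == resource)
    return [tag for tag, _count in counts.most_common(n)]


def precision_recall(recommended, actual):
    """Precision and recall of a recommended tag list against the true tags."""
    hits = len(set(recommended) & set(actual))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical training data and one held-out post on resource "r1".
train = [
    ("u1", "web", "r1"), ("u2", "web", "r1"), ("u3", "web", "r1"),
    ("u1", "design", "r1"), ("u2", "design", "r1"),
    ("u3", "css", "r1"),
]
held_out_tags = {"web", "css", "howto"}

# One (precision, recall) point per number of recommended tags.
curve = [
    precision_recall(most_popular_by_resource(train, "r1", n), held_out_tags)
    for n in (1, 2, 3)
]
```

Averaging these points over all held-out posts gives one curve per recommender.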
