Christian Körner 1, Dominik Benz 2, Andreas Hotho 3, Markus Strohmaier 1, Gerd Stumme 2 Stop...
-
Upload
martin-morrison -
Category
Documents
-
view
213 -
download
0
Transcript of Christian Körner 1, Dominik Benz 2, Andreas Hotho 3, Markus Strohmaier 1, Gerd Stumme 2 Stop...
Christian Körner1, Dominik Benz2, Andreas Hotho3,
Markus Strohmaier1, Gerd Stumme2
Stop thinking, start tagging:Tag Semantics arise from
Collaborative Verbosity
1Knowledge Management Institute
and Know Center,Graz University of
Technology, Austria
2Knowledge and Data Engineering Group (KDE), University of
Kassel, Germany
3Data Mining and Information Retrieval
GroupUniversity of Würzburg,
Germany
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 2 / 20
Where do Semantics come from?
Semantically annotated content is the „fuel“ of the next generation World Wide Web – but where is the petrol station?
Expert-built expensive
Evidence for emergent semantics in Web2.0 data Built by the crowd!
Which factors influence emergence of semantics?
Do certain users contribute more than others?
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 3 / 20
The Story
Emergent Tag Semantics
Pragmatics of tagging
Semantic Implications of Tagging Pragmatics
Conclusions
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 4 / 20
Emergent Tag Semantics
tagging is a simple and intuitive way to organize all kinds of resources
uncontrolled vocabulary, tags are „just strings“
formal model: folksonomy F = (U, T, R, Y) Users U, Tags T, Resources R Tag assignments Y (UTR)
evidence of emergent semantics Tag similarity measures can
identify e.g. synonym tags (web2.0, web_two)
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 5 / 20
Tag Similarity Measures: Tag Context Similarity
Tag Context Similarity is a scalable and precise tag similarity measure [Cattuto2008,Markines2009]: Describe each tag as a context vector Each dimension of the vector space correspond to
another tag; entry denotes co-occurrence count Compute similar tags by cosine similarity
5 30 1 10 50design
software blog web programming
…JAVA
Will be used as indicator of emergent semantics!
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 6 / 20
= tag
Assessing the Quality of Tag Semantics
JCN(t,tsim) = 3.68TagCont(t,tsim) = 0.74
Folksonomy Tags= synset
WordNet Hierarchy
Mapping
Average JCN(t,tsim) over all tags t: „Quality of semantics“
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 7 / 20
The Story
Pragmatics of tagging
Semantic Implications of Tagging Pragmatics
Conclusions
Tag Similarity measures can capture emergent tag semantics
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 8 / 20
Tagging motivation
Evidence of different ways HOW users tag (Tagging Pragmatics)
Broad distinction by tagging motivation [Strohmaier2009]:
donuts
duff
margebeer
bart
barty
Duff-beer
bev
alc nalc
beer wine
„Categorizers“…
- use a small controlled tag vocabulary
- goal: „ontology-like“ categorization by tags, for later browsing
- tags a replacement for folders
„Describers“…
- tag „verbously“ with freely chosen words
- vocabulary not necessarily consistent (synomyms, spelling variants, …)
- goal: describe content, ease retrieval
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 9 / 20
Tagging Pragmatics: Measures
How to disinguish between two types of taggers? Intuition: Describers use open set of many tags,
Categorizers use small set of controlled tags:
Vocabulary size:
Tag / Resource ratio:
Average # tags per post:
high
low
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 10 / 20
Tagging Pragmatics: Measures
Next Intuition: Describers don‘t care about „abandoned“ tags, Categorizers do
Orphan ratio:
R(t): set of resources tagged by user u with tag t
high
low
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 11 / 20
Tagging pragmatics: Limitations of measures
Real users: no „perfect“ Categorizers / Describers, but „mixed“ behaviour
Possibly influenced by user interfaces / recommenders
Measures are correlated
But: independent of semantics; measures capture usage patterns
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 12 / 20
The Story
Semantic Implications of Tagging Pragmatics
Conclusions
Tag Similarity measures can capture emergent tag semantics
Measures of tagging pragmatics differentiate users by tagging motivation
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 13 / 20
Influence of Tagging Pragmatics on Emergent Semantics
Idea: Can we learn the same (or even better) semantics from the folksonomy induced by a subset of describers / categorizers?
Extreme Categorizers
Extreme Describers
Complete folksonomy
Subset of 30% categorizers
= user
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 14 / 20
Experimental setup
1. Apply pragmatic measures vocab, trr, tpp, orphan to each user2. Systematically create „sub-folksonomies“ CFi / DFi by subsequently
adding i % of Categorizers / Describers (i = 1,2,…,25,30,…,100)
3. Compute similar tags based on each subset (TagContext Sim.)4. Assess (semantic) quality of similar tags by avg. JCN distance
TagCont(t,tsim)= …
JCN(t,tsim)= …
DF20CF5
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 15 / 20
Dataset
From Social Bookmarking Site Delicious in 2006 ORIGINAL Two filtering steps (to make measures more meaningful):
Restrict to top 10.000 tags FULL Keep only users with > 100 resources MIN100RES
dataset |T| |U| |R| |Y|
ORIGINAL 2,454,546 667,128 18,782,132
140,333,714
FULL 10,000 511,348 14,567,465
117,319,016
MIN100RES
9,944 100,363 12,125,176
96,298,409
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 16 / 20
Results – adding Describers (DFi)
Almost all sub-folksonomies are better than random-picked ones
40% of describers according to trr outperform complete data!
Optimal performance for 70% describers (trr)
more describers
bett
er
sem
an
tics
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 17 / 20
Results – adding Categorizers (CFi)
Almost all sub-folksonomies are worse than random-picked ones
Global optimum for 90% categorizers (tpp) removing 10% most extreme describers!(Spammers?)
bett
er
sem
an
tics
more categorizers
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 18 / 20
The Story
Tag Similarity measures can capture emergent tag semantics
Measures of tagging pragmatics differentiate users by tagging motivation
Sub-folksonomies introduced by measures of pragmatics show different semantic qualities
Conclusions
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 19 / 20
Summary & Conclusions
Introduction of measures of users‘ tagging motivation (Categorizers vs. Describers)
Evidence for causal link between tagging pragmatics (HOW people use tags) and tag semantics (WHAT tags mean)
„Mass matters“ for „wisdom of the crowd“, but composition of crowd makes a difference („Verbosity“ of describers in general better, but with a limitation)
Relevant for tag recommendation and ontology learning algorithms
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 20 / 20
Guess who‘s a Categorizer from the authors
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 21 / 20
Thanks for the attention! Questions? Be verbous
Tag Similarity measures can capture emergent tag semantics
Measures of tagging pragmatics differentiate users by tagging motivation
Sub-folksonomies introduced by measures of pragmatics show different semantic qualities
Evidende of causal link between pragmatics and semantics of tagging!
[email protected]@cs.uni-kassel.de
30.04.2010Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 22 / 20
References
[Cattuto2008] Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme: Semantic Grounding of Tag Relatedness in Social Bookmarking Systems. In: Proc. 7th Intl. Semantic Web Conference (2008), p. 615-631
[Markines2009] Benjamin Markines, Ciro Cattuto, Filippo Menczer, Dominik Benz, Andreas Hotho, Gerd Stumme: Evaluating Similarity Measures for Emergent Semantics of Social Tagging. In: Proc. 18th Intl. World Wide Web Conference (2009), p.641-641
[Strohmaier2009] Markus Strohmaier, Christian Körner, Roman Kern: Why do users tag? Detecting users‘ motivation for tagging in social tagging systems. Technical Report, Knowledge Management Institute – Graz University of Technology (2009)