Tags as Tools for Social Classification
Dr. Isabella Peters
Department of Information Science, Institute for Language and Information
Heinrich-Heine-University Düsseldorf, Germany
34th Annual Conference of the German Classification Society, July 2010
July 21, 2010 GfKl Symposium 2010 2
Outline
Theoretical assumptions:
• Social classification can be based on folksonomies
• Power Tags are the most relevant tags
• Tag distributions on resource level become stable
Three main research questions:
• How to build social classifications (automatically)?
• Are Power Tags most relevant for a resource?
• (When do tag distributions become stable?)
Results:
• Based on a study with students of the University of Düsseldorf
Assumption I
Social classification can be based on folksonomies
• Folksonomy = sum of all tags of all users of a collaborative information service (e.g. delicious)
• Platform folksonomy vs. resource folksonomy
• Broad folksonomy (delicious) vs. narrow folksonomy (youtube)
• Social classification = collaborative knowledge representation with natural-language terms = “social categorization”
Assumption I
Social classification can be based on folksonomies
• The resource folksonomy reflects, via its tags, the collective intelligence of users in giving meaning to the resource
• The most popular tags are the most important tags for the resource = Power Tags
  Only observable in broad folksonomies, because there the same tag can be assigned to a resource multiple times!
• Folksonomies deliver concept candidates for social classification
Method I
Social classification can be based on folksonomies
• Aim: Finding tag pairs for the construction of a social classification
• Step 1: Calculating Power Tags for the resource
– Number n of Power Tags depends on the type of tag distribution
  • Power law: n = exponent
  • Inverse-logistic distribution: n = tags left of the turning point
[Figure: two example tag distributions over Tag 1 to Tag 10, one labeled "Power Law" and one labeled "Inverse-logistic distribution".]
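The rule for n in Step 1 can be sketched in Python. This is only an illustration under stated assumptions, not the procedure used in the study: the distribution type is passed in by the caller, the power-law exponent is estimated with a simple least-squares fit on log-log values, and the turning point of an inverse-logistic distribution is approximated as the position of the steepest drop between neighboring ranks. All function names and the sample frequencies are hypothetical.

```python
import math


def power_law_exponent(freqs):
    """Estimate a in f(rank) ~ C * rank**(-a) by least squares on log-log values."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance


def tags_left_of_turning_point(freqs):
    """Assumed heuristic: the turning point lies at the steepest drop in frequency."""
    drops = [freqs[i] - freqs[i + 1] for i in range(len(freqs) - 1)]
    return drops.index(max(drops)) + 1


def count_power_tags(freqs, distribution):
    """Return n, the number of Power Tags, for a distribution type given by the caller."""
    ordered = sorted(freqs, reverse=True)
    if distribution == "power_law":
        return max(1, round(power_law_exponent(ordered)))
    if distribution == "inverse_logistic":
        return tags_left_of_turning_point(ordered)
    raise ValueError(f"unknown distribution type: {distribution}")


# Invented sample frequencies, ordered by tag rank.
power_law_counts = [160 / rank ** 2 for rank in range(1, 11)]
inverse_logistic_counts = [100, 98, 95, 90, 40, 20, 10, 5, 3, 2]

n_power_law = count_power_tags(power_law_counts, "power_law")
n_inverse_logistic = count_power_tags(inverse_logistic_counts, "inverse_logistic")
```

With these invented numbers the power-law resource gets two Power Tags and the inverse-logistic resource gets four. Deciding automatically which distribution type applies is left out of the sketch.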
Method I
Social classification can be based on folksonomies
• Step 2: Calculating co-occurrence for Power Tags and tags of the platform folksonomy
• Basis = Power Tags I from resource level
• Power Tags II = co-occurring tags from platform level
• Tag pair is most valuable for social categorization because it reflects collective user intelligence
[Figure: two tag distributions with the legend entries "Power Tags I" and "Power Tags II". One chart (axis 0 to 20) shows the tags android, development, google, mobile, programming, generator, tools, app, tools, apis; the other (axis 0 to 70000) shows mobile, google, development, phone, programming, apps, software, htc, hack, hero, desire.]
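Step 2 can be sketched as a co-occurrence count over the posts of the platform folksonomy. This is a minimal sketch under assumptions: each post is represented as a list of tags, the number n of Power Tags II is supplied by the caller, and the function names and sample posts are invented for illustration.

```python
from collections import Counter


def cooccurring_tags(power_tag, posts):
    """Count how often every other tag appears in a post together with power_tag."""
    counts = Counter()
    for tags in posts:
        tag_set = set(tags)
        if power_tag in tag_set:
            counts.update(tag_set - {power_tag})
    return counts


def tag_pairs(power_tags_1, posts, n):
    """Pair each Power Tag I with its n most frequent co-occurring tags (Power Tags II)."""
    pairs = []
    for tag in power_tags_1:
        for other, _count in cooccurring_tags(tag, posts).most_common(n):
            pairs.append((tag, other))
    return pairs


# Invented platform-level posts; each inner list is the tag set of one bookmark.
posts = [
    ["android", "mobile", "google"],
    ["android", "mobile", "development"],
    ["android", "google", "mobile"],
    ["web2.0", "tools"],
]

pairs = tag_pairs(["android"], posts, n=2)
```

Here "android" co-occurs three times with "mobile" and twice with "google", so the resulting pairs are (android, mobile) and (android, google).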
Research Question I
How to build social classifications (automatically)?
• Step 3: Determination of Power Tags I and II can be carried out automatically
  1) Identifying distribution type
  2) Labeling first n tags as Power Tags I
  3) Identifying co-occurring tags
  4) Identifying distribution type of the co-occurring tags
  5) Extracting first n tags as Power Tags II
  6) Combining Power Tags I and Power Tags II as tag pairs
• Step 4: Intellectual determination of the relationship between Power Tags I and Power Tags II, collaboratively or individually
Research Question I
How to build social classifications (automatically)?
Examples:
1. a) Power Tags I
– Android
1. b) Power Tags II
– Mobile
– Google
2. a) Power Tags I
– Web 2.0
2. b) Power Tags II
– Tools
– Social
– Blog
– Socialsoftware
– Bookmarks
– Community
– Tagging
– Web
– AJAX
– online

descriptor set          relation
Android
  RT  mobile            related term              association
  RT  Google            related term              association

descriptor set          relation
Web 2.0
  UF  Socialsoftware    used for                  synonymy
  BT  web               broader term              hierarchy
  NTP blog              narrower term partitive   meronymy
  NTP bookmarks         narrower term partitive   meronymy
  NTP tagging           narrower term partitive   meronymy
  NTP community         narrower term partitive   meronymy
  NTP ajax              narrower term partitive   meronymy
  RT  online            related term              association
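The two descriptor sets can also be written down as data. The sketch below is one possible representation, not a format proposed in the talk: the class and function names are hypothetical, while the relation codes and the entries are taken from the example above.

```python
from dataclasses import dataclass

# Relation codes as used in the example: code -> (name, kind of relation).
RELATION_TYPES = {
    "UF": ("used for", "synonymy"),
    "BT": ("broader term", "hierarchy"),
    "NTP": ("narrower term partitive", "meronymy"),
    "RT": ("related term", "association"),
}


@dataclass(frozen=True)
class Relation:
    descriptor: str  # Power Tag I
    code: str        # key of RELATION_TYPES
    target: str      # Power Tag II

    def describe(self):
        name, kind = RELATION_TYPES[self.code]
        return f"{self.descriptor} {self.code} {self.target} ({name}, {kind})"


relations = [
    Relation("Android", "RT", "mobile"),
    Relation("Android", "RT", "Google"),
    Relation("Web 2.0", "UF", "Socialsoftware"),
    Relation("Web 2.0", "BT", "web"),
    Relation("Web 2.0", "NTP", "blog"),
    Relation("Web 2.0", "NTP", "bookmarks"),
    Relation("Web 2.0", "NTP", "tagging"),
    Relation("Web 2.0", "NTP", "community"),
    Relation("Web 2.0", "NTP", "ajax"),
    Relation("Web 2.0", "RT", "online"),
]


def targets(descriptor, code):
    """Return all terms linked to descriptor by the given relation code."""
    return [r.target for r in relations if r.descriptor == descriptor and r.code == code]
```

For example, targets("Web 2.0", "NTP") lists the five partitive narrower terms of Web 2.0. The relation code for each pair still has to be assigned intellectually (Step 4); the structure only stores the result.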
Assumption II
Power Tags are most relevant tags
• To build social classifications based on Power Tags, an important precondition must be fulfilled:
– Power Tags ARE the most relevant tags for a resource
• Problem: relevance judgments as well as tagging behaviour are highly subjective and error-prone (regarding spelling etc.)
• Is the collective intelligence of users capable of “ironing out” overly personal and erroneous tags so that all users are satisfied with the high-frequency tags?
Method II
Power Tags are most relevant tags
• Investigation of 30 resources downloaded from delicious in February 2010
• Participants: 20 students of Information Science at the HHU Düsseldorf
• All resources were tagged with “folksonomy” and tagged by at least 100 users
– To guarantee that students are technically able to judge the relevance of tags
– To guarantee that broad tag distributions can be used as test sample
• User evaluation
– Tag is relevant for resource = indicated with 1
– Tag is not relevant for resource = indicated with 0
– Students had access to the resource
– Students did not know the delicious rank of the tags
– Relevance distribution of tags for every resource by student judgments
Research Question II
Are Power Tags most relevant for a resource?
• Determination of relevance: 50% or more of the students judged the tag as relevant
• Extraction of the Top 10 delicious tags
• How many students called these Top 10 tags relevant?
• Calculation of the relative frequency of the students' relevance judgments
Average Pearson correlation ≈ 0.49, N = 30
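The relevance criterion can be sketched as follows. The sketch assumes that the judgments for one resource are stored as a list of 0/1 votes per tag, one vote per student; the function names and the sample votes are invented, and the Pearson correlation reported above is not reproduced here.

```python
def relative_relevance(judgments):
    """Share of students who marked each tag as relevant (votes are 0 or 1)."""
    return {tag: sum(votes) / len(votes) for tag, votes in judgments.items()}


def relevant_tags(judgments, threshold=0.5):
    """Tags that at least `threshold` of the students judged as relevant."""
    shares = relative_relevance(judgments)
    return [tag for tag, share in shares.items() if share >= threshold]


# Invented votes of four students for three tags of one resource.
judgments = {
    "folksonomy": [1, 1, 1, 0],
    "tagging": [1, 1, 0, 0],
    "toread": [0, 0, 1, 0],
}

shares = relative_relevance(judgments)
relevant = relevant_tags(judgments)
```

With these votes, "folksonomy" (0.75) and "tagging" (0.5) count as relevant, while "toread" (0.25) does not.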
Research Question II
Are Power Tags most relevant for a resource?
• Result: only the first two tags are relevant
• Strong indication for Power Tags
• Problems in relevance judgments
– Bias towards German tags
– No unification of spelling variants; solution: tag gardening (NLP)
– No combination of phrase tags
[Figure: chart over Rank 1 to Rank 10 with a vertical axis from 0.00 to 0.90.]
Assumption III
Tag distributions on resource level become stable
• Studies showed that the shape of tag distributions remains stable after reaching a particular number of tags and users
– Kipp & Campbell (2006)
– Maarek et al. (2006)
– Halpin, Robu, & Shepherd (2007)
– Maass, Kowatsch, & Münster (2007)
– Maier & Thalmann (2007)
Assumption III
Tag distributions on resource level become stable
• If this assumption is true and “stable” is considered as
– No rank permutations of tags appear anymore
– Relative number of tags does not change anymore
it means that …
– Power Tags I and II are like a controlled vocabulary for a resource
– Users have reached consensus in describing and tagging the resource, visualized in Power Tags
– Tags in the Long Tail of the distribution may be synonyms, tags with typing errors, narrower concepts, etc.
Open Research Question III
When do tag distributions become stable?
• To automate classification processes we need to know after which number of tagging users a tag distribution remains stable and when no changes in the ranking of tags appear anymore
• After that we can extract Power Tags for social classification for the particular resource
Open Research Question & Method III
When do tag distributions become stable?
• Comparison of the tag distribution with n users and the final tag distribution (downloaded at a point in time)
• Calculation of the relative frequency of every tag, rel. freq(t1 … tn), for particular user numbers
• Calculation of the average distance between the final tag distribution and the tag distribution with n users
– Subtraction of ∑ rel. freq(tn, fd) of the final distribution and ∑ rel. freq(tn, td) of the tag distribution with n users
• Stability achieved when ∑ rel. freq(tn, fd) − ∑ rel. freq(tn, td) < threshold value
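The stability check can be sketched in Python. The slide states the criterion as a difference of summed relative frequencies; the sketch below uses one possible reading of the "average distance", namely the mean absolute difference of the relative frequencies per tag. That reading, the function names, the sample counts and the threshold of 0.05 are assumptions for illustration only.

```python
from collections import Counter


def relative_frequencies(tag_counts):
    """Convert absolute tag counts into relative frequencies."""
    total = sum(tag_counts.values())
    return {tag: count / total for tag, count in tag_counts.items()}


def average_distance(counts_n_users, counts_final):
    """Mean absolute difference of relative tag frequencies (assumed distance measure)."""
    rel_n = relative_frequencies(counts_n_users)
    rel_final = relative_frequencies(counts_final)
    tags = set(rel_n) | set(rel_final)
    total = sum(abs(rel_final.get(t, 0.0) - rel_n.get(t, 0.0)) for t in tags)
    return total / len(tags)


def is_stable(counts_n_users, counts_final, threshold):
    """True when the distribution after n users is within threshold of the final one."""
    return average_distance(counts_n_users, counts_final) < threshold


# Invented tag counts for one resource at three points in time.
final = Counter({"folksonomy": 50, "tagging": 30, "web2.0": 20})
early = Counter({"folksonomy": 6, "tagging": 2, "web2.0": 2})
later = Counter({"folksonomy": 25, "tagging": 15, "web2.0": 10})

early_stable = is_stable(early, final, threshold=0.05)
later_stable = is_stable(later, final, threshold=0.05)
```

In this example the early snapshot is not yet stable, while the later snapshot already has the same relative frequencies as the final distribution. A check for rank permutations, the second stability criterion named earlier, would have to be added separately.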
Conclusion
• Social classification can be based on folksonomies
– Power Tags are concept candidates
• Extraction of Power Tags I and II pairs can be carried out automatically
• Determination of the relationship inherent in tag pairs requires intellectual processing
• Power Tags are most relevant tags
• Relevance of tags can be enhanced through tag gardening: unification and combination of similar tags (here: not synonyms but spelling variants)
• Ongoing research: when do tag distributions become stable?
Conclusion
[Diagram: workflow with the following boxes]
• What type of tag distribution?
• Tag distribution stable?
• Extraction of Power Tags I & II
• Pairs of relevant Power Tags
• Candidate vocabulary
• Definition of concepts and of semantic relations
• Intellectual structuring
• Social knowledge organization system
Legend: automatic processing, intellectual processing
Comments? Questions?
Isabella Peters: [email protected]
Greetings from Düsseldorf!
This presentation is available on SlideShare: http://www.slideshare.net/isabellapeters.
References
• Halpin, H., Robu, V., & Shepherd, H. (2007). The Complex Dynamics of Collaborative Tagging. In C. L. Williamson, M. E. Zurko, P. F. Patel-Schneider, & P. J. Shenoy (Eds.), Proceedings of the 16th International WWW Conference, Banff, Alberta, Canada (pp. 211-220). New York: ACM.
• Kipp, M., & Campbell, D. (2006). Patterns and Inconsistencies in Collaborative Tagging Systems: An Examination of Tagging Practices. In Proceedings of the 17th Annual Meeting of the American Society for Information Science and Technology, Austin, Texas, USA.
• Maarek, Y., Marnasse, N., Navon, Y., & Soroka, V. (2006). Tagging the Physical World. In Proceedings of the Collaborative Web Tagging Workshop at WWW 2006, Edinburgh, Scotland.
• Maass, W., Kowatsch, T., & Münster, T. (2007). Vocabulary Patterns in Free-for-all Collaborative Indexing Systems. In Proceedings of International Workshop on Emergent Semantics and Ontology Evolution, Busan, Korea (pp. 45–57).
• Maier, R., & Thalmann, S. (2007). Kollaboratives Tagging zur inhaltlichen Beschreibung von Lern- und Wissensressourcen. In R. Tolksdorf & J. Freytag (Eds.), Proceedings of XML Tage, Berlin, Germany (pp. 75–86). Berlin: Freie Universität Berlin.
• Peters, I. (2009). Folksonomies: Indexing and Retrieval in Web 2.0. Berlin: De Gruyter, Saur.
• Peters, I., & Stock, W. G. (2010). "Power Tags" in Information Retrieval. Library Hi Tech, 28(1), 81-93.
• Peters, I., & Weller, K. (2008). Tag Gardening for Folksonomy Enrichment and Maintenance. Webology, 5(3), Article 58, from http://www.webology.ir/2008/v5n3/a58.html.
• Stock, W.G. (2006). On Relevance Distributions. Journal of the American Society for Information Science and Technology, 57(8), 1126-1129.