Tags as tools for social classification


Dr. Isabella Peters

Department of Information Science, Institute for Language and Information

Heinrich-Heine-University Düsseldorf, Germany

34th Annual Conference of the German Classification Society, July 2010

July 21, 2010, GfKl Symposium 2010

Outline

Theoretical assumptions:
• Social classification can be based on folksonomies
• Power Tags are most relevant tags
• Tag distributions on resource level become stable

Three main research questions:
• How to build social classifications (automatically)?
• Are Power Tags most relevant for a resource?
• (When do tag distributions become stable?)

Results
• Based on a study with students of the University of Düsseldorf


Assumption I

Social classification can be based on folksonomies
• Folksonomy = sum of all tags of all users of a collaborative information service (e.g. delicious)

• Platform folksonomy vs. resource folksonomy

• Broad folksonomy (delicious) vs. narrow folksonomy (youtube)

• Social classification = collaborative knowledge representation with natural-language terms = “social categorization”


Assumption I

Social classification can be based on folksonomies
• Through its tags, the resource folksonomy reflects collective user intelligence in giving meaning to the resource

• Most popular tags are the most important tags for the resource = Power Tags

Only observable in broad folksonomies, because there the same tag can be assigned to a resource multiple times!

• Folksonomies deliver concept candidates for social classification


Method I

• Aim: Finding tag pairs for the construction of a social classification

• Step 1: Calculating Power Tags for a resource
  – Number n of Power Tags depends on the type of tag distribution:
    • Power law: n = exponent
    • Inverse-logistic distribution: n = tags left of the turning point

[Figure: two example tag distributions over Tag 1 to Tag 10, panels “Power Law” and “Inverse-logistic distribution”]
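A minimal Python sketch of Step 1, assuming the tag counts of one resource are available as a dictionary. The slides state only that n equals the exponent for a power law and the number of tags left of the turning point for an inverse-logistic distribution; the least-squares estimate of the exponent, the rounding, and the "steepest drop" rule for locating the turning point are assumptions made for illustration, and all function names are hypothetical.

```python
import math


def power_law_exponent(freqs: list[float]) -> float:
    """Estimate a in f(r) = C * r**(-a) by least squares on log-log data."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var  # the fitted slope equals -a


def n_for_power_law(freqs: list[float]) -> int:
    # Power law: n = exponent, rounded to a whole number of tags (assumption).
    return max(1, round(power_law_exponent(freqs)))


def n_for_inverse_logistic(freqs: list[float]) -> int:
    # Hypothetical turning-point rule: the steepest drop between neighbouring
    # ranks; every tag left of that drop counts as a Power Tag.
    drops = [freqs[i] - freqs[i + 1] for i in range(len(freqs) - 1)]
    steepest = max(range(len(drops)), key=lambda i: drops[i])
    return steepest + 1


def power_tags(tag_counts: dict[str, float], distribution: str) -> list[str]:
    """Return the Power Tags of one resource, most frequent first."""
    ranked = sorted(tag_counts.items(), key=lambda kv: kv[1], reverse=True)
    freqs = [count for _, count in ranked]
    if distribution == "power_law":
        n = n_for_power_law(freqs)
    elif distribution == "inverse_logistic":
        n = n_for_inverse_logistic(freqs)
    else:
        raise ValueError(f"unknown distribution type: {distribution}")
    return [tag for tag, _ in ranked[:n]]


print(power_tags({"a": 100, "b": 95, "c": 90, "d": 40, "e": 10, "f": 5},
                 "inverse_logistic"))  # ['a', 'b', 'c']
```

The distribution type is passed in by the caller here; deciding it automatically is a separate step of the procedure.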


Method I

• Step 2: Calculating co-occurrences between Power Tags and tags of the platform folksonomy

• Basis = Power Tags I from the resource level

• Power Tags II = co-occurring tags from the platform level

• Tag pair is most valuable for social categorization because it reflects collective user intelligence

[Figure: bar charts labelled “Power Tags I” and “Power Tags II”; one panel with a y-axis from 0 to 20 (tags include android, development, google, mobile, programming, generator, tools, apis), one with a y-axis from 0 to 70,000 (tags include mobile, google, development, phone, programming, apps, software, htc, hack, hero, desire)]
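Step 2 can be sketched as a simple co-occurrence count. This assumes the platform folksonomy is available as a list of posts, each post being the set of tags one user assigned to one resource; `cooccurring_tags` and `tag_pairs` are hypothetical names, and the number n of Power Tags II is passed in directly here rather than derived from the distribution type. The sample posts are invented for illustration.

```python
from collections import Counter


def cooccurring_tags(power_tag: str, posts: list[set[str]]) -> Counter:
    """Count how often each tag appears in the same post as power_tag."""
    counts: Counter = Counter()
    for tags in posts:
        if power_tag in tags:
            counts.update(tag for tag in tags if tag != power_tag)
    return counts


def tag_pairs(power_tags_1: list[str], posts: list[set[str]],
              n: int) -> list[tuple[str, str]]:
    """Pair every Power Tag I with its n most frequent co-occurring tags."""
    pairs = []
    for tag_1 in power_tags_1:
        for tag_2, _ in cooccurring_tags(tag_1, posts).most_common(n):
            pairs.append((tag_1, tag_2))
    return pairs


posts = [
    {"android", "mobile", "google"},
    {"android", "mobile"},
    {"android", "google", "phone"},
    {"android", "mobile"},
    {"iphone", "mobile"},
]
print(tag_pairs(["android"], posts, n=2))
# [('android', 'mobile'), ('android', 'google')]
```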


Research Question I

How to build social classifications (automatically)?

• Step 3: Determination of Power Tags I and II can be carried out automatically
  1) Identifying the distribution type
  2) Labeling the first n tags as Power Tags I
  3) Identifying co-occurring tags
  4) Identifying the distribution type
  5) Extracting the first n tags as Power Tags II
  6) Combining Power Tags I and Power Tags II as tag pairs

• Step 4: Intellectual determination of the relationship between Power Tags I and Power Tags II, carried out collaboratively or individually
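Items 1) and 4) of the automatic procedure require deciding which type of distribution a tag distribution follows. The slides do not say how this is done; the following Python sketch assumes a simple goodness-of-fit heuristic: if log(frequency) plotted against log(rank) is close to a straight line (R² at or above a hypothetical cut-off of 0.9), the distribution is treated as a power law, otherwise as inverse-logistic.

```python
import math


def loglog_r_squared(freqs: list[float]) -> float:
    """R² of a straight-line fit of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # fewer than two ranks, or a completely flat distribution
    return sxy * sxy / (sxx * syy)


def distribution_type(freqs: list[float], min_r_squared: float = 0.9) -> str:
    """Hypothetical rule: a near-straight log-log plot counts as a power law."""
    ranked = sorted(freqs, reverse=True)
    if loglog_r_squared(ranked) >= min_r_squared:
        return "power_law"
    return "inverse_logistic"


print(distribution_type([100 / r**2 for r in range(1, 11)]))  # power_law
print(distribution_type([100, 95, 90, 40, 10, 5]))  # inverse_logistic
```

A more careful approach would fit both models and compare them; the fixed cut-off only keeps the sketch short.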


Research Question I

How to build social classifications (automatically)?

Examples:
1. a) Power Tags I: Android
1. b) Power Tags II: Mobile, Google
2. a) Power Tags I: Web 2.0
2. b) Power Tags II: Tools, Social, Blog, Socialsoftware, Bookmarks, Community, Tagging, Web, AJAX, online

Descriptor set and relations:

Android
  RT mobile (related term; association)
  RT Google (related term; association)

Web 2.0
  UF Socialsoftware (used for; synonymy)
  BT web (broader term; hierarchy)
  NTP blog (narrower term partitive; meronymy)
  NTP bookmarks (narrower term partitive; meronymy)
  NTP tagging (narrower term partitive; meronymy)
  NTP community (narrower term partitive; meronymy)
  NTP ajax (narrower term partitive; meronymy)
  RT online (related term; association)
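The relations in these examples follow thesaurus conventions (UF, BT, NTP, RT). A minimal sketch of how such intellectually assigned relations might be stored and printed, assuming a hypothetical `Descriptor` class; the relation codes and their types are taken from the slide, everything else is illustrative.

```python
from dataclasses import dataclass, field

# Relation codes as used on the slide: code -> (relation, relation type).
RELATIONS = {
    "UF": ("used for", "synonymy"),
    "BT": ("broader term", "hierarchy"),
    "NTP": ("narrower term partitive", "meronymy"),
    "RT": ("related term", "association"),
}


@dataclass
class Descriptor:
    name: str
    relations: list[tuple[str, str]] = field(default_factory=list)

    def relate(self, code: str, term: str) -> None:
        """Record an intellectually determined relation to another tag."""
        if code not in RELATIONS:
            raise ValueError(f"unknown relation code: {code}")
        self.relations.append((code, term))

    def render(self) -> str:
        """Print the descriptor with one indented line per relation."""
        lines = [self.name]
        for code, term in self.relations:
            relation, kind = RELATIONS[code]
            lines.append(f"  {code} {term} ({relation}; {kind})")
        return "\n".join(lines)


android = Descriptor("Android")
android.relate("RT", "mobile")
android.relate("RT", "Google")
print(android.render())
```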


Assumption II

Power Tags are most relevant tags
• To build social classifications based on Power Tags, an important precondition must be fulfilled:
  – Power Tags ARE the most relevant tags for a resource

• Problem: relevance judgments as well as tagging behaviour are highly subjective and error-prone (regarding spelling etc.)

• Is the collective intelligence of users capable of “ironing out” overly personal and erroneous tags so that all users are satisfied with high-frequency tags?


Method II

Power Tags are most relevant tags
• Investigation of 30 resources downloaded from delicious in February 2010

• Participants: 20 students of Information Science at the HHU Düsseldorf

• All resources tagged with “folksonomy” and tagged by at least 100 users
  – To guarantee that students are technically able to judge the relevance of tags
  – To guarantee that broad tag distributions can be used as a test sample

• User evaluation
  – Tag is relevant for resource = indicated with 1
  – Tag is not relevant for resource = indicated with 0
  – Students had access to the resource
  – Students did not know the delicious rank of the tags
  – Relevance distribution of tags for every resource derived from the student judgments
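The relevance distribution per resource follows directly from the 0/1 judgments. A short Python sketch, assuming one dictionary of judgments per student; the function name, tags and judgments in the example are hypothetical.

```python
def relevance_distribution(judgments: list[dict[str, int]]) -> dict[str, float]:
    """Share of students who judged each tag relevant (1), most relevant first.

    judgments holds one dict per student, mapping every tag of the resource
    to 1 (relevant) or 0 (not relevant).
    """
    tags = judgments[0].keys()
    n_students = len(judgments)
    shares = {tag: sum(j[tag] for j in judgments) / n_students for tag in tags}
    # Sorting is stable, so tags with equal shares keep their input order.
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


judgments = [
    {"folksonomy": 1, "tagging": 1, "toread": 0},
    {"folksonomy": 1, "tagging": 0, "toread": 0},
    {"folksonomy": 1, "tagging": 1, "toread": 1},
    {"folksonomy": 0, "tagging": 1, "toread": 0},
]
print(relevance_distribution(judgments))
# {'folksonomy': 0.75, 'tagging': 0.75, 'toread': 0.25}
```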


Research Question II

Are Power Tags most relevant for a resource?
• Determination of relevance: a tag counts as relevant if 50% or more of the students judged it relevant
• Extraction of the Top 10 delicious tags
• How many students called these Top 10 tags relevant?
• Calculation of the relative frequency of the students' relevance judgments

Average Pearson correlation ≈ 0.49 (N = 30)
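The 50% rule and the averaged correlation can be sketched as follows. The slides do not state which two variables enter the Pearson correlation; this sketch assumes that, per resource, the delicious frequencies of the Top 10 tags are correlated with the share of students who judged those tags relevant, and that the per-resource coefficients are then averaged. All function names are hypothetical.

```python
import math


def relevant_tags(shares: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Tags that at least 50% of the students judged relevant."""
    return [tag for tag, share in shares.items() if share >= threshold]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation of two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def average_pearson(resources: list[tuple[list[float], list[float]]]) -> float:
    """Mean of per-resource correlations.

    Each resource contributes (delicious frequencies of its Top 10 tags,
    share of students judging those tags relevant); this pairing is assumed.
    """
    return sum(pearson(freqs, shares) for freqs, shares in resources) / len(resources)


print(relevant_tags({"a": 0.75, "b": 0.5, "c": 0.25}))  # ['a', 'b']
```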


Research Question II

Are Power Tags most relevant for a resource?
• Result: only the first two tags are relevant
• Strong indication for Power Tags

• Problems in relevance judgments
  – Bias towards German tags
  – No unification of spelling variants (solution: tag gardening, NLP)
  – No combination of phrase tags

[Figure: relative frequency of the students' relevance judgments (y-axis 0.00 to 0.90) for the delicious tags at Rank 1 to Rank 10]
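Unifying spelling variants, one of the tag gardening steps named above, can be approximated with a normalisation function. The rule below (case-folding and removing hyphens, underscores and whitespace) is a hypothetical rule chosen for illustration, the function names are invented, and so are the tag counts.

```python
import re
from collections import Counter


def normalize(tag: str) -> str:
    """Hypothetical rule: case-fold, then drop hyphens, underscores, whitespace."""
    return re.sub(r"[\s_\-]+", "", tag.casefold())


def unify_variants(tag_counts: dict[str, int]) -> Counter:
    """Merge the counts of tags that normalise to the same form."""
    merged: Counter = Counter()
    for tag, count in tag_counts.items():
        merged[normalize(tag)] += count
    return merged


print(unify_variants({"Web2.0": 3, "web_2.0": 2, "web-2.0": 1,
                      "socialsoftware": 4, "Social-Software": 1}))
# Counter({'web2.0': 6, 'socialsoftware': 5})
```

Such aggressive normalisation can also merge tags that merely look alike, so the merged groups would probably need to be reviewed.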


Assumption III

Tag distributions on resource level become stable
• Studies showed that the shape of tag distributions remains stable after a particular number of tags and users has been reached
  – Kipp & Campbell (2006)
  – Maarek et al. (2006)
  – Halpin, Robu, & Shepherd (2007)
  – Maass, Kowatsch, & Münster (2007)
  – Maier & Thalmann (2007)


Assumption III

Tag distributions on resource level become stable
• If this assumption is true and “stable” is understood as
  – no rank permutations of tags occur anymore
  – the relative number of tags does not change anymore

it means that …
  – Power Tags I and II are like a controlled vocabulary for a resource
  – users have reached consensus in describing and tagging the resource, visualized in the Power Tags
  – tags in the long tail of the distribution may be synonyms, tags with typing errors, narrower concepts, etc.
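The first stability criterion, no more rank permutations, can be checked by comparing the order of the top-ranked tags across successive snapshots of a resource's tag distribution. A minimal sketch, assuming snapshots are dictionaries of tag counts in chronological order; the cut-off k, the alphabetical tie-breaking and the function names are assumptions.

```python
def top_k(tag_counts: dict[str, int], k: int) -> list[str]:
    """The k most frequent tags; ties are broken alphabetically (assumption)."""
    ranked = sorted(tag_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:k]]


def no_rank_permutation(snapshots: list[dict[str, int]], k: int) -> bool:
    """True if all consecutive snapshots keep the same top-k tag order."""
    orders = [top_k(counts, k) for counts in snapshots]
    return all(prev == curr for prev, curr in zip(orders, orders[1:]))


print(no_rank_permutation([{"a": 5, "b": 3, "c": 1},
                           {"a": 9, "b": 4, "c": 4}], k=2))  # True
```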


Open Research Question III

When do tag distributions become stable?
• To automate classification processes, we need to know after how many tagging users a tag distribution remains stable and no changes in the ranking of tags appear anymore

• After that point, we can extract Power Tags for social classification for the particular resource


Open Research Question & Method III

When do tag distributions become stable?
• Comparison of the tag distribution with n users and the final tag distribution (downloaded at a particular point in time)

• Calculation of the relative frequency of every tag, $\text{rel. freq}(t_1 \ldots t_n)$, for particular user numbers

• Calculation of the average distance between the final tag distribution (fd) and the tag distribution with n users (td)
  – Subtraction: $\sum \text{rel. freq}(t_n, fd)$ of the final distribution minus $\sum \text{rel. freq}(t_n, td)$ of the tag distribution with n users

• Stability achieved when

$$\sum \text{rel. freq}(t_n, fd) - \sum \text{rel. freq}(t_n, td) < \text{threshold value}$$
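A minimal sketch of the stability test, assuming cumulative snapshots of a resource's tag counts at increasing user numbers. Because the relative frequencies of all tags in one distribution always sum to 1, subtracting the two sums over all tags would give 0; this sketch therefore assumes the difference is taken tag by tag, as absolute values, and averaged, in line with the "average distance" wording. The threshold value is not given on the slide and is a parameter here; function names and data are hypothetical.

```python
from typing import Optional


def relative_frequencies(tag_counts: dict[str, int]) -> dict[str, float]:
    total = sum(tag_counts.values())
    return {tag: count / total for tag, count in tag_counts.items()}


def average_distance(final: dict[str, int], partial: dict[str, int]) -> float:
    """Mean absolute difference of relative frequencies, tag by tag."""
    f = relative_frequencies(final)
    p = relative_frequencies(partial)
    tags = f.keys() | p.keys()
    return sum(abs(f.get(t, 0.0) - p.get(t, 0.0)) for t in tags) / len(tags)


def stable_from(snapshots: list[tuple[int, dict[str, int]]],
                final: dict[str, int], threshold: float) -> Optional[int]:
    """Smallest user count from which every later snapshot stays below threshold."""
    first_stable = None
    for n_users, counts in snapshots:
        if average_distance(final, counts) < threshold:
            if first_stable is None:
                first_stable = n_users
        else:
            first_stable = None  # distribution moved away again; reset
    return first_stable


final = {"a": 50, "b": 30, "c": 20}
snapshots = [
    (1, {"a": 1}),
    (5, {"a": 5, "b": 3, "c": 2}),
    (8, {"a": 8}),
    (10, {"a": 5, "b": 3, "c": 2}),
    (20, {"a": 10, "b": 6, "c": 4}),
]
print(stable_from(snapshots, final, threshold=0.05))  # 10
```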


Conclusion

• Social classification can be based on folksonomies – Power Tags are concept candidates

• Extraction of Power Tags I and II pairs can be carried out automatically

• Determination of the relationship inherent in tag pairs requires intellectual processing

• Power Tags are most relevant tags

• Relevance of tags can be enhanced through unification and combination of similar tags (here: not synonyms but spelling variants), i.e. tag gardening

• Ongoing research: when do tag distributions become stable?


Conclusion

[Diagram: What type of tag distribution? → Tag distribution stable? → Extraction of Power Tags I & II → Pairs of relevant Power Tags → Candidate vocabulary → Definition of concepts and of semantic relations → Intellectual structuring → Social knowledge organization system; the stages are divided into automatic processing and intellectual processing]


Comments? Questions?

Isabella Peters: [email protected]

Greetings from Düsseldorf!

This presentation is available on SlideShare: http://www.slideshare.net/isabellapeters.


References

• Halpin, H., Robu, V., & Shepherd, H. (2007). The Complex Dynamics of Collaborative Tagging. In C. L. Williamson, M. E. Zurko, P. F. Patel-Schneider, & P. J. Shenoy (Eds.), Proceedings of the 16th International WWW Conference, Banff, Alberta, Canada (pp. 211–220). New York: ACM.

• Kipp, M., & Campbell, D. (2006). Patterns and Inconsistencies in Collaborative Tagging Systems: An Examination of Tagging Practices. In Proceedings of the 17th Annual Meeting of the American Society for Information Science and Technology, Austin, Texas, USA.

• Maarek, Y., Marnasse, N., Navon, Y., & Soroka, V. (2006). Tagging the Physical World. In Proceedings of the Collaborative Web Tagging Workshop at WWW 2006, Edinburgh, Scotland.

• Maass, W., Kowatsch, T., & Münster, T. (2007). Vocabulary Patterns in Free-for-all Collaborative Indexing Systems. In Proceedings of International Workshop on Emergent Semantics and Ontology Evolution, Busan, Korea (pp. 45–57).

• Maier, R., & Thalmann, S. (2007). Kollaboratives Tagging zur inhaltlichen Beschreibung von Lern- und Wissensressourcen. In R. Tolksdorf & J. Freytag (Eds.), Proceedings of XML Tage, Berlin, Germany (pp. 75–86). Berlin: Freie Universität Berlin.

• Peters, I. (2009). Folksonomies: Indexing and Retrieval in Web 2.0. Berlin: De Gruyter, Saur.

• Peters, I., & Stock, W. G. (2010). "Power Tags" in Information Retrieval. Library Hi Tech, 28(1), 81–93.

• Peters, I., & Weller, K. (2008). Tag Gardening for Folksonomy Enrichment and Maintenance. Webology, 5(3), Article 58. Retrieved from http://www.webology.ir/2008/v5n3/a58.html.

• Stock, W.G. (2006). On Relevance Distributions. Journal of the American Society for Information Science and Technology, 57(8), 1126-1129.
