Emergent Semantics from Game-induced Folksonomies · We find that game annotations tend to...

9
Emergent Semantics from Game-induced Folksonomies Lilian Weng * and Filippo Menczer Center for Complex Networks and Systems Research School of Informatics and Computing, Indiana University, Bloomington ABSTRACT We describe the GiveALink Slider, an experimental social tagging game designed with the purpose of generating meaningful and use- ful annotations to improve upon the drawbacks of existing folk- sonomies. Knowledge, in the form of reliable annotations, is val- idated and accumulated through the implicit interactions among multiple players. In this paper we explore the hypothesis that such a game can improve both the quality and quantity of social an- notation data. Our evaluation of game-induced annotations shows that games may improve on the semantic structure of existing folk- sonomies from several perspectives, including searchability, nov- elty, and coherence. Games can therefore play a valuable role in the collection of helpful annotations, by leveraging human power and specific game design features. Categories and Subject Descriptors H.3.4 [Information Storage and Retrieval]: Systems and Soft- ware—Information networks; H.4 [Information Systems Applica- tions]: Miscellaneous; J.4 [Computer Applications]: Social and Behavioral Sciences—Sociology General Terms Design, Measurement, Experimentation, Human Factors Keywords Social tagging, GiveALink, Game with a purpose, Folksonomies 1. INTRODUCTION Users can collaboratively organize and share Web resources through social tagging, as is widely done in many popular Web 2.0 ap- plications. Various types of online resources can be tagged, re- flecting user interests and becoming easily accessible to the public. These resources include Web pages (Delicious), blog posts (Live- Journal), photos (Flickr), music (Last.fm), movies (YouTube), and * Corresponding author. Email: [email protected] Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CrowdKDD ’12, August 12, 2012, Beijing, China Copyright 2012 ACM 978-1-4503-1557-9/12/08 ...$15.00. short messages (Twitter). The output of social tagging systems forms folksonomies, or community-induced taxonomies of impor- tant and emerging concepts [31, 22]. Folksonomic collections of social annotation data have been shown to be useful to improve social navigation [25], Web search [6, 35], personalized experi- ence [17], recommendation services [21], and social links predic- tion [28]. However, folksonomies do not scale with the growth rate of the Web [11], so that annotations are sparse. Their reliability is also often questioned, as spammers attempt to manipulate and exploit tagging systems [18]. To mitigate these limitations, this paper explores the hypothe- sis that games can leverage human power to produce high-quality social annotation data, inducing meaningful emergent semantics. We exemplifies with a specially designed game, GiveALink Slider (slider.givealink.org) [32, 34], to demonstrate the poten- tiality of crowdsourcing useful folksonomies with games and ex- plore the effectiveness of the mechanism we previously proposed [33]. Our tagging games follow the paradigm of games with a purpose [3, 4], solving hard computational problems by asking real people for helpful input in an entertaining way. We further adopt the design principles of the chain model [33] for object association games. The idea of the chain model is to collect semantic descriptions of various objects and discover hidden relations among them. After introducing relevant background in Sec. 2, we discuss game design in Sec. 3, aimed at making the GiveALink Slider entertain- ing while generating meaningful and useful output. We perform an analysis of data collected over an experimental user study, in- volving over 34,000 annotations by 200 players, with focuses on patterns of user tagging behaviors (Sec. 4) and evaluations of the quality of game-driven folksonomies (Sec. 5). The report of the game output is the main concentration of this paper, as it presents the value and the quality of game-driven emergent semantics and demonstrates various approaches for social annotation evaluations. Although our game is still an exploratory project where we re- solve to test whether certain design decisions can lead to annota- tions of satisfactory qualify as a complement to free tagging, the analysis does give substance to the hypothesis: such similar games can play a valuable role in the collection of quality data by lever- aging human abilities and specific game design features. The main contributions and findings of the paper are: We find that game annotations tend to gravitate toward topics familiar to users (Sec. 4.1). The confidence of annotations is validated through the agree- ment among multiple players, suggesting that game-induced annotations become more reliable over time (Sec. 4.2). We show that the patterns of social tagging activity of the game are consistent with human language, and in particular

Transcript of Emergent Semantics from Game-induced Folksonomies · We find that game annotations tend to...

Emergent Semantics from Game-induced Folksonomies

Lilian Weng∗

and Filippo MenczerCenter for Complex Networks and Systems Research

School of Informatics and Computing, Indiana University, Bloomington

ABSTRACTWe describe the GiveALink Slider, an experimental social tagginggame designed with the purpose of generating meaningful and use-ful annotations to improve upon the drawbacks of existing folk-sonomies. Knowledge, in the form of reliable annotations, is val-idated and accumulated through the implicit interactions amongmultiple players. In this paper we explore the hypothesis that sucha game can improve both the quality and quantity of social an-notation data. Our evaluation of game-induced annotations showsthat games may improve on the semantic structure of existing folk-sonomies from several perspectives, including searchability, nov-elty, and coherence. Games can therefore play a valuable role inthe collection of helpful annotations, by leveraging human powerand specific game design features.

Categories and Subject DescriptorsH.3.4 [Information Storage and Retrieval]: Systems and Soft-ware—Information networks; H.4 [Information Systems Applica-tions]: Miscellaneous; J.4 [Computer Applications]: Social andBehavioral Sciences—Sociology

General TermsDesign, Measurement, Experimentation, Human Factors

KeywordsSocial tagging, GiveALink, Game with a purpose, Folksonomies

1. INTRODUCTIONUsers can collaboratively organize and share Web resources through

social tagging, as is widely done in many popular Web 2.0 ap-plications. Various types of online resources can be tagged, re-flecting user interests and becoming easily accessible to the public.These resources include Web pages (Delicious), blog posts (Live-Journal), photos (Flickr), music (Last.fm), movies (YouTube), and

∗Corresponding author. Email: [email protected]

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.CrowdKDD ’12, August 12, 2012, Beijing, ChinaCopyright 2012 ACM 978-1-4503-1557-9/12/08 ...$15.00.

short messages (Twitter). The output of social tagging systemsforms folksonomies, or community-induced taxonomies of impor-tant and emerging concepts [31, 22]. Folksonomic collections ofsocial annotation data have been shown to be useful to improvesocial navigation [25], Web search [6, 35], personalized experi-ence [17], recommendation services [21], and social links predic-tion [28]. However, folksonomies do not scale with the growth rateof the Web [11], so that annotations are sparse. Their reliabilityis also often questioned, as spammers attempt to manipulate andexploit tagging systems [18].

To mitigate these limitations, this paper explores the hypothe-sis that games can leverage human power to produce high-qualitysocial annotation data, inducing meaningful emergent semantics.We exemplifies with a specially designed game, GiveALink Slider(slider.givealink.org) [32, 34], to demonstrate the poten-tiality of crowdsourcing useful folksonomies with games and ex-plore the effectiveness of the mechanism we previously proposed [33].Our tagging games follow the paradigm of games with a purpose [3,4], solving hard computational problems by asking real people forhelpful input in an entertaining way. We further adopt the designprinciples of the chain model [33] for object association games.The idea of the chain model is to collect semantic descriptions ofvarious objects and discover hidden relations among them.

After introducing relevant background in Sec. 2, we discuss gamedesign in Sec. 3, aimed at making the GiveALink Slider entertain-ing while generating meaningful and useful output. We performan analysis of data collected over an experimental user study, in-volving over 34,000 annotations by 200 players, with focuses onpatterns of user tagging behaviors (Sec. 4) and evaluations of thequality of game-driven folksonomies (Sec. 5). The report of thegame output is the main concentration of this paper, as it presentsthe value and the quality of game-driven emergent semantics anddemonstrates various approaches for social annotation evaluations.

Although our game is still an exploratory project where we re-solve to test whether certain design decisions can lead to annota-tions of satisfactory qualify as a complement to free tagging, theanalysis does give substance to the hypothesis: such similar gamescan play a valuable role in the collection of quality data by lever-aging human abilities and specific game design features. The maincontributions and findings of the paper are:

• We find that game annotations tend to gravitate toward topicsfamiliar to users (Sec. 4.1).

• The confidence of annotations is validated through the agree-ment among multiple players, suggesting that game-inducedannotations become more reliable over time (Sec. 4.2).

• We show that the patterns of social tagging activity of thegame are consistent with human language, and in particular

with both Zipf’s law on word frequencies [36] and Heap’slaw on vocabulary growth [10] in written text (Sec. 4.3).

• Evidence is offered that the game elicits novel annotation,aiding the discovery of semantic relations that used to be hid-den (Sec. 5.1).

• Game-driven folksonomies complement the existing socialannotation repository, evidenced by a few case studies lend-ing further support for the reliability of the game data (Sec. 5.2).

• Contrary to labeling bookmarks for personal reference, play-ers are found to tag both general and specific resources withgeneral tags, making these resources easier to search (Sec. 5.3).

• Finally, we show that the semantic networks of tags and re-sources networks become denser, better connected, and morecoherent as a result of the game-induced annotations (Sec. 5.4).

2. BACKGROUNDSocial tagging data come in the form of triples, defined as tri-

partite relationships 〈u, r, t〉 between a user u, a resource r, and atag t [24]. An annotation in the paper is defined as a tag-resourcepair 〈t, r〉, aggregated across multiple users who believe that t is adescriptive word for r, and who contribute corresponding triples.

Social tagging data, generated by autonomous behaviors of allthe users involved, help people manage and organize their onlineresource collections, and make sharing and searching easier. Otherthan these original functions, the data prove to be highly valuablein many other fields. For example, annotations can help enhanceWeb search [6, 35], generate user profiles [23], personalize recom-mendations and other user experiences [7, 29], improve social nav-igation and information accessibility [16, 3, 15], promote seman-tic Web techniques and construction of ontologies [13]. However,users share annotations largely for individual needs and aspirations,sometimes leading to low-quality annotation data. Current socialtagging systems lack enough motivation for the majority of usersto label many resources with sufficient numbers of accurately de-scriptive tags, and the number of new pages posted per day in socialtagging systems is small compared to the rate of Web growth [11].This causes a sparse semantic network of social annotations. With-out any control over tagging behaviors, users can easily employpoor tags or even abuse the system by spamming [18]. They canalso use tags that are unrelated to a resource, too general to mean-ingfully describe a given resource, or so specific that they are onlyuseful for one individual.

The idea of using games to help boost both the quality and quan-tity of social tagging is largely inspired by the popular usage ofserious games, social games, and crowdsourcing applications. Se-rious games are games designed with a non-entertainment goal,usually to facilitate learning in education, military, and other set-tings. Crowdsourcing applications harness community knowledgeto accomplish certain tasks. Examples include volunteer-based hu-man knowledge repositories such as Wikipedia, human subject taskmarketplaces such as Amazon Mechanic Turk [14], and scholarlydata collection and impact evaluation tools such as Scholarom-eter [12]. The idea of games with a purpose is a combinationof serious gaming with crowdsourcing. The best known instanceis the ESP Game [3], in which players are paired randomly andasked to achieve agreement on image labels. Many more gameswith such non-entertainment intentions are utilized in social con-texts for both scientific and business purposes. Games have beenleveraged to solve protein folding problems in biochemistry [8],celestial object recognition (galaxyzoo.org), and human brain

desc00 ,

desc01 ,

… , desc0

k 0

desc10 ,

desc11 ,

… , desc0

k 1

OUTPUT: (p, obj0, desc0

0 ) (p, obj0, desc0

1 ) ... (p, obj0, desc0

k0 ) … (p, obj1, desc1

0 ) (p, obj1, desc1

1 ) ... (p, objn-1, desc n-1

0 ) (p, objn-1, desc n-1

1 ) ... ...

obj0 obj1 objn-1 objn

Active Origin

desc n-10 ,

desc n-11

… desc n-1

k

? , ,

n-1

Figure 1: The primary design principle of GiveALink Slider:the chain model for object association games.

training (www.lumosity.com). Microsoft uses games in ClubBing (clubbing.com) to promote the Bing search engine. Allof these applications are instances of human computation [4].

The game presented here is designed as a part of the GiveAlink.org project, a social bookmarking platform built for research pur-poses. Previous research in the GiveALink project benefits thegame a lot, including a large initial triple repository, a scalableand reliable similarity measure named Maximum Information Path(MIP) [19, 20] for declaring relations among resources, tags, andusers, and a social spam filter [18]. The semantic space extractedfrom the social tagging data is mainly represented as networks oftags or resources, such as the inter-tag correlation graph definedin [9]. Edges can be associated with the similarity values amongnodes if the network is weighted. MIP is used for assigning theedge values in the Slider game.

GiveALink Slider originates from the simple idea of building achain of semantically related objects. The objects are connectedby a measure of similarity. The players extend the chain by mak-ing these relationships explicit. The idea is formalized as chainmodel [33] for object association games that collect descriptionsabout, and discover hidden relationships among, Web resources,media, people, and even geographic locations. The chain model, il-lustrated in Fig. 1, produces an ordered sequence of objects 〈obj0,. . . , objn〉. We refer to the last element objn as the active ob-ject. Chain model games allow players to characterize an objectobji with a set of descriptions Di = {desci0, . . . , desciki} in somelanguage. At each step the player p can add a new description toobjn or make a game move to extend the chain, i.e. a transitionobjn

µ7→ objn+1 where µ is a user-defined relation. To facilitate theplayer’s decision of the next object in the chain, the model suggestsa set of candidate objects Cn that are computed from a system-defined similarity measure with respect to the active object.

To discover user tagging behavior patterns is another related topicconcerning the study of user behavior to improve the user expe-rience and support system design. Previous studies have shownthat tags are frequently reused even among people who have dif-ferent background and knowledge [27], and personal experience,such as familiarity with resources, influences tag effectiveness forcontent sharing [17]. In this paper we will investigate how socialtagging works more effectively for sharing and searching, driven bya different tagging goal and by design features particular to gam-ing. Social tagging can also be exploited to characterize shareduser interests [27]. This is analogous to the approach we proposefor confirming annotations by shared social tagging actions. In ourgame, we design a measure of knowledge agreement and track howit changes over time.

Figure 2: Slider interfaces for tagging the active page and se-lecting pages among provided candidates.

3. GIVEALINK SLIDER

3.1 Game designGiveALink Slider (slider.givealink.org) is built on the

chain model for object association games [33, 34], where playerscan create chains of relevant Web resources by tagging and se-lecting Web pages. We experiment several design mechanisms inSlider to foster better game experience and the qualified data col-lections simultaneously.

The nodes of the chain model in Slider represent Web resources.Players build chains of pages and generate descriptions by choos-ing and tagging these pages. The origin page obj0 in the chainis provided at random, and then the player extends the chain bycontinuously tagging the active page with one or multiple tags. Inthe main screen of the game (Fig. 2) the chain is displayed in acarousel, and the player can easily switch among pages. The URL,title, and thumbnail of each page help the player easily learn itscontent. Each time the player enters tags for a page, the game dis-plays three candidate pages based on the new submission for theplayer to choose the next node in the chain. Another indicator ofthe chain is presented as a series of colored nodes above all Webpage thumbnails: connected adjacent nodes represent related pages,while nodes may be disconnected due to poor tag inputs or inappro-priate candidate choices. It notifies the player whether tags at theprevious step is good or not. The final output of the Slider game isa collection of manual labels for various Web pages.

Scoring. We expect to discover novel semantic relations and tostrengthen relationships that seem to be weaker than they shouldbe. This goal is fortified with the game scoring system, where eachuser submission is treated as a set of annotation pairs and each pairis classified into one of three categories:

Trusted annotations overlap with data in reliable external sources,such as GiveALink and Delicious;

Novel annotations have been suggested by multiple previous gameplayers, but not found in external data sources;

New annotations appear for the first time, and therefore it is hardto judge whether they are reasonable enough.

(a) Weak Social Links

(b) Strong Social Links

Figure 3: Visualizations of social links in GiveALink Slider.

The reliability of trusted and novel annotations has been con-firmed by external data or other players. We reward players forqualified submissions if the proportion of trusted or novel annota-tions in one submission is higher than a threshold ratio. However,the relevance of new annotations is still questionable, so they aresaved for future verification. The score of each tagging step is ob-tained by adding the score associated with each annotation pair.Novel annotations are most valued and new ones are worth less.Unqualified submissions lead to loss of points.

Enjoyability. The basic motivation of the game is to exploreunknown Web pages, while players can observe how their mindsflow. It is similar to one of the incentives of the online multiplayerword association game, Human Brain Cloud [2], that had achievedbig success in the past.

To make the game more enjoyable, we visualize social ego net-works based on game-driven links among players (Fig. 3). Duringeach tagging step the player may be connected to other players bystrong social links if they share the same annotations. Weak so-cial links can be constructed by sharing either common tags or re-sources. Both types of social links can be displayed in network vi-sualizations. Another incentive comes from the design of badges ofachievement, each corresponding to a game task with multiple lev-els of difficulty. Completion of harder tasks earns a larger bonus.Tasks deal with Web pages in the chain, social links with otherplayers, score milestones, and other game features.

Flexibility. Additionally, players are allowed to discard pagesthat are spam, broken links, or written in an unknown language,

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7Non-zero Jaccard Sim

0.0

0.2

0.4

0.6

0.8

1.0C

CD

FEmpiricalShuffled (baseline)

Figure 4: The red area is the complementary cumulative dis-tribution, P (X > x), of Jaccard similarities between tags inGiveALink (base vocabulary) and Slider game, for each player.The blue area is the baseline distribution, generated by shuffledgame tags across all game participants.

since those pages distract players and result in poor annotations.The player can replace a candidate page by reporting the reason.All the spam or broken link reports are stored, flagging suspiciousor invalid resources and helping improve the accuracy of the clas-sifier employed by the GiveALink spam detector [18].

3.2 User studyWe released GiveALink Slider for a six week online user study

between March and May 2011. The study eventually involved 200users. Most participants are of background in computer science,HCI or game design, as we announced the study in several on-line forums, discussion groups or mailing lists related with thesefields. Players generated 43,795 distinct triples (34,642 annota-tions), among which 13,860 triples (9,111 annotations) were con-firmed. On average, each player built a chain of about 26 pagesduring one game, contributed about 6 chains in total, and each Webpage gained about 6 tags.

4. USER TAGGING BEHAVIORIn this section we analyze the collection of triples generated by

the game to study the tagging behavior of players. First, we com-pare tags from the game versus those from an existing social book-marking system to examine whether players have a preference forfamiliar tags. Second, we define a measure of confidence to quan-tify the reliability of an annotation pair. The average confidence asa measure of the confirmed knowledge in the whole system is foundto grow with the player involvement and the system size. Third, wecompare patterns of tag usage with known regularities of writtentext.

4.1 Tag familiarityLearning a new word requires more work than referencing known

words. It is reasonable to assume that players may prefer to reuseknown words when they need to tag Web resources. If we con-sider an individual’s vocabulary to be a tag repository with sizelimited by human memory, the tags in the game should to someextent overlap with the tags used previously in other places or forother purposes. Since we have existing bookmark collections frommany game players in the GiveALink system, we are able to de-fine all the tags of a player in GiveALink as a base vocabulary thatshe knows and has used before. The Jaccard similarity between thebase vocabulary of a player and her game tags represents the extent

Figure 5: Probability distribution of annotation confidence.The data points are grouped into log-size bins and the fre-quency of each bin is normalized by the width of the bin. Thedistribution can be fitted by a log-normal with µ = −1.34 andσ = 0.72 by maximum-likelihood.

to which the player tends to employ familiar tags. As a baseline,we shuffled game tags across players to simulate players withoutany memory of words for reference and who thus contribute gametags at random. Fig. 4 shows that similarities are higher than thebaseline, suggesting that players are more likely to describe objectsaccording to familiar tags in their vocabulary.

4.2 User agreementA basic semantic relation in social tagging is an annotation 〈t, r〉

indicating that a Web resource r can be semantically described bya word t. To determine whether an annotation is reliable (seman-tically reasonable), which is one of the primary goals of Slider,we can validate it across multiple game participants. Each triplecontributor can increase the confidence in the reliability of annota-tions by agreeing with previous players. Agreement among mul-tiple players on a single annotation is strong evidence that this se-mantic relation is reliable. To quantify the extent to which play-ers concur on annotation pairs, we define a measure of agreementnamed confidence:

c(r, t) =N(r, t)− 1√

(N(r)− 1)(N(t)− 1)

where r is a Web resource, t is a tag, and N(·) is the numberof unique users with a resource, tag, or annotation. The numera-tor N(r, t) is the absolute agreement count for annotation 〈t, r〉.The denominator reduces the influence of very popular resourcesor tags; a large number of people using an annotation may resultfrom an extremely popular resource or tag, rather than a high confi-dence in the association between resource and tag. For example, theannotation 〈google.com, web〉 may have more supporters thanthe annotation 〈google.com, pagerank〉, owing to the fact thatthe tag web is heavily utilized to tag any online resource. Conse-quently, users may tag google.com with web simply becausethe tag itself is popular and general rather than because the tag isthought to be especially relevant. The presence of both tag and re-source popularity in the denominator of the definition of confidencediscounts such a popularity bias. All the counts are decremented byone to avoid personal use cases that has not be validated by morethan two distinct players.

The confidence of a pair 〈t, r〉 represents the strength of the con-

Figure 6: Growth of average confidence with system time, mea-sured by the total number of triples.

nection between tag t and resource r, that is, how well t describesr. We consider an annotation with high confidence to be reliable.According to Fig. 5, most annotation confidence scores are between2−3 = 0.125 and 2−1 = 0.5. We observe only a few annotationswith extremely low or high confidence scores. The low-confidenceannotations could be too personal to be useful to a general audi-ence and the high-confidence ones are likely well known, so themost valued knowledge derives from the social annotations withintermediate confidence.

With more players contributing triples to the system, the numberof confirmed semantic relations increases and the annotations gainconfidence. As shown in Fig. 6, the average annotation confidencegrows with system time, measured by the total number of avail-able triples. This also suggests that confidence is a good measureof annotation reliability, and could even be used to gauge the totalamount of semantic knowledge accumulated by the system. TheSlider game is able to capture more valued knowledge with moreaccumulated triples. In Web applications with social tagging com-ponents, confidence can be incorporated into search algorithms asit scores the capability of a keyword to describe search results.

4.3 Tagging versus written textThe way people collectively describe Web resources in the Slider

game is consistent with regularities of word usage in written text [5].We confirm this by verifying that both Zipf’s law [36] and Heaps’law [10] hold for game-induced social tagging data. Zipf’s lawsays the global word frequency in written documents is inverselyproportional to word rank by frequency, which is equivalent to say-ing that the frequency has a power-law distribution with exponentclose to 2. Heaps’ law describes the sub-linear growth of the num-ber of distinct words with the length of a document, according towhich vocabulary size grows more slowly than document size.

We can verify both regularities for the game output if we con-sider each Web resource as a document associated with a set oftags from the annotations. Under this view, the global popularityof tags displays a power-law distribution consistent with Zipf’s law(Fig. 7(a)). Furthermore, as the number of tags associated with asingle resource increases, the number of distinct tags grows sub-linearly, consistently with Heaps’ Law (Fig. 7(b)). Agreement onthe same annotation by multiple players generates the repetitionsof tags. Therefore Heaps’ Law is a reflection of the rise of systemconfidence and user agreement.

5. DATA QUALITY

100 101 102 103

Tag usage

10-710-610-510-410-310-210-1100

P

(a) Zipf’s Law

100 101 102 103

Url

100

101

102

103

Num

of d

escr

iptiv

e ta

gs

(b) Heaps’ Law

Figure 7: (a) Distribution of tag frequencies. The triangles areempirical data from the game, and the dashed line marks theslope corresponding to a power law with exponent around 2.(b) Number of unique tags versus total number of tags used fora single URL. The squares represent empirical data from thegame, and the dashed line is the diagonal, showing sub-lineargrowth.

We evaluate the quality of game-driven folksonomies from sev-eral perspectives, including the novelty of newly collected data, thereliability of semantic descriptions, resource searchability, and se-mantic network coherence. Game-driven social tagging data im-proves on each of these aspects, suggesting that it is a good com-plement to the semantic space of existing social tagging data.

5.1 NoveltyOne goal of game-driven folksonomies is to enhance the social

tagging data with fresh semantics that are missing in existent sys-tems. We are able to accumulate new triples and discover hiddensemantics only when the game data does not overlap excessivelywith existing data. First, we examined the novelty of annotationsby calculating the overlap between the game output and existingsocial tagging data. There are around 9,000 annotations confirmedthrough Slider; only 5.74% of them overlap with the GiveALinkbookmark repository and 21.87% appeared through the Delicioustag recommendation API. The remaining majority are novel anno-tations discovered via the game.

Then we analyzed the implicit semantics hidden under the col-lected triples and annotations, as semantic connections among tagsor resources. We calculated the similarity between each pair oftags or Web resources based on collected triples. We leveraged themaximum information path (MIP) similarity [20], a scalable andcollaborative measure, to compute relationships among all tags orresources, and generated two sets of tag-tag and resource-resourcesimilarities, respectively. The MIP similarity measure has the fol-lowing definition:

σMIPu (x1, x2) =

2 log(miny∈Xu1 ∩X

u2[p(y|u)])

log(miny∈Xu1[p(y|u)]) + log(miny∈Xu

2[p(y|u)])

where x represents a tag or a resource and X is its vector represen-tation [20]. If x is a tag, y represents a resource, and for any useru, y ∈ Xu indicates that there is a triple 〈u, y, x〉 where u tags ywith the tag x. The similarity has a value between [0, 1] where asimilarity of zero indicates that two tags or resources are unrelated.A relation between a pair of tags or resources is considered to benovel if σ = 0 according to existing data but σ > 0 accordingto game data. In other words, the relationship is newly discoveredthrough the game. In the data generated by our user study of theSlider game, 60.22% of tag-tag relations and 72.92% of resource-resource relations are novel.

The fact that the game can efficiently accumulate novel triples

8,403

Figure 9: Distribution of tag generality in Slider andGiveALink.

and uncover novel semantics originates from the different goalsof the game compared to traditional social bookmarking systems.Most existing social tagging websites mainly function as onlinebookmark repositories where users are allowed to freely tag re-sources or upload bookmarks from their browsers. Such annota-tions primarily reflect information management of an individualuser. The game, on the other hand, asks players to tag resourcesnot for the sake of an individual’s future reference, but simply forfun, while their mind flows. As a consequence, we are able to avoidoverly personal tags and collect more representative and descriptiverelations [34].

5.2 ReliabilityWe wish to find support for our hypothesis that the game mecha-

nisms lead to collection of reliable data. Several tag similarity ego-networks are shown in Fig. 8 to illustrate the good quality of gametags. The comparison between the game and GiveALink suggestsgame-induced folksonomies have positive effect on the existing so-cial annotation repository, that can be similarly extended to otherpopular social tagging systems, such as Delicious and Flickr.

Fig. 8(a) displays the top 30 tags of both game and GiveALink.They share several top buzzwords including common internet terms(internet, web, software), suspicious/spam words (free),and news related tags (news, business). Among other non-overlapping tags, Slider shows a preference for technology and ed-ucation related terms, while the GiveALink tags tend to be broaderor more suspicious/spam in nature. Fig. 8(b) displays a tag simi-larity networks formed by tags similar to the word social. Nodeswith edges of both colors reflect many reasonable social-relatedconcepts. However, several nodes connected by GiveALink edges(blue), are not clearly relevant to social, including a clique of name-like tags (amanda, derek) and unrelated concepts (dior). Toptags in Slider incorporate many popular social networking web-sites (facebook, linkedin, twitter) and social activities(sharing, tagging, gaming). In general the game tags areof better quality and more searchable. Similar trends can be foundin Fig.s 8(c) for apple and 8(d) for sports.

5.3 SearchabilityHalpin et al. [9] discuss the information value of a tag, which

refers to the information conveyed by the natural language termused in the tag and how this makes the tag useful for the retrievalof resources. More precisely, the information value of a tag is in-terpreted as a function of the number of resources a tag can retrieve

(a) (b)

Figure 10: Attractiveness of tags with different generalities.

while searching. Tags with high information value attract futureusage from users while contributing to discriminating among re-sources to make them more searchable and accessible.

The retrieval power of a tag is closely associated with its general-ity, or capability to describe different resources. When a tag relatesto a broad semantic spectrum, it may be adopted for resources inmany categories. GiveALink tags are mainly collected from users’browser bookmarks, so some of them are only meaningful for oneindividual. Those tags can only describe personal resources frompersonal perspectives, causing disconnected resources and sparsesemantic networks; they cannot contribute much to retrieval.

To compare the information value carried by GiveALink andSlider tags, let us quantify the information value of a tag as thesum of its similarities with all other tags, known as tag generalityor tag strength:

s(t) =∑

t′∈T,t 6=t′σ(t, t′)

where t, t′ are tags, T is the set of all tags, and σ is the similaritymeasure between a pair of tags. The strength is computed globallyfor all existing tags in GiveALink, of which Slider tags are a subset.Then we are able to compare the strengths of these two sets oftags. According to Fig. 9, the Slider game succeeds to avoid manytags with very small strength that are too specific to be useful forguiding other users or helping the search. The game therefore leadsto more re-usable tags, and to a more “information-rich” semanticspace. Note that Fig. 9 is plotted in the logarithm scale; Thus, themajority of the game tags are still specific and novel (Sec. 5.1).

Fig. 10(a) shows a clear trend that general tags are utilized morefrequently for describing resources in the game. This is not sur-prising, considering that general tags are better capable of label-ing concepts. On the other hand, similarly to the definition of tagstrength, each resource can be associated with a strength or gen-erality value by summing up its similarity scores to all other re-sources. High-strength resources are widely connected with othersby shared annotations, and they usually have general informationthat is appropriate for a broad group of users. Conversely, low-strength resources are valuable for smaller groups. Thus it is possi-ble to hypothesize that high-strength resources attract high-strengthtags, since they both cover broad categories of information and thusthey are more likely to be associated with each other. However, asshown in Fig. 10(b), both general (high-strength) and specific (low-strength) resources are annotated with a mixture of high-strengthand low-strength tags. This implies that both general and specificresources have equal chances to be tagged with searchable descrip-tions. While general tags are more popular, they are used to de-scribe specific resources as well, helping users find them in broadcategories.

5.4 Discovering hidden semantics

(a) Top popular tags (b) Top tags most related to “social”

(c) Top tags most related to “apple” (d) Top tags most related to “sports”

Figure 8: Visualization of tag similarity ego-networks. Red edges are similarity connections from the game, blue ones are from GiveALink.

Since the game can collect novel descriptions (Sec. 5.1), the se-mantic space of folksonomies is expected to reveal more novel re-lations and to strengthen weak semantic relations that should bestronger. We can represent the semantics as either unweighted orweighted similarity network, and then study how the space is re-shaped with both representations while adding game data. First,the space is found to become denser and better connected whentreated as an unweighted network. Second, with consideration ofsimilarity values and edge weights, we illustrate that the weightednetwork built on folksonomies grows more coherent via the game.

5.4.1 Network structureWith more game data flowing into the semantic space, we expect

the structure of the folksonomy built from the tripartite relation-ships between tags, users, and resources to exhibit some changeswith more hidden semantics recovered. Semantic spaces of bothtags and resources can be created, with each simply being treated asan undirected and unweighted similarity network, where each noderepresents a single tag or resource and a connection (edge) betweena pair of nodes is implied if the similarity is larger than zero. A pairof nodes without a connected edge indicates that either they are notsemantically related, or the relation is still unknown to the system.

We built two similarity networks with the same set of 3,000 tagssampled randomly from the intersection of GiveALink and Slider.One network included only relations in the original GiveALinkfolksonomy, the other also included relations from the game. A

comparison between the two networks reveals how the game-drivenfolksonomies change the semantic space. The same comparisonwas performed with resource similarity networks. Table 1 summa-rizes the results of several measures, which clearly demonstrate thatwith game data included, both tag and resource similarity networksbecome denser and better connected, with more closed trianglesand fewer small, isolated components.

5.4.2 Metric coherenceTag and resource networks can be created with weighted edges

if the similarity values are taken into account. We inspected thechanges in coherence among edge weights based on the hypothe-sis that the network can evolve to be more coherent because weakrelations that should be stronger are recovered by game data.

When we try to describe relations among a set of nodes V withsome measurement, such as a distance function d : V × V → R,we are able to claim the function d is metric if d satisfies non-negativity, identity of indiscernibles, symmetry, and triangular in-equality. The first three are trivial. The triangular inequality statesthat for ∀a, b, c ∈ V , d(a, b) ≤ d(a, c) + d(c, b), indicating thatthe shortest path between a pair of nodes should be the direct link.When given a similarity measure instead of a distance, the ana-log of the triangular inequality is that the direct similarity betweentwo nodes should be higher than the indirect similarity through athird node. If the direct similarity between a pair of nodes is lowerthan some indirect similarity, we say that the triangular inequalityis violated by the pair and that the space is semi-metric [26]. For

Table 1: Semantic network comparisonsTag Similarity Network Resource Similarity Network

GiveALink Combined Change GiveALink Combined ChangeNodes 3,000 3,000 3,000 3,000Edges 147,182 159,247 +8.2% 213,498 307,363 +44.0%Density 0.0044 0.0048 +9.1% 0.0083 0.0119 +43.4%Closed triangles 5,090,467 5,657,185 +11.1% 13,684,666 18,383,568 +34.4%Components 5,917 5,316 –10.2% 4,424 4,199 –5.1%

Table 2: Network metric coherence ratioNetwork GiveALink Slider Combined Change

Tag 0.0963 0.8375 0.1049 +8.87%URL 0.8908 0.8956 0.9574 +7.48%

instance, if a is similar to b and a third node c is similar to both, butσ(a, b) < σ(a, c)σ(c, b), we say that the pair (a, b) is semi-metric.Finding semi-metric pairs is valuable for discovering novel seman-tics because they highlight cases where a semantic relationship isimplied by transitive closure of the global folksonomy, but has notyet been uncovered by explicit annotations [30].

A network with many semi-metric pairs is not as semanticallycoherent as a metric network. Our hypothesis is that game datacan make the network structure more coherent by increasing met-ric pairs, since it can strengthen the weak relations that should bestronger according to the triangular inequality. We evaluate coher-ence by measuring the ratio of the number of metric pairs to thenumber of all pairs. Using the similarity function σ, let us refor-mulate the triangular inequality as follows: ∀a, b, c ∈ V , σ(a, b) ≥σ(a, c)σ(c, b), then the edge (a, b) of the triangle (a, b, c) is countedas being metric.

As reported in Table 2, the metric coherence of both tag and re-source similarity networks is improved by the game; game-drivensocial tagging data makes the GiveALink folksonomy more coher-ent semantically. It is noteworthy that the resource similarity net-work presents a much higher metric coherence. This is because themeaning of a tag is more likely to be ambiguous compared to themeaning of a Web resource, thus it is easier to describe the similar-ity relations among resources in a coherent way.

6. CONCLUSIONSUser-generated social annotations are valuable for many Web ap-

plications, such as those providing a better Web search experienceor personalized recommendations. However, in current social tag-ging systems, users can easily employ poor tags or even abuse thesystem by tagging spam. The lack of control yields tags that areoverly personal or general as well as spam. We propose gameswith the purpose of improving both the quantity and quality of so-cial annotation data. We presented one such experimental socialtagging game, the GiveALink Slider, that we have designed anddeveloped for desktop browsers. The basic design principles arederived from the chain model for games that generate object de-scriptions by object associations [33]. Players win points by ex-tending a chain of related Web pages through tag descriptions. Thescoring mechanism fosters novel but accurate tags and the socialconnection component increases the game’s enjoyability.

The output of the game accumulates as more participants get in-volved. While playing, users label personal bookmarks and otherresources with tags. Compared with data from traditional socialtagging systems, users tend to apply more general and information-

rich words, which is encouraged by easily winning points and build-ing social links among players according to the game’s features.Such tags are equally applied to general and specific resources,increasing the searchability and accessibility of both types. Thetag space of the game’s output is therefore more re-usable. Mean-while, players prefer familiar words, such as tags previously usedfor bookmark annotations, since they have limited vocabularies andlearning new words is usually more difficult than referencing an oldone. Confidence is defined for an annotation pair 〈t, r〉 to representits reliability from the community’s perspective. The value of con-fidence is proportional to the number of players who describes re-source r with tag t. In other applications with social tagging com-ponents, confidence can be incorporated into search. The growthof average confidence in game annotations over time validates thedefinition of confidence as a measure of confirmed knowledge.

According to our evaluation, the quality of game-induced folk-sonomies meets our expectation of improving exiting social taggingdataset. A large proportion of the game’s annotations reveal novelrelations among tags or resources, unknown to GiveALink or De-licious. The output difference between our game and other socialtagging systems originates from the primary goals of the two appli-cations: one is to label sites for future personal references, the otheris simply for fun. Additionally, resources become more searchablebecause of different user tagging behavior concerning tag choices.While the semantic quality of the data can be sustained by the gamemechanisms, the semantic networks of concepts that emerge fromthese annotations grow denser and better connected as novel rela-tions are absorbed to recover previously hidden semantics. Finally,the semantic networks become more coherent as game annotationsare added.

In summary, the result of our study is a small step in entertain-ingly crowdsourcing large streams of high quality annotations, butit is quite promising. Our game substantiates the hypothesis thatuser tagging behavior induced by the game rules enriches the qual-ity and quantity of annotations. To solve the scalability problem,a simplified version of game rules is necessary. Another challengethat is yet to be pursued is to design mechanisms that make thegames sufficiently entertaining to engage players for long periodsof time. More designs to enhance the pleasurableness of gamingneed to be adopted, for example, an interesting visualization ofword associations by using d3.js chord diagram [1]. Besides theSlider, we are designing and developing another tagging game formobile devices, called Great Minds Think Alike (greatminds.givealink.org). The lessons learned from the present studywill inform the design of this and other games, and it will be chal-lenging to see how they can be applied to deal with other types ofobjects, such as rich media, locations, and friends.

AcknowledgmentsWe are grateful to Rossano Schifanella for assistance in the gamedesign and implementation and the members of the Networks andagents Network (cnets.indiana.edu/groups/nan) at the

Indiana University School of Informatics and Computing for help-ful suggestions during game design and testing. This work is fundedby NSF award IIS-0811994 Social Integration of Semantic Anno-tation Networks for Web Applications.

7. REFERENCES[1] Diagram in d3.js,

mbostock.github.com/d3/ex/chord.html.[2] Human brain cloud: Massively multiplayer word association

game, www.readwriteweb.com/archives/human_brain_cloud.php.

[3] L. V. Ahn and L. Dabbish. Labeling images with a computergame. In Proc. of SIGCHI Conf. on Human Factors inComputing Systems, 2004.

[4] L. V. Ahn and L. Dabbish. Designing games with a purpose.Communication of the ACM, 51(8):58–67, 2008.

[5] M. Ángeles Serrano, A. Flamminia, and F. Menczer.Modeling statistical properties of written text. PLoS ONE4(4):e5372, 2009.

[6] S. Bao, X. Wu, B. Fei, G. Xue, Z. Su, and Y. Yu. Optimizingweb search using social annotation. In Proc. of Intl. WorldWide Web Conf. (WWW), 2007.

[7] F. Carmagnola, F. Cena, O. Cortassa, C. Gena, and I. Torre.Towards a tag-based user model: how can user model benefitfrom tags? In Proc. of Conf. on User Modeling, pages445–449, 2007.

[8] S. Cooper, A. Treuille, J. Barbero, A. Leaver-Fay, K. Tuite,F. Khatib, A. C. Snyder, M. Beenen, D. Salesin, D. Baker,and Z. Popovic. The challenge of designing scientificdiscovery games. In Proc. of Intl. Conf. on Foundations ofDigital Games (FDG), 2010.

[9] H. Halpin, V. Robu, and H. Shepherd. The complexdynamics of collaborative tagging. In Proc. of Intl. WorldWide Web Conf. (WWW), 2007.

[10] H. S. Heaps. Information Retrieval: Computational andTheoretical Aspects. Orlando: Academic Press, 1978.

[11] P. Heymann, G. Koutrika, and H. Garcia-Molina. Can socialbookmarking improve web search? In Proc. of ACM Intl.Conf. on Web Search and Data Mining (WSDM), 2008.

[12] D. T. Hoang, J. Kaur, and F. Menczer. Crowdsourcingscholarly data. In Proc. of Web Science Conf., 2010.

[13] H. L. Kim, A. Passant, J. Breslin, S. Scerri, and S. Decker.Review and alignment of tag ontologies forsemantically-linked data in collaborative tagging spaces. InProc. of IEEE Intl. Conf. on Semantic Computing, 2008.

[14] A. Kittur, E. H. Chi, and B. Suh. Crowdsourcing user studieswith mechanical turk. In Proc. of SIGCHI Conf. on HumanFactors in Computing Systems, 2008.

[15] P. Lamere. Social tagging and music information retrieval.Journal of New Music Research, 37(2):101–114, 2008.

[16] K. G. Lawson. Mining social tagging data for enhancedsubject access for readers and researchers. Journal ofAcademic Librarianship, 35:574–582, 2009.

[17] C. S. Lee, D. H.-L. Goh, K. Razikin, and A. Y. Chu. Tagging,sharing and the influence of personal experience. Journal ofDigital Information, 10(1), 2009.

[18] B. Markines, C. Cattuto, and F. Menczer. Social spamdetection. In Proc. of Intl. Workshop on AdversarialInformation Retrieval on the Web (AIRWeb), 2009.

[19] B. Markines, C. Cattuto, F. Menczer, D. Benz, A. Hotho, andG. Stumme. Evaluating similarity measures for emergent

semantics of social tagging. In Proc. of Intl. World Wide WebConf. (WWW), 2009.

[20] B. Markines and F. Menczer. A scalable, collaborativesimilarity measure for social annotation systems. In Proc. ofACM Conf. on Hypertext and Hypermedia, 2009.

[21] B. Markines, L. Stoilova, and F. Menczer. Social bookmarksfor collaborative search and recommendation. In Proc. ofNational Conf. on Artificial Intelligence (AAAI), pages1375–1380. AAAI Press, 2006.

[22] C. Marlow, M. Naaman, danah boyd, and M. Davis. Ht06,tagging paper, taxonomy, flickr, academic article, to read. InProc. of ACM Conf. on Hypertext and Hypermedia, 2006.

[23] E. Michlmayr and S. Cayzer. Learning user profiles fromtagging data and leveraging them for personal (ized)information access. In Proc. of Intl. World Wide Web Conf.(WWW), 2007.

[24] P. Mika. Ontologies are us: A unified model of socialnetworks and semantics. The Semantic Web – ISWC 2005,5:522–536, 2005.

[25] D. Millen and J. Feinberg. Using social tagging to improvesocial navigation. In Workshop on Social Navigation andCommunity-based Adaptation, 2006.

[26] L. M. Rocha. Semi-metric behavior in document networksand its application to recommendation systems. In SoftComputing Agents: A New Perspective for DynamicInformation Systems, pages 137–163. IOS, 2002.

[27] E. Santos-Neto, D. Condon, N. Andrade, A. Iamnitchi, andM. Ripeanu. Individual and social behavior in taggingsystems. In Proc. of ACM Conf. on Hypertext andHypermedia (HT), 2009.

[28] R. Schifanella, A. Barrat, C. Cattuto, B. Markines, andF. Menczer. Folks in folksonomies: Social link predictionfrom shared metadata. In Proc. of ACM Intl. Conf. on WebSearch and Data Mining (WSDM), 2010.

[29] A. Shepitsen, J. Gemmell, B. Mobasher, and R. Burke.Personalized recommendation in social tagging systemsusing hierarchical clustering. In Proc. of Conf. onRecommender Systems, 2008.

[30] L. Stoilova, T. Holloway, B. Markines, A. Maguitman, andF. Menczer. Givealink: Mining a semantic network ofbookmarks for web search and recommendation. In Proc. ofSIGKDD Workshop on Link Discovery: Issues, Approachesand Applications, 2006.

[31] T. V. Wal. Folksonomy coinage and definition.http://vanderwal.net/folksonomy.html.

[32] L. Weng and F. Menczer. Givealink tagging game: Anincentive for social annotation. In Proceeding of the SIGKDDWorkshop on Human Computation Workshop, 2010.

[33] L. Weng, R. Schifanella, and F. Menczer. The chain modelfor social tagging game design. In Proc. of Intl. Conf. on theFoundations of Digital Games (FDG), 2011.

[34] L. Weng, R. Schifanella, and F. Menczer. Design of socialgames for collecting reliable semantic annotations. In Proc.of IEEE Intl. Conf. on Computer Games (CGAMES), 2011.

[35] S. Xu, S. Bao, Y. Cao, and Y. Yu. Using social annotations toimprove language model for information retrieval. In Proc.of ACM Conf. on Information and Konwledge Management,pages 1003–1006, 2007.

[36] G. K. Zipf. Human Behaviour and the Principle of LeastEffort. Cambridge: Addison-Wesley, 1949.