PALANTIR: CROWDSOURCED NEWSIFICATION USING TWITTER
By
PRITHVI RAJ VENKAT RAJ
A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
UNIVERSITY OF FLORIDA
2012
© 2012 Prithvi Raj Venkat Raj
To my family
ACKNOWLEDGMENTS
I would like to convey my sincere gratitude to my advisor, Dr. Helal, for his excellent
counsel, support, and encouragement in pursuing research in this exciting field.
I would also like to thank Dr. Thai and Dr. Xia for serving on my supervisory
committee.
At the same time, I wish to thank the members of the online forum Turker Nation,
who provided valuable feedback on some aspects of my evaluation methodology,
and the workers on Amazon Mechanical Turk, without whose labor I could not have
obtained actual human-generated results.
I would especially like to thank my family for their persistent encouragement and
belief in me.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 OVERVIEW
   1.1 Introduction
      1.1.1 Motivation and Current Problems
      1.1.2 Thesis Objective
      1.1.3 Thesis Organization
2 RELATED WORK
   2.1 Commercial Products on Twitter
      2.1.1 Storify
      2.1.2 Vibe
      2.1.3 Dataminr
   2.2 Commercial Products Supporting Citizen Journalism
      2.2.1 Wikinews
      2.2.2 CNN iReport
   2.3 Overview of Related Academic Research
   2.4 Microblogging
      2.4.1 Overview
      2.4.2 Providers
      2.4.3 Twitter
         2.4.3.1 Overview
         2.4.3.2 Research on Twitter
3 OVERALL APPROACH
   3.1 A Brief Incursion
   3.2 An Outline of Palantir
   3.3 Challenges
4 ARCHITECTURE
   4.1 Palantir Architecture
   4.2 Tweet Tagging and Annotation Services
   4.3 Tags in Palantir
      4.3.1 How Users Tag Tweets
      4.3.2 Tag Recommender
         4.3.2.1 Text Transformation
         4.3.2.2 Geographic Location
         4.3.2.3 Tweetopic
         4.3.2.4 Tag Co-occurrence and Prefix Matching
         4.3.2.5 Tag Ranking
   4.4 User Interface
   4.5 How Palantir Uses Tags
      4.5.1 Create a User Interest Profile
      4.5.2 Searching, Topic Following
      4.5.3 Tag Consolidation
5 EXPERIMENTATION AND EVALUATION
   5.1 Crowdsourcing
      5.1.1 Amazon Mechanical Turk (AMT)
         5.1.1.1 Basic Terminology
         5.1.1.2 Related Work
   5.2 Experiments
      5.2.1 Experiment 1: Palantir Baseline
         5.2.1.1 Experimental Data
         5.2.1.2 Analysis
      5.2.2 Experiment 2: Unguided Human Baseline
         5.2.2.1 Experimental Results
         5.2.2.2 Analysis
      5.2.3 Experiment 3: Heated Palantir
         5.2.3.1 Experimental Results
         5.2.3.2 Analysis
      5.2.4 Experiment 4: AMT Synonym Detection
         5.2.4.1 Experimental Results
         5.2.4.2 Analysis
      5.2.5 Summary of Results
6 CONCLUSION AND FUTURE WORK
      6.0.6 Conclusion
      6.0.7 Future Work
         6.0.7.1 Content Syndication
         6.0.7.2 Survey Creation

APPENDIX: WORDLISTS
REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

A-1 Filter Terms
LIST OF FIGURES

2-1 Storify
4-1 Palantir Usage Patterns
4-2 Palantir Architecture
4-3 Palantir Tag Recommender
4-4 Tweetopic Algorithm
4-5 Tweet Entry Screen
4-6 Tag Suggestion Screen
5-1 Experiment 1: Palantir baseline
5-2 Experiment 1: Results
5-3 Experiment 1: Results
5-4 Experiment 2: Tweets tagged using only AMT
5-5 Experiment 2: Results
5-6 Experiment 2: Results
5-7 Experiment 3: Tweet tags recommended by Palantir and validated by AMT
5-8 Experiment 3: Results
5-9 Experiment 3: Results
5-10 Experiment 3: Histograms showing variation in usefulness
5-11 Experiment 4: Synonym detection on AMT
5-12 Experiment 4: Similar Words
Abstract of Thesis Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Master of Science
PALANTIR: CROWDSOURCED NEWSIFICATION USING TWITTER
By
Prithvi Raj Venkat Raj
December 2012
Chair: Abdelsalam (Sumi) Helal
Major: Computer Engineering
People today generate and consume several exabytes of content. A significant
portion of this is on social media and microblogging sites like Twitter. The popularity of
these services encourages people to share real world developments and experiences
online. The value of such sharing is very apparent during times of duress, when people
take to posting important news on social media. While this enlightens the outside world,
it often lacks the coherence and clarity that could make a bigger impact. In addition, there is no
obvious way of communicating back to these people to blend, assimilate, and stimulate
the flow of information.
In this work, we develop and evaluate a collaboration system, Palantir, which is
designed for people who glean information from Twitter during an event and consolidate
that information into stories, allowing them to capture a snapshot of how things were
at the time of the event.

Palantir is designed so that people can easily annotate Tweets from mobile clients,
track Tweets using a web client, and finally consolidate these Tweets into stories
that can later be published.
CHAPTER 1
OVERVIEW
1.1 Introduction
1.1.1 Motivation and Current Problems
People have been consuming news in newspapers since the 16th century. In
2011, 46% of Americans turned to the Internet for news at least three times a week,
compared with the 40% who got their news from newspapers [1]. The study also
finds that 84% of Americans own mobile devices, 47% of whom consume news on
these devices. While these numbers show that many people today are moving toward
digital news sources, they do not account for news that finds people through social
networks. News of almost every major earthquake in the last three years broke on
Twitter before catching the attention of mainstream media. An article [2] by Megan
Garber of the Nieman Journalism Lab comments that most mainstream news organizations
on the social media bandwagon use Twitter as a glorified RSS feed. A PEJ
study shows that less than 2% of Tweets by 13 of America's popular news organizations
used Twitter as a conversational medium to gather information from people. Oftentimes,
transient events of local importance are Tweeted promptly, long before mainstream
media is aware. In her article [3], Gina Chen highlights how people supported each
other by publishing alternative routes that could be taken to avoid a multi-car pile-up.
In cases like these, there is much value in getting realtime information that is
immediately useful. Twitter also played a pivotal role in the Iran election protests of 2009,
the Egyptian revolution of 2011, and, currently, the Occupy movement. However useful,
Twitter is a transient medium. As Tweets age, they slip out of context, and what was once
important rapidly vanishes. We believe that Tweets, a broadly adopted medium, can
be leveraged to provide a far more powerful experience by sparking news tidbits and
circulating timely news. We imagine that contextual snapshots of Tweets would have the
potency to stay relevant long after an event has elapsed.
1.1.2 Thesis Objective
We describe a collaborative tool, Palantir, designed to help people glean information
from Twitter during an event and consolidate Tweets into stories as events unfold.
The fundamental idea of Palantir is to tap into the social network
effect of mobile Twitter users who are intimately following the development of a topic. Such
users, given the appropriate tools, can naturally channel their passion and energy into
writing a more organized view of lower-level tweets. Palantir makes organizing tweets by
tagging simpler by providing tag recommendations influenced by both content
and geospatial context. These tags are exploited to create profiles for individual users
of the system, who form a crowd that can be categorized and contacted for information.
Palantir allows for querying relevant portions of this crowd and aggregating their results.
These aspects make Palantir a valuable tool for realtime remote
reporting. By following topics coded by tags, filling in missing data, and publishing
consumable story lines, people can productively create content based on realtime
information from Tweets and share it with others. Fundamental to the working
of Palantir is the concept of recommending tags, which are text annotations applied to
a microblogging post. While there are studies about tagging content, none focus on
humans tagging short texts to make them more discoverable for others. We test whether
Sen et al.'s findings [4] can be applied to the domain of microblogs. In this thesis, we
concern ourselves with the following questions:
• Do tag recommendations affect the cognitive load of people applying tags?
• Do people find value in applying tags to microblog posts?
• Is there a faster convergence of tag vocabulary when tags are recommended?
• Do the quality and quantity of tags improve when tags are recommended?
1.1.3 Thesis Organization
This thesis is organized into six chapters. Chapter 2 provides insight into related
work in industry and details the academic research covered in
this thesis. Chapter 3 presents the overall approach taken by Palantir, while Chapter
4 provides an in-depth discussion of the Palantir architecture, design considerations, and
implementation details. Chapter 5 covers validation and evaluation of results using a
crowdsourcing-based approach. Chapter 6 concludes with thoughts about future work.
CHAPTER 2
RELATED WORK
2.1 Commercial Products on Twitter
2.1.1 Storify
Storify [5] is a social media curation service that launched in late 2011 [6]. Users
of Storify search social networks and select individual elements to build into stories. Figure 2-1
shows a typical article written with the service. Storify allows users to import social
media elements like Tweets and images into their story, and to supply their
own textual content to maintain its flow. Storify has been used to provide
political coverage [7] and even to document meetings and workshops [8]. However, Storify
does not make finding these social media elements easier, nor does it provide a way to
collaborate on stories.
2.1.2 Vibe
Vibe [9] is a messaging application that allows users to post anonymous short
messages that are pinned to a certain geographical region and have an expiry
time after which they are deleted. Any user with the Vibe application can view messages
within a radius of the user's current location; the current version of the
application allows users to set a large radius (12,000 miles). Vibe differs from Twitter
and other services in that it does not require users to sign up. While the application
isn't popular with everyday Twitter users, it was beneficial to people participating in the
Wall Street protests, providing them with an electronic channel of communication with
the anonymous crowd around them [10]. It is of interest that people are making
use of applications that pin posts to specific locations; we feel that in a social
network like Twitter, messages, and by extension tags, should have geographical affinity.
2.1.3 Dataminr
Figure 2-1. Storify

Dataminr [11] is a real-time social media analytics engine that listens to every public
post on Twitter to mathematically determine events and micro trends. Dataminr claimed
to have informed clients about Osama Bin Laden's death before it was reported by news
media outlets [12].
2.2 Commercial Products Supporting Citizen Journalism
2.2.1 Wikinews
Wikinews [13] is a collaborative journalism platform established in November
2004. An interesting facet of Wikinews is that it allows original work in addition to work
that has been sourced. Wikinews is valuable in covering news of large events affecting a
large population of people, who can report on it from different viewpoints.
The major drawback of Wikinews is the perceived inability to preserve a neutral
point of view. Some of the more complex issues lie at the heart of what news is:
delivering information on a timely basis, and providing that information with a captivating
narrative. Andrew Lih, a noted authority on Wikipedia, feels that it is difficult to get two
or more people to write in the same style, as evidenced by the failed project 'A Million
Penguins' by Penguin Publishing.
Wikinews has a model similar to Wikipedia, where submitted articles are reviewed
by trusted users. Where it differs is that the purpose of news is to capture a snapshot
in time. If a new development takes place, a news article makes a reference to the
previous article, recaps the story, and then takes it forward. This is in strong contrast
with how Wikipedia works, where such a change would have caused someone to edit
an existing article. Another key difference is that Wikinews sticks to a formulaic style,
the strict inverted pyramid. An article on Wikinews might start like "On 14
November, event x happened at location y; z people were affected". Articles give an
overall view before drilling into details, and most articles on Wikinews strictly follow such
a style.
A problem that plagues community-reviewed sites is the quest for perfectionism.
This leads to instruction creep, which, in the case of Wikinews, gradually caused the output
of news articles to fall from 6-8 per day to the same number per week. As an
artifact of maintaining a snapshot of history in time, Wikinews imposes a time limit
within which an article meeting its standards must be written. Many authors are unable
to get timely help meeting these requirements, and thus might not have their articles
published on the main site.
Palantir, on the other hand, is designed for people who are tuned into social media
like Twitter and need to quickly catalyze information flow during an ongoing
event. While the Wikinews model works well for reporting after an event, we feel that
Palantir is better suited to live reporting.
2.2.2 CNN iReport
iReport [14] is a tool that citizen journalists can use to submit their news articles
to CNN. iReport is similar to Wikinews in that it allows people to submit photos, videos,
or articles, but it does not allow for collaboration as Wikinews does. When a story is
posted to iReport, it immediately appears on the CNN iReport site. CNN has a staff of
reporters who comb through these stories for ones that are interesting and can be used
on their main site. These selected stories are then verified and used; verified stories
are badged "Vetted by CNN". People submit stories to CNN because there is a
possibility that their story might appear on the main site, and in 2011, CNN recognized
good contributions to iReport by holding the first iReport Awards, further fostering
community effort.
2.3 Overview of Related Academic Research
We review work done on microblogging, tagging systems, topic detection algorithms,
temporal exploration interfaces, and crowdsourcing marketplaces. Twitter, a microblogging
service, is growing at a rapid pace and is spurring research. Ehrlich and Shami [15]
found microblogs being used as real-time information sources, but highlighted concerns
about the volume of data, noise, and relevancy. We draw on research on human-guided
unstructured tagging systems, folksonomies, by Adam Mathes [16] and by Marieke Guy
and Emma Tonkin [17]. The advantage folksonomies have over formal classification
systems is that the fixed terms used in formal tagging systems may be imprecise. Adam
Mathes' paper [16] contains a good discussion of how tags are distributed, and allays
fears of single-use tags dominating others. Sen et al. [4] provide a good treatment
of how recommending tags to people affects the tags they choose.
Palantir's tag recommendation system is based on the TweeTopic algorithm
described by Bernstein et al. [18]. This algorithm makes use of an online
search engine to assign topics to Tweets or other short pieces of text.
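The search-engine-based idea can be sketched roughly as follows. This is a simplified illustration, not the actual TweeTopic algorithm: the real system queries a live search engine, whereas the snippets below are hypothetical stand-ins for its results.

```python
from collections import Counter
import re

# A tiny stopword list for the sketch; real systems use larger resources.
STOPWORDS = {"the", "a", "in", "of", "on", "to", "is", "at", "and", "for"}

def topic_terms(snippets, top_n=3):
    """Rank candidate topic terms from search-result snippets returned
    for a tweet's text: frequent non-stopword terms become topics."""
    words = []
    for s in snippets:
        words += [w.lower() for w in re.findall(r"[A-Za-z]+", s)]
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(top_n)]

# Hypothetical snippets a search engine might return for a quake-related tweet.
snippets = [
    "Magnitude 5.8 earthquake shakes Virginia, felt in Washington",
    "Virginia earthquake: tremors reported across the East Coast",
    "USGS confirms earthquake centered near Mineral, Virginia",
]
```

Here `topic_terms(snippets, top_n=2)` surfaces "earthquake" and "virginia" as topic candidates, even though the original tweet may never contain either word explicitly.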
In Section 2.4, we provide a discussion on microblogging, with a focus on Twitter.
Crowdsourcing systems are covered in Subsubsection 5.1.1.2.
2.4 Microblogging
2.4.1 Overview
Microblogging is an emerging form of broadcast communication that has gained
popularity over the past few years. Microblogging services allow users to post
short content online [19, 20]. The majority of this content is text, but users also link to other
types of content like audio, video, images, or other web resources.
2.4.2 Providers
Microblogging services are provided by many organizations, including Twitter, Plurk,
Yammer, Jaiku, Pownce, Social.Net, and App.Net. In this work, we concentrate
on Twitter, which was among the first such services to launch (May 2006), had 100
million active users in September 2011 [21], and is believed to have more than 500
million registered users as of April 2012 [22].
2.4.3 Twitter
2.4.3.1 Overview
While Twitter is a microblogging site, it has some social networking semantics.
Twitter supports and encourages the actions below.
Following The act of a user subscribing to the updates of another user
Followers The subscribers for a specific user
@reply A user can use @username to mention other users in a post. Others can view
this if the account of the poster is public
Direct Message (DM) A private message that can be sent to a follower
The social network model of Twitter differs from that of many other social networks.
Specifically, in Twitter, relations between users are directed: a user can follow another
without requiring the other user to follow him/her back. It has been noted that only 22%
of follows are mutual [23]. The Twitter feed of a given user contains Tweets (messages
posted on Twitter) from all users the current user follows, arranged in reverse
chronological order.
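The directed-follow model and the reverse-chronological feed can be sketched in a few lines. The graph and tweets below are made-up examples for illustration, not Twitter data or Twitter's actual implementation.

```python
# Directed follow edges: FOLLOWS[a] is the set of accounts that user `a` follows.
FOLLOWS = {
    "alice": {"bob", "carol"},
    "bob":   {"alice"},
    "carol": {"dave"},
    "dave":  set(),
}

def mutual_fraction(follows):
    """Fraction of follow edges that are reciprocated
    (Kwak et al. report ~22% for the real network)."""
    edges = [(a, b) for a, fs in follows.items() for b in fs]
    mutual = sum(1 for a, b in edges if a in follows.get(b, set()))
    return mutual / len(edges)

def timeline(user, tweets, follows):
    """A user's feed: tweets by followees only, newest first."""
    feed = [t for t in tweets if t["author"] in follows[user]]
    return sorted(feed, key=lambda t: t["time"], reverse=True)

tweets = [
    {"author": "bob",   "time": 1, "text": "first"},
    {"author": "carol", "time": 3, "text": "third"},
    {"author": "bob",   "time": 2, "text": "second"},
]
```

In this toy graph, only the alice/bob pair is reciprocated, and alice's timeline contains the followees' tweets in reverse chronological order.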
Twitter allows short 140-character messages to be posted through its service, which
can be seen immediately by other people on the service. This makes Twitter
a realtime stream of information, a property that has been exploited to
spread news. Twitter has been the preferred medium of communication in the Arab
Spring [24], the Iran elections [25], and the Occupy Wall Street movement. In certain events,
like US Airways Flight 1549 crashing into the Hudson River [26], the death of pop idol
Michael Jackson [27], the terrorist attacks in Mumbai [28], the death of terrorist Osama
Bin Laden [29], and every earthquake in the past three years [30], Twitter provided
news quicker than other news media. As such, Twitter acts as a source of information,
allowing users to discover new and interesting content on the Internet.
Although Twitter is a good source of information, there isn't an easy way to organize
Tweets, or to retrieve Tweets corresponding to a topic of interest. Twitter introduced
hashtags, word tags in the body of the Tweet with the syntax #word (e.g.,
#OWS for Occupy Wall Street), but they are inline with the Tweet, usually placed at the end,
and eat into the 140-character limit, forcing people to use tags that are already short
and limiting the number of tags used per Tweet. In a study using a sampling of Tweets
from 2009, researchers from Microsoft [31] found that only 5% of Tweets contain a
hashtag. In addition, not all Tweets carry hashtags pertinent to their content.
Currently, there isn't a way to retrieve all the Tweets by a specific user on a
specific topic.
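The mechanics of inline hashtags are easy to illustrate. The tweet text below is a hypothetical example, and the extraction pattern is a simplification of Twitter's actual tokenization rules.

```python
import re

TWEET_LIMIT = 140  # Twitter's per-message character limit (as of 2012)

def hashtags(text):
    """Extract inline #word tags from a tweet body."""
    return re.findall(r"#(\w+)", text)

def chars_left(text):
    """Hashtags are part of the body, so they consume the 140-character budget."""
    return TWEET_LIMIT - len(text)

tweet = "Protesters moving up Broadway, police rerouting traffic #OWS #nyc"
```

For this 65-character example, the two tags leave 75 characters for everything else, which is why users favor short tags and apply few of them per Tweet.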
2.4.3.2 Research on Twitter
Kwak et al. [23] crawl the Twitter network as of 2009, study its network structure,
determine influential users and information diffusion, and conclude that Twitter takes after
an information-sharing network rather than a social network. Of particular interest to
us is their quantitative comparison between CNN Headline News and trending topics on
Twitter: though CNN Headlines had better coverage, news of a live
broadcasting nature broke first on Twitter. The authors of [32] study the intents of Twitter users, which
include daily chatter, conversations, and sharing information/news. Andre et al. [33] analysed
the contents of over 43,000 Tweets on Twitter to find that 36% of the rated tweets are
worth reading, 25% are not, and 39% are middling. Though this is just a sampling of
Tweets, it shows that there is significant room for improvement, which can be achieved
by presenting users with Tweets whose content they value. In fact, the authors
argue that taking a social intervention approach, informing users about content value,
audience reaction, and emerging norms while leaving the user in control, has the potential to
improve the microblogging experience. While this seems to leave the human side of
the equation out, i.e., one can't speak to people solely about things the listener is interested
in, we feel that such an approach might be needed for media like Twitter, where some
researchers [34] are of the opinion that about 40% of Tweets might be pointless babble.
A contributing factor to the large body of research done in the past few years
was the openness of the Twitter platform. Twitter supported academia by allowing
unrestricted access to Tweets and follower/followee information. Recently (September
2012), Twitter has changed its business model to showing advertisements in the Tweet
stream, and no longer allows the level of access it did before.
This article [35] describes how these changes affect research on Twitter.
Higashinaka et al. [36] show that the majority of conversations on Twitter are
composed of just two tweets, and that this is sufficient to model conversation. Researchers
from Yahoo [37] find that Twitter is a very homophilous network, where information
diffusion occurs primarily in the community the information originated in.
A line of research on Twitter focuses on finding the most influential people on the
network.
CHAPTER 3
OVERALL APPROACH
3.1 A Brief Incursion
Online communities are larger today than at any point in history. Services such
as Twitter, Jaiku, Facebook, Tumblr, Reddit, etc., have fostered communities passionate
about diverse topics. However, every online community suffers from the 1% rule [38],
which states that only 1% of the user base actually creates content, 9% edit content,
and the remaining 90% of a virtual community only consume content. Bill and Mikolaj
[39] randomly sampled the Tweets of 300,000 users in 2009 to find that 10% of prolific
users create 90% of the Tweets. While these empirical results concerning participation
inequality on the Internet seem extreme, we are familiar with the Pareto principle, or the
law of the vital few, which states that, for many events, roughly 80% of the effects come
from 20% of the causes [40].
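This kind of participation skew is straightforward to quantify. The sketch below uses made-up per-user post counts, not the datasets from the cited studies.

```python
def share_of_content(post_counts, top_fraction=0.10):
    """Fraction of all posts produced by the most active `top_fraction` of users."""
    ranked = sorted(post_counts, reverse=True)       # most active users first
    k = max(1, int(len(ranked) * top_fraction))      # size of the "vital few"
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical sample: one prolific user, nine occasional or silent ones.
counts = [90, 3, 2, 1, 1, 1, 1, 1, 0, 0]
```

With these hypothetical counts, the top 10% of users (a single user here) account for 90% of the content, mirroring the 10%/90% figure reported in [39].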
Another influence on the design of Palantir is the existence and rise of citizen
journalism. Tools like CNN iReport, Fox uReport, etc., allow citizen journalists to write
their own articles. In the case of CNN iReport, these articles have their own separate
section, easily accessible from the main navigation bar on CNN's website, where users
can read through all such contributed articles. It is important to note that iReport allows
anonymous articles and posts all articles submitted by every user to its website,
some of which are subsequently verified by CNN reporters, with the website amended
to include this information. While CNN iReport has guidelines that advise people
to post articles that are known to be true, Fox uReport provides no such guidance and
has no concept of verifying postings; it does, however, include a section titled editor's
picks. A common shortcoming among these platforms is the inability to mobilize people into
creating the information one needs for a report in a timely manner, which we feel is a
significant drawback, and one that could have the most impact if solved.
With Palantir, our goal is to stimulate people online to create timely content that
could be useful, reducing the skew between content producers and consumers.
The basis of this idea is to find sufficiently interested people who are motivated to
create content relevant to the issues at hand, and to catalyze them into action. Palantir is
designed as the evolution of citizen journalism and reporting from a mostly one-sided
affair into a dialog.
3.2 An Outline of Palantir
To enable Palantir, we draw on work by Maslow [41] and tap into humans' natural
need for esteem and recognition. Palantir is a collaboration platform where users
post about topics they are interested in, while explicitly mentioning the topics their
post covers. The selection of topic(s) for the post is guided by Palantir to ensure
that similar existing tags are considered first by the user, promoting reuse and
borrowing over reinvention of tags. Users are also allowed to apply tags to posts that
aren't their own.
The tags a user applies determine the topics that user is interested in posting
about. We use this information to form communities, which can then be used to create a
flow of information on a particular topic: people ask questions, which are then posted
to the relevant interested communities.
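One minimal way to sketch this tag-based profiling and routing idea is shown below. This is a hypothetical data model for illustration, not Palantir's actual implementation.

```python
from collections import Counter

def build_profiles(tag_events):
    """Build interest profiles from (user, tag) pairs: each profile is the
    multiset of tags that user has applied in the past."""
    profiles = {}
    for user, tag in tag_events:
        profiles.setdefault(user, Counter())[tag] += 1
    return profiles

def route_question(tag, profiles, k=2):
    """Route a question to the k users most interested in `tag`."""
    ranked = sorted(profiles, key=lambda u: profiles[u][tag], reverse=True)
    return [u for u in ranked if profiles[u][tag] > 0][:k]

# Hypothetical tagging history.
events = [("ann", "ows"), ("ann", "ows"), ("ben", "ows"),
          ("ben", "nfl"), ("cat", "nfl")]
```

A question tagged "ows" would be routed to ann and ben, the users whose profiles show interest in that topic, while cat is left out.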
3.3 Challenges
At the heart of Palantir are the methods used to recommend topics to users. Sen
et al. [4, 42] show that tag recommendations significantly influence the tags that users
choose. As Palantir runs on Twitter, we face the following complexities:
• Modest character limit for posts. Twitter posts (Tweets) are limited to 140 characters, and thus have only about 15 words [39]. Traditional topic modelling algorithms have been designed to work well with large text corpora. An algorithm like Latent Dirichlet Allocation (LDA) [43] requires as input a text corpus and the number of topics to mine. Algorithms like LDA rely on inferring word distributions in a document to split it into topics. When documents contain a small number of words, the resulting performance is poor. In addition to this, the number of topics is assumed to be known a priori.
• Variances in language. Twitter users use creative shortened words to convey what they mean within the 140-character limit. Many of these words may be nouns not having a canonical form, thus creating problems where, for the same intent, different words are used.
• Twitter API. While Twitter was conducive to developers when it started (circa 2006), it did not have a clear monetization strategy, relying on investor money for its expenses while spending time and resources to build a great product. Going forward, however, Twitter feels that its biggest assets are the data it has on the system and the eyeballs of the people using Twitter. As of 2012, Twitter monetizes this data stream by providing bulk raw access to Tweets to companies at a price that isn't affordable for most individual developers. Twitter severely restricts API use and has added new terms of service which prevent sharing collections of Tweets. This year, 2012, Twitter added further restrictions on the API, which make scholarly research on Twitter even more difficult. The difficulty is the unpredictability of the licensing terms and the pace at which they are revised; managing risk in a landscape of vacillating and unpredictable licensing poses immense challenges for any research depending on third-party services and data.
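The second challenge, variances in language, can be made concrete with a toy normalization step. The slang lexicon below is a tiny hypothetical sample; real systems need much larger resources and more sophisticated handling.

```python
# Hypothetical sample lexicon mapping shortened Twitter spellings to
# canonical words. In practice many forms (especially nouns) have no
# canonical form at all, which is the core of the problem.
SLANG = {"2nite": "tonight", "gr8": "great", "u": "you", "pls": "please"}

def normalize(text):
    """Replace known shortened spellings with canonical words,
    leaving unknown tokens untouched."""
    return " ".join(SLANG.get(w.lower(), w) for w in text.split())
```

Note how the unknown token "c" passes through unchanged below: lookup-based normalization only helps for forms the lexicon anticipates, which is exactly why language variance remains a challenge for tag recommendation.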
CHAPTER 4ARCHITECTURE
4.1 Palantir Architecture
Figure 4-1 shows typical uses of Palantir. Palantir is an abstraction which allows
authors to make use of information available on Twitter, while enabling them to turn a
trickle of information into a gush by asking questions. People interact with Palantir in the
following ways:
1. Submit Tweets and Tags: Using the UI, people can submit Tweets along with tags pertinent to those Tweets. A recommender system aids Palantir users in selecting good tags based on the topic they are contributing to.
2. Search and Follow tags: As Tweets are organized by tags, people search for Tweets they like by following tags that are interesting to them.
3. Correct, Consolidate and Vote: People see the tags that others have applied, and depending on whether they agree, they vote for or against the tag. Palantir users may also consolidate similar tags and Tweets into a bundle.
4. Write articles using permanent references: A set of Tweets might together contain substantial, coherent information; Palantir users identify such sets and use them to form and contribute articles, adding value to these ordinary Tweets.
5. Survey Responses: Sometimes there might not be enough information on a particular tag, and people can create surveys to gather data. Replies to these requests are composed by Palantir users on mobile devices.
Though the Palantir architecture provides for several broad and useful interactions,
the scope of this thesis is limited to the design and evaluation of a tag recommender
system, and an implementation that allows for submitting Tweets and tags, and searching
and following tags.
4.2 Tweet Tagging and Annotation Services
As mentioned earlier, Andre et al. [33] analyzed over 43,000 volunteer-rated Tweets
to find that nearly 36% of the rated Tweets are worth reading, 25% are not, and 39% are
middling. Though this is just a sampling of Tweets, it shows that there is significant room
for improvement which can be achieved by presenting to users Tweets that have content
Figure 4-1. Palantir Usage patterns (1: Submit Tweet and tags; 2: Correct, Consolidate and Vote on Tags, Tweets; 3: Respond to surveys; 4: Search and Follow tags; 5: Write articles, get permanent references)
they value. Other studies [34] mention that up to 40% of Tweets might just be pointless
babble. Palantir relies on Tweets having reasonably accurate tags to support selective
exploration and topic following. Palantir also exploits these tags to find pertinent users
for surveys and polls. As of 2011, users of Twitter produce on average 1,620 Tweets
per second, with a record high of 6,939 Tweets per second. Each of these Tweets
is just 140 characters, and therefore techniques like Latent Dirichlet Allocation [43] and
Probabilistic Latent Semantic Analysis [44] are difficult to apply successfully. Palantir
partially shifts the onus of determining the topics mentioned in a Tweet to users,
by having them tag Tweets as they post them. Users are aided by topic recommendations
provided by Palantir, which ultimately shapes the vocabulary used in the tag set.
Figure 4-2. Palantir Architecture (application layer: Newsification, Survey Operations, Bundling Tweets into Articles, Twitter OAuth authentication and user validation; community algorithms: Tag Recommender, User Class Recommender, Tag Co-occurrence, Geographic Grid Conversion, and future plugins, with aggregated and normalized results; storage: Tag, Article, User, and Tweet-id databases with caching and replication)
4.3 Tags in Palantir
We use tags to mark Tweets so that other users may retrieve them easily when
they are searching for a specific topic. In addition, tags allow users and their circles
to create their own niche tags, which can aid collaboration further. It may be argued
that Twitter hashtags serve the same purpose, but they are not as effective when we
do not know exactly which topic a Tweet belongs to and want to add multiple tags; in
that case, hashtags eat into content length. We encourage users to enter tags that are
already in the system using prefix matching, or tags that are suggested by the Tweetopic
algorithm. The rationale behind this is to reduce the number of tags with different
spelling variations.
4.3.1 How Users Tag Tweets
The fundamental contribution of Palantir is the idea of harnessing the activity of
people submitting and reading Tweets to allocate topics, called tags, to Tweets. Palantir
allows users to tag their own Tweets, or Tweets that have been posted by others.
Palantir also allows people to have niche tags which can only be found by knowing the
name of the tag beforehand. These niche tags would not be suggested by the system,
and may be considered private to a user, or group of users who know the tag.
4.3.2 Tag Recommender
The motivation for providing recommendations is derived from Sen et al.'s [4] work,
which shows that when tags are recommended to people, they tend to select factual
tags as opposed to personal or subjective tags. This approach also improves the user
experience on a mobile device, in that the user does not have to type out tags when
they are recommended correctly. In addition, it reduces the number of single-use,
misspelled, or differently punctuated tags, and the quality of tags is maintained.
The Tag Recommender system operates when the user is inputting data into
Palantir, and analyzes partial user input to produce candidate tags. Figure 4-3 outlines
Figure 4-3. Palantir Tag Recommender (a partial Tweet and potential user-entered tags feed Text Transformation, Geographic Binning, Tweetopic, and Co-occurrence modules, whose candidate tags are ranked into weighted recommendations)
the working of this system. Palantir uses three distinct mechanisms, Geographic
Binning, Tweetopic, and Tag Co-occurrence, to generate candidate tags, which are
ranked to produce a weighted recommendation list.
4.3.2.1 Text Transformation
We note that search engines give better results when the query consists of short,
specific words that characterize the pages we are searching for. To achieve this we
remove Twitter-specific idiosyncrasies like RT and @username replies. We then use a
maximum entropy part-of-speech tagger [45] to identify all the noun phrases in the
Tweet.
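A minimal sketch of this cleanup step in Python (the regular expressions and the stopword filter are our own illustration; the thesis uses a maximum entropy part-of-speech tagger [45], not this heuristic, to find the noun phrases):

```python
import re

# Illustrative stopword list; a POS tagger would replace this heuristic.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "i", "you", "of",
             "to", "and", "in", "on", "at", "for", "just", "so"}

def clean_tweet(text):
    """Remove Twitter-specific idiosyncrasies before noun extraction."""
    text = re.sub(r"\bRT\b", "", text)          # drop retweet marker
    text = re.sub(r"@\w+", "", text)            # drop @username mentions
    text = re.sub(r"https?://\S+", "", text)    # drop links
    return re.sub(r"\s+", " ", text).strip()

def candidate_nouns(text):
    """Crude stand-in for the POS tagger: keep non-stopword tokens,
    which over-approximates the noun phrases of the Tweet."""
    tokens = re.findall(r"[A-Za-z']+", clean_tweet(text))
    return [t for t in tokens if t.lower() not in STOPWORDS]
```

For example, `candidate_nouns("RT @bob: Watch the debate tonight!")` keeps only the content-bearing tokens.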
4.3.2.2 Geographic Location
The intuition behind binning is that we can rewrite noisy latitude and longitude
coordinates using a grid system, which allows for a simpler search for nearby neighbors.
Palantir collects Tweets and tags submitted by the users along with their physical
coordinates. Each Tweet can have a location associated with it, and can be tagged with
multiple tags. We define tag location as the list of locations that the Tweets utilizing this
tag have. We map the raw latitude and longitude values into the Universal Transverse
Mercator (UTM) Grid system [46], which uses a two dimensional Cartesian coordinate
system based on an ellipsoidal model of Earth to give locations on the surface of the
earth. Topics are stored indexed by their grid identifier, which allows topics that have
been used in a specific grid cell to be retrieved easily. Further, such indexing makes it
easy to look at topics that have been used in nearby grid cells.
In Palantir the geographic binning recommender takes in a latitude and longitude,
and outputs a set of candidate tags that are nearby.
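As an illustration, the coarsest form of such binning can be sketched by mapping coordinates to a UTM grid-zone designator (zone number plus latitude band); the thesis's actual grid resolution and storage schema are not specified here, so the dictionary index below is an assumption:

```python
# Latitude bands C..X (I and O are skipped), each 8 degrees wide from 80S.
LAT_BANDS = "CDEFGHJKLMNPQRSTUVWX"

def utm_grid_zone(lat, lon):
    """Map raw coordinates to a UTM grid-zone designator, e.g. '18T'."""
    zone = int((lon + 180) // 6) + 1                  # 6-degree longitude zones
    band = LAT_BANDS[min(int((lat + 80) // 8), 19)]   # 8-degree latitude bands
    return f"{zone}{band}"

class GeoTagIndex:
    """Candidate tags indexed by grid cell (illustrative structure)."""
    def __init__(self):
        self.cells = {}

    def add(self, tag, lat, lon):
        self.cells.setdefault(utm_grid_zone(lat, lon), set()).add(tag)

    def nearby_tags(self, lat, lon):
        return self.cells.get(utm_grid_zone(lat, lon), set())
```

Two Tweets posted anywhere in New York City fall in cell 18T, so tags applied to one become candidates for the other.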
4.3.2.3 Tweetopic
This algorithm, described in [18] and shown in Figure 4-4, uses the text of the user's
Tweet along with data from a search engine to provide prospective tags. In short, the
algorithm formulates queries for a search engine and mines the results.
TWEETOPIC(nounPhrase)
    results ← search for nounPhrase on Google
    if results.length < 10
        then tokNouns ← TOKENIZE(nounPhrase)
             for each noun in tokNouns
                 do noun.result ← number of results from searching for that noun alone
             sort tokNouns based on noun.result
             resultsMax ← NIL
             resultsMin ← NIL
             while resultsMax.length < 10
                 do resultsMax ← search for tokNouns − tokNouns.minimum
                    tokNouns ← tokNouns − tokNouns.minimum
             while resultsMin.length < 10
                 do resultsMin ← search for tokNouns − tokNouns.maximum
                    tokNouns ← tokNouns − tokNouns.maximum
             results ← resultsMax ∪ resultsMin
    for each url in results
        do keywords ← keywords ∪ TF-IDF(url.Text, 20)
    sort keywords by number of occurrences of words
    return top 5 unique keywords

Figure 4-4. Tweetopic Algorithm
Query a Search Engine
The noun phrases obtained from the previous step are sent to a search engine. An
iterative backoff is used to adjust the query until at least 10 results are obtained.
Identify Popular Terms in the Results
TF-IDF is used to identify about 20 key words per page. Key terms that occur more
than 5 times are considered valid descriptors for the Tweet.
The Tweetopic algorithm does not learn directly from the tags entered into the system,
and cannot suggest tags that are already in the system; Palantir uses prefix matching
to display preexisting tags when the user types in a tag not shown by Tweetopic.
An advantage of using Tweetopic is that the system does not suffer from a cold start,
and provides tag recommendations even for Tweets which do not yet have corresponding
tags in the system.
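The Tweetopic flow can be sketched as follows. This is a condensed illustration, not the thesis implementation: the `search` callable stands in for the Google queries, the back-off drops only the least-productive noun rather than building the separate resultsMax and resultsMin sets, and TF-IDF is computed over the result set itself:

```python
import math
import re
from collections import Counter

STOP = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on"}

def tfidf_keywords(doc, corpus, k=20):
    """Top-k words of `doc` ranked by TF-IDF against `corpus`."""
    def words(text):
        return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP]
    df = Counter()
    for d in corpus:
        df.update(set(words(d)))                 # document frequency
    tf = Counter(words(doc))                     # term frequency
    score = {w: c * math.log(len(corpus) / df[w]) for w, c in tf.items()}
    return [w for w, _ in sorted(score.items(), key=lambda x: -x[1])[:k]]

def tweetopic(noun_phrase, search, min_results=10, top_k=5):
    """Condensed Tweetopic back-off; `search` is an assumed callable
    mapping a query string to a list of result-page texts."""
    results = search(noun_phrase)
    pool = noun_phrase.split()
    # Back off by dropping the noun that yields the fewest results alone.
    while len(results) < min_results and len(pool) > 1:
        pool.remove(min(pool, key=lambda n: len(search(n))))
        results = search(" ".join(pool))
    keywords = Counter()
    for page in results:
        keywords.update(tfidf_keywords(page, results))
    return [w for w, _ in keywords.most_common(top_k)]
```

A stubbed `search` over a small document list is enough to exercise the back-off and the keyword aggregation.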
4.3.2.4 Tag co-occurrence and Prefix matching
Tag co-occurrence grades pairs of tags based on their association with each other.
We assume that tags are similar and share traits if they occur in similar contexts, i.e.,
they are used to describe the same Tweet.

Each tag t is numbered, and is represented as a sparse co-occurrence vector in
multidimensional space, w = (f_1, f_2, ..., f_N), where f_i indicates how often t occurs
with tag t_i. The similarity of two tags is measured by the proximity of their vectors.
We use cosine similarity to measure proximity, which is given by

cos(\vec{x}, \vec{y}) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}
The top n similar tags are chosen to be presented to the user.
Prefix matching works by considering the tag entered by the user as a prefix, and
fetching all tags already in the database which have the same prefix.
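The similarity computation over the sparse co-occurrence vectors can be sketched as follows (the dictionary representation of the vectors is our own choice):

```python
import math

def cosine(x, y):
    """Cosine similarity of sparse vectors given as {index: count} dicts."""
    dot = sum(v * y.get(i, 0) for i, v in x.items())
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0

def top_similar(target, vectors, n=5):
    """Rank the other tags by cosine proximity to `target`."""
    scored = [(t, cosine(vectors[target], v))
              for t, v in vectors.items() if t != target]
    return [t for t, _ in sorted(scored, key=lambda p: -p[1])[:n]]
```

Tags whose co-occurrence vectors point in the same direction score 1.0; tags that never share a context score 0.0.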
4.3.2.5 Tag Ranking
We adopt the notion that the same tag produced by multiple mechanisms is more
important. To place such tags first, we pool the tags identified by all the different
modules, sort them by frequency of occurrence, and then discard duplicates. Tags
with frequencies greater than one are presented first, in decreasing order of frequency.
Tags that have a frequency of one are compared with the set of tags the user has
used before; those are presented next, followed by the remaining tags, ordered by their
global frequency counts.
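A sketch of this ranking step (the function signature and the `global_counts` structure are illustrative assumptions, not the thesis's data layout):

```python
from collections import Counter

def rank_tags(candidate_lists, user_history, global_counts):
    """Merge candidate tags from the recommender modules.
    Tags proposed by more than one module come first, by frequency;
    then singletons the user has used before; then the rest ordered
    by global popularity."""
    freq = Counter(tag for tags in candidate_lists for tag in set(tags))
    multi = sorted((t for t, c in freq.items() if c > 1), key=lambda t: -freq[t])
    singles = [t for t, c in freq.items() if c == 1]
    seen = [t for t in singles if t in user_history]
    rest = sorted((t for t in singles if t not in user_history),
                  key=lambda t: -global_counts.get(t, 0))
    return multi + seen + rest
```

A tag suggested by both Tweetopic and geographic binning thus outranks a globally popular tag suggested by only one module.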
Figure 4-5. Tweet Entry Screen
Figure 4-6. Tag Suggestion Screen
4.4 User Interface
Tweets posted through Palantir are tagged by users aided by the tag recommender.
When users compose Tweets on the mobile client, tags are computed periodically after
the input of n words, and are then displayed on the screen, allowing the user to select
some or enter new ones. If the system is able to recommend more than 5 tags, users
can get to the next set of results by flicking the result bar. To enter a tag that is not
recommended, the user clicks on “Add New”, and is allowed to enter a custom tag.
When users are in the process of entering custom tags, they are shown tags that are
matched by prefix, to further enable them to pick a tag that is already in the system.
However, they are not restricted to any vocabulary, and are free to complete their tags in
any way that they see fit.
Palantir provides immediate feedback to users tagging Tweets by allowing them to
view Tweets described by similar tags, and providing them an opportunity to retroactively
change the tags that they selected for the Tweet they submitted.
Palantir allows people to browse others’ Tweets possibly constrained by a
geographic location or tags. While doing so, users are allowed to add or remove tags
from Tweets. This information is stored to allow Palantir to determine the membership of
a tag in a particular Tweet, which is the ratio of users who do not remove the tag.
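This membership measure can be sketched as follows (the default value for a tag no one has reviewed yet is our assumption):

```python
def tag_membership(kept, removed):
    """Membership of a tag in a Tweet: the fraction of users who,
    having seen the tag, did not remove it."""
    total = kept + removed
    return kept / total if total else 1.0  # unreviewed tags keep full membership
```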
4.5 How Palantir Uses Tags
4.5.1 Create a User Interest Profile

For each user, we compute the top k most frequently used tags. This forms the
feature set for that user; let us call it frequentSet. We incorporate feedback given by
other users (by virtue of removing tags) by computing the percentage of tags that were
removed by other users of the system, and a top-k list of these tags, called disputedSet.
These sets are recomputed whenever the user makes a post. In particular, the set
difference of frequentSet and disputedSet, called acceptedSet, captures the topics
posted by this user that are accepted by other users; this is the computed user profile
used to determine which polls should target this user.
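The profile computation can be sketched as follows (the set names follow the text; k and the input shapes are illustrative):

```python
from collections import Counter

def user_profile(posted_tags, removed_tags, k=10):
    """frequentSet: the user's top-k tags; disputedSet: the top-k of
    those tags that other users removed; acceptedSet = frequentSet
    minus disputedSet is the profile used for poll targeting."""
    frequent = {t for t, _ in Counter(posted_tags).most_common(k)}
    disputed = {t for t, _ in Counter(removed_tags).most_common(k)}
    return frequent - disputed
```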
4.5.2 Searching, Topic Following
Using the web interface, users can follow Tweets by searching for topics that
interest them. This view is updated whenever people post new Tweets. This interface
allows users to reference Tweets and consolidate them into stories. The system makes
use of tag co-occurrence to show Tweets from related topics as well.
4.5.3 Tag Consolidation
When the number of unique tags in the system reaches a specified size, we compute
the semantic similarity of the words in the tags. Groups of similar words are presented
to users when they search for topics in the web application, and users are allowed to
give input on whether these words should be merged. When there is strong agreement
between users, the tags are merged and saved.
CHAPTER 5EXPERIMENTATION AND EVALUATION
Palantir relies on user input for its tasks, namely tag recommendation, polling
and fact consolidation. One method of evaluation is to publicize Palantir and find
a group of volunteers willing to create data that we could use. This is a tough
undertaking, mainly in terms of recruiting volunteers, and of iterating over
algorithms and experiments.
We believe that a labor market for micro-tasks, like Amazon Mechanical Turk (AMT),
is well suited for our experiments. Subsubsection 5.1.1.2 explores crowdsourcing with
a focus on AMT and existing research that makes use of that platform; we also explain
our reasons for choosing AMT, describe the experiments we run on AMT, and discuss
the results.
5.1 Crowdsourcing
Crowdsourcing, a term coined by Jeff Howe in 2006, was described by him as
“the process by which the power of many can be leveraged to accomplish feats that
were once the province of a specialized few” [47]. Howe used examples of people
contributing to Wikipedia and uploading videos to YouTube to demonstrate the concept.
Today, crowdsourcing more commonly refers to online services where
publishers post tasks that are completed by a group of people to fulfill the requirements
of the publisher. The tasks are diverse, and marketplaces like Amazon Mechanical Turk
[48] have abundant workers to ensure timely completion of tasks.
5.1.1 Amazon Mechanical Turk (AMT)
Amazon Mechanical Turk (AMT) [48] is an online labor marketplace that focuses on
assisting developers in using human input for their programs. Typically, the human input
required is for simple tasks that cannot yet be done algorithmically using computers
while being cost and time efficient. Some of these problems are easily solved by
humans after minimal training. An example of such a microtask might be to look at two
photos of a single person taken under different conditions to determine whether they
are the same person. AMT uses the tag line “Artificial Artificial Intelligence” to brand its
service, stemming from the fact that AMT can be used to perform tasks that artificial
intelligence cannot.
5.1.1.1 Basic Terminology
Workers
Workers are humans who select and complete one or multiple microtasks on
AMT. Workers are paid with rewards that are deposited into Amazon Payments
accounts, and may be withdrawn as cash. The reward or wage is determined
individually for each task group by requesters, and is subject to approval by
requesters.
Requesters
Requesters are people who publish tasks which are to be completed by workers.
The requester designs the tasks and the steps needed for its completion, decides
the reward for a task, and acceptance criteria for the task. Requesters may also
limit the visibility of tasks depending on worker qualifications and on workers'
previous acceptance ratios.
HIT
A Human Intelligence Task (HIT) is a job posted on AMT. Tasks that are easy and
well defined produce good results. Tasks range from selecting good pictures of
storefronts and identifying addresses to writing product descriptions.
HIT Type
The HIT Type refers to the characteristics of the HIT, viz. title of the HIT, the
requester who created the HIT, reward being offered, number of HITs of this
type, time for completion of HIT, auto-approval time after which workers get paid
automatically, qualifications of workers who can accept the HIT, and HIT expiry
time.
HIT Group
A HIT Group comprises HITs of the same type. This enables workers to
easily find similar HITs. Workers prefer HIT groups with a large number of tasks
because they do not have to retrain, and can improve their skill as they pick up
jobs.
Assignment
AMT supports having multiple workers working on a replica of the same HIT. Each
of these replicas is called an assignment. AMT ensures that a worker can only
complete a single assignment for a HIT. This allows requesters to evaluate the
quality of submissions by looking at how other workers have completed the same
task.
Qualifications
There are requirements the worker must meet to work on HITs. These qualifications
can be auto-generated by AMT or created by requesters. Auto-generated
qualifications include criteria like the approval percentage of the worker, the number
of assignments completed successfully, etc.; custom qualifications are created by
requesters, and are usually time-bound tasks that need to be completed according
to the requester's specification before the qualification is granted. Requesters are
also able to grant qualifications to workers who have previously worked for them.
Sometimes multiple qualifications may be required for a given HIT.
Reward
Reward is the wage paid to a worker for completing a HIT. On approval of a HIT
by a requester, rewards are automatically transferred from the prepaid Amazon
payments account of the requester to the account of the worker.
Life cycle of a HIT
For a requester, the first step is to register an account with Amazon payment
services using a US-based credit/debit card. The requester's role is to create HITs,
which are subsequently put on the AMT marketplace. A HIT may be created in one of
three ways, viz. using the Requester User Interface (RUI), the AMT Application
Programming Interface (API), or the AMT Command Line Tools (CLT). Regardless of
the technique used, the requester needs to provide a title, reward, time for completion,
number of assignments, auto-approval time, qualifications, and HIT expiry time. The
HIT itself can be hosted on Amazon Web Services (AWS) or externally. For a HIT
submitted using the RUI and hosted on AWS, the requester provides a HIT template
with placeholders for data, which are then filled from a data file before being sent out
to workers.

Requesters need to prepay the maximum amount which can be consumed by their
HIT groups before the groups appear in the marketplace. The minimum reward for a
HIT is $0.01, with Amazon charging a 10% commission on all payments made using its
platform; the minimum commission charged is $0.005.

Once HITs have been posted, they appear on the worker interface, which by default
is sorted by the most recent time a HIT has been posted or updated. Workers then
select one or more jobs of interest and complete the tasks. When a worker is done with
one task in a HIT group, he/she is given the option to complete another task of the
same HIT type. One way to control this is to have a HIT group with a number of
assignments, where each worker is limited to submitting a single assignment.

Once HITs have been completed by workers, requesters can review submissions
and approve or deny payments. At this point, requesters may optionally award a bonus
to a worker or set of workers. For workers on AMT, the ratio of submissions to approvals
is calculated and displayed along with their profiles, and requesters have the option of
restricting HITs to workers meeting a certain ratio. To deal with workers with consistently
unsatisfactory performance, AMT allows requesters to block them. AMT does not
provide any such metrics for requesters, leaving workers without a way to rate
requesters. Community forums like Turker Nation [49] and Turkopticon [50] step in to
provide ratings and guidance about requesters. In general, workers prefer requesters
who have well-defined HITs with clear acceptance criteria, and prompt approval and
payment practices. If a requester has ambiguous practices that hurt the interests of
workers, they may be blacklisted on a forum like Turker Nation [49], causing them a lot
of difficulty in getting work done using AMT.

A HIT stops appearing on the worker interface when either all HITs posted in that
group have been completed, or the time assigned for the expiry of the HIT has passed.
The expiry time and other parameters can be changed by a requester while the HIT is
still running, which causes the HIT to be updated and listed at the top of the worker
interface.
5.1.1.2 Related Work
Crowdsourcing is a relatively new concept that researchers have been exploring.
In 2004, Luis von Ahn and Laura Dabbish came up with the ESP Game [51], which
made use of crowd wisdom to annotate images; they estimated that 5,000 people
playing the game for 31 days could assign labels to all images indexed by Google.
In a novel attempt, Michael Denton [52] used crowdsourcing to create art on a public
sidewalk. More traditional uses of crowdsourcing are data collection and sensing:
Apple collects WiFi signal strength along with GPS coordinates to make its mapping
services accurate, Waze collects velocity information to provide realtime traffic data
to motorists, and Google enriches its maps with user-submitted photos pinned to
specific coordinates.
Yahoo researchers Mason and Suri [53] delve deeply into using Amazon
Mechanical Turk for behavioral research. They examine a plethora of work comparing
AMT to traditional offline tests and other tests administered online, and summarize the
results, noting that results from well-designed studies on AMT are qualitatively and
quantitatively the same as those conducted in lab settings. Worker demographics given
by Suri and Watts [54] show that the average worker is 30 years old, with 55% female
and 45% male, and with the majority of turkers from the USA and India. They also detail
their experience conducting synchronous experiments on AMT using waiting rooms.
AMT has been put to a variety of creative uses by developers and researchers.
Bernstein et al. [55] rely on TurKit's [56] AMT algorithms to iteratively refine text inside
Microsoft Word; their system, Soylent, provides human-powered text shortening,
proofreading, and an interface for requesting arbitrary word-processing tasks. CrowdDb
[57] presents an open-world database model where queries for missing information are
transparently converted into AMT HITs, and crowd results are aggregated into the
results presented to the user. VizWiz [58] empowers people with limited visual acuity
to perform visual search by harnessing the cognitive skills of humans on AMT: VizWiz
on a mobile device allows the user to snap a picture, verbally ask a question about it,
and get a human reply in minutes.
5.2 Experiments
This section presents results from experiments run with Palantir on AMT using data
from Twitter. Using the Twitter Streaming API, we collected about 270,000 Tweets
during the second presidential debate, between 9:00 PM and 10:30 PM EST on October
16, 2012; this forms our base dataset. We found that a significant percentage of these
Tweets were reTweets, which were eliminated. To ensure that we captured only English
Tweets related to politics, we used the word list provided in Table A-1. We then
constrained the Tweets to those that originated from the state of New York. This
process reduced the number of Tweets to 245, which were used for running 455 HITs
on Mechanical Turk between October 16 and October 24, 2012. We designed
experiments to mimic ways in which Palantir could be used, and to evaluate user
behavior when using Palantir's keystone, its tag recommendation system. We also vary
parameters specific to AMT, viz. price and jobs per HIT, and report on the time taken to
complete tasks, and the
Figure 5-1. Experiment 1: Palantir baseline (Stage 1: Tweet Corpus → Palantir Tag Recommender → Annotated Tweet Corpus A (ATC-A); Stage 2: ATC-A → AMT → Annotated Tweet Corpus B (ATC-B))
quality of responses. Workers who had an approval rating of at least 98% and could
demonstrate a basic understanding of Twitter concepts and terminology were chosen for
this task.
We note that we use the same Tweet data corpus for the different experiments so
that comparisons may be drawn. However, the functioning of AMT cannot be stringently
controlled, and results derived using AMT depend on the properties of AMT at the
time of the experiments. Specifically, the portion of the crowd doing the experiments
determines the results, which may cause variance every time an experiment is run with
a different portion of the crowd. In addition, the design of the specific tasks and the
assignment of HITs significantly affect results. Despite this, we believe that we can get
some interesting insights from these experiments.
5.2.1 Experiment 1: Palantir Baseline
Our initial evaluation seeks to establish cold baseline performance of Palantir’s
tag recommender system. We run this experiment to see how well Palantir’s tag
recommender works when there is no previous data in Palantir.
This experiment is run in two stages. In the first stage, we run Palantir’s tag
recommender on every Tweet in our corpus, and set the number of tags to 5. As there
are no existing tags in the system, the results of this experiment are solely contributed
by our implementation of the Tweetopic algorithm, described in Figure 4-4. In stage
two, to get an idea of the qualitative performance of this algorithm, results were sent to
AMT (5 jobs per HIT, 2 assignments, 25 cents per HIT), asking workers to pick out
useful tags from the ones generated by Palantir. For each Tweet, workers were asked to
rate the usefulness of every tag on a 5-point scale. The workflow for this experiment
is shown in Figure 5-1. Workers had to have an approval rating above 98% and
demonstrate adequate understanding of Twitter and tagging, as determined by a custom
qualification, to take part in this experiment.
5.2.1.1 Experimental Data
A histogram depicting the distribution of tag ratings is shown in Figure 5-2B.
We use a tag rating of 0 to signify that the tag is blank, while ratings 1 to 5 range from
'not useful' to 'most useful'. We found that just 7.1% of tags supplied by Palantir were
marked as most useful, and 6.2% were marked as useful. The tag cloud corresponding
to the 338 unique tags generated by Palantir is shown in Figure 5-2A, while the
popularity of tags is shown in Figure 5-3A.

Workers took about 19 hours to complete this job, with an average completion time
per HIT of 5.9 minutes.
5.2.1.2 Analysis
We find that Palantir performs inadequately at recommending tags when it is
started without any data. We note that a majority of turkers strongly felt that the tags
were unrelated to the Tweets presented. We posit that the rather long time to complete
this set of HITs was because we were new requesters on the AMT marketplace.
Workers are cautious about new requesters, as requesters have the ability to arbitrarily
reject work done by workers, which impacts workers' approval ratings negatively. In
fact, the first results for this batch of HITs started coming in only after we introduced the
HITs on TurkerNation [49], providing a short description of what they were for and how
we would evaluate them. Some workers are apprehensive of submitting work to
requesters who evaluate it using majority rules, perhaps due to a perceived threat that
their work could be rejected even if it is correct.
5.2.2 Experiment 2: Unguided Human Baseline
This experiment, depicted in Figure 5-4, is used to determine baseline performance
of AMT for tagging a Tweet. Our goal is to examine the behavior of people annotating
Tweets without any guidance or recommendations. We are interested in finding out the
quantity and quality of these tags, and how useful the community perceives these tags.
This experiment runs similarly to Experiment 1, but with a different first stage. As we are
benchmarking the performance of the crowd, the first stage is changed so that tags are
generated by turkers: the HIT provides a Tweet and asks the turker to suggest up to 5
tags for it. The methodology of stage two is identical to that of the first experiment. We
hope to measure community acceptance of tags generated by turkers.
5.2.2.1 Experimental Results
A histogram of worker ratings is presented in Figure 5-5B, and the tag cloud is
shown in Figure 5-5A. We find that, on average, 14.94% of the tags generated by the
turker population working on the first stage were marked as most useful, and 10.16%
were marked useful, by the turkers working on the second stage. It is also interesting to
note that 19.36% of the spaces for tags were left blank, indicating that fewer than five
tags were required for some Tweets. For stage one, when we paid the turkers a wage
of 27 cents per job, the batch was completed in 19 hours, with an average completion
time per Tweet of 7 minutes; this batch was submitted to AMT along with the first
experiment. For the second stage, with the same wage of 27 cents per job, the batch
was completed in 1 hour and 40 minutes, with the average completion time per Tweet
Figure 5-2. Experiment 1: Results. A) Tag cloud. B) Usefulness of tags (histogram, E1: Palantir baseline).
Figure 5-3. Experiment 1: Results. A) Tag popularity (E1: Palantir baseline).
Figure 5-4. Experiment 2: Tweets tagged using AMT only (Stage 1: Tweet Corpus → AMT → Annotated Tweet Corpus A (ATC-A); Stage 2: ATC-A → AMT → Annotated Tweet Corpus B (ATC-B))
Figure 5-5. Experiment 2: Results. A) Tag cloud. B) Usefulness of tags (histogram, E2: Human baseline).
being 14.7 minutes. This differs significantly from the time taken by the second stage of
experiment one, by +8.8 minutes.

5.2.2.2 Analysis

We see that, for the most part, humans in the AMT community agree about the
quality of tags applied by their peers. Only 3.1% of tags were marked as not useful.
There were 303 unique tags generated by humans in this experiment, similar to what
the Palantir baseline generated. However, there is a significant difference in the
multiplicity of the tags, as revealed by the respective tag clouds in Figure 5-5A and
Figure 5-2A. We
Figure 5-6. Experiment 2: Results. A) Tag popularity (E2: Human baseline).
also compare the tag popularity of these two experiments, shown in Figure 5-3A and
Figure 5-6A, to note this difference. We see that while the number of unique tags is
similar, a few tags are considerably more popular than their alternatives. Because
these alternative tags are not discoverable, they are effectively buried. The
additional time of about 9 minutes taken per job was
puzzling until we communicated with a worker, who explained that when workers accept
multiple jobs, the timer marking a job as started is triggered immediately, even if
they have not yet begun working on the job. Relying on job completion time as a
metric for AMT experiments should therefore be a well-considered decision.
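One simple mitigation, when completion times must be reported anyway, is to prefer the median over the mean, since a few accept-early jobs inflate the latter far more than the former. The sketch below uses made-up times, not data from our experiments:

```python
from statistics import mean, median

# Hypothetical per-job completion times in minutes; two workers accepted
# several jobs at once, so their timers started early and their recorded
# times are inflated.
times = [3.1, 4.0, 3.8, 4.4, 3.5, 41.0, 38.5, 4.2]

print(f"mean:   {mean(times):.1f} min")    # skewed by the inflated timers
print(f"median: {median(times):.1f} min")  # robust to the accept-early artifact
```

The median here stays near the typical working time, while the mean triples.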
5.2.3 Experiment 3: Heated Palantir
The goal of this experiment is to measure the quality of tags produced by Palantir’s
tag recommender when it has prior data about tags that have been used. It also
Figure 5-7. Experiment 3: Tweet tags recommended by Palantir and validated by AMT. ATC-B-EXPT1 and ATC-B-EXPT2 populate the Palantir Tag DB; the Tweet Corpus is tagged by the Palantir Tag Recommender and validated on AMT, producing Annotated Tweet Corpus C (ATC-C).
illustrates the influence of guidance, showing us the number of times people pick a
tag that exists as opposed to creating their own tags.
We use the results of the previous experiment, described in Subsection 5.2.2 to
populate data structures used by Palantir. To realistically model Palantir’s use case, we
run this experiment in a single stage, with Palantir providing tag recommendations. The
experiment is depicted in Figure 5-7. This HIT was run with 1 job per assignment, with
each job containing 5 Tweets.
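The warm start can be pictured with a minimal frequency-based sketch (the class and method names below are illustrative, not Palantir's actual implementation): the tag database is seeded from the annotated corpus of the previous experiment, and recommendations then favor tags already in use.

```python
from collections import Counter

class TagDB:
    """Minimal tag store: counts how often each tag has been applied."""
    def __init__(self):
        self.counts = Counter()

    def seed(self, annotated_corpus):
        # annotated_corpus: iterable of (tweet_text, [tags]) pairs,
        # e.g. a human-annotated corpus such as ATC-B.
        for _tweet, tags in annotated_corpus:
            self.counts.update(t.lower() for t in tags)

    def recommend(self, k=5):
        # Popularity-only baseline: the k most frequently used tags.
        return [tag for tag, _n in self.counts.most_common(k)]

db = TagDB()
db.seed([("Gates pledges funds", ["Charity", "Bill Gates"]),
         ("New vaccine trial",   ["Medicine", "Charity"])])
print(db.recommend(k=2))  # most frequent tags first
```

A real recommender would also condition on the Tweet's content; this sketch only shows how prior data biases the candidate set toward established tags.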
5.2.3.1 Experimental Results
Figure 5-8. Experiment 3: Results. A) Tag cloud. B) Usefulness of tags (E3: Heated Palantir).
Figure 5-9. Experiment 3: Results. A) Tag popularity, log scale (E3: Heated Palantir).
Figure 5-10. Experiment 3: Histograms showing the variation in the usefulness distribution of Tweets when only the first 5, 4, or 3 tags presented by Palantir are chosen. A) All tags. B) First four tags. C) First three tags.
Figure 5-8B shows the distribution of worker ratings for the tags generated by
Palantir when it has access to a tag corpus. The tag cloud corresponding to this
experiment is shown in Figure 5-8A. This tag cloud is generated by taking into account
overrides by turkers: in cases where a tag recommended by Palantir was replaced by the
turker, we use the replacement tag. In the histogram, we see that 19.7% of tags are
rated as most useful by workers, while 10.8% are rated as useful. The percentage of
tags marked as not useful is 18.26%, much better than the initial performance of
Palantir in experiment 1, where this metric was as high as 36.9%. Turkers were paid 42
cents, and took 19 hours and 30 minutes to complete this batch. The average response
time was 4.85 minutes.
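The percentages above are read off a usefulness histogram. A small sketch of the computation, using hypothetical ratings rather than our experimental data:

```python
from collections import Counter

# Hypothetical 0-5 usefulness ratings collected from turkers.
ratings = [5, 5, 4, 0, 3, 5, 2, 0, 4, 1]

hist = Counter(ratings)
total = len(ratings)
for score in range(6):
    pct = 100 * hist[score] / total
    print(f"usefulness {score}: {hist[score]:4d} tags ({pct:.1f}%)")
```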
5.2.3.2 Analysis
At first glance, it is surprising that the number of tags marked as most useful
has increased compared to the previous experiment, in which humans rated tags
created by other humans. We attribute this to the fact that while humans enter
only the most relevant tags for a Tweet, Palantir recommends a larger number of
tags, giving more tags a chance to be relevant. This also causes the percentage
of tags marked as 'not useful' to rise to 18.2%. We believe this is a good
trade-off compared to having to type in more tags, a burden that would be more
pronounced on a space-constrained mobile device. Figure 5-10 explores the
behavior of Palantir's tag recommender when we constrain it to five, four,
and three tags. We find that tags are roughly equally distributed across all bins,
so the constraint does not significantly change the distribution. The tag cloud in
Figure 5-8A and the tag distribution in Figure 5-9A show a larger set of popular
tags compared to the previous experiment. We feel that recommending tags makes
people aware of their options before they invent tags of their own, which may not
have mainstream appeal. This is evidenced by the fact that only 217 unique tags
were applied to Tweets in
Figure 5-11. Experiment 4: Synonym detection on AMT. A Tweet's tags (e.g., Medicine, Bill Gates, Charity, Gates, Welfare, Melinda) are split by AMT workers into buckets such as {Medicine}, {Bill Gates, Gates}, {Melinda}, {Charity, Welfare}.
this experiment, as opposed to 303 unique tags in experiment 2, and 338 unique tags in
experiment 1.
5.2.4 Experiment 4: AMT Synonym Detection
This experiment is different from the preceding experiments in that we don’t ask
turkers to generate new tags. Instead, we ask them to split tags that have been applied
to a specific Tweet into buckets. The goal of this experiment is to determine how many
tags that have been applied to a single Tweet are words that might be considered
synonyms by the community. It is important to note that the conventional method of
Figure 5-12. Experiment 4: Similar words. A) Synonym groups distribution (number of Tweets per count of synonym groups). B) Tags distribution (number of synonym groups per count of tags).
using a synonym dictionary like WordNet [59] fails here, as some of the words might not
be considered synonyms without context, or have no synonyms at all. For example,
one tagging community might consider football and rugby to be equivalent while another
community considers football and soccer to mean the same thing. An advantage
of using AMT for this is that turkers also do entity resolution, consolidating tags like
Samsung galaxy s3 and sgs3.
This HIT is structured as shown in Figure 5-11, wherein a Tweet is presented
with tags applied to it. These tags have been consolidated per Tweet from previous
experiments. The interface presents workers with 5 buckets where they can input tags.
Tags may be repeated in different buckets.
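Aggregating the returned buckets into synonym groups can be done by merging any two tags that some worker placed in the same bucket, for example with a union-find pass. This is an illustrative sketch, not the exact procedure used:

```python
def synonym_groups(bucketings):
    """bucketings: list of worker responses, each a list of buckets,
    each bucket a list of tags. Tags co-bucketed by any worker are
    merged into one group via union-find."""
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    def union(a, b):
        parent[find(a)] = find(b)

    for buckets in bucketings:
        for bucket in buckets:
            for tag in bucket:
                find(tag)  # register singleton tags too
            for a, b in zip(bucket, bucket[1:]):
                union(a, b)

    groups = {}
    for t in parent:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())

# One worker's response for the Tweet of Figure 5-11.
resp = [[["Bill Gates", "Gates"], ["Charity", "Welfare"], ["Medicine"]]]
print(sorted(len(g) for g in synonym_groups(resp)))  # group sizes
```

With several workers per Tweet, one could additionally require that a pair be co-bucketed by a majority before merging, to damp out a single careless response.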
5.2.4.1 Experimental Results
Figure 5-12 describes the results of this experiment. We see that there is an average
of 2.58 tag groups per Tweet, with on average 2.21 tags per group. When we paid 35
cents for this task, workers took 10 hours to complete it, with the average response time
being 14.5 minutes.
5.2.4.2 Analysis
As seen in the results of this experiment, the same tagging community may use
multiple words with similar meanings to tag Tweets. The fact that the community
is able to partition tags into buckets, averaging 2.58 buckets per Tweet, suggests
that when people apply more than 3 tags to a Tweet, they are using multiple synonyms
to tag the same Tweet. This could be brought down by ensuring that synonyms are not
automatically suggested by the system, as they currently are.
5.2.5 Summary of Results
We see that the performance of Palantir, as measured by the percentage of tags
people rated as 'most useful' and 'useful', rises dramatically, while the percentage
of tags rated as 'not useful' falls to about half of what it is when no data is loaded.
In addition, we note that there are only 217 unique tags when tagging using Palantir,
compared to 303 when unguided humans tag. While we cannot comment on the time taken
by the AMT community to apply these tags, we expect that tagging using Palantir would
be quicker than an unguided approach.
CHAPTER 6
CONCLUSION AND FUTURE WORK
6.0.6 Conclusion
In this thesis, we described Palantir, an architecture that uses collective human
intelligence in microblogging as a means to achieve coherent snapshots of real
world events. We studied the feasibility of recommending tags to an online microblog
community, and measured how useful the tags were for the community. We notice that
when humans are left to tag microblog posts without any guidance on the content of
tags, they select tags that are either very general or too specific, indicating
difficulty in choosing a tag with the right degree of selectivity. Picking a tag that
is highly specific to a Tweet is a setback to the tag's popularity, as it cannot be
applied to most other Tweets. On the other hand, there are cases where user-specific
tags have helped users retrieve Tweets of interest to them alone. Palantir assists
users by showing candidate tags that may be easily applied. For users new to the
system, this guidance could be important in getting them to engage directly and
immediately.
Palantir, even when inaccurate, passively increases the user’s knowledge of tags
existing in the system, contributing to serendipitous discoveries of tags and interests.
6.0.7 Future Work
Palantir was conceptualized with the goal of persuading people to participate
in content creation during the times when such content is most crucial. While the tag
recommender system allows people to join in on conversations around a particular
tag, and discover new tags to contribute to, Palantir can go much farther, by enabling
other scenarios summarized in the beginning of Chapter 4. We highlight some of these
possibilities below.
6.0.7.1 Content Syndication
People using Palantir are already plugged in to a stream of information generated
by Twitter, which has been filtered by topics that are of interest to them, and topics they
contribute to. We can foster citizen journalism by enabling Palantir users to form ad hoc
groups to report on a specific event. By constructing Palantir as a service, we could
create clients on a variety of devices. People using Palantir on different devices may
use it for different purposes. On a device like a desktop, with a big screen and a
specialized text input device, a user may comb through Tweets of interest to pick
ones to weave into a story. A mobile user may improve Palantir by applying tags
to Tweets and reporting on events. A symbiotic relationship may be established
between these groups of people, where users with mobile devices report from the site
of the event while those on more powerful machines channel these tidbits into
information that can be readily digested by outsiders.
6.0.7.2 Survey Creation
A way to make the previous approach more robust would be to ask more people
for information, as supplemental viewpoints may paint a holistic picture. Further, mobile
devices are great for content consumption, and people can use the same to read
through such articles and tag, comment on, and rate them. Because Palantir already
knows users' locations and the tags they have contributed to most, it might be
possible to target these questions to the subset of people who are likely to be most
interested in them. With the advent of push notifications, high-speed data, and the
computational power available in today's portable mobile devices, we believe we are
no longer shackled by hardware capabilities. However, we would need to develop
sophisticated ways of managing the reputation of users in the system, and provide
human stewardship to sustain and grow the community. Of significant importance is
the ability to filter out misinformation from Tweets. A reputation management system
that gives good reach to new people using the system constructively, while
preventing established users from twisting facts, would work well for this purpose.
It would be an interesting challenge to build a trusted system that is expressive
while not being overly constrained.
APPENDIX
WORD LISTS
Table A-1. Filter Terms
S.no Word
1 47
2 5 trillion
3 6 studies
4 abort
5 abortion
6 absurd
7 accurate
8 afghanistan
9 akin
10 alaska
11 ambassador
12 anderson
13 anti
14 apploause
15 approval
16 arafat
17 arithmetic
18 assessment
19 bain
20 benjamin
21 bibi
22 biden
23 big government
24 billionaires
25 bin laden
26 bipartisan
27 bird
28 bowles
29 brilliance
30 broad-minded
31 budget
32 buffet
33 bush
34 business
35 canada
36 candidates
37 candy
38 cbo
39 charlie
40 cheers
41 cheny
42 china
43 chinese
44 chouces
45 civil rights
46 class
47 college
48 colorado
49 commander
50 commission
51 companies
52 congressman
53 controversial issues
54 cooper
55 corporate
56 credibility
57 crippling
58 crist
59 criticial
60 crowley
61 daily
62 debate
63 democrat
64 denver
65 depression
66 detroit
67 differences
68 different
69 dishonest
70 dodge
71 domestic
72 donald
73 dream act
74 earth
75 economy
76 education
77 egypt
78 eisenhower
79 elk
80 energy
81 environmental policy
82 exxonmobil
83 fact check
84 federal deficit
85 finland
86 foreign policy
87 fraud
88 fundamentalist
89 gallup
90 gates
91 gay
92 gay marriage
93 giuliani
94 gop
95 governer
96 governing
97 government
98 graduate
99 green
100 growth
101 half
102 health care
103 health care reform
104 hempstead
105 hispanic
106 hisses
107 hofstra
108 homeland security
109 benghazi
110 incentives
111 inclusive
112 independent
113 intelligence
114 intolerant
115 iran
116 iraq
117 israel
118 israelis
119 jet
120 jim
121 jobs
122 kill
123 korans
124 kosher
125 language
126 latino
127 left
128 lehrer
129 liar
130 libya
131 linda
132 lying
133 malarkey
134 marine
135 martha
136 math
137 mcmahon
138 medicaid
139 medicare
140 michelle
141 mid east policy
142 middle eastern policy
143 military
144 mitt
145 morris
146 mubarak
147 mullahs
148 multi-cultural
149 netanyahu
150 newshour
151 nominee
152 nuclear
153 ny
154 obama
155 obamacare
156 obsolete
157 ohio
158 oil
159 opportunity
160 osama
161 oval
162 overseas
163 overwhelming
164 palestine
165 pbs
166 peace
167 peaceful
168 perception
169 philip morris
170 pickering
171 plutocrat
172 polluter
173 pollution
174 potus
175 president
176 principal
177 priority
178 pro choice
179 pro environment
180 progressive
181 queada
182 raddatz
183 rape
184 reagan
185 regressive
186 resilience
187 right-wing
188 roby
189 roe v. wade
190 roll
191 romney
192 romneycare
193 roughly
194 rudy
195 russia
196 russian
197 ryan
198 sanctions
199 satan
200 school
201 scotus
202 sensata
203 shipping
204 silent
205 simpson
206 six studies
207 slowest
208 small
209 small government
210 smoking
211 social security
212 social services
213 socialist
214 soup
215 souza
216 specifiecs
217 stewart
218 stockman
219 stunt
220 subjects
221 taces
222 taibbi
223 taliban
224 tax code
225 ticket
226 todd
227 tolerant
228 training
229 trickle-down
230 tripoli
231 trump
232 tsa
233 unbalanced
234 uninsured
235 veracity
236 vietnam
237 vp
238 waivers
239 war
240 warren
241 wealth
242 wealthy
243 weapon
244 women
245 work
246 yassir
247 york
REFERENCES
[1] “State of the news media,” 2012. [Online]. Available: http://www.stateofthemedia.org
[2] M. Garber, “Twitter, the conversation-enabler? Actually, most news orgs use the service as a glorified RSS feed,” 2011. [Online]. Available: goo.gl/tWrMs
[3] G. Chen, “Breaking-news situations require a breaking-news approach,” 2012. [Online]. Available: http://www.niemanlab.org/2012/01/gina-chen-breaking-news-situations-require-a-breaking-news-approach/
[4] S. Sen, S. K. Lam, A. M. Rashid, D. Cosley, D. Frankowski, J. Osterhouse, F. M.Harper, and J. Riedl, “tagging, communities, vocabulary, evolution,” in Proceedingsof the 2006 20th anniversary conference on Computer supported cooperative work,ser. CSCW ’06. New York, NY, USA: ACM, 2006, pp. 181–190. [Online]. Available:http://doi.acm.org/10.1145/1180875.1180904
[5] “Storify,” 2012. [Online]. Available: http://storify.com/
[6] “Storify: About us,” 2012. [Online]. Available: http://storify.com/about
[7] M. J. Tenore, “25 ways to use facebook, twitter storify to improve political coverage,”2011. [Online]. Available: http://www.poynter.org/how-tos/digital-strategies/151883/25-ways-to-use-facebook-twitter-storify-to-improve-election-coverage/
[8] E. Zak, “How journalists can use storify to cover any type of meeting,” 2012. [Online]. Available: http://www.mediabistro.com/10000words/how-to-use-storify-to-cover-a-meeting-workshop-or-event b9068
[9] “Appstore: Vibe,” 2012. [Online]. Available: http://itunes.apple.com/us/app/vibe/id433067417?mt=8
[10] J. Wortham, “Messaging app grows with wall street protests,”2011. [Online]. Available: http://bits.blogs.nytimes.com/2011/10/12/anonymous-messaging-app-vibe-gets-boost-from-occupy-wall-street/
[11] “Dataminr,” 2012. [Online]. Available: http://www.dataminr.com/
[12] T. C. Sottek, “Dataminr analyzes over 340 million tweets a day to track and predict global events,” 2012. [Online]. Available: http://www.theverge.com/2012/4/9/2936816/dataminr-twitter-data-predict-events
[13] “Wikinews,” 2012. [Online]. Available: http://en.wikinews.org
[14] “About cnn ireport,” 2012. [Online]. Available: http://ireport.cnn.com/about.jspa
[15] K. Ehrlich and N. Shami, “Microblogging inside and outside the workplace,” in Proceedings of the 4th International AAAI Conference on Weblogs and Social Media (ICWSM 2010), AAAI Publications, 2010. [Online]. Available: http://www.cs.cornell.edu/~sadats/icwsm2010.pdf
[16] A. Mathes, “Folksonomies - cooperative classification and communication throughshared metadata,” December 2004. [Online]. Available: http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html
[17] M. Guy and E. Tonkin, “Folksonomies: Tidying up tags?” D-Lib Magazine, vol. 12, no. 1, January 2006. [Online]. Available:http://www.dlib.org/dlib/january06/guy/01guy.html
[18] M. S. Bernstein, B. Suh, L. Hong, J. Chen, S. Kairam, and E. H. Chi, “Eddi :Interactive topic-based browsing of social status streams,” Fortune, pp. 303–312,2010. [Online]. Available: http://portal.acm.org/citation.cfm?id=1866077
[19] A. M. Kaplan and M. Haenlein, “The early bird catches the news: Nine things you should know about micro-blogging,” Business Horizons, vol. 54, no. 2, pp. 105–113, March 2011. [Online]. Available: http://ideas.repec.org/a/eee/bushor/v54yi2p105-113.html
[20] “Wikipedia: Microblogging,” 08 2012. [Online]. Available: http://en.wikipedia.org/wiki/Microblogging
[21] “Twitter blog: One hundred million voices,” 2012. [Online]. Available:http://blog.twitter.com/2011/09/one-hundred-million-voices.html
[22] “Twitter to surpass 500 million users,” 2012. [Online]. Available:http://www.mediabistro.com/alltwitter/500-million-registered-users b18842
[23] H. Kwak, C. Lee, H. Park, and S. Moon, “What is twitter, a social network or a newsmedia?” in Proceedings of the 19th international conference on World wide web,ser. WWW ’10. New York, NY, USA: ACM, 2010, pp. 591–600. [Online]. Available:http://doi.acm.org/10.1145/1772690.1772751
[24] Z. Papacharissi and M. de Fatima Oliveira, “Affective news andnetworked publics: The rhythms of news storytelling on egypt,” Journalof Communication, vol. 62, no. 2, pp. 266–282, 2012. [Online]. Available:http://dx.doi.org/10.1111/j.1460-2466.2012.01630.x
[25] L. Grossman, “Iran protests: Twitter, the medium of the movement,” Time Magazine,vol. 17, 2009.
[26] C. Beaumont, “New york plane crash: Twitter breaks the news, again,”2009. [Online]. Available: http://www.telegraph.co.uk/technology/twitter/4269765/New-York-plane-crash-Twitter-breaks-the-news-again.html
[27] J. Wortham, “Michael jackson tops the charts on twitter,”2009. [Online]. Available: http://bits.blogs.nytimes.com/2009/06/25/michael-jackson-tops-the-charts-on-twitter/
[28] C. Beaumont, “Mumbai attacks: Twitter and flickr used to break news,” 2008.[Online]. Available: http://www.telegraph.co.uk/news/worldnews/asia/india/3530640/Mumbai-attacks-Twitter-and-Flickr-used-to-break-news-Bombay-India.html
[29] J. O’Dell, “One twitter user reports live from osama bin laden raid,” 2011. [Online].Available: http://mashable.com/2011/05/02/live-tweet-bin-laden-raid/
[30] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake shakes twitter users:real-time event detection by social sensors,” in Proceedings of the 19th internationalconference on World wide web, ser. WWW ’10. New York, NY, USA: ACM, 2010,pp. 851–860. [Online]. Available: http://doi.acm.org/10.1145/1772690.1772777
[31] D. Boyd, S. Golder, and G. Lotan, “Tweet, tweet, retweet: Conversational aspects ofretweeting on twitter,” in System Sciences (HICSS), 2010 43rd Hawaii InternationalConference on, jan. 2010, pp. 1 –10.
[32] A. Java, X. Song, T. Finin, and B. Tseng, “Why we twitter: understandingmicroblogging usage and communities,” in Proceedings of the 9th WebKDD and1st SNA-KDD 2007 workshop on Web mining and social network analysis, ser.WebKDD/SNA-KDD ’07. New York, NY, USA: ACM, 2007, pp. 56–65. [Online].Available: http://doi.acm.org/10.1145/1348549.1348556
[33] P. Andre, M. S. Bernstein, and K. Luther, “Who gives a tweet? Evaluating microblog content value,” in Proceedings of CSCW 2012, Feb. 2012. [Online]. Available: http://www.cs.cmu.edu/~pandre/pubs/whogivesatweet-cscw2012.pdf
[34] D. Boyd, “Twitter: ”pointless babble” or peripheral awareness?” 2009. [Online].Available: http://www.zephoria.org/thoughts/archives/2009/08/16/twitter pointle.html
[35] A. Watters, “How recent changes to twitter’s terms of service might hurt academicresearch,” 2011.
[36] R. Higashinaka, N. Kawamae, K. Sadamitsu, Y. Minami, T. Meguro, K. Dohsaka, andH. Inagaki, “Building a conversational model from two-tweets,” in Automatic SpeechRecognition and Understanding (ASRU), 2011 IEEE Workshop on, dec. 2011, pp.330 –335.
[37] S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts, “Who says what to whom ontwitter,” in Proceedings of the 20th international conference on World wide web, ser.WWW ’11. New York, NY, USA: ACM, 2011, pp. 705–714. [Online]. Available:http://doi.acm.org/10.1145/1963405.1963504
[38] C. Arthur, “What is the 1% rule?” 2006. [Online]. Available: http://www.guardian.co.uk/technology/2006/jul/20/guardianweeklytechnologysection2
[39] M. P. Bill Heil, “New twitter research: Men follow men and nobody tweets,” 2009.[Online]. Available: http://blogs.hbr.org/cs/2009/06/new twitter research men follo.html
[40] M. Newman, “Power laws, pareto distributions and zipf’s law,” ContemporaryPhysics, vol. 46, no. 5, pp. 323–351, 2005. [Online]. Available:http://www.tandfonline.com/doi/abs/10.1080/00107510500052444
[41] A. Maslow, “A theory of human motivation,” Psychological Review, vol. 50,pp. 370–396, 1943. [Online]. Available: http://psychclassics.yorku.ca/Maslow/motivation.htm
[42] S. Sen, J. Vig, and J. Riedl, “Tagommenders: connecting users to items throughtags,” in Proceedings of the 18th international conference on World wide web, ser.WWW ’09. New York, NY, USA: ACM, 2009, pp. 671–680. [Online]. Available:http://doi.acm.org/10.1145/1526709.1526800
[43] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation.” Journalof Machine Learning Research, vol. 3, pp. 993–1022, 2003. [Online]. Available:http://dblp.uni-trier.de/db/journals/jmlr/jmlr3.html#BleiNJ03
[44] T. Hofmann, “Probabilistic latent semantic analysis.” in UAI, K. B. Laskey andH. Prade, Eds. Morgan Kaufmann, 1999, pp. 289–296. [Online]. Available:http://dblp.uni-trier.de/db/conf/uai/uai1999.html#Hofmann99
[45] K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, “Feature-rich part-of-speechtagging with a cyclic dependency network,” in NAACL ’03: Proceedings of the 2003Conference of the North American Chapter of the Association for ComputationalLinguistics on Human Language Technology. Morristown, NJ, USA: Associationfor Computational Linguistics, 2003, pp. 173–180. [Online]. Available:http://portal.acm.org/citation.cfm?id=1073445.1073478
[46] Defense Mapping Agency, “The universal grids: Universal Transverse Mercator(UTM) and Universal Polar Stereographic (UPS),” Defense Mapping Agency,Hydrographic/Topographic Center, Fairfax, VA, USA, Tech. Rep. TM8358.2, 1989.[Online]. Available: http://earth-info.nga.mil/GandG/publications/
[47] J. Howe, Crowdsourcing: Why the Power of the Crowd Is Driving the Future ofBusiness, 1st ed. Crown Business, August 2008. [Online]. Available:http://www.worldcat.org/isbn/0307396207
[48] “Wikipedia: Amazon mechanical turk,” 2012. [Online]. Available: http://en.wikipedia.org/wiki/Amazon Mechanical Turk
[49] “mturk forum: Turker nation,” 2012. [Online]. Available: www.turkernation.com
[50] “Turkopticon,” 2012. [Online]. Available: http://turkopticon.differenceengines.com/
[51] L. von Ahn and L. Dabbish, “Labeling images with a computer game,” inProceedings of the SIGCHI conference on Human factors in computing systems,ser. CHI ’04. New York, NY, USA: ACM, 2004, pp. 319–326. [Online]. Available:http://doi.acm.org/10.1145/985692.985733
[52] M. Denton, “Crowdsourcing the production of public art,” Master’s thesis, Massey,2010. [Online]. Available: http://mro.massey.ac.nz/handle/10179/1345
[53] W. Mason and S. Suri, “Conducting behavioral research on amazon’s mechanicalturk,” Behavior Research Methods, vol. 44, pp. 1–23, 2012. [Online]. Available:http://dx.doi.org/10.3758/s13428-011-0124-6
[54] S. Suri and D. J. Watts, “Cooperation and contagion in web-based, networked publicgoods experiments,” SIGecom Exch., vol. 10, no. 2, pp. 3–8, Jun. 2011. [Online].Available: http://doi.acm.org/10.1145/1998549.1998550
[55] M. S. Bernstein, G. Little, R. C. Miller, B. Hartmann, M. S. Ackerman, D. R. Karger,D. Crowell, and K. Panovich, “Soylent: a word processor with a crowd inside,”in Proceedings of the 23nd annual ACM symposium on User interface software andtechnology, ser. UIST ’10. New York, NY, USA: ACM, 2010, pp. 313–322. [Online].Available: http://doi.acm.org/10.1145/1866029.1866078
[56] G. Little, “Turkit: Tools for iterative tasks on mechanical turk,” in Visual Languagesand Human-Centric Computing, 2009. VL/HCC 2009. IEEE Symposium on, sept.2009, pp. 252 –253.
[57] M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin, “Crowddb:answering queries with crowdsourcing,” in Proceedings of the 2011 internationalconference on Management of data. New York, NY, USA: ACM, 2011, pp. 61–72.[Online]. Available: http://doi.acm.org/10.1145/1989323.1989331
[58] J. P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R. C. Miller,A. Tatarowicz, B. White, S. White, and T. Yeh, “Vizwiz: nearlyreal-time answers to visual questions.” in W4A, C. Asakawa, H. Takagi,L. Ferres, and C. C. Shelly, Eds. ACM, 2010, p. 24. [Online]. Available:http://dblp.uni-trier.de/db/conf/w4a/w4a2010.html#BighamJJLMMTWWY10
[59] “Wordnet,” 2012. [Online]. Available: http://wordnet.princeton.edu/
BIOGRAPHICAL SKETCH
Prithvi Raj was born in Chennai, India. He attended Crescent Engineering College,
Chennai and graduated with a bachelor’s degree in computer science and engineering
from Anna University, Chennai in 2010.
He joined the Department of Computer and Information Science and Engineering
at the University of Florida in Fall 2010. His interests include crowd computing, human
computer interaction, and information visualization.