PALANTIR: CROWDSOURCED NEWSIFICATION USING TWITTER
By
PRITHVI RAJ VENKAT RAJ
A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
UNIVERSITY OF FLORIDA
2012
© 2012 Prithvi Raj Venkat Raj
To my family
ACKNOWLEDGMENTS
I would like to convey my sincere gratitude to my advisor, Dr. Helal, for his excellent
counsel, support, and encouragement in pursuing research in this exciting field.
I would also like to thank Dr. Thai and Dr. Xia for serving on my supervisory
committee.
At the same time, I wish to thank the members of the online forum Turker Nation,
who provided valuable feedback on some aspects of my evaluation methodology,
and the workers on Amazon Mechanical Turk, without whose labor I could not have
obtained actual human-generated results.
I would especially like to thank my family for their persistent encouragement and
belief in me.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 OVERVIEW
   1.1 Introduction
      1.1.1 Motivation and Current Problems
      1.1.2 Thesis Objective
      1.1.3 Thesis Organization
2 RELATED WORK
   2.1 Commercial Products on Twitter
      2.1.1 Storify
      2.1.2 Vibe
      2.1.3 Dataminr
   2.2 Commercial Products Supporting Citizen Journalism
      2.2.1 Wikinews
      2.2.2 CNN iReport
   2.3 Overview of Related Academic Research
   2.4 Microblogging
      2.4.1 Overview
      2.4.2 Providers
      2.4.3 Twitter
         2.4.3.1 Overview
         2.4.3.2 Research on Twitter
3 OVERALL APPROACH
   3.1 A Brief Incursion
   3.2 An Outline of Palantir
   3.3 Challenges
4 ARCHITECTURE
   4.1 Palantir Architecture
   4.2 Tweet Tagging and Annotation Services
   4.3 Tags in Palantir
      4.3.1 How Users Tag Tweets
      4.3.2 Tag Recommender
         4.3.2.1 Text Transformation
         4.3.2.2 Geographic Location
         4.3.2.3 Tweetopic
         4.3.2.4 Tag Co-occurrence and Prefix Matching
         4.3.2.5 Tag Ranking
   4.4 User Interface
   4.5 How Palantir Uses Tags
      4.5.1 Create a User Interest Profile
      4.5.2 Searching, Topic Following
      4.5.3 Tag Consolidation
5 EXPERIMENTATION AND EVALUATION
   5.1 Crowdsourcing
      5.1.1 Amazon Mechanical Turk (AMT)
         5.1.1.1 Basic Terminology
         5.1.1.2 Related Work
   5.2 Experiments
      5.2.1 Experiment 1: Palantir Baseline
         5.2.1.1 Experimental Data
         5.2.1.2 Analysis
      5.2.2 Experiment 2: Unguided Human Baseline
         5.2.2.1 Experimental Results
         5.2.2.2 Analysis
      5.2.3 Experiment 3: Heated Palantir
         5.2.3.1 Experimental Results
         5.2.3.2 Analysis
      5.2.4 Experiment 4: AMT Synonym Detection
         5.2.4.1 Experimental Results
         5.2.4.2 Analysis
      5.2.5 Summary of Results
6 CONCLUSION AND FUTURE WORK
      6.0.6 Conclusion
      6.0.7 Future Work
         6.0.7.1 Content Syndication
         6.0.7.2 Survey Creation

APPENDIX: WORDLISTS
REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

A-1 Filter Terms
LIST OF FIGURES

2-1 Storify
4-1 Palantir Usage Patterns
4-2 Palantir Architecture
4-3 Palantir Tag Recommender
4-4 Tweetopic Algorithm
4-5 Tweet Entry Screen
4-6 Tag Suggestion Screen
5-1 Experiment 1: Palantir baseline
5-2 Experiment 1: Results
5-3 Experiment 1: Results
5-4 Experiment 2: Tweets tagged using only AMT
5-5 Experiment 2: Results
5-6 Experiment 2: Results
5-7 Experiment 3: Tweet tags recommended by Palantir and validated by AMT
5-8 Experiment 3: Results
5-9 Experiment 3: Results
5-10 Experiment 3: Histograms showing variation in usefulness
5-11 Experiment 4: Synonym detection on AMT
5-12 Experiment 4: Similar Words
Abstract of Thesis Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Master of Science
PALANTIR: CROWDSOURCED NEWSIFICATION USING TWITTER
By
Prithvi Raj Venkat Raj
December 2012
Chair: Abdelsalam (Sumi) Helal
Major: Computer Engineering
People today generate and consume several exabytes of content. A significant
portion of this is on social media and microblogging sites like Twitter. The popularity of
these services encourages people to share real world developments and experiences
online. The value of such sharing is very apparent during times of duress, when people
take to posting important news on social media. While this enlightens the outside world,
it often lacks the coherence and clarity that could make a bigger impact. In addition, there is no
obvious way of communicating back to these people to blend, assimilate, and stimulate
the flow of information.
In this work, we develop and evaluate a collaboration system, Palantir, which is
designed for people who glean information from Twitter during an event and consolidate
that information into stories, allowing them to capture a snapshot of how things were
at the time of the event.

Palantir is designed so that people can easily annotate Tweets from mobile clients,
track Tweets using a web client, and finally consolidate these Tweets into stories
that can later be published.
CHAPTER 1
OVERVIEW
1.1 Introduction
1.1.1 Motivation and Current Problems
People have been consuming news in newspapers since the 16th century. In
2011, 46% of Americans turned to the Internet for news at least three times a week,
compared with the 40% who got their news from newspapers [1]. The study also
finds that 84% of Americans own mobile devices, 47% of whom consume news on
these devices. While these numbers show that many people today are moving toward
digital news sources, they do not account for news that finds people through social
networks. News of almost every major earthquake in the last three years broke on
Twitter before catching the attention of mainstream media. An article [2] by Megan
Garber of the Nieman Journalism Lab comments that most mainstream news organizations
on the social media bandwagon use Twitter as a glorified RSS feed. A PEJ
study shows that less than 2% of Tweets by 13 of America's popular news organizations
used Twitter as a conversational medium to gather information from people. Oftentimes,
transient events of local importance are Tweeted promptly, long before mainstream
media is aware. In her article [3], Gina Chen highlights how people supported each
other by publishing alternative routes that could be taken to avoid a multi-car pile-up.
In cases like these, there is much value in getting realtime information that is
immediately useful. Twitter also played a pivotal role in the Iran election protests of 2009,
the Egyptian revolution of 2011, and, currently, the Occupy movement. However useful,
Twitter is a transient medium. As Tweets age, they slip out of context, and what was once
important rapidly vanishes. We believe that Tweets, a broadly adopted medium, can
be leveraged to provide a far more powerful experience by sparking news tidbits and
circulating timely news. We imagine that contextual snapshots of Tweets would have the
potency to stay relevant long after an event has elapsed.
1.1.2 Thesis Objective
We describe a collaborative tool, Palantir, designed to help people glean information
from Twitter during an event and consolidate Tweets into stories as events unfold.
The fundamental idea of Palantir is to tap into the social network
effect of mobile Twitter users who are intimately following the development of a topic. Such
users, given the appropriate tools, can naturally channel their passion and energy into
writing a more organized view of lower-level tweets. Palantir makes organizing tweets by
tagging simpler by providing tag recommendations influenced by both content
and geospatial context. These tags are exploited to create profiles for individual users
of the system, who form a crowd that can be categorized and contacted for information.
Palantir allows for querying relevant portions of this crowd and aggregating their results.
These aspects make Palantir a valuable tool for realtime remote
reporting. By following topics coded by tags, filling in missing data, and publishing
consumable story lines, people can productively create content based on realtime
information from Tweets and share it with others. Fundamental to the working
of Palantir is the concept of recommending tags, which are text annotations applied to
a microblogging post. While there are studies about tagging content, none focus on
humans tagging short texts to make them more discoverable for others. We test whether
Sen et al.'s findings [4] can be applied to the domain of microblogs. In this thesis, we
concern ourselves with the following questions:
• Do tag recommendations affect the cognitive load of people applying tags?
• Do people find value in applying tags to microblog posts?
• Is there a faster convergence of tag vocabulary when tags are recommended?
• Do the quality and quantity of tags improve when tags are recommended?
1.1.3 Thesis Organization
This thesis is organized into six chapters. Chapter 2 provides insight into related
work in industry and details the academic research covered in
this thesis. Chapter 3 presents the overall approach taken by Palantir, while Chapter
4 provides an in-depth discussion of the Palantir architecture, design considerations, and
implementation details. Chapter 5 covers validation and evaluation of results using a
crowdsourcing-based approach. Chapter 6 concludes with thoughts about future work.
CHAPTER 2
RELATED WORK
2.1 Commercial Products on Twitter
2.1.1 Storify
Storify [5] is a social media curation service that launched in late 2011 [6]. Users
of Storify search social networks and select individual elements to build into stories. Figure 2-1
shows a typical article written with the service. Storify allows users to import social
media elements like Tweets and images into their story, and to supply their
own textual content to maintain its flow. Storify has been used to provide
political coverage [7] and even to document meetings and workshops [8]. However, Storify
does not make finding these social media elements easier, nor does it provide a way to
collaborate on stories.
2.1.2 Vibe
Vibe [9] is a messaging application that allows users to post anonymous short
messages that are pinned to a certain geographical region and have an expiry
time after which they are deleted. Any user with the Vibe application can view messages
within a radius of the user's current location; the current version of the
application allows users to set a large radius (12,000 miles). Vibe differs from Twitter
and other services in that it does not require users to sign up. While the application
isn't popular with everyday Twitter users, it was beneficial to people participating in the
Wall Street protests, providing them with an electronic channel of communication with
the anonymous crowd around them [10]. It is of interest that people are making
use of applications that pin posts to specific locations; we feel that in a social
network like Twitter, messages, and by extension tags, should have geographical affinity.
2.1.3 Dataminr
Figure 2-1. Storify

Dataminr [11] is a real-time social media analytics engine that listens to every public
post on Twitter to mathematically determine events and micro trends. Dataminr claimed
to have informed clients about Osama Bin Laden's death before it was reported by news
media outlets [12].
2.2 Commercial Products Supporting Citizen Journalism
2.2.1 Wikinews
Wikinews [13] is a collaborative journalism platform established in November
2004. An interesting facet of Wikinews is that it allows original work in addition to work
that has been sourced. Wikinews is valuable in covering news of large events affecting a
large population of people, who can report on it from different viewpoints.
The major drawback of Wikinews is the perceived inability to preserve a neutral
point of view. Some of the more complex issues lie at the heart of what news is:
delivering information on a timely basis, and providing that information with a captivating
narrative. Andrew Lih, a noted authority on Wikipedia, feels that it is difficult to get two
or more people to write in the same style, as evidenced by the failed project 'A Million
Penguins' by Penguin Publishing.
Wikinews has a model similar to Wikipedia, where submitted articles are reviewed
by trusted users. Where it differs is that the purpose of news is to capture a snapshot
in time. If a new development takes place, a news article makes a reference to the
previous article, recaps the story, and then takes it forward. This is in strong contrast
with how Wikipedia works, where such a change would have caused someone to edit
an existing article. Another key difference is that Wikinews sticks to a formulaic style,
the strict inverted pyramid. An article on Wikinews might start like "On 14
November, event x happened at location y; z people were affected". Articles give an
overall view before drilling into details, and most articles on Wikinews strictly follow such
a style.
A problem that plagues community-reviewed sites is the quest for perfectionism.
This leads to instruction creep, which, in the case of Wikinews, gradually caused the output
of news articles to fall from 6-8 per day to the same number per week. As an
artifact of maintaining a snapshot of history in time, Wikinews imposes a time limit
within which an article meeting its standards must be written. Many authors are unable
to get timely help meeting these requirements, and thus might not have their articles
published on the main site.
Palantir, on the other hand, is designed for people who are tuned into social media
like Twitter and need to quickly catalyze information flow during an ongoing
event. While the Wikinews model works well for reporting after an event, we feel that
Palantir is better suited to live reporting.
2.2.2 CNN iReport
iReport [14] is a tool that citizen journalists can use to submit their news articles
to CNN. iReport is similar to Wikinews in that it allows people to submit photos, videos,
or articles, but it does not allow for collaboration as Wikinews does. When a story is
posted to iReport, it immediately appears on the CNN iReport site. CNN has a staff of
reporters who comb through these stories for ones that are interesting and can be used
on their main site. These selected stories are then verified and used; verified stories
are badged "Vetted by CNN". People submit stories to CNN because there is a
possibility that their story might appear on the main site, and in 2011, CNN recognized
good contributions to iReport by holding the first iReport Awards, further fostering
community effort.
2.3 Overview of Related Academic Research
We review work done on microblogging, tagging systems, topic detection algorithms,
temporal exploration interfaces, and crowdsourcing marketplaces. Twitter, a microblogging
service, is growing at a rapid pace and is spurring research. Ehrlich and Shami [15]
found microblogs being used as real-time information sources, but highlighted concerns
about the volume of data, noise, and relevancy. We draw on research on human-guided
unstructured tagging systems, folksonomies, by Adam Mathes [16] and by Marieke Guy
and Emma Tonkin [17]. The advantage folksonomies have over formal classification
systems is that the fixed terms used in formal tagging systems may be imprecise. Adam
Mathes' paper [16] contains a good discussion of how tags are distributed, and allays
fears of single-use tags dominating others. Sen et al. [4] provide a good treatment
of how recommending tags to people affects the tags they choose.
Palantir's tag recommendation system is based on the TweeTopic algorithm
described by Bernstein et al. [18]. This algorithm makes use of an online
search engine to assign topics to Tweets or other short pieces of text.
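The search-engine-based idea can be sketched roughly as follows. This is a simplified illustration, not the actual TweeTopic algorithm: the real system queries a live search engine, whereas the snippets below are hypothetical stand-ins for its results.

```python
from collections import Counter
import re

# A tiny stopword list for the sketch; real systems use larger resources.
STOPWORDS = {"the", "a", "in", "of", "on", "to", "is", "at", "and", "for"}

def topic_terms(snippets, top_n=3):
    """Rank candidate topic terms from search-result snippets returned
    for a tweet's text: frequent non-stopword terms become topics."""
    words = []
    for s in snippets:
        words += [w.lower() for w in re.findall(r"[A-Za-z]+", s)]
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(top_n)]

# Hypothetical snippets a search engine might return for a quake-related tweet.
snippets = [
    "Magnitude 5.8 earthquake shakes Virginia, felt in Washington",
    "Virginia earthquake: tremors reported across the East Coast",
    "USGS confirms earthquake centered near Mineral, Virginia",
]
```

Here `topic_terms(snippets, top_n=2)` surfaces "earthquake" and "virginia" as topic candidates, even though the original tweet may never contain either word explicitly.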
In Section 2.4, we provide a discussion on microblogging, with a focus on Twitter.
Crowdsourcing systems are covered in Subsubsection 5.1.1.2.
2.4 Microblogging
2.4.1 Overview
Microblogging is an emerging form of broadcast communication that has gained
popularity over the past few years. Microblogging services allow users to post
short content online [19, 20]. The majority of this content is text, but users also link to other
types of content like audio, video, images, or other web resources.
2.4.2 Providers
Microblogging services are provided by many organizations, including Twitter, Plurk,
Yammer, Jaiku, Pownce, Social.Net, and App.Net. In this work, we concentrate
on Twitter, which was among the first such services to launch (May 2006), had 100
million active users in September 2011 [21], and is believed to have more than 500
million registered users as of April 2012 [22].
2.4.3 Twitter
2.4.3.1 Overview
While Twitter is a microblogging site, it has some social networking semantics.
Twitter supports and encourages the actions below.
Following The act of a user subscribing to the updates of another user
Followers The subscribers for a specific user
@reply A user can use @username to mention other users in a post. Others can view
this if the account of the poster is public
Direct Message (DM) A private message that can be sent to a follower
The social network model of Twitter differs from that of many other social networks.
Specifically, in Twitter, relations between users are directed: a user can follow another
without requiring the other user to follow him/her back. It has been noted that only 22%
of follows are mutual [23]. The Twitter feed of a given user contains Tweets (messages
posted on Twitter) from all users the current user follows, arranged in reverse
chronological order.
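The directed-follow model and the reverse-chronological feed can be sketched in a few lines. The graph and tweets below are made-up examples for illustration, not Twitter data or Twitter's actual implementation.

```python
# Directed follow edges: FOLLOWS[a] is the set of accounts that user `a` follows.
FOLLOWS = {
    "alice": {"bob", "carol"},
    "bob":   {"alice"},
    "carol": {"dave"},
    "dave":  set(),
}

def mutual_fraction(follows):
    """Fraction of follow edges that are reciprocated
    (Kwak et al. report ~22% for the real network)."""
    edges = [(a, b) for a, fs in follows.items() for b in fs]
    mutual = sum(1 for a, b in edges if a in follows.get(b, set()))
    return mutual / len(edges)

def timeline(user, tweets, follows):
    """A user's feed: tweets by followees only, newest first."""
    feed = [t for t in tweets if t["author"] in follows[user]]
    return sorted(feed, key=lambda t: t["time"], reverse=True)

tweets = [
    {"author": "bob",   "time": 1, "text": "first"},
    {"author": "carol", "time": 3, "text": "third"},
    {"author": "bob",   "time": 2, "text": "second"},
]
```

In this toy graph, only the alice/bob pair is reciprocated, and alice's timeline contains the followees' tweets in reverse chronological order.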
Twitter allows short 140-character messages to be posted through its service, which
can be seen immediately by other people on the service. This makes Twitter
a realtime stream of information, a property that has been exploited to
spread news. Twitter has been the preferred medium of communication in the Arab
Spring [24], the Iran elections [25], and the Occupy Wall Street movement. In certain events,
like US Airways Flight 1549 crashing into the Hudson River [26], the death of pop idol
Michael Jackson [27], the terrorist attacks in Mumbai [28], the death of terrorist Osama
Bin Laden [29], and every earthquake in the past three years [30], Twitter provided
news quicker than other news media. As such, Twitter acts as a source of information,
allowing users to discover new and interesting content on the Internet.
Although Twitter is a good source of information, there isn't an easy way to organize
Tweets, or to retrieve Tweets corresponding to a topic of interest. Twitter introduced
hashtags, word tags in the body of the Tweet with the syntax #word (e.g.,
#OWS for Occupy Wall Street), but they are inline with the Tweet, usually placed at the end,
and eat into the 140-character limit, forcing people to use tags that are already short
and limiting the number of tags used per Tweet. In a study using a sampling of Tweets
from 2009, researchers from Microsoft [31] found that only 5% of Tweets contain a
hashtag. In addition, not all Tweets carry hashtags pertinent to their content.
Currently, there isn't a way to retrieve all the Tweets by a specific user on a
specific topic.
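The mechanics of inline hashtags are easy to illustrate. The tweet text below is a hypothetical example, and the extraction pattern is a simplification of Twitter's actual tokenization rules.

```python
import re

TWEET_LIMIT = 140  # Twitter's per-message character limit (as of 2012)

def hashtags(text):
    """Extract inline #word tags from a tweet body."""
    return re.findall(r"#(\w+)", text)

def chars_left(text):
    """Hashtags are part of the body, so they consume the 140-character budget."""
    return TWEET_LIMIT - len(text)

tweet = "Protesters moving up Broadway, police rerouting traffic #OWS #nyc"
```

For this 65-character example, the two tags leave 75 characters for everything else, which is why users favor short tags and apply few of them per Tweet.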
2.4.3.2 Research on Twitter
Kwak et al. [23] crawl the Twitter network as of 2009, study its network structure,
determine influential users and information diffusion, and conclude that Twitter takes after
an information-sharing network rather than a social network. Of particular interest to
us is their quantitative comparison between CNN Headline News and trending topics on
Twitter: though CNN Headlines had better coverage, news of a live
broadcasting nature broke first on Twitter. The authors of [32] study the intents of Twitter users, which
include daily chatter, conversations, and sharing information/news. Andre et al. [33] analysed
the contents of over 43,000 Tweets on Twitter to find that 36% of the rated tweets are
worth reading, 25% are not, and 39% are middling. Though this is just a sampling of
Tweets, it shows that there is significant room for improvement, which can be achieved
by presenting users with Tweets whose content they value. In fact, the authors
argue that taking a social intervention approach, informing users about content value,
audience reaction, and emerging norms while leaving the user in control, has the potential to
improve the microblogging experience. While this seems to leave the human side of
the equation out, i.e., one can't speak to people solely about things the listener is interested
in, we feel that such an approach might be needed for media like Twitter, where some
researchers [34] are of the opinion that about 40% of Tweets might be pointless babble.
A contributing factor to the large body of research done in the past few years
was the openness of the Twitter platform. Twitter supported academia by allowing
unrestricted access to Tweets and follower/followee information. Recently (September
2012), Twitter has changed its business model to showing advertisements in the Tweet
stream, and no longer allows the level of access it did before.
This article [35] describes how these changes affect research on Twitter.
Higashinaka et al. [36] show that the majority of conversations on Twitter are
composed of just two tweets, and that this is sufficient to model conversation. Researchers
from Yahoo [37] find that Twitter is a very homophilous network, where information
diffusion occurs primarily in the community the information originated in.
A line of research on Twitter focuses on finding the most influential people on the
network.
CHAPTER 3
OVERALL APPROACH
3.1 A Brief Incursion
Online communities are larger today than at any point in history. Services such
as Twitter, Jaiku, Facebook, Tumblr, Reddit, etc., have fostered communities passionate
about diverse topics. However, every online community suffers from the 1% rule [38],
which states that only 1% of the user base actually creates content, 9% edit content,
and the remaining 90% of a virtual community only consume content. Bill and Mikolaj
[39] randomly sampled the Tweets of 300,000 users in 2009 to find that 10% of prolific
users create 90% of the Tweets. While these empirical results concerning participation
inequality on the Internet seem extreme, we are familiar with the Pareto principle, or the
law of the vital few, which states that, for many events, roughly 80% of the effects come
from 20% of the causes [40].
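This kind of participation skew is straightforward to quantify. The sketch below uses made-up per-user post counts, not the datasets from the cited studies.

```python
def share_of_content(post_counts, top_fraction=0.10):
    """Fraction of all posts produced by the most active `top_fraction` of users."""
    ranked = sorted(post_counts, reverse=True)       # most active users first
    k = max(1, int(len(ranked) * top_fraction))      # size of the "vital few"
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical sample: one prolific user, nine occasional or silent ones.
counts = [90, 3, 2, 1, 1, 1, 1, 1, 0, 0]
```

With these hypothetical counts, the top 10% of users (a single user here) account for 90% of the content, mirroring the 10%/90% figure reported in [39].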
Another influence on the design of Palantir is the existence and rise of citizen
journalism. Tools like CNN iReport, Fox uReport, etc., allow citizen journalists to write
their own articles. In the case of CNN iReport, these articles have their own separate
section, easily accessible from the main navigation bar on CNN's website, where users
can read through all such contributed articles. It is important to note that iReport allows
anonymous articles and posts all articles submitted by every user to its website,
some of which are subsequently verified by CNN reporters, with the website amended
to include this information. While CNN iReport has guidelines that advise people
to post articles that are known to be true, Fox uReport provides no such guidance and
has no concept of verifying postings; it does, however, include a section titled editor's
picks. A common shortcoming among these platforms is the inability to mobilize people into
creating the information one needs for a report in a timely manner, which we feel is a
significant drawback, and one that could have the most impact if solved.
With Palantir, our goal is to stimulate people online to create timely content that
could be useful, reducing the skew between content producers and consumers.
The basis of this idea is to find sufficiently interested people who are motivated to
create content relevant to the issues at hand, and to catalyze them into action. Palantir is
designed as the evolution of citizen journalism and reporting from a mostly one-sided
affair into a dialog.
3.2 An Outline of Palantir
To enable Palantir, we draw on work by Maslow [41] and tap into humans' natural
need for esteem and recognition. Palantir is a collaboration platform where users
post about topics they are interested in, while explicitly mentioning the topics their
post covers. The selection of topic(s) for the post is guided by Palantir to ensure
that similar existing tags are considered first by the user, promoting reuse and
borrowing over reinvention of tags. Users are also allowed to apply tags to posts that
aren't their own.
The tags a user applies determine the topics that user is interested in posting
about. We use this information to form communities, which can then be used to create a
flow of information on a particular topic: people ask questions, which are then posted
to the relevant interested communities.
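One minimal way to sketch this tag-based profiling and routing idea is shown below. This is a hypothetical data model for illustration, not Palantir's actual implementation.

```python
from collections import Counter

def build_profiles(tag_events):
    """Build interest profiles from (user, tag) pairs: each profile is the
    multiset of tags that user has applied in the past."""
    profiles = {}
    for user, tag in tag_events:
        profiles.setdefault(user, Counter())[tag] += 1
    return profiles

def route_question(tag, profiles, k=2):
    """Route a question to the k users most interested in `tag`."""
    ranked = sorted(profiles, key=lambda u: profiles[u][tag], reverse=True)
    return [u for u in ranked if profiles[u][tag] > 0][:k]

# Hypothetical tagging history.
events = [("ann", "ows"), ("ann", "ows"), ("ben", "ows"),
          ("ben", "nfl"), ("cat", "nfl")]
```

A question tagged "ows" would be routed to ann and ben, the users whose profiles show interest in that topic, while cat is left out.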
3.3 Challenges
At the heart of Palantir are the methods used to recommend topics to users. Sen
et al. [4, 42] show that tag recommendations significantly influence the tags that users
choose. As Palantir runs on Twitter, we face the following complexities:
• Modest character limit for posts. Twitter posts (Tweets) are limited to 140 characters, and thus have only about 15 words [39]. Traditional topic modelling algorithms have been designed to work well with large text corpora. An algorithm like Latent Dirichlet Allocation (LDA) [43] requires as input a text corpus and the number of topics to mine. Algorithms like LDA rely on inferring word distributions in a document to split it into topics. When documents contain a small number of words, the resulting performance is poor. In addition to this, the number of topics is assumed to be known a priori.
• Variances in language. Twitter users use creative shortened words to convey what they mean within the 140-character limit. Many of these words may be nouns not having a canonical form, thus creating problems where, for the same intent, different words are used.
• Twitter API. While Twitter was conducive to developers when it started (circa 2006), it did not have a clear monetization strategy, relying on investor money for its expenses while spending time and resources to build a great product. Going forward, however, Twitter feels that its biggest assets are the data it has on the system and the eyeballs of the people using Twitter. As of 2012, Twitter monetizes this data stream by providing bulk raw access to Tweets to companies at a price that isn't affordable for most individual developers. Twitter severely restricts API use and has added new terms of service which prevent sharing collections of Tweets. This year, 2012, Twitter added further restrictions on the API, which make scholarly research on Twitter even more difficult. The difficulty is the unpredictability of the licensing terms and the pace at which they are revised; managing risk in a landscape of vacillating and unpredictable licensing poses immense challenges for any research depending on third-party services and data.
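The second challenge, variances in language, can be made concrete with a toy normalization step. The slang lexicon below is a tiny hypothetical sample; real systems need much larger resources and more sophisticated handling.

```python
# Hypothetical sample lexicon mapping shortened Twitter spellings to
# canonical words. In practice many forms (especially nouns) have no
# canonical form at all, which is the core of the problem.
SLANG = {"2nite": "tonight", "gr8": "great", "u": "you", "pls": "please"}

def normalize(text):
    """Replace known shortened spellings with canonical words,
    leaving unknown tokens untouched."""
    return " ".join(SLANG.get(w.lower(), w) for w in text.split())
```

Note how the unknown token "c" passes through unchanged below: lookup-based normalization only helps for forms the lexicon anticipates, which is exactly why language variance remains a challenge for tag recommendation.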
CHAPTER 4ARCHITECTURE
4.1 Palantir Architecture
Figure 4-1 shows typical uses of Palantir. Palantir is an abstraction which allows
authors to make use of information available on Twitter, while enabling them to turn a
trickle of information into a gush by asking questions. People interact with Palantir in the
following ways:
1. Submit Tweets and Tags: Using the UI, people can submit Tweets along with tags pertinent to those Tweets. A recommender system aids Palantir users in selecting good tags based on the topic they are contributing to.
2. Search and Follow tags: As Tweets are organized by tags, people search for Tweets they like by following tags that are interesting to them.
3. Correct, Consolidate and Vote: People see the tags that others have applied, and depending on whether they agree, they vote for or against the tag. Palantir users may also consolidate similar tags and Tweets into a bundle.
4. Write articles using permanent references: A set of Tweets might together contain substantial, coherent information; Palantir users identify such sets and use them to form and contribute articles, adding value to these ordinary Tweets.
5. Survey Responses: Sometimes there might not be enough information on a particular tag, and people can create surveys to gather data. Replies to these requests are composed by Palantir users on mobile devices.
Though the Palantir architecture provides for several broad and useful interactions,
the scope of this thesis is limited to the design and evaluation of a tag recommender
system, and an implementation that allows for submitting Tweets and tags, and searching
and following tags.
4.2 Tweet Tagging and Annotation Services
As mentioned earlier, Andre et al. [33] analyzed over 43,000 volunteer-rated Tweets
to find that nearly 36% of the rated Tweets are worth reading, 25% are not, and 39% are
middling. Though this is just a sampling of Tweets, it shows that there is significant room
for improvement which can be achieved by presenting to users Tweets that have content
Figure 4-1. Palantir Usage patterns (1: Submit Tweet and tags; 2: Correct, Consolidate and Vote on Tags, Tweets; 3: Respond to surveys; 4: Search and Follow tags; 5: Write articles, get permanent references)
they value. Other studies [34] mention that up to 40% of Tweets might just be pointless
babble. Palantir relies on Tweets having reasonably accurate tags to support selective
exploration and topic following. Palantir also exploits these tags to find pertinent users
for surveys and polls. As of 2011, users of Twitter produce on average 1,620 Tweets
per second, with a record high of 6,939 Tweets per second. Each of these Tweets
is just 140 characters, and therefore techniques like Latent Dirichlet Allocation [43] and
Probabilistic Latent Semantic Analysis [44] are difficult to apply successfully. Palantir
partially shifts the onus of determining the topics mentioned in a Tweet to users,
by having them tag Tweets as they post them. Users are aided by topic recommendations
provided by Palantir, which ultimately shapes the vocabulary used in the tag set.
Figure 4-2. Palantir Architecture (application layer: Newsification, Survey Operations, Bundling Tweets into Articles, Twitter OAuth authentication and user validation; community algorithms: Tag Recommender, User Class Recommender, Tag Co-occurrence, Geographic Grid Conversion, and future plugins, with aggregated and normalized results; storage: Tag, Article, User, and Tweet-id databases with caching and replication)
4.3 Tags in Palantir
We use tags to mark Tweets so that other users may retrieve them easily when
they are searching for a specific topic. In addition, tags allow users and their circles
to create their own niche tags, which can aid collaboration further. It may be argued
that Twitter hashtags serve the same purpose, but they are not as effective when we
do not know exactly which topic a Tweet belongs to and want to add multiple tags; in
that case, hashtags eat into content length. We encourage users to enter tags that are
already in the system using prefix matching, or tags that are suggested by the Tweetopic
algorithm. The rationale behind this is to reduce the number of tags with different
spelling variations.
4.3.1 How Users Tag Tweets
The fundamental contribution of Palantir is the idea of harnessing the activity of
people submitting and reading Tweets to allocate topics, called tags, to Tweets. Palantir
allows users to tag their own Tweets, or Tweets that have been posted by others.
Palantir also allows people to have niche tags which can only be found by knowing the
name of the tag beforehand. These niche tags would not be suggested by the system,
and may be considered private to a user, or group of users who know the tag.
4.3.2 Tag Recommender
The motivation for providing recommendations is derived from Sen et al.'s [4] work,
which shows that when tags are recommended to people, they tend to select factual
tags as opposed to personal or subjective tags. This approach also improves the user
experience on a mobile device, in that the user does not have to type out tags when
they are recommended correctly. In addition, it reduces the number of single-use,
misspelled, or differently punctuated tags, and the quality of tags is maintained.
The Tag Recommender system operates when the user is inputting data into
Palantir, and analyzes partial user input to produce candidate tags. Figure 4-3 outlines
Figure 4-3. Palantir Tag Recommender (a partial Tweet and potential user-entered tags feed Text Transformation, Geographic Binning, Tweetopic, and Co-occurrence modules, whose candidate tags are ranked into weighted recommendations)
the working of this system. Palantir uses three distinct mechanisms, Geographic
Binning, Tweetopic, and Tag Co-occurrence, to generate candidate tags, which are
ranked to produce a weighted recommendation list.
4.3.2.1 Text Transformation
We note that search engines give better results when the query consists of short,
specific words that characterize the pages we are searching for. To achieve this we
remove Twitter-specific idiosyncrasies like RT and @username replies. We then use a
maximum entropy part-of-speech tagger [45] to identify all the noun phrases in the
Tweet.
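A minimal sketch of this cleanup step in Python (the regular expressions and the stopword filter are our own illustration; the thesis uses a maximum entropy part-of-speech tagger [45], not this heuristic, to find the noun phrases):

```python
import re

# Illustrative stopword list; a POS tagger would replace this heuristic.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "i", "you", "of",
             "to", "and", "in", "on", "at", "for", "just", "so"}

def clean_tweet(text):
    """Remove Twitter-specific idiosyncrasies before noun extraction."""
    text = re.sub(r"\bRT\b", "", text)          # drop retweet marker
    text = re.sub(r"@\w+", "", text)            # drop @username mentions
    text = re.sub(r"https?://\S+", "", text)    # drop links
    return re.sub(r"\s+", " ", text).strip()

def candidate_nouns(text):
    """Crude stand-in for the POS tagger: keep non-stopword tokens,
    which over-approximates the noun phrases of the Tweet."""
    tokens = re.findall(r"[A-Za-z']+", clean_tweet(text))
    return [t for t in tokens if t.lower() not in STOPWORDS]
```

For example, `candidate_nouns("RT @bob: Watch the debate tonight!")` keeps only the content-bearing tokens.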
4.3.2.2 Geographic Location
The intuition behind binning is that we can rewrite noisy latitude and longitude
coordinates using a grid system, which allows for a simpler search for nearby neighbors.
Palantir collects Tweets and tags submitted by the users along with their physical
coordinates. Each Tweet can have a location associated with it, and can be tagged with
multiple tags. We define tag location as the list of locations that the Tweets utilizing this
tag have. We map the raw latitude and longitude values into the Universal Transverse
Mercator (UTM) Grid system [46], which uses a two dimensional Cartesian coordinate
system based on an ellipsoidal model of Earth to give locations on the surface of the
earth. Topics are stored indexed by their grid identifier, which allows topics that have
been used in a specific grid cell to be retrieved easily. Further, such indexing makes it
easy to look at topics that have been used in nearby grid cells.
In Palantir the geographic binning recommender takes in a latitude and longitude,
and outputs a set of candidate tags that are nearby.
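As an illustration, the coarsest form of such binning can be sketched by mapping coordinates to a UTM grid-zone designator (zone number plus latitude band); the thesis's actual grid resolution and storage schema are not specified here, so the dictionary index below is an assumption:

```python
# Latitude bands C..X (I and O are skipped), each 8 degrees wide from 80S.
LAT_BANDS = "CDEFGHJKLMNPQRSTUVWX"

def utm_grid_zone(lat, lon):
    """Map raw coordinates to a UTM grid-zone designator, e.g. '18T'."""
    zone = int((lon + 180) // 6) + 1                  # 6-degree longitude zones
    band = LAT_BANDS[min(int((lat + 80) // 8), 19)]   # 8-degree latitude bands
    return f"{zone}{band}"

class GeoTagIndex:
    """Candidate tags indexed by grid cell (illustrative structure)."""
    def __init__(self):
        self.cells = {}

    def add(self, tag, lat, lon):
        self.cells.setdefault(utm_grid_zone(lat, lon), set()).add(tag)

    def nearby_tags(self, lat, lon):
        return self.cells.get(utm_grid_zone(lat, lon), set())
```

Two Tweets posted anywhere in New York City fall in cell 18T, so tags applied to one become candidates for the other.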
4.3.2.3 Tweetopic
This algorithm, described in [18] and shown in Figure 4-4, uses the text of the user's
Tweet along with data from a search engine to provide prospective tags. In short, the
algorithm formulates queries for a search engine and mines the results.
TWEETOPIC(nounPhrase)
    results ← search for nounPhrase on Google
    if results.length < 10
        then tokNouns ← TOKENIZE(nounPhrase)
             for each noun in tokNouns
                 do noun.result ← number of results from searching for that noun alone
             sort tokNouns based on noun.result
             resultsMax ← NIL
             resultsMin ← NIL
             while resultsMax.length < 10
                 do resultsMax ← search for tokNouns − tokNouns.minimum
                    tokNouns ← tokNouns − tokNouns.minimum
             while resultsMin.length < 10
                 do resultsMin ← search for tokNouns − tokNouns.maximum
                    tokNouns ← tokNouns − tokNouns.maximum
             results ← resultsMax ∪ resultsMin
    for each url in results
        do keywords ← keywords ∪ TF-IDF(url.Text, 20)
    sort keywords by number of occurrences of words
    return top 5 unique keywords

Figure 4-4. Tweetopic Algorithm
Query a Search Engine
The noun phrases obtained from the previous step are sent to a search engine. An
iterative backoff is used to adjust the query until at least 10 results are obtained.
Identify Popular Terms in the Results
TF-IDF is used to identify about 20 key words per page. Key terms that occur more
than 5 times are considered valid descriptors for the Tweet.
The Tweetopic algorithm does not learn directly from the tags entered into the system,
and cannot suggest tags that are already in the system; Palantir uses prefix matching
to display preexisting tags when the user types in a tag not shown by Tweetopic.
An advantage of using Tweetopic is that the system does not suffer from a cold start,
and provides tag recommendations even for Tweets which do not yet have corresponding
tags in the system.
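The Tweetopic flow can be sketched as follows. This is a condensed illustration, not the thesis implementation: the `search` callable stands in for the Google queries, the back-off drops only the least-productive noun rather than building the separate resultsMax and resultsMin sets, and TF-IDF is computed over the result set itself:

```python
import math
import re
from collections import Counter

STOP = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on"}

def tfidf_keywords(doc, corpus, k=20):
    """Top-k words of `doc` ranked by TF-IDF against `corpus`."""
    def words(text):
        return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP]
    df = Counter()
    for d in corpus:
        df.update(set(words(d)))                 # document frequency
    tf = Counter(words(doc))                     # term frequency
    score = {w: c * math.log(len(corpus) / df[w]) for w, c in tf.items()}
    return [w for w, _ in sorted(score.items(), key=lambda x: -x[1])[:k]]

def tweetopic(noun_phrase, search, min_results=10, top_k=5):
    """Condensed Tweetopic back-off; `search` is an assumed callable
    mapping a query string to a list of result-page texts."""
    results = search(noun_phrase)
    pool = noun_phrase.split()
    # Back off by dropping the noun that yields the fewest results alone.
    while len(results) < min_results and len(pool) > 1:
        pool.remove(min(pool, key=lambda n: len(search(n))))
        results = search(" ".join(pool))
    keywords = Counter()
    for page in results:
        keywords.update(tfidf_keywords(page, results))
    return [w for w, _ in keywords.most_common(top_k)]
```

A stubbed `search` over a small document list is enough to exercise the back-off and the keyword aggregation.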
4.3.2.4 Tag co-occurrence and Prefix matching
Tag co-occurrence grades pairs of tags based on their association with each other.
We assume that tags are similar and share traits if they occur in similar contexts, i.e.,
they are used to describe the same Tweet.

Each tag t is numbered, and is represented as a sparse co-occurrence vector in
multidimensional space, w = (f_1, f_2, ..., f_N), where f_i indicates how often t occurs
with tag t_i. The similarity of two tags is measured by the proximity of their vectors.
We use cosine similarity to measure proximity, which is given by

cos(\vec{x}, \vec{y}) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}
The top n similar tags are chosen to be presented to the user.
Prefix matching works by considering the tag entered by the user as a prefix, and
fetching all tags already in the database which have the same prefix.
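The similarity computation over the sparse co-occurrence vectors can be sketched as follows (the dictionary representation of the vectors is our own choice):

```python
import math

def cosine(x, y):
    """Cosine similarity of sparse vectors given as {index: count} dicts."""
    dot = sum(v * y.get(i, 0) for i, v in x.items())
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0

def top_similar(target, vectors, n=5):
    """Rank the other tags by cosine proximity to `target`."""
    scored = [(t, cosine(vectors[target], v))
              for t, v in vectors.items() if t != target]
    return [t for t, _ in sorted(scored, key=lambda p: -p[1])[:n]]
```

Tags whose co-occurrence vectors point in the same direction score 1.0; tags that never share a context score 0.0.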
4.3.2.5 Tag Ranking
We adopt the notion that the same tag produced by multiple mechanisms is more
important. To place such tags first, we pool the tags identified by all the different
modules, sort them by frequency of occurrence, and then discard duplicates. Tags
with frequencies greater than one are presented first, in decreasing order of frequency.
Tags that have a frequency of one are compared with the set of tags the user has
used before; those are presented next, followed by the remaining tags, ordered by their
global frequency counts.
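A sketch of this ranking step (the function signature and the `global_counts` structure are illustrative assumptions, not the thesis's data layout):

```python
from collections import Counter

def rank_tags(candidate_lists, user_history, global_counts):
    """Merge candidate tags from the recommender modules.
    Tags proposed by more than one module come first, by frequency;
    then singletons the user has used before; then the rest ordered
    by global popularity."""
    freq = Counter(tag for tags in candidate_lists for tag in set(tags))
    multi = sorted((t for t, c in freq.items() if c > 1), key=lambda t: -freq[t])
    singles = [t for t, c in freq.items() if c == 1]
    seen = [t for t in singles if t in user_history]
    rest = sorted((t for t in singles if t not in user_history),
                  key=lambda t: -global_counts.get(t, 0))
    return multi + seen + rest
```

A tag suggested by both Tweetopic and geographic binning thus outranks a globally popular tag suggested by only one module.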
Figure 4-5. Tweet Entry Screen
Figure 4-6. Tag Suggestion Screen
4.4 User Interface
Tweets posted through Palantir are tagged by users aided by the tag recommender.
When users compose Tweets on the mobile client, tags are computed periodically after
the input of n words, and are then displayed on the screen, allowing the user to select
some or enter new ones. If the system is able to recommend more than 5 tags, users
can get to the next set of results by flicking the result bar. To enter a tag that is not
recommended, the user clicks on “Add New”, and is allowed to enter a custom tag.
When users are in the process of entering custom tags, they are shown tags that are
matched by prefix, to further enable them to pick a tag that is already in the system.
However, they are not restricted to any vocabulary, and are free to complete their tags in
any way that they see fit.
Palantir provides immediate feedback to users tagging Tweets by allowing them to
view Tweets described by similar tags, and providing them an opportunity to retroactively
change the tags that they selected for the Tweet they submitted.
Palantir allows people to browse others’ Tweets possibly constrained by a
geographic location or tags. While doing so, users are allowed to add or remove tags
from Tweets. This information is stored to allow Palantir to determine the membership of
a tag in a particular Tweet, which is the ratio of users who do not remove the tag.
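This membership measure can be sketched as follows (the default value for a tag no one has reviewed yet is our assumption):

```python
def tag_membership(kept, removed):
    """Membership of a tag in a Tweet: the fraction of users who,
    having seen the tag, did not remove it."""
    total = kept + removed
    return kept / total if total else 1.0  # unreviewed tags keep full membership
```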
4.5 How Palantir Uses Tags
4.5.1 Create a User Interest Profile

For each user, we compute the top k most frequently used tags. This forms the
feature set for that user; let us call it frequentSet. We incorporate feedback given by
other users (by virtue of removing tags) by computing the percentage of tags that were
removed by other users of the system, and a top-k list of these tags, called disputedSet.
These sets are recomputed whenever the user makes a post. In particular, the set
difference of frequentSet and disputedSet, called acceptedSet, captures the topics
posted by this user that are accepted by other users; this is the computed user profile
used to determine which polls should target this user.
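The profile computation can be sketched as follows (the set names follow the text; k and the input shapes are illustrative):

```python
from collections import Counter

def user_profile(posted_tags, removed_tags, k=10):
    """frequentSet: the user's top-k tags; disputedSet: the top-k of
    those tags that other users removed; acceptedSet = frequentSet
    minus disputedSet is the profile used for poll targeting."""
    frequent = {t for t, _ in Counter(posted_tags).most_common(k)}
    disputed = {t for t, _ in Counter(removed_tags).most_common(k)}
    return frequent - disputed
```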
4.5.2 Searching, Topic Following
Using the web interface, users can follow Tweets by searching for topics that
interest them. This view is updated whenever people post new Tweets. This interface
allows users to reference Tweets and consolidate them into stories. The system makes
use of tag co-occurrence to show Tweets from related topics as well.
4.5.3 Tag Consolidation
When the number of unique tags in the system reaches a specified size, we compute
the semantic similarity of the words in the tags. Groups of similar words are presented
to users when they search for topics in the web application, and users are allowed to
give input on whether these words should be merged. When there is strong agreement
between users, the tags are merged and saved.
CHAPTER 5EXPERIMENTATION AND EVALUATION
Palantir relies on user input for its tasks, namely tag recommendation, polling
and fact consolidation. One method of evaluation is to publicize Palantir and find
a group of volunteers willing to create data that we could use. This is a tough
undertaking, mainly in terms of recruiting volunteers, and of iterating over
algorithms and experiments.
We believe that a labor market for micro-tasks, like Amazon Mechanical Turk (AMT),
is well suited for our experiments. Subsubsection 5.1.1.2 explores crowdsourcing with
a focus on AMT and existing research that makes use of that platform; we also explain
our reasons for choosing AMT, describe the experiments we run on AMT, and discuss
the results.
5.1 Crowdsourcing
Crowdsourcing, a term coined by Jeff Howe in 2006, was described by him as
“the process by which the power of many can be leveraged to accomplish feats that
were once the province of a specialized few” [47]. Howe used examples of people
contributing to Wikipedia and uploading videos to YouTube to demonstrate the concept.
Today, crowdsourcing more commonly refers to online services where
publishers post tasks that are completed by a group of people to fulfill the requirements
of the publisher. The tasks are diverse, and marketplaces like Amazon Mechanical Turk
[48] have abundant workers to ensure timely completion of tasks.
5.1.1 Amazon Mechanical Turk (AMT)
Amazon Mechanical Turk (AMT) [48] is an online labor marketplace that focuses on
assisting developers in using human input for their programs. Typically, the human input
required is for simple tasks that cannot yet be done algorithmically using computers
while being cost and time efficient. Some of these problems are easily solved by
humans after minimal training. An example of such a microtask might be to look at two
photos of a single person taken under different conditions to determine whether they
are the same person. AMT uses the tag line “Artificial Artificial Intelligence” to brand its
service, stemming from the fact that AMT can be used to perform tasks that artificial
intelligence cannot.
5.1.1.1 Basic Terminology
Workers
Workers are humans who select and complete one or multiple microtasks on
AMT. Workers are paid with rewards that are deposited into Amazon Payments
accounts, and may be withdrawn as cash. The reward or wage is determined
individually for each task group by requesters, and is subject to approval by
requesters.
Requesters
Requesters are people who publish tasks which are to be completed by workers.
The requester designs the tasks and the steps needed for its completion, decides
the reward for a task, and acceptance criteria for the task. Requesters may also
limit the visibility of tasks depending on worker qualifications and on workers'
previous acceptance ratios.
HIT
A Human Intelligence Task (HIT) is a job posted on AMT. Tasks that are easy and
well defined produce good results. Tasks range from selecting good pictures of
storefronts and identifying addresses to writing product descriptions.
HIT Type
The HIT Type refers to the characteristics of the HIT, viz. title of the HIT, the
requester who created the HIT, reward being offered, number of HITs of this
type, time for completion of HIT, auto-approval time after which workers get paid
automatically, qualifications of workers who can accept the HIT, and HIT expiry
time.
HIT Group
A HIT Group comprises HITs of the same type. This enables workers to
easily find similar HITs. Workers prefer HIT groups with a large number of tasks
because they do not have to retrain, and can improve their skill as they pick up
jobs.
Assignment
AMT supports having multiple workers working on a replica of the same HIT. Each
of these replicas is called an assignment. AMT ensures that a worker can only
complete a single assignment for a HIT. This allows requesters to evaluate the
quality of submissions by looking at how other workers have completed the same
task.
Qualifications
There are requirements the worker must meet to work on HITs. These qualifications
can be auto-generated by AMT or created by requesters. Auto-generated
qualifications include criteria like the approval percentage of the worker, the number
of assignments completed successfully, etc.; custom qualifications are created by
requesters, and are usually time-bound tasks that need to be completed according
to the requester's specification before the qualification is granted. Requesters are
also able to grant qualifications to workers who have previously worked for them.
Sometimes multiple qualifications may be required for a given HIT.
Reward
Reward is the wage paid to a worker for completing a HIT. On approval of a HIT
by a requester, rewards are automatically transferred from the prepaid Amazon
payments account of the requester to the account of the worker.
Life cycle of a HIT
For a requester, the first step is to register an account with Amazon payment
services using a US-based credit/debit card. The requester's role is to create HITs,
which are subsequently put on the AMT marketplace. A HIT may be created in one of
three ways, viz. using the Requester User Interface (RUI), the AMT Application
Programming Interface (API), or the AMT Command Line Tools (CLT). Regardless of
the technique used, the requester needs to provide a title, reward, time for completion,
number of assignments, auto-approval time, qualifications, and HIT expiry time. The
HIT itself can be hosted on Amazon Web Services (AWS) or externally. For a HIT
submitted using the RUI and hosted on AWS, the requester provides a HIT template
with placeholders for data, which are then filled from a data file before being sent out
to workers.

Requesters need to prepay the maximum amount which can be consumed by their
HIT groups before the groups appear in the marketplace. The minimum reward for a
HIT is $0.01, with Amazon charging a 10% commission on all payments made using its
platform; the minimum commission charged is $0.005.

Once HITs have been posted, they appear on the worker interface, which by default
is sorted by the most recent time a HIT has been posted or updated. Workers then
select one or more jobs of interest and complete the tasks. When a worker is done with
one task in a HIT group, he/she is given the option to complete another task of the
same HIT type. One way to control this is to have a HIT group with a number of
assignments, where each worker is limited to submitting a single assignment.

Once HITs have been completed by workers, requesters can review submissions
and approve or deny payments. At this point, requesters may optionally award a bonus
to a worker or set of workers. For workers on AMT, the ratio of submissions to approvals
is calculated and displayed along with their profiles, and requesters have the option of
restricting HITs to workers meeting a certain ratio. To deal with workers with consistently
unsatisfactory performance, AMT allows requesters to block them. AMT does not
provide any such metrics for requesters, leaving workers without a way to rate
requesters. Community forums like Turker Nation [49] and Turkopticon [50] step in to
provide ratings and guidance about requesters. In general, workers prefer requesters
who have well-defined HITs with clear acceptance criteria, and prompt approval and
payment practices. If a requester has ambiguous practices that hurt the interests of
workers, they may be blacklisted on a forum like Turker Nation [49], causing them a lot
of difficulty in getting work done using AMT.

A HIT stops appearing on the worker interface when either all HITs posted in that
group have been completed, or the time assigned for the expiry of the HIT has passed.
The expiry time and other parameters can be changed by a requester while the HIT is
still running, which causes the HIT to be updated and listed at the top of the worker
interface.
5.1.1.2 Related Work
Crowdsourcing is a relatively new concept that researchers have been exploring.
In 2004, Luis von Ahn and Laura Dabbish came up with the ESP Game [51], which
made use of crowd wisdom to annotate images; they estimated that 5,000 people
playing the game for 31 days could assign labels to all images indexed by Google.
In a novel attempt, Michael Denton [52] used crowdsourcing to create art on a public
sidewalk. More traditional uses of crowdsourcing are data collection and sensing:
Apple collects WiFi signal strength along with GPS coordinates to make its mapping
services accurate, Waze collects velocity information to provide realtime traffic data
to motorists, and Google enriches its maps with user-submitted photos pinned to
specific coordinates.
Yahoo researchers Mason and Suri [53] delve deeply into using Amazon
Mechanical Turk for behavioral research. They examine a plethora of work comparing
AMT to traditional offline tests and other tests administered online, and summarize the
results, noting that results from well-designed studies on AMT are qualitatively and
quantitatively the same as those conducted in lab settings. Worker demographics given
by Suri and Watts [54] show that the average worker is 30 years old, with 55% female
and 45% male, and with the majority of turkers from the USA and India. They also detail
their experience conducting synchronous experiments on AMT using waiting rooms.
AMT has been put to a variety of creative uses by developers and researchers.
Bernstein et al. [55] rely on TurKit's [56] AMT algorithms to iteratively refine text inside
Microsoft Word; their system, Soylent, provides human-powered text shortening,
proofreading, and an interface for requesting arbitrary word-processing tasks. CrowdDb
[57] presents an open-world database model where queries for missing information are
transparently converted into AMT HITs, and crowd results are aggregated into the
results presented to the user. VizWiz [58] empowers people with limited visual acuity
to perform visual search by harnessing the cognitive skills of humans on AMT: VizWiz
on a mobile device allows the user to snap a picture, verbally ask a question about it,
and get a human reply in minutes.
5.2 Experiments
This section presents results from experiments run with Palantir on AMT using data
from Twitter. Using the Twitter Streaming API, we collected about 270,000 Tweets
during the second presidential debate, between 9:00 PM and 10:30 PM EST on October
16, 2012; this forms our base dataset. We found that a significant percentage of these
Tweets were reTweets, which were eliminated. To ensure that we captured only English
Tweets related to politics, we used the word list provided in Table A-1. We then
constrained the Tweets to those that originated from the state of New York. This
process reduced the number of Tweets to 245, which were used for running 455 HITs
on Mechanical Turk between October 16 and October 24, 2012. We designed
experiments to mimic ways in which Palantir could be used, and to evaluate user
behavior when using Palantir's keystone, its tag recommendation system. We also vary
parameters specific to AMT, viz. price and jobs per HIT, and report on the time taken to
complete tasks, and the
Figure 5-1. Experiment 1: Palantir baseline (Stage 1: Tweet Corpus → Palantir Tag Recommender → Annotated Tweet Corpus A (ATC-A); Stage 2: ATC-A → AMT → Annotated Tweet Corpus B (ATC-B))
quality of responses. Workers who had an approval rating of at least 98% and could
demonstrate a basic understanding of Twitter concepts and terminology were chosen for
this task.
We note that we use the same Tweet data corpus for the different experiments so
that comparisons may be drawn. However, the functioning of AMT cannot be stringently
controlled, and results derived using AMT depend on the properties of AMT at the
time of the experiments. Specifically, the portion of the crowd doing the experiments
determines the results, which may cause variance every time an experiment is run with
a different portion of the crowd. In addition, the design of the specific tasks and the
assignment of HITs significantly affect results. Despite this, we believe that we can get
some interesting insights from these experiments.
5.2.1 Experiment 1: Palantir Baseline
Our initial evaluation seeks to establish cold baseline performance of Palantir’s
tag recommender system. We run this experiment to see how well Palantir’s tag
recommender works when there is no previous data in Palantir.
This experiment is run in two stages. In the first stage, we run Palantir’s tag
recommender on every Tweet in our corpus, and set the number of tags to 5. As there
are no existing tags in the system, the results of this experiment are solely contributed
by our implementation of the Tweetopic algorithm, described in Figure 4-4. In stage
two, to get an idea of the qualitative performance of this algorithm, results were sent to
AMT (5 jobs per HIT, 2 assignments, 25 cents per HIT), asking workers to pick out
useful tags from the ones generated by Palantir. For each Tweet, workers were asked to
rate the usefulness of every tag on a 5-point scale. The workflow for this experiment
is shown in Figure 5-1. Workers had to have an approval rating above 98% and
demonstrate adequate understanding of Twitter and tagging, as determined by a custom
qualification, to take part in this experiment.
5.2.1.1 Experimental Data
A histogram depicting the distribution of tag ratings is shown in Figure 5-2B.
We use a tag rating of 0 to signify that the tag is blank, while ratings 1 to 5 range from
'not useful' to 'most useful'. We found that just 7.1% of tags supplied by Palantir were
marked as most useful, and 6.2% were marked as useful. The tag cloud corresponding
to the 338 unique tags generated by Palantir is shown in Figure 5-2A, while the
popularity of tags is shown in Figure 5-3A.

Workers took about 19 hours to complete this job, with an average completion time
per HIT of 5.9 minutes.
5.2.1.2 Analysis
We find that Palantir performs inadequately at recommending tags when it is
started without any data. We note that a majority of turkers strongly felt that the tags
were unrelated to the Tweets presented. We posit that the rather long time to complete
this set of HITs was because we were new requesters on the AMT marketplace.
Workers are cautious about new requesters, as requesters have the ability to arbitrarily
reject work done by workers, which impacts workers' approval ratings negatively. In
fact, the first results for this batch of HITs started coming in only after we introduced the
HITs on TurkerNation [49], providing a short description of what they were for and how
we would evaluate them. Some workers are apprehensive of submitting work to
requesters who evaluate it using majority rules, perhaps due to a perceived threat that
their work could be rejected even if it is correct.
5.2.2 Experiment 2: Unguided Human Baseline
This experiment, depicted in Figure 5-4, is used to determine baseline performance
of AMT for tagging a Tweet. Our goal is to examine the behavior of people annotating
Tweets without any guidance or recommendations. We are interested in finding out the
quantity and quality of these tags, and how useful the community perceives these tags.
This experiment runs similarly to Experiment 1, but with a different first stage. As we are
benchmarking the performance of the crowd, the first stage is changed so that tags are
generated by turkers: the HIT provides a Tweet and asks the turker to suggest up to 5
tags for it. The methodology of stage two is identical to that of the first experiment. We
hope to measure community acceptance of tags generated by turkers.
5.2.2.1 Experimental Results
A histogram of worker ratings is presented in Figure 5-5B, and the tag cloud is
shown in Figure 5-5A. We find that, on average, 14.94% of the tags generated by the
turker population working on the first stage were marked as most useful, and 10.16%
were marked useful, by the turkers working on the second stage. It is also interesting to
note that 19.36% of the spaces for tags were left blank, indicating that fewer than five
tags were required for some Tweets. For stage one, when we paid the turkers a wage
of 27 cents per job, the batch was completed in 19 hours, with an average completion
time per Tweet of 7 minutes; this batch was submitted to AMT along with the first
experiment. For the second stage, with the same wage of 27 cents per job, the batch
was completed in 1 hour and 40 minutes, with the average completion time per Tweet
Figure 5-2. Experiment 1: Results. A) Tag cloud. B) Usefulness of tags (histogram, E1: Palantir baseline).
Figure 5-3. Experiment 1: Results. A) Tag popularity (E1: Palantir baseline).
Figure 5-4. Experiment 2: Tweets tagged using AMT only (Stage 1: Tweet Corpus → AMT → Annotated Tweet Corpus A (ATC-A); Stage 2: ATC-A → AMT → Annotated Tweet Corpus B (ATC-B))
Figure 5-5. Experiment 2: Results. A) Tag cloud. B) Usefulness of tags (histogram, E2: Human baseline).
being 14.7 minutes. This differs significantly from the time taken by the second stage of
experiment one, by +8.8 minutes.

5.2.2.2 Analysis

We see that, for the most part, humans in the AMT community agree about the
quality of tags applied by their peers. Only 3.1% of tags were marked as not useful.
There were 303 unique tags generated by humans in this experiment, similar to what
the Palantir baseline generated. However, there is a significant difference in the
multiplicity of the tags, as revealed by the respective tag clouds in Figure 5-5A and
Figure 5-2A. We
Figure 5-6. Experiment 2: Results. A) Tag popularity (E2: Human baseline).
also compare the tag popularity of these two experiments, shown in Figure 5-3A and
Figure 5-6A, to note this difference. We see that while the number of unique tags is
similar, a few tags are considerably more popular than their alternatives. Because
these alternative tags are not discoverable, they are effectively buried. The
additional time of about 9 minutes taken per job was
puzzling until we communicated with a worker, who explained that when workers accept
multiple jobs, the timer marking a job as started is triggered immediately, even if
they have not yet begun working on the job. Relying on job completion time as a
metric for AMT experiments should therefore be a well-considered decision.
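One simple mitigation, when completion times must be reported anyway, is to prefer the median over the mean, since a few accept-early jobs inflate the latter far more than the former. The sketch below uses made-up times, not data from our experiments:

```python
from statistics import mean, median

# Hypothetical per-job completion times in minutes; two workers accepted
# several jobs at once, so their timers started early and their recorded
# times are inflated.
times = [3.1, 4.0, 3.8, 4.4, 3.5, 41.0, 38.5, 4.2]

print(f"mean:   {mean(times):.1f} min")    # skewed by the inflated timers
print(f"median: {median(times):.1f} min")  # robust to the accept-early artifact
```

The median here stays near the typical working time, while the mean triples.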
5.2.3 Experiment 3: Heated Palantir
The goal of this experiment is to measure the quality of tags produced by Palantir’s
tag recommender when it has prior data about tags that have been used. It also
Figure 5-7. Experiment 3: Tweet tags recommended by Palantir and validated by AMT. ATC-B-EXPT1 and ATC-B-EXPT2 populate the Palantir Tag DB; the Tweet Corpus is tagged by the Palantir Tag Recommender and validated on AMT, producing Annotated Tweet Corpus C (ATC-C).
illustrates the influence of guidance, showing us the number of times people pick a
tag that exists as opposed to creating their own tags.
We use the results of the previous experiment, described in Subsection 5.2.2 to
populate data structures used by Palantir. To realistically model Palantir’s use case, we
run this experiment in a single stage, with Palantir providing tag recommendations. The
experiment is depicted in Figure 5-7. This HIT was run with 1 job per assignment, with
each job containing 5 Tweets.
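The warm start can be pictured with a minimal frequency-based sketch (the class and method names below are illustrative, not Palantir's actual implementation): the tag database is seeded from the annotated corpus of the previous experiment, and recommendations then favor tags already in use.

```python
from collections import Counter

class TagDB:
    """Minimal tag store: counts how often each tag has been applied."""
    def __init__(self):
        self.counts = Counter()

    def seed(self, annotated_corpus):
        # annotated_corpus: iterable of (tweet_text, [tags]) pairs,
        # e.g. a human-annotated corpus such as ATC-B.
        for _tweet, tags in annotated_corpus:
            self.counts.update(t.lower() for t in tags)

    def recommend(self, k=5):
        # Popularity-only baseline: the k most frequently used tags.
        return [tag for tag, _n in self.counts.most_common(k)]

db = TagDB()
db.seed([("Gates pledges funds", ["Charity", "Bill Gates"]),
         ("New vaccine trial",   ["Medicine", "Charity"])])
print(db.recommend(k=2))  # most frequent tags first
```

A real recommender would also condition on the Tweet's content; this sketch only shows how prior data biases the candidate set toward established tags.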
5.2.3.1 Experimental Results
Figure 5-8. Experiment 3: Results. A) Tag cloud. B) Usefulness of tags (E3: Heated Palantir).
Figure 5-9. Experiment 3: Results. A) Tag popularity, log scale (E3: Heated Palantir).
Figure 5-10. Experiment 3: Histograms showing the variation in the usefulness distribution of Tweets when only the first 5, 4, or 3 tags presented by Palantir are chosen. A) All tags. B) First four tags. C) First three tags.
Figure 5-8B shows the distribution of worker ratings for the tags generated by
Palantir when it has access to a tag corpus. The tag cloud corresponding to this
experiment is shown in Figure 5-8A. This tag cloud is generated by taking into account
overrides by turkers: in cases where a tag recommended by Palantir was replaced by the
turker, we use the replacement tag. In the histogram, we see that 19.7% of tags are
rated as most useful by workers, while 10.8% are rated as useful. The percentage of
tags marked as not useful is 18.26%, much better than the initial performance of
Palantir in experiment 1, where this metric was as high as 36.9%. Turkers were paid 42
cents, and took 19 hours and 30 minutes to complete this batch. The average response
time was 4.85 minutes.
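The percentages above are read off a usefulness histogram. A small sketch of the computation, using hypothetical ratings rather than our experimental data:

```python
from collections import Counter

# Hypothetical 0-5 usefulness ratings collected from turkers.
ratings = [5, 5, 4, 0, 3, 5, 2, 0, 4, 1]

hist = Counter(ratings)
total = len(ratings)
for score in range(6):
    pct = 100 * hist[score] / total
    print(f"usefulness {score}: {hist[score]:4d} tags ({pct:.1f}%)")
```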
5.2.3.2 Analysis
At first glance, it is surprising that the number of tags marked as most useful
has increased compared to the previous experiment, in which humans rated tags
created by other humans. We attribute this to the fact that while humans enter
only the most relevant tags for a Tweet, Palantir recommends a larger number of
tags, giving more tags a chance to be relevant. This also causes the percentage
of tags marked as 'not useful' to rise to 18.2%. We believe this is a good
trade-off compared to having to type in more tags, a burden that would be more
pronounced on a space-constrained mobile device. Figure 5-10 explores the
behavior of Palantir's tag recommender when we constrain it to five, four,
and three tags. We find that tags are roughly equally distributed across all bins,
so the constraint does not significantly change the distribution. The tag cloud in
Figure 5-8A and the tag distribution in Figure 5-9A show a larger set of popular
tags compared to the previous experiment. We feel that recommending tags makes
people aware of their options before they invent tags of their own, which may not
have mainstream appeal. This is evidenced by the fact that only 217 unique tags
were applied to Tweets in
Figure 5-11. Experiment 4: Synonym detection on AMT. A Tweet's tags (e.g., Medicine, Bill Gates, Charity, Gates, Welfare, Melinda) are split by AMT workers into buckets such as {Medicine}, {Bill Gates, Gates}, {Melinda}, {Charity, Welfare}.
this experiment, as opposed to 303 unique tags in experiment 2, and 338 unique tags in
experiment 1.
5.2.4 Experiment 4: AMT Synonym Detection
This experiment is different from the preceding experiments in that we don’t ask
turkers to generate new tags. Instead, we ask them to split tags that have been applied
to a specific Tweet into buckets. The goal of this experiment is to determine how many
tags that have been applied to a single Tweet are words that might be considered
synonyms by the community. It is important to note that the conventional method of
Figure 5-12. Experiment 4: Similar words. A) Synonym groups distribution (number of Tweets per count of synonym groups). B) Tags distribution (number of synonym groups per count of tags).
using a synonym dictionary like WordNet [59] fails here, as some of the words might not
be considered synonyms without context, or have no synonyms at all. For example,
one tagging community might consider football and rugby to be equivalent while another
community considers football and soccer to mean the same thing. An advantage
of using AMT for this is that turkers also do entity resolution, consolidating tags like
Samsung galaxy s3 and sgs3.
This HIT is structured as shown in Figure 5-11, wherein a Tweet is presented
with tags applied to it. These tags have been consolidated per Tweet from previous
experiments. The interface presents workers with 5 buckets where they can input tags.
Tags may be repeated in different buckets.
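Aggregating the returned buckets into synonym groups can be done by merging any two tags that some worker placed in the same bucket, for example with a union-find pass. This is an illustrative sketch, not the exact procedure used:

```python
def synonym_groups(bucketings):
    """bucketings: list of worker responses, each a list of buckets,
    each bucket a list of tags. Tags co-bucketed by any worker are
    merged into one group via union-find."""
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    def union(a, b):
        parent[find(a)] = find(b)

    for buckets in bucketings:
        for bucket in buckets:
            for tag in bucket:
                find(tag)  # register singleton tags too
            for a, b in zip(bucket, bucket[1:]):
                union(a, b)

    groups = {}
    for t in parent:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())

# One worker's response for the Tweet of Figure 5-11.
resp = [[["Bill Gates", "Gates"], ["Charity", "Welfare"], ["Medicine"]]]
print(sorted(len(g) for g in synonym_groups(resp)))  # group sizes
```

With several workers per Tweet, one could additionally require that a pair be co-bucketed by a majority before merging, to damp out a single careless response.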
5.2.4.1 Experimental Results
Figure 5-12 describes the results of this experiment. We see that there is an average
of 2.58 tag groups per Tweet, with on average 2.21 tags per group. When we paid 35
cents for this task, workers took 10 hours to complete it, with the average response time
being 14.5 minutes.
5.2.4.2 Analysis
As seen in the results of this experiment, the same tagging community may use
multiple words with similar meanings to tag Tweets. The fact that the community
is able to partition tags into buckets, averaging 2.58 buckets per Tweet, suggests
that when people apply more than 3 tags to a Tweet, they are using multiple synonyms
to tag the same Tweet. This could be brought down by ensuring that synonyms are not
automatically suggested by the system, as they currently are.
5.2.5 Summary of Results
We see that the performance of Palantir, as measured by the percentage of tags
people rated as 'most useful' and 'useful', rises dramatically, while the percentage
of tags rated as 'not useful' falls to about half of what it is when no data is loaded.
In addition, we note that there are only 217 unique tags when tagging using Palantir,
compared to 303 when unguided humans tag. While we cannot comment on the time taken
by the AMT community to apply these tags, we expect that tagging using Palantir would
be quicker than an unguided approach.
CHAPTER 6
CONCLUSION AND FUTURE WORK
6.0.6 Conclusion
In this thesis, we described Palantir, an architecture that uses collective human
intelligence in microblogging as a means to achieve coherent snapshots of real
world events. We studied the feasibility of recommending tags to an online microblog
community, and measured how useful the tags were for the community. We notice that
when humans are left to tag microblog posts without any guidance on the content of
tags, they select tags that are either very general or too specific, indicating
difficulty in choosing a tag with the right degree of selectivity. Picking a tag that
is highly specific to a Tweet is a setback to the tag's popularity, as it cannot be
applied to most other Tweets. On the other hand, there are cases where user-specific
tags have helped users retrieve Tweets of interest to them alone. Palantir assists
users by showing candidate tags that may be easily applied. For users new to the
system, this guidance could be important in getting them to engage directly and
immediately.
Palantir, even when inaccurate, passively increases the user’s knowledge of tags
existing in the system, contributing to serendipitous discoveries of tags and interests.
6.0.7 Future Work
Palantir was conceptualized with the goal of persuading people to participate
in content creation during the times when such content is most crucial. While the tag
recommender system allows people to join in on conversations around a particular
tag, and discover new tags to contribute to, Palantir can go much farther, by enabling
other scenarios summarized in the beginning of Chapter 4. We highlight some of these
possibilities below.
6.0.7.1 Content Syndication
People using Palantir are already plugged in to a stream of information generated
by Twitter, which has been filtered by topics that are of interest to them, and topics they
contribute to. We can foster citizen journalism by enabling Palantir users to form ad hoc
groups to report on a specific event. By constructing Palantir as a service, we could
create clients on a variety of devices. People using Palantir on different devices may
use it for different purposes. On a device like a desktop, with a big screen and a
specialized text input device, a user may comb through Tweets of interest to pick
ones to weave into a story. A mobile user may improve Palantir by applying tags
to Tweets and reporting on events. A symbiotic relationship may be established
between these groups of people, where users with mobile devices report from the site
of the event while those on more powerful machines channel these tidbits into
information that can be readily digested by outsiders.
6.0.7.2 Survey Creation
A way to make the previous approach more robust would be to ask more people
for information, as supplemental viewpoints may paint a holistic picture. Further, mobile
devices are great for content consumption, and people can use the same to read
through such articles and tag, comment on, and rate them. Because Palantir already
knows users' locations and the tags they have contributed to most, it might be
possible to target these questions to the subset of people who are likely to be most
interested in them. With the advent of push notifications, high-speed data, and the
computational power available in today's portable mobile devices, we believe we are
no longer shackled by hardware capabilities. However, we would need to develop
sophisticated ways of managing the reputation of users in the system, and provide
human stewardship to sustain and grow the community. Of significant importance is
the ability to filter out misinformation from Tweets. A reputation management system
that gives good reach to new people using the system constructively, while
preventing established users from twisting facts, would work well for this purpose.
It would be an interesting challenge to build a trusted system that is expressive
while not being overly constrained.
APPENDIX
WORD LISTS
Table A-1. Filter Terms
S.no Word
1 47
2 5 trillion
3 6 studies
4 abort
5 abortion
6 absurd
7 accurate
8 afghanistan
9 akin
10 alaska
11 ambassador
12 anderson
13 anti
14 apploause
15 approval
16 arafat
17 arithmetic
18 assessment
19 bain
20 benjamin
21 bibi
22 biden
23 big government
24 billionaires
25 bin laden
26 bipartisan
27 bird
28 bowles
29 brilliance
30 broad-minded
31 budget
32 buffet
33 bush
34 business
35 canada
36 candidates
37 candy
38 cbo
39 charlie
40 cheers
41 cheny
42 china
43 chinese
44 chouces
45 civil rights
46 class
47 college
48 colorado
49 commander
50 commission
51 companies
52 congressman
53 controversial issues
54 cooper
55 corporate
56 credibility
57 crippling
58 crist
59 criticial
60 crowley
61 daily
62 debate
63 democrat
64 denver
65 depression
66 detroit
67 differences
68 different
69 dishonest
70 dodge
71 domestic
72 donald
73 dream act
74 earth
75 economy
76 education
77 egypt
78 eisenhower
79 elk
80 energy
81 environmental policy
82 exxonmobil
83 fact check
84 federal deficit
85 finland
86 foreign policy
87 fraud
88 fundamentalist
89 gallup
90 gates
91 gay
92 gay marriage
93 giuliani
94 gop
95 governer
96 governing
97 government
98 graduate
99 green
100 growth
101 half
102 health care
103 health care reform
104 hempstead
105 hispanic
106 hisses
107 hofstra
108 homeland security
109 benghazi
110 incentives
111 inclusive
112 independent
113 intelligence
114 intolerant
115 iran
116 iraq
117 israel
118 israelis
119 jet
120 jim
121 jobs
122 kill
123 korans
124 kosher
125 language
126 latino
127 left
128 lehrer
129 liar
130 libya
131 linda
132 lying
133 malarkey
134 marine
135 martha
136 math
137 mcmahon
138 medicaid
139 medicare
140 michelle
141 mid east policy
142 middle eastern policy
143 military
144 mitt
145 morris
146 mubarak
147 mullahs
148 multi-cultural
149 netanyahu
150 newshour
151 nominee
152 nuclear
153 ny
154 obama
155 obamacare
156 obsolete
157 ohio
158 oil
159 opportunity
160 osama
161 oval
162 overseas
163 overwhelming
164 palestine
165 pbs
166 peace
167 peaceful
168 perception
169 philip morris
170 pickering
171 plutocrat
172 polluter
173 pollution
174 potus
175 president
176 principal
177 priority
178 pro choice
179 pro environment
180 progressive
181 queada
182 raddatz
183 rape
184 reagan
185 regressive
186 resilience
187 right-wing
188 roby
189 roe v. wade
190 roll
191 romney
192 romneycare
193 roughly
194 rudy
195 russia
196 russian
197 ryan
198 sanctions
199 satan
200 school
201 scotus
202 sensata
203 shipping
204 silent
205 simpson
206 six studies
207 slowest
208 small
209 small government
210 smoking
211 social security
212 social services
213 socialist
214 soup
215 souza
216 specifiecs
217 stewart
218 stockman
219 stunt
220 subjects
221 taces
222 taibbi
223 taliban
224 tax code
225 ticket
226 todd
227 tolerant
228 training
229 trickle-down
230 tripoli
231 trump
232 tsa
233 unbalanced
234 uninsured
235 veracity
236 vietnam
237 vp
238 waivers
239 war
240 warren
241 wealth
242 wealthy
243 weapon
244 women
245 work
246 yassir
247 york
REFERENCES
[1] “State of the news media,” 2012. [Online]. Available: http://www.stateofthemedia.org
[2] M. Garber, “Twitter, the conversation-enabler? Actually, most news orgs use the service as a glorified RSS feed,” 2011. [Online]. Available: goo.gl/tWrMs
[3] G. Chen, “Breaking-news situations require a breaking-news approach,” 2012. [Online]. Available: http://www.niemanlab.org/2012/01/gina-chen-breaking-news-situations-require-a-breaking-news-approach/
[4] S. Sen, S. K. Lam, A. M. Rashid, D. Cosley, D. Frankowski, J. Osterhouse, F. M.Harper, and J. Riedl, “tagging, communities, vocabulary, evolution,” in Proceedingsof the 2006 20th anniversary conference on Computer supported cooperative work,ser. CSCW ’06. New York, NY, USA: ACM, 2006, pp. 181–190. [Online]. Available:http://doi.acm.org/10.1145/1180875.1180904
[5] “Storify,” 2012. [Online]. Available: http://storify.com/
[6] “Storify: About us,” 2012. [Online]. Available: http://storify.com/about
[7] M. J. Tenore, “25 ways to use facebook, twitter storify to improve political coverage,”2011. [Online]. Available: http://www.poynter.org/how-tos/digital-strategies/151883/25-ways-to-use-facebook-twitter-storify-to-improve-election-coverage/
[8] E. Zak, “How journalists can use storify to cover any type of meeting,” 2012. [Online]. Available: http://www.mediabistro.com/10000words/how-to-use-storify-to-cover-a-meeting-workshop-or-event b9068
[9] “Appstore: Vibe,” 2012. [Online]. Available: http://itunes.apple.com/us/app/vibe/id433067417?mt=8
[10] J. Wortham, “Messaging app grows with wall street protests,”2011. [Online]. Available: http://bits.blogs.nytimes.com/2011/10/12/anonymous-messaging-app-vibe-gets-boost-from-occupy-wall-street/
[11] “Dataminr,” 2012. [Online]. Available: http://www.dataminr.com/
[12] T. C. Sottek, “Dataminr analyzes over 340 million tweets a day to track and predict global events,” 2012. [Online]. Available: http://www.theverge.com/2012/4/9/2936816/dataminr-twitter-data-predict-events
[13] “Wikinews,” 2012. [Online]. Available: http://en.wikinews.org
[14] “About cnn ireport,” 2012. [Online]. Available: http://ireport.cnn.com/about.jspa
[15] K. Ehrlich and N. Shami, “Microblogging inside and outside the workplace,” in Proceedings of the 4th International AAAI Conference on Weblogs and Social Media (ICWSM 2010), AAAI Publications, 2010. [Online]. Available: http://www.cs.cornell.edu/~sadats/icwsm2010.pdf
[16] A. Mathes, “Folksonomies - cooperative classification and communication throughshared metadata,” December 2004. [Online]. Available: http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html
[17] M. Guy and E. Tonkin, “Folksonomies: Tidying up tags?” D-Lib Magazine, vol. 12, no. 1, January 2006. [Online]. Available:http://www.dlib.org/dlib/january06/guy/01guy.html
[18] M. S. Bernstein, B. Suh, L. Hong, J. Chen, S. Kairam, and E. H. Chi, “Eddi :Interactive topic-based browsing of social status streams,” Fortune, pp. 303–312,2010. [Online]. Available: http://portal.acm.org/citation.cfm?id=1866077
[19] A. M. Kaplan and M. Haenlein, “The early bird catches the news: Nine things you should know about micro-blogging,” Business Horizons, vol. 54, no. 2, pp. 105–113, March 2011. [Online]. Available: http://ideas.repec.org/a/eee/bushor/v54yi2p105-113.html
[20] “Wikipedia: Microblogging,” 08 2012. [Online]. Available: http://en.wikipedia.org/wiki/Microblogging
[21] “Twitter blog: One hundred million voices,” 2012. [Online]. Available:http://blog.twitter.com/2011/09/one-hundred-million-voices.html
[22] “Twitter to surpass 500 million users,” 2012. [Online]. Available:http://www.mediabistro.com/alltwitter/500-million-registered-users b18842
[23] H. Kwak, C. Lee, H. Park, and S. Moon, “What is twitter, a social network or a newsmedia?” in Proceedings of the 19th international conference on World wide web,ser. WWW ’10. New York, NY, USA: ACM, 2010, pp. 591–600. [Online]. Available:http://doi.acm.org/10.1145/1772690.1772751
[24] Z. Papacharissi and M. de Fatima Oliveira, “Affective news andnetworked publics: The rhythms of news storytelling on egypt,” Journalof Communication, vol. 62, no. 2, pp. 266–282, 2012. [Online]. Available:http://dx.doi.org/10.1111/j.1460-2466.2012.01630.x
[25] L. Grossman, “Iran protests: Twitter, the medium of the movement,” Time Magazine,vol. 17, 2009.
[26] C. Beaumont, “New york plane crash: Twitter breaks the news, again,”2009. [Online]. Available: http://www.telegraph.co.uk/technology/twitter/4269765/New-York-plane-crash-Twitter-breaks-the-news-again.html
[27] J. Wortham, “Michael jackson tops the charts on twitter,”2009. [Online]. Available: http://bits.blogs.nytimes.com/2009/06/25/michael-jackson-tops-the-charts-on-twitter/
[28] C. Beaumont, “Mumbai attacks: Twitter and flickr used to break news,” 2008.[Online]. Available: http://www.telegraph.co.uk/news/worldnews/asia/india/3530640/Mumbai-attacks-Twitter-and-Flickr-used-to-break-news-Bombay-India.html
[29] J. O’Dell, “One twitter user reports live from osama bin laden raid,” 2011. [Online].Available: http://mashable.com/2011/05/02/live-tweet-bin-laden-raid/
[30] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake shakes twitter users:real-time event detection by social sensors,” in Proceedings of the 19th internationalconference on World wide web, ser. WWW ’10. New York, NY, USA: ACM, 2010,pp. 851–860. [Online]. Available: http://doi.acm.org/10.1145/1772690.1772777
[31] D. Boyd, S. Golder, and G. Lotan, “Tweet, tweet, retweet: Conversational aspects ofretweeting on twitter,” in System Sciences (HICSS), 2010 43rd Hawaii InternationalConference on, jan. 2010, pp. 1 –10.
[32] A. Java, X. Song, T. Finin, and B. Tseng, “Why we twitter: understandingmicroblogging usage and communities,” in Proceedings of the 9th WebKDD and1st SNA-KDD 2007 workshop on Web mining and social network analysis, ser.WebKDD/SNA-KDD ’07. New York, NY, USA: ACM, 2007, pp. 56–65. [Online].Available: http://doi.acm.org/10.1145/1348549.1348556
[33] P. Andre, M. S. Bernstein, and K. Luther, “Who gives a tweet? Evaluating microblog content value,” in Proceedings of CSCW 2012, Feb. 2012. [Online]. Available: http://www.cs.cmu.edu/~pandre/pubs/whogivesatweet-cscw2012.pdf
[34] D. Boyd, “Twitter: ”pointless babble” or peripheral awareness?” 2009. [Online].Available: http://www.zephoria.org/thoughts/archives/2009/08/16/twitter pointle.html
[35] A. Watters, “How recent changes to twitter’s terms of service might hurt academicresearch,” 2011.
[36] R. Higashinaka, N. Kawamae, K. Sadamitsu, Y. Minami, T. Meguro, K. Dohsaka, andH. Inagaki, “Building a conversational model from two-tweets,” in Automatic SpeechRecognition and Understanding (ASRU), 2011 IEEE Workshop on, dec. 2011, pp.330 –335.
[37] S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts, “Who says what to whom ontwitter,” in Proceedings of the 20th international conference on World wide web, ser.WWW ’11. New York, NY, USA: ACM, 2011, pp. 705–714. [Online]. Available:http://doi.acm.org/10.1145/1963405.1963504
[38] C. Arthur, “What is the 1% rule?” 2006. [Online]. Available: http://www.guardian.co.uk/technology/2006/jul/20/guardianweeklytechnologysection2
[39] M. P. Bill Heil, “New twitter research: Men follow men and nobody tweets,” 2009.[Online]. Available: http://blogs.hbr.org/cs/2009/06/new twitter research men follo.html
[40] M. Newman, “Power laws, pareto distributions and zipf’s law,” ContemporaryPhysics, vol. 46, no. 5, pp. 323–351, 2005. [Online]. Available:http://www.tandfonline.com/doi/abs/10.1080/00107510500052444
[41] A. Maslow, “A theory of human motivation,” Psychological Review, vol. 50,pp. 370–396, 1943. [Online]. Available: http://psychclassics.yorku.ca/Maslow/motivation.htm
[42] S. Sen, J. Vig, and J. Riedl, “Tagommenders: connecting users to items throughtags,” in Proceedings of the 18th international conference on World wide web, ser.WWW ’09. New York, NY, USA: ACM, 2009, pp. 671–680. [Online]. Available:http://doi.acm.org/10.1145/1526709.1526800
[43] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation.” Journalof Machine Learning Research, vol. 3, pp. 993–1022, 2003. [Online]. Available:http://dblp.uni-trier.de/db/journals/jmlr/jmlr3.html#BleiNJ03
[44] T. Hofmann, “Probabilistic latent semantic analysis.” in UAI, K. B. Laskey andH. Prade, Eds. Morgan Kaufmann, 1999, pp. 289–296. [Online]. Available:http://dblp.uni-trier.de/db/conf/uai/uai1999.html#Hofmann99
[45] K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, “Feature-rich part-of-speechtagging with a cyclic dependency network,” in NAACL ’03: Proceedings of the 2003Conference of the North American Chapter of the Association for ComputationalLinguistics on Human Language Technology. Morristown, NJ, USA: Associationfor Computational Linguistics, 2003, pp. 173–180. [Online]. Available:http://portal.acm.org/citation.cfm?id=1073445.1073478
[46] Defense Mapping Agency, “The universal grids: Universal Transverse Mercator(UTM) and Universal Polar Stereographic (UPS),” Defense Mapping Agency,Hydrographic/Topographic Center, Fairfax, VA, USA, Tech. Rep. TM8358.2, 1989.[Online]. Available: http://earth-info.nga.mil/GandG/publications/
[47] J. Howe, Crowdsourcing: Why the Power of the Crowd Is Driving the Future ofBusiness, 1st ed. Crown Business, August 2008. [Online]. Available:http://www.worldcat.org/isbn/0307396207
[48] “Wikipedia: Amazon mechanical turk,” 2012. [Online]. Available: http://en.wikipedia.org/wiki/Amazon Mechanical Turk
[49] “mturk forum: Turker nation,” 2012. [Online]. Available: www.turkernation.com
[50] “Turkopticon,” 2012. [Online]. Available: http://turkopticon.differenceengines.com/
[51] L. von Ahn and L. Dabbish, “Labeling images with a computer game,” inProceedings of the SIGCHI conference on Human factors in computing systems,ser. CHI ’04. New York, NY, USA: ACM, 2004, pp. 319–326. [Online]. Available:http://doi.acm.org/10.1145/985692.985733
[52] M. Denton, “Crowdsourcing the production of public art,” Master’s thesis, Massey,2010. [Online]. Available: http://mro.massey.ac.nz/handle/10179/1345
[53] W. Mason and S. Suri, “Conducting behavioral research on amazon’s mechanicalturk,” Behavior Research Methods, vol. 44, pp. 1–23, 2012. [Online]. Available:http://dx.doi.org/10.3758/s13428-011-0124-6
[54] S. Suri and D. J. Watts, “Cooperation and contagion in web-based, networked publicgoods experiments,” SIGecom Exch., vol. 10, no. 2, pp. 3–8, Jun. 2011. [Online].Available: http://doi.acm.org/10.1145/1998549.1998550
[55] M. S. Bernstein, G. Little, R. C. Miller, B. Hartmann, M. S. Ackerman, D. R. Karger,D. Crowell, and K. Panovich, “Soylent: a word processor with a crowd inside,”in Proceedings of the 23nd annual ACM symposium on User interface software andtechnology, ser. UIST ’10. New York, NY, USA: ACM, 2010, pp. 313–322. [Online].Available: http://doi.acm.org/10.1145/1866029.1866078
[56] G. Little, “Turkit: Tools for iterative tasks on mechanical turk,” in Visual Languagesand Human-Centric Computing, 2009. VL/HCC 2009. IEEE Symposium on, sept.2009, pp. 252 –253.
[57] M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin, “Crowddb:answering queries with crowdsourcing,” in Proceedings of the 2011 internationalconference on Management of data. New York, NY, USA: ACM, 2011, pp. 61–72.[Online]. Available: http://doi.acm.org/10.1145/1989323.1989331
[58] J. P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R. C. Miller,A. Tatarowicz, B. White, S. White, and T. Yeh, “Vizwiz: nearlyreal-time answers to visual questions.” in W4A, C. Asakawa, H. Takagi,L. Ferres, and C. C. Shelly, Eds. ACM, 2010, p. 24. [Online]. Available:http://dblp.uni-trier.de/db/conf/w4a/w4a2010.html#BighamJJLMMTWWY10
[59] “Wordnet,” 2012. [Online]. Available: http://wordnet.princeton.edu/
BIOGRAPHICAL SKETCH
Prithvi Raj was born in Chennai, India. He attended Crescent Engineering College,
Chennai and graduated with a bachelor’s degree in computer science and engineering
from Anna University, Chennai in 2010.
He joined the Department of Computer and Information Science and Engineering
at the University of Florida in Fall 2010. His interests include crowd computing, human
computer interaction, and information visualization.