
The evolution of research on social media
Farida Vis, University of Leicester
European Conference on Social Media, 10 July, Brighton, United Kingdom

ACADEMIA
INDUSTRY
GOVERNMENT

SOCIAL MEDIA = BIG DATA

REAL-TIME ANALYTICS

SOCIAL MEDIA = TWITTER

WHERE ARE THE RQs?

WHERE’S THE THEORY?

ETHICS
METHODS
SAMPLING
DATA SHARING

WHERE’S THE FUNDING?

WHAT’S THE FUTURE?

Aftermath of Hurricane Katrina
2005: Flickr

2008: YouTube
Fitna: The Video Battle

2011: Twitter
Reading the Riots on Twitter

data
unstructured data

235 posts – 106 individuals (Flickr)
Aftermath of Hurricane Katrina
2005
Manual collection possible

1413 videos – 700 individuals (YouTube)
Fitna: The Video Battle
2008
+ Computer Science

2.6 million tweets – 700K individuals (Twitter)
Reading the Riots on Twitter
2011
+ Lots of Computer Science

READING THE RIOTS
ON TWITTER
Rob Procter (University of Manchester), Farida Vis (University of Leicester), Alexander Voss (University of St Andrews)
[Funded by JISC] #readingtheriots

BORDER RUNNER

BIG DATA

'Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making' (Gartner, in Sicular, 2013).
A huge industry has now been built around 'social data' and 'listening platforms' feeding on this data (many tools are not suitable for academic use; black box).

• Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large data sets.
• Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims.
• Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy. (boyd and Crawford, 2012, p. 663)

Critiques of Big Data
• Important to make visible inherent claims about objectivity
• Problematic focus on quantitative methods
• How can data answer questions it was not designed to answer?
• How can the right questions be asked?
• Inherent biases in large, linked, error-prone datasets
• Focus on text and numbers that can be mined algorithmically
• Data fundamentalism

Data fundamentalism
The notion that correlation always indicates causation, and that massive data sets and predictive analytics always reflect 'objective truth'. The belief in the existence of an objective 'truth', that something can be fully understood from a single perspective, again brings to light tensions about how the social world can be made known.

CRITICAL BIG DATA STUDIES?

How do we ground online data?
In the offline: assessing findings against what we know about an offline population (census data) in order to better understand online data. Problems with over/under-representation in online data?
In the online: premised on the idea that data derived from social media should be grounded in other online data in order to understand it. So comparing Facebook use to what we know about Facebook use, rather than connecting it to offline measurements about citizens. (Richard Rogers)

Important considerations
1. Asking the right question – research should be question driven rather than data driven.
2. Accept poor data quality & users gaming metrics – once online metrics have value users will try to game them.
3. Limitations of tools (often built in a disconnected way)
4. Transparency – researchers should be upfront about limitations of research and research design. Can the data answer the questions?

A critical reflection on Big Data: considering APIs, researchers and tools as data makers

Rather than assuming data already exists ‘out there’, waiting to simply be recovered and turned into findings, the article examines how data is co-produced through dynamic research intersections. A particular focus is the intersections between the Application Programming Interface (API), the researcher collecting the data as well as the tools used to process it. In light of this, the article offers three new ways to define and think about Big Data and proposes a series of practical suggestions for making data. (First Monday, October 2013, http://firstmonday.org/)

Twitter data ecosystem

Standard API sampling problems
Sampling from the FIREHOSE
1% random sample of the firehose
If not rate limited – all data collected?
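To make the sampling point concrete, here is a minimal sketch of collecting the 1% random sample via the streaming API with the tweepy library; the credential strings and output file name are placeholders rather than anything from the talk.

import tweepy

# Minimal sketch: collect the ~1% random sample from the Twitter streaming API.
# Credentials and the output file name below are placeholders.
class SampleListener(tweepy.StreamListener):
    def __init__(self, outfile):
        super().__init__()
        self.outfile = outfile

    def on_data(self, raw):
        # Keep the raw JSON so all metadata is preserved for later analysis.
        self.outfile.write(raw.strip() + "\n")
        return True

    def on_error(self, status_code):
        # HTTP 420 means we are being rate limited; returning False disconnects.
        return status_code != 420

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")

with open("sample_tweets.jsonl", "w") as outfile:
    stream = tweepy.Stream(auth, SampleListener(outfile))
    stream.sample()  # the 1% random sample discussed above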

New API sampling problems
New business models: enriched metadata
Social media vs social data
DataSift, Gnip and Topsy

Social media VS social data
• Social Media: User-generated content where one user communicates and expresses themselves and that content is delivered to other users. Examples of this are platforms such as Twitter, Facebook, YouTube, Tumblr and Disqus. Social media is delivered in a great user experience, and is focused on sharing and content discovery. Social media also offers both public and private experiences with the ability to share messages privately.
• Social Data: Expresses social media in a computer-readable format (e.g. JSON) and shares metadata about the content to help provide not only content, but context. Metadata often includes information about location, engagement and links shared. Unlike social media, social data is focused strictly on publicly shared experiences. (Cairns, 2013)

Social media
Social data


Enriched metadata
Location and influence
Where are users?
Are they influential?

Twitter expanding/enriching metadata
Hawksey (2013)

New Profile Geo Enrichment

Geo-locating tweets
Exact location: lat/long coordinates – the gold standard geo data. Problem: only 1% of users -> only 2% of firehose tweets; early adopters, highly skewed.
'Where in the world are you?': no lat/long coordinates – a free-text field, enter anything. Advantage: more than half of all tweets contain a profile location; much more evenly distributed.
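A small sketch illustrates the difference. The field names (coordinates, user.location) are those of the standard Twitter API tweet payload; the example tweet itself is invented for illustration.

# 'Activity location': exact lat/long attached to the tweet (rare, ~1% of users).
# 'Profile location': free-text field on the user object (common, but messy).
example_tweet = {
    "text": "Clearing up after the storm",
    "coordinates": {"type": "Point", "coordinates": [-90.07, 29.95]},  # GeoJSON order: [long, lat]
    "user": {"screen_name": "example_user", "location": "New Orleans, more or less"},
}

def activity_location(tweet):
    """Return (lat, long) when the tweet carries exact coordinates, else None."""
    coords = tweet.get("coordinates")
    if coords and coords.get("coordinates"):
        lon, lat = coords["coordinates"]
        return lat, lon
    return None

def profile_location(tweet):
    """Return the free-text profile location, which can contain anything."""
    return (tweet.get("user") or {}).get("location") or None

print(activity_location(example_tweet))  # (29.95, -90.07)
print(profile_location(example_tweet))   # 'New Orleans, more or less'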

Profile Geo Enrichment
'Our customers can now hear from the whole world of Twitter users and not just 1%' (Cairns, 2013, on the Gnip company blog)
• Activity Location – the 1% that provide lat/long
• Profile Location – place provided in their profile; may or may not be posting from there
• Mentioned Location – places a user talks about
‘Both the tweet text and Profile fields contain geographic information, but not in substantial quantities and have poor accuracy’ (Leetaru et al, First Monday, May 2013)

Problem with deleted tweets
'A deleted tweet effectively disappears from the results of searching Twitter, although a short delay sometimes occurs between deletion and disappearance. A status deletion notice is distributed via the Twitter streaming API to relevant users' clients so that they, in turn, remove deleted tweets from their records.'
'Twitter does not provide a bulk-deletion of user's tweets. It provides, however, a one-click bulk-deletion of all location data that were attached to user's tweets, without deleting the tweets. By clicking on the "Delete all location information" button on user's account settings page, all locations attached to all previous tweets are deleted.' (Almuhimedi et al., 2013)
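For researchers holding their own collections, a hedged sketch of what honouring these deletion notices might look like, again using tweepy; the in-memory dict stands in for whatever store a project actually uses.

import tweepy

collected = {}  # tweet id -> tweet JSON; a stand-in for a real datastore

class DeletionAwareListener(tweepy.StreamListener):
    def on_status(self, status):
        # Store incoming tweets keyed by their id.
        collected[status.id_str] = status._json

    def on_delete(self, status_id, user_id):
        # Called when a status deletion notice arrives on the stream:
        # remove the deleted tweet from our records, as Twitter asks clients to do.
        collected.pop(str(status_id), None)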

Profile Geo Enrichment
Linking data
'Profile location data can be used to unlock demographic data and other information that is not otherwise possible with activity location. For instance, US Census Bureau statistics are aggregated at the locality level and can provide basic stats like household income. Profile location is also a strong indicator of activity location when one isn't provided.' (Cairns, 2013)
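A minimal sketch of the kind of linking Cairns describes, assuming profile locations have already been resolved to localities; both tables below are invented for illustration.

import pandas as pd

# Tweets with profile locations already resolved to a locality (illustrative data).
tweets = pd.DataFrame({
    "user": ["a", "b", "c"],
    "profile_locality": ["Austin, TX", "Boulder, CO", "Austin, TX"],
})

# Locality-level statistics of the kind aggregated by the US Census Bureau (invented values).
census = pd.DataFrame({
    "locality": ["Austin, TX", "Boulder, CO"],
    "median_household_income": [55000, 58000],
})

# Join demographic context onto each tweet via the profile locality.
linked = tweets.merge(census, left_on="profile_locality", right_on="locality", how="left")
print(linked[["user", "profile_locality", "median_household_income"]])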

Social influence: Klout scores

Klout used to make profiles without consent

This is a DM. What is it doing here?

Klout rewards users for giving data
More data = more influential?
Online/offline?

Scores are easily gamed

Fake followers: Mitt Romney’s 100,000 extra followers in one day
As many as 20 million fake follower accounts (200 million active users)
This doesn’t take into account the issue of spoof accounts (clearly in evidence in riot tweets) (Perlroth, 2013)

Klout scores as an industry standard measure…

Ability to describe the limitations of our data:
- APIs as data makers. Once data is linked, it is very hard to untangle how metadata is constructed and where problems might be, including in terms of deleted content.
- Researchers and tools as data makers.
- When creating a dataset, it is important to describe how it was made and what its limitations are. What are the sampling limitations (both in terms of the API and in relation to the offline 'population')? What other limitations regarding enriched metadata need to be described?
- When creating a dataset, how complete is it?
- Limitations need to be known in order to describe them. This is a real problem.

Tools as data makers
In answering complex questions about social media data, we need to:
1. Know the questions! And know how they might be answered.
2. Recognise the problem with tools: they are not question driven. They are often developed around available (poor quality) data, often not by social media experts but by those with data processing expertise.
3. Understand that tools therefore become data makers in that they limit the scope of possibility in the questions researchers imagine. This is a huge problem!

Need better understanding of the complex, ever-changing dynamics between
APIs
Researchers
Tools

Organic data / data in the wild
SOCIAL MEDIA SIMPLY AS (BIG) DATA
VS
SOCIAL MEDIA AS A RESEARCH AREA

DOMAIN EXPERTISE

TO UNDERSTAND TWITTER DATA YOU
NEED TO UNDERSTAND TWITTER

+ BE ON TWITTER

WHAT’S THE FUTURE?

WHAT GETS LEFT OUT?


750 MILLION IMAGES
SHARED DAILY

Images possess the ability to grab our attention
Social media companies know this
Images are key to engagement



Camera: used to be for special occasions
Smartphone: always with us

Everyday snaps
Witnessing events

US: 65% smartphone penetration
Smartphones have overtaken desktops as the way users access the internet
Mobile internet accounts for the majority of internet use in the US (57%)
Users typically access the internet via apps on mobile devices
All figures from comScore, US Digital Future in Focus, 2014

UK: The over-55s will experience the fastest year-on-year rises in smartphone penetration.
Smartphone ownership should increase to about 50% by year-end, a 25% increase from 2013, but trailing 70% penetration among 18-54s.
The difference in smartphone penetration by age will disappear, but differences in how smartphones are used will remain substantial. Many over-55s use smartphones like feature phones.
All figures from Deloitte, predictions for 2014

Rise of platforms and apps focused on visual content
Pinterest, Tumblr, Instagram, Vine, Snapchat
'Mobile first… and only' | simple, easy, user-friendly design

Facebook daily image uploads: 350 million (November 2013)
Instagram daily image uploads: 60 million (March 2014)
Twitter: 500 million tweets daily (March 2014)
Snapchat daily snaps: 400 million (November 2013)


Images largely ignored in social media research
Not easy to ‘mine’
Hard to figure out meaning
Huge interest in industry

WHAT DOES THE FUTURE OF SOCIAL MEDIA
RESEARCH LOOK LIKE?

QUESTION DRIVEN (TOOL AWARE) + CRITICAL
BETTER METHODS, MORE THEORY
TRANSPARENT, SUSTAINABLE
ETHICAL, CROSS PLATFORM
INTERDISCIPLINARY, MORE CROSS SECTOR?
MORE FUNDING!

References
• Hazim Almuhimedi, Shomir Wilson, Bin Liu, Norman Sadeh and Alessandro Acquisti, 2013. 'Tweets Are Forever: A Large-Scale Quantitative Analysis of Deleted Tweets', CSCW '13, February 23–27, 2013, San Antonio, Texas, USA, http://www.cs.cmu.edu/~shomir/cscw2013_tweets_are_forever.pdf, accessed 18 September 2013.
• Ian Cairns, 2013. 'Get More Geodata From Gnip With Our New Profile Geo Enrichment', Gnip company blog, 22 August, http://blog.gnip.com/tag/geolocation/, accessed 13 September 2013.
• Grcommunication, 2012. 'I will help raise your Klout score by sending you 10Ks and will tweet it out to my 50K+ followers from my 80+ Klout score for $5', http://fiverr.com/grcommunication/help-raise-your-klout-score-by-sending-you-10ks-and-will-tweet-it-out-to-my-17k-followers-from-my-70-klout-score, accessed 19 September 2013.
• Anthony Ha, 2013. 'Gnip Expands Its Partnership With Klout, Becoming The Exclusive Provider Of Klout Topics', TechCrunch, 8 August, http://techcrunch.com/2013/08/08/gnip-klout/, accessed 19 September 2013.
• Martin Hawksey, 2013. 'Twitter throws a bone: Increased hits and metadata in Twitter Search API 1.1', 28 March, http://mashe.hawksey.info/2013/03/twitterthrows-a-bone-increased-hits-and-metadata-in-twitter-search-api-1-1/, accessed 10 September 2013.
• Kalev H. Leetaru, Shaowen Wang, Guofeng Cao, Anand Padmanabhan and Eric Shook, 2013. 'Mapping the global Twitter heartbeat: The geography of Twitter', First Monday, volume 18, number 5–6, May, http://firstmonday.org/article/view/4366/3654
• Nicole Perlroth, 2013. 'Fake Twitter Followers Become Multimillion-Dollar Business', New York Times, Bits blog, 5 April, http://bits.blogs.nytimes.com/2013/04/05/fake-twitter-followers-becomes-multimillion-dollar-business/, accessed 19 September 2013.
• Farida Vis, 2013. 'A critical reflection on Big Data: considering APIs, researchers and tools as data makers', First Monday, 7 October, http://firstmonday.org