Big data luiss Facebook and epistemology

Post on 25-Jul-2015

Big data in social sciences and humanities: from epistemology to data power

Teresa Numerico, Dept. of Philosophy, Communication and Performing Arts, University of Rome Three

Luiss - Media Politics and Democracy. A Challenging Topic for Social Sciences

21-22 May 2015

Questionable Big data examples: Ethical, juridical, political and social doubts

Facebook experiments, Google Flu Trends, culturomics

Facebook experiment on textual emotional contagion

• In June 2014 the journal PNAS published the description of a Facebook experiment that measured positive and negative emotional contagion by altering the news feeds of 689,003 English-language users

• The paper was written by Adam Kramer (core data science team Facebook) and two scholars in social sciences who worked at the Dept. of Communication and information science, Cornell University

See Schroeder 2014 for a complete analysis of the Facebook experiment

Informed consent

• There is a discussion about informed consent of the people who were involved in the experiment

• Users tested in the experiment received no prior information and no opportunity to opt out

• Because Facebook is a company and not a research institution, there was no need to ask for any consent beyond that obtained in the service agreement

• The defence of Facebook with respect to this point is based on the fact that the company always manipulates user experience (Yarkoni 2014, boyd 2014)

IRB approval

• Because the research was conducted independently by Facebook and Professor Hancock had access only to results – and not to any individual, identifiable data at any time – Cornell University’s Institutional Review Board concluded that he was not directly engaged in human research and that no review by the Cornell Human Research Protection Program was required

Press release, Cornell University, 30 June 2014


Data collection and interpretations

• The collection of the data and their interpretation raise not only ethical and legal doubts but also epistemological controversies.

• Positive and negative emotional words were counted using the Linguistic Inquiry and Word Count software (LIWC 2007), which relies on a generic, univocal, context-free definition of words judged as positive or negative. The system interprets posts by counting the positive or negative expressions they contain

Kramer et al. 2014, passim
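
The context-free counting described above can be sketched in a few lines. This is only an illustration under stated assumptions, not LIWC itself: the word lists `POSITIVE` and `NEGATIVE` and the function `count_emotion_words` are hypothetical stand-ins, and the real LIWC 2007 dictionaries are far larger.

```python
import re

# Hypothetical miniature word lists, assumed for illustration only.
# The real LIWC 2007 dictionaries are much larger.
POSITIVE = {"happy", "good", "great", "love"}
NEGATIVE = {"sad", "bad", "awful", "hate"}


def count_emotion_words(post):
    """Count positive and negative words in a post, ignoring all context."""
    words = re.findall(r"[a-z']+", post.lower())
    positive = sum(1 for w in words if w in POSITIVE)
    negative = sum(1 for w in words if w in NEGATIVE)
    return positive, negative


# Negation is invisible to a pure word count:
# both posts are scored as containing one positive word.
print(count_emotion_words("I am happy today"))
print(count_emotion_words("I am not happy today"))
```

Both posts receive the same score, which is the kind of epistemological weakness a univocal, context-free word definition can introduce.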

Technological determinism or exploitation of a dominant position?

• Prediction and manipulation are based on the hypothesis that human behaviour is stable and mechanically alterable

• No replication of the experiment according to the standard scientific methodology is possible

• The scientists involved in the interpretation process, Jamie Guillory and Jeffrey Hancock, had no control over data acquisition

• However, their reputations as social scientists were used by the Facebook team to validate its data science research results

Social sciences: representing while intervening

• According to Evelyn Fox Keller (1991), a feminist philosopher of science, and to Ian Hacking (1983, 1992), it is not possible to represent something without intervening and transforming it

• The Facebook experiment is a clear example of a representation that needs intervention: understanding the emotional reactions of human beings, who were the objects of representation, implied manipulating them

• Scientists are like sorcerer's apprentices: they describe emotional reactions while inducing them during the experiment

Google Flu Trends (GFT) failure

• GFT did not predict flu trends correctly: its estimates were almost double the figures of the Centers for Disease Control and Prevention (CDC)

• Instability of the data

• Continuous changes in the search algorithms that influenced the GFT data

• Indicators adopted were not clear

• Impossible to repeat experiments to check results

• Measurement systems impossible to analyse

• The risk of ‘red team’ attacks on the monitored systems, which attempt to manipulate results for economic or political gain

Lazer et al. 2014

Facebook filter bubble study

• Bakshy et al.: Exposure to ideologically diverse news and opinion on Facebook, Science, 7 May 2015

• David Lumb: Why Scientists Are Upset About the Facebook Filter Bubble Study

• Christian Sandvig: The Facebook “It’s Not Our Fault” Study

• Eli Pariser: Did Facebook’s Big New Study Kill My Filter Bubble Thesis?

• Zeynep Tufekci: How Facebook’s Algorithm Suppresses Content Diversity (Modestly) and How the Newsfeed Rules Your Clicks

• John Wihbey, 7 May 2015: Does Facebook drive political polarization? Data science and research

Facebook data science and politics

• Winter Mason, 28/10/2014: Politics and Culture on Facebook in the 2014 Midterm Elections

Epistemology and politics: research and power

Changes in thinking about knowledge creation and their consequences

Researching or spying

• How to be a knowledge scientist after the Snowden revelations? (Berendt, Büchler, Rockwell 2015; see also van Dijck 2014)

• The digital humanist is losing innocence, experiencing his/her own ‘Manhattan Project’ syndrome: there is no neutral technology

• Technologies are already oriented once they are used in the research/battle field

• An ethics of knowledge science is needed, but it is very difficult to achieve if we disclaim responsibility for our creations as soon as we invent them

• Data have power not only because they are never raw and are often proprietary, but also because they are used for political ends: every generic ‘neutral’ manipulation transforms the observed object with no way back

Knowing is transforming, AKA Fox Keller's vision

• There is no such thing as pure science on one side and bad applications on the other

• Knowledge is action, not only with respect to power in society but also with respect to the object of research

• After the knowledge process the object will never be the same

• Language’s role in science is never considered enough

• The evocative character of language and its vague, ambiguous status introduce uncontrolled leaps of meaning, metaphors, and pre-scientific arguments

Fox Keller 2011

Rhetoric of BD/1: Computers are better problem solvers than humans

• It’s human nature to focus on the problems […] where human skill and ingenuity are most valuable. And it’s normal human prejudice to undervalue the problems [of] the domain where data-driven intelligence really shines. But […] what problems can computers solve that we can’t? And how, when we put that ability together with human intelligence, can we combine the two to do more than either is capable of alone?

Nielsen, 2011, p. 255

Rhetoric of BD/2: data-driven science

• Science is no longer guided by interpretation, models and theory

• Science is “data-driven”, which in BD jargon means that there is no interpretation and no theory prior to the data, because the data supposedly make sense by themselves

• But this is just rhetoric: to find correlations among data series you have to look for them by choosing suitable machine learning algorithms, otherwise you risk finding correlations that are merely random, particularly with high dimensionality
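
The risk of random correlations in high dimensionality can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: all series are pure Gaussian noise, and the helper names `pearson` and `strongest_chance_correlation`, the seed, and the sample sizes are hypothetical choices made for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_series, n_points, seed=0):
    """Largest |correlation| between a random target and n_series random series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_series):
        series = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, series)))
    return best


# All series are pure noise, so any correlation found is spurious.
few = strongest_chance_correlation(n_series=5, n_points=20)
many = strongest_chance_correlation(n_series=5000, n_points=20)
print(f"5 candidate series:    max |r| = {few:.2f}")
print(f"5000 candidate series: max |r| = {many:.2f}")
```

The more candidate series are searched, the stronger the best correlation found by chance, even though none of the series carries any information about the target.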

No BD without solid replicable methodologies

• Machine-learning methods are a valuable part of our toolkit in understanding behavior, but we do not yet understand the precise limits of their applicability

• The biggest contributions before us are not new algorithms or new social theories but new methodologies for decomposing hard questions in the social sciences into a series of robust analyses that are replicable and composable

Raghavan 2014

BD can be useful provided we understand the epistemological


• According to Kitchin 2014a we need to develop a “situated, reflexive and contextually nuanced epistemology” in order to effectively use the methods in social sciences and humanities

• But understanding the problematic epistemological implications means reducing the rhetoric and comprehending the savoir/pouvoir (knowledge/power) relationships that are implied in data-driven results

Let’s ask some final questions on BD experiments and results

• Who owns the data?

• Who owns the machines on which the data are processed?

• Who plans the algorithms to make sense of the data (is the data scientist working with or without the field expert)?

• What do we consider as definite results of the data-driven procedures?

• Who is going to take advantage of the results?

• Is it possible to replicate the process, on different machines with different algorithms to be sure of the stability of the results?

