Big Data Workshopbettina.berendt/... · • Group 1: You are a fitbit-wristband / smart-home...

Page 1

Big Data Workshop

Bettina Berendt, Department of Computer Science, KU Leuven, Belgium, http://people.cs.kuleuven.be/~bettina.berendt/

St. John's International School, April 23rd, 2018, Waterloo, Belgium

Page 2

Who am I?

Page 3

Goals and non-goals

• Goals

▫ Talk about Big Data as a critical data scientist

▫ Against the background of what science is and what “critical” means in this context

▫ Involve you in being critical and constructive

• Non-goals (selection)

▫ Go into depth about privacy and data protection, although these topics are unavoidable in the Big Data context

Page 4

Big Data is ...

(from Alexandra Roche and Josefine Droste’s presentation)

Page 5

Big Data is …

• “the growth in the volume of structured and unstructured data, the speed at which it is created and collected, and the scope of how many data points are collected”

• Potential for personalizing learning

• Inherits bias

• Surveillance

• Ethical dilemmas

• Transparency (pro and con), privacy

(Alexandra Roche & Josefine Droste)

Page 6

Science and being critical are ...

Page 7

What is science? (1)

• A systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe.

• […] the word “science” became increasingly associated with what is today known as the scientific method, a structured way to study the natural world.

• Contemporary science is typically subdivided into the natural sciences which study nature in the broadest sense, the social sciences which study people and societies, and the formal sciences like mathematics which study abstract concepts. […] Disciplines which use science like engineering and medicine may also be considered to be applied sciences.

• Science is related to research, and is normally organized by a university, a college, or a research institute.

(Wikipedia: “Science”)

Page 8

(first part of a picture from Huber, Das psychologische Experiment)

Page 9

What is science? (2)

“Wissenschaft ist, wenn man genauer nachfragt.”

→ Science happens when you ask again, and ask more precisely.

(author unknown to me)

Page 10

(first part of a picture from Huber, Das psychologische Experiment)

Page 11

Big Data is ...

… something we usually encounter via statements

Page 12

Typical Big Data statements (fictitious, but true to style)

① The average Belgian pupil now spends 3 hours a day chatting.

② Pupils who spend more than 3 hours a day chatting “like” Converse sneakers and Dunkin Donuts.

③ People who “like” Converse and Dunkin Donuts are less intelligent.

Page 13

Typical Big Data statements (4): From Psychometrics Centre 2013 to Cambridge Analytica 2016

Page 15

Typical Big Data statements (5) (from the CEM Brochure)

• Maximise learning potential

• The CEM IBE computer-adaptive assessment provides an excellent research-informed baseline to help you predict future performance (in IB Diploma examinations for each subject)

• The CEM IBE computer-adaptive assessment measures students on three key cognitive areas which research shows are linked to later academic outcomes: maths, vocabulary, non-verbal

• Once you have students’ final IB Diploma results, you can return this data to us

• The full CEM IBE product includes additional … questionnaires aiming to understand your students’ motivations, interests and aspirations. (questions about views on cultural background, way of life, social status, …)

Page 16

So how …

• … can we understand such statements scientifically?

• … can we criticise them scientifically?

Page 17

Big Data is ...

… data

Page 18

“Data speak for themselves.”

• “With enough data, the numbers speak for themselves.” Anderson, C. (2008).

• “Quantitative data [...] are independent of interpretation; [...] they often demand an interpretation that transcends the quantitative realm.” Moretti, F. (2007), p. 30

Page 19

Data?

• datum = given

• “data refer to those elements that are taken [abstracted from phenomena]: extracted through observations, computations, experiments, and record keeping”, “selected from nature by the scientist in accordance with his [sic] purpose” (Kitchin, 2014)

Capta! (capta = taken)

Page 20

Impact of measurement methods

Page 21

Who or what “speaks”?

Who or what “decides”?

Page 22

Summary: Data cannot speak for themselves

• Data are never simply given (by nature), but taken (by a researcher or other data collector)

▫ With conscious or unconscious purposes/agendas

▫ In some context

• Data and analyses of them require interpretation

• Big Data are samples too

• All data have quality issues; in Big Data, we often do not know these

• Combining datasets can introduce biases and errors
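To make “Big Data are samples too” concrete, here is a minimal Python sketch with invented numbers (the chat hours and the 1-hour cut-off are assumptions for illustration, not data from this workshop). A platform can only record the pupils who use it, so an average computed from its logs describes its users rather than all pupils.

```python
from statistics import mean

# Invented daily chat hours for ten pupils (illustration only).
all_pupils = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0, 2.0, 3.0, 4.0, 6.0]

# Hypothetical assumption: the platform only "sees" pupils who chat at least 1 hour a day.
recorded = [h for h in all_pupils if h >= 1.0]

print(f"mean over all pupils:      {mean(all_pupils):.2f} h")
print(f"mean over recorded pupils: {mean(recorded):.2f} h")
```

However large the platform's log is, it cannot tell you about the pupils it never recorded.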

Page 23

Parking lot science

Page 24

Some more examples of data biases and parking lot science

• Facebook likes, real-world likes

• Facebook self-presentation: only the good things ...

• Restrictions on search in Twitter

▫ Research focus on current and recent events?!

• “Trending topics” algorithm in Twitter based on burstiness

▫ Suppression of persistent topics?!
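The burstiness point can be illustrated with a toy score. The sketch below uses a hypothetical measure (latest volume divided by the average earlier volume); it is not Twitter's actual algorithm, which the slide does not describe. Under this assumption, a topic that is discussed a lot but steadily scores lower than a small topic with a sudden spike.

```python
from statistics import mean

def burstiness(counts: list[int]) -> float:
    """Hypothetical score: latest count divided by the mean of all earlier counts."""
    *history, latest = counts
    return latest / mean(history)

# Invented daily mention counts for two topics.
topics = {
    "persistent topic": [100, 100, 100, 100, 100],  # large, steady volume
    "sudden topic": [5, 5, 5, 5, 40],                # small, but spiking
}

scores = {name: burstiness(counts) for name, counts in topics.items()}
trending = max(scores, key=lambda name: scores[name])
print(scores)    # {'persistent topic': 1.0, 'sudden topic': 8.0}
print(trending)  # the spiking topic "trends" despite far fewer mentions
```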

Page 25

Big Data is ...

… statistics

(on steroids)

Page 26

What should you ask a statistic?

Page 27

What should you ask this statement?

The average Belgian pupil now spends 3 hours a day chatting.

Page 28

How to talk back to a statistic (1)

(building on Huff’s final chapter)

1. Who says so?

2. How do they know?

▫ How were data collected and analysed?

▫ In which contexts?

3. Did somebody change the subject?

▫ What are the actual data?

4. Does it make sense?

Page 29

So …?

1. Who says so?

2. How do they know?

▫ How were data collected and analysed?

3. Did somebody change the subject?

▫ What are the actual data?

4. Does it make sense?


The average Belgian pupil now spends 3 hours a day chatting.

Page 30

Huff’s questions in more detail

1. Who says so?

▫ What could be their conscious or unconscious biases?

▫ Do they use unqualified words (“average”: mean, median, …?)

▫ Do they use “OK names”, i.e. prestigious sources that lend authority? (“The survey results from scientists from the University of … show …”)

2. How do they know?

▫ Sample size, selection bias?

▫ Correlation size, significance?

▫ Baseline values?

▫ Did external factors change, e.g. the frequency of reporting?

3. Did somebody change the subject? / What are the actual data?

▫ Observation or self-report?

▫ Change over time or across data sets in how basic measures are defined

▫ Correlation or causation?

4. Does it make sense?

▫ Be wary of “exact-sounding numbers” (40.13 Euros to eat per week, average family with 3.5 children)

▫ Extrapolation
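Huff's warning about the unqualified word “average” is easy to check by hand. The numbers below are invented for illustration: in a skewed distribution a few heavy users pull the mean far above the median, and the mean of whole-number counts need not be a possible value at all (compare the “3.5 children”).

```python
from statistics import mean, median

# Invented daily chat hours for nine pupils: most chat a little, two chat a lot.
hours = [0.5, 0.5, 1, 1, 1, 1.5, 2, 10, 12]
print(f"mean:   {mean(hours):.2f} h")    # pulled up by the two heavy users
print(f"median: {median(hours):.2f} h")  # the 'typical' pupil

# Invented numbers of children in four families.
children = [2, 3, 4, 5]
print(mean(children))  # 3.5, which is not a possible family size
```

Here the mean is above 3 hours a day while the typical (median) pupil chats for 1 hour, so both “averages” are correct but tell different stories.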

Page 31

Empiricism and apophenia

Page 32

Empiricism and apophenia: correlation, causation, and instrumentality

Page 33

Correlation vs. causation

• The current scientific consensus is that the only way to properly demonstrate causation is to do an experiment.

• Many Big Data sets, especially those concerning people, are not experimental data, because they have been collected as observations in the field, in all the diverse contexts in which people operate.

• This means they can only show correlation.

Page 34

How to talk back to a statistic (2)

1. Who says so?

2. How do they know?

▫ How were data collected and analysed?

3. Did somebody change the subject?

▫ What are the actual data?

▫ Correlation or causation?

4. Does it make sense?

Page 35

“Correlation replaces causation”?!

(1) Good enough for business logic

Page 36

Correlation replaces causation?!

(2) But deficient for explanation (can we really explain German history like this?)

Page 37

Correlation replaces causation?!

(3) What about predictions that affect someone’s self-image?

Page 38

Questions you should ask any inferential statistic (e.g., prediction models)

• How good is the model?

• There are many relevant measures of “goodness”.

• In the following, only a small selection.

Page 39

What is the measure, and is it statistically significant?

[figure caption, from paper]

• Prediction accuracy of regression for numeric attributes and traits expressed by the Pearson correlation coefficient between predicted and actual attribute values;

• all correlations are significant at the P < 0.001 level.

• The transparent bars indicate the questionnaire’s baseline accuracy, expressed in terms of test–retest reliability.
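“Significant at P < 0.001” says little about how strong a correlation is: with enough data, even a tiny correlation becomes significant. A rough sketch using the usual t-statistic for a Pearson correlation, t = r·√((n−2)/(1−r²)), with a standard-normal approximation for the two-sided p-value (reasonable for large n). The correlation and sample sizes are hypothetical, not those of the cited study.

```python
import math

def approx_p_value(r: float, n: int) -> float:
    """Two-sided p-value for Pearson r, using t = r*sqrt((n-2)/(1-r^2))
    and a standard-normal approximation to the t distribution (fine for large n)."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))

for n in (50, 10_000):
    print(f"r = 0.05, n = {n:>6}: p ~ {approx_p_value(0.05, n):.2g}")
```

The same weak correlation of 0.05 is far from significant with 50 people and highly significant with 10,000, so significance alone does not answer “how good is the model?”.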

Page 40

But what does the correlation value itself say?

(Wikipedia: “Correlation”)

Page 41

But what does the correlation value itself say?

(Wikipedia: “Correlation”)

Page 42

How is a classification model built?

Page 43

How is a classification model built?
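In general terms, a classification model is built by learning a rule from examples whose class is already known, and then applying that rule to new cases. As a deliberately tiny, hypothetical illustration (not the method behind any system discussed here), the Python sketch below learns a one-feature threshold rule, a so-called decision stump, from invented labelled data.

```python
def train_stump(values: list[float], labels: list[int]) -> float:
    """Learn the threshold t for the rule 'predict 1 if value >= t' that makes
    the fewest errors on the labelled training data (ties: smallest t wins)."""
    best_t, best_errors = values[0], len(values) + 1
    for t in sorted(set(values)):
        errors = sum((v >= t) != (y == 1) for v, y in zip(values, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

def predict(threshold: float, value: float) -> int:
    """Apply the learned rule to a new, unlabelled case."""
    return 1 if value >= threshold else 0

# Invented training data: one numeric feature and a known class (0 or 1) per case.
training_values = [1, 2, 3, 4, 5, 6, 7, 8]
training_labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold = train_stump(training_values, training_labels)
print(threshold)  # 4: one training error remains (the case with value 5)
print(predict(threshold, 2.5), predict(threshold, 6.5))  # 0 1
```

Whatever patterns (and biases) are in the labelled training data end up in the learned rule.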

Page 44

How good is the model? (= How is a classification model evaluated?)

[Figure: confusion matrix]

Page 45

How good?

Overall accuracy = (4 + 900) / 1010 = 89.5%

Precision for “criminals” = 4 / 104 = 3.8%

Recall for “criminals” = 4 / 10 = 40%

Accuracy of model “always innocent” = 1000 / 1010 = 99%
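These figures follow directly from the four cells of the confusion matrix. Reading the cells off the slide's numbers (10 actual “criminals” among 1010 people, 104 people flagged, 4 of them correctly), a short Python sketch recomputes them:

```python
# Confusion-matrix cells implied by the slide's figures.
tp, fp = 4, 100  # flagged as "criminal": 4 correctly, 100 wrongly
fn, tn = 6, 900  # not flagged: 6 criminals missed, 900 innocents correctly cleared
total = tp + fp + fn + tn  # 1010

accuracy = (tp + tn) / total
precision = tp / (tp + fp)           # of those flagged, how many are criminals?
recall = tp / (tp + fn)              # of the criminals, how many were flagged?
always_innocent = (fp + tn) / total  # baseline: never flag anyone

print(f"accuracy {accuracy:.1%}, precision {precision:.1%}, "
      f"recall {recall:.0%}, 'always innocent' {always_innocent:.0%}")
```

The trivial “always innocent” baseline beats the model on accuracy, which is why accuracy alone is a poor answer to “how good is the model?” when one class is rare.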

Page 46

How to talk back to a statistic (3)

1. Who says so?

2. How do they know?

▫ How were data collected and analysed?

▫ How good is the model?

3. Did somebody change the subject?

▫ What are the actual data? Correlation or causation?

4. Does it make sense?

Page 47

Recap (from the CEM Brochure)

• Maximise learning potential

• The CEM IBE computer-adaptive assessment provides an excellent research-informed baseline to help you predict future performance (in IB Diploma examinations for each subject)

• The CEM IBE computer-adaptive assessment measures students on three key cognitive areas which research shows are linked to later academic outcomes: maths, vocabulary, non-verbal

• Once you have students’ final IB Diploma results, you can return this data to us

• The full CEM IBE product includes additional … questionnaires aiming to understand your students’ motivations, interests and aspirations. (questions about views on cultural background, way of life, social status, …)

Page 48

How to talk back to a statistic (4)

1. Who says so?

2. How do they know?

▫ How were data collected and analysed?

▫ How good is the model?

3. Did somebody change the subject?

▫ What are the actual data? Correlation or causation?

4. Does it make sense?

5. What is actually being claimed?

Page 49

Accumulation of errors

[Figure labels: “Statistical model 1”, “… and if they see this ad, they will vote for Trump”]
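One way to see why errors accumulate when one model's output feeds the next: if each step in the chain is right only with some probability, the chance that the whole chain is right shrinks with every step. A back-of-the-envelope Python sketch with hypothetical accuracies, assuming (optimistically) that the steps' errors are independent:

```python
import math

# Hypothetical accuracies of three chained models (invented numbers).
step_accuracies = [0.8, 0.7, 0.6]

# Assuming independent errors, every step must be right for the chain to be right.
chain_accuracy = math.prod(step_accuracies)  # Python 3.8+
print(f"chance the whole chain is right: {chain_accuracy:.0%}")
```

Three individually “decent” models combine into a chain that is right only about a third of the time under these assumptions.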

Page 50

Big Data is ...

… business models

Page 51

Recap (from the CEM Brochure)

• Maximise learning potential

• The CEM IBE computer-adaptive assessment provides an excellent research-informed baseline to help you predict future performance (in IB Diploma examinations for each subject)

• The CEM IBE computer-adaptive assessment measures students on three key cognitive areas which research shows are linked to later academic outcomes: maths, vocabulary, non-verbal

• Once you have students’ final IB Diploma results, you can return this data to us

• The full CEM IBE product includes additional … questionnaires aiming to understand your students’ motivations, interests and aspirations. (questions about views on cultural background, way of life, social status, …)

Page 52

How to talk back to a statistic (5)

1. Who says so?

▫ What (else) are they interested in?

2. How do they know?

▫ How were data collected and analysed?

▫ How good is the model?

3. Did somebody change the subject?

▫ What are the actual data? Correlation or causation?

4. Does it make sense?

5. What is actually being claimed?

Page 53

NB: Can I see my data? What if it’s wrong?

• You have data access rights (and other rights) under European data protection legislation.

• But that’s another workshop …

Page 54

Big Data is ...

… an understanding of the past used to justify what some decision maker wants to do in the future.

(Geoffrey Rockwell, personal communication, cited from memory)

Page 55

Which brings us to …

• … the 2nd meaning of “critical” in science

• “Critical theory” (Habermas, Adorno, …)

▫ (social) science as a practical philosophy aiming at societal change with the goal of increasing the autonomy / self-determination of people

▫ (A view of “critical” not as widely shared as the first one)

Here:

• Is data the only answer?

• What is the question?

Page 56

Let’s be practical philosophers and scientists

… and we’ll use a different example now

Page 57

Belgium: top?

http://ec.europa.eu/eurostat/tgm/refreshTableAction.do?tab=table&plugin=1&pcode=ten00063&language=en

Page 58

Belgium: flop?

Page 59

One reason: Belgians don’t excel at sorting waste

Page 60

Group work!

• Group 1: You are a fitbit-wristband / smart-home company and want to develop predictive analytics for identifying who will have problems separating their trash properly, in order to give them helpful alerts. You may use any data you want. Prepare a pitch for your business model.

• Group 2: You are a company that wants to use Big Data, but avoid processing personal data. Develop an idea for how to best use these data. Prepare a pitch for your business model.

• Group 3: You are a civil society organisation that wants to improve the trash situation without recourse to Big Data. Prepare a pitch for your idea.

Page 61

Note 1: Definition of “recycling rate”

Recycling rates for packaging waste (in %)

'Recycling rate' for the purposes of Article 6(1) of Directive 94/62/EC means the total quantity of recycled packaging waste, divided by the total quantity of generated packaging waste.

http://ec.europa.eu/eurostat/web/products-datasets/product?code=ten00063
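The Eurostat definition is a simple ratio, which means the headline number depends entirely on what is counted in the numerator and the denominator. A minimal Python sketch with invented tonnages; the two counting rules in the example are hypothetical illustrations, not actual Eurostat or Belgian rules.

```python
def recycling_rate(recycled_tonnes: float, generated_tonnes: float) -> float:
    """Recycling rate in %: recycled packaging waste / generated packaging waste."""
    return 100 * recycled_tonnes / generated_tonnes

# Invented figures for one country and year.
generated = 1_000_000
recycled_strict = 650_000  # hypothetical rule: only material actually recycled
recycled_broad = 800_000   # hypothetical rule: everything collected for recycling

print(f"{recycling_rate(recycled_strict, generated):.0f}%")  # 65%
print(f"{recycling_rate(recycled_broad, generated):.0f}%")   # 80%
```

Same country, same waste, two different “recycling rates”: a reminder to ask how basic measures are defined before comparing countries.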

Page 62

Note 2: Recycling science

Page 63

Some more ideas

Page 64

Shops

Page 65

Re-use

Page 66

Activists

Page 67

“Science activists”

Page 68

Group work!

• Group 1: You are a fitbit-wristband / smart-home company and want to develop predictive analytics for identifying who will have problems separating their trash properly, in order to give them helpful alerts. You may use any data you want. Prepare a pitch for your business model.

• Group 2: You are a company that wants to use Big Data, but avoid processing personal data. Develop an idea for how to best use these data. Prepare a pitch for your business model.

• Group 3: You are a civil society organisation that wants to improve the trash situation without recourse to Big Data. Prepare a pitch for your idea.

Page 69

Thank you!

Questions? Email me!

http://people.cs.kuleuven.be/~bettina.berendt/

Page 70

References


• Anderson, C. (2008). The end of theory: The data deluge makes the scientific method obsolete. Wired 16.07. Available at http://edge.org/3rd_culture/anderson08/anderson08_index.html

• pp. 42ff: Degeling, M. & Berendt, B. (2017). What is wrong about Robocops as consultants? A technology-centric critique of predictive policing. AI & Society. May 2017 Online First.

• pp. 8, 10: Huber, O. (). Das psychologische Experiment: Eine Einführung.

• Huff, D. (1954). How to Lie with Statistics. New York: W.W. Norton & Company, Inc.

• Kitchin, R. (2014). The Data Revolution. Big Data, Open Data, Data Infrastructures & Their Consequences. London: Sage.

• pp. 13, 37, 39: Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110 (15), 5802–5805.

• Moretti, F. (2005). Graphs, Maps, Trees. Abstract Models for Literary History. London: Verso. (Cited from the paperback published in 2007, p. 30.)

• pp. 13f, 49: www.theguardian.com/commentisfree/2018/mar/20/brenda-the-civil-disobedience-penguin-on-cambridge-analytica-the-real-was-getting-caught

• pp. 31f.: From http://www.tylervigen.com/spurious-correlations

• Further sources on the slides themselves.

• My apologies for having mislaid some photo/picture URLs, and thanks to those who provide(d) them online!

Not cited, but also potentially interesting:

• Berendt, B. (2015). Big Capta, Bad Science? On two recent books on “Big Data” and its revolutionary potential. http://people.cs.kuleuven.be/~bettina.berendt/Reviews/BigData.pdf

• boyd, d. & Crawford, K. (2012). Critical questions for Big Data. Information, Communication & Society, 15:5, 662-679, DOI: 10.1080/1369118X.2012.678878.