A short Introduction to Data Visualisation Pieterjan Robbe

Page 1: A short Introduction to Data Visualisationsiam.cs.kuleuven.be/sites/siam.cs.kuleuven.be/files/... · 2017-10-26 · 1650-1700: First steps From 1650 onwards, more and more data becomes

A short Introduction to Data Visualisation

Pieterjan Robbe

Contents

1. General Introduction to Data- and Info Graphics (with historical overview)

2. Examples of Good and Bad Practice (graphical integrity)

3. Consequences for Your Work (with pointers to technical software)

I. General Introduction to Data- and Info Graphics

The Milestones Project

• Interactive overview of important events in the history of data visualisation by Michael Friendly

http://www.datavis.ca/milestones/

First Victory for Data Visualisation

John Snow, 1854

Time Series

Planetary movements, 10th century

Scales

Michael Florent van Langren, 1644

1650-1700: First steps

• From 1650 onwards, more and more data becomes available in politics (wealth, population, agriculture) and economics (taxes, insurance)

• This increase causes a need for rigorous analysis techniques beyond the simple tabulated format ⇒ visual thinking

• By 1700, many important data visualisation techniques such as multivariate graphs, isolines and weather maps have been discovered

• The two great inventors of modern graphical design are

  • J. H. Lambert (1728-1777)

  • W. Playfair (1759-1823)

William Playfair (1/3)

William Playfair (2/3)

William Playfair (3/3)

Meanwhile in France

C. J. Minard, “Tableau Figuratif”, 1844

1850-1900: The Golden Age

C. Lallemand, “L’abaque Triomphe”, 1885

II. Examples of Good and Bad Practice

The Lie Factor

“Sometimes it is easier to see what has gone wrong... than to explain how to do something right.”

The Lie Factor

\[ \text{Lie factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}} \] (Tufte, 1983)

Common pitfalls are

• apparent data size

• axis ranges

• incomplete data

• aspect ratio

• binning and ordering

Illustrations on the next slides!
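
As a rough sketch of how the Lie factor can be computed, the Python snippet below measures each "effect" as a relative change between two values, which is the convention used in Tufte (1983). The numbers, the axis baseline of 90 and the helper names `effect_size` and `lie_factor` are made up for illustration and do not come from the slides. It covers two of the pitfalls listed above: a truncated axis range and a pictogram whose apparent size grows with area.

```python
from typing import Tuple


def effect_size(first: float, second: float) -> float:
    """Relative change from the first to the second value."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return (second - first) / abs(first)


def lie_factor(data: Tuple[float, float], drawn: Tuple[float, float]) -> float:
    """Effect shown in the graphic divided by the effect in the data."""
    return effect_size(*drawn) / effect_size(*data)


# Hypothetical data: a quantity grows from 100 to 120 (a 20 % increase).
data = (100.0, 120.0)

# Axis ranges: bars drawn on a vertical axis that starts at 90 instead of 0,
# so the visible bar heights are 10 and 30 units.
baseline = 90.0
truncated_bars = (data[0] - baseline, data[1] - baseline)

# Apparent data size: a pictogram whose width and height both scale with the
# value, so the drawn area grows with the square of the value.
pictogram_areas = (data[0] ** 2, data[1] ** 2)

honest = lie_factor(data, data)  # bars that start at zero
truncated = lie_factor(data, truncated_bars)
area_scaled = lie_factor(data, pictogram_areas)

print(f"bars from zero:  {honest:.2f}")
print(f"truncated axis:  {truncated:.2f}")
print(f"area pictogram:  {area_scaled:.2f}")
```

With these invented numbers the honest bar chart has a Lie factor of 1, the truncated axis gives about 10 and the area-scaled pictogram about 2.2. A value close to 1 means the graphic shows the effect at the size it has in the data.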

Apparent Data Size

Crude Oil Production in the USA (2011-2015)

[Figure: two charts, each with the years 2011 to 2015 on the horizontal axis]

Apparent Data Size

Axis Ranges

De Standaard Online, 3 September

[Figure: chart; horizontal axis: year, 1880 to 2020; vertical axis: 0 to 90; series: women (vrouwen), men (mannen)]

Axis Ranges

Metro, 21 September

[Figure: chart; horizontal axis: year, 2011 to 2016; vertical axis: 0 to 450; series: diesel, petrol (benzine)]

Incomplete Data

NUMA research output

[Figure: chart; horizontal axis: year, 2010 to 2017; vertical axis: number of journal papers, 0 to 40]

Aspect Ratio

Binning and Ordering

[Figure: two charts; horizontal axis: problem number, 1 to 4; vertical axis: performance, 0 to 100]

The Bible of Data Visualisation

E. R. Tufte, The Visual Display of Quantitative Information, Graphics Press, Cheshire, CT, USA, 1983

Above all else show the data

Maximize the data-ink ratio

Erase non-data-ink

Erase redundant data-ink

Revise and edit
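
The second principle relies on the data-ink ratio. As a reminder, and as far as the definition in Tufte (1983) goes, it is the share of a graphic's ink that presents data, which can also be read as what remains after everything erasable has been erased:

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
\]
```

The third and fourth principles raise this ratio by shrinking the denominator: non-data-ink and redundant data-ink are removed while the data-ink stays.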

Chart Junk

Laatste Nieuws, 17 August

Minard Revisited

Napoleon’s Russian campaign in 1812

Modern Data Visualisation

(from “Loren on the Art of MATLAB”, R2014b Graphics - Part 3)

Modern Data Visualisation

(from V. Coppé, D. Huybrechs, R. Matthysen and M. Webb, 2017)

Modern Data Visualisation

The chord map

Jean-Luc Doumont and The Extreme

• Trees, maps, and theorems (2009)

Tour de France

Cumulative time of all riders of the Tour de France in 2013

[Figure: chart with tick labels 7 000 and −10 000; labelled riders: C. Froome, J. Bakelants]

III. Consequences for Your Work

Guidelines

• Time spent on data visualisation is time well spent!

• Beware of data integrity

• Think about your graphic (use the 5 principles)

• Know your audience

Outlook

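[Figure: two plots with horizontal ticks at π/2, π and 3π/2 and vertical ticks at −1 and 1; the second plot labels its curves sin(x), sin(2x) and sin(3x)]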

Outlook

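[Figure: chart of the number of samples on a logarithmic axis from 10^0 to 10^5, split into original and reused samples; percentage labels 41%, 15%, 24%, 24%, 22% on a scale from 0% to 50%]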

Graphical Design

• Image: aspect ratio, scale of the graphic

• Font: font size, font type

• Lines: line width, line color, line markers

• Legends: do you need one? Where do you put it?

• Tables: do you really need a graphic?

Software Tools

• Small data sets (<1M points)

also PGFplots, Gnuplot...

• Large data sets (>1M points)

References

E. R. Tufte, The Visual Display of Quantitative Information, Graphics Press, Cheshire, CT, USA, 1983.

J. L. Doumont, Trees, maps, and theorems: Effective communication for rational minds, Principiae, 2009.

C. Chen, W. K. Härdle and A. Unwin, eds., Handbook of data visualization, Springer Science & Business Media, 2007.

M. J. Kraak, Mapping time. Illustrated by Minard’s map of Napoleon’s Russian Campaign of 1812, Modern problems of geography and anthropology, Tbilisi, Georgia, 2015.
