Intro to data visualization

Data Visualization - An introduction Prof Jan Aerts Biodata Visualization and Analysis ESAT/SCD University of Leuven Belgium twitter: @jandot Google+: +Jan Aerts [email protected] http://biovizanlab.wordpress.com http://saaientist.blogspot.com

description

Slides used in capita selecta HCI course H05N2A

Transcript of Intro to data visualization

Page 1: Intro to data visualization

Data Visualization - An introduction

Prof Jan Aerts
Biodata Visualization and Analysis
ESAT/SCD
University of Leuven
Belgium

twitter: @jandot
Google+: +Jan Aerts
[email protected]
http://biovizanlab.wordpress.com
http://saaientist.blogspot.com

Page 2: Intro to data visualization

1. What is data visualization?

Page 3: Intro to data visualization

“A good sketch is better than a long speech” (Napoleon)

Page 4: Intro to data visualization

“A good sketch is better than a long speech” (Napoleon)

Minard’s map of Napoleon’s 1812 Russian campaign shows: size of the army, geographical coordinates, direction in which the army was traveling, location of the army on certain dates, temperature along the path of the retreat

Page 5: Intro to data visualization

John Snow - cholera map

Page 6: Intro to data visualization

The Shape of Song: “Like a Prayer” (Madonna), by Martin Wattenberg

Page 8: Intro to data visualization
Page 9: Intro to data visualization

What I use as a definition:

“computer-based visualization systems providing visual representations of datasets intended to help people carry out some task more effectively.” (T Munzner)

Page 10: Intro to data visualization
Page 11: Intro to data visualization

cognition <=> perception

cognitive task => perceptual task

“eyes beat memory”

Page 12: Intro to data visualization

Why do we visualize data?

• record information

  • blueprints, photographs, seismographs, ...

• analyze data to support reasoning

  • develop & assess hypotheses

  • discover errors in data

  • expand memory

  • find patterns (see Snow’s cholera map)

• communicate information

  • share & persuade

  • collaborate & revise

Page 13: Intro to data visualization

pictorial superiority effect

[Figure: recall of the word “information” after 72 hr: “informa” (65%) vs “i” (1%)]

exploration explanation

Page 14: Intro to data visualization

2. Exploration <-> explanation

Page 15: Intro to data visualization

exploration explanation

Page 16: Intro to data visualization

exploration explanation

visual analytics infographics

Page 18: Intro to data visualization

exploration explanation

visual analytics infographics

hypothesis generation

Page 19: Intro to data visualization

exploration explanation

“visual analytics”

=> identify unexpected patterns

Page 20: Intro to data visualization

J van Wijk

exploration explanation

Page 21: Intro to data visualization

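Anscombe’s quartet

Four datasets that share (almost exactly) the same summary statistics:

• mean of X = 9.0

• mean of Y = 7.5

• standard deviation of X = 3.317

• standard deviation of Y = 2.03

• regression line: Y = 3 + 0.5X

• R² = 0.67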
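The summary statistics above can be checked with a few lines of Python. This is only a sketch: the data values are typed in from Anscombe's published 1973 table, and the names `QUARTET`, `summarize` and `SUMMARIES` are made up for this example.

```python
from statistics import mean, stdev

# Anscombe's quartet; sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Means, sample standard deviations, least-squares line and R^2."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return {
        "mean_x": mx, "mean_y": my,
        "sd_x": stdev(xs), "sd_y": stdev(ys),
        "intercept": my - slope * mx, "slope": slope,
        "r2": sxy ** 2 / (sxx * syy),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four rows print nearly identical numbers; only plotting the points reveals how different the datasets are.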

Page 22: Intro to data visualization
Page 23: Intro to data visualization
Page 24: Intro to data visualization

A concrete example: hive plots

Page 25: Intro to data visualization

Martin Krzywinski

same network

Page 26: Intro to data visualization

Martin Krzywinski

different networks!

Page 27: Intro to data visualization

3D, anyone?

Page 28: Intro to data visualization

3D, anyone?

occlusion

interaction complexity

perspective distortion

text legibility

Page 29: Intro to data visualization

Gene interaction data: “gene A regulates gene B”

Functions in the Linux operating system: “function A calls function B”

Page 30: Intro to data visualization

regulator

manager

workhorse

Page 31: Intro to data visualization

3. Why specifically learn about dataviz?

Page 32: Intro to data visualization

Isn’t it all just about using common sense?

Page 33: Intro to data visualization

• huge space of design alternatives => many tradeoffs

• many possibilities known to be ineffective

• avoid random walk through parameter space

• avoid some of our past mistakes

• extensive experimentation has already been done

• guidelines continue to evolve

• we reflect on lessons learned in design studies

• iterative refinement usually wise

Page 34: Intro to data visualization

4. Stages of data visualization

Page 35: Intro to data visualization

How do we get from data to visualization? We need to understand:

• properties of the data

• properties of the image

• the rules mapping data to image

Page 36: Intro to data visualization

4.1. Properties of the data

Page 37: Intro to data visualization

S Stevens “On the Theory of Scales of Measurement” (1946)

Page 38: Intro to data visualization

4.2. Properties of the image - perception

Page 39: Intro to data visualization

Semiology of graphics

• Jacques Bertin, Gauthier-Villars 1967, EHESS 1998

• semiology = study of signs and sign processes, likeness, analogy, metaphor, symbolism, signification, and communication (Wikipedia)

• visual encoding:

• what - points, lines, areas (, patterns, trees/networks, grids)

• where - positional: XY (1D, 2D, 3D)

• how - retinal: Z (size, lightness, texture, colour, orientation, shape)

• when - temporal: animation

Page 40: Intro to data visualization

“marks” - geometric primitives

“channels” - control appearance of marks

H

V

S

Page 41: Intro to data visualization

Gestalt laws - interplay between parts and the whole (Kurt Koffka)

series of principles

Election results Florida:

• black = Bush

• white = Gore

Page 42: Intro to data visualization
Page 43: Intro to data visualization

Gestalt - Principle of Simplicity

Every pattern we see is seen such that we see a structure that is as simple as possible.

Page 44: Intro to data visualization

Gestalt - Principle of Proximity

Things that are close to each other are seen as belonging together (=> clusters)

Page 45: Intro to data visualization

Gestalt - Principle of Similarity

Things that are similar in some way are perceived as belonging together.

Page 46: Intro to data visualization

Gestalt - Principle of Closure

You will try to complete a pattern.

Page 47: Intro to data visualization

Gestalt - Principle of Connectedness

Things that are connected are perceived as belonging together. This encoding is stronger than similarity, shape, colour, and size.

Page 48: Intro to data visualization

Gestalt - Principle of Good Continuation

Objects that are arranged in a straight or smooth line tend to be seen as a unit.

Page 49: Intro to data visualization

Gestalt - Principle of Common Fate

Objects that move in the same direction tend to be seen as a unit.

Page 50: Intro to data visualization

Gestalt - Principle of Familiarity

Page 51: Intro to data visualization
Page 52: Intro to data visualization
Page 53: Intro to data visualization
Page 54: Intro to data visualization

Gestalt - Principle of Symmetry

Symmetrical areas tend to be seen as figures against asymmetrical backgrounds.

Page 55: Intro to data visualization

Context affects perceptual tasks

Page 56: Intro to data visualization

Pre-attentive vision

= ability of low-level human visual system to rapidly identify certain basic visual properties

• some features “pop out”

• used for:

• target detection

• boundary detection

• counting/estimation

• ...

• visual system takes over => all cognitive power available for interpreting the figure, rather than needing part of it for processing the figure

Page 57: Intro to data visualization

Really fast; see http://www.csc.ncsu.edu/faculty/healey/PP/

Page 58: Intro to data visualization

Limitations of preattentive vision

1. Combining pre-attentive features does not always work => would need to resort to “serial search” (most channel pairs; all channel triplets), e.g. “is there a red square in this picture?”

2. Speed depends on the channel (use one that is good for categorical data; see “accuracy” further on)

Page 59: Intro to data visualization

4.3. Mapping data to image: visual encoding

Page 60: Intro to data visualization

Language of graphics

• graphics = sign system:

• each mark (point, line, area) represents a data element

• choose visual variables to encode relationships between data elements

• difference, similarity, order, proportion

• only position supports all relationships (see later)

• huge range of alternatives for data with many attributes

• find images that express & effectively convey the information

Page 61: Intro to data visualization

Which encoding should I use?

• From huge list of possibilities, you have to choose the best one.

• Principle of Consistency

• properties of the representation should match properties of the data (e.g. pie chart: area vs radius)

• Principle of Importance Ordering

• encode the most important piece of information in the most “effective” way (i.e. spatial position)
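The “area vs radius” remark can be made concrete with a small sketch. It assumes circular marks whose area, not radius, should carry the value; the function names and the `max_radius` parameter are invented for this illustration.

```python
import math


def radius_for_value(value, max_value, max_radius):
    """Radius of a circle whose AREA is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value > 0")
    return max_radius * math.sqrt(value / max_value)


def naive_radius(value, max_value, max_radius):
    """Radius proportional to value: the drawn area then shrinks quadratically."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value > 0")
    return max_radius * value / max_value


def area_ratio(radius, max_radius):
    """Drawn area relative to the area of the largest circle."""
    return (radius / max_radius) ** 2
```

For a value that is a quarter of the maximum, `radius_for_value` yields a circle with a quarter of the area, while `naive_radius` yields one with only a sixteenth of it, so the representation no longer matches the data.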

Page 62: Intro to data visualization
Page 63: Intro to data visualization

Stevens’ psychophysical law

= proposed relationship between the magnitude of a physical stimulus and its perceived intensity or strength
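The law is usually written as a power function: perceived magnitude = k × (stimulus intensity)^a. The sketch below shows what the exponent a implies when a stimulus is doubled. The exponent values are commonly quoted approximations, included here as illustrative assumptions rather than as authoritative constants, and the names are made up for this example.

```python
# Approximate, commonly quoted exponents (assumptions for illustration only).
EXPONENTS = {
    "length": 1.0,          # perceived roughly linearly
    "area": 0.7,            # underestimated
    "brightness": 0.5,      # strongly compressed
    "electric shock": 3.5,  # strongly exaggerated
}


def perceived_ratio(stimulus_ratio, exponent):
    """Perceived change when the stimulus is multiplied by stimulus_ratio.

    Follows from psi = k * I**a: the constant k cancels in the ratio.
    """
    if stimulus_ratio <= 0:
        raise ValueError("stimulus_ratio must be positive")
    return stimulus_ratio ** exponent


if __name__ == "__main__":
    for channel, a in EXPONENTS.items():
        print(f"{channel}: doubling the stimulus feels like x{perceived_ratio(2, a):.2f}")
```

Under these assumed exponents, doubling a length looks like a doubling, whereas doubling an area looks like less than a doubling, which is one reason length and position are preferred for quantitative data.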

Page 64: Intro to data visualization

Accuracy of quantitative perceptual tasks

Mackinlay

what/where (qualitative)

how much (quantitative)

Page 65: Intro to data visualization

Accuracy of quantitative perceptual tasks

Mackinlay

what/where (qualitative)

how much (quantitative)

Page 66: Intro to data visualization

Accuracy of quantitative perceptual tasks

Mackinlay

“power of the plane”

what/where (qualitative)

how much (quantitative)

Page 67: Intro to data visualization

Accuracy of quantitative perceptual tasks

Mackinlay

what/where (qualitative)

how much (quantitative)

grouping: see Gestalt laws

Page 68: Intro to data visualization

COLOUR

Page 69: Intro to data visualization

COLOUR ... is tricky, and often used incorrectly

Page 70: Intro to data visualization

Colour space

• = mathematical model to talk about colour

• RGB (red-green-blue)

• most common, but less useful

• HSV (hue-saturation-value)

• more useful
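A minimal sketch of why HSV is easier to reason about, using Python's standard `colorsys` module. The helper names `to_hex` and `categorical_palette` and the default saturation and value are assumptions for this example. Evenly spaced hues are not perceptually uniform, so this is not a substitute for a designed palette.

```python
import colorsys


def to_hex(r, g, b):
    """Convert RGB components in the range 0..1 to a '#rrggbb' string."""
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


def categorical_palette(n, saturation=0.6, value=0.9):
    """n colours that differ only in hue; saturation and value stay fixed."""
    return [to_hex(*colorsys.hsv_to_rgb(i / n, saturation, value)) for i in range(n)]


if __name__ == "__main__":
    # In RGB, "same colour but paler" means changing three numbers at once;
    # in HSV it means lowering saturation only.
    hue, sat, val = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # pure red
    print(to_hex(*colorsys.hsv_to_rgb(hue, sat * 0.5, val)))  # a paler red
    print(categorical_palette(5))
```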

Page 71: Intro to data visualization

colorbrewer2.org

in R: please use RColorBrewer!

Page 72: Intro to data visualization

Context affects colour perception

Page 73: Intro to data visualization

Context affects colour perception

Page 74: Intro to data visualization

Dangers of Depth (3D)

• We do NOT see in 3D; we see in 2.05D.

• occlusion

• interaction complexity

• perspective distortion

Page 75: Intro to data visualization

3D example

Page 76: Intro to data visualization

Lie factor

“lie factor” = (size of effect shown in graphic) / (size of effect in data)
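A small sketch of the computation. It assumes that “size of effect” means relative change, (second − first) / first, which is how Tufte defines it. The numbers in the usage note are from his well-known fuel-economy example; the function names are made up here.

```python
def effect_size(first, second):
    """Relative change between two values: |second - first| / |first|."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return abs(second - first) / abs(first)


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Ratio of the effect drawn in the graphic to the effect in the data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)


if __name__ == "__main__":
    # Lines drawn 0.6 and 5.3 inches long for standards of 18 and 27.5 mpg.
    print(round(lie_factor(0.6, 5.3, 18, 27.5), 1))
```

A value close to 1 means the graphic represents the data faithfully; in this example the graphic exaggerates the change by a factor of almost 15.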

Page 77: Intro to data visualization

3D scatter plots are better as series of 2D projections
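One way to read this advice, sketched without any plotting library: split the 3D points into the three axis pairs and draw each pair as an ordinary 2D scatter plot. The function name and default axis names are assumptions for this example.

```python
from itertools import combinations


def pairwise_projections(points, axis_names=("x", "y", "z")):
    """Map each pair of axis names to the 2D points for that pair."""
    projections = {}
    for i, j in combinations(range(len(axis_names)), 2):
        projections[(axis_names[i], axis_names[j])] = [(p[i], p[j]) for p in points]
    return projections


if __name__ == "__main__":
    cloud = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]  # made-up data
    for axes, pts in pairwise_projections(cloud).items():
        print(axes, pts)
```

Each resulting panel avoids occlusion and perspective distortion, at the cost of having to relate points across panels.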

Page 78: Intro to data visualization

Dynamic data

• animation is good sometimes, but often not:

• we can only follow 3-4 visual cues simultaneously

• change in “mental map”

• change blindness (e.g. http://nivea.psycho.univ-paris5.fr/CBMovies/BarnTrackFlickerMovie.gif)

Page 79: Intro to data visualization

http://vimeo.com/2035117

Page 80: Intro to data visualization
Page 81: Intro to data visualization

5. Interaction

Page 82: Intro to data visualization

Overview first, zoom and filter, then details on demand (Shneiderman’s Visual Information Seeking Mantra)

Page 83: Intro to data visualization

Operations on the data

• sorting

• filtering

• browsing/exploring

• comparison

• characterizing trends & distributions

• finding anomalies & outliers

• ...

Page 84: Intro to data visualization

Techniques to support these operations

• re-orderable matrices

• brushing

• linked views

• overview & detail

• focus & context

• ...
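Brushing and linked views can be sketched independently of any drawing code: a brush in one view yields a set of selected record keys, and every other view highlights the records with those keys. The `Record` type, its fields and the sample values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: int
    weight: float
    height: float
    age: int


# Hypothetical sample data.
SAMPLE = [
    Record(1, 61.0, 1.70, 34),
    Record(2, 82.5, 1.85, 51),
    Record(3, 70.2, 1.76, 29),
]


def brush(records, attr, low, high):
    """Keys of the records whose attribute falls inside the brushed interval."""
    return {r.key for r in records if low <= getattr(r, attr) <= high}


def linked_view(records, x_attr, y_attr, selected_keys):
    """Points for another view, each flagged if it was brushed elsewhere."""
    return [
        (getattr(r, x_attr), getattr(r, y_attr), r.key in selected_keys)
        for r in records
    ]


if __name__ == "__main__":
    selected = brush(SAMPLE, "age", 25, 40)  # brush the age axis in view A
    print(linked_view(SAMPLE, "weight", "height", selected))  # highlight in view B
```

Sharing only the selected keys keeps the views decoupled: any number of views can react to the same brush.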

Page 85: Intro to data visualization

6. Validation

Page 86: Intro to data visualization

Evaluate the right thing

Munzner, 2009

Page 87: Intro to data visualization

Slide/picture acknowledgments

• Jeffrey Heer

• Tamara Munzner

• Jessie Kennedy

• Nils Gehlenborg

• Miriah Meyer

Page 88: Intro to data visualization

“I think this presentation went quite well...”