Envisioning Information Lecture 3 – Multivariate Data Exploration

25
ENV 2006 3.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie

description

Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates. Ken Brodlie. Multivariate datasets can be expressed as a data table Each entry in table is an observation An observation consists of values of a set of variables, or variates - PowerPoint PPT Presentation

Transcript of Envisioning Information Lecture 3 – Multivariate Data Exploration

Page 1: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.1

Envisioning Information

Lecture 3 – Multivariate Data Exploration

Scatter plots and parallel coordinates

Ken Brodlie

Page 2: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.2

Data Tables

• Multivariate datasets can be expressed as a data table

– Each entry in table is an observation

– An observation consists of values of a set of variables, or variates

• Exercise– Create a data table from the

MSc class… A B C

1 .. .. ..

2 .. .. ..

variables

observations

Page 3: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.3

Scatter Plot

• For two variates, we have already met the scatter plot technique

• It is useful for showing what happens to one variable as another changes…

Page 4: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.4

Scatter Plot

• Visicube from Datamology is a useful free charting tool

• Here is an example scatter plot, visualizing the speed of the (receding) galaxy NGC7531 relative to the earth, measurements of speed being taken at different points on galaxy

• Circles represent measurements at 133o to horizon; pluses at 43o

• What can you observe?

http://www.datamology.com/sample-S2.shtml

Page 5: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.5

3D Scatter Plot

• Visicube has a tool specifically for 3D scatter plots

• Third variate expressed as a

vertical axis and widget lets you take slices at different heights

• Here we have same dataset but X and Y are positions, and Z axis is velocity … ie layered by velocity – here 3rd layer (1482 – 1519 km/sec)

• Observations less than 1500 km/sec highlighted in yellow (almost allowing 4D)

• Conclusion?

http://www.datamology.com/sample-S3.shtml

Page 6: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.6

3D Scatter Plots

• Here is an alternative approach, using 3D plotting…

• … does this work?

XRT/3d

http://www.ist.co.uk/XRT/xrt3d.html

Page 7: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.7

Extending to Higher Numbers of Variables

• Additional variables can be visualized by colour and shape coding

• IRIS Explorer ( a scientific visualization system!) used to visualize data from BMW

– Five variables displayed using spatial arrangement for three, colour and object type for others

– Notice the clusters…

• But there are clearly limits to how much this will scale

Kraus & Ertl, U Stuttgarthttp://wscg.zcu.cz/wscg2001/Papers_2001/R54.pdf

Page 8: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.8

Multivariate Visualization Techniques

• Software:– Xmdvtool

Matthew Ward Techniques designed for any number of variables

– Scatter plot matrices

– Parallel co-ordinates

– Glyph techniques

Acknowledgement:Many of images in followingslides taken from Ward’s work

http://davis.wpi.edu/~xmdv

Page 9: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.9

What are these?

Page 10: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.10

Multivariate Visualization

• Example of iris data set– 150 observations of 4

variables (length, width of petal and sepal)

– Check wikipedia for explanations of petals & sepals

– Techniques aim to display relationships between variables – the analytical task

Challenge in visualization is to design the visualization to match the analytical task

Page 11: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.11

Scatter Plot Matrices

Page 12: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.12

• For table data of M variables, we can look at pairs in 2D scatter plots

• The pairs can be juxtaposed:

A

B

C

CBA

With luck, you may spotcorrelations between pairsas linear structures… oryou may observe clusters

..

..

..

..

.

. . .

..

.

. . .

Scatter Plot Matrices

Page 13: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.13

Scatter Plot Matrix – Iris Data Set

Page 14: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.14

Scatter Plot Matrix – Car Data Set

Data represents7 aspects of cars:what relationshipscan we notice?

For example, what correlates with high MPG?

Page 15: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.15

A B C D E F

- create M equidistant vertical axes, each corresponding to a variable- each axis scaled to [min, max] range of the variable- each observation corresponds to a line drawn through point on each axis corresponding to value of the variable

Parallel Coordinates

Page 16: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.16

A B C D E F

- correlations may start to appear as the observations are plotted on the chart- here there appears to be negative correlation between values of A and B for example- this has been used for applications with thousands of data items

Parallel Coordinates

Page 17: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.17

Parallel Coordinates – Iris Data

Page 18: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.18

Parallel Coordinates Example

Detroit homicidedata7 variables13 observations

1961 -1973

Page 19: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.19

Parallel Coordinates

• Concept due to Alfred Inselberg

• Conceived the idea as a research student in 1959…

• … idea gradually refined over next 40 years

http://www.math.tau.ac.il/~aiisreal/

Page 20: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.20

Parallel Coordinates

• Parallel coordinates is a clever mechanism for transforming geometry from one space to another

• To get a handle on the idea, consider two variables X,Y

• In parallel coordinates, a point (X,Y) becomes… what?

• A line becomes… what?

• Why is the ordering of the axes important?

• Use this space to sketch the answers…

Page 21: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.21

The Screen Space Problem

• All techniques, sooner or later, run out of screen space

• Parallel co-ordinates– Usable for up to 150

variates– Unworkable greater than

250 variates

Remote sensing: 5 variates, 16,384 observations)

Page 22: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.22

Brushing as a Solution

• Brushing selects a restricted range of one or more variables

• Selection then highlighted

Page 23: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.23

Scatter Plot

Use of a‘brushing’ toolcan highlight subsets of data

..now we can seewhat correlateswith high MPG

Page 24: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.24

Parallel Coordinates

Brushing picksout the high MPGdata

Can you observethe same relationsas with scatterplots?

More or less easy?

Page 25: Envisioning Information Lecture 3 – Multivariate Data Exploration

ENV 2006 3.25

Parallel Coordinates

Here we highlighthigh MPG andnot 4 cylinders