Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation...

19
Visualisation Learning from Data: Visualisation Amos Storkey, School of Informatics October 13, 2005 http://www.anc.ed.ac.uk/amos/lfd/ Amos Storkey, School of Informatics Learning from Data: Visualisation

Transcript of Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation...

Page 1: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Learning from Data: Visualisation

Amos Storkey, School of Informatics

October 13, 2005

http://www.anc.ed.ac.uk/∼amos/lfd/

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 2: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Visualisation

I Presented with new data you shouldI Try to acquire knowledge of how it was created.I Visualize the data to see what is in it.

I Visualization is important for understanding what issuesthere might be with the data, and what forms ofassumptions are going to be invalid.

I Visualization is an informal assessment of the data usinghigh level modelling concepts.

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 3: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Histogram

I What sort of distribution does each attribute of the datahave?

I Is it normal?I Does it have heavy tails?I Is it skewed?I Does it cluster?

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 4: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Histogram

−10 −5 0 50

200

400

600

800

1000

1200

1400

1600

I Pretty much Gaussian.I Tails a bit heavy?

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 5: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Histogram

−3.5 −3 −2.5 −2 −1.5

x 104

0

5000

10000

15000

I Very skewed.I Some sort of cutoff.I This is the magnitude of objects in a particular region of

sky - more faint objects than very bright ones, but there is adetection limit.

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 6: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Plots

I Plot one attribute against anotherI Look at the joint distributionI Do they appear dependent or independent?I Are they uniform, peaked, clustered?

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 7: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Plots

−5 0 5

−6

−4

−2

0

2

4

I Some dependenceI Positive correlationI Centred around mean, cannot rule out Gaussianity.

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 8: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Plots

−2.9 −2.85 −2.8 −2.75 −2.7

x 104

0

0.5

1

1.5

2

2.5

3

3.5x 10

4

I Plot of magnitude (x axis) against area on sky (y axis).I Definite dependenceI Two clusters, Some outliers

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 9: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Class Conditional Plots

0 2000 4000 6000 8000 100000

2000

4000

6000

8000

10000

I Major elliptical axis (x) versus minor elliptical axis (y) forstars (red) and galaxies (blue).

I Galaxies are more likely to be highly elliptical.I Two axes definitely related.

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 10: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Class Conditional Plots

−3 −2.5 −2 −1.5

x 104

0

2

4

6

8

10

12

14x 10

7

I Magnitude (x) versus peak pixel brightness (y)- stars (red),galaxies (blue).

I Stars are more “point like” than galaxies. Higher peakbrightness for given magnitude.

I Already have two relevant measures for classifying starsand galaxies.

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 11: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Anomaly Detection

I Pay attention to outliers, or unusually high peaks inhistograms.

I Restrict the data set to those outliers or peaks.I Attempt to see if there is a potential explanation for themI You might need to remove outliers to do other analysis

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 12: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Anomaly Detection

0 50 100 1500

200

400

600

800

1000

1200

I Histogram of galaxy orientationI A few peculiar peaks.I Restrict data to those within the peak regions.

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 13: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Anomaly Detection

0.5 1 1.5 2 2.5 3

x 107

0.5

1

1.5

2

2.5

3

x 107

I Plot of x-position versus y-position on photograph.I A long line of points. Wouldn’t expect that of galaxies.I In fact a satellite track which the detection program has

mis-recognised.

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 14: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Covariance Visualisation

I Try to get some idea regarding what variables aredependent.

I Especially appropriate with approximately Gaussian data.I Do we have positive or negative correlations?I Is there some structure to the correlations?

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 15: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Covariance Visualisation

I Correlation visualisation via Hinton DiagramI Consecutive pairs correlated.I Some blockiness.

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 16: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

PCA

I What are the main linear components in the data set. Dothey capture any intrinsic concepts?

I Are there any clusters if the data are plotted onto theprincipal components?

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 17: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Multi-Attribute Plots

I Turn continuous atrributes into classes. Plot using differentcolours.

I Eg C1 (x > 0, y > 0), C2 (x < 0, y < 0), C3(x > 0, y < 0), C4 (x < 0, y > 0).

I Plot other attributes conditioned on class.I Look for class conditional differences.

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 18: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Multi-Attribute Plots

−8 −6 −4 −2 0 2 4

−6

−4

−2

0

2

4

I Some variation with colour change.

Amos Storkey, School of Informatics Learning from Data: Visualisation

Page 19: Learning from Data: Visualisation - The University of Edinburgh · 2005-10-13 · Visualisation Visualisation I Presented with new data you should I Try to acquire knowledge of how

Visualisation

Summary

I Do not fail to look at your data.I Real data is always messy. There are going to be things

which mess it up.I Try to identify dependencies.I How can you reduce the data size: PCA? Ignore

components? Ignore data points?

Amos Storkey, School of Informatics Learning from Data: Visualisation