Information Visualization 2 Lecture Outline


Page 1: Information Visualization 2 Lecture Outline

Information Visualization

Page 2

Lecture Outline

- Overview of information visualization
- The role of visualization in the process of data mining
- The patterns being sought: clusters and outliers
- Issues when visualizing higher-dimensional relationships
- Criteria for comparison
- A range of visualization techniques for exploratory data analysis

Page 3

Information Visualization

A conjunction of a number of fields:

- Data Mining
- Cognitive Science
- Graphic Design
- Interactive Computer Graphics

Page 4

Information Visualization

Information Visualization attempts to use visual approaches and dynamic controls to provide understanding and analysis of multidimensional data.

The data may have no inherent 2D or 3D semantics and may be abstract in nature.

There is no underlying physical model. Much of the data in databases is of this type.

Page 5

Role of Information Visualization

- Acts as an exploratory tool
- Useful for identifying subsets of the data
- Structures, trends and outliers may be identified; by contrast, statistical tests tend to incorporate isolated instances into a broader model as they attempt to formulate global features
- There is no requirement for a hypothesis, but the techniques can also support the formulation of hypotheses if wanted

Page 6

Integrating Visualization With Data Mining

There are four possible approaches:

- Use visualization techniques to present the results of the data mining process.

- Use visualization techniques as complements to the data mining process. Here they complement the mining and increase understanding in a passive way.

Page 7

Integrating Visualization With Data Mining

- Use visualization techniques to steer the data mining process. The visualization aids in deciding the appropriate data mining technique to use and the appropriate subsets of the data to consider.

- Apply data mining techniques to the visualization rather than directly to the data. The idea is to capture the essential semantics visually, then apply the data mining tools.

Page 8

The Process of Knowledge Discovery in Databases (a.k.a. Data Mining)

[Figure: The Knowledge Discovery in Databases (KDD) process (AdZ1996). Operational data and external data pass through the stages Information Requirement, Data Selection, Cleaning & Enrichment (domain consistency, de-duplication, disambiguation), Coding, Data Mining (clustering, segmentation, prediction) and Reporting, leading to Action, with a Feedback loop.]

Page 9

Visualization in the Context of the Data Mining Process

Visualization tools can potentially be used at a number of steps in the DM process. But:

- the same tools may not be appropriate at each step
- how they will be used may be different

Page 10

Visualization in the Context of the Data Mining Process

In general, it is not important whether data visualization is the first step in the process or not: the feedback loop which moves the process forward may be commenced by either a visualization or a query.

Page 11

Visualization in the Context of the Data Mining Process

- Some visualizations (e.g. see slide 25) require an initial query to generate a visualization; this is an example of a complementary approach.
- Questions generate visualizations, which may prompt further questions or generate hypotheses.

Page 12

Motivations for Visualization

- The human visual system is extremely good at recognizing patterns: it is quicker and easier to understand visual representations than to absorb information from language or formal notations.
- Exploratory visualization assists in:
  - identifying areas of interest
  - identifying questions which might usefully be asked

Page 13

Motivations for Visualization

That is, a relevant or revealing visualization of part or all of a data set may suggest useful questions and/or hypotheses to the analyst. These can then be confirmed by more rigorous approaches.

For example, some clustering techniques require an initial estimate of the number of clusters present in the data; visualization techniques can assist in this estimation.

Page 14

Criteria for Comparison of Visualization Tools

- Number of dimensions that can be represented
- Number of data items that can be handled
- Ability to handle categorical and other non-numeric data types
- Ability to reveal patterns
- Ease of use
- Learning curve (to what degree is the technique intuitive)

Page 15

Examples - Scatterplot

For each pair of features (i.e. fields of records) in a multidimensional database, every record is graphed as a point in two dimensions (2D). This straightforward graphing procedure produces a simple scatterplot: a projection of the multidimensional data into 2D.

Page 16

Examples - Scatterplot

The scatterplots of all pair-wise combinations of features are arranged in a matrix. The figure on the following slide illustrates a scatterplot matrix of 3D data from a study of abrasion loss in tyres. The features are hardness, tensile strength and abrasion loss [Tie1989].

Each “sub-graph” gives insight into the relationship between a pair of features.
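A minimal sketch of how the panels of a scatterplot matrix are assembled, assuming the data is a list of records with named numeric features. The helper scatterplot_panels and the record values are hypothetical (invented, not the [Tie1989] data); each panel is simply every record projected onto one pair of features.

```python
from itertools import combinations


def scatterplot_panels(records, features):
    """Return {(feature_a, feature_b): [(x, y), ...]} for every pair of features.

    Each panel is the 2D projection of all records onto one pair of
    features; a plotting library would draw one sub-graph per key.
    """
    panels = {}
    for f_a, f_b in combinations(features, 2):
        panels[(f_a, f_b)] = [(r[f_a], r[f_b]) for r in records]
    return panels


# Hypothetical records with the same three features as the tyre example
# (the values are invented, not taken from [Tie1989]).
records = [
    {"hardness": 50, "tensile_strength": 180, "abrasion_loss": 300},
    {"hardness": 60, "tensile_strength": 220, "abrasion_loss": 200},
    {"hardness": 70, "tensile_strength": 160, "abrasion_loss": 150},
]
features = ["hardness", "tensile_strength", "abrasion_loss"]
for pair, points in scatterplot_panels(records, features).items():
    print(pair, points)
```

This builds only the distinct panels; a full square matrix usually also shows each pair mirrored about the diagonal.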

Page 17

Scatterplot Matrix

Scatterplot matrix of abrasion loss data [Tie1989]

Page 18

Possible Problems With Scatterplots

Everitt [Eve78, p. 5] gives two reasons why scatterplots can prove unsatisfactory:

- if the number of features is greater than ~10, the number of plots to be examined is very large; this is just as likely to lead to confusion as to knowledge of the structures in the data
- structures existing in a multidimensional data set do not necessarily appear in the 2D projections of the features represented in scatterplots (see next slide)
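As a rough worked count for the first problem: a scatterplot matrix over $n$ features contains one distinct panel per unordered pair of features,

$$\binom{n}{2} = \frac{n(n-1)}{2},$$

so 10 features already give 45 distinct plots and 20 features give 190 (a full square matrix usually shows each pair twice, mirrored about the diagonal).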

Page 19

Possible Problems With Scatterplots

Despite these potential problems, variations on the scatterplot approach are the most commonly used of all the visualization techniques.

Page 20

Scatterplots: Recognizing High-dimensional Structures - 1

A structure which appears as a cluster in a 2D projection may in fact be a “pipe” in 3D: a pipe is a structure that looks like a rod or pipe when viewed in a 3D representation.

Page 21

Scatterplots: Recognizing High-dimensional Structures - 1

While the pipe is easily identifiable in a 3D display, only projections of it will appear in the 2D components of the scatterplot matrix. Depending on the orientation of the pipe in 3D, it may not appear as an obvious cluster, if at all.
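A minimal sketch of the pipe effect, assuming an axis-aligned pipe; make_pipe and spread are hypothetical helpers, and the radius, length, point count and seed are arbitrary illustration values. Projected onto the x-y plane the points collapse into a tight cluster, while the x-z projection reveals the long rod. A tilted pipe would be harder to spot, as the text notes.

```python
import random


def make_pipe(n_points, radius=0.05, length=10.0, seed=0):
    """Random points inside a thin cylinder ("pipe") aligned with the z axis."""
    rng = random.Random(seed)
    points = []
    while len(points) < n_points:
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:  # keep only the circular cross-section
            points.append((x, y, rng.uniform(0.0, length)))
    return points


def spread(values):
    """Range (max - min) of a sequence, used as a crude measure of extent."""
    return max(values) - min(values)


pipe = make_pipe(500)
xs, ys, zs = zip(*pipe)
# x-y projection: every point falls in a tiny disc, so it looks like a tight cluster.
print("x-y extent:", spread(xs), spread(ys))
# x-z projection: the same points form a long thin band, revealing the pipe.
print("x-z extent:", spread(xs), spread(zs))
```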

Page 22

Scatterplots: Recognizing High-dimensional Structures - 1

Equivalent structures can exist in higher dimensions, e.g. a cluster in 5D might be a “pipe” in 6D. The appearance of high-D structures in lower-D projections depends on the luck and skill of the analyst in choosing the projections, and on the alignment of the structures to the axes.

Page 23

Scatterplots: Recognizing High-dimensional Structures - 2

[Figure: a random (uniform) scatter in 2D may be a plane in 3D; a cluster in 2D may be a pipe in 3D (or a cluster in 3D)]

Page 24

Example Tool: Spotfire (http://www.spotfire.com/)

Page 25

Example Tool: Spotfire (http://www.spotfire.com/)

The user interacts with the data by choosing which features will form the horizontal and vertical axes.

Other features can be represented by colour. This is an example of using the richness of visual representations to provide more information to the user: as well as 2D spatial position, other modes such as colour, size, shape and even sound can be used to convey information about high-dimensional data.
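The slides do not say which colour scale Spotfire uses. As a minimal sketch, assuming a linear blue-to-red scale, a third feature can be turned into a colour as below; value_to_rgb and the records are hypothetical. Points that cluster in the 2D plot and also share one colour form a cluster in all three features.

```python
def value_to_rgb(value, lo, hi):
    """Map a numeric feature linearly onto a blue-to-red scale.

    lo maps to pure blue (0, 0, 255) and hi to pure red (255, 0, 0);
    values outside [lo, hi] are clamped.
    """
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)
    return (round(255 * t), 0, round(255 * (1 - t)))


# Hypothetical records: (x feature, y feature, third feature shown as colour).
records = [(20, 74, 9.5), (21, 73, 9.8), (55, 10, 1.0)]
lo = min(r[2] for r in records)
hi = max(r[2] for r in records)
for x, y, z in records:
    print((x, y), value_to_rgb(z, lo, hi))
```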

Page 26

Example Tool: Spotfire (http://www.spotfire.com/)

On the previous slide, the data set contains a 3D cluster.

The cluster can be seen with its centre at around (20, 74). All the points in the cluster are red, i.e. they also share similar values of the feature encoded by colour, showing that it is a 3D cluster.

Page 27

Example Tool: DBMiner (http://www.dbminer.com/)

Page 28

Example Tool: DBMiner (http://www.dbminer.com/)

DBMiner is an integrated data mining tool.

It employs a data visualization known as a “data cube” (see On-Line Analytical Processing, OLAP).

Page 29

Example Tool: DBMiner (http://www.dbminer.com/)

After creating a data cube, the user can apply a variety of data mining techniques to analyze the data further, including association, classification, prediction and clustering.

The figure on the preceding slide shows a data cube for a data set which has a 3D cluster of data instances in a 3D space.
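DBMiner's cube is an OLAP structure whose cells hold aggregate measures. As a hedged sketch, assuming the measure is simply a count of instances and each dimension is cut into equal-width bins (both choices are assumptions, and data_cube_counts is a hypothetical helper), a 3D cluster shows up as one unusually full cell:

```python
from collections import Counter


def data_cube_counts(points, lows, highs, bins):
    """Count points per cell of a bins x bins x bins grid over a 3D space.

    lows/highs give the range of each dimension; values on or beyond the
    upper edge are placed in the last cell.
    """
    counts = Counter()
    for p in points:
        cell = []
        for v, lo, hi in zip(p, lows, highs):
            idx = int((v - lo) / (hi - lo) * bins)
            cell.append(min(max(idx, 0), bins - 1))
        counts[tuple(cell)] += 1
    return counts


# Hypothetical data: a tight 3D cluster near (1, 1, 1) plus two scattered points.
points = [(1, 1, 1), (1.2, 0.9, 1.1), (0.8, 1.1, 0.9), (9, 2, 5), (4, 8, 7)]
cube = data_cube_counts(points, lows=(0, 0, 0), highs=(10, 10, 10), bins=5)
densest_cell, count = cube.most_common(1)[0]
print(densest_cell, count)
```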

Page 30

Examples: Parallel Coordinates - 1

Uses the idea of mapping a point in a multidimensional feature space onto a number of parallel axes.

Each feature is mapped to one axis; as many axes as needed can be lined up side by side, so in principle there is no limit to the number of dimensions that can be represented.

Page 31

Examples: Parallel Coordinates - 1

A single polygonal line connects the individual coordinate mappings for each point.

The technique has been applied in air traffic control, robotics, computer vision and computational geometry
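A minimal sketch of the point-to-polyline mapping, assuming axis i is drawn at horizontal position i and each feature is min-max scaled to [0, 1] so that features with different units share one vertical scale (a common choice, not specified on the slides); parallel_coordinates is a hypothetical helper.

```python
def parallel_coordinates(points):
    """Convert each n-dimensional point into the vertices of its polyline.

    Vertex i of a point lies on axis i at x = i, with the i-th coordinate
    min-max scaled to [0, 1] over all points.
    """
    n_dims = len(points[0])
    lows = [min(p[i] for p in points) for i in range(n_dims)]
    highs = [max(p[i] for p in points) for i in range(n_dims)]
    polylines = []
    for p in points:
        vertices = []
        for i, v in enumerate(p):
            span = highs[i] - lows[i]
            y = (v - lows[i]) / span if span else 0.5  # constant feature: mid-height
            vertices.append((i, y))
        polylines.append(vertices)
    return polylines


# Two hypothetical 4-D points; each becomes a polyline crossing 4 parallel axes.
for line in parallel_coordinates([(1, 10, 0.5, 200), (3, 30, 0.1, 100)]):
    print(line)
```

Joining consecutive vertices with straight segments gives the polygonal line described above; with many points these lines overlap heavily.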

Page 32

Examples: Parallel Coordinates - 2

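Parallel axes for $\mathbb{R}^n$. The polygonal line shown represents the point $C = (C_1, \ldots, C_{i-1}, C_i, C_{i+1}, \ldots, C_n)$.

[Figure: parallel vertical axes $X_1, X_2, X_3, \ldots, X_{i-1}, X_i, \ldots, X_n$, with the polygonal line passing through $C_1$ on $X_1$, $C_{i-1}$ on $X_{i-1}$, $C_i$ on $X_i$ and $C_n$ on $X_n$]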

Page 33

Examples: Parallel Coordinates - 3

The parallel coordinates visualization technique is employed in the software WinViz (http://www.computer.org/intelligent/ex1996/x5069abs.htm).

The main advantage of the technique is that it can represent an unlimited number of dimensions.

Page 34

Examples: Parallel Coordinates - 3

When many points are represented using parallel coordinates, the overlap of the polygonal lines can make it difficult to identify structures in the data.

Certain structures, such as clusters, can often be identified, but others are hidden by the overlap.

Page 35

Two Clusters In WinViz

Page 36

Examples: Stick Figures

The stick figure technique is intended to make use of the user’s low-level perceptual processes [PGL1995], such as perception of texture, color, motion and depth.

The hope is that the user will “automatically” try to make physical sense of the pictures of the data created.

Page 37

Examples: Stick Figures

Visualizations which represent multidimensional feature spaces using a number of subspaces of 3D or less (e.g. scatterplots) rely more on our cognitive abilities than on our perceptual abilities.

Stick figures avoid this, and present all variables and data points in a single representation.
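The slides do not give the icon's geometry. As a hedged sketch, assuming an icon with a fixed body and four limbs whose angles each encode one attribute (pre-scaled to [0, 1]), one record becomes one icon as below. stick_figure and LIMB_ANCHORS are hypothetical names, and the icon family used in [PGL1995] may differ.

```python
import math

# Hypothetical layout: two limbs attached to the top of the body (pointing
# up) and two attached to the bottom (pointing down).
LIMB_ANCHORS = [((0.0, 1.0), 1.0), ((0.0, 1.0), 1.0),
                ((0.0, 0.0), -1.0), ((0.0, 0.0), -1.0)]


def stick_figure(attrs, limb_length=1.0):
    """Return the line segments of one stick-figure icon.

    The body is a fixed vertical segment from (0, 0) to (0, 1). Each of up to
    four attribute values, pre-scaled to [0, 1], sets the angle of one limb:
    0 points the limb to the right, 1 points it to the left.
    """
    segments = [((0.0, 0.0), (0.0, 1.0))]  # the body
    for ((ax, ay), vertical), a in zip(LIMB_ANCHORS, attrs):
        theta = a * math.pi
        end = (ax + limb_length * math.cos(theta),
               ay + vertical * limb_length * math.sin(theta))
        segments.append(((ax, ay), end))
    return segments


# One record with four scaled attributes becomes one small icon; a full
# display draws one icon per record, packed densely so that textures emerge.
for start, end in stick_figure([0.1, 0.8, 0.3, 0.6]):
    print(start, end)
```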

Page 38

Iconographic display using stick figures - US Census Data (http://ivpr.cs.uml.edu/gallery/)


Page 42

Examples: Pixel-based techniques (http://www.dbs.informatik.uni-muenchen.de/dbs/projekt/visdb/visdb.html)

Query-dependent pixel-based techniques:

- based on a query, a “semantic distance” is calculated between each of the query feature values and the features of each instance in the DB
- the distance is mapped to colour for each attribute
- the overall distance between the data values of a specific instance and the attribute values used in the predicate of the query is also calculated
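A minimal sketch of the distance step, under simplifying assumptions: numeric attributes, range predicates, a weighted sum for the overall distance, and a grey level standing in for the colour scale. VisDB's own distance functions and colour map are richer; all names here are hypothetical.

```python
def attribute_distances(record, query):
    """Per-attribute distance of a record from a range query.

    Assumption: numeric attributes, and each query predicate is a
    (low, high) range; values inside the range have distance 0.
    """
    dists = {}
    for attr, (low, high) in query.items():
        v = record[attr]
        if low <= v <= high:
            dists[attr] = 0.0
        else:
            dists[attr] = min(abs(v - low), abs(v - high))
    return dists


def overall_distance(dists, weights=None):
    """Combine per-attribute distances into one relevance score (weighted sum)."""
    if weights is None:
        weights = {attr: 1.0 for attr in dists}
    return sum(weights[attr] * d for attr, d in dists.items())


def distance_to_grey(d, max_d):
    """Grey level standing in for a colour scale: 255 = exact match, 0 = farthest."""
    if max_d == 0:
        return 255
    return round(255 * (1 - min(d / max_d, 1.0)))


# Hypothetical query: age between 30 and 40 and income between 50 and 70.
query = {"age": (30, 40), "income": (50, 70)}
records = [{"age": 35, "income": 60}, {"age": 45, "income": 80}]
for r in records:
    d = attribute_distances(r, query)
    print(r, d, overall_distance(d))
```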

Page 43

Examples: Pixel-based techniques (http://www.dbs.informatik.uni-muenchen.de/dbs/projekt/visdb/visdb.html)

- Instances are arranged on the screen with the data items of highest relevance in the centre of the display, proceeding outwards in a spiral
- the values for each of the attributes are presented in separate subwindows
- the arrangement inside the subwindows is according to the overall distance
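A minimal sketch of the arrangement step, assuming a square spiral on an integer grid with the most relevant instance at (0, 0); VisDB's exact layout may differ, and the helper names are hypothetical. Because the order depends only on the overall distance, the same position map can be reused in every attribute subwindow; only the colour of each pixel changes.

```python
def spiral_positions(count):
    """First `count` cells of a square spiral on an integer grid, starting at (0, 0)."""
    x, y = 0, 0
    dx, dy = 1, 0  # first move is to the right
    positions = [(x, y)]
    step = 1
    while len(positions) < count:
        for _ in range(2):  # each leg length is used twice before it grows
            for _ in range(step):
                if len(positions) >= count:
                    break
                x, y = x + dx, y + dy
                positions.append((x, y))
            dx, dy = -dy, dx  # turn 90 degrees counter-clockwise
        step += 1
    return positions[:count]


def arrange_by_relevance(overall_distances):
    """Map instance index -> grid cell, most relevant (smallest distance) at the centre."""
    order = sorted(range(len(overall_distances)), key=lambda i: overall_distances[i])
    return dict(zip(order, spiral_positions(len(order))))


# Hypothetical overall distances for five instances.
layout = arrange_by_relevance([3.0, 0.0, 1.5, 0.2, 9.0])
print(layout)
```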

Page 44

Query-Dependent Pixel-based Techniques

Result of a complex query [KeK1994] (figure; one panel is labelled “Overall Distance”)

Page 45

Examples: Worlds within Worlds (http://www.cs.columbia.edu/graphics/projects/AutoVisual/AutoVisual.html)

Employs virtual reality devices to represent an nD virtual world in 3D or 4D hyperworlds.

- The basic approach to reducing the complexity of a multidimensional function is to hold one or more of its independent variables constant.
- This is equivalent to taking an infinitely thin slice of the world perpendicular to the constant variable’s axis.
- The slicing can be repeated until there are 3 dimensions, and the resulting slice can be manipulated and displayed with conventional 3D graphics hardware.
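A minimal sketch of the slicing step only (the nested 3D rendering is not shown), assuming the data is available as a function of n independent variables; slice_function and the example f are hypothetical. Holding variables constant leaves a function of the remaining ones, and once only two remain, z = g(x1, x2) is a surface that fits in a 3D world.

```python
def slice_function(f, n_vars, fixed):
    """Hold some of f's independent variables constant.

    `fixed` maps argument positions to constant values. Returns the sliced
    function (taking the remaining "free" arguments in order) and the list
    of free positions.
    """
    free_positions = [i for i in range(n_vars) if i not in fixed]

    def sliced(*free_values):
        if len(free_values) != len(free_positions):
            raise ValueError(f"expected {len(free_positions)} free values")
        args = [None] * n_vars
        for i, v in fixed.items():
            args[i] = v
        for i, v in zip(free_positions, free_values):
            args[i] = v
        return f(*args)

    return sliced, free_positions


# Hypothetical function of four variables: its graph lives in 5D.
def f(x1, x2, x3, x4):
    return x1 * x2 + x3 - x4


# Hold x3 = 1.0 and x4 = 0.5 constant; what remains, z = g(x1, x2), is a
# surface that ordinary 3D graphics can draw.
surface, free = slice_function(f, 4, {2: 1.0, 3: 0.5})
grid = [[surface(x1, x2) for x2 in (0, 1, 2)] for x1 in (0, 1, 2)]
print(free, grid)
```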

Page 46

Examples: Worlds within Worlds (http://www.cs.columbia.edu/graphics/projects/AutoVisual/AutoVisual.html)

After reducing the higher-dimensional space to 3 dimensions, the additional dimensions can be added back by adding additional 3D worlds within the first 3D world.

Page 47

Worlds within Worlds

Page 48

Dynamic Techniques

- Allow interaction with the visualization to explore the data more effectively
- Can potentially be applied to all visualization techniques
- Examples:
  - Dynamic linking of the data attributes to the parameters of the visualization
  - Filtering
  - Linking and “brushing” between multiple visualizations
  - Zooming
  - Details on demand
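Two of the listed dynamic techniques, filtering and linking/brushing, can be sketched independently of any drawing code. This is a minimal illustration with a hypothetical LinkedViews class and range_filter function (not taken from any particular tool): brushing selects records in one view, and every other view highlights the same records.

```python
class LinkedViews:
    """Share one selection between several views (linking and brushing).

    A view is identified by the features on its axes; brushing in any view
    updates the shared selection, and every view can ask which of its
    points to highlight.
    """

    def __init__(self, records):
        self.records = records
        self.selected = set()

    def brush(self, predicate):
        """Select the records satisfying predicate, e.g. those inside a brushed rectangle."""
        self.selected = {i for i, r in enumerate(self.records) if predicate(r)}

    def highlighted(self, view_features):
        """Coordinates, in the view with the given axes, of the selected records."""
        return [tuple(self.records[i][f] for f in view_features)
                for i in sorted(self.selected)]


def range_filter(records, ranges):
    """Dynamic query filter: keep records whose values lie inside every (low, high) range."""
    return [r for r in records
            if all(low <= r[f] <= high for f, (low, high) in ranges.items())]


records = [{"x": 1, "y": 5, "z": 0}, {"x": 4, "y": 2, "z": 7}, {"x": 2, "y": 6, "z": 1}]
views = LinkedViews(records)
# Brush a rectangle (x <= 2, y >= 5) in the x-y view ...
views.brush(lambda r: r["x"] <= 2 and r["y"] >= 5)
# ... and the linked x-z view highlights the same two records.
print(views.highlighted(("x", "z")))
# Moving a slider on z to [0, 1] filters the display down to matching records.
print(range_filter(records, {"z": (0, 1)}))
```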

Page 49

Other Techniques

- Keim and Kriegel’s query-independent approach
- Chernoff faces (http://www.fas.harvard.edu/~stats/Chernoff/Hcindex.htm)
- Cone trees
- Perspective walls
- Visualization Spreadsheet
- A number of techniques especially developed for web pages and their links

Page 50

Web References

More lectures and demo software available at: http://www.cs.auc.dk/·DVDM/courses.html