The Role of Visualization in Data Mining
Björn Gustafsson, Jonas Gustafsson and Ragnar Hammarqvist
Why use Information Visualization in Data Mining?
Information visualization (lexicon definition): A method of presenting data or information in non-traditional, interactive graphical forms. By using 2-D or 3-D color graphics and animation, these visualizations can show the structure of information, allow one to navigate through it, and modify it with graphical interactions.
The problem is finding valuable information hidden in raw data (monitoring systems, credit card transactions and so on).
Visual exploration allows faster data exploration and often gives better results than automatic data mining algorithms alone.
The idea of visual exploration in data mining is to present the data in visual form so that the user can gain insight into it and interact with it directly.
The Visual Exploration Paradigm
1. overview first
2. zoom and filter
3. details-on-demand
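The three steps can be illustrated on a tiny table. This is only a sketch: the records, the field layout and the function names are hypothetical and chosen for illustration, and a real tool would perform each step graphically rather than on plain lists.

```python
from statistics import mean

# Hypothetical records: (customer id, age, monthly spend)
records = [
    (1, 23, 120.0),
    (2, 35, 340.0),
    (3, 41, 95.0),
    (4, 29, 410.0),
    (5, 52, 280.0),
]


def overview(rows):
    """Step 1, overview first: summarise the whole data set."""
    spends = [r[2] for r in rows]
    return {
        "count": len(rows),
        "min": min(spends),
        "max": max(spends),
        "mean": mean(spends),
    }


def zoom_and_filter(rows, min_spend):
    """Step 2, zoom and filter: keep only the subset of interest."""
    return [r for r in rows if r[2] >= min_spend]


def details_on_demand(rows, record_id):
    """Step 3, details-on-demand: return the full record for one item."""
    for r in rows:
        if r[0] == record_id:
            return {"id": r[0], "age": r[1], "spend": r[2]}
    return None


if __name__ == "__main__":
    print(overview(records))
    subset = zoom_and_filter(records, min_spend=300.0)
    print(subset)
    print(details_on_demand(subset, record_id=4))
```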
Visualization Techniques
Geometrically transformed displays
Iconic displays
Dense Pixel Displays
Stacked Displays
Geometrically transformed displays
Example: Parallel Coordinates
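A minimal sketch of the geometric transformation behind parallel coordinates: each dimension becomes a vertical axis, each value is scaled to a height between 0 and 1 on its axis, and each record becomes a polyline across the axes. The function name and the choice of min-max scaling are assumptions made for this illustration; drawing the polylines is left to a plotting library.

```python
def parallel_coordinates(rows):
    """Map each row to a polyline of (axis_index, height) points.

    Heights are min-max scaled per axis to the range [0, 1]. An axis on
    which all values are equal is drawn at mid-height (0.5).
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    polylines = []
    for row in rows:
        line = []
        for axis, value in enumerate(row):
            span = highs[axis] - lows[axis]
            height = (value - lows[axis]) / span if span else 0.5
            line.append((axis, height))
        polylines.append(line)
    return polylines


if __name__ == "__main__":
    data = [(1.0, 10.0, 5.0), (2.0, 30.0, 5.0), (3.0, 20.0, 5.0)]
    for line in parallel_coordinates(data):
        print(line)
```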
Iconic displays
Example: Chernoff faces and star icons.
Dense Pixel Displays
Example: Tree map.
Stacked Displays
Example: Table lens.
Interaction and Distortion
Dynamic Projections
Interactive filtering
Interactive distortion
Interactive Linking and Brushing
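Linking and brushing can be sketched without any graphics: a brush selects records by a value range in one view, and every other view of the same records highlights the same selection. The data, column names and function names below are hypothetical, and the sketch assumes both views share the record index.

```python
def brush(rows, column, low, high):
    """Return the indices of rows whose value in `column` lies in [low, high]."""
    return {i for i, row in enumerate(rows) if low <= row[column] <= high}


def linked_view(rows, column, selected):
    """Return (value, highlighted) pairs for another view of the same rows."""
    return [(row[column], i in selected) for i, row in enumerate(rows)]


if __name__ == "__main__":
    # Hypothetical records shown in two views: one by age, one by income.
    people = [
        {"age": 25, "income": 30},
        {"age": 40, "income": 55},
        {"age": 33, "income": 48},
        {"age": 61, "income": 20},
    ]
    selected = brush(people, "age", 30, 45)         # brush in the age view
    print(linked_view(people, "income", selected))  # highlight in the income view
```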
Classification
Data type to be visualized
Visualization technique
Interaction and distortion
Example: (multi-dimensional, Iconic Display, Distortion)
Exploratory Data Analysis, EDA
Not hypothesis testing.
Find systematic relations between variables when there are no expectations of what the result might be.
Computational EDA and Graphical EDA techniques.
Computational EDA
Basic statistical exploratory methods
Multivariate exploratory analysis
Neural Networks
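As a sketch of the most basic computational EDA step, the code below computes pairwise Pearson correlations between variables without assuming in advance which pairs are related. The variable names and values are made up for the illustration, and Pearson correlation only captures linear relations.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(table):
    """Correlate every pair of variables in a {name: values} table."""
    names = list(table)
    return {(a, b): pearson(table[a], table[b]) for a in names for b in names}


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    table = {
        "height": [150, 160, 170, 180, 190],
        "weight": [52, 58, 69, 77, 88],
        "noise": [3, 1, 4, 1, 5],
    }
    for pair, r in correlation_matrix(table).items():
        print(pair, round(r, 2))
```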
Graphical EDA techniques
Brushing
Other techniques
Verification of results of EDA
EDA is only a first stage of analysis
In a second stage the results need to be confirmed, for example on a different data set
Visualizing Data Mining models
Extracting information from a database that the user did not already know about.
To be able to do this we need to understand the user’s needs and design the visualization accordingly.
The two major driving forces behind visualizing data mining models are understanding and trust.
Understanding leads to trust.
Trust
There are many ways to assess trust, for example:
The model should not violate expected qualitative principles, given general knowledge of the domain. Example of a violation: finding a correlation between shoe size and IQ.
Domain knowledge is also critical for outlier detection. If you know that valid values lie between 10 and 50, a value outside that range makes no sense and should be flagged.
Measure the model's trustworthiness in some way, for example with a quantified measure of variance.
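Two of these checks are easy to sketch in code. The first flags values outside a known domain, using the 10 to 50 range from the example above. The second quantifies how much a model's score varies across repeated evaluations; the scores are hypothetical and sample variance is just one possible measure.

```python
from statistics import variance


def out_of_domain(values, low=10, high=50):
    """Return the values that fall outside the known domain [low, high]."""
    return [v for v in values if not (low <= v <= high)]


def score_variance(scores):
    """Sample variance of a model's scores over repeated evaluations.

    A low variance suggests the model behaves consistently; it does not
    by itself show that the model is correct.
    """
    return variance(scores)


if __name__ == "__main__":
    print(out_of_domain([12, 49, 7, 50, 63]))
    # Hypothetical accuracy scores from five evaluation runs.
    print(score_variance([0.81, 0.79, 0.83, 0.80, 0.82]))
```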
Understanding
There are three components for understanding a model:
Representation
Interaction
Integration
Comparing methods
Models can be compared using three approaches:
Input/output
Algorithms
Processes
Bad examples of Information Visualization
Why all the graphics?
Sufficient and appropriate
What does it display?
This is what it tried to display (left), but it can still be distorted (right).
Based on the assumption that happiness should be linearly related to GNP.
Current research ITN
VITA research:
Goal: Improving existing visual user interfaces (VUI) methods.
Mission: Discover and create tools and technologies:
1. Aid human analytical reasoning
2. State-of-the-art visual representations and interaction techniques
3. Effectively communicate analytical understanding to a wide variety of users
Example application: geovisualization application GeoWizard developed with the GAV framework
Current research ITN: Jimmy Johansson
Parallel Coordinates in Information Visualization
Transfer functions: linear, quadratic, square root and logarithmic.
Parallel Coordinates in 3D.
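The four transfer functions named above can be sketched as mappings from a normalized input in [0, 1] to an output in [0, 1]. The slides do not say what is being mapped; this sketch assumes a normalized line density mapped to a display intensity. The exact logarithmic form is also an assumption, picked so that 0 maps to 0 and 1 maps to 1.

```python
from math import log10, sqrt

# Each function maps a normalized density in [0, 1] to an intensity in [0, 1].
TRANSFER_FUNCTIONS = {
    "linear": lambda d: d,
    "quadratic": lambda d: d * d,               # suppresses low densities
    "square_root": sqrt,                        # boosts low densities
    "logarithmic": lambda d: log10(1 + 9 * d),  # assumed form; boosts low densities more
}


def apply_transfer(density, name):
    """Clamp `density` to [0, 1] and apply the named transfer function."""
    clamped = min(1.0, max(0.0, density))
    return TRANSFER_FUNCTIONS[name](clamped)


if __name__ == "__main__":
    for name in TRANSFER_FUNCTIONS:
        print(name, [round(apply_transfer(d, name), 3) for d in (0.0, 0.25, 0.5, 1.0)])
```

The quadratic function emphasizes dense regions, while the square root and logarithmic functions make sparse regions easier to see.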
References
[1] Daniel Keim, "Information Visualization and Visual Data Mining", IEEE Transactions on Visualization and Computer Graphics, vol. 8, no. 1, January-March 2002.
[2] Michael Friendly, "Gallery of Data Visualization", http://www.math.yorku.ca/SCS/Gallery/, April 2007.
[3] Kurt Thearling, Barry Becker among others, "Visualizing Data Mining Models", http://www.thearling.com/text/dmviz/modelviz.htm, April 2007.
[4] Statsoft, "Exploratory Data Analysis", http://www.statsoft.com/textbook/stdatmin.html#eda, April 2007.
[5] Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining, Addison-Wesley, 2006.