IPIAC

Multidimensional data processing

Parallel Coordinates

• an orthogonal coordinate system uses up the plane very fast
• geometrical transformation
• unlike the previously mentioned methods
• has other uses than just visualization
• low representational complexity: O(N) for N variables, whereas a scatter plot array has O(N^2)
• equidistant parallel axes
• same positive orientation
• each axis has its own scale – no normalization is performed
• values need not be numeric
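The construction described above can be sketched in a few lines. This is only an illustration, not the method used in the slides: the function name `polyline`, the per-axis `(lo, hi)` ranges and the axis spacing `d` are assumptions made for the example. Each axis keeps its own scale, so a value is only mapped to a height on its own axis.

```python
def polyline(point, axis_ranges, d=1.0):
    """Return the (x, y) vertices of the polyline drawn for one N-dimensional point.

    point       -- sequence of N numeric values
    axis_ranges -- sequence of N (lo, hi) pairs, one independent scale per axis
    d           -- distance between neighbouring axes (assumed constant)
    """
    vertices = []
    for i, (value, (lo, hi)) in enumerate(zip(point, axis_ranges)):
        # Each axis has its own scale: lo is drawn at height 0, hi at height 1.
        y = (value - lo) / (hi - lo)
        # Axes are equidistant: the i-th axis stands at x = i * d.
        vertices.append((i * d, y))
    return vertices


# A 3-dimensional point becomes a polyline with three vertices.
print(polyline((5, 100, -2), [(0, 10), (0, 400), (-4, 4)]))
```

Non-numeric values could be handled the same way by first mapping each category to a position on its axis.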

Fundamental duality

• point-line duality
• a point in R^N is represented by a (polygonal) line in the projective plane
• a line in R^2 is represented by a point in the projective plane
• the line x2 = m·x1 + b is represented by the point (d/(1 − m), b/(1 − m))
• d is the distance between the parallel axes; it is directed, but otherwise arbitrary
• for m = 1 the lines representing the points of that line are parallel, with slope b/d
• a plane in R^3 is represented by lines
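The point-line duality can be checked numerically. The sketch below assumes two parallel axes placed at horizontal positions 0 and d with the same unscaled units, and a line x2 = m·x1 + b in the original plane; the function names are hypothetical. Every point of that line becomes a segment between the axes, and all those segments should pass through the single point (d/(1 − m), b/(1 − m)).

```python
def dual_point(m, b, d=1.0):
    """Point representing the line x2 = m*x1 + b between axes at x = 0 and x = d."""
    if m == 1:
        # For m = 1 the segments are parallel (slope b/d) and never meet.
        raise ValueError("m = 1 has no finite dual point")
    return (d / (1 - m), b / (1 - m))


def segment_height_at(x1, x2, x, d=1.0):
    """Height at horizontal position x of the line through (0, x1) and (d, x2)."""
    return x1 + (x2 - x1) * x / d


m, b = -2.0, 3.0
px, py = dual_point(m, b)
for x1 in (-1.0, 0.0, 2.5):
    x2 = m * x1 + b  # a point (x1, x2) lying on the line
    # Its segment in parallel coordinates passes through the dual point.
    assert abs(segment_height_at(x1, x2, px) - py) < 1e-9
```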

[Figure: fundamental duality – the point [x1G, x2G] drawn between the parallel axes x1G and x2G]

||-coords properties

• designed to take advantage of human pattern recognition abilities
• when exploring a dataset with M items, there are 2^M possible subsets
• any of which may be interesting
• each variable is treated uniformly
• no theoretical/conceptual limit on the number of variables
• requires interactivity
• no filtering and/or projection is applied
• projection may hide information

Query types – pinch

• select intervals of different variables
• combine the limiting intervals together
• look for:
• holes, peaks, valleys, gaps
• density variations
• regularities and irregularities
• interesting for negative correlations
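A pinch query amounts to intersecting interval constraints on several variables. The following is a minimal sketch under that reading; the function name `pinch`, the dictionary-based rows and the toy values are invented for illustration and do not come from the slides.

```python
def pinch(rows, intervals):
    """Keep the rows whose values fall inside every selected interval.

    rows      -- list of dicts mapping variable name to value
    intervals -- dict mapping variable name to an inclusive (lo, hi) pair
    """
    return [
        row for row in rows
        if all(lo <= row[name] <= hi for name, (lo, hi) in intervals.items())
    ]


# Made-up toy data, only to show the call.
cars = [
    {"mpg": 18, "weight": 3500},
    {"mpg": 31, "weight": 2100},
    {"mpg": 27, "weight": 2900},
]
selected = pinch(cars, {"mpg": (25, 35), "weight": (2000, 2500)})
print(selected)
```

In an interactive tool the intervals would come from the ranges the user brushes on the axes.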

Query types – angle query

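• select lines with a given angle in ||-coords space
• the point representing a line with slope m lies
• between the axes for m < 0 → negative correlation
• to the right of the axes for 0 < m < 1 → positive correlation
• to the left of the axes for m > 1 → positive correlation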
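Both ideas on this slide can be sketched briefly. The code assumes two adjacent axes at horizontal positions 0 and d sharing the same units, and that a line x2 = m·x1 + b is represented by a point with horizontal position d/(1 − m); all function names are hypothetical.

```python
import math


def segment_angle(x1, x2, d=1.0):
    """Angle in degrees of the segment joining (0, x1) and (d, x2)."""
    return math.degrees(math.atan2(x2 - x1, d))


def angle_query(pairs, lo_deg, hi_deg, d=1.0):
    """Select the (x1, x2) pairs whose segment angle lies in [lo_deg, hi_deg]."""
    return [p for p in pairs if lo_deg <= segment_angle(p[0], p[1], d) <= hi_deg]


def dual_region(m, d=1.0):
    """Where the point representing a line with slope m lies relative to the axes."""
    if m == 1:
        return "at infinity"
    x = d / (1 - m)
    if 0 < x < d:
        return "between the axes"      # m < 0, negative correlation
    if x > d:
        return "right of the axes"     # 0 < m < 1, positive correlation
    if x < 0:
        return "left of the axes"      # m > 1, positive correlation
    return "on an axis"                # m = 0 gives x = d


for slope in (-1.0, 0.5, 3.0):
    print(slope, dual_region(slope))
```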

Variable order

• unfortunately, ||-coords depend on the ordering of the variables
• unlike with scatter plot combinations, only adjacent combinations need to be tested
• each ordering is represented by a Hamiltonian path
• M permutations for N = 2M (even) or M + 1 permutations for N = 2M + 1 (odd) are required
• that is, the number of combinations which need to be tested is O(N/2)
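One way to obtain such orderings is the classic zigzag decomposition of a complete graph into Hamiltonian paths; the slides do not say which construction they use, so this is a swapped-in sketch with hypothetical function names. For an even number of variables it rotates one zigzag path; for an odd number it builds the paths for one extra padding variable and then drops it.

```python
from itertools import combinations


def zigzag(n):
    """Zigzag path 0, 1, n-1, 2, n-2, ... over the vertices 0..n-1."""
    path = [0]
    step = 1
    while len(path) < n:
        path.append(step % n)
        if len(path) < n:
            path.append(-step % n)
        step += 1
    return path


def orderings(n):
    """Axis orderings in which every pair of the n variables is adjacent at least once."""
    m = n + (n % 2)  # round an odd n up to the next even number
    base = zigzag(m)
    paths = [[(v + k) % m for v in base] for k in range(m // 2)]
    # For odd n, drop the padding vertex (index n) from every path.
    return [[v for v in p if v < n] for p in paths]


def covers_all_pairs(paths, n):
    """Check that each unordered pair of variables is adjacent in some path."""
    seen = {frozenset(pair) for p in paths for pair in zip(p, p[1:])}
    return all(frozenset(c) in seen for c in combinations(range(n), 2))


for n in (4, 6, 7):
    perms = orderings(n)
    print(n, len(perms), covers_all_pairs(perms, n))
```

The number of orderings produced is N/2 for even N and (N + 1)/2 for odd N, which matches the counts stated above.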

Variable combinations

Parallel coordinates

• uses different geometry
• needs a mind shift
• data mining
• offers much more than just data mining

Good visualizations

• preserve information – the dataset may be fully reconstructed from the visualization
• reveal multivariate relations
• treat each variable uniformly
• are not limited by the number of dimensions
• have low complexity – low computational cost of constructing the visualization
• are invariant to translation, rotation and scaling
• have a mathematical/algorithmic background – ensure unambiguity

Sparkline (2004)

• typically a small, intense line chart
• without axes, coordinates, frames
• shows only important information (trend)
• word-sized; the graphic is no longer separated from the text
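A word-sized chart can be approximated in plain text with Unicode block characters. This is only a rough sketch of the idea: it draws bars rather than a true line, and the function name `sparkline` is an assumption made for the example.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a sequence of numbers as one block character per value."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for a flat series
    return "".join(
        BLOCKS[round((v - lo) / span * (len(BLOCKS) - 1))] for v in values
    )


# The result can sit inline in a sentence, with no axes or frame.
print("trend:", sparkline([3, 4, 6, 5, 8, 9, 7, 12]))
```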

Gapminder (2005)

• originally a moving bubble chart
• moving bar chart
• moving line chart
• designed to show variable changes over time
• acquired by Google in 2007
• available as Google Motion Chart
• part of Google Chart Tools
• https://google-developers.appspot.com/chart/
• http://www.gapminder.org/

Conclusion

• visualizations are no longer passive images
• interactivity enables us to create completely new types of visualizations
• it’s not just mouse-over text
• it is still important to maintain the properties of good visualizations
• otherwise the visualization may become useless
• although visually pleasing
• is the pie chart dead?