FODAVA-Lead Education, Community Building, and Research: Dimension Reduction and Data Reduction:...

Education, Community Building, and Research: Dimension Reduction and Data Reduction: Foundations for Interactive Visualization Haesun Park School of Computational Science and Engineering Georgia Institute of Technology FODAVA Review Meeting, Dec. 9, 2010


FODAVA-Lead Education, Community Building, and Research: Dimension Reduction and Data Reduction: Foundations for Interactive Visualization

Haesun Park

School of Computational Science and Engineering

Georgia Institute of Technology

FODAVA Review Meeting, Dec. 9, 2010

Challenges in Analyzing High Dimensional Massive Data on Visual Analytics System

• Screen Space and Visual Perception: low dim and number of available pixels are fundamentally limiting constraints

• High dimensional data: Effective dimension reduction

• Large data sets: Informative representation of data

• Speed: necessary for real-time, interactive use

• Scalable algorithms

• Adaptive algorithms

Development of Fundamental Theory and Algorithms in Data Representations and Transformations to enable Visual Understanding

• Dimension Reduction

– Dimension reduction with prior info/interpretability constraints

– Manifold learning

• Informative Presentation of Large Scale Data

– Sparse recovery by L1 penalty

– Clustering, semi-supervised clustering

– Multi-resolution data approximation

• Fast Algorithms

– Large-scale optimization/matrix decompositions

– Adaptive updating algorithms for dynamic and time-varying data, and interactive vis.

• Data Fusion

– Fusion of different types of data from various sources

– Fusion of different uncertainty levels

• Integration with DAVA systems

– Testbed, Jigsaw, iVisClassifier, iVisClustering, ..

FODAVA-Lead Research Topics

FODAVA-Lead Research Presentation

• H. Park – Overview of the FODAVA-lead research, FODAVA Test-bed; Two stage method for 2D/3D representation of clustered data, InteractiveVisualClassifier, InteractiveVisualClustering, Info space alignments for information fusion (multi-language document analysis)

• A. Gray – Nonlinear dimension reduction (manifold learning), Fast computation of neighborhood graphs, Fast optimizations for SVMs

• V. Koltchinskii – Low rank matrix estimation and kernel learning on graphs, Sparse recovery, Multiple kernel learning and fusion of data with heterogeneous types (multi language document analysis)

• J. Stasko – Improved analytical capabilities in JIGSAW, Interplay between math/comp and interactive visualization

• R. Monteiro – Sparse Principal Component Analysis and Feature selection based on L1 regularized optimization (POSTER)

FODAVA Research Test Bed for High Dimensional Massive Data

• Open source software

• Integrates foundational results from FODAVA teams as well as other widely utilized methods (e.g. PCA)

• Easily accessible to a wide community of researchers

• Makes methods/algorithms readily available to VA research community and relevant to applications

• Identifies effective methods for specific problems (evaluation)

• A base for specialized VA systems (e.g. iVisClassifier, iVisClustering)

Figure labels: FODAVA Fundamental Research, Applications, Test Bed

Vector Rep. of Raw Data

• Text

• Image

• Audio …

Informative Representation and Transformation

Visual Representation

• Dimension Reduction (2D/3D)

• Temporal Trend

• Uncertainty

• Anomaly/Outlier

• Causal relationship

• Zoom in/out by dynamic updating …

• Clustering

• Summarization

• Regression

• Multi-Resolution Data Reduction

• Multiple Kernel Learning …

Label

Similarity

Density

Missing value …

Interactive Analysis


Modules in FODAVA Test Bed

iVisClassifier [VAST10]

(J. Choo, H. Lee, J. Kim, HP)

Interactive visual classification system using supervised dimension reduction

– Biometric recognition

– Text classification

– Search space reduction

iVisClustering (H. Lee, J. Kihm, J. Choo, J. Stasko, HP)

Interactive visual clustering system using topic modeling (LDA) for text clustering

Two-stage Linear Discriminant Analysis for 2D/3D Representation of Clustered Data and Computational Zooming in/out [VAST09, J. Choo, S. Bohn, HP]

$\max\,(G^T S_b G)$, $\min\,(G^T S_w G)$

& $\max\,\operatorname{trace}\left((G^T S_w G)^{-1}(G^T S_b G)\right)$

• Regularization in LDA

Figure panels: Small regularization, Large regularization
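The trace criterion and the regularization bullet above can be illustrated with a small NumPy sketch. It implements plain regularized LDA, not the two-stage method itself. The slides do not state the exact regularizer, so replacing S_w by S_w + gamma*I is an assumption (it is one common choice), and the function names and synthetic data are hypothetical.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return within-cluster (Sw) and between-cluster (Sb) scatter matrices."""
    d = X.shape[1]
    global_mean = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        centered = Xc - Xc.mean(axis=0)
        Sw += centered.T @ centered
        diff = (Xc.mean(axis=0) - global_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb

def trace_criterion(G, Sw, Sb):
    """Evaluate trace((G^T Sw G)^{-1} (G^T Sb G))."""
    return float(np.trace(np.linalg.solve(G.T @ Sw @ G, G.T @ Sb @ G)))

def regularized_lda(X, y, dim=2, gamma=1.0):
    """Rank-`dim` LDA basis G, using Sw + gamma*I (assumed regularizer)."""
    Sw, Sb = scatter_matrices(X, y)
    Sw_reg = Sw + gamma * np.eye(X.shape[1])
    L = np.linalg.cholesky(Sw_reg)                    # Sw_reg = L @ L.T
    M = np.linalg.solve(L, np.linalg.solve(L, Sb).T)  # L^{-1} Sb L^{-T}
    M = (M + M.T) / 2                                 # remove round-off asymmetry
    _, V = np.linalg.eigh(M)                          # eigenvalues come back ascending
    top = V[:, ::-1][:, :dim]                         # eigenvectors of the largest ones
    return np.linalg.solve(L.T, top)                  # G = L^{-T} top

# Synthetic data (illustrative only): 3 clusters of 30 points in 10 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=3.0, size=(3, 10))
y = np.repeat(np.arange(3), 30)
X = centers[y] + rng.normal(size=(90, 10))

G = regularized_lda(X, y, dim=2, gamma=1.0)
Y = X @ G  # 2D coordinates that could be drawn as a scatter plot
```

Increasing `gamma` makes the within-cluster term closer to a multiple of the identity, which is one way to reproduce the small versus large regularization contrast shown in the figure.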

2D Visualization of Clustered Image and Audio Data

Figure panels: Spoken Letters (Audio) and Handwritten Digits (Image), each shown with PCA and with Rank-2 LDA

iVisClassifier: Computational Zoom-in

LDA scatter plot, Cluster level PC, Bases view and Heat Map

Applying LDA recursively on the selected subset of data

iVisClassifier: Cooperative Filtering (Poster and Demo)

Utilizing brushing-and-linking

Fusion based on Information Space Alignment (J. Choo, S. Bohn, G. Nakamura, A. White, HP)

• Want: Unified vector representations of heterogeneous data sets

• Utilize: Reference correspondence information between data pairs, cluster correspondence, etc.

• Multi-lingual iVisClassifier

Two conflicting criteria: maximize alignment and minimize deformation

Figure panels: Data set A (English), Data set B (Spanish), Fused data sets

Existing methods: Constrained Laplacian Eigenmap, Parafac2, Procrustes analysis, …
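Of the existing methods listed, Procrustes analysis has a simple closed form, sketched here with NumPy. The sketch assumes both data sets already live in the same number of dimensions and that rows of `A_ref` and `B_ref` are matched reference pairs. It handles only a rotation/reflection, and all names and data are hypothetical.

```python
import numpy as np

def procrustes_rotation(A_ref, B_ref):
    """Orthogonal Q minimizing ||B_ref @ Q - A_ref||_F (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(B_ref.T @ A_ref)
    return U @ Vt

# Toy reference pairs: B is an exactly rotated copy of A (illustrative only).
rng = np.random.default_rng(0)
A_ref = rng.normal(size=(20, 3))
Q_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
B_ref = A_ref @ Q_true.T

Q = procrustes_rotation(A_ref, B_ref)
B_aligned = B_ref @ Q  # B mapped into A's coordinate system
```

Because the map is rigid, this approach has zero deformation by construction, which limits how well it can align data sets whose geometry genuinely differs.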

Graph Embedding Approach

1. Represent each data matrix as a graph

2. Add zero-length edges between reference point pairs

3. Apply graph embedding algorithm

Figure labels: Data sets, Similarity graph, Fused data, Matrix representation of graphs

e.g., Nonmetric multidimensional scaling (preserving rank order of distances)

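$$\min \sum \left(d^f_A(i,j)-\dot{d}_A(i,j)\right)^2 + \sum \left(d^f_B(i,j)-\dot{d}_B(i,j)\right)^2 + \mu \sum \left(d^f_{AB}(r,r)-\dot{d}_{AB}(r,r)\right)^2$$

subject to $\dot{d}_{AB}(r,r) < \dot{d}_A(i,j)$, $\dot{d}_{AB}(r,r) < \dot{d}_B(i,j)$ for $1 \le r \le R$ and $i \ne j$; $\dot{d}$: rank orders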

(POSTER)
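The three graph embedding steps above can be sketched as follows. This is a simplified stand-in, not the method from the poster: it uses k-nearest-neighbor graphs for step 1 and classical (metric) MDS on shortest-path distances for step 3 instead of nonmetric MDS. Function names, parameters, and toy data are hypothetical.

```python
import numpy as np

def knn_graph(X, k):
    """Edge-length matrix of a symmetrized k-NN graph (inf means no edge)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = np.full_like(D, np.inf)
    nearest = np.argsort(D, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    for i, nbrs in enumerate(nearest):
        W[i, nbrs] = D[i, nbrs]
        W[nbrs, i] = D[i, nbrs]
    np.fill_diagonal(W, 0.0)
    return W

def fuse_by_graph_embedding(A, B, ref_pairs, k=3, dim=2):
    nA, nB = len(A), len(B)
    n = nA + nB
    # Step 1: one graph per data set, placed in a joint edge-length matrix.
    W = np.full((n, n), np.inf)
    W[:nA, :nA] = knn_graph(A, k)
    W[nA:, nA:] = knn_graph(B, k)
    # Step 2: zero-length edges between reference point pairs.
    for i, j in ref_pairs:
        W[i, nA + j] = W[nA + j, i] = 0.0
    # Step 3a: all-pairs shortest paths (Floyd-Warshall).
    for m in range(n):
        W = np.minimum(W, W[:, [m]] + W[[m], :])
    if not np.isfinite(W).all():
        raise ValueError("graph is disconnected; increase k or add reference pairs")
    # Step 3b: classical MDS on the shortest-path distances.
    J = np.eye(n) - 1.0 / n
    gram = -0.5 * J @ (W ** 2) @ J
    evals, evecs = np.linalg.eigh(gram)
    top = np.argsort(evals)[::-1][:dim]
    Y = evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))
    return Y[:nA], Y[nA:]

# Toy data: the same 30 items described in a 2D space (A) and a 3D space (B).
t = np.linspace(0.0, np.pi, 30)
A = np.c_[np.cos(t), np.sin(t)]
B = np.c_[t, t ** 2, 0.5 * t]
ref_pairs = [(i, i) for i in range(0, 30, 3)]  # every third item is a known pair

YA, YB = fuse_by_graph_embedding(A, B, ref_pairs, k=3, dim=2)
```

With zero-length edges, each reference pair has identical shortest-path distances to every other point, so the two members land on the same coordinates in the fused space.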

Evaluation: Cross-domain Retrieval

Figure panels: English-Spanish Documents; Document (Eng)-Phoneme Data. Plotted measures: Deformation and Alignment, each against K in K-NN in fused space. Methods compared: Parafac2, Nonmetric MDS, Metric MDS, Laplacian Eig., Procrustes
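The slides do not define how the cross-domain retrieval scores were computed. One plausible K-NN based measure, given here purely as an assumption, is the fraction of known pairs whose true counterpart in the other data set is among the K nearest neighbors in the fused space. The function name and toy data are hypothetical.

```python
import numpy as np

def cross_domain_recall_at_k(YA, YB, pairs, k):
    """Fraction of pairs (i, j) where YB[j] is among the k nearest B points to YA[i]."""
    hits = 0
    for i, j in pairs:
        dists = np.linalg.norm(YB - YA[i], axis=1)
        nearest = np.argsort(dists)[:k]
        hits += int(j in nearest)
    return hits / len(pairs)

# Toy fused coordinates: B is a slightly perturbed copy of A (illustrative only).
rng = np.random.default_rng(0)
YA_toy = rng.normal(size=(50, 2))
YB_toy = YA_toy + 0.05 * rng.normal(size=(50, 2))
pairs_toy = [(i, i) for i in range(50)]

recall_by_k = {k: cross_domain_recall_at_k(YA_toy, YB_toy, pairs_toy, k)
               for k in (1, 5, 10)}
```

A score like this can only stay the same or increase as K grows, so comparing methods at small K is the more demanding test.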

Summary / Future Research

• Informative 2D/3D Representation of Data

– Clustered Data: Two-stage dimension reduction methods effective for a wide range of problems

– Interpretable Dimension Reduction for nonnegative data: NMF

– Customized Fast Algorithms for 2D/3D Reduction needed

– Dynamic Updating methods for Efficient and Interactive Visualization

• Visual Analytic Methods for Foundational Problems

– Classification

– Information Fusion by Space Alignment

– Clustering

• Information Fusion via Space Alignment

• FODAVA Research Test bed and VA System Development

• Sparse methods with L1 regularization

– Sparse Solution for Regression

– Sparse PCA (with Renato Monteiro)
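As a small illustration of a sparse solution for regression with L1 regularization, the sketch below minimizes 0.5*||Xw - y||^2 + lam*||w||_1 by iterative soft-thresholding (ISTA). ISTA is a standard textbook solver chosen for brevity; the slides do not say which algorithm the FODAVA work uses, and the names and synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal map of t*||.||_1)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize 0.5*||X w - y||^2 + lam*||w||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w

def lasso_objective(X, y, w, lam):
    return 0.5 * np.sum((X @ w - y) ** 2) + lam * np.sum(np.abs(w))

# Synthetic problem: only 3 of 20 coefficients are nonzero (illustrative only).
rng = np.random.default_rng(0)
X_reg = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[[0, 5, 10]] = [3.0, -2.0, 1.5]
y_reg = X_reg @ w_true + 0.01 * rng.normal(size=100)

w_hat = lasso_ista(X_reg, y_reg, lam=5.0)
```

The L1 penalty drives small coefficients exactly to zero, at the cost of slightly shrinking the ones it keeps.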