Multidimensional Projections and Tree-based Techniques for Visualization and Mining

Rosane Minghim [email protected]

Closing Conference: Statistical and Computational Analytics for Big Data
Dalhousie University – Jun 12 2015


Where


Our research
• Visual Data Mining
• Visual Analytics
• Visualization
• Applications


Techniques
• Point placement: 2D or 3D similarity-based layouts
• Built from pairwise distances and/or a dimensional embedding (feature space)

[Figure: example table of numeric data values]

Techniques
• Projection-based
  • variations on MDS or other dimension reduction approaches
  • data mapped to low-dimensional visual space
  • preserving distances vs. neighborhoods, global vs. local control, segregation
  • fully interactive manipulation, dynamically adapting to user feedback
  • massive data, sparse high-dimensional data, streaming data
• Tree-based
  • hierarchy of similarity relations
  • variations on tree layouts
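
As a minimal sketch of the projection idea above (not any of the specific techniques cited in this talk), the following pure-Python code places points in 2D by plain gradient descent on the raw MDS stress, the sum of squared differences between input distances and layout distances. The function names, the step size, the iteration count and the toy input are all illustrative assumptions.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def raw_stress(dist, layout):
    """Sum of squared differences between target and layout distances."""
    n = len(dist)
    return sum((euclidean(layout[i], layout[j]) - dist[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def random_layout(n, dims=2, seed=0):
    """Seeded random starting positions in [-1, 1]^dims."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(n)]

def mds_layout(dist, start, iters=500, step=0.05):
    """Refine a copy of `start` by gradient descent on the raw stress."""
    layout = [list(p) for p in start]
    n, dims = len(layout), len(layout[0])
    for _ in range(iters):
        grads = [[0.0] * dims for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = euclidean(layout[i], layout[j]) or 1e-12  # avoid division by zero
                coef = 2.0 * (d - dist[i][j]) / d
                for k in range(dims):
                    grads[i][k] += coef * (layout[i][k] - layout[j][k])
        for i in range(n):
            for k in range(dims):
                layout[i][k] -= step * grads[i][k]
    return layout

# Hypothetical input: four 3D feature vectors (corners of a unit square at z = 0).
features = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
dist = [[euclidean(a, b) for b in features] for a in features]
start = random_layout(len(features))
layout = mds_layout(dist, start)
```

Only the distance matrix enters the optimization, so the same routine would accept distances that do not come from a feature space at all. Plain gradient descent can stop in a local minimum and scales quadratically per iteration, which is part of what motivates the faster techniques listed in this talk.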


Techniques
• Projection-based
  • PCA
  • MDS
  • Sammon Mapping
  • Glimmer (distance)
    – Stephen Ingram, Tamara Munzner, Marc Olano: Glimmer: Multilevel MDS on the GPU. IEEE Trans. Vis. Comput. Graph. 15(2): 249-261 (2009)
  • t-SNE (segregation)
    – L.J.P. van der Maaten and G.E. Hinton. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9(Nov):2579-2605, 2008.
    – http://lvdmaaten.github.io/tsne/


LSP
• Paulovich, Nonato, Minghim, Levkowitz, Least square projection: A fast high-precision multidimensional projection technique and its application to document mapping, IEEE Trans. Visualization and Computer Graphics 2008
• based on identifying samples (control points) and their neighborhoods
• distance matrices & spatially embedded data
• preserves data neighborhoods
• few thousand data items
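
The control-point idea can be illustrated with a deliberately simplified sketch. It is not the published LSP algorithm, which solves a sparse least-squares system. Here the control points are held fixed and every other point is repeatedly moved to the centroid of its nearest neighbors until the layout settles. The function names, the neighborhood size `k` and the toy data are assumptions made for illustration.

```python
def nearest_neighbors(dist, i, k):
    """Indices of the k items closest to item i in the input distances."""
    others = [j for j in range(len(dist)) if j != i]
    return sorted(others, key=lambda j: dist[i][j])[:k]

def place_with_control_points(dist, control, k=2, sweeps=200):
    """control maps an item index to its fixed layout position."""
    n = len(dist)
    dims = len(next(iter(control.values())))
    # Free points start at the centroid of the control points.
    centroid = [sum(p[d] for p in control.values()) / len(control)
                for d in range(dims)]
    layout = [list(control[i]) if i in control else list(centroid)
              for i in range(n)]
    neighbors = [nearest_neighbors(dist, i, k) for i in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            if i in control:
                continue  # control points never move
            for d in range(dims):
                layout[i][d] = (sum(layout[j][d] for j in neighbors[i])
                                / len(neighbors[i]))
    return layout

# Hypothetical data: five items on a line, with the two end items as control points.
values = [0.0, 1.0, 2.0, 3.0, 4.0]
dist = [[abs(a - b) for b in values] for a in values]
control = {0: (0.0, 0.0), 4: (4.0, 0.0)}
layout = place_with_control_points(dist, control)
```

In this toy case the free items settle evenly between the two control points, so each item stays next to its original neighbors. A point whose neighborhood never connects to a control point would not be placed meaningfully by this simplification.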


PLMP
• Paulovich, Silva, Nonato, Two-Phase Mapping for Projecting Massive Data Sets, IEEE Trans. Visualization and Computer Graphics, 2010
• spatially embedded data, more samples than dimensions
• millions of data items
• time varying and streaming data
• reduced amount of distance information


PLP
• Paulovich, Eler, Poco, Botha, Minghim, Nonato, Piecewise Laplacian-based Projection for Interactive Data Exploration and Organization, Computer Graphics Forum 2011
• local control
• flexibility in handling user interaction: users may change the layout based on previous knowledge/perception of similarity


NJ & PNJ Trees
• Cuadros, Paulovich, Minghim, Telles, Point placement by phylogenetic trees and its application to visual analysis of document collections, IEEE VAST 2007.
• Paiva, Florian-Cruz, Pedrini, Telles, Minghim, Improved Similarity Trees and their Application to Visual Data Classification, IEEE Trans. Visualization and Computer Graphics, 2011
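
For readers unfamiliar with neighbor joining (NJ), the following is a minimal sketch of the textbook algorithm that builds a tree from a distance matrix. It is not the optimized variants described in the papers above, and the function name, the internal node names (`u0`, `u1`, ...) and the five-item distance matrix are illustrative assumptions.

```python
def neighbor_joining(names, dist):
    """Return (joins, final_edge).

    joins: list of (child_a, child_b, new_node, length_a, length_b)
    final_edge: (node_a, node_b, length) connecting the last two nodes
    """
    # Working distance table keyed by node name.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        total = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Pick the pair that minimizes the NJ criterion Q.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d[a][b] + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        u = "u%d" % len(joins)
        d[u] = {u: 0.0}
        for c in nodes:
            if c in (a, b):
                continue
            d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
        joins.append((a, b, u, len_a, len_b))
    a, b = nodes
    return joins, (a, b, d[a][b])

# Hypothetical dissimilarities between five items.
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, final_edge = neighbor_joining(names, dist)
```

The resulting joins and branch lengths define an unrooted tree whose leaves are the data items. Laying that tree out in the plane gives the similarity-tree views discussed in this talk.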


Applications

Exploratory visualization of
• images
• text: news, scientific papers, web search results
• sensor measurements
• volumetric data: vector, scalar
• social networks
• neural fibers
• particle trajectories
• time series

Eler, Nakazaki, Paulovich, Santos, Andery, Oliveira, Batista Neto, Minghim, Visual analysis of image collections, The Visual Computer, 2009


Social Networks

Co-authorship, Eurovis 2000-2010

Figure panels: betweenness; keyword 'tree'

Martins, Andery, Heberle, Lopes, Pedrini, Minghim; Multidimensional Projections for Visual Analysis of Social Networks (to appear), JCST 2012


Data from nanotech sensors & biosensors


Moraes et al., Detection of glucose and triglycerides using information visualization methods to process impedance spectroscopy data, Sensors & Actuators B, 2012

finding good sensor configurations: segregation tasks on data


Data from nanotech sensors & biosensors

• Volpati et al., Toward the optimization of an e-tongue system using information visualization: a case study with perylene tetracarboxylic derivative films in the sensing units, Langmuir, 2012

• Paulovich et al., Information visualization techniques for sensing and biosensing, Analyst, 2011

• Paulovich et al., Using multidimensional projection techniques for reaching a high distinguishing ability in biosensing. Analytical and Bioanalytical Chemistry, 2011

• Siqueira Jr. et al., Strategies to optimize biosensors based on impedance spectroscopy to detect phytic acid using layer-by-layer films, Analytical Chemistry, 2010

• Perinoto et al., Biosensors for efficient diagnosis of leishmaniasis: innovations in bioanalytics for a neglected disease, Analytical Chemistry, 2010


Studies on ecology and environment

• D.Sc. project: Visual exploration of feature spaces to support green algae taxonomic classification
• Classification based on features from images & other sources
• Collaboration with Dr. Armando Vieira, Department of Biology, UFSCar
• Time-varying images, feature extraction, representation and analysis


Open problems
• Metaphors: user interface, scalability, user control...
• Handling text
• Handling time-varying data
• Going small: portable devices
• Growing large: scalability issues
• Evaluation: user perception, quantitative & qualitative metrics
• Applications, reaching out to users: understanding their needs & tuning to specific profiles and application domains


Metaphors: clutter


Paulovich and Minghim, HiPP: a novel hierarchical point placement strategy and its application to the exploration of document collections, IEEE Trans. Visualization & Computer Graphics, 2008

Poco, Etemadpour, Paulovich, Long, Rosenthal, Oliveira, Linsen, Minghim. A framework for exploring multidimensional data with 3D projections, Computer Graphics Forum, Eurovis 2011.


Metaphors: clutter

Ersoy, Hurter, Paulovich, Cantareira, Telea, Skeleton-based edge bundling for graph visualization. IEEE Trans. Visualization and Computer Graphics, Infovis 2011


Metaphors: text, time-varying ...


Alencar, Paulovich, Börner, Oliveira, Time-aware visualization of document collections, ACM SAC 2012, MMV track


Open problems
• Metaphors: user interface, scalability, user control...
• Handling text
• Handling time-varying data
• Going small: portable devices
• Growing large: scalability issues
• Evaluation: user perception, quantitative & qualitative metrics
• Development software platform
• Recent developments and current work


Vispipeline


vicg.icmc.usp.br


Evaluation

• Numerical Evaluation

• Distance Preservation

• Neighborhood Preservation

• Segregation

• User Understanding

• Visual Explanations
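
Two of the numerical criteria above can be sketched in a few lines. The definitions below, a normalized stress for distance preservation and a k-nearest-neighbor overlap for neighborhood preservation, are common choices and an assumption here. The talk does not state which exact formulas were used, and the function names and sample points are illustrative.

```python
import math

def distance_matrix(points):
    """Pairwise Euclidean distances between coordinate tuples."""
    return [[math.dist(a, b) for b in points] for a in points]

def normalized_stress(d_high, d_low):
    """0 means layout distances equal the original distances."""
    n = len(d_high)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    num = sum((d_high[i][j] - d_low[i][j]) ** 2 for i, j in pairs)
    den = sum(d_high[i][j] ** 2 for i, j in pairs)
    return num / den

def k_nearest(d, i, k):
    """Set of the k items closest to item i under distance table d."""
    others = [j for j in range(len(d)) if j != i]
    return set(sorted(others, key=lambda j: d[i][j])[:k])

def neighborhood_preservation(d_high, d_low, k):
    """Average fraction of each item's k nearest neighbors kept in the layout."""
    n = len(d_high)
    return sum(len(k_nearest(d_high, i, k) & k_nearest(d_low, i, k)) / k
               for i in range(n)) / n

# Hypothetical check: a layout that is the original data scaled by 2.
original = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
scaled = [(2 * x, 2 * y) for x, y in original]
d_high = distance_matrix(original)
d_low = distance_matrix(scaled)
stress = normalized_stress(d_high, d_low)
preserved = neighborhood_preservation(d_high, d_low, k=2)
```

The scaled layout shows why the two criteria are reported separately: every distance is off by a factor of two, yet each point keeps exactly the same neighbors.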


Visual Coordinated exploration

Rule-based Topic Trees


More Applications – Word clouds
• Semantic ordering of keywords from projected points (new)
• Semantic filling of polygons over projections (new)

Paulovich, Telles, Toledo, Minghim, Nonato – Semantic Wordification of Document Collections, Computer Graphics Forum, Eurovis 2012


More Applications – Fiber Tracking
• Projection from fiber features
• Interaction through fast and reconfigurable projections (LAMP)
• Lines, Tubes and Surface Views

Poco, Eler, Paulovich, Minghim - Employing 2D projections for fast visual exploration of large fiber tracking data, Computer Graphics Forum, Eurovis 2012.


Plus
• Scalability
• Multiscale
• Understanding of feature spaces
• Time-varying volumes
• More evaluation
• Many more applications (molecular interactions, genome)
• Change of visual layouts
• Visual classification of images and other data


MINING MEETS VISUALIZATION

Techniques and applications

Visual strategies to support data analysis/mining tasks

Problems regarding scale of data sets


Visual Data Mining

• Dimension Reduction

• Clustering Visualization

• Labeling

• Classification: sample selection, model creation and application, evolution of models

• Cooperation UNICAMP (Campinas), UFU (Uberlândia) and UFMG (Belo Horizonte)


Semi-Supervised Dimensionality Reduction using PLS
• Sampling Procedure

Paiva, J. G. S.; Schwartz, W. R.; Pedrini, H.; Minghim, R. Semi-Supervised Dimensionality Reduction based on Partial Least Squares for Visual Analysis of High Dimensional Data. Computer Graphics Forum, 2012.


Sampling Procedure
• Automatic Clustering Sampling


Layout Evaluation - NEWS


Original – 3731 dimensions

Reduced


Layout Evaluation - NEWS


NJ Tree

ISOMAP


Visual Improvement - NEWS


PLS – OAA

PLS – MultiClass


PLS Model Reuse
• Mapping of any size of data set bearing the same features
• Fast model loading
• Growing data set mapping
• Point groups kept in the same regions of the layout: Mental Model Maintenance
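
The reason a reused model keeps groups in place can be shown with a minimal sketch. Once a linear mapping is fixed, each item's position depends only on its own features, so adding new items never moves the old ones. The class below is a hypothetical stand-in for a fitted PLS model. Its mean and weights are made-up numbers, not the output of an actual PLS fit.

```python
class LinearProjectionModel:
    """Hypothetical stand-in for a fitted projection model (for example PLS weights)."""

    def __init__(self, mean, weights):
        self.mean = mean        # per-feature mean taken from the training sample
        self.weights = weights  # one weight vector per layout axis

    def transform(self, rows):
        """Map each feature row to layout coordinates with the stored model."""
        return [tuple(sum((x - m) * w for x, m, w in zip(row, self.mean, axis))
                      for axis in self.weights)
                for row in rows]

# Made-up model: three features mapped to two layout axes.
model = LinearProjectionModel(
    mean=[1.0, 2.0, 3.0],
    weights=[[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]],
)

first_batch = [[1.0, 2.0, 3.0], [2.0, 2.0, 5.0]]
grown_set = first_batch + [[0.0, 4.0, 3.0]]  # same features, more instances

layout_small = model.transform(first_batch)
layout_grown = model.transform(grown_set)
```

Because no refitting happens between batches, the first two positions in `layout_grown` match `layout_small` exactly. A technique that recomputes the whole layout for every new batch would not offer that guarantee.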


ALL Data Set – 796 Instances


ALL Data Set – 1237 Instances


ALL Data Set – 1520 Instances


ALL Data Set – 1968 Instances


ALL Data Set – 2402 Instances


ALL Data Set – 2814 Instances


[Figure panels: ALL Data Set with 796, 1520, 2402 and 2814 instances]


Visualization for Classification

• User: important role in building, applying and adjusting classifiers
  • Knowledge of the problem
  • Insertion in the classification process
• Insertion may be more effective with better data set presentation
  • Understanding of data set structure and instance relationships
  • Detection of specificities that justify classifier behavior


Task: Classification of Unlabeled Data set


Similarity Organization


Similarity Organization


Selection of Representative Instances


Instances selected to train classification model


Classification using Created Model


Classification Results


Evaluation: Classification Results


Evaluation: Classification Results


Evaluation: Classification Results


Evaluation: Classification Results


Classification Model Upgrading
• Several upgrade strategies: Layout also works as a guide
• Example: relabeling of strategic instances: adjustment to specific scenarios
• Successive iterations: classifiers adaptation
• Insertion of user knowledge on the classification model
• Convergence to desired results
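
The relabel-and-retrain loop described above can be sketched with a toy classifier. The nearest-centroid model, the function names and the one-dimensional data are assumptions chosen for brevity, and the `truth` dictionary simulates the labels a user would supply after spotting a misclassified instance in the layout. The talk does not specify the classifiers actually used.

```python
def train_centroids(samples):
    """samples: list of (feature_tuple, label). Returns label -> centroid."""
    groups = {}
    for x, label in samples:
        groups.setdefault(label, []).append(x)
    return {label: tuple(sum(c) / len(xs) for c in zip(*xs))
            for label, xs in groups.items()}

def classify(model, x):
    """Label of the centroid closest to x (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(model[label], x)))

def accuracy(model, truth):
    """Fraction of instances whose predicted label matches the user's label."""
    return sum(classify(model, x) == y for x, y in truth.items()) / len(truth)

def upgrade(training, truth, rounds=5):
    """Each round, add one misclassified instance with its corrected label."""
    training = list(training)
    for _ in range(rounds):
        model = train_centroids(training)
        wrong = [x for x in truth if classify(model, x) != truth[x]]
        if not wrong:
            break
        training.append((wrong[0], truth[wrong[0]]))  # simulated user relabeling
    return train_centroids(training)

# Hypothetical one-feature data set with the labels a user would assign.
truth = {(0.0,): "A", (1.0,): "A", (4.0,): "A", (6.0,): "B", (7.0,): "B"}
initial = [((0.0,), "A"), ((7.0,), "B")]

before = accuracy(train_centroids(initial), truth)
after = accuracy(upgrade(initial, truth), truth)
```

In this toy run a single relabeled instance is enough to move the class boundary and classify every instance as the user expects. Real data would typically need several rounds, with the layout helping the user choose which instances to relabel.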


Misclassified Instances Relabeling


Misclassified Instances Relabeling


Classification Model Upgrading

Classification Model


Reclassification - Upgraded Model


Reclassification - Upgraded Model


Current Work: handling scalability?

The Visual Super Tree


Partners & collaborators
• Guilherme P. Telles, Hélio Pedrini, IC-UNICAMP
• William Schwartz, UFMG
• Danilo Medeiros Eler, UNESP
• Lars Linsen, Jacobs University, Germany
• Alexandru Telea, University of Groningen, the Netherlands
• Charl Botha, T.U. Delft, the Netherlands
• Haim Levkowitz, University of Massachusetts Lowell, USA
• Evangelos Milios & team – Dalhousie U., Canada
• Stan Matwin – Big Data Institute – Dalhousie U., Canada
• Osvaldo Novais de Oliveira Jr., IFSC-USP, and nBioNet research network http://www.ifsc.usp.br/nbionet/
• Armando Vieira, Biology Department, UFSCar


The people
• Fernando V. Paulovich, ICMC
• Hélio Pedrini, UNICAMP
• William Robson Schwartz, UFMG
• Guilherme P. Telles, UNICAMP
• José Gustavo S. Paiva, UFU
• Luis Gustavo Nonato, ICMC
• Maria Cristina F. Oliveira, ICMC


Some more people
• Renato Oliveira
• Henry Heberle
• Sonia Castelo
• Carlos Zampieri
• Rafael Martins
• Fábio Rolli


Funding (current)
• CAPES / DAAD PROBRAL
• CAPES / NUFFIC
• FAPESP / CALDO
• CAPES 04/CII-2008, Network NANOBIOTEC-Brasil
• FAPESP – student grants
• CNPQ personal grants / student grants
• CNPq Universal 2012-2016

Thanks!!!!


VICG Group

vicg.icmc.usp.br (or http://vis.icmc.usp.br/vicg)

Papers: http://vis.icmc.usp.br/vicg/papers

Software mentioned in this talk soon to be made available at: http://vis.icmc.usp.br/vicg/software
