J. A. Wise, J.J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, and V. Crow

“Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents” J. A. Wise, J.J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, and V. Crow. Proceedings of InfoVis ’95. Reviewed by Nada Golmie for CMSC 838S, Fall 1999.

Document Visualization

“Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents”
J. A. Wise, J.J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, and V. Crow
Proceedings of InfoVis ’95

Reviewed by

Nada Golmie

for CMSC 838S

Fall 1999

04/22/23 Document Visualization 2

Outline
• Document visualization:
  – What? Why? How?
• Examples of 1D and 2D visualizations:
  – vector space analysis (Salton, 1995)
  – reduced text + interaction (Eick, 1992)
  – 2D maps of document collections (Lin, 1992)
• 3D Visualization: SPIRE (Wise et al., 1995)
• 3D + Time: Interactive Landscapes (Rennison, 1994)

Document Visualization
• Document visualization is an important information visualization (IV) application due to emerging technology trends:
  – World Wide Web
  – Digital Libraries
  – Communication Advances
• Mapping a text document:
  – understand the content of a document.
• Mapping a collection of documents:
  – discover relationships among documents.

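Vector Space Analysis (Salton et al.)
• Supports free-form text queries in information retrieval (IR).
• Text passages are mapped into a vector of terms in a high-dimensional space:
  $D_i = (d_{i1}, d_{i2}, \ldots, d_{it})$
  where $d_{ik}$ is the weight assigned to term $k$ in document $D_i$, and $t$ is the number of terms.
• Given a document $D_i$ and a query $Q_j$, their similarity is computed as:
  $\mathrm{sim}(D_i, Q_j) = \sum_{k=1}^{t} d_{ik} \, d_{jk}$
  where $d_{jk}$ is the weight of term $k$ in query $Q_j$.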
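
The inner-product similarity above can be sketched in a few lines of Python. The four-term vocabulary and the weight values are hypothetical, chosen only to show the computation; Salton's work also uses normalized variants such as the cosine measure, which this sketch does not apply.

```python
def sim(doc_weights, query_weights):
    """Inner-product similarity: sim(D_i, Q_j) = sum over k of d_ik * d_jk."""
    if len(doc_weights) != len(query_weights):
        raise ValueError("document and query must be weighted over the same t terms")
    return sum(d_ik * d_jk for d_ik, d_jk in zip(doc_weights, query_weights))


# Hypothetical vocabulary of t = 4 terms: ["text", "visual", "map", "query"]
documents = {
    "D1": [0.5, 0.8, 0.0, 0.1],  # term weights d_1k
    "D2": [0.0, 0.1, 0.9, 0.0],  # term weights d_2k
}
query = [0.0, 1.0, 0.0, 1.0]     # term weights for query Q_1

scores = {name: sim(weights, query) for name, weights in documents.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # best match first
print(scores, ranking)
```

Ranking documents by this score is what lets a free-form query retrieve passages without exact keyword matching on every term.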

Reduced Text + Interaction: SeeSoft (Eick, 1992)
• Reduced representation:
  – lines of code are displayed as rows and files as columns (max 900 rows per column).
• Colors are used to display statistics:
  – statistics include age, programmer, feature, type of line, and the number of times the line was executed.
• Direct manipulation techniques:
  – find interesting patterns.
  – read the actual code using magnification.
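
A minimal sketch of the reduced representation: each source line becomes one row, each file starts a new column, and a per-line statistic is mapped to a color. Wrapping a long file into extra columns once it passes 900 rows, and the red-to-blue age ramp, are assumptions made for illustration; the function names are hypothetical.

```python
def seesoft_layout(file_line_counts, max_rows=900):
    """Map every (file, line number) to a (column, row) cell.

    Each file starts a new column; a file longer than max_rows wraps
    into additional columns (the wrapping rule is an assumption).
    """
    cells = {}
    column = 0
    for name, n_lines in file_line_counts.items():
        for line_no in range(n_lines):
            col_offset, row = divmod(line_no, max_rows)
            cells[(name, line_no + 1)] = (column + col_offset, row)
        # A file occupies ceil(n_lines / max_rows) columns, at least one.
        column += max(1, -(-n_lines // max_rows))
    return cells


def age_color(age_days, max_age_days):
    """Map one per-line statistic (here: age) onto a red-to-blue RGB ramp."""
    t = min(age_days / max_age_days, 1.0)
    return (int(255 * (1 - t)), 0, int(255 * t))  # newest = red, oldest = blue


# Hypothetical files and line counts.
layout = seesoft_layout({"main.c": 1200, "util.c": 300})
print(layout[("main.c", 901)], layout[("util.c", 1)], age_color(0, 100))
```

Because every line costs only a thin colored row, thousands of lines fit on one screen, and magnification is what brings the actual code text back when a pattern looks interesting.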

SeeSoft (Eick, 1992)

2D Maps (Lin, 1992)
• Framework for information retrieval:
  – mapping of a high-dimensional document space into a 2D map.
  – document relationships are explored using visual cues such as dots, links, clusters, and areas.
• Neural network self-organizing learning algorithm based on Kohonen’s feature map:
  – preserves the neighborhood (topological) relationships among input data.
  – allocates different numbers of nodes to inputs based on their occurrence frequencies.
• Sitemap
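
A compact NumPy sketch of Kohonen's self-organizing map, which places document vectors onto a 2D grid of nodes. The grid size, the linear decay schedules, and the Gaussian neighborhood are common textbook choices assumed here, not parameters taken from Lin's paper; the example vectors are made up.

```python
import numpy as np


def train_som(data, grid_h=6, grid_w=6, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small Kohonen self-organizing map on row vectors in `data`."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    weights = rng.random((grid_h, grid_w, dim))
    # Grid coordinates of every node, used by the neighborhood function.
    coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij"), axis=-1)
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = step / total_steps
            lr = lr0 * (1 - frac)              # learning rate decays linearly
            sigma = sigma0 * (1 - frac) + 0.5  # neighborhood radius shrinks
            # Best-matching unit: the node whose weight vector is closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighborhood on the grid, centered on the BMU.
            grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2 * sigma ** 2))
            # Pull the BMU and its grid neighbors toward x.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


def map_documents(data, weights):
    """Return the (row, col) grid cell that each document maps to."""
    cells = []
    for x in data:
        dists = np.linalg.norm(weights - x, axis=-1)
        cells.append(tuple(int(i) for i in np.unravel_index(np.argmin(dists), dists.shape)))
    return cells


docs = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]])
som = train_som(docs)
cells = map_documents(docs, som)
print(cells)
```

Because neighbors of the winning node are updated together, nearby grid cells end up holding similar weight vectors, which is what lets the 2D map show related documents in adjacent regions.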

Visual Text Analysis: SPIRE
SPIRE (Spatial Paradigm for Information Retrieval and Exploration) is software that allows users:
  – to explore complex relationships between text documents.
  – to rapidly discover known and hidden information relationships by reading only the pertinent documents rather than wading through large volumes of text.

Applications
• SPIRE was originally developed for the U.S. intelligence community.
• Other potential applications include:
  – environmental assessment
  – market analysis
  – corporations researching competitive products
  – health care providers searching patient records
  – attorneys reading through previous cases

2D Scatterplot: Galaxies
• Galaxies computes word similarities and patterns in documents and then displays the documents on a computer screen as a universe of “docustars”:
  – closely related documents cluster together in a tight group.
  – unrelated documents are separated by large spaces.
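
One way to produce a Galaxies-style scatterplot is to turn document vectors into pairwise distances and lay them out in 2D so that related documents land near each other. The sketch below uses cosine distance and classical multidimensional scaling, one of the layout algorithms named in this review; the exact pipeline Galaxies uses is not given on the slide, and the example vectors are made up.

```python
import numpy as np


def cosine_distances(vectors):
    """Pairwise 1 - cosine similarity between rows of a term-weight matrix."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    return 1.0 - unit @ unit.T


def classical_mds(dist, dims=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate dist."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered matrix
    eigvals, eigvecs = np.linalg.eigh(gram)            # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]             # keep the largest ones
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


# Made-up term weights: docs 0-1 share one topic, docs 2-3 another.
docs = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.9, 1.0],
])
stars = classical_mds(cosine_distances(docs))  # one 2D "docustar" per document
print(np.round(stars, 3))
```

Documents 0 and 1 end up almost on top of each other, while documents 0 and 2 are about one unit apart, mirroring the tight clusters and large gaps described above.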

Galaxies

3D Landscapes: Themescapes
• Themes within the document space appear on the computer screen as a relief map of natural terrain:
  – mountains indicate where themes are dominant;
  – valleys indicate weak themes;
  – shapes reflect how the thematic information is distributed and related across documents.
• Themes that are close in content appear visually close, based on the many relationships within the text space.
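
A rough sketch of how one theme's terrain could be computed: each document, already placed in 2D, contributes a Gaussian bump whose height is its strength on that theme, so peaks form where many strongly on-theme documents sit together. The Gaussian kernel, bandwidth, grid resolution, and example values are assumptions; the slide does not say how Themescapes computes its surface.

```python
import numpy as np


def theme_terrain(positions, theme_strength, grid=50, bandwidth=0.1):
    """Height field for one theme over the unit square.

    Each document at (px, py) adds a Gaussian bump scaled by how
    strongly it expresses the theme (kernel choice is an assumption).
    """
    xs = np.linspace(0.0, 1.0, grid)
    gx, gy = np.meshgrid(xs, xs, indexing="xy")  # gx varies by column, gy by row
    height = np.zeros((grid, grid))
    for (px, py), s in zip(positions, theme_strength):
        d2 = (gx - px) ** 2 + (gy - py) ** 2
        height += s * np.exp(-d2 / (2 * bandwidth ** 2))
    return height


# Two strongly on-theme documents close together, one weak document far away.
positions = [(0.2, 0.2), (0.25, 0.2), (0.8, 0.8)]
strengths = [1.0, 1.0, 0.1]
terrain = theme_terrain(positions, strengths)
row, col = np.unravel_index(np.argmax(terrain), terrain.shape)
print(row, col)  # grid cell of the highest "mountain"
```

The peak forms near the two overlapping strong documents, while the single weak document only produces a low hill, matching the mountain/valley reading on the slide.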

Themescapes

Visualization Transformations
• Definition of text: the written form of natural language.
• Conversion of text to spatial form: algorithms and processes.
• Meaningful visualizations: mathematical procedures and analytical measures.
• Database management: storing and managing text and its derivative forms.

Processing Text Requirements
• Identification and extraction of text features:
  – frequency-based measures of words.
  – higher-order statistics on words: the occurrence, frequency, and context of individual words are used to characterize defined word classes.
  – semantic approaches using natural language understanding.
• Efficient and flexible representation of documents in terms of text features.
• Support for information retrieval and visualization.
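
As a concrete frequency-based measure, the sketch below turns raw text into term-weight vectors with tf-idf (term frequency times the log of inverse document frequency). tf-idf is a standard IR weighting swapped in here for illustration; the slides do not say which weighting SPIRE uses, and the regular-expression tokenizer is a deliberately crude assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer: lowercase alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return (vocabulary, one weight vector per document) using tf * log(N / df)."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = {term: sum(1 for c in counts if term in c) for term in vocab}
    vectors = [[c[term] * math.log(n_docs / df[term]) for term in vocab] for c in counts]
    return vocab, vectors


docs = ["text map text", "visual map", "query text"]
vocab, vectors = tfidf_vectors(docs)
print(vocab)
for row in vectors:
    print([round(w, 3) for w in row])
```

Terms that appear everywhere get low weight and rare, concentrated terms get high weight, which gives the vector representation on the next slide its discriminating power.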

Visual Output of Text Processing
• Vector representation of each document in a high-dimensional feature space:
  – comparisons, filters, and transformations can be applied.
• Projection onto a 2D or 3D visualization:
  – dimensionality reduction
  – scaling
  – clustering in the high-dimensional feature space; the cluster centroids are fed into layout algorithms (principal component analysis or multidimensional scaling).
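
A sketch of the cluster-then-project step: k-means groups the document vectors, and principal component analysis projects the cluster centroids to 2D layout coordinates. k-means is an assumed choice of clustering algorithm (the slide does not name one), and the synthetic document vectors are for illustration only.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each document to its nearest centroid.
        d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
        labels = np.argmin(d, axis=1)
        # Move each centroid to the mean of its members (leave empty clusters alone).
        for c in range(k):
            members = x[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Final assignment so the labels match the returned centroids.
    d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
    return centroids, np.argmin(d, axis=1)


def pca_2d(points):
    """Project points onto their first two principal components."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Made-up document vectors in a 5-term feature space, three loose topics.
rng = np.random.default_rng(1)
docs = np.vstack([
    rng.normal(loc, 0.05, size=(10, 5))
    for loc in ([1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 1, 0])
])
centroids, labels = kmeans(docs, k=3)
layout = pca_2d(centroids)  # 2D positions of the 3 cluster centroids
print(np.round(layout, 3))
```

Laying out a handful of centroids instead of every document keeps the projection cheap for large collections, at the cost of the reduced per-document sensitivity raised in the critique.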

Interface Design
• Three display types:
  – Backdrop: central display resource.
  – Workshop: grid with resizable windows to hold multiple views.
  – Chronicle: space where views are placed and linked to form a visual story.
• Tools provided to allow more in-depth analysis: point and click, grouping, annotation, query, subset, temporal slicing.

Screenshot

Favorite Sentences
“The bottleneck in the human processing and understanding of information in large amounts of text can be overcome if the text is spatialized in a manner that takes advantage of common powers of perception.”
“So much has already been written about everything that you can’t find out anything about it.” James Thurber (1961)

Contributions
• Effective use of physical metaphors such as the night sky and landscapes to provide an overview visualization of a document collection:
  – helps answer simple questions about the database.
• Discussion of processing text for visualization.
• The platform includes integrated tools and techniques for text manipulation and analysis.

Critique
• How can we measure the effectiveness of the visualization in discovering relationships and answering detailed questions about the documents?
  – it may depend on the ease of interaction.
  – the claim of “discovering in 35 minutes what would have taken two weeks otherwise” needs to be verified.
• Layout algorithms can produce clutter and occlusion, and they become complex for large document collections.
• Clustering may reduce sensitivity to the features of individual documents.

Other Comments
• Agree with the need to create visual tools to aid cognitive skills, but skeptical about the statement: “And the limitations of Information Age will not be set by the speed with which human mind can read”.
• The paper contains too many sound-bite sentences and buzzwords, which could be distracting: “fluid environment for reflective cognition and higher-order thought”.

Galaxy of News: Interactive Landscapes
• Parse content to extract key information
• Build an associative relation network
• Classify elements into hierarchies
• Sort peer elements spatially and temporally
• Construct a visual information space based on the classified elements
• Dynamic response to visual interaction
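
A small sketch of the associative-network and classification steps, assuming keyword extraction has already produced a keyword set and a date for each article. Linking articles that share keywords, and a one-level keyword hierarchy sorted by date, are simplifying assumptions rather than Rennison's actual method; all names and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_relation_network(articles):
    """Associative network: link two articles if they share a keyword.

    Edge weight is the number of shared keywords (an assumed measure).
    """
    edges = {}
    for a, b in combinations(sorted(articles), 2):
        shared = articles[a]["keywords"] & articles[b]["keywords"]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


def classify_by_keyword(articles):
    """One-level hierarchy: keyword -> articles mentioning it, in date order."""
    groups = defaultdict(list)
    for name, article in articles.items():
        for keyword in article["keywords"]:
            groups[keyword].append(name)
    # ISO dates sort correctly as strings, giving a temporal order of peers.
    return {kw: sorted(names, key=lambda n: articles[n]["date"]) for kw, names in groups.items()}


articles = {
    "a1": {"keywords": {"election", "senate"}, "date": "1994-03-02"},
    "a2": {"keywords": {"senate", "budget"}, "date": "1994-03-01"},
    "a3": {"keywords": {"hockey"}, "date": "1994-03-03"},
}
network = build_relation_network(articles)
hierarchy = classify_by_keyword(articles)
print(network, hierarchy["senate"])
```

Because the network is recomputed from keywords rather than stored as coordinates, it can be rebuilt whenever new articles arrive, in line with the dynamic positioning described in the summary.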

Galaxy of News: Summary
• Use of motion to visualize relationships among documents.
• Documents have no fixed position in space:
  – the associative relation network is built dynamically.
  – categories have fixed positions.
• The constructed space is based on abstract conceptual metaphors (galaxies) and can have any number of dimensions.