Exemplar-based Visualization of Large Document Corpus

Yanhua Chen, Lijun Wang, Ming Dong, and Jing Hua

{chenyanh, ljwang, mdong, jinghua}@wayne.edu

Department of Computer Science

Wayne State University, Detroit, MI

Overview

• Text Mining and Visualization
• Current Visualization Systems
• Exemplar-based Visualization (EV)
• Experiments and Results
• EV Demo

Text Mining: Clustering Definition

• Given:
  • A source of textual documents
  • Similarity measure
    • e.g., how many words are common in these documents

• Find:
  • Several clusters of documents that are relevant to each other

[Diagram: Documents source and Similarity measure feed the Clustering System, which outputs groups of Docs]
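The setup on this slide (documents plus a similarity measure in, clusters out) can be sketched in a few lines. This is only an illustration under stated assumptions: the slide names "how many words are common" as an example similarity, so the sketch uses Jaccard word overlap, and the greedy grouping rule, the `threshold` value, and the helper names `word_overlap` and `cluster_documents` are hypothetical choices, not the clustering system described in the talk.

```python
def tokenize(text):
    """Lowercase a document and return its set of distinct words."""
    return set(text.lower().split())


def word_overlap(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    wa, wb = tokenize(a), tokenize(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def cluster_documents(docs, threshold=0.2):
    """Greedy single-pass clustering (hypothetical helper, for illustration).

    Each document joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of document indices
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if word_overlap(doc, docs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([i])
    return clusters


# Toy corpus, invented for this example.
docs = [
    "heart disease risk factors",
    "risk factors for heart attack",
    "protein folding simulation",
    "simulation of protein structure",
]
clusters = cluster_documents(docs)
print(clusters)  # [[0, 1], [2, 3]]
```

A greedy pass like this depends on document order and is far too crude for a large corpus; it is meant only to make the "Given / Find" definition concrete.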

Current Visualization Systems

• Text Visualization: represent selected features of complex multi-dimensional data in a logical 2-D or 3-D layout, so that the relationships between documents can be understood

• IN-SPIRE
• Infosky

Exemplar-based Visualization (EV)

Data

1. Low-rank Approximation
2. Exemplar-based Clustering
3. Visualization by Parameter Embedding

[Diagram labels: X, X~, G]

• Challenges:
  • Preserve original relationship from multi-dimensional to low-dimensional: Accuracy?
  • Large scale document: Efficiency?
  • Layout overlap: Exemplar?
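To make the idea of an exemplar concrete: one common way to represent a cluster by a single member is to pick the document with the highest total similarity to the rest of its cluster (a medoid). The sketch below assumes that definition together with a word-overlap similarity; the slides do not specify how EV selects exemplars, so `pick_exemplar` and `word_overlap` are hypothetical helpers and not the authors' algorithm.

```python
def word_overlap(a, b):
    """Jaccard similarity between the word sets of two documents."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def pick_exemplar(docs, cluster):
    """Return the index of the cluster member most similar to the others.

    `cluster` is a list of indices into `docs`. The score of a member is
    the sum of its similarities to every other member (medoid-style choice).
    """
    def total_similarity(i):
        return sum(word_overlap(docs[i], docs[j]) for j in cluster if j != i)

    return max(cluster, key=total_similarity)


# Toy cluster, invented for this example.
docs = [
    "heart disease risk",
    "heart disease treatment",
    "heart disease risk treatment",
]
exemplar = pick_exemplar(docs, [0, 1, 2])
print(docs[exemplar])  # heart disease risk treatment
```

Drawing only one such representative per cluster, rather than every document, is one way a layout could avoid overlapping points.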

Experiments and Results

Visualization of 20,000 Medical Articles

Exemplar-based Visualization Demo

Reminder

Title: Exemplar-based Visualization of Large Scale Document Corpus

Session: Text Visualization

Time: 10:30am-12:10pm, Friday, 16 October