UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization

Jaegul Choo1*, Changhyun Lee1, Chandan K. Reddy2, and Haesun Park1

1Georgia Institute of Technology, 2Wayne State University

*e-mail: [email protected]

Intro: Topic Modeling

[Figure: Documents 1 to 4 containing keywords such as gene, dna, genetic, life, evolve, organism, brain, neuron, nerve]

Intro: Topic Modeling

Topic: a distribution over keywords

[Figure: Topics 1 to 3 over the keywords gene, dna, genetic, life, evolve, organism, brain, neuron, nerve; Documents 1 to 4]

Intro: Topic Modeling

Topic: a distribution over keywords

Document: a distribution over topics

[Figure: Topics 1 to 3 over the keywords gene, dna, genetic, life, evolve, organism, brain, neuron, nerve; Documents 1 to 4]

Latent Dirichlet Allocation (LDA) in Visual Analytics

• LDA has been widely used in visual analytics.
  • TIARA [Wei et al. KDD10], iVisClustering [Lee et al. EuroVis12], ParallelTopics [Dou et al. VAST12], TopicViz [Eisenstein et al. CHI-WIP12], …

*Image courtesy of original papers.

Overview of Our Work

• Proposes nonnegative matrix factorization (NMF) for topic modeling.
• Highlights the advantages of NMF over LDA in visual analytics.
• Presents UTOPIAN, an NMF-based interactive topic modeling system.

[Figure panels: Topic merging; Topic splitting; Doc-induced topic creation; Keyword-induced topic creation]

What is Nonnegative Matrix Factorization?

Nonnegative Matrix Factorization (NMF)

Lower-rank approximation with nonnegativity constraints

Why nonnegativity? Easy interpretation and semantically meaningful output

Algorithm: Alternating nonnegativity-constrained least squares [Kim et al., 2008]

$A \approx WH$, obtained by solving

$$\min_{W \ge 0,\; H \ge 0} \|A - WH\|_F$$
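The objective above can be made concrete with a small plain-Python sketch. It deliberately swaps in the simpler multiplicative update rules of Lee and Seung instead of the ANLS algorithm [Kim et al., 2008] named on the slide, so it illustrates the optimization problem rather than the authors' solver; the function names, iteration counts, and the `eps` guard are illustrative assumptions.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(r) for r in zip(*X)]


def frobenius_error(A, W, H):
    """||A - WH||_F, the quantity being minimized."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5


def nmf_mu(A, k, iters=500, eps=1e-9, seed=0):
    """Approximate a nonnegative A (m x n) by W (m x k) times H (k x n).

    Multiplicative updates keep W and H nonnegative when they start positive.
    This is a stand-in for illustration, NOT the ANLS algorithm on the slide.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, A)
        den = matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(A, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


# Toy 4 x 3 nonnegative matrix with two obvious groups of rows.
A = [[2, 1, 0], [2, 1, 0], [0, 1, 3], [0, 1, 3]]
W, H = nmf_mu(A, k=2, iters=500)
err_long = frobenius_error(A, W, H)
err_short = frobenius_error(A, *nmf_mu(A, k=2, iters=5))
```

Because both runs share the same seed, the 5-iteration run is a prefix of the 500-iteration run, so the longer run's error should be no larger; the multiplicative rules are known not to increase the Frobenius objective.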

NMF as Topic Modeling

[Figure: $A \approx WH$]

Topic: a distribution over keywords (a column of W)

Document: a distribution over topics (a column of H)

[Figure: Topics 1 to 3 over the keywords gene, dna, genetic, life, evolve, organism, brain, neuron, nerve; Documents 1 to 4]
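Reading topics and documents off the factors can be sketched as follows, assuming A is a keyword-by-document matrix so that each column of W is a topic and each column of H is a document's topic weights. The helper names and all numbers below are hypothetical, chosen to echo the keywords in the figure; they are not the values from the slide.

```python
def top_keywords(W, vocab, topn=3):
    """For each topic (column of W), return its topn highest-weighted keywords."""
    k = len(W[0])
    result = []
    for t in range(k):
        ranked = sorted(range(len(vocab)), key=lambda i: W[i][t], reverse=True)
        result.append([vocab[i] for i in ranked[:topn]])
    return result


def dominant_topic(H):
    """For each document (column of H), return the index of its largest topic weight."""
    k, n = len(H), len(H[0])
    return [max(range(k), key=lambda t: H[t][j]) for j in range(n)]


def topic_distribution(H, j):
    """Normalize column j of H so it sums to 1 (a distribution over topics)."""
    col = [H[t][j] for t in range(len(H))]
    s = sum(col)
    return [c / s for c in col] if s > 0 else col


vocab = ["gene", "dna", "genetic", "life", "evolve", "organism", "brain", "neuron", "nerve"]
# Hypothetical 9 x 3 keyword-by-topic matrix W.
W = [
    [0.9, 0.0, 0.0], [0.8, 0.0, 0.1], [0.7, 0.1, 0.0],
    [0.0, 0.9, 0.0], [0.1, 0.8, 0.0], [0.0, 0.7, 0.1],
    [0.0, 0.0, 0.9], [0.0, 0.1, 0.8], [0.1, 0.0, 0.7],
]
# Hypothetical 3 x 4 topic-by-document matrix H.
H = [
    [2.0, 0.1, 0.0, 1.0],
    [0.3, 1.5, 0.2, 1.2],
    [0.0, 0.2, 1.8, 0.1],
]
topics = top_keywords(W, vocab)
memberships = dominant_topic(H)
```

Taking the largest entry of each column of H is one common way to assign a document to a single topic cluster; the normalized column keeps the full mixture.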

Why NMF in Visual Analytics?

Advantages of NMF in Visual Analytics

• Reliable algorithmic behaviors
• Flexible support for user interactions

NMF vs. LDA: Consistency from Multiple Runs

Changes in documents' topical membership across 10 runs

[Figure panels: InfoVis/VAST paper data set; 20 Newsgroups data set]
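The slide does not say exactly how membership changes were counted. One plausible measure, sketched here as an assumption, compares each document's topic label between two runs; because topic indices are arbitrary from run to run, the second run's labels are first remapped by the permutation that best matches the first run. Brute force over all k! permutations is only reasonable for small k; for larger k an assignment solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice. The function names are hypothetical.

```python
from itertools import permutations


def membership_changes(labels_a, labels_b, k):
    """Count documents whose topic differs between two runs, after relabeling
    run b's topics with the permutation that best matches run a."""
    best = None
    for perm in permutations(range(k)):
        # perm[b] is the run-a topic that run-b topic b is mapped to.
        changed = sum(1 for a, b in zip(labels_a, labels_b) if perm[b] != a)
        if best is None or changed < best:
            best = changed
    return best


def mean_pairwise_changes(runs, k):
    """Average the aligned change count over every pair of runs."""
    pairs = [(i, j) for i in range(len(runs)) for j in range(i + 1, len(runs))]
    return sum(membership_changes(runs[i], runs[j], k) for i, j in pairs) / len(pairs)


run_a = [0, 0, 1, 1, 2, 2]
run_b = [2, 2, 0, 0, 1, 1]  # same grouping as run_a, different topic indices
run_c = [2, 2, 0, 1, 1, 1]  # one document moved to another topic
```

Under this measure, `run_a` and `run_b` count as identical even though every label differs, which is the property a cross-run consistency comparison needs.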

NMF vs. LDA: Empirical Convergence

Changes in documents' topical membership between iterations

[Figure: InfoVis/VAST paper data set; LDA panel: 10 minutes, NMF panel: 48 seconds]
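Tracking membership changes between iterations can be sketched as below. Within a single run, topic indices are not permuted from one iteration to the next, so no alignment step is needed: the sketch simply compares each document's dominant topic in consecutive iterates of H. The `patience`-based stopping rule and all helper names are hypothetical illustrations, not the criterion used for the figure.

```python
def dominant_topics(H):
    """Index of the largest topic weight for each document (column of H)."""
    k, n = len(H), len(H[0])
    return [max(range(k), key=lambda t: H[t][j]) for j in range(n)]


def changes_between_iterations(H_prev, H_curr):
    """Number of documents whose dominant topic differs between two iterates."""
    return sum(1 for a, b in zip(dominant_topics(H_prev), dominant_topics(H_curr)) if a != b)


def has_converged(change_history, patience=3):
    """Hypothetical rule: no membership changes for `patience` consecutive iterations."""
    return len(change_history) >= patience and all(c == 0 for c in change_history[-patience:])


H_prev = [[0.9, 0.2, 0.5], [0.1, 0.8, 0.4]]
H_curr = [[0.8, 0.3, 0.3], [0.2, 0.7, 0.6]]
n_changed = changes_between_iterations(H_prev, H_curr)  # only the third document switches
```

A curve of these counts over iterations gives a cheap, interpretable view of when the topical assignments stop moving, even if the objective value is still decreasing slightly.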

NMF vs. LDA: Topic Summary (Top Keywords)

Top keywords of Topics 1 to 7, listed in topic order (InfoVis/VAST paper data set):

NMF Run #1: visualization, design, information, user, analysis, system, graph, layout, visual, analytics, datasets, color, weaving
NMF Run #2: visualization, design, information, user, analysis, system, graph, layout, visual, analytics, datasets, color, weaving
LDA Run #1: documents, similarities, knowledge, edge, query, collaborative, social, tree, measures, multivariate, tree, animation, dimensions, treemap
LDA Run #2: documents, query, analysts, scatterplot, spatial, collaborative, text, documents, multidimensional, high, tree, aggregation, dimensions, treemap

Topics are more consistent in NMF than in LDA. Topic quality is comparable between NMF and LDA.

Advantages of NMF in Visual Analytics

• Reliable algorithmic behaviors
• Flexible support for user interactions

Weakly Supervised NMF [Choo et al., DMKD, accepted with rev.]

$$\min_{W \ge 0,\; H \ge 0} \|A - WH\|_F^2 + \alpha \|(W - W_r) M_W\|_F^2 + \beta \|M_H (H - D_H H_r)\|_F^2$$

• $W_r$, $H_r$: reference matrices for $W$ and $H$

• $M_W$, $M_H$: diagonal matrices for weighting/masking the columns of $W$ and the rows of $H$, respectively

Provides flexible yet intuitive means for user interaction.

Maintains the same computational complexity as the original NMF.
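To see how each term of the weakly supervised objective behaves, the sketch below only evaluates it for given matrices; it does not solve the minimization. It follows the slide's notation literally ($M_W$ multiplies on the right, $M_H$ and $D_H$ on the left) and passes every diagonal matrix as the list of its diagonal entries. The slide does not define $D_H$, so here it is simply treated as a given diagonal matrix; the function names are hypothetical.

```python
def sq_frob(X):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(x * x for row in X for x in row)


def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(r, c)) for c in cols] for r in X]


def ws_nmf_objective(A, W, H, Wr, Hr, mW, mH, dH, alpha, beta):
    """Evaluate ||A - WH||^2 + alpha ||(W - Wr) M_W||^2 + beta ||M_H (H - D_H Hr)||^2.

    mW, mH, dH hold the diagonals of M_W, M_H, D_H:
      (W - Wr) M_W      scales column j of (W - Wr) by mW[j]
      M_H (H - D_H Hr)  scales row i by mH[i]; D_H Hr scales row i of Hr by dH[i]
    """
    WH = matmul(W, H)
    fit = sq_frob([[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, WH)])
    w_term = sq_frob([[(w - wr) * mW[j] for j, (w, wr) in enumerate(zip(rw, rwr))]
                      for rw, rwr in zip(W, Wr)])
    h_term = sq_frob([[mH[i] * (h - dH[i] * hr) for h, hr in zip(rh, rhr)]
                      for i, (rh, rhr) in enumerate(zip(H, Hr))])
    return fit + alpha * w_term + beta * h_term


A = [[1, 2], [3, 4]]
W = [[1, 0], [0, 1]]
H = [[1, 2], [3, 4]]  # W H reproduces A exactly
```

Setting an entry of $M_W$ or $M_H$ to zero removes the corresponding topic from the reference penalty, which is how masking lets a user constrain some topics while leaving others free.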

UTOPIAN: User-Driven Topic Modeling Based on Interactive NMF

Topic merging

Topic splitting

Doc-induced topic creation

Keyword-induced topic creation

UTOPIAN Overview

Topic merging

Topic splitting

Doc-induced topic creation

Keyword-induced topic creation

Supervised t-distributed stochastic neighbor embedding (t-SNE)

User interactions supported:
• Keyword refinement
• Topic merging/splitting
• Keyword-/document-induced topic creation

Real-time interaction via PIVE (Per-Iteration Visualization Environment)
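The slides list the supported interactions but not how each is encoded. Since the weakly supervised formulation steers NMF through the reference matrix $W_r$ and the mask $M_W$, one hypothetical way to express keyword-induced topic creation and topic merging is to build those inputs directly, as sketched below. These helpers, the equal-weight keyword column, and the column averaging used for merging are illustrative assumptions, not UTOPIAN's actual procedure.

```python
def keyword_topic_reference(vocab, keywords):
    """Hypothetical: a reference topic column with equal weight on user-chosen keywords."""
    chosen = set(keywords)
    col = [1.0 if word in chosen else 0.0 for word in vocab]
    s = sum(col)
    return [c / s for c in col] if s else col


def add_reference_topic(W, ref_col, weight=1.0):
    """Append ref_col as a new topic column.

    Returns (Wr, mW): Wr copies W plus the new column, and mW (the diagonal of
    M_W) is 0 for existing topics (left free) and `weight` for the new topic.
    """
    Wr = [row + [r] for row, r in zip(W, ref_col)]
    mW = [0.0] * len(W[0]) + [weight]
    return Wr, mW


def merge_topics_reference(W, i, j):
    """Hypothetical: replace topics i and j by one column, their elementwise average."""
    merged = [(row[i] + row[j]) / 2 for row in W]
    keep = [t for t in range(len(W[0])) if t not in (i, j)]
    return [[row[t] for t in keep] + [m] for row, m in zip(W, merged)]


vocab = ["gene", "dna", "brain", "neuron"]
W = [[0.9, 0.1], [0.8, 0.0], [0.0, 0.7], [0.1, 0.6]]
ref = keyword_topic_reference(vocab, ["brain", "neuron"])
Wr, mW = add_reference_topic(W, ref)
merged = merge_topics_reference(W, 0, 1)
```

In this sketch, creating a topic increases the number of topics by one and merging decreases it by one; the adjusted reference would then seed another weakly supervised NMF run.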

Supervised t-SNE

• Original t-SNE: documents are often too noisy to work with.

• Supervised t-SNE: $d(x_i, x_j) \leftarrow \alpha \cdot d(x_i, x_j)$ if $x_i$ and $x_j$ belong to the same topic cluster.
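The distance adjustment can be sketched as follows. The slide specifies neither the base distance nor the value of α, so this sketch assumes Euclidean distance and 0 < α < 1, which pulls documents of the same topic cluster closer together; the function names are hypothetical.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def supervised_distances(X, labels, alpha=0.5):
    """Symmetric pairwise distance matrix in which same-cluster pairs are scaled by alpha."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(X[i], X[j])
            if labels[i] == labels[j]:
                d *= alpha  # shrink distances within a topic cluster
            D[i][j] = D[j][i] = d
    return D


X = [[0, 0], [3, 4], [6, 8]]
labels = [0, 0, 1]
D = supervised_distances(X, labels, alpha=0.5)
```

The resulting matrix could then be handed to any t-SNE implementation that accepts precomputed distances, for example scikit-learn's `TSNE(metric="precomputed", init="random")`. For text, a cosine-based distance is also a common choice in place of Euclidean.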

PIVE (Per-Iteration Visualization Environment) for Real-time Interaction [Choo et al., under revision]

Standard approach

PIVE approach
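Judging from its name, PIVE visualizes intermediate results after every iteration instead of waiting for the algorithm to finish. The sketch below illustrates that idea with a Python generator that hands each intermediate state to a rendering callback; the helper names and the toy update step are hypothetical and are not PIVE's implementation.

```python
from typing import Callable, Iterator, Tuple, TypeVar

S = TypeVar("S")


def per_iteration(step: Callable[[S], S], state: S, max_iter: int) -> Iterator[Tuple[int, S]]:
    """Yield (iteration, state) after every update instead of only at the end."""
    for it in range(1, max_iter + 1):
        state = step(state)
        yield it, state


def run_with_view(step, state, max_iter, render, should_stop=lambda s: False):
    """Drive the iteration, rendering each intermediate state; stop early if asked."""
    for it, state in per_iteration(step, state, max_iter):
        render(it, state)
        if should_stop(state):
            break
    return state


# Toy usage: halve a number each iteration and record every "frame".
frames = []
final = run_with_view(lambda x: x / 2, 16.0, 10,
                      render=lambda it, s: frames.append((it, s)),
                      should_stop=lambda s: s < 1)
```

In the standard approach, only the final state would be shown; here every intermediate state reaches the view, so a user could inspect, and in a real system intervene on, a partial result long before convergence.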

Demo Video: http://tinyurl.com/UTOPIAN2013

Usage Scenario: Hyundai Genesis Review Data

[Figure panels: Initial result; After interaction]

Summary

• Presented UTOPIAN, a user-driven topic modeling system based on interactive NMF.

• Highlighted the advantages of NMF over LDA in visual analytics:
  • Reliable algorithmic behaviors
    • Consistency from multiple runs
    • Early empirical convergence
  • Flexible support for user interactions
    • Keyword refinement
    • Topic merging/splitting
    • Keyword-/document-induced topic creation

More in the paper & On-going Work

• A general taxonomy of user interactions with computational methods
  • Keyword-based vs. document-based
  • Template-based vs. from-scratch-based

• Algorithmic details about supported user interactions
• Implementation details
• More usage scenarios

On-going Work
• Scaling up the system with parallel distributed NMF

Topic merging

Topic splitting

Doc-induced topic creation

Keyword-induced topic creation

Thank you! http://tinyurl.com/UTOPIAN2013

For more details, please find me at ‘Meet the Candidate’, A601+A602, 6PM today.

Jaegul [email protected]

http://www.cc.gatech.edu/~joyfull/