UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization
Jaegul Choo1*, Changhyun Lee1, Chandan K. Reddy2, and Haesun Park1
1Georgia Institute of Technology, 2Wayne State University
*e-mail: [email protected]
Intro: Topic Modeling
Topic: a distribution over keywords
Document: a distribution over topics
Topic 1: gene, dna, genetic
Topic 2: life, evolve, organism
Topic 3: brain, neuron, nerve
Document 1, Document 2, Document 3, Document 4
Latent Dirichlet Allocation (LDA) in Visual Analytics
• LDA has been widely used in visual analytics.
• TIARA [Wei et al. KDD10], iVisClustering [Lee et al. EuroVis12], ParallelTopics [Dou et al. VAST12], TopicViz [Eisenstein et al. CHI-WIP12], …
*Image courtesy of original papers.
Overview of Our Work
• Proposes nonnegative matrix factorization (NMF) for topic modeling.
• Highlights advantages of NMF over LDA in visual analytics.
• Presents UTOPIAN, an NMF-based interactive topic modeling system.
[Figure labels: topic merging, topic splitting, doc-induced topic creation, keyword-induced topic creation]
What is Nonnegative Matrix Factorization?
Nonnegative Matrix Factorization (NMF)
Lower-rank approximation with nonnegativity constraints
Why nonnegativity? Easy interpretation and semantically meaningful output
Algorithm: Alternating nonnegativity-constrained least squares [Kim et al., 2008]
$A \approx WH$
$\min_{W \ge 0,\, H \ge 0} \|A - WH\|_F$
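The factorization above can be tried on a toy matrix. The sketch below is a minimal illustration, not the method cited on the slide: it swaps in the simpler multiplicative-update rule (Lee and Seung) instead of alternating nonnegativity-constrained least squares, and the function name, toy matrix, and parameter values are all assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(A, k, n_iter=200, seed=0, eps=1e-10):
    """Approximate a nonnegative A (m x n) by W (m x k) @ H (k x n).

    Uses multiplicative updates, NOT the ANLS algorithm cited on the slide.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay >= 0.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical keyword-by-document matrix (rows: keywords, columns: documents).
A = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 1.0],
    [0.0, 0.0, 4.0, 3.0],
    [0.0, 1.0, 3.0, 4.0],
])
W, H = nmf_multiplicative(A, k=2)
residual = np.linalg.norm(A - W @ H, "fro")
print("Frobenius residual:", residual)
```

Because the problem is nonconvex, different seeds can give different factors; the residual only indicates how well one particular run fits A.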
NMF as Topic Modeling
$A \approx WH$
Topic: a distribution over keywords
Document: a distribution over topics
[Figure: the factors W and H annotated with the example topics (gene, dna, genetic; life, evolve, organism; brain, neuron, nerve) and Documents 1 to 4]
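To make the reading of the two factors concrete, the sketch below assumes the usual convention that A is a keyword-by-document matrix, so each column of W is a topic and each column of H describes one document. The factor values are invented for illustration, and the helper names are hypothetical.

```python
import numpy as np

keywords = ["gene", "dna", "genetic", "life", "evolve",
            "organism", "brain", "neuron", "nerve"]

# Made-up factors: W is keywords x topics, H is topics x documents.
W = np.array([
    [0.9, 0.0, 0.0],
    [0.8, 0.1, 0.0],
    [0.7, 0.0, 0.1],
    [0.0, 0.9, 0.0],
    [0.1, 0.8, 0.0],
    [0.0, 0.7, 0.1],
    [0.0, 0.0, 0.9],
    [0.0, 0.1, 0.8],
    [0.1, 0.0, 0.7],
])
H = np.array([
    [2.0, 0.5, 0.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
    [0.0, 0.5, 3.0, 1.0],
])

def top_keywords(W, keywords, n_top=3):
    """Return the n_top highest-weighted keywords of each topic (column of W)."""
    order = np.argsort(-W, axis=0)[:n_top]  # keyword indices, best first, per topic
    return [[keywords[i] for i in order[:, t]] for t in range(W.shape[1])]

def topic_proportions(H):
    """Rescale each document (column of H) so its topic weights sum to 1."""
    return H / H.sum(axis=0, keepdims=True)

topics = top_keywords(W, keywords)
proportions = topic_proportions(H)
print(topics)
print(proportions.round(2))
```

NMF itself does not force the columns of H to sum to one; the normalization here is only a convenience for reading each document as a mixture of topics.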
Why NMF in Visual Analytics?
Advantages of NMF in Visual Analytics
• Reliable algorithmic behaviors
• Flexible support for user interactions
NMF vs. LDA: Consistency from Multiple Runs
[Figure: documents' topical membership changes among 10 runs; panels: InfoVis/VAST paper data set, 20 newsgroup data set]
NMF vs. LDA: Empirical Convergence
[Figure: documents' topical membership changes between iterations; InfoVis/VAST paper data set]
LDA: 10 minutes; NMF: 48 seconds
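The slides do not say how topical membership changes were counted. One plausible reading, sketched below as an assumption, is to assign each document to its highest-weighted topic and count how many assignments differ between two consecutive iterates. The function name and the example matrices are hypothetical.

```python
import numpy as np

def membership_changes(H_prev, H_curr):
    """Count documents whose dominant topic differs between two iterates.

    Assumes H is topics x documents and membership = argmax over topics.
    """
    prev_labels = np.argmax(H_prev, axis=0)
    curr_labels = np.argmax(H_curr, axis=0)
    return int(np.sum(prev_labels != curr_labels))

# Two made-up iterates for 3 topics and 4 documents.
H_prev = np.array([
    [0.9, 0.2, 0.1, 0.4],
    [0.1, 0.7, 0.3, 0.5],
    [0.0, 0.1, 0.6, 0.1],
])
H_curr = np.array([
    [0.8, 0.2, 0.1, 0.6],
    [0.1, 0.7, 0.2, 0.3],
    [0.1, 0.1, 0.7, 0.1],
])
changed = membership_changes(H_prev, H_curr)
print(changed)
```

Comparing separate runs this way would additionally require matching topics across runs, since topic order is arbitrary; that step is left out here.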
NMF vs. LDA: Topic Summary (Top Keywords)
InfoVis/VAST paper data set
Topic    NMF Run #1             NMF Run #2             LDA Run #1               LDA Run #2
Topic 1  visualization, design  visualization, design  documents, similarities  documents, query
Topic 2  information, user      information, user      knowledge, edge          analysts, scatterplot
Topic 3  analysis, system       analysis, system       query, collaborative     spatial, collaborative
Topic 4  graph, layout          graph, layout          social, tree             text, documents
Topic 5  visual, analytics      visual, analytics      measures, multivariate   multidimensional, high
Topic 6  data, sets             data, sets             tree, animation          tree, aggregation
Topic 7  color, weaving         color, weaving         dimensions, treemap      dimensions, treemap
Topics are more consistent in NMF than in LDA. Topic quality is comparable between NMF and LDA.
Advantages of NMF in Visual Analytics
• Reliable algorithmic behaviors
• Flexible support for user interactions
Weakly Supervised NMF [Choo et al., DMKD, accepted with rev.]
$\min_{W \ge 0,\, H \ge 0} \|A - WH\|_F^2 + \alpha \|(W - W_r) M_W\|_F^2 + \beta \|M_H (H - D_H H_r)\|_F^2$
• $W_r$, $H_r$: reference matrices for W and H
• $M_W$, $M_H$: diagonal matrices for weighting/masking columns/rows of W and H
Provides flexible yet intuitive means for user interaction.
Maintains the same computational complexity as original NMF.
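As a reading aid for the weakly supervised objective, the sketch below simply evaluates its three terms for given matrices. It is not a solver. The shapes are assumptions inferred from how the terms are written on the slide: W is m x k, H is k x n, and M_W, M_H, and D_H are taken to be k x k diagonal matrices (the slide does not define D_H). The function name and the numbers are hypothetical.

```python
import numpy as np

def ws_nmf_objective(A, W, H, W_r, H_r, M_W, M_H, D_H, alpha, beta):
    """Evaluate ||A-WH||_F^2 + alpha*||(W-W_r)M_W||_F^2 + beta*||M_H(H-D_H H_r)||_F^2."""
    fit = np.linalg.norm(A - W @ H, "fro") ** 2
    w_penalty = np.linalg.norm((W - W_r) @ M_W, "fro") ** 2
    h_penalty = np.linalg.norm(M_H @ (H - D_H @ H_r), "fro") ** 2
    return fit + alpha * w_penalty + beta * h_penalty

W = np.array([[1.0, 0.0], [0.0, 1.0]])
H = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]])
A = W @ H                                  # exact fit, so the first term is 0

W_r = np.array([[0.0, 0.0], [0.0, 1.0]])   # reference for W
H_r = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 1.0]])  # reference for H
M_W = np.diag([1.0, 0.0])                  # only column 0 of W is supervised
M_H = np.diag([0.0, 1.0])                  # only row 1 of H is supervised
D_H = np.eye(2)                            # identity scaling (assumption)

value = ws_nmf_objective(A, W, H, W_r, H_r, M_W, M_H, D_H, alpha=2.0, beta=0.5)
print(value)  # 0 + 2.0 * 1 + 0.5 * 4 = 4.0
```

Setting a diagonal entry of a mask to zero removes the corresponding topic from the penalty, which is how masking leaves unsupervised topics free to change.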
UTOPIAN: User-Driven Topic Modeling Based on Interactive NMF
UTOPIAN Overview
[Figure labels: topic merging, topic splitting, doc-induced topic creation, keyword-induced topic creation]
Supervised t-distributed stochastic neighbor embedding (t-SNE)
User interactions supported
• Keyword refinement
• Topic merging/splitting
• Keyword-/document-induced topic creation
Real-time interaction via PIVE (Per-Iteration Visualization Environment)
Supervised t-SNE
Original t-SNE
• Documents are often too noisy to work with.
Supervised t-SNE
• d(x_i, x_j) ← α · d(x_i, x_j) if x_i and x_j belong to the same topic cluster.
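The distance rule above can be sketched in a few lines. The slide does not specify the base distance or the value of α, so the sketch assumes Euclidean distance and α = 0.5 (a value below 1 shrinks within-cluster distances). The function name and data are hypothetical, and the resulting matrix would still need to be handed to a t-SNE implementation that accepts precomputed distances.

```python
import numpy as np

def supervised_distances(X, labels, alpha=0.5):
    """Pairwise Euclidean distances, scaled by alpha within the same topic cluster."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    same_cluster = labels[:, None] == labels[None, :]
    return np.where(same_cluster, alpha * D, D)

# Three made-up documents; the first two share a topic cluster.
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
labels = np.array([0, 0, 1])
D_sup = supervised_distances(X, labels, alpha=0.5)
print(D_sup)
```

Here the distance between the first two documents drops from 5 to 2.5, while distances across clusters are unchanged.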
PIVE (Per-Iteration Visualization Environment) for Real-time Interaction [Choo et al., under revision]
[Figure panels: standard approach; PIVE approach]
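The slides give no detail on how PIVE works internally. The sketch below only illustrates the general contrast the two labels suggest, under the assumption that a per-iteration approach exposes every intermediate result of an iterative method instead of waiting for the final one. It uses a multiplicative-update NMF as a stand-in solver; all names and values are hypothetical.

```python
import numpy as np

def nmf_iterates(A, k, n_iter=30, seed=0, eps=1e-10):
    """Yield (iteration, W, H) after every multiplicative-update step."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k)) + eps
    H = rng.random((k, A.shape[1])) + eps
    for it in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
        yield it, W.copy(), H.copy()

A = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 1.0],
    [0.0, 0.0, 4.0, 3.0],
    [0.0, 1.0, 3.0, 4.0],
])

# Standard approach: run to the last iterate, then visualize once.
last_it, W_final, H_final = list(nmf_iterates(A, k=2))[-1]

# Per-iteration approach: pass every intermediate result to the view.
residuals = []
for it, W, H in nmf_iterates(A, k=2):
    # Stand-in for redrawing the visualization with the current W and H.
    residuals.append(np.linalg.norm(A - W @ H, "fro"))
```

In an interactive system the loop body would update the display and could check for user input before the next iteration; that part is omitted here.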
Demo Video: http://tinyurl.com/UTOPIAN2013
Usage Scenario: Hyundai Genesis Review Data
[Figure panels: initial result; after interaction]
Summary
• Presented UTOPIAN, a user-driven topic modeling system based on interactive NMF.
• Highlighted the advantages of NMF over LDA in visual analytics.
  • Reliable algorithmic behaviors
    • Consistency from multiple runs
    • Early empirical convergence
  • Flexible support for user interactions
    • Keyword refinement
    • Topic merging/splitting
    • Keyword-/document-induced topic creation
More in the paper & On-going Work
• A general taxonomy of user interactions with computational methods
  • Keyword-based vs. document-based
  • Template-based vs. from-scratch-based
• Algorithmic details about supported user interactions
• Implementation details
• More usage scenarios
On-going Work
• Scaling up the system with parallel distributed NMF
Thank you!
http://tinyurl.com/UTOPIAN2013
For more details, please find me at ‘Meet the Candidate’, A601 + A602, 6PM today.
Jaegul [email protected]
http://www.cc.gatech.edu/~joyfull/