Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process
Transcript of Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process
-
Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process
Chong Wang and David M. Blei, NIPS 2009
Discussion led by Chunping Wang, ECE, Duke University
March 26, 2010
-
Outline (1/16)
Motivations
LDA and HDP-LDA
Sparse Topic Models
Inference Using Collapsed Gibbs sampling
Experiments
Conclusions
-
Motivations (2/16)
Topic modeling with the bag-of-words assumption
An extension of the HDP-LDA model
In the LDA and the HDP-LDA models, the topics are drawn from an exchangeable Dirichlet distribution with a scale parameter γ. As γ approaches zero, topics will be
  sparse: most probability mass falls on only a few terms
  less smooth: empirical counts dominate
Goal: to decouple sparsity and smoothness so that these two properties can be achieved at the same time.
How: a Bernoulli variable is introduced for each term in each topic.
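The coupling described on this slide can be illustrated with a minimal sketch. It assumes the posterior mean of a topic under an exchangeable Dirichlet prior, (count + γ) / (total + Vγ); the function name and the term counts are hypothetical and chosen only for illustration.

```python
def posterior_mean_topic(counts, gamma):
    """Posterior mean of one topic under an exchangeable Dirichlet(gamma) prior."""
    total = sum(counts) + gamma * len(counts)
    return [(c + gamma) / total for c in counts]


# Hypothetical term counts for a single topic over a 6-term vocabulary.
counts = [6, 3, 1, 0, 0, 0]

for gamma in (1.0, 0.1, 0.001):
    probs = posterior_mean_topic(counts, gamma)
    # Mass left on terms never assigned to the topic: a proxy for sparsity.
    unseen_mass = sum(p for p, c in zip(probs, counts) if c == 0)
    # Gap between the estimate and the raw empirical frequency: a proxy for smoothing.
    gap = abs(probs[0] - counts[0] / sum(counts))
    print(f"gamma={gamma}: unseen mass={unseen_mass:.4f}, gap to empirical={gap:.4f}")
```

With a single parameter, shrinking gamma removes mass from unseen terms and pulls the estimate toward the raw counts at the same time, which is the coupling the sparse topic model is meant to break.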
-
LDA and HDP-LDA (3/16)
[Side-by-side definitions of LDA and HDP-LDA at the topic, document, and word levels; figure labels: base measure, weights]
HDP-LDA: nonparametric form of LDA, with the number of topics unbounded
-
Sparse Topic Models (4/16)
The size of the vocabulary is V.
LDA and HDP-LDA: topics are defined on the (V-1)-simplex.
Sparse TM: topics are defined on a sub-simplex specified by b_k
  b_k: a V-length binary vector composed of V Bernoulli variables
  π_k: one selection proportion for each topic
Sparsity: the pattern of ones in b_k, controlled by π_k
Smoothness: enforced over terms with non-zero b_kv through γ
Decoupled!
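A minimal sketch of drawing one sparse topic as described on this slide. The Beta(r, s) prior on the selection proportion, the names `r`, `s`, `pi_k`, `b_k`, `beta_k`, and the guard that forces at least one term on are assumptions made for illustration, not details taken from the slides.

```python
import random


def draw_sparse_topic(vocab_size, gamma, r, s, rng):
    """Draw (pi_k, b_k, beta_k) for one topic; r and s are assumed Beta hyper-parameters."""
    # Selection proportion for this topic (controls sparsity).
    pi_k = rng.betavariate(r, s)
    # One Bernoulli selector per vocabulary term.
    b_k = [1 if rng.random() < pi_k else 0 for _ in range(vocab_size)]
    if sum(b_k) == 0:
        # Illustrative guard (an assumption): keep at least one term selected
        # so that the topic is a valid distribution.
        b_k[rng.randrange(vocab_size)] = 1
    # Dirichlet(gamma) restricted to the selected terms, via normalized Gamma draws.
    weights = [rng.gammavariate(gamma, 1.0) if on else 0.0 for on in b_k]
    total = sum(weights)
    beta_k = [w / total for w in weights]
    return pi_k, b_k, beta_k


rng = random.Random(0)
pi_k, b_k, beta_k = draw_sparse_topic(vocab_size=8, gamma=1.0, r=1.0, s=1.0, rng=rng)
print("selectors:", b_k)
print("topic:", [round(p, 3) for p in beta_k])
```

The selectors decide which terms can appear at all, and gamma only shapes the distribution over the selected terms, so the two knobs act independently.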
-
Sparse Topic Models (5/16)
-
Inference Using Collapsed Gibbs sampling (6/16)
As in the HDP-LDA:
  Topic proportions and topic distributions are integrated out.
  The direct-assignment method based on the Chinese restaurant franchise (CRF) is used for the topic assignments z and an augmented variable, the table counts m.
-
Inference Using Collapsed Gibbs sampling (7/16)
Notation:
  n_dk: # of customers (words) in restaurant d (document) eating dish k (topic)
  m_dk: # of tables in restaurant d serving dish k
  marginal counts are represented with dots (for example, m_·k sums m_dk over documents)
  K, u: current # of topics and new topic index, respectively
  n_k^(v): # of times that term v has been assigned to topic k
  n_k^(·): # of times that all the terms have been assigned to topic k
  f_k(w_di): conditional density of w_di under topic k given all data except w_di
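The word-level counts in this notation can be computed from a set of topic assignments as in the sketch below. The toy documents, the assignments, and the variable names are hypothetical; table counts are not included because they are sampled rather than counted.

```python
from collections import Counter


def count_statistics(docs, assignments):
    """Build the count tables used by the sampler from words and their topic assignments."""
    n_dk = Counter()  # (document d, topic k) -> # of words in d assigned to k
    n_kv = Counter()  # (topic k, term v)     -> # of times term v is assigned to k
    n_k = Counter()   # topic k               -> # of words assigned to k (the "dot" marginal)
    for d, (words, topics) in enumerate(zip(docs, assignments)):
        for v, k in zip(words, topics):
            n_dk[d, k] += 1
            n_kv[k, v] += 1
            n_k[k] += 1
    return n_dk, n_kv, n_k


# Two toy documents given as term ids, with one topic id per word.
docs = [[0, 1, 1], [2, 1]]
assignments = [[0, 0, 1], [1, 1]]
n_dk, n_kv, n_k = count_statistics(docs, assignments)
print(dict(n_dk), dict(n_kv), dict(n_k))
```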
-
Inference Using Collapsed Gibbs sampling (8/16)
Recall the direct-assignment sampling method for the HDP-LDA:
  Sampling topic assignments: if a new topic is sampled, then draw a break proportion, split the unused stick length between the new topic and the remaining unused mass, and increase K by one
  Sampling stick length
  Sampling table counts
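The topic-assignment step recalled on this slide can be sketched as follows, assuming the usual HDP direct-assignment form: an existing topic k is weighted by (n_dk + τ·stick_k) times the word's conditional density, and a new topic by τ·stick_u times the prior density 1/V. The names `tau`, `stick`, and `stick_unused` are assumptions, and all counts are taken to exclude the word being resampled.

```python
def topic_assignment_probs(n_dk_row, n_kv_col, n_k, stick, stick_unused,
                           tau, gamma, vocab_size):
    """Normalized probabilities over the K existing topics plus one new topic (last entry)."""
    weights = []
    for k in range(len(stick)):
        # Document-level preference for topic k.
        prior = n_dk_row[k] + tau * stick[k]
        # Conditional density of the current term under topic k (HDP-LDA case).
        likelihood = (n_kv_col[k] + gamma) / (n_k[k] + vocab_size * gamma)
        weights.append(prior * likelihood)
    # A new topic has no counts, so the term density reduces to 1 / V.
    weights.append(tau * stick_unused / vocab_size)
    total = sum(weights)
    return [w / total for w in weights]


probs = topic_assignment_probs(
    n_dk_row=[2, 0],      # words of this document in topics 0 and 1
    n_kv_col=[1, 3],      # times the current term appears in topics 0 and 1
    n_k=[4, 6],           # total words in topics 0 and 1
    stick=[0.5, 0.3], stick_unused=0.2,
    tau=1.0, gamma=0.5, vocab_size=4,
)
print([round(p, 4) for p in probs])
```

If the last entry is chosen, the unused stick length would then be split as the slide describes before the next word is processed.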
-
Inference Using Collapsed Gibbs sampling (8/16)
Recall the direct-assignment sampling method for the HDP-LDA:
  Sampling topic assignments
Conditional density of a word under a topic: straightforward for the HDP-LDA; for the sparse TM it depends on the selectors b.
Instead, the authors integrate out b for faster convergence. Since there are 2^V possible b in total, this is the central computational challenge for the sparse TM.
-
Inference Using Collapsed Gibbs sampling (9/16)
[Equation for the conditional density, with definitions of the vocabulary and of the set of terms that have word assignments in topic k]
This conditional probability depends on the selector proportions.
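One way to see why integrating out the selectors is tractable is sketched below. It relies on a standard property of the Dirichlet-multinomial: given the counts, the likelihood depends on a selector vector only through how many terms are switched on, so the sum over all binary vectors collapses to a sum over a binomial count. This is an illustration under that assumption, not a reproduction of the expression on the slide; the function names are hypothetical, and the counts must contain at least one non-zero entry.

```python
from itertools import product
from math import comb, exp, lgamma


def log_lik_given_size(counts, gamma, num_selected):
    """log p(words in topic | b) when b selects num_selected terms, including every used term."""
    n = sum(counts)
    out = lgamma(gamma * num_selected) - lgamma(n + gamma * num_selected)
    out += sum(lgamma(c + gamma) - lgamma(gamma) for c in counts if c > 0)
    return out


def marginal_brute_force(counts, gamma, pi):
    """Sum over all 2^V selector vectors (feasible only for a tiny vocabulary)."""
    total = 0.0
    for b in product((0, 1), repeat=len(counts)):
        if any(c > 0 and on == 0 for c, on in zip(counts, b)):
            continue  # a term with assigned words must be selected
        size = sum(b)
        prior = pi ** size * (1 - pi) ** (len(counts) - size)
        total += prior * exp(log_lik_given_size(counts, gamma, size))
    return total


def marginal_binomial(counts, gamma, pi):
    """Same quantity as a sum over the number of extra selected terms."""
    used = sum(1 for c in counts if c > 0)   # terms with word assignments
    free = len(counts) - used                # terms that may be on or off
    total = 0.0
    for x in range(free + 1):
        prior = comb(free, x) * pi ** (used + x) * (1 - pi) ** (free - x)
        total += prior * exp(log_lik_given_size(counts, gamma, used + x))
    return total


counts = [3, 1, 0, 0, 0]
print(marginal_brute_force(counts, 0.5, 0.3), marginal_binomial(counts, 0.5, 0.3))
```

The brute-force loop costs 2^V terms while the binomial form costs at most V + 1, and both depend on the selection proportion `pi`, in line with the remark on the slide.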
-
Inference Using Collapsed Gibbs sampling (10/16)
-
Inference Using Collapsed Gibbs sampling (11/16)
Sampling the Bernoulli parameter π_k (using b_k as an auxiliary variable)
Sampling hyper-parameters
  concentration parameters: with Gamma(1,1) priors
  smoothing parameter γ: Metropolis-Hastings using a symmetric Gaussian proposal
Estimate topic distributions from any single sample of z and b
  define the set of terms with an "on" b
  sample the selectors b (sparsity), then sample the topic distribution conditioned on them (smoothness on the selected terms)
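Two of the steps on this slide can be sketched under stated assumptions. The selection proportion is drawn from its Beta posterior given the selectors, assuming a Beta(r, s) prior (the names `r` and `s` are assumptions). The topic estimate uses the posterior mean over the selected terms in place of a random draw, which is a simplification swapped in here to keep the example deterministic.

```python
import random


def sample_selection_proportion(b_k, r, s, rng):
    """Beta-Bernoulli update: draw pi_k given the selector vector b_k."""
    on = sum(b_k)
    return rng.betavariate(r + on, s + len(b_k) - on)


def estimate_topic(counts, b_k, gamma):
    """Posterior-mean topic over the selected terms; unselected terms get exactly zero.

    Assumes every term with a non-zero count is selected in b_k.
    """
    selected = sum(b_k)
    denom = sum(counts) + gamma * selected
    return [(c + gamma) / denom if on else 0.0 for c, on in zip(counts, b_k)]


rng = random.Random(1)
b_k = [1, 1, 1, 0]        # hypothetical selector sample for one topic
counts = [3, 1, 0, 0]     # hypothetical term counts for the same topic
print("pi_k draw:", round(sample_selection_proportion(b_k, 1.0, 1.0, rng), 3))
print("topic estimate:", [round(p, 3) for p in estimate_topic(counts, b_k, 0.5)])
```

The third term has no assigned words but is selected, so it receives smoothed mass; the fourth term is switched off and receives none.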
-
Experiments (12/16)
Four datasets:
  arXiv: online research abstracts, D = 2500, V = 2873
  Nematode Biology: research abstracts, D = 2500, V = 2944
  NIPS: NIPS articles between 1988-1999, V = 5005. 20% of words for each paper are used.
  Conf. abstracts: abstracts from CIKM, ICML, KDD, NIPS, SIGIR and WWW, between 2005-2008, V = 3733.
Two predictive quantities: perplexity and complexity (defined through the topic complexity)
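The two quantities can be sketched as below. Perplexity follows its standard definition, the exponential of the negative mean held-out log probability. The complexity function is an assumption about what the slide means: it counts, per topic, the distinct terms with at least one assigned word and sums over topics. The function names and toy inputs are hypothetical.

```python
from math import exp, log


def perplexity(word_probs):
    """exp of the negative mean log predictive probability over held-out words."""
    return exp(-sum(log(p) for p in word_probs) / len(word_probs))


def complexity(n_kv):
    """Assumed definition: total # of (topic, term) pairs with at least one assigned word."""
    return sum(1 for row in n_kv for c in row if c > 0)


# Hypothetical predictive probabilities for four held-out words.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
# Hypothetical topic-by-term count matrix with two topics and three terms.
print(complexity([[2, 0, 1], [0, 0, 5]]))
```

Lower perplexity means better held-out prediction, and lower complexity means the topics use fewer distinct terms.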
-
Experiments (13/16)
better perplexity, simpler models
larger γ (Dirichlet scale parameter): smoother
fewer topics
similar # of terms
-
Experiments (14/16)
-
Experiments (15/16)
Infrequent words populate noise topics.
small (
-
Conclusions (16/16)
A new topic model in the HDP-LDA framework, based on the bag-of-words assumption.
Main contributions:
  Decoupling the control of sparsity and smoothness by introducing binary selectors for term assignments in each topic;
  Developing a collapsed Gibbs sampler in the HDP-LDA framework.
Held-out performance is better than that of the HDP-LDA.