Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

Chong Wang and David M. Blei, NIPS 2009
Discussion led by Chunping Wang
ECE, Duke University
March 26, 2010

Outline (1/16)
Motivations
LDA and HDP-LDA
Sparse Topic Models
Inference Using Collapsed Gibbs Sampling
Experiments
Conclusions

Motivations (2/16)
Topic modeling with the bag-of-words assumption.
An extension of the HDP-LDA model.
In the LDA and HDP-LDA models, the topics are drawn from an exchangeable Dirichlet distribution with a single scale parameter. As this scale parameter approaches zero, topics become
  sparse: most probability mass falls on only a few terms;
  less smooth: empirical counts dominate.
Goal: to decouple sparsity and smoothness so that both properties can be achieved at the same time.
How: a Bernoulli variable is introduced for each term in each topic.
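The coupling described above can be seen numerically. The following is a minimal sketch, not taken from the slides: it draws topics from a symmetric Dirichlet using only the Python standard library and reports how much mass the five largest terms carry. The function names, the vocabulary size of 1000, and the scale values are illustrative assumptions.

```python
import random

def sample_symmetric_dirichlet(dim, scale, rng):
    """Draw one point from an exchangeable Dirichlet(scale, ..., scale)."""
    # Normalised independent Gamma(scale, 1) draws follow a Dirichlet.
    draws = [rng.gammavariate(scale, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [g / total for g in draws]

def mass_of_top_terms(topic, top=5):
    """Probability mass carried by the `top` most probable terms."""
    return sum(sorted(topic, reverse=True)[:top])

rng = random.Random(0)
for scale in (0.01, 0.1, 1.0):  # hypothetical scale values
    topic = sample_symmetric_dirichlet(1000, scale, rng)
    print(scale, round(mass_of_top_terms(topic), 3))
```

With a single scale parameter, making the topic sparser necessarily shrinks the pseudo-counts that provide smoothing, which is the coupling the sparse TM is meant to remove.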

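LDA and HDP-LDA (3/16)
LDA: generative steps for each topic, each document, and each word.
HDP-LDA: generative steps for each topic, each document, and each word. This is the nonparametric form of LDA, with the number of topics unbounded.
Labels on the slide: base measure, weights.
(The equations on this slide are not recovered in this transcript.)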
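As a reminder of the two constructions, here is a minimal sketch of the standard LDA generative process and of truncated stick-breaking weights, which give an unbounded number of topics in the limit. It is not the slide's own formulation: the parameter names `alpha`, `gamma`, `concentration` and the truncation level are assumptions chosen for illustration.

```python
import random

def dirichlet(params, rng):
    """Draw from a Dirichlet by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [g / total for g in draws]

def categorical(probs, rng):
    """Sample an index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_lda_corpus(num_topics, vocab_size, doc_lengths, alpha, gamma, rng):
    """topic: Dirichlet(gamma); document: Dirichlet(alpha); word: topic index, then term."""
    topics = [dirichlet([gamma] * vocab_size, rng) for _ in range(num_topics)]
    corpus = []
    for length in doc_lengths:
        theta = dirichlet([alpha] * num_topics, rng)  # topic proportions of this document
        doc = []
        for _ in range(length):
            z = categorical(theta, rng)       # topic assignment
            w = categorical(topics[z], rng)   # term drawn from that topic
            doc.append((z, w))
        corpus.append(doc)
    return topics, corpus

def stick_breaking_weights(concentration, truncation, rng):
    """Truncated stick-breaking weights; the last entry holds the leftover mass."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, concentration)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)
    return weights
```

In HDP-LDA the stick-breaking weights play the role of the shared topic weights, so the fixed `num_topics` above is replaced by a countable set of topics.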

Sparse Topic Models (4/16)
The size of the vocabulary is V.
A topic in LDA and HDP-LDA is defined on the (V-1)-simplex.
A topic in the sparse TM is defined on a sub-simplex specified by b, a V-length binary vector composed of V Bernoulli variables, with one selection proportion for each topic.
Sparsity: the pattern of ones in b, controlled by the selection proportion.
Smoothness: enforced over the terms with non-zero selectors through the Dirichlet scale parameter.
Decoupled!
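The two-stage construction can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: `pi` stands for the selection proportion, `gamma` for the Dirichlet scale, and the guard that forces at least one selected term is added here only so that the sketch always returns a valid distribution.

```python
import random

def draw_sparse_topic(vocab_size, pi, gamma, rng):
    """Draw selectors b, then a Dirichlet over the selected terms only."""
    # Sparsity: one Bernoulli(pi) selector per term.
    b = [1 if rng.random() < pi else 0 for _ in range(vocab_size)]
    if sum(b) == 0:
        # Assumption of this sketch: keep at least one term switched on.
        b[rng.randrange(vocab_size)] = 1
    # Smoothness: Dirichlet(gamma) restricted to terms with b = 1.
    draws = [rng.gammavariate(gamma, 1.0) if on else 0.0 for on in b]
    total = sum(draws)
    topic = [g / total for g in draws]
    return b, topic
```

A small `pi` with a large `gamma` gives a topic that uses few terms but spreads its mass evenly over them, a combination a single Dirichlet scale cannot produce.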


Inference Using Collapsed Gibbs Sampling (6/16)
As in the HDP-LDA:
Topic proportions and topic distributions are integrated out.
The direct-assignment method based on the Chinese restaurant franchise (CRF) is used for the topic assignments z and an augmented variable, the table counts.

Inference Using Collapsed Gibbs Sampling (7/16)
Notation:
n_dk: # of customers (words) in restaurant d (document) eating dish k (topic)
m_dk: # of tables in restaurant d serving dish k
marginal counts are represented with dots
K, u: current # of topics and new topic index, respectively
n_k^(v): # of times that term v has been assigned to topic k
n_k^(.): # of times that all the terms have been assigned to topic k
the conditional density of a word under topic k given all data except that word
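The word-level counts in this notation are simple tallies over the current topic assignments. A minimal sketch, assuming a corpus stored as lists of (topic, term) pairs; the function name and the data layout are hypothetical, and table counts are not covered because they are sampled rather than tallied.

```python
from collections import Counter

def count_assignments(corpus):
    """corpus: list of documents, each a list of (topic, term) pairs."""
    # Per document: how many words are assigned to each topic.
    doc_topic = [Counter(z for z, _ in doc) for doc in corpus]
    topic_term = Counter()   # (topic, term) -> times the term is assigned to the topic
    topic_total = Counter()  # topic -> times any term is assigned to the topic
    for doc in corpus:
        for z, w in doc:
            topic_term[(z, w)] += 1
            topic_total[z] += 1
    return doc_topic, topic_term, topic_total
```

A collapsed Gibbs sampler decrements these counts for the word being resampled and increments them again after the new topic is drawn, which is what "given all data except that word" amounts to in practice.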

Inference Using Collapsed Gibbs Sampling (8/16)
Recall the direct-assignment sampling method for the HDP-LDA:
Sampling topic assignments. If a new topic is sampled, then sample a new stick proportion and update the stick lengths and K accordingly.
Sampling stick lengths.
Sampling table counts.

Inference Using Collapsed Gibbs Sampling (8/16, continued)
Sampling topic assignments requires the conditional density of a word under each topic:
for HDP-LDA;
for sparse TM, conditioned on the selectors b (straightforward).
Instead, the authors integrate out b for faster convergence. Since there are 2^V possible values of b in total, this is the central computational challenge for the sparse TM.
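For orientation, the two conditional densities can be sketched with the standard Dirichlet-multinomial predictive rule. This is an assumption about the form of the lost equations, not a transcription of them: `gamma` denotes the Dirichlet scale, `term_counts` are the per-topic term counts with the current word removed, and `b` is taken as given rather than integrated out.

```python
def hdp_lda_word_density(v, term_counts, gamma):
    """Predictive probability of term v under a symmetric Dirichlet(gamma) prior."""
    vocab_size = len(term_counts)
    return (term_counts[v] + gamma) / (sum(term_counts) + vocab_size * gamma)

def sparse_word_density_given_b(v, term_counts, b, gamma):
    """Same rule with the Dirichlet restricted to terms whose selector is on."""
    if not b[v]:
        return 0.0  # a switched-off term has no mass in this topic
    return (term_counts[v] + gamma) / (sum(term_counts) + sum(b) * gamma)

def num_selector_patterns(vocab_size):
    """Number of binary selector vectors that a naive marginalisation would visit."""
    return 2 ** vocab_size
```

With vocabularies of a few thousand terms, as in the experiments, enumerating every selector pattern is out of reach, which is why integrating out b requires a dedicated derivation.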

Inference Using Collapsed Gibbs Sampling (9/16)
The conditional probability is written in terms of two sets: the vocabulary, and the set of terms that have word assignments in topic k.
This conditional probability depends on the selector proportions.


Inference Using Collapsed Gibbs Sampling (11/16)
Sampling the Bernoulli parameter (using b as an auxiliary variable):
  define the set of terms with an "on" b;
  sample b conditioned on the Bernoulli parameter;
  sample the Bernoulli parameter conditioned on b.
Sampling hyper-parameters:
  Gamma(1,1) priors for one set;
  Metropolis-Hastings using a symmetric Gaussian proposal for the other.
Estimate topic distributions from any single sample of z and b: sparsity comes from b, and smoothness is applied on the selected terms.
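The steps on this slide can be sketched as below. Everything named here is an assumption made for illustration: a Beta(r, s) prior on the Bernoulli parameter (which gives the conjugate update shown), `gamma` as the Dirichlet scale, and a generic random-walk Metropolis-Hastings step for a positive hyper-parameter. The slide's own formulas are not recovered, so this is not presented as the authors' sampler.

```python
import math
import random

def estimate_topic(term_counts, b, gamma):
    """Point estimate of a topic from one sample: counts plus smoothing on selected terms."""
    denom = sum(term_counts) + gamma * sum(b)
    return [(n + gamma * on) / denom for n, on in zip(term_counts, b)]

def sample_selection_proportion(b, r, s, rng):
    """Beta-Bernoulli conjugate update: Beta(r + #on, s + #off)."""
    on = sum(b)
    return rng.betavariate(r + on, s + len(b) - on)

def metropolis_step(current, log_target, step_size, rng):
    """One random-walk Metropolis-Hastings step with a symmetric Gaussian proposal."""
    proposal = rng.gauss(current, step_size)
    if proposal <= 0.0:
        return current  # outside the support of a positive parameter: reject
    log_ratio = log_target(proposal) - log_target(current)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return current
```

Because the proposal is symmetric, the acceptance ratio reduces to the ratio of target densities, which is why no proposal correction term appears in `metropolis_step`.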

Experiments (12/16)
Four datasets:
  arXiv: online research abstracts, D = 2500, V = 2873
  Nematode Biology: research abstracts, D = 2500, V = 2944
  NIPS: NIPS articles between 1988 and 1999, V = 5005; 20% of the words of each paper are used
  Conf. abstracts: abstracts from CIKM, ICML, KDD, NIPS, SIGIR and WWW, between 2005 and 2008, V = 3733
Two predictive quantities (formulas not recovered in this transcript), defined with reference to the topic complexity.
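The later slides report perplexity. The following is a minimal sketch of the usual per-word held-out perplexity, assuming that this standard definition is the one intended; the function name and its inputs (one held-out log-likelihood and one length per document) are hypothetical.

```python
import math

def per_word_perplexity(log_likelihoods, doc_lengths):
    """exp of the negative average held-out log-likelihood per word."""
    total_words = sum(doc_lengths)
    return math.exp(-sum(log_likelihoods) / total_words)
```

Lower values are better: a model that assigned uniform probability over a vocabulary of size V would score exactly V.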

Experiments (13/16)
Better perplexity, simpler models.
Larger scale parameter: smoother topics.
Fewer topics, similar # of terms.


Experiments (15/16)
Infrequent words populate noise topics.
small (

Conclusions (16/16)
A new topic model in the HDP-LDA framework, based on the bag-of-words assumption.
Main contributions:
  decoupling the control of sparsity and smoothness by introducing binary selectors for term assignments in each topic;
  developing a collapsed Gibbs sampler in the HDP-LDA framework.
Held-out performance is better than that of the HDP-LDA.
