
Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

Chong Wang and David M. Blei, NIPS 2009

Discussion led by Chunping Wang

ECE, Duke University

March 26, 2010


Outline

• Motivations

• LDA and HDP-LDA

• Sparse Topic Models

• Inference Using Collapsed Gibbs sampling

• Experiments

• Conclusions



Motivations


• Topic modeling with the “bag of words” assumption

• An extension of the HDP-LDA model

• In the LDA and HDP-LDA models, the topics are drawn from an exchangeable Dirichlet distribution with a scale parameter u. As u approaches zero, topics become

o sparse: most probability mass on only a few terms

o less smooth: empirical counts dominate

• Goal: to decouple sparsity and smoothness so that these two properties can be achieved at the same time.

• How: a Bernoulli variable for each term and each topic is introduced.


LDA and HDP-LDA

LDA (topic k, document d, word i):

θ_d ~ Dir(α), z_di ~ Mult(θ_d), w_di ~ Mult(β_{z_di}), β_k ~ Dir(u)

HDP-LDA (topic k, document d, word i): the nonparametric form of LDA, with the number of topics unbounded. The base-measure weights π are drawn from a stick-breaking process:

π ~ GEM(α), θ_d ~ DP(α, π), β_k ~ Dir(u), z_di ~ Mult(θ_d), w_di ~ Mult(β_{z_di})

[Plate diagrams omitted.]
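The LDA generative process above can be sketched numerically. This is a toy illustration, not the authors' code; all sizes and hyperparameter values here are invented.

```python
import numpy as np

# Toy sketch of the LDA generative process: theta_d ~ Dir(alpha),
# z_di ~ Mult(theta_d), w_di ~ Mult(beta_{z_di}), beta_k ~ Dir(u).
rng = np.random.default_rng(0)

K, V, D, N = 3, 10, 4, 20          # topics, vocab size, documents, words/doc
alpha, u = 0.5, 0.1                # Dirichlet hyperparameters (toy values)

beta = rng.dirichlet(np.full(V, u), size=K)        # beta_k ~ Dir(u), one row per topic
docs = []
for d in range(D):
    theta_d = rng.dirichlet(np.full(K, alpha))     # theta_d ~ Dir(alpha)
    z = rng.choice(K, size=N, p=theta_d)           # z_di ~ Mult(theta_d)
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # w_di ~ Mult(beta_{z_di})
    docs.append(w)

print(len(docs), len(docs[0]))
```

With a small u, each sampled row of beta already concentrates its mass on few terms, which is the coupling of sparsity and smoothness the paper sets out to break.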


Sparse Topic Models

The size of the vocabulary is V.

HDP-LDA: β_k ~ Dir(u), defined on the (V − 1)-simplex.

Sparse TM: β_k ~ Dir(γ b_k), defined on a sub-simplex specified by b_k.

b_k: a V-length binary vector composed of V Bernoulli variables,

b_kv ~ Bern(π_k), π_k ~ Beta(r, s): one selection proportion per topic.

Sparsity: the pattern of ones in b_k, controlled by π_k.

Smoothness: enforced over terms with non-zero b_kv's through γ.

Decoupled!
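A minimal sketch of this construction, in my own code and with toy values: the Bernoulli selectors b_kv pick the sub-simplex, and the Dirichlet smoothing γ acts only over the selected terms.

```python
import numpy as np

# Toy sketch of one sparse topic draw: pi_k ~ Beta(r, s),
# b_kv ~ Bern(pi_k), beta_k ~ Dir(gamma * b_k) on the sub-simplex.
rng = np.random.default_rng(1)
V, gamma, r, s = 10, 1.0, 1.0, 1.0

pi_k = rng.beta(r, s)                   # selection proportion for this topic
b_k = rng.binomial(1, pi_k, size=V)     # term selectors b_kv
if b_k.sum() == 0:                      # guard so the sub-simplex is non-empty
    b_k[rng.integers(V)] = 1

on = np.flatnonzero(b_k)                # the sub-simplex specified by b_k
beta_k = np.zeros(V)
beta_k[on] = rng.dirichlet(np.full(on.size, gamma))  # smooth over selected terms only

print(beta_k.round(3))
```

Note the decoupling: pi_k controls how many entries of beta_k are exactly zero, while gamma controls how evenly mass spreads over the non-zero entries.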



Inference Using Collapsed Gibbs sampling

As in the HDP-LDA:

• Topic proportions θ and topic distributions β are integrated out.

• The direct-assignment method based on the Chinese restaurant franchise (CRF) is used for z, with the table counts m as an augmented variable.


Inference Using Collapsed Gibbs sampling

Notation:

n_dk: # of customers (words) in restaurant d (document) eating dish k (topic)

m_dk: # of tables in restaurant d serving dish k

n_d·, m_d·, m_·k, m_··: marginal counts, represented with dots

K, u: current # of topics and the new-topic index, respectively

n_k^(v): # of times that term v has been assigned to topic k

n_k^(·): # of times that all terms have been assigned to topic k

f_k^{−w_di}(w_di = v) = p(w_di = v | {w_d'i', z_d'i' : z_d'i' = k, (d', i') ≠ (d, i)}): the conditional density of w_di under topic k given all data except w_di


Inference Using Collapsed Gibbs sampling

Recall the direct-assignment sampling method for the HDP-LDA.

Sampling topic assignments:

p(z_di = k | z^{−di}, m, π) ∝ (n_dk^{−di} + α π_k) f_k^{−w_di}(w_di) if k previously used; α π_u f_u^{−w_di}(w_di) if k = u.

If a new topic is sampled, then sample ν ~ Beta(1, α), and let π_{k_new} = ν π_u and π_u = (1 − ν) π_u and K = K + 1.

Sampling stick lengths:

(π_1, …, π_K, π_u) | m ~ Dir(m_·1, …, m_·K, α)

Sampling table counts:

p(m_dk = m | z, m^{−dk}, π) ∝ s(n_dk, m) (α π_k)^m, where s(·, ·) denotes unsigned Stirling numbers of the first kind.
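The topic-assignment update can be sketched as follows. This is my own toy code, not the paper's: it uses the smoothed HDP-LDA form of f (given below in the transcript) and a uniform 1/V density for the brand-new topic.

```python
import numpy as np

# Unnormalized direct-assignment probabilities for one word w_di = v:
#   existing topic k: (n_dk + alpha*pi_k) * f_k(v)
#   new topic u:       alpha*pi_u * (1/V)
# where f_k(v) = (n_k^(v) + u0) / (n_k^(.) + V*u0), counts excluding w_di.
def z_probs(v, n_dk, n_kv, alpha, pi, pi_u, u0):
    K, V = n_kv.shape
    f = (n_kv[:, v] + u0) / (n_kv.sum(axis=1) + V * u0)  # f_k^{-w_di}(v)
    p = np.append((n_dk + alpha * pi) * f,               # existing topics
                  alpha * pi_u / V)                       # new topic
    return p / p.sum()

# toy counts: 2 existing topics, vocabulary of 4 terms
p = z_probs(v=1,
            n_dk=np.array([3.0, 1.0]),
            n_kv=np.array([[2.0, 5.0, 0.0, 1.0],
                           [1.0, 0.0, 3.0, 2.0]]),
            alpha=1.0, pi=np.array([0.5, 0.3]), pi_u=0.2, u0=0.1)
print(p)
```

Topic 0 dominates here because it has by far the most prior assignments of term v = 1.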


Inference Using Collapsed Gibbs sampling

Recall the direct-assignment sampling method for HDP-LDA: sampling the topic assignments requires the conditional density f_k^{−w_di}(w_di = v).

For HDP-LDA this is straightforward:

f_k^{−w_di}(w_di = v) = (n_k^{(v),−di} + u) / (n_k^{(·),−di} + V u)

For the sparse TM, conditioned on the selectors, it is

f_k^{−w_di}(w_di = v | b_k) ∝ b_kv (n_k^{(v),−di} + γ)

Instead, the authors integrate out b_k for faster convergence:

f_k^{−w_di}(w_di = v) = Σ_{b_k} p(w_di = v | b_k, {w_d'i', z_d'i' : z_d'i' = k, (d', i') ≠ (d, i)}) p(b_k | ·)

Since there are 2^V possible b_k in total, this is the central computational challenge for the sparse TM.
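To see why the 2^V sum is the bottleneck, here is a deliberately naive sketch (my reconstruction, not the paper's derivation): it enumerates all selector vectors for a tiny vocabulary, scoring each with a Dirichlet-multinomial likelihood over the selected terms and a Beta-Bernoulli prior on b_k. The paper's contribution is a closed form that avoids this enumeration.

```python
import itertools
from math import lgamma
import numpy as np

# Brute-force marginal over all 2^V selector vectors b_k for one topic's
# term counts. Only b_k that cover every term with a nonzero count have
# nonzero likelihood. Feasible only for toy V.
def log_marginal(counts, gamma, r, s):
    V = len(counts)
    total = -np.inf
    for bits in itertools.product([0, 1], repeat=V):
        b = np.array(bits)
        if b.sum() == 0 or np.any((counts > 0) & (b == 0)):
            continue                      # b_k must cover all observed terms
        m = b.sum()
        # Dirichlet-multinomial likelihood over the m selected terms
        ll = (lgamma(m * gamma) - lgamma(counts.sum() + m * gamma)
              + sum(lgamma(c + gamma) - lgamma(gamma)
                    for c, on in zip(counts, b) if on))
        # Beta-Bernoulli prior: pi_k ~ Beta(r, s) integrated out
        lb = (lgamma(r + s) - lgamma(r) - lgamma(s)
              + lgamma(r + m) + lgamma(s + V - m) - lgamma(r + s + V))
        total = np.logaddexp(total, ll + lb)
    return total

val = log_marginal(np.array([3, 0, 1, 0]), gamma=1.0, r=1.0, s=1.0)
print(val)
```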


Inference Using Collapsed Gibbs sampling

Define B_k: the set of terms that have word assignments in topic k; the marginalized conditional can be written in terms of B_k.

This conditional probability depends on the selector proportions.


Inference Using Collapsed Gibbs sampling

Sampling the Bernoulli parameter π_k (using b_k as an auxiliary variable):

o sample b_k conditioned on π_k;

o sample π_k conditioned on b_k.

Sampling hyper-parameters:

o some with Gamma(1,1) priors;

o the rest by Metropolis-Hastings using a symmetric Gaussian proposal.

Estimate the topic distributions β from any single sample of z and b, defining per topic the set of terms with an “on” b_kv:

o sparsity: only terms with b_kv = 1 receive probability mass;

o smoothness: enforced on the selected terms.
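One plausible form of that final estimate, sketched in my own code under the assumption that the smoothing γ is applied only where b_kv = 1:

```python
import numpy as np

# Estimate beta from one sample of assignment counts n_kv and selectors b:
# beta_hat_kv proportional to (n_k^(v) + gamma) * b_kv, so unselected terms
# get exactly zero (sparsity) while selected terms are smoothed.
def estimate_beta(n_kv, b, gamma):
    unnorm = (n_kv + gamma) * b
    return unnorm / unnorm.sum(axis=1, keepdims=True)

n_kv = np.array([[4.0, 1.0, 0.0],
                 [0.0, 2.0, 2.0]])
b = np.array([[1, 1, 0],
              [0, 1, 1]])
beta_hat = estimate_beta(n_kv, b, gamma=0.5)
print(beta_hat)
```

In the first topic, term 2 is switched off and gets probability exactly 0, while the two selected terms share smoothed mass (0.75 and 0.25).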


Experiments


Four datasets:

arXiv: online research abstracts, D = 2500, V = 2873

Nematode Biology: research abstracts, D = 2500, V = 2944

NIPS: NIPS articles from 1988–1999, V = 5005; 20% of the words of each paper are used.

Conf. abstracts: abstracts from CIKM, ICML, KDD, NIPS, SIGIR and WWW, 2005–2008, V = 3733

Two predictive quantities are reported, where the topic complexity is defined as complexity = Σ_k |B_k|.


Experiments


better perplexity, simpler models

larger γ: smoother

fewer topics

similar # of terms



Experiments

Infrequent words populate “noise” topics.

Small u (< 0.01) leads to a lack of smoothness:

β̂_kv = (n_k^{(v)} + u) / (n_k^{(·)} + V u) ≈ n_k^{(v)} / n_k^{(·)}

so the empirical word counts dominate, and more topics are needed to explain all the different patterns of empirical word counts.
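The effect of the smoothing parameter on that estimate can be checked numerically (a toy illustration with invented counts):

```python
import numpy as np

# beta_hat_kv = (n_k^(v) + u) / (n_k^(.) + V*u): as u -> 0 this collapses
# to the empirical frequencies, so unseen terms get essentially zero mass.
def smoothed(counts, u):
    return (counts + u) / (counts.sum() + len(counts) * u)

counts = np.array([9.0, 1.0, 0.0, 0.0])
print(smoothed(counts, 0.001))   # nearly empirical: unseen terms ~ 0
print(smoothed(counts, 1.0))     # smoother: unseen terms keep visible mass
```

This is exactly the tension the sparse TM resolves: with a single u, you cannot make the seen terms sparse without starving the smoothing of unseen ones.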


Conclusions


A new topic model in the HDP-LDA framework, based on the “bag of words” assumption;

Main contributions:

• Decoupling the control of sparsity and smoothness by introducing binary selectors for term assignments in each topic;

• Developing a collapsed Gibbs sampler in the HDP-LDA framework.

Held-out performance is better than that of the HDP-LDA.