Latent Dirichlet Allocation - University of Minnesotabaner029/Teaching/Fall07/talks/Han… ·...

Latent Dirichlet Allocation David Blei Andrew Ng Michael Jordan


Latent Dirichlet Allocation

David Blei, Andrew Ng, Michael Jordan


Outline

• Notation and assumption
• Latent variable models: mixture of unigrams, probabilistic latent semantic indexing, latent Dirichlet allocation
• A geometric interpretation
• Inference and estimation
• Experimental results


Notation and Terminology

• A word is the basic unit of discrete data, defined to be an item w from a vocabulary indexed by {1,…,V}.

• A document is a sequence of N words denoted by w = (w_1, w_2, …, w_N), where w_n is the n-th word in the sequence.

• A corpus is a collection of M documents denoted by D = {w_1, w_2, …, w_M}, where each w_d is a whole document.
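The notation above maps directly onto simple data structures. The following is a minimal sketch, assuming a made-up four-word vocabulary and 0-based indices in place of {1,…,V}; the names `vocabulary`, `encode` and `corpus` are hypothetical and not from the talk.

```python
# Hypothetical toy vocabulary; indices 0..V-1 stand in for the slide's {1,...,V}.
vocabulary = ["gene", "protein", "market", "stock"]
V = len(vocabulary)
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def encode(tokens):
    """Map a tokenised document to a sequence of vocabulary indices."""
    return [word_to_index[t] for t in tokens]

# A document is a sequence of N word indices; a corpus is a list of M documents.
corpus = [
    encode(["gene", "protein", "gene"]),
    encode(["market", "stock", "stock", "market"]),
]
M = len(corpus)
lengths = [len(doc) for doc in corpus]  # N for each document
```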


Bag-of-words Assumption

• Word order is ignored.

• “Bag-of-words” – exchangeability.

• A finite set of random variables $\{x_1, \ldots, x_N\}$ is said to be exchangeable if the joint distribution is invariant to permutation. If $\pi$ is a permutation of the integers from 1 to N:

$$p(x_1, \ldots, x_N) = p(x_{\pi(1)}, \ldots, x_{\pi(N)})$$

• Theorem (De Finetti) – if $(x_1, x_2, \ldots, x_N)$ are infinitely exchangeable, then the joint probability $p(x_1, x_2, \ldots, x_N)$ has a representation as a mixture:

$$p(x_1, x_2, \ldots, x_N) = \int p(\theta) \left( \prod_{i=1}^{N} p(x_i \mid \theta) \right) d\theta$$
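A small numerical check can make the link between exchangeability and the mixture representation concrete. The sketch below is an illustration only: it assumes binary variables and replaces the integral over θ with a sum over a made-up two-point prior (`prior` is hypothetical). The joint is invariant to permutation, yet the variables are not independent.

```python
from itertools import permutations
from math import isclose

# Assumed toy prior: theta takes one of two values with equal probability.
prior = {0.2: 0.5, 0.9: 0.5}

def joint(xs):
    """p(x_1..x_N) = sum_theta p(theta) * prod_i p(x_i | theta), x_i in {0, 1}."""
    total = 0.0
    for theta, p_theta in prior.items():
        likelihood = 1.0
        for x in xs:
            likelihood *= theta if x == 1 else 1.0 - theta
        total += p_theta * likelihood
    return total

xs = (1, 0, 0, 1, 1)
base = joint(xs)
# Every reordering of the sequence has the same joint probability.
exchangeable = all(isclose(joint(p), base) for p in permutations(xs))
# Exchangeable does not mean independent: p(1, 1) differs from p(1) * p(1).
dependent = not isclose(joint((1, 1)), joint((1,)) ** 2)
```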


Mixture of Unigrams

• Each document exhibits only one topic
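For reference, the mixture of unigrams model is usually written as follows (this is the standard form from Blei, Ng and Jordan's paper, not text recovered from the slide): a single topic z is drawn per document, and every word is then drawn from that topic's word distribution.

```latex
p(\mathbf{w}) = \sum_{z} p(z) \prod_{n=1}^{N} p(w_n \mid z)
```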


Probabilistic Latent Semantic Indexing

• Relaxes the assumption that each document is generated from only one topic
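In its usual formulation (again the standard form from the literature, not recovered from the slide), pLSI treats a document label d and a word w_n as conditionally independent given an unobserved topic z, so different words of the same document may come from different topics:

```latex
p(d, w_n) = p(d) \sum_{z} p(w_n \mid z)\, p(z \mid d)
```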


Probabilistic Latent Semantic Indexing

• p(d) is 0 for an unseen document.

• pLSI learns the topic mixture weights p(z|d) only for the training documents. It cannot assign probability to an unseen document, so it is not a well-defined generative model.

• p(z|d) needs kM parameters, a number that grows linearly with M, which leads to overfitting.


Latent Dirichlet Allocation

For each document:
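The steps of the per-document generative process did not survive in this transcript. The sketch below follows the standard description of LDA (draw θ from a Dirichlet once per document, then draw a topic and a word for each position); it is an illustration under stated assumptions, not the slide's own content. The document length is fixed here rather than random, and `alpha`, `beta` and the helper names are hypothetical toy values.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(alpha, beta, n_words, rng):
    """Sample one document: theta once, then a topic and a word per position."""
    k = len(alpha)
    vocab_size = len(beta[0])
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    topics, words = [], []
    for _ in range(n_words):
        # z_n ~ Multinomial(theta)
        z = rng.choices(range(k), weights=theta)[0]
        # w_n ~ p(w | z_n, beta), i.e. row z of the topic-word matrix
        w = rng.choices(range(vocab_size), weights=beta[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words

rng = random.Random(0)
alpha = [0.5, 0.5]              # assumed Dirichlet parameter, k = 2
beta = [[0.7, 0.2, 0.1, 0.0],   # assumed topic-word probabilities, V = 4
        [0.0, 0.1, 0.2, 0.7]]
theta, topics, words = generate_document(alpha, beta, n_words=8, rng=rng)
```

Note that θ is sampled once per document while z is sampled once per word, which is what lets a single document mix several topics.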


Latent Dirichlet Allocation

• LDA doesn’t model documents d explicitly.

• LDA doesn’t associate topic mixture weights with each document as parameters; instead, it treats them as a k-parameter hidden random variable (θ) and places a Dirichlet distribution over it.

• A k-topic LDA needs k + kV parameters (k for α, kV for β), a number that doesn’t increase with M.
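The difference in parameter growth is easy to see with a little arithmetic. This sketch assumes the usual count of kV + kM parameters for pLSI (kV topic-word probabilities plus the kM mixture weights p(z|d)) and uses made-up values of k and V; the function names are hypothetical.

```python
def plsi_parameter_count(k, V, M):
    """k*V topic-word probabilities plus k*M document-specific mixture weights."""
    return k * V + k * M

def lda_parameter_count(k, V):
    """k Dirichlet parameters (alpha) plus k*V topic-word probabilities (beta)."""
    return k + k * V

k, V = 100, 20000  # assumed sizes, for illustration only
for M in (1000, 10000, 100000):
    # The pLSI count grows with the corpus size M; the LDA count stays fixed.
    print(M, plsi_parameter_count(k, V, M), lda_parameter_count(k, V))
```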


A Geometric Interpretation

[Figure: the topic simplex, with corners topic 1, topic 2 and topic 3, inside the word simplex.]

Mixture of unigrams: for each document, one of the k topics is chosen, and all the words of the document are drawn from the distribution corresponding to that point.


pLSI: the topic for each word is drawn from a document-specific distribution over topics, i.e. a point on the topic simplex. There is one such distribution for each document, so the training set defines an empirical distribution on the topic simplex.


LDA: each word of both observed and unseen documents is generated by a randomly chosen topic, which is drawn from a distribution with a randomly chosen parameter. This parameter is sampled once per document from a smooth distribution on the topic simplex.


Inference and Estimation

• Given the parameters α and β, the joint distribution of a topic mixture θ, a set of N topics z, and a set of N words w is given by:
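The equation itself is missing from this transcript. In the standard formulation of LDA, with p(θ | α) a Dirichlet density and each p(z_n | θ) simply the component of θ for topic z_n, the joint distribution reads:

```latex
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)
```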


• Summing over z and integrating over θ gives the marginal distribution of a document, p(w | α, β).
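Written out in the standard form (the slide's own equation is not preserved here), marginalising the joint distribution of θ, z and w over the latent variables gives:

```latex
p(\mathbf{w} \mid \alpha, \beta)
  = \int p(\theta \mid \alpha)
    \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta
```

The coupling between θ and β inside the product is what makes this integral intractable to compute exactly and motivates approximate inference.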


Variational Inference

• Basic idea: make use of Jensen’s inequality to obtain an adjustable lower bound on the log likelihood.

• Introduce a family of distributions on the latent variables:
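The family is not shown in this transcript. The usual choice for LDA is a fully factorised distribution, where γ is a free Dirichlet parameter and each φ_n is a free multinomial parameter over the k topics:

```latex
q(\theta, \mathbf{z} \mid \gamma, \phi) = q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n)
```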


Variational Inference

• Obtain a lower bound for log p(w| α, β):
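The bound is missing from this transcript. For any distribution q(θ, z) over the latent variables, applying Jensen's inequality to the log of the marginal likelihood gives the standard result:

```latex
\log p(\mathbf{w} \mid \alpha, \beta)
  = \log \int \sum_{\mathbf{z}} q(\theta, \mathbf{z})\,
      \frac{p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)}{q(\theta, \mathbf{z})}\, d\theta
  \;\ge\; \mathrm{E}_q\!\left[\log p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\right]
        - \mathrm{E}_q\!\left[\log q(\theta, \mathbf{z})\right]
```

The gap between the two sides is the KL divergence from q to the true posterior p(θ, z | w, α, β), so tightening the bound is the same as bringing q closer to the posterior.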


Variational EM Algorithm

• E-step: For each document, find the optimizing values of the variational parameters γ and φ by maximizing the lower bound on log p(w | α, β).

• M-step: Maximize the resulting lower bound on Σ log p(w | α, β), summed over documents, with respect to the model parameters α and β.
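As a rough illustration of the E-step, the sketch below runs the standard coordinate-ascent updates for one document: φ_ni ∝ β_{i,w_n} exp(Ψ(γ_i)) and γ_i = α_i + Σ_n φ_ni. It is a sketch under assumptions, not the talk's implementation: it uses a fixed number of iterations instead of a convergence test, a hand-written approximation of the digamma function Ψ, and made-up values for `alpha` and `beta`.

```python
import math

def digamma(x):
    """Digamma for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))

def e_step(words, alpha, beta, n_iter=50):
    """Coordinate ascent on (gamma, phi) for one document of word indices."""
    k, n = len(alpha), len(words)
    phi = [[1.0 / k] * k for _ in words]   # phi[n][i]: topic i for word n
    gamma = [a + n / k for a in alpha]     # Dirichlet parameter of q(theta)
    for _ in range(n_iter):
        weights = [math.exp(digamma(g)) for g in gamma]
        for pos, w in enumerate(words):
            row = [beta[i][w] * weights[i] for i in range(k)]
            total = sum(row)
            phi[pos] = [r / total for r in row]  # normalise over topics
        gamma = [alpha[i] + sum(p[i] for p in phi) for i in range(k)]
    return gamma, phi

alpha = [0.5, 0.5]                    # assumed values, k = 2
beta = [[0.7, 0.2, 0.05, 0.05],       # assumed topic-word probabilities, V = 4
        [0.05, 0.05, 0.2, 0.7]]
gamma, phi = e_step([0, 0, 1, 3], alpha, beta)
```

Because each row of φ sums to one, the entries of γ always sum to Σα plus the document length, which is a handy sanity check. The M-step, which re-estimates α and β from the γ and φ of all documents, is not shown.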


Document modeling

• Perplexity: how “perplexed” the model is by held-out data.

• The lower the better.
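Perplexity is conventionally defined as the exponential of the negative per-word log-likelihood on held-out documents, exp(−Σ_d log p(w_d) / Σ_d N_d). A minimal sketch follows, assuming the per-document log-likelihoods have already been computed (or bounded) by some model; the uniform model is only a sanity check, not a result from the talk.

```python
import math

def perplexity(log_likelihoods, doc_lengths):
    """exp(- sum_d log p(w_d) / sum_d N_d) over a held-out set."""
    return math.exp(-sum(log_likelihoods) / sum(doc_lengths))

# Sanity check with an assumed uniform model over V words:
# every word has probability 1/V, so the perplexity should equal V.
V = 1000
doc_lengths = [50, 120, 30]
log_likelihoods = [n * math.log(1.0 / V) for n in doc_lengths]
uniform_perplexity = perplexity(log_likelihoods, doc_lengths)
```

A model that assigns higher probability to the held-out words than the uniform baseline will score below V.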


Data sets

• C. Elegans Community abstracts
  – 5,225 abstracts
  – 28,414 unique terms

• TREC AP corpus (subset)
  – 16,333 newswire articles
  – 23,075 unique terms

• Held-out data – 10%

• Removed terms – 50 stop words, words appearing once (AP)

[Figure: results on the nematode (C. Elegans) corpus.]

[Figure: results on the AP corpus.]