Probabilistic Topic Models - Latent Dirichlet Allocation


Page 1: Probabilistic Topic Models - Latent Dirichlet Allocation … Latent Dirichlet allocation. Journal of Machine Learning Research 3:993-1022, 2003. ML Reading Group, Probabilistic Topic Models

Probabilistic Topic Models - Latent Dirichlet Allocation

Machine Learning Reading Group

March 2014

ML Reading Group Probabilistic Topic Models 1 / 12

Related Papers

D. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77-84, 2012.

D. Blei, A. Ng, M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.

Motivation

Searching and exploring documents based on the themes that run through them

Rather than finding documents through keyword search alone, we might first find the theme that we are interested in, and then examine the documents related to that theme.

Probabilistic topic modeling: a suite of algorithms that aim to discover and annotate large archives of documents with thematic information.

Note: these algorithms, sometimes under different names, are also used for other data types (audio, image, video...)

Topic Models (1)

Unigram model: the words of every document are drawn independently from a single multinomial distribution

$$p(\mathbf{w}) = \prod_{n=1}^{N} p(w_n) \tag{1}$$
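As an illustration of equation (1), the following minimal sketch fits a unigram model by maximum likelihood and scores a document in log space. The function names and the toy corpus are hypothetical and not part of the slides, and no smoothing is applied, so a word absent from the training data raises a `KeyError`.

```python
import math
from collections import Counter

def fit_unigram(docs):
    """Maximum-likelihood word probabilities from tokenised documents."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def unigram_log_prob(doc, probs):
    """log p(w) = sum_n log p(w_n), i.e. the log of equation (1)."""
    return sum(math.log(probs[word]) for word in doc)

# Hypothetical toy corpus, for illustration only.
docs = [["gene", "dna", "gene"], ["data", "model", "data", "dna"]]
probs = fit_unigram(docs)
log_p = unigram_log_prob(["gene", "data"], probs)
```

Working in log space avoids numerical underflow when the product runs over many words.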

Topic Models (2)

Mixture of unigrams: each document is generated by first choosing a topic z and then generating N words independently from the conditional multinomial p(w | z)

$$p(\mathbf{w}) = \sum_{z} p(z) \prod_{n=1}^{N} p(w_n \mid z) \tag{2}$$
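Equation (2) can be evaluated directly once p(z) and p(w | z) are known. The sketch below assumes NumPy and uses hypothetical toy parameters (`topic_prior`, `topic_word`) that do not come from the slides; it computes the document likelihood with a log-sum-exp over topics.

```python
import numpy as np

def mixture_log_prob(doc_word_ids, topic_prior, topic_word):
    """log p(w) = log sum_z p(z) prod_n p(w_n | z), the log of equation (2).

    topic_prior: shape (K,), p(z)
    topic_word:  shape (K, V), row k is p(w | z = k)
    """
    # log p(z) + sum_n log p(w_n | z), one entry per topic
    per_topic = np.log(topic_prior) + np.log(topic_word[:, doc_word_ids]).sum(axis=1)
    # log-sum-exp over topics, stabilised by subtracting the maximum
    m = per_topic.max()
    return m + np.log(np.exp(per_topic - m).sum())

# Hypothetical parameters: 2 topics, 3 vocabulary words.
topic_prior = np.array([0.5, 0.5])
topic_word = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.2, 0.7]])
log_p = mixture_log_prob([0, 0, 1], topic_prior, topic_word)
```

Note that the whole document shares a single topic z: the sum over topics sits outside the product over words.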

Topic Models (3)

Probabilistic latent semantic indexing (PLSI): it captures the possibility that a document d may contain multiple topics

$$p(d, w_n) = p(d) \sum_{z} p(w_n \mid z)\, p(z \mid d) \tag{3}$$
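To make equation (3) concrete, the sketch below computes the joint p(d, w) for every document-word pair as a matrix product. It assumes NumPy; all parameter values are hypothetical toy numbers, not taken from the slides.

```python
import numpy as np

# Hypothetical toy parameters: 2 documents, 2 topics, 3 vocabulary words.
p_d = np.array([0.6, 0.4])                   # p(d)
p_z_given_d = np.array([[0.9, 0.1],          # row d is p(z | d)
                        [0.2, 0.8]])
p_w_given_z = np.array([[0.7, 0.2, 0.1],     # row z is p(w | z)
                        [0.1, 0.2, 0.7]])

# p(w | d) = sum_z p(w | z) p(z | d): a per-document mixture over topics.
p_w_given_d = p_z_given_d @ p_w_given_z      # shape (D, V)

# p(d, w) = p(d) * p(w | d), matching equation (3).
p_dw = p_d[:, None] * p_w_given_d
```

Each document has its own row of topic weights p(z | d), which is how different words of the same document can come from different topics.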

Latent Dirichlet Allocation (1)

Latent Dirichlet Allocation (LDA): extends PLSI by placing Dirichlet priors on the per-document topic distributions

Generalises better to new documents than PLSI, whose topic mixtures p(z | d) are tied to the training documents

Latent Dirichlet Allocation (2)

Generative Process

1. Randomly choose a distribution over topics
2. For each word in the document
   a. Randomly choose a topic from the distribution over topics in step 1
   b. Randomly choose a word from the corresponding distribution over the vocabulary
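The generative process above can be sketched in a few lines. This is a minimal illustration assuming NumPy, a Dirichlet distribution over topic proportions (as the model's name suggests), and fixed topic-word distributions; the function name, `alpha`, and the toy `topic_word` matrix are hypothetical.

```python
import numpy as np

def generate_document(n_words, alpha, topic_word, rng):
    """Sample one document from the LDA generative process.

    alpha:      shape (K,), Dirichlet parameter for topic proportions
    topic_word: shape (K, V), row k is the word distribution of topic k
    """
    n_topics, vocab_size = topic_word.shape
    # Step 1: choose this document's distribution over topics.
    theta = rng.dirichlet(alpha)
    topics, words = [], []
    for _ in range(n_words):
        # Step 2a: choose a topic from theta.
        z = rng.choice(n_topics, p=theta)
        # Step 2b: choose a word from that topic's distribution.
        w = rng.choice(vocab_size, p=topic_word[z])
        topics.append(int(z))
        words.append(int(w))
    return theta, topics, words

rng = np.random.default_rng(0)
# Hypothetical topics over a 4-word vocabulary.
topic_word = np.array([[0.7, 0.2, 0.1, 0.0],
                       [0.0, 0.1, 0.2, 0.7]])
theta, topics, words = generate_document(10, np.array([0.5, 0.5]), topic_word, rng)
```

Unlike the mixture of unigrams, a topic is drawn afresh for every word, so a single document can mix several topics.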

Latent Dirichlet Allocation (3)

Graphical Model
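The graphical model corresponds to a factorisation of the joint distribution. As a sketch in the notation of Blei, Ng and Jordan (2003), where the symbols θ (per-document topic proportions), α (Dirichlet parameter) and β (topic-word probabilities) are assumptions not shown on this slide, the joint for one document of N words is:

```latex
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)
```

Integrating out θ and summing over each topic assignment gives the marginal likelihood of the document:

```latex
p(\mathbf{w} \mid \alpha, \beta)
  = \int p(\theta \mid \alpha)
    \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta
```

Compared with equation (2), the sum over topics now sits inside the product over words, so each word has its own topic.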

Latent Dirichlet Allocation (4)

Posterior Computation

Posterior is intractable - need to approximate it

Variational inference

MCMC - Gibbs sampling
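As an illustration of the Gibbs sampling option, here is a minimal sketch of a collapsed Gibbs sampler, one common variant for LDA. The slide names only Gibbs sampling, so the collapsed form, the symmetric priors `alpha` and `eta`, their default values, and the toy corpus are all assumptions made for this example.

```python
import numpy as np

def collapsed_gibbs(docs, n_topics, vocab_size, alpha=0.1, eta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric priors alpha and eta.

    docs: list of documents, each a list of integer word ids.
    Returns the document-topic and topic-word count matrices.
    """
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics))
    topic_word = np.zeros((n_topics, vocab_size))
    topic_total = np.zeros(n_topics)
    # Random initial topic assignment for every token.
    assign = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, z in zip(doc, assign[d]):
            doc_topic[d, z] += 1
            topic_word[z, w] += 1
            topic_total[z] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                z = assign[d][n]
                # Remove the current token from the counts.
                doc_topic[d, z] -= 1
                topic_word[z, w] -= 1
                topic_total[z] -= 1
                # p(z_n = k | all other assignments, words), up to a constant.
                weights = ((doc_topic[d] + alpha) * (topic_word[:, w] + eta)
                           / (topic_total + vocab_size * eta))
                z = rng.choice(n_topics, p=weights / weights.sum())
                # Add the token back under its new topic.
                assign[d][n] = z
                doc_topic[d, z] += 1
                topic_word[z, w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

# Hypothetical corpus: 3 documents over a 4-word vocabulary.
docs = [[0, 1, 0, 1], [2, 3, 3, 2], [0, 0, 1, 3]]
doc_topic, topic_word = collapsed_gibbs(docs, n_topics=2, vocab_size=4)
```

Normalising the rows of `doc_topic + alpha` and `topic_word + eta` gives point estimates of the topic proportions and topic-word distributions from the final sample.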

Latent Dirichlet Allocation (5)

Extensions

LDA can be extended by relaxing some of the original assumptions

Bag-of-words: Not suitable for language generation

Solution: Integrating syntax

Bag-of-documents: Not suitable for chronologically ordered documents

Solution: Dynamic topic models

Number of topics: Assumed to be known and fixed

Solution: Bayesian nonparametric topic models

Latent Dirichlet Allocation (6)

Open Issues

Evaluation and model checking: Interpretability over goodness of fit.

Visualization and user interfaces: Intuitive ways to visualise topics.

Data discovery: Seeking help of domain experts.
