Latent Dirichlet Allocation (Nicolas Loeff)

8/3/2019 Latent Dirichlet Allocation (Nicolas Loeff) http://slidepdf.com/reader/full/latent-dirichlet-allocation-nicolas-loeff 1/39 Latent Dirichlet Allocation D. Blei, A. Ng, M. Jordan Includes some slides adapted from J.Ramos at Rutgers, M. Steyvers and M. Rozen-Zvi at UCI, L. Fei Fei at UIUC.

Transcript of Latent Dirichlet Allocation (Nicolas Loeff)

Page 1: Latent Dirichlet Allocation (Nicolas Loeff)


Latent Dirichlet Allocation
D. Blei, A. Ng, M. Jordan

Includes some slides adapted from J. Ramos at Rutgers, M. Steyvers and M. Rosen-Zvi at UCI, L. Fei-Fei at UIUC.

Page 2: Latent Dirichlet Allocation (Nicolas Loeff)


Overview

• What is so special about text?
• Classification methods
• LSI
• Unigram / Mixture of Unigrams
• Probabilistic LSI (Aspect Model)
• LDA model
• Geometric interpretation

Page 3: Latent Dirichlet Allocation (Nicolas Loeff)


What is so special about text?

• No obvious relation between features
• High dimensionality (the vocabulary size, V, is often larger than the number of documents!)
• Importance of speed

Page 4: Latent Dirichlet Allocation (Nicolas Loeff)


The need for dimensionality reduction

Representation:
• Presenting documents as vectors in the word space: the ‘bag of words’ representation
• It is a sparse representation, V >> |D|

A need to define conceptual closeness

Page 5: Latent Dirichlet Allocation (Nicolas Loeff)


Bag of words

Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.

China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

Page 6: Latent Dirichlet Allocation (Nicolas Loeff)


Bag of words

• Order of words in a document can be ignored; only the counts matter.
• Probability theory: exchangeability (includes IID) (Aldous, 1985).
• Exchangeable RVs have a representation as a mixture distribution (de Finetti, 1990).
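A minimal sketch of the bag-of-words idea, assuming a simple lower-case, letters-only tokenizer (the helper name `bag_of_words` and the tokenizer are illustrative choices, not part of the slides). It shows that two texts with the same words in a different order map to the same counts.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Word order is discarded: both strings yield the same bag.
original = "the yuan is only one factor"
shuffled = "factor one only is yuan the"
same_bag = bag_of_words(original) == bag_of_words(shuffled)

# Repeated words are kept as counts.
bank_count = bag_of_words("bank bank loan")["bank"]
```

Here `same_bag` is true because only the counts survive, which is the exchangeability assumption the slide refers to.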

Page 7: Latent Dirichlet Allocation (Nicolas Loeff)


What does this have to do with Vision?

Object
Bag of ‘words’

Page 8: Latent Dirichlet Allocation (Nicolas Loeff)


TF-IDF Weighting Scheme (Salton and McGill, 1983)

• Given corpus $D$, word $w$, document $d$, calculate $w_d = f_{w,d} \cdot \log(|D| / f_{w,D})$
• Many varieties of basic scheme.
• Search procedure: scan each $d$, compute each $w_{i,d}$, return the set $D'$ that maximizes $\sum_i w_{i,d}$
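The weighting and search procedure can be sketched as follows. This is an illustration under stated assumptions: $f_{w,d}$ is taken to be the count of $w$ in $d$, $f_{w,D}$ is taken to be the number of documents in $D$ that contain $w$, and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(word, doc_tokens, corpus):
    """w_d = f_{w,d} * log(|D| / f_{w,D}); returns 0.0 if no document has the word."""
    f_wd = Counter(doc_tokens)[word]                       # count of word in this document
    f_wD = sum(1 for tokens in corpus if word in tokens)   # documents containing the word (assumption)
    if f_wD == 0:
        return 0.0
    return f_wd * math.log(len(corpus) / f_wD)


def search(query, corpus, top_k=1):
    """Rank documents by the summed TF-IDF weight of the query words."""
    scores = [sum(tf_idf(w, doc, corpus) for w in query) for doc in corpus]
    order = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


corpus = [
    "money bank loan bank".split(),
    "river stream bank".split(),
    "river river stream".split(),
]
best = search(["money", "loan"], corpus)
```

Only the first document contains either query word, so it ranks first. A word that occurs in every document gets weight zero, because $\log(|D|/|D|) = 0$.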

Page 9: Latent Dirichlet Allocation (Nicolas Loeff)


A Spatial Representation: Latent Semantic Analysis (Deerwester, 1990)

Document/Term count matrix:

            Doc1   Doc2   Doc3   …
LOVE          34      0      3   …
SOUL          12      0      2   …
RESEARCH       0     19      6   …
SCIENCE        0     16      1   …
…

SVD maps the matrix to a high dimensional space, not as high as |V|.

[Figure: SOUL, RESEARCH, LOVE, and SCIENCE plotted as points in the semantic space]

• Each word is a single point in semantic space (dimensionality reduction)
• Similarity measured by cosine of angle between word vectors
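The cosine measure can be illustrated with a short sketch. Two assumptions apply: the counts are read off the slide's document/term matrix as best they can be recovered from the transcript, and the cosine is computed on the raw count rows. LSA itself would first project the rows with a truncated SVD, which is omitted here to keep the example dependency-free.

```python
import math

# Rows of the document/term count matrix (Doc1, Doc2, Doc3); values are an
# assumption recovered from the garbled transcript.
counts = {
    "LOVE": [34, 0, 3],
    "SOUL": [12, 0, 2],
    "RESEARCH": [0, 19, 6],
    "SCIENCE": [0, 16, 1],
}


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


love_soul = cosine(counts["LOVE"], counts["SOUL"])
love_science = cosine(counts["LOVE"], counts["SCIENCE"])
```

With these counts, LOVE and SOUL point in nearly the same direction, while LOVE and SCIENCE are close to orthogonal.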

Page 10: Latent Dirichlet Allocation (Nicolas Loeff)


Feature Vector representation

From: Modeling the Internet and the Web: Probabilistic Methods and Algorithms, Pierre Baldi, Paolo Frasconi, Padhraic Smyth

Page 11: Latent Dirichlet Allocation (Nicolas Loeff)


Classification: assigning words to topics

Different models for data:
• Discrete classifier, modeling the boundaries between different classes of the data: prediction of categorical output, e.g., SVM
• Density estimator, modeling the distribution of the data points themselves: generative models, e.g., NB

Page 12: Latent Dirichlet Allocation (Nicolas Loeff)


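Generative Models – Latent semantic structure

Latent structure ($\ell$)
Words ($\mathbf{w}$)

Distribution over words:
$P(\mathbf{w}) = \sum_{\ell} P(\mathbf{w}, \ell)$

Inferring latent structure:
$P(\ell \mid \mathbf{w}) = \dfrac{P(\mathbf{w} \mid \ell)\, P(\ell)}{P(\mathbf{w})}$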

Page 13: Latent Dirichlet Allocation (Nicolas Loeff)


Topic Models

Unsupervised learning of topics (“gist”) of documents:
• articles/chapters
• conversations
• emails
• ... any verbal context

Topics are useful latent structures to explain semantic association

Page 14: Latent Dirichlet Allocation (Nicolas Loeff)


Probabilistic Generative Model

Each document is a probability distribution over topics

Each topic is a probability distribution over words

Page 15: Latent Dirichlet Allocation (Nicolas Loeff)


Generative Process

Mixture components:
• TOPIC 1: money, loan, bank
• TOPIC 2: river, stream, bank

Mixture weights (arrow labels in the figure): .8, .2, .3, .7

The superscript on each word is the topic it was drawn from.

DOCUMENT 1: money¹ bank¹ bank¹ loan¹ river² stream² bank¹ money¹ river² bank¹ money¹ bank¹ loan¹ money¹ stream² bank¹ money¹ bank¹ bank¹ loan¹ river² stream² bank¹ money¹ river² bank¹ money¹ bank¹ loan¹ bank¹ money¹ stream²

DOCUMENT 2: river² stream² bank² stream² bank² money¹ loan¹ river² stream² loan¹ bank² river² bank² bank¹ stream² river² loan¹ bank² stream² bank² money¹ loan¹ river² stream² bank² stream² bank² money¹ river² stream² loan¹ bank² river² bank² money¹ bank¹ stream² river² bank² stream² bank² money¹

Bayesian approach: use priors
• Mixture weights ~ Dirichlet(α)
• Mixture components ~ Dirichlet(β)
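The generative process on this slide can be sketched in a few lines. Everything specific here is an assumption made for illustration: the uniform word probabilities inside each topic, the document length, the Dirichlet parameters, and the helper names. The Dirichlet draw uses the standard construction of normalising independent Gamma variates.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def generate_document(theta, topics, vocab, length, rng):
    """For each position: pick a topic from theta, then a word from that topic."""
    doc = []
    for _ in range(length):
        z = rng.choices(range(len(topics)), weights=theta)[0]
        w = rng.choices(vocab, weights=topics[z])[0]
        doc.append((w, z + 1))  # store the word with its 1-based topic label
    return doc


vocab = ["money", "loan", "bank", "river", "stream"]
topics = [
    [1 / 3, 1 / 3, 1 / 3, 0.0, 0.0],  # TOPIC 1: money, loan, bank (uniform is an assumption)
    [0.0, 0.0, 1 / 3, 1 / 3, 1 / 3],  # TOPIC 2: bank, river, stream (uniform is an assumption)
]
rng = random.Random(0)

# A document with fixed mixture weights, e.g. .8 for TOPIC 1 and .2 for TOPIC 2.
doc_fixed = generate_document([0.8, 0.2], topics, vocab, 16, rng)

# Bayesian variant: draw the mixture weights from a Dirichlet prior first.
theta = sample_dirichlet([1.0, 1.0], rng)
doc_prior = generate_document(theta, topics, vocab, 16, rng)
```

Each generated token carries its topic label, mirroring the superscripts in the slide's example documents. Note that "bank" can be emitted by either topic.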

Page 16: Latent Dirichlet Allocation (Nicolas Loeff)


Vision: Topic = Object categories

Page 17: Latent Dirichlet Allocation (Nicolas Loeff)


Simple Model: Unigram

Words of a document are drawn IID from a single multinomial distribution: $p(\mathbf{w}) = \prod_{n=1}^{N} p(w_n)$

Page 18: Latent Dirichlet Allocation (Nicolas Loeff)


Unigram Mixture Model

First choose topic $z$, then generate words conditionally independent given topic: $p(\mathbf{w}) = \sum_{z} p(z) \prod_{n=1}^{N} p(w_n \mid z)$
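A small sketch of how the mixture-of-unigrams likelihood, $p(\mathbf{w}) = \sum_z p(z) \prod_n p(w_n \mid z)$, might be evaluated. The two topic distributions and the function names are hypothetical values chosen for illustration. The sum over topics is done in log space to avoid underflow on long documents.

```python
import math


def log_unigram(words, word_probs):
    """log p(w) = sum_n log p(w_n) under a single multinomial."""
    return sum(math.log(word_probs[w]) for w in words)


def log_mixture_of_unigrams(words, topic_priors, topic_word_probs):
    """log p(w) = log sum_z p(z) prod_n p(w_n | z), computed with log-sum-exp."""
    terms = [
        math.log(p_z) + log_unigram(words, probs)
        for p_z, probs in zip(topic_priors, topic_word_probs)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Hypothetical topic-word distributions (every word has non-zero probability).
finance = {"money": 0.4, "loan": 0.3, "bank": 0.2, "river": 0.05, "stream": 0.05}
nature = {"money": 0.05, "loan": 0.05, "bank": 0.2, "river": 0.35, "stream": 0.35}

doc = ["money", "bank", "loan"]
log_p = log_mixture_of_unigrams(doc, [0.5, 0.5], [finance, nature])
```

With a single topic of prior probability 1 the mixture reduces to the plain unigram model, which is one way to check the implementation.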


Page 21: Latent Dirichlet Allocation (Nicolas Loeff)


Probabilistic Latent Semantic Indexing (Hofmann, 1999)

• Document $d$ in training set and word $w_n$ are conditionally independent given topic: $p(d, w_n) = p(d) \sum_{z} p(w_n \mid z)\, p(z \mid d)$
• Not truly generative (dummy r.v. $d$).
• Number of parameters grows with size of corpus (overfitting).
• Document may contain several topics.
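The parameter-growth point can be made concrete with a rough count. This sketch uses the counts given in the Blei, Ng, and Jordan paper ($kV + kM$ for pLSI, $k + kV$ for LDA) and ignores the sum-to-one constraints. The function names and example sizes are illustrative.

```python
def plsi_param_count(num_topics, vocab_size, num_docs):
    """k*V word-given-topic parameters plus k*M topic-given-document parameters."""
    return num_topics * vocab_size + num_topics * num_docs


def lda_param_count(num_topics, vocab_size):
    """k Dirichlet parameters (alpha) plus the k*V topic-word matrix (beta)."""
    return num_topics + num_topics * vocab_size


# Doubling the corpus size adds parameters to pLSI but not to LDA.
small = plsi_param_count(10, 1000, 100)
large = plsi_param_count(10, 1000, 200)
lda = lda_param_count(10, 1000)
```

The pLSI count grows linearly with the number of training documents, which is the overfitting concern raised on the slide.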

Page 22: Latent Dirichlet Allocation (Nicolas Loeff)


Vision app.: Sivic et al., 2005

[Figure: pLSI graphical model with nodes d, z, w and plates N and D; example topic: “face”]

Page 23: Latent Dirichlet Allocation (Nicolas Loeff)


LDA
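The equations on the LDA slides did not survive in this transcript. As a reference point, the standard formulation from the Blei, Ng, and Jordan paper is sketched below; it may differ in notation from what the slides showed. Here $\theta$ is the per-document topic mixture, $z_n$ the topic of word $n$, and $\alpha$, $\beta$ the corpus-level parameters.

```latex
% Generative process for a document w = (w_1, ..., w_N):
%   \theta \sim \mathrm{Dirichlet}(\alpha)
%   z_n \mid \theta \sim \mathrm{Multinomial}(\theta)
%   w_n \mid z_n, \beta \sim \mathrm{Multinomial}(\beta_{z_n})
\begin{align*}
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  &= p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \\
p(\mathbf{w} \mid \alpha, \beta)
  &= \int p(\theta \mid \alpha)
     \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta
\end{align*}
```

Unlike the mixture of unigrams, the topic is drawn anew for every word, so one document can mix several topics.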


Page 28: Latent Dirichlet Allocation (Nicolas Loeff)


Vision app.: Fei-Fei Li, 2005

[Figure: graphical model with nodes c, π, z, w and plates N and D; example category: “beach”]

Page 29: Latent Dirichlet Allocation (Nicolas Loeff)


Example: Word density distribution

Page 30: Latent Dirichlet Allocation (Nicolas Loeff)


A geometric interpretation

Page 31: Latent Dirichlet Allocation (Nicolas Loeff)


LDA

• Topics sampled repeatedly in each document (like pLSI).
• But the number of parameters does not grow with the size of the corpus.
• Problem: inference.

Page 32: Latent Dirichlet Allocation (Nicolas Loeff)


LDA - Inference

Coupling between Dirichlet distributions makes inference intractable.

Blei, 2001: Variational Approximation
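For orientation, the mean-field family and coordinate updates described in the Blei, Ng, and Jordan paper are reproduced below as a sketch; the slide itself does not show them. $\gamma$ and $\phi$ are the free variational parameters, $\Psi$ is the digamma function, and $k$ is the number of topics.

```latex
\begin{align*}
q(\theta, \mathbf{z} \mid \gamma, \phi)
  &= q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n) \\
\phi_{ni}
  &\propto \beta_{i w_n} \exp\!\left( \Psi(\gamma_i) - \Psi\!\Big( \sum_{j=1}^{k} \gamma_j \Big) \right) \\
\gamma_i
  &= \alpha_i + \sum_{n=1}^{N} \phi_{ni}
\end{align*}
```

The factorised $q$ drops the coupling that makes the exact posterior intractable; the two updates are iterated per document until they converge.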

Page 33: Latent Dirichlet Allocation (Nicolas Loeff)


LDA - Inference

Other procedures:
• Markov chain Monte Carlo (Griffiths et al., 2002)
• Expectation Propagation (Minka et al., 2002)
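To make the Markov chain Monte Carlo option concrete, here is a minimal sketch of a collapsed Gibbs sampler for LDA in the style associated with Griffiths and Steyvers. It assumes symmetric hyperparameters `alpha` and `beta`, integer word ids, and a fixed number of sweeps. The function name, defaults, and toy corpus are illustrative and not taken from the slides.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, sweeps=50, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric priors.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic_counts, topic_word_counts, assignments).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # times word w is assigned to topic k
    topic_total = [0] * num_topics                               # tokens assigned to topic k
    assign = []

    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional for each topic given all other assignments.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, assign


vocab = ["money", "loan", "bank", "river", "stream"]
docs = [
    [0, 1, 2, 0, 2, 1],  # money loan bank money bank loan
    [3, 4, 2, 3, 4, 2],  # river stream bank river stream bank
    [0, 2, 1, 3, 4, 2],  # a mixed document
]
doc_topic, topic_word, assign = gibbs_lda(docs, num_topics=2, vocab_size=len(vocab))
```

The sampler integrates out the mixture weights and the topic-word distributions, so its state consists only of the per-token topic assignments and the count tables. Estimates of both distributions can be read off the smoothed counts afterwards.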

Page 34: Latent Dirichlet Allocation (Nicolas Loeff)


Experiments

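Perplexity: inverse of the geometric mean per-word likelihood (a monotonically decreasing function of the likelihood):

$\mathrm{perplexity}(D_{\mathrm{test}}) = \exp\left\{ -\dfrac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right\}$

Idea: lower perplexity implies better generalization.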
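A short sketch of the perplexity computation, assuming the per-document log-likelihoods $\log p(\mathbf{w}_d)$ have already been obtained from some model. The function name and the uniform-model sanity check are illustrative.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """exp(-sum_d log p(w_d) / sum_d N_d) over a held-out set of documents."""
    return math.exp(-sum(doc_log_likelihoods) / sum(doc_lengths))


# Sanity check: a model that is uniform over a 5-word vocabulary assigns
# each word probability 1/5, so its perplexity is 5 for any test set.
lengths = [3, 2]
log_liks = [n * math.log(1 / 5) for n in lengths]
uniform_perplexity = perplexity(log_liks, lengths)
```

A useful reading of the number: a perplexity of 5 means the model is, on average, as uncertain as if it were choosing uniformly among 5 words.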

Page 35: Latent Dirichlet Allocation (Nicolas Loeff)


Experiments – Nematode corpus

Page 36: Latent Dirichlet Allocation (Nicolas Loeff)


Experiments – AP corpus


Page 38: Latent Dirichlet Allocation (Nicolas Loeff)


Polysemy

Six topics (20 words each). Polysemous words appear in more than one topic, e.g. PLAY, COURT, CHARACTERS, TEST, EVIDENCE.

PRINTING, PAPER, PRINT, PRINTED, TYPE, PROCESS, INK, PRESS, IMAGE, PRINTER, PRINTS, PRINTERS, COPY, COPIES, FORM, OFFSET, GRAPHIC, SURFACE, PRODUCED, CHARACTERS

PLAY, PLAYS, STAGE, AUDIENCE, THEATER, ACTORS, DRAMA, SHAKESPEARE, ACTOR, THEATRE, PLAYWRIGHT, PERFORMANCE, DRAMATIC, COSTUMES, COMEDY, TRAGEDY, CHARACTERS, SCENES, OPERA, PERFORMED

TEAM, GAME, BASKETBALL, PLAYERS, PLAYER, PLAY, PLAYING, SOCCER, PLAYED, BALL, TEAMS, BASKET, FOOTBALL, SCORE, COURT, GAMES, TRY, COACH, GYM, SHOT

JUDGE, TRIAL, COURT, CASE, JURY, ACCUSED, GUILTY, DEFENDANT, JUSTICE, EVIDENCE, WITNESSES, CRIME, LAWYER, WITNESS, ATTORNEY, HEARING, INNOCENT, DEFENSE, CHARGE, CRIMINAL

HYPOTHESIS, EXPERIMENT, SCIENTIFIC, OBSERVATIONS, SCIENTISTS, EXPERIMENTS, SCIENTIST, EXPERIMENTAL, TEST, METHOD, HYPOTHESES, TESTED, EVIDENCE, BASED, OBSERVATION, SCIENCE, FACTS, DATA, RESULTS, EXPLANATION

STUDY, TEST, STUDYING, HOMEWORK, NEED, CLASS, MATH, TRY, TEACHER, WRITE, PLAN, ARITHMETIC, ASSIGNMENT, PLACE, STUDIED, CAREFULLY, DECIDE, IMPORTANT, NOTEBOOK, REVIEW

Page 39: Latent Dirichlet Allocation (Nicolas Loeff)


Choosing number of topics

• Subjective interpretability
• Bayesian model selection
  • Griffiths & Steyvers (2004)
• Generalization test
• Non-parametric Bayesian statistics
  • Infinite models; models that grow with size of data
  • Teh, Jordan, Beal, & Blei (2004)
  • Blei, Griffiths, Jordan, Tenenbaum (2004)