Latent Dirichlet Allocation (Nicolas Loeff)
8/3/2019 Latent Dirichlet Allocation (Nicolas Loeff)
http://slidepdf.com/reader/full/latent-dirichlet-allocation-nicolas-loeff 1/39
Latent Dirichlet Allocation
D. Blei, A. Ng, M. Jordan
Includes some slides adapted from J. Ramos at Rutgers, M. Steyvers and M. Rosen-Zvi at UCI, and L. Fei-Fei at UIUC.
Overview
• What is so special about text?
• Classification methods
• LSI
• Unigram / Mixture of Unigrams
• Probabilistic LSI (Aspect Model)
• LDA model
• Geometric interpretation
What is so special about text?
• No obvious relation between features
• High dimensionality (often a larger vocabulary, V, than the number of documents!)
• Importance of speed
The need for dimensionality reduction
Representation:
• Documents are presented as vectors in word space: the 'bag of words' representation.
• It is a sparse representation, V >> |D|.
A need to define conceptual closeness.
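A minimal Python sketch of the representation, assuming whitespace tokenization: each document becomes a sparse map from word to count, and cosine similarity over these vectors is the obvious closeness measure. The toy documents are hypothetical; two of them are about the same subject but share no words, so their cosine is 0, which is why a notion of conceptual closeness beyond raw word overlap is needed.

```python
from collections import Counter
import math

def bag_of_words(text):
    """Sparse bag-of-words vector: only words that actually occur are stored."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items())  # Counter returns 0 for missing words
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v)

# Hypothetical documents: a and b share a subject but not a single word.
doc_a = bag_of_words("car engine repair")
doc_b = bag_of_words("automobile motor maintenance")
doc_c = bag_of_words("engine repair manual")

print(cosine(doc_a, doc_b))  # 0.0: no shared words, so no measured closeness
print(cosine(doc_a, doc_c))
```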
Bag of words
Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with an 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
Bag of words
• The order of words in a document can be ignored; only the counts matter.
• Probability theory: exchangeability (includes IID) (Aldous, 1985).
• Exchangeable random variables have a representation as a mixture distribution (de Finetti, 1990).
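A small Python check of this property, using a hypothetical word sequence and hypothetical unigram probabilities: shuffling the words leaves both the counts and the IID likelihood unchanged, which is what exchangeability means for a bag of words.

```python
import math
import random
from collections import Counter

words = "the brain the eye the retina brain".split()  # hypothetical word sequence
shuffled = words[:]
random.Random(0).shuffle(shuffled)

# Hypothetical unigram probabilities; each word is drawn IID from this distribution.
probs = {"the": 0.4, "brain": 0.3, "eye": 0.2, "retina": 0.1}

def log_likelihood(seq):
    """log p(w_1, ..., w_N) for an IID sequence: a sum over words, so order is irrelevant."""
    return sum(math.log(probs[w]) for w in seq)

# Same counts and same likelihood, whatever the order of the words.
assert Counter(words) == Counter(shuffled)
assert math.isclose(log_likelihood(words), log_likelihood(shuffled))
print(Counter(words).most_common(2))  # [('the', 3), ('brain', 2)]
```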
What does this have to do with vision?
[Figure: Object → Bag of 'words']
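TF-IDF Weighting Scheme (Salton and McGill, 1983)
Given a corpus D, a word w and a document d, calculate $w_d = f_{w,d} \cdot \log\left(|D| / f_{w,D}\right)$, where $f_{w,d}$ is the frequency of w in d and $f_{w,D}$ is the number of documents in D that contain w.
Many variants of the basic scheme exist.
Search procedure: scan each d, compute $w_{i,d}$ for each query word i, and return the set D' that maximizes $\sum_i w_{i,d}$.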
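A minimal Python sketch of the weighting and the search procedure, assuming whitespace tokenization and reading $f_{w,D}$ as the number of documents containing w (the usual document-frequency reading). The corpus and query are hypothetical, and `search` returns the top-ranked document indices rather than a thresholded set.

```python
import math
from collections import Counter

def tfidf(corpus):
    """One {word: weight} dict per document, weight = f_{w,d} * log(|D| / f_{w,D})."""
    counts = [Counter(doc.split()) for doc in corpus]
    doc_freq = Counter(word for c in counts for word in c)  # each word counted once per document
    n_docs = len(corpus)
    return [{w: f * math.log(n_docs / doc_freq[w]) for w, f in c.items()} for c in counts]

def search(query, corpus, top=1):
    """Score each document by the sum of the TF-IDF weights of the query words."""
    weights = tfidf(corpus)
    scores = [sum(wd.get(w, 0.0) for w in query.split()) for wd in weights]
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return ranked[:top]

corpus = [  # hypothetical toy corpus
    "china trade surplus yuan exports",
    "visual cortex retina brain",
    "trade exports imports bank",
]
print(search("yuan exports", corpus))  # [0]
```

A word that occurs in every document gets weight $\log 1 = 0$, so very common words contribute nothing to the score.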
A Spatial Representation: Latent Semantic Analysis (Deerwester, 1990)
Document/Term count matrix:
             Doc1   Doc2   Doc3   …
LOVE           34      0      3   …
SOUL           12      0      2   …
RESEARCH        0     19      6   …
SCIENCE         0     16      1   …
…
SVD → high-dimensional space, but not as high as |V|
[Figure: the words SOUL, RESEARCH, LOVE and SCIENCE plotted as points in the reduced semantic space]
• Each word is a single point in semantic space (dimensionality reduction)
• Similarity is measured by the cosine of the angle between word vectors
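A minimal NumPy sketch of LSA on the count matrix from this slide, reading the flattened table as LOVE = (34, 0, 3), SOUL = (12, 0, 2), RESEARCH = (0, 19, 6), SCIENCE = (0, 16, 1) over Doc1..Doc3. A truncated SVD maps each word to a point in a k-dimensional space (k = 2 is an arbitrary illustrative choice), and similarity is the cosine between word vectors.

```python
import numpy as np

terms = ["LOVE", "SOUL", "RESEARCH", "SCIENCE"]
# Term-by-document counts (rows: terms, columns: Doc1..Doc3).
X = np.array([
    [34, 0, 3],
    [12, 0, 2],
    [0, 19, 6],
    [0, 16, 1],
], dtype=float)

k = 2  # number of latent dimensions kept; an illustrative choice
U, s, Vt = np.linalg.svd(X, full_matrices=False)
word_vecs = U[:, :k] * s[:k]  # each word becomes a point in the k-dim semantic space

def word_sim(a, b):
    """Cosine of the angle between two word vectors in the reduced space."""
    u, v = word_vecs[terms.index(a)], word_vecs[terms.index(b)]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(round(word_sim("LOVE", "SOUL"), 3), round(word_sim("LOVE", "SCIENCE"), 3))
```

LOVE and SOUL end up nearly parallel, while LOVE and SCIENCE are nearly orthogonal.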
Feature Vector representation
From: Modeling the Internet and the Web: Probabilistic Methods and Algorithms, Pierre Baldi, Paolo Frasconi, Padhraic Smyth
Classification: assigning words to topics
Different models for data:
• Discrete classifier: models the boundaries between different classes of the data. Predicts a categorical output, e.g. SVM.
• Density estimator: models the distribution of the data points themselves. Generative models, e.g. NB.
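As an illustration of the generative side, here is a minimal multinomial naive Bayes sketch in Python: it models p(class) and p(word | class) and classifies by the highest joint probability. Add-one (Laplace) smoothing, whitespace tokenization, and the toy documents and labels are all assumptions made for the example.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes: estimate p(class) and p(word | class)."""
    vocab = {w for d in docs for w in d.split()}
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    word_probs = {}
    for c in classes:
        counts = Counter(w for d, y in zip(docs, labels) if y == c for w in d.split())
        total = sum(counts.values()) + alpha * len(vocab)
        # Add-alpha smoothing keeps words unseen in a class from getting probability 0.
        word_probs[c] = {w: (counts[w] + alpha) / total for w in vocab}
    return prior, word_probs

def classify(doc, prior, word_probs):
    """Pick the class maximizing log p(class) + sum log p(word | class); unknown words are skipped."""
    def score(c):
        return math.log(prior[c]) + sum(
            math.log(word_probs[c][w]) for w in doc.split() if w in word_probs[c])
    return max(prior, key=score)

docs = ["money bank loan", "bank loan money money", "river stream bank", "stream river"]
labels = ["finance", "finance", "nature", "nature"]  # hypothetical labels
prior, word_probs = train_nb(docs, labels)
print(classify("loan bank", prior, word_probs))  # finance
```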
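Generative Models – Latent semantic structure
[Figure: latent structure ℓ generating the words w]
Distribution over words: $P(\mathbf{w}) = \sum_{\ell} P(\mathbf{w}, \ell)$
Inferring latent structure: $P(\ell \mid \mathbf{w}) = \dfrac{P(\mathbf{w} \mid \ell)\, P(\ell)}{P(\mathbf{w})}$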
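A minimal NumPy sketch of inferring latent structure with Bayes' rule for a single observed word. The two latent structures, their prior and the word probabilities are hypothetical numbers chosen only for illustration.

```python
import numpy as np

vocab = ["money", "loan", "bank", "river", "stream"]
prior = np.array([0.5, 0.5])            # hypothetical P(l) for two latent structures
likelihood = np.array([                 # hypothetical P(w | l), one row per structure
    [0.4, 0.3, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.4, 0.3],
])

def posterior(word):
    """P(l | w) = P(w | l) P(l) / P(w), where P(w) = sum_l P(w, l)."""
    joint = likelihood[:, vocab.index(word)] * prior  # P(w, l) for each l
    return joint / joint.sum()                        # dividing by P(w)

print(posterior("bank"))   # [0.5 0.5]: "bank" is ambiguous between the two
print(posterior("river"))  # [0. 1.]
```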
Topic Models
Unsupervised learning of the topics ("gist") of documents:
• articles/chapters
• conversations
• emails
• ... any verbal context
Topics are useful latent structures for explaining semantic association.
Probabilistic Generative Model
Each document is a probability distribution over topics
Each topic is a probability distribution over words
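Combining the two statements, a document's distribution over words is the mixture $P(w \mid d) = \sum_z P(w \mid z)\, P(z \mid d)$, which is just a matrix product. A minimal NumPy sketch; both the topic-word matrix `phi` and the document-topic matrix `theta` are hypothetical.

```python
import numpy as np

vocab = ["money", "loan", "bank", "river", "stream"]
# Hypothetical topics: each row is a distribution over the vocabulary, P(w | z).
phi = np.array([
    [0.4, 0.3, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.4, 0.3],
])
# Hypothetical documents: each row is a distribution over the topics, P(z | d).
theta = np.array([
    [0.8, 0.2],
    [0.3, 0.7],
])

# P(w | d) = sum_z P(w | z) P(z | d) for every document at once.
p_word_given_doc = theta @ phi
print(p_word_given_doc.round(2))
```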
Generative Process
[Figure: two mixture components (topics), TOPIC 1 (money, loan, bank) and TOPIC 2 (river, stream, bank), combined with per-document mixture weights (.8/.2 and .3/.7) to generate two documents; each word below is tagged with the topic that generated it.]
DOCUMENT 1: money 1 bank 1 bank 1 loan 1 river 2 stream 2 bank 1 money 1 river 2 bank 1 money 1 bank 1 loan 1 money 1 stream 2 bank 1 money 1 bank 1 bank 1 loan 1 river 2 stream 2 bank 1 money 1 river 2 bank 1 money 1 bank 1 loan 1 bank 1 money 1 stream 2
DOCUMENT 2: river 2 stream 2 bank 2 stream 2 bank 2 money 1 loan 1 river 2 stream 2 loan 1 bank 2 river 2 bank 2 bank 1 stream 2 river 2 loan 1 bank 2 stream 2 bank 2 money 1 loan 1 river 2 stream 2 bank 2 stream 2 bank 2 money 1 river 2 stream 2 loan 1 bank 2 river 2 bank 2 money 1 bank 1 stream 2 river 2 bank 2 stream 2 bank 2 money 1
Bayesian approach: use priors
• Mixture weights ~ Dirichlet(α)
• Mixture components ~ Dirichlet(β)
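A minimal NumPy sketch of this generative process with Dirichlet priors: each topic's word distribution is drawn from Dirichlet(β), each document's mixture weights from Dirichlet(α), then every word gets a topic and is drawn from that topic. The vocabulary, symmetric hyperparameter values, number of documents and fixed document length are all hypothetical (the LDA paper additionally draws the length from a Poisson, which is omitted here).

```python
import numpy as np

def generate_corpus(n_docs, doc_len, vocab, n_topics, alpha, beta, seed=0):
    """Sample documents as (word, topic) pairs; alpha, beta are symmetric Dirichlet parameters."""
    rng = np.random.default_rng(seed)
    # Mixture components: one distribution over words per topic, ~ Dirichlet(beta).
    phi = rng.dirichlet(np.full(len(vocab), beta), size=n_topics)
    docs = []
    for _ in range(n_docs):
        # Mixture weights for this document, ~ Dirichlet(alpha).
        theta = rng.dirichlet(np.full(n_topics, alpha))
        words = []
        for _ in range(doc_len):
            z = rng.choice(n_topics, p=theta)     # pick a topic for this word
            w = rng.choice(len(vocab), p=phi[z])  # pick a word from that topic
            words.append((vocab[w], int(z) + 1))  # keep a 1-based topic tag, as in the figure
        docs.append(words)
    return phi, docs

vocab = ["money", "loan", "bank", "river", "stream"]
phi, docs = generate_corpus(n_docs=2, doc_len=10, vocab=vocab,
                            n_topics=2, alpha=0.5, beta=0.5)
print(" ".join(f"{w} {z}" for w, z in docs[0]))
```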
Vision: Topic = Object categories
Simple Model: Unigram
Words of a document are drawn IID from a single multinomial distribution: $p(\mathbf{w}) = \prod_{n=1}^{N} p(w_n)$
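A minimal Python sketch of the unigram model, $p(\mathbf{w}) = \prod_{n} p(w_n)$: the maximum-likelihood multinomial is just the relative word frequencies, and a document's log-probability is a sum over its words. The corpus is hypothetical, and there is no smoothing, so an unseen word would raise a `KeyError`.

```python
import math
from collections import Counter

def fit_unigram(corpus):
    """Maximum-likelihood unigram model: relative word frequencies over the corpus."""
    counts = Counter(w for doc in corpus for w in doc.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def log_prob(doc, model):
    """log p(w) = sum_n log p(w_n): every word is drawn IID from one multinomial."""
    return sum(math.log(model[w]) for w in doc.split())  # no smoothing: unseen words raise KeyError

corpus = ["money bank loan bank", "river bank stream"]  # hypothetical
model = fit_unigram(corpus)
print(model["bank"])                 # 3/7 ≈ 0.4286
print(log_prob("bank bank", model))  # 2 * log(3/7)
```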
Unigram Mixture Model
First choose a topic z, then generate the words conditionally independent given the topic.
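In the mixture of unigrams, a single topic is drawn per document, so $p(\mathbf{w}) = \sum_z p(z) \prod_{n=1}^{N} p(w_n \mid z)$. A minimal NumPy sketch with hypothetical topic prior and topic-word probabilities; it works in log space and uses log-sum-exp so that long documents do not underflow.

```python
import numpy as np

vocab = {"money": 0, "loan": 1, "bank": 2, "river": 3, "stream": 4}
p_z = np.array([0.5, 0.5])                # hypothetical topic prior p(z)
p_w_given_z = np.array([                  # hypothetical p(w | z), one row per topic
    [0.4, 0.3, 0.2, 0.05, 0.05],
    [0.05, 0.05, 0.2, 0.4, 0.3],
])

def log_joint(doc):
    """log p(z) + sum_n log p(w_n | z), one entry per topic."""
    ids = [vocab[w] for w in doc.split()]
    return np.log(p_z) + np.log(p_w_given_z[:, ids]).sum(axis=1)

def doc_log_prob(doc):
    """log p(w) = log sum_z exp(log_joint), computed with log-sum-exp."""
    lj = log_joint(doc)
    m = lj.max()
    return float(m + np.log(np.exp(lj - m).sum()))

def topic_posterior(doc):
    """p(z | w): which single topic most likely generated the whole document."""
    lj = log_joint(doc)
    post = np.exp(lj - lj.max())
    return post / post.sum()

print(topic_posterior("money loan bank"))
```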
Probabilistic Latent Semantic Indexing (Hofmann, 1999)
Document d in the training set and word w_n are conditionally independent given the topic.
• Not truly generative (d is a dummy random variable).
• The number of parameters grows with the size of the corpus (overfitting).
• A document may contain several topics.
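For reference, Blei, Ng and Jordan write the pLSI (aspect) model as the mixture below. The weights $p(z \mid d)$ exist only for documents d in the training set, so they are fitted separately for every training document, which is where the growth in parameters comes from.

```latex
p(d, w_n) = p(d) \sum_{z} p(w_n \mid z)\, p(z \mid d)
```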
Vision app.: Sivic et al., 2005
[Figure: pLSI graphical model (d → z → w, plates N and D) applied to images; example topic: "face"]
LDA
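Vision app.: Fei-Fei Li, 2005
[Figure: graphical model with variables c, π, z and w (plates N and D) applied to scene images; example category: "beach"]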
Example: Word density distribution
A geometric interpretation
LDA
Topics are sampled repeatedly within each document (as in pLSI).
But the number of parameters does not grow with the size of the corpus.
Problem: inference.
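The difference can be made concrete with the parameter counts given in the Blei, Ng and Jordan paper: a k-topic pLSI model has kV + kM parameters (k word distributions of size V plus one topic mixture per training document, M documents), while LDA has k + kV (the Dirichlet parameter α plus the topic-word matrix). A small Python sketch; the values of k, V and M are hypothetical.

```python
def plsi_params(k, V, M):
    """pLSI: k word distributions of size V plus one k-dim topic mixture per training document."""
    return k * V + k * M

def lda_params(k, V):
    """LDA: k word distributions of size V plus the k-dim Dirichlet parameter alpha."""
    return k * V + k

k, V = 50, 10_000  # hypothetical model size
for M in (1_000, 100_000):
    # pLSI grows linearly with the number of documents M; LDA does not depend on M.
    print(M, plsi_params(k, V, M), lda_params(k, V))
```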
LDA - Inference
The coupling between the Dirichlet-distributed topic proportions θ and the topic-word parameters β makes exact inference intractable.
Blei, 2001: Variational Approximation
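A minimal NumPy sketch of the per-document variational updates from Blei et al.: $\phi_{ni} \propto \beta_{i,w_n} \exp(\Psi(\gamma_i))$ and $\gamma_i = \alpha_i + \sum_n \phi_{ni}$, iterated to convergence with β held fixed. The paper's update also subtracts $\Psi(\sum_j \gamma_j)$ inside the exponential, but that term is the same for every topic and cancels when φ is normalized. The small digamma helper is included only to avoid a SciPy dependency; `alpha`, `beta` and the document are hypothetical.

```python
import numpy as np

def digamma(x):
    """Digamma for positive inputs: recurrence up to x >= 6, then an asymptotic series."""
    x = np.array(x, dtype=float)
    result = np.zeros_like(x)
    small = x < 6.0
    while small.any():
        result[small] -= 1.0 / x[small]  # psi(x) = psi(x + 1) - 1/x
        x[small] += 1.0
        small = x < 6.0
    inv2 = 1.0 / (x * x)
    return result + np.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def variational_e_step(word_ids, alpha, beta, n_iter=100, tol=1e-6):
    """Mean-field inference for one document; beta has shape (k, V), alpha shape (k,)."""
    k, n = len(alpha), len(word_ids)
    phi = np.full((n, k), 1.0 / k)
    gamma = alpha + n / k
    for _ in range(n_iter):
        # phi_{ni} proportional to beta_{i, w_n} * exp(psi(gamma_i))
        phi = beta[:, word_ids].T * np.exp(digamma(gamma))
        phi /= phi.sum(axis=1, keepdims=True)
        new_gamma = alpha + phi.sum(axis=0)  # gamma_i = alpha_i + sum_n phi_{ni}
        converged = np.abs(new_gamma - gamma).max() < tol
        gamma = new_gamma
        if converged:
            break
    return gamma, phi

beta = np.array([[0.4, 0.3, 0.2, 0.05, 0.05],    # hypothetical topic-word probabilities
                 [0.05, 0.05, 0.2, 0.4, 0.3]])
alpha = np.full(2, 0.5)
gamma, phi = variational_e_step([0, 1, 2, 0], alpha, beta)
print(gamma.round(2))
```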
LDA - Inference
Other procedures:
• Markov chain Monte Carlo (Griffiths et al., 2002)
• Expectation Propagation (Minka et al., 2002)
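As an example of the Monte Carlo route, here is a minimal collapsed Gibbs sampler for LDA in NumPy, in the style popularized by Griffiths and Steyvers: θ and φ are integrated out and each word's topic is resampled from $p(z_i = k \mid z_{-i}, \mathbf{w}) \propto \frac{n_{k,w} + \beta}{n_k + V\beta}\,(n_{d,k} + \alpha)$. Symmetric priors, the hyperparameter values, the iteration count and the toy word-id corpus are assumptions for the sketch.

```python
import numpy as np

def gibbs_lda(docs, V, K, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of lists of word ids in range(V)."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))  # words in document d assigned to topic k
    n_kw = np.zeros((K, V))          # times word w is assigned to topic k
    n_k = np.zeros(K)                # total words assigned to topic k
    z = [rng.integers(K, size=len(doc)) for doc in docs]  # random initial topics
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove word i's current assignment from the counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional for z_i given all other assignments.
                p = (n_kw[:, w] + beta) / (n_k + V * beta) * (n_dk[d] + alpha)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw

ALPHA, K = 0.1, 2
docs = [[0, 1, 2, 0, 1], [3, 4, 2, 3, 4], [0, 2, 1, 0], [4, 3, 3, 2]]  # toy word ids
z, n_dk, n_kw = gibbs_lda(docs, V=5, K=K, alpha=ALPHA)
theta = (n_dk + ALPHA) / (n_dk.sum(axis=1, keepdims=True) + K * ALPHA)  # topic proportions
print(theta.round(2))
```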
Experiments
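Perplexity: the inverse of the geometric mean per-word likelihood (a monotonically decreasing function of the likelihood):
$\text{perplexity}(D_{\text{test}}) = \exp\left\{ -\dfrac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right\}$
Idea: lower perplexity implies better generalization.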
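A minimal Python sketch of the perplexity computation, $\exp\{-\sum_d \log p(\mathbf{w}_d) / \sum_d N_d\}$, given per-document log-likelihoods from any model. As a sanity check it uses a hypothetical uniform model over V words, whose perplexity should come out as exactly V.

```python
import math

def perplexity(doc_log_probs, doc_lengths):
    """exp(-sum_d log p(w_d) / sum_d N_d): inverse geometric mean per-word likelihood."""
    return math.exp(-sum(doc_log_probs) / sum(doc_lengths))

# A uniform model gives every word probability 1/V, so its perplexity is V.
V = 1000
lengths = [120, 80]  # hypothetical test-document lengths
log_probs = [n * math.log(1 / V) for n in lengths]
print(perplexity(log_probs, lengths))  # ≈ 1000
```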
Experiments – Nematode corpus
Experiments – AP corpus
Polysemy
• PRINTING, PAPER, PRINT, PRINTED, TYPE, PROCESS, INK, PRESS, IMAGE, PRINTER, PRINTS, PRINTERS, COPY, COPIES, FORM, OFFSET, GRAPHIC, SURFACE, PRODUCED, CHARACTERS
• PLAY, PLAYS, STAGE, AUDIENCE, THEATER, ACTORS, DRAMA, SHAKESPEARE, ACTOR, THEATRE, PLAYWRIGHT, PERFORMANCE, DRAMATIC, COSTUMES, COMEDY, TRAGEDY, CHARACTERS, SCENES, OPERA, PERFORMED
• TEAM, GAME, BASKETBALL, PLAYERS, PLAYER, PLAY, PLAYING, SOCCER, PLAYED, BALL, TEAMS, BASKET, FOOTBALL, SCORE, COURT, GAMES, TRY, COACH, GYM, SHOT
• JUDGE, TRIAL, COURT, CASE, JURY, ACCUSED, GUILTY, DEFENDANT, JUSTICE, EVIDENCE, WITNESSES, CRIME, LAWYER, WITNESS, ATTORNEY, HEARING, INNOCENT, DEFENSE, CHARGE, CRIMINAL
• HYPOTHESIS, EXPERIMENT, SCIENTIFIC, OBSERVATIONS, SCIENTISTS, EXPERIMENTS, SCIENTIST, EXPERIMENTAL, TEST, METHOD, HYPOTHESES, TESTED, EVIDENCE, BASED, OBSERVATION, SCIENCE, FACTS, DATA, RESULTS, EXPLANATION
• STUDY, TEST, STUDYING, HOMEWORK, NEED, CLASS, MATH, TRY, TEACHER, WRITE, PLAN, ARITHMETIC, ASSIGNMENT, PLACE, STUDIED, CAREFULLY, DECIDE, IMPORTANT, NOTEBOOK, REVIEW
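A small Python sketch that reads the six topic word lists from the polysemy slide (numbered 1 to 6 here in order of appearance; the numbering is only for the example) and reports words that appear in more than one topic. Such words, e.g. PLAY in the theater and sports topics or COURT in the sports and legal topics, are the polysemous ones: the same word gets a different sense depending on the topic that generated it.

```python
from collections import defaultdict

topics = {
    1: "PRINTING PAPER PRINT PRINTED TYPE PROCESS INK PRESS IMAGE PRINTER "
       "PRINTS PRINTERS COPY COPIES FORM OFFSET GRAPHIC SURFACE PRODUCED CHARACTERS",
    2: "PLAY PLAYS STAGE AUDIENCE THEATER ACTORS DRAMA SHAKESPEARE ACTOR THEATRE "
       "PLAYWRIGHT PERFORMANCE DRAMATIC COSTUMES COMEDY TRAGEDY CHARACTERS SCENES OPERA PERFORMED",
    3: "TEAM GAME BASKETBALL PLAYERS PLAYER PLAY PLAYING SOCCER PLAYED BALL "
       "TEAMS BASKET FOOTBALL SCORE COURT GAMES TRY COACH GYM SHOT",
    4: "JUDGE TRIAL COURT CASE JURY ACCUSED GUILTY DEFENDANT JUSTICE EVIDENCE "
       "WITNESSES CRIME LAWYER WITNESS ATTORNEY HEARING INNOCENT DEFENSE CHARGE CRIMINAL",
    5: "HYPOTHESIS EXPERIMENT SCIENTIFIC OBSERVATIONS SCIENTISTS EXPERIMENTS SCIENTIST "
       "EXPERIMENTAL TEST METHOD HYPOTHESES TESTED EVIDENCE BASED OBSERVATION SCIENCE FACTS DATA RESULTS EXPLANATION",
    6: "STUDY TEST STUDYING HOMEWORK NEED CLASS MATH TRY TEACHER WRITE "
       "PLAN ARITHMETIC ASSIGNMENT PLACE STUDIED CAREFULLY DECIDE IMPORTANT NOTEBOOK REVIEW",
}

# Map every word to the set of topics whose top-word list contains it.
word_topics = defaultdict(set)
for topic_id, words in topics.items():
    for word in words.split():
        word_topics[word].add(topic_id)

# Words placed in more than one topic: candidates for polysemy.
shared = {w: sorted(ts) for w, ts in word_topics.items() if len(ts) > 1}
for word, ids in sorted(shared.items()):
    print(word, ids)
```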
Choosing the number of topics
• Subjective interpretability
• Bayesian model selection: Griffiths & Steyvers (2004)
• Generalization test
• Non-parametric Bayesian statistics: infinite models, i.e. models that grow with the size of the data (Teh, Jordan, Beal, & Blei, 2004; Blei, Griffiths, Jordan, & Tenenbaum, 2004)