1 An Introduction to Latent Dirichlet Allocation (LDA)


Page 1: 1 An Introduction to Latent Dirichlet Allocation (LDA)

1

An Introduction to Latent Dirichlet Allocation (LDA)

Page 2: 1 An Introduction to Latent Dirichlet Allocation (LDA)

2

LDA

• A generative probabilistic model for collections of discrete data such as text corpora.

• It is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics.

• Each topic, in turn, is modeled as an infinite mixture over an underlying set of topic probabilities.

• It has natural advantages over the unigram model and the probabilistic LSI model.

Page 3: 1 An Introduction to Latent Dirichlet Allocation (LDA)

3

History (1) – text processing

• IR – represent text as a vector of real numbers (Baeza-Yates and Ribeiro-Neto, 1999), e.g., tf-idf (Salton and McGill, 1983).
  – Shortcomings of tf-idf: (1) the representation is lengthy, and (2) it cannot model inter- and intra-document statistical structure.

• LSI – dimension reduction (Deerwester et al., 1990).
  – Advantages: achieves significant compression in large collections and captures synonymy and polysemy.

• Generative probabilistic models – used to study the ability of LSI (Papadimitriou et al., 1998).
  – Why LSI, then? We can model the data directly using maximum likelihood or Bayesian methods.

Page 4: 1 An Introduction to Latent Dirichlet Allocation (LDA)

4

History (2) – text processing

• Probabilistic LSI (pLSI) – also called the aspect model; a milestone (Hofmann, 1999).
  – With p(wi | θj), d = {w1, …, wN}, and θ = {θ1, …, θk}, each word is generated from a single topic model θj. Document d is reduced to a set of mixing proportions for the mixture components θ, that is, a list of numbers (the mixing proportions for topics).
  – Disadvantage: no probabilistic model at the document level.
    • The number of parameters grows linearly with the size of the corpus.
    • It is not clear how to assign probability to a document outside the collection (pLSI makes no assumptions about how the mixture weights θ are generated, making it difficult to test the generalizability of the model to new documents).

Page 5: 1 An Introduction to Latent Dirichlet Allocation (LDA)

5

Notation

• D = {d1, …, dM}, d = {w1, …, wN}, and θ = {θ1, …, θk}; equivalently D = {w1, …, wM}. (A bold variable denotes a vector.)

• Suppose we have V distinct words in the whole data set.

Page 6: 1 An Introduction to Latent Dirichlet Allocation (LDA)

6

LDA

• The basic idea: documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words.

• For each document d, the generative process is as follows:

  1. Choose N ~ Poisson(ξ)
  2. Choose θ ~ Dir(α)
  3. For each of the N words w_n:
     (a) Choose a topic z_n ~ Multinomial(θ)
     (b) Choose a word w_n from p(w_n | z_n, β), a multinomial probability conditioned on the topic z_n

• There are k topics z, and β is a k × V matrix with β_ij = p(w^j = 1 | z^i = 1).
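
As an illustration, here is a minimal sketch of this generative process in Python with NumPy; the corpus sizes, the Poisson rate ξ, and the values of α and β below are made-up example values, not anything specified in the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

k, V = 3, 10                               # number of topics and vocabulary size (example values)
xi = 8                                     # Poisson rate for the document length
alpha = np.full(k, 0.5)                    # Dirichlet prior over topic proportions
beta = rng.dirichlet(np.ones(V), size=k)   # k x V topic-word probabilities

def generate_document():
    N = rng.poisson(xi)                    # 1. choose N ~ Poisson(xi)
    theta = rng.dirichlet(alpha)           # 2. choose theta ~ Dir(alpha)
    words = []
    for _ in range(N):                     # 3. for each of the N words:
        z = rng.choice(k, p=theta)         #    (a) topic z_n ~ Multinomial(theta)
        w = rng.choice(V, p=beta[z])       #    (b) word w_n ~ Multinomial(beta_z)
        words.append(w)
    return words

print(generate_document())
```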

Page 7: 1 An Introduction to Latent Dirichlet Allocation (LDA)

7

Dirichlet Random Variables θ

• A k-dimensional Dirichlet random variable θ takes values in the (k−1)-simplex (a k-vector θ lies in the (k−1)-simplex if θ_i ≥ 0 and Σ_{i=1}^k θ_i = 1), and its probability density is:

$$p(\theta \mid \alpha) = \frac{\Gamma\!\left(\sum_{i=1}^{k}\alpha_i\right)}{\prod_{i=1}^{k}\Gamma(\alpha_i)}\;\theta_1^{\alpha_1-1}\cdots\theta_k^{\alpha_k-1}$$

where α is a k-vector parameter with α_i > 0, and Γ(x) is the gamma function.
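
For a concrete feel, a short check (an assumed example, not from the slides) that draws samples from a Dirichlet and verifies they lie on the (k−1)-simplex:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([6.0, 2.0, 2.0])        # example k-vector with alpha_i > 0

theta = rng.dirichlet(alpha, size=5)     # 5 draws from Dir(alpha)
print(theta)
# every component is non-negative and each row sums to 1, i.e. the draws lie on the 2-simplex
print(np.all(theta >= 0), np.allclose(theta.sum(axis=1), 1.0))
```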

Page 8: 1 An Introduction to Latent Dirichlet Allocation (LDA)

8

Multinomial Distribution

• Each trial can end in exactly one of k categories

• n independent trials

• Probability a trial results in category i is pi

• Yi is the number of trials resulting in category i

• p1+…+pk = 1

• Y1+…+Yk = n
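
A quick illustration of these properties with NumPy (the values of n and p below are arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
p = np.array([0.5, 0.3, 0.2])            # p_1 + ... + p_k = 1

y = rng.multinomial(n, p)                # counts Y_1, ..., Y_k over the k categories
print(y, y.sum() == n)                   # Y_1 + ... + Y_k = n
```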

Page 9: 1 An Introduction to Latent Dirichlet Allocation (LDA)

9

Multinomial Distribution

Page 10: 1 An Introduction to Latent Dirichlet Allocation (LDA)

10

Joint Distribution

• Given the parameters α and β, the joint distribution of a topic mixture θ, a set of N topics z, and a set of N words w is given by:

$$p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha)\prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)$$

where p(z_n | θ) is simply θ_i for the unique i such that z_n^i = 1.

• Integrating over θ and summing over z, we obtain the marginal distribution of a document:

$$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha)\left(\prod_{n=1}^{N}\sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)\right) d\theta \qquad \text{(a)}$$

Page 11: 1 An Introduction to Latent Dirichlet Allocation (LDA)

11

Joint Distribution (cont.)

• Finally, taking the product of the marginal probabilities of single documents, we obtain the probability of a corpus:

$$p(D \mid \alpha, \beta) = \prod_{d=1}^{M}\int p(\theta_d \mid \alpha)\left(\prod_{n=1}^{N_d}\sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta)\right) d\theta_d$$

[Figure: LDA graphical model – α → θ → z → w with parameter β, plates over the N words and the M documents.]

Similar models are referred to as hierarchical models (Gelman et al., 1995), or more precisely as conditionally independent hierarchical models (Kass and Steffey, 1989).

Page 12: 1 An Introduction to Latent Dirichlet Allocation (LDA)

12

Relationships with Other Latent Models

• Unigram model [graphical model: w, plates over N and M]:

$$p(\mathbf{w}) = \prod_{n=1}^{N} p(w_n)$$

• Mixture of unigrams [graphical model: z → w, plates over N and M]:

$$p(\mathbf{w}) = \sum_{z} p(z)\prod_{n=1}^{N} p(w_n \mid z)$$

• Probabilistic LSI [graphical model: d → z → w, plates over N and M]:

$$p(d, w_n) = p(d)\sum_{z} p(w_n \mid z)\, p(z \mid d)$$

Problems with pLSI: (1) p(z | d) is defined only for documents in the training set and cannot handle unseen documents; and (2) the number of parameters, kV + kM, grows linearly in M and thus leads to overfitting.

Page 13: 1 An Introduction to Latent Dirichlet Allocation (LDA)

13

Graphical Interpretation

The probability density of the Dirichlet distribution when K=3 for various parameter vectors α. Clockwise from top left: α=(6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4).

$$p(\theta \mid \alpha) = \frac{\Gamma\!\left(\sum_{i=1}^{k}\alpha_i\right)}{\prod_{i=1}^{k}\Gamma(\alpha_i)}\;\theta_1^{\alpha_1-1}\cdots\theta_k^{\alpha_k-1}, \qquad \theta_i \ge 0,\ \ \sum_{i=1}^{k}\theta_i = 1$$

Page 14: 1 An Introduction to Latent Dirichlet Allocation (LDA)

14

Graphical Interpretation (cont.)

• The Dirichlet prior on the topic-word distributions can be interpreted as forces on the topic locations with higher β moving the topic locations away from the corners of the simplex.

Page 15: 1 An Introduction to Latent Dirichlet Allocation (LDA)

15

Matrix Interpretation

• In the topic model, the word-document co-occurrence matrix is split into two parts: a topic matrix Φ and a document matrix Θ. Note that the diagonal matrix D in LSA can be absorbed in the matrix U or V, making the similarity between the two representations even clearer.

Page 16: 1 An Introduction to Latent Dirichlet Allocation (LDA)

16

Inference and Parameter Estimation

• The key inferential problem is that of computing the posterior distribution of the hidden variables given a document:

$$p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta) = \frac{p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)}{p(\mathbf{w} \mid \alpha, \beta)}$$

• However, this distribution is intractable to compute in general. For the normalization term we have to marginalize over the hidden variables and write Equation (a) in terms of the model parameters:

$$p(\mathbf{w} \mid \alpha, \beta) = \frac{\Gamma\!\left(\sum_{i}\alpha_i\right)}{\prod_{i}\Gamma(\alpha_i)}\int\left(\prod_{i=1}^{k}\theta_i^{\alpha_i-1}\right)\left(\prod_{n=1}^{N}\sum_{i=1}^{k}\prod_{j=1}^{V}(\theta_i\beta_{ij})^{w_n^j}\right) d\theta$$

Page 17: 1 An Introduction to Latent Dirichlet Allocation (LDA)

17

Inference and Parameter Estimation

• The key inferential problem is that of computing the posterior distribution of the hidden variables given a document:

$$p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta) = \frac{p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)}{p(\mathbf{w} \mid \alpha, \beta)}$$

• However, this distribution is intractable to compute in general. For the normalization term we have to marginalize over the hidden variables and write Equation (a),

$$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha)\left(\prod_{n=1}^{N}\sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)\right) d\theta,$$

in terms of the model parameters:

$$p(\mathbf{w} \mid \alpha, \beta) = \frac{\Gamma\!\left(\sum_{i}\alpha_i\right)}{\prod_{i}\Gamma(\alpha_i)}\int\left(\prod_{i=1}^{k}\theta_i^{\alpha_i-1}\right)\left(\prod_{n=1}^{N}\sum_{i=1}^{k}\prod_{j=1}^{V}(\theta_i\beta_{ij})^{w_n^j}\right) d\theta$$

Page 18: 1 An Introduction to Latent Dirichlet Allocation (LDA)

18

Inference and Parameter Estimation (cont.)

• This function is intractable due to the coupling between θ and β in the summation over latent topics (Dickey, 1983):

$$p(\mathbf{w} \mid \alpha, \beta) = \frac{\Gamma\!\left(\sum_{i}\alpha_i\right)}{\prod_{i}\Gamma(\alpha_i)}\int\left(\prod_{i=1}^{k}\theta_i^{\alpha_i-1}\right)\left(\prod_{n=1}^{N}\sum_{i=1}^{k}\prod_{j=1}^{V}(\theta_i\beta_{ij})^{w_n^j}\right) d\theta$$

• Rather than intractable exact inference, we can use approximate inference algorithms, e.g., Laplace approximation, variational approximation, and Markov chain Monte Carlo (Jordan, 1999).

Page 19: 1 An Introduction to Latent Dirichlet Allocation (LDA)

19

Variational Inference

• Here we introduce a simple convexity-based variational algorithm for inference in LDA.

• The basic idea is to use Jensen's inequality to obtain an adjustable lower bound on the log likelihood (Jordan, 1999).

• A simple way to obtain a tractable family of lower bounds is to consider simple modifications of the original graphical model in which some of the edges and nodes are removed.

$$\log p(\mathbf{w} \mid \alpha, \beta) = \log \int\sum_{\mathbf{z}} \frac{p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\, q(\theta, \mathbf{z})}{q(\theta, \mathbf{z})}\, d\theta \;\ge\; \int\sum_{\mathbf{z}} q(\theta, \mathbf{z})\log p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\, d\theta - \int\sum_{\mathbf{z}} q(\theta, \mathbf{z})\log q(\theta, \mathbf{z})\, d\theta$$

Page 20: 1 An Introduction to Latent Dirichlet Allocation (LDA)

20

Variational Inference (cont.)

• Hence, by dropping the edges between the θ, z, and w nodes, and endowing the resulting simplified graphical model with free variational parameters, we obtain a family of distributions over the latent variables:

$$q(\theta, \mathbf{z} \mid \gamma, \phi) = q(\theta \mid \gamma)\prod_{n=1}^{N} q(z_n \mid \phi_n)$$

• where the Dirichlet parameter γ and the multinomial parameters (Φ1, …, ΦN) are the free variational parameters.

[Figure: the LDA graphical model (α → θ → z → w, with β) and the simplified variational model (γ → θ, Φ → z), each with plates over N and M.]

Page 21: 1 An Introduction to Latent Dirichlet Allocation (LDA)

21

How to determine the parameters

• We can set up an optimization problem to determine the values of the variational parameters γ and Φ.

• We define the objective as minimizing the Kullback-Leibler (KL) divergence between the variational distribution and the true posterior p(θ, z | w, α, β):

$$(\gamma^{*}, \phi^{*}) = \arg\min_{(\gamma, \phi)} D\big(q(\theta, \mathbf{z} \mid \gamma, \phi)\,\|\, p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta)\big) \qquad (5)$$

• This minimization can be achieved by an iterative fixed-point method.

Page 22: 1 An Introduction to Latent Dirichlet Allocation (LDA)

22

Variational Inference

• We now discuss how to set the parameters γ and Φ via an optimization procedure.

• Following Jordan et al. (1999), we obtain a lower bound on the log likelihood of a document using Jensen's inequality:

$$\begin{aligned}
\log p(\mathbf{w} \mid \alpha, \beta) &= \log \int\sum_{\mathbf{z}} p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\, d\theta \\
&= \log \int\sum_{\mathbf{z}} \frac{p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\, q(\theta, \mathbf{z})}{q(\theta, \mathbf{z})}\, d\theta \\
&\ge \int\sum_{\mathbf{z}} q(\theta, \mathbf{z})\log p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\, d\theta - \int\sum_{\mathbf{z}} q(\theta, \mathbf{z})\log q(\theta, \mathbf{z})\, d\theta \\
&= \mathrm{E}_q[\log p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)] - \mathrm{E}_q[\log q(\theta, \mathbf{z})]
\end{aligned}$$

Page 23: 1 An Introduction to Latent Dirichlet Allocation (LDA)

23

Variational Inference (cont.)

• Jensen's inequality provides us with a lower bound on the log likelihood for an arbitrary variational distribution q(θ, z | γ, Φ):

$$L(\gamma, \phi; \alpha, \beta) = \int\sum_{\mathbf{z}} q(\theta, \mathbf{z})\log p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\, d\theta - \int\sum_{\mathbf{z}} q(\theta, \mathbf{z})\log q(\theta, \mathbf{z})\, d\theta$$

• It can be easily verified that the difference between the left-hand side and the right-hand side of the above inequality is the KL divergence between the variational posterior probability and the true posterior probability:

$$\log p(\mathbf{w} \mid \alpha, \beta) = L(\gamma, \phi; \alpha, \beta) + D\big(q(\theta, \mathbf{z} \mid \gamma, \phi)\,\|\, p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta)\big)$$

Page 24: 1 An Introduction to Latent Dirichlet Allocation (LDA)

24

Variational Inference (cont.)

• That is, letting L(γ, Φ; α, β) denote the right-hand side of the above inequality, we have:

$$\log p(\mathbf{w} \mid \alpha, \beta) = L(\gamma, \phi; \alpha, \beta) + D\big(q(\theta, \mathbf{z} \mid \gamma, \phi)\,\|\, p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta)\big)$$

• This means that maximizing the lower bound L(γ, Φ; α, β) w.r.t. γ and Φ is equivalent to minimizing the KL divergence between the variational posterior probability and the true posterior probability, i.e., the optimization problem in Equation (5).

Page 25: 1 An Introduction to Latent Dirichlet Allocation (LDA)

25

Variational Inference (cont.)

• We can expand the lower bound:

$$L(\gamma, \phi; \alpha, \beta) = \mathrm{E}_q[\log p(\theta \mid \alpha)] + \mathrm{E}_q[\log p(\mathbf{z} \mid \theta)] + \mathrm{E}_q[\log p(\mathbf{w} \mid \mathbf{z}, \beta)] - \mathrm{E}_q[\log q(\theta)] - \mathrm{E}_q[\log q(\mathbf{z})]$$

• Expanding each term again, we have:

$$\begin{aligned}
L(\gamma, \phi; \alpha, \beta) ={}& \log\Gamma\Big(\textstyle\sum_{j=1}^{k}\alpha_j\Big) - \sum_{i=1}^{k}\log\Gamma(\alpha_i) + \sum_{i=1}^{k}(\alpha_i - 1)\Big(\Psi(\gamma_i) - \Psi\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big)\Big) \\
&+ \sum_{n=1}^{N}\sum_{i=1}^{k}\phi_{ni}\Big(\Psi(\gamma_i) - \Psi\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big)\Big) \\
&+ \sum_{n=1}^{N}\sum_{i=1}^{k}\sum_{j=1}^{V}\phi_{ni}\, w_n^j \log\beta_{ij} \\
&- \log\Gamma\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big) + \sum_{i=1}^{k}\log\Gamma(\gamma_i) - \sum_{i=1}^{k}(\gamma_i - 1)\Big(\Psi(\gamma_i) - \Psi\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big)\Big) \\
&- \sum_{n=1}^{N}\sum_{i=1}^{k}\phi_{ni}\log\phi_{ni} \qquad (15)
\end{aligned}$$

Page 26: 1 An Introduction to Latent Dirichlet Allocation (LDA)

26

Entropy

Page 27: 1 An Introduction to Latent Dirichlet Allocation (LDA)

27

Variational Multinomial

• We first maximize Eq. (15) w.r.t. Φni, the probability that the n-th word is generated by latent topic i.

• We form the Lagrangian by isolating the terms that contain Φni and adding the appropriate Lagrange multipliers. Let β_iv be p(w_n^v = 1 | z^i = 1) for the appropriate v (recall that each w_n is a vector of size V with exactly one component equal to one; we can select the unique v such that w_n^v = 1):

$$L_{[\phi_{ni}]} = \phi_{ni}\Big(\Psi(\gamma_i) - \Psi\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big)\Big) + \phi_{ni}\log\beta_{iv} - \phi_{ni}\log\phi_{ni} + \lambda_n\Big(\textstyle\sum_{j=1}^{k}\phi_{nj} - 1\Big)$$

Page 28: 1 An Introduction to Latent Dirichlet Allocation (LDA)

28

Variational Multinomial (cont.)

• Taking the derivative w.r.t. Φni, we obtain:

$$\frac{\partial L}{\partial\phi_{ni}} = \Psi(\gamma_i) - \Psi\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big) + \log\beta_{iv} - \log\phi_{ni} - 1 + \lambda_n$$

• Setting this derivative to zero yields the maximizing value of the variational parameter Φni:

$$\phi_{ni} \propto \beta_{iv}\exp\Big(\Psi(\gamma_i) - \Psi\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big)\Big)$$
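
A small sketch of this update for a single word position, assuming γ and the column of β for the observed word are already available (the exp-normalization trick below is an added detail for numerical stability, not part of the slides):

```python
import numpy as np
from scipy.special import digamma

def update_phi_n(beta_v, gamma):
    """phi_ni proportional to beta_iv * exp(Psi(gamma_i) - Psi(sum_j gamma_j))."""
    log_phi = np.log(beta_v) + digamma(gamma) - digamma(gamma.sum())
    log_phi -= log_phi.max()          # stabilize before exponentiating
    phi = np.exp(log_phi)
    return phi / phi.sum()            # normalize so the k entries sum to 1
```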

Page 29: 1 An Introduction to Latent Dirichlet Allocation (LDA)

29

Variational Dirichlet

• Next we maximize Equation (15) w.r.t. γi, the i-th component of the posterior Dirichlet parameter. The terms containing γi are:

$$L_{[\gamma]} = \sum_{i=1}^{k}(\alpha_i - 1)\Big(\Psi(\gamma_i) - \Psi\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big)\Big) + \sum_{n=1}^{N}\sum_{i=1}^{k}\phi_{ni}\Big(\Psi(\gamma_i) - \Psi\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big)\Big) - \log\Gamma\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big) + \sum_{i=1}^{k}\log\Gamma(\gamma_i) - \sum_{i=1}^{k}(\gamma_i - 1)\Big(\Psi(\gamma_i) - \Psi\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big)\Big)$$

• Simplifying:

$$L_{[\gamma]} = \sum_{i=1}^{k}\Big(\Psi(\gamma_i) - \Psi\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big)\Big)\Big(\alpha_i + \textstyle\sum_{n=1}^{N}\phi_{ni} - \gamma_i\Big) - \log\Gamma\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big) + \sum_{i=1}^{k}\log\Gamma(\gamma_i)$$

Page 30: 1 An Introduction to Latent Dirichlet Allocation (LDA)

30

Variational Dirichlet (cont.)

• Taking the derivative w.r.t. γi:

$$\frac{\partial L}{\partial\gamma_i} = \Psi'(\gamma_i)\Big(\alpha_i + \textstyle\sum_{n=1}^{N}\phi_{ni} - \gamma_i\Big) - \Psi'\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big)\sum_{j=1}^{k}\Big(\alpha_j + \textstyle\sum_{n=1}^{N}\phi_{nj} - \gamma_j\Big)$$

• Setting this equation to zero yields a maximum at:

$$\gamma_i = \alpha_i + \sum_{n=1}^{N}\phi_{ni}$$

Page 31: 1 An Introduction to Latent Dirichlet Allocation (LDA)

31

Solve the Optimization Problem

• Taking derivatives of the KL divergence and setting them equal to zero, we obtain the following update equations:

$$\phi_{ni} \propto \beta_{i w_n}\exp\{\mathrm{E}_q[\log(\theta_i) \mid \gamma]\} \qquad (6)$$

$$\gamma_i = \alpha_i + \sum_{n=1}^{N}\phi_{ni} \qquad (7)$$

where the expectation in the multinomial update can be computed as follows:

$$\mathrm{E}_q[\log(\theta_i) \mid \gamma] = \Psi(\gamma_i) - \Psi\Big(\textstyle\sum_{j=1}^{k}\gamma_j\Big) \qquad (8)$$

where Ψ is the first derivative of the log Γ function, which is computable via Taylor approximations (Abramowitz and Stegun, 1970).

Page 32: 1 An Introduction to Latent Dirichlet Allocation (LDA)

32

Computing E[log(θi | α)]

• Recall that a distribution is in the exponential family if it can be written in the form:

$$p(x \mid \eta) = h(x)\exp\{\eta^{T} T(x) - A(\eta)\}$$

where η is the natural parameter, T(x) is the sufficient statistic, and A(η) is the log of the normalization factor.

• We can write the Dirichlet in this form by exponentiating the log of its density p(θ | α):

$$p(\theta \mid \alpha) = \exp\Big\{\Big(\textstyle\sum_{i=1}^{k}(\alpha_i - 1)\log\theta_i\Big) + \log\Gamma\Big(\textstyle\sum_{i=1}^{k}\alpha_i\Big) - \textstyle\sum_{i=1}^{k}\log\Gamma(\alpha_i)\Big\}$$

Page 33: 1 An Introduction to Latent Dirichlet Allocation (LDA)

33

Computing E[log(θi | α)] (cont.)

• From this form we see that the natural parameter of the Dirichlet is ηi = αi − 1 and the sufficient statistic is T(θi) = log θi. Moreover, based on the general fact that the derivative of the log normalization factor w.r.t. the natural parameter is equal to the expectation of the sufficient statistic, we obtain:

$$\mathrm{E}[\log(\theta_i) \mid \alpha] = \Psi(\alpha_i) - \Psi\Big(\textstyle\sum_{j=1}^{k}\alpha_j\Big)$$

where Ψ is the digamma function, the first derivative of the log Gamma function.
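
This identity is easy to check numerically; a small sketch comparing the digamma formula against a Monte Carlo estimate of E[log θ_i] for an arbitrary example α:

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 4.0])

analytic = digamma(alpha) - digamma(alpha.sum())          # Psi(alpha_i) - Psi(sum_j alpha_j)
monte_carlo = np.log(rng.dirichlet(alpha, size=200_000)).mean(axis=0)
print(analytic)
print(monte_carlo)   # the two estimates should agree to roughly 2-3 decimal places
```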

Page 34: 1 An Introduction to Latent Dirichlet Allocation (LDA)

34

Variational Inference Algorithm

Each iteration requires O((N+1)k) operations.

For a single document, the number of iterations is on the order of the number of words in the document.

Thus, the total number of operations is roughly on the order of N²k.

(1) initialize φ^0_{ni} := 1/k for all i and n
(2) initialize γ_i := α_i + N/k for all i
(3) repeat
(4)     for n = 1 to N
(5)         for i = 1 to k
(6)             φ^{t+1}_{ni} := β_{i w_n} exp(Ψ(γ^t_i))
(7)         normalize φ^{t+1}_n to sum to 1
(8)     γ^{t+1} := α + Σ_{n=1}^{N} φ^{t+1}_n
(9) until convergence
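
A compact sketch of this algorithm for a single document, following the updates above. The document is assumed to be given as a list of word indices, beta as a k × V matrix, and alpha as a k-vector; testing convergence on the change in γ and the tolerance value are assumptions of this sketch:

```python
import numpy as np
from scipy.special import digamma

def variational_inference(doc, alpha, beta, tol=1e-4, max_iter=100):
    """Return (gamma, phi) for one document given model parameters (alpha, beta).

    doc   : list of word indices (length N)
    alpha : (k,) Dirichlet parameter
    beta  : (k, V) topic-word probabilities
    """
    k = alpha.shape[0]
    N = len(doc)
    phi = np.full((N, k), 1.0 / k)                     # (1) phi_ni := 1/k
    gamma = alpha + N / k                              # (2) gamma_i := alpha_i + N/k
    for _ in range(max_iter):                          # (3) repeat
        old_gamma = gamma.copy()
        for n, w in enumerate(doc):                    # (4)-(5) for each word and topic
            log_phi = np.log(beta[:, w]) + digamma(gamma)   # (6)
            log_phi -= log_phi.max()
            phi[n] = np.exp(log_phi)
            phi[n] /= phi[n].sum()                     # (7) normalize to sum to 1
        gamma = alpha + phi.sum(axis=0)                # (8) gamma := alpha + sum_n phi_n
        if np.abs(gamma - old_gamma).sum() < tol:      # (9) until convergence
            break
    return gamma, phi
```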

Page 35: 1 An Introduction to Latent Dirichlet Allocation (LDA)

35

Parameter Estimation

• We can use an empirical Bayes method for parameter estimation. In particular, we wish to find parameters α and β that maximize the marginal log likelihood:

$$\ell(\alpha, \beta) = \sum_{d=1}^{M}\log p(\mathbf{w}_d \mid \alpha, \beta)$$

• The quantity p(w | α, β) can be computed by the variational inference described above. Thus, we can find approximate empirical Bayes estimates for the LDA model via an alternating variational EM procedure that maximizes the lower bound w.r.t. the variational parameters γ and Φ, and then, for fixed values of the variational parameters, maximizes the lower bound w.r.t. the model parameters α and β.

Page 36: 1 An Introduction to Latent Dirichlet Allocation (LDA)

36

Variational EM

1. (E-step) For each document, find the optimizing values of the variational parameters {γ*_d, Φ*_d : d ∈ D}. This is done as described in the previous section.

2. (M-step) Maximize the resulting lower bound on the log likelihood w.r.t. the model parameters α and β. This corresponds to finding maximum likelihood estimates with expected sufficient statistics for each document under the approximate posterior computed in the E-step. The update for the conditional multinomial parameter β can be written out analytically:

$$\beta_{ij} \propto \sum_{d=1}^{M}\sum_{n=1}^{N_d}\phi^{*}_{dni}\, w_{dn}^{j} \qquad (9)$$

The update for α can be implemented using an efficient Newton-Raphson method. These two steps are repeated until the lower bound converges.
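
A minimal sketch of one variational EM pass over a corpus, reusing the per-document `variational_inference` routine sketched after the algorithm slide. The α update is left out here (α is kept fixed for brevity), and the small constant added to the β counts is an assumption of this sketch to avoid zero probabilities:

```python
import numpy as np

def variational_em_step(corpus, alpha, beta):
    """One (E-step, M-step) pass. corpus is a list of documents (lists of word indices)."""
    k, V = beta.shape
    suff_stats = np.zeros((k, V))
    gammas = []
    for doc in corpus:                                   # E-step: per-document gamma*, phi*
        gamma, phi = variational_inference(doc, alpha, beta)
        gammas.append(gamma)
        for n, w in enumerate(doc):
            suff_stats[:, w] += phi[n]                   # accumulate phi*_dni * w^j_dn
    new_beta = suff_stats + 1e-10                        # M-step: beta_ij proportional to the counts
    new_beta /= new_beta.sum(axis=1, keepdims=True)      # normalize each topic's word distribution
    return new_beta, gammas
```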

Page 37: 1 An Introduction to Latent Dirichlet Allocation (LDA)

37

Parameter Estimation

• We now consider how to obtain empirical Bayes estimates of the model parameters α and β.

• We solve this problem by using the variational lower bound as a surrogate for the marginal log likelihood, with the variational parameters Φ and γ fixed to the values found by variational inference.

• We then obtain empirical Bayes estimates by maximizing this lower bound w.r.t. the model parameters.

Page 38: 1 An Introduction to Latent Dirichlet Allocation (LDA)

38

Parameter Estimation (cont.)

• Recall that our approach for finding the empirical Bayes estimates is based on a variational EM procedure.

• In the variational E-step, we maximize the bound L(γ, Φ; α, β) w.r.t. the variational parameters γ and Φ. In the M-step, which we describe in this section, we maximize the bound w.r.t. the model parameters α and β. The overall procedure can thus be viewed as coordinate ascent in L.

Page 39: 1 An Introduction to Latent Dirichlet Allocation (LDA)

39

Conditional Multinomials

• To maximize w.r.t. β, we isolate the relevant terms and add Lagrange multipliers:

$$L_{[\beta]} = \sum_{d=1}^{M}\sum_{n=1}^{N_d}\sum_{i=1}^{k}\sum_{j=1}^{V}\phi_{dni}\, w_{dn}^{j}\log\beta_{ij} + \sum_{i=1}^{k}\lambda_i\Big(\textstyle\sum_{j=1}^{V}\beta_{ij} - 1\Big)$$

• Taking the derivative w.r.t. βij and setting it to zero, we have:

$$\beta_{ij} \propto \sum_{d=1}^{M}\sum_{n=1}^{N_d}\phi_{dni}\, w_{dn}^{j}$$

Page 40: 1 An Introduction to Latent Dirichlet Allocation (LDA)

40

Dirichlet

• First, we have:

$$L_{[\alpha]} = \sum_{d=1}^{M}\Big(\log\Gamma\Big(\textstyle\sum_{j=1}^{k}\alpha_j\Big) - \sum_{i=1}^{k}\log\Gamma(\alpha_i) + \sum_{i=1}^{k}(\alpha_i - 1)\Big(\Psi(\gamma_{di}) - \Psi\Big(\textstyle\sum_{j=1}^{k}\gamma_{dj}\Big)\Big)\Big)$$

• Taking the derivative w.r.t. αi, we obtain:

$$\frac{\partial L}{\partial\alpha_i} = M\Big(\Psi\Big(\textstyle\sum_{j=1}^{k}\alpha_j\Big) - \Psi(\alpha_i)\Big) + \sum_{d=1}^{M}\Big(\Psi(\gamma_{di}) - \Psi\Big(\textstyle\sum_{j=1}^{k}\gamma_{dj}\Big)\Big)$$

• This derivative depends on αj, where j ≠ i, and we therefore must use an iterative method to find the maximal α. In particular, the Hessian has the form found in equation (10):

$$\frac{\partial^2 L}{\partial\alpha_i\,\partial\alpha_j} = \delta(i, j)\, M\,\Psi'(\alpha_i) - M\,\Psi'\Big(\textstyle\sum_{j=1}^{k}\alpha_j\Big)$$
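
Because this Hessian is a diagonal matrix plus a constant rank-one term, the Newton step can be computed in linear time via the matrix inversion lemma instead of a full matrix inverse. A small sketch of that linear-time solve; the gradient g, the diagonal h, and the constant z are assumed to have been computed from the formulas above, and the random values are only there to check the result against a dense solve:

```python
import numpy as np

def newton_step(g, h, z):
    """Solve (diag(h) + z * ones @ ones.T) x = g in O(k) time."""
    c = (g / h).sum() / (1.0 / z + (1.0 / h).sum())
    return (g - c) / h                                 # x_i = (g_i - c) / h_i

# sanity check against a dense solve on made-up data
rng = np.random.default_rng(0)
g, h, z = rng.normal(size=4), rng.uniform(1, 2, size=4), 0.7
H = np.diag(h) + z * np.ones((4, 4))
print(np.allclose(newton_step(g, h, z), np.linalg.solve(H, g)))
```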

Page 41: 1 An Introduction to Latent Dirichlet Allocation (LDA)

41

Smoothing

• Simple Laplace smoothing is no longer justified as a maximum a posteriori method in the LDA setting.

• Instead, we can assume that each row of the k × V matrix β is independently drawn from an exchangeable Dirichlet distribution; that is, we treat the βi as random variables endowed with a posterior distribution, conditioned on the data.

[Figure: smoothed LDA graphical model – as before (α → θ → z → w, plates over N and M), with the k rows of β drawn from an exchangeable Dirichlet with parameter η.]

Page 42: 1 An Introduction to Latent Dirichlet Allocation (LDA)

42

Smoothing Model

• Thus we obtain a variational approach to Bayesian inference:

$$q(\beta_{1:k}, \theta_{1:M}, \mathbf{z}_{1:M} \mid \eta, \gamma, \phi) = \prod_{i=1}^{k}\mathrm{Dir}(\beta_i \mid \eta_i)\prod_{d=1}^{M} q_d(\theta_d, \mathbf{z}_d \mid \phi_d, \gamma_d)$$

where q_d(θ_d, z_d | Φ_d, γ_d) is the variational distribution defined for LDA as above, and the update for the new variational parameter η is as follows:

$$\eta_{ij} = \eta + \sum_{d=1}^{M}\sum_{n=1}^{N_d}\phi^{*}_{dni}\, w_{dn}^{j}$$

Page 43: 1 An Introduction to Latent Dirichlet Allocation (LDA)

43

Applications

Page 44: 1 An Introduction to Latent Dirichlet Allocation (LDA)

44

Document Modeling

• Perplexity is used to indicate the generalization performance of a method.

• Specifically, we estimate a document model on training data and use it to describe a held-out test set:

$$\mathrm{perplexity}(D_{\mathrm{test}}) = \exp\left\{-\frac{\sum_{d=1}^{M}\log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d}\right\}$$

• LDA outperforms the other models, including pLSI, the smoothed unigram model, and the smoothed mixture of unigrams.
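
A sketch of how this quantity is computed once the per-document log likelihoods log p(w_d) have been obtained (for LDA these would typically come from the variational lower bound as an approximation; the numbers below are illustrative placeholders):

```python
import numpy as np

def perplexity(log_likelihoods, doc_lengths):
    """perplexity = exp( - sum_d log p(w_d) / sum_d N_d )."""
    return np.exp(-np.sum(log_likelihoods) / np.sum(doc_lengths))

# example: three test documents with made-up log likelihoods and lengths
print(perplexity(log_likelihoods=[-120.3, -98.7, -143.2], doc_lengths=[40, 31, 47]))
```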

Page 45: 1 An Introduction to Latent Dirichlet Allocation (LDA)

45

Document Classification

• We can use the LDA model results as features for classifiers. In this way, with say 50 topics, we can reduce the feature space by 99.6%.

• The experimental results show that such feature reduction decreases accuracy only slightly.

Page 46: 1 An Introduction to Latent Dirichlet Allocation (LDA)

46

Collaborative Filtering

• We can learn a model on a fully observed set of users. Then, for each unobserved user, we are shown all but one of the movies preferred by that user and are asked to predict the held-out movie.

• Precisely, define the predictive perplexity on M test users as:

$$\mathrm{predictive\ perplexity}(D_{\mathrm{test}}) = \exp\left\{-\frac{\sum_{d=1}^{M}\log p(w_{d,N_d} \mid \mathbf{w}_{d,1:N_d-1})}{M}\right\}$$

Page 47: 1 An Introduction to Latent Dirichlet Allocation (LDA)

47

Other Applications

Page 48: 1 An Introduction to Latent Dirichlet Allocation (LDA)

48

The Naïve Bayes model

[Figure: graphical model – class c generates the N visual words w of an image (plate over N). Csurka et al. 2004]

• The model combines the prior probability of the object classes, p(c), with the image likelihood given the class, p(w | c):

$$p(c)\, p(\mathbf{w} \mid c) = p(c)\prod_{n=1}^{N} p(w_n \mid c)$$

• Object class decision:

$$c^{*} = \arg\max_{c}\, p(c \mid \mathbf{w})$$
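
A minimal sketch of this decision rule for a bag of visual words; the number of classes, the codebook size, the class priors, and the per-class word distributions below are illustrative placeholders, not values from Csurka et al.:

```python
import numpy as np

rng = np.random.default_rng(0)
C, V = 4, 50                                    # number of classes and codebook size
log_prior = np.log(np.full(C, 1.0 / C))         # log p(c), uniform prior for the example
word_probs = rng.dirichlet(np.ones(V), size=C)  # p(w | c), one row per class

def classify(word_ids):
    """c* = argmax_c  log p(c) + sum_n log p(w_n | c)."""
    log_post = log_prior + np.log(word_probs[:, word_ids]).sum(axis=1)
    return int(np.argmax(log_post))

print(classify(rng.integers(0, V, size=30)))    # predicted class for a random "image"
```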

Page 49: 1 An Introduction to Latent Dirichlet Allocation (LDA)

49 Csurka et al. 2004

Page 50: 1 An Introduction to Latent Dirichlet Allocation (LDA)

50

Hierarchical Bayesian text models

Probabilistic Latent Semantic Analysis (pLSA)

[Figure: pLSA graphical model – d → z → w, with plates over the N words of a document and the D documents; example topic: "face". Sivic et al. ICCV 2005]

Page 51: 1 An Introduction to Latent Dirichlet Allocation (LDA)

51

Hierarchical Bayesian text models

Latent Dirichlet Allocation (LDA)

[Figure: LDA graphical model for images – c → z → w, with plates over the N words and the D images; example category: "beach". Fei-Fei et al. ICCV 2005]

Page 52: 1 An Introduction to Latent Dirichlet Allocation (LDA)

52

Summary

• LDA is based on the exchangeability assumption.

• It can be viewed as a dimensionality reduction technique.

• Exact inference is intractable, but we can approximate it instead.

• It has applications to other collections – images and captions, for example.

Page 53: 1 An Introduction to Latent Dirichlet Allocation (LDA)

53

End of the talk!

Page 54: 1 An Introduction to Latent Dirichlet Allocation (LDA)

54

Conjugation

Page 55: 1 An Introduction to Latent Dirichlet Allocation (LDA)

55

Entropy