  • Latent Dirichlet Allocation (LDA)

    Seonghwi Kim

    SDS Lab Seminar

    Pohang University of Science and Technology
    Department of Industrial and Management Engineering

    July 15, 2020


  • Outline

    1 Introduction

    2 LDA model

    3 Approximate posterior inference

    4 Application: SCNT recommendation system



  • Topic modeling

    • A statistical model for discovering the abstract topics that occur in a collection of documents.

    • A method for automatically organizing, understanding, searching, and summarizing large collections of documents.

  • Topic modeling

    • Uncovers the hidden topical patterns that pervade the collection of documents (the corpus).

  • Topic modeling

    From a machine learning perspective, topic modeling is a case study in applying hierarchical Bayesian models to grouped data, like documents or images.

    Topic modeling research touches on
    • Directed graphical models
    • Conjugate priors
    • Hierarchical Bayesian methods
    • Fast approximate posterior inference (MCMC, variational methods)
    • ...

    LDA is an example of a topic model.

  • Document embedding

    • Document embedding converts each document to a vector space representation.

    • It enables us to perform several tasks relevant to documents, like
      – calculating similarity between documents
      – document classification

  • Bag of Words Representation

    • A common representation of documents in natural language processing

    • A document is represented as the bag (multiset) of its words, disregarding grammar and word meaning.

    • The order of words in a document is ignored; only the frequency of each word matters.

    • Latent Dirichlet Allocation assumes the bag-of-words representation for documents. (bag-of-words model)

  • Bag of Words Representation

    Here are two simple text documents:

    • (1) John likes to watch movies. Mary likes movies too.
    • (2) Mary also likes to watch football games.

    Based on these two text documents, a word-count list is constructed for each document:

    [John:1, likes:2, to:1, watch:1, movies:2, Mary:1, too:1]
    [Mary:1, also:1, likes:1, to:1, watch:1, football:1, games:1]

    and a union list of the words in both:

    [John, likes, to, watch, movies, Mary, too, also, football, games]

    Then the count vectors for documents (1) and (2) are:

    • (1) [1, 2, 1, 1, 2, 1, 1, 0, 0, 0]
    • (2) [0, 1, 1, 1, 0, 1, 0, 1, 1, 1]
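    A minimal Python sketch (an illustration, not from the slides) that reproduces the two count vectors above; the naive tokenizer and the hard-coded union list are assumptions:

        # Build bag-of-words vectors against a fixed vocabulary.
        from collections import Counter

        docs = [
            "John likes to watch movies. Mary likes movies too.",
            "Mary also likes to watch football games.",
        ]
        vocab = ["John", "likes", "to", "watch", "movies",
                 "Mary", "too", "also", "football", "games"]

        for doc in docs:
            counts = Counter(doc.replace(".", "").split())
            print([counts[w] for w in vocab])
        # -> [1, 2, 1, 1, 2, 1, 1, 0, 0, 0]
        # -> [0, 1, 1, 1, 0, 1, 0, 1, 1, 1]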


  • Generative model

    • Each document is a random mixture of topics
    • Each word is drawn from one of those topics

  • The posterior distribution

    • In reality, we only observe the documents
    • Our goal is to infer the underlying topic structure

  • Graphical models

    • Nodes are random variables
    • Edges denote possible dependence
    • Observed variables are shaded
    • Plates denote replicated structure

  • Graphical models

    • The structure of the graph represents the relationships between random variables.

    • E.g., a graph in which a node y points to replicated nodes x_1, ..., x_N corresponds to

      p(y, x_1, ..., x_N) = p(y) ∏_{n=1}^{N} p(x_n | y)

  • Latent Dirichlet Allocation

    β_k ∼ Dirichlet(η),  k ∈ {1, 2, ..., K}
    θ_d ∼ Dirichlet(α),  d ∈ {1, 2, ..., D}

    z_{d,n} ∼ Multi(θ_d),  d ∈ {1, 2, ..., D}, n ∈ {1, 2, ..., N}
    w_{d,n} ∼ Multi(β_{z_{d,n}}),  d ∈ {1, 2, ..., D}, n ∈ {1, 2, ..., N}
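    A short numpy sketch of this generative process (the toy sizes and symmetric hyperparameters are assumptions, not from the slides):

        import numpy as np

        rng = np.random.default_rng(0)
        K, D, N, V = 3, 5, 20, 10   # topics, documents, words per doc, vocabulary
        eta, alpha = 0.5, 0.5       # symmetric hyperparameters

        beta = rng.dirichlet(np.full(V, eta), size=K)     # topics beta_k: K x V
        theta = rng.dirichlet(np.full(K, alpha), size=D)  # proportions theta_d: D x K

        docs = []
        for d in range(D):
            z = rng.choice(K, size=N, p=theta[d])                # z_{d,n} ~ Multi(theta_d)
            w = np.array([rng.choice(V, p=beta[k]) for k in z])  # w_{d,n} ~ Multi(beta_{z_{d,n}})
            docs.append((z, w))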

  • Latent Dirichlet Allocation

    16 / 42

  • Latent Dirichlet Allocation

    Here is the complete joint probability distribution of LDA:

    p(θ, z, w, β | α, η) = ∏_{k=1}^{K} p(β_k | η) ∏_{d=1}^{D} p(θ_d | α) ∏_{n=1}^{N} p(z_{d,n} | θ_d) p(w_{d,n} | β_{z_{d,n}}),

    where

    p(z_{d,n} | θ_d) = θ_{d, z_{d,n}},   p(w_{d,n} | z_{d,n}, β) = β_{z_{d,n}, w_{d,n}}
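    This factorization can be evaluated directly. Below is a hedged sketch of the per-document log joint, reusing beta, theta, z, and w from the generative sketch above; scipy's dirichlet.logpdf supplies the Dirichlet terms:

        import numpy as np
        from scipy.stats import dirichlet

        def log_joint_doc(theta_d, z, w, beta, alpha, eta):
            K, V = beta.shape
            # log p(beta_k | eta): one Dirichlet term per topic
            lp = sum(dirichlet.logpdf(beta[k], np.full(V, eta)) for k in range(K))
            # log p(theta_d | alpha)
            lp += dirichlet.logpdf(theta_d, np.full(K, alpha))
            # log p(z_{d,n} | theta_d) = log theta_{d, z_{d,n}}
            lp += np.log(theta_d[z]).sum()
            # log p(w_{d,n} | z_{d,n}, beta) = log beta_{z_{d,n}, w_{d,n}}
            lp += np.log(beta[z, w]).sum()
            return lp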

  • The Dirichlet distribution

    • The Dirichlet distribution is an exponential family distribution over the simplex, i.e., positive vectors that sum to one:

      p(θ | α) = ( Γ(Σ_i α_i) / ∏_i Γ(α_i) ) ∏_i θ_i^{α_i − 1}

    • The Dirichlet is conjugate to the multinomial. Given a multinomial observation, the posterior distribution of θ is a Dirichlet.

    • The parameter α controls the mean shape and sparsity of θ.

    • The topic proportions are a K-dimensional Dirichlet. The topics are a V-dimensional Dirichlet.

  • The Dirichlet distribution

    Changes of the θ distribution with different α values:

    • Large α values concentrate the distribution near the center of the simplex, while smaller α values push the mass toward the corners.

    • The α values determine the smoothness or sparsity of the θ distributions.

    From Ganegedara, Thushan. 2018. "Intuitive Guide to Latent Dirichlet Allocation"
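    A small sketch (assumed, not from the slides) showing how α controls the sparsity of Dirichlet draws over K = 3 topics:

        import numpy as np

        rng = np.random.default_rng(0)
        for alpha in (0.1, 1.0, 10.0):
            theta = rng.dirichlet(np.full(3, alpha), size=5)
            print(f"alpha={alpha}:")
            print(np.round(theta, 2))
        # small alpha -> draws near the corners (sparse);
        # large alpha -> draws near the uniform center (smooth)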

  • The Dirichlet distribution


  • Latent Dirichlet Allocation

    • From a collection of documents, infer
      1 per-word topic assignments z_{d,n}
      2 per-document topic proportions θ_d
      3 per-corpus topic distributions β_k

    • Approximate posterior inference algorithms
      1 Gibbs sampling
      2 Variational inference


  • Posterior distribution for LDA

    • For now, assume the topics β_{1:K} are fixed. The per-document posterior is

      p(θ, z | w, α, β_{1:K}) = p(θ, z, w | α, β_{1:K}) / p(w | α, β_{1:K}),

      where

      p(w | α, β_{1:K}) = ∫_θ p(θ | α) ∏_{n=1}^{N} Σ_{z_n=1}^{K} p(z_n | θ) p(w_n | z_n, β_{1:K}) dθ

    • The denominator is intractable to compute: the sum over topic assignments is coupled with the integral over θ.

    • We appeal to approximate posterior inference.
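    The marginal likelihood itself can at least be approximated by simple Monte Carlo over θ, which makes the object of inference concrete; a hedged sketch with assumed toy sizes:

        import numpy as np

        rng = np.random.default_rng(0)
        K, V, N = 3, 10, 8
        alpha = np.full(K, 0.5)
        beta = rng.dirichlet(np.full(V, 0.5), size=K)  # fixed topics, K x V
        w = rng.integers(V, size=N)                    # a tiny document

        thetas = rng.dirichlet(alpha, size=100_000)    # theta ~ p(theta | alpha)
        per_word = thetas @ beta[:, w]                 # sum_z theta_z * beta[z, w_n]
        print(per_word.prod(axis=1).mean())            # MC estimate of p(w | alpha, beta)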

  • Gibbs sampling

    • An MCMC algorithm for obtaining a sequence of samples approximately drawn from a specified multivariate probability distribution

    • Define a Markov chain whose stationary distribution is the posterior of interest

    • Collect (approximately) independent samples from that stationary distribution; approximate the posterior with them

    • In Gibbs sampling, the chain is run by iteratively sampling from the conditional distribution of each hidden variable given the observations and the current state of the other hidden variables

  • Gibbs sampling procedure

    Suppose we have the joint probability distribution p(x1, x2, x3) of three random variables.

    1 Initialize X^0 = (x1^0, x2^0, x3^0)
    2 Fix the variables x2^0 and x3^0 of the current sample X^0.
    3 Draw the new value x1^1 to replace x1^0 with probability p(x1^1 | x2^0, x3^0).
    4 Fix the variables x1^1 and x3^0.
    5 Draw the new value x2^1 to replace x2^0 with probability p(x2^1 | x1^1, x3^0).
    6 Fix the variables x1^1 and x2^1.
    7 Draw the new value x3^1 to replace x3^0 with probability p(x3^1 | x1^1, x2^1).
    8 Finally, we obtain the new sample X^1 = (x1^1, x2^1, x3^1).
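    An illustrative Gibbs sampler (an assumed example, not from the slides) for a standard bivariate normal with correlation ρ, where both conditionals are known in closed form:

        # x1 | x2 ~ N(rho * x2, 1 - rho^2),  x2 | x1 ~ N(rho * x1, 1 - rho^2)
        import numpy as np

        rng = np.random.default_rng(0)
        rho, n_iter = 0.8, 5000
        x1 = x2 = 0.0
        samples = []
        for _ in range(n_iter):
            x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))  # fix x2, redraw x1
            x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))  # fix x1, redraw x2
            samples.append((x1, x2))

        print(np.corrcoef(np.array(samples[500:]).T))  # approx. rho after burn-in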

  • Gibbs sampling procedure

    [Figure: visualization of Gibbs sampling]

  • Gibbs sampling for LDA

    • Define n(z_{−i}) to be the vector of topic counts over the assignments other than z_i.

    • A collapsed Gibbs sampler is

      z_i | z_{−i}, w_{1:N} ∼ Multi(π(z_{−i}, w_i)),  where

      π(z_{−i}, w_i) ∝ (α + n(z_{−i})) p(w_i | β_{1:K})

  • Gibbs sampling for LDA

    • The topic proportions θ can be integrated out.

    • A collapsed Gibbs sampler draws from

      p(z_i | z_{−i}, w_{1:N}) ∝ p(w_i | β_{1:K}) ∏_{k=1}^{K} Γ(n_k(z_{−i})),

      where n_k(z_{−i}) is the number of times we have seen topic k in the collection of topic assignments z_{−i}.

    • Integrating out variables leads to a faster-mixing chain.

  • Gibbs sampling for LDA

    • z_i: the topic assigned to the i-th word
    • z_{−i}: the topics assigned to the other words
    • In this example, n(z_{−i}) = (9, 4, 6)
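    A hedged sketch of one collapsed Gibbs sweep implementing the Multi(π) update above, under assumed inputs (fixed topics beta, symmetric α):

        import numpy as np

        def gibbs_sweep(z, w, beta, alpha, rng):
            """Resample each z[i] from p(z_i | z_-i, w) ∝ (alpha + n_k(z_-i)) * beta[k, w[i]]."""
            K = beta.shape[0]
            counts = np.bincount(z, minlength=K)
            for i in range(len(z)):
                counts[z[i]] -= 1                    # remove word i -> n(z_-i)
                probs = (alpha + counts) * beta[:, w[i]]
                z[i] = rng.choice(K, p=probs / probs.sum())
                counts[z[i]] += 1                    # add word i back
            return z

        # Example: one sweep on a toy document with random fixed topics.
        rng = np.random.default_rng(0)
        K, V, N = 3, 10, 12
        beta = rng.dirichlet(np.full(V, 0.5), size=K)
        w = rng.integers(V, size=N)
        z = rng.integers(K, size=N)
        z = gibbs_sweep(z, w, beta, alpha=0.5, rng=rng)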

  • Latent Dirichlet Allocation



  • SCNT recommendation system

    Development of an AI-based recommendation system for curated retailing services at Samsung C&T

    An outfit recommendation system based on users' click histories, with the following correspondence to LDA (a code sketch follows the list):
    • topic : style
    • document : user
    • word frequency : item click frequency
    • words : items
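    A hedged gensim sketch of this mapping (an illustration, not the actual SCNT pipeline; the item IDs are hypothetical), treating each user's clicked items as a document:

        from gensim.corpora import Dictionary
        from gensim.models import LdaModel

        # Hypothetical data: one list of clicked item IDs per user.
        click_histories = [
            ["coat_01", "boots_07", "coat_01", "scarf_03"],
            ["sneakers_12", "tee_05", "cap_09", "sneakers_12"],
        ]

        dictionary = Dictionary(click_histories)                   # item vocabulary
        corpus = [dictionary.doc2bow(h) for h in click_histories]  # bag of items

        lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
                       alpha="auto", random_state=0)

        # Per-user "style" proportions (theta_d) drive the recommendations.
        print(lda.get_document_topics(corpus[0]))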

  • SCNT recommendation system

    • Recommendation process


  • SCNT recommendation system

    • Preprocessing


  • SCNT recommendation system

    • Model assessment


  • SCNT recommendation system

    • Click history 2019.03.01 – 2019.03.10
    • style example

  • SCNT recommendation system

    • Click history 2019.03.01 – 2019.03.10
    • recommendation example

  • SCNT recommendation system

    • Click history 2019.03.01 – 2019.03.10
    • recommendation example

  • SCNT recommendation system

    • Click history 2019.06.01 – 2019.06.10
    • style example

  • SCNT recommendation system

    • Click history 2019.06.01 – 2019.06.10
    • recommendation example

  • SCNT recommendation system

    • Click history 2019.06.01 – 2019.06.10
    • recommendation example

  • Thank you!
