  • Latent Dirichlet Allocation (LDA)

    Seonghwi Kim

    SDS Lab Seminar

    Pohang University of Science and Technology
    Department of Industrial and Management Engineering

    July 15, 2020


  • Outline

    1 Introduction

    2 LDA model

    3 Approximate posterior inference

    4 Application: SCNT recommendation system



  • Topic modeling

    • A statistical model for discovering the abstract topics that occur in a collection of documents.

    • A method for automatically organizing, understanding, searching, and summarizing large collections of documents.

  • Topic modeling

    • Uncovers the hidden topical patterns that pervade the collection of documents (the corpus).

  • Topic modeling

    From a machine learning perspective, topic modeling is a case study in applying hierarchical Bayesian models to grouped data, like documents or images.

    Topic modeling research touches on
    • Directed graphical models
    • Conjugate priors
    • Hierarchical Bayesian methods
    • Fast approximate posterior inference (MCMC, variational methods)
    • ...

    LDA is an example of a topic model.

  • Document embedding

    • Document embedding converts each document to a vector space representation.

    • It enables us to perform several tasks relevant to documents, like
      – calculating similarity between documents
      – document classification

  • Bag of Words Representation

    • A common representation of documents in natural language processing

    • A document is represented as the bag (multiset) of its words, disregarding grammar and word meaning.

    • The order of words in a document is ignored; only the frequency of each word matters.

    • Latent Dirichlet Allocation assumes the bag-of-words representation for documents. (bag-of-words model)

  • Bag of Words Representation

    Here are two simple text documents:

    • (1) John likes to watch movies. Mary likes movies too.
    • (2) Mary also likes to watch football games.

    Based on these two text documents, a word-count list is constructed for each document:

    [John:1, likes:2, to:1, watch:1, movies:2, Mary:1, too:1]
    [Mary:1, also:1, likes:1, to:1, watch:1, football:1, games:1]

    and a union list of the words in both:

    [John, likes, to, watch, movies, Mary, too, also, football, games]

    Then the count vectors for documents (1) and (2) are:

    • (1) [1, 2, 1, 1, 2, 1, 1, 0, 0, 0]
    • (2) [0, 1, 1, 1, 0, 1, 0, 1, 1, 1]
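    A minimal Python sketch (an illustration, not from the slides) that reproduces the two count vectors above; the naive tokenizer and the hard-coded union list are assumptions:

        # Build bag-of-words vectors against a fixed vocabulary.
        from collections import Counter

        docs = [
            "John likes to watch movies. Mary likes movies too.",
            "Mary also likes to watch football games.",
        ]
        vocab = ["John", "likes", "to", "watch", "movies",
                 "Mary", "too", "also", "football", "games"]

        for doc in docs:
            counts = Counter(doc.replace(".", "").split())
            print([counts[w] for w in vocab])
        # -> [1, 2, 1, 1, 2, 1, 1, 0, 0, 0]
        # -> [0, 1, 1, 1, 0, 1, 0, 1, 1, 1]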


  • Generative model

    • Each document is a random mixture of topics
    • Each word is drawn from one of those topics

  • The posterior distribution

    • In reality, we only observe the documents
    • Our goal is to infer the underlying topic structure

  • Graphical models

    • Nodes are random variables
    • Edges denote possible dependence
    • Observed variables are shaded
    • Plates denote replicated structure

  • Graphical models

    • The structure of the graph represents the relationships between random variables.

    • E.g., a graph in which a node y points to replicated nodes x_1, ..., x_N corresponds to

      p(y, x_1, ..., x_N) = p(y) ∏_{n=1}^{N} p(x_n | y)

  • Latent Dirichlet Allocation

    β_k ∼ Dirichlet(η),  k ∈ {1, 2, ..., K}
    θ_d ∼ Dirichlet(α),  d ∈ {1, 2, ..., D}

    z_{d,n} ∼ Multi(θ_d),  d ∈ {1, 2, ..., D}, n ∈ {1, 2, ..., N}
    w_{d,n} ∼ Multi(β_{z_{d,n}}),  d ∈ {1, 2, ..., D}, n ∈ {1, 2, ..., N}
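    A short numpy sketch of this generative process (the toy sizes and symmetric hyperparameters are assumptions, not from the slides):

        import numpy as np

        rng = np.random.default_rng(0)
        K, D, N, V = 3, 5, 20, 10   # topics, documents, words per doc, vocabulary
        eta, alpha = 0.5, 0.5       # symmetric hyperparameters

        beta = rng.dirichlet(np.full(V, eta), size=K)     # topics beta_k: K x V
        theta = rng.dirichlet(np.full(K, alpha), size=D)  # proportions theta_d: D x K

        docs = []
        for d in range(D):
            z = rng.choice(K, size=N, p=theta[d])                # z_{d,n} ~ Multi(theta_d)
            w = np.array([rng.choice(V, p=beta[k]) for k in z])  # w_{d,n} ~ Multi(beta_{z_{d,n}})
            docs.append((z, w))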

  • Latent Dirichlet Allocation

    16 / 42

  • Latent Dirichlet Allocation

    Here is the complete joint probability distribution of LDA:

    p(θ, z, w, β | α, η) = ∏_{k=1}^{K} p(β_k | η) ∏_{d=1}^{D} p(θ_d | α) ∏_{n=1}^{N} p(z_{d,n} | θ_d) p(w_{d,n} | β_{z_{d,n}}),

    where

    p(z_{d,n} | θ_d) = θ_{d, z_{d,n}},   p(w_{d,n} | z_{d,n}, β) = β_{z_{d,n}, w_{d,n}}
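    This factorization can be evaluated directly. Below is a hedged sketch of the per-document log joint, reusing beta, theta, z, and w from the generative sketch above; scipy's dirichlet.logpdf supplies the Dirichlet terms:

        import numpy as np
        from scipy.stats import dirichlet

        def log_joint_doc(theta_d, z, w, beta, alpha, eta):
            K, V = beta.shape
            # log p(beta_k | eta): one Dirichlet term per topic
            lp = sum(dirichlet.logpdf(beta[k], np.full(V, eta)) for k in range(K))
            # log p(theta_d | alpha)
            lp += dirichlet.logpdf(theta_d, np.full(K, alpha))
            # log p(z_{d,n} | theta_d) = log theta_{d, z_{d,n}}
            lp += np.log(theta_d[z]).sum()
            # log p(w_{d,n} | z_{d,n}, beta) = log beta_{z_{d,n}, w_{d,n}}
            lp += np.log(beta[z, w]).sum()
            return lp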

  • The Dirichlet distribution

    • The Dirichlet distribution is an exponential family distribution over the simplex, i.e., positive vectors that sum to one:

      p(θ | α) = ( Γ(Σ_i α_i) / ∏_i Γ(α_i) ) ∏_i θ_i^{α_i − 1}

    • The Dirichlet is conjugate to the multinomial. Given a multinomial observation, the posterior distribution of θ is a Dirichlet.

    • The parameter α controls the mean shape and sparsity of θ.

    • The topic proportions are a K-dimensional Dirichlet. The topics are a V-dimensional Dirichlet.

  • The Dirichlet distribution

    Changes of the θ distribution with different α values:

    • Large α values concentrate the distribution near the center of the simplex, while smaller α values push the mass toward the corners.

    • The α values determine the smoothness or sparsity of the θ distributions.

    From Ganegedara, Thushan. 2018. "Intuitive Guide to Latent Dirichlet Allocation"
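    A small sketch (assumed, not from the slides) showing how α controls the sparsity of Dirichlet draws over K = 3 topics:

        import numpy as np

        rng = np.random.default_rng(0)
        for alpha in (0.1, 1.0, 10.0):
            theta = rng.dirichlet(np.full(3, alpha), size=5)
            print(f"alpha={alpha}:")
            print(np.round(theta, 2))
        # small alpha -> draws near the corners (sparse);
        # large alpha -> draws near the uniform center (smooth)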

  • The Dirichlet distribution


  • Latent Dirichlet Allocation

    • From a collection of documents, infer
      1 per-word topic assignments z_{d,n}
      2 per-document topic proportions θ_d
      3 per-corpus topic distributions β_k

    • Approximate posterior inference algorithms
      1 Gibbs sampling
      2 Variational inference


  • Posterior distribution for LDA

    • For now, assume the topics β_{1:K} are fixed. The per-document posterior is

      p(θ, z | w, α, β_{1:K}) = p(θ, z, w | α, β_{1:K}) / p(w | α, β_{1:K}),

      where

      p(w | α, β_{1:K}) = ∫_θ p(θ | α) ∏_{n=1}^{N} Σ_{z_n=1}^{K} p(z_n | θ) p(w_n | z_n, β_{1:K}) dθ

    • The denominator is intractable to compute: the sum over topic assignments is coupled with the integral over θ.

    • We appeal to approximate posterior inference.
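    The marginal likelihood itself can at least be approximated by simple Monte Carlo over θ, which makes the object of inference concrete; a hedged sketch with assumed toy sizes:

        import numpy as np

        rng = np.random.default_rng(0)
        K, V, N = 3, 10, 8
        alpha = np.full(K, 0.5)
        beta = rng.dirichlet(np.full(V, 0.5), size=K)  # fixed topics, K x V
        w = rng.integers(V, size=N)                    # a tiny document

        thetas = rng.dirichlet(alpha, size=100_000)    # theta ~ p(theta | alpha)
        per_word = thetas @ beta[:, w]                 # sum_z theta_z * beta[z, w_n]
        print(per_word.prod(axis=1).mean())            # MC estimate of p(w | alpha, beta)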

  • Gibbs sampling

    • An MCMC algorithm for obtaining a sequence of samples approximately drawn from a specified multivariate probability distribution

    • Define a Markov chain whose stationary distribution is the posterior of interest

    • Collect (approximately) independent samples from that stationary distribution; approximate the posterior with them

    • In Gibbs sampling, the chain is run by iteratively sampling from the conditional distribution of each hidden variable given the observations and the current state of the other hidden variables

  • Gibbs sampling procedure

    Suppose we have the joint probability distribution p(x1, x2, x3) of three random variables.

    1 Initialize X^0 = (x1^0, x2^0, x3^0)
    2 Fix the variables x2^0 and x3^0 of the current sample X^0.
    3 Draw the new value x1^1 to replace x1^0 with probability p(x1^1 | x2^0, x3^0).
    4 Fix the variables x1^1 and x3^0.
    5 Draw the new value x2^1 to replace x2^0 with probability p(x2^1 | x1^1, x3^0).
    6 Fix the variables x1^1 and x2^1.
    7 Draw the new value x3^1 to replace x3^0 with probability p(x3^1 | x1^1, x2^1).
    8 Finally, we obtain the new sample X^1 = (x1^1, x2^1, x3^1).
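    An illustrative Gibbs sampler (an assumed example, not from the slides) for a standard bivariate normal with correlation ρ, where both conditionals are known in closed form:

        # x1 | x2 ~ N(rho * x2, 1 - rho^2),  x2 | x1 ~ N(rho * x1, 1 - rho^2)
        import numpy as np

        rng = np.random.default_rng(0)
        rho, n_iter = 0.8, 5000
        x1 = x2 = 0.0
        samples = []
        for _ in range(n_iter):
            x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))  # fix x2, redraw x1
            x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))  # fix x1, redraw x2
            samples.append((x1, x2))

        print(np.corrcoef(np.array(samples[500:]).T))  # approx. rho after burn-in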

  • Gibbs sampling procedure

    [Figure: visualization of Gibbs sampling]

  • Gibbs sampling for LDA

    • Define n(z_{−i}) to be the vector of topic counts over the assignments other than z_i.

    • A collapsed Gibbs sampler is

      z_i | z_{−i}, w_{1:N} ∼ Multi(π(z_{−i}, w_i)),  where

      π(z_{−i}, w_i) ∝ (α + n(z_{−i})) p(w_i | β_{1:K})

  • Gibbs sampling for LDA

    • The topic proportions θ can be integrated out.

    • A collapsed Gibbs sampler draws from

      p(z_i | z_{−i}, w_{1:N}) ∝ p(w_i | β_{1:K}) ∏_{k=1}^{K} Γ(n_k(z_{−i})),

      where n_k(z_{−i}) is the number of times we have seen topic k in the collection of topic assignments z_{−i}.

    • Integrating out variables leads to a faster-mixing chain.

  • Gibbs sampling for LDA

    • z_i: the topic assigned to the i-th word
    • z_{−i}: the topics assigned to the other words
    • In this example, n(z_{−i}) = (9, 4, 6)
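    A hedged sketch of one collapsed Gibbs sweep implementing the Multi(π) update above, under assumed inputs (fixed topics beta, symmetric α):

        import numpy as np

        def gibbs_sweep(z, w, beta, alpha, rng):
            """Resample each z[i] from p(z_i | z_-i, w) ∝ (alpha + n_k(z_-i)) * beta[k, w[i]]."""
            K = beta.shape[0]
            counts = np.bincount(z, minlength=K)
            for i in range(len(z)):
                counts[z[i]] -= 1                    # remove word i -> n(z_-i)
                probs = (alpha + counts) * beta[:, w[i]]
                z[i] = rng.choice(K, p=probs / probs.sum())
                counts[z[i]] += 1                    # add word i back
            return z

        # Example: one sweep on a toy document with random fixed topics.
        rng = np.random.default_rng(0)
        K, V, N = 3, 10, 12
        beta = rng.dirichlet(np.full(V, 0.5), size=K)
        w = rng.integers(V, size=N)
        z = rng.integers(K, size=N)
        z = gibbs_sweep(z, w, beta, alpha=0.5, rng=rng)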

  • Latent Dirichlet Allocation



  • SCNT recommendation system

    Development of an AI-based recommendation system for curated retailing services at Samsung C&T

    An outfit recommendation system based on users' click histories, with the following correspondence to LDA (a code sketch follows the list):
    • topic : style
    • document : user
    • word frequency : item click frequency
    • words : items
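    A hedged gensim sketch of this mapping (an illustration, not the actual SCNT pipeline; the item IDs are hypothetical), treating each user's clicked items as a document:

        from gensim.corpora import Dictionary
        from gensim.models import LdaModel

        # Hypothetical data: one list of clicked item IDs per user.
        click_histories = [
            ["coat_01", "boots_07", "coat_01", "scarf_03"],
            ["sneakers_12", "tee_05", "cap_09", "sneakers_12"],
        ]

        dictionary = Dictionary(click_histories)                   # item vocabulary
        corpus = [dictionary.doc2bow(h) for h in click_histories]  # bag of items

        lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
                       alpha="auto", random_state=0)

        # Per-user "style" proportions (theta_d) drive the recommendations.
        print(lda.get_document_topics(corpus[0]))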

  • SCNT recommendation system

    • Recommendation process


  • SCNT recommendation system

    • Preprocessing


  • SCNT recommendation system

    • Model assessment


  • SCNT recommendation system

    • Click history 2019.03.01 – 2019.03.10
    • style example

  • SCNT recommendation system

    • Click history 2019.03.01 – 2019.03.10
    • recommendation example

  • SCNT recommendation system

    • Click history 2019.03.01 – 2019.03.10
    • recommendation example

  • SCNT recommendation system

    • Click history 2019.06.01 – 2019.06.10
    • style example

  • SCNT recommendation system

    • Click history 2019.06.01 – 2019.06.10
    • recommendation example

  • SCNT recommendation system

    • Click history 2019.06.01 – 2019.06.10
    • recommendation example

  • Thank you!
