LATENT DIRICHLET ALLOCATION. Outline Introduction Model Description Inference and Parameter...


LATENT DIRICHLET ALLOCATION


Outline
• Introduction
• Model Description
• Inference and Parameter Estimation
• Example
• Reference


Introduction

As more information becomes available, it becomes more difficult to access what we are looking for. We need new tools to help us organize, search, and understand these vast amounts of information.


Introduction

Topic modeling provides methods for automatically organizing, understanding, searching, and summarizing large electronic archives.

• Uncover the hidden topical patterns that pervade the collection.
• Annotate the documents according to those topics.
• Use the annotations to organize, summarize, and search the texts.


Intuition behind LDA

GOAL


Notation and Assumption
• We have a set of M documents, $D = \{d_1, \dots, d_M\}$, constituting a corpus.
• Each document is a collection of words, or a "bag of words": word order is ignored (exchangeability).
• After elimination of stop words, the corpus contains V distinct words, indexed $1, \dots, V$, and involves K topics with word distributions $\beta_1, \dots, \beta_K$.
• Each document is composed of N "important" or "effective" words, $w_1, \dots, w_N$, with topic proportions $\theta$.
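The bag-of-words assumption can be made concrete with a minimal Python sketch. The toy corpus, the stop-word list, and the helper name `bag_of_words` are illustrative assumptions and do not come from the slides.

```python
from collections import Counter

# Hypothetical stop-word list and toy corpus, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "on"}

def bag_of_words(text, stop_words=STOP_WORDS):
    """Return word counts for one document, ignoring word order and stop words."""
    tokens = [t.strip(".,;:!?") for t in text.lower().split()]
    return Counter(t for t in tokens if t and t not in stop_words)

corpus = [
    "The cat sat on the mat.",
    "The mat sat on the cat.",  # same words in a different order
    "Topics are distributions of words.",
]

bags = [bag_of_words(doc) for doc in corpus]

# Vocabulary: the V distinct words left after stop-word removal.
vocabulary = sorted(set().union(*bags))
V = len(vocabulary)  # vocabulary size
M = len(corpus)      # number of documents
```

Because only counts are kept, the first two documents get identical representations even though their word order differs, which is what exchangeability means here.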


[Figure: index ranges used in the model. Topics: 1..K. Word positions within a document: 1..Nd. Vocabulary word indices: 1..V. Documents: 1..M.]


Model Definition
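As a hedged sketch, the generative process in the standard formulation of Blei, Ng, and Jordan (2003) can be written in a few lines of Python: for each document, draw topic proportions \theta from a Dirichlet; then, for each word position, draw a topic from \theta and a word from that topic's word distribution. The toy vocabulary, the values of `alpha` and `beta`, the fixed document length, and the function names are all assumptions made for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, alpha, beta, vocabulary, rng):
    """Generate one document under the assumed LDA generative process."""
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    topics, words = [], []
    for _ in range(n_words):
        # Pick a topic for this word position according to theta.
        z = rng.choices(range(len(alpha)), weights=theta)[0]
        # Pick a word from the chosen topic's distribution over the vocabulary.
        w = rng.choices(vocabulary, weights=beta[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words

# Toy setup (assumed): V = 6 words, K = 2 topics.
vocabulary = ["gene", "dna", "cell", "ball", "team", "score"]
beta = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # topic 0: biology-like words
    [0.0, 0.0, 0.0, 0.3, 0.4, 0.3],  # topic 1: sports-like words
]
alpha = [0.5, 0.5]      # symmetric Dirichlet prior over topics
rng = random.Random(0)  # fixed seed, chosen arbitrarily

theta, topics, words = generate_document(8, alpha, beta, vocabulary, rng)
```

The document length is passed in as a fixed argument to keep the sketch short; the original paper draws it from a Poisson distribution.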


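Dirichlet and Multinomial Distribution
• The Dirichlet is a distribution over distributions: it describes the parameters of another distribution, e.g. the multinomial.
• Multinomial: $p(x_1, \dots, x_k \mid n, \theta) = \frac{n!}{x_1! \cdots x_k!} \prod_{i=1}^{k} \theta_i^{x_i}$, where $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} \theta_i = 1$.
• Dirichlet: $p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{i=1}^{k} \theta_i^{\alpha_i - 1}$, where the variable \theta can take values in the (k-1)-simplex.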
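The relationship between the two distributions can be illustrated with a short Python sketch: a Dirichlet draw produces a probability vector \theta on the simplex, and that vector then parameterizes a multinomial. The sketch uses only the standard library and samples the Dirichlet by normalizing Gamma draws; the values of `alpha`, `n`, and the seed are arbitrary assumptions.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_multinomial(n, theta, rng):
    """Draw counts x ~ Multinomial(n, theta) as n independent categorical draws."""
    counts = [0] * len(theta)
    for i in rng.choices(range(len(theta)), weights=theta, k=n):
        counts[i] += 1
    return counts

rng = random.Random(0)   # fixed seed, chosen arbitrarily
alpha = [0.5, 0.5, 0.5]  # assumed symmetric prior over k = 3 categories
theta = sample_dirichlet(alpha, rng)    # a point on the 2-simplex
x = sample_multinomial(20, theta, rng)  # counts that sum to n = 20
```

With all `alpha` values below 1, draws of \theta tend to concentrate mass on a few categories; values above 1 pull \theta toward the uniform vector.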


Dirichlet and Multinomial Distribution


Properties


LSA & LDA


Reference
• D. M. Blei, A. Y. Ng, M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003.
• G. Anthes. Topic Models Vs. Unstructured Data. Communications of the ACM, 2010.
• M. Steyvers, T. Griffiths. Probabilistic Topic Models. Handbook of Latent Semantic Analysis, 2007.
• P. Resnik, E. Hardisty. Gibbs Sampling for the Uninitiated. 2010.