Transcript of slides: CS145: INTRODUCTION TO DATA MINING, Text Data: Topic Model
Source: web.cs.ucla.edu/~yzsun/classes/2017Fall_CS145/Slides/17Text_TopicModel.pdf

Page 1:

CS145: INTRODUCTION TO DATA MINING

Instructor: Yizhou Sun (yzsun@cs.ucla.edu)

December 4, 2017

Text Data: Topic Model

Page 2:

Methods to be Learnt

Task                    | Vector Data                                              | Set Data           | Sequence Data   | Text Data
Classification          | Logistic Regression; Decision Tree; KNN; SVM; NN         |                    |                 | Naïve Bayes for Text
Clustering              | K-means; hierarchical clustering; DBSCAN; Mixture Models |                    |                 | pLSA
Prediction              | Linear Regression; GLM*                                  |                    |                 |
Frequent Pattern Mining |                                                          | Apriori; FP growth | GSP; PrefixSpan |
Similarity Search       |                                                          |                    | DTW             |

Page 3:

Text Data: Topic Models

• Text Data and Topic Models
• Revisit of Mixture Model
• Probabilistic Latent Semantic Analysis (pLSA)
• Summary

Page 4:

Text Data

• Word/term
• Document
  • A sequence of words
• Corpus
  • A collection of documents

Page 5:

Represent a Document

• Most common way: Bag-of-Words
  • Ignore the order of words
  • Keep the count

[Figure: term-document count matrix for documents c1–c5 and m1–m4 (the vector space model)]
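As a minimal sketch of the bag-of-words idea (the example document is made up for illustration), Python's `collections.Counter` discards word order and keeps only counts:

```python
from collections import Counter

# A document is a sequence of words; bag-of-words keeps only the counts.
doc = "data mining finds patterns in data and mining finds more data"
bow = Counter(doc.split())  # word -> count; word order is discarded

print(bow["data"])   # -> 3
print(bow["mining"]) # -> 2
```

Two documents with the same counts but different word order get the same representation.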

Page 6:

Topics

• Topic
  • A topic is represented by a word distribution
  • Relates to an issue

Page 7:

Topic Models

• Topic modeling
  • Get topics automatically from a corpus
  • Assign documents to topics automatically
• Most frequently used topic models
  • pLSA
  • LDA

Page 8:

Text Data: Topic Models

• Text Data and Topic Models
• Revisit of Mixture Model
• Probabilistic Latent Semantic Analysis (pLSA)
• Summary

Page 9:

Mixture Model-Based Clustering

• A set C of k probabilistic clusters $C_1, \dots, C_k$
  • Probability density/mass functions: $f_1, \dots, f_k$
  • Cluster prior probabilities: $w_1, \dots, w_k$, with $\sum_j w_j = 1$
• Joint probability of an object i and its cluster $C_j$:
  • $P(x_i, z_i = C_j) = w_j f_j(x_i)$
  • $z_i$: hidden random variable
• Probability of i:
  • $P(x_i) = \sum_j w_j f_j(x_i)$

[Figure: two component density curves $f_1(x)$ and $f_2(x)$]

Page 10:

Maximum Likelihood Estimation

• Since objects are assumed to be generated independently, for a data set D = {x_1, ..., x_n} we have

  $P(D) = \prod_i P(x_i) = \prod_i \sum_j w_j f_j(x_i)$

  $\Rightarrow \log P(D) = \sum_i \log P(x_i) = \sum_i \log \sum_j w_j f_j(x_i)$

• Task: Find a set C of k probabilistic clusters s.t. P(D) is maximized
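The log-likelihood above can be sketched numerically for a 1-D Gaussian mixture; the function name and test values here are illustrative, not from the slides:

```python
import math

def mixture_loglik(xs, ws, mus, sigmas):
    """log P(D) = sum_i log( sum_j w_j * N(x_i; mu_j, sigma_j^2) )."""
    ll = 0.0
    for x in xs:
        # Mixture density P(x_i) = sum_j w_j f_j(x_i)
        p = sum(w * math.exp(-(x - m) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
                for w, m, s in zip(ws, mus, sigmas))
        ll += math.log(p)
    return ll

# With a single component (w = 1), this reduces to the ordinary
# Gaussian log-likelihood, e.g. log N(0; 0, 1) = -0.5 * log(2*pi).
```

Maximizing this over $w_j, \mu_j, \sigma_j$ is what the EM algorithm later in the deck does.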

Page 11:

Gaussian Mixture Model

• Generative model
  • For each object:
    • Pick its cluster, i.e., a distribution component: $Z \sim Multinoulli(w_1, \dots, w_k)$
    • Sample a value from the selected distribution: $X|Z \sim N(\mu_Z, \sigma_Z^2)$
• Overall likelihood function
  • $L(D|\theta) = \prod_i \sum_j w_j p(x_i|\mu_j, \sigma_j^2)$
    s.t. $\sum_j w_j = 1$ and $w_j \ge 0$
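The two-step generative process above can be sketched with the standard library; the parameter values in the comment are my own example, not from the slides:

```python
import random

def sample_gmm(ws, mus, sigmas, n, seed=0):
    """Draw n points from a Gaussian mixture:
    Z ~ Multinoulli(ws), then X | Z ~ N(mus[Z], sigmas[Z]^2)."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        z = rng.choices(range(len(ws)), weights=ws)[0]  # pick the cluster
        xs.append(rng.gauss(mus[z], sigmas[z]))         # sample from that component
    return xs

# e.g. sample_gmm([0.5, 0.5], [-5.0, 5.0], [1.0, 1.0], 1000)
# gives a bimodal sample with overall mean near 0.
```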

Page 12:

Multinomial Mixture Model

• For documents with bag-of-words representation
  • $x_d = (x_{d1}, x_{d2}, \dots, x_{dN})$, where $x_{dn}$ is the count of the nth word in the vocabulary
• Generative model
  • For each document:
    • Sample its cluster label $z \sim Multinoulli(\pi)$
      • $\pi = (\pi_1, \pi_2, \dots, \pi_K)$, where $\pi_k$ is the proportion of the kth cluster
      • $p(z = k) = \pi_k$
    • Sample its word vector $x_d \sim multinomial(\beta_z)$
      • $\beta_z = (\beta_{z1}, \beta_{z2}, \dots, \beta_{zN})$, where $\beta_{zn}$ is the parameter associated with the nth word in the vocabulary
      • $p(x_d|z = k) = \frac{(\sum_n x_{dn})!}{\prod_n x_{dn}!} \prod_n \beta_{kn}^{x_{dn}} \propto \prod_n \beta_{kn}^{x_{dn}}$
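The multinomial pmf $p(x_d|z=k)$ above can be evaluated directly; a small sketch with made-up test values:

```python
from math import factorial, prod

def multinomial_pmf(x, beta):
    """p(x | z=k) = (sum_n x_n)! / prod_n x_n! * prod_n beta_n^{x_n}."""
    coef = factorial(sum(x))
    for c in x:
        coef //= factorial(c)
    return coef * prod(b ** c for b, c in zip(beta, x))

# e.g. counts (2, 1) under beta = (0.5, 0.5): 3 * 0.5^3 = 0.375
```

The multinomial coefficient does not depend on $\beta$, which is why the slide drops it with "$\propto$".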

Page 13:

Likelihood Function

• For a set of M documents:

  $L = \prod_d p(x_d) = \prod_d \sum_k p(x_d, z = k)$
  $= \prod_d \sum_k p(x_d|z = k)\, p(z = k)$
  $\propto \prod_d \sum_k p(z = k) \prod_n \beta_{kn}^{x_{dn}}$

Page 14:

Mixture of Unigrams

• For documents represented by a sequence of words
  • $w_d = (w_{d1}, w_{d2}, \dots, w_{dN_d})$, where $N_d$ is the length of document d and $w_{dn}$ is the word at the nth position of the document
• Generative model
  • For each document:
    • Sample its cluster label $z \sim Multinoulli(\pi)$
      • $\pi = (\pi_1, \pi_2, \dots, \pi_K)$, where $\pi_k$ is the proportion of the kth cluster
      • $p(z = k) = \pi_k$
    • For each word in the sequence:
      • Sample the word $w_{dn} \sim Multinoulli(\beta_z)$
      • $p(w_{dn}|z = k) = \beta_{k w_{dn}}$
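The process above, one topic drawn per document, can be sketched as follows; the vocabulary and parameters are made-up:

```python
import random

def sample_unigram_doc(pi, beta, vocab, length, seed=0):
    """Mixture of unigrams: draw ONE topic z for the whole document,
    then draw every word from that topic's distribution beta[z]."""
    rng = random.Random(seed)
    z = rng.choices(range(len(pi)), weights=pi)[0]   # z ~ Multinoulli(pi)
    return [rng.choices(vocab, weights=beta[z])[0]   # w_dn ~ Multinoulli(beta_z)
            for _ in range(length)]

# With degenerate parameters pi = [1, 0] and beta[0] = [1, 0],
# every word of the document is vocab[0].
```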

Page 15:

Likelihood Function

• For a set of M documents:

  $L = \prod_d p(w_d) = \prod_d \sum_k p(w_d, z = k)$
  $= \prod_d \sum_k p(w_d|z = k)\, p(z = k)$
  $= \prod_d \sum_k p(z = k) \prod_n \beta_{k w_{dn}}$

Page 16:

Question

• Are the multinomial mixture model and the mixture of unigrams model equivalent? Why?

Page 17:

Text Data: Topic Models

• Text Data and Topic Models
• Revisit of Mixture Model
• Probabilistic Latent Semantic Analysis (pLSA)
• Summary

Page 18:

Notations

• Word, document, topic
  • $w, d, z$
• Word count in document
  • $c(w, d)$
• Word distribution for each topic ($\beta_z$)
  • $\beta_{zw}$: $p(w|z)$
• Topic distribution for each document ($\theta_d$)
  • $\theta_{dz}$: $p(z|d)$ (Yes, soft clustering)

Page 19:

Issues of Mixture of Unigrams

• All the words in the same document are sampled from the same topic
• In practice, people switch topics during their writing

Page 20:

Illustration of pLSA

Page 21:

Generative Model for pLSA

• Describe how a document is generated probabilistically
  • For each position in d, $n = 1, \dots, N_d$:
    • Generate the topic for the position as $z_n \sim Multinoulli(\theta_d)$, i.e., $p(z_n = k) = \theta_{dk}$
      (Note: 1-trial multinomial, i.e., categorical distribution)
    • Generate the word for the position as $w_n \sim Multinoulli(\beta_{z_n})$, i.e., $p(w_n = w) = \beta_{z_n w}$
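Unlike the mixture of unigrams, pLSA draws a fresh topic at every position; a hedged sketch with made-up parameters:

```python
import random

def sample_plsa_doc(theta_d, beta, vocab, length, seed=0):
    """pLSA generative process: at each position, z_n ~ Multinoulli(theta_d),
    then w_n ~ Multinoulli(beta[z_n]); topics can change within one document."""
    rng = random.Random(seed)
    words = []
    for _ in range(length):
        z = rng.choices(range(len(theta_d)), weights=theta_d)[0]
        words.append(rng.choices(vocab, weights=beta[z])[0])
    return words
```

With a mixed $\theta_d$ such as (0.5, 0.5) and topic-specific word distributions, a single document contains words from both topics, which the mixture of unigrams cannot produce.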

Page 22:

Graphical Model

Note: Sometimes, people add parameters such as $\theta$ and $\beta$ into the graphical model

Page 23:

The Likelihood Function for a Corpus

• Probability of a word:

  $p(w|d) = \sum_k p(w, z = k|d) = \sum_k p(w|z = k)\, p(z = k|d) = \sum_k \beta_{kw}\theta_{dk}$

• Likelihood of a corpus:

  $L = \prod_d \pi_d \prod_w p(w|d)^{c(w,d)}$, where $\pi_d$ is usually considered uniform, i.e., $1/M$

Page 24:

Re-arrange the Likelihood Function

• Group the same word from different positions together:

  $\max \log L = \sum_{d,w} c(w, d) \log \sum_z \theta_{dz} \beta_{zw}$

  s.t. $\sum_z \theta_{dz} = 1$ and $\sum_w \beta_{zw} = 1$
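This objective is easy to evaluate directly from the word counts; a small sketch (function name and toy values are illustrative):

```python
import math

def plsa_loglik(counts, theta, beta):
    """log L = sum_{d,w} c(w,d) * log( sum_z theta[d][z] * beta[z][w] )."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, c in enumerate(row):
            if c > 0:
                ll += c * math.log(sum(theta[d][z] * beta[z][w]
                                       for z in range(len(beta))))
    return ll

# One document, counts (1, 1), theta = (0.5, 0.5), and one topic per word:
# each word has probability 0.5, so log L = 2 * log(0.5).
```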

Page 25:

Optimization: EM Algorithm

• Repeat until convergence:
  • E-step: for each word in each document, calculate its conditional probability of belonging to each topic:

    $p(z|w, d) \propto p(w|z, d)\, p(z|d) = \beta_{zw}\theta_{dz}$
    (i.e., $p(z|w, d) = \frac{\beta_{zw}\theta_{dz}}{\sum_{z'} \beta_{z'w}\theta_{dz'}}$)

  • M-step: given the conditional distribution, find the parameters that maximize the expected likelihood:

    $\beta_{zw} \propto \sum_d p(z|w, d)\, c(w, d)$  (i.e., $\beta_{zw} = \frac{\sum_d p(z|w,d)\, c(w,d)}{\sum_{w',d} p(z|w',d)\, c(w',d)}$)

    $\theta_{dz} \propto \sum_w p(z|w, d)\, c(w, d)$  (i.e., $\theta_{dz} = \frac{\sum_w p(z|w,d)\, c(w,d)}{N_d}$)
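The two steps above can be sketched compactly in NumPy; this is a minimal illustration with random initialization, not the course's reference code:

```python
import numpy as np

def plsa_em(counts, K, n_iter=100, seed=0):
    """counts: (D, W) array with counts[d, w] = c(w, d).
    Returns theta (D, K) with rows p(z|d) and beta (K, W) with rows p(w|z)."""
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    theta = rng.random((D, K)); theta /= theta.sum(1, keepdims=True)
    beta = rng.random((K, W));  beta /= beta.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: p(z|w,d) proportional to beta_zw * theta_dz, shape (D, W, K)
        post = theta[:, None, :] * beta.T[None, :, :]
        post /= post.sum(axis=2, keepdims=True)
        # M-step: re-estimate from expected counts c(w,d) * p(z|w,d)
        expected = counts[:, :, None] * post
        beta = expected.sum(axis=0).T
        beta /= beta.sum(axis=1, keepdims=True)
        theta = expected.sum(axis=1)
        theta /= theta.sum(axis=1, keepdims=True)
    return theta, beta
```

Each iteration keeps every row of `theta` and `beta` a proper probability distribution, and the log-likelihood from the previous slide is non-decreasing across iterations.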

Page 26:

Example

• Two documents, two topics
• Vocabulary: {data, mining, frequent, pattern, web, information, retrieval}
• At some iteration of EM algorithm, E-step

[Figure: table of E-step posteriors $p(z|w,d)$ for each word in each document]

Page 27:

Example (Continued)

• M-step

  $\beta_{11} = \frac{0.8 \times 5 + 0.5 \times 2}{11.8 + 5.8} = 5/17.6$

  $\beta_{12} = \frac{0.8 \times 4 + 0.5 \times 3}{11.8 + 5.8} = 4.7/17.6$

  $\beta_{13} = 3/17.6$, $\beta_{14} = 1.6/17.6$, $\beta_{15} = 1.3/17.6$, $\beta_{16} = 1.2/17.6$, $\beta_{17} = 0.8/17.6$

  $\theta_{11} = 11.8/17$

  $\theta_{12} = 5.2/17$
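The arithmetic on this slide checks out; reading the M-step formula, 0.8 and 0.5 are the E-step posteriors $p(z{=}1|w_1,d)$ for the first word in the two documents, and 5 and 2 its counts (my interpretation of the unlabeled numbers):

```python
# beta_11 and beta_12 numerators: posterior-weighted counts for topic 1
num_b11 = 0.8 * 5 + 0.5 * 2   # = 5.0
num_b12 = 0.8 * 4 + 0.5 * 3   # = 4.7
den = 11.8 + 5.8              # total expected count assigned to topic 1 = 17.6

# The seven beta_1w numerators on the slide sum to the full denominator,
# so the topic-1 word distribution is properly normalized:
numerators = [5.0, 4.7, 3.0, 1.6, 1.3, 1.2, 0.8]
assert abs(sum(numerators) - den) < 1e-9

# theta_1z: document 1 has 17 words, split 11.8 + 5.2 = 17 between the topics
assert abs(11.8 + 5.2 - 17.0) < 1e-9
```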

Page 28:

Text Data: Topic Models

• Text Data and Topic Models
• Revisit of Mixture Model
• Probabilistic Latent Semantic Analysis (pLSA)
• Summary

Page 29:

Summary

• Basic Concepts
  • Word/term, document, corpus, topic
• Mixture of unigrams
• pLSA
  • Generative model
  • Likelihood function
  • EM algorithm

Page 30:

Quiz

• Q1: Is Multinomial Naïve Bayes a linear classifier?
• Q2: In pLSA, for the same word in different positions in a document, do they have the same conditional probability $p(z|w, d)$?