Transcript of CS145: INTRODUCTION TO DATA MINING
web.cs.ucla.edu/~yzsun/classes/2018Fall_CS145/Slides/16...

CS145: INTRODUCTION TO DATA MINING

Instructor: Yizhou Sun [email protected]

December 3, 2018

Text Data: Topic Model


Methods to be Learnt

2

•Classification: Logistic Regression; Decision Tree; KNN; SVM; NN (vector data); Naïve Bayes for Text (text data)

•Clustering: K-means; hierarchical clustering; DBSCAN; Mixture Models (vector data); pLSA (text data)

•Prediction: Linear Regression; GLM* (vector data)

•Frequent Pattern Mining: Apriori; FP growth (set data); GSP; PrefixSpan (sequence data)

•Similarity Search: DTW (sequence data)


Text Data: Topic Models

•Text Data and Topic Models

•Revisit of Mixture Model

•Probabilistic Latent Semantic Analysis (pLSA)

•Summary

3


Text Data

•Word/term

•Document
• A sequence of words

•Corpus
• A collection of documents

4


Represent a Document

•Most common way: Bag-of-Words
• Ignore the order of words
• Keep the count

5

[Figure: example document-term count matrix over documents c1-c5 and m1-m4]

Vector space model
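To make the bag-of-words representation concrete, here is a minimal sketch (not from the slides) that turns a toy two-document corpus into count vectors over a shared vocabulary; the example documents and the whitespace tokenizer are illustrative assumptions.

```python
from collections import Counter

# Toy corpus (made up for illustration)
docs = ["data mining mining frequent pattern",
        "web information retrieval web data"]

# Shared vocabulary: word order is ignored, only counts are kept
vocab = sorted({w for d in docs for w in d.split()})

# One count vector per document: entry n = number of occurrences of vocab[n]
count_vectors = [[Counter(d.split())[w] for w in vocab] for d in docs]

print(vocab)
print(count_vectors)  # e.g., "mining" appears twice in the first document, "web" twice in the second
```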


Topics

•Topic
• A topic is represented by a word distribution
• Relates to an issue

6


Topic Models

• Topic modeling
• Get topics automatically from a corpus
• Assign documents to topics automatically

• Most frequently used topic models
• pLSA
• LDA

7


Text Data: Topic Models

•Text Data and Topic Models

•Revisit of Mixture Model

•Probabilistic Latent Semantic Analysis (pLSA)

•Summary

8


Mixture Model-Based Clustering

• A set C of k probabilistic clusters C1, …, Ck
• Probability density/mass functions: f1, …, fk
• Cluster prior probabilities: w1, …, wk, with $\sum_j w_j = 1$

• Joint probability of an object i and its cluster Cj is:
• $P(x_i, z_i = C_j) = w_j f_j(x_i)$
• $z_i$: hidden random variable

• Probability of i is:
• $P(x_i) = \sum_j w_j f_j(x_i)$

9

[Figure: two component densities $f_1(x)$ and $f_2(x)$]


Maximum Likelihood Estimation

• Since objects are assumed to be generated independently, for a data set D = {x1, …, xn}, we have

$$P(D) = \prod_i P(x_i) = \prod_i \sum_j w_j f_j(x_i)$$

$$\Rightarrow \log P(D) = \sum_i \log P(x_i) = \sum_i \log \sum_j w_j f_j(x_i)$$

• Task: Find a set C of k probabilistic clusters s.t. P(D) is maximized

10
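As a sanity check on the objective above, here is a minimal sketch (not from the slides) that evaluates $\log P(D) = \sum_i \log \sum_j w_j f_j(x_i)$ for a toy one-dimensional Gaussian mixture; the weights, means, standard deviations, and data points are made up.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical parameters of a 2-component 1-D Gaussian mixture
w     = np.array([0.4, 0.6])   # cluster priors w_j, sum to 1
mu    = np.array([-2.0, 3.0])  # component means
sigma = np.array([0.5, 1.0])   # component standard deviations

def log_likelihood(x):
    # f[i, j] = f_j(x_i): density of object i under component j
    f = norm.pdf(x[:, None], loc=mu[None, :], scale=sigma[None, :])
    # log P(D) = sum_i log sum_j w_j * f_j(x_i)
    return np.sum(np.log(f @ w))

x = np.array([-1.9, -2.2, 3.1, 2.8, 0.0])  # toy data set D
print(log_likelihood(x))
```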


Gaussian Mixture Model

•Generative model
• For each object:
• Pick its cluster, i.e., a distribution component: $Z \sim \mathrm{Multinoulli}(w_1, \ldots, w_k)$
• Sample a value from the selected distribution: $X|Z \sim N(\mu_Z, \sigma_Z^2)$

•Overall likelihood function
• $L(D|\theta) = \prod_i \sum_j w_j\, p(x_i|\mu_j, \sigma_j^2)$, s.t. $\sum_j w_j = 1$ and $w_j \geq 0$

11
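A minimal sketch of the two-step generative story above (pick a component, then sample from its Gaussian); the parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-component mixture parameters
w     = np.array([0.4, 0.6])   # P(Z = j)
mu    = np.array([-2.0, 3.0])
sigma = np.array([0.5, 1.0])

def sample_gmm(n):
    z = rng.choice(len(w), size=n, p=w)  # Z ~ Multinoulli(w_1, ..., w_k)
    x = rng.normal(mu[z], sigma[z])      # X | Z ~ N(mu_Z, sigma_Z^2)
    return x, z

x, z = sample_gmm(1000)
print(x[:5], z[:5])
```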


Multinomial Mixture Model

• For documents with bag-of-words representation
• $\mathbf{x}_d = (x_{d1}, x_{d2}, \ldots, x_{dN})$, where $x_{dn}$ is the count of the nth word of the vocabulary in document d

• Generative model
• For each document
• Sample its cluster label $z \sim \mathrm{Multinoulli}(\boldsymbol{\pi})$
• $\boldsymbol{\pi} = (\pi_1, \pi_2, \ldots, \pi_K)$, where $\pi_k$ is the proportion of the kth cluster
• $p(z = k) = \pi_k$
• Sample its word vector $\mathbf{x}_d \sim \mathrm{multinomial}(\boldsymbol{\beta}_z)$
• $\boldsymbol{\beta}_z = (\beta_{z1}, \beta_{z2}, \ldots, \beta_{zN})$, where $\beta_{zn}$ is the parameter associated with the nth word in the vocabulary
• $p(\mathbf{x}_d | z = k) = \dfrac{(\sum_n x_{dn})!}{\prod_n x_{dn}!} \prod_n \beta_{kn}^{x_{dn}} \propto \prod_n \beta_{kn}^{x_{dn}}$

12
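A minimal sketch of this generative process for a single document: draw its cluster label from π, then draw the entire count vector with one multinomial draw. The toy π, β, and document length below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

pi   = np.array([0.5, 0.5])                                    # cluster proportions pi_k (made up)
beta = np.array([[0.30, 0.25, 0.20, 0.10, 0.05, 0.05, 0.05],   # topic 1 word probabilities (made up)
                 [0.05, 0.05, 0.05, 0.10, 0.25, 0.25, 0.25]])  # topic 2 word probabilities (made up)

def generate_doc(n_words):
    z = rng.choice(len(pi), p=pi)          # z ~ Multinoulli(pi)
    x = rng.multinomial(n_words, beta[z])  # x_d ~ multinomial(beta_z): all word counts at once
    return z, x

z, x = generate_doc(n_words=20)
print(z, x)  # x[n] = count of the nth vocabulary word in the generated document
```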


Likelihood Function

•For a set of M documents

$$L = \prod_d p(\mathbf{x}_d) = \prod_d \sum_k p(\mathbf{x}_d, z = k) = \prod_d \sum_k p(\mathbf{x}_d | z = k)\, p(z = k) \propto \prod_d \sum_k p(z = k) \prod_n \beta_{kn}^{x_{dn}}$$

13
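A minimal sketch (not from the slides) that evaluates this likelihood in log space, dropping the count-independent multinomial coefficient as the ∝ above allows; π, β, and the count matrix X are made up, and a log-sum-exp step is used for numerical stability.

```python
import numpy as np

pi   = np.array([0.5, 0.5])                                    # p(z = k), made up
beta = np.array([[0.30, 0.25, 0.20, 0.10, 0.05, 0.05, 0.05],
                 [0.05, 0.05, 0.05, 0.10, 0.25, 0.25, 0.25]])  # beta_kn, made up
X    = np.array([[5, 4, 3, 1, 0, 0, 1],
                 [2, 3, 0, 0, 4, 5, 3]])                       # X[d, n] = x_dn, made up

def corpus_log_likelihood(X):
    # log p(x_d | z = k), up to the multinomial coefficient: sum_n x_dn * log(beta_kn)
    log_px_given_z = X @ np.log(beta).T                        # shape (D, K)
    # log L = sum_d log sum_k pi_k * p(x_d | z = k), computed via log-sum-exp
    m = log_px_given_z.max(axis=1, keepdims=True)
    return np.sum(m.squeeze(1) + np.log(np.exp(log_px_given_z - m) @ pi))

print(corpus_log_likelihood(X))
```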


Mixture of Unigrams

• For documents represented by a sequence of words
• $\mathbf{w}_d = (w_{d1}, w_{d2}, \ldots, w_{dN_d})$, where $N_d$ is the length of document d and $w_{dn}$ is the word at the nth position of the document

• Generative model
• For each document
• Sample its cluster label $z \sim \mathrm{Multinoulli}(\boldsymbol{\pi})$
• $\boldsymbol{\pi} = (\pi_1, \pi_2, \ldots, \pi_K)$, where $\pi_k$ is the proportion of the kth cluster
• $p(z = k) = \pi_k$
• For each word in the sequence
• Sample the word $w_{dn} \sim \mathrm{Multinoulli}(\boldsymbol{\beta}_z)$
• $p(w_{dn} | z = k) = \beta_{k, w_{dn}}$

14


Likelihood Function

•For a set of M documents

$$L = \prod_d p(\mathbf{w}_d) = \prod_d \sum_k p(\mathbf{w}_d, z = k) = \prod_d \sum_k p(\mathbf{w}_d | z = k)\, p(z = k) = \prod_d \sum_k p(z = k) \prod_n \beta_{k, w_{dn}}$$

15


Question

•Are the multinomial mixture model and the mixture of unigrams model equivalent? Why?

16


Text Data: Topic Models

•Text Data and Topic Models

•Revisit of Mixture Model

•Probabilistic Latent Semantic Analysis (pLSA)

•Summary

17


Notations

•Word, document, topic
• $w, d, z$

•Word count in document
• $c(w, d)$

•Word distribution for each topic ($\boldsymbol{\beta}_z$)
• $\beta_{zw}: p(w|z)$

•Topic distribution for each document ($\boldsymbol{\theta}_d$)
• $\theta_{dz}: p(z|d)$ (Yes, soft clustering)

18


Issues of Mixture of Unigrams

•All the words in the same document are sampled from the same topic

• In practice, people switch topics during their writing

19


Illustration of pLSA

20


Generative Model for pLSA

•Describe how a document d is generated probabilistically
• For each position in d, $n = 1, \ldots, N_d$
• Generate the topic for the position as $z_n \sim \mathrm{Multinoulli}(\boldsymbol{\theta}_d)$, i.e., $p(z_n = k) = \theta_{dk}$

(Note, 1 trial multinomial, i.e., categorical distribution)

• Generate the word for the position as $w_n | z_n \sim \mathrm{Multinoulli}(\boldsymbol{\beta}_{z_n})$, i.e., $p(w_n = w | z_n) = \beta_{z_n w}$

21
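A minimal sketch of this generative story: unlike the mixture of unigrams, each position gets its own topic draw from the document-specific distribution θ_d. The vocabulary is borrowed from the example slide later in the lecture; the β rows and the θ_d value below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Vocabulary from the lecture's example; topic-word distributions below are made up
vocab = ["data", "mining", "frequent", "pattern", "web", "information", "retrieval"]
beta  = np.array([[0.30, 0.25, 0.20, 0.10, 0.05, 0.05, 0.05],   # p(w | z = 1)
                  [0.05, 0.05, 0.05, 0.10, 0.25, 0.25, 0.25]])  # p(w | z = 2)

def generate_document(theta_d, length):
    words = []
    for _ in range(length):
        z = rng.choice(len(theta_d), p=theta_d)  # z_n ~ Multinoulli(theta_d)
        w = rng.choice(len(vocab), p=beta[z])    # w_n ~ Multinoulli(beta_{z_n})
        words.append(vocab[w])
    return words

print(generate_document(theta_d=np.array([0.7, 0.3]), length=10))
```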


Graphical Model

Note: Sometimes, people add parameters such as $\theta$ and $\beta$ into the graphical model

22


The Likelihood Function for a Corpus

•Probability of a word

$$p(w|d) = \sum_k p(w, z = k | d) = \sum_k p(w | z = k)\, p(z = k | d) = \sum_k \beta_{kw}\, \theta_{dk}$$

•Likelihood of a corpus

$$\log L = \sum_d \sum_w c(w, d) \log p(w, d) = \sum_d \sum_w c(w, d) \log \Big(\pi_d \sum_k \beta_{kw}\, \theta_{dk}\Big)$$

23

($\pi_d$ is usually considered as uniform, i.e., 1/M)


Re-arrange the Likelihood Function

•Group the same word from different positions together

$$\max \log L = \sum_{d,w} c(w, d) \log \sum_z \theta_{dz}\, \beta_{zw}$$

$$\text{s.t. } \sum_z \theta_{dz} = 1 \text{ and } \sum_w \beta_{zw} = 1$$

24


Optimization: EM Algorithm

• Repeat until convergence

• E-step: for each word in each document, calculate its conditional probability of belonging to each topic

$$p(z|w,d) \propto p(w|z,d)\, p(z|d) = \beta_{zw}\, \theta_{dz} \quad \Big(\text{i.e., } p(z|w,d) = \frac{\beta_{zw}\, \theta_{dz}}{\sum_{z'} \beta_{z'w}\, \theta_{dz'}}\Big)$$

• M-step: given the conditional distribution, find the parameters that maximize the expected complete log-likelihood

$$\beta_{zw} \propto \sum_d p(z|w,d)\, c(w,d) \quad \Big(\text{i.e., } \beta_{zw} = \frac{\sum_d p(z|w,d)\, c(w,d)}{\sum_{w',d} p(z|w',d)\, c(w',d)}\Big)$$

$$\theta_{dz} \propto \sum_w p(z|w,d)\, c(w,d) \quad \Big(\text{i.e., } \theta_{dz} = \frac{\sum_w p(z|w,d)\, c(w,d)}{N_d}\Big)$$

25
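A minimal vectorized sketch of these E- and M-steps (not the course's reference implementation). The random initialization, the small smoothing constants, and the toy count matrix in the usage lines are assumptions; counts[d, w] plays the role of c(w, d).

```python
import numpy as np

def plsa_em(counts, n_topics, n_iter=100, seed=0):
    """counts: (D, W) array with counts[d, w] = c(w, d). Returns theta (D, K), beta (K, W)."""
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    theta = rng.random((D, n_topics)); theta /= theta.sum(axis=1, keepdims=True)  # p(z|d)
    beta  = rng.random((n_topics, W)); beta  /= beta.sum(axis=1, keepdims=True)   # p(w|z)

    for _ in range(n_iter):
        # E-step: p(z | w, d) proportional to beta[z, w] * theta[d, z], shape (D, W, K)
        post = theta[:, None, :] * beta.T[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12

        # M-step: weighted counts c(w, d) * p(z | w, d)
        weighted = counts[:, :, None] * post               # (D, W, K)
        beta = weighted.sum(axis=0).T                       # sum over d -> (K, W)
        beta /= beta.sum(axis=1, keepdims=True) + 1e-12
        theta = weighted.sum(axis=1)                        # sum over w -> (D, K)
        theta /= theta.sum(axis=1, keepdims=True) + 1e-12   # the denominator equals N_d

    # Objective from the re-arranged likelihood: sum_{d,w} c(w,d) * log sum_z theta_dz * beta_zw
    log_l = np.sum(counts * np.log(theta @ beta + 1e-12))
    return theta, beta, log_l

# Toy 2-document corpus over a 7-word vocabulary (counts are made up)
counts = np.array([[5, 4, 3, 1, 0, 0, 1],
                   [2, 3, 0, 0, 4, 5, 3]])
theta, beta, log_l = plsa_em(counts, n_topics=2)
print(theta, log_l)
```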


Example

• Two documents, two topics

β€’ Vocabulary: {1: data, 2: mining, 3: frequent, 4: pattern, 5: web, 6: information, 7: retrieval}

β€’ At some iteration of EM algorithm, E-step

26


Example (Continued)

• M-step

27

$$\beta_{11} = \frac{0.8 \times 5 + 0.5 \times 2}{11.8 + 5.8} = 5/17.6 \qquad \beta_{12} = \frac{0.8 \times 4 + 0.5 \times 3}{11.8 + 5.8} = 4.7/17.6$$

$$\beta_{13} = 3/17.6 \quad \beta_{14} = 1.6/17.6 \quad \beta_{15} = 1.3/17.6 \quad \beta_{16} = 1.2/17.6 \quad \beta_{17} = 0.8/17.6$$

$$\theta_{11} = \frac{11.8}{17} \qquad \theta_{12} = \frac{5.2}{17}$$


Text Data: Topic Models

•Text Data and Topic Models

•Revisit of Mixture Model

•Probabilistic Latent Semantic Analysis (pLSA)

•Summary

28


Summary

•Basic Concepts
• Word/term, document, corpus, topic

•Mixture of unigrams

•pLSA

β€’ Generative model

β€’ Likelihood function

β€’ EM algorithm

29


CS145: INTRODUCTION TO DATA MINING

Instructor: Yizhou Sun [email protected]

December 5, 2018

Final Review


Learnt Algorithms

31

•Classification: Logistic Regression; Decision Tree; KNN; SVM; NN (vector data); Naïve Bayes for Text (text data)

•Clustering: K-means; hierarchical clustering; DBSCAN; Mixture Models (vector data); pLSA (text data)

•Prediction: Linear Regression; GLM* (vector data)

•Frequent Pattern Mining: Apriori; FP growth (set data); GSP; PrefixSpan (sequence data)

•Similarity Search: DTW (sequence data)


Deadlines ahead

•Homework 6 (optional)
• Sunday, Dec. 9, 2018, 11:59pm
• We will pick the 5 highest scores

•Course project
• Tuesday, Dec. 11, 2018, 11:59 pm
• Don't forget the peer evaluation form:
• One question is: "Do you think some member in your group should be given a lower score than the group score? If yes, please list the name, and explain why."

32


Final Exam

• Time
• 12/13, 11:30am-2:30pm

• Location
• Franz Hall: 1260

• Policy
• Closed book exam
• You can take two "reference sheets" of A4 size, i.e., one in addition to the midterm "reference sheet"

β€’ You can bring a simple calculator

33


Content to Cover

•All the content learned so far

β€’ ~20% before midterm

β€’ ~80% after midterm

34

•Classification: Logistic Regression; Decision Tree; KNN; SVM; NN (vector data); Naïve Bayes for Text (text data)

•Clustering: K-means; hierarchical clustering; DBSCAN; Mixture Models (vector data); pLSA (text data)

•Prediction: Linear Regression; GLM* (vector data)

•Frequent Pattern Mining: Apriori; FP growth (set data); GSP; PrefixSpan (sequence data)

•Similarity Search: DTW (sequence data)


Type of Questions

•Similar to Midterm

β€’ True or false

β€’ Conceptual questions

β€’ Computation questions

35


Sample question on DTW

•Suppose that we have two sequences S1 and S2 as follows:
• S1 = <1, 2, 5, 3, 2, 1, 7>
• S2 = <2, 3, 2, 1, 7, 4, 3, 0, 2, 5>

•Compute the distance between the two sequences according to the dynamic time warping algorithm.

36
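A minimal sketch for cross-checking an answer to this question. It assumes the common DTW formulation with absolute difference as the local cost and the standard three-way recurrence; the course's exact definition (e.g., boundary handling or a different local cost) may differ, so treat the result as a cross-check rather than the official solution.

```python
def dtw_distance(s1, s2):
    n, m = len(s1), len(s2)
    INF = float("inf")
    # D[i][j] = minimal cumulative cost of aligning s1[:i] with s2[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s1[i - 1] - s2[j - 1])      # local cost of matching s1[i-1] with s2[j-1]
            D[i][j] = cost + min(D[i - 1][j],      # advance in s1 only
                                 D[i][j - 1],      # advance in s2 only
                                 D[i - 1][j - 1])  # advance in both
    return D[n][m]

S1 = [1, 2, 5, 3, 2, 1, 7]
S2 = [2, 3, 2, 1, 7, 4, 3, 0, 2, 5]
print(dtw_distance(S1, S2))
```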


What's next?

• Follow-up courses

• CS247: Advanced Data Mining
• Focus on Text, Recommender Systems, and Networks/Graphs
• Will be offered in Spring 2019

• CS249: Probabilistic Graphical Models for Structured Data
• Focus on probabilistic models on text and graph data
• Research seminar: You are expected to read papers and present to the whole class; you are expected to write a survey or conduct a course project
• Will be offered in Winter 2019

37


Thank you and good luck!

•Give us feedback by submitting the evaluation form

•1-2 undergraduate research intern positions are available
• Please refer to Piazza: https://piazza.com/class/jmpc925zfo92qs?cid=274

38