Dynamic Multi-Faceted Topic Discovery in Twitter Date : 2013/11/27 Source : CIKM’13 Advisor :...
Dynamic Multi-Faceted Topic Discovery in Twitter
Date : 2013/11/27
Source : CIKM’13
Advisor : Dr. Jia-ling, Koh
Speaker : Wei, Chang
Outline
• Introduction
• Approach
• Experiment
• Conclusion
What are they talking about?
• Entity-centric
• Highly dynamic
Multiple facets of a topic discussed in Twitter
Goal
Outline
• Introduction
• Approach
  • Framework
  • Pre-processing
  • LDA
  • MfTM
• Experiment
• Conclusion
Framework
[Figure: framework diagram. Labels: Training document; Pre-processing; Model (hyper-parameter); Per document; Document vector]
Pre-processing
• Convert to lower-case
• Remove punctuation and numbers
• Normalize elongated words (“Goooood” to “good”)
• Remove stop words
• Named entity recognition
  • Entity types : person, organization, location, general terms
  • Linked Web : http://nlp.stanford.edu/ner/
  • Tweet : http://github.com/aritter/twitter_nlp
• All of a user’s posts published during the same day are grouped as one document
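The text-cleaning steps listed above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the stop-word list, the function names, the ASCII-only tokenization and the `(user, day, text)` input format are all assumptions, and named entity recognition is left out because it relies on the external tools linked above.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; the slides do not name one).
STOP_WORDS = {"a", "an", "and", "are", "at", "for", "i", "is",
              "my", "of", "on", "the", "this", "to"}


def normalize(text):
    """Lower-case, drop punctuation and digits, squeeze elongated words,
    then remove stop words. Only ASCII letters are kept."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)            # punctuation and numbers
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)   # "goooood" -> "good"
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def group_by_user_day(tweets):
    """tweets: iterable of (user, 'YYYY-MM-DD', text) tuples (hypothetical format).
    All posts of one user on one day are merged into a single document."""
    docs = defaultdict(list)
    for user, day, text in tweets:
        docs[(user, day)].extend(normalize(text))
    return dict(docs)


tweets = [
    ("u1", "2013-11-27", "Goooood morning!!!"),
    ("u1", "2013-11-27", "I like broccoli"),
    ("u2", "2013-11-27", "Cute kitten"),
]
documents = group_by_user_day(tweets)
```

Squeezing runs of three or more identical letters down to two is a heuristic: it handles the slide's example, but it would leave a word such as "sooo" as "soo".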
Latent Dirichlet Allocation
• Each document may be viewed as a mixture of various topics.
• The topic distribution is assumed to have a Dirichlet prior.
• Unsupervised learning
• The number of topics K must be set in advance
• Not to be confused with linear discriminant analysis (also abbreviated LDA)
Example
• I like to eat broccoli and bananas.
• I ate a banana and spinach smoothie for breakfast.
• Chinchillas and kittens are cute.
• My sister adopted a kitten yesterday.
• Look at this cute hamster munching on a piece of broccoli.
Topic 1 : food
Topic 2 : cute animals
How does LDA write a document?
[Figure: words drawn from Topic 1 and Topic 2: broccoli, munching, breakfast, bananas, kittens, chinchillas, cute, hamster]
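The generative story that LDA assumes can be sketched as follows: draw a topic mixture for the document from a Dirichlet prior, then, for every word position, pick a topic from that mixture and a word from the chosen topic. The word probabilities, the assignment of words to the two topics and the function names below are made up for illustration; the slide only shows the words.

```python
import random

# Hypothetical word distributions for the two example topics.
TOPICS = {
    "food": {"broccoli": 0.3, "bananas": 0.3, "breakfast": 0.2, "munching": 0.2},
    "cute animals": {"kittens": 0.3, "chinchillas": 0.2, "cute": 0.3, "hamster": 0.2},
}


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def write_document(topics, alpha, length, rng):
    """Generate one document the way LDA assumes documents are produced."""
    names = list(topics)
    theta = sample_dirichlet(alpha, rng)       # per-document topic mixture
    words = []
    for _ in range(length):
        topic = rng.choices(names, weights=theta)[0]          # topic for this word
        vocab = topics[topic]
        word = rng.choices(list(vocab), weights=list(vocab.values()))[0]
        words.append(word)
    return theta, words


rng = random.Random(0)
theta, words = write_document(TOPICS, alpha=[0.5, 0.5], length=10, rng=rng)
```

A small `alpha` tends to produce documents dominated by one topic, while a large `alpha` gives more even mixtures.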
Real World Example
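LDA Plate Notation
[Figure: plate diagram of the LDA model]
$$\beta = \begin{bmatrix} 0.7 & 0.2 & 0.1 & 0.8 & 0.4 & 0.7 & 0.8 & 0.6 \\ 0.3 & 0.8 & 0.9 & 0.2 & 0.6 & 0.3 & 0.2 & 0.4 \end{bmatrix}$$
Different $\alpha$ implies different $\theta$ for every document. Each $\theta$ decides the fraction of each topic.
Different $\beta$ implies a different topic mixture for each word.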
LDA
$D = \{w_1, w_2, w_3, \ldots, w_M\}$
How to find the model parameters
• EM algorithm
• Gibbs sampling
• Stochastic Variational Inference (SVI)
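Of the inference methods listed, Gibbs sampling is the simplest to write down. The following is a minimal collapsed Gibbs sampler for plain LDA in pure Python, offered as a sketch rather than the implementation used in the paper. The function name, the symmetric priors `alpha` (document-topic) and `eta` (topic-word), and the iteration count are assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, eta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (doc_topic, topic_word, vocab), where
    doc_topic[d] and topic_word[k] are probability vectors.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    n_dk = [[0] * num_topics for _ in docs]        # tokens of doc d in topic k
    n_kw = [[0] * V for _ in range(num_topics)]    # tokens of word v in topic k
    n_k = [0] * num_topics                         # tokens in topic k
    z = []                                         # topic assignment per token

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Conditional distribution over topics given all other tokens.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + eta) / (n_k[t] + V * eta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final counts.
    doc_topic = [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[t][v] + eta) / (n_k[t] + V * eta) for v in range(V)]
        for t in range(num_topics)
    ]
    return doc_topic, topic_word, vocab


example_docs = [
    ["like", "eat", "broccoli", "bananas"],
    ["ate", "banana", "spinach", "smoothie", "breakfast"],
    ["chinchillas", "kittens", "cute"],
    ["sister", "adopted", "kitten", "yesterday"],
    ["look", "cute", "hamster", "munching", "piece", "broccoli"],
]
doc_topic, topic_word, vocab = lda_gibbs(example_docs, num_topics=2)
```

With only five short documents the recovered topics are noisy and depend on the seed; the example is meant to show the mechanics of the sampler, not to reproduce the food and cute-animal topics reliably.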
Multi-Faceted Topic Model
Outline
• Introduction
• Approach
• Experiment
• Conclusion
Perplexity Evaluation
• Perplexity is algebraically equivalent to the inverse of the geometric mean per-word likelihood.
• M is the model learned from the training dataset, $\mathbf{w}_d$ is the word vector for document d and $N_d$ is the number of words in d.
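The description above matches the usual definition of perplexity for topic models, $\exp\left(-\sum_d \log p(\mathbf{w}_d \mid M) / \sum_d N_d\right)$, where $\mathbf{w}_d$ denotes the words of test document d and $N_d$ its length. The slide's own formula did not survive extraction, so this form and the symbol names are assumptions. A short sketch, taking per-document log-likelihoods (natural log) as input:

```python
import math


def perplexity(log_likelihoods, doc_lengths):
    """Inverse geometric mean of the per-word likelihood.

    log_likelihoods: log p(w_d | M) for each test document (natural log).
    doc_lengths: number of words N_d in each test document.
    """
    total_words = sum(doc_lengths)
    if total_words == 0:
        raise ValueError("test set contains no words")
    return math.exp(-sum(log_likelihoods) / total_words)


# Sanity check: a model that assigns every word probability 1/4
# should have perplexity 4, regardless of document lengths.
lengths = [3, 5]
uniform_lls = [n * math.log(1 / 4) for n in lengths]
uniform_perplexity = perplexity(uniform_lls, lengths)
```

Lower perplexity means the model assigns higher probability to held-out text.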
Perplexity Evaluation
KL-divergence
• P={1/6, 1/6, 1/6, 1/6, 1/6, 1/6}
• Q={1/10, 1/10, 1/10, 1/10, 1/10, 1/2}
• KL is a non-symmetric measure
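The asymmetry can be checked numerically with the standard definition $D_{KL}(P \| Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$, applied to the two distributions P and Q given on this slide. The sketch below uses the natural logarithm, which is an assumption since the slide does not state the base.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) for two discrete distributions over the same outcomes.
    Terms with P(i) = 0 contribute nothing; Q(i) must be positive elsewhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


P = [1 / 6] * 6
Q = [1 / 10] * 5 + [1 / 2]

kl_pq = kl_divergence(P, Q)   # divergence of Q from P
kl_qp = kl_divergence(Q, P)   # divergence of P from Q, a different value
```

Because the two directions weight the log-ratios by different distributions, swapping the arguments changes the result, which is why KL-divergence is not a distance metric.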
KL-divergence
Scalability
• A standard PC with a dual-core CPU, 4GB RAM and a 600GB hard-drive
Outline
• Introduction
• Approach
• Experiment
• Conclusion
Conclusion
• We propose a novel Multi-Faceted Topic Model. The model extracts semantically-rich latent topics, including general terms mentioned in the topic, named entities and a temporal distribution.