![Page 1: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/1.jpg)
SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS
Chen LIN *, Jiang-Ming YANG +, Rui CAI +, Xin-jing WANG +, Wei WANG *, Lei ZHANG +
* Fudan University
+ Microsoft Research Asia
1
![Page 2: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/2.jpg)
OUTLINE
Motivation
Challenges
Model
Application
  Reply reconstruction
  Junk post detection
  Expert finding
Experiments
Conclusion
2
![Page 3: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/3.jpg)
THREADED DISCUSSIONS
Mailing lists
Chat rooms
IMs
Web forums
3
root
reply
![Page 4: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/4.jpg)
IMPORTANT DATA SOURCES
4
![Page 5: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/5.jpg)
MINING SEMANTICS & STRUCTURE
5
Junk Identification
Expert Search
Measure post quality
…
![Page 6: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/6.jpg)
CHALLENGE
6
Semantics & Structure
![Page 7: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/7.jpg)
SEMANTIC & STRUCTURE
7
Semantics: topics
Structure: who replies to whom
![Page 8: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/8.jpg)
CHALLENGE
8
Junk Post
![Page 9: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/9.jpg)
JUNK POST
9
![Page 10: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/10.jpg)
CHALLENGE
10
Post Quality
![Page 11: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/11.jpg)
POST QUALITY
valuable post
11
![Page 12: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/12.jpg)
MODEL
Purpose: simultaneously model semantics and structure
Methodology
  Intuitive
  Matrix based
  Sparse coding
root
reply
12
![Page 13: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/13.jpg)
INTUITION
13
![Page 14: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/14.jpg)
A THREAD HAS SEVERAL TOPICS
14
![Page 15: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/15.jpg)
SEMANTIC REPRESENTATION OF THREAD
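D ≈ X Θ
  D (V × L): word-by-post matrix, rows word1 … wordV, columns post1 … postL
  X (V × T): word-by-topic matrix, rows word1 … wordV, columns topic1 … topicT
  Θ (T × L): topic-by-post matrix, rows topic1 … topicT, columns post1 … postL
Minimize: the reconstruction error between D and X Θ
Project posts to topic space
15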
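The projection described on this slide can be sketched in NumPy. The transcript does not preserve the slide's formula, so the squared Frobenius error used here is an assumption, and the function names `reconstruction_loss` and `project_posts` are hypothetical. The sketch holds the topic matrix X fixed and solves for Θ by ordinary least squares.

```python
import numpy as np

def reconstruction_loss(D, X, Theta):
    """Squared Frobenius error between D (V x L) and X @ Theta (assumed loss)."""
    residual = D - X @ Theta
    return float(np.sum(residual ** 2))

def project_posts(D, X):
    """Least-squares projection of the posts in D onto a fixed topic matrix X.

    Returns Theta with shape (T, L): one topic-weight column per post.
    """
    Theta, *_ = np.linalg.lstsq(X, D, rcond=None)
    return Theta

# Toy data: V = 6 words, T = 2 topics, L = 4 posts.
rng = np.random.default_rng(0)
X = rng.random((6, 2))
Theta_true = rng.random((2, 4))
D = X @ Theta_true              # posts generated exactly from the topics

Theta_hat = project_posts(D, X)
loss = reconstruction_loss(D, X, Theta_hat)
```

Because the toy D is built exactly from X and Θ, the recovered weights match the true ones and the loss is numerically zero. In the full model X is learned as well, which this sketch does not attempt.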
![Page 16: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/16.jpg)
A POST IS RELATED TO PREVIOUS POSTS
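Minimize: the error of approximating each post by a linear combination of previous posts
  Θ (T × L): rows topic1 … topicT, columns post1 … postL
  b: weights that approximate each post as a linear combination of previous posts
16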
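A minimal sketch of this structural idea: for post i, find weights b over the earlier posts so that their weighted sum approximates post i in topic space. The least-squares fit and the function name `fit_reply_weights` are assumptions made for illustration; the slide's own formula is not preserved in the transcript.

```python
import numpy as np

def fit_reply_weights(Theta, i):
    """Weights b such that Theta[:, i] is approximated by Theta[:, :i] @ b.

    Theta has shape (T, L); columns are posts in posting order.
    """
    if i == 0:
        return np.zeros(0)      # the root post has no previous posts
    b, *_ = np.linalg.lstsq(Theta[:, :i], Theta[:, i], rcond=None)
    return b

# Toy thread: 2 topics, 3 posts. The third post is twice the first post
# in topic space and shares nothing with the second.
Theta = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 0.0]])
b = fit_reply_weights(Theta, 2)
```

Here all the weight lands on the first post, which is the kind of signal that suggests the third post replies to the first.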
![Page 17: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/17.jpg)
A POST IS RELATED TO A FEW TOPICS
government
cobol
17
![Page 18: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/18.jpg)
SPARSE SEMANTICS OF POST
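D ≈ X Θ
  D (V × L): word-by-post matrix, rows word1 … wordV, columns post1 … postL
  X (V × T): word-by-topic matrix, rows word1 … wordV, columns topic1 … topicT
  Θ (T × L): topic-by-post matrix, rows topic1 … topicT, columns post1 … postL
Minimize: the reconstruction error between D and X Θ, with Θ kept sparse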
18
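In sparse coding, a common way to keep each post tied to only a few topics is an L1 penalty on the topic weights, whose proximal step is soft-thresholding. The slide's exact penalty is not preserved in this transcript, so both the L1 choice and the threshold value below are assumptions, and `soft_threshold` is a hypothetical helper name.

```python
import numpy as np

def soft_threshold(Z, lam):
    """Shrink every entry of Z toward zero by lam; small entries become zero."""
    return np.sign(Z) * np.maximum(np.abs(Z) - lam, 0.0)

# Topic weights of one post: two strong topics and two weak ones.
theta = np.array([0.9, 0.05, -0.02, 0.4])
sparse_theta = soft_threshold(theta, 0.1)
```

The two weak weights are driven to zero while the strong ones survive with reduced magnitude, which matches the intuition that a post is related to only a few topics.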
![Page 19: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/19.jpg)
A POST IS RELATED TO A FEW POSTS
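Minimize: the error of approximating each post by a linear combination of previous posts, with b kept sparse
  Θ (T × L): rows topic1 … topicT, columns post1 … postL
  b (sparse): weights that approximate each post as a linear combination of previous posts
19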
![Page 20: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/20.jpg)
OPTIMIZE THEM TOGETHER
Model semantic
Model structure
20
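One way to read "optimize them together" is a single objective that sums a semantic term, a structural term, and sparsity penalties. The sketch below is an assumption-laden illustration, not the slide's formula: the squared-error and L1 forms, the weights `lam_struct`, `lam_theta`, `lam_b`, and the matrix layout of `B` are all hypothetical choices.

```python
import numpy as np

def joint_objective(D, X, Theta, B, lam_struct=1.0, lam_theta=0.1, lam_b=0.1):
    """Semantic fit + structural fit + L1 sparsity (assumed form).

    D: (V, L) words x posts, X: (V, T) words x topics, Theta: (T, L).
    B: (L, L), strictly upper triangular; B[k, i] is the weight of the
    earlier post k when approximating post i.
    """
    semantic = np.sum((D - X @ Theta) ** 2)
    # The root post (column 0) has no predecessors, so it is excluded.
    approx = Theta @ B
    structure = np.sum((Theta[:, 1:] - approx[:, 1:]) ** 2)
    sparsity = lam_theta * np.abs(Theta).sum() + lam_b * np.abs(B).sum()
    return float(semantic + lam_struct * structure + sparsity)

# Toy thread: 2 words, 2 topics, 3 posts; post 3 is explained by post 1.
X = np.eye(2)
Theta = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 0.0]])
D = X @ Theta
B = np.zeros((3, 3))
B[0, 2] = 2.0
value = joint_objective(D, X, Theta, B)
```

In this toy case the semantic term is zero, the structural term is 1 (the second post is not explained by the first), and the sparsity penalties add 0.6.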
![Page 21: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/21.jpg)
APPLICATIONS
Reply reconstruction: capability of recognizing structure
Junk identification: capability of capturing semantics
Expert finding: capability of measuring post quality
21
![Page 22: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/22.jpg)
REPLY RECONSTRUCTION
22
Document Similarity
Topic Similarity
Structure Similarity
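A hedged sketch of similarity-based reply reconstruction: each post is attached to the earlier post it is most similar to. The transcript does not show how the three similarities are computed or combined, so cosine similarity over a single feature matrix is an assumption. The rows of `features` could be word vectors (document similarity), columns of Θ (topic similarity), or reply weights b (structure similarity). The function names are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def reconstruct_parents(features):
    """features: (L, d) array, one row per post in posting order.

    Returns a list where parents[i] is the predicted parent index of
    post i, and parents[0] is -1 because the root has no parent.
    """
    parents = [-1]
    for i in range(1, len(features)):
        sims = [cosine(features[i], features[k]) for k in range(i)]
        parents.append(int(np.argmax(sims)))
    return parents

features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.9, 0.1],
                     [0.1, 0.9]])
parents = reconstruct_parents(features)
```

The third post is attached to the root and the fourth to the second post, because those are their closest predecessors.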
![Page 23: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/23.jpg)
DATA SET
| | Slashdot | Apple discussion |
|---|---|---|
| No. threads | 1154 | 4488 |
| No. posts | 203210 | 80008 |
| Avg. thread length | 176.09 | 17.84 |
| Avg. words per post | 73.53 | 78.36 |
| Avg. posts per user | 15.32 | 4.69 |

23
![Page 24: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/24.jpg)
BASELINES
NP: Reply to Nearest Post
RR: Reply to Root
DS: Document Similarity
LDA: Latent Dirichlet Allocation
  Projects documents to topic space
SWB: Special Words Topic Model with Background distribution
  Projects documents to topic and junk topic space
24
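The two simplest baselines, NP and RR, need no text at all and can be sketched directly. The `accuracy` helper is a hypothetical scoring function added only for illustration; the transcript does not say which measure the evaluation uses.

```python
def nearest_post_baseline(num_posts):
    """NP: every reply is attached to the post immediately before it."""
    return [-1] + [i - 1 for i in range(1, num_posts)]

def reply_to_root_baseline(num_posts):
    """RR: every reply is attached to the root post (index 0)."""
    return [-1] + [0] * (num_posts - 1)

def accuracy(predicted, actual):
    """Fraction of non-root posts whose parent is predicted correctly."""
    pairs = list(zip(predicted[1:], actual[1:]))
    return sum(p == a for p, a in pairs) / len(pairs) if pairs else 0.0

# Toy thread of 4 posts; -1 marks the root.
true_parents = [-1, 0, 0, 1]
np_score = accuracy(nearest_post_baseline(4), true_parents)
rr_score = accuracy(reply_to_root_baseline(4), true_parents)
```

On this toy thread NP gets one of three replies right and RR gets two of three.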
![Page 25: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/25.jpg)
EVALUATION
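| Method | Slashdot, all posts | Slashdot, good posts | Apple, all posts | Apple, good posts |
|---|---|---|---|---|
| NP | 0.021 | 0.012 | 0.289 | 0.239 |
| RR | 0.183 | 0.319 | 0.269 | 0.474 |
| DS | 0.463 | 0.643 | 0.409 | 0.628 |
| LDA | 0.465 | 0.644 | 0.410 | 0.648 |
| SWB | 0.463 | 0.644 | 0.410 | 0.641 |
| SMSS | 0.524 | 0.737 | 0.517 | 0.772 |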
25
![Page 26: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/26.jpg)
JUNK IDENTIFICATION
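D = word-by-post matrix (V × L): rows word1 … wordV, columns post1 … postL
X = word-by-topic matrix (V × (T + 1)): rows word1 … wordV, columns topic1 … topicT, topicbg
Θ = topic-by-post matrix ((T + 1) × L): rows topic1 … topicT, topicbg, columns post1 … postL
Probability of junk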
26
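The slide adds a background topic (topicbg) to X and Θ and labels a "probability of junk", but the transcript does not preserve how that probability is computed. One plausible reading, sketched here purely as an assumption, is the share of a post's total topic weight that falls on the background topic. `junk_score` is a hypothetical name.

```python
import numpy as np

def junk_score(Theta):
    """Share of each post's topic weight carried by the background topic.

    Theta has shape (T + 1, L); the last row is assumed to be topicbg.
    Returns a length-L array of scores in [0, 1].
    """
    weights = np.abs(Theta)
    totals = weights.sum(axis=0)
    safe_totals = np.where(totals > 0, totals, 1.0)   # avoid division by zero
    return np.where(totals > 0, weights[-1] / safe_totals, 0.0)

# Two posts, two regular topics plus the background topic (last row).
Theta = np.array([[0.8, 0.0],
                  [0.0, 0.1],
                  [0.2, 0.9]])
scores = junk_score(Theta)
```

The first post is mostly on-topic (score 0.2) while the second is dominated by the background topic (score 0.9), so the second would be the junk candidate under this reading.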
![Page 27: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/27.jpg)
DATA SET
Slashdot Apple discussion
27
![Page 28: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/28.jpg)
BASELINES
28
DF
SVM: classifies posts as junk posts and non-junk posts
SWB: Special Words Topic Model with Background distribution
  Projects documents to topic and junk topic space
![Page 29: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/29.jpg)
EVALUATION

| Method | Precision | Recall | F-measure |
|---|---|---|---|
| SWB | 0.48 | 0.22 | 0.30 |
| SVM | 0.37 | 0.24 | 0.20 |
| DF | 0.34 | 0.40 | 0.36 |
| SMSS | 0.38 | 0.45 | 0.41 |
29
![Page 30: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/30.jpg)
EXPERT FINDING
Methods
HITS
PageRank
…
30
![Page 31: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/31.jpg)
BASELINES
LM: Formal Models for Expert Finding in Enterprise Corpora. SIGIR ’06
  Achieves stable performance in the expert finding task using a language model
PageRank: benchmark nodal ranking method
HITS: finds hub nodes and authority nodes
EABIF: Personalized Recommendation Driven by Information Flow. SIGIR ’06
  Finds the most influential node
31
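As an illustration of a link-based baseline, here is a minimal HITS power iteration over a user reply graph. How the slides build that graph is not shown in the transcript, so the edge convention (an edge from u to v when user u replied to user v) and the iteration count are assumptions.

```python
import numpy as np

def hits(adjacency, iterations=50):
    """Basic HITS: returns (hub scores, authority scores), each L2-normalized.

    adjacency[u, v] = 1 means user u replied to user v (assumed convention).
    """
    n = adjacency.shape[0]
    hubs = np.ones(n)
    auths = np.ones(n)
    for _ in range(iterations):
        auths = adjacency.T @ hubs          # replied to by good hubs
        norm = np.linalg.norm(auths)
        if norm > 0:
            auths = auths / norm
        hubs = adjacency @ auths            # replies to good authorities
        norm = np.linalg.norm(hubs)
        if norm > 0:
            hubs = hubs / norm
    return hubs, auths

# Users 0 and 1 both reply to user 2.
A = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
hubs, auths = hits(A)
expert = int(np.argmax(auths))
```

User 2 receives all replies and ends up with the highest authority score, so it is the one ranked as the likely expert.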
![Page 32: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/32.jpg)
EVALUATION
32
Bayesian estimate
| Method | MRR | MAP | P@10 |
|---|---|---|---|
| LM | 0.821 | 0.698 | 0.800 |
| EABIF(ori.) | 0.674 | 0.362 | 0.243 |
| EABIF(rec.) | 0.742 | 0.318 | 0.281 |
| PageRank(ori.) | 0.675 | 0.377 | 0.263 |
| PageRank(rec.) | 0.743 | 0.321 | 0.266 |
| HITS(ori.) | 0.906 | 0.832 | 0.900 |
| HITS(rec.) | 0.938 | 0.822 | 0.906 |
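For reference, the three ranking measures in the table follow their standard definitions and can be computed per query as sketched below. MRR and MAP are then the means of the first two values over all queries. The user identifiers are made up for the example.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top k ranked items that are relevant."""
    return sum(item in relevant for item in ranked[:k]) / k

ranked_users = ["u3", "u1", "u2", "u4"]   # hypothetical system ranking
true_experts = {"u1", "u4"}               # hypothetical ground truth

rr = reciprocal_rank(ranked_users, true_experts)
ap = average_precision(ranked_users, true_experts)
p_at_2 = precision_at_k(ranked_users, true_experts, k=2)
```

In this example the first expert appears at rank 2, so the reciprocal rank is 0.5. The average precision is also 0.5, and one of the top two users is an expert.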
![Page 33: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/33.jpg)
DISCUSSION
Parameters vs. Model Complexity
  Linear regression
  SMSS model
Though the number of parameters is increased, the projection space is shrunk by the prior knowledge.
Prior knowledge
33
![Page 34: SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui.](https://reader036.fdocuments.in/reader036/viewer/2022062417/551519d3550346a80c8b5f74/html5/thumbnails/34.jpg)
CONCLUSION
Purpose
  Mine the semantics
  Mine the structure
Highlight
  Simultaneously model the semantics and the structure
Applications are designed to evaluate the model
  Reply reconstruction
  Junk identification
  Expert finding
34