A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search
Jie Tang¹, Ruoming Jin², and Jing Zhang¹
¹Knowledge Engineering Group, Dept. of Computer Science and Technology, Tsinghua University
²Department of Computer Science, Kent State University
Dec. 25th, 2008
Motivation
However, the results are still not satisfactory…
“Academic search is treated as document search, but ignores semantics.”
Examples – Expertise search
Search with keyword (modeling using VSM)
[Figure: the query "Data mining" and documents Doc1, Doc3, and Doc4 represented as binary term vectors]
Returns:
• Principles of Data Mining. DJ Hand, Drug Safety, 2007 (drugsafety.adisonline.com)
• Advances in Knowledge Discovery and Data Mining. UM Fayyad, G Piatetsky-Shapiro, P Smyth, R…
• Data Mining: Concepts and Techniques. J Han, M Kamber, 2001…
Search with semantic modeling (modeling using semantic topics)
The query "Data mining" is mapped to topics: Data mining (0.4), Association Rules (0.2), Database systems (0.15), Data management (0.1), Web databases (0.05), Information systems (0.02)
Returns: experts, expertise conferences, and expertise papers
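The contrast on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the system's scoring code: keyword search is modeled as cosine similarity between binary term vectors (VSM), and semantic matching is modeled, as our own simplifying assumption, as the inner product of the query's and a document's topic distributions. The vocabulary and topic weights below are hypothetical.

```python
import math


def cosine(u, v):
    """VSM score: cosine similarity of two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def topic_match(query_topics, doc_topics):
    """Topic score (assumed form): sum over topics z of P(z|q) * P(z|d)."""
    return sum(p * doc_topics.get(z, 0.0) for z, p in query_topics.items())


# Hypothetical vocabulary: ["data", "mining", "association", "rules"]
query_vec = [1, 1, 0, 0]   # "data mining"
doc_vec = [0, 0, 1, 1]     # a paper on "association rules" that never says "data mining"

# Hypothetical topic distributions (weights echo the slide's example)
query_topics = {"data mining": 0.4, "association rules": 0.2}
doc_topics = {"association rules": 0.7, "data mining": 0.3}

print(cosine(query_vec, doc_vec))             # 0.0: no shared keywords
print(topic_match(query_topics, doc_topics))  # about 0.26: related through topics
```

The keyword model gives the association-rules paper a zero score, while the topic model still connects it to the query, which is the gap the slide's "semantic modeling" side is meant to close.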
Challenges
1. How to model the heterogeneous academic network?
2. How to capture the link information for ranking objects in the academic network?
[Figure: heterogeneous academic network linking papers, authors, and conferences through cite, write, co-write, co-author, PC member/chair, and publish relations]
Outline
• Previous Work
• Our Approach
  – Ranking with Topic Model and Random Walk
• Experimental Results
• Online System—ArnetMiner.org
Previous Work
[Figure: heterogeneous academic network linking papers, authors, and conferences through cite, write, co-author, PC member/chair, and publish relations]
• Search with keyword
  – Language Model [Zhai, 01], VSM, etc.
• Search with semantic topics
  – LSI [Berry, 95], pLSI [Hofmann, 99], LDA [Blei, 03] [Wei, 06], etc.
• Ranking
  – PageRank [Page, 99], HITS [Kleinberg, 99], PopRank [Nie, 05], Link Fusion [Xi, 04], AuthorRank [Liu, 05], etc.
• Combining links and contents
  – A Joint Probabilistic Model [Cohn and Hofmann, 01], Topical PageRank [Nie, 06], etc.
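The keyword baseline cited here as Language Model [Zhai, 01] is commonly implemented as query likelihood with Dirichlet-prior smoothing, P(w|d) = (tf(w,d) + μ·P(w|C)) / (|d| + μ). The sketch below assumes that smoothing choice and a default μ = 2000; the slides do not state which smoothing method or parameter value the experiments used, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def lm_score(query, doc, collection, mu=2000.0):
    """Query log-likelihood log P(q|d) with Dirichlet-prior smoothing.

    query, doc: token lists; collection: Counter of token counts over all documents.
    """
    tf = Counter(doc)
    doc_len = len(doc)
    coll_len = sum(collection.values())
    score = 0.0
    for w in query:
        p_coll = collection[w] / coll_len
        if p_coll == 0.0:
            continue  # assumption: ignore query terms absent from the whole collection
        p_doc = (tf[w] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_doc)
    return score


docs = {
    "d1": ["data", "mining", "association", "rules"],
    "d2": ["ontology", "alignment", "web", "service"],
}
collection = Counter(tok for toks in docs.values() for tok in toks)
query = ["data", "mining"]
ranking = sorted(docs, key=lambda d: lm_score(query, docs[d], collection, mu=10.0), reverse=True)
print(ranking)  # ['d1', 'd2']
```

A small μ is used in the toy example only because the documents are four tokens long; with μ large relative to document length, scores are pulled strongly toward the collection model.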
Outline
• Previous Work
• Our Approach
  – Ranking with Topic Model and Random Walk
• Experimental Results
• Online System—ArnetMiner.org
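Modeling the Academic Network using the Author-Conference-Topic (ACT) Model [Tang et al., 08]
[Figure: plate-notation graphical models of three variants, ACT1, ACT2, and ACT3, each relating authors, topics, words, and conferences; symbols include a_d, x, z, w, c, θ, Φ and the priors α, β; ACT1 also has ψ with prior μ, and ACT3 has η, σ²]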
Generative Story of ACT1 Model
• Generative process
Example paper: "Latent Dirichlet Co-clustering" by Shafiei and Milios
“We present a generative model for clustering documents and terms. Our model is a four-level hierarchical Bayesian model. We present efficient inference techniques based on Markov Chain Monte Carlo. We report results in document modeling, document and terms clustering …”
[Figure: topic distributions of the authors Shafiei and Milios over topics 1–4 (NLP, ML, DM, IR)]
One topic: P(c|z): ICDM 0.23, KDD 0.19, …; P(w|z): mining 0.23, clustering 0.19, classification 0.17, …
Another topic: P(c|z): ICML 0.23, NIPS 0.19, …; P(w|z): model 0.23, learning 0.19, boost 0.17, …
[Figure: words such as "clustering" and "inference", and conference stamps such as ICDM and NIPS, generated for the paper]
ACT Model 1
Generative process:
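[Figure: ACT1 graphical model in plate notation, with plates over D papers, N_d words per paper, A authors, and T topics; variables a_d, x, z, w, c; distributions θ, Φ, ψ with priors α, β, μ]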
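The generative process itself survives only as a figure. As a hedged sketch, the Python below samples a toy corpus under our reading of ACT1 in the ACT paper [Tang et al., 08]: each author x has a topic mixture θ_x ~ Dir(α); each topic z has a word distribution Φ_z ~ Dir(β) and a conference distribution ψ_z ~ Dir(μ); and for every word token of a paper, an author is picked uniformly from the paper's authors a_d, then a topic from θ_x, then a word from Φ_z and a conference stamp from ψ_z. The symmetric priors, their values, and the toy sizes are assumptions.

```python
import random


def dirichlet(alpha, k, rng):
    """Sample a k-dimensional symmetric Dirichlet(alpha) vector via gamma draws."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(g)
    return [x / s for x in g]


def generate_act1(papers, n_authors, n_topics, n_words, n_confs,
                  alpha=0.5, beta=0.1, mu=0.1, seed=0):
    """Sample an ACT1-style corpus (assumed reading of the model).

    papers: list of (author_ids, n_tokens).
    Returns, per paper, a list of (author, topic, word, conference) tuples.
    """
    rng = random.Random(seed)
    theta = [dirichlet(alpha, n_topics, rng) for _ in range(n_authors)]  # author -> topics
    phi = [dirichlet(beta, n_words, rng) for _ in range(n_topics)]       # topic -> words
    psi = [dirichlet(mu, n_confs, rng) for _ in range(n_topics)]         # topic -> conferences
    corpus = []
    for authors, n_tokens in papers:
        tokens = []
        for _ in range(n_tokens):
            x = rng.choice(authors)                                # author drawn uniformly from a_d
            z = rng.choices(range(n_topics), weights=theta[x])[0]  # topic from theta_x
            w = rng.choices(range(n_words), weights=phi[z])[0]     # word from Phi_z
            c = rng.choices(range(n_confs), weights=psi[z])[0]     # conference stamp from psi_z
            tokens.append((x, z, w, c))
        corpus.append(tokens)
    return corpus


corpus = generate_act1([([0, 1], 20), ([1], 5)],
                       n_authors=2, n_topics=3, n_words=10, n_confs=4)
print(corpus[1][:3])
```

Inference (recovering θ, Φ, ψ from observed papers) runs this story in reverse, typically with Gibbs sampling; the sketch only covers the forward direction.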
Integrating Topic Model into Random Walk
• Random walk over the academic network
• Modeling the academic network with topics
[Figure: the heterogeneous academic network (cite, write, co-author, PC member/chair, and publish links), with the question: random walk + topic model = ?]
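A minimal sketch of the random-walk half, assuming a PageRank-style walk over the heterogeneous network in which every edge from a node of type s to a node of type t is weighted by a per-relation parameter λ_st (d = paper, e = author, c = conference, following the λde, λcd, … labels on the combination slides), with out-weights normalized per node. The damping factor, the uniform teleport, the dead-end handling, and the toy graph are our assumptions, not details given on the slide.

```python
def random_walk(node_type, edges, lam, damping=0.85, iters=100, tol=1e-10):
    """PageRank-style walk over a typed graph.

    node_type: dict node -> type ('d' paper, 'e' author, 'c' conference)
    edges: iterable of directed (u, v) pairs
    lam: dict (type_u, type_v) -> weight of that relation (hypothetical values)
    Returns a dict node -> stationary score; scores sum to 1.
    """
    out = {u: [] for u in node_type}
    for u, v in edges:
        w = lam.get((node_type[u], node_type[v]), 0.0)
        if w > 0:
            out[u].append((v, w))
    n = len(node_type)
    rank = {u: 1.0 / n for u in node_type}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in node_type}  # uniform teleport
        dangling = 0.0
        for u, succ in out.items():
            total = sum(w for _, w in succ)
            if total == 0:
                dangling += rank[u]                        # dead end: spread its mass below
                continue
            for v, w in succ:
                new[v] += damping * rank[u] * w / total
        for u in node_type:
            new[u] += damping * dangling / n
        delta = sum(abs(new[u] - rank[u]) for u in node_type)
        rank = new
        if delta < tol:
            break
    return rank


node_type = {"p1": "d", "p2": "d", "tang": "e", "kdd": "c"}
edges = [
    ("p1", "p2"),                    # p1 cites p2
    ("tang", "p1"), ("p1", "tang"),  # write / written-by
    ("tang", "p2"), ("p2", "tang"),
    ("p1", "kdd"), ("kdd", "p1"),    # publish / published-in
    ("p2", "kdd"), ("kdd", "p2"),
]
lam = {("d", "d"): 1.0, ("d", "e"): 0.5, ("e", "d"): 0.5, ("d", "c"): 0.3, ("c", "d"): 0.3}
scores = random_walk(node_type, edges, lam)
print(sorted(scores, key=scores.get, reverse=True))
```

Because weights are normalized per node, only the ratios among the λ values leaving a given node type affect the walk.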
Combination Method 1
[Figure: heterogeneous graph made of the paper graph Gp (e.g., "Tree CRF…", "EOS…", "Association…"), the author graph Ge (e.g., Prof. Wang, Prof. Tang, Jing Zhang), and the conference graph Gc (e.g., ISWC, IJCAI, WWW), connected by transition weights λde, λed, λcd, λdc, λdd; a topic layer links the query "Data mining" to the objects]
• Stage 1: Random walk, producing a ranking score
• Stage 2: Topic-based relevance, producing a topic-based relevance score
• The two scores are combined by multiplication
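Combination Method 1 can be sketched as follows, assuming the random-walk scores have already been computed (Stage 1) and that the topic-based relevance (Stage 2) takes the common topic-model retrieval form P(q|d) = ∏_{w∈q} Σ_z P(w|z)·P(z|d). The final score is the product of the two, as the slide states. The function names, topic distributions, and scores below are hypothetical.

```python
def topic_relevance(query, theta_d, phi):
    """P(q|d) = product over query words w of sum_z P(w|z) * P(z|d).

    theta_d: list with theta_d[z] = P(z|d); phi: list of dicts with phi[z][w] = P(w|z).
    """
    p = 1.0
    for w in query:
        p *= sum(theta_d[z] * phi[z].get(w, 0.0) for z in range(len(theta_d)))
    return p


def method1_rank(query, rw_score, theta, phi):
    """Stage 1 score (random walk) times Stage 2 score (topic relevance), best first."""
    combined = {d: rw_score[d] * topic_relevance(query, theta[d], phi) for d in rw_score}
    return sorted(combined, key=combined.get, reverse=True), combined


phi = [
    {"ontology": 0.5, "alignment": 0.5},  # topic 0
    {"mining": 0.6, "rules": 0.4},        # topic 1
]
theta = {"p1": [0.9, 0.1], "p2": [0.1, 0.9]}
rw_score = {"p1": 0.3, "p2": 0.7}         # p2 is more central in the network
order, combined = method1_rank(["ontology", "alignment"], rw_score, theta, phi)
print(order)  # ['p1', 'p2']: topical relevance outweighs p2's higher random-walk score
```

Multiplying keeps the two stages independent: the walk can be precomputed offline, and only the topic relevance depends on the query.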
Combination Method 2
Query: ontology alignment
[Figure: the paper graph Gp, author graph Ge, and conference graph Gc extended with a hidden theme graph Gt (topic labels include "posowl" and "Web service"); transition weights λde, λed, λcd, λdc, λdd, λtd, λdt, λqt, λtq connect the query, the topics, and the objects]
[Equations: ranking score and transition probability]
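In Combination Method 2 the topics become nodes of the graph (the hidden theme graph Gt), so topical relevance is propagated by the walk itself instead of being multiplied in afterwards. The sketch below is a simplified, hypothetical version: it keeps only the query node, topic nodes, and paper nodes; sets each query-topic and topic-paper edge weight to a λ value times P(z|q) or P(z|d); and ranks papers by a random walk with restart at the query. The restart probability, the λ values, and this edge-weight form are assumptions, not the paper's exact transition probabilities.

```python
def build_theme_graph(query_topics, doc_topics, lam_qt=1.0, lam_tq=1.0, lam_td=1.0, lam_dt=1.0):
    """Weighted adjacency: node -> list of (neighbor, weight).

    Topic nodes are ("t", z); the query node is "q"; papers keep their ids.
    """
    graph = {"q": []}
    for z, p in query_topics.items():
        graph["q"].append((("t", z), lam_qt * p))                 # query -> topic
        graph.setdefault(("t", z), []).append(("q", lam_tq * p))  # topic -> query
    for d, topics in doc_topics.items():
        for z, p in topics.items():
            graph.setdefault(("t", z), []).append((d, lam_td * p))  # topic -> paper
            graph.setdefault(d, []).append((("t", z), lam_dt * p))  # paper -> topic
    return graph


def walk_with_restart(graph, source, restart=0.15, iters=200, tol=1e-12):
    """Random walk that jumps back to `source` with probability `restart` at each step."""
    nodes = set(graph)
    for succ in graph.values():
        nodes.update(v for v, _ in succ)
    score = {u: 0.0 for u in nodes}
    score[source] = 1.0
    for _ in range(iters):
        new = {u: 0.0 for u in nodes}
        new[source] += restart
        for u, mass in score.items():
            succ = graph.get(u, [])
            total = sum(w for _, w in succ)
            if total == 0:
                new[source] += (1.0 - restart) * mass  # dead end: return to the query
                continue
            for v, w in succ:
                new[v] += (1.0 - restart) * mass * w / total
        delta = sum(abs(new[u] - score[u]) for u in nodes)
        score = new
        if delta < tol:
            break
    return score


query_topics = {"ontology": 0.9, "web service": 0.1}  # hypothetical P(z|q)
doc_topics = {
    "p1": {"ontology": 0.8, "web service": 0.2},      # hypothetical P(z|d)
    "p2": {"ontology": 0.1, "web service": 0.9},
}
scores = walk_with_restart(build_theme_graph(query_topics, doc_topics), "q")
print(scores["p1"] > scores["p2"])  # True
```

Author and conference nodes could be attached the same way with their own λ weights; they are omitted here to keep the sketch short.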
Outline
• Previous Work
• Our Approach
  – Ranking with Topic Model and Random Walk
• Experimental Results
• Online System—ArnetMiner.org
Experimental Setting
• ArnetMiner data (http://arnetminer.org)
  – 14,134 authors, 10,716 papers, 1,434 conferences/journals
  – and the relationships between them
• Evaluation measures
  – pooled relevance + human judgment
  – P@5, P@10, P@20, R-pre, MAP
• Baselines
  – Language Model (LM)
  – LDA
  – Author Topic (AT)
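The evaluation measures listed above have standard definitions; a minimal sketch follows, assuming each query comes with a ranked result list and a set of items judged relevant (here, the pooled relevance set). P@k is the fraction of relevant items in the top k, R-pre is precision at rank R where R is the number of relevant items, and MAP averages, over queries, the average precision of each ranked list. The example rankings and judgments are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """P@k: fraction of the top-k results that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def r_precision(ranked, relevant):
    """R-pre: precision at rank R, where R = number of relevant items."""
    return precision_at_k(ranked, relevant, len(relevant)) if relevant else 0.0


def average_precision(ranked, relevant):
    """Sum of P@i over ranks i holding a relevant item, divided by |relevant|."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over queries; runs is a list of (ranked, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranked = ["a", "b", "c", "d", "e"]  # hypothetical ranked experts
relevant = {"a", "c", "f"}          # hypothetical pooled relevance judgments
print(precision_at_k(ranked, relevant, 5))  # 0.4
print(r_precision(ranked, relevant))        # 0.6666666666666666
print(average_precision(ranked, relevant))  # (1/1 + 2/3) / 3, about 0.556
```

Dividing average precision by |relevant| rather than by the number of retrieved relevant items means that relevant items missing from the list (here "f") lower the score.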
Discovered Topics
200 topics have been discovered automatically from the academic network
Expertise Search Results
Expertise Search Results (cont.)
Online System: ArnetMiner (http://arnetminer.org)
[Screenshot: researcher profile showing basic profile information, publications, social graph, and user interests and their evolution; expertise search results listing experts, expertise conferences, and expertise papers]
Outline
• Previous Work
• Our Approach
  – Ranking with Topic Model and Random Walk
• Experimental Results
• Conclusion & Future Work
Conclusion & Future Work
• Investigate the problem of modeling the heterogeneous academic network using a unified probabilistic model.
• Propose two methods to combine topic models with the random walk framework for academic search.
• Experimental results show that our approach can significantly improve the performance of academic search.
• Our approach is general. Variations of the approach can be applied to many other applications such as social search and blog search.
Thanks!
Q&A & Demo
Homepage: http://keg.cs.tsinghua.edu.cn/persons/tj/
Online URL: http://arnetminer.org