Slide 1
Graph Data Management Lab, School of Computer Science
GDM@FUDAN www.gdm.fudan.edu.cn Email: [email protected]
2012, Bristol, UK
Which Topic will You Follow?
Deqing Yang, Yanghua Xiao, Bo Xu, Hanghang Tong, Wei Wang, Sheng Huang
School of Computer Science, Fudan University, China
ECML-PKDD 2012
Slide 2
Outline: Introduction, Preliminaries, Empirical Study, Modeling
Slide 3
Who are the most appropriate candidates to receive a call for papers or a call for participation? How can you deliver call-for-papers emails to the authors who are interested in the proposed topic instead of flooding everyone's inbox blindly?
Slide 4
What session topics should we propose for next year's conference? Furthermore, how many sessions does a given topic need?
Slide 5
Can we predict the topic of an author's next paper?
Slide 6
Basic Idea
Use features of authors in the Scientific Collaboration Network (SCN) to model authors' topic-following behavior.
Two candidate features:
Social influence: an individual tends to adopt the behaviors of his neighbors or friends.
Homophily: the tendency of individuals to choose friends with similar characteristics.
Slide 7
Contributions
Verify that social influence and homophily are the two factors determining topic diffusion in the SCN.
Propose a Multiple Logistic Regression (MLR) model to predict authors' topic-following behavior.
Conduct extensive experiments showing that our model achieves good prediction performance.
Slide 8
Outline: Introduction, Preliminaries, Empirical Study, Modeling
Slide 9
Scientific Collaboration Network (SCN)
A temporal, undirected, edge-weighted graph:
Vertex: author
Edge: coauthoring relationship
Edge weight: number of papers coauthored by the two endpoints of the edge
Settings: DBLP dataset, 25 representative topics
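As an illustration of the definition above, here is a minimal sketch of building one yearly snapshot of such a graph. The input format (a list of `(year, [authors])` pairs) and the names `build_scn` and `neighbors` are hypothetical; parsing DBLP and labeling topics are out of scope. "Temporal" is modeled here simply as taking a snapshot up to a given year.

```python
from collections import defaultdict
from itertools import combinations

def build_scn(papers, up_to_year):
    """Snapshot of the SCN up to and including `up_to_year`.
    papers: iterable of (year, [author, ...]) pairs (hypothetical input format).
    Returns {(a, b): weight} with a < b, where weight = number of papers a and b coauthored."""
    weight = defaultdict(int)
    for year, authors in papers:
        if year > up_to_year:
            continue  # the SCN is temporal: later papers are not yet visible
        # Deduplicate and sort so each undirected edge gets a single canonical key.
        for a, b in combinations(sorted(set(authors)), 2):
            weight[(a, b)] += 1
    return dict(weight)

def neighbors(scn, u):
    """N(u): every author who shares at least one edge with u."""
    return {b if a == u else a for (a, b) in scn if u in (a, b)}

papers = [
    (2005, ["alice", "bob"]),
    (2006, ["alice", "bob", "carol"]),
    (2010, ["bob", "dave"]),
]
scn = build_scn(papers, 2008)
print(scn)  # {('alice', 'bob'): 2, ('alice', 'carol'): 1, ('bob', 'carol'): 1}
print(neighbors(scn, "bob"))
```

Keying each edge by a sorted author pair keeps the graph undirected without storing both directions.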
Slide 10
Homophily
We use topic similarity to characterize homophily. A 25-dimensional vector $\vec{u}$ represents author u's topic history: its i-th dimension is the number of papers u has published on the i-th topic.
Topic similarity between two authors u and v:
Topic similarity between an author u and a group of authors U:
Here $\vec{U}$ is also a 25-dimensional vector, whose i-th dimension is the number of papers on the i-th topic published by all users in U.
Slide 11
Outline: Introduction, Preliminaries, Empirical Study, Modeling
Slide 12
Driving Forces of Topic-Following
$U = U_0 \cup V_0$, $U_0 \cap V_0 = \emptyset$
$U_0$: the users who have published papers on a given topic before a certain year
$V_0 = U_1 \cup U_2 \cup U_3 \cup U_4$
$N(u)$ is the neighbor set of u
$U_1$: affected by both social influence and homophily
$U_2$: affected only by social influence
$U_3$: affected only by homophily
$U_4$: affected by neither force
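The exact membership conditions for $U_1$ to $U_4$ were lost from the slide. The sketch below uses a plausible reading, which is an assumption: u is exposed to social influence if some neighbor in N(u) is already in $U_0$, and to homophily if u's precomputed topic similarity to $U_0$ exceeds a hypothetical threshold. The function name and the threshold parameter are illustrative only.

```python
def partition_non_adopters(users, adopters, neighbors, ts, threshold=0.0):
    """Split V0 = users - U0 into the four groups U1..U4 from the slide.

    users:     set of all authors U
    adopters:  U0, authors who published on the topic before the cut-off year
    neighbors: dict author -> set of coauthors, i.e. N(u)
    ts:        dict author -> precomputed topic similarity of u to U0
    ASSUMPTION: u is exposed to social influence iff N(u) shares an author with U0,
    and to homophily iff ts[u] > threshold (hypothetical cut-off).
    """
    groups = {"U1": set(), "U2": set(), "U3": set(), "U4": set()}
    for u in users - adopters:
        social = bool(neighbors.get(u, set()) & adopters)
        homophilous = ts.get(u, 0.0) > threshold
        if social and homophilous:
            groups["U1"].add(u)  # both forces
        elif social:
            groups["U2"].add(u)  # social influence only
        elif homophilous:
            groups["U3"].add(u)  # homophily only
        else:
            groups["U4"].add(u)  # neither force
    return groups

users = {"a", "b", "c", "d", "e"}
adopters = {"a"}
neighbors = {"b": {"a"}, "c": {"a"}, "d": {"e"}, "e": {"d"}}
ts = {"b": 0.6, "c": 0.0, "d": 0.3, "e": 0.0}
groups = partition_non_adopters(users, adopters, neighbors, ts)
print(groups)  # {'U1': {'b'}, 'U2': {'c'}, 'U3': {'d'}, 'U4': {'e'}}
```

Because every non-adopter falls into exactly one branch, the four groups are disjoint and together form $V_0$.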
Slide 13
Driving Forces of Topic-Following (cont.)
The two forces are mixed together in their impact on topic-following.
Their impacts are time-sensitive and decrease exponentially over time.
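The transcript does not say how the decay was measured. Assuming the impact is some positive quantity observed at each time lag (a hypothetical setup), exponential decay means impact ≈ a·e^(−λ·lag), so ln(impact) is linear in the lag. One simple way to check this is a least-squares line fit on the logarithm, sketched here:

```python
import math

def fit_exponential_decay(lags, impacts):
    """Fit impact ~ a * exp(-lam * lag) by least squares on ln(impact).
    Exponential decay means ln(impact) is linear in the lag with slope -lam.
    Requires positive impacts and at least two distinct lags."""
    logs = [math.log(y) for y in impacts]
    n = len(lags)
    mean_x = sum(lags) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in lags)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lags, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic, noise-free data generated from a = 0.5, lam = 0.7 (not figures from the talk).
lags = [1, 2, 3, 4, 5]
impacts = [0.5 * math.exp(-0.7 * k) for k in lags]
a, lam = fit_exponential_decay(lags, impacts)
print(round(a, 3), round(lam, 3))  # 0.5 0.7
```

On real data, a good linear fit on the log scale is evidence of exponential decay, and λ summarizes how quickly the influence fades.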
Slide 14
Social Influence
An author is more likely to adopt a topic when more of his neighbors have followed the topic before.
An author is more likely to follow topics that have been adopted by neighbors (direct propagation) with whom he has coauthored more papers.
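Both observations can be expressed as raw quantities per author: how many coauthors already adopted the topic, and how many papers the author wrote with them. The following sketch computes those two counts. The name `social_exposure` and the adjacency-dict input are hypothetical, and this is not the talk's $F_{SI}$ definition, which is not recoverable from the transcript.

```python
def social_exposure(adj, u, adopters):
    """Return (k, w) for author u:
    k = number of u's coauthors who already adopted the topic,
    w = total number of papers u coauthored with those adopting coauthors.
    adj: dict author -> {coauthor: number of coauthored papers} (SCN edge weights)."""
    adopting = {v: weight for v, weight in adj.get(u, {}).items() if v in adopters}
    return len(adopting), sum(adopting.values())

adj = {
    "u1": {"a": 3, "b": 1, "c": 2},
    "u2": {"a": 1, "d": 1},
}
adopters = {"a", "b"}
print(social_exposure(adj, "u1", adopters))  # (2, 4)
print(social_exposure(adj, "u2", adopters))  # (1, 1)
```

By the observations on this slide, u1 (more adopting neighbors, stronger ties to them) would be expected to follow the topic with higher probability than u2.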
Slide 15
Outline: Introduction, Preliminaries, Empirical Study, Modeling
Slide 16
Model Variables
Model selection: a two-category (binary) classification problem, addressed with a Multiple Logistic Regression (MLR) model
Explanatory variables:
Social influence ($F_{SI}$): author u's tendency to follow topic s in year t
Slide 17
Model Variables (cont.)
Explanatory variables:
Homophily: with respect to the users who have followed topic s before year t (i.e., $U_0$), we measure u's homophily, $F_{TS}$, as u's topic similarity to that group.
Then, the whole MLR model combines $F_{SI}$ and $F_{TS}$.
Baseline (Anagnostopoulos et al., 2008)
Slide 18
Parameter Estimation
By maximum likelihood (training set: [2004, 2008])
$\beta_2$ has a larger Wald value than $\beta_1$, indicating that $F_{TS}$ (homophily) has a stronger impact on topic-following behavior than $F_{SI}$ (social influence).
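The model formula itself was lost from the transcript. Assuming the standard two-variable form logit P(u follows s in year t) = β0 + β1·F_SI + β2·F_TS (an assumption consistent with the coefficients named above), here is a pure-Python sketch of maximum-likelihood fitting by Newton-Raphson, reporting Wald statistics in their chi-square form (β_j / SE_j)², with standard errors taken from the inverse Fisher information. Function names are hypothetical; a real analysis would more likely use a statistics package.

```python
import math

def _invert(m):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def _score_and_information(rows, y, beta):
    """Gradient of the log-likelihood and Fisher information X^T W X at beta."""
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for r, t in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
        for i in range(k):
            score[i] += (t - p) * r[i]
            for j in range(k):
                info[i][j] += p * (1.0 - p) * r[i] * r[j]
    return score, info

def fit_mlr(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    X: rows of explanatory variables, e.g. [F_SI, F_TS]; an intercept is prepended.
    y: 0/1 labels (1 = the author followed the topic).
    Returns (beta, wald) with wald[j] = (beta[j] / SE[j]) ** 2."""
    rows = [[1.0] + [float(v) for v in x] for x in X]
    beta = [0.0] * len(rows[0])
    for _ in range(iters):
        score, info = _score_and_information(rows, y, beta)
        cov = _invert(info)
        # Newton step: beta <- beta + I(beta)^-1 * score(beta)
        beta = [b + sum(c * s for c, s in zip(cov_row, score)) for b, cov_row in zip(beta, cov)]
    _, info = _score_and_information(rows, y, beta)
    cov = _invert(info)  # estimated covariance of beta = inverse Fisher information
    wald = [b * b / cov[j][j] for j, b in enumerate(beta)]
    return beta, wald

# Toy check with one binary predictor, where the MLE has a closed form:
# intercept = ln(1/3), slope = log odds ratio = 2 ln 3, SE^2 of slope = 1/3 + 1 + 1 + 1/3.
X = [[0]] * 4 + [[1]] * 4
y = [0, 0, 0, 1, 0, 1, 1, 1]
beta, wald = fit_mlr(X, y)
print(beta, wald)
```

With columns ordered as [F_SI, F_TS], comparing `wald[1]` with `wald[2]` reproduces the kind of comparison made on this slide; perfectly separable training data would make the MLE diverge.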
Slide 19
Evaluation Results
Model evaluation metrics (testing set: 2009): recall/sensitivity, specificity, precision, accuracy, AUC (area under the ROC curve)
Results for topic XML: AUC 0.743 vs. 0.638
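For reference, here is a sketch of the listed metrics computed from predicted probabilities. The 0.5 decision threshold is an assumption, F1 is included as one common F-measure, and AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count one half). The labels and scores in the example are made up.

```python
def classification_metrics(y_true, scores, threshold=0.5):
    """Threshold predicted probabilities and compute the metrics listed on the slide,
    plus F1 (used here as the F-measure). Undefined ratios are returned as 0.0."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "accuracy": ratio(tp + tn, len(y_true)),
        "f1": ratio(2 * precision * recall, precision + recall),
    }

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.7]
m = classification_metrics(y_true, scores)
print(m)
print(auc(y_true, scores))  # 8 of 9 positive/negative pairs are ranked correctly
```

Unlike the threshold-based metrics, AUC depends only on how the model ranks authors, which is why it is a natural headline number for comparing two models.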
Slide 20
Evaluation Results (cont.)
For 4 other representative topics, MLR outperforms the baseline in both accuracy and F-measure.
Slide 21
Thank you! Questions are welcome.