Post on 03-Jan-2016
UBLF: An Upper Bound Based Approach to Discover Influential Nodes in Social Networks
Authors: C. Zhou, P. Zhang, J. Guo, X. Zhu, L. Guo
Presenter: Peng Zhang, Chinese Academy of Sciences
December 7-10, 2013, Dallas, Texas
IEEE ICDM 2013
Content
• Background
• Problem Formulation
• Related work
• Our solution
• Experiments
• Conclusion
Background
Social networks are widely used
– Viral marketing
– Information dissemination
– Technology/Idea transfers
Influence propagation
– Influence maximization
– Community detection
– Influence inference
– Early warning of public opinion
– Link Prediction/Friends Recommendation
– Partner Recommendation/Social Cooperation/Team Formation
Problem Formulation
Given a directed social graph G=(V,E), a budget k, and a stochastic propagation model M, find k seed nodes such that the expected influence spread is maximized [Kempe et al., KDD’03]
Challenges:
How to measure the objective function M(S)?
How to find the optimal solution, i.e., the subset S of the k most influential nodes?
Problem Formulation
How to measure the influence M(S)?
Stochastic propagation models
– IC model
– LT model
– Other propagation models, e.g. continuous-time IC or LT models
Monte Carlo (MC) simulation
Exact calculation under IC and LT is #P-hard (Chen et al., KDD’10).
IC propagation model
[Figure: example IC cascade on a graph with nine nodes labeled a to i; edges carry activation probabilities between 0.1 and 0.4. Computing the exact spread is #P-hard.]
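Because exact computation is #P-hard, M(S) is usually estimated by Monte Carlo simulation. The following is a minimal sketch of that idea under the IC model, not the authors' implementation. The adjacency format (node mapped to a list of `(neighbor, probability)` pairs), the function names, and the toy graph are assumptions made for illustration.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one IC cascade and return the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly activated node gets a single chance to activate
                # each of its still-inactive out-neighbors.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Estimate M(S) as the average cascade size over `runs` simulations."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs


# Hypothetical toy graph; the probabilities are illustrative only.
toy_graph = {
    "a": [("b", 0.4), ("c", 0.1)],
    "b": [("d", 0.2)],
    "c": [("d", 0.3)],
}
print(estimate_spread(toy_graph, ["a"]))
```

The estimate becomes more accurate as `runs` grows, which is exactly why each call to M(S) is expensive.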
Greedy Algorithm
How to find a subset S of k nodes containing the most influential nodes?
Influence maximization under both the IC and LT models is NP-hard (Kempe et al., KDD’03).
Property 1: M(S) is monotone: M(S) ≤ M(T) for all S ⊆ T ⊆ V
Property 2: M(S) is submodular: M(S ∪ {v}) − M(S) ≥ M(T ∪ {v}) − M(T) for all S ⊆ T ⊆ V and v ∉ T
The set cover problem
Greedy Algorithm
Advantage: Performance guarantee of 1 − 1/e ≈ 63%
Disadvantage: Heavy computation cost
– Inner loop: M(S) needs many Monte Carlo simulations
– Outer loop: time complexity of O(Nk), where N is the network size
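The two loops can be sketched as follows. This is an illustrative sketch only: `spread` stands for any estimator of M(S) (in practice a Monte Carlo routine), and the coverage-style toy function used here is a deterministic, hypothetical stand-in chosen so the example is reproducible.

```python
def greedy_seeds(nodes, k, spread):
    """Pick k seeds, each time adding the node with the largest marginal gain."""
    seeds = []
    current = 0.0
    for _ in range(k):                      # outer loop: k rounds
        best_node, best_gain = None, float("-inf")
        for v in nodes:                     # inner loop: up to N evaluations of M
            if v in seeds:
                continue
            gain = spread(seeds + [v]) - current
            if gain > best_gain:
                best_node, best_gain = v, gain
        if best_node is None:
            break
        seeds.append(best_node)
        current += best_gain
    return seeds


# Hypothetical stand-in for M(S): the number of nodes covered by the seed set.
reach = {
    "a": {"a", "b", "c"},
    "b": {"b", "c"},
    "c": {"c"},
    "d": {"d", "e"},
    "e": {"e"},
}


def coverage(seed_set):
    return len(set().union(*(reach[v] for v in seed_set)))


print(greedy_seeds(list(reach), 2, coverage))
```

Every round re-evaluates every remaining node, which gives the O(Nk) evaluations noted above.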
Improvement direction (I): Heuristic algorithms
• Heuristic algorithms
– ShortestPath: Kimura and Saito (PKDD’06) “Tractable models for information diffusion in social networks”
– DegreeDiscount: Chen et al. (KDD’09) “Efficient influence maximization in social networks”
– MIA: Chen et al. (KDD’10) “Scalable influence maximization for prevalent viral marketing in large-scale social networks”
– DAG: Chen et al. (ICDM’10) “Scalable influence maximization in social networks under the linear threshold model”
– SIMPATH: Goyal et al. (ICDM’11) “SIMPATH: An Efficient Algorithm for Influence Maximization under the Linear Threshold Model”
[Figure: ShortestPath example showing the shortest path from a to c]
[Figure: DegreeDiscount example in which node 2’s degree will shrink]
Advantage: faster than the Greedy algorithm
Disadvantage: no performance guarantee
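To make the "degree will shrink" idea concrete, here is a rough sketch of a degree-discount heuristic. It assumes an undirected graph with a uniform propagation probability `p` and uses the discount rule d − 2t − (d − t)·t·p as recalled from Chen et al. (KDD’09), where d is a node's degree and t its number of already-selected neighbors. The graph and function name are hypothetical.

```python
def degree_discount(adj, k, p):
    """Select k seeds by degree, discounting nodes adjacent to chosen seeds."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    discounted = dict(degree)
    seed_neighbors = {v: 0 for v in adj}
    seeds = []
    for _ in range(min(k, len(adj))):
        candidates = (v for v in adj if v not in seeds)
        u = max(candidates, key=lambda v: discounted[v])
        seeds.append(u)
        for v in adj[u]:
            if v in seeds:
                continue
            # v now has one more seed neighbor, so its effective degree shrinks.
            seed_neighbors[v] += 1
            d, t = degree[v], seed_neighbors[v]
            discounted[v] = d - 2 * t - (d - t) * t * p
    return seeds


# Hypothetical undirected graph: node 1 is a hub, node 2 is adjacent to it.
adj = {1: [2, 3, 4], 2: [1, 5], 3: [1], 4: [1], 5: [2, 6], 6: [5]}
print(degree_discount(adj, 2, 0.1))
```

In this toy graph nodes 2 and 5 both start with degree 2, but once node 1 is selected node 2 is discounted, so node 5 is picked second. No Monte Carlo simulation is involved, which explains both the speed and the lack of a guarantee.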
Improvement direction (II): Advanced greedy
• Advanced greedy algorithms
– CELF: Leskovec et al. (KDD’07) “Cost-effective outbreak detection in networks”
– CELF++: Goyal et al. (WWW’11) “CELF++: optimizing the greedy algorithm for influence maximization in social networks”
[Figure: step-by-step comparison of the Greedy algorithm and the CELF algorithm; bars show the reward of candidate nodes a to e]
Advantage: by setting up an upper bound, CELF reduces the Monte-Carlo calls and improves the greedy algorithm by up to 700 times
Disadvantage: needs N Monte Carlo simulations to initialize the upper bound, where N is the network size.
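The lazy evaluation behind CELF can be sketched as below. This is a simplified illustration, not the reference implementation: `spread` is any estimator of M(S), and the coverage-style toy function is a hypothetical deterministic stand-in. By submodularity, a marginal gain computed in an earlier round is an upper bound on the current gain, so only the top entry of the queue needs to be refreshed.

```python
import heapq


def celf(nodes, k, spread):
    """Lazy greedy: re-evaluate a node only when it reaches the top of the queue."""
    # First round: one evaluation of M({v}) per node (the N evaluations noted above).
    heap = [(-spread([v]), i, v, 0) for i, v in enumerate(nodes)]
    heapq.heapify(heap)
    seeds, current = [], 0.0
    while heap and len(seeds) < k:
        neg_gain, i, v, round_seen = heapq.heappop(heap)
        if round_seen == len(seeds):
            # The gain is fresh for this round and still the largest: select v.
            seeds.append(v)
            current += -neg_gain
        else:
            # Stale bound: recompute the marginal gain and push it back.
            gain = spread(seeds + [v]) - current
            heapq.heappush(heap, (-gain, i, v, len(seeds)))
    return seeds


# Hypothetical stand-in for M(S): the number of nodes covered by the seed set.
reach = {
    "a": {"a", "b", "c"},
    "b": {"b", "c"},
    "c": {"c"},
    "d": {"d", "e"},
    "e": {"e"},
}


def coverage(seed_set):
    return len(set().union(*(reach[v] for v in seed_set)))


print(celf(list(reach), 2, coverage))
```

The saving comes from the later rounds only; the initial heap still costs one evaluation per node.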
Our work
Motivation: Can we initialize the upper bounds without actually running the MC simulations?
CELF algorithm
Node  Upper bound  MC
a     2.1          1
b     1.5          1
c     1.1          1
d     1.8          1
e     1.2          1
UBLF algorithm
Node  Upper bound  MC
a     2.3          0
b     1.7          0
c     1.2          0
d     1.8          0
e     1.2          0
The upper bound of M(S)
Global view
Local view
Proposition 2 establishes a relationship between the activation probabilities at times t and t+1.
How many heads?
The upper bound of M(S)
M(S) is bounded by a sum of series.
Under what condition does the series converge, and what is its limit?
Its area?
But we know its upper bound!
Too hard!
Convergence condition: the total influence to or from any node is less than 1.
Under condition (14), we get a tractable upper bound.
The upper bound of M(S)
[Equation: the series sums to a closed-form upper bound]
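One way to compute such a bound is sketched below, assuming the series has the geometric form 1 + P·1 + P²·1 + …, where P is the matrix of edge propagation probabilities. Under IC this sum over all walks is an upper bound on the expected spread of a single node, and when it converges it equals (I − P)⁻¹·1. The function name, graph format, and tolerance are assumptions for illustration, and the exact form used on the slides may differ.

```python
def upper_bounds(graph, tol=1e-9, max_iter=10000):
    """Return {node: bound}, where bound is the sum over t of (P^t 1)[node]."""
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(v for v, _ in nbrs)
    bound = {v: 1.0 for v in nodes}
    for _ in range(max_iter):
        # One more term of the series: b <- 1 + P b.
        new = {
            u: 1.0 + sum(p * bound[v] for v, p in graph.get(u, []))
            for u in nodes
        }
        if max(abs(new[u] - bound[u]) for u in nodes) < tol:
            return new
        bound = new
    raise ValueError("series did not converge; check the convergence condition")


# Hypothetical chain a -> b -> c with probability 0.5 on each edge.
chain = {"a": [("b", 0.5)], "b": [("c", 0.5)]}
print(upper_bounds(chain))
```

On a tree-shaped graph such as this chain the bound is tight (1 + 0.5 + 0.25 = 1.75 for node a); with multiple paths or cycles it overestimates, which is acceptable because it is only used to order candidates.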
Our UBLF algorithm
• CELF: the first round is time-consuming because it needs full MC simulations.
• UBLF: the first round is calculated analytically.
Our work: An example for UBLF
Monte-Carlo Simulation
Node 1 is selected! (only one MC simulation)
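The selection step can be sketched as a lazy greedy loop whose queue starts from analytic upper bounds instead of simulated spreads. This is an illustrative sketch, not the authors' code. The numbers reuse the upper-bound columns from the CELF/UBLF comparison table earlier in the slides; treating the CELF column as the simulated single-node spreads, and using an additive `slide_spread`, are assumptions made only so the example runs.

```python
import heapq


def ublf(nodes, k, spread, bounds):
    """Lazy greedy initialized with analytic bounds; round -1 marks 'not simulated yet'."""
    heap = [(-bounds[v], i, v, -1) for i, v in enumerate(nodes)]
    heapq.heapify(heap)
    seeds, current = [], 0.0
    while heap and len(seeds) < k:
        neg_value, i, v, round_seen = heapq.heappop(heap)
        if round_seen == len(seeds):
            # Simulated in this round and still on top: select v.
            seeds.append(v)
            current += -neg_value
        else:
            # Only now pay for a simulation of the top candidate.
            gain = spread(seeds + [v]) - current
            heapq.heappush(heap, (-gain, i, v, len(seeds)))
    return seeds


# Analytic bounds (UBLF column) and assumed simulated spreads (CELF column).
slide_bounds = {"a": 2.3, "b": 1.7, "c": 1.2, "d": 1.8, "e": 1.2}
simulated = {"a": 2.1, "b": 1.5, "c": 1.1, "d": 1.8, "e": 1.2}


def slide_spread(seed_set):
    # Hypothetical additive stand-in for an MC estimate of M(S).
    return sum(simulated[v] for v in seed_set)


print(ublf(list(slide_bounds), 1, slide_spread, slide_bounds))
```

With these numbers the first seed is found after simulating only the top candidate: its simulated spread (2.1) still exceeds every other node's bound, so no further simulation is needed in that round.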
Experiments
• Data collection
– Ca-GrQc
– Digger
– Ca-HepPh
– Email-Enron
• Benchmark
– CELF
– Degree
– DegreeDiscount
– PageRank
– Random
Statistics of the datasets
Experiments
• Comparison results (Numbers of MC simulations)
Observation:
The total number of MC calls made by UBLF is significantly lower than that of CELF.
Experiments
• Comparison results (Influence spread)
Observations:
The spreads of UBLF and CELF are identical, which confirms that UBLF and CELF share the same logic in selecting nodes.
Experiments
• Comparison results (Time cost)
Observation:
UBLF is 2-5 times faster than CELF.
Conclusions
• Background
• Problem Formulation
• Greedy Algorithm
• Heuristic algorithms: DegreeDiscount, PageRank, et al.
• Advanced greedy algorithms: CELF, CELF++
• UBLF
• Comparisons
Email: zhangpeng@iie.ac.cn
Questions ?