KDD'09 Faloutsos, Miller, Tsourakakis P6-1
CMU SCS
Large Graph Mining:
Power Tools and a Practitioner’s guide
Task 6: Virus/Influence Propagation
Faloutsos, Miller, Tsourakakis
CMU
Outline
• Introduction – Motivation
• Task 1: Node importance
• Task 2: Community detection
• Task 3: Recommendations
• Task 4: Connection sub-graphs
• Task 5: Mining graphs over time
• Task 6: Virus/influence propagation
• Task 7: Spectral graph theory
• Task 8: Tera/peta graph mining: hadoop
• Observations – patterns of real graphs
• Conclusions
Detailed outline
• Epidemic threshold
– Problem definition
– Analysis
– Experiments
• Fraud detection in e-bay
Virus propagation
• How do viruses/rumors propagate?
• Blog influence?
• Will a flu-like virus linger, or will it
become extinct soon?
The model: SIS
• ‘Flu’ like: Susceptible-Infected-Susceptible
• Virus ‘strength’ s = β/δ
[Figure: node N with neighbors N1, N2, N3; node states: Infected, Healthy; transition probabilities: Prob. β, Prob. δ]
Epidemic threshold τ
of a graph: the value of τ, such that
if strength s = β/δ < τ
an epidemic cannot happen
Thus,
• given a graph
• compute its epidemic threshold
Epidemic threshold τ
What should τ depend on?
• avg. degree? and/or highest degree?
• and/or variance of degree?
• and/or third moment of degree?
• and/or diameter?
Epidemic threshold
• [Theorem 1] We have no epidemic, if
β/δ < τ = 1/λ1,A
Epidemic threshold
• [Theorem 1] We have no epidemic (*), if
β/δ < τ = 1/λ1,A
where
– β: attack prob.
– δ: recovery prob.
– τ: epidemic threshold
– λ1,A: largest eigenvalue of adj. matrix A
Proof: [Wang+03]
(*) under mild, conditional-independence assumptions
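As a minimal sketch of Theorem 1 in practice, the threshold can be computed directly from the adjacency matrix. The function names and the 4-node path graph below are hypothetical, chosen only for illustration, and the graph is assumed to be undirected (symmetric A).

```python
import numpy as np

def epidemic_threshold(adj):
    """Return tau = 1 / lambda_1 for a symmetric adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    # eigvalsh assumes a symmetric matrix and returns eigenvalues in ascending order
    lambda_1 = np.linalg.eigvalsh(adj)[-1]
    return 1.0 / lambda_1

def below_threshold(adj, beta, delta):
    """True if the virus strength beta/delta is under the epidemic threshold."""
    return beta / delta < epidemic_threshold(adj)

# Hypothetical toy graph: a path 0 - 1 - 2 - 3
path = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
tau = epidemic_threshold(path)
print(f"tau = {tau:.4f}")
print("beta=0.1, delta=0.5 below threshold:", below_threshold(path, 0.1, 0.5))
```

For large sparse graphs a full eigendecomposition is wasteful; an iterative method that finds only the leading eigenvalue would be the usual choice.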
Beginning of proof
Healthy @ t+1:
- ( healthy or healed )
- and not attacked @ t
Let: p(i, t) = Prob node i is sick @ t
1 – p(i, t+1) = (1 – p(i, t) + p(i, t) * δ) * Πj (1 – β * aji * p(j, t))
Below threshold, if the above non-linear dynamical system is ‘stable’ (eigenvalue of Hessian < 1)
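The step from this recurrence to the threshold can be sketched as follows. This is a reconstruction under the assumption of small infection probabilities and an undirected graph (symmetric A), not a transcription of the proof in [Wang+03]; the matrix name S is introduced here for illustration.

```latex
% Exact recurrence from the slide, with p_i(t) = p(i,t):
1 - p_i(t+1) = \bigl(1 - (1-\delta)\,p_i(t)\bigr)\,\prod_j \bigl(1 - \beta\,a_{ji}\,p_j(t)\bigr)

% For small p, drop all products of two or more probabilities:
p_i(t+1) \approx (1-\delta)\,p_i(t) + \beta \sum_j a_{ji}\,p_j(t)

% In vector form:
\mathbf{p}(t+1) \approx S\,\mathbf{p}(t), \qquad S = (1-\delta)\,I + \beta A

% The linearized system is stable when the largest eigenvalue of S is below 1:
\lambda_{1,S} = 1 - \delta + \beta\,\lambda_{1,A} < 1
\;\Longleftrightarrow\;
\frac{\beta}{\delta} < \frac{1}{\lambda_{1,A}} = \tau
```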
Epidemic threshold for various networks
Formula includes older results as special cases:
• Homogeneous networks [Kephart+White]
– λ1,A = <k>; τ = 1/<k> (<k> : avg degree)
• Star networks (d = degree of center)
– λ1,A = sqrt(d); τ = 1/ sqrt(d)
• Infinite power-law networks
– λ1,A = ∞; τ = 0 ; [Barabasi]
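The first two special cases can be checked numerically. The sketch below builds a star and a ring (a ring is used here as one simple example of a homogeneous network, where every node has degree 2); the helper names are hypothetical.

```python
import numpy as np

def largest_eigenvalue(adj):
    """Largest eigenvalue of a symmetric adjacency matrix."""
    return float(np.linalg.eigvalsh(np.asarray(adj, dtype=float))[-1])

def star_graph(d):
    """Adjacency matrix of a star: node 0 is the center, with d leaves."""
    adj = np.zeros((d + 1, d + 1))
    adj[0, 1:] = 1
    adj[1:, 0] = 1
    return adj

def ring_graph(n):
    """Adjacency matrix of a cycle on n nodes: every node has degree 2."""
    adj = np.zeros((n, n))
    idx = np.arange(n)
    adj[idx, (idx + 1) % n] = 1
    adj[(idx + 1) % n, idx] = 1
    return adj

star_lambda = largest_eigenvalue(star_graph(16))  # formula predicts sqrt(16)
ring_lambda = largest_eigenvalue(ring_graph(10))  # formula predicts <k> = 2
print("star:", star_lambda, "tau =", 1 / star_lambda)
print("ring:", ring_lambda, "tau =", 1 / ring_lambda)
```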
Epidemic threshold
• [Theorem 2] Below the epidemic threshold,
the epidemic dies out exponentially
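To illustrate the exponential die-out, the sketch below iterates the infection-probability recurrence from the proof slide on a hypothetical 5-node complete graph with β/δ below the threshold. The geometric rate 1 − δ + β·λ1,A used for comparison comes from a union-bound argument on that recurrence; it is an added observation, not a figure quoted from the slides.

```python
import numpy as np

def iterate_infection_probs(adj, beta, delta, p0, steps):
    """Iterate 1 - p_i(t+1) = (1 - p_i(t) + delta*p_i(t)) * prod_j (1 - beta*a_ji*p_j(t)).

    Returns the expected number of infected nodes (sum of p_i) at each step.
    """
    adj = np.asarray(adj, dtype=float)
    p = np.asarray(p0, dtype=float)
    totals = [float(p.sum())]
    for _ in range(steps):
        # Entry [j, i] of (adj * p[:, None]) is a_ji * p_j; the product over j
        # is the probability that node i is not attacked by any neighbor.
        not_attacked = np.prod(1.0 - beta * adj * p[:, None], axis=0)
        p = 1.0 - (1.0 - p + delta * p) * not_attacked
        totals.append(float(p.sum()))
    return totals

# Hypothetical toy setup: complete graph on 5 nodes, so lambda_1 = 4 and tau = 0.25
n = 5
complete = np.ones((n, n)) - np.eye(n)
beta, delta = 0.02, 0.4                       # beta/delta = 0.05 < tau
lambda_1 = float(np.linalg.eigvalsh(complete)[-1])
rate = 1.0 - delta + beta * lambda_1          # per-step geometric bound
totals = iterate_infection_probs(complete, beta, delta, np.full(n, 0.5), steps=30)
for t in (0, 10, 20, 30):
    print(t, totals[t], totals[0] * rate ** t)
```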
Detailed outline
• Epidemic threshold
– Problem definition
– Analysis
– Experiments
• Fraud detection in e-bay
Current prediction vs. previous
The formula’s predictions are more accurate
[Figure: two panels (Oregon, Star); y-axis: number of infected nodes; x-axis: β/δ; labeled predictions: “Our” and “PL-3”]
Experiments (Oregon)
[Figure: number of infected nodes (y-axis, 0 to 500) vs. time (x-axis, 0 to 1000) on the Oregon graph; β = 0.001; δ = 0.05, 0.06, 0.07]
• β/δ > τ (above threshold)
• β/δ = τ (at the threshold)
• β/δ < τ (below threshold)
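An experiment of this kind can be sketched with a small discrete-time SIS simulator. This is a minimal illustration, not the authors' experimental code: the Oregon graph is not reproduced here, and the function name, the seed, and the toy complete graph are all assumptions. The update rule follows the model above: a node is infected at t+1 if it was infected and did not recover, or if it was attacked by an infected neighbor.

```python
import numpy as np

def simulate_sis(adj, beta, delta, initially_infected, steps, seed=0):
    """Run a discrete-time SIS simulation; return the infected count per step."""
    rng = np.random.default_rng(seed)
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    infected = np.zeros(n, dtype=bool)
    infected[list(initially_infected)] = True
    counts = [int(infected.sum())]
    for _ in range(steps):
        # Each infected neighbor attacks independently with prob. beta
        infected_neighbors = adj.T @ infected
        p_attacked = 1.0 - (1.0 - beta) ** infected_neighbors
        attacked = rng.random(n) < p_attacked
        # Each infected node recovers with prob. delta
        recovered = infected & (rng.random(n) < delta)
        infected = (infected & ~recovered) | attacked
        counts.append(int(infected.sum()))
    return counts

# Hypothetical toy graph: complete graph on 5 nodes
k5 = np.ones((5, 5)) - np.eye(5)
no_spread = simulate_sis(k5, beta=0.0, delta=1.0, initially_infected=[0, 1], steps=3)
full_spread = simulate_sis(k5, beta=1.0, delta=0.0, initially_infected=[0], steps=2)
print(no_spread)
print(full_spread)
```

Sweeping δ for a fixed β on a real graph, and plotting the returned counts against time, would give curves of the kind described in the figure.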
SIS simulation - # infected nodes vs time
[Figure: Log - Lin plot; x-axis: time (linear scale); y-axis: # infected (log scale); curves: above, at, below]
SIS simulation - # infected nodes vs time
[Figure: Log - Lin plot; x-axis: time (linear scale); y-axis: # infected (log scale); curves: above, at, below; annotation: Exponential decay]
SIS simulation - # infected nodes vs time
[Figure: Log - Log plot; x-axis: time (log scale); y-axis: # infected (log scale); curves: above, at, below]
SIS simulation - # infected nodes vs time
[Figure: Log - Log plot; x-axis: time (log scale); y-axis: # infected (log scale); curves: above, at, below; annotation: Power-law decay (!)]
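The reason for plotting the same curves on both scales is that exponential decay is a straight line on log-lin axes, while power-law decay is a straight line on log-log axes. A rough sketch of that check, using synthetic series rather than the simulation data from the slides (the function names and parameters are hypothetical):

```python
import numpy as np

def straight_line_residual(x, y):
    """Sum of squared residuals of the least-squares line y ~ slope*x + intercept."""
    slope, intercept = np.polyfit(x, y, 1)
    return float(np.sum((y - (slope * x + intercept)) ** 2))

def classify_decay(times, infected):
    """Label a positive, decaying series by the scale on which it looks straighter."""
    times = np.asarray(times, dtype=float)
    infected = np.asarray(infected, dtype=float)
    log_lin = straight_line_residual(times, np.log(infected))
    log_log = straight_line_residual(np.log(times), np.log(infected))
    return "exponential" if log_lin < log_log else "power-law"

# Synthetic series for illustration only
t = np.arange(1, 201, dtype=float)
exp_series = 500.0 * np.exp(-0.05 * t)
pl_series = 500.0 * t ** -1.5
print(classify_decay(t, exp_series))
print(classify_decay(t, pl_series))
```

On noisy simulation output the counts can reach zero, where the logarithm is undefined, so a real analysis would need to truncate or average the series first.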
Detailed outline
• Epidemic threshold
• Fraud detection in e-bay
E-bay Fraud detection
w/ Polo Chau &
Shashank Pandit, CMU
NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks, S. Pandit, D. H. Chau, S. Wang, and C. Faloutsos (WWW'07), pp. 201-210
E-bay Fraud detection
• lines: positive feedbacks
• would you buy from him/her?
E-bay Fraud detection
• lines: positive feedbacks
• would you buy from him/her?
• or him/her?
E-bay Fraud detection - NetProbe
Belief Propagation gives:
Conclusions
• λ1,A : Eigenvalue of adjacency matrix
determines the survival of a flu-like virus
– It gives a measure of how well connected the graph is
(~ # paths – see Task 7, later)
– May guide immunization policies
• [Belief Propagation: a powerful algo]
References
• D. Chakrabarti, Y. Wang, C. Wang, J.
Leskovec, and C. Faloutsos, Epidemic
Thresholds in Real Networks, in ACM
TISSEC, 10(4), 2008
• Ganesh, A., Massoulie, L., and Towsley, D.,
2005. The effect of network topology on the
spread of epidemics. In INFOCOM.
• Hethcote, H. W. 2000. The mathematics of
infectious diseases. SIAM Review 42, 599–653.
• Hethcote, H. W. and Yorke, J. A. 1984.
Gonorrhea Transmission Dynamics and
Control. Vol. 56. Springer. Lecture Notes in
Biomathematics.
• Y. Wang, D. Chakrabarti, C. Wang and C.
Faloutsos, Epidemic Spreading in Real
Networks: An Eigenvalue Viewpoint, in
SRDS 2003 (pages 25-34), Florence, Italy