Making Diffusion Work for You
B. Aditya PrakashComputer Science
Virginia Tech.
GraphEx Symposium, MIT Endicott House, Aug 21, 2014
Prakash 2014 2
Thanks!
• Ali Pinar• Ben Miller
Prakash 2014 3
Networks are everywhere!
Human Disease Network [Barabasi 2007]
Gene Regulatory Network [Decourty 2008]
Facebook Network [2010]
The Internet [2005]
Prakash 2014 4
Dynamical Processes over networks are also everywhere!
Prakash 2014 5
Why do we care?• Social collaboration• Information Diffusion• Viral Marketing• Epidemiology and Public Health• Cyber Security• Human mobility • Games and Virtual Worlds • Ecology........
Prakash 2014 6
Why do we care? (1: Epidemiology)
• Dynamical Processes over networks[AJPH 2007]
CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts
Diseases over contact networks
SI Model
Prakash 2014 7
Why do we care? (1: Epidemiology)
• Dynamical Processes over networks
• Each circle is a hospital• ~3000 hospitals• More than 30,000 patients transferred
[US-MEDICARE NETWORK 2005]
Problem: Given k units of disinfectant, whom to immunize?
Prakash 2014 8
Why do we care? (1: Epidemiology)
CURRENT PRACTICE OUR METHOD
~6x fewer!
[US-MEDICARE NETWORK 2005]
Hospital-acquired inf. took 99K+ lives, cost $5B+ (all per year)
Prakash 2014 9
Why do we care? (2: Online Diffusion)
> 800m users, ~$1B revenue [WSJ 2010]
~100m active users
> 50m users
Prakash 2014 10
Why do we care? (2: Online Diffusion)
• Dynamical Processes over networks
Celebrity
Buy Versace™!
Followers
Social Media Marketing
Prakash 2014 11
Why do we care? (3: To change the world?)
• Dynamical Processes over networks
Social networks and Collaborative Action
Prakash 2014 12
High Impact – Multiple Settings
Q. How to squash rumors faster?
Q. How do opinions spread?
Q. How to market better?
epidemic out-breaks
products/viruses
transmit s/w patches
Prakash 2014 13
Research Theme
DATALarge real-world
networks & processes
ANALYSISUnderstanding
POLICY/ ACTIONManaging/
Utilizing
Prakash 2014 14
Research Theme – Public Health
DATAModeling # patient
transfers
ANALYSISWill an epidemic
happen?
POLICY/ ACTION
How to control out-breaks?
Prakash 2014 15
Research Theme – Social Media
DATAModeling Tweets
spreading
POLICY/ ACTION
How to market better?
ANALYSIS# cascades in
future?
Prakash 2014 16
In this talk
Q1: How to ‘zoom-out’ of graphs?
Q2: How to control out-breaks?
POLICY/ ACTIONUtilizing
Prakash 2014 17
In this talk
DATALarge real-world
networks & processes
Q3: How does ‘activity’ evolve over time?
Prakash 2014 18
Outline
• Motivation• Part 1: Policy and Action (Algorithms)• Part 2: Learning Models (Empirical Studies)• Conclusion
Prakash 2014 19
Part 1: Algorithms
• Q1: How to zoom-out of a network?• Q2: How to control out-breaks?
(Broad theme: Network Topology Manipulation)
Prakash 2014 20
“Zoom-out” of the network• “Zoom-out” of the cascade graph to get a
quick picture (= summarization)
Coarsening
Big graph
Zoom-out
A
FE
D
CB
Smaller representation of the network
A
CB
E
F
D
[Purohit, Prakash, et, al. SIGKDD 2014]
Prakash 2014 21
Challenges
• C1: How do we maintain diffusive characteristics when coarsening networks?
• C2: How do we merge node to get the coarse network?
• C3: how do we find the best node to merge fast?
Prakash 2014 22
C1: Modeling diffusion• Information spreads over networks– e.g.:, rumor/meme spreads over Twitter following
network
• Independent cascade model (IC) [Kempe+, KDD03]
– Weights pij: propagation prob. from i to j– Each node has only one chance to infect its neighbors
Meme spreading
Prakash 2014 23
Diffusive characteristics• First eigenvalue λ1 (of adjacency matrix) is sufficient
for most diffusion models. [Prakash et al. ICDM’12 selected for best papers]
λ1 is the epidemic threshold (will there be an epidemic?)
“Safe” “Vulnerable” “Deadly”
Increasing λ1 , Increasing vulnerability
Prakash 2014 24
C1: maintain diffusive characteristics
• Goal: maintain the diffusive characteristics of the original network in the coarsened network
Original network
coarsen
A
FE
D
CB
Coarsened network
A
CB
E
F
D
Make the coarsened network have the least change in the first eigenvalue
Prakash 2014 25
C2: How to merge nodes• Goal: Merge nodes of graph G to get the
coarsened graph that “approximates” G with respect to diffusion
Merge b and a can get the least change of λ1
Is this correct?
0.375!
Original network
Influence from d to b: 0.5Influence from d to a: 0.25Average: 0.375
Prakash 2014 26
• In general: C2: How to merge nodes
Merging a,b
Prakash 2014 27
Problem DefinitionGraph Coarsening Problem (GCP)• Given: a large graph G, and the reduction factor• Find: the best set of adjacent nodes • To minimize |λG-λH| where H is the coarsened graph
– i.e.: H has the least change in the first eigenvalue– we use to λG represent the first eigenvalue of G
Prakash 2014 28
C3: which nodes to merge
• Goal: – Find the best nodes to merge– Fast, scalable to large network
Original network
coarsen
A
FE
D
CB
Coarsened network
A
CB
E
F
D
Prakash 2014 29
Naive Greedy HeuristicStep:• Score every edge by the change in eigenvalue• Greedily choose the edge (a,b) with the least score, and
merge (a,b)• Re-evaluate the scores of every edge and repeat
• Too slow! O(m2) time to score all edges• Lose time benefits of analyzing the smaller
graph
Prakash 2014 30
CoarseNet: idea• Can we approximate the edge scores faster?
– Yes!
• Use matrix perturbation arguments to estimate (up to first order terms) the score of an edge in constant time (skipping details)
• Score all edges in O(m) time– Naive Heuristic: O(m2) time
Prakash 2014 31
CoarseNet: Complete algorithm
• Step1: compute scores for all edge pairs2: Merge nodes with smallest score3. Goto step 1 until αn nodes left
Original Network (weight=0.5)
Assigning scores
Merging edges
Coarsened Network
Prakash 2014 32
How do we perform?
The first eigenvalue gets preserved well up to large coarsening factors!
Amazon
(See more results in the paper)
DBLP
Higher is better
Prakash 2014 33
Application 1: Influence Maximization
• Methodology:Step 1: Coarsen the large social network using CoarsenNetStep 2: Solve influence maximization on the coarsened networkStep 3: Randomly select one node from each selected “supernode”
Step 1: Coarsen
A
CB
E
F
DStep 2: Solve influence maximization
A
CB
E
F
D
Step 3: Randomly select one node from C
We call it CSPIN
Prakash 2014 34
Quality of CSPIN w.r.t
We can merge up to 95% of the vertices are merged without significantly affecting the influence spread!
Prakash 2014 35
Application 2: Diffusion Characterization
• Goal: use Graph Coarsening to understand information cascades
• Dataset: Flixster– a fridendship network with movie ratings
– Cascade: the same movie rating from friends
• Methodology– coarsen the network using CoarseNet with the reduction
factor α=0.5– study the formed groups (supernodes)– Can get non-network surrogates
Prakash 2014 36
Diffusion observation
Observation 1: a very large fraction of movies propagate in a small number of groupsObservation 2: a multi-modal distribution
Stats: • 1891 groups • mean group size: 16.6 • the largest group: 22061
nodes (roughly 40% of nodes)
(See more results in the paper)
Prakash 2014 37
Future work…
• How is it related to community structure? • More applications, like Visualization…• Parallelization
Prakash 2014 38
Part 1: Algorithms
• Q1: How to zoom-out of a network?• Q2: How to control out-breaks?
Prakash 2014 39
Immunization (= Interventions)
• Different Flavors:– Pre-emptive– Data-aware
Prakash 2014 40
Pre-emptive: Vulnerability (Again!)
• First eigenvalue λ1 (of adjacency matrix) is sufficient for most diffusion models.
[Prakash et al. ICDM’12 selected for best papers]
λ1 is the epidemic threshold
“Safe” “Vulnerable” “Deadly”
Increasing λ1 , Increasing vulnerability
Prakash 2014 41
Goal
• Decrease λ1 as much as possible
• Node based [Tong, P., + ICDM 2010]• Edge-based [Tong, P., Eliassi-Rad+ CIKM 2012,
Best Paper Award]• Edge-Manipulation [P., Adamic+ SDM 2013]
Prakash 2014 42
Latest results
• First (provable) approximation algorithms for edge-based problem (under submission [Saha, Adiga, P., Vullikanti 2014])– O(log^2 n)--factor (can be improved to O(log n)) • Based on the idea of removing closed walks
– Semi-Definite Programming Rounding-based O(1) factor
Prakash 2014 43
Data-aware Immunization
Dominator tree
Graph with infected nodes
Given: Graph and Infected nodesFind: ‘best’ nodes for immunization• Complexity
– NP-hard– Hard to approximate within an absolute error
• DAVA-tree– Optimal solution on the tree
• DAVA and DAVA-fast– Merging infected nodes– Build a “dominator tree”, and run DAVA-tree
• Running time: subquadratic– DAVA: O(k(|E|+ |V|log|V|))– DAVA-fast: O(|E|+|V|log|V|)
[Zhang and Prakash, SDM 2014]
Prakash 2014 44
Extensions
• Can be extended to Uncertain and noisy initial data as well!
[Zhang and Prakash, CIKM 2014]
Twitter Firehose API1% sample
Prakash 2014 45
Outline
• Motivation• Part 1: Policy and Action (Algorithms)• Part 2: Learning Models (Empirical Studies)• Conclusion
Prakash 2014 46
Part 2: Empirical Studies
• Q3: How does activity evolve over time?
Prakash 2014 47
Google Search Volume
e.g., given (1) first spike, (2) release date of two sequel movies (3) access volume before the release date
? ?
(1) First spike (2) Release date (3) Two weeks before release
Prakash 2014 48
Patterns
X
Y
Prakash 2014 49
Patterns
X
Y
More Data
Prakash 2014 50
Patterns
X
YAnomaly
?
Prakash 2014 51
Patterns
X
YAnomaly
?
Extrapolation
Prakash 2014 52
Patterns
X
YAnomalyImputation
Extrapolation
Prakash 2014 53
Patterns
AnomalyImputation
Extrapolation
Compression
Prakash 2014 54
• Meme (# of mentions in blogs)– short phrases Sourced from U.S. politics in 2008
“you can put lipstick on a pig”
“yes we can”
Rise and fall patterns in social media
Prakash 2014 55
Rise and fall patterns in social media
• Can we find a unifying model, which includes these patterns?• four classes on YouTube [Crane et al. ’08]• six classes on Meme [Yang et al. ’11]
Prakash 2014 56
Rise and fall patterns in social media
• Answer: YES!
• We can represent all patterns by single model
In Matsubara, Sakurai, Prakash+ SIGKDD 2012
Prakash 2014 57
Main idea - SpikeM- 1. Un-informed bloggers (uninformed about rumor)- 2. External shock at time nb (e.g, breaking news)- 3. Infection (word-of-mouth)
Infectiveness of a blog-post at age n:
- Strength of infection (quality of news)
- Decay function (how infective a blog posting is)
Time n=0 Time n=nb Time n=nb+1
β
Power Law
Prakash 2014 58
-1.5 slopeJ. G. Oliveira et. al. Human Dynamics: The
Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005) . [PDF]
(also in Leskovec, McGlohon+, SDM 2007)
Prakash 2014 59
SpikeM - with periodicity
• Full equation of SpikeM
Periodicity
12pmPeak activity 3am
Low activity
Time n
Bloggers change their activity over time
(e.g., daily, weekly, yearly)
activity
Details
Prakash 2014 60
Tail-part forecasts
• SpikeM can capture tail part
Prakash 2014 61
“What-if” forecasting
e.g., given (1) first spike, (2) release date of two sequel movies (3) access volume before the release date
? ?
(1) First spike (2) Release date (3) Two weeks before release
Prakash 2014 62
“What-if” forecasting
–SpikeM can forecast not only tail-part, but also rise-part!
• SpikeM can forecast upcoming spikes
(1) First spike (2) Release date (3) Two weeks before release
Prakash 2014 63
Modeling Malware Penetration
• Worldwide Intelligence Network– Which machine got which malware (or legitimate files)– 1 Billion nodes– 37 Billion edges
• Q: Temporal patterns?
[Papalexakakis et. al. + 2013]
Prakash 2014 64
Q: Temporal Patterns
Looks familiar?
Prakash 2014 65
SpikeM again (or SharkFin)
7 parameters only!
~ 400 points ~ 400 points
Prakash 2014 66
Latent Propagation Patterns
Prakash 2014 67
Bonus: Protest Predictions
• Can Twitter provide a lead time?• South American twitter dataset– Language: Spanish/Portuguese– Idea
1. Look for trending keywords.2. Predict event type for protest using SpikeMparameters!
A political tweet
Violent Protest (VP)
Non Violent Protest (P)
[Sundereisan et al. ASONAM 2014][Jin et al. SIGKDD 2014]
VP
P
Outline
• Motivation• Part 1: Learning Models (Empirical Studies)• Part 2: Understanding Epidemics (Theory)• Part 3: Policy and Action (Algorithms)• Conclusion and Future Plans
Prakash 2014 68
Future Plans
DATALarge real-world
networks & processes
ANALYSISUnderstanding
POLICY/ ACTIONManaging
Prakash 2014 69
Scalability – Big Data
• Datasets of unprecedented scale– High dimensionality and sample size!
• Need scalable algorithms for – Learning Models– Developing Policy
• Leverage parallel systems– Map-Reduce clusters (like Hadoop) for data-intensive
jobs (more than 6000 machines) – Parallelized compute-intensive simulations (like Condor)
Prakash 2014 70
Uncertain Data in Cascade analysis
Original, Nodes sampled off
Culprits, and missing nodes filled in
Sundereisan, Vreeken, Prakash. 2014
Correcting for missing data Designing More Robust Immunization Policies
Zhang and Prakash. CIKM 2014
Prakash 2014 71
Social Biological Contagion
Automatically learnmodels
Chen, Prakash+. 2014
Prakash 2014 72
Prakash 2014 73
References1. Scalable Vaccine Distribution in Large Graphs given Uncertain Data (Yao Zhang and B. Aditya Prakash) -- In
CIKM 2014.2. Fast Influence-based Coarsening for Large Networks (Manish Purohit, B. Aditya Prakash, Chahhyun Kang, Yao
Zhang and V. S. Subrahmanian) – In SIGKDD 20143. DAVA: Distributing Vaccines over Large Networks under Prior Information (Yao Zhang and B. Aditya Prakash) --
In SDM 20144. Fractional Immunization on Networks (B. Aditya Prakash, Lada Adamic, Jack Iwashnya, Hanghang Tong, Christos
Faloutsos) – In SDM 20135. Spotting Culprits in Epidemics: Who and How many? (B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos) – In
ICDM 2012, Brussels Vancouver (Invited to KAIS Journal Best Papers of ICDM.)6. Gelling, and Melting, Large Graphs through Edge Manipulation (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-
Rad, Michalis Faloutsos, Christos Faloutsos) – In ACM CIKM 2012, Hawaii (Best Paper Award)7. Rise and Fall Patterns of Information Diffusion: Model and Implications (Yasuko Matsubara, Yasushi Sakurai, B.
Aditya Prakash, Lei Li, Christos Faloutsos) – In SIGKDD 2012, Beijing8. Interacting Viruses on a Network: Can both survive? (Alex Beutel, B. Aditya Prakash, Roni Rosenfeld, Christos
Faloutsos) – In SIGKDD 2012, Beijing9. Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni
Rosenfeld, Christos Faloutsos) – In WWW 2012, Lyon10. Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan
Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos) - In IEEE ICDM 2011, Vancouver (Invited to KAIS Journal Best Papers of ICDM.)
11. Times Series Clustering: Complex is Simpler! (Lei Li, B. Aditya Prakash) - In ICML 2011, Bellevue12. Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya
Prakash, Hanghang Tong, Michalis Faloutsos and Christos Faloutsos) – In IEEE NETWORKING 2011, Valencia, Spain
13. Formalizing the BGP stability problem: patterns and a chaotic model (B. Aditya Prakash, Michalis Faloutsos and Christos Faloutsos) – In IEEE INFOCOM NetSciCom Workshop, 2011.
14. On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad and Christos Faloutsos) – In IEEE ICDM 2010, Sydney, Australia
15. Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang Tong, Nicholas Valler, Michalis Faloutsos and Christos Faloutsos) – In ECML-PKDD 2010, Barcelona, Spain
16. MetricForensics: A Multi-Level Approach for Mining Volatile Graphs (Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash and Hanghang Tong) - In SIGKDD 2010, Washington D.C.
Prakash 2014 74
Acknowledgements
Collaborators Christos Faloutsos Roni Rosenfeld, Michalis Faloutsos, Lada Adamic, Theodore Iwashyna (M.D.), Dave Andersen, Tina Eliassi-Rad, Iulian Neamtiu,
Varun Gupta, Jilles Vreeken, V. S. Subrahmanian John Brownstein (M.D.)
Deepayan Chakrabarti, Hanghang Tong, Kunal Punera, Ashwin Sridharan, Sridhar Machiraju, Mukund Seshadri, Alice Zheng, Lei Li, Polo Chau, Nicholas Valler, Alex Beutel, Xuetao Wei
Prakash 2014 75
Acknowledgements
• Students Liangzhe Chen Shashidhar Sundereisan Benjamin Wang Yao Zhang
Prakash 2014 76
Acknowledgements
Funding
Prakash 2014 77
Analysis Policy/Action Data
Making Diffusion Work for You
B. Aditya Prakash http://www.cs.vt.edu/~badityap
Top Related