Searching for Influencers via Optimal Percolation: from ...€¦ · The superspreaders are the best...
Transcript of Searching for Influencers via Optimal Percolation: from ...€¦ · The superspreaders are the best...
Searching for Influencers via Optimal Percolation: from Twitter to the Brain
MAIN QUESTION: Where are the superspreaders of information?
Hernán Makse, Physics Department, City College of New York
NetSci2016
Friday, June 3, 16
Searching for Influencers via Optimal Percolation: from Twitter to the Brain
MAIN QUESTION: Where are the superspreaders of information?
in the kcore
?
hubs?
Hernán Makse, Physics Department, City College of New York
NetSci2016
Friday, June 3, 16
Searching for Influencers via Optimal Percolation: from Twitter to the Brain
MAIN QUESTION: Where are the superspreaders of information?
in the kcore
?
hubs?
ANSWER: Optimal Percolation and the Non-Backtracking Matrixuncover the Collective Influencers
Hernán Makse, Physics Department, City College of New York
NetSci2016
Friday, June 3, 16
Longstanding Problem in Network Science
Find the minimum number of influencers/superspreaders to “infect” the entire network
- Viral marketing: $$$- Predicting trends from social media- Immunization epidemics- Essential genes: gene regulation protein networks- Essential species: ecological networks- Brain Networks: neural correlates of consciousness - Financial networks “too big to fail” --> “too weak to fail”?- Predicting stock markets
Friday, June 3, 16
3
How to become viral in social media?EASY
Friday, June 3, 16
3
How to become viral in social media?EASY
50 million hits
1. Funny baby faces videos
lindosbebitos!!
Friday, June 3, 16
4
How to spread information in social media?EASY
50 million hits
2. Funny dog faces videos
FUNNY DOG VIDEOSlindos
perritos!!Friday, June 3, 16
5
3. Most successful viral superspreading event in the history of humankind
Gangnam style videoby Mr Psy
as of today: 2 billion hits(mainly teenagers)
Friday, June 3, 16
5
3. Most successful viral superspreading event in the history of humankind
Gangnam style videoby Mr Psy
as of today: 2 billion hits(mainly teenagers)
-‐Can we predict the onset of collec3ve behavior?-‐More is different, Phil Anderson, Nobel ’77
Friday, June 3, 16
Network Science: Introduction January 10, 2011
Years
Timescited
1. COMPLEX NETWORK SCIENCE: superspreading of ideas Courtesy of Gene Stanley-‐ Analogous to herding behavior in the stock market
Friday, June 3, 16
Network Science: Introduction January 10, 2011
Years
Timescited
1. COMPLEX NETWORK SCIENCE: superspreading of ideas Courtesy of Gene Stanley-‐ Analogous to herding behavior in the stock market
Friday, June 3, 16
Network Science: Introduction January 10, 2011
Years
Timescited “IrraMonal exuberance”
-‐ Chairman Greenspan (FED) on the dot-‐com bubble-‐ Keynes: “animal spirits” of excessive opMmism
1. COMPLEX NETWORK SCIENCE: superspreading of ideas Courtesy of Gene Stanley-‐ Analogous to herding behavior in the stock market
Friday, June 3, 16
THE ANSWER:Understanding BIG DATA
Friday, June 3, 16
THE ANSWER:Understanding BIG DATA
What is the Problem with Big Data?
Friday, June 3, 16
THE ANSWER:Understanding BIG DATA
What is the Problem with Big Data?
Eric Schmidt (CEO Google): “Every two years we create as much information (5 exabytes) as we did from the dawn of civilization until 2003”
Age of Big Data
Friday, June 3, 16
THE QUESTION ARISES:What happened in the last two years?
Friday, June 3, 16
THE QUESTION ARISES:What happened in the last two years?
Did we all get clever all of a sudden?
Friday, June 3, 16
THE QUESTION ARISES:What happened in the last two years?
Did we all get clever all of a sudden?Most probably not
Friday, June 3, 16
THE QUESTION ARISES:What happened in the last two years?
Did we all get clever all of a sudden?Most probably not
Evidence #1: The USA Presidential Primary Election Debates
Friday, June 3, 16
THE QUESTION ARISES:What happened in the last two years?
Did we all get clever all of a sudden?Most probably not
Evidence #1: The USA Presidential Primary Election Debates
Evidence #2: Peter Thiel (founder PayPal)
“The rate of technological innovation is actually slowing”
Friday, June 3, 16
WHY IS THAT?
Friday, June 3, 16
WHY IS THAT?
THE AGE OF BIG DATA
OR
Friday, June 3, 16
WHY IS THAT?
THE AGE OF BIG DATA
OR
THE AGE OF BIG DATA JUNK?
Friday, June 3, 16
THE SOLUTION: REDUCE BIG DATA TO A FEW INFLUENCERS
MINIMAL INFLUENCERS
NP-hard Maximization
of Influence Problem
Kempe, Kleinberg,
Tardos (2003)
Friday, June 3, 16
How to predict the most influentials in social networks
Typical approach: Brute-Force (heuristics)
Twitter“Attacking” the hubs (high degree)
Pres Obama: 55M followers Lady Gaga: 45M followers (very few: 1 percenters....)
Other heuristics: ranking the nodes byPageRank (Google), betweeness, eigenvector, closeness centralities, kcore, equal graph partitioning, etc...Problems: heuristics do not maximize any function of influence Friday, June 3, 16
Our approach
Inspired by French mime Marcel Marceau:“Making visible the invisible”
Collective optimization theory
unravels the strength of “weak nodes”
Granovetter, 1973: “The strength of weak ties”
Greatest mime of all time
Friday, June 3, 16
Mapping to Optimal Percolation to find the minimal set of “influencers” to fragment the network
Best spreaders = minimize the inactive nodes = maximize giant
component
Best immunizators =minimize the giant connected component
Optimal Percolation =Best immunizators = best influencers
G1G1
Linear-threshold model with
k-1 thresholdSIR model
Morone, Makse,
Nature (2015)
Friday, June 3, 16
Mapping to Optimal Percolation to find the minimal set of “influencers” to fragment the network
Best spreaders = minimize the inactive nodes = maximize giant
component
Best immunizators =minimize the giant connected component
Optimal Percolation =Best immunizators = best influencers
G1G1
Linear-threshold model with
k-1 thresholdSIR model
Morone, Makse,
Nature (2015)
Friday, June 3, 16
Mapping to Optimal Percolation to find the minimal set of “influencers” to fragment the network
Best spreaders = minimize the inactive nodes = maximize giant
component
Best immunizators =minimize the giant connected component
Optimal Percolation =Best immunizators = best influencers
G1G1
Linear-threshold model with
k-1 thresholdSIR model
Morone, Makse,
Nature (2015)
Friday, June 3, 16
Mapping to Optimal Percolation to find the minimal set of “influencers” to fragment the network
Best spreaders = minimize the inactive nodes = maximize giant
component
Best immunizators =minimize the giant connected component
Optimal Percolation =Best immunizators = best influencers
G1G1
Linear-threshold model with
k-1 thresholdSIR model
Morone, Makse,
Nature (2015)
Friday, June 3, 16
G1
Mapping to Optimal Percolation to find the minimal set of “influencers” to fragment the network
Best spreaders = minimize the inactive nodes = maximize giant
component
Best immunizators =minimize the giant connected component
Optimal Percolation =Best immunizators = best influencers
G1G1
Linear-threshold model with
k-1 thresholdSIR model
Morone, Makse,
Nature (2015)
Friday, June 3, 16
G1
Mapping to Optimal Percolation to find the minimal set of “influencers” to fragment the network
Best spreaders = minimize the inactive nodes = maximize giant
component
Best immunizators =minimize the giant connected component
Optimal Percolation =Best immunizators = best influencers
G1G1
Linear-threshold model with
k-1 thresholdSIR model
Morone, Makse,
Nature (2015)
Friday, June 3, 16
G1
Mapping to Optimal Percolation to find the minimal set of “influencers” to fragment the network
Best spreaders = minimize the inactive nodes = maximize giant
component
Best immunizators =minimize the giant connected component
Optimal Percolation =Best immunizators = best influencers
G1G1
Linear-threshold model with
k-1 thresholdSIR model
Morone, Makse,
Nature (2015)
Friday, June 3, 16
G1
Mapping to Optimal Percolation to find the minimal set of “influencers” to fragment the network
Best spreaders = minimize the inactive nodes = maximize giant
component
Best immunizators =minimize the giant connected component
Optimal Percolation =Best immunizators = best influencers
G1G1
Linear-threshold model with
k-1 thresholdSIR model
Morone, Makse,
Nature (2015)
Friday, June 3, 16
G1
G1
Mapping to Optimal Percolation to find the minimal set of “influencers” to fragment the network
Best spreaders = minimize the inactive nodes = maximize giant
component
Best immunizators =minimize the giant connected component
Optimal Percolation =Best immunizators = best influencers
G1G1
Linear-threshold model with
k-1 thresholdSIR model
Morone, Makse,
Nature (2015)
Friday, June 3, 16
qrand =� 2
� 1
qrand
Optimal Percolation = minimize : minimal fraction of influencers
to destroy the network
qqc
G1
qcqc
Minimize qc = find the minimal influencers until the solution
G1 = 0 ! {⌫i!j = 0} becomes unstable
Transform the problem
into a stability problem
Random percolation
Strategy = minimize the giant componentor
Friday, June 3, 16
⌫i!j = ni
2
41 �Y
k2@i\j
(1 � ⌫k!i)
3
5
ni = 0
ni = 1
How to calculate the stability matrix
ij
k
G1Order parameter = Prob to belong to giant component:
G1
removed node
other nodes
Message passing equation in a locally tree-like network
Friday, June 3, 16
@⌫i!j
@⌫k!`
���⌫i!j=0
⌘ nkBk!`,i!j
�max
(n, q) 1
minn
�max
(n,qc) = 1
Stability of is given by largest eigenvalue of non-backtracking matrix
G1 = 0
Non-Backtracking Matrix
Solution is stable when the largest eigenvalue of the modified NB matrix:
Finding the superspreaders = finding the nodes that minimizes the largest eigenvalue of the modified NB matrix:
2M x 2M matrix
Friday, June 3, 16
The superspreaders are the best non-backtracking walkers
Non-backtracking walk of length l: a walker that cannot
immediately come back
- Second largest eigenvalue of NB is also optimal for community detection. Krzakala PNAS (2013), Newman PRE (2013)- In contrast, most heuristics are based on the adjacency matrix which measures only regular random walks: PageRank largest eigenvalue Aij
Friday, June 3, 16
` = 1ki � 1 kj � 1
` = 3
` = 5
�max
(n) ⇡X
i
(ki � 1)X
j2@Ball(`)
nk(kj � 1)
Many-body problem: Power method “Feynman-like” diagrams for
Non-backtracking walk
�max
(n⇤, qc)
l-level = steps
1 NB step2-body interaction
3 NB steps4-body interaction
5 NB steps6-body interaction
Diagramatic expansion contains all physical interactions. Exact solution for random graphs (not for real graphs)
ki � 1 kj � 1
Friday, June 3, 16
O(N) ⇠ N logN
Highly Scalable Approximate Solution: Collective Influence Algorithm - CI Algorithm
For a given level l1. Start with all nodes occupied: ni=1
2. Assigned to every node the collective influence index:
CI` = (ki � 1)X
j2@Ball(`)
(kj � 1)
3. Remove (attack) the highest CI node: ni*=0
4. Recalculate CI and repeat until giant component is zero. CI scales as:
ki
kj
Friday, June 3, 16
Test: Exact optimal solution and best approximation with CI in Erdos-Renyi network
Others: hub removal (HD, high degree and adaptive HDA), PageRank, Closeness Centrality, Eigenvector Centrality, k-core, EGP
CI outperforms heuristic centralities and approximates well the exact optimal solution
The best possible attack
is to remove the loops to get
a tree at qc
Friday, June 3, 16
Why CI outperforms other heuristics?
Collective Influence identifies a new class of influencer neglected
by previous rankings
Weak node: a low degree node surrounded by hierarchical coronas of hubs at level l
Related to Granovetter theory “Strength of weak ties” (1973). Weak node
CI` = (ki � 1)X
j2@Ball(`)
(kj � 1)
Friday, June 3, 16
Validation of CI in Big Data: Twitter
A broker node
- CI identifies 40% less influencers than high degree
CI is valid in locally tree-like networks: what about real networks?
Friday, June 3, 16
Next, we address the question that will doubtless arise:
Friday, June 3, 16
Next, we address the question that will doubtless arise:
Is all this mathematical gibberish of any real use?
Three applications:1. Marketing campaign in Mexico
2. Predicting trends in Twitter: Trump vs Cruz3. Finding superspreaders in the brain
Friday, June 3, 16
1. Validating CI in a Real Marketing Campaign using Mobile Phone Networks
Mexico:110 million users
calling over 3 months
+banking data
Team up with GranData.com(an Argentinian
Corporation)
Market campaign targeting the high CI people, AKA influencers
Friday, June 3, 16
Communication patterns of rich and poor
i
E
B
A
Top 10%
Bottom 10%
ED
C
Bottom 20%
Top 1%
F
Top 1% richest people
Bottom 20% poorest people
HIGH CI
LOW CI
(same degree)
Friday, June 3, 16
Improve response rate = $$$ by five-fold between low CI and high CI targeting
Low CI: poorest people
High CI: wealthy people
Targeting 60,000 people according to their Collective Influence in the network of entire Mexico
Friday, June 3, 16
2. Predicting Trump Sentiment from Twitter
Trumpcamp
Cruzcamp
ALL TWEETS
INFLUENCERS
Trumpwins
Cruzwins
Friday, June 3, 16
2. Predicting Trump Sentiment from Twitter
Trumpcamp
Cruzcamp
ALL TWEETS
INFLUENCERS
Trumpwins
Cruzwins
Friday, June 3, 16
Awake brain surgery to dissect tumor with no clear boundaries
Neurosurgeon sMmulates areas around the tumor with electrodesto locate the essenMal funcMonal areas.
FuncMonal areas (eg, language, motor)are located by asking paMent to talk, move, etc. Remove as much tumor as possible avoiding the essenMal areas.
GOAL: predict the essenMal areas of the brain with NoN theory
CollaboraMon with Andrei Holodny, MSKCC
Friday, June 3, 16
3. Finally: Influencers in a Brain Network of Networks
Finding the essential minimal nodes in the brain- neural correlates of consciousness -
Important for any biological system: genetic networks, proteome, metabolome, etc
We extract the Network from fMRI/DTI experiments on dual tasking: auditory and visual
controllinks
controllinks
innerlinks
Gallos, Makse, Sigman, PNAS (2012)Reis, Andrade, Sigman, Canals, Makse, Nat Phys (2014)Friday, June 3, 16
Influencers in a Network of NetworksInfluencers are the best non-backtracking walkers walking along two types of links. Information flows via two type of messages:
intra-modular and inter-modular
Friday, June 3, 16
Collective Influence Map of the Human Brain:reducing the brain to its essential nodes
High CI
CI-map
a d
b
e
Anterior cingulate (AC)Posterior parietalcortex (PPC)
Posterior occipitalcortex (V1/V2)
ACPPC
V1/V2
ACPPC
V1/V2
Complexity Reduction
Inter-modularlinks
Intra-modularlinks
High CI
G(q)
q (Frac. of zero inputs)
Gia
nt a
ctiv
e co
mpo
nent
()
c
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
CI - Rare inputs Random - Typical inputs
High CI
CI-map
a d
b
e
Anterior cingulate (AC)Posterior parietalcortex (PPC)
Posterior occipitalcortex (V1/V2)
ACPPC
V1/V2
ACPPC
V1/V2
Complexity Reduction
Inter-modularlinks
Intra-modularlinks
High CI
G(q)
q (Frac. of zero inputs)
Gia
nt a
ctiv
e co
mpo
nent
()
c
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
CI - Rare inputs Random - Typical inputs
Friday, June 3, 16
Data Analytics at the Cutting-Edge for Free!kcore-analytics.com
Search Engine for Influencers
Kcore DREAMTEAM:
Flaviano Morone
Francesca LuciniAlex Bovet Kevin RothGeorge Furbish
Hernan Makse
Andrea Morone
Friday, June 3, 16