Measuring Proximity in Networks
![Page 1: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/1.jpg)
Measuring and Extracting Proximity in Networks
Yehuda Koren, Stephen North and Chris Volinsky
KDD 2006, Philadelphia
![Page 2: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/2.jpg)
Outline
• What is proximity and why do we care?
• What are the qualities of a good proximity measure?
• A series of proposals
• Our proposal: Cycle-Free Effective Conductance
• Extraction of proximity graphs
• Applying CFEC to large graphs
• Applications: Call detail, IMDB, DBLP
• Summary and Extensions
http://public.research.att.com/~volinsky/cgi-bin/prox/prox.pl
![Page 3: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/3.jpg)
What is Proximity?
• What is the distance between two nodes in a social network?
• proximity [prox·im·i·ty || prɑk'sɪmətɪ /prɒ-] n. adjacency, nearness, closeness, vicinity
![Page 4: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/4.jpg)
What is proximity good for?
• Missing data
• Link prediction
• Indirect relations
• Information sharing
• Viral marketing
• Identifying clusters
![Page 5: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/5.jpg)
Our Goals
• Measure and visualize proximity between nodes.
• Measurement should have the following qualities:
– “Close” nodes are intuitive:
  • Short graph distance
  • Multiple paths
  • High weights on edges
  • Low-degree nodes in the paths
– Monotonicity
– Generalizes to n > 2.
![Page 6: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/6.jpg)
Our goals
• Explain proximity by extracting proximity subgraphs that are readily visualized and contain a large percentage of overall proximity.
• Idea comes from “connection subgraphs” (Faloutsos, McCurley and Tomkins 2004): the small subgraph that best captures the connections between two nodes of the graph
Prox = .0053
Prox = .0048
![Page 7: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/7.jpg)
Large social networks
• Proximity is relevant in all social networks; listed below are a few we have played with.
• For now, we consider these as undirected graphs (stay tuned).

| data source | \|V\| | \|E\| |
|---|---|---|
| co-authors | 438K | 31M |
| actor-actor | 896K | 1.1M |
| phone calls | 300M | 1000M |
| IM | 200M | 800M |
![Page 8: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/8.jpg)
Measuring proximity
• Many proposals in the literature (n.b. Liben-Nowell and Kleinberg 2003)
• Graph distance: shortest path
– Doesn’t account for multiple paths or high-degree nodes
• Maximum network flow
– Disregards path length and high-degree nodes; depends on bottlenecks
• Electrical networks, or “effective conductance” (e.g. Doyle and Snell 1984)
– High-degree nodes still a problem
![Page 9: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/9.jpg)
When is the electric current analogy misleading?
Noise?
Significant connection
• Same current flow in both cases!
• Degree-1 nodes are neutral (attract no flow)
![Page 10: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/10.jpg)
Sink-augmented effective conductance [Faloutsos, McCurley & Tomkins, KDD 2004]
• Connect all nodes to a grounded universal sink (with 0V)
• Tax each node: deliver a portion of the flow to the sink
• No nodes of degree 1 (above problem solved)
• Penalizes long paths
• How do we set the taxing system?
• Doesn’t generalize to n > 2
• No monotonicity…
![Page 11: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/11.jpg)
Universal sink and (non-)monotonicity
With universal sink – no monotonicity:
• For larger networks, proximity tends to zero creating a “size bias”.
• Adding s—t paths can either increase or decrease proximity!
[Plot: proximity versus network size]
![Page 12: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/12.jpg)
Electrical networks = random walks
• Current-flow notions have a direct random-walk interpretation
• Take a random walk starting at s, following edges of the graph with probability proportional to their weight (conductance).
• Let Deg(s), the degree of s, be the number of random walks originating at s. Then:
– The escape probability, EP(s→t), is the probability that a walk originating at s will reach t before visiting s again, and
– The effective conductance between s and t is EC(s,t) = EP(s→t) * Deg(s)
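The relation between escape probability and effective conductance can be checked on a toy graph. The sketch below is only an illustration, not the authors' code: the dict-of-dicts graph layout, the function names and the Gauss-Seidel relaxation are all assumptions made here, and the graph is assumed to be connected and undirected. It also shows that attaching a degree-1 node to the path leaves the result unchanged.

```python
def escape_probability(graph, s, t, sweeps=500):
    """Probability that a weighted random walk leaving s hits t before returning to s.

    graph is an assumed layout: {node: {neighbor: weight}}, undirected and connected.
    """
    deg = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
    # h[u] = probability that a walk currently at u reaches t before s.
    h = {u: 0.0 for u in graph}
    h[t] = 1.0
    for _ in range(sweeps):  # Gauss-Seidel relaxation of the harmonic equations
        for u in graph:
            if u in (s, t):
                continue  # boundary values stay fixed: h[s] = 0, h[t] = 1
            h[u] = sum(w * h[v] for v, w in graph[u].items()) / deg[u]
    # First step out of s, then reach t before coming back to s.
    return sum(w * h[v] for v, w in graph[s].items()) / deg[s]


def effective_conductance(graph, s, t):
    """EC(s,t) = EP(s->t) * Deg(s)."""
    return sum(graph[s].values()) * escape_probability(graph, s, t)


# Toy graphs (hypothetical): a path s - a - t, and the same path with a
# dead-end node d hanging off a.
path = {"s": {"a": 1.0}, "a": {"s": 1.0, "t": 1.0}, "t": {"a": 1.0}}
with_leaf = {
    "s": {"a": 1.0},
    "a": {"s": 1.0, "t": 1.0, "d": 1.0},
    "t": {"a": 1.0},
    "d": {"a": 1.0},
}
ec_path = effective_conductance(path, "s", "t")
ec_leaf = effective_conductance(with_leaf, "s", "t")
```

Both values come out equal (two unit conductors in series), which is the "degree-1 nodes are neutral" behaviour that motivates the cycle-free variant.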
![Page 13: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/13.jpg)
Electrical networks = random walks
With the random walk perspective, you can see that the degree-1 nodes have no influence.
By discouraging “backtracking”, we can now properly account for high-degree nodes.
![Page 14: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/14.jpg)
Our proximity: cycle-free effective conductance
• The cycle-free escape probability, CFEP(s→t), is the probability that a random walk originating at s will reach t without visiting any node more than once
• Multiplying by the degree of the source gives an absolute quantity (accounting for the number of "actually initiated" walks):
• The cycle-free effective conductance between s and t: CFEC(s,t) = CFEP(s→t) * Deg(s)
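On a small graph the definition can be evaluated directly by enumerating every simple path. This is a minimal sketch under assumptions made here (dict-of-dicts graph layout, hypothetical function names, path probability taken as the product of w(u,v)/Deg(u) along the path); exhaustive enumeration is exponential and only meant for toy inputs.

```python
def simple_paths(graph, u, t, prefix=()):
    """Yield every simple (cycle-free) path from u to t as a tuple of nodes."""
    prefix = prefix + (u,)
    if u == t:
        yield prefix
        return
    for v in graph[u]:
        if v not in prefix:  # never revisit a node
            yield from simple_paths(graph, v, t, prefix)


def path_probability(graph, path):
    """Probability that a random walk follows exactly this path."""
    prob = 1.0
    for u, v in zip(path, path[1:]):
        prob *= graph[u][v] / sum(graph[u].values())
    return prob


def cfec(graph, s, t):
    """CFEC(s,t) = CFEP(s->t) * Deg(s), with CFEP summed over all simple paths."""
    cfep = sum(path_probability(graph, p) for p in simple_paths(graph, s, t))
    return sum(graph[s].values()) * cfep


# Hypothetical toy graphs with unit weights.
path = {"s": {"a": 1.0}, "a": {"s": 1.0, "t": 1.0}, "t": {"a": 1.0}}
with_leaf = {
    "s": {"a": 1.0},
    "a": {"s": 1.0, "t": 1.0, "d": 1.0},
    "t": {"a": 1.0},
    "d": {"a": 1.0},
}
two_routes = {
    "s": {"a": 1.0, "b": 1.0},
    "a": {"s": 1.0, "t": 1.0},
    "b": {"s": 1.0, "t": 1.0},
    "t": {"a": 1.0, "b": 1.0},
}
cfec_path = cfec(path, "s", "t")          # single route
cfec_leaf = cfec(with_leaf, "s", "t")     # dead end on a lowers the value
cfec_two = cfec(two_routes, "s", "t")     # a second route raises it
```

Here the dead-end node lowers proximity and the extra route raises it, in line with the qualities the measure is meant to have.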
![Page 15: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/15.jpg)
Higher red→green c.f. escape probability
Lower red→green c.f. escape probability
Properties of CFEC as a proximity measure:
• Accounts for multiple paths
• Favors short paths
• Penalizes high-degree nodes
• Penalizes dead-end paths
• Parameter free
• Has the “right” monotonicity
• Accommodates edge directions
• Has a natural extension to multiple endpoints
![Page 16: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/16.jpg)
Computing CFEC
• Unlike previous measures, exact computation is infeasible: it requires summing over all simple paths
• Practically, we can estimate it extremely well
• Probability of paths declines exponentially (e.g., the 100th path is $10^6$ times less probable than the first one)
• Estimate using the most probable paths:

$$P_{\text{c.f.esc}}(s \to t) = \sum_{\text{simple path } p:\ s \to t} \operatorname{prob}(p)$$

$$P_{\text{c.f.esc}}(s \to t) \approx \sum_{\text{highly probable simple path } p:\ s \to t} \operatorname{prob}(p)$$
![Page 17: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/17.jpg)
Finding k most probable paths
• Finding k shortest simple paths takes O(k|E|log|E|) time [Katoh, Ibaraki and Mine, 1982]
• For an edge u-v of weight w(u,v), define its length:

$$l(u,v) = -\log \frac{w(u,v)}{\sqrt{\deg(u)\,\deg(v)}}$$

• Edge lengths are positive
• For a path P from s to t, Exp(-Σ l(u,v)) = C*Prob(P), where the sum runs over the edges of P and C depends only on s and t
• Short path = highly probable path
• Stop path computation when the probability drops below $10^{-6}$ of that of the first path
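The length transform and the stopping rule can be sketched as follows. Note the substitution: this is a plain best-first enumeration of simple paths, not the Katoh, Ibaraki and Mine algorithm, and it can take exponential time; the graph layout, function names and the constant C = sqrt(Deg(s)/Deg(t)) (derived here from the length definition) are assumptions of this sketch.

```python
import heapq
import math


def edge_length(graph, deg, u, v):
    """l(u,v) = -log( w(u,v) / sqrt(deg(u) * deg(v)) ), which is never negative."""
    return -math.log(graph[u][v] / math.sqrt(deg[u] * deg[v]))


def most_probable_paths(graph, s, t, k=100, cutoff=1e-6):
    """Return up to k simple s-t paths as (path, probability), most probable first.

    Stops once a path is less than `cutoff` times as probable as the first one.
    """
    deg = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
    heap = [(0.0, (s,))]  # (length so far, partial path)
    found = []
    while heap and len(found) < k:
        length, path = heapq.heappop(heap)
        u = path[-1]
        if u == t:
            # Lengths are non-negative, so complete paths pop in length order.
            if found and length - found[0][0] > -math.log(cutoff):
                break
            found.append((length, path))
            continue
        for v in graph[u]:
            if v not in path:  # keep the path simple
                step = edge_length(graph, deg, u, v)
                heapq.heappush(heap, (length + step, path + (v,)))
    c = math.sqrt(deg[s] / deg[t])  # exp(-length) = C * prob(path)
    return [(path, math.exp(-length) / c) for length, path in found]


# Hypothetical toy graph: a 2-hop route s-a-t and a 3-hop route s-c-d-t.
toy = {
    "s": {"a": 1.0, "c": 1.0},
    "a": {"s": 1.0, "t": 1.0},
    "c": {"s": 1.0, "d": 1.0},
    "d": {"c": 1.0, "t": 1.0},
    "t": {"a": 1.0, "d": 1.0},
}
ranked = most_probable_paths(toy, "s", "t")
cfec_estimate = sum(toy["s"].values()) * sum(prob for _, prob in ranked)
```

On this toy graph the shorter route is returned first, and the estimate equals the exact value because both simple paths survive the cutoff.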
![Page 18: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/18.jpg)
Extracting proximity graphs
Recall FMT’04 “connection subgraphs”, the small subgraph that best captures the connections between two nodes of the graph
![Page 19: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/19.jpg)
Extracting proximity graphs
• Achieve an efficient balance between “size” and “proximity” by maximizing the ratio:

$$\frac{\mathrm{CFEC}_{\text{subgraph}}(s \to t)^{\alpha}}{|\text{subgraph}|}$$

• Larger α emphasizes proximity, giving a larger subgraph
– α=0: return the shortest path
– α=∞: return all paths
![Page 20: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/20.jpg)
Extracting proximity graphs
• We already have the collection R_k of shortest paths {P1, P2, …, Pk}
• Find the subset of the paths that maximizes

$$\frac{\mathrm{CFEC}_{\text{subgraph}}(s \to t)^{\alpha}}{|\text{subgraph}|}$$

… and combine the selected paths into a “proximity graph”
• This is an NP-hard problem, but recall that we have a list of paths sorted by probability
• Use a branch-and-bound path-merging algorithm
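To make the objective concrete, the sketch below scores every subset of a short path list by brute force. It is a stand-in for the branch-and-bound merging named above, not a reconstruction of it, and it rests on assumptions made here: subgraph size is counted in nodes, path probabilities come from the full graph, and a candidate path counts as captured when all its edges lie in the merged subgraph.

```python
from itertools import combinations


def edges_of(path):
    """Undirected edge set of a path given as a tuple of nodes."""
    return {frozenset(e) for e in zip(path, path[1:])}


def extract_proximity_graph(paths, deg_s, alpha):
    """Brute-force search for the path subset maximizing CFEC**alpha / size.

    paths: list of (node_tuple, probability); deg_s: degree of the source.
    Returns (best_score, frozenset_of_nodes). Exponential in len(paths).
    """
    best_score, best_nodes = float("-inf"), frozenset()
    for r in range(1, len(paths) + 1):
        for subset in combinations(paths, r):
            nodes = frozenset(n for p, _ in subset for n in p)
            union = set().union(*(edges_of(p) for p, _ in subset))
            # Merging paths may complete other candidate paths as well.
            captured = sum(pr for p, pr in paths if edges_of(p) <= union)
            score = (deg_s * captured) ** alpha / len(nodes)
            if score > best_score:
                best_score, best_nodes = score, nodes
    return best_score, best_nodes


# Hypothetical candidate paths with their probabilities, source degree 2.
candidates = [(("s", "a", "t"), 0.25), (("s", "c", "d", "t"), 0.125)]
_, small = extract_proximity_graph(candidates, deg_s=2, alpha=0)
_, large = extract_proximity_graph(candidates, deg_s=2, alpha=10)
```

With α=0 only size matters and the shortest path wins; with α=10 the gain in captured proximity outweighs the two extra nodes and both paths are kept.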
![Page 21: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/21.jpg)
Working with large graphs
• Dealing with the full graph is sometimes infeasible and usually unnecessary
• Prior to running the algorithm, we construct a candidate graph in main memory (also FMT ’04).
Full network (N ~ 350M) → Candidate graph (N ~ 10,000) → Proximity graph (N ~ 20)
![Page 22: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/22.jpg)
S T
Finding the candidate graph
![Page 23: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/23.jpg)
S T
Dist(T,i)=2, Dist(S,i)=2
![Page 24: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/24.jpg)
S T
Dist(T,i)=3, Dist(S,i)=3
![Page 25: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/25.jpg)
S T
Dist(T,i)=4, Dist(S,i)=4
![Page 26: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/26.jpg)
S T
Dist(T,i)=5, Dist(S,i)=5
Shortest path of length 10
![Page 27: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/27.jpg)
S T
Dist(T,i)=12, Dist(S,i)=12
• Stop adding nodes when path probabilities are below ε
• Any path through an unscanned node is likely to have low probability
• Once we have this candidate graph, apply the CFEC algorithm to extract the proximity graph.
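One way to picture the expansion is a bounded search from both endpoints. The sketch below is an assumption-heavy illustration rather than the procedure used in the talk: it reuses the -log edge lengths, runs Dijkstra from S and from T up to a budget of -log(ε), and keeps a node i only if Dist(S,i) + Dist(T,i) stays within that budget. All names and the ε value are hypothetical, and degrees are precomputed for the whole graph, which a real system would look up lazily.

```python
import heapq
import math


def bounded_dijkstra(graph, deg, source, budget):
    """Shortest -log lengths from source, ignoring anything beyond the budget."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d - math.log(w / math.sqrt(deg[u] * deg[v]))
            if nd <= budget and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def candidate_graph(graph, s, t, eps=1e-6):
    """Nodes whose best s-i-t route is not already below the eps threshold."""
    deg = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
    budget = -math.log(eps)
    from_s = bounded_dijkstra(graph, deg, s, budget)
    from_t = bounded_dijkstra(graph, deg, t, budget)
    # Dist(S,i) + Dist(T,i) is a lower bound on the length of any s-t path via i.
    return {u for u in from_s if u in from_t and from_s[u] + from_t[u] <= budget}


# Hypothetical toy graph: a cycle through s and t plus a side chain c - x - y.
toy = {
    "s": {"a": 1.0, "c": 1.0},
    "a": {"s": 1.0, "t": 1.0},
    "t": {"a": 1.0, "d": 1.0},
    "d": {"t": 1.0, "c": 1.0},
    "c": {"s": 1.0, "d": 1.0, "x": 1.0},
    "x": {"c": 1.0, "y": 1.0},
    "y": {"x": 1.0},
}
kept = candidate_graph(toy, "s", "t", eps=0.05)
```

The side chain x, y is dropped because any route through it is too improbable, while the nodes on the two s-t routes are kept for the CFEC step.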
![Page 28: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/28.jpg)
Summary: Proximity Graphs
• We have a measure of proximity which fulfills our desired criteria
– Intuitive sense of closeness
– Generalizes to n>2
– Parameter free
• Using this measure of proximity we can efficiently extract the proximity graph.
• Let’s apply to real data
![Page 29: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/29.jpg)
Application: call detail
• AT&T’s call detail graph is large (350M nodes, several billion edges).
• To calculate proximity, we just need an adjacency list
– Dynamic, efficient creation of adjacency lists for transaction graphs (Cortes, Pregibon, and Volinsky 2003)
• Select a random sample of 2000 residential TNs (telephone numbers) and calculate proximity between them.
– We found a path for 1808 of them
– For those with a path, we calculated proximity and rendered a proximity graph.
![Page 30: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/30.jpg)
Building Proximity Graphs
Full network (N ~ 350M) → Candidate graph (N ~ 10,000) → Proximity graph (N ~ 20)
![Page 31: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/31.jpg)
Distribution of proximities in phone-call network
![Page 32: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/32.jpg)
Application: call detail
• Capturing proximity in a proximity graph…
• Studying α
– Low α: smaller graphs, less proximity captured.
– α = 10 seems to give a good tradeoff
![Page 33: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/33.jpg)
[Plots: % captured proximity and # graphs, each versus size of graph]
![Page 34: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/34.jpg)
Proximity as link predictor
• Calculate proximities for a sample of pairs in the network that have never communicated.
• Look into the future to see which of these communicate in the next time period t.
• Did those that eventually communicate have closer proximities?
• I.e., is proximity predictive of future communication?
![Page 35: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/35.jpg)
Proximity as link predictor
Mean log proximity:
• Communicators = -2.4
• Non-communicators = -5.9
![Page 36: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/36.jpg)
Using Visualization
• Different visualizations bring out different aspects of the proximity graph, especially for n>2.
![Page 37: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/37.jpg)
![Page 38: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/38.jpg)
![Page 39: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/39.jpg)
![Page 40: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/40.jpg)
Using a hierarchical layout for n=2 shows different eras of movie stars
![Page 41: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/41.jpg)
Prox webpage: http://public.research.att.com/~volinsky/cgi-bin/prox/prox.pl
![Page 42: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/42.jpg)
Summary
• Proposed cycle-free effective conductance (CFEC), with a random-walk interpretation, to measure “proximity” in social networks and other ad-hoc networks
• Described a way of approximating CFEC
• Described a way of visualizing CFEC as a subgraph
• Extended the method to external datasets
• Showed empirical evidence for its utility
![Page 43: Measuring Proximity in Networks](https://reader038.fdocuments.in/reader038/viewer/2022102804/5472cdd0b4af9fb03d8b4573/html5/thumbnails/43.jpg)
Extensions
• Compare to other proximity measures (Katz, PageRank, and other methods compared in Liben-Nowell and Kleinberg (2003))
• Quantify proximity across different kinds of networks
• Extend c.f. effective conductance to:
– Multiple endpoints (already demonstrated)
– Directed edges (future work: use k-shortest paths in a directed graph, algorithm due to Hershberger et al.)