Ranking of nodes in graphs - Penn Engineeringese303/block_2_markov_chains/slides/280... · Ranking...
-
Upload
duongnguyet -
Category
Documents
-
view
213 -
download
0
Transcript of Ranking of nodes in graphs - Penn Engineeringese303/block_2_markov_chains/slides/280... · Ranking...
Ranking of nodes in graphs
Alejandro Ribeiro & Cameron FinucaneDept. of Electrical and Systems Engineering
University of [email protected]
http://www.seas.upenn.edu/users/~aribeiro/
October 13, 2010
Stoch. Systems Analysis Ranking of nodes in graphs 1
Ranking of nodes in graphs: Random walk
Ranking of nodes in graphs: Random walk
Ranking of nodes in graphs: Markov chain
Stoch. Systems Analysis Ranking of nodes in graphs 2
Graphs
1
2 3
45
6
I Graph ⇒ A set of J nodes j = 1, . . . , J⇒ Connected by a set of edges E defined as ordered pairs (i , j)
I In figure ⇒ nodes are j = 1, 2, 3, 4, 5,⇒ edges E = {(1, 2), (1, 5),(2, 3), (2, 5), (3, 4), ...
(3, 6), (4, 5), (4, 6), (5, 4)}I Websites and links ⇒ “The” web.
I People and friendship ⇒ Social network
Stoch. Systems Analysis Ranking of nodes in graphs 3
How well connected nodes are?
1
2 3
45
6
I Q: Which node is the most connected? A: Define most connected
I Can define “most connected” in different ways
I Node rankings to measure website quality, social influence
I There are two important connectivity indicators
⇒ How many nodes point to a link (outgoing links irrelevant)
⇒ How Important are the links that point to a node
Stoch. Systems Analysis Ranking of nodes in graphs 4
Connectivity ranking
I Insight ⇒ There is information in the structure of the network
I Knowledge is distributed through the network
⇒ The network (not the nodes) know the rankings
I Idea exploited by Google’s PageRank c© to rank webpages
I ... by social scientists to study trust & reputation in social networks
I ... by ISI to rank scientific papers, transactions & magazines ...
1
2 3
45
6
I No one points to 1
I Only 1 points to 2
I Only 2 points to 3, but 2more important than 1
I 4 as high as 5 with less links
I Links to 5 have lower rank
I Same for 6
Stoch. Systems Analysis Ranking of nodes in graphs 5
Preliminary definitions
I Graph G = (V ,E ) ⇒ sets of vertices V = {1, 2, . . . , J} and edges E
I Edges (elements of E ) are ordered pairs (i , j)
I We say there is a connection from i to j
I Outgoing neighborhood of i is the set of nodes j to which i points
n(i) := {j : (i , j) ∈ E}
I Incoming neighborhood, n−1(i) is the set of nodes that point to i :
n−1(i) := {j : (j , i) ∈ E}
I Connected graph
⇒ There is a path from any node to any other node
Stoch. Systems Analysis Ranking of nodes in graphs 6
Definition of rank
I Agent A chooses node i , e.g., web page, at random for initial visit
I Next visit randomly chosen between links in the neighborhood n(i)
⇒ All neighbors chosen with equal probability
I If reach a dead end because node i has no neighbors
⇒ Chose next visit at random equiprobably among all nodes
I Redefine graph G = (V ,E ) adding edges from dead ends to all nodes
I Restrict attention to connected (modified) graphs
1
2 3
45
6
I Rank of node i is the average number of visits of A to i
Stoch. Systems Analysis Ranking of nodes in graphs 7
Equiprobable random walk
I Formally, let An be the node visited at time nI Define transition probability Pij from node i into node j
Pij := P[An+1 = j
∣∣An = i]
I Next visit equiprobable among neighbors
Pij =1
#[n(i)]=
1
Ni, for all j ∈ n(i)
I Defined number of neighbors Ni = #[n(i)]
1
2 3
45
6
to 1
to 2
to 3
to 4
to 51/2
1/2
1/2
1/2 1/2
1/2
1/2
1/2
1
1/5
1/5
1/5
1/5
1/5
I Still have a graph
I But also a MC
I Red (not blue) circles
Stoch. Systems Analysis Ranking of nodes in graphs 8
Formal definition of rank
I Consider variable I {Am = i} to indicate visit to state i at time m.
I Rank ri of i-th node defined as time average of number of visits, i.e.,
ri := limn→∞
1
n
n∑m=1
I {Am = i}
I Define vector of ranks r := [r1, r2, . . . , rJ ]T
I Rank ri can be approximated by average rni at time n
rni :=1
n
n∑m=1
I {Am = i}
I Since limn→∞ rni = ri , it holds rni ≈ ri for n sufficiently large
I Define vector of approximate ranks rn := [rn1, rn2, . . . , rnJ ]T
I If modified graph is connected, rank independent of initial visit
Stoch. Systems Analysis Ranking of nodes in graphs 9
Algorithm
Output : Vector r(i) with ranking of node iInput : Vector N(i) containing number of neighbors of iInput : Matrix N(i , k) containing indices j of neighbors of i
m = 1; r=zeros(J,1); % Initialize time and ranksA0 = random(’unid’,J); % Draw first visit uniformly at randomwhile m < n do
jump = random(’unid’,NAm−1); % Neighbor uniformly at randomAm = N(Am−1, jump); % Jump to selected neighborr(Am) = r(Am) + 1; % Update ranking for Am
m = m + 1;endr = r/n; % Normalize by number of iterations n
Stoch. Systems Analysis Ranking of nodes in graphs 10
Example: Social graph
I Asked students taking ESE303 about homework collaboration
I Created (crude) graph of the social network of students in this class
I Used ranking algorithm to understand connectedness
I E.g., If I want to know how well students are coping with the class itis best to ask people with higher connectivity ranking
I 2009 data
I Students in 2010 don’t show up in class in enough numbers
Stoch. Systems Analysis Ranking of nodes in graphs 11
Ranked class graph
Aarti Kochhar
Ranga Ramachandran
Saksham Karwal
Aditya Kaji
Alexandra Malikova
Pia Ramchandani
Amanda Smith
Amanda Zwarenstein
Jane Kim
Katie Joo
Lisa Zheng
Michael Harker
Rebecca Gittler
Lindsey Eatough
Ankit Aggarwal
Priya Takiar
Anthony Dutcher
Ciara Kennedy
Carolina Lee
Daniela Savoia
Robert Feigenberg
Ceren Dumaz
Charles JeonChris Setian
Eric Lamb
Pallavi Yerramilli
Ella Kim
Jacci JeffriesHarish Venkatesan
Ivan LevcovitzJesse Beyroutey
Jihyoung Ahn
Madhur Agarwal
Owen Tian
Xiang-Li Lim
Paul Deren
Varun Balan
Thomas Cassel
Shahid Bosan
Sugyan Lohiaa
Stoch. Systems Analysis Ranking of nodes in graphs 12
Convergence metrics
I Recall r is vector of ranks and rn of rank iterates
I By definition limn→∞
rn = r . How fast rn converges to r (r given)?
I Can measure by distance between r and rn ⇒
ζn := ‖r − rn‖2 =
( J∑i=1
(rni − ri )2
)1/2
I If interest is only on largest ranked nodes, e.g., a web search
I Denote r (i) as the index of the i-th highest ranked node
I Similarly, r(i)n is the index of the i-th highest ranked node at time n
I First element wrongly ranked at time n
ξn := mini
r (i) 6= r (i)n
Stoch. Systems Analysis Ranking of nodes in graphs 13
Evaluation of convergence metrics
Distance
0 1000 2000 3000 4000 5000 6000 7000 8000 9000 1000010−2
10−1
100
101
time (n)
corre
ctly
rank
ed n
odes
First element wrongly ranked
0 1000 2000 3000 4000 5000 6000 7000 8000 9000 100000
2
4
6
8
10
12
14
time (n)
corre
ctly
rank
ed n
odes
I Distance gets close to10−2 in approx. 5× 103
iterations
I Bad: Two largest ranksin 3× 103 iterations
I Awful: Six best ranks in8× 103 iterations
I Convergence appears(very) slow
Stoch. Systems Analysis Ranking of nodes in graphs 14
When does this algorithm converge?
I Can confidently claim convergence not until 105 iterations
I True for particular case. Slow convergence inherent to algorithm
I Example has 40 nodes, want to use in network with 109 nodes
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2x 105
0
5
10
15
20
25
30
35
40
time (n)
corre
ctly r
anke
d no
des
I Use fact that this process is a MC to obtain faster algorithm
Stoch. Systems Analysis Ranking of nodes in graphs 15
Ranking of nodes in graphs: Markov chain
Ranking of nodes in graphs: Random walk
Ranking of nodes in graphs: Markov chain
Stoch. Systems Analysis Ranking of nodes in graphs 16
Limit probabilities
I Recall definition of rank ⇒ ri := limt→∞
1
t
t∑u=1
I{A(u) = i}
I Rank is time average of number of state visits in a MC
⇒ Can be equally obtained from limiting probabilities
I Recall transition probabilities ⇒ Pij =1
Ni, for all j ∈ n(i)
I Stationary distribution π = [π1, π1, . . . , πJ ]T solution of
πi =∑
j∈n−1(i)
Pj iπj =∑
j∈n−1(i)
πj
Njfor all i
I Plus normalization equation∑J
i=1 πi = 1
I As per ergodicity ⇒ r = π
Stoch. Systems Analysis Ranking of nodes in graphs 17
Matrix notation, eigenvalue problem
I As always, can define matrix P with elements Pij
πi =∑
j∈n−1(i)
Pjiπj =J∑
j=1
Pjiπj for all i
I Right hand side is just definition of a matrix product leading to
π = PTπ, πT1 = 1
I Also added normalization equation
I Can solve as system of linear equations or eigenvalue problem on PT
I Non-iterative method ⇒ Convergence not an issue
I But requires matrix P available at a central location
I Computationally costly (matrix P with 109 rows and columns)I All methods are costly to compute exact solutionI This one is costly to find even approximate solution
Stoch. Systems Analysis Ranking of nodes in graphs 18
What are limit probabilities?
I Let pi (n) denote probability of agent A visiting node i at time t
pi (n) := P [An = i ]
I Probabilities at time n + 1 and n can be related
P [An+1 = i ] =∑
j∈n−1(i)
P[An = i
∣∣An = i]
P [An = j ]
I Which is, of course, probability propagation in a MC
pi (n + 1) =∑
j∈n−1(i)
Pjipj(n)
I By definition limit probabilities are (let p(n) = [p1(n), . . . , pJ(n)]T )
limn→∞
p(n) = π = r
I Compute ranks from limit of probability propagation
Stoch. Systems Analysis Ranking of nodes in graphs 19
Probability propagation
I Can also write probability propagation in matrix form
pi (n + 1) =∑
j∈n−1(i)
Pjipj(n) =J∑
j=1
Pjipj(n) for all i
I Right hand side is just definition of a matrix product leading to
p(n + 1) = PTp(n)
I Can approximate rank by probability distribution
⇒ r = limn→∞ p(n) ≈ p(n) for n sufficiently large
Stoch. Systems Analysis Ranking of nodes in graphs 20
Algorithm
I Algorithm is just a recursive matrix product
Output : Vector r(i) with ranking of node iInput : Matrix P containing transition probabilities
m = 1; % Initialize timer=(1/J)ones(J,1); % Initial distribution uniform across all nodeswhile m < n do
r = PT r; % Probability propagationm = m + 1;
end
Stoch. Systems Analysis Ranking of nodes in graphs 21
Interpretation of probability propagation
I Why does the random walk converges so slow?
I What does it take to obtain a time average rni close to ri?
I Need to register a large number of agent visits to every state
I Back of hand: 40 nodes, some 100 visits to each ⇒ 4× 103 iters.
I Idea: Unleash a large number of agents K
ri = limn→∞
1
n
n∑m=1
1
K
K∑k=1
I {Akm = i}
I Visits are now spread over time and space
⇒ Converges “K times faster” (depends agents’ initial distribution)
⇒ But haven’t changed computational cost
Stoch. Systems Analysis Ranking of nodes in graphs 22
Interpretation of prob. propagation (continued)
I What happens if we unleash infinite number of agents K?
ri = limn→∞
1
n
n∑m=1
limK→∞
1
K
K∑k=1
I {Akm = i}
I Using law of large numbers and expected value of indicator function
ri = limn→∞
1
n
n∑m=1
E [I {Am = i}] = limn→∞
1
n
n∑m=1
P [Am = i ]
I Graph walk is a MC, then limm→∞
P [Am = i ] = limm→∞
pi (m) exists, and
ri = limn→∞
1
n
n∑m=1
pi (m) = limn→∞
pi (n)
I Probability propagation ≈ Unleashing infinite number of agents
I Interpretation true for any MC
Stoch. Systems Analysis Ranking of nodes in graphs 23
Distance to rank
I Initialize with uniform probability distribution ⇒ p(0) = (1/J)1
I Distance between p(n) and r
0 20 40 60 80 100 120 14010−4
10−3
10−2
10−1
100
time (n)
Dis
tanc
e
I Distance is 10−2 in approximately 30 iterations, 104 in 140 iterations
I Convergence is two orders of magnitude faster than random walk
Stoch. Systems Analysis Ranking of nodes in graphs 24
Number of nodes correctly ranked
I Rank of highest ranked node that is wrongly ranked by time n
0 20 40 60 80 100 120 1400
5
10
15
20
25
30
35
40
time (n)
corre
ctly
rank
ed n
odes
I Not bad: All nodes correctly ranked in 120 iterations
I Good: Ten best ranks in 80 iterations
I Great: Four best ranks in 20 iterations
Stoch. Systems Analysis Ranking of nodes in graphs 25
Distributed algorithm to compute ranks
I Nodes want to compute their rank ri
⇒ Can communicate with neighbors only (incoming + outgoing)
⇒ Access to neighborhood information only
I Recall probability update
pi (n + 1) =∑
j∈n−1(i)
Pjipj(n) =∑
j∈n−1(i)
1
Njpj(n)
I Uses local information only
I Algorithm. Nodes keep local rank estimates pi (n)
I Receive rank (probability) estimates pj(n) from neighbors j ∈ n−1(i)
I Update local rank estimate pi (n + 1) =∑
j∈n−1(i) pj(n)/Nj
I Communicate rank estimate pi (n + 1) to outgoing neighbors j ∈ n(i)
I Need only know number of neighbors of my neighbors
Stoch. Systems Analysis Ranking of nodes in graphs 26
Distributed implementation of random walk
I Can communicate with neighbors only (incoming + outgoing)
I But cannot access neighborhood information
I Pass agent around
I Local rank estimates ri (n) and counter with number of visits Vi
I Algorithm run by node i at time n
if Agent received from neighbor thenVi = Vi + 1Choose random neighborSend agent to chosen neighbor
endn = n + 1; ri (n) = Vi/n;
I Speed up convergence by generating many agents to pass around
Stoch. Systems Analysis Ranking of nodes in graphs 27
Comparison of different algoithms
I Random walk implementation
⇒ Most secure & robust. No information shared with other nodes
⇒ Implementation can be distributed
⇒ Convergence exceedingly slow
I System of linear equations
⇒ Least security and robustness. Graph in central server
⇒ Distributed implementation not clear
⇒ Non-iterative method, convergence not a problem
⇒ But computationally costly to obtain approximate solutions
I Probability propagation, matrix powers
⇒ Somewhat secure/robust. Info. shared with neighbors only
⇒ Implementation can be distributed
⇒ Convergence rate acceptable (orders of magnitude faster than RW)
Stoch. Systems Analysis Ranking of nodes in graphs 28