Faster Pathfinder algorithm for sparse networks
Vladimir Batagelj, Anze Vavpetic
University of Ljubljana
Sunbelt XXX, Riva del Garda, Italy, June 29 - July 4, 2010
Outline
1 Pathfinder
2 Semirings
3 Spanish algorithms
4 Sparse Pathfinder
5 Tests
6 References
Pathfinder
The Pathfinder algorithm was proposed in the 1980s (Schvaneveldt 1981; Schvaneveldt et al. 1989; Schvaneveldt 1990) [14, 13, 15] for the simplification of weighted networks. It removes from the network all lines that do not satisfy the triangle inequality: if a shorter path exists connecting the endpoints of a line, the line is removed. The basic idea of the Pathfinder algorithm is simple. It produces a network $PFnet(W, r, q) = (V, L_{PF})$:

    compute W^(q);
    L_PF := ∅;
    for e(u, v) ∈ L do begin
        if W^(q)[u, v] = W[u, v] then L_PF := L_PF ∪ {e}
    end;

where $W$ is the network dissimilarity matrix and $W^{(q)}$ is the matrix of values of all walks of length at most $q$, computed over the semiring $(\mathbb{R}_0^+, \oplus, \square_r, \infty, 0)$ with $a \mathbin{\square_r} b = \sqrt[r]{a^r + b^r}$ and $a \oplus b = \min(a, b)$.
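The definition above can be turned into a small dense-matrix sketch. It is illustrative only, not the authors' implementation: the names `minkowski`, `walk_matrix` and `pfnet` are made up here, missing lines are assumed to be stored as infinity, and the equality test uses `math.isclose` because floating-point roots are not exact.

```python
import math

INF = math.inf

def minkowski(a, b, r):
    """Semiring product (a^r + b^r)^(1/r); max(a, b) for r = inf."""
    if a == 0.0:
        return b
    if b == 0.0:
        return a
    if r == INF or a == INF or b == INF:
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)

def walk_matrix(W, r, q):
    """W^(q): value of the best walk with at most q lines, for every pair."""
    n = len(W)
    # Start from the semiring identity: 0 on the diagonal, inf elsewhere.
    D = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for _ in range(q):
        # Allow one more line on every walk: D := D (+) D (.) W.
        D = [[min(D[i][j],
                  min(minkowski(D[i][k], W[k][j], r) for k in range(n)))
              for j in range(n)] for i in range(n)]
    return D

def pfnet(W, r, q):
    """Keep a line (u, v) only if no walk of at most q lines beats it."""
    n = len(W)
    D = walk_matrix(W, r, q)
    return {(u, v) for u in range(n) for v in range(n)
            if u != v and W[u][v] < INF and math.isclose(D[u][v], W[u][v])}

# Triangle: the line 0-2 (value 3) is beaten by the path 0-1-2 (value 2 for r = 1).
W = [[INF, 1.0, 3.0],
     [1.0, INF, 1.0],
     [3.0, 1.0, INF]]
print(sorted(pfnet(W, r=1, q=2)))
```

With `q=1` only the direct lines are considered, so nothing is removed; this sketch costs on the order of q·n³ operations, as the definition suggests.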
Pathfinder – theoretical results 1
Theoretical results [7]: for a given dissimilarity matrix $W$ the network $PFnet(W, r, q)$
• Is unique
• Preserves geodetic distances
• Links nearest neighbors
• Contains the same information as the minimum method of hierarchical clustering
• $PFnet(W, r = \infty, q = n - 1)$ is the union of all MINTREES (minimum spanning trees)
Pathfinder – theoretical results 2
It also holds:
• Graph $PFnet(W, r_2, q)$ is a spanning subgraph of graph $PFnet(W, r_1, q)$ iff $r_1 < r_2$
• Graph $PFnet(W, r, q_2)$ is a spanning subgraph of graph $PFnet(W, r, q_1)$ iff $q_1 < q_2$
• Similarity transformations preserve structure: the graph $PFnet(W, r, q)$ is equal to the graph $PFnet(\alpha W, r, q)$ for $\alpha > 0$.
• Monotonic transformations preserve structure for $r = \infty$: the graph $PFnet(W, r = \infty, q)$ is equal to the graph $PFnet(f(W), r = \infty, q)$, where $f : \mathbb{R} \to \mathbb{R}$ is a strictly increasing mapping and $f(W) = [f(w_{ij})]$.
Pathfinder – the original algorithm
In the original algorithm the matrix $W^{(q)}$ is computed on the basis of its definition

$$W^{(q)} = \sum_{i=0}^{q} W^i$$

by computing all its powers $W^i$, $i = 1, \ldots, q$. The complexity of the algorithm is $O(q n^3)$, and therefore $O(n^4)$ for $q \ge n - 1$. It can therefore be applied only to relatively small networks (up to a few hundred vertices).
Interest in the Pathfinder transformation was renewed around the year 2000 by Chen [5].
Semirings – Computing the closure over a semiring with absorption
Because in our case the set of vertices $V$ is finite, so is the set of all paths $E_{uv}$. Therefore we can compute the value of all walks $w(S^\star_{uv}) = w(E_{uv})$. One possibility is to use, for large enough $k$, the equality:

$$W^\star = W^{(k)} = (1 + W)^k$$

To speed up the computation we can consider the sequence $(1 + W)^{2^i}$, $i = 1, \ldots, s$.
It turned out that this is not the fastest way to compute $W^\star$.
Semirings – Computing the closure over a complete semiring
Kleene, Warshall, Floyd and Roy contributed to the development of the procedure, whose final form was given by Fletcher [6].

    C_0 := W;
    for k := 1 to n do begin
        for i := 1 to n do for j := 1 to n do
            c_k[i, j] := c_{k−1}[i, j] + c_{k−1}[i, k] · (c_{k−1}[k, k])* · c_{k−1}[k, j];
        c_k[k, k] := 1 + c_k[k, k];
    end;
    W* := C_n;

If we delete the statement c_k[k, k] := 1 + c_k[k, k] we obtain the algorithm for computing the strict closure $\overline{W} = W W^\star$.
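As a hedged illustration, the closure procedure can be specialised to the Pathfinder semiring, where addition is min, multiplication is the Minkowski operation and the closure of every element is the semiring one (the number 0), so the starred factor drops out. The function names and the dense list-of-lists representation are assumptions of this sketch; the update is done in place, which is safe here because row k and column k do not change during step k.

```python
import math

INF = math.inf

def minkowski(a, b, r):
    """Semiring product (a^r + b^r)^(1/r); max(a, b) for r = inf."""
    if a == 0.0:
        return b
    if b == 0.0:
        return a
    if r == INF or a == INF or b == INF:
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)

def closure(W, r):
    """W* over the Pathfinder semiring, Floyd-Warshall style, O(n^3)."""
    n = len(W)
    C = [row[:] for row in W]          # C_0 := W
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # c[i,j] + c[i,k] . (c[k,k])* . c[k,j], with (c[k,k])* = 0
                via_k = minkowski(C[i][k], C[k][j], r)
                if via_k < C[i][j]:
                    C[i][j] = via_k
        C[k][k] = min(0.0, C[k][k])    # c[k,k] := 1 + c[k,k]
    return C

W = [[INF, 1.0, 3.0],
     [1.0, INF, 1.0],
     [3.0, 1.0, INF]]
print(closure(W, 1)[0][2], closure(W, INF)[0][2])
```

Dropping the line that resets `C[k][k]` would give the strict closure instead, in line with the remark above.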
Semirings – Dissimilarities
Joly and Le Calve theorem [8]:
For any even dissimilarity measure $d$ there is a unique number $p \ge 0$, called its metric index, such that: $d^r$ is metric for all $r \le p$, and $d^r$ is not metric for all $r > p$.
In the opposite direction we can say: let $d$ be a dissimilarity and suppose that for $x$, $y$ and $z$ we have $d(x, z) + d(z, y) \ge d(x, y)$ and $d(x, y) > \max(d(x, z), d(z, y))$. Then there exists a unique number $p \ge 0$ such that for all $r > p$

$$d^r(x, z) + d^r(z, y) < d^r(x, y)$$

or equivalently

$$d(x, z) \mathbin{\square_r} d(z, y) < d(x, y)$$
Semirings – Minkowski operation
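Minkowski operation $a \mathbin{\square_r} b = \sqrt[r]{a^r + b^r}$:

$r = 1 \Rightarrow a \mathbin{\square_r} b = a + b$,
$r = 2 \Rightarrow a \mathbin{\square_r} b = \sqrt{a^2 + b^2}$,
$r = \infty \Rightarrow a \mathbin{\square_r} b = \max(a, b)$.

And let $a \oplus b = \min(a, b)$.
The structure $(\mathbb{R}_0^+, \oplus, \square_r, \infty, 0)$ is a complete semiring with $a^\star = 0$. It is also called the Pathfinder semiring.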
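Two of the statements above follow from short calculations, sketched here under the usual convention that the closure is $a^\star = \bigoplus_{k \ge 0} a^k$ with $a^0$ equal to the semiring one (the number 0); that convention is an assumption of this note rather than something stated on the slide.

```latex
% The case r = infinity is the limit of the finite case:
\max(a, b) \;\le\; \sqrt[r]{a^r + b^r} \;\le\; 2^{1/r} \max(a, b)
\;\xrightarrow[\,r \to \infty\,]{}\; \max(a, b)

% The k-fold product of a with itself is k^{1/r} a, so the closure is
a^\star \;=\; 0 \oplus a \oplus (a \mathbin{\square_r} a) \oplus \cdots
        \;=\; \min\bigl(0,\; a,\; 2^{1/r} a,\; 3^{1/r} a,\; \ldots\bigr) \;=\; 0

% Numerical check for r = 2:  3 \mathbin{\square_2} 4 = \sqrt{9 + 16} = 5
```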
Spanish algorithms
Since the Pathfinder semiring is idempotent, it holds that

$$W^{(q)} = (1 \oplus W)^q$$

This power can be computed faster using the binary algorithm (for example, to compute $a^{57} = a^{32} \cdot a^{16} \cdot a^{8} \cdot a^{1}$ we need only 8 multiplications instead of 56). This improvement was proposed by Guerrero-Bote et al. (2006) [7] and reduces the complexity to $O(n^3 \log q)$.
When $q \ge n - 1$, $W^{(q)} = W^\star$ and it can be determined by Fletcher's algorithm over the Pathfinder semiring. This improvement was proposed by Quirin et al. (2008) [9] and reduces the complexity to $O(n^3)$.
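The binary powering of $(1 \oplus W)$ can be sketched as follows. This is a minimal illustration of the idea, not the code of [7]: the helper names are invented, matrices are dense lists of lists with infinity for missing lines, and the simple loop performs one squaring more than strictly necessary.

```python
import math

INF = math.inf

def minkowski(a, b, r):
    """Semiring product (a^r + b^r)^(1/r); max(a, b) for r = inf."""
    if a == 0.0:
        return b
    if b == 0.0:
        return a
    if r == INF or a == INF or b == INF:
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)

def mat_mul(A, B, r):
    """Matrix product over the Pathfinder semiring (min, Minkowski)."""
    n = len(A)
    return [[min(minkowski(A[i][k], B[k][j], r) for k in range(n))
             for j in range(n)] for i in range(n)]

def walk_matrix_binary(W, r, q):
    """W^(q) = (1 (+) W)^q using O(log q) matrix products."""
    n = len(W)
    result = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    base = [[0.0 if i == j else W[i][j] for j in range(n)] for i in range(n)]
    while q > 0:
        if q & 1:                      # this binary digit of q is set
            result = mat_mul(result, base, r)
        base = mat_mul(base, base, r)  # square for the next digit
        q >>= 1
    return result

# Path 0 - 1 - 2 - 3 with unit values: vertex 3 is reachable from 0 only for q >= 3.
W = [[INF, 1.0, INF, INF],
     [1.0, INF, 1.0, INF],
     [INF, 1.0, INF, 1.0],
     [INF, INF, 1.0, INF]]
print(walk_matrix_binary(W, 1, 2)[0][3], walk_matrix_binary(W, 1, 3)[0][3])
```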
An additional improvement can be made for undirected networks in the
case $q \ge n - 1$ and $r = \infty$. In this case the network PF is the union
of all minimal spanning trees of N. It can be obtained using an
adapted version of Kruskal's minimal spanning tree algorithm as
described in Quirin et al. (2008) [10]. The complexity of this
algorithm is $O(m \log n)$, where $m$ is the number of edges.
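One way to obtain the union of all minimal spanning trees with a Kruskal-style pass is sketched below. It follows the general idea only and is not claimed to match the procedure of [10]: edges are handled one weight class at a time, every edge of the class that joins two different components is kept, and only afterwards are the components merged. The function name, the `(u, v, w)` edge tuples and the exact-equality grouping of weights are assumptions of the sketch.

```python
from itertools import groupby

def pfnet_union_of_msts(n, edges):
    """Edges of PFnet(r = inf, q = n - 1) for an undirected network.

    n     -- number of vertices, labelled 0 .. n-1
    edges -- iterable of (u, v, w) triples
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    by_weight = sorted(edges, key=lambda e: e[2])
    for _, group in groupby(by_weight, key=lambda e: e[2]):
        group = list(group)
        # First decide for the whole weight class, using only the
        # components built from strictly lighter edges ...
        kept.extend(e for e in group if find(e[0]) != find(e[1]))
        # ... and only then merge the endpoints of the class.
        for u, v, _w in group:
            parent[find(u)] = find(v)
    return kept

# Equilateral triangle: every edge lies on some minimal spanning tree.
print(pfnet_union_of_msts(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]))
# Here the edge of value 3 is bypassed by two edges of value 1.
print(pfnet_union_of_msts(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 3.0)]))
```

Sorting dominates the running time, which is consistent with the $O(m \log n)$ bound quoted above.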
Sparse Pathfinder
For sparse networks, in the general case there is still some room for improvement. We rewrite the basic Pathfinder algorithm in the form

    L_PF := ∅;
    for v ∈ V do begin
        compute the list S = ((u, d_u) : u ∈ N(v)), where d_u = W^(q)[v, u];
        for (u, d_u) ∈ S do
            if d_u = W[v, u] then L_PF := L_PF ∪ {(v, u)}
    end;

N(v) denotes the set of successors of vertex v.
To determine the values d_u = W^(q)[v, u] for q = n − 1 we can use an adapted Dijkstra's algorithm that determines the list S in a single run. The job is done when the values of all vertices from N(v) are determined. Only a (small) portion of the network needs to be inspected for each vertex v. To implement this algorithm efficiently, a special data structure, the indexed priority queue, is needed.
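A rough sketch of this per-vertex scheme for q = n − 1 is given below. It is not the authors' code: instead of an indexed priority queue it uses Python's `heapq` with lazy deletion of outdated entries, the network is assumed to be a dictionary of successor dictionaries with positive values and comparable vertex labels, and the pruning of values above the largest line leaving v is a choice made in this sketch.

```python
import heapq
import math

INF = math.inf

def minkowski(a, b, r):
    """Semiring product (a^r + b^r)^(1/r); max(a, b) for r = inf."""
    if a == 0.0:
        return b
    if b == 0.0:
        return a
    if r == INF or a == INF or b == INF:
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)

def sparse_pfnet(adj, r):
    """Lines of PFnet(W, r, q = n - 1); adj[v][u] is the value of line (v, u)."""
    kept = set()
    for v, succ in adj.items():
        if not succ:
            continue
        d_max = max(succ.values())     # larger values can never matter for v
        dist = {v: 0.0}
        settled = set()
        pending = set(succ)            # successors whose value is not final yet
        heap = [(0.0, v)]
        while heap and pending:        # stop as soon as N(v) is settled
            d, u = heapq.heappop(heap)
            if u in settled:
                continue               # outdated heap entry
            settled.add(u)
            pending.discard(u)
            for t, w in adj.get(u, {}).items():
                d_new = minkowski(d, w, r)
                if d_new <= d_max and d_new < dist.get(t, INF):
                    dist[t] = d_new
                    heapq.heappush(heap, (d_new, t))
        for u, w in succ.items():
            if math.isclose(dist[u], w):
                kept.add((v, u))
    return kept

adj = {0: {1: 1.0, 2: 3.0},
       1: {0: 1.0, 2: 1.0},
       2: {0: 3.0, 1: 1.0}}
print(sorted(sparse_pfnet(adj, 1)))
```

Dijkstra's scheme applies because the Minkowski operation never decreases a value, so settled vertices cannot be improved later.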
Sparse Pathfinder – BFS algorithm
In the case q < n − 1 a variant of the BFS (Breadth First Search) algorithm is used to determine the list S.
The FIFO queue Q is composed of triples (t, d, l): t is a vertex, d is a dist-length (the value of the walk reaching t) and l is a line-length (the number of lines on that walk).
To make the implementation fast, all the structures (the queue Q and the lists Plist and Vlist) are represented with arrays.
Sparse Pathfinder – BFS algorithm: compute the list S

    S := ∅; T := N(v); emptyQ; dMax := max{w_vu : u ∈ T};
    putLastQ(v, 0, 0); dist[v] := 0; level := 0;
    while sizeQ() > 0 do begin
        (u, d_u, l) := firstFromQ(); l := l + 1;
        if l > level then begin
            level := l;
            for v ∈ Plist do P[v] := 0;
            nPlist := 0;
        end
        for t ∈ N(u) do begin
            dNew := d_u □r w(u, t);
            if dNew ≤ dMax then begin
                if V[t] then begin
                    if dNew < dist[t] then begin
                        dist[t] := dNew;
                        if l < q then begin
                            if P[t] > 0 then updateQ(t, dNew)
                            else putLastQ(t, dNew, l);
                        end
                    end
                end else begin
                    dist[t] := dNew; if l < q then putLastQ(t, dNew, l);
                end
            end
        end
    end;
    for v ∈ Plist do P[v] := 0; for v ∈ Vlist do V[v] := false;
    nPlist := 0; nVlist := 0;
    for t ∈ T do S := S ∪ {(t, dist[t])};
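For readers who prefer running code, here is a simplified level-by-level variant of the same idea. It is a sketch under assumptions, not a transcription of the pseudocode: dictionaries replace the array-based queue Q and the lists Plist and Vlist, each level holds the vertices improved in the previous level together with the value they had then, and the name `hop_limited_values` is invented.

```python
import math

INF = math.inf

def minkowski(a, b, r):
    """Semiring product (a^r + b^r)^(1/r); max(a, b) for r = inf."""
    if a == 0.0:
        return b
    if b == 0.0:
        return a
    if r == INF or a == INF or b == INF:
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)

def hop_limited_values(adj, v, r, q):
    """Return {u: W^(q)[v, u]} for the successors u of v (assumes q >= 1)."""
    succ = adj.get(v, {})
    if not succ:
        return {}
    d_max = max(succ.values())         # walks above this value are useless
    dist = {v: 0.0}
    current = {v: 0.0}                 # vertices improved in the last level
    for _level in range(q):
        improved = {}
        for u, d_u in current.items():
            for t, w in adj.get(u, {}).items():
                d_new = minkowski(d_u, w, r)
                if d_new <= d_max and d_new < dist.get(t, INF):
                    dist[t] = d_new
                    improved[t] = d_new
        if not improved:
            break                      # nothing left to extend
        current = improved
    return {u: dist.get(u, INF) for u in succ}

# Path 0 - 1 - 2 - 3 with unit values plus a direct line 0 - 3 of value 5.
adj = {0: {1: 1.0, 3: 5.0},
       1: {0: 1.0, 2: 1.0},
       2: {1: 1.0, 3: 1.0},
       3: {2: 1.0, 0: 5.0}}
print(hop_limited_values(adj, 0, 1, 2), hop_limited_values(adj, 0, 1, 3))
```

With q = 2 the line 0 - 3 keeps its own value and would stay in the Pathfinder network; with q = 3 the three-line path of value 3 beats it.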
Tests – q = 2 and q = 3
[Figure, two panels (q = 2 and q = 3): running time t (s) against the number of vertices n, for n up to 1e+05, with one curve for each average degree deg = 5, 10, 20, 50, 100, 200.]
eatRSd5.net: n = 23219, deg = 28.048; Edinburgh Associative Thesaurus, d5
Cluster1.net: n = 37689, deg = 15.875; Citations in Clustering, d(u, v) = 1 − n(u, v)/max(inS(u), inS(v), outS(u), outS(v))
Cluster2.net: n = 37690, deg = 16.016; Citations in Clustering, d(u, v) = 1/n(u, v)
Tests – q = 4 and q = 5
[Figure, two panels (q = 4 and q = 5): running time t (s) against the number of vertices n, for n up to 1e+05, with one curve for each average degree deg = 5, 10, 20, 50, 100, 200.]
Tests – q = 10 and q = max
[Figure, two panels (q = 10 and q = max): running time t (s) against the number of vertices n, for n up to 50000 (q = 10) and up to 40000 (q = max), with one curve for each average degree deg = 5, 10, 20, 50, 100, 200.]
Conclusions
• the tests with sparse random networks of Erdős-Rényi type show that the new algorithms extend the range of sparse networks for which we can determine the Pathfinder network in reasonable time to at least n = 50000.
• it seems that on real-life networks (green marks) the algorithm works much faster than on random networks with the same average degree.
References I
[1] A. V. Aho, J. E. Hopcroft, J. D. Ullman: The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Massachusetts (1976).
[2] V. Batagelj: Semirings for Social Networks Analysis. Journal of Mathematical Sociology, 19 (1994) 1, 53-68.
[3] R. E. Burkard, R. A. Cuninghame-Green, U. Zimmermann, eds.: Algebraic and Combinatorial Methods in Operations Research. Annals of Discrete Mathematics 19 (1984).
[4] B. Carre: Graphs and Networks. Clarendon, Oxford (1979).
[5] Chaomei Chen: Generalised similarity analysis and Pathfinder network scaling. Interacting with Computers, 10 (2), pp. 107-128.
[6] J. G. Fletcher: A more general algorithm for computing closed semiring costs between vertices of a directed graph. CACM (1980), pp. 350-351.
References II
[7] Vicente P. Guerrero-Bote, Felipe Zapico-Alonso, María Eugenia Espinosa-Calvo, Rocío Gómez Crisóstomo, Félix de Moya-Anegón: Binary Pathfinder: An improvement to the Pathfinder algorithm. Information Processing and Management, Volume 42, Issue 6, December 2006, Pages 1484-1490. http://linkinghub.elsevier.com/retrieve/pii/S0306457306000367
[8] S. Joly, G. Le Calve (1986): Etude des puissances d'une distance. Statistique et Analyse de Donnees, 11/3, 30-50.
[9] A. Quirin, O. Cordon, J. Santamaria, B. Vargas-Quesada, F. Moya-Anegon: A new variant of the Pathfinder algorithm to generate large visual science maps in cubic time. http://www.scimago.es/benjamin/A new variant of the Pathfinder Algorithm.pdf
[10] Arnaud Quirin, Oscar Cordon, Vicente P. Guerrero-Bote, Benjamín Vargas-Quesada and Felix Moya-Anegon: A Quick MST-Based Algorithm to Obtain Pathfinder Networks (∞, n − 1). Journal of the American Society for Information Science and Technology, Volume 59, Issue 12 (p 1912-1924). http://www3.interscience.wiley.com/cgi-bin/fulltext/120736756/PDFSTART
References III
[11] F. S. Roberts: Discrete Mathematical Models. Prentice-Hall, Englewood Cliffs, New Jersey (1976).
[12] R. W. Schvaneveldt, D. W. Dearholt, F. T. Durso (1988): Graph theoretic foundations of Pathfinder networks. Comput. Math. Applic. 15 (4), 337-345.
[13] R. W. Schvaneveldt (Ed.) (1990): Pathfinder Associative Networks: Studies in Knowledge Organization. Norwood, NJ: Ablex. http://interlinkinc.net/PFBook.zip
[14] R. W. Schvaneveldt, F. T. Durso, D. W. Dearholt (1989): Network structures in proximity data. In G. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory, Vol. 24 (pp. 249-284). New York: Academic Press. http://www.interlinkinc.net/Roger/Papers/Schvaneveldt Durso Dearholt 1989.pdf
[15] Interlink: Tools for Pathfinder Network Analysis. http://www.interlinkinc.net/
[16] U. Zimmermann: Linear and Combinatorial Optimization in Ordered Algebraic Structures. Annals of Discrete Mathematics 10 (1981).