Faster Pathfinder algorithm for sparse...

24
Sparse Pathfinder V. Batagelj, A. Vavpetiˇ c Pathfinder Semirings Spanish algorithms Sparse Pathfinder Tests References Faster Pathfinder algorithm for sparse networks Vladimir Batagelj, Anˇ ze Vavpetiˇ c University of Ljubljana Sunbelt XXX, Riva del Garda, Italy, June 29 - July 4, 2010 V. Batagelj, A. Vavpetiˇ c Sparse Pathfinder

Transcript of Faster Pathfinder algorithm for sparse...

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Faster Pathfinder algorithm for sparsenetworks

Vladimir Batagelj, Anze Vavpetic

University of Ljubljana

Sunbelt XXX, Riva del Garda, Italy, June 29 - July 4, 2010

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Outline

1 Pathfinder2 Semirings3 Spanish algorithms4 Sparse Pathfinder5 Tests6 References

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Pathfinder

The Pathfinder algorithm was proposed in eighties (Schvaneveldt1981, Schvaneveldt etal. 1989; Schvaneveldt, 1990) [14, 13, 15] forsimplification of weighted networks – it removes from the network alllines that do not satisfy the triangle inequality – if for a line a shorterpath exists connecting its endpoints then the line is removed. Thebasic idea of the Pathfinder algorithm is simple. It produces anetwork PFnet(W, r , q) = (V,LPF )

compute W(q);LPF := ∅;for e(u, v) ∈ L do begin

if W(q)[u, v ] = W[u, v ] then LPF := LPF ∪ {e}end;

where W is a network dissimilarity matrix and W(q) the matrix of

values of all walks of length at most q computed over the semiring

(R+0 ,⊕,2r ,∞, 0) with a2r b = r

√ar + br and a⊕ b = min(a, b).

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Pathfinder

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Pathfinder

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Pathfinder

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Pathfinder – theoretical results 1

Theoretical results [7]: For a given dissimilarity matrix W thePFnet(W, r , q)

• Is unique

• Preserves geodetic distances

• Links nearest neighbors

• Contains the same information as the minimum method ofhierarchical clustering

• PFnet(W, r =∞, q = n − 1) is the union of allMINTREES

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Pathfinder – theoretical results 2

It holds also:

• Graph PFnet(W, r2, q) is a spanning subgraph of graphPFnet(W, r1, q) iff r1 < r2

• Graph PFnet(W, r , q2) is a spanning subgraph of graphPFnet(W, r , q1) iff q1 < q2

• Similarity transformations preserve structure: the graphPFnet(W, r , q) is equal to the graph PFnet(αW, r , q) forα > 0.

• Monotonic transformations preserve structure for allr =∞: the graph PFnet(W, r =∞, q) is equal to thegraph PFnet(f (W), r =∞, q) where f : R→ R is strictlyincreasing mapping and f (W = [f (wij)].

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Pathfinder – the original algorithm

In the original algorithm the matrix W(q) is computed on thebasis of its definition

W(q) =

q∑i=0

Wi

by computing all its powers Wi , i = 1, . . . , q. The complexityof the algorithm is O(qn3), therefore O(n4), for q ≥ n − 1.Therefore it can be applied only to relatively small (up to somehundreds vertices) networks.Interest for Pathfinder transformation was renewed around theyear 2000 by Chen [5].

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Semirings – Computing the closure over a semiringwith absorption

Because in our case the set of vertices V is finite, so is the setof all paths Euv . Therefore we can compute the value of allwalks w(S?uv ) = w(Euv ). One possibility is to use for largeenough k the equality:

W? = W(k) = (1 + W)k

To speed-up the computation we can consider the sequence(1 + W)2i , i = 1, .., s.It turned out that this is not the fastest way to compute theW?.

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Semirings – Computing the closure over a completesemiring

Kleene, Warshall, Floyd and Roy contributed to the development ofthe procedure which final form was given by Fletcher [6].

C0 := W ;for k := 1 to n do begin

for i := 1 to n do for j := 1 to n dock [i , j ] := ck−1[i , j ] + ck−1[i , k] · (ck−1[k, k])? · ck−1[k , j ] ;

ck [k, k] := 1 + ck [k , k] ;end;W? := Cn ;

If we delete the statement ck [k , k] := 1 + ck [k , k] we obtain thealgorithm for computing the strict closure W = WW?.

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Semirings – Dissimilarities

Joly and Le Calve theorem [8]:For any even dissimilarity measure d there is a unique numberp ≥ 0, called its metric index, such that: d r is metric for allr ≤ p, and d r is not metric for all r > p.

In the opposite direction we can say: Let d be a dissimilarityand for x , y and z we have d(x , z) + d(z , y) ≥ d(x , y) andd(x , y) > max(d(x , z), d(z , y)) then there exists a uniquenumber p ≥ 0 such that for all r > p

d r (x , z) + d r (z , y) < d r (x , y)

or equivalently

d(x , z)2r d(z , y) < d(x , y)

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Semirings – Minkowski operation

Minkowski operation a2r b = r√

ar + br :

r = 1⇒ a2r b = a + b,r = 2⇒ a2r b =

√a2 + b2,

r =∞⇒ a2r b = max(a, b).

And let a⊕ b = min(a, b).The structure (R+

0 ,⊕,2r ,∞, 0) is a complete semiring witha? = 0. It is called also Pathfinder semiring.

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Spanish algorithms

Since the Pathfinder semiring is idempotent it holds

W(q) = (1⊕W)q

This power can be computed faster using binary algorithm (forexample, to compute a57 = a32 · a16 · a8 · a1 we need only 8multiplications instead of 56). This improvement was proposed byGuerrero-Bote etal. (2006) [7] and reduces complexity to O(n3 log q).When q ≥ n − 1, W(q) = W? and it can be determined by theFletcher’s algorithm over Pathfinder semiring. This improvement wasproposed by Quirin etal. (2008) [9] and reduces complexity to O(n3).

Additional improvement can be made for undirected networks in the

case q ≥ n − 1 and r =∞. In this case the network PF is the union

of all minimal spanning trees of N. It can be obtained using an

adapted version of Kruskal’s minimal spanning tree algorithm as

described in Quirin etal. (2008) [10]. The complexity of this

algorithm is O(m log n) where m is the number of edges.

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Sparse Pathfinder

For sparse networks in general case there is still some space forimprovements. We rewrite the basic Pathfinder algorithm in the form

LPF := ∅;for v ∈ V do begin

compute the list S = ((u, du) : u ∈ N(v)), where du = W(q)[v , u];for (u, du) ∈ S do

if du = W[v , u] then LPF := LPF ∪ {(v , u)}end;

N(v) denotes the set of successors of vertex v .For determining the values du = W(q)[v , u] for q = n − 1 we can use anadapted Dijkstra’s algorithm that determines the list S in a single run. Thejob is done when all values of vertices from N(v) are determined. Only a(small) portion of network should be inspected for each vertex v . Toefficiently implement this algorithm a special data structure IndexedPriority Queue is needed.

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Sparse Pathfinder – BFS algorithm

In the case q < n − 1 a variant of BFS (Breath First Search)algorithm is used to determine the list S .The FIFO queue Q is composed of triples (t, d , l): t is avertex, d is a dist-length and l is a line-length.To make the implementation fast all the structures: the queueQ and lists Plist and Vlist are represented with arrays.

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Sparse Pathfinder – BFS algorithmcompute the list S

S := ∅; T := N(v); emptyQ; dMax := max{wvu : u ∈ T};putLastQ(v, 0, 0); dist[v ] := 0; level := 0;while sizeQ() > 0 do begin

(u, du , l) := firstFromQ(); l := l + 1;if l > level then begin

level := l ;for v ∈ Plist do P[v ] := 0;nPlist := 0;

endfor t ∈ N(u) do begin

dNew := du 2r w(u, t);if dNew ≤ dMax then begin

if V [t] then beginif dNew < dist[t] then begin

dist[t] := dNew ;if l < q then begin

if P[t] > 0 then updateQ(t, dNew)else putLastQ(t, dNew, l);

endend

end else begindist[t] := dNew ; if l < q then putLastQ(t, dNew, l);

endend

endend;for v ∈ Plist do P[v ] := 0; for v ∈ Vlist do V [v ] := false;nPlist := 0; nVlist := 0;for t ∈ T do S := S ∪ {(t, dist[t])};

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Tests – q = 2 and q = 3

0e+00 2e+04 4e+04 6e+04 8e+04 1e+05

020

040

060

080

010

00q= 2

n

t(s)

●●●●●● ● ● ● ● ● deg = 5●●●

●●●●● ● ● ● ● ● deg = 10●●●

●●●●● ● ● ●●

● deg = 20

●●●

●●●●● ● ●●

● deg = 50

●●●

●●●●●●

● deg = 100

●●●

●●●

● deg = 200

0e+00 2e+04 4e+04 6e+04 8e+04 1e+05

050

0010

000

1500

0

q= 3

n

t(s)

●●●●●● ● ● ● ● ● deg = 5●●●●

●●●●● ● ● ● ● ● deg = 10●●●●

●●●●● ● ● ●●

● deg = 20

●●●●

●●●●●●

● deg = 50

●●●●

●●●●●

● deg = 100

●●●●

●●●

● deg = 200

●●

eatRSd5.net: n = 23219, deg = 28.048 Edinbourgh Associtive Thesaurus, d5

Cluster1.net: n = 37689, deg = 15.875 Citations in Clusteringd(u, v) = 1− n(u, v)/max(inS(u), inS(v), outS(u), outS(v))Cluster2.net: n = 37690, deg = 16.016 Citations in Clusteringd(u, v) = 1/n(u, v)

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Tests – q = 4 and q = 5

0e+00 2e+04 4e+04 6e+04 8e+04 1e+05

050

0010

000

1500

020

000

q= 4

n

t(s)

●●●●●● ● ● ● ● ● deg = 5●●●●

●●●●● ● ● ●●

● deg = 10

●●●●

●●●●● ●●

● deg = 20

●●●●

●●●●●

● deg = 50

●●●●

●●●●●

● deg = 100

●●●●

●●●

● deg = 200

●●

0e+00 2e+04 4e+04 6e+04 8e+04 1e+05

050

0010

000

1500

020

000

q= 5

n

t(s)

●●●●● ● ● ● ● ● deg = 5●●●●

●●●●● ● ●

● deg = 10

●●●●

●●●●●●

● deg = 20

●●●●

●●●●●

● deg = 50

●●●●

●●●●●

● deg = 100

●●●●

●●

● deg = 200

●●

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Tests – q = 10 and q = max

0 10000 20000 30000 40000 50000

010

000

2000

030

000

4000

0

q= 10

n

t(s)

●●●●● ●●

● deg = 5

●●●●

●●●●● ●

● deg = 10

●●●●

●●●●●●

● deg = 20

●●●●

●●●●●

● deg = 50

●●●●

●●●●●

● deg = 100

●●●●

●●

● deg = 200

●●

0 10000 20000 30000 40000

050

0010

000

1500

020

000

q=max

n

t(s)

●●●● ● ● ●

● deg = 5

●●●●

●●●● ● ●●

● deg = 10

●●●●

●●●● ● ●●

● deg = 20

●●●●

●●●● ●●

● deg = 50

●●●●

●●●● ●●

● deg = 100

●●●●

●● ●

● deg = 200

●●

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

Conclusions

• the tests with sparse random networks of Erdos-Renyi typeshow that the new algorithms extend the range of sparsenetworks for which we can determine the Pathfindernetwork in reasonable time to at least n = 50000.

• it seems that on real-life networks (green marks) thealgorithm works much faster than on random networkswith the same average degree.

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

References I

A. V. Aho, J. E. Hopcroft, J. D. Ullman, The Design and Analysis ofComputer Algorithms. Addison-Wesley, Reading, Massachusetts (1976).

Batagelj, V.: Semirings for Social Networks Analysis. Journal ofMathematical Sociology, 19(1994)1, 53-68.

R.E. Burkard, R.A. Cuninghame-Greene, U. Zimmermann, eds., Algebraicand Combinatorial Methods in Operations Research. Annals of DiscreteMathematics 19 (1984).

B. Carre, Graphs and Networks. Clarendon, Oxford (1979).

Chen, Chaomei: Generalised similarity analysis and Pathfinder networkscaling. Interacting with Computers, 10 (2): pp. 107-128.

J. G. Fletcher, “A more general algorithm for computing closed semiringcosts between vertices of a directed graph,” CACM (1980), pp. 350-351.

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

References II

Vicente P. Guerrero-Bote, Felipe Zapico-Alonso, Mara EugeniaEspinosa-Calvo, Roco Gomez Crisostomo, Flix de Moya-Anegon: BinaryPathfinder: An improvement to the Pathfinder algorithm. InformationProcessing and Management, Volume 42, Issue 6, December 2006, Pages1484-1490. http://linkinghub.elsevier.com/retrieve/pii/S0306457306000367

Joly S., Le Calve G. (1986) Etude des puissances d’une distance. Statistiqueet Analyse de Donnees, 11/3, 30-50.

A. Quirin, O. Cordon, J. Santamaria, B. Vargas-Quesada, F. Moya-Anegon:A new variantof the Pathfinder algorithm to generate large visual science maps in cubic time.http://www.scimago.es/benjamin/A new variant of the Pathfinder Algorithm.pdf

Arnaud Quirin, Oscar Cordon, Vicente P. Guerrero-Bote, BenjamnVargas-Quesada and Felix Moya-Anegon: A Quick MST-Based Algorithm toObtain Pathfinder Networks (∞, n− 1). Journal of the American Society forInformation Science and Technology, Volume 59, Issue 12 (p 1912-1924).http://www3.interscience.wiley.com/cgi-bin/fulltext/120736756/PDFSTART

V. Batagelj, A. Vavpetic Sparse Pathfinder

SparsePathfinder

V. Batagelj,A. Vavpetic

Pathfinder

Semirings

Spanishalgorithms

SparsePathfinder

Tests

References

References III

F. S. Roberts, Discrete Mathematical Models. Prentice-Hall, EnglewoodCliffs, New Jersey (1976). Amazon.

Schvaneveldt, R. W., Dearholt, D. W., Durso, F. T. (1988) Graph theoreticfoundations of Pathfinder networks. Comput. Math. Applic. 15 (4), 337-345.

Schvaneveldt, R. W. (Ed.) (1990) Pathfinder Associative Networks: Studiesin Knowledge Organization. Norwood, NJ: Ablex.http://interlinkinc.net/PFBook.zip

Schvaneveldt, R. W., Durso, F. T., Dearholt, D. W. (1989) Networkstructures in proximity data. In G. Bower (Ed.), The psychology of learningand motivation: Advances in research and theory, Vol. 24 (pp. 249-284).New York: Academic Press.http://www.interlinkinc.net/Roger/Papers/Schvaneveldt Durso Dearholt 1989.pdf

Interlink – Tools for Pathfinder Network Analysis.http://www.interlinkinc.net/

U. Zimmermann, Linear and Combinatorial Optimization in OrderedAlgebraic Structures. Annals of Discrete Mathematics 10 (1981).

V. Batagelj, A. Vavpetic Sparse Pathfinder