
Theoretical analysis of local search strategies to optimize network communication subject to preserving the total number of links

Boris Mitavskiy
A-Star Bioinformatics Institute, Singapore

Jonathan Rowe
School of Computer Science, University of Birmingham, Birmingham, UK, and

Chris Cannings
School of Mathematics and Statistics, University of Sheffield, Sheffield, UK

Abstract

Purpose – A variety of phenomena, such as the world wide web and social or business networks and interactions, are modelled by various kinds of networks (such as scale-free or preferential attachment networks). However, due to model-specific requirements, one may want to rewire the network to optimize the communication among the various nodes while not overloading the number of channels (i.e. preserving the number of edges). The purpose of this paper is to present a formal framework for this problem and to examine a family of local search strategies to cope with it.

Design/methodology/approach – This is mostly theoretical work. The authors use a rigorous mathematical framework to set up the model and then prove some theorems about it which pertain to various local search algorithms that work by rerouting the network.

Findings – This paper proves that in cases when every pair of nodes is sampled with non-zero probability then the algorithm is ergodic in the sense that it samples every possible network on the specified set of nodes and having a specified number of edges with nonzero probability. Incidentally, the ergodicity result led to the construction of a class of algorithms for sampling graphs with a specified number of edges over a specified set of nodes uniformly at random and opened some other challenging and important questions for future considerations.

Originality/value – The measure-theoretic framework presented in the current paper is original and rather general. It allows one to obtain new points of view on the problem.

Keywords Communication, Optimization

Paper type Research paper

The current issue and full text archive of this journal is available at www.emeraldinsight.com/1756-378X.htm

This work was sponsored by the EU Project FP6 “Digital Business Ecosystems” and EPSRC EP/D003/05/1 “Amorphous Computing” Grants during the first author’s postdoctoral employments in the University of Birmingham and in the University of Sheffield.

Received 24 September 2008
Revised 27 December 2008
Accepted 31 December 2008

International Journal of Intelligent Computing and Cybernetics
Vol. 2 No. 2, 2009, pp. 243-284
© Emerald Group Publishing Limited, 1756-378X
DOI 10.1108/17563780910959893

1. Introduction
Networks, i.e. graphs, serve as a natural mathematical model in a variety of situations ranging from the world wide web, where the nodes are computers and the edges are the links between them, to social and biological networks where the nodes could be individuals and the links indicate if the two individuals communicate with one another (see, for instance, Watts, 2004 for a detailed discussion and numerous examples). Often in social networks, the internet, or digital ecosystem networks, individual nodes do not join the network in places most convenient for their needs. For instance, large networks of autonomous agents model a variety of natural phenomena (such as social networks) or human-built applications (such as the internet). Depending on a particular situation or application, some agents forming the nodes of such networks may desire to be in close contact with other agents (for instance, if they want to share information with another set of agents) and yet only limited connection resources are available. To be more specific, in mechanism design (Nisan, 1999) one’s goal is to design protocols that maximize overall profit despite the agents’ selfishness. When dealing with network analysis one is interested in finding out if a specific setting that involves costs for connections as well as individual cost functions for each agent allows for stable solutions characterized as Nash equilibria. Depending on the circumstances, such questions may be difficult to answer (Anshelevich et al., 2003; Bala and Goyal, 2000; Fabrikant et al., 2003; Chu, 2008, 2009).

A rather different point of view results in modeling such a network as an evolutionary process governed by the random application of some local mutation operators where the agents involved control the outcome via a selection mechanism based on their own preferences. The primary advantage of such a model is that the agents rely only on local information and the mutations alter the network in a rather gradual manner. Such an approach is taken by Lehmann and Kaufmann (2005), who present a rather general model for the self-evolution of a network and analyze its theoretical properties. Among other things, Lehmann and Kaufmann (2005) establish some expected runtime bounds for their strategy to reach a stable topology in particular cases. Their upper as well as lower bounds were later improved by Jansen and Theile (2007). It is worth mentioning straight away that the evolutionary strategies studied in Lehmann and Kaufmann (2005) and Jansen and Theile (2007) are very similar to the ones presented in the current work. In this paper, we establish polynomial time upper bounds on the expected runtime of our strategies for the case when the algorithm of Lehmann and Kaufmann (2005) yields an exponential time lower bound (Jansen and Theile, 2007) to reach the “star tree.” While the bounds in the current work are established only for the case of trees, we hope they can be extended to families of networks following power law degree distributions. At the very least, we have a reasonable heuristic expectation that similar results hold in such cases, as discussed in more detail in Section 11.

In a digital business ecosystem network, a node represents a client and a user at the same time. It requests certain goods and services from its neighboring nodes and passes its services to other nodes along with possibly other useful information. When joining the network, an individual node is unlikely to possess complete information about which nodes are most profitable for it to be in contact with. (This may be due simply to the sheer size of such networks. They typically have thousands and even millions of nodes!) As time passes, the newly joined nodes may want to rewire themselves to become closer to other nodes which they are more likely to be in contact with. Owing to the limited resources (such as computer power or individual’s memory), the network cannot form a complete graph. In the current work, we present and analyze algorithms where the nodes rewire themselves locally within the network with the aim of gradually optimizing the expected communication time (number of hops) with respect to a certain measure of which nodes are likely to be in contact. We then analyze various preliminary theoretical aspects of this algorithm. Our algorithms are “hybridizations” of the strategies presented and analyzed in Lehmann and Kaufmann (2005) and Jansen and Theile (2007) in the sense that the mutations are carried out only when the agents gather some partial information helping them to decide which mutations are potentially more beneficial. It should also be noted that the optimal communication arrangement problem has been extensively studied in the literature, particularly for the case of trees (Gomory and Hu, 1969; Hu, 1974).

One very important property of the algorithms presented in this work, which we establish, is their ergodicity. We show that the type of mutation operations introduced and studied in Lehmann and Kaufmann (2005) and Jansen and Theile (2007) as well as in this paper can access any connected graph with a specified set of nodes and number of edges from any other connected graph over the same set of nodes and having the same number of edges. An equivalent formulation of this result is that the Markov chain modeling any evolutionary strategy where every mutation (every possible local rerouting of the network) is selected with positive probability is irreducible. Irreducibility of such Markov chains must not be taken for granted (i.e. it must, indeed, be rigorously proved) as the following example from past research in population genetics demonstrates. One situation in which one wishes to move around a state space in a similar manner to that considered here occurs in Monte-Carlo Markov chain procedures. A particular application is the calculation of probabilities on complex genealogies using the Gibbs sampler (Geman and Geman, 1984) in which individuals’ genotypes are updated singly. It was demonstrated by Sheehan and Thomas (1993) that the Markov chain was not necessarily irreducible when there were more than two alleles. Without that irreducibility the calculations of any probabilities or likelihoods would potentially be erroneous (Cannings and Sheehan, 2002).

Incidentally, ergodicity has another potential application to sampling connected graphs over a specified set of nodes having a specified number of edges uniformly at random which will be presented in some detail in Subsection 10.2 of Section 10. Sampling connected graphs over a specified set of nodes and having a specified number of edges uniformly at random is not a trivial question and at the same time not many algorithms are known to solve this problem. One approach has been developed in Rodionov and Choo (2003) although no proof that their algorithm does indeed sample such graphs uniformly at random has been presented. On the other hand, the question of sampling from various collections of connected graphs with specified properties uniformly at random is important for a variety of reasons as mentioned in Rodionov and Choo (2003). For instance, random graphs are widely used for testing various algorithms on networks (Waxman, 1993; Doar, 1996; Toh, 1996). In fact, as Rodionov and Choo (2003) pointed out, generating the graphs uniformly at random is the only reasonable model for the task since the real network structures for algorithm testing are usually unavailable. In Subsection 10.2 of Section 10, we present a class of algorithms based on Markov chains that allow one to sample connected graphs over a specified set of nodes having a specified number of edges nearly uniformly at random. The approach is based on the ergodicity properties of the mutation (local rerouting) transformations jointly with the generalized Geiringer theorem of Mitavskiy and Rowe (2006a, b). This also raises a very interesting and important question of studying convergence rates of the Markov chains modeling these algorithms with the aim of deciding which algorithms are more efficient, thereby opening a fascinating and rather challenging question for future work.

In summary, the current work modifies the existing evolutionary strategies to optimize communication in networks subject to preserving the total number of links and establishes mathematically rigorous bounds on the runtime complexity of these algorithms in special cases. Moreover, it is shown that the algorithms introduced in the current paper are superior, for these cases, to the strategies considered earlier in Lehmann and Kaufmann (2005), in the sense that we provide quadratic runtime upper bounds for our algorithms as opposed to the exponential lower time bounds for very similar algorithms of Lehmann and Kaufmann (2005) established in Jansen and Theile (2007). Our work also rigorously establishes the ergodicity of most evolutionary strategies used to optimize network communication subject to preserving the total number of edges in a rather general setting. As pointed out before, this is a rather important question related to the irreducibility of the corresponding Markov chains. Finally, the ergodicity came in handy to develop a technique for sampling connected networks with a specified number of links uniformly at random which is also rather important as pointed out in the previous paragraph.

In the next section, we introduce the problem and the algorithm to cope with it in detail. In the subsequent section, we present some basic mathematical properties of the algorithm. Further outline will be provided in Section 3.

2. Formal description of the problem and the algorithm
2.1 The statement of the problem
Formally, then, the problem is as follows. We are given a graph $G$ with nodes $V$ and edges $E$. From time to time, pairs of nodes communicate with each other. We assume that there is some (unknown) probability distribution $\mu$ over the set of pairs of nodes describing this communication. We follow some standard algorithm (e.g. Dijkstra’s algorithm) to establish a path between the two nodes, and record the path length (number of hops required). We would like to minimize the average path length:

$$E(D) = \sum_{a,b \in V} \mu(a, b)\, d(a, b)$$

where $d(a, b)$ is the number of hops from $a$ to $b$. We seek to do this by incrementally changing the network topology, replacing an existing edge with a new edge. The problem is to find an appropriate replacement strategy.
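As a concrete illustration of the objective, the following minimal sketch computes $E(D)$ for a small unweighted graph. The adjacency-dictionary representation, the dictionary `mu` of pair probabilities, and the function names are assumptions made for this example only; breadth-first search stands in for Dijkstra's algorithm, which is equivalent when every edge counts as one hop.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def expected_path_length(adj, mu):
    """E(D): sum over pairs (a, b) of mu(a, b) * d(a, b)."""
    cache = {}
    total = 0.0
    for (a, b), prob in mu.items():
        if a not in cache:
            cache[a] = bfs_distances(adj, a)
        total += prob * cache[a][b]
    return total


# Hypothetical example: the path graph 0 - 1 - 2 - 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
mu = {(0, 3): 0.5, (1, 2): 0.5}
result = expected_path_length(adj, mu)
print(result)  # 0.5 * 3 + 0.5 * 1 = 2.0
```

The sketch assumes the graph is connected, so every pair with positive probability has a finite distance.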

2.2 The description of the algorithm
The algorithm (a rather similar version of which is also introduced in Jansen and Theile (2007) for the case of trees) works as follows:

(1) Select $a, b \in V$ according to $\mu$.
(2) Find a shortest path $\gamma$ from $a$ to $b$.
(3) For each internal node $v$ of the path, create a shortcut with probability $p$.
(4) Go to (1).

To create a shortcut, consider the section of the path surrounding the chosen node $v$. That is, there are nodes $u$ and $w$ so that $(u, v, w)$ is in the path $\gamma$. We add a new edge $(u, w)$ to the graph, and delete either edge $(u, v)$ or $(v, w)$. The deleted edge is chosen randomly.
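The steps above can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes the graph is stored as a dictionary of neighbour sets, that it is connected, and that each shortcut is applied to the path as already updated by earlier shortcuts in the same traversal (so that with $p = 1$ the two endpoints end up directly linked). The names `shortest_path` and `rewire_step` are hypothetical.

```python
import random
from collections import deque


def shortest_path(adj, a, b):
    """One shortest path from a to b as a list of nodes (BFS, connected graph)."""
    parent = {a: None}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        if u == b:
            break
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path, node = [], b
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def rewire_step(adj, pairs, weights, p, rng):
    """One iteration: sample a pair from mu, then shortcut internal nodes w.p. p."""
    a, b = rng.choices(pairs, weights=weights, k=1)[0]
    path = shortest_path(adj, a, b)
    i = 1
    while i < len(path) - 1:
        if rng.random() < p:
            u, v, w = path[i - 1], path[i], path[i + 1]
            adj[u].add(w)              # add the shortcut edge (u, w)
            adj[w].add(u)
            # delete (u, v) or (v, w), each with probability 1/2
            x, y = (u, v) if rng.random() < 0.5 else (v, w)
            adj[x].discard(y)
            adj[y].discard(x)
            del path[i]                # the path now runs ..., u, w, ...
        else:
            i += 1
    return path


rng = random.Random(0)
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
edges_before = sum(len(nbrs) for nbrs in adj.values()) // 2
final_path = rewire_step(adj, [(0, 4)], [1.0], p=1.0, rng=rng)
edges_after = sum(len(nbrs) for nbrs in adj.values()) // 2
print(final_path, edges_before, edges_after)
```

Each shortcut closes a triangle on $u$, $v$, $w$ and then removes one of its other two sides, so the number of edges and the connectivity of the graph are both preserved.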

The idea of this algorithm is that the shortcut will reduce the path length by one. Of course, it may create problems for other paths, but the hope is that since it will be applied to the most frequently chosen paths, the overall effect will be beneficial.

It should be noted that the problem, as stated, has, as a special case, the minimum communication cost spanning tree problem, which is already known to be NP-complete (Garey and Johnson, 1979). It is not feasible, then, to propose an efficient algorithm which will solve the problem exactly. The best we can hope for is that our heuristic algorithm will generate solutions of acceptable quality. Having said that, it is known that there is a polynomial-time algorithm, due to Gomory and Hu (1969) and Hu (1974), in the case where:

. all pairs of nodes communicate with equal frequency; and

. the number of allowed edges is one less than the number of nodes.

However, in our situation, we do not have advance knowledge of the frequency with which pairs of nodes will communicate and it is very unlikely that this will be equal for all pairs of nodes, so their algorithm is not directly applicable. On the other hand, we do have the advantage of having potentially more edges in the network than the minimal set allowed in the Gomory-Hu algorithm.

3. Overview
In what follows, we present our preliminary theoretical analysis of our network re-routing algorithm. It is rather technical, so we first present a high-level summary.

First we have some mathematical observations in which we define the object of interest, namely the expected shortest distance in the network. That is the average number of hops that messages will make during transmissions between nodes in the network. We prove a simple result that relates this quantity to the frequency with which links (edges) in the network occur in transmissions. If an edge occurs with high frequency, then it can be considered to be rather important. The connection we prove is that the expected shortest distance in the network is equal to the sum of the edge frequencies.

Next, we look at the local effects of a single move of our algorithm to try to establish conditions under which the replacement of one edge by another (a shortcut) will result in an improvement. Clearly, the shortcut will improve matters between the two nodes that have just communicated. The problem is that the edge that is being replaced might itself be important. Remembering that the importance of an edge is related to its frequency, we derive conditions on edge frequencies which ensure that the overall average shortest length is improved by the move. The condition we establish suggests a simulated-annealing-like modification that we introduce in Section 7. Moreover, in the subsequent sections we analyze the expected runtime complexity of our simulated-annealing-like strategy on a simple problem. Exponential lower bounds on the expected runtime have been previously obtained by Jansen and Theile (2007) for a very similar algorithm applied to the same problem. Our algorithm’s performance is of order $O(n^2)$ where $n$ is the number of edges (or nodes: this is irrelevant from the asymptotic point of view) in a tree.

Next, we turn our attention to the question of what happens if any pair of nodes may want to communicate with some positive probability. In Section 10, we prove that in such a case the algorithm has a nonzero chance of encountering absolutely any connected network topology with a specified number of edges. Technically, this means the algorithm is ergodic. In particular, the algorithm does not get stuck in a local optimum. In the worst case, of course, one may end up waiting a long time to reach one network topology starting with another.

4. Basic mathematical observations
Suppose we are given a network (an undirected graph) $G = (V, E)$ and a probability distribution $\mu$ on $V^2$. For a given pair of nodes $x$ and $y$ denote by $d(x, y)$ the distance between $x$ and $y$ (i.e. the number of edges in the shortest path joining $x$ and $y$). We are interested in restructuring the network $G$ so as to preserve the number of edges (not to make it bigger), but, at the same time, to minimize the expected shortest distance with respect to the distribution $\mu$, namely $E(D) = \sum_{x,y \in V} \mu(x, y)\, d(x, y)$. We make the following general observation first.

Proposition 4.1. Denote by $\Gamma$ a chosen collection of shortest paths: one for every pair of nodes. Then, $E(D) = \sum_{e \in E} P_\Gamma(e)$ where $P_\Gamma(e)$ is the probability that the edge $e$ has been encountered in a shortest path from $\Gamma$ joining a pair selected with respect to the distribution $\mu$.

Proof. Consider the characteristic function $X : E \times V^2 \to \{0, 1\}$ defined as:

$$X(e, x, y) = \begin{cases} 1 & \text{if } e \text{ is an edge in the shortest path from } \Gamma \text{ joining } x \text{ and } y \\ 0 & \text{otherwise.} \end{cases}$$

First notice that for every $x, y \in V$, we can write $d(x, y) = \sum_{e \in E} X(e, x, y)$. We then obtain:

$$E(D) = \sum_{x,y \in V} \mu(x, y)\, d(x, y) = \sum_{x,y \in V} \mu(x, y) \sum_{e \in E} X(e, x, y) = \sum_{e \in E} \sum_{x,y \in V} \mu(x, y)\, X(e, x, y).$$

Notice that for a fixed edge $e \in E$, $P_\Gamma(e) = \sum_{x,y \in V} \mu(x, y)\, X(e, x, y)$ is just the probability that the edge $e$ has been traversed by a path from $\Gamma$ joining some pair of nodes $(x, y)$ which is chosen randomly with respect to $\mu$. The desired conclusion that $E(D) = \sum_{e \in E} P_\Gamma(e)$ now follows. □
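The identity of Proposition 4.1 can be checked numerically on a small example. The sketch below is only an illustration under assumed inputs: the 4-cycle graph, the distribution `mu`, and the helper names are hypothetical, and the collection $\Gamma$ is simply whichever shortest path the breadth-first search happens to return for each pair.

```python
from collections import deque, defaultdict


def shortest_path(adj, a, b):
    """One shortest path from a to b as a list of nodes (BFS, connected graph)."""
    parent = {a: None}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        if u == b:
            break
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path, node = [], b
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def edge_frequencies(adj, mu):
    """P_Gamma(e): total probability of the sampled pairs whose path uses e."""
    freq = defaultdict(float)
    for (x, y), prob in mu.items():
        path = shortest_path(adj, x, y)
        for s, t in zip(path, path[1:]):
            freq[frozenset((s, t))] += prob
    return freq


# Hypothetical example: the 4-cycle 0 - 1 - 2 - 3 - 0.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
mu = {(0, 2): 0.25, (0, 3): 0.25, (1, 3): 0.5}

expected_distance = sum(
    prob * (len(shortest_path(adj, x, y)) - 1) for (x, y), prob in mu.items()
)
sum_of_frequencies = sum(edge_frequencies(adj, mu).values())
print(expected_distance, sum_of_frequencies)  # both equal 1.75
```

In the 4-cycle the pairs (0, 2) and (1, 3) each have two shortest paths, so a different choice of $\Gamma$ changes the individual edge frequencies but not their sum.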

Notice that the choice of the collection of the shortest paths $\Gamma$ is not unique in general. As an immediate consequence of Proposition 4.1 we deduce the following.

Corollary 1. Given a graph $G = (V, E)$ and a probability distribution $\mu$ over $V^2$, let $\Gamma$ denote any collection of the shortest paths between the nodes of $G$ containing a unique path for every pair of nodes of $G$. Then it follows that $\sum_{e \in E} P_\Gamma(e)$ is independent of the choice of $\Gamma$.

Similar results can be established by expressing the path length counting the nodes rather than the edges. This is analogous to the above, if we use the characteristic function $Y = Y_\Gamma : V^3 \to \{0, 1\}$ defined as:

$$Y(v, x, y) = \begin{cases} 1 & \text{if } v \text{ is a node on the path from } \Gamma \text{ between } x \text{ and } y \\ 0 & \text{otherwise} \end{cases}$$

and then notice that:

$$d(x, y) = \left( \sum_{v \in V} Y(v, x, y) \right) - 1$$

so that we obtain:

$$\begin{aligned}
E(D) &= \sum_{x,y \in V} \mu(x, y)\, d(x, y) = \sum_{x,y \in V} \mu(x, y) \left( \left( \sum_{v \in V} Y(v, x, y) \right) - 1 \right) \\
&= \sum_{x,y \in V} \mu(x, y) \sum_{v \in V} Y(v, x, y) - \sum_{x,y \in V} \mu(x, y) = \left( \sum_{v \in V} \sum_{x,y \in V} \mu(x, y)\, Y(v, x, y) \right) - 1.
\end{aligned}$$

Again, for every fixed node $v \in V$, we notice that $\sum_{x,y \in V} \mu(x, y)\, Y(v, x, y)$ is the probability $P_\Gamma(v)$ that the node $v$ has been encountered in the unique path from $\Gamma$ joining the pair of nodes $(x, y)$ randomly chosen with respect to the distribution $\mu$. This produces a result analogous to Proposition 4.1.

Proposition 4.2. Denote by $\Gamma$ a chosen collection of shortest paths: one for every pair of nodes. Then:

$$E(D) = \left( \sum_{v \in V} P_\Gamma(v) \right) - 1$$

where $P_\Gamma(v)$ is the probability that the node $v$ has been encountered in a shortest path from $\Gamma$ joining a pair selected with respect to the distribution $\mu$.

5. Local algorithm analysis
We consider a simple local replacement algorithm whose step is to remove an edge $e^-$ from the network and to insert another edge $e^+$ into the network. We shall now express the “improvement” or “worsening” that the algorithm creates after a single time step. We continue with the notation of the previous section: $G = (V, E)$ denotes our network and $G' = (V, (E \cup \{e^+\}) \setminus \{e^-\})$ denotes the modified network. Our immediate goal is to express the difference between the expected average distances $D$ and $D'$ for the networks $G$ and $G'$, respectively. Recall that $\Gamma$ and $\Gamma'$ denote the collections of shortest paths between the pairs of nodes of $G$ and $G'$, respectively, containing exactly one path for every pair of nodes $(x, y)$. Once the set of shortest paths $\Gamma$ has been chosen, consider the subset $C(e^-, e^+) \subseteq \Gamma$ consisting of all the paths in $\Gamma$ which remain the shortest in $G'$ (i.e. those which do not pass through $e^-$ and also remain the shortest regardless of removing $e^-$ and adding $e^+$). Notice that $\Gamma'$ can be chosen so that $C(e^-, e^+) \subseteq \Gamma'$ (since these paths remain the shortest in $G'$) and for the rest of the pairs, we choose the new shortest path in $G'$. Denote this new set of shortest paths by $N'(e^-, e^+)$ and also let $N(e^-, e^+) = \Gamma \setminus C(e^-, e^+)$. Notice also that:

$$P_\Gamma(e) = P_{C(e^-, e^+)}(e) + P_{N(e^-, e^+)}(e)$$

where $P_{C(e^-, e^+)}(e)$ denotes the probability that the edge $e$ occurs in a shortest path from $\Gamma$ belonging to $C(e^-, e^+)$ when sampling with respect to $\mu$ and $P_{N(e^-, e^+)}(e)$ denotes the probability that it occurs in a path from $N(e^-, e^+)$. Likewise:

$$P_{\Gamma'}(e) = P_{C(e^-, e^+)}(e) + P_{N'(e^-, e^+)}(e)$$

where $P_{N'(e^-, e^+)}(e)$ denotes the probability that the edge $e$ occurs in a path from $N'(e^-, e^+)$. Combining these observations with Proposition 4.1 gives us:

$$\begin{aligned}
E(D) - E(D') &= \sum_{e \in E} P_\Gamma(e) - \sum_{e \in E'} P_{\Gamma'}(e) \\
&= \sum_{e \in E \setminus \{e^-\}} P_\Gamma(e) - \sum_{e \in E \setminus \{e^-\}} P_{\Gamma'}(e) + P_\Gamma(e^-) - P_{\Gamma'}(e^+) \\
&= \sum_{e \in E \setminus \{e^-\}} \left( P_{C(e^-, e^+)}(e) + P_{N(e^-, e^+)}(e) \right) - \sum_{e \in E \setminus \{e^-\}} \left( P_{C(e^-, e^+)}(e) + P_{N'(e^-, e^+)}(e) \right) + P_\Gamma(e^-) - P_{\Gamma'}(e^+) \\
&= \sum_{e \in E \setminus \{e^-\}} P_{N(e^-, e^+)}(e) - \sum_{e \in E \setminus \{e^-\}} P_{N'(e^-, e^+)}(e) + P_\Gamma(e^-) - P_{\Gamma'}(e^+).
\end{aligned}$$

We summarize this in the following.

Proposition 5.1. Let $G = (V, E)$ denote a network and suppose we remove the edge $e^-$ from $E$ and add a new edge $e^+$ to $E$. This gives us a new network $G' = (V, (E \cup \{e^+\}) \setminus \{e^-\})$. Let $\Gamma$ denote any particular choice of the shortest paths in the network $G$. Also, let $\Gamma'$ denote a choice of shortest paths having the property that $C(e^-, e^+) \subseteq \Gamma \cap \Gamma'$ (as mentioned before such a choice does exist) where $C(e^-, e^+) \subseteq \Gamma$ consists of all the paths in $\Gamma$ which remain the shortest in $G'$ (i.e. those which do not pass through $e^-$ and also remain the shortest regardless of removing $e^-$ and adding $e^+$). Let $N(e^-, e^+) = \Gamma \setminus C(e^-, e^+)$ and $N'(e^-, e^+) = \Gamma' \setminus C(e^-, e^+)$. We then have:

$$E(D) - E(D') = \sum_{e \in E \setminus \{e^-\}} P_{N(e^-, e^+)}(e) - \sum_{e \in E \setminus \{e^-\}} P_{N'(e^-, e^+)}(e) + P_\Gamma(e^-) - P_{\Gamma'}(e^+).$$

Proposition 5.1 tells us, in particular, that the “add/remove an edge” type of algorithm leads to an improvement if and only if:

$$\sum_{e \in E \setminus \{e^-\}} P_{N(e^-, e^+)}(e) - \sum_{e \in E \setminus \{e^-\}} P_{N'(e^-, e^+)}(e) + P_\Gamma(e^-) - P_{\Gamma'}(e^+) > 0.$$

In fact, this quantity measures a single step improvement.

Next, we shall apply the above observations to deduce some basic theoretical properties of the algorithm described in Section 2.2. Recall how the algorithm works: we fix a small positive number $p \le 1$, and at every stage we sample a pair $(x, y)$ of nodes randomly with respect to the probability distribution $\mu$. Next we find a shortest path joining $x$ and $y$ in the network $G$. We traverse the path starting with the node $x$ and whenever we encounter a consecutive pair of edges $(u, v)$ and $(v, w)$ along the path, with probability $p$ we replace one of the edges (either $(u, v)$ or $(v, w)$) with the edge $(u, w)$. The decision which one of the edges is discarded is made with probability 1/2. Now suppose WLOG that $e^- = (u, v)$ and $e^+ = (u, w)$ (recall that $e^-$ denotes an edge that has just been removed while $e^+$ denotes the one that has been added). Once we select the collection of the shortest paths $\Gamma$, notice that for every path $\gamma \in N(e^-, e^+)$ joining a pair of nodes, say $x$ and $y$, the path $\gamma'$ in the new network obtained upon removing the edge $e^-$ and adding the edge $e^+$, constructed by replacing the edge $e^-$ with the consecutive pair of edges $\{v, w\}$ and $e^+$ (or $e^+$ and $\{w, v\}$ depending on the order) is longer by exactly one edge. Thus, even if $\gamma'$ is a shortest path between $x$ and $y$ in the new network, the shortest distance between $x$ and $y$ has been increased by exactly one edge. In this case, we let the new set of shortest paths $\Gamma'$ contain the path $\gamma'$. Likewise, the paths which involve both edges, $e^-$ and $\{v, w\}$, are shortened by exactly one edge in the network $G'$. Again, we let $\Gamma'$ contain these paths as well (those where the edges $e^-$ and $\{v, w\}$ are replaced by the single edge $e^+$). The figures show all the possible situations: Figure 1 shows the shortest paths between the corresponding pairs of purple nodes. The case when the path remains unchanged is shown in Figure 1(a), the path which shall be shortened is displayed in Figure 1(b) and the path which shall become longer appears in Figure 1(c). The shortest paths in the modified graph appear in Figure 2 for all the corresponding cases.

To shorten the notation, let $e' = \{v, w\}$. Since every edge $e \in E \setminus \{e^-, e^+, e'\}$ is either an edge of $\gamma \in \Gamma$ or an edge of the corresponding $\gamma' \in \Gamma'$, where $' : \Gamma \to \Gamma'$ is a bijection defined as:

$$\gamma' = \begin{cases} \gamma & \text{if } \gamma \text{ does not pass through the edge } e^- \\ (\gamma \setminus \{e^-\}) \cup \{e^+, e'\} & \text{if } \gamma \text{ passes through } e^- \text{ and does not pass through } e' \\ (\gamma \setminus \{e^-, e'\}) \cup \{e^+\} & \text{if both } e^- \text{ and } e' \text{ are the edges of } \gamma \end{cases}$$

which occurs with the same probability as $\gamma$ with respect to the choice $\Gamma'$[1], it follows that for every subset $K \subseteq \Gamma$ and for every edge $e \in E \setminus \{e^-, e^+, e'\}$ we have $P_K(e) = P_{K'}(e)$ (where $K'$ is the image of $K$ under the map $'$). According to Proposition 5.1, we can now write:

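$$\begin{aligned}
E(D) - E(D') &= \sum_{e \in E \setminus \{e^-\}} P_{N(e^-, e^+)}(e) - \sum_{e \in E \setminus \{e^-\}} P_{N'(e^-, e^+)}(e) + P_\Gamma(e^-) - P_{\Gamma'}(e^+) \\
&= \sum_{e \in E \setminus \{e^-, e'\}} P_{N(e^-, e^+)}(e) - \sum_{e \in E \setminus \{e^-, e'\}} P_{N'(e^-, e^+)}(e) + P_{N(e^-, e^+)}(e') - P_{N'(e^-, e^+)}(e') + P_\Gamma(e^-) - P_{\Gamma'}(e^+) \\
&= P_{N(e^-, e^+)}(e') - P_{N'(e^-, e^+)}(e') + P_\Gamma(e^-) - P_{\Gamma'}(e^+).
\end{aligned}$$

In summary, we have:

$$E(D) - E(D') = P_{N(e^-, e^+)}(e') - P_{N'(e^-, e^+)}(e') + P_\Gamma(e^-) - P_{\Gamma'}(e^+).$$

Figure 1. Shortest path fragments before the iteration (panels (a), (b) and (c); nodes u, v, w; edges e+ and e–)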

We now proceed to analyze the differences $P_{N(e^-, e^+)}(e') - P_{N'(e^-, e^+)}(e')$ and $P_\Gamma(e^-) - P_{\Gamma'}(e^+)$ more explicitly. First note that we can write $N(e^-, e^+) = L(e^-, e^+) \cup S(e^-, e^+)$ where $L(e^-, e^+)$ is the set of those paths which have been made longer, and $S(e^-, e^+)$ is the set of those paths which have been shortened upon removal of $e^-$ and insertion of $e^+$, respectively. Notice that the edge $e'$ does not occur in a path which has been lengthened by removal of $e^-$ (according to the previous discussion, all such paths must pass through $e^-$ and not through $e'$) so that $P_{L(e^-, e^+)}(e') = 0$. Likewise, $P_{S(e^-, e^+)}(e')$ is the probability that $e'$ is the edge of a path that has been shortened, i.e. a path which goes through both $e^-$ and $e'$. This is simply the probability that $e^-$ and $e'$ occur jointly. We shall denote this probability by $P(e^- \wedge e')$. We now deduce that:

$$P_{N(e^-, e^+)}(e') = P_{L(e^-, e^+)}(e') + P_{S(e^-, e^+)}(e') = 0 + P(e^- \wedge e') = P(e^- \wedge e').$$

In general, we shall adopt the following notation: $P(e_1 \wedge e_2)$ is the probability that the edges $e_1$ and $e_2$ are encountered jointly, while $P(e_1 \wedge \overline{e_2})$ shall denote the probability that $e_1$ occurs and $e_2$ does not. Similarly, we deduce that:

$$P_{N'(e^-, e^+)}(e') = P(e' \wedge e^+) = P(e^- \wedge \overline{e'}).$$

(Here the reader would do well to refer back to the picture above.) Finally, observe that whenever a path $\gamma \in \Gamma$ involves $e^-$, the corresponding path $\gamma' \in \Gamma'$ involves $e^+$ (regardless of whether it is shortened or lengthened). It follows now that $P_\Gamma(e^-) = P_{\Gamma'}(e^+)$ so that $P_\Gamma(e^-) - P_{\Gamma'}(e^+) = 0$ and we finally conclude that:

$$\begin{aligned}
E(D) - E(D') &= P_{N(e^-, e^+)}(e') - P_{N'(e^-, e^+)}(e') + P_\Gamma(e^-) - P_{\Gamma'}(e^+) \\
&= P(e^- \wedge e') - P(e^- \wedge \overline{e'}).
\end{aligned}$$

Figure 2. Shortest path fragments after the iteration (panels (a), (b) and (c); nodes u, v, w; edges e+ and e–)

We summarize this final observation below.

Theorem 5.2. Let $G = (V, E)$ denote a network. Let $u, v, w \in V$ and let $e^- = \{u, v\}$, $e' = \{v, w\}$. Suppose also that $e^+ = \{u, w\} \notin E$. Assume we remove the edge $e^-$ from $E$ and add a new edge $e^+$ to $E$. This gives us a new network $G' = (V, (E \cup \{e^+\}) \setminus \{e^-\})$. Then, we have:

$$E(D) - E(D') = P(e^- \wedge e') - P(e^- \wedge \overline{e'})$$

where $P(e_1 \wedge e_2)$ is the probability that the edges $e_1$ and $e_2$ are encountered jointly, while $P(e_1 \wedge \overline{e_2})$ denotes the probability that $e_1$ occurs and $e_2$ does not.
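The formula of Theorem 5.2 can be checked by hand on a small tree, where shortest paths are unique and every path through $e^-$ is either lengthened or shortened by exactly one edge. The sketch below is a sanity check under assumed inputs, not part of the original analysis: the five-node tree, the distribution `mu`, and the helper names are hypothetical.

```python
from collections import deque


def tree_path(adj, a, b):
    """The unique path between a and b in a tree, as a list of nodes (BFS)."""
    parent = {a: None}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path, node = [], b
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def expected_length(adj, mu):
    """E(D) with respect to mu."""
    return sum(p * (len(tree_path(adj, x, y)) - 1) for (x, y), p in mu.items())


def joint_probabilities(adj, mu, e_minus, e_prime):
    """Return P(e- and e') and P(e- and not e') over the paths sampled by mu."""
    both = minus_only = 0.0
    for (x, y), prob in mu.items():
        path = tree_path(adj, x, y)
        edges = {frozenset(pair) for pair in zip(path, path[1:])}
        if e_minus in edges:
            if e_prime in edges:
                both += prob
            else:
                minus_only += prob
    return both, minus_only


u, v, w = 1, 2, 3
# Tree with edges {0,1}, {1,2}, {2,3}, {2,4}; e- = {1,2}, e' = {2,3}.
before = {0: {1}, 1: {0, 2}, 2: {1, 3, 4}, 3: {2}, 4: {2}}
# The same tree after removing e- = {1,2} and adding e+ = {1,3}.
after = {0: {1}, 1: {0, 3}, 2: {3, 4}, 3: {1, 2}, 4: {2}}
mu = {(0, 3): 0.25, (0, 4): 0.25, (1, 2): 0.25, (3, 4): 0.25}

both, minus_only = joint_probabilities(
    before, mu, frozenset((u, v)), frozenset((v, w))
)
lhs = expected_length(before, mu) - expected_length(after, mu)
rhs = both - minus_only
print(lhs, rhs)  # both equal -0.25: this particular move makes matters worse
```

Here only the pair (0, 3) uses both $e^-$ and $e'$, while the pairs (0, 4) and (1, 2) use $e^-$ alone, so the shortcut lengthens more traffic than it shortens.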

6. Empirical results with scale free networks
Preferential attachment networks, a particular type of scale-free networks, introduced in Watts (2004) and Barabasi (2001), model a variety of phenomena ranging from the WWW to social interaction networks. The diameter of such networks grows logarithmically with the number of nodes. However, in some cases, we may be interested in optimizing communication with respect to a particular probability distribution on the pairs of nodes, and the network may be rewired to decrease the average communication time even further. We have therefore conducted experiments with such networks with between 50 and 200 nodes. We seek to simulate the situation that would occur when two nodes frequently share information, and yet are currently relatively far apart in the network. We hope our adaptive algorithm will reconfigure the network so as to bring them closer together. Consequently, for each network, we selected a subset of 10 percent of the nodes with probability inversely proportional to the node’s degree (i.e. the nodes which have smaller degree are more likely to get selected). Such a subset is likely to be fairly widely spread throughout the network and will tend not to include major hubs which are already well-connected.

The selected nodes were partitioned into two sets, $A$ and $B$, of equal sizes and the probability distribution $\mu$ was defined which samples every pair from the set $A \times B$ with equal probability. Thus, nodes from group $A$ always want to communicate with nodes from group $B$. The adaptive algorithm should move these two subsets closer together in the network as it reconfigures the network topology.
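One possible way to set up such an experiment is sketched below. The text does not say how the weighted selection was carried out, so the sequential draw without replacement, the shuffle before splitting, and the names `select_groups` and `uniform_mu` are all assumptions made for illustration; the degree table is a made-up stand-in for a real preferential attachment network.

```python
import random
from itertools import product


def select_groups(degrees, fraction, rng):
    """Pick about `fraction` of the nodes, favouring low degree, and split in two."""
    k = max(2, int(round(fraction * len(degrees))))
    k -= k % 2                      # even size so that A and B are equal
    candidates = list(degrees)
    chosen = []
    for _ in range(k):
        # weight inversely proportional to degree (degrees assumed >= 1)
        weights = [1.0 / degrees[v] for v in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        candidates.remove(pick)     # sample without replacement
        chosen.append(pick)
    rng.shuffle(chosen)             # avoid biasing A towards early picks
    half = k // 2
    return chosen[:half], chosen[half:]


def uniform_mu(group_a, group_b):
    """mu that samples every pair in A x B with equal probability."""
    prob = 1.0 / (len(group_a) * len(group_b))
    return {(a, b): prob for a, b in product(group_a, group_b)}


rng = random.Random(1)
degrees = {node: 1 + node % 5 for node in range(60)}  # hypothetical degrees
group_a, group_b = select_groups(degrees, 0.10, rng)
mu = uniform_mu(group_a, group_b)
print(len(group_a), len(group_b), len(mu))  # 3 3 9
```

With 60 nodes and a 10 percent fraction, six nodes are selected, giving two groups of three and nine equally likely communicating pairs.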

The expected path length with respect to the probability distribution m describedabove has been estimated by performing 50 independent samplings of pairs of nodeswith respect to m. Afterwards, 300 independent iterations of the re-routing algorithmwere performed. A single iteration of the algorithm picks a pair of nodes !x; y" [ V 2 atrandom with respect to the distribution m (notice that our distribution m isconcentrated on the pairs in A £ B only so that with probability 1 we choose a pair inA £ B # V 2). Now we use Dijkstra’s algorithm to find the shortest path between x andy in the original network. Once a shortest path has been selected, we traverse the pathfrom one end to another, replacing a consecutive pair of edges by a single edge joiningnon-common nodes and deleting one of the intermediate edges (see the previous sectionfor more details on the analysis of this algorithm) with some probability p. Threeindependent experiments have been run with the values of p # 0.2, 0.5 and 0.8. Again,

Localsearch

strategies

253

upon the completion of 300 such iterations, we estimate the average path length byperforming 50 independent samplings of pairs in V 2 with respect to m. The plots of theaverage shortest path length vs the total number of nodes in the network before andafter the re-routing algorithm have been produced. In Figures 3 and 4, the label “LBI”shows the plot of the average path length before the 300 iterations of the reroutingalgorithm have been applied, while “LAI” stands for the plot after the iterations of thererouting algorithm. All of these plots appear below: from these plots, we can see thatthe average path length after the algorithm has been run is significantly reduced.Moreover, the reduction is stronger for larger values of p. A possible explanation forthis is that a single step of the algorithm is unlikely to case to much harm. If a stepleads to an increase in the mean path length, according to Theorem 5.2, the only way

Figure 3. Average shortest path length vs the number of nodes before applying the algorithm. [Plot: path length (vertical axis) against number of nodes + 50 (horizontal axis); series L.B.I., p = 0.2, fr = 10%.] Note: Here L.B.I. is an abbreviation for "Length Before the Iterations".

Figure 4. Average shortest path length vs the number of nodes after applying the algorithm with p = 0.2. [Plot: path length (vertical axis) against number of nodes + 50 (horizontal axis); series L.A.I., p = 0.2, fr = 10%.] Note: Here L.A.I. is an abbreviation for "Length After the Iterations".


this can happen is if the probability of joint occurrence of the consecutive edges which have been altered is smaller than the probability of their separate occurrence. Then, roughly speaking, the opposite will be true in the next step of the algorithm, so that the algorithm is likely to correct itself in the subsequent step. At the same time, the algorithm modifies the network more frequently after a fixed number of iterations when $p$ is larger. However, a deeper theoretical analysis is necessary to understand which parameters are more suitable for which networks.
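The single iteration of the rerouting algorithm used in these experiments can be sketched as follows. This is only an illustrative sketch under stated assumptions: the graph is stored as a dictionary of neighbour sets, breadth-first search stands in for Dijkstra's algorithm (the two agree on unweighted graphs), and the names `shortest_path`, `reroute_step` and `edge_count` are hypothetical rather than taken from the authors' implementation.

```python
import random
from collections import deque


def shortest_path(adj, src, dst):
    """Breadth-first shortest path in an unweighted graph (list of nodes or None)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def reroute_step(adj, x, y, p, rng):
    """Traverse a shortest x-y path; with probability p replace a consecutive
    pair of edges (u, v), (v, w) by the shortcut (u, w) and delete one of the
    two old edges uniformly at random. The number of edges is preserved."""
    path = shortest_path(adj, x, y)
    if path is None:
        return
    i = 0
    while i + 2 < len(path):
        u, v, w = path[i], path[i + 1], path[i + 2]
        if w not in adj[u] and rng.random() < p:
            adj[u].add(w)
            adj[w].add(u)
            a, b = rng.choice([(u, v), (v, w)])
            adj[a].discard(b)
            adj[b].discard(a)
            del path[i + 1]  # the path now runs directly from u to w
        else:
            i += 1


def edge_count(adj):
    """Number of undirected edges in the adjacency structure."""
    return sum(len(neighbours) for neighbours in adj.values()) // 2
```

Removing either (u, v) or (v, w) after adding (u, w) leaves u, v and w mutually reachable, so in this sketch a connected network stays connected while its edge count is unchanged.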

As the number of nodes gets larger, the improvement reduces regardless of the value of $p$. This is quite easy to understand: for larger networks, the percentage of the nodes selected increases and the number of possible pairs sampled increases quadratically. The number of iterations is usually insufficient to sample all the possible pairs and, in fact, samples only a very limited number of these pairs. Completely different sets of pairs may be sampled during the mean path estimation after all the modifications are complete and before the iterations have been run (Figures 4-5).

It would seem from the empirical data that it is a good strategy to make the edge replacement probability as high as possible. Taking this to the extreme, we could set $p = 1.0$, which effectively means that every time two nodes communicate, we place an edge directly between them and delete a random edge from the original path. However, this trend in the data is probably caused by the rather special probability distribution which we used to model communications, in which all communications were restricted to being between two small subsets of the nodes. With a less strict set of communications, the extreme choice $p = 1.0$ is more likely to disrupt potentially useful pathways in the network (Figure 6).

However, this does lead to the question of the best choice of shortcut to make. A more sophisticated strategy would be to replace our strategy of removing an edge at random when making a shortcut with a simulated annealing type of heuristic. A possible approach motivated by Theorem 5.2 will be discussed in Section 7.

It is worth emphasizing that, since the problem is an NP-complete one, we do not expect to be able to find optimal network configurations efficiently. Our algorithm will only approximate the optimum. It is important to investigate exactly how good such approximations are. In fact it is possible to construct artificial networks

Figure 5. Average shortest path length vs the number of nodes after applying the algorithm with p = 0.5. [Plot: path length (vertical axis) against number of nodes + 50 (horizontal axis); series L.A.I., p = 0.5, fr = 10%.]


(e.g. star-shaped networks) in which it is very hard for our algorithm to find good solutions. In such cases, the algorithm performs rather badly.

7. Simulated annealing-like modification and expected runtime bounds for a special case

The algorithm we considered so far cycles through the Stages (1)-(4) as described in Section 2. Step (3) consists of making a shortcut, i.e. choosing a consecutive pair of edges $(u, v)$ and $(v, w)$, adding the edge $(u, w)$ and removing one of the edges: either $(u, v)$ or $(v, w)$. In the examples considered so far, the edge to be removed has been chosen uniformly at random, i.e. the probability of removal of the edge $(u, v)$ was the same as that of the removal of the edge $(v, w)$. On the other hand, Theorem 5.2 suggests that we should remove the edge which is less likely to occur separately, without the remaining edge along this path, with the aim of maximizing the expected distance improvement. Certainly these likelihoods are unknown, but it does not require much memory to keep track of the joint occurrence of the edges. This might be a way to make an educated guess as to which one of the edges $(u, v)$ or $(v, w)$ is more effective (and also potentially less harmful) to remove. While we keep track of the joint pairwise occurrence of the consecutive pairs of edges, as new edges are added and some old ones removed, we need to set the values for the joint occurrences of the newly added edge with the other edges as well as to modify the corresponding values for the edges neighboring the removed one.

To develop reasonable heuristics for this we proceed as follows. Suppose at every step of our algorithm a node $v$ keeps track of the joint frequency of occurrence of the edges $(u, v)$ and $(v, w)$ for every $u, w \in V$, where $V$ is our specified set of nodes. Suppose the edge $e^- = (v, w)$ has been removed, the edge $e^+ = (u, w)$ has been added and the edge $e^0 = (u, v)$ remains intact, as pictured in Figure 7. We need to set the joint frequency values for the edge pairs of the form $(x, u)$ with $(u, w)$, $(w, y)$ with $(u, w)$ and $(u, v)$ with $(u, w)$, as well as to reset the existing values for the pairs of the form $(x, u)$ with $(u, v)$ and $(z, v)$ with $(u, v)$. Our strategy is based on the following observations. Let $\Gamma$ denote a set of shortest paths in the original graph $G$ (before the algorithm has been applied to it) containing one shortest path for every pair of nodes. Also let $\Gamma'$ be the collection of paths in $G'$ (the graph obtained from $G$ by removing the edge $e^- = (v, w)$

Figure 6. Average shortest path length vs the number of nodes after applying the algorithm with p = 0.8. [Plot: path length (vertical axis) against number of nodes + 50 (horizontal axis); series labelled L.B.I., p = 0.8, fr = 10% in the source.]

and adding the edge $e^+ = (u, w)$) obtained from $\Gamma$ as in Section 5: if $\gamma \in \Gamma$ does not pass through the edge $e^- = (v, w)$ then we let $\gamma \in \Gamma'$. If, on the other hand, $\gamma \in \Gamma$ does pass through $e^-$ but does not pass through $(u, v)$, then we let $\gamma' \in \Gamma'$ be the path obtained from $\gamma$ by replacing the edge $e^-$ with the consecutive pair of edges $(u, v)$ and $e^+ = (u, w)$. Finally, if $\gamma \in \Gamma$ does pass through $e^-$ and also passes through $(u, v)$, then we let $\gamma' \in \Gamma'$ be the path obtained from $\gamma$ by replacing the consecutive pair of edges $e^-$ and $(u, v)$ with the single edge $e^+$. Notice that in general $\Gamma'$ may not consist of the shortest paths in $G'$, yet they will provide us with the worst case scenario that is usually not too far from the truth. As in Section 5, we denote by $P_\Gamma(e \cap h)$ the joint frequency of occurrence of edges $e$ and $h$ and by $P_\Gamma(e \cap \bar{h})$ the frequency of occurrence of $e$ without $h$ with respect to the collection $\Gamma$ of paths. We now observe the following.

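Proposition 7.1. Given $G$, $G'$, $\Gamma$ and $\Gamma'$ as above, we have the following identities:

$$P_{\Gamma'}((v, z) \cap (u, v)) = P_{\Gamma}((v, z) \cap (u, v)) + P_{\Gamma}((v, z) \cap e^-)$$

$$P_{\Gamma}((x, u) \cap (u, v)) = P_{\Gamma'}((x, u) \cap (u, v)) + P_{\Gamma'}((x, u) \cap e^+)$$

$$P_{\Gamma'}((w, y) \cap e^+) = P_{\Gamma}((w, y) \cap e^-)$$

$$P_{\Gamma'}((u, v) \cap e^+) = P_{\Gamma}(e^-)$$

$$P_{\Gamma'}((x, u) \cap (u, v)) = P_{\Gamma}((x, u) \cap (u, v) \cap \overline{e^-})$$

$$P_{\Gamma'}((x, u) \cap e^+) = P_{\Gamma}((x, u) \cap (u, v) \cap e^-)$$

where:

$$P_{\Gamma}(e^-) = \sum_{h \in E(G) \text{ adjacent to } e^-} P_{\Gamma}(e^- \cap h)$$

is the frequency of occurrence of the edge $e^-$, $P_{\Gamma}((x, u) \cap (u, v) \cap e^-)$ denotes the frequency of the joint occurrence of the three consecutive edges $(x, u)$, $(u, v)$ and $e^-$, and $P_{\Gamma}((x, u) \cap (u, v) \cap \overline{e^-})$ denotes the frequency of the joint occurrence of $(x, u)$ and $(u, v)$ without $e^-$.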

Proof. These identities follow directly from the definition of $\Gamma'$ in terms of $\Gamma$. For instance, to verify the first identity notice that a path in $\Gamma'$ which passes through the edges $(z, v)$ and $(v, u)$ might either be a valid path of $\Gamma$, and there are $P_{\Gamma}((v, z) \cap (u, v))$ such paths, or it may come from a path of $\Gamma$ passing through $(z, v)$ and $e^-$ (since in this case the edge $e^-$ has been replaced by the consecutive pair $(v, u)$ and $e^+$), and there are $P_{\Gamma}((v, z) \cap e^-)$ such paths. Verification of the remaining identities is analogously straightforward, hence we leave this for the interested reader. □

Proposition 7.1 provides us with reasonable heuristics to update the values of the form $P_{\Gamma'}((v, z) \cap (u, v))$, $P_{\Gamma'}((w, y) \cap e^+)$ and $P_{\Gamma'}((u, v) \cap e^+)$. On the other hand, it only tells us the sum of the values $P_{\Gamma'}((x, u) \cap (u, v))$ and $P_{\Gamma'}((x, u) \cap e^+)$. To set these values, we need one more simple equation involving them as unknowns. Unfortunately, it seems impossible to derive any rigorous equation of this type, but we can still invent a reasonable strategy based on the observation that $P_{\Gamma'}((x, u) \cap e^+) = P_{\Gamma}((x, u) \cap (u, v) \cap e^-)$ (see the last equation of Proposition 7.1). In fact, the edges $(x, u)$ and $e^+$ occur jointly in $\Gamma'$ if and only if the edge $(x, u)$ occurs jointly with the edges $(u, v)$ and $e^-$ in $\Gamma$. Thus, if we have no information about the conditional distribution of the joint occurrence of the edges $(u, v)$ and $e^-$ in $\Gamma$, it is reasonable to assume that $(u, v)$ and $e^-$ are as likely to occur jointly given the occurrence of $(x, u)$ as they are to occur jointly with other edges, and hence the following heuristic update rule.

Update rule for joint frequencies. We update the values $P_{\Gamma'}((x, u) \cap (u, v))$ and $P_{\Gamma'}((x, u) \cap e^+)$ based on the equation:

$$P_{\Gamma}((x, u) \cap (u, v)) = P_{\Gamma'}((x, u) \cap (u, v)) + P_{\Gamma'}((x, u) \cap e^+)$$

and the assumption that:

$$P_{\Gamma'}((x, u) \cap (u, v)) \cdot P_{\Gamma}((u, v) \cap e^-) = P_{\Gamma'}((x, u) \cap e^+) \cdot P_{\Gamma}((u, v) \cap \overline{e^-})$$

where:

$$P_{\Gamma}((u, v) \cap \overline{e^-}) = \sum_{h \in E(G) \text{ adjacent to } (u, v),\ h \neq e^-} P_{\Gamma}((u, v) \cap h)$$

is the frequency of occurrence of $(u, v)$ without $e^-$.

As a matter of fact, the heuristic update rule for joint frequencies resets the joint frequency values of the form $P_{\Gamma'}((x, u) \cap (u, v))$ and $P_{\Gamma'}((x, u) \cap e^+)$ according to the ratio of the total joint probabilities in the following sense.

Proposition 7.2. Let $\tilde{P}_{\Gamma'}((x, u) \cap (u, v))$ and $\tilde{P}_{\Gamma'}((x, u) \cap e^+)$ denote the joint probability values updated according to the update rule for joint frequencies[2]. Let $N(u) = \{j \mid (j, u) \in E(G) \text{ and } j \neq v\}$ (Figure 7). Then:

$$\frac{\tilde{P}_{\Gamma'}((x, u) \cap (u, v))}{\tilde{P}_{\Gamma'}((x, u) \cap e^+)} = \frac{\sum_{j \in N(u)} P_{\Gamma'}((j, u) \cap (u, v))}{\sum_{j \in N(u)} P_{\Gamma'}((j, u) \cap e^+)}$$

where $P_{\Gamma'}((j, u) \cap (u, v))$ and $P_{\Gamma'}((j, u) \cap e^+)$ denote the true values of the joint probabilities with respect to the collection $\Gamma'$.

Proof. By disjointness we can write:

$$P_{\Gamma}((u, v) \cap \overline{e^-}) = \sum_{j \in N(u)} P_{\Gamma}((j, u) \cap (u, v) \cap \overline{e^-}) = \sum_{j \in N(u)} P_{\Gamma'}((j, u) \cap (u, v))$$

Figure 7. "Joint frequency" modification diagram. [Diagram on the nodes x, u, v, w, y and z.]


where the last equality is one of the equations from Proposition 7.1. Likewise, we have:

$$P_{\Gamma}((u, v) \cap e^-) = \sum_{j \in N(u)} P_{\Gamma}((j, u) \cap (u, v) \cap e^-) = \sum_{j \in N(u)} P_{\Gamma'}((j, u) \cap e^+).$$

The desired conclusion now follows by substituting the above equations into the second equation of the update rule for joint frequencies. The first equation of the update rule for joint frequencies holds for the true values by Proposition 7.1. □
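The update rule for joint frequencies amounts to solving two linear equations in two unknowns: the two new values must sum to the old joint frequency of $(x, u)$ with $(u, v)$, and their ratio must equal the ratio of the frequency of $(u, v)$ without $e^-$ to the frequency of $(u, v)$ with $e^-$. A minimal sketch is given below; the function name `split_joint_frequency` and its argument names are hypothetical, and raising an error when $(u, v)$ was never observed is an assumption of this sketch, not part of the rule.

```python
def split_joint_frequency(total, with_removed, without_removed):
    """Split the old joint frequency of (x, u) with (u, v) between the pairs
    (x, u), (u, v) and (x, u), e+ in the rerouted network.

    total           -- old joint frequency of (x, u) and (u, v)
    with_removed    -- old frequency of (u, v) jointly with the removed edge e-
    without_removed -- old frequency of (u, v) without e-
    """
    denominator = with_removed + without_removed
    if denominator == 0:
        # Assumption of this sketch: with no observations of (u, v) the
        # ratio is undefined, so the caller must decide what to do.
        raise ValueError("edge (u, v) has never been observed")
    stays_on_old_pair = total * without_removed / denominator
    moves_to_new_edge = total * with_removed / denominator
    return stays_on_old_pair, moves_to_new_edge
```

For example, with an old joint frequency of 0.6, where $(u, v)$ occurred with $e^-$ at frequency 0.1 and without it at frequency 0.3, the rule assigns three quarters of the mass to the pair $(x, u)$, $(u, v)$ and one quarter to the pair $(x, u)$, $e^+$.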

Proposition 7.1 and the heuristic update rule above tell us how to keep track of the joint frequency of occurrences of consecutive pairs of edges. Next, we need to invent a strategy for selecting a consecutive pair of edges along a chosen shortest path on which to perform a modification of the network. Moreover, we also need to decide how high the probability of removal of the edge which is less likely to occur without its adjacent edge should be, depending on the joint frequency estimates we have collected.

Depending on the measure $\mu$ on $V^2$ (see Subsection 2.1 of Section 2 for the meaning of $\mu$), the network structure, how long we run the algorithm for and the specific pair of nodes being sampled at the present time, various pairs of edges along the shortest path found will be more or less effective to perform a modification on, subject to their joint frequencies of occurrence. We can heuristically judge the "efficiency" based on how often the edge to be replaced occurred jointly with the edge to remain present within the consecutive pair in question. For instance, suppose after $t$ time steps a pair of nodes $x$ and $y$ has been sampled and we found a shortest path $\gamma = u_1, u_2, u_3, \ldots, u_l$ with $u_1 = x$ and $u_l = y$ between the pair. We may now perform a step of our algorithm on any one of the $l - 2$ pairs of consecutive edges of the form $\{(u_i, u_{i+1}), (u_{i+1}, u_{i+2})\}$ and our goal is to determine the "reliability measure" of such a pair. Let $e_j = (u_j, u_{j+1})$ and let $P_t(e_i, e_{i+1})$, $P_t(e_i, \bar{e}_{i+1})$ and $P_t(e_{i+1}, \bar{e}_i)$ denote the estimated frequencies of the joint occurrence of $e_i$ and $e_{i+1}$, of the occurrence of $e_i$ without $e_{i+1}$, and of the occurrence of $e_{i+1}$ without $e_i$, respectively. According to Theorem 5.2, if we were to perform a step of our algorithm on the pair $(e_i, e_{i+1})$, we should remove the edge $e_q$ that maximizes the lower bound on the improvement of the overall expected distance $P_t(e_i, e_{i+1}) - P_t(e_q, \bar{e}_{\bar{q}})$, where $q = i$ or $i + 1$ and:

$$\bar{q} = \begin{cases} i & \text{if } q = i + 1 \\ i + 1 & \text{if } q = i \end{cases}$$

The "efficiency" of a consecutive pair of edges $(e_i, e_{i+1})$ can then be defined as $\max\{P_t(e_i, e_{i+1}) - P_t(e_q, \bar{e}_{\bar{q}}) \mid q = i, i + 1\}$. We may now decide among a number of heuristic strategies for selecting a consecutive pair of edges along the path $\gamma$ to perform the modification on, based on this efficiency measure:

• Greedy strategy. Select the pair of edges along $\gamma$ having the maximal efficiency value. More precisely, if we write $\gamma = e_1, e_2, \ldots, e_l$ then we select the consecutive pair of edges $(e_i, e_{i+1})$ for rerouting which maximizes the efficiency measure, i.e. the function $\max\{P_t(e_i, e_{i+1}) - P_t(e_q, \bar{e}_{\bar{q}}) \mid q = i, i + 1\}$ over its domain $\{(e_i, e_{i+1})\}_{i=1}^{l-1}$. In case this function has more than one maximum, we choose one uniformly at random among the maxima (see how $q$ and $\bar{q}$ are defined above).


• Random strategy. Ignore the efficiency measure and choose the pair of edges along $\gamma$ uniformly at random. More precisely, if we write $\gamma = e_1, e_2, \ldots, e_l$ then we just select any consecutive pair $(e_i, e_{i+1})$ of edges uniformly at random from the set $\{(e_i, e_{i+1})\}_{i=1}^{l-1}$.

• Mixed strategies. Generate a probability distribution on the collection $\{(e_i, e_{i+1})\}_{i=1}^{l-1}$ of $l - 1$ consecutive pairs of edges which favors the pairs $(e_i, e_{i+1})$ with higher efficiency values $\max\{P_t(e_i, e_{i+1}) - P_t(e_q, \bar{e}_{\bar{q}}) \mid q = i, i + 1\}$ over those with lower ones, and sample a pair to reroute with respect to this distribution.

It seems that the mixed strategies are the most reasonable ones to apply, though it is not trivial to decide on the appropriate probability distribution.
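The three selection strategies can be sketched as follows. Each consecutive pair is described by a triple of estimated frequencies (joint occurrence, first edge without the second, second edge without the first). The function names are hypothetical, and the exponential weighting with a `temperature` parameter in `mixed_choice` is only one possible choice of distribution, assumed here for illustration; the text above leaves the distribution open.

```python
import math
import random


def efficiency(joint, first_alone, second_alone):
    """Return (efficiency, offset), where offset 0 means "remove e_i" and
    offset 1 means "remove e_{i+1}": the edge that occurs alone less often
    is the one to remove. Ties are broken towards offset 0 in this sketch."""
    if first_alone <= second_alone:
        return joint - first_alone, 0
    return joint - second_alone, 1


def greedy_choice(stats, rng):
    """Index of a pair with maximal efficiency, ties broken uniformly."""
    scores = [efficiency(*s)[0] for s in stats]
    best = max(scores)
    return rng.choice([i for i, s in enumerate(scores) if s == best])


def random_choice(stats, rng):
    """Index of a pair chosen uniformly at random, ignoring efficiency."""
    return rng.randrange(len(stats))


def mixed_choice(stats, rng, temperature=0.1):
    """Index sampled with weight exp(efficiency / temperature) (an assumed,
    simulated-annealing-like weighting that favors efficient pairs)."""
    scores = [efficiency(*s)[0] for s in stats]
    weights = [math.exp(s / temperature) for s in scores]
    return rng.choices(range(len(stats)), weights=weights)[0]
```

Lowering `temperature` pushes the mixed strategy towards the greedy one, while raising it pushes it towards the random one.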

Once the pair of edges $(e_i, e_{i+1})$ has been chosen for rerouting, it remains to decide how high the probability of the removal of the edge $e_q$ should be, depending on how high the efficiency is. Certainly, the effectiveness of all these parameters depends greatly on the probability measure $\mu$ on $V^2$ for sampling the pairs of nodes as well as, possibly, on the initial network topology and the total number of edges in the network. Theoretical analysis of such questions, as we know, is highly nontrivial. Therefore, we start with a rather simple situation where our network is a tree $T$ over the set of nodes $V = \{0, 1, \ldots, n\}$ (i.e. a connected graph with $|V| - 1 = n$ edges). The reason that this kind of example is theoretically tractable is largely due to the following facts.

Proposition 7.3. Suppose we are given a tree $T$ over the set of nodes $V$ as above. Then, the bound of Theorem 5.2 is an exact equation describing a single step improvement of the expected distance after removal of the edge $e^-$ and insertion of the edge $e^+$. Moreover, the collection of paths $\Gamma'$ as in Theorem 5.2 as well as in Proposition 7.1 is the collection of the shortest paths in the tree $T'$ obtained from $T$ after removal of the edge $e^-$ and insertion of the edge $e^+$.

Proof. This observation follows immediately from the definition of a tree: every pair of nodes in a tree has a unique path between them. Then every path is the shortest one. The bound in Theorem 5.2 applies to the difference of the expected distance values with respect to $\Gamma$ and $\Gamma'$ and may fail to be exact only when $\Gamma'$ is not the collection of shortest paths, which is not the case for trees. □

In Section 9, we will establish some rigorous bounds on the expected runtime of the greedy strategy and some mixed strategies for the case when the initial graph is a tree with $|V| - 1$ edges on the set of nodes $V = \{0, 1, \ldots, n\}$ with the probability distribution $\mu_0$ on $V^2$ defined as:

$$\mu_0(0, i) = \mu_0(i, 0) = \frac{1}{2n} \quad \text{while} \quad \mu_0(i, j) = 0 \text{ if } i \neq 0 \text{ and } j \neq 0.$$

Our theoretical analysis is made possible largely due to the fact that when we assume the measure of the type above and start with any tree $T$, the heuristic update rule for joint frequencies introduced above does not falsify the information on the joint frequencies we have gathered prior to applying a modification step of our algorithm.

Proposition 7.4. Given any tree $T$ on the set of nodes $V = \{0, 1, \ldots, n\}$, let $\Gamma(T)$ denote the set of all paths in $T$. Suppose we are given a probability measure $\mu$ on $V^2$ satisfying $\mu(x, y) > 0 \Rightarrow x = 0$ or $y = 0$. Then $\forall$ tree $T$ on the set $V$ of nodes and $\forall$ consecutive triplet of edges $(x, u)$, $(u, v)$ and $(v, w) \in E(T)$ with


$d(0, x) < d(0, u) < d(0, v)$, the equations in the update rule for joint frequencies hold true for the collection of paths:

$$\Gamma' = \Gamma\left(\pi^{(v,w)}_{(u,v),(v,w) \to (u,w)}(T)\right)$$

where $\pi^{(v,w)}_{(u,v),(v,w) \to (u,w)}(T)$ is the tree obtained from $T$ by removing the edge $(v, w)$ and inserting the edge $(u, w)$, in the sense that if $P_\Gamma(h, g)$ represents the joint probabilities of occurrence of the edges $h$ and $g$ with respect to the measure $\mu$ and the collection of paths $\Gamma$, then the values $P_{\Gamma'}(f, o)$ set according to the update rule for joint frequencies as above represent the joint probabilities of occurrence of the edges $f$ and $o$. Moreover, in case we apply the transformation $\pi^{(u,v)}_{(u,v),(v,w) \to (u,w)}$ in place of $\pi^{(v,w)}_{(u,v),(v,w) \to (u,w)}$ so that:

$$\Gamma' = \Gamma\left(\pi^{(u,v)}_{(u,v),(v,w) \to (u,w)}(T)\right)$$

then we have $P_{\Gamma'}((y, w) \cap (w, v)) = 0$ while $P_{\Gamma'}((y, w) \cap (u, w)) = P_{\Gamma}((y, w) \cap (u, v))$, and this is exactly what the update rule for joint frequencies tells us.

Proof. The first assertion follows immediately from Proposition 7.2 by observing that each of the summands, apart from possibly $P_{\Gamma'}((j, u) \cap (u, v))$ for $j = x$ in the numerator and $P_{\Gamma'}((j, u) \cap e^+)$ for $j = x$ in the denominator, vanishes (this is because $(x, u)$ is the only edge involved in any path from the root node passing through $u$, since our graph is a tree), so that the ratio of the updated values is the same as that of the exact ones.

The second assertion follows from the fact that in any tree rooted at 0, if $d(0, x) \leq d(0, u) \leq d(0, v)$ then every path picked with nonzero probability (the only such paths initiate at the root node 0) which passes through the edge $(v, j)$ for:

$$j \in N(v) = \left\{ z \mid (v, z) \in E\left(\pi^{(u,v)}_{(u,v),(v,w) \to (u,w)}(T)\right) \right\}$$

must also pass through the edge:

$$e^+ \in E\left(\pi^{(u,v)}_{(u,v),(v,w) \to (u,w)}(T)\right).$$

When we apply Proposition 7.2, all of the summands in the numerator vanish. In particular, the joint probability $P_{\Gamma'}((y, w) \cap (w, v)) = 0$. □

In the remaining two sections we will establish some runtime bounds for our algorithm to reach the optimizing topology when we start with the measure $\mu_0$ described in the paragraph above the statement of Proposition 7.4. The optimizing topology for this kind of measure is evidently the star centered at 0, i.e. a tree with edges of the form $(0, i)$ for $1 \leq i \leq n$. In Section 9, we will establish upper bounds of orders $n^2$ and $n \ln n$ depending on the strategy we choose. Needless to say, these bounds are much better than the exponential bounds on the expected run time obtained for a very closely related algorithm on the same problem in Jansen and Theile (2007). Although the particular measure we consider is not likely to arise in practice, it may serve as a starting point for some more practical measures that are likely to occur when dealing with preferential attachment networks. Owing to the power law degree distribution in a preferential attachment network, there is a rather small number of hub nodes that are likely to be chosen jointly with other nodes in the network and so the


measure $\mu$ in such cases resembles to a large extent our measure $\mu_0$, where a single node (rather than a small group of nodes) is selected jointly with other nodes in the network. Owing to the mathematical difficulty of questions related to estimating expected runtime, we have to consider simplified cases first and attempt to extend the results later.

The primary mathematical tool we exploit in the current work is the so-called "drift analysis" method. Drift analysis has been successfully applied in He and Yao (2004) to introduce complexity classes for evolutionary algorithms based on the expected runtime to reach a population containing the optimum solution. A more advanced and much more detailed exposition of the drift analysis techniques appears in Hajek (1982), and many other relevant facts in Syski (1992). In the next section, we introduce and extend the drift analysis tools presented without proofs in He and Yao (2004).

8. The drift analysis method

To obtain the expected runtime bounds for the special case of rooted trees, we will exploit the following "drift analysis" lemma, which is stated without a proof in He and Yao (2004). As the proof is not particularly complicated, we present it in the current paper and we will also extend the lemma slightly to allow further improvements for our particular application. We now proceed to set the stage for the lemma.

Definition 2. Let $(X, \{p_{x \to y}\}_{x, y \in X})$ denote a Markov chain with finite state space $X$ and transition probabilities $p_{x \to y}$ for $x$ and $y \in X$. Let $A \subseteq X$. A distance function $D$ on $X$ with respect to $A$ is any function $D : X \to [0, \infty)$ with the property that $D(x) = 0$ if and only if $x \in A$. Let $\{X_t\}_{t=0}^{\infty}$ denote the stochastic process associated with the Markov chain $X$. We are interested in the following waiting time random variable:

$$T(\omega \mid X_0 = \xi_0) = \min\{t \mid X_t(\omega) \in A\}$$

under the assumption that $X_0(\omega) = \xi_0$ with probability 1 (i.e. the chain starts at a specified $\xi_0 \in X$).

We are now ready to establish the lemma from He and Yao (2004).

Lemma 8.1. Suppose we are given a Markov chain $(X, \{p_{x \to y}\}_{x, y \in X})$, a subset $A \subseteq X$ and a distance function $D : X \to [0, \infty)$ as described in Definition 2. Suppose also $\exists$ a constant $\lambda \in (0, \infty)$ such that $\forall x \in A^c$ (here $A^c$ denotes the complement of $A$ in $X$) we have $D(x) - \sum_{y \in X} p_{x \to y} D(y) \geq \lambda$. Then:

$$E(T(\omega \mid X_0 = \xi_0)) \leq \frac{D(\xi_0)}{\lambda}.$$

Likewise, if $\exists$ a constant $M \in (0, \infty)$ such that $\forall x \in A^c$ we have $D(x) - \sum_{y \in X} p_{x \to y} D(y) \leq M$, then:

$$E(T(\omega \mid X_0 = \xi_0)) \geq \frac{D(\xi_0)}{M}.$$

Proof. We prove only the first assertion, where we assume that $\exists$ a constant $\lambda \in (0, \infty)$ such that $\forall x \in A^c$ (here $A^c$ denotes the complement of $A$ in $X$) we have $D(x) - \sum_{y \in X} p_{x \to y} D(y) \geq \lambda$. The proof of the second assertion is entirely analogous. First of all notice that without loss of generality we may assume that $A$ is an absorbing set of states, i.e. if $x \in A$ then $\forall y \in X$ we have $p_{x \to y} > 0 \Rightarrow y \in A$. (Indeed, if this is not so, consider the new Markov chain with state space $Y = X$ and:


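$$p'_{y \to z} = \begin{cases} p_{y \to z} & \text{if } y \notin A \\ 0 & \text{if } y \in A \text{ and } y \neq z \\ 1 & \text{if } y \in A \text{ and } y = z \end{cases}$$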

Observe now that $T(\omega \mid X_0 = \xi_0) = T(\omega \mid Y_0 = \xi_0)$ so that estimating $T(\omega \mid Y_0 = \xi_0)$ is an equivalent problem, but the set of states $A$ is absorbing for the Markov chain $Y$.) For every $t \in \mathbb{N}$ consider the random variable $D_t(\omega) = D(x_t)$ where $\omega = \{x_t\}_{t=0}^{\infty}$ and $x_0 = \xi_0$. Now observe that:

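$$\begin{aligned}
E(D(x_t) - D(x_{t+1})) &= \sum_{x \in X} E(D(x_t) - D(x_{t+1}) \mid X_t = x) \cdot P(X_t = x) \\
&= \sum_{x \notin A} E(D(x_t) - D(x_{t+1}) \mid X_t = x) \cdot P(X_t = x) + \sum_{x \in A} E(D(x_t) - D(x_{t+1}) \mid X_t = x) \cdot P(X_t = x) \\
&= \sum_{x \notin A} E(D(x_t) - D(x_{t+1}) \mid X_t = x) \cdot P(X_t = x) + 0 \\
&\geq \min_{x \in X} \{ E(D(x_t) - D(x_{t+1}) \mid X_t = x \text{ and } x \notin A) \} \cdot P(X_t \notin A) \\
&= \min_{x \notin A} \left\{ D(x) - \sum_{y \in X} p_{x \to y} D(y) \right\} \cdot P(X_t \notin A) \\
&\geq \lambda \cdot P(T(\omega \mid X_0 = \xi_0) > t).
\end{aligned}$$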

To summarize, we have shown that $E(D(x_t) - D(x_{t+1})) \geq \lambda \cdot P(T(\omega \mid X_0 = \xi_0) > t)$. Now observe that the event $U = \{\omega = \{x_t\}_{t=0}^{\infty} \mid \exists s \text{ with } x_s \in A\}$ is a tail event and so its probability is either 0 or 1 according to Kolmogorov's zero-one law. It is easy to see from the condition $D(x) - \sum_{y \in X} p_{x \to y} D(y) \geq \lambda$ together with the finiteness of $X$ that some power of the Markov transition matrix has positive transition probabilities from any state towards $A$, which shows that $P(U) > 0$ and hence must be 1. This means that $D_s(\omega) = 0$ for some $s$ with probability 1 so that we can write:

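$$\begin{aligned}
D(\xi_0) = E(D(\xi_0)) &= \lim_{s \to \infty} E(D(\xi_0) - D_s(\omega)) = \lim_{s \to \infty} \sum_{t=0}^{s} E(D(x_t) - D(x_{t+1})) \\
&\geq \sum_{t=0}^{\infty} \lambda \cdot P(T(\omega \mid X_0 = \xi_0) > t) = \lambda \cdot \sum_{t=0}^{\infty} P(T(\omega \mid X_0 = \xi_0) > t) \\
&= \lambda \cdot E(T(\omega \mid X_0 = \xi_0)),
\end{aligned}$$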


where we used the fact that for a nonnegative integer-valued random variable $Z$ we have:

$$E(Z) = \sum_{t=0}^{\infty} P(Z > t).$$

Thereby, we have shown that $D(\xi_0) \geq \lambda \cdot E(T(\omega \mid X_0 = \xi_0))$ and the desired conclusion now follows at once. The proof of the second assertion can be repeated verbatim, replacing the $\geq$ signs with $\leq$, min with max and $\lambda$ with $M$. □
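Lemma 8.1 can be checked numerically on a small example. The sketch below builds a hypothetical birth-death chain on the states $0, \ldots, N$ with the absorbing target set $A = \{0\}$ and the distance function $D(x) = x$, computes the expected hitting times by iterating the first-step equations, and compares them with the two bounds $D(\xi_0)/M$ and $D(\xi_0)/\lambda$. The chain, its transition probabilities and all function names are assumptions chosen purely for illustration.

```python
def build_chain(n_states, down=0.5, up=0.2):
    """Transition matrix on states 0..n_states; state 0 is absorbing."""
    size = n_states + 1
    P = [[0.0] * size for _ in range(size)]
    P[0][0] = 1.0
    for x in range(1, size):
        P[x][x - 1] = down
        if x < n_states:
            P[x][x + 1] = up
            P[x][x] = 1.0 - down - up
        else:
            P[x][x] = 1.0 - down  # the top state cannot move further up
    return P


def drift(P, D, x):
    """Expected one-step decrease of the distance from state x."""
    return D[x] - sum(P[x][y] * D[y] for y in range(len(P)))


def expected_hitting_times(P, target, sweeps=20000):
    """Iterate h(x) = 1 + sum_y P[x][y] * h(y) with h = 0 on the target set;
    starting from h = 0 the iterates increase towards the hitting times."""
    size = len(P)
    h = [0.0] * size
    for _ in range(sweeps):
        h = [0.0 if x in target
             else 1.0 + sum(P[x][y] * h[y] for y in range(size))
             for x in range(size)]
    return h
```

In this example the drift equals 0.3 at every interior state and 0.5 at the top state, so the lemma brackets the expected hitting time from state $x$ between $x / 0.5$ and $x / 0.3$.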

Corollary 3. Suppose we are given a Markov chain $(X, \{p_{x \to y}\}_{x, y \in X})$, a subset $A \subseteq X$ and a distance function $D : X \to [0, \infty)$ as described in Definition 2. Suppose also $\exists$ a constant $K \in (0, \infty)$ such that $\forall x \in A^c$ (here $A^c$ denotes the complement of $A$ in $X$), we have $D(x) - \sum_{y \in X} p_{x \to y} D(y) = K$. Then:

$$E(T(\omega \mid X_0 = \xi_0)) = \frac{D(\xi_0)}{K}.$$

It may happen in practice that the lower bound $\lambda$ changes over time. We now extend Lemma 8.1 in a simple fashion to take this into account. This will allow us to improve an upper bound on the expected running time by a linear factor.

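Lemma 8.2. Suppose we are given a Markov chain $(X, \{p_{x \to y}\}_{x, y \in X})$, a subset $A \subseteq X$ and a distance function $D : X \to [0, \infty)$ as described in Definition 2. Suppose also that for every integer $k \in \mathbb{N} \cup \{0\}$ $\exists$ a constant $\lambda_k \in (0, \infty)$ such that $\forall x \in A^c$ with $\lceil D(x) \rceil \geq k$ (here $A^c$ denotes the complement of $A$ in $X$), we have $D(x) - \sum_{y \in X} p_{x \to y} D(y) \geq \lambda_k$. Then:

$$E(T(\omega \mid X_0 = \xi_0)) \leq \sum_{k=1}^{\lceil D(\xi_0) \rceil} \frac{1}{\lambda_k}.$$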

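Proof. We proceed by induction on $\lceil D(\xi_0) \rceil$. When $\lceil D(\xi_0) \rceil = 1$ we have $D(\xi_0) \leq 1$ and $\forall x \in A^c$ we have $\lceil D(x) \rceil \geq 1$, so that $\forall x \in A^c$ we have $D(x) - \sum_{y \in X} p_{x \to y} D(y) \geq \lambda_1$. Lemma 8.1 applies now, telling us that:

$$E(T(\omega \mid X_0 = \xi_0)) \leq \frac{D(\xi_0)}{\lambda_1} \leq \frac{1}{\lambda_1}$$

and establishes the base case. Now suppose the statement is true for $\lceil D(\xi_0) \rceil \leq m$ for some $m \geq 1$. Let $\lceil D(\xi_0) \rceil = m + 1$. Let $B = \{x \mid x \in X, \lceil D(x) \rceil \leq m\}$. Define a new distance function $V(x) = \max\{0, D(x) - m\}$. Clearly $V$ is a distance function with respect to $B$ in accordance with Definition 2. Notice also that $\lceil V(\xi_0) \rceil = \lceil D(\xi_0) \rceil - m = 1$. Let $T_B(\omega \mid X_0 = \xi_0) = \min\{t \mid X_t(\omega) \in B\}$. Notice that $\forall x \in X$ we have:

$$\begin{aligned}
V(x) - \sum_{y \in X} p_{x \to y} V(y) &= D(x) - m - \sum_{y \in X} p_{x \to y} \max\{D(y) - m, 0\} \\
&\geq D(x) - m - \sum_{y \in X} p_{x \to y} (D(y) - m) \\
&= D(x) - m - \sum_{y \in X} p_{x \to y} D(y) + m = D(x) - \sum_{y \in X} p_{x \to y} D(y).
\end{aligned}$$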


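In particular, since $\lceil V(x) \rceil = q$ if and only if $\lceil D(x) \rceil = q + m$, we conclude that $\forall x \in B^c$ with $\lceil V(x) \rceil \geq k$ we have $V(x) - \sum_{y \in X} p_{x \to y} V(y) \geq \lambda_{k+m}$. By inductive hypothesis we now deduce that:

$$E(T_B(\omega \mid X_0 = \xi_0)) \leq \frac{1}{\lambda_{m+1}}.$$

Clearly:

$$T(\omega \mid X_0 = \xi_0) = T_B(\omega \mid X_0 = \xi_0) + (T(\omega \mid X_0 = \xi_0) - T_B(\omega \mid X_0 = \xi_0)).$$

Evidently, $A \subseteq B$ so that by the Markov property:

$$E(T(\omega \mid X_0 = \xi_0) - T_B(\omega \mid X_0 = \xi_0)) \leq \max_{z \in B} E(T(\omega \mid X_0 = z)) \leq \max_{z \in B} \sum_{k=1}^{\lceil D(z) \rceil} \frac{1}{\lambda_k} = \sum_{k=1}^{\max_{z \in B} \lceil D(z) \rceil} \frac{1}{\lambda_k} \leq \sum_{k=1}^{m} \frac{1}{\lambda_k}$$

by inductive hypothesis. Thus, it follows that:

$$E(T(\omega \mid X_0 = \xi_0)) = E(T_B(\omega \mid X_0 = \xi_0)) + E(T(\omega \mid X_0 = \xi_0) - T_B(\omega \mid X_0 = \xi_0)) \leq \frac{1}{\lambda_{m+1}} + \sum_{k=1}^{m} \frac{1}{\lambda_k} = \sum_{k=1}^{m+1} \frac{1}{\lambda_k}$$

so that the bound is valid for $\lceil D(\xi_0) \rceil = m + 1$. The desired conclusion now follows by the principle of induction. □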
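To see how much the level-dependent bound of Lemma 8.2 can gain over the constant-drift bound of Lemma 8.1, the two can be compared on a purely hypothetical drift profile. The sketch below assumes, for illustration only, that the drift lower bound at distance level $k$ is $\lambda_k = k/n$; this profile is not derived from the algorithm analysed in this paper. Exact rational arithmetic is used so the sums can be compared without rounding.

```python
from fractions import Fraction


def constant_drift_bound(distance, lam):
    """Upper bound D / lambda of Lemma 8.1 with one global drift bound."""
    return Fraction(distance) / lam


def variable_drift_bound(distance, lam_of_level):
    """Upper bound sum_{k=1}^{D} 1 / lambda_k of Lemma 8.2."""
    return sum(1 / lam_of_level(k) for k in range(1, distance + 1))


def hypothetical_profile(n):
    """Assumed drift profile lambda_k = k / n (illustrative only)."""
    return lambda k: Fraction(k, n)
```

With this assumed profile and initial distance $n$, the global bound uses the smallest drift $\lambda_1 = 1/n$ and gives $n^2$, whereas the level-dependent bound gives $n(1 + 1/2 + \cdots + 1/n)$, which grows like $n \ln n$.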

In rarer applications, one of which will be presented in the next section, we may obtain a lower bound on the expected waiting time by exploiting the following fact.

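Corollary 4. Suppose we are given a Markov chain $(X, \{p_{x \to y}\}_{x, y \in X})$, a subset $A \subseteq X$ and an integer valued distance function $D : X \to \mathbb{N} \cup \{0\}$ as described in Definition 2. Suppose further that $\forall x \in X$ we have $p_{x \to y} \neq 0 \Rightarrow D(y) = D(x)$ or $D(y) = D(x) - 1$. Suppose further that $\forall n \in \mathbb{N}$ $\exists$ a constant $K_n$ such that $\forall x \in X$ with $D(x) = n$ we have $D(x) - \sum_{y \in X} p_{x \to y} D(y) \leq K_n$. Then, $\forall \xi_0 \in X$ we have:

$$E(T(\omega \mid X_0 = \xi_0)) \geq \sum_{i=1}^{D(\xi_0)} \frac{1}{K_i}.$$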

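Proof. Let $B(i) = \{x \mid x \in X \text{ and } D(x) = i\}$. Given any $\xi_0 \in X$ we have $\xi_0 \in B(D(\xi_0))$ and $T_{B(D(\xi_0) - 1)}(\omega \mid X_0 = \xi_0) \leq T_{B(D(\xi_0) - 2)}(\omega \mid X_0 = \xi_0) \leq \cdots \leq T(\omega \mid X_0 = \xi_0)$ almost surely (due to the assumption that the distance can be decreased only step-by-step), where $T_U(\omega \mid X_0 = \xi_0) = \min\{t \mid x_t \in U\}$ is the waiting time to enter the subset $U \subseteq X$ for the first time starting at the state $\xi_0$. We can then write:

$$\begin{aligned}
T(\omega \mid X_0 = \xi_0) = {} & (T(\omega \mid X_0 = \xi_0) - T_{B(1)}(\omega \mid X_0 = \xi_0)) + (T_{B(1)}(\omega \mid X_0 = \xi_0) - T_{B(2)}(\omega \mid X_0 = \xi_0)) + \cdots \\
& + (T_{B(D(\xi_0) - 2)}(\omega \mid X_0 = \xi_0) - T_{B(D(\xi_0) - 1)}(\omega \mid X_0 = \xi_0)) + T_{B(D(\xi_0) - 1)}(\omega \mid X_0 = \xi_0).
\end{aligned}$$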


By linearity of the expectation it then suffices to estimate the expectation of each summand separately. A straightforward shift in the distance function, completely analogous to that in the proof of Lemma 8.2, followed by application of Lemma 8.1 and the Markov property, shows that $E(T_{B(D(\xi_0) - 1)}(\omega \mid X_0 = \xi_0)) \geq 1 / K_{D(\xi_0)}$ while:

$$E(T_{B(D(\xi_0) - i)}(\omega \mid X_0 = \xi_0) - T_{B(D(\xi_0) - i + 1)}(\omega \mid X_0 = \xi_0)) \geq \frac{1}{K_{D(\xi_0) - i + 1}}$$

and the desired conclusion now follows. □

When applying the above tools in practice we have to cope with estimating the "expected single step distance improvement" of our Markov chain. This distance improvement is of the form $E_{step}(D(x)) = D(x) - \sum_{y \in X} p_{x \to y} D(y)$. As was pointed out in He and Yao (2004)[3], for practical purposes it is often convenient to decompose the above expression into "positive" and "negative" parts as follows:

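$$\begin{aligned}
E_{step}(D(x)) &= D(x) - \sum_{y \in X} p_{x \to y} D(y) \\
&= D(x) - \sum_{D(y) = D(x)} p_{x \to y} D(y) - \sum_{D(y) < D(x)} p_{x \to y} D(y) - \sum_{D(y) > D(x)} p_{x \to y} D(y) \\
&= \sum_{D(y) < D(x)} p_{x \to y} (D(x) - D(y)) - \sum_{D(y) > D(x)} p_{x \to y} (D(y) - D(x)).
\end{aligned}$$

To summarize, we can write:

$$E_{step}(D(x)) = E^{+}(D(x)) - E^{-}(D(x)) \qquad (1)$$

where:

$$E^{+}(D(x)) = \sum_{D(y) < D(x)} p_{x \to y} (D(x) - D(y))$$

and:

$$E^{-}(D(x)) = \sum_{D(y) > D(x)} p_{x \to y} (D(y) - D(x)). \qquad (2)$$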
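The decomposition in equations (1) and (2) is easy to evaluate for a single state once its outgoing transition probabilities are known. The sketch below represents one row of the transition matrix as a dictionary mapping successor states to probabilities; the function names are hypothetical and chosen only for this illustration.

```python
def step_drift(row, D, x):
    """E_step(D(x)) = D(x) - sum_y p_{x->y} D(y) for the outgoing row of x."""
    return D(x) - sum(p * D(y) for y, p in row.items())


def positive_part(row, D, x):
    """E^+(D(x)): expected decrease of the distance over improving moves."""
    return sum(p * (D(x) - D(y)) for y, p in row.items() if D(y) < D(x))


def negative_part(row, D, x):
    """E^-(D(x)): expected increase of the distance over worsening moves."""
    return sum(p * (D(y) - D(x)) for y, p in row.items() if D(y) > D(x))
```

For a state $x = 2$ with $D(x) = x$ that moves to 1 with probability 0.5, to 3 with probability 0.2 and stays put otherwise, the positive part is 0.5, the negative part is 0.2 and their difference 0.3 agrees with the direct computation of the drift.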

Apart from the drift analysis tools presented so far, we will make use of the following well-known inequality called the "Chernoff bound" (see, for instance, Section 4.1 of Motwani and Raghavan, 1995).

Theorem 8.3 (Multiplicative Chernoff bound). Let $X_1, X_2, \ldots, X_k$ be independent Bernoulli trials with $\Pr(X_i = 1) = p_i$ and $\Pr(X_i = 0) = 1 - p_i$. Then if $X = \sum_{i=1}^{k} X_i$ and if $\mu$ is $E(X)$, for any $\delta \in (0, 1]$ we have:

$$\Pr(X < (1 - \delta)\mu) < e^{-\mu \delta^2 / 2}.$$

By letting $p_i = p$ $\forall i$ in the statement of Theorem 8.3 and observing that the sum of independent and identically distributed Bernoulli trials with probability $p$ each is


distributed binomially (with probability of success $p$, probability of failure $q$ and mean $kp$), we immediately deduce the following.

Corollary 5. Let $X$ denote a binomially distributed random variable over $k$ trials with probability of success $p$ and probability of failure $q = 1 - p$. Then for any $\delta \in (0, 1]$:

$$\Pr(X < (1 - \delta)pk) < e^{-kp\delta^2 / 2}.$$

9. Theoretical runtime bounds for a special case

We now return to the setting of Section 7, where we introduced several simulated-annealing-like modifications of our algorithm. Our goal in the current section is to establish upper bounds on the expected run time when we start with the set of nodes $V = \{0, 1, 2, \ldots, n\}$, the measure $\mu_0$ on $V^2$ defined in Section 7 as:

$$\mu_0(0, i) = \mu_0(i, 0) = \frac{1}{2n} \quad \text{while} \quad \mu_0(i, j) = 0 \text{ if } i \neq 0 \text{ and } j \neq 0$$

and any connected graph having $n$ edges on $V$ (i.e. a tree on $V$) to reach the optimum network topology, which is evidently a "star" tree centered at 0 (i.e. the tree with edges of the form $(0, i)$ for the nonzero nodes $i \in V$). We will assume throughout that our algorithm makes only informative changes in the following sense: suppose a pair of nodes $(0, y) \in V^2$ is sampled with respect to $\mu_0$ (only such pairs are sampled with nontrivial probability according to our assumption). Upon updating the joint frequencies of the edges at the nodes involved in the path, we need to select a pair of edges to modify. Once we select a consecutive pair of edges $(u, v)$ and $(v, w)$ along the path, there may be a tie, i.e. it may happen that the frequencies of separate occurrence of the edges coincide:

$$P((u, v), \overline{(v, w)}) = P((v, w), \overline{(u, v)}).$$

If such a thing happens, it is not possible to decide which edge removal will be detrimental. This implies, particularly in our case when we sample pairs of nodes with respect to the measure $\mu_0$, that we have not sampled enough (otherwise, by Propositions 7.3 and 7.4, the edge further from the root never occurs without the edge closer to the root in a path joining the root with any ancestor of the furthest node involved in the pair of edges). This motivates the following definition.

Definition 6. We say that a consecutive pair of edges ((u, v), (v, w)) is informative if:

$$P((u, v), (v, w)) \neq P((v, w), (u, v)).$$

For simplicity of theoretical analysis we will also assume that unless there is an informative pair of consecutive edges (in the sense of Definition 6) within the path joining the sampled pair of nodes $(0, y) \in V^2$, we perform no change at all. So far, we do not assume anything about which pair of consecutive edges we select to perform rerouting as long as the pair is informative. The next lemma tells us that we do not need to wait too long to obtain informative pairs with high probability.

Lemma 9.1. Suppose we have already done sampling $2n + i$ times with respect to the measure $m_0$ (i.e. we implemented $2n + i$ steps of our algorithm already). Let (0, y) be any other pair sampled at the $(2n + i)$th step of the algorithm for $i > 1$. Then the probability that the consecutive pair of edges (0, u) and (u, v) along the path joining 0 and y is informative is bounded below by $1 - e^{-1/4}$.

Proof. The total number of times that the node u has been sampled during the first 2n steps of the algorithm is distributed binomially with success probability 1/n. The mean of this binomial distribution is then $2n(1/n) = 2$. According to Corollary 5, the probability that the node u has been sampled jointly with the node 0 with respect to $m_0$ fewer than $1 = (1 - (1/2)) \cdot 2$ times during the first 2n time steps is less than:

$$e^{-\frac{2n \cdot \frac{1}{n} \cdot (1/2)^2}{2}} = e^{-1/4}.$$

On the other hand, the event that u has been sampled at least once is contained in the event that the pair of edges (0, u) and (u, v) is informative in the sense of Definition 6. Indeed, if u has been sampled then the frequency of occurrence of the edge (0, u) separately from the edge (u, v) is at least 1 while the edge (u, v) could never have occurred without the edge (0, u) in the process of sampling due to Propositions 7.3 and 7.4. The desired lower bound now follows at once by estimating the probability of the complement of the event that u has not been sampled. ∎

Next we aim to apply the drift analysis method to estimate the time it takes for our algorithm to reach the star-tree topology. We assume the worst case scenario that during the first 2n steps no favorable alterations have been made. The state space X of the Markov chain under consideration consists of all possible trees having n edges on the set V of nodes. The desirable set A of states consists of only a single absorbing state, namely the star-tree centered at 0. The transition probability $p_{x \to y}$ for trees x and y is the probability that the tree y is obtained from the tree x upon completion of a single step of our algorithm. We introduce the following distance function D on the set of all trees with n edges on V (the state space of our Markov chain) that satisfies Definition 2: given a tree $t \in X$, let:

$$D(t) = \frac{\sum_{i=1}^{n} d(0, i)}{n} - 1$$

where d denotes the number of edges in the path joining 0 and i in t[4].

We assume that the algorithm already ran for 2n time steps and estimate the maximum time we have to wait starting with some tree obtained upon completion of 2n time steps. To apply the drift analysis technique with the aim of obtaining upper bounds on the expected waiting time to reach the star-tree, we need to estimate:

$$E_{\mathrm{step}}(D(x)) = E^{+}(D(x)) - E^{-}(D(x))$$

from below (see equations (1) and (2)). Assuming that the only change taking place is a favorable one, i.e. that we remove the edge which is less likely to occur without the other one within the pair (this will always be the edge further away from the root 0 in case of a tree) simplifies our analysis in the sense that it makes $E^{-}(D(x)) = 0$ and what is left is to estimate $E^{+}(D(x)) = \sum_{D(y) < D(x)} p_{x \to y}(D(x) - D(y))$ (equation (2)) from below[5]. The following lemma will help us establish lower bounds for $E^{+}(D(x))$.

Lemma 9.2. Given a tree x on the set $V = \{0, 1, 2, \ldots, n\}$ of nodes, denote by $S(x) = \{i \mid d(0, i) > 1\}$ the set of all nodes at distance greater than 1 from the root node 0. Then:

$$E^{+}(D(x)) \geq \left(1 - e^{-1/4}\right) \frac{|S(x)|}{n^2}.$$

Proof. Given a tree x let $\mathrm{Impr}(x) = \{y \mid D(y) < D(x)\}$. First observe that $\forall y \in \mathrm{Impr}(x)$ we have $D(y) \leq D(x) - (1/n)$ so that:

$$D(x) - D(y) \geq \frac{1}{n}. \qquad (3)$$

Indeed, whenever an alteration has been performed, i.e. a pair of edges (u, v) and (v, w) has been replaced by the pair (u, v) and (u, w) with u being closer to the root than w, the distance of w as well as that of every ancestor of w (i.e. every node i such that the path joining w and i does not pass through the root 0) has been shortened by 1. It follows then that the average distance $\sum_{i=1}^{n} d(0, i)/n$ has been shortened by at least $(1 + |\mathrm{ans}(w)|)/n \geq 1/n$ where ans(w) denotes the set of ancestors of w. We then have:

$$E^{+}(D(x)) = \sum_{D(y) < D(x)} p_{x \to y}(D(x) - D(y)) \geq p_{x \to \mathrm{Impr}(x)} \cdot \frac{1}{n}$$

where $p_{x \to \mathrm{Impr}(x)} = \sum_{y \in \mathrm{Impr}(x)} p_{x \to y}$ is the probability that the distance has been improved and we have used equation (3) to estimate the difference $D(x) - D(y)$. It now only remains to bound $p_{x \to \mathrm{Impr}(x)}$ from below, and the desired conclusion follows by observing that performing an alteration (always a favorable one) is equivalent to sampling a pair of nodes of the form (0, i) with $i \in S(x)$ with respect to the measure $m_0$ and also possessing an informative pair of edges along the path joining 0 and i. Sampling a desirable pair (0, i) with $i \in S(x)$ happens with probability $|S(x)|/n$ and the conditional probability of possessing an informative pair of edges along the path joining 0 and i given that (0, i) has been sampled is bounded below by $1 - e^{-1/4}$ according to Lemma 9.1. The desired conclusion now follows at once. ∎
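The quantities appearing in Lemma 9.2 are easy to compute for a concrete tree. The sketch below is illustrative only: it assumes the tree is given as a list of undirected edges on the nodes 0, ..., n, and the names `tree_distance`, `far_nodes` and `drift_lower_bound` are hypothetical helpers for D(x), S(x) and the right-hand side of the lemma.

```python
from collections import deque
from math import exp


def distances_from_root(n, edges):
    """Breadth-first distances d(0, i) in a tree on nodes 0..n."""
    adj = {v: [] for v in range(n + 1)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def tree_distance(n, edges):
    """D(x): average distance to the root minus 1 (0 for the star tree)."""
    dist = distances_from_root(n, edges)
    return sum(dist[i] for i in range(1, n + 1)) / n - 1


def far_nodes(n, edges):
    """S(x): nodes at distance greater than 1 from the root."""
    dist = distances_from_root(n, edges)
    return {i for i in range(1, n + 1) if dist[i] > 1}


def drift_lower_bound(n, edges):
    """Right-hand side of Lemma 9.2: (1 - e^{-1/4}) |S(x)| / n^2."""
    return (1 - exp(-0.25)) * len(far_nodes(n, edges)) / n**2
```

For the path 0, 1, ..., n this gives D(x) = (n - 1)/2 and |S(x)| = n - 1, while for the star tree both D(x) and |S(x)| vanish.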

A rather cheap estimate on the expected waiting time can be obtained directly from Lemma 9.2 by observing that unless x is a star tree to begin with (i.e. $D(x) = 0$) we must have $S(x) \neq \emptyset$ so that $|S(x)| \geq 1$. Thus, we have:

$$\forall x \in X \quad E^{+}(D(x)) \geq \left(1 - e^{-1/4}\right) \frac{|S(x)|}{n^2} \geq \left(1 - e^{-1/4}\right) \frac{1}{n^2}.$$

Applying Lemma 8.1 directly leads to the following.

Corollary 7. Suppose our algorithm performs only favorable changes and only for the informative pairs of consecutive edges. Then the expected running time to reach the star-tree topology starting with a connected tree x on the set $V = \{0, 1, 2, \ldots, n\}$ of nodes satisfies:

$$E(T(x \mid X_0 = x)) \leq 2n + \frac{\sqrt[4]{e}}{\sqrt[4]{e} - 1} \cdot D(x) \cdot n^2.$$

In particular, we have $E(T(x \mid X_0 = x)) = O(D(x) \cdot n^2)$ where D(x) is considered as a function of n as well as x. In the worst case, when x is a tree rooted at 0 isomorphic to the tree with the set of edges of the form $E(x) = \{(i, i + 1), (i + 1, i) \mid 0 \leq i < n\}$, we have $D(x) = (n - 1)/2$ (the distances $1, 2, \ldots, n$ average to $(n + 1)/2$) and we obtain a worst case scenario bound of order $O(n^3)$. On the other hand, for most trees, we have $D(x) = \Theta(\ln(n))$ so that the run time bound for a majority of trees is of order $n^2 \ln(n)$.

The bound in Corollary 7 can be significantly improved by estimating the rate at which the lower bound on the distance improvement obtained in Lemma 9.2 decays as D(x) does and applying Lemma 8.2 in place of Lemma 8.1. Indeed, $D(x) = \sum_{i=1}^{n} (d(0, i)/n) - 1$ so that:

$$\sum_{i=1}^{n} d(0, i) = n \cdot (D(x) + 1). \qquad (4)$$

On the other hand, from the way S(x) is defined (see Lemma 9.2), we see that:

$$\sum_{i=1}^{n} d(0, i) = (n - |S(x)|) + \sum_{i \in S(x)} d(0, i) = n + \sum_{i \in S(x)} (d(0, i) - 1).$$

Combining this with equation (4) and subtracting n from both sides, we obtain:

$$\sum_{i \in S(x)} (d(0, i) - 1) = n \cdot D(x). \qquad (5)$$

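Let $m = \max\{d(0, i) \mid i \in S(x)\}$. If j is a node for which this maximum is achieved let $0, j_1, j_2, \ldots, j_m = j$ denote the nodes along the path joining 0 and j in the tree x. We then have $d(0, j_q) = q$ and, thereby, $j_2, j_3, \ldots, j_m \in S(x)$ so that:

$$|S(x)| \geq m - 1 \qquad (6)$$

and we can write $S(x) = \mathrm{Pred}(j) \cup \overline{\mathrm{Pred}}(j)$ where $\mathrm{Pred}(j) = \{j_i \mid 2 \leq i \leq m\}$ while $\overline{\mathrm{Pred}}(j)$ denotes the complement of $\mathrm{Pred}(j)$ in S(x). We can then write:

$$\sum_{i \in S(x)} (d(0, i) - 1) = \sum_{i \in \mathrm{Pred}(j)} (d(0, i) - 1) + \sum_{i \in \overline{\mathrm{Pred}}(j)} (d(0, i) - 1) = \sum_{k=1}^{m-1} k + \sum_{i \in \overline{\mathrm{Pred}}(j)} (d(0, i) - 1) = \frac{m^2 - m}{2} + \sum_{i \in \overline{\mathrm{Pred}}(j)} (d(0, i) - 1). \qquad (7)$$

Since m maximizes d(0, i) over $i \in S(x)$, we also have:

$$\sum_{i \in \overline{\mathrm{Pred}}(j)} (d(0, i) - 1) \leq (m - 2)|\overline{\mathrm{Pred}}(j)|$$

so that:

$$\sum_{i \in S(x)} (d(0, i) - 1) = \sum_{k=1}^{m-1} k + \sum_{i \in \overline{\mathrm{Pred}}(j)} (d(0, i) - 1) \leq \sum_{k=1}^{m-1} k + (m - 2)|\overline{\mathrm{Pred}}(j)| < \sum_{k=1}^{(m-1) + |\overline{\mathrm{Pred}}(j)|} k.$$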

From the last inequality, we see that if we wish to minimize:

$$|S(x)| = |\mathrm{Pred}(j)| + |\overline{\mathrm{Pred}}(j)| = m - 1 + |\overline{\mathrm{Pred}}(j)|$$

subject to keeping $n \cdot D(x) = \sum_{i \in S(x)} (d(0, i) - 1)$ (equation (5)) constant, we need to make m as large as possible. Thereby, an upper bound on m will give us a lower bound on |S(x)| via equation (6). Combining equation (5) with equation (7), dropping the nonnegative sum over $\overline{\mathrm{Pred}}(j)$ and multiplying both sides by 2 gives us:

$$2n \cdot D(x) \geq m^2 - m$$

which, in turn, tells us an upper bound on m:

$$0 \leq m \leq \frac{1}{2} + \sqrt{\frac{1}{4} + 2n \cdot D(x)} \qquad (8)$$

finally leading to a lower bound:

$$|S(x)| \geq \sqrt{\frac{1}{4} + 2n \cdot D(x)} - \frac{1}{2}.$$

We summarize the derivations above in the following lemma.

Lemma 9.3. Given any connected tree x on the set of nodes $V = \{0, 1, 2, \ldots, n\}$, we have:

$$|S(x)| \geq \sqrt{\frac{1}{4} + 2n \cdot D(x)} - \frac{1}{2}$$

where $D(x) = \sum_{i=1}^{n} (d(0, i)/n) - 1$ and $S(x) = \{i \mid d(0, i) > 1\}$.
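The inequality of Lemma 9.3 can be sanity-checked on randomly generated trees. The sketch below is an illustration under stated assumptions, not part of the paper: trees are drawn as random recursive trees (node i attaches to a uniformly chosen earlier node), which is only one convenient way of generating test cases, and the helper names are hypothetical. The quantity $n \cdot D(x)$ is computed through equation (5) so that it stays an integer.

```python
import random
from math import sqrt


def random_tree_depths(n, rng):
    """Depths d(0, i), i = 1..n, of a random recursive tree on {0, ..., n}."""
    depth = [0] * (n + 1)
    for i in range(1, n + 1):
        parent = rng.randrange(i)  # uniform over the nodes 0..i-1
        depth[i] = depth[parent] + 1
    return depth[1:]


def lemma_9_3_sides(depths):
    """Return (|S(x)|, sqrt(1/4 + 2 n D(x)) - 1/2) for the given depths."""
    far = [d for d in depths if d > 1]
    n_times_d = sum(d - 1 for d in far)  # equation (5): equals n * D(x)
    return len(far), sqrt(0.25 + 2 * n_times_d) - 0.5


def check_lemma_9_3(n, trials, seed=0):
    """True if the inequality held on every sampled tree."""
    rng = random.Random(seed)
    for _ in range(trials):
        size, bound = lemma_9_3_sides(random_tree_depths(n, rng))
        if size < bound - 1e-9:  # small tolerance for rounding
            return False
    return True
```

The path tree attains the bound with equality, which is consistent with the observation above that |S(x)| is minimized by making m as large as possible.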

Substituting the inequality of Lemma 9.3 into that of Lemma 9.2 readily tells us that:

$$E^{+}(D(x)) \geq \left(1 - e^{-1/4}\right) \frac{1}{n^2} \cdot \left(\sqrt{\frac{1}{4} + 2n \cdot D(x)} - \frac{1}{2}\right). \qquad (9)$$

We are now in a position to apply Lemma 8.2 for the case when our algorithm performs only favorable changes and only for the informative pairs of consecutive edges to deduce that:

$$E(T(x \mid X_0 = x)) \leq 2n + \frac{\sqrt[4]{e}}{\sqrt[4]{e} - 1} \sum_{k=1}^{\lceil D(x) \rceil} n^2 \left(\sqrt{\frac{1}{4} + 2n \cdot D(x)} - \frac{1}{2}\right)^{-1}. \qquad (10)$$

To investigate the asymptotic behavior of equation (10) as $n \to \infty$, observe that for sufficiently large n:

$$\sqrt{\frac{1}{4} + 2n \cdot D(x)} - \frac{1}{2} \geq \sqrt{n \cdot D(x)} = \sqrt{n} \sqrt{D(x)}$$

and substituting this bound into equation (10) entails:

$$E(T(x \mid X_0 = x)) = O\left(n\sqrt{n} \sum_{k=1}^{\lceil D(x) \rceil} \frac{1}{\sqrt{D(x)}}\right).$$

Integral test for series yields:

$$\sum_{k=1}^{\lceil D(x) \rceil} \frac{1}{\sqrt{D(x)}} = \Theta\left(\sqrt{D(x)}\right)$$

and we finally obtain:

$$E(T(x \mid X_0 = x)) = O\left(n\sqrt{n} \sqrt{D(x)}\right).$$

For most trees $D(x) = \Theta(\ln(n))$, although in the worst case scenario it may happen that $D(x) = \Theta(n)$. We are now ready to summarize these results.

Theorem 9.4. Suppose our algorithm performs only favorable changes and only for the informative pairs of consecutive edges. Then the expected running time to reach the star-tree topology starting with a connected tree x on the set $V = \{0, 1, 2, \ldots, n\}$ of nodes satisfies:

$$E(T(x \mid X_0 = x)) \leq 2n + \frac{\sqrt[4]{e}}{\sqrt[4]{e} - 1} \sum_{k=1}^{\lceil D(x) \rceil} n^2 \left(\sqrt{\frac{1}{4} + 2n \cdot D(x)} - \frac{1}{2}\right)^{-1}.$$

Asymptotically:

$$E(T(x \mid X_0 = x)) = O\left(n\sqrt{n} \sqrt{D(x)}\right)$$

and, in particular, in the worst case scenario when $D(x) = \Theta(n)$, we have:

$$E(T(x \mid X_0 = x)) = O(n^2)$$

while in the average case when $D(x) = \Theta(\ln(n))$ our bound yields:

$$E(T(x \mid X_0 = x)) = O\left(n\sqrt{n} \sqrt{\ln(n)}\right).$$
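To get a feeling for how much Theorem 9.4 improves on Corollary 7, both explicit bounds can be evaluated numerically. This is a rough sketch under stated assumptions: the function names are hypothetical, D(x) is assumed strictly positive, and the sum in the theorem is evaluated exactly as printed, i.e. as $\lceil D(x) \rceil$ copies of a summand that does not depend on k.

```python
from math import ceil, exp, sqrt

# Constant e^{1/4} / (e^{1/4} - 1) shared by both bounds.
C = exp(0.25) / (exp(0.25) - 1)


def corollary_7_bound(n, d):
    """2n + C * D(x) * n^2."""
    return 2 * n + C * d * n**2


def theorem_9_4_bound(n, d):
    """2n + C * ceil(D(x)) * n^2 / (sqrt(1/4 + 2 n D(x)) - 1/2), for D(x) > 0."""
    inner = sqrt(0.25 + 2 * n * d) - 0.5
    return 2 * n + C * ceil(d) * n**2 / inner
```

For the path on n = 100 edges, where D(x) = 49.5, the second bound is smaller than the first by roughly two orders of magnitude.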

If one intends to make the algorithm ergodic in the sense of Section 10, the probability of making a detrimental change must not be 0. It is easy to prescribe a method for deciding on the probability of making an unfavorable change so as to preserve the asymptotic bounds we obtained for the case when no such change is allowed. Indeed, suppose now we have selected an informative pair of consecutive edges and wish to assign a nonzero probability to making a detrimental step, i.e. to removing the edge that is closer to the root (as discussed before, for any informative pair of edges, the edge further from the root will never be sampled without the edge closer to the root with respect to the measure $m_0$). We denote the probability of removing the “wrong” edge (the one closer to the root) by $p_{\mathrm{bad}}$ and the probability of removing the edge further from the root by $p_{\mathrm{good}}$. Let us further assume for simplicity (although this assumption can be modified depending on one’s needs without introducing any new ideas of interest) that $p_{\mathrm{good}}$ and $p_{\mathrm{bad}}$ do not depend on the specific choice of an informative pair of edges. Given any tree x let Impr(x) denote the set of all trees obtained from the tree x by removing only the “good” edge from some informative pair while Worse(x) denotes the set of all trees obtained from x by removing the “bad” edge from some informative pair. Clearly:

$$\mathrm{Impr}(x) = \{y \mid p_{x \to y} > 0 \text{ and } D(y) < D(x)\}$$

while:

$$\mathrm{Worse}(x) \supseteq \{y \mid p_{x \to y} > 0 \text{ and } D(y) > D(x)\} \qquad (11)$$

and, consequently, we have:

$$E^{+}(D(x)) = \sum_{D(y) < D(x)} p_{x \to y}(D(x) - D(y)) \geq \sum_{y \in \mathrm{Impr}(x)} p_{x \to y} \frac{1}{n}$$

while:

$$E^{-}(D(x)) = \sum_{D(y) > D(x)} p_{x \to y}(D(y) - D(x)) \leq \sum_{y \in \mathrm{Worse}(x)} p_{x \to y} \qquad (12)$$

where we used equation (11) together with the fact that a favorable distance improvement during a single time step can be no worse than 1/n (see the proof of Lemma 9.2) while the worsening of the distance during a single time step is less than 1 (since at every step a single node can increase its contribution to D by no more than 1/n and there are fewer than n nodes that can be affected).

Now, we construct the following bijection between sets:

$$f : \mathrm{Impr}(x) \to \mathrm{Worse}(x).$$

$\forall t \in \mathrm{Impr}(x)$ let $(e', e^{-})$ denote the pair of consecutive edges such that t is obtained from x by removing $e^{-}$ and inserting the “shortcut” edge in the usual manner. We then let f(t) be the tree obtained from x by removing the edge $e'$ in place of $e^{-}$. The map $f^{-1} : \mathrm{Worse}(x) \to \mathrm{Impr}(x)$ is defined entirely analogously so that f is, indeed, a bijection. Let now $p_t$ denote the probability that the consecutive pair of edges $(e', e^{-})$ has been chosen to perform modification. Then we have $p_{x \to t} = p_t \cdot p_{\mathrm{good}}$ while $p_{x \to f(t)} = p_t \cdot p_{\mathrm{bad}}$. Inequality (12) can now be rewritten using the bijection f to obtain:

$$E_{\mathrm{step}}(D(x)) = E^{+}(D(x)) - E^{-}(D(x)) \geq \sum_{y \in \mathrm{Impr}(x)} p_{x \to y} \frac{1}{n} - \sum_{y \in \mathrm{Worse}(x)} p_{x \to y} = \sum_{y \in \mathrm{Impr}(x)} p_{x \to y} \frac{1}{n} - \sum_{y \in \mathrm{Impr}(x)} p_{x \to f(y)} = \sum_{t \in \mathrm{Impr}(x)} p_t \left(p_{\mathrm{good}} \cdot \frac{1}{n} - p_{\mathrm{bad}}\right).$$

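Summarizing, we obtained:

$$E_{\mathrm{step}}(D(x)) \geq \sum_{t \in \mathrm{Impr}(x)} p_t \left(p_{\mathrm{good}} \cdot \frac{1}{n} - p_{\mathrm{bad}}\right). \qquad (13)$$

Recall from the proof of Lemma 9.2 that:

$$\sum_{t \in \mathrm{Impr}(x)} p_t \geq \frac{\sqrt[4]{e} - 1}{\sqrt[4]{e}} \cdot \frac{|S(x)|}{n}$$

so that if we assume that:

$$p_{\mathrm{bad}} \leq \frac{p_{\mathrm{good}} - a}{n}$$

for some constant $a > 0$ independent of any random choices we made, then each summand in equation (13) is at least $p_t \cdot a/n$ and we conclude that:

$$E_{\mathrm{step}}(D(x)) \geq \frac{\sqrt[4]{e} - 1}{\sqrt[4]{e}} \cdot \frac{|S(x)|}{n^2} \cdot a. \qquad (14)$$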

Notice the similarity between equation (14) expressing the rate of distance decrease and the one in Lemma 9.2. These differ only by a multiplicative factor of a. Thus, when applying Lemma 8.2, the rate appears in the denominator, scaling every one of the summands by a multiplicative factor of 1/a and yielding all the same bounds up to the factor of 1/a. We restate the main result for the reader’s convenience.

Theorem 9.5. Suppose our algorithm performs changes only for the informative pairs of edges and that $\exists$ a constant $a > 0$ such that:

$$p_{\mathrm{bad}} \leq \frac{p_{\mathrm{good}} - a}{n}$$

where $p_{\mathrm{good}}$ is the probability of removing the edge that is estimated to occur less frequently without the other edge within the pair while $p_{\mathrm{bad}}$ is the probability of removing the edge that is estimated to occur more frequently without the other edge within the pair. Then the expected running time to reach the star-tree topology starting with a connected tree x on the set $V = \{0, 1, 2, \ldots, n\}$ of nodes satisfies:

$$E(T(x \mid X_0 = x)) \leq 2n + \frac{\sqrt[4]{e}}{a(\sqrt[4]{e} - 1)} \cdot \sum_{k=1}^{\lceil D(x) \rceil} n^2 \left(\sqrt{\frac{1}{4} + 2n \cdot D(x)} - \frac{1}{2}\right)^{-1}.$$

Asymptotically:

$$E(T(x \mid X_0 = x)) = O\left(n\sqrt{n} \sqrt{D(x)}\right)$$

and, in particular, in the worst case scenario when $D(x) = \Theta(n)$, we have:

$$E(T(x \mid X_0 = x)) = O(n^2)$$

while in the average case when $D(x) = \Theta(\ln(n))$ our bound yields:

$$E(T(x \mid X_0 = x)) = O\left(n\sqrt{n} \sqrt{\ln(n)}\right).$$

The results established so far in this section concern the upper bounds on the expected runtime. We now establish a lower bound on the expected runtime. It is worth noting that we do not assume any specific probability distribution along the path joining the chosen pair of nodes for the selection of a consecutive pair of edges. Thereby, our results apply even for the most greedy strategy imaginable where we deterministically perform an alteration at the consecutive pair of edges closest to the root[6]. For the purpose of obtaining a lower bound on the expected runtime we consider a new distance function $\tilde{D}$ on the set of trees over the nodes $V = \{0, 1, 2, \ldots, n\}$ defined as $\tilde{D}(t) = n - \deg_t(0)$ where $\deg_t(0)$ denotes the degree of the root node 0 in the tree t. Clearly, $\tilde{D}(t) = n - \deg_t(0) = 0$ if and only if $\deg_t(0) = n$ if and only if t is a star tree centered at 0. Thus, $\tilde{D}$ is, indeed, a distance function for the set A of trees consisting of a singleton star tree centered at 0 in the sense of Definition 2. Moreover, notice that at every step of our algorithm the degree of node 0 can be either increased by 1 or it may remain unchanged (depending on whether the consecutive pair of edges at the root node has been chosen for alteration and whether the edge further from the root has been removed). Thus, we have $p_{x \to y} > 0 \Rightarrow \tilde{D}(y) = \tilde{D}(x)$ or $\tilde{D}(y) = \tilde{D}(x) - 1$ so that the distance function $\tilde{D}$ satisfies the condition of Corollary 4. To apply Corollary 4, we now only need to provide an upper bound on the expected distance improvement which is:

$$E_{\mathrm{step}}(\tilde{D}(x)) = E^{+}(\tilde{D}(x)) - E^{-}(\tilde{D}(x)) = E^{+}(\tilde{D}(x)) = \sum_{\tilde{D}(y) < \tilde{D}(x)} p_{x \to y}(\tilde{D}(x) - \tilde{D}(y)) = \sum_{\tilde{D}(y) < \tilde{D}(x)} p_{x \to y}(\tilde{D}(x) - (\tilde{D}(x) - 1)) = \sum_{\tilde{D}(y) < \tilde{D}(x)} p_{x \to y} = p_{x \to \mathrm{Impr}(x)}$$

where:

$$\mathrm{Impr}(x) = \{y \mid \tilde{D}(y) = \tilde{D}(x) - 1 \text{ and } p_{x \to y} > 0\}.$$

To estimate $p_{x \to \mathrm{Impr}(x)}$ from above, observe that in order to obtain a tree y with $\deg_y(0) = \deg_x(0) + 1$, we need to sample any pair of the form (0, i) with i not being a neighbor of 0 with respect to m and the probability of this event is:

$$\frac{n - \deg_x(0)}{n} = \frac{\tilde{D}(x)}{n}$$

so that we now have:

$$E_{\mathrm{step}}(\tilde{D}(x)) = p_{x \to \mathrm{Impr}(x)} \leq \frac{\tilde{D}(x)}{n}$$

and Corollary 4 finally gives us:

$$E(T(x \mid X_0 = x)) \geq \sum_{i=1}^{\tilde{D}(x)} \frac{n}{i} = n \sum_{i=1}^{\tilde{D}(x)} \frac{1}{i} \approx n \ln \tilde{D}(x). \qquad (15)$$

We now summarize our lower bound result.

Theorem 9.6. Suppose we are given any algorithm which selects a consecutive pair of edges, removes an intermediate one and inserts a “shortcut” edge instead as we have always considered up to now. Then, we always have:

$$E(T(x \mid X_0 = x)) \geq \sum_{i=1}^{\tilde{D}(x)} \frac{n}{i} = n \sum_{i=1}^{\tilde{D}(x)} \frac{1}{i} \approx n \ln \tilde{D}(x).$$

The largest value $\tilde{D}(x)$ can have is $n - 1$ which then shows the runtime bound of order:

$$E(T(x \mid X_0 = x)) = \Omega(n \ln n).$$
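The lower bound of Theorem 9.6 is n times a harmonic number and is straightforward to evaluate. The following small sketch (with the hypothetical name `lower_bound`, and with $\tilde{D}(x)$ passed in as an integer) computes the exact sum so that it can be compared with the approximation $n \ln \tilde{D}(x)$.

```python
from math import log


def lower_bound(n, d_tilde):
    """Exact value of sum_{i=1}^{d_tilde} n / i, i.e. n times a harmonic number."""
    return n * sum(1 / i for i in range(1, d_tilde + 1))


def log_approximation(n, d_tilde):
    """The approximation n * ln(d_tilde) used in equation (15), for d_tilde >= 1."""
    return n * log(d_tilde)
```

Since the harmonic number always exceeds the logarithm, the exact sum is never smaller than the logarithmic approximation.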

10. Ergodicity of the algorithm

So far, we analyzed the expected runtime bounds for the measure m which selects only a central node in the network jointly with some other node with positive probability. We now turn our attention to the situation when every pair of nodes is likely to be sampled with positive probability. In this case, we show that our algorithm is ergodic in the sense that there is a positive probability of obtaining any specified graph G′ from another graph G upon the application of finitely many steps of our algorithm. This basically amounts to saying that given any two connected graphs G and G′ on n nodes and k edges, there is a way to obtain G′ from G by performing the steps of our algorithm.

10.1 Proving the ergodicity

Before stating the formal theorem we want to establish, it is convenient to introduce the following group action on the set $\mathcal{G}(S, k)$ of connected graphs on the specified set S of n nodes and having a specified number k of edges.

Definition 8. For every triple of nodes $\{i, j, k\} \subseteq S$, we introduce the following permutations on the set $\mathcal{G}(S, k)$:

$$p^{(a,b)}_{(a,b),(b,c) \to (a,c)}(G) = \begin{cases} G' = (S, (E - \{(a, b)\}) \cup \{(a, c)\}) & \text{if } (a, b) \text{ and } (b, c) \in E \text{ and } (a, c) \notin E \\ 0 & \text{otherwise} \end{cases}$$

where $\{a, b, c\} = \{i, j, k\}$ and $G = (S, E) \in \mathcal{G}(S, k)$.

We shall also denote by A the subgroup of permutations on $\mathcal{G}(S, k)$ generated by the set D of all possible permutations of the form described above.

Applying a step of our algorithm to a graph G can then be described as selecting a permutation from $D \cup \{I\}$ with some probability (which does depend on G) and applying it to G. Notice that such a probability distribution over $D \cup \{I\}$ can always be chosen so that every one of the elements in $D \cup \{I\}$ is chosen with positive probability as long as the distribution m on $V^2$ assigns a positive probability to every pair of nodes. It follows that a graph G′ can be obtained from a given graph $G \in \mathcal{G}(S, k)$ upon completion of finitely many steps of our algorithm if and only if there exists an element $a \in A$ (the group generated by D) such that $a \cdot G = G'$ where the action $\cdot$ is the usual function evaluation (which is the action induced by the generators). In other words, G′ can be obtained from G upon completion of finitely many steps of our algorithm if and only if G′ and G are in the same orbit under the action of the group A. In the language of group actions, saying that every $G' \in \mathcal{G}(S, k)$ can be obtained from G upon completion of finitely many steps of our algorithm amounts to saying that the action of A on $\mathcal{G}(S, k)$ is transitive (there is only one orbit under this action).
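A single rerouting move of the kind described in Definition 8 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: graphs are represented as frozensets of undirected edges, the names `edge` and `rewire` are hypothetical, and a graph to which the move does not apply is simply returned unchanged.

```python
def edge(u, v):
    """Undirected edge {u, v} as a hashable object."""
    return frozenset((u, v))


def rewire(edges, a, b, c):
    """Remove {a, b} and insert the shortcut {a, c}.

    The move applies only if {a, b} and {b, c} are edges and {a, c} is not;
    otherwise the edge set is returned unchanged (an assumption of this sketch).
    """
    if edge(a, b) in edges and edge(b, c) in edges and edge(a, c) not in edges:
        return (edges - {edge(a, b)}) | {edge(a, c)}
    return edges
```

A move that was actually performed can be undone by `rewire(new_edges, a, c, b)`, which illustrates the invertibility of the mutation step. The number of edges never changes, and connectivity is preserved because a remains joined to b through c.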

Definition 9. We shall write $G \sim G'$ if and only if G and G′ are in the same orbit under the action of A.

It is well-known from group theory (and is very easy to show as well) that $\sim$ is an equivalence relation. Our goal in the current section is to establish the following result.

Theorem 10.1. The group action of A on $\mathcal{G}(S, k)$ is transitive, or, equivalently, $\forall G$ and $G' \in \mathcal{G}(S, k)$ we have $G \sim G'$.

Theorem 10.1 is nontrivial and we shall break down the proof into several simple lemmas. The major steps in the proof are to show first that every graph in $\mathcal{G}(S, k)$ is equivalent under $\sim$ to a generalized star centered at a specified node (the notion of the generalized star will be introduced below) and then to show that any two generalized stars are equivalent. This will then imply Theorem 10.1 since $\sim$ is an equivalence relation.

Definition 10. A generalized star is a graph $G \in \mathcal{G}(S, k)$ with the property that $\exists o \in S$ such that $\forall v \in S$ with $v \neq o$ we have $\{o, v\} \in E(G)$. We will say that the node o is a center of the generalized star G[7].

We now carry out the first major step of the proof, i.e. we show that any graph in $\mathcal{G}(S, k)$ is equivalent under $\sim$ to a generalized star with a specified center.

Lemma 10.2. Suppose we are given a graph $G \in \mathcal{G}(S, k)$ and a node $o \in S$. Then G is equivalent to a generalized star with a center o (see Definition 10).

Proof. Let:

$$t(H) = \sum_{u \in S, u \neq o} d_H(o, u)$$

where $d_H(o, u)$ denotes the shortest distance between o and u in the graph H. Notice that $\forall H \in \mathcal{G}(S, k)$, we have $t(H) \geq |S| - 1$ and $t(H) = |S| - 1 \Leftrightarrow H$ is a generalized star. (Indeed, since $\forall u \in S$ with $u \neq o$ we have $d_H(o, u) \geq 1$, it follows that $t(H) = \sum_{u \in S, u \neq o} d_H(o, u) \geq (|S| - 1) \cdot 1 = |S| - 1$. On the other hand, H is a generalized star if and only if $\forall u \in S$ with $u \neq o$ we have $\{o, u\} \in E(H)$ if and only if $\forall u \in S$ with $u \neq o$ we have $d_H(o, u) = 1$ if and only if $t(H) = \sum_{u \in S, u \neq o} d_H(o, u) = \sum_{u \in S, u \neq o} 1 = |S| - 1$.) Now, let $m(G) = \min\{t(H) \mid H \sim G\}$. Clearly $m(G) \geq |S| - 1$. We will argue that $m(G) = |S| - 1$ thereby implying the desired conclusion via the least natural number principle (equivalent to induction). Indeed, suppose $m(G) > |S| - 1$. Consider the graph H where the minimum is achieved (i.e. $t(H) = m(G)$ and $H \sim G$). Then $\exists u \in S$ with $d_H(o, u) > 1$ (otherwise $\sum_{u \in S, u \neq o} d_H(o, u) = \sum_{u \in S, u \neq o} 1 = |S| - 1$ contrary to the assumption). Now choose any shortest path $p = o, u_1, u_2, \ldots, u_{l-2}, u_{l-1}, u_l = v$ of maximal length $l = \max\{d(o, u) \mid u \in S, u \neq o\}$. Notice that $\{u_{l-2}, u_l\} \notin E(H)$ (otherwise the path p could not have been the shortest path joining o and v: the path $q = o, u_1, u_2, \ldots, u_{l-2}, u_l = v$ would have been shorter). Now we consider the graph:

$$H' = p^{(u_{l-1}, u_l)}_{(u_{l-2}, u_{l-1}), (u_{l-1}, u_l) \to (u_{l-2}, u_l)}(H)$$

obtained from H by removing the edge $\{u_{l-1}, u_l\}$ and adding the edge $\{u_{l-2}, u_l\}$. We now observe that the removed edge $\{u_{l-1}, u_l\}$ has not been present in any shortest path in H connecting o with some node $u \in S - \{v\}$. (Indeed, if q is a path in H passing through and not terminating at the node $u_l$, it must have length bigger than l since p is the shortest path in H passing through $u_l$. Thereby, q cannot be a shortest path since the longest shortest path in H is of length l by assumption.) Thus, we conclude that $\forall u \in S - \{v\}$ we have $d_{H'}(o, u) \leq d_H(o, u)$ (indeed, every shortest path connecting o and u in H remains a valid path in H′ since it does not contain the removed edge $\{u_{l-1}, u_l\}$ so that it is at least as long as the shortest path in H′). On the other hand $d_{H'}(o, v) = d_H(o, v) - 1$ since the newly added edge $\{u_{l-2}, u_l\}$ shortens the path p by 1. But then we have $G \sim H'$ (since $G \sim H$ and $H \sim H'$) and also $t(H') = \sum_{u \in S, u \neq o} d_{H'}(o, u) \leq \sum_{u \in S, u \neq o} d_H(o, u) - 1 = t(H) - 1$ contrary to the assumption that $t(H) = m(G)$. Thus, we must have $m(G) = |S| - 1$ so that G is equivalent to a generalized star with a center o. ∎
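The argument in the proof of Lemma 10.2 is constructive and can be turned into a procedure that reroutes a connected graph into a generalized star. The sketch below is one possible reading of that argument, not the authors' code: graphs are sets of frozenset edges, the names `make_edges`, `bfs` and `to_generalized_star` are hypothetical, and the input graph is assumed to be connected.

```python
from collections import deque


def make_edges(pairs):
    """Build a set of undirected edges from (u, v) pairs."""
    return {frozenset(p) for p in pairs}


def bfs(nodes, edges, root):
    """Shortest distances from root and one shortest-path parent per node."""
    adj = {v: set() for v in nodes}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    dist, parent = {root: 0}, {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                parent[w] = u
                queue.append(w)
    return dist, parent


def to_generalized_star(nodes, edges, root):
    """Repeatedly shortcut the last edge of a longest shortest path from root."""
    edges = set(edges)
    steps = 0
    while True:
        dist, parent = bfs(nodes, edges, root)
        v = max(dist, key=dist.get)  # a node farthest from the root
        if dist[v] <= 1:
            return edges, steps  # every node is adjacent to the root
        mid = parent[v]
        grand = parent[mid]
        edges.remove(frozenset((mid, v)))  # drop {u_{l-1}, u_l}
        edges.add(frozenset((grand, v)))   # insert the shortcut {u_{l-2}, u_l}
        steps += 1
```

Each iteration lowers the sum of distances t(H) by one, so the loop stops after t(G) - (|S| - 1) moves with the number of edges unchanged.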

Owing to Lemma 10.2, all that remains to show to establish Theorem 10.1 is that any two generalized stars having a common center (see Definition 10) are equivalent under the relation $\sim$ of Definition 9. We accomplish this task in three steps.

Lemma 10.3. Suppose we are given a generalized star $G \in \mathcal{G}(S, k)$. Enumerate the nodes of G as $0, 1, \ldots, |S| - 1$ with 0 denoting a star center. Suppose an edge $\{i, j\} \in E(G)$ and $\{i, l\} \notin E(G)$ for i, j and $l > 0$. Then G is equivalent via $\sim$ to the generalized star $G' \in \mathcal{G}(S, k)$ determined by the set of edges $E(G') = E(G) \cup \{\{i, l\}\} - \{\{i, j\}\}$. In other words, G is equivalent to the generalized star obtained from G by removing the edge {i, j} and inserting the edge {i, l}.

Proof. First, let $G'' = p^{(i,0)}_{(i,0),(0,l) \to (i,l)}(G)$ so that $E(G'') = E(G) \cup \{\{i, l\}\} - \{\{i, 0\}\}$ (notice that G″ may not be a generalized star). Now, let $G' = p^{(i,j)}_{(i,j),(j,0) \to (i,0)}(G'')$ and notice that:

$$E(G') = E(G'') \cup \{\{0, i\}\} - \{\{i, j\}\} = (E(G) \cup \{\{i, l\}\} - \{\{i, 0\}\}) \cup \{\{0, i\}\} - \{\{i, j\}\} = E(G) \cup \{\{i, l\}\} - \{\{i, j\}\}$$

which is what we were after. ∎

Lemma 10.4. Suppose we are given a generalized star $G \in \mathcal{G}(S, k)$. Enumerate the nodes of G as $0, 1, \ldots, |S| - 1$ with 0 denoting a star center. Suppose an edge $\{i, j\} \in E(G)$ and $\{q, l\} \notin E(G)$ for i, j, q and $l > 0$. Then G is equivalent via $\sim$ to the generalized star $G' \in \mathcal{G}(S, k)$ determined by the set of edges $E(G') = E(G) \cup \{\{q, l\}\} - \{\{i, j\}\}$. In other words, G is equivalent to the generalized star obtained from G by removing the edge {i, j} and inserting the edge {q, l}.


Proof. If $q = i$ the situation is exactly that of Lemma 10.3 and so we assume that $q \neq i$. In such a case, exactly one of the following mutually exclusive situations can occur: either $\{i, q\} \in E(G)$ or $\{i, q\} \notin E(G)$:

. Case 1: $\{i, q\} \in E(G)$. In this case, according to Lemma 10.3, $G \sim G''$ with $E(G'') = E(G) \cup \{\{q, l\}\} - \{\{i, q\}\}$ and, again due to Lemma 10.3, $G'' \sim G'$ with:

$$E(G') = E(G'') \cup \{\{i, q\}\} - \{\{i, j\}\} = (E(G) \cup \{\{q, l\}\} - \{\{i, q\}\}) \cup \{\{i, q\}\} - \{\{i, j\}\} = E(G) \cup \{\{q, l\}\} - \{\{i, j\}\}.$$

. Case 2: $\{i, q\} \notin E(G)$. In this case, according to Lemma 10.3, $G \sim G''$ with $E(G'') = E(G) \cup \{\{i, q\}\} - \{\{i, j\}\}$ and, again due to Lemma 10.3, $G'' \sim G'$ with:

$$E(G') = E(G'') \cup \{\{q, l\}\} - \{\{i, q\}\} = (E(G) \cup \{\{i, q\}\} - \{\{i, j\}\}) \cup \{\{q, l\}\} - \{\{i, q\}\} = E(G) \cup \{\{q, l\}\} - \{\{i, j\}\}.$$

Thus, in any of the possible cases we deduce that $G \sim G'$ with $E(G') = E(G) \cup \{\{q, l\}\} - \{\{i, j\}\}$ so that the desired conclusion follows at once. ∎

Lemma 10.4 brings us very close to reaching our final goal. Indeed, let $0, 1, \ldots, |S| - 1$ enumerate the nodes of a generalized star graph $G \in \mathcal{G}(S, k)$ with 0 denoting the star center. Notice that G is uniquely determined by the binary sequence of length $(|S| - 1)(|S| - 2)/2$ indexed by all pairs (i, j) satisfying $1 \leq i < j \leq |S| - 1$ and having a 1 in position (i, j) if and only if $\{i, j\} \in E(G)$. Such a sequence contains exactly $k - |S| + 1$ ones and the rest are zeros. Lemma 10.4 tells us that transposing a one and a zero in this sequence results in an equivalent generalized star. It is well-known that every permutation is a product of transpositions and so we deduce the following.

Lemma 10.5. Any two generalized stars in $\mathcal{G}(S, k)$ having a common star center are equivalent via $\sim$.

In summary, Lemma 10.5 tells us that any two generalized stars with a common center are equivalent. Now, given any two graphs G and $G' \in \mathcal{G}(S, k)$, according to Lemma 10.2, $G \sim G_1$ and $G' \sim G_2$ with $G_1$ and $G_2$ being generalized stars having a common center o, and, according to the previous sentence, $G_1 \sim G_2$ so that we finally have $G \sim G_1 \sim G_2 \sim G'$ and Theorem 10.1 follows.

10.2 Potential applications of ergodicity to sampling connected graphs uniformly at random

It was briefly discussed in the introduction that an important potential application of the mutation operators introduced in the current paper is to sampling connected graphs on the specified set of nodes and having a specified number of edges uniformly at random. In fact, the local rerouting application of our algorithm (replacing one of the edges within a consecutive pair of edges with a shortcut edge) can be viewed as performing a mutation (or a unary recombination) step of an evolutionary algorithm having population size 1. Moreover, such a mutation step is invertible in the sense that if we start with a graph G, perform mutation and obtain a graph G′ then we can perform another mutation to get the graph G back from G′ (this is the reason for the symmetry property of the equivalence relation $\sim$ introduced in Subsection 10.1: see Definitions 8 and 9). The generalized Geiringer theorem (see Mitavskiy and Rowe (2006a, b, 2005) for some elegant applications to genetic programming) applies, telling us that if we choose any probability on the collection of “mutation” transformations (i.e. on the collection of permutations of the form described in Definition 8), then the stationary distribution of the corresponding Markov chain is uniform on the collection of all connected graphs with n nodes and k edges[8]. This means, in particular, that if we start with any initial graph on a specified set of n nodes having k edges and apply randomly chosen mutations sufficiently many times then we are almost equally likely to end up with any connected graph on the same set of nodes having k edges. This provides an outline for the algorithm that allows sampling connected graphs over a specified set of nodes and having a specified number of edges nearly uniformly at random. Of course, the complexity of the algorithm (the number of times one needs to apply the mutations to obtain a graph with a specified number of edges nearly uniformly at random) depends on the rate of convergence of the corresponding Markov chain. Questions of this nature require some effort to tackle. A rather extended survey of known techniques for estimating convergence rates of Markov chains presented by the top experts in the field can be found in Aldous and Fill (2002).

It should also be noted that the set of mutation transformations selected with nonzero probability can be extended at will as long as the newly added transformations are bijective, while preserving the uniformity of the corresponding stationary distribution since the conclusion of the generalized Geiringer theorem of Mitavskiy and Rowe (2006a, b) remains the same. For example, one might select any composition of several transformations described in Definition 8 with positive probability. It seems that this will speed up the convergence rate. Notice that the probability distribution on the set of these transformations does not have to be uniform. Any distribution will do as long as all these transformations are selected with nonzero probability. Extending the family of mutation transformations as well as selecting the probability distribution over this extension in an intelligent manner is a very interesting and challenging question for future work.
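The sampling outline above can be sketched as a simple random walk over connected graphs with a fixed number of edges. The code below is a minimal illustration under stated assumptions, not a tuned sampler: the triple of nodes is drawn uniformly at random (one convenient choice among many admissible distributions), the number of steps needed for near-uniformity is left open exactly as in the text, and the names `random_rewire_walk` and `is_connected` are hypothetical. At least three nodes are required.

```python
import random


def is_connected(nodes, edges):
    """Depth-first check that every node is reachable from the first one."""
    nodes = list(nodes)
    adj = {v: set() for v in nodes}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(nodes)


def random_rewire_walk(nodes, edges, steps, seed=None):
    """Apply `steps` randomly chosen rerouting moves to a connected graph."""
    rng = random.Random(seed)
    nodes = list(nodes)
    edges = set(edges)
    for _ in range(steps):
        a, b, c = rng.sample(nodes, 3)  # ordered triple of distinct nodes
        ab, bc, ac = frozenset((a, b)), frozenset((b, c)), frozenset((a, c))
        if ab in edges and bc in edges and ac not in edges:
            edges.remove(ab)  # remove {a, b} ...
            edges.add(ac)     # ... and insert the shortcut {a, c}
        # otherwise the graph is left unchanged for this step
    return edges
```

Every move keeps the edge count and the connectivity of the graph, so the walk never leaves the set of connected graphs on the given nodes with the given number of edges.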

11. ConclusionsIn the current work, we presented and analyzed a “local search” rerouting algorithm tooptimize communication in networks subject to preserving the connectivity and thetotal number of edges. Although some facts are known about similar algorithms for thecase of trees (Jansen and Theile, 2007; Lehmann and Kaufmann, 2005) not muchliterature is devoted to considering graphs with the total number of edges accedingn 2 1 where n is the number of nodes in the network. In this work, we have establisheda simple bound on the one-step improvement/worsening of the algorithm and also haveshown that the algorithm is ergodic, meaning that with positive probability it does notget stuck at a local optima. Finally, we have provided some preliminary empiricalresults for the case of scale-free networks.

Our theoretical findings regarding the expected average distance improvement suggest a simulated-annealing-like modification, which we introduce and explore in Section 7. One of the central aspects of algorithm analysis is the runtime complexity. Establishing mathematically rigorous bounds on the expected runtime to reach a desired state is usually a very challenging question, even for very simple algorithms on simple problems. Jansen and Theile (2007) established an exponential lower bound for the runtime of an algorithm very similar to the one described in the current paper, for the case when the network is a rooted tree and the only pairs of nodes ever sampled are the root node jointly with any other node in the network. The sampling probability is distributed uniformly among the pairs sampled with nonzero probability. It is quite clear that for this specific measure the optimizing topology is a “star tree”, i.e. a tree where every other node is a neighbor of the root node. In the current paper we use the drift analysis method in a manner similar to He and Yao (2004) (see also Hajek, 1982; Syski, 1992, for more detailed theory) to establish a polynomial-time upper bound of order O(n^2), where n is the number of edges in the tree, provided that appropriate simulated annealing parameters have been chosen (see Theorems 9.4 and 9.5). Finally, Theorem 9.6 shows that our upper bound is, at worst, not too far from the true asymptotics, since a lower bound for any class of algorithms we consider is of order Ω(n ln n).
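One way to build intuition for a lower bound of order n ln n in this setting is the classical coupon-collector argument. The sketch below is an illustration only and is not the proof of Theorem 9.6. It assumes, hypothetically, that each of the n (root, node) pairs has to be sampled at least once before the star tree can be reached, and that the pairs are sampled uniformly as described above. The function names are ours. Under these assumptions the expected number of samples is exactly n times the n-th harmonic number, which grows like n ln n.

```python
import random


def expected_draws(n):
    """Exact coupon-collector expectation n * H_n for n equally likely pairs."""
    return n * sum(1.0 / i for i in range(1, n + 1))


def draws_until_all_seen(n, rng):
    """Sample uniformly from n pairs until every pair has appeared once."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


def mean_draws(n, trials, seed=0):
    """Average number of draws over independent simulated runs."""
    rng = random.Random(seed)
    total = sum(draws_until_all_seen(n, rng) for _ in range(trials))
    return total / trials


# Compare the simulated average with the exact value n * H_n.
n = 50
print(mean_draws(n, trials=2000), expected_draws(n))
```

The simulated average should lie close to the exact value, and both sit well below the O(n^2) upper bound for large n, which is consistent with the remark that the two bounds are not too far apart.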

Although the specific distribution for which we establish the bounds is a rather simple one, and the theoretical assumption is that we deal only with trees, many social, biological and computer networks, such as, for instance, the preferential attachment ones, have relatively few high-degree nodes (popular for receiving various requests, i.e. being sampled jointly with another node) and a large number of low-degree nodes (unpopular for being sampled jointly with a node of the same kind). We therefore hope that the expected runtime analysis presented in the current paper provides the first steps towards analyzing one of the “real life” scenarios.

Incidentally, the ergodicity of the local mutation (rerouting) transformations established in Subsection 10.1 allows us to develop an alternative class of algorithms for sampling connected graphs over a specified set of nodes and having a specified number of edges uniformly at random. Owing to the importance of this type of question (see the discussion in the introduction), one such algorithm has been developed in Rodionov and Choo (2003). It is a rather interesting and challenging question to study and compare the efficiency of the various algorithms offered in the current work to tackle this problem, based on the Markov chain convergence rate analysis methods extensively surveyed and studied in Aldous and Fill (2002). We leave this subject for future investigation.
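The uniformity claim can be checked empirically on a very small instance. The following sketch is an illustration under assumptions, not the authors' experiment. It assumes the same reading of the rerouting move as a shortcut replacement (replace edge u-v by u-w when u-v and v-w are edges and u-w is not), proposed by drawing an ordered triple of nodes uniformly at random. The names `step` and `visit_frequencies` are hypothetical. On 4 labelled nodes there are 15 graphs with 4 edges, all of them connected, so a uniform stationary distribution would give each a long-run frequency of about 1/15.

```python
import random
from collections import Counter


def step(edges, nodes, rng):
    """Return the next edge set (a frozenset of 2-element frozensets)."""
    u, v, w = rng.sample(nodes, 3)
    uv, vw, uw = frozenset((u, v)), frozenset((v, w)), frozenset((u, w))
    if uv in edges and vw in edges and uw not in edges:
        return (edges - {uv}) | {uw}  # replace u-v by the shortcut u-w
    return edges  # proposal not applicable: stay at the current graph


def visit_frequencies(start_edges, nodes, steps, seed=0):
    """Run the chain and return the fraction of time spent in each graph."""
    rng = random.Random(seed)
    state = frozenset(frozenset(e) for e in start_edges)
    counts = Counter()
    for _ in range(steps):
        state = step(state, nodes, rng)
        counts[state] += 1
    return {graph: c / steps for graph, c in counts.items()}


nodes = [0, 1, 2, 3]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]  # start from the 4-cycle
freq = visit_frequencies(cycle, nodes, steps=300_000)
print(len(freq), min(freq.values()), max(freq.values()))
```

For larger n and k the state space cannot be enumerated in this way, which is why the convergence rate analysis mentioned above is the relevant tool for comparing such samplers.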

Notes

1. It should be noted that the choice G0 may not consist of the shortest paths; however, Proposition 4.1 applies to any choice of paths as long as the expected distance is measured with respect to that choice (the proof is exactly the same), and the conclusion of Proposition 5.1 will turn into an inequality instead of an equation.

2. In the statement of the update rule for joint frequencies, we abuse the notation when we denote the updated joint probability values as the true ones. This is just from the algorithmic point of view: when running the algorithm we do not know the true values and attempt to approximate them.

3. Be aware of the typo in their paper regarding the following notions of E $ (D(x)) and E 2 (D(x)).

4. It is straightforward to verify that D is, indeed, a distance function for the set A consisting only of a single star-tree centered at 0.

5. The assumption of positive change only will be slightly relaxed later.

6. Of course, such a greedy strategy is unlikely to be implemented in practice simply because we do not necessarily know which nodes are the hub ones.

7. It is possible for a generalized star to have more than one center, but we will be interested in considering only a specified center.

8. We do need ergodicity to reach this conclusion.

References

Aldous, D. and Fill, J. (2002), Reversible Markov Chains and Random Walks on Graphs, available at: www.stat.berkeley.edu/aldous/RWG/book.html

Anshelevich, E., Dasgupta, A., Tardos, E. and Wexler, T. (2003), “Near optimal network design with selfish agents”, Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC ’03), San Diego, CA, USA, pp. 511-20.

Bala, V. and Goyal, S. (2000), “A noncooperative model of network formation”, Econometrica, Vol. 68, pp. 1181-230.

Barabasi, A.-L. (2001), “The physics of the web”, Physics World, Vol. 14 No. 33.

Cannings, C. and Sheehan, N. (2002), “On a misconception about irreducibility of the single-site Gibbs sampler in a pedigree application”, Genetics, Vol. 162, pp. 993-6.

Chu, D. (2008), “The evolution of group-level pathogenic traits”, Journal of Theoretical Biology, Vol. 253 No. 2, pp. 355-62.

Chu, D. (2009), “Modes of evolution in a parasite-host interaction: disentangling factors determining the evolution of regulated fimbriation in E. coli”, Biosystems, Vol. 95 No. 1, pp. 67-74.

Doar, M. (1996), “A better model for generating test networks”, Proceedings of the Global Telecommunication Conference (GLOBECOM’96), London, UK, pp. 86-95.

Fabrikant, A., Luthra, A., Maneva, E. and Papadimitriou, C. (2003), “On a network creation game”, Proceedings of the 22nd Annual Symposium on Principles of Distributed Computing (PODC ’03), Boston, MA, USA, pp. 347-51.

Garey, M. and Johnson, D. (1979), Computers and Intractability: A Guide to the Theory of NP-completeness, W.H. Freeman and Company, New York, NY.

Geman, S. and Geman, D. (1984), “Stochastic relaxation, Gibbs distribution and Bayesian restoration of images”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 6, pp. 721-41.

Gomory, R. and Hu, T. (1969), “Multiterminal network flows”, SIAM Journal on Applied Mathematics, Vol. 9, pp. 551-70.

Hajek, B. (1982), “Hitting time and occupation time bounds implied by drift analysis with applications”, Advances in Applied Probability, Vol. 14, pp. 502-25.

He, J. and Yao, X. (2004), “A study of drift analysis for estimating computation time of evolutionary algorithms”, Natural Computing: An International Journal, Vol. 3 No. 1, pp. 21-35.

Hu, T. (1974), “Optimum communication spanning trees”, SIAM Journal on Computing, Vol. 3 No. 3, pp. 188-95.

Jansen, T. and Theile, M. (2007), “Stability in the self-organized evolution of networks”, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2007), ACM Press, New York, NY, pp. 931-8.

Lehmann, K. and Kaufmann, M. (2005), “Evolutionary algorithms for the self-organized evolution of networks”, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2005), Washington, DC, USA, pp. 563-70.

Mitavskiy, B. and Rowe, J. (2005), “A schema-based version of Geiringer’s theorem for nonlinear genetic programming with homologous crossover”, in Wright, A.H., Vose, M.D., de Jong, K.A. and Schmitt, L.M. (Eds), Foundations of Genetic Algorithms 8 (FOGA 2008), Lecture Notes in Computer Science, Vol. 3469, Springer, Berlin, pp. 156-75.

Mitavskiy, B. and Rowe, J. (2006a), “An extension of Geiringer theorem for a wide class of evolutionary search algorithms”, Evolutionary Computation, Vol. 14 No. 1, pp. 87-118.

Mitavskiy, B. and Rowe, J. (2006b), “Some results about the Markov chains associated to GPs and general EAs”, Theoretical Computer Science, Vol. 361 No. 1, pp. 72-110.

Motwani, R. and Raghavan, P. (1995), Randomized Algorithms, Cambridge University Press, New York, NY.

Nisan, N. (1999), “Algorithms for selfish agents: mechanism design for distributed computation”, Proceedings of the 16th Annual Symposium on Theoretical Aspects of Computer Science (STACS’99), Vol. 1563, Springer, Berlin, pp. 1-17.

Rodionov, A. and Choo, H. (2003), “On generating random network structures: connected graphs”, Computational Science (ICCS 2003), pp. 1611-3349.

Sheehan, N. and Thomas, A. (1993), “On the irreducibility of a Markov chain defined on a space of genotypic configurations by a sampling scheme”, Biometrics, Vol. 49, pp. 163-75.

Syski, R. (1992), Passage Times for Markov Chains, IOS Press, Amsterdam.

Toh, C. (1996), “Performance evaluation of crossover switch discovery algorithms for wireless ATM LANs”, Proceedings of the IEEE Conference on Computer Communications (INFOCOM’96), March, pp. 1386-7.

Watts, D. (2004), Small Worlds: The Dynamics of Networks between Order and Randomness, Princeton University Press, Princeton, NJ.

Waxman, B. (1993), “Routing of multipoint connections”, IEEE Journal on Selected Areas in Communications, Vol. 9, pp. 1617-22.

About the authors

Boris Mitavskiy earned his PhD in Mathematics in 2004 at the University of Michigan, USA. He is currently a Senior Postdoctoral Research Fellow at the A-Star Bioinformatics Institute in Singapore. His research interests are in applications of category theory, Markov chains, random walks on groups, large deviation inequalities, and other mathematical structures and concepts to the theory of evolutionary computing, the theory of small-world and scale-free networks, random graph theory, gene-regulatory networks and other complex systems dealing with artificial intelligence and genetics. Boris Mitavskiy is the corresponding author and can be contacted at: [email protected]

Jonathan Rowe is a Reader in Natural Computation at the University of Birmingham. He received his PhD in 1991 from the University of Exeter. His research interests include multi-agent systems, artificial life, various other complex adaptive systems and the theory of genetic and other evolutionary algorithms. He has helped organize a number of conferences and workshops in the field of theoretical evolutionary computing. He is also an author of numerous publications in the field.

Chris Cannings is a Professor of Mathematics in the Department of Probability and Statistics, School of Mathematics and Statistics, at the University of Sheffield. His research interests include deterministic and stochastic modeling in evolutionary biology, population, molecular and human genetics. Current projects lie within genetic epidemiology, evolutionary games, genomics and proteomics, and the theory of random graphs, combinatorics and stochastic processes. He has published over a hundred research papers. He is on the editorial boards of the Journal of Applied Probability and Advances in Applied Probability, a Member of the Scientific Advisory Board of Myriad Genetics, an Expert for INSERM, a Member of the EPSRC Peer Review Panel and of the MRC Panel of Experts.
