Failure tolerant alternate-path distance-vector routing

Jean-François Girard

School of Computer Science

McGill University, Montréal

August 1991

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES AND RESEARCH

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

© Jean-François Girard, 1991

Abstract

In the Alternate-path Distance-vector Routing approach presented by Tsuchiya [?], each node of a network keeps enough information to route messages after a failure. This thesis presents an algorithm to update this information in reaction to link failures. The proposed algorithm appears to perform its task without introducing a count-to-infinity problem. This thesis also presents a way to reduce the size of one of the most frequently sent messages. The size of this message can be reduced by up to 50% relative to its equivalent in the Tsuchiya approach. The algorithm was tested on a simulated network consisting of 98 nodes representative of the ARPANET (1983). The results obtained show that the algorithm displays reasonable performance in the context of hop count metrics.

Résumé

In the routing approach using distance vectors and alternate paths presented by Tsuchiya [?], each node of a network keeps enough information to route messages after a link failure. This thesis presents an algorithm that updates this information when a link fails. The algorithm proposed here appears to accomplish its task without introducing the count-to-infinity problem. This thesis also presents a method for reducing the length of the message most frequently used by the algorithm. The length of this message can be reduced to as little as half the length of the equivalent message in Tsuchiya's approach. The algorithm was tested by simulating a network representing 98 major nodes of the ARPANET (1983). The results obtained show that the algorithm offers reasonable performance when hop counts are used as the measure of distance between nodes.

Acknowledgements

I would like to thank my thesis supervisor, Professor Carl Tropper, for the knowledge and experience he offered as I conducted my research. He helped me to focus my work, putting it in perspective and reorienting it when I was diverging.

I would like to thank the people who proofread my thesis at its various stages: Andrea Erickson, Kristina Pitula and Louis Vroomen.

Lastly, I would like to thank God, for the possibility to write this thesis and for his help during the time when it seemed that this thesis would never end.

Contents

Abstract
Résumé
Acknowledgements

1 Introduction
    1.1 ADR in the context of other routing schemes
    1.2 Evolution of distance-vector Routing
        1.2.1 ARPANET
        1.2.2 Split-horizon
        1.2.3 Jaffe and Moss
        1.2.4 Hagouel
        1.2.5 Garcia-Luna-Aceves
        1.2.6 Hold-down

2 General concepts
    2.1 Network description
        2.1.1 First network
        2.1.2 Second network
        2.1.3 Third network
    2.2 Chronological table of events

3 Isolated link failures
    3.1 Failure on down link
    3.2 Failure on horizontal link
    3.3 Failure on up link
        3.3.1 Example of failure on one of many up links
        3.3.2 Example of failure on one of two up links
        3.3.3 Example of failure on single up link

4 Strategies
    4.1 Fighting an illusion
    4.2 Shorter jc messages
    4.3 Stopping messages
    4.4 Guiding update messages
    4.5 Improvements due to the strategies

5 Performance
    5.1 A single failure
    5.2 Mutual influence of destination
    5.3 Multiple failures
    5.4 A comparison

6 Conclusion

A Message description

B Pseudo-code

C Buffer description

D Storage space calculation
    D.1 Garcia approach
    D.2 ADR approach

Bibliography

List of Tables

2.1 Messages stored in the first network after setup
2.2 Messages stored in the second network after setup
2.3 Messages stored in the third network after setup
2.4 Example of time table
3.1 Possible sequences of event at node 13
3.2 Failure on link 2-5
4.1 Number of messages sent
4.2 Total number of bytes sent in all messages
4.3 Number of messages sent
D.1 Number of links at each node
D.2 Messages stored in the network before failure
D.3 Messages stored in the network after failure

List of Figures

1.1 Node 3 routing information
1.2 First example of count-to-infinity
1.3 Second example of count-to-infinity
1.4 Jaffe and Moss' approach
2.1 First basic network graph
2.2 Second basic network graph
2.3 Third basic network graph
3.1 Failure on link 8 - 10
3.2 Failure Q - R
3.3 Failure D - E
3.4 Failure on link F - G
3.5 Failure on link 7 - 9
3.6 Failure on link Q - U
3.7 Failure on link 2 - 5
3.8 After failure on link 2 - 5
4.1 Example for juncture status oscillation
4.2 Short juncture message - first example
4.3 Short juncture message - second example
4.4 Guiding update message
4.5 A basic network graph
5.1 98 major nodes of the ARPANET
5.2 Messages sent in response to the ARPA-USC failure
5.3 Bandwidth used to respond to the ARPA-USC failure
5.4 Time required to respond to the ARPA-USC failure
5.5 Time to react to the ARPA-USC failure vs number of destinations
5.6 Number of messages sent in response to ARPA-USC failure
5.7 Number of messages sent in response to the number of failures
5.8 Time to respond to multiple failures

Chapter 1

Introduction

In [Tsu87], Tsuchiya presents the Alternate-path Distance-vector Routing (ADR). He does "not give an exact algorithm for the ADR" [Tsu87, page 24], but provides a "rough description of its operating". His description sketches how the routing scheme would react to a link metric change. This thesis develops an algorithm to react to link failures and studies its performance.

The rest of this chapter gives an overview of the previous work in distance-vector routing. Chapter Two introduces the key concepts of ADR. Chapter Three introduces the routing scheme's reaction to link failures. The following chapter explains strategies which improve the performance of the basic scheme. Chapter Five then studies the performance of the resulting algorithm. The text concludes with suggestions about the environment in which such an algorithm could be useful.

1.1 ADR in the context of other routing schemes

This section briefly reviews some of the routing schemes available¹. We first show where the ADR fits in a classification of routing algorithms. The rest of this chapter presents various distance-vector routing algorithms to give elements of comparison for ADR.

¹ A more detailed overview of routing schemes is available in Schwartz [Sch87], chapter Six. Tanenbaum [Tan89], chapter Five, presents routing schemes in the context of the OSI model.

Routing algorithms can be classified in various ways. They can be classified as to their response to change: an algorithm is said to be static (or non-adaptive) if the routes are computed off-line, and adaptive if it tries to modify its routing decisions to reflect changes in topology or link "cost". There is a gradation in adaptability: certain algorithms only react to changes in topology, while others adapt to various measures of the link "cost": estimated propagation delay ([Gal77], [CAT88]), link utilization, buffer occupancy, measured error conditions on a link, etc. Those measures can be combined among themselves (link and buffer utilization [GT90]) or with fixed quantities like link length, speed or bandwidth. The amount of information used and how often this information is updated define a possible sub-classification of the adaptive algorithms.

Routing algorithms can also be classified on the basis of their performance objectives. The shortest-path approach is a greedy algorithm, providing a least cost path between the source and destination (the sum of the costs on this least cost path is often referred to as the distance between the source and destination). The network-wide approach minimizes the average time delay. The network-wide approach usually leads to the use of multiple paths between source and destination. This multipath routing (also called bifurcated routing) was not adopted by any network or network architecture until recently (OSPF, an interior gateway protocol of TCP/IP, can distribute traffic over multiple routes), although suggestions of the use of multipath routing arise in some early papers [CG74], [FGK73] and [Gal77]. On the other hand, the shortest-path approach is used by most packet-switched networks (old ARPANET and new ARPANET, BNA, DNA, SNA, TYMNET ...).

Finally, routing algorithms can be classified as to whether their routes are computed at each node or at a central node. The former approach is called distributed routing. The ARPANET, BNA (Burroughs Network Architecture) and DNA (Digital Network Architecture) are examples of networks or architectures that use a distributed routing algorithm. The latter approach is called centralized routing. TYMNET, the original IBM SNA and GTE are examples of networks or architectures that use a centralized routing algorithm.

There are two classes of distributed routing: distance-vector and link-state. In distance-vector routing (the old ARPANET and the Bellman-Ford routing algorithm are examples of this), adjacent nodes exchange lists of their distances to each destination. Each node then uses this information to compute its own shortest distance to the destination. The nodes keep only these distances and the first link on the path to the destination. In a link-state algorithm (the Shortest Path First, new ARPANET, and Dijkstra's algorithm are examples of this), each node broadcasts the status of the links between itself and each of its neighbors. Each node keeps a representation of the entire network, along with the cost associated with each link in the network.

[Figure 1.1: Node 3 routing information. The figure shows a network with its destination marked, a routing table (dest, dist, successor) and a distance table (dest, link) for node 3.]

The Alternate-path Distance-vector Routing (ADR) is a distributed adaptive routing algorithm, which can exploit multiple paths between the source and destination. We will study it in more detail in the following chapters.

1.2 Evolution of distance-vector Routing

1.2.1 ARPANET

The old ARPANET algorithm was one of the first versions of the distance-vector routing algorithm. In this version, each node² in the network maintains a routing table and a distance table. The distance table contains the distance to every destination for each of the node's links. The routing table contains the shortest distance to every destination and the link which was used to obtain that distance (called the successor of this node for this destination). For example, figure 1.1 shows the routing table and the distance table of node 3 in the network depicted in figure 1.1.

² In the old ARPANET, switching elements were called Interface Message Processors. Here we use the term 'node' to refer to an IMP or a router, implying that all nodes considered are routers.

Each node uses the messages it receives from its neighbors to fill these tables. These messages are summaries of the neighbors' routing tables. When a node receives one of these messages, it updates its distance table. The update of the table requires several steps. For each destination, the node replaces the distance-table entry for the link to the sender by the sum of the sender's reported distance and the "length" of that link. If the new distance is smaller than the distance in the routing table, then the node replaces it by the new distance and defines the successor node to be the sender of the message.

When the algorithm starts, each node sets the successors to nil in its routing table. It also sets all the distances to infinity except for the distance from the node to itself. After that, each node periodically sends an update message to all of its neighbors. This updating process can be summarized by:
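In conventional Bellman-Ford notation, the steady state that this update converges to can be written as below. The symbols are assumptions introduced here for illustration, not the thesis' own notation: $N(i)$ is the set of neighbors of node $i$, $c_{ik}$ the cost of the link from $i$ to $k$, $D_k(j)$ the distance to destination $j$ last reported by neighbor $k$, and $s_i(j)$ the successor of $i$ for $j$.

```latex
D_i(i) = 0, \qquad
D_i(j) = \min_{k \in N(i)} \bigl[\, c_{ik} + D_k(j) \,\bigr], \qquad
s_i(j) = \arg\min_{k \in N(i)} \bigl[\, c_{ik} + D_k(j) \,\bigr]
\quad (j \neq i).
```

Each bracketed term $c_{ik} + D_k(j)$ is one entry of the distance table, and the minimum over $k$ is the corresponding routing-table entry.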

For an example of the updating procedure, consider the initialization of the network of figure 1.1 for only one destination, node 1, and from the point of view of node 3. In this example, a message is represented as MSG(destination,distance) and all links have a cost of one. Assume node 3 receives its first message from node 4, MSG(1,2). Node 3 adds the link cost (one) to node 4's distance (two) to obtain the distance to put in the distance table (three). It is smaller than the initial distance in the routing table (∞), so it replaces this distance and the successor becomes node 4. When node 3 receives MSG(1,1) from node 2, it puts the sum of the link cost and the message distance in the distance table. That value (two) is smaller than the value of the current shortest path (through node 4), so node 2 replaces node 4 as successor. At this point, the entries for destination 1 in both tables correspond to what is shown in figure 1.1.
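The worked example above can be replayed with a minimal sketch of the two tables. The class and method names (`Node`, `receive`) are hypothetical and chosen only for illustration. The sketch handles only the case described in the text, where a newly received distance improves on the current one.

```python
import math

INFINITY = math.inf  # initial distance to every destination other than the node itself


class Node:
    """Hypothetical container for the two tables kept by one node."""

    def __init__(self, node_id, link_costs):
        self.node_id = node_id
        self.link_costs = link_costs   # neighbor id -> cost of the link to it
        self.distance_table = {}       # (destination, neighbor) -> distance via that link
        self.routing_table = {node_id: (0, None)}  # destination -> (distance, successor)

    def receive(self, sender, destination, distance):
        """Process MSG(destination, distance) received from a neighbor."""
        via_sender = distance + self.link_costs[sender]
        self.distance_table[(destination, sender)] = via_sender
        best, _ = self.routing_table.get(destination, (INFINITY, None))
        if via_sender < best:
            # The sender offers a shorter path, so it becomes the successor.
            self.routing_table[destination] = (via_sender, sender)


# Node 3 has links of cost one to nodes 2 and 4.
node3 = Node(3, {2: 1, 4: 1})
node3.receive(sender=4, destination=1, distance=2)   # MSG(1,2) from node 4
after_first = node3.routing_table[1]
node3.receive(sender=2, destination=1, distance=1)   # MSG(1,1) from node 2
after_second = node3.routing_table[1]
```

After the first message the routing entry for destination 1 is distance three through node 4, and after the second it is distance two through node 2, as in the text.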

The major drawback of the old ARPANET approach is that it suffers from the count-to-infinity problem (abbreviated CT∞). This problem (also called good news/bad news) occurs when a node experiences an increase in its distance to a destination and finds a new "shorter" route looping back through itself. For example, in the network of figure 1.2, if node A is the destination, CT∞ occurs after a failure on link A-B. Before this failure, node B has a distance of one on link A and of three on link C, and node C has a distance of two on link B. When the failure occurs, node B looks for another route to reach the destination. It chooses the route starting with link C (where node C advertises a distance of two) and sends an update to node C to inform C that B's distance to A has increased. When node C receives this message it changes its distance to A and sends an update to B. This exchange continues until the nodes' distances each reach infinity (or the maximum value that can fit in the space reserved to store the distance).

[Figure 1.2: First example of count-to-infinity]
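The exchange between B and C can be replayed with a short simulation. This is a sketch under stated assumptions: the function name `count_to_infinity` is hypothetical, and the cap of 16 standing in for "infinity" is an arbitrary choice, since the text only says that counting stops at infinity or at the largest value the distance field can hold.

```python
def count_to_infinity(infinity=16):
    """Replay the exchange between B and C after link A-B fails (all links cost one).

    `infinity` is an assumed cap on the stored distance.
    """
    # B has just fallen back on its distance-table entry for link C (three);
    # C still believes it is two hops from A through B.
    dist = {"B": 3, "C": 2}
    trace = [("B", 3)]
    sender, receiver = "B", "C"
    while dist["B"] < infinity or dist["C"] < infinity:
        # The receiver's only route runs through the sender, so it adds one hop.
        dist[receiver] = min(dist[sender] + 1, infinity)
        trace.append((receiver, dist[receiver]))
        sender, receiver = receiver, sender
    return trace


trace = count_to_infinity()
```

Each entry of `trace` is a (node, new distance) pair. The distances climb by one per message until both nodes hold the cap, which is why the time to detect that A is unreachable grows with the value chosen for infinity.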

1.2.2 Split-horizon

One solution to this simple version of CT∞ is the split-horizon technique [Ceg75]. In this technique, node C in figure 1.2 cannot use the distance of a path going through B in the update message that C sends to node B. Node C does not send any update message since it has no other path to the destination except through B. Therefore node B does not know about the "deceitful" path which starts with link C. This solution does not, however, work with more complex CT∞'s. In figure 1.3, if a failure occurs on the link between the destination (node 1) and node 2, then node 2 sends an update message to nodes 3 and 4. This message does not offer the short cycles 2-3-2 or 2-4-2 because the nodes use the split-horizon. However, this technique does not prevent node 3 from telling node 2 about its path through node 4, nor does it prevent node 4 from telling node 2 about its path through node 3. Therefore, the update message that node 2 sends to node 3 has a distance of four. This distance reflects the new path through node 4. When node 3 receives this message, it sends a message with a distance of five to node 4. Node 4, in turn, sends an update with a distance of six to node 2. As a result, node 2 sends a new message with a distance of seven to node 3.

[Figure 1.3: Second example of count-to-infinity]

The same process happens with the message node 2 sends to node 4. There are two CT∞'s (2-3-4-2 and 2-4-3-2) which take place as a result of this failure, and the split-horizon technique does not prevent them from occurring.
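The split-horizon rule amounts to a filter applied when an update is built for a given neighbor. The sketch below is illustrative only: the function name and the table layout (destination mapped to a distance and successor pair) are assumptions, and the distance of three used for node 3 is an illustrative value for its route through node 4.

```python
def split_horizon_update(routing_table, neighbor):
    """Build the update sent to `neighbor`, omitting routes that go through it.

    routing_table maps destination -> (distance, successor).
    """
    return {
        dest: dist
        for dest, (dist, successor) in routing_table.items()
        if successor != neighbor
    }


# Node C of figure 1.2 reaches A only through B, so it has nothing to tell B.
update_c_to_b = split_horizon_update({"A": (2, "B")}, neighbor="B")

# Node 3 of figure 1.3, once its successor is node 4, may still advertise
# that route to node 2: the filter only looks one hop ahead.
update_3_to_2 = split_horizon_update({1: (3, 4)}, neighbor=2)
```

The second call shows why the three-node cycles survive: the filter inspects only the immediate successor, so a route that returns to the neighbor two hops later is still advertised.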

1.2.3 Jaffe and Moss

Jaffe and Moss [JM82] address this problem. For example, in figure 1.3, node 2 would be prevented from taking into account the distance on links 3 and 4 until all the nodes downtree³ from node 2 have been informed of the link failure. By the time node 2 gets the confirmation that all nodes downtree know about the link failure, it has received messages on links 3 and 4 which prevent it from choosing either of those links as its successor. As a result, node 2 cannot start a cycle. In the rest of this section, we first describe Jaffe and Moss' algorithm, and then illustrate its operation via a simple example.

This approach exploits the fact that the old ARPANET algorithm, in the presence of constant or decreasing distances to a destination, maintains loop-free paths. In the old ARPANET, loops can occur only when there is an increase in the distance to the destination. In Jaffe-Moss' approach, these increases are treated differently. When a node sees that its successor link's cost has increased or that this link has failed, it updates its distance table, sends a link increase update message⁴ to all of its neighbors and freezes. This frozen state prevents the node from choosing another successor, but it still lets the node update the distance associated with each link. A node remains frozen until it receives, from each of its neighbors, an acknowledgment of the increase update message.

³ This notation refers to a spanning tree rooted at the destination.

⁴ Jaffe-Moss' update message contains, in addition to the destination and the shortest distance to it, a single bit field which indicates if the message was sent in reaction to a link increase or not. Note that a failure is considered as a link increase to infinity.

[Figure 1.4: Jaffe and Moss' approach (steps 1 to 4)]

When a node receives an increase update message on its successor link, it changes the link distance, sends update messages to all of its neighbors, and enters the frozen state. The update messages carry the node's new distance to the destination using its old successor. If a node receives an update message on a link which is not its successor, it simply adjusts the distance on that link and sends back an acknowledgment. When a frozen node has received acknowledgments from all of its neighbors, it can, in turn, send an acknowledgment on the link which received an update. It then considers the link with the shortest distance to the destination. If its distance is smaller than infinity, the node chooses it as its new successor and updates the routing table accordingly. Otherwise the successor becomes nil. If the successor is not nil, the node sends an update message to its neighbors.

In the second example, Jaffe-Moss' approach works as illustrated in figure 1.4. In the figure, an additional circle around the node means that the node is frozen; Upd(distance) refers to an increase update message and ACK to an acknowledgment. After the failure, node 2 enters the frozen state and sends an update message to nodes 3 and 4. When node 3 and node 4 receive this message they react as depicted in step 2 of the figure. They freeze because the message came from their successor. They then set the distance on link 2 to infinity, and send an update message to each of their neighbors (nodes 2 and 3 for node 4, and nodes 2 and 4 for node 3). Step 3 shows how the neighbors react to these messages. Each of the neighbors receives the message on a link which is not its successor. So, it can send back an acknowledgment after it has set the link's distance to infinity. Note that, as a result, node 2 has a distance of infinity on the link to each of its neighbors. In step 4, nodes 3 and 4 react to the acknowledgments which they received from each of their neighbors. They then unfreeze and send an acknowledgment to node 2. When node 2 receives both acknowledgments it unfreezes and discovers that it cannot select a new successor for destination 1 because all links have a distance of infinity. As a result, node 2 cannot start a loop which leads to CT∞.

This approach also handles the case of multiple links increasing their cost. To do this, each node keeps, in addition to the information mentioned before, a bit vector for each of its links. The nodes use these vectors to keep track of the "outstanding" acknowledgments. When a node freezes, it adds a bit to each of its vectors. Later, when the node receives an acknowledgment on a link, it removes a bit from the vector attached to this link. The node remains frozen until all of its vectors have no bits left.
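The bookkeeping of outstanding acknowledgments can be sketched as follows. This is an illustration, not Jaffe and Moss' data structure: it replaces each per-link bit vector by a simple counter, and the names `FreezeState`, `freeze` and `acknowledge` are hypothetical.

```python
class FreezeState:
    """Hypothetical bookkeeping of outstanding acknowledgments, one counter per link."""

    def __init__(self, links):
        self.outstanding = {link: 0 for link in links}

    def freeze(self):
        # Entering (or re-entering) the frozen state: expect one more ACK per link.
        for link in self.outstanding:
            self.outstanding[link] += 1

    def acknowledge(self, link):
        # An ACK received on `link` cancels one outstanding increase update.
        if self.outstanding[link] > 0:
            self.outstanding[link] -= 1

    @property
    def frozen(self):
        return any(count > 0 for count in self.outstanding.values())


state = FreezeState(links=[3, 4])
state.freeze()                 # first link-cost increase
state.freeze()                 # second increase, before the first is acknowledged
for link in (3, 4, 3):
    state.acknowledge(link)
still_frozen = state.frozen    # link 4 still owes one acknowledgment
state.acknowledge(4)
released = not state.frozen
```

The node stays frozen as long as any link has an acknowledgment pending, which is how overlapping cost increases are kept apart.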

This approach eliminates many CT∞ problems, but the extensive freezing that might be present decreases the responsiveness of the old ARPANET algorithm.

1.2.4 Hagouel

Hagouel [Hag83] also addresses the problem described in the second example of CT∞ (see figure 1.3) by preventing node 2 from taking into account the distance to the destination on links 3 and 4. He does it, however, in a more drastic way. In his approach (which he refers to as "algorithm A") there is no distance table; instead each node keeps only a single cost estimate for each destination. When a node needs to know the distance to a destination on a path starting with a link which is not its successor, it asks its neighbors to send back a message with this distance. However, a node is not allowed to reply to such a request from its successor when this request reflects a link increase.

This approach decreases the probability of CT∞, but there is no proof in [Hag83] that this approach completely eliminates the problem. One drawback of Hagouel's approach is that a node cannot tell whether it cannot reach a destination or whether the message asking for an update has been lost or delayed. This is the case with the second example of CT∞. Node 2 will never get any message about the destination from node 3 or node 4. A positive aspect of this approach is that it does not require any freezing.

1.2.5 Garcia-Luna-Aceves

Garcia-Luna-Aceves has proposed many solutions to the CT∞ problem [GLA86, GLA87, GLA88b]. One of the most recent ones is presented here. It is an extension of the Jaffe-Moss approach.

This approach [GLA88a, GLA88b] adds the concept of feasible successor to the Jaffe-Moss algorithm. A feasible successor to a node is a new successor which has a distance to the destination smaller than or equal to that of the current successor. When a node receives, from its current successor, an update message that reflects an increase in link cost, it does not necessarily freeze, as in the Jaffe-Moss algorithm. If there is a feasible successor, it will become the new successor and no freezing is necessary. Similarly, when a node's link to a successor fails or the link cost increases, the node looks for a feasible successor before going into a frozen state. If there is a feasible successor, once again the node does not need to freeze.
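The test for a feasible successor can be sketched as a lookup in the distance table. This follows a simplified reading of the definition above, not the full condition of [GLA88a]: a neighbor qualifies if the distance through it is no larger than the distance the node had through its current successor before the increase. The function name, its parameters, and the sample distances are hypothetical.

```python
import math


def find_feasible_successor(distances_via, current_successor, old_distance):
    """Return a neighbor that can replace the current successor without freezing.

    distances_via maps neighbor -> distance to the destination through that
    neighbor. old_distance is the distance through the current successor
    before the increase.
    """
    candidates = {
        neighbor: dist
        for neighbor, dist in distances_via.items()
        if neighbor != current_successor and dist <= old_distance
    }
    if not candidates:
        return None               # no feasible successor: the node has to freeze
    return min(candidates, key=candidates.get)


# The link to successor "B" has failed; neighbor "C" offers an equally short path.
switch_to = find_feasible_successor({"B": math.inf, "C": 3, "D": 5}, "B", old_distance=3)

# Without "C", the only alternative is longer than the old distance.
must_freeze = find_feasible_successor({"B": math.inf, "D": 5}, "B", old_distance=3)
```

In the first call the node switches to "C" without freezing. In the second no neighbor qualifies, so the node falls back on the Jaffe-Moss freezing procedure.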

With this key difference and some other minor ones, Garcia-Luna-Aceves obtains an "algorithm that is always loop-free, operates with arbitrary link and node delays, and provides shortest paths within a finite time after the occurrence of an arbitrary sequence of topological changes." [GLA88a, page 1126]. He gives a proof of this loop freedom in [GLA88a].

In [GLA88c], he also proposes a modification to this algorithm which uses less internodal update coordination. However, it is required to include, both in the update messages and the routing table, the complete path from source to destination in order to avoid CT∞ in the case of multiple failures. So it seems that, unless a single failure can be assumed, this family⁵ of algorithms requires strong coordination.

⁵ Old ARPANET, Jaffe-Moss and Garcia.

As Garcia-Luna-Aceves says himself [GLA88c], in a "large network subject to several topological changes [...], the coordination that they require among nodes distant from each other can inhibit the responsiveness of the network". However, it is not clear if it is possible to improve the old ARPANET to avoid CT∞ and to preserve responsiveness, using any of the methods presented so far.

1.2.6 Hold-down

The hold-down solution to the CT∞ is an old solution to this problem. It precedes Jaffe-Moss' solution. It was proposed by McQuillan [McQ74]. It diverges from the ARPANET algorithm when a node receives an increase update message⁶ from its current successor. The node then waits for a fixed period of time before making any change to its routing table. This waiting period is called a hold-down period. It should be long enough so that all the old information has been replaced, using the normal update process, by the new information. The problem with this approach is to determine what is a "long enough" period of time. If the network waits for too short a period, it can still suffer from the CT∞. If it waits too long, it wastes precious resources. With multiple failures the hold-down period can easily be too short, so it might not work.

⁶ An update message which was sent in reaction to an increase of link cost.

This hold-down period replaces the various mechanisms which actively remove the old information in Jaffe-Moss', Hagouel's, and Garcia-Luna-Aceves' approaches. From this point of view, the hold-down approach accepts being less responsive than those approaches in exchange for using fewer messages.

Chapter 2

General concepts

This chapter introduces a few conventions and key ideas used in the following text. It presents the three basic networks that are used in the examples of later sections. It presents a tool, called the chronological table of events, which will be used to summarize examples. Such a table can be found at the end of each example in order to give a time perspective. At the end of this chapter, we present the assumptions of the proposed algorithms.

A first convention is to discuss a network as if the destination were the root of a spanning tree, and this root were the highest node in a figure. The other nodes are ordered according to their distance to the destination. When two nodes have a common link on their path to the destination, the one that has a shorter path leading to the destination is said to be uplink (or simply up) from the other. If two nodes which share a link have the same distance to the destination, the link connecting them is called a horizontal link. For example, in figure 2.1 nodes 9 and 10 are said to be up from node 12. Node 11 is horizontal from node 12. Node 13 is down from node 12. This convention is consistent with Tsuchiya's initial text [Tsu87]. However, in the figures, the destination is usually located on the right, in order to save space.

In this document the distance between two nodes is measured in hop counts (the minimum number of links between two nodes). Hop counts are used to make the exposition more straightforward.
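The up, horizontal and down convention can be sketched with hop counts computed by a breadth-first search from the destination. The four-node network below is hypothetical and is not one of the thesis figures. The function names are also assumptions made for illustration.

```python
from collections import deque


def hop_counts(adjacency, destination):
    """Breadth-first search giving each node's hop count to the destination."""
    dist = {destination: 0}
    queue = deque([destination])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def classify_link(dist, node, neighbor):
    """Classify the link node-neighbor as seen from `node`."""
    if dist[neighbor] < dist[node]:
        return "up"
    if dist[neighbor] == dist[node]:
        return "horizontal"
    return "down"


# Hypothetical four-node network; "D" is the destination.
adjacency = {
    "D": ["a", "b"],
    "a": ["D", "b", "c"],
    "b": ["D", "a", "c"],
    "c": ["a", "b"],
}
dist = hop_counts(adjacency, "D")
links_of_a = {n: classify_link(dist, "a", n) for n in adjacency["a"]}
```

Seen from node "a", the link to "D" is up, the link to "b" is horizontal, and the link to "c" is down.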

Another way to keep the exposition clear is to consider only one destination in each example. This can be done without difficulty because each node keeps a separate representation of the network for each destination. These representations contain views of the network which are relatively independent, since a message for one destination is rarely relevant for another. For example, consider figure 2.1 with destination nodes 0 and 4. A message informing node 6 about a failure on link 0-1 for destination 0 is not relevant for node 6 with respect to destination 4.

The central concept of Alternate-path Distance-vector Routing (ADR) is the concept of juncture. Most messages carry information about the position of junctures, and this information constitutes a considerable portion of each node's view of the network. A juncture is a node or a pair of nodes with two or more up paths of equal distance to the destination. Such a node is important because, if a link failure occurs on one of the paths to the destination, the second path from the juncture offers an alternative (also called alternate path) to nodes between the failure and the juncture. For example, consider a failure on the link between nodes 1 and 3 (figure 2.1). Node 6 is the closest juncture to the failure, so it would provide a path to the destination for node 3.

Juncture nodes can be divided into two categories depending on the type of alternate paths to the destination they offer. A full juncture offers at least two distinct paths to the destination (for example, node 9 in figure 2.1 offers a path ending with link 1-0 and another path ending with link 2-0). A partial juncture also offers at least two paths to the destination, but those paths have some nodes in common, other than the destination (for example, node 6 in figure 2.1 offers paths 6-3-1-0 and 6-4-1-0). This type of juncture provides an alternative path in case of link failure, but only when the failure occurs between the juncture and the first node common to both paths (that is, if a failure occurs on the link between nodes 0 and 1, node 6 does not offer an alternative to nodes 1, 3 and 4).
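The distinction between full and partial junctures can be checked directly on two up paths. The sketch below is an illustration only: the function names are hypothetical, each path is given as a list of node ids from the juncture to the destination, and the full-juncture example uses made-up node labels rather than a thesis figure.

```python
def juncture_kind(path_a, path_b):
    """Classify a juncture from two of its up paths."""
    juncture, destination = path_a[0], path_a[-1]
    shared = (set(path_a) & set(path_b)) - {juncture, destination}
    return "partial" if shared else "full"


def covers_failure(path_a, path_b, failed_link):
    """True if at least one of the two paths avoids failed_link (a pair of node ids)."""
    def links(path):
        return {frozenset(pair) for pair in zip(path, path[1:])}
    failed = frozenset(failed_link)
    return not (failed in links(path_a) and failed in links(path_b))


# Node 6 of figure 2.1: both paths pass through node 1.
kind_of_6 = juncture_kind([6, 3, 1, 0], [6, 4, 1, 0])
helps_link_1_3 = covers_failure([6, 3, 1, 0], [6, 4, 1, 0], (1, 3))
helps_link_0_1 = covers_failure([6, 3, 1, 0], [6, 4, 1, 0], (0, 1))

# Hypothetical juncture "J" whose two paths share only the destination "D".
kind_of_j = juncture_kind(["J", "a", "D"], ["J", "b", "D"])
```

Node 6 comes out as a partial juncture that helps with a failure on link 1-3 but not with a failure on link 0-1, matching the example in the text.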

The difference in the range of nodes a juncture can help in case of failure dictates which nodes should be informed about the alternative. A full juncture offers an alternative to all nodes on an up path to the destination¹. A partial juncture should tell all the nodes on its up paths up to the node at which these paths merge.

¹ However, it is not necessary to go above another full juncture, for reasons explained in section 4.3.

[Figure 2.1: First basic network graph]

The node where two paths towards the destination merge is called a split point. This name was chosen because during the setup part of the ADR algorithm, the first messages go down from the destination and they split at this node.

2.1 Network description

In this section we first describe a network as a static entity, and then describe how

it develops its self-knowledge (i.e. the setup part of the algorithm) and how it stores

this knowledge. The second and third networks are only described as static entities.

There is a table showing which messages are kept at each node at the end of the

description of each network.

2.1.1 First network

Consider the basic graph example (figure 2.1) where node 0 is the destination. All

arrows point towards the up direction.

There are three split points, apart from the destination, in this network: node 1, node 5, and node 9. Node 6 is the only partial juncture here. It brings alternate paths to nodes 3 and 4. There are three full junctures in this network: node 9, node 12, and node 13. Node 9 provides an alternate path, in case of failure, for nodes 1 to 7. Node 12 provides an alternate path for nodes 8 and 10. Node 13 does the same for node 11. Both of them also provide a long alternate path to nodes which rely on node 9 in case link 0-1 and one link between 5 and 9 fail. They do this by providing an alternate path to node 9.

The network learns about the position of junctures through three kinds of messages: the jc message, the papp message and the fapp message. The jc message tells nodes that they are junctures. The app messages tell nodes that there is a juncture below them. They also tell them whether it is a full or a partial juncture. These messages have the following structure:

jc msg: abbreviation of juncture configuration message

• Fj element - the node id of the last full juncture encountered (or the first node after destination)

• Pj list - a list of split points that have been met since the last full juncture

fapp msg: abbreviation of full alternate priming path (app) message

• origin - node id of the full juncture

• distance - distance to destination along the path which goes through the juncture

papp msg: abbreviation of partial alternate priming path (app) message

• origin - node id of the partial juncture

• distance - distance to destination along the path which goes through the juncture

• stop field - node id of the matching split point

or split count - number of split points before the matching split point
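The three message layouts listed above can be written down as plain records. The following Python sketch is only an illustration: the class and field names (JcMessage, FappMessage, PappMessage, split_count, and so on) are assumptions made here, not names used by the thesis, and the two alternative encodings of the papp stop information are both kept as optional fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

NodeId = Union[int, str]  # node ids are numbers or letters in the example networks


@dataclass
class JcMessage:
    """jc message: describes the junctures and split points above a node."""
    fj: NodeId                                       # last full juncture met (or first node after the destination)
    pj: List[NodeId] = field(default_factory=list)   # split points met since that full juncture


@dataclass
class FappMessage:
    """fapp message: announces a full juncture below the receiving node."""
    origin: NodeId    # node id of the full juncture
    distance: int     # distance to the destination along the path through the juncture


@dataclass
class PappMessage:
    """papp message: announces a partial juncture below the receiving node."""
    origin: NodeId                       # node id of the partial juncture
    distance: int                        # distance to the destination along the path through the juncture
    stop: Optional[NodeId] = None        # node id of the matching split point, or ...
    split_count: Optional[int] = None    # ... number of split points before the matching split point
```

With these records, a jc message carrying Fj = 1 and a Pj list holding only split point 1 would be written JcMessage(fj=1, pj=[1]).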

Once the first phase of the setup has classified the links of all nodes as up, horizontal, or down2, the destination sends jc messages to all its neighbors. These neighbors put their node id in the Fj part of the message and send these jc messages on their down and horizontal links. These messages are propagated down to the last nodes, which have no other down or horizontal links. On their way the jc messages are modified in three ways. Each split point they encounter adds its node id to their Pj list. Each partial juncture removes some elements from the Pj list (the details follow). Each full juncture puts its node id in the Fj part of the message and deletes the Pj list.

For a node receiving a jc message, the Fj part of the message indicates the message's origin (which neighbor of the destination, or which full juncture the message came from). When a juncture has received all its jc messages, it can deduce its status based on the origin of these messages: partial juncture if they have the same origin, full juncture if they have different origins.

When a juncture discovers its status, it sends the appropriate app message on its up links. A full juncture sends fapp messages which are propagated up to the next full juncture. A partial juncture sends papp messages which are propagated up to, but excluding, the matching split point (the first common element to the Pj lists of all the jc messages received). The partial juncture sends a jc message with a Pj list containing only the Pj elements common to all the jc messages received, minus the matching split point.
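The status rule and the Pj handling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name classify_juncture and the (Fj, Pj) tuple encoding are invented here, "first common element" is read in list order, and a juncture that is also a split point would still have to append its own id to the outgoing Pj list, which is not modeled.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

Jc = Tuple[Hashable, List[Hashable]]  # (Fj, Pj list) of one received jc message


def classify_juncture(
    node_id: Hashable, received: Sequence[Jc]
) -> Tuple[str, Optional[Hashable], Jc]:
    """Return (status, matching split point, outgoing jc) for a juncture."""
    if len(received) < 2:
        raise ValueError("a juncture receives at least two jc messages")
    origins = {fj for fj, _ in received}
    if len(origins) > 1:
        # Different origins: full juncture. It puts its own id in Fj
        # and deletes the Pj list.
        return "full", None, (node_id, [])
    # Same origin everywhere: partial juncture.
    first_fj, first_pj = received[0]
    # Pj elements common to all received messages, in the order of the first list.
    common = [p for p in first_pj if all(p in pj for _, pj in received[1:])]
    # Assumption: the matching split point is the first common element in list order.
    match = common[0] if common else None
    outgoing_pj = [p for p in common if p != match]
    return "partial", match, (first_fj, outgoing_pj)
```

Called with the two jc messages that node 6 receives in the first network, classify_juncture(6, [(1, [1]), (1, [1])]) reports a partial juncture whose matching split point is node 1.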

For example, node 6 learned it was a partial juncture when it received two jc messages with the same origin (same Fj). Node 6 can also tell which node is the matching split point (node 1), by looking for the first identical address in the Pj list of both messages. Node 6 puts this split point node id in the stop field of the papp messages it sends to nodes 3 and 4. These papp messages bring alternate paths to nodes 3 and 4.

Nodes 9, 12, and 13 learn that they are full junctures when they receive two jc messages with different Fjs (node 9 -> 1 & 2, node 12 -> 9 & 2 and node 13 -> 9 &

2This classification requires the knowledge of the distance on each link. In this thesis, it is assumed that this information is obtained using a traditional Bellman-Ford method. Further research could develop a protocol which would combine this phase of the setup and the second one, which is described in the next paragraphs.

node  link  dir  dist  messages
1     0     U    1     jc(Fj = 0, Pj = [])
      3     D    7     fapp(origin = 9, distance = 7)
      4     D    7     fapp(origin = 9, distance = 7)
2     0     U    1     jc(Fj = 0, Pj = [])
      5     D    7     fapp(origin = 9, distance = 7), fapp(origin = 12, distance = 9)
3     1     U    2     jc(Fj = 1, Pj = [1])
      6     D    6     fapp(origin = 9, distance = 6), papp(origin = 6, distance = 3)
4     1     U    2     jc(Fj = 1, Pj = [1])
      6     D    6     fapp(origin = 9, distance = 6), papp(origin = 6, distance = 3)
5     2     U    2     jc(Fj = 2, Pj = [2])
      7     D    6     fapp(origin = 9, distance = 6)
      8     D    8     fapp(origin = 12, distance = 8)
6     3     U    3     jc(Fj = 1, Pj = [1])
      4     U    3     jc(Fj = 1, Pj = [1])
      9     D    5     fapp(origin = 9, distance = 5)
7     5     U    3     jc(Fj = 2, Pj = [2,5])
      9     D    5     fapp(origin = 9, distance = 6)
8     5     U    3     jc(Fj = 2, Pj = [2,5])
      10    D    7     fapp(origin = 12, distance = 7)
9     6     U    4     jc(Fj = 1, Pj = [1])
      7     U    4     jc(Fj = 2, Pj = [2,5])
      12    D    6     fapp(origin = 12, distance = 6)
      11    D    7     fapp(origin = 13, distance = 8)
10    8     U    4     jc(Fj = 2, Pj = [2,5])
      12    D    6     fapp(origin = 12, distance = 6)
11    9     U    5     jc(Fj = 9, Pj = [9])
      13    D    7     fapp(origin = 13, distance = 7)
12    9     U    5     jc(Fj = 9, Pj = [9])
      10    U    5     jc(Fj = 2, Pj = [2,5])
      13    D    7     fapp(origin = 13, distance = 7)
13    11    U    6     jc(Fj = 9, Pj = [9])
      12    U    6     jc(Fj = 12, Pj = [])

Table 2.1: Messages stored in the first network after setup

12). Then they send fapp messages on their up links to inform nodes above them of

the alternate path the full juncture is providing.

Each node keeps track of which links are up, which are horizontal and which are down. It also keeps the distance to destination for all links. Each node keeps the last jc message it has received on each up and horizontal link. It also keeps all the app messages it has received on a down link until a message tells the node to remove them. These two types of messages are kept because they are needed when there is a failure on a link.

Table 2.1 gives a list of the messages stored at the nodes after the setup.

Figure 2.2: Second basic network graph

2.1.2 Second network

The second network (figure 2.2), where node A is the destination, has two interesting features. It contains a two node partial juncture (nodes D and E), and node G has three up links. A two node partial juncture is a partial juncture made of a pair of nodes with a horizontal link between them (note that there is no arrow because neither of the nodes is above the other).

In this network there are also two full juncture nodes, I and G. Node I provides an alternate path to nodes H, F, C, and G. Node G provides an alternate path to nodes B to F.

Table 2.2 gives a list of the messages stored at the nodes after the setup.

2.1.3 Third network

The important aspect of the third network (figure 2.3) is that it contains a two node full juncture (nodes Q and R). A two node full juncture is a full juncture made of a pair of nodes with a horizontal link between them. It is also interesting to note that if the horizontal link between those two nodes were not there, node R would be a partial juncture, but node Q would still be a full juncture.

Table 2.3 gives a list of the messages stored at the nodes in the third network after the setup.

node  link  dir  dist  messages
B     A     U    1     jc(Fj = A, Pj = [])
      D     D    5     fapp(origin = G, distance = 5)
      E     D    5     fapp(origin = G, distance = 5)
C     A     U    1     jc(Fj = A, Pj = [])
      F     D    5     fapp(origin = G, distance = 5), fapp(origin = I, distance = 7)
D     B     U    2     jc(Fj = B, Pj = [B])
      E     H    3     jc(Fj = B, Pj = [B])
      G     D    4     fapp(origin = G, distance = 4)
E     B     U    2     jc(Fj = B, Pj = [B])
      D     H    3     jc(Fj = B, Pj = [B])
      G     D    4     fapp(origin = G, distance = 4)
F     C     U    2     jc(Fj = C, Pj = [])
      G     D    4     fapp(origin = G, distance = 4)
      H     D    6     fapp(origin = I, distance = 6)
G     D     U    3     jc(Fj = B, Pj = [B])
      E     U    3     jc(Fj = B, Pj = [B])
      F     U    3     jc(Fj = C, Pj = [F])
      I     D    5     fapp(origin = I, distance = 6)
H     F     U    3     jc(Fj = C, Pj = [F])
      I     D    5     fapp(origin = I, distance = 5)
I     G     U    4     jc(Fj = G, Pj = [])
      H     U    4     jc(Fj = C, Pj = [F])

Table 2.2: Messages stored in the second network after setup

Figure 2.3: Third basic network graph


node  link  dir  dist  messages
Y     Z     U    1     jc(Fj = Z, Pj = [])
      U     D    5     fapp(origin = Q, distance = 5)
      V     D    6     fapp(origin = R, distance = 6)
      W     D    6     fapp(origin = R, distance = 6), fapp(origin = P, distance = 7)
X     Z     U    1     jc(Fj = Z, Pj = [])
      T     D    5     fapp(origin = Q, distance = 5)
W     Y     U    2     jc(Fj = Y, Pj = [Y])
      S     D    5     fapp(origin = P, distance = 6)
V     Y     U    2     jc(Fj = Y, Pj = [Y])
      R     D    5     fapp(origin = R, distance = 5)
U     Y     U    2     jc(Fj = Y, Pj = [Y])
      Q     D    4     fapp(origin = Q, distance = 4)
T     X     U    2     jc(Fj = X, Pj = [])
      Q     D    4     fapp(origin = Q, distance = 4)
S     W     U    3     jc(Fj = Y, Pj = [Y,W])
      P     D    5     fapp(origin = P, distance = 5)
R     W     U    3     jc(Fj = Y, Pj = [Y,W])
      V     U    3     jc(Fj = Y, Pj = [Y])
      Q     H    4     jc(Fj = Q, Pj = [])
      P     D    5     fapp(origin = P, distance = 5)
Q     T     U    3     jc(Fj = X, Pj = [])
      U     U    3     jc(Fj = Y, Pj = [Y])
      R     H    4     jc(Fj = Y, Pj = [Y])
P     S     U    4     jc(Fj = Y, Pj = [Y,W])
      R     U    4     jc(Fj = R, Pj = [])

Table 2.3: Messages stored in the third network after setup

2.2 Chronological table of events

A chronological table (see table 2.4) has the following structure. The first column

shows the time at which an event occurred. The first row shows the node which is

involved, the second the links of the node, and the third the direction of the link (U-up, D-down, H-horizontal, *-broken link) at the beginning of the example. Nodes

are separated by double lines. Each entry in the table is made up of two parts. The

first line expresses the event which occurred (failure or message arrival) in the column

corresponding to the link on which it occurred. The second shows the direction of the

links after the event occurred (after a failure or upon receiving a message mentioning

a failure farther away, the direction of a link can change). Here is a list of possible

events3 :

FAIL failure on that link

3Each message of an event "receiving a message" will be explained in the next chapter.


35 FAIL D U *

38 U

43 AF12 D U *

44 AF9 D U U

Table 2.4: Example of time table

5 U

UPD 9 D

UPDxx receiving an update message with xx = alternate distance to destination.

QCxx receiving a quick cost update message with xx = new distance to destination.

Fxx receiving a fapp message with xx = origin of the message (full juncture id)

AFxx receiving an anti-fapp message with xx = origin of the message (full juncture id)

Pxx receiving a papp message with xx = origin of the message (partial juncture id)

APxx receiving an anti-papp message with xx = origin of the message (partial juncture id)
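To make the event notation concrete, the short Python sketch below splits an event code into its kind and its xx part. It is an illustration only: the function name parse_event and the table EVENT_KINDS are hypothetical, and the papp event is assumed to be coded Pxx, by analogy with APxx.

```python
import re
from typing import Optional, Tuple

# Event prefixes of the chronological tables and a readable name for each.
EVENT_KINDS = {
    "FAIL": "failure",
    "UPD": "update",
    "QC": "quick cost update",
    "AF": "anti-fapp",
    "AP": "anti-papp",
    "F": "fapp",
    "P": "papp",   # assumption: papp events are written Pxx
}

# Longer prefixes come first so that "FAIL" is not read as "F" followed by "AIL".
_EVENT_RE = re.compile(r"^(FAIL|UPD|QC|AF|AP|F|P)\s*(\w+)?$")


def parse_event(code: str) -> Tuple[str, Optional[str]]:
    """Split a table entry such as 'AF12' into ('anti-fapp', '12')."""
    match = _EVENT_RE.match(code.strip())
    if match is None:
        raise ValueError(f"unknown event code: {code!r}")
    return EVENT_KINDS[match.group(1)], match.group(2)
```

The xx part is returned as a string because it is a distance for UPD and QC events but a node id, possibly a letter, for the app and anti-app events.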

The tables in this document assume that time is discrete and that processing and sending messages take a constant time. It takes 2 time units for a node to interpret an incoming control message and adjust its representation of the network. It then takes 1 time unit for each message the node builds and sends to its neighbors (sending time includes the time to cross the link). If the receiving node is already busy processing or sending another control message, the incoming message waits until the node is free.

The assumption that time is discrete and that the operations take a constant time keeps the messages tractable when the simulation is done manually.
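The timing rules above are simple enough to compute by hand, and the following Python sketch shows one way to do it. It is a minimal illustration under assumptions: the function name handle_message and its parameters are invented here, and the outgoing messages are assumed to be built and sent one after another, each taking one time unit.

```python
from typing import List, Tuple

PROCESS_TIME = 2  # time units to interpret an incoming control message
SEND_TIME = 1     # time units per outgoing message, link crossing included


def handle_message(arrival: int, busy_until: int, n_outgoing: int) -> Tuple[List[int], int]:
    """Return (arrival times of the outgoing messages, time at which the node is free)."""
    # The message waits if the node is still busy with earlier messages.
    start = max(arrival, busy_until)
    done_processing = start + PROCESS_TIME
    # Assumption: outgoing messages are sent sequentially, one time unit each.
    deliveries = [done_processing + SEND_TIME * (k + 1) for k in range(n_outgoing)]
    free_at = deliveries[-1] if deliveries else done_processing
    return deliveries, free_at
```

For instance, a node that is idle when a message arrives at time 10 and that answers with two messages delivers them at times 13 and 14; a second message arriving at time 11 is only taken up at time 14.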

The proposed algorithm also relies on certain assumptions. It assumes that there is a protocol available to inform the two nodes sharing a link when this link fails. It also assumes that no failure will occur before the end of the setup period. It also simplifies its task by allowing new links to be added only during the setup.

Chapter 3

Isolated link failures

This chapter presents the different types of link failures and how a network reacts to

them. Examples of single link failures illustrate which messages are sent and what role

each type of message plays in the adaptation to a failure. Note that the description

of events in the examples follows a depth first fashion rather than a chronological

fashion. That is, it follows the consequences of one message before considering the

consequences of another.

Since this system is distributed, it can be described from the point of view of each node. The description of a reaction to a failure takes the following approach. When a node sees one of its links has failed, it can easily identify it as a failure on an up link, on a horizontal link, or on a down link. The link failure can be classified further according to the number of links with the same orientation as the one which failed. The way the node reacts to the failure depends upon the class to which it belongs.

3.1 Failure on down link

When a failure occurs on a down link, the node has to remove the link and any information that came on that link from its internal representation. In addition, it has to start a process which will "chase" and remove all messages which have traversed the broken link as far as they were propagated. These messages (fapp messages and papp messages) carry information about the network below which is not valid as a


Figure 3.1: Failure on link 8 - 10

result of the failure. The way the proposed system "chases" these app messages is by sending other messages which follow exactly the same path as the original messages and remove them on the way. These other messages are called anti-fapp messages and anti-papp messages1.

If there were two down links before the failure, then after the failure, the node would no longer be a split point2. Since the last jc message which was sent had this split point in its Pj list, a new one has to be sent to inform down links of this change.

Consider a failure on link 8-10 from the point of view of node 8 (figure 3.1). Node 8 has to send anti-messages to chase the message that came from juncture node 12. The message that came from juncture node 12 was a fapp message, so node 8 will send an anti-fapp message on link 8-5. When node 5 receives the message from node 8, it looks for a fapp message with origin node 12, removes it and then sends3 the anti-fapp message on all up links which do not lead to a full juncture or to the destination. Node 5 then sends the anti-fapp message to node 2. Node 2 does not propagate the anti-fapp message to node 0 since node 0 is the destination.
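The chase performed by an anti-fapp message can be sketched as a small recursive procedure. The Python below is only an illustration under assumptions: the function name chase_fapp and the dictionaries stored, up_links and stop_nodes are hypothetical, fapp messages are assumed to be kept per down link, and the recursion follows one message to its end before another, in the depth first fashion used for the examples.

```python
from typing import Dict, List, Set


def chase_fapp(
    stored: Dict[int, Dict[int, Set[int]]],  # stored[node][down link] = origins of the fapp messages kept
    up_links: Dict[int, List[int]],          # up_links[node] = neighbors reached by its up links
    stop_nodes: Set[int],                    # the destination and the full junctures
    sender: int,
    receiver: int,
    origin: int,
) -> List[int]:
    """Deliver an anti-fapp message and return the nodes that removed a fapp message."""
    removed_at: List[int] = []
    kept = stored.get(receiver, {}).get(sender, set())
    if origin not in kept:
        return removed_at  # nothing to remove, so the anti-message stops here
    kept.discard(origin)
    removed_at.append(receiver)
    for up in up_links.get(receiver, []):
        if up in stop_nodes:
            continue  # do not send towards a full juncture or the destination
        removed_at += chase_fapp(stored, up_links, stop_nodes, receiver, up, origin)
    return removed_at
```

Run on the data of the failure on link 8-10, with nodes 5 and 2 holding a fapp message of origin 12, the call chase_fapp(stored, up_links, {0, 9, 12, 13}, 8, 5, 12) visits node 5 and then node 2, and stops before the destination.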

1A description of these messages and all other messages used in this system is given in appendix A.

2If there were more, the node would still be a split point. If there were only one, the node would never have been a split point in the first place.

3Note that if node 5 were not to find a fapp message of origin 12 from link 8 it would not send the anti-fapp message any farther, since there would be nothing to remove.


Figure 3.2: Failure Q - R

3.2 Failure on horizontal link

When a failure occurs on a horizontal link, the node has to remove the link and the jc message that came on that link from its internal representation of the network. Then it has to evaluate its new juncture status, and react according to this new status. There are three possibilities:

• The node preserves its juncture status. If the node is already a juncture with the same status (it has 2 up links or it is part of another two node juncture), then the failure does not cause a change in status. The node does not need to send any message, since nothing has really changed for the rest of the network: the node still provides an alternate path of the same quality (through a juncture with the same status).

For example, consider a failure on link Q - R in figure 3.2, from the point of view of node Q. Node Q is still a full juncture, so it does not need to send any message.

• The node was a two node full juncture and it becomes a partial juncture or a two node partial juncture. In this case the node has to send papp messages on up links and a new jc message on down links to replace the old messages.

Figure 3.3: Failure D - E

Consider the same failure Q - R from the point of view of node R (figure 3.2). Node R becomes a partial juncture, because its two up links have the same Fj (Y). To reflect this change in status, node R sends a papp message to nodes V & W and a new jc message to node P. When nodes V & W receive this message, they replace the old fapp message from R by the new message. This message stops there4, but they send an anti-fapp message to node Y to remove its fapp message. When node P receives the new jc message, it discovers that it is a partial juncture, since both of its up links have the same Fj (Y). As a result, it sends a papp message on its up links to replace the fapp message.

• The node stops being a juncture. If the node had only one up link and one horizontal link, it stops being a juncture when the horizontal link fails. The node has to send a new jc message on its down links and an anti-app message on its up link.

Consider a failure on link D - E from the point of view of node D (figure 3.3). Node D is not a partial juncture anymore so it has to send a new jc message to node G. Node G reacts to this message by sending a new papp message up on links D and E. They do not send an anti-papp message to node B since it was the split point of the juncture before the failure.

4Node Y is the split point for node R as a partial juncture. However, the fapp message sent by node R did reach node Y, so an anti-fapp message is required to remove it.

3.3 Failure on up link

When a failure occurs on an up link, the node has to remove the link and the jc message that came on that link from its internal representation. It then has to reassess the direction of the other links, and if need be, reorganize them and evaluate its new juncture status. Failures on an up link can be divided into three categories according to the number of up links present before the failure. An example of each kind is given after the following list.

• If there are more than two up links before failure, then the node will remain a juncture. If there is no change in juncture status, nothing has changed for the rest of the network.

A full juncture can become a partial juncture if all the up links except the one that failed had the same origin (same Fj), and it did not have a horizontal link with a different origin (if it did, the node remains a full juncture). In this case, the node is still a juncture and the links do not have to be reorganized. However, the node has to send a new jc message on all down and on all horizontal links as well as papp messages on all up links. This is done to inform other nodes of the change in its juncture status. The papp message will tell up nodes to replace their fapp message by the new papp message. The new jc message will replace the old ones.

If the node has a horizontal link which carried a jc message with the same Fj as the remaining up links, then the node which shares the horizontal link will change status from a (two node) full juncture to a (two node) partial juncture. This change will be done when the node which shares the horizontal link receives the jc message from the node which "suffered" the failure.

• If there are two up links before the failure, then the node's reaction to the failure depends on the presence of a horizontal link.

- If the node is not part of a two node juncture (no horizontal link), then it loses its juncture status. The node has to inform every node which knew that it was a juncture that it is not a juncture anymore. So, it sends the jc message which was received on the remaining up link on all down links and sends an anti-app message on all up links.

- If the node is part of a two node juncture which changes its juncture status due to the failure, then the node has to inform nodes which depend on this juncture about this change in status. It sends jc messages on down and horizontal links and a papp message on up links5.

- If the node is part of a two node juncture which does not change its juncture status due to the failure, then the node only needs to send a new jc message on its horizontal link(s).

In all cases the node will not need to reorganize its links.

• If the failure occurs on the only up link, then the links have to be reorganized whether or not the node is part of a two node juncture. The link(s) with the shortest distance to the destination will become up link(s). Links with a distance equal to that of the up link(s) plus their respective link cost are considered horizontal links. Links with larger distances are labeled down links. The node sends an update message on the new up link(s) to reach a node which has access to the destination via another path. Once such a node receives the update message on one horizontal or up link, it puts this link in the right position (horizontal or down) and sends a jc message back on that link. At the same time that it sends an update message, the node sends a qcu message on horizontal and on down links to inform them of the increase in distance to the destination on that path. When the jc messages from all new up links have reached the node which "suffers" from the failure, the node can send a new jc message on all down and on all horizontal links, and the appropriate app message on all up links, if the node has become a juncture.
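The relabeling rule of the last case above (shortest distance becomes up, shortest plus link cost becomes horizontal, anything larger becomes down) can be sketched as follows. The function name reorganize_links and its two dictionaries are assumptions made for this illustration; distances are taken to be the values a node keeps for each of its remaining links.

```python
from typing import Dict


def reorganize_links(distance: Dict[int, int], cost: Dict[int, int]) -> Dict[int, str]:
    """Label each remaining link 'up', 'horizontal' or 'down'.

    distance[link] is the distance to the destination kept for that link,
    cost[link] is the cost of the link itself (1 with a hop count metric).
    """
    shortest = min(distance.values())
    labels: Dict[int, str] = {}
    for link, d in distance.items():
        if d == shortest:
            labels[link] = "up"
        elif d == shortest + cost[link]:
            labels[link] = "horizontal"
        else:
            # Larger distances; with integer hop counts nothing falls in between.
            labels[link] = "down"
    return labels
```

With a hop count metric and the figures of the node 5 example in section 3.3.3 (six on link 7, eight on link 8), reorganize_links({7: 6, 8: 8}, {7: 1, 8: 1}) labels link 7 up and link 8 down; a distance of seven on link 8 would make it horizontal.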

5A failure cannot cause a partial juncture to become a full juncture, so if there was a change it is full to partial. Hence the app message must be a papp message.



Figure 3.4: Failure on link F-G

3.3.1 Example of failure on one of many up links

Consider a failure on link F-G from the point of view of node G (see figure 3.4). This failure is on the up link which has a different origin from that of the other up links. Node G does not lose its juncture status but becomes a partial juncture. It does this because the jc messages that came on links D and E have the same Fj (B). It was the jc message coming from node F with Fj = C that made node G a full juncture. This link failure does not force the node to reorganize its links, but node G still has to inform nodes around it about the change in juncture status. First node G sends papp messages on links D and E, then it takes the jc messages from both up links, builds a new jc message and sends it to node I.

When node D and node E receive the papp message from G they look for a similar app message. When they discover the fapp message from the same origin, they replace it by the papp message.

When node I receives the jc message from G, it compares the new message's Fj with the Fj of the jc message on link H. The two Fjs are different, the one on G has Fj = B and the one on H has Fj = C, so node I remains a full juncture. However, the failure might have changed the group of nodes that receive its fapp message6, so it sends again a fapp message on all up links. This message is propagated to nodes B to H.

6In this case it does change that group: before, nodes D and E were getting node G's fapp message, now they get node I's message instead.




However, a failure on one of more than two up links can have fewer consequences. Consider a failure on link E-G. Since links D and F carry jc messages with different Fjs, node G would still be a full juncture and the jc message it would build would not be different. Therefore, there should be no need to send a new jc message down, and no need to send an app message up. To sum up, no message exchange would be required from the down part of the failure.

3.3.2 Example of failure on one of two up links

Consider a failure on link 7-9 from the point of view of node 9 (see figure 3.5). This failure is on one of its two up links. Node 9 loses its juncture status, so it has to inform the nodes that could be affected by this. To do this it sends an anti-fapp message on link 6 and a jc message on links 11 & 12.

When node 6 gets this message, it looks in its internal representation for the corresponding fapp message. When it finds it, it removes it and propagates the anti-fapp message on up links which do not lead to the destination, here links 3 and 4. When nodes 3 and 4 receive this message, they search in their internal representations to locate the corresponding fapp message, remove it, and propagate the anti-fapp message up to node 1. When node 1 receives the first anti-message, it removes the fapp message and does not propagate it up to node 0, since node 0 is the destination. When node 1 receives the second copy of the anti-message, it does not find the message


Figure 3.6: Failure on link Q - U

to remove, so the anti-message is not propagated any further.

Just after node 9 sends the anti-fapp message up, it takes the jc message it received from node 6, and sends it to nodes 11 and 12.

When node 11 receives the message, it sends it to node 13. When node 12 receives the jc message from node 9, it compares the message's Fj (1) with the Fj (2) of the jc message that came on link 10. They are different, so node 12 is still a full juncture. The new jc message that node 12 would send to node 13 is identical to the last message sent, so there is no need to send it down again. However, the failure might have changed the group of nodes that receive its fapp message, so it sends again a fapp message on all up links.

When node 13 receives the jc message from node 11, it compares the Fj (1) of this message with the Fj (12) of the jc message received on link 12. It discovers that it is still a full juncture, since they are different. Again, the failure might have changed the group of nodes that receive its fapp message, so node 13 sends again a fapp message on all up links.

The previous example illustrates what happens if there are no horizontal links. With a horizontal link the situation is different: the two node juncture might keep its juncture status. For example, if a failure occurs on link Q - U of figure 3.6, node Q is still part of a full juncture, because the Fj (X) on link T is different from the Fj (Y) of link R. The only thing node Q has to do is send a jc message to node R, because this message is different from the last message sent to node R.

However, it is not because a node is part of a two node juncture that it necessarily keeps its juncture status. If the failure occurs on link Q - T instead of link Q - U, node Q becomes a two node partial juncture because link U brings a jc message with the same Fj as link R. In this case, node Q has to send a papp message on link U to replace the old fapp message. Node Q still sends a jc message on link R and if it had down links it would need to send them the jc message.
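The status checks used in the examples of this section all compare the Fj values kept on the links that remain after the failure. The Python sketch below restates that comparison; the function name juncture_status and its arguments are assumptions made for this illustration, and the node is assumed to have kept at least one up link, so that no link reorganization is needed.

```python
from typing import Dict, Hashable


def juncture_status(
    up_fj: Dict[Hashable, Hashable],          # remaining up link -> Fj of its last jc message
    horizontal_fj: Dict[Hashable, Hashable],  # remaining horizontal link -> Fj of its last jc message
) -> str:
    """Return 'full', 'partial' or 'none' for a node after a link failure."""
    # A juncture needs two up links, or one up link and one horizontal link.
    if not up_fj or len(up_fj) + len(horizontal_fj) < 2:
        return "none"
    origins = set(up_fj.values()) | set(horizontal_fj.values())
    # Different origins give a full juncture, a single origin a partial one.
    return "full" if len(origins) > 1 else "partial"
```

For node Q, juncture_status({"T": "X"}, {"R": "Y"}) gives a full juncture after the failure on link Q - U, while juncture_status({"U": "Y"}, {"R": "Y"}) gives a partial juncture after the failure on link Q - T. Node 9 after the failure on link 7-9, with a single up link and no horizontal link, is no longer a juncture.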

3.3.3 Example of failure on single up link

Consider a failure on link 2-5 from the point of view of node 5 (see figure 3.7). This failure is on its only up link (2). Node 5 promotes link 7 to an up link because link 7 has the shortest path to the destination (node 7's distance is six, node 8's is eight). Link 8 remains a down link because its distance to the destination is greater than seven (if it were seven it would become horizontal).

To get a jc message from its new up link (7), node 5 sends it an update message. The role of this message is to change the direction of the links which it traverses7,

7This message is sent on the horizontal or down link(s) with the shortest path to destination. Consequently, each node which receives it has to have another link with a distance smaller than the one of the links which carried the message. Therefore, the link receiving this message cannot remain up. See appendix A for details on the content of this message and section 4.4 for details on how this message travels in difficult situations.


update their distances and reach a node8 which can send back a jc message. Node 7 receives this update message. It reacts to this message by changing the distance on link 5 from three to nine and reorganizing its links. As a result of this reorganization, link 5 becomes down and link 9 (with a distance of five) becomes up. Node 7 recognizes that it cannot send a jc message back without help: the new up link was down, so it does not have a jc message. To obtain that message, node 7 sends an update message on link 9.

When node 9 receives the update message from node 7, the information in the message causes node 9 to label link 7 down (the increase is too big to make it horizontal). Node 9 loses its juncture status because it is left with only one up link. Node 9 has to inform the rest of the network of this status change. It sends an anti-fapp message on the remaining up link (6). Node 9 also takes the jc message from link 6 and sends it on all horizontal and on all down links (here nodes 11 and 12). This message, sent to node 7, is the answer to the update message that node 7 is expecting.

When node 6 receives an anti-fapp message from node 9, it finds the preceding fapp message from node 9 and removes it. It then sends an anti-fapp message to nodes 3 and 4 since it knows that neither of them is the destination. These nodes also find the preceding fapp message from 9 and remove it. Then they send an anti-fapp message to node 1. Node 1 reacts to the anti-message by removing the preceding message and not sending the anti-fapp message to node 0 since it is the destination.

When node 11 receives the jc message from node 9, it propagates this message to node 13. Node 12 also receives a jc message from node 9, but its reaction depends on a second message that it gets from link 10. The description of this reaction is postponed to include this second message.

When node 7 receives the jc message from node 9, it sends it to node 5, which sends it to node 8. The way node 8 reacts to this jc message depends on the effects of another message node 5 sent before.

Right after node 5 sends an update message to node 7, it also sends a quick cost update (qcu) message to node 8. The role of this message is to inform nodes

8Such a node can only be a juncture (single node or two node juncture)

between a failure and a juncture9 below it that their distance to the destination has increased due to the failure. For node 8, the distance on link 5 increases from two to six. If node 5 were to wait until it received the jc message before telling the horizontal and down nodes about the distance increase, many normal messages10 would be routed on longer paths.

When node 8 receives the qcu message from node 5, it adjusts the distance of that link and removes the last jc message which came on that link from its internal representation. It then discovers that both link 5 and link 10 have a distance of seven, so both links become up links and the node becomes a juncture. Note that node 8 does not have a jc message for either of its up links, so it cannot tell if it is a full or partial juncture until those messages come, but it can still route normal messages to the destination as usual. Node 8 also sends the qcu message on all the links which were down or horizontal before the qcu message from node 5 came (here link 10).

Node 10 receives that qcu message. In response to that message, node 10 adjusts the distance to the destination on link 8, reorganizes its links and sends a qcu message on link 12. As a result of the reorganization, node 10 discovers that now link 8, with a distance of eight, is down and that link 12, with a distance of six, is up. Since link 8 is down, node 10 removes the last jc message which came on link 8 from its internal representation.

Node 12 receives one message on each of its up links. The first message, mentioned before, is the jc message from node 9. The qcu message from node 10 is the second message. These two messages put node 12 in a special situation. They each bring information about a different impact of the failure and, depending on which one gets to node 12 first, the sequence of events will differ. This is called a delay dependent situation because the order in which they reach the node depends on how much they are delayed. The final result is the same but the exact number of messages transmitted depends on which message reaches the node first. The following two

9This message stops when it reaches a jundure because the juncture provides another path with the same distance. On the other hand, any node which bas a link with a distance smaller then the one provided by the link carrying the qcu me"age still sends a qcu me"age to its down links, but this messAge is different from the one received. It refteds a path ta destination going through the link with the smaller distance. See appendix A for details on the content of tbis message.

lONon-control messages, ie user's messages


descriptions show what happens in each case .

• Node 12 receives the jc message from node 9 first. It compares the message's Fj

with the Fj of the last jc message on link 10. Since they are different (link 9's

message has Fj = 1, link 10 has Fj = 2), node 12 keeps its full juncture status.

In reaction to the message, node 12 builds a new jc message, but it does not

send that message to node 13, since this new message is identical to the last

one it sent. However, despite the fact that the juncture status has not changed,

node 12 still sends a fapp message on all up links, because the failure might

have changed the group of nodes that receives node 12's fapp message.

When node 12 receives the qcu message on link 10, it labels the link down, sends

an anti-fapp message on link 9, and sends a new jc message to nodes 10 and 13.

The anti-message will remove the fapp message at nodes 9, 6, 3, 4, and 1.

• Node 12 receives the qcu message from node 10 first. It adjusts the distance of

that link and discovers that that link should be labeled down. This implies that

the node is no longer a juncture. Node 12 needs to tell the rest of the network

its new status, so it takes the jc message from the other up node (9), and sends

it to node 13. It also sends an anti-fapp message on link 9. Node 9 is the only

node with a fapp message from node 12, so the anti-message stops there.

When node 12 receives a jc message from 9, it simply sends it to nodes 10 and

13.

In both cases, nodes 10 and 13 receive the jc message only when node 12 has

received the qcu message.

When node 10 receives the jc message from node 12 it sends it to node 8. When

node 8 has received the jc messages from both of its up links, it discovers that it is a

partial juncture.
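The way a node derives its juncture status from the jc messages on its up links can be sketched as follows. This is only an illustration of the rule used in the example (different Fj values on the up links give a full juncture, identical ones a partial juncture); the function name and the returned labels are hypothetical and are not taken from the simulator.

```python
def juncture_status(fj_by_up_link):
    """Classify a node from the Fj of the last jc message on each up link.

    fj_by_up_link maps an up link to that Fj, or to None while the jc
    message for the link has not arrived yet (assumed representation).
    """
    if len(fj_by_up_link) < 2:
        return "not a juncture"      # a juncture needs at least two up links
    if None in fj_by_up_link.values():
        return "undetermined"        # still waiting for a jc message
    if len(set(fj_by_up_link.values())) == 1:
        return "partial juncture"    # every up link reports the same Fj
    return "full juncture"           # at least two up links disagree


# Node 12 in the example: link 9 carries Fj = 1, link 10 carries Fj = 2.
print(juncture_status({9: 1, 10: 2}))
# Node 8 right after the qcu message: no jc message on either up link yet.
print(juncture_status({5: None, 10: None}))
```

While the status is undetermined the node can still route normal messages, since routing only needs the link distances.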

Node 13 is also in a delay dependent situation. Node 13 is a juncture which

gets information about the failure from its two up links. As in the case of node 12,

the node status and reactions are temporarily different depending on which of the up

links gets the message first. However, this case is slightly different from the case of

node 12. With node 12, one link brings a qcu message which causes the reorganization

of its links, and the loss of its juncture status. Node 13 only gets jc messages, but

there are three different messages which can reach node 13 at different times, and

depending on the order in which they arrive the node reacts differently. Table 3.1

illustrates the various sequences of events. One factor limits the number of possible

sequences of events. Node 12 sends a jc message to node 13 only after it has received

the qcu message from node 10, so only four of the six sequences can take place.

sequence  event  message        status of node 13   note
1         1      FJC(Fj = 1)    full juncture       α
1         2      FJC(Fj = 9)    full juncture       γ
1         3      FJC(Fj = 1)    partial juncture    β
2         1      FJC(Fj = …)    partial juncture    β
2         2      FJC(Fj = 1)    partial juncture    δ
2         3      FJC(Fj = 1)    partial juncture    ε
3         1      FJC(Fj = 1)    partial juncture    β
3         2      FJC(Fj = 1)    partial juncture    δ
3         3      FJC(Fj = …)    partial juncture    ε
4         1      FJC(Fj = 1)    full juncture       α
4         2      FJC(Fj = 1)    partial juncture    β
4         3      (none)

Table 3.1: Possible sequences of events at node 13

The table entries have two components. The first is the message that came on

one of the two up links of node 13 (link 11 or link 12). The second

is the status of the node after the message has been processed, together with the symbol

which refers to a note of explanation. The following list of notes groups the entries together

to explain their common features. Note that before node 13 receives the jc messages,

the Fj from link 11 is 9 and the Fj from link 12 is 12. Node 13's fapp messages have

reached nodes 10, 11, and 12.

α Node 13 remains a full juncture because the Fj (1) of the new message is different

from the Fj (12) of the message received on the other link. Node 13 has to

send a new fapp message on all up links to reach nodes that might not know

that it is a full juncture. The new message will reach nodes 1, 3, 4, and 6 in

addition to nodes 9, 11, and 12, because node 9 is not a full juncture anymore

and the fapp message does not stop there anymore.

β When node 13 becomes a partial juncture, it sends a papp message on all of its

up links. This message will replace the fapp message sent before. It will reach

nodes 9, 11 and 12.

γ When node 13 receives a second jc message which makes it preserve its full juncture

status, it sends a fapp message on all up links.

δ Node 13 receives a jc message which has an Fj different from the one on the other up

link. This normally means that the node should become a full juncture. However,

node 13 recognizes that this situation would lead to a juncture status

oscillation¹¹. Consequently, it remains a partial juncture and does not send

any message.

ε The second jc message makes node 13 become a partial juncture again.

In all cases, node 13 ends up considering itself as a partial juncture and sends a papp

message on all up links. There is no formal proof that this will work for all cases, but

for all tests presented in chapter 5 no problems were found. These tests are performed

on a graph representing the most important nodes of the ARPANET and they involve

many different failures.

Figure 3.8 shows the network configuration after the failure.

Table 3.2 summarizes the message exchange during the example.

¹¹ A message that would cause a juncture to change its status from partial to full juncture and then to partial juncture again. See page 39 for details.

Table 3.2: Failure of link 2-5

Figure 3.8: After failure on link 2-5

Chapter 4

Strategies

This chapter presents various approaches which have been used to limit the number

of messages transmitted. It also presents the reasons why these approaches work. At

the end of this chapter a table shows the improvement that each strategy provides.

4.1 Fighting an illusion

There are situations which cause many messages to be exchanged in vain. One of

these situations is called a juncture status oscillation. This name describes a

partial juncture node which oscillates once (partial juncture -> full juncture -> partial

juncture). This situation arises when a node above the split point matching a partial

juncture sends a new jc message. In addition, the message's Fj must be different from

the one contained in the last jc message that was sent. When the partial juncture

matching the split point receives the new jc message on one of its up links, it considers

itself to be a full juncture since the other up links still have jc messages with different

Fj. Consequently, it sends fapp messages on its up links and a new jc message on

down links. When the second jc message arrives at the juncture, the juncture learns

that its status was not full but partial, so it sends papp messages and another jc

message.

For example, in figure 4.1, node C sends a jc message with a new Fj because of

the failure. The message reaches node G on one of its up links first. This link then

has the new Fj and the other link has an Fj equal to C, so node G considers itself to

be a full juncture. It sends a new jc message down on link H and fapp messages to

nodes E and F. When node G receives the second jc message, it discovers that it is

a partial juncture, so it sends another jc message to node H and papp messages to

nodes E and F.

Figure 4.1: Example for juncture status oscillation

The first jc message that node G sent could have been skipped, since only the

second one reflects a persistent change. The nodes which receive the app messages

have the same information about the partial juncture as before they received the two

series of messages. So the juncture does not need to send the fapp messages and

papp messages as they only reflect a temporary illusion.

One way to identify this illusion is to discern the jc messages which caused the

problem from other jc messages. When a partial juncture receives a jc message with

an Fj different from the one of the previous message, it looks for the split point in the

new message's Pj list. If this split point is in the list, then the other up links will

receive a jc message with the split point in the message's Pj list. Therefore, the node

will remain a partial juncture. Once this situation is identified, the juncture neither

changes its juncture status nor sends fapp messages. It waits until all up links have

received their new jc messages to send down a new jc message.
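The test described above can be sketched in a few lines. The sketch assumes that a jc message is reduced to its Fj and to the node ids in its Pj list; the class and function names are hypothetical and only illustrate the check, not the simulator's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JcMessage:
    fj: str                # node id of the last full juncture encountered
    pj_list: tuple = ()    # node ids of the split points in the Pj list


def is_status_oscillation(split_point, previous, new):
    """True when a partial juncture should ignore the apparent change.

    previous is the last jc message received on the link, new is the one
    that has just arrived, split_point is the matching split point.
    """
    return new.fj != previous.fj and split_point in new.pj_list


old = JcMessage(fj="C", pj_list=("A",))
new = JcMessage(fj="X", pj_list=("A", "B"))
print(is_status_oscillation("A", old, new))
```

When the function returns True the juncture keeps its partial status, sends no fapp message, and waits for the jc messages on its other up links before sending a new jc message down.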

4.2 Shorter jc messages

In [Tsu87], the jc message has the following structure:

• destination - node id of the destination (if more than one destination)

• Fj element - the node id of the last full juncture encountered (or the first node

after destination)

• Pj list - a list of pairs of the form:

(node id of last split point, distance to destination at split point)

The distance field among the elements in the Pj list is used to inform partial

junctures how far their papp messages should travel. When a node discovers that its

juncture status is partial, it looks at the Pj lists of the jc messages it received on its

up links. The first Pj element which is common to all of these lists was added to the

list by the matching split point. This element contains the split point's node id and

its distance to the destination. The partial juncture subtracts this distance from its

own distance to the destination in order to obtain the distance to the split point.
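As an illustration, the computation made by a partial juncture in the [Tsu87] format can be sketched as below. The sketch assumes that each Pj list is a Python list of (node id, distance) pairs and that "first" refers to the order of the first list examined; the function name and the sample values are hypothetical.

```python
def distance_to_split_point(own_distance, pj_lists):
    """Return (split point id, distance from this node to the split point).

    pj_lists holds one Pj list per up link; each element is a pair
    (split point id, distance to destination at the split point).
    Returns None when no element is common to all lists.
    """
    first, *others = pj_lists
    for element in first:
        if all(element in other for other in others):
            split_id, split_distance = element
            return split_id, own_distance - split_distance
    return None


# Hypothetical partial juncture at distance 5 with two up links.
lists = [[("C", 2), ("A", 1)], [("D", 2), ("A", 1)]]
print(distance_to_split_point(5, lists))
```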

In our approach, the distance field was eliminated from the Pj element. This

modification decreases the space used by the Pj list by 50 % (if the distance uses the

same space as the node id). In a jc message with one Pj element this represents a

reduction of 25 % of the size of its equivalent in the Tsuchiya approach. As the number

of Pj elements increases, the reduction approaches 50 % ($\lim_{n \to \infty} \frac{n}{2n+2} = 50\,\%$). As a

result of the use of the shorter jc message, the total bandwidth was reduced by 10 to

25 % during tests.¹
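The 25 % and 50 % figures can be checked with a small calculation. The sketch assumes that the destination, the Fj element, each split point id and each distance all occupy one unit of space (the equal-size assumption stated in the text); the function names are hypothetical.

```python
def jc_size(pj_elements, with_distance):
    """Size of a jc message in units of one node id (assumed equal sizes)."""
    per_element = 2 if with_distance else 1   # (id, distance) versus id only
    return 2 + per_element * pj_elements      # destination + Fj + Pj list


def reduction(pj_elements):
    """Fraction saved by dropping the distance field from every Pj element."""
    full = jc_size(pj_elements, with_distance=True)
    short = jc_size(pj_elements, with_distance=False)
    return (full - short) / full


for n in (1, 3, 10, 1000):
    print(n, round(100 * reduction(n), 1))
```

With these sizes the saving is n / (2n + 2) for n Pj elements, which is 25 % for one element and tends to 50 % as n grows.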

There are two ways to avoid using the distance. If there is a common Pj element,

the juncture puts the node id of this Pj element in the stop field of the papp

message. This field allows the message to stop one link away from the split point (the

common Pj element's node id is the split point's node id). If there is no common Pj

element, the number of Pj elements is used as a measure of how far the papp message

should go. To see how this works, consider the situation in which a partial juncture

would find no common Pj element.

¹ The jc message becomes the largest message when there are more than three Pj elements in the list. In addition, these messages are sent frequently. Every link carries one during the setup, and they represent around 15 % of the messages sent in reaction to a failure.

In figure 4.2, node F receives a jc message with the Pj list [Pj(A), Pj(C)] on link

E and one with [Pj(D)] on link D. Node D removed Pj(A) because node D was a

partial juncture matching with the split point A. This is the same node A that is the

split point matching node F. In general, a partial juncture does not find a common

Pj when there is another partial juncture above which matches the same split point.

This partial juncture might not find a common Pj, but this would imply that there

is yet another partial juncture matching the same split point. There must be a partial

juncture above the other partial junctures which does find a common Pj element.

Figure 4.2: Short juncture message - first example

When a partial juncture finds no common Pj, it does not use the stop field in the

papp message. Instead, it uses another field called the Pj count. For each up link,

the juncture counts the number of Pj elements in the jc message from that link, puts

this number in the Pj count of a papp message, and sends the message up. Each split

point which receives a papp message decreases this count. The message stops when

the count gets to zero. For example, node F sends a papp message to node E with a

Pj count of two, and another one to node D with a Pj count of one. These messages

stop when they reach node A.
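The choice between the stop field and the Pj count can be sketched as follows. The dictionaries used to represent papp messages, and both function names, are assumptions made for the illustration; the actual message layout is the one given in appendix A.

```python
def build_papp_messages(pj_lists_by_link):
    """Build one papp message per up link of a partial juncture.

    pj_lists_by_link maps an up link to the split point ids found in the
    Pj list of the last jc message received on that link.
    """
    lists = list(pj_lists_by_link.values())
    common = next(
        (p for p in lists[0] if all(p in other for other in lists[1:])), None
    )
    if common is not None:
        # A common Pj element exists: name the split point in the stop field.
        return {link: {"stop": common} for link in pj_lists_by_link}
    # No common element: each link gets the length of its own Pj list.
    return {link: {"pj_count": len(pj)} for link, pj in pj_lists_by_link.items()}


def forward_papp(message, node_is_split_point):
    """Return the message to send further up, or None if it stops here."""
    if "pj_count" in message and node_is_split_point:
        remaining = message["pj_count"] - 1
        return {"pj_count": remaining} if remaining > 0 else None
    return message


# Node F of figure 4.2: [Pj(A), Pj(C)] on link E and [Pj(D)] on link D.
print(build_papp_messages({"E": ["A", "C"], "D": ["D"]}))
```

For node F this gives a Pj count of two on link E and of one on link D, as in the example of the text.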

There is one minor side effect to this use of the Pj count. The papp message

can, in certain cases, stop before it reaches the split point. This side effect does not

have major consequences, because the nodes still receive another papp message with

a shorter distance to the destination.² For example, in the network of figure 4.3, node

13 sends a papp message to nodes 11 and 12. The message for node 11 has a Pj count

of two and reaches node 1. The message for node 12 has a Pj count of one and stops

when it reaches node 5. However, node 12's papp message reaches node 3, providing

node 3 with a distance on node 3's down path.

Figure 4.3: Short juncture message - second example

² The reason for sending papp messages above a partial juncture is to guarantee that a distance is provided on the down links of nodes which have a partial juncture below.

One drawback to this approach is that, when there is no common Pj element at the

partial juncture, the papp message reaches the split point. However, this drawback is

compensated for by the reduction in the jc message's size.

4.3 Stopping messages

Another way the number of messages transmitted can be reduced is by stopping them

as soon as possible. This section presents two types of conditions to tell whether or not

a message should be propagated further. The first type uses knowledge of a node's

status to make this decision about app messages. The second type uses information

about previous messages to make this decision about jc messages and app messages.

A papp message can stop one link away from a full juncture since this message

does not provide an alternate path for this juncture. The path that the papp message

refers to leads back to the juncture. If this were not the case, the partial juncture which

sent the message would be a full juncture.³ A node can tell if the next node is a full

juncture because if it is, the Fj part of the jc message from that node will contain

the juncture node id.

³ One of a juncture's up links would have a jc message with Fj equal to the full juncture's node id, and at least one other up link would have one with a different Fj.


On the other hand, the papp message has to go beyond a partial juncture because

there are cases where, if the message were to stop at the juncture, some nodes would

not receive any papp messages.⁴ For example, in figure 4.3, if the papp message were

to stop at node 10 because it is a partial juncture, node 3 would not receive any papp

messages.

The reason to stop fapp messages at full junctures is not as strong as the one to

stop the papp message. The fapp message can stop because it only provides a longer

path to the destination than the full juncture's alternate path. Therefore, it is not

necessary to send a fapp message that far. However, a fapp message reaches a full

juncture to bring a distance on the juncture's down link which receives the message.

Of course, no app messages should be sent to the destination since it is the function

of these messages to tell nodes alternate ways to reach the destination. The next

paragraphs show how information about the previous message can be used to decide

whether or not a message should be further propagated.
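The status-based conditions for app messages can be summarized in a short sketch of the sender's decision. It covers only the three rules stated above (nothing is sent to the destination, a papp message is not sent to a full juncture, a fapp message still reaches it); the function and parameter names are hypothetical.

```python
def should_send_app(kind, next_node, fj_from_next, destination):
    """Decide whether an app message is sent on the link to next_node.

    kind is "papp" or "fapp"; fj_from_next is the Fj of the last jc
    message received from next_node (assumed to be stored per link).
    """
    if next_node == destination:
        return False    # app messages are never sent to the destination
    next_is_full_juncture = fj_from_next == next_node
    if kind == "papp" and next_is_full_juncture:
        return False    # a papp message stops one link before a full juncture
    return True         # a fapp message still reaches the full juncture


print(should_send_app("papp", 12, 12, 0))
print(should_send_app("fapp", 12, 12, 0))
```

The full juncture that receives the fapp message records the distance on its down link and does not pass the message on; that receiving side is not modeled here.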

A node should not send a jc message which is identical to the last one it sent.

This new message would not bring any additional information to the nodes below

and the app messages which could come back would not reach a greater number of

nodes than the messages which were sent before. There are two cases where the two

jc messages could be identical. The first case is when there is a failure on an up link,

but despite the failure, the node keeps its juncture status. The second is when a node

receives a jc message with an Fj different from the one of the last message received,

but keeps its juncture status in spite of this.

A split point can receive identical app messages on different down links. When a

node has sent the app message on the up links, there is no point in sending it again.

Therefore, when a split point receives an app message, it checks whether or not it has

received an identical message. If it has, the app message should not be propagated.

One of the reasons that the node keeps the last app message it received is so that it

can compare it with the next one.
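Both history-based conditions amount to remembering the previous message and comparing. The sketch below keeps a single remembered message of each kind, which is a simplification (a real node would keep them per destination and per originating juncture); the class name and the tuple encoding of messages are assumptions.

```python
class MessageFilter:
    """Suppress jc and app messages that repeat the previous one."""

    def __init__(self):
        self.last_jc_sent = None
        self.last_app_received = None

    def should_send_jc(self, message):
        """False when the jc message equals the last one that was sent."""
        if message == self.last_jc_sent:
            return False
        self.last_jc_sent = message
        return True

    def should_propagate_app(self, message):
        """False when an identical app message has already been handled."""
        if message == self.last_app_received:
            return False
        self.last_app_received = message
        return True


node = MessageFilter()
print(node.should_send_jc(("Fj=1", ("A",))))   # first message is sent
print(node.should_send_jc(("Fj=1", ("A",))))   # identical message is dropped
```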

⁴ Without an app message a node assumes that the distance on a down link is equal to infinity.


Figure 4.4: Guiding update message

4.4 Guiding update messages

This approach does a little more than save messages; it prevents the network from

choosing the wrong path to the destination in certain situations. This wrong choice

seems to be temporary, but the occurrence of another failure during that period

could lead to problems. For this reason, the improvement in performance due to this

approach is not presented in a table at the end of this chapter.

The choice of the path that an update message follows should be made by the

node which "suffers" the failure, because a node below the failure point could choose

a path which is affected by the failure. It is not enough to send the update message

on the link with the shortest distance as the following examples show.

Consider figure 4.4, where node 12 is a full juncture and node 9 is a partial

juncture. During the setup, node 1 receives a fapp message with origin node 12, but

does not receive a papp message from 9 because node 1 is the split point for that

partial juncture. In the case where there is a failure on link 1 - 0, node 1 sends an

update message to node 7.

If node 7 were to choose on which link the update message should be sent, it would

select link 9 because its distance is smaller, but this path is also affected by the failure

and when the news of the failure comes the distance will become larger. However, if

the failure was on link 4 - 1, and node 7 were still asked to choose on which link to

send the update message, then link 9 would be the right link to choose, since it offers

the shortest path and this path won't be affected by this failure.

The node which "suffers" the failure already knows the distance of the path which

should be used. In the case of failure 1 - 0, node 1 did not receive the papp message

of node 9, so it does not have the distance associated with the path going through

node 9 and it only knows of the distance through node 12. In the case of failure 4 -

1, node 4 did receive the papp message of node 9 and this path is the shorter way to

reach the destination (six through node 9 vs eight through node 12).

One way to get the update message to use the shortest path is to "guide" it by

putting the distance of the selected path in the message. In this way, a node below

the failure can select the right path.
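The difference between an unguided and a guided choice can be sketched as below. The text does not spell out how the receiving node compares the guide with its own link distances, so the sketch makes a strong assumption: the guide is expressed as the distance expected on the outgoing link of the receiving node, and the link with exactly that distance is selected. The function names and the distances are hypothetical.

```python
def unguided_choice(link_distances):
    """Pick the link with the smallest distance (the choice the text warns about)."""
    return min(link_distances, key=link_distances.get)


def guided_choice(link_distances, guide):
    """Pick the link whose distance matches the guide carried by the message.

    Assumption: the guide is the distance expected on the outgoing link
    of this node. Returns None when no link matches.
    """
    for link, distance in link_distances.items():
        if distance == guide:
            return link
    return None


# Hypothetical distances at a node below the failure, on links 9 and 12.
distances = {9: 4, 12: 6}
print(unguided_choice(distances))      # the shorter link, possibly affected
print(guided_choice(distances, 6))     # the link selected by the failing node
```

With the guide, the node that suffers the failure makes the decision once, and the nodes below it only follow the path that was selected.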

4.5 Improvements due to the strategies

This section presents three tables which summarize the improvements due to each

of the strategies introduced in this chapter. These tables contain results from two

graphs: the basic network of figure 4.5 and a network representing essential nodes of

the ARPANET of 1983⁵. The tests on the basic network involve the failure of link

2-5, and the reaction of each node seeing all the other nodes as a destination. The

results are averaged by the number of nodes. The ARPANET example used here

shows the number of messages and the number of bytes sent in reaction to a failure

on the ARPA-USC link, using MITRE as destination.

            14 nodes basic network    98 nodes ARPANET network
            setup      failure        setup      failure
continue    39         23             356        212
stopping    34         21             280        206

Table 4.1: Number of messages sent

Table 4.1 compares the number of messages sent when the messages stop as early

as possible to the number of messages sent when messages are propagated further.

Table 4.2 contrasts the total number of bytes sent using the short jc messages and

Tsuchiya's jc messages.

⁵ This network is considered in detail in the next chapter.


Table 4.3 shows the result of the reaction to a failure on link ARPA-USC in the

ARPANET network with node MITRE as a destination. The improvement due to

avoiding the juncture status oscillation in the basic network is not shown, because none of the examples selected

contained a juncture oscillation.


Figure 4.5: A basic network graph

                            14 nodes basic network    98 nodes ARPANET network
                            setup      failure        setup      failure
using Tsuchiya's jc msgs    292        113            3513       2139
using short jc msgs         258        103            2671       1756
improvement                 11 %       9 %            24 %       18 %

Table 4.2: Total number of bytes sent in all messages

                           failure
ignoring jc oscillation    251
avoiding jc oscillation    206
improvement                18 %

Table 4.3: Number of messages sent
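The improvement rows of tables 4.2 and 4.3 can be recomputed from the byte and message counts. The sketch below only restates that arithmetic; the function name is hypothetical. The first basic-network entry comes out at about 11.6 %, which the table gives as 11 %.

```python
def improvement(before, after):
    """Relative saving, in percent, when a count drops from before to after."""
    return 100 * (before - after) / before


# (before, after) pairs taken from tables 4.2 and 4.3.
table_4_2 = {
    "basic setup": (292, 258),
    "basic failure": (113, 103),
    "ARPANET setup": (3513, 2671),
    "ARPANET failure": (2139, 1756),
}
for name, (before, after) in table_4_2.items():
    print(name, round(improvement(before, after), 1))

print("jc oscillation", round(improvement(251, 206), 1))
```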


Chapter 5

Performance

In this chapter, we examine the performance of the proposed failure tolerant ADR in

the face of failures. We organize our findings as follows. The first section of the

chapter introduces the network used for the performance analysis and the simulator

that performs the test. The second section considers the effects of a single failure on

98 different destinations. The third section investigates how the presence of many

destinations affects the way in which one node reacts to a failure. The fourth section

studies the impact of simultaneous failures. The last section presents an example of a

small network with a failure on a single link. It compares two reactions to this failure:

Garcia-Luna-Aceves's approach [GLA88b] and the ADR approach.

The system performance presented in this chapter was evaluated using the network

of figure 5.1. This figure represents 98 major nodes of the ARPANET in 1983. This

network was chosen since it is of a reasonable size and the connections do not suffer

from an artificial symmetry (as opposed to a mesh or other geometrical networks).

In addition, this example constitutes a realistic situation for the application of such a

system. Out of the 98 nodes, the following were selected as destinations for the tests

which were performed on a small number of destinations:

BERK    PURDUE     RCC5      MIT44

UCLA    COLLINS    ANDREW    MITRE

These destinations were selected because they represent the different sections of

the network. They are far apart in order to represent as wide a range of nodes as

possible.

Figure 5.1: ARPANET/MILNET geographic map, December 1983

We chose links to fail which tie together large sections of the network. This way,

the tests reveal if the system reacts to failure in critical situations. The tests described

in this chapter involve failures on the following major links of the network:

UTAH-LBL RCC5-UWISC HARVARD-SCOTT

STANDFORD-ISI22 ARPA-USC MITRE-GUNTER

The failures on links UTAH-LBL, RCC5-UWISC, HARVARD-SCOTT, ARPA-USC

and MITRE-GUNTER were chosen because they lie on the backbone between

the east and the west. The STANDFORD-ISI22 link was chosen because it connects

the north-west area with the south-west area.

The simulator is discrete-event stepped. It uses absolute time units which are

not based on speed of lines and other physical measures. These units only reflect

the relative order of magnitude of the time taken by each operation a node performs.

The simulator assumes that it takes 2 time units for a node to interpret an incoming

control message and adjust its representation of the network. Then it takes 1 time unit

for each message the node builds and sends to its neighbors (sending time includes

the time to cross the link). If the receiving node is already busy processing or sending

another message, the incoming message waits until it is free. The fact that it takes

a fixed time to react to a message makes the simulator deterministic.
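The timing model just described can be sketched as a small discrete-event loop. This is not the simulator used for the tests; it is a minimal illustration that assumes a caller-supplied function `react(node, message)` returning the (neighbor, message) pairs the node sends, and all names in it are hypothetical.

```python
import heapq

PROCESS_TIME = 2   # time units to interpret a message and update the tables
SEND_TIME = 1      # time units per message built and sent (link included)


def simulate(initial_events, react):
    """Run the model and return (messages sent, time when the network is quiet).

    initial_events is a list of (arrival time, node, message) triples.
    """
    queue = [(t, i, node, msg) for i, (t, node, msg) in enumerate(initial_events)]
    heapq.heapify(queue)
    next_id = len(queue)          # tie-breaker so messages are never compared
    free_at = {}                  # time at which each node becomes free again
    sent, finish = 0, 0
    while queue:
        arrival, _, node, message = heapq.heappop(queue)
        clock = max(arrival, free_at.get(node, 0)) + PROCESS_TIME
        for neighbor, out in react(node, message):
            clock += SEND_TIME    # the message arrives once it has been sent
            heapq.heappush(queue, (clock, next_id, neighbor, out))
            next_id += 1
            sent += 1
        free_at[node] = clock
        finish = max(finish, clock)
    return sent, finish


# A three-node chain a -> b -> c in which each node forwards the message once.
chain = {"a": ["b"], "b": ["c"], "c": []}
print(simulate([(0, "a", "qcu")], lambda node, msg: [(n, msg) for n in chain[node]]))
```

Because every reaction takes a fixed time and ties are broken by insertion order, two runs with the same input give the same result, which is the determinism mentioned above.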

The simulator sends only control messages. A more complete analysis should

also include normal messages. We only attempt to estimate the number of control

messages required together with the amount of time required to process them. While this is

not an "ideal" measure of the algorithm performance, we should note that the absence

of normal traffic is not catastrophic since in an operational system, control messages

would be handled as soon as they arrive on the queue.

Another limitation of this simulator is that it does not handle ho node partial

juncture. This feature was not included because its implementation would have

required considerable time. The system in its present form can provide us with

significant results.

For the simulation, the address of a node takes two bytes, and the distance is

stored in at least 10 bits. The size of the messages sent, and the storage space

allocated inside the node reflect these dimensions. The exact contents of each type of

message are described in appendix A. Appendix C shows the structure of the node's

buffer. This buffer contains all the information a node maintains about the network.

Appendix B contains pseudo-code which gives an outline of the way a node reacts

to the different types of messages. This pseudo-code does not present, in detail, the

special cases which are covered in the actual code of the simulator.

5.1 A single failure

The impact of a failure depends upon two factors. These are (1) the number of nodes

which change their role because of the failure (split points which stop being split points

and junctures which change their status), and (2) the number of nodes which depend

upon these junctures and split points. For the same link failure, these parameters

vary according to the destination. For example, a failure on the link between MITRE

and GUNTER causes 4 messages to be exchanged for the destination PURDUE and

144 for COLLINS.

Figures 5.2, 5.3 and 5.4 contain three graphs which summarize the impact of a

failure on the ARPA-USC link for each of 98 destinations. Each graph presents the

number of destinations where a specific parameter falls into a given range of values.

The three parameters represented are: the number of messages sent, the total number

of bytes in the messages, and the time elapsed between the failure and the network

stabilizing. The following describes how the graphs should be read. A bar up to y

destinations, at position z2 (on the x-axis) means that there are y destinations which

use more than z1 (the preceding position) and at most z2 "resources" in reaction to the failure. For example,

in figure 5.2 the bar up to the line indicating 25 destinations at position 50 on the

x-axis means that there are 25 destinations which use more than 25 and at most 50

messages in reaction to the failure.
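The reading rule for the graphs corresponds to half-open intervals that are closed on the right. As an illustration, the sketch below assigns each value to the position of its bar; the function names and the sample values are hypothetical, and the width of 25 is the one implied by the example for figure 5.2.

```python
from collections import Counter


def bar_position(value, width):
    """Position z2 of the bar counting this value, so that z2 - width < value <= z2."""
    return -(-value // width) * width   # ceiling division on integers


def histogram(values, width):
    """Number of values per bar position."""
    return Counter(bar_position(v, width) for v in values)


# Hypothetical message counts for four destinations, bars 25 messages wide.
print(histogram([26, 50, 51, 25], 25))
```

A destination that needs exactly 50 messages is counted in the bar at position 50, while one that needs 51 is counted in the bar at position 75.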

The graph in figure 5.2 shows that the number of messages required to react

to the failure is not too large. For the majority of destinations (62 %), at most 50

messages are sent. The average number of messages sent is 75. It is instructive to

study the destinations which involve more than 400 messages. These destinations

have a point in common: they are very close to the failure (one link away from the

failure). This fact could be exploited. Since most of these nodes end up sending

more messages than during the setup phase (see table 5.1), it would be interesting

to identify this situation and initiate a setup procedure. Using the setup procedure

for all destinations within two links of the failure would decrease the average number

of messages sent to 71. If the setup procedure were used whenever it takes fewer messages

than the normal failure procedure, the average would be brought down

to 69 messages.

Figure 5.2: Messages sent in response to the ARPA-USC failure (x-axis: number of messages used to react to failure; y-axis: number of destinations)

               messages sent for
destination    setup    failure
USC            312      501
CIT            289      440
RAND           300      584
ARPA           283      591
ARPAI06        267      244

Table 5.1: Number of messages sent for destinations close to the failure

Figure 5.3 shows the distribution of the total number of bytes sent in response to

the failure. The majority of destinations (65 %) require at most 360 bytes to react to

the failure. However, the average number of bytes sent is 615 bytes. This is again due

to the small number of destinations close to the failure which send an excessively high

number of messages. Still, this average represents only 1.07 % of the bandwidth on a

network with 56 Kb line. Using the setup procedure only when it takes fewer messages

than the normal failure procedure would bring the average down to 566 bytes.

Figure 5.3: Bandwidth used to respond to the ARPA-USC failure (x-axis: total number of bytes sent in messages; y-axis: number of destinations)

Figure 5.4 shows the distribution of elapsed time between the failure and the time

at which the network reached stability. Fifty-five destinations take at O105t 27 tirne

units to reach tbis operating point.

5.2 Mutual influence of destinations

The presence of multiple destinations can cause two type of interference. Messages

can interfere either with the speed at which the network adapts to a failurc or with

the actual way in which the system reacts. This section is concerned with both types

of interference.

In the previous section we pointed out that the impact a failure depcnds heavily

c

(


25

20

15 number

of destinatiYBs

5

o

f-

o

1

9

1 1 1 1 1 1 1 1

-

-

-

-

I J

18 27 36 45 54 63 72 81 90 time to react to (allure (iD time units)

Figure 5.4: Time required to respond to the ARP A-USC failure

55

on the destination. The following example combines six destinations¹ in order to
decrease the weight of an individual destination in the result. The destinations are
combined as follows: a result for N destinations is the average of the results from all
combinations of N destinations taken out of the six destinations. Using this method,
an example with six destinations requires 63 tests, while one with eight requires 297
tests. Consequently, the number of destinations considered is small.
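The averaging scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_test` is a hypothetical stand-in for one simulation run (here it returns a dummy value), and the destination names are those of footnote 1. Counting every non-empty combination of the six destinations gives the 63 tests mentioned in the text.

```python
from itertools import combinations
from statistics import mean

DESTINATIONS = ["Berk", "Purdue", "RCC5", "UCLA", "Collins", "MITRE"]


def run_test(subset):
    """Hypothetical stand-in for one simulation run; returns a dummy metric."""
    return float(len(subset))


def averaged_results(destinations, test=run_test):
    """Map N to the average of test() over all combinations of N destinations."""
    return {
        n: mean(test(combo) for combo in combinations(destinations, n))
        for n in range(1, len(destinations) + 1)
    }


def number_of_tests(destinations):
    """Count one test per non-empty combination of destinations."""
    return sum(
        1
        for n in range(1, len(destinations) + 1)
        for _ in combinations(destinations, n)
    )


print(number_of_tests(DESTINATIONS))
```

The number of tests doubles (roughly) with each added destination, which is why only a small set of destinations is practical with this method.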

The graph in figure 5.5 shows that the time required to adapt to a failure increases
with the number of destinations to which a node can route messages. This time
increase is caused by the fact that when there is more than one destination, a message
with information pertinent to one destination might need the attention of a node
at the same time as a second message which is pertinent to another destination.
Consequently, one of the messages has to wait. An occasion for waiting presents itself
as soon as the failure occurs.

When a node sees one of its links fail, it sends messages to its neighbors. For each
destination that the node can route to, the node sends a different set of messages.

1 Berk, Purdue, RCC5, UCLA, Collins and MITRE

[Figure 5.5: Time to react to the ARPA-USC failure vs number of destinations. Time (30 to 65 time units) plotted against the number of destinations (1 to 5).]

Each set of messages carries information about the change in the representation of the
network associated with one destination. When there is more than one destination,
the node sends all of the messages associated with one destination before sending
the messages associated with another destination. This is the first factor which
contributes to the increase in time before the network stabilizes. The second factor is
that with additional destinations, many messages travel in the same direction, so
there is more chance that a message must wait for a busy node.

These two factors combine to produce the increase in time as the number of
destinations grows, as depicted in figure 5.5. In this figure, the presence of a second
destination increases the time before the network reaches stability by 10 time units.
With the addition of more destinations, the time increase is only 5 or 6 time units
per additional destination. This difference can be explained by the fact that with two
destinations, most of the waiting for busy nodes occurs sequentially; with more than
two destinations a portion of the waiting time can occur in parallel. The fact that
after the second destination, there appears to be a regular slope in the time increase
is encouraging. Such a slope would keep the progression within reasonable bounds.

[Figure 5.6: Number of messages sent in response to the ARPA-USC failure. Messages (100 to 550) plotted against the number of destinations (1 to 5).]

Figure 5.6 shows the influence that the number of destinations has on the number
of messages needed to adapt to a failure on the ARPA-USC link. This graph
indicates that the number of messages sent increases almost linearly with respect to
the number of destinations. The increase in the number of messages is not perfectly
linear because when a message is delayed, a juncture can sometimes react as if the
message was not going to come. This situation is corrected as soon as the message is
received, but this requires some additional messages.

The presence of multiple destinations appears to increase the number of messages
linearly. The time to react to a failure appears to increase by a roughly constant amount
for each additional destination; since this constant is rather small, the time increase
is rather slow. Overall, it appears that the mutual influence of multiple destinations,
when a failure occurs, is not an obstacle to the applicability of the proposed approach.

[Figure 5.7: Number of messages sent in response to the number of failures. Messages (50 to 500) plotted against the number of failures (1 to 5).]

5.3 Multiple failures


This section studies the impact of multiple, simultaneous failures on the performance
of the proposed system. We consider up to five simultaneous failures and observe the
increase in the number of messages and the time to adapt to failures.

As in the previous sections, to decrease the importance of any single result, each
point on the graphs of figures 5.7 and 5.8 represents the average of many results. First,
five major links were selected for the failures². Then the failures were combined, as
in the last section. The result for N failures is the average of the results from all
combinations of N links taken from the five possible link failures. In addition, each
set of failures is observed from five different destinations³.

Figure 5.7 shows the influence that the number of failures has on the number of
messages needed to adapt to these failures. The increase in the number of messages is
almost linear with respect to the number of failures. The increase in the number of
messages is not perfectly linear because when a message is delayed, a juncture can

2 USC-ARPA, RCC5-UWISC, HARVARD-SCOTT, ISI22-STANFORD, UTAH-LBL
3 PURDUE, COLLINS, RCC5, MIT44, BERK

[Figure 5.8: Time to respond to multiple failures. Time (35 to 80 time units) plotted against the number of failures (1 to 5).]

sometimes react as if the message was not going to come. This situation is corrected
as soon as the message is received, but this requires some additional messages.

This factor also has an impact on the time required to respond to the failure, as
shown in figure 5.8. Again, the increase in time with respect to the number of failures
exhibits similar behavior to the increase with respect to the number of destinations.
After the second failure, there appears to be a regular slope in the time increase
required by the network to stabilize.

Overall, it appears that the proposed system performs reasonably well under multiple
simultaneous failures. If the trend shown in these results continues as the number
of failures increases, these failures should not be an obstacle to the applicability of the
system.

5.4 A comparison

This section presents an example of a failure on a network with a single destination
(A), and shows how Garcia-Luna-Aceves's algorithm and the proposed approach react


[Figure 5.9: Garcia's approach vs. ADR approach. The left half of the figure shows Garcia's approach and the rest shows the ADR approach on the same network.]

Symbols used for Garcia's approach:
• A double circle indicates that the node is frozen.
• D -> B indicates that B is the successor of D.
• (distance, synchronization flag): an update message.
• ack(distance, synchronization flag): an acknowledgment.
The synchronization flag is set to 1 when the node which sent the message is frozen; otherwise it is 0.

Symbols used for the ADR approach:
• A-full: anti-full app message.
• Papp(origin, distance): partial app message.
• Jc(Fj element, Pj list): juncture message.
• upd(distance): an update message with distance = the shortest distance to destination.

to this failure. This comparison does not account for the cost of setting up the
network. It starts from a network which has already reached stability. To simplify
the example and provide an easy framework for comparison, the example is first
presented as if both algorithms behave synchronously⁴. Then a few observations are
made about how both algorithms would react in an asynchronous environment.

The left half of figure 5.9 illustrates how Garcia's approach⁵ works. Table 5.2
contains the steps of both approaches. When the link A-B fails, node B freezes

4 At each step, a node receives and processes the messages sent by its neighbors during the previous step. It then sends its own messages which are going to be received at the next step.

5 Using the algorithm described in [GLA88b].

step | Garcia | ADR
1 | node B freezes and sends update to D & E | node B sends update to E & D
2 | nodes D & E freeze and send update to B & G | nodes E & D send update to G
3 | node G freezes and sends update to I, D & E; node B sends ack for updates from D & E | node G sends update to I
4 | node G receives ack for its update msg | node I sends JC msg to G and anti-full app msg to H
5 | because of this node G unfreezes, sends ack to D & E and an update to I | node G sends JC msg to D & E; node H sends anti-full app msg to F
6 | node D unfreezes, sends an ack to B and an update to G; node E unfreezes, sends an ack to B and an update to G | nodes D & E send JC msg to B; node F sends anti-full app msg to C
7 | node B unfreezes, sends update to D & E | node B sends partial app msg to E & D

Table 5.2: Message exchange in figure 5.9

since it does not have a feasible successor⁶. It also sends an update message to nodes
D and E. As soon as they get the message, they freeze and send an update to all
their neighbors (B & G). Node G reacts the same way; it freezes and sends updates
to D, E, and I. Node I reacts differently: it does not need to freeze since it has
a feasible successor (node H). It can send an acknowledgement immediately. This
acknowledgement and the messages it triggers unfreeze nodes G, D, E and B. Each
node which unfreezes sends an acknowledgement to the node(s) which caused it to
freeze, and sends an update message to its other neighbors.
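The freezing decision described above hinges on the feasible-successor test given in footnote 6. A minimal Python sketch of that test alone follows. It is not the algorithm of [GLA88b], which keeps more state; the function names, the dictionary layout and the distances in the usage lines are hypothetical.

```python
def feasible_successors(distance_via_link, successor_distance):
    """Links whose distance to the destination is smaller than or equal to
    the distance through the current successor (the definition in footnote 6).

    distance_via_link: hypothetical map from neighbor id to the distance to
    the destination when routing through that neighbor.
    """
    return sorted(
        neighbor
        for neighbor, distance in distance_via_link.items()
        if distance <= successor_distance
    )


def must_freeze(distance_via_link, successor_distance):
    """A node with no feasible successor has to freeze (assumed reading)."""
    return not feasible_successors(distance_via_link, successor_distance)


# Hypothetical distances: a node whose remaining links are all longer freezes,
# while a node with one short alternative does not.
print(must_freeze({"D": 3, "E": 3}, successor_distance=1))
print(feasible_successors({"H": 2, "G": 4}, successor_distance=3))
```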

The rest of figure 5.9 illustrates the ADR approach. When link A-B fails, node
B sends an update to nodes D and E, since they become the new up links. This update
message contains the distance to destination A going through the full juncture I. This
update message is propagated to node G. At this point, node G sends an update
message to node I, because the distance in the message is the distance of the path
starting at link I. When node I receives the update message, it recognizes that it is
not a juncture anymore since one of its up links has become down. To inform the
nodes above of this change in status, node I sends an anti-fapp message to node H.
Node I then builds a new JC message which it sends to node G. This JC message is
propagated down to node B. When node B receives both messages, it realizes that it
is a partial juncture matching the split point G. As a result, it sends a papp message
to D and E.

The two approaches use seven synchronous steps (enumerated in table 5.2) to

6 A link with a distance to destination smaller than or equal to the distance through the current successor is called a feasible successor. See section 1.2.5 for details.


attain a stable state where no control messages are sent. The similarities stop there.
The number of messages is different: Garcia's approach uses 23 messages whereas the
ADR approach uses 15. On the other hand, the ADR messages are, on average,
longer than Garcia's messages, so a comparison of the bandwidth is required.
In this example, assuming that distances and node addresses each use 1 byte of a message
and that the destination's address is contained in each message, Garcia's approach
sends a total of 49 bytes (46 bytes and 23 bits) whereas the ADR approach sends a
total of 36 bytes.
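The totals quoted above can be re-derived from table 5.2 with a short tally. The per-step counts below are one reading of the table and should be treated as an assumption; in particular, step 4 of Garcia's column is counted as the three acknowledgments that node G receives (from I, D and E). The byte formula for Garcia's approach follows the stated assumptions: one byte for the destination address, one byte for the distance, and one synchronization-flag bit per message.

```python
from math import ceil

# Messages sent at each synchronous step, as read from table 5.2 (assumed).
GARCIA_MSGS = {1: 2, 2: 4, 3: 5, 4: 3, 5: 3, 6: 4, 7: 2}
ADR_MSGS = {1: 2, 2: 2, 3: 1, 4: 2, 5: 3, 6: 3, 7: 2}


def garcia_bytes(n_messages: int) -> int:
    """Two bytes per message (destination + distance) plus one flag bit per
    message, with the flag bits rounded up to whole bytes."""
    return 2 * n_messages + ceil(n_messages / 8)


garcia_total = sum(GARCIA_MSGS.values())
adr_total = sum(ADR_MSGS.values())
print(garcia_total, adr_total, garcia_bytes(garcia_total))  # prints "23 15 49"
```

The 36-byte figure for ADR is not recomputed here because it depends on the length of each JC message, which varies with its Pj list.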

We analyze the storage space required by both approaches to operate in more
detail. First, the amount of storage space required changes with the number of
destinations. In order to assess this, we compare the space required to store the
information pertaining to one destination (A), and for all nodes considered as
destinations simultaneously. Second, the storage space depends on particularities of each
approach.

In the ADR approach, each node keeps a copy of the last JC messages and app
messages it has received. The quantity and type of messages that a node keeps
depends upon the topology of the network. Since a failure modifies the topology of
a network, the total storage space of all the nodes in the network before the failure
is different from the total after the failure. To reflect this difference, both totals are
compared to the space requirements of Garcia's approach in this example.

For Garcia's approach, the storage requirements at each node are the same and do
not vary with the topology. However, to determine this fixed amount of storage space,
an assumption has to be made about the maximum number of times a node can be
in a situation which requires freezing. When a frozen node receives a message which
would require it to freeze, it adds a bit to a vector which keeps track of "outstanding"
acknowledgments. In this example, node G needs to freeze twice before it unfreezes,
so two bits are put aside for each link of each node's distance table⁷.

Table 5.3 shows the total storage space requirements⁸ of all nodes in the network.
This result suggests that the ADR approach requires between two and three times the

7 See the description of Jaffe-Moss's approach in section 1.2.3 for details.
8 Appendix D shows the details of the calculation to obtain those numbers.

destination | Garcia | ADR before failure | ADR after failure
A | 63 bytes | 128 bytes | 88 bytes
all nodes | 407 bytes | 1106 bytes | 751 bytes

Table 5.3: Storage space in the example of figure 5.9

storage space required by Garcia's approach.
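The ratio behind this statement can be checked directly from the figures in table 5.3. The sketch below simply divides the ADR totals by the corresponding totals for Garcia's approach; the dictionary layout and names are illustrative, and the byte counts are taken as they appear in the table.

```python
# Storage totals in bytes, as listed in table 5.3.
STORAGE_BYTES = {
    "A": {"garcia": 63, "adr_before": 128, "adr_after": 88},
    "all nodes": {"garcia": 407, "adr_before": 1106, "adr_after": 751},
}


def adr_to_garcia_ratio(row: str, when: str) -> float:
    """Ratio of ADR storage to Garcia storage for one table row.

    when is "before" or "after" (relative to the failure).
    """
    entry = STORAGE_BYTES[row]
    return entry[f"adr_{when}"] / entry["garcia"]


for row in STORAGE_BYTES:
    for when in ("before", "after"):
        print(row, when, round(adr_to_garcia_ratio(row, when), 2))
```

With these figures the before-failure ratios are about 2.03 (destination A) and 2.72 (all nodes); the after-failure ratios come out lower, since the ADR totals shrink once the failure has removed part of the stored representation.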

The next point of comparison leads to a clearer conclusion. When the example
is considered without the assumption that the network behaves synchronously, there
is an additional difference between the two approaches. With Garcia's approach, a
node can be temporarily misled into believing that it has a feasible successor when
it does not. This situation can occur in this example if node G receives its update
messages in an inappropriate order.

If node G receives the update message from node D before the one from E, node G assumes
that E is a feasible successor. Thus it sends an acknowledgment to D immediately
and does not freeze. When node G receives the update from E, it freezes and sends
update messages to all its neighbors. The update messages sent will correct the wrong
distance that was propagated by the acknowledgment. When node G unfreezes,
everything will be normal again.

ADR does not suffer from this problem. The reaction of node G does not depend
on the order in which the update messages are received. Its reaction is only affected
by the distance within the message which "guides" the message to node I⁹.

As a last point of comparison, it should be noted that the ADR approach does
not require freezing. As a result, as soon as a node has learned about a failure, it can
route normal traffic to a destination.

For example, immediately after step 2, node D knows that the shortest path to A
goes through node G and it can route messages accordingly. With Garcia's approach,
node B is unable to route normal traffic between step 2 and step 6. Similarly, nodes
D and E suffer from the same problem between steps 3 and 5.

The comparison of the reaction to failure on the network of figure 5.9 can be

9 Section 4.4 describes this in detail.


summarized as follows. ADR sends fewer messages (15 vs 23) and they use less bandwidth
(36 vs 49 bytes) to react to the failure. The ADR approach appears to require
between two and three times the storage space of Garcia's approach. However, this
example shows one situation where Garcia's approach is misled while the ADR approach
performs normally. This comparison is based on only one example of link failure.
To obtain more significant results, a large sample of network topologies would be
required. A systematic comparison between the performance of ADR and the other
available approaches constitutes a topic for further research.


Chapter 6

Conclusion

The purpose of this thesis was to develop an algorithm to react to link failure in
a network using Alternate-path Distance-vector Routing and study its performance.
The proposed algorithm reacts to link failures in networks using hop count as a
metric. It was studied using a simulation where link additions were not allowed. In
this context, it does not appear to introduce the count-to-infinity (CT∞) problem to ADR and displays
a reasonable performance.

Our approach started with the original ideas presented by Tsuchiya [Tsu87] and
was refined into the proposed algorithm. New messages were introduced (anti-fapp,
anti-papp and qcu messages). The update procedure was specified in detail (see
section 4.4 and appendix B). The most commonly used message in the
Alternate-path Distance-vector Routing approach, the JC message, was modified to
be shorter. The version of the JC message used in the proposed algorithm is shorter
than the one proposed by Tsuchiya by 25 to 50 %, for messages with a non-null Pj list
(section 4.2). In addition, strategies were developed to limit the number of messages
transmitted (chapter 4).

The resulting algorithm performed reasonably. In a series of tests on a network

representing 98 major nodes of the ARPANET in 1983, an average of 75 messages

are required to adapt to a failure on a major link for a single destination. The total

number of bytes sent in reaction to a failure is on average 615 bytes. As the number

of addressable destinations increases the number of messages increases linearly and


the time to react appears to increase by a small constant. The same trend can be
observed as the number of simultaneous failures increases: there is a linear increase
in the number of messages sent, and a small, constant increase in the time required
to react to the failures.

We also considered an example involving a single failure on a small network of nine
nodes (section 5.4). Using this example, we compared the response of the proposed
algorithm and that of Garcia-Luna-Aceves [GLA88b]. The proposed algorithm sent
fewer messages (15 vs 23) and used less bandwidth (36 vs 49 bytes), but on the other
hand, it appears to require between two and three times the space used by the Garcia
approach. It should be noted that the comparison only involved the reaction to the
failure.

A characteristic of the proposed algorithm (and of the ADR in general) is that a
node requires a distinct representation of the network for each "destination" it needs to
be able to address directly. The number of messages and the storage space used are
proportional to the number of destinations.

There are two major advantages to the ADR and the proposed algorithm. It allows
multipath routing, and each node knows its alternative paths to a destination before
a failure, so when a failure occurs, the nodes can route to a destination as soon as
they are informed of the failure. (Other approaches look for an alternative path after
the failure; [Hag83], [JM82] and to a lesser degree [GLA88b]¹ are examples of these
approaches.) The ability to react immediately to a failure can be very important in
an environment like the backbone of a hierarchical routing algorithm, because of the
high volume of traffic going through the backbone.

In an environment where the two advantages mentioned above are important and
avoiding the CT∞ problem is an issue, the proposed algorithm could be used. The limiting
factors would be the storage space and the bandwidth. The other algorithms which
try to solve the CT∞ problem also trade more space and bandwidth to tackle it
(Jaffe-Moss and Garcia-Luna-Aceves use additional messages, and freeze part of the
network to do it).

There is one environment where the proposed algorithm can be used: Tsuchiya's

1 Garcia takes into account some alternative paths (feasible successors), but he ignores others.

Landmark hierarchy. It is for this environment that he developed the ADR approach
[Tsu87]. In this approach, each node sees only a few Landmarks and the routing is
done through these Landmarks. Consequently, each node maintains a representation
only for the Landmarks it sees (one representation per Landmark). This limits the
space and the bandwidth used.

There are a number of possible extensions to this research. The most important
one is to formally prove that the proposed algorithm avoids the CT∞ problem. Once this is
established, the performance of the proposed algorithm and of the other available
approaches should be tested on some large, representative networks.

The proposed algorithm should also be extended to allow the addition of new
links at any time (currently new links are only added during the setup). It should
also be improved in order to recognize, as soon as possible, when part of the network
is disconnected.

Another aspect of the ADR that should be considered is the question of when
to adapt the current representation and when to start a new setup. As observed in
section 5.1, even with a single failure, there are cases where the proposed approach uses
more messages to adapt to a failure than a second setup phase. We suggested a
rough criterion² to identify these cases. This criterion should be refined, especially
for environments where multiple simultaneous link failures are likely.

The proposed algorithm uses a two-phase setup: the first phase classifies the
links of each node as up, horizontal or down, and the second uses these directions in
conjunction with JC messages and app messages to build the nodes' internal
representation of the network. A single-phase approach would most likely be faster, but the
required protocol would need to be developed. This is a worthwhile extension to the
algorithm.

2 All the cases where adapting to the failure required more messages than a setup involved a destination which was very close to the failure (less than 2 links away from the failure).

Appendix A

Message description

This appendix describes each message used in the proposed algorithm. In addition to
the fields presented here, each message should also include a destination field when
there is more than one node which could be the destination of users' messages. This
field is omitted here to be consistent with the examples, which involved only one
destination.

juncture msg: This message is transmitted along horizontal and down links. It is
used to discover which nodes are junctures, whether they are full or partial junctures
and how far they are from the split point. It has the following structure:
• Fj element: the node id of the last full juncture encountered
• Pj list: a list of split points
These fields are modified at each split point, partial juncture and full juncture.

full app msg: The full alternate-priming path message is transmitted on up links to
tell nodes closer to the destination that there is a full juncture below them and
how far they are from the destination if they have to use this alternate route.
This message contains the following parts:
• origin: node id of the full juncture
• distance: distance to the destination along the path which goes
• last node: each node records the link on which it receives a message, so implicitly the last node on the path is passed. This way, a message can be routed to the destination using this alternate path.

partial app msg: The partial alternate-priming path message is transmitted on up
links to tell nodes closer to the destination that there is a partial juncture below
them and how far they are from the destination if they have to use this alternate
route. This message contains the following elements:
• origin: node id of the partial juncture
• distance: distance to the destination along the path which goes
• last node: each node records the link on which it receives a message, so implicitly the last node on the path is passed. This way, a message can be routed to the destination using this alternate path.
• stop field: node id of the matching split point
  or split count: the number of split points before the matching split point

anti-full app msg: This message goes up to "chase" full app messages and remove

them. It has the same contents as the full app msg.

anti-partial app msg: This message goes up to "chase" partial app messages and

remove them when they are no longer accurate. It has the same contents as the

full app msg.

update msg: This message is transmitted along horizontal and down links. It tells
a "lower" node that it is now "above" the node which sent the message. It also
asks the receiver to send back a JC message. This message has the following
contents:
• distance: guiding distance which indicates which paths the update message should follow (see section 4.4 for details).

quick cost update msg: This message is transmitted on horizontal and down links.
It goes "down" from a failure to tell nodes that the distance to the destination
on the path beginning with the link on which they receive the message has
increased. This message precedes the JC message after a failure to re-route the
traffic away from the failure onto a faster route. This message has the following
contents:
• distance: new distance to the destination on the path which starts with the link which received the message.

ask for qcu msg: This message is transmitted on down links. It is sent only to obtain
a new distance for a down link which receives a JC message on a down link. This
message does not contain any information other than its type identification.
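As an illustration, the message formats listed in this appendix could be represented by the following Python dataclasses. This is a sketch under assumptions: the class names, field names and types (node ids as strings, distances as integers) are not taken from the thesis, the destination field is omitted as in the descriptions above, and the "last node" element is left out because it is implied by the link on which a message arrives.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JunctureMsg:
    """juncture (jc) msg, sent along horizontal and down links."""
    fj_element: str                                    # last full juncture seen
    pj_list: List[str] = field(default_factory=list)   # split points


@dataclass
class AppMsg:
    """full or partial app msg, sent on up links."""
    origin: str                        # node id of the juncture
    distance: int                      # distance along the alternate path
    status: str = "full"               # "full" or "partial"
    stop: Optional[str] = None         # partial only: matching split point
    split_count: Optional[int] = None  # partial only: alternative to stop


@dataclass
class AntiAppMsg(AppMsg):
    """anti-full / anti-partial app msg: same contents as the app msg."""


@dataclass
class UpdateMsg:
    """update msg, sent along horizontal and down links."""
    distance: int                      # guiding distance


@dataclass
class QuickCostUpdateMsg:
    """quick cost update (qcu) msg, sent on horizontal and down links."""
    distance: int                      # new distance to the destination


@dataclass
class AskForQcuMsg:
    """ask for qcu msg: carries nothing besides its type."""


setup_msg = JunctureMsg(fj_element="A")            # jc msg with an empty Pj list
papp = AppMsg(origin="B", distance=4, status="partial", stop="G")
print(setup_msg, papp)
```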


Appendix B

Pseudo-code

To compute juncture status:
    take the jc message on all up and horizontal links
    if they all have the same Fj
        the node is a partial juncture
    else
        it is a full juncture
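The rule above can be written as a small Python function. This is a sketch only: the function name and the dictionary keyed by link are assumptions, and the function presumes it is called for a node that has already stored a jc message on each of its up and horizontal links.

```python
def juncture_status(fj_by_link):
    """Classify a juncture from the Fj element of the jc message stored on
    each up and horizontal link (hypothetical map: link id -> Fj element)."""
    if not fj_by_link:
        raise ValueError("no jc message has been stored yet")
    # Same Fj on every link: partial juncture; otherwise: full juncture.
    return "partial" if len(set(fj_by_link.values())) == 1 else "full"


print(juncture_status({"link1": "A", "link2": "A"}))
print(juncture_status({"link1": "A", "link2": "I"}))
```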

To build full app msg:
    take distance to destination on each uptree link
    add it to each uptree link cost -> alt Distance
    status <- full
    Origin <- node id

To build partial app msg (with or without common Pj element):
    take distance to destination on each uptree link
    add it to each uptree link cost -> alt Distance
    status <- partial
    Origin <- node id
    if there is a common Pj element (split point)
        stop <- common Pj element
    else
        /* the message is different for each up link */
        split_count <- # of Pj elements in the jc_msg received on that link

EVENT: node starts the setup being a destination
    send jc_msg(dest id, dest id, []) on all links (all links are down)

EVENT: node receives a jc_msg
    if this message = last msg received
        exit receive jc_msg /* do nothing */
    else
        store it and continue with the rest of the code
    if it came from destination (Fj = dest)
        put node id in the Fj part of the new jc_msg
        record the fact that this link leads to dest
    if the msg came from a full juncture (Fj = link id)
        record the fact that this link leads to full juncture
    if it came on horizontal link
        if all up links have received a jc_msg
            compute juncture status
            send appropriate app_msg and jc_msg
        else
            wait
    if it came on the single uptree link
    {
        if # horizontal link > 0
        {
            if horizontal link(s) have received jc_msg
                compute juncture status
                send appropriate app_msg and jc_msg
            else
                wait
        }
        else /* # horizontal link = 0 */
        {
            if there is 1 downtree link
                send the jc msg without touching the Pj list
            if there is more than 1 downtree link
                build new Pj element (Pj.elmt = node id)
                add it to Pj list
                send new jc_msg on all down & horizontal links
        }
    } /* came on single up link */
    if it came on 1 up link amongst many up links
    {
        wait for jc_msg's from all uptree links
        when all jc_msgs on up links have been received
        {
            compute juncture status using only up links
            if # horizontal link > 0
            {
                send a jc_msg on the horizontal link(s)
                wait for jc_msg on horizontal link
                recompute juncture status taking into account horizontal
                    links
            }
            if the node is a full juncture
            {
                if the node was a partial juncture before
                {
                    /* look for juncture status oscillation */
                    look at the Pj list of the new jc_msg
                    if it contains the matching split point of the node
                       (when it was a partial juncture)
                        this is a case of oscillation
                        don't react to this message
                }
                if it is not a case of juncture status oscillation
                {
                    build a new full app msg
                    send full app_msg on all uptree links which
                        are not full JC or destination
                }
                /* build a jc_msg for this full juncture */
                write the node id in Fj part of new jc_msg
                if there is more than 1 downtree link
                    put node id into Pj element
                    and put this element into the Pj list
                send the new jc_msg on all down links
            } /* node is full juncture */
            if the node is a partial juncture
            {
                find the common part in Pj list from all jc_msg
                    from up links
                build a new partial app msg using common Pj element
                send partial app_msg on all uptree links that are not
                    full JC or destination
                /* build a jc_msg for this partial juncture */
                if there is no Pj element in common or only 1 in common
                    Pj <- node id
                    replace old Pj list of the jc_msg received by
                        the new Pj
                    send the new jc_msg on all down links
                if there is more than one Pj element in common
                    if # link down <= 1
                        new Pj list = common - last common Pj element
                        send jc_msg with new Pj list on all down links
                    else
                        Pj element <- node id
                        add it to Pj list
                        send the new jc_msg on all down links
            } /* node is a partial juncture */
        } /* when all jc_msg have been received on up links */
    } /* came on 1 up link among many */
    if the jc_msg came on down link
        /* the link probably has the wrong distance; send a message to
           ask for a new distance */
        store the jc_msg
        send an ask_for_qcu

EVENT: node receives an ask_for_qcu
    send a qcu back on the link which received the message

EVENT: node receives an app_msg
    adjust distance of down link according to app_msg
    if the node has already received an app_msg with the same
       destination, origin, status & distance on a down link
    {
        if the identical msg was on a different link
            store the message
        /* don't send app up; last message went up, it is enough */
        exit receive app_msg /* do nothing */
    }
    else
    {
        if (the app msg has same destination & origin
            but different status or distance)
            replace the old app msg
        else
            store it
    }
    if app type = full
    {
        add to distance to destination the cost to traverse each
            uptree link
        send app_msg on all uptree links that do not lead to a full jc
            or to the destination
    }
    if app type = partial
    {
        if next node = stop field of the papp msg (split point)
        {
            stop msg here
            if the message replaces a fapp msg
                send anti-fapp on all up links
        }
        else
            distance traveled increases by each uptree cost
            send app on all uptree links
    }

EVENT: node receives an anti-app_msg
    if app type = full
    {
        if an app_msg is found with same origin, destination
            remove it
        send anti-app_msg on all up links which do not lead to
            the destination or to a full juncture
    }
    if app type = partial
    {
        if an app_msg is found with same origin, destination
            remove it
            if next node = stop field of the papp msg (split point)
                stop msg here
            else
                send anti-app_msg on all up links which do not lead to
                    the destination or to a full juncture
        else
            stop msg here
    }
    adjust distance of down link according to app_msg

EVENT: no de receives an updateJIlsg

if came on horizontal Hnk

malte that HllA dovntree

store the distance to dest going on that link

(distance in update_msg)

if (the node was part of 2 node jlIDcture

and the no de keeps the sarne jc status after horizontal link

ie gone)

send a new je on aIl down links

send a new app on all up links

if msg came on uptree Hnk

77

.......


if # up link = 1
{
}
if # links > 1
{
}
relabel links with new direction according to upd distance
- link(s) with distance < upd_distance -> horizontal
- link(s) with distance = upd_distance -> up
- link(s) with distance = upd_distance+1 -> horizontal
- link(s) with distance > upd_distance -> down
send update on new up link
send quick cost update on all horizontal & down links except the one with update !
else
{
}
if network disconnected
stop all activities
if there was another failure
send update msg on the link where it came
if # up link = 2
{
}
reorganize the links
if the remaining up link has a jc_msg
else
node is not a juncture anymore -> send anti-app_msg
send failure jc msg
send update msg on that link to receive



if # up link > 2
{
reorganize the links
compute the type of juncture again
if it changed
send failure jc msg
}

EVENT: receives a quick cost update (qcu) msg
reorganize the links according to the new distance of the link which receives qcu
if the msg came on single uptree
{
}
if there is another link with distance < qcu's distance
put this distance in the qcu_msg
adjust distance <- add the cost of link to traverse
send quick cost update msg on all horizontal & down links
remove all app on all down links
send anti-app msg on all uptree links
if the msg came on an uptree link amongst others
{
stop msg
if the node juncture status changed from full to partial
build a new jc_msg
send it on all down & horizontal links
send papp msg on the remaining up links
if the node loses its juncture status
build a new jc_msg
send it on all down & horizontal links
send anti-app msg on the remaining up links



}

EVENT: failure on a node's up link
N_UP = # of up links before failure
remove the broken link for this destination
reorganize the remaining links
}
else
if there is 1 link left /* there is no alternate path */
send update msg along that link (with dist alt = INFINITY)
if there is more than 1 link left
/* build update & qcu msg and send them */
update's alternate distance <- distance on the second link
qcu's alternate distance <- distance on the second link
send update on all up links
send quick cost update on all horizontal & down links
if the new up link was horizontal
/* can send jc_msg immediately for faster response */
if 1 horizontal link
send the jc_msg from that link on all down & horizontal links
if there was more than 1 horizontal link
build new jc msg and send it
remove all app on down links
{ /* more than 1 uptree link */
if N_UP > 2
/* node is still juncture */
build a new jc msg



if new jc msg is different from the previous one
send it on all horizontal & down links
if N_UP = 2
if # horizontal links = 0
/* failure -> not a juncture anymore */
take the jc_msg from the up link left
send it on all down links
send anti-app_msg on up link
else
{
compute the node's juncture status
if it is the same as before failure
send jc_msg on horizontal links
else
send jc_msg on horizontal & down links
if the node is a partial juncture
send papp_msg on up links
else
send anti-app msg on up links
} /* # horizontal link > 0 */

EVENT: failure on a node's horizontal link
remove the broken link for this destination
compute the node juncture status
in the following cases:
{
the juncture status has not changed
do nothing more
the node stops being a juncture
send anti-app msg on up links
send a new jc_msg on horizontal and down links



}

the node was a two node full juncture
and it becomes partial juncture
send papp msg on up links
send a new jc_msg on horizontal and down links
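As a reading aid, the case analysis for a horizontal link failure can be sketched in Python. This is only one plausible interpretation of the pseudo-code: the names `Status` and `on_horizontal_link_failure` are hypothetical, the function assumes the broken link has already been removed and the new status recomputed, and the "two node" condition on the full juncture is not modelled.

```python
from enum import Enum


class Status(Enum):
    FULL = "full juncture"
    PARTIAL = "partial juncture"
    NONE = "not a juncture"


def on_horizontal_link_failure(old, new):
    """Return the (message, link set) pairs to send for a status change."""
    if new is old:
        # The juncture status has not changed: do nothing more.
        return []
    if new is Status.NONE:
        # The node stops being a juncture.
        return [("anti-app", "up"), ("jc", "horizontal+down")]
    if old is Status.FULL and new is Status.PARTIAL:
        # A full juncture becomes a partial juncture.
        return [("papp", "up"), ("jc", "horizontal+down")]
    # Transitions not listed in the pseudo-code are left without action here.
    return []
```

Returning the sends as data rather than performing them keeps the sketch independent of any particular simulator.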

EVENT: failure on a node's down link
for all app msg received on the broken link
remove the app_msg
send anti-app_msg
remove the broken link for this destination
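The down link failure handler is the simplest of the events, and a minimal Python sketch of it could look as follows. The dictionary layout and the function name `on_down_link_failure` are assumptions made for illustration; the pseudo-code does not say on which links the anti-app messages travel, so the sketch only records which ones must be sent.

```python
def on_down_link_failure(links, broken):
    """Handle the failure of down link `broken` for one destination.

    `links` maps a neighbor id to a dict with an "apps" list holding the
    app messages received on that link (hypothetical layout).
    Returns the anti-app messages to send, one per removed app message.
    """
    sends = []
    entry = links[broken]
    for app in list(entry["apps"]):
        entry["apps"].remove(app)         # remove the app_msg
        sends.append(("anti-app", app))   # send anti-app_msg
    del links[broken]                     # remove the broken link
    return sends
```

Iterating over a copy of the list avoids mutating it while it is being traversed.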


Appendix C

Buffer description

The buffer is the place where a node keeps its representation of the network and the messages it retains. There is one representation per destination. This representation is divided into two parts: a description of the node and a representation of each link.

The node description contains the following elements:

• the node status (full juncture, partial juncture, not juncture)

• the # of up links

• the # of down links

• the # of horizontal links

Each link is represented by the following elements:

• the node id of the node at the other end of the link

• the direction of the link (up, horizontal, down)

• a pointer to the list of messages which came on that link and which are kept at the node.

In addition, the link representation has two fields which have a different meaning depending on the direction of the link. If the link is down, the link keeps two distances:


• the shortest distance through a full juncture

• the shortest distance through a partial juncture

Otherwise, the fields have the following meaning:

• the shortest distance to the destination

• the status of the other node sharing this link
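The buffer layout described in this appendix can be modelled with a few Python dataclasses. This is a sketch only: the class and field names (`NodeStatus`, `Direction`, `LinkEntry`, `DestinationView`, `field_a`, `field_b`) are hypothetical and do not come from the simulator, and the three link counters are derived by counting instead of being stored.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class NodeStatus(Enum):
    FULL_JUNCTURE = "full juncture"
    PARTIAL_JUNCTURE = "partial juncture"
    NOT_JUNCTURE = "not juncture"


class Direction(Enum):
    UP = "up"
    HORIZONTAL = "horizontal"
    DOWN = "down"


@dataclass
class LinkEntry:
    neighbor_id: str                      # node at the other end of the link
    direction: Direction
    messages: List[str] = field(default_factory=list)  # messages kept for this link
    # Two fields whose meaning depends on the direction of the link:
    #   down link: (shortest distance through a full juncture,
    #               shortest distance through a partial juncture)
    #   otherwise: (shortest distance to the destination,
    #               status of the node sharing this link)
    field_a: Optional[int] = None
    field_b: Optional[object] = None


@dataclass
class DestinationView:
    """Representation kept by one node for one destination."""
    status: NodeStatus = NodeStatus.NOT_JUNCTURE
    links: Dict[str, LinkEntry] = field(default_factory=dict)

    def count(self, direction: Direction) -> int:
        """Number of links currently labelled with `direction`."""
        return sum(1 for l in self.links.values() if l.direction is direction)


# The buffer itself would then map each destination to its own view.
buffer: Dict[str, DestinationView] = {"A": DestinationView()}
```

Deriving the counters avoids keeping them in sync with the link directions, at the cost of a scan over the links on each query.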


Appendix D

Storage space calculation

This appendix presents an analysis of the storage space required to store each approach's representation of the network. It uses the assumptions stated in section 5.4, namely that distances and node addresses can be stored in 1 byte and that the size of the feasibility vector is 2 bits. In both approaches the space required depends on the number of links at each node. Table D.1 shows the number of links at each node.

D.1 Garcia approach

The Garcia-Luna-Aceves approach uses three tables: the distance table, the routing table and the link table. These tables contain different information and have different sizes.

• A routing table entry has the form (distance, successor). It uses 2 bytes per entry. There is one entry per destination.

• A distance table entry has the form (distance, feasibility vector). It uses 1 byte for the distance + 2 bits for the feasibility vector per entry. There is one entry for each pair (link, destination).

• A link table entry holds a single distance and has the form (cost), so it uses 1 byte per entry. There is one entry per link.

node   A  B  C  D  E  F  G  H  I  total
links  2  3  2  2  2  2  3  2  2  20

Table D.1: Number of links at each node

The sum of the number of link representations over the 9 nodes of the network is 20 (see Table D.1). For one destination, the space used by each table is the following:

• routing table - # nodes x space used by one entry = 9 x 2 bytes = 18 bytes

• distance table - Σ # of links x space used by one entry = 20 x (1 byte + 2 bits) = 20 bytes + 40 bits

• link table - Σ # of links x space used by one entry = 20 x 1 byte = 20 bytes

The total space used by all the tables over all the nodes for one destination is then 58 bytes + 40 bits, or 63 bytes.

When all nodes of the network are destinations, this space becomes:

• routing table - # nodes x space used by one entry x # of destinations = 9 x 2 bytes x 9 = 162 bytes

• distance table - Σ # of links x space used by one entry x # of destinations = 20 x (1 byte + 2 bits) x 9 = 180 bytes + 360 bits

• link table - Σ # of links x space used by one entry = 20 x 1 byte = 20 bytes

The total space used becomes 362 bytes + 360 bits or 407 bytes.
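The arithmetic of this section can be checked with a short Python sketch. The function name `garcia_storage` and the dictionary `LINKS_PER_NODE` are illustrative; the entry sizes are the ones assumed in this appendix (1 byte per distance or node address, 2 bits per feasibility vector), and the bit total is rounded up to whole bytes as in the text.

```python
import math

# Number of links at each node, taken from Table D.1.
LINKS_PER_NODE = {"A": 2, "B": 3, "C": 2, "D": 2, "E": 2,
                  "F": 2, "G": 3, "H": 2, "I": 2}


def garcia_storage(n_destinations):
    """Return (whole bytes, extra bits, total bytes rounded up)."""
    n_nodes = len(LINKS_PER_NODE)
    n_links = sum(LINKS_PER_NODE.values())            # 20 link representations
    routing_bytes = n_nodes * 2 * n_destinations      # (distance, successor)
    distance_bytes = n_links * 1 * n_destinations     # 1-byte distance per (link, destination)
    distance_bits = n_links * 2 * n_destinations      # 2-bit feasibility vector
    link_bytes = n_links * 1                          # one cost per link, independent of destination
    whole_bytes = routing_bytes + distance_bytes + link_bytes
    return whole_bytes, distance_bits, whole_bytes + math.ceil(distance_bits / 8)


print(garcia_storage(1))  # one destination
print(garcia_storage(9))  # every node is a destination
```

Under these assumptions the two calls give 58 bytes + 40 bits (63 bytes) and 362 bytes + 360 bits (407 bytes), matching the totals stated above.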

D.2 ADR approach

In the ADR approach the space required depends on the messages kept at each node. Table D.2 shows the messages kept at each node before the failure. Table D.3 shows the messages after the failure. Both tables only show the messages for destination A. The detailed calculation for all destinations is too long to be displayed here. The numbers for the "all destination case" shown in Table 5.3 of section 5.4 were obtained using the simulator.

node  link  dir  dist  messages
B     A     U    1     jc(FJ = A, PJ = ∅)
      D     D    5     fapp(origin = I, distance = 7)
      E     D    5     fapp(origin = I, distance = 7)
C     A     U    1     jc(FJ = A, PJ = ∅)
      F     D    5     fapp(origin = I, distance = 7)
D     B     U    2     jc(FJ = B, PJ = [B])
      G     D    4     papp(origin = G, distance = 4), fapp(origin = I, distance = 7)
E     B     U    2     jc(FJ = B, PJ = [B])
      G     D    4     papp(origin = G, distance = 4), fapp(origin = I, distance = 7)
F     C     U    2     jc(FJ = C, PJ = ∅)
      H     D    6     fapp(origin = I, distance = 6)
G     D     U    3     jc(FJ = B, PJ = [B])
      E     U    3     jc(FJ = B, PJ = [B])
      I     D    5     fapp(origin = I, distance = 5)
H     F     U    3     jc(FJ = C, PJ = ∅)
      I     D    5     fapp(origin = I, distance = 5)
I     G     U    4     jc(FJ = G, PJ = ∅)
      H     U    4     jc(FJ = C, PJ = ∅)

Table D.2: Messages stored in the network before failure

The representation of a link in the proposed version of the ADR approach requires 3 bytes + 2 bits here: 2 bits to record the direction of the link, 2 distances (1 byte each) and 1 byte to point where the last message received is stored. The various messages have the following content, which requires the space mentioned:

• fapp message - (msg type, destination, origin, distance) - 4 bytes

• papp message - (msg type, destination, origin, stop field) - 4 bytes

• jc message - (msg type, destination, # of Pj elements, Fj) - 4 bytes + 1 byte per Pj element
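To make the sizes above concrete, the three message layouts can be packed with Python's `struct` module, one unsigned byte per field. This is a sketch under assumptions: the type codes `MSG_FAPP`, `MSG_PAPP` and `MSG_JC` and the use of small integers as node ids are hypothetical, since the appendix gives only the field lists and their sizes, not an encoding.

```python
import struct

# Hypothetical 1-byte message type codes (not specified in the thesis).
MSG_FAPP, MSG_PAPP, MSG_JC = 1, 2, 3


def pack_fapp(destination, origin, distance):
    """(msg type, destination, origin, distance): 4 bytes."""
    return struct.pack("4B", MSG_FAPP, destination, origin, distance)


def pack_papp(destination, origin, stop_field):
    """(msg type, destination, origin, stop field): 4 bytes."""
    return struct.pack("4B", MSG_PAPP, destination, origin, stop_field)


def pack_jc(destination, fj, pj):
    """(msg type, destination, # of Pj elements, Fj) plus 1 byte per Pj element."""
    fmt = f"{4 + len(pj)}B"
    return struct.pack(fmt, MSG_JC, destination, len(pj), fj, *pj)


# Node ids as small integers, e.g. A = 0, B = 1, ..., I = 8 (an assumption).
print(len(pack_fapp(0, 8, 7)))   # size of a fapp message
print(len(pack_jc(0, 1, [1])))   # size of a jc message with one Pj element
```

Each field fits in one byte only because of the 1-byte assumption on distances and node addresses made at the start of this appendix.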

Table D.2 shows that there are 8 fapp messages, 2 papp messages, 6 jc messages with no Pj element, and 4 jc messages with one Pj element stored in the network before the failure. The space required to store all these messages and all the link representations is 128 bytes.


node  link  dir  dist  messages
B     D     D    5     jc(FJ = A, PJ = [G])
      E     D    5     jc(FJ = A, PJ = [G])
C     A     U    1     jc(FJ = A, PJ = ∅)
      F     D    5
D     B     U    2     jc(FJ = A, PJ = [G])
      G     D    4     papp(origin = G, distance = 4)
E     B     U    2     jc(FJ = A, PJ = [G])
      G     D    4     papp(origin = G, distance = 4)
F     C     U    2     jc(FJ = C, PJ = ∅)
      H     D    6
G     D     U    3
      E     U    3
      I     D    5     jc(FJ = C, PJ = ∅)
H     F     U    3     jc(FJ = C, PJ = ∅)
      I     D    5
I     G     U    4
      H     U    4     jc(FJ = C, PJ = ∅)

Table D.3: Messages stored in the network after failure

8 fapp messages                    x 4 bytes            32 bytes
2 papp messages                    x 4 bytes            8 bytes
6 jc messages with no Pj element   x 4 bytes            24 bytes
4 jc messages with 1 Pj element    x 1 byte             4 bytes
18 links                           x 3 bytes + 2 bits   54 bytes + 36 bits
total                                                   122 bytes + 36 bits

Table D.3 shows that there are 2 papp messages, 4 jc messages with no Pj element, and 4 jc messages with one Pj element stored in the network after the failure. The space required to store all these messages and all the link representations is 88 bytes.

2 papp messages                    x 4 bytes            8 bytes
4 jc messages with no Pj element   x 4 bytes            16 bytes
4 jc messages with 1 Pj element    x 1 byte             4 bytes
18 links                           x 3 bytes + 2 bits   54 bytes + 36 bits
total                                                   82 bytes + 36 bits
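The two itemized breakdowns above can be reproduced with a few lines of Python. The sketch mirrors the rows exactly as they are listed, including the single byte counted for each jc message with one Pj element; the function name `adr_storage` and its parameters are illustrative only.

```python
LINK_BYTES, LINK_BITS = 3, 2   # one link representation: 3 bytes + 2 bits


def adr_storage(n_fapp, n_papp, n_jc_no_pj, n_jc_one_pj, n_links):
    """Return (bytes, bits) summed over the itemized rows."""
    rows = {
        "fapp messages": n_fapp * 4,
        "papp messages": n_papp * 4,
        "jc messages with no Pj element": n_jc_no_pj * 4,
        "jc messages with 1 Pj element": n_jc_one_pj * 1,   # row as itemized: 1 byte each
        "links": n_links * LINK_BYTES,
    }
    return sum(rows.values()), n_links * LINK_BITS


before = adr_storage(8, 2, 6, 4, 18)   # message counts read from Table D.2
after = adr_storage(0, 2, 4, 4, 18)    # message counts read from Table D.3
print(before, after)
```

With these inputs the sketch returns 122 bytes + 36 bits before the failure and 82 bytes + 36 bits after it, the same totals as the last row of each breakdown.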

Bibliography

[CAT88] C.G. Cassandras, M.V. Abidi, and D. Towsley. Distributed routing with on-line marginal delay estimation. IEEE Infocom '88, pages 603-612, March 1988.

[Ceg75] T. Cegrell. A routing procedure for the TIDAS message-switching network. IEEE Transactions on Communications, COM-23(6):575-585, June 1975.

[CG74] D.G. Cantor and M. Gerla. Optimal routing in a packet-switched computer network. IEEE Transactions on Computers, C-23(10):1062-1069, October 1974.

[FGK73] L. Fratta, M. Gerla, and L. Kleinrock. The flow deviation method: An approach to store-and-forward communication network design. Networks, 3:97-133, 1973.

[Gal77] Robert G. Gallager. A minimum delay routing algorithm using distributed computation. IEEE Transactions on Communications, COM-25(1):73-85, January 1977.

[GLA86] J.J. Garcia-Luna-Aceves. An algorithm for shortest-path routing with distributed information. Technical report, SRI International, Menlo Park, Ca, December 1986.

[GLA87] J.J. Garcia-Luna-Aceves. A new minimum-hop routing algorithm. In Proceedings of IEEE INFOCOM '87, April 1987.


[GLA88a] J.J. Garcia-Luna-Aceves. A distributed, loop-free, shortest-path routing algorithm. In Proceedings of IEEE INFOCOM '88, 1988.

[GLA88b] J.J. Garcia-Luna-Aceves. Distributed routing using internodal coordination. In Proceedings of IEEE INFOCOM '88, 1988.

[GLA88c] J.J. Garcia-Luna-Aceves. A minimum-hop routing algorithm based on distributed information. Computer Networks and ISDN Systems, 16(5):367-382, May 1988.

[GT90] D. W. Glazer and C. Tropper. On congestion based dynamic routing. volume COM-38, pages 360-368, March 1990.

[Hag83] Jacob Hagouel. Issues in Routing for Large and Dynamic Networks. PhD thesis, Columbia University, 1983.

[JM82] J. M. Jaffe and F.M. Moss. A responsive routing algorithm for computer networks. IEEE Transactions on Computers, COM-30(7):1758-1762, July 1982.

[McQ74] J. McQuillan. Adaptive Routing Algorithms in Distributed Computer Network. PhD thesis, Harvard University, 1974.

[Sch87] Mischa Schwartz. Telecommunication Networks: Protocols, Modeling and Analysis. Addison-Wesley, 1987.

[Tan89] Andrew S. Tanenbaum. Computer Networks. Prentice Hall, second edition, 1989.

[Tsu87] Paul F. Tsuchiya. Landmark routing: Architecture, algorithms, and issues. Technical report, MITRE Corporation, McLean, Virginia, September 1987.
