Failure tolerant alternate-path distance-vector routing
Jean-François Girard
School of Computer Science
McGill University, Montréal
August 1991
A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES AND RESEARCH
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
© Jean-François Girard, 1991
Abstract
In the Alternate-path Distance-vector Routing approach presented by Tsuchiya [Tsu87], each node of a network keeps enough information to route messages after a failure. This thesis presents an algorithm to update this information in reaction to link failures. The proposed algorithm appears to perform its task without introducing a count-to-infinity problem. This thesis also presents a way to reduce the size of one of the most frequently sent messages. The size of this message can be reduced by up to 50% of its equivalent in the Tsuchiya approach. The algorithm was tested on a simulated network consisting of 98 nodes representative of the ARPANET (1983). The results obtained show that the algorithm displays reasonable performance in the context of hop count metrics.
Résumé
In the routing approach using distance vectors and alternate paths presented by Tsuchiya [Tsu87], each node of a network keeps enough information to forward messages after a link failure. This thesis presents an algorithm that updates this information when a link fails. The algorithm proposed here appears to accomplish its task without introducing the count-to-infinity problem. This thesis also presents a method for reducing the length of the message most frequently used by the algorithm. The length of this message can be reduced to as little as half the length of the equivalent message in Tsuchiya's approach. The algorithm was tested by simulating a network representing 98 major nodes of the ARPANET (1983). The results obtained show that the algorithm offers reasonable performance when hop counts are used as the measure of distance between nodes.
Acknowledgements
I would like to thank my thesis supervisor, Professor Carl Tropper, for the knowledge and experience he offered as I conducted my research. He helped me to focus my work, putting it in perspective and reorienting it when I was diverging.
I would like to thank the people who proofread my thesis at its various stages: Andrea Erickson, Kristina Pitula and Louis Vroomen.
Lastly, I would like to thank God, for the possibility to write this thesis and for his help during the time when it seemed that this thesis would never end.
Contents
Abstract
Résumé
Acknowledgements
1 Introduction
1.1 ADR in the context of other routing schemes
1.2 Evolution of distance-vector Routing
1.2.1 ARPANET
1.2.2 Split-horizon
1.2.3 Jaffe and Moss
1.2.4 Hagouel
1.2.5 Garcia-Luna-Aceves
1.2.6 Hold-down
2 General concepts
2.1 Network description
2.1.1 First network
2.1.2 Second network
2.1.3 Third network
2.2 Chronological table of events
3 Isolated link failures
3.1 Failure on down link
3.2 Failure on horizontal link
3.3 Failure on up link
3.3.1 Example of failure on one of many up links
3.3.2 Example of failure on one of two up links
3.3.3 Example of failure on single up link
4 Strategies
4.1 Fighting an illusion
4.2 Shorter jc messages
4.3 Stopping messages
4.4 Guiding update messages
4.5 Improvements due to the strategies
5 Performance
5.1 A single failure
5.2 Mutual influence of destination
5.3 Multiple failures
5.4 A comparison
6 Conclusion
A Message description
B Pseudo-code
C Buffer description
D Storage space calculation
D.1 Garcia approach
D.2 ADR approach
Bibliography
List of Tables
2.1 Messages stored in the first network after setup
2.2 Messages stored in the second network after setup
2.3 Messages stored in the third network after setup
2.4 Example of time table
3.1 Possible sequences of events at node 13
3.2 Failure link 2-5
4.1 Number of messages sent
4.2 Total number of bytes sent in all messages
4.3 Number of messages sent
D.1 Number of links at each node
D.2 Messages stored in the network before failure
D.3 Messages stored in the network after failure
List of Figures
1.1 Node 3 routing information
1.2 First example of count-to-infinity
1.3 Second example of count-to-infinity
1.4 Jaffe and Moss' approach
2.1 First basic network graph
2.2 Second basic network graph
2.3 Third basic network graph
3.1 Failure on link 8-10
3.2 Failure Q-R
3.3 Failure D-E
3.4 Failure on link F-G
3.5 Failure on link 7-9
3.6 Failure on link Q-U
3.7 Failure on link 2-5
3.8 After failure on link 2-5
4.1 Example for juncture status oscillation
4.2 Short juncture message - first example
4.3 Short juncture message - second example
4.4 Guiding update message
4.5 A basic network graph
5.1 98 major nodes of the ARPANET
5.2 Messages sent in response to the ARPA-USC failure
5.3 Bandwidth used to respond to the ARPA-USC failure
5.4 Time required to respond to the ARPA-USC failure
5.5 Time to react to the ARPA-USC failure vs number of destinations
5.6 Number of messages sent in response to ARPA-USC failure
5.7 Number of messages sent in response to the number of failures
5.8 Time to respond to multiple failures
Chapter 1
Introduction
In [Tsu87], Tsuchiya presents the Alternate-path Distance-vector Routing (ADR). He does "not give an exact algorithm for the ADR" [Tsu87, page 24], but provides a "rough description of its operating". His description sketches how the routing scheme would react to a link metric change. This thesis develops an algorithm to react to link failures and studies its performance.
The rest of this chapter gives an overview of the previous work in distance-vector routing. Chapter Two introduces the key concepts of ADR. Chapter Three introduces the routing scheme's reaction to link failures. The following chapter explains strategies which improve the performance of the basic scheme. Chapter Five then studies the performance of the resulting algorithm. The text concludes with suggestions about the environment in which such an algorithm could be useful.
1.1 ADR in the context of other routing schemes
This section briefly reviews some of the routing schemes available¹. We first show where the ADR fits in a classification of routing algorithms. The rest of this chapter presents various distance-vector routing algorithms to give elements of comparison for ADR.
¹ A more detailed overview of routing schemes is available in Schwartz [Sch87], chapter Six. Tanenbaum [Tan89], chapter Five, presents routing schemes in the context of the OSI model.
Routing algorithms can be classified in various ways. They can be classified as to their response to change: an algorithm is said to be static (or non-adaptive) if the routes are computed off-line, and adaptive if it tries to modify its routing decisions to reflect changes in topology or link "cost". There is a gradation in adaptability: certain algorithms only react to changes in topology, others adapt to various measures of the link "cost": estimated propagation delay ([Gal77], [CAT88]), link utilization, buffer occupancy, measured error conditions on a link, etc. Those measures can be combined among themselves (link and buffer utilization [GT90]) or with fixed quantities like link length, speed or bandwidth. The amount of information used and how often this information is updated define a possible sub-classification of the adaptive algorithms.
Routing algorithms can also be classified on the basis of their performance objectives: the shortest-path approach is a greedy algorithm, providing a least-cost path between the source and destination (the sum of the costs on this least-cost path is often referred to as the distance between the source and destination); the network-wide approach minimizes the average time delay. The network-wide approach usually leads to the use of multiple paths between source and destination. This multipath routing (also called bifurcated routing) was not adopted by any network or network architecture until recently (OSPF, an interior gateway protocol of TCP/IP, can distribute traffic over multiple routes), although suggestions of the use of multipath routing arise in some early papers [CG74], [FGK73] and [Gal77]. On the other hand, the shortest-path approach is used by most packet-switched networks (old ARPANET and new ARPANET, BNA, DNA, SNA, TYMNET ...).
Finally, routing algorithms can be classified as to whether their routes are computed at each node or at a central node. The former approach is called distributed routing. The ARPANET, BNA (Burroughs Network Architecture) and DNA (Digital Network Architecture) are examples of networks or architectures that use a distributed routing algorithm. The latter approach is called centralized routing. TYMNET, the original IBM SNA and GTE are examples of networks or architectures that use a centralized routing algorithm.
There are two classes of distributed routing: distance-vector and link-state. In distance-vector routing (the old ARPANET and the Bellman-Ford routing algorithm are examples of this), adjacent nodes exchange lists of their distances to each destination. Each node then uses this information to compute its own shortest distance to the destination. The nodes keep only these distances and the first link on the path to the destination. In a link-state algorithm (the Shortest Path First, new ARPANET, and Dijkstra's algorithm are examples of this), each node broadcasts the status of the links between itself and each of its neighbors. Each node keeps a representation of the entire network, along with the cost associated with each link in the network.
Figure 1.1: Node 3 routing information (routing table with columns dest, dist, successor; distance table with columns dest, link)
The Alternate-path Distance-vector Routing (ADR) is a distributed adaptive routing algorithm, which can exploit multiple paths between the source and destination. We will study it in more detail in the following chapters.
1.2 Evolution of distance-vector Routing
1.2.1 ARPANET
The old ARPANET algorithm was one of the first versions of the distance-vector routing algorithm. In this version, each node² in the network maintains a routing table and a distance table. The distance table contains the distance to every destination for each of the node's links. The routing table contains the shortest distance to every destination and the link which was used to obtain that distance (called the successor of this node for this destination). For example, figure 1.1 shows the routing table and the distance table of node 3 in the network depicted in the same figure.
² In the old ARPANET, switching elements were called Interface Message Processors. Here we use the term 'node' to refer to an IMP or a router, implying that all nodes considered are routers.
Each node uses the messages it receives from its neighbors to fill these tables. These messages are summaries of the neighbors' routing tables. When a node receives one of these messages, it updates its distance table. The update of the table requires several steps. For each destination, the node replaces the distance-table entry for the sender's link by the sum of the distance reported by the sender and the "length" of the link. If the new distance is smaller than the distance in the routing table, then the node replaces it by the new distance and defines the successor node to be the sender of the message.
When the algorithm starts, each node sets the successors to nil in its routing table. It also sets all the distances to infinity except for the distance from the node to itself. After that, each node periodically sends an update message to all of its neighbors. This updating process can be summarized by:
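A minimal Python sketch of the update step described above. The class name `Node`, the method `receive` and the dictionary layout are illustrative assumptions and are not taken from the thesis; nil is modeled as `None` and an unknown destination is treated as being at distance infinity.

```python
import math


class Node:
    """Illustrative distance-vector node; all names here are assumptions."""

    def __init__(self, node_id, link_costs):
        self.node_id = node_id
        self.link_costs = link_costs          # neighbor id -> cost of the link to it
        self.distance_table = {}              # destination -> {neighbor id: distance via that link}
        # destination -> (shortest distance, successor); nil is modeled as None
        self.routing_table = {node_id: (0, None)}

    def receive(self, sender, destination, distance):
        """Process MSG(destination, distance) received from neighbor `sender`."""
        via_sender = distance + self.link_costs[sender]
        self.distance_table.setdefault(destination, {})[sender] = via_sender
        # An unknown destination counts as being at distance infinity.
        best, _ = self.routing_table.get(destination, (math.inf, None))
        if via_sender < best:
            self.routing_table[destination] = (via_sender, sender)


node3 = Node(3, {2: 1, 4: 1})
node3.receive(4, 1, 2)   # MSG(1,2) from node 4
node3.receive(2, 1, 1)   # MSG(1,1) from node 2
```

The two calls at the end replay the messages of the example that follows. As in the description, the routing table changes only when a smaller distance arrives; reacting to increases is what the later sections address.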
For an example of the updating procedure, consider the initialization of the network of figure 1.1 for only one destination, node 1, and from the point of view of node 3. In this example, a message is represented as MSG(destination, distance) and all links have a cost of one. Assume node 3 receives its first message from node 4, MSG(1,2). Node 3 adds the link cost (one) to node 4's distance (two) to obtain the distance to put in the distance table (three). It is smaller than the initial distance in the routing table (∞), so it replaces this distance and the successor becomes node 4. When node 3 receives MSG(1,1) from node 2, it puts the sum of the link cost and the message distance in the distance table. That value (two) is smaller than the value of the current shortest path (through node 4), so node 2 replaces node 4 as successor. At this point, the entries for destination 1 in both tables correspond to what is shown in figure 1.1.
The major drawback of the old ARPANET approach is that it suffers from the count-to-infinity problem (abbreviated CT∞). This problem (also called good news/bad news) occurs when a node experiences an increase in its distance to a destination and finds a new "shorter" route looping back through itself. For example, in the network of figure 1.2, if node A is the destination, CT∞ occurs after a failure on link A-B. Before this failure, node B has a distance of one on link A and of three on link C, and node C has a distance of two on link B. When the failure occurs, node B looks for another route to reach the destination. It chooses the route starting with link C (with a distance of two) and sends an update to node C to inform C that B's distance to A has increased. When node C receives this message it changes its distance to A and sends an update to B. This exchange continues until the nodes' distances each reach infinity (or the maximum value that can fit in the space reserved to store the distance).
Figure 1.2: First example of count-to-infinity
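The exchange between B and C can be replayed with a short Python sketch. It starts from the distance-table values quoted above (three on B's link C, two on C's link B), assumes every link costs one, and uses a cap of 16 as a stand-in for "the maximum value that can fit"; the cap and the function name are assumptions made for this illustration.

```python
INFINITY = 16  # assumed cap standing in for the largest storable distance


def count_to_infinity(infinity=INFINITY):
    """Replay the B <-> C exchange after link A-B fails (all link costs are one).

    Returns the list of (B's distance, C's distance) pairs seen in each round.
    """
    b_via = {"A": infinity, "C": 3}   # B's distances to A per link, after the failure
    c_via = {"B": 2}                  # C's distances to A per link
    trace = []
    while True:
        b_dist = min(b_via.values())              # B's best distance, now through C
        c_via["B"] = min(b_dist + 1, infinity)    # C hears B's update
        c_dist = min(c_via.values())
        b_via["C"] = min(c_dist + 1, infinity)    # B hears C's update
        trace.append((b_dist, c_dist))
        if b_dist >= infinity and c_dist >= infinity:
            return trace
```

Each round raises both distances by two, so the number of rounds grows with the value chosen for infinity, which is why the problem is costly when the distance field is wide.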
1.2.2 Split-horizon
One solution to this simple version of CT∞ is the split-horizon technique [Ceg75]. In this technique, node C in figure 1.2 cannot use the distance of a path going through B in the update message that C sends to node B. Node C does not send any update message since it has no other path to the destination except through B. Therefore node B does not know about the "deceitful" path which starts with link C. This solution does not, however, work with more complex CT∞'s. In figure 1.3, if a failure occurs on the link between the destination (node 1) and node 2, then node 2 sends an update message to nodes 3 and 4. This message does not offer the short cycles 2-3-2 or 2-4-2 because the nodes use the split-horizon. However, this technique does not prevent node 3 from telling node 2 about its path through node 4, nor does it prevent node 4 from telling about its path through node 3. Therefore, the update message that node 2 sends to node 3 has a distance of four. This distance reflects the new path through node 4. When node 3 receives this message, it sends a message with a distance of five to node 4. Node 4, in turn, sends an update with a distance of six to node 2. As a result, node 2 sends a new message with a distance of seven to node 3.
Figure 1.3: Second example of count-to-infinity
The same process happens with the message node 2 sends to node 4. There are two CT∞'s (2-3-4-2 and 2-4-3-2) which take place as a result of this failure, and the split-horizon technique does not prevent them from occurring.
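The split-horizon rule can be sketched in Python as a filter applied when a node builds the update for one neighbor. The function name and the routing-table layout (destination mapped to a distance and a successor) are assumptions made for this illustration.

```python
def split_horizon_update(routing_table, recipient):
    """Return the {destination: distance} entries a node may advertise to `recipient`.

    routing_table maps destination -> (distance, successor). Entries whose
    successor is the recipient itself are withheld.
    """
    return {
        dest: dist
        for dest, (dist, successor) in routing_table.items()
        if successor != recipient
    }


# Node C of figure 1.2 reaches A only through B, so it has nothing to tell B.
update_for_b = split_horizon_update({"A": (2, "B")}, "B")   # {}
```

The filter only looks at the immediate successor, which is why a node that routes through node 4 can still advertise that route to node 2 and feed the longer cycles described above.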
1.2.3 Jaffe and Moss
Jaffe and Moss [JM82] address this problem. For example, in figure 1.3, node 2 would be prevented from taking into account the distance on links 3 and 4 until all the nodes downtree³ from node 2 have been informed of the link failure. By the time node 2 gets the confirmation that all nodes downtree know about the link failure, it has received messages on links 3 and 4 which prevent it from choosing either of those links as its successor. As a result, node 2 cannot start a cycle. In the rest of this section, we first describe Jaffe and Moss' algorithm, and then illustrate its operation via a simple example.
This approach exploits the fact that the old ARPANET algorithm, in the presence of constant or decreasing distances to a destination, maintains loop-free paths. In the old ARPANET, loops can occur only when there is an increase in the distance to the destination. In Jaffe-Moss' approach, these increases are treated differently. When a node sees that its successor link's cost has increased or that this link has failed, it updates its distance table, sends a link increase update message⁴ to all of its neighbors and freezes. This frozen state prevents the node from choosing another successor, but it still lets the node update the distance associated with each link. A node remains frozen until it receives, from each of its neighbors, an acknowledgment of the increase update message.
³ This notation refers to a spanning tree rooted at the destination.
⁴ Jaffe-Moss' update message contains, in addition to the destination and the shortest distance to it, a single-bit field which indicates whether the message was sent in reaction to a link increase or not. Note that a failure is considered as a link increase to infinity.
When a node receives an increase update message on its successor link, it changes the link distance, sends update messages to all of its neighbors, and enters the frozen state. The update messages carry the node's new distance to the destination using its old successor. If a node receives an update message on a link which is not its successor, it simply adjusts the distance on that link and sends back an acknowledgment. When a frozen node has received acknowledgments from all of its neighbors, it can, in turn, send an acknowledgment on the link which received an update. It then considers the link with the shortest distance to the destination. If its distance is smaller than infinity, the node chooses it as its new successor and updates the routing table accordingly. Otherwise the successor becomes nil. If the successor is not nil, the node sends an update message to its neighbors.
In the second example, Jaffe-Moss' approach works as illustrated in figure 1.4. In the figure, an additional circle around the node means that the node is frozen; Upd(distance) refers to an increase update message and ACK to an acknowledgment. After the failure, node 2 enters the frozen state and sends an update message to nodes 3 and 4. When node 3 and node 4 receive this message they react as depicted in step 2 of the figure. They freeze because the message came from their successor. They then set the distance on link 2 to infinity, and send an update message to each of their neighbors (nodes 2 and 3 for node 4, and nodes 2 and 4 for node 3). Step 3 shows how the neighbors react to these messages. Each of the neighbors receives the message on a link which is not its successor. So, it can send back an acknowledgment after it has set the link's distance to infinity. Note that, as a result, node 2 has a distance of infinity on the link to each of its neighbors. In step 4, nodes 3 and 4 react to the acknowledgment which they received from each of their neighbors. They then unfreeze and send an acknowledgment to node 2. When node 2 receives both acknowledgments it unfreezes and discovers that it cannot select a new successor for destination 1 because all links have a distance of infinity. As a result, node 2 cannot start a loop which leads to CT∞.
Figure 1.4: Jaffe and Moss' approach (panels: step 1, step 2, step 3, step 4)
This approach also handles the case of multiple links increasing their cost. To do this, each node keeps, in addition to the information mentioned before, a bit vector for each of its links. The nodes use these vectors to keep track of the "outstanding" acknowledgments. When a node freezes, it adds a bit to each of its vectors. Later, when the node receives an acknowledgment on a link, it removes a bit from the vector attached to this link. The node remains frozen until all of its vectors have no bits left.
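The bookkeeping of outstanding acknowledgments can be sketched as follows. This Python fragment is an illustration only: it replaces each bit vector by an integer counter, which records the same quantity (how many acknowledgments are still expected on a link), and the class and method names are assumptions.

```python
class AckTracker:
    """Per-link count of outstanding acknowledgments.

    A counter stands in for the bit vector described in the text.
    """

    def __init__(self, links):
        self.outstanding = {link: 0 for link in links}

    def freeze(self):
        """Called each time the node sends an increase update to all its neighbors."""
        for link in self.outstanding:
            self.outstanding[link] += 1

    def acknowledge(self, link):
        """Record one acknowledgment received on `link`."""
        if self.outstanding[link] == 0:
            raise ValueError(f"unexpected acknowledgment on link {link!r}")
        self.outstanding[link] -= 1

    @property
    def frozen(self):
        """The node stays frozen while any link still owes an acknowledgment."""
        return any(count > 0 for count in self.outstanding.values())
```

With two overlapping increases, a node with links 3 and 4 calls `freeze()` twice and only leaves the frozen state after two acknowledgments have arrived on each link.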
This approach eliminates many CT∞ problems, but the extensive freezing that might be present decreases the responsiveness of the old ARPANET algorithm.
1.2.4 Hagouel
Hagouel [Hag83] also addresses the problem described in the second example of CT∞ (see figure 1.3) by preventing node 2 from taking into account the distance to the destination on links 3 and 4. He does it, however, in a more drastic way. In his approach (which he refers to as "algorithm A") there is no distance table; instead each node keeps only a single cost estimate for each destination. When a node needs to know the distance to a destination on a path starting with a link which is not its successor, it asks its neighbors to send back a message with this distance. However, a node is not allowed to reply to such a request from its successor when this request reflects a link increase.
This approach decreases the probability of CT∞, but there is no proof in [Hag83] that it completely eliminates the problem. One drawback of Hagouel's approach is that a node cannot tell whether a destination is unreachable or whether the message asking for an update has been lost or delayed. This is the case with the second example of CT∞. Node 2 will never get any message about the destination from node 3 or node 4. A positive aspect of this approach is that it does not require any freezing.
1.2.5 Garcia-Luna-Aceves
Garcia-Luna-Aceves has proposed many solutions to the CT∞ problem [GLA86, GLA87, GLA88b]. One of the most recent ones is presented here. It is an extension to the Jaffe-Moss approach.
This approach [GLA88a, GLA88b] adds the concept of feasible successor to the Jaffe-Moss algorithm. A feasible successor to a node is a new successor which has a distance to the destination smaller than or equal to that of the current successor. When a node receives, from its current successor, an update message that reflects an increase in link cost, it does not necessarily freeze, as in the Jaffe-Moss algorithm. If there is a feasible successor, it will become the new successor and no freezing is necessary. Similarly, when a node's link to a successor fails or the link cost increases, the node looks for a feasible successor before going into a frozen state. If there is a feasible successor, once again the node does not need to freeze.
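The search for a feasible successor can be sketched in Python under one reading of the definition above: a neighbor qualifies if the distance to the destination through it does not exceed the distance that was available through the current successor before the increase. That reading, the function name and the argument layout are assumptions of this sketch and are not taken from [GLA88a].

```python
def find_feasible_successor(distance_via, current_successor, previous_distance):
    """Return a feasible successor, or None if the node has to freeze.

    distance_via maps neighbor -> distance to the destination through that
    neighbor; previous_distance is the distance that was available through
    the current successor before the increase or failure.
    """
    candidates = {
        neighbor: dist
        for neighbor, dist in distance_via.items()
        if neighbor != current_successor and dist <= previous_distance
    }
    if not candidates:
        return None
    # Among the feasible neighbors, prefer the one offering the shortest distance.
    return min(candidates, key=candidates.get)
```

When the function returns a neighbor, the node can switch successor immediately; when it returns `None`, the node falls back on the freezing behaviour of the Jaffe-Moss algorithm.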
With this key difference and some other minor ones, Garcia-Luna-Aceves obtains an "algorithm that is always loop-free, operates with arbitrary link and node delays, and provides shortest paths within a finite time after the occurrence of an arbitrary sequence of topological changes" [GLA88a, page 1126]. He gives a proof of this loop freedom in [GLA88a].
In [GLA88c], he also proposes a modification to this algorithm which uses less internodal update coordination. However, it is required to include, both in the update messages and the routing table, the complete path from source to destination in order to avoid CT∞ in the case of multiple failures. So it seems that, unless a single failure can be assumed, this family⁵ of algorithms requires strong coordination.
As Garcia-Luna-Aceves says himself [GLA88c], in a "large network subject to several topological changes [...], the coordination that they require among nodes distant from each other can inhibit the responsiveness of the network". However, it is not clear if it is possible to improve the old ARPANET to avoid CT∞ and to preserve responsiveness, using any of the methods presented so far.
1.2.6 Hold-down
The hold-down solution to the CT∞ is an old solution to this problem. It precedes Jaffe-Moss's solution. It was proposed by McQuillan [McQ74]. It diverges from the ARPANET algorithm when a node receives an increase update message⁶ from its current successor. The node then waits for a fixed period of time before making any change to its routing table. This waiting period is called a hold-down period. It should be long enough so that all the old information has been replaced, using the normal update process, by the new information. The problem with this approach is to determine what is a "long enough" period of time. If the network waits for too short a period, it can still suffer from the CT∞. If it waits too long, it wastes precious resources. With multiple failures the hold-down period can easily be too short, so the technique might not work.
This hold-down period replaces the various mechanisms which actively remove the old information in the Jaffe-Moss, Hagouel, and Garcia-Luna-Aceves approaches. From this point of view, the hold-down approach trades responsiveness for economy: it is less responsive than those approaches but uses fewer messages.
⁵ Old ARPANET, Jaffe-Moss and Garcia.
⁶ An update message which was sent in reaction to an increase of link cost.
Chapter 2
General concepts
This chapter introduces a few conventions and key ideas used in the following text. It presents the three basic networks that are used in examples in the later sections. It presents a tool, called the chronological table of events, which will be used to summarize examples. Such a table can be found at the end of each example in order to give a time perspective. At the end of this chapter, we present the assumptions of the proposed algorithms.
A first convention is to discuss a network as if the destination were the root of a spanning tree, and this root were the highest node in a figure. The other nodes are ordered according to their distance to the destination. When two nodes have a common link on their path to the destination, the one that has a shorter path leading to the destination is said to be uplink (or simply up) from the other. If two nodes which share a link have the same distance to the destination, the link connecting them is called a horizontal link. For example, in figure 2.1 nodes 9 and 10 are said to be up from node 12. Node 11 is horizontal from node 12. Node 13 is down from node 12. This convention is consistent with Tsuchiya's initial text [Tsu87]. However, in the figures, the destination is usually located on the right, in order to save space.
In this document the distance between two nodes is measured in hop counts (the minimum number of links between two nodes). Hop counts are used to make the exposition more straightforward.
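The up, horizontal and down convention reduces to a comparison of hop counts, as the following Python sketch shows. The function name is an assumption, and the hop counts used in the example calls are hypothetical values chosen only to respect the relations stated above for node 12; they are not read from figure 2.1.

```python
def classify_link(own_distance, neighbor_distance):
    """Classify the link to a neighbor, seen from this node, by hop count to the destination."""
    if neighbor_distance < own_distance:
        return "up"
    if neighbor_distance == own_distance:
        return "horizontal"
    return "down"


# Hypothetical hop counts: node 12 at 5, nodes 9 and 10 at 4, node 11 at 5, node 13 at 6.
links_of_node_12 = {
    9: classify_link(5, 4),
    10: classify_link(5, 4),
    11: classify_link(5, 5),
    13: classify_link(5, 6),
}
```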
Another way to render the exposition clear is to consider only one destination in each example. This can be done without difficulty because each node keeps a separate representation of the network for each destination. These representations contain views of the network which are relatively independent, since a message for one destination is rarely relevant for another. For example, consider figure 2.1 with destination nodes 0 and 4. A message informing node 6 about a failure on link 0-1 for destination 0 is not relevant for node 6 with respect to destination 4.
The central concept of Alternate-path Distance-vector Routing (ADR) is the concept of juncture. Most messages carry information about the position of junctures, and this information constitutes a considerable portion of each node's view of the network. A juncture is a node or a pair of nodes with two or more up paths of equal distance to the destination. Such a node is important because, if a link failure occurs on one of the paths to the destination, the second path from the juncture offers an alternative (also called alternate path) to nodes between the failure and the juncture. For example, consider a failure on the link between nodes 1 and 3 (figure 2.1). Node 6 is the closest juncture to the failure, so it would provide a path to the destination for node 3.
Juncture nodes can be divided into two categories depending on the type of alternate paths to the destination they offer. A full juncture offers at least two distinct paths to the destination (for example, node 9 in figure 2.1 offers a path ending with link 1-0 and another path ending with link 2-0). A partial juncture also offers at least two paths to the destination, but those paths have some nodes in common, other than the destination (for example, node 6 in figure 2.1 offers paths 6-3-1-0 and 6-4-1-0). This type of juncture provides an alternative path in case of link failure, but only when the failure occurs between the juncture and the first node common to both paths (that is, if a failure occurs on the link between nodes 0 and 1, node 6 does not offer an alternative to nodes 1, 3 and 4).
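The distinction between the two kinds of juncture can be sketched in Python by comparing the up paths directly. This is only an illustration of the definition: the function name is an assumption, and the protocol itself deduces the status from messages rather than from complete paths. The partial example uses the paths of node 6 quoted above; the full example uses hypothetical node names.

```python
def juncture_type(up_paths, destination):
    """Classify a juncture from its equal-length up paths.

    Each path is a list of node ids, juncture first and destination last.
    Returns "full", "partial", or None when there are fewer than two paths.
    """
    if len(up_paths) < 2:
        return None
    juncture = up_paths[0][0]
    interiors = [set(path) - {juncture, destination} for path in up_paths]
    for i in range(len(interiors)):
        for j in range(i + 1, len(interiors)):
            if not (interiors[i] & interiors[j]):
                return "full"      # two paths share only the juncture and the destination
    return "partial"               # every pair of paths shares an intermediate node


node_6 = juncture_type([[6, 3, 1, 0], [6, 4, 1, 0]], destination=0)
hypothetical = juncture_type([["J", "a", "b", "D"], ["J", "c", "d", "D"]], destination="D")
```

Here `node_6` is classified as partial because both paths pass through node 1, while the hypothetical juncture J is full.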
The difference in the range of nodes a juncture can help in case of failure dictates which nodes should be informed about the alternative. A full juncture offers an alternative to all nodes on an up path to the destination¹. A partial juncture should tell all the nodes on its up paths up to the node at which these paths merge.
¹ However, it is not necessary to go above another full juncture, for reasons explained in section 4.3.
Figure 2.1: First basic network graph
The node where two paths towards the destination merge is called a split point. This name was chosen because during the setup part of the ADR algorithm, the first messages go down from the destination and they split at this node.
2.1 Network description
In this section we first describe a network as a static entity, and then describe how
it develops its self-knowledge (i.e. the setup part of the algorithm) and how it stores
this knowledge. The second and third networks are only described as static entities.
There is a table showing which messages are kept at each node at the end of the
description of each network.
2.1.1 First network
Consider the basic graph example (figure 2.1) where node 0 is the destination. All
arrows point towards the up direction.
There are three split points, apart from the destination, in this network: node 1, node 5, and node 9. Node 6 is the only partial juncture here. It brings alternate paths to nodes 3 and 4. There are three full junctures in this network: node 9, node 12, and node 13. Node 9 provides an alternate path, in case of failure, for nodes 1 to 7. Node 12 provides an alternate path for nodes 8 and 10. Node 13 does the same for node 11. Both of them also provide a long alternate path to nodes which rely on node 9 in case link 0-1 and one link between 5 and 9 fail. They do this by providing an alternate path to node 9.
The network learns about the position of junctures through 3 kinds of messages
the jc message, the papp message and the fapp message. The JC messagc tells nodes
that they are juncture. The app messages tell nodes that therc IS a juncture below
them. It also tells them if it is a full or partial junct ure These messages have the
following structure:
jc msg: abbreviation of juncture configuration message
    • Fj element - the node id of the last full juncture encountered (or
      the first node after destination)
    • Pj list - a list of split points that have been met since the last full
      juncture
fapp msg: abbreviation of full alternate priming path (app) message
    • origin - node id of the full juncture
    • distance - distance to destination along the path which goes
      through the juncture
papp msg: abbreviation of partial alternate priming path (app) message
    • origin - node id of the partial juncture
    • distance - distance to destination along the path which goes
      through the juncture
    • stop field - node id of the matching split point
      or split count - number of split points before the matching split
      point
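The three message layouts above can be written down as plain records. The following is only a minimal sketch: the class names, the field names and the use of strings for node ids are assumptions made for illustration, and the rule that a papp message carries exactly one of the stop field or the split count is one reading of the "or" in the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NodeId = str  # assumption: node ids are represented as short strings


@dataclass
class JcMessage:
    """Juncture configuration message."""
    fj: NodeId                                       # last full juncture met (or first node after destination)
    pj: List[NodeId] = field(default_factory=list)  # split points met since that full juncture


@dataclass
class FappMessage:
    """Full app message, sent upward by a full juncture."""
    origin: NodeId   # node id of the full juncture
    distance: int    # distance to destination along the path through the juncture


@dataclass
class PappMessage:
    """Partial app message, sent upward by a partial juncture."""
    origin: NodeId                     # node id of the partial juncture
    distance: int                      # distance to destination along the path through the juncture
    stop: Optional[NodeId] = None      # node id of the matching split point ...
    split_count: Optional[int] = None  # ... or number of split points before it

    def __post_init__(self) -> None:
        # Assumption: exactly one of the two alternative fields is filled in.
        if (self.stop is None) == (self.split_count is None):
            raise ValueError("give exactly one of stop or split_count")


# The papp message node 6 sends to nodes 3 and 4 in the first network.
example = PappMessage(origin="6", distance=3, stop="1")
```

Keeping the two app messages as separate types mirrors the text, where a node replaces a stored fapp message by a papp message (or the reverse) when a juncture changes status.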
Once the first phase of the setup has classified the links of all nodes as up,
horizontal, or down², the destination sends jc messages to all its neighbors. These
neighbors put their node id in the Fj part of the message and send these jc messages
on their down and horizontal links. These messages are propagated down to the last
nodes, which have no other down or horizontal links. On their way the jc messages
are modified in three ways. Each split point they encounter adds its node id to their
Pj list. Each partial juncture removes some elements from the Pj list (the details
follow). Each full juncture puts its node id in the Fj part of the message and deletes
the Pj list.
For a node receiving a jc message, the Fj part of the message indicates the
message's origin (which neighbor of the destination, or which full juncture, the message
came from). When a juncture has received all its jc messages, it can deduce its status
based on the origin of these messages: partial juncture if they have the same origin,
full juncture if they have different origins.
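The status rule just stated reduces to a comparison of the Fj values. A minimal sketch follows; the function name and the "none" result for a node that has received fewer than two jc messages are assumptions made for illustration.

```python
from typing import Iterable


def juncture_status(fj_values: Iterable[str]) -> str:
    """Deduce the status of a node from the Fj of every jc message it holds.

    Returns "partial" if all messages share one origin, "full" if at least
    two origins differ, and "none" if fewer than two messages were received
    (assumption: such a node is not a juncture at all).
    """
    origins = list(fj_values)
    if len(origins) < 2:
        return "none"
    return "partial" if len(set(origins)) == 1 else "full"


# First network: node 6 receives Fj = 1 twice, node 9 receives Fj = 1 and 2.
status_of_node_6 = juncture_status(["1", "1"])
status_of_node_9 = juncture_status(["1", "2"])
```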
When a juncture discovers its status, it sends the appropriate app message on its
up links. A full juncture sends fapp messages which are propagated up to the next
full juncture. A partial juncture sends papp messages which are propagated up to,
but excluding, the matching split point (the first element common to the Pj lists of
all the jc messages received). The partial juncture sends a jc message with a Pj list
containing only the Pj elements common to all the jc messages received, minus the
matching split point.
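The way a partial juncture derives the matching split point and the Pj list of its outgoing jc message can be sketched as follows. The function name is hypothetical, and one point is an interpretation rather than something the text states: Pj lists are taken to be ordered from the destination downwards, and "first common element" is read as the common split point nearest the juncture, that is, the one added most recently. The examples in this chapter have a single common element, so they do not distinguish this reading from the opposite one.

```python
from typing import List, Optional, Tuple


def merge_pj_lists(pj_lists: List[List[str]]) -> Tuple[Optional[str], List[str]]:
    """Return (matching split point, Pj list for the outgoing jc message).

    pj_lists holds the Pj list of every jc message received by the partial
    juncture, each ordered from the destination downwards (assumption).
    """
    first, *rest = pj_lists
    # Elements present in every list, kept in the order of the first list.
    common = [node for node in first if all(node in other for other in rest)]
    if not common:
        return None, []
    # Assumption: the matching split point is the most recently added common
    # element; the remaining common elements form the new Pj list.
    return common[-1], common[:-1]


# Node 6 in the first network: both jc messages carry Pj = [1].
matching_for_6, new_pj_for_6 = merge_pj_lists([["1"], ["1"]])
```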
For example, node 6 learned it was a partial juncture when it received two jc
messages with the same origin (same Fj). Node 6 can also tell which node is the
matching split point (node 1), by looking for the first identical address in the Pj lists
of both messages. Node 6 puts this split point's node id in the stop field of the papp
messages it sends to nodes 3 and 4. These papp messages bring alternate paths to
nodes 3 and 4.
Nodes 9, 12, and 13 learn that they are full junctures when they receive two jc
messages with different Fjs (node 9 -> 1 & 2, node 12 -> 9 & 2 and node 13 -> 9 &
² This classification requires knowledge of the distance on each link. In this thesis, it is assumed that this information is obtained using a traditional Bellman-Ford method. Further research could develop a protocol which would combine this phase of the setup with the second one, which is described in the next paragraphs.
node  link  dir  dist  messages
1     0     U    1     jc(Fj = 0, Pj = [])
      3     D    7     fapp(origin = 9, distance = 7)
      4     D    7     fapp(origin = 9, distance = 7)
2     0     U    1     jc(Fj = 0, Pj = [])
      5     D    7     fapp(origin = 9, distance = 7), fapp(origin = 12, distance = 9)
3     1     U    2     jc(Fj = 1, Pj = [1])
      6     D    6     fapp(origin = 9, distance = 6), papp(origin = 6, distance = 3)
4     1     U    2     jc(Fj = 1, Pj = [1])
      6     D    6     fapp(origin = 9, distance = 6), papp(origin = 6, distance = 3)
5     2     U    2     jc(Fj = 2, Pj = [2])
      7     D    6     fapp(origin = 9, distance = 6)
      8     D    8     fapp(origin = 12, distance = 8)
6     3     U    3     jc(Fj = 1, Pj = [1])
      4     U    3     jc(Fj = 1, Pj = [1])
      9     D    5     fapp(origin = 9, distance = 5)
7     5     U    3     jc(Fj = 2, Pj = [2,5])
      9     D    5     fapp(origin = 9, distance = 6)
8     5     U    3     jc(Fj = 2, Pj = [2,5])
      10    D    7     fapp(origin = 12, distance = 7)
9     6     U    4     jc(Fj = 1, Pj = [1])
      7     U    4     jc(Fj = 2, Pj = [2,5])
      12    D    6     fapp(origin = 12, distance = 6)
      11    D    7     fapp(origin = 13, distance = 8)
10    8     U    4     jc(Fj = 2, Pj = [2,5])
      12    D    6     fapp(origin = 12, distance = 6)
11    9     U    5     jc(Fj = 9, Pj = [9])
      13    D    7     fapp(origin = 13, distance = 7)
12    9     U    5     jc(Fj = 9, Pj = [9])
      10    U    5     jc(Fj = 2, Pj = [2,5])
      13    D    7     fapp(origin = 13, distance = 7)
13    11    U    6     jc(Fj = 9, Pj = [9])
      12    U    6     jc(Fj = 12, Pj = [])
Table 2.1: Messages stored in the first network after setup
12). Then they send fapp messages on their up links to inform nodes above them of
the alternate path the full juncture is providing.
Each node keeps track of which links are up, which are horizontal and which are
down. It also keeps the distance to destination for all links. Each node keeps the last
jc message it has received on each up and horizontal link. It also keeps all the app
messages it has received on a down link until a message tells the node to remove them.
These two types of messages are kept because they are needed when there is a failure
on a link.
Table 2.1 gives a list of the messages stored at the nodes after the setup.
Figure 2.2: Second basic network graph
2.1.2 Second network
The second network (figure 2.2), where node A is the destination, has two interesting
features. It contains a two node partial juncture (nodes D and E), and node
G has three up links. A two node partial juncture is a partial juncture made
of a pair of nodes with a horizontal link between them (note that there is no arrow
because neither of the nodes is above the other).
In this network there are also two full juncture nodes, I and G. Node I provides
an alternate path to nodes H, F, C, and G. Node G provides an alternate path to
nodes B to F.
Table 2.2 gives a list of the messages stored at the nodes after the setup.
2.1.3 Third network
The important aspect of the third network (figure 2.3) is that it contains a two node
full juncture (nodes Q and R). A two node full juncture is a full juncture
made of a pair of nodes with a horizontal link between them. It is also interesting
to note that if the horizontal link between those two nodes were not there, node R
would be a partial juncture, but node Q would still be a full juncture.
Table 2.3 gives a list of the messages stored at the nodes in the third network after
the setup.
node  link  dir  dist  messages
B     A     U    1     jc(Fj = A, Pj = [])
      D     D    5     fapp(origin = G, distance = 5)
      E     D    5     fapp(origin = G, distance = 5)
C     A     U    1     jc(Fj = A, Pj = [])
      F     D    5     fapp(origin = G, distance = 5), fapp(origin = I, distance = 7)
D     B     U    2     jc(Fj = B, Pj = [B])
      E     H    3     jc(Fj = B, Pj = [B])
      G     D    4     fapp(origin = G, distance = 4)
E     B     U    2     jc(Fj = B, Pj = [B])
      D     H    3     jc(Fj = B, Pj = [B])
      G     D    4     fapp(origin = G, distance = 4)
F     C     U    2     jc(Fj = C, Pj = [])
      G     D    4     fapp(origin = G, distance = 4)
      H     D    6     fapp(origin = I, distance = 6)
G     D     U    3     jc(Fj = B, Pj = [B])
      E     U    3     jc(Fj = B, Pj = [B])
      F     U    3     jc(Fj = C, Pj = [F])
      I     D    5     fapp(origin = I, distance = 6)
H     F     U    3     jc(Fj = C, Pj = [F])
      I     D    5     fapp(origin = I, distance = 5)
I     G     U    4     jc(Fj = G, Pj = [])
      H     U    4     jc(Fj = C, Pj = [F])
Table 2.2: Messages stored in the second network after setup
Figure 2.3: Third basic network graph
node  link  dir  dist  messages
Y     Z     U    1     jc(Fj = Z, Pj = [])
      U     D    5     fapp(origin = Q, distance = 5)
      V     D    6     fapp(origin = R, distance = 6)
      W     D    6     fapp(origin = R, distance = 6), fapp(origin = P, distance = 7)
X     Z     U    1     jc(Fj = Z, Pj = [])
      T     D    5     fapp(origin = Q, distance = 5)
W     Y     U    2     jc(Fj = Y, Pj = [Y])
      S     D    5     fapp(origin = P, distance = 6)
V     Y     U    2     jc(Fj = Y, Pj = [Y])
      R     D    5     fapp(origin = R, distance = 5)
U     Y     U    2     jc(Fj = Y, Pj = [Y])
      Q     D    4     fapp(origin = Q, distance = 4)
T     X     U    2     jc(Fj = X, Pj = [])
      Q     D    4     fapp(origin = Q, distance = 4)
S     W     U    3     jc(Fj = Y, Pj = [Y,W])
      P     D    5     fapp(origin = P, distance = 5)
R     W     U    3     jc(Fj = Y, Pj = [Y,W])
      V     U    3     jc(Fj = Y, Pj = [Y])
      Q     H    4     jc(Fj = Q, Pj = [])
      P     D    5     fapp(origin = P, distance = 5)
Q     T     U    3     jc(Fj = X, Pj = [])
      U     U    3     jc(Fj = Y, Pj = [Y])
      R     H    4     jc(Fj = Y, Pj = [Y])
P     S     U    4     jc(Fj = Y, Pj = [Y,W])
      R     U    4     jc(Fj = R, Pj = [])
Table 2.3: Messages stored in the third network after setup
2.2 Chronological table of events
A chronological table (see table 2.4) has the following structure. The first column
shows the time at which an event occurred. The first row shows the node which is
involved, the second the links of the node, and the third the direction of the link
(U-up, D-down, H-horizontal, *-broken link) at the beginning of the example. Nodes
are separated by double lines. Each entry in the table is made up of two parts. The
first line expresses the event which occurred (failure or message arrival) in the column
corresponding to the link on which it occurred. The second shows the direction of the
links after the event occurred (after a failure or upon receiving a message mentioning
a failure farther away, the direction of a link can change). Here is a list of possible
events³:
FAIL failure on that link
³ Each message of a "receiving a message" event will be explained in the next chapter.
35 FAIL D U *
38 U
43 AF12 D U *
44 AF9 D U U
Table 2.4: Example of time table
5 U
UPD 9 D
UPDxx receiving an update message with xx = alternate distance to destination.
QCxx receiving a quick cost update message with xx = new distance to destination.
Fxx receiving a fapp message with xx = origin of the message (full juncture id)
AFxx receiving an anti-fapp message with xx = origin of the message (full juncture id)
Pxx receiving a papp message with xx = origin of the message (partial juncture id)
APxx receiving an anti-papp message with xx = origin of the message (partial juncture id)
The tables in this document assume that time is discrete and that processing and
sending messages take a constant time. It takes 2 time units for a node to interpret an
incoming control message and adjust its representation of the network. It then takes
1 time unit for each message the node builds and sends to its neighbors (sending time
includes the time to cross the link). If the receiving node is already busy processing
or sending another message, the incoming message waits until the node is free; the
same holds for any new control message arriving while the node is busy with others.
The assumption that time is discrete and that the operations take a constant time
keeps the messages tractable when the simulation is done manually.
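The timing assumptions above are simple enough to compute by hand, and the same arithmetic can be sketched in a few lines. The function and constant names below are hypothetical; the sketch only assumes what the paragraph states, namely 2 time units to interpret a message, 1 time unit per outgoing message sent one after the other, and waiting while the node is busy.

```python
from typing import List, Tuple

PROCESS_TIME = 2  # time units to interpret a message and update the local view
SEND_TIME = 1     # time units to build, send and deliver one outgoing message


def handle_message(arrival: int, busy_until: int, n_outgoing: int) -> Tuple[List[int], int]:
    """Return (delivery times of the outgoing messages, time the node is free).

    arrival    - time at which the message reaches the node
    busy_until - time until which the node is occupied by earlier messages
    n_outgoing - number of messages the node sends in reaction
    """
    start = max(arrival, busy_until)  # the message waits while the node is busy
    ready = start + PROCESS_TIME
    deliveries = [ready + SEND_TIME * (k + 1) for k in range(n_outgoing)]
    free_at = ready + SEND_TIME * n_outgoing
    return deliveries, free_at


# A message arriving at time 35 at an idle node that reacts with two messages.
deliveries, free_at = handle_message(arrival=35, busy_until=0, n_outgoing=2)
```

Chaining calls, with each delivery time used as the arrival time at the next node, gives the kind of time column used in the chronological tables.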
The proposed algorithm also relies on certain assumptions. It assumes that there
is a protocol available to inform the two nodes sharing a link when this link fails. It
also assumes that no failure will occur before the end of the setup period. It also
simplifies its task by allowing new links to be added only during the setup.
Chapter 3
Isolated link failures
This chapter presents the different types of link failures and how a network reacts to
them. Examples of single link failures illustrate which messages are sent and what role
each type of message plays in the adaptation to a failure. Note that the description
of events in the examples follows a depth first fashion rather than a chronological
fashion. That is, it follows the consequences of one message before considering the
consequences of another.
Since this system is distributed, it can be described from the point of view of each
node. The description of a reaction to a failure takes the following approach. When
a node sees one of its links has failed, it can easily identify it as a failure on an up
link, on a horizontal link, or on a down link. The link failure can be classified further
according to the number of links with the same orientation as the one which failed.
The way the node reacts to the failure depends upon the class to which it belongs.
3.1 Failure on down link
When a failure occurs on a down link, the node has to remove the link and any
information that came on that link from its internal representation. In addition, it
has to start a process which will "chase" and remove all messages which have traversed
the broken link as far as they were propagated. These messages (fapp messages and
papp messages) carry information about the network below which is not valid as a
result of the failure. The way the proposed system "chases" these app messages is by
sending other messages which follow exactly the same path as the original messages
and remove them on the way. These other messages are called anti-fapp messages
and anti-papp messages¹.
Figure 3.1: Failure on link 8 - 10
If there were two down links before the failure, then after the failure, the node
would no longer be a split point². Since the last jc message which was sent had this
split point in its Pj list, a new one has to be sent to inform down links of this change.
Consider a failure on link 8-10 from the point of view of node 8 (figure 3.1). Node
8 has to send anti-messages to chase the message that came from juncture node 12.
The message that came from juncture node 12 was a fapp message, so node 8 will send
an anti-fapp message on link 8-5. When node 5 receives the message from node 8, it
looks for a fapp message with origin node 12, removes it and then sends³ the anti-fapp
message on all up links which do not lead to a full juncture or to the destination.
Node 5 then sends the anti-fapp message to node 2. Node 2 does not propagate the
anti-fapp message to node 0 since node 0 is the destination.
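The "chase" in this example can be sketched as a small traversal. The data layout is hypothetical (a dictionary per node with its up neighbors, the fapp origins stored per down link, and an optional full_juncture flag), and only the part of the first network touched by this failure is modeled. The stopping rules are the ones stated in the text: do not forward to the destination or to a full juncture, and stop when there is nothing to remove.

```python
from collections import deque
from typing import Dict, List


def chase_fapp(nodes: Dict[str, dict], start: str, from_link: str,
               origin: str, destination: str) -> List[str]:
    """Propagate an anti-fapp message and return the nodes that removed a fapp.

    start     - first node receiving the anti-fapp message
    from_link - neighbor (down link) the message arrives from
    origin    - full juncture whose fapp message is being chased
    """
    removed_at = []
    queue = deque([(start, from_link)])
    while queue:
        node, link = queue.popleft()
        stored = nodes[node]["fapp"].get(link, set())
        if origin not in stored:
            continue  # nothing to remove, so the anti-message stops here
        stored.discard(origin)
        removed_at.append(node)
        for up in nodes[node]["up"]:
            # Do not forward to the destination or to a full juncture.
            if up == destination or nodes[up].get("full_juncture", False):
                continue
            queue.append((up, node))
    return removed_at


# Nodes 5 and 2 of the first network, with the fapp origins stored per down link.
network = {
    "5": {"up": ["2"], "fapp": {"7": {"9"}, "8": {"12"}}},
    "2": {"up": ["0"], "fapp": {"5": {"9", "12"}}},
}
# Node 8 has lost link 8-10 and sends an anti-fapp for origin 12 up to node 5.
visited = chase_fapp(network, start="5", from_link="8", origin="12", destination="0")
```

After the call, the fapp message of origin 12 is gone from nodes 5 and 2, while the fapp messages of origin 9 are untouched.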
¹ These messages and all other messages used in this system are described in appendix A.
² If there were more, the node would still be a split point. If there were only one, the node would never have been a split point in the first place.
³ Note that if node 5 were not to find a fapp message of origin 12 from link 8, it would not send the anti-fapp message any farther, since there would be nothing to remove.
Figure 3.2: Failure Q - R
3.2 Failure on horizontal link
When a failure occurs on a horizontal link, the node has to remove the link and the jc
message that came on that link from its internal representation of the network. Then
it has to evaluate its new juncture status, and react according to this new status.
There are three possibilities:
• The node preserves its juncture status. If the node is already a juncture with
the same status (it has 2 up links or it is part of another two node juncture),
then the failure does not cause a change in status. The node does not need to
send any message, since nothing has really changed for the rest of the network:
the node still provides an alternate path of the same quality (through a juncture
with the same status).
For example, consider a failure on link Q - R in figure 3.2, from the point of
view of node Q. Node Q is still a full juncture, so it does not need to send any
message.
• The node was a two node full juncture and it becomes a partial juncture
or a two node partial juncture. In this case the node has to send papp
messages on up links and a new jc message on down links to replace the old
messages.
Figure 3.3: Failure D - E
Consider the same failure Q - R from the point of view of node R (figure 3.2).
Node R becomes a partial juncture, because its two up links have the same Fj
(Y). To reflect this change in status, node R sends a papp message to nodes
V & W and a new jc message to node P. When nodes V & W receive this
message, they replace the old fapp message from R by the new message. This
message stops there⁴, but they send an anti-fapp message to node Y to remove
its fapp message. When node P receives the new jc message, it discovers that
it is a partial juncture, since both of its up links have the same Fj (Y). As a
result, it sends a papp message on its up links to replace the fapp message.
• The node stops being a juncture. If the node had only one up link and one
horizontal link, it stops being a juncture when the horizontal link fails. The
node has to send a new jc message on its down links and an anti-app message
on its up link.
Consider a failure on link D - E from the point of view of node D (figure 3.3).
Node D is not a partial juncture anymore, so it has to send a new jc message
to node G. Node G reacts to this message by sending a new papp message up
on links D and E. They do not send an anti-papp message to node B since it
was the split point of the juncture before the failure.
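The three possibilities above amount to comparing the juncture status before and after the horizontal link is removed. The sketch below is illustrative only: the function names, the dictionary layout (link name mapped to the Fj of the jc message stored on it) and the action strings are assumptions, and a node with one up link and one horizontal link is treated as a (two node) juncture, as the third case implies.

```python
from typing import Dict, List


def status(up_fjs: Dict[str, str], horizontal_fjs: Dict[str, str]) -> str:
    """Juncture status from the Fj stored on each up and horizontal link."""
    fjs = list(up_fjs.values()) + list(horizontal_fjs.values())
    if not up_fjs or len(fjs) < 2:
        return "none"
    return "partial" if len(set(fjs)) == 1 else "full"


def react_to_horizontal_failure(up_fjs: Dict[str, str],
                                horizontal_fjs: Dict[str, str],
                                failed_link: str) -> List[str]:
    """Return the messages a node sends after one of its horizontal links fails."""
    before = status(up_fjs, horizontal_fjs)
    remaining = {l: fj for l, fj in horizontal_fjs.items() if l != failed_link}
    after = status(up_fjs, remaining)
    if after == before:
        return []  # same status: nothing changes for the rest of the network
    if after == "none":
        return ["jc on down links", "anti-app on up link"]
    return ["papp on up links", "jc on down links"]  # full became partial


# Failure Q - R seen from Q and from R, and failure D - E seen from D.
from_q = react_to_horizontal_failure({"T": "X", "U": "Y"}, {"R": "Y"}, "R")
from_r = react_to_horizontal_failure({"W": "Y", "V": "Y"}, {"Q": "Q"}, "Q")
from_d = react_to_horizontal_failure({"B": "B"}, {"E": "B"}, "E")
```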
⁴ Node Y is the split point for node R as a partial juncture. However, the fapp message sent by node R did reach node Y, so an anti-fapp message is required to remove it.
3.3 Failure on up link
When a failure occurs on an up link, the node has to remove the link and the jc
message that came on that link from its internal representation. It then has to
reassess the direction of the other links and, if need be, reorganize them and evaluate
its new juncture status. Failures on an up link can be divided into three categories
according to the number of up links present before the failure. An example of each
kind is given after the following list.
• If there are more than two up links before the failure, then the node will remain a
juncture. If there is no change in juncture status, nothing has changed for the
rest of the network.
A full juncture can become a partial juncture if all the up links except the one
that failed had the same origin (same Fj), and it did not have a horizontal link
with a different origin (if it did, the node remains a full juncture). In this case,
the node is still a juncture and the links do not have to be reorganized. However,
the node has to send a new jc message on all down and on all horizontal links as
well as papp messages on all up links. This is done to inform other nodes of the
change in its juncture status. The papp message will tell up nodes to replace
their fapp message by the new papp message. The new jc message will replace
the old ones.
If the node has a horizontal link which carried a jc message with the same Fj
as the remaining up links, then the node which shares the horizontal link will
change status from a (two node) full juncture to a (two node) partial juncture.
This change will be done when the node which shares the horizontal link receives
the jc message from the node which "suffered" the failure.
• If there are two up links before the failure, then the node's reaction to the failure
depends on the presence of a horizontal link.
- If the node is not part of a two node juncture (no horizontal link), then
it loses its juncture status. The node has to inform every node which knew
that it was a juncture that it is not a juncture anymore. So, it sends the
jc message which was received on the remaining up link on all down links
and sends an anti-app message on all up links.
- If the node is part of a two node juncture which changes its juncture
status due to the failure, then the node has to inform nodes which depend
on this juncture about this change in status. It sends jc messages on down
and horizontal links and papp messages on up links⁵.
- If the node is part of a two node juncture which does not change its
juncture status due to the failure, then the node only needs to send a new
jc message on its horizontal link(s).
In all cases the node will not need to reorganize its links.
• If the failure occurs on the only up link, then the links have to be reorganized
whether or not the node is part of a two node juncture. The link(s) with
the shortest distance to the destination will become up link(s). Links with
a distance equal to that of the up link(s) plus their respective link cost are
considered horizontal links. Links with larger distances are labeled down links.
The node sends an update message on the new up link(s) to reach a node which
has access to the destination via another path. Once such a node receives the
update message on one horizontal or up link, it puts this link in the right position
(horizontal or down) and sends a jc message back on that link. At the same time
that it sends an update message, the node sends a qcu message on horizontal
and on down links to inform them of the increase in distance to the destination
on that path. When the jc messages from all new up links have reached the
node which "suffers" from the failure, the node can send a new jc message on all
down and on all horizontal links, and the appropriate app message on all up
links, if the node has become a juncture.
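The relabeling rule in the last case (failure on the only up link) can be sketched as follows. The function name and the dictionary layout are hypothetical. One detail is an assumption: with link costs other than one, a distance strictly between the up distance and the up distance plus the link cost is not covered by the text, and the sketch labels such a link down.

```python
from typing import Dict


def classify_links(distance: Dict[str, int], cost: Dict[str, int]) -> Dict[str, str]:
    """Relabel the remaining links of a node as up (U), horizontal (H) or down (D).

    distance - distance to the destination through each remaining link
    cost     - cost of each remaining link (1 everywhere for a hop count metric)
    """
    best = min(distance.values())
    directions = {}
    for link, d in distance.items():
        if d == best:
            directions[link] = "U"  # shortest distance: new up link
        elif d == best + cost[link]:
            directions[link] = "H"  # up distance plus the link cost: horizontal
        else:
            directions[link] = "D"  # anything else is labeled down (assumption for
                                    # distances between the two values above)
    return directions


# A node left with a link at distance six and a link at distance eight,
# using a hop count metric.
after_failure = classify_links({"7": 6, "8": 8}, {"7": 1, "8": 1})
```

With these values the link at distance six becomes the up link and the other stays down; had the second distance been seven, it would have been labeled horizontal.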
⁵ A failure cannot cause a partial juncture to become a full juncture, so if there was a change it is from full to partial. Hence the app message must be a papp message.
Figure 3.4: Failure on link F-G
3.3.1 Example of failure on one of many up links
Consider a failure on link F-G from the point of view of node G (see figure 3.4).
This failure is on the up link which has a different origin from that of the other up
links. Node G does not lose its juncture status but becomes a partial juncture. It
does this because the jc messages that came on links D and E have the same Fj (B).
It was the jc message coming from node F with Fj = C that made node G a full
juncture. This link failure does not force the node to reorganize its links, but node G
still has to inform nodes around it about the change in juncture status. First node
G sends a papp message on links D and E, then it takes the jc messages from both up
links, builds a new jc message and sends it to node I.
When node D and node E receive the papp message from G they look for a similar
app message. When they discover the fapp message from the same origin, they replace
it by the papp message.
When node I receives the jc message from G, it compares the new message's Fj
with the Fj of the jc message on link H. The two Fjs are different (the one on G has
Fj = B and the one on H has Fj = C), so node I remains a full juncture. However,
the failure might have changed the group of nodes that receive its fapp message⁶, so
it sends a fapp message again on all up links. This message is propagated to nodes
B to H.
⁶ In this case it does change that group: before, nodes D and E were getting node G's fapp message; now they get node I's message instead.
Figure 3.5: Failure on link 7 - 9
However, a failure on one of more than two up links can have fewer consequences.
Consider a failure on link E-G. Since links D and F carry jc messages with different
Fjs, node G would still be a full juncture and the jc message it would build would
not be different. Therefore, there should be no need to send a new jc message down,
and no need to send an app message up. To sum up, no message exchange would be
required from the down part of the failure.
3.3.2 Example of failure on one of two up links
Consider a failure on link 7-9 from the point of view of node 9 (see figure 3.5). This
failure is on one of its two up links. Node 9 loses its juncture status, so it has to inform
the nodes that could be affected by this. To do this it sends an anti-fapp message on
link 6 and a jc message on links 11 & 12.
When node 6 gets this message, it looks in its internal representation for the
corresponding fapp message. When it finds it, it removes it and propagates the anti-fapp
message on up links which do not lead to the destination, here links 3 and 4.
When nodes 3 and 4 receive this message, they search in their internal representations
to locate the corresponding fapp message, remove it, and propagate the anti-fapp
message up to node 1. When node 1 receives the first anti-message, it removes the
fapp message and does not propagate it up to node 0, since node 0 is the destination.
When node 1 receives the second copy of the anti-message, it does not find the message
to remove, so the anti-message is not propagated any further.
Figure 3.6: Failure on link Q - U
Just after node 9 sends the anti-fapp message up, it takes the jc message it received
from node 6, and sends it to nodes 11 and 12.
When node 11 receives the message, it sends it to node 13. When node 12 receives
the jc message from node 9, it compares the message's Fj (1) with the Fj (2) of the jc
message that came on link 10. They are different, so node 12 is still a full juncture.
The new jc message that node 12 would send to node 13 is identical to the last
message sent, so there is no need to send it down again. However, the failure might
have changed the group of nodes that receive its fapp message, so it sends a
fapp message again on all up links.
When node 13 receives the jc message from node 11, it compares the Fj (1) of this
message with the Fj (12) of the jc message received on link 12. It discovers that it is
still a full juncture, since they are different. Again, the failure might have changed the
group of nodes that receive its fapp message, so node 13 sends a fapp message again
on all up links.
The previous example illustrates what happens if there are no horizontal links.
With a horizontal link the situation is different: the two node juncture might keep
its juncture status. For example, if a failure occurs on link Q - U of figure 3.6, node
Q is still part of a full juncture, because the Fj (X) on link T is different from the
Fj (Y) of link R. The only thing node Q has to do is send a jc message to node R,
because this message is different from the last message sent to node R.
Figure 3.7: Failure on link 2 - 5
However, being part of a two node juncture does not necessarily mean that a node
keeps its juncture status. If the failure occurs on link Q - T instead of link
Q - U, node Q becomes a two node partial juncture because link U brings a jc
message with the same Fj as link R. In this case, node Q has to send a papp message
on link U to replace the old fapp message. Node Q still sends a jc message on link
R and if it had down links it would need to send them the jc message.
3.3.3 Example of failure on single up link
Consider a failure on link 2-5 from the point of view of node 5 (see figure 3.7). This
failure is on its only up link (2). Node 5 promotes link 7 to an up link because link 7
has the shortest path to the destination (node 7's distance is six, node 8's is eight).
Link 8 remains a down link because its distance to the destination is greater than
seven (if it were seven it would become horizontal).
To get a jc message from its new up link (7), node 5 sends it an update message.
The role of this message is to change the direction of the links which it traverses⁷,
update their distances and reach a node⁸ which can send back a jc message. Node 7
receives this update message. It reacts to this message by changing the distance on
link 5 from three to nine and reorganizing its links. As a result of this reorganization,
link 5 becomes down and link 9 (with a distance of five) becomes up. Node 7
recognizes that it cannot send a jc message back without help: the new up link was
down, so it does not have a jc message. To obtain that message, node 7 sends an
update message on link 9.
⁷ This message is sent on the horizontal or down link(s) with the shortest path to destination. Consequently, each node which receives it has to have another link with a distance smaller than the one of the link which carried the message. Therefore, the link receiving this message cannot remain up. See appendix A for details on the content of this message and section 4.4 for details on how this message travels in difficult situations.
When node 9 receives the update message from node 7, the information in the
message causes node 9 to label link 7 down (the increase is too big to make it
horizontal). Node 9 loses its juncture status because it is left with only one up link. Node
9 has to inform the rest of the network of this status change. It sends an anti-fapp
message on the remaining up link (6). Node 9 also takes the jc message from link
6 and sends it on all horizontal and on all down links (here nodes 11 and 12). This
message, when sent to node 7, is the answer to the update message that node 7 is expecting.
When node 6 receives an anti-fapp message from node 9, it finds the preceding
fapp message from node 9 and removes it. It then sends an anti-fapp message to nodes
3 and 4 since it knows that neither of them is the destination. These nodes also
find the preceding fapp message from 9 and remove it. Then they send an anti-fapp
message to node 1. Node 1 reacts to the anti-message by removing the preceding
message and not sending the anti-fapp message to node 0 since it is the destination.
When node 11 receives the jc message from node 9, it propagates this message to
node 13. Node 12 also receives a jc message from node 9, but its reaction depends
on a second message that it gets from link 10. The description of this reaction is
postponed to include this second message.
When node 7 receives the jc message from node 9, it sends it to node 5, which
sends it to node 8. The way node 8 reacts to this jc message depends on the effects
of another message node 5 sent before.
Right after node 5 sends an update message to node 7, it also sends a quick
cost update (qcu) message to node 8. The role of this message is to inform nodes
between a failure and a juncture⁹ below it that their distance to the destination
has increased due to the failure. For node 8, the distance on link 5 increases from
two to six. If node 5 were to wait until it received the jc message before telling
the horizontal and down nodes about the distance increase, many normal messages¹⁰
would be routed on longer paths.
⁸ Such a node can only be a juncture (single node or two node juncture).
When node 8 receives the qcu message from node 5, it adjusts the distance of
that link and removes the last jc message which came on that link from its internal
representation. It then discovers that both link 5 and link 10 have a distance of seven,
so both links become up links and the node becomes a juncture. Note that node 8
does not have a jc message for either of its up links, so it cannot tell if it is a full or
partial juncture until those messages come, but it can still route normal messages to
the destination as usual. Node 8 also sends the qcu message on all the links which
were down or horizontal before the qcu message from node 5 came (here link 10).
Node 10 receives that qcu message. In response to that message, node 10 adjusts
the distance to the destination on link 8, reorganizes its links and sends a qcu message
on link 12. As a result of the reorganization, node 10 discovers that now link 8, with
a distance of eight, is down and that link 12, with a distance of six, is up. Since link 8
is down, node 10 removes the last jc message which came on link 8 from its internal
representation.
Node 12 receives one message on each of its up links. The first message, mentioned
before, is the jc message from node 9. The qcu message from node 10 is the second
message. These two messages put node 12 in a special situation. They each bring
information about a different impact of the failure and, depending on which one gets to
node 12 first, the sequence of events will differ. This is called a delay dependent
situation because the order in which they reach the node depends on how much
they are delayed. The final result is the same but the exact number of messages
transmitted depends on which message reaches the node first. The following two
descriptions show what happens in each case.
⁹ This message stops when it reaches a juncture because the juncture provides another path with the same distance. On the other hand, any node which has a link with a distance smaller than the one provided by the link carrying the qcu message still sends a qcu message to its down links, but this message is different from the one received. It reflects a path to destination going through the link with the smaller distance. See appendix A for details on the content of this message.
¹⁰ Non-control messages, i.e. user messages.
• Node 12 receives the jc message from node 9 first. It compares the message's Fj
with the Fj of the last jc message on link 10. Since they are different (link 9's
message has Fj = 1, link 10 has Fj = 2), node 12 keeps its full juncture status.
In reaction to the message, node 12 builds a new jc message, but it does not
send that message to node 13, since this new message is identical to the last
one it sent. However, despite the fact that the juncture status has not changed,
node 12 still sends a fapp message on all up links, because the failure might
have changed the group of nodes that receives node 12's fapp message.
When node 12 receives the qcu message on link 10, it labels the link down, sends
an anti-fapp message on link 9, and sends a new jc message to nodes 10 and 13.
The anti-message will remove the fapp message at nodes 9, 6, 3, 4, and 1.
• Node 12 receives the qcu message from node 10 first. It adjusts the distance of
that link and discovers that the link should be labeled down. This implies that
the node is no longer a juncture. Node 12 needs to tell the rest of the network
its new status, so it takes the jc message from the other up node (9), and sends
it to node 13. It also sends an anti-fapp message on link 9. Node 9 is the only
node with a fapp message from node 12, so the anti-message stops there.
When node 12 receives a jc message from 9, it simply sends it to nodes 10 and
13.
In both cases, nodes 10 and 13 receive the jc message only when node 12 has
received the qcu message.
When node 10 receives the jc message from node 12, it sends it to node 8. When
node 8 has received the jc messages from both of its up links, it discovers that it is a
partial juncture.
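The way a node derives its status from the jc messages on its up links can be sketched as follows. This is a simplified reading of the example, assuming that a node with two or more up links is a juncture, that it is full when the Fj values on its up links differ, and partial when they are all equal. The function name and the use of `None` for "no jc message yet" are illustrative choices.

```python
def juncture_status(up_link_fj):
    """Return the status of a node for one destination.

    up_link_fj maps each up link to the Fj of the last jc message
    received on it, or None if no jc message has arrived yet.
    """
    if len(up_link_fj) in (0, 1):
        return "not a juncture"
    if any(fj is None for fj in up_link_fj.values()):
        return "juncture, status unknown"
    if len(set(up_link_fj.values())) != 1:
        return "full juncture"      # at least two different Fj values
    return "partial juncture"       # every up link reports the same Fj


# Node 8 right after relabeling: two up links, no jc message on either.
node8_status = juncture_status({5: None, 10: None})
# Node 12 while link 10 is still up (Fj = 1 on link 9, Fj = 2 on link 10).
node12_status = juncture_status({9: 1, 10: 2})
```

Node 8 can route normal messages in the first state; it only learns whether it is full or partial once both entries are filled in.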
Node 13 is also in a delay dependent situation. Node 13 is a juncture which
gets information about the failure from its two up links. As in the case of node 12,
the node status and reactions are temporarily different depending on which of the up
links gets the message first. However, this case is slightly different from the case of
node 12. With node 12, one link brings a qcu message which causes the reorganization
of its links, and the loss of its juncture status. Node 13 only gets jc messages, but
there are three different messages which can reach node 13 at different times, and
depending on the order in which they arrive the node reacts differently. Table 3.1
illustrates the various sequences of events. One factor limits the number of possible
sequences of events. Node 12 sends a jc message to node 13 only after it has received
the qcu message from node 10, so only four of the six sequences can take place.

Sequence 1
  event 1   FJC(Fj = 1)             full juncture (α)
  event 2   FJC(Fj = 9)             full juncture (γ)
  event 3   FJC(Fj = 1)             partial juncture (β)
Sequence 2
  event 1   FJC(Fj = [illegible])   partial juncture (β)
  event 2   FJC(Fj = 1)             partial juncture (δ)
  event 3   FJC(Fj = 1)             partial juncture (ε)
Sequence 3
  event 1   FJC(Fj = 1)             partial juncture (β)
  event 2   FJC(Fj = 1)             partial juncture (δ)
  event 3   FJC(Fj = [illegible])   partial juncture (ε)
Sequence 4
  event 1   FJC(Fj = 1)             full juncture (α)
  event 2   FJC(Fj = 1)             partial juncture (β)
  event 3   (none)
Table 3.1: Possible sequences of events at node 13
The table entries have two components. The first line shows the message that
came on the link corresponding to the column where it is located. The second line
shows the status of the node after the message has been processed and the symbol
which refers to a note of explanation. The following list groups the notes together
to explain their common features. Note that before node 13 receives the jc messages,
the Fj from link 11 is 9 and the Fj from link 12 is 12. Node 13's fapp messages have
reached nodes 10, 11, and 12.
α Node 13 remains a full juncture because the Fj(1) of the new message is different
from the Fj(12) of the message received on the other link. Node 13 has to
send a new fapp message on all up links to reach nodes that might not know
that it is a full juncture. The new message will reach nodes 1, 3, 4, and 6 in
addition to nodes 9, 11, and 12, because node 9 is not a full juncture anymore
and the fapp message does not stop there anymore.
β When node 13 becomes a partial juncture, it sends a papp message on all of its
up links. This message will replace the fapp message sent before. It will reach
nodes 9, 11 and 12.
γ When node 13 receives a second jc message which makes it preserve its full juncture
status, it sends a fapp message on all up links.
δ Node 13 receives a jc message which has a Fj different from the one on the other up
link. This normally means that the node should become a full juncture. However,
node 13 recognizes that this situation would lead to a juncture status
oscillation (see footnote 11). Consequently, it remains a partial juncture and does not send
any message.
ε The second jc message makes node 13 become a partial juncture again.
In all cases, node 13 ends up considering itself a partial juncture and sends a papp
message on all up links. There is no formal proof that this will work for all cases, but
for all tests presented in chapter 5 no problems were found. These tests are performed
on a graph representing the most important nodes of the ARPANET and they involve
many different failures.
Figure 3.8 shows the network configuration after the failure.
Table 3.2 summarizes the message exchange during the example.
11. A message that would cause a juncture to change its status from partial to full juncture and then to partial juncture again. See section 4.1 for details.
[Table 3.2: Failure on link 2-5]
[Figure 3.8: After failure on link 2-5]
Chapter 4
Strategies
This chapter presents various approaches which have been used to limit the number
of messages transmitted. It also presents the reasons why these approaches work. At
the end of this chapter a table shows the improvement that each strategy provides.
4.1 Fighting an illusion
There are situations which cause many messages to be exchanged in vain. One of
these situations is called a juncture status oscillation. This name describes a
partial juncture node which oscillates once (partial juncture -> full juncture -> partial
juncture). This situation arises when a node above the split point matching a partial
juncture sends a new jc message. In addition, the message's Fj must be different from
the one contained in the last jc message that was sent. When the partial juncture
matching the split point receives the new jc message on one of its up links, it considers
itself to be a full juncture since the other up links still have jc messages with different
Fj. Consequently, it sends fapp messages on its up links and a new jc message on
down links. When the second jc message arrives at the juncture, the juncture learns
that its status was not full but partial, so it sends papp messages and another jc
message.
For example, in figure 4.1, node C sends a jc message with a new Fj because of
the failure. The message reaches node G on one of its up links first. This link then
has the new Fj and the other link has a Fj equal to C, so node G considers itself to
be a full juncture. It sends a new jc message down on link H and fapp messages to
nodes E and F. When node G receives the second jc message, it discovers that it is
a partial juncture, so it sends another jc message to node H and papp messages to
nodes E and F.
[Figure 4.1: Example for juncture status oscillation]
The first jc message that node G sent could have been skipped, since only the
second one reflects a persistent change. The nodes which receive the app messages
have the same information about the partial juncture as before they received the two
series of messages. So the juncture does not need to send the fapp messages and
papp messages, as they only reflect a temporary illusion.
One way to identify this illusion is to discern the jc messages which caused the
problem from other jc messages. When a partial juncture receives a jc message with
a Fj different from the one of the previous message, it looks for the split point in the
new message's Pj list. If this split point is in the list, then the other up links will
receive a jc message with the split point in the message's Pj list. Therefore, the node
will remain a partial juncture. Once this situation is identified, the juncture neither
changes its juncture status nor sends fapp messages. It waits until all up links have
received their new jc messages to send down a new jc message.
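The test described above can be sketched in a few lines. This is only an illustration of the rule, assuming the short jc message (an Fj plus a list of split point ids) and assuming the juncture already knows its matching split point. The class `JcMessage`, the function `is_status_illusion` and the node id "S" for the split point are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class JcMessage:
    fj: str                                   # id of the last full juncture
    pj: list = field(default_factory=list)    # ids of the split points passed


def is_status_illusion(status, split_point, previous_fj, new_message):
    """True when a partial juncture should keep its status and wait.

    The apparent change to full juncture is ignored when the Fj differs
    from the previous message on that link but the matching split point
    is still present in the new Pj list.
    """
    return (
        status == "partial juncture"
        and new_message.fj != previous_fj
        and split_point in new_message.pj
    )


# Node G of figure 4.1: previous Fj was C, the new message carries another Fj.
# "X" and "S" are placeholder ids, not nodes named in the figure.
wait_for_other_links = is_status_illusion(
    "partial juncture", "S", "C", JcMessage(fj="X", pj=["S"])
)
```

When the function returns true, the juncture sends neither fapp messages nor a new jc message until every up link has delivered its new jc message.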
4.2 Shorter jc messages
In [Tsu87], the jc message has the following structure:
• destination - node id of the destination (if more than one destination)
• Fj element - the node id of the last full juncture encountered (or the first node
after destination)
• Pj list - a list of pairs of the form:
(node id of last split point, distance to destination at split point)
The distance field among the elements in the Pj list is used to inform partial
junctures how far their papp messages should travel. When a node discovers that its
juncture status is partial, it looks at the Pj lists of the jc messages it received on its
up links. The first Pj element which is common to all of these lists was added to the
list by the matching split point. This element contains the split point's node id and
its distance to the destination. The partial juncture subtracts this distance from its
own distance to the destination in order to obtain the distance to the split point.
In our approach, the distance field was eliminated from the Pj element. This
modification decreases the space used by the Pj list by 50 % (if the distance uses the
same space as the node id). In a jc message with one Pj element this represents a
reduction of 25 % of the size of its equivalent in the Tsuchiya approach. As the number
of Pj elements increases, the reduction approaches 50 % ($\lim_{n \to \infty} \frac{n}{2n+2} = 50\%$). As a
result of the use of the shorter jc message, the total bandwidth was reduced by 10 to
25 % during tests (see footnote 1).
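The 25 % and 50 % figures can be checked with a small calculation. The sketch below assumes, as the text does, that a distance occupies the same space as a node id, and it counts the destination and the Fj element as one unit each. The function names are illustrative.

```python
from fractions import Fraction


def jc_size(n_pj, with_distance):
    """Size of a jc message in id-sized units (assumed field widths)."""
    header = 2                              # destination + Fj element
    per_element = 2 if with_distance else 1  # (node id, distance) or node id only
    return header + per_element * n_pj


def reduction(n_pj):
    """Fraction of the Tsuchiya-style message saved by dropping the distance."""
    full = jc_size(n_pj, with_distance=True)
    short = jc_size(n_pj, with_distance=False)
    return Fraction(full - short, full)


one_element = reduction(1)           # a quarter of the message
many_elements = float(reduction(10**6))  # close to one half
```

The saving is n / (2n + 2) for n Pj elements, which is exactly one quarter for a single element and tends to one half as the list grows.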
There are two ways to avoid using the distance. If there is a common Pj element,
the juncture puts the node id of this Pj element in the papp message at the stop
field. This field allows the message to stop one link away from the split point (the
common Pj element's node id is the split point's node id). If there is no common Pj
element, the number of Pj elements is used as a measure of how far the papp message
should go. To see how this works, consider the situation in which a partial juncture
would find no common Pj element.
In figure 4.2, node F receives a jc message with the Pj list [Pj(A),Pj(C)] on link
E and one with [Pj(D)] on link D. Node D removed Pj(A) because node D was a
partial juncture matching the split point A. This is the same node A that is the
split point matching node F. In general, a partial juncture does not find a common
Pj when there is another partial juncture above which matches the same split point.
This partial juncture might not find a common Pj, but this would imply that there
is yet another partial juncture matching the same split point. There must be a partial
juncture above the other partial junctures which does find a common Pj element.
[Figure 4.2: Short juncture message - first example]
1. The jc message becomes the largest message when there are more than three Pj elements in the list. In addition, these messages are sent frequently. Every link carries one during the setup, and they represent around 15 % of the messages sent in reaction to a failure.
When a partial juncture finds no common Pj, it does not use the stop field in the
papp message. Instead, it uses another field called the Pj count. For each up link,
the juncture counts the number of Pj elements in the jc message from that link, puts
this number in the Pj count of a papp message, and sends the message up. Each split
point which receives a papp message decreases this count. The message stops when
the count gets to zero. For example, node F sends a papp message to node E with a
Pj count of two, and another one to node D with a Pj count of one. These messages
stop when they reach node A.
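The choice between the stop field and the Pj count can be sketched as follows. This is a minimal illustration under the assumption that a papp message is a small record with either a `stop` or a `pj_count` entry. The exact message layout is given in appendix A, so the dictionary keys and the function name here are hypothetical, and "first common element" is taken to mean first in the order of one of the lists.

```python
def build_papp_messages(pj_lists):
    """Build one papp message per up link of a partial juncture.

    pj_lists maps each up link to the Pj list (split point ids) of the
    last jc message received on that link.
    """
    lists = list(pj_lists.values())
    # First element of the first list that appears in every other list.
    common = next(
        (pj for pj in lists[0] if all(pj in other for other in lists[1:])),
        None,
    )
    if common is not None:
        # The common element is the matching split point: stop next to it.
        return {link: {"stop": common} for link in pj_lists}
    # No common element: let each split point on the way decrease a counter.
    return {link: {"pj_count": len(pj)} for link, pj in pj_lists.items()}


# Node F of figure 4.2: [Pj(A), Pj(C)] on link E and [Pj(D)] on link D.
node_f = build_papp_messages({"E": ["A", "C"], "D": ["D"]})
```

For node F this yields a Pj count of two on link E and of one on link D, matching the example in the text.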
There is one minor side effect to this use of the Pj count. The papp message
can, in certain cases, stop before it reaches the split point. This side effect does not
have major consequences, because the nodes still receive another papp message with
a shorter distance to the destination (see footnote 2). For example, in the network of figure 4.3, node
13 sends a papp message to nodes 11 and 12. The message for node 11 has a Pj count
of two and reaches node 1. The message for node 12 has a Pj count of one and stops
when it reaches node 5. However, node 12's papp message reaches node 3, providing
node 3 with a distance on node 3's down path.
[Figure 4.3: Short juncture message - second example]
2. The reason for sending papp messages above a partial juncture is to guarantee that a distance is provided on the down links of nodes which have a partial juncture below.
One drawback to this approach is that, when there is no common Pj element at the
partial juncture, the papp message reaches the split point. However, this drawback is
compensated for by the reduction in the jc message's size.
4.3 Stopping messages
Another way the number of messages transmitted can be reduced is by stopping them
as soon as possible. This section presents two types of conditions to tell whether or not
a message should be propagated further. The first type uses knowledge of a node's
status to make this decision about app messages. The second type uses information
about previous messages to make this decision about jc messages and app messages.
A papp message can stop one link away from a full juncture since this message
does not provide an alternate path for this juncture. The path that the papp message
refers to leads back to the juncture. If this were not the case, the partial juncture which
sent the message would be a full juncture (see footnote 3). A node can tell if the next node is a full
juncture because, if it is, the Fj part of the jc message from that node will contain
the juncture's node id.
3. One of a juncture's up links would have a jc message with Fj equal to the full juncture's node id, and at least one other up link would have one with a different Fj.
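The full juncture test used to stop a papp message can be written as a one-line comparison. The sketch below follows the rule literally and assumes the node keeps, for each up link, the Fj of the last jc message received on it. The function names are illustrative, and the node ids in the example are placeholders.

```python
def next_node_is_full_juncture(next_node_id, fj_from_next_node):
    """The jc message sent by a full juncture carries its own id as Fj."""
    return fj_from_next_node == next_node_id


def should_forward_papp(next_node_id, fj_from_next_node):
    """A papp message stops one link away from a full juncture."""
    return not next_node_is_full_juncture(next_node_id, fj_from_next_node)


# Placeholder ids: neighbor 12 announced Fj = 12, neighbor 10 announced Fj = 1.
stop_before_12 = not should_forward_papp(12, 12)
continue_to_10 = should_forward_papp(10, 1)
```

Since the Fj element is also defined as the first node after the destination, a real implementation would have to treat that node separately; the sketch does not cover this case.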
On the other hand, the papp message has to go beyond a partial juncture because
there are cases where, if the message were to stop at the juncture, some nodes would
not receive any papp messages (see footnote 4). For example, in figure 4.3, if the papp message were
to stop at node 10 because it is a partial juncture, node 3 would not receive any papp
messages.
The reason to stop fapp messages at full junctures is not as strong as the one to
stop the papp message. The fapp message can stop because it only provides a longer
path to the destination than the full juncture's alternate path. Therefore, it is not
necessary to send a fapp message that far. However, a fapp message reaches a full
juncture to bring a distance on the juncture's down link which receives the message.
Of course, no app messages should be sent to the destination since it is the function
of these messages to tell nodes alternate ways to reach the destination. The next
paragraphs show how information about the previous message can be used to
decide whether or not a message should be further propagated.
A node should not send a jc message which is identical to the last one it sent.
This new message would not bring any additional information to the nodes below
and the app messages which could come back would not reach a greater number of
nodes than the messages which were sent before. There are two cases where the two
jc messages could be identical. The first case is when there is a failure on an up link,
but despite the failure, the node keeps its juncture status. The second is when a node
receives a jc message with a Fj different from the one of the last message received,
but keeps its juncture status in spite of this.
A split point can receive identical app messages on different down links. When a
node has sent the app message on the up links, there is no point in sending it again.
Therefore, when a split point receives an app message, it checks whether or not it has
received an identical message. If it has, the app message should not be propagated.
One of the reasons that the node keeps the last app message it received is so that it
can compare it with the next one.
4. Without an app message a node assumes that the distance on a down link is equal to infinity.
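Both duplicate checks (the last jc message sent, and the last app message received at a split point) amount to remembering one message per key and comparing it with the next one. The sketch below shows one possible way to do this. The class name, the key layout and the tuple encoding of messages are assumptions made for the illustration.

```python
class DuplicateFilter:
    """Remembers the last message seen for each key.

    A key could be ("jc", destination) for messages a node sends, or
    ("app", destination, origin) for app messages a split point receives.
    """

    def __init__(self):
        self._last = {}

    def should_propagate(self, key, message):
        """Return False when message equals the previous one for this key."""
        if self._last.get(key) == message:
            return False
        self._last[key] = message
        return True


# Hypothetical jc messages encoded as (Fj, tuple of Pj ids).
jc_filter = DuplicateFilter()
first = jc_filter.should_propagate(("jc", "dest"), (12, (1, 5)))
repeat = jc_filter.should_propagate(("jc", "dest"), (12, (1, 5)))
changed = jc_filter.should_propagate(("jc", "dest"), (1, (1, 5)))
```

Only the first and the changed message are propagated; the identical repeat is dropped.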
[Figure 4.4: Guiding update message]
4.4 Guiding update messages
This approach does a little more than save messages; it prevents the network from
choosing the wrong path to the destination in certain situations. This wrong choice
seems to be temporary, but the occurrence of another failure during that period
could lead to problems. For this reason, the improvement in performance due to this
approach is not presented in a table at the end of this chapter.
The choice of the path that an update message follows should be made by the
node which "suffers" the failure, because a node below the failure point could choose
a path which is affected by the failure. It is not enough to send the update message
on the link with the shortest distance, as the following examples show.
Consider figure 4.4, where node 12 is a full juncture and node 9 is a partial
juncture. During the setup, node 1 receives a fapp message with origin node 12, but
does not receive a papp message from 9 because node 1 is the split point for that
partial juncture. In the case where there is a failure on link 1 - 0, node 1 sends an
update message to node 7.
If node 7 were to choose on which link the update message should be sent, it would
select link 9 because its distance is smaller, but this path is also affected by the failure
and when the news of the failure comes the distance will become larger. However, if
the failure were on link 4 - 1, and node 7 were still asked to choose on which link to
send the update message, then link 9 would be the right link to choose, since it offers
the shortest path and this path would not be affected by this failure.
The node which "suffers" the failure already knows the distance of the path which
should be used. In the case of failure 1 - 0, node 1 did not receive the papp message
of node 9, so it does not have the distance associated with the path going through
node 9 and it only knows of the distance through node 12. In the case of failure 4 -
1, node 4 did receive the papp message of node 9 and this path is the shorter way to
reach the destination (six through node 9 vs eight through node 12).
One way to get the update message to use the shortest path is to "guide" it by
putting the distance of the selected path in the message. In this way, a node below
the failure can select the right path.
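The difference between letting node 7 choose and guiding it can be sketched as follows. This is a hypothetical illustration only: it assumes the distance carried by the update message is already expressed relative to the receiving node, and the distances used for node 7's links (4 through node 9, 6 through node 12) are invented for the example rather than read from figure 4.4.

```python
def choose_link_naive(link_distance):
    """What a node below the failure would do on its own: shortest link."""
    return min(link_distance, key=link_distance.get)


def choose_link_guided(link_distance, guided_distance):
    """Follow the link whose recorded distance matches the message."""
    for link, distance in link_distance.items():
        if distance == guided_distance:
            return link
    return None  # no link offers the announced path


# Invented distances at node 7 (assumption, see lead-in).
node7_links = {9: 4, 12: 6}

naive = choose_link_naive(node7_links)
# Failure on link 1 - 0: node 1 only knows the path through node 12.
guided_failure_1_0 = choose_link_guided(node7_links, 6)
# Failure on link 4 - 1: node 4 selected the shorter path through node 9.
guided_failure_4_1 = choose_link_guided(node7_links, 4)
```

The naive choice always picks link 9, which is wrong for the failure on link 1 - 0, while the guided choice follows whichever path the failing node selected.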
4.5 Improvements due to the strategies
This section presents three tables which summarize the improvements due to each
of the strategies introduced in this chapter. These tables contain results from two
graphs: the basic network of figure 4.5 and a network representing essential nodes of
the ARPANET of 1983 (see footnote 5). The tests on the basic network involve the failure of link
2-5, and the reaction of each node seeing all the other nodes as a destination. The
results are averaged over the number of nodes. The ARPANET example used here
shows the number of messages and the number of bytes sent in reaction to a failure
on the link ARPA-USC, using MITRE as destination.

              14 nodes basic network     98 nodes ARPANET network
              setup      failure         setup      failure
continue      39         23              356        212
stopping      34         21              280        206
Table 4.1: Number of messages sent

Table 4.1 compares the number of messages sent when the messages stop as early
as possible to the number of messages sent when messages are propagated further.
Table 4.2 contrasts the total number of bytes sent using the short jc messages and
Tsuchiya's jc messages.

                           14 nodes basic network     98 nodes ARPANET network
                           setup      failure         setup      failure
using Tsuchiya's jc msgs   292        113             3513       2139
using short jc msgs        258        103             2671       1756
improvement                11 %       9 %             24 %       18 %
Table 4.2: Total number of bytes sent in all messages

Table 4.3 shows the result of the reaction to a failure on link ARPA-USC in the
ARPANET network with node MITRE as a destination. The improvement due to
this strategy in the basic network is not shown, because none of the examples selected
contained a juncture oscillation.

                           failure
ignoring jc oscillation    251
avoiding jc oscillation    206
improvement                18 %
Table 4.3: Number of messages sent

[Figure 4.5: A basic network graph]
5. This network is considered in detail in the next chapter.
Chapter 5
Performance
In this chapter, we examine the performance of the proposed failure tolerant ADR in
the face of failures. We organize our findings in the following way. The first section of the
chapter introduces the network used for the performance analysis and the simulator
that performs the test. The second section considers the effects of a single failure on
98 different destinations. The third section investigates how the presence of many
destinations affects the way in which one node reacts to a failure. The fourth section
studies the impact of simultaneous failures. The last section presents an example of a
small network with a failure on a single link. It compares two reactions to this failure:
Garcia-Luna-Aceves's approach [GLA88b] and the ADR approach.
The system performance presented in this chapter was evaluated using the network
of figure 5.1. This figure represents 98 major nodes of the ARPANET in 1983. This
network was chosen since it is of a reasonable size and the connections do not suffer
from an artificial symmetry (as opposed to a mesh or other geometrical networks).
In addition, this example constitutes a realistic situation for the application of such a
system. Out of the 98 nodes, the following were selected as destinations for the tests
which were performed on a small number of destinations:
BERK PURDUE RCC5
UCLA COLLINS ANDREW
MIT44
MITRE
These destinations were selected because they represent the different sections of
the network. They are far apart in order to represent as wide a range of nodes as
possible.
[Figure 5.1: ARPANET/MILNET Geographic Map, December 1983]
We chose to fail links which tie together large sections of the network. This way,
the tests reveal whether the system reacts to failures in critical situations. The tests described
in this chapter involve failures on the following major links of the network:
UTAH-LBL RCC5-UWISC HARVARD-SCOTT
STANDFORD-ISI22 ARPA-USC MITRE-GUNTER
The failures on links UTAH-LBL, RCC5-UWISC, HARVARD-SCOTT, ARPA-USC
and MITRE-GUNTER were chosen because they lie on the backbone between
the east and the west. The STANDFORD-ISI22 link was chosen because it connects
the north-west area with the south-west area.
This simulator is discrete-event stepped. It uses absolute time units which are
not based on speed of lines and other physical measures. These units only reflect
the relative order of magnitude of the time taken by each operation a node performs.
The simulator assumes that it takes 2 time units for a node to interpret an incoming
control message and adjust its representation of the network. Then it takes 1 time unit
for each message the node builds and sends to its neighbors (sending time includes
the time to cross the link). If the receiving node is already busy processing or sending
other messages, the incoming message waits until it is free. The fact that it takes
a fixed time to react to a message makes the simulator deterministic.
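The timing model just described can be sketched with a small event queue. This is not the thesis simulator, only a minimal illustration of the stated costs (2 time units to process a message, 1 time unit per message sent, waiting when the node is busy). The function `simulate`, the `react` callback and the three-node chain are hypothetical.

```python
import heapq

PROCESS_TIME = 2  # units to interpret a control message and update state
SEND_TIME = 1     # units per message built and sent (includes link crossing)


def simulate(initial_events, react):
    """Run until no message is left.

    initial_events: list of (arrival_time, node, message).
    react(node, message): returns a list of (neighbor, message) to send.
    Returns (time at which the last node becomes idle, messages sent).
    """
    queue = []
    order = 0  # tie-breaker so equal arrival times stay in insertion order
    for arrival, node, message in initial_events:
        heapq.heappush(queue, (arrival, order, node, message))
        order += 1
    free_at = {}  # time at which each node finishes its current work
    sent = 0
    finish = 0
    while queue:
        arrival, _, node, message = heapq.heappop(queue)
        clock = max(arrival, free_at.get(node, 0)) + PROCESS_TIME
        for neighbor, outgoing in react(node, message):
            clock += SEND_TIME
            heapq.heappush(queue, (clock, order, neighbor, outgoing))
            order += 1
            sent += 1
        free_at[node] = clock
        finish = max(finish, clock)
    return finish, sent


def forward_along_chain(node, message):
    # Hypothetical reaction: A forwards to B, B forwards to C, C sends nothing.
    next_hop = {"A": "B", "B": "C"}
    return [(next_hop[node], message)] if node in next_hop else []


finish_time, messages_sent = simulate([(0, "A", "update")], forward_along_chain)
```

Because every cost is a fixed constant and ties are broken by insertion order, two runs with the same input produce the same result, which is the determinism the text refers to.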
The simulator sends only control messages. A more complete analysis should
also include normal messages. We only attempt to estimate the number of control
messages required with the amount of time required to process them. While this is
not an "ideal" measure of the algorithm performance, we should note that the absence
of normal traffic is not catastrophic since in an operational system, control messages
would be handled as soon as they arrive on the queue.
Another limitation of this simulator is that it does not handle ho node partial
juncture. This feature was not included because its implementation would have
required considerable time. The system in its present form can provide us with
significant results.
For the simulation, the address of a node takes two bytes, and the distance is
stored in at least 10 bits. The size of the messages sent, and the storage space
allocated inside the node reflect these dimensions. The exact contents of each type of
message is described in appendix A. Appendix C shows the structure of the node's
buffer. This buffer contains all the information a node maintains about the network.
Appendix B contains pseudo-code which gives an outline of the way a node reacts
to the different types of messages. This pseudo-code does not present, in detail, the
special cases which are covered in the actual code of the simulator.
5.1 A single failure
The impact of a failure depends upon two factors. These are (1) the number of nodes
which change their role because of the failure (split points which stop being split points
and junctures which change their status), and (2) the number of nodes which depend
upon these junctures and split points. For the same link failure, these parameters
vary according to the destination. For example, a failure on the link between MITRE
and GUNTER causes 4 messages to be exchanged for the destination PURDUE and
144 for COLLINS.
Figures 5.2, 5.3 and 5.4 contain three graphs which summarize the impact of a
failure on the ARPA-USC link for each of 98 destinations. Each graph presents the
number of destinations where a specific parameter falls into a given range of values.
The three parameters represented are: the number of messages sent, the total number
of bytes in the messages, and the time elapsed between the failure and the network
stabilizing. The following describes how the graphs should be read. A bar up to y
destinations at position z2 (on the x-axis) means that there are y destinations which
use more than z1 and at most z2 "resources" in reaction to the failure, where z1 is
the position of the preceding bar. For example,
in figure 5.2 the bar up to the line indicating 25 destinations at position 50 on the
x-axis means that there are 25 destinations which use more than 25 and at most 50
messages in reaction to the failure.
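The reading rule for the graphs corresponds to bins that are open on the left and closed on the right. The sketch below shows this binning, assuming integer measurements and a fixed bin width. The function names and the sample values are made up for the illustration and are not results from the thesis.

```python
from collections import Counter


def bin_position(value, width):
    """Upper edge z2 of the bin (z1, z2] that contains value."""
    return -(-value // width) * width  # integer ceiling, then scale


def histogram(values, width):
    """Number of destinations per bar, keyed by the bar's x position."""
    return Counter(bin_position(v, width) for v in values)


# Made-up message counts for five destinations, bars every 25 messages.
bars = histogram([4, 26, 50, 51, 144], 25)
```

A destination with exactly 50 messages is counted in the bar at position 50, while one with 51 messages falls into the next bar.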
The graph in figure 5.2 shows that the number of messages required to react
to the failure is not too large. For the majority of destinations (62 %), at most 50
messages are sent. The average number of messages sent is 75. It is instructive to
study the destinations which involve more than 400 messages. These destinations
have a point in common: they are very close to the failure (one link away from the
failure). This fact could be exploited. Since most of these nodes end up sending
more messages than during the setup phase (see table 5.1), it would be interesting
to identify this situation and initiate a setup procedure. Using the setup procedure
for all destinations within two links of the failure would decrease the average number
of messages sent to 71. If the setup procedure were used whenever it takes fewer messages
than the normal failure procedure, the average would be brought down
to 69 messages.
[Figure 5.2: Messages sent in response to the ARPA-USC failure. x-axis: number of messages used to react to failure; y-axis: number of destinations]

               messages sent for
destination    setup      failure
USC            312        501
CIT            289        440
RAND           300        584
ARPA           283        591
ARPAI06        267        244
Table 5.1: Number of messages sent for destinations close to failure
Figure 5.3 shows the distribution of the total number of bytes sent in response to
the failure. The majority of destinations (65 %) require at most 360 bytes to react to
the failure. However, the average number of bytes sent is 615 bytes. This is again due
to the small number of destinations close to the failure which send an excessively high
number of messages. Still, this average represents only 1.07 % of the bandwidth on a
network with a 56 Kb line. Using the setup procedure only when it takes fewer messages
than the normal failure procedure would bring the average down to 566 bytes.
[Figure 5.3: Bandwidth used to respond to the ARPA-USC failure. x-axis: total number of bytes sent in messages; y-axis: number of destinations]
Figure 5.4 shows the distribution of elapsed time between the failure and the time
at which the network reached stability. Fifty-five destinations take at most 27 time
units to reach this operating point.
5.2 Mutual influence of destinations
The presence of multiple destinations can cause two types of interference. Messages
can interfere either with the speed at which the network adapts to a failure or with
the actual way in which the system reacts. This section is concerned with both types
of interference.
[Figure 5.4: Time required to respond to the ARPA-USC failure. x-axis: time to react to failure (in time units); y-axis: number of destinations]
In the previous section we pointed out that the impact of a failure depends heavily
on the destination. The following example combines six destinations (see footnote 1) in order to
decrease the weight of an individual destination in the result. The destinations are
combined as follows: a result for N destinations is the average of the results from all
combinations of N destinations taken out of the six destinations. Using this method,
an example with six destinations requires 63 tests, while one with eight requires 297
tests. Consequently, the number of destinations considered is small.
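The averaging scheme can be sketched with the standard library. The code below assumes a function `run_test` that returns one measurement (for example the time to stabilize) for a given subset of destinations. That function is a stand-in here, since the actual measurements come from the simulator.

```python
from itertools import combinations
from statistics import mean

DESTINATIONS = ["Berk", "Purdue", "RCC5", "UCLA", "Collins", "MITRE"]


def averaged_results(destinations, run_test):
    """Average run_test over every subset of N destinations, for each N."""
    results = {}
    tests = 0
    for n in range(1, len(destinations) + 1):
        subsets = list(combinations(destinations, n))
        results[n] = mean(run_test(subset) for subset in subsets)
        tests += len(subsets)
    return results, tests


# Stand-in measurement: just the number of destinations in the subset.
results, number_of_tests = averaged_results(DESTINATIONS, len)
```

For the six destinations used in this section, the loop visits every non-empty subset once, which gives the 63 tests mentioned above.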
The graph in figure 5.5 shows that the time required to adapt to a failure increases
with the number of destinations to which a node can route messages. This time
increase is caused by the fact that when there is more than one destination, a message
with information pertinent to one destination might need the attention of a node
at the same time as a second message which is pertinent to another destination.
Consequently, one of the messages has to wait. An occasion for waiting presents itself
as soon as the failure occurs.
When a node sees one of its links fail, it sends messages to its neighbors. For each
destination that the node can route to, the node sends a different set of messages.
1 Berk, Purdue, RCC5, UCLA, Collins and MITRE
[Figure 5.5: Time to react to the ARPA-USC failure vs number of destinations. Horizontal axis: number of destinations, 1 to 5; vertical axis: time, 30 to 65 time units.]
Each set of messages carries information about the change in the representation of the
network associated with one destination. When there is more than one destination,
the node sends all of the messages associated with one destination before sending
the messages associated with another destination. This is the first factor which
contributes to the increase in time before the network stabilizes. The second factor is
that with additional destinations, many messages travel in the same direction, so
there is more chance that a message must wait for a busy node.
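The first factor can be pictured with a toy model. The sketch below is not the simulator used for the measurements: it assumes, purely for illustration, that a node sends one message per time unit and that each destination contributes a fixed batch of messages. Under those assumptions the batch for the k-th destination cannot leave before the earlier batches are done.

```python
from itertools import accumulate


def batch_finish_times(messages_per_destination, time_per_message=1):
    """Time at which each destination's batch has been fully sent.

    Batches are sent back to back, one destination after another, which is
    the behaviour described in the text. The one-message-per-time-unit rate
    is an assumption made for this sketch.
    """
    return list(
        accumulate(n * time_per_message for n in messages_per_destination)
    )


# Three destinations with two messages each (hypothetical numbers).
print(batch_finish_times([2, 2, 2]))  # [2, 4, 6]
```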
These two factors combine to produce the increase in time as the number of
destinations grows, as depicted in figure 5.5. In this figure, the presence of a second
destination increases the time before the network reaches stability by 10 time units.
With the addition of more destinations, the time increase is only 5 or 6 time units
per additional destination. This difference can be explained by the fact that with two
destinations, most of the waiting for busy nodes occurs sequentially; with more than
two destinations a portion of the waiting time can occur in parallel. The fact that
after the second destination, there appears to be a regular slope in the time increase
is encouraging. Such a slope would keep the progression within reasonable bounds.
[Figure 5.6: Number of messages sent in response to the ARPA-USC failure. Horizontal axis: number of destinations, 1 to 5; vertical axis: messages, 100 to 550.]
Figure 5.6 shows the influence that the number of destinations has on the number
of messages needed to adapt to a failure on the ARPA-USC link. This graph
indicates that the number of messages sent increases almost linearly with respect to
the number of destinations. The increase in the number of messages is not perfectly
linear because when a message is delayed, a juncture can sometimes react as if the
message was not going to come. This situation is corrected as soon as the message is
received, but this requires some additional messages.
The presence of multiple destinations appears to increase the number of messages
linearly. The time to react to a failure appears to increase by a roughly constant amount
with each additional destination; since this constant is rather small, the time increase
is rather slow. Overall, it appears that the mutual influence of multiple destinations,
when a failure occurs, is not an obstacle to the applicability of the proposed approach.
[Figure 5.7: Number of messages sent in response to the number of failures. Horizontal axis: number of failures, 1 to 5; vertical axis: messages, 50 to 500.]
5.3 Multiple failures
This section studies the impact of multiple, simultaneous failures on the performance
of the proposed system. We consider up to five simultaneous failures and observe the
increase in the number of messages and the time to adapt to failures.
As in the previous sections, to decrease the importance of any single result, each
point on the graphs of figures 5.7 and 5.8 represents the average of many results. First,
five major links were selected for the failures2. Then the failures were combined, as
in the last section. The result for N failures is the average of the results from all
combinations of N links taken from the five possible link failures. In addition, each
set of failures is observed from five different destinations3.
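To give a feel for how many results stand behind each plotted point, the following sketch counts them. It assumes that every pairing of a failure set with an observing destination counts as one simulation run, which is one reading of the procedure described above; the constant and function names are illustrative.

```python
from math import comb

N_LINKS = 5         # candidate link failures (footnote 2)
N_DESTINATIONS = 5  # observing destinations (footnote 3)


def runs_per_point(n_failures):
    """Runs averaged into the point for n_failures simultaneous failures."""
    return comb(N_LINKS, n_failures) * N_DESTINATIONS


print([runs_per_point(n) for n in range(1, 6)])  # [25, 50, 50, 25, 5]
```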
Figure 5.7 shows the influence that the number of failures has on the number of
messages needed to adapt to these failures. The increase in the number of messages is
almost linear with respect to the number of failures. It is not perfectly linear because,
when a message is delayed, a juncture can sometimes react as if the message was not
going to come. This situation is corrected as soon as the message is received, but this
requires some additional messages.
2 USC-ARPA, RCC5-UWISC, HARVARD-SCOTT, ISI22-STANFORD, UTAH-LBL
3 PURDUE, COLLINS, RCC5, MIT44, BERK
[Figure 5.8: Time to respond to multiple failures. Horizontal axis: number of failures, 1 to 5; vertical axis: time, 35 to 80 time units.]
This factor also has an impact on the time required to respond to the failure, as
shown in figure 5.8. Again, the increase in time with respect to the number of failures
exhibits similar behavior to the increase with respect to the number of destinations.
After the second failure, there appears to be a regular slope in the increase of the time
required by the network to stabilize.
Overall, it appears that the proposed system performs reasonably well under multiple
simultaneous failures. If the trend shown in these results continues as the number
of failures increases, these failures should not be an obstacle to the applicability of the
system.
5.4 A comparison
This section presents an example of a failure on a network with a single destination
(A), and shows how Garcia-Luna-Aceves's algorithm and the proposed approach react
to this failure. This comparison does not account for the cost of setting up the
network. It starts from a network which has already reached stability. To simplify
the example and provide an easy framework for comparison, the example is first
presented as if both algorithms behave synchronously4. Then a few observations are
made about how both algorithms would react in an asynchronous environment.
[Figure 5.9: Garcia's approach vs ADR approach.
Symbols (Garcia): a double circle indicates that the node is frozen; D -> B indicates that B is the successor of D; (distance, synchronization flag) is an update message; ack(distance, synchronization flag) is an acknowledgment. The synchronization flag is set to 1 when the node which sent the message is frozen, otherwise it is 0.
Symbols (ADR): A-full is an anti-full app message; Papp(origin, distance) is a partial app message; Jc(Fj element, Pj list) is a juncture message; upd(distance) is an update message with distance = the shortest distance to destination.]
The left half of figure 5.9 illustrates how Garcia's approach5 works. Table 5.2
contains the steps of both approaches. When the link A-B fails, node B freezes
since it does not have a feasible successor6. It also sends an update message to nodes
D and E. As soon as they get the message, they freeze and send an update to all
their neighbors (B & G). Node G reacts the same way; it freezes and sends updates
to D, E, and I. Node I reacts differently: it does not need to freeze since it has
a feasible successor (node H). It can send an acknowledgement immediately. This
acknowledgement and the messages it triggers unfreeze nodes G, D, E and B. Each
node which unfreezes sends an acknowledgement to the node(s) which caused it to
freeze, and sends an update message to its other neighbors.
step  Garcia                                                    ADR
1     node B freezes and sends update to D & E                  node B sends update to E & D
2     nodes D & E freeze and send update to B & G               nodes E & D send update to G
3     node G freezes and sends update to I, D & E;              node G sends update to I
      node B sends ack for updates from D & E
4     node G receives ack for its update msg                    node I sends JC msg to G and anti-full app msg to H
5     because of this, node G unfreezes,                        node G sends JC msg to D & E;
      sends ack to D & E and an update to I                     node H sends anti-full app msg to F
6     node D unfreezes, sends an ack to B and an update to G;   nodes D & E send JC msg to B;
      node E unfreezes, sends an ack to B and an update to G    node F sends anti-full app msg to C
7     node B unfreezes and sends update to D & E                node B sends partial app msg to E & D
Table 5.2: Message exchange in figure 5.9
4 At each step, a node receives and processes the messages sent by its neighbors during the previous step. It then sends its own messages, which are going to be received at the next step.
5 Using the algorithm described in [GLA88b].
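The freeze decision in Garcia's approach, as it is used in this example, hinges on whether a node has a feasible successor. The sketch below applies only the definition given in the footnote (a link whose distance to the destination is smaller than or equal to the distance through the current successor); it is not the full algorithm of [GLA88b]. The function names and the distance values are hypothetical and are not read from figure 5.9.

```python
def feasible_successors(link_distances, current_successor):
    """Links no farther from the destination than the current successor.

    link_distances maps each neighbour to the distance to the destination
    through that neighbour, as known before the change.
    """
    limit = link_distances[current_successor]
    return sorted(
        link
        for link, dist in link_distances.items()
        if link != current_successor and dist <= limit
    )


def react(link_distances, current_successor):
    """Return 'switch' when a feasible successor exists, else 'freeze'."""
    if feasible_successors(link_distances, current_successor):
        return "switch"
    return "freeze"


# Hypothetical distances to destination A through each neighbour.
node_b = {"A": 1, "D": 4, "E": 4}  # no alternative as short as the A link
node_i = {"G": 3, "H": 3}          # H is as good as the current successor G

print(react(node_b, "A"))  # freeze
print(react(node_i, "G"))  # switch
```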
The rest of figure 5.9 illustrates the ADR approach. When link A-B fails, node
B sends an update to nodes D and E, since they become the new up links. This update
message contains the distance to destination A going through the full juncture I. This
update message is propagated to node G. At this point, node G sends an update
message to node I, because the distance in the message is the distance of the path
starting at link I. When node I receives the update message, it recognizes that it is
not a juncture anymore since one of its up links has become down. To inform the
nodes above of this change in status, node I sends an anti-fapp message to node H.
Node I then builds a new JC message which it sends to node G. This JC message is
propagated down to node B. When node B receives both messages, it realizes that it
is a partial juncture matching the split point G. As a result, it sends a papp message
to D and E.
6 A link with a distance to the destination smaller than or equal to the distance through the current successor is called a feasible successor. See 1.2.5 for details.
The two approaches use seven synchronous steps (enumerated in table 5.2) to
attain a stable state where no control messages are sent. The similarities stop there.
The number of messages is different: Garcia's approach uses 23 messages whereas the
ADR approach uses 15. On the other hand, ADR's messages are, on average,
longer than Garcia's messages, so a comparison of the bandwidth is required.
In this example, assuming that distances and node addresses each use 1 byte of a message
and that the destination's address is contained in each message, Garcia's approach
sends a total of 49 bytes (46 bytes and 23 bits) whereas the ADR approach sends a
total of 36 bytes.
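The 49-byte figure for Garcia's approach can be reproduced under one plausible reading of the assumptions above: each of the 23 messages carries a 1-byte distance, a 1-byte destination address and a 1-bit synchronization flag, and the flag bits are rounded up to whole bytes. The rounding rule and the function name are assumptions of this sketch.

```python
import math


def garcia_bandwidth(n_messages, distance_bytes=1, address_bytes=1, flag_bits=1):
    """Return (whole bytes, flag bits, total bytes) for n_messages.

    Flag bits are rounded up to whole bytes, which is an assumption made
    here to match the total quoted in the text.
    """
    whole_bytes = n_messages * (distance_bytes + address_bytes)
    bits = n_messages * flag_bits
    return whole_bytes, bits, whole_bytes + math.ceil(bits / 8)


print(garcia_bandwidth(23))  # (46, 23, 49)
```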
We analyze in more detail the storage space required by both approaches to
operate. First, the amount of storage space required changes with the number of
destinations. In order to assess this, we compare the space required to store the
information pertaining to one destination (A), and for all nodes considered as
destinations simultaneously. Second, the storage space depends on particularities of each
approach.
In the ADR approach, each node keeps a copy of the last JC messages and app
messages it has received. The quantity and type of messages that a node keeps
depend upon the topology of the network. Since a failure modifies the topology of
a network, the total storage space of all the nodes in the network before the failure
is different from the total after the failure. To reflect this difference, both totals are
compared to the space requirements of Garcia's approach in this example.
For Garcia's approach, the storage requirements at each node are the same and do
not vary with the topology. However, to determine this fixed amount of storage space,
an assumption has to be made about the maximum number of times a node can be
in a situation which requires freezing. When a frozen node receives a message which
would require it to freeze, it adds a bit to a vector which keeps track of "outstanding"
acknowledgments. In this example, node G needs to freeze twice before it unfreezes,
so two bits are put aside for each link of each node's distance table7.
Table 5.3 shows the total storage space requirements8 of all nodes in the network.
This result suggests that the ADR approach requires between two and three times the
storage space required by Garcia's approach.
destination   Garcia      ADR before failure   ADR after failure
A             63 bytes    128 bytes            88 bytes
all nodes     407 bytes   1106 bytes           751 bytes
Table 5.3: Storage space in the example of figure 5.9
7 See the description of Jaffe-Moss's approach in section 1.2.3 for details.
8 Appendix D shows the details of the calculation to obtain those numbers.
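The ratios behind the "two to three times" estimate can be checked directly from the values of table 5.3. The short Python sketch below only divides the ADR totals by the Garcia totals; the dictionary layout and names are illustrative. The before-failure ratios come out at about 2.0 and 2.7, while the after-failure ratios are lower.

```python
# Storage totals in bytes, copied from table 5.3.
STORAGE = {
    "A": {"garcia": 63, "adr_before": 128, "adr_after": 88},
    "all nodes": {"garcia": 407, "adr_before": 1106, "adr_after": 751},
}


def adr_to_garcia_ratios(row):
    """Ratio of ADR storage to Garcia storage, before and after the failure."""
    return {
        key: round(row[key] / row["garcia"], 2)
        for key in ("adr_before", "adr_after")
    }


for destination, row in STORAGE.items():
    print(destination, adr_to_garcia_ratios(row))
# A {'adr_before': 2.03, 'adr_after': 1.4}
# all nodes {'adr_before': 2.72, 'adr_after': 1.85}
```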
The next point of comparison leads to a clearer conclusion. When the example
is considered without the assumption that the network behaves synchronously, there
is an additional difference between the two approaches. With Garcia's approach, a
node can be temporarily misled into believing that it has a feasible successor when
it does not. This situation can occur in this example if node G receives its update
messages in an inappropriate order.
If node G receives the update message from node D before the one from E, node G assumes
that E is a feasible successor. Thus it sends an acknowledgment to D immediately
and does not freeze. When node G receives the update from E, it freezes and sends
update messages to all its neighbors. The update message sent will correct the wrong
distance that was propagated by the acknowledgment. When node G unfreezes,
everything will be normal again.
ADR does not suffer from this problem. The reaction of node G does not depend
on the order in which the update messages are received. Its reaction is only affected
by the distance within the message which "guides" the message to node I (footnote 9).
As a last point of comparison, it should be noted that the ADR approach does
not require freezing. As a result, as soon as a node has learned about a failure, it can
route normal traffic to a destination.
For example, immediately after step 2, node D knows that the shortest path to A
goes through node G and it can route messages accordingly. With Garcia's approach,
node B is unable to route normal traffic between step 2 and step 6. Similarly, nodes
D and E suffer from the same problem between steps 3 and 5.
The comparison of the reaction to failure on the network of figure 5.9 can be
summarized as follows. ADR sends fewer messages (15 vs 23) and they use less
bandwidth (36 vs 49 bytes) to react to the failure. The ADR approach appears to require
between two and three times the storage space of Garcia's approach. However, this
example shows one situation where Garcia's approach is misled while the ADR approach
performs normally. This comparison is based on only one example of link failure.
To obtain more significant results, a large sample of network topologies would be
required. A systematic comparison between the performance of ADR and the other
available approaches constitutes a topic for further research.
9 Section 4.4 describes this in detail.
Chapter 6
Conclusion
The purpose of this thesis was to develop an algorithm to react to link failure in
a network using Alternate-path Distance-vector Routing and study its performance.
The proposed algorithm reacts to link failures in networks using hop count as a
metric. It was studied using a simulation where link additions were not allowed. In
this context, it does not appear to introduce the count-to-infinity (CT∞) problem to ADR
and displays a reasonable performance.
Our approach started with the original ideas presented by Tsuchiya [Tsu87] and
was refined into the proposed algorithm. New messages were introduced (anti-fapp,
anti-papp and qcu messages). The update procedure was specified in detail (see
section 4.4 and appendix B). The most commonly used message in the
Alternate-path Distance-vector Routing approach, the JC message, was modified to
be shorter. The version of the JC message used in the proposed algorithm is shorter
than the one proposed by Tsuchiya by 25 to 50 %, for messages with a non-null Pj list
(section 4.2). In addition, strategies were developed to limit the number of messages
transmitted (chapter 4).
The resulting algorithm performed reasonably. In a series of tests on a network
representing 98 major nodes of the ARPANET in 1983, an average of 75 messages
was required to adapt to a failure on a major link for a single destination. The total
number of bytes sent in reaction to a failure is on average 615 bytes. As the number
of addressable destinations increases, the number of messages increases linearly and
the time to react appears to increase by a small constant. The same trend can be
observed as the number of simultaneous failures increases: there is a linear increase
in the number of messages sent, and a small, constant increase in the time required
to react to the failures.
We also considered an example involving a single failure on a small network of nine
nodes (section 5.4). Using this example, we compared the response of the proposed
algorithm and that of Garcia-Luna-Aceves [GLA88b]. The proposed algorithm sent
fewer messages (15 vs 23) and used less bandwidth (36 vs 49 bytes), but on the other
hand, it appears to require between two and three times the space used by the Garcia
approach. It should be noted that the comparison only involved the reaction to the
failure.
A characteristic of the proposed algorithm (and of the ADR in general) is that a
node requires a distinct representation of the network for each "destination" it needs to
be able to address directly. The number of messages and the storage space used are
proportional to the number of destinations.
There are two major advantages to the ADR and the proposed algorithm. It allows
multipath routing, and each node knows its alternative paths to a destination before
a failure, so when a failure occurs, the nodes can route to a destination as soon as
they are informed of the failure. (Other approaches look for an alternative path after
the failure; [Hag83], [JM82] and, to a lesser degree, [GLA88b]1 are examples of these
approaches.) The ability to react immediately to a failure can be very important in
an environment like the backbone of a hierarchical routing algorithm, because of the
high volume of traffic going through the backbone.
In an environment where the two advantages mentioned above are important and
avoiding the CT∞ problem is an issue, the proposed algorithm could be used. The limiting
factor would be the storage space and the bandwidth. The other algorithms which
try to solve the CT∞ problem also spend more space and bandwidth to tackle it
(Jaffe-Moss and Garcia-Luna-Aceves use additional messages, and freeze part of the
network to do it).
There is one environment where the proposed algorithm can be used: Tsuchiya's
Landmark hierarchy. It is for this environment that he developed the ADR approach
[Tsu87]. In this approach, each node sees only a few Landmarks and the routing is
done through these Landmarks. Consequently, each node maintains a representation
only for the Landmarks it sees (one representation per Landmark). This limits the
space and the bandwidth used.
1 Garcia takes into account some alternative paths (feasible successors), but he ignores others.
There are a number of possible extensions to this research. The most important
one is to formally prove that the proposed algorithm avoids CT∞. Once this is
established, the performance of the proposed algorithm and of the other available
approaches should be tested on some large, representative networks.
The proposed algorithm should also be extended to allow the addition of new
links at any time (currently new links are only added during the setup). It should
also be improved in order to recognize, as soon as possible, when part of the network
is disconnected.
Another aspect of the ADR that should be considered is the question of when
to adapt the current representation and when to start a new setup. As observed in
section 5.1, even with a single failure, there are cases where the proposed approach uses
more messages to adapt to a failure than a second setup phase. We suggested a
rough criterion2 to identify these cases. This criterion should be refined, especially
for environments where multiple simultaneous link failures are likely.
The proposed algorithm uses a two-phase setup: the first phase classifies the
links of each node as up, horizontal or down, and the second uses these directions in
conjunction with JC messages and app messages to build the nodes' internal
representation of the network. A single-phase approach would most likely be faster, but the
required protocol would need to be developed. This is a worthwhile extension to the
algorithm.
2 All the cases where adapting to the failure required more messages than a setup involved a destination which was very close to the failure (less than 2 links away from the failure).
Appendix A
Message description
This appendix describes each message used in the proposed algorithm. In addition to
the fields presented here, each message should also include a destination field when
there is more than one node which could be the destination of users' messages. This
field is omitted here to be consistent with the examples, which involve only one
destination.
juncture msg: This message is transmitted along horizontal and down links. It is
used to discover which nodes are junctures, whether they are full or partial junctures,
and how far they are from the split point. It has the following structure:
• Fj element: the node id of the last full juncture encountered
• Pj list: a list of split points
They are modified at each split point and at each partial or full juncture.
full app msg: The full alternate-priming path message is transmitted on up links to
tell nodes closer to the destination that there is a full juncture below them and
how far they are from the destination if they have to use this alternate route.
This message contains the following parts:
• origin: node id of the full juncture
• distance: distance to the destination along the path which goes
through the juncture
• last node: each node records the link on which it receives a message,
so implicitly the last node on the path is passed.
This way, a message can be routed to the destination using this
alternate path.
partial app msg: The partial alternate-priming path message is transmitted on up
links to tell nodes closer to the destination that there is a partial juncture below
them and how far they are from the destination if they have to use this alternate
route. This message contains the following elements:
• origin: node id of the partial juncture
• distance: distance to the destination along the path which goes
through the juncture
• last node: each node records the link on which it receives a message,
so implicitly the last node on the path is passed.
This way, a message can be routed to the destination using this
alternate path.
• stop field: node id of the matching split point
or split count: the number of split points before the matching
split point
anti-full app msg: This message goes up to "chase" full app messages and remove
them. It has the same contents as the full app msg.
anti-partial app msg: This message goes up to "chase" partial app messages and
remove them when they are no longer accurate. It has the same contents as the
full app msg.
update msg: This message is transmitted along horizontal and down links. It tells
a "lower" node that it is now "above" the node which sent the message. It also
asks the receiver to send back a JC message. This message has the following
contents:
• distance: guiding distance which indicates which paths the update
message should follow (see section 4.4 for details).
quick cost update msg: This message is transmitted on horizontal and down links.
It goes "down" from a failure to tell nodes that the distance to the destination
on the path beginning with the link on which they receive the message has
increased. This message precedes the JC message after a failure to re-route the
traffic away from the failure onto a faster route. This message has the following
contents:
• distance: new distance to the destination on the path which starts
with the link which received the message.
ask for qcu msg: This message is transmitted on down links. It is sent only to obtain
a new distance for a down link on which a JC message has been received. This
message does not contain any information other than its type identification.
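As a rough illustration, the message layouts listed in this appendix could be written down as plain records. The sketch below is in Python, which the thesis does not use; the class names, field names, the Direction enum and the SENT_ON table are hypothetical and simply restate the fields and link directions given above. The last node field is left out because the text says it is implicit in the receiving link, and "goes up" is read here as "sent on up links" for the two anti messages.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Direction(Enum):
    UP = "up"
    HORIZONTAL = "horizontal"
    DOWN = "down"


@dataclass
class JunctureMsg:
    fj: str                                      # last full juncture encountered
    pj: List[str] = field(default_factory=list)  # list of split points


@dataclass
class AppMsg:
    status: str                        # "full" or "partial"
    origin: str                        # node id of the juncture
    distance: int                      # distance through the juncture
    stop: Optional[str] = None         # partial only: matching split point
    split_count: Optional[int] = None  # partial only: alternative to stop


@dataclass
class UpdateMsg:
    distance: int  # guiding distance


@dataclass
class QuickCostUpdateMsg:
    distance: int  # new distance on the path starting with the receiving link


# Link directions on which each message type is sent, as stated above.
SENT_ON = {
    "juncture": {Direction.HORIZONTAL, Direction.DOWN},
    "full app": {Direction.UP},
    "partial app": {Direction.UP},
    "anti-full app": {Direction.UP},
    "anti-partial app": {Direction.UP},
    "update": {Direction.HORIZONTAL, Direction.DOWN},
    "quick cost update": {Direction.HORIZONTAL, Direction.DOWN},
    "ask for qcu": {Direction.DOWN},
}


def may_send(kind: str, direction: Direction) -> bool:
    """True when the appendix lists this direction for the message type."""
    return direction in SENT_ON[kind]


print(may_send("ask for qcu", Direction.DOWN))  # True
print(may_send("full app", Direction.DOWN))     # False
```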
Appendix B
Pseudo-code
To compute juncture status:
    take the jc message on all up and horizontal links
    if they all have the same Fj
        the node is a partial juncture
    else
        it is a full juncture
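The juncture test above amounts to checking whether every collected Fj element is identical. A minimal Python sketch follows, assuming the caller has already gathered the Fj element of the latest jc message from each up and horizontal link and already knows the node is a juncture; the function name and the string labels are illustrative and not taken from the thesis.

```python
def juncture_status(fj_elements):
    """Classify a juncture from the Fj elements seen on its links.

    fj_elements holds one Fj value per up or horizontal link. All equal
    means partial juncture, otherwise full juncture.
    """
    if not fj_elements:
        raise ValueError("at least one jc message is needed")
    return "partial" if len(set(fj_elements)) == 1 else "full"


print(juncture_status(["I", "I"]))  # partial
print(juncture_status(["I", "A"]))  # full
```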
To build full app msg:
    take distance to destination on each uptree link
    add it to each uptree link cost -> alt Distance
    status <- full
    Origin <- node id
To build partial app msg (with or without common Pj element):
    take distance to destination on each uptree link
    add it to each uptree link cost -> alt Distance
    status <- partial
    Origin <- node id
    if there is a common Pj element (split point)
        stop <- common Pj element
    else
        /* the message is different for each up link */
        split_count <- # of Pj elements in the jc_msg received on that link
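The choice between the stop field and the split count can be sketched as follows. This is an illustrative Python fragment, not the author's code: the function name and the dictionary keys are assumptions, and the alternate distance is taken as already computed.

```python
def partial_app_fields(node_id, alt_distance, common_pj, pj_list_on_link):
    """Fields of a partial app message for one up link.

    common_pj is the common Pj element (split point) or None when there is
    none; pj_list_on_link is the Pj list of the jc_msg received on the link.
    """
    msg = {"origin": node_id, "distance": alt_distance}
    if common_pj is not None:
        msg["stop"] = common_pj
    else:
        # Without a common split point the count depends on the link.
        msg["split_count"] = len(pj_list_on_link)
    return msg


print(partial_app_fields("B", 4, "G", ["G"]))
# {'origin': 'B', 'distance': 4, 'stop': 'G'}
print(partial_app_fields("B", 4, None, ["G", "I"]))
# {'origin': 'B', 'distance': 4, 'split_count': 2}
```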
EVENT: node starts the setup being a destination
    send jc_msg(dest id, dest id, []) on all links (all links are down)
EVENT: node receives a jc_msg
    if this message = last msg received
        exit receive jc_msg    /* do nothing */
    else
        store it and continue with the rest of the code
    if it came from destination (Fj = dest)
        put node id in the Fj part of the new jc_msg
        record the fact that this link leads to dest
    if the msg came from a full juncture (Fj = link id)
        record the fact that this link leads to full juncture
    if it came on horizontal link
        if all up links have received a jc_msg
            compute juncture status
            send appropriate app_msg and jc_msg
        else
            wait
    if it came on the single uptree link
    {
        if # horizontal link > 0
        {
            if horizontal link(s) have received jc_msg
                compute juncture status
                send appropriate app_msg and jc_msg
            else
                wait
        }
        else /* # horizontal link = 0 */
        {
            if there is 1 downtree link
                send the jc_msg without touching the Pj list
            if there is more than 1 downtree link
                build new Pj element (Pj element = node id)
                add it to Pj list
                send new jc_msg on all down & horizontal links
        }
    } /* came on single up link */
    if it came on 1 up link amongst many up links
    {
        wait for jc_msg's from all uptree links
        when all jc_msgs on up links have been received
        {
            compute juncture status using only up links
            if # horizontal link > 0
            {
                send a jc_msg on the horizontal link(s)
                wait for jc_msg on horizontal link
                recompute juncture status taking into account horizontal
                    links
            }
            if the node is a full juncture
            {
                if the node was a partial juncture before
                {
                    /* look for juncture status oscillation */
                    look at the Pj list of the new jc_msg
                    if it contains the matching split point of the node
                       (when it was a partial juncture)
                        this is a case of oscillation
                        don't react to this message
                }
                if it is not a case of juncture status oscillation
                {
                    build a new full app msg
                    send full app_msg on all uptree links which
                        are not full JC or destination
                }
                /* build a jc_msg for this full juncture */
                write the node id in Fj part of new jc_msg
                if there is more than 1 downtree link
                    put node id into Pj element
                    and put this element into the Pj list
                send the new jc_msg on all down links
            } /* node is full juncture */
            if the node is a partial juncture
            {
                find the common part in Pj list from all jc_msg
                    from up links
                build a new partial app msg using common Pj element
                send partial app_msg on all uptree links that are not
                    full JC or destination
                /* build a jc_msg for this partial juncture */
                if there is no Pj element in common or only 1 in common
                    Pj <- node id
                    replace old Pj list of the jc_msg received by
                        the new Pj
                    send the new jc_msg on all down links
                if there is more than one Pj element in common
                    if # link down <= 1
                        new Pj list = common - 1st common Pj element
                        send jc_msg with new Pj list on all down links
                    else
                        Pj element <- node id
                        add it to Pj list
                        send the new jc_msg on all down links
            } /* node is a partial juncture */
        } /* when all jc_msg have been received on up links */
    } /* came on 1 up link among many */
    if the jc_msg came on down link
        /* the link probably has the wrong distance; send a message to
           ask for a new distance */
        store the jc_msg
        send an ask_for_qcu
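One branch of the handler above is simple enough to restate as code: a node with a single uptree link and no horizontal links forwards the jc message unchanged when it has one downtree link, and appends itself to the Pj list when it has several. The Python sketch below covers only that branch; the function name and argument layout are assumptions made for illustration.

```python
def forward_on_single_uptree(node_id, fj, pj, n_downtree):
    """Return the (Fj, Pj list) to forward on the down links.

    Covers only the case of a single uptree link and no horizontal links.
    With more than one downtree link the node is a split point and adds
    its own id to the Pj list.
    """
    if n_downtree > 1:
        pj = pj + [node_id]  # build a new list, leave the caller's intact
    return fj, pj


print(forward_on_single_uptree("D", "A", [], 1))     # ('A', [])
print(forward_on_single_uptree("G", "A", ["B"], 2))  # ('A', ['B', 'G'])
```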
EVENT: node receives an ask_for_qcu
    send a qcu back on the link which received the message
EVENT: node receives an app_msg
    adjust distance of down link according to app_msg
    if the node has already received an app_msg with the same
       destination, origin, status & distance on a down link
        if the identical msg was on a different link
            store the message
        /* don't send app up: last message went up, it is enough */
        exit receive app_msg    /* do nothing */
    else
    {
        if (the app msg has same destination & origin
            but different status or distance)
            replace the old app msg
        else
            store it
    }
    if app type = full
    {
        add to distance to destination the cost to traverse each
            uptree link
        send app_msg on all uptree links that do not lead to a full jc
            or to the destination
    }
    if app type = partial
    {
        if next node = stop field of the papp msg (split point)
            stop msg here
        else
        {
            if the message replaces a fapp msg
                send anti-fapp on all up links
            distance traveled increases by each uptree cost
            send app on all uptree links
        }
    }
EVENT: node receives an anti-app_msg
    if app type = full
    {
        if an app_msg is found with same origin, destination
            remove it
        send anti-app_msg on all up links which do not lead to
            the destination or to a full juncture
    }
    if app type = partial
    {
        if an app_msg is found with same origin, destination
            remove it
            if next node = stop field of the papp msg (split point)
                stop msg here
            else
                send anti-app_msg on all up links which do not lead to
                    the destination or to a full juncture
        else
            stop msg here
    }
    adjust distance of down link according to app_msg
EVENT: node receives an update_msg
    if came on horizontal link
        make that link downtree
        store the distance to dest going on that link
            (distance in update_msg)
        if (the node was part of 2 node juncture
            and the node keeps the same jc status after horizontal link
            is gone)
            send a new jc on all down links
            send a new app on all up links
    if msg came on uptree link
        if # up link = 1
        {
            if # links > 1
            {
                relabel links with new direction according to upd
                    distance
                    - link(s) with distance < upd_distance -> horizontal
                    - link(s) with distance = upd_distance -> up
                    - link(s) with distance = upd_distance+1 -> horizontal
                    - link(s) with distance > upd_distance -> down
                send update on new up link
                send quick cost update on all horizontal & down links
                    except the one with update
            }
            else
            {
                if network disconnected
                    stop all activities
                if there was another failure
                    send update msg on the link where it came
            }
        }
        if # up link = 2
        {
            reorganize the links
            if the remaining up link has a jc_msg
                node is not a juncture anymore -> send anti-app_msg
                send failure jc msg
            else
                send update msg on that link to receive
        }
        if # up link > 2
        {
            reorganize the links
            compute the type of juncture again
            if it changed
                send failure jc msg
        }
EVENT: receives a quick cost update (qcu) msg
reorganl.ze the links according to the nev distance of the 1 ink
vhich receives qcu
if the msg came on single uptree
{
}
if there is another link vi th distance < qcu 1 s distance
put this dutance in the qcu_msg
adjust distance (- add the cost of link to traverse
send quick cost update msg on aIl horizontal a: dovn link
remove a11 app on aIl dom links
send anti app msg on aIl uptree links
if the msg came on an uptree link amongs other
{
stop msg
if the node juncture status changed from full to partia1
build a nev jc_msg
send i t on aIl dovn 1/; horizontal links
send papp msg on the remaining up links
if the node 100se its juncture status
build a nev j c_msg
send i t on a11 dovn 1/; horizonta1 links
send anti-app msg on the remaining up 1inks
}
EVENT: failure on a node's up link
N_UP = # of up links before failure
remove the broken link for this destination
reorganize the remaining links
}
else
if there is 1 link left /* there is no alternate path */
send update msg along that link (with dist alt = INFINITY)
if there is more than 1 link left
/* build update & qcu msg and send them */
update's alternate distance <- distance on the second link
qcu's alternate distance <- distance on the second link
send update on all up links
send quick cost update on all horizontal & down links
if the new up link was horizontal
/* can send jc_msg immediately for faster response */
if 1 horizontal link
send the jc_msg from that link on all down
& horizontal links
if there was more than 1 horizontal link
build new jc msg and send it
remove all app on down links
{ /* more than 1 uptree link */
if N_UP > 2
/* node is still juncture */
build a new jc msg
if new jc msg is different from the previous one
send it on all horizontal & down links
if N_UP = 2
if # horizontal links = 0
/* failure -> not a juncture anymore */
take the jc_msg from the up link left
send it on all down links
send anti-app_msg on up link
else
{
compute the node's juncture status
if it is the same as before failure
send jc_msg on horizontal links
else
send jc_msg on horizontal & down links
if the node is a partial juncture
send papp_msg on up links
else
send anti-app msg on up links
} /* # horizontal link > 0 */
EVENT: failure on a node's horizontal link
remove the broken link for this destination
compute the node juncture status
in the following cases:
{
the juncture status has not changed
do nothing more
the node stops being a juncture
send anti-app msg on up links
send a new jc_msg on horizontal and down links
}
the node was a two-node full juncture
and it becomes partial juncture
send papp msg on up links
send a new jc_msg on horizontal and down links
EVENT: failure on a node's down link
for all app msg received on the broken link
remove the app_msg
send anti-app_msg
remove the broken link for this destination
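The last event above, a failure on a down link, is short enough to sketch in Python. This is only an illustration under stated assumptions: the `Link` class, the `links` dictionary, and the `send` callback are hypothetical names that do not come from the thesis, and the sketch assumes that the anti-app message is sent on the up links, as in the other events (the pseudo-code for this event does not say which links are used).

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Hypothetical per-destination link record (not the thesis' layout)."""
    neighbor: str
    direction: str                                 # "up", "horizontal" or "down"
    app_msgs: list = field(default_factory=list)   # app messages received on this link


def on_down_link_failure(links, broken, send):
    """Handle a failure on a down link for one destination.

    links  -- dict mapping neighbor id -> Link
    broken -- neighbor id at the other end of the failed link
    send   -- callback send(neighbor, msg), a stand-in for the real transport
    """
    failed = links[broken]
    up_neighbors = [l.neighbor for l in links.values() if l.direction == "up"]
    # For every app message received on the broken link: remove it and
    # cancel it with an anti-app message (assumed to go on the up links).
    for app in list(failed.app_msgs):
        failed.app_msgs.remove(app)
        for neighbor in up_neighbors:
            send(neighbor, ("anti-app", app))
    # Finally, remove the broken link for this destination.
    del links[broken]
```

With a node that has one up link to B and one down link to G holding a single stored app message, calling `on_down_link_failure(links, "G", send)` removes the G entry and sends one anti-app message to B.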
Appendix C
Buffer description
The buffer is the place where a node keeps its representation of the network and the
messages it keeps. There is one representation per destination. This representation
is divided in two parts: a description of the node and a representation of each link.
The node description contains the following elements:
• the node status (full juncture, partial juncture, not juncture)
• the # of up links
• the # of down links
• the # of horizontal links
Each link is represented by the following elements:
• the node id of the node at the other end of the link
• the direction of the link (up, horizontal, down)
• a pointer to the list of messages which came on that link and which are kept at
the node.
In addition, the link representation has two fields which have a different meaning
depending on the direction of the link. If the link is down, the link keeps two distances:
• the shortest distance through a full juncture
• the shortest distance through a partial juncture
Otherwise, the fields have the following meaning:
• the shortest distance to the destination
• the status of the other node sharing this link
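The buffer layout described above can be sketched as a pair of Python data classes. This is a minimal sketch, not the simulator's actual data structure: all class and field names are assumptions chosen for illustration, the two direction-dependent fields are modelled as four optional attributes for readability, and the link counts are derived from the link list instead of being stored.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    FULL_JUNCTURE = "full juncture"
    PARTIAL_JUNCTURE = "partial juncture"
    NOT_JUNCTURE = "not juncture"


class Direction(Enum):
    UP = "up"
    HORIZONTAL = "horizontal"
    DOWN = "down"


@dataclass
class LinkEntry:
    neighbor_id: str                       # node at the other end of the link
    direction: Direction
    messages: List[tuple] = field(default_factory=list)  # messages kept for this link
    # Meaning when the link is down:
    dist_via_full_juncture: Optional[int] = None
    dist_via_partial_juncture: Optional[int] = None
    # Meaning when the link is up or horizontal:
    dist_to_destination: Optional[int] = None
    neighbor_status: Optional[Status] = None


@dataclass
class DestinationView:
    """One representation per destination: node description plus links."""
    status: Status
    links: List[LinkEntry] = field(default_factory=list)

    def count(self, direction: Direction) -> int:
        # The # of up / horizontal / down links, computed on demand.
        return sum(1 for link in self.links if link.direction is direction)
```

Deriving the counts avoids keeping three counters in step with the link list; a real implementation might store them, as the description suggests, to save time on every event.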
Appendix D
Storage space calculation
This appendix presents an analysis of the storage space required to store each
approach's representation of the network. It uses the assumptions stated in section 5.4,
namely that distances and node addresses can be stored in 1 byte,
and that the size of the feasibility vector is 2 bits. In both approaches the space
required depends on the number of links at each node. Table D.1 shows the number
of links at each node.
D.1 Garcia approach
The Garcia-Luna-Aceves approach uses three tables: the distance table, the routing table
and the link table. Those tables contain different information and they have different
sizes.
• A routing table entry has the form (distance, successor). It uses 2 bytes per
entry. There is one entry per destination.
• A distance table entry has the form (distance, feasibility vector). It uses 1
byte for the distance + 2 bits for the feasibility vector per entry. There is one
entry for each pair (link, destination).
• A link table entry has the form (cost), so it uses 1 byte per entry. There is
one entry per link.

node    A  B  C  D  E  F  G  H  I  total
links   2  3  2  2  2  2  3  2  2  20
Table D.1: Number of links at each node

The sum of the number of link representations, in the 9 nodes of the network, is
20 (see Table D.1). For one destination, the sum of the space in each table is the
following:
• routing table: # nodes x space used by one entry = 9 x 2 bytes = 18 bytes
• distance table: Σ # of links x space used by one entry = 20 x (1 byte + 2 bits) = 20 bytes + 40 bits
• link table: Σ # of links x space used by one entry = 20 x 1 byte = 20 bytes
The total space used by all the tables over all the nodes for one destination is then
58 bytes + 40 bits or 63 bytes.
When all nodes of the network are destinations, this space becomes:
• routing table: # nodes x space used by one entry x # of destinations =
9 x 2 bytes x 9 = 162 bytes
• distance table: Σ # of links x space used by one entry x # of destinations
= 20 x (1 byte + 2 bits) x 9 = 180 bytes + 360 bits
• link table: Σ # of links x space used by one entry = 20 x 1 byte = 20 bytes
The total space used becomes 362 bytes + 360 bits or 407 bytes.
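The arithmetic above can be checked with a short Python sketch. The function name `garcia_storage` and the dictionary `LINKS_PER_NODE` are illustrative names, not part of the thesis; the per-entry sizes are the ones assumed in this appendix (2 bytes per routing entry, 1 byte + 2 bits per distance entry, 1 byte per link entry), and leftover bits are rounded up to whole bytes.

```python
import math

# Number of links at each node, as listed in Table D.1.
LINKS_PER_NODE = {"A": 2, "B": 3, "C": 2, "D": 2, "E": 2,
                  "F": 2, "G": 3, "H": 2, "I": 2}


def garcia_storage(links_per_node, destinations):
    """Return (whole bytes, extra bits, total rounded up to bytes)."""
    nodes = len(links_per_node)
    total_links = sum(links_per_node.values())
    routing_bytes = nodes * 2 * destinations          # (distance, successor)
    distance_bytes = total_links * 1 * destinations   # distance part
    distance_bits = total_links * 2 * destinations    # feasibility vector part
    link_bytes = total_links * 1                      # (cost), independent of destination
    whole_bytes = routing_bytes + distance_bytes + link_bytes
    return whole_bytes, distance_bits, whole_bytes + math.ceil(distance_bits / 8)


print(garcia_storage(LINKS_PER_NODE, 1))   # (58, 40, 63)
print(garcia_storage(LINKS_PER_NODE, 9))   # (362, 360, 407)
```

Only the routing and distance tables grow with the number of destinations, which is why the link table stays at 20 bytes in both cases.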
D.2 ADR approach
In the ADR approach the space required depends on the messages kept at each node.
Table D.2 shows the messages kept at each node before the failure. Table D.3 shows the
messages after the failure. Both tables only show the messages for destination A. The
detailed calculation for all destinations is too long to be displayed here. The numbers
for the "all destinations" case shown in Table 5.3 of section 5.4 were obtained using
the simulator.

node  link  dir  dist  messages
B     A     U    1     jc(FJ = A, PJ = ∅)
      D     D    5     fapp(origin = I, distance = 7)
      E     D    5     fapp(origin = I, distance = 7)
C     A     U    1     jc(FJ = A, PJ = ∅)
      F     D    5     fapp(origin = I, distance = 7)
D     B     U    2     jc(FJ = B, PJ = [B])
      G     D    4     papp(origin = G, distance = 4), fapp(origin = I, distance = 7)
E     B     U    2     jc(FJ = B, PJ = [B])
      G     D    4     papp(origin = G, distance = 4), fapp(origin = I, distance = 7)
F     C     U    2     jc(FJ = C, PJ = ∅)
      H     D    6     fapp(origin = I, distance = 6)
G     D     U    3     jc(FJ = B, PJ = [B])
      E     U    3     jc(FJ = B, PJ = [B])
      I     D    5     fapp(origin = I, distance = 5)
H     F     U    3     jc(FJ = C, PJ = ∅)
      I     D    5     fapp(origin = I, distance = 5)
I     G     U    4     jc(FJ = G, PJ = ∅)
      H     U    4     jc(FJ = C, PJ = ∅)
Table D.2: Messages stored in the network before failure

The representation of a link in the proposed version of the ADR approach requires
3 bytes + 2 bits here: 2 bits to record the direction of the link, 2 distances (1 byte
each) and 1 byte to point where the last message received is stored. The various
messages have the following content, which requires the space mentioned:
• fapp message - (msg type, destination, origin, distance) - 4 bytes
• papp message - (msg type, destination, origin, stop field) - 4 bytes
• jc message - (msg type, destination, # of Pj elements, Fj) - 4 bytes
+ 1 byte per Pj element
Table D.2 shows that there are 8 fapp messages, 2 papp messages, 6 jc messages
with no Pj element, and 4 jc messages with one Pj element stored in the network
before the failure. The space required to store all these messages and all the link
representations is 128 bytes.
node  link  dir  dist  messages
B     D     D    5     jc(FJ = A, PJ = [G])
      E     D    5     jc(FJ = A, PJ = [G])
C     A     U    1     jc(FJ = A, PJ = ∅)
      F     D    5
D     B     U    2     jc(FJ = A, PJ = [G])
      G     D    4     papp(origin = G, distance = 4)
E     B     U    2     jc(FJ = A, PJ = [G])
      G     D    4     papp(origin = G, distance = 4)
F     C     U    2     jc(FJ = C, PJ = ∅)
      H     D    6
G     D     U    3
      E     U    3
      I     D    5     jc(FJ = C, PJ = ∅)
H     F     U    3     jc(FJ = C, PJ = ∅)
      I     D    5
I     G     U    4
      H     U    4     jc(FJ = C, PJ = ∅)
Table D.3: Messages stored in the network after failure

8 fapp messages                     x 4 bytes             32 bytes
2 papp messages                     x 4 bytes              8 bytes
6 jc messages with no Pj element    x 4 bytes             24 bytes
4 jc messages with 1 Pj element     x 1 byte               4 bytes
18 links                            x 3 bytes + 2 bits    54 bytes + 36 bits
total                                                     122 bytes + 36 bits

Table D.3 shows that there are 2 papp messages, 4 jc messages with no Pj element,
and 4 jc messages with one Pj element stored in the network after the failure.
The space required to store all these messages and all the link representations is
88 bytes.

2 papp messages                     x 4 bytes              8 bytes
4 jc messages with no Pj element    x 4 bytes             16 bytes
4 jc messages with 1 Pj element     x 1 byte               4 bytes
18 links                            x 3 bytes + 2 bits    54 bytes + 36 bits
total                                                     82 bytes + 36 bits
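
The two tallies above can be reproduced with a few lines of Python. This is a sketch that mirrors the rows as printed, not an independent derivation: the function name `adr_storage` and its parameters are illustrative, and the per-row multipliers (4 bytes per fapp, papp and jc message with no Pj element, 1 byte per jc message with one Pj element, 3 bytes + 2 bits per link) are taken directly from the rows. Whole bytes and leftover bits are returned separately, as in the "total" rows.

```python
def adr_storage(fapp, papp, jc_no_pj, jc_one_pj, links):
    """Return (bytes, bits) following the rows of the tallies above."""
    message_bytes = (
        fapp * 4          # fapp messages x 4 bytes
        + papp * 4        # papp messages x 4 bytes
        + jc_no_pj * 4    # jc messages with no Pj element x 4 bytes
        + jc_one_pj * 1   # jc messages with 1 Pj element x 1 byte, as printed
    )
    link_bytes = links * 3   # 2 distances + 1 message pointer, 1 byte each
    link_bits = links * 2    # 2 bits for the direction of the link
    return message_bytes + link_bytes, link_bits


print(adr_storage(fapp=8, papp=2, jc_no_pj=6, jc_one_pj=4, links=18))   # (122, 36)
print(adr_storage(fapp=0, papp=2, jc_no_pj=4, jc_one_pj=4, links=18))   # (82, 36)
```

The link term is the same in both calls; the difference between the two totals comes entirely from the messages that are no longer stored after the failure.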
Bibliography
[CAT88] C.G. Cassandras, M.V. Abidi, and D. Towsley. Distributed routing with
on-line marginal delay estimation. IEEE Infocom '88, pages 603-612,
March 1988.
[Ceg75] T. Cegrell. A routing procedure for the TIDAS message-switching network.
IEEE Transactions on Communications, COM-23(6):575-585, June 1975.
[CG74] D.G. Cantor and M. Gerla. Optimal routing in a packet-switched computer
network. IEEE Transactions on Computers, C-23(10):1062-1069, October
1974.
[FGK73] L. Fratta, M. Gerla, and L. Kleinrock. The flow deviation method: An
approach to store-and-forward communication network design. Networks,
3:97-133, 1973.
[Gal77] Robert G. Gallager. A minimum delay routing algorithm using distributed
computation. IEEE Transactions on Communications, COM-25(1):73-85,
January 1977.
[GLA86] J.J. Garcia-Luna-Aceves. An algorithm for shortest-path routing with
distributed information. Technical report, SRI International, Menlo Park,
Ca, December 1986.
[GLA87] J.J. Garcia-Luna-Aceves. A new minimum-hop routing algorithm. In
Proceedings of IEEE INFOCOM '87, April 1987.
[GLA88a] J.J. Garcia-Luna-Aceves. A distributed, loop-free, shortest-path routing
algorithm. In Proceedings of IEEE INFOCOM '88, 1988.
[GLA88b] J.J. Garcia-Luna-Aceves. Distributed routing using internodal coordination.
In Proceedings of IEEE INFOCOM '88, 1988.
[GLA88c] J.J. Garcia-Luna-Aceves. A minimum-hop routing algorithm based on
distributed information. Computer Networks and ISDN Systems, 16(5):367-382,
May 1988.
[GT90] D. W. Glazer and C. Tropper. On congestion based dynamic routing.
volume COM-38, pages 360-368, March 1990.
[Hag83] Jacob Hagouel. Issues in Routing for Large and Dynamic Networks. PhD
thesis, Columbia University, 1983.
[JM82] J. M. Jaffe and F.M. Moss. A responsive routing algorithm for computer
networks. IEEE Transactions on Computers, COM-30(7):1758-1762, July
1982.
[McQ74] J. McQuillan. Adaptive Routing Algorithms in Distributed Computer
Network. PhD thesis, Harvard University, 1974.
[Sch87] Mischa Schwartz. Telecommunication Networks: Protocols, Modeling and
Analysis. Addison-Wesley, 1987.
[Tan89] Andrew S. Tanenbaum. Computer Networks. Prentice Hall, second edition,
1989.
[Tsu87] Paul F. Tsuchiya. Landmark routing: Architecture, algorithms, and issues.
Technical report, MITRE Corporation, McLean, Virginia, September
1987.