An Accurate Model for the Performance Analysis of Deterministic Wormhole Routing

B. Ciciani°, M. Colajanni*, C. Paolucci°

° Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”, Via Salaria 113, Roma, 00198 Italy

* Dipartimento di Informatica, Sistemi e Produzione, Università di Roma “Tor Vergata”, Via della Ricerca Scientifica, Roma, 00133 Italy

Abstract

We present a new analytical approach for the performance evaluation of asynchronous wormhole routing in k-ary n-cubes. Through the analysis of network flows, our methodology furnishes a closed formula for the average message delay in deterministic wormhole routing. In this paper, the focus is on 3D asymmetric torus networks with uni-directional or bi-directional links. However, the model can be easily applied to evaluate the performance of deterministic wormhole policies in any hypercube and torus topology. The comparison with two simulation models demonstrates that our methodology gives accurate results for both low and high traffic.

1. Introduction

Wormhole routing is an efficient switching technique adopted by a variety of parallel machines with distributed-memory architectures. Most multicomputers employ wormhole switching in combination with deterministic routing policies. In this case, the route between two nodes is determined statically and, when a link conflict occurs, the transmission waits until the link becomes free. Although deterministic policies cannot respond to dynamic network conditions, their implementation is quite simple. Conversely, adaptive wormhole routing requires complex algorithms for preventing deadlocks and livelocks. Moreover, there is no general agreement about the actual advantages of adaptive policies over deterministic ones. An interesting survey of wormhole routing is contained in [11]. Commercial multicomputers adopted deterministic routing in three types of k-ary n-cubes: uni-directional hypercube, uni-directional and bi-directional torus, and bi-directional mesh [10].

The goal of this paper is to propose an approach that yields accurate closed formulas for the message latency time of wormhole routing in k-ary n-cubes. We present the approach for an asymmetric ($k_0 \times k_1 \times k_2$) torus with uni-directional links and for a symmetric ($k \times k \times k$) torus with bi-directional links. However, our results are extensible to hypercubes, symmetric tori, and asymmetric tori with other dimensions.

Performance analysis of wormhole routing in k-ary n-cube topologies has been investigated by several authors. The majority of them adopted simulation models [7, 13, 2, 12, 6, 4, 3], while few results were obtained analytically [5, 9, 1]. Adve and Vernon [1] use approximate Mean Value Analysis to analyze the performance of wormhole routing in a k-ary n-cube, where the system is modeled as a closed queueing network. This choice makes [1] different from other performance studies concerning wormhole routing. Dally [5] proposes a general model for k-ary n-cubes, which he adopts to demonstrate that low-dimensional topologies provide the best performance. The results of that paper represent a very important contribution in this area; however, the Dally model solves only symmetric topologies with uni-directional links. Kim and Das [9] specialize this model for the hypercube topology. They obtain accurate results, although limited to the hypercube, which is a perfectly symmetric and balanced topology. Our approach is more general than the models proposed in [5] and [9], because it is applicable to several topologies such as hypercubes, symmetric and asymmetric tori with both uni-directional and bi-directional links.

The paper is organized as follows. In Section 2, we describe the operational features of the interconnection network and motivate the assumptions that make the model analytically tractable. In Section 3, we present the approach for the message latency time evaluation focusing on a 3D asymmetric torus with uni-directional links. In Section 4, we demonstrate the flexibility of our approach by extending the analysis to a 3D symmetric torus with bidirectional links. In Section 5, we validate both models by comparing their analytical results against time values which are obtained from our simulations and other published results. Moreover, we apply our models to evaluate the performance of wormhole routing in several network dimensions and traffic loads.

2. Deterministic Wormhole Routing

Last-generation multicomputers adopt wormhole routing, an efficient switching technique. In a contention-free network, wormhole has the same behavior as virtual cut-through [8]. It divides the message into small parts, called flits, and implements a pipelined switching technique: once the header flit has obtained the first link, it requires the second

IPPS 1997
ISSN 1063-7133/97 $10.00 © 1997 IEEE

link, while the data flits are transmitted from the first to the adjacent node. In case of conflict, wormhole does not gather all the data flits in a node, but blocks them where they are. The communication evolves through two concurrent activities: path-hole and data-trail. During the path-hole phase, the header flit establishes the path from the source to the destination node. The communication controller assigns each link only after verifying that the required link is available. Since the path from the sender to the destination node is deterministic, in case of conflict the flits are enqueued into special flit buffers (blocking phase). The data-trail begins when the header flit obtains the first link. During this phase, all data flits flow through the links obtained by the header. A link is released when the last flit (tail flit) of the message is transmitted through it.
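The pipelined behavior described above can be illustrated with a small contention-free timing sketch. It is not part of the model: the function name `tail_arrival_cycle` and the timing convention (the header crosses link i during cycle i, and each flit follows the one in front of it by one cycle) are assumptions made for illustration only.

```python
def tail_arrival_cycle(num_links: int, num_flits: int) -> int:
    """Cycle in which the tail flit crosses the last link, with no contention.

    Assumed convention (hypothetical): links are crossed one per cycle, a flit
    cannot cross link i before it has crossed link i-1, and it cannot cross a
    link in the same cycle as the flit in front of it.
    """
    if num_links < 1 or num_flits < 1:
        raise ValueError("need at least one link and one flit")
    # crossing[f][i] = cycle in which flit f crosses link i (0-based indices)
    crossing = [[0] * num_links for _ in range(num_flits)]
    for f in range(num_flits):
        for i in range(num_links):
            earliest = 1
            if i > 0:  # must have crossed the previous link first
                earliest = max(earliest, crossing[f][i - 1] + 1)
            if f > 0:  # must wait for the preceding flit to free the link
                earliest = max(earliest, crossing[f - 1][i] + 1)
            crossing[f][i] = earliest
    return crossing[-1][-1]


if __name__ == "__main__":
    # 3 links and a 25-flit message: (3 - 1) + 25 cycles under this convention
    print(tail_arrival_cycle(3, 25))
```

Under this convention the tail completes after (number of links - 1) + (number of flits) cycles, which has the same "distance minus one plus message length" shape as a path-hole delay followed by a data-trail time.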

Each node of the interconnection network contains two main components: a processing element, and a communication controller consisting of an $(n+1) \times (n+1)$ crossbar switch and one controller per dimension, which guarantees a maximum of n simultaneous communications for each router. The processing element uses the remaining two links to send and receive its own messages. To obtain an analytically tractable model, we adopt the following assumptions, which are commonly accepted in the literature:

1) The memory capacity of each node is unlimited. This assumption is fairly accurate for presently available machines. It allows us to ignore the loss of messages and the consequent re-transmission due to insufficient memory space.

2) The network has a single flit buffer per link.

3) The flit size (in bits) is equal to the number of wires W that compose a link.

4) Each flit is transmitted in a single link cycle. This time is the basic temporal unit that we adopt in the model analysis.

5) The communication requests arising from each node are independent and identically distributed in accordance with a Poisson process having generation time $\tau$ (cycles/bit) and rate $1/\tau$ (bit/cycle) per node.

Assumption 5) and the network topology with wrap-around links guarantee that the network is balanced [8], that is, we can assume that any link is equally likely to be visited independently of the assumed distribution for the message routing. In this paper, the methodology is presented under the hypothesis that the destination nodes are uniformly chosen. However, this is not a limitation of the model, because other distributions only require a modification of the mean inter-node distance.

3. The Model with Uni-directional Links

In this section we consider deterministic (dimension ordering) wormhole routing in a $k_0 \times k_1 \times k_2$ torus with uni-directional links. The wormhole message transmission can be seen as composed of three phases: path-hole phase without conflicts (PH), path-hole phase with blocking (PHB), and data-trail phase (DT). The state diagram of the three phases in a 3D asymmetric torus with uni-directional links is shown in Figure 1: the nodes represent the links, while the arcs denote the paths among the links. Note that this figure does not model the torus architecture. Rather, it represents the 'life' of an average message that flows through this network from the generation instant to the time at which its tail flit reaches the destination node. Figure 1 represents the PH and DT phases as single nodes, while we detail the PHB phase because the analysis of the conflicts is the most important part of the analysis of a routing policy.
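As an illustration of the dimension ordering policy, the following sketch lists the links a message crosses in a uni-directional torus. The function name `dimension_order_route`, the coordinate representation, and the choice of the +1 (mod $k_j$) direction for every ring are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

Node = Tuple[int, int, int]


def dimension_order_route(src: Node, dst: Node, dims: Node) -> List[Tuple[Node, Node]]:
    """Links crossed from src to dst, correcting dimension 0, then 1, then 2.

    Assumption: each ring is uni-directional and only the +1 (mod k_j)
    direction exists, so the offset in dimension j is (dst_j - src_j) mod k_j.
    """
    links = []
    current = list(src)
    for j, k in enumerate(dims):
        hops = (dst[j] - current[j]) % k
        for _ in range(hops):
            nxt = list(current)
            nxt[j] = (current[j] + 1) % k  # follow the ring, wrapping around
            links.append((tuple(current), tuple(nxt)))
            current = nxt
    return links


if __name__ == "__main__":
    # 4x3x5 torus: two hops in dimension 0, none in dimension 1, one in dimension 2
    for link in dimension_order_route((0, 0, 0), (2, 0, 1), (4, 3, 5)):
        print(link)
```

Because the route is a fixed function of source and destination, two messages that need the same link can only wait for each other, which is the blocking behavior the PHB phase accounts for.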

Since for uniformly distributed routing the average length of the path in each dimension is $k_j/2$, the flow in each of the three dimensions can be represented through a column of $k_j/2$ nodes, where $j = 0, 1, 2$. A node $c_{ij}$ ($i = 1, \ldots, k_j/2$) denotes that the communication uses a link of the j-th dimension as the i-th link of its transmission in that dimension. The labels on the arcs are the rates of the messages that flow through the network, while several arcs reaching the same node denote a risk of conflict (see the entrance of the 1-st and 2-nd dimension).

Let $1/\tau_E = 1/(L\tau)$ denote the message generation rate (in msg/cycle) per node, where $1/\tau$ is the emission rate in bit/cycle and L is the average message length. Moreover, $p_j = 1/k_j$ is the probability that a message does not use any link of the j-th dimension during its transmission. This follows from counting the destinations in a 3D torus: for example, for the 0-th dimension, $p_0 = (k_1 k_2)/(k_0 k_1 k_2)$.
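The probability $p_j = 1/k_j$ and the average path length $k_j/2$ in a dimension can be checked by brute-force enumeration. The sketch below assumes that destinations are drawn uniformly over all $k_0 k_1 k_2$ nodes (the source included, which is what makes the count $k_1 k_2$ out of $k_0 k_1 k_2$ exact) and that rings are uni-directional; the function name `dimension_usage_stats` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def dimension_usage_stats(dims, j):
    """Return (p_j, mean hops in dimension j given that the dimension is used).

    Assumptions: uniform destinations over all nodes, uni-directional rings,
    every k_j >= 2. By symmetry it is enough to look at one source node.
    """
    src = (0,) * len(dims)
    skipped = 0
    hops_when_used = []
    for dst in product(*(range(k) for k in dims)):
        offset = (dst[j] - src[j]) % dims[j]
        if offset == 0:
            skipped += 1          # no link of dimension j is needed
        else:
            hops_when_used.append(offset)
    total = skipped + len(hops_when_used)
    p_j = Fraction(skipped, total)
    mean_hops = Fraction(sum(hops_when_used), len(hops_when_used))
    return p_j, mean_hops


if __name__ == "__main__":
    for j in range(3):
        print(j, dimension_usage_stats((4, 6, 10), j))
```

For a 4x6x10 torus the enumeration gives $p_0 = 1/4$ with a mean of 2 hops in dimension 0, that is $k_0/2$ for the messages that actually use the dimension.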

To simplify the notation we use $\alpha = 1-p_0$, $\beta = 1-p_1$ and $\gamma = 1-p_2$ to denote the probability that a transmission uses links of the 0-th, 1-st and 2-nd dimension, respectively. We distinguish the following important actions of a communication:

• PH phase. Once generated, a message is at the entry point of the flow diagram. Each data transmission is subject to an average delay of $(k_0+k_1+k_2)/2 - 1$ cycles. This value denotes the average time required by the header flit to reach, without conflicts, the node that precedes the destination.

• Entrance in a dimension. Once the PH phase is completed, the message flow is divided into three streams: a fraction $(1-p_0)/\tau_E$ enters the 0-th dimension, a fraction $p_0(1-p_1)/\tau_E$ enters the 1-st dimension without using links of the 0-th dimension, while a fraction $p_0 p_1 (1-p_2)/\tau_E$ uses only links of the 2-nd dimension. The messages entering a new dimension represent the main flow in that dimension. This flow may conflict against the messages (for example, $\delta/\tau_E$ for the 0-th dimension) which are generated by other nodes and use the first link of the main flow in that dimension. For an explanation of the link rates, see Theorem 1.

• Flow among the links of a dimension. This phase concerns the conflict analysis and is discussed in Section 3.1.

• Exit from a dimension. At the exit from a dimension, a part of the flow requires links of successive dimensions, while another part reaches the data-trail phase. Since we are considering the dimension ordering strategy, the flow leaving the 0-th dimension is divided into three parts: a fraction $(1-p_0)(1-p_1)/\tau_E$ enters the 1-st dimension, a fraction $(1-p_0)p_1(1-p_2)/\tau_E$ enters the 2-nd dimension, and a fraction $(1-p_0)p_1p_2/\tau_E$ reaches the data-trail phase. The flow from the 1-st dimension is divided into two parts: a fraction $(1-p_1)(1-p_2)/\tau_E$ enters the 2-nd dimension and a fraction $(1-p_1)p_2/\tau_E$ reaches the data-trail phase. The flow from the 2-nd dimension goes entirely to the DT phase.
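The splits listed above can be checked for flow conservation with exact rational arithmetic. The sketch below works with the fractions only (the common factor $1/\tau_E$ is dropped); the function name `flow_fractions` and the dictionary keys are hypothetical labels introduced for this example.

```python
from fractions import Fraction


def flow_fractions(k0: int, k1: int, k2: int):
    """Entry and exit fractions of the message flow, with p_j = 1/k_j."""
    p0, p1, p2 = (Fraction(1, k) for k in (k0, k1, k2))
    # Streams leaving the PH phase
    entry = {
        "dim0": 1 - p0,
        "dim1": p0 * (1 - p1),
        "dim2": p0 * p1 * (1 - p2),
    }
    # Streams leaving the 0-th dimension
    exit0 = {
        "to_dim1": (1 - p0) * (1 - p1),
        "to_dim2": (1 - p0) * p1 * (1 - p2),
        "to_DT": (1 - p0) * p1 * p2,
    }
    # Streams leaving the 1-st dimension
    exit1 = {
        "to_dim2": (1 - p1) * (1 - p2),
        "to_DT": (1 - p1) * p2,
    }
    return entry, exit0, exit1


if __name__ == "__main__":
    entry, exit0, exit1 = flow_fractions(4, 6, 10)
    print(sum(exit0.values()))                   # everything that entered dimension 0
    print(entry["dim1"] + exit0["to_dim1"])      # total inflow of dimension 1
    print(sum(exit1.values()))                   # total outflow of dimension 1
```

The outflow of each dimension equals its inflow, and the inflow of the 1-st dimension equals $1-p_1$, which is the quantity denoted $\beta$ in the text.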

Figure 1. Communication state diagram for wormhole routing in a $k_0 \times k_1 \times k_2$ torus with uni-directional links.

3.1 Conflict analysis

We discuss the conflict analysis for the 2-nd dimension; the evaluation of the flows occurring in the other dimensions is analogous. All the flow rates are evaluated by using the flow conservation law. We distinguish three kinds of links: the first $c_{12}$, the internal $c_{i2}$, and the last $c_{(k_2/2)2}$. At the conflict barrier, the 2-nd dimension has an incoming flow that is the sum of the following three flows: the messages that use the links of the 0-th and 2-nd dimension, while skipping the 1-st; the communications that pass at least through the links of the 1-st and 2-nd dimension (whether or not they used the 0-th); the messages that transit through the links of the 2-nd dimension only. These flows are denoted by the expression $[(1-p_0)p_1(1-p_2) + (1-p_1)(1-p_2) + p_0 p_1 (1-p_2)]/\tau_E$ that, after some simple algebra, becomes $(1-p_2)/\tau_E = \gamma/\tau_E$. When the $\gamma/\tau_E$ flow requires the first link $c_{12}$, it conflicts against the $\vartheta/\tau_E$ flow denoting the messages that have already obtained at least one link of the 2-nd dimension and now require the $c_{12}$ link. The relation between $\gamma/\tau_E$ and $\vartheta/\tau_E$ will be determined by Theorem 1.
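The "simple algebra" that reduces the incoming flow of the 2-nd dimension to $\gamma/\tau_E$ can be spelled out as follows (the common factor $1/\tau_E$ is omitted):

```latex
\begin{aligned}
(1-p_0)\,p_1(1-p_2) + (1-p_1)(1-p_2) + p_0\,p_1(1-p_2)
  &= (1-p_2)\,\bigl[(1-p_0)\,p_1 + p_0\,p_1 + (1-p_1)\bigr] \\
  &= (1-p_2)\,\bigl[p_1 + (1-p_1)\bigr] \\
  &= 1-p_2 = \gamma .
\end{aligned}
```

The first and third terms together account for all messages that skip the 1-st dimension and use the 2-nd, so the sum covers every message that uses the 2-nd dimension exactly once.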

Once the $\gamma/\tau_E$ flow has obtained the $c_{12}$ link, a fraction of its communications joins the $\vartheta/\tau_E$ flow, which is directed towards the adjacent link $c_{22}$. The other fraction joins the $\gamma/\tau_E$ flow. The latter represents the communications that use $c_{12}$ as last link and hence leave the $k_2$-ring of the 2-nd dimension. The messages that reach the destination node between the $c_{12}$ and $c_{22}$ links (or any other links of the 2-nd dimension) enter the data-trail phase. Conversely, the $\alpha/\tau_E$ and $\beta/\tau_E$ flows concerning the other dimensions include both the completed communications and the messages that continue their path through the links of other dimensions.

The $\vartheta/\tau_E$ flow, in the attempt to reach the node $c_{22}$, conflicts against the messages that require $c_{22}$ as first link of the 2-nd dimension. The previous analysis for the $c_{12}$ link explains why this latter flow is equal to $\gamma/\tau_E$. Theorem 1 establishes an important relation between the flows that involve the nodes of the same dimension. The same approach can be used to derive the flows at any $c_{i2}$ node, $i = 2, \ldots, k_2/2 - 1$, while a different reasoning should be applied to $c_{(k_2/2)2}$. The last node receives a $\gamma/\tau_E$ flow that leaves the 2-nd dimension and reaches the DT phase, and a $\vartheta/\tau_E$ flow that requires other links of the 2-nd dimension.

Theorem 1. Any flow $\varphi$ that requires the first link of the j-th dimension conflicts against a flow equal to $\varphi k_j/2$.

Proof: Let us consider a $k_j$-ring in any dimension of a $k_0 \times k_1 \times k_2$ cube having nodes $1, \ldots, k_j$. Moreover, let us suppose that the $\varphi$ flow enters the ring at node 1 and continues towards the nodes $k_j, k_j - 1, \ldots, 2$. In using the first link, the $\varphi$ flow conflicts against the other messages that have already entered the ring at other points and request the $(k_j, 1)$ link. Since each of these messages uses $k_j/2$ links on average, the link is requested only by the messages that requested, as first link of the j-th dimension, one of the links $1, \ldots, k_j/2$. The result follows from the observation that a $\varphi$ flow is associated to each of these links.

By Theorem 1, we obtain $\varepsilon = \beta k_1/2$ for the 1-st dimension and $\delta = \alpha k_0/2$ for the 0-th dimension.
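The load that Theorem 1 attributes to a link can be illustrated by counting, in a uni-directional ring, how much flow crosses one link when every node injects one unit. This is a sketch under assumptions that are not spelled out in the proof: destinations are uniform over the other $k-1$ nodes of the ring, and the function name `ring_link_load` is hypothetical.

```python
from fractions import Fraction


def ring_link_load(k: int):
    """Per-link load in a uni-directional k-ring for unit injection per node.

    Assumption: each node spreads one unit of flow uniformly over the k-1
    other nodes. Returns (load that uses the link as first link of its path,
    load that reaches the link after already holding another ring link).
    """
    first = Fraction(0)
    later = Fraction(0)
    link = 0  # link from node 0 to node 1; all links are equivalent by symmetry
    for src in range(k):
        for offset in range(1, k):
            # the message crosses links src, src+1, ..., src+offset-1 (mod k)
            position = (link - src) % k  # 0 means "first link of the path"
            if position < offset:
                share = Fraction(1, k - 1)
                if position == 0:
                    first += share
                else:
                    later += share
    return first, later


if __name__ == "__main__":
    first, later = ring_link_load(16)
    print(first, later, first + later)
```

For a 16-node ring the link carries 1 unit as a first link and 7 units from messages already in the ring, for a total of $k/2 = 8$ units per unit of injected flow. The split into $1$ and $(k-2)/2$ is an observation of this sketch, not a statement made in the proof.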

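3.2 Evaluation of the latency time

We apply a backward analysis to associate a time with each node of the state diagram. This analysis is based on the assumption that a time value denotes 'the average time that has to be passed', in other words the time required by a message from a point of the network until it reaches the destination node. For example, any time label in Figure 1 denotes the average time that a message experiences from that point until the completion of its transmission. Proceeding in such a way, we obtain a set of equations that allow us to move from $T_{DT}$ (the time to complete the DT phase) back to a closed formula for the latency time. Under the backward analysis hypothesis, $T_{latency}$ is the time that a message 'sees' as soon as it enters the network, while $T_{DT}$ is the final time, which can be expressed by $T_{DT} = L/W$, where W is the number of wires per link and L is the average length of the message.

The backward flow analysis is based on the law that correlates the time 'seen' before a conflict point to the time 'seen' after this point. To this purpose, let us consider a link C having two incoming flows, a and b, with arrival times $\tau_a$ and $\tau_b$, respectively. In addition, $T_a$ and $T_b$ denote the times 'seen' after the conflict point by the messages belonging to the a and the b flow, respectively. We are interested in evaluating the time $T_x$, which is related to the messages of the a flow reaching the conflict point. By applying Little's law to the b flow, we have that $T_b/\tau_b$ is the fraction of messages that are waiting for link C. Under the hypothesis that these messages have a waiting time with limited variance, the equation of the residual life [14] can be simply expressed by $T_b/2$, that is, half of the average waiting time. Therefore, we can write the following conflict equation:

$$T_x = T_a + \frac{T_b^2}{2\tau_b} \qquad (1)$$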
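The conflict law combines the probability of finding link C busy with the mean residual time of the message holding it. A minimal sketch follows, assuming the conflict equation has the form $T_x = T_a + (T_b/\tau_b)(T_b/2)$, which is the form implied by the Little's law and residual life arguments in the text; the function name `conflict_time` and the numeric values in the usage example are hypothetical.

```python
def conflict_time(t_a: float, t_b: float, tau_b: float) -> float:
    """Time 'seen' before a conflict point by a message of flow a.

    Assumed form: T_x = T_a + (T_b / tau_b) * (T_b / 2), where
      T_b / tau_b is the fraction of time flow b occupies the link (Little's law)
      T_b / 2     is the mean residual time of the occupying message.
    """
    if tau_b <= 0:
        raise ValueError("tau_b must be positive")
    busy_fraction = t_b / tau_b
    residual = t_b / 2.0
    return t_a + busy_fraction * residual


if __name__ == "__main__":
    # Illustrative numbers only: both flows see 25 cycles after the conflict
    # point, and flow b generates one message every 500 cycles.
    print(conflict_time(25.0, 25.0, 500.0))
```

Messages of flow a itself do not appear in the formula, in line with the observation that preceding messages of the same flow have already released their links.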

Since we are considering a wormhole transmission, we do not have to include in the evaluation of $T_x$ any delay due to the messages of the same flow a. If a message belonging to this flow reaches the C link, all the preceding messages of the same flow have already released the relative links and are no longer a source of conflict. The simplicity of Equation (1) does not prevent us from modeling even multiple sources of conflict that can occur at some point of the network. For example, by applying this equation to the $c_{(k_2/2)2}$ node (where $T_x = T_{k_2/2,2}$, $T_a = T_2$, $T_b = T_{2,2}$, $\tau_b = \tau_E/\vartheta$) we can denote the time 'seen' from the last link of the 2-nd dimension as

$$T_{k_2/2,2} = T_2 + \frac{\vartheta\,T_{2,2}^2}{2\tau_E} \qquad (2)$$

If we proceed in an analogous way for the other links of the 2-nd dimension, we obtain the following equations:

$$T_{k_2/2-1,2} = T_{k_2/2,2} + \frac{\gamma\,T_{2,2}^2}{2\tau_E} \qquad (3)$$

$$\ldots \qquad (4)$$

By recalling that $T_2 = T_{DT}$ and $T_{DT} = L/W$, we can solve Equation (4) with respect to the unknown $T_{2,2}$. By choosing the minus sign for consistency reasons, we obtain

$$T_{2,2} = \frac{1 - \sqrt{1 - (k_2-2)\,\gamma\,T_2/\tau_E}}{(k_2-2)\,\gamma/(2\tau_E)} \qquad (5)$$

If we apply Equation (1) to the $c_{12}$ node, we have:

$$T_{1,2} = \ldots \qquad (6)$$

where $\vartheta = \gamma k_2/2$ on the basis of Theorem 1. Note that the $\vartheta/\tau_E$ flow consists of several components that 'see' different times. Therefore, using $T_{2,2}$ as the time associated to this flow yields an upper bound approximation that, however, does not affect the accuracy of the model. Once $T_{1,2}$ is obtained, we can evaluate $T_1$, that is, the time which a message sees at the exit from the 1-st dimension. Looking at Figure 1 and recalling the basic law of the backward flow analysis, $T_1$ can be expressed as

$$T_1 = p_2 T_{DT} + (1-p_2)\,\bigl[\,T_{1,2} + (1-p_0)\,p_1(1-p_2)\cdots \qquad (7)$$

By proceeding analogously to the analysis carried out for the 2-nd dimension in Equations (2)-(6), we can write $T_{2,1}$ and $T_{1,1}$ as follows

$$T_{2,1} = \frac{1 - \sqrt{1 - (k_1-2)\,\beta\,T_1/\tau_E}}{(k_1-2)\,\beta/(2\tau_E)} \qquad (8)$$

$$T_{1,1} = \ldots \qquad (9)$$

where $\varepsilon = \beta k_1/2$ as stated by Theorem 1. A similar analysis can be applied to obtain the parameters relative to the 0-th dimension, that is, $T_0$, $T_{2,0}$ and $T_{1,0}$.

We are now able to obtain the closed form equation for the average latency time as the sum of the time for completing the PH phase plus the three times that a message may 'see' at the exit from the PH phase:

$$\begin{aligned}
T_{latency} = {} & \bigl[(k_0+k_1+k_2)/2 - 1\bigr] + (1-p_0)\,T_{1,0} \\
& + p_0(1-p_1)\left[T_{1,1} + (1-p_0)(1-p_1)\frac{T_1^2}{2\tau_E}\right] \\
& + p_0\,p_1(1-p_2)\left[T_{1,2} + (1-p_0)\,p_1(1-p_2)\frac{T_2^2}{2\tau_E} + (1-p_1)(1-p_2)\frac{T_2^2}{2\tau_E}\right]
\end{aligned} \qquad (10)$$

4. The Model with Bi-directional Links

The methodology presented in Section 3 can be easily applied to other torus and hypercube topologies. Here, we demonstrate the flexibility of the approach by analyzing the performance of deterministic wormhole routing in a 3D symmetric torus with bi-directional links. Because of space limitations, we only point out the main differences between the model for bi-directional links and that for the asymmetric torus with uni-directional channels. The basic approach and demonstrations are analogous to those adopted in Section 3, while the main differences concern the following issues:

• The average path length in a symmetric k-ary 3-cube with bi-directional channels and uniform traffic is equal to $3k/4$. Therefore, the PH phase causes a delay equal to $(3k/4) - 1$ cycles.

• Since we are considering a 3D network with bi-directional links, the messages leaving a processor can choose among 8 paths. Hence, each path represents only 1/8 of the entire flow generated by a processor. The new flow entering each of the three dimensions is now denoted by the rates $\alpha = (1-p)/(8\tau_E)$, $\beta = (1-p)^2/(8\tau_E) + p(1-p)/(8\tau_E) = \alpha$ and $\gamma = p^2(1-p)/(8\tau_E) + (1-p)^2/(8\tau_E) + p(1-p)^2/(8\tau_E) = \beta$, where $p = 1/k$ because of the symmetric topology.

• When each of these flows requires the first link of a new dimension, it conflicts against two flows. For example, in the 2-nd dimension, the flow $\gamma$ conflicts against the $3\gamma$ and $k\gamma$ flows. The flow $3\gamma$ is a fraction of the $4\gamma$ flow that is generated by the same processor. It has used the links of the 0-th and 1-st dimension in the opposite direction with respect to the $\gamma$ flow, while it follows the same direction as the $\gamma$ flow in the 2-nd dimension. The $k\gamma$ flow denotes the messages that are generated by other processors. Analogously to the $\gamma k_2/2$ flow of the uni-directional network, it denotes the communications that are in the k-ring (because they have already obtained at least another link in this dimension) and now require the $c_{12}$ link. Theorem 2 estimates this flow as a function of the incoming flow $4\gamma$.

• Theorem 2. Any flow $\varphi + 3\varphi$ that uses the first link of a new dimension conflicts against a flow equal to $(k/4)\,4\varphi = k\varphi$. The proof is similar to that of Theorem 1, and is omitted for space limitations. Theorem 2 motivates the entire set of flows that characterize the wormhole communications in a bi-directional network.

Following an approach analogous to that presented in Section 3, we can evaluate any average time, thereby obtaining the closed form equation for the average latency time in a k-ary 3-cube with bi-directional links:

$$\begin{aligned}
T_{latency} = {} & (3k/4 - 1) + (1-p)\,T_{1,0} \\
& + p(1-p)\left[T_{1,1} + (1-p)^2\frac{T_1^2}{16\tau_E}\right] \\
& + p^2(1-p)\left[T_{1,2} + p(1-p)^2\frac{T_2^2}{16\tau_E} + (1-p)^2\frac{T_2^2}{16\tau_E}\right]
\end{aligned} \qquad (11)$$
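In both models the backward analysis solves a quadratic in the unknown time and keeps the root with the minus sign, as in Equation (5). The following sketch assumes the quadratic has the form $T = T_{exit} + c\,T^2$, with $c = (k_2-2)\gamma/(4\tau_E)$ when matched against Equation (5); this form is inferred from the closed-form solution, and the function name `backward_time` and the numeric values are hypothetical.

```python
import math


def backward_time(t_exit: float, c: float) -> float:
    """Smaller root of T = t_exit + c * T**2.

    This is the branch that tends to t_exit as the conflict coefficient c
    tends to zero, which is one way to read the 'consistency' argument for
    choosing the minus sign.
    """
    if c < 0:
        raise ValueError("c must be non-negative")
    if c == 0:
        return t_exit
    disc = 1.0 - 4.0 * c * t_exit
    if disc < 0:
        raise ValueError("no real solution: the load is too high for this form")
    # Algebraically equal to (1 - sqrt(disc)) / (2 * c), but stable for small c.
    return 2.0 * t_exit / (1.0 + math.sqrt(disc))


if __name__ == "__main__":
    # Illustrative values: T_exit = L/W = 200/8 = 25 cycles, k_2 = 16,
    # gamma = 15/16 and tau_E = 1000 cycles give c = 14 * (15/16) / 4000.
    c = 14 * (15 / 16) / (4 * 1000)
    print(backward_time(25.0, c))
```

The plus-sign root grows without bound as c approaches zero, so it cannot represent a lightly loaded network in which the time seen should approach the exit time.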

5. Validation of the Models and Results

In this Section we validate the analytical models with uni-directional and bi-directional links, and use them to obtain some interesting performance parameters. We compare the results of the former model against the analytical and simulation results presented by Dally in [5].

Table 1 shows the average latency time as a function of the traffic $1/\tau$ (expressed in bit/cycle) in a 16-ary 3-cube. Our analytical values are obtained by setting $k_0 = k_1 = k_2 = 16$ and $\tau_E = L\tau$ in Equation (10). These results are very close to the simulation values: the differences are below 5% for most of the points and reach 10% only for very high traffic rates. Moreover, Table 1 shows that our approach improves on the already good results achieved by Dally's analytical model.

| Model | % Error | Dally's Simul. | Dally's Anal. | % Error |
|-------|---------|----------------|---------------|---------|
| 52    | +1.9    | 51             | 52            | +1.9    |
| 56    | +1.8    | 55             | 57            | +3.6    |
| 63    | +3.2    | 61             | 66            | +8.2    |
| 73    | +4.2    | 70             | 78            | +11.4   |
| 92    | +9.5    | 84             | 90            | +7.1    |
| 133   | -10.1   | 148            | 125           | -15.5   |

Table 1. Comparison of estimated message latency times with analytical and simulation results proposed by Dally in [5]. Uni-directional 16x16x16 torus, single-flit buffers, L=200 bits, W=8, mean message length=25 flits.
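The error columns of Table 1 can be recomputed from the latency columns. The sketch below assumes that each error is the signed relative difference with respect to Dally's simulation value, which is consistent with the printed figures to within 0.1 percentage points; the function name `percent_error` is hypothetical.

```python
def percent_error(estimate: float, reference: float) -> float:
    """Signed relative error of an estimate with respect to a reference, in percent."""
    return 100.0 * (estimate - reference) / reference


# Latency values as listed in Table 1, one entry per traffic level
model = [52, 56, 63, 73, 92, 133]
dally_sim = [51, 55, 61, 70, 84, 148]
dally_anal = [52, 57, 66, 78, 90, 125]

model_err = [percent_error(m, s) for m, s in zip(model, dally_sim)]
dally_err = [percent_error(a, s) for a, s in zip(dally_anal, dally_sim)]

if __name__ == "__main__":
    for m, d in zip(model_err, dally_err):
        print(f"{m:+.1f}  {d:+.1f}")
```

Small differences in the last digit with respect to the printed table (for example in the third and fourth rows of the first error column) may come from truncation rather than rounding.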

To validate the analytical model for the 3D torus with bi-directional links we used our discrete event simulator. The simulation results were obtained by means of the Independent Replications Method implemented in Simula, while the confidence intervals (95% level) were evaluated on the basis of the Jackknife estimator.

Figure 2. Comparison of asymmetric and symmetric topologies in a 3D torus with 1000 nodes and uni-directional links.

Table 2 reports the analytical and simulation latency times for wormhole routing in a 3D torus with 216 nodes (6x6x6). The message generation is modeled as a Poisson process with a generation time of $\tau_E$ cycles per node. The message routing is uniformly distributed, while the average message length is equal to 12 flits. This table demonstrates the accuracy of the analytical approach across the message generation times considered. The errors are under 4% in most of the instances. The message generation time has to drop to 75 cycles per node before the error reaches about 10%.

| $\tau_E$ (cycles) | Model | Simulation | % Error |
|-------------------|-------|------------|---------|
| 1000              | 15.66 | 15.77      | -0.7    |
| 700               | 15.75 | 15.89      | -0.9    |
| 500               | 15.88 | 16.07      | -1.2    |
| 300               | 16.18 | 16.46      | -2.3    |
| 200               | 16.57 | 17.25      | -3.9    |
| 100               | 17.90 | 19.14      | -6.5    |
| 75                | 18.88 | 21.12      | -10.6   |

Table 2. Comparison of estimated message latency time with simulation. Bi-directional 6x6x6 torus, single-flit buffers, L=96 bits, W=8, mean message length=12 flits.

We used the analytical models to evaluate the performance of deterministic wormhole routing in various network topologies and scenarios. Figure 2 estimates the sensitivity of the latency time to the distribution of the links among the dimensions of an asymmetric 3D torus with $10^3$ nodes and uni-directional links. The message generation rate is $1/\tau_E$.

[Figure 2 plot: latency (cycles) versus message generation rate (0.0008 to 0.0023) for the 10x10x10, 10x20x5, 20x10x5, 20x5x10 and 5x10x20 topologies.]

As expected, the symmetric topology achieves the best performance, while, if we exclude unrealistic unbalanced topologies such as 2x5x100, we do not observe big differences among the performance of asymmetric networks. However, the dimension ordering routing performs slightly better for the topologies that adopt an increasing ordering of their dimensions, that is $k_0 < k_1 < k_2$.

The results achieved for the 5x10x20 topology were confirmed by experiments with other dimensions. Conversely, for networks with bi-directional channels, the advantages of an ordering of the dimensions were less evident.
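Part of the gap between balanced and unbalanced shapes is already visible in the contention-free path-hole delay $(k_0+k_1+k_2)/2 - 1$ of Section 3. The sketch below evaluates only that term for three 1000-node shapes; it ignores every conflict term of Equation (10), and the function name `ph_delay` is hypothetical.

```python
def ph_delay(dims) -> float:
    """No-conflict path-hole delay (k0 + k1 + k2)/2 - 1, in cycles."""
    return sum(dims) / 2 - 1


shapes = [(10, 10, 10), (5, 10, 20), (2, 5, 100)]
delays = {shape: ph_delay(shape) for shape in shapes}

if __name__ == "__main__":
    for shape, delay in delays.items():
        print(shape, delay)
```

The 2x5x100 shape pays 52.5 cycles before any conflict is considered, against 14 cycles for 10x10x10 and 16.5 cycles for 5x10x20. This term does not depend on the order of the dimensions, so the ordering effect noted above must come from the conflict terms.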

Figure 3 compares the performance of networks with uni-directional and bi-directional links as a function of the message generation rate for three message lengths (L=96, 200, 400). The advantages of the topologies with bi-directional links are especially evident for medium-large messages.

[Figure 3 plot: latency (cycles) versus message generation rate (0.000 to 0.009) in an 8x8x8 torus, with curves for L=96, 200 and 400 under bi-directional (Bi) and uni-directional (Uni) links.]

Figure 3. Performance comparison among 3D tori with uni-directional and bi-directional links.

Although space limitations do not allow us to show all performance results, these experiments demonstrate the usefulness of the proposed methodology. It can be considered a unifying approach for deriving closed formulas for the message latency time in any k-ary n-cube with uni-directional and bi-directional links. In this paper, we have applied this methodology to two topologies that were not yet solved through closed analytical formulas: a $k_0 \times k_1 \times k_2$ torus with uni-directional links, and a symmetric k-ary 3-cube with bi-directional channels. The same approach can be applied to evaluate the performance of wormhole routing in a wide set of topologies such as hypercubes, symmetric and asymmetric tori, with both uni-directional and bi-directional links. Moreover, preliminary results indicate that the flexibility of this approach is a fine basis for analyzing even the performance of adaptive wormhole routing.

References

[1] V.S. Adve, M.K. Vernon, “Performance analysis of mesh interconnection networks with deterministic routing”, IEEE Trans. on Parallel and Distributed Systems, v. 5, n. 3, Mar. 1994, pp. 225-246.

[2] R.V. Boppana, S. Chalasani, “A comparison of adaptive wormhole routing algorithms”, Proc. of 20th Annual Int. Symp. on Computer Architecture, San Diego, CA, May 1993, pp. 351-360.

[3] A.A. Chien, J.H. Kim, “Planar-adaptive routing: low-cost adaptive networks for multiprocessors”, J. ACM, v. 42, n. 1, Jan. 1995, pp. 91-123.

[4] M. Colajanni, A. Dell’Arte, B. Ciciani, “Communication switching techniques and link-conflict resolution strategies: a comparison analysis”, Proc. of EUROSIM ‘95, Vienna, Sept. 1995.

[5] W.J. Dally, “Performance analysis of k-ary n-cube interconnection networks”, IEEE Trans. on Computers, v. 39, n. 6, June 1990, pp. 775-785.

[6] L. Gravano, G.D. Pifarre, P.E. Berman, J.L.C. Sanz, “Adaptive deadlock- and livelock-free routing with all minimal paths in torus networks”, IEEE Trans. on Parallel and Distributed Systems, v. 5, n. 12, Dec. 1994, pp. 1233-1251.

[7] D.C. Grunwald, D.A. Reed, “Networks for parallel processors: measurements and prognostications”, Proc. of 3rd Conf. on Hypercube Concurrent Computers and Applications, New York, Jan. 1988, pp. 610-619.

[8] P. Kermani, L. Kleinrock, “Virtual Cut-Through: A new computer communication switching technique”, Computer Networks, v. 3, 1979, pp. 267-286.

[9] J. Kim, C.R. Das, “Hypercube communication delay with wormhole routing”, IEEE Trans. on Computers, v. 43, n. 7, July 1994, pp. 806-814.

[10] D. Linder, J.C. Harden, “An adaptive and fault tolerant wormhole routing strategy for k-ary n-cubes”, IEEE Trans. on Computers, v. 40, n. 1, Jan. 1991, pp. 2-12.

[11] L.M. Ni, P.K. McKinley, “A survey of wormhole routing techniques in direct networks”, IEEE Computer, v. 26, n. 2, Feb. 1993, pp. 62-76.

[12] S. Ramany, D. Eager, “The interaction between virtual link flow control and adaptive routing in wormhole networks”, Proc. of ACM Int. Conf. on Supercomp., Manchester, July 1994, pp. 136-145.

[13] D.S. Reeves, E.F. Gehringer, A. Chandiramani, “Adaptive routing and deadlock recovery: A simulation study”, Proc. of 4th Hypercube Conference, 1989, pp. 331-337.

[14] K.S. Trivedi, Probability & Statistics with Reliability, Queueing and Computer Science Applications, Prentice Hall, New Jersey, 1982.