Routing for applications in NoC using ACO-based algorithms

8
Applied Soft Computing 13 (2013) 2224–2231 Contents lists available at SciVerse ScienceDirect Applied Soft Computing j ourna l ho me p age: www.elsevier.com/l ocate/asoc Routing for applications in NoC using ACO-based algorithms Luneque Silva Junior a,, Nadia Nedjah b , Luiza de Macedo Mourelle c a Systems Engineering and Computer Science Program, PESC/COPPE, Federal University of Rio de Janeiro, Brazil b Department of Electronics Engineering and Telecommunication, Faculty of Engineering, State University of Rio de Janeiro, Brazil c Department of Systems Engineering and Computation, Faculty of Engineering, State University of Rio de Janeiro, Brazil a r t i c l e i n f o Article history: Received 13 March 2012 Received in revised form 4 December 2012 Accepted 14 January 2013 Available online 26 January 2013 Keywords: Network-on-Chip Packet routing Ant colony optimization a b s t r a c t Networks-on-Chips (NoCs) have been used as an interesting option in design of communication infras- tructures for embedded systems, providing a scalable structure and balancing the communication between cores. Because several data packets can be transmitted simultaneously through the network, an efficient routing strategy must be used in order to avoid congestion delays. In this paper, ant colony algo- rithms were used to find and optimize routes in a mesh-based NoC, where several randomly generated applications have been mapped. The routing optimization is driven by the minimization of total latency in packets transmission between tasks. The simulation results show the effectiveness of the ant colony inspired routing by comparing it with general purpose algorithms for deadlock free routing. © 2013 Elsevier B.V. All rights reserved. 1. Introduction A System-on-Chip (SoC) is an integrated circuit composed by a full computer system. SoCs contains, within the same package, processors, memory, input–output controllers and specific appli- cation devices. This block structure follows a design methodology based on intellectual property (IP) cores. Components designed for a specific project can be reused in other SoCs, reducing design time. Thus, under an extremely simplified view, to increase the number of tasks performed by the SoC, the designer just need to add more IP cores with different features. The increase of SoC’s scale raises new design challenges. Among them is communication between IP cores. The blocks of a SoC are interconnected by a communication infrastructure, such as buses or point-to-point links. Each of these models has their limitations. Shared buses can cause high delays if multiple blocks need to transmit data simultaneously. This does not happen in point-to- point architectures. In turn, the communication structure need to be redesigned for each new system. For many SoC designs, it is desirable to use a framework as scalable as buses and as fast as point-to-point links. An architecture that includes these two fea- tures are Networks-on-Chip (NoC) [1]. In a NoC architecture, switches are interconnected by point-to- point links, thus describing a given network topology. An example of network topology is the mesh shown in Fig. 1. The switches are also connected to the IP cores that implement the system, Corresponding author. Tel.: +55 2132526580. E-mail addresses: [email protected], [email protected] (L. Silva Junior), [email protected] (N. Nedjah), [email protected] (L. de Macedo Mourelle). also called resources. Switches exchange information in the form of messages and packets. The information generated by a resource is divided into smaller parts and sent over the network. Once arrived, these packets are organized in the destination switch and then delivered to resource. This operation is similar to that performed by computer networks. The structure formed by a switch and a resource is called a network node. In the design of NoC-based systems, the communication infra- structure can be imported as a single configurable IP block. However, many are the ways to connect network and resources, in order to achieve the desired application. To assist the designer, computational tools for project, or EDAs (Electronic Design Automation), are used. The purpose of EDAs is to optimize inter- mediate stages of SoC and NoCs projects, in order to obtain a more efficient design implementation quickly. The EDA tools must be able to use information about the desired application (at a high level of abstraction) and, through successive stages of optimization, implement a solution that meets the design specification. This optimization may include several steps [15], such as task allocation, IP mapping and static routing. Delays in communication may occur in congestion situations, when multiple packets could be transmitted using the same switch at the same time. If the routing algorithm adopted in the NoC’s design is deterministic, the selection of the packet path from the source to the destination switch will not consider the load of intermediate switches those between the source and destination switch. If these switches are under a heavy traffic, a given packet can only be transmitted after the end of congestion. This occurs even if other switches, not selected for routing, are free for trans- mission. On the other hand, adaptive routing algorithms can be used in order to avoid network congestion. These algorithms use 1568-4946/$ see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.asoc.2013.01.009

Transcript of Routing for applications in NoC using ACO-based algorithms

Page 1: Routing for applications in NoC using ACO-based algorithms

R

La

b

c

a

ARRAA

KNPA

1

apcbaToI

tioStpbdpt

poa

n

1h

Applied Soft Computing 13 (2013) 2224–2231

Contents lists available at SciVerse ScienceDirect

Applied Soft Computing

j ourna l ho me p age: www.elsev ier .com/ l ocate /asoc

outing for applications in NoC using ACO-based algorithms

uneque Silva Juniora,∗, Nadia Nedjahb, Luiza de Macedo Mourellec

Systems Engineering and Computer Science Program, PESC/COPPE, Federal University of Rio de Janeiro, BrazilDepartment of Electronics Engineering and Telecommunication, Faculty of Engineering, State University of Rio de Janeiro, BrazilDepartment of Systems Engineering and Computation, Faculty of Engineering, State University of Rio de Janeiro, Brazil

r t i c l e i n f o

rticle history:eceived 13 March 2012eceived in revised form 4 December 2012

a b s t r a c t

Networks-on-Chips (NoCs) have been used as an interesting option in design of communication infras-tructures for embedded systems, providing a scalable structure and balancing the communicationbetween cores. Because several data packets can be transmitted simultaneously through the network, an

ccepted 14 January 2013vailable online 26 January 2013

eywords:etwork-on-Chipacket routing

efficient routing strategy must be used in order to avoid congestion delays. In this paper, ant colony algo-rithms were used to find and optimize routes in a mesh-based NoC, where several randomly generatedapplications have been mapped. The routing optimization is driven by the minimization of total latencyin packets transmission between tasks. The simulation results show the effectiveness of the ant colonyinspired routing by comparing it with general purpose algorithms for deadlock free routing.

nt colony optimization

. Introduction

A System-on-Chip (SoC) is an integrated circuit composed by full computer system. SoCs contains, within the same package,rocessors, memory, input–output controllers and specific appli-ation devices. This block structure follows a design methodologyased on intellectual property (IP) cores. Components designed for

specific project can be reused in other SoCs, reducing design time.hus, under an extremely simplified view, to increase the numberf tasks performed by the SoC, the designer just need to add moreP cores with different features.

The increase of SoC’s scale raises new design challenges. Amonghem is communication between IP cores. The blocks of a SoC arenterconnected by a communication infrastructure, such as busesr point-to-point links. Each of these models has their limitations.hared buses can cause high delays if multiple blocks need toransmit data simultaneously. This does not happen in point-to-oint architectures. In turn, the communication structure need toe redesigned for each new system. For many SoC designs, it isesirable to use a framework as scalable as buses and as fast asoint-to-point links. An architecture that includes these two fea-ures are Networks-on-Chip (NoC) [1].

In a NoC architecture, switches are interconnected by point-to-

oint links, thus describing a given network topology. An examplef network topology is the mesh shown in Fig. 1. The switchesre also connected to the IP cores that implement the system,

∗ Corresponding author. Tel.: +55 2132526580.E-mail addresses: [email protected], [email protected] (L. Silva Junior),

[email protected] (N. Nedjah), [email protected] (L. de Macedo Mourelle).

568-4946/$ – see front matter © 2013 Elsevier B.V. All rights reserved.ttp://dx.doi.org/10.1016/j.asoc.2013.01.009

© 2013 Elsevier B.V. All rights reserved.

also called resources. Switches exchange information in the form ofmessages and packets. The information generated by a resource isdivided into smaller parts and sent over the network. Once arrived,these packets are organized in the destination switch and thendelivered to resource. This operation is similar to that performedby computer networks. The structure formed by a switch and aresource is called a network node.

In the design of NoC-based systems, the communication infra-structure can be imported as a single configurable IP block.However, many are the ways to connect network and resources,in order to achieve the desired application. To assist the designer,computational tools for project, or EDAs (Electronic DesignAutomation), are used. The purpose of EDAs is to optimize inter-mediate stages of SoC and NoCs projects, in order to obtain a moreefficient design implementation quickly.

The EDA tools must be able to use information about the desiredapplication (at a high level of abstraction) and, through successivestages of optimization, implement a solution that meets the designspecification. This optimization may include several steps [15], suchas task allocation, IP mapping and static routing.

Delays in communication may occur in congestion situations,when multiple packets could be transmitted using the same switchat the same time. If the routing algorithm adopted in the NoC’sdesign is deterministic, the selection of the packet path from thesource to the destination switch will not consider the load ofintermediate switches – those between the source and destinationswitch. If these switches are under a heavy traffic, a given packet

can only be transmitted after the end of congestion. This occurseven if other switches, not selected for routing, are free for trans-mission. On the other hand, adaptive routing algorithms can beused in order to avoid network congestion. These algorithms use
Page 2: Routing for applications in NoC using ACO-based algorithms

L. Silva Junior et al. / Applied Soft Com

R

S

R

S

R

S

R

S

Resource

Switch Link

R

S

R

S

R

S

R

S

R

S

(a) Mesh topology

North

Sout h

West East

Resource

Switch

(b) Switch

naytn

eippnabTttcm

aiturfrpaaa

tion of highly volatile chemicals called pheromones in their neighbor

Fig. 1. Network-on-Chip architecture.

ot only the position of origin and destination nodes, but also thectual load condition of the network to calculate the route. Whenou find a region of network in use, the routing can set the packageo follow another path. This congestion-free path may, however, beot minimal. These two situations are shown in Fig. 2.

In order to overcome the congestion problem, and thus accel-rating the packet delivery, which, in turn, would allow for anmprovement of the whole execution time of the system, this paperroposes a route optimization step in the design of NoCs, or morerecisely, the use of an adaptive static routing In this routing, aetwork model provides the communication patterns required forpplication execution. The identification of routes is accomplishedy an optimization process to minimize the communication time.he search is always for a shortest path between source and des-ination nodes. If the intermediate switches of this path are in use,he algorithm should be able to find another route, so that theontention effects do not affect the transmission at all or, at leastinimally.In this paper, the algorithm used in the search for routes is the

nt colony optimization (ACO) [10]. This is an example of swarmntelligence, where a group of agents work together to find a solu-ion to a given problem. We compared the results of the networksing the proposed routing algorithms and those widely adoptedouting algorithms. The reminder of this paper is organized asollows. In Section 2, we review the related work in routing algo-ithms. In Section 3, we do an overview on ACO meta-heuristics. Theroposed routing is presented in Section 4. A brief description of

pplications and mapping is shown in Section 5. Simulation resultsre presented in Section 6. The paper closes with a conclusion and

description of future work in Section 7.

B

D

A C

(a) Ideal hypothetica l situ-ation.

B

D

A

(b) Blocked pac terministic routi

Fig. 2. Routing in a

puting 13 (2013) 2224–2231 2225

2. Related work

Many of the techniques used for routing in NoCs, such as the XYalgorithm, were originally developed for computer networks andmultiprocessor systems. The XY algorithm is a routing techniquewidely used in 2D mesh networks with wormhole switching. Itworks by sending packets over the network first horizontally (Xdimension), then vertically (Y dimension). In the context of NoCs,XY routing proves efficient due to its simplicity of implementationand because it is deadlock-free. Works that made use of this algo-rithm include the HERMES network [16] and the SoCIN network[19].

Glass and Ni have proposed the so-called Turn Model for adap-tive, livelock and deadlock-free algorithms [13]. A turn is a changeof 90◦ in the direction of packet transmission. The main idea ofthis model is to restrict the amount of turns that a packet routecan go through in order to avoid the formation of cycles thatcause deadlocks. Following this concept, three routing algorithmswere proposed by the authors: the West-First, the North-last andNegative-First. A related approach is the Odd-Even turn model[3] for designing partially adaptive deadlock-free routing algo-rithms. Unlike the turn model, which relies on prohibiting certainturns in order to avoid deadlock, this model restricts the loca-tions where some types of turns can be taken. As a result, thedegree of routing adaptiveness provided is more even for differentsource–destination pairs.

The work of Jose Duato has addressed the mathematical foun-dations of routing algorithms. His main interests have been in thearea of adaptive routing algorithms for multicomputer networks.Most of the concepts are directly applicable to NoC. In [11], the the-oretical foundation for deadlock-free adaptive routing in wormholenetworks is given.

Even ACO meta-heuristics has been used in efficient routingalgorithms, as the case of AntNet made by Di Caro [5]. This it isa dynamic routing for telecommunications networks.

3. Ant colony optimization

Ant algorithms, also known as ant colony optimization (ACO)[10], are a class of heuristics search algorithms, that have been suc-cessfully applied to solving NP hard problems [2]. Ant algorithmsare biologically inspired in the behavior of colonies of ants, and inparticular how they forage for food. One of the main ideas behindthis approach is that the ants can communicate with one anotherthrough indirect means by making modifications to the concentra-

environment. As it has been shown [14], indirect communicationamong ants via pheromone trails enables them to find shortestpaths between their nest and food sources. The most emphatic and

C

ket s in de-ng.

B

D

A C

(c) Non-minima l path s inadaptive routing.

3 ×3 mesh.

Page 3: Routing for applications in NoC using ACO-based algorithms

2226 L. Silva Junior et al. / Applied Soft Com

Food Foo d Foo d

bbtFstopts

rnltpat

A123456

uiacptoibe

4

epta

4

cn

T = 1 T = 2 T = 3

Fig. 3. Pheromone concentration in the double bridge experiment.

est known example of the use of pheromones by ants is the dou-le bridge experiment. An ant nest is connected to a food source bywo bridges with different lengths. This configuration is shown inig. 3. Initially, ants choose equally both ways. However, opting forhorter path are able to go back to the food supply before the antshat follow the long way. Thus, also the concentration of pheromonen the shortest path will be greater from the moment the ants com-lete the round trip. This capability of real ant colonies has inspiredhe definition of artificial ant colonies that can find approximateolutions to hard combinatorial optimization problems.

The core ideas of ACO are (i) the use of repeated simulations car-ied out by a population of artificial agents called “ants” to generateew solutions of problem, (ii) the use by the agents of stochastic

ocal search to build the solutions in an incremental way, and (iii)he use of information collected during past simulations (artificialheromones) to direct future search for better solutions. Severalnt algorithms make use of the structure shown in Algorithm 1 [8],he ACO meta-heuristics.

lgorithm 1. ACO meta-heuristics: initialize parameters and pheromone trails;: while termination condition not met do: construct ant solutions;: local search (optional);: update pheromone trails;: end while;

In the artificial ant colony approach, each ant builds a solution bysing two types of information locally accessible: problem-specific

nformation, and information added by ants during previous iter-tions of the algorithm. In fact, while building a solution, each antollects information on the problem characteristics and on its ownerformance, and uses this information to modify the representa-ion of problem, as seen locally by the other ants. The representationf the problem is modified in such a way that information containedn past good solutions can be exploited to build new and hopefullyetter ones. This form of indirect communication mediated by thenvironment is called stigmergy, and is typical in social insects.

. ACO-based routing

The ant colony optimization, with the ability to search for paths,merged as a powerful solution for routing problems. Thus, thisaper presents the use of the ACO meta-heuristics in the construc-ion of routing algorithms. Two models of static routing for NoCsre proposed.

.1. Network specification

In this work, the network uses switches with five communi-ation ports. Four ports are responsible for communication witheighboring switches and one is for local communication with

puting 13 (2013) 2224–2231

the resource. The switches are considered bufferless using no vir-tual channels. The network topology is a two dimension mesh.The switching technique adopted was the wormhole [18]. In thismethod, packets are divided into smaller units called flits (flow-units). It is assumed that each communication channel has a widthof one flit. The transmission of flits is performed in a pipelineway. The latency of a packet sent trough the network in worm-hole switching is given by Eq. 1, where tflit is the transmission timeof a flit in a channel, D is the number of switches in a path, L is thetotal length of a packet (in bits), W is the length of a channel, andLdelay is the number of bits that would have been transmitted in aperiod of congestion.

Tpacket = tflit ·(

D +[

L

W

]+

[Ldelay

W

])(1)

4.2. Proposed routing algorithms

In this paper, we propose two algorithms to perform routingusing the ACO meta-heuristics. These algorithms are called REAS(routing based on Elitist Ant System [10]) and RACS (routing basedon Ant Colony System [9]). Both algorithms search for paths in anarchitecture characterization graph that represents the network 2Dmesh topology. These algorithms make use of multiple ant colonies,where each colony is responsible for searching the route of a packet.In this approach, each of which maintain its own pheromone andants. Nevertheless, the colonies must exchange information inorder to minimize the latency of their respective packets. Therefore,the route found by an ant in a given colony is visible to the ants inother colonies, because these packets are being transmitted simul-taneously and within the same network. A simplified pseudo-codeof REAS is shown in Algorithm 2.

Algorithm 2. REAS algorithmRequire network and ACO parameters;1: while total of cycles do2: for k = 1 → number of ants do3: for g = 1 → number of packets do4: Antk,g constructs a solution;5: compute Antk,g pheromone;6: end for7: compute the elitist pheromone;8: accumulate the pheromone of actual ants;9: endfor10: update the global pheromone;11: endwhile12: return best solution;

In the proposed algorithms, ants in a network node are onlyaware of two things: the first is the pheromone concentration inthe surrounding nodes; the second is the load on a node, whichdictates the waiting time in each of the four possible transmissiondirections.

The Elitist Ant System (EAS) is directly inspired by the Ant System,the first of ant algorithms [10]. The EAS is characterized mainly bythe use of the concept of elitism, in order to differentiate ants thatcarry out better solutions. In the REAS algorithm, ants build pathsthrough the network selecting the next node based on (2), wherepk

ijis the probability of ant k going from node i to node j.

pkij(t) =

⎧⎪⎪⎨⎪⎪⎩

�j(t)˛ · �ijˇ∑

k∈ak�k(t)˛ · �ik

ˇif j ∈ ak

0 otherwise(2)

The probability of selecting a particular direction is function ofpheromone concentration and network load in that direction. Thesetwo parameters are weighted by an importance constant and ˇ,

Page 4: Routing for applications in NoC using ACO-based algorithms

ft Computing 13 (2013) 2224–2231 2227

r(

Auibtds

Twbp

TSwc

j

Anod

db

uad

Tpua(

Faoatbwt

A

D

CB

E

Rando mMapping

C

E

BD

A

L. Silva Junior et al. / Applied So

espectively. The network load is used indirectly by �ij, defined in3), where Cij represents the load in transmission from i to j.

ij =1Cij

(3)

t end of each iterative cycle, the pheromone of all colonies ispdated according to (4). Part of pheromone of previous iteration

s reduced by evaporation rate �, and then reinforced by the contri-ution of all m ants in current cycle. The pheromone also receiveshe reinforcement of elitist ants: those that achieve best solutionseposit their pheromone in every cycle, so directing the search inubsequent cycles.

t+1 = (1 − �) · �t +m∑

k=1

��k + �elite (4)

he pheromone in the path find by a single ant k is defined in (5),here Q is a constant and Lk represents the total latency imposed

y the solution. It is easy to note that the ants with the worst resultsrovide a smaller amount of pheromone.

�k = Q

Lk(5)

he second ant algorithm is described as follows. The Ant Colonyystem [9] differs from others by the method of selection and theay which the pheromone is updated. Thus, the RACS uses the so-

alled pseudo-random proportional rule.

={

arg maxi ∈ [1,4]

(�i · �ˇi

) if q ≤ q0,

S otherwise(6)

s shown in (6), the probability for an ant to move from node i toode j depends on a random parameter q, uniformly distributedver [0, 1], and a parameter q0. If q ≤ q0, then the next node isirectly selected by arg max

i∈[1,4](�i · �ˇ

i). Otherwise, the next node is

efined by S, that uses a selection method similar to that employedy REAS, as defined in (2).

Furthermore, the RACS algorithm uses a double pheromonepdate. The first update, called local update, is performed by allnts in each step of construction of a solution. This local update isefined in (7).

t+1 = (1 − �) · �t + � · �0 (7)

he parameter � is the evaporation constant, and �0 is the initialheromone at each node. The second pheromone update, the offlinepdate, is applied at the end of each iteration only by the best-so-farnt (the ant that found the best solution). The update is given by8), where ��j is the reinforcement of the best ant pheromone.

jt+1 =

⎧⎪⎪⎨⎪⎪⎩

(1 − �) · �jt + � · ��j if j ∈ best

�jt otherwise (8)

or the tests (either with random traffic or task graphs mapped) wedopt the same stopping criterion, in this case, a maximum numberf cycles. It was defined as 200 cycles. For this value, the REAS wasble to optimize routes getting better results than other methods, so

his amount of cycles was chosen. A higher number of cycles coulde employed to search for still better results, but in most cases thisould be a waste of computation. To maintain a fair comparison,

his amount was also adopted for the executions of RACS.

Fig. 4. Application task graph mapped randomly in a NoC.

5. Applications in NoC

In general, NoCs are developed to perform a specific application.This application can be described initially as a software that mustbe embedded in hardware. The EDA tool must adjust characteristicsof NoC and application so that the execution is more efficient.

5.1. Task graphs

Every application, in any type of computer system, can bedescribed by a task graph. This is a data structure in which theapplication is divided into blocks responsible for specific tasks.These blocks, in turn, exchange information in order to completethe application execution. The number of blocks may be higher orlower depending on the level of abstraction adopted. Thus, the taskgraph is denoted by GT = G(T, D), an acyclic and weighted directedgraph. Each node of T is a task, or an application processing module.In general, an operation is a well defined task, as a mathematicalcalculation or a data encoding. Each arc of the set D characterizesthe data dependencies between two tasks.

5.2. Random mapping

As mentioned in Section 1, an EDA tool can perform certain pro-cesses, such as allocation, mapping and routing of applications inNoCs. In the allocation process, cores are selected in an IP repos-itory, and then associated with each application task. In turn, themapping deals with how the IPs are spatially distributed in theNoC topology. Both processes are intended to reduce the overallexecution time of application.

In this paper, the routing is emphasized. Thus, it is consideredthat an allocation step was previously performed, and the resourcespecifications are already available for use. In the mapping step,we employed a simple random mapping process. In the randommapping, each node of the application characterization graph isassociated to a node of the architecture characterization graph. Theway in which this association is made is random: a node of theapplication graph selects from a list of a node of the architecturegraph; the chosen position is removed from the list and the pro-cess repeats until all nodes of the application graph have a definedposition in the architecture graph. Fig. 4 illustrates this process. Thetasks of the graph are associated with five nodes in a 3 × 3 network.

The mapping also defines the number of nodes in NoC. The num-ber N of nodes in a mesh must be sufficient to map an applicationwith P tasks. Thus, because it’s a 2D mesh, the relationship betweenN and P is defined by (9).

N =⌈√

P⌉2

(9)

The choice of a random mapping was deliberate, in order to high-light the impact of routing. Intelligent mapping techniques [4][17] optimize system performance, positioning IPs with large data

Page 5: Routing for applications in NoC using ACO-based algorithms

2228 L. Silva Junior et al. / Applied Soft Com

Table 1Simulation parameters.

Routing algorithms REAS, RACS, XY, OETraffic pattern Uniform, Matrix Transpose

erswtc

6

lesaw

(io

6

fnIplwtp

fbirahara

Number of packets 10, 20, 30, 40, 50, 60, 70, 80, 90, 100Injection rate 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%

xchange close together, thus reducing the effort of routing algo-ithm. In this work, the interest is exactly the opposite: the routinghould be robust enough to ensure efficient communication evenhen the mapping is not optimal. In these situations, it is clear that

he search and route optimization becomes a harder task whenompared to an optimized mapping.

. Evaluation experiments and results

A cycle-accurate network simulator was implemented in Mat-ab. It supports 2D mesh networks with wormhole switching. Tovaluate the performance of the proposed methods, networks wereimulated with four different routing algorithms: REAS, RACS, XYnd Odd-Even (OE). The time unit adopted is the simulator cycle,here one cycle is the transmission time of one flit.

All algorithms were executed with Matlab Version 7.7.0.471R008b). The simulations were performed on PCs with Intel Core7 950 3 GHz, 8 Gb RAM and Microsoft Windows 7 Home Premiumperating system.

.1. Simulation with random traffic

In this experiment, we use random traffic to simulate the per-ormance of the network under different routing strategies. Theetwork was simulated with size of 5 × 5, a square of 25 nodes.

ncreasing the number of nodes will result in an increase in therocessing time of the routing algorithm. This is not really a prob-

em, since it is an offline optimization. The set of simulation testsere performed varying the network routing algorithm, the pat-

ern of traffic generation, the rate of injection and the number ofackets. These parameters are shown in Table 1.

The source–destination pairs are generated following two dif-erent distribution patterns, as shown in Fig. 5. These patterns areased on models widely used in the evaluation of communication

n multiprocessor and distributed systems [12]. The uniform is aandom pattern, because both the source and destination nodesre chosen in a randomly way. In the uniform pattern, all nodesave the same probability of being selected. The matrix transpose is

deterministic pattern. Although the selection of source nodes isandom (following the uniform distribution), the destination nodesre selected according to the position of the source nodes. In the

(2,3)

(3,1) (3,2) (3,3)

(1,1) (3,1)(2,1)

(2,2)(2,1)

(a) Uniform

(3,1) (3,2) (3,3)

(2,2) (2,3)

(1,1) (3,1)(2,1)

(2,1)

(b) Matri x Transpose

Fig. 5. Possible communication pairs in a 3 × 3 mesh.

puting 13 (2013) 2224–2231

matrix transpose pattern, for a source node in the position (x, y),the destination node is in the position (y, x).

For each simulation, we obtained the total latency and the aver-age latency per packet. Total latency is the sum of the individuallatency of all packets being transmitted on the network. The indi-vidual latency is the amount of simulation cycles that have elapsedsince the injection of the first flit of a packet until the beginningof injection of the next packet of the same message. The averagelatency is the latency value obtained divided by the total number ofpackets. The general purpose of these tests is to verify the variationof latency under different injection rates.

Results are shown in Fig. 6. The curves oflatency/packet × injection rate are, in fact, a mean of the valuesobtained for different quantity of packets. Each graph illustratesthese curves for the four routing algorithms adopted.

For both uniform and matrix transpose traffic patterns, thelatency curve of REAS is located below the curves of the other meth-ods, indicating its ability to search for routes that provide a shortertransmission time. This performance is slightly better than the oth-ers at low injection rates, becoming more evident in rates above50%. The RACS has a latency curve that differs for each traffic pat-tern. For the uniform pattern, latency is close to that obtained byXY and OE. As for the matrix transpose pattern, latency values arecloser to those found by REAS.

6.2. Simulation with applications

In the simulations, we used five sets of graphs of synthetic appli-cations. These graphs were randomly generated with the aid ofthe software Task Graph For Free (TGFF) [7]. The TGFF is a generalpurpose, user controllable pseudo-random graph generator, widelyused in embedded real-time systems research. The software worksbased on a script, where the user defines parameters such as totalnumber of tasks, levels, or tasks per level. A example of graph cre-ated by TGFF is shown in Fig. 7, with two intermediary levels, eachone with three tasks.

The task graph is composed by a set of nodes (rectangles) andarcs (arrows). The numbers in each node is a task index, where “1”is the start task and “2” is the end task, and the execution time ofthe task, in cycles. The values in the arcs are the number of bits ofthe transmitted packet. In orther to explore the behavior of applica-tions with parallel characteristics, tasks were generated followinga fork-join structure, with the start task sending packets to severaldestinations, and the end task receiving packets from several ori-gins. Between start and end tasks exist intermediate tasks, arrangedin various levels of parallelism. Tasks at the same level can runconcurrently and independently.

Therefore, 50 graphs are generated, being arranged in 5 sets of 10graphs. Each set – called Ex1, Ex2, Ex3, Ex4 and Ex5 – has a differentcharacteristic on the maximum number of tasks. Within a set, eachof the ten tasks are differentiated by the number of intermediatelevels.

In the Ex1 set, graphs are generated based only in the total num-ber of tasks. The structure of nodes and arcs are build in a randomlyway, with only restriction the format of fork-join graph. Thus, thegraph Ex1.1 (the first of set Ex1) is composed of ten tasks, whileEx1.10 (the last) is composed of a total of one hundred tasks. Inthe other four sets, the structure of each application graph is basedmainly in the number of tasks per level and the number of levels.Applications in Ex2 have two tasks per level – the set Ex3 have threetasks, and so on. In each graph from Ex2.1 to Ex2.10, the number oflevels vary from one to ten intermediary levels. Table 2 shows the

number of levels and the total number of tasks in all five sets ofgraphs.

From the information of a graph, the routing can be accom-plished by identifying which packets are generated by tasks at the

Page 6: Routing for applications in NoC using ACO-based algorithms

L. Silva Junior et al. / Applied Soft Computing 13 (2013) 2224–2231 2229

100%80%60%40%20%0%20

25

30

35

40

45

50

55

REASOEXYRACS

Injection rate

Lat

ency

/ pa

cket

(cy

cles

)

Uniform(a)

100%80%60%40%20%0%20

25

30

35

40

45

50

55

60

65

70

REASOEXYRACS

Injection rate

Lat

ency

/ pa

cket

(cy

cle)

TransposeMatrix(b)

Fig. 6. Results for the network under

Fig. 7. Example of TGFF generated graph.

Table 2Number of intermediary levels and tasks in all application graphs.

Ex1 Ex2 Ex3 Ex4 Ex5

Lvs. Tasks Lvs. Tasks Lvs. Tasks Lvs. Tasks Lvs. Tasks

1 6 10 1 4 1 5 1 6 1 72 8 22 2 6 2 8 2 10 2 123 10 30 3 8 3 11 3 14 3 174 12 42 4 10 4 14 4 18 4 225 14 50 5 12 5 17 5 22 5 276 16 62 6 14 6 20 6 26 6 327 16 70 7 16 7 23 7 30 7 378 16 82 8 18 8 26 8 34 8 429 18 90 9 20 9 29 9 38 9 47

10 22 102 10 22 10 32 10 42 10 52

two different traffic patterns.

same level, i.e., which packets may be transmitted simultaneously.Algorithm 3 was used to perform this process.

Algorithm 3. Mapping and routing of applicationRequire Task Graph;1: define size of NoC;2: perform the mapping;3: forl = 1 → # levels do4: get all arcs in level l;5: read tstart of source tasks;6: perform the routing;7: write tstart of destination tasks;8: end for9: texecution← tstart(last task) + tcomp(last task)10: return routing paths, texecution;

6.2.1. ResultsThe simulations were performed by submitting applications to

four different routing algorithms and measuring its total execu-tion time. This consists of execution of all individual tasks on acritical path plus the communication time of these tasks. The so-called packet delay is the difference between the value obtainedusing a specific routing algorithm and the optimal value of the net-work without congestion. To calculate this ideal value, we used amodified XY algorithm, called dummy XY. In this routing, the XYalgorithm is used to define the communication time using shortestpaths. But unlike the real XY (and any other routing algorithm), thepotential congestion delays are not counted.

Fig. 8 shown results of the performed simulations. The values ofpacket delay is presented for the four routing algorithms in eachof 50 applications. These values are a mean of packet delay in 10different mappings. The results show the REAS getting in generalsmaller values of latency when compared with other routing algo-rithms. However, REAS was exceeded in 12 of the 50 tests. The lowdelay values show that the REAS is able to find good solutions torouting problem, independent of mapping or complexity of graph.For routing based on XY algorithm and Odd-Even turn model, thereis a wide variation in average delays obtained for a given set ofgraphs. This large deviation in the delay values may suggest that XYand OE are very sensitive to mapping adopted, even more than thecomplexity of the application. This may explain the cases in whichthese ones performed better than ant-based algorithms: for “easy”applications, where XY and OE can possibly find the best minimum

paths, REAS and RACS will still need to perform the whole pro-cess of search and optimization route. So, due to heuristic nature,there is no guarantee that the best possible path will be found ina limited time. The worst results were obtained with the second
Page 7: Routing for applications in NoC using ACO-based algorithms

2230 L. Silva Junior et al. / Applied Soft Computing 13 (2013) 2224–2231

Ex1

.1

Ex1

.2

Ex1

.3

Ex1

.4

Ex1

.5

Ex1

.6

Ex1

.7

Ex1

.8

Ex1

.9

Ex1

.10

1

10

100

1000REASXYRACSOE

Pack

et d

elay

(cy

cles

)

(a) Ex1

Ex2

.1

Ex2

.2

Ex2

.3

Ex2

.4

Ex2

.5

Ex2

.6

Ex2

.7

Ex2

.8

Ex2

.9

Ex2

.10

0

2

4

6

8

10

12

REASXYRACSOE

Pack

et d

elay

(cy

cles

)

(b) Ex2

Ex3

.1

Ex3

.2

Ex3

.3

Ex3

.4

Ex3

.5

Ex3

.6

Ex3

.7

Ex3

.8

Ex3

.9

Ex3

.10

0

5

10

15

20

25

30REASXYRACSOE

Pack

et d

elay

(cy

cles

)

(c) Ex3

Ex4

.1

Ex4

.2

Ex4

.3

Ex4

.4

Ex4

.5

Ex4

.6

Ex4

.7

Ex4

.8

Ex4

.9

Ex4

.10

0

10

20

30

40

50REASXYRACSOE

Pack

et d

elay

(cy

cles

)

(d) Ex4

Ex5

.1

Ex5

.2

Ex5

.3

Ex5

.4

Ex5

.5

Ex5

.6

Ex5

.7

Ex5

.8

Ex5

.9

Ex5

.10

010203040506070

REASXYRACSOE

Pack

et d

elay

(cy

cles

)

(e) Ex5

pt

6

tRuoRTt

Table 3Significance levels obtained with �2-test.

Degrees offreedom

�2 Critical �2 Significance

Ex1 REAS × XY 8 155.5 26.12 99.90%REAS × OE 8 124.48 26.12 99.90%REAS ×RACS 8 68.38 26.12 99.90%

Ex2 REAS × XY 5 32.94 20.52 99.90%REAS × OE 5 13.25 12.83 99.90%REAS × RACS 6 120.12 22.46 97.50%

Ex3 REAS × XY 8 24.43 24.35 99.80%REAS × OE 8 33.47 26.12 99.90%REAS ×RACS 8 105.25 26.12 99.90%

Ex4 REAS ×XY 9 90.93 27.88 99.90%REAS × OE 9 99.36 27.88 99.90%REAS × RACS 9 37.57 27.88 99.90%

Ex5 REAS × XY 9 144.53 27.88 99.90%

through the network need only be defined one time. In this paperwe propose the use of ACO-based algorithms in the optimization of

Fig. 8. Packet delay in 5 sets of applications.

roposed routing. The values found by RACS are increasing due tohe complexity of the used graphs.

.2.2. Statistical analysisA given set of statistical results have significance if it is unlikely

hat these have occurred by chance. The presented results show theEAS being able to get better results than other routing algorithmssed for comparison. In order to determine whether the resultsbtained by REAS are significantly better than those obtained byACS, XY and OE, we performed a statistical test of significance.

he most commonly used method of comparing proportions useshe �2-test [6]. This test makes it possible to determine whether

REAS × OE 9 166.6 27.88 99.90%REAS × RACS 9 31.21 27.88 99.90%

the difference existing between two groups of data is significant orjust a chance occurrence.

Eat =∑

x∈AOxt ·∑

y∈T Oay∑(x,y)∈A×T Ox,y

(10)

For the sake of completeness, we explain briefly how the testworks. �2-test determines the differences between the observedand expected measures. The observed values are the actual experi-mental results, whereas the expected ones refer to the hypotheticaldistribution based on the overall proportions between the twocompared algorithms if these are alike. Tests were conducted toREAS × XY, REAS × OE and REAS × RACS separately for each set ofsimulations (Ex1–Ex5) using the obtained values of latency. Thecalculation of the expected value is performed with (10). The valueof �2 is calculated with (11), where A and T are the sets of simula-tions and algorithms used in each test. The value Oat is the observedlatency in the simulation t by the routing algorithm a.

�2 =∑

(a,t)∈A×T

(Oat − Eat)2

Eat(11)

The computed values for �2 for each of the comparisons are givenin Table 3. The use of the �2-test is recommended when the pro-portions are small. Therefore, the time quantities were convertedto 0.1 cycle instead of 1 cycle, thus avoiding the limitation imposedfor the usage of the test.

The critical value of �2 is 0.05 (i.e. 95% of confidence) and con-sidered the limit to assume the tested hypothesis. The degree offreedom depends on the amount of results used to compute �2.Assuming that the results are organized in a two-dimensional arrayof r rows and c columns, the degree of freedom is defined by(r − 1) × (c − 1).

As can be seen in the Table 3, null hypothesis may be discardedin all cases. Based on this statistical analysis, the REAS can be con-sidered significantly better than the algorithms XY, OE and RACS

7. Conclusion

Static routing is an efficient solution in NoC-based systemsdesigned to perform a fixed set of tasks, or where the communi-cation pattern is known. This is because the communication paths

paths in the static routing step in NoC design. The performance ofthese algorithms was evaluated using randomly generated graphs,

Page 8: Routing for applications in NoC using ACO-based algorithms

ft Com

mwtu

A

EvAfi

R

[

[

[

[

[

[

[

[

L. Silva Junior et al. / Applied So

odeled to imitate applications with parallel tasks. Best resultsere obtained with REAS algorithm. Future work may study how

o increase the performance of proposed algorithms and their eval-ation with real-world applications.

cknowledgments

We are grateful to FAPERJ (Fundac ão de Amparo à Pesquisa dostado do Rio de janeiro) and CNPq (Conselho Nacional de Desen-olvimento Científico e Tecnológico) and CAPES (Coordenac ão deperfeic oamento de Pessoal de Ensino Superior) for their continuousnancial support.

eferences

[1] L. Benini, G. De Micheli, Networks on chips: a new soc paradigm, Computer 35(1) (2002) 70–78.

[2] E. Bonabeau, M. Dorigo, G. Theraulaz, Swarm Intelligence: From Natural toArtificial Systems. Number 1, Oxford University Press, USA, 1999.

[3] G.M. Chiu, The odd-even turn model for adaptive routing, IEEE Transactions onParallel and Distributed Systems 11 (7) (2000) 729–738.

[4] M.V.C. Da Silva, N. Nedjah, L.M. Mourelle, Efficient mapping of an imageprocessing application for a network-on-chip based implementation, Inter-national Journal of High Performance Systems Architecture 2 (1) (2009)46–57.

[5] G. Di Caro, M. Dorigo, Antnet: distributed stigmergetic control for com-

munications networks, Journal of Artificial Intelligence Research 9 (1998)317–365.

[6] P. Diaconis, B. Efron, Testing for independence in a two-way table: new inter-pretations of the chi-square statistic, The Annals of Statistics 13 (3) (1985)845–874.

[

[

puting 13 (2013) 2224–2231 2231

[7] R.P. Dick, D.L. Rhodes, W. Wolf, Tgff: task graphs for free, in: Proceedings of the6th International Workshop on Hardware/Software Codesign, IEEE ComputerSociety, 1998, pp. 97–101.

[8] M. Dorigo, M. Birattari, T. Stutzle, Ant colony optimization, Computational Intel-ligence Magazine, IEEE 1 (4) (2006) 28–39.

[9] M. Dorigo, L.M. Gambardella, Ant colony system: a cooperative learningapproach to the traveling salesman problem, IEEE Transactions on EvolutionaryComputation 1 (1) (1997) 53–66.

10] M. Dorigo, V. Maniezzo, A. Colorni, Ant system: optimization by a colony ofcooperating agents, IEEE Transactions on Systems, Man, and Cybernetics, PartB: Cybernetics 26 (1) (1996) 29–41.

11] J. Duato, A new theory of deadlock-free adaptive routing in wormholenetworks, IEEE Transactions on Parallel and Distributed Systems 4 (12) (1993)1320–1331.

12] J. Duato, S. Yalamanchili, L.M. Ni, Interconnection Networks: An EngineeringApproach, Morgan Kaufmann, 2003.

13] C.J. Glass, L.M. Ni, The turn model for adaptive routing, in: ACM SIGARCH Com-puter Architecture News, vol. 20, ACM, 1992, pp. 278–287.

14] S. Goss, S. Aron, J. Deneubourg, J. Pasteels, Self-organized shortcuts in theargentine ant, Naturwissenschaften 76 (1989) 579–581.

15] A. Hansson, K. Goossens, A. Radulescu, A unified approach to constrained map-ping and routing on network-on-chip architectures, in: Proceedings of the 3rdIEEE/ACM/IFIP International Conference on Hardware/Software Codesign andSystem Synthesis, ACM, 2005, pp. 75–80.

16] F. Moraes, N. Calazans, A. Mello, L. Moller, L. Ost, Hermes: an infrastructure forlow area overhead packet-switching networks on chip, Integration, The VLSIJournal 38 (1) (2004) 69–93.

17] N. Nedjah, M.V.C. Da Silva, L. Mourelle, Customized computer-aided applicationmapping on noc infrastructure using multi-objective optimization, Journal ofSystems Architecture: The EUROMICRO Journal 57 (1) (2011) 79–94.

18] L.M. Ni, P.K. McKinley, A survey of wormhole routing techniques in directnetworks, Computer 26 (2) (1993) 62–76.

19] C.A. Zeferino, A.A. Susin, Socin: a parametric and scalable network-on-chip, in:Proceedings of 16th Symposium on Integrated Circuits and Systems Design,SBCCI 2003, IEEE, 2003, pp. 169–174.