Distributed Routing Algorithms. In a message passing distributed system, message passing is the only...

Distributed Routing Algorithms

In a message passing distributed system, message passing is the only means of interprocessor communication.

Communication types: unicast, multicast, and broadcast.

Communication latency in a distributed system depends on the following factors: topology, routing, flow control, and switching.

Topology

Network topology can be classified as general purpose or special purpose. A general-purpose network does not have a uniform and structured formation, while a special-purpose network follows a predefined structure.

Switching

Switching techniques fall into two classes: store-and-forward, which includes packet switching, and cut-through, which includes circuit switching, virtual cut-through, and wormhole.

Store-and-forward switching: a message is divided into packets that can be sent to a destination via different paths. When a packet reaches an intermediate node, the entire packet is stored there and then forwarded to the next node.

Circuit switching: a physical circuit is constructed before the transmission. After the packet reaches the destination, the path is destroyed.

Virtual cut-through switching: the packet is stored at the intermediate node only if the required channel is busy; otherwise, it is forwarded immediately without buffering.

Wormhole differs from virtual cut-through in two aspects:

(1) Each packet is further divided into a number of flits.

(2) When the required channel is busy, instead of buffering the remaining flits by removing them from the network channels, the flow control blocks the trailing flits and they stay in flit buffers along the established route.

At the system level, the main difference between store-and-forward and cut-through is that the former is sensitive to the length of the selected path while the latter, especially in wormhole routing with pipelined flits, is almost insensitive to path length in the absence of network congestion. That is, one unicasting to any destination is considered one step.
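To make the path-length sensitivity concrete, here is a rough sketch based on a common first-order, contention-free latency model. The model, the function names, and the parameters are assumptions for illustration and do not come from the slides: store-and-forward pays the full packet transmission time on every hop, while wormhole pays one flit time per hop for the header and then streams the remaining flits in a pipeline.

```python
def store_and_forward_latency(packet_bits, bandwidth, hops):
    """Every link transmits the whole packet before the next link starts."""
    return hops * (packet_bits / bandwidth)


def wormhole_latency(packet_bits, flit_bits, bandwidth, hops):
    """The header flit advances one hop per flit time; the other flits follow it."""
    header_time = hops * (flit_bits / bandwidth)
    body_time = (packet_bits - flit_bits) / bandwidth
    return header_time + body_time


# Hypothetical numbers: 1024-bit packet, 16-bit flits, 16 bits per time unit.
for hops in (1, 8):
    print(hops,
          store_and_forward_latency(1024, 16, hops),
          wormhole_latency(1024, 16, 16, hops))
```

With these made-up numbers, going from 1 hop to 8 hops raises the store-and-forward latency from 64 to 512 time units, while the wormhole latency only grows from 64 to 71.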

The objective of using the store-and-forward model is to minimize the path length.

The objective of using the cut-through model is to reduce network congestion.

Type of communication

Unicast, multicast, broadcast.

Personalized: a source sends different messages to different destinations.

Routing

Routing algorithms can be classified as:

Special purpose vs. general purpose
Minimal vs. nonminimal
Deterministic vs. adaptive
Source routing vs. destination routing
Fault-tolerant vs. non-fault-tolerant
Redundant vs. nonredundant
Deadlock-free vs. non-deadlock-free

General vs. Special Purpose

General-purpose algorithms are suitable for all types of networks but may not be efficient for a particular network. Special-purpose algorithms are usually efficient because they take advantage of the topological properties of specific networks.

Minimal vs. Nonminimal

Minimal-path algorithms provide a least-cost path between source and destination. This scheme can lead to congestion in parts of a network. A nonminimal routing scheme may route the message along a longer path to avoid network congestion.

Deterministic vs. Adaptive

In a deterministic algorithm the routing path changes only in response to topological changes in the underlying network and does not use any information regarding the state of the network. In an adaptive algorithm the routing path changes based on the traffic in the network.

Fault-tolerant vs. non-Fault-tolerant

In fault-tolerant routing, a routing message is guaranteed to be delivered in the presence of faults. In non-fault-tolerant routing, it is assumed that no fault may occur, and hence there is no need for the routing algorithm to dynamically adjust its activities.

Redundant vs. Nonredundant

A typical routing algorithm is nonredundant, i.e., for each destination one copy of the message is forwarded. In certain cases a shared path is used to forward the routing message to several destinations. For the purpose of fault tolerance, multiple copies are sent to a destination via multiple edge-disjoint paths. As long as one of these paths remains healthy, at least one copy will successfully reach its destination. Each destination should make sure only one copy is accepted.

Deadlock-free vs. non-Deadlock-free

Deadlock-free routing ensures freedom from deadlock through carefully designed routing algorithms. In non-deadlock-free routing, no special provision is made to prevent or avoid the occurrence of a deadlock.

Routing functions

The routing function defines how a message is routed from the source node to the destination node.

Destination-dependent: this routing function depends on the current and destination nodes only.

Input-dependent: this routing function depends on the current and destination nodes and the adjacent link (or node) from which a message is received.

Source-dependent: this routing function depends on the source, current, and destination nodes.

Path-dependent: this routing function depends on the destination node and the routing path from the source node to the current node.

Dijkstra’s centralized algorithm

Let D(v) be the distance (sum of link weights along a given path) from source s to node v. Let l(v,w) be the given cost between nodes v and w.

There are two parts to the algorithm: An initialization step and a step to be repeated until the algorithm terminates.

1 Initialization. Set N={s}. For each node v not in N, set D(v)=l(s,v). We use ∞ for nodes not connected to s. Any number larger than the maximum cost or distance in the network will suffice.

2 At each subsequent step. Find a node w not in N for which D(w) is a minimum and add w to N. Then update D(v) for all nodes remaining that are not in N by computing:

D(v) = min[D(v), D(w)+l(w,v)]

Step 2 is repeated until all nodes are in N.
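The two steps above can be sketched in Python roughly as follows. This is a minimal illustration and not a tuned implementation: the dictionary-of-dictionaries graph format, the names `EDGES`, `COST`, and `dijkstra`, and the edge weights of the five-node graph are all assumptions made for the example. The links are treated as undirected.

```python
import math

# Hypothetical undirected graph: (node, node) -> link cost.
EDGES = {("P5", "P3"): 20, ("P5", "P4"): 2, ("P4", "P2"): 1,
         ("P4", "P3"): 2, ("P2", "P1"): 4, ("P3", "P1"): 5}

COST = {}
for (u, v), c in EDGES.items():
    COST.setdefault(u, {})[v] = c
    COST.setdefault(v, {})[u] = c


def dijkstra(cost, s):
    """Return D, the shortest distance from s to every node."""
    nodes = set(cost)
    # Step 1: N = {s}; D(v) = l(s, v), with infinity for nodes not adjacent to s.
    N = {s}
    D = {v: cost[s].get(v, math.inf) for v in nodes - N}
    D[s] = 0
    # Step 2: add the closest node w outside N, then update the remaining nodes.
    while N != nodes:
        w = min(nodes - N, key=lambda v: D[v])
        N.add(w)
        for v in nodes - N:
            D[v] = min(D[v], D[w] + cost[w].get(v, math.inf))
    return D


print(dijkstra(COST, "P5"))
```

For this assumed graph, the distances from P5 come out as 7, 3, 4, and 2 for P1 through P4. Selecting the minimum by scanning the set keeps the code close to the description; a priority queue would be the usual choice for larger graphs.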

Ford’s distributed algorithm

Each node v has the label (n, D(v)), where D(v) represents the current value of the shortest distance from the node to the destination and n is the next node along the currently computed shortest path.

1 Initialization. With node d being the destination node, set D(d)=0 and label all other nodes (., ∞).

2 Shortest-distance labeling of all nodes. For each node v ≠ d do the following: for each neighboring node w, use the current value D(w) to calculate D(w)+l(w,v) and perform the following update:

D(v) = min{D(v), D(w)+l(w,v)}

Step 2 is repeated until no label changes.
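A rough sketch of the labeling step, run in synchronous rounds where every node uses the values its neighbors held at the end of the previous round. The synchronous-round assumption, the graph format, the names `EDGES`, `COST`, and `ford_rounds`, and the edge weights are illustrative choices rather than anything fixed by the algorithm description.

```python
import math

# Hypothetical undirected graph: (node, node) -> link cost.
EDGES = {("P5", "P3"): 20, ("P5", "P4"): 2, ("P4", "P2"): 1,
         ("P4", "P3"): 2, ("P2", "P1"): 4, ("P3", "P1"): 5}

COST = {}
for (u, v), c in EDGES.items():
    COST.setdefault(u, {})[v] = c
    COST.setdefault(v, {})[u] = c


def ford_rounds(cost, d):
    """Yield the labels {node: (next_node, distance)} after each round."""
    # Step 1: D(d) = 0, every other node is labeled (., infinity).
    labels = {v: (None, math.inf) for v in cost}
    labels[d] = (None, 0)
    while True:
        new = dict(labels)
        for v in cost:
            if v == d:
                continue
            for w, l_wv in cost[v].items():
                # Step 2: D(v) = min{D(v), D(w) + l(w, v)} using last round's D(w).
                candidate = labels[w][1] + l_wv
                if candidate < new[v][1]:
                    new[v] = (w, candidate)
        if new == labels:
            return  # no label changed, so the labeling has converged
        labels = new
        yield labels


for number, labels in enumerate(ford_rounds(COST, "P5"), start=1):
    print(number, labels)
```

With the assumed weights and destination P5, the labels settle after three rounds, and P1 moves from (P3, 25) in round 2 to (P2, 7) in round 3.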

An example

Dijkstra’s centralized algorithm

Round    N                  D(1)  D(2)  D(3)  D(4)
Initial  {P5}               ∞     ∞     20    2
1        {P5,P4}            ∞     3     4     2
2        {P5,P4,P2}         7     3     4     2
3        {P5,P4,P2,P3}      7     3     4     2
4        {P5,P4,P2,P3,P1}   7     3     4     2

Ford’s distributed algorithm

Round    P1       P2      P3       P4
Initial  (., ∞)   (., ∞)  (., ∞)   (., ∞)
1        (., ∞)   (., ∞)  (P5,20)  (P5,2)
2        (P3,25)  (P4,3)  (P4,4)   (P5,2)
3        (P2,7)   (P4,3)  (P4,4)   (P5,2)

Unicasting in Special-Purpose Networks

The routing algorithms in the previous section are general and are suitable for all types of network topologies. However, they may not be efficient for special-purpose networks such as rings, meshes, and hypercubes.

Bidirectional rings

Deterministic unicasting on a bidirectional ring is simple: a message is forwarded along one direction (clockwise or counterclockwise) depending on the position of the destination.

In multiple-path routing two paths can be used: one along the clockwise direction and the other along the counterclockwise direction. Either two copies of the routing message are sent, one in each direction, or the message is halved and each half is forwarded in a different direction.

Meshes

(Figure: adaptive routing and XY routing in a 2-d mesh.)
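XY routing is usually described as dimension-order routing: the message first travels along the X dimension until it reaches the destination's column, and only then along the Y dimension. A minimal sketch under that description follows; the function name `xy_route` and the (x, y) coordinate convention are assumptions for illustration.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst, correcting X first and then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:          # phase 1: move along X only
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:          # phase 2: move along Y only
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))
```

Because the path never returns to the X dimension after it starts moving in Y, the route between a given pair of nodes is fixed, which is what makes this scheme deterministic.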

Hypercubes

The length of the shortest path between two nodes u and w is the Hamming distance between u and w, denoted as H(u,w). The number of shortest node-disjoint paths equals the Hamming distance between the source and destination nodes.

At each step a message is forwarded along one of the dimensions in which the current node and the destination differ. If the selection of dimensions follows a predefined order, the routing is deterministic and is called e-cube routing.

The multiple-path routing in hypercubes is based on the following property: if two nodes s and d are separated by Hamming distance k in an n-cube, there are n node-disjoint paths between nodes s and d. Out of these n paths, k have a length of k and the remaining n-k have a length of k+2.

An example

(Figure: a 3-cube with nodes 000 through 111.)

Three node-disjoint paths between 000 and 110:

Path 1: 000->100->110
Path 2: 000->010->110
Path 3: 000->001->011->111->110

Three node-disjoint paths between 000 and 100:

Path 1: 000->100
Path 2: 000->001->101->100
Path 3: 000->010->110->100
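One way to generate such paths is sketched below. Nodes are integers whose bits are the cube coordinates. The construction is a common one and is an assumption here, not necessarily the one behind the slides: the k shortest paths correct the differing dimensions in rotated orders, and each longer path first steps along a dimension in which the two nodes agree, corrects the differing dimensions, and then steps back. The e-cube order used (lowest dimension first) and all function names are also illustrative.

```python
def differing_dims(s, d, n):
    """Dimensions (bit positions) in which s and d differ."""
    return [i for i in range(n) if ((s ^ d) >> i) & 1]


def walk(s, dims):
    """Flip the given dimensions one after another, starting at s."""
    path = [s]
    for i in dims:
        path.append(path[-1] ^ (1 << i))
    return path


def ecube_route(s, d, n):
    """Deterministic routing: correct differing dimensions from lowest to highest."""
    return walk(s, differing_dims(s, d, n))


def node_disjoint_paths(s, d, n):
    """Return n paths from s to d: k of length k and n-k of length k+2."""
    diff = differing_dims(s, d, n)
    same = [i for i in range(n) if i not in diff]
    paths = [walk(s, diff[r:] + diff[:r]) for r in range(len(diff))]
    paths += [walk(s, [j] + diff + [j]) for j in same]
    return paths


for p in node_disjoint_paths(0b000, 0b110, 3):
    print("->".join(format(v, "03b") for v in p))
```

For 000 and 110 this prints the same three paths as the example (in a different order): two of length 2 and one of length 4.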

Broadcasting in Special-Purpose Networks - Rings

Broadcasting in rings: two copies of a message are sent from the source, one in each direction, and they terminate at the two furthermost nodes, respectively. The total number of steps is half of the number of nodes.

One-port model: a node can only forward a copy of the message to one of its neighbors in one step.

All-port model: a node can forward a copy of the message to all its neighbors in one step.

Contention-free broadcasting in a wormhole-routed ring: one-port

For the one-port model, the best strategy is as follows: the source s sends the message to the furthermost node in the first step. The ring is then partitioned into two equal halves, each containing one node that has a copy of the message. The above process is repeated until all the nodes have a copy. The total number of steps is log n.
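The halving strategy can be simulated with a short sketch. It assumes that n is a power of two, that nodes are numbered 0 to n-1 around the ring, and that every message travels clockwise; the function name and the return format are illustrative.

```python
def one_port_ring_broadcast(n, source=0):
    """Return the steps of the broadcast; each step is a list of (sender, receiver)."""
    assert n > 0 and (n & (n - 1)) == 0, "this sketch assumes n is a power of two"
    holders = [source]
    steps = []
    distance = n // 2
    while distance >= 1:
        # Every node that holds a copy sends one message (one-port model).
        sends = [(h, (h + distance) % n) for h in holders]
        holders += [receiver for _, receiver in sends]
        steps.append(sends)
        distance //= 2
    return steps


for number, sends in enumerate(one_port_ring_broadcast(8), start=1):
    print(number, sends)
```

For n = 8 the broadcast finishes in 3 = log2(8) steps. In each step the holders are spaced twice the sending distance apart, so the clockwise paths used in the same step do not share a link.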

Contention-free broadcasting in a wormhole-routed ring: all-port

For the all-port model, using the cut-through model, the source can send the message to two nodes that are n/3 distance away, where n is the total number of nodes. In the next step each of the three nodes sends the message to two nodes that are n/9 distance away. In general, after k steps 3^k nodes have a copy and each sends the message to two nodes that are n/3^(k+1) distance away. Basically, this approach cuts a path into three subpaths of equal length with the center node of each subpath as the only node with a copy of the routing message.
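A small simulation of the three-way split, assuming n is a power of three and nodes are numbered 0 to n-1 around the ring. The function name and the return format are illustrative.

```python
def all_port_ring_broadcast(n, source=0):
    """Return the steps of the broadcast; each step is a list of (sender, receiver)."""
    m = n
    while m % 3 == 0:
        m //= 3
    assert n > 0 and m == 1, "this sketch assumes n is a power of three"
    holders = [source]
    steps = []
    distance = n // 3
    while distance >= 1:
        sends = []
        for h in holders:
            # All-port model: one copy in each direction in the same step.
            sends.append((h, (h + distance) % n))
            sends.append((h, (h - distance) % n))
        holders += [receiver for _, receiver in sends]
        steps.append(sends)
        distance //= 3
    return steps


steps = all_port_ring_broadcast(27)
print(len(steps), [len(sends) for sends in steps])
```

For n = 27 the sending distances are 9, 3, and 1, so the broadcast completes in 3 steps with 2, 6, and 18 messages respectively.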

Broadcasting in a wormhole-routed mesh: one-port

A broadcast with message-partition in 2-d meshes:

Personalized broadcast of ¼ message in one row.

Broadcast of ¼ message in columns.

Collecting four ¼ messages in each row.

Hypercubes

(Figures: a broadcast initiated from node 000 in a 3-cube; a Hamiltonian cycle in a 3-cube.)

Path-based Approach

(Figure: a multicast in a 4x4 mesh using the low-channel and high-channel networks.)

U-mesh algorithm

Source: (0,0). Destinations: (1,0), (1,1), (1,2), (1,3), (2,0), (2,1), and (3,2).

The lexicographical order of destinations and source is: (0,0), (1,0), (1,1), (1,2), (1,3), (2,0), (2,1), (3,2).

The ordered list is split into two halves: {(0,0), (1,0), (1,1), (1,2)} and {(1,3), (2,0), (2,1), (3,2)}.
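The splitting shown above can be applied recursively, which the following simplified sketch illustrates. It assumes the source is the first node in lexicographical order, as in this example, and that in each step the holder of a sublist sends to the first node of the upper half of that sublist. It does not model channels or contention, and the function name `u_mesh_schedule` is hypothetical.

```python
def u_mesh_schedule(source, destinations):
    """Return the steps of the multicast; each step is a list of (sender, receiver)."""
    chain = sorted([source] + list(destinations))   # lexicographical order
    assert chain[0] == source, "this sketch assumes the source comes first"
    segments = [(0, len(chain))]    # half-open index ranges; holder is at index lo
    steps = []
    while any(hi - lo > 1 for lo, hi in segments):
        sends, next_segments = [], []
        for lo, hi in segments:
            if hi - lo > 1:
                mid = lo + (hi - lo) // 2
                sends.append((chain[lo], chain[mid]))
                next_segments += [(lo, mid), (mid, hi)]
            else:
                next_segments.append((lo, hi))
        steps.append(sends)
        segments = next_segments
    return steps


dests = [(1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (3, 2)]
for number, sends in enumerate(u_mesh_schedule((0, 0), dests), start=1):
    print(number, sends)
```

For the eight nodes of the example, the first step sends from (0,0) to (1,3), and all seven destinations are reached in three steps.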

Virtual Channels

(Figure: positive network and negative network.)

Unidirectional ring

(Figure: a unidirectional ring with high virtual channels Ch0 to Ch3 and low virtual channels Cl0 to Cl3.)

Unidirectional ring algorithm

If the source address is larger than the destination address, any channel can be used to start with; however, once a high (or low) channel is selected, the remaining steps should use high (or low) channels exclusively.

If the source address is smaller than the destination address, high channels are used and high virtual channels are switched to low virtual channels after crossing node P3.

Turn model

Deadlock

Four turns allowed in XY-routing

Six turns allowed in positive-first routing

Six turns allowed in negative-first routing

Adaptivity of positive-first routing

Fully adaptive deterministic
