Communication operations Efficient Parallel Algorithms COMP308.

Date posted: 20-Dec-2015


Page 1: Communication operations Efficient Parallel Algorithms COMP308.

Communication operations

Efficient Parallel Algorithms

COMP308

Page 2: Communication operations Efficient Parallel Algorithms COMP308.

Communication time

Communication time has three components:

1. Startup time (ts): the time required to handle a message at the sending processor.

2. Per-hop time (th): after a message leaves a processor, it takes a finite amount of time to reach the next processor in its path. Here l denotes the number of links that the message traverses.

3. Per-word transfer time (tw): if the channel bandwidth is r words per second, then each word takes time tw = 1/r to traverse the link. Here m denotes the message size in words.

Page 3: Communication operations Efficient Parallel Algorithms COMP308.

There are 2 main communication schemes:

Page 4: Communication operations Efficient Parallel Algorithms COMP308.

“store and forward” vs “cut-through”

In “store and forward” routing, when a message is traversing a path with multiple links, each intermediate node on the path forwards the message to the next node only after it has received and stored the entire message.

In “cut-through” routing, an intermediate node does not wait for the entire message to arrive before forwarding it. The message is split into small fixed-size units called flits (flow control digits).

– A tracer is first sent from the source to the destination node to establish a connection.

– Once a connection has been established, the flits are sent one after the other. All flits follow the same path in a dovetailed fashion.

– As soon as a flit is received at an intermediate node, the flit is passed on to the next node.
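
The difference between the two schemes can be made concrete with the usual textbook cost model. The formulas below are not stated on the slides; they are an assumption built from the three components ts, th and tw, with m the message size in words and l the number of links. The function names are hypothetical.

```python
def store_and_forward_time(ts, th, tw, m, l):
    """Every hop receives the whole m-word message before forwarding it,
    so the m*tw term is paid on each of the l links: ts + (m*tw + th) * l."""
    return ts + (m * tw + th) * l


def cut_through_time(ts, th, tw, m, l):
    """Flits are pipelined along the path, so the m*tw term is paid
    only once: ts + l*th + m*tw."""
    return ts + l * th + m * tw


# Example values (arbitrary units): a 100-word message over 4 links.
sf = store_and_forward_time(ts=10, th=1, tw=2, m=100, l=4)
ct = cut_through_time(ts=10, th=1, tw=2, m=100, l=4)
print(sf, ct)
```

Under this model the two schemes cost the same for a single hop (l = 1); cut-through only pays off on longer paths.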

Page 5: Communication operations Efficient Parallel Algorithms COMP308.

One to All Broadcast

Initially, only the source processor has the data of size m that needs to be broadcast. At the end of the procedure, there are p copies of the initial data, one residing at each processor.

Page 6: Communication operations Efficient Parallel Algorithms COMP308.

Broadcast on ring (Store and Forward)

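If the sender sends the message consecutively to each of the p-1 other processors, it takes p-1 steps.

By sending the message in both directions around the ring, and letting every processor that has received it forward it onwards, we can reduce this to p/2 steps.

E.g., an 8-processor ring requires 4 steps.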
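
The p/2 figure can be checked with a small simulation. This is only a sketch: it assumes that a processor holding the message can forward it to both ring neighbours within the same step, which the slides do not state, and the function name is hypothetical.

```python
def ring_broadcast_steps(p, source=0):
    """Count the steps of a store-and-forward broadcast on a p-node ring
    when the message spreads in both directions at once."""
    has_data = {source}
    steps = 0
    while len(has_data) < p:
        newly_reached = set()
        for node in has_data:
            # Forward to the clockwise and counter-clockwise neighbour.
            newly_reached.add((node + 1) % p)
            newly_reached.add((node - 1) % p)
        has_data |= newly_reached
        steps += 1
    return steps


print(ring_broadcast_steps(8))
```

For p = 8 the simulation needs 4 steps, matching the example above.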

Page 7: Communication operations Efficient Parallel Algorithms COMP308.

NS diagram for “broadcast on ring”

Page 8: Communication operations Efficient Parallel Algorithms COMP308.

Ring network, Cut-Through routing

With cut-through routing, messages can be sent faster to nodes that are multiple hops away in the network. We exploit this by sending the message first to the node farthest from the source.

In general, in a p-processor ring the source processor first sends the data to the processor at distance p/2. Then both processors send the message to the processors at distance p/4 in the same direction, then all four send it to distance p/8, and so on.
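
The halving scheme can be sketched as follows, assuming p is a power of two and every informed processor sends exactly one message per step. The function name and the schedule representation are hypothetical.

```python
def ring_cut_through_schedule(p, source=0):
    """Return one list of (sender, receiver) pairs per step for a
    cut-through broadcast on a p-node ring; p must be a power of two."""
    assert p > 0 and p & (p - 1) == 0, "p must be a power of two"
    informed = [source]
    schedule = []
    distance = p // 2
    while distance >= 1:
        # Every informed node sends to the node `distance` hops ahead.
        step = [(node, (node + distance) % p) for node in informed]
        schedule.append(step)
        informed += [receiver for _, receiver in step]
        distance //= 2
    return schedule


for step_no, step in enumerate(ring_cut_through_schedule(8), start=1):
    print(step_no, step)
```

The number of informed processors doubles in every step, so the schedule has log p steps instead of the p/2 steps of the store-and-forward version.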

Page 9: Communication operations Efficient Parallel Algorithms COMP308.

Broadcast on mesh (Store and Forward)

Most of the optimised communication algorithms on a mesh are simple extensions of their ring counterparts, by consecutively applying the ring algorithm on each dimension of the mesh.
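
A minimal sketch of this dimension-by-dimension idea for a broadcast on a 2-D mesh: the message first travels along the source's row, then every node of that row broadcasts along its own column. Only the set of informed nodes after each phase is computed; the ring algorithm used inside each phase is left out, and the function name is hypothetical.

```python
def mesh_broadcast_phases(rows, cols, source=(0, 0)):
    """Return the sets of informed (row, col) nodes after phase 1
    (source row) and after phase 2 (all columns)."""
    src_row, _ = source
    # Phase 1: ring-style broadcast along the row of the source.
    after_row = {(src_row, c) for c in range(cols)}
    # Phase 2: each node of that row broadcasts along its column.
    after_cols = {(r, c) for (_, c) in after_row for r in range(rows)}
    return after_row, after_cols


after_row, after_cols = mesh_broadcast_phases(4, 4, source=(2, 1))
print(len(after_row), len(after_cols))
```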

Page 10: Communication operations Efficient Parallel Algorithms COMP308.

Hypercube

The regular binary structure of the hypercube plays an important role in optimising communication.

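Here, a broadcast is performed by sending the message along one dimension at each step. This takes d = log p steps, where d is the dimension of the hypercube.

It can be proved easily that log p is the minimal number of steps for every network: if each processor can send to only one other processor per step, the number of processors holding the message can at most double in each step.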
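
A sketch of the dimension-by-dimension broadcast, assuming nodes are labelled 0 to 2^d - 1 and the partner across dimension k is the node whose label differs in bit k. The order in which the dimensions are used is an arbitrary choice here, and the function name is hypothetical.

```python
def hypercube_broadcast(d, source=0):
    """Return one list of (sender, receiver) pairs per step for a
    one-to-all broadcast on a d-dimensional hypercube."""
    informed = [source]
    schedule = []
    for dim in range(d):
        # Each informed node sends across dimension `dim`, i.e. to the
        # node whose label differs from its own in bit `dim` only.
        step = [(node, node ^ (1 << dim)) for node in informed]
        schedule.append(step)
        informed += [receiver for _, receiver in step]
    return schedule


for step_no, step in enumerate(hypercube_broadcast(3), start=1):
    print(step_no, step)
```

With d = 3 the schedule has 3 steps and reaches all 8 nodes.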

Page 11: Communication operations Efficient Parallel Algorithms COMP308.

Hypercube

Important properties of the networks:

– Small degree,

– Small diameter,

– Regular recursive structure,

– Easy way to embed trees, etc.

Hypercube – two nodes are connected if their labels differ in precisely one bit

Page 12: Communication operations Efficient Parallel Algorithms COMP308.

Hypercube – two nodes are connected if their labels differ in precisely one bit
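
The one-bit rule translates directly into bit operations. A minimal sketch, assuming node labels are stored as integers; both function names are hypothetical.

```python
def hypercube_neighbors(node, d):
    """Neighbours of `node` in a d-dimensional hypercube: flip each bit once."""
    return [node ^ (1 << bit) for bit in range(d)]


def are_connected(a, b):
    """True if the labels a and b differ in exactly one bit."""
    diff = a ^ b
    # diff has exactly one bit set iff it is a non-zero power of two.
    return diff != 0 and diff & (diff - 1) == 0


print([format(n, "03b") for n in hypercube_neighbors(0b000, 3)])
print(are_connected(0b0101, 0b0111), are_connected(0b00, 0b11))
```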

Page 13: Communication operations Efficient Parallel Algorithms COMP308.

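[Figure: hypercubes of dimension 1 to 4 with binary node labels]

d = 1: 0, 1

d = 2: 00, 01, 10, 11

d = 3: 000, 001, 010, 011, 100, 101, 110, 111

d = 4: 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111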

Page 14: Communication operations Efficient Parallel Algorithms COMP308.

[Figure: 4-dimensional hypercube drawn as two 3-dimensional cubes, one with nodes 1000 to 1111 and one with nodes 0000 to 0111]

Page 15: Communication operations Efficient Parallel Algorithms COMP308.
Page 16: Communication operations Efficient Parallel Algorithms COMP308.

Broadcast on hypercube (S&F)

Page 17: Communication operations Efficient Parallel Algorithms COMP308.

Broadcast on ring (Cut-Through)

Page 18: Communication operations Efficient Parallel Algorithms COMP308.

Broadcast on mesh (C-T)

Page 19: Communication operations Efficient Parallel Algorithms COMP308.

Broadcast on binary tree (C-T)

Page 20: Communication operations Efficient Parallel Algorithms COMP308.

Gossiping

All-to-All Communication

Page 21: Communication operations Efficient Parallel Algorithms COMP308.

Gossiping on Ring (Store and Forward)

Page 22: Communication operations Efficient Parallel Algorithms COMP308.

Gossiping on Mesh (Store and Forward)

Page 23: Communication operations Efficient Parallel Algorithms COMP308.

Gossiping on Hypercube (S&F)

Page 24: Communication operations Efficient Parallel Algorithms COMP308.

Gossiping on Ring (and Mesh), Cut-Through Routing

Each processor sends m(p-1) words of data because it has an m-word packet for every other processor.

The average distance that an m-word packet travels is

$$\frac{\sum_{i=1}^{p-1} i}{p-1} = \frac{p}{2}$$

Since there are p processors, each performing the same type of communication, the total traffic on the network is

$$m(p-1) \times \frac{p}{2} \times p = \frac{m(p-1)p^2}{2}$$

The total number of communication channels in the network to share this load is p, so the communication time is at least

$$\frac{t_w \times m(p-1)p^2/2}{p} = \frac{t_w \, m \, p \, (p-1)}{2}$$

Hence this procedure cannot be improved by using CT routing.
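
The traffic argument can be checked numerically. The sketch below follows the reasoning on this slide: average packet distance (1 + 2 + ... + (p-1)) / (p-1), total traffic m(p-1) times that distance times p, shared by p channels. The function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def average_distance(p):
    """Average of the distances 1, 2, ..., p-1 on a p-node ring."""
    return Fraction(sum(range(1, p)), p - 1)


def ring_gossip_lower_bound(p, m, tw):
    """Total traffic divided over the p channels, times tw per word."""
    total_traffic = m * (p - 1) * average_distance(p) * p
    return tw * total_traffic / p


p, m, tw = 8, 10, 1
print(average_distance(p))               # equals p/2
print(ring_gossip_lower_bound(p, m, tw))  # equals tw*m*p*(p-1)/2
```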

Page 25: Communication operations Efficient Parallel Algorithms COMP308.

Gossiping on Hypercube (CT routing)