Page 1:

The MPC Parallel Computer

Hardware, Low-level Protocols and Performances

University P. & M. Curie (PARIS)

LIP6 laboratory

Olivier Glück

Page 2:

Introduction

Very low cost, high performance parallel computer

PC cluster using an optimized interconnection network

A PCI network board (FastHSL) developed at LIP6:
  High speed communication network (HSL, 1 Gbit/s)
  RCUBE: router (8x8 crossbar, 8 HSL ports)
  PCIDDC: PCI network controller (implements a specific communication protocol)

Goal: supply efficient software layers

Page 3:

Hardware architecture

[Diagram: two standard PCs running Linux or FreeBSD, each fitted with a FastHSL board (RCUBE router and PCIDDC controller), connected by HSL links]

Page 4:

The MPC machine

Page 5:

The FastHSL board

Page 6:

Hardware layers

HSL link (1 Gbit/s):
  coaxial cable, point-to-point, full duplex
  data encoded on 12 bits
  low-level flow control

RCUBE:
  Rapid Reconfigurable Router, extensible
  latency: 150 ns
  wormhole strategy, interval routing schemes

PCIDDC:
  the network interface controller
  implements the communication protocol: remote DMA, zero copy

Page 7:

Low-level communication protocol

Zero-copy ("direct deposit") protocol

The FastHSL board accesses host memory directly.

[Diagram: data paths between sender and receiver. A traditional protocol copies through process memory, kernel memory and I/O memory on each side; the direct deposit protocol moves data straight from the sender's process memory to the receiver's process memory]

Page 8:

PUT : the lowest level software API

Unix-based kernel layer: FreeBSD or Linux

Zero-copy strategy

Provides a basic kernel API using the PCIDDC remote write

Parameters of a PUT() call:
  remote node
  local physical address
  remote physical address
  size of data
  message identifier
  callback functions for signaling
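Taken together, these parameters suggest a call along the following lines. This is a hedged sketch reconstructed from the list above; the names and types are illustrative, not the actual MPC kernel interface.

/* Illustrative sketch of a PUT() signature, derived from the parameter
 * list above. Not the real MPC API: names and types are assumptions. */
#include <stdint.h>
#include <stddef.h>

typedef void (*put_callback_t)(unsigned msg_id, void *arg);

int put(unsigned       remote_node,   /* destination node               */
        uint64_t       local_paddr,   /* physical address of the data   */
        uint64_t       remote_paddr,  /* where to deposit it remotely   */
        size_t         size,          /* message size in bytes          */
        unsigned       msg_id,        /* message identifier             */
        put_callback_t sent_cb,       /* signals local completion       */
        put_callback_t received_cb,   /* signals remote delivery        */
        void          *cb_arg);       /* passed back to the callbacks   */

Note that both addresses are physical: the caller is responsible for the virtual-to-physical mapping, which is exactly what the MPI port has to deal with on the following slides.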

Page 9:

PUT performances

PC Pentium II 350 MHz

Throughput: 494 Mbit/s
Half-throughput point: 66 bytes
Latency: 4 µs (without system call)

[Plot: PUT throughput (Mbit/s, 0 to 600) versus message size (bytes, 1 to 100000, log scale)]

Page 10:

MPI over MPC

Implementation of MPICH over the PUT API

[Layer stack: MPI / PUT / FreeBSD or Linux driver / HSL network]
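The figures on the following slides are the kind usually measured with a simple ping-pong micro-benchmark written against the standard MPI interface. The sketch below is generic MPI code, not MPI-MPC internals; run with, e.g., mpirun -np 2.

/* Minimal MPI ping-pong between ranks 0 and 1. Generic MPI code,
 * usable over any MPICH-based implementation such as MPI-MPC. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank;
    char buf[1024];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 0, sizeof buf);

    if (rank == 0) {
        double t0 = MPI_Wtime();
        MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("round trip: %.1f us\n", (MPI_Wtime() - t0) * 1e6);
    } else if (rank == 1) {
        MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}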

Page 11:

MPI implementation (1)

2 main problems:
  Where to write data in remote physical memory?
  PUT only transfers blocks that are contiguous in physical memory (see the sketch after this list)

2 kinds of messages:
  control (or short) messages
  data messages
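To make the contiguity constraint concrete: a buffer that is contiguous in virtual memory generally is not in physical memory, so a transfer has to be split at page boundaries, one PUT per physically contiguous run. A minimal sketch; virt_to_phys() and put() are stubs standing in for the real kernel services, which the slides do not detail.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096UL

/* Stub for illustration only: real code would obtain the mapping
 * from the kernel. */
static uint64_t virt_to_phys(const void *vaddr)
{
    return (uint64_t)(uintptr_t)vaddr;   /* identity mapping, for the demo */
}

/* Stub standing in for the real PUT service. */
static void put(unsigned node, uint64_t lpaddr, uint64_t rpaddr, size_t len)
{
    printf("PUT node=%u local=0x%llx remote=0x%llx len=%zu\n",
           node, (unsigned long long)lpaddr, (unsigned long long)rpaddr, len);
}

/* Send a virtually contiguous buffer by issuing one PUT per physically
 * contiguous run; conservatively, one per page. */
static void put_fragmented(unsigned node, const char *buf, size_t len,
                           uint64_t remote_paddr)
{
    size_t off = 0;
    while (off < len) {
        /* Bytes left in the current page. */
        size_t in_page = PAGE_SIZE - ((uintptr_t)(buf + off) & (PAGE_SIZE - 1));
        size_t chunk = (len - off < in_page) ? len - off : in_page;
        put(node, virt_to_phys(buf + off), remote_paddr + off, chunk);
        off += chunk;
    }
}

int main(void)
{
    static char msg[10000];
    memset(msg, 'x', sizeof msg);
    put_fragmented(1, msg, sizeof msg, 0x100000);
    return 0;
}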

Page 12:

MPI implementation (2)

Short (or control) messages:
  carry control information or limited-size user data
  use buffers allocated at start-up, contiguous in physical memory
  one memory copy on emission and one on reception (sketched below)
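A minimal sketch of this path, assuming preallocated, physically contiguous slots whose addresses both sides already know; all names are invented for illustration.

#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SLOT_SIZE 256   /* assumed short-message payload limit */

/* Buffer allocated once at start-up, contiguous in physical memory,
 * so its physical address can be exchanged in advance. */
static char send_slot[SLOT_SIZE];

/* Stub standing in for a PUT of the slot into the peer's matching slot. */
static void put_slot(unsigned node, const char *slot, size_t len)
{
    printf("PUT %zu bytes of slot %p to node %u\n",
           len, (const void *)slot, node);
}

/* Emission: exactly one memory copy, user buffer -> preallocated slot. */
static void send_short(unsigned node, const void *data, size_t len)
{
    assert(len <= SLOT_SIZE);
    memcpy(send_slot, data, len);       /* the one copy on emission */
    put_slot(node, send_slot, len);
}

/* Reception mirrors it: one copy, slot -> user buffer. */
static void recv_short(void *data, const char *recv_slot, size_t len)
{
    memcpy(data, recv_slot, len);       /* the one copy on reception */
}

int main(void)
{
    char in[16], out[16] = "hello";
    send_short(1, out, sizeof out);
    recv_short(in, send_slot, sizeof in);   /* loopback, for the demo */
    return 0;
}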

Page 13:

MPI implementation (3)

Data messages:
  transfer user data larger than the maximum size of a control message, or serve specific MPI functions (e.g. MPI_Ssend)
  use a rendez-vous (RDV) protocol
  allow zero-copy transfer

Rendez-vous protocol

[Diagram: message exchange between sender and receiver — ctl, ack and data messages; the sender announces the transfer, the receiver acknowledges, then the data moves]
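One plausible reading of that exchange in code. A hedged sketch: the message layouts and helper functions are invented, and only the ctl/ack/data sequencing comes from the slide.

#include <stdint.h>
#include <stddef.h>

/* Illustrative message types for the handshake (invented names). */
struct rdv_ctl { unsigned msg_id; size_t len; };           /* announce */
struct rdv_ack { unsigned msg_id; uint64_t dst_paddr; };   /* go-ahead */

/* Hypothetical transport helpers: short messages carry ctl/ack,
 * PUT carries the payload. */
void send_ctl(unsigned node, const struct rdv_ctl *c);
void send_ack(unsigned node, const struct rdv_ack *a);
struct rdv_ack wait_for_ack(unsigned msg_id);
struct rdv_ctl wait_for_ctl(void);
uint64_t posted_recv_paddr(unsigned msg_id);  /* physical address of the
                                                 matching user buffer */
void put(unsigned node, uint64_t lpaddr, uint64_t rpaddr, size_t len);

/* Sender: announce, wait for the destination address, then deposit the
 * payload straight into the receiver's user buffer (zero copy). */
void rdv_send(unsigned node, uint64_t src_paddr, size_t len, unsigned id)
{
    struct rdv_ctl c = { id, len };
    send_ctl(node, &c);                       /* ctl  */
    struct rdv_ack a = wait_for_ack(id);      /* ack  */
    put(node, src_paddr, a.dst_paddr, len);   /* data */
}

/* Receiver: once the matching receive is posted, reply with the physical
 * address of the user buffer; the data then lands in place. */
void rdv_serve(unsigned sender_node)
{
    struct rdv_ctl c = wait_for_ctl();
    struct rdv_ack a = { c.msg_id, posted_recv_paddr(c.msg_id) };
    send_ack(sender_node, &a);
}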

Page 14:

MPI performances (1)

Latency: 26 µs
Throughput: 490 Mbit/s

[Plot: MPI-MPC throughput (Mbit/s, 0 to 600) versus message size (bytes); curve: MPI-MPC / P350 / FreeBSD]

Page 15:

MPI performances (2)

[Plot: throughput (log2 scale) versus message size (1 to 262144 bytes) for MPI-T3E / Proc 300 and MPI-MPC / P350 / FreeBSD]

Cray T3E: latency 57 µs, throughput 1200 Mbit/s
MPC: latency 26 µs, throughput 490 Mbit/s

Page 16:

MPI performances (3)

[Plot: throughput (Mbit/s, 0 to 450) versus message size (1 to 65536 bytes) for MPI-BIP / P200 / Linux and MPI-MPC / P166 / Linux]

Page 17:

Conclusion

MPC: a very low cost PC cluster

Performance similar to Myrinet clusters

Very good extensibility (no centralized router)

Perspectives:
  a new router
  another network controller
  improvements to MPI over MPC