The MPC Parallel Computer:
Hardware, Low-level Protocols and Performance
University P. & M. Curie (PARIS)
LIP6 laboratory
Olivier Glück
Introduction
Very low cost, high performance parallel computer
PC cluster using an optimized interconnection network
A PCI network board (FastHSL) developed at LIP6:
HSL: high speed communication network (1 Gbit/s)
RCUBE: router (8x8 crossbar, 8 HSL ports)
PCIDDC: PCI network controller (implements a specific communication protocol)
Goal: provide efficient software layers
Hardware architecture
[Diagram: the MPC machine, two standard PCs running LINUX or FreeBSD, interconnected through their FastHSL boards (RCUBE router + PCIDDC controller)]
[Figure: the FastHSL board]
Hardware layers
HSL link (1 Gbit/s)
coaxial cable, point-to-point, full duplex
data encoded on 12 bits
low-level flow control
RCUBE
Rapid Reconfigurable Router, extensible
latency: 150 ns
wormhole routing strategy, interval routing scheme (see the sketch below)
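Interval routing keeps the routing tables tiny: each output port of the router is labelled with an interval of destination node numbers, and a packet leaves through the port whose interval contains its destination. The slides do not give RCUBE's actual tables, so the C sketch below is a minimal, hypothetical single-interval variant:

/* Minimal sketch of interval routing (hypothetical single-interval
 * variant; RCUBE's real tables and types are not given in the talk). */
#define NPORTS 8                        /* RCUBE: 8 HSL ports */

struct interval { unsigned lo, hi; };   /* [lo, hi], may wrap around */

static int contains(struct interval iv, unsigned dest)
{
    if (iv.lo <= iv.hi)
        return dest >= iv.lo && dest <= iv.hi;   /* plain interval    */
    return dest >= iv.lo || dest <= iv.hi;       /* wrapping interval */
}

/* Pick the output port whose interval holds 'dest' (-1 if none). */
static int route(const struct interval table[NPORTS], unsigned dest)
{
    for (int p = 0; p < NPORTS; p++)
        if (contains(table[p], dest))
            return p;
    return -1;
}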
PCIDDC
the network interface controller
implements the communication protocol: remote DMA
zero-copy
Low-level communication protocol
Zero-copy protocol (direct deposit protocol)
The FastHSL board accesses host memory directly
[Diagram: with direct deposit, data moves straight from the sender's process memory to the receiver's process memory, instead of being staged through kernel memory and I/O memory on each side]
PUT: the lowest-level software API
Unix-based layer: FreeBSD or Linux
Zero-copy strategy
Provides a basic kernel API using the PCIDDC remote-write
Parameters of a PUT() call: remote node, local physical address, remote physical address, size of data, message identifier, callback functions for signaling
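As a rough illustration of those parameters (the exact kernel API is not shown in the slides, so every name and type below is an assumption), a PUT() prototype could look like this in C:

#include <stddef.h>

/* Hypothetical prototype; the real LIP6 API may differ. */
typedef void (*put_callback_t)(void *arg);  /* signaling hook */

int PUT(unsigned       remote_node,    /* destination node            */
        unsigned long  local_paddr,    /* local physical address      */
        unsigned long  remote_paddr,   /* remote physical address     */
        size_t         size,           /* size of the data            */
        unsigned       msg_id,         /* message identifier          */
        put_callback_t on_emission,    /* called when data is sent    */
        put_callback_t on_reception);  /* called on remote completion */

A transfer then amounts to something like PUT(3, src_paddr, dst_paddr, len, id, sent_cb, NULL), with completion signaled through the callbacks rather than by blocking.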
PUT performance
PC Pentium II 350 MHz
Throughput: 494 Mbit/s
Half-throughput reached at a message size of 66 bytes
Latency: 4 µs (without system call)
[Plot: PUT throughput (Mbit/s, 0 to 600) as a function of message size (1 byte to 100 Kbytes)]
MPI over MPC
[Diagram: software stack, MPI over PUT over the FreeBSD or LINUX driver over the HSL network]
Implementation of MPICH over the PUT API
MPI implementation (1)
Two main problems:
Where to write data in remote physical memory?
PUT only transfers blocks that are contiguous in physical memory
Two kinds of messages:
control (short) messages
data messages
MPI implementation (2)
Short (or control) messages:
control information or limited-size user data
use buffers allocated at start-up, contiguous in physical memory
one memory copy on emission and one on reception (see the sketch below)
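A minimal sketch of that eager path, reusing the hypothetical PUT() prototype above; the slot layout and all names are assumptions for illustration, not the actual MPI-MPC code:

#include <stddef.h>
#include <string.h>

int PUT(unsigned node, unsigned long local_paddr, unsigned long remote_paddr,
        size_t size, unsigned msg_id,
        void (*on_emission)(void *), void (*on_reception)(void *));

#define SLOT_SIZE 256   /* assumed maximum short-message payload */

/* Buffers allocated at start-up, contiguous in physical memory. */
extern char          tx_slot[SLOT_SIZE];  /* sender-side slot         */
extern unsigned long tx_slot_paddr;       /* its physical address     */
extern unsigned long rx_slot_paddr;       /* receiver's slot (known)  */

/* Emission side: copy #1 goes from the user buffer into the
 * pre-allocated slot; the receiver performs copy #2 out of its own
 * slot, so the protocol costs exactly one copy on each side. */
int send_short(unsigned dest_node, const void *buf, size_t len,
               unsigned msg_id)
{
    if (len > SLOT_SIZE)
        return -1;                     /* too big: use the RDV path */
    memcpy(tx_slot, buf, len);         /* the copy on emission      */
    return PUT(dest_node, tx_slot_paddr, rx_slot_paddr,
               len, msg_id, 0, 0);     /* remote write of the slot  */
}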
MPI implementation (3)
Data messages:
transfer data larger than the maximum size of a control message, or used by specific MPI functions (e.g. MPI_Ssend)
RDV (rendez-vous) protocol
manages the zero-copy transfer
Rendez-vous protocol
[Diagram: rendez-vous handshake between sender and receiver: a ctl message announces the transfer, ack messages carry the acknowledgements, then the data messages perform the transfer]
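The handshake answers the two problems raised earlier: the ack is what tells the sender where to write in remote physical memory, and since PUT() only moves physically contiguous blocks, the data phase issues one remote write per block. The sketch below shows the sender's data phase; all structures and names are assumptions for illustration:

#include <stddef.h>

int PUT(unsigned node, unsigned long local_paddr, unsigned long remote_paddr,
        size_t size, unsigned msg_id,
        void (*on_emission)(void *), void (*on_reception)(void *));

struct phys_block {              /* one physically contiguous piece  */
    unsigned long paddr;
    size_t        len;
};

struct rdv_ack {                 /* carried by the receiver's ack    */
    unsigned          nblocks;   /* layout of the posted user buffer */
    struct phys_block dst[32];
};

/* Data phase: one zero-copy remote write per contiguous block.
 * Simplified: source and destination blocks are assumed to match
 * one-to-one; real code must split transfers at block boundaries. */
void rdv_send_data(unsigned dest_node, unsigned msg_id,
                   const struct phys_block *src, unsigned nsrc,
                   const struct rdv_ack *ack)
{
    for (unsigned i = 0; i < nsrc && i < ack->nblocks; i++)
        PUT(dest_node, src[i].paddr, ack->dst[i].paddr,
            src[i].len, msg_id, 0, 0);
}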
MPI performance (1)
Latency: 26 µs
Throughput: 490 Mbit/s
[Plot: MPI-MPC throughput (Mbit/s, 0 to 600) as a function of message size (bytes); legend: MPI-MPC / P350 / FreeBSD]
MPI performance (2)
[Plot: throughput (log2 scale) as a function of message size (1 byte to 256 Kbytes), comparing MPI-T3E / Proc 300 with MPI-MPC / P350 / FreeBSD]
Cray: latency 57 µs, throughput 1200 Mbit/s
MPC: latency 26 µs, throughput 490 Mbit/s
MPI performance (3)
[Plot: throughput (Mbit/s, 0 to 450) as a function of message size (1 byte to 64 Kbytes), comparing MPI-BIP / P200 / Linux with MPI-MPC / P166 / Linux]
Conclusion
MPC: a very low cost PC cluster
Performance: similar to Myrinet clusters
Very good extensibility (no centralized router)
Perspectives:
a new router
another network controller
improvements to MPI over MPC