
Real Parallel Computers


Page 1: Real Parallel Computers

Page 2: Modular data centers

Page 3: Background Information

"Recent trends in the marketplace of high performance computing"

Strohmaier, Dongarra, Meuer, Simon

Parallel Computing, 2005

Page 4: Short history of parallel machines

• 1970s: vector computers

• 1990s: Massively Parallel Processors (MPPs)

– Standard microprocessors, special network and I/O

• 2000s:

– Cluster computers (using standard PCs)

– Advanced architectures (BlueGene)

– Comeback of vector computers (Japanese Earth Simulator)

– IBM Cell/BE

• 2010s:

– Multi-cores, GPUs

– Cloud data centers

Page 5: Performance development and predictions

Page 6: Clusters

• Cluster computing

– Standard PCs/workstations connected by fast network

– Good price/performance ratio

– Exploit existing (idle) machines or use (new) dedicated machines

• Cluster computers vs. supercomputers (MPPs)

– Processing power is similar: both are based on standard microprocessors

– Communication performance was the key difference

– Modern networks have bridged this gap

• (Myrinet, Infiniband, 10G Ethernet); a ping-pong microbenchmark, sketched below, is the standard way to measure this
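A ping-pong microbenchmark compares interconnects directly: rank 0 sends a message to rank 1 and waits for the echo, so half the averaged round-trip time gives the one-way latency (and a large message gives throughput). A minimal sketch in C with MPI; the message size and iteration count are arbitrary choices, not from the slides:

/* Ping-pong microbenchmark sketch: half the averaged round-trip
 * time approximates one-way latency. Run with 2 MPI ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int rank;
    const int iters = 1000, size = 8;      /* small messages: latency test */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = calloc(size, 1);
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {                   /* send, then wait for the echo */
            MPI_Send(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {            /* echo the message back */
            MPI_Recv(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)
        printf("one-way latency: %.2f us\n",
               (MPI_Wtime() - t0) * 1e6 / (2.0 * iters));
    free(buf);
    MPI_Finalize();
    return 0;
}

Rerunning the same loop with size set to several megabytes and dividing bytes transferred by elapsed time gives the throughput figure that networks like Myrinet and Infiniband are quoted on.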

Page 7: Overview

• Cluster computers at our department

– DAS-1: 128-node Pentium-Pro / Myrinet cluster (gone)

– DAS-2: 72-node dual-Pentium-III / Myrinet-2000 cluster

– DAS-3: 85-node dual-core dual Opteron / Myrinet-10G

– DAS-4: 72-node cluster with accelerators (GPUs etc.)

• Part of a wide-area system:

– Distributed ASCI Supercomputer

Page 8: Distributed ASCI Supercomputer (1997-2001)

Page 9: DAS-2 Cluster (2002-2006)

• 72 nodes, each with 2 CPUs (144 CPUs in total)

• 1 GHz Pentium-III

• 1 GB memory per node

• 20 GB disk

• Fast Ethernet 100 Mbit/s

• Myrinet-2000 2 Gbit/s (crossbar)

• Operating system: Red Hat Linux

• Part of wide-area DAS-2 system (5 clusters with 200 nodes in total)

[Photos: Myrinet switch, Ethernet switch]

Page 10: DAS-3 Cluster (Sept. 2006)

• 85 nodes, each with 2 dual-core CPUs (340 cores in total)

• 2.4 GHz AMD Opterons (64 bit)

• 4 GB memory per node

• 250 GB disk

• Gigabit Ethernet

• Myrinet-10G 10 Gb/s (crossbar)

• Operating system: Scientific Linux

• Part of wide-area DAS-3 system (5 clusters; 263 nodes), using SURFnet-6 optical network with 40-80 Gb/s wide-area links

Page 11: DAS-3 Networks

[Diagram: DAS-3 network topology]

• 85 compute nodes, connected by 85 * 1 Gb/s Ethernet links to the Ethernet switch (Nortel 5530 + 3 * 5510) and by 85 * 10 Gb/s Myrinet links to the Myri-10G switch

• Headnode (10 TB mass storage), attached with 10 Gb/s Myrinet and 10 Gb/s Ethernet

• Myri-10G switch: 10 Gb/s Ethernet blade with 8 * 10 Gb/s Ethernet (fiber) links to the Nortel OME 6500 with DWDM blade

• Nortel OME 6500: 80 Gb/s DWDM link into SURFnet6

• Ethernet switch: 1 or 10 Gb/s campus uplink

Page 12: DAS-3 Networks

[Photos: Myrinet and Nortel switch hardware]

Page 13: DAS-4

• 72 nodes (2 quad-core Intel Westmere Xeon E5620, 24 GB memory, 2 TB disk)

• 2 fat nodes with 94 GB memory

• Infiniband network + 1 Gb/s Ethernet

• 16 NVIDIA GTX 480 graphics accelerators (GPUs)

• 2 Tesla C2050 GPUs

Page 14: DAS-4 performance

• Infiniband network:

– One-way latency: 1.9 microseconds

– Throughput: 22 Gbit/s

• CPU performance:

– 72 nodes (576 cores): 4399.0 GFLOPS (see the peak-performance arithmetic below)
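For context, the measured figure can be compared against a back-of-the-envelope peak. Assuming each Westmere core sustains 4 double-precision flops per cycle with SSE (a microarchitecture assumption, not stated on the slide):

\[
P_{\text{peak}} = 576~\text{cores} \times 2.4~\text{GHz} \times 4~\tfrac{\text{flops}}{\text{cycle}} = 5529.6~\text{GFLOPS},
\qquad
\frac{4399.0}{5529.6} \approx 0.80
\]

So the 4399 GFLOPS measurement corresponds to roughly 80% of theoretical peak, a typical efficiency for a LINPACK-style run over Infiniband.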

Page 15: Blue Gene/L Supercomputer

Page 16: Blue Gene/L

Packaging hierarchy (peak performance, memory):

• Chip: 2 processors; 2.8/5.6 GF/s; 4 MB

• Compute Card: 2 chips (1x2x1); 5.6/11.2 GF/s; 1.0 GB

• Node Card: 16 compute cards (32 chips, 4x4x2), 0-2 I/O cards; 90/180 GF/s; 16 GB

• Rack: 32 node cards; 2.8/5.6 TF/s; 512 GB

• System: 64 racks (64x32x32); 180/360 TF/s; 32 TB

Page 17: Blue Gene/L Networks

3-Dimensional Torus

– Interconnects all compute nodes (65,536)

– Virtual cut-through hardware routing

– 1.4 Gb/s on all 12 node links (2.1 GB/s per node)

– 1 µs latency between nearest neighbors, 5 µs to the farthest (see the hop-count sketch below)

– Communications backbone for computations

– 0.7/1.4 TB/s bisection bandwidth, 68 TB/s total bandwidth
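The latency spread follows from hop counts: with wraparound links, the distance in each dimension is at most half that dimension's size, so the worst case on the 64x32x32 system torus is 32 + 16 + 16 = 64 hops. A small sketch of that hop-count computation in C (the coordinates in main are illustrative):

/* Hop count between two nodes of a 3-D torus with wraparound links. */
#include <stdio.h>

static int torus_dist(int a, int b, int dim) {
    int d = a > b ? a - b : b - a;            /* distance without wraparound */
    return d < dim - d ? d : dim - d;         /* wraparound may be shorter   */
}

static int hops(const int p[3], const int q[3]) {
    static const int dims[3] = {64, 32, 32};  /* Blue Gene/L system torus */
    int h = 0;
    for (int i = 0; i < 3; i++)
        h += torus_dist(p[i], q[i], dims[i]);
    return h;
}

int main(void) {
    int origin[3] = {0, 0, 0}, farthest[3] = {32, 16, 16};
    printf("farthest node: %d hops\n", hops(origin, farthest));  /* prints 64 */
    return 0;
}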

Global Collective

– One-to-all broadcast functionality

– Reduction operations functionality

– 2.8 Gb/s of bandwidth per link

– Latency of one-way traversal: 2.5 µs

– Interconnects all compute and I/O nodes (1024)

Low Latency Global Barrier and Interrupt

– Latency of round trip: 1.3 µs (see the MPI collectives sketch below)
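In application code, this hardware is reached through ordinary MPI collectives: on machines of this class the MPI library typically maps broadcasts and reductions onto the collective network and barriers onto the barrier network. A minimal sketch (the payload values are illustrative):

/* MPI collectives that can be offloaded to dedicated collective/barrier networks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value, sum;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    value = (rank == 0) ? 42 : 0;
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);      /* one-to-all broadcast */

    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM,
                  MPI_COMM_WORLD);                          /* reduction operation */

    MPI_Barrier(MPI_COMM_WORLD);                            /* global barrier */
    if (rank == 0) printf("sum of ranks: %d\n", sum);
    MPI_Finalize();
    return 0;
}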

Ethernet

– Incorporated into every node ASIC

– Active in the I/O nodes (one I/O node per 8-64 compute nodes)

– All external comm. (file I/O, control, user interaction, etc.)

Control Network