Post on 06-Nov-2020
Real Parallel Computers
Modular data centers
Background Information
Recent trends in the marketplace of high performance computing
Strohmaier, Dongarra, Meuer, Simon
Parallel Computing 2005
Short history of parallel machines
• 1970s: vector computers
• 1990s: Massively Parallel Processors (MPPs)
– Standard microprocessors, special network and I/O
• 2000s:
– Cluster computers (using standard PCs)
– Advanced architectures (BlueGene)
– Comeback of vector computer (Japanese Earth Simulator)
– IBM Cell/BE
• 2010s:
– Multi-cores, GPUs
– Cloud data centers
Performance development and predictions
Clusters
• Cluster computing
– Standard PCs/workstations connected by fast network
– Good price/performance ratio
– Exploit existing (idle) machines or use (new) dedicated machines
• Cluster computers vs. supercomputers (MPPs)
– Processing power similar: based on microprocessors
– Communication performance was the key difference
– Modern networks have bridged this gap
• (Myrinet, Infiniband, 10G Ethernet)
Overview
• Cluster computers at our department
– DAS-1: 128-node Pentium-Pro / Myrinet cluster (gone)
– DAS-2: 72-node dual-Pentium-III / Myrinet-2000 cluster
– DAS-3: 85-node dual-core dual Opteron / Myrinet-10G
– DAS-4: 72-node cluster with accelerators (GPUs etc.)
• Part of a wide-area system:
– Distributed ASCI Supercomputer
Distributed ASCI Supercomputer (1997-2001)
DAS-2 Cluster (2002-2006)
• 72 nodes, each with 2 CPUs (144 CPUs in total)
• 1 GHz Pentium-III
• 1 GB memory per node
• 20 GB disk
• Fast Ethernet 100 Mbit/s
• Myrinet-2000 2 Gbit/s (crossbar)
• Operating system: Red Hat Linux
• Part of wide-area DAS-2 system (5 clusters with 200 nodes in total)
Myrinet switch
Ethernet switch
DAS-3 Cluster (Sept. 2006)
• 85 nodes, each with 2 dual-core CPUs (340 cores in total)
• 2.4 GHz AMD Opterons (64 bit)
• 4 GB memory per node
• 250 GB disk
• Gigabit Ethernet
• Myrinet-10G 10 Gb/s (crossbar)
• Operating system: Scientific Linux
• Part of wide-area DAS-3 system (5 clusters; 263 nodes), using SURFnet-6 optical network with 40-80 Gb/s wide-area links
DAS-3 Networks
• 85 compute nodes
– 85 * 1 Gb/s Ethernet to the Ethernet switch (Nortel 5530 + 3 * 5510)
– 85 * 10 Gb/s Myrinet to the Myri-10G switch
• Myri-10G switch with 10 Gb/s Ethernet blade
– 8 * 10 Gb/s Ethernet (fiber) to the Nortel OME 6500 with DWDM blade
– 80 Gb/s DWDM to SURFnet6
• 1 or 10 Gb/s campus uplink
• Headnode (10 TB mass storage)
– 10 Gb/s Myrinet
– 10 Gb/s Ethernet
DAS-4
• 72 nodes (2 quad-core Intel Westmere Xeon E5620, 24 GB memory, 2 TB disk)
• 2 fat nodes with 94 GB memory
• Infiniband network + 1 Gb/s Ethernet
• 16 NVIDIA GTX 480 graphics accelerators (GPUs)
• 2 Tesla C2050 GPUs
DAS-4 performance
• Infiniband network:
– One-way latency: 1.9 microseconds
– Throughput: 22 Gbit/s
• CPU performance:
– 72 nodes (576 cores): 4399.0 GFLOPS
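The latency and throughput figures above can be combined in a simple first-order cost model, time = latency + size / throughput. The sketch below assumes that linear model, which ignores protocol overhead, contention, and message-size effects. The function names are illustrative and not part of any DAS-4 software.

```python
# First-order model of DAS-4 InfiniBand message cost (an assumption,
# not a measurement): time = one-way latency + size / throughput.

LATENCY_S = 1.9e-6          # one-way latency from the slide: 1.9 microseconds
THROUGHPUT_BPS = 22e9       # throughput from the slide: 22 Gbit/s
TOTAL_GFLOPS = 4399.0       # CPU performance figure from the slide
TOTAL_CORES = 72 * 2 * 4    # 72 nodes, 2 quad-core CPUs per node


def transfer_time(num_bytes: int) -> float:
    """Estimated one-way transfer time in seconds under the linear model."""
    return LATENCY_S + (num_bytes * 8) / THROUGHPUT_BPS


def break_even_bytes() -> float:
    """Message size at which latency equals transmission time."""
    return LATENCY_S * THROUGHPUT_BPS / 8


if __name__ == "__main__":
    for size in (8, 1024, 1024 * 1024):
        print(f"{size:>8} bytes: {transfer_time(size) * 1e6:.2f} us")
    print(f"break-even message size: {break_even_bytes():.0f} bytes")
    print(f"GFLOPS per core: {TOTAL_GFLOPS / TOTAL_CORES:.2f}")
```

Under this model, messages well below the break-even size are dominated by latency, and larger ones by throughput.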
Blue Gene/L Supercomputer
• Chip: 2 processors, 2.8/5.6 GF/s, 4 MB
• Compute Card: 2 chips (1x2x1), 5.6/11.2 GF/s, 1.0 GB
• Node Card: 32 chips (4x4x2), 16 compute cards and 0-2 I/O cards, 90/180 GF/s, 16 GB
• Rack: 32 Node Cards, 2.8/5.6 TF/s, 512 GB
• System: 64 Racks (64x32x32), 180/360 TF/s, 32 TB
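The packaging hierarchy above is multiplicative: each level holds a fixed number of units of the level below. The sketch below recomputes chip counts, peak performance, and memory per level from the per-level counts on the slide. It assumes the 1.0 GB on a compute card is split evenly over its 2 chips and uses the lower of the two per-chip GF/s figures. The 4 MB chip figure is not used. The computed peaks come out slightly different from the slide's figures, which appear to be rounded.

```python
# Counts per packaging level, taken from the slide.
CHIPS_PER_COMPUTE_CARD = 2
COMPUTE_CARDS_PER_NODE_CARD = 16
NODE_CARDS_PER_RACK = 32
RACKS = 64

CHIP_GFLOPS = 2.8            # lower of the two per-chip figures on the slide
CHIP_MEMORY_GB = 1.0 / 2     # assumption: 1.0 GB per compute card, 2 chips


def chips_per_level() -> dict:
    """Number of chips contained in one unit of each packaging level."""
    compute_card = CHIPS_PER_COMPUTE_CARD
    node_card = compute_card * COMPUTE_CARDS_PER_NODE_CARD
    rack = node_card * NODE_CARDS_PER_RACK
    system = rack * RACKS
    return {
        "compute card": compute_card,
        "node card": node_card,
        "rack": rack,
        "system": system,
    }


if __name__ == "__main__":
    for level, chips in chips_per_level().items():
        print(f"{level:>12}: {chips:>6} chips, "
              f"{chips * CHIP_GFLOPS:>10.1f} GF/s, "
              f"{chips * CHIP_MEMORY_GB:>8.1f} GB")
```

The system-level chip count matches the 64x32x32 arrangement, since 64 * 32 * 32 = 65,536.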
Blue Gene/L Networks
3 Dimensional Torus
– Interconnects all compute nodes (65,536)
– Virtual cut-through hardware routing
– 1.4 Gb/s on all 12 node links (2.1 GB/s per node)
– 1 µs latency between nearest neighbors, 5 µs to the farthest
– Communications backbone for computations
– 0.7/1.4 TB/s bisection bandwidth, 68 TB/s total bandwidth
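The per-node bandwidth and the nearest/farthest distinction above follow from the torus structure. The sketch below uses the 64x32x32 dimensions given for the full system and assumes the 12 node links are 6 neighbors with one link in each direction. The function names are illustrative only.

```python
# Sketch of a 3D torus with the 64x32x32 dimensions of the full system.
DIMS = (64, 32, 32)
LINK_GBIT_S = 1.4        # per-link bandwidth from the slide
LINKS_PER_NODE = 12      # assumption: 6 neighbors, one link per direction


def per_node_bandwidth_gbyte_s() -> float:
    """Aggregate link bandwidth of one node, converted from Gbit/s to GB/s."""
    return LINKS_PER_NODE * LINK_GBIT_S / 8


def neighbors(node):
    """The six nearest neighbors of a node, with wraparound in each dimension."""
    result = []
    for axis, size in enumerate(DIMS):
        for step in (-1, 1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % size
            result.append(tuple(coords))
    return result


def hops(a, b) -> int:
    """Minimal hop count, taking the shorter way around each ring."""
    return sum(min(abs(x - y), size - abs(x - y))
               for x, y, size in zip(a, b, DIMS))


def max_hops() -> int:
    """Largest minimal hop count between any two nodes."""
    return sum(size // 2 for size in DIMS)


if __name__ == "__main__":
    print(f"nodes: {DIMS[0] * DIMS[1] * DIMS[2]}")
    print(f"per-node bandwidth: {per_node_bandwidth_gbyte_s():.1f} GB/s")
    print(f"neighbors of (0, 0, 0): {neighbors((0, 0, 0))}")
    print(f"farthest node is {max_hops()} hops away")
```

The wraparound links are what distinguish a torus from a mesh: the longest route in each dimension is half the ring rather than its full length.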
Global Collective
– One-to-all broadcast functionality
– Reduction operations functionality
– 2.8 Gb/s of bandwidth per link
– Latency of one way traversal 2.5 µs
– Interconnects all compute and I/O nodes (1024)
Low Latency Global Barrier and Interrupt
– Latency of round trip 1.3 µs
Ethernet
– Incorporated into every node ASIC
– Active in the I/O nodes (1:8-64)
– All external comm. (file I/O, control, user interaction, etc.)
Control Network