MPI RUNTIMES AT JSC, NOW AND IN THE FUTURE
08.07.2018 | MUG'18, COLUMBUS (OH) | DAMIAN ALVAREZ
Which, why and how do they compare in our systems?
MPI RUNTIMES AT JSC
Outline
• FZJ's mission
• JSC's role
• JSC's vision for Exascale-era computing
• JSC's systems
• MPI runtimes at JSC
• MPI performance at JSC systems
• MVAPICH at JSC

07 September 2018
FZJ's mission
[Figure: FZJ organisational chart; institutes include JSC, INM, IBG, ICS, JCNS, IEK, IKP and ZEA]
JSC's role
• Supercomputer operation for:
  • Centre: FZJ
  • Region: RWTH Aachen University
  • Germany: Gauss Centre for Supercomputing, John von Neumann Institute for Computing
  • Europe: PRACE, EU projects
• Application support
  • Unique support & research environment at JSC
  • Peer review support and coordination
• R&D work
  • Methods and algorithms, computational science, performance analysis and tools
  • Scientific Big Data Analytics with HPC
  • Computer architectures, co-design: Exascale Labs together with IBM, Intel, NVIDIA
• Education and training
[Figure: JSC's role: Facilities, Communities and Research Groups linked through Simulation Labs, Cross-Sectional Teams, Data Life Cycle Labs, Exascale co-Design and PADC]
[Figure: TOP500 performance development. Source: Jack Dongarra, 30 Years of Supercomputing: History of TOP500, February 2018]
JSC will host the first Exascale system in the EU around 2022
JSC's vision
• Extreme Scale Computing
• Big Data Analytics
• Deep Learning
• Interactivity
[Figure: Modular Supercomputing Architecture: Module 0: Storage, Module 1: Cluster, Module 2: GPU-Acc, Module 3: Many-core Booster, Module 4: Memory Booster (NAM, NIC+MEM), Module 5: Data Analytics, Module 6: Graphics Booster]
[Figure: Evolution of JSC's systems]
• General Purpose Cluster line: IBM Power 4+ JUMP (2004), 9 TFlop/s → IBM Power 6 JUMP, 9 TFlop/s → HPC-FF, 100 TFlop/s / JUROPA, 200 TFlop/s → JURECA Cluster (2015), 2.2 PFlop/s → JUWELS Cluster Module (2018), 12 PFlop/s
• Highly scalable line: IBM Blue Gene/L JUBL, 45 TFlop/s → IBM Blue Gene/P JUGENE, 1 PFlop/s → IBM Blue Gene/Q JUQUEEN (2012), 5.9 PFlop/s → JURECA Booster (2017), 5 PFlop/s → Modular Supercomputer JUWELS Scalable Module (2019/20), 50+ PFlop/s
• Plus a Modular Pilot System bridging both lines
JSC's systems
JURECA
▪ Dual-socket Intel Haswell (E5-2680 v3): 12 cores/socket, 2.5 GHz, ≥ 128 GB main memory
▪ 1,884 compute nodes (45,216 cores)
  ▪ 75 nodes: 2× K80 NVIDIA GPUs
  ▪ 12 nodes: 2× K40 NVIDIA GPUs, 512 GB main memory
▪ Peak performance: 2.2 PFlop/s (1.7 without GPUs)
▪ Mellanox InfiniBand EDR
▪ Connected to the GPFS file system on JUST: ~15 PByte online disk and 100 PByte offline tape capacity
JURECA Booster
▪ Intel Knights Landing (7250-F): 68 cores, 1.4 GHz, 16 + 64 GB main memory
▪ 1,640 compute nodes (111,520 cores)
▪ Peak performance: ~5 PFlop/s
▪ Intel Omni-Path
▪ Connected to GPFS
▪ June 2018 (JURECA + Booster): #14 in Europe, #38 worldwide, #66 in Green500
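The ~5 PFlop/s figure follows from the node specs above; a quick sanity check, assuming 32 double-precision FLOP per cycle per KNL core (two AVX-512 vector units with fused multiply-add):

```python
# Sanity check of the JURECA Booster peak-performance figure.
# Assumption: each KNL core retires 32 double-precision FLOP per cycle
# (two AVX-512 units, 8 DP lanes each, fused multiply-add).
FLOP_PER_CYCLE = 32

cores_per_node = 68
clock_hz = 1.4e9
nodes = 1640

node_peak = cores_per_node * clock_hz * FLOP_PER_CYCLE   # FLOP/s per node
system_peak = node_peak * nodes

print(f"Per-node peak: {node_peak / 1e12:.2f} TFlop/s")   # ~3.05 TFlop/s
print(f"System peak:   {system_peak / 1e15:.2f} PFlop/s")  # ~5 PFlop/s
```

The core count also checks out: 68 cores × 1,640 nodes gives exactly the 111,520 cores quoted above.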
JUWELS
▪ Dual-socket Intel Skylake (Xeon Platinum 8168): 24 cores/socket, 2.7 GHz, ≥ 96 GB main memory
▪ 2,559 compute nodes (122,448 cores)
  ▪ 48 nodes: 4× V100 NVIDIA GPUs, 192 GB main memory
  ▪ 4 nodes: 1× P100 NVIDIA GPU, 768 GB main memory
▪ Peak performance: 12 PFlop/s (10.4 without GPUs)
▪ Mellanox InfiniBand EDR
▪ Connected to the GPFS file system on JUST
▪ June 2018: #7 in Europe, #23 worldwide, #29 in Green500
MPI runtimes
• ParaStation MPI
  • Preferred MPI runtime
  • Developed in a consortium including JSC, ParTec, KIT and the University of Wuppertal
  • Based on MPICH
  • Intel + GCC compilers
• ParaStation MPI Gateway Protocol
  • For the JURECA Cluster-Booster system, it bridges between Mellanox EDR and Intel Omni-Path
  • In general, it can connect any two low-level networks supported by pscom
  • Implemented using the psgw plugin to pscom, working together with instances of psgwd, the ParaStation MPI Gateway daemon
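The bridging idea can be illustrated with a toy model: a gateway daemon holds one endpoint on each fabric and relays messages between them. The class and method names below are invented for illustration and do not reflect the real pscom/psgw API:

```python
from queue import Queue

class Fabric:
    """Toy stand-in for a low-level network (e.g. InfiniBand or Omni-Path)."""
    def __init__(self, name):
        self.name = name
        self.inbox = Queue()

    def send(self, payload):
        self.inbox.put(payload)

    def recv(self):
        return self.inbox.get()

class GatewayDaemon:
    """Illustrative analogue of a gateway daemon: relays between two fabrics."""
    def __init__(self, fabric_a, fabric_b):
        self.a, self.b = fabric_a, fabric_b

    def relay_a_to_b(self):
        # Pull a message that arrived on fabric A and re-inject it on fabric B.
        self.b.send(self.a.recv())

edr = Fabric("Mellanox EDR")
opa = Fabric("Intel Omni-Path")
gw = GatewayDaemon(edr, opa)

edr.send(b"halo exchange data")   # a Cluster rank sends toward the Booster
gw.relay_a_to_b()                 # the gateway bridges the two networks
print(opa.recv())                 # a Booster rank receives it
```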
• Intel MPI
  • Serves as a backup for ParaStation MPI
  • Software stack mirrored between these two MPI runtimes
  • Intel compiler
• MVAPICH2-GDR
  • Primarily for GPU nodes (but can also run on regular nodes)
  • PGI + GCC (and Intel in the past)
MPI performance
• Microbenchmarks: Intel MPI Benchmarks
  • PingPong
  • Broadcast, gather, scatter and allgather
• Runtimes: ParaStation MPI, Intel MPI 2018.2, Intel MPI 2019 beta, MVAPICH2 and OpenMPI
• Systems: JURECA, Booster and JUWELS
• 1 MPI process per node; average times; no cache disabling
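For context, PingPong latency and bandwidth are conventionally derived from a timed send/receive loop: latency is half the average round-trip time, bandwidth is message size over the one-way time. A minimal sketch of that arithmetic (synthetic numbers, not an actual IMB run):

```python
def pingpong_metrics(msg_bytes, roundtrip_seconds, repetitions):
    """Derive the usual PingPong numbers from a timed send/recv loop.

    latency   = half the average round-trip time (one-way)
    bandwidth = message size divided by the one-way time
    """
    t_oneway = roundtrip_seconds / repetitions / 2.0
    latency_us = t_oneway * 1e6
    bandwidth_mb_s = (msg_bytes / t_oneway) / 1e6 if t_oneway > 0 else 0.0
    return latency_us, bandwidth_mb_s

# Example with made-up numbers: 1000 round trips of a 4 MB message in 1.4 s
lat, bw = pingpong_metrics(4 * 1024 * 1024, 1.4, 1000)
print(f"{lat:.1f} us one-way, {bw:.0f} MB/s")
```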
Transports used by each runtime:

| Runtime | JURECA | Booster | JUWELS |
| --- | --- | --- | --- |
| ParaStationMPI | Verbs | PSM2 | UCX |
| Intel 2018 | Default (DAPL+Verbs) | Default (PSM2) | Default (DAPL+Verbs) |
| Intel 2019 | Default (libfabric+Verbs) | Default (libfabric+PSM2) | Default (libfabric+Verbs) |
| MVAPICH2-GDR | Verbs | - | - |
| MVAPICH2 | Verbs | PSM2 | Verbs |
| OpenMPI | Default (UCX, Verbs and libfabric+Verbs enabled) | - | - |
MPI performance (PingPong)
[Plots: JURECA, Booster, JUWELS]
MPI performance (PingPong, continued)
[Plots: JURECA, Booster, JUWELS]
MPI performance (Broadcast, msg size 1 byte)
[Plots: JURECA, Booster, JUWELS]
MPI performance (Broadcast, msg size 4 MB)
[Plots: JURECA, Booster, JUWELS]
MPI performance (Scatter, msg size 1 byte)
[Plots: JURECA, Booster, JUWELS]
MPI performance (Scatter, msg size 2 MB)
[Plots: JURECA, Booster, JUWELS]
MPI performance (Gather, msg size 1 byte)
[Plots: JURECA, Booster, JUWELS]
MPI performance (Gather, msg size 2 MB)
[Plots: JURECA, Booster, JUWELS]
MPI performance (PEPC)
• PEPC: an N-body code
  • Asynchronous communication based on a communicating thread spawned using pthreads
  • Two algorithms tested: single-particle processing with pthreads, and tile processing using OpenMP tasks
• Experiments on JURECA using 16 nodes, 4 MPI ranks per node and 12 compute threads per MPI rank (SMT enabled)
• Violin plots show the average time per timestep and its distribution
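The pattern described above, a dedicated communication thread serving asynchronous requests while the workers keep computing, can be sketched generically (Python threading stands in for pthreads here; this is not PEPC's actual code):

```python
import threading
from queue import Queue

def communicator(outbox, log):
    """Dedicated thread that drains a send queue, like PEPC's comm thread."""
    while True:
        msg = outbox.get()
        if msg is None:          # sentinel: shut the thread down
            break
        log.append(msg)          # stand-in for an actual MPI send

outbox, log = Queue(), []
comm = threading.Thread(target=communicator, args=(outbox, log))
comm.start()

# Worker side: computation enqueues requests and carries on (asynchronous).
for particle in range(3):
    outbox.put(f"request data for particle {particle}")

outbox.put(None)                 # tell the communicator to finish
comm.join()
print(log)
```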
• PEPC performance with single-particle processing
[Violin plots; plot by Dirk Brömmel]
• PEPC performance with tile processing
[Violin plots; plot by Dirk Brömmel]
MVAPICH at JSC
• Generally it has been a positive experience
• But (as the person in charge of installing all software on our systems), the fact that MVAPICH2-GDR is a binary-only package poses some annoyances:
  • Need to ask the MVAPICH2 team for binaries for particular compiler versions
  • No icc/icpc/ifort version
  • Compiler wrappers need patching (xFLAGS used during compilation leak into them)
• The Deep Learning community is pushing for OpenMPI as the CUDA-aware MPI of choice on our systems
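To illustrate the wrapper-patching chore: the fix amounts to stripping build-host flags that leaked into the installed compiler wrappers. A toy sketch, with invented wrapper content and flag names (not the actual MVAPICH2-GDR wrappers):

```python
# Invented example: a compiler wrapper line with leaked build-host flags.
wrapper_line = 'CFLAGS="-O3 -xHost -I/opt/build/include -fPIC"'

# Flags that only made sense on the build host and must be stripped.
leaked_flags = ["-xHost", "-I/opt/build/include"]

cleaned = wrapper_line
for flag in leaked_flags:
    # Remove the flag (and its trailing space, if any) from the line.
    cleaned = cleaned.replace(flag + " ", "").replace(flag, "")

print(cleaned)  # CFLAGS="-O3 -fPIC"
```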
THANK YOU!