Technology emerging from the DEEP & DEEP-ER projects
Estela Suarez
Jülich Supercomputing Centre
03.06.2016
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement n° 287530 and n° 610476
Topics
DEEP
• Cluster-Booster architecture
• Software stack
• Programming environment
• Energy efficiency
• Applications:
– Co-design
– Evaluation/demonstration
– Code modernisation
DEEP-ER
• Extend memory hierarchy
• High-performance I/O
• Scalable resiliency
• Applications:
– Co-design
– Evaluation/demonstration
– Code modernisation
CLUSTER-BOOSTER ARCHITECTURE
“Standard” heterogeneity
[Diagram: six CN nodes and six GPUs, connected by InfiniBand]
• Flat topology
• Simple management of resources
• Static assignment of accelerators to CPUs
• Accelerators cannot act autonomously
Cluster-Booster architecture
[Diagram: Cluster (CN nodes, InfiniBand) and Booster (BN nodes, EXTOLL), with BI nodes]
• Flexible assignment of resources (CPUs, accelerators)
• Direct communication between accelerators
• “Offload” of large and complex parts of applications
DEEP Architecture
[Diagram labels: Intel® Xeon®, Intel® Xeon Phi™]
DEEP System
• Installed at JSC
• 1.5 racks
• 500 TFlop/s peak performance
• 3.5 GFlop/s/W
• Water cooled
Cluster (128 Xeon)
Booster (384 Xeon Phi KNC)
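The peak performance and the efficiency figure together imply a rough power envelope. The sketch below only does that arithmetic. It assumes both numbers describe the same full-system configuration, which the slide does not state, and the function name is illustrative.

```python
def implied_power_kw(peak_tflops: float, gflops_per_watt: float) -> float:
    """Power in kW implied by a peak rate and an energy-efficiency figure."""
    peak_gflops = peak_tflops * 1000.0       # TFlop/s -> GFlop/s
    watts = peak_gflops / gflops_per_watt    # (GFlop/s) / (GFlop/s/W) = W
    return watts / 1000.0                    # W -> kW


# Figures from the slide: 500 TFlop/s peak, 3.5 GFlop/s/W.
power_kw = implied_power_kw(500.0, 3.5)
print(f"Implied power at peak: about {power_kw:.0f} kW")
```

Under that assumption the two figures correspond to roughly 143 kW.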
Booster main components
[Photos: Node Card, Interface Card]
Booster measurements
MPI Linktest: ping-pong
[Plot: both axes show Booster Nodes (KNC) from 1 to 384]
Latency: 870 us
BW: 1.3 MB/s
PCIe BW issue in 3 nodes
Booster measurements
MPI Linktest: ping-pong
[Plot: both axes show Booster Nodes (KNC) from 1 to 384]
Stddev < 10%
Network EXTOLL Tourmalet
Main EXTOLL characteristics
– Direct network: no switches required
– Integrates network interface controller
– Supports 6+1 links
– Capable of tunneling PCIe (allows remote-booting KNC from the network)
Current (A3) version of EXTOLL ASIC
– 270 million transistors
– Link bandwidth: 100 G
– MPI latency: 850 ns
– MPI bandwidth: 8.5 GB/s
– Message rate: 70 million msgs/sec
– PCIe Gen3 x16
[Photos: Tourmalet PCI Express Board; Tourmalet Chip and Wafer]
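To put the quoted MPI latency and bandwidth in perspective, the sketch below plugs them into the usual linear cost model, time = latency + size / bandwidth. The model is a simplification added here, not something the slides present. Measured curves will deviate from it, for example where the MPI library switches protocols. All function names are illustrative.

```python
LATENCY_S = 850e-9                # MPI latency quoted for the A3 ASIC
BANDWIDTH_BYTES_PER_S = 8.5e9     # MPI bandwidth quoted for the A3 ASIC


def transfer_time_s(size_bytes: float) -> float:
    """Modelled one-way time for a message of the given size."""
    return LATENCY_S + size_bytes / BANDWIDTH_BYTES_PER_S


def effective_bandwidth(size_bytes: float) -> float:
    """Modelled throughput in bytes/s, including the latency cost."""
    return size_bytes / transfer_time_s(size_bytes)


def half_bandwidth_size() -> float:
    """Message size at which the model reaches half the peak bandwidth."""
    return LATENCY_S * BANDWIDTH_BYTES_PER_S


for size in (64, 1024, 64 * 1024, 1024 * 1024):
    gb_per_s = effective_bandwidth(size) / 1e9
    print(f"{size:>8} bytes: {gb_per_s:5.2f} GB/s (modelled)")
print(f"Half of peak bandwidth at about {half_bandwidth_size():.0f} bytes")
```

In this model, messages of about 7 kB already reach half of the peak bandwidth. Below that size the latency term dominates.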
GreenICE system
Alternative Booster implementation
• Interconnect: EXTOLL ASIC “Tourmalet”
• 32 KNC-node system
• Implements a 4×4×2 topology, with the Z dimension open
2-phase immersion cooling
• NOVEC liquid from 3M
• Evaporates at about 50 °C
• Condenses again in a water cooling pipe
• Allows very high-density integration
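The 4×4×2 topology can be sketched as a small neighbour computation. The sketch assumes that "Z dimension open" means the X and Y dimensions are closed into rings while Z has no wrap-around link. The slide does not spell this out, and the names below are illustrative.

```python
from itertools import product

DIMS = (4, 4, 2)               # X, Y, Z extents from the slide
WRAPS = (True, True, False)    # assumption: X and Y are rings, Z is open


def neighbours(node):
    """Return the directly linked nodes of `node`, given as an (x, y, z) tuple."""
    result = set()
    for axis, (size, wraps) in enumerate(zip(DIMS, WRAPS)):
        for step in (-1, 1):
            coord = node[axis] + step
            if wraps:
                coord %= size              # closed dimension: wrap around
            elif not 0 <= coord < size:
                continue                   # open dimension: no link past the edge
            other = list(node)
            other[axis] = coord
            result.add(tuple(other))
    result.discard(node)
    return sorted(result)


nodes = list(product(*(range(n) for n in DIMS)))
links = {frozenset((a, b)) for a in nodes for b in neighbours(a)}
print(len(nodes), "nodes,", len(links), "links")
print("neighbours of (0, 0, 0):", neighbours((0, 0, 0)))
```

Under these assumptions the grid has the 32 nodes stated on the slide, and every node uses five network links.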
[Photo: GreenICE Booster]
MPI performance: Latency
MPI performance: Bandwidth
Factor of 3× achieved by EXTOLL TOURMALET (5Gbit/s version A2)
DEEP Architecture
[Diagram labels: Xeon, Xeon Phi]
DEEP-ER Architecture Innovation
• Simplified Interconnect
• On-Node NVM
• Self-Booting Nodes
• Network Attached Memory
[Diagram labels: Xeon, Xeon Phi]
DEEP-ER Aurora Blade prototype
Eurotech’s Aurora technology: direct water cooled, high density
Aurora Blade DEEP-ER Booster (in construction)
Aurora Blade Chassis
Rootcard:
– 18x EXTOLL
– 18x NVMe
Chassis:
– 18x KNL
– 94GB Mem
– 1x backplane
[Photo labels: 7U, 19 inch, 4U, 3U; EXTOLL Tourmalet; NVMe; Intel Xeon Phi (KNL)]
Network Attached Memory
NAM architecture
• Xilinx Virtex 7 FPGA
• Hybrid Memory Cube (HMC)
– Bandwidth HMC↔FPGA: 40+ GByte/s
– HMC Controller: open-source development
• Attached to TOURMALET NIC
libNAM (libc based) for ease of use
Use cases:
• global (shared) storage
• compute node for an XOR checkpoint/restart (C/R) application
• “active memory”, etc.
[Diagram: NAM Board (UHEI Aspin v2 Board). A Xilinx Virtex 7 FPGA contains the EXTOLL Endpoint, RMA Engine, NAM Functions and HMC Controller. 16 HMC lanes connect the FPGA to the HMC DRAM. 12 EXTOLL lanes run over HDI6 connectors and a 12-lane EXTOLL cable to the EXTOLL Tourmalet NIC.]
SOFTWARE
Programming environment
[Diagram labels: Cluster, Booster, Booster Interface, InfiniBand, EXTOLL, Cluster-Booster Protocol, MPI_Comm_spawn, ParaStation MPI]
OmpSs on top of MPI provides pragmas to ease the offload process
Cluster-Booster Protocol
[Plot: throughput [MB/s], 0 to 1000, versus message size; series: CBP, Message-based, RMA limit]
Application running on DEEP
[Diagram labels: Source code, Compiler, Application binaries, DEEP Runtime]
#pragma omp task in(…) out (…) onto (com, size*rank+1)
DEEP Offload (with OmpSs)
Performance & Scalability evaluation
[Diagram: master (Rank 0) above 16 slave groups, slave0 (Rank 0-15), slave1 (Rank 16-31), up to slave15 (Rank 240-255). Each slave group has worker ranks below it; the last workers are labelled wk63, wk127 and wk1023. Each group is marked "x16".]
Published in: “Collective Offload for Heterogeneous Clusters”, HiPC 2015
FWI (full waveform inversion) code
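The numbering in the offload diagram can be written down as a small index mapping. This is only one reading of the diagram labels (16 slave groups of 16 ranks each, and workers numbered globally up to wk1023). It does not model the communicators the OmpSs runtime actually creates, and all names are illustrative.

```python
SLAVE_GROUPS = 16          # slave0 ... slave15
RANKS_PER_GROUP = 16       # e.g. slave0 holds ranks 0-15
TOTAL_WORKERS = 1024       # last worker label in the diagram is wk1023
WORKERS_PER_GROUP = TOTAL_WORKERS // SLAVE_GROUPS


def slave_group_ranks(group: int) -> range:
    """Ranks that make up slave group `group` (assumed contiguous)."""
    start = group * RANKS_PER_GROUP
    return range(start, start + RANKS_PER_GROUP)


def worker_range(group: int) -> range:
    """Global worker indices below slave group `group` (assumed contiguous)."""
    start = group * WORKERS_PER_GROUP
    return range(start, start + WORKERS_PER_GROUP)


def group_of_worker(worker: int) -> int:
    """Slave group that a global worker index belongs to."""
    return worker // WORKERS_PER_GROUP


for group in (0, 1, 15):
    ranks, workers = slave_group_ranks(group), worker_range(group)
    print(f"slave{group}: ranks {ranks[0]}-{ranks[-1]}, "
          f"workers wk{workers[0]}-wk{workers[-1]}")
```

With this reading, the last worker indices of slave0, slave1 and slave15 come out as 63, 127 and 1023, matching the diagram labels.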
Scalable I/O
• Improve I/O scalability on all usage-levels
• Used also for checkpointing
Filesystem
• Two instances:
– Global FS on HDD server
– Cache FS on NVM at node
• API for cache domain handling
– Synchronous version
– Asynchronous version
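The difference between the synchronous and asynchronous variants can be illustrated with a toy two-tier store. This is a sketch of the general idea only. The class and method names are hypothetical and do not correspond to the project's file-system API, and temporary directories stand in for the node-local NVM cache and the global file system.

```python
import shutil
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path


class CacheDomain:
    """Hypothetical node-local cache in front of a global file system."""

    def __init__(self, cache_dir: str, global_dir: str) -> None:
        self.cache_dir = Path(cache_dir)
        self.global_dir = Path(global_dir)
        self._pool = ThreadPoolExecutor(max_workers=1)

    def write(self, name: str, data: bytes) -> None:
        """Write to the fast local cache only."""
        (self.cache_dir / name).write_bytes(data)

    def flush_sync(self, name: str) -> None:
        """Copy a cached file to global storage; returns when the copy is done."""
        shutil.copyfile(self.cache_dir / name, self.global_dir / name)

    def flush_async(self, name: str) -> Future:
        """Start the copy in the background and return a handle to wait on."""
        return self._pool.submit(self.flush_sync, name)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


with tempfile.TemporaryDirectory() as cache, tempfile.TemporaryDirectory() as glob:
    domain = CacheDomain(cache, glob)
    domain.write("ckpt_0.bin", b"step 0")
    domain.flush_sync("ckpt_0.bin")              # caller waits for the copy
    domain.write("ckpt_1.bin", b"step 1")
    pending = domain.flush_async("ckpt_1.bin")   # caller keeps computing
    pending.result()                             # wait before relying on the copy
    print(sorted(p.name for p in Path(glob).iterdir()))
    domain.close()
```

The asynchronous call lets an application overlap the transfer to global storage with further computation, at the cost of having to wait on the handle before the data is safe.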
Resiliency
• Develop a hierarchical, distributed checkpoint/restart mechanism leveraging the DEEP-ER architecture
APPLICATIONS
Application-driven approach
• DEEP+DEEP-ER applications:
– Brain simulation (EPFL)
– Space weather simulation (KULeuven)
– Climate simulation (Cyprus Institute)
– Computational fluid engineering (CERFACS)
– High temperature superconductivity (CINECA)
– Seismic imaging (CGG)
– Human exposure to electromagnetic fields (INRIA)
– Geoscience (LRZ Munich)
– Radio astronomy (Astron)
– Oil exploration (BSC)
– Lattice QCD (University of Regensburg)
• Goals:
– Co-design and evaluation of architecture and its programmability
– Analysis of the I/O and resiliency requirements of HPC codes
Cluster-Booster Advantages
• More flexible than a standard architecture
→ This enables different use models:
1. Dynamic ratio of processors/coprocessors
2. Use Booster as pool of accelerators (globally shared)
3. Discrete use of the Booster
4. Discrete use + I/O offload
5. Specialized symmetric mode
• Enables a more efficient use of system resources
– Only resources actually needed are blocked by applications
– Dynamic allocation further increases system utilization
Code optimisations
BSC: Enhancing Oil Exploration (FWI, wave propagator)
1 Xeon Phi (60 cores), 180 OpenMP threads
[Plot: Impact of different optimizations of wave propagator on Xeon Phi; series: Gflops/s, speedup]
Using SIONlib
LRZ: Rapid crustal deformation & earthquake source equation (SeisSol)
1 process per node, 16 threads per process
writing 20 checkpoint files (4GB/checkpoint)
[Plot: Accumulated bandwidth on SuperMUC; bandwidth [GB/s], 0.1 to 100.0, versus # of nodes, 16 to 1,024; series: SIONlib, POSIX, MPI I/O, HDF5]
Using NVMe
Inria: Assessment of Human exposure to EM fields
24 MPI processes, 1 thread per process
[Plot: I/O performance of MAXW-DGTD; writing time [s], 0 to 60, for P1 to P4; series: sdv-work, NVMe; model precision increases P1<P2<P3<P4]
DEEP/-ER
Emerging Technologies
• Cluster-Booster Architecture:
– Alternative approach to heterogeneity
– High flexibility enabling various use modes
• Hardware components:
– Booster (new kind of cluster of accelerators)
– GreenICE Booster (2-phase immersion cooling)
– EXTOLL network tested at scale
– Warm-water cooling
– Memory hierarchy based on NVM
– Network Attached Memory
• Software
– Cluster-Booster Protocol: low-level communication protocol between different high-speed networks
– Programming environment for future heterogeneous systems
– ParaStation Global MPI supporting EXTOLL and CBP
– OmpSs extensions for DEEP Offload
– Resiliency extensions for OmpSs (task recovery) and ParaStation
– BeeGFS extension for local caches (on NVM)
– SIONlib extensions for buddy-checkpointing, integration with SCR and use of BeeGFS functionality
– E10 scalability optimisations for MPI-I/O
– Extrae/Paraver support for DEEP Offload
– Applications modernisation and optimisation
DEEP and DEEP-ER
www.deep-project.eu
www.deep-er.eu
EU-Exascale projects
20 partners
Total budget: 28,3 M€
EU-funding: 14,5 M€
Nov 2011 – Mar 2017
Visit us @ ISC’16, Frankfurt (Germany), 20.-22.06.2016
– Booth #1340
– BoF #11
– Workshop