CEA and RIKEN AICS Collaboration
Yutaka Ishikawa, RIKEN AICS
First French‐Japanese‐German Workshop on Programming and Computing for Exascale and beyond, 5th April 2017, Tokyo
16:25 ‐ 16:55
Outline of Talk
• An Overview of FLAGSHIP 2020 and development of the post-K system
• CEA Collaboration
• Concluding Remarks
20017/04/05 2
FLAGSHIP2020 Project
[Figure: system overview showing login servers, maintenance servers, portal servers, the I/O network, and the hierarchical storage system]
Missions
• Building the Japanese national flagship supercomputer, post K, and
• Developing a wide range of HPC applications, running on post K, in order to solve social and scientific issues in Japan
Hardware and System Software
• Post K Computer
• RIKEN AICS is in charge of development
• Fujitsu is the vendor partner
Applications
• 9 high priority issues from a social and national viewpoint
• Promising creation of world-leading achievements
• Promising strategic use of the post K computer
9 Social and scientific priority issues
Category: priority issues
Life science
  ① Innovative drug discovery infrastructure through functional control of biomolecular systems
  ② Integrated computational life science to support personalized and preventive medicine
Disaster prevention and global climate problem
  ③ Development of integrated simulation systems for hazard and disaster induced by earthquake and tsunami
  ④ Advancement of meteorological and global environmental predictions utilizing observational “Big Data”
Energy problem
  ⑤ Development of new fundamental technologies for high-efficiency energy creation, conversion/storage and use
  ⑥ Accelerated development of innovative clean energy systems
Industrial applications
  ⑦ Creation of new functional devices and high-performance materials to support next-generation industries
  ⑧ Development of innovative design and production processes that lead the way for the manufacturing industry in the near future
Basic science
  ⑨ Elucidation of the fundamental laws and evolution of the universe
Selected from the following points of view:
• High priority issues from a social and national viewpoint
• Promising creation of world-leading achievements
• Promising strategic use of the post K computer
An Overview of Co-design in the Post K development
Target Applications
Program: brief description
① GENESIS: MD for proteins
② Genomon: Genome processing (genome alignment)
③ GAMERA: Earthquake simulator (FEM on unstructured and structured grids)
④ NICAM+LETKF: Weather prediction system using big data (structured grid stencil and ensemble Kalman filter)
⑤ NTChem: Molecular electronic structure calculation
⑥ FFB: Large eddy simulation (unstructured grid)
⑦ RSDFT: An ab-initio program (density functional theory)
⑧ Adventure: Computational mechanics system for large-scale analysis and design (unstructured grid)
⑨ CCS-QCD: Lattice QCD simulation (structured grid, Monte Carlo)
Node and Storage Architecture
• #SIMD, SIMD length, #core, #NUMA node
• Cache (size and bandwidth)
• Network (topologies, latency and bandwidth)
• Memory technologies
• Specialized hardware
• Node interconnect, I/O network
System Software
• Operating system for many-core architecture
• Communication libraries (low-level layer, MPI, PGAS)
• File I/O (asynchronous I/O, buffering/caching)
Programming Environment
• Programming model and languages
• Math libraries, domain-specific libraries
9 social and scientific priority issues and their R&D organizations have been selected from the following points of view:
• High priority issues from a social and national viewpoint
• Promising creation of world-leading achievements
• Promising strategic use of the post K computer
An Overview of post-K
Hardware
• Manycore architecture based on ARM+SVE+Fujitsu's extensions
• 6D mesh/torus interconnect
• 3-level hierarchical storage system
  • Silicon disk
  • Magnetic disk
  • Storage for archive
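To make the "6D mesh/torus" wording concrete, the following is a minimal sketch of how a node's direct neighbours can be enumerated when some axes wrap around (torus) and others do not (mesh). The function name, the axis sizes and the choice of which axes wrap are assumptions for illustration only; the slide does not give the actual post-K network layout.

```python
def neighbors(coord, dims, wrap):
    """Direct neighbours of `coord` in an N-dimensional mesh/torus.

    coord: tuple of node coordinates, one per axis
    dims:  tuple of axis sizes
    wrap:  tuple of booleans, True for a torus axis, False for a mesh axis
    """
    result = []
    for axis, (size, is_torus) in enumerate(zip(dims, wrap)):
        for step in (-1, 1):
            pos = coord[axis] + step
            if is_torus:
                pos %= size              # torus axis: wrap around the edge
            elif not 0 <= pos < size:
                continue                 # mesh axis: no link past the edge
            candidate = coord[:axis] + (pos,) + coord[axis + 1:]
            # Very short axes can yield the node itself or the same neighbour twice.
            if candidate != coord and candidate not in result:
                result.append(candidate)
    return result


if __name__ == "__main__":
    # Hypothetical 6D layout with every axis treated as a torus of size 4.
    print(len(neighbors((0,) * 6, (4,) * 6, (True,) * 6)))
```

With all six axes wrapping, every node has two neighbours per axis; switching an axis to mesh removes the links at its two ends.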
[Figure: system overview showing login servers, maintenance servers, portal servers, the I/O network, and the hierarchical storage system]
System Software
• Multi-kernel: Linux with a light-weight kernel
• File I/O middleware for the 3-level hierarchical storage system and applications
• Application-oriented file I/O middleware
• MPI+OpenMP programming environment
• Highly productive programming languages and libraries
CPU Architecture
• ARMv8-A + SVE (Scalable Vector Extension)
  • FP64/FP32/FP16
• Fujitsu's extensions
  • Inter-core barrier
  • Sector cache
  • Hardware prefetch assist
https://developer.arm.com/products/architecture/a-profile/docs
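SVE does not fix the vector length in the instruction set: an implementation may choose any multiple of 128 bits up to 2048 bits, and loops are written so that a predicate masks off the unused lanes in the final iteration. The sketch below simulates that vector-length-agnostic style in plain Python. The helper names `whilelt` and `vla_add` are illustrative (the first only mirrors the idea of the SVE WHILELT predicate instruction), and nothing here reflects the vector length chosen for post-K.

```python
def whilelt(start, limit, lanes):
    """Predicate with one flag per lane: lane i is active while start + i < limit."""
    return [start + i < limit for i in range(lanes)]


def vla_add(a, b, vector_bits, element_bits=64):
    """Element-wise a + b written so the result does not depend on vector_bits."""
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("SVE vector lengths are multiples of 128 bits, up to 2048")
    lanes = vector_bits // element_bits      # e.g. 8 FP64 lanes in a 512-bit vector
    out = [0] * len(a)
    i = 0
    while i < len(a):
        pred = whilelt(i, len(a), lanes)     # masks the tail, so no scalar remainder loop
        for lane, active in enumerate(pred):
            if active:
                out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes
    return out


if __name__ == "__main__":
    a, b = list(range(10)), [1] * 10
    # The same loop gives the same answer for every legal vector length.
    for bits in (128, 512, 2048):
        print(bits, vla_add(a, b, bits))
```

The point of the style is that one binary can run on implementations with different vector lengths without recompilation.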
• Partition resources (CPU cores, memory)
  • Full Linux kernel on some cores
    • System daemons and in-situ non-HPC applications
    • Device drivers
  • Light-weight kernel (LWK), McKernel, on other cores
    • HPC applications
• McKernel is developed at RIKEN
  • McKernel is a loadable module of Linux
  • McKernel supports the Linux API
  • McKernel runs on
    • Intel Xeon and Xeon Phi
    • Fujitsu FX10 and FX100 (experiments)
[Figure: cores and memory split into two partitions. The Linux partition runs system daemons and in-situ non-HPC applications on top of the TCP stack, device drivers, VFS, file system drivers, complex memory management, and a general scheduler. The thin LWK partition provides process/thread management and very simple memory management, and runs HPC applications through the Linux API (glibc, /sys/, /proc/).]
McKernel will be deployed to the Oakforest-PACS supercomputer (25 PF peak) at JCAHPC, organized by U. of Tsukuba and U. of Tokyo. Batch job queues for McKernel have not been turned on yet.
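The partitioning idea above (some cores for Linux, the rest for the light-weight kernel) can be sketched as a simple split of core IDs. This is not how IHK/McKernel reserves resources; `partition_cores` and `pin_current_process` are hypothetical helpers, and the pinning step uses only the standard Linux affinity call exposed by Python as `os.sched_setaffinity`, which gives a much weaker form of isolation than running a separate kernel.

```python
import os


def partition_cores(core_ids, linux_count):
    """Split core IDs into a Linux partition and an LWK partition (illustrative only)."""
    cores = sorted(core_ids)
    if not 0 < linux_count < len(cores):
        raise ValueError("both partitions need at least one core")
    return {"linux": cores[:linux_count], "lwk": cores[linux_count:]}


def pin_current_process(cores):
    """Restrict this process to `cores`; returns False where the call is unavailable."""
    if not hasattr(os, "sched_setaffinity"):
        return False                      # affinity control is Linux-specific
    os.sched_setaffinity(0, set(cores))   # pid 0 means the calling process
    return True


if __name__ == "__main__":
    # Hypothetical 8-core node: 2 cores for system services, 6 for the application.
    plan = partition_cores(range(8), linux_count=2)
    print(plan)
```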
How to deploy McKernel
• Linux kernel + loadable LWK, McKernel
  – The Linux kernel is resident, and daemons for the job scheduler etc. run on Linux
  – McKernel is dynamically reloaded (rebooted) for each application
• No hardware reboot
[Figure: timeline of three jobs. App A, requiring LWK-without-scheduler, is invoked and finishes; App B, requiring LWK-with-scheduler, is invoked and finishes; App C, using full Linux capability, is invoked and finishes.]
FWQ Benchmark
FWQ: Fixed Work Quanta
https://asc.llnl.gov/sequoia/benchmarks
[Figure: FWQ results in two panels, Linux with isolcpus and McKernel]
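The idea behind a fixed-work-quantum measurement is to repeat an identical piece of work many times and record how long each repetition takes; on a quiet system the durations are nearly equal, and outliers indicate interference from the OS. The following is a minimal sketch of that idea in Python, not the LLNL FWQ code; the function names, the work loop and the sample counts are assumptions for illustration.

```python
import time


def fixed_work(iterations):
    """A deterministic amount of integer work (the 'quantum')."""
    acc = 0
    for i in range(iterations):
        acc += i * i
    return acc


def fwq(samples, iterations):
    """Time `samples` repetitions of the same quantum; returns durations in ns."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        fixed_work(iterations)
        durations.append(time.perf_counter_ns() - start)
    return durations


def noise_summary(durations):
    """Compare the slowest quantum with the fastest one."""
    fastest, slowest = min(durations), max(durations)
    slowdown = slowest / fastest if fastest else float("inf")
    return {"min_ns": fastest, "max_ns": slowest, "max_slowdown": slowdown}


if __name__ == "__main__":
    print(noise_summary(fwq(samples=1000, iterations=10_000)))
```

A `max_slowdown` close to 1 means little run-to-run variation; larger values mean at least one quantum was delayed.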
GeoFEM (University of Tokyo)
• ICCG with additive Schwarz domain decomposition, weak scaling
• Up to 18% improvement
[Figure: figure of merit (solved problem size normalized to execution time) versus number of physical cores, 1024 to 128k, for Linux and IHK/McKernel]
Acknowledgement: Kengo Nakajima, University of Tokyo, for providing GeoFEM. This result was obtained on the Oakforest-PACS supercomputer (25 PF peak) at JCAHPC, organized by U. of Tsukuba and U. of Tokyo.
Results using the same binary
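The plotted figure of merit is described as solved problem size normalized to execution time, and the slide quotes the gain as a percentage. A minimal sketch of both calculations follows, assuming the percentage is the relative difference between the two figures of merit; the function names and the numbers in the example are hypothetical and are not measurements from the talk.

```python
def figure_of_merit(problem_size, seconds):
    """Solved problem size per unit of execution time."""
    if seconds <= 0:
        raise ValueError("execution time must be positive")
    return problem_size / seconds


def improvement_percent(baseline, candidate):
    """Relative gain of `candidate` over `baseline`, in percent."""
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical timings for the same problem size under two kernels.
    linux = figure_of_merit(problem_size=1_000_000, seconds=11.8)
    lwk = figure_of_merit(problem_size=1_000_000, seconds=10.0)
    print(f"{improvement_percent(linux, lwk):.1f}% improvement")
```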
CCS-QCD (University of Tsukuba)
• Lattice quantum chromodynamics code, weak scaling
• Up to 38% improvement
[Figure: MFlop/sec/node versus number of physical cores, 1024 to 128k, for Linux and IHK/McKernel]
Acknowledgement: Ken’ichi Ishikawa, Hiroshima University, for providing CCS-QCD. This result was obtained on the Oakforest-PACS supercomputer (25 PF peak) at JCAHPC, organized by U. of Tsukuba and U. of Tokyo.
Results using the same binary
miniFE (CORAL benchmark suite)
• Conjugate gradient, strong scaling
• Up to 3.5X improvement (Linux falls over)
[Figure: total CG MFlops versus number of physical cores, 1024 to 64k, for Linux and IHK/McKernel, with a 3.5X gap marked]
This result was obtained on the Oakforest-PACS supercomputer (25 PF peak) at JCAHPC, organized by U. of Tsukuba and U. of Tokyo.
Results using the same binary
CEA Collaboration
• Programming Language
  Christophe Calvin, Marc Pérache, Patrick Carribault, Julien Jaeger, Julien Bigot, Mitsuhisa Sato, Hitoshi Murai, Jinpil Lee, Atsushi Hori
• Runtime Environment
  Jacques-Charles Lafoucrière, Gilles Wiber, Yutaka Ishikawa, Masamichi Takagi, Balazs Gerofi, Takahiro Ogura
• Energy-aware batch job scheduler
  Matthieu Hautreux, Francis Belot, Atsuya Uno
• Large DFT calculations and QM/MM
  Thierry Deutsch, Luigi Genovese, Takahito Nakajima
• Application of High Performance Computing to Earthquake Related Issues of Nuclear Power Plant Facilities
  Evelyne Foerster, Gauthier Folzan, Alberto Frau, Muneo Hori, Hiroki Motoyama, Kohei Fujita
• KPIs (Key Performance Indicators)
  Jean-Philippe Bourgoin, Jean-Philippe Nominé, Didier Juvin, Shigeo Okaya, Miwako Tsuji, Mitsuhisa Sato, Kenji Morishita
CEA Collaboration: Programming Language
Collaborators
• Christophe Calvin, Marc Pérache, Patrick Carribault, Julien Jaeger, Julien Bigot
• Mitsuhisa Sato, Hitoshi Murai, Jinpil Lee, Atsushi Hori
Objective and Collaboration Topics
• Supporting productivity for a wide range of applications
• The PGAS (Partitioned Global Address Space) model for next-generation manycore parallel systems provides light-weight one-sided communication and low-overhead synchronization semantics
Background
• CEA: MPC (MultiProcessor Communications)
• RIKEN: XcalableMP (XMP), PVAS (Partitioned Virtual Address Space), and PIP (Processes in a Process)
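In a PGAS model, every element of a global array has one owning rank, but any rank can read or write it with a one-sided operation that needs no matching receive on the owner. The sketch below simulates that addressing scheme inside a single Python process. `GlobalArray`, `owner`, `put` and `get` are hypothetical names chosen for illustration; this is not the XcalableMP or MPC API, and a block distribution is only one of several possible layouts.

```python
class GlobalArray:
    """A 1-D array block-distributed over `ranks` partitions (single-process toy)."""

    def __init__(self, length, ranks):
        self.length = length
        self.block = -(-length // ranks)          # ceiling division: elements per rank
        self.partitions = [
            [0] * max(0, min(self.block, length - r * self.block))
            for r in range(ranks)
        ]

    def owner(self, index):
        """Map a global index to (owning rank, local offset)."""
        if not 0 <= index < self.length:
            raise IndexError(index)
        return index // self.block, index % self.block

    def put(self, index, value):
        """One-sided write: the caller addresses the owner's memory directly."""
        rank, offset = self.owner(index)
        self.partitions[rank][offset] = value

    def get(self, index):
        """One-sided read from whichever rank owns `index`."""
        rank, offset = self.owner(index)
        return self.partitions[rank][offset]


if __name__ == "__main__":
    ga = GlobalArray(length=10, ranks=4)
    ga.put(9, 42)                 # lands in the last rank's partition
    print(ga.owner(9), ga.get(9))
```

In a real runtime the `put` and `get` bodies would become remote memory accesses, which is where the light-weight communication and low-overhead synchronization mentioned above matter.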
CEA Collaboration: Programming Language
2017 2018 2019 2020
• XMP available on ATOS/Bull supercomputer
• MPC available on ARM architecture
• MPC as MPI implementation for XMP prototype
• Study on a unified API for inter XMP nodes communication
• List of benchmarks and mini-app to be evaluated
• Benchmarks implemented with XMP on target architectures
• Benchmarks implemented with XMP-MPC on target architectures
• Benchmarks implemented with integrated environment on target architectures
CEA Collaboration: Runtime Environment
Collaborators
• Christophe Calvin, Marc Pérache, Patrick Carribault, Julien Jaeger, Julien Bigot
• Mitsuhisa Sato, Hitoshi Murai, Jinpil Lee, Atsushi Hori
Objective and Collaboration Topics
• Improving (performance) portability of applications
• Defining a standard of the runtime environment settings (including libraries, OS parameters and OS kernels)
• Finding optimal settings in terms of application performance
• Contributing to the OpenHPC community
Background
• CEA: SELFIE (profiling tool) and PCOCC (virtualization tool)
  • EasyBuild, a software build and installation framework, is used to manage open-source packages
• RIKEN: Linux with IHK/McKernel (light-weight OS kernel)
• OS: Linux
• CPU: Intel Xeon, Intel Xeon Phi, ARM
• Network: InfiniBand, Omni-Path, Fujitsu Tofu, Bull BXI
CEA Collaboration: Runtime Environment
2017 2018 2019 2020
• 1st version of configuration standard, libraries, kernel parameters and kernels
• 2nd version of configuration standard, libraries, kernel parameters and kernels
• 3rd version of configuration standard, libraries, kernel parameters and kernels
• 4th version of configuration standard, libraries, kernel parameters and kernels
• CEA tests McKernel on CEA’s machines
• RIKEN investigates EasyBuild
• CEA and RIKEN provide the current user demands
Concluding Remarks
• The system software stack for post-K is being designed and implemented by leveraging international collaborations with CEA, DOE Labs, and JLESC (NCSA, INRIA, ANL, BSC, JSC, RIKEN)
• The software stack developed at RIKEN is open source
  • It also runs on Intel Xeon and Xeon Phi
• RIKEN would like to contribute to OpenHPC