Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M....

21
Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics University of Hull, Cottingham Road, Kingston upon Hull HU6 7RX, UK [email protected] performance for computational astrochemistry @dbenoit1

Transcript of Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M....

Page 1: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

Extreme-scaling on Omni-Path fabric:

David M. BenoitE.A. Milne Centre for AstrophysicsDepartment of Physics and MathematicsUniversity of Hull, Cottingham Road, Kingston upon Hull HU6 7RX, [email protected]

performance for computational astrochemistry

@dbenoit1

Page 2: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

Landing on a comet – ESA Rosetta Mission

• ESA space probe launched in 2004• Lander module Philae explored

comet 67/P Churyumov-Gerasimenko in 2014

Page 3: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

Page 4: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

pivotal role in its chemistry.21,22 Dust grains are thought to becomprised of silicates, oxides and carbonaceous materials.23,24

In dense molecular clouds, molecules and atoms from the gasphase condense onto the grains to form molecular ices,16,21,25,26

the composition of which does not reflect gas phase abundances.During the lifetime of a molecular cloud (106–108 years)the ices undergo significant physical and chemical changesdepending on the astrophysical environment. A summary ofsome of the typical processing routes for interstellar ices isshown in Fig. 1. The grain acts as a third body, catalysingchemical reactions at the surface. At very low temperatures(o20 K) atomic species such as H, D, N, C and O form simplemolecules via thermal hopping or quantum tunnelling ongrain surfaces.16 In regions characterised by high atomic Habundances, hydrogenation forms molecules such as H2O,NH3, CH3OH and CH4.

13,16–19,27 Ices comprised of thesemolecules are often referred to as polar ices and are dominatedby H2O.16,28,29 Conversely, in regions where the gas phaseatomic and molecular hydrogen ratios are much lower, surfacebound C, N and O atoms form so called apolar ices consistingof more volatile species including CO, CO2, N2 and O2.

27,30,31

The presence of multiply deuterated species in the ISM hasalso been ascribed to grain surface chemistry.32

Interstellar ices undergo further processing via exposure toultraviolet (UV),20,33,34 X-ray or cosmic radiation,35 in additionto thermal processing. Laboratory studies of ice grain mimicshave shown that UV irradiation gives rise to the formation ofcomplex organics such as NH2CHO, OCN! andC2H5OH.34,36,37 Bombardment of the ices by low energyprotons and ions also leads to the formation of new species,in addition to altering the morphology of the ices.35,38–41 Heatgenerated by new born stars stimulates further surfacechemistry prior to evaporation of these icy mantles duringcloud collapse. In regions of massive star formation known ashot cores, where temperatures rise to in excess of 100 K,sublimation of the ices gives rise to enhanced gas phaseabundances of molecules. Once the molecules have beenreleased into the gas phase, they drive a rich chemistry leading

to the formation of larger organics such as methyl formate(HCOOCH3) and dimethyl ether ((CH3)2O).13,18,42

Studies have shown that the sublimation of interstellar icesis not instantaneous.43 Hence a detailed understanding of thethermal desorption of interstellar ices from dust grains isessential for the accurate modelling of star formation. Thisinformation can be obtained from surface science techniquessuch as temperature programmed desorption (TPD) studies ofmodel interstellar ices on dust grain analogue surfaces.44–57

Numerous studies have shown that the desorption of astro-physically relevant species from H2O-rich ices occurs over arange of temperatures, instead of a single temperature asassumed in many astrophysical models.46,48,58–69 Severalauthors70,71 have demonstrated the importance of usingexperimentally determined kinetic parameters to describe thesublimation of interstellar ices from grains. These data can beincorporated into astrophysical models to extrapolate desorptionevents on timescales relevant to real astrophysical processes.Furthermore, the data can be used to calculate residence timesof molecules on grain surfaces as a function of temperature, inaddition to providing a more accurate method of estimatingtotal column density (an estimate of the thickness of the ice) ininterstellar ices.55,72,73

This perspective aims to provide an overview of the currentlevel of understanding of the adsorption, and particularlydesorption, of astrophysically relevant molecular ices from arange of dust grain analogue surfaces. Whilst the perspectivefocuses on the adsorption and desorption of ices in aninterstellar context, particularly with respect to star-formingregions and to hot cores, much of the data is equally relevantto discussions of the desorption of cometary and planetaryices, although these are not specifically described here. Themain body of the perspective discusses results obtained for arange of different interstellar ices, using ultra-high vacuum(UHV) surface science techniques to study adsorption on anddesorption from a range of dust grain analogues. We provide amore general review of some of the experiments that have beenundertaken to study ice desorption by several different groups,

Fig. 1 Schematic showing the main routes of interstellar ice processing that takes place in astrophysical environments. The molecular species

labelled within the inner layer highlight the main constituents detected in interstellar ices.

5948 | Phys. Chem. Chem. Phys., 2010, 12, 5947–5969 This journal is "c the Owner Societies 2010

Publ

ished

on

23 F

ebru

ary

2010

. Dow

nloa

ded

by U

nive

rsity

of H

ull o

n 30

/10/

2017

16:

34:1

2.

View Article Online

•Dust grains acts as cosmic “bench tops” for chemical reactions•Help transform basic chemicals into more complex molecules• Their study is key to understanding low-temperature chemistry in the Universe• But how strongly do moleculesstick to ice-covered grains/comets?

Surface chemistry on dust grains

From: D.J. Burke et al., PCCP 12 (2010) 5947

Page 5: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

Adsorption / sticking energies?

• Adsorption energies are hard to measure accurately

• Experimental database estimates are from ‘90s with little experimental/theoretical validation

•Quantum chemical calculations can provide a reliable estimate Penteado et al., ApJ 844 (2017) 71

Page 6: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

Scaling quantum chemistry on HPC

Page 7: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

VIPER technical profile

• 5040 Intel Broadwell E5-2860v4 (2.4 GHz) cores in 180 compute nodes• Intel X16 100Gb/s Omni-Path interconnect• 4 x 1 TB high memory nodes• 2 x visualisation nodes (2 x Nvidia

GeForce GTX 980 Ti per node)• 4 x accelerator nodes (4 x Nvidia Tesla

K40M GPUs per node)• 500 TB of user storage running BeeGFS• Each node runs a Docker container

Page 8: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

TESSERACT technical profile (SGI 8600)• 20,256 Intel Skylake silver 4116

(2.1 GHz) cores in 844 compute nodes• 96 GB NUMA per node (48GB/

processor)• Intel X16 100Gb/s Omni-Path

interconnect• 3 PB of Lustre file system • DiRAC’s Extreme Scaling facility• Hosted at EPPC/Edinburgh

Page 9: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

Application: CPMD – Large-scale electronic structure code• Well established, freely available,

density functional theory (DFT) code• Developed by MPI-FKF Stuttgart&IBM• Demonstrated scaling on large HPC• Hybrid MPI/OpenMP parallelisation • Bottlenecks: FFT, memory footprint

and all-to-all comms• Our system: fullerene dimer (C120),

480 electrons, triplet state, B3LYP functional and a 500 Ry cutoff

Li/Air Batteries ( ~ 700 atoms ) PBE0 (SCF performance) BG/Q

cpmd.org

Fullerene dimer, C120

Page 10: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

Large-scale density functional theory on OPASp

eedu

p (2

4 co

res

= 1)

0

4

8

12

16

20

24

Number of cores0 72 144 216 288 360 432 504 576

Linear scalingOmni-Path (VIPER)EDR – Reference

576 FFT planes

Conditions: Omni-Path HFI Silicon 100 Series; Intel 100 Series 48 port unmanaged switches; CentOS 7.2.1511; dockerised nodes; OPA 10.4.1.0-1; CPMD V4.5 compiled with ifort 2017 (-O2 -ipo -xHOST) ATLAS (single thread) and scaLAPACK

Maximum theoretical limit

Page 11: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

Scaling past the maximum theoretical limit

Spee

dup

(24

core

s =

1)

0

12

24

36

48

60

Number of cores

0 288 576 864 1152 1440 1728 2016 2304

Linear scalingOmni-Path (VIPER)EDR – Reference2 FFT groups (VIPER)4 FFT groups (VIPER)

1/2 MPI FFT ranks

1/4 MPI FFT ranks

Page 12: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

Locality and OpenMP (kitchen sink approach…)

Spee

dup

(24

core

s =

1)

0

12

24

36

48

60

72

84

96

Number of cores

0 288 576 864 1152 1440 1728 2016 2304

Linear scalingMPI onlyLocal data (MPI only)2 FFT groupsLocal data (2 FFT groups)Local data (OMP2)4 FFT groupsLocal data (OMP2, 2 FFT groups)Local data (OMP4)

Locality reduces number of all-to-all comms (~30%)

OpenMP comms slightly faster than FFT group comms

OpenMP & FFT groups faster than pure OMP or pure FFT groups

Page 13: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

2 x 576 MPI ranks

1 iteration = 4.4 s

ITAC MPI trace – What happens at 1152 cores?

1152 MPI ranks

576 MPI ranks with 2 OpenMP threads

1 iteration = 4.2 s

1 iteration = 13.5 s

CPMDOpenMPMPI All-reduceMPI all-to-all

Page 14: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

More cores better? – Tesseract scaling

Spee

dup

(24

core

s =

1)

0

48

96

144

192

240

288

336

384

Number of cores

0 1152 2304 3456 4608 5760 6912 8064 9216

Linear scalingTesseractBest Viper50% scaling

Are we running out of data?

Page 15: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

Sticky space ice?

Page 16: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

Space ice model: Low density ice

• Amorphous low density ice model with a single benzene molecule, 1512 atoms, 4030 electrons, PBE functional and a 200 Ry cutoff• 10-fold size increase from C120

http://www.nims.go.jp/water/hda_lda_tr.html

Page 17: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

Tesseract scalingSp

eedu

p (3

84 c

ores

= 1

)

0

4

8

12

16

20

24

28

32

Number of cores

0 1536 3072 4608 6144 7680 9216 10752 12288

Linear scalingTesseract50 % scaling

Page 18: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

Bind

ing

ener

gy [K

]

3250

3265

3280

3295

3310

3325

3340

3355

3370

3385

3400

Cutoff energy

50 100 150 200 250 300 350

Aver

age

itera

tion

time

[min

]0

1

2

3

4

432 580 728 876 1024Number of cores

Adsorption energy estimation

TPD

: 477

5K

UM

IST:

790

0KCP2K

Page 19: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

Why is it important?Unlikely

Standard UMIST

database

3-times more likelyOur results

Modified from: ESA/Rosetta/RPC-ICA

Page 20: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

Conclusions

•Omni-Path performance are similar to EDR in conventional scaling region but better in non-scalable regime

•Combination of enhanced locality, reduced MPI ranks and FFT grouping leads to predictable scalability

•However, understanding application and problem size are key to scaling at large core counts

Page 21: Extreme-scaling on Omni-Path fabric · 2019-01-23 · Extreme-scaling on Omni-Path fabric: David M. Benoit E.A. Milne Centre for Astrophysics Department of Physics and Mathematics

CIUK18 | Manchester | December 2018

Acknowledgements

» VIPER HPC support team

» The University of Hull for funding

» DiRAC / STFC / EPCC for CPU time