ADAC6Appropriate Use of Containers in HPC
Agenda
April 2017 Cray Inc. Confidential and Proprietary
● Why HPC containers● Investigate different container runtimes for HPC ● Integration with batch systems● Best practices for HPC● HPC Application performance
2
Why HPC containers
April 2017 Cray Inc. Confidential and Proprietary
● Mobility of computing to both users and HPC centers● Means to capture and distribute software and compute environments● Entire workflow can be contained for others to replicate no matter what
version of Linux they are running (Kernel ABI compatibility – syscall’s)● Reproducibility of results – may not mean same performance
● MPI-ABI, libraries, host drivers● Standard and simplified packaging
● Independent Software Vendors (ISV) codes● Delivered benchmarks● CRI standard and/or propriety formats
● Legacy workload packaging and execution
3
Container Runtime Environments
April 2017 Cray Inc. Confidential and Proprietary
● Enterprise, and HPC container environments● opencontainers/runc
● 236 contributors, 17 release. 3,613 commits● Standard image format (Open Container Initiative – OCI)
● rtk/rtk● 197 contributors, 65 releases , 5,539 commits● Standard image format (appc Specification – Application Container Image, ACI)
● NERSC Shifter (github)● 11 contributors, 5 releases, 1,712 commits● Propriety image format
● LBL Singularity (syslabs.io)● 63 contributors, 19 releases, 4,503 commits● Propriety image format
4
As of 06/18/18
Integration with batch systems
April 2017 Cray Inc. Confidential and Proprietary
● Examples of Moab/Torque and Cray ALPS● Application based utilities● Limited or no cgroup and kernel namespace support
① $ aprun -n N -b runc --bundle /tmp/cle.img run $(date +%Y%m%d%H%M)② $ aprun -n N… -b rkt run \
--stage1-name=coreos.com/rkt/stage1-fly:1.21.0 \registry-1.docker.io/library/cle:latest --net=host --exec=/bin/hostname
③ $ aprun -n N… -b shifter --image cle:latest /bin/hostname④ $ aprun -n N… -b singularity exec /global/cle.img /bin/hostname
5
Container Runtime Specifics
April 2017 Cray Inc. Confidential and Proprietary
● Container runtimes configuration and semantics● Unique image management● Runtime arguments are specific
6
Configuration & Runtime options
April 2017 Cray Inc. Confidential and Proprietary
● runc● Embedded in the OCI image definition: config.json
● rkt● /etc/rkt, /usr/lib/rkt, user-defined
● Repository authentication, data and image locations● Command line can override system configurations
● Shifter● System configuration (/etc/shifter)
● Authentication, data and image locations● Singularity
● System configuration $SYSCONFDIR/singularity/singularity.conf● Authentication, data and image locations
7
runc
April 2017 Cray Inc. Confidential and Proprietary
● Create a root file system, created via Docker container$ docker export alpine | tar xvfC - ./rootfs
● Create container configuration file config.json$ runc spec$ vi config.json # modify the args, see OCI specification
● Run the container$ runc --log /dev/stdout run $(date +%Y%m%d%H%M)
8
OCI specification: https://github.com/opencontainers/runtime-spec/blob/master/runtime.md
rkt
April 2017 Cray Inc. Confidential and Proprietary9
● Pull an image locallydocker2aci$ docker2aci docker://jsparkscraycom/cle:6.0.2
● Run the container$ qsub –Isalloc: Granted job allocation 88$ rkt run --no-overlay=true --net=host \
--stage1-name=coreos.com/rkt/stage1-fly:1.22.0 \registry-1.docker.io/jsparkscraycom/cle:6.0.2 \${options} –inherit-env=true \--environment=LD_LIBRARY_PATH=$LD_LIBRARY_PATH \--exec=/lus/${USER}/a.out -- myargs
Shifter
April 2017 Cray Inc. Confidential and Proprietary
● Pull an image$ shifterimg pull jsparkscraycom/test:latest
● Run the container$ shifter --image jsparkscraycom/test:latest \/usr/bin/date
10
Singularity
April 2017 Cray Inc. Confidential and Proprietary
● Pull an image (if needed)$ singularity pull docker://jsparkscraycom/test:latest
● Run the container$ singularity exec \
docker://jsparkscraycom/test:latest \/usr/bin/date
11
Container Best Practices
April 2017 Cray Inc. Confidential and Proprietary
● Portability● Run anywhere vs. exploit host acceleration● Isolated contained environment, aka encapsulation – full vs. partial
● Image standardization compliance● OCI, ACI and custom (singularity)
● HPC Optimizations● Override container defaults via mounts and environment variables
● Libraries and configurations● Framework independence
● No vendor/runtime lock in● Stripped frameworks
● opencontainers/runc, Charliecloud, rtk
12
Containerization Models
April 2017 Cray Inc. Confidential and Proprietary13
● HPC case● HPC runtimes
(Shifter/Singularity/...)● Partial encapsulation –
limited namespaces● Access to host resources –
networks, storage● blurs the line between
container and host such that local directories can exist within the container.
● Multiple applications● Batch system integration● Image format for scale (pfs)
● Enterprise case● Standard runtimes
(docker/rkt)● Fully encapsulated
● Everything the application needs is in the container
● Single application● Orchestration software● Standard image formats
HPC Partial Encapsulation
April 2017 Cray Inc. Confidential and Proprietary14
Container
binaries, libraries, configurations
PFS
ISV app, libs, configuration
ISV app, libs,configuration,
directories
exec() / mount()
filesystem mount
ssh/sshd
/home, /var/lib/hugetlbfs, /opt/nvidia, /lustre
HPC Application performance
April 2017 Cray Inc. Confidential and Proprietary
● Launch times● Time to setup and launch via container runtime
● Application performance● Hugepage optimization● Environment pass-through
15
An Updated Performance Comparison of Virtual Machinesand Linux ContainersRef: http://domino.research.ibm.com/library/cyberdig.nsf/papers/0929052195DD819C85257D2300681E7B/$File/rc25482.pdf
Launch Times
April 2017 Cray Inc. Confidential and Proprietary16
0. 00
2. 00
4. 00
6. 00
8. 00
10 .0 0
12 .0 0
14 .0 0
16 .0 0
18 .0 0
20 .0 0
2 4 8 16 32 64
Sec
on
ds
N um ber of nodes
Container Execution OverheadExecution time of /bin/true
ba re O S
sh if te r
r kt
r unC w / Lu str e
si ngu la ri t y
r unC w / t m pf s
OSU Benchmarks
April 2017 Cray Inc. Confidential and Proprietary17
0. 00
10 0. 00
20 0. 00
30 0. 00
40 0. 00
50 0. 00
60 0. 00
0 2 832 12
851
220
4881
9232
768
1310
72
5242
88
2097
152
Lat
ency
(u
s)
S ize
OSU One Sided MPI_GET latency Test v3.8
r kt
S hif t er
C LE
S ing ul ar it y
0
10 00
20 00
30 00
40 00
50 00
60 00
70 00
80 00
90 00
1 416 64 25
610
2440
9616
384
6553
626
214
410
485
7641
943
04
Ban
dw
idth
(M
B/s
)
S ize
OSU One Sided MPI_GET Bandwidth Test v3.8
r kt
S hif t er
C LE
S ing ul ar it y
Hugepage / DMAPP
April 2017 Cray Inc. Confidential and Proprietary18
0
10 00
20 00
30 00
40 00
50 00
60 00
70 00
80 00
1 416 64 25
610
2440
9616
384
6553
626
214
410
485
7641
943
04
Ban
dw
idth
(M
B/s
)
S ize
OSU MPI One Slided MPI_Get Bandwidth
S hif t er w/ ohu gep age s
S hif t er whu gep age s
010 0020 0030 0040 0050 0060 0070 0080 0090 00
10 000
1 416 64 25
610
2440
9616
384
6553
626
214
410
485
7641
943
04
Ban
dw
idth
(M
B/s
)
S ize
OSU MPI One Sided MPI_Get Bandwidth
Hugepages + DMAPP
S hif t er w/ oD M AP P
S hif t er wD M AP P
NPB Single node
April 2017 Cray Inc. Confidential and Proprietary19
0. 0020 .0 0
40 .0 060 .0 0
80 .0 010 0. 00
12 0. 0014 0. 00
16 0. 0018 0. 00
20 0. 00
CLE
Shif
ter
rkt
Sing
ular
ity
CLE
Shif
ter
rkt
Sing
ular
ity
CLE
Shif
ter
rkt
Sing
ular
ity
CLE
Shif
ter
rkt
Sing
ular
ity
CLE
Shif
ter
rkt
Sing
ular
ity
CLE
Shif
ter
rkt
Sing
ular
ity
CLE
Shif
ter
rkt
Sing
ular
ity
CLE
Shif
ter
rkt
Sing
ular
ity
CLE
Shif
ter
rkt
Sing
ular
ity
bt cg dc ep f t i s l u sp ua
Tim
e in
sec
on
ds
B enchm ark
NAS Parallel Benchmarks 3.3Serial Single node CLASS=A
N PB - S in gle n ode
NPB Multi-node
April 2017 Cray Inc. Confidential and Proprietary20
0. 00
20 .0 0
40 .0 0
60 .0 0
80 .0 0
10 0. 00
12 0. 00
14 0. 00
CLE
Shif
ter
rkt
Sing
ular
ityru
nC CLE
Shif
ter
rkt
Sing
ular
ityru
nC CLE
Shif
ter
rkt
Sing
ular
ityru
nC CLE
Shif
ter
rkt
Sing
ular
ityru
nC CLE
Shif
ter
rkt
Sing
ular
ityru
nC CLE
Shif
ter
rkt
Sing
ular
ityru
nC CLE
Shif
ter
rkt
Sing
ular
ityru
nC CLE
Shif
ter
rkt
Sing
ular
ityru
nC
bt cg ep f t i s l u m g sp
be nch m ar k
Tim
e in
sec
on
ds
NAS Parallel Benchmarks 3.3NPROCS=256 CLASS=D
N PB
Quantum ESPRESSO – optimization
April 2017 Cray Inc. Confidential and Proprietary21
0. 00
50 .0 0
10 0. 00
15 0. 00
20 0. 00
25 0. 00
30 0. 00
35 0. 00
36 72 10 8 14 4 18 0 21 6 25 2 28 8
Ex
ec
uti
on
tim
e (
se
cs
)
cores
Execution time of QUANTUM ESPRESSO 6.0
/ Broadwellhugepage
C LE
S hif t er
S ing ul ar it y
0. 00
50 .0 0
10 0. 00
15 0. 00
20 0. 00
25 0. 00
30 0. 00
35 0. 00
36 72 10 8 14 4 18 0 21 6 25 2 28 8
Ex
ec
uti
on
tim
e (
se
cs
)
cores
Execution time of QUANTUM ESPRESSO 6.0
/ Broadwellnone-hugepage
C LE
S hif t er
S ing ul ar it y
Radioss – optimization
April 2017 Cray Inc. Confidential and Proprietary22
0
10
20
30
40
50
60
36 72 10 8 14 4
Ela
pse
d T
ime
(sec
s)
N um ber of C ores
Radioss (Intel MPI): Normalized to CLEBroadwell ppn:36
w /o h uge pag es
hu gep age s
I nt el l ib ra r ies
en vir o nm en t pr op oga ti on
Conclusions and Future Work
April 2017 Cray Inc. Confidential and Proprietary
● Deployment● Enterprise frameworks required per-node image caches● Enterprise frameworks, as expected offer more runtime options● HPC frameworks enforce strict user security
● Performance● Native application performance can be achieved, requires host-level access
to resources (network, file system)● Host environment pass-through● Launch time dependent on container infrastructure and image size
● Future work● Scaling investigation of open container frameworks
● Shared image storage – faster startup times● Abstraction layer standardizing on applications packaging, deployed and run
in isolation – OCI.
23
Legal Disclaimer
April 2017 Cray Inc. Confidential and Proprietary24
Information in this document is provided in connection with Cray Inc. products. No license, express or implied, to any intellectual property rights is granted by this document.
Cray Inc. may make changes to specifications and product descriptions at any time, without notice.
All products, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
Cray hardware and software products may contain design defects or errors known as errata, which may cause the product to deviatefrom published specifications. Current characterized errata are available on request.
Cray uses codenames internally to identify products that are in development and not yet publicly announced for release. Customers and other third parties are not authorized by Cray Inc. to use codenames in advertising, promotion or marketing and any use of Cray Inc. internal codenames is at the sole risk of the user.
Performance tests and ratings are measured using specific systems and/or components and reflect the approximate performance of Cray Inc. products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.
The following are trademarks of Cray Inc. and are registered in the United States and other countries: CRAY and design, SONEXION, URIKA and YARCDATA. The following are trademarks of Cray Inc.: CHAPEL, CLUSTER CONNECT, CLUSTERSTOR, CRAYDOC, CRAYPAT, CRAYPORT, DATAWARP, ECOPHLEX, LIBSCI, NODEKARE, REVEAL. The following system family marks, and associated model number marks, are trademarks of Cray Inc.: CS, CX, XC, XE, XK, XMT and XT. The registered trademark LINUX is used pursuant to a sublicense fromLMI, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis. Other trademarks used on this website are the property of their respective owners.
Top Related