INTRODUCTION TO CLUSTER COMPUTING
Carlos Teijeiro Barjas (HPC Advisor)
UvA – Amsterdam – 15/01/2020
Outline
Introduction to High Performance Computing
    Definitions
    Parallel programming
SURFsara facilities
    Presentation
    Systems and specifications
    Running jobs
Hands-on exercises
    Exercise available in your home directories (Stopos)
High-performance computing (HPC) is …
… an area of computer-based computation. It includes all computing work that requires a high computing capacity or storage capacity.
… the use of parallel processing for running advanced application programs efficiently, reliably and fast.
… the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.
… the use of supercomputers and parallel processing techniques for solving complex computational problems.
A computer is …
Peripherals (I/O) are …
Image source: https://media.sciencephoto.com/image/t4150120/800wm/T4150120-Piles_of_discarded,_redundant_computer_keyboards.jpg
Image source: https://static.guim.co.uk/sys-images/Guardian/Pix/pictures/2014/2/10/1392028631237/Pile-of-computer-monitors-008.jpg
A memory is …
Image source: https://upload.wikimedia.org/wikipedia/commons/c/c0/8 bytes vs. 8Gbytes.jpg
A central processing unit (CPU) is …
Image source: http://people.cs.pitt.edu/~don/coe1502/current/Unit4a/Unit4a.html
A larger computer could be …
A larger computer actually is …
High-performance computing (HPC) …
… is an area of computer-based computation. It includes all computing work that requires a high computing capacity or storage capacity.
… is the use of parallel processing for running advanced application programs efficiently, reliably and fast.
… refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.
… is the use of supercomputers and parallel processing techniques for solving complex computational problems.
… is the part of computing focused on making computers collaborate efficiently up to very large scales.
… is optimized and scalable computer coordination (hardware and software).
SURFsara is part of SURF
Location of SURFsara
Activities at SURFsara
Regular user support: from a few minutes to a couple of days
Application enabling for Dutch Compute Challenge Projects
    Potential effort by SURFsara staff: 1 to 6 person-months per project
Performance improvement of applications
    Typically meant for promising user applications
    Potential effort by SURFsara staff: 3 to 6 person-months per project
Support for PRACE applications: access to European systems
Visualization projects
Training and workshops (regular and on demand)
Please contact SURFsara at [email protected]
Dutch national supercomputers: performance increase
| Year | Machine | Rpeak (GFlop/s) | kW | GFlop/s/kW |
|---|---|---|---|---|
| 1984 | CDC Cyber 205 1-pipe | 0.1 | 250 | 0.0004 |
| 1988 | CDC Cyber 205 2-pipe | 0.2 | 250 | 0.0008 |
| 1991 | Cray Y-MP/4128 | 1.33 | 200 | 0.0067 |
| 1994 | Cray C98/4256 | 4 | 300 | 0.0133 |
| 1997 | Cray C916/121024 | 12 | 500 | 0.024 |
| 2000 | SGI Origin 3800 | 1,024 | 300 | 3.4 |
| 2004 | SGI Origin 3800 + SGI Altix 3700 | 3,200 | 500 | 6.4 |
| 2007 | IBM p575 Power5+ | 14,592 | 375 | 40 |
| 2008 | IBM p575 Power6 | 62,566 | 540 | 116 |
| 2009 | IBM p575 Power6 | 64,973 | 560 | 116 |
| 2013 | Bull bullx DLC | 250,000 | 260 | 962 |
| 2014 | Bull bullx DLC | >1,000,000 | >520 | 1923 |
| 2017 | Bull bullx DLC + KNL | >1,800,000 | | |
| 2016 | Raspberry PI 3 (35 euro) | 0.44 | 0.004 | 110 |
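The last column of the table is the peak performance divided by the power draw. As a small illustration (the three rows are copied from the table; the script itself is only a sketch and not part of the original slides), the ratio can be recomputed with awk:

```shell
#!/bin/bash
# Recompute GFlop/s per kW = Rpeak / kW for a few rows of the table.
# Fields: year;machine;Rpeak in GFlop/s;power in kW
rows='1984;CDC Cyber 205 1-pipe;0.1;250
2013;Bull bullx DLC;250000;260
2016;Raspberry PI 3;0.44;0.004'

efficiency=$(printf '%s\n' "$rows" |
    awk -F';' '{ printf "%s %s: %.4g GFlop/s/kW\n", $1, $2, $3 / $4 }')
echo "$efficiency"
```

For the 2013 row this gives about 961.5, which the table rounds to 962. The Raspberry Pi row shows why the ratio matters: its efficiency is close to that of the 2008 national supercomputer.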
Schematic overview of a supercomputer
Specific example: Cartesius architecture
[Diagram: interactive nodes int1 and int2; compute nodes tcn10, tcn11, tcn12, … and fcn1, fcn2, fcn3, …; file systems /scratch, /home, /project and /archive]
Specific example: Lisa architecture
[Diagram: login nodes login1 and login2; compute nodes r12n1, r12n2, r12n3, … and r25n1, r25n3, …; file systems /scratch, /home, /project and /archive]
Compute power on Cartesius
1 thin node island, a so-called Bull sequana X1000 cell
    177 sequana X1110 thin nodes, each with 2 × 16-core 2.6 GHz Intel Xeon E5-2697A v4 and 64 GB memory
3 thin node islands
    360 bullx B720 thin nodes, each with 2 × 12-core 2.6 GHz Intel Xeon E5-2690 v3 and 64 GB memory
2 thin node islands
    360 + 180 bullx B710 thin nodes, each with 2 × 12-core 2.4 GHz Intel Xeon E5-2695 v2 and 64 GB memory
1 fat node island
    32 bullx R428 E3 fat nodes with 4 × 8-core 2.7 GHz Intel Xeon E5-4650 and 256 GB memory
18 sequana X1210 Xeon Phi nodes
    64-core 1.3 GHz Intel Xeon Phi 7230 (Knights Landing) with 96 GB memory
1 accelerator island with 66 bullx B515 GPGPU accelerated nodes
    2 × 8-core 2.5 GHz Intel Xeon E5-2450 v2 with 96 GB memory
    2 × NVIDIA Tesla K40m GPGPUs/node
2 bullx R423-E3 interactive front end nodes
    2 × 8-core 2.9 GHz Intel Xeon E5-2690 with 128 GB memory
5 bullx R423-E3 service nodes
    2 × 8-core 2.9 GHz Intel Xeon E5-2690 with 32 GB memory
Global summary
    47,776 cores + 132 GPUs: 1.843 Pflop/s (peak performance)
    130 TB memory
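The core and GPU totals in the global summary can be reproduced from the node counts on the preceding slides. The sketch below is not from the original deck and rests on two assumptions: that the B720 line means 360 nodes in each of the 3 islands, and that the interactive and service nodes are not counted.

```shell
#!/bin/bash
# Sum the cores over the compute node types listed for Cartesius.
# Assumptions: 3 islands x 360 B720 nodes; front end and service nodes excluded.
thin_x1110=$(( 177 * 2 * 16 ))          # sequana X1110 thin nodes
thin_b720=$(( 3 * 360 * 2 * 12 ))       # bullx B720 thin nodes
thin_b710=$(( (360 + 180) * 2 * 12 ))   # bullx B710 thin nodes
fat=$(( 32 * 4 * 8 ))                   # bullx R428 E3 fat nodes
phi=$(( 18 * 64 ))                      # Xeon Phi 7230 nodes
gpu_hosts=$(( 66 * 2 * 8 ))             # CPU cores in the bullx B515 GPU nodes

total_cores=$(( thin_x1110 + thin_b720 + thin_b710 + fat + phi + gpu_hosts ))
total_gpus=$(( 66 * 2 ))                # 2 Tesla K40m per accelerated node
echo "cores: $total_cores, GPUs: $total_gpus"
```

Under these assumptions the sums match the 47,776 cores and 132 GPUs quoted in the summary.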
Low-latency network: 4x FDR14 InfiniBand
    Non-blocking within fat node island and thin node islands
    3.3 : 1 pruning factor among islands
    56 Gbit/s inter-node bandwidth
    2.4 µs inter-island latency
File systems and I/O
    180 TB NFS file system (home)
    7.7 PB Lustre file system (scratch and project)
bullx GNU/Linux OS, compatible with Red Hat Enterprise Linux
Specific policy for software installation and maintenance
Compute power on Lisa
| Number | Processor Type | Clock | Scratch | Memory | Sockets | Cache | Cores | GPUs | Interconnect |
|---|---|---|---|---|---|---|---|---|---|
| 23 | Bronze 3104 | 1.70 GHz | 1.5 TB NVME | 256 GB UPI 10.4 GT/s | 2 | 8.25 MB | 12 | 4 x GeForce 1080Ti, 11 GB GDDR5X | 40 Gbit/s Ethernet |
| 2 | Bronze 3104 | 1.70 GHz | 1.5 TB NVME | 256 GB UPI 10.4 GT/s | 2 | 8.25 MB | 12 | 4 x Titan V, 12 GB HBM2 | 40 Gbit/s Ethernet |
| 8 | Gold 5118 | 2.30 GHz | 1.5 TB NVME | 192 GB UPI 10.4 GT/s | 2 | 16.5 MB | 24 | 4 x Titan RTX, 24 GB GDDR6 | 40 Gbit/s Ethernet |
| 192 | Gold 6130 | 2.10 GHz | 1.7 TB | 96 GB UPI 10.4 GT/s | 1 | 22 MB | 16 | - | 10 Gbit/s Ethernet |
| 96 | Silver 4110 | 2.10 GHz | 1.8 TB | 64 GB UPI 9.6 GT/s | 2 | 11 MB | 16 | - | 10 Gbit/s Ethernet |
| 1 | E7-8857 v2 | 3.00 GHz | 13 TB | 1 TB QPI 8.00 GT/s | 4 | 30 MB | 48 | - | 10 Gbit/s Ethernet |
| 1 | Gold 6126 | 2.60 GHz | 11 TB | 2 TB UPI 10.4 GT/s | 4 | 19.25 MB | 48 | - | 40 Gbit/s Ethernet |
CPU nodes
    Total number of CPU cores: 4704
    Total amount of memory: 30 TB
    Total peak performance: 263 TFlop/sec
    Disk space: 400 TB for the home file systems
    Operating System: Debian Linux
GPU nodes
    Total number of CPU cores: 492
    Total number of CUDA cores: 376832
    Total number of Tensor cores: 1280
    Total amount of memory: 6.3 TB
    Total peak performance (SP): 1,576.8 TFlop/sec
    Total peak performance (DP): 52.9 TFlop/sec
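The two CPU core totals can be cross-checked against the Lisa node table. This is a sketch that is not part of the original slides; it assumes the "Cores" column gives cores per node, and the cpu/gpu labels are added here only to split the rows into the two groups.

```shell
#!/bin/bash
# Columns: number of nodes, cores per node, node class (label added for this sketch)
nodes='23 12 gpu
2 12 gpu
8 24 gpu
192 16 cpu
96 16 cpu
1 48 cpu
1 48 cpu'

# Multiply node count by cores per node and sum per class
cpu_cores=$(printf '%s\n' "$nodes" | awk '$3 == "cpu" { n += $1 * $2 } END { print n }')
gpu_node_cores=$(printf '%s\n' "$nodes" | awk '$3 == "gpu" { n += $1 * $2 } END { print n }')
echo "CPU nodes: $cpu_cores cores; GPU nodes: $gpu_node_cores CPU cores"
```

With the per-node reading of the table, the sums agree with the 4704 and 492 CPU cores listed for the CPU and GPU nodes.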
File systems on Cartesius and Lisa
/home/user
    User home directory (quota: currently 200 GB)
    Backed up
    Meant for storage of important files (sources, scripts, input and output data)
    Based on NFS: not the fastest file system
/scratch (/scratch-local & /scratch-shared on Cartesius)
    Variable quota depending on disk (currently 8 TB on Cartesius)
    Not backed up
    Meant for temporary storage (while a job runs and shortly thereafter)
    Based on Lustre: the fastest file systems on Cartesius & Lisa
/archive
    Connected to the tape robot (quota: virtually unlimited)
    Backed up
    Meant for long-term storage of files that are zipped, tarred, or otherwise combined into a small number of files
    Slow, especially when retrieving “old” data, and not available on compute nodes
/project
    Large and fast on Cartesius. On Lisa, large but not so fast
    For special projects requiring lots of space (quota: as much as needed/possible)
    Not backed up
    Comparable in speed with /scratch on Cartesius. On Lisa, comparable to /home
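Because /archive is tape-backed, many small files are best combined into one tarball before they are stored there. The following is a minimal sketch that is not from the original slides: it works in a temporary directory, the file names are hypothetical, and the final copy to /archive is only hinted at in a comment.

```shell
#!/bin/bash
# Pack several small result files into one compressed tarball before archiving.
# All paths are hypothetical and live in a temporary directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/results"
for i in 1 2 3; do
    echo "run $i" > "$workdir/results/out_$i.txt"
done

# -c create, -z gzip-compress, -f archive file, -C change directory before packing
tar -czf "$workdir/results.tar.gz" -C "$workdir" results

# List the archive to check its contents before moving it to long-term storage
file_count=$(tar -tzf "$workdir/results.tar.gz" | grep -c 'out_')
echo "archived $file_count files in results.tar.gz"
# On the cluster, the single tarball would then be copied into your own
# directory under /archive (the exact target path depends on your account).
```

Retrieving one large file from tape is typically much faster than retrieving thousands of small ones, which is why the slide recommends combining files first.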
User portal
Connecting to Cartesius
user@local:~$ ssh sdemo000@cartesius.surfsara.nl
sdemo000@cartesius.surfsara.nl's password:
sdemo000@int2:~$ ls
cartesius-file.txt

user@local:~$ ls
local-file.txt
user@local:~$ scp local-file.txt sdemo000@cartesius.surfsara.nl:
user@local:~$ scp sdemo000@cartesius.surfsara.nl:cartesius-file.txt .
user@local:~$ ls
cartesius-file.txt local-file.txt
user@local:~$ ssh sdemo000@cartesius.surfsara.nl
sdemo000@cartesius.surfsara.nl's password:
sdemo000@int2:~$ ls
cartesius-file.txt local-file.txt
When you log in with ssh, you access the login nodes
With scp you can transfer files to/from your local machine
Running jobs: how-to guide
Schedulers distribute work to batch nodes
Workflow:
1. You upload your data from your computer to the cluster system
2. You create a job script with the work steps
3. You submit the job script to the scheduler
4. The scheduler looks for available computers to run your work
5. When a batch node with the requirements you specified becomes available, your work runs
6. When the job finishes, you can get an e-mail to inform you
7. When the job is finished, you download the results to your computer
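The seven steps above can be sketched as a short command sequence. This is a dry run, not the procedure from the original slides: every command is only printed, and the login address, user name and file names are hypothetical placeholders. The `--mail-type` and `--mail-user` lines in the comment are standard sbatch directives for step 6.

```shell
#!/bin/bash
# Dry-run sketch of the workflow: every command is printed, not executed.
# LOGIN, input.dat, job.sh and result.dat are hypothetical placeholders.
LOGIN="user@cluster.example.org"

run() {
    # Print the command that would be run; replace the echo with "$@" to execute it
    echo "+ $*"
}

run scp input.dat job.sh "$LOGIN:"     # steps 1-2: upload the data and the job script
run ssh "$LOGIN" sbatch job.sh         # step 3: submit the job script to the scheduler
run ssh "$LOGIN" squeue -u user        # steps 4-5: watch the job wait and then run
# step 6: an e-mail at job end can be requested inside job.sh with
#   #SBATCH --mail-type=END
#   #SBATCH --mail-user=<your e-mail address>
run scp "$LOGIN:result.dat" .          # step 7: download the results
```

In practice steps 3 to 5 are usually typed in an interactive ssh session on the login node rather than passed as remote commands.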
Running jobs: useful commands
sbatch <jobscript> - submit a job to the scheduler
squeue -j <job_id> - inspect the status of job <job_id>
squeue -u <user_id> - inspect all jobs of user <user_id>
scancel <job_id> - cancel job <job_id> (pending or running)
scontrol show job <job_id> - show job details, including the estimated start time
Running jobs: first example
Create a text file containing exactly the lines of the script that follows; name the file “job.sh”
Submit this job with “sbatch job.sh” and check its status with “squeue -u login_id”
Use “scontrol show job job_id” to find out when your job will run
Look at your home directory to see what happens there
Which files were created? Look at those files.
#!/bin/bash
#SBATCH --job-name="firsttest"
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --time=00:01:00
#SBATCH --partition=normal
echo "Who am I?"
whoami
echo
echo "Where ?"
srun hostname
echo
sleep 120
date
echo "DONE"
Running jobs: best practices
Give the scheduler a realistic walltime estimate
Your home directory is slow. Use $TMPDIR.
Load software modules as part of your job script – this improves reproducibility
Run parallel versions of your programs
Anatomy of a job script
Job scripts consist of:
the “shebang” line: #!/bin/bash
scheduler directives
command(s) that load software modules and set the environment
command(s) to prepare the input
command(s) that run your main task(s)
command(s) to save your output
Module management: useful commands
module avail - available modules in the system
module load <mod> - load <mod> in the shell environment
module list - show a list of all loaded modules
module unload <mod> - remove <mod> from the environment
module purge - unload all modules
module whatis <mod> - show information about <mod>
Example: a real job script
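#!/bin/bash
#SBATCH -t 0:20:00
#SBATCH -N 1 -c 24
# Load the software environment inside the job script
module load 2019
module load Python/2.7.14-foss-2017b
# Copy the input to $TMPDIR and work there
cp -r "$HOME/run3" "$TMPDIR"
cd "$TMPDIR/run3"
# Run the main task
srun python myscript.py input.dat
# Save the output back to the home directory
mkdir -p "$HOME/run3/results"
cp result.dat run3.log "$HOME/run3/results"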
Running jobs: second example
Check the file “python.sh” in your home directory:
linux-cluster-computing/cluster/batch
Submit this job with “sbatch python.sh” and check its status with “squeue -u login_id”
If you needed an input file, or your job generated an output file, where would you put the copy commands for scratch?
Now try the same with “pi.sh”, but first compile the code! ( ./compilepi )
Can you play around with the variable ‘ncores’ and see some parallel efficiency?
#!/bin/bash
#SBATCH --job-name="python"
#SBATCH --nodes=1
#SBATCH --cpus-per-task=10
#SBATCH --time=00:10:00
#SBATCH --partition=normal
module purge
module load 2019
module load GCC
echo "OpenMP parallelism"
for ncores in {1..10}
do
export OMP_NUM_THREADS=$ncores
echo "CPUS: " $OMP_NUM_THREADS
echo "CPUS: " $OMP_NUM_THREADS >&2
./pi
echo "DONE "
done
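To judge parallel efficiency from the run times printed by the loop, compare each run with the single-thread run: speedup S(n) = T(1) / T(n) and efficiency E(n) = S(n) / n. The sketch below is not part of the exercise material, and the timings in it are hypothetical, not measurements from Cartesius or Lisa.

```shell
#!/bin/bash
# Speedup S(n) = T(1) / T(n); parallel efficiency E(n) = S(n) / n.
# Columns: number of threads, wall-clock time in seconds (hypothetical values).
timings='1 8.00
2 4.20
4 2.50
8 2.00'

report=$(printf '%s\n' "$timings" | awk '
    NR == 1 { t1 = $2 }    # the first row is the 1-thread baseline
    { s = t1 / $2
      printf "%d threads: speedup %.2f, efficiency %.0f%%\n", $1, s, 100 * s / $1 }')
echo "$report"
```

An efficiency that drops as the thread count grows, as in these made-up numbers, is the usual sign that serial parts or memory bandwidth start to dominate.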
Everything about jobs: user info pages
Go to:
https://userinfo.surfsara.nl
Click on the corresponding system:
Cartesius: Usage → Batch Usage (jobs)
Lisa: User guide → Creating and running jobs
Guided hands-on with Stopos
Stopos – a tool to perform (large) parameter sweeps
Check https://userinfo.surfsara.nl/systems/lisa/software/stopos
Main commands:
module load stopos – load the Stopos module
stopos create -p pool1 – create a pool called “pool1”
stopos add -p pool1 parmset.txt – add the set in parmset.txt to “pool1”
stopos pools – view the current pools
stopos purge -p pool1 – purge (trash) “pool1”
stopos next -p pool1 – obtain next value of “pool1” in $STOPOS_VALUE
stopos remove -p pool1 – remove used value ($STOPOS_VALUE) of “pool1”
stopos status -p pool1 – show the status of “pool1”
You can follow the instructions in your home directory (~/linux-cluster-computing/cluster/cluster_demo.md)
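A parameter sweep starts from a parameter file such as parmset.txt. As a hedged sketch (not taken from the exercise material), the file can be generated with nested loops, assuming each line of the file is treated as one value of the pool; the parameter names and values below are hypothetical.

```shell
#!/bin/bash
# Generate a parameter file for a sweep: one parameter set per line.
# The parameters (temperature, seed) and their values are hypothetical.
workdir=$(mktemp -d)
parmfile="$workdir/parmset.txt"

for temperature in 280 300 320; do
    for seed in 1 2; do
        echo "$temperature $seed"
    done
done > "$parmfile"

# Count the generated parameter sets (arithmetic expansion strips any padding)
line_count=$(( $(wc -l < "$parmfile") ))
echo "wrote $line_count parameter sets to $parmfile"
# On the cluster, this file would then be added to a Stopos pool.
```

Each job can then fetch one line at a time through $STOPOS_VALUE and split it into its individual parameters.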
Image sources: Wikipedia and Page 81 from PhD thesis of Jeroen Engelberts
THANK YOU FOR YOUR ATTENTION
Carlos Teijeiro Barjas
www.surf.nl
@SURF_onderzoek
Driving innovation together