Training Day: High Performance Computing Cluster
Prerequisite: Linux
Today
● Connect to « genotoul » server
● Basic command-line usage
● Filesystem Hierarchy Standard
● Useful tools (find, sort, cut, grep...)
● Transferring & compressing files
● How to use High Performance Computing Cluster (compute nodes)
Objectives
➔ To optimise computational power
➔ How to submit jobs on compute nodes
➔ How to manage your jobs (stat, kill...)
➔ Autonomy, self-mastery
Schedule for the day
Part I: 09h00 - 12h00
● Compute node environment
● Open Grid Engine
● Practical 1
Part II: 14h00 - 17h00
● Submit an array of jobs
● Practical 2
● Parallel environments
● Practical 3
Connection to « genotoul » cluster
(Diagram: Internet → ssh → « genotoul » login nodes → storage facilities and compute nodes)
Compute nodes:
● node001 to node068: 2720 INTEL cores, 17 TB of memory
● ceri001 to ceri034: 1632 AMD cores, 12 TB of memory
● smp: 240 INTEL cores, 3 TB of memory
● bigmem01: 64 INTEL cores, 1 TB of memory
Connection to genotoul
● Prerequisite: ask for a Linux account at http://bioinfo.genotoul.fr/index.php?id=81
● SSH connection to the login nodes (use PuTTY on a Windows desktop): genotoul.toulouse.inra.fr
● Linux command line (terminal session)
Vocabulary: Cluster / Node
● Cluster: a set of nodes
● Node: one large computer (with several CPUs)
Vocabulary: CPU / Core
● CPU: Central Processing Unit
● Core: an independent processing unit inside a CPU (1 dual-core CPU = 2 cores)
Login nodes: alias « genotoul »
● Each server = 32 INTEL cores, 128 GB of memory
● Linux 64-bit, based on the CentOS-6 distribution
● Hundreds of simultaneous users
● Secured (SSH only), backed up daily
● FUNCTIONS:
➔ To provide development environments
➔ To test your scripts before data analysis
➔ To launch batches on the cluster nodes
➔ To follow the execution of jobs
➔ To retrieve result data from the /save directory
Login nodes: alias « genotoul »
● Environment dedicated to bioinformatics
➔ Software in /usr/local/bioinfo/src (e.g. blastall, clustalw, iprscan, megablast, wu-blast, ...)
➔ Genomics databanks in /bank
● Development languages
➔ Shell, perl, C++, java, python...
● Editing tools
➔ nedit, geany, nano, emacs, vi, ...
Access to cluster nodes
● Interactive mode: for beginners / for remote display
● Batch access: for intensive usage (most jobs)
● Communication between the server and the compute nodes is managed by the grid scheduler. No direct SSH access to the nodes.
Data storage
Drive bay
Disk spaces
/usr/local/bioinfo/   Bioinformatics software
/bank/                International genomics databanks
/home/                User configuration files ONLY (100 MB user quota)
/save/                User disk space, with BACKUP (250 GB user quota)
/work/                HPC TEMPORARY disk space (1 TB user quota)
HPC environment
High Performance Computing
● The workspace is exactly the same as on the genotoul servers (software, databanks, disk spaces).
● Exception: permission rights on the disk spaces (read only on the /save directory).
● Tips:
➔ Submission and control from genotoul
➔ Portable binaries (no need to recompile)
➔ Facilities to retrieve results
Cluster nodes
High Performance Computing cluster:
● node001 to node068 (INTEL)
● ceri001 to ceri034 (AMD)
● smp
● bigmem01
Cluster nodes
● INTEL cluster: 68 nodes purchased in 2014 => each 20 cores (40 threads), 256 GB memory
● AMD cluster: 34 nodes purchased in 2012 => each 48 cores (48 threads), 384 GB memory
● BIGMEM: 1 node purchased in 2012 => 32 cores (64 threads), 1 TB memory
● SMP: 1 node purchased in 2014 => 120 cores (240 threads), 3 TB memory
● High-performance clustered file system (GPFS): /work
Schedule for the day
Part I: 09h00 - 12h00
● Compute node environment
● Open Grid Engine
● Practical 1
Part II: 14h00 - 17h00
● Submit an array of jobs
● Practical 2
● Parallel environments
● Practical 3
OGE (Open Grid Engine)
Grid Engine is responsible for accepting, scheduling, dispatching, and managing the remote and distributed execution of large numbers of standalone, parallel, or interactive user jobs.
It also manages and schedules the allocation of distributed resources such as processors and memory.
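For orientation, the basic life cycle of a batch job looks like this; a minimal sketch (the script name and its contents are illustrative, not from the slides):

$ cat > myscript.sh <<'EOF'
#!/bin/bash
#$ -q workq              # target queue (the default)
echo "running on $(hostname)"
EOF
$ qsub myscript.sh       # hand the script to the scheduler
$ qstat                  # watch its state (qw = queued, r = running)
$ qdel <job-ID>          # cancel it if needed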
OGE (Open Grid Engine)
Queues available for users:

Queue             Access      Priority  Max time   Max slots
workq (default)   everyone    300       96h        4120
unlimitq          everyone    100       unlimited  680
smpq              on demand   0         unlimited  240
hypermemq         on demand   0         unlimited  96
interq (qlogin)   everyone    100       48h        40
OGE (Open Grid Engine)
Resource quota limitations (they depend on your genotoul Linux group: contributeurs, INRA and/or REGION, autres):

Max slots       workq (group)  workq (user)  unlimitq (group)  unlimitq (user)
contributeurs   4120           1024          680               256
INRA / REGION   3264           512           128               48
autres          1088           256           32                8
OGE (Open Grid Engine)
Default parameters:
● workq
● 1 core
● 8 GB memory maximum
● Write access only to the /work directory (temporary disk space)
● 1 TB disk quota per user (on the /work directory)
● Files not accessed for 120 days are automatically purged
● 100,000 hours of computing time annually (more on demand)
OGE (Open Grid Engine)
qrsh (interactive mode)
qlogin (interactive mode with graphical redirection)

Connected:
[laborie@genotoul2 ~]$ qlogin
Your job 2470388 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 2470388 has been successfully scheduled.
Establishing /SGE/ogs/inra/tools/qlogin_wrapper.sh session to host node001 ...
[laborie@node001 ~]$

Disconnected:
[laborie@node001 ~]$ exit
logout
/SGE/ogs/inra/tools/qlogin_wrapper.sh exited with exit code 0
[laborie@genotoul2 ~]$
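Besides opening a full interactive session, qrsh can also run a single command on a compute node and return; a small sketch (the command shown is illustrative):

$ qrsh                   # interactive shell on a free compute node
$ qrsh hostname          # or run one command there and get the output back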
OGE (Open Grid Engine)
qsub: batch submission
1 - First write a script (e.g. myscript.sh) containing the command lines, as follows:

#$ -o /work/.../output.txt
#$ -e /work/.../error.txt
#$ -q workq
#$ -m bea
# My command lines I want to run on the cluster
blastall -d swissprot -p blastx -i /save/.../z72882.fa

2 - Then submit the job with the qsub command, as follows (the number in the reply is the job ID):

$ qsub myscript.sh
Your job 15660 ("myscript.sh") has been submitted
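The same options can also be passed on the qsub command line, where (in standard Grid Engine behaviour) they override the #$ directives embedded in the script; a sketch with illustrative values:

$ qsub -q unlimitq -o /work/.../other_output.txt myscript.sh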
OGE (Open Grid Engine)
Job submission: basic options
● -N job_name : give a name to the job
● -q queue_name : specify the batch queue
● -o output_file_name : redirect the standard output
● -e error_file_name : redirect the error output
● -m bea : mail sending options (b: begin, a: abort, e: end)
● -l mem=8G : ask for 8 GB of memory (minimum reservation)
● -l h_vmem=10G : set the maximum memory consumption
● -l myarch=intel / amd : choose the processor architecture (INTEL or AMD nodes)
A combined example follows below.
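Several of these options are typically combined in one submission; a sketch with illustrative names and values:

$ qsub -N my_blast -q workq -o /work/.../blast.out -e /work/.../blast.err -m bea -l mem=16G -l h_vmem=20G myscript.sh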
OGE (Open Grid Engine)
Job submission: some examples
● Default (workq, 1 core, 8 GB memory max)

$ qsub myscript.sh
Your job 15660 ("myscript.sh") has been submitted

● More memory (workq, 1 core, 32 / 36 GB memory)

$ qsub -l mem=32G -l h_vmem=36G myscript.sh
Your job 15661 ("myscript.sh") has been submitted

● More cores (workq, 8 cores, 8*8 GB memory)

$ qsub -pe parallel_smp 8 myscript.sh
Your job 15662 ("myscript.sh") has been submitted
OGE (Open Grid Engine)
Job submission: some examples
Script edition:

$ nedit myscript.sh

### head of myscript.sh ###
#!/bin/bash
#$ -m a
#$ -l mem=32G
#$ -l h_vmem=36G
# My program starts here
ls
### end of myscript.sh ###

Submission:

$ qsub myscript.sh
Your job 15660 ("myscript.sh") has been submitted
OGE (Open Grid Engine)
Monitoring jobs: qstat

$ qstat
job-ID prior name user state submit/start queue slots ja-task-ID

● job-ID : job identifier
● prior : priority of the job
● name : job name
● user : user name
● state : current state of the job (see below)
● submit/start at : submit/start date
● queue : batch queue name
● slots : number of slots requested for the job
● ja-task-ID : job array task identifier (see below)
OGE (Open Grid Engine)
Monitoring jobs: qstat
● state : current state of the job
➢ d(eletion) : job is being deleted
➢ E(rror) : job is in error state
➢ h(old), w(aiting) : job is pending
➢ t(ransferring) : job is about to be executed
➢ r(unning) : job is running
● man qstat : see all options of the qstat command
OGE (Open Grid Engine)
qstat -f : full format display

$ qstat -f
queuename            qtype  resv/used/tot.  load_avg  arch       states
---------------------------------------------------------------------------------
hypermemq@bigmem01   BIP    0/25/64         25.21     linux-x64
   2654562 502.47578 scriptIMR.  pbert        r  02/01/2015 10:43:21  24
   3417296 510.00000 spades.sh   klopp        r  02/23/2015 09:50:08   1
---------------------------------------------------------------------------------
hypermemq@bigmem02   BIP    0/3/32          2.00      linux-x64
   2717127 500.10764 bayesian_m  lbrousseau   r  02/03/2015 20:28:58   2
   2822735 505.00000 LasMap      faraut       r  02/11/2015 14:29:35   1
---------------------------------------------------------------------------------
interq@node001       IP     0/13/40         2.12      linux-x64
   3455759 501.10143 QLOGIN      mmolettadena r  02/23/2015 15:21:13   1
   3456700 501.10143 QLOGIN      mmolettadena r  02/23/2015 15:33:25   1
   3456911 506.13893 QLOGIN      smehdi       r  02/23/2015 15:36:48   1
OGE (Open Grid Engine)
Deleting a job: qdel

$ qstat -u laborie
job-ID prior name user state submit/start at queue slots ja-task-ID
------------------------------------------------------------------------------------------------------
3629151 512.54885 sleep laborie r 02/25/2015 16:23:03 workq@node002 1

$ qdel 3629151
laborie has registered the job 3629151 for deletion
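qdel also accepts a user filter, which is handy for cleaning up many jobs at once; a small sketch (standard Grid Engine option, user name taken from the example above):

$ qdel -u laborie        # registers every job of user laborie for deletion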
Connection to « genotoul » cluster
(Diagram: Internet → ssh → « genotoul » login nodes → storage facilities and compute nodes)
« genotoul » login nodes:
● Access to the platform
● Development (scripts)
● Job submission to the cluster: qsub, qstat, qdel (batch), qrsh, qlogin (interactive)
● File transfer to /save
Storage facilities: /save (read only), /work (read + write)
Compute nodes (workq, hypermemq, smpq):
● node001 to node068: 2720 INTEL cores, 17 TB of memory
● ceri001 to ceri034: 1632 AMD cores, 12 TB of memory
● smp: 240 INTEL cores, 3 TB of memory
● bigmem: 64 INTEL cores, 1 TB of memory
Monitoring genotoul cluster
Practical
Part 1
Schedule for the day
Part I: 09h00 - 12h00
● Compute node environment
● Open Grid Engine
● Practical 1
Part II: 14h00 - 17h00
● Submit an array of jobs
● Practical 2
● Parallel environments
● Practical 3
Array of jobs: concept
➔ Concept: segment a job into smaller atomic jobs
➔ Improves processing time very significantly (the computation is performed on multiple processing cores)
Execution on a single core
Ex.1: blast in basic mode (GenBank nucleotide sequence reference)
Input: NTseqs.fa (multi-fasta file)

$ qsub script.sh

script.sh contains:
blastn+ -db nt -query seqs.fa

Execution on 3 cores
Ex.2: blast in split mode
seqs.fa is split into seq1.fa, seq2.fa, seq3.fa

$ qsub script1.sh
$ qsub script2.sh
$ qsub script3.sh

script1.sh contains: blastn+ -db nt -query seq1.fa
script2.sh contains: blastn+ -db nt -query seq2.fa
script3.sh contains: blastn+ -db nt -query seq3.fa

Execution on 3 cores
Ex.3: blast in job array mode
seqs.fa is split (split ...) into seq1.fa, seq2.fa, seq3.fa; a loop (for i in ...) then writes one blast command per chunk into script.sh:
blastx+ -d nt -i seq1.fa
blastx+ -d nt -i seq2.fa
blastx+ -d nt -i seq3.fa

$ qarray script.sh
Ex.3: blast in job array mode
$ qarray script.sh (script.sh holds the 3 blast lines)
is equivalent to:
$ qsub script1.sh
$ qsub script2.sh
$ qsub script3.sh
(one script per blast line)
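In practice the submission looks like this; a sketch assuming qarray (the platform's array submission command) turns each line of the file into one task of a single array job:

$ cat script.sh
blastx+ -d nt -i seq1.fa
blastx+ -d nt -i seq2.fa
blastx+ -d nt -i seq3.fa
$ qarray script.sh

Each task is then scheduled independently on the cluster, so the three blasts can run at the same time on three cores.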
Tools
Split a fasta file: fastasplit

fastasplit <path> <dirpath>
Sequence Input Options:
-f fasta [mandatory] <*** not set ***>
-o output [mandatory] <*** not set ***>
-c chunk [2]

Example:
$ mkdir out_split
$ fastasplit -f seqs.fa -o out_split -c 6
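After the split, out_split should contain 6 chunk files; the listing below is an assumption about fastasplit's naming scheme, so check it with ls:

$ ls out_split
seqs.fa_chunk_0000000  seqs.fa_chunk_0000001  ...  seqs.fa_chunk_0000005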
Tools
Create a multi-command file:
1 rm script.sh
2 for f in `ls out_split/*`
3 > do
4 > echo blastn+ -query $f -db ensembl_danio_rerio -o $f.blast >> script.sh
5 > done

(1) If you execute the 'for' loop a second time, you MUST DELETE script.sh first, since '>>' appends lines to the file if it already exists.
➢ ` (2) : backtick, the character on the '7' key (on a French keyboard)
➢ for (2) : $f loops over the result of the command between ` ... `, i.e. the output of the split
➢ do (3) : syntactically required
➢ echo (4) : prints to the screen
➢ >> (4) : redirects the screen output to the file script.sh
➢ done (5) : syntactically required
The full pipeline is shown in the sketch below.
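Putting the whole job array pipeline together, as a sketch (the databank name and chunk count are illustrative):

$ mkdir out_split
$ fastasplit -f seqs.fa -o out_split -c 6      # split the fasta into 6 chunks
$ rm -f script.sh                              # -f: no error if the file is absent
$ for f in `ls out_split/*`
> do
>   echo blastn+ -query $f -db ensembl_danio_rerio -o $f.blast >> script.sh
> done
$ qarray script.sh                             # one array task per command line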
Practical
Part 2
Schedule for the day
Part I: 09h00 - 12h00
● Compute node environment
● Open Grid Engine
● Practical 1
Part II: 14h00 - 17h00
● Submit an array of jobs
● Practical 2
● Parallel environments
● Practical 3
OGE (Open Grid Engine)
Previous use of the cluster: 1 job = 1 thread (one core)

$ qarray script.sh

script.sh contains:
blastx+ -d nt -i seq1.fa
blastx+ -d nt -i seq2.fa

Each blast uses 1 core (blast1 and blast2 run as separate tasks).
OGE (Open Grid Engine)
Parallel environments
If the program was developed for it, 1 job can use multiple threads:

$ qsub -pe parallel_smp 2 script.sh

script.sh contains:
blastx+ -num_threads 2 -d nt -i seqs.fa

Each blast uses 2 cores.
OGE (Open Grid Engine)
Parallel environments
Visualisation:
● qconf -spl : list the available parallel environments
● qconf -sp <parallel_env> : show the configuration of one environment
Usage: qsub -pe <parallel_env> <n slots> myscript.sh
● smp : X cores on the same node (multi-thread, OpenMP; see the script sketch below)
● parallel_fill : fill up one node, then use other nodes (MPI)
● parallel_rr : X cores on strictly different nodes (MPI)
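For the smp environment, the slot count can be requested inside the script and forwarded to the program through $NSLOTS, the standard Grid Engine variable holding the number of slots granted; a sketch with illustrative file names:

### head of myscript.sh ###
#!/bin/bash
#$ -pe parallel_smp 4
#$ -l mem=8G
# give the program as many threads as slots were reserved
blastx+ -num_threads $NSLOTS -d nt -i seqs.fa
### end of myscript.sh ###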
OGE (Open Grid Engine)
Parallel environments: smp
Shared memory within a single node.
Requires an optimized program (e.g. for blast, do not use more than 8 threads).
OGE (Open Grid Engine)
Parallel environments: rr / fill
Only for MPI (Message Passing Interface) programs.
Read the software's manual before using it.
Not optimized for blast!
OGE (Open Grid Engine)
Parallel environments
Examples:
qsub -hard -l myarch=intel ... myscript.sh (run on INTEL nodes)
qsub -soft -l myarch=intel ... myscript.sh (INTEL nodes only if they are free)
qsub -pe parallel_fill 32 -soft -l myarch=intel ... myscript.sh
qsub -pe parallel_smp N -hard -l myarch=intel ... myscript.sh

Why does this job stay waiting in the queue?
qsub -q workq -pe parallel_smp 20 -l mem=12G ... myscript.sh
(Hint: 20 slots with 12 GB reserved each means 240 GB on a single node, which very few workq nodes can provide.)
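To investigate a pending job, qstat can show its full requests and, when the scheduler is configured to report it, the reasons it cannot yet be placed; a small sketch (job ID illustrative):

$ qstat -j 3629151       # per-job details, requested resources, scheduling messages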
OGE (Open Grid Engine)
qstat -r : resource requirements

$ qstat -r
3193243 516.61063 tneg_V1_UC aghozlane qw 02/19/2015 12:16:10
    Full jobname:     tneg_V1_UC35_0_GL0032312
    Requested PE:     parallel_rr 8
    Hard Resources:   h_stack=256M (0.000000)
                      h_vmem=50G (0.000000)
                      memoire=50G (0.000000)
                      pri_work=true (2400.000000)
OGE (Open Grid Engine)
qstat -t : sub-tasks (parallel jobs)

$ qstat -t
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node012 MASTER
                                                             workq@node012 SLAVE
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node014 SLAVE
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node015 SLAVE
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node016 SLAVE
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node017 SLAVE
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node018 SLAVE
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node019 SLAVE
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node020 SLAVE
Practical
Part 3