Training Day: High Performance Computing Cluster
Prerequisite: Linux
Today
● Connect to « genotoul » server
● Basic command-line usage
● Filesystem Hierarchy Standard
● Useful tools (find, sort, cut, grep...)
● Transferring & compressing files
● How to use High Performance Computing Cluster (compute nodes)
Objectives
➔ To optimise computational power
➔ How to submit jobs on compute nodes
➔ How to manage your jobs (stat, kill...)
➔ Autonomy, self-mastery
Schedule for the day
Part I: 09h00 - 12h00
● Compute node environment
● Open Grid Engine
● Practical 1
Part II: 14h00 - 17h00
● Submit an array of jobs
● Practical 2
● Parallel environments
● Practical 3
Connection to « genotoul » cluster
(Diagram: Internet → ssh → « genotoul » login nodes → storage facilities and compute nodes)
Compute nodes:
● node001 to node068: 2720 INTEL cores, 17 TB of memory
● ceri001 to ceri034: 1632 AMD cores, 12 TB of memory
● smp: 240 INTEL cores, 3 TB of memory
● bigmem01: 64 INTEL cores, 1 TB of memory
Connection to genotoul
● Prerequisite: ask for a Linux account at http://bioinfo.genotoul.fr/index.php?id=81
● SSH connection to the login nodes (use PuTTY on a Windows desktop): genotoul.toulouse.inra.fr
● Linux command line (terminal session)
Vocabulary: Cluster / Node
● Cluster: a set of nodes
● Node: one large computer (with several CPUs)
Vocabulary: CPU / Core
● CPU: Central Processing Unit
● Core: an independent processing unit inside a CPU (1 dual-core CPU = 2 cores)
Login nodes: alias « genotoul »
● Each server = 32 INTEL cores, 128 GB of memory
● Linux 64-bit, based on the CentOS-6 distribution
● Hundreds of simultaneous users
● Secured (SSH only), backed up daily
● FUNCTIONS:
➔ To provide development environments
➔ To test your scripts before data analysis
➔ To launch batches on the cluster nodes
➔ To follow the execution of jobs
➔ To retrieve result data from the /save directory
Login nodes: alias « genotoul »
● Environment dedicated to bioinformatics
➔ Software in /usr/local/bioinfo/src (e.g. blastall, clustalw, iprscan, megablast, wu-blast, ...)
➔ Genomics databanks in /bank
● Development languages
➔ Shell, perl, C++, java, python...
● Editing tools
➔ nedit, geany, nano, emacs, vi, ...
Access to cluster nodes
● Interactive mode: for beginners / for remote display
● Batch access: for intensive usage (most jobs)
● Communication between the server and the compute nodes is managed by the grid scheduler. No direct SSH access to the nodes.
Data storage
Drive bay
Disk spaces
/usr/local/bioinfo/   Bioinformatics software
/bank/                International genomics databanks
/home/                User configuration files ONLY (100 MB user quota)
/save/                User disk space, with BACKUP (250 GB user quota)
/work/                HPC TEMPORARY disk space (1 TB user quota)
HPC environment
High Performance Computing
● The workspace is exactly the same as on the genotoul servers (software, databanks, disk spaces).
● Exception: permission rights on the disk spaces (read only on the /save directory).
● Tips:
➔ Submission and control from genotoul
➔ Portable binaries (no need to recompile)
➔ Facilities to retrieve results
Cluster nodes
High Performance Computing cluster:
● node001 to node068 (INTEL)
● ceri001 to ceri034 (AMD)
● smp
● bigmem01
Cluster nodes
● INTEL cluster: 68 nodes purchased in 2014 => each 20 cores (40 threads), 256 GB memory
● AMD cluster: 34 nodes purchased in 2012 => each 48 cores (48 threads), 384 GB memory
● BIGMEM: 1 node purchased in 2012 => 32 cores (64 threads), 1 TB memory
● SMP: 1 node purchased in 2014 => 120 cores (240 threads), 3 TB memory
● High-performance clustered file system (GPFS): /work
Schedule for the day
Part I: 09h00 - 12h00
● Compute node environment
● Open Grid Engine
● Practical 1
Part II: 14h00 - 17h00
● Submit an array of jobs
● Practical 2
● Parallel environments
● Practical 3
OGE (Open Grid Engine)
Grid Engine is responsible for accepting, scheduling, dispatching, and managing the remote and distributed execution of large numbers of standalone, parallel, or interactive user jobs.
It also manages and schedules the allocation of distributed resources such as processors and memory.
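For orientation, the basic life cycle of a batch job looks like this; a minimal sketch (the script name and its contents are illustrative, not from the slides):

$ cat > myscript.sh <<'EOF'
#!/bin/bash
#$ -q workq              # target queue (the default)
echo "running on $(hostname)"
EOF
$ qsub myscript.sh       # hand the script to the scheduler
$ qstat                  # watch its state (qw = queued, r = running)
$ qdel <job-ID>          # cancel it if needed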
OGE (Open Grid Engine)
Queues available for users:

Queue             Access      Priority  Max time   Max slots
workq (default)   everyone    300       96h        4120
unlimitq          everyone    100       unlimited  680
smpq              on demand   0         unlimited  240
hypermemq         on demand   0         unlimited  96
interq (qlogin)   everyone    100       48h        40
OGE (Open Grid Engine)
Resource quota limitations (they depend on your genotoul Linux group: contributeurs, INRA and/or REGION, autres):

Max slots       workq (group)  workq (user)  unlimitq (group)  unlimitq (user)
contributeurs   4120           1024          680               256
INRA / REGION   3264           512           128               48
autres          1088           256           32                8
OGE (Open Grid Engine)
Default parameters:
● workq
● 1 core
● 8 GB memory maximum
● Write access only to the /work directory (temporary disk space)
● 1 TB disk quota per user (on the /work directory)
● Files not accessed for 120 days are automatically purged
● 100,000 hours of computing time annually (more on demand)
OGE (Open Grid Engine)
qrsh (interactive mode)
qlogin (interactive mode with graphical redirection)

Connected:
[laborie@genotoul2 ~]$ qlogin
Your job 2470388 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 2470388 has been successfully scheduled.
Establishing /SGE/ogs/inra/tools/qlogin_wrapper.sh session to host node001 ...
[laborie@node001 ~]$

Disconnected:
[laborie@node001 ~]$ exit
logout
/SGE/ogs/inra/tools/qlogin_wrapper.sh exited with exit code 0
[laborie@genotoul2 ~]$
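Besides opening a full interactive session, qrsh can also run a single command on a compute node and return; a small sketch (the command shown is illustrative):

$ qrsh                   # interactive shell on a free compute node
$ qrsh hostname          # or run one command there and get the output back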
OGE (Open Grid Engine)
qsub: batch submission
1 - First write a script (e.g. myscript.sh) containing the command lines, as follows:

#$ -o /work/.../output.txt
#$ -e /work/.../error.txt
#$ -q workq
#$ -m bea
# My command lines I want to run on the cluster
blastall -d swissprot -p blastx -i /save/.../z72882.fa

2 - Then submit the job with the qsub command, as follows (the number in the reply is the job ID):

$ qsub myscript.sh
Your job 15660 ("myscript.sh") has been submitted
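The same options can also be passed on the qsub command line, where (in standard Grid Engine behaviour) they override the #$ directives embedded in the script; a sketch with illustrative values:

$ qsub -q unlimitq -o /work/.../other_output.txt myscript.sh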
OGE (Open Grid Engine)
Job submission: basic options
● -N job_name : give a name to the job
● -q queue_name : specify the batch queue
● -o output_file_name : redirect the standard output
● -e error_file_name : redirect the error output
● -m bea : mail sending options (b: begin, a: abort, e: end)
● -l mem=8G : ask for 8 GB of memory (minimum reservation)
● -l h_vmem=10G : set the maximum memory consumption
● -l myarch=intel / amd : choose the processor architecture (INTEL or AMD nodes)
A combined example follows below.
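Several of these options are typically combined in one submission; a sketch with illustrative names and values:

$ qsub -N my_blast -q workq -o /work/.../blast.out -e /work/.../blast.err -m bea -l mem=16G -l h_vmem=20G myscript.sh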
OGE (Open Grid Engine)
Job submission: some examples
● Default (workq, 1 core, 8 GB memory max)

$ qsub myscript.sh
Your job 15660 ("myscript.sh") has been submitted

● More memory (workq, 1 core, 32 / 36 GB memory)

$ qsub -l mem=32G -l h_vmem=36G myscript.sh
Your job 15661 ("myscript.sh") has been submitted

● More cores (workq, 8 cores, 8*8 GB memory)

$ qsub -pe parallel_smp 8 myscript.sh
Your job 15662 ("myscript.sh") has been submitted
OGE (Open Grid Engine)
Job submission: some examples
Script edition:

$ nedit myscript.sh

### head of myscript.sh ###
#!/bin/bash
#$ -m a
#$ -l mem=32G
#$ -l h_vmem=36G
# My program starts here
ls
### end of myscript.sh ###

Submission:

$ qsub myscript.sh
Your job 15660 ("myscript.sh") has been submitted
OGE (Open Grid Engine)
Monitoring jobs: qstat

$ qstat
job-ID prior name user state submit/start queue slots ja-task-ID

● job-ID : job identifier
● prior : priority of the job
● name : job name
● user : user name
● state : current state of the job (see below)
● submit/start at : submit/start date
● queue : batch queue name
● slots : number of slots requested for the job
● ja-task-ID : job array task identifier (see below)
OGE (Open Grid Engine)
Monitoring jobs: qstat
● state : current state of the job
➢ d(eletion) : job is being deleted
➢ E(rror) : job is in error state
➢ h(old), w(aiting) : job is pending
➢ t(ransferring) : job is about to be executed
➢ r(unning) : job is running
● man qstat : see all options of the qstat command
OGE (Open Grid Engine)
qstat -f : full format display

$ qstat -f
queuename            qtype  resv/used/tot.  load_avg  arch       states
---------------------------------------------------------------------------------
hypermemq@bigmem01   BIP    0/25/64         25.21     linux-x64
   2654562 502.47578 scriptIMR.  pbert        r  02/01/2015 10:43:21  24
   3417296 510.00000 spades.sh   klopp        r  02/23/2015 09:50:08   1
---------------------------------------------------------------------------------
hypermemq@bigmem02   BIP    0/3/32          2.00      linux-x64
   2717127 500.10764 bayesian_m  lbrousseau   r  02/03/2015 20:28:58   2
   2822735 505.00000 LasMap      faraut       r  02/11/2015 14:29:35   1
---------------------------------------------------------------------------------
interq@node001       IP     0/13/40         2.12      linux-x64
   3455759 501.10143 QLOGIN      mmolettadena r  02/23/2015 15:21:13   1
   3456700 501.10143 QLOGIN      mmolettadena r  02/23/2015 15:33:25   1
   3456911 506.13893 QLOGIN      smehdi       r  02/23/2015 15:36:48   1
OGE (Open Grid Engine)
Deleting a job: qdel

$ qstat -u laborie
job-ID prior name user state submit/start at queue slots ja-task-ID
------------------------------------------------------------------------------------------------------
3629151 512.54885 sleep laborie r 02/25/2015 16:23:03 workq@node002 1

$ qdel 3629151
laborie has registered the job 3629151 for deletion
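qdel also accepts a user filter, which is handy for cleaning up many jobs at once; a small sketch (standard Grid Engine option, user name taken from the example above):

$ qdel -u laborie        # registers every job of user laborie for deletion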
Connection to « genotoul » cluster
(Diagram: Internet → ssh → « genotoul » login nodes → storage facilities and compute nodes)
« genotoul » login nodes:
● Access to the platform
● Development (scripts)
● Job submission to the cluster: qsub, qstat, qdel (batch), qrsh, qlogin (interactive)
● File transfer to /save
Storage facilities: /save (read only), /work (read + write)
Compute nodes (workq, hypermemq, smpq):
● node001 to node068: 2720 INTEL cores, 17 TB of memory
● ceri001 to ceri034: 1632 AMD cores, 12 TB of memory
● smp: 240 INTEL cores, 3 TB of memory
● bigmem: 64 INTEL cores, 1 TB of memory
Monitoring genotoul cluster
Practical
Part 1
Schedule for the day
Part I: 09h00 - 12h00
● Compute node environment
● Open Grid Engine
● Practical 1
Part II: 14h00 - 17h00
● Submit an array of jobs
● Practical 2
● Parallel environments
● Practical 3
Array of jobs: concept
➔ Concept: segment a job into smaller atomic jobs
➔ Improves processing time very significantly (the computation is performed on multiple processing cores)
Execution on a single core
Ex.1: blast in basic mode (GenBank nucleotide sequence reference)
Input: NTseqs.fa (multi-fasta file)

$ qsub script.sh

script.sh contains:
blastn+ -db nt -query seqs.fa

Execution on 3 cores
Ex.2: blast in split mode
seqs.fa is split into seq1.fa, seq2.fa, seq3.fa

$ qsub script1.sh
$ qsub script2.sh
$ qsub script3.sh

script1.sh contains: blastn+ -db nt -query seq1.fa
script2.sh contains: blastn+ -db nt -query seq2.fa
script3.sh contains: blastn+ -db nt -query seq3.fa

Execution on 3 cores
Ex.3: blast in job array mode
seqs.fa is split (split ...) into seq1.fa, seq2.fa, seq3.fa; a loop (for i in ...) then writes one blast command per chunk into script.sh:
blastx+ -d nt -i seq1.fa
blastx+ -d nt -i seq2.fa
blastx+ -d nt -i seq3.fa

$ qarray script.sh
Ex.3: blast in job array mode
$ qarray script.sh (script.sh holds the 3 blast lines)
is equivalent to:
$ qsub script1.sh
$ qsub script2.sh
$ qsub script3.sh
(one script per blast line)
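In practice the submission looks like this; a sketch assuming qarray (the platform's array submission command) turns each line of the file into one task of a single array job:

$ cat script.sh
blastx+ -d nt -i seq1.fa
blastx+ -d nt -i seq2.fa
blastx+ -d nt -i seq3.fa
$ qarray script.sh

Each task is then scheduled independently on the cluster, so the three blasts can run at the same time on three cores.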
Tools
Split a fasta file: fastasplit

fastasplit <path> <dirpath>
Sequence Input Options:
-f fasta [mandatory] <*** not set ***>
-o output [mandatory] <*** not set ***>
-c chunk [2]

Example:
$ mkdir out_split
$ fastasplit -f seqs.fa -o out_split -c 6
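After the split, out_split should contain 6 chunk files; the listing below is an assumption about fastasplit's naming scheme, so check it with ls:

$ ls out_split
seqs.fa_chunk_0000000  seqs.fa_chunk_0000001  ...  seqs.fa_chunk_0000005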
Tools
Create a multi-command file:
1 rm script.sh
2 for f in `ls out_split/*`
3 > do
4 > echo blastn+ -query $f -db ensembl_danio_rerio -o $f.blast >> script.sh
5 > done

(1) If you execute the 'for' loop a second time, you MUST DELETE script.sh first, since '>>' appends lines to the file if it already exists.
➢ ` (2) : backtick, the character on the '7' key (on a French keyboard)
➢ for (2) : $f loops over the result of the command between ` ... `, i.e. the output of the split
➢ do (3) : syntactically required
➢ echo (4) : prints to the screen
➢ >> (4) : redirects the screen output to the file script.sh
➢ done (5) : syntactically required
The full pipeline is shown in the sketch below.
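Putting the whole job array pipeline together, as a sketch (the databank name and chunk count are illustrative):

$ mkdir out_split
$ fastasplit -f seqs.fa -o out_split -c 6      # split the fasta into 6 chunks
$ rm -f script.sh                              # -f: no error if the file is absent
$ for f in `ls out_split/*`
> do
>   echo blastn+ -query $f -db ensembl_danio_rerio -o $f.blast >> script.sh
> done
$ qarray script.sh                             # one array task per command line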
Practical
Part 2
Schedule for the day
Part I: 09h00 - 12h00
● Compute node environment
● Open Grid Engine
● Practical 1
Part II: 14h00 - 17h00
● Submit an array of jobs
● Practical 2
● Parallel environments
● Practical 3
OGE (Open Grid Engine)
Previous use of the cluster: 1 job = 1 thread (one core)

$ qarray script.sh

script.sh contains:
blastx+ -d nt -i seq1.fa
blastx+ -d nt -i seq2.fa

Each blast uses 1 core (blast1 and blast2 run as separate tasks).
OGE (Open Grid Engine)
Parallel environments
If the program was developed for it, 1 job can use multiple threads:

$ qsub -pe parallel_smp 2 script.sh

script.sh contains:
blastx+ -num_threads 2 -d nt -i seqs.fa

Each blast uses 2 cores.
OGE (Open Grid Engine)
Parallel environments
Visualisation:
● qconf -spl : list the available parallel environments
● qconf -sp <parallel_env> : show the configuration of one environment
Usage: qsub -pe <parallel_env> <n slots> myscript.sh
● smp : X cores on the same node (multi-thread, OpenMP; see the script sketch below)
● parallel_fill : fill up one node, then use other nodes (MPI)
● parallel_rr : X cores on strictly different nodes (MPI)
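For the smp environment, the slot count can be requested inside the script and forwarded to the program through $NSLOTS, the standard Grid Engine variable holding the number of slots granted; a sketch with illustrative file names:

### head of myscript.sh ###
#!/bin/bash
#$ -pe parallel_smp 4
#$ -l mem=8G
# give the program as many threads as slots were reserved
blastx+ -num_threads $NSLOTS -d nt -i seqs.fa
### end of myscript.sh ###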
OGE (Open Grid Engine)
Parallel environments: smp
Shared memory within a single node.
Requires an optimized program (e.g. for blast, do not use more than 8 threads).
OGE (Open Grid Engine)
Parallel environments: rr / fill
Only for MPI (Message Passing Interface) programs.
Read the software's manual before using it.
Not optimized for blast!
OGE (Open Grid Engine)
Parallel environments
Examples:
qsub -hard -l myarch=intel ... myscript.sh (run on INTEL nodes)
qsub -soft -l myarch=intel ... myscript.sh (INTEL nodes only if they are free)
qsub -pe parallel_fill 32 -soft -l myarch=intel ... myscript.sh
qsub -pe parallel_smp N -hard -l myarch=intel ... myscript.sh

Why does this job stay waiting in the queue?
qsub -q workq -pe parallel_smp 20 -l mem=12G ... myscript.sh
(Hint: 20 slots with 12 GB reserved each means 240 GB on a single node, which very few workq nodes can provide.)
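To investigate a pending job, qstat can show its full requests and, when the scheduler is configured to report it, the reasons it cannot yet be placed; a small sketch (job ID illustrative):

$ qstat -j 3629151       # per-job details, requested resources, scheduling messages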
OGE (Open Grid Engine)
qstat -r : resource requirements

$ qstat -r
3193243 516.61063 tneg_V1_UC aghozlane qw 02/19/2015 12:16:10
    Full jobname:     tneg_V1_UC35_0_GL0032312
    Requested PE:     parallel_rr 8
    Hard Resources:   h_stack=256M (0.000000)
                      h_vmem=50G (0.000000)
                      memoire=50G (0.000000)
                      pri_work=true (2400.000000)
OGE (Open Grid Engine)
qstat -t : sub-tasks (parallel jobs)

$ qstat -t
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node012 MASTER
                                                             workq@node012 SLAVE
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node014 SLAVE
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node015 SLAVE
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node016 SLAVE
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node017 SLAVE
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node018 SLAVE
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node019 SLAVE
3191467 516.61063 tneg_MH034 aghozlane r 02/25/2015 09:02:18 workq@node020 SLAVE
Practical
Part 3