Until now - UCLouvain · . 5. Control your job ... Parallelism is obtained by launching a...


Until now:

- access the cluster
- copy data to/from the cluster
- create parallel software
- compile code and use optimized libraries
- how to run the software on the full cluster


- submit a job to the scheduler

What is a job?

What is a job scheduler?

Job scheduler/Resource manager:

Piece of software which:

● manages and allocates resources;
● manages and schedules jobs;

and sets up the environment for parallel and distributed computing

Two computers are available for 10h

You go, then you go. You wait.


CPU cores, memory, disk space






Free and open-source


Very active community

Many success stories

Runs 50% of TOP10 systems, including 1st

Also an intergalactic soft drink

Other job schedulers


Oracle (ex Sun) Grid Engine
Condor


You will learn how to:

● Create a job
● Monitor the jobs
● Control your own job
● Get job accounting info


1. Make up your mind

● resources you need, e.g. 1 core, 2 GB RAM for 1 hour: these are the job parameters;
● operations you need to perform, e.g. launch 'myprog': these are the job steps.

2. Write a submission script

It is a shell script (Bash)

Regular Bash comment

Bash sees these as comments

Slurm takes them as


Job step creation

Regular Bash commands
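The annotated parts of a submission script can be sketched as follows, assuming the "1 core, 2 GB RAM for 1 hour" request and the 'myprog' program mentioned earlier. The file name submit.sh and the specific parameter values are illustrative assumptions, and the heredoc is only a convenient way to create the file; normally you would write it in a text editor.

```shell
# Create the submission script (file name 'submit.sh' is an assumption).
cat > submit.sh <<'EOF'
#!/bin/bash
# Regular Bash comment. The "#SBATCH" lines below are comments for Bash,
# but Slurm reads them as job parameters.
#SBATCH --job-name=MyJobName
#SBATCH --output=result-%j.txt
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=2048
#SBATCH --time=01:00:00

# Regular Bash commands
echo "Starting on $(hostname)"

# Job step creation: srun launches the (hypothetical) program 'myprog'
srun ./myprog
EOF

# Count the job parameters found in the script.
grep -c '^#SBATCH' submit.sh
```

The memory value is given in megabytes, and the time limit uses the hours:minutes:seconds form.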

Other useful parameters

You want                               You ask

To set a job name                      --job-name=MyJobName

To attach a comment to the job         --comment="Some comment"

To get emails                          --mail-type=BEGIN|END|FAIL
                                       --mail-user=<your email address>

To set the name of the output file     --output=result-%j.txt
                                       --error=error-%j.txt

To delay the start of your job         --begin=16:00
                                       --begin=now+1hour
                                       --begin=2010-01-20T12:34:00

To specify an ordering of your jobs    --dependency=after(ok|notok|any):jobids
                                       --dependency=singleton

To control failure options             --no-kill
                                       --no-requeue
                                       --requeue

Constraints and resources

You want                                                               You ask

To choose a specific feature (e.g. a processor type or a NIC type)

To use a specific resource (e.g. a GPU)                                --gres

To reserve a whole node for yourself                                   --exclusive

To choose a partition                                                  --partition

3. Submit the script

Slurm gives me the JobID

I submit with 'sbatch'

One more job parameter
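Since Slurm answers a submission with the JobID, it can be handy to capture that number for later commands. The sketch below parses a sample of the line sbatch prints on success; the JobID 12345 and the script names submit.sh and next.sh are made up for illustration.

```shell
# Sample of what sbatch prints on success. In a real session this would be:
#   out=$(sbatch submit.sh)      # 'submit.sh' is a hypothetical script name
out="Submitted batch job 12345"

# The JobID is the last word of that line.
jobid="${out##* }"
echo "JobID: $jobid"

# The JobID can then be passed to other commands (not run here):
#   squeue --jobs "$jobid"
#   sbatch --dependency=afterok:"$jobid" next.sh
```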

So you can play

Download http://www.cism.ucl.ac.be/Services/Formations/slurm.tgz with wget and untar it on hmem

Compile the 'stress' program; you can use it to burn CPU time and memory:

./stress --cpu 1 --vm-bytes 128M --timeout 30s

● Write a job script
● Submit a job
● See it running
● Cancel it
● Get it killed

4. Monitor your job

● squeue
● sprio
● sstat
● sview

A word about backfill

The rule: a job with a lower priority can start before a job with a higher priority if it does not delay that job's start time.








The low-priority job has a short maximum run time and fewer requirements; it starts before the higher-priority job.
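The backfill rule can be illustrated with a deliberately simplified check. It assumes that the low-priority job fits in resources that are idle right now, and that the only question is whether it will be finished before the high-priority job needs them. All numbers are hypothetical; the real scheduler works on the whole queue and on every resource type.

```shell
# Hypothetical values, in minutes from now.
high_prio_start=120   # earliest time the high-priority job can start
low_prio_limit=60     # maximum run time requested by the low-priority job

# The low-priority job may start now only if it cannot delay the other job.
if [ "$low_prio_limit" -le "$high_prio_start" ]; then
    decision="backfill"
else
    decision="wait"
fi
echo "$decision"
```

This is why requesting a realistic, short maximum run time can help a job start earlier.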


5. Control your job

● scancel
● scontrol
● sview


6. Job accounting

● sacct
● sreport
● sshare

The rules of fairshare

● A share is allocated to you: 1/nbusers
● If your actual usage is above that share, your fairshare value is decreased towards 0.
● If your actual usage is below that share, your fairshare value is increased towards 1.
● The actual usage taken into account decreases over time
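One formula that behaves as these rules describe is 2^(-usage/share), which is the form documented for Slurm's classic fairshare algorithm. The slides do not say which formula the cluster actually uses, so treat the sketch below as an illustration only. The function name fairshare and the usage values are assumptions, and the decay of usage over time is not modelled.

```shell
# fairshare USAGE NBUSERS
#   USAGE:   fraction of the cluster you actually consumed (hypothetical input)
#   NBUSERS: number of users; the allocated share is 1/NBUSERS
fairshare() {
    awk -v usage="$1" -v nbusers="$2" \
        'BEGIN { share = 1 / nbusers; printf "%.2f\n", 2 ^ (-usage / share) }'
}

fairshare 0      3   # no usage: the value is at its maximum
fairshare 0.3333 3   # usage about equal to the share: close to 0.5
fairshare 0.9    3   # usage well above the share: the value drops towards 0
```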

A word about fairshare

● Assume 3 users, 3-core cluster
● Red uses 1 core for a certain period of time
● Blue uses 2 cores for half that period
● Red uses 2 cores afterwards

Getting cluster info

● sinfo
● sjstat

Interactive work

● salloc

salloc --ntasks=4 --nodes=2


● Explore the environment
    ● Get node features (sinfo --Node --long)
    ● Get node usage (sinfo --summarize)

● Submit a job:
    ● Define the resources you need
    ● Determine what the job should do
    ● Submit the job script (sbatch)
    ● View the job status (squeue)
    ● Get accounting information (sacct)

You will learn how to:

● Create a parallel job
● Request distributed resources


Concurrent - Parallel - Distributed

Master/slave vs SPMD

Synchronous vs asynchronous

Message passing vs shared memory

Typical resource request

You want                                                     You ask

16 independent processes (no communication)                  --ntasks=16

MPI and do not care about where cores are distributed

cores spread across distinct nodes                           --ntasks=16 --nodes=16

cores spread across distinct nodes and nobody else around    --ntasks=16 --nodes=16 --exclusive

16 processes to spread across 8 nodes                        --ntasks=16 --ntasks-per-node=2

16 processes on the same node                                --ntasks=16 --ntasks-per-node=16

one process that can use 16 cores for multithreading         --ntasks=1 --cpus-per-task=16

4 processes that can use 4 cores                             --ntasks=4 --cpus-per-task=4

more constraint requests                                     --distribution=block|cyclic|arbitrary

Use case 1: Random sampling

● Your program draws random numbers and processes them sequentially
● Parallelism is obtained by launching the same program multiple times simultaneously
● Every process does the same thing
● No inter process communication
● Results appended to one common file

You want                                       You ask

16 independent processes (no communication)    --ntasks=16

You use                                        srun ./myprog

You want                                       You ask

16 independent processes (no communication)    --array=1-16 --output=res%a

You merge with                                 cat res*

Use case 2: Multiple datafiles

● Your program processes data from one datafile

● Parallelism is obtained by launching the same program multiple times on distinct data files

● Everybody does the same thing on distinct data stored in different files

● No inter process communication

● Results appended to one common file

You want                                       You ask

16 independent processes (no communication)    --ntasks=16

You use                                        srun ./myprog $SLURM_PROCID

Useful commands: xargs and find/ls

Single node:

ls data* | xargs -n1 -P $SLURM_NPROCS myprog

Multiple nodes:

ls data* | xargs -n1 -P $SLURM_NTASKS srun -c1 myprog

Safer: find . -maxdepth 1 -name "data*" -print0 | xargs -0 -n1 -P ...
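The "safer" find/xargs form can be tried outside of a job with a small sketch like the one below. The temporary directory, the three data files and the use of 'echo processing' as a stand-in for 'myprog' are all assumptions made for illustration. Outside Slurm, $SLURM_NTASKS is unset, so the sketch falls back to 2 parallel processes.

```shell
# Create a few sample data files, including one with a space in its name.
workdir=$(mktemp -d)
touch "$workdir/data1" "$workdir/data 2" "$workdir/data3"

# Number of commands xargs may run at the same time.
nprocs="${SLURM_NTASKS:-2}"

# -print0 / -0 separate file names with NUL bytes, so spaces are harmless.
# 'echo processing' stands in for the hypothetical 'myprog'.
find "$workdir" -maxdepth 1 -name "data*" -print0 |
    xargs -0 -n1 -P "$nprocs" echo processing > "$workdir/log"

# One line is logged per processed file.
count=$(grep -c processing "$workdir/log")
echo "processed $count files"

rm -rf "$workdir"
```

With plain ls and a pipe, the file named "data 2" would be split into two arguments; the NUL-separated form avoids that.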

You want                                       You ask

16 independent processes (no communication)    --array=1-16


Use case 3: Parameter sweep

● Your program tests something for one particular value of a parameter

● Parallelism is obtained by launching the same program multiple times with a distinct identifier

● Everybody does the same thing except for a given parameter value based on the identifier

● No inter process communication

● Results appended to one common file

You want                                       You ask

16 independent processes (no communication)    --ntasks=16

You use                                        srun ./myprog $SLURM_PROCID

You want                                       You ask

16 independent processes (no communication)    --array=1-16 --output=res%a

You use                                        $SLURM_ARRAY_TASK_ID
                                               cat res* to merge
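One possible way to turn $SLURM_ARRAY_TASK_ID into a parameter value is to index a Bash array with it, as sketched below. The list of parameter values is invented for illustration, and the default task ID of 1 exists only so that the script can be tried outside Slurm.

```shell
#!/bin/bash
#SBATCH --array=1-16
#SBATCH --output=res%a

# Hypothetical parameter values, one per array task.
params=(0.1 0.2 0.5 1 2 5 10 20 50 100 200 500 1000 2000 5000 10000)

# Slurm sets SLURM_ARRAY_TASK_ID to 1..16; Bash arrays are indexed from 0.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
param="${params[$((task_id - 1))]}"

echo "task $task_id uses parameter $param"
# In the real job, the last line would call the program instead:
#   ./myprog "$param"
```

Each array task writes to its own res file, so the results can be merged afterwards with cat res*.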

Useful command: GNU Parallel

Single node:

parallel -j $SLURM_NPROCS myprog ::: {1..5} ::: {A..D}

Multiple nodes:

parallel -j $SLURM_NTASKS srun -c1 myprog ::: {1..5} ::: {A..D}

Useful: parallel --joblog runtask.log --resume (for checkpointing)

parallel echo data_{1}_{2}.dat ::: 1 2 3 ::: 1 2 3

Use case 4: Multithread

● Your program uses OpenMP or TBB

● Parallelism is obtained by launching a multithreaded program

● One program spawns itself on the node

● Inter process communication by shared memory

● Results managed in the program which outputs a summary

You want                                                You ask

one process that can use 16 cores for multithreading    --ntasks=1 --cpus-per-task=16

You use                                                 OMP_NUM_THREADS=16 srun myprog

Use case 5: Message passing

● Your program uses MPI
● Parallelism is obtained by launching a multi-process program
● One program spawns itself on several nodes
● Inter process communication by the network
● Results managed in the program which outputs a summary

You want                         You ask

16 processes for use with MPI    --ntasks=16

You use                          module load openmpi
                                 mpirun myprog

Use case 6: Master/slave

● You have two types of programs: master and slave
● Parallelism is obtained by launching several slaves, managed by the master
● The master launches several slaves on distinct nodes
● Inter process communication by the network or the disk
● Results managed in the master program which outputs a summary

You want                   You ask

16 processes 16 threads

You use                    --multi-prog + conf file
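A configuration file for --multi-prog lists, on each line, one or more task ranks followed by the program those ranks should run. The sketch below writes such a file for one master and fifteen slaves; the file name multi.conf and the program names ./master and ./slave are assumptions for illustration.

```shell
# Write a configuration file for 'srun --multi-prog'.
# Each line: task rank(s), then the program to run for those ranks.
cat > multi.conf <<'EOF'
0     ./master
1-15  ./slave
EOF

cat multi.conf

# In a job script that requests --ntasks=16, the job step would then be
# created with (not run here):
#   srun --multi-prog multi.conf
```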


● Choose number of processes: --ntasks
● Choose number of threads: --cpus-per-task
● Launch processes with srun or mpirun
● Set multithreading with OMP_NUM_THREADS
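These four points can be combined in one job script. The sketch below assumes a hybrid program with 4 processes of 4 threads each. Rather than hard-coding the thread count, it reads it from SLURM_CPUS_PER_TASK, which Slurm exports when --cpus-per-task is given. The fallback to 1 thread is an assumption so that the script can also be tried outside Slurm, and 'myprog' is a hypothetical program.

```shell
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4

# Use as many threads per process as cores were requested per task;
# fall back to 1 when the variable is unset (e.g. outside Slurm).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Using $OMP_NUM_THREADS OpenMP thread(s) per process"

# Launch step, not executed in this sketch:
#   srun ./myprog
```

Keeping the thread count tied to the request avoids the two values drifting apart when the script is edited later.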



● Download MPI hello world on Wikipedia, compile it, write job script and submit it

● Rewrite 'Multiple files' examples using xargs

● Rewrite 'Parameter sweep' example using GNU parallel