Running Jobs on Jacquard

An overview of interactive and batch computing, with comparisons to Seaborg

David Turner
NUG Meeting
3 Oct 2005

Topics

• Interactive
  – Serial
  – Parallel
  – Limits
• Batch
  – Serial
  – Parallel
  – Queues and Policies
• Charging
• Comparison with Seaborg

Execution Environment

• Four login nodes
  – Serial jobs only
  – CPU limit: 60 minutes
  – Memory limit: 64 MB
• 320 compute nodes
  – “Interactive” parallel jobs
  – Batch serial and parallel jobs
  – Scheduled by PBSPro
• Queue limits and policies established to meet system objectives
  – User input is critical!

Interactive Jobs

• Serial jobs run on login nodes
  – cd, ls, pathf90, etc.
  – ./a.out
• Parallel jobs run on compute nodes
  – Controlled by PBSPro

% qsub -I -q interactive -l nodes=8:ppn=2
% cd $PBS_O_WORKDIR
% mpirun -np 16 ./a.out

% qsub -I -q batch -l nodes=32:ppn=2,walltime=18:00:00
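In the interactive example, the value given to mpirun -np (16) equals nodes times ppn (8 x 2). The sketch below makes that arithmetic explicit. It assumes one MPI task per requested processor, which matches the example but is not stated as a rule on the slide; mpi_tasks is a hypothetical helper name, not a PBS or NERSC command.

```shell
# Hypothetical helper: compute the MPI task count implied by a
# "-l nodes=N:ppn=P" request, assuming one task per processor.
mpi_tasks() {
    # $1 = number of nodes, $2 = processors per node (ppn)
    echo $(( $1 * $2 ))
}

# The interactive request from the slide: nodes=8:ppn=2
mpi_tasks 8 2
```

Inside a job, the result could be passed along as mpirun -np "$(mpi_tasks 8 2)" ./a.out, so that the task count stays consistent with the resource request.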

PBSPro

• Marketed by Altair Engineering
  – Based on open source Portable Batch System developed for NASA
  – Also installed on DaVinci
• Batch scripts contain directives:
    #PBS -o myjob.out
• Directives may also appear as command-line options:
    qsub -o myjob.out …

Simple Batch Script

#PBS -l nodes=8:ppn=2,walltime=00:30:00
#PBS -N myjob
#PBS -o myjob.out
#PBS -e myjob.err
#PBS -A mp999
#PBS -q debug
#PBS -V

cd $PBS_O_WORKDIR
mpirun -np 16 ./a.out

Useful PBS Options (1)

-A repo
    Charge this job to repository repo
    Default: Your default repository

-N jobname
    Provide name for job; up to 15 printable, non-whitespace characters
    Default: Name of batch script

-q qname
    Submit job to batch queue qname
    Default: batch
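The -N rule (up to 15 printable, non-whitespace characters) can be checked before submission. The following is a minimal sketch that tests only the rule as stated on the slide; PBSPro may impose further restrictions that are not covered here. The function name valid_jobname is hypothetical.

```shell
# Hypothetical helper: succeed if the name is 1 to 15 characters long and
# contains only printable, non-whitespace characters.
valid_jobname() {
    name=$1
    [ -n "$name" ] || return 1            # empty names are rejected
    [ "${#name}" -le 15 ] || return 1     # at most 15 characters
    case "$name" in
        *[![:graph:]]*) return 1 ;;       # whitespace or non-printable found
    esac
    return 0
}

for candidate in myjob "my job" this_name_is_too_long; do
    if valid_jobname "$candidate"; then
        echo "ok:       $candidate"
    else
        echo "rejected: $candidate"
    fi
done
```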

Useful PBS Options (2)

-S shell
    Specify shell as the scripting language
    Default: Your login shell

-V
    Export current environment variables into the batch job environment
    Default: Do not export

Useful PBS Options (3)

-o outfile
    Write STDOUT to outfile
    Default: <jobname>.o<jobid>

-e errfile
    Write STDERR to errfile
    Default: <jobname>.e<jobid>

-j [eo|oe]
    Join STDOUT and STDERR on STDOUT (oe) or STDERR (eo)
    Default: Do not join

Useful PBS Options (4)

-m [a|b|e|n]
    E-mail notification
    a = send mail when job aborted by system
    b = send mail when job begins
    e = send mail when job ends
    n = do not send mail
    Options a, b, and e may be combined
    Default: a

Batch Queues

Submit        Execute       Nodes       Walltime
interactive   interactive   1 – 16      30 mins
debug         debug         1 – 32      30 mins
batch         batch16       1 – 16      48 hours
              batch32       17 – 32     24 hours
              batch64       33 – 64     12 hours
              batch128      65 – 128    6 hours
              batch256      129 – 256   6 hours
low           low           1 – 64      6 hours
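The table suggests that a job submitted to the batch queue is routed to an execution queue according to its node count. The sketch below encodes that mapping so a request can be checked against the walltime limit before submission. The function name batch_queue_for is hypothetical, and the routing rule is read off the table rather than taken from PBSPro documentation.

```shell
# Hypothetical helper: for a job submitted to "batch", print the execution
# queue and its walltime limit in hours, based on the node ranges in the table.
batch_queue_for() {
    n=$1
    if [ "$n" -lt 1 ] || [ "$n" -gt 256 ]; then
        echo "none"
        return 1
    elif [ "$n" -le 16 ];  then echo "batch16 48"
    elif [ "$n" -le 32 ];  then echo "batch32 24"
    elif [ "$n" -le 64 ];  then echo "batch64 12"
    elif [ "$n" -le 128 ]; then echo "batch128 6"
    else                        echo "batch256 6"
    fi
}

# A 32-node request falls in batch32, which allows up to 24 hours.
batch_queue_for 32
```

For example, the earlier request of nodes=32 with walltime=18:00:00 fits within the 24-hour limit that this mapping reports for batch32.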

Batch Queue Policies

• Each user may have:
  – One running interactive job
  – One running debug job
  – Four jobs running over entire system
• Only one batch128 job is allowed to run at a time.
• The batch256 queue usually has a run limit of zero. NERSC staff will arrange to run jobs of this size.

Submitting Batch Jobs

% qsub myjob

93935.jacin03

%

• Record jobid for tracking!
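One way to record the jobid is to capture what qsub prints. The sketch below uses the identifier from the slide as a stand-in so that it runs without PBS; in a real session the commented qsub line would supply the value. The variable names are illustrative only.

```shell
# In a real session the identifier would come from qsub, for example:
#   jobid=$(qsub myjob)
# The value from the slide is used here so the sketch runs without PBS.
jobid="93935.jacin03"

# Strip everything from the first "." onward to get the numeric part,
# which is the form the NERSC qs command lists in its JOBID column.
job_number=${jobid%%.*}

echo "full id: $jobid"
echo "number:  $job_number"
```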

Deleting Batch Jobs

% qdel 93935.jacin03

%

Monitoring Batch Jobs (1)

• PBS command qstat

% qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
93295.jacin03-ib job5             einstein         00:00:00 R batch16
93894.jacin03    EV80fl02_3       legendre                0 H batch16
93330.jacin03    test.script      laplace          00:00:23 R batch32
93897.jacin03    runlu8x8         rasputin                0 Q batch32
93334.jacin03-m  mtp_mg_3wat_o2a  fibonacci        00:00:11 R batch16
...

• Use -u option for single-user output

% qstat -u einstein
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
93295.jacin03-ib job5             einstein         00:00:00 R batch16
%
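The S column of qstat holds the job state (R, Q, and H appear in the sample). A small awk filter can pick out jobs in one state. This is a sketch that assumes the six-column default layout shown on the slide; jobs_in_state and sample_qstat are hypothetical names, and the sample text stands in for live qstat output.

```shell
# Hypothetical helper: read default-format qstat output on standard input
# and print job id, user, and queue for jobs in the given state.
jobs_in_state() {
    # NR > 2 skips the header line and the dashed separator line.
    awk -v state="$1" 'NR > 2 && $5 == state { print $1, $3, $6 }'
}

# Stand-in for "qstat", using the rows from the slide.
sample_qstat() {
cat <<'EOF'
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
93295.jacin03-ib job5             einstein         00:00:00 R batch16
93894.jacin03    EV80fl02_3       legendre                0 H batch16
93330.jacin03    test.script      laplace          00:00:23 R batch32
93897.jacin03    runlu8x8         rasputin                0 Q batch32
93334.jacin03-m  mtp_mg_3wat_o2a  fibonacci        00:00:11 R batch16
EOF
}

# List running jobs; on the system this would be: qstat | jobs_in_state R
sample_qstat | jobs_in_state R
```

The field positions rely on job names containing no whitespace, which agrees with the -N rule given earlier.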

Monitoring Batch Jobs (2)

• NERSC command qs

% qs
JOBID  ST  USER      NAME        NDS  REQ       USED      SUBMIT
93939  R   gauss     STDIN         1  00:30:00  00:10:43  Oct 2 16:47:00
93891  R   einstein  runlu4x8     16  01:00:00  00:38:48  Oct 2 15:23:36
93918  R   inewton   r4_16         8  01:00:00  00:10:37  Oct 2 15:36:35
...
93785  Q   inewton   r4_64        32  01:00:00  -         Oct 2 08:42:36
93828  Q   rasputin  nodemove     64  00:05:00  -         Oct 2 12:00:11
93897  Q   einstein  runlu8x8     32  01:00:00  -         Oct 2 15:24:27
...
93893  H   legendre  EV80fl02_2    4  03:00:00  -         Oct 2 15:24:23
93894  H   legendre  EV80fl02_3    4  03:00:00  -         Oct 2 15:24:24
93917  H   legendre  EV80fl98_5    4  03:00:00  -         Oct 2 15:26:06
...

• Also provides -u option

Monitoring Batch Jobs (3)

• NERSC website has current queue look:
    http://www.nersc.gov/nusers/status/jacquard/qstat
• Also has completed jobs list:
    http://www.nersc.gov/nusers/status/jacquard/pbs_summary
• Numerous filtering options available
  – Owner
  – Account
  – Queue
  – Jobid

Charging

• Machine charge factor (cf) = 4
  – Based on benchmarks and user applications
  – Currently under review
• Serial interactive
  – Charge = cf × cputime
  – Always charged to default repository
• All parallel
  – Charge = cf × 2 × nodes × walltime
  – Charged to default repo unless -A specified
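The two charging formulas can be worked through numerically. The sketch below assumes that cputime and walltime are measured in hours (the slide does not state the unit) and takes times in seconds for convenience. It uses cf = 4, which the slide notes is under review. The function names are hypothetical.

```shell
# Charge factor from the slide; noted there as currently under review.
CF=4

# Hypothetical helper: serial interactive charge = cf x cputime.
# $1 = CPU time in seconds (converted to hours, an assumed unit).
serial_charge() {
    awk -v cf="$CF" -v s="$1" 'BEGIN { printf "%.2f\n", cf * s / 3600 }'
}

# Hypothetical helper: parallel charge = cf x 2 x nodes x walltime.
# $1 = nodes, $2 = walltime in seconds (converted to hours, an assumed unit).
parallel_charge() {
    awk -v cf="$CF" -v n="$1" -v s="$2" \
        'BEGIN { printf "%.2f\n", cf * 2 * n * s / 3600 }'
}

# An 8-node job that runs for 30 minutes: 4 x 2 x 8 x 0.5
parallel_charge 8 1800
```

Note that the parallel formula depends on the number of nodes and not on ppn, so under this reading a job is charged for both processors on every node it holds.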

Things To Look Out For (1)

• Do not set group write permission for your home directory; it will prevent PBS from running your jobs.

• Library modules must be loaded at run time as well as at link time.

• Propagation of environment variables to remote processes is incomplete; contact NERSC consulting for help.
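The group write warning can be checked with standard tools. The sketch below inspects the permission string printed by ls -ld; has_group_write is a hypothetical helper, not a NERSC command.

```shell
# Hypothetical helper: succeed (exit 0) if the given directory has the
# group write bit set. The permission string looks like "drwxr-x---";
# the group write bit is the sixth character.
has_group_write() {
    perms=$(ls -ld "$1" | cut -c1-10)
    case "$perms" in
        ?????w????) return 0 ;;
        *)          return 1 ;;
    esac
}

if has_group_write "$HOME"; then
    echo "group write is set on $HOME; consider: chmod g-w $HOME"
else
    echo "no group write on $HOME"
fi
```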

Things To Look Out For (2)

• Do not run more than one MPI program in a single batch script.
• If your login shell is bash, you may see:
    accept: Resource temporarily unavailable
    done.
  In this case, specify a different shell using the -S directive, such as:
    #PBS -S /usr/bin/ksh

Things To Look Out For (3)

• Batch jobs always start in $HOME. To get to directory where job was submitted:
    cd $PBS_O_WORKDIR
  For jobs that work with large files:
    cd $SCRATCH/some_subdirectory
• PBS buffers output and error files until job completes. To view files (in home directory) while running:
    -k oe

Things To Look Out For (4)

• The following is just a warning and can be ignored:
    Warning: no access to tty (Bad file descriptor).
    Thus no job control in this shell.

LoadLeveler vs. PBS

LL                    PBS                 LL                PBS
#@ node               #PBS -l nodes       #@ notification   #PBS -m
#@ tasks_per_node     #PBS -l ppn         #@ shell          #PBS -S
#@ wall_clock_limit   #PBS -l walltime    #@ output         #PBS -o
#@ class              #PBS -q             #@ error          #PBS -e
#@ job_name           #PBS -N             #@ environment    #PBS -V
#@ account_no         #PBS -A
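When converting Seaborg scripts, the keyword correspondence above can be kept in a small lookup. The sketch below maps keyword names only; values are not translated, since they do not correspond one-to-one (for instance, ppn is written as part of the nodes specification, as in nodes=8:ppn=2). The function name ll_to_pbs is hypothetical.

```shell
# Hypothetical helper: print the PBS option that corresponds to a
# LoadLeveler keyword, following the comparison table. Values are not
# translated; this is only a name lookup.
ll_to_pbs() {
    case "$1" in
        node)             echo "-l nodes" ;;
        tasks_per_node)   echo "-l ppn" ;;
        wall_clock_limit) echo "-l walltime" ;;
        class)            echo "-q" ;;
        job_name)         echo "-N" ;;
        account_no)       echo "-A" ;;
        notification)     echo "-m" ;;
        shell)            echo "-S" ;;
        output)           echo "-o" ;;
        error)            echo "-e" ;;
        environment)      echo "-V" ;;
        *)                echo "unknown"; return 1 ;;
    esac
}

# Look up the PBS counterpart of "#@ wall_clock_limit".
ll_to_pbs wall_clock_limit
```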

Resources

• NERSC Website
    http://www.nersc.gov/nusers/resources/jacquard/running_jobs.php
    http://www.nersc.gov/vendor_docs/altair/PBSPro_7.0_User_Guide.pdf
• NERSC Consulting
    1-800-66-NERSC, menu option 3, 8 am - 5 pm, Pacific time
    (510) 486-8600, menu option 3, 8 am - 5 pm, Pacific time
    [email protected]
    http://help.nersc.gov/