Running Jobs on Jacquard
An overview of interactive and batch computing, with comparisons to Seaborg
David Turner
NUG Meeting
3 Oct 2005
Topics
• Interactive
– Serial
– Parallel
– Limits
• Batch
– Serial
– Parallel
– Queues and Policies
• Charging
• Comparison with Seaborg
Execution Environment
• Four login nodes
– Serial jobs only
– CPU limit: 60 minutes
– Memory limit: 64 MB
• 320 compute nodes
– “Interactive” parallel jobs
– Batch serial and parallel jobs
– Scheduled by PBSPro
• Queue limits and policies established to meet system objectives
– User input is critical!
Interactive Jobs
• Serial jobs run on login nodes
– cd, ls, pathf90, etc.
– ./a.out
• Parallel jobs run on compute nodes
– Controlled by PBSPro
mpirun -np 16 ./a.out
qsub -I -q interactive -l nodes=8:ppn=2
% cd $PBS_O_WORKDIR
% mpirun -np 16 ./a.out
qsub -I -q batch -l nodes=32:ppn=2,walltime=18:00:00
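In the interactive examples, the process count given to mpirun matches nodes × ppn (8 × 2 = 16). The sketch below only prints the command sequence for a given node count and runs nothing. The helper name `interactive_cmds` is hypothetical, and the default of 2 processes per node is an assumption taken from the `ppn=2` used in the examples.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the interactive-session commands
# for a given node count.
interactive_cmds() {
    nodes=$1
    ppn=${2:-2}                 # assumption: 2 processes per node, as in ppn=2
    np=$((nodes * ppn))         # total MPI processes = nodes x ppn
    echo "qsub -I -q interactive -l nodes=${nodes}:ppn=${ppn}"
    echo "cd \$PBS_O_WORKDIR"
    echo "mpirun -np ${np} ./a.out"
}

interactive_cmds 8
```

Printing the commands instead of executing them keeps the sketch usable on a machine without PBS installed.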
PBSPro
• Marketed by Altair Engineering
– Based on open source Portable Batch System developed for NASA
– Also installed on DaVinci
• Batch scripts contain directives:
#PBS -o myjob.out
• Directives may also appear as command-line options:
qsub -o myjob.out …
Simple Batch Script
#PBS -l nodes=8:ppn=2,walltime=00:30:00
#PBS -N myjob
#PBS -o myjob.out
#PBS -e myjob.err
#PBS -A mp999
#PBS -q debug
#PBS -V

cd $PBS_O_WORKDIR
mpirun -np 16 ./a.out
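The script hard-codes `-np 16` to match `nodes=8:ppn=2`. One possible way to keep the two in step is to count the entries in the node file that PBS provides through `$PBS_NODEFILE`. That variable is not mentioned in the slides, and the sketch assumes the file holds one line per process slot, which may not hold on every installation. The helper name `count_slots` and the demo file contents are made up for illustration.

```shell
#!/bin/sh
# Count process slots in a PBS node file.
# Assumption: the file lists one line per process slot.
count_slots() {
    grep -c . "$1"              # number of non-empty lines
}

# Inside a batch job one might then write (not executed here):
#   cd $PBS_O_WORKDIR
#   mpirun -np "$(count_slots "$PBS_NODEFILE")" ./a.out

# Demonstration with a made-up node file: 2 nodes x 2 slots.
demo=$(mktemp)
printf 'node01\nnode01\nnode02\nnode02\n' > "$demo"
count_slots "$demo"
rm -f "$demo"
```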
Useful PBS Options (1)
-A repo
Charge this job to repository repo
Default: Your default repository
-N jobname
Provide name for job; up to 15 printable, non-whitespace characters
Default: Name of batch script
-q qname
Submit job to batch queue qname
Default: batch
Useful PBS Options (2)
-S shell
Specify shell as the scripting language
Default: Your login shell
-V
Export current environment variables into the batch job environment
Default: Do not export
Useful PBS Options (3)
-o outfile
Write STDOUT to outfile
Default: <jobname>.o<jobid>
-e errfile
Write STDERR to errfile
Default: <jobname>.e<jobid>
-j [eo|oe]
Join STDOUT and STDERR on STDOUT (oe) or STDERR (eo)
Default: Do not join
Useful PBS Options (4)
-m [a|b|e|n]
E-mail notification
a = send mail when job aborted by system
b = send mail when job begins
e = send mail when job ends
n = do not send mail
Options a, b, and e may be combined
Default: a
Batch Queues
Submit       Execute      Nodes      Walltime
interactive  interactive  1 – 16     30 mins
debug        debug        1 – 32     30 mins
batch        batch16      1 – 16     48 hours
batch        batch32      17 – 32    24 hours
batch        batch64      33 – 64    12 hours
batch        batch128     65 – 128   6 hours
batch        batch256     129 – 256  6 hours
low          low          1 – 64     6 hours
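The table implies that a job submitted to `batch` lands in an execution queue chosen by its node count. As a sketch of that lookup, the hypothetical helper below (`batch_exec_queue` is not a NERSC or PBS command) returns the execution queue and its walltime limit in hours, using only the values in the table.

```shell
#!/bin/sh
# Hypothetical lookup mirroring the queue table: given a node count for a
# job submitted to "batch", print "<execution queue> <walltime limit, hours>".
batch_exec_queue() {
    n=$1
    if [ "$n" -lt 1 ] || [ "$n" -gt 256 ]; then
        echo "no batch queue accepts $n nodes" >&2
        return 1
    elif [ "$n" -le 16 ];  then echo "batch16 48"
    elif [ "$n" -le 32 ];  then echo "batch32 24"
    elif [ "$n" -le 64 ];  then echo "batch64 12"
    elif [ "$n" -le 128 ]; then echo "batch128 6"
    else                        echo "batch256 6"
    fi
}

batch_exec_queue 40
```

A check like this before submission can catch a walltime request that exceeds the limit for the queue the job would be routed to.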
Batch Queue Policies
• Each user may have:
– One running interactive job
– One running debug job
– Four jobs running over entire system
• Only one batch128 job is allowed to run at a time.
• The batch256 queue usually has a run limit of zero. NERSC staff will arrange to run jobs of this size.
Submitting Batch Jobs
% qsub myjob
93935.jacin03
%
• Record jobid for tracking!
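One way to record the jobid is to capture what qsub prints in a shell variable. The sketch below works on the sample identifier from this slide, so it runs without PBS. The helper names `jobid_number` and `jobid_server` are hypothetical, and the `<number>.<server>` layout is inferred from the example output.

```shell
#!/bin/sh
# Split a job identifier of the form <number>.<server> into its parts.
jobid_number() { echo "${1%%.*}"; }   # text before the first dot
jobid_server() { echo "${1#*.}"; }    # text after the first dot

# In a real session the identifier would come from qsub, for example:
#   jobid=$(qsub myjob)
jobid="93935.jacin03"                 # sample value from the slide
echo "number: $(jobid_number "$jobid")"
echo "server: $(jobid_server "$jobid")"
```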
Deleting Batch Jobs
% qdel 93935.jacin03
%
Monitoring Batch Jobs (1)
• PBS command qstat
% qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
93295.jacin03-ib job5             einstein         00:00:00 R batch16
93894.jacin03    EV80fl02_3       legendre                0 H batch16
93330.jacin03    test.script      laplace          00:00:23 R batch32
93897.jacin03    runlu8x8         rasputin                0 Q batch32
93334.jacin03-m  mtp_mg_3wat_o2a  fibonacci        00:00:11 R batch16
...
• Use -u option for single-user output
% qstat -u einstein
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
93295.jacin03-ib job5             einstein         00:00:00 R batch16
%
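The state column (S) in this listing shows R, Q, and H for the sample jobs. As a rough sketch, the hypothetical helper `count_states` below tallies jobs per state from qstat-style text on standard input. It assumes the layout shown on this slide: two header lines, then rows whose second-to-last field is the state letter. The demo feeds it the sample rows from the slide instead of calling qstat.

```shell
#!/bin/sh
# Tally jobs per state from qstat-style output on stdin.
# Assumption: two header lines, then rows with the state letter as the
# second-to-last field; shorter lines (such as "...") are skipped.
count_states() {
    awk 'NR > 2 && NF >= 6 { n[$(NF-1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}

count_states <<'EOF'
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
93295.jacin03-ib job5             einstein         00:00:00 R batch16
93894.jacin03    EV80fl02_3       legendre                0 H batch16
93330.jacin03    test.script      laplace          00:00:23 R batch32
93897.jacin03    runlu8x8         rasputin                0 Q batch32
93334.jacin03-m  mtp_mg_3wat_o2a  fibonacci        00:00:11 R batch16
EOF
```

On a system with PBS, the same function could presumably be used as `qstat | count_states`.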
Monitoring Batch Jobs (2)
• NERSC command qs
% qs
JOBID ST USER     NAME       NDS REQ      USED     SUBMIT
93939 R  gauss    STDIN        1 00:30:00 00:10:43 Oct 2 16:47:00
93891 R  einstein runlu4x8    16 01:00:00 00:38:48 Oct 2 15:23:36
93918 R  inewton  r4_16        8 01:00:00 00:10:37 Oct 2 15:36:35
...
93785 Q  inewton  r4_64       32 01:00:00 -        Oct 2 08:42:36
93828 Q  rasputin nodemove    64 00:05:00 -        Oct 2 12:00:11
93897 Q  einstein runlu8x8    32 01:00:00 -        Oct 2 15:24:27
...
93893 H  legendre EV80fl02_2   4 03:00:00 -        Oct 2 15:24:23
93894 H  legendre EV80fl02_3   4 03:00:00 -        Oct 2 15:24:24
93917 H  legendre EV80fl98_5   4 03:00:00 -        Oct 2 15:26:06
...
• Also provides -u option
Monitoring Batch Jobs (3)
• NERSC website has current queue look:
http://www.nersc.gov/nusers/status/jacquard/qstat
• Also has completed jobs list:
http://www.nersc.gov/nusers/status/jacquard/pbs_summary
• Numerous filtering options available
– Owner
– Account
– Queue
– Jobid
Charging
• Machine charge factor (cf) = 4
– Based on benchmarks and user applications
– Currently under review
• Serial interactive
– Charge = cf × cputime
– Always charged to default repository
• All parallel
– Charge = cf × 2 × nodes × walltime
– Charged to default repo unless -A specified
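As a worked illustration of the two formulas, the sketch below computes charges with cf = 4. The slide does not state the time unit, so hours are assumed here. The factor of 2 is presumably the number of processors per node, though the slide does not say so. The function names are hypothetical.

```shell
#!/bin/sh
# Charge formulas from the slide with cf = 4.
# Assumption: cputime and walltime are expressed in hours.
CF=4

serial_charge() {       # $1 = CPU time in hours
    awk -v cf="$CF" -v t="$1" 'BEGIN { printf "%.2f\n", cf * t }'
}

parallel_charge() {     # $1 = nodes, $2 = walltime in hours
    awk -v cf="$CF" -v n="$1" -v w="$2" \
        'BEGIN { printf "%.2f\n", cf * 2 * n * w }'
}

# 8 nodes for 30 minutes: 4 x 2 x 8 x 0.5
parallel_charge 8 0.5
```

Because the parallel charge uses walltime and whole nodes, a job is charged for both processors on every node it holds, whether or not it uses them.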
Things To Look Out For (1)
• Do not set group write permission for your home directory; it will prevent PBS from running your jobs.
• Library modules must be loaded at runtime as well as linktime.
• Propagation of environment variables to remote processes is incomplete; contact NERSC consulting for help.
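For the first point, a quick check of the home directory's permissions can be sketched as follows. The helper name `group_writable` is hypothetical. It reads the mode string printed by `ls -ld`, where the sixth character is the group write bit.

```shell
#!/bin/sh
# Succeed if the given directory has the group write bit set.
group_writable() {
    # ls -ld prints a mode string such as drwxr-x---; character 6 is the
    # group write bit.
    [ "$(ls -ld "$1" | cut -c6)" = "w" ]
}

dir=${HOME:-.}
if group_writable "$dir"; then
    echo "$dir is group-writable; 'chmod g-w $dir' would remove the bit"
else
    echo "$dir is not group-writable"
fi
```

The script only reports the state and leaves any change to the user.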
Things To Look Out For (2)
• Do not run more than one MPI program in a single batch script.
• If your login shell is bash, you may see:
accept: Resource temporarily unavailable
done.
In this case, specify a different shell using the -S directive, such as:
#PBS -S /usr/bin/ksh
Things To Look Out For (3)
• Batch jobs always start in $HOME. To get to directory where job was submitted:
cd $PBS_O_WORKDIR
For jobs that work with large files:
cd $SCRATCH/some_subdirectory
• PBS buffers output and error files until job completes. To view files (in home directory) while running:
-k oe
Things To Look Out For (4)
• The following is just a warning and can be ignored:
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
LoadLeveler vs. PBS
LL                   PBS
#@ node              #PBS -l nodes
#@ tasks_per_node    #PBS -l ppn
#@ wall_clock_limit  #PBS -l walltime
#@ class             #PBS -q
#@ job_name          #PBS -N
#@ account_no        #PBS -A
#@ notification      #PBS -m
#@ shell             #PBS -S
#@ output            #PBS -o
#@ error             #PBS -e
#@ environment       #PBS -V
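For keywords whose values carry over unchanged, the mapping can be applied mechanically. The sketch below is a hypothetical filter, `ll_to_pbs`, that rewrites only those directives and passes every other line through untouched. It assumes LoadLeveler lines of the form `#@ keyword = value`. Keywords such as node and tasks_per_node are left alone because they combine into a single `-l nodes=N:ppn=M` request on the PBS side and need manual attention.

```shell
#!/bin/sh
# Hypothetical filter: rewrite LoadLeveler directives that map one-to-one
# onto PBS options. Everything else is printed unchanged.
# Assumption: directives look like "#@ keyword = value".
ll_to_pbs() {
    awk '
    BEGIN {
        opt["class"]      = "-q"; opt["job_name"] = "-N"
        opt["account_no"] = "-A"; opt["shell"]    = "-S"
        opt["output"]     = "-o"; opt["error"]    = "-e"
        opt["wall_clock_limit"] = "-l walltime="
    }
    /^#[ \t]*@/ {
        line = $0
        sub(/^#[ \t]*@[ \t]*/, "", line)      # drop the "#@" prefix
        eq = index(line, "=")
        if (eq > 0) {
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            gsub(/[ \t]/, "", key)
            sub(/^[ \t]+/, "", val); sub(/[ \t]+$/, "", val)
            if (key in opt) {
                sep = (opt[key] ~ /=$/) ? "" : " "
                print "#PBS " opt[key] sep val
                next
            }
        }
    }
    { print }
    '
}

ll_to_pbs <<'EOF'
#@ job_name = myjob
#@ class = debug
#@ wall_clock_limit = 00:30:00
#@ node = 8
EOF
```

The output of a filter like this is a starting point for a PBS script and still needs review, since queue names and limits differ between the two systems.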
Resources
• NERSC Website
http://www.nersc.gov/nusers/resources/jacquard/running_jobs.php
http://www.nersc.gov/vendor_docs/altair/PBSPro_7.0_User_Guide.pdf
• NERSC Consulting
1-800-66-NERSC, menu option 3, 8 am - 5 pm, Pacific time
(510) 486-8600, menu option 3, 8 am - 5 pm, Pacific time
[email protected]
http://help.nersc.gov/