Computing Workshop for Users of NCAR’s SCD machines
Christiane Jablonowski (cjablono@ucar.edu)
NCAR ASP/SCD
31 January 2006
ML Mesa Lab, Chapman Room
Video conference facilities: FL EOL Atrium and CG1 3150
Overview
Current machine architectures at NCAR (SCD)
Some basics on parallel computing
Batch queuing systems at NCAR
GAU resources & how to obtain a GAU account
Insights into GAU charges
The Mass Storage System
How to monitor the GAUs
Some practical tips on benchmarks, debugging tools, restarts…
???
Computer architectures
SCD’s machines are UNIX-based parallel computing architectures
Two types:
– Hybrid (shared and distributed memory) machines like
  bluesky (IBM Power4)
  bluevista (IBM Power5)
  lightning (AMD Opteron system running Linux)
– Shared memory system like
  tempest (SGI, 128 CPUs), predominantly used for post-processing jobs
Parallel Programming
Parallel machines require parallel programming techniques in the user application:
– MPI (Message Passing Interface) for distributed memory systems, can also be used on shared memory systems
– OpenMP for shared memory systems
– Hybrid (MPI & OpenMP) programming technique common on the IBMs at NCAR
Pure MPI parallelization is often the fastest option: the computational domain is split into pieces that communicate over the network (via messages)
OpenMP: Parallelization of (mostly) loops via compiler directives
Parallelization is provided in CAM/CCSM/WRF
Most common: Hybrid hardware architectures
Combined shared and distributed memory architecture:
– Shared-memory symmetric multiprocessor (SMP) nodes, processors on a node have direct access to memory
– Nodes are connected via the network (distributed memory)
MPI example
Processors communicate via messages
MPI Example
Initialize & finalize MPI in your program via function/subroutine calls to the MPI library. Examples include: MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize
Example from previous page in C notation (unoptimized):
Important to note: such an operation (computing a global sum) is very common, therefore MPI provides a highly optimized function, also called a ‘reduction operation’, MPI_Reduce (…), that can replace the example above
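A minimal serial sketch in Python (not real MPI, and not the code shown at the workshop) of the global sum described above. The first function mimics the unoptimized pattern in which rank 0 collects one value from every other rank in turn; the second combines values pairwise in rounds, which hints at the kind of shortcut a reduction such as MPI_Reduce is free to take. The function names and the sample values are hypothetical.

```python
def naive_global_sum(local_values):
    """Rank 0 receives one message per other rank and adds them up in turn."""
    total = local_values[0]          # rank 0 starts with its own value
    messages = 0
    for rank in range(1, len(local_values)):
        total += local_values[rank]  # stands in for a receive from `rank`
        messages += 1
    return total, messages


def tree_global_sum(local_values):
    """Pairwise (tree) combination: partial sums are merged in rounds."""
    values = list(local_values)
    rounds = 0
    while len(values) > 1:
        # Neighbouring pairs are combined; an odd value is carried over.
        merged = [values[i] + values[i + 1]
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2 == 1:
            merged.append(values[-1])
        values = merged
        rounds += 1
    return values[0], rounds


local = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical per-rank contributions
print(naive_global_sum(local))    # sum and number of sequential receives
print(tree_global_sum(local))     # sum and number of pairwise rounds
```

With 8 ranks the naive version needs 7 sequential receives on rank 0, while the pairwise version finishes in 3 rounds; how a given MPI library actually implements its reduction is not specified here.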
Example: domain decompositions for MPI
Each color represents a processor
OpenMP Example
Parallel loops via compiler directives (here: in Fortran notation)
Before program is called set:
setenv OMP_NUM_THREADS #proc
Add compiler directives in your code:
!$OMP PARALLEL DO
DO i = 1, n
   a(i) = b(i) + c(i)
END DO
!$OMP END PARALLEL DO
[Figure: master thread and team of threads]
Assume n=1000 & #proc=4: The loop will be split into 4 ‘threads’ that run in parallel with loop indices 1…250, 251…500, 501…750, 751…1000
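The index ranges above correspond to equal contiguous blocks. A small Python sketch of that split follows; it only illustrates the arithmetic and is not how an OpenMP runtime is implemented. The function name is hypothetical, and the handling of a remainder (extra iterations go to the first threads) is an assumption, since the actual default schedule is up to the compiler and runtime.

```python
def static_chunks(n, num_threads):
    """Split loop indices 1..n into contiguous blocks, one per thread."""
    base, extra = divmod(n, num_threads)
    chunks = []
    start = 1
    for thread in range(num_threads):
        # Assumption: the first `extra` threads take one additional iteration.
        size = base + (1 if thread < extra else 0)
        end = start + size - 1
        chunks.append((start, end))
        start = end + 1
    return chunks


print(static_chunks(1000, 4))
# [(1, 250), (251, 500), (501, 750), (751, 1000)]
```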
SCD’s machines
Bluesky (web page)
– ‘Oldest’ machine at NCAR (2002)
– Lots of user experience at NCAR, easy access to help
– CAM/CCSM/WRF are set up for this architecture (Makefiles)
– Batch queuing system LoadLeveler, short interactive runs possible
– Batch queues are listed under http://www.cisl.ucar.edu/computers/bluesky/queue.charge.html
– Lots of additional software available: e.g. math libraries, graphics packages, Totalview debugger
SCD’s machines
Bluevista (web page)
– Newest machine on the floor (Jan. 2006)
– CAM/CCSM/WRF are (probably) set up for this architecture
– Batch queuing system LSF (Load Sharing Facility)
– Queue names different from bluesky: premium, regular, economy, standby, debug, share
  http://www.cisl.ucar.edu/computers/bluevista/queue.charge.html
– Some additional software available: e.g. math libraries, Totalview debugger
SCD’s machines
Lightning (web page)
– Linux cluster
– Compilers different from the IBMs: Portland Group or Pathscale
– Batch queuing system LSF
– Same queue names as bluevista
– Some support software
Tempest (web page)
– For data post-processing, with yet another batch queuing system: NQS
– Lots of support software
– Interactive use possible
Example of a LoadLeveler job script
#@ class = com_rg32
#@ node = 1
#@ tasks_per_node = 32
#@ output = out.$(jobid)
#@ error = out.$(jobid)
#@ job_type = parallel
#@ wall_clock_limit = 00:20:00
#@ network.MPI = csss,not_shared,us
#@ node_usage = not_shared
#@ account_no = 54042108
#@ ja_report = yes
#@ queue
…
setenv OMP_NUM_THREADS 1
…
Parallel job with 32 MPI processes, com_rg32 queue (32-way node)
Submit the job via: llsubmit job_script
Script annotations:
– class = com_rg32: regular queue
– tasks_per_node = 32: 32 MPI processes per 32-way node
Example of a LoadLeveler job script
#@ class = com_ec32
#@ node = 1
#@ tasks_per_node = 8
#@ output = out.$(jobid)
#@ error = out.$(jobid)
#@ job_type = parallel
#@ wall_clock_limit = 00:20:00
#@ network.MPI = csss,not_shared,us
#@ node_usage = not_shared
#@ account_no = 54042108
#@ ja_report = yes
#@ queue
…
setenv OMP_NUM_THREADS 4
…
Hybrid parallel job with 8 MPI processes and 4 OpenMP threads per process (8 × 4 = 32 processors, filling the 32-way node)
Submit the job via: llsubmit job_script
Script annotations:
– class = com_ec32: economy queue
– tasks_per_node = 8: 8 MPI processes per 32-way node
Example of an LSF job script (lightning)
#! /bin/csh
#
#
#BSUB -a 'mpich_gm'
#BSUB -P 54042108
#BSUB -q regular
#BSUB -W 00:30
#BSUB -x
#BSUB -n 8
#BSUB -R "span[ptile=2]"
#BSUB -o fvcore_amr.out.%J
#BSUB -e fvcore_amr.err.%J
#BSUB -J test0.path
#
#
mpirun.lsf -v ./dycore
Parallel job with 8 MPI processes (on 4 2-way nodes)
Submit the job via: bsub < job_script
Script annotations:
– -a 'mpich_gm': select on lightning
– -q regular: regular queue
– -W 00:30: wallclock limit 30 min
– -n 8: 8 MPI processes (total)
– -R "span[ptile=2]": 2 MPI processes per node
– -J test0.path: name of the job (listed in the SCD Portal)
Example of an LSF job script (bluevista)
#! /bin/csh
#
#
#BSUB -a poe
#BSUB -P 54042108
#BSUB -q economy
#BSUB -W 00:30
#BSUB -x
#BSUB -n 8
#BSUB -R "span[ptile=8]"
#BSUB -o fvcore_amr.out.%J
#BSUB -e fvcore_amr.err.%J
#BSUB -J test0.path
#
#
mpirun.lsf -v ./dycore
Parallel job with 8 MPI processes (on 1 8-way node)
Submit the job via: bsub < job_script
Script annotations:
– -a poe: select ‘poe’ on bluevista
– -q economy: economy queue
– -x: exclusive use (not shared)
– -R "span[ptile=8]": allows up to 8 MPI processes on a node
More information on SCD’s machines
Web page: SCD’s Support and Consulting services
SCD’s customer support: sometimes you even get help on the weekends or in the evenings
– Email: [email protected]
– Phone: 303 497 1278
– Walk-in support at the Mesa Lab
Check out SCD’s Daily Bulletin (scheduled machine downtimes, etc.)
Subscribe to the hpcstatus mailing list (short e-mails about machine status, system updates)
GAU resources
ASP has a monthly allocation of 3850 GAUs (General Accounting Units)
A GAU is a measure of compute time on the supercomputers maintained by NCAR’s Scientific Computing Division (SCD): http://www.cisl.ucar.edu/
Access to these machines requires
– an SCD login account ([email protected] or 303-497-1225)
– a GAU account (for ASP: contact Maura, otherwise contact your division / apply for a university account)
– an ssh environment
– a crypto card (for secure access)
SCD contacts: Dick Valent & Mike Page (here today), Juli Rew, Siddhartha Gosh, Ginger Caldwell (GAUs)
GAU resources
GAUs: ‘use it or lose it’ strategy
In ASP: We share the resource among the ASP postdocs & graduate fellows
Distribution is flexible and will be discussed occasionally, e.g. monthly, either via meetings or e-mail discussions:
email: [email protected]
GAUs are also charged for
– storing files in the Mass Storage System (MSS)
– file transfers from MSS to other machines
ASP GAU account
ASP GAU account number: 54042108 (also project number)
Needs to be specified in the batch job scripts
ASP account number is not your default account number
Therefore: everybody needs a second (default) GAU account:
– divisional GAU account
– so-called University account (small request form for 1500 GAUs, http://www.cisl.ucar.edu/resources/compServ.html); these GAUs do not expire every month, one-time allocation
Second GAU account should be used for the accumulating MSS charges
– automatic when using CAM / CCSM’s MSS option
GAU charges on SCD’s supercomputers
You are charged GAUs for how much time you use a processor (on bluesky, bluevista, lightning, tempest)
On bluesky, there are actually two formulas:
– Shared-node usage:
  GAUs charged = CPU hours used × computer factor × class charging factor
– Dedicated-node usage:
  GAUs charged = wallclock hours used × number of nodes used × number of processors in that node × computer factor × class charging factor
Slides on GAU charges: Modified from an earlier presentation by George Bryan, NCAR MMM
“Number of nodes used” and “Number of processors in that node”
Self explanatory (?)
Bluesky:
– 76 8-way (processors) nodes
– 25 32-way (processors) nodes
Bluevista:
– 78 8-way (processors) nodes
Lightning:
– 128 2-way (processors) nodes
“CPU hours used” and “Wallclock hours used”
Measure of how long you “used” a processor
NOTE: This includes all time you were allocated the use of a processor, whether you actually used it or not
Example: you used two 8-processor nodes on bluesky. The job started at 1:00 PM and finished at 2:30 PM.
You are charged for 1.5 hrs
“Computer factor”
A measure of how powerful a computer is
– Bluesky: 0.24
– Bluevista: 0.5
– Lightning: 0.34
This “levels the playing field”
“Class charging factor”
Tied to queuing system: “How quickly do you want your results, and how much are you willing to pay for it?”
Current setting on all SCD supercomputers:
– Premium = 1.5 (highest priority, fastest turnaround)
– Regular = 1.0
– Economy = 0.5
– Standby = 0.1 (lowest priority, slow turnaround)
Example
Recall dedicated-node usage on bluesky:
GAUs charged = wallclock hours used × number of nodes used × number of processors in that node × computer factor × class charging factor
1.5 hours using two 8-processor nodes
Bluesky regular queue
GAUs used = 1.5 × 2 × 8 × 0.24 × 1.0 = 5.76 GAUs
In premium queue, this would be 8.64 GAUs
In standby queue, this would be 0.576 GAUs
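The same calculation as a short Python sketch, using the computer factors and class charging factors listed on the previous slides. The function and dictionary names are hypothetical; this is only a way to check the arithmetic, not SCD's accounting code.

```python
# Factors as quoted on the slides (January 2006).
COMPUTER_FACTOR = {"bluesky": 0.24, "bluevista": 0.5, "lightning": 0.34}
CLASS_FACTOR = {"premium": 1.5, "regular": 1.0, "economy": 0.5, "standby": 0.1}


def dedicated_node_gaus(wallclock_hours, nodes, procs_per_node, machine, queue):
    """Dedicated-node formula: hours x nodes x processors per node x factors."""
    return (wallclock_hours * nodes * procs_per_node
            * COMPUTER_FACTOR[machine] * CLASS_FACTOR[queue])


# 1.5 hours on two 8-processor bluesky nodes, in three different queues.
for queue in ("regular", "premium", "standby"):
    charge = dedicated_node_gaus(1.5, 2, 8, "bluesky", queue)
    print(f"{queue:8s} {charge:.3f} GAUs")
```

This reproduces the 5.76, 8.64 and 0.576 GAUs from the example.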
Recommendations: Queuing systems
Check the queue before you submit any job:
– If the queue is not busy, try using the standby or economy queues
The queue tends to be “emptier” on evenings, weekends, and holidays
A job will start sooner when a wallclock limit is specified in the job script (the scheduler tries to ‘squeeze in’ short jobs)
The fewer processors you request, the sooner you start
Use the premium queue sparingly
– Short debug jobs (there is also a special debug queue on lightning)
– When that conference paper is due
Recommendations: # of processors vs. run times
If you are using more processors, you might wait longer in the queue, but usually the actual runtime of your job is reduced
Caveat: it usually costs more GAUs
Example: you run the same job with two different processor counts
– Using 8 processors, the job ran in 24 hours
– Using 64 processors, the job ran in 4 hours
– 1st example used 46 GAUs
– 2nd example used 61 GAUs
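The slide does not say which machine or queue was used. The 46 and 61 GAUs are consistent with the dedicated-node formula if one assumes bluesky (computer factor 0.24) and the regular queue (class charging factor 1.0); both are assumptions in the sketch below, and the function name is hypothetical.

```python
def gaus(processors, wallclock_hours, computer_factor=0.24, class_factor=1.0):
    """Processor-hours times the charging factors (defaults are assumed)."""
    return processors * wallclock_hours * computer_factor * class_factor


small = gaus(8, 24)    # 8 processors for 24 hours
large = gaus(64, 4)    # 64 processors for 4 hours

print(f"8 processors:  {small:.2f} GAUs")
print(f"64 processors: {large:.2f} GAUs")
print(f"cost ratio:    {large / small:.2f}")
print(f"speedup:       {24 / 4:.1f}x on {64 // 8}x the processors")
```

Under these assumptions the larger run is 6 times faster on 8 times the processors, which is why it costs about a third more GAUs.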
The Mass Storage System
MSS: Mass storage system (disks and cartridges) for your big data sets
MSS is connected to the SCD machines, sometimes also to divisional computers
MSS users have directories like mss:/LOGIN_NAME/
Quick online reference (mss commands): http://www.cisl.ucar.edu/docs/mss/mss-commandlist.html
You are charged GAUs for using the MSS
The GAU equation for MSS is more complicated ...
MSS Charges
GAUs charged = .0837 R + .0012 A + N (.1195 W + .2050 S)
where:
– R = Gigabytes read
– W = Gigabytes created or written
– A = Number of disk drive or tape cartridge accesses
– S = Data stored, in gigabyte-years
– N = Number of copies of file: 1 if economy reliability selected; 2 if standard reliability selected
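A short Python sketch of the MSS formula, to make the weight of the storage term visible. The function name and the sample workload (100 GB written once, kept for a year, 50 accesses) are hypothetical; only the coefficients come from the slide.

```python
def mss_gaus(read_gb, written_gb, accesses, stored_gb_years, copies):
    """GAUs = .0837 R + .0012 A + N (.1195 W + .2050 S)."""
    return (0.0837 * read_gb
            + 0.0012 * accesses
            + copies * (0.1195 * written_gb + 0.2050 * stored_gb_years))


# Hypothetical workload: write 100 GB, keep it for one year, 50 accesses.
economy = mss_gaus(read_gb=0, written_gb=100, accesses=50,
                   stored_gb_years=100, copies=1)
standard = mss_gaus(read_gb=0, written_gb=100, accesses=50,
                    stored_gb_years=100, copies=2)

print(f"economy reliability:  {economy:.2f} GAUs")
print(f"standard reliability: {standard:.2f} GAUs")
```

In this made-up case the second copy roughly doubles the charge, because the write and storage terms are both multiplied by N.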
Recommendations: The MSS
MSS charges seem small, but they add up!
Examples: FY04 MSS usage
– ACD: 24,000 of 60,000 GAUs
– CGD: 94,500 of 181,000 GAUs
– HAO: 22,000 of 122,000 GAUs
– MMM: 34,000 of 139,000 GAUs
– RAP: 32,000 of 35,000 GAUs
Recommendations: The MSS
Recommendation for ASP users:
– use an account in your home division or your so-called ‘university’ account (1500 GAUs for postdocs, you need to apply) for MSS charges
– leave ASP GAUs for supercomputing
GAU Usage Strategy: 30-day and 90-day averages
The allocation actually works through 30-day and 90-day averages
Limits:
– 120% for 30-day use
– 105% for 90-day use
It is helpful to spread usage out evenly
How to check GAU usage:
– Type “charges” on command line of a supercomputer
– Check the “daily summary” output (next page)
– SCD Portal: look for the link on SCD’s main page: http://www.cisl.ucar.edu/
Web page: http://www.cisl.ucar.edu/dbsg/dbs/ASP/
ASP 30 Day Percent = 57.0 %     ASP 90 Day Percent = 48.3 %
30 Day Allocation = 3850        90 Day Allocation = 11550
30 Day Use = 2193               90 Day Use = 5575
90 DAY ST -- 30 DAY ST -- LAST DAY
01-NOV-05    31-DEC-05    29-JAN-06
ASP Gaus Used by Day
01-NOV-05 9.36
03-NOV-05 .03
04-NOV-05 141.45
…
22-JAN-06 .04
23-JAN-06 44.29
24-JAN-06 170.83
25-JAN-06 120.30
26-JAN-06 91.67
27-JAN-06 41.97
28-JAN-06 15.59
29-JAN-06 16.95
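The percentages in the summary follow directly from use divided by allocation. The Python sketch below recomputes them and compares them with the 120% and 105% limits quoted earlier; the function name is hypothetical, and scaling the monthly allocation by the window length is an assumption that happens to match the 11550 shown for 90 days.

```python
LIMITS = {30: 120.0, 90: 105.0}   # percent limits for the 30- and 90-day windows


def usage_status(window_days, gaus_used, monthly_allocation=3850):
    """Return (percent of allocation used, True if the limit is exceeded)."""
    # Assumption: the window allocation is the monthly one scaled by length.
    allocation = monthly_allocation * window_days // 30
    percent = 100.0 * gaus_used / allocation
    return round(percent, 1), percent > LIMITS[window_days]


print(usage_status(30, 2193))   # 30-day window from the summary above
print(usage_status(90, 5575))   # 90-day window from the summary above
```

Both windows come out at 57.0% and 48.3%, well below their limits.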
What happens when we use too many GAUs?
Your jobs will be thrown into a very low priority: the dreaded hold queue
It will be hard to get work done
But, jobs will still run
ASP Users: You can use more than 3850 GAUs / month
Experience says, it’s better to use too many than not enough
What happens when we use too many/too few GAUs?
Too many:
Recommendation: when the 30- and 90-day averages are running high, use the economy or standby queue ... conserve GAUs
But, don’t worry about going over
Too few:
ASP’s allocation will be cut in the long run if the 3850 GAUs per month allocation is not used
How to catch up when behind
Be wasteful:
– Use the premium queue
– Use more processors than you need
Have fun:
– Try something you always wanted to do, but never had the resources
How to conserve GAUs
Be frugal:
– Use the economy and standby queues
– Use fewer processors
– Use divisional GAUs (if possible) or your ‘university’ GAU account
How to share & monitor GAUs in ASP
Communicate!
Occasionally, we (ASP postdocs) use the e-mail list to announce a ‘busy’ GAU period
Keep watching the ASP GAU usage on the webpage http://www.cisl.ucar.edu/dbsg/dbs/ASP/ or in the SCD Portal
Look for the SCD Portal link on the SCD page: http://www.cisl.ucar.edu/
SCD Portal
Online tool that helps you monitor the GAU charges and the current machine status (e.g. batch queues); the display can be customized
Information on the machine status requires a setup command on roy.scd.ucar.edu: log on with the crypto card, then enter ‘scdportalkey hostname’ (e.g. lightning)
At this time (Jan/31/2006) the GAU charges on bluevista are not itemized; they will be included in the next release in Spring 2006
Other IBM resources
Sources of information on the IBM machine bluesky (from the command line); batchview also works on bluevista & lightning
– batchview for overview of jobs with their rankings
– llq for list of all submitted jobs, no ranking
– spinfo: queue limits, memory quotas on home file system and the temporary file system /ptmp
– Useful IBM LoadLeveler keywords in the script:
  #@account_no=54042108 -> ASP account
  #@ja_report=yes -> job report (see example on the next page)
– Useful LoadLeveler commands: llsubmit script_file, llcancel job_id
Example: IBM Job Report
If selected, one email per job is sent to you at midnight
Output on the IBM machines, here blackforest (meanwhile decommissioned):
Job Accounting - Summary Report
===============================
Operating System : blackforest AIX51
User Name (ID) : cjablono (7568)
Group Name (ID) : ncar (100)
Account Name : 54042108
Job Name : bf0913en.26921
Job Sequence Number : bf0913en.26921
Job Starts : 12/20/04 17:56:33
Job Ends : 12/20/04 23:26:34
Elapsed Time (Wall-Clock * #CPU): 633632 s
Number of Nodes (not_shared) : 8
Number of CPUs : 32
Number of Steps : 1
IBM Job Report (continued)
Charge Components
Wall-clock Time : 5:30:01
Wall-clock CPU hours : 176.00889 hrs
Multiplier for com_ec Queue : 0.50
Charge before Computer Factor : 88.00444 GAUs
Multiplier for computer blackforest: 0.10
Charged against Allocation : 8.80044 GAUs
Project GAUs Allocated : 5000.00 GAUs
Project GAUs Used, as of 12/16/04: 1889.20 GAUs
Division GAUs 30-Day Average : 103.3%
Division GAUs 90-Day Average : 58.6%
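The charge components in the report can be re-derived from the wall-clock time, the number of CPUs and the two multipliers. A minimal Python sketch follows, with hypothetical function names; all input values are taken from the report above.

```python
def wallclock_seconds(hms):
    """Convert an 'H:MM:SS' string into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def job_charge(wallclock, cpus, queue_multiplier, computer_multiplier):
    """Follow the report: CPU hours, charge before factor, final charge."""
    cpu_seconds = wallclock_seconds(wallclock) * cpus
    cpu_hours = cpu_seconds / 3600
    before_factor = cpu_hours * queue_multiplier
    charged = before_factor * computer_multiplier
    return cpu_seconds, cpu_hours, before_factor, charged


cpu_seconds, cpu_hours, before_factor, charged = job_charge(
    "5:30:01", cpus=32, queue_multiplier=0.50, computer_multiplier=0.10)

print(f"Elapsed (wall-clock * #CPU): {cpu_seconds} s")
print(f"Wall-clock CPU hours:        {cpu_hours:.5f} hrs")
print(f"Charge before factor:        {before_factor:.5f} GAUs")
print(f"Charged against allocation:  {charged:.5f} GAUs")
```

The results match the report line by line (633632 s, 176.00889 hrs, 88.00444 GAUs, 8.80044 GAUs).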
How to increase the efficiency
Get a feel for the GAUs for long jobs: benchmark the application on the target machine
– Run a short but relevant test problem and measure the run time (wall clock time) via MPI commands (function MPI_WTIME) or UNIX timing commands like time or timex (output formats are shell-script dependent)
– Vary the number of processors to assess the scaling
– If the application scales poorly, avoid using a large number of processors (waste of GAUs); instead use a smaller number with numerous restarts
– Make sure your job fits into the queue (finishes before the max. time is up)
Use compiler options, especially the optimization options
In case of programming problems: the Totalview debugger can save you days, weeks or even months
On the IBMs: compile your program with the compiler options: -g -qfullpath -d
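To judge the scaling from such benchmark runs, speedup and parallel efficiency can be computed relative to the smallest processor count. The Python sketch below shows one way to do that; the function name and the timings are made up for illustration and are not measurements from any SCD machine.

```python
def scaling_table(timings):
    """timings maps processor count -> wall clock seconds of the test run."""
    base_procs = min(timings)
    base_time = timings[base_procs]
    rows = []
    for procs in sorted(timings):
        speedup = base_time / timings[procs]
        # Efficiency of 1.0 means perfect scaling relative to the base run.
        efficiency = speedup / (procs / base_procs)
        rows.append((procs, speedup, efficiency))
    return rows


# Hypothetical wall clock times (seconds) of a short test problem.
measured = {8: 600.0, 16: 320.0, 32: 200.0, 64: 160.0}

for procs, speedup, efficiency in scaling_table(measured):
    print(f"{procs:3d} processors: speedup {speedup:.2f}, "
          f"efficiency {efficiency:.0%}")
```

In this invented data set the efficiency drops below 50% at 64 processors, the kind of result that would argue for a smaller processor count and more restarts.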
Restarts
Restart files are important for long simulations
– Queue limits are up to 6 wallclock hours (hard limit, job fails afterwards), then a restart becomes necessary
– Get information on the queue limits (SCD web page) and select the job’s integration time accordingly
– Restarts are built into CAM/CCSM/WRF and only need to be activated
– Restarts for other user applications will probably have to be programmed
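For applications without built-in restarts, the basic pattern is: read the restart file if one exists, advance the model by as many steps as fit into one job, then write a new restart file. The Python sketch below illustrates that pattern only; the file name, the JSON format and the stand-in "model step" are all hypothetical and far simpler than a real model state.

```python
import json
import os
import tempfile


def run_segment(restart_file, total_steps, steps_per_job):
    """Run one batch job's worth of steps, resuming from the restart file."""
    if os.path.exists(restart_file):
        with open(restart_file) as handle:
            state = json.load(handle)          # continue a previous job
    else:
        state = {"step": 0, "value": 0.0}      # cold start

    last_step = min(state["step"] + steps_per_job, total_steps)
    while state["step"] < last_step:
        state["step"] += 1
        state["value"] += 1.0                  # stand-in for one time step

    # Write to a temporary file first so a crash cannot corrupt the restart.
    tmp_file = restart_file + ".tmp"
    with open(tmp_file, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_file, restart_file)
    return state


# Simulate a chain of jobs: 10 steps in total, at most 4 steps per job.
with tempfile.TemporaryDirectory() as workdir:
    restart_path = os.path.join(workdir, "restart.json")
    jobs = 0
    state = {"step": 0}
    while state["step"] < 10:
        state = run_segment(restart_path, total_steps=10, steps_per_job=4)
        jobs += 1
    print(jobs, state)
```

In practice each call to `run_segment` would be a separate batch job, with `steps_per_job` chosen so that the job finishes safely inside the queue's wallclock limit.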
Questions?