Batch Scheduling at CERN (LSF)
HEPiX Spring Meeting 2005
Tim Bell IT/FIO Fabric Services
End User Clusters

[Diagram: end users reach the interactive "lxplus" cluster via DNS-like load balancing and submit jobs through the LSF CLI or Grid UI to the "lxbatch" worker nodes; batch jobs access the disk and tape servers via rfio.]

- 950 batch servers ("lxbatch")
- 44 interactive servers ("lxplus")
- 400 disk servers
- 80 tape servers
Hardware/software of clusters

Typically we make two acquisitions per year.
Interactive login:

- 44 lxplus: dual 2.8 GHz, 2 GB memory, 80 GB disk, SLC3

Batch worker nodes (/pool is the working space):

- 85 lxbatch: dual 800 MHz, 512 MB memory, 8 GB /pool, RedHat 7
- 86 lxbatch: dual 1 GHz, 1 GB memory, 8 GB /pool, SLC3
- 538 lxbatch: dual 2.4 GHz, 1 GB memory, 40 GB /pool, SLC3
- 226 lxbatch: dual 2.8 GHz, 2 GB memory, 40 GB /pool, SLC3

On order:

- 225 lxbatch: dual 2.8 GHz, 2 GB memory, 120 GB disk, SLC3
Current CERN LSF queues

View queue information: bqueues [options] [queue-name]

```
% bqueues
QUEUE_NAME     PRIO STATUS       MAX  JL/U JL/P JL/H NJOBS  PEND  RUN SUSP
grid_dteam       20 Open:Active    -     -    -    -     0     0    0    0
grid_cms         20 Open:Active    -     -    -    -     0     0    0    0
grid_atlas       20 Open:Active    -     -    -    -   226    21  205    0
grid_lhcb        20 Open:Active    -     -    -    -     0     0    0    0
grid_alice       20 Open:Active    -     -    -    -     0     0    0    0
lcgtest          20 Open:Active    -     -    -    -     0     0    0    0
system_test      10 Open:Active    -     -    -    -     0     0    0    0
8nm               7 Open:Active    -   120    -    -  3487  3472   15    0
1nh               6 Open:Active    -   120    -    -  2218  2104  114    0
8nh               5 Open:Active    -   100    -    3  5330  5004  326    0
cmsprs            5 Open:Active   60    75    -    3    98    86   12    0
1nd               4 Open:Active    -     -    -    2  6894  6230  664    0
2nd               4 Open:Active    -     -    -    2  3212  2686  526    0
1nw               3 Open:Active    -     -    -    1  4623  4257  366    0
prod100           2 Open:Active    -   100    -    2   146   120   26    0
cmsdc04           2 Open:Active    -   600    -    2     0     0    0    0
prod200           1 Open:Active    -   250    -    2  1611  1561   50    0
prod400           1 Open:Active    -  1000    -    2  5773  5301  472    0
```
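To inspect a single queue in detail (limits, windows, scheduling parameters), the standard LSF query options can be used; for example, with the 8nh queue from the listing above (the user name below is hypothetical):

```
% bqueues -l 8nh        # long listing for one queue: limits, windows, parameters
% bqueues -u some_user  # queues a given user may submit to
```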
Queues are based on cputime requirements

- 8nm: 8 normalised minutes
- 1nh: 1 normalised hour
- 8nh: 8 normalised hours
- 1nd: 1 normalised day
- 1nw: 1 normalised week

The normalisation changes every few years as machines get faster. We recently converted to kilo-SpecInt2000 (KSI2K) units so that one CPU hour on a 2.8 GHz farm PC is almost exactly one normalised hour (they are rated at 1.037 KSI2K).

Additional low-priority production queues, for reserved users with high job concurrency, allow up to 1nw of CPU time.

Grid queues use mapped team accounts and allow 1nw of CPU time. Further refinements will be investigated.

All queues can have resource or user-group restrictions. For most users a submission only needs to name a queue (see the example below).
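A minimal sketch of submitting against these queues (the job script names are hypothetical):

```
% bsub -q 8nm short_job.sh       # expect up to ~8 normalised minutes of CPU
% bsub -q 1nd overnight_job.sh   # expect up to 1 normalised day of CPU
```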
Queue issues

The queue definitions are essentially empirical, chosen to match users' requirements for turnaround: a user can expect many short jobs per day and a few long jobs overnight.

Production queues are for low-priority work where specific turnaround is not an issue. The three queues allow higher numbers of concurrent jobs at decreasing priority.

There is a grid queue for each VO, but without any CPU-time granularity.

There is a local queue, funded by an experiment, for fast turnaround of analysis jobs.

Many experiments define subgroups of privileged users which have higher priority within the group and can run higher numbers of concurrent jobs.

We allow up to 3 shorter jobs to be scheduled per worker node, but stop at 2 if CPU load exceeds 90% (see the configuration sketch below).
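Per-node job slots and load thresholds are the kind of thing LSF configures in lsb.hosts. The fragment below is only an illustrative sketch of that mechanism, not CERN's actual configuration; a plain ut threshold suspends dispatch entirely above the limit, so the exact 3-versus-2 behaviour described above needs additional site logic:

```
# Illustrative lsb.hosts fragment -- a sketch, not CERN's actual configuration.
Begin Host
HOST_NAME    MXJ   ut     # MXJ = max job slots; ut = CPU-utilisation threshold
default       3    0.9    # stop dispatching new jobs when utilisation > 90%
End Host
```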
Resource sharing
CERN experiments and major user groups apply annually for part of the centrally funded shared resources, and can fund extra capacity to obtain a guaranteed share. Any unused shares are available to all. LSF schedules jobs firstly on a per-user-group basis, to deliver the defined shares of CPU time to each group over a rolling 12-hour period.

We have currently defined (arbitrarily) 1800 shares to match our 1800 KSI2K of capacity.

There is an allocatable capacity reserve of 300 shares (which is in practice used) that we draw on to satisfy short-term requests.
Scheduling policies
LSF supports hierarchies of user groups. Each level of the hierarchy can have a scheduling policy:

- Fairshare: used at the highest group level and by most experiments among their users. Jobs are scheduled to give each group its share on average over a certain period (currently 12+ hours). Queue priorities are ignored, so more long jobs may be started.
- FCFS (First Come, First Served): the default policy at CERN. Shorter queues are scheduled first. Within a group and queue, a single user can block others.
- Preemptive/Preemptable: used at CERN on an engineering cluster only, for parallel jobs (see the sketch below).
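Queue-level preemption is configured in lsb.queues; the fragment below is only a sketch, with hypothetical queue names:

```
# Illustrative lsb.queues fragment -- queue names are hypothetical.
Begin Queue
QUEUE_NAME = par_short
PRIORITY   = 50
PREEMPTION = PREEMPTIVE[par_long]   # jobs here may preempt par_long jobs
End Queue
```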
Current CERN shares per user group

```
[lxplus058.cern.ch] ~ > bhpart SHARE
HOST_PARTITION_NAME: SHARE
HOSTS: g_share/

SHARE_INFO_FOR: SHARE/
USER/GROUP   SHARES  PRIORITY  STARTED  RESERVED    CPU_TIME  RUN_TIME
u_prod          289    95.740        0         0        95.6         0
u_z2            100    33.333        0         0         0.0         0
zp_prod          90     1.978        5         0    109780.2     31637
u_vp             20     0.586        2         0     19698.6    109414
u_vo              1     0.333        0         0         0.0         0
harp_dydak        2     0.261        1         0       200.8      8353
u_slap           75     0.161       11         0    766725.5   1442519
u_za             16     0.138       10         0    191694.7    236536
u_DELPHI          2     0.129        1         0     33828.5     14929
u_wf              3     0.077        0         0    183887.4         0
u_LHCB          280     0.070      263         0   1268841.6  15357023
u_HARP            9     0.068       11         0    138163.6    353289
u_c3              1     0.050        2         0     20586.3     35210
zp_burst         40     0.048       77         0   1401348.9   1681989
u_xu              2     0.043       10         0     26846.6     42004
u_CMS           400     0.035      408         0  18033716.0  34445474
u_vl             24     0.024      132         0   2220209.0    908173
u_NA48            6     0.024       34         0    681000.8     85772
u_coll           75     0.023      426         0   6378126.0   4021268
others           20     0.016       23         0   1847487.4   4105005
u_yt             16     0.016      177         0   2208468.0    167261
u_l3c             5     0.016       65         0    568145.3     53442
u_vk              1     0.010       20         0    182311.5     29401
u_OPAL            2     0.009        9         0    124016.9    880953
u_COMPASS       200     0.008      502         0  42877972.0  82582955
u_zp             90     0.008      216         0   5662955.5  51679400
u_z6              2     0.003       10         0    967997.3   1925963
```
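Host-partition fairshare of this kind is defined in lsb.hosts. A minimal sketch consistent with the output above (share values copied from the table, the group list truncated for brevity):

```
# Illustrative lsb.hosts fragment -- a sketch based on the bhpart output above.
Begin HostPartition
HPART_NAME  = SHARE
HOSTS       = g_share/                 # host group covered by this partition
USER_SHARES = [u_CMS, 400] [u_prod, 289] [u_LHCB, 280] [u_COMPASS, 200] [others, 20]
End HostPartition
```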
Resource Requirement String

Most LSF commands accept a resource requirement string. It describes the resources required by a job, and is used to map the job onto execution hosts which satisfy the request.

Queue names imply a CPU-time (and real-time) limit and are enough for most users. The real-time limit is hardwired per queue at a multiple (3 to 4) of the CPU-time limit.
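For jobs that need more than a queue name, the string takes select/order/rusage clauses; a minimal sketch (the script name and values are hypothetical):

```
% bsub -q 1nd \
    -R "select[mem>500] order[ut] rusage[mem=500]" \
    analysis_job.sh
# select: only hosts with more than 500 MB of free memory
# order:  prefer hosts with the lowest CPU utilisation
# rusage: reserve 500 MB of memory for the job
```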
Users can request specific static resources

| Index  | Measures                              | Units            |
|--------|---------------------------------------|------------------|
| type   | host type (SLC3 or LINUX7)            | string           |
| model  | host model (SEIL_2400 etc.)           | string           |
| hname  | hostname (lxb0001 etc.)               | string           |
| cpuf   | CPU factor (0.852 etc.)               | relative         |
| server | host can run remote jobs              | boolean          |
| rexpri | execution priority                    | nice(2) argument |
| ncpus  | number of processors                  | integer          |
| ndisks | number of local disks                 | integer          |
| maxmem | maximum RAM memory available to users | megabytes        |
| maxswp | maximum available swap space          | megabytes        |
| maxtmp | maximum available space in /tmp       | megabytes        |
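Selecting on static resources looks like this, for example (the values and script name are hypothetical):

```
% bsub -q 8nh -R "select[type==SLC3 && ncpus==2 && maxmem>1500]" job.sh
# run only on dual-processor SLC3 hosts with more than 1.5 GB of RAM
```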
Or dynamic resources

| Index  | Measures         | Units              | Averaged over | Update interval |
|--------|------------------|--------------------|---------------|-----------------|
| status | host status      | string             | -             | 15 sec          |
| r15s   | run queue length | processes          | 15 sec        | 15 sec          |
| r1m    | run queue length | processes          | 1 min         | 15 sec          |
| r15m   | run queue length | processes          | 15 min        | 15 sec          |
| ut     | CPU utilization  | %                  | 1 min         | 15 sec          |
| pg     | paging activity  | pgin+pgout per sec | 1 min         | 15 sec          |
| ls     | logins           | users              | -             | 30 sec          |
| it     | idle time        | min                | -             | 30 sec          |
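These indices can be inspected directly with LSF's standard lsload command:

```
% lsload             # one line per host: status, r15s, r1m, r15m, ut, pg, ls, it, ...
% lsload -l lxb0001  # all load indices, including site-defined ones, for one host
```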
Other dynamic resources

| Index | Measures                    | Units  | Averaged over | Update interval |
|-------|-----------------------------|--------|---------------|-----------------|
| swp   | available swap space        | MB     | -             | 15 sec          |
| mem   | available memory            | MB     | -             | 15 sec          |
| tmp   | available space in /tmp     | MB     | -             | 120 sec         |
| io    | disk I/O to local disks     | kB/sec | 1 min         | 15 sec          |
| lftm  | seconds until next shutdown | sec    | -             | 15 sec          |
| pool  | available space on /pool    | MB     | -             | 15 sec          |
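Dynamic indices can be combined in select and order clauses; a sketch (pool and lftm are the CERN-defined indices from the table above; the values are hypothetical):

```
% bsub -q 1nh -R "select[pool>2000 && lftm>7200] order[r1m]" job.sh
# select: >2 GB free in /pool and at least 2 hours until the next shutdown
# order:  prefer the least-loaded hosts (lowest 1-minute run queue)
```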
Typical batch job requirements

Beyond the queue name, the most commonly used resource requests are the following (a combined example follows the list):

- "cpuf > factor", where current values run from 0.2 to 1.0. This avoids slower machines with long real execution times.
- "mem > MB", requesting free memory at job start. This avoids jobs running on hosts where they would be killed by our monitors or would swap excessively.
- "pool > MB", requesting current-working-directory space at job start. We kill jobs that use more than 75% of a node's work space.

The current grid UI mapping for CPU factor and pool space is unclear.
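A minimal sketch combining the three requests (the thresholds and script name are hypothetical):

```
% bsub -q 1nd -R "select[cpuf>0.5 && mem>512 && pool>5000]" job.sh
# avoid the slowest nodes; require 512 MB free memory and 5 GB of /pool space
```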
Further information

User-level information is on the web at http://cern.ch/batch, which has links to live status, accounting data and basic user guides.

Live monitoring information, which starts at the cluster level but can descend to individual nodes, is at http://cern.ch/lemon-status (internal access only).