HEPiX Karlsruhe, May 9-13, 2005. Operated by the Southeastern Universities Research Association for the U.S. Department of Energy.
Thomas Jefferson National Accelerator Facility
HEPiX Spring 2005
Batch Scheduling at JLab
Sandra Philpott
Scientific Computing Manager
Physics Computer Center
Overview of Resources
Experimental Physics: Batch Farm + Mass Storage
• Raw Data Storage
• Data Replay and Analysis
• 200 dual Xeons
http://auger.jlab.org/scicomp
Theoretical Physics: High Performance Computing (HPC) - Lattice QCD
• 3 clusters of meshed machines
• 384 GigE, 256 GigE, 128 Myrinet
• parallel jobs
http://www.jlab.org/hpc
Schedulers
• LSF – Physics Offline Reconstruction and Analysis
  • Auger, a locally developed front end
  • Tight integration with JASMine, our mass storage system
  • Consider PBS in time for Hall D and GlueX?
    • Cost savings, compatibility with HPC
  • jsub user command

• OpenPBS – Lattice QCD parallel computing
  • Torque
  • UnderLord, a locally developed scheduler
    • Also provides trend analysis, projections, graphs, etc.
  • Considering Maui as a replacement for UnderLord
  • qsub user command
Queue Configuration - LSF
• Production – bulk of the jobs
• Priority – quick jobs – less than 30 min.
• Low priority – intended for simulations
• Idle – typically mprime
• Maintenance – for SysAdmin
| Queue | Priority | Policy | Preempt | Time Opt | Time (min.) |
|---|---|---|---|---|---|
| Priority | 100 | FIFO | No | Yes | 30 |
| Production | 80 | Fairshare | No | No | - |
| Low Priority | 5 | Fairshare/RR | No | Yes | 2880 |
| Idle | 1 | FIFO | Yes | No | - |
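The dispatch order implied by the queue table can be sketched as follows. This is an illustrative model of the policy described on the slide, not LSF itself; the queue attributes come from the table, while the `next_queue` helper is an assumption about how dispatch would pick among non-empty queues:

```python
# Queue definitions taken from the slide's table; higher priority dispatches first.
QUEUES = [
    {"name": "Priority",     "priority": 100, "policy": "FIFO",         "preemptable": False},
    {"name": "Production",   "priority": 80,  "policy": "Fairshare",    "preemptable": False},
    {"name": "Low Priority", "priority": 5,   "policy": "Fairshare/RR", "preemptable": False},
    {"name": "Idle",         "priority": 1,   "policy": "FIFO",         "preemptable": True},
]

def next_queue(pending):
    """Return the highest-priority queue name that has pending jobs.

    `pending` maps queue name -> number of waiting jobs.
    """
    candidates = [q for q in QUEUES if pending.get(q["name"], 0) > 0]
    return max(candidates, key=lambda q: q["priority"])["name"] if candidates else None

print(next_queue({"Production": 5, "Idle": 2}))  # Production
```

With this ordering, Idle jobs (e.g. mprime) only run when nothing else is waiting, and are marked preemptable so real work can displace them.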
Queue Configuration - PBS
Batch Queue Names:
• 2m: Master@qcd2madm
• 3g: Master@qcd3gadm
• 4g: Panel01@qcd4gadm, Panel23@qcd4gadm, Panel45@qcd4gadm
Queue & Machine Limits:
• 2m: 24 hours, 8 GB /scratch, 256 MB memory
• 3g: 24 hours, 20 GB /scratch, 256 MB memory
• 4g: 24 hours, 20 GB /scratch, 512 MB memory
• Jobs that use the most nodes have the highest priority
• UnderLord scheduling policy defined by Admin
• Job Age, Job Duration, Queue Priority, User Share, User Priority
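A scheduler like UnderLord that combines these factors typically ranks jobs by a weighted score. The sketch below is a hypothetical weighted sum, not UnderLord's actual formula; the weights, the function name, and the exact combination are assumptions, but it captures the slide's rule that larger node requests rank higher:

```python
def job_score(age_hours, req_nodes, queue_priority, user_share,
              weights=(1.0, 2.0, 1.0, 1.0)):
    """Hypothetical priority score: higher runs first.

    Combines job age, requested node count, queue priority, and the
    user's recent share of the machine (heavy users are penalized).
    """
    w_age, w_nodes, w_queue, w_share = weights
    return (w_age * age_hours + w_nodes * req_nodes
            + w_queue * queue_priority - w_share * user_share)

# A 64-node job outranks an 8-node job submitted at the same time:
assert job_score(0, 64, 10, 0) > job_score(0, 8, 10, 0)
```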
Sample Script – LSF
```
JOBNAME: job2
PROJECT: clas
COMMAND: /home/user/test/job2.script
OPTIONS: -debug
OS: solaris
INPUT_FILES: /mss/clas/raw/run1001.data
  /mss/clas/raw/run1002.data
  /mss/clas/raw/run1003.data
INPUT_DATA: fort.11
OTHER_FILES: /home/xxx/yyy/exp.database
TOTAPE
OUTPUT_DATA: recon.run100
OUTPUT_TEMPLATE: /mss/clas/prod1/OUTPUT_DATA
```
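The Auger command file is a simple `KEY: value` format where some keys (like `INPUT_FILES`) continue over several lines and a bare uppercase line (like `TOTAPE`) acts as a flag. A minimal parser sketch for this format, with the field semantics inferred from the example rather than from Auger documentation:

```python
def parse_jsub(text):
    """Parse a jsub-style command file into {key: [values]}.

    Lines with a colon start a new field; colon-less lines either
    continue the previous field (e.g. extra input files) or, if fully
    uppercase, act as a standalone flag such as TOTAPE.
    """
    fields, last_key = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            fields.setdefault(last_key, []).append(value.strip())
        elif line.isupper():
            fields[line] = [True]          # bare flag line
        elif last_key:
            fields[last_key].append(line)  # continuation line
    return fields
```

For example, `parse_jsub("JOBNAME: job2")["JOBNAME"]` yields `["job2"]`, and multi-line `INPUT_FILES` entries accumulate into one list.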
Sample Script – PBS
```csh
#! /bin/csh -f

setenv DEPEND ""
if ($#argv > 0) then
    setenv DEPEND "-W depend=afterok"
    foreach job ($argv)
        setenv DEPEND "${DEPEND}:$job"
    end
endif

qsub \
    -c n \
    -m ae -M [email protected] \
    -l nodes=64:ppn=2,walltime=30 \
    -v SLEEPTIME=30 \
    -N MPI_CPI_Test \
    -p 1 \
    -q Master@qcdadm01 ${DEPEND} \
    /home/akers/TestJobs/MPI/cpi.csh
```
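The csh loop above builds PBS's `afterok` dependency string by appending each job id after a colon. The same construction, expressed as a small Python helper for clarity (the function name is ours; the `-W depend=afterok:<id>:<id>` syntax is standard PBS):

```python
def afterok(job_ids):
    """Build the qsub dependency option from a list of job ids.

    The submitted job runs only after all listed jobs finish
    successfully; an empty list yields no dependency flags.
    """
    if not job_ids:
        return ""
    return "-W depend=afterok:" + ":".join(str(j) for j in job_ids)

print(afterok(["123.qcdadm01", "124.qcdadm01"]))
# -W depend=afterok:123.qcdadm01:124.qcdadm01
```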
Resource Utilization
ExpPhy
• Efficient data flow - prestaging of data before jobs are admitted to the farm
• Data spread over multiple file servers transparently
• Keeps batch farm CPUs 100% busy; no waiting on data to arrive
• Workaround: users request a specific resource to imply the newer systems with more memory, e.g. DISK_SPACE: 125 GB
HPC/Lattice
• Jobs may have an optimal resource spec but can use other configs if the optimal one is not available
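That fallback idea can be sketched as a preference-ordered search: try the optimal node-count on the preferred cluster first, then acceptable alternatives. The cluster names echo the queue names above, but the free-node counts and the helper itself are illustrative assumptions:

```python
def choose_config(specs, free_nodes):
    """Pick the first runnable configuration from a preference list.

    specs: list of (cluster, nodes) pairs in preference order.
    free_nodes: mapping of cluster name -> currently free node count.
    Returns the chosen (cluster, nodes), or None to keep waiting.
    """
    for cluster, nodes in specs:
        if free_nodes.get(cluster, 0) >= nodes:
            return cluster, nodes
    return None  # nothing fits yet; job waits in the queue

print(choose_config([("4g", 128), ("3g", 64)], {"4g": 32, "3g": 100}))
# ('3g', 64)
```

Here the optimal 128-node slot on the 4g cluster is unavailable, so the job falls back to a 64-node configuration on 3g instead of waiting.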
Summary
We would like:
• Common job submission for users
  • for both experimental and LQCD jobs
  • for both experimental and LQCD clusters
  • for grid jobs
• A common set of resource descriptors; the user specifies only the ones required
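One way to realize "specify only the ones required" is a descriptor set with defaults, where a request overrides only what it names. The descriptor names and defaults below are illustrative, not the actual RDL schema:

```python
# Hypothetical common resource descriptors with site defaults.
DEFAULTS = {"os": "linux", "cpus": 1, "memory_mb": 256,
            "disk_gb": 8, "walltime_min": 1440}

def build_request(**user_spec):
    """Merge a partial user spec over the defaults, rejecting unknown keys."""
    unknown = set(user_spec) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown descriptors: {sorted(unknown)}")
    return {**DEFAULTS, **user_spec}

print(build_request(cpus=128, memory_mb=512))
# {'os': 'linux', 'cpus': 128, 'memory_mb': 512, 'disk_gb': 8, 'walltime_min': 1440}
```

A user submitting a small analysis job names nothing and gets the defaults; an LQCD job names only `cpus` and `memory_mb`.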
We are collaborating with STAR at BNL on RDL – Request Description Language:
http://www.star.bnl.gov/STAR/comp/Grid/scheduler/rdl/index.html
We will soon become an Open Science Grid site:
http://www.opensciencegrid.org