When and How to Use Large-Scale Computing: CHTC and HTCondor
Transcript of When and How to Use Large-Scale Computing: CHTC and HTCondor
When and How to Use Large-Scale Computing: CHTC and HTCondor
Lauren Michael, Research Computing Facilitator
Center for High Throughput Computing
STAT 692, November 15, 2013
› Why to Access Large-Scale Computing Resources
› CHTC Services and Campus-Shared Computing
› What is High-Throughput Computing (HTC)?
› What is HTCondor and How Do You Use It?
› Maximizing Computational Throughput
› How to Run R on Campus-Shared Resources
Topics We’ll Cover Today
2
1. your computing work won’t run at all on your computer(s) (they lack sufficient RAM, disk, etc.)
2. your computing work will take too long on your own computer(s)
3. you would like to off-load certain processes in favor of running others on your computer(s)
When should you use outside computing resources?
3
› Center for High Throughput Computing, est. 2006
› Large-scale, campus-shared computing systems
  - high-throughput computing (HTC) grid and high-performance computing (HPC) cluster
  - all standard services provided free of charge
  - automatic access to the national Open Science Grid (OSG)
  - hardware buy-in options for priority access
  - information about other computing resources
› Support for using our systems
  - consultation services, training, and proposal assistance
  - solutions for numerous software packages (including Python, Matlab, R)
CHTC Services
4
HTCondor: CHTC’s R&D Arm
› R&D for HTCondor and other HTC software
› Services provided to the campus community
  - HTC Software
    • HTCondor: manage your compute cluster
    • DAGMan: manage computing workflows
    • Bosco: submit locally, run globally
  - Software Engineering Expertise & Consulting
    • CHTC-operated Build-and-Test Lab (BaTLab)
  - Software Security Consulting
Your Problems become Our Research!
Quick Facts             Jul’10-Jun’11   Jul’11-Jun’12   Jul’12-Jun’13
Million Hours Served         45              70              97
Research Projects            54             106             120
Departments                  35              52              52
Off-Campus Projects          10              13              15
Researchers who use the CHTC are located all over campus (red buildings)
http://chtc.cs.wisc.edu
Director: Miron Livny, [email protected] (also OSG Technical Director and WIDs CTO)
Campus Support: [email protected]
+ Research Computing Facilitators
  › Lauren Michael (lead), [email protected]
+ Systems Administrators
+ 4-8 Part-time Students
HTCondor Development Team
OSG Software Team
CHTC Staff
7
› high-throughput computing (HTC)
  - many independent processes that can each run on 1 or a few processors (“cores” or “threads”) on the same computer
  - mostly standard programming methods
  - best accelerated by: access to as many cores as possible
› high-performance computing (HPC)
  - sharing the workload of interdependent processes over multiple cores to reduce overall compute time
  - OpenMP and MPI programming methods, or multi-threading
  - requires: access to many servers of cores within the same tightly-networked cluster; access to shared files
HTC versus HPC
8
› essentially means: spread computing work out over multiple processors
› Use of the words “parallel” and “parallelize” can apply to HTC or HPC when referring to programs
› It’s important to be clear!
“parallel” is confusing
9
› Why to Access Large-Scale Computing Resources
› CHTC Services and Campus-Shared Computing
› What is High-Throughput Computing (HTC)?
› What is HTCondor and How Do You Use It?
› Maximizing Computational Throughput
› How to Run R on Campus-Shared Resources
Topics We’ll Cover Today
10
› match-maker of computing work and computers
› a “job scheduler”
  - matches are made based upon necessary RAM, CPUs, disk space, etc., as requested by the user
  - jobs re-run if interrupted
› works beyond “clusters” to coordinate distributed computers for maximum throughput
› coordinates data transfers between users and distributed computers
› can coordinate servers, desktops, and laptops
What is HTCondor?
11
How HTCondor Works

[Diagram: a Submit Node (where jobs are submitted) holds the queue (job1.1 user1, job1.2 user1, job2.1 user2). The Central Manager of the pool matches each Job ClassAd against the Machine ClassAds of the Execute Nodes (where jobs run); input files flow from the submit node to the execute nodes, and output flows back.]

12
Submit host            CS Pool   CHTC Pool   Campus Grid   Open Science Grid
Stat dept servers      default
simon.stat.wisc.edu              default
CHTC submit nodes                default     flocking      “glidein”
Submit nodes available to YOU
14
› Prepare programs and files
› Write submit file(s)
› Submit jobs to the queue
› Monitor the jobs
› (Remove bad jobs)
Basic HTCondor Submission
15
› Make programs portable
  - compile code to a simple binary
  - statically link code dependencies
  - consider CHTC’s tools for packaging Matlab, Python, and R
› Consider using a shell script (or other “wrapper”) to run multiple commands for you
  - create a local install of software
  - set environment variables
  - then, run your code
› Stage all files on a submit node
Preparing Programs and Files
16
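A wrapper of that shape might look like the following sketch. Every name here (myprog, scratch, wrapper_test.out) is illustrative, and a stand-in command replaces the real program so the sketch is self-contained; HTCondor would run such a script as the job’s executable:

```shell
#!/bin/sh
# Hypothetical job wrapper: prepare the environment, then run the code.
set -e

# 1. Create a local install of software shipped with the job,
#    e.g.: tar xzf myprog.tar.gz   (skipped in this self-contained sketch)

# 2. Set environment variables the program needs
MYPROG_TMPDIR="$PWD/scratch"
export MYPROG_TMPDIR
mkdir -p "$MYPROG_TMPDIR"

# 3. Then, run your code (a stand-in command here, e.g. ./myprog input.in)
echo "ran with MYPROG_TMPDIR=$MYPROG_TMPDIR" > wrapper_test.out
```

Any “new” files the wrapper leaves in the working directory are copied back when the job exits, so its outputs land next to the job’s logs on the submit node.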
1. Cut up computing work into many independent pieces
(CHTC can consult)
2. Make programs portable, minimize dependencies
(CHTC can consult, or may have prepared solutions)
3. Learn how to submit jobs (CHTC can help you a lot!)
4. Maximize your overall throughput on available computational resources
(CHTC can help you a lot!)
HTC Components
17
    # This is a comment
    universe = vanilla
    output = process.out
    error = process.err
    log = process.log
    executable = cosmos
    arguments = cosmos.in 4
    should_transfer_files = YES
    transfer_input_files = cosmos.in
    when_to_transfer_output = ON_EXIT
    request_memory = 100
    request_disk = 100000
    request_cpus = 1
    queue
Basic HTCondor Submit File
18
› basic jobs are vanilla universe
› executable is your single program or a shell script
› log is where HTCondor stores info about how your job ran
› output and error are where system output and error will go
› The program will be run as: ./cosmos cosmos.in 4
› queue with no number after it will submit only one job
› memory in MB and disk in KB
    # This is a comment
    universe = vanilla
    output = process.out
    error = process.err
    log = process.log
    executable = cosmos
    arguments = cosmos.in 4
    should_transfer_files = YES
    transfer_input_files = cosmos.in
    when_to_transfer_output = ON_EXIT
    request_memory = 100
    request_disk = 100000
    request_cpus = 1
    queue
Basic HTCondor Submit File
19
Initial File Organization

In folder test/:
    cosmos
    cosmos.in
    submit.txt
    # This is a comment
    universe = vanilla
    output = $(Process).out
    error = $(Process).err
    log = $(Cluster).log
    executable = cosmos
    arguments = cosmos_$(Process).in
    should_transfer_files = YES
    transfer_input_files = cosmos_$(Process).in
    when_to_transfer_output = ON_EXIT
    request_memory = 100
    request_disk = 100000
    request_cpus = 1
    queue 3
HTCondor Multi-Job Submit File
20
    test/
      cosmos
      cosmos_0.in
      cosmos_1.in
      cosmos_2.in
      submit.txt
    # This is a comment
    universe = vanilla
    InitialDir = $(Process)
    output = $(Process).out
    error = $(Process).err
    log = /home/user/test/$(Cluster).log
    executable = /home/user/test/cosmos
    arguments = cosmos.in
    should_transfer_files = YES
    transfer_input_files = cosmos.in
    when_to_transfer_output = ON_EXIT
    request_memory = 100
    request_disk = 100000
    request_cpus = 1
    queue 3
HTCondor Multi-Folder Submit File
21
    test/
      cosmos
      cosmos.in
      submit.txt
      0/ cosmos.in
      1/ cosmos.in
      2/ cosmos.in
Submitting Jobs
22
    [lmichael@simon test]$ condor_submit submit.txt
    Submitting job(s)...
    3 job(s) submitted to cluster 29747.
    [lmichael@simon test]$
Checking the Queue
23
    [lmichael@simon test]$ condor_q lmichael

    -- Submitter: simon.stat.wisc.edu : <144.92.142.159:9620?sock=3678_5c57_3> : simon.stat.wisc.edu
     ID       OWNER     SUBMITTED     RUN_TIME  ST PRI SIZE CMD
    29747.0   lmichael  2/15 09:06  0+00:01:34  R  0   9.8  cosmos cosmos.in
    29747.1   lmichael  2/15 09:06  0+00:00:00  I  0   9.8  cosmos cosmos.in
    29747.2   lmichael  2/15 09:06  0+00:00:00  I  0   9.8  cosmos cosmos.in

    3 jobs; 0 completed, 0 removed, 2 idle, 1 running, 0 held, 0 suspended
    [lmichael@simon test]$
View all user jobs in the queue: condor_q
Log Files
24
    000 (29747.001.000) 02/15 09:29:17 Job submitted from host: <144.92.142.159:9620?sock=3678_5c57_3>
    ...
    001 (29747.001.000) 02/15 09:33:59 Job executing on host: <144.92.142.153:9618?sock=17172_f1f3_3>
    ...
    005 (29747.001.000) 02/15 09:39:01 Job terminated.
        (1) Normal termination (return value 0)
            Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
            Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
            Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
            Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
        0  -  Total Bytes Sent By Job
        0  -  Total Bytes Received By Job
        Partitionable Resources :    Usage  Request  Allocated
           Cpus                 :                 1          1
           Disk (KB)            :   225624   100000     645674
           Memory (MB)          :       85     1000       1024
› Remove a single job: condor_rm 29747.0
› Remove all jobs of a cluster: condor_rm 29747
› Remove all of your jobs: condor_rm lmichael
Removing Jobs
25
› Why to Access Large-Scale Computing Resources
› CHTC Services and Campus-Shared Computing
› What is High-Throughput Computing (HTC)?
› What is HTCondor and How Do You Use It?
› Maximizing Computational Throughput
› How to Run R on Campus-Shared Resources
Topics We’ll Cover Today
26
› The Philosophy of HTC› The Art of HTC› Other Best-Practices
Maximizing Throughput
27
› break up your work into many ‘smaller’ jobs
  - single CPU, short run times, small input/output data
› run on as many processors as possible
  - single CPU and low RAM needs
  - take everything with you; make programs portable
  - use the “right” submit node for the right “resources”
› automate as much as you can
› (share your processors with others to increase everyone’s throughput)
The Philosophy of HTC
28
› Edgar Spalding: studies effect of gene on plant growth outcomes
› GeoDeepDive Project: extracts and compiles “dark data” from PDFs of publications in the Geosciences
We want HTC to revolutionize your research!
Success Stories
29
carrying out the philosophy, well
› Tuning job requests for memory and disk
› Matching run times to the maximum number of available processors
› Automation
The Art of HTC
30
Problem: Don’t know what your job needs?
› If you don’t ask for enough memory and disk:
  - your jobs will be kicked off for going over, and will have to be retried (though HTCondor will automatically request more for you)
› If you ask for too much:
  - your jobs won’t match to as many available “slots” as they could
Tuning Job Resource Requests
31
Solution: Testing is Key!!!
1. Run just a few jobs at first to determine memory and disk needs from log files
   - If your first request is not enough, HTCondor will retry the jobs and request more until they finish.
   - It’s okay to request a lot (1 GB each) for a few tests.
2. Change the “request” lines to a better value
3. Submit a large batch
Tuning Job Resource Requests
32
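As an illustration, the log excerpt on the earlier slide reported 85 MB of memory and roughly 226,000 KB of disk actually used, so a tuned (but still padded) request might read as follows; the exact values are a sketch, with memory in MB and disk in KB:

```
request_memory = 100
request_disk = 300000
```

Each value sits a bit above the observed usage, leaving headroom without shrinking the set of slots your jobs can match.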
Submit host            CS Pool (4 hrs?)   CHTC Pool <24 hrs (up to 72)*   Campus Grid <4 hrs     Open Science Grid <2 hrs
Stat dept servers      default
simon.stat.wisc.edu                       default
CHTC submit nodes                         default                         +WantFlocking = true   +WantGlidein = true
Time-Matching (submit file additions)
33
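In a submit file, those additions are literal extra lines alongside the request_* lines. A sketch, with the flags as given in the table above and the comments as assumptions about typical use:

```
# allow jobs to flock to the UW Campus Grid (keep run times under ~4 hrs):
+WantFlocking = true
# additionally allow glidein to the Open Science Grid (under ~2 hrs):
+WantGlidein = true
```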
› Problem: jobs shorter than 5 minutes are bad for overall throughput
  - more time is spent on matching and data transfers than on your job’s processes
  - the ideal run time is between 5 minutes and 2 hours (OSG)
› Solution: use a shell script (or other method) to run multiple processes within a single job
  - avoids transfer of intermediate files between sequential, related processes
  - debugging can be a bit trickier
Time-Tuning: Batching
34
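A batching wrapper of that kind might look like this sketch, where `cat` stands in for the real per-task program and the task files are generated so the example is self-contained:

```shell
#!/bin/sh
# Run several short tasks inside one HTCondor job, so the job lasts
# minutes instead of seconds. File names here are illustrative.
set -e

# Stand-in inputs; in a real job these arrive via transfer_input_files.
for i in 1 2 3; do
    echo "data $i" > "task_$i.in"
done

: > batched_results.out
for infile in task_*.in; do
    # Replace 'cat' with your real program, e.g.: ./cosmos "$infile"
    cat "$infile" >> batched_results.out
done
```

Only `batched_results.out` (plus logs) then travels back to the submit node, rather than intermediate files from each short task.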
› The best way to run longer jobs without losing progress to eviction.
Two ways:
1. Compile your code with condor_compile and use the “standard” universe within HTCondor
2. Implement self-checkpointing
*Consult HTCondor’s online manual or contact the CHTC for help
Time-Tuning: Checkpointing
35
› Use $(Process)
› Shell scripts to run multiple tasks within the same job
  - including environment preparation
› Hardcode arguments, calculate them (random number generation), or use parameter files/tables
› Use HTCondor’s DAGMan feature
  - “directed acyclic graph”
  - create complex workflows of dependent jobs, and submit them all at once
  - additional helpful features: success checks and more
Automate Tasks
36
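For a taste of the syntax, a minimal DAGMan input file describing two dependent jobs might read as follows; the submit-file names are illustrative:

```
# my.dag: run job A, then job B only after A succeeds
JOB A generate.sub
JOB B analyze.sub
PARENT A CHILD B
# retry B up to 2 times if it exits with an error
RETRY B 2
```

The whole workflow is then submitted at once with `condor_submit_dag my.dag`, and DAGMan itself runs as a job on the submit node, releasing B only when A completes successfully.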
Remember that you are sharing with others
› “Be Kind to Your Submit Node”
  - avoid transfers of large files through the submit node (large: >10 GB per batch; ~10 MB/job x 1000+ jobs)
    • transfer files from another server as part of your job (wget and curl)
    • compress where appropriate; delete unnecessary files
    • remember: “new” files are copied back to submit nodes
  - avoid running multiple CPU-intensive executables
› Test all new batches, and scale up gradually
  - 3 jobs, then 100s, then 1000s, then …
Non-Throughput Considerations
37
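The compress-and-delete advice can be the last step of a job wrapper. A self-contained sketch (the directory and file names are illustrative, and the results directory is created here only so the example runs as-is):

```shell
#!/bin/sh
set -e
# Stand-in for results a real job would already have produced:
mkdir -p out_dir
echo "result" > out_dir/result.txt

# Compress everything worth keeping into one small archive...
tar czf results.tar.gz out_dir
# ...and delete the uncompressed copies, so only the archive counts
# as a "new" file to be copied back to the submit node.
rm -rf out_dir
```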
› Why to Access Large-Scale Computing Resources
› CHTC Services and Campus-Shared Computing
› What is High-Throughput Computing (HTC)?
› What is HTCondor and How Do You Use It?
› Maximizing Computational Throughput
› How to Run R on Campus-Shared Resources
Topics We’ll Cover Today
38
› Problem: R programs don’t easily compile to a binary
› Solution: take R with your job!
› CHTC has tools just for R (and Python, and Matlab)
› Installed on CS/Stat submit nodes, simon, and CHTC submit nodes
Running R on HTC Resources: The Best Way
39
› Copy your R code and any R library tar.gz files to the submit node
› Run the following command:

    chtc_buildRlibs --rversion=sl5-R-2.10.1 \
      --<library1>.tar.gz,<library2>.tar.gz

› R versions supported: 2.10.1, 2.13.1, 2.15.1 (use the closest version below yours)
› Get back sl5-RLIBS.tar.gz and sl6-RLIBS.tar.gz (you’ll use these in the next step)
1. Build R Code with chtc_buildRlibs
41
› download ChtcRun.tar.gz, according to the guide (wget)
› un-tar it: tar xzf ChtcRun.tar.gz
› View ChtcRun contents:

    process.template   (submit file template)
    mkdag              (script that will ‘create’ jobs based upon your staged data)
    Rin/               (example data staging folder)
2. Download the “ChtcRun” Package
43
› Stage data as such:

    ChtcRun/
      data/
        1/  input.in  <specific_files>
        2/  input.in  <specific_files>
        3/  input.in  <specific_files>
        4/  input.in  <specific_files>
      shared/  <RLIBS.tar.gz>  <program>.R  <shared_files>

› Modify process.template with respect to:
  - request_memory and request_disk, if you know them
  - +WantFlocking = true OR +WantGlidein = true
3. Prepare data and process.template
44
› In ChtcRun, execute the mkdag script
  - (Examples at the top of “./mkdag --help”)

    ./mkdag --data=Rin --outputdir=Rout \
      --cmdtorun=soartest.R --type=R \
      --version=R-2.10.1 --pattern=meanx

  - “pattern” indicates a portion of a filename that you expect to be created by successful completion of any single job
› A successful mkdag run will instruct you to navigate to the ‘outputdir’, and submit the jobs as a single DAG:

    condor_submit_dag mydag.dag
4. Run mkdag and submit jobs
45
› Check jobs in the queue as they’re gradually added and completed (condor_q)
› Check other files in your ‘outputdir’:

    Rout/
      mydag.dag.dagman.out   (updated table of job stats)
      1/  process.log  process.out,err  ChtcWrapper1.out
      2/  process.log  process.out,err  ChtcWrapper2.out
      …/

› After testing a small number of jobs, submit many! (up to many 10,000s; the number submitted is throttled for you)
5. Monitor Job Completion
46
1. Use a Stat server to submit shorter jobs to the CS pool.
2. Obtain access to simon.stat.wisc.edu from Mike Camilleri ([email protected]), and submit longer jobs to the CHTC Pool.
3. Meet with the CHTC to submit jobs to the entire UW Grid and to the national Open Science Grid.
   - chtc.cs.wisc.edu, click “Get Started”

User support for HTCondor users at UW: [email protected]
What Next?
47