Running Batch Jobs in R: How to deal with coarsely parallel problems
WEALTH FROM OCEANS NATIONAL RESEARCH FLAGSHIP
Malcolm Haddon
May 2014
| Batch Jobs in R | Haddon
Computer Intensive
• Many, many, many iterations:
   • Management Strategy Evaluation
   • Markov Chain Monte Carlo (MCMC)
   • Lots of replicates of any analyses
• Large scale simulations:
   • multi-species,
   • multi-populations,
   • multi-'etc'
• Any computing job that takes a long time or uses a lot of computing resources
Why the Fuss?
• Solving BIG computing problems has its own strategies.
• If a job:
   • takes a very long time, or
   • uses very large amounts of RAM
• Then how can it be split up most effectively?
   • Depends on the scale at which processes are independent.
   • May need trials to find the best compromise.
Coarsely Parallel Processes
• Not talking about finely parallel processes such as cellular models in Oceanography or visualization.
   • The use of GPUs containing thousands of small processors is ideally suited to such analyses.
   • Some emphasis on this with the CSIRO clusters (Bragg, etc.) and the Advanced Scientific Computing program.
• Instead: focussed on serial and sequential problems where analysis order is important.
   • Population processes
   • Many biological processes
• Cannot split up time-series trajectories, but can treat each trajectory as a different process (coarsely parallel).
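The distinction above can be sketched in a few lines of R. This is only an illustration under stated assumptions: `simulate_trajectory()` and its simple logistic model, parameter values and seeds are hypothetical, not the abalone model used in the talk. Within a trajectory each year depends on the previous one, so that loop stays serial; across replicates nothing is shared, so they can be farmed out.

```r
# Hypothetical one-population model: each year depends on the previous year,
# so the loop over years must run in order.
simulate_trajectory <- function(seed, years = 40, B0 = 1000, r = 0.3, harvest = 0.1) {
  set.seed(seed)                      # one seed per replicate keeps runs reproducible
  biomass <- numeric(years)
  biomass[1] <- B0
  for (yr in 2:years) {
    prev <- biomass[yr - 1]
    growth <- r * prev * (1 - prev / B0)          # logistic surplus production
    noise <- exp(rnorm(1, mean = 0, sd = 0.1))    # lognormal process error
    biomass[yr] <- max((prev + growth) * noise * (1 - harvest), 0)
  }
  biomass
}

# Replicates do not depend on one another, so they can be run in any order,
# in separate R sessions, or on separate machines.
trajectories <- sapply(1:10, simulate_trajectory)
dim(trajectories)   # years by replicates
```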
Alternative Approaches to Simulation
[Diagram: two ways to apply 8 Harvest Strategies to an abalone fishery over 40 years with 1000 replicates (8 x 1000)]
• Single job: for (HS in 1:8) { for (iter in 1:1000) { } }, then plot and tabulate results.
• Split the job into 8 parts: each part runs for (iter in 1:1000) { } and stores its results; the next steps combine the stored results, then plot and tabulate them.
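A minimal sketch of the split, store and combine pattern in the diagram. The functions `run_replicate()` and `run_part()`, the file names and the placeholder statistic are all hypothetical; here the 8 parts run one after another in a single session purely to show the bookkeeping.

```r
# Hypothetical stand-in for one replicate of the 40-year projection.
run_replicate <- function(HS, iter, years = 40) {
  set.seed(1000 * HS + iter)          # distinct, reproducible seed per replicate
  mean(rnorm(years, mean = HS))       # placeholder summary statistic
}

# One part of the split job: all replicates for a single harvest strategy,
# saved to its own file so that parts can run in separate R sessions.
run_part <- function(HS, reps, resultdir) {
  res <- data.frame(HS = HS, iter = seq_len(reps),
                    value = vapply(seq_len(reps),
                                   function(iter) run_replicate(HS, iter),
                                   numeric(1)))
  outfile <- file.path(resultdir, sprintf("results_HS%d.csv", HS))
  write.csv(res, outfile, row.names = FALSE)
  outfile
}

resultdir <- tempfile("simab_")       # temporary directory, for illustration only
dir.create(resultdir)

# Here the parts run one after another; in a batch setting each call would
# be made by a different R process.
files <- vapply(1:8, run_part, character(1), reps = 20, resultdir = resultdir)

# Combine step: read every part and stack the results before plotting/tabulating.
combined <- do.call(rbind, lapply(files, read.csv))
table(combined$HS)
```

Writing one file per part means a failed part can be re-run on its own without repeating the others.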
The R program
batchsimab.r
• source("Lots of Functions")
• source("Constants")
• setwd(resultdir)
• read in Data
• source("run_specification")
• write to csv file(s)
• write to Rdata files
• plots to tiff/pdf/etc
Top Level: runbatch.R – contains:
## SET PARAMETERS AS DESIRED IN
## runspecification.R and constants.R
wkdir <- "C:/A_CSIRO/Rcode/abalone/SimAb"
setwd(wkdir)   ## points to directory containing batchsimab.r
command <- "R.exe --vanilla < batchsimab.R"
shell(command, wait=FALSE)
## (R.exe must be on the path).
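The slide uses shell(), which is only available in R on Windows. As a hedged alternative (not the approach shown in the talk), base R's system2() together with the Rscript executable found through R.home("bin") launches a child R process on any platform and does not need R on the PATH. The script name `demo_batch.R` and its contents are hypothetical stand-ins for batchsimab.R.

```r
# Write a tiny stand-in for batchsimab.R into a temporary directory.
wkdir <- tempfile("batchdemo_")
dir.create(wkdir)
script <- file.path(wkdir, "demo_batch.R")     # hypothetical script name
outfile <- file.path(wkdir, "demo_result.txt")
writeLines(c(
  "args <- commandArgs(trailingOnly = TRUE)",
  "writeLines('batch job finished', args[1])"
), script)

# Locate the Rscript executable that belongs to the running R installation,
# so the launch does not depend on R being on the PATH.
rscript <- file.path(R.home("bin"), "Rscript")

# wait = TRUE here so the result can be checked immediately;
# wait = FALSE returns at once and lets several jobs run side by side.
status <- system2(rscript,
                  args = c("--vanilla", shQuote(script), shQuote(outfile)),
                  wait = TRUE)
readLines(outfile)
```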
Top Level: runbatch.R – contains:
## SET PARAMETERS AS DESIRED IN
## RunSpecification.R and constants.R
primaryloop <- c(val1, val2, val3,..)
for (toplevel in 1:length(primaryloop)) {
   sink("RunSpecification.R")
   …
   …
   sink()
   command <- "R.exe --vanilla < batchsimab.R"
   shell(command, wait=FALSE)
}
## Can re-write values in RunSpecification.R
pickLML <- c(127,132,138,145)
for (pick in 1:length(pickLML)) {
   filename <- "alt_runspecification.r"
   sink(filename)
   cat("##Select the HCR \n")
   cat("StepH <- FALSE \n")
   cat("ConstH <- TRUE \n")
   cat("## Define the Scenarios \n")
   cat("initDepl_L <- c(0.7) \n")
   cat("inH_L <- c(0.1) \n")
   cat("origTAC <- 150.0 \n")
   cat(paste("LML <- ",pickLML[pick],sep="") ," \n")
   cat("reps <- 100 \n")
   sink()
   command <- "R.exe --vanilla < batchsimab.R"
   shell(command, wait=FALSE)
   Sys.sleep(5.0)
}
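The loop above rewrites a single specification file for every run, so each child job has to read it before the next pass overwrites it. One possible variation, sketched here as an assumption rather than as the author's method, is to write one specification file per run with writeLines(). The helper `make_runspec()` and the per-run file names are hypothetical, and the batch script would then need to be told which file to source.

```r
# Build the lines of one run specification; variable names follow the slide,
# the function itself is a hypothetical helper.
make_runspec <- function(LML, reps = 100) {
  c("## Select the HCR",
    "StepH <- FALSE",
    "ConstH <- TRUE",
    "## Define the Scenarios",
    "initDepl_L <- c(0.7)",
    "inH_L <- c(0.1)",
    "origTAC <- 150.0",
    paste0("LML <- ", LML),
    paste0("reps <- ", reps))
}

specdir <- tempfile("runspec_")       # temporary directory, for illustration only
dir.create(specdir)

pickLML <- c(127, 132, 138, 145)
specfiles <- file.path(specdir, sprintf("runspecification_LML%d.R", pickLML))
for (pick in seq_along(pickLML)) {
  writeLines(make_runspec(pickLML[pick]), specfiles[pick])
}

# Check one file by sourcing it into its own environment.
spec <- new.env()
sys.source(specfiles[3], envir = spec)
spec$LML   # 138
```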
alt_runspecification.r - contents
batch <- TRUE
##Select the HCR
StepH <- FALSE
ConstH <- TRUE
## Define the Scenarios
initDepl_L <- c(0.7)
inH_L <- c(0.1)
origTAC <- 150.0
LML <- 138
reps <- 100
Alternative Approach
Not that useful for coarsely parallel problems, but excellent for finely parallel processes.
Alternative Approaches
• Can use one's own desktop or laptop.
• Can use a secondary machine (remote login).
• Can use a CSIRO cluster machine (bragg for Linux or bragg-w for Windows, plus others).
• Clusters are very effective for finely parallel work but less so for coarsely parallel jobs.
• Can use Condor, which harvests CPU time on remote machines on the network automatically.
• wiki.csiro.au/display/ASC/Scientific+Computing+Homepage
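For the desktop or laptop option, another route (not covered in the slides) is the parallel package that ships with R, which starts worker R sessions on the local machine and hands each one a share of the work. A minimal sketch, assuming two workers are available; `run_strategy()` is a hypothetical placeholder for one harvest strategy's replicates.

```r
library(parallel)

# Hypothetical stand-in for one harvest strategy's set of replicates.
run_strategy <- function(HS, reps = 100) {
  mean(rnorm(reps, mean = HS))
}

cl <- makeCluster(2)                   # two worker R sessions on this machine
clusterSetRNGStream(cl, iseed = 2014)  # reproducible, independent random streams
results <- parLapply(cl, 1:8, run_strategy, reps = 100)
stopCluster(cl)

unlist(results)
```

Unlike separate batch jobs, the workers here end when the parent session ends, so this suits runs that finish while the machine stays on.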
Conclusion
• The use of batch jobs provides a solution for completing certain types of task.
• If you are using computer intensive methods then you might gain greatly from using coarsely parallel methods.
• The trade-off between the benefits and the set-up time and post-run processing determines when it becomes sensible to use coarsely parallel methods.
• Invariably more than 1 way exists to do the same thing:
• https://wiki.csiro.au/display/ASC/Scientific+Computing+Homepage
Thank you
CSIRO Marine and Atmospheric Research
Malcolm Haddon
tel. 61 3 6232 5097
email. [email protected]
www.csiro.au
Adding R.exe to the Path
• Control Panel
• System
   – Advanced System Settings
   – Environment Variables
      • PATH - edit
• Paste ";C:/Program Files/R/R-3.1.0/bin/x64" onto the end of the present PATH and exit (adjust the folder to match the installed R version).
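A quick way to check the result from within R, offered as a small sketch: Sys.which() reports where an executable is found on the PATH, and R.home("bin") gives the folder holding the binaries of the running R installation, which is the folder to add if the check fails.

```r
# Sys.which() returns the full path of an executable found on the PATH,
# or an empty string when it is not found.
r_on_path <- Sys.which("R")
if (nzchar(r_on_path)) {
  message("R found on the PATH at: ", r_on_path)
} else {
  message("R is not on the PATH; the R binaries are in: ", R.home("bin"))
}
```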