Running Batch Jobs in R: How to deal with coarsely parallel problems

Malcolm Haddon, May 2014. Wealth from Oceans National Research Flagship.



Page 2: Running Batch Jobs in R: How to deal with coarsely parallel problems

Computer Intensive
• Many, many, many iterations:
    • Management Strategy Evaluation
    • Markov Chain Monte Carlo (MCMC)
    • Lots of replicates of any analyses
• Large scale simulations:
    • multi-species
    • multi-populations
    • multi-'etc'
• Any computing job that takes a long time or uses a lot of computing resources

Page 3: Running Batch Jobs in R: How to deal with coarsely parallel problems

Why the Fuss?
• Solving BIG computing problems has its own strategies.
• If a job:
    • takes a very long time, or
    • uses very large amounts of RAM
    • then how can it be split up most effectively?
• Depends on the scale at which processes are independent.
• May need trials to find the best compromise.
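The "trials" mentioned above could be as simple as timing a small pilot run and scaling up. The following is a minimal sketch under stated assumptions: `one_replicate()` is a hypothetical stand-in for a single replicate of the real analysis, and the 8 x 1000 job size and the split into 8 parts are borrowed from the abalone example later in the talk.

```r
# Hypothetical stand-in for one replicate of a real analysis.
one_replicate <- function() {
  x <- cumsum(rnorm(40))
  mean(x)
}

# Time a small pilot run.
pilot_reps <- 50
elapsed <- system.time(
  for (i in seq_len(pilot_reps)) one_replicate()
)[["elapsed"]]
per_rep <- elapsed / pilot_reps

# Scale up to the full job (assumed here: 8 scenarios x 1000 replicates).
total_reps <- 8 * 1000
est_serial <- per_rep * total_reps   # everything in one R session
est_split  <- est_serial / 8         # 8 separate jobs, ignoring launch overhead

cat("Estimated serial time (s):", est_serial, "\n")
cat("Estimated time split 8 ways (s):", est_split, "\n")
```

The split estimate ignores the cost of launching jobs and of combining results afterwards, so it is only a lower bound on the real elapsed time.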

Page 4: Running Batch Jobs in R: How to deal with coarsely parallel problems

Coarsely Parallel Processes
• Not talking about finely parallel processes such as cellular models in Oceanography or visualization.
    • The use of GPUs containing thousands of small processors is ideally suited to such analyses.
    • Some emphasis on this with the CSIRO clusters (Bragg, etc.) and the Advanced Scientific Computing program.
• Instead: focussed on serial and sequential problems where analysis order is important.
    • Population processes
    • Many biological processes
• Cannot split up time-series trajectories, but can treat each trajectory as a different process (coarsely parallel).

Page 5: Running Batch Jobs in R: How to deal with coarsely parallel problems

Alternative Approaches to Simulation

Single job:
Apply 8 Harvest Strategies to an abalone fishery over 40 years with 1000 replicates (8 x 1000)
    for (HS in 1:8) { for (iter in 1:1000) { } }
Plot and tabulate results.

Split the job into 8 parts:
Apply 8 Harvest Strategies to an abalone fishery over 40 years with 1000 replicates (8 x 1000)
    for (iter in 1:1000) { }    for (iter in 1:1000) { }    for (iter in 1:1000) { }    …..
    Store Results               Store Results               Store Results               …..
Combine, plot and tabulate results.

Next Steps
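The split layout can be sketched in a few lines of R. This is an illustration only, not the abalone simulation itself: `simulate_final()` is a hypothetical toy model, the file naming is an assumption, and the eight parts are run one after another here, whereas in practice each call to `run_strategy()` would be its own R process.

```r
# Toy stand-in for one 40-year trajectory (hypothetical, not the abalone model).
simulate_final <- function(HS, years = 40) {
  cumsum(rnorm(years, mean = HS, sd = 1))[years]
}

# All replicates for ONE harvest strategy: the unit of work for one job.
run_strategy <- function(HS, reps, outdir) {
  set.seed(1000 + HS)  # reproducible seed per job
  final <- vapply(seq_len(reps), function(iter) simulate_final(HS), numeric(1))
  res <- data.frame(HS = HS, iter = seq_len(reps), final = final)
  saveRDS(res, file.path(outdir, sprintf("HS_%02d.rds", HS)))  # store results
  invisible(res)
}

outdir <- file.path(tempdir(), "hs_results")
dir.create(outdir, showWarnings = FALSE)

# Run sequentially here; each iteration could be a separate batch job.
for (HS in 1:8) run_strategy(HS, reps = 20, outdir = outdir)

# Combine step: read every stored result and stack them.
files <- list.files(outdir, pattern = "^HS_[0-9]+\\.rds$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, readRDS))

# Tabulate: mean final value per harvest strategy.
summary_tab <- aggregate(final ~ HS, data = combined, FUN = mean)
print(summary_tab)
```

Because each job writes its own file, no job needs to know about the others; the combine step only needs the output directory.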

Page 6: Running Batch Jobs in R: How to deal with coarsely parallel problems


The R program


Page 7: Running Batch Jobs in R: How to deal with coarsely parallel problems

batchsimab.r
    source("Lots of Functions")
    source("Constants")
    setwd(resultdir)
    read in Data
    source("run_specification")
    write to csv file(s)
    write to Rdata files
    plots to tiff/pdf/etc

Page 8: Running Batch Jobs in R: How to deal with coarsely parallel problems

Top Level: runbatch.R contains:
    ## SET PARAMETERS AS DESIRED IN
    ## runspecification.R and constants.R
    wkdir <- "C:/A_CSIRO/Rcode/abalone/SimAb"
    setwd(wkdir)   ## points to directory containing batchsimab.r
    command <- "R.exe --vanilla < batchsimab.R"
    shell(command, wait=FALSE)   ## launch and return without waiting
    ## (R.exe must be on the path).
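`shell()` exists only in R for Windows. A more portable way to launch a script is sketched below using `system2()` and the `Rscript` executable that ships with R. This is an alternative to the slide's approach, not the author's method; `launch_batch()` and the demo script are hypothetical names introduced for illustration. Locating `Rscript` through `R.home("bin")` also avoids depending on the PATH.

```r
# Hypothetical helper: run an R script in a separate R process.
launch_batch <- function(script, wait = FALSE) {
  rscript <- file.path(R.home("bin"), "Rscript")  # Rscript of the running R
  system2(rscript, args = c("--vanilla", shQuote(script)), wait = wait)
}

# Demo: a tiny job that writes a csv file, standing in for batchsimab.R.
script  <- file.path(tempdir(), "demo_job.R")
outfile <- file.path(tempdir(), "demo_result.csv")
writeLines(c(
  sprintf("outfile <- %s", deparse(outfile)),
  "write.csv(data.frame(x = 1:3), outfile, row.names = FALSE)"
), script)

# wait = TRUE only so the result can be checked straight away;
# for real batch work wait = FALSE lets several jobs run side by side.
status <- launch_batch(script, wait = TRUE)
print(read.csv(outfile))
```

With `wait = FALSE` the call returns immediately, which is what allows a loop to start several jobs at once.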

Page 9: Running Batch Jobs in R: How to deal with coarsely parallel problems

Top Level: runbatch.R contains:
    ## SET PARAMETERS AS DESIRED IN
    ## RunSpecification.R and constants.R
    primaryloop <- c(val1, val2, val3,..)
    for (toplevel in 1:length(primaryloop)) {
      sink("RunSpecification.R")   ## redirect output into the file
      …
      …
      sink()                       ## stop redirecting
      command <- "R.exe --vanilla < batchsimab.R"
      shell(command, wait=FALSE)
    }
    ## Can re-write values in RunSpecification.R

Page 10: Running Batch Jobs in R: How to deal with coarsely parallel problems

    pickLML <- c(127,132,138,145)
    for (pick in 1:length(pickLML)) {
      filename <- "alt_runspecification.r"
      sink(filename)
      cat("##Select the HCR \n")
      cat("StepH <- FALSE \n")
      cat("ConstH <- TRUE \n")
      cat("## Define the Scenarios \n")
      cat("initDepl_L <- c(0.7) \n")
      cat("inH_L <- c(0.1) \n")
      cat("origTAC <- 150.0 \n")
      cat(paste("LML <- ",pickLML[pick],sep="") ," \n")
      cat("reps <- 100 \n")
      sink()
      command <- "R.exe --vanilla < batchsimab.R"
      shell(command, wait=FALSE)
      Sys.sleep(5.0)
    }
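The loop above rewrites the same specification file for every run and then pauses. One possible variation, sketched here as an assumption rather than as the author's method, is to build the specification as a character vector and write one file per run with `writeLines()`. The helper `make_spec()` and the file names are hypothetical, and the batch script would then need to be told which file to source (for example through a command-line argument read with `commandArgs()`).

```r
# Hypothetical helper: the lines of one run specification.
make_spec <- function(LML, reps = 100) {
  c("##Select the HCR",
    "StepH <- FALSE",
    "ConstH <- TRUE",
    "## Define the Scenarios",
    "initDepl_L <- c(0.7)",
    "inH_L <- c(0.1)",
    "origTAC <- 150.0",
    sprintf("LML <- %d", LML),
    sprintf("reps <- %d", reps))
}

pickLML   <- c(127, 132, 138, 145)
specdir   <- tempdir()  # assumption: any writable directory would do
specfiles <- file.path(specdir, sprintf("runspecification_LML%d.r", pickLML))

# One specification file per run, so no file is overwritten while in use.
for (pick in seq_along(pickLML)) {
  writeLines(make_spec(pickLML[pick]), specfiles[pick])
}

# Check: source each file into its own environment and read LML back.
readback <- vapply(specfiles, function(f) {
  env <- new.env()
  source(f, local = env)
  env$LML
}, numeric(1), USE.NAMES = FALSE)
print(readback)
```

Separate files remove the need to guess how long a job takes to read its specification, at the cost of a little extra bookkeeping.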

Page 11: Running Batch Jobs in R: How to deal with coarsely parallel problems

alt_runspecification.r - contents
    batch <- TRUE
    ##Select the HCR
    StepH <- FALSE
    ConstH <- TRUE
    ## Define the Scenarios
    initDepl_L <- c(0.7)
    inH_L <- c(0.1)
    origTAC <- 150.0
    LML <- 138
    reps <- 100

Page 12: Running Batch Jobs in R: How to deal with coarsely parallel problems

Alternative Approach

Not that useful for coarsely parallel problems, but excellent for finely parallel processes.

Page 13: Running Batch Jobs in R: How to deal with coarsely parallel problems

Alternative Approaches
• Can use one's own desktop or laptop.
• Can use a secondary machine (remote login).
• Can use a CSIRO cluster machine (bragg for Linux or bragg-w for Windows, plus others).
• Clusters are very effective for finely parallel work but less so for coarsely parallel jobs.
• Can use Condor, which harvests CPU time on remote machines on the network automatically.
• wiki.csiro.au/display/ASC/Scientific+Computing+Homepage

Page 14: Running Batch Jobs in R: How to deal with coarsely parallel problems

Conclusion
• The use of batch jobs provides a solution for completing certain types of task.
• If you are using computer intensive methods then you might gain greatly from using coarsely parallel methods.
• The trade-off between the benefits and the set-up time and post-run processing determines when it becomes sensible to use coarsely parallel methods.
• Invariably more than 1 way exists to do the same thing:
• https://wiki.csiro.au/display/ASC/Scientific+Computing+Homepage

Page 15: Running Batch Jobs in R: How to deal with coarsely parallel problems

WEALTH FROM OCEANS NATIONAL RESEARCH FLAGSHIP

Thank you

CSIRO Marine and Atmospheric Research
Malcolm Haddon
tel. 61 3 6232 5097
email. [email protected]
www.csiro.au

Page 16: Running Batch Jobs in R: How to deal with coarsely parallel problems

Adding in R.exe to Path
• Control Panel
• System
    – Advanced System Settings
    – Environment Variables
        • PATH - edit
• Paste ";C:/Program Files/R/R-3.1.0/bin/x64" onto the end of the present PATH and exit (adjust the folder name to the installed R version).
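Whether the PATH change has taken effect can be checked from inside R. The sketch below uses only base R functions; the variable names are illustrative. `Sys.which()` returns an empty string when the executable cannot be found on the PATH, and `R.home("bin")` reports the directory of the running R's executables, which is one candidate folder to add.

```r
# Is an executable called "R" visible on the PATH?
r_found   <- unname(Sys.which("R"))
r_on_path <- nzchar(r_found)

# Directory holding the executables of the R session that is running now.
bin_dir <- R.home("bin")

# The individual PATH entries, for inspection.
path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]

if (r_on_path) {
  message("R found on the PATH at: ", r_found)
} else {
  message("R is not on the PATH; a candidate directory to add is: ", bin_dir)
}
```

A new console or R session is usually needed before a PATH edit made in the Control Panel becomes visible.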