Workshop: Using the VIC3 Cluster for Statistical Analyses


Support perspective

G.J. Bex


Overview

• Cluster VIC3: hardware & software
• Statistics research scenario
• Worker framework
• MapReduce with Worker
• Q&A


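Bird's-eye view of VIC3

[Figure: schematic of VIC3 with login nodes login1 and login2, service nodes svcs1 and svcs2, compute nodes r1i0n0, r1i0n1, …, r1i3n15 and r2i0n0, r2i0n1, …, r2i3n15, and NetApp storage holding ~vsc30034 and /bin.]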


VIC3 nodes

• Compute nodes
  – 112 nodes with 2 quad-core 'harpertown' CPUs, 8 GB RAM
  – 80 nodes with 2 quad-core 'nehalem' CPUs, 24 GB RAM
  – 6 nodes with 2 quad-core 'nehalem' CPUs, 72 GB RAM and a local hard disk
• Storage
  – 20 TB disk space shared between home directories and scratch space, accessed via NFS
  – 4 nodes with disks for a parallel file system (needed for MPI I/O jobs)
• Service nodes, including 2 login nodes

In total: 1584 cores, for 16.6 TFlop/s (theoretical peak)
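The core count follows directly from the node list: 198 nodes, each with two quad-core CPUs. A small shell check of that arithmetic (the variable names are illustrative only):

```shell
#!/bin/bash
# Total core count of VIC3 from the node inventory above.
harpertown_nodes=112     # 2 quad-core 'harpertown' CPUs, 8 GB RAM
nehalem_nodes=80         # 2 quad-core 'nehalem' CPUs, 24 GB RAM
nehalem_bigmem_nodes=6   # 2 quad-core 'nehalem' CPUs, 72 GB RAM, local disk

cores_per_node=$(( 2 * 4 ))   # two sockets, four cores each
total_nodes=$(( harpertown_nodes + nehalem_nodes + nehalem_bigmem_nodes ))
total_cores=$(( total_nodes * cores_per_node ))

echo "$total_nodes nodes x $cores_per_node cores = $total_cores cores"
```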


What can you run?

• All open-source Linux software
• All Linux software for which K.U.Leuven has a license that covers the cluster, provided you are a K.U.Leuven staff member
• All Linux software for which you have a license that covers the cluster
• No Windows software

R, SAS and MATLAB are OK for K.U.Leuven & UHasselt users


Running example: SAS code

• Your SAS program, e.g., 'clmk.sas'
  – is usually interactive
  – depends on parameters, e.g.,
    • type of distribution
    • alpha, beta
  – has to be run for several types and values of alpha and beta


Running example: batch mode

• 1st step: convert it for batch mode
  – capture the command-line variables in clmk.sas:

…
%LET type = "%scan(&sysparm, 1, %str(:))";
%LET alpha = %scan(&sysparm, 2, %str(:));
%LET beta = %scan(&sysparm, 3, %str(:));
…

  – run it from the command line:

$ sas -batch -noterminal -sysparm discr:1.3:15.0 clmk.sas
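The three %scan calls take the colon-separated fields of the -sysparm string in order. For illustration only, here is the same split written in bash; it mirrors the convention and is not what SAS runs internally. Note that the %LET for type also wraps the value in double quotes, so inside clmk.sas &type resolves to the string literal "discr".

```shell
#!/bin/bash
# Illustration: split a -sysparm string the way the %scan calls in clmk.sas do.
sysparm="discr:1.3:15.0"

# Fields are separated by ':'; field 1 = type, 2 = alpha, 3 = beta.
IFS=: read -r type alpha beta <<< "$sysparm"

echo "type=$type alpha=$alpha beta=$beta"
```

A consequence of this convention is that parameter values must not themselves contain a colon.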


I've got a job to do: PBS files

[Figure: jobs are submitted from the login node to the queue system/scheduler (Torque/Moab), which runs them on the compute nodes.]

clmk.pbs:

#!/bin/bash -l
module load SAS/9.2
cd "$PBS_O_WORKDIR"

sas -batch -noterminal \
    -sysparm discr:1.3:15.0 clmk.sas

$ msub clmk.pbs


No more modifying!

Instead of editing clmk.pbs for every parameter set:

#!/bin/bash -l
module load SAS/9.2
cd "$PBS_O_WORKDIR"

sas -batch -noterminal \
    -sysparm discr:1.3:15.0 clmk.sas

$ msub clmk.pbs

pass the parameters as variables when submitting:

#!/bin/bash -l
module load SAS/9.2
cd "$PBS_O_WORKDIR"

sas -batch -noterminal \
    -sysparm $type:$alpha:$beta clmk.sas

$ msub -v type=discr,alpha=1.3,beta=15.0 clmk.pbs


Going parallel… or nuts?

• Parameter sets…
  – are independent, so computations can be done in parallel!
  – but all combinations of type, alpha and beta make for a large number of jobs

→ Worker framework


Conceptually

Each row of the table is one parameter set; its values fill in $type, $alpha and $beta in the PBS script.

type    alpha   beta
discr   1.3     15.0
discr   1.3     30.0
discr   1.8     15.0
discr   1.8     30.0
…       …       …
cont    1.3     15.0
…       …       …

#!/bin/bash -l
module load SAS/9.2
cd "$PBS_O_WORKDIR"

sas -batch -noterminal \
    -sysparm $type:$alpha:$beta clmk.sas


Concrete

The table is stored as clmk.csv (N rows of parameters), the script as clmk.pbs.

clmk.csv:

type,alpha,beta
discr,1.3,15.0
discr,1.3,30.0
discr,1.8,15.0
discr,1.8,30.0
…
cont,1.3,15.0
…

clmk.pbs:

#!/bin/bash -l
module load SAS/9.2
cd "$PBS_O_WORKDIR"

sas -batch -noterminal \
    -sysparm $type:$alpha:$beta clmk.sas

$ module load worker/1.0
$ wsub -data clmk.csv -batch clmk.pbs -l nodes=2:ppn=8

The N rows will be computed in parallel by 2 × 8 − 1 = 15 cores (one of the 16 cores acts as the Worker master).
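Rather than typing clmk.csv by hand, it can be generated. A minimal sketch, assuming Worker expects a header line with the variable names followed by one comma-separated line per parameter set (check the Worker documentation for your version); the value lists are only those visible in the slides, the real grid is larger:

```shell
#!/bin/bash
# Generate clmk.csv: header with the variable names, then one row per combination.
set -euo pipefail

types=(discr cont)
alphas=(1.3 1.8)
betas=(15.0 30.0)

{
  echo "type,alpha,beta"   # names must match $type, $alpha, $beta in clmk.pbs
  for type in "${types[@]}"; do
    for alpha in "${alphas[@]}"; do
      for beta in "${betas[@]}"; do
        echo "$type,$alpha,$beta"
      done
    done
  done
} > clmk.csv

n_rows=$(( $(wc -l < clmk.csv) - 1 ))   # N = data rows, header excluded
echo "N = $n_rows"
```

N grows as the product of the list lengths; submitted with msub -v, each of those rows would be a separate job.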


Caveat 1: time is of the essence…

• How long does your job need? (= walltime)
  – roughly: (time to compute N rows) / (requested cores)
• Walltime guidelines (no hard limits, but they help reduce queue time):
  – more than 5 minutes
  – less than 2 days
• Hence, if the walltime exceeds 2 days, split the data and submit multiple jobs
• Explicitly request sufficient walltime:

$ wsub -data clmk.csv -batch clmk.pbs \
       -l nodes=2:ppn=8,walltime=36:00:00
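The walltime estimate can be made concrete. A minimal sketch, assuming every row takes roughly the same time and that one of the requested cores is the Worker master (see Caveat 2); N and the time per row are hypothetical numbers, and a safety margin on top of the result is wise:

```shell
#!/bin/bash
# Rough walltime estimate for a Worker job; all inputs are hypothetical.
n_rows=1000          # N: rows in clmk.csv
secs_per_row=1800    # measured time of one clmk.sas run, in seconds
nodes=2
ppn=8

slaves=$(( nodes * ppn - 1 ))                     # one core is the master
rounds=$(( (n_rows + slaves - 1) / slaves ))      # ceil(N / slaves)
secs=$(( rounds * secs_per_row ))

printf 'walltime=%02d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))

max_secs=$(( 2 * 24 * 3600 ))                     # 2-day guideline
if (( secs > max_secs )); then
  n_jobs=$(( (secs + max_secs - 1) / max_secs ))
  echo "exceeds 2 days: split clmk.csv into at least $n_jobs parts"
fi
```

With these numbers the estimate is 33.5 hours, which fits under a 36-hour request like the one above.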


Caveat 2: slave labour

• P cores: how to choose P?
  – roles:
    • 1 master
    • P − 1 slaves
  – each compute node has 8 cores, so P mod 8 = 0
  – N ≫ P: better load balancing and efficiency
  – larger P:
    • shorter walltime
    • (potentially) longer time in the queue

The shortest turn-around is hard to predict, since

turn-around = queue time + walltime
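The walltime half of the turn-around can be tabulated for different node counts; queue time cannot be computed this way. A sketch with a hypothetical N, assuming whole nodes (P = 8 × nodes) and one master core:

```shell
#!/bin/bash
# Compare whole-node choices of P for a Worker job with a hypothetical N.
n_rows=1000   # hypothetical N

# Rounds of work each slave must do: ceil(N / (P - 1)), with P = 8 * nodes.
rounds_for_nodes() {
  local nodes=$1
  local p=$(( nodes * 8 ))    # whole nodes only, so P mod 8 = 0
  local slaves=$(( p - 1 ))   # one core is the master
  echo $(( (n_rows + slaves - 1) / slaves ))
}

for nodes in 1 2 3 4; do
  printf 'nodes=%d P=%2d slaves=%2d rounds=%d\n' \
    "$nodes" $(( nodes * 8 )) $(( nodes * 8 - 1 )) "$(rounds_for_nodes "$nodes")"
done
```

Going from 2 to 4 nodes roughly halves the number of rounds (and so the walltime), but whether the total turn-around improves depends on how much longer the larger job waits in the queue.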


Caveat 3: independence

#!/bin/bash -l
module load SAS/9.2
cd "$PBS_O_WORKDIR"

log_name="clmk-$type-$alpha-$beta.log"
print_name="clmk-$type-$alpha-$beta.lst"

sas -batch -noterminal \
    -log "$log_name" \
    -print "$print_name" \
    -sysparm $type:$alpha:$beta clmk.sas

SAS locks its log and output files!

Make sure each computation writes to its own files!
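Before submitting, it can be worth checking that the rows of clmk.csv really map to distinct file names. A minimal sketch using the naming scheme of clmk.pbs above; the small clmk.csv it writes is example data, so point it at your own file instead. Distinct rows normally give distinct names, but duplicated rows or values containing '-' could clash.

```shell
#!/bin/bash
# Pre-flight check: do all rows of clmk.csv map to distinct log file names?
set -euo pipefail

# Small example file (header + rows); in practice use your real clmk.csv.
cat > clmk.csv <<'EOF'
type,alpha,beta
discr,1.3,15.0
discr,1.3,30.0
discr,1.8,15.0
cont,1.3,15.0
EOF

# Same naming scheme as log_name in clmk.pbs: clmk-$type-$alpha-$beta.log
names=$(tail -n +2 clmk.csv | awk -F, '{ print "clmk-" $1 "-" $2 "-" $3 ".log" }')
dups=$(printf '%s\n' "$names" | sort | uniq -d)

if [ -z "$dups" ]; then
  echo "OK: all log names are unique"
else
  echo "Clashing log names:"
  echo "$dups"
fi
```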


Conceptually: MapReduce

[Figure: map: data.txt is split into data.txt.1, data.txt.2, …, data.txt.7, and each part is processed into result.txt.1, result.txt.2, …, result.txt.7; reduce: these are combined into result.txt.]


Concrete: -prolog & -epilog

[Figure: prolog.sh splits data.txt into data.txt.1, …, data.txt.7; batch.sh runs once per part, producing result.txt.1, …, result.txt.7; epilog.sh combines these into result.txt.]

$ wsub -prolog prolog.sh -batch batch.sh \
       -epilog epilog.sh -l nodes=3:ppn=8
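What prolog.sh, batch.sh and epilog.sh might contain is sketched below, run locally in a single script so it can be tried without the cluster. Assumptions: seven parts, as in the figure; squaring a number stands in for the real per-part analysis; and where Worker would start one batch.sh run per part, this sketch simply loops over the parts (how a batch.sh run learns which part is its own is not shown in the slides).

```shell
#!/bin/bash
# Local sketch of the prolog / batch / epilog (map-reduce) pattern.
set -euo pipefail

n_parts=7                     # data.txt.1 ... data.txt.7, as in the figure
seq 1 20 > data.txt           # example input: one number per line

# prolog.sh (map setup): split data.txt into n_parts contiguous pieces.
total=$(wc -l < data.txt)
size=$(( (total + n_parts - 1) / n_parts ))    # lines per piece, rounded up
awk -v size="$size" '{ print > ("data.txt." (int((NR - 1) / size) + 1)) }' data.txt

# batch.sh (map): one independent run per piece; squaring stands in for real work.
for i in $(seq 1 "$n_parts"); do
  if [ -f "data.txt.$i" ]; then
    awk '{ print $1 * $1 }' "data.txt.$i" > "result.txt.$i"
  fi
done

# epilog.sh (reduce): concatenate the partial results in order.
: > result.txt
for i in $(seq 1 "$n_parts"); do
  if [ -f "result.txt.$i" ]; then
    cat "result.txt.$i" >> result.txt
  fi
done
```

Splitting into contiguous pieces and concatenating in part order keeps result.txt in the same order as data.txt.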


Where to find help?

• http://www.vscentrum.be/vsc-help-center
• [email protected]
• http://status.kuleuven.be/hpc
• UHasselt staff: [email protected]