Beowulf Training v2.1

51
1 Beowulf HPC Cluster Training Tan Wee Chuan Senior Support Engineer Centre for Academic Computing Updated: 21 Feb 08

Transcript of Beowulf Training v2.1

8/8/2019 Beowulf Training v2.1

http://slidepdf.com/reader/full/beowulf-training-v21 1/51

Beowulf HPC Cluster Training

Tan Wee Chuan, Senior Support Engineer

Centre for Academic Computing

Updated: 21 Feb 08

Our Focus Today

• Beowulf Cluster Setup and Access
• Grid Engine
• Cluster Software
• Job Submission
• Hands-on
• Research Resources & CAC Website

Cluster Physical Setup

32-bit platform: Intel Xeon 3.06 GHz processors, 4 GB mem/node
• 32-bit frontend node
• 18 x 32-bit compute CPU
• 2 x 32-bit express CPU

64-bit platform: AMD Opteron 2.4 GHz processors, 4 GB mem/node
• 64-bit frontend node
• 38 x 64-bit compute CPU
• 2 x 64-bit express CPU

User Disk Quota:
- 8 GB (soft)
- 12 GB (hard)

Cluster Health Monitoring

http://beowulf.smu.edu.sg/ganglia

Cluster Access

• Beowulf Cluster Information
• File Transfer (on campus)
• File Transfer (off campus)
• Login to the Cluster (anywhere)

Beowulf Cluster Info

Host name: beowulf.smu.edu.sg
Login ID: Your SMU userid
Password: Your SMU password
Access Protocol: SSH 2
Software to Use:
• Secure shell clients such as:
  - PuTTY (from \\beowulf\resources, or search for it online)
  - the ssh client on a Linux or Unix OS
Shared Documents and WIKI page: http://research2.smu.edu.sg/CAC/HPC/
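From a Linux or Unix machine, the login details above come down to a single ssh command. The lines below are a minimal sketch, assuming the OpenSSH client is installed; the userid "jdoe" is a hypothetical placeholder for your own SMU userid. The commands are printed rather than executed so they can be checked first.

```shell
#!/bin/bash
# Hypothetical userid: replace with your own SMU userid.
CLUSTER_USER="jdoe"
# Host name taken from the cluster information above.
CLUSTER_HOST="beowulf.smu.edu.sg"

# Build the login and file-transfer commands.
LOGIN_CMD="ssh ${CLUSTER_USER}@${CLUSTER_HOST}"
SFTP_CMD="sftp ${CLUSTER_USER}@${CLUSTER_HOST}"

echo "To log in:         ${LOGIN_CMD}"
echo "To transfer files: ${SFTP_CMD}"
```

Off campus, the same commands would presumably still need the SMU VPN or WinSCP route described for file transfer.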

File Transfer (on campus)

Type \\beowulf into the address bar of Windows Explorer.

File Transfer (off campus)

• Use the previous method (but with the SMU VPN on), or
• Use WinSCP:
  - Put the hostname, user id and password in the login window
  - Choose SFTP (allow SCP fallback)


Login to the Cluster

• Using PuTTY:
  - Enter the hostname
  - You can put in your userid to save typing
  - You can create and save settings under a Connection Name
  - You can change the background colour and font size

Login to the Cluster

[Screenshot: what you see after you log in]

Exercises: Some Commands

• Directory operations
  - Present working directory
  - List directory
  - Change directory
  - Make directory
• File operations
  - Copy files
  - Delete files
  - Move files
• Creating and editing files

Exercise: Directory Operations

• Present working directory: pwd
• List directory
  - simple listing: ls
  - full listing: ls -l
  - full listing with screen scroll: ls -l | more
• Change directory
  - absolute path: cd [full pathname]   eg: cd /opt/matlab
  - relative path: cd [foldername]   eg: cd beowulf-samples
  - return to home: cd
• Make directory: mkdir [new folder name]   eg: mkdir set1
• Remove directory: rm -r [folder name]   eg: rm -r set1
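The directory commands above can be tried in sequence without touching your home folder. This is a small sketch that works inside a throwaway directory created with mktemp; the folder name set1 follows the slide, and the variable names are only illustrative.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing in your home folder is touched.
WORKDIR="$(mktemp -d)"
cd "$WORKDIR" || exit 1

pwd              # present working directory
mkdir set1       # make a directory
ls -l            # full listing now shows set1
cd set1          # relative path: move into set1
cd ..            # move back up one level
rm -r set1       # remove the directory and anything inside it

# Record whether set1 is gone, then clean up the throwaway directory.
if [ -d set1 ]; then SET1_EXISTS=yes; else SET1_EXISTS=no; fi
cd / && rm -r "$WORKDIR"
echo "set1 still exists: $SET1_EXISTS"
```

Note that rm -r deletes without asking, so it is worth running pwd before using it on real data.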

Exercise: File Operations

• Copy files: cp [file to copy] [location]
  Examples:
  - copy within the same folder: cp data1.txt data2.txt
  - copy to another folder with the same name: cp data1.txt set1
  - copy to another folder and rename: cp data1.txt set1/data.txt
• Move / rename files: mv [file to move] [location]
  Examples:
  - move to another folder: mv data1.txt set1
  - move to another folder and rename: mv data1.txt set1/data.dat
• Delete files: rm [file1] [file2] [file3]
  Example: rm data1.txt run*.java
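The copy, move and delete commands above can be practised the same way. The sketch below creates its own small file in a temporary directory; the file names follow the slide, while the directory and variable names are only illustrative.

```shell
#!/bin/bash
# Work in a throwaway directory.
WORKDIR="$(mktemp -d)"
cd "$WORKDIR" || exit 1

echo "sample data" > data1.txt      # create a small file to work with
mkdir set1

cp data1.txt data2.txt              # copy within the same folder
cp data1.txt set1                   # copy into set1, keeping the name
cp data1.txt set1/data.txt          # copy into set1 under a new name
mv data2.txt set1/data.dat          # move into set1 and rename
rm data1.txt                        # delete the original

# Count what ended up in set1, check the original is gone, then clean up.
FILE_COUNT="$(ls set1 | wc -l | tr -d ' ')"
if [ -e data1.txt ]; then ORIGINAL_LEFT=yes; else ORIGINAL_LEFT=no; fi
cd / && rm -r "$WORKDIR"
echo "set1 holds $FILE_COUNT files; original still present: $ORIGINAL_LEFT"
```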

Exercise: Create / Edit Files

• Do not use Microsoft Notepad, Wordpad or Word. On Windows, use UNIX-friendly text editors, then copy the file over.
• In the cluster, you have a choice of “vi” or “pico”.

Using vi: vi [file name]

Typing     | Action
i          | To go into insert text mode
Press esc  | To exit insert text mode
yy         | To copy a line
dd         | To delete a line
:w         | To save
:wq        | To save and quit
:q         | To quit without saving (use :q! to discard unsaved changes)

Grid Engine

• qsub
• qstat (or jobwatch)
• qdel

qsub

• Use this command to submit a job to the cluster.
• A job consists of your program code and a text file (the job script) “describing” the job, for example: a job script plus your MATLAB code and data file.
• Use qsub to submit the job script.

Syntax: qsub [job script name]

qstat

• Use this command to check job queues and job status.
• Job state (qw: queueing; t: transfer; r: running; Eqw: error)

Using qstat to check the job queue:

Action                       | Command
See all submitted jobs       | qstat
See only your jobs           | qstat -u [userid]
See all cluster queues       | qstat -f
Per second update of status  | jobwatch -u   {use Ctrl-C to break update}
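The job state codes listed for qstat can be kept at hand as a small shell helper. This is only a sketch: the function name job_state_meaning is made up for illustration, and it covers just the four states named on the slide. The qstat call is guarded so the snippet also runs on a machine without Grid Engine.

```shell
#!/bin/bash
# Translate a Grid Engine job state code into the meaning given on the slide.
job_state_meaning() {
    case "$1" in
        qw)  echo "queueing" ;;
        t)   echo "transfer" ;;
        r)   echo "running" ;;
        Eqw) echo "error" ;;
        *)   echo "unknown state: $1" ;;
    esac
}

# List your own jobs only where qstat is actually installed
# (that is, on the cluster frontend).
if command -v qstat >/dev/null 2>&1; then
    qstat -u "${USER:-$(id -un)}"
fi

job_state_meaning qw
job_state_meaning Eqw
```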

qstat -f

[Screenshot: cropped qstat -f output. Queues marked amd64 are AMD 64-bit; queues marked x86 are Intel 32-bit.]

qdel

• Use this command to delete a submitted job.
• First, use “qstat” to locate the job id.
• Then, use qdel to delete the job.

Syntax: qdel [jobid]

Summary

• qsub (for job submission)
• qstat / jobwatch (for job monitoring)
• qdel (for job deletion)

Job Submission

• Overview
• Software Available on the 2 Platforms
• Software Specific Job Submissions
• Common Errors

Overview

[Figure: job submission overview]

Software Available

The mainstream software available across both platforms:

Software                              | 32-bit                             | 64-bit
C / C++ / Fortran / IMSL for Fortran  | YES (intel & gnu)                  | YES (pathscale & gnu)
Gauss pseudocode                      | YES                                | YES
ILOG Cplex                            | NO                                 | YES
JAVA                                  | YES                                | YES
MATLAB                                | YES                                | YES
Compiled MATLAB codes                 | YES (source 32-bit library names)  | YES (source 64-bit library names)
R                                     | YES                                | YES
STATA                                 | NO                                 | YES

Job Submission Script

A job submission script consists of 2 parts:
• Grid Engine settings (switches)
• Software specifics

Sample:

--------------------------
#! /bin/bash
#$ -j y
#$ -cwd
#$ -m e
#$ -M [email address]
#$ -l amd64=1
. /etc/profile.d/java.sh
./mycode
--------------------------

The lines starting with #$ are the GE switches; the lines after them are the software specifics.
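A job script like the sample can also be written directly from the shell instead of a text editor. The sketch below is an illustration under stated assumptions: it writes the file into a temporary folder (on the cluster you would write it next to your program), "[email address]" is left as a placeholder to be replaced, and nothing is submitted.

```shell
#!/bin/bash
# Temporary folder used for the demonstration only.
JOB_DIR="$(mktemp -d)"

# Write the job script with a here-document. The quoted 'EOF' stops the
# shell from expanding anything inside the script body.
cat > "$JOB_DIR/mycode.sh" <<'EOF'
#! /bin/bash
#$ -j y
#$ -cwd
#$ -m e
#$ -M [email address]
#$ -l amd64=1
./mycode
EOF

# Count the GE switch lines (those starting with "#$").
GE_LINES="$(grep -c '^#\$' "$JOB_DIR/mycode.sh")"
echo "mycode.sh contains $GE_LINES GE switches"
echo "Once the email address is filled in, submit with: qsub mycode.sh"
```

Writing the file on the cluster itself also avoids the Windows editor problem described under Common Errors.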

GE Switch

The switches and their meanings:

Switch                 | Meaning
#$ -j y                | Merge the error and output streams into a single file
#$ -cwd                | Stands for “current working directory”: take the current folder as the reference for file paths
#$ -m e                | Send an email notification when the job ends
#$ -M [email address]  | Send job notifications to this email address

One of the following switches needs to be indicated, depending on the platform of your job:

Switch                 | Meaning
#$ -l intel32=1        | Send job to the Intel 32-bit standard queue
#$ -l amd64=1          | Send job to the AMD 64-bit standard queue
#$ -l xp=1             | Send job to the Intel 32-bit express queue (max. 24 hrs before kill)
#$ -l xp64=1           | Send job to the AMD 64-bit express queue (max. 24 hrs before kill)
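Choosing the platform switch is a two-way decision (32-bit or 64-bit, standard or express), which can be written down as a small lookup. The helper below is a sketch only; the function name ge_platform_switch and its argument names are invented for illustration, while the four switches it prints are those in the table above.

```shell
#!/bin/bash
# Print the "-l" switch for a platform (32 or 64) and a queue type
# (standard or express), following the table above.
ge_platform_switch() {
    case "$1-$2" in
        32-standard) echo '#$ -l intel32=1' ;;
        64-standard) echo '#$ -l amd64=1' ;;
        32-express)  echo '#$ -l xp=1' ;;
        64-express)  echo '#$ -l xp64=1' ;;
        *) echo "usage: ge_platform_switch 32|64 standard|express" >&2
           return 1 ;;
    esac
}

ge_platform_switch 64 express
```

Express queues kill jobs after 24 hours, so they suit short test runs rather than long computations.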

Software Specifics

Each software has its own way of launching:

Software                            | Launching                                   | Example
C / C++ / Fortran compiled binary   | ./[compiled filename] > [output file]       | ./compute_matrix > output.txt
Gauss pseudocode                    | gsrun -b [filename.e.gcg]                   | gsrun -b ols.e.gcg
ILOG Cplex                          | cplex < [cplex script name] > [output file] | cplex < optim_val > output.txt
JAVA                                | . /etc/profile.d/java.sh                    | . /etc/profile.d/java.sh
                                    | java [filename]                             | java testfile > output.txt

Software Specifics (continued)

Software               | Launching                                    | Example
MATLAB                 | matlab -r '[filename]'                       | matlab -r 'matrix'
Compiled MATLAB codes  | . /etc/profile.d/matlab.sh                   | . /etc/profile.d/matlab.sh
                       | ./[compiled filename]                        | ./matrix_compiled
R                      | R -b --vanilla < [filename.R] > [output file] | R -b --vanilla < Rcode.R > output.txt
STATA                  | stata -b do [filename]                       | stata -b do sortdata

C / C++

32-bit compilation
• Use the Intel compiler “icc”:
  example: icc -static matrix.cpp -o matrix

64-bit compilation
• Move to the 64-bit platform first (“devel64”).
• There are 2 licenses available for the Pathscale compilers.
• Use the Pathscale compiler “pathcc” (C) or “pathCC” (C++):
  example: pathcc matrix.c -o matrix
           pathCC matrix.cpp -o matrix
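Because a binary compiled for one platform must be sent to the matching queue, it can help to check whether a compiled file is 32-bit or 64-bit before submitting. The sketch below reads the class byte of the ELF header with od; the function name binary_bits is made up for illustration, and the check assumes the file really is a Linux (ELF) executable.

```shell
#!/bin/bash
# Report whether an ELF binary is 32-bit or 64-bit by reading byte 4 of
# its header (the EI_CLASS field: 1 = 32-bit, 2 = 64-bit).
binary_bits() {
    local class
    class="$(od -An -tu1 -j4 -N1 "$1" 2>/dev/null | tr -d ' \n')"
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

# /bin/sh is used here only because it exists on every Linux system;
# on the cluster you would pass your own binary, for example ./matrix.
binary_bits /bin/sh
```

A result of 32 would go with the intel32 or xp switch, and 64 with amd64 or xp64.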


C / Fortran Script

• Form your job submission script. Use the text editor “vi” or “pico”, or use “Editpad”. Normally, we use the file extension “.sh” to name scripts.
• Submit the job submission script: qsub mycode.sh
• Check state: qstat (OR) jobwatch -u
• If there is an error, check the error / output stream file “mycode.sh.oXXXX” and make corrections.

Example: mycode.sh

#! /bin/bash
# GE switches
#$ -j y
#$ -cwd
#$ -m e
#$ -M [email address]
#$ -l intel32=1
# Code specifics
./compiled_filename > output.txt

Gauss

• The Gauss Runtime Library is available on both the 32-bit and 64-bit compute nodes.
• Gauss code cannot be run directly and has to be compiled into Gauss pseudocode.
• Launch Gauss on your choice of platform (32-bit or 64-bit) and compile your Gauss code into pseudocode form.
  example: compile mygausscode.e
• Exit Gauss and find a new file with the extension “.e.gcg”.

Gauss

• Form your job submission script. Use the text editor “vi” or “pico”, or use “Editpad”. Normally, we use the file extension “.sh” to name scripts.
• Submit the job submission script: qsub mycode.sh
• Check state: qstat (OR) jobwatch -u
• If there is an error, check the error / output stream file “mycode.sh.oXXXX” and make corrections.

Example: mycode.sh

#! /bin/bash
# GE switches
#$ -j y
#$ -cwd
#$ -m e
#$ -M [email address]
#$ -l intel32=1
# Code specifics
gsrun -b mygausscode.e.gcg

ILOG Cplex

• ILOG Cplex is available only on the 64-bit platform, and each job utilises a license from the 20 network licenses available on campus.
• To run a Cplex optimization on a model:
  - Formulate and program the model
  - Write a cplex script to read the model and run the optimization

Filename: problem.lp

maximize
 x1 + 2 x2 + 3 x3
subject to
 -x1 + x2 + x3 <= 20
 x1 - 3 x2 + x3 <= 30
bounds
 0 <= x1 <= 40
 0 <= x2
 0 <= x3
End

Filename: cplex-script

read problem.lp
optimize
display solution variables x1-x3
quit

ILOG Cplex

• Form your job submission script. Use the text editor “vi” or “pico”, or use “Editpad”. Normally, we use the file extension “.sh” to name scripts.
• Submit the job submission script: qsub mycode.sh
• Check state: qstat (OR) jobwatch -u
• If there is an error, check the error / output stream file “mycode.sh.oXXXX” and make corrections.

Example: mycode.sh (Cplex is available only on the 64-bit platform, so the job goes to the AMD 64-bit queue)

#! /bin/bash
# GE switches
#$ -j y
#$ -cwd
#$ -m e
#$ -M [email address]
#$ -l amd64=1
# Code specifics
cplex < cplex-script > out.txt

JAVA

• The Sun Java Software Development Kit (SDK) is available on both the 32-bit and 64-bit platforms. You can compile your JAVA code on the frontend nodes before job submission.
• To submit your job, form your job submission script. Use the text editor “vi” or “pico”, or use “Editpad”. Normally, we use the file extension “.sh” to name scripts.

Example: mycode.sh

#! /bin/bash
# GE switches
#$ -j y
#$ -cwd
#$ -m e
#$ -M [email address]
#$ -l intel32=1
# Code specifics
. /etc/profile.d/java.sh
java myjavacode

JAVA

• Submit the job submission script: qsub mycode.sh
• Check state: qstat (OR) jobwatch -u
• If there is an error, check the error / output stream file “mycode.sh.oXXXX” and make corrections.

MATLAB

• MATLAB is available on both the 32-bit and 64-bit compute nodes.
• The versions on the main frontend nodes are full versions (to support compiling), while the compute nodes have only the following toolboxes:
  - Financial
  - Optimization
  - Splines
  - Statistics
• Each job utilises the network licenses available on campus.
• We encourage users to compile MATLAB code to conserve licenses whenever possible (see Compiled MATLAB Codes).

MATLAB

• Form your job submission script. Use the text editor “vi” or “pico”, or use “Editpad”. Normally, we use the file extension “.sh” to name scripts.
• Submit the job submission script: qsub mycode.sh
• Check state: qstat (OR) jobwatch -u
• If there is an error, check the error / output stream file “mycode.sh.oXXXX” and make corrections.

Example: mycode.sh

#! /bin/bash
# GE switches
#$ -j y
#$ -cwd
#$ -m e
#$ -M [email address]
#$ -l intel32=1
# Code specifics
matlab -r 'mymatlabcode'

Compiled MATLAB Codes

• A compiled MATLAB binary uses the MATLAB Runtime Library to execute.
• Running your MATLAB code this way does not use any of the network licenses, which are shared by the entire campus.
• Most MATLAB code can be compiled, but there are minor exceptions.
• To compile MATLAB code, it has to be turned into a function.

example:

function main          % <- function header
mat1 = magic(4)        % <- your MATLAB code
mat2 = mat1 * mat1
exit                   % <- exit the code

Compiled MATLAB Codes

• To compile the function, choose your platform and launch MATLAB.
• Then run the MATLAB compile command “mcc”.
  example: mcc -mv mymatlabcode.m
• mcc automatically locates your own sub-functions. Exit MATLAB afterwards to free the Compiler license (only 1 on campus).
• The output files are named after your file:
  - mymatlabcode
  - mymatlabcode.ctf
  - mymatlabcode*.c
  - mymatlabcode.prj
  - mccExcludedFiles.log
• Of these, mymatlabcode*.c, mymatlabcode.prj and mccExcludedFiles.log can be removed.

Compiled MATLAB Codes

In the job submission script, the platform-dependent Runtime (RT) Libraries have to be “exported” into the environment.

eg: mycode.sh (the 32-bit RT Library exported in a single line)

#! /bin/bash
# GE switches
#$ -j y
#$ -cwd
#$ -m e
#$ -M [email address]
#$ -l intel32=1
# Code specifics
. /etc/profile.d/matlab.sh
./mymatlabcode

Compiled MATLAB Codes

• Submit the job submission script: qsub mycode.sh
• Check state: qstat (OR) jobwatch -u
• If there is an error, check the error / output stream file “mycode.sh.oXXXX” and make corrections.
• If the RT libraries are not exported, you will see errors such as:

  ./mymatlabcode: error while loading shared libraries: libmwmclmcrrt.so.7.5: cannot open shared object file: No such file or directory

• You can always find updated export paths in the general-submission script in the beowulf-samples folder.
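The "cannot open shared object file" error above can be anticipated before submitting by asking the dynamic loader which libraries it fails to find. The sketch below uses ldd for this; the function name missing_libs is invented for illustration, and it assumes the binary is called mymatlabcode and that /etc/profile.d/matlab.sh sets the runtime paths, as in the job script example.

```shell
#!/bin/bash
# List shared libraries that the dynamic loader cannot find for a binary.
# Prints nothing and returns non-zero when every library is resolved.
missing_libs() {
    ldd "$1" 2>/dev/null | grep 'not found'
}

# On the cluster: source the runtime settings first, then check the binary.
# The guard lets the snippet run harmlessly where these files do not exist.
if [ -r /etc/profile.d/matlab.sh ] && [ -x ./mymatlabcode ]; then
    . /etc/profile.d/matlab.sh
    missing_libs ./mymatlabcode || echo "all runtime libraries found"
fi
```

If libmwmclmcrrt is listed as not found, the export line is probably missing or points at the wrong platform's libraries.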

R

• R is available on both the 32-bit and 64-bit platforms.
• We can include contributed extension packages based on your needs.
• To submit R jobs, form your job submission script. Use the text editor “vi” or “pico”, or use “Editpad”. Normally, we use the file extension “.sh” to name scripts.

Example: mycode.sh

#! /bin/bash
# GE switches
#$ -j y
#$ -cwd
#$ -m e
#$ -M [email address]
#$ -l intel32=1
# Code specifics
R -b --vanilla < myRcode.R

R

• Submit the job submission script: qsub mycode.sh
• Check state: qstat (OR) jobwatch -u
• If there is an error, check the error / output stream file “mycode.sh.oXXXX” and make corrections.

STATA

• STATA is available only on the 64-bit platform.
• To submit STATA jobs, form your job submission script. Use the text editor “vi” or “pico”, or use “Editpad”. Normally, we use the file extension “.sh” to name scripts.

Example: mycode.sh (STATA is available only on the 64-bit platform, so the job goes to the AMD 64-bit queue)

#! /bin/bash
# GE switches
#$ -j y
#$ -cwd
#$ -m e
#$ -M [email address]
#$ -l amd64=1
# Code specifics
stata -b do mySTATAcode.do

STATA

• Submit the job submission script: qsub mycode.sh
• Check state: qstat (OR) jobwatch -u
• If there is an error, check the error / output stream file “mycode.sh.oXXXX” and make corrections.

Common Errors

Job state is “Eqw” (error queuing)
Cause:    - was the job submission script edited using MS Notepad or Wordpad?
Solution: - delete the job from the queue
          - delete the job script and create a new one using a Linux-friendly text editor

Job disappeared immediately from the queue but results are missing
Cause:    - execution terminated due to errors such as missing export variables or files
Solution: - check the error / output stream file for hints of the problem
          - check the software-specific export variables for mistakes
          - make sure that the path to execute a binary is correct

Error / output stream file shows “segmentation fault”
Cause:    - usually due to the wrong platform of execution for compiled code
Solution: - check the GE switch “-l [platform]=1” and make sure that the compiled binary runs on the correct platform
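For the “Eqw” case, a likely reason a Notepad- or Wordpad-edited script fails is that Windows editors end each line with a carriage return plus line feed, which Linux shells do not expect. Assuming that is the cause, the sketch below detects and strips the carriage returns as an alternative to retyping the script. The function names has_crlf and strip_crlf and the demo file names are invented for illustration.

```shell
#!/bin/bash
# Return success if the file contains Windows (CRLF) line endings.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Write a cleaned copy with the carriage returns removed.
strip_crlf() {
    tr -d '\r' < "$1" > "$2"
}

# Demonstration on a throwaway file that imitates a Notepad-edited script.
DEMO_DIR="$(mktemp -d)"
printf '#! /bin/bash\r\n#$ -cwd\r\n./mycode\r\n' > "$DEMO_DIR/job-win.sh"

if has_crlf "$DEMO_DIR/job-win.sh"; then
    strip_crlf "$DEMO_DIR/job-win.sh" "$DEMO_DIR/job-unix.sh"
    echo "Carriage returns removed: job-unix.sh is the cleaned copy"
fi
```

The job already stuck in the “Eqw” state still has to be deleted with qdel before the cleaned script is resubmitted.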

Exercise - MATLAB

Step 1: move into the beowulf-samples matlab sub-folder
  cd ~/beowulf-samples/matlab
Step 2: see what’s in the folder, either through Windows Explorer (\\beowulf) or with
  ls (OR) ls -l
Step 3: examine the MATLAB code “matrix.m”
Step 4: edit the job submission script “matlab-submit.sh” and do the following:
  - Remove 1 hash (#) from the line ##$ -cwd
  - Remove 1 hash (#) from the line ##$ -l intel32=1

Exercise - MATLAB (continued)

Step 5: back in the shell, make sure you’re in the right sub-folder. You can check (eg: /home/wctan/beowulf-samples/matlab):
  pwd
Step 6: submit the job
  qsub matlab-submit.sh
Step 7: check the job status
  qstat (OR) jobwatch -u
Step 8: check the folder again when the job ends. There should be an output file named “output.txt”.
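Step 4 of the exercise (removing one hash from two lines) can also be done from the shell with sed instead of an editor. The sketch below is illustrative only: the function name enable_switches is made up, and the demo file is a stand-in, since the real matlab-submit.sh in beowulf-samples may contain other lines. The result is written to a new file so the original is kept until the change has been checked.

```shell
#!/bin/bash
# Turn "##$ -cwd" and "##$ -l intel32=1" into active GE switches by
# removing one leading hash. Output goes to a new file.
enable_switches() {
    sed -e 's/^##\$ -cwd/#$ -cwd/' \
        -e 's/^##\$ -l intel32=1/#$ -l intel32=1/' "$1" > "$2"
}

# Demonstration on a stand-in file (its contents are an assumption).
DEMO_DIR="$(mktemp -d)"
printf '#! /bin/bash\n##$ -cwd\n##$ -l intel32=1\n##$ -l amd64=1\n' \
    > "$DEMO_DIR/matlab-submit.sh"

enable_switches "$DEMO_DIR/matlab-submit.sh" "$DEMO_DIR/matlab-submit.new"
cat "$DEMO_DIR/matlab-submit.new"
```

Lines that still start with two hashes stay disabled, which is how the other platform switch is left inactive.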

THE END

WIKI page contribution: http://research2.smu.edu.sg/CAC/HPC/ (click WIKI)

HPCC User’s Guide and Training Slides: http://research2.smu.edu.sg/CAC/HPC/ (click Shared Documents)