Koç University High Performance Computing Labs Hattusas & Gordion.



General Information

• There are two clusters in the High Performance Computing Labs.
• The first cluster is named Hattusas.
• It consists of 32 nodes, and each node has one CPU.
• Hattusas will be used by both the Science and the Engineering faculties.
• The other cluster is named Gordion.
• It consists of 8 nodes, and each node has one CPU.
• Gordion will be used only by the Engineering Faculty, for special-purpose projects.
• Red Hat Linux 9 is installed on both clusters.


The Hardware of Hattusas

• The hardware of each Hattusas node is:
  • CPU: 1 Intel Pentium 4, 2.4 GHz
  • Memory:
  • Storage:
  • Network:
  • CD-ROM:
• The hardware of the Hattusas server is:
  • CPU: 2 Intel Pentium 4, 2.4 GHz
  • Memory:
  • Storage:
  • Network:
  • CD-ROM:


Hattusas Installation

• Hattusas was installed using OSCAR 3.0, the latest version of OSCAR.

• “OSCAR version 3.0 is a snapshot of the best known methods for building, programming, and using clusters. It consists of a fully integrated and easy to install software bundle designed for high performance cluster computing. Everything needed to install, build, maintain, and use a modest sized Linux cluster is included in the suite, making it unnecessary to download or even install any individual software packages on your cluster.”

• http://oscar.openclustergroup.org


The Software of Hattusas

• The following software was installed on Hattusas automatically by OSCAR:
• C3 - http://www.csm.ornl.gov/torc/C3/
• LAM/MPI - http://www.lam-mpi.org/
• Maui PBS Scheduler - http://supercluster.org/maui/
• MPICH - http://www-unix.mcs.anl.gov/mpi/mpich/
• OpenPBS - http://www.openpbs.org/
• OpenSSH - http://www.openssh.com/
• OpenSSL - http://www.openssl.org/
• PVM - http://www.csm.ornl.gov/pvm/
• System Installation Suite - http://www.sisuite.org/
• Older OSCAR versions: LUI - http://oss.software.ibm.com/developerworks/projects/lui/


Partition on Hattusas

• Hattusas Node

[Figure: partition diagram. Labels recovered from the diagram, in order: Home 70, Projects 70, Others 70, Scratch 70, Others 20, Projects 20, Home 70.]

• The home partition on the nodes is mounted from the Hattusas server. Consequently, you can read from and write to your home directory from both the server and the nodes.

• If your program generates large temporary files, you can use the projects partition on the nodes. It is writable, and each node has its own projects partition with a capacity of 20 GB.
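
As a rough illustration of the second point, a job could keep its temporary files in a private directory on the node-local projects partition. The sketch below assumes the partition is mounted at /projects, which the slides do not state; SCRATCH_BASE and make_scratch are hypothetical names introduced here, not part of PBS or OSCAR.

```shell
#!/bin/sh
# Assumption: the node-local projects partition is mounted at /projects.
# The slides do not give the mount point, so override SCRATCH_BASE as needed.
SCRATCH_BASE="${SCRATCH_BASE:-/projects}"

# make_scratch BASE: create a private, uniquely named directory under BASE
# and print its path. (Hypothetical helper, not part of PBS or OSCAR.)
make_scratch() {
    mktemp -d "$1/${USER:-job}.XXXXXX"
}

# Typical use inside a job script (left commented out so that this file
# has no side effects when sourced):
#   scratch=$(make_scratch "$SCRATCH_BASE") || exit 1
#   trap 'rm -rf "$scratch"' EXIT
#   ./subrun > "$scratch/tmp.out"
```

Removing the directory when the job exits matters here because each node's projects partition is limited to 20 GB and is shared by every job that runs on that node.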


How Hattusas is controlled

• OpenPBS is responsible for job management on Hattusas.

• “OpenPBS is the original version of the Portable Batch System. It is a flexible batch queueing system developed for NASA in the early to mid-1990s. It operates on networked, multi-platform UNIX environments.”

• http://www.openpbs.org/
• A detailed guide to OpenPBS (Portable Batch System Release 2.3) can be found at:
• http://www.chl.chalmers.se/~eb/pbs/files/v2.3_admin.pdf


Queue Structure of Hattusas

• There are two kinds of queues: routing queues and execution queues.
• On Hattusas, one routing queue, named submitq, and 8 execution queues are defined.
• Jobs are submitted only to the routing queue.
• The routing queue looks at the resource requirements of the job and decides which execution queue the job will go to.
• The job is then sent to the execution queue that matches its resource requirements.
• The aim of this queue structure is to optimize the usage of the cluster; the 8 execution queues are designed for this purpose.
• Finally, be aware that the queue structure as a whole is not first in, first out. Every queue has its own priority, and a job takes the priority of the queue it goes to.
• Within each individual queue, however, jobs are handled first in, first out.
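
The ordering rule described above (priority between queues, first in first out within a queue) can be illustrated with a small sort. This is only a sketch: the priorities, arrival numbers, and job names are made up, order_jobs is a hypothetical helper, and the real scheduler on Hattusas may weigh other factors.

```shell
#!/bin/sh
# order_jobs: read lines of "QUEUE_PRIORITY ARRIVAL_NUMBER JOB_NAME" on stdin
# and print them in the order the rule above would start them:
# higher queue priority first, then earlier arrival within the same queue.
# (All values are hypothetical; the slides do not list the real priorities.)
order_jobs() {
    sort -k1,1nr -k2,2n
}

# Example input: two jobs in a priority-50 queue, two in a priority-10 queue.
order_jobs <<'EOF'
10 1 jobA
50 2 jobB
10 3 jobC
50 4 jobD
EOF
```

In this example jobB and jobD are listed before jobA even though jobA arrived first, because their queue has the higher priority; within each queue the earlier arrival comes first.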


How to Create an Account on Hattusas

• Accounts are created by CIT.

• After your account has been created, you will be able to use Hattusas cluster.


How to Login to Hattusas

• The Hattusas server is reachable over the network, so you can connect from anywhere.
• You can connect to Hattusas with either ssh or telnet.
• Inside the school, you don’t need to know the IP address of Hattusas.
• For instance: ssh hattusas.eng.hpc.ku.edu.tr
• The name of the server is known automatically to all the terminals inside the school.
• To connect to Hattusas, you only need an account on the Hattusas server.


How to Use Cluster

• To be able to use Hattusas, you need only an account on Hattusas.

• After your account has been created, you will be able to use the cluster.

• During account creation, your username is added to all the nodes.

• Consequently, you can move from the Hattusas server to its nodes with rsh or ssh (which some programs require).


OpenPBS

• The Portable Batch System (PBS) is a batch job and computer system resource management package.
• While you are using the cluster, you will need OpenPBS to control your jobs.
• PBS consists of four major components: commands, the job Server, the job executor, and the job Scheduler.
• Commands: PBS supplies both command-line commands and a graphical interface.
• Job Server: The Server’s main function is to provide the basic batch services such as receiving/creating a batch job, modifying the job, protecting the job against system crashes, and running the job.

• Job Executor: The job executor is the daemon which actually places the job into execution.

• Job Scheduler: The Job Scheduler is another daemon which contains the site’s policy controlling which job is run and where and when it is run.


OpenPBS

• Currently, PBS provides two GUIs: xpbs and xpbsmon.
• xpbs provides a user-friendly point-and-click interface to the PBS commands. The xpbs(1) man page provides full information on configuring and running xpbs.
• xpbsmon is the node monitoring GUI for PBS. It graphically displays information about execution hosts in a PBS environment. Its view of a PBS environment consists of a list of sites where each site runs one or more Servers, and each Server runs jobs on one or more execution hosts (nodes).


How to submit Jobs

• There are two ways to submit your job.
• The first is through a script, and the second is through the command line.
• Before you submit a job, you should know the resources that it requires, because the scheduling system of Hattusas needs the job requirements to put your job in the correct queue.
• Consequently, you should specify the job requirements when you submit your job; otherwise your job won’t begin execution.


How to submit Jobs

• The first way is through a script.
• The first line is standard for any shell script: it names the shell that executes the commands. The rest of the script consists of resource requirements, job attributes, and the executable name. All PBS directives for resources and job attributes in a shell script start with #PBS. The executable can take arguments too.
• Example PBS job script that runs the executable named `subrun':

    #!/bin/sh
    # resource requests: wall clock limit, memory, number of CPUs
    #PBS -l walltime=2:00:00
    #PBS -l mem=800mb
    #PBS -l ncpus=1
    # merge standard error into standard output
    #PBS -j oe
    # change to the working directory and run the executable
    cd /homes/agarwal/release/workdir
    ./subrun


How to submit Jobs

• To submit the script to PBS, use the qsub command. For instance, if the script is called myscript, submit it with:

    qsub myscript
    3212.hattusas.eng.hpc.ku.edu.tr

• The second line is the job identifier returned by PBS; it indicates that the script has been accepted.
• Notes
• You must specify the number of CPUs your job needs (-l ncpus=) and the maximum wall clock time (-l walltime=); memory (-l mem=) is optional.
• If you do not specify these values, your job will not be accepted by PBS.
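
Since a script without these requests is rejected, it can be convenient to check a script before calling qsub. The following is a minimal sketch of such a check; check_pbs_script is a hypothetical helper that only reads the file and does not contact PBS, and the pattern it uses is an assumption about how the directives are written (one "#PBS -l" line per request, as in the example script).

```shell
#!/bin/sh
# check_pbs_script FILE: return 0 only if FILE contains "#PBS -l" lines that
# request both ncpus and walltime, the two values the notes above call
# mandatory. (Hypothetical helper; it does not contact PBS.)
check_pbs_script() {
    script=$1
    status=0
    for res in ncpus walltime; do
        if ! grep -Eq "^#PBS[[:space:]]+-l[[:space:]].*${res}=" "$script"; then
            echo "$script: missing '#PBS -l ${res}=...'" >&2
            status=1
        fi
    done
    return "$status"
}

# Typical use:
#   check_pbs_script myscript && qsub myscript
```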


How to submit Jobs

• Checking the Status of PBS Jobs
• The qstat command checks the status of PBS jobs. To display full (long) information about the job whose id is 3212, use:

    [test@hattusas test]$ qstat -f 3212

• Deleting a Job from the Queue
• The qdel command deletes a job from the queue. To delete the job with id 3212, use:

    [test@hattusas test]$ qdel 3212
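
The examples above use the short job number, while qsub returns a longer identifier such as 3212.hattusas.eng.hpc.ku.edu.tr. A small sketch of how the number could be extracted with shell parameter expansion follows; job_number is a hypothetical helper name.

```shell
#!/bin/sh
# job_number ID: strip the server suffix from a PBS job identifier,
# e.g. 3212.hattusas.eng.hpc.ku.edu.tr becomes 3212.
job_number() {
    printf '%s\n' "${1%%.*}"
}

job_number 3212.hattusas.eng.hpc.ku.edu.tr   # prints 3212

# Typical use in a submission wrapper (not run here):
#   jobid=$(qsub myscript)
#   qstat -f "$(job_number "$jobid")"
```

PBS commands generally accept the full identifier as well, so this step is a convenience for logs and file names rather than a requirement.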


How to submit Jobs

• The second way is to submit from the command line using the qsub command.

• [test@hattusas test]$ qsub -V -l nodes=muse21 -N myjob

• Here myjob is the specified job name.

• The status and delete commands described earlier also work with this method.
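
On the command line, the resource requests that a script would carry in #PBS lines can be passed as qsub options instead. The sketch below only prints such a command so that it can be inspected before use; build_qsub_cmd is a hypothetical helper, and the job name, CPU count, wall time, and script name are placeholder values.

```shell
#!/bin/sh
# build_qsub_cmd NAME NCPUS WALLTIME SCRIPT: print a qsub command line that
# passes the job name and resource requests as options instead of #PBS lines.
# It only prints the command; nothing is submitted.
build_qsub_cmd() {
    printf 'qsub -V -N %s -l ncpus=%s -l walltime=%s %s\n' "$1" "$2" "$3" "$4"
}

build_qsub_cmd myjob 1 2:00:00 myscript
# prints: qsub -V -N myjob -l ncpus=1 -l walltime=2:00:00 myscript
```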


A Sample PBS Batch Submission Script

#!/bin/csh
#
# file:    pbs.template
#
# purpose: template for PBS (Portable Batch System) script
#
# remarks: a line beginning with # is a comment;
#          a line beginning with #PBS is a pbs command;
#          assume (upper/lower) case to be sensitive;
#
# use:     submit job with
#          qsub pbs.template
#
# job name (default is name of pbs script file)
#PBS -N myjob
#
# resource limits: number of CPUs to be used
#PBS -l ncpus=25
#
# resource limits: amount of memory to be used
#PBS -l mem=213mb


# resource limits: max. wall clock time during which job can be running
#PBS -l walltime=3:20:00
#
# path/filename for standard output
#PBS -o mypath/my.out
#
# path/filename for standard error
#PBS -e mypath/my.err
#
# queue name, one of {submit, special express}
# The default queue, "submit", need not be specified
#PBS -q submit
#
# group account (for example, g12345) to be charged
#PBS -W group_list=g12345
#
# files to be copied to execution server before script processing starts
# usage: -W stagein=local-filename@remotehost:remote-filename
#PBS -W stagein=my.input@msa01-h:runs/input/my.input
#
# files to be copied from execution server after script processing
# usage: -W stageout=local-filename@remotehost:remote-filename
#PBS -W stageout=my.output@msa01-h:runs/output/my.output


# start job only after MMDDhhmm, where M=Month, D=Day, h=hour, m=minute
# e.g., July 4th, 14:30
#PBS -a 07041430
#
# send me mail when job begins
#PBS -m b
# send me mail when job ends
#PBS -m e
# send me mail when job aborts (with an error)
#PBS -m a
# if you want more than one message, you must group flags on one line,
# otherwise, only the last flag selected executes:
#PBS -mba
#
# do not rerun this job if it fails
#PBS -r n
#
# export all my environment variables to the job
#PBS -V


# Using PBS - Environment Variables
# When a batch job starts execution, a number of environment variables are
# predefined, which include:
#
#   Variables defined on the execution host.
#   Variables exported from the submission host with
#     -v (selected variables) and -V (all variables).
#   Variables defined by PBS.
#
# The following reflect the environment where the user ran qsub:
#   PBS_O_HOST      The host where you ran the qsub command.
#   PBS_O_LOGNAME   Your user ID where you ran qsub.
#   PBS_O_HOME      Your home directory where you ran qsub.
#   PBS_O_WORKDIR   The working directory where you ran qsub.
#
# These reflect the environment where the job is executing:
#   PBS_ENVIRONMENT Set to PBS_BATCH to indicate the job is a batch job, or
#                   to PBS_INTERACTIVE to indicate the job is a PBS interactive job.
#   PBS_O_QUEUE     The original queue you submitted to.
#   PBS_QUEUE       The queue the job is executing from.
#   PBS_JOBID       The job's PBS identifier.
#   PBS_JOBNAME     The job's name.
###
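
As a brief illustration of how a job might use the variables listed above, the following sketch prints a one-line summary of the running job. It is written in sh rather than the csh of the template, job_summary is a hypothetical helper, and the fallback values are assumptions added only so that the function also runs outside PBS.

```shell
#!/bin/sh
# job_summary: print one line describing the current job, using variables
# that PBS defines inside a batch job. The fallbacks after ":-" are only
# there so the function also runs outside PBS, where the variables are unset.
job_summary() {
    printf 'job %s (%s) submitted from %s\n' \
        "${PBS_JOBID:-none}" \
        "${PBS_JOBNAME:-unnamed}" \
        "${PBS_O_WORKDIR:-$PWD}"
}

# Typical use near the top of a job script:
#   cd "${PBS_O_WORKDIR:-$PWD}" || exit 1
#   job_summary
```

Changing to PBS_O_WORKDIR first is a common pattern because a batch job does not necessarily start in the directory from which qsub was run.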


END