How to get started on cees Mandy SEP Style. Resources Cees-clusters SEP-reserved disk20TB SEP...


How to get started on cees

Mandy

SEP Style


Resources

cees-clusters
• SEP-reserved disk: 20 TB
• SEP-reserved nodes: 35 (currently 25)
• Default max nodes: 149 (8 cores per node)
• Compute node hardware: 2.26 GHz dual-processor quad-core Nehalem

cees-rcf
• SEP-reserved disk: 30 TB
• SEP-reserved nodes: 21 (16 cores per node)
• Default max nodes: 137 (16 cores per node)
• Compute node hardware: Sandy Bridge
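Multiplying nodes by cores per node shows how many cores each node figure above corresponds to. A quick check in Python; `total_cores` is a hypothetical helper, and the numbers are copied from the lists above:

```python
def total_cores(nodes: int, cores_per_node: int) -> int:
    """Cores available when every core on `nodes` full nodes is used."""
    return nodes * cores_per_node


# (label, nodes, cores per node), taken from the Resources slide
LIMITS = [
    ("cees-clusters, SEP-reserved", 35, 8),
    ("cees-clusters, SEP-reserved (currently)", 25, 8),
    ("cees-clusters, default max", 149, 8),
    ("cees-rcf, SEP-reserved", 21, 16),
    ("cees-rcf, default max", 137, 16),
]

for label, nodes, cpn in LIMITS:
    print(f"{label}: {nodes} nodes x {cpn} cores = {total_cores(nodes, cpn)} cores")
```

This prints 280, 200, 1192, 336 and 2192 cores respectively.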


Home and working directories

• /home/username
  – 10 GB quota
  – Backed up daily
  – Mounted read-only on compute nodes

• /data/sep/username
  – Everyone has write access to 20 TB in /data/cees
  – Not backed up
  – SEP partition in /data/sep (20 TB for cees-clusters and 30 TB for cees-rcf)

• Options
  1) Run your code in /home but use absolute paths to write output to /data (/home is read-only on compute nodes)
  2) Run your code in /data but back up your code in /home (/data is not backed up)

• Tip: It is much faster to write to /tmp on each node first and then copy the results back to /data
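The /tmp tip can be sketched in Python as follows. This is a minimal sketch, assuming each compute node has a local temporary directory (normally /tmp on Linux) and that your program can be told which directory to write into; `run_with_local_scratch` and the `compute` callable are hypothetical names for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_local_scratch(compute, dest_dir, scratch_root=None):
    """Let `compute` write into node-local scratch, then copy everything to dest_dir.

    `compute(outdir)` is a hypothetical callable that writes its output files
    into `outdir`. `scratch_root=None` uses the system temp dir (usually /tmp).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        scratch = Path(scratch)
        compute(scratch)  # many small writes hit the fast local disk
        for item in scratch.iterdir():  # one bulk copy back to /data
            target = dest / item.name
            if item.is_dir():
                shutil.copytree(item, target, dirs_exist_ok=True)
            else:
                shutil.copy2(item, target)
    return dest  # scratch is deleted when the `with` block exits
```

For example, `run_with_local_scratch(my_step, "/data/sep/username/run1")` would let a hypothetical `my_step` write locally and then copy its files to /data in one pass. Anything not copied back is lost when the scratch directory is removed.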


Where is SEPlib?

• My own environment variables (csh syntax):
    setenv SEP /usr/local/SEP
    setenv SEPINC /usr/local/SEP/include
    setenv SEPBIN /usr/local/SEP/bin
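When a Python driver script launches SEPlib programs, the same variables can be handed to child processes. A minimal sketch, assuming SEPlib is installed under /usr/local/SEP as above; `sep_environment` is a hypothetical helper, and prepending $SEPBIN to PATH is our assumption, not something the slide specifies.

```python
import os

SEP_ROOT = "/usr/local/SEP"  # install location given on the slide


def sep_environment(base=None, sep_root=SEP_ROOT):
    """Return a copy of `base` (default: os.environ) with SEP, SEPINC and SEPBIN set.

    Prepending SEPBIN to PATH is an assumption so that child processes can
    find SEPlib programs by name.
    """
    env = dict(os.environ if base is None else base)
    env["SEP"] = sep_root
    env["SEPINC"] = f"{sep_root}/include"
    env["SEPBIN"] = f"{sep_root}/bin"
    old_path = env.get("PATH", "")
    env["PATH"] = env["SEPBIN"] + (os.pathsep + old_path if old_path else "")
    return env
```

Pass the result to a child process, e.g. `subprocess.run(cmd, env=sep_environment())`, rather than modifying the driver's own environment.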


How to submit a job

A job script specifies:
• Number of nodes and cores you need
• The maximum run time of your job before it is killed (note: must be < 2 hours for the default queue)
• Stdout and stderr logs
• Queue, either default or sep
• Job name
• The command for your job

Submit your job using qsub.
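These items map onto standard PBS/Torque directives (`-N`, `-l nodes=N:ppn=M`, `-l walltime=HH:MM:SS`, `-o`, `-e`, `-q`). The sketch below renders such a script in Python. It assumes the cluster accepts these Torque directives and that the queues are literally named `default` and `sep`; `JobSpec`, `pbs_script`, the bash shell line, and the log file names are hypothetical choices.

```python
from dataclasses import dataclass

DEFAULT_QUEUE_LIMIT_S = 2 * 3600  # default-queue jobs must finish in < 2 hours


@dataclass
class JobSpec:
    name: str                # job name (-N)
    command: str             # the command for your job
    nodes: int = 1           # number of nodes
    ppn: int = 8             # cores per node (8 on cees-clusters, 16 on cees-rcf)
    walltime_s: int = 3600   # max run time before the job is killed
    queue: str = "default"   # "default" or "sep"
    stdout: str = "job.out"  # stdout log (-o)
    stderr: str = "job.err"  # stderr log (-e)


def hms(seconds: int) -> str:
    """Format seconds as HH:MM:SS for the walltime directive."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def pbs_script(job: JobSpec) -> str:
    """Render a PBS job script; refuse default-queue jobs of 2 hours or more."""
    if job.queue == "default" and job.walltime_s >= DEFAULT_QUEUE_LIMIT_S:
        raise ValueError("default-queue walltime must be under 2 hours")
    lines = [
        "#!/bin/bash",  # shell choice is an assumption
        f"#PBS -N {job.name}",
        f"#PBS -l nodes={job.nodes}:ppn={job.ppn}",
        f"#PBS -l walltime={hms(job.walltime_s)}",
        f"#PBS -o {job.stdout}",
        f"#PBS -e {job.stderr}",
        f"#PBS -q {job.queue}",
        "cd $PBS_O_WORKDIR",  # start in the directory qsub was run from
        job.command,
    ]
    return "\n".join(lines) + "\n"
```

Write the returned text to a file such as job.pbs and submit it with `qsub job.pbs`. Rejecting a too-long default-queue walltime up front is cheaper than having the job killed at the limit.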


Do not run big jobs on the head node

• Talk to Dennis when moving large datasets
• You can also use cees-rcf-tools to test jobs


Check jobs


Cancel jobs
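Checking and cancelling jobs correspond to the standard PBS commands `qstat` and `qdel`. A minimal Python wrapper, assuming those commands are on your PATH on the head node; `check_jobs` and `cancel_job` are hypothetical names, and the `run` parameter exists only so the functions can be tested without a scheduler.

```python
import getpass
import subprocess


def check_jobs(user=None, run=subprocess.run):
    """Return the text printed by `qstat -u <user>` (default: the current user)."""
    user = user or getpass.getuser()
    result = run(["qstat", "-u", user], capture_output=True, text=True, check=True)
    return result.stdout


def cancel_job(job_id, run=subprocess.run):
    """Cancel one job with `qdel <job_id>`; raises CalledProcessError on failure."""
    run(["qdel", str(job_id)], check=True)
```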


Typical computation structure

1 job or many jobs?

[Figure: an iterative workflow alternating between steps that need 40 nodes (e.g., pre-stack forward or adjoint operation) and steps that need 1 node (e.g., stacking, step sizes, updating).]
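One way to weigh "1 job or many jobs?" is to count node-hours. A single job that requests 40 nodes holds all of them during the 1-node steps too. The timings below are hypothetical (1 hour on 40 nodes, then 0.5 hours on 1 node per iteration), chosen only to show the arithmetic; the function names are ours.

```python
def node_hours_one_job(nodes, wide_h, narrow_h):
    """One job reserves `nodes` nodes for both stages of the iteration."""
    return nodes * (wide_h + narrow_h)


def node_hours_many_jobs(nodes, wide_h, narrow_h):
    """Separate jobs: the wide stage uses `nodes` nodes, the narrow stage 1 node."""
    return nodes * wide_h + 1 * narrow_h


# Hypothetical timings: 1 h on 40 nodes (e.g. pre-stack forward/adjoint),
# then 0.5 h on 1 node (e.g. stacking and updating).
one = node_hours_one_job(40, 1.0, 0.5)
many = node_hours_many_jobs(40, 1.0, 0.5)
print(f"one job:   {one} node-hours")        # 60.0
print(f"many jobs: {many} node-hours")       # 40.5
print(f"idle:      {one - many} node-hours")  # 19.5
```

Under these assumptions one big job leaves 39 nodes idle for half an hour every iteration, i.e. 19.5 node-hours that nobody else can use.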


• Reserved (sep) queue: jobs can run indefinitely

• Default queue: jobs must finish within 2 hours

Waiting…


[Figure: the same 40-node / 1-node workflow (pre-stack forward or adjoint operation on 40 nodes; stacking, step sizes, updating on 1 node), captioned "I am taking over every single node. muahahaha"]


Bob’s advice

• Break your jobs into blocks of under 2 hours and use the default queue

• Store only intermediate results on the clusters
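Following Bob's advice means estimating how long each piece of work takes and packing pieces into blocks that finish under the 2-hour default-queue limit. A greedy Python sketch, assuming you have a runtime estimate in hours for each task; the 10% safety margin and the name `pack_into_blocks` are our assumptions.

```python
def pack_into_blocks(task_hours, limit_h=2.0, margin=0.9):
    """Greedily group tasks, in order, into blocks of at most limit_h * margin hours.

    task_hours: estimated run time of each task in hours (your own estimates).
    Returns a list of blocks, each a list of task indices; submit one job per block.
    """
    budget = limit_h * margin
    blocks, current, used = [], [], 0.0
    for i, t in enumerate(task_hours):
        if t > budget:
            raise ValueError(f"task {i} ({t} h) cannot fit in one block")
        if used + t > budget:  # this task would overflow: close the block
            blocks.append(current)
            current, used = [], 0.0
        current.append(i)
        used += t
    if current:
        blocks.append(current)
    return blocks
```

With the default margin, `pack_into_blocks([0.5, 0.7, 0.4, 1.2, 0.3])` returns `[[0, 1, 2], [3, 4]]`: two jobs of about 1.6 and 1.5 hours, each under the 1.8-hour budget.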


Scripting is useful for job management

• On cees-clusters: /data/sep/mandyman/Tutorial

  1. Embarrassingly parallel job submission
  2. Timer to check jobs
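The Tutorial scripts themselves are not reproduced here, but the two ideas can be sketched in Python. `submit` and `is_running` are hypothetical hooks: in practice `submit` might run `qsub` on a script and return the job id it prints, and `is_running` might look the job up in `qstat` output. The 60-second polling interval is an arbitrary choice.

```python
import time


def submit_all(script_paths, submit):
    """Submit independent (embarrassingly parallel) job scripts.

    `submit(path) -> job_id` is a hypothetical hook, e.g. a wrapper around qsub.
    """
    return [submit(path) for path in script_paths]


def wait_for_jobs(job_ids, is_running, poll_s=60, sleep=time.sleep):
    """Poll every `poll_s` seconds until none of `job_ids` is still running.

    `is_running(job_id) -> bool` is a hypothetical hook, e.g. based on qstat.
    `sleep` is injectable so the loop can be tested without waiting.
    Returns the number of polls made.
    """
    pending = set(job_ids)
    polls = 0
    while pending:
        polls += 1
        pending = {job for job in pending if is_running(job)}
        if pending:
            sleep(poll_s)
    return polls
```

A driver would call `submit_all` once, then `wait_for_jobs` before starting the next (for example 1-node) step.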


Sharing resources

We are here now


Sharing resources