Speed up your research: How to get 40 computers to do your...

23
Speed up your research: How to get 40 computers to do your work for you Bingbing Yuan June 19, 2008

Transcript of Speed up your research: How to get 40 computers to do your...

Page 1: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

Speed up your research: How to get 40 computers to do your work for you

Bingbing YuanJune 19, 2008

Page 2: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

• barra: 4 GB RAM

• LSF (Load Sharing Facility) Cluster– 36 machines (+ 42 lab specific machines )

• 34: 4 GB RAM per machine• 2: 8 GB RAM per machine

ncc001

ncc005 ncc64003 ncc64026 ncc64010

User Jobs submitted from barra

Page 3: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

bsub – submit jobs• bsub myscript• Send notification to specified email

– bsub –u [email protected] myscript• Send error and standard output to files

– bsub –e error_file –o std_file myscript– bsub –e error_file –o std_file “myscript >result”

• Send job with specific queue– bsub –q sq32hp myscript

• Send job to a host– bsub –m ncc64022 myscript

Page 4: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

Check the job status • bjobs: pending, running and suspending jobs

Page 5: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

bjobsShow all the running jobs

Page 6: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

bjobs

• also show finished jobs: -a

Page 7: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

bkill – kill jobs

• bkill JOBID– bkill 124047

• Kill all jobs– bkill 0

• kill all jobs running as ‘normal’ queue– bkill –q normal 0

Page 8: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

• bpeek – peek at the stdout and stderr output of unfinished job– bpeek JOBID

• bpeek 124047• bstop - suspends unfinished jobs

– bstop 124047

• bresume - resumes suspended jobs– bresume 124047

Page 9: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

LSF commands are shown in black; Job states are shown in blue; Job events are shown in red;

host resource information is shown in purple.

Picture from Los Alamos National Laboratory ( http://asci-training.lanl.gov/LSF/ )

Page 10: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

LSF selects which job to run next based on:

• Resources requirements of the applications – queue– job requirement

• Current load conditions• How important you are

Page 11: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

bqueues -- queue

TotalJobSlots

JobSlotsPending

JobSlotsRunning

JobSlotssuspended

Page 12: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

32-bit nodes

highpriority

Mediumpriority

Lowpriority

sq32hp<20 min

lq32hp>20 min

sq32mp<20 min

lq32mp> 20 min

sq32lp< 20 min

64-bitnodes64-bitnodes

highpriorityhigh

prioritylow

prioritylow

priority

sq64hp<20 minsq64hp<20 min

lq64hp>20 minlq64hp>20 min

lq64lp>20 minlq64lp

>20 min

normal : default queuepriority: 10restart to lq64lp after 30min

queues in the cluster

high: 50medium: 40low: 20-30

Page 13: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

Only available on 32-bit nodes

• RepeatMasker– Mask repetitive DNA

• EMBOSS applications– A suite for sequence analysis– http://iona.wi.mit.edu/bio/tools/emboss/

Page 14: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

LSF selects which job to run next based on:

• Resources requirements of the applications – queue: bsub –q sq32hp myscript– job requirement

• Current load conditions• How important you are

Page 15: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

Request 1G of memory– bsub -R “rusage[mem=1000]” myscript

Standard out from previous job……

……

Page 16: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

CPU factornumber of processors

lshosts –static resource information for the machines

Page 17: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

LSF selects which job to run next based on:

• Resources requirements of the applications – queue: bsub –q sq32hp myscript– job requirement

• Current load conditions• How important you are

Page 18: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

lsload-current dynamic load activity

Accept job ? Load index

CPU utilization

available RAM

available swap space

free space in /tmp

Page 19: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

bhosts –static and dynamic resources

Accept job?

Max jobSlots per user

Max jobSlots per host

JobSlots started

Page 20: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

LSF selects which job to run next based on:

• Resources requirements of the applications – queue: bsub –q sq32hp myscript– job requirement

• Current load conditions• How important you are

Page 21: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

User priority

• bqueues -l normal…

Page 22: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

Commands we have learned

• bsub• bjobs• bpeek• bstop• bresume• bkill

• bqueues• lshosts• lsload• bhosts

Page 23: Speed up your research: How to get 40 computers to do your ...barc.wi.mit.edu/education/hot_topics/lsf/Running... · • barra: 4 GB RAM • LSF (Load Sharing Facility) Cluster –

References

• Platform LSF Reference:– Descriptions of all commands

• Running Jobs with Platform LSF– Introduction to basic concepts of LSF software to

run and monitor jobs

http://iona.wi.mit.edu/bio/bioinfo/docs/LSF_help.html