Master Control Program Subha Sivagnanam
SDSC
Master Control Program
• Provides automatic resource selection for running a single parallel job on HPC resources
• MCP uses directives in batch submission scripts to submit to the queues of multiple resources. E.g.:
    #MCP submit_host <head node for the remote cluster>
    #MCP username <local username on the remote cluster>
    #MCP scratch_dir <scratch directory on the remote cluster>
• As soon as the job starts to run on one of the resources, MCP removes the job from the queues of all other resources.
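The directive format above can be read with a few lines of Python. This is only a minimal sketch, assuming each directive is a single line of the form `#MCP <key> <value>`; the function name `parse_mcp_directives` is hypothetical and is not taken from mcp.py.

```python
def parse_mcp_directives(script_text):
    """Collect '#MCP <key> <value>' lines from a job script into a dict.

    Hypothetical helper for illustration; not the parser used by mcp.py.
    """
    directives = {}
    for line in script_text.splitlines():
        parts = line.strip().split(None, 2)  # ['#MCP', key, value]
        if len(parts) == 3 and parts[0] == "#MCP":
            directives[parts[1]] = parts[2].strip()
    return directives


# Directive values copied from the NCSA example in these slides.
example_script = """#!/bin/ksh
#MCP qtype pbs
#MCP submit_host tg-login.ncsa.teragrid.org
#MCP username your_username
#MCP scratch_dir /home/ncsa/your_username/info/mcp/test/mcp
#PBS -l walltime=00:05:00,nodes=4:ppn=2:compute
"""

directives = parse_mcp_directives(example_script)
print(directives["submit_host"])
```

Lines that belong to the batch system itself (here the `#PBS` line) are skipped, so the same script can still be handed to the scheduler unchanged.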
Assumptions:
• User should compile the application on the desired machines
• Input should be staged on the remote clusters
• Submission will be initiated from only one machine
• MCP can be initiated by
  – using mcp.py, manually creating job scripts
  – using fullauto.py, automating job scripts based on desired attributes
MCP flow
• Grid credential needs to be established (grid-proxy-init or myproxy-get-delegation)
• Write job script for each resource

Example – NCSA jobscript
    #!/bin/ksh
    #MCP qtype pbs
    #MCP submit_host tg-login.ncsa.teragrid.org
    #MCP username your_username
    #MCP scratch_dir /home/ncsa/your_username/info/mcp/test/mcp
    #PBS -l walltime=00:05:00,nodes=4:ppn=2:compute
    #PBS -d /home/ncsa/your_username/info/mcp/test/run
    NPROCS=`wc -l < $PBS_NODEFILE`
    /usr/local/mpich/mpich-gm-1.2.5..10-intel-r2/bin/mpirun -v -machinefile $PBS_NODEFILE -np $NPROCS /home/ncsa/your_username/testprog/ring26 -t 10 -n 2 -l 10 -i 0.03125
    #/bin/sleep 900
• User submits the job scripts to MCP by passing them as arguments:
    ./mcp.py [--debug] <submit_script1> <submit_script2>
• MCP submits jobs to all clusters and monitors all clusters for job start
• Once one job starts, MCP cancels all other jobs
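The submit, monitor, and cancel steps above can be sketched as a small simulation. This is an illustration only: `FakeCluster` and `run_on_first_available` are hypothetical names, the queues are simulated in memory, and nothing here reflects how mcp.py actually talks to remote schedulers.

```python
class FakeCluster:
    """Hypothetical in-memory stand-in for a remote batch queue."""

    def __init__(self, name, polls_until_start):
        self.name = name
        self.polls_until_start = polls_until_start
        self.state = "idle"

    def submit(self):
        self.state = "queued"

    def poll(self):
        # Each poll moves a queued job one step closer to starting.
        if self.state == "queued":
            self.polls_until_start -= 1
            if self.polls_until_start <= 0:
                self.state = "running"
        return self.state

    def cancel(self):
        # Only jobs that are still waiting can be removed from the queue.
        if self.state == "queued":
            self.state = "cancelled"


def run_on_first_available(clusters, max_polls=100):
    """Submit everywhere, keep the first job that starts, cancel the rest."""
    for cluster in clusters:
        cluster.submit()
    for _ in range(max_polls):
        for cluster in clusters:
            if cluster.poll() == "running":
                for other in clusters:
                    if other is not cluster:
                        other.cancel()
                return cluster.name
    # Nothing started within the polling budget: clean up every queue.
    for cluster in clusters:
        cluster.cancel()
    return None


clusters = [FakeCluster("ncsa", 3), FakeCluster("tacc", 1), FakeCluster("purdue", 2)]
winner = run_on_first_available(clusters)
states = {c.name: c.state for c in clusters}
print(winner, states)
```

The sketch does not cover the case where two jobs start at nearly the same moment; the slides do not say how MCP resolves that.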
Fullauto Flow
• User runs grid-proxy-init or myproxy-get-delegation to establish grid credential.
• autojob.py is created with personalized settings. E.g.:
    match_attributes = {
        'CPU_MODEL'     : ['==', 'ia64'],
        'CPU_MEMORY_GB' : ['>=', 2],
        'CPU_MHZ'       : ['>=', 1300],
        'CPU_SMP'       : ['>=', 2],
        'NODECOUNT'     : ['>=', 128],
    }
    machine_dict_list = [
        {
            'HOSTNAME' : 'tg-login.ncsa.teragrid.org',
            'substitutes_dict' : {
                'arguments' : ['-t', '100', '-n', '10', '-l', '4000', '-i', '0.03125', '-c', '0', '-s', '0'],
                'wallclock_seconds' : '300',
                '__MCP_SHELL__' : '/bin/ksh',
                '__MCP_PARALLEL_RUN__' : '/usr/local/mpich/mpich-gm-1.2.6..14b-intel-r2/bin/mpirun',
                '__MCP_SERIAL_RUN__' : '#',
                '__MCP_NODES__' : '4',
                '__MCP_CPUS_PER_NODE__' : '2',
                '__MCP_USERNAME__' : 'your_username',
                '__MCP_SCRATCH_DIR__' : '/home/ncsa/your_username/info/mcp/test/mcpdata',
                '__MCP_JOB_DIR__' : '/home/ncsa/your_username/info/mcp/test/run',
                '__MCP_EXECUTABLE__' : '/home/ncsa/your_username/testprog/ring26',
            },
        },
    ]
• User runs fullauto.py with autojob.py as the input:
    fullauto.py --autojobfile=<autojob file>
• Fullauto finds clusters from the allowable list of resources (automachine.py) and creates job scripts for each selected cluster.
• Fullauto uses MCP to run the scripts.
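The selection and script-generation steps above can be sketched roughly as follows. Everything here is an assumption made for illustration: the helper names `matches` and `fill_template`, the two example hosts and their attribute values, and the template text are hypothetical, and only the two operators that appear in the autojob.py example ('==' and '>=') are handled. The slides do not show how fullauto.py implements either step.

```python
import operator

# Only the operators seen in the autojob.py example are supported here.
OPS = {"==": operator.eq, ">=": operator.ge}


def matches(machine_attrs, match_attributes):
    """Return True if a machine satisfies every [operator, value] requirement."""
    for name, (op, wanted) in match_attributes.items():
        if name not in machine_attrs:
            return False
        if not OPS[op](machine_attrs[name], wanted):
            return False
    return True


def fill_template(template, substitutes):
    """Replace each __MCP_*__ placeholder in the template with its value."""
    for key, value in substitutes.items():
        if key.startswith("__MCP_"):
            template = template.replace(key, str(value))
    return template


match_attributes = {
    "CPU_MODEL": ["==", "ia64"],
    "CPU_MEMORY_GB": [">=", 2],
    "NODECOUNT": [">=", 128],
}

# Hypothetical machine descriptions; real ones would come from automachine.py.
machines = {
    "cluster-a.example.org": {"CPU_MODEL": "ia64", "CPU_MEMORY_GB": 4, "NODECOUNT": 256},
    "cluster-b.example.org": {"CPU_MODEL": "ia64", "CPU_MEMORY_GB": 4, "NODECOUNT": 64},
}
selected = [host for host, attrs in machines.items() if matches(attrs, match_attributes)]

# Hypothetical template; the placeholder names are the ones from autojob.py.
template = "#!__MCP_SHELL__\n#MCP username __MCP_USERNAME__\n#MCP scratch_dir __MCP_SCRATCH_DIR__\n"
substitutes = {
    "__MCP_SHELL__": "/bin/ksh",
    "__MCP_USERNAME__": "your_username",
    "__MCP_SCRATCH_DIR__": "/home/ncsa/your_username/info/mcp/test/mcpdata",
}
job_script = fill_template(template, substitutes)
print(selected)
print(job_script)
```

Keeping the requirements as data (`[operator, value]` pairs) means the same matching loop works for any attribute a machine description exposes.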
Resources available
• fullauto.py --attributes or from automachine.py

Resource Name                     Location
Queen Bee (Dell IA64 cluster)     LONI
Mercury (Intel IA64 cluster)      NCSA
Abe (Dell Intel IA64 cluster)     NCSA
Lonestar (Dell 1955 cluster)      TACC
Steele (Dell 1950 cluster)        Purdue