USC Viterbi School of Engineering Ewa Deelman Resource Management.

Transcript of USC Viterbi School of Engineering Ewa Deelman Resource Management.

Page 1: USC Viterbi School of Engineering Ewa Deelman Resource Management.

USC Viterbi School of Engineering

Ewa Deelman

Resource Management

Page 2: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Outline

• General resource management framework

• Managing a set of resources

– Locally

– Across the network

• Managing distributed resources

Page 3: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Types of resources

[Figure labels: Local system; cluster, headnode, O(100GB-1TB), Myrinet, InfiniBand; loose set of resources, LAN, Ethernet]

Page 4: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Resource managers/schedulers

[Figure labels: Local Resource, Remote Resource, network, Local Scheduler, “Super” Scheduler, Remote Connectivity, Remote Submission, Workload Management, Workflow Management]

Page 5: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Basic Resource Management Functionality

• Accept requests
• Start jobs on the resource
• Log information about jobs
• Reply to queries about request status
• Prioritize among requests
• Provide fault tolerance
• Provide resource reservation
• Management of dependent jobs

Page 6: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Schedulers/Resource Managers

• Local system
  – OS, scheduling across multiple cores
• Clusters
  – Basic Scheduler
    • PBS, LSF, Torque, Condor
  – “Super” Scheduler
    • Maui
• Loose set of resources
  – Condor

Page 7: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Schedulers/Resource Managers

• Remote Connectivity
  – Globus GRAM
• Remote Submission to possibly many sites
  – Condor-G
• Workload management
  – Condor
• Workflow management
  – DAGMan
  – Pegasus

Page 8: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Resource managers/schedulers

[Figure labels: Local Resource, Remote Resource, network, Local Scheduler, “Super” Scheduler, Remote Connectivity, Remote Submission, Workload Management, Workflow Management; annotated with PBS, cluster, headnode, O(100GB-1TB)]

Page 9: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Cluster Scheduling--PBS

– Aggregates all the computing resources (cluster nodes) into a single entity
– Schedules and distributes applications
  • On a single node, across multiple nodes
– Supports various types of authentication
  • User name based
  • X509 certificates
– Supports various types of authorization
  • User, group, system
– Provides job monitoring
  • Collects real time data on the state of the system and job status
  • Logs real time data and other info on queued jobs, configuration information, etc

Page 10: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Cluster Scheduling--PBS

• Performs data stage-in and out onto the nodes
• Starts the computation on a node
• Uses backfilling for job scheduling
• Supports simple dependencies
  – Run Y after X
  – If X succeeds run Y otherwise run Z
  – Run Y after X regardless of X success
  – Run Y after specified time
  – Run X anytime after specified time
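
As a rough illustration of the dependency types above, the sketch below builds `qsub` argument lists using the `-W depend=...` and `-a` options found in PBS/Torque. The helper name `qsub_args`, the script names, and the job ID `123` are hypothetical, and nothing is actually submitted.

```python
def qsub_args(script, after_ok=None, after_fail=None, after_any=None, not_before=None):
    """Build (but do not run) a qsub command line for one job.

    All parameter names are illustrative; job IDs are whatever qsub
    printed when the earlier job was submitted.
    """
    args = ["qsub"]
    deps = []
    if after_ok:        # run only if the earlier job succeeded
        deps.append(f"afterok:{after_ok}")
    if after_fail:      # run only if the earlier job failed
        deps.append(f"afternotok:{after_fail}")
    if after_any:       # run after the earlier job, regardless of success
        deps.append(f"afterany:{after_any}")
    if deps:
        args += ["-W", "depend=" + ",".join(deps)]
    if not_before:      # earliest start time, e.g. "2230" for 22:30
        args += ["-a", not_before]
    args.append(script)
    return args


# "If X succeeds run Y otherwise run Z", with X already submitted as job 123
run_y = qsub_args("y.sh", after_ok="123")
run_z = qsub_args("z.sh", after_fail="123")
print(" ".join(run_y))
print(" ".join(run_z))
```

Passing the returned list to `subprocess.run` would perform the submission on a machine where `qsub` is installed.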

Page 11: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Cluster Scheduling--PBS

• Advanced features
  – Resource reservation
    • Specified time (start_t) and duration (delta)
    • Checks whether the reservation conflicts with running jobs and other confirmed reservations
    • If OK, sets up a queue for the reservation with a user-level ACL
    • Queue started at start_t
    • Jobs killed past the reservation time
  – Cycle harvesting
    • Can include idle resources into the mix
  – Peer scheduling
    • To other PBS systems
    • Pull-based
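
The reservation conflict check can be pictured as an interval-overlap test. The sketch below is not PBS code: it assumes that a conflict means processor demand would exceed a fixed machine size at some instant, and the function name, tuple layout, and numbers are all made up for illustration.

```python
def conflicts(start_t, delta, procs, total_procs, commitments):
    """Return True if a reservation of `procs` processors over
    [start_t, start_t + delta) would overcommit the machine.

    commitments: list of (start, end, procs) tuples covering running jobs
    and already confirmed reservations (an assumed representation).
    """
    end_t = start_t + delta
    # Demand only rises when a commitment starts, so it is enough to check
    # the reservation start and every commitment start inside the window.
    check_points = [start_t] + [s for s, _, _ in commitments if start_t < s < end_t]
    for t in check_points:
        in_use = sum(p for s, e, p in commitments if s <= t < e)
        if in_use + procs > total_procs:
            return True
    return False


# A 16-processor machine with one job holding 10 processors until t=60
busy = [(0, 60, 10)]
print(conflicts(30, 60, 8, 16, busy))   # overlaps the running job
print(conflicts(60, 30, 8, 16, busy))   # starts once the job has ended
```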

Page 12: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Cluster Scheduling--PBS

• Allocation properties
  – Administrator can revoke any allocation (running or queued)
  – Jobs can be preempted
    • Suspended
    • Checkpointed
    • Requeued
  – Jobs can be restarted X times

Page 13: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Single Queue Backfilling

• A job is allowed to jump in the queue ahead of jobs that are delayed due to lack of resources
• Non-preemptive
• Conservative backfilling
  – A job is allowed to jump ahead provided it does not delay any previous job in the queue
• Aggressive backfilling
  – A job is allowed to jump ahead if the first job is not affected
  – Better performing than conservative backfilling

Page 14: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Aggressive backfilling

• Job (arrival time, number of procs, expected runtime)
• Any job that executes longer than expected runtime is killed
• Define pivot: the first job in the queue
• If enough resources for pivot, execute pivot and define a new pivot
• Otherwise sort all currently executing jobs in order of their expected completion time
  – Determine pivot time: when sufficient resources for pivot are available

Page 15: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Aggressive backfilling

• At pivot time, any idle processors not required for pivot job are defined as extra processors
• The scheduler searches for the first queued job that
  – Requires no more than the currently idle processors and will finish by the pivot time, or
  – Requires no more than the minimum of the currently idle processors and the extra processors
• Once a job becomes a pivot, it cannot be delayed
• Possible that a job will end sooner and pivot starts before pivot time
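
The pivot-time and extra-processor rules above can be sketched in a few lines of Python. This is an illustrative reading of the slides, not scheduler source code: the names `Job`, `pivot_time_and_extra`, and `pick_backfill` are made up, and the machine size of 18 processors is an assumption (the slides do not state it). The job sizes are taken from the aggressive backfilling example slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    procs: int
    runtime: int  # expected runtime in minutes


def idle_now(total, running):
    """Processors not used by running jobs; running is a list of (job, end_time)."""
    return total - sum(job.procs for job, _ in running)


def pivot_time_and_extra(total, running, pivot, now=0):
    """Return (pivot time, extra processors at pivot time)."""
    idle = idle_now(total, running)
    pivot_time = now if pivot.procs <= idle else None
    if pivot_time is None:
        # Release running jobs in order of expected completion time
        for job, end in sorted(running, key=lambda r: r[1]):
            idle += job.procs
            if pivot.procs <= idle:
                pivot_time = end
                break
    if pivot_time is None:
        raise ValueError("pivot needs more processors than the machine has")
    still_busy = sum(job.procs for job, end in running if end > pivot_time)
    return pivot_time, total - still_busy - pivot.procs


def pick_backfill(total, running, queue, now=0):
    """First queued job behind the pivot that can start now without delaying it."""
    if not queue:
        return None
    pivot, rest = queue[0], queue[1:]
    pivot_time, extra = pivot_time_and_extra(total, running, pivot, now)
    idle = idle_now(total, running)
    for job in rest:
        done_before_pivot = job.procs <= idle and now + job.runtime <= pivot_time
        fits_beside_pivot = job.procs <= min(idle, extra)
        if done_before_pivot or fits_beside_pivot:
            return job
    return None


TOTAL = 18  # assumed machine size, not given in the slides
A, B, C = Job("A", 10, 40), Job("B", 12, 60), Job("C", 20, 120)
D, E, F = Job("D", 2, 50), Job("E", 6, 60), Job("F", 4, 120)

first = pick_backfill(TOTAL, [(A, 40)], [B, C, D, E, F])
second = pick_backfill(TOTAL, [(A, 40), (D, 50)], [B, C, E, F])
print(first.name, second.name)  # D F
```

Under the 18-processor assumption the sketch backfills Job D and then Job F while Job B is the pivot, skipping Job E, which is the order shown in the example slides. Job C (20 processors) never fits in this setting and is simply passed over.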

Page 16: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Aggressive backfilling example

[Figure: chart with axes "processors" and "time" showing Job A (10 procs, 40 mins); axis marks 8 and 10]

Queue: Job B (12 procs, 1 hour), Job C (20 procs, 2 hours), Job D (2 procs, 50 mins), Job E (6 procs, 1 hour), Job F (4 procs, 2 hours)

Running: Job A (10 procs, 40 mins)

Page 17: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Aggressive backfilling example

[Figure: chart with axes "processors" and "time" showing Job A (10 procs, 40 mins), Job D, and Job B marked as PIVOT; axis marks 8 and 10]

Queue: Job B (12 procs, 1 hour), Job C (20 procs, 2 hours), Job E (6 procs, 1 hour), Job F (4 procs, 2 hours)

Running: Job A (10 procs, 40 mins), Job D (2 procs, 50 mins)

Page 18: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Aggressive backfilling example

[Figure: chart with axes "processors" and "time" showing Job A (10 procs, 40 mins), Job D, Job F, and Job B marked as PIVOT; axis marks 8 and 10]

Queue: Job B (12 procs, 1 hour), Job C (20 procs, 2 hours), Job E (6 procs, 1 hour)

Running: Job A (10 procs, 40 mins), Job D (2 procs, 50 mins), Job F (4 procs, 2 hours)

Page 19: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Aggressive backfilling example

[Figure: chart with axes "processors" and "time" showing Job A (10 procs, 40 mins), Job B, Job D, and Job F; axis marks 8 and 10]

Queue: Job C (20 procs, 2 hours), Job E (6 procs, 1 hour)

Running: Job D (2 procs, 50 mins), Job F (4 procs, 2 hours), Job B (12 procs, 1 hour)

Page 20: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Aggressive backfilling example

[Figure: chart with axes "processors" and "time" showing Job A (10 procs, 40 mins), Job B, Job D, and Job F; axis marks 8 and 10]

Queue: Job C (20 procs, 2 hours), Job E (6 procs, 1 hour)

Running: Job D (2 procs, 50 mins), Job F (4 procs, 2 hours), Job B (12 procs, 1 hour)

Pivot: Job C (20 procs, 2 hours)

Page 21: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Multiple-Queue Backfill

From “Self-adapting Backfilling Scheduling for Parallel Systems”, B.G.Lawson et al. Proceedings of the 2002 International Conference on Parallel Processing (ICPP'02), 2002

Page 22: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Resource managers/schedulers

[Figure labels: Local Resource, Remote Resource, network, Local Scheduler, “Super” Scheduler, Remote Connectivity, Remote Submission, Workload Management, Workflow Management; annotated with Maui]

Page 23: USC Viterbi School of Engineering Ewa Deelman Resource Management.

“Super”/External Scheduler--Maui

• External scheduler
  – Does not mandate a specific resource manager
  – Fully controls the workload submitted through the local management system
  – Enhances resource manager capabilities
  – Supports advance reservations
    • <start-time, duration, acl>
    • Irrevocable allocations
      – For time critical tasks such as weather modeling
      – The reservation is guaranteed to start at a given time and last for a given duration regardless of future workload
    • Revocable allocations
      – The reservation would be released if it precludes a higher priority request from being granted

Page 24: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Maui

• Retry “X” times mechanism for finding resources and starting a job
  – If the job does not start, it can be put on hold
• Can be configured preemptive/nonpreemptive
• Exclusive allocations
  – No sharing of resources
• Malleable allocations
  – Changes can be made in time and space
    • Increases if possible
    • Decreases always

Page 25: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Maui

• Resource Querying
  – Request (account mapping, minimum duration, processors)
  – Maui returns a list of tuples
    • <feasible start_time, total available resources, duration, resource cost, quality of the information>
  – Can also return a single value which incorporates
    • Anticipated, non-submitted jobs
    • Workload profiles
• Provides event notifications
  – Solitary and threshold events
  – Job/reservation: creation, start, preemption, completion, various failures
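
To make the shape of a resource query answer concrete, the sketch below models the returned tuples as a small Python type and shows how a client might pick the earliest offer that satisfies a request. None of this is Maui's actual interface: the `Offer` type, its field names and units, and `earliest_offer` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Offer(NamedTuple):
    """One tuple from the list above (field names and units are assumed)."""
    start_time: int   # feasible start, minutes from now
    available: int    # total available processors
    duration: int     # minutes the resources remain available
    cost: float       # resource cost
    quality: float    # quality of the information, assumed 0.0 to 1.0


def earliest_offer(offers: List[Offer], min_duration: int, procs: int) -> Optional[Offer]:
    """Earliest offer with enough processors for at least min_duration."""
    usable = [o for o in offers if o.available >= procs and o.duration >= min_duration]
    return min(usable, key=lambda o: o.start_time) if usable else None


offers = [
    Offer(0, 4, 120, 1.0, 0.9),
    Offer(30, 16, 45, 1.0, 0.8),
    Offer(90, 32, 240, 1.5, 0.6),
]
best = earliest_offer(offers, min_duration=60, procs=8)
print(best.start_time if best else None)  # 90
```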

Page 26: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Maui

• Courtesy allocation
  – Placeholder which is released if confirmation is not received within a time limit
• Depending on the underlying scheduler Maui can
  – Suspend/resume
  – Checkpoint/restart
  – Requeue/restart

Page 27: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Resource managers/schedulers

[Figure labels: Local Resource, Remote Resource, network, Local Scheduler, “Super” Scheduler, Remote Connectivity, Remote Submission, Workload Management, Workflow Management; annotated with Globus GRAM]

Page 28: USC Viterbi School of Engineering Ewa Deelman Resource Management.

GRAM - Basic Job Submission and Control Service

• A uniform service interface for remote job submission and control
  – Includes file staging and I/O management
  – Includes reliability features
  – Supports basic Grid security mechanisms
  – Asynchronous monitoring
  – Interfaces with local resource managers, simplifies the job of metaschedulers/brokers
• GRAM is not a scheduler.
  – No scheduling
  – No metascheduling/brokering

Slide courtesy of Stuart Martin

Page 29: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Grid Job Management Goals

Provide a service to securely:
• Create an environment for a job
• Stage files to/from environment
• Cause execution of job process(es)
  – Via various local resource managers
• Monitor execution
• Signal important state changes to client
• Enable client access to output files
  – Streaming access during execution

Slide courtesy of Stuart Martin

Page 30: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Job Submission Model

• Create and manage one job on a resource
• Submit and wait
• Not with an interactive TTY
  – File based stdin/out/err
  – Supported by all batch schedulers
• More complex than RPC
  – Optional steps before and after submission message
  – Job has complex lifecycle
    • Staging, execution, and cleanup states
    • But not as general as Condor DAG, etc.
  – Asynchronous monitoring

Slide courtesy of Stuart Martin

Page 31: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Job Submission Options

• Optional file staging
  – Transfer files “in” before job execution
  – Transfer files “out” after job execution
• Optional file streaming
  – Monitors files during job execution
• Optional credential delegation
  – Create, refresh, and terminate delegations
  – For use by job process
  – For use by GRAM to do optional file staging

Slide courtesy of Stuart Martin

Page 32: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Job Submission Monitoring

• Monitor job lifecycle
  – GRAM and scheduler states for job
    • StageIn, Pending, Active, Suspended, StageOut, Cleanup, Done, Failed
  – Job execution status
    • Return codes
• Multiple monitoring methods
  – Simple query for current state
  – Asynchronous notifications to client

Slide courtesy of Stuart Martin
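
The "simple query for current state" method can be pictured as a polling loop over the job states listed on this slide. The sketch below is not the GRAM client API: `query_state` is a hypothetical callable standing in for whatever call returns the current state name, and treating Done and Failed as the only terminal states is an assumption.

```python
from enum import Enum


class JobState(Enum):
    STAGE_IN = "StageIn"
    PENDING = "Pending"
    ACTIVE = "Active"
    SUSPENDED = "Suspended"
    STAGE_OUT = "StageOut"
    CLEANUP = "Cleanup"
    DONE = "Done"
    FAILED = "Failed"


# Assumption: these are the states after which no further change occurs
TERMINAL = {JobState.DONE, JobState.FAILED}


def wait_for_job(query_state, max_polls=100, pause=lambda: None):
    """Poll until a terminal state is seen; return the distinct states observed.

    query_state: hypothetical zero-argument callable returning a state name.
    pause: called between polls (for example a function that sleeps).
    """
    seen = []
    for _ in range(max_polls):
        state = JobState(query_state())
        if not seen or seen[-1] is not state:
            seen.append(state)
        if state in TERMINAL:
            return seen
        pause()
    raise TimeoutError("job did not reach a terminal state within max_polls")


# Simulated service answers, for illustration only
answers = iter(["Pending", "Active", "Active", "StageOut", "Done"])
history = wait_for_job(lambda: next(answers))
print([s.value for s in history])  # ['Pending', 'Active', 'StageOut', 'Done']
```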

Page 33: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Secure Submission Model

• Secure submit protocol
  – PKI authentication
  – Authorization and mapping
    • Based on Grid ID
  – Further authorization by scheduler
    • Based on local user ID
• Secure control/cancel
  – Also PKI authenticated
  – Owner has rights to his jobs and not others’

Slide courtesy of Stuart Martin

Page 34: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Secure Execution Model

• After authorization…
• Execute job securely
  – User account “sandboxing” of processes
    • According to mapping policy and request details
  – Initialization of sandbox credentials
    • Client-delegated credentials
    • Adapter scripts can be customized for site needs
      – AFS, Kerberos, etc

Slide courtesy of Stuart Martin

Page 35: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Secure Staging Model

• Before and after sandboxed execution…
• Perform secure file transfers
  – Create RFT request
    • To local or remote RFT service
    • PKI authentication and delegation
    • In turn, RFT controls GridFTP
      – Using delegated client credentials
  – GridFTP
    • PKI authentication
    • Authorization and mapping by local policy files
    • Further authorization by FTP/Unix permissions

Slide courtesy of Stuart Martin

Page 36: USC Viterbi School of Engineering Ewa Deelman Resource Management.

[Figure: GRAM4 diagram with the following labels]

Users/Applications: Job Brokers, Portals, Command line tools, etc.

GRAM4: GRAM WSDLs + Job Description Schema (executable, args, env, …); WS standard interfaces for subscription, notification, destruction

Resource Managers: PBS, Condor, LSF, SGE, LoadLeveler, Fork

Slide courtesy of Stuart Martin

Page 37: USC Viterbi School of Engineering Ewa Deelman Resource Management.

GRAM4 Approach

[Figure: GRAM4 architecture diagram. Components: client, GRAM services, GRAM adapter, sudo, Delegation, RFT, GridFTP, local sched., user job. Locations: compute element, compute element and service host(s), remote storage element(s). Arrows: job submit, delegate, xfer request, local job control, FTP control, FTP data]

Slide courtesy of Stuart Martin

Page 38: USC Viterbi School of Engineering Ewa Deelman Resource Management.

File staging retry policy

• If a file staging operation fails, it may be non-fatal and retry may be desired
  – Server defaults for all transfers can be configured
  – Defaults can be overridden for a specific transfer
• Output can be streamed

Slide courtesy of Stuart Martin
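
A retry policy with server-wide defaults and per-transfer overrides might look like the sketch below. It is an illustration only: the setting names `max_attempts` and `delay_s`, their values, and the choice of `OSError` as the retryable failure are assumptions, not GRAM configuration.

```python
import time

# Assumed server-wide defaults; names and values are illustrative
SERVER_DEFAULTS = {"max_attempts": 3, "delay_s": 0.0}


def stage_file(transfer, policy=None, sleep=time.sleep):
    """Call transfer() until it succeeds or the attempts run out.

    policy: optional dict overriding SERVER_DEFAULTS for this one transfer.
    """
    cfg = {**SERVER_DEFAULTS, **(policy or {})}
    last_error = None
    for attempt in range(1, cfg["max_attempts"] + 1):
        try:
            return transfer()
        except OSError as err:  # treated as non-fatal in this sketch
            last_error = err
            if attempt < cfg["max_attempts"]:
                sleep(cfg["delay_s"])
    raise last_error


attempts = {"n": 0}


def flaky_transfer():
    """Simulated transfer that fails twice before succeeding."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise OSError("transient failure")
    return "staged"


print(stage_file(flaky_transfer))  # staged
```

Passing `policy={"max_attempts": 1}` for a single transfer would disable retries for that transfer while leaving the defaults untouched.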

Page 39: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Throttle staging work

• A GRAM submission that specifies file staging imposes load on the service node executing the GRAM service.
  – GRAM is configured for a maximum number of “worker” threads and thus a maximum number of concurrent staging operations.

[Figure labels: cluster, O(100GB-1TB), GRAM/PBS headnode]
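
The effect of a fixed worker-thread limit can be demonstrated with Python's standard `ThreadPoolExecutor`. This is a generic sketch of the throttling idea, not GRAM's implementation; `run_staging` and the stand-in operations are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_staging(operations, max_workers):
    """Run staging callables with at most max_workers in flight.

    Returns (results in submission order, peak concurrency observed).
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def tracked(op):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return op()
        finally:
            with lock:
                active -= 1

    # The pool size is the throttle: extra operations wait for a free worker
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(tracked, operations))
    return results, peak


# Eight stand-in staging operations sharing three worker threads
ops = [lambda i=i: f"file-{i} staged" for i in range(8)]
results, peak = run_staging(ops, max_workers=3)
print(len(results), peak <= 3)  # 8 True
```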

Page 40: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Local Resource Manager Interface

• The GRAM interface to the LRM to submit, monitor, and cancel jobs.
  – Perl scripts + SEG
• Scheduler Event Generator (SEG) generated by local resource manager provides efficient monitoring between the GRAM service and the LRM for all jobs for all users

Slide courtesy of Stuart Martin

Page 41: USC Viterbi School of Engineering Ewa Deelman Resource Management.

Fault tolerance

• GRAM can recover from a container or host crash. Upon restart, GRAM will resume processing of the users' job submissions
  – Processing resumes for all jobs once the service container has been restarted

Job cancellation

• Allow a job to be cancelled
  – WSRF standard “Destroy” operation

Slide courtesy of Stuart Martin

Page 42: USC Viterbi School of Engineering Ewa Deelman Resource Management.

State Access: Push

• Allow clients to request notifications for state changes
  – WS Notifications
    • Clients can subscribe for notifications to the “job status” resource property

State Access: Pull

• Allow clients to get the state for a previously submitted job
  – The service defines a WSRF resource property that contains the value of the job state. A client can then use the standard WSRF getResourceProperty operation.

Slide courtesy of Stuart Martin
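
The difference between push and pull access can be shown with a small in-process analogy. The sketch below does not use WSRF or WS-Notification; the `JobStatus` class and its method names are hypothetical stand-ins for the "job status" resource property, a subscription, and a getResourceProperty-style read.

```python
class JobStatus:
    """Holds one job's state and offers both access styles."""

    def __init__(self, state="Pending"):
        self._state = state
        self._subscribers = []

    def get_state(self):
        """Pull: the client asks for the current value whenever it wants."""
        return self._state

    def subscribe(self, callback):
        """Push: the client registers once and is called on every change."""
        self._subscribers.append(callback)

    def set_state(self, new_state):
        """Service side: record a change and notify subscribers."""
        if new_state != self._state:
            self._state = new_state
            for callback in self._subscribers:
                callback(new_state)


job = JobStatus()
events = []
job.subscribe(events.append)   # push client

job.set_state("Active")
job.set_state("Active")        # no change, so no notification
job.set_state("Done")

print(events)            # ['Active', 'Done']
print(job.get_state())   # Done
```

A pull client may miss intermediate states between two reads, while a push client sees every change but needs a way to receive callbacks.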