Post on 24-Dec-2015
Resource Management
Reading:
“A Resource Management Architecture for Metacomputing Systems”
What is Resource Management?
Mechanisms for locating and allocating computational resources
- Authentication
- Process creation
- Remote job submission
- Scheduling
Other resources that can be managed:
- Memory
- Disk
- Networks
Resource Management Issues for Grid Computing
- Site autonomy
  - Resources owned by different organizations, in different administrative domains
  - Local policies for use, scheduling, security
- Heterogeneous substrate
  - Different local resource management systems
- Policy extensibility
  - Local sites need ability to customize their resource management policies
More Issues for Grid Computing
- Co-allocation
  - May need resources at several sites
  - Mechanism for allocating multiple resources, initiating computation, monitoring and managing
- On-line control
  - Adapt application requirements to resource availability
Specifying Resource and Job Requirements
Resource requirements:
- Machine type
- Number of nodes
- Memory
- Network
Job or scheduler parameters:
- Directory
- Executable
- Arguments
- Environment
- Maximum time required
Resource and Job Specification
Globus: Resource Specification Language (RSL)
&(executable=myprog)
 (|(&(count=5)(memory>=64))
   (&(count=10)(memory>=32)))
Condor: Classified ads (ClassAds)
- Resource owners advertise abilities and constraints
- Applications advertise resource requests
- Matchmaking: match offers & requests
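The matchmaking idea can be sketched in a few lines. This is only an illustration, not Condor's ClassAd language: plain dicts and Python callables stand in for advertisements, and the names `matchmake`, `requirements`, and `constraints` are assumptions made for this example.

```python
def matchmake(offers, requests):
    """Pair each request with the first unclaimed offer acceptable to both sides."""
    matches = {}
    claimed = set()
    for request in requests:
        for offer in offers:
            if offer["name"] in claimed:
                continue
            # Both parties must agree: the request's requirements hold for the
            # offer, and the owner's constraints hold for the request.
            if request["requirements"](offer) and offer["constraints"](request):
                matches[request["name"]] = offer["name"]
                claimed.add(offer["name"])
                break
    return matches


# Hypothetical advertisements from two resource owners and two applications.
offers = [
    {"name": "node-a", "memory": 32, "constraints": lambda req: True},
    {"name": "node-b", "memory": 128,
     "constraints": lambda req: req["owner"] != "guest"},
]
requests = [
    {"name": "job-1", "owner": "alice",
     "requirements": lambda offer: offer["memory"] >= 64},
    {"name": "job-2", "owner": "guest",
     "requirements": lambda offer: offer["memory"] >= 16},
]
result = matchmake(offers, requests)
```

The point of the two-sided check is that owners keep control: an offer can refuse a request even when the request would accept the offer.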
Components of Globus Resource Management Architecture
- Resource specification using RSL
- Resource brokers: translate resource requirements into specifications
- Co-allocators: break down requests for multiple sites
- Local resource managers: apply local, site-specific resource management policies
- Information about available compute resources and their characteristics
Resource Specification Language
Common notation for exchange of information between components
API provided for manipulating RSL
RSL Syntax
- Elementary form: parenthesized clauses
  (attribute op value [ value … ] )
- Operators supported: <, <=, =, >=, >, !=
- Some supported attributes: executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact, resourceManagerName
- Unknown attributes are passed through
  - May be handled by subsequent tools
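As a rough illustration of the elementary form, the sketch below splits one parenthesized relation into attribute, operator, and values. The function name `parse_relation` is hypothetical and this is not the RSL API that Globus provides; it assumes unquoted, whitespace-separated values and does not check whether the attribute is a known one.

```python
import re

# One relation: "(" attribute op value [value ...] ")".
# Two-character operators come first so "<=" is not read as "<".
_RELATION = re.compile(
    r"^\(\s*(?P<attr>\w+)\s*(?P<op><=|>=|!=|<|>|=)\s*(?P<values>[^()]*?)\s*\)$"
)


def parse_relation(text):
    """Parse one '(attribute op value [value ...])' clause.

    Returns (attribute, operator, [values]); raises ValueError on bad input.
    """
    match = _RELATION.match(text.strip())
    if match is None:
        raise ValueError(f"not an elementary RSL relation: {text!r}")
    values = match.group("values").split()
    if not values:
        raise ValueError(f"relation has no value: {text!r}")
    return match.group("attr"), match.group("op"), values
```

For example, `parse_relation("(memory>=64)")` yields the attribute `memory`, the operator `>=`, and a one-element value list.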
Constraints: “&”
For example:
& (count>=5) (count<=10)
  (max_time=240) (memory>=64)
  (executable=myprog)
“Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”
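The "&" semantics (every relation must hold) can be sketched as a check of a resource description against a list of relations. This is an assumption-laden illustration: relations are pre-parsed tuples, the `satisfies` function and the `free_nodes` attribute are hypothetical, and job parameters such as `executable` and `max_time` are left out because they describe the job rather than the machine.

```python
import operator

# The six RSL comparison operators mapped to Python functions.
OPERATORS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    ">=": operator.ge, ">": operator.gt, "!=": operator.ne,
}


def satisfies(resource, relations):
    """Return True if the resource meets every relation ('&' semantics)."""
    for attribute, op, wanted in relations:
        if attribute not in resource:
            # Assumption: a missing attribute cannot satisfy a constraint.
            return False
        if not OPERATORS[op](resource[attribute], wanted):
            return False
    return True


# Hypothetical constraints: at least 64 MB of memory and 5 free nodes.
constraints = [("memory", ">=", 64), ("free_nodes", ">=", 5)]
```

A resource described as `{"memory": 128, "free_nodes": 8}` passes, while one with 32 MB of memory fails on the first relation.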
Multirequest: “+”
A multirequest allows us to specify multiple resource needs, for example
+ (& (count=5)(memory>=64)
     (executable=p1))
  (&(network=atm) (executable=p2))
- Execute 5 instances of p1 on a machine with at least 64M of memory
- Execute p2 on a machine with an ATM connection
Multirequests are central to co-allocation
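Breaking a multirequest into its components, as a co-allocator has to do, comes down to finding the top-level parenthesized groups. The sketch below does this by counting parentheses. The function name `split_multirequest` is hypothetical, and the code assumes no parentheses appear inside quoted values.

```python
def split_multirequest(rsl):
    """Return the top-level parenthesized subrequests of a '+' multirequest."""
    text = rsl.strip()
    if not text.startswith("+"):
        raise ValueError("not a multirequest: expected leading '+'")
    parts, depth, start = [], 0, None
    for index, char in enumerate(text):
        if char == "(":
            if depth == 0:
                start = index  # opening paren of a new subrequest
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
            if depth == 0:
                # Drop the outer parentheses and normalize whitespace.
                parts.append(" ".join(text[start + 1:index].split()))
    if depth != 0:
        raise ValueError("unbalanced '('")
    return parts


request = """+ (& (count=5)(memory>=64)
     (executable=p1))
  (&(network=atm) (executable=p2))"""
components = split_multirequest(request)
```

Each element of `components` is a conjunction that could be handed to a separate resource manager.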
Resource Broker
- Takes high-level RSL specification
- Transforms into concrete specifications through “specialization” process
- Locate resources that meet requirements
- Multiple brokers may service single request
- Application-specific brokers translate application requirements
- Output: complete specification of locations of resources; given to co-allocator
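A minimal sketch of specialization, assuming a broker that simply picks the first suitable entry from an information table: the abstract requirements gain a `resourceManagerContact`, which makes the request concrete. The `specialize` function, the dictionary keys, and the host names are hypothetical; a real broker would query the information service rather than a list.

```python
def specialize(requirements, directory):
    """Turn abstract requirements into a ground RSL string, or None if nothing fits.

    requirements: dict with 'executable', 'count', and 'min_memory'.
    directory: list of dicts standing in for information-service entries.
    """
    for entry in directory:
        enough_nodes = entry["free_nodes"] >= requirements["count"]
        enough_memory = entry["memory"] >= requirements["min_memory"]
        if enough_nodes and enough_memory:
            # The location is now fixed, so the request is concrete.
            return (
                f"&(resourceManagerContact={entry['contact']})"
                f"(count={requirements['count']})"
                f"(executable={requirements['executable']})"
            )
    return None


# Hypothetical information-service contents.
directory = [
    {"contact": "hostA.example.org", "free_nodes": 2, "memory": 128},
    {"contact": "hostB.example.org", "free_nodes": 16, "memory": 64},
]
ground = specialize({"executable": "myprog", "count": 5, "min_memory": 64}, directory)
```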
Examples of Resource Brokers
Nimrod-G
- Automates creation and management of large parametric experiments
- Run application under wide range of input conditions and aggregate results
- Queries MDS to find resources
- Generates number of independent jobs
- GRAM allocates jobs to computational nodes
- Higher-level broker: allows user to specify time and cost constraints
Examples of Resource Brokers (cont.)
AppLeS: Application Level Scheduler
- Map large number of independent tasks to dynamically varying pool of available computers
- Use GRAM to locate resources and initiate and manage computation
Resource co-allocators
- May request resources at multiple sites
  - Two or more computers and networks
- Break multi-request into components
- Pass each component to resource manager
- Provide means for monitoring job status or terminating job
- Complex:
  - Two or more resource managers
  - Global state like availability of resources difficult to determine
Different co-allocation services
1. Require all resources to be available before job proceeds; fail globally if failure occurs at any resource
2. Allocate at least N out of M resources and return
3. Return immediately, but gradually return more resources as they become available
Each useful for some class of applications
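The first two services can be sketched as follows. This is a simplified illustration under stated assumptions: each allocator is a hypothetical callable that returns a handle or `None`, allocation is attempted sequentially, and `release` is a hypothetical callback that gives a resource back.

```python
def allocate_all_or_nothing(allocators, release):
    """Service 1: succeed only if every resource is acquired; otherwise undo."""
    handles = []
    for allocate in allocators:
        handle = allocate()
        if handle is None:
            for acquired in handles:
                release(acquired)  # fail globally: give back what was obtained
            return None
        handles.append(handle)
    return handles


def allocate_at_least(allocators, minimum, release):
    """Service 2: return the acquired handles if at least `minimum` succeeded."""
    attempts = (allocate() for allocate in allocators)
    handles = [handle for handle in attempts if handle is not None]
    if len(handles) >= minimum:
        return handles
    for acquired in handles:
        release(acquired)  # too few resources: give them all back
    return None
```

The third service would need asynchronous notification as resources arrive, which is beyond this sketch.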
Concurrent Allocation
If advance reservations are available:
- Obtain list of available time slots from each participating resource manager and choose timeslot
Without reservations:
- Optimistically allocate resources
- Hope desired set will be available at future time
- Use information service (MDS) to determine current availability of resources
- Construct RSL request that is likely to succeed
- If allocation fails, all started jobs must be terminated
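For the advance-reservation case, choosing a timeslot amounts to intersecting the free intervals reported by each resource manager. The sketch below picks the earliest common window of a given length. The function name, the `(start, end)` representation, and the use of plain numbers for time are assumptions made for this example; at least one slot list is expected.

```python
def earliest_common_slot(slot_lists, duration):
    """Return the earliest (start, end) of length `duration` free at every site.

    slot_lists: one list of (start, end) free intervals per resource manager.
    Returns None if no common window is long enough.
    """
    common = sorted(slot_lists[0])
    for slots in slot_lists[1:]:
        overlap = []
        for a_start, a_end in common:
            for b_start, b_end in slots:
                # Keep only the part of the two intervals that coincides.
                start, end = max(a_start, b_start), min(a_end, b_end)
                if start < end:
                    overlap.append((start, end))
        common = sorted(overlap)
    for start, end in common:
        if end - start >= duration:
            return (start, start + duration)
    return None


# Hypothetical free slots (in hours from now) at two sites.
site_a = [(0, 4), (6, 12)]
site_b = [(3, 8), (9, 15)]
chosen = earliest_common_slot([site_a, site_b], duration=2)
```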
Disadvantages of Concurrent Allocation Scheme
Computational resources wasted while waiting for all requested resources to become available
Application must be altered to perform barrier to synchronize startup across components
Detecting failure of a resource is difficult, e.g. in queue-based local resource managers
Local Resource Managers
Implemented with Globus Resource Allocation Manager (GRAM)
1. Processing RSL specifications representing resource requests
   - Deny request
   - Create one or more processes (jobs) that satisfy request
2. Enable remote monitoring and management of jobs
3. Periodically update MDS information service with current availability and capabilities of resources
GRAM (cont.)
- Interface between grid environment and entity that can create processes
  - E.g., parallel scheduler or Condor pool
- GRAM may schedule resource itself
- More commonly, maps resource specification into a request to a local resource allocation mechanism
  - E.g., Condor, LoadLeveler, LSF
- Co-exists with local mechanisms
GRAM (cont.)
GRAM API has functions for:
- Submitting a job request: produces globally unique job handle
- Canceling a job request
- Asking when job request is expected to run
- Upon submission, can request that progress be signaled asynchronously to callback URL
GRAM Scheduling Model
Jobs are either:
- Pending: resources have not yet been allocated to the job
- Active: resources allocated, job running
- Done: when all processes have terminated and resources have been deallocated
- Failed: job terminates due to:
  - explicit termination
  - error in request format
  - failure in resource management system
  - denial of access to resource
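The four states can be modeled as a small state machine. The states come from the list above, but the allowed transitions are an assumption: the slides name the states without spelling out the edges between them. The names `JobState`, `ALLOWED`, and `advance` are hypothetical.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"


# Assumed edges: a job can fail from either live state; DONE and FAILED are final.
ALLOWED = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


def advance(current, new):
    """Return the new state, or raise ValueError if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new
```

A monitoring client could call `advance` on every state-change notification to catch out-of-order or duplicate messages.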
GRAM Components
Gatekeeper
Responds to a request:
1. Performs mutual authentication of user and resource
2. Determines local user name for remote user
3. Starts a job manager that executes as local user and handles request
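The three gatekeeper steps can be outlined as below. This is a schematic sketch only: authentication is reduced to a boolean supplied by the caller, the user mapping is a plain dict, and `handle_request`, `user_map`, and `start_job_manager` are hypothetical names, not part of GRAM.

```python
def handle_request(subject, authenticated, user_map, start_job_manager):
    """Outline of the gatekeeper steps for one incoming request.

    subject: identity claimed by the remote user.
    authenticated: result of mutual authentication (assumed done elsewhere).
    user_map: dict from remote identity to local user name.
    start_job_manager: callable that launches a job manager as the local user.
    """
    # Step 1: refuse the request unless mutual authentication succeeded.
    if not authenticated:
        raise PermissionError("mutual authentication failed")
    # Step 2: determine the local user name for the remote user.
    local_user = user_map.get(subject)
    if local_user is None:
        raise PermissionError(f"no local account for {subject!r}")
    # Step 3: start a job manager that runs as that local user.
    return start_job_manager(local_user)
```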
GRAM Components (cont.)
Job manager
- Creates processes requested by user
- Submits resource allocation requests to underlying resource management system (or does fork)
- Monitors state of created processes
- Notifies callback contact of state transitions
- Implements control operations like termination
GRAM Components (cont.)
GRAM reporter
Responsible for storing into MDS (information service) info about:
- Scheduler structure
  - Support reservations?
  - Number of queues
- Scheduler state
  - Currently active jobs
  - Expected wait time in queue
  - Total number of nodes and available nodes
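Resource Management Architecture
[Figure: the application passes RSL to a broker, which performs RSL specialization using queries to and info from the information service; the resulting ground RSL goes to a co-allocator, which sends simple ground RSL to the local resource managers (GRAM instances in front of LSF, EASY-LL, and NQE).]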
Job Submission Interfaces
Globus Toolkit includes several command line programs for job submission
- globus-job-run: Interactive jobs
- globus-job-submit: Batch/offline jobs
- globusrun: Flexible scripting infrastructure