Grid Resource Allocation and Management (GRAM)

Transcript of Grid Resource Allocation and Management (GRAM)

Page 1: Grid Resource Allocation and Management (GRAM)
Grid Resource Allocation and Management (GRAM)

• Execution management
  – Deployment, scheduling and monitoring

• Community Scheduler Framework (CSF): Provides a single interface to different resource schedulers.
  – PBS, Condor(G).

• Workspace management
  – Dynamically create and manage workspaces on remote hosts.

• Grid Telecontrol Protocol
  – WSRF-enabled service interface for control of remote instruments.
    • Remote goldfish surgical procedures.

• Jobs are computational tasks that may perform input/output operations while running.

• Jobs affect the state of the computational resource and its associated file systems.

• Jobs may require coordinated staging of data into the resource prior to job execution and out of the resource following execution.

• Some users, particularly interactive ones, benefit from accessing output data files while the job is running.

• Monitoring consists of querying and subscribing for status information such as job state changes.

• Jobs are operated under the control of a scheduler, which implements allocation and prioritization policies.

• GRAM is not a resource scheduler but a protocol engine for communicating with different local resource schedulers.
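
As a rough illustration of the "protocol engine, not scheduler" point (and of the single-interface idea behind CSF), the sketch below keeps one adapter per local scheduler and only translates a scheduler-neutral request into the text that scheduler expects. It is a minimal sketch under stated assumptions: `JobRequest`, `SchedulerAdapter`, `PBSAdapter`, `CondorAdapter` and `prepare_submission` are hypothetical names, the request format is not GRAM's real job description language, and nothing is actually submitted.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JobRequest:
    """Hypothetical scheduler-neutral job description (not GRAM's real format)."""
    executable: str
    arguments: list = field(default_factory=list)
    queue: Optional[str] = None


class SchedulerAdapter(ABC):
    """One adapter per local scheduler; the front end never schedules anything itself."""

    @abstractmethod
    def render(self, job: JobRequest) -> str:
        """Return the text this scheduler expects to receive for the job."""


class PBSAdapter(SchedulerAdapter):
    def render(self, job: JobRequest) -> str:
        lines = ["#!/bin/sh"]
        if job.queue:
            lines.append(f"#PBS -q {job.queue}")  # PBS directive selecting a queue
        lines.append(" ".join([job.executable, *job.arguments]))
        return "\n".join(lines) + "\n"


class CondorAdapter(SchedulerAdapter):
    def render(self, job: JobRequest) -> str:
        # Condor submit description: "key = value" lines, then the "queue" command.
        lines = [f"executable = {job.executable}"]
        if job.arguments:
            lines.append("arguments = " + " ".join(job.arguments))
        lines.append("queue")
        return "\n".join(lines) + "\n"


ADAPTERS = {"pbs": PBSAdapter(), "condor": CondorAdapter()}


def prepare_submission(scheduler: str, job: JobRequest) -> str:
    """Dispatch a neutral request to the adapter for the named local scheduler."""
    try:
        adapter = ADAPTERS[scheduler]
    except KeyError:
        raise ValueError(f"no adapter for scheduler {scheduler!r}") from None
    return adapter.render(job)


if __name__ == "__main__":
    job = JobRequest("/bin/hostname", queue="batch")
    print(prepare_submission("pbs", job))
    print(prepare_submission("condor", job))
```

Adding another local scheduler in this design means adding one adapter class; the client-facing request stays the same.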

Conceptual Details

• Targeted job types
  – Not "RPC"-style tasks
  – Jobs where reliable operation, stateful monitoring, credential management, and file staging are important (i.e., the performance overhead is high, so use GRAM only when necessary).

Component Architecture

• Based on a component architecture

  – Job management services
    • represent, monitor, and control the overall job life cycle. These services are the job-management-specific software provided by the GRAM solution.

  – File transfer services
    • support staging of files into and out of compute resources.

Component Architecture

  – Credential management services
    • are used to control the delegation of rights among distributed elements of the GRAM architecture, based on users' application requirements.

Security

• Secure operation
  – WS GRAM uses WSRF functionality to authenticate job management requests and to protect job requests from malicious interference.

• Local system protection domains
  – Jobs are executed in appropriate local security contexts
    • e.g., under specific Unix user IDs, based on details of the job request and authorization policies.
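
The slides do not say how a grid identity becomes a local Unix account. Globus deployments commonly use a "grid-mapfile" that maps certificate subject names (DNs) to local account names; the sketch below assumes a simplified version of that format (a quoted DN followed by comma-separated account names) and is not the Globus implementation. `parse_gridmap` and `local_account` are hypothetical helpers.

```python
import shlex
from typing import Dict, List, Optional


def parse_gridmap(text: str) -> Dict[str, List[str]]:
    """Parse simplified grid-mapfile-style lines: "<quoted DN>" acct1,acct2"""
    mapping: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # keeps the quoted DN (spaces included) as one token
        if len(parts) < 2:
            raise ValueError(f"malformed entry: {raw!r}")
        dn = parts[0]
        # Tolerate "a,b" as well as "a, b" by rejoining and dropping empty names.
        mapping[dn] = [a for a in ",".join(parts[1:]).split(",") if a]
    return mapping


def local_account(mapping: Dict[str, List[str]], dn: str,
                  requested: Optional[str] = None) -> str:
    """Pick the Unix account a job from `dn` will run as, or refuse."""
    accounts = mapping.get(dn)
    if not accounts:
        raise PermissionError(f"no local account authorized for {dn}")
    if requested is None:
        return accounts[0]  # default to the first listed account
    if requested not in accounts:
        raise PermissionError(f"{dn} may not run jobs as {requested}")
    return requested
```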

• Credential delegation and management
  – The client may delegate some of its rights to GRAM services
    • e.g., rights for GRAM to access data on a remote storage element as part of the job execution.

• Audit
  – To assist with normal accounting functions as well as to further mitigate risks from abuse or malfunction.

Job Management

• Reliable job submission
  – "At most once" semantics

• Job cancellation
  – A mechanism for clients to cancel (abort) their jobs at any point in the job life cycle.
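
One common way to get "at most once" submission over an unreliable network is to have the client pick a unique submission ID and reuse it on every retry, so the service can recognize a duplicate request and return the job it already created. The sketch assumes that approach; `JobService`, `Job` and the state strings are hypothetical, and cancellation is treated as a no-op once a job has reached a terminal state.

```python
import threading
import uuid
from dataclasses import dataclass
from typing import Dict

TERMINAL = {"Done", "Failed", "Canceled"}


@dataclass
class Job:
    job_id: str
    state: str = "Pending"


class JobService:
    """Toy service: each client-chosen submission ID maps to at most one job."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._by_submission: Dict[str, Job] = {}

    def submit(self, submission_id: str) -> Job:
        with self._lock:
            job = self._by_submission.get(submission_id)
            if job is None:
                # First request with this ID: create exactly one job.
                job = Job(job_id=str(uuid.uuid4()))
                self._by_submission[submission_id] = job
            # A retried request (e.g. after a lost reply) gets the same job back.
            return job

    def cancel(self, job: Job) -> None:
        """Abort a job at any non-terminal point in its life cycle."""
        with self._lock:
            if job.state not in TERMINAL:
                job.state = "Canceled"


if __name__ == "__main__":
    service = JobService()
    submission_id = str(uuid.uuid4())  # chosen once by the client, reused on retry
    first = service.submit(submission_id)
    retry = service.submit(submission_id)
    print(first is retry)  # no duplicate job was created
```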

Data Management

• Reliable data staging
  – Reliable, high-performance transfers of files between the compute resource and external (GridFTP) data storage elements before and after job execution.

• Output monitoring
  – A mechanism for incrementally transferring output file contents from the compute resource while the job is running.
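
Output monitoring can be approximated by remembering how many bytes of an output file have already been delivered and reading only what was appended since. The sketch below assumes the output is a local file that the job only appends to; `read_new_output` and `follow` are hypothetical helpers, and a real service would ship the chunks to a remote client rather than yield them in-process.

```python
import time
from typing import Callable, Iterator, Tuple


def read_new_output(path: str, offset: int) -> Tuple[bytes, int]:
    """Return the bytes appended to `path` since `offset`, and the new offset."""
    try:
        with open(path, "rb") as f:
            f.seek(offset)
            chunk = f.read()
    except FileNotFoundError:
        return b"", offset  # the job has not created the file yet
    return chunk, offset + len(chunk)


def follow(path: str, job_running: Callable[[], bool],
           interval: float = 1.0) -> Iterator[bytes]:
    """Yield new output while the job runs, plus a final chunk after it ends."""
    offset = 0
    while True:
        # Sample the job state before reading so output written just before exit is not lost.
        running = job_running()
        chunk, offset = read_new_output(path, offset)
        if chunk:
            yield chunk
        if not running:
            return
        time.sleep(interval)
```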

Task Coordination

• Parallel jobs

• Task rendezvous
  – A mechanism for task rendezvous that job applications may use if they do not have another, more appropriate solution
  – Usually done in MPI
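
A task rendezvous lets each task of a parallel job publish its contact information and wait until every peer has done the same. The in-process sketch below uses threads as stand-ins for tasks and `threading.Condition` for the waiting; `Rendezvous`, `register` and the contact strings are hypothetical, ranks are assumed to be 0..n-1, and a real GRAM rendezvous would be reached over the network.

```python
import threading
from typing import Dict, List, Optional


class Rendezvous:
    """Each task registers its contact info; all are released once every task has arrived."""

    def __init__(self, n_tasks: int) -> None:
        self._n = n_tasks
        self._contacts: Dict[int, str] = {}
        self._cond = threading.Condition()

    def register(self, rank: int, contact: str,
                 timeout: Optional[float] = None) -> List[str]:
        with self._cond:
            self._contacts[rank] = contact
            self._cond.notify_all()  # wake tasks that are already waiting
            arrived = self._cond.wait_for(lambda: len(self._contacts) == self._n, timeout)
            if not arrived:
                raise TimeoutError("not every task reached the rendezvous")
            # Every task receives the same rank-ordered list of peer contacts.
            return [self._contacts[r] for r in range(self._n)]


if __name__ == "__main__":
    rv = Rendezvous(3)
    threads = [threading.Thread(target=lambda r=r: print(r, rv.register(r, f"node{r}:5000")))
               for r in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```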

• WS-GRAM (Web Services version)

• Designed to support job execution with coordinated file staging.

• Uses a set of Web services in the GT4 WSRF core.
  – ManagedJob: provides an interface to monitor the status of the job and to terminate it. Each submitted job is a distinct resource.
  – ManagedJobFactory: interface to create ManagedJob resources of the appropriate type to run a job on that local scheduler.

• A ManagedJob resource is created by invoking ManagedJobFactory::createManagedJob.
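
The factory/resource split can be pictured with the toy classes below: one factory per local scheduler, and a new, separately addressable resource for every job it creates. This is a sketch, not the WS GRAM API; `create_managed_job` only mirrors the name of the `createManagedJob` operation, and the resource keys and state strings are placeholders.

```python
import itertools
from dataclasses import dataclass
from typing import Dict


@dataclass
class ManagedJob:
    """One resource per submitted job."""
    resource_key: str
    scheduler: str
    description: dict
    state: str = "Unsubmitted"  # placeholder state name

    def status(self) -> str:
        return self.state

    def terminate(self) -> None:
        self.state = "Terminated"  # placeholder state name


class ManagedJobFactory:
    """One factory per local scheduler type; it creates ManagedJob resources."""

    _ids = itertools.count(1)  # shared counter so every resource key is unique

    def __init__(self, scheduler: str) -> None:
        self.scheduler = scheduler
        self.resources: Dict[str, ManagedJob] = {}

    def create_managed_job(self, description: dict) -> ManagedJob:
        key = f"{self.scheduler}-job-{next(self._ids)}"
        job = ManagedJob(key, self.scheduler, description)
        self.resources[key] = job  # kept until the client destroys it
        return job


if __name__ == "__main__":
    factory = ManagedJobFactory("PBS")
    job = factory.create_managed_job({"executable": "/bin/date"})
    print(job.resource_key, job.status())
```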

• Creation of a job
  – ManagedJobFactory::createManagedJob invocation.
  – A meaningful WS GRAM client MUST create a job that then goes through a life cycle in which it eventually completes execution and the resource is eventually destroyed.

• Optional staging credentials
  – Delegation must be performed before the call to createManagedJob.

• Optional job credential
  – Stored into the user account for use by the job process.

• Optional credential refresh
  – Delegated credentials may be refreshed.

• Optional hold of cleanup
  – The user wants to access output files directly without waiting for stage-out.

• ManagedJob destruction
  – The client can explicitly destroy the job.
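
The optional steps above can be modeled as a small state machine. The sketch assumes a simplified, hypothetical set of states (the real WS GRAM state names and transitions may differ): it refuses to create a job that needs staging but has no delegated staging credential, pauses in a `Held` state instead of cleaning up when hold-of-cleanup is requested, and lets the client destroy the resource explicitly.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    # Simplified, hypothetical states; not the exact WS GRAM state set.
    STAGE_IN = "StageIn"
    PENDING = "Pending"
    ACTIVE = "Active"
    STAGE_OUT = "StageOut"
    HELD = "Held"          # paused before cleanup at the client's request
    CLEAN_UP = "CleanUp"
    DONE = "Done"


# Normal forward order; HELD is entered only when hold-of-cleanup is requested.
ORDER = [State.STAGE_IN, State.PENDING, State.ACTIVE,
         State.STAGE_OUT, State.CLEAN_UP, State.DONE]


class JobLifecycle:
    def __init__(self, staging_credential: Optional[str], needs_staging: bool,
                 hold_cleanup: bool = False) -> None:
        # Staging credentials must already be delegated when the job is created.
        if needs_staging and staging_credential is None:
            raise ValueError("delegate staging credentials before createManagedJob")
        self.staging_credential = staging_credential
        self.hold_cleanup = hold_cleanup
        self.state = State.STAGE_IN
        self.destroyed = False

    def advance(self) -> State:
        """Move one step forward, unless finished or held."""
        if self.state in (State.DONE, State.HELD):
            return self.state
        nxt = ORDER[ORDER.index(self.state) + 1]
        if nxt is State.CLEAN_UP and self.hold_cleanup:
            nxt = State.HELD  # outputs stay on the compute resource for direct access
        self.state = nxt
        return nxt

    def release(self) -> None:
        """The client is done with the held files; let cleanup proceed."""
        if self.state is State.HELD:
            self.state = State.CLEAN_UP

    def destroy(self) -> None:
        """Explicit ManagedJob destruction by the client."""
        self.destroyed = True
```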

Globus Toolkit Components Used by WS GRAM

• Reliable File Transfer (RFT)
  – For file staging before the job starts and after it completes.

• GridFTP
  – Supports retry
  – Partial file transfer
  – Third-party file transfer
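
GridFTP's retry and partial-transfer support means an interrupted transfer can resume where it stopped rather than starting again. The sketch below shows only that idea in plain Python and does not speak the GridFTP protocol; `transfer_with_restart` is hypothetical, and `open_source` is an assumed callback that opens the source at a given byte offset.

```python
import io
from typing import BinaryIO, Callable


def transfer_with_restart(open_source: Callable[[int], BinaryIO], dest: BinaryIO,
                          max_attempts: int = 3, chunk_size: int = 64 * 1024) -> int:
    """Copy source to dest, resuming from the last byte written after a failure."""
    written = 0
    for attempt in range(1, max_attempts + 1):
        try:
            src = open_source(written)  # reopen the source at the restart offset
            try:
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:
                        return written  # transfer complete
                    dest.write(chunk)
                    written += len(chunk)  # the restart point advances as data lands
            finally:
                src.close()
        except OSError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
    raise RuntimeError("max_attempts must be at least 1")


if __name__ == "__main__":
    payload = b"x" * 200_000
    out = io.BytesIO()
    print(transfer_with_restart(lambda offset: io.BytesIO(payload[offset:]), out))
```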

GridFTP

[Figure: GridFTP diagram with hosts labeled FOO1 and FOO2]

Delegation Services

• A client can delegate credentials to any service that is deployed in the same container as the delegation service.
  – The client tells the delegation service that it wants to delegate its credentials.
  – The service that wants to use them must contact the delegation service to acquire them.
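
The delegate-then-acquire pattern can be sketched as a container-local store keyed by an unguessable handle. Everything here is hypothetical: `DelegationService` and its methods are illustrative names, credentials are opaque bytes standing in for real proxy certificates, and `refresh` corresponds to the optional credential refresh step described earlier.

```python
import secrets
import threading
from typing import Dict


class DelegationService:
    """Container-local store: a client deposits a credential, co-located services fetch it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._store: Dict[str, bytes] = {}

    def delegate(self, credential: bytes) -> str:
        """Client side: hand over a credential and get back an opaque handle."""
        handle = secrets.token_urlsafe(16)  # hard-to-guess reference
        with self._lock:
            self._store[handle] = credential
        return handle

    def refresh(self, handle: str, credential: bytes) -> None:
        """Replace a delegated credential (e.g. before it expires); the handle stays valid."""
        with self._lock:
            if handle not in self._store:
                raise KeyError(handle)
            self._store[handle] = credential

    def acquire(self, handle: str) -> bytes:
        """Service side: a service in the same container fetches the credential by handle."""
        with self._lock:
            return self._store[handle]


if __name__ == "__main__":
    delegation = DelegationService()
    h = delegation.delegate(b"<proxy credential>")
    # In this sketch the client passes only `h` in its job request;
    # the job service then calls acquire(h) itself.
    print(delegation.acquire(h))
```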

External Components Used by WS GRAM

• Local job scheduler:
  – PBS, LSF, Condor

• Sudo
  – Access to user accounts without the service having root privilege.
