RMS and Scheduling for Future Generation Grids Ramin Yahyapour University Dortmund Leader CoreGRID Institute on Resource Management and Scheduling CoreGRID – Summer School Bonn, 24 July 2006


RMS and Scheduling for Future Generation Grids

Ramin Yahyapour

University Dortmund

Leader, CoreGRID Institute on Resource Management and Scheduling

CoreGRID – Summer School, Bonn, 24 July 2006


European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and Peer-to-Peer Technologies


Introduction

We all know what “the Grid” is…

– one of the many definitions:

“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations” (Ian Foster)

– however, the actual scope of “the Grid” is still quite controversial

Many people consider High Performance Computing (HPC) as the main Grid application.

– today’s Grids are mostly Computational Grids or Data Grids with HPC resources as building blocks

– thus, Grid resource management is closely related to resource management on HPC resources (our starting point).

– we will return to a broader Grid scope and its implications later


Key Question

“Which services/resources to use for an activity, when, where, how?”

Typically: A particular user, or business application, or component application needs for an activity one or several services/resources under given constraints

• Trust & Security

• Timing & Economics

• Functionality & Service level

• Application-specifics & Inter-dependencies

• Scheduling and Access Policies

This question has to be answered in an automatic, efficient, and reliable way.

Part of the invisible and smart infrastructure!


Motivation

Resource Management for Future/Next Generation Grids!

But what are Future Generation Grids?

HPC Computing

– Parallel Computing

– Cluster Computing

– Desktop Computing

Enterprise Grids

– Business Services

– Application Server

– Webservices

Ambient Intelligence, Ubiquitous Computing

– PDA, Mobile Devices

It depends on who you ask!


Resource Definition

Concluding from the different interpretations of “Grid”: for broad acceptance, a Grid RMS should probably cover the whole scope.

Resources:

Compute

Network

Storage

Data

Software

– components, licenses

Services

– functionality, ability

Management of some resources is less complex,

while other resources require coordination and orchestration to be effective (e.g. HW and SW).


Resource Management Layer

A Grid Resource Management System consists of:

Local resource management system (Resource Layer)

– Basic resource management unit

– Provides a standard interface for using remote resources

– e.g. GRAM, etc.

Global resource management system (Collective Layer)

– Coordinates all local resource management systems within multiple or distributed Virtual Organizations (VOs)

– Provides high-level functionalities to efficiently use all resources

• Job Submission

• Resource Discovery and Selection

• Scheduling

• Co-allocation

• Job Monitoring, etc.

– e.g. Meta-scheduler, Resource Broker, etc.


[Figure: layered Grid RMS architecture. From top to bottom: User/Application; Higher-Level Services (Resource Broker); Grid Middleware (Grid Resource Managers alongside the Core Grid Infrastructure Services: Information Services, Monitoring Services, Security Services); Local Resource Management (PBS, LSF, …); Resources.]


Core Functionalities of a Grid RMS

Resource Discovery

– online, on-demand process

Access to Resource Information

– static and dynamic information

Status Monitoring

– general resource monitoring

– monitoring with respect to a job

Allocation/Scheduling

– coordination is required

SLA Management

– reliable agreements

Execution Management/Provisioning

– start of a job / use of a resource

Accounting and Billing


Case 1: RMS for specialized Applications

Specialized resource management dedicated to a single application domain.

– Goal: high efficiency

– Cost: higher development effort

The RMS is adapted to:

– application and its workflow

– resource configuration

There is a need for specific interfaces to the resources.

Highly specialized for the application and therefore easier to handle for the user.

– The know-how has been built into the system.

Only certain types of jobs and resources are considered.


Case 2: RMS as Generic Grid-Middleware

Grid RMS is open for many applications

This may be less efficient than Case 1.

Generic interfaces are required that are adapted to many front- and backends.

This approach requires additional user-/application supplied information:

– job description• workflow, objectives, requirements, constraints

Consideration of security is an integral aspect

– wide variety of security levels

RMS for Future Generation Grids needs the flexibility to cover all kinds of jobs and resources


FGG Resource Management

Need for well-defined interfaces to core services

Inherent support for different implementations

While maintaining cooperation between these implementations

Core services:

– Resource Discovery

– Access to Resource Information

– Status Monitoring

– Allocation/Scheduling

– SLA Management

– Execution Management/Provisioning

– Accounting and Billing
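The idea of one well-defined interface with several cooperating implementations can be sketched as follows. This is only an illustration for the Resource Discovery service: the names `ResourceDiscovery`, `CentralIndexDiscovery`, `PeerDiscovery`, and `find_resources` are hypothetical and do not come from the slides or from any Grid middleware.

```python
from __future__ import annotations

from typing import Protocol


class ResourceDiscovery(Protocol):
    """Hypothetical interface for the 'Resource Discovery' core service."""

    def discover(self, requirement: str) -> list[str]:
        ...


class CentralIndexDiscovery:
    """One possible implementation: a single central index (assumed design)."""

    def __init__(self, index: dict[str, list[str]]) -> None:
        self._index = index

    def discover(self, requirement: str) -> list[str]:
        return sorted(self._index.get(requirement, []))


class PeerDiscovery:
    """Another possible implementation: ask each known peer in turn."""

    def __init__(self, peers: list[ResourceDiscovery]) -> None:
        self._peers = peers

    def discover(self, requirement: str) -> list[str]:
        found: set[str] = set()
        for peer in self._peers:
            # Cooperation: any implementation of the interface can be a peer.
            found.update(peer.discover(requirement))
        return sorted(found)


def find_resources(service: ResourceDiscovery, requirement: str) -> list[str]:
    # The caller depends only on the interface, not on the implementation.
    return service.discover(requirement)
```

A caller can pass either implementation to `find_resources`, and a `PeerDiscovery` can wrap several `CentralIndexDiscovery` instances, which is one way to read "different implementations while maintaining cooperation".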


Requirements

Resource Discovery:

– scalable

• from cluster grids,

• business grids

• to global grids

– centralized or decentralized implementations, P2P

– unified naming scheme


Access to resource information:

– static and historic information,

– dynamic (future) information:

• planned, predicted

– may be subject to privacy concerns

• user and owner dependent


Aspects:

flexibility

scalability

efficiency


Problem: Job Submission Descriptions differ

The deliverables of the GGF/OGF Working Group JSDL:

A specification for an abstract standard Job Submission Description Language (JSDL) that is independent of language bindings, including:

– the JSDL feature set and attribute semantics,

– the definition of the relationship between attributes,

– and the range of attribute values.

A normative XML Schema corresponding to the JSDL specification.

A document of translation tables to and from the scheduling languages of a set of popular batch systems for both the job requirements and resource description attributes of those languages, which are relevant to the JSDL.


JSDL Attribute Categories

The job attribute categories include:

– Job Identity Attributes

• ID, owner, group, project, type, etc.

– Job Resource Attributes

• hardware, software, including applications, Web and Grid Services, etc.

– Job Environment Attributes

• environment variables, argument lists, etc.

– Job Data Attributes

• databases, files, data formats, and staging, replication, caching, and disk requirements, etc.

– Job Scheduling Attributes

• start and end times, duration, immediate dependencies, etc.

– Job Security Attributes

• authentication, authorisation, data encryption, etc.
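The attribute categories above can be pictured as one job description with a section per category. The sketch below is a plain Python container, not the JSDL XML schema; the class `JobDescription`, the helper `empty_categories`, and all field and key names are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class JobDescription:
    """Hypothetical container; one field per attribute category on the slide."""
    identity: dict = field(default_factory=dict)     # ID, owner, group, project, type
    resources: dict = field(default_factory=dict)    # hardware, software, services
    environment: dict = field(default_factory=dict)  # environment variables, arguments
    data: dict = field(default_factory=dict)         # files, staging, disk requirements
    scheduling: dict = field(default_factory=dict)   # start/end times, duration, dependencies
    security: dict = field(default_factory=dict)     # authentication, authorisation, encryption


def empty_categories(job: JobDescription) -> list:
    """Return the names of the categories the submitter left unspecified."""
    return [name for name, value in asdict(job).items() if not value]


# Illustrative values only.
job = JobDescription(
    identity={"owner": "alice", "project": "demo"},
    resources={"cpu_count": 48, "memory_gb": 1},
    scheduling={"duration_minutes": 60},
)
print(empty_categories(job))  # ['environment', 'data', 'security']
```

Grouping attributes by category makes it easy for a scheduler to see which parts of a request were left open and must be filled with defaults or negotiated.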


Requirements

Status monitoring:

– job and resource condition

– SLA status

Autonomic aspects:

– detection of unexpected changes

– allows prediction of system behavior

• related to an individual job

• and to general demand

– trigger of re-scheduling/re-allocation


Aspects:

reliability

scalability


Requirements

Allocation/Scheduling:

– Different application scenarios

• parallel, sequential jobs

• co-allocation and orchestration

• workflows

– Provider policies

• access, cost, security

– User/application policies

• scheduling objectives,

• cost/budget management

• deadlines

– Cooperation between RM systems

– Support for different (= individual) algorithms and strategies


Aspects:

flexibility, easy-to-use

support business models

person-centric

efficiency


Different Levels of Scheduling

Resource-level scheduler

– low-level scheduler, local scheduler, local resource manager

– scheduler close to the resource, controlling a supercomputer, cluster, or network of workstations, on the same local area network

– Examples: Open PBS, PBS Pro, LSF, SGE

Enterprise-level scheduler

– Scheduling across multiple local schedulers belonging to the same organization

– Examples: PBS Pro peer scheduling, LSF Multicluster

Grid-level scheduler

– also known as super-scheduler, broker, community scheduler

– Discovers resources that can meet a job’s requirements

– Schedules across lower level schedulers


Grid-Level Scheduler

Discovers & selects the appropriate resource(s) for a job

If selected resources are under the control of several local schedulers, a meta-scheduling action is performed

Architecture:

– Centralized: all lower level schedulers are under the control of a single Grid scheduler

• not realistic in global Grids

– Distributed: lower level schedulers are under the control of several grid scheduler components; a local scheduler may receive jobs from several components of the grid scheduler


Grid Scheduling

[Figure: a Grid user submits jobs to the Grid scheduler, which passes them to Machine 1, Machine 2, and Machine 3; each machine has its own local scheduler, job queue, and schedule over time.]


Activities of a Grid Scheduler

GGF Document: “10 Actions of Super Scheduling (GFD-I.4)”

Phase One - Resource Discovery

1. Authorization Filtering

2. Application Definition

3. Min. Requirement Filtering

Phase Two - System Selection

4. Information Gathering

5. System Selection

Phase Three - Job Execution

6. Advance Reservation

7. Job Submission

8. Preparation Tasks

9. Monitoring Progress

10. Job Completion

11. Clean-up Tasks

Source: Jennifer Schopf
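The three phases can be read as a small pipeline: filter, select, then execute. The sketch below is a toy model under stated assumptions. The `Site` record, the function names, and the use of queue length as the only gathered information are all hypothetical; the "application definition" step is represented simply by the `user` and `min_cpus` arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    # Hypothetical description of one candidate system.
    name: str
    allowed_users: frozenset
    cpus: int
    queued_jobs: int


def discover(sites, user, min_cpus):
    """Phase one: authorization filtering, then minimum-requirement filtering."""
    authorized = [s for s in sites if user in s.allowed_users]
    return [s for s in authorized if s.cpus >= min_cpus]


def select(candidates):
    """Phase two: gather information (here only queue length) and pick one system."""
    if not candidates:
        return None
    return min(candidates, key=lambda s: (s.queued_jobs, s.name))


def execute(site, job_name):
    """Phase three: list the ordered execution actions instead of running anything."""
    steps = ["advance reservation", "job submission", "preparation tasks",
             "monitoring progress", "job completion", "clean-up tasks"]
    return [f"{step}: {job_name} on {site.name}" for step in steps]
```

A real Grid scheduler would replace each function with calls to information, reservation, and execution services; the point here is only the ordering of the phases.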


Select a Resource for Execution

Most systems do not provide advance information about future job execution

– user information not accurate as mentioned before

– new jobs arrive that may surpass current queue entries due to higher priority

The Grid scheduler might consider the current queue situation; however, this does not give reliable information for future executions:

– A job may wait long in a short queue while it would have been executed earlier on another system.

Available information:

– Grid information service gives the state of the resources and possibly authorization information

– Prediction heuristics: estimate a job’s wait time for a given resource, based on the current state and the job’s requirements.
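As a rough illustration of a prediction heuristic, the sketch below estimates the wait time from the outstanding work rather than from the number of queued jobs. The formula (outstanding node-minutes divided by machine size) is an assumption chosen for simplicity, not a method from the slides, and the function names and site data are hypothetical.

```python
def estimate_wait_minutes(total_nodes, pending_jobs):
    """Crude estimate: outstanding node-minutes divided by machine size.

    pending_jobs is a list of (nodes, estimated_minutes) pairs covering
    the running and queued jobs of one site.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    node_minutes = sum(nodes * minutes for nodes, minutes in pending_jobs)
    return node_minutes / total_nodes


def pick_site(sites):
    """Choose the site with the smallest estimated wait; ties broken by name."""
    return min(sites, key=lambda name: (estimate_wait_minutes(*sites[name]), name))


# Illustrative data: site name -> (total nodes, pending jobs).
sites = {
    "short-queue": (10, [(10, 120), (10, 120)]),  # 2 jobs, both large and long
    "long-queue": (100, [(5, 30)] * 8),           # 8 jobs, all small and short
}
print(pick_site(sites))  # long-queue
```

With these numbers the two-job queue has an estimated wait of 240 minutes and the eight-job queue only 12 minutes, which matches the observation that a job may wait long in a short queue. The estimate is only as good as the user-supplied runtimes, which the slide notes are often inaccurate.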


Requirements (contd)

SLA management:

– reliability

– orchestration of services

– quality of service

– business models

– accountability


Aspects:

persistence

support business models


Co-allocation

It is often requested that several resources are used for a single job.

– that is, a scheduler has to ensure that all resources are available when needed.

• in parallel (e.g. visualization and processing)

• with time dependencies (e.g. a workflow)

The task is especially difficult if the resources belong to different administrative domains.

– The actual allocation time must be known for co-allocation

– or the different local resource management systems must synchronize with each other (wait for availability of all resources)
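When the allocation times are known, co-allocation in parallel reduces to finding a time slot in which every required resource is free. The sketch below assumes each resource publishes a list of free `(start, end)` intervals; that data model and the function name `earliest_common_start` are hypothetical.

```python
def earliest_common_start(free_windows, duration):
    """Return the earliest t such that every resource is free during [t, t + duration).

    free_windows maps a resource name to a list of (start, end) free intervals.
    Returns None if no common slot exists.
    """
    # If a common slot exists, it can be shifted left until it touches the
    # start of some free window, so window starts are the only candidates.
    candidates = sorted(
        {start for windows in free_windows.values() for start, _ in windows}
    )
    for t in candidates:
        if all(
            any(start <= t and t + duration <= end for start, end in windows)
            for windows in free_windows.values()
        ):
            return t
    return None


# Illustrative availability, in hours from now.
windows = {
    "compute": [(0, 4), (6, 12)],
    "visualization": [(3, 9)],
    "network": [(0, 20)],
}
print(earliest_common_start(windows, 2))  # 6
print(earliest_common_start(windows, 4))  # None
```

The hard part in practice is obtaining trustworthy free windows from different administrative domains, which is why the slides turn to reservations next.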


Example Multi-Site Job Execution

A job uses several resources at different sites in parallel. Network communication is an issue.

[Figure: the Grid scheduler places a multi-site job on Machine 1, Machine 2, and Machine 3 at the same time; each machine has its own local scheduler, job queue, and schedule over time.]


Advance Reservation

Co-allocation and other applications require a priori information about the precise resource availability

With the concept of advance reservation, the resource provider guarantees a specified resource allocation.

– includes a two- or three-phase commit for agreeing on the reservation

Implementations:

– GARA/DUROC/SNAP provide interfaces for Globus to create advance reservations

– implementations for network QoS are available.

• setup of a dedicated bandwidth between endpoints

– “WS-Agreement” defines a protocol for agreement management
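The two-phase commit mentioned above can be sketched as "hold everywhere first, confirm only if every hold succeeded". The `Provider` class and `reserve_all` function below are hypothetical and model time slots as plain strings; they do not reflect the GARA, DUROC, SNAP, or WS-Agreement interfaces.

```python
class Provider:
    """Hypothetical resource provider with tentative and confirmed reservations."""

    def __init__(self, name, busy_slots=()):
        self.name = name
        self.confirmed = set(busy_slots)
        self.tentative = set()

    def prepare(self, slot):
        # Phase 1: hold the slot tentatively if it is still free.
        if slot in self.confirmed or slot in self.tentative:
            return False
        self.tentative.add(slot)
        return True

    def commit(self, slot):
        # Phase 2 (success): turn the hold into a guaranteed reservation.
        self.tentative.discard(slot)
        self.confirmed.add(slot)

    def abort(self, slot):
        # Phase 2 (failure): release the tentative hold.
        self.tentative.discard(slot)


def reserve_all(providers, slot):
    """Reserve the same slot on every provider, or on none of them."""
    prepared = []
    for provider in providers:
        if provider.prepare(slot):
            prepared.append(provider)
        else:
            for held in prepared:
                held.abort(slot)
            return False
    for provider in prepared:
        provider.commit(slot)
    return True
```

For example, with a free cluster and a VR cave that is already booked at "09:00", `reserve_all` returns `False` for "09:00" and leaves no hold behind on the cluster, while "10:00" is confirmed on both.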


Using Service Level Agreements

The mapping of jobs to resources can be abstracted using the concept of Service Level Agreements (SLAs)

SLA: Contract negotiated between

– resource provider, e.g. local scheduler

– resource consumer, e.g. grid scheduler, application

SLAs provide a uniform approach for the client to

– specify resource and QoS requirements, while

– hiding from the client details about the resources,

– such as queue names and current workload


GGF/OGF – GRAAP Working Group

Goal: Defining WebService-based protocols for negotiation and agreement management

WS-Agreement Protocol:


Requirements

Execution Management:

– services, software, data/storage, compute, network


Aspects:

persistence

support business models


GGF/OGF-WG DRMAA

GGF Working Group “Distributed Resource Management Application API”

From the charter:

Develop an API specification for the submission and control of jobs to one or more Distributed Resource Management (DRM) systems.

The scope of this specification is all the high level functionality which is necessary for an application to consign a job to a DRM system including common operations on jobs like termination or suspension.

The objective is to facilitate the direct interfacing of applications to today's DRM systems by application's builders, portal builders, and Independent Software Vendors (ISVs).


Requirements

Accounting and Billing:

– providing economic/financial services

– foundation of business models


Aspects:

persistence

support business models


Scheduling in Future Generation Grids

Outlook on future Grid Resource Management and Scheduling


Limitations of current Grid RMS

The interaction between local scheduling and higher-level Grid scheduling is currently a one-way communication

– current local schedulers are not optimized for Grid-use

– limited information available about future job execution

– a site is usually selected by a Grid scheduler and the job enters the remote queue.

The decision about job placement is inefficient.

– Actual job execution is usually not known

– Co-allocation is a problem as many systems do not provide advance reservation


Example of Grid Scheduling Decision Making

[Figure: a Grid user submits a job to the Grid scheduler, which must choose among Machine 1, Machine 2, and Machine 3, each with its own local scheduler, job queue, and schedule over time. The three machines are labelled with their load: 15 jobs running, 20 jobs queued; 5 jobs running, 2 jobs queued; 40 jobs running, 80 jobs queued.]

Where to put the Grid job?


Available Information from the Local Schedulers

Decision making is difficult for the Grid scheduler

– limited information about local schedulers is available

– available information may not be reliable

Possible information:

– queue length, running jobs

– detailed information about the queued jobs

• execution length, process requirements, …

– tentative schedule about future job executions

This information is often not technically provided by the local scheduler

In addition, this information may be subject to privacy concerns!


Consequence

Consider a workflow with 3 short steps (e.g. 1 minute each) that depend on each other

Assume available machines with an average queue waiting time of 1 hour.

The Grid scheduler can only submit the subsequent step once the previous job step has finished, so each step waits in a queue separately.

Result:

– The completion time of the workflow may be larger than 3 hours (compared to 3 minutes of execution time)

– Current Grids are suitable for simple jobs, but still quite inefficient in handling more complex applications

Need for better coordination of higher- and lower-level scheduling!
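The arithmetic behind this result can be checked in a few lines. The sketch assumes that every step pays the full average queue wait of 60 minutes before it runs; the function name is hypothetical.

```python
def workflow_completion_minutes(step_minutes, queue_wait_minutes):
    """Completion time when each step is submitted only after the previous
    one has finished, so every step pays the queue wait again."""
    return sum(queue_wait_minutes + run for run in step_minutes)


steps = [1, 1, 1]  # three dependent steps of 1 minute each
total = workflow_completion_minutes(steps, queue_wait_minutes=60)
print(total)       # 183
print(sum(steps))  # 3
```

That is 183 minutes of completion time, a little over 3 hours, for 3 minutes of actual execution.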


Example Grid Scenario

[Figure: a remote center (reads and generates TB of data), compute resources, and visualization, connected by LAN/WAN and WAN transfers.]

Assume a data-intensive simulation that should be visualized and steered during runtime!


Resource Request of a Simple Grid Job

A specified architecture with

48 processing nodes,

1 GB of available memory, and

a specified licensed software package

for 1 hour between 8am and 6pm of the following day

  • Time must be known in advance.

A specific visualization device during program execution

Minimum bandwidth between the VR device and the main computer during program execution

Input: a specified data set from a data repository

at most 4 €

preference of cheaper job execution over an earlier execution.
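
One possible way to capture such a request as a data structure is sketched below. All field names are assumptions made for illustration. The architecture name, device and data set identifiers, bandwidth value, and calendar date are placeholders, since the slide does not give them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ResourceRequest:
    """Hypothetical container for the request on the slide."""
    architecture: str
    nodes: int
    memory_gb: int
    software_license: str
    duration: timedelta
    earliest_start: datetime
    latest_end: datetime
    visualization_device: str
    min_bandwidth_mbps: float
    input_dataset: str
    max_cost_eur: float
    prefer_cheaper_over_earlier: bool = True

    def admits(self, start: datetime, cost_eur: float) -> bool:
        """Check whether an offered start time and price satisfy the
        time window and the cost limit of the request."""
        end = start + self.duration
        return (self.earliest_start <= start
                and end <= self.latest_end
                and cost_eur <= self.max_cost_eur)


request = ResourceRequest(
    architecture="x86_64",              # placeholder, not from the slide
    nodes=48,
    memory_gb=1,
    software_license="licensed-package",  # placeholder name
    duration=timedelta(hours=1),
    earliest_start=datetime(2006, 7, 25, 8, 0),   # example date only
    latest_end=datetime(2006, 7, 25, 18, 0),
    visualization_device="vr-device",   # placeholder name
    min_bandwidth_mbps=100.0,           # placeholder value
    input_dataset="repository/data-set",  # placeholder name
    max_cost_eur=4.0,
)

print(request.admits(datetime(2006, 7, 25, 17, 0), cost_eur=3.5))   # True
print(request.admits(datetime(2006, 7, 25, 17, 30), cost_eur=3.5))  # False: ends after 6pm
```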

Example: Coordinated Simulation and Visualization

Expected output of a Grid scheduler:

[Figure: schedule of activities per resource over time. Network 1: Data Transfer. Computer 1: Loading Data, Parallel Computation, Providing Data. Computer 2: Parallel Computation. Network 3: Communication for Computation. VR-Cave: Visualization. Data: Data Access, Storing Data. Network 2: Communication for Visualization. Software License: Software Usage. Storage: Data Storage.]

Reservations are necessary!
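
A minimal sketch of why reservations matter for co-allocation: given the free time slots of each resource, find the earliest start at which all of them are free for the required duration. This simplifies the scenario by assuming that every resource is needed for the same window; the resource names, slot values, and function name are illustrative assumptions.

```python
def earliest_common_start(free_slots, duration):
    """free_slots maps a resource name to a list of (start, end) free intervals.
    Returns the earliest start time at which every resource is free for
    `duration`, or None if no such time exists."""
    # The earliest feasible start always coincides with the start of some
    # free interval, so only those time points need to be checked.
    candidates = sorted({start for slots in free_slots.values() for start, _ in slots})
    for t in candidates:
        if all(any(s <= t and t + duration <= e for s, e in slots)
               for slots in free_slots.values()):
            return t
    return None


# Times in minutes from now (illustrative values).
free = {
    "Computer 1": [(0, 30), (60, 200)],
    "Network 1": [(20, 100), (120, 300)],
    "VR-Cave": [(90, 180)],
}
print(earliest_common_start(free, duration=60))  # 120
print(earliest_common_start(free, duration=90))  # None
```

Without reservations, none of the local systems would guarantee that the slot found here is still free when the job actually starts.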

Conclusions for Grid Scheduling

Grids ultimately require coordinated scheduling services.

Support for different scheduling instances

– different local management systems

– different scheduling algorithms/strategies

For arbitrary resources

– not only computing resources, but also data, storage, network, software etc.

Support for co-allocation and reservation

– necessary for coordinated grid usage (see data, network, software, storage)

Different scheduling objectives

– cost, quality, other

Grid-Level Scheduler

Discovers & selects the appropriate resource(s) for a job

If selected resources are under the control of several local schedulers, a meta-scheduling action is performed

Architecture:

– Centralized: all lower-level schedulers are under the control of a single Grid scheduler

  • not realistic in global Grids

– Distributed: lower-level schedulers are under the control of several Grid scheduler components; a local scheduler may receive jobs from several components of the Grid scheduler

Grid Scheduling Scenarios – Example I

Grid Scheduling Scenarios – Example II

Grid Scheduling Scenarios – Example III

Towards Grid Scheduling

Grid Scheduling Methods:

– Support for individual scheduling objectives and policies

– Multi-criteria scheduling models

– Economic scheduling methods for Grids

Architectural requirements:

– Generic job description

– Negotiation interface between higher- and lower-level scheduler

– Economic management services

– Workflow management

– Integration of data and network management

Scheduling Objectives in the Grid

In contrast to local computing, there is no general scheduling objective anymore

– minimizing response time, minimizing cost

– tradeoff between quality, cost, response-time etc.

Cost and different service quality come into play

– the user will introduce individual objectives

– the Grid can be seen as a market where resources are competing alternatives

Similarly, the resource provider has individual scheduling policies

Problem:

– the different policies and objectives must be integrated in the scheduling process

– different objectives require different scheduling strategies

– part of the policies may not be suitable for public exposure (e.g. different pricing or quality for certain user groups)
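
How individual user objectives lead to different scheduling decisions can be illustrated with a simple weighted utility function. The weighted sum, the offer values, and all names below are assumptions for illustration; the slides do not prescribe a particular objective function.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    provider: str          # hypothetical site name
    cost_eur: float
    response_hours: float


def select_offer(offers, cost_weight, time_weight):
    """Pick the offer with the lowest weighted sum of cost and response time.
    The weights express one user's individual objective."""
    return min(offers,
               key=lambda o: cost_weight * o.cost_eur + time_weight * o.response_hours)


offers = [
    Offer("site-a", cost_eur=4.0, response_hours=1.0),
    Offer("site-b", cost_eur=2.0, response_hours=6.0),
    Offer("site-c", cost_eur=3.0, response_hours=3.0),
]

# A cost-sensitive user and a time-sensitive user choose different sites.
print(select_offer(offers, cost_weight=1.0, time_weight=0.1).provider)  # site-b
print(select_offer(offers, cost_weight=0.1, time_weight=1.0).provider)  # site-a
```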

Grid Scheduling Algorithms

Due to the mentioned requirements in Grids, it is not to be expected that a single scheduling algorithm or strategy is suitable for all problems.

Therefore, there is need for an infrastructure in which

– different scheduling algorithms can be integrated

– the individual objectives and policies can be included

– resource control stays at the participating service providers

Transition into a market-oriented Grid scheduling model

Economic Scheduling

Market-oriented approaches are a suitable way to implement the interaction of different scheduling layers

– agents in the Grid market can implement different policies and strategies

– negotiations and agreements link the different strategies together

– participating sites stay autonomous

Need for suitable scheduling algorithms and strategies for creating and selecting offers

– need for creating the Pareto-optimal scheduling solutions

Performance relies highly on the available information

– negotiation can be a hard task if many potential providers are available.
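
As a small illustration of Pareto-optimal offers, the sketch below filters a list of offers on two criteria, cost and response time, where lower is better for both. The two-criteria model, the tuple layout, and the values are assumptions chosen for the example.

```python
def pareto_front(offers):
    """Return the offers that are not dominated by any other offer.
    Each offer is a (cost, response_time) tuple; lower is better for both."""
    def dominates(a, b):
        # a dominates b if it is no worse in both criteria
        # and strictly better in at least one.
        return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

    return [o for o in offers if not any(dominates(other, o) for other in offers)]


offers = [(4.0, 1.0), (2.0, 6.0), (3.0, 3.0), (3.5, 4.0), (4.0, 2.0)]
print(pareto_front(offers))  # [(4.0, 1.0), (2.0, 6.0), (3.0, 3.0)]
```

Here (3.5, 4.0) is dominated by (3.0, 3.0) and (4.0, 2.0) by (4.0, 1.0); only the remaining offers need to be considered during negotiation, whatever the user's weighting of cost against time.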

Economic Scheduling (2)

Several possibilities for market models:

– auctions of resources/services

– auctions of jobs

Offer-request mechanisms support:

– inclusion of different cost models, price determination

– individual objective/utility functions for optimization goals

Market-oriented algorithms are considered:

– robust

– flexible in case of errors

– simple to adapt

Markets can have unforeseeable dynamics
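
An auction of a job could look like the following sketch: providers submit asking prices and the cheapest bid within the user's budget wins. The sealed-bid, lowest-price-wins rule is an assumption chosen for simplicity, since the slides do not specify an auction format, and the site names and prices are made up.

```python
def award_job(bids, max_price):
    """bids maps a provider name to its asking price for running the job.
    Returns (provider, price) for the cheapest acceptable bid, or None
    if no bid is within the budget."""
    acceptable = {provider: price for provider, price in bids.items()
                  if price <= max_price}
    if not acceptable:
        return None
    winner = min(acceptable, key=acceptable.get)
    return winner, acceptable[winner]


bids = {"site-a": 4.5, "site-b": 3.2, "site-c": 3.8}
print(award_job(bids, max_price=4.0))  # ('site-b', 3.2)
print(award_job(bids, max_price=3.0))  # None
```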

Conclusions

Key Challenges for FGG RMS

– Cooperation

  • interoperability between Grid-RMS implementations and types

  • and between Grid-RMS and local RM systems

– Interoperability through well-defined interfaces

  • identification and adaptation

– Scalability

  • domain-specific implementation may have limited scalability,

  • but the general architecture should cover millions of resources.

– Fault-tolerance

  • resources and instances of core services

– Common security model

The RMS should be invisible to the user and provide a pervasive common architecture allowing different implementations while maintaining interoperability.