
WP9 Resource Management

Current status and plans for future

Juliusz Pukacki

Krzysztof Kurowski

pukacki@man.poznan.pl

kikas@man.poznan.pl

Poznan Supercomputing And Networking Center


Introduction

Final goal of WP9 – GRMS: GridLab Resource Management System

First prototype implementation – Scenario Broker


Scenario Broker functionality

Ability to choose "the best" resource for job execution, according to the Job Description and the chosen mapping algorithm

Ability to submit a Simple Job according to the provided Job Description

Ability to migrate a Simple Job to a better resource, according to the provided Job Description

Ability to cancel a job

Provides information about job status

Provides other information about a job (name of the host where the job is/was running, start time, finish time)

Provides a list of candidate resources for job execution (according to the provided Job Description)

Provides a list of jobs submitted by a given user

Ability to transfer input and output files (GridFTP, GASS)


Scenario Broker - overview

[Figure: Client (Application, Portal); Web Services Interface; Scenario Broker: Broker, Resource Discovery, Job Manager; Globus Infrastructure: Information System, GRAM, GridFTP, GASS]


Scenario Broker modules

Broker: steering the process of job submission

choosing the best resources for job execution (scheduling algorithm)

transferring input and output files for the job's executable

Resource Discovery: finding resources that fulfil the requirements described in the Job Description

providing the resource information required for job scheduling


Scenario Broker modules (2)

Job Manager: ability to check the current status of a job

Ability to cancel a running job

Monitoring status changes of a running job
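The three Job Manager functions (status check, cancellation, monitoring of status changes) can be pictured as a small state machine with listeners. The sketch below is only an illustration under stated assumptions: the states are loosely modelled on GRAM job states with a CANCELLED state added for clarity, and the names `JobStatusSketch`, `JobState` and `TrackedJob` are hypothetical, not the Scenario Broker's interface.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;

/** Hypothetical sketch of job status tracking; not the Scenario Broker's real API. */
class JobStatusSketch {

    /** Assumed job states, loosely modelled on GRAM (CANCELLED added for illustration). */
    enum JobState {
        PENDING, ACTIVE, DONE, FAILED, CANCELLED;

        /** Terminal states accept no further transitions. */
        boolean isTerminal() {
            return this == DONE || this == FAILED || this == CANCELLED;
        }

        /** Transitions allowed by this assumed state machine. */
        boolean canMoveTo(JobState next) {
            Set<JobState> allowed = switch (this) {
                case PENDING -> EnumSet.of(ACTIVE, FAILED, CANCELLED);
                case ACTIVE -> EnumSet.of(DONE, FAILED, CANCELLED);
                default -> EnumSet.noneOf(JobState.class);
            };
            return allowed.contains(next);
        }
    }

    /** A job whose status changes are pushed to registered listeners (monitoring). */
    static final class TrackedJob {
        private final String jobId;
        private JobState state = JobState.PENDING;
        private final List<BiConsumer<String, JobState>> listeners = new ArrayList<>();

        TrackedJob(String jobId) { this.jobId = jobId; }

        /** Check the current status of the job. */
        JobState status() { return state; }

        /** Register a monitor that is called on every status change. */
        void onStatusChange(BiConsumer<String, JobState> listener) { listeners.add(listener); }

        /** Apply a status change reported by the underlying system, rejecting illegal ones. */
        void update(JobState next) {
            if (!state.canMoveTo(next)) {
                throw new IllegalStateException(state + " -> " + next + " is not allowed");
            }
            state = next;
            listeners.forEach(l -> l.accept(jobId, next)); // notify all monitors
        }

        /** Cancel the job; returns false if it has already finished. */
        boolean cancel() {
            if (state.isTerminal()) return false;
            update(JobState.CANCELLED);
            return true;
        }
    }
}
```

Pushing changes to listeners, rather than having clients poll `status()`, is one way to provide the "monitoring" function; polling remains available for a simple status check.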


Job Description

Job executable file location

arguments

file arguments (files that have to be present in the working directory of the running executable)

environment variables

standard input

standard output

standard error

checkpoint files


Job Description (2)

Resource requirements of the executable: name of the host for job execution (if provided, no scheduling algorithm is used)

operating system

required local resource management system

minimum memory required

minimum number of cpus required

minimum speed of cpu

other parameters passed directly to Globus GRAM
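The requirement fields above suggest a simple matchmaking step: an explicitly named host bypasses scheduling, otherwise only resources that satisfy every stated minimum are kept as candidates. This is a minimal sketch assuming a hypothetical `Resource` record whose fields mirror the listed requirements; the record, field and host names are illustrative and not taken from GRMS.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical matchmaking sketch; record and field names are illustrative only. */
class RequirementMatcher {

    /** What the information system is assumed to report about one machine. */
    record Resource(String host, String osName, String lrms,
                    int memoryMb, int cpus, int cpuSpeedMhz) {}

    /** Requirements from the Job Description; null or 0 means "not specified". */
    record Requirements(String hostName, String osName, String lrms,
                        int minMemoryMb, int minCpus, int minCpuSpeedMhz) {}

    /** True if the resource meets every specified requirement. */
    static boolean satisfies(Resource r, Requirements req) {
        return (req.osName() == null || req.osName().equalsIgnoreCase(r.osName()))
            && (req.lrms() == null || req.lrms().equalsIgnoreCase(r.lrms()))
            && r.memoryMb() >= req.minMemoryMb()
            && r.cpus() >= req.minCpus()
            && r.cpuSpeedMhz() >= req.minCpuSpeedMhz();
    }

    /** Candidate resources: the named host if one is given, else all matching resources. */
    static List<Resource> candidates(List<Resource> all, Requirements req) {
        if (req.hostName() != null) {
            // Assumption: an explicitly named host is used as-is, without scheduling.
            return all.stream()
                .filter(r -> r.host().equals(req.hostName()))
                .findFirst()
                .map(r -> List.of(r))
                .orElse(List.of());
        }
        return all.stream().filter(r -> satisfies(r, req)).collect(Collectors.toList());
    }
}
```

A scheduling algorithm would then choose among the returned candidates; this sketch only performs the filtering step.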


Job Description - example

<grmsjob appid="MyApplication">
  <simplejob>
    <resource>
      <osname>Linux</osname>
      <memory>128</memory>
      <cpucount>2</cpucount>
    </resource>
    <application>
      <executable>
        <url>gsiftp://rage.man.poznan.pl/~/Apps/MyApp</url>
      </executable>
      <arguments>
        <value>12</value>
        <value>abc</value>
      </arguments>
      <stdin>
        <url>gsiftp://rage.man.poznan.pl/~/Apps/appstdin.txt</url>
      </stdin>
      <stdout>
        <url>gsiftp://rage.man.poznan.pl/~/Apps/appstdout.txt</url>
      </stdout>
    </application>
  </simplejob>
</grmsjob>
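A client or the broker has to turn such a description into structured data before matchmaking. The sketch below reads the example's element names with the standard JAXP DOM parser; the `JobDescriptionReader` class and `SimpleJob` record are hypothetical, only a subset of the fields is extracted, and the memory unit is left uninterpreted because the example does not state it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for the job description example; element names follow the example. */
class JobDescriptionReader {

    record SimpleJob(String appId, String osName, int memory, int cpuCount,
                     String executableUrl, List<String> arguments) {}

    /** Trimmed text of the first element with the given tag under parent, or null. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    static SimpleJob parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();                       // <grmsjob>
        Element resource = (Element) root.getElementsByTagName("resource").item(0);
        Element executable = (Element) root.getElementsByTagName("executable").item(0);

        // Collect <value> children of <arguments> in document order.
        List<String> args = new ArrayList<>();
        Element arguments = (Element) root.getElementsByTagName("arguments").item(0);
        if (arguments != null) {
            NodeList values = arguments.getElementsByTagName("value");
            for (int i = 0; i < values.getLength(); i++) {
                args.add(values.item(i).getTextContent().trim());
            }
        }
        return new SimpleJob(root.getAttribute("appid"),
            text(resource, "osname"),
            Integer.parseInt(text(resource, "memory")),
            Integer.parseInt(text(resource, "cpucount")),
            text(executable, "url"),
            args);
    }
}
```

Values are trimmed, so padding such as `<osname> Linux </osname>` is tolerated.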


Collaboration

[Figure: Scenario Broker and its collaboration partners: Adaptive Components, Data Management, Information Services, Portals, Monitoring, Security]


Collaboration - working

Data Management (WP8) – broker can use Replication System and Data Transfer System

Adaptive Components (WP7) – broker gets additional parameters for job scheduling

Information Services (WP10) – broker uses Information System to get information about resources


Collaboration - started

Security (WP6) – work on scenarios of cooperation with the Authorization Service

Portals (WP4) – interface discussions


Implementation

Programming language: Java

Interface: GSI-enabled web service based on the Axis toolkit.

System: components implemented in CORBA technology.

Lower-level requirements: Globus 2.0 installed on managed machines

GridFTP

Resources registered in the Information System (MDS)


RM strategies behind the GRMS

We have to ‘somehow’ take into account application requirements, specific characteristics and, finally, end users' preferences during initial application scheduling (the submission phase),

We assume that application requirements may change during the execution phase, and we have to react ‘somehow’ to such changes in application behaviour,

We should also ‘somehow’ consider the preferences and objectives of administrators and resource owners, as well as time-reservation approaches (the research part of WP9),

We want to provide GRMS interfaces that let application developers as well as end users focus on high-level application design without sacrificing application performance. To achieve this goal:


Observations

System-level schedulers focus on throughput

Application-level schedulers are not easily applied to new applications

Many end users, applications and administrative domains have to be considered at the same time

We need a balance between specific and generic approaches to scheduling

GAT + GRMS = a bridge between application-level and system-level scheduling

Two phases: submission and execution


Submission phase (1)

Application requirements – an XML-based resource specification language as a flexible way to express specific application needs and requirements, including:

Hard constraints, e.g. OS = Linux, Mem > 512 MB, 4 CPUs, etc.

Performance models e.g. in the form of AART model,

Analytical, test and empirical models e.g. ET = 2.5(x * y) + CPU,

End users' preferences, e.g. time-based (“application response time is very important for me”) or cost-based (“I have a lot of free time (I am on vacation :-), so cost is very important for me”),
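One simple way, among many, to turn a time-versus-cost preference into a ranking of candidate resources is a weighted sum over normalized estimates. The sketch assumes hypothetical per-resource estimates of response time and cost and a single user weight `timeWeight` in [0, 1]; it illustrates the idea and is not the GRMS scoring function.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical preference-based ranking; the estimates and the weight are assumed inputs. */
class PreferenceRanking {

    record Offer(String host, double estResponseTimeS, double estCost) {}

    /**
     * Ranks offers by a weighted sum of min-max normalized time and cost (lower is better).
     * timeWeight = 1.0 means "response time is very important", 0.0 means "cost is very important".
     */
    static List<Offer> rank(List<Offer> offers, double timeWeight) {
        if (timeWeight < 0 || timeWeight > 1) {
            throw new IllegalArgumentException("timeWeight must be in [0, 1]");
        }
        double tMin = offers.stream().mapToDouble(Offer::estResponseTimeS).min().orElse(0);
        double tMax = offers.stream().mapToDouble(Offer::estResponseTimeS).max().orElse(0);
        double cMin = offers.stream().mapToDouble(Offer::estCost).min().orElse(0);
        double cMax = offers.stream().mapToDouble(Offer::estCost).max().orElse(0);
        Comparator<Offer> byScore = Comparator.comparingDouble(o ->
            timeWeight * normalize(o.estResponseTimeS(), tMin, tMax)
            + (1 - timeWeight) * normalize(o.estCost(), cMin, cMax));
        return offers.stream().sorted(byScore).toList();
    }

    /** Maps v into [0, 1]; if all values are equal they map to 0 and do not affect the score. */
    private static double normalize(double v, double min, double max) {
        return max == min ? 0.0 : (v - min) / (max - min);
    }
}
```

Normalizing first keeps seconds and cost units from dominating each other; a multi-objective scheduler could replace the weighted sum with, for example, a Pareto-based selection.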


Submission phase (2)

Matchmaking techniques – to select the set of resources that meet an application's requirements (in general, many applications are submitted at the same time and many resources are available).

The scheduling problem is NP-hard: the number of possible solutions (schedules) grows exponentially with the size of the problem instance. The schedule that is best with respect to a given criterion, e.g. execution time, is selected (in this case, application execution times are assumed to be known a priori to the scheduler):

Criteria: Cmax (makespan), average Cmax, tardiness, etc.

Scheduling algorithms: complete enumeration or heuristics,

(research) Time reservation (the task workload is assumed to be known a priori to the scheduler). General assumption: an appropriate system, e.g. Maui or LSF, must be installed on the resources.

(research) Multi-objective resource management (many parameters are assumed to be known a priori to the multi-objective scheduler)
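To make the Cmax criterion and the two algorithm families concrete, the sketch below assigns independent jobs with a priori known execution times to identical resources. Complete enumeration tries all m^n assignments, which is the exponential growth mentioned above. The heuristic is the classic Longest Processing Time (LPT) greedy rule, chosen here purely for illustration and not stated to be what GRMS uses. The class name and the job times are hypothetical.

```java
import java.util.Arrays;

/** Illustrative Cmax (makespan) scheduling on identical machines; not GRMS code. */
class MakespanSketch {

    /** Complete enumeration: tries all machines^jobs assignments and returns the best Cmax. */
    static int optimalCmax(int[] times, int machines) {
        return enumerate(times, 0, new int[machines]);
    }

    private static int enumerate(int[] times, int job, int[] load) {
        if (job == times.length) {
            return Arrays.stream(load).max().orElse(0);   // Cmax of this full assignment
        }
        int best = Integer.MAX_VALUE;
        for (int m = 0; m < load.length; m++) {
            load[m] += times[job];                          // try the job on machine m
            best = Math.min(best, enumerate(times, job + 1, load));
            load[m] -= times[job];                          // undo before the next machine
        }
        return best;
    }

    /** LPT heuristic: longest jobs first, each to the currently least-loaded machine. */
    static int lptCmax(int[] times, int machines) {
        int[] sorted = times.clone();
        Arrays.sort(sorted);                                // ascending; iterate from the end
        int[] load = new int[machines];
        for (int i = sorted.length - 1; i >= 0; i--) {
            int least = 0;
            for (int m = 1; m < machines; m++) {
                if (load[m] < load[least]) least = m;
            }
            load[least] += sorted[i];
        }
        return Arrays.stream(load).max().orElse(0);
    }
}
```

For jobs {3, 3, 2, 2, 2} on two machines, enumeration finds Cmax = 6 ({3, 3} and {2, 2, 2}), while LPT yields 7. The heuristic is fast but not optimal, and enumeration quickly becomes infeasible as the instance grows.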


Execution phase

Even an optimal schedule (e.g. a schedule with the shortest execution time) may need to be modified, because both resources and application requirements change dynamically,

Rescheduling and adaptive techniques are desirable:

Zakopane migration scenario – the first step on a painful road (e.g. how to estimate the cost of migrating processes?),

WP7 Adaptive Components - adaptive strategies that let applications efficiently use the given resources,

WP8 Data Management - due to input/output file locality requirements (typical for data-intensive applications), it is often undesirable to transfer or replicate all databases, files, etc. to all compute resources in a distributed grid environment,

WP4 Portals – to visualize the GRMS’s functionality

WP6 Security, WP10, ...
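The migration-cost question raised above can at least be framed as a break-even test. The sketch assumes, hypothetically, that the remaining execution time on each resource and the components of the migration cost (checkpoint, transfer of checkpoint files, restart) can be estimated; obtaining those estimates is exactly the open problem, so every input here is an assumption.

```java
/** Hypothetical break-even test for migrating a running job; all estimates are assumed inputs. */
class MigrationDecision {

    record Estimate(double remainingOnCurrentS, double remainingOnTargetS,
                    double checkpointS, double transferS, double restartS) {}

    /** Total overhead of moving the job: checkpoint, file transfer, restart. */
    static double migrationCost(Estimate e) {
        return e.checkpointS() + e.transferS() + e.restartS();
    }

    /**
     * Migrate only if the job would finish earlier on the target even after paying the
     * migration overhead, with a safety margin (e.g. 0.1 = 10%) against estimation errors.
     */
    static boolean shouldMigrate(Estimate e, double margin) {
        double stay = e.remainingOnCurrentS();
        double move = migrationCost(e) + e.remainingOnTargetS();
        return move * (1 + margin) < stay;
    }
}
```

The margin makes the rule conservative: when the estimates are uncertain, the job stays where it is unless the expected gain is clear.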


Plans

Work on integration with other GridLab services: Authorization Service

Work on the client side: Portals

GAT API

Testing with GAT-enabled applications in the GridLab testbed.

Extensions to the Scenario Broker in the context of resource management issues.