WP9 Resource Management
Current status and plans for the future
Juliusz Pukacki
Krzysztof Kurowski
pukacki@man.poznan.pl kikas@man.poznan.pl
Poznan Supercomputing and Networking Center
Introduction
Final goal of WP9 – GRMS: GridLab Resource Management System
First prototype implementation – Scenario Broker
Scenario Broker functionality
Ability to choose "the best" resource for job execution, according to the Job Description and the chosen mapping algorithm
Ability to submit a Simple Job according to the provided Job Description
Ability to migrate a Simple Job to a better resource, according to the provided Job Description
Ability to cancel a job
Provides information about job status
Provides other information about a job (name of the host where the job is/was running, start time, finish time)
Provides a list of candidate resources for job execution (according to the provided Job Description)
Provides a list of jobs submitted by a given user
Ability to transfer input and output files (GridFTP, GASS)
Scenario Broker - overview
Figure: architecture diagram with three groups of components.
Client: Application, Portal
Scenario Broker: Web Services Interface, Resource Discovery, Broker, Job Manager
Globus Infrastructure: Information System, GRAM, GridFTP, GASS
Scenario Broker modules
Broker:
steering the process of job submission
choosing the best resources for job execution (scheduling algorithm)
transferring input and output files for the job's executable
Resource Discovery:
finding resources that fulfil the requirements described in the Job Description
providing the information about resources that is required for job scheduling
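The split between the two modules can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Scenario Broker code: the `Resource` record, its field names, the sample hosts, and the ranking rule (most CPUs, then most memory) are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource record; field names are assumptions."""
    host: str
    os_name: str
    memory_mb: int
    cpu_count: int


def discover(resources, os_name, min_memory_mb, min_cpus):
    """Resource Discovery step: keep resources that fulfil the requirements."""
    return [
        r for r in resources
        if r.os_name == os_name
        and r.memory_mb >= min_memory_mb
        and r.cpu_count >= min_cpus
    ]


def choose_best(candidates):
    """Broker step: placeholder scheduling rule (most CPUs, then most memory)."""
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.cpu_count, r.memory_mb))


# Invented sample data for the sketch.
RESOURCES = [
    Resource("hostA.example.org", "Linux", 256, 2),
    Resource("hostB.example.org", "Linux", 512, 4),
    Resource("hostC.example.org", "IRIX", 1024, 8),
    Resource("hostD.example.org", "Linux", 64, 1),
]

candidates = discover(RESOURCES, os_name="Linux", min_memory_mb=128, min_cpus=2)
best = choose_best(candidates)
print([r.host for r in candidates], best.host)
```

Keeping the filter and the ranking in separate functions mirrors the module split: the scheduling rule can be swapped without touching discovery.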
Scenario Broker modules (2)
Job Manager:
ability to check the current status of a job
ability to cancel a running job
monitoring for status changes of a running job
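The three Job Manager duties listed above can be illustrated with a small in-memory sketch. Everything here is an assumption made for the example: the class and method names, the set of status values, and the callback mechanism are hypothetical and are not taken from the Scenario Broker.

```python
from enum import Enum


class JobStatus(Enum):
    """Hypothetical status values for the sketch."""
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    CANCELLED = "cancelled"


class JobManager:
    def __init__(self):
        self._status = {}
        self._listeners = []

    def add_listener(self, callback):
        """Register a callback invoked as callback(job_id, old, new)."""
        self._listeners.append(callback)

    def register(self, job_id):
        self._status[job_id] = JobStatus.PENDING

    def status(self, job_id):
        """Check the current status of a job."""
        return self._status[job_id]

    def update(self, job_id, new_status):
        """Record a status and notify listeners only if it actually changed."""
        old = self._status[job_id]
        if new_status is not old:
            self._status[job_id] = new_status
            for callback in self._listeners:
                callback(job_id, old, new_status)

    def cancel(self, job_id):
        """Cancel a job that has not finished; return True if it was cancelled."""
        if self._status[job_id] in (JobStatus.PENDING, JobStatus.ACTIVE):
            self.update(job_id, JobStatus.CANCELLED)
            return True
        return False


events = []
manager = JobManager()
manager.add_listener(lambda job_id, old, new: events.append((job_id, old.name, new.name)))
manager.register("job-1")
manager.update("job-1", JobStatus.ACTIVE)
manager.update("job-1", JobStatus.ACTIVE)  # same status again: no event is emitted
cancelled = manager.cancel("job-1")
print(events, cancelled)
```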
Job Description
Job executable:
file location
arguments
file arguments (files that have to be present in the working directory of the running executable)
environment variables
standard input
standard output
standard error
checkpoint files
Job Description (2)
Resource requirements of the executable:
name of the host for job execution (if provided, no scheduling algorithm is used)
operating system
required local resource management system
minimum memory required
minimum number of CPUs required
minimum CPU speed
other parameters passed directly to Globus GRAM
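As a rough illustration, the fields listed on the two Job Description slides could be held in a single record like the one below. The class name, attribute names, and units (MB, MHz) are assumptions for the sketch; the only rule taken from the slides is that naming a host bypasses the scheduling algorithm.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JobDescription:
    """Hypothetical container for the Job Description fields."""
    # Job executable
    executable_url: str
    arguments: list = field(default_factory=list)
    file_arguments: list = field(default_factory=list)
    environment: dict = field(default_factory=dict)
    stdin_url: Optional[str] = None
    stdout_url: Optional[str] = None
    stderr_url: Optional[str] = None
    checkpoint_files: list = field(default_factory=list)
    # Resource requirements (units are assumptions)
    host: Optional[str] = None
    os_name: Optional[str] = None
    local_rms: Optional[str] = None
    min_memory_mb: Optional[int] = None
    min_cpu_count: Optional[int] = None
    min_cpu_speed_mhz: Optional[int] = None
    gram_parameters: dict = field(default_factory=dict)

    def needs_scheduling(self):
        """If a host is named explicitly, no scheduling algorithm is used."""
        return self.host is None


free = JobDescription(
    "gsiftp://rage.man.poznan.pl/~/Apps/MyApp",
    arguments=["12", "abc"],
    os_name="Linux",
    min_memory_mb=128,
    min_cpu_count=2,
)
pinned = JobDescription(
    "gsiftp://rage.man.poznan.pl/~/Apps/MyApp",
    host="rage.man.poznan.pl",
)
print(free.needs_scheduling(), pinned.needs_scheduling())
```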
Job Description - example
<grmsjob appid="MyApplication">
  <simplejob>
    <resource>
      <osname>Linux</osname> <memory>128</memory> <cpucount>2</cpucount>
    </resource>
    <application>
      <executable><url>gsiftp://rage.man.poznan.pl/~/Apps/MyApp</url></executable>
      <arguments><value>12</value> <value>abc</value></arguments>
      <stdin><url>gsiftp://rage.man.poznan.pl/~/Apps/appstdin.txt</url></stdin>
      <stdout><url>gsiftp://rage.man.poznan.pl/~/Apps/appstdout.txt</url></stdout>
    </application>
  </simplejob>
</grmsjob>
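To show how a client might read such a description, here is a minimal sketch that parses a well-formed copy of the example with Python's standard `xml.etree.ElementTree`. The element names come from the slide; the `parse_job` function and the dictionary layout it returns are assumptions made for the illustration, and the real GRMS parser is not shown in the source.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the slide's example (attribute value quoted).
JOB_XML = """
<grmsjob appid="MyApplication">
  <simplejob>
    <resource>
      <osname> Linux </osname>
      <memory> 128 </memory>
      <cpucount> 2 </cpucount>
    </resource>
    <application>
      <executable><url> gsiftp://rage.man.poznan.pl/~/Apps/MyApp </url></executable>
      <arguments><value> 12 </value><value> abc </value></arguments>
      <stdin><url>gsiftp://rage.man.poznan.pl/~/Apps/appstdin.txt </url></stdin>
      <stdout><url>gsiftp://rage.man.poznan.pl/~/Apps/appstdout.txt </url></stdout>
    </application>
  </simplejob>
</grmsjob>
"""


def text_of(node, path):
    """Return the whitespace-trimmed text of the element at `path`."""
    return node.findtext(path).strip()


def parse_job(xml_text):
    """Hypothetical helper: turn a simple job description into a dict."""
    root = ET.fromstring(xml_text)
    simple = root.find("simplejob")
    resource = simple.find("resource")
    app = simple.find("application")
    return {
        "appid": root.get("appid"),
        "osname": text_of(resource, "osname"),
        "memory": int(text_of(resource, "memory")),
        "cpucount": int(text_of(resource, "cpucount")),
        "executable": text_of(app, "executable/url"),
        "arguments": [v.text.strip() for v in app.findall("arguments/value")],
        "stdin": text_of(app, "stdin/url"),
        "stdout": text_of(app, "stdout/url"),
    }


job = parse_job(JOB_XML)
print(job["appid"], job["osname"], job["arguments"])
```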
Collaboration
Figure: diagram linking the Scenario Broker with Adaptive Components, Data Management, Information Services, Portals, Monitoring, and Security.
Collaboration - working
Data Management (WP8) – broker can use Replication System and Data Transfer System
Adaptive Components (WP7) – broker gets additional parameters for job scheduling
Information Services (WP10) – broker uses Information System to get information about resources
Collaboration - started
Security (WP6) – work on scenarios of cooperation with the Authorization Service
Portals (WP4) – discussion of interfaces
Implementation
Programming language: Java
Interface: GSI enabled web service based on Axis toolkit.
System: components implemented in CORBA technology.
Lower-level requirements:
Globus 2.0 installed on managed machines
GridFTP
Resources registered in the Information System (MDS)
RM strategies behind the GRMS
We want to provide GRMS interfaces that let application developers and end users focus on high-level application design without sacrificing application performance. To achieve this goal:
We have to ‘somehow’ take into account application requirements, specific characteristics, and end-user preferences during initial application scheduling (submission phase).
We assume that application requirements can change during the execution phase, and we have to react ‘somehow’ to such application behaviour.
We should also ‘somehow’ consider the preferences and objectives of administrators and resource owners, as well as time-reservation approaches (research part of WP9).
Observations
System-level schedules focus on throughput
Application-level schedules are not easily applied to new applications
Many end-users, applications and admin domains are considered at the same time
We need a balance between specific and generic approaches to scheduling
GAT + GRMS = a bridge between application-level and system-level scheduling
Two phases: submission and execution
Submission phase (1)
Application requirements – an XML-based resource specification language as a flexible way to express specific application needs and requirements, including:
Hard constraints, e.g. OS = Linux, Mem > 512 MB, 4 CPUs, etc.
Performance models, e.g. in the form of an AART model,
Analytical, test and empirical models, e.g. ET = 2.5(x * y) + CPU,
End-user preferences, e.g. time-based (“application response time is very important for me” or “I have a lot of free time (I am on vacation :-)), cost is very important for me”).
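One way to picture how an execution-time model and an end-user preference might combine is a weighted ranking. The sketch below is purely illustrative: the model form (time proportional to problem size divided by a speed factor), the pricing field, the normalisation, and the `time_weight` parameter are all assumptions. The coefficient 2.5 merely echoes the example formula above, whose variables the slides do not define.

```python
def estimated_time(problem_size, resource):
    """Hypothetical analytical model: seconds needed on this resource."""
    return 2.5 * problem_size / resource["speed"]


def rank(resources, problem_size, time_weight):
    """Order resources by a weighted sum of normalised time and cost.

    time_weight close to 1.0 means "response time is very important";
    time_weight close to 0.0 means "cost is very important".
    """
    times = {r["host"]: estimated_time(problem_size, r) for r in resources}
    costs = {r["host"]: times[r["host"]] * r["price_per_second"] for r in resources}
    max_time = max(times.values())
    max_cost = max(costs.values())

    def score(resource):
        host = resource["host"]
        return (time_weight * times[host] / max_time
                + (1 - time_weight) * costs[host] / max_cost)

    return sorted(resources, key=score)  # lowest score first


# Invented sample resources.
RESOURCES = [
    {"host": "fast.example.org", "speed": 4.0, "price_per_second": 8.0},
    {"host": "slow.example.org", "speed": 1.0, "price_per_second": 1.0},
]

in_a_hurry = rank(RESOURCES, problem_size=100, time_weight=0.9)
on_vacation = rank(RESOURCES, problem_size=100, time_weight=0.1)
print(in_a_hurry[0]["host"], on_vacation[0]["host"])
```

With these invented numbers the fast, expensive host wins when time dominates, and the slow, cheap host wins when cost dominates.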
Submission phase (2)
Matchmaking techniques – to select a set of resources that meet application requirements (in general, many applications are submitted at the same time and many resources are available).
The scheduling problem is NP-hard – the number of possible solutions (schedules) grows exponentially with the problem instance size. The schedule with the best value of a chosen criterion, e.g. execution time, is selected (in this case application execution times are assumed to be known a priori to the scheduler):
Criteria: Cmax, Avg Cmax, Tardiness, etc.
Scheduling algorithms: complete enumeration or heuristic,
(research) Time reservation (task workload is assumed to be known a priori to the scheduler). General assumption: an appropriate system, e.g. Maui or LSF, must be installed on the resources.
(research) Multi-objective resource management (many parameters are assumed to be known a priori to the multi-objective scheduler)
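The trade-off between complete enumeration and a heuristic can be seen on a toy instance. The sketch below assumes independent tasks with known durations on identical machines and minimises Cmax (the makespan). The heuristic shown is longest-processing-time-first, chosen here only as a familiar example; the slides do not say which heuristic GRMS uses.

```python
from itertools import product


def makespan(assignment, durations, machines):
    """Cmax of an assignment, where assignment[i] is the machine of task i."""
    loads = [0] * machines
    for task, machine in enumerate(assignment):
        loads[machine] += durations[task]
    return max(loads)


def enumerate_best(durations, machines):
    """Complete enumeration: checks machines ** len(durations) assignments."""
    return min(
        makespan(assignment, durations, machines)
        for assignment in product(range(machines), repeat=len(durations))
    )


def lpt_heuristic(durations, machines):
    """Longest task first, each placed on the currently least loaded machine."""
    loads = [0] * machines
    for duration in sorted(durations, reverse=True):
        loads[loads.index(min(loads))] += duration
    return max(loads)


DURATIONS = [3, 3, 2, 2, 2]  # invented task durations
optimal = enumerate_best(DURATIONS, machines=2)
greedy = lpt_heuristic(DURATIONS, machines=2)
print(optimal, greedy)
```

On this instance enumeration examines 2^5 = 32 assignments and finds a makespan of 6, while the heuristic runs in near-linear time but settles for 7. Adding one task doubles the enumeration work, which is why heuristics are used for realistic instance sizes.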
Execution phase
Even an optimal schedule (e.g. a schedule with the shortest execution time) may need to be modified, because both resources and application requirements change dynamically.
Rescheduling and adaptive techniques are desirable:
Zakopane migration scenario – the first step on the painful road (e.g. how to estimate the cost of migration processes?),
WP7 Adaptive Components – adaptive strategies that let applications efficiently use the given resources,
WP8 Data Management – because of input/output file locality requirements (typical for data-intensive applications), it is often undesirable to transport or replicate all databases, files, etc. to all compute resources in a distributed grid environment,
WP4 Portals – to visualize the GRMS’s functionality
WP6 Security, WP10, ...
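The question of how to estimate migration cost can be made concrete with a deliberately simple rule: migrate only when the estimated finish time after moving beats the estimated finish time when staying. The cost model below (checkpoint transfer time plus a fixed restart overhead) and all parameter names are assumptions for illustration; the slides leave the estimation problem open.

```python
def migration_cost(checkpoint_mb, bandwidth_mb_per_s, restart_overhead_s):
    """Hypothetical cost model: time to ship the checkpoint plus restart time."""
    return checkpoint_mb / bandwidth_mb_per_s + restart_overhead_s


def should_migrate(remaining_work, current_speed, candidate_speed, cost_s):
    """Migrate only if moving finishes sooner than staying put."""
    stay = remaining_work / current_speed
    move = cost_s + remaining_work / candidate_speed
    return move < stay


# Invented numbers: 500 MB checkpoint, 10 MB/s link, 30 s restart overhead.
cost = migration_cost(checkpoint_mb=500, bandwidth_mb_per_s=10, restart_overhead_s=30)
long_job = should_migrate(remaining_work=1000, current_speed=1, candidate_speed=2, cost_s=cost)
short_job = should_migrate(remaining_work=100, current_speed=1, candidate_speed=2, cost_s=cost)
print(cost, long_job, short_job)
```

With these numbers the move costs 80 s, so it pays off for the job with 1000 units of work left (580 s against 1000 s) but not for the one with 100 units left (130 s against 100 s).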
Plans
Work on integration with other GridLab services:
Authorization Service
Work on the client side:
Portals
GAT API
Testing with GAT-enabled applications in the GridLab testbed.
Extensions to the Scenario Broker in the context of resource management issues.