11. Grid Scheduling and Resource Management
![Page 1: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/1.jpg)
GRID COMPUTING
Grid Scheduling & Resource Management
Sandeep Kumar Poonia
Head of Dept. CS/IT, Jagan Nath University, Jaipur
B.E., M. Tech., UGC-NET
LM-IAENG, LM-IACSIT, LM-CSTA, LM-AIRCC, LM-SCIEI, AM-UACEE
11/9/2013
![Page 2: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/2.jpg)
OUTLINE
Introduction
Scheduling Paradigms
How Scheduling Works
A Review of Condor, SGE, PBS and LSF
Grid Scheduling with QoS
![Page 3: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/3.jpg)
Introduction
Grid scheduling is the process of mapping Grid jobs to resources over multiple administrative domains.
A Grid job can be split into many small tasks.
The scheduler is responsible for selecting resources and scheduling jobs so that user and application requirements are met, in terms of overall execution time (throughput) and the cost of the resources used.
![Page 4: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/4.jpg)
Introduction
Via Globus, jobs can be submitted to systems managed by Condor, the Sun Grid Engine (SGE), the Portable Batch System (PBS) and the Load Sharing Facility (LSF).
![Page 5: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/5.jpg)
Scheduling Paradigms
Centralized Scheduling
Hierarchical Scheduling
Distributed Scheduling
![Page 6: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/6.jpg)
Centralized Scheduling
In a centralized scheduling environment, a central
machine (node) acts as a resource manager to
schedule jobs to all the surrounding nodes that are
part of the environment.
This scheduling paradigm is often used in situations
like a computing centre where resources have
similar characteristics and usage policies.
![Page 7: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/7.jpg)
Centralized Scheduling
Here, jobs are first submitted to the central scheduler, which then
dispatches the jobs to the appropriate nodes. Those jobs that
cannot be started on a node are normally stored in a central job
queue for a later start.
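The dispatch-or-queue behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular scheduler: the class name `CentralScheduler`, the notion of "free slots" per node and the node names are all assumptions made for the example.

```python
from collections import deque


class CentralScheduler:
    """Hypothetical central scheduler: one queue, one view of all nodes."""

    def __init__(self, free_slots):
        # free_slots maps a node name to its number of free job slots.
        self.free_slots = dict(free_slots)
        self.queue = deque()      # central job queue for jobs that must wait
        self.placements = {}      # job name -> node name

    def submit(self, job):
        """Dispatch the job now if a node has a free slot, else queue it."""
        for node, slots in self.free_slots.items():
            if slots > 0:
                self.free_slots[node] = slots - 1
                self.placements[job] = node
                return node
        self.queue.append(job)
        return None

    def release(self, node):
        """A job on `node` finished: free the slot and start a queued job."""
        self.free_slots[node] += 1
        if self.queue:
            return self.submit(self.queue.popleft())
        return None


scheduler = CentralScheduler({"node-a": 1, "node-b": 1})
for job in ("job-1", "job-2", "job-3"):
    scheduler.submit(job)
```

With two single-slot nodes, the third job has nowhere to run and stays in the central queue until `release` is called for one of the nodes.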
![Page 8: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/8.jpg)
Centralized Scheduling: Advantage & Disadvantage
Advantage: a centralized scheduling system may produce better scheduling decisions because it has all the necessary, up-to-date information about the available resources.
Disadvantage: centralized scheduling does not scale well as the environment it manages grows.
The scheduler itself may become a bottleneck, and a hardware or software failure on the scheduler's server makes it a single point of failure for the whole environment.
![Page 9: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/9.jpg)
Distributed Scheduling
There is no central scheduler responsible for managing all the jobs.
Instead, multiple localized schedulers interact with each other to dispatch jobs to the participating nodes.
A scheduler can communicate with other schedulers through one of two mechanisms:
Direct communication
Indirect communication
![Page 10: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/10.jpg)
Distributed Scheduling: Direct Communication
Each local scheduler can communicate directly with other schedulers for job dispatching.
![Page 11: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/11.jpg)
Distributed Scheduling: Direct Communication
Each scheduler has a list of remote schedulers that it can interact with, or there may be a central directory that maintains the information related to each scheduler.
If a job cannot be dispatched to its local resources, its scheduler communicates with remote schedulers to find resources that are appropriate and available for executing the job.
Each scheduler may maintain one or more local job queues for job management.
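A rough sketch of direct communication follows. It assumes each scheduler holds a plain list of peers and that a job is described only by the number of CPUs it needs; the class name `LocalScheduler` and the site names are hypothetical.

```python
class LocalScheduler:
    """Hypothetical local scheduler that can forward jobs to known peers."""

    def __init__(self, name, free_cpus):
        self.name = name
        self.free_cpus = free_cpus
        self.peers = []          # remote schedulers this one can contact
        self.local_queue = []    # jobs waiting for resources

    def try_run(self, cpus_needed):
        """Reserve CPUs locally if enough are free."""
        if cpus_needed <= self.free_cpus:
            self.free_cpus -= cpus_needed
            return True
        return False

    def submit(self, job, cpus_needed):
        """Run locally, else ask each peer directly, else queue the job."""
        if self.try_run(cpus_needed):
            return self.name
        for peer in self.peers:
            if peer.try_run(cpus_needed):
                return peer.name
        self.local_queue.append((job, cpus_needed))
        return None


site_a = LocalScheduler("site-a", free_cpus=2)
site_b = LocalScheduler("site-b", free_cpus=8)
site_a.peers.append(site_b)
site_b.peers.append(site_a)
```

A job that fits locally never leaves its site; only jobs that cannot be placed locally cause traffic between schedulers, and jobs that fit nowhere wait in the local queue.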
![Page 12: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/12.jpg)
Distributed Scheduling: Indirect Communication
Communication via a central job pool
In this scenario, jobs that cannot be executed immediately are sent to a central job pool.
![Page 13: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/13.jpg)
Distributed Scheduling: Indirect Communication
Communication via a central job pool
Compared with direct communication, the local schedulers can potentially choose suitable jobs from the pool to schedule on their resources.
Policies are required so that every job in the pool is eventually executed.
This method can be modified so that all jobs are pushed directly into the job pool after submission.
This way, small jobs requiring few resources can be used to fill free resources on all machines.
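The central job pool, together with one possible policy for making sure every job eventually runs, might look like the sketch below. The skip-count rule is an assumption chosen for illustration (the slides do not name a specific policy), as are the class name `JobPool` and the `max_skips` parameter.

```python
class JobPool:
    """Hypothetical central pool; local schedulers pull jobs that fit."""

    def __init__(self, max_skips=3):
        self.jobs = []            # each entry: [name, cpus_needed, skips]
        self.max_skips = max_skips

    def push(self, name, cpus_needed):
        self.jobs.append([name, cpus_needed, 0])

    def pull(self, free_cpus):
        """Return the name of a job that fits into free_cpus, or None."""
        # Assumed anti-starvation policy: once the oldest job has been
        # skipped max_skips times, no younger job may overtake it.
        if self.jobs and self.jobs[0][2] >= self.max_skips:
            candidates = self.jobs[:1]
        else:
            candidates = self.jobs
        for index, entry in enumerate(candidates):
            if entry[1] <= free_cpus:
                del self.jobs[index]
                return entry[0]
            entry[2] += 1         # this job was passed over once more
        return None
```

Small jobs can overtake a wide job a limited number of times; after that, schedulers pulling from the pool get nothing until some scheduler has enough free CPUs for the wide job.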
![Page 14: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/14.jpg)
Hierarchical scheduling
In hierarchical scheduling, a centralized scheduler interacts with local schedulers for job submission. The centralized scheduler is a kind of meta-scheduler that dispatches submitted jobs to local schedulers.
![Page 15: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/15.jpg)
Hierarchical scheduling
As in the centralized scheduling paradigm, hierarchical scheduling can suffer from scalability and communication bottlenecks.
However, compared with centralized scheduling, one advantage of hierarchical scheduling is that the global scheduler and the local schedulers can apply different job scheduling policies.
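The point about different policies at the two levels can be illustrated as follows. In this sketch the global policy is "send the job to the local scheduler with the shortest queue", while each local scheduler orders its own queue by a key function it is given. Both policies, the class names and the job fields (`submitted`, `runtime`) are assumptions made for the example.

```python
class LocalScheduler:
    """Hypothetical local scheduler with its own ordering policy."""

    def __init__(self, name, order_key):
        self.name = name
        self.order_key = order_key   # local policy: how to order the queue
        self.queue = []

    def accept(self, job):
        self.queue.append(job)

    def next_job(self):
        """Pick the next job according to the local policy."""
        if not self.queue:
            return None
        job = min(self.queue, key=self.order_key)
        self.queue.remove(job)
        return job


class MetaScheduler:
    """Hypothetical global scheduler; its policy is 'shortest local queue'."""

    def __init__(self, local_schedulers):
        self.local_schedulers = local_schedulers

    def dispatch(self, job):
        target = min(self.local_schedulers, key=lambda s: len(s.queue))
        target.accept(job)
        return target.name


# One cluster runs first come first serve, the other shortest job first.
fcfs = LocalScheduler("cluster-1", order_key=lambda job: job["submitted"])
sjf = LocalScheduler("cluster-2", order_key=lambda job: job["runtime"])
meta = MetaScheduler([fcfs, sjf])
```

The meta-scheduler only decides where a job goes; when it actually runs is left to the local scheduler that received it.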
![Page 16: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/16.jpg)
HOW SCHEDULING WORKS
Grid scheduling involves four main stages: resource discovery, resource selection, schedule generation and job execution
![Page 17: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/17.jpg)
Resource discovery
Goal: identify a list of authenticated resources that are available for job submission.
In order to cope with the dynamic nature of the Grid, a scheduler needs some way of incorporating dynamic state information about the available resources into its decision-making process.
A Grid environment typically uses one of the following for resource discovery:
a pull model,
a push model, or
a push–pull model.
![Page 18: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/18.jpg)
Resource discovery: The pull model
A single daemon associated with the scheduler can query Grid resources and collect state information such as CPU loads or the available memory.
![Page 19: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/19.jpg)
Resource discovery: The pull model
The pull model for gathering resource information incurs relatively small communication overhead, but unless it requests resource information frequently, the information it provides tends to be stale and potentially misleading.
In centralized scheduling, the resource discovery/query process can be rather intrusive and take a significant amount of time as the monitored environment grows.
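The trade-off between query frequency and staleness can be made concrete with a small sketch. It assumes the scheduler-side daemon caches each answer for `max_age` seconds and that a resource is represented by a zero-argument probe function; `PullCollector` and its fields are hypothetical names.

```python
class PullCollector:
    """Hypothetical scheduler-side daemon that polls resources on demand."""

    def __init__(self, probes, max_age):
        # probes maps a resource name to a zero-argument callable that
        # returns its current state, for example {"cpu_load": 0.4}.
        self.probes = probes
        self.max_age = max_age    # seconds a cached answer stays usable
        self.cache = {}           # name -> (timestamp, state)
        self.queries_sent = 0

    def state(self, name, now):
        """Return the cached state, re-querying only if it is too old."""
        cached = self.cache.get(name)
        if cached is None or now - cached[0] > self.max_age:
            self.queries_sent += 1
            cached = (now, self.probes[name]())
            self.cache[name] = cached
        return cached[1]
```

A large `max_age` keeps `queries_sent` low but lets the scheduler act on an old load figure; a small `max_age` gives fresher data at the price of more queries.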
![Page 20: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/20.jpg)
Resource discovery: The push model
![Page 21: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/21.jpg)
Resource discovery: The push model
Each resource in the environment has a daemon for gathering local state information, which is sent to a centralized scheduler that maintains a database recording each resource's activity.
If the updates are frequent, an accurate view of the system state can be maintained over time; however, frequent updates to the database are intrusive and consume network bandwidth.
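A small simulation shows the accuracy-versus-bandwidth trade-off of the push model. The names `StateDatabase` and `run_daemon`, and the idea of reporting every n-th sample, are assumptions made for this sketch.

```python
class StateDatabase:
    """Hypothetical scheduler-side store that resources push updates into."""

    def __init__(self):
        self.latest = {}          # resource name -> most recent state
        self.updates_received = 0

    def push(self, name, state):
        self.latest[name] = dict(state)
        self.updates_received += 1


def run_daemon(name, samples, database, every):
    """Simulate a resource daemon that reports every `every`-th sample."""
    for tick, state in enumerate(samples):
        if tick % every == 0:
            database.push(name, state)


# Ten consecutive load measurements on one resource: 0.0, 0.1, ..., 0.9.
samples = [{"cpu_load": step / 10} for step in range(10)]
frequent, sparse = StateDatabase(), StateDatabase()
run_daemon("node-a", samples, frequent, every=1)
run_daemon("node-a", samples, sparse, every=5)
```

Reporting every sample leaves the database with the true final load after ten messages; reporting every fifth sample needs only two messages but leaves the database holding an older value.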
![Page 22: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/22.jpg)
Resource discovery: The push–pull model
The push–pull model lies somewhere between the pull model and the push model.
![Page 23: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/23.jpg)
Resource discovery: The push–pull model
Each resource in the environment runs a daemon that collects state information.
This information is not sent directly to a central scheduler; instead, intermediate nodes run daemons that aggregate state information from different sub-resources and respond to queries from the scheduler.
A challenge of this model is to determine what information is most useful, how often it should be collected and how long it should be kept.
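An intermediate aggregating node could be sketched as below. Resources push into it, the scheduler pulls a summary from it, and the `keep_for` parameter stands in for the "how long to keep information" question. The class name, the choice of CPU load as the only metric and the mean as the summary are assumptions.

```python
class Aggregator:
    """Hypothetical intermediate node: resources push, the scheduler pulls."""

    def __init__(self, keep_for):
        self.keep_for = keep_for  # seconds before a report is discarded
        self.reports = {}         # resource name -> (timestamp, cpu_load)

    def push(self, name, cpu_load, now):
        """Called by a resource daemon with its latest measurement."""
        self.reports[name] = (now, cpu_load)

    def query(self, now):
        """Called by the scheduler; returns a summary of fresh reports."""
        fresh = {name: load for name, (stamp, load) in self.reports.items()
                 if now - stamp <= self.keep_for}
        if not fresh:
            return {"resources": [], "mean_cpu_load": None}
        return {"resources": sorted(fresh),
                "mean_cpu_load": sum(fresh.values()) / len(fresh)}
```

The scheduler sends one query per aggregator instead of one per resource, and reports older than `keep_for` drop out of the summary automatically.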
![Page 24: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/24.jpg)
Resource Selection
The second phase of the scheduling process selects those resources that best suit the constraints and conditions imposed by the user, such as CPU usage, available RAM or disk storage.
The result of resource selection is a resource list R_selected in which every resource meets the minimum requirements for a submitted job or job list.
The relationship between the available resources R_available and the selected resources R_selected is:
R_selected ⊆ R_available
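In code, this phase amounts to a filter over the discovered resources. The sketch below assumes each resource is a dictionary of capacities and that every requirement is a minimum; the field names (`cpu_ghz`, `ram_mb`, `disk_gb`) and the sample values are illustrative.

```python
def select_resources(available, requirements):
    """Keep the resources that satisfy every minimum requirement."""
    return [
        resource for resource in available
        if all(resource.get(key, 0) >= minimum
               for key, minimum in requirements.items())
    ]


available = [
    {"name": "node-a", "cpu_ghz": 2.4, "ram_mb": 512, "disk_gb": 100},
    {"name": "node-b", "cpu_ghz": 0.8, "ram_mb": 1024, "disk_gb": 500},
    {"name": "node-c", "cpu_ghz": 1.2, "ram_mb": 256, "disk_gb": 20},
]
requirements = {"cpu_ghz": 1.0, "ram_mb": 256, "disk_gb": 50}
selected = select_resources(available, requirements)
```

Because the function only ever removes entries, the result is by construction a subset of the available list, which is the relationship stated above.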
![Page 25: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/25.jpg)
Schedule Generation
The generation of schedules involves two steps:
selecting jobs and
producing resource selection strategies.
![Page 26: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/26.jpg)
Schedule Generation: Resource Selection
The resource selection process is used to choose resource(s) from the resource list R_selected for a given job.
Since all resources in the list R_selected meet the minimum requirements imposed by the job, an algorithm is needed to choose the best resource(s) to execute the job.
Although random selection is a choice, it is not an ideal resource selection policy.
The resource selection algorithm should take into account the current state of resources and choose the best one based on a quantitative evaluation.
![Page 27: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/27.jpg)
Schedule Generation: Resource Selection
A resource selection algorithm that takes only CPU and RAM into account could be designed as follows:
(formula shown on the slide)
where:
W_CPU – the weight allocated to CPU speed;
CPU_load – the current CPU load;
CPU_speed – real CPU speed;
CPU_min – minimum CPU speed;
W_RAM – the weight allocated to RAM;
RAM_usage – the current RAM usage;
RAM_size – original RAM size; and
RAM_min – minimum RAM size.
![Page 28: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/28.jpg)
Schedule Generation: Resource Selection
Example: Suppose that the total weighting used in the algorithm is 10, where the CPU weight is 6 and the RAM weight is 4. The minimum CPU speed is 1 GHz and the minimum RAM size is 256 MB. The resource information matrix is as follows:
Find the best resource for the submitted job.
![Page 29: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/29.jpg)
Schedule Generation: Resource Selection
Then, evaluation values for the resources can be calculated using the three formulas:
From the results we know that Resource3 is the best choice for the submitted job.
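One scoring rule that is consistent with the variables defined for this algorithm is sketched below. It is an assumption, not necessarily the formula used on the original slide: each component is the weight times the unused fraction times the capacity relative to the minimum, and the total is normalised by the sum of the weights. The weights (6 and 4) and minimums (1 GHz, 256 MB) are the ones stated in the example; the three resource records are illustrative values, not necessarily those of the original matrix.

```python
def evaluate(resource, w_cpu, w_ram, cpu_min, ram_min):
    """Assumed weighted score; higher means more spare capacity."""
    cpu_score = (w_cpu * (1 - resource["cpu_load"])
                 * resource["cpu_speed"] / cpu_min)
    ram_score = (w_ram * (1 - resource["ram_usage"])
                 * resource["ram_size"] / ram_min)
    return (cpu_score + ram_score) / (w_cpu + w_ram)


# Illustrative resources: speed in GHz, size in MB, load and usage as fractions.
resources = {
    "resource1": {"cpu_speed": 1.8, "cpu_load": 0.5, "ram_size": 256, "ram_usage": 0.5},
    "resource2": {"cpu_speed": 2.6, "cpu_load": 0.7, "ram_size": 512, "ram_usage": 0.6},
    "resource3": {"cpu_speed": 1.2, "cpu_load": 0.4, "ram_size": 512, "ram_usage": 0.3},
}
scores = {
    name: evaluate(info, w_cpu=6, w_ram=4, cpu_min=1.0, ram_min=256)
    for name, info in resources.items()
}
best = max(scores, key=scores.get)
```

Under these assumed inputs the lightly loaded resource with the most free RAM scores highest, even though it has the slowest CPU, because the rule rewards spare capacity rather than raw speed.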
![Page 30: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/30.jpg)
Schedule Generation: Job Selection
The goal of job selection is to select a job from a
job queue for execution. Four strategies that can
be used to select a job are given below.
First come first serve
Random Selection
Priority-based Selection
Backfilling Selection
![Page 31: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/31.jpg)
Schedule Generation: Job Selection
First come first serve:
The scheduler selects jobs for execution in the order of their submission.
If no resource is available for the selected job, the scheduler waits until the job can be started. The other jobs in the job queue have to wait.
There are two main drawbacks with this type of job selection:
1. It may waste resources when, for example, the selected job needs more resources to become available before it can start, which results in a long waiting time.
2. Jobs with high priorities cannot be dispatched immediately if a job with a low priority needs more time to complete.
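The first drawback is easy to reproduce in a sketch. The function below assumes a job is a `(name, cpus_needed)` pair and that CPUs are the only resource; the function name and the sample jobs are hypothetical.

```python
from collections import deque


def fcfs_schedule(jobs, free_cpus):
    """Start jobs strictly in submission order; stop at the first misfit."""
    queue = deque(jobs)           # each job: (name, cpus_needed)
    started = []
    while queue and queue[0][1] <= free_cpus:
        name, cpus = queue.popleft()
        free_cpus -= cpus
        started.append(name)
    waiting = [name for name, _ in queue]
    return started, waiting, free_cpus


started, waiting, idle = fcfs_schedule([("a", 2), ("b", 8), ("c", 1)], free_cpus=4)
```

Here job `c` would fit into the CPUs left idle, but it has to wait behind `b`, which cannot start yet.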
![Page 32: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/32.jpg)
Schedule Generation: Job Selection
Random selection:
The next job to be scheduled is randomly selected from the job queue.
In addition to the two drawbacks of the first-come-first-serve strategy, job selection is not fair, and a job submitted earlier may not be scheduled until much later.
![Page 33: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/33.jpg)
Schedule Generation: Job Selection
Priority-based selection:
Jobs submitted to the scheduler have different priorities.
The next job to be scheduled is the job with the highest priority in the job queue.
A job's priority can be set when the job is submitted.
One drawback of this strategy is that it is hard to set an optimal criterion for a job priority.
A job with the highest priority may need more resources than are available, which can result in a long waiting time and poor use of the available resources.
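A priority queue is the natural data structure for this strategy. The sketch below uses Python's standard `heapq` module; the class name `PriorityJobQueue` and the rule that equal priorities are served in submission order are assumptions.

```python
import heapq
import itertools


class PriorityJobQueue:
    """Hypothetical job queue that always yields the highest priority job."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker: submission order

    def submit(self, name, priority):
        # heapq implements a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._counter), name))

    def next_job(self):
        """Remove and return the highest priority job, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = PriorityJobQueue()
queue.submit("report", priority=1)
queue.submit("urgent", priority=9)
queue.submit("batch", priority=1)
```

The queue itself says nothing about whether the chosen job fits the free resources, which is exactly the weakness noted in the last point above.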
![Page 34: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/34.jpg)
Schedule Generation: Job Selection
Backfilling selection:
The backfilling strategy requires knowledge of
the expected execution time of a job to be
scheduled.
If the next job in the job queue cannot be
started due to a lack of available resources,
backfilling tries to find another job in the queue
that can use the idle resources.
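A simplified backfilling rule can be sketched as follows. It assumes that a later job may jump ahead only if it fits into the currently free CPUs and its expected runtime ends before the earliest time the head job could start; this is one common variant, chosen here for illustration, and the slides do not specify which variant is meant. Running jobs are `(expected_end_time, cpus)` pairs and queued jobs are `(name, cpus_needed, expected_runtime)` triples, both hypothetical layouts.

```python
def reservation_time(running, free_cpus, cpus_needed, now):
    """Earliest time at which cpus_needed CPUs are expected to be free."""
    if cpus_needed <= free_cpus:
        return now
    for end_time, cpus in sorted(running):
        free_cpus += cpus         # CPUs released when this job ends
        if cpus_needed <= free_cpus:
            return end_time
    return None                   # the job never fits on this machine


def pick_backfill(queue, running, free_cpus, now):
    """Return the name of a queued job that may start now, or None."""
    head_name, head_cpus, _ = queue[0]
    if head_cpus <= free_cpus:
        return head_name
    start_of_head = reservation_time(running, free_cpus, head_cpus, now)
    for name, cpus, runtime in queue[1:]:
        fits_now = cpus <= free_cpus
        # The backfilled job must not delay the head job's expected start.
        done_in_time = start_of_head is None or now + runtime <= start_of_head
        if fits_now and done_in_time:
            return name
    return None


running = [(100, 6)]              # one job holding 6 CPUs until t = 100
queue = [("wide", 8, 50), ("long", 2, 200), ("short", 2, 60)]
choice = pick_backfill(queue, running, free_cpus=2, now=0)
```

Job `long` fits into the two idle CPUs but would still be running when `wide` is due to start, so only `short` is eligible; this is why the strategy needs expected execution times.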
![Page 35: 11. grid scheduling and resource managament](https://reader034.fdocuments.in/reader034/viewer/2022051412/5485ad35b479590a0d8b4edd/html5/thumbnails/35.jpg)
Job execution
Once a job and a resource are selected, the next
step is to submit the job to the resource for
execution.
Job execution may be as easy as running a single
command or as complicated as running a series
of scripts that may, or may not, include set up or
staging.
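The "series of scripts" case can be sketched with Python's standard `subprocess` module. The step labels (`stage-in`, `execute`, `stage-out`) and the commands are placeholders invented for the example; each command is a tiny Python one-liner so that the sketch does not depend on any particular batch system or shell.

```python
import subprocess
import sys


def run_job(steps):
    """Run each step in order; stop at the first one that fails."""
    outputs = []
    for label, command in steps:
        result = subprocess.run(command, capture_output=True, text=True)
        outputs.append((label, result.returncode, result.stdout.strip()))
        if result.returncode != 0:
            break                 # later steps depend on earlier ones
    return outputs


# Hypothetical three-step job: stage input, run the computation, collect output.
steps = [
    ("stage-in", [sys.executable, "-c", "print('input staged')"]),
    ("execute", [sys.executable, "-c", "print(6 * 7)"]),
    ("stage-out", [sys.executable, "-c", "print('output collected')"]),
]
```

Calling `run_job(steps)` returns one `(label, return code, output)` tuple per step that was attempted, so a failed staging step is visible and prevents the computation from being started.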