Page 1

Dr. Xiaohui Wei
College of Computer Science and Technology, Jilin University, China

CSF4 Tutorial

The 3rd PRAGMA Institute, Penang Malaysia, 2008-10-21

Page 2

Content

• What is CSF4
• CSF4 Services
• CSF4 Plugin Mechanism
• Workflow and data aware scheduling
• Array Job
• VJM – Resource Co-allocation
• How to use CSF4 in your Grid
• Current Status and Future Plan

Page 3

What is CSF4

• CSF4 is a WSRF-compliant meta-scheduler. Its first version (2004) was released as an execution management service component of Globus Toolkit 4.
• It is an open-source project (sourceforge.net).

Page 4

What is CSF4

• CSF4 is designed as a meta-scheduler
 – Global job scheduling: it makes scheduling decisions involving resources that span multiple administrative domains (co-allocation)
 – CSF4 does not own the resources
 – CSF4 needs to work with local schedulers (such as LSF, PBS, Condor, SGE, etc.), which own the resources, to dispatch jobs
• CSF4 is WSRF compliant
 – CSF4 consists of a set of WSRF-based services, such as the job service, queue service and resource management service
• CSF4 uses GRAM to work with local schedulers
 – Supports both WS-GRAM (GT4) and pre-WS GRAM (GT2)
 – Supports LSF, PBS and SGE
 – Supports job submission, job control and job query
 – Supports automatic cluster selection for job execution

Page 5

What is CSF4

[Figure: CSF4 in the layered grid architecture. Layers: Application, Collective, Resource, Connectivity and Fabric. Components: user applications, the meta-scheduler, the resource manager adapter, and local schedulers (LSF, SGE, PBS, Condor). The meta-scheduler sends reservation and job execution requests through the resource management protocol (GRAM protocol, Web Service interface) and receives reservation and resource information.]

Page 6

What is CSF4

[Figure: The CSF4 meta-scheduler dispatching to multiple grid sites: a GT2 site running LSF, a GT2 site running PBS, a GT4 site running SGE, a GT2 site running Condor, and so on.]

Page 7

What is CSF4

• Flexible and extensible scheduling policies
 – CSF4 supports a scheduling plug-in model, which makes it easy to add new policies
 – FCFS and throttle scheduling policies were shipped with the first version of CSF4
 – Workflow and data aware scheduling were implemented recently
 – Users can combine multiple scheduling policies to implement more advanced job scheduling (flexible)
 – Users can introduce new scheduling policies
• Support for resource co-allocation
 – Supports resource co-allocation across multiple administrative domains
 – We implemented a resource co-allocation service, VJM, in CSF4
  • VJM does not rely on resource advance reservation, so it can work with SGE and GRAM
  • VJM will soon be enhanced into an independent WSRF service that provides resource co-allocation for grid applications

Page 8

CSF4 Services

Page 9

CSF4 Services

• CSF4 consists of a set of web services: the Job Service, Reservation Service, Queuing Service, Resource Manager Factory Service, etc.
• Job Service
 – The Job Service provides the interfaces that let end users fully control a job.
  • Users can create job instances, submit jobs to a queue, modify a job's description, monitor job status, etc. Once a job is created, its EPR (endpoint reference) is returned to the user for further operations.
 – CSF jobs are described in RSL
 – Every CSF job must belong to a queue for scheduling

Page 10

CSF4 Services

• Reservation Service
 – The Reservation Service allows users to reserve resources for their jobs in advance, so that resource availability can be guaranteed.
 – Resource reservation requests are treated as special jobs, with resource requirements but without execution binaries.
 – CSF extended RSL to support resource reservation (supported for LSF only).
 – Reservation requests are put into a queue and then forwarded to the local scheduler by the Queue Service, like normal jobs.
 – Both jobs and reservation requests are hosted in the GT4 container as RPs (Resource Properties), and their EPRs are returned to the users.
 – Meanwhile, those EPRs are also saved in WS-MDS.
  • The recovery mechanism of the GT4 Index Service makes jobs and reservations persistent across CSF4 reboots.
  • The GT4 Trigger Service can notify end users when the status of their jobs or reservations changes.

Page 11

CSF4 Services

• Queuing Service
 – The container that holds jobs and reservation requests
 – A queue normally represents a specific scheduling policy
 – Multiple queues can be configured in CSF, and different queues usually have different scheduling policies configured.
  • Scheduling policies are encapsulated in plug-ins
  • The plug-ins are dynamically loaded for a queue according to the configuration
  • The more scheduling plug-ins are implemented, the richer the scheduling policies that can be provided (in combination)
 – At submission time, the user should choose a queue for the job so that the proper scheduling policy is applied. (Otherwise, the job is put into the default queue.)
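The queue-to-plug-in relationship described above can be sketched as a small registry. This is a hypothetical Python model, not CSF4's configuration format: the queue names, the `QUEUE_CONFIG` layout and the `submit` helper are illustrative assumptions. A job submitted without a queue lands in the default queue, as the slide states; rejecting an unknown queue name is an extra assumption.

```python
from dataclasses import dataclass, field

# Hypothetical configuration: queue name -> ordered list of plug-in names.
QUEUE_CONFIG = {
    "default": ["fcfs"],                          # the default queue uses FCFS
    "data_workflow": ["workflow", "data_aware"],  # combined plug-ins
}
DEFAULT_QUEUE = "default"


@dataclass
class Queue:
    name: str
    plugins: list                               # plug-ins loaded for this queue
    jobs: list = field(default_factory=list)    # jobs waiting in this queue


def build_queues(config):
    """Create one Queue per configured entry, keeping the plug-in order."""
    return {name: Queue(name, list(plugins)) for name, plugins in config.items()}


def submit(queues, job, queue_name=None):
    """Append a job to the chosen queue; with no queue given, use the default."""
    name = DEFAULT_QUEUE if queue_name is None else queue_name
    if name not in queues:
        raise KeyError(f"no such queue: {name}")  # assumed behaviour
    queues[name].jobs.append(job)
    return name


queues = build_queues(QUEUE_CONFIG)
submit(queues, "job-1")                    # -> "default"
submit(queues, "job-2", "data_workflow")   # -> "data_workflow"
```

Keeping the plug-in list ordered matters once several plug-ins are combined on one queue, since each may refine the decisions of the previous one.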

Page 12

CSF4 Services

• Resource Manager Services
 – Resource Manager Services are not used by end users directly. They are designed to support protocols other than WS-GRAM.
 – Resource Manager Services consist of one factory service, the Resource Manager Factory Service, and two instance services, the Resource Manager Lsf Service and the Resource Manager Gram Service.
 – The Resource Manager Lsf Service is an instance service that supports an enhanced GRAM protocol between CSF4 and LSF. Some advanced features, such as resource reservation, are supported via this service.
  • Following the same idea, new instance services can be designed for SGE and PBS as well, to support special features that GRAM does not support yet.
 – The Resource Manager Gram Service supports the GRAM2 (GT2) protocol.
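The factory/instance split above follows the classic factory pattern. The sketch below is a hypothetical Python illustration, not CSF4's WSRF service code: the class names, the protocol keys `"lsf"` and `"gram2"`, and the `supports_reservation` method are assumptions chosen to mirror the slide.

```python
class ResourceManagerService:
    """Base for hypothetical per-protocol instance services."""
    protocol = None

    def __init__(self, cluster):
        self.cluster = cluster

    def supports_reservation(self):
        return False


class LsfResourceManager(ResourceManagerService):
    protocol = "lsf"      # enhanced-GRAM link between CSF4 and LSF

    def supports_reservation(self):
        return True       # reservation is offered via the LSF instance service


class GramResourceManager(ResourceManagerService):
    protocol = "gram2"    # pre-WS GRAM (GT2)


class ResourceManagerFactory:
    """Creates an instance service for a cluster based on its protocol."""

    def __init__(self):
        self._registry = {}

    def register(self, service_cls):
        self._registry[service_cls.protocol] = service_cls

    def create(self, cluster, protocol):
        try:
            service_cls = self._registry[protocol]
        except KeyError:
            raise ValueError(f"no resource manager for protocol {protocol!r}") from None
        return service_cls(cluster)


factory = ResourceManagerFactory()
factory.register(LsfResourceManager)
factory.register(GramResourceManager)
lsf_site = factory.create("cluster-a", "lsf")
gt2_site = factory.create("cluster-b", "gram2")
```

Under this design, adding an SGE or PBS instance service would mean one more subclass and one more `register` call, without touching the factory itself.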

Page 13

CSF4 Services

Page 14

CSF4 Plugin Mechanism

• Motivations
 – In the real world, different users have different requirements. No matter how many scheduling policies a scheduler provides, no resource management system can meet all users' needs.
 – On the other hand, a specific user does not need many scheduling policies. For example, most Platform LSF customers use only 5%-10% of LSF's features.
 – It is difficult to implement many scheduling features in a single module, and such a module is harder to maintain and extend with new features (from the vendor's point of view).
 – It is hard for users to implement a tailored scheduling policy by themselves, because implementing a scheduler from scratch is very complex. (Wouldn't it be useful to let users implement scheduling policies by themselves easily?)

Page 15

CSF4 Plugin Mechanism

• Overview
 – The CSF4 plug-in mechanism consists of a framework and plug-in modules.
 – Different scheduling policies are encapsulated in individual scheduling plug-in modules.
 – Scheduling policies are defined for each queue. Normally, multiple queues are defined in the scheduler, and different queues have different policies (the default queue's policy is FCFS).
 – The scheduler framework works like a motherboard with slots that hold the scheduler plug-in modules for each queue.
 – The framework does all the common and tedious work that a job scheduler has to do, such as job management, available-resource collection, job dispatch and monitoring, event delivery, recovery, etc.
 – The CSF4 framework loads the desired plug-in modules for each queue according to the configuration.
 – Multiple plug-in modules can be used in combination.
 – CSF4 provides plug-in APIs so that users can develop new scheduling policies easily.

Page 16

CSF4 Plugin Mechanism

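[Figure: CSF4 Plug-in Architecture. The CSF Framework holds Queue 1 and Queue 2, each with its own job list, and slots for scheduling plug-ins (FCFS plugin, Workflow plugin, Data Aware plugin). The framework supplies resource availability info to the plug-ins and performs job dispatch.]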

Page 17

CSF4 Plugin APIs

Scheduling callback functions:
 – schedInit(): initialization
 – schedOrder(): decide which jobs can/cannot go, and the job dispatch order
 – schedMatch(): decide the job execution locations
 – schedPost(): not used so far; lets a plug-in do something after the scheduling decisions are made, such as updating internal counters

**Note: Once you implement the above functions, you can normally implement a scheduling policy (you do NOT need to implement all of them).

-------------------------------

Event notification functions:
 – jobCreated()
 – jobSubmitted()
 – queuedJobStatusChanged()
 – runningJobStatusChanged()
 – jobRecovery()
 – resourceReady()

The CSF4 framework informs the plug-ins once an event happens.

**Note: Such notifications can simply be ignored if your plug-in (scheduling policy) is not interested in them.
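To make the callback split concrete, here is a minimal Python sketch of a plug-in base class and one framework pass. It is hypothetical: the real CSF4 plug-in API signatures are not given on the slide, so the argument lists, the dict-based jobs, and the rule that plug-ins are applied in configured order (all `schedOrder` calls, then all `schedMatch` calls, then `schedPost`) are assumptions. Method names mirror the slide so the mapping is easy to follow.

```python
class SchedulerPlugin:
    """Hypothetical Python rendering of the CSF4 plug-in callbacks.

    Every hook has a do-nothing default, so a policy overrides only
    the callbacks it needs."""

    # Scheduling callbacks
    def schedInit(self, config):
        pass

    def schedOrder(self, jobs):
        return list(jobs)            # default: keep the current order

    def schedMatch(self, jobs, hosts):
        return {}                    # default: make no placement decisions

    def schedPost(self, decisions):
        pass

    # Event notifications: ignored unless a plug-in overrides them
    def jobCreated(self, job): pass
    def jobSubmitted(self, job): pass
    def queuedJobStatusChanged(self, job): pass
    def runningJobStatusChanged(self, job): pass
    def jobRecovery(self, job): pass
    def resourceReady(self, host): pass


def schedule_cycle(plugins, jobs, hosts):
    """One framework pass for a queue: order, then match, then post."""
    for plugin in plugins:
        jobs = plugin.schedOrder(jobs)
    decisions = {}
    for plugin in plugins:
        decisions.update(plugin.schedMatch(jobs, hosts))
    for plugin in plugins:
        plugin.schedPost(decisions)
    return jobs, decisions


class ByPriority(SchedulerPlugin):
    """Example ordering policy: lower 'prio' value goes first."""
    def schedOrder(self, jobs):
        return sorted(jobs, key=lambda job: job["prio"])


class RoundRobin(SchedulerPlugin):
    """Example matching policy: assign ordered jobs to hosts in turn."""
    def schedMatch(self, jobs, hosts):
        return {job["name"]: hosts[i % len(hosts)] for i, job in enumerate(jobs)}


order, placement = schedule_cycle(
    [ByPriority(), RoundRobin()],
    [{"name": "a", "prio": 2}, {"name": "b", "prio": 1}],
    ["host1", "host2"],
)
```

The two example plug-ins each override a single callback, which is the point of the no-op defaults: an ordering policy and a matching policy can be written independently and combined on one queue.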

Page 18

Develop simple scheduling policies

• 1. Example one: FCFS (First Come First Serve) policy
 • Since we only care about the job dispatch order, we just need to implement SchedOrder() in the FCFS plug-in and leave all the other functions empty. The pseudo code is as follows:

    Vector SchedOrder (Vector jobs) {
        // bubble sort: earliest submission time first
        HaveChange = True;
        while (HaveChange) {
            HaveChange = False;
            for (i = 0; i < n - 1; i++) {     // n = number of jobs
                if (jobs[i].submitTime > jobs[i+1].submitTime) {
                    swap(jobs[i], jobs[i+1]);
                    HaveChange = True;
                }
            }
        }
        return jobs;
    } // End of SchedOrder()
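A runnable Python version of the FCFS pseudocode follows, as a sketch. The `Job` dataclass and its `submit_time` field are assumptions standing in for CSF4's job objects; the function sorts a copy so the framework's own list is left untouched.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    submit_time: float   # hypothetical field: submission timestamp


def sched_order_fcfs(jobs):
    """FCFS: bubble-sort a copy of the job list by submission time, earliest first."""
    jobs = list(jobs)
    have_change = True
    while have_change:
        have_change = False
        for i in range(len(jobs) - 1):
            if jobs[i].submit_time > jobs[i + 1].submit_time:
                jobs[i], jobs[i + 1] = jobs[i + 1], jobs[i]
                have_change = True
    return jobs


queue = [Job("b", 2.0), Job("a", 1.0), Job("c", 3.0)]
ordered = sched_order_fcfs(queue)   # a, b, c
```

Bubble sort is kept only to match the pseudocode; because it swaps only strictly out-of-order neighbours, jobs with equal submission times keep their relative order.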

Page 19

Develop simple scheduling policies

• 2. Example two: Small jobs go first (SJFS)
 • Similar to FCFS, we just need to implement SchedOrder() in the SJFS plug-in. The only difference is that the jobs are sorted by the number of CPUs they require instead of by submission time.

    Vector SchedOrder (Vector jobs) {
        // bubble sort: fewest requested CPUs first
        HaveChange = True;
        while (HaveChange) {
            HaveChange = False;
            for (i = 0; i < n - 1; i++) {     // n = number of jobs
                if (jobs[i].numCPU > jobs[i+1].numCPU) {
                    swap(jobs[i], jobs[i+1]);
                    HaveChange = True;
                }
            }
        }
        return jobs;
    } // End of SchedOrder()
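The same policy can be sketched in Python with the built-in stable `sorted()` in place of the hand-written bubble sort. The `Job` fields are hypothetical, and breaking ties between equally sized jobs by submission time (so they stay FCFS) is an assumption added here, not something the slide specifies.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    submit_time: float   # hypothetical field: submission timestamp
    num_cpu: int         # hypothetical field: number of CPUs requested


def sched_order_sjfs(jobs):
    """SJFS: fewest requested CPUs first; ties fall back to submission order."""
    return sorted(jobs, key=lambda job: (job.num_cpu, job.submit_time))


jobs = [Job("big", 1.0, 16), Job("small_late", 3.0, 1), Job("small_early", 2.0, 1)]
ordered = sched_order_sjfs(jobs)   # small_early, small_late, big
```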

Page 20

Data Aware Plugin

• The data aware plugin decides the job execution location rather than the dispatch order, so it needs to implement SchedMatch() instead of SchedOrder().
• We implemented a data aware plugin to schedule data-intensive applications on the Gfarm file system.

[Figure: Data aware plugin. Through the CSF plugin APIs, the plugin receives the job list (regular jobs and data-intensive jobs) and the available host list. It uses Gfarm APIs/commands to obtain data file location info (the hosts holding each required data file), maps jobs to hosts in SchedMatch(), and returns job dispatch instructions to the CSF framework.]
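A minimal sketch of the matching step described above, assuming one job per host and a single input file per data-intensive job. The `replica_locations` dictionary is a hypothetical stand-in for what the plug-in would learn from Gfarm; no real Gfarm API is called. Falling back to any free host when no free host holds the data (rather than waiting) is also an assumption.

```python
def sched_match(jobs, available_hosts, replica_locations):
    """Map each job name to an available host, one job per host.

    jobs: dicts with a "name" and, for data-intensive jobs, an "input_file".
    replica_locations: file path -> set of hosts holding a replica
    (hypothetical stand-in for a Gfarm file-location query)."""
    free = list(available_hosts)
    placement = {}
    for job in jobs:
        if not free:
            break                                # remaining jobs wait for the next cycle
        chosen = None
        data_file = job.get("input_file")
        if data_file is not None:
            holders = replica_locations.get(data_file, set())
            chosen = next((h for h in free if h in holders), None)
        if chosen is None:
            chosen = free[0]                     # regular job, or no free replica holder
        free.remove(chosen)
        placement[job["name"]] = chosen
    return placement


placement = sched_match(
    [{"name": "blast-1", "input_file": "/gfarm/db/seq.fa"}, {"name": "post"}],
    ["node1", "node2", "node3"],
    {"/gfarm/db/seq.fa": {"node3"}},
)
# {"blast-1": "node3", "post": "node1"}
```

Moving the computation to the host that already stores the file avoids transferring large inputs across the grid, which is the motivation for a data-aware SchedMatch().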

Page 21

Grid Workflow Plugin

• We implemented a workflow plugin to support workflow jobs
 – XPDL (XML Process Definition Language) is used to describe grid workflow tasks
 – The scheduling algorithm tries to achieve the minimum makespan and the minimum space cost

[Figure: Workflow plugin. The job list holds non-workflow jobs in RSL and workflow jobs in XPDL, whose sub jobs are finished, ready to go, or not ready. The Plan Maker transforms ready-to-go workflow sub jobs into real jobs (RSL), generating each real sub job from its XPDL description, and inserts them into the framework's job list through the scheduler framework APIs, producing an updated job list and schedule instructions.]
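The Plan Maker step can be sketched as a dependency check on the workflow graph. This is a hypothetical illustration: XPDL parsing is not shown (`dependencies` is assumed to come from it), the generated "real job" is a placeholder dict rather than actual RSL, and the diamond-shaped example workflow is invented for the demonstration.

```python
def ready_subjobs(dependencies, finished, released):
    """Sub jobs whose predecessors have all finished and that are not yet released.

    dependencies: sub job -> set of predecessor sub jobs (assumed to come
    from parsing the XPDL description; parsing is not shown)."""
    return sorted(
        sub_job
        for sub_job, preds in dependencies.items()
        if sub_job not in finished
        and sub_job not in released
        and preds <= finished
    )


def plan_maker_step(dependencies, finished, released, job_list):
    """Turn ready sub jobs into 'real' jobs and insert them into the job list."""
    new_jobs = []
    for sub_job in ready_subjobs(dependencies, finished, released):
        real_job = {"name": sub_job, "format": "RSL"}  # placeholder for RSL generation
        job_list.append(real_job)
        released.add(sub_job)
        new_jobs.append(sub_job)
    return new_jobs


# Hypothetical diamond workflow: A -> B, A -> C, (B, C) -> D
deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
finished, released, job_list = set(), set(), []
step1 = plan_maker_step(deps, finished, released, job_list)   # ["A"]
finished.add("A")
step2 = plan_maker_step(deps, finished, released, job_list)   # ["B", "C"]
finished.update({"B", "C"})
step3 = plan_maker_step(deps, finished, released, job_list)   # ["D"]
```

Releasing only ready sub jobs means the framework's ordinary ordering and matching plug-ins never see a job whose inputs do not exist yet.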

Page 22

An Example Workflow

[Figure: A main workflow (Start, MWF0 to MWF7, End) that contains a sub workflow (Start, SWF0, SWF1, End).]

Page 23

Workflow Job description in XPDL

Page 24

Integrate Grid Workflow Scheduling with Data Aware Scheduling

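• The data aware plugin and the workflow plugin can be used in combination to support data-intensive workflow applications

[Figure: Combining the workflow plugin and the data aware plugin in the CSF4 framework. The workflow plugin takes the job list (non-workflow jobs in RSL; workflow jobs in XPDL with ready, non-ready and finished sub jobs) and produces an updated job list of real jobs (RSL). The data aware plugin maps these jobs onto the available hosts from the resource list, using Gfarm APIs for file location info and operations, and the framework performs job dispatch.]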

Page 25

Array Job

• Motivations:
 – In some cases, a user needs to execute many instances (1,000, for example) of the same application to complete a big task, and there is no dependency or communication among the jobs.
  • For example, in life science, AutoDock may be used to dock different ligands to a target protein structure, or BLAST may be used with different input sequences to search for potentially related sequences within a target database.
 – The user has to submit a large number of identical jobs to the meta-scheduler, and submitting them one by one, as below, is time consuming:
  • csf-job-submit sameApplication -i inputData001 -o output001
  • csf-job-submit sameApplication -i inputData002 -o output002
  • …. ….
  • csf-job-submit sameApplication -i inputData1000 -o output1000

Page 26

Array Job

• CSF4 array job features
 – The user needs only one command to submit any number of array jobs, as below (which dramatically reduces job submission time):
  • csf-job-submit sameApplication -A 1-1000 -i input -o output
 – CSF4 will generate 1000 instances of sameApplication in the system, and
  • The nth instance of the job takes "input.n" as its input file name and "output.n" as its output file name.
  • These 1000 instances are not generated immediately after submission, but step by step as resources become available for execution (reducing memory cost).
  • The user can query the status of the array job as a whole, or the status of each individual instance (good job control).
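The naming rule and the two kinds of status query can be sketched as follows. The helper names, the `"start-end"` range parsing, and the state labels (`DONE`, `RUN`) are hypothetical choices for illustration, not CSF4's actual implementation or status vocabulary.

```python
from collections import Counter


def parse_array_range(spec):
    """Parse an inclusive index range such as "1-1000" into (first, last)."""
    first_text, last_text = spec.split("-")
    first, last = int(first_text), int(last_text)
    if first > last:
        raise ValueError(f"bad array range: {spec!r}")
    return first, last


def element_files(n, input_name="input", output_name="output"):
    """File names for the nth element: "input.n" and "output.n"."""
    return f"{input_name}.{n}", f"{output_name}.{n}"


def array_summary(element_states):
    """Whole-array view: count elements per state; element_states maps index -> state."""
    return dict(Counter(element_states.values()))


first, last = parse_array_range("1-1000")       # (1, 1000)
files_7 = element_files(7)                      # ("input.7", "output.7")
states = {1: "DONE", 2: "RUN", 3: "DONE"}
summary = array_summary(states)                 # {"DONE": 2, "RUN": 1}
```

A per-instance query is then just a lookup such as `states[2]`, while the whole-array query aggregates the same per-element records.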

Page 27

Array Job Plug-in

[Figure: Array job plug-in. Array job 1-1000 is submitted to CSF4 (Job [1…1000]; total: 1000; finished: 1-50; running: 50-100; next: 101). The job list holds normal jobs in RSL and array jobs. The plug-in generates the sub jobs (array elements, in RSL) of the array according to the available resources, e.g. elements 101, 102, … 150, and inserts them into the framework's job list, producing an updated job list.]
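The step-by-step generation shown in the figure can be sketched as an object that remembers the next index and creates only as many elements as there are free slots. The `ArrayJob` class, the element dict layout, and the `-i`/`-o` argument list are hypothetical; the example reproduces the figure's state, where element 101 is next and 50 slots are assumed to be free.

```python
class ArrayJob:
    """Hypothetical array job that creates its elements lazily."""

    def __init__(self, app, first, last):
        self.app = app
        self.next_index = first      # next element not yet generated
        self.last = last

    def remaining(self):
        return max(0, self.last - self.next_index + 1)

    def release(self, free_slots):
        """Generate at most free_slots new elements, in index order."""
        count = max(0, min(free_slots, self.remaining()))
        elements = []
        for n in range(self.next_index, self.next_index + count):
            elements.append({
                "name": f"{self.app}[{n}]",
                "args": ["-i", f"input.{n}", "-o", f"output.{n}"],
            })
        self.next_index += count
        return elements


# State from the figure: elements up to 100 already generated, next is 101.
array = ArrayJob("sameApplication", 1, 1000)
array.next_index = 101
batch = array.release(50)    # elements 101 ... 150
```

Only the counter and the few released elements live in memory, which is what keeps a 1000-element array cheap to hold in the queue.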

Page 28

VJM – Resource Co-allocation

• Co-allocation challenges
 – Some applications' resource requirements cannot be met by a single domain, so resource co-allocation is very important, especially for large-scale parallel jobs.
 – Co-allocation is time consuming and prone to failure (timeouts).
  • The resources in a grid are owned by different domains, and each domain has its own scheduling policy and dynamic resource availability, so resource availability is not guaranteed.
  • A number of proposed co-allocation protocols, such as DUROC (MPICH-G2), are based on two-phase commit. However, the DUROC implementation in MPICH-G2 mixes the resource reservation stage with the job execution stage (MPI_INIT()).
  • Resource advance reservation has been proposed to guarantee resource availability in local domains.

Page 29

VJM – Resource Co-allocation

• Problems with resource advance reservation
 – Not all local schedulers support resource reservation.
 – The feature requires the end user to specify the duration of the reservation, which is infeasible in some cases.
  • Users usually have little knowledge of the availability of grid resources, so it is hard for them to choose a good begin time. In [10], the begin time of a reservation was set to a random number between 0 and 2 hours, which is not reasonable.
  • It is also hard to choose a good end time for the reservation. When users do not know the runtime of their applications (which is often the case), they have to set an upper limit to ensure the job's completion. This aggravates competition and conflicts in resource allocation.

Page 30

VJM – Resource Co-allocation

• VJM model
 – VJM separates the resource co-allocation phase from the job execution phase.
 – In the resource co-allocation phase, VJM sends virtual jobs (VJobs), instead of the real parallel jobs, to grid sites via the GRAM protocol.
 – A virtual job has the same resource requirements as its corresponding real job, but no execution binaries.
 – When a virtual job starts up, it reports back to the VJC (Virtual Job Center) that the resources for its sub job have been reserved.
 – Once all the virtual jobs have registered successfully (co-allocation succeeded), the VJC dispatches the real jobs to their corresponding virtual jobs to start.
 – With VJM, the user does not need to specify the duration of the resource reservation. VJM automatically reserves the earliest available resources for the real jobs in a dynamic grid environment.
  • Based on queuing theory, VJM evaluates the overall capability that a local resource domain can provide from its history data, such as the average job waiting time in the local queue and the average job execution time.
  • Based on this evaluation, VJM decides which clusters should be preferred for a parallel job and how to distribute the VJobs among them.
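The all-or-nothing release of real jobs can be sketched as a small registration tracker, shown below together with a cluster-preference helper. Both are hypothetical illustrations: `VirtualJobCenter`, `distribute_vjobs`, the `(avg_wait, capacity)` tuples and the greedy "lowest average wait first" rule are assumptions, and the greedy rule is a deliberately simplified stand-in for VJM's queuing-theory evaluation, not that model itself.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualJobCenter:
    """Hypothetical VJC for one parallel job: real sub jobs are dispatched
    only after every virtual job has started and reported back."""
    expected: set
    registered: set = field(default_factory=set)
    dispatched: bool = False

    def register(self, vjob_id):
        """Record a started VJob; return the VJobs to launch real jobs into, if complete."""
        if vjob_id not in self.expected:
            raise KeyError(f"unknown virtual job {vjob_id}")
        self.registered.add(vjob_id)
        if not self.dispatched and self.registered == self.expected:
            self.dispatched = True
            return sorted(self.expected)   # co-allocation succeeded: dispatch now
        return []                          # still waiting for other VJobs


def distribute_vjobs(n_procs, clusters):
    """Greedy split of n_procs VJobs, preferring clusters with the lowest
    historical average wait (simplified, assumed heuristic).
    clusters: name -> (avg_wait_seconds, capacity)."""
    plan, remaining = {}, n_procs
    for name, (avg_wait, capacity) in sorted(clusters.items(), key=lambda kv: kv[1][0]):
        if remaining == 0:
            break
        take = min(capacity, remaining)
        if take > 0:
            plan[name] = take
            remaining -= take
    if remaining:
        raise ValueError("not enough capacity for co-allocation")
    return plan


vjc = VirtualJobCenter({"v1", "v2", "v3"})
first_two = [vjc.register("v1"), vjc.register("v2")]   # [[], []]
launch = vjc.register("v3")                            # ["v1", "v2", "v3"]
plan = distribute_vjobs(10, {"pbs": (300, 4), "sge": (60, 4), "lsf": (120, 8)})
# {"sge": 4, "lsf": 6}
```

Because nothing real runs until the last VJob reports in, a partial co-allocation never starts half of a parallel job, which is the property the VJM model relies on.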

Page 31

[Figure: VJM architecture. Parallel jobs A, B and C enter the resource request queue of the meta-scheduler (CSF4). The VJob manager and the VJC (Virtual Job Center) maintain a VJob pool and submit virtual jobs to the local queues of PBS, SGE and LSF, which hold PBS, SGE and LSF resources. Legend: VJ, a virtual job that has not obtained resources; R, a virtual job that has obtained resources (and notified the VJC); RJ, a real job launched by a virtual job.]

Page 32

VJM – Resource Co-allocation

• In effect, the set of virtual jobs corresponding to an application dynamically constructs a cross-domain virtual execution cluster dedicated to running that application.
• It is a best-effort style of resource co-allocation.
• It is more suitable when the user knows neither the resource availability nor the application's runtime. If the user has enough knowledge, he/she can use resource advance reservation.

Page 33

How to use CSF4

• Use the CSF4 front end to perform global job scheduling in your grid. You can submit your jobs to CSF4 via the command line or the CSF4 Portal.

Page 34

CSF4 Portal

Page 35

How to use CSF4

• Provide back-end meta-scheduling for your grid environment behind your own web portal (like My WorkSphere by NBCR)

Page 36

CSF4 APIs

• You need to do some integration work in this case.

Page 37

How to deploy the scheduling policies

• Configure multiple queues in CSF4, each with different scheduling policies (plug-ins). Then submit jobs to the proper queue according to their scheduling requirements.
• Combine multiple CSF4 plug-ins to provide more advanced meta-scheduling for a queue
 – For example, combine the workflow plug-in with the data aware plug-in
• Develop your own meta-scheduling policies using the CSF4 plug-in APIs (for advanced users)

Page 38

Current Status and Future Plan

• We are wrapping up the new features.
• We are going to provide a complete user manual and developer guide very soon (a current weakness).
• We hope more users will use CSF4 and give us feedback.
• We will continue working on the plug-in mechanism. We hope more and more users can develop their own scheduling policies via the CSF4 plug-in APIs (one of our major objectives).
• We will continue working on the VJM mechanism. We plan to make VJM a separate middleware that provides a resource co-allocation service in a grid.
• We are porting CSF4 to GT4.2 (almost finished).

Page 39

谢谢!Thanks!