CSF4 Meta-Scheduler Tutorial

1st PRAGMA Institute

Zhaohui Ding

zhding@ucsd.edu or zhaohui.ding@email.jlu.edu.cn

College of Computer Science & Technology, Jilin University

National Biomedical Computing Resource, University of California, San Diego


Agenda

· Meta-scheduler & CSF4 Introduction
· CSF4 Architecture
· CSF4 Functionalities
· Future Work
· Demo and Practice


What is Meta-Scheduler

Resource Allocation & Management
· Heterogeneous
· Distributed
· Dynamic

Local Scheduler VS Meta-scheduler


Local Scheduler VS Meta-Scheduler

| | Local Scheduler | Meta-Scheduler |
|---|---|---|
| Administrative scope | Cluster, single domain | Grid, multiple domains, virtual organizations |
| Hardware & software (OS) | Homogeneous | Heterogeneous, OS-independent |
| Data management | LAN file system (NFS, FTP, scp) | Global file system (GridFTP, Gfarm) |
| Security | OS user/passwd, NIS, ssh public key | Grid Security Infrastructure (GSI) |
| Resource management protocol | Specific, private protocols for each local scheduler | Standard, open, general-purpose protocols (GRAM) |
| Scheduling mode | Centralized | Centralized / Distributed |


Meta-Scheduler VS Local Scheduler

Local Scheduler
· LSF (Load Sharing Facility)
· PBS (Portable Batch System)
· SGE (Sun Grid Engine)
· Condor
· IBM LoadLeveler

Meta-Scheduler
· CSF
· Maui (Silver)
· GridWay
· Nimrod-G
· Condor-G


CSF4

What is the CSF Meta-Scheduler?
· Full name: Community Scheduler Framework
· CSF4 contains a group of grid services hosted in GT4
· CSF4 is a fully WSRF-compliant meta-scheduler
· Open source project, available at http://sourceforge.net/projects/gcsf
· Developed by the Lab. of Distributed Computing and System Architecture, Jilin University, China
· CSF4 has been added to Globus Toolkit 4 as an Execution Component


CSF4 in Globus Toolkit 4


A typical deployment

[Figure: one CSF4 Meta-Scheduler in front of several grid sites, each with its own Globus version and local scheduler: GT2 with LSF, GT2 with PBS, GT4 with SGE, GT2 with Condor, ...]


What CSF4 Can Do?

Basic Functionalities
· Submit jobs to the Grid without specifying a cluster
· Monitor and control jobs
· Provide a queuing service
· Schedule jobs and resources by custom-built policies
· CSF4 Portlet (a Web-browser-based user interface)


What CSF4 Can Do? (cont.)

Advanced Functionalities
· Multiple-domain resource information sharing
· Multiple-scale resource scheduling policies
· Automatic user credentials delegation
· Automatic data staging
· Extensible scheduling framework
· Support for grid parallel jobs (MPI & MPICH-G2)


CSF4 – Architecture

[Figure: CSF4 architecture diagram. Labels recoverable from the diagram:
· CSF4 Services: Queuing Service, Job Service, Reservation Service, Resource Manager Factory Service, Resource Manager Gram Service, Resource Manager LSF Service
· WS-MDS (Meta Information)
· Grid Environment: WS-GRAM with adapters GramPBS, GramSGE, GramCondor, GramFork, GramLSF
· GT2 Environment: GateKeeper with adapters GramPBS, GramSGE, GramCondor, GramFork
· gabd
· Local machines running PBS, SGE, Condor, LSF
· Legend: adapter, local scheduler]


CSF4 – Architecture User view


Local Scheduler And Infrastructure Supported by CSF4

Local Schedulers Supported
· LSF
· PBS
· SGE
· Condor

Infrastructure Supported
· Globus Toolkit 4
· Globus Toolkit 2


CSF4 – Functionalities Scheduling Plug-in Framework

· Designed for the Queuing Service
· Provides a set of policies
· Customizable
· Extensible


Existing Scheduling Policies

· FCFS (First Come First Served), round-robin: the default policy
· Throttle: restricts the number of jobs in a scheduling cycle
· Array Job plug-in: designed for life science applications (such as AutoDock, BLAST)
· MPICH-G2 plug-in (under development): guarantees that the synchronized resource allocation can succeed
· Data-intensive applications plug-in (under development)
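The FCFS round-robin and throttle policies can be illustrated with a small model. This is only a sketch, not CSF4's implementation: the function name `schedule_cycle`, the `throttle` parameter, and the job and cluster names are all assumptions made for the example. It dispatches the oldest pending jobs first, hands them to clusters in round-robin order, and stops once the per-cycle limit is reached.

```python
from collections import deque
from itertools import cycle


def schedule_cycle(pending, clusters, throttle):
    """Run one scheduling cycle.

    pending  -- deque of job IDs in arrival order; dispatched jobs are removed
    clusters -- iterator that yields cluster names endlessly (round-robin)
    throttle -- maximum number of jobs to dispatch in this cycle
    Returns a list of (job_id, cluster) pairs.
    """
    dispatched = []
    while pending and len(dispatched) < throttle:
        job = pending.popleft()  # FCFS: oldest job first
        dispatched.append((job, next(clusters)))
    return dispatched


# Hypothetical job IDs and cluster names, for illustration only.
pending = deque(["job1", "job2", "job3", "job4", "job5"])
clusters = cycle(["clusterA", "clusterB"])

first = schedule_cycle(pending, clusters, throttle=3)
second = schedule_cycle(pending, clusters, throttle=3)
print(first)
print(second)
```

With a throttle of 3, the five jobs need two cycles; the round-robin position carries over from one cycle to the next because the same `clusters` iterator is reused.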


Schedule plug-in & scheduling policies

Each policy is implemented inside a scheduling plugin module

A queue can load multiple plugin modules
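One way to picture a queue that loads several plugin modules is as a chain of filters over the pending jobs. The sketch below is a hypothetical model, not the CSF4 plug-in API: the `SchedulingPlugin` protocol, its `select` method, and the `Queue.load` and `Queue.next_batch` names are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Job:
    job_id: str
    submit_time: int


class SchedulingPlugin(Protocol):
    """Hypothetical plug-in interface: one policy per module."""

    def select(self, jobs: List[Job]) -> List[Job]: ...


class FcfsPlugin:
    """Order jobs by submission time."""

    def select(self, jobs: List[Job]) -> List[Job]:
        return sorted(jobs, key=lambda j: j.submit_time)


class ThrottlePlugin:
    """Keep at most max_jobs jobs per scheduling cycle."""

    def __init__(self, max_jobs: int) -> None:
        self.max_jobs = max_jobs

    def select(self, jobs: List[Job]) -> List[Job]:
        return jobs[: self.max_jobs]


@dataclass
class Queue:
    name: str
    plugins: List[SchedulingPlugin] = field(default_factory=list)
    jobs: List[Job] = field(default_factory=list)

    def load(self, plugin: SchedulingPlugin) -> None:
        self.plugins.append(plugin)

    def next_batch(self) -> List[Job]:
        # Apply each loaded policy in the order it was loaded.
        candidates = list(self.jobs)
        for plugin in self.plugins:
            candidates = plugin.select(candidates)
        return candidates


queue = Queue("normal")
queue.load(FcfsPlugin())
queue.load(ThrottlePlugin(max_jobs=2))
queue.jobs = [Job("c", 3), Job("a", 1), Job("b", 2)]
batch_ids = [job.job_id for job in queue.next_batch()]
print(batch_ids)
```

In this model the load order matters: ordering by submission time before throttling keeps the two oldest jobs, whereas the reverse order would throttle first and sort afterwards.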


Resource Information Sharing

An MDS information provider for CSF4

Multiple CSF4 instances can share resource information


CSF4 – Functionalities (cont.)

Deploy Multiple CSF4 in a Grid Community


Array Job

AutoDock and BLAST-like applications
· A large number of sub-jobs
· Execute the same binary
· Different input/output files


Array Job (cont.)

Advantages
· Submit the job only once
· Save submission time and memory storage

[Figure: the user submits one array job (Executable: autogrid4, Input: hsg.gpf, Output: hsg.glg, Array Size: 100) to the CSF4 Meta-Scheduler, which splits it into sub-jobs that all run autogrid4: Input hsg.gpf.1 / Output hsg.glg.1, Input hsg.gpf.2 / Output hsg.glg.2, ..., Input hsg.gpf.100 / Output hsg.glg.100.]
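The split step can be sketched in a few lines of Python. The executable, file names, and array size come from the slide; the `SubJob` type and the `split_array_job` function are hypothetical names for this example and are not part of CSF4.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SubJob:
    executable: str
    input_file: str
    output_file: str


def split_array_job(executable: str, input_file: str,
                    output_file: str, array_size: int) -> List[SubJob]:
    """Expand one array job into array_size sub-jobs.

    Sub-job i (counting from 1) runs the same executable on
    "<input_file>.<i>" and writes "<output_file>.<i>".
    """
    if array_size < 1:
        raise ValueError("array_size must be at least 1")
    return [
        SubJob(executable, f"{input_file}.{i}", f"{output_file}.{i}")
        for i in range(1, array_size + 1)
    ]


sub_jobs = split_array_job("autogrid4", "hsg.gpf", "hsg.glg", 100)
print(len(sub_jobs))
print(sub_jobs[0])
print(sub_jobs[-1])
```

The user describes the job once; the index suffix is the only thing that differs between sub-jobs.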


Data Staging

Manual Data Staging
· Which clusters can I use?
· Which clusters will my jobs run on?
· Where is the output data?
· When will the job finish, so that I can stage out the output data?


Manual Data Staging

Without Meta-Scheduler

[Figure: the user faces several clusters directly and must manually stage in the input data, submit the job, and manually stage out the output data.]


Automatic Data Staging

With CSF4 Automatic Data Staging

[Figure: the user submits a job to the CSF4 Meta-Scheduler, which submits it to one of several clusters; input data and output data are transferred via GridFTP.]
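The stage-in, run, stage-out sequence that the meta-scheduler automates can be mimicked locally. In the sketch below, temporary directories stand in for the user's host and the selected cluster, and `shutil.copy` stands in for a GridFTP transfer. The function names `run_with_staging` and `upper_case_job` are hypothetical and no CSF4 or Globus API is used.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_with_staging(input_path: Path, cluster_dir: Path, output_dir: Path,
                     job: Callable[[Path], Path]) -> Path:
    """Stage input in, run the job, stage output out; return the output path."""
    staged_in = cluster_dir / input_path.name
    shutil.copy(input_path, staged_in)    # stage-in (GridFTP on a real grid)
    produced = job(staged_in)             # job runs on the selected cluster
    staged_out = output_dir / produced.name
    shutil.copy(produced, staged_out)     # stage-out back to the user
    return staged_out


def upper_case_job(staged_input: Path) -> Path:
    """Toy job: write an upper-cased copy of the input next to it."""
    out = staged_input.with_suffix(".out")
    out.write_text(staged_input.read_text().upper())
    return out


with tempfile.TemporaryDirectory() as tmp:
    user_dir = Path(tmp, "user")
    cluster_dir = Path(tmp, "cluster")
    user_dir.mkdir()
    cluster_dir.mkdir()

    input_path = user_dir / "data.txt"
    input_path.write_text("hello grid")

    result = run_with_staging(input_path, cluster_dir, user_dir, upper_case_job)
    result_name = result.name
    result_text = result.read_text()
    print(result_name, result_text)
```

The caller never names the cluster directory in the job itself, which is the point of the automatic mode: the user supplies input and receives output without tracking where the job ran.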


Integrate CSF4 with Gfarm

With CSF4 Automatic Data-Staging and Gfarm

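[Figure: the user submits a job to the CSF4 Meta-Scheduler, which submits it onward; input data and output data are handled through GridFTP and Gfarm. Diagram labels: User, Submit Job, GridFTP, CSF4 Meta-Scheduler, Gfarm, Input Data, Output Data, Create.]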


Application Based Scheduling

[Figure: three clusters, each listing its installed applications and CPU count (labels: Autodock, NAMD, BLAST, Other Resource; 12 CPUs, 64 CPUs, 8 CPUs). Inside the CSF4 Meta-Scheduler, scheduling modules work from the available resource lists and the users' resource requirements (labels: Autodock 1990 CPUs, NAMD 24 CPUs, Autodock 100 CPUs, Autodock 1 CPU, ...).]
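Matching a job's application and CPU requirement against what each cluster offers might look like the sketch below. The cluster names, the pairing of applications with CPU counts, and the "most free CPUs wins" tie-break are assumptions made for the example; the slide does not state how CSF4 ranks candidate clusters.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Cluster:
    name: str
    applications: FrozenSet[str]
    free_cpus: int


def select_cluster(clusters: List[Cluster], application: str,
                   cpus: int) -> Optional[Cluster]:
    """Pick a cluster that has the application installed and enough free CPUs.

    Among the candidates, the one with the most free CPUs is chosen
    (an assumed tie-break). Returns None if no cluster fits.
    """
    candidates = [
        c for c in clusters
        if application in c.applications and c.free_cpus >= cpus
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.free_cpus)


# Hypothetical resource list, loosely based on the labels in the figure.
clusters = [
    Cluster("cluster1", frozenset({"Autodock", "NAMD"}), 12),
    Cluster("cluster2", frozenset({"Autodock", "BLAST"}), 64),
    Cluster("cluster3", frozenset({"NAMD", "Autodock"}), 8),
]

for app, cpus in [("NAMD", 8), ("BLAST", 32), ("Autodock", 100)]:
    chosen = select_cluster(clusters, app, cpus)
    print(app, cpus, "->", chosen.name if chosen else "no cluster fits")
```

A request that no single cluster can satisfy, such as 100 CPUs here, comes back as `None`; a real meta-scheduler would have to queue or split such a job.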


CSF4 User Interface

CSF4 Portal


CSF4 User Interface

CSF4 Command Line


Work Under Development

[Figure: CSF4 with a Virtual Resource Pool and a VJMgr component, with steps numbered 1 to 5. Virtual jobs (vj) are paired with real jobs (rj) on clusters managed by SGE, PBS, and LSF, some of which contain busy hosts. Legend: vj: virtual job, rj: real job.]

· Cluster Selection
· Resource Pre-Check
· Resource Re-assign


Demo & Practice

https://www.nbcr.net/pub/wiki/index.php?title=CSF4_Tutorial_PRAGMA13


Thank you

감사합니다

ありがとうございます

謝謝

谢谢