Practical Mechanisms for Managing Parallel and Interactive Jobs on Grid Environments

Enol Fernández, UAB

Page 2

INGRID 2008, 9th April 2008

Outline:
- Introduction
- CrossBroker
- Glide In
- Parallel Job Support
- Interactive Job Support
- Conclusions

Page 3

Batch execution on Grids

[Figure: remote sites and Grid services connected over the Internet through middleware; a job with files F1, F2 is executed remotely and produces outputs O1, O2.]

Page 4

Parallel & Interactive Job Execution

[Figure: remote sites and Grid services connected over the Internet through middleware; the job (files F1, F2) now spans several sites, with MPI communication between them and I/O forwarding.]

- Use of resources from different sites
- Resource-set search
- Co-allocation and synchronization
- Fast start-up
- Execution in high-occupancy situations

Page 5

CrossBroker

CrossBroker performs automatic scheduling in Grid environments:
- Resource discovery
- Resource selection
- Job execution

It handles jobs not treated by gLite:
- Parallel jobs (MPI): run on more than one resource, in a coordinated fashion.
- Interactive jobs: the user interacts with the application during its execution.

Page 6

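CrossBroker

[Figure: CrossBroker architecture. Components: Scheduling Agent, Resource Searcher, Application Launcher, Condor-G DAGMan. Related services: Migrating Desktop, Information Index, Replica Manager. Jobs reach each site through EGEE/Globus, where the Computing Element (CE) hands them to its LRMS, which runs them on worker nodes (WN).]

Issues at the sites:
- Outdated information and dynamic changes
- LRMS (PBS, LSF, Condor) offer only limited external control
- Non-cooperative LRMS
- Local user jobs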

Page 7

Glide In

The idea: each batch job is encapsulated in an agent that takes control of the WN independently of its LRMS.

Lightweight virtual machines:
- Each worker node is divided into 2 VMs.
- Each VM can execute jobs independently (e.g. batch and interactive).
- Fast start-up of jobs (no need to go through Globus + LRMS).
- NOT a full virtual machine (Xen, VMware, …).
- NO need for special privileges on the WN.
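The agent idea can be illustrated with a minimal Python sketch. The `GlideInAgent` class and its methods are hypothetical names, not CrossBroker code: the LRMS starts the agent once, the original batch job runs in VM1, and a later job (e.g. an interactive one) is placed directly in VM2 without another pass through Globus and the LRMS. Only the slot bookkeeping is modelled; no processes are controlled, which is in the spirit of "not a full virtual machine".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GlideInAgent:
    """Hypothetical model of a Glide In agent holding one worker node (WN).

    The LRMS starts the agent as an ordinary batch job; the agent then splits
    the WN into two lightweight "VMs" (plain job slots, not Xen/VMware guests).
    """
    wn: str
    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"VM1": None, "VM2": None})

    def start(self, batch_job: str) -> None:
        # The batch job that carried the agent through the LRMS runs in VM1.
        self.slots["VM1"] = batch_job

    def free_slot(self) -> Optional[str]:
        # Return the first idle VM, if any.
        for vm, job in self.slots.items():
            if job is None:
                return vm
        return None

    def submit_direct(self, job: str) -> str:
        # Later jobs bypass Globus + LRMS: the broker hands them straight
        # to the agent, which places them in an idle VM.
        vm = self.free_slot()
        if vm is None:
            raise RuntimeError(f"no free VM on {self.wn}")
        self.slots[vm] = job
        return vm

    def finish(self, vm: str) -> None:
        # Release a VM so it becomes available for other jobs.
        self.slots[vm] = None


agent = GlideInAgent("wn01")
agent.start("batch-job")
print(agent.submit_direct("interactive-job"))  # VM2
```

Keeping the second slot free after start-up is what makes it "available for other jobs" in the following slides.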

Page 8

Glide In

[Figure: the CrossBroker (Scheduling Agent, Condor-G, Application Launcher) submits a batch job to the LRMS of a Grid resource.]

Page 9

Glide In

[Figure: on the Grid resource, the batch job is wrapped by the Glide In agent, which divides the worker node into VM1 and VM2.]

Page 10

Glide In

[Figure: the agent on the Grid resource, with VM1, VM2 and the batch job.]

Page 11

Glide In

[Figure: the batch job runs in one VM; the other VM is available for other jobs.]

Page 12

Parallel Job Support

Supported parallel job types:
- Open MPI
- PACX-MPI
- MPICH-P4
- MPICH-G2
- Plain (just the machines)

CrossBroker takes each site's capabilities into account. Low-level details of the MPI implementations and of the sites are handled by starter scripts; mpi-start is configured automatically and used by default.

Page 13

Parallel Job Support

Changes in JDL:
- JOBTYPE:
  - Normal: sequential jobs, just one CPU
  - Parallel: more than one CPU
- SUBJOBTYPE: openmpi, pacx-mpi, mpich, mpich-g2, Plain
  - Plain allows easy extension for supporting new parallel job types
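The JobType/SubJobType rules above can be expressed as a small consistency check. This Python sketch is hypothetical (`check_job_type` is not a CrossBroker function), and the defaults it applies when an attribute is missing (JobType "Normal", NodeNumber 1) are assumptions; it only encodes what the slide states: Normal means one CPU, Parallel means more than one CPU plus one of the listed subjob types.

```python
from typing import Dict, List, Union

# Subjob types listed on the slide, lower-cased for comparison.
SUBJOB_TYPES = {"openmpi", "pacx-mpi", "mpich", "mpich-g2", "plain"}

JdlValue = Union[str, int]


def check_job_type(jdl: Dict[str, JdlValue]) -> List[str]:
    """Return the problems found in the JobType/SubJobType/NodeNumber combination."""
    # JDL attribute names are case-insensitive, so normalise the keys.
    attrs = {k.lower(): v for k, v in jdl.items()}
    job_type = str(attrs.get("jobtype", "Normal")).lower()  # assumed default
    nodes = int(attrs.get("nodenumber", 1))                  # assumed default
    errors: List[str] = []
    if job_type == "normal":
        if nodes != 1:
            errors.append("Normal jobs are sequential and use just one CPU")
    elif job_type == "parallel":
        if nodes < 2:
            errors.append("Parallel jobs need more than one CPU")
        sub = str(attrs.get("subjobtype", "")).lower()
        if sub not in SUBJOB_TYPES:
            errors.append(f"unknown SubJobType {sub!r}")
    else:
        errors.append(f"unknown JobType {job_type!r}")
    return errors


print(check_job_type({"JobType": "Parallel", "SubJobType": "pacx-mpi", "NodeNumber": 5}))  # []
```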

Page 14

Parallel Job Support

Type = "Job";
VirtualOrganisation = "imain";
JobType = "Parallel";
SubJobType = "pacx-mpi";
NodeNumber = 5;
Executable = "test-app";
Arguments = "-v";
InputSandbox = {"test-app", "inputfile"};
OutputSandbox = {"std.out", "std.err"};
StdError = "std.err";
StdOutput = "std.out";
Rank = other.GlueHostBenchmarkSI00;
Requirements = other.GlueCEStateStatus == "Production";
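The Requirements and Rank attributes drive matchmaking: CEs that fail Requirements are discarded, and the rest are ordered by Rank, higher being better. The Python sketch below mimics that with plain dictionaries standing in for the CEs' published ClassAds; the `match` helper and the CE names are hypothetical, and real ClassAd evaluation (undefined attributes, three-valued logic) is not modelled.

```python
from typing import Any, Callable, Dict, List

Ad = Dict[str, Any]  # attribute name -> value, a stand-in for a CE's ClassAd


def match(ces: List[Ad], requirements: Callable[[Ad], bool],
          rank: Callable[[Ad], float]) -> List[Ad]:
    """Keep the CEs that satisfy Requirements, best Rank first."""
    # Higher Rank is better, so sort in descending order.
    return sorted((ce for ce in ces if requirements(ce)), key=rank, reverse=True)


def requirements(other: Ad) -> bool:
    # Requirements = other.GlueCEStateStatus == "Production";
    return other["GlueCEStateStatus"] == "Production"


def rank(other: Ad) -> float:
    # Rank = other.GlueHostBenchmarkSI00;
    return float(other["GlueHostBenchmarkSI00"])


ces = [
    {"name": "ce-a", "GlueCEStateStatus": "Production", "GlueHostBenchmarkSI00": 1000},
    {"name": "ce-b", "GlueCEStateStatus": "Draining", "GlueHostBenchmarkSI00": 4000},
    {"name": "ce-c", "GlueCEStateStatus": "Production", "GlueHostBenchmarkSI00": 2000},
]
print([ce["name"] for ce in match(ces, requirements, rank)])  # ['ce-c', 'ce-a']
```

ce-b has the best benchmark but is dropped because it is not in Production, which is why Requirements is applied before Rank.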

Page 15

Parallel Job Support

[Figure: CrossBroker matching a parallel job against five CEs, each marked as MPI-enabled or non-MPI-enabled:
CE1 = zeus.cyf-kr.edu.pl: FreeCPUs = 2, Disk = 100, AverageSI = 2000
CE2 = aocegrid.uab.es: FreeCPUs = 10, Disk = 100, AverageSI = 4000
CE3 = bee001.ific.uv.es: FreeCPUs = 3, Disk = 100, AverageSI = 1000
CE4 = xgrid.icm.edu.pl: FreeCPUs = 6, Disk = 100, AverageSI = 1000
CE5 = lngrid02.lip.pt: FreeCPUs = 2, Disk = 100, AverageSI = 1000]

Resulting resource sets:

[Groups with 1 CEs]
  [Rank=2000] aocegrid.uab.es:2119/jobmanager-pbs-workq freeCPUs = 10

[Groups with 2 CEs]
  [Rank=1500] zeus.cyf-kr.edu.pl:2119/jobmanager-pbs-workq freeCPUs = 2
              bee001.ific.uv.es:2119/jobmanager-pbs-workq freeCPUs = 3
  [Rank=1000] bee001.ific.uv.es:2119/jobmanager-pbs-workq freeCPUs = 3
              lngrid02.lip.pt:2129/jobmanager-pbs-workq freeCPUs = 2
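The resource-set search can be sketched as enumerating groups of MPI-enabled CEs, smallest first, whose free CPUs cover the requested nodes, while skipping any group that contains an already sufficient smaller one. This Python sketch is a plausible reading, not CrossBroker's actual algorithm. Three assumptions are made: the job asks for 5 nodes (the NodeNumber of the previous JDL example), xgrid.icm.edu.pl is the non-MPI-enabled CE (consistent with it appearing in no group), and groups are limited to 2 CEs. The Rank values are left out. Under these assumptions it reproduces the three groups listed above.

```python
from itertools import combinations
from typing import Any, Dict, List, Tuple

# CE data from the figure; the "mpi" flags are an assumption (see lead-in).
CES: Dict[str, Dict[str, Any]] = {
    "zeus.cyf-kr.edu.pl": {"free": 2, "mpi": True},
    "aocegrid.uab.es": {"free": 10, "mpi": True},
    "bee001.ific.uv.es": {"free": 3, "mpi": True},
    "xgrid.icm.edu.pl": {"free": 6, "mpi": False},
    "lngrid02.lip.pt": {"free": 2, "mpi": True},
}


def resource_sets(ces: Dict[str, Dict[str, Any]], nodes: int,
                  max_size: int = 2) -> List[Tuple[str, ...]]:
    """Minimal groups of MPI-enabled CEs whose free CPUs add up to `nodes`."""
    usable = [name for name, ce in ces.items() if ce["mpi"]]
    found: List[Tuple[str, ...]] = []
    for size in range(1, max_size + 1):
        for group in combinations(usable, size):
            # Skip supersets of an already-found group: that group suffices alone.
            if any(set(g) <= set(group) for g in found):
                continue
            if sum(ces[name]["free"] for name in group) >= nodes:
                found.append(group)
    return found


for group in resource_sets(CES, nodes=5):
    print(len(group), group)
```

zeus.cyf-kr.edu.pl plus lngrid02.lip.pt is not proposed because together they offer only 4 free CPUs.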

Page 16

Parallel Job Support

[Figure: CrossBroker launching a job across CE3 = bee001.ific.uv.es (FreeCPUs = 3, Disk = 100, AverageSI = 1000) and CE5 = lngrid02.lip.pt (FreeCPUs = 2, Disk = 100, AverageSI = 1000), with a startup server and one MPI subtask per CE.]

1. Launch a PACX-MPI startup server.
2. Submit the MPI subtasks.
3. MPI-START starts each of the subtasks.
4. Each subtask notifies the startup server and starts running.
5. CrossBroker monitors the application.
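Steps 3 and 4 amount to a rendezvous: each subtask checks in with the startup server, and the application proceeds only once every expected subtask has done so. The Python sketch below models this with threads and a `threading.Condition`; the `StartupServer` class and the timeout are hypothetical, and PACX-MPI's real startup protocol (sockets, process ranks) is not reproduced.

```python
import threading
from typing import Dict, List


class StartupServer:
    """Hypothetical stand-in for the PACX-MPI startup server.

    Each subtask registers when MPI-START launches it; no subtask proceeds
    until every expected subtask has checked in.
    """

    def __init__(self, expected: int) -> None:
        self.expected = expected
        self.registered: List[str] = []
        self._cond = threading.Condition()

    def notify(self, subtask: str, timeout: float = 5.0) -> bool:
        with self._cond:
            self.registered.append(subtask)
            self._cond.notify_all()
            # Block until all subtasks have registered; False if the timeout expires.
            return self._cond.wait_for(
                lambda: len(self.registered) >= self.expected, timeout)


server = StartupServer(expected=2)
results: Dict[str, bool] = {}


def subtask(name: str) -> None:
    # One MPI subtask per CE, as in the figure.
    results[name] = server.notify(name)


threads = [threading.Thread(target=subtask, args=(n,)) for n in ("bee001", "lngrid02")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # both subtasks report True once the second has registered
```

The timeout matters on a Grid: if one subtask never starts, the others give up instead of holding their CPUs indefinitely.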

Page 17

Parallel Job Support

CrossBroker searches for and selects sets of resources for the jobs.

There is no guarantee that all tasks of the same job will start at the same time.

- 1st choice: select only sites with free resources, so the job runs immediately. Unfortunately, free resources are not always available.
- 2nd choice: allocate a resource temporarily and wait until all the other tasks show up, time-sharing the resource with a backfilling policy to avoid idleness.
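The two choices can be written as a simple selection policy. In this Python sketch, `choose` and the CE names are hypothetical, candidate groups are assumed to arrive already ordered best-first (e.g. by Rank), and `free_now` is a snapshot of free CPUs per CE. Because that snapshot may be outdated, even the first choice cannot guarantee simultaneous start, as noted above.

```python
from typing import Dict, List, Tuple

# Each candidate group maps CE name -> CPUs the job needs there.
Group = Dict[str, int]


def choose(groups: List[Group], free_now: Dict[str, int]) -> Tuple[Group, str]:
    """Pick a resource set following the two choices on the slide."""
    if not groups:
        raise ValueError("no candidate resource sets")
    # 1st choice: a set whose CEs all have the required CPUs free right now.
    for group in groups:
        if all(free_now.get(ce, 0) >= cpus for ce, cpus in group.items()):
            return group, "run immediately"
    # 2nd choice: hold the best set's resources temporarily until every task
    # shows up, time-sharing them with backfilled jobs in the meantime.
    return groups[0], "allocate, wait for all tasks, backfill meanwhile"


groups = [{"zeus": 2, "bee001": 3}, {"bee001": 3, "lngrid02": 2}]
print(choose(groups, {"zeus": 2, "bee001": 3, "lngrid02": 0})[1])  # run immediately
print(choose(groups, {"zeus": 1, "bee001": 3, "lngrid02": 1})[1])  # allocate, wait for all tasks, backfill meanwhile
```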

Page 18

Glide In for co-allocation

[Figure: the CrossBroker (Scheduling Agent, Condor-G) submits an MPI job to the LRMS of a Grid resource.]

Page 19

Glide In for co-allocation

[Figure: the agent on the Grid resource, with VM1 and VM2; the MPI task is waiting for the rest of the tasks.]

Page 20

Glide In for co-allocation

[Figure: backfilling: while the MPI task waits in one VM, another job runs in the other VM.]

Page 21

Glide In for co-allocation

[Figure: all tasks ready! The agent's VMs hold the MPI task and the backfilled job.]
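The co-allocation sequence of the last four slides can be simulated in a few lines. In this Python sketch, `SiteAgent` and `coallocate` are hypothetical names. Each site's agent keeps VM1 for its MPI task, which waits, and fills VM2 with a queued job so the node does not sit idle. When every site's task is waiting, all of them switch to running. The slides do not say what happens to the backfilled job at that point, so the sketch simply leaves it in VM2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteAgent:
    """Hypothetical Glide In agent on one site taking part in co-allocation."""
    site: str
    vm1: Optional[str] = None   # reserved for the MPI task
    vm2: Optional[str] = None   # used for backfilling while the MPI task waits
    mpi_state: str = "idle"

    def hold_mpi_task(self, task: str, backlog: List[str]) -> None:
        self.vm1, self.mpi_state = task, "waiting"
        # Avoid idleness: while the MPI task waits, run a queued job in VM2.
        if backlog and self.vm2 is None:
            self.vm2 = backlog.pop(0)


def coallocate(agents: List[SiteAgent], arrivals: List[str],
               backlog: List[str]) -> None:
    """Tasks show up one site at a time; all start only when every one is ready."""
    by_site = {a.site: a for a in agents}
    for site in arrivals:
        by_site[site].hold_mpi_task(f"mpi-task@{site}", backlog)
        if all(a.mpi_state == "waiting" for a in agents):
            for a in agents:
                a.mpi_state = "running"   # all tasks ready!


agents = [SiteAgent("bee001"), SiteAgent("lngrid02")]
backlog = ["job-1", "job-2"]
coallocate(agents, ["bee001"], backlog)
print([(a.site, a.mpi_state, a.vm2) for a in agents])
# [('bee001', 'waiting', 'job-1'), ('lngrid02', 'idle', None)]
coallocate(agents, ["lngrid02"], backlog)
print([(a.site, a.mpi_state, a.vm2) for a in agents])
# [('bee001', 'running', 'job-1'), ('lngrid02', 'running', 'job-2')]
```

This is how Glide In gets co-allocation without advance reservation: the waiting task holds only one VM, and the other keeps doing useful work.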

Page 22

Interactive Job Support

Fast start-up:
- Cache of resources: fast matchmaking
- Scheduling priority: use free resources or glide-ins
- Fast notification of events

CrossBroker injects interactive agents that enable communication between the user and the job:
- Transparent to the user
- Condor Bypass and glogin agents
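The fast start-up ingredients can be combined in a small scheduling sketch. Interactive jobs are served before batch jobs through a priority queue. Placement uses the broker's cached view of free CPUs instead of a fresh information-system query, and falls back to the spare VM of a running Glide In agent. The function names, the site names, and the rule that only interactive jobs may use glide-in VMs are all assumptions made for illustration.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple

INTERACTIVE, BATCH = 0, 1          # lower value = served first
_seq = itertools.count()           # keeps FIFO order within a priority level


def enqueue(queue: List[Tuple[int, int, str]], job: str, interactive: bool) -> None:
    heapq.heappush(queue, (INTERACTIVE if interactive else BATCH, next(_seq), job))


def place(interactive: bool, cache: Dict[str, int],
          glidein_vms: Dict[str, int]) -> Optional[str]:
    """Pick a site from the cached view (no fresh information-system query)."""
    for site, free in cache.items():
        if free > 0:
            cache[site] -= 1
            return site
    if interactive:
        # No free CPU: use the spare VM of a running Glide In agent.
        for site, spare in glidein_vms.items():
            if spare > 0:
                glidein_vms[site] -= 1
                return f"{site} (glide-in VM)"
    return None


queue: List[Tuple[int, int, str]] = []
enqueue(queue, "b1", interactive=False)
enqueue(queue, "i1", interactive=True)
enqueue(queue, "i2", interactive=True)
cache, glideins = {"siteA": 1}, {"siteB": 1}
placements = []
while queue:
    prio, _, job = heapq.heappop(queue)
    placements.append((job, place(prio == INTERACTIVE, cache, glideins)))
print(placements)
# [('i1', 'siteA'), ('i2', 'siteB (glide-in VM)'), ('b1', None)]
```

The batch job was submitted first but is served last; in this sketch it would simply stay queued until resources free up.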

Page 23

Interactive Job Support

Changes in JDL:
- INTERACTIVE: true/false. Indicates that the job is interactive and that the broker should treat it with higher priority.
- INTERACTIVEAGENT, INTERACTIVEAGENTARGUMENTS: these attributes specify the command (and its arguments) used to communicate with the user.
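Turning the two agent attributes into a command line can be sketched with the standard `shlex` module, which splits the argument string using shell-like quoting rules. The `agent_command` helper is hypothetical. How CrossBroker actually launches the agent next to the job is not described on the slides, and the sketch does not interpret glogin's flags.

```python
import shlex
from typing import List


def agent_command(agent: str, agent_arguments: str) -> List[str]:
    """Build the argv for the interactive agent from the two JDL attributes."""
    # shlex.split honours quotes, so arguments containing spaces survive intact.
    return [agent, *shlex.split(agent_arguments)]


print(agent_command("glogin", "-r -p 195.168.105.65:23433"))
# ['glogin', '-r', '-p', '195.168.105.65:23433']
```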

Page 24

Interactive MPI application

Type = "Job";
VirtualOrganisation = "imain";
JobType = "Parallel";
SubJobType = "openmpi";
NodeNumber = 4;
Interactive = TRUE;
InteractiveAgent = "glogin";
InteractiveAgentArguments = "-r -p 195.168.105.65:23433";
Executable = "test-app";
InputSandbox = {"test-app", "inputfile"};
OutputSandbox = {"std.out", "std.err"};
StdError = "std.err";
StdOutput = "std.out";
Rank = other.GlueHostBenchmarkSI00;
Requirements = other.GlueCEStateStatus == "Production";

Page 25

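Interactive MPI application

[Figure: the user's machine receives a video stream through glogin from the application's master process; on the remote resources, the master and its workers communicate via MPI. Labels: "Started with mpi-start"; "Remote Resource Started by the CrossBroker".]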

Page 26

Glide In for interactive jobs

[Figure: the agent on the Grid resource runs a batch job in one VM and the interactive job in the other.]

Page 27

Glide In for interactive jobs

[Figure: the agent on the Grid resource runs the batch job and the interactive job side by side in its two VMs.]

- Priority adjustment
- Start-up time reduction: only one layer involved
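Priority adjustment can be modelled as the agent lowering the batch job's priority while an interactive job shares the node, then restoring it afterwards. The Python sketch below tracks this as a Unix nice value. Mapping "priority adjustment" onto nice values is an assumption, and `TwoVMNode` is a hypothetical name. A real agent could apply the value to the batch process with `os.setpriority(os.PRIO_PROCESS, pid, nice)` on Unix, but the sketch only keeps the state.

```python
from dataclasses import dataclass
from typing import Optional

# Unix nice values: 0 is the default, 19 the lowest priority.
DEFAULT_NICE, BACKGROUND_NICE = 0, 19


@dataclass
class TwoVMNode:
    """Hypothetical agent state: batch job in VM1, interactive job in VM2."""
    batch_job: Optional[str] = None
    interactive_job: Optional[str] = None
    batch_nice: int = DEFAULT_NICE

    def start_interactive(self, job: str) -> None:
        # Started directly by the agent: only one layer involved, no LRMS queue.
        self.interactive_job = job
        if self.batch_job is not None:
            self.batch_nice = BACKGROUND_NICE   # batch job yields the CPU

    def end_interactive(self) -> None:
        self.interactive_job = None
        self.batch_nice = DEFAULT_NICE          # batch job gets its share back


node = TwoVMNode(batch_job="batch-job")
node.start_interactive("int-job")
print(node.batch_nice)  # 19
node.end_interactive()
print(node.batch_nice)  # 0
```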

Page 28

Conclusions & Future work

CrossBroker supports parallel and interactive jobs:
- Automatically
- Interoperably with EGEE

Glide In:
- Fast start-up of jobs
- Co-allocation without reservation or wasting resources

Future work:
- Explore more complex multiprogramming (e.g. 3 or more VMs)
- Decentralization of the services
