Opportune Job Shredding: An Efficient Approach for Scheduling Parameter Sweep Applications

Rohan Kurian, Pavan Balaji, P. Sadayappan, The Ohio State University


Page 1

Opportune Job Shredding: An Efficient Approach for Scheduling Parameter Sweep Applications

Rohan Kurian, Pavan Balaji, P. Sadayappan

The Ohio State University

Page 2

Parameter Sweep Applications

• An important class of applications
• Set of independent tasks
• MCell Application: 3D simulations for sub-cellular architecture/physiology
• GTOMO (Parallel Tomography) Application: multiple view-point simulation
• Systems exist for scheduling on the Grid
• Cluster-based scheduling?

Page 3

Application Level Schedulers

• Manage the scheduling of applications
• Break the application into appropriate chunks
• APST (AppLeS Parameter Sweep Template)
• NIMROD
• Greedy approach to scheduling the chunks of a parameter sweep application (PSA)

Page 4

Presentation Roadmap

• Job Scheduling in Clusters
• Multi-Site Job Scheduling
• PSA Scheduling Strategies
• Multi-Site Scheduling of PSAs
• Performance Evaluation
• Conclusions

Page 5

Job Scheduling in Clusters

• Mapping arriving jobs to available resources
• Multiple schemes for scheduling: First Come First Serve (FCFS), Conservative Scheduling, Aggressive or EASY Scheduling
• Fair-share constraints: a user cannot have more than ‘N’ queued jobs
• Submitting the multiple chunks of a PSA job separately violates the fair-share constraints
• Instead, combine the chunks to form a single parallel job
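The fair-share problem and the combining step can be illustrated with a minimal Python sketch. The slides do not say how chunks are packed into the parallel job, so the longest-task-first packing below, and the names `Job`, `violates_fair_share`, and `combine_chunks`, are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    user: str
    procs: int       # processors requested
    runtime: float   # estimated runtime of the whole job


def violates_fair_share(queued, user, new_jobs, limit):
    """True if submitting `new_jobs` more jobs pushes `user` past `limit` queued jobs."""
    already = sum(1 for j in queued if j.user == user)
    return already + new_jobs > limit


def combine_chunks(user, task_runtimes, procs):
    """Pack independent single-processor tasks into one parallel job.

    Assumed packing rule (not from the slides): assign tasks longest-first
    to the least-loaded processor; the job runs as long as the busiest one.
    """
    loads = [0.0] * procs
    for t in sorted(task_runtimes, reverse=True):
        i = loads.index(min(loads))  # least-loaded processor so far
        loads[i] += t
    return Job(user=user, procs=procs, runtime=max(loads))


tasks = [4, 3, 3, 2, 2, 2]
# Six separate submissions would exceed a limit of N = 5 queued jobs...
print(violates_fair_share([], "alice", len(tasks), limit=5))
# ...while one combined parallel job counts as a single queued job.
print(combine_chunks("alice", tasks, procs=2))
```

The combined job occupies only one queue slot, which is what lets a PSA stay within the per-user limit.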

Page 6

Formation of PSAs in Clusters

[Figure: small independent tasks are combined into a parallel parameter sweep application]


Page 8

Multi-Site Job Scheduling

• Multiple Simultaneous Requests: a job is submitted to multiple sites and started on the cluster that can run it earliest
• Existing schemes have limitations: heterogeneous clusters, different scheduling schemes
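A rough Python sketch of the multiple-simultaneous-requests idea follows. It assumes each site can report an estimated start time for the job and that the requests at the other sites are cancelled once a winner is known; neither detail is stated on the slide, and the function names are hypothetical.

```python
def choose_site(estimated_starts):
    """Return (site, start) for the earliest estimated start; ties go to the lower site name."""
    site = min(estimated_starts, key=lambda s: (estimated_starts[s], s))
    return site, estimated_starts[site]


def multiple_simultaneous_requests(estimated_starts):
    """Submit to every site, run on the earliest one, cancel the rest (assumed behaviour)."""
    winner, start = choose_site(estimated_starts)
    cancelled = sorted(s for s in estimated_starts if s != winner)
    return {"run_on": winner, "start": start, "cancel": cancelled}


# Hypothetical start-time estimates (in minutes) from three sites.
print(multiple_simultaneous_requests({"site1": 30, "site2": 12, "site3": 45}))
```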

Page 9

Multiple-simultaneous-requests

[Figure: three sites (Site 1, Site 2, Site 3), each with a Meta Scheduler and a Local Scheduler that receive jobs]


Page 11

PSA Scheduling Strategies

Flooding-based Job Shredding
• Submit all chunks in the PSA at once
• Greedy approach
• Improves user and system metrics
• Doesn’t ensure fairness to Non-PSA jobs

Opportune Job Shredding
• Uses an additional application-level scheduler
• Monitors the current schedule of the system
• If no normal backfill is possible, allows PSA jobs to shred and backfill
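The opportune rule can be sketched in Python as a single scheduling decision. This is a simplified illustration, not the authors' implementation: the backfill test only checks that a job fits in the idle processors and finishes within the available window, each shredded PSA task is assumed to use one processor, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int
    runtime: float


def can_backfill(job, idle_procs, window):
    """Simplified test: the job fits in the hole and ends before the window closes."""
    return job.procs <= idle_procs and job.runtime <= window


def opportune_step(queue, psa_tasks, idle_procs, window):
    """Decide what to start in the current hole of the schedule.

    Returns ("backfill", job), ("shred", [task runtimes]) or ("idle", None).
    """
    # Normal (non-PSA) jobs always get the first chance to backfill.
    for job in queue:
        if can_backfill(job, idle_procs, window):
            return ("backfill", job)
    # No normal backfill is possible: let the PSA shred single-processor
    # tasks into the hole, at most one task per idle processor.
    chosen = []
    for t in psa_tasks:
        if len(chosen) == idle_procs:
            break
        if t <= window:
            chosen.append(t)
    return ("shred", chosen) if chosen else ("idle", None)


queue = [Job("a", 8, 100), Job("b", 2, 50)]
print(opportune_step(queue, [5, 30, 10, 15, 8, 3], idle_procs=4, window=20))
```

Because PSA tasks are used only when the hole would otherwise stay empty, Non-PSA jobs keep their backfilling opportunities, which is the contrast with flooding.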


Page 13

Multi-Site Scheduling for PSAs

• Two-level application-level schedulers
• No constraints on sites: allowed to have different speeds, allowed to have different scheduling policies
• Similar to “Multiple Simultaneous Requests”
• Simultaneous requests only for PSAs

Page 14

Multi-Site Scheduling for PSAs

[Figure: a Meta Application-Level Scheduler connected to three sites (Site 1, Site 2, Site 3), each with an App-Level Scheduler, a Job Queue, and a Local Scheduler]


Page 16

Performance Metrics

• Response Time = Completion Time - Submit Time
• Slowdown = Response Time / Runtime
• Loss of Capacity (LOC) = min {(waiting jobs procs), idle procs} x T, where T is the time for which this state lasts
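The three metrics can be written out as a short Python sketch. The slide defines LOC for a single system state; adding the contributions of successive states, as `loss_of_capacity` does below, is an assumption, and no normalization is applied.

```python
def response_time(submit, completion):
    """Response Time = Completion Time - Submit Time."""
    return completion - submit


def slowdown(submit, completion, runtime):
    """Slowdown = Response Time / Runtime."""
    return response_time(submit, completion) / runtime


def loss_of_capacity(intervals):
    """Sum min(waiting procs, idle procs) x T over (waiting, idle, T) intervals.

    Summing over intervals is an assumption; the slide gives only the
    per-state product.
    """
    return sum(min(waiting, idle) * t for waiting, idle, t in intervals)


# A job submitted at t=10 that completes at t=70 after running for 20 units.
print(response_time(10, 70))   # 60
print(slowdown(10, 70, 20))    # 3.0
# Three states: (procs wanted by waiting jobs, idle procs, duration).
print(loss_of_capacity([(8, 4, 10), (2, 6, 5), (0, 3, 7)]))  # 50
```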

Page 17

Evaluation Scheme

• Simulation-based approach
• CTC trace from Feitelson’s archive
• EASY backfilling used
• For multi-site evaluation: CTC traces from 3 different months, processing speeds in the ratio 2:1:3
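One way a simulator might apply the 2:1:3 speed ratio is to scale each job's runtime inversely with the relative speed of the cluster it runs on. The slides give only the ratio, so the inverse scaling, the mapping of the ratio onto cluster names, and the function name below are all assumptions.

```python
# Assumed mapping of the 2:1:3 ratio onto the three clusters, in order.
SPEEDS = {"cluster1": 2, "cluster2": 1, "cluster3": 3}


def scaled_runtime(base_runtime, site, speeds=SPEEDS):
    """Runtime on `site`, given `base_runtime` on a cluster of relative speed 1.

    Assumes runtime is inversely proportional to processing speed.
    """
    return base_runtime / speeds[site]


for site in SPEEDS:
    print(site, scaled_runtime(600, site))
```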

Page 18

Flooding Based Job Shredding

[Figure: two bar charts, Average Slowdown (10% PSA Jobs) and Average Response Time (10% PSA Jobs); x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: All Jobs, PSA Jobs, Non-PSA Jobs]

• Up to 60% improvement for PSA Jobs
• Up to 90% worse performance for Non-PSA Jobs

Page 19

Flooding: Job Category wise breakup

[Figure: two bar charts, Average Response Time (10% PSA Jobs) and Average Slowdown (10% PSA Jobs); x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: NarrowShort, NarrowLong, WideShort, WideLong]

• Narrow Short Non-PSA jobs suffer most
• Loss of back-filling opportunities is the main reason

Page 20

Flooding: Loss of Capacity

[Figure: bar chart, Loss Of Capacity (10% PSA Jobs); x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease]

• Up to 75% improvement in the Loss of Capacity

Page 21

Opportune Job Shredding

[Figure: two bar charts, Average Response Time (10% PSA Jobs) and Average Slowdown (10% PSA Jobs); x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: All Jobs, PSA Jobs, Non-PSA Jobs]

• Up to 70% improvement for PSA Jobs
• Less than 2% worsening in performance for Non-PSA Jobs

Page 22

Opportune: Job Category wise breakup

[Figure: two bar charts, Average Response Time (10% PSA Jobs) and Average Slowdown (10% PSA Jobs); x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: NarrowShort, NarrowLong, WideShort, WideLong]

• No category of Non-PSA jobs suffers more than 7%

Page 23

Opportune: Loss of Capacity

[Figure: bar chart, Loss Of Capacity (10% PSA Jobs); x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease]

• Up to 12% improvement in the Loss of Capacity

Page 24

Opportune (Multi-Site)

[Figure: two bar charts, Average Response Time (10% PSA Jobs) and Average Slowdown (10% PSA Jobs); x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: PSA Jobs and Non-PSA Jobs for each of Cluster1, Cluster2, Cluster3]

• Up to 95% improvement for PSA Jobs
• No significant loss of performance for Non-PSA jobs

Page 25

Opportune (Multi-Site): Response Time

[Figure: bar chart, Average Response Time (10% PSA Jobs); x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: PSA Jobs and Non-PSA Jobs for each of Cluster1, Cluster2, Cluster3]

• Up to 75% improvement for PSA Jobs
• No significant loss of performance for Non-PSA jobs

Page 26

Opportune (Multi-Site): Slowdown

[Figure: bar chart, Average Slowdown (10% PSA Jobs); x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: PSA Jobs and Non-PSA Jobs for each of Cluster1, Cluster2, Cluster3]

• Up to 95% improvement for PSA Jobs
• No significant loss of performance for Non-PSA jobs

Page 27

Opportune (Multi-Site): Loss of Capacity

[Figure: bar chart, Loss Of Capacity (10% PSA Jobs); x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: Cluster1, Cluster2, Cluster3]

• Up to 45% improvement in the Loss of Capacity

Page 28

Concluding Remarks

Opportune Job Shredding
• Efficient scheduling of PSAs
• Single-site and multi-site versions
• Significant improvement for PSA jobs
• Ensures that Non-PSA jobs are not affected

Plan to integrate this with production schedulers

Page 29

Thank You!