Workflow Task Clustering for Best Effort Systems with Pegasus



Transcript of Workflow Task Clustering for Best Effort Systems with Pegasus

Page 1: Workflow Task Clustering for Best Effort Systems with Pegasus

Workflow Task Clustering for Best Effort Systems with Pegasus

Gurmeet Singh, Mei-Hui Su, Karan Vahi, Ewa Deelman, Gaurang Mehta
Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292

Bruce Berriman, John Good
Infrared Processing and Analysis Center, California Institute of Technology, Pasadena, CA 91125

Daniel S. Katz
Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803

*The full moon is 0.5 deg. sq. when viewed from Earth; the full sky is ~400,000 deg. sq.

Generating mosaics of the sky

Size of the mosaic (deg. sq.)* | Number of input data files | Number of intermediate files | Number of jobs | Approx. execution time (20 procs) | Total data footprint
1  |    53 |    588 |    232 | 40 mins       | 1.2 GB
2  |   212 |  3,906 |  1,444 | 49 mins       | 5.5 GB
4  |   747 | 13,061 |  4,856 | 1 hr 46 mins  | 20 GB
6  | 1,444 | 22,850 |  8,586 | 2 hrs 14 mins | 38 GB
10 | 3,722 | 54,434 | 20,652 | 6 hours       | 97 GB


[Diagram: a small Montage workflow. Three input images (Image1, Image2, Image3) each feed a Project task; pairs of overlapping projected images feed Diff tasks, each followed by a Fitplane task; a single BgModel task drives per-image Background correction tasks; a final Add task assembles the mosaic.]

Pegasus

Based on programming language principles

Leverage abstraction for workflow description to obtain ease of use, scalability, and portability (see the sketch after this list)

Provide a compiler to map from high-level descriptions to executable workflows

Correct mapping

Performance-enhanced mapping

Rely on a runtime engine to carry out the instructions

Scalable manner

Reliable manner
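
To make the abstraction concrete, here is a minimal sketch of how a small Montage-like workflow can be described as an abstract, resource-independent workflow using the Pegasus Python API (Pegasus.api). This API post-dates the poster, and the transformation and file names are illustrative assumptions; the Diff, Fitplane, and Background steps are omitted for brevity.

```python
# Minimal sketch (assumes the Pegasus 5.x Python API; transformation and
# file names are illustrative, not the real Montage catalog entries).
# The workflow is purely logical: no compute site, scheduler, or data
# location is named here. The planner compiles it into an executable
# workflow, adding stage-in, stage-out, and registration jobs.
from Pegasus.api import Workflow, Job, File

wf = Workflow("small-montage")

raw = [File(f"image{i}.fits") for i in (1, 2, 3)]            # input images
projected = [File(f"projected{i}.fits") for i in (1, 2, 3)]  # intermediates

# One reprojection job per input image.
for r, p in zip(raw, projected):
    wf.add_jobs(Job("mProject").add_args(r, p).add_inputs(r).add_outputs(p))

# A background-model job that consumes all projected images.
corrections = File("corrections.tbl")
wf.add_jobs(Job("mBgModel").add_inputs(*projected).add_outputs(corrections))

# Final co-addition into the mosaic.
mosaic = File("mosaic.fits")
wf.add_jobs(Job("mAdd").add_inputs(*projected, corrections).add_outputs(mosaic))

# Dependencies follow from the file producer/consumer relationships.
wf.write("small-montage.yml")
```

From a description like this, the compiler maps the workflow onto concrete resources, producing the executable workflow (with data stage-in, stage-out, and registration nodes) that the runtime engine then carries out.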

DAGMan (Directed Acyclic Graph MANager)

Runs workflows that can be specified as Directed Acyclic Graphs

Enforces DAG dependencies

Progresses as far as possible in the face of failures

Provides retries, throttling, etc. (see the sketch after this list)

Runs on top of Condor (and is itself a Condor job)
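
For illustration, the input DAGMan consumes is a plain DAG description file. The sketch below uses hypothetical node and submit-file names; JOB, PARENT/CHILD, and RETRY are standard DAGMan directives.

```
# Sketch of a DAGMan input file (node and submit-file names are hypothetical).
# Each JOB line names a node and the Condor submit description that runs it.
JOB projA    projA.sub
JOB projB    projB.sub
JOB bgmodel  bgmodel.sub
JOB add      add.sub

# DAGMan enforces these parent/child dependencies.
PARENT projA projB CHILD bgmodel
PARENT bgmodel CHILD add

# Retry a node up to 3 times if it fails, so the DAG can progress
# as far as possible in the face of transient failures.
RETRY projA 3
RETRY projB 3
```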

A view of the Rho Oph dark cloud constructed with Montage from deep exposures made with the Two Micron All Sky Survey (2MASS) Extended Mission

Pegasus Workflow Mapping

Original workflow: 15 compute nodes, devoid of resource assignment


Resulting workflow mapped onto 3 Grid sites:

11 compute nodes (4 reduced based on available intermediate data)

13 data stage-in nodes

8 inter-site data transfers

14 data stage-out nodes to long-term storage

14 data registration nodes (data cataloging)


60 jobs to execute
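
(The 60-job count follows from the node counts above: 11 compute + 13 stage-in + 8 inter-site transfer + 14 stage-out + 14 registration = 60.)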


The structure of a small Montage workflow

Automatic node clustering

[Diagram: the small Montage workflow under four clustering strategies: no clustering; level-based clustering with a clustering factor of 5; two clusters per level; two tasks per cluster. A sketch of level-based clustering follows.]
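
Level-based clustering is a simple graph transformation: tasks are grouped by their depth in the DAG, and every `factor` tasks at a level are merged into one clustered job. The sketch below is a minimal, self-contained illustration of the idea, not the Pegasus implementation; the task names are made up.

```python
# Minimal sketch of level-based task clustering (illustrative only).
# dag maps each task to the list of tasks that depend on it (its children).
from collections import defaultdict

def levels(dag):
    """Assign each task its depth: roots are level 0, and a child's level
    is one more than the deepest of its parents."""
    parents = defaultdict(list)
    for task, children in dag.items():
        for child in children:
            parents[child].append(task)
    depth = {}
    def level_of(task):
        if task not in depth:
            depth[task] = 1 + max((level_of(p) for p in parents[task]), default=-1)
        return depth[task]
    for task in dag:
        level_of(task)
    return depth

def cluster_by_level(dag, factor):
    """Group tasks at the same level into clusters of at most `factor` tasks.
    Tasks at the same level have no dependencies among themselves, so each
    cluster can run as one larger job."""
    by_level = defaultdict(list)
    for task, lvl in levels(dag).items():
        by_level[lvl].append(task)
    clusters = []
    for lvl in sorted(by_level):
        tasks = sorted(by_level[lvl])
        for i in range(0, len(tasks), factor):
            clusters.append(tasks[i:i + factor])
    return clusters

# Example: a tiny fan-out/fan-in workflow with hypothetical task names.
dag = {
    "project1": ["diff"], "project2": ["diff"], "project3": ["diff"],
    "diff": ["add"], "add": [],
}
print(cluster_by_level(dag, factor=2))
# [['project1', 'project2'], ['project3'], ['diff'], ['add']]
```

Clustering trades granularity for overhead: fewer, larger jobs mean less queuing and submission latency on best-effort resources, at the cost of coarser units for retries.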

[Chart: jobs and time in hours per week of the year 2005-2006 (log scale, 1 to 1,000,000) for SCEC CyberShake workflows run using Pegasus and DAGMan on the TeraGrid and USC resources.]

Cumulatively, the workflows consisted of over half a million tasks and used over 2.5 CPU Years.

The largest CyberShake workflow contained on the order of 100,000 nodes and accessed 10TB of data

Support for LIGO on the Open Science Grid. LIGO workflows: 185,000 nodes, 466,000 edges; 10 TB of input data, 1 TB of output data.

pegasus.isi.edu

[Diagram: Pegasus architecture. On the local submit host (a community resource), an abstract workflow (resource-independent) is mapped by Pegasus into an executable workflow (resources identified); DAGMan releases ready tasks into the Condor queue, and jobs and information flow between the submit host and the national cyberinfrastructure.]

1 degree² Montage on TeraGrid

Pegasus

Can map portions of workflows at a time

Supports the range of just-in-time to full-ahead mappings

Can cluster workflow nodes to increase computational granularity

Can minimize the amount of space required for the execution of the workflow (dynamic data cleanup; see the sketch after this list)

Can handle workflows on the order of 100,000 tasks

Supports a variety of fault-recovery techniques
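
Dynamic data cleanup can be illustrated with a simple rule: once every job that reads an intermediate file has finished, a cleanup job can delete it. The sketch below is a minimal illustration of that idea, not the actual Pegasus cleanup algorithm; the schedule, job, and file names are made up.

```python
# Minimal sketch of dynamic data cleanup (illustrative only).
# Idea: an intermediate file can be deleted as soon as its last consumer
# has run, so the planner adds a cleanup node as a child of that consumer.

def cleanup_points(schedule, consumers):
    """schedule: jobs in the order they are expected to finish.
    consumers: file -> set of jobs that read the file.
    Returns file -> the job after which the file can be removed."""
    finish_index = {job: i for i, job in enumerate(schedule)}
    return {
        f: max(jobs, key=lambda j: finish_index[j])
        for f, jobs in consumers.items()
    }

# Hypothetical Montage-like example.
schedule = ["project1", "project2", "bgmodel",
            "background1", "background2", "add"]
consumers = {
    "projected1.fits": {"bgmodel", "background1", "add"},
    "projected2.fits": {"bgmodel", "background2", "add"},
    "corrections.tbl": {"background1", "background2"},
}

for f, last in cleanup_points(schedule, consumers).items():
    print(f"add cleanup node for {f} as a child of {last}")
# projected*.fits can only be removed after 'add'; corrections.tbl after 'background2'.
```

In the real system the cleanup jobs become part of the executable workflow, alongside the stage-in, stage-out, and registration nodes described earlier, which is how the peak storage needed during execution is kept below the total data footprint shown in the table above.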