Ewa Deelman [email protected] Using Grid Technologies to Support Large-Scale Astronomy Applications...

26
Ewa Deelman [email protected] Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information Sciences Institute

Transcript of Ewa Deelman [email protected] Using Grid Technologies to Support Large-Scale Astronomy Applications...

Page 1: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Using Grid Technologies to Support Large-Scale Astronomy Applications

Ewa DeelmanCenter for Grid Technologies

USC Information Sciences Institute

Page 2: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Outline

• Large-scale applications• Mapping large-scale applications onto Grid

environments– Pegasus (developed by ISI under the GriPhyN

project)

• Supporting Montage (an image mosaicking application) on the Grid

• Recent results of running on the Teragrid • Other applications and conclusions

Page 3: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Acknowledgements

Pegasus• Ewa Deelman, Carl Kesselman, Gaurang Mehta, Gurmeet

Singh, Mei-Hui Su, Karan Vahi (Center for Grid Technologies, ISI)

• James Blythe, Yolanda Gil (Intelligent Systems Division, ISI)• http://pegasus.isi.edu • Research funded as part of the NSF GriPhyN, NVO and SCEC

projects and EU-funded GridLab

Montage• Bruce Berriman, John Good, Anastasia Laity, IPAC• Joseph C. Jacob, Daniel S. Katz, JPL• http://montage.ipac.caltech.edu/ • Montage is funded by the NASA’s Earth Science Technology

Office, Computational Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology.

Page 4: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Grid Applications

• Increasing in the level of complexity• Use of individual application components• Reuse of individual intermediate data products• Description of Data Products using Metadata Attributes

• Execution environment is complex and very dynamic– Resources are heterogeneous and distributed in the WAN– Resources come and go because of failure or policy

changes– Data is replicated– Components can be found at various locations or staged in

on demand

• Separation between – the application description – the actual execution description

Page 5: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Scientific AnalysisW

orkf

low

Evo

lutio

n Select the Input Data

Map the Workflow onto Available Resources

Execute the Workflow

Construct the Analysis

Workflow Template

Abstract Worfklow

Concrete Workflow

Tasks to be executed

Grid Resources

Page 6: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Execution EnvironmentScientific AnalysisW

orkf

low

Evo

lutio

n

Grid Resources

Select the Input Data

Map the Workflow onto Available Resources

Execute the Workflow

Information Services

Library of Application

Components

Data Catalogs

Construct the Analysis

Resource availability and characteristics

Tasks to be executed

Data properties

Component characteristics

Workflow Template

Abstract Worfklow

Concrete Workflow

Aut

omat

edU

ser

guid

ed

Page 7: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Why Automate Workflow Generation?

Usability: – Limit User’s necessary Grid knowledge

• Monitoring and Directory Service • Replica Location Service

Complexity: – User needs to make choices

• Alternative application components• Alternative files• Alternative locations

– The user may reach a dead end– Many different interdependencies may occur among components

Solution cost: – Evaluate the alternative solution costs

• Performance• Reliability• Resource Usage

Global cost: – minimizing cost within a community or a virtual organization – requires reasoning about individual user’s choices in light of other

user’s choices

Page 8: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Concrete Workflow Generation and Mapping

Input Data Selector

Compositional Analysis Tool

(CAT)

PegasusCondor

DAGManConcrete Workflow

Results

Workflow Template

Chimera

MontageAbstract Workflow Service

Abstract Workflow

Grid Resourcesjobs

Application-dependent

Application independent

Page 9: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Specifying abstract workflows

• Using GriPhyN Tools (Chimera)– Using the Chimera Virtual Data Language

• Writing the abstract workflow directly– Using scripts (write XML)

• Using high-level workflow composition tools– Component Analysis Tool (CAT), uses ontologies to

describe workflow components

TR galMorph( in redshift, in pixScale, in zeroPoint, in Ho, in om, in flat, in image, out galMorph ) { … }

Page 10: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Generating a Concrete Workflow

Information– location of files and component

Instances– State of the Grid resources

Select specific – Resources– Files– Add jobs required to form a

concrete workflow that can be executed in the Grid environment

• Data movement– Data registration– Each component in the abstract

workflow is turned into an executable job

FFT filea

/usr/local/bin/fft /home/file1

Move filea from host1://home/filea

to host2://home/file1

Abstract Workflow

Concrete Workflow

DataTransfer

Data Registration

Page 11: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Pegasus:Planning for Execution in Grids

• Maps from abstract to concrete workflow– Algorithmic and AI-based techniques

• Automatically locates physical locations for both components (transformations) and data

• Finds appropriate resources to execute• Reuses existing data products where applicable• Publishes newly derived data products

– Chimera virtual data catalog– Provides provenance information

Page 12: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Information Components used by Pegasus

• Globus Monitoring and Discovery Service (MDS)– Locates available resources– Finds resource properties

• Dynamic: load, queue length• Static: location of GridFTP server, RLS, etc

• Globus Replica Location Service– Locates data that may be replicated– Registers new data products

• Transformation Catalog– Locates installed executables

Page 13: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Example Workflow Reduction

• Original abstract workflow

• If “b” already exists (as determined by query to the RLS), the workflow can be reduced

d1 d2ba c

d2

b c

Page 14: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Mapping from abstract to concrete

• Query RLS, MDS, and TC, schedule computation and data movement

Execute d2 at B

Move b from A

to B

Move c from B

to U

Register c in the

RLS

d2b c

Page 15: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Condor’s DAGMan

• Developed at UW Madison (Livny)• Executes a concrete workflow• Makes sure the dependencies are followed• Executes the jobs specified in the workflow

– Execution– Data movement– Catalog updates

• Provides a “rescue DAG” in case of failure

Page 16: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

What is Montage?

• Delivers custom, science grade image mosaics – User specifies projection, coordinates, spatial sampling,

mosaic size, image rotation – Preserve astrometry & photometric accuracy– Modular “toolbox” design

• Loosely-coupled Engines for Image Reprojection, Background Rectification, Co-addition

– Control testing and maintenance costs– Flexibility; e.g custom background algorithm; use as a

reprojection and co-registration engine

• Public service will be deployed on the Teragrid– Order mosaics through web portal

Page 17: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Region Name, Degrees

Pegasus

Concrete Workflow

Condor DAGMAN

TeraGrid Clusters

SDSC

NCSA

ISI Condor Pool

AbstractWorkflow

User Portal

AbstractWorkflowService

2MASSImage ListService

Grid Schedulingand Execution

Service

UserNotification

Service

ComputationalGrid

mDAGFiles

m2MASSList

mGridExec

ImageList

DAGMan

mNotif y

J PL

J PL

I PAC I PAC

I SIAbstractWorkflow

Montage Portal

Page 18: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Small Montage Workflow

~1200 nodes

Page 19: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Mosaic of M42 created on the Teragrid resources using Pegasus

Page 20: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Node Clustering for Performance (Gurmeet Singh,

ISI)

mProject

mDiff

mFitplane

mConcatFit

mBgModel

mBackground

mAdd

Overheads are incurred when scheduling individual nodes of the workflow

One way to look at the workflow is by level and then cluster jobs within the level and destined for the same host

You can construct as many clusters as there are available processors for example

Page 21: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Total time (in minutes) for executing the concrete workflow for creating a mosaic covering 6× 6degrees2 region centered at M16.

Page 22: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Total time taken (in minutes) for executing the concrete workflow as the size of the desired mosaic increases from 1×1 degree2 to 10×10 degree2 centered at M16.

Number of nodes inThe abstract workflow

64 processors used

Page 23: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Benefits of the workflow & Pegasus approach

• The workflow exposes – the structure of the application– maximum parallelism of the application

• Pegasus can take advantage of the structure to– Set a planning horizon (how far into the workflow to plan)– Cluster a set of workflow nodes to be executed as one

• Pegasus shields from the Grid details• Pegasus can run the workflow on a variety of resources • Pegasus can run a single workflow across multiple resources• Pegasus can opportunistically take advantage of available

resources (through dynamic workflow mapping) • Pegasus can take advantage of pre-existing intermediate

data products• Pegasus can improve the performance of the application.

Page 24: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Applications Using Pegasus and DAGMan

• GriPhyN applications:– High-energy physics: Atlas, CMS (many)– Astronomy: SDSS (Fermi Lab, ANL)– Gravitational-wave physics: LIGO (Caltech, AEI)

• Astronomy:– Galaxy Morphology (NCSA, JHU, Fermi, many others, NVO-funded)– Montage (IPAC, JPL, NASA-funded)

• Biology– BLAST (ANL, PDQ-funded)

• Neuroscience– Tomography for Telescience(SDSC, NIH-funded)

• Earthquake Science– Simulation of earthquake propagation in soil (in the Southern

California area – SCEC)

Page 25: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

Future directions

• Improving scheduling strategies• Supporting the Pegasus framework through

pluggable interfaces for resource and data selection

• Support for staging in executables on demand• Supporting better space and resource

management (space and compute node reservation)

• Reliability

Page 26: Ewa Deelman deelman@isi.edu Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.

Ewa Deelman [email protected]

For more information

• NVO project www.us-vo.org• GriPhyN project www.griphyn.org

– Virtual Data Toolkit www.cs.wisc.edu/vdt

• Montage montage.ipac.caltech.edu (IRSA Booth)• Pegasus pegasus.isi.edu • My website www.isi.edu/~deelman