NASA GSFC Landsat Data Continuity Mission (LDCM) Grid Prototype (LGP) Beth Weinstein NASA GSFC May 8, 2006


NASA GSFC Landsat Data Continuity Mission (LDCM)

Grid Prototype (LGP)

Beth Weinstein, NASA GSFC

May 8, 2006


Sponsored by NASA LDCM, NASA/GSFC Code 580 Team: 586/585/SGT/QSS/Aerospace Corp/USGS EROS

LDCM Grid Prototype (LGP) Introduction

A Grid infrastructure gives scientists at resource-poor sites access to remote, resource-rich sites
• Enables greater scientific research
• Maximizes existing resources
• Limits the expense of building new facilities

The objective of the LDCM Grid Prototype (LGP) is to assess the applicability and effectiveness of a data grid to serve as the infrastructure for research scientists to generate Landsat-like data products


LGP Milestones

Capability 1 (C1) (12/03 - 12/04)
• Demonstrated a basic grid infrastructure that enables a science user to run a program on a specified resource in a virtual organization
• Virtual organization (VO) included GSFC labs and USGS EROS resources
• Basic Globus Toolkit 2.4 (e.g., GSI, GridFTP, GRAM)

Capability 2 (C2) (12/04 - 9/05)
• Demonstrated an expanded grid infrastructure that allows the dynamic allocation of resources for a specific science application
• VO included NASA GSFC labs, USGS EROS, and the University of Maryland (UMD)
• Workflow enabled

NASA ROSES ACCESS A.26 (1/06 - 1/08)
• Land Cover Change Processing and Analysis System: LC-ComPS


Capability 1 Science Scenario

Figure: LEDAPS L7ESR scene and MODIS MOD09GHK scene


Capability 1 Summary

Prepare two heterogeneous data sets at different remote locations for like “footprint” comparison from a science user’s home site
• The MODIS Reprojection Tool (MRT) serves as our “typical science application,” developed at the science user’s site (GSFC Building 32 in the demo)
  • mrtmosaic and resample (subset and reproject)
  • Operates on MODIS and LEDAPS (Landsat surface reflectance) scenes
• Data distributed at remote facilities
  • NASA GSFC Building 23 (MODIS scenes)
  • USGS EROS (LEDAPS scenes)
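Preparing two data sets for a like-footprint comparison comes down to finding the area they share and subsetting both to it. A minimal sketch, assuming each scene's extent is an axis-aligned latitude/longitude box that does not cross the antimeridian; BBox and common_footprint are hypothetical names, and in the demo the actual subsetting and reprojection were done by MRT, not by code like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BBox:
    """Axis-aligned geographic extent in decimal degrees (hypothetical helper type)."""
    west: float
    south: float
    east: float
    north: float


def common_footprint(a: BBox, b: BBox) -> Optional[BBox]:
    """Return the overlap of two extents, or None if they do not intersect.

    Assumes neither box crosses the antimeridian.
    """
    west = max(a.west, b.west)
    south = max(a.south, b.south)
    east = min(a.east, b.east)
    north = min(a.north, b.north)
    if west >= east or south >= north:
        return None  # no shared area to subset
    return BBox(west, south, east, north)


if __name__ == "__main__":
    # Illustrative extents only; not taken from the LGP demo data.
    modis_tile = BBox(-80.0, 30.0, -70.0, 40.0)
    landsat_scene = BBox(-77.5, 38.0, -75.0, 40.5)
    print(common_footprint(modis_tile, landsat_scene))
```

The resulting window is what both scenes would then be subset and reprojected to before comparison.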

Solves a realistic scientific scenario using grid-enabled resources


Capability 2 Science Scenario

Figure: Landsat Scene 1 (Path/Row 182/61, 2/12/2002) and Landsat Scene 2 (Path/Row 182/61, 6/4/2002) combined into the 2002 182/61 composite


Capability 2 Summary

Create direct reflectance composite products using Landsat data
• Blender Task 1 scenario and modules were contributed by Jeff Masek and Feng Gao
• Modules
  • lndcal - calibration
  • lndcsm - cloud shadow mask
  • lndsr - surface reflectance
  • lndreg - registration
  • lndcom - composite
• Input data
  • Up to 5 spatially coincident Landsat scenes
  • GSFC ancillary data: TOMS (ozone), Reanalysis (water vapor)
• Output data: 1 LEDAPS/Blender composite scene
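The module list implies an order: each scene is calibrated, cloud/shadow masked, and corrected to surface reflectance, and, as the scenario diagrams suggest, the non-base scenes are then registered before compositing. A minimal sketch that validates one composite's inputs and lists its module runs in that order; plan_composite and the tuple layout are hypothetical, and checking "spatially coincident" as "same path/row" is an assumption. It only plans the runs; it does not invoke the modules.

```python
from typing import List, Optional, Tuple

# Per-scene module chain, in the order listed on the Capability 2 summary slide.
PER_SCENE_STEPS = ["lndcal", "lndcsm", "lndsr"]
MAX_SCENES = 5  # "Up to 5 Landsat scenes"

Scene = Tuple[int, int, str]  # (path, row, acquisition date as YYYYMMDD)
Task = Tuple[str, Optional[Scene]]  # (module, scene); lndcom has no single scene


def plan_composite(scenes: List[Scene], base: Scene) -> List[Task]:
    """Validate the inputs for one composite and list its module runs in order."""
    if not 1 <= len(scenes) <= MAX_SCENES:
        raise ValueError(f"expected 1 to {MAX_SCENES} scenes, got {len(scenes)}")
    if base not in scenes:
        raise ValueError("the base scene must be one of the inputs")
    # Assumption: "spatially coincident" is checked as "same path/row".
    if len({(path, row) for path, row, _ in scenes}) != 1:
        raise ValueError("all input scenes must share one path/row")
    tasks: List[Task] = [(step, s) for s in scenes for step in PER_SCENE_STEPS]
    tasks += [("lndreg", s) for s in scenes if s != base]  # register to the base scene
    tasks.append(("lndcom", None))  # merge everything into one composite
    return tasks
```

A flat list is only one valid ordering; a workflow engine is free to run each scene's lndcal, lndcsm, lndsr chain in parallel as long as registration and compositing wait for them.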


Capability 2 Scenario

Figure: EROS Pool processing flow for a 30m resolution 2001 composite product (single path-row)
• Inputs: 2001 Landsat Scenes 1-4, each with ancillary inputs
• Each scene: lndcal, lndcsm, and lndsr in sequence
• lndreg (registration) for three of the four scenes
• lndcom merges the results into the composite


Capability 2 Virtual Organization

Figure: network diagram of the virtual organization

Hosts
• Edclxs66 (2): USGS EROS, Sioux Falls, SD
• LGP23 (4): GSFC B23/W316
• LGP32 (2): Science User_1, GSFC B32/C101
• UMD1 (2): UMD, College Park
• MacCl23 (12): GSFC B23/W316

Network links
• GSFC SEN 1 Gbps backbone
• MAX (College Park) OC48, 2.4 Gbps backbone
• vBNS+ (Chicago) OC48, 2.4 Gbps backbone
• USGS EROS 1 Gbps backbone
• OC12, 622 Mbps links, one shared with DREN
• Additional 1 Gbps links

The diagram labels resources as Capability 2 and Capability 3.

SEN: Science and Engineering Network; MAX: Mid-Atlantic Crossroads; DREN: Defense Research and Engineering Network; vBNS+: Very high performance Backbone Network Service


Capability 2 Grid Workflow

In Capability 1, jobs were run on a specific resource
In Capability 2, workflow provided the ability to submit a job to the “Grid” (VO)
• Leverage distributed resource sharing and collaboration on a large scale
• Grid resource management
  • Automatic allocation of grid resources
  • Subtask management
• Reliable job completion
• Leverage idle CPU cycles
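Karajan's scheduler handled allocation in the prototype. As a hypothetical illustration of what automatic allocation means, this sketch spreads equal-cost tasks across hosts in proportion to their CPU counts, always picking the host with the lowest load per CPU. It ignores data location and network speed, and allocate is an invented name. Reading the parenthesized numbers on the VO diagram (for example, LGP23 (4)) as CPU counts is an assumption.

```python
import heapq
from typing import Dict, List


def allocate(tasks: List[str], cpus: Dict[str, int]) -> Dict[str, List[str]]:
    """Greedily assign each task to the host with the lowest load per CPU.

    Assumes every task costs the same; ties are broken by host name.
    """
    assignment: Dict[str, List[str]] = {host: [] for host in cpus}
    # Heap entries: (tasks already assigned / CPU count, host name)
    heap = [(0.0, host) for host in cpus]
    heapq.heapify(heap)
    for task in tasks:
        _load, host = heapq.heappop(heap)
        assignment[host].append(task)
        heapq.heappush(heap, (len(assignment[host]) / cpus[host], host))
    return assignment


if __name__ == "__main__":
    # Assumes the parenthesized numbers on the VO diagram are CPU counts.
    vo = {"edclxs66": 2, "LGP23": 4, "LGP32": 2, "UMD1": 2, "MacCl23": 12}
    plan = allocate([f"composite{i}" for i in range(1, 9)], vo)
    for host, assigned in plan.items():
        print(host, assigned)
```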


Capability 2 Workflow Software: Karajan

Karajan provides grid workflow functions
• Includes a task management language and an execution engine
• Integrated with the Java Commodity Grid (CoG) Kit
• Includes a task scheduler
  • Runs gridExecute and gridTransfer tasks on grid resources
  • Manages both local and remote resources
• Specifies workflows using XML
• Supplies command-line and GUI interfaces

Figure: Karajan – Globus Grid Architecture (Karajan from Java CoG 4_0_a1 on top of Globus Toolkit 2.4.3: Globus Gatekeeper, GRAM, GridFTP)


User Configuration File Specification

User creates a product.spec configuration file
• Path, row, and acquisition date provided for each input scene

    # product.spec example file
    host: edclxs66.cr.usgs.gov
    base_directory: /data/LEDAPS
    182 062 20010719 base
    182 062 20030215
    182 062 20040218
    182 062 20040609
    -
    # default to host and base_directory specified above
    182 061 20020212 base
    182 061 20020604
    182 061 20040101
    182 061 20040218
    182 061 20040711
    -
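The example suggests a simple line format: “key: value” settings that later composites inherit, one “path row date” line per scene with “base” marking the base scene, and “-” closing each composite. The parser below is a sketch under that reading, which is inferred from the example rather than from a documented grammar; Composite and parse_product_spec are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Composite:
    host: str
    base_directory: str
    scenes: List[str] = field(default_factory=list)  # "path row date" strings
    base: Optional[str] = None


def parse_product_spec(text: str) -> List[Composite]:
    """Split a product.spec file into composites (format inferred from the example)."""
    settings = {"host": "", "base_directory": ""}
    composites: List[Composite] = []
    current: Optional[Composite] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "-":  # end of one composite's scene list
            if current is not None:
                composites.append(current)
            current = None
            continue
        if ":" in line:  # "key: value" setting, inherited by later composites
            key, value = (part.strip() for part in line.split(":", 1))
            settings[key] = value
            continue
        fields = line.split()
        if current is None:
            current = Composite(settings["host"], settings["base_directory"])
        scene = " ".join(fields[:3])
        current.scenes.append(scene)
        if len(fields) > 3 and fields[3] == "base":
            current.base = scene
    if current is not None:  # tolerate a missing final "-"
        composites.append(current)
    return composites
```

Parsing the example above under these assumptions yields two composites on edclxs66.cr.usgs.gov: one with four 182/062 scenes and one with five 182/061 scenes.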


Capability 2 Architecture

Figure: driver.pl, driver.xml, Product.spec, and Karajan dispatch create_composite1.xml and create_composite2.xml inside a <parallel> block across Host 1, Host 2, Host …
• Each create_compositeN.xml is a <sequential> block for one composite
• Each input scene <path, row, acqDate>, including the Base Scene: lndcal, lndpm, lndcsm, lndsr
• Scenes other than the Base Scene: lndreg
• Then lndcom, followed by Copy_output
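The architecture nests one <sequential> block per composite inside a <parallel> block, so composites can run concurrently while each composite's steps run in order. A minimal sketch that builds that nesting with Python's xml.etree.ElementTree; only <parallel> and <sequential> come from the slide, while the <task> element, its attributes, and build_workflow are hypothetical placeholders rather than Karajan's actual schema, so the output would not run under Karajan as-is.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def build_workflow(plans: Dict[str, List[Tuple[str, str]]]) -> ET.Element:
    """Wrap each composite's ordered (module, scene) steps in a <sequential> block."""
    root = ET.Element("parallel")  # composites may run concurrently on different hosts
    for name, tasks in plans.items():
        seq = ET.SubElement(root, "sequential", {"name": name})  # steps run in order
        for module, scene in tasks:
            # <task> and its attributes are illustrative, not Karajan syntax.
            ET.SubElement(seq, "task", {"module": module, "scene": scene})
    return root


if __name__ == "__main__":
    plans = {
        "composite1": [("lndcal", "182 062 20010719"), ("lndcom", "182 062 20010719")],
        "composite2": [("lndcal", "182 061 20020212"), ("lndcom", "182 061 20020212")],
    }
    print(ET.tostring(build_workflow(plans), encoding="unicode"))
```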


Capability 2 Performance

Processing benchmarks:

Composites*                     Time to process
8                               3 hours
16                              5 hours 36 minutes
32 (2 parallel runs of 16)      11 hours 46 minutes
48                              12 hours 50 minutes

*Each composite had 4 input scenes

Share of time       Data Host   Remote Host
File transfer       7%          9%
CPU processing      93%         91%

Transfer rates
• File transfer using 8 parallel streams
• Raw data files (TIF): 57 Mb in 45-50 sec (~1.26 Mbps)
• Final output file (HDF): 1.25 Gb in 5 minutes (~4 Mbps)
• Conclusion: larger files transfer more efficiently
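The benchmark and transfer figures can be normalized to compare runs. This sketch recomputes minutes per composite and average transfer rate from the numbers on this slide; it treats the slide's “Mb” and “Gb” as megabits and gigabits, which is an assumption, though the unit reading does not change the ratio between the two rates. The arithmetic gives roughly 21 to 22.5 minutes per composite for the 8-, 16-, and 32-composite runs and about 16 minutes for the 48-composite run.

```python
# Figures copied from the benchmark table and transfer-rate bullets on this slide.
BENCHMARKS = {8: 3 * 60, 16: 5 * 60 + 36, 32: 11 * 60 + 46, 48: 12 * 60 + 50}  # minutes


def minutes_per_composite(count: int, minutes: int) -> float:
    """Average wall-clock minutes spent per composite in one benchmark run."""
    return minutes / count


def throughput(megabits: float, seconds: float) -> float:
    """Average rate in megabits per second (assumes the slide's Mb means megabits)."""
    return megabits / seconds


if __name__ == "__main__":
    for count, minutes in BENCHMARKS.items():
        print(f"{count:>2} composites: {minutes_per_composite(count, minutes):.1f} min each")
    print(f"TIF: {throughput(57, 45):.2f} Mbps")              # 57 Mb in 45 s
    print(f"HDF: {throughput(1.25 * 1000, 5 * 60):.2f} Mbps")  # 1.25 Gb in 5 min
```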


Performance Research and Potential Plans

• Benchmarked processing rates for producing up to 50 output scenes
• Completed initial analysis of transfer and processing rates obtained using NetLogger
  • NetLogger provides the ability to monitor applications within a complex distributed environment in order to determine exactly where time is spent

Room for optimization
• Analyze process flows to optimize running in an operational setting, and implement the optimization strategies below
  • Compress input files on the data host prior to file transfer
  • Increase parallelization
    • Parallel runs of multiple input scenes for a single composite
    • Parallel file transfer
  • Add more CPUs and maximize CPU utilization
• Look at error handling and the possibility of automatically restarting jobs
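The kind of breakdown NetLogger made possible, such as the transfer versus processing split on the performance slide, can be approximated with simple per-stage timers. This is not the NetLogger API; stage and breakdown are hypothetical helpers that accumulate wall-clock time per named stage and report each stage's share of the total, and the sleeps stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator

# Accumulated wall-clock seconds per stage; a stand-in for NetLogger event logs.
elapsed: Dict[str, float] = defaultdict(float)


@contextmanager
def stage(name: str) -> Iterator[None]:
    """Time a block of work and add its duration to the named stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed[name] += time.perf_counter() - start


def breakdown(times: Dict[str, float]) -> Dict[str, float]:
    """Percentage of total time spent in each stage (all zeros if nothing was timed)."""
    total = sum(times.values())
    if total == 0:
        return {name: 0.0 for name in times}
    return {name: 100.0 * t / total for name, t in times.items()}


if __name__ == "__main__":
    with stage("file transfer"):
        time.sleep(0.007)  # placeholder for a GridFTP copy
    with stage("cpu processing"):
        time.sleep(0.093)  # placeholder for the processing modules
    for name, pct in breakdown(elapsed).items():
        print(f"{name}: {pct:.0f}%")
```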


LGP Lessons Learned

The open source environment can be very beneficial
• Reuse, collaboration incentive
• “Hardened software” (e.g., GSI)

A surprising amount of time was spent on basic network administration and security
• Network performance
• Firewalls/ports

Maintaining configuration management across independent agencies and centers is difficult
• MapCenter - system status tool (QA/Calibration)

Understanding the processing flow and the modules is required for optimization
• One size doesn’t fit all (at least not yet)
• Allow for remote processing; dynamic ancillary data
• CPU intensive vs. data intensive

Karajan is somewhat immature, but we have passed on requests to the CoG developers
• Karajan does provide the basic framework for creating workflows in an operational setting. Functionality not provided by the basic framework is being provided by external wrapper scripts
  • Developed a workaround to pass environment variables across processing runs
  • Provided a wrapper script to pass arguments to underlying Globus executables
• Very elementary job scheduler
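The LGP wrapper scripts themselves are not shown in the slides. As a hypothetical illustration of one way to carry environment variables into a remote processing run, this sketch prepends the POSIX env utility to the command so the variables travel with the command line instead of depending on the remote login environment. wrap_command, ANC_PATH, and the input file name are invented for the example.

```python
import shlex
from typing import Dict, List


def wrap_command(executable: str, args: List[str], env: Dict[str, str]) -> List[str]:
    """Build an argv that sets environment variables before running a program.

    Uses the POSIX `env` utility; variables are sorted for a stable command line.
    """
    assignments = [f"{key}={value}" for key, value in sorted(env.items())]
    return ["env", *assignments, executable, *args]


if __name__ == "__main__":
    # Hypothetical variable and input file, for illustration only.
    argv = wrap_command("lndcal", ["lndcal.input.txt"], {"ANC_PATH": "/data/anc"})
    print(shlex.join(argv))  # single shell-quoted string for a remote shell
```

Quoting with shlex.join keeps values containing spaces intact when the command is passed through a remote shell.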


Current and Future Work

LDCM Grid Prototype work will continue
• Receiving NASA ROSES ACCESS A.26 funding for the Land Cover Change Processing and Analysis System (LC-ComPS)
• Use grid technology to allow regional and continental-scale land cover analysis at high resolution
  • Use Globus 4.0 as the underlying grid infrastructure
  • Improve error handling in the workflow scripts and automatically restart tasks in the event of failures
  • Expand the “pool” of machines in the VO
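Automatic restarting can start as a retry loop around each task. A minimal sketch with exponential backoff; run_with_restarts is a hypothetical name, and a production version would also need to tell transient failures (for example, a dropped transfer) from permanent ones.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_restarts(task: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Run task, restarting it after a failure, up to `attempts` tries in total.

    Waits `delay` seconds before each restart, doubling the wait each time.
    Re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            time.sleep(delay)
            delay *= 2  # exponential backoff
    raise AssertionError("unreachable")
```

Restarting from the top is only safe when a task can be rerun cleanly, for example when it rewrites its outputs rather than appending to them.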


Backup Slides


Acknowledgements

Sponsors
• LDCM - Bill Ochs, Matt Schwaller
• Code 500/580 - Peter Hughes, Julie Loftis

LGP Team members
• Jeff Lubelczyk (Lead)
• Gail McConaughy (Branch Principal)
• Beth Weinstein (SW Lead)
• Ben Kobler (HW, Networks)
• Eunice Eng (SW Dev, Data)
• Valerie Ward (SW Dev, Apps)
• Ananth Rao ([SGT] SW Arch/Dev, Grid Expert)
• Brooks Davis ([Aerospace Corp] Grid Expert)
• Wayne Yu ([QSS] Sys Admin)

GSFC Science Input
• Jeff Masek (Blender)
• Feng Gao (Blender)

USGS EROS
• Stu Doescher (Mgmt)
• Chris Doescher (POC)
• John Dwyer
• Tom Mcelroy
• Mike Neiers (Sys Support)
• Cory Ranschau (Sys Admin)

University of Maryland (UMD)
• Paul Davis
• Gary Jackson


Acronym List

ACCESS Advancing Collaborative Connections for Earth-Sun System Science

EROS Earth Resources Observation and Science
FTP File Transfer Protocol
GASS Globus Access to Secondary Storage
GRAM Grid Resource Allocation & Management
GSI Grid Security Infrastructure
LC-ComPS Land Cover Change Processing and Analysis System
LDCM Landsat Data Continuity Mission
LEDAPS Landsat Ecosystem Disturbance Analysis Adaptive Processing System
LGP LDCM Grid Prototype
LP DAAC Land Processes Distributed Active Archive Center
MDS Monitoring & Discovery System
MODIS Moderate Resolution Imaging Spectroradiometer
MRT MODIS Reprojection Tool
ROSES Research Opportunities in Space and Earth Sciences