NASA GSFC Landsat Data Continuity Mission (LDCM)
Grid Prototype (LGP)
Beth Weinstein
NASA GSFC
May 8, 2006
Sponsored by NASA LDCM, NASA/GSFC Code 580 Team: 586/585/SGT/QSS/Aerospace Corp/USGS EROS
LDCM Grid Prototype (LGP) Introduction
A Grid infrastructure allows scientists at resource-poor sites access to remote resource-rich sites
• Enables greater scientific research
• Maximizes existing resources
• Limits the expense of building new facilities
The objective of the LDCM Grid Prototype (LGP) is to assess the applicability and effectiveness of a data grid to serve as the infrastructure for research scientists to generate Landsat-like data products
LGP Milestones
Capability 1 (C1) (12/03 - 12/04)
• Demonstrated a basic grid infrastructure to enable a science user to run their program on a specified resource in a virtual organization
• Virtual organization (VO) included GSFC labs and USGS EROS resources
• Basic Globus Toolkit 2.4 (e.g. GSI, GridFTP, GRAM)
Capability 2 (C2) (12/04 - 9/05)
• Demonstrated an expanded grid infrastructure to allow the dynamic allocation of resources to enable a specific science application
• VO included NASA GSFC labs, USGS EROS, University of Maryland (UMD)
• Workflow enabled
NASA ROSES ACCESS A.26 (1/06 - 1/08)
• Land Cover Change Processing and Analysis System: LC-ComPS
Capability 1 Science Scenario
[Figure: LEDAPS L7ESR and MODIS MOD09GHK scenes]
Capability 1 Summary
Prepare two heterogeneous data sets at different remote locations for like “footprint” comparison from a science user’s home site
• The MODIS Reprojection Tool (MRT) serves as our “typical science application” developed at the science user’s site (GSFC Building 32 in demo)
  • mrtmosaic and resample (subset and reproject)
  • Operates on MODIS and LEDAPS (Landsat surface reflectance) scenes
• Data distributed at remote facilities
  • NASA GSFC Building 23 (MODIS scenes)
  • USGS EROS (LEDAPS scenes)
Solves a realistic scientific scenario using grid-enabled resources
Capability 2 Science Scenario
[Figure: two Landsat scenes and the resulting composite]
Landsat Scene 1: Path/Row 182/61, Date 2/12/2002
Landsat Scene 2: Path/Row 182/61, Date 6/4/2002
2002 182/61 Composite
Capability 2 Summary
Create direct reflectance composite products using Landsat data
Blender Task 1 scenario and modules were contributed by Jeff Masek and Feng Gao
Modules
• lndcal - calibration
• lndcsm - cloud shadow mask
• lndsr - surface reflectance
• lndreg - registration
• lndcom - composite
Input data
• Up to 5 Landsat scenes: spatially coincident
• GSFC ancillary data:
  • TOMS (ozone)
  • Reanalysis (water vapor)
Output data: 1 LEDAPS/Blender composite scene
Capability 2 Scenario
[Figure: processing flow on the EROS Pool. 2001 Landsat Scenes 1 through 4 each pass through lndcal, lndcsm, and lndsr with ancillary inputs, followed by lndreg and lndcom, to produce a 30m resolution 2001 composite product (single path-row)]
Capability 2 Virtual Organization
[Figure: network diagram of the virtual organization, with Capability 2 and Capability 3 resources marked]
Hosts
• LGP32 (2): Science User_1, GSFC B32/C101
• LGP23 (4): GSFC B23/W316
• MacCl23 (12): GSFC B23/W316
• Edclxs66 (2): USGS EROS, Sioux Falls, SD
• UMD1 (2): UMD, College Park
Network links
• GSFC SEN: 1 Gbps backbone
• MAX (College Park): OC48, 2.4 Gbps backbone
• vBNS+ (Chicago): OC48, 2.4 Gbps backbone
• USGS EROS: 1 Gbps backbone
• OC12, 622 Mbps
• OC12, 622 Mbps, shared with DREN
• 1 Gbps links at GSFC, USGS EROS, and UMD
Legend
• SEN: Science and Engineering Network
• MAX: Mid-Atlantic Crossroads
• DREN: Defense Research and Engineering Network
• vBNS+: very high-speed Backbone Network Service
Capability 2 Grid Workflow
In Capability 1, jobs were run on a specific resource
In Capability 2, workflow provided the ability to submit a job to the “Grid” (VO)
• Leverage distributed resource sharing and collaboration on a large scale
• Grid resource management
  • Automatic allocation of grid resources
  • Sub task management
• Reliable job completion
• Leverage idle CPU cycles
Capability 2 Workflow Software: Karajan
Karajan provides grid workflow functions
• Includes task management language and an execution engine
• Integrated with the Java Commodity Grid (CoG) Kit
• Includes a task scheduler
  • Runs gridExecute and gridTransfer tasks on grid resources
  • Manages both local and remote resources
• Specifies workflow using XML
• Supplies command line and GUI interfaces
[Figure: Karajan - Globus Grid Architecture. Components shown: Karajan, Java CoG 4_0_a1, Globus Toolkit 2.4.3, Globus Gatekeeper, GRAM, GridFTP]
User Configuration File Specification
User creates product.spec configuration file
Path, row, and acquisition date provided for each input scene
# product.spec example file
host: edclxs66.cr.usgs.gov
base_directory: /data/LEDAPS
182 062 20010719 base
182 062 20030215
182 062 20040218
182 062 20040609
-
# default to host and base_directory specified above
182 061 20020212 base
182 061 20020604
182 061 20040101
182 061 20040218
182 061 20040711
-
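The product.spec layout above can be read with a few lines of Python. This is only a sketch of one plausible reading of the format, not the project's actual parser: it assumes that "host:" and "base_directory:" set defaults that carry forward, that each scene line is "path row date" with an optional trailing "base" marker, and that a lone "-" closes one composite. The names Scene, Composite, and parse_product_spec are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    path: int
    row: int
    acq_date: str          # YYYYMMDD, kept as text
    is_base: bool = False  # True when the line ends with the word "base"


@dataclass
class Composite:
    host: str
    base_directory: str
    scenes: list = field(default_factory=list)


def parse_product_spec(text):
    """Parse product.spec text into Composite groups (assumed grammar)."""
    composites = []
    host = base_dir = None  # most recent "host:" / "base_directory:" values
    scenes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("host:"):
            host = line.split(":", 1)[1].strip()
        elif line.startswith("base_directory:"):
            base_dir = line.split(":", 1)[1].strip()
        elif line == "-":
            # A lone dash is assumed to close the current composite.
            composites.append(Composite(host, base_dir, scenes))
            scenes = []
        else:
            parts = line.split()
            is_base = len(parts) > 3 and parts[3] == "base"
            scenes.append(Scene(int(parts[0]), int(parts[1]), parts[2], is_base))
    return composites


EXAMPLE_SPEC = """\
# product.spec example file
host: edclxs66.cr.usgs.gov
base_directory: /data/LEDAPS
182 062 20010719 base
182 062 20030215
182 062 20040218
182 062 20040609
-
# default to host and base_directory specified above
182 061 20020212 base
182 061 20020604
182 061 20040101
182 061 20040218
182 061 20040711
-
"""

composites = parse_product_spec(EXAMPLE_SPEC)
```

Under these assumptions the example file yields two composites, one with four scenes for path/row 182/62 and one with five scenes for 182/61, both inheriting the same host and base directory.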
Capability 2 Architecture
[Figure: architecture diagram. driver.pl, Product.spec, and driver.xml feed Karajan, which runs create_composite1.xml, create_composite2.xml, … in <parallel> across Host 1, Host 2, Host … Each create_composite workflow is <sequential>: for the Base Scene and each further scene <path, row, acqDate> it shows lndcal, lndpm, lndcsm, lndsr, and lndreg, followed by lndcom and Copy_output]
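The parallel/sequential structure in the architecture diagram can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the Karajan workflow itself: the per-scene order follows the module list on the Capability 2 Summary slide (lndcal, lndcsm, lndsr), lndreg is assumed to apply only to non-base scenes, lndpm is left out because the slides do not describe it, and plan_composite, run_composite, run_all, and echo_step are hypothetical names. A thread pool stands in for the grid scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

# Per-scene order taken from the slide's module list (an assumption).
PER_SCENE_MODULES = ["lndcal", "lndcsm", "lndsr"]


def plan_composite(scenes):
    """Return the ordered (module, target) steps for one composite.

    `scenes` is a list of (scene_id, is_base) tuples.
    """
    steps = []
    for scene_id, is_base in scenes:
        steps += [(module, scene_id) for module in PER_SCENE_MODULES]
        if not is_base:
            # Assumed: only non-base scenes are registered.
            steps.append(("lndreg", scene_id))
    steps.append(("lndcom", "composite"))
    steps.append(("copy_output", "composite"))
    return steps


def run_composite(name, scenes, run_step):
    """Run one composite's steps strictly in order (the <sequential> part)."""
    outputs = [run_step(module, target) for module, target in plan_composite(scenes)]
    return name, outputs


def run_all(composites, run_step, max_workers=4):
    """Run independent composites concurrently (the <parallel> part)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_composite, name, scenes, run_step)
                   for name, scenes in composites.items()]
        return dict(f.result() for f in futures)


def echo_step(module, target):
    # Stand-in for a remote job submission; nothing is executed on a grid here.
    return f"{module}({target})"


demo = {
    "composite1": [("scene1", True), ("scene2", False)],
    "composite2": [("scene1", True), ("scene2", False)],
}
results = run_all(demo, echo_step)
```

Keeping the plan separate from the runner means the step order can be checked without touching any remote resource, and echo_step can be swapped for a real submission function later.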
Capability 2 Performance
Processing benchmarks:
# of composites*           Time to process
8                          3 hours
16                         5 hours 36 minutes
32 (2 x 16 parallel runs)  11 hours 46 minutes
48                         12 hours 50 minutes
*Each composite had 4 input scenes
Transfer rates
• File transfer using 8 parallel streams
  • Raw Data Files (TIF): 57 Mb in 45-50 sec. (~1.26 Mbps)
  • Final Output File (HDF): 1.25 Gb in 5 minutes (~4 Mbps)
• Conclusion: Larger files are more efficient
                 Data Host   Remote Host
File Transfer    7%          9%
CPU Processing   93%         91%
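The figures on this slide can be cross-checked with simple arithmetic. The sketch below only recomputes numbers from the slide: it keeps the slide's own size units, treats 1.25 Gb as 1250 Mb (a decimal assumption), and uses hypothetical helper names.

```python
# (composites, hours, minutes) as reported on the slide
BENCHMARKS = [(8, 3, 0), (16, 5, 36), (32, 11, 46), (48, 12, 50)]


def minutes_per_composite(count, hours, minutes):
    """Average wall-clock minutes spent per composite in one benchmark run."""
    return (hours * 60 + minutes) / count


def rate(size, seconds):
    """Size per second, in whatever unit `size` is given in."""
    return size / seconds


per_composite = {n: round(minutes_per_composite(n, h, m), 1)
                 for n, h, m in BENCHMARKS}

raw_rate_fast = round(rate(57, 45), 2)   # raw TIF file, 45 s transfer
raw_rate_slow = round(rate(57, 50), 2)   # raw TIF file, 50 s transfer
hdf_rate = round(rate(1250, 300), 2)     # output HDF, assuming 1.25 Gb = 1250 Mb
```

The averages come out to 22.5, 21.0, 22.1, and 16.0 minutes per composite for the 8, 16, 32, and 48 composite runs. The raw-file rate falls between 1.14 and 1.27 per second and the output-file rate is about 4.17 per second, which agrees with the rounded values on the slide.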
Performance Research and Potential Plans
Benchmarked processing rates for producing up to 50 output scenes
Completed initial analysis of transfer and processing rates obtained using Netlogger
• Netlogger provides the ability to monitor applications within a complex distributed environment in order to determine exactly where time is spent
Room for Optimization
• Analyze process flows to optimize running in operational setting and implement optimization strategies below
• Complete input file compression on data host prior to file transfer
• Increase the parallelization
  • Parallel runs of multiple input scenes for a single composite
  • Parallel file transfer
• Add more CPUs and maximize CPU utilization
• Look at error handling and possibility of automatic re-starting of jobs
LGP Lessons Learned
The Open Source environment can be very beneficial
• Reuse, collaboration incentive
• “Hardened software” (e.g. GSI)
A surprising amount of time was spent on basic network administration and security
• Network performance
• Firewall/ports
Maintaining configuration management across independent agencies and centers is difficult
• MapCenter - System status tool (QA/Calibration)
Understanding the processing flow and modules is required for optimization
• One size doesn’t fit all (at least not yet)
• Allow for remote processing; dynamic ancillary data
• CPU intensive vs. data intensive
Karajan is somewhat immature, but we have passed on requests to CoG developers
• Karajan does provide the basic framework for creating workflows in an operational setting. Functionality not provided by the basic framework is being provided by external wrapper scripts
  • Developed workaround to pass environment variables across processing runs
  • Provided wrapper script to pass arguments to underlying Globus executables
• Very elementary Job Scheduler
Current and Future Work
LDCM Grid Prototype work will continue
Receiving NASA ROSES ACCESS A.26 funding for Land Cover Change Processing and Analysis System (LC-ComPS)
Use grid technology to allow regional and continental-scale land cover analysis at high resolution
• Use Globus 4.0 as the underlying Grid infrastructure
• Improve error handling in the workflow scripts and handle automatic re-starting of tasks in the event of failures
• Expand the “pool” of machines in VO
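Automatic re-starting of failed tasks, listed above as planned work, is commonly handled with a bounded retry wrapper. The sketch below is a generic illustration of that idea, not the team's design; run_with_restart, its parameters, and the flaky demo task are all hypothetical.

```python
import time


def run_with_restart(task, max_attempts=3, delay_seconds=0.0):
    """Call `task` until it succeeds or `max_attempts` is reached.

    Returns (result, attempts_used). Raises RuntimeError after the last
    failed attempt, chaining the original error.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), attempt
        except Exception as err:  # a real workflow would catch narrower errors
            last_error = err
            if attempt < max_attempts:
                time.sleep(delay_seconds)  # optional pause before restarting
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


# Demo task that fails twice and then succeeds.
calls = {"count": 0}


def flaky_task():
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("simulated transfer failure")
    return "done"


result, attempts = run_with_restart(flaky_task, max_attempts=5)
```

Capping the number of attempts keeps a permanently failing task from blocking the rest of a composite run.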
Backup Slides
Acknowledgements
Sponsors
• LDCM - Bill Ochs, Matt Schwaller
• Code 500/580 - Peter Hughes, Julie Loftis
LGP Team members
• Jeff Lubelczyk (Lead)
• Gail McConaughy (Branch Principal)
• Beth Weinstein (SW Lead)
• Ben Kobler (HW, Networks)
• Eunice Eng (SW Dev, Data)
• Valerie Ward (SW Dev, Apps)
• Ananth Rao ([SGT] SW Arch/Dev, Grid Expert)
• Brooks Davis ([Aerospace Corp] Grid Expert)
• Wayne Yu ([QSS] Sys Admin)
GSFC Science Input
• Jeff Masek (Blender)
• Feng Gao (Blender)
USGS EROS
• Stu Doescher (Mgmt)
• Chris Doescher (POC)
• John Dwyer
• Tom Mcelroy
• Mike Neiers (Sys Support)
• Cory Ranschau (Sys Admin)
University of Maryland (UMD)
• Paul Davis
• Gary Jackson
Acronym List
ACCESS    Advancing Collaborative Connections for Earth-Sun System Science
EROS      Earth Resources Observation and Science
FTP       File Transfer Protocol
GASS      Globus Access to Secondary Storage
GRAM      Grid Resource Allocation & Management
GSI       Grid Security Infrastructure
LC-ComPS  Land Cover Change Processing and Analysis System
LDCM      Landsat Data Continuity Mission
LEDAPS    Landsat Ecosystem Disturbance Analysis Adaptive Processing System
LGP       LDCM Grid Prototype
LP DAAC   Land Processes Distributed Active Archive Center
MDS       Monitoring & Discovery System
MODIS     Moderate Resolution Imaging Spectroradiometer
MRT       MODIS Reprojection Tool
ROSES     Research Opportunities in Space and Earth Sciences