Thomas Huang PO.DAAC Software System Engineer Jet Propulsion Laboratory California Institute of...


Page 1: Thomas Huang PO.DAAC Software System Engineer Jet Propulsion Laboratory California Institute of Technology These activities were carried out at the Jet.

Thomas Huang
PO.DAAC Software System Engineer

Jet Propulsion Laboratory
California Institute of Technology

These activities were carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. © 2010 California Institute of Technology. Government sponsorship acknowledged.


About me…

PO.DAAC Software System Engineer and Architect of its Data Management and Archive System

Background in planetary data management, secure near real-time distribution systems

Huang - 01062010


Outline
• Pattern for data ingestion to distribution
• Our legacy data system
• The new PO.DAAC Data Management and Archive System
• Conclusion
• Q&A


Simple Pattern


Can All These Broken Pieces Fit?


Legacy Data Systems


… It Works!?

• 3 different data systems according to the simple pattern
• Deployed in multiple instances
• Mostly consists of one-off scripts
• Limited reusability
• Limited portability
• Scalability?
• Reliability?


stovepipe

Legacy Data Systems


Our New Data Management and Archive System


Software Development Process


Technologies and Standards


Documents


Architecture
• A system of RESTful services
• Standardized message exchange between services
• Unified data model
• Distributed data ingestion services
• Standardized event tracking and notification service
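As an illustration of standardized message exchange between services, here is a minimal sketch of a shared message envelope serialized to JSON. The slides do not show the real message format, so the class name `ServiceMessage`, its fields, and the service names are all assumptions made for this example.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ServiceMessage:
    """Hypothetical envelope used by every service (field names are assumptions)."""
    sender: str
    recipient: str
    action: str
    payload: dict = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Sorted keys keep the wire format stable across services.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ServiceMessage":
        return cls(**json.loads(text))


# An engine reports a finished job to the manager (names are illustrative).
msg = ServiceMessage(
    sender="ingest-engine-1",
    recipient="manager",
    action="job.completed",
    payload={"granule": "example-granule-001", "status": "ARCHIVED"},
)
wire = msg.to_json()
decoded = ServiceMessage.from_json(wire)
```

Because every service reads and writes the same envelope, a new service only needs to understand one format to talk to the rest of the system.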


Manager Webservice

• Transaction-Oriented
• Load-Balanced job assignment
• On-The-Fly Deployment of Engines
• Dynamic support of new data product
• State-Driven Product Management
• Resource Management
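The load-balanced job assignment and state-driven product management listed above can be sketched roughly as follows. This is not the DMAS implementation: the state names, the `Engine` and `Manager` classes, and the "lowest load ratio" rule are assumptions chosen to make the idea concrete.

```python
from dataclasses import dataclass, field

# Hypothetical product states; the slides do not list the real ones.
TRANSITIONS = {
    "PENDING": {"ASSIGNED"},
    "ASSIGNED": {"INGESTED", "FAILED"},
    "INGESTED": {"ARCHIVED", "FAILED"},
    "FAILED": {"PENDING"},
    "ARCHIVED": set(),
}


@dataclass
class Engine:
    name: str
    capacity: int
    active_jobs: int = 0

    def has_room(self) -> bool:
        return self.active_jobs < self.capacity


@dataclass
class Manager:
    engines: list = field(default_factory=list)
    states: dict = field(default_factory=dict)

    def register(self, engine: Engine) -> None:
        """Engines can be added at any time (on-the-fly deployment)."""
        self.engines.append(engine)

    def advance(self, product: str, new_state: str) -> None:
        """Move a product to a new state, rejecting illegal transitions."""
        current = self.states.get(product, "PENDING")
        if new_state not in TRANSITIONS[current]:
            raise ValueError(f"{product}: {current} -> {new_state} not allowed")
        self.states[product] = new_state

    def assign(self, product: str) -> Engine:
        """Pick the engine with the lowest load ratio that still has room."""
        candidates = [e for e in self.engines if e.has_room()]
        if not candidates:
            raise RuntimeError("no engine has free capacity")
        engine = min(candidates, key=lambda e: e.active_jobs / e.capacity)
        engine.active_jobs += 1
        self.advance(product, "ASSIGNED")
        return engine


manager = Manager()
manager.register(Engine("ingest-a", capacity=12))
manager.register(Engine("ingest-b", capacity=12, active_jobs=5))
chosen = manager.assign("granule-001")  # the idle engine is preferred
```

Keeping the allowed transitions in one table means the manager can refuse out-of-order updates from engines instead of silently recording them.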


RESTful


File Management Engines

RESTful

• Lightweight RESTful file service
• Supports typical file operations (add, move, delete, etc.)
• A single instance can carry out multiple granule operations in parallel
• Supports various file protocols (FTP, SFTP, FILE, HTTP, etc.)
• Tracks and limits the number of jobs it can handle
• Tracks and limits the number of outbound communications
• Typical instances: ingest, archive, and purge
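A minimal sketch of an engine that runs file operations in parallel while tracking and limiting its job count is shown below. It assumes a thread pool and a simple counter; the `FileEngine` class, the `add_file` helper, and the file names are hypothetical, and only the local FILE case is covered.

```python
import shutil
import tempfile
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path


class FileEngine:
    """Hypothetical engine: parallel file operations, refuses work past a job limit."""

    def __init__(self, max_jobs: int) -> None:
        self.max_jobs = max_jobs
        self._active = 0
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=max_jobs)

    def submit(self, operation, *args) -> Future:
        with self._lock:
            if self._active >= self.max_jobs:
                raise RuntimeError("engine is at its job limit")
            self._active += 1
        future = self._pool.submit(operation, *args)
        future.add_done_callback(self._release)  # free the slot when the job ends
        return future

    def _release(self, _future: Future) -> None:
        with self._lock:
            self._active -= 1

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


def add_file(source: Path, destination: Path) -> Path:
    """The "add" operation for local files: copy a granule into the archive."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, destination))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    sources = []
    for i in range(3):
        src = root / "staging" / f"granule-{i}.dat"
        src.parent.mkdir(exist_ok=True)
        src.write_bytes(b"example")
        sources.append(src)

    engine = FileEngine(max_jobs=4)
    futures = [engine.submit(add_file, s, root / "archive" / s.name) for s in sources]
    archived = sorted(f.result().name for f in futures)
    engine.shutdown()
```

Rejecting a job outright, rather than queueing it, leaves the decision about where to send the work with whoever assigns jobs to engines.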


Product Inventory
• Unified Metadata Data Model
• References applicable models (e.g. ISO 19115, DIF, ECHO, GCMD…)
• Extensible to support capturing of collection/dataset/granule-specific data attributes
• Support geospatial data
• Support project-specific data archive and distribution policies
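One way to picture a unified, extensible metadata model is a small set of record types with open-ended attribute maps. The sketch below is an assumption for illustration only: the class names, field names, and policy keys are hypothetical and do not come from the DMAS schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BoundingBox:
    """Simple geospatial extent; does not handle boxes crossing the antimeridian."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        return self.west <= lon <= self.east and self.south <= lat <= self.north


@dataclass
class Dataset:
    short_name: str
    # Project-specific archive/distribution policy; key names are hypothetical.
    policy: dict = field(default_factory=dict)
    # Open-ended dataset-specific attributes.
    attributes: dict = field(default_factory=dict)


@dataclass
class Granule:
    name: str
    dataset: Dataset
    extent: Optional[BoundingBox] = None
    # Open-ended granule-specific attributes.
    attributes: dict = field(default_factory=dict)

    def attribute(self, key: str):
        """Granule-specific value first, then fall back to the dataset level."""
        if key in self.attributes:
            return self.attributes[key]
        return self.dataset.attributes.get(key)


dataset = Dataset(
    "EXAMPLE-L2P",
    policy={"retention_days": 30},
    attributes={"processing_level": "L2P"},
)
granule = Granule(
    "example-granule-001",
    dataset,
    extent=BoundingBox(-180.0, -90.0, 180.0, 90.0),
    attributes={"cloud_cover": 0.2},
)
```

The attribute maps are what make the model extensible: a new product can carry its own fields without a change to the shared record types.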


Data Handlers

• An application framework
• Plugin interface for product-specific metadata handling and validation
• Transforming product metadata into internal Submission Information Package (SIP)
• Data discovery
• Local caching of data products
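The plugin interface described above could look something like the following sketch: a base class that every product-specific handler implements, plus a registry that picks the handler by product type. All names here (`DataHandler`, `register`, `ExampleHandler`, the SIP keys) are hypothetical; the real SIP layout is not given in the slides.

```python
from abc import ABC, abstractmethod

HANDLERS = {}


def register(product_type):
    """Class decorator that adds a handler to the plugin registry."""
    def decorator(cls):
        HANDLERS[product_type] = cls
        return cls
    return decorator


class DataHandler(ABC):
    @abstractmethod
    def validate(self, metadata: dict) -> list:
        """Return a list of problems; an empty list means the metadata is valid."""

    @abstractmethod
    def to_sip(self, metadata: dict) -> dict:
        """Translate product metadata into the internal SIP structure."""


@register("example")
class ExampleHandler(DataHandler):
    REQUIRED = ("file_name", "start_time", "checksum")

    def validate(self, metadata):
        return [f"missing field: {k}" for k in self.REQUIRED if k not in metadata]

    def to_sip(self, metadata):
        # Hypothetical SIP layout, for illustration only.
        return {
            "product_type": "example",
            "granule": metadata["file_name"],
            "temporal": {"start": metadata["start_time"]},
            "files": [
                {"name": metadata["file_name"], "checksum": metadata["checksum"]}
            ],
        }


def handle(product_type: str, metadata: dict) -> dict:
    """Validate with the product's handler, then build the SIP."""
    handler = HANDLERS[product_type]()
    problems = handler.validate(metadata)
    if problems:
        raise ValueError("; ".join(problems))
    return handler.to_sip(metadata)


sip = handle(
    "example",
    {"file_name": "g1.nc", "start_time": "2010-01-06T00:00:00Z", "checksum": "abc123"},
)
```

Supporting a new product then means writing one more handler class; the framework code in `handle` stays unchanged.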


Data Handlers - GHRSST

• Adaptation

– MMR validation and translation

– Data file validation

– Scans local/remote locations for new data

– Integration with back-end RDAC cluster

• Inventory

– Full migration from existing MySQL database

• Port to use the new data model

– FGDC and Index generators

– Website


The Group for High-Resolution Sea Surface Temperature (GHRSST)

Ingest and maintain interfaces to 52 GHRSST L2P/L3P/L4 datastreams from 10 Regional Data Assembly Centers (RDACs)

~25 GB/day, >5000 granules/day

Real-time quality checking for data and metadata granules

Create Federal Geographic Data Committee (FGDC) metadata for daily collection granules

Distribution via FTP/OPeNDAP/POET

Maintain interfaces to the LTSRF for exchange of 30-day-old data and metadata


Data Handlers - ASCAT

• Adaptation

– Metadata validation and translation

– Data file validation

– Scans remote locations for new data

• Dataset definition and policies


The Advanced SCATterometer (ASCAT)

Ingest and maintain interfaces to 2 L2 datastreams from KNMI

~57 MB/day ~21 GB/year
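The two ASCAT volume figures are consistent with each other, as a quick check shows. The sketch assumes decimal units (1 GB = 1000 MB) and a 365-day year; neither assumption is stated on the slide.

```python
MB_PER_DAY = 57        # daily ASCAT volume quoted on the slide
DAYS_PER_YEAR = 365    # assumption: non-leap year
MB_PER_GB = 1000       # assumption: decimal units

gb_per_year = MB_PER_DAY * DAYS_PER_YEAR / MB_PER_GB
print(f"{gb_per_year:.1f} GB/year")  # rounds to the ~21 GB/year on the slide
```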


Significant Event WS


Significant Event Web


DAAC in a Box?


“premature optimization is the root of all evil.”

Donald Knuth, “The Art of Computer Programming”


Sample Performance
• Ingest: 3 (36 parallel jobs)
• Archive: 3 (36 parallel jobs)
• Purge: 2 (20 parallel jobs)
• 21,254 granules/day
• 4 seconds/granule
• Implementation Optimization
• Database Performance Tuning
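One consistent reading of the sample performance numbers is that "4 seconds/granule" is the average interval between completed granules over a full day, not the latency of a single granule. That interpretation is an assumption; the check below only shows that the two figures agree under it.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GRANULES_PER_DAY = 21_254  # figure quoted on the slide

seconds_per_granule = SECONDS_PER_DAY / GRANULES_PER_DAY
print(f"{seconds_per_granule:.2f} s between granules")  # close to 4 seconds

# Parallel job slots quoted on the slide, summed across engine types.
PARALLEL_JOBS = {"ingest": 36, "archive": 36, "purge": 20}
total_slots = sum(PARALLEL_JOBS.values())
```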


Conclusion

PO.DAAC DMAS
• A system of RESTful webservices
• Scalable
• Portable
• Extensible
• Operationally supports GHRSST and ASCAT

Future work
• New products: Aquarius
• GHRSST GDS 2.0 metadata model
• Migration
• Data subscription
• Administration tools


BACKUP SLIDES


FY ’09 Highlights
• Webservice Architecture
• Data Ingestion and Archive WS
• Distributed Ingestion/Archive Engines
• Load Balancing
• Service Monitoring
• Significant Event WS
• Suite of reusable components
• ECHO publication Dataset and Granule metadata
• GHRSST
• ASCAT L2

ASCAT

Huang - 09022009


Product Subscription

Enable implementation of value-added services


Archive Tools

Metadata Distribution


… can we build a data system with all these characteristics?

Scalable

Simple

Speed

Standardize

Our Challenge


• Load-Balance
• Transaction-Oriented
• On-The-Fly Deployment of Engines
• Dynamic support of new data product
• Scalable
• State-Driven Job Management


DMAS – Ingestion and Archive Service


DMAS – Significant Event Service


Swath Tiler

Metadata Submission

• Dataset subscriber
• Triggered by newly archived granules
• Dispatch swath tiling program
• Submit tiling metadata to NAIAD WS
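The subscriber flow in the bullets above (a newly archived granule triggers tiling, then metadata submission) can be sketched as a small publish/subscribe loop. Everything here is a stand-in: `ArchiveEvents`, `run_swath_tiler`, and `submit_tiling_metadata` are hypothetical names, and the real tiling program and NAIAD web service call are replaced with placeholders.

```python
from collections import defaultdict


class ArchiveEvents:
    """Hypothetical in-process stand-in for the archive notification mechanism."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, dataset, callback):
        self._subscribers[dataset].append(callback)

    def granule_archived(self, dataset, granule):
        # Notify every subscriber registered for this dataset.
        for callback in self._subscribers[dataset]:
            callback(granule)


submitted = []


def run_swath_tiler(granule):
    # Placeholder for dispatching the real swath tiling program.
    return {"granule": granule, "tiles": []}


def submit_tiling_metadata(metadata):
    # Placeholder for a request to the NAIAD web service.
    submitted.append(metadata)


def on_new_granule(granule):
    submit_tiling_metadata(run_swath_tiler(granule))


events = ArchiveEvents()
events.subscribe("example-dataset", on_new_granule)
events.granule_archived("example-dataset", "granule-001")
events.granule_archived("other-dataset", "granule-002")  # no subscriber, ignored
```

The archive side only announces that a granule exists; what happens next is decided entirely by the subscribers, which is what lets value-added services be bolted on later.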


DMAS – Data Subscriber

Integration with NAIAD


DMAS Goals
• Service tools
– administration
– product rollout
– contact management
• New data subscription capability
– Making DMAS the data hub - RSS feed, automatic delivery of new granules, thumbnail generation, etc.
• New dataset search capability
– evaluating VODC – ACCESS program
• New data products
• Legacy migration support
• Planning 4 DMAS releases

FY ’10

2 System Releases (DMAS + T&S)


Configuration Management
• How to manage versions of third-party software
– dependency matrix
– upgrade to one or more third-party software
• Standard development process between development teams
– change management
– software packaging
– dependency management
• Standard build and deployment process

FY ’10

CM?
