Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented...

40
Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer, Planetary Data System, Co Investigator, Object Oriented Data Technology Task Thuy Tran, Senior Software Engineer, Enterprise Data Architecture and Object Oriented Data Technology Tasks Jet Propulsion Laboratory, California Institute of Technology National Aeronautics and Space Administration Interoperability and Data Architecture for Metadata Development Biomarkers Knowledge System Meeting Bethesda, MD September 8, 2000

Transcript of Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented...

Page 1: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Dan Crichton, Manager, Enterprise Data Architecture Task,

Principal Investigator Object Oriented Data Technology Task

Steve Hughes, Lead System Engineer, Planetary Data System,

Co Investigator, Object Oriented Data Technology Task

Thuy Tran, Senior Software Engineer, Enterprise Data Architecture and

Object Oriented Data Technology Tasks

Jet Propulsion Laboratory, California Institute of Technology

National Aeronautics and Space Administration

Interoperability and Data Architecture for Metadata Development

Biomarkers Knowledge System MeetingBethesda, MD

September 8, 2000

Page 2: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 2

Outline

Describe JPL and enterprise computing

Define an Enterprise Data Architecture

Define why an EDA is Critical to NASA

Address Data Interoperability Challenges

Describe the Object Oriented Data Technology task

Case Study: The Planetary Data System

Provide an Overview of PDS and Objectives

Describe the PDS Organizational Structure

What has the PDS accomplished

Discuss the role of Metadata within the PDS

Demo search of PDS data sets and PTI data sets

D. Crichton

S. Hughes

T. Tran

Page 3: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 3

About JPL and Enterprise Computing

JPL is a federally funded research and development center (FFRDC) run by Caltech for NASA

JPL is NASA’s lead center for robotic exploration of the universe

JPL has an enormous amount of data that it needs to manage from scientific data, to engineering, to institutional

We represent several efforts in both the research and enterprise side of JPL that is addressing enterprise architectures for integrating data at both JPL and NASA. Such efforts include

Knowledge Management

Enterprise Data Architecture Task

Planetary Data System

Object Oriented Data Technology

Page 4: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 4

What is an Enterprise Data Architecture?

An enterprise data architecture provides the infrastructure necessary to enable development of interoperable, enterprise-wide applications Data Interoperability

Data Sharing

Data Access Facilitate access

Reduce complexity

Data Management Data Archiving

Basic infrastructure to support knowledge discovery

Page 5: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 5

Why is an EDA Critical to NASA?

Interoperability is an important key to unlock knowledge discovery Allows scientists the ability to locate critical information Enables knowledge management across NASA A key to scientific discovery

State of data systems at NASA agency wide Difficult to access (no standard interfaces) Geographically distributed Have no standard language or protocol for interchange (no

EDI) agency wide No common metadata language agency wide Have no system for registration of data products Have little or no interoperability Have few common terms for describing data

Page 6: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 6

Interoperability Challenges and Needs

Space scientists can not easily locate or use data across the hundreds if not thousands of autonomous, heterogeneous, and distributed data systems currently in the Space Science community.

Heterogeneous Systems Data Management - RDBMS, ODBMS, Home Grown DBMS, Binary Files

Platforms - UNIX, Linux, WIN3.x/9x/NT, Mac, VMS, …

Interfaces - Web, Windows, Command Line

Data Formats - HDF, CDF, NetCDF, PDS, FITS, VICR, ASCII, ...

Data Volume - KiloBytes to TeraBytes

Heterogeneous Disciplines Moving targets and stationary targets

Multiple coordinate systems

Multiple data object types (images, cubes, time series,binary,document) Multiple interpretations of single object types

Multiple software solutions to same problem.

Incompatible and/or missing metadata

Page 7: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 7

Solution: Build a Data Architecture

Solution build a data architecture by initially focusing on Metadata management

A middleware framework for interoperability

Page 8: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 8

Focus on Metadata

Metadata is data about data Provides descriptive information about the data

Classification, identification, etc Metadata Example

Data Value: 55 (not descriptive)

Metadata Values: Data Element Name:Vehicle_Speed Unit: Miles per Hour Description: The average velocity of a vehicle.

Build metadata repositories that manage information about distributed data products (E.g. location, target, observation date, etc)

Use standards where appropriate ISO/IEC 11179 – A framework for the Specification and Standardization of

Data Elements Dublin Core – A metadata element set intended to facilitate discovery of

electronic resources.

Page 9: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 9

Focus on Middleware

Middleware defined asIn the computer industry, middleware is a general term for any programming that serves to “glue together” or mediate between two separate and usually already existing programs. A common application of middleware is to allow programs written for access to a particular database to access other databases.

Messaging is a common service provided by middleware programs so that different applications can communicate. The systematic tying together of disparate applications is known as enterprise application integration.

http://www.whatis.com

Middleware allows for the encapsulation of individual data systems Hide uniqueness by introducing the data architecture layer

Ties distributed applications together an often works with a Electronic Data Interchange (EDI) type mechanism

Enables reuse and promotes standards

Page 10: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 10

Role of Middleware

Middleware can tie application, data, and user interfaces together and hide the unique interfaces

Applications User Interface

Data

Middleware

Page 11: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 11

NIST I.T. Architecture for Federal Gov’t

Business

InformationArchitecture

Information SystemsArchitecture

Enterprise Data Architecture

Delivery Systems Architecture

Drives ↓

Prescribes ↓

Identifies ↓

Supported by ↓

Redrawn from Federal Enterprise Architecture Framework version 1.1, September 1999, Chief Information Officers Council

Page 12: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 12

Object Oriented Data Technology Task

Research task funded by the Office of Space Science (OSS) at NASA

Provides a framework for managing data access and interoperability Archive Service: For managing data sets

Profile Service: For managing metadata profiles about data systems, data sets, and data products

Product Service: To tie individual data systems into a larger enterprise data system

Presented a paper at CODATA in March 2000 called “Science Search and Retrieval using XML”

Page 13: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 13

OODT Focus

Focus on building middleware components for an enterprise data architecture

Focus on building “profiles” for managing metadata information about cross-disciplinary resources

Provide sufficient layers of abstraction in the architecture to isolate technologies choices from the architecture choices

XML for the data content CORBA for the data transport

Research technologies for implementing a distributed data architecture Distributed Object Computing (CORBA, DCOM, etc) Database Technology (RDBMS, ODBMS) Data Access Technologies (O/JDBC, STEP, XML, etc) Directory Implementations (LDAP) Data Interchange (XML) Communication Technologies (Web/HTTP, MOM, RPC, etc)

Page 14: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 14

OODT Pilot Activity

Partner with the Planetary Data System (PDS) to address interoperability across 10 PDS silos

Build a generic XML Document Type Definition (DTD) that will support PDS data dictionary and metadata infrastructure

Demonstrate how a science query can return data across the PDS nodes

Demonstrate how the same interface can return information between planetary and astrophysics data systems

Page 15: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 15

OODT Metadata Development

Metadata Registry – Develop a data management system for managing the semantics of data that is shared within and between domains. Terminology Base – Domain specific name space.

Data Dictionary – Inventory of domain terms with definitions and other distinguishing attributes.

Ontology – A set of concepts, their relationships and constraints, all within the scope of a domain.

XML for metadata registry and communication

The PDS experience with the Planetary Science Data Dictionary has shown the criticality of metadata in enabling data sharing and system interoperability.

Page 16: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 16

What is the PDS?

PDS is the official planetary science data archive for the NASA Office of Space Science (OSS) Solar System Exploration (SSE).

PDS is chartered to ensure that SSE planetary data are archived and available to the scientific community.

PDS is a distributed system designed to optimize scientific oversight in the archiving process.

The PDS has been in existence for 10 years.

Page 17: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 17

Objectives of the PDS

Publish and disseminate documented data sets for use in scientific analysis

Work with projects to help design, generate, and validate data products for placement in archive

Develop and maintain archive data standards to ensure future usability.

Provide expert scientific help to the user community.

Page 18: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 18

What is meant by a documentedData Set?

The goal of the PDS archiving system is for each data set to be autonomous, i.e., all information required to understand and interpret the data should be included in the archive.

To that end, an archive package includes: Raw data Data calibrated to physical units Calibration data and algorithms Ancillary data, e.g. observation geometry Higher level data products (maps, projections, other aggregations) Documentation for the data, instrument, flight project, etc. (metadata)

Page 19: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 19

What is the structure of the PDS?

PDS is a distributed system designed to optimize scientific oversight in the archiving process

The PDS is managed by discipline scientists working with the project manager

PDS Science Discipline Nodes Archival of data and supporting documentation

Expertise in researching and interpreting the data

Expertise in the planning and design of future observations and data sets

Distribution of data to the community

PDS Central Node Program management

Project engineering

Standards development

Page 20: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 20

PDS Nodes and Institutions (Silos)

NAIF/JPL

Small Bodies/UMD

Atmospheres/New Mexico State

Geosciences/Washington University

Planetary Plasma/UCLA

Rings/Ames

Radio Science/Stanford

Central Node/JPL

Imaging/USGS

Imaging/JPL

Page 21: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 21

What has the PDS Accomplished?

Produced a high-quality peer-reviewed archive of Solar System Exploration Data

Stored for long-term viability

Described by metadata

Distributed either online or on CD media

Developed a robust standards architecture Planetary Science Data Dictionary - Provides the “domain of discourse”

for the planetary science community.

Planetary Community Model - Provides formalized descriptions of the entities and their relationships within the planetary science community.

Developed science driven management structure Responsive to changing mission project environment through

distributed, science discipline oriented nodes.

Page 22: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 22

The Use of Metadata in the PDS

Mission

Data Set Collection

Data SetInstrument

Spacecraft

Target

Document Image Time Series Spectrum

Planetary ScienceModel

Locate and Use Data- Use context to find data- Use context to understand data

System Interoperability- Use context to share data

Correlative Science- Use context to find new relationships between data

ModelAttributes

Label

Data

Page 23: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 23

What OODT is doing for PDS?

Problem Statement - In spite of the Web and a common standards architecture, the PDS continues to be a collection of heterogeneous data systems with little resource sharing.

Solution Prototype a PDS profile service that will manage metadata

profiles for data sets, data products, and data systems.

Prototype PDS product servers to integrate individual data systems.

Promote the use of archive services by mission projects for more efficient production of data products.

Page 24: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 24

OODT Demonstration

Search for PDS data sets

Search for PDS images (granules)

Search for Astrophysics data by star Searches the Palomar Testbed Interferometer (PTI) archive

Page 25: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 25

OODT Query Flow

Query Server

Profile Serverjpl

Product Serverjpl.pti

Product Serverjpl.pds

Product Serverjpl.pds.mola

Userquery

XSL(profiles ordata productsformatted)

XMLQuery(no results)

XMLQuery(profiles ordata resultsas requested)

XMLQuery(no results)

XMLQuery(profiles of resources to handle query)

XM

LQ

ue

ry (

da

ta r

esu

lts)

XM

LQ

ue

ry (

pro

du

ct s

ea

rch

)

Search Web Page

Profile DB

PTI Repository

PDS DVD Jukebox

PDS MOLA Oracle DB

Que

ryC

lien

t

Web

se

rver

sear

ch.js

p

Page 26: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 26

More Information

“Science Search and Retrieval using XML” by OODT Team. Presented at Second National Conference on Scientific and Technical Data, National Academy of Sciences, Washington D.C.

http://oodt.jpl.nasa.gov/doc/papers/codata/paper.pdf

Planetary Data System

http://pds.jpl.nasa.gov

Dublin Core

http://purl.oclc.org/dc

Extensible Markup Language

http://www.w3c.org/XML

ISO/IEC 11179: Specification and Standardization of Data Elements

Object Management Group (CORBA and UML standards)

http://www.omg.org

Federal CIO Statement on Metadata

http://www.cio.gov/docs/metadata.htm

National Information Standards Organization Z39.50 Information Retrieval Protocol

http://www.niso.org/z3950.html

Page 27: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 27

Backup Slides

Backup Slides

Page 28: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 28

JPL Org Chart (partial)

Caltech President

JPL Office of theDirector

Institutional Computing/ Chief Information Officer

Science and Earth Science Programs

Enterprise ApplicationsOffice

Enterprise Infrastructure Office

Engineering and ScienceDirectorate

Science Data Managementand Archiving

Planetary Data SystemObject Oriented Data

Technology

Page 29: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 29

Org Chart - Responsibility Flow

Science Data Managementand Archiving

Enterprise ApplicationsOffice

Planetary Data System

Object Oriented DataTechnology

Funding, ProgrammaticOversight

Deliverables

Deliverables

Deliverables

Program Offices Implementation Organizations

Task ManagementDesign and Implementation Responsibility

Page 30: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 30

Institutional Enterprise Data Architecture Breakdown

Paradigm shift from stove pipe implementations to horizontal solutions that cross organizational boundaries

Include such services as Enterprise Application Standards

Object Services

Data Infrastructure Services Database Hosting

Metadata Management

Data Interchange

Information Architecture Services Institutional directory and security access

Data system APIs for access

Data mining and data warehousing

Data Management Services Data archiving

Page 31: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 31

JPL Enterprise Architecture (Logical View)

Enterprise Information System Infrastructure Services

Enterprise Data & Application Architecture

Know ledge Management Services

JPL Institutional Information Technology Architecture

E nterpriseA pplica tionS tandards

O bject S ervicesD ata In frastructure

S ervices

In form ationM anagem ent

S ervices

D ata M anagem entS ervices

D irectoryS ervices

JPL Enterprise IT Applications JPL Missions/Projects JPL Partners

D istribu ted F ileS ervices

Enterprise Information System Netw ork Infrastructure

N etwork S ervices

N etwork D evices

D ata and A pplica tionS ecurity

M essagingS ystem s

M anagem ent

K now ledgeS tandards

K now ledgeN avigation

D ocum entationM anagem ent

P ro ject W ebS ites

C ollaborativeE nvironm ent

E xpertC onnections

U C S P D M S D M IE JP L D ata S ystem sP ro ject

D eve lopm entN A S A U niversity IndustryD N P

Page 32: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 32

What is a profile?

A profile is a set of resource definitions implemented in XML for data products residing in one or more distributed systems

Profile servers are CORBA servers that manage XML profile definitions

Profile servers communicate via XML-over-CORBA

Developed Java classes that map XML profiles to a Java object

Profile Distributed Node Architecture

Page 33: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 33

Profile Server Architecture

Page 34: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 34

Profile DTD

<!ELEMENT profiles (profile+)>

<!ELEMENT profile (profAttributes, resAttributes, profElement*)>

<!ELEMENT profAttributes (profId, profVersion*, profTitle*, profDesc*, profType*, profStatusId*, profSecurityType*, profParentId*, profChildId*, profRegAuthority*, profRevisionNote*, profDataDictId*)>

<!ELEMENT resAttributes (Identifier, Title*, Format*, Description*, Creator*, Subject*, Publisher*, Contributor*, Date*, Type*, Source*, Language*, Relation*, Coverage*, Rights*, resContext*, resAggregation*, resClass*, resLocation*)>

<!ELEMENT profElement (elemId*, elemName, elemDesc*, elemType*, elemUnit*, elemEnumFlag*, (elemValue | (elemMinValue, elemMaxValue))*, elemSynonym*, elemObligation*, elemMaxOccurrence*, elemComment*)>

Page 35: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 35

Profile Example

<<profile> <profAttributes> <profId>OODT_PDS_DATA_SET_INV_82</profId> <profDataDictId>OODT_PDS_DATA_SET_DD_V1.0</profDataDictId> </profAttributes> <resAttributes> <Identifier>VO1/VO2-M-VIS-5-DIM-V1.0</Identifier> <Title>VO1/VO2 MARS VISUAL IMAGING SUBSYSTEM DIGITAL IMAGING MODEL</Title> <Format>text/html</Format> <Language>en</Language> <resContext>PDS</resContext> <resAggregation>dataSet</resAggregation> <resClass>data.dataSet</resClass> <resLocation>http://pds.jpl.nasa.gov/cgi-bin/pdsserv.pl?OBJECT_ID=PDS100751</resLocation> </resAttributes> <profElement> <elemId>ARCHIVE_STATUS</elemId> <elemName>ARCHIVE_STATUS</elemName> <elemType>ENUMERATION</elemType> <elemEnumFlag>T</elemEnumFlag> <elemValue>ARCHIVED</elemValue> </profElement> <profElement> <elemId>TARGET_NAME</elemId> <elemName>TARGET_NAME</elemName> <elemType>ENUMERATION</elemType> <elemEnumFlag>T</elemEnumFlag> <elemValue>MARS</elemValue> </profElement></profile>

Page 36: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 36

OODT Product Server

The Product Server plugs into the OODT framework and manages the “handshake” between the data system and the OODT system.

Extensible by dynamically loading objects at runtime which are specific to the data system model

Queries and results are passed using an OODT XML Query structure

Product Server

Flat file accessor

Unusual accessor

Query

Result

Generic Server Implementation Class

FileSys

Database

Page 37: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 37

XML Query Structure

Defined as follows The query description

The results Result 1

Result Header

Result Data

Result 2 Result Header

Result Data

...

Result N Result Header

Result Data

Page 38: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 38

Why XML for OODT?

XML doesn’t provide a “silver bullet,” but it does allow us to refocus the problem on metadata Metadata is a key to interoperability

XML is language neutral

Allows the designer to separate the data and the transport (re: CORBA vs XML-over-CORBA) Transport mechanism and data are not tied together

Could be XML/HTTP

Simpler deployments

Simpler interfaces

Allows technologies to grow and change independently

Real value of XML is the content

Page 39: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 39

CORBA vs XML

XML over CORBA/IIOP

module jpl { module user { interface UserManager { string do(string xml);

}; };};

<transaction> <findUser> <user> <surname>Doe</surname> </user> </findUser></transaction>

CORBA method

module jpl { module user { interface UserManager { User findUser(string

name); }; interface User { String getName(); }; };};

Page 40: Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented Data Technology Task Steve Hughes, Lead System Engineer,

Page 40

Middleware Framework for OODT

OODT/Science Web Tools

ArchiveClient

OBJECT ORIENTED DATA TECHNOLOGY FRAMEWORK

ProfileXMLData

NavigationService

Data System

2

Data System

2

Other Service 1

Other Service 2

QueryService

ProductService

ProfileService

ArchiveService

Bridge to External Services