Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented...
-
Upload
chloe-verity-carter -
Category
Documents
-
view
214 -
download
0
Transcript of Dan Crichton, Manager, Enterprise Data Architecture Task, Principal Investigator Object Oriented...
Dan Crichton, Manager, Enterprise Data Architecture Task,
Principal Investigator Object Oriented Data Technology Task
Steve Hughes, Lead System Engineer, Planetary Data System,
Co Investigator, Object Oriented Data Technology Task
Thuy Tran, Senior Software Engineer, Enterprise Data Architecture and
Object Oriented Data Technology Tasks
Jet Propulsion Laboratory, California Institute of Technology
National Aeronautics and Space Administration
Interoperability and Data Architecture for Metadata Development
Biomarkers Knowledge System MeetingBethesda, MD
September 8, 2000
Page 2
Outline
Describe JPL and enterprise computing
Define an Enterprise Data Architecture
Define why an EDA is Critical to NASA
Address Data Interoperability Challenges
Describe the Object Oriented Data Technology task
Case Study: The Planetary Data System
Provide an Overview of PDS and Objectives
Describe the PDS Organizational Structure
What has the PDS accomplished
Discuss the role of Metadata within the PDS
Demo search of PDS data sets and PTI data sets
D. Crichton
S. Hughes
T. Tran
Page 3
About JPL and Enterprise Computing
JPL is a federally funded research and development center (FFRDC) run by Caltech for NASA
JPL is NASA’s lead center for robotic exploration of the universe
JPL has an enormous amount of data that it needs to manage from scientific data, to engineering, to institutional
We represent several efforts in both the research and enterprise side of JPL that is addressing enterprise architectures for integrating data at both JPL and NASA. Such efforts include
Knowledge Management
Enterprise Data Architecture Task
Planetary Data System
Object Oriented Data Technology
Page 4
What is an Enterprise Data Architecture?
An enterprise data architecture provides the infrastructure necessary to enable development of interoperable, enterprise-wide applications Data Interoperability
Data Sharing
Data Access Facilitate access
Reduce complexity
Data Management Data Archiving
Basic infrastructure to support knowledge discovery
Page 5
Why is an EDA Critical to NASA?
Interoperability is an important key to unlock knowledge discovery Allows scientists the ability to locate critical information Enables knowledge management across NASA A key to scientific discovery
State of data systems at NASA agency wide Difficult to access (no standard interfaces) Geographically distributed Have no standard language or protocol for interchange (no
EDI) agency wide No common metadata language agency wide Have no system for registration of data products Have little or no interoperability Have few common terms for describing data
Page 6
Interoperability Challenges and Needs
Space scientists can not easily locate or use data across the hundreds if not thousands of autonomous, heterogeneous, and distributed data systems currently in the Space Science community.
Heterogeneous Systems Data Management - RDBMS, ODBMS, Home Grown DBMS, Binary Files
Platforms - UNIX, Linux, WIN3.x/9x/NT, Mac, VMS, …
Interfaces - Web, Windows, Command Line
Data Formats - HDF, CDF, NetCDF, PDS, FITS, VICR, ASCII, ...
Data Volume - KiloBytes to TeraBytes
Heterogeneous Disciplines Moving targets and stationary targets
Multiple coordinate systems
Multiple data object types (images, cubes, time series,binary,document) Multiple interpretations of single object types
Multiple software solutions to same problem.
Incompatible and/or missing metadata
Page 7
Solution: Build a Data Architecture
Solution build a data architecture by initially focusing on Metadata management
A middleware framework for interoperability
Page 8
Focus on Metadata
Metadata is data about data Provides descriptive information about the data
Classification, identification, etc Metadata Example
Data Value: 55 (not descriptive)
Metadata Values: Data Element Name:Vehicle_Speed Unit: Miles per Hour Description: The average velocity of a vehicle.
Build metadata repositories that manage information about distributed data products (E.g. location, target, observation date, etc)
Use standards where appropriate ISO/IEC 11179 – A framework for the Specification and Standardization of
Data Elements Dublin Core – A metadata element set intended to facilitate discovery of
electronic resources.
Page 9
Focus on Middleware
Middleware defined asIn the computer industry, middleware is a general term for any programming that serves to “glue together” or mediate between two separate and usually already existing programs. A common application of middleware is to allow programs written for access to a particular database to access other databases.
Messaging is a common service provided by middleware programs so that different applications can communicate. The systematic tying together of disparate applications is known as enterprise application integration.
http://www.whatis.com
Middleware allows for the encapsulation of individual data systems Hide uniqueness by introducing the data architecture layer
Ties distributed applications together an often works with a Electronic Data Interchange (EDI) type mechanism
Enables reuse and promotes standards
Page 10
Role of Middleware
Middleware can tie application, data, and user interfaces together and hide the unique interfaces
Applications User Interface
Data
Middleware
Page 11
NIST I.T. Architecture for Federal Gov’t
Business
InformationArchitecture
Information SystemsArchitecture
Enterprise Data Architecture
Delivery Systems Architecture
Drives ↓
Prescribes ↓
Identifies ↓
Supported by ↓
Redrawn from Federal Enterprise Architecture Framework version 1.1, September 1999, Chief Information Officers Council
Page 12
Object Oriented Data Technology Task
Research task funded by the Office of Space Science (OSS) at NASA
Provides a framework for managing data access and interoperability Archive Service: For managing data sets
Profile Service: For managing metadata profiles about data systems, data sets, and data products
Product Service: To tie individual data systems into a larger enterprise data system
Presented a paper at CODATA in March 2000 called “Science Search and Retrieval using XML”
Page 13
OODT Focus
Focus on building middleware components for an enterprise data architecture
Focus on building “profiles” for managing metadata information about cross-disciplinary resources
Provide sufficient layers of abstraction in the architecture to isolate technologies choices from the architecture choices
XML for the data content CORBA for the data transport
Research technologies for implementing a distributed data architecture Distributed Object Computing (CORBA, DCOM, etc) Database Technology (RDBMS, ODBMS) Data Access Technologies (O/JDBC, STEP, XML, etc) Directory Implementations (LDAP) Data Interchange (XML) Communication Technologies (Web/HTTP, MOM, RPC, etc)
Page 14
OODT Pilot Activity
Partner with the Planetary Data System (PDS) to address interoperability across 10 PDS silos
Build a generic XML Document Type Definition (DTD) that will support PDS data dictionary and metadata infrastructure
Demonstrate how a science query can return data across the PDS nodes
Demonstrate how the same interface can return information between planetary and astrophysics data systems
Page 15
OODT Metadata Development
Metadata Registry – Develop a data management system for managing the semantics of data that is shared within and between domains. Terminology Base – Domain specific name space.
Data Dictionary – Inventory of domain terms with definitions and other distinguishing attributes.
Ontology – A set of concepts, their relationships and constraints, all within the scope of a domain.
XML for metadata registry and communication
The PDS experience with the Planetary Science Data Dictionary has shown the criticality of metadata in enabling data sharing and system interoperability.
Page 16
What is the PDS?
PDS is the official planetary science data archive for the NASA Office of Space Science (OSS) Solar System Exploration (SSE).
PDS is chartered to ensure that SSE planetary data are archived and available to the scientific community.
PDS is a distributed system designed to optimize scientific oversight in the archiving process.
The PDS has been in existence for 10 years.
Page 17
Objectives of the PDS
Publish and disseminate documented data sets for use in scientific analysis
Work with projects to help design, generate, and validate data products for placement in archive
Develop and maintain archive data standards to ensure future usability.
Provide expert scientific help to the user community.
Page 18
What is meant by a documentedData Set?
The goal of the PDS archiving system is for each data set to be autonomous, i.e., all information required to understand and interpret the data should be included in the archive.
To that end, an archive package includes: Raw data Data calibrated to physical units Calibration data and algorithms Ancillary data, e.g. observation geometry Higher level data products (maps, projections, other aggregations) Documentation for the data, instrument, flight project, etc. (metadata)
Page 19
What is the structure of the PDS?
PDS is a distributed system designed to optimize scientific oversight in the archiving process
The PDS is managed by discipline scientists working with the project manager
PDS Science Discipline Nodes Archival of data and supporting documentation
Expertise in researching and interpreting the data
Expertise in the planning and design of future observations and data sets
Distribution of data to the community
PDS Central Node Program management
Project engineering
Standards development
Page 20
PDS Nodes and Institutions (Silos)
NAIF/JPL
Small Bodies/UMD
Atmospheres/New Mexico State
Geosciences/Washington University
Planetary Plasma/UCLA
Rings/Ames
Radio Science/Stanford
Central Node/JPL
Imaging/USGS
Imaging/JPL
Page 21
What has the PDS Accomplished?
Produced a high-quality peer-reviewed archive of Solar System Exploration Data
Stored for long-term viability
Described by metadata
Distributed either online or on CD media
Developed a robust standards architecture Planetary Science Data Dictionary - Provides the “domain of discourse”
for the planetary science community.
Planetary Community Model - Provides formalized descriptions of the entities and their relationships within the planetary science community.
Developed science driven management structure Responsive to changing mission project environment through
distributed, science discipline oriented nodes.
Page 22
The Use of Metadata in the PDS
Mission
Data Set Collection
Data SetInstrument
Spacecraft
Target
Document Image Time Series Spectrum
Planetary ScienceModel
Locate and Use Data- Use context to find data- Use context to understand data
System Interoperability- Use context to share data
Correlative Science- Use context to find new relationships between data
ModelAttributes
Label
Data
Page 23
What OODT is doing for PDS?
Problem Statement - In spite of the Web and a common standards architecture, the PDS continues to be a collection of heterogeneous data systems with little resource sharing.
Solution Prototype a PDS profile service that will manage metadata
profiles for data sets, data products, and data systems.
Prototype PDS product servers to integrate individual data systems.
Promote the use of archive services by mission projects for more efficient production of data products.
Page 24
OODT Demonstration
Search for PDS data sets
Search for PDS images (granules)
Search for Astrophysics data by star Searches the Palomar Testbed Interferometer (PTI) archive
Page 25
OODT Query Flow
Query Server
Profile Serverjpl
Product Serverjpl.pti
Product Serverjpl.pds
Product Serverjpl.pds.mola
Userquery
XSL(profiles ordata productsformatted)
XMLQuery(no results)
XMLQuery(profiles ordata resultsas requested)
XMLQuery(no results)
XMLQuery(profiles of resources to handle query)
XM
LQ
ue
ry (
da
ta r
esu
lts)
XM
LQ
ue
ry (
pro
du
ct s
ea
rch
)
Search Web Page
Profile DB
PTI Repository
PDS DVD Jukebox
PDS MOLA Oracle DB
Que
ryC
lien
t
Web
se
rver
sear
ch.js
p
Page 26
More Information
“Science Search and Retrieval using XML” by OODT Team. Presented at Second National Conference on Scientific and Technical Data, National Academy of Sciences, Washington D.C.
http://oodt.jpl.nasa.gov/doc/papers/codata/paper.pdf
Planetary Data System
http://pds.jpl.nasa.gov
Dublin Core
http://purl.oclc.org/dc
Extensible Markup Language
http://www.w3c.org/XML
ISO/IEC 11179: Specification and Standardization of Data Elements
Object Management Group (CORBA and UML standards)
http://www.omg.org
Federal CIO Statement on Metadata
http://www.cio.gov/docs/metadata.htm
National Information Standards Organization Z39.50 Information Retrieval Protocol
http://www.niso.org/z3950.html
Page 27
Backup Slides
Backup Slides
Page 28
JPL Org Chart (partial)
Caltech President
JPL Office of theDirector
Institutional Computing/ Chief Information Officer
Science and Earth Science Programs
Enterprise ApplicationsOffice
Enterprise Infrastructure Office
Engineering and ScienceDirectorate
Science Data Managementand Archiving
Planetary Data SystemObject Oriented Data
Technology
Page 29
Org Chart - Responsibility Flow
Science Data Managementand Archiving
Enterprise ApplicationsOffice
Planetary Data System
Object Oriented DataTechnology
Funding, ProgrammaticOversight
Deliverables
Deliverables
Deliverables
Program Offices Implementation Organizations
Task ManagementDesign and Implementation Responsibility
Page 30
Institutional Enterprise Data Architecture Breakdown
Paradigm shift from stove pipe implementations to horizontal solutions that cross organizational boundaries
Include such services as Enterprise Application Standards
Object Services
Data Infrastructure Services Database Hosting
Metadata Management
Data Interchange
Information Architecture Services Institutional directory and security access
Data system APIs for access
Data mining and data warehousing
Data Management Services Data archiving
Page 31
JPL Enterprise Architecture (Logical View)
Enterprise Information System Infrastructure Services
Enterprise Data & Application Architecture
Know ledge Management Services
JPL Institutional Information Technology Architecture
E nterpriseA pplica tionS tandards
O bject S ervicesD ata In frastructure
S ervices
In form ationM anagem ent
S ervices
D ata M anagem entS ervices
D irectoryS ervices
JPL Enterprise IT Applications JPL Missions/Projects JPL Partners
D istribu ted F ileS ervices
Enterprise Information System Netw ork Infrastructure
N etwork S ervices
N etwork D evices
D ata and A pplica tionS ecurity
M essagingS ystem s
M anagem ent
K now ledgeS tandards
K now ledgeN avigation
D ocum entationM anagem ent
P ro ject W ebS ites
C ollaborativeE nvironm ent
E xpertC onnections
U C S P D M S D M IE JP L D ata S ystem sP ro ject
D eve lopm entN A S A U niversity IndustryD N P
Page 32
What is a profile?
A profile is a set of resource definitions implemented in XML for data products residing in one or more distributed systems
Profile servers are CORBA servers that manage XML profile definitions
Profile servers communicate via XML-over-CORBA
Developed Java classes that map XML profiles to a Java object
Profile Distributed Node Architecture
Page 33
Profile Server Architecture
Page 34
Profile DTD
<!ELEMENT profiles (profile+)>
<!ELEMENT profile (profAttributes, resAttributes, profElement*)>
<!ELEMENT profAttributes (profId, profVersion*, profTitle*, profDesc*, profType*, profStatusId*, profSecurityType*, profParentId*, profChildId*, profRegAuthority*, profRevisionNote*, profDataDictId*)>
<!ELEMENT resAttributes (Identifier, Title*, Format*, Description*, Creator*, Subject*, Publisher*, Contributor*, Date*, Type*, Source*, Language*, Relation*, Coverage*, Rights*, resContext*, resAggregation*, resClass*, resLocation*)>
<!ELEMENT profElement (elemId*, elemName, elemDesc*, elemType*, elemUnit*, elemEnumFlag*, (elemValue | (elemMinValue, elemMaxValue))*, elemSynonym*, elemObligation*, elemMaxOccurrence*, elemComment*)>
Page 35
Profile Example
<<profile> <profAttributes> <profId>OODT_PDS_DATA_SET_INV_82</profId> <profDataDictId>OODT_PDS_DATA_SET_DD_V1.0</profDataDictId> </profAttributes> <resAttributes> <Identifier>VO1/VO2-M-VIS-5-DIM-V1.0</Identifier> <Title>VO1/VO2 MARS VISUAL IMAGING SUBSYSTEM DIGITAL IMAGING MODEL</Title> <Format>text/html</Format> <Language>en</Language> <resContext>PDS</resContext> <resAggregation>dataSet</resAggregation> <resClass>data.dataSet</resClass> <resLocation>http://pds.jpl.nasa.gov/cgi-bin/pdsserv.pl?OBJECT_ID=PDS100751</resLocation> </resAttributes> <profElement> <elemId>ARCHIVE_STATUS</elemId> <elemName>ARCHIVE_STATUS</elemName> <elemType>ENUMERATION</elemType> <elemEnumFlag>T</elemEnumFlag> <elemValue>ARCHIVED</elemValue> </profElement> <profElement> <elemId>TARGET_NAME</elemId> <elemName>TARGET_NAME</elemName> <elemType>ENUMERATION</elemType> <elemEnumFlag>T</elemEnumFlag> <elemValue>MARS</elemValue> </profElement></profile>
Page 36
OODT Product Server
The Product Server plugs into the OODT framework and manages the “handshake” between the data system and the OODT system.
Extensible by dynamically loading objects at runtime which are specific to the data system model
Queries and results are passed using an OODT XML Query structure
Product Server
Flat file accessor
Unusual accessor
Query
Result
Generic Server Implementation Class
FileSys
Database
Page 37
XML Query Structure
Defined as follows The query description
The results Result 1
Result Header
Result Data
Result 2 Result Header
Result Data
...
Result N Result Header
Result Data
Page 38
Why XML for OODT?
XML doesn’t provide a “silver bullet,” but it does allow us to refocus the problem on metadata Metadata is a key to interoperability
XML is language neutral
Allows the designer to separate the data and the transport (re: CORBA vs XML-over-CORBA) Transport mechanism and data are not tied together
Could be XML/HTTP
Simpler deployments
Simpler interfaces
Allows technologies to grow and change independently
Real value of XML is the content
Page 39
CORBA vs XML
XML over CORBA/IIOP
module jpl { module user { interface UserManager { string do(string xml);
}; };};
<transaction> <findUser> <user> <surname>Doe</surname> </user> </findUser></transaction>
CORBA method
module jpl { module user { interface UserManager { User findUser(string
name); }; interface User { String getName(); }; };};
Page 40
Middleware Framework for OODT
OODT/Science Web Tools
ArchiveClient
OBJECT ORIENTED DATA TECHNOLOGY FRAMEWORK
ProfileXMLData
NavigationService
Data System
2
Data System
2
Other Service 1
Other Service 2
QueryService
ProductService
ProfileService
ArchiveService
Bridge to External Services