Development of the Next Generation PDS Data Standards
PDS4
Earth and Space Science Informatics Workshop
J. Steven Hughes
PDS4 Data Design Working Group
August 2-4, 2010
Topics
• Introduction
• Design Goals
• Key Architectural Concepts
• Data Driven Development
Copyright 2009 California Institute of Technology. Government sponsorship acknowledged.
Why upgrade the PDS Data Standards?
• The current PDS data standards (PDS3) were developed in the late 1980s to define the concepts and terms needed for archiving science data in the planetary science domain.
• The data standards were innovative for their time; however, after almost two decades of use:
  • Ambiguity had crept in
  • Data formats had become obsolete
  • Usability software had become difficult to maintain
• These issues have caused significant problems for PDS operations, data providers, and end-users.
Deliverables
• Information Model
  • The Information Model defines object classes, including data structures, formats, and products as well as data sets, documents, software, and missions.
• Data Dictionary
  • Model - The Data Dictionary Model provides the schema for the data dictionary.
  • Content - The Data Dictionary documents the data elements used in the Information Model.
• Standards Reference
  • The Standards Reference documents the overall standards architecture.
• Grammar Options
  • XML is the working grammar of the archive.
Design Goals
• Simplified Data Formats
• Long-term Stability in the Archive (Data structures should not become obsolete)
• Efficient Archive Preparation for Data Providers
• Efficient Data Service Development
• Enhanced Data Dictionary
Key Features
• Four base formats for all archived information
• Physical data segments map directly to logical segments
• Documents, software and ancillary data treated as rigorously as observational data
• Keyword content sorted into independent classes
• Hierarchical data dictionary with delegated authorities
Base Formats
All the data we deal with can be broken down into one or more of the base formats.
• Arrays
• Tables
• Parseable byte streams
• Encoded byte streams
Data Dictionary
• All keywords grouped into classes
• Separate (or partitioned) dictionaries to distribute authority
• Strict central control over structural descriptions and universally required sections
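The partitioning idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the `mpf` namespace, the steward labels, and the element definitions are all hypothetical and are not taken from the PDS4 dictionaries. It shows one registry handing each namespace to exactly one dictionary, so that authority over content is delegated while namespace assignment stays central.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataElement:
    namespace: str   # partition that owns the definition
    name: str
    definition: str


@dataclass
class Dictionary:
    namespace: str
    steward: str     # party responsible for this partition's content
    elements: dict = field(default_factory=dict)

    def define(self, name, definition):
        element = DataElement(self.namespace, name, definition)
        self.elements[name] = element
        return element


class DictionaryRegistry:
    """Central point that hands each namespace to exactly one dictionary."""

    def __init__(self):
        self._partitions = {}

    def register(self, dictionary):
        # Central control: a namespace can be claimed only once.
        if dictionary.namespace in self._partitions:
            raise ValueError(f"namespace already claimed: {dictionary.namespace}")
        self._partitions[dictionary.namespace] = dictionary

    def lookup(self, qualified_name):
        # "namespace:name" is resolved by the partition that owns the namespace.
        namespace, _, name = qualified_name.partition(":")
        return self._partitions[namespace].elements[name]


# Hypothetical partitions: one common dictionary and one mission dictionary.
common = Dictionary(namespace="pds", steward="common dictionary steward")
mission = Dictionary(namespace="mpf", steward="mission dictionary steward")
common.define("local_identifier", "Identifier unique within a label")
mission.define("sol_number", "Mission-specific day count")

registry = DictionaryRegistry()
registry.register(common)
registry.register(mission)

element = registry.lookup("mpf:sol_number")
print(element.namespace, element.name)
```

A mission team can add elements to its own partition without touching the common dictionary, while a second attempt to claim `pds` is rejected by the registry.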
Data Dictionary - Logical View
ISO/IEC 11179:2003 Volume 3, Metadata Registry Specification
Governance
• Registration Authority
• Steward
• Namespace
Dictionary layers
• Common
• Discipline / External Source
• Local Data Dictionaries (Mission)
All Products Are Equal
All products are treated with equal rigor in labelling and documenting.
• Ensures the ability to cross-reference throughout the archive holdings
• Supports interface selection and packaging options for users
• Necessary for tracking and processing formats that may require migration in future
OAIS Information Object
• The OAIS* Information Object unifies digital, conceptual and physical objects and their descriptions
  • Data Object: Digital, Physical, Conceptual
  • Representation: Structural, Semantic
• A product is a uniquely defined package of related information objects
  • Data Product, Software, Document
• A data set is a collection of products
* Open Archival Information System
Data Product Components
Registry Object & Web Resource
Classification
Product
Description Combinations
Data Object Description
Structure
Data
Data Driven Development Process
• The ontology defines the things in the domain and their relationships.
• A Data Dictionary defines data elements.
• The report writer uses the ontology and data dictionary to export and translate the information model into various notations and languages.
• Updates to the ontology are reflected in the artifacts automatically.
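A minimal Python sketch of the report-writer idea follows. It is not the PDS4 tooling: the `Attribute` and `ModelClass` types and both render functions are hypothetical. The attribute names and cardinalities are those of the `Array_Axis_Type` schema shown in this talk. One in-memory model is exported to two notations, so a change to the model shows up in both artifacts on the next run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    min_occurs: int
    max_occurs: int


@dataclass(frozen=True)
class ModelClass:
    name: str
    attributes: tuple


def render_xsd(model_class):
    """Export one model class as an XML Schema complexType."""
    lines = [
        f'<xsd:complexType name="{model_class.name}_Type">',
        "  <xsd:sequence>",
    ]
    for attr in model_class.attributes:
        lines.append(
            f'    <xsd:element name="{attr.name}" type="dd:{attr.name}_Type" '
            f'minOccurs="{attr.min_occurs}" maxOccurs="{attr.max_occurs}"> </xsd:element>'
        )
    lines += ["  </xsd:sequence>", "</xsd:complexType>"]
    return "\n".join(lines)


def render_text(model_class):
    """Export the same model class as a plain-text documentation listing."""
    lines = [model_class.name]
    for attr in model_class.attributes:
        status = "required" if attr.min_occurs > 0 else "optional"
        lines.append(f"  {attr.name} ({status})")
    return "\n".join(lines)


# The single source of truth; both artifacts below are derived from it.
ARRAY_AXIS = ModelClass(
    name="Array_Axis",
    attributes=(
        Attribute("elements", 1, 1),
        Attribute("name", 1, 1),
        Attribute("scale_type", 0, 1),
        Attribute("sequence_number", 1, 1),
        Attribute("unit", 0, 1),
    ),
)

print(render_xsd(ARRAY_AXIS))
print(render_text(ARRAY_AXIS))
```

Because neither artifact is edited by hand, the schema and the documentation cannot drift apart from the model.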
Example XML Schema - Image Grayscale
<xsd:complexType name="Image_Grayscale_Type">
  <!-- Structure_Base_Type:Array_Base -->
  <xsd:sequence>
    <xsd:element name="local_identifier" type="dd:local_identifier_Type" minOccurs="1" maxOccurs="1…
    <xsd:element name="comment" type="dd:comment_Type" minOccurs="0" maxOccurs="1"> </xsd:element>
    <xsd:element name="axes" type="dd:Array_2D_axes_Type" minOccurs="1" maxOccurs="1"> </xsd:element>
    <xsd:element name="axis_order" type="dd:Image_Grayscale_axis_order_Type" minOccurs="1" maxOccur…
    <xsd:element name="object_encoding_type" type="dd:Array_Base_object_encoding_type_Type" minOccu…
    <xsd:element name="Data_Location" type="pds:Data_Location_Type" minOccurs="1" maxOccurs="1"> </xsd:element>
    <xsd:element name="Array_Axis" type="pds:Array_Axis_Type" minOccurs="2" maxOccurs="2"> </xsd:element>
    <xsd:element name="Array_Element" type="pds:Array_Element_Type" minOccurs="1" maxOccurs="1"> </xsd:element>
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="Data_Location_Type">…

<xsd:complexType name="Array_Axis_Type">
  <xsd:sequence>
    <xsd:element name="elements" type="dd:elements_Type" minOccurs="1" maxOccurs="1"> </xsd:element>
    <xsd:element name="name" type="dd:name_Type" minOccurs="1" maxOccurs="1"> </xsd:element>
    <xsd:element name="scale_type" type="dd:scale_type_Type" minOccurs="0" maxOccurs="1"> </xsd:element>
    <xsd:element name="sequence_number" type="dd:sequence_number_Type" minOccurs="1" maxOccurs="1"> </xsd:element>
    <xsd:element name="unit" type="dd:unit_Type" minOccurs="0" maxOccurs="1"> </xsd:element>
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="Array_Element_Type">
  <xsd:sequence>
    <xsd:element name="data_type" type="dd:data_type_Type" minOccurs="1" maxOccurs="1"> </xsd:element>
    <xsd:element name="scaling_factor" type="dd:scaling_factor_Type" minOccurs="0" maxOccurs="1"> </xsd:element>
    <xsd:element name="unit" type="dd:unit_Type" minOccurs="0" maxOccurs="1"> </xsd:element>
    <xsd:element name="value_offset" type="dd:value_offset_Type" minOccurs="0" maxOccurs="1"> </xsd:element>
  </xsd:sequence>
</xsd:complexType>
Example XML Labels – Image Grayscale
<Image_Grayscale>
  <local_identifier>MPFL_M_IMP_IMAGE</local_identifier>
  <axes>2</axes>
  <axis_order>FIRST_INDEX_FASTEST</axis_order>
  <object_encoding_type>BINARY</object_encoding_type>
  <Data_Location>
    <file_local_identifier>F09128.IMG</file_local_identifier>
    <offset>1</offset>
  </Data_Location>
  <Array_Axis>
    <elements>248</elements>
    <name>LINE</name>
    <sequence_number>1</sequence_number>
  </Array_Axis>
  <Array_Axis>
    <elements>256</elements>
    <name>SAMPLE</name>
    <sequence_number>2</sequence_number>
  </Array_Axis>
  <Array_Element>
    <data_type>SignedMSB4</data_type>
  </Array_Element>
</Image_Grayscale>
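To show how software might consume such a label, here is a small Python sketch that reads the Image_Grayscale label above with the standard library and derives the array shape and size. The function name and the returned dictionary are hypothetical. The 4-byte width for `SignedMSB4` is an assumption based on the type name and is not stated in the slides.

```python
import xml.etree.ElementTree as ET

LABEL = """
<Image_Grayscale>
  <local_identifier>MPFL_M_IMP_IMAGE</local_identifier>
  <axes>2</axes>
  <axis_order>FIRST_INDEX_FASTEST</axis_order>
  <object_encoding_type>BINARY</object_encoding_type>
  <Data_Location>
    <file_local_identifier>F09128.IMG</file_local_identifier>
    <offset>1</offset>
  </Data_Location>
  <Array_Axis>
    <elements>248</elements>
    <name>LINE</name>
    <sequence_number>1</sequence_number>
  </Array_Axis>
  <Array_Axis>
    <elements>256</elements>
    <name>SAMPLE</name>
    <sequence_number>2</sequence_number>
  </Array_Axis>
  <Array_Element>
    <data_type>SignedMSB4</data_type>
  </Array_Element>
</Image_Grayscale>
"""

# Assumed byte widths; only the type used in this label is listed.
BYTES_PER_ELEMENT = {"SignedMSB4": 4}


def describe_array(label_xml):
    """Return the file, offset, axis names, shape and size described by a label."""
    root = ET.fromstring(label_xml)
    axes = sorted(
        root.findall("Array_Axis"),
        key=lambda axis: int(axis.findtext("sequence_number")),
    )
    # The declared axis count must match the Array_Axis entries present.
    if int(root.findtext("axes")) != len(axes):
        raise ValueError("axes count does not match Array_Axis entries")
    shape = tuple(int(axis.findtext("elements")) for axis in axes)
    element_count = 1
    for length in shape:
        element_count *= length
    data_type = root.findtext("Array_Element/data_type")
    return {
        "file": root.findtext("Data_Location/file_local_identifier"),
        "offset": int(root.findtext("Data_Location/offset")),
        "axis_names": tuple(axis.findtext("name") for axis in axes),
        "shape": shape,
        "size_bytes": element_count * BYTES_PER_ELEMENT[data_type],
    }


info = describe_array(LABEL)
print(info["axis_names"], info["shape"], info["size_bytes"])
```

Everything needed to locate and interpret the array comes from the label itself, with no format-specific knowledge coded into the reader beyond the byte-width table.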
Registry Configuration File - Associations
<!-- AssociationType definitions -->
<rim:RegistryObject xsi:type="rim:ClassificationNodeType" code="has_browse" parent="urn:oasis:names:tc:ebxml-regrep:classificationScheme:Ass…
    lid="urn:nasa:pds:profile:regrep:AssociationType:has_browse"
    id="urn:nasa:pds:profile:regrep:AssociationType:has_browse">
  <rim:Name>
    <rim:LocalizedString charset="UTF-8" value="has_browse"/>
  </rim:Name>
</rim:RegistryObject>

<rim:RegistryObject xsi:type="rim:ClassificationNodeType" code="has_calibration" parent="urn:oasis:names:tc:ebxml-regrep:classificationScheme:Ass…
    lid="urn:nasa:pds:profile:regrep:AssociationType:has_calibration"
    id="urn:nasa:pds:profile:regrep:AssociationType:has_calibration">
  <rim:Name>
    <rim:LocalizedString charset="UTF-8" value="has_calibration"/>
  </rim:Name>
</rim:RegistryObject>
Identifiable
• The Identifiable model defines objects that can be registered into a registry and stored into a repository.
• Based on ISO 15000-3-ebXML RIM; Dublin Core; W3C:XML/Schema
• Each Identifiable has a globally unique immutable identifier, a logical identifier for grouping versions, and all names that might have been assigned to the object.
• Identifiables can be located and retrieved by a single query against a federated registry system.
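The Identifiable concept can be illustrated with a short Python sketch. All class names, the use of UUID-based URNs for the global identifier, and the example logical identifiers are assumptions made for illustration. They are not the ebXML RIM or the PDS registry implementation.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifiable:
    guid: str      # globally unique; frozen, so it cannot be changed later
    lid: str       # logical identifier shared by all versions of an object
    version: int
    names: tuple   # every name that has been assigned to the object


class Registry:
    """A single registry node that stores Identifiables."""

    def __init__(self):
        self._objects = {}

    def register(self, lid, names):
        # A new registration under an existing lid becomes the next version.
        version = len(self.versions(lid)) + 1
        obj = Identifiable(
            guid=f"urn:uuid:{uuid.uuid4()}",
            lid=lid,
            version=version,
            names=tuple(names),
        )
        self._objects[obj.guid] = obj
        return obj

    def versions(self, lid):
        found = [o for o in self._objects.values() if o.lid == lid]
        return sorted(found, key=lambda o: o.version)


class FederatedRegistry:
    """Answers one query by asking every member registry."""

    def __init__(self, members):
        self._members = list(members)

    def find(self, lid):
        results = []
        for member in self._members:
            results.extend(member.versions(lid))
        return results


# Hypothetical logical identifiers and names.
node_a, node_b = Registry(), Registry()
first = node_a.register("urn:example:image:0001", ["F09128.IMG"])
second = node_a.register("urn:example:image:0001", ["F09128.IMG", "F09128_V2.IMG"])
node_b.register("urn:example:document:0001", ["README"])

federation = FederatedRegistry([node_a, node_b])
print([obj.version for obj in federation.find("urn:example:image:0001")])
```

The logical identifier groups the two versions, while each version keeps its own immutable global identifier. The caller queries the federation once and does not need to know which node holds the object.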
Schedule and Progress Chart
• Data Dictionary Model: ISO/IEC 11179 adopted
• Generic Product Model: Designed and in testing
• Fundamental Structures: Designed and in testing
• Data Formats: Initial set designed and in testing
• Data Element Nomenclature: Rules drafted
• Data Dictionary: Clean-up started
• Context Model: Design started
• XML/Schema: Designed and in testing
• Discipline Models: Initial set designed; more needed
• PDS4 Standards Reference, Tutorials, Concept of Operations, DPH: In progress; dependent on model
Timeline labels: Jan 2010, Sys Rev/MC, Sep, Jul, Acc Rev
Benefits of the PDS4 Data Model
• The data model is managed in an ontology modeling tool.
  • The model is formally defined.
  • The model can be validated and tested.
• Defines a few simple fundamental data structures.
  • Fundamental data structures may be extended and combined to form more complex data formats.
• The overall architecture is model driven.
  • Disentangles the model from its implementation.
  • Model can evolve over time as research domain changes.
  • Drives the generation of documentation, label schema, and other model dependent artifacts.
• The data dictionary uses a standard data dictionary model.
Proposed IPDA Data Standards Project
• Identify the core elements of the PDS4 data standards
• Develop a process for maintaining alignment between the IPDA and the PDS
Unique Namespace
Positioning the PDS for the Future
• Support for Advanced Technologies
  • Service Oriented Architectures; Semantic Searches, Text and Facet Based Searches; Machine reasoning; Automatic classification; Logical Consistency Checking.
  • Federated Registries: Unique Identification, Versioning, Federated queries, federated replication, federated linking, configuration management, subscribe/notification, logging.
• Support for Interoperability
  • Shared Ontology across Planetary Science Disciplines; Shared ontologies within Science Disciplines
  • Standard Data Dictionary Schema
    • Namespace partitioning; classification schemes; registration authority, submitter, steward
• Standards Based
  • ISO/IEC 11179-MDR; ISO 14721:2003-OAIS; ISO 15000-3-RIM; ISO/IEC 19502-MOF; ISO 639-RDF; OWL_DL; ISO/IEC 19501-UML; Dublin Core; W3C:XML/Schema; ISO 11404-Data Types; ISO 8601-Time
• Model Driven Implementation Philosophy
  • Metadata can be used in ways not yet envisioned.
• Supported Implementation Languages
  • XML, PVL, ODL, RDF/XML, OWL/XML, YADL
• Modeling Approach
  • Ontology - Object-Oriented semantics including class hierarchy, class inheritance; named and typed associations; class, attribute and value cardinalities; network and recursive.