
ebXML Registry and Repository for e-Government

Version 0.1b 6 July 2004

Document identifier:

wd-eGov-regrep-0p1b.doc

Location:

Editor:

Paul Spencer, Office of the e-Envoy, UK ([email protected])

Carl Mattocks, CheckMi ([email protected])

Contributors:

Farrukh Najmi, SUN Microsystems ([email protected])

Maewyn Cumming, Cabinet Office, UK ([email protected])

Abstract:

This document contains work-in-progress on the project to provide a proof of concept registry repository for e-Government. It is a working document and will change frequently.

Status:

This document is updated periodically on no particular schedule. Committee members should send comments on this specification to the mailto:[email protected] list.

Table of Contents

1 Initial Proposal

1.1 Overview

1.2 Introduction

1.3 Project Outline

1.4 Deliverables

1.5 Timescales

1.6 Resources

2 eGovernment – ebXML Registry Technical Note Guidelines for e-Government Service use of ebXML Registry / Repository

2.1 Background

2.2 Service Centric Concepts

2.3 Federated Content Management

2.4 EbXML Registry Version 3

2.5 e-Government Service Requirements

2.6 Schema Component Definitions

2.7 Registration and Storage of EGSM Schema Components

2.8 Schema XML Component Name

2.9 XML Component Names

2.10 Use of Namespaces and Qualifiers

2.11 Version Management of Schema Element

2.12 Registration Of Schema Information

2.13 Publishing of Artifacts

2.14 Access to Registry / Repository

2.15 Classification of Artifacts

2.16 Storage of the knowledge embedded in a registered Data Dictionary

2.17 Discovery and Deployment of Schema Components

2.19 Community Authoring

2.20 XML Component Suitability

2.21 BCM Template

2.22 Use of CAM Templates

3 Important Features

3.1 Phase 1

3.2 Phase 2

4 Mapping of Metadata to the ebRIM

4.1 Direct Mapping to ebRIM

4.2 Mapping via CCTS

4.3 URIs

Appendix E: Revision History

Appendix F: Notices

1 Initial Proposal

Status: v0.4 draft

Editor: Paul Spencer

1.1 Overview

This proposal is for a proof of concept / showcase project to demonstrate how the ebXML registry, with suitable client applications, can meet the needs of governments for a data dictionary and XML schema registry / repository. The UK government has offered to be the trial site for the PoC.

1.2 Introduction

The UK Government has a two-tier approach to XML data dictionaries and XML schemas. At the top is the Government Data Standards Catalogue (http://www.govtalk.gov.uk/gdsc/html/) with its associated XML schemas. These are managed by the Office of the e-Envoy (OeE) through the UK GovTalk™ website (http://www.govtalk.gov.uk). The catalogue holds definitions used commonly throughout government. Below this, each branch of Government holds its own data dictionaries, some of which have XML schemas associated with them. There are no common standards used for the data dictionaries, but W3C XML Schema is the primary standard for schemas.

There is clearly a need for these dictionaries and schema repositories to be based on the same standards and to inter-operate to help the UK and other governments meet their interoperability aims.

1.3 Project Outline

Two branches of UK Government that already have data dictionaries and XML schema registries/repositories are the OeE and the Ministry of Defence (MOD). In neither case are these based around the ebXML registry. This project is therefore to produce a proof of concept (PoC) / showcase of how the ebXML registry and products that implement its specifications can be used, with suitable client applications, to meet the requirements of these two, and by extension, other, government organizations.

The UK Government and these two organizations have been chosen because:

1. they have existing data dictionaries / schema repositories and so have experience of their use;

2. a recent paper for the MOD described a set of requirements that can be used as a basis for the PoC;

3. the MOD paper also outlined requirements for the OeE that can be expanded and confirmed; and

4. the UK Government and both organizations are willing to participate.

In both cases, it is important that the full requirements are met. This is likely to mean development of interfaces (such as that required to make the OeE information available via the UK GovTalk™ web site) and client applications (such as that required for the MOD's approval process).

High level requirements currently identified for the PoC are:

· it must hold the data currently held in the MOD ACCORD system and the OeE Government Data Standards Catalogue;

· it must hold XML schema representations of these items and relate them to their definitions;

· it must be possible to create schema documents from components held in the repository;

· it must be possible to hold multiple versions of schema components and complete schema documents;

· it must comply with the UK Government e-GIF standards (http://www.govtalk.gov.uk/schemasstandards/egif_document.asp?docnum=731);

· it must support the UK Government e-GMS metadata standard (http://www.govtalk.gov.uk/schemasstandards/metadata_document.asp?docnum=832) (it is likely that the requirement and format for serialization of the metadata will be reviewed as part of the project);

· it must support the existing processes for the approval, update and removal of entries;

· the solution must be scalable to at least 100,000 data items;

· it must be possible to integrate the registry and repository with others in the UK Government domain, the international military domain and other domains of interest;

· it should be possible to perform a "what-if" analysis, whereby the impact of a planned change or deletion can be assessed; and

· it should also be possible to identify unused definitions so that they can be purged.
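The last two requirements can be pictured as queries over a "uses" relation between registered items. The following is a minimal sketch only: the item names and the `uses` mapping are hypothetical, and a real registry would answer these queries from its stored associations rather than from an in-memory dictionary.

```python
from collections import deque

# Hypothetical "uses" relation: each key is a registered item and each value
# lists the items it refers to directly.
uses = {
    "PersonSchema": ["PersonName", "Address"],
    "Address": ["PostCode"],
    "PersonName": [],
    "PostCode": [],
    "LegacyCode": [],
}

def impacted_by(item, uses):
    """What-if analysis: every item that directly or indirectly uses `item`."""
    # Invert the relation so we can walk from an item to its users.
    used_by = {name: set() for name in uses}
    for user, targets in uses.items():
        for target in targets:
            used_by.setdefault(target, set()).add(user)
    seen, queue = set(), deque([item])
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen

def unused(uses, roots):
    """Items that cannot be reached from any root (e.g. a published schema)."""
    reachable, stack = set(), list(roots)
    while stack:
        current = stack.pop()
        if current not in reachable:
            reachable.add(current)
            stack.extend(uses.get(current, ()))
    return set(uses) - reachable
```

With this data, changing `PostCode` would affect `Address` and `PersonSchema`, and `LegacyCode` would be a candidate for purging when `PersonSchema` is the only published schema.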

1.4 Deliverables

Four deliverables are proposed:

1. A set of requirements to be met by the PoC

2. A description of the PoC

3. The products that prove that the concepts can be achieved

4. A final report on the project

1.5 Timescales

The duration of the project will be xxx months from starting the requirements paper. This will be divided as follows:

· xxx weeks to agree requirements

· xxx weeks to design the system(s)

· xxx weeks to implement the phase 1 systems

· xxx weeks to implement the phase 2 systems

· xxx weeks to populate and use the systems

· xxx weeks to produce a final report

1.6 Resources

We propose that the eGov TC sets up a sub-committee to run this project. Members of this sub-committee should be drawn from both supplier and client organizations.

The project will need developer effort that has not yet been identified.

Resource requirements and how they can be met must be identified as early as possible and confirmed at each stage of the project.

2 eGovernment – ebXML Registry Technical Note Guidelines for e-Government Service use of ebXML Registry / Repository

Status: v0.1 draft

Editor: Carl Mattocks

The goal of this Section is to provide guidelines on how the standards being developed by the OASIS ebXML Registry, Business Centric Methodology and CAM TCs can help meet the needs of e-Government Service providers. The primary focus of the guidelines is to support a usage scenario that includes –

· Registration and storage of Schema Components used in many distinct Schemas

· Storage of the knowledge embedded in a registered Data Dictionary

· Use of Data Dictionary items when managing schema components

· Use of Registry / Repository ‘Context Declaration’ when managing schemas employing UN/CEFACT Core Components

· Use of schema assertion facilities such as CAM (Content Assembly Mechanism) for binding structural, contextual and referential information to schema components

· Classification of EGSM, Data Dictionary Items, Schema Components, Context Declarations and Content Assembly Mechanism to facilitate discovery and deployment

2.1 Background

Within the OASIS open source specifications body there are a number of Technical Committee (TC) groups actively contributing to the evolution of e-Government service oriented standards. This technical note focuses on the specifications of (i) the Business-Centric Methodology (BCM), (ii) the ebXML Registry and (iii) the Content Assembly Mechanism TCs, which help explain how a Registry / Repository can be used for the management of schema components. Specifically, the goal of this Technical Note is to provide standards-based guidelines on the management of web service artifacts such as business language (nouns & verbs), commerce metadata elements and schema properties.

2.2 Service Centric Concepts

A major emphasis of BCM is that a proper interpretation of the business language semantics found in a SOA (Service Oriented Architecture) metadata framework / classification system is essential for harnessing tacit knowledge and facilitating shared communications. Particularly, the BCM identifies that a Conceptual Layer that enables the exploitation of community-of-interest specific classifications, e-business taxonomies and systemic patterns is a key factor in semantic interoperability. Further, the contents of that BCM Conceptual Layer must be rich enough to resolve all semantic (meaning & operability) conflicts over terminology used to populate the many building blocks of the Lubash Pyramid.

While not defining a mandatory structure, BCM Version 1 states that the Conceptual Layer consists of semantic relationships and controlled vocabularies that increase the meaning of metadata and provide context to items that have metadata properties. The simplest form of this is a data dictionary that contains metadata about data elements and the relationships between their simple and complex data types. BCM expects that, when recorded in a registry, the Conceptual Layer has the role of:

· Providing trace-ability from business vision to system implementation

· Ensuring alignment of business concepts with automated procedures

· Facilitating faster information utilization between business parties

· Enabling accurate information discovery and synchronization

· Expanding the ability to integrate information by interest, perspective or requirement.

2.3 Federated Content Management

The BCM also identifies that a registry combined with a repository is a key factor in the management of service-oriented components, such as metadata about schemas, data elements, their associative links and any stored artifacts. A registry not only acts as an interface to a repository of stored content, it also formalizes how information is to be registered and shared. Since this sharing may extend beyond a single enterprise or agency, the registry catalog must be capable of supporting metadata used for federated content management.

Specifically, a federated content management capability is required when there is a need for managing and accessing metadata across physical boundaries in a secure manner. Those physical boundaries might be the result of community-of-interest, system, department, or enterprise separation. Irrespective of the boundary type, federated content management enables information users to seamlessly access, share and perform analysis on information, which may include:

· Map of the critical path of information flowing across a business value chain

· Quality indicators such as statements of information integrity, authentication and certification

· Policies supporting security and privacy requirements

2.4 EbXML Registry Version 3

The ebXML Registry is a registry plus a repository. Version 3 of the ebXML Registry / Repository supports the following types of cooperating registry services:

· Registration and classification of any type of object

· Objects defined by data type

· Namespaces defined for certain types of content

· Messages defined as XML Schemas

· Taxonomy hosting, browsing and validation

· Association between any two objects

· Registry packages to group any objects

· Links to external content

· Built-in security

· Event notification

· Event-archiving – enabling the production of a complete audit trail

· Service registration and discovery

· Life cycle management of objects

· Flexible query options

Note: For inter-registry relocation, replication and references, federation metadata is stored in one registry. A registry may cooperate with multiple federations for the purpose of federated queries, but not for lifecycle management.

2.5 e-Government Service Requirements

A key objective of e-government service management is to achieve common understanding between the customer and provider through managing service level expectations and delivering and supporting desired results. This in turn requires a common understanding of the elements that make up those services. To achieve this using a Registry / Repository, each registered e-Government Service Metadata (EGSM) artifact should meet the following requirements:

· An XML schema may be derived or expressed from the EGSM artifact, yet the EGSM artifact must not preclude other formats of instance data from being used within an operational system in the future.

· The EGSM artifacts shall be readable by both humans and application actors within an infrastructure, and the applications shall be able to consistently derive structure from the EGSM artifacts.

· The EGSM artifacts can explicitly point at or otherwise reference a UML or other modeling artifact via a variety of protocols (examples – HTTP/S, LDAP, FTP).

· The e-Government Service Metadata shall have a binding to a set of RIM metadata and/or shall minimize replication of Registry meta-metadata instances except where required for data portability.

· The e-Government Service Metadata shall not constrain the final representation in any way, yet must be capable of facilitating multiple implementation serializations (syntax bindings) as represented via the UN/CEFACT core components technical specification diagram.

· The EGSM artifact shall be capable of conveying semantics of registered Data Dictionary Data elements.

· The EGSM artifact must be in a format capable of expressing multi-byte character encoding such as UTF-16 in order to facilitate internationalization.

· The EGSM artifact must be capable of being transformed easily into other EGSM artifact formats (such as the UN/CEFACT ATG2 Core Components/Business Information Entities Meta-metadata format.)

· The EGSM artifact must be capable of declaring semantic equivalencies to other existing metadata objects. This requirement is based on an understanding that integration with existing systems will be essential.

· The EGSM artifact must be capable of containing an intrinsic relationship to context declarations in order to facilitate the above requirements, possibly in addition to the registry relationships expressed within a registered data dictionary, ebXML RIM and ISO/IEC 11179 parts 1-5.

· The EGSM artifact must facilitate both basic (atomic) Data Elements as well as more complex aggregates. The aggregates are to be designated as UN/CEFACT aggregate core components (ACCs) and represented as aggregate business core components using XML schema.

· The EGSM artifact should be written in a way so programmers can write implementations, yet if the EGSM artifact model changes, the implementations will not be broken. This is referred to as forwards compatibility.

2.6 Schema Component Definitions

At a business level, the primary function of XML is to provide a meta-language for rigorously specifying the syntax of information exchange. Since information exchange involves multiple parties (at a minimum one sender and one receiver), XML specifies agreements between parties within a community of interest for a particular domain of information. XML itself does not require or provide a mechanism for defining semantics (precisely what is meant by a particular term); however, to achieve interoperability, both the syntax and semantics must be explicitly defined. The process of selecting proper component names and reaching agreements on the definitions is primarily a business function of XML and MUST involve all stakeholders.

The terms (XML) schema and (XML) schema document are often used interchangeably to refer to XML documents containing schema elements expressed in XML as described in the W3C Recommendation. There is also a more precise technical meaning for schema, as the exact abstract data structure required to schema-validate an element of an XML document (this is described in detail in the W3C XML Schema Recommendation Part 1). For the purposes of this document, schema is normally used loosely, to mean a schema element within an XML document. The term schema document is used to mean an XML document containing one or more schema elements.

ebXML Registry schema component management involves using a Registry / Repository for the registering and storage of schema elements, XML documents and related artifacts. It specifically includes the tasks of:

· registering proposed schema components as drafts;

· reviewing proposed schema components;

· registering approved schema components;

· discovering schema components;

· assembling complete schemas from components; and

· managing the lifecycle of the components and schemas.
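The draft, review and approval tasks listed above imply that a component moves through a small set of states. A minimal sketch of such a lifecycle follows; the state names and allowed transitions are assumptions made for illustration and are not taken from the ebXML Registry specification.

```python
# Hypothetical lifecycle states and the moves allowed between them.
ALLOWED = {
    "draft": {"approved", "removed"},
    "approved": {"obsolescent"},
    "obsolescent": {"removed"},
    "removed": set(),
}

def advance(state, new_state):
    """Return new_state if the move is allowed, otherwise raise ValueError."""
    if new_state not in ALLOWED.get(state, set()):
        raise ValueError(f"cannot move from {state!r} to {new_state!r}")
    return new_state
```

Keeping the transitions in a table rather than in code makes it straightforward to adapt the sketch to whatever approval process an organization already follows.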

2.7 Registration and Storage of EGSM Schema Components

To meet the need of common understanding every registered schema MUST contain the following metadata:

· Schema Name

· Namespace(s)

· A description of the purpose of the schema

· The name of the application or program of record that created and/or manages the schema

· The version of the service application or program of record

· A short description of the service application interface that uses the description. A URL reference to a more detailed interface description may be provided

· Developer point of contact information to include activity, name and email
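A registration client could check for these items before submitting a schema. The sketch below assumes the metadata arrives as a simple dictionary; the field names are hypothetical labels for the seven items listed above, not names defined by the registry.

```python
# Hypothetical field names, one per required metadata item listed above.
REQUIRED_FIELDS = (
    "schema_name",
    "namespaces",
    "purpose",
    "application_name",
    "application_version",
    "interface_description",
    "contact",
)

def missing_metadata(record):
    """Return the required fields that are absent or empty in `record`."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]
```

An empty result means the record carries every required item; anything else lists what the submitter still has to supply.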

2.8 Schema XML Component Name

This section provides guidance on use of the registry, and is non-normative.

To maximize understanding and facilitate automated analysis of schema components during harmonization efforts, the selection of XML component names MUST be a thoughtful process involving business, functional, data and system subject matter experts. Use of ISO 11179 conventions is encouraged. For instance, XML components MAY be named after ISO 11179 data element names. XML Elements SHOULD be named after ISO 11179 data element definitions when business terms do not exist. XML Attributes SHOULD be named after ISO 11179 data elements. XML Schema data types MUST be named after ISO 11179 data elements.

Specifically, ISO 11179 part 5 provides a standard for creating data elements. This standard employs a dot notation and white space to separate the various parts of the element and multiple words in a part respectively. In order to meet XML requirements for component naming, the ISO 11179 name must be converted to a Name Token. The ISO 11179 part 5 standard provides a way to precisely create a data element definition and name. Using or referencing this name in a schema provides analysts with a better understanding of XML component semantics, while using business terms as element names improves readability.
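The conversion from a dot-and-space ISO 11179 name to an XML name can be sketched as follows. This is one possible approach, assuming the parts are simply concatenated in camel case; the function name is hypothetical and the exact conversion rule is not defined in this document.

```python
import re

def to_name_token(iso_name, lower_camel=False):
    """Join the words of an ISO 11179 style name into a camel case token."""
    # Dots and white space act as separators and are dropped.
    words = re.findall(r"[A-Za-z0-9]+", iso_name)
    token = "".join(word[:1].upper() + word[1:] for word in words)
    if lower_camel and token:
        token = token[0].lower() + token[1:]
    return token
```

For example, `to_name_token("Goods. Delivery. Date")` gives `GoodsDeliveryDate`. A name whose first word starts with a digit would still need adjusting before it could be used as an XML element name.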

2.9 XML Component Names


Authors creating new elements SHOULD follow the ebXML guidance for usage of acronyms or abbreviations in XML component names with the following caveats. Acronyms and abbreviations SHOULD generally be avoided in XML element and attribute names. For XML Schema data types, abbreviations MUST be avoided while acronyms MAY be used consistent with the rest of this guidance. When acronyms are used they MUST be in upper case. Abbreviations SHOULD be treated as words and expressed in upper camel case. The decision to use an acronym or abbreviation MUST be based on the belief that its use will promote common understanding of the information both inside a community of interest as well as across multiple communities of interest. When an acronym or abbreviation does not come from a credible, identifiable source or when it introduces a margin for interpretation error, it MUST NOT be used.

Acronyms and abbreviations used in component names MUST be spelled out in the component definition that is required to be included via schema annotations (as XML comments or inside XML Schema annotation elements). References to authoritative sources from which the acronyms or abbreviations are taken SHOULD also be included in schema documentation.

2.10 Use of Namespaces and Qualifiers


When creating a namespace it is recommended that authors use a qualifier (a prefix, normally xsd: or xs:) for the XML Schema namespace. This makes the usage of namespaces more explicit, and allows schema designers more flexibility in using namespaces within the schema (see http://www.govtalk.gov.uk/documents/Schema%20Guidelines%202.doc).

Make the default namespace for the schema the same as the targetNamespace. This allows architectural schemas with no namespace to be included without causing namespace problems.

Use a suitable qualifier for other namespaces.

Set elementFormDefault to qualified and attributeFormDefault to unqualified. This ensures that the user of a schema does not need to understand its internal structure.
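The recommendations in this section can be checked mechanically on a schema's root element. The sketch below embeds a small example schema and tests it with the Python standard library; the namespace URI http://example.org/egov/person and the function name are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Example schema header; the target namespace is a made-up URI.
SCHEMA_TEXT = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns="http://example.org/egov/person"
    targetNamespace="http://example.org/egov/person"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified">
  <xsd:element name="PersonName" type="xsd:string"/>
</xsd:schema>
"""

def check_schema_header(text):
    """Return a list of guideline violations found on the root element."""
    declared = {}  # prefix -> namespace URI declared on the root element
    root = None
    source = io.BytesIO(text.encode("utf-8"))
    for event, value in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = value
            declared[prefix] = uri
        else:
            root = value  # the first "start" event is the root element
            break
    problems = []
    if not any(prefix and uri == XSD_NS for prefix, uri in declared.items()):
        problems.append("XML Schema namespace has no prefix")
    if declared.get("") != root.get("targetNamespace"):
        problems.append("default namespace differs from targetNamespace")
    if root.get("elementFormDefault") != "qualified":
        problems.append("elementFormDefault is not qualified")
    if root.get("attributeFormDefault") != "unqualified":
        problems.append("attributeFormDefault is not unqualified")
    return problems
```

The example header passes all four checks, so `check_schema_header(SCHEMA_TEXT)` returns an empty list.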

2.11 Version Management of Schema Element

The version management capabilities of the Registry / Repository enable three issues of XML management to be addressed:

· proposing and approving XML data types and elements;

· version management of XML data types; and

· assembling data types into schemas for message types.

2.12 Registration Of Schema Information

The following high-level diagram shows the relationship between registry and repository when managing XML schemas and the documents supporting the schemas.

[Figure: diagram with the labels "metadata processor", "repository", "registry and indexed metadata" and "schema & supporting docs"]

Fig 1 - registration

2.13 Publishing of Artifacts

In terms of publishing content the ebXML Registry / Repository specification supports:

· publishing to a central registry / repository; or

· publishing to a federation of many individual registry / repository facilities.

Note: There are therefore two basic models of distributed information: a central repository of shared items, with individual public sector organizations uploading and downloading as required, or a fully distributed model with the repository distributed over multiple facilities (a local and many remote).

2.14 Access to Registry / Repository

The ebXML Registry specification supports a single point of access to many federated Registry / Repository facilities. Thus, it allows:

· logical duplication of remote federated repository items into a local federated repository to fit into local policies of information management; or

· aggregation of artifacts in the remote federated repository for creating locally defined components; or

· access to any and all federated repository items as required.

2.15 Classification of Artifacts

To ease discovery and deployment of artifacts the ebXML Registry RIM explicitly supports many Classification Schemes. Currently ebXML Registry allows content to be classified using a ClassificationNode within a ClassificationScheme.

The classification scheme identified within the context of ISO 11179 and ebXML provides for a number of uses:

· Find a single element from among many

· Analyze data elements

· Convey semantic content that may be incompletely specified by other attributes such as names and definitions

· Derive names from a controlled vocabulary

· Disambiguate between data elements of varying classification power.

Note:

The basic flow consists of:

1. Schema author publishes schema components

2. Schema author classifies schema components using a class reference within a Classification
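The basic flow above can be illustrated with a much simplified in-memory model. The class names echo the ClassificationNode concept mentioned in this section, but the attributes, the path notation and the `Registry` helper are assumptions made for this sketch and do not reproduce the ebRIM data model.

```python
from dataclasses import dataclass, field

@dataclass
class ClassificationNode:
    code: str
    parent: "ClassificationNode | None" = None  # None marks the scheme root

    def path(self):
        """Return a slash-separated path from the scheme root to this node."""
        parts, node = [], self
        while node is not None:
            parts.append(node.code)
            node = node.parent
        return "/" + "/".join(reversed(parts))

@dataclass
class Registry:
    classifications: dict = field(default_factory=dict)

    def classify(self, object_id, node):
        """Record that a published object is classified under `node`."""
        self.classifications.setdefault(object_id, set()).add(node.path())

    def find(self, path_prefix):
        """Return ids of objects classified at or below `path_prefix`."""
        return sorted(
            object_id
            for object_id, paths in self.classifications.items()
            if any(p == path_prefix or p.startswith(path_prefix + "/") for p in paths)
        )
```

A schema author would publish a component, call `classify` with the chosen node, and other users could then discover it with `find` using any ancestor path.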

2.16 Storage of the knowledge embedded in a registered Data Dictionary

It is assumed that a typical Data Dictionary contains between 4,000 and 100,000 entries. The concepts embedded in Data Dictionary Elements may be sourced from many different contributors. One source may be the synonymous Business Information Entities used for Core Component developments. The key difference is that the UN/CEFACT CCWG Core Components are envisioned as a global set of business collaborations, whereas the typical local Data Dictionary has been scoped solely for a particular domain. The following naming rules may also be applied to the management of Data Dictionary Elements:

· The Dictionary Entry Name shall be unique and shall consist of Object Class, a Property Term, and Representation Type.

· The Object Class represents “the logical data grouping (in a logical data model) to which a data element belongs” (ISO 11179). The Object Class is the part of a core component’s Dictionary Entry Name that represents an activity or object in a context.

· An Object Class may be individual or aggregated from core components. It may be named by using more than one word.

· The Property Term shall represent the distinguishing characteristic of the business entity. The Property Term shall occur naturally in the definition.

· The Representation Type shall describe the form of the set of valid values for an information element. If the Representation Type of an entry is “code” there is often a need for an additional entry for its textual representation. The Object Class and Property Term of such entries shall be the same. (Example : “Car. Colour. Code” and “Car. Colour. Text”).

· A Dictionary Entry Name shall not contain consecutive redundant words. If the Property Term uses the same word as the Representation Type, this word shall be removed from the Property Term part of the Dictionary Entry Name. For example: If the Object Class is “goods”, the Property Term is “delivery date”, and Representation Type is “date”, the Dictionary Entry Name is ‘Goods. Delivery. Date’. In adoption of this rule the Property Term “Identification” could be omitted if the Representation Type is “Identifier”. For example: The identifier of a party (“Party. Identification. Identifier”) will be truncated to “Party. Identifier”.

· One and only one Property Term is normally present in a Dictionary Entry Name although there may be circumstances where no property term is included; e.g. Currency. Code.

· The Representation Type shall be present in a Dictionary Entry Name. It must not be truncated.

· To identify an object or a person by its name the Representation Type “name” shall be used.

· A Dictionary Entry Name and all its components shall be in singular form unless the concept itself is plural; e.g. goods.

· An Object Class as well as a Property Term may be composed of one or more words.

· The components of a Dictionary Entry Name shall be separated by dots followed by a space character. The words in multi-word Object Classes and multi-word Property Terms shall be separated by the space character. Every word shall start with a capital letter.

· Non-letter characters may only be used if required by language rules.

· Abbreviations, acronyms and initials shall not be used as part of a Dictionary Entry Name, except where they are used within business terms like real words; e.g. EAN.UCC global location number, DUNS number.

· All accepted acronyms and abbreviations shall be included in an ebXML glossary.
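Several of the naming rules above are mechanical and can be sketched in a few lines. The function below covers only the separator, capitalization and redundant-word rules; the function name is hypothetical, and rules that need human judgement (uniqueness, singular form, acronyms) are outside its scope.

```python
def dictionary_entry_name(object_class, property_term, representation_type):
    """Assemble a Dictionary Entry Name from its three parts."""
    def capitalize_words(words):
        return " ".join(word[:1].upper() + word[1:] for word in words)

    rep_words = representation_type.split()
    rep_lower = {word.lower() for word in rep_words}
    # Drop Property Term words that repeat the Representation Type.
    prop_words = [w for w in property_term.split() if w.lower() not in rep_lower]
    # "Identification" is omitted when the Representation Type is "Identifier".
    if representation_type.lower() == "identifier":
        prop_words = [w for w in prop_words if w.lower() != "identification"]

    parts = [capitalize_words(object_class.split())]
    if prop_words:  # the Property Term may be absent, e.g. "Currency. Code"
        parts.append(capitalize_words(prop_words))
    parts.append(capitalize_words(rep_words))
    return ". ".join(parts)
```

Applied to the examples in the text, `("goods", "delivery date", "date")` yields `Goods. Delivery. Date` and `("party", "identification", "identifier")` yields `Party. Identifier`.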


2.17 Discovery and Deployment of Schema Components

It is recognized that the classification approach employed must support the discovery and deployment of schema components in a target namespace relating to the project for which the schema is being developed. The stages of deployment include:

· Search for suitable components

· Develop new components

· Develop the structure of the new schema centric documents/messages

· Register the new schema components and documents/messages

· Notify users of new versions of components that they are using

· Identify users of obsolescent components

· Remove obsolete components



2.19 Community Authoring

Artifacts such as schema components and dictionary entries often need to be developed collaboratively by a group of geographically dispersed domain experts:

· Each Domain Expert creates a different XML component / dictionary entry.

· Each Domain Expert may review the XML component / dictionary entry produced by others.

· Each Domain Expert may edit an XML component / dictionary entry that they or another Domain Expert created, with appropriate access control.

Basic Flow:

1. Domain expert #1 publishes an XML component / dictionary entry

2. Domain expert #2 publishes another XML component / dictionary entry and connects it to the first XML component / dictionary entry

3. Domain experts #1 and #2 review each other's XML component / dictionary entry associations

4. Domain experts #1 and #2 edit the XML component / dictionary entry to address comments or fix errors

2.20 XML Component Suitability

Given that authors wish to develop XML components only when they are needed, it is recommended that new components are created only when (1) suitable XML components do not exist, or (2) existing XML components do not suffice or are not appropriate for the intended application. Therefore, the ebXML Registry MUST be searched for existing suitable components prior to creation of new components. There are three possible results for this search: components may be fully suitable or partially suitable, or no component may be found. A component may be considered suitable if:

· It satisfies the element domain requirements;

· It is in upper or lower camel case, depending on whether it is an element, attribute or type;

· It is either named after a “business term” or conforms to ISO 11179 conventions; and

· Its abbreviations and acronyms are spelled out in the component definition.
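Of the suitability criteria, the camel case rule is the easiest to automate. The sketch below assumes that elements and types use upper camel case and attributes use lower camel case; this document does not state which case applies to which kind of component, so the mapping is an assumption to be adjusted to the agreed convention.

```python
import re

UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")

# Assumed convention (not stated in this document): elements and types use
# upper camel case, attributes use lower camel case.
EXPECTED_CASE = {
    "element": UPPER_CAMEL,
    "type": UPPER_CAMEL,
    "attribute": LOWER_CAMEL,
}

def has_expected_case(name, kind):
    """Return True if `name` matches the case pattern assumed for `kind`."""
    return bool(EXPECTED_CASE[kind].match(name))
```

The patterns only reject separators such as underscores and hyphens and a wrong initial letter; they cannot tell whether word boundaries inside the name are capitalized sensibly.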

2.21 BCM Template

Following on from the template definitions in the business layers, the BCM method proceeds first to establish the templates of a collaboration agreement and optionally a traditional memorandum of agreement (item 4 in 5.3.1). Once the collaboration is agreed, then the associated information exchanges to implement that collaboration can be defined (items 8, 9, 10 in 5.3.1). The information transactions require careful detailing of the semantics. There are verbs, nouns, roles, rules and message structures to quantify. In traditional software development this is the place most people begin. The question frequently asked is “do we have an XML schema to use?” with the assumption that if so then the participants are ready to start exchanging XML conforming to the schema and facilitating eBusiness. In order to engage in effective information exchanges, especially across an industry group with multiple participants, experience has shown that a greater depth of semantic knowledge is needed than a simple schema provides, and the BCM expects it. Conversely, an OASIS CAM template definition provides the entire noun, verb and context semantics for complete transaction management, including integration into a registry vocabulary dictionary, without the need for highly specialized software.

2.22 Use of CAM Templates

Schematron is a language and toolkit for making assertions about patterns found in XML documents. It can be used as a friendly validation language and for automatically generating external annotation (links, RDF, perhaps Topic Maps). Because it uses paths rather than grammars, it can be used to assert many constraints that cannot be expressed by DTDs or XML Schemas. The Content Assembly Mechanism employs templates to bind structural, contextual and referential information to schema components. The CAM may be used to allow dynamic assignment of context to a Schema Component instance. The figure below outlines how those information facets may be brought together for a ‘Reliable Messaging System’.

[Figure: XML business information (Schema, Assembly, Delivery)]

Schema: content structure definition and simple content typing.

Content Assembly: business logic for content structure decisions and explicit rules to enforce content and interdependencies, with business exchange context, and content definition cross-references via UID associations.

Secure Authenticated Delivery and Tracking: Reliable Messaging system, envelope format and payload with exchange participant profile controls.

Registry/Dictionary: the UID content referencing system ensures consistent definition usage.

UID: Universal ID content referencing system. Values comprise a domain prefix, a six-digit integer, an optional version, sub-version.
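The earlier remark that Schematron uses paths rather than grammars can be illustrated with a short sketch. It is not Schematron itself: it uses the path support in Python's standard xml.etree.ElementTree module to select context nodes and then asserts a cross-field rule (an end date must not precede its start date) of the kind that a DTD or XML Schema grammar cannot express. The element names Message, Period, Start and End are hypothetical and do not come from any e-Government schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical message; the element names are illustrative only.
SAMPLE = """
<Message>
  <Period><Start>2004-07-01</Start><End>2004-07-06</End></Period>
  <Period><Start>2004-08-10</Start><End>2004-08-01</End></Period>
</Message>
"""


def check_periods(xml_text):
    """Return one failure message per Period whose End precedes its Start."""
    root = ET.fromstring(xml_text)
    failures = []
    # The path selects the context nodes, much as a Schematron rule would.
    for index, period in enumerate(root.findall("./Period"), start=1):
        start = period.findtext("Start")
        end = period.findtext("End")
        # ISO 8601 dates in the same format compare correctly as strings.
        if start is None or end is None or end < start:
            failures.append(f"Period {index}: End must not precede Start")
    return failures


if __name__ == "__main__":
    for failure in check_periods(SAMPLE):
        print(failure)
```

A real deployment would hold such rules as Schematron or CAM artifacts in the registry rather than as program code; the sketch only shows the style of constraint involved.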

3 Important Features

Status: v0.1 draft


This is an attempt to highlight as a set of bullet points the major features required for the registry/repository at the proof of concept stage. It excludes the features inherent in the registry (such as version control, user notification etc) that are assumed to be included.

Most benefit will be gained by an early release. I have therefore split this into two phases.

3.1 Phase 1

1. The ability to enter schemas and the associated metadata into the registry/repository.

2. The ability to enter schema components (global data types, elements and attributes) and associated metadata into the registry/repository.

3. The ability to enter other document types with associated metadata into the registry/repository.

4. The ability to hold schema definitions in a syntax-independent manner (e.g. as defined in CCTS?). Effectively, this means that, for schema components, sufficient information must be held in the registry to create the component from metadata, although the ability to create the components will not be included.

5. The metadata to be supported will vary according to the three document types (schema, schema component or other) and will be a subset of that defined in the UK e-GMS plus the additional requirements of point 4.

6. The ability to search on certain metadata information and extract all matching schemas, components or other documents.

7. The ability to construct schemas from components.
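The Phase 1 features above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the ebXML Registry API: the class names RegistryEntry and Registry, the method names and the sample metadata values are all hypothetical. It shows entering the three document types with metadata (points 1 to 3), searching on metadata (point 6) and constructing a schema from stored components (point 7).

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """One registered item: a schema, a schema component or another document."""
    entry_id: str
    kind: str  # "schema", "component" or "other"
    metadata: dict = field(default_factory=dict)
    content: str = ""


class Registry:
    """Hypothetical in-memory stand-in for a registry/repository."""
    KINDS = ("schema", "component", "other")

    def __init__(self):
        self._entries = {}

    def enter(self, entry):
        """Store an entry, rejecting document types outside the three kinds."""
        if entry.kind not in self.KINDS:
            raise ValueError(f"unknown entry kind: {entry.kind!r}")
        self._entries[entry.entry_id] = entry

    def search(self, criteria):
        """Return entries whose metadata matches every name/value pair."""
        return [
            entry for entry in self._entries.values()
            if all(entry.metadata.get(name) == value
                   for name, value in criteria.items())
        ]

    def construct_schema(self, component_ids):
        """Wrap the stored component fragments in a single xs:schema element."""
        fragments = [self._entries[cid].content for cid in component_ids]
        body = "\n".join("  " + fragment for fragment in fragments)
        return ('<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">\n'
                f"{body}\n</xs:schema>")


if __name__ == "__main__":
    registry = Registry()
    registry.enter(RegistryEntry(
        "c1", "component",
        {"Title": "PersonName", "Creator": "Example Department"},
        '<xs:element name="PersonName" type="xs:string"/>'))
    print(registry.search({"Creator": "Example Department"}))
    print(registry.construct_schema(["c1"]))
```

Point 4 (syntax-independent definitions) is not covered here, since the sketch stores components as XML Schema fragments.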

3.2 Phase 2

8. The ability to interoperate between registries.

9. The ability to add MOD-specific metadata.

4 Mapping of Metadata to the ebRIM

Status: v0.1 draft


It is not clear whether it is best to map metadata elements directly to the ebRIM for Government Use or go through a CCTS mapping as an interim stage. We are therefore trying both. See the email from Carl Mattocks:

… Since, the CCTS approach is still in its infancy, I predict that the task will take longer than we would like. Therefore, I propose that to make the best of the situation, we do -

(1) a FULL direct to ebRIM mapping for the selected e-GMS metadata subset

AND

(2) a FULL mapping to CCTS (and then using CCRIM) for the selected e-GMS metadata subset

AND

(3) document the results of BOTH in the Technical Note. ...

and let the reader be aware they have a choice.

Agreed - it would be useful to publish a sample of (2) above. Hopefully, this can be done in a couple of weeks.

This is the approach we are taking.

4.1 Direct Mapping to ebRIM

Notes:

1. Some names are abbreviated and shown with an ellipsis. See the e-GMS for full names.

2. Where a refinement name starts with the name of its parent item, the parent name has been omitted for brevity. See the e-GMS for full names.

3. The last three columns indicate whether the metadata item is to be supported for schema documents, schema components and other document types. The codes used are:

a. M: mandatory

b. MA: mandatory if applicable

c. R: recommended

d. RA: recommended if applicable

e. O: optional

f. n/a: not required

The code is in bold if the metadata item is to be supported in the PoC.

4. The usage in schema documents is based on the e-GMS local metadata standard - XML schemas version 3 (draft) and an email from Maewyn Cumming to Paul Spencer on 2004-06-21. It is still under discussion, but the values here should be used for initial implementation.

5. The columns for schema components and other document types are to be completed.
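The support codes in note 3 lend themselves to a simple check at submission time. The sketch below is a minimal illustration, assuming the codes are held as strings exactly as written in the table; the names Support, parse_code and missing_mandatory are hypothetical, and the sample requirement values are for illustration only. Only M can be checked mechanically, because "if applicable" depends on a judgement the registry cannot make by itself.

```python
from enum import Enum


class Support(Enum):
    """Support codes as listed in note 3."""
    M = "mandatory"
    MA = "mandatory if applicable"
    R = "recommended"
    RA = "recommended if applicable"
    O = "optional"
    NA = "not required"


def parse_code(text):
    """Convert a table cell such as 'MA' or 'n/a' to a Support value."""
    code = text.strip()
    if code.lower() == "n/a":
        return Support.NA
    return Support[code.upper()]  # raises KeyError for an unknown code


def missing_mandatory(requirements, metadata):
    """Return the mandatory items that have no value in the submitted metadata."""
    return sorted(
        item for item, code in requirements.items()
        if parse_code(code) is Support.M and not metadata.get(item)
    )


if __name__ == "__main__":
    # Illustrative requirements only; see the table for the working values.
    requirements = {"Title": "M", "Creator": "M", "Language": "R", "Source": "n/a"}
    print(missing_mandatory(requirements, {"Title": "Sample schema"}))
```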

Columns, in order: UK e-GMS | Enumeration | Mapping to RIM | Autogenerate | Schema Docs | Schema Comps | Other

Each row lists the e-GMS item followed by its non-empty cells, in column order.

Accessibility | n/a | n/a
Addressee | n/a | n/a
Aggregation | This might be modelled through RIM Associations. It is relevant if some document is part of a larger collection. | O
Audience | n/a | n/a
Contributor | Association to Person (User for now) or Organization with associationType “Contributor” | MA
Coverage | MA
Coverage. Spatial | Classification using chosen GEO ClassificationScheme | RA
Coverage. Temporal | Must support | RA
Creator | Association to Person (User for now) or Organization with associationType “Creator” | M
Date | Must support | O
Date. Acquired | n/a
Date. Available | n/a
Date. Created | 2003-04-06 | R
Date. Cut-off | n/a | n/a
Date. Closed | n/a | n/a
Date. Accepted | n/a | n/a
Date. Copyrighted | O
Date. Submitted | n/a
Date. Declared | n/a
Date. Issued | yes | MA
Date. Modified | yes | M
Date. NextVersionDue | O
Date. UpdatingFrequency | O
Date. Valid | MA
Description | Description | O
Description. Abstract | n/a
Description. TableOfContents | n/a
DigitalSignature | n/a | n/a
Disposal | n/a | n/a
Disposal. AutoRemoveDate | n/a | n/a
Disposal. Action | deprecate, remove, archive | value is String or id of an Action ClassificationNode? | O
Disposal. AuthorisedBy | O
Disposal. Comment | O
Disposal. Conditions | O
Disposal. Date | O
Disposal. ExportStatus | O
Disposal. Review | O
Disposal. ReviewerDetails | value is id of User | O
Disposal. ScheduleID | O
Disposal. TimePeriod | O
Format | for schemas & comps: text/xml | This should probably be supported as an alternative to the refinements below that are not included in the e-GMS v3. For schemas, this would always have the value "Text/http://www.w3.org/2001/XMLSchema" and so could be autogenerated when serialising metadata. | yes (for schemas & comps) | M
Format. Extent | n/a
Format. Medium | n/a
Identifier | ExternalIdentifier | M
Identifier. BibliographicCitation | Identification ClassificationScheme BibliographicCitation | n/a
Identifier. CaseID | Identification ClassificationScheme CaseID | n/a
Identifier. FileplanID | Identification ClassificationScheme FilePlanID | n/a
Identifier. SystemID | Identification ClassificationScheme SystemID | n/a
Language | This could be an enumeration of the ISO 639-2/B language codes using the UBL codelist format, but I would leave it as a slot for now. | R
Location | n/a | n/a
Mandate | n/a | n/a
Mandate. AuthorisingStatute | n/a | n/a
Mandate. DataProtection… | n/a | n/a
Mandate. PersonalData… | n/a | n/a
Preservation | n/a | n/a
Preservation. OriginalFormat | n/a | n/a
Publisher | Association to Person (User for now) or Organization with associationType “Publisher” | ? | M
Relation | Association with associationType matching refinement for Relation. Relation can be used without refinements (for example to link to supporting documents). | n/a | n/a
Relation. ConformsTo | http://www.w3.org/2001/XMLSchema | Must Support | M | n/a
Relation. HasFormat | Must Support | MA | n/a
Relation. HasVersion | Must Support | MA | n/a
Relation. HasPart | Association with associationType “HasPart” | yes (for schemas) | MA | n/a
Relation. IsDefinedBy | Must support* | MA | n/a
Relation. IsFormatOf | n/a | n/a | n/a
Relation. IsPartOf | Association with associationType “IsPartOf” | MA | n/a
Relation. IsReferencedBy | n/a | n/a | n/a
Relation. IsReplacedBy | Must support* | MA | n/a
Relation. IsRequiredBy | Must support | n/a | n/a
Relation. IsVersionOf | Must support | MA | n/a
Relation. ProvidesDefinitionOf | Association with associationType “ProvidesDefinitionOf” | MA | n/a
Relation. ReasonForRedaction | n/a | n/a | n/a
Relation. Redaction | n/a | n/a | n/a
Relation. References | n/a | n/a | n/a
Relation. Requires | Association with associationType “Requires” | yes (for schemas) | MA | n/a
Relation. Replaces | Must support* | MA | n/a
Relation. SequenceNo | n/a | n/a | n/a
Rights | n/a
Rights. Copyright | O
Rights. Custodian | value is id of a User or SubjectRole or SubjectGroup | O
Rights. Descriptor | n/a | n/a
Rights. DisclosabilityTo… | n/a | n/a
Rights. DPADataSubject… | n/a | n/a
Rights. EIRDisclosability… | n/a | n/a
Rights. EIRExemption | n/a | n/a
Rights. FOIADisclosability… | n/a | n/a
Rights. FOIAExemption | n/a | n/a
Rights. FOIAReleaseDetails | n/a | n/a
Rights. FOIAReleaseDate | n/a | n/a
Rights. GroupAccess | n/a | n/a
Rights. IndividualUser… | n/a | n/a
Rights. LastFOIA… | n/a | n/a
Rights. PreviousProtectiveMarking | n/a | O
Rights. ProtectiveMarking | Could we leave this in until I have spoken to Maewyn. I think the MOD will want this. | O
Rights. ProtectiveMarkingChangeDate | n/a | O
Rights. ProtectiveMarkingExpiryDate | n/a | O
Source | n/a | n/a
Status | Must Support* This seems to complement the RIM status, and could be a qualifier added to that. | O
Subject | n/a
Subject. Category | Uses Government Category List | M
Subject. Keyword | Multiple values for each Slot or multiple slots, one per keyword? | O
Subject. Person | n/a | n/a
Subject. ProcessIdentifier | O
Subject. Programme | O
Subject. Project | O
Title | M
Title. AlternativeTitle | n/a | n/a
Type | [empty string], message, architectural, element, type | M
refinements of Type | n/a | n/a
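A few of the mappings in the table can be sketched in code. The classes below are greatly simplified stand-ins for the ebRIM Association, ExternalIdentifier and Slot classes, not the normative model, and the function name map_item is hypothetical. Two assumptions are made that the table does not settle: the registered document is taken as the source of each Association, and any item without a stated mapping falls back to a single Slot holding all its values (one of the two options raised for Subject. Keyword).

```python
from dataclasses import dataclass


@dataclass
class Association:
    source_object: str      # id of the registered document (assumed direction)
    target_object: str      # id of the User, Organization or related object
    association_type: str


@dataclass
class ExternalIdentifier:
    registry_object: str
    value: str


@dataclass
class Slot:
    name: str
    values: list


# Items the table maps to an Association whose type is the item name.
ASSOCIATION_ITEMS = {"Creator", "Contributor", "Publisher"}


def map_item(object_id, item, value):
    """Map one e-GMS item and its value to a simplified RIM structure."""
    if item in ASSOCIATION_ITEMS:
        return Association(object_id, value, item)
    if item.startswith("Relation."):
        # The associationType matches the refinement name, e.g. "HasPart".
        refinement = item.split(".", 1)[1].strip()
        return Association(object_id, value, refinement)
    if item == "Identifier":
        return ExternalIdentifier(object_id, value)
    # Assumed fallback: one Slot carrying every value for the item.
    values = list(value) if isinstance(value, (list, tuple)) else [value]
    return Slot(item, values)


if __name__ == "__main__":
    # All identifiers here are made up for illustration.
    print(map_item("urn:example:schema1", "Creator", "urn:example:user1"))
    print(map_item("urn:example:schema1", "Relation. HasPart", "urn:example:comp1"))
    print(map_item("urn:example:schema1", "Subject. Keyword", ["tax", "benefits"]))
```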

4.2 Mapping via CCTS

Awaiting information from Carl

4.3 URIs

When mapping to the ebRIM, URIs are used as identifiers in various places. The two types of URI usually used in such cases are URNs (e.g. urn:gov:uk:egms:date) and URLs (e.g. http://www.govtalk.gov.uk/terms/copyrighted). In general, OASIS prefers the use of the URN.

However, the e-GMS is based on Dublin Core, which uses URLs to specify metadata names. This is an extract from an email from Maewyn Cumming (2004-06-18):

We had thought about this for the e-GMS application profile and used the format http://www.govtalk.gov.uk/terms/accessibility for each element, refinement etc. This follows the Dublin Core model, and is what we have put into the AP (though with the caveat that none of these URLs actually work yet). I'd like to keep following the same format.

In discussion, it was agreed that a refinement would use an additional oblique, such as http://www.govtalk.gov.uk/terms/date/created.

This is the format to be used for e-GMS metadata but does not constrain the format for other types of metadata.
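The URL convention described above can be sketched as a small helper. The base URL and the two example results come from the text; the function names are hypothetical, and lower-casing the whole name is an assumption inferred from the examples given (accessibility, date/created). Names that the table abbreviates with an ellipsis must be expanded from the e-GMS before use.

```python
EGMS_BASE = "http://www.govtalk.gov.uk/terms/"


def egms_uri(element, refinement=None):
    """Build the URL for an e-GMS element or for one of its refinements."""
    parts = [element] if refinement is None else [element, refinement]
    # A refinement is appended after an additional oblique.
    return EGMS_BASE + "/".join(part.strip().lower() for part in parts)


def uri_from_table_name(name):
    """Accept a name as written in the mapping table, e.g. 'Date. Created'."""
    element, _, refinement = name.partition(".")
    return egms_uri(element, refinement.strip() or None)


if __name__ == "__main__":
    print(egms_uri("Accessibility"))
    print(uri_from_table_name("Date. Created"))
```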

Appendix E: Revision History

Rev | Date | What
0.1a | 2 July 2004 | First draft to pull together some existing documents.
0.1b | 6 July 2004 | Additional column added to direct mapping table to indicate which metadata items are to be auto-generated by the registry. Other minor changes to table.

Appendix F: Notices

OASIS takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on OASIS's procedures with respect to rights in OASIS specifications can be found at the OASIS website. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification, can be obtained from the OASIS Executive Director.

OASIS invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to implement this specification. Please address the information to the OASIS Executive Director.

Copyright © OASIS Open 2002. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to OASIS, except as needed for the purpose of developing OASIS specifications, in which case the procedures for copyrights defined in the OASIS Intellectual Property Rights document must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by OASIS or its successors or assigns.

This document and the information contained herein is provided on an “AS IS” basis and OASIS DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

EML uses Schematron to define context-sensitive rules, and this works well. I don't want to exclude CAM (which I think is Content Assembly Mechanism), but we should be able to link Schematron artifacts as well.

I have made these up to allow a classification. Do they seem reasonable? Or is free text better?

I don't think so … But we probably don't need it anyway.
