RDA Data Foundation and Terminology (DFT) WG


Page 1: RDA Data Foundation and  Terminology (DFT) WG

RDA Data Foundation and Terminology (DFT) WG

Prepared for 3rd Plenary DFT WG Sessions 1 & 2, March 27, 2014
Co-Chairs DFT WG: Gary Berg-Cross, Raphael Ritz, Peter Wittenburg

European Early Career Researchers & Scientists working with Data

Scribe: Reko Hynönen, University of Helsinki, Finnish Meteorological Institute

[Diagram: PID + properties, metadata (MD + properties), bit sequence and bit sequence copies; operations shown: transfer, replication, checksum, extension]

A PID record that points to a metadata record and to instantiations of identical bit streams that may store additional attributes.
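The slide describes a PID record that points to one metadata record and to several instances of an identical bit stream, with a checksum among the diagram labels. A minimal Python sketch of that structure follows, assuming SHA-256 as the checksum and using a checksum comparison to decide whether a new location really holds an identical copy. The class, field and function names, the PID and the storage locations are all hypothetical and are not part of the DFT model.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PIDRecord:
    """Hypothetical PID record: points to one metadata record and to
    several instances (replicas) of the same bit stream."""
    pid: str
    metadata_ref: str                                         # location of the metadata record
    checksum: str                                             # assumed: SHA-256 hex digest of the bit stream
    instances: list[str] = field(default_factory=list)        # replica locations
    attributes: dict[str, str] = field(default_factory=dict)  # additional attributes


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def register_replica(record: PIDRecord, location: str, data: bytes) -> bool:
    """Add a replica location only if its bits match the recorded checksum,
    so every instance the record points to is an identical bit stream."""
    if sha256_hex(data) != record.checksum:
        return False
    record.instances.append(location)
    return True


# Usage: one bit stream replicated to two hypothetical locations, plus one bad copy.
bits = b"example bit sequence"
rec = PIDRecord(pid="hdl:00000/example", metadata_ref="md://example/1",
                checksum=sha256_hex(bits))
register_replica(rec, "site-a://store/obj1", bits)
register_replica(rec, "site-b://store/obj1", bits)
ok = register_replica(rec, "site-c://store/obj1", b"corrupted copy")
print(len(rec.instances), ok)  # 2 False
```

Checking the checksum at registration time is one design choice among several; a real replication service might instead verify copies periodically after transfer.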

Page 2: RDA Data Foundation and  Terminology (DFT) WG

Outline for Sessions 1 (1100-1230) & 2 (1330-1500)

Session 1 (90 min)
0. Brief intros (5 min?)
1. Short background (5 min)
2. Where we are now (5 min)
3. Where we think we are going (2 min)
4. Use cases
   1. Wittenburg & Ritz (10 min)
   2. Reagan Moore
   3. Hans Pfeiffenberger (18 min)
5. Discussion of 3 core areas (50 min)
   1. Data and Digital Objects & Entities
   2. Persistent Identifier / PID Record / PID Attribute / PID Resolution / Reference Resolution
   3. Aggregation / Collection / Data Set / Corpus / Container …

Session 2 (90 min)
1. Follow-up to Session 1 (10 min)
2. Discussion of 7 core areas (70 min)
   • Bit Stream / Instances of Bit Stream / Data Stream
   • Identity / Integrity / Authenticity
   • Object Property / Object Attribute / Property Record / Internal Property / External Property
   • Data Organization / Data Model
   • Repository / Repository of Origin
   • Data / Realtime Data / Gappy Data / Dynamic Data
   • Data LifeCycle (as time permits)
3. Wrap-up and next steps (10 min)

Page 3: RDA Data Foundation and  Terminology (DFT) WG

Background: DFT Goals (see Case Statement Briefing, https://rd-alliance.org/filedepot/folder/100?fid=255)

Describe a basic, abstract (but clear) data organization model that systematizes the already large body of definition work on data management terms, especially as it relates to RDA’s efforts.
• The model and its derived reference data should be sound, practical and agreed to within the community, for use:
  1. across communities and stakeholders, to better synchronize data conceptualization;
  2. to enable better understanding within and between communities; and
  3. to stimulate adopters and tool building, such as data services, that support use of the basic model.
• We need to get the story straight on the model so that it can govern the use of related tools.

[Diagram: Models & Candidate List evolves to Refined List; inputs: Cross WGs, DFT WG Discussion & Plenary 3; Future Work 2015?]

Page 4: RDA Data Foundation and  Terminology (DFT) WG

Five-Stage Process: Data Foundation and Terminology (DFT) Vocabulary Development Process

by Gary Berg-Cross

1. Start-up/Scoping: requirements analysis and development of candidate list
   1. Tool prototyping
2. Vocabulary Analysis & Revision Process (after 2nd Plenary)
   1. Tool demo and final requirements at 3rd Plenary
   2. Show core vocabulary in table form for discussion
3. Focused Vocabulary Design Process and Community Agreement (at and after 3rd Plenary)
4. Refinement & Maintenance (ongoing)
5. Draft Vocabulary Publication and Review (4th Plenary)

Page 5: RDA Data Foundation and  Terminology (DFT) WG

Overview of Term Development

Starter areas and items:
• Persistent Identifiers (PIDs and types)
• Digital Object - Data Object
• Collection - Data Set - Aggregation
• Repository (Registries and related Policies)

[Diagram stages: Scope; Terms from Model Papers; Placed in Tool; Defs & Refinement; Analysis and Revision Process; Getting Defs organized for review]

Example definition, Digital Object: A digital object is composed of a structured sequence of bits/bytes. As an object, it is named. This bit sequence can be identified and accessed by a unique and persistent identifier or by referencing attributes that describe its properties.
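The example definition names two ways to reach a digital object: through a unique persistent identifier, or through attributes describing its properties. The sketch below models both access paths in Python. `DigitalObject`, `ObjectStore`, the sample PIDs and the property names are hypothetical, and the in-memory dictionary merely stands in for a real repository.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DigitalObject:
    """Hypothetical digital object: a named, structured bit sequence with properties."""
    pid: str
    bits: bytes
    properties: dict[str, str] = field(default_factory=dict)


class ObjectStore:
    """Toy store supporting the two access paths named in the definition:
    by persistent identifier, or by attributes describing the object's properties."""

    def __init__(self) -> None:
        self._by_pid: dict[str, DigitalObject] = {}

    def add(self, obj: DigitalObject) -> None:
        self._by_pid[obj.pid] = obj

    def by_pid(self, pid: str) -> Optional[DigitalObject]:
        return self._by_pid.get(pid)

    def by_attributes(self, **query: str) -> list[DigitalObject]:
        # Return every object whose properties contain all requested key/value pairs.
        return [o for o in self._by_pid.values()
                if all(o.properties.get(k) == v for k, v in query.items())]


store = ObjectStore()
store.add(DigitalObject("pid:1", b"\x00\x01", {"creator": "lab-a", "format": "netcdf"}))
store.add(DigitalObject("pid:2", b"\x02\x03", {"creator": "lab-b", "format": "netcdf"}))

print(store.by_pid("pid:1").properties["creator"])            # lab-a
print([o.pid for o in store.by_attributes(format="netcdf")])  # ['pid:1', 'pid:2']
```

Access by PID is a direct lookup, while access by attributes may return several objects, which is one reason the definition treats the persistent identifier as the primary handle.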

Term Definition Tool prototyped and developed at Rechenzentrum Garching (RZG) der Max-Planck-Gesellschaft

Page 6: RDA Data Foundation and  Terminology (DFT) WG

Latest Version of Term Definition Tool (TeD-T?)

Page 7: RDA Data Foundation and  Terminology (DFT) WG

Term Definition Example

digital entity: An entity represented as, or converted to, a machine-independent data structure consisting of one or more elements in digital form that can be parsed by different information systems; the structure helps to enable interoperability among diverse information systems in the Internet. (From: Framework for discovery of identity management information)

Alternative?

This page was last modified on 9 December 2013, at 14:03.

Revision Discussion: This definition does not reflect our practice and is not specific enough. A digital object can cover different types of digital information, such as data, software, knowledge, etc., so we should separate data from other types of digital information. Also, the reference to databases is not helpful, since data is held in many types of “containers”; the term “database” does not help us because it can refer to any type of container. In DFT we need to stress that a DO is something that has an identity one can refer to, has a number of properties that can be accessed, etc. – Peter
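The quoted definition of a digital entity hinges on a machine-independent data structure that different information systems can parse. As one illustration (an assumption, since the definition does not prescribe any encoding), the Python sketch below serializes a two-element entity to UTF-8 JSON, with binary content base64-encoded, and parses it back. The element names and content are hypothetical.

```python
import base64
import json

# Hypothetical digital entity made of two elements: a descriptive one and a binary payload.
entity = {
    "elements": [
        {"name": "title", "value": "Sample time series"},
        {"name": "payload", "encoding": "base64",
         "value": base64.b64encode(b"\x00\x01\x02").decode("ascii")},
    ]
}

# Serialize to a machine-independent text form (UTF-8 encoded JSON).
wire = json.dumps(entity, sort_keys=True).encode("utf-8")

# Any other system with a JSON parser can reconstruct the same structure.
parsed = json.loads(wire.decode("utf-8"))
payload = base64.b64decode(parsed["elements"][1]["value"])
print(parsed == entity, payload)  # True b'\x00\x01\x02'
```

JSON is only one candidate; the point the definition makes is that the structure, not the system that produced it, determines whether others can parse it.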

Page 8: RDA Data Foundation and  Terminology (DFT) WG

PID Term and Discussion

Discussion on email and in the Tool (http://smw-rda.esc.rzg.mpg.de/index.php/Talk:Persistent_Identifier_(PID))

• We should emphasize that persistence is not purely technical, a point I think John Kunze in particular would agree with: there are social contracts associated with the idea of persistence. If you don't put those policies in place, persistence is undefined at best. On second thought, this also means that not just the resolution service must be persistent, but also the association between identifier and target object. That contract is probably put on the shoulders of the agent requesting the PID in the first place, because the service will be unable to decide on or maintain this association. -- Tobias

• Tobias, you have raised a few things, such as the PID Service (we need to include this as a term). So should we have definitions that include the idea of a Contract by Agent as part of the metadata for a PID? Assertions: the PID Requesting Agent (sub-type of Agent) contracts to maintain the connection (definition?) between ID and Target Object. The Target Object (TO) has a contract. A PID Service is a service. – Gary

• The PID Service and the PID System might be the same thing in reality. One difference may be that the PID System maintains a Resolution Service, while the PID Service is the entity with which the contract is made. Each PID Service employs a PID System; each PID System can be employed by several PID Services.

• Example of a PID Service: DataCite
• Example of a PID System: The DOI System
• Example of a Resolution Service: 2a00:1a48:7805:112:2c13:65be:ff08:2e89, better known as dx.doi.org

• (In reality, there is indeed a contract between, e.g., DKRZ and DataCite, so this seems adequate.) – Tobias Weigel, 09:01, 10 December 2013 (UTC)
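The relationships proposed in this thread (a PID Service employs one PID System, a PID System can be employed by several PID Services and maintains a Resolution Service, and the contract is made between the requesting agent and the PID Service) can be sketched as a small Python data model. All class and field names, the second service, the DOI `10.0000/example` and the target URL are hypothetical, and the dictionary-based resolver is a stand-in, not a description of how the DOI System actually resolves names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResolutionService:
    """Hypothetical resolver: maps identifiers to target locations."""
    endpoint: str
    table: dict[str, str] = field(default_factory=dict)

    def resolve(self, pid: str) -> Optional[str]:
        return self.table.get(pid)


@dataclass
class PIDSystem:
    """A PID System maintains a Resolution Service."""
    name: str
    resolver: ResolutionService


@dataclass
class Contract:
    """Agreement between a requesting agent and a PID Service; the agent
    commits to keeping the PID -> target association valid."""
    agent: str
    service: str
    pid: str


@dataclass
class PIDService:
    """The entity the contract is made with; it employs one PID System."""
    name: str
    system: PIDSystem
    contracts: list[Contract] = field(default_factory=list)

    def register(self, agent: str, pid: str, target: str) -> Contract:
        # Record the association in the system's resolver and the contract here.
        self.system.resolver.table[pid] = target
        contract = Contract(agent=agent, service=self.name, pid=pid)
        self.contracts.append(contract)
        return contract


doi_system = PIDSystem("DOI System", ResolutionService("dx.doi.org"))
datacite = PIDService("DataCite", doi_system)
other_service = PIDService("Hypothetical Service B", doi_system)  # second service, same system

datacite.register("DKRZ", "10.0000/example", "https://example.org/data/1")
print(doi_system.resolver.resolve("10.0000/example"))  # https://example.org/data/1
print(datacite.system is other_service.system)          # True
```

Keeping `Contract` as its own record, rather than a field on the PID, reflects the point that persistence of the identifier-to-target association rests on an agreement, not only on the resolver.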

Page 9: RDA Data Foundation and  Terminology (DFT) WG

Conceptual Spaces

[Diagram: Peter’s original conceptual space and refinements. Concepts: digital object, data object, service object, informational object, bit stream, instance of a bit stream, data stream, aggregation, collection, data set, corpus, metadata record, PID record, attribute, property. Relations: is_a, is_part_of, has_a, has_many, is_equal, contains_a.]

Page 10: RDA Data Foundation and  Terminology (DFT) WG

Status & Plan Going Forward
• We now have a table of Core Terms with some initial definitions.
• Some are also in the Tool as examples; some are still being updated.
• The P3 meeting represents an opportunity to take stock and do some editing, testing of ideas and refining, as well as to strategize on next steps.
• Can we get some sense of where there is agreement and where the issues are for the WG-Core?

• Work at 3rd Plenary
  • Document status
  • Tool and Demo
  • Discussion of working Core, getting buy-in and next steps

• Schedule note: “We will never be ‘done’ with the topic, but the WG will complete its targeted mission.”

“Unity, not uniformity, must be our aim. We attain unity only through variety. Differences must be integrated, not annihilated, nor absorbed.” Mary Parker Follett, The New State, 1918, p. 39.

Page 11: RDA Data Foundation and  Terminology (DFT) WG

Checklist of Issues/Powerful Questions: What Is Needed for DFT Term Progress?

• Ramp-up of effort by the DFT WG community
• Review of table, categories and definition refinement
• Confirmation of scope of work
• How do we handle points of contention?
• What is the process by which we converge and move to adoption?

• Training in, and exposure to, the Term Tool (demo tomorrow)
• Use by other WGs for their needs
  • Is our table example useful as a model for them?

• Further testing of Use Case Scenarios (as presented at the P3 DFT WG)
  • Are they useful?
  • Do they need to be adapted or drilled down to more detail?
  • Do we need examples of term-concepts involved with real data (such as Reagan’s)?

Page 12: RDA Data Foundation and  Terminology (DFT) WG

Today’s Sessions: A focus on the following terms / term clusters

1. Data / Realtime Data / Dynamic Data
2. Digital Object / Data Object / Information Object / Representation Object
3. Bit Stream / Instances of Bit Stream / Data Stream
4. Identity / Identity Management / Integrity / Authenticity
5. Object Property / Object Attribute / Property Record / Internal Property / External Property
6. Persistent Identifier / PID Record / PID Attribute / PID Resolution / Reference Resolution
7. Data Organization / Data Model
8. Repository / Repository of Origin
9. Aggregation / Collection / Data Set / Corpus / Container / Gappy Data
10. Data LifeCycle & Operations

We need to make a quantum jump in defining a few terms. We need to argue from the different data models/organizations that were presented and, of course, also look at what others have done. Our core clusters: