Toward Adoption of RDA Outcomes by US Ocean Science Data Repositories

Cynthia Chandler, Bob Arko, Adam Shepherd

US Ocean Science Domain Repositories

BCO-DMO: Biological and Chemical Oceanography Data Management Office (WHOI)
  Curation of marine ecosystem data contributed by NSF-funded investigators

R2R: Rolling Deck to Repository
  Curation of routine, underway data from the US academic fleet, and authoritative expedition catalog

Members of Marine Data Harmonization IG

Adoption Process Stages

Awareness – “I just became aware that the output exists”

Interest – “I heard about the output, have learned about it, and now have an active interest”

Evaluation – “I’ve got a strong interest and am willing to commit to evaluating the output”

Trial – “I’ve evaluated the output, decided I like it, and am willing to give it a try”

Adoption – “It really works!! I’ve decided to adopt it and make it part of the system, Yay!”

Outcomes of Seven Groups Have Potential for Adoption by Ocean Science Data Repositories

Data Citation
Data Type Registries
PID Information Types
Data Foundation & Terminology
Practical Policy
Metadata
Data Publishing (3 of 4 groups)

What does RDA offer domain repositories?

BCO-DMO: transition human -> machine clients
Ocean science is interdisciplinary
Data curation is distributed

RDA outcomes can help address challenges of
  Interoperability between distributed systems in ocean sciences
  Interoperability between different domains (natural, social)

Need solutions that scale for 'Big Data' (VARIETY, VERACITY, velocity, volume)

RDA outcomes developed and vetted by representatives from multiple domains

Data Citation (DC) of evolving data

DC goals: to create identification mechanisms that:
  allow us to identify and cite arbitrary views of data, from a single record to an entire data set, in a precise, machine-actionable manner
  allow us to cite and retrieve that data as it existed at a certain point in time, whether the database is static or highly dynamic

DC outcomes: 13 recommendations and associated documentation
  ensuring that data are stored in a versioned and timestamped manner
  identifying data sets by storing and assigning persistent identifiers (PIDs) to timestamped queries that can be re-executed against the timestamped data store

Description of Data Citation Outputs

»» Data Versioning: To retrieve earlier states of datasets, the data need to be versioned. Markers shall indicate inserts, updates and deletes of data in the database.

»» Data Timestamping: Ensure that operations on data are timestamped, i.e. any additions or deletions are marked with a timestamp.

»» Data Identification: The data used shall be identified via a PID pointing to a time-stamped query, resolving to a landing page.
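
The three outputs above (versioning, timestamping, identification) can be pictured with a small sketch. This is a toy illustration under stated assumptions, not the working group's reference implementation: the `VersionedStore` class, its method names and the `example-pid:` prefix are invented for this example, and no landing page is modeled. The store logs every insert, update and delete with a timestamp, and issues an identifier for a (query, timestamp) pair that can be re-executed later.

```python
import hashlib


class VersionedStore:
    """Toy store (hypothetical): changes are logged, never overwritten."""

    def __init__(self):
        self.log = []      # entries: (timestamp, operation, key, value)
        self.queries = {}  # identifier -> (key_prefix, timestamp)

    def write(self, timestamp, operation, key, value=None):
        # operation is "insert", "update" or "delete";
        # timestamps are ISO 8601 UTC strings, which sort chronologically
        self.log.append((timestamp, operation, key, value))

    def state_at(self, timestamp):
        """Rebuild the data as it existed at the given time."""
        state = {}
        for ts, operation, key, value in sorted(self.log, key=lambda e: e[0]):
            if ts > timestamp:
                break
            if operation == "delete":
                state.pop(key, None)
            else:
                state[key] = value
        return state

    def cite(self, key_prefix, timestamp):
        """Store a timestamped query and return a made-up identifier."""
        digest = hashlib.sha256(f"{key_prefix}|{timestamp}".encode()).hexdigest()
        pid = "example-pid:" + digest[:12]
        self.queries[pid] = (key_prefix, timestamp)
        return pid

    def resolve(self, pid):
        """Re-execute the stored query against the timestamped store."""
        key_prefix, timestamp = self.queries[pid]
        state = self.state_at(timestamp)
        return {k: v for k, v in state.items() if k.startswith(key_prefix)}


store = VersionedStore()
store.write("2015-01-01T00:00:00Z", "insert", "cast/1", 4.2)
store.write("2015-02-01T00:00:00Z", "update", "cast/1", 4.3)
pid = store.cite("cast/", "2015-02-15T00:00:00Z")
store.write("2015-03-01T00:00:00Z", "delete", "cast/1")
cited = store.resolve(pid)  # the February view, despite the later delete
```

Because the identifier points at a query plus a timestamp rather than at a copy of the data, the cited view stays retrievable after the record is deleted.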

Adoption of Data Citation Outputs

Evaluation
  Evaluate recommendations
  Try implementation in existing systems

Trial
  BCO-DMO: R1-11 fit well with current architecture; R12 doable; test as part of DataONE node membership
  R2R: curation of original field data and selected subset of post-field products (ship track); so no evolving data
  Both working with DataONE as our aggregation system and service provider

Data Type Registry (DTR)

DTR mission: see if it is possible to make implicit assumptions about data contained in datasets explicit and programmatically share these assumptions using types and type registries

DTR outcomes: 1 website and 1 API. The registry website provides a user interface for someone to describe both simple and complex data types used for data within a project. They can also search data types created by others. The API provides a way to programmatically interact with the registry including the ability to import data type descriptions.

Description of DTR Outputs

1. Data Type Registry: A website with a GUI that provides a way for an authorized user to describe the data types used in data products.

2. Data Type Registry API: An API that, among other things, creates JSON representations of the information about data types. This is a pointer to the API specification implemented in the data type registry mentioned above.
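
To make the idea of a JSON representation of a data type concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the `DataType` fields, the `InMemoryTypeRegistry` class and the sample identifier are hypothetical and do not reflect the actual registry's schema or API.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DataType:
    """Hypothetical type description; field names are illustrative only."""
    identifier: str
    name: str
    description: str
    properties: dict = field(default_factory=dict)


class InMemoryTypeRegistry:
    """Stand-in for a registry service, kept in memory for the example."""

    def __init__(self):
        self._types = {}

    def register(self, data_type):
        self._types[data_type.identifier] = data_type

    def as_json(self, identifier):
        # Serialize one type description as a JSON document
        return json.dumps(asdict(self._types[identifier]), sort_keys=True)

    def search(self, text):
        # Case-insensitive match on name or description
        needle = text.lower()
        return [t.identifier for t in self._types.values()
                if needle in t.name.lower() or needle in t.description.lower()]


registry = InMemoryTypeRegistry()
registry.register(DataType(
    identifier="example/sea-surface-temperature",  # made-up identifier
    name="Sea surface temperature",
    description="Underway temperature reading from a ship sensor",
    properties={"unit": "degree Celsius", "value_type": "float"},
))
record = json.loads(registry.as_json("example/sea-surface-temperature"))
```

A client that receives `record` can read the unit and value type explicitly instead of relying on an assumption buried in a file format.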

Adoption of DTR Outputs

Evaluation
  evaluate the registry and API for use in existing data repositories
  try using the prototype registry to record a set of data types and then provide some example code that uses the API to access the information in the data type registry

Trial: BCO-DMO, R2R, GeoLink?
  data type determined by instrument type
  R2R already maintains a de facto library of file types for environmental sensor systems in the US research fleet, in collaboration with NCEI and Chronopolis. We could publish this as a formal Data Type Registry

PID Information Type (PID-IT)

PID-IT goal: Provide a way to harmonize the information types (and associated information) that are associated with PIDs across disciplines and PID providers. (Also to provide technical solutions.)

PID-IT persistent outcomes: Types for example use-cases have been registered in the type registry developed in this WG

Description of PID-IT Outputs

1. Type Examples and Illustration Use Cases: These are examples of …

2. API Description: A description of the API used to access the PID registry created by this group

3. API Prototype Implementation: A working version of the API connected to the PID registry that has been created

4. Registry Prototype: The registry prototype itself

5. Client demonstrator GUI: Demonstration of the registry and its use via a graphical user interface developed by the group’s intern.

Adoption of PID-IT Outputs

Evaluation
  evaluate the client registry GUI

Trial
  BCO-DMO and R2R PID systems in use: DOI (datasets and expeditions), ORCID, FundRef (US awards), ISNI (global organization), IGSN (global samples), re3data (repository), and domain-specific for instruments and measurements
  Possible: R2R, BCO-DMO and NCEI (US ocean archive) joining DataONE; DataONE architecture is well-aligned with PID-IT approach, so perhaps DataONE adopts the PID-IT API and offers that as a service to the community
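
One way to picture harmonized PID information types: each identifier resolves to a small record whose fields are keyed by registered type identifiers, so a client can read a given property the same way regardless of the PID provider. The sketch below is only an illustration of that idea. The type identifiers, the `PidRecord` class and the sample identifiers are all invented; none of them come from the PID-IT registry or API.

```python
from dataclasses import dataclass, field

# Hypothetical registered information types:
# type identifier -> (human-readable name, expected Python type)
INFO_TYPES = {
    "example-type/checksum": ("checksum", str),
    "example-type/size-bytes": ("size in bytes", int),
    "example-type/landing-page": ("landing page URL", str),
}


@dataclass
class PidRecord:
    pid: str
    values: dict = field(default_factory=dict)

    def set(self, type_id, value):
        # Reject unregistered types and values of the wrong kind
        if type_id not in INFO_TYPES:
            raise KeyError(f"unregistered information type: {type_id}")
        expected = INFO_TYPES[type_id][1]
        if not isinstance(value, expected):
            raise TypeError(f"{type_id} expects {expected.__name__}")
        self.values[type_id] = value

    def get(self, type_id, default=None):
        return self.values.get(type_id, default)


# Two records from different (made-up) identifier schemes share the same types
dataset = PidRecord("example-doi:10.0000/abc")
dataset.set("example-type/size-bytes", 2048)
sample = PidRecord("example-igsn:XYZ0001")
sample.set("example-type/landing-page", "https://example.org/sample/XYZ0001")

# A client asks every record for the same typed property
sizes = [r.get("example-type/size-bytes") for r in (dataset, sample)]
```

The design point is that the client code never branches on the identifier scheme; it only needs to know the registered type it is asking for.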

Data Foundations and Terminology (DFT)

DFT mission: to understand what the core of the RDA data domain is and then develop definitions of core terms based on data models. This effort is a part of the effort to form agreement on RDA culture.

DFT persistent outcomes: 4 documents and 1 wiki tool that summarize the group's work on terminology. The wiki tool is intended to be used by other RDA WGs and IGs to extend the set of terms beyond those determined to date.

Description of DFT Outputs

1. Overview Document – model descriptions: Report on the discussion about a large number of data models

2. Analysis & Synthesis Document: Report on the analysis of the data models considered by the group

3. Term Snapshot Document: Report on a snapshot of core terms that have been identified

4. Use Cases (1), (2): Use cases that describe how other working groups use the terms the group has talked about

5. Semantic Media Wiki Term Tool: Tool to capture initial list of terms and definitions for DFT WG discussions, open for others to use. (It is kind of persistent at this point.)

6. Report of Interactions with others about terms: Summary of conversations with ~120 individuals about data in the context of DFT findings.

Adoption of DFT Outputs

Evaluation
  evaluate core terms in Semantic Media Wiki Term Tool for ocean science domain as they extend our current term reference source

Trial
  BCO-DMO: map local system terms to the DFT terms; adding deployment type (cruise, dive, float, experiment)
  R2R: already essentially follows this model; publishes Collections that represent a field expedition (research cruise), each having a Persistent Identifier (DOI)

Challenges
  Need dereferenceable term URIs (DTR?)
  Does the Semantic Wiki Tool provide relationships or a way to describe them?
  Is each term a concept in an ontology (e.g. OWL file)?
  Governance?

Opportunity . . .

Interaction of DFT, PID-IT, DDRI and DTR ??

Registering terms/types at DTR (federated system of distributed, production-ready registries)

What is the appropriate ‘level’ for a registry?
  Professional, domain-specific societies? (ASLO, AGU, EGU)
  Institutional library?
  Community organization? (ESIP, OGC, ODIP)

Not having operational registries may hinder adoption

Practical Policy (PP)

PP goals: To enable sharing, revising, adapting, and re-using of computer-actionable policies for sharing data, particularly in a data repository, and to suggest a set of generic policies to be applied to our data; collect and register practical policies

PP persistent outcomes: Practical Policy (PP) WG recommendation package of Policy Examples and Template Workbooks

Description of Practical Policy Outputs

1. Policy Template: Template that includes a generic set of policies suggesting how they can be implemented within a data system.

2. Implementations: Policy descriptions and implementation details

Adoption of Practical Policy Outputs

Evaluation
  evaluate the policy template and implementation documents

Trial
  Identify policies that should be documented
  BCO-DMO and R2R are DOI Publication Agents, but not long-term archives.
  Consider documenting our practices (for archive and replication) in a computer-actionable format, so data deposits (to NCEI, Chronopolis, DataONE) can be periodically verified as part of a self-audit process
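
As a rough illustration of what a computer-actionable replication policy and self-audit might look like, the sketch below expresses a policy as data and checks each archive's copy against the checksum recorded at deposit. The policy fields, the `audit` function and the archive names are assumptions made for this example; it works on in-memory bytes and does not call any real archive service or use the working group's templates.

```python
import hashlib

# Hypothetical policy, written as data so a program can act on it
POLICY = {"name": "replica-check", "algorithm": "sha256", "required_copies": 2}


def checksum(data, algorithm):
    return hashlib.new(algorithm, data).hexdigest()


def audit(manifest, archives, policy=POLICY):
    """Compare each archive's copy against the checksum recorded at deposit.

    manifest: {object_id: expected_checksum}
    archives: {archive_name: {object_id: bytes}}
    """
    report = {}
    for object_id, expected in manifest.items():
        verified = sorted(
            name for name, holdings in archives.items()
            if object_id in holdings
            and checksum(holdings[object_id], policy["algorithm"]) == expected
        )
        report[object_id] = {
            "verified_at": verified,
            "compliant": len(verified) >= policy["required_copies"],
        }
    return report


original = b"cruise track, version 1"
manifest = {"dataset-1": checksum(original, "sha256")}
archives = {  # placeholder archive names
    "archive-a": {"dataset-1": original},
    "archive-b": {"dataset-1": b"cruise track, corrupted"},
    "archive-c": {"dataset-1": original},
}
report = audit(manifest, archives)
```

Keeping the policy as data rather than hard-coding it means the required number of copies or the checksum algorithm can be revised and shared without touching the audit code.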

Metadata

Metadata group goals:
  Set up a sustainable, community-driven RDA Metadata Standards Directory, designed for users rather than automated tools, that provides brief details for common research data.
  Compile a set of use cases that analyze and document the various ways in which metadata can be used (e.g. for discovery, exchange, re-use, etc.)

Metadata group outcomes:
  UK DCC Disciplinary Metadata Standards Catalogue http://www.dcc.ac.uk/resources/metadata-standards
  functional GitHub prototype directory with version control http://rd-alliance.github.io/metadata-directory/

Adoption of Metadata Group Outputs

Evaluation
  Compare DCC lists with current practices
  Identify standards where we currently have none
  Identify mismatches and consider addressing them

Current Status
  BCO-DMO: ISO 19115-2 (19139 compliant), DIF, CF via NVS, O&M, PROV, RDF, DCAT, Dublin Core, (OAI-ORE soon), (used to do FGDC, but dropped that recently in favor of ISO-19115)
  R2R: all of the above plus DataCite and IGSN for samples

Publishing Data Working Groups

Publishing Data Workflows
Publishing Data Services
Publishing Data Bibliometrics

Evaluate and Monitor:
  R2R and BCO-DMO will evaluate these outcomes and look for ways to implement in our current architecture, or relevant communities (promoting recommended practices)
  GeoLink (NSF EarthCube) using Linked Data (Semantic Web technologies) to connect data and publications
  meaningful data use statistics would be very welcome

Infrastructure Components

Creating infrastructure components that support the modern research endeavor is like creating a quilt: a work of art created by a community of practitioners with a shared goal. Each member of the community lovingly and laboriously designs and constructs their piece of the quilt.

Creating the Infrastructure for a Domain

Putting the pieces together to create the ‘whole’ block for a domain.

RDA

Combining the domains,

and adding the unifying framework,

to create the global research data quilt (RDA)

Thank you!

Data Description Registry Interoperability (DDRI)
  Infrastructure providers & data librarians to find connections across research data registries and create global views of research data.

Repository Audit and Certification DSA–WDS
  A convergent DSA-WDS certification standard will help eliminate duplication of effort and increase the coherence and compatibility of certification procedures, thus benefitting researchers, data managers, librarians and scientific communities.

RDA/WDS Publishing Data Bibliometrics

RDA/WDS Publishing Data Services
  universal interlinking service between data and scientific literature

RDA/WDS Publishing Data Workflows

Wheat Interoperability

Others (6 other group outcomes available by Sep 2015)
