DataCite, DataONE, Dryad and UC3


Page 1: DataCite , DataONE, Dryad and UC3

DataCite, DataONE, Dryad and UC3

William Michener, DataONE and University of New Mexico

John Kunze and Patricia Cruse, University of California Curation Center (UC3), California Digital Library and DataONE

Ryan Scherle, Dryad (National Evolutionary Synthesis Center) and DataONE

Page 2: DataCite , DataONE, Dryad and UC3

A Choice

If the scientific record is at risk:
– Results can’t be reproduced
– Science fails, global catastrophe ensues

The choice: Better data publishing, sharing, and archiving

OR

Planetary destruction?

Roberto Rizzato

Page 3: DataCite , DataONE, Dryad and UC3

A Vision for Change: DataONE

Providing universal access to data about life on earth and the environment that sustains it

1. Build on existing cyberinfrastructure

2. Create new cyberinfrastructure

3. Support new communities of practice

• engaging the scientist in the data curation process
• supporting the full data life cycle
• encouraging data stewardship and sharing
• promoting best practices
• engaging citizens
• developing domain-agnostic solutions

Page 4: DataCite , DataONE, Dryad and UC3

University of California Curation Center, California Digital Library

DataONE Cyberinfrastructure

Member Nodes

• diverse institutions
• serve local community
• provide resources for managing their data

Coordinating Nodes

• retain complete metadata catalog
• subset of all data
• perform basic indexing
• provide network-wide services
• ensure data availability (preservation)
• provide replication services

Flexible, scalable, sustainable network

Page 5: DataCite , DataONE, Dryad and UC3

DataONE Wish List for Data Citation

• Precise identification of a dataset
– At level of version, file, table, cell, etc., or groups thereof
– So that readers can find and understand the data

• Credit to data producers and data publishers
– Vital incentive for data sharing and archiving

• A link from the traditional literature to the data
– Gives intellectual legitimacy to creation of data sets

• Research metrics for datasets
– Sponsors want publication and retention numbers

• Coordinated citation support for local data producers, regional archives, and global end-users

Page 6: DataCite , DataONE, Dryad and UC3

Identifier Requirements

• To accommodate a diverse set of member nodes that hold a wide variety of content, the DataONE system must adhere to the following principles:

– Agnosticism – DataONE supports all identifier schemes where the ID can be represented as a Unicode string.

– Opacity – DataONE does not attach any meaning or resolution protocol based on the identifier.

– Authority – The identifier first assigned by a member node is authoritative. Other identifiers may be assigned by other nodes for internal use.
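The three principles above can be illustrated with a small sketch. It is only an assumption about how such a catalog might behave, not DataONE code: the class `IdentifierCatalog`, its method names, and the sample alias `local-node-id-42` are all hypothetical. Identifiers are treated as plain Unicode strings (agnosticism), never parsed or resolved (opacity), and every alias maps back to the identifier first assigned by the member node (authority).

```python
class IdentifierCatalog:
    """Hypothetical catalog: maps any known id to its authoritative id."""

    def __init__(self):
        # Keys and values are opaque Unicode strings; nothing is parsed.
        self._authoritative = {}

    def add_object(self, authoritative_id: str) -> None:
        """Record the identifier first assigned by a member node."""
        if authoritative_id in self._authoritative:
            raise KeyError(f"identifier already known: {authoritative_id}")
        self._authoritative[authoritative_id] = authoritative_id

    def add_alias(self, alias: str, authoritative_id: str) -> None:
        """Record an extra identifier assigned by another node for internal use."""
        if self._authoritative.get(authoritative_id) != authoritative_id:
            raise KeyError(f"not an authoritative identifier: {authoritative_id}")
        if alias in self._authoritative:
            raise KeyError(f"identifier already known: {alias}")
        self._authoritative[alias] = authoritative_id

    def authoritative(self, any_id: str) -> str:
        """Return the authoritative identifier for any known identifier."""
        return self._authoritative[any_id]


catalog = IdentifierCatalog()
catalog.add_object("doi:10.5061/dryad.20")
catalog.add_alias("local-node-id-42", "doi:10.5061/dryad.20")
print(catalog.authoritative("local-node-id-42"))
```

Lookup is by exact string equality only, which is one simple way to avoid attaching meaning to any particular identifier scheme.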

Page 7: DataCite , DataONE, Dryad and UC3

Identifier Requirements

• To participate in the DataONE network, a node must be able to meet the following requirements:

– Uniqueness – Identifiers must be unique across the space of DataONE.

– Granularity – Every item must be assigned an identifier (metadata as well as data).

– Immutability – The object referenced by an identifier cannot change. If an object is modified, it must receive a new identifier.
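As a rough illustration of the node requirements above, the sketch below keeps a checksum per identifier so that a reused identifier or changed content can be detected. It is a toy under stated assumptions: `IdentifierRegistry` and its methods are hypothetical names, and the use of SHA-256 checksums is a choice made for this example, not something the slides specify.

```python
import hashlib


class IdentifierRegistry:
    """Hypothetical in-memory registry, not a DataONE API."""

    def __init__(self):
        # identifier -> SHA-256 checksum of the registered bytes
        self._checksums = {}

    def register(self, identifier: str, content: bytes) -> None:
        """Uniqueness: refuse an identifier that is already in use."""
        if not isinstance(identifier, str) or not identifier:
            raise ValueError("identifier must be a non-empty Unicode string")
        if identifier in self._checksums:
            raise KeyError(f"identifier already in use: {identifier}")
        self._checksums[identifier] = hashlib.sha256(content).hexdigest()

    def verify(self, identifier: str, content: bytes) -> bool:
        """Immutability: the bytes behind an identifier must never change."""
        return self._checksums[identifier] == hashlib.sha256(content).hexdigest()


registry = IdentifierRegistry()
# Granularity: the data object and its metadata each get their own identifier.
registry.register("doi:10.5061/dryad.20", b"data bytes, version 1")
registry.register("doi:10.5061/dryad.20-metadata", b"<metadata/>")
# A modified object must be registered under a new identifier.
registry.register("doi:10.5061/dryad.20.2", b"data bytes, version 2")
```

The identifier `doi:10.5061/dryad.20-metadata` is made up for the example; the slides do not say how metadata identifiers are formed.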

Page 8: DataCite , DataONE, Dryad and UC3

Think Big, Start Small

CDL leading 2 projects involving DataONE:

1. EZID for simple identifier management
– Creates ids, stores metadata and resolver target URLs
– Supports DataCite DOIs and lower-cost ids (ARKs, URLs)
– First customer is DataONE member, Dryad

2. Excel “add-in” project with MS Research
– Extend Excel to support data sharing, archiving, and access
– E.g., ability to export to data archive in a standard format with column headings drawn from a shared vocabulary

Page 9: DataCite , DataONE, Dryad and UC3

DataONE/DataCite Example

[Diagram; only the labels survive. The arrows between participants are not recoverable from the transcript.]

Participants:
• Research scientist
• DataONE Member Node data archive (eg, Dryad)
• DataONE Coordinating Node metadata catalog (eg, UNM or UCSB)
• DataCite Member (eg, CDL)
• (opt) CDL-hosted EZID id minting service
• EZID resolver and registration service
• DOI resolver and TIB registration

Flows:
• get unique id string (appears twice in the diagram)
1. data + metadata
2. metadata + URL + id
3. citation + URL + id
4. save full citation
5. URL plus id
6. full citation
7. full citation

Page 10: DataCite , DataONE, Dryad and UC3

A Repository of Data Underlying Journal Articles

Page 11: DataCite , DataONE, Dryad and UC3

The Goal

• Store all data underlying publications in evolutionary biology, ecology, and related disciplines, at the time of publication.

GenBank

TreeBASE

Dryad

ccaattggct gttcttcgat tctggcgagt

Page 12: DataCite , DataONE, Dryad and UC3

Identifiers and Versioning

• Each “data package” receives a DOI, which refers to the most recent version of the file.

• doi:10.5061/dryad.20

• When repository content is modified, a version indicator will be appended to the original DOI

• doi:10.5061/dryad.20.2

• To specify a particular file within the data package, a slash is used.

• doi:10.5061/dryad.20.2/3
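The three identifier shapes on this slide (package, package plus version, package plus version plus file) can be pulled apart with a small parser. This is a minimal sketch based only on the three examples shown; the regular expression, the `DryadId` type, and `parse_dryad_doi` are hypothetical and are not Dryad's own code. Longer forms, such as a file number with its own version suffix, are deliberately rejected.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred from the three examples on the slide.
_DRYAD_DOI = re.compile(
    r"doi:(?P<package>10\.5061/dryad\.[0-9a-z]+)"  # base data package DOI
    r"(?:\.(?P<version>\d+))?"                     # optional version indicator
    r"(?:/(?P<file>\d+))?"                         # optional file within package
)


class DryadId(NamedTuple):
    package: str
    version: Optional[int]
    file: Optional[int]


def parse_dryad_doi(text: str) -> DryadId:
    """Split a Dryad-style DOI string into package, version, and file parts."""
    match = _DRYAD_DOI.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognised Dryad DOI form: {text!r}")
    version = match.group("version")
    file_number = match.group("file")
    return DryadId(
        package=match.group("package"),
        version=int(version) if version is not None else None,
        file=int(file_number) if file_number is not None else None,
    )


for example in ("doi:10.5061/dryad.20",
                "doi:10.5061/dryad.20.2",
                "doi:10.5061/dryad.20.2/3"):
    print(parse_dryad_doi(example))
```

A missing version is returned as `None`, which a caller could read as "most recent version" in line with the first bullet on the slide.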

Page 13: DataCite , DataONE, Dryad and UC3

Identifiers and Versioning

• Metadata and particular formats of the files are not given “true” DOIs. They are reachable by appending a parameter to the DOI.

• doi:10.5061/dryad.20.2/3.1?urlappend=%3fformat=dc

• doi:10.5061/dryad.20.2/3.1?urlappend=%3fformat=xls
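The examples above rely on the DOI proxy's `urlappend` parameter, whose value is appended to the URL the DOI resolves to. The sketch below only builds such a link as a string. The function name `format_link` and the default resolver base `https://doi.org/` are assumptions made for illustration, and whether a given repository honours a `format` parameter is up to that repository.

```python
from urllib.parse import quote


def format_link(doi: str, fmt: str, resolver: str = "https://doi.org/") -> str:
    """Build a resolver URL asking for one format of a DOI-named object.

    `doi` is the bare DOI without the "doi:" prefix, e.g. "10.5061/dryad.20.2/3.1".
    `resolver` is an assumed default; any DOI proxy base URL can be passed in.
    """
    # The appended text "?format=..." must itself be percent-encoded so that
    # its "?" is not read as part of the resolver's own query string.
    appended = quote(f"?format={fmt}", safe="=")
    return f"{resolver}{quote(doi, safe='/')}?urlappend={appended}"


print(format_link("10.5061/dryad.20.2/3.1", "dc"))
print(format_link("10.5061/dryad.20.2/3.1", "xls"))
```

`quote` emits uppercase hex (`%3F`), which is equivalent to the lowercase `%3f` used on the slide.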

Page 14: DataCite , DataONE, Dryad and UC3

Citation

• When using data from Dryad, please cite the original article.
– Sidlauskas, B. 2007. Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Evolution 61: 299–316.

• Additionally, please cite the Dryad data package. The citation should include the following elements:
– Author(s)
– The date on which the data was deposited
– The name of the data file, if applicable
– The title of the data package, which in Dryad is always "Data from: [Article name]"
– The name "Dryad Digital Repository"
– The data identifier

• For example:
– Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
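The citation elements listed on this slide can be assembled mechanically. The helper below is a sketch, not a Dryad tool: the function name `format_dryad_citation` is hypothetical, and placing the optional file name after the package title is an assumption, since the slide's example does not include a file name.

```python
from typing import Optional


def format_dryad_citation(
    authors: str,
    deposit_date: str,
    article_title: str,
    identifier: str,
    file_name: Optional[str] = None,
) -> str:
    """Assemble a data package citation from the elements listed on the slide."""
    parts = [
        f"{authors} {deposit_date}.",
        # In Dryad the package title is always "Data from: [Article name]".
        f"Data from: {article_title}.",
    ]
    if file_name:
        # Assumed position: the slide does not show where a file name goes.
        parts.append(f"{file_name}.")
    parts.append("Dryad Digital Repository.")
    parts.append(identifier)
    return " ".join(parts)


print(format_dryad_citation(
    authors="Sidlauskas, B.",
    deposit_date="2007",
    article_title=(
        "Testing for unequal rates of morphological diversification in the "
        "absence of a detailed phylogeny: a case study from characiform fishes"
    ),
    identifier="doi:10.5061/dryad.20",
))
```

With the inputs shown, the result matches the example citation on the slide.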

Page 15: DataCite , DataONE, Dryad and UC3
Page 16: DataCite , DataONE, Dryad and UC3
Page 17: DataCite , DataONE, Dryad and UC3

Challenges/Questions

• Dealing with dynamic streaming data?
– How do versions enter into the identifier scheme?

• Resolving to human or machine-interpretable description of object?

• Need for a registry of name spaces?

• Can metadata stds support multiple globally unique identifiers?