

The Dryad Data Repository

Ryan Scherle (1), Hilmar Lapp (1), Amol Bapat (2), Sarah Carrier (2), Jane Greenberg (2), Peggy Schaeffer (1), Todd Vision (1,3), Hollie White (2)

(1) National Evolutionary Synthesis Center (NESCent), USA
(2) School of Information and Library Science, University of North Carolina, USA
(3) Department of Biology, University of North Carolina, USA

Joint Data Archiving Policy

Partner journals have agreed to jointly enact a data archiving policy. This policy will ensure that all data associated with papers in participating journals are saved in appropriate repositories. The current draft of the policy is reproduced at the end of this poster.

Partner journals

A consortium of journals governs Dryad, guiding policy development and ensuring long-term sustainability.

NESCent is a collaborative effort of Duke University, The University of North Carolina at Chapel Hill, and North Carolina State University.

Dryad is supported by NSF grants #EF-0423641, #DBI-0743720, and #DBI-0753138, and by IMLS grant #LG-07-08-120-08.

Submission system

Dryad’s submission system is optimized for quick and easy submissions. Only a few pieces of information about a publication and dataset are required. However, users have the option to enter more detailed descriptions, making data easier for others to find and reuse (and thus more likely to receive subsequent citations).

Modifications to DSpace

The implementation of Dryad has required many changes to the core DSpace platform, including grouping of search results by publication and the ability to embargo datasets for up to one year. When these modifications meet the needs of the larger DSpace community, they are integrated into the core DSpace software.
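As an illustration only, the two modifications named above can be sketched in a few lines of Python. This is not DSpace or Dryad code: the `Dataset` record, its field names, the helper functions, and the choice to model "up to one year" as 365 days are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional

# Assumption: "up to one year" is modeled as 365 days after publication.
MAX_EMBARGO = timedelta(days=365)


@dataclass
class Dataset:
    """Hypothetical search hit: one dataset tied to one publication."""
    title: str
    publication: str
    published_on: date
    embargo_until: Optional[date] = None


def set_embargo(ds: Dataset, until: date) -> None:
    """Embargo a dataset, rejecting dates more than one year after publication."""
    if until > ds.published_on + MAX_EMBARGO:
        raise ValueError("embargo may not exceed one year after publication")
    ds.embargo_until = until


def is_public(ds: Dataset, today: date) -> bool:
    """A dataset is public when it has no embargo or the embargo has passed."""
    return ds.embargo_until is None or today >= ds.embargo_until


def group_by_publication(hits: List[Dataset]) -> Dict[str, List[Dataset]]:
    """Group search hits so each publication lists its datasets together."""
    groups: Dict[str, List[Dataset]] = defaultdict(list)
    for ds in hits:
        groups[ds.publication].append(ds)
    return dict(groups)
```

Grouping preserves the order in which hits arrive, so a ranked result list stays ranked within each publication.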

Handshaking with specialized databases

For databases that are widely used by Dryad’s audience (e.g., TreeBASE and GenBank), Dryad will work with each database to mirror data submitted there, to facilitate automatic deposit of Dryad material into it, or both.

Harvesting and searching related content

Using the OAI-PMH and OAI-ORE protocols from the Open Archives Initiative, Dryad will harvest content from related repositories, including the Knowledge Network for Biocomplexity and the Long Term Ecological Research Network. Harvested content can be searched alongside native Dryad content, providing a single place to search multiple related repositories.

Dryad will also make use of the SRU searching standard to provide searching capabilities for content that cannot be harvested.
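To make the harvesting step concrete, here is a minimal Python sketch of an OAI-PMH `ListRecords` exchange. It is not Dryad's harvester. The function names, the `example.org` identifier, and the sample response are invented for illustration, and only the standard `oai_dc` (Dublin Core) metadata format is handled. A real harvester would fetch each URL over the network and repeat the request until no resumption token is returned.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is sent,
    metadataPrefix must be omitted.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text: str) -> Tuple[List[dict], Optional[str]]:
    """Return (records, next_token) from a ListRecords response body."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI_NS + "record"):
        identifier = rec.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        titles = [t.text for t in rec.iter(DC_NS + "title")]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//" + OAI_NS + "resumptionToken")
    # An empty or missing token means the list is complete.
    return records, (token.strip() if token and token.strip() else None)


# Made-up response used only to exercise the parser.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""
```

Calling `parse_list_records(SAMPLE_RESPONSE)` yields one record and the token `page-2`, which would then be passed back to `list_records_url` to request the next batch.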

Machine-readable interfaces

Dryad will provide multiple interfaces for researchers and other systems to access content in Dryad. Content can be monitored via RSS feeds, searched via the SRU searching standard, and harvested via the OAI-PMH protocol.
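For readers unfamiliar with SRU, the sketch below shows how a client might build a `searchRetrieve` request whose query is written in CQL. The parameter names (`operation`, `version`, `query`, `maximumRecords`) come from the SRU standard. The base URL, the helper names, and the use of the `dc.title` and `dc.creator` indexes are assumptions, since the poster does not say which endpoint or indexes Dryad exposes.

```python
from typing import List
from urllib.parse import urlencode


def cql_term(index: str, value: str) -> str:
    """Build one CQL clause such as dc.title = "finch", escaping quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} = "{escaped}"'


def sru_search_url(base_url: str, clauses: List[str], max_records: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve URL from CQL clauses joined with 'and'."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": " and ".join(clauses),
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)
```

For example, `sru_search_url("http://example.org/sru", [cql_term("dc.title", "finch"), cql_term("dc.creator", "Grant")])` produces a URL that asks a hypothetical endpoint for up to ten records matching both clauses.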

Basic search interface

Dryad allows data to be searched using standard publication information such as title and authors. Searches can also include more detailed information, such as taxonomic names and geological timespans.

[Figure: Dryad, TreeBASE, and GenBank, shown with the sample sequence fragment "ccaattggct gttcttcgat tctggcgagt"]

Repository: http://DataDryad.org
Project info: http://DataDryad.org/wiki
Source code: http://dryad.googlecode.com

Journal integration

Partner journals forward metadata about accepted publications to Dryad. Authors can import this information, greatly reducing the time required to submit data.

When a submission is complete, Dryad returns information to the journal, allowing links from article web pages to related content in Dryad.
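The round trip described above can be pictured with a small Python sketch. Every name here is hypothetical: the poster does not specify the format of the metadata journals send or of the information Dryad returns, so the record types and fields below are placeholders that only show the direction of the data flow.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JournalNotice:
    """Hypothetical metadata a journal forwards for an accepted article."""
    manuscript_id: str
    journal: str
    title: str
    authors: List[str]


@dataclass
class Submission:
    """Hypothetical submission form, prefilled from a journal notice."""
    manuscript_id: str
    publication_title: str
    authors: List[str]
    dataset_titles: List[str] = field(default_factory=list)


def import_notice(notice: JournalNotice) -> Submission:
    """Prefill a submission so the author only has to describe the data."""
    return Submission(notice.manuscript_id, notice.title, list(notice.authors))


def completion_message(sub: Submission, dataset_urls: List[str]) -> Dict[str, object]:
    """Information returned to the journal so the article page can link to the data."""
    return {"manuscript_id": sub.manuscript_id, "data_links": list(dataset_urls)}
```

The manuscript identifier is the one field carried through both directions, which is what would let the journal match returned links to the right article page.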

Related projects

The HIVE project is developing tools for integrating controlled vocabularies and ontologies with repositories. HIVE will integrate with the Dryad submission system.

Dryad is a member of the DataONE consortium of repositories, which is developing tools for wide-scale data sharing, mirroring, and analysis.

Current draft of the Joint Data Archiving Policy:

“<<Journal>> requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as <<list of approved archives>>. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.”