Sünje Dallmeier-Tiessen: Research data "publishing": models, roles and responsibilities
Data Publishing Models
Sünje Dallmeier-Tiessen, PhD (CERN, Harvard University)
For the RDA-WDS Data Publishing Workflow Group
June 9th, 2015
Topics
• What is data publishing
• Why do we care about it (today)
• Models in data publishing
• Building blocks
• Information gathered through trusted data publishing
• Relevance and conclusions for today’s workshop
This is work conducted by the RDA-WDS working group on data publishing workflows, co-chaired with Fiona Murphy and Theo Bloom.
Data Publishing … describes the process of making research data and other research objects available on the web so that they can be discovered and referred to in a unique and persistent way. At its best, data publishing takes place through dedicated data repositories and data journals and ensures that the published research objects are well documented, curated, archived for the long term, interoperable, citable and quality assured. Thus, they are reusable and discoverable over the long term.
Examples
Analysis elements

• Discipline, responsible units (i.e. their roles)
• Function of workflow
• PID assignment: DOI, ARK, etc.
• Peer review of data (e.g. by researcher & editorial review)
• Curatorial review of metadata (e.g. by institutional or subject repository?)
• Technical review & checks (e.g. for data integrity at repository upon ingestion)
• Formats covered
• Persons/roles involved, e.g. editor, publisher, data repository manager, etc.
• Links to additional data products (data paper; review documents; other journal articles) or “stand-alone” product
• Links to grants, usage of author PIDs
• Discoverability: indexing of the data -- if yes, where?
• Data citation facilitated
• Data life cycle reference
• Standards compliance
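The analysis dimensions above can be captured as one record per surveyed workflow; a minimal sketch in Python, where the field names are illustrative and not the working group's actual survey schema:

```python
# Hypothetical record for comparing data publishing workflows along the
# analysis dimensions listed above (field names are illustrative).
from dataclasses import dataclass, field

@dataclass
class WorkflowProfile:
    discipline: str
    responsible_units: list          # e.g. ["journal editor", "data repository"]
    pid_scheme: str                  # "DOI", "ARK", "Handle", or "none"
    peer_review_of_data: bool
    curatorial_metadata_review: bool
    technical_checks: bool           # e.g. integrity checks at ingestion
    formats_covered: list = field(default_factory=list)
    linked_products: list = field(default_factory=list)  # data paper, review docs, ...
    indexed_in: list = field(default_factory=list)       # where the data are indexed
    data_citation_facilitated: bool = False

# Example entry for a disciplinary repository workflow:
profile = WorkflowProfile(
    discipline="climate science",
    responsible_units=["project repository", "long-term archive"],
    pid_scheme="DOI",
    peer_review_of_data=False,
    curatorial_metadata_review=True,
    technical_checks=True,
    formats_covered=["NetCDF"],
    data_citation_facilitated=True,
)
print(profile.pid_scheme)  # → DOI
```

Collecting such records side by side is what makes the cross-workflow comparison in the following slides possible.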
Repository’s perspective

Simplified generic repository workflow:

Producer → Data Deposit → Ingest → Quality Assurance → Data Management / LT Archiving → Dissemination / Access → Consumer / Reuse

• Researcher with a central role during submission/deposition
• Review/QA mainly internal, through dedicated curation personnel
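The stages above form a linear pipeline; a toy Python sketch, with stage names taken from the slide and the check inside quality assurance invented for illustration:

```python
# Toy model of the simplified repository workflow: each stage takes and
# returns a "submission" dict. The QA check is an invented placeholder.

def deposit(data, metadata):
    """Producer deposits data together with descriptive metadata."""
    return {"data": data, "metadata": metadata, "status": "deposited"}

def ingest(submission):
    submission["status"] = "ingested"
    return submission

def quality_assurance(submission):
    # e.g. a curator verifies that minimal metadata are present
    required = {"title", "creator"}
    missing = required - submission["metadata"].keys()
    submission["status"] = "rejected" if missing else "accepted"
    return submission

def archive_and_disseminate(submission):
    if submission["status"] == "accepted":
        submission["status"] = "published"  # now accessible to consumers
    return submission

s = deposit(b"...", {"title": "Ocean temperatures", "creator": "A. Miller"})
s = archive_and_disseminate(quality_assurance(ingest(s)))
print(s["status"])  # → published
```

Dropping the `creator` field would leave the submission in the `rejected` state, mirroring the internal curation gate described on the slide.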
Disciplinary repository example (content provided by M. Stockhause)

Two-stage workflow:

Producer → Data Deposit → Ingest → Quality Assurance (light) → Data Management → Consumer (disciplinary)
…then: Ingest → Quality Assurance (detailed) → LT Archiving → Dissemination / Access → Consumer (interdisciplinary)

Project repositories:
• Data are published in a federated data infrastructure
• Data are added and corrected
• Poor documentation
• Usually no data backup
• Light-weight quality assurance against intl. and project standards
• Tendency that the project data never become stable
• Currently no PIDs assigned or reserved, but Handles planned

Long-term archive:
• Data are archived for the long term at a single location
• Data are stable and curated
• Detailed documentation
• Data backup/redundancy
• Quality assurance process is more detailed and includes a review
• Data are a “snapshot” of the project data at a certain time
• DOIs assigned to data collections
Lessons learnt and questions

• Very diverse landscape
• Discipline-specific and cross-discipline actions
• Quality assurance a big topic in discipline-specific repositories
• Widespread persistent identification
• Data citation awareness
• Challenge: versioning
Publisher’s perspective

Simplified generic publisher workflow:

Producer → Article preparation / Data submission → Article submission → Peer review process → Editing → Publishing → Consumer / Reuse

• Researcher takes over several roles: submitter, reviewer, potentially editor?
• Published output: either a combined article/data container, or a separate article and datasets (the datasets held in data repositories)
Example Workflows in Dataverse: Connect Data to Journals
A. Journals include Dataverse as a Recommended Repository
B. Authors Contribute Directly to a Journal’s Dataverse
C. Automated Integration of Journal + Dataverse (e.g., OJS)
Slide by Eleni Castro
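As an illustration of workflow C (automated journal integration), a journal system would create a dataset in a Dataverse collection via the Dataverse native API. A hedged sketch: the endpoint path and `X-Dataverse-key` header follow the public Dataverse API, but the server URL, collection alias, API token, and payload values are placeholders:

```python
# Sketch of a dataset-creation request against the Dataverse native API.
# SERVER, COLLECTION, and API_TOKEN are placeholders, not real endpoints.
import json
import urllib.request

SERVER = "https://dataverse.example.org"   # hypothetical installation
COLLECTION = "myjournal"                   # hypothetical collection alias
API_TOKEN = "xxxxxxxx-xxxx"                # issued per user by the installation

payload = {
    "datasetVersion": {
        "metadataBlocks": {
            "citation": {
                "displayName": "Citation Metadata",
                "fields": [
                    {"typeName": "title", "typeClass": "primitive",
                     "multiple": False,
                     "value": "Replication data for: Example study"},
                ],
            }
        }
    }
}

req = urllib.request.Request(
    f"{SERVER}/api/dataverses/{COLLECTION}/datasets",
    data=json.dumps(payload).encode(),
    headers={"X-Dataverse-key": API_TOKEN, "Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would perform the deposit; omitted here
# because this is an offline sketch.
print(req.get_method(), req.full_url)
```

In the OJS-style integration of workflow C, a call like this is triggered automatically at submission time rather than typed by the author.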
Example: Dryad repository integrated with journals
Slide by T. Bloom
Data publishing building blocks

Basic published product:
• Primary data entry with PID
• Repository entry
• Metadata
• Curation

Add-ons (workflows for more documentation, QA, visibility):
• Parallel data description: a data paper (or link to it); link to the results paper
• Linked and published quality assurance: curation and editing process; peer review; any kind of QA process
• Additional visibility: push to ORCID, author pages, impact/reputation-building tools; enable indexing (Data Citation Index, crawled by Google)
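Data citation, one of the add-ons above, commonly follows the DataCite-recommended form "Creator (PublicationYear): Title. Publisher. Identifier". A small sketch that assembles such a string (the example values are invented):

```python
# Assemble a DataCite-style data citation string from its components.

def format_data_citation(creators, year, title, publisher, doi, version=None):
    """Return a citation of the form
    'Creator (Year): Title. [Version V.] Publisher. https://doi.org/...'"""
    parts = [f"{'; '.join(creators)} ({year}): {title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)

citation = format_data_citation(
    ["Miller, A."], 2015, "Ocean Temperature Series",
    "Example Data Repository", "10.1234/example",
)
print(citation)
# → Miller, A. (2015): Ocean Temperature Series. Example Data Repository. https://doi.org/10.1234/example
```

Because the identifier resolves via DOI, the citation stays valid even if the repository's landing-page URLs change.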
Trusted data publishing contains:

• Standardized information about the data
  – Disciplinary standards
  – Basic common metadata sets
• Distinct roles, workflows and responsibilities
  – Authorship, submission
  – Curation
  – Quality assurance
  – Peer review
• Persistent identification
  – Permanent reference
  – Data citation
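A "basic common metadata set" can be checked mechanically. A sketch that validates a record against the DataCite mandatory properties (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType); the flat-dict record layout is an assumption for this sketch, not the DataCite schema itself:

```python
# Check a metadata record for the DataCite mandatory properties.
# The flat-dict record layout is an assumption for illustration; real
# records would follow the DataCite XML or JSON schema.
MANDATORY = ("identifier", "creator", "title",
             "publisher", "publicationYear", "resourceType")

def missing_mandatory(record):
    """Return the mandatory DataCite properties absent from `record`."""
    return [p for p in MANDATORY if not record.get(p)]

record = {
    "identifier": "10.1234/example",
    "creator": "Miller, A.",
    "title": "Ocean Temperature Series",
    "publisher": "Example Data Repository",
    "publicationYear": 2015,
}
print(missing_mandatory(record))  # → ['resourceType']
```

A curation workflow can run such a check at ingestion and bounce incomplete deposits back to the submitter.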
Challenges
• Interoperability challenges
  – Different metadata schemas
  – Rich vs. limited metadata
• Discoverability challenges
  – E.g. no bi-directional linking
  – Usability challenges in aggregators
• Metrics and accreditation
• What information is needed for future reuse/remix/reproducibility?
• How can this information be exposed, human- and machine-readable?
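The bi-directional linking gap noted above is the kind of problem the Scholix framework (an RDA/WDS output) later addressed with exchangeable article–data link records. A sketch of such a record, with field names based on the published Scholix schema and all values invented:

```python
# Illustrative Scholix-style link between an article and a dataset.
# All identifiers below are invented placeholders.
link = {
    "LinkProvider": [{"Name": "Example Data Repository"}],
    "RelationshipType": {"Name": "IsSupplementedBy"},
    "Source": {
        "Identifier": {"ID": "10.1234/article", "IDScheme": "doi"},
        "Type": {"Name": "literature"},
    },
    "Target": {
        "Identifier": {"ID": "10.1234/dataset", "IDScheme": "doi"},
        "Type": {"Name": "dataset"},
    },
}
# A consumer can traverse the link in either direction, which is exactly
# what ad-hoc one-way links between journals and repositories lack.
print(link["RelationshipType"]["Name"])  # → IsSupplementedBy
```

Aggregators that collect such records can answer both "which datasets underpin this article?" and "which articles use this dataset?".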
Thank you!
Data Publishing Workflows
Activities and processes in a digital environment that lead to the publication of research data and other research objects on the Web. These activities may be performed by humans or in an automated fashion. In contrast to the interim or final published products, workflows are the means to curate, document, peer review and thus ensure and enhance the value of the published product.