Data Publishing at Harvard's Research Data Access Symposium
![Page 1: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/1.jpg)
DATA PUBLISHING
Mercè Crosas, Ph.D.
Chief Data Science and Technology Officer
Institute for Quantitative Social Science
Harvard University
@mercecrosas
![Page 2: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/2.jpg)
Scholarly Publication Then
350 years ago, the first issue of Philosophical Transactions was published by the Royal Society, under the motto “Nullius in verba” (or “Take nobody’s word for it”)
No digital data – but data are described in detail in the article
![Page 3: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/3.jpg)
Scholarly Publication Now
Most scientific studies now involve large amounts of digital data & software for analysis.
• Article: Article Publishing
• Analysis: Software Publishing
• Digital Data: Data Publishing
![Page 4: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/4.jpg)
Data publishing: It’s good for you and good for the world
• You: Get credit for your data
• Publishers and Journals: Verify published work
• Federal funding agencies: Make public assets accessible
• Science: Validate, reuse and extend previous work
![Page 5: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/5.jpg)
Sharing Data Increases Citations
From 10,555 studies with gene expression microarray data:
- Studies that shared data received 9% more citations
- Data reuse by third-party investigators continued for 6 years
Piwowar and Vision (2013), Data reuse and the open data citation advantage. PeerJ 1:e175; DOI 10.7717/peerj.175
![Page 6: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/6.jpg)
Pepe, Goodman, Muench, Crosas, Erdmann (2014), “Sharing, Archiving and Citing Data in Astronomy”, PLOS ONE
Over time, links to data become invalid
[Figure: percentage of broken links to data]
Analysis of 7,641 Publications from 4 major journals in Astronomy and Astrophysics, between 1997 and 2008
Long-Term Accessibility must be Considered
![Page 7: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/7.jpg)
Data Publishing
A formal data citation
• Reference
• Access (persistent identifier)
Information about the data (metadata)
• Discovery
• Use
A trusted data repository
• Access (long-term archival)
Data Publishing needs to support data discovery, reference, access, and use
![Page 8: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/8.jpg)
Data Citation Principles
1. Data should be citable products of research
2. Credit and Attribution
3. Evidence
4. Unique Identification
5. Access
6. Persistence
7. Specificity and Verifiability
8. Interoperability and flexibility
Full Principles: https://www.force11.org/datacitation
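As a rough illustration of how several of these principles (attribution, unique identification, access, specificity) can map onto the fields of a formal data citation, here is a minimal Python sketch. The field layout, the `DataCitation` class, and every value in the example (including the DOI) are assumptions made up for illustration; they are not a format prescribed by the slides or by FORCE11.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Fields for a formal data citation (hypothetical layout)."""
    authors: str      # credit and attribution
    year: int
    title: str
    doi: str          # unique, persistent identifier
    repository: str   # where the data can be accessed
    version: str      # specificity: which version was used

    def format(self) -> str:
        # Resolving the DOI through doi.org lets the reference double as an access link.
        return (f'{self.authors}, {self.year}, "{self.title}", '
                f"https://doi.org/{self.doi}, {self.repository}, {self.version}")


# All values below are invented; the DOI does not point at a real dataset.
example = DataCitation(
    authors="Doe, Jane",
    year=2015,
    title="Example Survey Data",
    doi="10.1234/EXAMPLE",
    repository="Example Repository",
    version="V1",
)
print(example.format())
```

Keeping the version in the citation is one simple way to address specificity: a reader can tell which state of the dataset supported the published claim.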
![Page 9: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/9.jpg)
Data Repositories vs Repository Software
Domain-specific repositories
• GenBank
• Protein Data Bank
• SBGrid Data
• …
General-purpose repositories
• Harvard Dataverse
• DataDryad
• Figshare
• …
Repository Software
• Dataverse Software
• DSpace
• Fedora
• …
![Page 10: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/10.jpg)
dataverse.org
Open-source software developed at Harvard’s IQSS since 2006
Installed in 12 sites worldwide
Serving 100s of universities and organizations
![Page 11: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/11.jpg)
Harvard Dataverse: dataverse.harvard.edu
Open to all research fields and all researchers
More than 1,200 dataverses
More than 59,000 datasets
More than 1,400,000 downloads
![Page 12: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/12.jpg)
Dataverses are containers for Datasets
Each Dataverse can be for a researcher, a research project, a department, a journal, or a larger organization.
![Page 13: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/13.jpg)
Dataverse offers a rich feature set
Credit and Visibility
• Standard, persistent data citation
• Branding for each dataverse
• Widgets to embed in your own website
Discovery
• Faceted search for all metadata
• Standard metadata: citation, scientific domain, file-level
Access Control & Roles
• CC0 waiver for public datasets
• Tiered access: terms of use, guestbook, restricted data
• Publishing workflow
• Multiple roles: contribute; curate, review; administrate
Data Features
• Versioning
• Conversion of tabular data files to standard format
• Automatic extraction of file metadata (R, STATA, SPSS, XSD, FITS)
Interoperability through APIs
• Journal Systems (Open Journal System, ScholarOne)
• Open Science Framework
• Data Analysis (TwoRavens)
• Spatial Viz (WorldMap)
• Preservation systems (Archivematica)
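To make "interoperability through APIs" concrete, the following Python sketch builds a dataset search request against the Harvard Dataverse installation named in the slides. It assumes the Search API endpoint `/api/search` with the `q`, `type`, and `per_page` parameters, and a JSON response shaped like `{"data": {"items": [{"name": ...}]}}`; check the API guide for the installed Dataverse version before relying on either. The helper names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://dataverse.harvard.edu"  # installation named in the slides


def build_search_url(query: str, per_page: int = 5, base_url: str = BASE_URL) -> str:
    """Build a search URL restricted to datasets (parameter names are assumptions)."""
    params = urlencode({"q": query, "type": "dataset", "per_page": per_page})
    return f"{base_url}/api/search?{params}"


def dataset_titles(response: dict) -> list[str]:
    """Pull dataset names out of a decoded search response.

    Assumes the shape {"data": {"items": [{"name": ...}, ...]}}.
    """
    return [item.get("name", "") for item in response.get("data", {}).get("items", [])]


def search(query: str) -> list[str]:
    """Run the query over the network; needs connectivity to the installation."""
    with urlopen(build_search_url(query), timeout=30) as resp:
        return dataset_titles(json.load(resp))
```

Separating URL construction and response parsing from the network call keeps the two pure functions easy to check without contacting the server.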
![Page 14: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/14.jpg)
What you can do with file-level metadata and APIs Now
• Anti-slavery petitions data: statistical analysis with TwoRavens
• Boston Area Research Initiative data: visualization in WorldMap
• Tuberculosis Genomics data
![Page 15: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/15.jpg)
What you will be able to do with Image Data in Dataverse
OME-TIFF Files
FITS Files
OMERO
WORLD WIDE TELESCOPE
Conversion to standard formats + extraction of file-level metadata
![Page 16: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/16.jpg)
Current Collaborations
• SBGrid Data Repository (HMS, IQSS)
• Social Science Big Data (IQSS)
• Data Provenance (SEAS, IQSS)
• Privacy Tools to share sensitive data (SEAS, Berkman, Privacy Lab, IQSS, MIT)
![Page 17: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/17.jpg)
Sharing Sensitive Data with Confidence: DataTags System
DataTag: A set of security features and access requirements for file handling
![Page 18: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/18.jpg)
A DataTags Repository is a repository of files held for Data Sharing that:
1. Supports more than one datatag
2. Each file in the repository must have one datatag
3. A recipient of a file from the repository must:
   a. satisfy file’s access requirements,
   b. produce sufficient credentials as requested,
   c. and agree to any terms of use required to acquire the file.
4. Provides technological guarantees for requirements 1, 2 and 3.
Sweeney L, Crosas M, Bar-Sinai M. Sharing Sensitive Data with Confidence: The DataTags System. Technology Science. 2015101601. October 16, 2015. http://techscience.org/a/2015101601
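The repository requirements above can be sketched as a small data model in Python. This is an illustration only, not the DataTags implementation: the class names, the tag names "open" and "restricted", and the credential and terms labels are all hypothetical. Requirements 3a and 3b are folded into a single credential check, and requirement 4 (technological guarantees) is not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataTag:
    """A named bundle of access requirements (hypothetical structure)."""
    name: str
    required_credentials: frozenset[str] = frozenset()
    terms_of_use: frozenset[str] = frozenset()


@dataclass
class Recipient:
    credentials: set[str] = field(default_factory=set)
    accepted_terms: set[str] = field(default_factory=set)


class DataTagsRepository:
    def __init__(self, tags: list[DataTag]):
        # Requirement 1: the repository supports more than one datatag.
        if len(tags) < 2:
            raise ValueError("a DataTags repository supports more than one datatag")
        self._tags = {t.name: t for t in tags}
        self._file_tags: dict[str, DataTag] = {}

    def add_file(self, path: str, tag_name: str) -> None:
        # Requirement 2: each file is stored with exactly one datatag.
        self._file_tags[path] = self._tags[tag_name]

    def may_release(self, path: str, who: Recipient) -> bool:
        # Requirement 3: credentials and terms of use must both be satisfied.
        tag = self._file_tags[path]
        return (tag.required_credentials <= who.credentials
                and tag.terms_of_use <= who.accepted_terms)


# Hypothetical tags and file name, for illustration only.
repo = DataTagsRepository([
    DataTag("open"),
    DataTag("restricted",
            frozenset({"approved_researcher"}),
            frozenset({"data_use_agreement"})),
])
repo.add_file("survey.tab", "restricted")
print(repo.may_release("survey.tab", Recipient()))
```

Storing the tag alongside the file, rather than deriving it at request time, means a file cannot be added without a tag, which is the simplest way to keep requirement 2 true by construction.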
![Page 19: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/19.jpg)
Data Publishing Workflow for Sensitive Data
Sensitive Dataset
Sensitive Dataset
Direct Access
Curator Model
Datatags.org
![Page 20: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/20.jpg)
A Curator Model for Privacy-Preserving Analysis
Acknowledgement: Honaker, J. and Nissim, K., Data Privacy Tools Project
Differentially Private statistics (summaries, causal inference, regression, interactive queries)
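As a hedged illustration of what a differentially private summary statistic can look like, the Python sketch below releases a noisy count using the standard Laplace mechanism. It is not the Data Privacy Tools Project's code; the function names are made up here, and a production system would need more care (for example, around floating-point behavior and privacy budget accounting).

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values: list[bool], epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sum(values) + laplace_noise(1.0 / epsilon, rng)


# Smaller epsilon means more noise and stronger privacy.
rng = random.Random(0)
has_condition = [True, False, True, True, False]
print(private_count(has_condition, epsilon=0.5, rng=rng))
```

In a curator model, only the curator runs code like this against the raw data; analysts see the noisy output and never the individual records.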
![Page 21: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/21.jpg)
DEMO
https://beta.dataverse.org/custom/DifferentialPrivacyPrototype/
![Page 22: Data Publishing at Harvard's Research Data Access Symposium](https://reader035.fdocuments.in/reader035/viewer/2022070519/58ed59901a28abb23d8b467f/html5/thumbnails/22.jpg)
THANKS
Acknowledgement: Latanya Sweeney, James Honaker, Eleni Castro, Margo Seltzer, Piotrek Sliz, Christine Choirat, Garth Griffin, and the Dataverse team for graphics and slides