CLOUD DATAVERSE - Harvard University · FAIR DATA IN DATAVERSE Data Files Metadata Data Licenses,...
CLOUD DATAVERSE
Mercè Crosas, Institute for Quantitative Social Science, Harvard University
@mercecrosas
MOC WORKSHOP, OCTOBER 3, 2017, BOSTON UNIVERSITY
OUR INSTITUTE PROVIDES A TECHNOLOGY SOLUTION TO DATA SHARING
Institute for Quantitative Social Science, Harvard University @IQSS
Open-source software to share, cite, and find data.
Developed at Harvard's Institute for Quantitative Social Science
with contributions from an active and growing community.
Timeline: 2006 (we started) to 2017
dataverse.org
26 Dataverse installations serving hundreds of institutions
HOW RESEARCHERS SHARE & USE DATA WITH DATAVERSE
Harvard Dataverse Repository: a public repository for research data
More than 70,000 datasets in total
More than 49,000 datasets uploaded to the Harvard Dataverse repository (200 datasets/month)
More than 340,000 files (4,000 files/month)
More than 2.5 M downloads (60,000 downloads/month)
[Charts: Datasets Added; Downloads]
dataverse.harvard.edu
OUR CONTRIBUTIONS TO ENHANCE DATA SHARING
King, 1995, Replication, Replication
Altman et al., 2001, A Digital Library for the Dissemination and Replication of Quantitative Social Science
Altman and King, 2007, A Proposed Standard for the Scholarly Citation of Quantitative Data
King, 2007, An Introduction to the Dataverse Network as an Infrastructure for Data Sharing
Crosas, 2012, The Dataverse Network: an open source application for sharing, discovering, and preserving research data
Altman and Crosas, 2013, The Evolution to Data Citation: from principles to implementation
Crosas, 2013, A Data Sharing Story
2014, Joint Declaration of Data Citation Principles
Pepe et al., 2014, How Do Astronomers Share Data?
Goodman et al., 2014, Ten Simple Rules for the Care and Feeding of Scientific Data
Castro et al., 2015, Achieving Human and Machine Accessibility of Cited Data
Crosas, Honaker, King, Sweeney, 2015, Automating Open Science for Big Data
Sweeney, Crosas, Bar-Sinai, 2015, Sharing Sensitive Data with Confidence: The DataTags System
Meyer et al., 2016, Data Publication with the Structural Biology Data Grid Supports Live Analysis
Wilkinson et al., 2016, The FAIR Guiding Principles for Scientific Data Management and Stewardship
Bierer, Crosas, Pierce, 2017, Data Authorship as an Incentive to Data Sharing
Data should be ...
FINDABLE, ACCESSIBLE, INTEROPERABLE, REUSABLE
Wilkinson et al., 2016, "The FAIR Guiding Principles for Scientific Data Management and Stewardship," Nature Scientific Data
FAIR DATA IN DATAVERSE
Data Files
Metadata
Data Licenses, User Agreements, Restrictions
Data Citation with Persistent Identifier
Versions
APIs
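To make the "persistent identifier" and "APIs" items concrete, here is a minimal Python sketch that builds request URLs for three endpoints of the Dataverse Native API (search, dataset metadata by persistent identifier, and data file access) and formats a citation string in the general style Harvard Dataverse displays (authors, year, title, resolvable DOI, repository, version). The DOI `10.7910/DVN/EXAMPLE` and the author "Doe, Jane" are hypothetical; the citation layout is an approximation, not the exact Dataverse formatter. Only URLs are built here, so no network access is needed.

```python
from urllib.parse import urlencode

BASE_URL = "https://dataverse.harvard.edu"  # Harvard Dataverse, as named in the slides


def search_url(query, base=BASE_URL):
    """URL for the Dataverse Search API: GET /api/search?q=..."""
    return f"{base}/api/search?{urlencode({'q': query})}"


def dataset_url(pid, base=BASE_URL):
    """URL for dataset metadata looked up by persistent identifier (e.g. a DOI)."""
    # Keep ':' and '/' unescaped so the identifier stays readable, e.g. doi:10.7910/DVN/...
    query = urlencode({"persistentId": pid}, safe=":/")
    return f"{base}/api/datasets/:persistentId/?{query}"


def datafile_url(file_id, base=BASE_URL):
    """URL for downloading a single data file by its numeric database id."""
    return f"{base}/api/access/datafile/{file_id}"


def format_citation(authors, year, title, doi, repository, version):
    """Approximate data citation: authors, year, title, resolvable PID, repository, version."""
    return f'{"; ".join(authors)}, {year}, "{title}", https://doi.org/{doi}, {repository}, V{version}'


if __name__ == "__main__":
    # Hypothetical identifier, for illustration only.
    print(dataset_url("doi:10.7910/DVN/EXAMPLE"))
    print(format_citation(["Doe, Jane"], 2017, "Example Dataset",
                          "10.7910/DVN/EXAMPLE", "Harvard Dataverse", 1))
```

Because the version number is part of the citation, a reader who cites V1 can still retrieve exactly that version after the dataset is updated.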
Cloud Dataverse combines the power of cloud computing and storage with access to thousands of datasets from a feature-rich data repository platform.
WHY CLOUD DATAVERSE?
Big Data should also be FAIR Data
Datasets are replicated to the Cloud for efficient access and reuse
Computing on a dataset is enabled directly from any repository
WHAT WE HAVE BUILT
Dataverse integration with Swift storage
Compute access to MOC from a dataset page in Dataverse
Temporary URL to access restricted files in MOC
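The temporary URL item most likely relies on OpenStack Swift's tempurl middleware, which grants time-limited access to one object without sharing account credentials. The sketch below signs `method\nexpires\npath` with HMAC-SHA1 using the account's temp URL key, the scheme tempurl supported at the time (newer Swift releases also accept SHA-256). The host, account path, and key are hypothetical; how Cloud Dataverse actually issues these URLs is not described in the slides.

```python
import hmac
import time
from hashlib import sha1


def swift_temp_url(host, path, key, method="GET", ttl_seconds=3600, now=None):
    """Build a Swift TempURL for one object.

    path must be the object path starting at the API version,
    e.g. /v1/AUTH_account/container/object (hypothetical values below).
    """
    expires = int((time.time() if now is None else now) + ttl_seconds)
    # Swift verifies this exact body: method, expiry timestamp, and path, newline-separated.
    hmac_body = f"{method}\n{expires}\n{path}"
    sig = hmac.new(key.encode(), hmac_body.encode(), sha1).hexdigest()
    return f"{host}{path}?temp_url_sig={sig}&temp_url_expires={expires}"


if __name__ == "__main__":
    # Hypothetical endpoint and key, for illustration only.
    print(swift_temp_url("https://swift.example.org",
                         "/v1/AUTH_dataverse/restricted/survey.csv",
                         "hypothetical-temp-url-key", ttl_seconds=600))
```

A short expiry keeps a leaked link from granting lasting access to a restricted file.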
IN PROGRESS
Replicate data from any Dataverse to Cloud Dataverse
Upload data directly to Swift; publish the dataset from Swift to Dataverse
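One step in replicating a dataset to Swift is confirming that each copy matches the original. A minimal sketch, assuming the Dataverse installation records MD5 file checksums and that objects are stored unsegmented, in which case Swift reports the object's MD5 as its ETag. `FileRecord` and `files_to_copy` are hypothetical names, and the ETag lookup is stubbed with a plain dictionary instead of real Swift HEAD requests.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical view of a Dataverse file: its name and stored MD5 checksum."""
    name: str
    md5: str


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


def verify_replica(record, swift_etag):
    # For a regular (non-segmented) Swift object the ETag is the MD5 of its content;
    # some clients return it wrapped in quotes, so strip them before comparing.
    return record.md5.lower() == swift_etag.strip('"').lower()


def files_to_copy(records, swift_etags):
    """Names of files missing from Swift or whose replica checksum does not match."""
    return [r.name for r in records
            if r.name not in swift_etags or not verify_replica(r, swift_etags[r.name])]


if __name__ == "__main__":
    records = [FileRecord("a.csv", md5_hex(b"hello")), FileRecord("b.csv", md5_hex(b"world"))]
    print(files_to_copy(records, {"a.csv": md5_hex(b"hello")}))
```

Large objects uploaded as segments report a different ETag, so a real implementation would need another check for them.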
NEXT
Implement Swift Access Control Lists (ACLs) for file restriction
Support InCommon so that MOC uses the same credentials as Dataverse
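Swift ACLs are set per container through headers such as `X-Container-Read`, where `.r:*` allows anonymous object reads, `.rlistings` allows listing, and `project:user` entries grant access to specific Keystone users. The mapping below from a Dataverse file restriction to that header is a hypothetical design, not the project's implementation. Because ACLs apply to whole containers, it assumes restricted and public files are kept in separate containers.

```python
def container_read_acl(restricted, project_id, allowed_user_ids):
    """Hypothetical mapping from a Dataverse restriction flag to a Swift read ACL header."""
    if not restricted:
        # Public data: anyone may read objects and list the container.
        return {"X-Container-Read": ".r:*,.rlistings"}
    # Restricted data: only the named users in the project may read.
    # An empty value leaves access to the container's owner only.
    grants = [f"{project_id}:{uid}" for uid in sorted(set(allowed_user_ids))]
    return {"X-Container-Read": ",".join(grants)}


if __name__ == "__main__":
    # Hypothetical project and user ids.
    print(container_read_acl(True, "cloud-dataverse", ["alice-id", "bob-id"]))
```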
INTEGRATION WITH OTHER PROJECTS
BILLION OBJECT PLATFORM: BIG GEODATA EXPLORATION AND ANALYTICS
DATA PROVENANCE: TRACK THE ORIGINAL SOURCE OF A DATASET
Pasquier, Lau, Trisovic, Boose, Couturier, Crosas, Ellison, Gibson, Jones, Seltzer, 2017, "If These Data Could Talk," Nature Scientific Data
(Data Provenance examples from CERN and Harvard Forest)
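Tracking the original source of a dataset amounts to walking its derivation graph back to inputs that were not derived from anything else. The sketch below does that with an iterative depth-first search over a dictionary of "derived from" links, loosely in the spirit of the W3C PROV `wasDerivedFrom` relation. The dataset names are hypothetical, and real provenance capture (as in the CERN and Harvard Forest examples) records far more, such as processes, agents, and timestamps.

```python
def original_sources(dataset, derived_from):
    """Return the root datasets that `dataset` ultimately derives from.

    derived_from maps each dataset to the list of inputs it was derived from.
    A dataset with no recorded inputs is treated as an original source.
    """
    roots, seen, stack = set(), set(), [dataset]
    while stack:
        node = stack.pop()
        if node in seen:  # shared ancestors (or cycles) are visited once
            continue
        seen.add(node)
        parents = derived_from.get(node, [])
        if not parents:
            roots.add(node)
        stack.extend(parents)
    return roots


if __name__ == "__main__":
    # Hypothetical lineage: a figure built from cleaned data, itself built from raw inputs.
    lineage = {
        "figure.png": ["clean.csv"],
        "clean.csv": ["raw_sensor.csv", "calibration.csv"],
        "calibration.csv": ["raw_sensor.csv"],
    }
    print(original_sources("figure.png", lineage))
```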
DATA PRIVACY: CLASSIFY AND HANDLE DATASETS BASED ON THEIR PRIVACY LEVEL
Harvard Data Privacy Tools Project: privacytools.seas.harvard.edu
DataTags Project: datatags.org
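The DataTags paper listed earlier (Sweeney, Crosas, Bar-Sinai, 2015) orders datasets into six tags by increasing sensitivity: Blue, Green, Yellow, Orange, Red, and Crimson. The sketch encodes that ordering so that a dataset combining several sources inherits the strictest source tag. The `is_publicly_shareable` policy is an illustrative assumption, not the DataTags rules, which attach specific storage, transfer, and access requirements to each tag.

```python
from enum import IntEnum


class DataTag(IntEnum):
    """The six DataTags levels, ordered from least to most restrictive."""
    BLUE = 1
    GREEN = 2
    YELLOW = 3
    ORANGE = 4
    RED = 5
    CRIMSON = 6


def combined_tag(tags):
    """A dataset built from several sources needs at least the strictest source's tag."""
    return max(tags)


def is_publicly_shareable(tag):
    # Illustrative policy only (an assumption): treat BLUE as the sole open level.
    return tag == DataTag.BLUE


if __name__ == "__main__":
    tag = combined_tag([DataTag.GREEN, DataTag.RED, DataTag.BLUE])
    print(tag.name, is_publicly_shareable(tag))
```

Taking the maximum is the conservative choice: merging data can make it more identifying, never less.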
THANKS
@mercecrosas
@iqss
scholar.harvard.edu/mercecrosas
dataverse.org