The Economics of Data Sharing
Transcript of The Economics of Data Sharing
| 1
Anita de Waard, 0000-0002-9034-4119
VP Research Data Collaborations
Elsevier RDM [email protected]
CMMI Workshop
February 6, 2016
The Economics of Data Sharing
| 2
How do we get scientists to share their data?
How do we make data repositories sustainable?
• The economics of science
• Cost recovery models of data repositories
• Some examples that work
• Some thoughts on the future
How do we create effective and sustainable ecosystems for storing, sharing and reusing data,
and get people to use them?
| 3
Two Economies of Science [1]:
Debit Economy (like a pie)
• Single pile of ‘stuff’ gets divided:
  - A thing can only be for one person at one time
  - “If you get more, I get less”
• Examples:
  - Money
  - Jobs
  - Samples, equipment, space, etc.
• Behaviors:
  - Hoarding, secrecy
  - (Cut-throat) competition
  - Winning by owning (and not sharing)
Credit Economy (like a song)
• Credit comes from visibility:
  - The more you give away, the more you benefit
  - “Only if I share do I really own” (“You need me to do you!” JW)
• Examples:
  - Papers, citations
  - Good ideas (if credited)
  - Skills
• Behaviors:
  - Open access, citation game
  - Collaboration with top-X
  - Winning by sharing (to enable priority & visibility)
[Figure label: “DATA ???”]
[1] Paula Stephan: “How Economics Shapes Science”, Harvard University Press, 2012: http://www.jstor.org/stable/j.ctt2jbqd1
| 4
RDA IG Repository Cost Recovery
• Interviewed 22 repositories, globally
• Different income streams:
  1. Structurally funded
  2. Mostly data access charges
  3. Mostly data deposit fees
  4. Membership fees (for deposits and/or access)
  5. Serial project funding
  6. Supported by host institution
• Different new models under consideration:
  - Sponsorships/services for the commercial sector
  - Contracts for specific services offered (hosting, archiving, curation)
  - Expanding the number of affiliated institutions
  - Deposit fees
  - More services for “national memory institutes”
• Some comments:
  - Some countries structurally fund repositories (not US!)
  - Some repositories embedded in scholarly practice
  - Hard to come up with new models: no time, no skill sets!
| 5
Four Types of Repositories:
[Figure labels: Research Question, Object of Study, Raw Data, Processed Data, Data With Paper, Curated Record, Publication; Method, Analysis, Tables/Figures, Curate; Methods, Software]
• Local Storage/Instrument Repositories (Size: PB; Nr of files: Trillions)
  - NOAA: 20 TB/NASA streaming > 24 PB/day
  - NASA Reverb: 12 PB data
  - NSSD: > 230 TB of digital data
  - NSIDC: 1 PB data, : 1 PB total
  - ALMA Telescope: 40 TB/day
• Institutional/Local Repositories (Size: GB; Nr of files: Billions)
  - Deep Blue (UMich): 80 k
  - MIT DSpace: 75 k
  - HAL (France): 60 k
  - D-Space Cambr: 1.5 k
  - Of which data: hundreds
• Non-Domain Repositories (Size: MB; Nr of files: Millions)
  - Figshare: 1.2 M
  - DataDryad: 3 k
  - Dataverse: 58 k
• Domain Repositories (Size: kB; Nr of files: 100ks)
  - PetDB: 6 k
  - PDB: 100 k
  - NIST ASD: 170 k
| 6
Where is data sharing happening?
YES:
• Astronomy: telescopes
• High-energy physics: accelerators
• Earth science: satellites
• Social science: censuses
• Medicine (sometimes): patient data in large studies
• Life science: sequence data
Characteristics:
• Big equipment, no single lab/person can run it
• Can’t do science without it
• Tools in place to be effective
NO:
• Low-temperature physics: cryostats
• Earth science: samples
• Materials science: catalysts, microscopes, etc.
• Social science: interviews
• Medicine: individual patient data
• Neuroscience: microscope
Characteristics:
• Small equipment, a single lab/person can run it
• Can do science without sharing
• No effective tools in place
[Figure labels: Prepare, Observe, Analyze, Ponder, Communicate]
| 7
Connecting small science
[Figure labels: Prepare, Analyze, Communicate (twice); Observations (three times)]
Identify entities from the start
| 8
Connecting small science
[Figure labels: Prepare, Analyze, Communicate (twice); Observations (three times)]
Compare outcome of interactions with these entities
| 9
Connecting small science
[Figure labels: Prepare, Analyze, Communicate (twice); Observations (three times); Think]
Build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments
Reason collectively!
| 10
A small change for small science: Urban Legend [2]
• Encourage data sharing of raw data files + experimental metadata
• Add metadata to your experiment while you’re performing it
• Improved data practices made the lab more productive and more creative, and enabled effective and novel collaborations
• Lesson: split the data storage and curation from data sharing!
  - Provide direct reward to storage: now we can find our own data!
  - Enable simple upload to an embargoed data set when the owner is ready.
[2] Tripathy et al, 2014: http://www.frontiersin.org/10.3389/conf.fninf.2014.18.00077/event_abstract
| 11
Addressing the fear of scooping with embargoes:
[Figure labels: Researcher, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper; arrows numbered 1-4]
1. Researcher creates datasets
2. Researcher writes paper & publishes in journal
3. (Sometimes) dataset gets posted to repository
4. Researcher reports (post hoc) to Institution and Funder
| 12
Addressing the fear of scooping with embargoes:
[Figure labels: Researcher, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper; arrows numbered 1-4]
i. Too much work for researchers
ii. Data posting not mandatory
iii. No links between data and paper
iv. Funders/Institutions informed as an afterthought
| 13
Addressing the fear of scooping with embargoes:
[Figure labels: Researcher, Funding Agency, Institution, Data Repository, Dataset, Journal, Paper; arrows numbered 1-4]
1. Researcher creates datasets and posts to repository (under embargo: not publicly viewable)
2. Funder is automatically notified of dataset posting
3. Researcher writes paper & publishes in journal; embargo is lifted and data linked
   - NB: this also allows release of unused data for negative results and reproducibility
4. Funder and institution get report on publication and embargo lifting
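The four-step embargo workflow on this slide can be sketched as a small state model. This is only an illustration under stated assumptions: the class names, field names and notification strings are hypothetical and do not describe any existing repository system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    """A deposited dataset (all field names are hypothetical)."""
    dataset_id: str
    owner: str
    embargoed: bool = True              # posted, but not publicly viewable
    linked_paper: Optional[str] = None  # filled in when the paper appears


@dataclass
class Repository:
    """Toy repository that records who gets notified at each step."""
    datasets: Dict[str, Dataset] = field(default_factory=dict)
    notifications: List[str] = field(default_factory=list)

    def deposit(self, dataset_id: str, owner: str) -> Dataset:
        # Step 1: the researcher posts the dataset under embargo.
        ds = Dataset(dataset_id, owner)
        self.datasets[dataset_id] = ds
        # Step 2: the funder is notified automatically on deposit.
        self.notifications.append(
            f"funder: dataset {dataset_id} deposited under embargo"
        )
        return ds

    def publish_paper(self, dataset_id: str, paper_id: str) -> None:
        # Step 3: the paper is published; embargo is lifted and data linked.
        ds = self.datasets[dataset_id]
        ds.embargoed = False
        ds.linked_paper = paper_id
        # Step 4: funder and institution both receive a report.
        for who in ("funder", "institution"):
            self.notifications.append(
                f"{who}: paper {paper_id} published, "
                f"embargo on {dataset_id} lifted"
            )

    def is_public(self, dataset_id: str) -> bool:
        return not self.datasets[dataset_id].embargoed
```

In this sketch, reporting is a side effect of depositing and publishing, so the researcher has no separate post hoc reporting step.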
| 14
A System for Linking Data Links: Scholix
• ICSU-WDS/RDA Publishing Data Service Working group, merged with National Data Service pilot
• Cross-stakeholder, with input from CrossRef, DataCite, OpenAIRE, Europe PubMed Central, ANDS, PANGAEA, Thomson Reuters, Elsevier, and others
• Proposed long-term architecture and interoperability framework: www.scholix.org
• Operational prototype at http://dliservice.research-infrastructures.eu/#/api (including 1.4 Million links from various sources)
• Making links between datasets and articles available could/should encourage data citation and deposition
• Together with Force11 Data Citation Principles, encourage Research Object citation/credit metrics.
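The idea of pooling dataset-article links from several sources can be sketched as a small in-memory index. This is a simplified illustration, not the Scholix schema or the prototype's API: the `Link` fields, the `LinkIndex` class and the identifiers in the usage note are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Set


@dataclass(frozen=True)
class Link:
    """One article-to-dataset link (hypothetical, simplified fields)."""
    article: str   # identifier of the article
    dataset: str   # identifier of the dataset
    provider: str  # who contributed the link


class LinkIndex:
    """Merges links from many providers and answers lookups both ways."""

    def __init__(self) -> None:
        self._by_article: DefaultDict[str, Set[str]] = defaultdict(set)
        self._by_dataset: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, link: Link) -> None:
        # Sets drop duplicates when two providers report the same link.
        self._by_article[link.article].add(link.dataset)
        self._by_dataset[link.dataset].add(link.article)

    def datasets_for(self, article: str) -> List[str]:
        return sorted(self._by_article.get(article, set()))

    def articles_for(self, dataset: str) -> List[str]:
        return sorted(self._by_dataset.get(dataset, set()))
```

For example, adding `Link("article:A", "dataset:X", "publisher")` and `Link("article:A", "dataset:X", "repository")` leaves a single entry, so `datasets_for("article:A")` returns one dataset even though two providers reported it.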
| 15
A System for A New Data Economics: NIH Data Commons
Phil Bourne, Dec15
[Figure labels: NIH; NIH BD2K; Option: Direct Funding; Investigator; Provides credits; Uses credits in the Commons; The Commons; Cloud Provider A; Cloud Provider B; Discovery Index; Indexes; Enables Search; Search Engines; User]
| 16
Drivers for Data Sharing: A Study in Behavioral Economics
• Study scholarly reward systems from point of view of economics
• Develop economic model for entire scholarly rewards ecosystem: career, prestige, tenure, finances, etc.
• Two intended outcomes:
  - Understanding current behavior with respect to data sharing: can we explain what we see, and the differences between different domains?
  - Theoretical foundation for recommendations for policies and practices to stakeholders such as funders, publishers and standards bodies
• Small group working on it, planning first meeting:
  - Mike Huerta (NLM), Micah Altman (MIT), Fran Berman (RPI), Carol Tenopir (TN), Carole Palmer (UW), Greg Gordon (SSRN).
• Thoughts, join?
| 17
In summary: cyberinfrastructure
• The Economy of Science: pies vs. songs
  - RDA Data Repositories Cost Recovery IG:
  - Different types of repositories, different types of science
  - Need to move from ‘small’ to ‘big’ science thinking
• Some examples of successful data sharing:
  - Online electronic lab notebooks: making it too easy not to use
  - RDA Scholix: linking systems of links using existing technology
  - The NIH Data Commons: enabling a data economy in practice
• Some things we can do:
  - Embargo pilots: circumvent the fear of scooping
  - Drivers for data sharing report: science is a human endeavor
| 18
Thank you!
Links:
• https://www.hivebench.com
• https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
• http://www.journals.elsevier.com/softwarex/
• https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
• https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
• https://rd-alliance.org/bof-data-search.html
• https://data.mendeley.com/
• https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
• https://www.force11.org/
• http://www.nationaldataservice.org/
• https://rd-alliance.org/
• https://www.elsevier.com/about/open-science/research-data
Anita de Waard, [email protected]