How and Why to Share Your Data

SHARING YOUR DATA Kathleen Fear October 2, 2014

Transcript of How and Why to Share Your Data


What is the Data Services Center?
• Numeric and statistical data services
  • Finding and providing access to datasets
  • Planned: statistical consulting
• Spatial data services
  • Creating and acquiring GIS data
• Research data services

WHY?

Why?

Funders and publishers say you have to…

…and it’s a good thing to do for science…

…and for you.

Open Data citation advantage
• Papers that make data available are cited 9–69% more (Dorch, 2012; Sears, 2011; Henneken and Accomazzi, 2011; Pienta et al., 2010; Piwowar et al., 2007)
• Why? (Piwowar and Vision, 2013)
  • Data reuse
  • Credibility signaling
  • Increased visibility
  • Early view
  • Selection bias

WHAT?

You don’t have to share all your data with anyone who wants it

“at no more than incremental cost and within a reasonable time” (NSF)

“indicate the criteria for deciding who can receive your data” (NIH)

“All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader.” (Science)

Consider granularity:
• What would someone need to reproduce your results?
  • Data: raw, processed, final
  • Scripts, code libraries, etc.
  • Metadata

Consider timing:
• Before publication? At the time of publication?
• Consider restrictions, embargo, etc. for data that can’t be immediately shared freely
  • Check with UR Ventures if you have concerns about protecting patent interests
• Staggered release: metadata, then data later

Consider usability:
• Could someone with comparable expertise look at your data and understand how to use it?
  • Is it clear how different files relate to each other?
  • Are your variable names meaningful? File names descriptive?
  • Include a README.txt file or codebook in the top level of the directory
• Are special tools or software needed to use your data?
  • Are your files in a proprietary format? Will future users be able to open them?
  • Include the necessary tools, or make the data available in open formats
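Some of the usability questions above can be checked automatically before you deposit anything. The following is a minimal sketch, assuming the dataset lives in a single directory. The function name `check_dataset`, the accepted README names, and the list of proprietary extensions are all illustrative assumptions, not an official or exhaustive standard.

```python
from pathlib import Path

# Hypothetical, non-exhaustive list of extensions that usually require
# specific commercial software to open.
PROPRIETARY_EXTENSIONS = {".xls", ".sav", ".dta", ".sas7bdat"}

# File names accepted as top-level documentation (an assumption for this sketch).
README_NAMES = {"readme.txt", "readme.md", "codebook.txt"}


def check_dataset(directory):
    """Return a list of human-readable warnings for a dataset directory."""
    root = Path(directory)
    warnings = []

    # 1. Is there a README or codebook in the top level of the directory?
    top_level = {p.name.lower() for p in root.iterdir() if p.is_file()}
    if not top_level & README_NAMES:
        warnings.append("No README or codebook found in the top-level directory.")

    # 2. Are any files in formats that future users may be unable to open?
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in PROPRIETARY_EXTENSIONS:
            warnings.append(
                f"{path.relative_to(root)} is in a proprietary format; "
                "consider also providing an open format such as CSV."
            )
    return warnings
```

Calling `check_dataset("my_project")` on a hypothetical project folder returns a list of warnings to review. It cannot judge whether variable names are meaningful or whether the README is actually useful, so it supplements a human check rather than replacing one.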

HOW?

Why can’t I keep it on my computer?
• Poor success rates for data sharing requests (Vines et al., 2013; Savage and Vickers, 2009; Wicherts et al., 2006)
• The older the article, the harder to get the data (Vines et al., 2014):
  • Odds of a dataset being reported as extant decline by 17% per year
  • Odds of finding a working email for first, last, or corresponding author decline by 7% a year
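To get a feel for what these yearly declines add up to, the arithmetic can be sketched as follows. This is an illustration only, assuming the percentage decline compounds at a constant rate each year. The function name `odds_multiplier` is hypothetical, and the result is a multiplier on the odds, not a probability.

```python
def odds_multiplier(yearly_decline, years):
    """Factor by which odds shrink after `years`, assuming a constant
    proportional decline each year (e.g. 0.17 for 17% per year)."""
    return (1.0 - yearly_decline) ** years


# Rates reported on the slide (Vines et al., 2014).
DATA_EXTANT_DECLINE = 0.17
WORKING_EMAIL_DECLINE = 0.07

for years in (1, 5, 10, 20):
    data = odds_multiplier(DATA_EXTANT_DECLINE, years)
    email = odds_multiplier(WORKING_EMAIL_DECLINE, years)
    print(f"after {years:2d} years: data odds x{data:.2f}, email odds x{email:.2f}")
```

Under this compounding assumption, ten years multiplies the odds that a dataset is still extant by about 0.16.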

Why can’t I keep it on my computer?

“Sure I will send you those data, but it's like seven computers ago, and so please allow me some time to hunt them down” (Wicherts and Bakker, 2012)

• Most refusals are not to protect ongoing work, but because (Vines et al., 2014):
  • The data are on a computer that got stolen…
  • The data are in my parents’ attic…
  • The data are definitely on one of these zip disks…
• …and it will take hours for me to get them, if I can get them at all.

Set it and forget it: put your data in a repository

• Long-term commitment to data preservation

• Reuse tracking and usage statistics

• Permanent URL / DOI enables data citation

Set it and forget it: put your data in a repository

1. Find a disciplinary repository or database
  • Repository directories: re3data.org; biosharing.org
  • Typically managed by specialists in the field


2. Use a general-purpose repository
  • UR Research: https://urresearch.rochester.edu/home.action
    • Library-hosted
    • 2GB soft limit
    • Backed up, secure
    • Free!


2. Use a general-purpose repository
  • UR Research: https://urresearch.rochester.edu/home.action
  • Dryad: http://datadryad.org

What is Dryad?

• Integration with journal submission processes (http://datadryad.org/pages/integratedJournals)

• Not free: $80/submission. But we provide vouchers!

How to get a voucher
• Proposal should include:
  • A description of the project to which the data is related;
  • A description of the data to be archived, including the format(s) and approximate total size. The RCL will fully fund datasets up to 10GB, with larger data considered on a case-by-case basis.
• Send proposal to [email protected]

But my data’s bigger than that…
• An upcoming option: REACTUR (Research data Archiving and Curation at the University of Rochester)
• River Campus Libraries + CIRC = easy data sharing for large datasets
• $200 / TB / year
• Piloting now, hope to be available for all in Spring 2015

Set it and forget it: put your data in a repository

1. Find a disciplinary repository or database
  • Repository directories: re3data.org; biosharing.org
  • Typically managed by specialists in the field
2. Use a general-purpose repository
  • UR Research: https://urresearch.rochester.edu/home.action
  • Dryad: http://datadryad.org
  • REACTUR

Repository  | Amount of data accepted             | Cost                          | Ability to restrict data? | Publisher integration?
UR Research | Up to 2GB                           | Free                          | Yes, highly customizable  | No
FigShare    | Up to 1GB private, unlimited public | Free                          | Yes                       | Yes
Dryad       | Up to 10GB                          | $80 per submission up to 10GB | No                        | Yes
REACTUR     | Unlimited                           | $200 / TB / year              | Yes                       | Not yet
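The comparison above can be encoded as a small lookup to shortlist options by dataset size. This is a rough sketch using only the 2014 figures from the table. The names `REPOSITORIES`, `shortlist`, and `reactur_annual_cost` are hypothetical. Three things are assumptions of this sketch rather than statements from the slides: FigShare's "private" limit is treated as the limit for restricted data, REACTUR is treated as billed pro rata by size, and a terabyte is taken as 1000 GB. Note also that the UR Research figure is a soft limit.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes

# (name, max size in GB or None for unlimited, cost note, can restrict access)
REPOSITORIES = [
    ("UR Research", 2, "Free", True),
    ("FigShare", None, "Free", True),  # 1GB cap applies to private data only
    ("Dryad", 10, "$80 per submission", False),
    ("REACTUR", None, "$200 / TB / year", True),
]

FIGSHARE_PRIVATE_LIMIT_GB = 1


def shortlist(size_gb, need_restrictions=False):
    """Return names of repositories from the table that fit the dataset."""
    options = []
    for name, max_gb, _cost, can_restrict in REPOSITORIES:
        if need_restrictions and not can_restrict:
            continue
        limit = max_gb
        # Assumption: restricted data on FigShare counts as "private" data.
        if name == "FigShare" and need_restrictions:
            limit = FIGSHARE_PRIVATE_LIMIT_GB
        if limit is None or size_gb <= limit:
            options.append(name)
    return options


def reactur_annual_cost(size_gb):
    """Estimated yearly REACTUR cost in dollars, assuming pro-rata billing."""
    return 200.0 * size_gb / GB_PER_TB
```

For example, `shortlist(5, need_restrictions=True)` leaves only REACTUR under these assumptions, while `shortlist(5)` also keeps FigShare and Dryad. Check current limits and prices with each repository before relying on the result.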

A little help:
• Call me! (Or email, or drop by.)
  5-6882
  Carlson 313E
  [email protected]
• At URMC, contact:
  Donna Berryman, 5-6877, [email protected]
  Linda Hasman, 5-3399, [email protected]

Data Workshops
• 1st and 3rd Thursdays @ noon, Carlson Library Rm. 310

Fall 2014
• September: Writing a successful data management plan; Intro to GIS I
• October: Sharing your data; Intro to GIS II
• November: Finding and using data from ICPSR; Intro to GIS III
• December: Data visualization; ---

Spring 2015
• January: R 101; Intro to R Spatial
• February: Using the DMPTool; Georeferencing maps
• March: Basic database design; Web mapping: Google Refine, Open Layers
• April: Tools for qualitative research; Mapping real-world data

References
• Dorch, B. (2012). On the Citation Advantage of linking to data. Retrieved from http://hprints.org/hprints-00714715
• Henneken, E. A., & Accomazzi, A. (2011). Linking to Data - Effect on Citation Rates in Astronomy. arXiv:1111.3618 [astro-Ph]. Retrieved from http://arxiv.org/abs/1111.3618
• Pienta, A. M., Alter, G. C., & Lyle, J. A. (2010). The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. Retrieved from http://deepblue.lib.umich.edu/handle/2027.42/78307
• Piwowar, H. A., Day, R. S., & Fridsma, D. B. (2007). Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE, 2(3). doi:10.1371/journal.pone.0000308
• Piwowar, H. A., & Vision, T. J. (2013). Data reuse and the open data citation advantage. PeerJ, 1. doi:10.7717/peerj.175
• Savage, C. J., & Vickers, A. J. (2009). Empirical Study of Data Sharing by Authors Publishing in PLoS Journals. PLoS ONE, 4(9), e7078. doi:10.1371/journal.pone.0007078
• Sears, J. R. (2011). Data Sharing Effect on Article Citation Rate in Paleoceanography. AGU Fall Meeting Abstracts, 53, 1628.
• Vines, T. H., Albert, A. Y. K., Andrew, R. L., Debarre, F., Bock, D. G., Franklin, M. T., … Rennison, D. J. (2014). The Availability of Research Data Declines Rapidly with Article Age. Current Biology, 24(1), 94–97. doi:10.1016/j.cub.2013.11.014
• Vines, T. H., Andrew, R. L., Bock, D. G., Franklin, M. T., Gilbert, K. J., Kane, N. C., … Yeaman, S. (2013). Mandated data archiving greatly improves access to research data. The FASEB Journal, 27(4), 1304–1308. doi:10.1096/fj.12-218164
• Wicherts, J. M., & Bakker, M. (2012). Publish (your data) or (let the data) perish! Why not publish your data too? Intelligence, 40(2), 73–76. doi:10.1016/j.intell.2012.01.004
• Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. The American Psychologist, 61(7), 726–728. doi:10.1037/0003-066X.61.7.726