Managing your research data
Stephen Grace and David McElroy
Managing your research data workshop, 02 February 2014
Why are you here?
• You’re managing data (your own or your group’s)
• Or you think you maybe should be
• You’re not sure why it matters
• You’re not sure how best to do it
• You’d like to know whether you’re on the right track
Why manage research data?
• To make your research easier
• To stop yourself drowning in irrelevant stuff
• In case you need the data later
• To avoid accusations of fraud or bad science
• To share your data for others to use & learn from
• Potential collaborations
• To get credit for producing it (even if you aren’t the lead author)
• Get citations for datasets, independently of publications
• Because somebody else said to do so
What is data management?
“the active management and appraisal of data over the lifecycle of scholarly and scientific interest”
Digital Curation Centre
Data management is just part of good research practice
What is involved in RDM?
• Data Management Planning
• Creating data
• Documenting data
• Accessing / using data
• Storage and backup
• Preserving data
• Sharing data
[Diagram: the research data lifecycle: Create, Document, Use, Store, Preserve, Share]
Today’s Workshop
1. Defining your data
2. Looking after your data
3. Sharing your data
4. Archiving your data
5. Executing your plan
http://youtu.be/N2zK3sAtr-4
File formats for long-term access
• Unencrypted and uncompressed
• Non-proprietary/patent-encumbered
• Open, documented standard
• Standard representation (ASCII, Unicode)
Type | Recommended | Avoid for data sharing
Tabular data | CSV, TSV, SPSS portable | Excel
Text | Plain text, HTML, RTF; PDF/A only if layout matters | Word
Media | Container: MP4, Ogg; Codec: Theora, Dirac, FLAC | QuickTime, H.264
Images | TIFF, JPEG2000, PNG | GIF, JPG
Structured data | XML, RDF | RDBMS
Further examples: http://www.data-archive.ac.uk/create-manage/format/formats-table
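One way to apply the table to a folder of files is a quick check by file extension. The sketch below is hypothetical: the extension lists are an assumption drawn loosely from the table, and an extension says nothing about the content. PDF is left as "unknown" because only PDF/A is recommended, and a codec such as H.264 cannot be detected from an extension, so anything flagged still needs a human look.

```python
from pathlib import Path

# Hypothetical extension lists based loosely on the formats table; adjust for your field.
RECOMMENDED = {".csv", ".tsv", ".por", ".txt", ".html", ".htm", ".rtf",
               ".mp4", ".ogg", ".ogv", ".tif", ".tiff", ".jp2", ".png",
               ".xml", ".rdf"}
AVOID = {".xls", ".xlsx", ".doc", ".docx", ".mov", ".gif", ".jpg", ".jpeg"}


def classify_format(filename: str) -> str:
    """Return 'recommended', 'avoid' or 'unknown', judging by extension only."""
    ext = Path(filename).suffix.lower()
    if ext in RECOMMENDED:
        return "recommended"
    if ext in AVOID:
        return "avoid"
    return "unknown"  # includes .pdf: the extension cannot tell PDF/A from plain PDF


def flag_files(filenames):
    """Map every file that is not in a recommended format to its classification."""
    flagged = {}
    for name in filenames:
        status = classify_format(name)
        if status != "recommended":
            flagged[name] = status
    return flagged
```

For example, `flag_files(["survey.csv", "results.xlsx", "model.h5"])` flags the Excel file as "avoid" and the HDF5 file as "unknown", leaving the CSV alone.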
Documentation
What would someone unfamiliar with your data need in order to find, evaluate,
understand, and reuse them?
Consider the differences between someone inside your research group, someone outside your group but in your field, and someone outside your field
Two parts: metadata and methods
Metadata
• About the project
– Title, people, key dates, funders and grants
• About the data
– Title, key dates, creator(s), subjects, rights, included files, format(s), versions, checksums
• Keep this with the data
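A simple way to keep metadata with the data is a small file stored in the same folder. The sketch below assumes a JSON "sidecar" named `metadata.json` (a hypothetical convention, not a standard) and records a SHA-256 checksum for each file, so anyone can later confirm the files have not changed.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_metadata(data_dir: Path, project: dict, dataset: dict) -> Path:
    """Write metadata.json into data_dir, listing every data file with its checksum."""
    files = sorted(p for p in data_dir.iterdir()
                   if p.is_file() and p.name != "metadata.json")
    record = {
        "project": project,  # e.g. title, people, key dates, funders and grants
        "dataset": dataset,  # e.g. title, creators, subjects, rights, version
        "files": [
            {"name": p.name,
             "format": p.suffix.lstrip(".") or "unknown",
             "sha256": sha256_of(p)}
            for p in files
        ],
    }
    out = data_dir / "metadata.json"
    out.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return out
```

Rerunning `write_metadata` after changing a file refreshes its checksum; comparing old and new records shows which files changed.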
Methods
• Document what you did (A published article may not be enough)
• Document any limitations of what you did
• If you ran code on the data, document the code and keep it with the data
• Need a codebook? Or a data dictionary?
– If I can’t tell at a glance what each bit of your dataset means, yes, you do need a codebook or data dictionary
– DO NOT FORGET THE UNITS!
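A data dictionary can be as simple as one entry per column with a description and units. The sketch below uses hypothetical column names and checks that every column in a CSV header is documented and has units, then writes the dictionary out as CSV to keep alongside the data.

```python
import csv
import io

# Hypothetical data dictionary: one entry per column, always with units.
DATA_DICTIONARY = {
    "site_id": {"description": "Sampling site identifier", "units": "none"},
    "temp_c": {"description": "Water temperature at 1 m depth", "units": "degrees Celsius"},
    "depth_m": {"description": "Depth of sample", "units": "metres"},
}


def undocumented_columns(csv_text: str, dictionary: dict) -> list:
    """Return header columns that have no dictionary entry or no units recorded."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [col for col in header
            if col not in dictionary or not dictionary[col].get("units")]


def write_dictionary(dictionary: dict) -> str:
    """Render the dictionary as CSV text, to be stored with the data."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["variable", "description", "units"])
    for name, entry in dictionary.items():
        writer.writerow([name, entry["description"], entry["units"]])
    return buffer.getvalue()
```

For example, a file whose header is `site_id,temp_c,salinity` would be reported as missing documentation for `salinity`.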
Reason #1 for not reusing someone else’s data: “I don’t know enough about how it was gathered to trust it.”
Standards
• Why reinvent the wheel? If there’s a standard format for your data or how to describe it, use that!
• The tricky part is finding the right standard
– Standards are like toothbrushes...
– But using standards is good hygiene!
– Your librarian can often help you find relevant standards.
– Also check out the DCC catalogue of disciplinary metadata: http://www.dcc.ac.uk/resources/metadata-standards
2. Looking after your data
• What if…
• Where to store your data
• How to back up your data
• Sensitive data
• What to keep
For more on this story, see “Why YOU need a Data Management Plan” blog post:
http://blogs.ch.cam.ac.uk/pmr/2011/08/01/why-you-need-a-data-management-plan
What if this was your laptop?
Where to store your data?
• Your own drive (PC, server, flash drive, etc.)
– And if you lose it? Or it breaks?
• Somebody else’s drive
• University drive
• Cloud services like Dropbox/OneDrive
– Do they care as much about your data as you do?
How to back up?
• 3… 2… 1… backup!
– at least 3 copies of a file
– on at least 2 different media
– with at least 1 offsite
• Use managed services where possible e.g. University filestores rather than local or external hard drives
• Ask IT Services or your supervisor for advice
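The 3-2-1 rule is easy to check once you list where each copy lives. The sketch below is hypothetical: the `Copy` record and its fields are illustrative, and it counts the working file as one of the three copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a file: where it is, what medium holds it, and whether it is offsite."""
    location: str  # e.g. "laptop", "university filestore"
    medium: str    # e.g. "internal SSD", "external HDD", "network storage"
    offsite: bool  # True if kept away from the place where you work


def meets_3_2_1(copies: list) -> dict:
    """Report which parts of the 3-2-1 rule a set of copies satisfies."""
    return {
        "3 copies": len(copies) >= 3,
        "2 media": len({c.medium for c in copies}) >= 2,
        "1 offsite": any(c.offsite for c in copies),
    }
```

A laptop plus an external drive on the same desk passes the "2 media" test but fails the other two; adding a copy on a University filestore fixes both.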
Is your sensitive data secure?
• Access
– Who should/shouldn’t have access to your live data?
• Encryption
– Working data, backups, shares
– The TrueCrypt project has been terminated: VeraCrypt & CipherShed are new alternatives but are not compatible with TrueCrypt containers.
– Back up your password
• Deletion
– Data remains on the drive even after deletion
– Software is available to ‘shred’ files
– And physical destruction is effective
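To show what "shredding" software does, the sketch below overwrites a file's contents with random bytes, forces them to disk and then deletes the file. It is an illustration under strong assumptions, not a secure deletion tool: on SSDs, flash drives, journaling or copy-on-write filesystems, and folders that are synced or backed up, earlier copies of the data can survive. For genuinely sensitive data, use dedicated tools recommended by IT Services or physical destruction.

```python
import os
from pathlib import Path


def overwrite_and_delete(path: Path, passes: int = 1) -> None:
    """Overwrite a file with random bytes, flush it to disk, then delete it.

    Illustration only: this does NOT guarantee the data is unrecoverable on
    SSDs, flash media, journaling/copy-on-write filesystems or synced folders.
    """
    size = path.stat().st_size
    with path.open("r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 65536)
                handle.write(os.urandom(chunk))  # replace the contents in place
                remaining -= chunk
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the bytes to the device
    path.unlink()
```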
What to keep?
It’s not possible to keep everything. Select based on:
– What has to be kept e.g. data underlying publications
– What can’t be recreated e.g. environmental recordings
– What is potentially useful to others
– What has scientific, cultural or historical value
– What legally must be destroyed
– ...
How to select and appraise research data: www.dcc.ac.uk/resources/how-guides/appraise-select-research-data
How to share/preserve data?
• What is required?
– By your funder
– By your publisher
– By your university
– By your supervisor
• What subject repositories, data centres and structured databases are available?
http://databib.org
Expectations of public access
“Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.”
RCUK Common Principles on Data Policy http://www.rcuk.ac.uk/research/datapolicy/
If you plan to share your data...
• Have you got consent for sharing?
• Do any licences you’ve signed permit sharing?
• Is your data in suitable formats?
Decisions made early on affect what you can do later
Benefits of sharing data (1)
www.nytimes.com/2010/08/13/health/research/13alzheimer.html?pagewanted=all&_r=0
“It was unbelievable. It’s not science the way most of us have practiced in our careers. But we all realised that we would never get biomarkers unless all of us parked our egos and intellectual property noses outside the door and agreed that all of our data would be public immediately.”
Dr John Trojanowski, University of Pennsylvania
... scientific breakthroughs
Benefits of sharing data (2)
www.guardian.co.uk/politics/2013/apr/18/uncovered-error-george-osborne-austerity
... validation of results
“It was a mistake in a spreadsheet that could have been easily overlooked: a few rows left out of an equation to average the values in a column.
The spreadsheet was used to draw the conclusion of an influential 2010 economics paper: that public debt of more than 90% of GDP slows down growth. This conclusion was later cited by the International Monetary Fund and the UK Treasury to justify programmes of austerity that have arguably led to riots, poverty and lost jobs.”
Benefits of sharing data (3)
http://physicsworld.com/cws/article/news/2014/sep/22/bicep2-gravitational-wave-result-bites-the-dust-thanks-to-new-planck-data
"We're still discussing the details but the idea is to exchange data between the two teams and eventually come out with a joint paper," Dr Jan Tauber, Planck satellite project scientist (July 2014)
“Scientists on rival projects looking for evidence that the early Universe underwent a super-expansion are in discussion about working together.”
The initial paper by the BICEP2 team has now been shown to have misinterpreted its results because of cosmic dust. Data from the Planck satellite and BICEP2 are now being studied together, which will lead to more papers.
http://www.bbc.co.uk/news/science-environment-28127576
... important collaborations
Benefits of sharing data (4)
“There is evidence that studies that make their data available do indeed receive more citations than similar studies that do not.”
Piwowar H. and Vision T.J. 2013 “Data reuse and the open data citation advantage” https://peerj.com/preprints/1.pdf
9% - 30% increase
... more citations
Think about barriers to sharing...
Photo by @boetter http://www.flickr.com/photos/jakecaptive/3205277810
4. Archiving your data
• Handing over long-term care and management of data to someone else
– A national data centre
– A subject repository
– UEL’s own data repository
– Online repository (e.g. Figshare)
data.uel
• Data repository to complement ROAR (which is for research publications)
• Submit data to data.uel and get citations if it is reused, and statistics on where in the world it is downloaded
• Students, submit your complete thesis to ROAR with data appendix/appendices
• DOI (digital object identifier) for open records
http://data.uel.ac.uk
5. Executing your plan
• Data Management Plans (DMP)
• Funders may require a DMP
• DMPonline tool to help you write a plan
Data Management Plans
DMPs are often submitted with grant applications, but are useful whenever you are creating data to:
• Make informed decisions to anticipate and avoid problems
• Avoid duplication, data loss and security breaches
• Develop procedures early on for consistency
• Ensure data are accurate, complete, reliable and secure
• Save time and effort – make your life easier!
What do research funders want?
• A brief plan submitted in grant applications, and in the case of NERC, a more detailed plan once funded
• 1-3 sides of A4 as attachment or a section in Je-S form
• Typically a prose statement covering suggested themes
• An outline of data management and sharing plans, justifying decisions and any limitations
Five common themes
1. Description of data to be collected / created (e.g. content, type, format, volume...)
2. Standards / methodologies for data collection & management
3. Ethics and Intellectual Property (highlight any restrictions on data sharing e.g. embargoes, confidentiality)
4. Plans for data sharing and access (i.e. how, when, to whom)
5. Strategy for long-term preservation
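If you draft a plan outside DMPonline, a simple check helps make sure none of the five themes is left blank. The sketch below is hypothetical: the short theme names used as keys are illustrative labels for the five themes above, not a funder's official headings.

```python
# Hypothetical short labels for the five common themes, in the order listed above.
THEMES = [
    "Description of data",
    "Standards and methodologies",
    "Ethics and intellectual property",
    "Data sharing and access",
    "Long-term preservation",
]


def missing_themes(plan: dict) -> list:
    """Return the themes a draft plan omits or leaves empty."""
    return [theme for theme in THEMES if not plan.get(theme, "").strip()]


def render_plan(plan: dict) -> str:
    """Lay the draft out as numbered sections, marking gaps for follow-up."""
    lines = []
    for number, theme in enumerate(THEMES, start=1):
        lines.append(f"{number}. {theme}")
        lines.append(plan.get(theme, "").strip() or "[not yet written]")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

For a draft that only covers the data description and preservation, `missing_themes` lists the three remaining themes, and `render_plan` marks each of them "[not yet written]".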
Help from the DCC
https://dmponline.dcc.ac.uk
www.dcc.ac.uk/resources/how-guides/develop-data-plan
A web-based tool to help you write DMPs according to different requirements, with UEL templates for staff and students
How DMPonline works
Create a plan based on relevant funder / institutional templates...
...and then answer the questions using the guidance provided
DMPonline template for PGR use
Short questions to take you through each step
• Designed with PGR students in mind
• Another one available for research staff
• Try it and share with us for feedback/review
Example plans
• Technical plan submitted to AHRC by Bristol Uni: http://data.bris.ac.uk/files/2013/02/data.bris-AHRC-Technical-Plan-v21.pdf
• Rural Economy & Land Use (RELU) programme examples: http://relu.data-archive.ac.uk/data-sharing/planning/examples
• UCSD example DMPs (20+ scientific plans for NSF): http://rci.ucsd.edu/dmp/examples.html
• My DMP – a satire (what not to write!): http://ivory.idyll.org/blog/data-management.html
Tips on writing DMPs
• Keep it simple, short and specific
• Seek advice - consult and collaborate
• Base plans on available skills and support
• Make sure implementation is feasible
• Justify any resources needed or sharing restrictions
http://www.youtube.com/watch?v=7OJtiA53-Fk
– We can help when you write Data Management Plans for grants to increase your chances of getting funded
– Put plans in place to help existing projects
– Help you manage/describe/share (if appropriate) your data more effectively
– Give advice and signposting with your own data needs and questions
– Training for staff, students and support staff
Acknowledgements
Based on Sarah Jones “Research Data Management” presentation at UEL 1 May 2013 © DCC 2013 CC-BY
Thanks to Dorothea Salo, Ryan Schryver and colleagues for content from the “Escaping Datageddon” presentation, available at: http://www.slideshare.net/cavlec/escaping-datageddon
And to the Research360 project at the University of Bath for the “Managing your research data” presentation, available at: http://opus.bath.ac.uk/32296
Thank you
Stephen Grace, 020 8223 7858
David McElroy, 020 8223 6467
Questions to [email protected]
Find us at www.uel.ac.uk/researchdata
Blog at datamanagementuel.wordpress.com