Use and reuse: research data locally & globally #esipfed


description

Keynote for the ESIP Federation winter meeting in Washington DC, 2014-01-08

Transcript of Use and reuse: research data locally & globally #esipfed

Page 1: Use and reuse: research data locally & globally #esipfed

USE AND REUSE
Research data locally and globally

Kevin Ashley
Digital Curation Centre

www.dcc.ac.uk
@kevingashley
[email protected]

Reusable with attribution: CC-BY
The DCC is supported by Jisc & FP7

Page 2: Use and reuse: research data locally & globally #esipfed

Why does this matter?

• Research quality
  – How close can we get to the truth?
• Research speed
  – How quickly can we get to the truth?
• Research finance
  – How much does the truth cost?
• Improving one or more of these is of interest to all actors:
  • Researchers as data creators
  • Researchers as data reusers
  • Research institutions
  • Funders – hence government and society

2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY

Page 3: Use and reuse: research data locally & globally #esipfed

The Data Deluge is upon us

Sensors’ ability to produce data outstrips IT’s ability to process it

Page 4: Use and reuse: research data locally & globally #esipfed

Funders are making demands

Page 5: Use and reuse: research data locally & globally #esipfed

http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx

EPSRC expects all those institutions it funds
• to develop a roadmap that aligns … with EPSRC’s expectations by 1st May 2012;
• to be fully compliant … by 1st May 2015.

Page 6: Use and reuse: research data locally & globally #esipfed

• Awareness of regulatory environment
• Data access statement
• Policies and processes
• Data storage
• Structured metadata descriptions
• DOIs for data
• Securely preserved for a minimum of 10 years from last use

Page 7: Use and reuse: research data locally & globally #esipfed

Where are funders making demands?

• USA – NSF, NEH, some philanthropic funders
• UK
• Germany – DFG
• Europe – European Commission (H2020)

Often tied to requirements on open access to research publications – but not as common.

Page 8: Use and reuse: research data locally & globally #esipfed

To universities, that looks like a problem

• Funder requirements exist for a reason:
  – That data is valuable
• Value to funder, society from reuse
• Value to the institution is there also

BIS business case: £1.5m investment in research data services pays back 2.5 times after 5 years
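
The arithmetic behind that business-case figure can be sketched as follows. This is only an illustration: it assumes "pays back 2.5 times" means the total return is 2.5 times the sum invested, which the slide does not spell out, and the function name `payback` is hypothetical.

```python
def payback(investment_gbp: float, multiple: float) -> tuple[float, float]:
    """Return (gross return, net gain) for an investment repaid `multiple` times.

    Assumption: "pays back N times" is read as gross return = N x investment.
    """
    gross = investment_gbp * multiple
    return gross, gross - investment_gbp


# Figures quoted on the slide: a £1.5m investment, paid back 2.5 times over 5 years.
gross, net = payback(1_500_000, 2.5)
print(f"gross return: £{gross:,.0f}, net gain: £{net:,.0f}")
```

Under that reading, the £1.5m investment returns £3.75m in total, a net gain of £2.25m over the five years.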

Page 9: Use and reuse: research data locally & globally #esipfed

Research Data Centres – the solution!

MANY AREAS OF RESEARCH HAVE NO DATA CENTRE TO SERVE THEM

Page 10: Use and reuse: research data locally & globally #esipfed

Data centres deliver value

Want a 400% -> 1200% return on your investment?

Try BADC!

http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx

Page 11: Use and reuse: research data locally & globally #esipfed

Data reuse from Hubble

Page 12: Use and reuse: research data locally & globally #esipfed

Don’t trust government

http://thetyee.ca/News/2013/12/23/Canadian-Science-Libraries/

Page 13: Use and reuse: research data locally & globally #esipfed

Commercial services

Page 14: Use and reuse: research data locally & globally #esipfed

Cloud – sorted!

• Sorry, but it isn’t.
• High-use datasets and long tail present different economic and technical challenges
• See David Rosenthal’s analysis of the economics of Amazon for preservation
  “Distributed digital preservation in the cloud”, IJDC 8(1), 2013, doi:10.2218/ijdc.v8i1.248

Page 15: Use and reuse: research data locally & globally #esipfed

Cost of data for 100 years – local vs Amazon S3

Data from blog.dshr.org/2013/01/talk-at-idcc2013.html © David Rosenthal, used under CC-BY-SA licence

Page 16: Use and reuse: research data locally & globally #esipfed

Cost of data for 100 years – local vs Amazon S3 AND Glacier

Data from blog.dshr.org/2013/01/talk-at-idcc2013.html © David Rosenthal, used under CC-BY-SA licence

Page 17: Use and reuse: research data locally & globally #esipfed

National responses – supporting universities

• USA – NSF initiatives (DataONE, SEAD, Data Conservancy et al)
• Australia – ANDS, RDSI
• UK – DCC, Jisc ‘Managing Research Data’ programmes
• Netherlands – Research Data Netherlands
• Canada – Research Data Canada
• Also grassroots or funder-led work in Finland, Denmark, Germany

Page 18: Use and reuse: research data locally & globally #esipfed

UK – Jisc acts through DCC to help

Page 19: Use and reuse: research data locally & globally #esipfed

DCC ‘institutional engagement’

Assess needs
Make the case
Develop support and services
RDM policy development
Customised Data Management Plans
DAF & CARDIO assessments
Guidance and training
Workflow assessment
DCC support team
Advocacy with senior management
Institutional data catalogues
Pilot RDM tools
…and support policy implementation

Page 20: Use and reuse: research data locally & globally #esipfed

Up-skilling for data

Choice of RDM training materials for librarians

http://dataintelligence.3tu.nl/en/home/
http://www.sheffield.ac.uk/is/research/projects/rdmrose
http://datalib.edina.ac.uk/mantra/libtraining.html

Page 21: Use and reuse: research data locally & globally #esipfed

Australian National Data Service

National Service, backed with university-level initiatives

Page 22: Use and reuse: research data locally & globally #esipfed

Excuses – and responses

• “People will ask questions”
  – So use a data centre or repository
• “It will be misinterpreted”
  – Stuff happens. Also, openness encourages correction
• “It’s not interesting”
  – Let others be the judge – your noise is my signal
• “I might get another paper out of it”
  – Up to a point. We might get more research out of it
• “I don’t have permission”
  – A real problem. But solvable at senior level
• “It’s too bad/complicated”
  – See above
• “It’s not a priority”
  – Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well

See e.g. Carly Strasser’s blog: http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/

Page 23: Use and reuse: research data locally & globally #esipfed

These excuses bear a strong resemblance to those used by politicians and civil servants who argue against the release of government records

This is not a group you want to be compared with

Page 24: Use and reuse: research data locally & globally #esipfed

Integrity

• Not everyone publishes here
• Almost all fraud connected to unavailable data
• People suffer & die due to research fraud
• When your research is reproducible – it gets cited

Page 25: Use and reuse: research data locally & globally #esipfed

Integrity – not without data

• Cyril Burt
  – Twin studies on intelligence.
  – Questioned 1976; now discredited
• Duke case
  – Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits
• Dutch cases
  – Stapel – 55 publications – “fictitious data”
  – Poldermans – fabricated data or negligence?

“The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials

“Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256

Page 26: Use and reuse: research data locally & globally #esipfed

Should all data be open?

• NO
• Many reasons – most to do with human subjects
• But data existence should always be open
• Allows discovery & negotiation on use
• Avoids pointless replication

Page 27: Use and reuse: research data locally & globally #esipfed

Gentleman’s data centres

• Some data centres have club-like behaviour
  – Barriers to access
  – Only for contributors
  – Territorial
• Not without value, but barriers to progress

Page 28: Use and reuse: research data locally & globally #esipfed

Citability

• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
  – Alter, Pienta, Lyle – 240%, social sciences *
  – Piwowar, Vision – 9% (microarray data) †
  – Henneken, Accomazzi – 20% (astronomy) #

† Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1 http://dx.doi.org/10.7287/peerj.preprints.1v1

* Amy Pienta, George Alter, Jared Lyle, (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307

# Edwin Henneken, Alberto Accomazzi, (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618

Page 29: Use and reuse: research data locally & globally #esipfed

Can we find it?

• Data must be discoverable to be reused
• Alone, or in conjunction with publication
• Institutional catalogues, national data registries, national and international domain-specific services

Page 30: Use and reuse: research data locally & globally #esipfed

Data discovery around the world

• Research Data Australia
• UK data registry pilot & Gateway2Research
• Research Data Netherlands
• World Data System
• re3data.org & databib.org – discovering repositories

Page 31: Use and reuse: research data locally & globally #esipfed

Repository finders

A re3data record

Page 32: Use and reuse: research data locally & globally #esipfed

A databib record

Page 33: Use and reuse: research data locally & globally #esipfed


Page 34: Use and reuse: research data locally & globally #esipfed


Page 35: Use and reuse: research data locally & globally #esipfed

Other global work of note

• Domain initiatives such as Belmont Forum
• International generic groups – RDA, CODATA
• Problem-specific services – DataCite, EZID, …

Page 36: Use and reuse: research data locally & globally #esipfed

[Diagram: Idea, Develop, Fund, Plan, Record, Process, Publish, Read]

Page 37: Use and reuse: research data locally & globally #esipfed

[Diagram, shown twice: Idea, Develop, Fund, Plan, Record, Process, Publish, Read]

Page 38: Use and reuse: research data locally & globally #esipfed

[Diagram: Idea, Develop, Fund, Plan, Record, Process, Publish, Read]

Page 39: Use and reuse: research data locally & globally #esipfed

Data reuse stories

• The palaeontologist who saved years of work with archaeological data
• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull
• The 19th-century logs and photographs that help us model climate change

Often your data tells stories that your publications do not

Page 40: Use and reuse: research data locally & globally #esipfed

3TU treasure chest

Page 41: Use and reuse: research data locally & globally #esipfed

Thanks for your attention

[email protected]
@kevingashley