Inverting the data pyramid: maximising the value of data reuse (IMCW2014/ICKM2014 keynote)

Transcript of Inverting the data pyramid: maximising the value of data reuse (IMCW2014/ICKM2014 keynote)

Page 1: Inverting the data pyramid: maximising the value of data reuse (IMCW2014/ICKM2014 keynote)

Inverting the Pyramid: Maximising the value of research data to society

Kevin Ashley, Digital Curation Centre

www.dcc.ac.uk @kevingashley

[email protected]

Reusable with attribution: CC-BY

The DCC is supported by Jisc

My home – the DCC

• Mission – to increase capability and capacity for research data services in UK institutions

• Not just a UK problem – an international one

• Training, shared services, guidance, policy, standards, futures

2014-11-25 Kevin Ashley – IMCW/ICKM-2014, Antalya – CC-BY 2

DCC networks and partnerships

Original Slide: Martin Donnelly, DCC

About me

• 35 years ago – a mathematician in medical research

• Acquired a skill for rescuing old data:

– Lost code books

– Lost programs

– Bad or obsolete media or systems

• It was fun – but it should not have been necessary

Generic science data lifecycle

Adapted from: "Harnessing the Power of Digital Data: Taking the Next Step." Scientific Data Management (SDM) for Government Agencies: Report from the Workshop to Improve SDM.

Lifecycle stages: Plan, Collect, Integrate/Transform, Publish, Discover, Archive/Discard

E-Science curation report - 2003

Hervé L’Hours’s analysis

• Data lifecycles are linear, cyclical or spiral (sometimes all three)

• See more at http://www.dcc.ac.uk/events/research-data-management-forum-rdmf/rdmf11 - workflows & research data management

• Linear cycles are project-based or repository-based

Traditional knowledge management view of data

Image © John Curran @ designedforlearning.co.uk

Image from forwardmotion.eu

But in research…

"DIKW-diagram" by RobOnKnowledge – Own work. Licensed under Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons – http://commons.wikimedia.org/wiki/File:DIKW-diagram.png

I ♥ your data!

I don’t ♥ what you said about it.

LIDAR & RADAR images of ice cloud – H. Russchenberg

The Old Weather project

Data for research, not from research

Data reuse stories

• The palaeontologist who saved years of work with archaeological data

• The 19th-century ships’ logs that help us model climate change

• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull

Data reuse - messages

Often your data tells stories that your publications do not

Not all data comes from other researchers

One person’s noise is another person’s signal

Discipline-bounded data discovery doesn’t give us all we need or want

Understanding Biodiversity

• We don’t understand what drives it

• What helps or hinders speciation

• No one project or data source is enough

• Biology, geology, climate science, chemistry…

• Big and small problems

• Reanalysis & gap analysis

Research on Biodiversity…

• Requires many different data sources

• Not all will be published

• Not all publications are for similar research reasons, so…

• Citing the publication is irrelevant

• Some is research data; some is government or reference data

Why care?

• Data is expensive – an investment

• Reuse:

– More research

– Teaching & Learning

– Planning

• Impact – with or without publication

• Accountability

• Legal & regulatory requirements

Why does this matter?

• Research quality – How close can we get to the truth?

• Research speed – How quickly can we get to the truth?

• Research finance – How much does the truth cost?

• Improving one or more of these is of interest to all actors:

– Researchers as data creators

– Researchers as data reusers

– Research institutions

– Funders – hence government and society

Creative data reuse

• http://vimeo.com/38402965

Integrity – not without data

• Cyril Burt

– Twin studies on intelligence

– Questioned 1976; now discredited

• Duke case

– Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits

• Dutch cases

– Stapel – 55 publications – “fictitious data”

– Poldermans – fabricated data or negligence?

2014-01-08 Kevin Ashley – ESIP Winter 2014 - CC-BY 23

“The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials

“Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256

Without data reuse:

• We can waste billions

• People suffer & die

Data reuse from Hubble

Data reuse is already happening – and researchers can change

Where can it happen?

Global, international

Nationally

Institution

By Subject

Research Group

Research data centres are good value!

• See Jisc reports on ADS, BADC, UKDA:

• Returns on investment between 400% and 1200%

• Unfortunately – many research domains have no relevant data centres

http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx

“Provision for data management, for curation and long-term preservation, and for the sharing and re-use of data, varies wildly between subject areas.”

“The data management needs of many researchers are little considered or catered for.”

If greater provision is to be made, a shortfall in infrastructure (both technical and human) must be overcome.

Policy makers are aware that in many areas of enquiry, researchers’ access to well-managed, open and reusable data opens up significant opportunities.

All from JISC MRD 2 call, 2010

The library as custodian

• Increasing role for library to provide access to institutional assets

• See Lorcan Dempsey’s thoughts on the inside-out library vs outside-in library

– http://www.slideshare.net/lisld/the-inside-out-library

• Build on library strengths – preservation, access, curation, selection

G8UK – Endorses OA

Open Data Charter – Policy Paper, 18 June 2013

Funder requirements

http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx

UK – RCUK

Canada

USA – NSF, NEH, etc.

Denmark

USA – non-government funders (Sloan, Gates, …)

Europe

RCUK policy - The 1-minute version

• Research data are a public good – make openly available in a timely & responsible way

• Have policies & plans. Data with long-term value should be preserved & usable

• Metadata for discovery & reuse. Link publications & data

• Sometimes law, ethics get in the way. We understand.

• Limited embargoes OK. Recognition is important – always cite data sources

• OK to use public money to do this. Do it efficiently.

EPSRC policy points

• Awareness of regulatory environment

• Data access statement

• Policies and processes

• Data storage

• Structured metadata descriptions

• DOIs for data

• Securely preserved for a minimum of 10 years from last use

Compliance expected by 2015

DCC Policy Summary

http://www.dcc.ac.uk/resources/policy-and-legal

Helping make data reuse possible – experience from the DCC

Some lessons – a summary

• Data reuse is rarely as simple as people think it is

• It is already happening

• It is good for research, for researchers, for funders, for universities

• Without senior management attention and researcher involvement, your initiative will fail

• Research data management services cannot involve the library alone

• Researchers need to know your services exist

• Training for young researchers in good data practice is valuable

DCC ‘institutional engagement’

Assess needs

Make the case

Develop support and services

RDM policy development

Customised Data Management Plans

DAF & CARDIO assessments

Guidance and training

Workflow assessment

DCC support team

Advocacy with senior management

Institutional data catalogues

Pilot RDM tools

…and support policy implementation

Original Slide: Graham Pryor, DCC

Some institutional roles

• Leadership – coordinate action

• Audit – who has what, where does it go?

• Advice on access – data, wherever it is

• Preservation – permanence

• Citability

• Data/publication linking

• Promoting data in teaching

• Selection

• Education – early career researchers

Who (in the UK) is leading RDM work?

Library

IT

Research Office

RESEARCHERS

INSTITUTIONAL SERVICES

Some example services

• Storage – persistent, shareable

• Permanent, citable identifiers

• Database as a service (e.g. Oxford ORDS)

• Embed tools in Excel – DataUp, others

• Workflow management – Taverna

• Training for early career researchers

Make data creation easier

Make data citable

• Making data available increases citations

• Everyone – academic, funder, institution – loves citations

• Want evidence?

– Alter, Pienta, Lyle – 240%, social sciences *

– Piwowar, Vision – 9% (microarray data) †

– Henneken, Accomazzi – 20% (astronomy) #

* Pienta A, Alter G, Lyle J (2010). The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307

† Piwowar H, Vision TJ (2013). Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1

# Henneken E, Accomazzi A (2011). Linking to Data – Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618

Make data discoverable

• Data must be discoverable to be reused

• Alone, or in conjunction with publication

• Services include:

– Institutional catalogues

– National data registries

– Repository registries – Databib, re3data

Dataverse – helping researchers make data findable & reusable

gking.harvard.edu/data

DCC guidance

http://dataintelligence.3tu.nl/en/home/

Choice of RDM training materials for librarians

Up-skilling for data

http://datalib.edina.ac.uk/mantra/libtraining.html

2014-05-14 Kevin Ashley – Eurocris2014 - CC-BY 52

What data to keep

The Data Deluge is upon us

Sensors’ ability to produce data outstrips IT’s ability to process it

Roles and Responsibilities

What data to keep

IDCC15 – London, Feb 9-12 2015

http://www.dcc.ac.uk/events/idcc15

The 10th International Digital Curation Conference

My message to researchers

• The credit belongs to you

• The data belongs to all of us

• Share, and we all reap the benefits

• The story doesn’t end with a publication
