Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks

Professor Carole Goble CBE FREng FBCS

The University of Manchester, UK

The Software Sustainability Institute

carole.goble@manchester.ac.uk

iConference, 26 March 2015, Newport Beach, Los Angeles, USA


What do I do? CyberInfrastructure EcoSystems

e-Lab Collabs & Shared Asset Repositories

Knowledge, Metadata, Linked Data, Ontologies

Software Engineering for Scientists

Computational Workflow Systems

Scholarly Comms

Reproducibility

MicroPublications

Open Science

Research Objects

Linked Data for Science

Scientific EgoSystems

Biodiversity

Systems Biology

Synthetic Biology

Astronomy

HelioPhysics

Genomics

Health Epidemiology

Digital Preservation

Social Science

Pharmacology

Knowledge Turning Flow

Barriers to Cure

» Access to scientific resources

» Coordination and Collaboration

» Flow of Information

http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation

[Pettifer, Attwood]

http://getutopia.com

Virtual Witnessing

Scientific publications

» announce a result

» convince readers the result is correct

“papers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extension”

Jill Mesirov, Broad Institute, 2010

Accessible Reproducible Research, Science, 22 January 2010, Vol. 327, no. 5964, pp. 415-416, DOI 10.1126/science.1179653

Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985), Shapin and Schaffer

Bramhall et al., Quality of Methods Reporting in Animal Models of Colitis, Inflammatory Bowel Diseases, 2015

“Only one of the 58 papers reported all essential criteria on our checklist. Animal age, gender, housing conditions and mortality/morbidity were all poorly reported…”

http://www.nature.com/news/male-researchers-stress-out-rodents-1.15106

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”

David Donoho, “Wavelab and Reproducible Research”, 1995

Datasets, Data collections

Standard operating procedures

Software, algorithms

Configurations, Tools and apps, services

Codes, code libraries

Workflows, scripts

System software, Infrastructure, Compilers, hardware

Morin et al., Shining Light into Black Boxes, Science 2012, 336(6078), 159-160

Ince et al., The case for open computer programs, Nature 482, 2012

50 papers randomly chosen from 378 manuscripts in 2011 that use the Burrows-Wheeler Aligner for mapping Illumina reads

31: no software version, parameters, or exact version of genomic reference sequence

26: no access to primary data sets

Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)

Broken software, Broken science

» Geoffrey Chang, Scripps Institute

» Homemade data-analysis program inherited from another lab

» Flipped two columns of data, inverting the electron-density map used to derive protein structure

» Retract 3 Science papers and 2 papers in other journals

» One paper cited by 364

The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science, 22 December 2006, vol. 314, no. 5807, 1856-1857

http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
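The column-flip failure described on this slide can be illustrated with a toy check. This is a minimal sketch, not a reconstruction of the program involved: the function names and the three reference vectors are hypothetical. It relies only on the fact that exchanging two coordinate columns turns a right-handed set of vectors into its mirror image, which shows up as a sign change in the triple product.

```python
def triple_product(a, b, c):
    """Signed volume spanned by three 3-D vectors: a . (b x c)."""
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2]


def swap_columns(rows, i, j):
    """Return a copy of rows with columns i and j exchanged."""
    swapped = []
    for row in rows:
        row = list(row)
        row[i], row[j] = row[j], row[i]
        swapped.append(tuple(row))
    return swapped


# Hypothetical reference data: three right-handed basis vectors.
reference = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
flipped = swap_columns(reference, 0, 1)

# A positive triple product means the expected handedness is preserved.
handedness_ok = triple_product(*reference) > 0
handedness_flipped = triple_product(*flipped) > 0
```

A check of this kind, run against a small data set with a known answer, is one inexpensive way inherited code might be tested before its output is trusted.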

Software making practices

“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”

2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?”, Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8

republic of science

regulation of science

institution cores libraries

Merton’s four norms of scientific behaviour (1942)

public services

Tools, Standards

Machine actionable

Formats, Reporting, Policies, Practices

Record and Automate Everything

Potential Trace Heaven Folks

recomputation.org

sciencecodemanifesto.org

Honest Error Science is messy

Inherent

Reinhart-Rogoff austerity economics

Thomas Herndon

Nature, Oct ’12

Zoë Corbyn

Fraud

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”

» Tainted resources

» Black boxes

» Poor Reporting

» Unavailable resources: results, data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005

Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

When research goes “wrong”

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study? No thanks. Not FAIR

Hard, Resource intensive

Unrecognised, Trolled

Just gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

Process at Scale

More on Models

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design, protocols, samples, software, models…

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

Pop-Up Start Ups

Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It's all about The Trust

Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case

Research Objects

Compound, Interconnected Investigations, Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite, resolve, steward

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UniProt): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013

means

ends

driver

Research Object packages codes, study, and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data. Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures

versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure, and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness, tolerance

verification, compliance

validation, assurance

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup
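The grouping on this slide can be sketched as a small record type. This is an illustrative sketch only: it assumes Methods and Materials belong to the Experiment, and Instruments and Laboratory to the Setup, which is one reading of the slide layout. The class names, field names and sample values are hypothetical and not part of any Research Object specification. The fingerprint shows how a change in the lab (for example an upgraded library) becomes detectable even when the experiment itself is unchanged.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Experiment:
    methods: list      # techniques, algorithms, spec of the steps
    materials: dict    # datasets, parameters, algorithm seeds


@dataclass
class Setup:
    instruments: dict  # codes, services, scripts, libraries (name -> version)
    laboratory: dict   # software and hardware infrastructure, platforms


@dataclass
class RunRecord:
    experiment: Experiment
    setup: Setup
    outputs: dict = field(default_factory=dict)

    def fingerprint(self):
        """SHA-256 over a canonical JSON form of experiment and setup (outputs excluded)."""
        payload = {"experiment": asdict(self.experiment), "setup": asdict(self.setup)}
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values, for illustration only.
baseline = RunRecord(
    experiment=Experiment(methods=["align reads", "call variants"],
                          materials={"dataset": "sample-001", "seed": 42}),
    setup=Setup(instruments={"aligner": "1.0"}, laboratory={"os": "linux"}),
)
upgraded = RunRecord(
    experiment=baseline.experiment,
    setup=Setup(instruments={"aligner": "1.1"}, laboratory={"os": "linux"}),
)

# Same experiment, different instrument version: the fingerprints differ.
same_setup = baseline.fingerprint() == upgraded.fingerprint()
```

Keeping outputs out of the fingerprint is a design choice in this sketch: it lets two runs be compared first on what went in, and only then on what came out.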


Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency: dependencies

steps, features

provenance trace

portability

robustness

preservation

access: available

description: intelligible

standards: common APIs

licensing

standards: common metadata

change management: versioning

packaging

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading

Archived Record

Retain the Process/Code

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config

Components

Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs

Data and configs

Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards

Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J Zhao, G Klyne, M Gamble, CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
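The Container plus Manifest idea on this slide can be sketched as a ZIP archive that carries its own table of contents. The sketch below is loosely modelled on the Research Object Bundle layout; the mimetype string, the manifest path and the manifest keys are assumptions written from memory and should be checked against the published specification before any real use. The function name and sample files are hypothetical.

```python
import io
import json
import zipfile

# Assumed values, to be checked against the Research Object Bundle specification.
BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"


def build_bundle(files):
    """Pack {path: bytes} into an in-memory ZIP with a manifest listing each path."""
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry is written first and stored uncompressed.
        archive.writestr(zipfile.ZipInfo("mimetype"), BUNDLE_MIMETYPE)
        archive.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
        for path in sorted(files):
            archive.writestr(path, files[path], compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


# Hypothetical contents: one data file and one script.
data = build_bundle({
    "data/input.csv": b"a,b\n1,2\n",
    "scripts/analysis.py": b"print('hi')\n",
})

# Reading the bundle back: the manifest says what the container aggregates.
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    names = archive.namelist()
    manifest = json.loads(archive.read(MANIFEST_PATH))
```

Because the manifest travels inside the container, a receiving platform can discover the aggregated resources without knowing in advance how the sender organised them.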

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check


Tribal Behaviour

» Gangs share, but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data


me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested/find it useful/be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love, money

fame, duty

fear, time/effort

shame, duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited, scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable

78 credit

37 formal citation

5 actual version

90 Bio articles

24 journals had citation policy

BUT……

two years time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

What do I do CyberInfrastructure EcoSystems

e-Lab Collabs ampShared Asset Repositories

Knowledge Metadata Linked Data Ontologies

Software Engineering for Scientists

ComputationalWorkflow Systems

Scholarly Comms

Reproducibility

MicroPublications

Open Science

Research Objects

Linked Data forScience

Scientific EgoSystems

Biodiversity

Systems Biology

Synthetic Biology

Astronomy

HelioPhysics

Genomics

Health Epidemiology

Digital Preservation

Social Science

Pharmacology

Knowledge Turning Flow

Barriers to Cure

raquo Access to scientific resources

raquo Coordination and Collaboration

raquo Flow of Information

httpforatv20100423Sage_Commons_Josh_Sommer_Chordoma_Foundation

[Pettifer Attwood]

httpgetutopiacom

Virtual WitnessingScientific publications

raquo announce a result

raquo convince readers the result is correct

ldquopapers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extensionrdquo

Jill Mesirov Broad Institute 2010

Accessible Reproducible Research Science 22 January 2010 Vol 327 no 5964 pp 415-416 DOI 101126science1179653

Leviathan and the Air-Pump Hobbes Boyle and the Experimental Life (1985) Shapin and Schaffer

Bramhall et al QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS Inflammatory Bowel Diseases 2015

ldquoOnly one of the 58 papers reported all essential criteria on our checklist Animal age gender housing conditions and mortalitymorbidity were all poorly reportedhelliprdquo

httpwwwnaturecomnewsmale-researchers-stress-out-rodents-115106

ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo

David Donoho ldquoWavelab and Reproducible Researchrdquo 1995

Datasets Data collectionsStandard operating proceduresSoftware algorithmsConfigurations Tools and apps servicesCodes code librariesWorkflows scriptsSystem software Infrastructure Compilers hardware

Morin et al Shining Light into Black Boxes Science 2012 336(6078) 159-160 Ince et al The case for open computer programs Nature 482 2012

50 papers randomly chosen from 378

manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads

31 no sw version parameters exact

version of genomic reference sequence

26 no access to primary data sets

Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)

Broken software Broken science

raquo Geoffrey Chang Scripps Institute

raquo Homemade data-analysis program inherited from another lab

raquo Flipped two columns of data inverting the electron-density map used to derive protein structure

raquo Retract 3 Science papers and 2 papers in other journals

raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practices

ldquoAs a general rule

researchers do not

test or document their

programs rigorously

and they rarely

release their codes

making it almost

impossible to

reproduce and verify

published results

generated by

scientific softwarerdquo

2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8

republic of science

regulation of science

institution cores libraries

Mertonrsquos four norms of scientific behaviour (1942)

public services

Tools StandardsMachine actionableFormats Reporting Policies Practices

Record and AutomateEverything

Potential Trace Heaven Folks

recomputationorg

sciencecodemanifestoorg

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles, 24 journals had citation policy
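
One low-friction way to make the software behind a result visible is to record the exact versions alongside the output. The sketch below assumes a Python-based analysis; `describe_environment` and the layout of the record it returns are hypothetical names chosen for illustration, not part of any tool mentioned in the talk.

```python
import platform
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional


def describe_environment(packages: Iterable[str]) -> Dict[str, object]:
    """Return the interpreter version and the installed version of each package.

    Packages that are not installed are recorded as None rather than
    raising, so the record also shows what was missing.
    """
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": versions,
    }
```

Writing the returned dictionary next to the results (for example as JSON) gives a later reader the version information that most of the surveyed software mentions left out.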

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

Knowledge Turning Flow

Barriers to Cure

raquo Access to scientific resources

raquo Coordination and Collaboration

raquo Flow of Information

httpforatv20100423Sage_Commons_Josh_Sommer_Chordoma_Foundation

[Pettifer Attwood]

httpgetutopiacom

Virtual WitnessingScientific publications

raquo announce a result

raquo convince readers the result is correct

ldquopapers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extensionrdquo

Jill Mesirov Broad Institute 2010

Accessible Reproducible Research Science 22 January 2010 Vol 327 no 5964 pp 415-416 DOI 101126science1179653

Leviathan and the Air-Pump Hobbes Boyle and the Experimental Life (1985) Shapin and Schaffer

Bramhall et al QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS Inflammatory Bowel Diseases 2015

ldquoOnly one of the 58 papers reported all essential criteria on our checklist Animal age gender housing conditions and mortalitymorbidity were all poorly reportedhelliprdquo

httpwwwnaturecomnewsmale-researchers-stress-out-rodents-115106

ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo

David Donoho ldquoWavelab and Reproducible Researchrdquo 1995

Datasets Data collectionsStandard operating proceduresSoftware algorithmsConfigurations Tools and apps servicesCodes code librariesWorkflows scriptsSystem software Infrastructure Compilers hardware

Morin et al Shining Light into Black Boxes Science 2012 336(6078) 159-160 Ince et al The case for open computer programs Nature 482 2012

50 papers randomly chosen from 378

manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads

31 no sw version parameters exact

version of genomic reference sequence

26 no access to primary data sets

Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)

Broken software Broken science

raquo Geoffrey Chang Scripps Institute

raquo Homemade data-analysis program inherited from another lab

raquo Flipped two columns of data inverting the electron-density map used to derive protein structure

raquo Retract 3 Science papers and 2 papers in other journals

raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practices

ldquoAs a general rule

researchers do not

test or document their

programs rigorously

and they rarely

release their codes

making it almost

impossible to

reproduce and verify

published results

generated by

scientific softwarerdquo

2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8

republic of science

regulation of science

institution cores libraries

Mertonrsquos four norms of scientific behaviour (1942)

public services

Tools StandardsMachine actionableFormats Reporting Policies Practices

Record and AutomateEverything

Potential Trace Heaven Folks

recomputationorg

sciencecodemanifestoorg

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1. Science Changes. So does the Lab

“The questions don’t change but the answers do”
Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay: materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
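The split into Methods and Materials (the Experiment) versus Instruments and Laboratory (the Setup) can be sketched as a small data structure. This is only an illustration: the class names, field names and the `missing_for_rerun` helper are hypothetical and not part of any Research Object specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Experiment:
    """Hypothetical record of what a computational experiment did and used."""
    methods: List[str] = field(default_factory=list)         # techniques, algorithms, spec of the steps
    materials: Dict[str, str] = field(default_factory=dict)  # datasets, parameters, algorithm seeds


@dataclass
class Setup:
    """Hypothetical record of what the experiment ran on."""
    instruments: Dict[str, str] = field(default_factory=dict)  # codes, services, scripts, libraries
    laboratory: Dict[str, str] = field(default_factory=dict)   # sw and hw infrastructure, platforms


def missing_for_rerun(experiment: Experiment, setup: Setup) -> List[str]:
    """Return the names of the parts that were left empty, i.e. undocumented."""
    parts = {
        "methods": experiment.methods,
        "materials": experiment.materials,
        "instruments": setup.instruments,
        "laboratory": setup.laboratory,
    }
    return [name for name, value in parts.items() if not value]


# Invented example values, for illustration only.
exp = Experiment(methods=["align reads", "call variants"],
                 materials={"dataset": "reads.fastq", "seed": "42"})
setup = Setup(instruments={"aligner": "0.7.10"})
gaps = missing_for_rerun(exp, setup)
print(gaps)
```

Here the laboratory part was never recorded, so it is the one gap reported.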

Research Environment

submit article and move on…

publish article

Publication Environment


[Adapted Freire 2013]

transparency, dependencies

steps, features, provenance trace

portability

robustness

preservation

access, available

description, intelligible

standards, common APIs

licensing

standards, common metadata

change management, versioning

packaging

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al, LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96
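Going from a thick trace to a distilled report can be illustrated with a toy example. This is a minimal sketch, not the LabelFlow method: the trace records, their keys and the `distil` function are all invented for illustration.

```python
from collections import Counter

# A made-up "thick" trace: one record per executed workflow step.
trace = [
    {"step": "fetch", "tool": "downloader", "version": "1.2", "status": "ok"},
    {"step": "align", "tool": "aligner", "version": "0.7", "status": "ok"},
    {"step": "align", "tool": "aligner", "version": "0.7", "status": "ok"},
    {"step": "plot", "tool": "plotter", "version": "2.0", "status": "failed"},
]


def distil(records):
    """Reduce a step-level trace to the facts a methods section would report."""
    # Unique (tool, version) pairs, sorted for a stable report.
    tools = sorted({(r["tool"], r["version"]) for r in records})
    # How many times each step was executed.
    runs = Counter(r["step"] for r in records)
    # Steps that did not complete successfully.
    failures = [r["step"] for r in records if r["status"] != "ok"]
    return {"tools": tools, "runs_per_step": dict(runs), "failed_steps": failures}


summary = distil(trace)
print(summary)
```

The summary keeps tool versions, run counts and failures, and drops the per-step detail that a reader of the report would not need.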

Reproduce by Reading: Archived Record, Retain the Process/Code

The IT Crowd, Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running: Active Instrument, Retain the bits

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles


Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource


standards

Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al, COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
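The Container plus Manifest idea can be sketched as a ZIP file that carries its own listing of aggregated resources. This is a simplified illustration only. The media type and manifest path below are written from my reading of the RO Bundle specification and should be treated as assumptions to check against it; the manifest is cut down to two fields, and `build_bundle` is a hypothetical helper.

```python
import io
import json
import zipfile

# Assumed values; verify against the RO Bundle specification before relying on them.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"


def build_bundle(resources: dict) -> bytes:
    """Pack resources (path -> bytes) plus a minimal manifest into one ZIP container."""
    manifest = {
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(resources)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry is written first and stored uncompressed.
        archive.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        archive.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
        for path, content in resources.items():
            archive.writestr(path, content)
    return buffer.getvalue()


# Invented example content.
bundle = build_bundle({
    "scripts/analyse.py": b"print('hello')\n",
    "data/input.csv": b"x,y\n1,2\n",
})
with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
    names = archive.namelist()
    manifest = json.loads(archive.read(MANIFEST_PATH))
print(names)
```

Because the manifest travels inside the container, a receiving platform can list what the bundle aggregates without unpacking every resource.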

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check


Tribal Behaviour

» Gangs share, but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data


me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn’t work out all the details

I didn’t actually write the code -- my student did

My competitors would be unfair to me

It’s valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love, money

fame, duty

fear, time/effort

shame, duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014. 406 respondents covering representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles, 24 journals had citation policy

BUT……

two years time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

[Pettifer, Attwood]

http://getutopia.com

Virtual Witnessing

Scientific publications:

» announce a result

» convince readers the result is correct

“papers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extension”

Jill Mesirov, Broad Institute, 2010

Accessible Reproducible Research, Science, 22 January 2010, Vol 327 no 5964, pp 415-416, DOI: 10.1126/science.1179653

Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985), Shapin and Schaffer

Bramhall et al, Quality of Methods Reporting in Animal Models of Colitis, Inflammatory Bowel Diseases, 2015

“Only one of the 58 papers reported all essential criteria on our checklist. Animal age, gender, housing conditions and mortality/morbidity were all poorly reported…”

http://www.nature.com/news/male-researchers-stress-out-rodents-1.15106

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”

David Donoho, “Wavelab and Reproducible Research”, 1995

Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations, Tools and apps, services
Codes, code libraries
Workflows, scripts
System software, Infrastructure, Compilers, hardware

Morin et al, Shining Light into Black Boxes, Science 2012, 336(6078), 159-160
Ince et al, The case for open computer programs, Nature 482, 2012

50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows-Wheeler Aligner for mapping Illumina reads

31 no sw version, parameters, exact version of genomic reference sequence

26 no access to primary data sets

Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)

Broken software, Broken science

» Geoffrey Chang, Scripps Institute

» Homemade data-analysis program inherited from another lab

» Flipped two columns of data, inverting the electron-density map used to derive protein structure

» Retract 3 Science papers and 2 papers in other journals

» One paper cited by 364

The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller, A Scientist’s Nightmare: Software Problem Leads to Five Retractions, Science, 22 December 2006, vol 314 no 5807, 1856-1857
http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practices

“As a general rule researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”

2000 scientists. JE Hannay et al, “How Do Scientists Develop and Use Scientific Software?” Proc ICSE Workshop Software Eng for Computational Science and Eng, 2009, pp 1–8

republic of science

regulation of science

institution cores, libraries

Merton’s four norms of scientific behaviour (1942)

public services

Tools, Standards

Machine actionable

Formats, Reporting, Policies, Practices

Record and Automate Everything

Potential Trace Heaven Folks

recomputation.org

sciencecodemanifesto.org

Honest Error. Science is messy

Inherent

Reinhart/Rogoff, Austerity economics

Thomas Herndon

Nature, Oct ’12

Zoë Corbyn

Fraud

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”

» Tainted resources

» Black boxes

» Poor Reporting

» Unavailable resources, results, data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al, Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

When research goes “wrong”

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study? No thanks. Not FAIR

Hard, Resource intensive, Unrecognised, Trolled
Just gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts, Subject specific, General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

Process at Scale

More on Models

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design, protocols, samples, software, models…

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

Pop-Up Start Ups

Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It’s all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It’s all about The Trust

Jam today, Jam tomorrow, Jam for all, Just enough Jam, Just in Time not Just in Case

Research Objects

Compound, Interconnected Investigations, Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments

Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward

Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)
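The difference between defending a snapshot and locating the most recent release can be sketched with a toy resolver. Everything here is hypothetical: the `ResearchObjectRegistry` class, the `ro:example` identifier and the release fields are invented for illustration and do not correspond to any existing service.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Release:
    """An immutable, numbered release of a research object."""
    version: int
    contributors: Tuple[str, ...]
    components: Tuple[str, ...]


class ResearchObjectRegistry:
    """Hypothetical in-memory resolver keyed by research object identifier."""

    def __init__(self) -> None:
        self._releases: Dict[str, List[Release]] = {}

    def publish(self, ro_id: str, contributors, components) -> Release:
        # Each publish appends a new release; earlier ones are never changed.
        history = self._releases.setdefault(ro_id, [])
        release = Release(len(history) + 1, tuple(contributors), tuple(components))
        history.append(release)
        return release

    def resolve(self, ro_id: str, version: Optional[int] = None) -> Release:
        """Without a version, locate the most recent release; with one, the fixed snapshot."""
        history = self._releases[ro_id]
        return history[-1] if version is None else history[version - 1]


registry = ResearchObjectRegistry()
registry.publish("ro:example", ["A. Author"], ["data.csv"])
registry.publish("ro:example", ["A. Author", "B. Builder"], ["data.csv", "model.xml"])

snapshot = registry.resolve("ro:example", version=1)  # defend it
latest = registry.resolve("ro:example")               # locate it
print(snapshot.version, latest.version)
```

Keeping contributors per release is one way to credit contributory authorship separately for each version.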

Biological Study Records (eg PRIDE): stable
Biological Knowledge (eg UNIPROT): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013

means

ends

driver

Research Object packages codes, study, and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data

Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3


Virtual WitnessingScientific publications

raquo announce a result

raquo convince readers the result is correct

ldquopapers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extensionrdquo

Jill Mesirov Broad Institute 2010

Accessible Reproducible Research Science 22 January 2010 Vol 327 no 5964 pp 415-416 DOI 101126science1179653

Leviathan and the Air-Pump Hobbes Boyle and the Experimental Life (1985) Shapin and Schaffer

Bramhall et al QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS Inflammatory Bowel Diseases 2015

ldquoOnly one of the 58 papers reported all essential criteria on our checklist Animal age gender housing conditions and mortalitymorbidity were all poorly reportedhelliprdquo

httpwwwnaturecomnewsmale-researchers-stress-out-rodents-115106

ldquoAn article about computational science in a scientific publication is not the scholarship itself it is merely advertising of the scholarship The actual scholarship is the complete software development environment [the complete data] and the complete set of instructions which generated the figuresrdquo

David Donoho ldquoWavelab and Reproducible Researchrdquo 1995

Datasets Data collectionsStandard operating proceduresSoftware algorithmsConfigurations Tools and apps servicesCodes code librariesWorkflows scriptsSystem software Infrastructure Compilers hardware

Morin et al Shining Light into Black Boxes Science 2012 336(6078) 159-160 Ince et al The case for open computer programs Nature 482 2012

50 papers randomly chosen from 378

manuscripts in 2011 that use Burrows Wheeler Aligner for mapping Illumina reads

31 no sw version parameters exact

version of genomic reference sequence

26 no access to primary data sets

Nekrutenko amp Taylor Next-generation sequencing data interpretation enhancing reproducibility and accessibility Nature Genetics 13 (2012)

Broken software Broken science

raquo Geoffrey Chang Scripps Institute

raquo Homemade data-analysis program inherited from another lab

raquo Flipped two columns of data inverting the electron-density map used to derive protein structure

raquo Retract 3 Science papers and 2 papers in other journals

raquo One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller A Scientists Nightmare Software Problem Leads to Five Retractions Science 22 December 2006 vol 314 no 5807 1856-1857httpwwwsoftwareacukblog2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practices

ldquoAs a general rule

researchers do not

test or document their

programs rigorously

and they rarely

release their codes

making it almost

impossible to

reproduce and verify

published results

generated by

scientific softwarerdquo

2000 scientists JE Hannay et al ldquoHow Do Scientists Develop and Use Scientific Softwarerdquo Proc ICSE Workshop Software Eng for Computational Science and Eng 2009 pp 1ndash8

republic of science

regulation of science

institution cores libraries

Mertonrsquos four norms of scientific behaviour (1942)

public services

Tools StandardsMachine actionableFormats Reporting Policies Practices

Record and AutomateEverything

Potential Trace Heaven Folks

recomputationorg

sciencecodemanifestoorg

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)

Biological Study Records (e.g. PRIDE): stable
Biological Knowledge (e.g. UNIPROT): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data. Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency
Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure, and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness, tolerance

verification, compliance

validation, assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab.

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed
Updated resources

Uncertainty
BioSTIF

Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay
materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
  › form or function
  › preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

Config Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science 2 Dec 2011, 1226-1227
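As a rough illustration of the Methods / Materials / Instruments / Laboratory breakdown on this slide, the sketch below records each part of a computational experiment and derives a fingerprint from the record, so that a changed seed, dataset or library version becomes visible. The class name `ExperimentRecord`, its field layout and the `digest` helper are assumptions made for this example; they are not taken from the talk or from any Research Object specification.

```python
import hashlib
import json
import platform
import sys
from dataclasses import asdict, dataclass, field


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of some content (e.g. a dataset file)."""
    return hashlib.sha256(data).hexdigest()


def default_laboratory() -> dict:
    """Describe the running software/hardware environment (hypothetical fields)."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }


@dataclass
class ExperimentRecord:
    # Methods: the steps that were carried out, in order.
    methods: list
    # Materials: dataset digests, parameters, algorithm seeds.
    materials: dict
    # Instruments: codes, scripts and libraries, each with a version string.
    instruments: dict
    # Laboratory: the infrastructure the experiment ran on.
    laboratory: dict = field(default_factory=default_laboratory)

    def fingerprint(self) -> str:
        """Hash the whole record; any change to any part changes the result."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()
```

Two runs can then be compared by fingerprint. If the fingerprints differ, comparing the four fields shows whether the method, the materials, the instruments or the laboratory changed.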

Research Environment

submit article and move on…

publish article
Publication Environment

[Adapted Freire 2013]

transparency
dependencies
steps
features
provenance trace

portability

robustness

preservation

access
available

description
intelligible

standards
common APIs

licensing

standards
common metadata

change management
versioning

packaging

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al, LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading
Archived Record
Retain the Process/Code

The IT Crowd, Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though
docker.com

Reproduce by Running
Active Instrument
Retain the bits

Science as a Service

Integrative frameworks

Open Source

Workflows/Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows/makefiles

Fifty Shades of Research Object

Workflow Instrument
Example data and config, Components, Plug-ins, Versions

Workflow System Instrument
Software package

Workflow Runs
Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al, COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
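The container-plus-manifest idea on this slide can be sketched as a ZIP archive that carries its resources together with a JSON file listing them. The example below is only loosely modelled on that idea. The function names, the `manifest.json` location and the `createdBy` / `aggregates` field layout are assumptions made for illustration, and should be checked against the Research Object Bundle or COMBINE archive specifications before use.

```python
import io
import json
import zipfile


def build_bundle(resources, created_by):
    """Pack resources into an in-memory ZIP with a JSON manifest listing them.

    resources: dict mapping archive path -> (content bytes, media type).
    created_by: free-text attribution stored in the manifest.
    """
    manifest = {
        "createdBy": created_by,
        # One entry per aggregated resource, in a stable (sorted) order.
        "aggregates": [
            {"uri": "/" + path, "mediatype": mediatype}
            for path, (_, mediatype) in sorted(resources.items())
        ],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical manifest location; a real format fixes its own path.
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        for path, (content, _) in resources.items():
            archive.writestr(path, content)
    return buffer.getvalue()


def read_manifest(bundle_bytes):
    """Open a bundle produced by build_bundle and return its manifest as a dict."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as archive:
        return json.loads(archive.read("manifest.json"))
```

Because the manifest travels inside the archive, a recipient can find out what the bundle contains without any platform-specific knowledge, which is the point of using it as a unit of exchange.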

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share, but not with the public
» Tribal behaviours
  › Modellers share more than Experimentalists
  › Experimentalists reuse models more than Modellers
» Trading behaviours
  › Collaboration – complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data


me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky
reduce the friction

instrumentation
span RARE and FAIR

Optimising The Neylon Equation

Interface Framing
» Limited scheduled sharing choices
  › Never say never
» “Citable” not “Shared”
» Feedback
  › Guilt tripping
  › Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation
https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship
Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014; 406 respondents covering representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible
[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles; 24 journals had citation policy

BUT……

two years time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza, Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

Bramhall et al, Quality of Methods Reporting in Animal Models of Colitis, Inflammatory Bowel Diseases, 2015

“Only one of the 58 papers reported all essential criteria on our checklist. Animal age, gender, housing conditions and mortality/morbidity were all poorly reported…”

http://www.nature.com/news/male-researchers-stress-out-rodents-1.15106

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”

David Donoho, “Wavelab and Reproducible Research”, 1995

Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations, Tools and apps, services
Codes, code libraries
Workflows, scripts
System software, Infrastructure, Compilers, hardware

Morin et al, Shining Light into Black Boxes, Science 2012, 336(6078), 159-160
Ince et al, The case for open computer programs, Nature 482, 2012

50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows-Wheeler Aligner for mapping Illumina reads

31 no sw version, parameters, exact version of genomic reference sequence

26 no access to primary data sets

Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)

Broken software Broken science

» Geoffrey Chang, Scripps Institute
» Homemade data-analysis program inherited from another lab
» Flipped two columns of data, inverting the electron-density map used to derive protein structure
» Retract 3 Science papers and 2 papers in other journals
» One paper cited by 364

The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science 22 December 2006, vol 314 no 5807, 1856-1857
http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practices

“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”

2000 scientists. JE Hannay et al, “How Do Scientists Develop and Use Scientific Software?” Proc ICSE Workshop Software Eng for Computational Science and Eng, 2009, pp 1–8

republic of science

regulation of science

institution cores libraries

Merton’s four norms of scientific behaviour (1942)

public services

Tools, Standards, Machine actionable Formats, Reporting, Policies, Practices

Record and Automate Everything

Potential Trace Heaven Folks

recomputation.org

sciencecodemanifesto.org

Honest Error
Science is messy

Inherent

Reinhart/Rogoff Austerity economics
Thomas Herndon

Nature Oct ’12

Zoë Corbyn

Fraud

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”

» Tainted resources
» Black boxes
» Poor Reporting
» Unavailable resources / results: data, software
» Bad maths
» Sins of omission
» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al, Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment
» Impact factor mania
» Pressure to publish
» Broken peer review
» Research never reported
» Disorganisation
» Time pressures
» Prep & curate costs

Morrison

Do a Replication Study? No thanks! Not FAIR



Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design, protocols, samples, software, models…

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

Pop-Up Start Ups

Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It’s all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It’s all about The Trust

Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case

Research Objects

Compound, Interconnected Investigations, Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments

Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite, resolve, steward

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UNIPROT): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data

Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts, exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures

versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify

[Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config/Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science (online)

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab

“The questions don’t change but the answers do”

Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config/Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science (online)

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
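The slide's four boxes (input data, software, output data, config/parameters) can be read as the minimum a computational run would need to record about itself. The following is only an illustrative sketch of that idea, not part of the Research Object model: the function `run_record` and its field names are hypothetical, and the "software" entry captures just the interpreter version and platform rather than the full set of instruments and underlying libraries.

```python
import hashlib
import json
import platform
import sys


def digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def run_record(input_data: bytes, config: dict, output_data: bytes) -> dict:
    """Describe one run: what went in, how it was set up, what came out.

    Field names are hypothetical, chosen to mirror the slide's four boxes.
    """
    return {
        "input_sha256": digest(input_data),      # Input Data
        "config": config,                        # Config/Parameters
        "software": {                            # Software (interpreter only)
            "python": platform.python_version(),
            "platform": sys.platform,
        },
        "output_sha256": digest(output_data),    # Output Data
    }


if __name__ == "__main__":
    record = run_record(b"a,b\n1,2\n", {"seed": 42}, b"3\n")
    print(json.dumps(record, indent=2))
```

Comparing two such records shows at a glance whether a rerun used the same materials and setup, which is a far weaker check than rerunning the experiment itself.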

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency

dependencies

steps, features

provenance trace

portability

robustness

preservation

access

available

description

intelligible

standards

common APIs

licensing

standards

common metadata

change management

versioning

packaging

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading

Archived Record

Retain the Process/Code

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

service

Science as a Service

Integrative frameworks

Open Source

Workflows/Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config

Components

Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs

Data and configs

Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards

Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
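The container-plus-manifest idea on this slide can be sketched in a few lines. This is a simplified illustration only: it does not follow the Research Object Bundle specification or the OMEX format, and the manifest location, the `aggregates` layout and the `build_bundle` helper are assumptions made for the example.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_bundle(files, bundle_path, title):
    """Zip the given files together with a manifest that lists them.

    The manifest layout is hypothetical, not the RO Bundle schema.
    """
    manifest = {
        "title": title,
        "aggregates": [
            {"path": Path(f).name, "sha256": sha256_of(Path(f))}
            for f in files
        ],
    }
    with zipfile.ZipFile(bundle_path, "w") as zf:
        # The manifest travels inside the container with the resources.
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for f in files:
            zf.write(f, arcname=Path(f).name)
    return manifest
```

A recipient can unzip the container, read the manifest, and recompute the checksums to confirm that the aggregated resources arrived intact.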

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data


me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested/find it useful/be able to use it

The code is too sophisticated for most readers/referees

I didn’t work out all the details

I didn’t actually write the code -- my student did

My competitors would be unfair to me

It’s valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014. 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology, in review

87 software findable

78 credit

37 formal citation

5 actual version

90 Bio articles

24 journals had citation policy

BUT……

two years time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester UK

carolegoblemanchesteracuk

https://sites.google.com/site/carolegoble

CaroleAnneGoble

Broken software Broken science

» Geoffrey Chang, Scripps Institute

» Homemade data-analysis program inherited from another lab

» Flipped two columns of data, inverting the electron-density map used to derive protein structure

» Retract 3 Science papers and 2 papers in other journals

» One paper cited by 364

The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right)

Miller, A Scientist’s Nightmare: Software Problem Leads to Five Retractions, Science, 22 December 2006, vol. 314, no. 5807, 1856-1857

http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

Software making practices

“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”

2000 scientists. JE Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8

republic of science

regulation of science

institution cores libraries

Merton’s four norms of scientific behaviour (1942)

public services

Tools, Standards

Machine actionable

Formats, Reporting, Policies, Practices

Record and Automate Everything

Potential Trace Heaven Folks



recomputationorg

sciencecodemanifestoorg

Honest Error Science is messy

Inherent

ReinhartRogoff Austerity economicsThomas Herndon

Nature Oct rsquo12

Zoeuml Corbyn

Fraud

ldquoI canrsquot immediately reproduce the research in my own laboratory It took an estimated 280 hours for an average user to approximately reproduce the paperrdquo

Prof Phil BourneAssociate Director NIH Big Data 2 Knowledge Program

When research goes ldquowrongrdquo

raquo Tainted resources

raquo Black boxes

raquo Poor Reporting

raquo Unavailable resources results data software

raquo Bad maths

raquo Sins of omission

raquo Poor training sloppiness

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Ioannidis Why Most Published Research Findings Are False August 2005Joppa et alTroubling Trends in Scientific Software Use SCIENCE 340 May 2013

Scientific method

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency: Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code. Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo

What IS reproducibility? Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness, tolerance

verification, compliance

validation, assurance

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab

“The questions don't change but the answers do” Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay: materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup
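
The four parts named above (methods, materials, instruments, laboratory) lend themselves to a simple completeness check. The following is a minimal sketch under stated assumptions: `ExperimentDescription` and `missing_parts` are hypothetical names invented here, and the check only tests whether anything at all has been recorded for each part, not whether it is adequate.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ExperimentDescription:
    # Each list holds free-text descriptions or references (hypothetical model).
    methods: List[str] = field(default_factory=list)      # techniques, algorithms, step specs
    materials: List[str] = field(default_factory=list)    # datasets, parameters, seeds
    instruments: List[str] = field(default_factory=list)  # codes, services, scripts, libraries
    laboratory: List[str] = field(default_factory=list)   # sw/hw infrastructure, platforms


def missing_parts(description: ExperimentDescription) -> List[str]:
    """Return the names of the parts that have nothing recorded, in declaration order."""
    return [f.name for f in fields(description) if not getattr(description, f.name)]


example = ExperimentDescription(
    methods=["alignment workflow, step specification"],
    materials=["input dataset", "random seed 42"],
)
```

Here `missing_parts(example)` reports that the instruments and the laboratory were never described, which is the typical gap when only data and a method summary are published.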


Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency, dependencies

steps, features, provenance trace

portability

robustness

preservation

access, available

description, intelligible

standards, common APIs

licensing

standards, common metadata

change management, versioning

packaging

Machine actionable

Reproducibility Framework

Reporting

Documentation

Provenance - Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al, LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96
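
As a rough illustration of going from thick trace data to a distilled report, the sketch below collapses per-invocation events into one summary row per workflow step. It is not the LabelFlow method cited above; the event fields (`step`, `status`, `output`) and the `distil` function are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable


def distil(trace: Iterable[Dict[str, str]]) -> Dict[str, dict]:
    """Collapse per-invocation trace events into one summary row per step."""
    summary: Dict[str, dict] = defaultdict(
        lambda: {"runs": 0, "failures": 0, "outputs": []}
    )
    for event in trace:
        row = summary[event["step"]]
        row["runs"] += 1
        if event["status"] != "ok":
            row["failures"] += 1          # count failed invocations
        elif event.get("output"):
            row["outputs"].append(event["output"])  # keep outputs of successful runs
    return dict(summary)


# Hypothetical thick trace: one record per step invocation.
thick_trace = [
    {"step": "fetch", "status": "ok", "output": "raw.csv"},
    {"step": "fetch", "status": "error"},
    {"step": "fetch", "status": "ok", "output": "raw.csv"},
    {"step": "analyse", "status": "ok", "output": "table.csv"},
]
report = distil(thick_trace)
```

The full trace stays available for anyone who needs to re-examine a run, while `report` is the kind of summary a reader of the article would want.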

Reproduce by Reading: Archived Record. Retain the Process/Code

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running: Active Instrument. Retain the bits

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J Zhao, G Klyne, M Gamble, CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop 2013

Platform profiles

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle

Bergmann et al, COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
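
The container-plus-manifest idea can be sketched with the Python standard library: a ZIP archive whose manifest lists the aggregated files. This is a simplified illustration only. The media type string, the `.ro/manifest.json` location and the `id` / `aggregates` field names are loosely modelled on the Research Object Bundle specification linked above and should be treated as assumptions to check against it; the OMEX format uses its own manifest. `build_bundle` is a hypothetical helper.

```python
import io
import json
import zipfile
from typing import Dict

# Media type for the bundle; an assumption to verify against the specification.
BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def build_bundle(files: Dict[str, bytes]) -> bytes:
    """Pack files plus a manifest listing them into a single ZIP container."""
    manifest = {
        "id": "/",
        "aggregates": [{"uri": "/" + name} for name in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Media type goes first and uncompressed so tools can sniff the container.
        archive.writestr("mimetype", BUNDLE_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        archive.writestr(
            ".ro/manifest.json",
            json.dumps(manifest, indent=2),
            compress_type=zipfile.ZIP_DEFLATED,
        )
        for name in sorted(files):
            archive.writestr(name, files[name], compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


bundle = build_bundle({"run.py": b"print('hello')\n", "data.csv": b"a,b\n1,2\n"})
```

Because the manifest travels inside the container, any platform that can open a ZIP file can at least discover what the object aggregates, which fits the "do as little as possible" retro-fit approach.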

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check


Tribal Behaviour

» Gangs share, but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration - complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data


me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love, money

fame, duty

fear, time/effort

shame, duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship: Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible [1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles, 24 journals had citation policy

BUT……

two years time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester's Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

republic of science

regulation of science

institution cores libraries

Merton's four norms of scientific behaviour (1942)

public services

Tools, Standards, Machine actionable Formats, Reporting, Policies, Practices

Record and Automate Everything

Potential Trace Heaven Folks

recomputation.org

sciencecodemanifesto.org

Honest Error Science is messy

Inherent

Reinhart/Rogoff Austerity economics, Thomas Herndon

Nature, Oct '12

Zoë Corbyn

Fraud

“I can't immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”

» Tainted resources

» Black boxes

» Poor Reporting

» Unavailable resources, results, data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005

Joppa et al, Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

Morrison

Do a Replication Study? No thanks! Not FAIR

Hard, Resource intensive

Unrecognised, Trolled

Just gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts: Subject specific, General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

Process at Scale

More on Models

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design, protocols, samples, software, models…

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by Reading

Archived Record

Retain the Process/Code

The IT Crowd, Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles


Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource


standards

Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al, COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
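The container-plus-manifest idea on this slide can be sketched as a ZIP file that carries its own inventory. The manifest layout below (a top-level `manifest.json` with an `aggregates` list of paths and checksums) is a simplified assumption for illustration. It does not follow the exact schema of the Research Object Bundle or the OMEX format, both of which define their own manifest files.

```python
import hashlib
import json
import zipfile


def build_bundle(bundle_path, files):
    """Write files plus a manifest listing them into one ZIP container.

    `files` maps archive paths to bytes. Returns the manifest dict.
    """
    manifest = {
        "aggregates": [
            {"path": name, "sha256": hashlib.sha256(data).hexdigest()}
            for name, data in sorted(files.items())
        ]
    }
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # The manifest goes in first so a reader can find the inventory early.
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for name, data in sorted(files.items()):
            zf.writestr(name, data)
    return manifest


def verify_bundle(bundle_path):
    """Return True if every aggregated file matches its recorded checksum."""
    with zipfile.ZipFile(bundle_path) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        return all(
            hashlib.sha256(zf.read(entry["path"])).hexdigest() == entry["sha256"]
            for entry in manifest["aggregates"]
        )
```

A call such as `build_bundle("study.zip", {"data/input.csv": b"...", "run.py": b"..."})` would produce one file to exchange, and `verify_bundle("study.zip")` checks that the contents still match the manifest.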

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check


Tribal Behaviour

» Gangs share, but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data


me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions?, J Assoc for Info Science and Technology, in review

87 software findable

78 credit

37 formal citation

5 actual version

90 Bio articles

24 journals had citation policy

BUT……

two years time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

Record and Automate Everything

Potential Trace Heaven Folks

recomputation.org

sciencecodemanifesto.org

Honest Error Science is messy

Inherent

Reinhart/Rogoff Austerity economics

Thomas Herndon

Nature Oct ’12

Zoë Corbyn

Fraud

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Prof Phil Bourne

Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”

» Tainted resources

» Black boxes

» Poor Reporting

» Unavailable resources: results, data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005

Joppa et al, Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

Morrison

Do a Replication Study? No thanks. Not FAIR

Hard, resource intensive

Unrecognised, trolled

Just gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts, Subject specific, General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

Process at Scale

More on Models

https://doi.org/10.15490/seek.1.investigation.56

[Snoep 2015]

Personal Data

Local Stores

External Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design, protocols, samples, software, models…

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

Pop-Up Start Ups

Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It's all about The Trust

Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite, resolve, steward

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UniProt): evolving
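The difference between defending a snapshot and locating the most recent version can be sketched as two kinds of reference to the same object. This is a hypothetical illustration in Python; the `ResearchObjectRecord` class and the `identifier/vN` reference scheme are assumptions for this example, not part of any Research Object specification.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchObjectRecord:
    """Toy registry entry: one identifier, many released versions."""

    identifier: str
    versions: list = field(default_factory=list)  # ordered oldest to newest

    def release(self, content):
        """Store a new version and return a citable, versioned reference."""
        self.versions.append(content)
        return f"{self.identifier}/v{len(self.versions)}"

    def resolve(self, reference):
        """Versioned reference gives the snapshot; bare identifier gives the latest."""
        if reference == self.identifier:
            if not self.versions:
                raise LookupError(f"no releases for {reference}")
            return self.versions[-1]
        prefix = self.identifier + "/v"
        number = reference[len(prefix):]
        if reference.startswith(prefix) and number.isdigit():
            index = int(number)
            if 1 <= index <= len(self.versions):
                return self.versions[index - 1]
        raise LookupError(f"cannot resolve {reference}")
```

A paper would cite the versioned reference so the exact snapshot can be defended later, while a collaborator following the bare identifier lands on whatever was released last.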

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data

Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures

versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B, Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is], F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify

[Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance




1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, disciplines and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles; 24 journals had citation policy

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”

Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program

When research goes “wrong”

» Tainted resources

» Black boxes

» Poor Reporting

» Unavailable resources/results: data, software

» Bad maths

» Sins of omission

» Poor training, sloppiness

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

Ioannidis, Why Most Published Research Findings Are False, August 2005

Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013

Scientific method

Social environment

» Impact factor mania

» Pressure to publish

» Broken peer review

» Research never reported

» Disorganisation

» Time pressures

» Prep & curate costs

Morrison

Do a Replication Study? No thanks! Not FAIR

Hard, Resource intensive, Unrecognised, Trolled

Just gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

Process at Scale

More on Models

[Snoep 2015]

https://doi.org/10.15490/seek.1.investigation.56

Aggregated Commons Infrastructure

[Figure: personal data, local stores, external databases, articles, models, standards and SOPs brought together in an aggregated commons]

Consistent Comparative Reporting

Design, protocols, samples, software, models…

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

Pop-Up Start Ups

Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It's all about The Trust

Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments

Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite, resolve, steward

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UNIPROT): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data

Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure, and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab

“The questions don’t change but the answers do”

Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data, Software, Output Data, Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
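A minimal sketch of how the layers named on this slide (methods, materials, instruments, laboratory) might be written down for a single computational run. Everything here is an illustrative assumption rather than part of any Research Object tooling: the function names `run_experiment` and `describe_run`, the toy method (mean of a seeded random sample) and the field names of the record.

```python
import hashlib
import json
import platform
import random
import sys


def run_experiment(values, seed):
    """Toy 'method': mean of three values drawn with a seeded generator."""
    rng = random.Random(seed)  # the seed is part of the materials
    sample = rng.sample(values, k=3)
    return sum(sample) / len(sample)


def describe_run(values, seed):
    """Return a record of one run: what went in, how, on what, and what came out."""
    digest = hashlib.sha256(json.dumps(values).encode("utf-8")).hexdigest()
    return {
        "materials": {"input_sha256": digest, "seed": seed},
        "method": "mean of a random sample of 3 values",
        "instrument": {"python": platform.python_version()},
        "laboratory": {"platform": sys.platform},
        "output": run_experiment(values, seed),
    }


if __name__ == "__main__":
    print(json.dumps(describe_run(list(range(10)), seed=42), indent=2))
```

Running `describe_run` twice with the same values and seed on the same machine gives the same record; changing the seed or the interpreter shows up in the record rather than silently in the result.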

[Figure: the Research Environment ("submit article and move on…") and the Publication Environment ("publish article")]

[Adapted Freire 2013]

Reproducibility Framework

[Figure labels: transparency, dependencies, steps, features, provenance trace; portability; robustness; preservation; access, available; description, intelligible; standards, common APIs; licensing; standards, common metadata; change management, versioning; packaging; machine actionable]

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading: Archived Record

Retain the Process/Code

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running: Active Instrument

Retain the bits

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
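A rough sketch of the container-plus-manifest idea: a ZIP container holding the aggregated files, with a JSON manifest that lists them. This is a simplification for illustration only and does not implement the Research Object Bundle or COMBINE/OMEX specifications; the function name `build_bundle`, the manifest file name and its fields (`aggregates`, `path`, `sha256`) are assumptions.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def build_bundle(files, bundle_path, manifest_name="manifest.json"):
    """Zip `files` together with a manifest of their names and SHA-256 digests."""
    entries = []
    with zipfile.ZipFile(bundle_path, "w") as bundle:
        for path in map(Path, files):
            data = path.read_bytes()
            # Store each file flat under its own name (hypothetical layout).
            bundle.writestr(path.name, data)
            entries.append(
                {"path": path.name, "sha256": hashlib.sha256(data).hexdigest()}
            )
        # The manifest travels inside the container it describes.
        bundle.writestr(manifest_name, json.dumps({"aggregates": entries}, indent=2))
    return entries
```

A recipient can open the archive, read the manifest first, and check each digest before trusting the contents, which is the "one file to share" point made by the COMBINE archive citation.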

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

Social environmentraquo Impact factor mania

raquo Pressure to publish

raquo Broken peer review

raquo Research never reported

raquo Disorganisation

raquo Time pressures

raquo Prep amp curate costs

When research goes ldquowrongrdquo

httpswwwsciencenewsorgarticle12-reasons-research-goes-wrong (adapted)

Morrison

Do a Replication Study No thanks Not FAIR

Hard Resource intensiveUnrecognised TrolledJust gathering the bits together

Cross-Institutional e-Laboratory Fragmentation

Scattered parts Subject specific General resources

101 Innovations in Scholarly Communication - the Changing Research Workflow Boseman and Kramer 2015 httpfigsharecomarticles101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow1286826

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)
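
The resolution and citation behaviours listed above can be sketched in a few lines of Python. This is only an illustration, not part of any Research Object specification: the `Snapshot` and `Registry` classes, the `ro:` identifiers and the `/vN` citation suffix are all hypothetical names chosen for the example. A numbered snapshot is what you defend and cite; resolving without a version locates the most recent release.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Snapshot:
    """An immutable, citable release of a Research Object (hypothetical model)."""
    ro_id: str
    version: int
    contributors: Tuple[str, ...]   # credit it: contributory authorship
    links: Tuple[str, ...] = ()     # cross link it: ids of related ROs

    @property
    def citation_id(self) -> str:
        # Assumed citation form: stable id plus an explicit version suffix.
        return f"{self.ro_id}/v{self.version}"


class Registry:
    """Keeps every released snapshot so old citations keep resolving."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Snapshot]] = {}

    def release(self, ro_id: str, contributors: Sequence[str],
                links: Sequence[str] = ()) -> Snapshot:
        versions = self._versions.setdefault(ro_id, [])
        snapshot = Snapshot(ro_id, len(versions) + 1,
                            tuple(contributors), tuple(links))
        versions.append(snapshot)
        return snapshot

    def resolve(self, ro_id: str, version: Optional[int] = None) -> Snapshot:
        versions = self._versions[ro_id]
        if version is None:
            return versions[-1]          # locate it: most recent
        if not 1 <= version <= len(versions):
            raise KeyError(f"{ro_id} has no version {version}")
        return versions[version - 1]     # defend it: the cited snapshot
```

Keeping snapshots immutable is the design choice that matters here: a citation made against version 1 still resolves to exactly that content after version 2 is released.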

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UNIPROT): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts and data. Farr Research Object Commons.

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchanged with the NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency: Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code. Can change the definition of a figure, and ultimately the journal article.

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model: the figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution
» Comparisons, versions, forks & merges, dependencies
» Id & Citations
» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility? Re: "do again", "return to original state"

"show A is true by doing B"

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab.

"The questions don't change but the answers do" Dan Reed

The lab is not fixed. Updated resources.

Uncertainty

BioSTIF

Zhao et al., Why workflows break: Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay: materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
  › form or function
  › preserve or sustain

Jason Scott
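
One modest way to notice decay inside the reproducibility window is a fixity check: record a checksum for each local material when the experiment is run, then re-check later. The sketch below uses only the Python standard library and is an assumption-laden illustration, not the method of the cited Taverna study; the function names are hypothetical, and it covers only local files, not remote services that vanish or change behaviour.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths: Iterable[Path]) -> Dict[str, str]:
    """Snapshot the checksums of the materials at the time of the run."""
    return {str(path): sha256_of(Path(path)) for path in paths}


def check_decay(recorded: Dict[str, str]) -> Dict[str, str]:
    """Classify each recorded file as 'ok', 'changed' or 'missing'."""
    report = {}
    for name, expected in recorded.items():
        path = Path(name)
        if not path.exists():
            report[name] = "missing"
        elif sha256_of(path) != expected:
            report[name] = "changed"
        else:
            report[name] = "ok"
    return report
```

A report with "changed" or "missing" entries is the cue to prepare to repair: decide whether to preserve the original form or sustain the function with an updated resource.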

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup
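
The methods / materials / instruments / laboratory breakdown can be written down as a small record that reports which parts of an experiment description are still empty. This is a hedged sketch: the `ExperimentDescription` class, its field types and the example values are assumptions made for illustration, not a schema from the talk.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ExperimentDescription:
    """The four parts of a computational experiment (hypothetical schema)."""
    methods: List[str] = field(default_factory=list)            # techniques, algorithms, steps
    materials: Dict[str, str] = field(default_factory=dict)     # datasets, parameters, seeds
    instruments: Dict[str, str] = field(default_factory=dict)   # codes, scripts, libraries -> version
    laboratory: Dict[str, str] = field(default_factory=dict)    # OS, hardware, platforms

    def missing_parts(self) -> List[str]:
        """Names of the parts that have not been described at all."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialise the description so it can travel with the results."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


description = ExperimentDescription(
    methods=["normalise counts", "fit model"],
    materials={"dataset": "counts.csv", "seed": "42"},
    instruments={"analysis.py": "1.3", "numpy": "1.9"},
)
print(description.missing_parts())  # the laboratory has not been described yet
```

An empty `missing_parts()` list says nothing about quality; it only shows that each of the four parts has at least been mentioned.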

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency: dependencies, steps, features, provenance trace

portability

robustness

preservation

access: available

description: intelligible

standards: common APIs

licensing

standards: common metadata

change management: versioning

packaging

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance: Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading: Archived Record. Retain the Process/Code.

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running: Active Instrument. Retain the bits.

service

Science as a Service

Integrative frameworks

Open Source

Workflows / Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows / makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Study

standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop 2013

Platform profiles

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
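
The container-plus-manifest idea can be sketched with the Python standard library: a ZIP archive whose first entry is an uncompressed `mimetype` file, followed by a JSON manifest listing what the bundle aggregates. The media type, the `.ro/manifest.json` path and the manifest keys below follow my reading of the RO Bundle specification and should be checked against it before use; `build_bundle` and `read_manifest` are hypothetical helper names, and real manifests carry more (context, annotations, provenance).

```python
import io
import json
import zipfile
from typing import Dict

# Assumed media type and manifest location; verify against the bundle spec.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"


def build_bundle(resources: Dict[str, bytes], created_by: str) -> bytes:
    """Pack resources and a minimal manifest into one ZIP container."""
    manifest = {
        "id": "/",
        "createdBy": {"name": created_by},
        "aggregates": [{"uri": "/" + name} for name in sorted(resources)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # UCF-style containers put an uncompressed mimetype entry first.
        archive.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        archive.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2),
                         compress_type=zipfile.ZIP_DEFLATED)
        for name in sorted(resources):
            archive.writestr(name, resources[name],
                             compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def read_manifest(bundle: bytes) -> dict:
    """Load the manifest back out of a bundle built above."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        return json.loads(archive.read(MANIFEST_PATH))
```

Because the manifest travels inside the container, a platform that only understands ZIP can still unpack the files, while a platform that understands the manifest can recover what the parts are and who assembled them.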

Retro-Fitted ROs using off-the-shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public
» Tribal behaviours
  › Modellers share more than Experimentalists
  › Experimentalists reuse models more than Modellers
» Trading behaviours
  › Collaboration: complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > "my gang" management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code, my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices
  › Never say never
» "Citable" not "Shared"
» Feedback
  › Guilt tripping
  › Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

"ResearchBitCoin"

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, disciplines and seniority.

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions?, J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles, 24 journals had citation policy

BUT……

two years' time when the paper is written
reviewers want additional work
statistician wants more runs
analysis may need to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester's Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

CaroleAnneGoble

Process at ScaleMore on Models

httpsdoiorg1015490seek1investigation56

[Snoep 2015]

httpsdoiorg1015490seek1investigation56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

» Gangs share, but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data


me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014. The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles, 24 journals had citation policy

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

CaroleAnneGoble

[Snoep 2015]

https://doi.org/10.15490/seek.1.investigation.56

Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design, protocols, samples, software, models…

http://www.seek4science.org http://www.fair-dom.org http://isatools.org

Pop-Up Start Ups

Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It's all about The Trust

Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case

Research Objects

Compound, Interconnected Investigations, Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite, resolve, steward

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)
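The difference between defending a snapshot and locating the most recent version can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class name `ResearchObjectRecord`, the `identifier@version` string form and the `ro:example` identifier are all hypothetical, and no real resolver service is implied.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResearchObjectRecord:
    """All released versions of one Research Object, oldest first (toy model)."""
    identifier: str
    versions: List[str] = field(default_factory=list)

    def release(self, version: str) -> str:
        """Record a new release and return its citable, versioned identifier."""
        self.versions.append(version)
        return f"{self.identifier}@{version}"

    def resolve(self, version: Optional[str] = None) -> str:
        """Return the most recent release, or a fixed snapshot if one is named."""
        if not self.versions:
            raise LookupError(f"{self.identifier} has no released versions")
        if version is None:
            # "Locate it": follow the identifier to whatever is newest.
            return f"{self.identifier}@{self.versions[-1]}"
        if version not in self.versions:
            raise LookupError(f"{self.identifier} has no version {version}")
        # "Defend it": pin the exact snapshot that a paper relied on.
        return f"{self.identifier}@{version}"


if __name__ == "__main__":
    ro = ResearchObjectRecord("ro:example")
    ro.release("v1")
    ro.release("v2")
    print(ro.resolve())      # most recent release
    print(ro.resolve("v1"))  # pinned snapshot
```

A citation would normally carry the pinned form, while a link for browsing would use the unversioned identifier.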

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UniProt): evolving

Goble, De Roure, Bechhofer. Accelerating Knowledge Turns. I3CK 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data

Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure, and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab

“The questions don’t change but the answers do” Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al. Why workflows break - Understanding and combating decay in Taverna workflows. 8th Intl Conf e-Science, 2012

2. Instruments Break, Labs Decay: materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
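One way to make the Methods, Materials, Instruments and Laboratory layers concrete is to record each as a field and ask which of them differ between an original run and a later rerun. The Python sketch below does only that. The class name `ExperimentSetup`, the helper `changed_layers` and the example values are illustrative assumptions, not part of any Research Object specification.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ExperimentSetup:
    """A flat description of one computational experiment (illustrative only)."""
    methods: str      # techniques, algorithms, spec of the steps
    materials: str    # datasets, parameters, algorithm seeds
    instruments: str  # codes, services, scripts, underlying libraries
    laboratory: str   # sw and hw infrastructure, systems software


def changed_layers(original: ExperimentSetup, rerun: ExperimentSetup) -> List[str]:
    """Name the layers whose description differs between two runs."""
    return [
        f.name
        for f in fields(ExperimentSetup)
        if getattr(original, f.name) != getattr(rerun, f.name)
    ]


if __name__ == "__main__":
    first = ExperimentSetup(
        methods="align then count",
        materials="dataset v1, seed 42",
        instruments="analysis script 1.0",
        laboratory="lab workstation",
    )
    later = ExperimentSetup(
        methods="align then count",
        materials="dataset v1, seed 42",
        instruments="analysis script 1.0",
        laboratory="cloud virtual machine",
    )
    print(changed_layers(first, later))
```

An empty result means the rerun kept every layer fixed; any named layer tells a reader what was varied, whether on purpose or through decay.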

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency, dependencies

steps, features, provenance trace

portability

robustness

preservation

access, available

description, intelligible

standards, common APIs

licensing

standards, common metadata

change management, versioning

packaging

Machine actionable

Reproducibility Framework

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al. LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance. IPAW 2014, 84-96

Reproduce by Reading: Archived Record, Retain the Process/Code

The IT Crowd, Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running: Active Instrument, Retain the bits

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles



Personal Data

Local Stores

External

Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

httpwwwseek4scienceorg httpwwwfair-domorg httpisatoolsorg

Pop-Up Start UpsLittle Science within Big Science

How do Scientists Collaborate amp Cooperatively ExchangeCautiously Its all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together Socially Its all about The Trust

Jam today Jam tomorrow Jam for all Just enough Jam Just in Time not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab

“The questions don’t change but the answers do” - Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al, Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay: materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
  › form or function
  › preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
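
The four-part breakdown above (methods, materials, instruments, laboratory) can be read as a checklist of what a computational experiment needs to carry with it. The following is a minimal sketch in Python and is not part of the talk: the class name `ExperimentRecord`, its fields, the `missing_parts` helper and the sample values are all hypothetical, chosen only to illustrate the idea.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExperimentRecord:
    """Hypothetical record mirroring the slide's four categories."""

    methods: list[str] = field(default_factory=list)      # techniques, algorithms, spec of the steps
    materials: list[str] = field(default_factory=list)    # datasets, parameters, algorithm seeds
    instruments: list[str] = field(default_factory=list)  # codes, services, scripts, libraries
    laboratory: list[str] = field(default_factory=list)   # sw/hw infrastructure, platforms

    def missing_parts(self) -> list[str]:
        """Return the names of the categories that have nothing recorded."""
        return [name for name, items in vars(self).items() if not items]


# Illustrative values only; none of these come from the talk.
record = ExperimentRecord(
    methods=["alignment workflow, steps 1-4"],
    materials=["cohort.csv", "seed=42"],
    instruments=["analysis.py"],
)
print(record.missing_parts())  # ['laboratory']
```

A record like this makes the gap visible: here the laboratory (the infrastructure the scripts ran on) was never written down, which is the part most likely to have changed by the time someone tries to rerun the work.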

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency, dependencies

steps, features, provenance trace

portability

robustness

preservation

access, available

description, intelligible

standards, common APIs

licensing

standards, common metadata

change management, versioning

packaging

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance - Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al, LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading: Archived Record. Retain the Process/Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running: Active Instrument. Retain the bits

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles


Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource


standards

Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J Zhao, G Klyne, M Gamble, CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al, COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
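
The container-plus-manifest idea on this slide can be sketched in a few lines of Python using only the standard library. This is a simplified illustration under stated assumptions, not a conformant implementation of the Research Object Bundle or OMEX formats: the media type string, the `.ro/manifest.json` location and the `aggregates` key are written from memory of the bundle specification and should be checked against it, and the function name `build_bundle` and the sample file names are hypothetical.

```python
from __future__ import annotations

import io
import json
import zipfile

# Assumption: media type and manifest path as recalled from the RO Bundle
# specification; verify both against the specification before relying on them.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"


def build_bundle(files: dict[str, bytes]) -> bytes:
    """Pack the given files plus a manifest that lists them into an in-memory ZIP."""
    manifest = {
        "id": "/",
        "aggregates": [{"uri": "/" + name} for name in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        # Container convention: the media type entry comes first, uncompressed.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        bundle.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
        for name in sorted(files):
            bundle.writestr(name, files[name])
    return buffer.getvalue()


# Hypothetical contents: a workflow definition and one input file.
data = build_bundle({
    "workflow.t2flow": b"<workflow/>",
    "inputs/sample.csv": b"a,b\n1,2\n",
})
with zipfile.ZipFile(io.BytesIO(data)) as bundle:
    print(bundle.namelist())
    print(json.loads(bundle.read(MANIFEST_PATH))["aggregates"])
```

The point of the manifest is that a consumer can discover what the bundle aggregates without guessing from file names, which is what lets the same package move between platforms.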

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check


Tribal Behaviour

» Gangs share but not with the public
» Tribal behaviours
  › Modellers share more than Experimentalists
  › Experimentalists reuse models more than Modellers
» Trading behaviours
  › Collaboration - complementarity correlations
» Structured consortia less likely to publicly share than individuals
» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management
» Shameless exploitation of head teacher (PI) competitiveness & vanity
» Class captains (prefects)
» Get the cool kids on board
» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data


me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting
» Reciprocity norms
» Hans W request
» Dowry phenomenon
» Private installations
» Private spaces on shared installations
» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn’t work out all the details

I didn’t actually write the code - my student did

My competitors would be unfair to me

It’s valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices
  › Never say never
» “Citable” not “Shared”
» Feedback
  › Guilt tripping
  › Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable

78 credit

37 formal citation

5 actual version

90 Bio articles

24 journals had citation policy

BUT……

two years’ time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

Aggregated Commons Infrastructure

Consistent Comparative Reporting

Design protocols samples software modelshellip

http://www.seek4science.org, http://www.fair-dom.org, http://isatools.org

Pop-Up Start Ups

Little Science within Big Science

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It’s all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It’s all about The Trust

Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content
• closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• cite, resolve, steward

Contributions
• multi-typed, stewarded, sited, authored
• span research, researchers, platforms, time
• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)
› Locate it (most recent)
› Reuse it (a version, a component)
› Credit it (contributory authorship)
› Cross link it (connections)
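
The difference between defending a snapshot and locating the most recent version can be made concrete with a small sketch. This is an illustration only, assuming a simple in-memory list of releases: the class `ResearchObjectHistory`, its method names and the `ro:example-study` identifier are hypothetical and do not come from any Research Object tooling.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResearchObjectHistory:
    """Hypothetical in-memory history of one Research Object's releases."""

    identifier: str
    snapshots: list[str] = field(default_factory=list)  # release labels, oldest first

    def release(self, label: str) -> str:
        """Record a new frozen snapshot and return its citable identifier."""
        self.snapshots.append(label)
        return f"{self.identifier}/{label}"

    def defend(self, label: str) -> str:
        """Resolve to the exact snapshot that backed a published claim."""
        if label not in self.snapshots:
            raise KeyError(f"no such snapshot: {label}")
        return f"{self.identifier}/{label}"

    def locate(self) -> str:
        """Resolve to the most recent release."""
        if not self.snapshots:
            raise LookupError("nothing released yet")
        return f"{self.identifier}/{self.snapshots[-1]}"


history = ResearchObjectHistory("ro:example-study")
history.release("v1")
history.release("v2")
print(history.defend("v1"))  # ro:example-study/v1
print(history.locate())      # ro:example-study/v2
```

A citation that has to be defended points at a fixed label, while a reader who wants current knowledge follows `locate()`; keeping both resolvable is what the stable-versus-evolving distinction asks for.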

Biological Study Records (eg PRIDE): stable

Biological Knowledge (eg UNIPROT): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013


means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

CaroleAnneGoble

How do Scientists Collaborate & Cooperatively Exchange? Cautiously. It's all about The Trust

Extrinsic Driver

How do you get Scientists and Developers to work together? Socially. It's all about The Trust

Jam today, Jam tomorrow, Jam for all. Just enough Jam. Just in Time, not Just in Case

Research Objects

Compound Interconnected Investigations Research Products

Multi-various Products, Platforms, Resources

Units of exchange, commons, contextual metadata

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments

Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite, resolve, steward

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)
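
The resolution and citation behaviours listed above can be sketched in a few lines. This is only an illustration, assuming a Research Object is a stable identifier with an ordered history of immutable releases; the class names, the `ro:example` identifier and the contributor names are hypothetical and do not come from any Research Object specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Release:
    """One immutable snapshot of a Research Object (hypothetical model)."""
    version: int
    contributors: Tuple[str, ...]


@dataclass
class ResearchObject:
    """A citable identifier plus its ordered history of releases."""
    identifier: str
    releases: List[Release] = field(default_factory=list)

    def release(self, contributors):
        """Freeze a new snapshot; versions are numbered from 1."""
        snapshot = Release(len(self.releases) + 1, tuple(contributors))
        self.releases.append(snapshot)
        return snapshot

    def resolve(self, version: Optional[int] = None) -> Release:
        """No version: locate the most recent release.
        A version: defend or reuse that exact snapshot."""
        if not self.releases:
            raise LookupError(f"{self.identifier} has no releases")
        if version is None:
            return self.releases[-1]
        for snapshot in self.releases:
            if snapshot.version == version:
                return snapshot
        raise LookupError(f"{self.identifier} has no version {version}")

    def credit(self) -> List[str]:
        """Everyone who contributed to any release, in first-seen order."""
        seen: List[str] = []
        for snapshot in self.releases:
            for name in snapshot.contributors:
                if name not in seen:
                    seen.append(name)
        return seen


ro = ResearchObject("ro:example")          # hypothetical identifier
ro.release(["Alice"])                      # version 1
ro.release(["Alice", "Bob"])               # version 2
latest = ro.resolve()                      # "locate it"
cited = ro.resolve(1)                      # "defend it"
```

Citing a version number pins the snapshot a paper relied on, while resolving without a version follows the object as it evolves.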

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UNIPROT): evolving

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data

Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts, exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures

versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify

[Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab

“The questions don’t change but the answers do”

Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
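
The four parts named on this slide (methods, materials, instruments, laboratory) can be read as a completeness checklist for a computational experiment. The sketch below is a rough illustration under that reading: the dictionary layout, the key names and the example entries are all assumptions made for the example, not a format defined by the talk or by the Research Object specifications.

```python
# Hypothetical checklist keys, taken from the slide's four headings.
REQUIRED_PARTS = ("methods", "materials", "instruments", "laboratory")


def missing_parts(research_object):
    """Return the checklist parts that are absent or empty, in checklist order."""
    return [part for part in REQUIRED_PARTS if not research_object.get(part)]


# Illustrative description of one experiment; every entry is made up.
example = {
    "methods": ["steps.md"],                       # techniques, algorithms, spec of the steps
    "materials": ["input.csv", "params.json"],     # datasets, parameters, algorithm seeds
    "instruments": ["analyse.py"],                 # codes, services, scripts, libraries
    "laboratory": [],                              # infrastructure not recorded yet
}

gaps = missing_parts(example)   # the parts still to capture before a rerun is plausible
```

An empty result does not show that the work is reproducible; it only shows that each kind of ingredient has been recorded somewhere.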

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency

dependencies

steps, features

provenance trace

portability

robustness

preservation

access

available

description

intelligible

standards

common APIs

licensing

standards

common metadata

change management

versioning

packaging

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance - Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96

Reproduce by Reading

Archived Record

Retain the Process/Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards

Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J Zhao, G Klyne, M Gamble, CA Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
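
The "Container + Manifest" idea behind bundle-style formats can be illustrated with the Python standard library alone. This is a minimal sketch, assuming a ZIP file as the container and a JSON manifest that lists each aggregated resource with a checksum. The manifest file name, its field names and the example file paths are hypothetical; this is not the RO Bundle or OMEX layout, which the specifications linked above define.

```python
import hashlib
import io
import json
import zipfile

MANIFEST_NAME = "manifest.json"  # hypothetical name, not taken from the RO Bundle spec


def build_bundle(files):
    """Pack {path: bytes} into a ZIP container with a manifest listing each resource."""
    manifest = {
        "aggregates": [
            {"path": path, "sha256": hashlib.sha256(data).hexdigest()}
            for path, data in sorted(files.items())
        ]
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MANIFEST_NAME, json.dumps(manifest, indent=2))
        for path, data in sorted(files.items()):
            archive.writestr(path, data)
    return buffer.getvalue()


def read_manifest(bundle):
    """Load the manifest back out of a bundle held in memory."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        return json.loads(archive.read(MANIFEST_NAME))


def verify_bundle(bundle):
    """Return the paths whose content no longer matches the manifest checksum."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        manifest = json.loads(archive.read(MANIFEST_NAME))
        return [
            entry["path"]
            for entry in manifest["aggregates"]
            if hashlib.sha256(archive.read(entry["path"])).hexdigest() != entry["sha256"]
        ]


bundle = build_bundle({
    "data/input.csv": b"a,b\n1,2\n",           # made-up example content
    "scripts/analyse.py": b"print('hi')\n",
})
```

Keeping the manifest inside the container means the description travels with the content, so a recipient can check what was meant to be there before trying to rerun anything.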

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration - complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Research Objects

Compound Interconnected Investigations Research Products

Multi-variousProductsPlatformsResources

Units of exchange commons contextual metadata

httpwwwresearchobjectorg

httpwwwresearchobjectorg

First class citizens - data software methods - id manage credit track profile focus

A Framework to Bundle and Link (scattered) resources related experiments Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Contentbull closed lt-gt open bull local lt-gt alienbull embed lt-gt referbull fixed lt-gt fluidbull nestedbull cite resolve steward

Contributionsbull multi ndashtyped stewarded

sited authoredbull span research researchers

platforms timebull cite resolve steward

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

» Gangs share, but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology, in review

87 software findable; 78 credit; 37 formal citation; 5 actual version

90 Bio articles; 24 journals had citation policy

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester's Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

http://www.researchobject.org

First class citizens - data, software, methods - id, manage, credit, track, profile, focus

A Framework to Bundle and Link (scattered) resources, related experiments

Metadata Objects that carry Research Context

Research Objects

Bigger on the inside than the outside

Content

• closed <-> open

• local <-> alien

• embed <-> refer

• fixed <-> fluid

• nested

• cite, resolve, steward

Contributions

• multi-typed, stewarded, sited, authored

• span research, researchers, platforms, time

• cite, resolve, steward

Identity + Minimal Provenance

RO Resolution and Citation

› Defend it (snapshot)

› Locate it (most recent)

› Reuse it (a version, a component)

› Credit it (contributory authorship)

› Cross link it (connections)

Biological Study Records (e.g. PRIDE): stable

Biological Knowledge (e.g. UNIPROT): evolving

Goble, De Roure, Bechhofer. Accelerating Knowledge Turns. IC3K 2013
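The resolution and citation actions on this slide (defend a snapshot, locate the most recent release, reuse a component, credit contributors) can be illustrated with a small versioned record. This is a hypothetical sketch, not an API of any Research Object platform: the class names `ResearchObject` and `Release`, the `identifier/vN` citation form, and the example identifiers are all assumptions. Cross-linking is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    """One frozen snapshot of a research object."""
    version: int
    contributors: tuple
    components: dict


@dataclass
class ResearchObject:
    identifier: str
    releases: list = field(default_factory=list)

    def release(self, contributors, components):
        """Freeze the current state as the next numbered snapshot."""
        snapshot = Release(len(self.releases) + 1,
                           tuple(contributors), dict(components))
        self.releases.append(snapshot)
        return snapshot

    def defend(self, version):
        """Defend it: the exact snapshot behind a published claim."""
        return self.releases[version - 1]

    def locate(self):
        """Locate it: the most recent release."""
        return self.releases[-1]

    def reuse(self, version, component):
        """Reuse it: one component taken from one version."""
        return self.defend(version).components[component]

    def credit(self):
        """Credit it: every contributor across releases, in first-seen order."""
        names = []
        for snapshot in self.releases:
            for name in snapshot.contributors:
                if name not in names:
                    names.append(name)
        return names

    def cite(self, version):
        """A version-specific citation string (hypothetical format)."""
        return f"{self.identifier}/v{version}"
```

Keeping snapshots immutable while the object keeps moving mirrors the stable-record versus evolving-knowledge contrast on the slide: a citation pins a version, while resolution can still lead to the latest one.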

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data

Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on: methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R. Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab

“The questions don't change but the answers do” (Dan Reed)

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al. Why workflows break - Understanding and combating decay in Taverna workflows. 8th Intl Conf e-Science, 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec. of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
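One way to make the four layers on this slide (methods, materials, instruments, laboratory) actionable is to record each as structured data and fingerprint it, so that a later rerun can report which layer has drifted. The sketch below is an assumption-laden illustration, not a method described in the talk: the layer keys, the function names `fingerprint` and `changed_layers`, and the example values are all hypothetical.

```python
import hashlib
import json

# The four layers named on the slide, from experiment down to setup.
LAYERS = ("methods", "materials", "instruments", "laboratory")


def fingerprint(layer):
    """Hash a layer description in a key-order-independent way."""
    canonical = json.dumps(layer, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def changed_layers(original, rerun):
    """Return the names of layers whose recorded description differs."""
    return [
        name for name in LAYERS
        if fingerprint(original.get(name, {})) != fingerprint(rerun.get(name, {}))
    ]
```

For example, a rerun that keeps the same algorithm, data and seed but upgrades an underlying library would report only `instruments` as changed, which separates "the method changed" from "the lab changed".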

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency, dependencies

steps, features, provenance trace

portability

robustness

preservation

access, available

description, intelligible

standards, common APIs

licensing

standards, common metadata

change management, versioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P. et al. LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance. IPAW 2014, 84-96

Reproduce by Reading

Archived Record

Retain the Process/Code

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

serviceScience as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Identity + Minimal Provenance

RO Resolution and Citation

rsaquo Defend it (snapshot)

rsaquo Locate it (most recent)

rsaquo Reuse it (a version a component)

rsaquo Credit it (contributory authorship)

rsaquo Cross link it (connections)

Biological Study Records (eg PRIDE) stableBiological Knowledge (eg UNIPROT) evolving

Goble De Roure Bechhofer Accelerating Knowledge Turns I3CK 2013

means

ends

driver

Research Object packages codes study and metadata to exchange descriptions of clinical study cohorts statistical scripts data Farr Research Object Commons

STELAR Asthma e-Lab Study Team for Early Life Asthma Research

Platform exchange ClinicalCodesorg coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object CurrencyCohort Studies

Focus on methods models workflows scripts software data figureshellip

Research Object Pivots and Profiles

Focus on the figure F1000Research Living Figuresversioned articles in-article data manipulation

R Lawrence Force2015 Vision Award Runner Up httpf1000compostersbrowsesummary1097482

Simply data + code

Can change the definition of

a figure and ultimately the

journal article

Colomb J and Brembs B

Sub-strains of Drosophila Canton-S differ

markedly in their locomotor behavior [v1

ref status indexed httpf1000res3is]

F1000Research 2014 3176

Other labs can replicate the study or

contribute their data to a meta-

analysis or disease model - figure

automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf Treating Data Like Software A Case for Production Quality Data JCDL 2012

Software-like Release paradigm Not a static document paradigm

Reproduce looks backwards -gt Release looks forwards

raquo Science methods data change -gt agile evolution

raquo Comparisons versions forks amp merges dependencies

raquo Id amp Citations

raquo Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicatererun

repeat

re-examine

repurpose

recreate

reuse

restorereconstruct review

regeneraterevise

recycle

redo

What IS reproducibilityRe ldquodo againrdquo ldquoreturn to original staterdquo

ldquoshow A is true by doing Brdquo

verify but not falsify[Yong Nature 485 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and config / Components / Plug-ins / Versions

Workflow System Instrument

Software package

Workflow Runs / Data and configs / Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J. Zhao, G. Klyne, M. Gamble, C.A. Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
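
The container-plus-manifest idea can be sketched in a few lines of Python. This is a rough illustration, not a conforming implementation of the bundle specification: the media type, the `.ro/manifest.json` path and the manifest keys reflect one reading of the draft and should be treated as assumptions, and the `build_bundle` and `read_bundle` helpers and the file names are hypothetical.

```python
import io
import json
import zipfile

# Assumed values, based on one reading of the draft bundle specification.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"


def build_bundle(resources):
    """Return the bytes of a ZIP container holding `resources` plus a manifest.

    `resources` maps a path inside the container to its content (bytes).
    """
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(resources)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry is written first and stored uncompressed.
        archive.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for path in sorted(resources):
            archive.writestr(path, resources[path])
        archive.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
    return buffer.getvalue()


def read_bundle(data):
    """Return (entry names, parsed manifest) for a bundle built above."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
        manifest = json.loads(archive.read(MANIFEST_PATH))
    return names, manifest


# Illustrative content only.
bundle = build_bundle({
    "workflow.t2flow": b"<workflow/>",
    "inputs/sample.csv": b"a,b\n1,2\n",
})
names, manifest = read_bundle(bundle)
print(names)
```

The manifest is what turns a plain archive into something a tool can inspect without unpacking and guessing: it lists what is aggregated, and richer versions can attach provenance and annotations to each entry.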

Retro-Fitted ROs using off-the-shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested/find it useful/be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable / 78 credit / 37 formal citation / 5 actual version

90 Bio articles / 24 journals had citation policy

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K 2013

means

ends

driver

Research Object packages codes, study and metadata to exchange descriptions of clinical study cohorts, statistical scripts, data

Farr Research Object Commons

STELAR Asthma e-Lab: Study Team for Early Life Asthma Research

Platform exchange: ClinicalCodes.org coded patient cohorts exchange with NHS FARSITE system

STELAR e-Lab

Platform 1

Platform 2

Platform 3

A multi-site collaboration to support safe use of patient and research data for medical research

Research Object Currency

Cohort Studies

Focus on methods, models, workflows, scripts, software, data, figures…

Research Object Pivots and Profiles

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R. Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config / Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227

1. Science Changes. So does the Lab.

“The questions don’t change but the answers do” (Dan Reed)

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al., Why workflows break: Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

  › Modellers share more than Experimentalists

  › Experimentalists reuse models more than Modellers

» Trading behaviours

  › Collaboration - complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code - my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky: reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

  › Never say never

» “Citable” not “Shared”

» Feedback

  › Guilt tripping

  › Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, disciplines and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles, 24 journals had citation policy

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

Focus on the figure: F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482

Simply data + code

Can change the definition of a figure and ultimately the journal article

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study or contribute their data to a meta-analysis or disease model - figure automatically updates

Data updates time-stamped

New conclusions added via versions

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Software-like Release paradigm, not a static document paradigm

Reproduce looks backwards -> Release looks forwards

» Science, methods, data change -> agile evolution

» Comparisons, versions, forks & merges, dependencies

» Id & Citations

» Interlinked ROs

[McEntyre]

Retrospective Release Research Object

The ROs Meme

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify [Yong, Nature 485, 2012]

robustness tolerance

verification compliance

validation assurance

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011: 1226-1227

1. Science Changes. So does the Lab

“The questions don’t change but the answers do” (Dan Reed)

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al. Why workflows break - Understanding and combating decay in Taverna workflows. 8th Intl Conf e-Science 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

  › form or function

  › preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011: 1226-1227
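One way to make the methods / materials / instruments / laboratory split concrete is to record each layer explicitly, so that a rerun can be compared against the original layer by layer. The sketch below is an assumption-laden illustration: `ExperimentRecord`, `changed_layers` and every example value are hypothetical and not part of any Research Object specification.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace

LAYERS = ("methods", "materials", "instruments", "laboratory")


@dataclass(frozen=True)
class ExperimentRecord:
    """Hypothetical record of the four layers needed to rerun a computation."""
    methods: tuple      # techniques, algorithms, spec of the steps
    materials: dict     # datasets, parameters, algorithm seeds
    instruments: dict   # codes, services, scripts, underlying libraries
    laboratory: dict    # sw and hw infrastructure, systems software, platforms

    def fingerprint(self):
        """Hash a canonical JSON form so identical setups compare equal."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_layers(original, rerun):
    """Name the layers that differ between two records."""
    return [name for name in LAYERS if getattr(original, name) != getattr(rerun, name)]


if __name__ == "__main__":
    original = ExperimentRecord(
        methods=("align", "plot"),
        materials={"dataset": "raw.fasta", "seed": 42},
        instruments={"aligner": "1.0"},
        laboratory={"os": "linux"},
    )
    # Same experiment, but one instrument has been upgraded since publication.
    rerun = replace(original, instruments={"aligner": "1.1"})
    print(changed_layers(original, rerun))
```

Separating the layers this way lets a changed result be traced to a changed instrument or laboratory rather than to the method itself, which is the distinction the slide draws between the experiment and its setup.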

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

transparency

dependencies

steps

features

provenance trace

portability

robustness

preservation

access: available

description: intelligible

standards: common APIs

licensing

standards: common metadata

change management: versioning

packaging

Machine actionable

Reproducibility Framework

submit article and move on…

The ROs Meme

recompute

replicate

rerun

repeat

re-examine

repurpose

recreate

reuse

restore

reconstruct

review

regenerate

revise

recycle

redo

What IS reproducibility?

Re: “do again”, “return to original state”

“show A is true by doing B”

verify but not falsify

[Yong, Nature 485, 2012]

robustness, tolerance

verification, compliance

validation, assurance

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227

1. Science Changes. So does the Lab.

“The questions don’t change but the answers do”

Dan Reed

The lab is not fixed

Updated resources

Uncertainty

BioSTIF

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012
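One way to notice that "the lab is not fixed" is to record a checksum for every external resource when a workflow runs, then compare against what is available later. The sketch below is only an illustration of that idea, not the method of Zhao et al. or a feature of Taverna; the helper names (`fingerprint`, `record_resources`, `find_decayed`) and the example resources are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest identifying one version of a resource."""
    return hashlib.sha256(content).hexdigest()


def record_resources(resources: dict[str, bytes]) -> dict[str, str]:
    """Snapshot the checksum of every resource a workflow depends on."""
    return {name: fingerprint(data) for name, data in resources.items()}


def find_decayed(recorded: dict[str, str], current: dict[str, bytes]) -> dict[str, str]:
    """Compare a recorded snapshot with what is available now.

    Returns a mapping of resource name to "missing" or "changed".
    """
    report = {}
    for name, digest in recorded.items():
        if name not in current:
            report[name] = "missing"
        elif fingerprint(current[name]) != digest:
            report[name] = "changed"
    return report


# Hypothetical example: the state of the lab when the workflow first ran...
at_run_time = {"genes.csv": b"BRCA1,BRCA2", "blast_service.wsdl": b"v1"}
snapshot = record_resources(at_run_time)

# ...and some time later: the dataset was revised and the service is gone.
later = {"genes.csv": b"BRCA1,BRCA2,TP53"}
decay_report = find_decayed(snapshot, later)
print(decay_report)
```

A report like this only says that something changed; deciding whether the change matters for the result still needs the researcher.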

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes

» Proprietary Licenses

» Clown services

» Partial replication

» Prepare to Repair

› form or function

› preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config, Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011, 1226-1227
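The four categories on this slide can be read as the fields of a simple record. The following is a minimal sketch under that reading, not the Research Object model itself. The class and method names are hypothetical, and the grouping of Methods and Materials under "Experiment" and of Instruments and Laboratory under "Setup" is an assumption about the slide layout.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchObject:
    """Groups the parts of a computational experiment, following the slide."""

    methods: list[str] = field(default_factory=list)      # techniques, algorithms, spec of the steps
    materials: list[str] = field(default_factory=list)    # datasets, parameters, algorithm seeds
    instruments: list[str] = field(default_factory=list)  # codes, services, scripts, libraries
    laboratory: list[str] = field(default_factory=list)   # sw/hw infrastructure, platforms

    def experiment(self) -> list[str]:
        """Assumed grouping: the experiment is its methods plus its materials."""
        return self.methods + self.materials

    def setup(self) -> list[str]:
        """Assumed grouping: the setup is the instruments plus the laboratory."""
        return self.instruments + self.laboratory

    def missing_parts(self) -> list[str]:
        """Name every category that has nothing recorded in it."""
        parts = {
            "methods": self.methods,
            "materials": self.materials,
            "instruments": self.instruments,
            "laboratory": self.laboratory,
        }
        return [name for name, items in parts.items() if not items]


# Hypothetical example: the laboratory was never described.
ro = ResearchObject(
    methods=["alignment workflow specification"],
    materials=["genes.csv", "random seed 42"],
    instruments=["align.py", "BLAST web service"],
)
print(ro.missing_parts())
```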

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

Reproducibility Framework

transparency

dependencies

steps, features

provenance trace

portability

robustness

preservation

access, available

description, intelligible

standards, common APIs

licensing

standards, common metadata

change management, versioning

packaging

Machine actionable

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P. et al., LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, IPAW 2014, 84-96
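As a rough illustration of going from a thick trace to a distilled report, the sketch below walks a recorded trace backwards from one output and lists only the steps that contributed to it. It is not the LabelFlow approach; the trace format, the step names and the `lineage` helper are all hypothetical.

```python
# Hypothetical thick trace: one record per step execution.
trace = [
    {"step": "fetch_sequences", "output": "seqs.fasta", "used": ["ids.txt"]},
    {"step": "align", "output": "aligned.aln", "used": ["seqs.fasta"]},
    {"step": "plot_coverage", "output": "coverage.png", "used": ["seqs.fasta"]},
    {"step": "build_tree", "output": "tree.nwk", "used": ["aligned.aln"]},
]


def lineage(trace: list[dict], artefact: str) -> list[str]:
    """Return the names of the steps that contributed to an artefact, nearest first."""
    produced_by = {record["output"]: record for record in trace}
    steps = []
    seen = set()
    pending = [artefact]
    while pending:
        name = pending.pop()
        if name in seen:
            continue
        seen.add(name)
        record = produced_by.get(name)
        if record is None:
            continue  # a raw input with no recorded producer
        steps.append(record["step"])
        pending.extend(record["used"])
    return steps


# Distilled report for one result: steps unrelated to it are left out.
tree_steps = lineage(trace, "tree.nwk")
print("tree.nwk <- " + " <- ".join(tree_steps))
```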

Reproduce by Reading

Archived Record: Retain the Process / Code

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument: Retain the bits

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards

Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J. Zhao, G. Klyne, M. Gamble, C.A. Goble, A Checklist-Based Approach for Quality Assessment of Scientific Information, Proceedings of the Third Linked Science Workshop, 2013
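A checklist-based assessment can be pictured as a list of requirements, each with a level and a test, evaluated against a research object's metadata. The sketch below shows only that general shape; it is not the model from the cited paper. The requirement wording, the MUST/SHOULD levels, the metadata keys and the `assess` function are assumptions made for illustration.

```python
from typing import Callable

# A checklist item: (level, description, test over the research object's metadata).
ChecklistItem = tuple[str, str, Callable[[dict], bool]]

# Hypothetical checklist for a workflow-centred research object.
CHECKLIST: list[ChecklistItem] = [
    ("MUST", "has at least one input dataset", lambda ro: len(ro.get("inputs", [])) > 0),
    ("MUST", "names the software version", lambda ro: bool(ro.get("software_version"))),
    ("SHOULD", "states a licence", lambda ro: bool(ro.get("licence"))),
]


def assess(ro: dict, checklist: list[ChecklistItem]) -> dict:
    """Evaluate every item; the object passes if no MUST item fails."""
    failed = [(level, text) for level, text, test in checklist if not test(ro)]
    return {
        "passes": not any(level == "MUST" for level, _ in failed),
        "failed": failed,
    }


# Hypothetical metadata: complete enough to pass, but no licence recorded.
metadata = {"inputs": ["genes.csv"], "software_version": "2.4"}
result = assess(metadata, CHECKLIST)
print(result)
```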

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergman et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
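The container-plus-manifest idea behind both formats can be sketched as a ZIP file that carries the payload alongside a manifest listing what it aggregates. This is a simplified illustration, not a conforming implementation of either format. The media type string, the `.ro/manifest.json` path and the manifest keys are assumptions to be checked against the bundle specification linked above, and `build_bundle` and the example file names are hypothetical.

```python
import io
import json
import zipfile

# Assumed media type and manifest location; verify against the specification.
BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"


def build_bundle(files: dict[str, bytes]) -> bytes:
    """Pack files plus a manifest listing them into one ZIP container."""
    manifest = {
        "id": "/",
        "aggregates": [{"uri": "/" + name} for name in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        # The media type entry is written first and stored uncompressed.
        bundle.writestr("mimetype", BUNDLE_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        bundle.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2),
                        compress_type=zipfile.ZIP_DEFLATED)
        for name in sorted(files):
            bundle.writestr(name, files[name], compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


# Hypothetical payload: a workflow definition and one input file.
payload = build_bundle({
    "workflow.t2flow": b"<workflow/>",
    "inputs/genes.csv": b"BRCA1,BRCA2",
})

# Reading it back: the manifest says what the container holds.
with zipfile.ZipFile(io.BytesIO(payload)) as bundle:
    names = bundle.namelist()
    manifest = json.loads(bundle.read(MANIFEST_PATH))
print(names)
```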

Retro-Fitted ROs using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share, but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April 2013, SIAM News

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices

› Never say never

» “Citable” not “Shared”

» Feedback

› Guilt tripping

› Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, discipline and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles

24 journals had citation policy

BUT……

two years time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

1 Science Changes So does the Lab

ldquoThe questions donrsquot

change but the

answers dordquoDan Reed

The lab is not fixedUpdated resources

UncertaintyBioSTIF

Zhao et al Why workflows break - Understanding and combating decay in Taverna workflows 8th Intl Conf e-Science 2012

2 Instruments Break Labs Decaymaterials become unavailable technicians leave

Reproducibility Window

raquo Bit rot Black boxes

raquo Proprietary Licenses

raquo Clown services

raquo Partial replication

raquo Prepare to Repair

rsaquo form or function

rsaquo preserve or sustain

Jason Scott

RO as Instrument Materials Method

Input Data

Software

Output Data

ConfigParameters

Methods(techniques algorithms

spec of the steps)

Materials(datasets parameters

algorithm seeds)

Experiment

Instruments(codes services scripts

underlying libraries)

Laboratory(sw and hw infrastructure

systems software

integrative platforms)

Setup

Drummond Replicability is not Reproducibility Nor is it Good Science onlinePeng Reproducible Research in Computational Science Science 2 Dec 2011 1226-1227

Research Environment

submit articleand move onhellip

publish articlePublication Environment

Research Environment

publish articlePublication Environment

submit articleand move onhellip

[Adapted Freire 2013]

transparencydependencies

steps featuresprovenance trace

portability

robustness

preservation

accessavailable

descriptionintelligible

standardscommon APIs

licensing

standardscommon

metadata

change managementversioning

packaging

Machine actionable

Machine actionable

Reproducibility Framework

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneaky

reduce the friction

instrumentation

span RARE and FAIR

Optimising The Neylon Equation

Interface Framing

» Limited scheduled sharing choices
  › Never say never
» “Citable” not “Shared”
» Feedback
  › Guilt tripping
  › Outlier finger pointing

[Garza]

Auto-magical end-to-end Instrumentation

https://www.youtube.com/watch?v=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ≠ Authorship

Research Currencies

“ResearchBitCoin”

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities, conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, disciplines and seniority.

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014. The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J. Assoc. for Info. Science and Technology. In review.

87% software findable; 78% credit; 37% formal citation; 5% actual version

90 Bio articles; 24 journals had citation policy
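One low-friction way to make the "actual version" visible is to record it automatically when an analysis runs. The following is a minimal sketch and not something shown in the talk: the function name `software_snapshot` and the shape of the returned dictionary are assumptions chosen for illustration, and it uses only the Python standard library (`platform` and `importlib.metadata`).

```python
import platform
from importlib import metadata


def software_snapshot(packages):
    """Collect interpreter, platform, and installed package versions.

    `packages` is a list of distribution names chosen by the caller
    (hypothetical input, not prescribed by the slides).
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed: record the gap rather than guess a version.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }
```

A snapshot like this could be written next to the results as JSON, so that a later citation can name the exact versions that were used.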

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

The RO & Reproducibility Challenge

• Incremental shift for infrastructure providers
• Moderate shift for policy makers and stewards
• Paradigm shift for researchers and their institutions

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

Zhao et al. Why workflows break: Understanding and combating decay in Taverna workflows. 8th Intl. Conf. on e-Science, 2012

2. Instruments Break, Labs Decay

materials become unavailable, technicians leave

Reproducibility Window

» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
  › form or function
  › preserve or sustain

Jason Scott

RO as Instrument, Materials, Method

Input Data

Software

Output Data

Config / Parameters

Methods (techniques, algorithms, spec of the steps)

Materials (datasets, parameters, algorithm seeds)

Experiment

Instruments (codes, services, scripts, underlying libraries)

Laboratory (sw and hw infrastructure, systems software, integrative platforms)

Setup

Drummond, Replicability is not Reproducibility: Nor is it Good Science, online

Peng, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227
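The four headings on this slide can be read as the fields of a record describing a computational experiment. The sketch below is only an illustration under that assumption: the class name `ExperimentRecord` and the method `missing_parts` are hypothetical and are not part of any Research Object specification.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentRecord:
    """Hypothetical record mirroring the slide's four headings."""
    methods: list = field(default_factory=list)      # techniques, algorithms, spec of the steps
    materials: list = field(default_factory=list)    # datasets, parameters, algorithm seeds
    instruments: list = field(default_factory=list)  # codes, services, scripts, underlying libraries
    laboratory: list = field(default_factory=list)   # sw and hw infrastructure, platforms

    def missing_parts(self):
        """Return the headings that have nothing recorded yet."""
        return [name for name, items in vars(self).items() if not items]


record = ExperimentRecord(
    methods=["k-means clustering, k=3"],
    materials=["samples.csv", "seed=42"],
)
print(record.missing_parts())  # ['instruments', 'laboratory']
```

Listing what is still missing is a simple way to see which parts of the experiment a reader could not yet reconstruct.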

Research Environment

submit article and move on…

publish article

Publication Environment

[Adapted Freire 2013]

Reproducibility Framework

[Diagram labels: transparency (dependencies, steps, features, provenance trace); portability; robustness; preservation; access (available); description (intelligible); standards (common APIs); licensing; standards (common metadata); change management (versioning); packaging; machine actionable]

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper P. et al. LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance. IPAW 2014: 84-96

Reproduce by Reading

Archived Record

Retain the Process/Code

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

[Diagram labels, from Portability to Transparency: Science as a Service; Integrative frameworks; Open Source; Workflows / Scripts; Virtual Machines; Portable Packaging; ReproZip; Workflows, makefiles]

Fifty Shades of Research Object

Workflow Instrument

Example data and config, Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs, Data and configs, Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards: Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

NISO-JATS

J. Zhao, G. Klyne, M. Gamble, C.A. Goble. A Checklist-Based Approach for Quality Assessment of Scientific Information. Proceedings of the Third Linked Science Workshop, 2013

Platform profiles

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle

Bergmann et al. COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics 2014, 15:369
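The "Container" plus "Manifest" idea can be sketched in a few lines: a ZIP file holds the resources, and a manifest inside it lists what has been aggregated. The example below is a simplified illustration and not a conforming implementation. The media type, the `.ro/manifest.json` path, the context URI and the `aggregates` / `uri` field names reflect one reading of the Research Object Bundle draft and should be checked against the specification; `build_bundle` and `read_manifest` are hypothetical helper names.

```python
import io
import json
import zipfile

# Assumed values, based on one reading of the RO Bundle draft; verify before relying on them.
BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"


def build_bundle(resources):
    """Pack {path: bytes} into an in-memory ZIP with a manifest listing each path."""
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(resources)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Container convention: the media type entry comes first and is stored uncompressed.
        archive.writestr("mimetype", BUNDLE_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        archive.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2),
                         compress_type=zipfile.ZIP_DEFLATED)
        for path in sorted(resources):
            archive.writestr(path, resources[path], compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def read_manifest(bundle_bytes):
    """Return the parsed manifest from a bundle produced by build_bundle."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as archive:
        return json.loads(archive.read(MANIFEST_PATH))
```

For example, `build_bundle({"data/input.csv": b"a,b\n1,2\n", "workflow/run.sh": b"echo hi\n"})` returns the bytes of a single file that carries both resources together with a manifest that names them. A real bundle would typically add annotations and provenance as well.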







Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

CaroleAnneGoble

[Adapted Freire 2013]

transparency, dependencies

steps, features, provenance trace

portability

robustness

preservation

access, available

description, intelligible

standards, common APIs

licensing

standards, common metadata

change management, versioning

packaging

Machine actionable

Reproducibility Framework

submit article and move on…

Reporting

Documentation

Provenance – Thick Trace Data to Distilled Reporting

Distillation and Summarisation

Alper, P. et al. LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance. IPAW 2014: 84-96.
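As a rough illustration of going from a thick trace to a distilled report, the sketch below collapses a list of per-invocation trace records into one summary row per workflow step. The record fields (`step`, `inputs`, `outputs`, `seconds`), the function name `distill` and the sample trace are all assumptions made for this example; they are not taken from LabelFlow or from any particular workflow system.

```python
from collections import defaultdict


def distill(trace):
    """Collapse per-invocation trace records into one summary per step."""
    summary = defaultdict(
        lambda: {"runs": 0, "inputs": set(), "outputs": set(), "seconds": 0.0}
    )
    for record in trace:
        row = summary[record["step"]]
        row["runs"] += 1
        row["inputs"].update(record["inputs"])
        row["outputs"].update(record["outputs"])
        row["seconds"] += record["seconds"]
    # Sort the sets so the distilled report is deterministic.
    return {
        step: {
            "runs": row["runs"],
            "inputs": sorted(row["inputs"]),
            "outputs": sorted(row["outputs"]),
            "seconds": row["seconds"],
        }
        for step, row in summary.items()
    }


# Hypothetical thick trace: one record per step invocation.
trace = [
    {"step": "align", "inputs": ["reads_1.fq"], "outputs": ["a1.bam"], "seconds": 12.5},
    {"step": "align", "inputs": ["reads_2.fq"], "outputs": ["a2.bam"], "seconds": 11.0},
    {"step": "call", "inputs": ["a1.bam", "a2.bam"], "outputs": ["variants.vcf"], "seconds": 3.25},
]
report = distill(trace)
```

The distilled form drops the per-invocation detail but keeps what a reader of a methods section would likely want: which steps ran, how often, and on what.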

Reproduce by Reading

Archived Record

Retain the Process/Code

The IT Crowd, Series 3, Episode 4

The eLab Virtual Machine (or Docker Image)

a black box though

docker.com

Reproduce by Running

Active Instrument

Retain the bits

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflows, makefiles

Fifty Shades of Research Object

Workflow Instrument

Example data and config

Components, Plug-ins, Versions

Workflow System Instrument

Software package

Workflow Runs

Data and configs

Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards

Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J. Zhao, G. Klyne, M. Gamble, C.A. Goble. A Checklist-Based Approach for Quality Assessment of Scientific Information. Proceedings of the Third Linked Science Workshop, 2013.
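To make the checklist idea concrete, here is a minimal sketch of checking a research object against a list of required and recommended items. The role names, the "must"/"should" levels, the file names and the function `assess` are hypothetical choices for this example and are not the vocabulary or implementation of the cited work.

```python
def assess(resources, checklist):
    """Check a research object's resources against a checklist.

    resources: dict mapping resource name -> set of roles it plays
    checklist: list of (role, level) pairs, level being "must" or "should"
    Returns (passed, missing), where passed is False only if a "must" is missing.
    """
    present = set().union(*resources.values())
    missing = [(role, level) for role, level in checklist if role not in present]
    passed = all(level != "must" for _, level in missing)
    return passed, missing


# Hypothetical checklist and research object contents.
checklist = [
    ("workflow", "must"),
    ("input-data", "must"),
    ("provenance", "should"),
    ("license", "should"),
]
research_object = {
    "analysis.t2flow": {"workflow"},
    "samples.csv": {"input-data"},
    "run-01.prov.ttl": {"provenance"},
}
passed, missing = assess(research_object, checklist)
```

Separating "must" from "should" lets the same checklist both block an incomplete object and nudge authors towards the recommended extras.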

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle

Bergmann et al. COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics 2014, 15:369.
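The container-plus-manifest pattern named on these slides can be sketched with nothing more than a ZIP file and a JSON manifest. This is an illustration only: the manifest path `.ro/manifest.json`, the `aggregates` key, the `sha256` field and the helper names are assumptions for this example, so the bundle specification linked above should be consulted for the normative layout.

```python
import hashlib
import json
import os
import tempfile
import zipfile

MANIFEST_PATH = ".ro/manifest.json"  # assumed location, see the lead-in


def write_bundle(path, files):
    """Write a ZIP container holding the given files plus a JSON manifest.

    files: dict mapping archive path -> bytes content
    """
    manifest = {
        "aggregates": [
            # The checksum field is an addition for this sketch.
            {"uri": "/" + name, "sha256": hashlib.sha256(data).hexdigest()}
            for name, data in sorted(files.items())
        ]
    }
    with zipfile.ZipFile(path, "w") as zf:
        for name, data in sorted(files.items()):
            zf.writestr(name, data)
        zf.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
    return manifest


def read_manifest(path):
    """Read the manifest back out of a bundle."""
    with zipfile.ZipFile(path) as zf:
        return json.loads(zf.read(MANIFEST_PATH))


# Hypothetical study contents, packaged in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    bundle = os.path.join(tmp, "study.bundle.zip")
    written = write_bundle(
        bundle,
        {"workflow/analysis.py": b"print('hello')\n", "data/samples.csv": b"id,value\n1,2\n"},
    )
    recovered = read_manifest(bundle)
    with zipfile.ZipFile(bundle) as zf:
        names = zf.namelist()
```

Keeping the manifest inside the container means the archive stays one self-describing file that any ZIP tool can open.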

Retro-Fitted ROs using off-the-shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bow.jamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn’t work out all the details

I didn’t actually write the code -- my student did

My competitors would be unfair to me

It’s valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J. LeVeque. Top Ten Reasons To Not Share Your Code (and why you should anyway). SIAM News, April 2013.

Victoria Stodden, AMP 2011. http://www.stodden.net/AMP2011

Drivers

love money

fame duty

fear time/effort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever teamColleagues in Manchesterrsquos Information Management Group

httpwwwresearchobjectorg

httpwwwwf4ever-projectorg

httpwwwfair-domorg

httpseek4scienceorg

httprightfieldorguk

httpwwwsoftwareacuk

httpwwwdatafairportorgAlan WilliamsJo McEntyreNorman MorrisonStian Soiland-ReyesPaul GrothTim ClarkJuliana FreireAlejandra Gonzalez-BeltranPhilippe Rocca-SerraIan CottamSusanna SansoneKristian Garza

Barend MonsSean BechhoferPhilip BourneMatthew GambleRaul PalmaJun ZhaoNeil Chue HongJosh SommerMatthias ObstJacky SnoepDavid GavaghanRebecca Lawrence

Contacthellip

Professor Carole Goble

The University of Manchester UK

carolegoblemanchesteracuk

httpssitesgooglecomsitecarolegoble

CaroleAnneGoble

submit articleand move onhellip

Reporting

Documentation

Provenance ndashThick Trace Data

to Distilled Reporting

Distillation and

Summarisation

Alper P et al LabelFlow Exploiting Workflow Provenance to Surface Scientific Data Provenance IPAW 2014 84-96

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever teamColleagues in Manchesterrsquos Information Management Group

httpwwwresearchobjectorg

httpwwwwf4ever-projectorg

httpwwwfair-domorg

httpseek4scienceorg

httprightfieldorguk

httpwwwsoftwareacuk

httpwwwdatafairportorgAlan WilliamsJo McEntyreNorman MorrisonStian Soiland-ReyesPaul GrothTim ClarkJuliana FreireAlejandra Gonzalez-BeltranPhilippe Rocca-SerraIan CottamSusanna SansoneKristian Garza

Barend MonsSean BechhoferPhilip BourneMatthew GambleRaul PalmaJun ZhaoNeil Chue HongJosh SommerMatthias ObstJacky SnoepDavid GavaghanRebecca Lawrence

Contacthellip

Professor Carole Goble

The University of Manchester UK

carolegoblemanchesteracuk

httpssitesgooglecomsitecarolegoble

CaroleAnneGoble

Reproduce by ReadingArchived Record Retain the ProcessCode

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever teamColleagues in Manchesterrsquos Information Management Group

httpwwwresearchobjectorg

httpwwwwf4ever-projectorg

httpwwwfair-domorg

httpseek4scienceorg

httprightfieldorguk

httpwwwsoftwareacuk

httpwwwdatafairportorgAlan WilliamsJo McEntyreNorman MorrisonStian Soiland-ReyesPaul GrothTim ClarkJuliana FreireAlejandra Gonzalez-BeltranPhilippe Rocca-SerraIan CottamSusanna SansoneKristian Garza

Barend MonsSean BechhoferPhilip BourneMatthew GambleRaul PalmaJun ZhaoNeil Chue HongJosh SommerMatthias ObstJacky SnoepDavid GavaghanRebecca Lawrence

Contacthellip

Professor Carole Goble

The University of Manchester UK

carolegoblemanchesteracuk

httpssitesgooglecomsitecarolegoble

CaroleAnneGoble

The IT Crowd Series 3 Episode 4

The eLab Virtual Machine (or Docker Image)

a black box thoughdockercom

Reproduce by Running Active InstrumentRetain the bits

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever teamColleagues in Manchesterrsquos Information Management Group

httpwwwresearchobjectorg

httpwwwwf4ever-projectorg

httpwwwfair-domorg

httpseek4scienceorg

httprightfieldorguk

httpwwwsoftwareacuk

httpwwwdatafairportorgAlan WilliamsJo McEntyreNorman MorrisonStian Soiland-ReyesPaul GrothTim ClarkJuliana FreireAlejandra Gonzalez-BeltranPhilippe Rocca-SerraIan CottamSusanna SansoneKristian Garza

Barend MonsSean BechhoferPhilip BourneMatthew GambleRaul PalmaJun ZhaoNeil Chue HongJosh SommerMatthias ObstJacky SnoepDavid GavaghanRebecca Lawrence

Contacthellip

Professor Carole Goble

The University of Manchester UK

carolegoblemanchesteracuk

httpssitesgooglecomsitecarolegoble

CaroleAnneGoble

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Portability

Transparency

ReproZip

Workflowsmakefiles

serviceScience as a Service

Integrative frameworks

Open Source

WorkflowsScripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever teamColleagues in Manchesterrsquos Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

ReproZip

Workflows, makefiles

service

Science as a Service

Integrative frameworks

Open Source

Workflows, Scripts

Virtual Machines

Portable Packaging

Fifty Shades of Research Object

Workflow Instrument

Example data and config; Components; Plug-ins; Versions

Workflow System Instrument

Software package

Workflow Runs; Data and configs; Provenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

standards

Adobe UCF, ORE, PROV, ODF

formats

api

Instrument

http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf

Instrument

NISO-JATS

Instrument

J. Zhao, G. Klyne, M. Gamble, C.A. Goble. A Checklist-Based Approach for Quality Assessment of Scientific Information. Proceedings of the Third Linked Science Workshop, 2013.

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle

Bergmann et al. COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics 2014, 15:369
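
The Container and Manifest labels above can be illustrated with a minimal sketch: a ZIP container holding the research files plus a JSON manifest listing what is aggregated. This is an illustration only, not the Research Object Bundle or OMEX reference implementation. The function name `build_bundle` is hypothetical, and the media type string, the `.ro/manifest.json` path and the context URL are written from memory of the Research Object Bundle specification, so they should be checked against the specification before use. OMEX archives use a different (XML) manifest and are not covered here.

```python
import io
import json
import zipfile

# Assumption: media type as recalled from the Research Object Bundle specification.
RO_BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def build_bundle(files):
    """Pack files (dict of archive path -> bytes) into an in-memory ZIP.

    The archive starts with an uncompressed 'mimetype' entry and ends with a
    JSON manifest that lists every aggregated resource.
    """
    manifest = {
        # Assumption: JSON-LD context URL as recalled from the specification.
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # A ZipInfo entry defaults to ZIP_STORED, so 'mimetype' stays uncompressed.
        archive.writestr(zipfile.ZipInfo("mimetype"), RO_BUNDLE_MIMETYPE)
        for path in sorted(files):
            archive.writestr(path, files[path], compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr(
            ".ro/manifest.json",
            json.dumps(manifest, indent=2),
            compress_type=zipfile.ZIP_DEFLATED,
        )
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical example content: one script and one data file.
    bundle = build_bundle({
        "workflow/run.py": b"print('hi')\n",
        "data/input.csv": b"a,b\n1,2\n",
    })
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        print(archive.namelist())
```

The point of the design is the one made on the slide: the container is an off-the-shelf format (ZIP), and the manifest is what turns a pile of files into something a platform can inspect without unpacking domain-specific content.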

Retro-Fitted ROs

using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

» Get the cool kids on board

» Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did

My competitors would be unfair to me

It's valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever teamColleagues in Manchesterrsquos Information Management Group

httpwwwresearchobjectorg

httpwwwwf4ever-projectorg

httpwwwfair-domorg

httpseek4scienceorg

httprightfieldorguk

httpwwwsoftwareacuk

httpwwwdatafairportorgAlan WilliamsJo McEntyreNorman MorrisonStian Soiland-ReyesPaul GrothTim ClarkJuliana FreireAlejandra Gonzalez-BeltranPhilippe Rocca-SerraIan CottamSusanna SansoneKristian Garza

Barend MonsSean BechhoferPhilip BourneMatthew GambleRaul PalmaJun ZhaoNeil Chue HongJosh SommerMatthias ObstJacky SnoepDavid GavaghanRebecca Lawrence

Contacthellip

Professor Carole Goble

The University of Manchester UK

carolegoblemanchesteracuk

httpssitesgooglecomsitecarolegoble

CaroleAnneGoble

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

Shared Repository

Personal Notebook

Community Registry

Publishing Resource

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever teamColleagues in Manchesterrsquos Information Management Group

httpwwwresearchobjectorg

httpwwwwf4ever-projectorg

httpwwwfair-domorg

httpseek4scienceorg

httprightfieldorguk

httpwwwsoftwareacuk

httpwwwdatafairportorgAlan WilliamsJo McEntyreNorman MorrisonStian Soiland-ReyesPaul GrothTim ClarkJuliana FreireAlejandra Gonzalez-BeltranPhilippe Rocca-SerraIan CottamSusanna SansoneKristian Garza

Barend MonsSean BechhoferPhilip BourneMatthew GambleRaul PalmaJun ZhaoNeil Chue HongJosh SommerMatthias ObstJacky SnoepDavid GavaghanRebecca Lawrence

Contacthellip

Professor Carole Goble

The University of Manchester UK

carolegoblemanchesteracuk

httpssitesgooglecomsitecarolegoble

CaroleAnneGoble

Fifty Shades of Research Object

Workflow Instrument

Example data and configComponentsPlug-ins Versions

Workflow System Instrument

Software package

Workflow RunsData and configsProvenance logs

Study

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever teamColleagues in Manchesterrsquos Information Management Group

httpwwwresearchobjectorg

httpwwwwf4ever-projectorg

httpwwwfair-domorg

httpseek4scienceorg

httprightfieldorguk

httpwwwsoftwareacuk

httpwwwdatafairportorgAlan WilliamsJo McEntyreNorman MorrisonStian Soiland-ReyesPaul GrothTim ClarkJuliana FreireAlejandra Gonzalez-BeltranPhilippe Rocca-SerraIan CottamSusanna SansoneKristian Garza

Barend MonsSean BechhoferPhilip BourneMatthew GambleRaul PalmaJun ZhaoNeil Chue HongJosh SommerMatthias ObstJacky SnoepDavid GavaghanRebecca Lawrence

Contacthellip

Professor Carole Goble

The University of Manchester UK

carolegoblemanchesteracuk

httpssitesgooglecomsitecarolegoble

CaroleAnneGoble

standardsAdobe

UCFORE PROVODF

formats

api

Instrument

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever teamColleagues in Manchesterrsquos Information Management Group

httpwwwresearchobjectorg

httpwwwwf4ever-projectorg

httpwwwfair-domorg

httpseek4scienceorg

httprightfieldorguk

httpwwwsoftwareacuk

httpwwwdatafairportorgAlan WilliamsJo McEntyreNorman MorrisonStian Soiland-ReyesPaul GrothTim ClarkJuliana FreireAlejandra Gonzalez-BeltranPhilippe Rocca-SerraIan CottamSusanna SansoneKristian Garza

Barend MonsSean BechhoferPhilip BourneMatthew GambleRaul PalmaJun ZhaoNeil Chue HongJosh SommerMatthias ObstJacky SnoepDavid GavaghanRebecca Lawrence

Contacthellip

Professor Carole Goble

The University of Manchester UK

carolegoblemanchesteracuk

httpssitesgooglecomsitecarolegoble

CaroleAnneGoble

httpwwwcnrirestonvauspapersOverviewDigitalObjectArchitecturepdf

Instrument

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

bull Incremental shift for infrastructure providers

bull Moderate shift for policy makers and stewards

bull Paradigm shift for researchers and their institutions

The RO amp Reproducibility Challenge

All the members of the Wf4Ever teamColleagues in Manchesterrsquos Information Management Group

httpwwwresearchobjectorg

httpwwwwf4ever-projectorg

httpwwwfair-domorg

httpseek4scienceorg

httprightfieldorguk

httpwwwsoftwareacuk

httpwwwdatafairportorgAlan WilliamsJo McEntyreNorman MorrisonStian Soiland-ReyesPaul GrothTim ClarkJuliana FreireAlejandra Gonzalez-BeltranPhilippe Rocca-SerraIan CottamSusanna SansoneKristian Garza

Barend MonsSean BechhoferPhilip BourneMatthew GambleRaul PalmaJun ZhaoNeil Chue HongJosh SommerMatthias ObstJacky SnoepDavid GavaghanRebecca Lawrence

Contacthellip

Professor Carole Goble

The University of Manchester UK

carolegoblemanchesteracuk

httpssitesgooglecomsitecarolegoble

CaroleAnneGoble

NISO-JATS

Instrument

J Zhao G Klyne M Gamble CA Goble - A Checklist-Based Approach for Quality Assessment of Scientific InformationProceedings of the Third Linked Science Workshop 2013

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

httpsresearchobjectgithubiospecificationsbundle

Bergman et al COMBINE archive and OMEX format one file to share all information to reproduce a modeling project BMC Bioinformatics 2014 15369

Retro-Fitted ROsusing off the shelf

platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -gt Born

Native RO platforms

RARE amp FAIR Knowledge Turns Means Research Objects

httpdoctorwhosite1weeblycomsonic-screwdrivershtml

Researchers

Silver bullet tools

Psychic paper

httpbowjamesbowca20080608shhhhhhh-silencshtml

PI Team

RARE Research Reality Check

RARE Research Reality Check

Tribal Behaviour

raquo Gangs share but not with the public

raquo Tribal behaviours rsaquo Modellers share more than Experimentalists

rsaquo Experimentalists reuse models more than Modellers

raquo Trading behavioursrsaquo Collaboration ndash complementarity

correlations

raquo Structured consortia less likely to publicly share than individuals

raquo Post-hoc rationalised DataModel Cycles

[Garza 2014]

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56% of UK researchers develop their own research software or scripts

73% of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014; 406 respondents covering a representative range of funders, disciplines and seniority

http://www.rse.ac.uk

Instrument Artisans

[Shapin 84]

Make Software Visible

[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014, The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc for Info Science and Technology, in review

87 software findable, 78 credit, 37 formal citation, 5 actual version

90 Bio articles

24 journals had citation policy

BUT……

two years' time when the paper is written

reviewers want additional work

statistician wants more runs

analysis may need to be repeated

post-doc leaves, student arrives

new data, revised data

updated versions of algorithms/codes

sample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

@CaroleAnneGoble

Platform profiles

NISO-JATS

Instrument

Container

Manifest

OMEX archive

https://researchobject.github.io/specifications/bundle/

Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
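
The container-plus-manifest idea on this slide can be sketched in a few lines. What follows is only an illustration, assuming the layout described in the Research Object Bundle specification linked above: a ZIP container with a `mimetype` entry and a JSON manifest at `.ro/manifest.json`. The helper names `build_bundle` and `read_manifest` and the example file paths are hypothetical, and a real bundle would normally carry far richer metadata (provenance, annotations, authorship).

```python
import io
import json
import zipfile

# Media type string taken from the Research Object Bundle specification.
RO_BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def build_bundle(files):
    """Pack {path: bytes} into an in-memory RO Bundle-style ZIP (hypothetical helper)."""
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        # One entry per aggregated resource, addressed from the bundle root.
        "aggregates": [{"uri": "/" + path} for path in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        # The mimetype entry is written first and stored uncompressed.
        bundle.writestr("mimetype", RO_BUNDLE_MIMETYPE,
                        compress_type=zipfile.ZIP_STORED)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for path in sorted(files):
            bundle.writestr(path, files[path])
    return buffer.getvalue()


def read_manifest(bundle_bytes):
    """Return the manifest of a bundle as a dict (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        return json.loads(bundle.read(".ro/manifest.json"))


# Illustrative content only; these file names are made up for the sketch.
example = build_bundle({
    "data/results.csv": b"sample,value\na,1\n",
    "scripts/analysis.py": b"print('analysis')\n",
})
print([entry["uri"] for entry in read_manifest(example)["aggregates"]])
```

The point of the design is that data, method and description travel in one file, which is the same goal the COMBINE archive (OMEX) pursues for modelling projects.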

Retro-Fitted ROs

using off the shelf platforms

Method Matters

Reproducibility Smarts

Commons not Repository

Research Tardis

Retro-fit ROs

Do As Little As Possible

Make -> Born

Native RO platforms

RARE & FAIR Knowledge Turns Means Research Objects

http://doctorwhosite1.weebly.com/sonic-screwdrivers.html

Researchers

Silver bullet tools

Psychic paper

http://bowjamesbow.ca/2008/06/08/shhhhhhh-silenc.shtml

PI Team

RARE Research Reality Check

Tribal Behaviour

» Gangs share but not with the public

» Tribal behaviours

› Modellers share more than Experimentalists

› Experimentalists reuse models more than Modellers

» Trading behaviours

› Collaboration – complementarity correlations

» Structured consortia less likely to publicly share than individuals

» Post-hoc rationalised Data/Model Cycles

[Garza 2014]

» Fluid transient collaborations > “my gang” management

» Shameless exploitation of head teacher (PI) competitiveness & vanity

» Class captains (prefects)

raquo Fluid transient collaborations gt ldquomy gangrdquo management

raquo Shameless exploitation of head teacher (PI) competitiveness amp vanity

raquo Class captains (prefects)

raquo Get the cool kids on board

raquo Head teacher leadership

[Garza 2014]

Playground Rules

Trace Data

27032015 74

me

ME

my team

closecolleagues

peers

The Research Release Creep Spiral

raquo Data Hugging amp Flirting

raquo Reciprocity norms

raquo Hans W request

raquo Dowry phenomenon

raquo Private installations

raquo Private spaces on shared installations

raquo Safe havens

Too ugly to show anyone else

Readers who have access will want user support No-one else would be interestedfind it usefulbe able to use it

The code is too sophisticated for most readersrefereesI didnt work out all the details

I didnt actually write the code -- my student did

My competitors would be unfair to me

Its valuable intellectual property

It would make papers much longer

Referees would never agree to check the code

My code invokes other code with unpublished (proprietary) code

Randall J LeVeque Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM NewsVictoria Stodden AMP 2011 httpwwwstoddennetAMP2011

Drivers

love money

fame duty

fear timeeffort

shame duty

[Apologies to Resnick and Malone]

Stealthy not Sneakyreduce the friction

instrumentationspan RARE and FAIR

Optimising The Neylon Equation

Interface Framingraquo Limited scheduled sharing choices

rsaquo Never say never

raquo ldquoCitablerdquo not ldquoSharedrdquo

raquo Feedback

rsaquo Guilt tripping

rsaquo Outlier finger pointing

[Garzia]

Auto-magical end-to-end Instrumentationhttpswwwyoutubecomwatchv=QVQwSOX5S08

ELNs and Authoring Platforms

Sweave

Credit ne AuthorshipResearch Currencies

ldquoResearchBitCoinrdquo

Citation Semantics

Training

56Of UK researchers develop their own research software or scripts

73 Of UK researchers have had no formal software engineering training

Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014 406 respondents covering representative range of funders discipline and seniority

httpwwwrseacuk

Instrument Artisans

[Shapin 84]

Make Software Visible[1960s Boeing 747-100 Software Configuration]

Howison and Bullard 2014 The visibility of software in the scientific literature how do scientists mention software and how effective are those mentions J Assoc fo Info Science and Technology In review

87 software findable78 credit37 formal citation 5 actual version

90 Bio articles24 journals had citation policy

BUThelliphellip

two years time when the paper is writtenreviewers want additional workstatistician wants more runsanalysis may need to be repeatedpost-doc leaves student arrivesnew data revised dataupdated versions of algorithmscodessample was contaminated

Inspired by Bob Harrison

• Incremental shift for infrastructure providers

• Moderate shift for policy makers and stewards

• Paradigm shift for researchers and their institutions

The RO & Reproducibility Challenge

All the members of the Wf4Ever team

Colleagues in Manchester’s Information Management Group

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://seek4science.org

http://rightfield.org.uk

http://www.software.ac.uk

http://www.datafairport.org

Alan Williams, Jo McEntyre, Norman Morrison, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Neil Chue Hong, Josh Sommer, Matthias Obst, Jacky Snoep, David Gavaghan, Rebecca Lawrence

Contact…

Professor Carole Goble

The University of Manchester, UK

carole.goble@manchester.ac.uk

https://sites.google.com/site/carolegoble

CaroleAnneGoble

Trace Data

me

ME

my team

close colleagues

peers

The Research Release Creep Spiral

» Data Hugging & Flirting

» Reciprocity norms

» Hans W request

» Dowry phenomenon

» Private installations

» Private spaces on shared installations

» Safe havens

Too ugly to show anyone else

Readers who have access will want user support

No-one else would be interested / find it useful / be able to use it

The code is too sophisticated for most readers/referees

I didn't work out all the details

I didn't actually write the code -- my student did
