Rin goble-published

Data sharing. Data management. The SysMO-SEEK Story. Professor Carole Goble FREng FBCS CITP, University of Manchester, UK

description

My experiences of the SysMO-DB project for data sharing and data management in the field by systems biologists.

Transcript of Rin goble-published

Page 1: Rin goble-published

Data sharing

Data management

The SysMO-SEEK Story

Professor Carole Goble FREng FBCS CITP

University of Manchester, UK

Page 2: Rin goble-published

13 teams

91 institutes, 300 scientists

Multi-site, multi-disciplinary

Each three year duration

Data generation

Data consumption

Data analysis

Data management: Local – Shared – Long term

Pan European Systems Biology

http://www.sysmo.net

Page 3: Rin goble-published
Page 4: Rin goble-published

Legacy: Own data solutions: wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial … files and spreadsheets.

Suspicion: Extreme caution over sharing. Modeller vs experimentalist tribalism.

Dynamics: Many institutions, many projects, overlapping memberships, changing membership. Projects ending, starting, carrying on the same, carrying on differently.

Skills: Expert scientists, inexpert informaticians. Few resources.

Data: Patchy standards, incomparable data, afterthought.

Page 5: Rin goble-published

Scientist Lab Collaborators Competitors

Programme

Published

Post-Publication

Pre-Publication

Page 6: Rin goble-published

Data mine-ing

“my impression of researchers, and I can criticize myself in this, is that we’re much more interested in sharing data when we mean sharing somebody else’s as opposed [to] sharing ours.”

E-infrastructure - taking forward the strategy, RIN report, 2010

Page 7: Rin goble-published

Rewards: Competitive advantage. Adoption. Kudos & Credit. Help. Fame. Reputation.

Risks: Being scooped. Scrutiny. Misinterpretation. Cost. Blame. Reputation.

Nature 461, 145 (10 September 2009)

1. Sharing

Page 8: Rin goble-published

“It’s not ready yet”

“I need to get (another) publication first”

“We don’t have the resources or skills to prepare it for others, esp. now we finished that project”

“It’s faster/easier to do it myself, and will keep the credit/control too”

“It’s not described enough to be usable”

“I don’t trust the quality. It’s not reliable enough. It’s too noisy.”

“Others won’t use it properly.”

“It’s not worth my while”

“They are my competitors!!”

Page 9: Rin goble-published

Pseudo Sharing

Page 10: Rin goble-published

2. Preparation for Use

Curation

Standards

Reusability

Reproducibility

Accountability & Quality

Data discipline

Silo busting

Page 11: Rin goble-published

CIMR: Core Information for Metabolomics Reporting
MIABE: Minimal Information About a Bioactive Entity
MIACA: Minimal Information About a Cellular Assay
MIAME: Minimum Information About a Microarray Experiment
MIAME/Env: MIAME / Environmental transcriptomic experiment
MIAME/Nutr: MIAME / Nutrigenomics
MIAME/Plant: MIAME / Plant transcriptomics
MIAME/Tox: MIAME / Toxicogenomics
MIAPA: Minimum Information About a Phylogenetic Analysis
MIAPAR: Minimum Information About a Protein Affinity Reagent
MIAPE: Minimum Information About a Proteomics Experiment
MIARE: Minimum Information About a RNAi Experiment
MIASE: Minimum Information About a Simulation Experiment
MIENS: Minimum Information about an ENvironmental Sequence
MIFlowCyt: Minimum Information for a Flow Cytometry Experiment
MIGen: Minimum Information about a Genotyping Experiment
MIGS: Minimum Information about a Genome Sequence
MIMIx: Minimum Information about a Molecular Interaction Experiment
MIMPP: Minimal Information for Mouse Phenotyping Procedures
MINI: Minimum Information about a Neuroscience Investigation
MINIMESS: Minimal Metagenome Sequence Analysis Standard
MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
MIPFE: Minimal Information for Protein Functional Evaluation
MIQAS: Minimal Information for QTLs and Association Studies
MIqPCR: Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM: Minimal Information Required In the Annotation of biochemical Models
MISFISHIE: Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
STRENDA: Standards for Reporting Enzymology Data
TBC: Tox Biology Checklist

BioPAX: Biological Pathways Exchange http://www.biopax.org/
FuGE: Functional Genomics Experiment
MGED: Microarray Experimental Conditions

http://www.mibbi.org/index.php/MIBBI_portal

Minimum Information for Biological and Biomedical Investigations

Metadata Minefield

Page 12: Rin goble-published

http://usefulchem.wikispaces.com/page/code/EXPLAN001

http://www.mygrid.org.uk/tools/taverna/

Publishing Process

models

software

methods

scripts

http://openwetware.org

standard operating procedures

Page 13: Rin goble-published

Community Curation Responsibility

Page 14: Rin goble-published

Blue Collar Science

John Quackenbush

Difficult and time consuming

Poor Credit or Reward

Shabby Career Paths & Prospects

Page 15: Rin goble-published

3. Credit Crisis

• Reward sharing, curation and reuse rather than reinvention.
• Credit. Attribution. Citation.
• For software, methods and standards too.

• Technical (DataCite.org).
• Cultural (Respected policy).
• Institutional.
• Funding bodies.

Page 16: Rin goble-published

4. Infrastructure, Capability & Capacity

• Three year PhD/project cycle
• Local data control
• Realistic paths to adoption by busy people.
• Spreadsheets, wikis, catalogues and yellow pages.
• Content and Tools

Page 17: Rin goble-published

http://www.biosharing.org

Identity Management

Sharednames, DataCite, LSID, DOIs, ORCID

5. Data Ecosystem

Resources

Page 18: Rin goble-published

6. Sustained Resources

• Three year projects.
• Three year lifespan of data (and its software).
• Sunsets and Sustains
• Reinvention rewarded

• Institution.
• Funding councils.
• Funding panels.
• Publishers
• Libraries
• National data centres
• International data centres

Free. Like Puppies

Page 19: Rin goble-published

Incentives. Sensitivity to Behaviours

Infrastructure

Community building

Trusted service

Coordination

Governance

Policy

Capability

Community Integration

Page 20: Rin goble-published

A Partnership

• Software engineers
• Computational scientists
• Experimental Scientists
• Domain informaticians
• Service providers
• Funding agencies

• But the community credit crisis continues….

Page 21: Rin goble-published

Summary

• Science is a complex social activity undertaken by tribes of people and dominated by trust issues.

• Infrastructure has to be there and fit for purpose, but it's not the real problem.

• Need a cultural shift (on all sides) that truly honours data.