B4OS-2012
![Page 1: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/1.jpg)
Data management and curation:
the other side of bioinformatics
Susanna-Assunta Sansone, PhD Principal Investigator and Team Leader,
University of Oxford e-Research Centre, Oxford, UK
http://uk.linkedin.com/in/sasansone
Bioinformatics for Omics Sciences (B4OS), CNR Naples, 25-27 Sep 2012
http://www.slideshare.net/SusannaSansone/B4OS-2012
![Page 2: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/2.jpg)
Oxford e-Research Centre
![Page 3: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/3.jpg)
![Page 4: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/4.jpg)
Providing research computing, high-performance computing
Integrating with national and international infrastructure
Supporting leading edge facilities through education and training
![Page 5: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/5.jpg)
Collaborating with European and wider international groups in, e.g.:
• energy
• radio astronomy
• biological data federation
• life sciences simulation
• biodiversity
• computational chemistry
• neuroscience
• digital humanities tools
• digital music analysis
Research in
• computation
• data infrastructure and analysis
• visualisation
![Page 6: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/6.jpg)
My team’s activities and groups we work with
data management, biocuration, development of software, databases and community-driven standards and ontology
tox/pharma, env, health, agro
![Page 7: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/7.jpg)
http://www.flickr.com/photos/12308429@N03/4957994485/ CC BY
![Page 8: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/8.jpg)
Today:
“The buzz around reproducible bioscience data -
the policies, the communities and the standards”
Thursday:
“The reality from the buzz: how to deliver
reproducible bioscience data”
![Page 9: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/9.jpg)
• Harmonize collection across sites
• Find matching studies
• Data dissemination
• Long-term data stewardship
• Preserve institutional / corporate memory
![Page 10: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/10.jpg)
Utilize public data
• Identify suitable data
• Retrieve
• Curate and harmonize
• Re-analyze
![Page 11: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/11.jpg)
Address reproducibility /
reuse of public data
![Page 12: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/12.jpg)
![Page 13: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/13.jpg)
Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
![Page 14: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/14.jpg)
![Page 15: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/15.jpg)
![Page 16: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/16.jpg)
![Page 17: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/17.jpg)
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
Growing, worldwide movement for reproducible research
“Publicly-funded research data are a public good, produced in the public interest”
“Publicly-funded research data should be openly available to the maximum extent possible”
Shared, annotated research data and methods offer new discovery opportunities and prevent unnecessary repetition of work.
Improved data sharing underpins science of the future
![Page 18: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/18.jpg)
http://www.flickr.com/photos/notbrucelee/8016189356/ CC BY
![Page 19: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/19.jpg)
Reproducible & Reusable
Bioscience Research
![Page 20: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/20.jpg)
Well-annotated & Structured Data
reasoning
analysis
exchange
integration
visualization
browsing retrieval
![Page 21: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/21.jpg)
Community Standards
Software Tools
![Page 22: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/22.jpg)
Today’s bioscience research
§ Is interdisciplinary and integrative in character
• need to deal with new and existing datasets
• deal with a variety of data types
§ ‘How the organism works’ is the focus
• Twenty years ago data was the center
Figure labels: Experimental and computational data; Publications
Source of the figure: EBI website
![Page 23: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/23.jpg)
Example from the toxicogenomics domain
Study looking at the effect of a compound inducing liver damage by characterizing/measuring
- the metabolic profile by MS and NMR
- protein expression in liver by MS
- gene expression by DNA microarray
- and conducting genetic and phenotypic analysis
Information contributing to the construction and validation of systems biology models
![Page 24: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/24.jpg)
Example of experiments by InnoMed PredTox, an FP6 public-private consortium
![Page 25: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/25.jpg)
Structured description of datasets
§ Capture all salient features of the experimental workflow
§ Make annotation explicit and discoverable
§ Structure the descriptions for consistency, tracking
• independent variables
• dependent variables
using cross-references and resolvable identifiers
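As a minimal sketch of what such a structured description might look like, the Python below records a study's independent and dependent variables, each cross-referenced by an identifier that should resolve on the web. The class names, field names and example identifiers are illustrative assumptions; they are not taken from any standard named in these slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variable:
    """One annotated variable; `term_uri` should be a resolvable identifier."""
    name: str
    role: str       # "independent" or "dependent"
    term_uri: str   # hypothetical example URI, for illustration only


@dataclass
class StudyDescription:
    """Hypothetical container for the salient features of one study."""
    title: str
    variables: list = field(default_factory=list)

    def by_role(self, role):
        # Return variable names with the given role, in insertion order.
        return [v.name for v in self.variables if v.role == role]

    def unresolvable(self):
        # Flag variables whose identifier is not an http(s) URI.
        return [v.name for v in self.variables
                if not v.term_uri.startswith(("http://", "https://"))]


study = StudyDescription(
    title="Compound-induced liver damage",
    variables=[
        Variable("dose", "independent", "http://example.org/term/dose"),
        Variable("time point", "independent", "http://example.org/term/time"),
        Variable("gene expression", "dependent", "local:expr"),
    ],
)

print(study.by_role("independent"))
print(study.unresolvable())
```

Keeping the role and the identifier as explicit fields, rather than burying them in free text, is what makes the annotation discoverable by software as well as by people.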
![Page 26: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/26.jpg)
Not too much, not too little, just ‘right’
§ We must strike a balance between
• depth and breadth of information; and
• sufficient information required to reuse the data
![Page 27: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/27.jpg)
Information intensive experiments
![Page 28: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/28.jpg)
To make the experiments comprehensible and reusable,
underpinning future investigations, we need
common ways to report and share the experimental details and the associated data.
Consistent reporting will have a positive and long-lasting impact
on the value of collective scientific outputs.
![Page 29: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/29.jpg)
Common ways to report and share
§ The challenges we face
• Large in volume: lots of data types and metadata!
• Lots of free text descriptions: hard to mine, subject to mistakes!
• Babel of terminologies: lack of definitions, hard to map!
• Heterogeneous file formats: software lock-in!
§ Need for reporting standards
• Minimal reporting descriptors
- Report the same ‘core essentials’
• Controlled vocabularies or ontology
- Use the same word and mean the same thing
• Common exchange formats
- Make tools interoperable, allow data exchange and integration
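The three kinds of reporting standard listed above can be illustrated with a short Python sketch: a minimal checklist of required descriptors, a controlled vocabulary that maps synonyms to one preferred term, and a tab-delimited exchange format. The checklist fields, the vocabulary entries and the function names are hypothetical and chosen only for illustration; they are not taken from MIAME or any other real guideline.

```python
import csv
import io

# Hypothetical 'core essentials' every record must report.
REQUIRED_FIELDS = ("organism", "tissue", "assay_type")

# Hypothetical controlled vocabulary: free-text synonym -> preferred term.
VOCABULARY = {
    "liver": "liver",
    "hepatic tissue": "liver",
    "rat": "Rattus norvegicus",
    "rattus norvegicus": "Rattus norvegicus",
}


def missing_fields(record):
    """Return the required descriptors that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not record.get(f, "").strip()]


def normalise(value):
    """Map a free-text value to its preferred term, if one is known."""
    return VOCABULARY.get(value.strip().lower(), value.strip())


def to_tsv(records):
    """Serialise records to a simple tab-delimited exchange format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_FIELDS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({f: normalise(record.get(f, ""))
                         for f in REQUIRED_FIELDS})
    return buffer.getvalue()


records = [
    {"organism": "rat", "tissue": "Hepatic tissue", "assay_type": "DNA microarray"},
    {"organism": "rat", "tissue": "", "assay_type": "NMR"},
]

for record in records:
    print(missing_fields(record))  # empty list means the checklist is satisfied
print(to_tsv(records))
```

The second record fails the checklist because its tissue is empty, while the first is normalised so that "rat" and "Hepatic tissue" are written out under their preferred terms.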
![Page 30: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/30.jpg)
Reporting standards – the benefits
§ Describe and communicate the information to others, in an unambiguous manner
§ To unlock the value in the data
• Compare, query and evaluate data
- Facilitate scientific validation of the findings
• Understand variability within/between different technologies and protocols
- Facilitate technical validation
- Enable optimization of the experimental designs
- Identify critical checkpoints and develop quality metrics
§ To define submission and/or publication requirements
• Journals
• Databases
§ To ensure data integrity, reproducibility and (re)use
![Page 31: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/31.jpg)
Escalating number of standardization efforts in bioscience, e.g.:
• Genome annotation: www.geneontology.org
• Functional Genomics Data Society (FGED): www.fged.org
• HUPO Proteomics Standards Initiative (PSI): http://www.psidev.info
• Cheminformatics: www.ebi.ac.uk/chebi
• Pathways: www.biopax.org
• Systems modelling standards: www.sbml.org
• Metabolomics Standards Initiative (MSI): http://www.metabolomicssociety.org
• Genomic Standards Consortium (GSC): gensc.org
• Enzymology data standards: www.strenda.org
![Page 32: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/32.jpg)
Different community, different norms and standards, e.g.:
• report the same core, essential information
• use the same word and refer to the same ‘thing’
• allow data to flow from one system to another
![Page 33: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/33.jpg)
![Page 34: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/34.jpg)
Challenges: lack of coordination, fragmentation and uneven coverage
![Page 35: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/35.jpg)
Is this ‘general mobilization’ good or bad?
§ Difference in structures and processes:
• organization types (open, closed to members, society, WG…)
• standards development (how to design, develop, evaluate, maintain…)
• adoption, uptake, outreach (link to journals, funders, commercial sector…)
• funds (sponsors, memberships, grants, volunteering…)
![Page 36: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/36.jpg)
§ Fragmentation of the standards is a major issue
• Being focused on particular communities’ interests, be they individual technologies or biological/biomedical disciplines, leads to duplication of effort and, more seriously, to the development of (largely arbitrarily) different standards
• This severely hinders the interoperability of databases and tools and ultimately the integration of datasets
![Page 37: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/37.jpg)
Growing number of reporting standards
MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE, …, REMARK, CONSORT
MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, …, GELML, ISA-Tab, CML, MITAB
AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, …, TEDDY, PRO, XAO, DO, VO
![Page 38: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/38.jpg)
+ 130
Estimated + 150
Source: MIBBI, EQUATOR
+ 303
Source: BioPortal
Databases, annotation, curation tools
![Page 39: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/39.jpg)
But how much do we know about these standards?
![Page 40: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/40.jpg)
Which ones are mature enough for me to use or recommend?
I work on plants; are these just for biomedical applications?
What are the criteria to evaluate their status and value?
How can I get involved to propose extensions or modifications?
Which tools and databases implement which standards?
I use high-throughput sequencing technologies; which ones are applicable to me?
![Page 41: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/41.jpg)
§ A bewildering array of standards is available, but
• these are hard to find, at different levels of maturity; in
some areas duplications or gaps in coverage also exist
§ Standards are just a ‘means to an end’, therefore
• we want to make them discoverable and accessible,
maximizing their use to assist the virtuous data cycle,
from generation to standardization through publication to
subsequent sharing and reuse
![Page 42: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/42.jpg)
A catalogue to map the landscape of standards and the systems implementing them: Over 400 bio-standards (public and in curation)
Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
![Page 43: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/43.jpg)
• A coherent, curated and searchable catalogue of data sharing resources
• Bioscience standards and associated data-sharing policies, publications, tools and databases
• Assessment criteria for usability and popularity of standards
• Relationships among standards
• Encouragement for communication & interaction among groups
• Promoting interoperability & informed decisions about standards
![Page 44: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/44.jpg)
Example of a multi-assay study: how many ‘standards’ are applicable to this?
![Page 45: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/45.jpg)
![Page 46: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/46.jpg)
![Page 47: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/47.jpg)
![Page 48: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/48.jpg)
Smith et al, 2007
![Page 49: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/49.jpg)
Taylor, Field, Sansone et al, 2008
![Page 50: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/50.jpg)
List of databases, linked to standards: a collaboration with Database Issue
![Page 51: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/51.jpg)
![Page 52: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/52.jpg)
![Page 53: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/53.jpg)
Major challenge: define ‘relations’ among standards
The relationship among popular standard formats for pathway information. BioPAX and PSI-MI are designed for data exchange to and from databases and for pathway and network data integration. SBML and CellML are designed to support mathematical simulations of biological systems, and SBGN represents pathway diagrams.
CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.
![Page 54: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/54.jpg)
![Page 55: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/55.jpg)
This is not just a technical but also a social engineering challenge!
![Page 56: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/56.jpg)
Ownership of open standards can be problematic in broad, grass-roots collaborations; it
requires improved models to encourage maintenance of and contributions to these efforts,
supporting their evolution
![Page 57: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/57.jpg)
The extensive ‘social engineering’ and community liaison needs to be managed
and funded; rewards and incentives need to be identified
for all contributors
![Page 58: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/58.jpg)
CC BY
http://www.flickr.com/photos/idiolector/289490834/
![Page 59: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/59.jpg)
![Page 60: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/60.jpg)
The cost of implementing a standards-supported data
sharing vision is as large as the number of stakeholders that must operate synchronously
![Page 61: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/61.jpg)
1. Funders actively developing data policies
§ Several data preservation, management and sharing policies have emerged in response to increased funding for omics domains
§ Even if in general terms, standards are recognized as necessary ‘tools’ to unambiguously represent, describe and communicate research data
![Page 62: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/62.jpg)
![Page 63: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/63.jpg)
2. Similar trend in the regulatory arena
§ “… lack of standardized data affects CDER’s review processes by curtailing a reviewer’s ability to perform integral tasks such as rapid acquisition, storage, analysis … efficient management of a portfolio of standards projects will require coordinated efforts and clear roles for multiple participants within/outside FDA”
![Page 64: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/64.jpg)
![Page 65: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/65.jpg)
3. Publishers have become strong advocates
§ Continue to support the development of open standards and tools
• to support sharing of sufficiently well annotated datasets
• to enable comprehensible, reusable, reproducible research
![Page 66: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/66.jpg)
….the rise of data-driven journals, e.g.:
partnering with:
![Page 67: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/67.jpg)
![Page 68: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/68.jpg)
![Page 69: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/69.jpg)
4. Similar trend in the commercial sector
§ R&D has invested heavily in procedures and tools that integrate external information with their own data to enhance the decision-making process
• Now joining forces to streamline non-competitive elements of the life science workflow by the specification of common standards, business terms, relationships and processes
![Page 70: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/70.jpg)
....their information landscape is evolving
Big Life Science Company

| | Yesterday | Today | Tomorrow |
|---|---|---|---|
| Innovation Model | Innovation inside | Searching for Innovation | Heterogeneity of collaborations; part of the wider ecosystem |
| IT | Internal apps & data | Struggling with change, security and trust | Cloud, services |
| Data | Mostly inside | In and out | Distributed |
| Portfolio | Internally driven and owned | Partially shared | Shared portfolio |

Credit to: Pistoia Alliance
Diagram labels: Big Life Science Company; Proprietary content provider; Public content provider; Academic group; Software vendor; CRO; Service provider; Regulatory authorities
![Page 71: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/71.jpg)
Take home messages
• Contribute to the reproducible research movement
• Think about data management as a career path
• Learn more about open community-standards
• Get involved, e.g.: Open Bioinformatics Foundation
![Page 72: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/72.jpg)
http://www.flickr.com/photos/jackofspades/4500411648/ CC BY
Data is not like a $ bill….
![Page 73: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/73.jpg)
http://www.flickr.com/photos/equinoxefr/2620239993/ CC BY
Your research and all (publicly funded) research should
make an … impact
![Page 74: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/74.jpg)
http://www.flickr.com/photos/webhamster/2582189977/ CC BY
…..the biggest possible impact!
![Page 75: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/75.jpg)
Today:
“The buzz around reproducible bioscience data -
the policies, the communities and the standards”
Thursday:
“The reality from the buzz: how to deliver
reproducible bioscience data”
![Page 76: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/76.jpg)
Is it possible to achieve a common, structured
representation of diverse bioscience experiments that:
• follows the appropriate community standards and
• delivers richly-annotated datasets?
![Page 77: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/77.jpg)
Tim Berners-Lee’s 5-star deployment scheme for Linked Open Data
![Page 78: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/78.jpg)
Increasing level of structure
• Notes in Lab Books (information for humans)
• Spreadsheets and Tables (the compromise)
• Facts as RDF statements (information for machines)
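To make the last step concrete, the sketch below turns one spreadsheet-style row into subject-predicate-object statements, written out as N-Triples lines. It uses only the Python standard library. The `http://example.org/` namespace, the column names and the helper `row_to_triples` are assumptions made for illustration, and the sketch does not reflect how the ISA tools perform this conversion.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, illustration only


def literal(value):
    """Quote a string as a literal (escapes backslash and double quote only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_triples(row, id_column="sample"):
    """Turn one table row (a dict) into N-Triples style statements."""
    subject = f"<{BASE}sample/{quote(row[id_column])}>"
    triples = []
    for column, value in row.items():
        if column == id_column or value == "":
            continue  # the identifier becomes the subject; blanks are skipped
        predicate = f"<{BASE}property/{quote(column)}>"
        triples.append(f"{subject} {predicate} {literal(value)} .")
    return triples


row = {"sample": "S1", "organism": "Rattus norvegicus", "dose": "10 mg/kg"}
for line in row_to_triples(row):
    print(line)
```

Each cell becomes one machine-readable fact about the sample, which is the shift from "the compromise" of a table towards information for machines. A production converter would map columns to terms from community ontologies rather than to an ad hoc namespace.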
www.biosharing.org
www.isacommons.org
Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W: Towards interoperable bioscience data. Feb 2012. doi:10.1038/ng.1054
![Page 79: B4OS-2012](https://reader034.fdocuments.in/reader034/viewer/2022051400/54c69f8d4a795911758b4591/html5/thumbnails/79.jpg)
References
1. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ; OBI Consortium, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol 25(11):1251-1255 (2007)
2. Taylor CF*, Field D*, Sansone SA*, Aerts J, Apweiler R, Ashburner M, Ball CA, Binz PA, Bogue M, Booth T, Brazma A, Brinkman RR, Michael Clark A, Deutsch EW, Fiehn O, Fostel J, Ghazal P, Gibson F, Gray T, Grimes G, Hancock JM, Hardy NW, Hermjakob H, Julian RK Jr, Kane M, Kettner C, Kinsinger C, Kolker E, Kuiper M, Le Novère N, et al.: Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nat Biotechnol 26(8):889-896 (2008)
3. Field D*, Sansone SA*, Collis A, Booth T, Dukes P, Gregurick SK, Kennedy K, Kolar P, Kolker E, Maxon M, Millard S, Mugabushaka AM, Perrin N, Remacle JE, Remington K, Rocca-Serra P, Taylor CF, Thorley M, Tiwari B, Wilbanks J: Megascience. 'Omics data sharing. Science 326(5950):234-236 (2009)
4. Harland L, Larminie C, Sansone SA, Popa S, Marshall MS, Braxenthaler M, Cantor M, Filsell W, Forster MJ, Huang E, Matern A, Musen M, Saric J, Slater T, Wilson J, Lynch N, Wise J, Dix I: Empowering industrial research with shared biomedical vocabularies. Drug Discov Today 16(21-22):940-947 (2011)
5. Sansone SA and Rocca-Serra P: On the evolving portfolio of community-standards and data sharing policies: turning challenges into new opportunities. GigaScience 1:10 (2012)