Towards a European bioinformatics
Infrastructure in the international context
Janet Thornton
European Bioinformatics Institute
US-EC Workshop on Infrastructure Needs of
Systems Biology
14.09.2007
[Figure labels: Genome, Protein/DNA, Cell, Embryo, Organism; Fruitfly, Mouse]
Biological Information & Databases
From Molecules to Organisms
Current Status of
Biomolecular Information
Resources in Europe
From Molecules to Systems: Molecular Databases @ EBI
• Genomes: Ensembl, Genome Reviews
• Nucleotide sequence: EMBL-Bank
• Gene expression: ArrayExpress
• Protein sequence: UniProt
• Protein families, motifs and domains: InterPro
• Protein structure: MSD (wwPDB)
• Protein interactions: IntAct, PRIDE
• Chemical entities: ChEBI, BRENDA
• Pathways: Reactome
• Systems: BioModels
Europe
USA
Japan
Global Context
Data are freely deposited
Data are freely exchanged daily
Data are made freely available to all
[Charts: growth of EMBL-Bank (megabases, 1982-2005), UniProt etc. (entries, 1986-2005) and MSD (entries, 1972-2005)]
All EBI’s Data Resources are growing rapidly
Core databases
• They are universally relevant to biomolecular
science.
• They have a huge user community.
• They aim to be complete collections.
• Completeness is assured by exchange agreements
with other data centres world-wide (typically the
USA, Japan and Europe at present).
• The science they represent is stable enough to
allow standardisation of the data structure.
• Standards where available are followed.
• They are actively involved in relevant standard
Example core databases
• EMBL-Bank
• UniProt
• The Macromolecular Structure Database
• ArrayExpress
• Ensembl
• InterPro
Specialised Molecular Data Resources
• Nucleotide sequences: 7%
• RNA sequences: 5%
• Protein sequences: 15%
• Structure: 9%
• Genomics (non-human): 18%
• Metabolic enzymes/pathways: 5%
• Human & vertebrate genomes: 9%
• Human genes and diseases: 10%
• Expression data: 6%
• Plant: 7%
• Proteomics: 1%
• Organelle: 3%
• Immunological: 3%
• Others: 2%
Galperin (2005 NAR)
More than 700 in total
30% in Europe
(All use core resources as reference data)
Non-core
• Data are more specialised (e.g., one species or family) and do not aim
to be comprehensive.
• They are investigator-led products of research groups with content
which reflects the research interests of their provider.
• Many are derivative or ‘summarising’ databases which combine and
organise data from a range of other databases.
• Most offer a more limited service, and may be less stable and designed
only for experts.
• Some may be candidates for core
Criteria for ‘assessing’ data resources
• Usage
• Cost & Value for Money
• Stability
• Standards
• Size
• International Status
Data Integration is vital
[Figure labels: Genes, Proteins, Pathways, Expression Data, Genomes]
[Figure: fructose-6-phosphate + ATP → fructose-1,6-bisphosphate + ADP + H+, with genes pfkA and pfkB]
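The kind of integration this slide points to can be sketched as following cross-references from a gene record to its protein and on to the reaction it catalyses. This is a minimal illustration only: the three dictionaries stand in for separate databases, and every identifier in them (`PROT_A`, `PROT_B`, `RXN_1`) is hypothetical rather than a real accession. Only the gene names and the reaction itself come from the slide.

```python
# Toy stand-ins for three separate data resources.
# All identifiers (PROT_A, PROT_B, RXN_1) are hypothetical, not real accessions.
GENES = {
    "pfkA": {"protein_id": "PROT_A"},
    "pfkB": {"protein_id": "PROT_B"},
}
PROTEINS = {
    "PROT_A": {"name": "phosphofructokinase isozyme 1", "reaction_id": "RXN_1"},
    "PROT_B": {"name": "phosphofructokinase isozyme 2", "reaction_id": "RXN_1"},
}
REACTIONS = {
    "RXN_1": {
        "substrates": ["fructose-6-phosphate", "ATP"],
        "products": ["fructose-1,6-bisphosphate", "ADP", "H+"],
    },
}


def integrate(gene_name):
    """Follow cross-references gene -> protein -> reaction and merge the records."""
    gene = GENES[gene_name]
    protein = PROTEINS[gene["protein_id"]]
    reaction = REACTIONS[protein["reaction_id"]]
    return {"gene": gene_name, "protein": protein["name"], "reaction": reaction}


if __name__ == "__main__":
    for name in ("pfkA", "pfkB"):
        record = integrate(name)
        print(record["gene"], "|", record["protein"], "|",
              " + ".join(record["reaction"]["substrates"]), "->",
              " + ".join(record["reaction"]["products"]))
```

The point of the sketch is that the join only works if each resource keeps stable identifiers and cross-references to the others, which is what the exchange agreements and standards discussed in these slides are meant to secure.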
LINKS to LITERATURE
CiteXplore – data statistics Jan 07
Literature: Entries
• PubMed/Medline: 16,731,346
• Patents: 1,146,819
• CiteSeer: 493,423
• Chinese Biological Abstracts: 132,763
Current ‘International’ Resources
• DNA Sequences – EMBL-Bank; GenBank; DDBJ
• Genomes – Ensembl (Human) – consensus build
• Protein Sequences: UniProt – US; EBI; SIB
• Protein Structures: wwPDB – RCSB; MSD; PDBj
• Protein Interactions: IMEx Consortium – US; Italy; EBI
• InterPro – (~12 collaborators) Many resources in Europe – a few now in US
• Model Organism data resources – mainly US funded
Community needs exchange agreements and protocols to be established for:
Expression Data
Human variation data
Image data – of many sorts
Proteomic Data – HUPO in progress
Metabolomic Data – in progress
PLUS ??? …….
EMBRACE EU Network of Excellence
Making tools available easily to all
Web Services
Tools
Data
Test problems
Compute Power
BioSapiens EU Network of Excellence
Making annotations available
Tools
Data
Annotations: Genes, Promoters, Variations, Proteins, PTM, Localisation, Function, Structure, Interactions, Pathways
Users: Biologists, Clinicians, Environmentalists, Chemists, ...
DAS
Integration with Other Data is increasingly important
e.g. Linking from Molecules to Medicine & Agriculture
Genomes
Proteins
Metabolites
Flybase
MGD
SGD
BRENDA
Chemical data resources
Medical data resources
Biodiversity data resources
IMGT
Pasteur DBs
Eumorphia/Phenotypes
Core biomolecular resources
Specialist biomolecular data resource examples
Mutants
Large resources in related disciplines
Model organism resource examples
Mouse Atlas
BUT for these infrastructures funding is neither
sensibly organised nor adequate in Europe
The Challenge: Who should pay?
• Bioinformatics is international
• Resources are used by everyone
• Who should pay?
• Individual countries do not want to take responsibility for an international effort
• The US has been very proactive, supporting public data resources and making data freely available to all through the web, so a commercial solution is not an available option. Its data resources are supported by rolling funding with peer review, provided to government laboratories & ‘designated’ core resources
• EMBL provides core funding – but limited budget
• EU has provided >20% of EBI Funding – but non-rolling and member states are reluctant to fund infrastructures through the EU
EMBL-EBI Funding: 2006
• EMBL: 50%
• EU: 22%
• USA: 8%
• Wellcome Trust: 7%
• UK RCs: 7%
• Industry: 3%
• Other: 3%
2005 budget – 26Meuros
2011 projected budget – 43Meuros
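As a rough illustration of the scale of growth these two figures imply, the arithmetic can be sketched as below. It assumes smooth compound growth between the 2005 budget and the 2011 projection, which the slide does not state, and the function name is purely illustrative.

```python
def implied_annual_growth(start_value, end_value, years):
    """Compound annual growth rate that takes start_value to end_value over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the slide, in millions of euros (assumption: steady compound growth).
budget_2005 = 26.0
budget_2011 = 43.0

rate = implied_annual_growth(budget_2005, budget_2011, 2011 - 2005)
print(f"Implied growth: {rate:.1%} per year")  # roughly 8.7% per year
```

Under that assumption the budget would need to grow by a little under 9% every year for six years, which gives a sense of why predictable, long-term funding is the central concern of the slides that follow.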
EU Funding for Bioinformatics
• Infrastructure Programme Support: FP6 for Bioinformatics under ‘Research Infrastructures Action’
– FELICS – I3 (EBI/SIB/EPO/Uni Koln)
– EuroCarbDB – Design Study
• Research Programme Support:
– 3 Bioinformatics NoEs – coordinated by EBI
– Many other research grants with small informatics component, coordinated elsewhere
• Europe needs to design a system of predictable funding with regular competitive renewal for those who provide its integrated core and specialised data resources
National Infrastructure Funding for Bioinformatics in Europe
• To my knowledge there is little ‘national’ funding for core bioinformatics data
resources, apart from in Switzerland (SIB)
• BUT recent UK BBSRC Initiative to support bioinformatics resources
• One-off grants for specialist data resources – awarded to individual investigators,
but rarely long term
• Some funding from charities – Wellcome Trust
National funding has to be part of the solution to the problem
ESFRI – European Strategy Forum on Research Infrastructures
Set up in 2002 by the Competitiveness Council
Independent from the Commission
33 Member States (and the European Commission)
The GOAL is to describe the scientific infrastructural
needs in Europe for the next 10 to 20 years
23.11.2006, Rome
ESFRI: European Strategy Forum on Research Infrastructures
• 6 Biomedical & Life Sciences
• EATRIS – Centres for Translational Research
• Bio-banking
• INFRAFRONTIER – Mouse phenome & archive
• Clinical Trials & Biotherapy Facilities
• Integrated Structural Biology Centres
• Upgrade to European Bioinformatics Infrastructure
• LifeWatch – infrastructure for monitoring European Biodiversity
This list was ‘adopted’ by the EU, but the EU has very little funding to support it.
Therefore the EU has developed the concept of a ‘Preparatory Phase’ to create
consortia of European Funding Bodies willing to support individual
infrastructures
FP7 research infrastructure funding
Phase 1 (Preparatory Phase)
Total funds: 230 Meuros from EU
Application submitted May 1st (max ~5m euros)
Deliverable: Consortium agreement for the action construction
Phase 2 (Construction)
Total funds: 100 Meuros from EU
Partners carry out construction or major upgrade as agreed in
consortium agreement
Partners: Members of the consortium
Funding: Members of the consortium
ELIXIR Proposal
European Life-science Infrastructure for Biological
Information
ELIXIR will:
• Establish a trans-national infrastructure for biological information and service providers, including existing national infrastructures and networks.
• Implement a major upgrade at the European Bioinformatics Institute (EMBL-EBI), including construction of a European Biomolecular Data Centre.
• Promote the use of state-of-the-art IT technology for data integration and database interoperability
• Promote and further develop the use of distributed annotation technologies for large scale European collaborations in the life science databases.
• Promote the development of infrastructures for biological information in the new accession states.
• Develop an appropriate legal and financial framework for the construction and sustainable operation of this infrastructure.
• Promote the formation of an associated European framework for Training and Outreach.
Proposed Structure of ELIXIR
• An interlinked collection of ‘core’ and specialised biological data resources and literature.
• Standards and ontologies for newly emerging data.
• A major upgrade for the core information resources at the EBI.
• New data resources as appropriate.
• Integration and interoperability of diverse heterogeneous data.
• Rapid search and access through friendly portal(s) supported by appropriate infrastructure.
• Infrastructure linking core data resources and national bioinformatics data and service providers.
• Infrastructure to enable Distributed Annotations and Tool Development.
• The opportunity to establish infrastructures for life science information in the accession states.
• Links between molecular resources and developing resources for medicine (e.g. biobanks), agriculture and the environment (e.g. biodiversity).
• Access to high performance computing, through links to Europe’s Supercomputer Centres.
• Coordination and Provision of Training and Outreach across Europe to enhance national efforts.
• Strong links to European bio-industries to ensure the optimal translation of life science research into the bio-industrial sector in Europe.
EBI
Member States
Scientists, Service Centres and Networks
Core & more
Funding for ELIXIR
• EMBL member states (will only fund through the current EMBL budget what they agreed to in the EMBL Programme 2007-2011). Are they prepared to make special contributions?
• New EMBL member states (Luxembourg, Czech Republic)
• Other EU Member states
• Non-Member states
• EC
• Others
Stakeholders
• Funders
– EMBL; EU; National Government Funding Bodies; Charities; Industry
• Data Resource Providers
– Core Resources (EBI; SIB; Patent Office; Sanger….)
– Specialist (Many investigators - distributed)
• Data Providers
– Experimentalists
• Tool Providers
– Bioinformatics Groups
• Users
Proposed Total Budget for ELIXIR in ESFRI Report
Preparatory phase 7M€ per year (+30M€ data centre)
Construction phase 67M€ per year
Operation 7 M€ per year
Total 567 M€ in 7 years (2007-2013)
New funding needed for:
• New data resources e.g.
• Chemicals in biology & medicine (metabolites, pharmaceuticals)
• Imaging (from cells to organisms)
• Human variation data
• Major Literature Resource
• Funding for distributed specialist resources
• Hardware, software and personnel to integrate the specialist distributed
resources with core resources
These areas are included in the ELIXIR Proposal
Preparatory Phase
• WP1 Management of the Contract
• WP2 ELIXIR Strategy
• WP3 Coordination & Participation
– User Committees (inc. international committee)
• WP4 Organisational & Legal
• WP5 Funding
• WP6 Physical Infrastructure
• WP7 Integration & interoperability
• WP8 Scientific Literature
• WP9 Medical & Nutrition
• WP10 Chemistry, Plant & Agriculture
• WP11 Training Strategy
• WP12 Tools Integration Infrastructure
• WP13 Recommendations and Reporting
• WP14 Technical Feasibility Studies
Preparatory Phase
• 27 partners
• 14 European countries
• 50% Scientists
• 50% Funders
Additional Needs of Systems Biology
• Same as all biologists
• Focus increasingly on whole systems – how
do we characterise these:
– Spatially – atlases
– Temporally – time courses
– As networks/pathways
– Images
– Reporter molecules – e.g. GFP
Items for Discussion
• Core & Specialist resources – how best to link?
• Which new resources are needed for Systems Biology?
• How best to develop the ELIXIR model in an international context?
• Links between Funders – absolutely necessary
• How to integrate national funding of specialised resources
• Central vs Distributed