Towards a European bioinformatics Infrastructure in the international context Janet Thornton European Bioinformatics Institute US-EC Workshop on Infrastructure Needs of Systems Biology

Page 1

Towards a European bioinformatics Infrastructure in the international context

Janet Thornton

European Bioinformatics Institute

US-EC Workshop on Infrastructure Needs of Systems Biology

Page 2

14.09.2007

Biological Information & Databases

From Molecules to Organisms

[Figure labels: Genome, Protein/DNA, Cell, Embryo, Organism; Fruitfly, Mouse]

Page 3

Current Status of Biomolecular Information Resources in Europe

Page 4

From Molecules to Systems: Molecular Databases @ EBI

• Genomes: Ensembl, Genome Reviews

• Nucleotide sequence: EMBL-Bank

• Gene expression: ArrayExpress

• Protein sequence: UniProt

• Protein families, motifs and domains: InterPro

• Protein structure: MSD (wwPDB)

• Protein interactions: IntAct, PRIDE

• Chemical entities: ChEBI, BRENDA

• Pathways: Reactome

• Systems: BioModels

Page 5

Global Context

Europe, USA, Japan

Data are freely deposited

Data are freely exchanged daily

Data are made freely available to all

Page 6

All EBI’s Data Resources are growing rapidly

[Figure: three growth charts. EMBL-Bank, 1982-2005 (megabases); UniProt etc., 1986-2005 (entries); MSD, 1972-2005 (entries)]

Page 7

Core databases

• They are universally relevant to biomolecular science.

• They have a huge user community.

• They aim to be complete collections.

• Completeness is assured by exchange agreements with other data centres world-wide (typically the USA, Japan and Europe at present).

• The science they represent is stable enough to allow standardisation of the data structure.

• Standards where available are followed.

• They are actively involved in relevant standard

Page 8

Example core databases

• EMBL-Bank

• UniProt

• The Macromolecular Structure Database

• ArrayExpress

• Ensembl

• InterPro

Page 9

Specialised Molecular Data Resources

• Nucleotide sequences: 7%

• RNA sequences: 5%

• Protein sequences: 15%

• Structure: 9%

• Genomics (non-human): 18%

• Metabolic enzymes/pathways: 5%

• Human & Vertebrate Genomes: 9%

• Human Genes and Diseases: 10%

• Expression data: 6%

• Plant: 7%

• Proteomics: 1%

• Organelle: 3%

• Immunological: 3%

• Others: 2%

Source: Galperin (2005 NAR)

More than 700 in total

30% in Europe

(All use core resources as reference data)
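
To give a feel for scale, the percentage shares on this slide can be turned into rough resource counts. The following is only a sketch under stated assumptions: it treats the total as exactly 700 (the slide says "more than 700", so every count is a lower bound), it assumes the 30% European share applies to that same total, and all variable names are illustrative rather than taken from the talk.

```python
# Shares of specialised molecular data resources as listed on the slide (percent).
shares = {
    "Nucleotide sequences": 7,
    "RNA sequences": 5,
    "Protein sequences": 15,
    "Structure": 9,
    "Genomics (non-human)": 18,
    "Metabolic enzymes/pathways": 5,
    "Human & Vertebrate Genomes": 9,
    "Human Genes and Diseases": 10,
    "Expression data": 6,
    "Plant": 7,
    "Proteomics": 1,
    "Organelle": 3,
    "Immunological": 3,
    "Others": 2,
}

# Assumption: "more than 700" is treated as exactly 700, giving lower bounds.
TOTAL_RESOURCES = 700
EUROPEAN_SHARE_PERCENT = 30

# The listed shares should cover the whole pie.
assert sum(shares.values()) == 100

# Integer arithmetic is exact here because 700 is a multiple of 100.
approx_counts = {name: TOTAL_RESOURCES * pct // 100 for name, pct in shares.items()}
european_resources = TOTAL_RESOURCES * EUROPEAN_SHARE_PERCENT // 100

# Print categories from largest to smallest.
for name, count in sorted(approx_counts.items(), key=lambda item: -item[1]):
    print(f"{name:30s} at least ~{count}")
print(f"Resources in Europe: at least ~{european_resources}")
```

Because the total is a lower bound, the counts should be read as orders of magnitude, not exact inventory figures.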

Page 10

Non-core

• Data are more specialised (e.g., one species or family) and do not aim to be comprehensive.

• They are investigator-led products of research groups with content which reflects the research interests of their provider.

• Many are derivative or ‘summarising’ databases which combine and organise data from a range of other databases.

• Most offer a more limited service, and may be less stable and designed only for experts.

• Some may be candidates for core

Page 11

Criteria for ‘assessing’ data resources

• Usage

• Cost & Value for Money

• Stability

• Standards

• Size

• International Status

Page 12

Data Integration is vital

[Figure labels: Genes, Proteins, Pathways, Expression Data, Genomes]

[Pathway example: fructose-6-phosphate + ATP → fructose-1,6-bisphosphate + ADP + H+, with genes pfkA and pfkB]

Page 14

LINKS to LITERATURE

CiteXplore – data statistics Jan 07

Literature source: Entries

• PubMed/Medline: 16,731,346

• Patents: 1,146,819

• CiteSeer: 493,423

• Chinese Biological Abstracts: 132,763

Page 15

Current ‘International’ Resources

• DNA Sequences – EMBL-Bank; GenBank; DDBJ

• Genomes – Ensembl (Human) – consensus build

• Protein Sequences – UniProt: US; EBI; SIB

• Protein Structures – wwPDB: RCSB; MSD; PDBj

• Protein Interactions – IMEx Consortium: US; Italy; EBI

• InterPro – (~12 collaborators) Many resources in Europe – a few now in US

• Model Organism data resources – mainly US funded

Community needs exchange agreements and protocols to be established for:

Expression Data

Human variation data

Image data – of many sorts

Proteomic Data – HUPO in progress

Metabolomic Data – in progress

PLUS ??? …….

Page 16

EMBRACE EU Network of Excellence

Making tools available easily to all

[Figure labels: Web Services, Tools, Data, Test problems, Compute Power]

Page 17

BioSapiens EU Network of Excellence

Making annotations available

[Figure labels: Tools, Data, DAS]

Annotations: Genes, Promoters, Variations, Proteins, PTM, Localisation, Function, Structure, Interactions, Pathways

Users: Biologists, Clinicians, Environmentalists, Chemists, ...

Page 18

Integration with Other Data is increasingly important

e.g. Linking from Molecules to Medicine & Agriculture

Genomes

Proteins

Metabolites

Page 19

[Diagram labels]

Core biomolecular resources

Specialist biomolecular data resource examples

Model organism resource examples

Large resources in related disciplines

Flybase, MGD, SGD, BRENDA, IMGT, Pasteur DBs, Eumorphia/Phenotypes, Mutants, Mouse Atlas

Chemical data resources; Medical data resources; Biodiversity data resources

Page 20

BUT for these infrastructures funding is neither sensibly organised nor adequate in Europe

Page 21

The Challenge: Who should pay?

• Bioinformatics is international

• Resources are used by everyone

• Who should pay?

• Individual countries do not want to take responsibility for an international effort

• The US has been very proactive: it supports public data resources and makes data freely available to all through the web, so a commercial solution is not an available option. Its data resources are supported by rolling, peer-reviewed funding provided to government laboratories & ‘designated’ core resources

• EMBL provides core funding, but its budget is limited

• The EU has provided >20% of EBI funding, but this is non-rolling, and member states are reluctant to fund infrastructures through the EU

Page 22

EMBL-EBI Funding: 2006

• EMBL: 50%

• EU: 22%

• USA: 8%

• Wellcome Trust: 7%

• UK RCs: 7%

• Industry: 3%

• Other: 3%

2005 budget – 26 Meuros

2011 projected budget – 43 Meuros
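
The two budget figures imply a growth rate that the slide does not state. As a rough illustration only, the sketch below assumes smooth compound growth between 2005 and 2011, which is an assumption made here and not something claimed in the talk; the variable names are likewise illustrative.

```python
# Budget figures quoted on the slide, in millions of euros.
budget_2005 = 26.0
budget_2011 = 43.0
years = 2011 - 2005

# Overall increase between the two quoted years.
total_increase = budget_2011 / budget_2005 - 1

# Assumption: smooth compound growth across the whole period.
annual_growth = (budget_2011 / budget_2005) ** (1 / years) - 1

print(f"Total increase 2005-2011: {total_increase:.0%}")
print(f"Implied compound annual growth: {annual_growth:.1%}")
```

Under that assumption the projection corresponds to an increase of roughly two thirds over six years, or a little under 9% a year.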

Page 23

EU Funding for Bioinformatics

• Infrastructure Programme Support: FP6 for Bioinformatics under ‘Research Infrastructures Action’

– FELICS – I3 (EBI/SIB/EPO/Uni Koln)

– EuroCarbDB – Design Study

• Research Programme Support:

– 3 Bioinformatics NoEs - coordinated by EBI

– Many other research grants with small informatics component, coordinated elsewhere

• Europe needs to design a system of predictable funding with regular competitive renewal for those who provide its integrated core and specialised data resources

Page 24

National Infrastructure Funding for Bioinformatics in Europe

• To my knowledge there is little ‘national’ funding for core bioinformatics data resources, apart from in Switzerland (SIB)

• BUT recent UK BBSRC Initiative to support bioinformatics resources

• One-off grants for specialist data resources – awarded to individual investigators, but rarely long term

• Some funding from charities – Wellcome Trust

National funding has to be part of the solution to the problem

Page 25

ESFRI – European Strategy Forum on Research Infrastructures

Set up in 2002 by the Competitiveness Council

Independent from the Commission

33 Member States (and the European Commission)

The GOAL is to describe the scientific infrastructural needs in Europe for the next 10 to 20 years

23.11.2006, Rome

Page 26

ESFRI: European Strategy Forum on Research Infrastructures

• 6 Biomedical & Life Sciences

• EATRIS – Centres for Translational Research

• Bio-banking

• INFRAFRONTIER – Mouse phenome & archive

• Clinical Trials & Biotherapy Facilities

• Integrated Structural Biology Centres

• Upgrade to European Bioinformatics Infrastructure

• LifeWatch – infrastructure for monitoring European Biodiversity

This list was ‘adopted’ by the EU, but the EU has very little funding to support it. It has therefore developed the concept of a ‘Preparatory Phase’ to create consortia of European funding bodies willing to support individual infrastructures

Page 27

FP7 research infrastructure funding

Phase 1 (Preparatory Phase)

Total funds: 230 Meuros from EU

Application submitted May 1st (max ~5m euros)

Deliverable: Consortium agreement for the action construction

Phase 2 (Construction)

Total funds: 100 Meuros from EU

Partners carry out construction or major upgrade as agreed in consortium agreement

Partners: Members of the consortium

Funding: Members of the consortium

Page 28

ELIXIR Proposal

European Life-science Infrastructure for Biological Information

ELIXIR will:

• Establish a trans-national infrastructure for biological information and service providers, including existing national infrastructures and networks.

• Implement a major upgrade at the European Bioinformatics Institute (EMBL-EBI), including construction of a European Biomolecular Data Centre.

• Promote the use of state-of-the-art IT technology for data integration and database interoperability

• Promote and further develop the use of distributed annotation technologies for large scale European collaborations in the life science databases.

• Promote the development of infrastructures for biological information in the new accession states.

• Develop an appropriate legal and financial framework for the construction and sustainable operation of this infrastructure.

• Promote the formation of an associated European framework for Training and Outreach.

Page 29

Proposed Structure of ELIXIR

• An interlinked collection of ‘core’ and specialised biological data resources and literature.

• Standards and ontologies for newly emerging data.

• A major upgrade for the core information resources at the EBI.

• New data resources as appropriate.

• Integration and interoperability of diverse heterogeneous data.

• Rapid search and access through friendly portal(s) supported by appropriate infrastructure.

• Infrastructure linking core data resources and national bioinformatics data and service providers.

• Infrastructure to enable Distributed Annotations and Tool Development.

• The opportunity to establish infrastructures for life science information in the accession states.

• Links between molecular resources and developing resources for medicine (e.g. biobanks), agriculture and the environment (e.g. biodiversity).

• Access to high performance computing, through links to Europe’s Supercomputer Centres.

• Coordination and Provision of Training and Outreach across Europe to enhance national efforts.

• Strong links to European bio-industries to ensure the optimal translation of life science research into the bio-industrial sector in Europe.

Page 30

[Diagram repeated from Page 19: core biomolecular resources, specialist and model organism resource examples, and large resources in related disciplines]

Page 31

EBI

Member States

Scientists, Service Centres and Networks

Core & more

Page 32

Funding for ELIXIR

• EMBL member states (will only fund through the current EMBL budget what they agreed to in the EMBL Programme 2007-2011). Are they prepared to make special contributions?

• New EMBL member states (Luxembourg, Czech Republic)

• Other EU Member states

• Non-Member states

• EC

• Others

Page 33

Stakeholders

• Funders

– EMBL; EU; National Government Funding Bodies; Charities; Industry

• Data Resource Providers

– Core Resources (EBI; SIB; Patent Office; Sanger….)

– Specialist (Many investigators - distributed)

• Data Providers

– Experimentalists

• Tool Providers

– Bioinformatics Groups

• Users

Page 34

Proposed Total Budget for ELIXIR in ESFRI Report

Preparatory phase 7M€ per year (+30M€ data centre)

Construction phase 67M€ per year

Operation 7 M€ per year

Total 567 M€ in 7 years (2007-2013)

Page 35

New funding needed for:

• New data resources e.g.

• Chemicals in biology & medicine (metabolites, pharmaceuticals)

• Imaging (from cells to organisms)

• Human variation data

• Major Literature Resource

• Funding for distributed specialist resources

• Hardware, software and personnel to integrate the specialist distributed resources with core resources

These areas are included in the ELIXIR Proposal

Page 36

Preparatory Phase

• WP1 Management of the Contract

• WP2 ELIXIR Strategy

• WP3 Coordination & Participation

– User Committees (inc international committee)

• WP4 Organisational & Legal

• WP5 Funding

• WP6 Physical Infrastructure

• WP7 Integration & interoperability

• WP8 Scientific Literature

• WP9 Medical & Nutrition

• WP10 Chemistry, Plant & Agriculture

• WP11 Training Strategy

• WP12 Tools Integration Infrastructure

• WP13 Recommendations and Reporting

• WP14 Technical Feasibility Studies

Page 37

Preparatory Phase

• 27 partners

• 14 European countries

• 50% Scientists

• 50% Funders

Page 38

[Diagram repeated from Page 19: core biomolecular resources, specialist and model organism resource examples, and large resources in related disciplines]

Page 39

Additional Needs of Systems Biology

• Same as all biologists

• Focus increasingly on whole systems – how do we characterise these:

– Spatially – atlases

– Temporally – time courses

– As networks/pathways

– Images

– Reporter molecules – e.g. GFP

Page 40

Items for Discussion

• Core & Specialist resources – how best to link?

• Which new resources are needed for Systems Biology?

• How best to develop the ELIXIR model in an international context?

• Links between funders – absolutely necessary

• How to integrate national funding of specialised resources?

• Central vs Distributed