Bioinformatics tools for biologists @ the EBI An overview.

Transcript of Bioinformatics tools for biologists @ the EBI An overview.

Page 1: Bioinformatics tools for biologists @ the EBI An overview.

Bioinformatics tools for biologists @ the EBI

An overview

Page 2: Bioinformatics tools for biologists @ the EBI An overview.

Bioinformatics

• The science of storing, retrieving and analyzing large amounts of biological information

• An interdisciplinary science, involving biologists, computer scientists and mathematicians

• At the heart of modern biology

Page 3: Bioinformatics tools for biologists @ the EBI An overview.

“Large-scale” focus

• Data explosion and new types of data

• High-throughput biology

• Emphasis on systems, not reductionism

• Large community of users with no training in bioinformatics

• Growth of applied biology – molecular medicine, agriculture, food, environmental sciences…

Page 4: Bioinformatics tools for biologists @ the EBI An overview.

What is EMBL-EBI?

• Based on the Wellcome Trust Genome Campus near Cambridge, UK

• Part of the European Molecular Biology Laboratory

• Non-profit organization

Page 5: Bioinformatics tools for biologists @ the EBI An overview.

The EBI’s mission

• To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress

• To contribute to the advancement of biology through basic investigator-driven research in bioinformatics

• To provide advanced bioinformatics training to scientists at all levels, from PhD students to independent investigators

• To help disseminate cutting-edge technologies to industry

Page 6: Bioinformatics tools for biologists @ the EBI An overview.

Databases and tools: www.ebi.ac.uk

Page 7: Bioinformatics tools for biologists @ the EBI An overview.

New types of data

• Genomes

• DNA & RNA sequence

• Gene expression

• Protein sequence

• Protein families, motifs and domains

• Protein structure

• Protein interactions

• Chemical entities

• Pathways

• Systems

• Literature and ontologies

Page 8: Bioinformatics tools for biologists @ the EBI An overview.

Databases: molecules to systems

• Genomes: Ensembl, Ensembl Genomes, EGA

• Nucleotide sequence: EMBL-Bank

• Microarray & gene expression data: ArrayExpress

• Proteomes: UniProt, PRIDE

• Protein families, motifs and domains: InterPro

• Protein structure: PDBe

• Protein interactions: IntAct

• Chemical entities: ChEBI

• Pathways: Reactome

• Systems: BioModels

• Literature and ontologies: CiteXplore, GO

Page 9: Bioinformatics tools for biologists @ the EBI An overview.

Database collaborations

Page 10: Bioinformatics tools for biologists @ the EBI An overview.

Standards development – international collaborations

• Genome annotation: www.geneontology.org

• Microarray and Gene Expression Data (MGED): www.mged.org

• Protein sequence: www.uniprot.org

• HUPO Proteomics Standards Initiative (PSI): www.psidev.info

• Protein structure: www.wwpdb.org

• Cheminformatics: www.ebi.ac.uk/chebi

• Pathways: www.reactome.org, www.biopax.org

• Systems modeling standards: www.sbml.org

• Metabolomics Standards Initiative (MSI): www.metabolomicssociety.org

• Genomics Standards Consortium (GSC): http://gensc.org

• Nucleotide sequence: www.insdc.org

Page 11: Bioinformatics tools for biologists @ the EBI An overview.

EBI website: www.ebi.ac.uk

Databases Tools

Page 12: Bioinformatics tools for biologists @ the EBI An overview.

EBI search engine: EB-eye

• Search all main databases in one go

Page 13: Bioinformatics tools for biologists @ the EBI An overview.

Nucleotides: European Nucleotide Archive (ENA)

• ENA provides a comprehensive, accessible and publicly available repository for nucleotide sequence data

• Collaboration with GenBank and DDBJ for data sharing

• It consolidates information from EMBL-Bank, the European Trace Archive (containing raw data from electrophoresis-based sequencing machines) and the Sequence Read Archive (containing raw data from next-generation sequencing platforms)

• Provides access to the whole scale of sequencing information: from raw data, through assembly and mapping information, through to high-level functional annotation (see figure).

Page 14: Bioinformatics tools for biologists @ the EBI An overview.

Nucleotides: ENA

• Download data

• Navigate to view related data, e.g. taxon-specific data

• Other types of data include SRA experiments

Page 15: Bioinformatics tools for biologists @ the EBI An overview.

Genomes: Ensembl & Ensembl Genomes

• Genome browser providing free access to the complete sequences of higher and model organisms

• With Ensembl you can:

Retrieve all or part of a genome sequence

Perform sequence alignment using BLAST or BLAT

Link to genome annotation from microarray results

View expressed mRNA, protein, etc. in a chromosomal region

View variations such as SNPs across strains or populations

View all alternative splicing for a gene

Explore homologues and phylogenetic trees across > 30 species

View conserved regions across species

• Ensembl Genomes extends to non-vertebrate genomes

Page 16: Bioinformatics tools for biologists @ the EBI An overview.

Genomes: Ensembl

Across species Within species

• Synteny

• Pick a genome

• Orthology

• Genomic alignments

• Gene families

• SNPs

• Genes

• Chromosomes

Page 17: Bioinformatics tools for biologists @ the EBI An overview.

Genomes: Ensembl Genomes

Ensembl-like genome browser for non-vertebrate species

• Ensembl Metazoa

• Ensembl Bacteria

Across species View options

• Select Orthologue view to see putative orthologues

• Using view options, you can select to view only the current gene or the entire expanded gene tree

Page 18: Bioinformatics tools for biologists @ the EBI An overview.

Retrieving data with BioMart

• BioMart is a search engine that can be used to download data in a table format

• Many EBI databases are powered by BioMart

• For example, you can use Ensembl BioMart to retrieve:

All the genes for one species

Or… only genes on one specific region of a chromosome

Or… genes on one region of a chromosome associated with an InterPro domain

Or…etc.

Page 19: Bioinformatics tools for biologists @ the EBI An overview.

BioMart – how it works

First step:

Choose a dataset

Second step:

Add filters to define a gene set

Third step:

Add attributes to determine column output
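The three steps above (dataset, filters, attributes) map naturally onto BioMart's XML query format. The following is a minimal sketch in Python that only builds the query text and does not contact any server. The function name build_biomart_query is hypothetical, and the dataset, filter and attribute names used in the example (hsapiens_gene_ensembl, chromosome_name, start, end, ensembl_gene_id, external_gene_name) are assumptions that should be checked against the BioMart interface you are using.

```python
from xml.etree import ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart-style XML query string (hypothetical helper).

    dataset    -- step 1: the dataset to query
    filters    -- step 2: dict of filter name -> value, defining the gene set
    attributes -- step 3: list of attribute names, one per output column
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",          # tab-separated table output
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Example: genes on one region of a chromosome (names are assumptions).
query_xml = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    filters={"chromosome_name": "1", "start": 1000000, "end": 2000000},
    attributes=["ensembl_gene_id", "external_gene_name"],
)
print(query_xml)
```

The resulting text could be pasted into a BioMart XML query box or sent to a BioMart service; how it is submitted depends on the server and is left out of this sketch.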

Page 20: Bioinformatics tools for biologists @ the EBI An overview.

BioMart results

Page 21: Bioinformatics tools for biologists @ the EBI An overview.

www.biomart.org

Page 22: Bioinformatics tools for biologists @ the EBI An overview.

ArrayExpress & Atlas of Gene Expression

• ArrayExpress Archive is a public repository of functional genomics experiments, including gene expression, supporting scientific publications

• You can query it to retrieve experimental information and download functional genomics data

• Atlas of Gene Expression contains a subset of curated and re-annotated Archive data

• The Atlas can be queried for individual gene expression under different biological conditions across experiments

Page 23: Bioinformatics tools for biologists @ the EBI An overview.

Transcriptomes: ArrayExpress

ArrayExpress Archive: browse experiments

• Search by keyword

• Expand results

• Spreadsheets describing the experiment, sample properties or array design

Page 24: Bioinformatics tools for biologists @ the EBI An overview.

Transcriptomes: Atlas of Gene Expression

• Search by gene name or biological condition

• Gene summary page

• Atlas interface

• Experiment page

Page 25: Bioinformatics tools for biologists @ the EBI An overview.

Protein sequence: UniProt

• Provides the scientific community with a comprehensive, richly curated, high-quality and freely accessible resource of protein sequence and functional information

• Users can perform simple and complex text-based queries, run sequence-based searches, perform multiple sequence alignments, etc.

• Consists of:

UniProtKB/Swiss-Prot, manually annotated

UniProtKB/TrEMBL, computationally analyzed records

UniRef, clustered by sequence identity

UniParc, most comprehensive publicly available non-redundant protein sequence db, un-annotated

UniMES, protein sequence from metagenomic and environmental data

Page 26: Bioinformatics tools for biologists @ the EBI An overview.

UniProt text search for Brca1

Page 27: Bioinformatics tools for biologists @ the EBI An overview.

Protein families, motifs & domains: InterPro

• Integrated documentation resource for protein families, domains and functional sites

• Protein signatures from different member databases describing the same biological protein family or domain are united into a single InterPro entry containing information about the signature(s) and links to the protein in UniProt

• Links to Gene Ontology indicate the biological function and process that the proteins are involved in

Page 28: Bioinformatics tools for biologists @ the EBI An overview.

Protein families, motifs and domains: InterPro

• View architectures of proteins containing a signature

• Compare methods of protein signature prediction

• Visualize the taxonomic range for a protein signature

Page 29: Bioinformatics tools for biologists @ the EBI An overview.

Molecular interaction database: IntAct

• IntAct provides a freely available, open source database system and analysis tools for protein interaction data.

• All interactions are derived from literature curation or direct user submissions

• With IntAct you can:

Find molecules that interact with your protein of interest

Display interaction networks

Analyze interaction networks using GO terms, molecule type, role, etc.

Download data

Install the IntAct system locally

Page 30: Bioinformatics tools for biologists @ the EBI An overview.

The Protein Data Bank in Europe (PDBe)

• PDBe is a resource for the collection, organization and dissemination of data about biological macromolecular structures

• A suite of web-based services is available:

PDBeView and PDBeLite provide a flexible and user-friendly query interface to the PDBe database

PDBeAnalysis provides searches and statistical analyses of macromolecular structure and residue information

PDBeFold performs pairwise or multiple comparisons as well as 3D alignments of structures

PDBeChem allows you to search for and visualize any molecule in the PDB’s ligand dictionary

PDBePisa is an interactive tool for exploring macromolecular interfaces and surfaces, predicting probable quaternary structures (assemblies) and searching the PDB for structurally similar interfaces and assemblies

PDBeMotif allows complex searches of the PDB based on small 3D motifs, sequence motifs in conjunction with ligand environment, secondary structure patterns

Many more tools are available

Page 31: Bioinformatics tools for biologists @ the EBI An overview.

Structures: PDBe

• Ligands

• Sequence mapping

• Linking to domain data

• Assemblies

• Surface matching

• Fold matching

• Active sites

• Electron density visualization

Page 32: Bioinformatics tools for biologists @ the EBI An overview.

PRoteomics IDEntifications database (PRIDE)

• PRIDE is a centralized, standards-compliant, public data repository for proteomics data

• Provides the proteomics community with a public repository for protein and peptide identifications together with the evidence supporting these identifications.

• PRIDE is also able to capture details of post-translational modifications coordinated relative to the peptides in which they have been found.

Page 33: Bioinformatics tools for biologists @ the EBI An overview.

Enzymes: IntEnz

• IntEnz (Integrated relational Enzyme database) is a freely available resource focused on enzyme nomenclature.

• IntEnz contains the recommendations of the Nomenclature Committee of the IUBMB on the nomenclature and classification of enzyme-catalysed reactions.

Page 34: Bioinformatics tools for biologists @ the EBI An overview.

Chemical entities: ChEBI

• ChEBI is a freely available, manually annotated database of small molecular entities

• A molecular entity is any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity, not directly encoded by the genome

• With ChEBI you can:

Find the correct chemical terminology using name, formula or registry number

Visualize chemical structures

Perform similarity searches

View the relationship between molecules using the ChEBI ontology

Bridge the gap between small molecules and the macromolecules they interact with (crosslink to UniProt and Reactome)

Download chemical structures

Submit new structures

Page 35: Bioinformatics tools for biologists @ the EBI An overview.

Chemical entities: ChEBI

• Link to other databases

• View mappings to other databases such as Reactome and UniProt

• View structure, nomenclature, formula and more

• View relationships in the ChEBI Ontology

• Download flat files, database dumps and the ChEBI Ontology for local installation

Page 36: Bioinformatics tools for biologists @ the EBI An overview.

Chemogenomics: ChEMBL

• ChEMBL is a publicly available database of drugs, drug-like small molecules and their targets

• The data includes information about how small molecules bind to their targets, how these compounds affect cells and whole organisms, and information on the molecules’ absorption, distribution, metabolism, excretion and toxicity.

• ChEMBL holds two-dimensional structures, calculated molecular properties (e.g. logP, molecular weight, Lipinski ‘Rule of Five’ parameters) and bioactivity data (such as binding constants and pharmacology).

• The bioactivity data is tagged to show links between molecular targets and published assays, with a set of varying confidence levels.

• Additional data on the clinical progress of compounds is being integrated into ChEMBL.
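To make the Lipinski ‘Rule of Five’ parameters listed among ChEMBL's calculated properties more concrete, here is a minimal sketch in Python. It is not ChEMBL code and does not query ChEMBL. It uses the commonly cited thresholds (molecular weight of at most 500 Da, logP of at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The class name, function name and both example compounds are hypothetical, with made-up property values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MolecularProperties:
    """Calculated properties of one compound (hypothetical container)."""
    molecular_weight: float   # in daltons
    logp: float               # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(p):
    """Count how many of the four Rule of Five thresholds are exceeded."""
    checks = [
        p.molecular_weight > 500,
        p.logp > 5,
        p.h_bond_donors > 5,
        p.h_bond_acceptors > 10,
    ]
    return sum(checks)


# Illustrative values only; not taken from ChEMBL.
small = MolecularProperties(molecular_weight=180.2, logp=1.2,
                            h_bond_donors=1, h_bond_acceptors=4)
large = MolecularProperties(molecular_weight=720.0, logp=6.3,
                            h_bond_donors=3, h_bond_acceptors=12)

for label, props in [("small", small), ("large", large)]:
    print(label, rule_of_five_violations(props))
```

A compound with zero or one violation is usually considered to satisfy the rule; the cut-off applied in practice is a choice left to the user.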

Page 37: Bioinformatics tools for biologists @ the EBI An overview.

Chemogenomics: ChEMBL

ChEMBL

Page 38: Bioinformatics tools for biologists @ the EBI An overview.

Pathways: Reactome

• A free, online, open-source curated database of pathways and reactions in human biology

• Information in the database is authored by expert biologist researchers and maintained by Reactome editorial staff

• Used to infer orthologous events in 22 non-human species including mouse, rat, chicken, puffer fish, worm, fly, yeast

• Extensively cross-referenced to other resources e.g. NCBI, Ensembl, UCSC Genome Browser, UniProt, PubMed, KEGG, ChEBI and GO.

Page 39: Bioinformatics tools for biologists @ the EBI An overview.

Pathways: Reactome

• View reactions and events in detail

• Select a pathway

• Compare events in different species

• Export pathway

Page 40: Bioinformatics tools for biologists @ the EBI An overview.

Pathways: Reactome

• Display expression data

• Link to source databases

Page 41: Bioinformatics tools for biologists @ the EBI An overview.

Biological ontologies: Gene Ontology (GO)

• The GO project is a collaborative effort to address the need for consistent descriptions of gene products in different databases

• GO develops ontologies that describe biological processes, cellular components and molecular functions in a species-independent manner

• GO terms are also used to annotate entries in several of the EBI’s databases

Page 42: Bioinformatics tools for biologists @ the EBI An overview.

User support

• 2Can bioinformatics user support – www.ebi.ac.uk/2Can

• Online help pages – www.ebi.ac.uk/help

• E-mail support – www.ebi.ac.uk/support

Page 43: Bioinformatics tools for biologists @ the EBI An overview.

http://www.ebi.ac.uk/Information/Brochures/

Page 44: Bioinformatics tools for biologists @ the EBI An overview.

Research: www.ebi.ac.uk/groups

Page 45: Bioinformatics tools for biologists @ the EBI An overview.

Key facts about research

• The EBI provides a unique environment for bioinformatics research

• Seven dedicated research groups aim to understand biology through new approaches to interpreting biological data

• Services teams also carry out R&D to enhance existing services and develop new ones

• Research program complements services and the two are mutually supportive

Page 46: Bioinformatics tools for biologists @ the EBI An overview.

Research

• Mammalian stem cell differentiation and development: Bertone

• Vertebrate genome annotation: Flicek

• Genome analysis using evolutionary tools: Goldman

• Transcriptome analysis on a genomic scale: Brazma

• Functional genomics and small RNA analysis: Enright

• Literature analysis and semantic data integration in life science research: Rebholz-Schuhmann

• Protein sequence analysis and functional annotation: Apweiler

• Cheminformatics and metabolism: Steinbeck

• Chemogenomics and drug discovery: Overington

• Neurobiology networks and systems: Le Novère

• Genome-scale analysis of regulatory systems: Luscombe

• Analysis of protein structure, function and evolution: Thornton

• Algorithmic methods for genome analysis: Birney

• Analysis and validation of protein structures; protein–ligand interactions: Kleywegt

• Systems Biomedicine: Saez-Rodriguez

• Evolutionary biology: Marioni

Page 47: Bioinformatics tools for biologists @ the EBI An overview.

Training: www.ebi.ac.uk/training

Page 48: Bioinformatics tools for biologists @ the EBI An overview.

A tripartite user-training programme

• Bioinformatics Roadshow: training comes to you (www.ebi.ac.uk/training/roadshow)

• eLearning programme: training any time, anywhere, at any pace (www.ebi.ac.uk/training/elearning)

• Hands-on training at EMBL-EBI: hands-on user training on all our core data resources for researchers (www.ebi.ac.uk/training/handson)

Page 49: Bioinformatics tools for biologists @ the EBI An overview.

Hands-on training for all levels of experience

• Interactive training in our purpose-built IT training suite at EMBL-EBI, Hinxton, Cambridge

• Learn from the EBI’s experts through a combination of talks and practical exercises

• Take a tour of all our core data resources, or focus in on specific data types

• Full programme at www.ebi.ac.uk/training/handson

Page 50: Bioinformatics tools for biologists @ the EBI An overview.

eLearning project – pilot phase

• Do you want to learn at your own pace at a time that suits you?

• We are developing a new eLearning platform and need our users to help us test it

• If you would like to get involved, contact: [email protected]