Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology



S. cerevisiae

D. melanogaster

Cells that normally survive: CED-9 ON; CED-3 and CED-4 OFF

Cells that normally die: CED-9 OFF; CED-3 and CED-4 ON

C. elegans

M. musculus

MCM3

MCM2

CDC46/MCM5

CDC47/MCM7

CDC54/MCM4

MCM6

These proteins form a hexamer in the species that have been examined

Comparison of sequences from 4 organisms

A Common Language for Annotation of Genes from Yeast, Flies and Mice

The Gene Ontologies

…and Plants and Worms

…and Humans

…and anything else!

Gene Ontology - 1998

• FlyBase - Drosophila - Cambridge, EBI, Harvard, Berkeley & Bloomington
• SGD - Saccharomyces - Stanford
• MGI - Mus - Jackson Labs., Bar Harbor

Gene Ontology - now

• Fruitfly - FlyBase
• Budding yeast - Saccharomyces Genome Database (SGD)
• Mouse - Mouse Genome Database (MGD & GXD)
• Rat - Rat Genome Database (RGD)
• Weed - The Arabidopsis Information Resource (TAIR)
• Worm - WormBase
• Dictyostelium discoideum - dictyBase
• InterPro/UniProt at EBI - InterPro
• Fission yeast - PomBase
• Human - UniProt, Ensembl, NCBI, Incyte, Celera, Compugen
• Parasites - Plasmodium, Trypanosoma, Leishmania - GeneDB - Sanger
• Microbes - Vibrio, Shewanella, B. anthracis, … - TIGR
• Grasses - rice & maize - Gramene database
• Zebrafish - ZFIN
• …

To provide structured controlled vocabularies for the representation of biological knowledge in biological databases.

• Be open source

• Use open standards

• Make data & code available without constraint

• Involve your community

Gene Ontology Objectives

• GO represents concepts used to classify specific parts of our biological knowledge:
– Biological Process
– Molecular Function
– Cellular Component

• GO develops a common language applicable to any organism

• GO terms can be used to annotate gene products from any species, allowing comparison of information across species

GO: Three ontologies

For each gene product:

• What does it do? Molecular Function
• What processes is it involved in? Biological Process
• Where does it act? Cellular Component

Content of GO

• Molecular Function: 7,309 terms
• Biological Process: 10,041 terms
• Cellular Component: 1,629 terms
• Total: 18,975 terms
• Definitions: 94.9%
• Obsolete terms: 992

What’s in a GO term?

term: gluconeogenesis

id: GO:0006094

definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
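A GO term bundles a name, a stable identifier and a definition. The sketch below shows one way to hold that record in Python. The `GOTerm` class and its field names are hypothetical and chosen only for illustration; they are not part of any GO software.

```python
from dataclasses import dataclass
import re

# Hypothetical record type for illustration; field names are assumptions.
@dataclass(frozen=True)
class GOTerm:
    go_id: str       # stable identifier, e.g. "GO:0006094"
    name: str        # human-readable term name
    definition: str  # free-text definition

    def __post_init__(self):
        # GO identifiers are "GO:" followed by seven digits.
        if not re.fullmatch(r"GO:\d{7}", self.go_id):
            raise ValueError(f"malformed GO id: {self.go_id!r}")

# The example term from the slide.
gluconeogenesis = GOTerm(
    go_id="GO:0006094",
    name="gluconeogenesis",
    definition=("The formation of glucose from noncarbohydrate precursors, "
                "such as pyruvate, amino acids and glycerol."),
)
```

Keeping the identifier separate from the name means the name can be reworded later without breaking anything that refers to the term by its id.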

Annotation of gene products with GO terms

Mitochondrial P450

• Cellular component: mitochondrial inner membrane GO:0005743
• Biological process: electron transport GO:0006118
• Molecular function: monooxygenase activity GO:0004497

substrate + O2 = CO2 + H2O product

Other gene products annotated to monooxygenase activity (GO:0004497)

- monooxygenase, DBH-like 1 (mouse)

- prostaglandin I2 (prostacyclin) synthase (mouse)

- flavin-containing monooxygenase (yeast)

- ferulate-5-hydroxylase 1 (Arabidopsis)
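Because the same GO term is attached to products from different organisms, grouping annotations by term gives a cross-species view. A minimal sketch, assuming annotations are held as simple (product, species, GO id) tuples; that layout and the function name are illustrative choices:

```python
from collections import defaultdict

# (gene product, species, GO id) triples for the products listed above.
annotations = [
    ("monooxygenase, DBH-like 1", "mouse", "GO:0004497"),
    ("prostaglandin I2 (prostacyclin) synthase", "mouse", "GO:0004497"),
    ("flavin-containing monooxygenase", "yeast", "GO:0004497"),
    ("ferulate-5-hydroxylase 1", "Arabidopsis", "GO:0004497"),
]

def species_by_term(records):
    """Group the species that have at least one product annotated to each GO id."""
    index = defaultdict(set)
    for _product, species, go_id in records:
        index[go_id].add(species)
    return index

index = species_by_term(annotations)
print(sorted(index["GO:0004497"]))
```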

What’s in a name?

• Glucose synthesis
• Glucose biosynthesis
• Glucose formation
• Glucose anabolism
• Gluconeogenesis

• All refer to the process of making glucose from simpler components
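One way to picture what a controlled vocabulary buys: every variant name resolves to the same identifier. The sketch below assumes a plain dictionary lookup with the gluconeogenesis id GO:0006094; the `resolve` helper is hypothetical.

```python
# The five names from the slide, all pointing at the single id for gluconeogenesis.
SYNONYMS = {
    "glucose synthesis": "GO:0006094",
    "glucose biosynthesis": "GO:0006094",
    "glucose formation": "GO:0006094",
    "glucose anabolism": "GO:0006094",
    "gluconeogenesis": "GO:0006094",
}

def resolve(name):
    """Return the GO id for a name or synonym, ignoring case and extra spaces."""
    key = " ".join(name.lower().split())
    return SYNONYMS.get(key)  # None when the name is not in the vocabulary

print(resolve("Glucose  Anabolism"))
```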

Ontology Relationships: Directed Acyclic Graph

[Figure panels: tree; directed acyclic graph]

Parent-Child Relationships

Nucleus
• Nucleoplasm
• Nuclear envelope
• Chromosome
• Perinuclear space
• Nucleolus

A child is a subset of a parent’s elements

The cell component term Nucleus has 5 children
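The difference between a tree and a directed acyclic graph is that a term in a DAG may have more than one parent. The sketch below stores the Nucleus example as child-to-parents links and walks upward to collect ancestors. The terms "Cell" and "Endomembrane system", and the second parent given to "Nuclear envelope", are assumptions added only to show a multi-parent case.

```python
# child -> set of parents; a DAG allows more than one parent per child.
PARENTS = {
    "Nucleoplasm": {"Nucleus"},
    "Nuclear envelope": {"Nucleus", "Endomembrane system"},  # second parent: assumption
    "Chromosome": {"Nucleus"},
    "Perinuclear space": {"Nucleus"},
    "Nucleolus": {"Nucleus"},
    "Nucleus": {"Cell"},              # assumed root for this sketch
    "Endomembrane system": {"Cell"},  # assumed
}

def ancestors(term):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def children(term):
    """List the direct children of a term, sorted by name."""
    return sorted(c for c, ps in PARENTS.items() if term in ps)

print(children("Nucleus"))
print(sorted(ancestors("Nuclear envelope")))
```

Since a child is a subset of its parent's elements, a gene product annotated to a child term can be treated as annotated to every term that `ancestors` returns.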

Evidence Codes for GO Annotations

http://www.geneontology.org/doc/GO.evidence.html

IEA Inferred from Electronic Annotation

ISS Inferred from Sequence Similarity

IEP Inferred from Expression Pattern

IMP Inferred from Mutant Phenotype

IGI Inferred from Genetic Interaction

IPI Inferred from Physical Interaction

IDA Inferred from Direct Assay

RCA Inferred from Reviewed Computational Analysis

TAS Traceable Author Statement

NAS Non-traceable Author Statement

IC Inferred by Curator

ND No biological Data available
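The code table above maps directly onto a lookup structure. A minimal sketch follows; the helper names are hypothetical, and treating IEA as the marker of an electronic annotation is an assumption read off the code's name.

```python
# Evidence codes and expansions as listed on the slide.
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}

def describe(code):
    """Expand an evidence code, raising ValueError for codes not in the table."""
    try:
        return EVIDENCE_CODES[code.upper()]
    except KeyError:
        raise ValueError(f"unknown evidence code: {code!r}") from None

def is_electronic(code):
    # Assumption: IEA marks annotations made electronically.
    return code.upper() == "IEA"

print(describe("IDA"))
```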

Annotation summaries

Meloidogyne incognita: McCarter et al. 2003

Two types of GO Annotations:

Electronic Annotation

Manual Annotation

All annotations must:

• be attributed to a source

• indicate what evidence was found to support the GO term-gene/protein association
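The two rules above can be checked mechanically. The sketch below assumes a hypothetical `Annotation` record with `source` and `evidence` fields, and the two sample records are made up purely to exercise the check.

```python
from dataclasses import dataclass

# Evidence codes as listed on the evidence-code slide.
VALID_CODES = {"IEA", "ISS", "IEP", "IMP", "IGI", "IPI",
               "IDA", "RCA", "TAS", "NAS", "IC", "ND"}

@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_id: str
    evidence: str  # evidence code supporting the association
    source: str    # where the annotation came from, e.g. a PubMed reference

def check(annotation):
    """Return a list of problems; an empty list means both rules are met."""
    problems = []
    if not annotation.source.strip():
        problems.append("missing source")
    if annotation.evidence not in VALID_CODES:
        problems.append("missing or unknown evidence code")
    return problems

# Hypothetical records with placeholder identifiers.
ok = Annotation("geneX", "GO:0006094", "IDA", "PMID:0000000")
bad = Annotation("geneY", "GO:0006094", "", "")
print(check(ok), check(bad))
```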

Manual Annotations

• High-quality, specific gene/gene product associations made using:
– Peer-reviewed papers
– Evidence codes to grade evidence

BUT: manual annotation is very time-consuming and requires trained biologists

Manual Annotations: Methods

1. Curators extract information from published literature

2. Curators perform manual sequence similarity analyses to transfer annotations between highly similar gene products (BLAST, protein domain analysis)

Finding GO terms

"In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…"

PubMed ID: 12374299

• "wound response" -> Process: response to wounding GO:0009611
• "serine/threonine kinase activity" -> Function: protein serine/threonine kinase activity GO:0004674
• "integral membrane protein" -> Component: integral to plasma membrane GO:0005887

Electronic Annotations

• Provide large coverage
• High-quality

BUT: annotations tend to use high-level GO terms and provide little detail.

Electronic Annotations: Methods

1. Database entries
• Manual mapping of GO terms to concepts external to GO (‘translation tables’)
• Proteins then electronically annotated with the relevant GO term(s)

2. Automatic sequence similarity analyses to transfer annotations between highly similar gene products

• Fatty acid biosynthesis (Swiss-Prot Keyword) -> GO:Fatty acid biosynthesis (GO:0006633)
• EC:6.4.1.2 (EC number) -> GO:acetyl-CoA carboxylase activity (GO:0003989)
• IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) -> GO:acetyl-CoA carboxylase activity (GO:0003989)

Electronic Annotations

Mappings of external concepts to GO

EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022
EC:1.1.1.10 > GO:L-xylulose reductase activity ; GO:0050038
EC:1.1.1.104 > GO:4-oxoproline reductase activity ; GO:0016617
EC:1.1.1.105 > GO:retinol dehydrogenase activity ; GO:0004745
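A translation table in this form can drive electronic annotation: parse each mapping line, then look up a protein's external cross-references. The sketch below assumes the line layout shown in the four EC mappings above; the function names and the protein cross-reference list in the usage line are hypothetical.

```python
import re

MAPPING_LINES = """\
EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022
EC:1.1.1.10 > GO:L-xylulose reductase activity ; GO:0050038
EC:1.1.1.104 > GO:4-oxoproline reductase activity ; GO:0016617
EC:1.1.1.105 > GO:retinol dehydrogenase activity ; GO:0004745
"""

# external id, then " > GO:<term name> ; <GO id>"
LINE_RE = re.compile(r"^(?P<ext>\S+) > GO:(?P<name>.+?) ; (?P<go_id>GO:\d{7})$")

def parse_mappings(text):
    """Build {external id: [(GO id, term name), ...]} from mapping lines."""
    table = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            table.setdefault(m["ext"], []).append((m["go_id"], m["name"]))
    return table

def annotate(protein_xrefs, table):
    """Return the GO ids implied by a protein's external cross-references."""
    return sorted({go_id for x in protein_xrefs for go_id, _ in table.get(x, [])})

table = parse_mappings(MAPPING_LINES)
print(annotate(["EC:1.1.1.105", "EC:9.9.9.9"], table))  # second xref is a made-up unmapped id
```

Annotations produced this way are only as specific as the mapped term, which is why electronic annotations tend to sit at higher levels of the ontology.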

Additional points

• A gene product can have several functions, cellular locations and be involved in many processes

• Annotation of a gene product to one ontology is independent from its annotation to other ontologies

• Annotations are only to terms reflecting a normal activity or location

• Usage of ‘unknown’ GO terms

Unknown vs. Unannotated

• “Unknown” is used when the curator has determined that there is no existing literature to support an annotation.
– Biological process unknown GO:0000004
– Molecular function unknown GO:0005554
– Cellular component unknown GO:0008372

• NOT the same as having no annotation at all
– No annotation means that no one has looked yet
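The distinction matters when summarising a genome: an empty annotation list and an explicit "unknown" term should be counted separately. A minimal sketch, assuming annotations arrive as a list of GO ids per gene product; the function name and status labels are hypothetical.

```python
# The three "unknown" terms listed on the slide.
UNKNOWN_TERMS = {
    "GO:0000004": "Biological process unknown",
    "GO:0005554": "Molecular function unknown",
    "GO:0008372": "Cellular component unknown",
}

def annotation_status(go_ids):
    """Classify a gene product by the GO ids attached to it."""
    if not go_ids:
        return "unannotated"  # no one has looked yet
    if all(go_id in UNKNOWN_TERMS for go_id in go_ids):
        return "unknown"      # a curator looked and found no supporting literature
    return "annotated"

print(annotation_status([]))
print(annotation_status(["GO:0005554"]))
print(annotation_status(["GO:0005554", "GO:0006094"]))
```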

Annotation of a genome

• GO annotations are always a work in progress

• Part of normal curation process

– More specific information

– Better evidence code

• Replace obsolete terms

• “Last reviewed” date

How to access the Gene Ontology and its annotations

1. Downloads
• Ontologies
• Annotations: gene association files
• Ontologies and annotations

2. Web-based access
• AmiGO (http://www.godatabase.org)
• QuickGO (http://www.ebi.ac.uk/ego)

among others…
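Downloaded gene association files are tab-delimited text, so they can be read with standard tools. The sketch below assumes the usual column order of the GO gene association format for the first nine columns and treats lines starting with "!" as comments. The sample line is made up, and its identifiers are placeholders rather than real database entries.

```python
import csv
import io

# Column order assumed from the GO gene association file layout (first nine columns).
COLUMNS = ["db", "object_id", "symbol", "qualifier", "go_id",
           "reference", "evidence", "with_from", "aspect"]

def read_associations(handle):
    """Yield one dict per annotation line, skipping '!' comment lines."""
    data_lines = (line for line in handle if not line.startswith("!"))
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) >= len(COLUMNS):
            yield dict(zip(COLUMNS, row))

# Made-up sample content for illustration only.
SAMPLE = (
    "! comment line\n"
    "ExampleDB\tEX0001\tgeneX\t\tGO:0006094\tPMID:0000000\tIDA\t\tP\n"
)

rows = list(read_associations(io.StringIO(SAMPLE)))
print(rows[0]["symbol"], rows[0]["go_id"], rows[0]["evidence"])
```

For a real download, pass an open file handle in place of the `io.StringIO` object.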

Groups for Lecture Four paper discussion (about 5 minutes of class discussion time): A, C, D, E, H, M, S