Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology
![Page 1: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/1.jpg)
Lecture Four:
GO: The Gene Ontology----Infrastructure for Systems Biology
![Page 2: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/2.jpg)
S. cerevisiae
![Page 3: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/3.jpg)
D. melanogaster
![Page 4: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/4.jpg)
C. elegans
Cells that normally survive: CED-9 ON; CED-3 and CED-4 OFF
Cells that normally die: CED-9 OFF; CED-3 and CED-4 ON
![Page 5: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/5.jpg)
M. musculus
![Page 6: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/6.jpg)
Comparison of sequences from 4 organisms
MCM3
MCM2
CDC46/MCM5
CDC47/MCM7
CDC54/MCM4
MCM6
These proteins form a hexamer in the species that have been examined
![Page 7: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/7.jpg)
The Gene Ontologies
A Common Language for Annotation of Genes from Yeast, Flies and Mice
…and Plants and Worms
…and Humans
…and anything else!
![Page 8: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/8.jpg)
Gene Ontology - 1998
• FlyBase - Drosophila - Cambridge, EBI, Harvard, Berkeley & Bloomington
• SGD - Saccharomyces - Stanford
• MGI - Mus - Jackson Labs., Bar Harbor
![Page 9: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/9.jpg)
Gene Ontology - now
• Fruit fly - FlyBase
• Budding yeast - Saccharomyces Genome Database (SGD)
• Mouse - Mouse Genome Database (MGD & GXD)
• Rat - Rat Genome Database (RGD)
• Weed - The Arabidopsis Information Resource (TAIR)
• Worm - WormBase
• Dictyostelium discoideum - dictyBase
• InterPro/UniProt at EBI - InterPro
• Fission yeast - PomBase
• Human - UniProt, Ensembl, NCBI, Incyte, Celera, Compugen
• Parasites - Plasmodium, Trypanosoma, Leishmania - GeneDB - Sanger
• Microbes - Vibrio, Shewanella, B. anthracis, … - TIGR
• Grasses - rice & maize - Gramene database
• Zebrafish - ZFIN
• …
![Page 10: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/10.jpg)
To provide structured controlled vocabularies for the representation of biological knowledge in biological databases.
![Page 11: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/11.jpg)
• Be open source
• Use open standards
• Make data & code available without constraint
• Involve your community
![Page 12: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/12.jpg)
Gene Ontology Objectives
• GO represents concepts used to classify specific parts of our biological knowledge:
– Biological Process
– Molecular Function
– Cellular Component
• GO develops a common language applicable to any organism
• GO terms can be used to annotate gene products from any species, allowing comparison of information across species
![Page 13: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/13.jpg)
GO: Three ontologies
For a gene product:
• What does it do? Molecular Function
• What processes is it involved in? Biological Process
• Where does it act? Cellular Component
![Page 14: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/14.jpg)
Content of GO
• Molecular Function: 7,309 terms
• Biological Process: 10,041 terms
• Cellular Component: 1,629 terms
• Total: 18,975 terms
• Definitions: 94.9 %
• Obsolete terms: 992
![Page 15: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/15.jpg)
What’s in a GO term?
term: gluconeogenesis
id: GO:0006094
definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
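The three fields on this slide (term, id, definition) map naturally onto a small record type. The following is a minimal sketch, not GO's own data model: the class name `GOTerm` and the check that an identifier is `GO:` followed by seven digits are illustrative choices based on the identifiers shown in these slides.

```python
import re
from dataclasses import dataclass

# Identifiers in these slides all look like "GO:" plus seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class GOTerm:
    """Hypothetical record holding the three fields shown on the slide."""
    go_id: str
    name: str
    definition: str

    def __post_init__(self):
        # Reject identifiers that do not follow the GO:NNNNNNN shape.
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"malformed GO identifier: {self.go_id!r}")


gluconeogenesis = GOTerm(
    go_id="GO:0006094",
    name="gluconeogenesis",
    definition=(
        "The formation of glucose from noncarbohydrate precursors, "
        "such as pyruvate, amino acids and glycerol."
    ),
)
```

Keeping the identifier separate from the name is the point of the design: the name can be reworded while the id stays stable.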
![Page 16: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/16.jpg)
Annotation of gene products with GO terms
Mitochondrial P450
![Page 17: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/17.jpg)
• Cellular component: mitochondrial inner membrane GO:0005743
• Biological process: electron transport GO:0006118
• Molecular function: monooxygenase activity GO:0004497
substrate + O2 = CO2 + H2O product
![Page 18: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/18.jpg)
Other gene products annotated to monooxygenase activity (GO:0004497)
- monooxygenase, DBH-like 1 (mouse)
- prostaglandin I2 (prostacyclin) synthase (mouse)
- flavin-containing monooxygenase (yeast)
- ferulate-5-hydroxylase 1 (Arabidopsis)
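Because every gene product on this slide carries the same GO identifier, grouping by term gives a cross-species view directly. A minimal sketch, using only the four gene products listed above; the tuple layout and the function name `species_by_term` are illustrative assumptions.

```python
from collections import defaultdict

# (gene product, species, GO id), taken from the slide.
ANNOTATIONS = [
    ("monooxygenase, DBH-like 1", "mouse", "GO:0004497"),
    ("prostaglandin I2 (prostacyclin) synthase", "mouse", "GO:0004497"),
    ("flavin-containing monooxygenase", "yeast", "GO:0004497"),
    ("ferulate-5-hydroxylase 1", "Arabidopsis", "GO:0004497"),
]


def species_by_term(annotations):
    """Map each GO id to the set of species with a gene product annotated to it."""
    index = defaultdict(set)
    for _gene_product, species, go_id in annotations:
        index[go_id].add(species)
    return index


index = species_by_term(ANNOTATIONS)
print(sorted(index["GO:0004497"]))
```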
![Page 19: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/19.jpg)
![Page 20: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/20.jpg)
![Page 21: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/21.jpg)
What’s in a name?
• Glucose synthesis
• Glucose biosynthesis
• Glucose formation
• Glucose anabolism
• Gluconeogenesis
All refer to the process of making glucose from simpler components
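One way to picture a controlled vocabulary is as a lookup that sends every variant name to a single identifier. The sketch below assumes that all five names on this slide resolve to GO:0006094, the gluconeogenesis id given earlier in the lecture; the table and the helper names are illustrative, not taken from GO's synonym lists.

```python
GLUCONEOGENESIS_ID = "GO:0006094"

# The five names from the slide, all assumed to denote the same term.
SYNONYMS = {
    "glucose synthesis": GLUCONEOGENESIS_ID,
    "glucose biosynthesis": GLUCONEOGENESIS_ID,
    "glucose formation": GLUCONEOGENESIS_ID,
    "glucose anabolism": GLUCONEOGENESIS_ID,
    "gluconeogenesis": GLUCONEOGENESIS_ID,
}


def normalize(label):
    """Lower-case the label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def resolve(label):
    """Return the GO id for a free-text label, or None if it is not in the table."""
    return SYNONYMS.get(normalize(label))


print(resolve("Glucose  Biosynthesis"))
```

Two databases that store different names for the process can still be compared once both are resolved to the same id.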
![Page 22: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/22.jpg)
Tree vs. directed acyclic graph
![Page 23: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/23.jpg)
Parent-Child Relationships
Nucleus
• Nucleoplasm
• Nuclear envelope
• Chromosome
• Perinuclear space
• Nucleolus
The cell component term Nucleus has 5 children
A child is a subset of a parent’s elements
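The parent-child structure can be sketched as a child-to-parents map, which is what lets a term have more than one parent (a directed acyclic graph rather than a tree). In the sketch below the five children of Nucleus come from the slide; the root "cell" and the second parent "organelle envelope" are assumptions added only to show a term with two parents.

```python
# Child -> set of parents. A tree would allow only one parent per term.
PARENTS = {
    "nucleoplasm": {"nucleus"},
    "nuclear envelope": {"nucleus", "organelle envelope"},  # second parent: assumption
    "chromosome": {"nucleus"},
    "perinuclear space": {"nucleus"},
    "nucleolus": {"nucleus"},
    "nucleus": {"cell"},             # "cell" as root: assumption
    "organelle envelope": {"cell"},  # assumption
    "cell": set(),
}


def children(term):
    """Direct children of a term."""
    return {child for child, parents in PARENTS.items() if term in parents}


def ancestors(term):
    """All terms reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(len(children("nucleus")))
print(sorted(ancestors("nuclear envelope")))
```

An annotation to a child term implies annotation to every ancestor, so a query at "nucleus" should also return gene products annotated to "nucleolus".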
![Page 24: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/24.jpg)
Ontology Relationships: Directed Acyclic Graph
![Page 25: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/25.jpg)
![Page 26: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/26.jpg)
Evidence Codes for GO Annotations
http://www.geneontology.org/doc/GO.evidence.html
![Page 27: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/27.jpg)
IEA Inferred from Electronic Annotation
ISS Inferred from Sequence Similarity
IEP Inferred from Expression Pattern
IMP Inferred from Mutant Phenotype
IGI Inferred from Genetic Interaction
IPI Inferred from Physical Interaction
IDA Inferred from Direct Assay
RCA Inferred from Reviewed Computational Analysis
TAS Traceable Author Statement
NAS Non-traceable Author Statement
IC Inferred by Curator
ND No biological Data available
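The twelve codes above fit in a small lookup table. The sketch below also separates IEA from the rest, on the assumption (the usual GO convention, not stated on this slide) that IEA is the only code assigned without a curator's review; the function name `is_electronic` is illustrative.

```python
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}


def is_electronic(code):
    """True for IEA; raises KeyError for a code that is not in the table."""
    if code not in EVIDENCE_CODES:
        raise KeyError(f"unknown evidence code: {code!r}")
    return code == "IEA"  # assumption: only IEA is assigned without curator review


manual_codes = sorted(code for code in EVIDENCE_CODES if not is_electronic(code))
print(manual_codes)
```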
![Page 28: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/28.jpg)
Meloidogyne incognita: McCarter et al. 2003
Annotation summaries
![Page 29: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/29.jpg)
![Page 30: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/30.jpg)
Two types of GO Annotations:
Electronic Annotation
Manual Annotation
All annotations must:
• be attributed to a source
• indicate what evidence was found to support the GO term-gene/protein association
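The two requirements above can be enforced when an annotation record is created. This is a minimal sketch under stated assumptions: the `Annotation` class and its field names are hypothetical, the allowed codes are the twelve listed on the evidence-code slide, and the example gene product and reference are placeholders rather than real data.

```python
from dataclasses import dataclass

# The twelve evidence codes listed earlier in the lecture.
VALID_EVIDENCE_CODES = frozenset(
    {"IEA", "ISS", "IEP", "IMP", "IGI", "IPI", "IDA", "RCA", "TAS", "NAS", "IC", "ND"}
)


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record: gene product + GO term + evidence + source."""
    gene_product: str
    go_id: str
    evidence_code: str
    source: str

    def __post_init__(self):
        # Requirement 1: every annotation is attributed to a source.
        if not self.source.strip():
            raise ValueError("annotation must be attributed to a source")
        # Requirement 2: every annotation states its supporting evidence.
        if self.evidence_code not in VALID_EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence_code!r}")


# Placeholder values for illustration only.
example = Annotation(
    gene_product="exampleGene",
    go_id="GO:0006094",
    evidence_code="IDA",
    source="PMID:0000000",
)
```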
![Page 31: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/31.jpg)
Manual Annotations
• High-quality, specific gene/gene product associations made using:
– Peer-reviewed papers
– Evidence codes to grade evidence
BUT: manual annotation is very time-consuming and requires trained biologists
![Page 32: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/32.jpg)
Manual Annotations: Methods
1. Extract information from published literature
2. Curators perform manual sequence similarity analyses to transfer annotations between highly similar gene products (BLAST, protein domain analysis)
![Page 33: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/33.jpg)
Finding GO terms
“In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…”
PubMed ID: 12374299
• “wound response”: Process: response to wounding GO:0009611
• “serine/threonine kinase activity”: Function: protein serine/threonine kinase activity GO:0004674
• “integral membrane protein”: Component: integral to plasma membrane GO:0005887
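The step from a phrase in the paper to a GO term can be illustrated with a toy lookup. This is only a sketch of the idea: curators read and judge the paper, they do not match strings, and the three phrase-to-term pairs below are exactly those on the slide. The table and function names are hypothetical.

```python
# Phrase in the text -> (ontology, term name, GO id), as shown on the slide.
PHRASE_TO_TERM = {
    "wound response": ("Process", "response to wounding", "GO:0009611"),
    "serine/threonine kinase activity": (
        "Function", "protein serine/threonine kinase activity", "GO:0004674"),
    "integral membrane protein": (
        "Component", "integral to plasma membrane", "GO:0005887"),
}


def find_terms(text):
    """Return the GO terms whose trigger phrase occurs in the text."""
    lowered = text.lower()
    return [term for phrase, term in PHRASE_TO_TERM.items() if phrase in lowered]


excerpt = (
    "the kinase domain of PERK1 has serine/threonine kinase activity. "
    "PERK1 is an integral membrane protein. "
    "these kinases have been implicated in early stages of wound response"
)
for ontology, name, go_id in find_terms(excerpt):
    print(f"{ontology}: {name} {go_id}")
```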
![Page 34: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/34.jpg)
Electronic Annotations
• Provide large coverage
• High-quality
BUT: annotations tend to use high-level GO terms and provide little detail.
![Page 35: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/35.jpg)
Electronic Annotations: Methods
1. Database entries
– Manual mapping of GO terms to concepts external to GO (‘translation tables’)
– Proteins then electronically annotated with the relevant GO term(s)
2. Automatic sequence similarity analyses to transfer annotations between highly similar gene products
![Page 36: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/36.jpg)
Electronic Annotations
• Fatty acid biosynthesis (Swiss-Prot Keyword): GO: fatty acid biosynthesis (GO:0006633)
• EC:6.4.1.2 (EC number): GO: acetyl-CoA carboxylase activity (GO:0003989)
• IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry): GO: acetyl-CoA carboxylase activity (GO:0003989)
![Page 37: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/37.jpg)
Mappings of external concepts to GO
EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022
EC:1.1.1.10 > GO:L-xylulose reductase activity ; GO:0050038
EC:1.1.1.104 > GO:4-oxoproline reductase activity ; GO:0016617
EC:1.1.1.105 > GO:retinol dehydrogenase activity ; GO:0004745
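The mapping lines above have a regular shape (`external id > GO:term name ; GO id`), so a translation table can be read with a few string splits and then used to annotate proteins electronically. A minimal sketch: the parser handles only the line shape shown on this slide, treating lines that start with `!` as comments is an assumption, and the protein names in the usage example are hypothetical.

```python
MAPPING_TEXT = """\
EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022
EC:1.1.1.10 > GO:L-xylulose reductase activity ; GO:0050038
EC:1.1.1.104 > GO:4-oxoproline reductase activity ; GO:0016617
EC:1.1.1.105 > GO:retinol dehydrogenase activity ; GO:0004745
"""


def parse_mappings(text):
    """Parse 'EXTERNAL > GO:name ; GO:id' lines into {external: [(name, go_id), ...]}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):  # assumption: '!' marks a comment line
            continue
        external, go_part = line.split(" > ", 1)
        name, go_id = go_part.rsplit(" ; ", 1)
        if name.startswith("GO:"):
            name = name[len("GO:"):]
        mapping.setdefault(external, []).append((name, go_id))
    return mapping


def annotate(protein_to_ec, mapping):
    """Transfer GO ids to proteins via their EC numbers, tagged with evidence code IEA."""
    annotations = []
    for protein, ec_numbers in protein_to_ec.items():
        for ec in ec_numbers:
            for _name, go_id in mapping.get(ec, []):
                annotations.append((protein, go_id, "IEA"))
    return annotations


mapping = parse_mappings(MAPPING_TEXT)
# Hypothetical proteins with EC numbers already assigned in some database.
print(annotate({"proteinA": ["EC:1.1.1.1"], "proteinB": ["EC:1.1.1.105"]}, mapping))
```

Every annotation produced this way carries the IEA code, since no curator has reviewed the individual protein.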
![Page 38: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/38.jpg)
Additional points
• A gene product can have several functions and cellular locations, and can be involved in many processes
• Annotation of a gene product to one ontology is independent of its annotation to other ontologies
• Annotations are only to terms reflecting a normal activity or location
• Usage of ‘unknown’ GO terms
![Page 39: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/39.jpg)
Unknown vs. Unannotated
• “Unknown” is used when the curator has determined that there is no existing literature to support an annotation.
– Biological process unknown GO:0000004
– Molecular function unknown GO:0005554
– Cellular component unknown GO:0008372
• NOT the same as having no annotation at all
– No annotation means that no one has looked yet
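The distinction between "unknown" and "unannotated" is easy to lose in downstream analysis, so it can help to make it explicit. A minimal sketch, using the three "unknown" identifiers from this slide; the function name and the three status labels are illustrative choices.

```python
# The three 'unknown' terms listed on the slide.
UNKNOWN_TERMS = frozenset({"GO:0000004", "GO:0005554", "GO:0008372"})


def curation_status(go_ids):
    """Classify a gene product by the GO ids attached to it.

    'unannotated': no one has looked yet (no annotations at all)
    'unknown':     a curator looked and found nothing to support an annotation
    'annotated':   at least one informative term is attached
    """
    if not go_ids:
        return "unannotated"
    if set(go_ids) <= UNKNOWN_TERMS:
        return "unknown"
    return "annotated"


print(curation_status([]))
print(curation_status(["GO:0005554"]))
print(curation_status(["GO:0005554", "GO:0005743"]))
```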
![Page 40: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/40.jpg)
Annotation of a genome
• GO annotations are always a work in progress
• Part of normal curation process
– More specific information
– Better evidence code
• Replace obsolete terms
• “Last reviewed” date
![Page 41: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/41.jpg)
How to access the Gene Ontology and its annotations
1. Downloads
• Ontologies
• Annotations: Gene association files
• Ontologies and Annotations
2. Web-based access
• AmiGO (http://www.godatabase.org)
• QuickGO (http://www.ebi.ac.uk/ego)
among others…
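Once a gene association file has been downloaded, it can be read as tab-separated text. The slides do not describe the file layout, so the sketch below rests on an assumption: the standard GO gene association layout, in which lines starting with `!` are comments and the first seven columns are database, object id, symbol, qualifier, GO id, reference and evidence code. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def read_gene_associations(handle):
    """Read annotation rows from a tab-separated gene association file."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields or fields[0].startswith("!"):
            continue  # skip blank lines and comment/header lines
        if len(fields) < 7:
            raise ValueError(f"expected at least 7 columns, got {len(fields)}")
        rows.append({
            "db": fields[0],
            "object_id": fields[1],
            "symbol": fields[2],
            "go_id": fields[4],
            "reference": fields[5],
            "evidence": fields[6],
        })
    return rows


# Hypothetical file content; not real annotation data.
SAMPLE = "\n".join([
    "!example gene association file",
    "\t".join(["ExampleDB", "EX:0001", "geneA", "", "GO:0006094", "PMID:0000000", "IDA"]),
    "\t".join(["ExampleDB", "EX:0002", "geneB", "", "GO:0004497", "EXAMPLE_REF:1", "IEA"]),
])

rows = read_gene_associations(io.StringIO(SAMPLE))
print(Counter(row["evidence"] for row in rows))
```

Counting rows by evidence code is a quick way to see how much of a file is electronic (IEA) versus manually curated.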
![Page 42: Lecture Four: GO: The Gene Ontology ---- Infrastructure for Systems Biology](https://reader035.fdocuments.in/reader035/viewer/2022062801/56814406550346895db09b30/html5/thumbnails/42.jpg)
Group    Lecture Four: papers for discussion (in-class discussion time: about 5 minutes)
A
C
D
E
H
M
S