Light Intro to the Gene Ontology
Introduction to the Gene Ontology
Nic Weber, LIS 590 Ontology Development in Natural Sciences
9/24/2010
All works are referenced at first use; all images are CC except where noted.
Gene Ontology
Why: “The main opportunity lies in the possibility of automated transfer of biological annotations from the experimentally tractable model organisms to the less tractable organisms based on gene and protein sequence similarity.” Ashburner et al. p 25
*Breakthroughs in sequencing show that a large fraction of the genes specifying core biological functions are shared by all eukaryotes (commonalities at the cellular level)
*Knowledge of the role of a shared protein in one organism can often be transferred to other organisms (less duplication of work, money saved)
*Sequencing takes place at a large scale and new discoveries are constant (need to document change in a controlled way)
*Traditional indexing efforts proved “unwieldy” in fruit fly and mouse sequencing
Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics, 25(1), 25-9. doi: 10.1038/75556.
Gene Ontology
Goals
1. Produce a dynamic, controlled vocabulary that can be applied to eukaryotes, with a formal structure to document and adopt change.
2. Facilitate the annotation of genes and gene products and the dissemination of those annotations.
Because of problems with hierarchical models (e.g. EC numbers), with indexing, and with ambiguous biological terms like “function”, three ontologies were developed:
1. Biological Process
2. Molecular Function
3. Cellular Component
Biological Process
The biological objective to which the gene or gene product contributes. A process is accomplished via one or more ordered assemblies of molecular functions.
*(This is an ordered process in that something goes in, something different comes out)
Molecular Function
The biochemical activity (including binding) of a gene product. Also applies to the capability that a gene product carries as a potential. Describes only what is done, not when or where.
Cellular Component
The place in the cell where a gene product is active. These terms reflect our understanding of eukaryotic cell structure (e.g. ‘ribosome’ or ‘nuclear membrane’).
Dependent vs. Independent Entities
1. Biological Process: Dependent (“occurrents that require support from some substance in order to allow them to occur.” Smith et al. p4)
2. Molecular Function: Dependent (“which means entities which have a necessary reference to the substances in which they inhere.” ibid)
3. Cellular Component: Independent
GO “Terms”
Each “ontology” defines terms representing gene product properties.
Each GO term within an ontology contains the following:
1. a unique alphanumeric identifier
2. a term name (a word or string of words)
3. a definition with cited sources
4. a namespace indicating the domain to which it belongs
Terms may also have:
*synonyms, each classed as exactly equivalent to the term name, broader, narrower, or related
*references to equivalent concepts in other databases
*comments on term meaning or usage
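The required and optional parts of a term listed above can be sketched as a small Python data model. This is a minimal illustration, not GO's own software: the class, field and method names (`GOTerm`, `Synonym`, `def_sources`, `labels`) are hypothetical. Only the three namespace names and the OBO spelling of the four synonym scopes (EXACT, BROAD, NARROW, RELATED) come from the GO format itself; the sample values are taken from the GO:0000010 example term.

```python
from dataclasses import dataclass, field
from typing import List

# The three GO namespaces, one per ontology.
NAMESPACES = {"biological_process", "molecular_function", "cellular_component"}
# OBO spelling of the four synonym scopes: exact, broader, narrower, related.
SCOPES = {"EXACT", "BROAD", "NARROW", "RELATED"}


@dataclass
class Synonym:
    text: str
    scope: str

    def __post_init__(self) -> None:
        if self.scope not in SCOPES:
            raise ValueError(f"unknown synonym scope: {self.scope}")


@dataclass
class GOTerm:
    # Required parts of every term
    id: str                 # unique identifier, e.g. "GO:0000010"
    name: str               # term name
    definition: str         # definition text
    def_sources: List[str]  # sources cited for the definition
    namespace: str          # which of the three ontologies the term belongs to
    # Optional parts (hypothetical field names)
    synonyms: List[Synonym] = field(default_factory=list)
    xrefs: List[str] = field(default_factory=list)  # equivalent concepts elsewhere
    comment: str = ""

    def __post_init__(self) -> None:
        if self.namespace not in NAMESPACES:
            raise ValueError(f"unknown namespace: {self.namespace}")

    def labels(self) -> List[str]:
        """The term name followed by every synonym string."""
        return [self.name] + [s.text for s in self.synonyms]


term = GOTerm(
    id="GO:0000010",
    name="trans-hexaprenyltranstransferase activity",
    definition="Catalysis of the reaction: all-trans-hexaprenyl diphosphate + "
    "isopentenyl diphosphate = diphosphate + all-trans-heptaprenyl diphosphate.",
    def_sources=["EC:2.5.1.30"],
    namespace="molecular_function",
    synonyms=[Synonym("heptaprenyl diphosphate synthase activity", "EXACT")],
    xrefs=["EC:2.5.1.30"],
)
print(term.labels())
```

Validating the namespace and synonym scope at construction time mirrors the slide's point that every term belongs to exactly one of the three ontologies.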
Example GO Term
[Term]
id: GO:0000010
name: trans-hexaprenyltranstransferase activity
namespace: molecular_function
def: "Catalysis of the reaction: all-trans-hexaprenyl diphosphate + isopentenyl diphosphate = diphosphate + all-trans-heptaprenyl diphosphate." [EC:2.5.1.30]
subset: gosubset_prok
synonym: "all-trans-heptaprenyl-diphosphate synthase activity" EXACT [EC:2.5.1.30]
synonym: "all-trans-hexaprenyl-diphosphate:isopentenyl-diphosphate hexaprenyltranstransferase activity" EXACT [EC:2.5.1.30]
synonym: "heptaprenyl diphosphate synthase activity" EXACT [EC:2.5.1.30]
synonym: "heptaprenyl pyrophosphate synthase activity" EXACT [EC:2.5.1.30]
synonym: "heptaprenyl pyrophosphate synthetase activity" EXACT [EC:2.5.1.30]
xref: EC:2.5.1.30
xref: MetaCyc:TRANS-HEXAPRENYLTRANSTRANSFERASE-RXN
is_a: GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups
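A stanza like the one above is plain `tag: value` text, so a few lines of Python are enough to show how it breaks down. This is a simplified sketch, not a full OBO parser: the function names `parse_stanza` and `parse_synonym` are hypothetical, escaped quotes and a literal " ! " inside quoted text are not handled, and the stanza is trimmed to a few lines of the GO:0000010 example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

STANZA = """[Term]
id: GO:0000010
name: trans-hexaprenyltranstransferase activity
namespace: molecular_function
def: "Catalysis of the reaction: all-trans-hexaprenyl diphosphate + isopentenyl diphosphate = diphosphate + all-trans-heptaprenyl diphosphate." [EC:2.5.1.30]
synonym: "heptaprenyl diphosphate synthase activity" EXACT [EC:2.5.1.30]
xref: EC:2.5.1.30
is_a: GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups
"""

# A quoted synonym string followed by its scope keyword.
SYNONYM_RE = re.compile(r'^"(?P<text>[^"]*)"\s+(?P<scope>EXACT|BROAD|NARROW|RELATED)\b')


def parse_stanza(text: str) -> Dict[str, List[str]]:
    """Map each tag of one [Term] stanza to the list of its values.

    Tags such as synonym, xref and is_a can repeat, hence the lists.
    A trailing "! comment" (the human-readable parent name) is dropped.
    """
    tags: Dict[str, List[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(": ")
        value = value.split(" ! ", 1)[0].strip()
        tags[tag].append(value)
    return dict(tags)


def parse_synonym(value: str) -> Tuple[str, str]:
    """Split a synonym value into (text, scope)."""
    match = SYNONYM_RE.match(value)
    if match is None:
        raise ValueError(f"not a synonym value: {value!r}")
    return match.group("text"), match.group("scope")


term = parse_stanza(STANZA)
print(term["is_a"])                       # ['GO:0016765']
print(parse_synonym(term["synonym"][0]))  # ('heptaprenyl diphosphate synthase activity', 'EXACT')
```

Note that the `is_a` line carries only the parent's identifier; the text after `!` is a human-readable comment, which is why the sketch discards it.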
How Do GO Terms Work
GO terms are the nodes of a network, so the connections between each term's parents and children are known; together they form what is technically described as a directed acyclic graph (DAG).
In a GO DAG, terms are nodes and the relationships among them are edges.
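To make the node and edge picture concrete, here is a minimal sketch using a hypothetical four-term graph (the labels A, B, C and D are placeholders, not GO terms). Each term stores its outgoing edges as (parent, relation) pairs, and the ancestors of a term are found by following edges upward breadth-first. The relation type is carried along but ignored here.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph: each child term maps to its (parent, relation) edges.
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "D": [("B", "is_a"), ("C", "part_of")],  # D has two parents
    "B": [("A", "is_a")],
    "C": [("A", "is_a")],
    "A": [],
}


def ancestors(term: str, edges: Dict[str, List[Tuple[str, str]]]) -> Set[str]:
    """All terms reachable by following edges from child to parent (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent, _relation in edges.get(current, []):
            if parent not in seen:  # A is reached twice but recorded once
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("D", EDGES)))  # ['A', 'B', 'C']
```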
What the F*@% is a Directed Acyclic Graph?
Directed graph: a set V whose elements are called nodes or vertices, and a set E of arcs or edges, each connecting one vertex to another in a given direction, written G = (V, E).
Directed acyclic graph: a directed graph with no directed cycles.
*Formed by a collection of vertices and directed edges
*Each edge connects one vertex to another, so that there is no way to start at some vertex A and follow a sequence of edges that eventually loops back to A again.
*Important note: DAGs are distinct from hierarchies in that each term in a DAG may have more than one parent term; these terms are generally connected by ‘is_a’ and ‘part_of’ relations.
Images via: commons.wikimedia.org
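The acyclicity condition can be checked mechanically. The sketch below uses Kahn's algorithm (a standard topological-sort technique, chosen here for illustration, not something GO prescribes): repeatedly remove vertices with no incoming edges; if every vertex can be removed, G = (V, E) has no directed cycle. The example graphs are hypothetical and show that multiple parents are allowed while a loop back to the start is not.

```python
from collections import deque
from typing import Dict, List


def is_acyclic(graph: Dict[str, List[str]]) -> bool:
    """Kahn's algorithm: True if the directed graph has no directed cycle.

    graph maps each vertex to the vertices its outgoing edges point to.
    """
    vertices = set(graph) | {v for targets in graph.values() for v in targets}
    indegree = {v: 0 for v in vertices}
    for targets in graph.values():
        for v in targets:
            indegree[v] += 1
    queue = deque(v for v in vertices if indegree[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in graph.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:  # all of v's incoming edges are gone
                queue.append(v)
    return removed == len(vertices)


# Multiple parents are fine in a DAG (D points to both B and C) ...
dag = {"D": ["B", "C"], "B": ["A"], "C": ["A"]}
# ... but a sequence of edges that loops back to A is not.
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(is_acyclic(dag), is_acyclic(cyclic))  # True False
```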
![Page 12: Light Intro to the Gene Ontology](https://reader035.fdocuments.in/reader035/viewer/2022062708/558c8ff9d8b42af2428b45f6/html5/thumbnails/12.jpg)
GO Directed Acyclic Graph
Image via: commons.wikimedia.org
“Relationships”
Each term has a defined “relationship” to another term in the same ontology or in a related ontology (in GO).
is_a: GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups
Relationship types
is_a … part_of
Originally there were only two relationship types: is_a = subsumption; part_of = partonomic inclusion.
New types: in the last year, regulates, positively_regulates, and negatively_regulates have been added to distinguish gene products that play a regulatory role in a biological process from those that play a direct role.
Problems… is_a
Meant to facilitate “instance of”
In practice, often used to model “is a kind of” relationships between universals.
The is_a relation in its intended meaning indicates a necessary relationship. That is, when we say “eukaryotic cell is_a cell”, we mean that every eukaryotic cell is a cell.
In practice, there are cases of non-necessary subsumption (e.g. transport, or cell growth).
Problems…part_of
Explained usage = “can be a part of”, not “is always a part of”
In GO, part_of is used transitively (e.g. if A part_of B and B part_of C, then also A part_of C)
Cannot meaningfully represent an occurrent, meaning the notion of time is not accurately represented in these relations.
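Transitivity generalizes to chains that mix is_a and part_of. The table below is an assumption for illustration: to my understanding it matches GO's documented inference rules for these two relations (is_a followed by part_of, or part_of followed by is_a, yields part_of), but it leaves out has_part and the regulates family entirely. It also assumes the "always a part of" reading; under the "can be a part of" reading criticized above, chaining part_of is far less safe. The function name `infer_chain` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed chain rules: if A -r1-> B and B -r2-> C, then A -COMPOSE[(r1, r2)]-> C.
COMPOSE: Dict[Tuple[str, str], str] = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",  # transitivity of part_of
}


def infer_chain(relations: List[str]) -> Optional[str]:
    """Relation implied between the first and last term of a path.

    Returns None when the table has no rule for some step.
    """
    if not relations:
        raise ValueError("empty path")
    result: Optional[str] = relations[0]
    for rel in relations[1:]:
        result = COMPOSE.get((result, rel))
        if result is None:
            return None
    return result


print(infer_chain(["part_of", "part_of"]))       # part_of
print(infer_chain(["is_a", "part_of", "is_a"]))  # part_of
print(infer_chain(["part_of", "has_part"]))      # None
```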
Part – Whole …. has_part
Also introduced has_part “…In GO, the relationship A has_part B means that A necessarily (always) has B as a part; i.e., if A exists then B also exists as a part of A. If A does not exist, B may or may not exist.
Example ‘cell envelope’ has_part ‘plasma membrane’”
From: The Gene Ontology Consortium (2010). The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Research, 38(Database issue), D331-5. doi: 10.1093/nar/gkp1018.
![Page 18: Light Intro to the Gene Ontology](https://reader035.fdocuments.in/reader035/viewer/2022062708/558c8ff9d8b42af2428b45f6/html5/thumbnails/18.jpg)
has_part modeled
Annotations (applied terms)
Annotations capture data about a gene or gene product; GO provides the terms to do so. These annotations allow genomic information to be uploaded and shared.
When a gene is annotated to a term, associations between the gene and the term’s parents are implicitly inferred.
Annotations are generated either by a curator or automatically through predictive methods (Rhee et al. p 509)
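The implicit inference from a term to its parents (what GO calls the true path rule) can be sketched as closing each gene's direct annotations over the DAG. The graph, gene names and function names below are hypothetical toy data; relation types are ignored for simplicity.

```python
from typing import Dict, List, Set

# Hypothetical toy graph: each term maps to its parent terms.
PARENTS: Dict[str, List[str]] = {"D": ["B", "C"], "B": ["A"], "C": ["A"], "A": []}

# Hypothetical direct annotations (curated or predicted): gene -> terms.
DIRECT: Dict[str, Set[str]] = {"geneX": {"D"}, "geneY": {"C"}}


def with_ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """The term itself plus every term reachable through its parents."""
    found = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def implied(direct: Dict[str, Set[str]], parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Direct annotations extended to all parent terms."""
    return {
        gene: set().union(*(with_ancestors(t, parents) for t in terms))
        for gene, terms in direct.items()
    }


full = implied(DIRECT, PARENTS)
print(sorted(full["geneX"]))  # ['A', 'B', 'C', 'D']
print(sorted(full["geneY"]))  # ['A', 'C']
```

A query for genes associated with the general term A therefore returns both genes, even though neither was annotated to A directly.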
Annotation Structure
GO annotations have the following data:
• Gene product identifier
• Relevant GO term
• Reference of the annotation (e.g. a journal article)
• Evidence code denoting the type of evidence upon which the annotation is based
• Date of annotation
• Creator of annotation
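One annotation can be sketched as an immutable record with exactly the fields listed above. The class and field names are hypothetical; real GO annotation files (the tab-separated GAF format) carry these values as columns alongside several others. The sample values are those of the cardiac alpha-actin annotation shown on the “Annotation in EMBL-EBI” slide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Annotation:
    gene_product: str   # gene product identifier
    go_term: str        # the GO term being applied
    reference: str      # source of the annotation, e.g. a journal article's PubMed ID
    evidence_code: str  # type of evidence the annotation rests on
    annotated_on: date  # date of annotation
    assigned_by: str    # creator of the annotation


example = Annotation(
    gene_product="UniProtKB:P68032",
    go_term="GO:0060047",
    reference="PMID:17611253",
    evidence_code="IMP",
    annotated_on=date(2008, 6, 6),
    assigned_by="UniProtKB",
)
print(example.go_term, example.evidence_code)  # GO:0060047 IMP
```

Making the record frozen reflects that an annotation is a dated statement by a specific creator; revising it means issuing a new record rather than editing the old one.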
Evidence Codes
Evidence codes are of four types:
1. Experimental
2. Computational
3. Indirectly derived from experimental or computational evidence
4. Unknown
95% of annotations are computational; this is problematic because computational annotations increase coverage but are also more likely to include false positives.
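A share like the 95% above is computed by mapping each annotation's evidence code to a category and counting. The grouping below is an assumption for illustration: the codes are real GO evidence codes (e.g. IMP, Inferred from Mutant Phenotype; IEA, Inferred from Electronic Annotation; ND, No biological Data available), but assigning them to the slide's four categories this way is my own reading. The sample list is hypothetical and does not reproduce any real statistic.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed mapping of GO evidence codes onto the four categories above.
GROUPS = {
    "experimental": ["EXP", "IDA", "IPI", "IMP", "IGI", "IEP"],
    "computational": ["ISS", "ISO", "ISA", "ISM", "IGC", "RCA", "IEA"],
    "indirect": ["TAS", "NAS", "IC"],  # author or curator statements
    "unknown": ["ND"],
}
CATEGORY: Dict[str, str] = {code: group for group, codes in GROUPS.items() for code in codes}


def category_shares(codes: Iterable[str]) -> Dict[str, float]:
    """Fraction of annotations per category; unmapped codes raise KeyError."""
    counts = Counter(CATEGORY[code] for code in codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


# Hypothetical sample: 19 electronic annotations and one mutant-phenotype annotation.
sample = ["IEA"] * 19 + ["IMP"]
print(category_shares(sample))  # {'computational': 0.95, 'experimental': 0.05}
```

Raising on unknown codes, rather than silently counting them as "unknown", keeps a typo in the data from inflating the ND category.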
Annotation Qualifiers
colocalizes_with
contributes_to
NOT (most vital): indicates that the gene product lacks the property described by the term.
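NOT is the most vital qualifier because ignoring it inverts the meaning of an annotation. The sketch below, with hypothetical gene and term names and hypothetical helper functions, separates terms a gene product is asserted to have from terms it is explicitly annotated as lacking. A naive count of rows would wrongly list term3 as a property of geneX.

```python
from typing import FrozenSet, List, NamedTuple, Set


class Assoc(NamedTuple):
    gene: str
    term: str
    qualifiers: FrozenSet[str] = frozenset()  # e.g. {"NOT"}, {"contributes_to"}


def asserted_terms(assocs: List[Assoc], gene: str) -> Set[str]:
    """Terms the gene product is said to have; NOT-qualified rows are excluded."""
    return {a.term for a in assocs if a.gene == gene and "NOT" not in a.qualifiers}


def negated_terms(assocs: List[Assoc], gene: str) -> Set[str]:
    """Terms the gene product is explicitly annotated as lacking."""
    return {a.term for a in assocs if a.gene == gene and "NOT" in a.qualifiers}


# Hypothetical rows for one gene product.
rows = [
    Assoc("geneX", "term1"),
    Assoc("geneX", "term2", frozenset({"contributes_to"})),
    Assoc("geneX", "term3", frozenset({"NOT"})),
]
print(sorted(asserted_terms(rows, "geneX")))  # ['term1', 'term2']
print(sorted(negated_terms(rows, "geneX")))   # ['term3']
```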
Annotation in EMBL-EBI
http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0006915#term=info
(In case link fails, this is a quick view from GO)
Gene product: Actin, alpha cardiac muscle 1, UniProtKB:P68032
GO term: heart contraction; GO:0060047 (biological process)
Evidence code: Inferred from Mutant Phenotype (IMP)
Reference: PMID:17611253
Assigned by: UniProtKB, June 06, 2008
Universals and Particulars
Universals: species E. coli; function: boost insulin
Particulars: E. coli in this petri dish; function: boost insulin in subject X's pancreas
“GO terms correspond, in philosophical terminology, to universals…and each universal corresponding to the term Cell is instantiated by every actual cell.” Smith et al. p 3
Continuants vs. Occurrents
Continuants: entities that continue to exist throughout time (cells, organisms, chromosomes); they preserve their identity while undergoing a variety of changes.
Occurrents (events, processes): Unfold through time.
But…
“Biological process, molecular function and cellular components are all attributes of genes, gene products or gene-product groups.” p. 27
...Do we usually model attributes as ontologies?
Are genes, gene products, or gene product groups “backbone” ontologies, OR superclasses? If these aren’t top-level ontologies, what are they?
Smith et al.; Yu’s “other” example
*Recall Yu’s fourth definition of ontologies
“The Gene Ontology, in spite of its name, is not an ontology as the latter term is commonly used either by information scientists or by philosophers. It is, as the GO Consortium puts it, a ‘controlled vocabulary’…. their efforts have been directed toward providing a practically useful framework for keeping track of the biological annotations that are applied to gene products.” Smith et al. p 1
Problems and Potential Solutions
Each new term requires an understanding of the whole, so curators must be subject experts in order to make meaningful enhancements.
Solution: make explicit the criteria used for discriminating subclassifications by introducing a decision-tree methodology into the construction of each hierarchy. (Is this a good solution?)
Drawbacks to GO
1) It is unclear what kinds of reasoning are permissible on the basis of GO’s hierarchies.
2) The rationale of GO’s subclassifications is unclear. The reasoning that went into current choices has not been preserved and thus cannot be explained to or re-examined by a third party.
3) No procedures are offered by which GO can be validated.
4) There are insufficient rules for determining how to recognize whether a given concept is or is not present in GO. The use of a mere string search presupposes that all concepts already have a single standardized representation, which is not the case.
Smith et al. p6
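Drawback 4 can be seen on the GO:0000010 example term shown earlier: a user searching for one of its recorded synonyms will not find it by matching the term name. The sketch below contrasts an exact match on the name with a lookup over normalized name and synonyms; the helper names and the `Term` record are hypothetical. Even the second search only helps when the concept happens to have a recorded label, which is the deeper point Smith et al. make.

```python
from typing import Dict, List, NamedTuple


class Term(NamedTuple):
    name: str
    synonyms: List[str]


# Labels for GO:0000010, copied from the example term.
TERMS: Dict[str, Term] = {
    "GO:0000010": Term(
        name="trans-hexaprenyltranstransferase activity",
        synonyms=[
            "all-trans-heptaprenyl-diphosphate synthase activity",
            "heptaprenyl diphosphate synthase activity",
            "heptaprenyl pyrophosphate synthase activity",
            "heptaprenyl pyrophosphate synthetase activity",
        ],
    ),
}


def normalize(label: str) -> str:
    """Lower-case and collapse whitespace."""
    return " ".join(label.lower().split())


def exact_name_search(query: str, terms: Dict[str, Term]) -> List[str]:
    """Plain string match against the term name only."""
    return [go_id for go_id, t in terms.items() if t.name == query]


def label_search(query: str, terms: Dict[str, Term]) -> List[str]:
    """Match the normalized query against the name and every synonym."""
    q = normalize(query)
    return [
        go_id
        for go_id, t in terms.items()
        if q in {normalize(label) for label in [t.name, *t.synonyms]}
    ]


query = "Heptaprenyl diphosphate synthase activity"
print(exact_name_search(query, TERMS))  # []
print(label_search(query, TERMS))       # ['GO:0000010']
```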