Outline
• Principles of Compositionality
• Tour of PATO
• Pre vs post composition
• Quantitative phenotypes
• Next steps
Phenotype annotation: why?
• To shed light on the relationships between genes, environment and phenotype
• To compare genes and phenotypes across organisms
• To improve human health and wellbeing
Difficulties
• Phenotypes can be complex
  – Descriptions are often composite
  – Encompass relationships between different kinds of entities, at different levels of granularity
  – Different ways of describing the same thing
• Descriptions must be rigorous and unambiguous
  – Ensures meaningful analyses and comparisons within and between organisms
Compositionality is essential for describing phenotypes
• Compositionality is a principle of good ontology design
  – aka building blocks, cross-products, normalised/modular design
  – Create complex descriptions (definitions) from simpler ones
• Descriptions can be composed at any time
  – Ontology construction time (pre-composition)
  – Annotation time (post-composition)
An example of compositionality
• Plasma membrane of spermatocyte
  • Plasma membrane [GO CC]
  • Spermatocyte [OBO Cell]
• Formal means of composition
  • Genus-differentia
a plasma membrane which is part_of a spermatocyte
Genus: plasma membrane [GO-CC]
Differentia: part_of [OBO-REL] a spermatocyte [Cell]
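The genus-differentia example above can be sketched in code. This is a minimal illustration only, not part of any OBO tool: the `GenusDifferentia` class, its field names, and the `with_article` helper are hypothetical, and the article handling is deliberately simplistic.

```python
from dataclasses import dataclass


def with_article(noun: str) -> str:
    """Prefix a noun phrase with 'a' or 'an' (simplified: first-letter rule only)."""
    article = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{article} {noun}"


@dataclass(frozen=True)
class GenusDifferentia:
    """Hypothetical container for one genus-differentia definition."""
    genus: str     # the more general type, e.g. a GO cellular component
    relation: str  # the differentiating relation, e.g. part_of from OBO-REL
    filler: str    # the type the relation points at, e.g. an OBO Cell term

    def text(self) -> str:
        # Render the definition in the form used on the slide.
        return (f"{with_article(self.genus)} which is "
                f"{self.relation} {with_article(self.filler)}")


definition = GenusDifferentia("plasma membrane", "part_of", "spermatocyte")
print(definition.text())
```

The same three fields are enough to build the description at ontology construction time or at annotation time; only the moment at which the object is created differs.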
Compositionality and ontology tools
• Composition supported by:
  – Phenote
  – OBO-Edit
    • Cross-product plugin
  – Protégé-OWL
  – SWOOP
  – …and others
Advantage: Automatic DAG calculation
a plasma membrane which is part_of a spermatocyte
a membrane which is part_of a germ cell
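One way to picture the automatic DAG calculation is as a subsumption check over composed descriptions. The sketch below is a simplification under stated assumptions: the `IS_A` table is a hypothetical two-entry fragment (plasma membrane is_a membrane, spermatocyte is_a germ cell), and the rule used covers only descriptions that share the same relation. A real reasoner handles far more cases.

```python
# Hypothetical is_a fragment; real ontologies supply these links.
IS_A = {
    "plasma membrane": {"membrane"},
    "spermatocyte": {"germ cell"},
}


def ancestors(term: str) -> set:
    """Return the term itself plus everything reachable over is_a."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumed_by(child: tuple, parent: tuple) -> bool:
    """True if child (genus, relation, filler) is a kind of parent.

    Simplified rule: same relation, child's genus is_a parent's genus,
    and child's filler is_a parent's filler.
    """
    child_genus, child_rel, child_filler = child
    parent_genus, parent_rel, parent_filler = parent
    return (child_rel == parent_rel
            and parent_genus in ancestors(child_genus)
            and parent_filler in ancestors(child_filler))


specific = ("plasma membrane", "part_of", "spermatocyte")
general = ("membrane", "part_of", "germ cell")
print(subsumed_by(specific, general))
```

Because the is_a link between the two composed descriptions is computed rather than asserted, it never has to be maintained by hand.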
The building blocks of phenotype descriptions: EQ
• Entities and qualities (EQ)
  – (Bearer) Entity
    • E.g. compound eye, spermatocyte, blood, wing growth, scale morphogenesis
  – Quality (aka property, attribute)
    • A kind of dependent continuant
    • Defined in PATO
    • E.g. green, hot, squamous, rugose, edematous, light-sensitivity, luminescent, ectopic, arrested, decomposed
Formal treatment of EQ
• We must be clear about what we mean when we compose an E and a Q
  – Otherwise we will have incomplete query results and erroneous statistics in annotations
  – The meaning must be computable
• Formally, an EQ description defines:
  a Quality which inheres_in a bearer entity
Example
[Figure: normal vs eya[1]/eya[1]]
E: Cell death in eye; Q: Increased rate
E: Eye disc cell; Q: small
E: Eye disc cell; Q: refractile
Kinds of entities which can be bearers of biological qualities
• Continuants (3D entities)
  – Cell parts (GO)
  – Cells (OBO Cell ontology)
  – Gross anatomical entities (CARO, FMA, flyAO, MA, zfishAO, …)
  – Aggregates of organisms (?)
• Occurrents (4D entities)
  – Biological processes (GO)
[Figure: normal vs eya[1]/eya[1]]
E: Cell death in eye; Q: Increased rate
E: Eye disc cell; Q: small
E: Eye disc cell; Q: refractile
Sources: E terms from GO and FlyAO; Q terms from PATO
Tour of PATO
• Tour from the top-down
• The top level of PATO has been built according to formal ontological principles
  – This helps us define terms in a consistent and unambiguous way
  – The top level can be hidden from end-users by means of ontology views (aka slims)
  – Still subject to change
    • Feedback welcome!
PATO: Top level division
Quality
• Quality of a continuant: a quality which inheresIn a continuant
• Quality of an occurrent: a quality which inheresIn a process or spatiotemporal region
[Diagram nodes shown: color, morphology, physical quality, cellular quality, density, shape, size, structure, arrested, premature, delayed, duration, rate]
Note: some nodes omitted for brevity
Divisions by granularity
Monadic quality of a continuant
• Physical quality: a quality that exists through action of continuants at the physical level of organisation
• Cellular quality: a quality that exists at the cellular level of organisation
[Diagram nodes shown: color (green, pink, yellow); temperature (hot, cold); mass (large mass, small mass); potency (multipotent, totipotent, oligopotent); nucleate quality (anucleate, binucleate); ploidy (diploid, haploid, aneuploid); …]
Monadic vs relational quality of a continuant
• Monadic quality of a C: a quality of a C that inheres solely in the bearer and does not require another entity
• Relational quality of a C: a quality of a C that requires another entity apart from its bearer to exist
[Diagram nodes shown: Displacement (with), Connected-ness (to), Sensitivity (to); Physical quality, Cellular quality, morphology (shape, size, structure), …]
Example relational quality
• Sensitivity
  – Directed towards some entity type
• E.g.
  – Sensitivity of an eye to red light
    • The quality inheres_in the eye
    • With respect to (towards) red light
  – Pheno-syntax:
    • E= eye Q= sensitivity E2= red_light
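To show that the E / Q / E2 notation is machine-readable, here is a small parsing sketch. It is based only on the notation as it appears on these slides, not on the actual Pheno-syntax grammar; the `parse_eq` function and the regular expression are assumptions made for illustration.

```python
import re

# Matches "E= value", "Q=value" or "E2= value" as written on the slides.
# E2 is listed first so that it is not read as E followed by "2".
_FIELD = re.compile(r"\b(E2|E|Q)\s*=\s*(\S+)")


def parse_eq(line: str) -> dict:
    """Split one slide-style annotation line into its E, Q and E2 parts."""
    return {key: value for key, value in _FIELD.findall(line)}


annotation = parse_eq("E= eye Q= sensitivity E2= red_light")
print(annotation)
```

A monadic quality simply yields a dictionary without an `E2` key, so the same structure covers both kinds of quality.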
On absence
• Annotation patterns for absence and counts are currently under discussion
• “spermatocyte devoid of asters”
  – E= CL:spermatocyte
    • Inheres in the spermatocyte
  – Q= PATO:lacks_part
    • The quality/relation of missing some part or parts
  – E2= GO-CC:aster
    • The quality is with respect to the type “aster”
Pre- vs post- composition
• When do we build the phenotype description?
  – In the ontology?
  – During annotation?
• Reconciling pre and post composition: An analysis of the plant_trait ontology
When do we build the phenotype description?
• Early?
  – Pre-composed phenotype definitions
    • MP:0000017 “big ears”
    • TO:0000227 “root length”
    • TO:0000029 “chlorine sensitivity”
• Late?
  – Post-composed phenotype definitions
    • E= MA:ear Q= PATO:big
    • E= PO:root Q= PATO:length
    • E= organism Q= PATO:sensitivity E2= CHEBI:chlorine
Is this comparable?
MP:0000285 “abnormal cardiac valve morphology”
MP:0000287 “heart valve hypoplasia”
PATO:0000051 “morphology”
PATO:0000141 “structure”
PATO:0000645 “hypoplastic”
MP:0000287 “heart valve hypoplasia” ? E= MA:heart_valve Q= PATO:hypoplastic
Yes: if term is decomposable
MP:0000285 “abnormal cardiac valve morphology”
MP:0000287 “heart valve hypoplasia”
PATO:0000051 “morphology”
PATO:0000141 “structure”
PATO:0000645 “hypoplastic”
MP:0000287 “heart valve hypoplasia” = E= MA:heart_valve Q= PATO:hypoplastic
Def: a hypoplasticity which inheres_in a heart valve
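The comparison works because a decomposable pre-composed term can be expanded to the same EQ pair that a post-composed annotation states directly. The sketch below illustrates this with the single example from the slide; the `DECOMPOSITIONS` table and the function names are hypothetical, and a real system would load the definitions from the ontology rather than hard-code them.

```python
# Pre-composed term -> its (entity, quality) decomposition.
# Only the slide's example is included here.
DECOMPOSITIONS = {
    "MP:0000287": ("MA:heart_valve", "PATO:hypoplastic"),
}


def to_eq(annotation):
    """Return an (entity, quality) pair for either annotation style.

    A string is treated as a pre-composed term ID and looked up;
    anything else is assumed to already be an (entity, quality) pair.
    """
    if isinstance(annotation, str):
        return DECOMPOSITIONS[annotation]  # KeyError if not decomposable
    return tuple(annotation)


def same_phenotype(a, b) -> bool:
    """Compare two annotations after normalising both to EQ form."""
    return to_eq(a) == to_eq(b)


print(same_phenotype("MP:0000287", ("MA:heart_valve", "PATO:hypoplastic")))
```

A term with no entry in the table cannot be compared this way, which is the sense in which the answer is "yes" only if the term is decomposable.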
Comparing phenotypes
• We want to compare and query both within and across species
  – For gross anatomical phenotypes to be compared across species, descriptions must be decomposed or decomposable to anatomical terms
• Anatomical terms must be comparable
  – Homology links
  – CARO: Common Anatomy Reference Ontology
Case study: Defining plant traits with PATO
• OBO Plant Trait ontology
• Pre-composed phenotype terms
  – Analogous to OBO mammalian_phenotype ontology
• Task: Define these terms with PATO
  – A good test of PATO
  – Demonstration of compositional approach
  – Allows meaningful comparison across plant species
  – Pilot study before applying to metazoans
http://www.bioontology.org/wiki/index.php/PATO:Pre_vs_Post_Coordinating
Methods
• Creation of genus-differentia definitions
  – First pass: Obol
  – Second pass: manual editing
• Ontologies used
  – PATO
  – Plant anatomical entities (PO)
  – Gramene environment (GEO)
  – Chemical entities of biological interest (CHEBI)
  – GO
Basic phenotype terms
• “root length” (TO:0000227)
  – E= PO:root Q= PATO:length
  – Formally:
Def: a length which inheres_in a root
Relational qualities involving types of chemical
• “Chlorine sensitivity” [TO:0000029]
• Directed towards an additional entity type
  – Q= PATO:sensitivity E2= CHEBI:chlorine
Def: a sensitivity which is directed towards chlorine [ inheres_in organism ]
Relational qualities involving the environment
• “drought sensitivity” [TO:0000029]
  – Directed towards an additional entity type
  – Q= PATO:sensitivity E2= EO:drought
Def: a sensitivity which is directed towards drought [ inheres_in organism ]
OBO needs a good environment ontology
Complex phenotypes
• “Chinsura boro”
  – “Abortion of microspore development at trinucleate stage”
Def: an arrested which inheres_in ( microspore development which during trinucleate stage )
Results of plant_trait analysis
• 252/784 terms provided with genus-differentia definitions so far
• Helped find inconsistencies and problems in the ontology
• New term suggestions for PATO
  – proportionality
• Approach should work for animal phenotype ontologies
Bacterial phenotypes
• Performed similar analysis on bacterial phenotype terms
  – Provided by Garrity & Hozzein
• Results (morphological only):
  – 26 new terms added to PATO
  – Rugose, rhizoidal, lobate, filamentous, …
  – Todo: chemical utilization phenotypes
• Required:
  – Ontologies for aggregates of organisms
  – Assay ontology
Measurements
• Ontologies provide qualitative partitions on the kinds of entities we find in nature
• We may also want to record quantitative information
  – Comes from measurements of qualities
  – The measurement is not the phenotype
    • Phenotypes exist independently of our measurements of them
Measurement schema
• A measurement record consists of
  – The quality being measured
    • E.g. the length of a particular mouse tail
  – The unit type
    • From PATO UO
  – A magnitude
    • Floating point number
    • Error measure [optional]
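The measurement schema above maps naturally onto a small record type. The following is a sketch only: the `Measurement` class and its field names are hypothetical, and the magnitude used in the example is an invented illustrative value, not data from any study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    """Hypothetical record mirroring the three parts of the schema."""
    quality: str                   # the quality instance being measured
    unit: str                      # a unit type, e.g. a term from PATO UO
    magnitude: float               # floating point number
    error: Optional[float] = None  # optional error measure


# Illustrative value only; 0.08 is made up for the example.
tail = Measurement(
    quality="length of a particular mouse tail",
    unit="meter",
    magnitude=0.08,
)
print(tail)
```

Keeping the record separate from the EQ description reflects the point that the measurement is not itself the phenotype.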
Sample of PATO UO
• Unit
  – Base unit
    • Length unit
      – Angstrom
      – meter
    • Mass unit
      – Dalton
      – Gram
    • Substance unit
  – Derived unit
    • Concentration unit
      – pH
• Quality
  – Morphology
    • Size
      – length
  – Physical quality
    • Mass
Phenotype exchange formats
• Genotypes and phenotypes:
  – Pheno-syntax
  – Pheno-XML
• General purpose
  – OWL (using canonical EQ encoding)
    • Also has Obo equivalent
• GO annotation files
  – Works with pre-coordinated terms only
OBD-Phenotype
• A database for phenotype associations
• Built on OBD framework
  – Tuned for inference and reasoning
  – Graph traversal built in from the start
• Results
  – Annotations on data from OMIM, ZFIN and FlyBase
  – Currently too small a dataset to do analysis
Next steps
• Get PATO & Phenote used across multiple organisms and projects
  – MODs, BIRN, OMIM,
• Collect annotation data from multiple sources in one repository (OBD)
  – Both pre + post composed
  – Demonstrate improved analysis of annotation data using PATO
• filamentous - having thin filamentous extensions at its edge
• pleomorphic - a quality inhering in a cell by virtue of its ability to take on two or more different shapes during its life cycle
• pulvinate - shaped like a cushion or has a marked convex cushion-like form
• umbonate - having a knob or knoblike protuberance
• rugose - having many wrinkles or creases on the surface
• glistening - emitting or reflecting lots of light
• dull - emitting or reflecting little or no light
• viscid - covered with a sticky or clammy coating
• mucoid - consistency of mucus
• spiral - plane curve traced by a point circling about the center but at increasing distances from the center
• rhizoidal - having root like extensions radiating from its center
• spiny - having spines, thorns or similar stiff projections on its surface
• warty - having a hard rough surface; not smooth
• curled - having parallel chains in undulate fashion on the border
• fragile - easily damaged or disrupted; brittle
• butyraceous - resembling butter in appearance and consistency
• undulate - having a wavy, shallow edge
• punctiform - small and resembling a point
• lobate - a morphological quality in which the bearer has deeply undulated edges forming lobes
• erose - having an irregularly toothed edge
• raised - a thick colony that appears above the medium surface with terraced edges
• convex - a shape that obtains by virtue of having inward facing edges; having a surface or boundary that curves or bulges outward, as the exterior of a sphere
Proportions
• “amylose to amylopectin ratio” TO:0000372
Def: a compositionality which is directed towards amylose relative_to amylopectin [ inheres_in organism ]