Post on 08-Jan-2018
Species and Classification in Biology
Barry Smith
http://ifomis.org
DNA (10^-9 m)
[Figure: levels of granularity: DNA, Protein, Organelle, Cell, Tissue, Organ, Organism; scale markers 10^-9 m, 10^-5 m, 10^-1 m]
New golden age of classification*
~ 30 million species
30,000 genes in human
200,000 proteins
100s of cell types
100,000s of disease types
1,000,000s of biochemical pathways (including disease pathways)
*… legacy of Human Genome Project
FUNCTIONAL GENOMICS
proteomics, reactomics, metabonomics,
phenomics, behaviouromics,
toxicopharmacogenomics…
The incompatibilities between different scientific cultures and terminologies
immunology
genetics
cell biology
have resurrected the problem of the unity of science in a new guise:
The logical positivist solution to this problem addressed a world in which sciences are associated with printed texts.
What happens when sciences are associated with databases?
… when each (chemical, pathological, immunological, toxicological) information system uses its own classifications
how can we overcome the incompatibilities which become apparent when data from distinct sources are combined?
Answer:
“Ontology”
= building software artefacts
standardized classification systems / controlled vocabularies
so that data from one source is expressed in a language which makes it compatible with data from every other source
Google hits (in millions) 25.4.06
ontology 52.4
ontology + philosophy 2.7
ontology + information science 6.0
ontology + database 7.8
A Linnaean Species Hierarchy
(Small) Disease Hierarchy
Combining hierarchies
Organisms
Diseases
via Dependence Relations
Organisms Diseases
A Window on Reality
A Window on Reality
Organisms
Diseases
A Window on Reality
How to understand species (aka types, universals, kinds)
Species are something like invariants in reality which can be studied by science
Species have instances: this mouse, this cell, this cell membrane ...
Entity =def
anything which exists, including things and processes, functions and qualities, beliefs and actions, documents and software
Domain =def
a portion of reality that forms the subject-matter of a single science or technology or mode of study;
proteomics
radiology
viral infections in mouse
Representation =def
an image, idea, map, picture, name or description ... of some entity or entities.
Analogue representations
Representational units =def
terms, icons, photographs, identifiers ... which refer, or are intended to refer, to entities
Composite representation =def
a representation (1) built out of representational units which (2) form a structure that mirrors, or is intended to mirror, the entities in some domain
The Periodic Table
Ontologies are here
Ontologies are representational artifacts
What do ontologies represent?
A 515287 DC3300 Dust Collector Fan
B 521683 Gilmer Belt
C 521682 Motor Drive Belt
A 515287 DC3300 Dust Collector Fan
B 521683 Gilmer Belt
C 521682 Motor Drive Belt
instances
types
Two kinds of composite representational artifacts
Databases, inventories: represent what is particular in reality = instances
Ontologies, terminologies, catalogs: represent what is general in reality = types
What do ontologies represent?
Ontologies do not represent concepts in people’s heads
Ontology is a tool of science
Scientists do not describe the concepts in scientists’ heads
They describe the types in reality, as a step towards finding ways to reason about (and treat) instances of these types
The biologist has a cognitive representation which involves theoretical knowledge
derived from textbooks
An ontology is like a scientific text; it is a representation of types in reality
Two kinds of composite representational artifacts
Databases represent instances
Ontologies represent types
Instances stand in similarity relations
Frank and Bill are similar as humans, mammals, animals, etc.
Human, mammal and animal are types at different levels of granularity
[Figure: hierarchy of types (substance, organism, animal, mammal, cat, siamese, frog) together with their instances]
science needs to find uniform ways of representing types
ontology =def a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent
1. types in reality
2. those relations between these types which obtain universally (= for all instances)
lung is_a anatomical structure
lobe of lung part_of lung
is_a
A is_a B =def
For all x, if x instance_of A then x instance_of B
cell division is_a biological process
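The definition of is_a can be checked mechanically on a small finite model. A minimal sketch, assuming a hypothetical table INSTANCES that lists the instances of each type; the individuals and helper names are illustrative and not taken from the slides. The check is purely extensional: it only inspects the instances that happen to be listed.

```python
# Hypothetical toy model: each type is mapped to the set of its instances.
INSTANCES = {
    "cell division": {"division_1", "division_2"},
    "biological process": {"division_1", "division_2", "digestion_1"},
}


def instance_of(x, a):
    """True if individual x is an instance of type a in the toy model."""
    return x in INSTANCES[a]


def is_a(a, b):
    """A is_a B: for all x, if x instance_of A then x instance_of B."""
    return all(instance_of(x, b) for x in INSTANCES[a])


print(is_a("cell division", "biological process"))  # True
print(is_a("biological process", "cell division"))  # False
```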
Entities
Entities
universals (species, types, taxa, …)
particulars (individuals, tokens, instances)
Canonical instances within the realm of individuals
= those individuals which
1. instantiate universals (entering into biological laws)
2. are prototypical
Canonical Anatomy: no Siamese twins, no six-fingered giants, no amputation stumps, …
Entities
universals
instances
junk
example of junk particulars: desk-mountain
Entities
human
Jane
inst
Ontologies are More than Just Taxonomies
The Gene Ontology
7 million Google hits
a cross-species controlled vocabulary for annotations of genes and gene products
deeper than Darwinianism
When a gene is identified
three important types of questions need to be addressed:
1. Where is it located in the cell?
2. What functions does it have on the molecular level?
3. To what biological processes do these functions contribute?
GO has three ontologies
molecular functions
cellular components
biological processes
GO astonishingly influential
used by all major species genome projects
used by all major pharmacological research groups
used by all major bioinformatics research groups
GO part of the Open Biological Ontologies consortium
Fungal Ontology
Plant Ontology
Yeast Ontology
Disease Ontology
Mouse Anatomy Ontology
Cell Ontology
Sequence Ontology
Relations Ontology
Each of GO’s ontologies
is organized in a graph-theoretical structure involving two sorts of links or edges:
is-a (= is a subtype of): copulation is-a biological process
part-of: cell wall part-of cell
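A graph with two sorts of edges can be sketched as follows. This is an illustration only, assuming a hypothetical OntologyGraph class; it is not GO's actual data format or tooling, and the two sample edges are the ones named on the slide.

```python
from collections import defaultdict


class OntologyGraph:
    """Directed graph whose edges are labelled 'is-a' or 'part-of' (hypothetical helper)."""

    EDGE_TYPES = ("is-a", "part-of")

    def __init__(self):
        # edges[label][child] holds the set of direct parents under that label
        self.edges = {label: defaultdict(set) for label in self.EDGE_TYPES}

    def add(self, child, label, parent):
        if label not in self.EDGE_TYPES:
            raise ValueError(f"unknown edge type: {label}")
        self.edges[label][child].add(parent)

    def ancestors(self, term, label):
        """All terms reachable from `term` by following edges of one type."""
        seen, stack = set(), [term]
        while stack:
            for parent in self.edges[label].get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


go = OntologyGraph()
go.add("copulation", "is-a", "biological process")
go.add("cell wall", "part-of", "cell")
print(go.ancestors("copulation", "is-a"))    # {'biological process'}
print(go.ancestors("cell wall", "part-of"))  # {'cell'}
```

Keeping the two edge types in separate maps means a query along is-a never silently follows a part-of link.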
The Gene Ontology
a ‘controlled vocabulary’
designed to standardize annotation of genes and gene products
used by over 20 genome databases and many other groups in academia and industry
methodology much imitated
The Methodology of Annotations
Scientific curators use experimental observations reported in the biomedical literature to link gene products with GO terms in annotations.
The gene annotations taken together yield a slowly growing computer-interpretable map of biological reality.
The process of annotating literature also leads to improvements and extensions of the ontology, which institutes a virtuous cycle of improvement in the quality and reach of future annotations and of the ontology itself.
The Gene Ontology as Cartoon
cellular components: 1372 component terms
molecular functions: 7271 function terms
biological processes: 8069 process terms
The Cellular Component Ontology (counterpart of anatomy)
membrane
nucleus
The Molecular Function Ontology
protein stabilization
The Molecular Function ontology is (roughly) an ontology of actions on the molecular level of granularity
Biological Process Ontology
death
An ontology of occurrents on the level of granularity of cells, organs and whole organisms
GO is here an example
a. of the sorts of problems confronting life science data integration
b. of the degree to which formal methods are relevant to the solution of these problems
Each of GO’s ontologies
is organized in a graph-theoretical data structure involving two sorts of links or edges:
is-a (= is a subtype of): copulation is-a biological process
part-of: cell wall part-of cell
Linnaeus
Entities
Entities
universals (kinds, types, taxa, …)
particulars (individuals, tokens, instances …)
Axiom: Nothing is both a universal and a particular
Entities
universals*
*natural, biological, kinds
Entities
universals
instances
universals are natural kinds
Instances are natural exemplars of natural kinds (problem of non-standard instances)
Not all individuals are instances of universals
Entities
universals
instances
penumbra of borderline cases
Entities
universals
instances
junk
example of junk: beachball-desk
Primitive relations: inst and part
inst(Jane, human being)
part(Jane’s heart, Jane’s body)
A universal is anything that is instantiated
An instance is anything (any individual) that instantiates some universal
Entities
human
Jane
inst
A is_a B
genus(B)
species(A)
instances
is-a
D3* e is_a f =def universal(e) ∧ universal(f) ∧ ∀x (inst(x, e) → inst(x, f)).
genus(A) =def universal(A) ∧ ∃B (B is_a A ∧ B ≠ A)
species(A) =def universal(A) ∧ ∃B (A is_a B ∧ B ≠ A)
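These three definitions can be tried out on a finite model. The sketch below reads them as follows: e is_a f holds when both are universals and every instance of e is an instance of f; genus(A) holds when some universal B distinct from A satisfies B is_a A; species(A) holds when A is_a some universal B distinct from A. The table INST and all individual names are hypothetical.

```python
# Hypothetical finite model: universal -> set of instances (all names illustrative).
INST = {
    "animal": {"tibbles", "felix", "kermit"},
    "cat": {"tibbles", "felix"},
    "frog": {"kermit"},
}


def is_a(e, f):
    """e is_a f: both are universals and every instance of e is an instance of f."""
    return e in INST and f in INST and INST[e] <= INST[f]


def genus(a):
    """genus(A): some universal B distinct from A satisfies B is_a A."""
    return a in INST and any(b != a and is_a(b, a) for b in INST)


def species(a):
    """species(A): A is_a some universal B distinct from A."""
    return a in INST and any(b != a and is_a(a, b) for b in INST)


for u in INST:
    print(u, "genus:", genus(u), "species:", species(u))
```

In this toy model animal comes out as a genus only, while cat and frog come out as species only.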
solve problem of false positives
insist that
A is_a B
holds always as a matter of scientific law
nearest species
nearestspecies(A, B) =def A is_a B & ∀C ((A is_a C & C is_a B) → (C = A or C = B))
B
A
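A finite-model sketch of nearestspecies, reading the definition as: A is_a B, and every C lying between A and B is identical to A or to B. The sketch additionally requires A and B to be distinct. That is an added assumption, made because is_a taken as instance inclusion is reflexive. The table INST and its individuals are hypothetical.

```python
# Hypothetical finite model: universal -> set of instances.
INST = {
    "animal": {"tibbles", "felix", "rex", "kermit"},
    "mammal": {"tibbles", "felix", "rex"},
    "cat": {"tibbles", "felix"},
}


def is_a(a, b):
    """A is_a B, read extensionally as inclusion of instances."""
    return INST[a] <= INST[b]


def nearestspecies(a, b):
    """A is_a B with no third universal strictly between them (A != B assumed)."""
    if a == b or not is_a(a, b):
        return False
    return all(c in (a, b) for c in INST if is_a(a, c) and is_a(c, b))


print(nearestspecies("cat", "mammal"))     # True
print(nearestspecies("mammal", "animal"))  # True
print(nearestspecies("cat", "animal"))     # False: mammal lies in between
```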
Definitions
highest genus
lowest species
instances
Lowest Species and Highest Genus
lowestspecies(A) =def species(A) & not-genus(A)
highestgenus(A) =def genus(A) & not-species(A)
Theorem: universal(A) → (genus(A) or lowestspecies(A))
Axioms
Every universal has at least one instance
Distinct lowest species never share instances
SINGLE INHERITANCE: Every species is the nearest species to exactly one genus
Axioms governing inst
(genus(A) & inst(x, A)) → ∃B (nearestspecies(B, A) & inst(x, B))
EVERY GENUS HAS AN INSTANTIATED SPECIES
nearestspecies(A, B) → A’s instances are properly included in B’s instances
EACH SPECIES HAS A SMALLER CLASS OF INSTANCES THAN ITS GENUS
Axioms
nearestspecies(B, A) → ∃C (nearestspecies(C, A) & B ≠ C)
EVERY GENUS HAS AT LEAST TWO CHILDREN
(nearestspecies(B, A) & nearestspecies(C, A) & B ≠ C) → not-∃x (inst(x, B) & inst(x, C))
SPECIES OF A COMMON GENUS NEVER SHARE INSTANCES
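The two axioms on this slide (every genus has at least two children; species of a common genus never share instances) can be tested against a candidate classification. A rough sketch under stated assumptions: is_a is read as instance inclusion, nearestspecies additionally requires its arguments to be distinct, and the tables, individuals and the function check_axioms are all hypothetical.

```python
from itertools import combinations

# Hypothetical finite model: universal -> set of instances.
INST = {
    "animal": {"tibbles", "felix", "kermit"},
    "cat": {"tibbles", "felix"},
    "frog": {"kermit"},
}


def is_a(a, b, inst):
    """A is_a B, read extensionally as inclusion of instances."""
    return inst[a] <= inst[b]


def nearestspecies(a, b, inst):
    """A is_a B with nothing strictly in between (A != B assumed)."""
    if a == b or not is_a(a, b, inst):
        return False
    return all(c in (a, b) for c in inst if is_a(a, c, inst) and is_a(c, b, inst))


def children(a, inst):
    """The nearest species of A."""
    return [b for b in inst if nearestspecies(b, a, inst)]


def check_axioms(inst):
    """Return a list of violation messages; empty if both axioms hold."""
    problems = []
    for a in inst:
        kids = children(a, inst)
        if len(kids) == 1:
            problems.append(f"{a} has only one child: {kids[0]}")
        for b, c in combinations(kids, 2):
            shared = inst[b] & inst[c]
            if shared:
                problems.append(f"{b} and {c} share instances: {sorted(shared)}")
    return problems


print(check_axioms(INST))  # []

# Adding a cross-cutting universal 'pet' breaks the tree structure.
bad = dict(INST, pet={"felix", "kermit"})
for message in check_axioms(bad):
    print(message)
```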
Theorems
(genus(A) & inst(x, A)) → ∃B (lowestspecies(B) & B is_a A & inst(x, B))
EVERY INSTANCE IS ALSO AN INSTANCE OF SOME LOWEST SPECIES
(genus(A) & lowestspecies(B) & ∃x (inst(x, A) & inst(x, B))) → B is_a A
IF AN INSTANCE OF A LOWEST SPECIES IS AN INSTANCE OF A GENUS THEN THE LOWEST SPECIES IS A CHILD OF THE GENUS
Theorems
(universal(A) & universal(B)) → (A = B or A is_a B or B is_a A or not-∃x (inst(x, A) & inst(x, B)))
DISTINCT UNIVERSALS EITHER STAND IN A PARENT-CHILD RELATIONSHIP OR THEY HAVE NO INSTANCES IN COMMON
Theorems
(A is_a B & A is_a C) → (B = C or B is_a C or C is_a B)
UNIVERSALS WHICH SHARE A CHILD IN COMMON ARE EITHER IDENTICAL OR ONE IS SUBORDINATED TO THE OTHER
Theorems
(genus(A) & genus(B) & ∃x (inst(x, A) & inst(x, B))) → ∃C (C is_a A & C is_a B)
IF TWO GENERA HAVE A COMMON INSTANCE THEN THEY HAVE A COMMON CHILD
Expanding the theory
Sexually reproducing organisms
Organisms in general
To take account of development (child, adult; larva, butterfly)
Biological processes
Biological functions
-- at different levels of granularity
How to understand species (aka types, universals, kinds)
Species are something like invariants in reality which can be studied by science
Species have instances: this mouse, this cell, this cell membrane ...
Universals, Classes, Sets
A class is the extension of a universal
Class =def
a maximal collection of particulars determined by a general term (‘cell’, ‘mouse’, ‘Saarländer’)
the class A = the collection of all particulars x for which ‘x is A’ is true
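The definition of a class as the extension of a general term can be sketched directly. A minimal illustration, assuming a hypothetical table PARTICULARS that records which general terms are true of each particular; the particulars themselves are invented for the example.

```python
# Hypothetical particulars, each tagged with the general terms true of it.
PARTICULARS = {
    "mouse_1": {"mouse"},
    "mouse_2": {"mouse"},
    "cell_17": {"cell"},
}


def extension(term):
    """The class A: the collection of all particulars x for which 'x is A' is true."""
    return {x for x, terms in PARTICULARS.items() if term in terms}


print(sorted(extension("mouse")))  # ['mouse_1', 'mouse_2']
print(extension("Saarländer"))     # set(): no listed particular falls under this term
```

The collection is maximal in the sense that every listed particular satisfying the term is included.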
Universals and Classes vs. Sums
The former are marked by granularity: they divide up the domain into whole units, whose interior parts are traced over. The universal human being is instantiated only by human beings as single, whole units.
A mereological sum is not granular in this sense (molecules are parts of the mereological sum of human beings)
A bad solution
Identify both universals and classes with sets in the mathematical sense
Problem of false positives
adult, child
lion in Leipzig, lion
animal owned by the Emperor, mammal
mammal weighing less than 200 kg, animal
Sets in the mathematical sense are marked by granularity
Granularity = each class or set is laid across reality like a grid consisting (1) of a number of slots or pigeonholes each (2) occupied by some member.
Each set is (1) associated with a specific number of slots, each of which (2) must be occupied by some specific member.
A class survives the turnover in its instances: both (1) the number of slots and (2) the individuals occupying these slots may vary with time
But sets are timeless
A set is an abstract structure, existing outside time and space. The set of human beings existing at t is (timelessly) a different entity from the set of human beings existing at t' because of births and deaths.
Biological classes exist in time
Darwin: because the universals of which they are extensions exist in time
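The contrast between a timeless set and a class that survives turnover in its instances can be illustrated in code. A minimal sketch, assuming hypothetical population records and a helper extension_at; each value it returns is an immutable frozenset, while the class itself is identified by the rule that picks out its members at a given time.

```python
# Hypothetical population records: (individual, birth year, death year or None).
PEOPLE = [
    ("ann", 1900, 1980),
    ("bob", 1950, None),
    ("cyd", 1990, None),
]


def extension_at(year):
    """Members of the class 'human being' alive in `year`, as an immutable set."""
    return frozenset(
        name
        for name, born, died in PEOPLE
        if born <= year and (died is None or year <= died)
    )


s1 = extension_at(1960)  # ann and bob
s2 = extension_at(2000)  # bob and cyd
print(s1 == s2)          # False: two different sets, one class
print(sorted(s1 & s2))   # ['bob']
```

Each frozenset is fixed once and for all; only the function from times to sets captures a class whose membership changes through births and deaths.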