Franz. 2014. Explaining taxonomy's legacy to computers – how and why?
Nico M. Franz 1,2
Arizona State University
http://taxonbytes.org/
1 Concepts and tools developed jointly with members of the Ludäscher Lab (UC Davis & UIUC):
Mingmin Chen, Parisa Kianmajd, Shizhuo Yu, Shawn Bowers & Bertram Ludäscher
2 The Meaning of Names: Naming Diversity in the 21st Century
September 30, 2014; Museum of Natural History, University of Colorado
On-line @ http://www.slideshare.net/taxonbytes/franz-2014-explaining-taxonomys-legacy-to-computers-how-and-why
Alternative title: Concept taxonomy – now with logic reasoning.
Definitional preliminaries, 1
Taxonomic concept (1): The circumscription of a perceived (or, more accurately, hypothesized) taxonomic group, as advocated by a particular author and source.
1 Not the same as species concepts, which are theories about what species are, and/or how they are recognized.
Definitional preliminaries, 2
Provenance (1): Information describing the origin, derivation, history, custody, or context of an entity (etc.). Provenance establishes the authenticity, integrity and trustworthiness of information about entities.
1 See, e.g.: http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance
Concept taxonomy in three introductory phrases
• An emerging solution to the challenge of tracking stability and change across multiple taxonomic name usages.
• Fully compatible with Linnaean nomenclature (Codes).
• The focus is on building sound provenance chains amenable to computational representation and reasoning, irrespective of whether the nomenclatural/taxonomic history of a perceived lineage of organisms has been perfectly stable since the times of Linnaeus, or continues to undergo major alterations.
Overview of today's presentation
• The challenge (1.0): Limitations of the name–taxon reference model.
• The challenge (2.0): How to track taxonomic concept provenance?
• Introducing Euler/X – overview of workflow and user/reasoner interaction.
~ 8 mins.
• How does it work?
• Use case 1: Dwarf lemur classifications sec. 1993 & 2005.
• From simple to complex merge taxonomies.
• How can we represent taxonomic concept overlap?
• Scalability & information gain: How many articulations?
• Why? Insights into the performance of names as concept identifiers.
• Use case 2: Andropogon glomeratus sec. auctorum.
• In conclusion – feasibility, accessibility, and what it means.
~ 15 mins.
The challenge (1.0):
Often, we make statements like this:
"Andropogon glomeratus is a species of grass (Poaceae) that occurs in the Southern U.S."
Photo by Max Licher (ASU Herbarium); Cottonwood, Arizona. http://swbiodiversity.org/seinet/imagelib/imgdetails.php?imgid=431755
Thereby we stipulate a direct name–taxon reference relationship.
Proposition 1: names refer (directly) to taxa
[Diagram: taxonomic name, taxon (species), and biological data. Reference relation: name refers to entity. Data transmission: facilitated by name.]
Yet, the legacy of taxonomy is more complicated: the name–taxon relationship can change (1). This poses some representation challenges…
1 See Franz et al. 2008. On the use of taxonomic concepts in support of biodiversity research and taxonomy; pp. 63–86. In: The New Taxonomy, Systematics Association Special Volume 74. Taylor & Francis, Boca Raton.
Challenge 1: by necessity, a name refers only to a type (specimen)
[Diagram: taxonomic name. Identity of the name/reference relation is regulated by Codes (e.g., typification).]
Challenge 2: the discovery of 'true' taxon boundaries is contingent
[Diagram: taxonomic name and taxon (species). Identity of the name/reference relation is regulated by Codes (e.g., typification). The boundaries of taxon identity have the property of contingent, scientific hypotheses = concepts.]
Challenge 3: name/taxon (concept) changes are semi-independent
[Diagram: as for Challenge 2, now asking: precise, reliable mapping?]
Consequence: the name–taxon reference model is often too simple
[Diagram: taxonomic name, taxon (species), and biological data. Name-based data transmission: reliability is also contingent. Reference limitations! Precise, reliable mapping?]
If we accept a contingent, changing name–concept–taxon reference model, then perhaps we should always say this:
"Andropogon glomeratus is a species of grass (Poaceae) that occurs in the Southern U.S."
..is the (Latin) name (string), nomenclaturally anchored with a type specimen, that can participate in the (more precisely individuated) concept label "Andropogon glomeratus sec. Barkworth et al. 2014" (reference: Manual of Grasses for North America), which in turn refers to..
..a feature-based circumscription ("Plants cespitose, upper portion dense, … Pedicellate spikelets vestigial or absent, sterile. 2n = 20.") – the taxonomic concept as advocated by this reference – which may or may not align accurately with a (presumably existing and) relatively stable evolutionary lineage of organisms in nature for which..
..biological occurrence data are on hand.
Proposition 2: concept labels refer (directly) to taxonomic concepts
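Proposition 2 can be mirrored in a small data structure. The Python sketch below assumes that a concept label is modeled as a name string plus its "sec." source. The class `ConceptLabel` and its fields are hypothetical and are not part of Euler/X. The example shows that two labels can share a name string and still be distinct labels:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptLabel:
    """Hypothetical concept label: a name string qualified by its source (sec.)."""
    name: str
    sec: str

    def __str__(self) -> str:
        return f"{self.name} sec. {self.sec}"


barkworth = ConceptLabel("Andropogon glomeratus", "Barkworth et al. 2014")
weakley = ConceptLabel("Andropogon glomeratus", "Weakley 2006")

print(barkworth.name == weakley.name)  # True: identical name strings
print(barkworth == weakley)            # False: distinct concept labels
print(barkworth)                       # Andropogon glomeratus sec. Barkworth et al. 2014
```

Whether two such labels point to congruent concepts is a separate question. In this approach, that question is answered by an articulation, not by string comparison.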
Hence:
The challenge (2.0):
If we individuate taxonomic concepts and their labels consistently, ..
[Figure: chain of A. glomeratus concepts, 1889–2014, with concepts from 1889, 1933, 1948, 1950, 1968, 1979, 1983, 2006, and 2014.]
..then how can we track concept provenance?
[Figure: the same 1889–2014 chain of concepts, annotated with "?".]
Provenance representation challenge: How is each concept articulated to another?
Proposed solution:
We articulate them with (RCC-5) concept-to-concept relationships..
[Figure: the 1889–2014 chain of concepts, with articulations shown in this order: congruence [==], congruence [==], proper inclusion [>], inverse proper inclusion [<], overlap [><], congruence [==], exclusion [|]; future Floras: congruence? [==].]
RCC-5 = Region Connection Calculus with five basic relations.
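The Python sketch below makes the five relations concrete. As a simplification, it assumes that each concept is a finite, non-empty set of members. Euler/X itself reasons over asserted articulations and taxonomic constraints, not over member lists. The function name `rcc5` is illustrative.

```python
def rcc5(a: set, b: set) -> str:
    """RCC-5 relation of region a to region b, treating regions as finite sets."""
    if not a or not b:
        raise ValueError("regions must be non-empty")
    if a == b:
        return "=="  # congruence
    if a > b:
        return ">"   # proper inclusion: a properly includes b
    if a < b:
        return "<"   # inverse proper inclusion: a is properly included in b
    if a & b:
        return "><"  # overlap: shared and unshared members on both sides
    return "|"       # exclusion: no shared members


# Illustrative member labels only; not taken from any classification above.
for a, b in [({1, 2}, {1, 2}), ({1, 2, 3}, {2}), ({2}, {1, 2, 3}), ({1, 2}, {2, 3}), ({1}, {2})]:
    print(sorted(a), rcc5(a, b), sorted(b))
```

For any two non-empty regions, exactly one of the five cases applies. That is why the checks can be tried in sequence.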
…and utilize logic reasoning to infer consistent merge taxonomies.
[Figure: Merge – A. glomeratus sec. Blomquist (1948) / sec. Campbell (1983). Merge View Legend: proper inclusion [>]; congruence [==].]
We now have a tool for this: Euler/X
https://bitbucket.org/eulerx
Euler/X toolkit in a single screenshot (desktop version, IX-2014)
Euler/X applies logic reasoning to support the following workflow:
User/reasoner interaction: achieving well-specified alignments
T1 = Taxonomy 1
T2 = Taxonomy 2
A = Input articulations [==, >, <, ><, |]
C = Taxonomic constraints
Articulations are asserted by taxonomic experts.
Data format for an Euler/X alignment input file
[Annotated input file: T2 header (Year, Author), followed by lines of parent concept and child concepts; the same for T1; then the T2-to-T1 articulations (as provided by the user).]
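The sketch below shows one way to parse a file with the layout described above. Several details are assumptions for illustration:

- the keywords `taxonomy` and `articulation`;
- the parenthesized parent/child lines;
- the bracketed `[T2.concept relation T1.concept]` articulations, written with the slide's symbols.

The actual Euler/X input syntax may differ; for example, it may spell relations out as words.

```python
import re

# Hypothetical file contents, laid out as the slide describes: a header per
# taxonomy (Year Author), parent/child concept lines, then T2-to-T1 articulations.
EXAMPLE = """
taxonomy 2005 AuthorB
(Genus Genus_alpha Genus_beta Genus_gamma)

taxonomy 1993 AuthorA
(Genus Genus_alpha Genus_beta)

articulation 2005-1993
[2005.Genus_alpha == 1993.Genus_alpha]
[2005.Genus_beta == 1993.Genus_beta]
[2005.Genus_gamma | 1993.Genus]
"""

# "><" is tried before ">" so that overlap is not misread as inclusion.
ART_RE = re.compile(r"\[(\S+)\s+(==|><|>|<|\|)\s+(\S+)\]")


def parse_alignment(text: str):
    taxonomies = {}     # year -> {"author": str, "children": {parent: [child, ...]}}
    articulations = []  # (t2_concept, relation, t1_concept)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("taxonomy "):
            _, year, author = line.split(maxsplit=2)
            current = {"author": author, "children": {}}
            taxonomies[year] = current
        elif line.startswith("articulation"):
            current = None  # following lines are articulations, not tree edges
        elif line.startswith("(") and current is not None:
            parent, *children = line.strip("()").split()
            current["children"][parent] = children
        elif line.startswith("["):
            match = ART_RE.fullmatch(line)
            if match is None:
                raise ValueError(f"cannot parse articulation: {line!r}")
            articulations.append(match.groups())
    return taxonomies, articulations


taxonomies, articulations = parse_alignment(EXAMPLE)
print(taxonomies["2005"]["children"]["Genus"])  # ['Genus_alpha', 'Genus_beta', 'Genus_gamma']
print(articulations[-1])                         # ('2005.Genus_gamma', '|', '1993.Genus')
```

The last articulation mimics the pattern of a "new" concept being excluded from a higher-level concept of the other taxonomy.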
User/reasoner interaction: achieving well-specified alignments
Input visualization of the 2005/1993 concept trees & articulations
[Figure labels: input articulations; 2005 concepts; 1993 concepts]
User/reasoner interaction: achieving well-specified alignments
[Workflow diagram shown in successive builds; decision outcomes are marked "No!" and "Yes".]
No!: No Possible World merge [empty canvas, nothing to report].
Yes: Nine Possible World merges for an under-specified use case input.
Yes, Yes: MIR = Maximally Informative Relations [==, >, <, ><, |] for each concept pair.
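The three outcomes can be illustrated with a brute-force toy in Python. This is not how Euler/X reasons internally. The toy assumes:

1. sibling concepts are mutually exclusive;
2. coverage holds, i.e., each parent is the union of its children;
3. every concept is non-empty;
4. both root concepts span the same organisms.

Under these assumptions, each possible world is fixed by which (T2 leaf, T1 leaf) cells are occupied. Enumerating the cell patterns and keeping the distinct relation tables that satisfy the input articulations is therefore exact, though exponential. The result is read as follows:

- zero tables: the input is inconsistent;
- several tables: the input is under-specified;
- exactly one table: the alignment is well-specified, and that table plays the role of the MIR.

All names below are hypothetical.

```python
from itertools import product

# Hypothetical representation: parent concept -> list of child concepts.
# Concepts that never appear as a key are leaves.
Taxonomy = dict[str, list[str]]


def rcc5(a: frozenset, b: frozenset) -> str:
    if a == b:
        return "=="
    if a > b:
        return ">"
    if a < b:
        return "<"
    return "><" if a & b else "|"


def all_concepts(tax: Taxonomy) -> list[str]:
    names = set(tax)
    for children in tax.values():
        names.update(children)
    return sorted(names)


def leaves_under(tax: Taxonomy, concept: str) -> set[str]:
    children = tax.get(concept, [])
    if not children:
        return {concept}
    return set().union(*(leaves_under(tax, c) for c in children))


def possible_worlds(t2: Taxonomy, t1: Taxonomy, articulations) -> list[dict]:
    """Distinct T2-to-T1 relation tables consistent with the articulations."""
    c2, c1 = all_concepts(t2), all_concepts(t1)
    leaves2 = [c for c in c2 if c not in t2]
    leaves1 = [c for c in c1 if c not in t1]
    under2 = {c: leaves_under(t2, c) for c in c2}
    under1 = {c: leaves_under(t1, c) for c in c1}
    # Each (T2 leaf, T1 leaf) cell is a candidate merge region.
    cells = list(product(leaves2, leaves1))
    worlds = set()
    for keep in product((False, True), repeat=len(cells)):
        regions = [cell for cell, k in zip(cells, keep) if k]
        # Non-emptiness: every leaf concept must own at least one region.
        if {r[0] for r in regions} != set(leaves2) or {r[1] for r in regions} != set(leaves1):
            continue
        ext2 = {c: frozenset(r for r in regions if r[0] in under2[c]) for c in c2}
        ext1 = {c: frozenset(r for r in regions if r[1] in under1[c]) for c in c1}
        table = {(x, y): rcc5(ext2[x], ext1[y]) for x in c2 for y in c1}
        if all(table[(x, y)] == rel for x, rel, y in articulations):
            worlds.add(tuple(sorted(table.items())))
    return [dict(w) for w in worlds]


t1 = {"1993.A": ["1993.a1", "1993.a2"]}
t2 = {"2005.B": ["2005.b1", "2005.b2"]}
print(len(possible_worlds(t2, t1, [])))  # 7: under-specified
worlds = possible_worlds(t2, t1, [("2005.b1", "==", "1993.a1")])
print(len(worlds), worlds[0][("2005.b2", "1993.a2")])  # 1 ==
print(possible_worlds(t2, t1, [("2005.b1", "==", "1993.a1"), ("2005.b2", "==", "1993.a1")]))  # []
```

The single articulation `2005.b1 == 1993.a1` is enough to fix all nine cross-taxonomy relations in this toy. The contradictory pair of articulations yields no world at all.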
Use case 1: dwarf lemurs sec. 1993 & 2005 (1)
Chirogaleus furcifer sec. Mühel (1890) – Brehms Tierleben. Public Domain: http://books.google.com/books?id=sDgQAQAAMAAJ
1 Franz et al. 2014. Two influential primate classifications logically aligned. (unpublished)
The 2nd & 3rd Editions of the Mammal Species of the World
Primates sec. Groves (1993): 317 taxonomic concepts, 233 at the species level.
Primates sec. Groves (2005): 483 taxonomic concepts, 376 at the species level.
1993 vs. 2005: Δ = 143 species-level concepts (376 − 233).
Primate 1993/2005 concept alignments: from simple to complex merge taxonomies.
Microcebus rufus sec. 2005 – same name, congruent concepts [==]
1. Input concepts & articulations
2. Merge visualization
Merge View Legend: grey rectangle, round corners = taxonomic congruence
Mirza coquereli sec. 2005 – name change, congruent concepts [==]
1. Input concepts & articulations
2. Merge visualization
Merge View Legend
Microcebus murinus (et al.) sec. 2005 – "lumping / splitting" [>, <]
1. Input concepts & articulations
2. Merge visualization
Merge View Legend: yellow octagon = unique to T1 (1993); green rectangle = unique to T2 (2005)
Microcebus (part) & Mirza sec. 2005 – monotypic parent concepts
1. Input concepts & articulations
2. Merge visualization
Mirza & M. coquereli sec. Groves (2005) are two co-extensional concepts in T2.
Three concepts are congruent!
How can we represent concept overlap?
Microcebus (all) & Mirza sec. 2005 – concept overlap [><]
Merge visualization: containment, with overlap [-e mnpw --rcgo]
Merge View Legend: dashed blue line = overlap [><]
[Figure labels: unique to 1993.Microcebus (2005 Mirza/coquereli); unique to 2005.Microcebus (1993 undescribed); shared, congruent child concepts]
We can resolve the merge overlap products.
Microcebus (all) & Mirza sec. 2005 – concept overlap [><]
Merge visualization: "merge concept" representation [-e mncb]
Merge View Legend: red lines = newly inferred articulations (to and from merge concepts)
2005.Microcebus*1993.Microcebus = shared merge concept
2005.Microcebus\1993.Microcebus = merge concept unique to 2005
1993.Microcebus\2005.Microcebus = merge concept unique to 1993
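The three merge concepts correspond to an intersection and the two set differences. The Python sketch below computes them directly, assuming the member lists are known; Euler/X instead infers these regions from the articulations. The members are placeholders that loosely echo the figure labels (species undescribed in 1993; the 1993 Microcebus member treated under Mirza in 2005). They are not the actual MSW species lists, and `merge_concepts` is a hypothetical helper.

```python
def merge_concepts(name_a: str, a: set, name_b: str, b: set) -> dict[str, set]:
    """Split two overlapping concepts into shared and unique merge concepts."""
    return {
        name_a + "*" + name_b: a & b,    # shared merge concept
        name_a + "\\" + name_b: a - b,   # merge concept unique to A
        name_b + "\\" + name_a: b - a,   # merge concept unique to B
    }


# Placeholder members standing in for species-level child concepts.
microcebus_2005 = {"shared_sp_1", "shared_sp_2", "sp_undescribed_in_1993"}
microcebus_1993 = {"shared_sp_1", "shared_sp_2", "sp_moved_to_Mirza_in_2005"}

for label, members in merge_concepts("2005.Microcebus", microcebus_2005,
                                     "1993.Microcebus", microcebus_1993).items():
    print(label, sorted(members))
```

The labels use the same `*` and `\` notation as the merge view.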
Scalability & information gain: How many input articulations are sufficient?
Cheirogaleoidae sec. 2005 – how many articulations are sufficient?
T2: 27 concepts; T1: 14 concepts; 22 input articulations
• 17 'non-new' 2005 species-level concepts: articulated to 1993 species-level concepts.
• 4 'new' 2005 species-level concepts: exclusion (|) from the 1993 family-level concept.
• 1 additional highest-level articulation: 2005.Cheirogaleoidae > 1993.Cheirogaleidae; eliminates 15 additional Possible Worlds.
• No genus- or subfamily-level articulations are needed.
Well-specified merge: 378 Maximally Informative Relations (27 × 14 concept pairs)
~ 17x information gain through reasoning
Primates: 483 + 317 = 800 concepts; 402 input articulations; 483 × 317 = 153,111 MIR
~ 380x information gain!
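Reading MIR as one relation for every pair of concepts across the two taxonomies reproduces both counts above (27 × 14 = 378; 483 × 317 = 153,111). Here is a short Python sketch of that arithmetic, with illustrative function names:

```python
def mir_count(n_t2: int, n_t1: int) -> int:
    """One maximally informative relation per (T2 concept, T1 concept) pair."""
    return n_t2 * n_t1


def information_gain(n_t2: int, n_t1: int, n_input: int) -> float:
    """Ratio of inferred MIR to user-supplied input articulations."""
    return mir_count(n_t2, n_t1) / n_input


# Concept and articulation counts as reported on the slide.
for label, n_t2, n_t1, n_input in [("Cheirogaleoidae", 27, 14, 22), ("Primates", 483, 317, 402)]:
    mir = mir_count(n_t2, n_t1)
    gain = information_gain(n_t2, n_t1, n_input)
    print(f"{label}: {mir:,} MIR from {n_input} articulations, ~{gain:.1f}x")
# Cheirogaleoidae: 378 MIR from 22 articulations, ~17.2x
# Primates: 153,111 MIR from 402 articulations, ~380.9x
```

Because MIR grows with the product of the two taxonomy sizes, the gain grows quickly once the number of input articulations needed stays roughly proportional to the number of concepts.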
Why? Performance of names as concept identifiers.
[Figure: MSW 2nd/3rd Edition name/concept identity relations]
56.4% of the paired name lineages are taxonomically reliable.
⇒ Computers need concept resolution to track taxonomic provenance.
Use case 2: And Andropogon glomeratus sec. auctorum? (1)
"Andropogon glomeratus is a species of grass (Poaceae) that occurs in the Southern U.S."
Photo by Max Licher (ASU Herbarium); Cottonwood, Arizona. http://swbiodiversity.org/seinet/imagelib/imgdetails.php?imgid=431755
1 See Franz et al. 2014. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web – Interoperability, Usability, Applicability – Special Issue on Semantics for Biodiversity. (in press)
In brief: Things are very messy.
Question 1: Which concept labels have included the name string "Andropogon glomeratus" in the past eight classifications?
Tabular alignment of eight Andropogon classifications: 1889 to 2006
⇒ 6 / 8 classifications are taxonomically unique for the concept of A. glomeratus sec. auctorum.
⇒ No two concepts including the "A. glomeratus" name string are taxonomically congruent.
Question 2: Which previously named concepts are congruent with Andropogon glomeratus sec. Weakley (2006)?
Tabular alignment of eight Andropogon classifications: 1889 to 2006
⇒ What Weakley (2006) refers to as "A. glomeratus" was previously referred to as:
• 1889: A. macrourus var. hirsutior + A. macrourus var. abbreviatus
• 1933: A. glomeratus (in part, I)
• 1948: A. glomeratus (?)
• 1950: A. virginicus var. hirsutior + A. glomeratus (in part, II)
• 1968: A. virginicus (in part)
• 1979: A. virginicus var. abbreviatus (in part)
• 1983: A. glomeratus (in part, I)
Logic representation: Easy!
Case 1: 1948.Blomquist vs. 1950.Hitchcock & Chase (Δ = 2 years)
T2: 7 concepts (1950); T1: 7 concepts (1948) – containment view
Merge: 3 congruent regions, 3 with same name; 6 unique regions, 4 with non-unique name
A. glomeratus sec. 1950 and A. glomeratus sec. 1948 overlap, as each concept includes a non-congruent variety-level concept.
Interestingly, the shared concept region has no unique name in either taxonomy. It is 'un-named', at least within the context of the 1950/1948 classifications.
T2: 7 concepts (1950); T1: 7 concepts (1948) – merge concept view
The shared, overlapping region is more informatively resolved and labeled in the merge concept visualization; the region 1950.A_glomeratus * 1948.A_glomeratus contains no subelements that carry the name "A. virginicus" in either classification.
Case 2: 1889.Hackel vs. 2006.Weakley (Δ = 117 years)
T2: 12 concepts (2006); T1: 12 concepts (1889)
Merge: 8 congruent regions, 0 with same name (!); 5 unique regions, 1 with non-unique name
⇒ Hackel & Weakley agree very substantively on what entities are 'out there in nature'; however, more than a century of Code-compliant name changes has obscured their agreements.
Case 3: 1983.Campbell vs. 2006.Weakley (Δ = 23 years)
T2: 12 concepts (2006); T1: 14 concepts (1983) – containment view
Merge: 9 congruent regions, 5 with same name; 6 unique regions, 4 with non-unique name
This is one of the simpler merge taxonomies in this use case, although 8 / 15 merge regions have taxonomically misleading names (i.e., congruence/different names; non-congruence/same names). This ratio is near-average across nine pairwise alignments.
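The 8 / 15 figure can be recovered from the Merge line under two readings:

- "congruent regions ... with same name" counts the well-named congruent regions;
- "unique regions ... with non-unique name" counts non-congruent regions whose name recurs in the other taxonomy.

The Python sketch below applies these readings to all three cases. Only the Case 3 ratio is stated on the slides; the other two ratios are derived here under the same assumption. The function name is hypothetical.

```python
def misleading_regions(congruent: int, congruent_same_name: int,
                       unique: int, unique_shared_name: int) -> tuple[int, int]:
    """Return (misleadingly named merge regions, all merge regions).

    Misleading = congruent regions labeled with different names
               + non-congruent (unique) regions whose name recurs in the other taxonomy.
    """
    misleading = (congruent - congruent_same_name) + unique_shared_name
    return misleading, congruent + unique


# Counts copied from the three Andropogon merges above.
cases = {
    "Case 1 (1948/1950)": (3, 3, 6, 4),
    "Case 2 (1889/2006)": (8, 0, 5, 1),
    "Case 3 (1983/2006)": (9, 5, 6, 4),
}
for label, counts in cases.items():
    bad, total = misleading_regions(*counts)
    print(f"{label}: {bad} / {total} merge regions with misleading names")
```

Both error types count equally here: a congruent region hidden behind different names misleads as much as a non-congruent region sharing a name.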
In conclusion:
In conclusion – feasibility, accessibility, and what it means.
• Feasibility of tracking taxonomic concept provenance in computational logic:
  • We are advancing by leaps and bounds in feasibility (and in scalability) right now.
  • However, many interesting challenges remain (e.g., user/reasoner interaction).
• Accessibility and acceptance of the RCC-5/reasoning approach:
  • We need more use cases, and users – the Euler/X approach works!
  • It can be applied to any new or legacy systematic publication, biodiversity database, checklist, classification, phylogeny, or other kind of taxonomic synthesis (print or virtual) and versions thereof, complementing the Linnaean system while providing superior individuation of taxonomic content.
  • Having a sound web service is the next critical step in advancing the approach.
• What does it all mean?
  • The legacy of taxonomic name and concept authoring is amenable to computational logic and provenance tracking. We can likely derive much data integration power from further developments in this direction.
Acknowledgments
• Robert Guralnick, Susanna Drogsvold & all CU Museum of Natural History "The Meaning of Names" conference organizers!
• Euler/X team: Mingmin Chen, Parisa Kianmajd, Shizhuo Yu, Shawn Bowers & Bertram Ludäscher
• Juliana Cardona-Duque (weevils), Naomi Pier (primates) & Alan Weakley (grasses)
• taxonbytes lab members: Andrew Johnston & Guanyang Zhang
• NSF DEB–1155984 & DBI–1342595 (PI Franz); IIS–118088 & DBI–1147273 (PI Ludäscher)
https://sols.asu.edu/ Franz Lab: http://taxonbytes.org/
Select references on concept taxonomy and the Euler/X toolkit
• Franz & Peet. 2009. Towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity 7: 5–20.
• Chen et al. 2014. Euler/X: a toolkit for logic-based taxonomy integration. WFLP 2013 – 22nd International Workshop on Functional and (Constraint) Logic Programming.
• Chen et al. 2014. A hybrid diagnosis approach combining Black-Box and White-Box reasoning. Lecture Notes in Computer Science 8620: 127–141.
• Franz et al. 2014. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web – Interoperability, Usability, Applicability – Special Issue on Semantics for Biodiversity. (in press)
• Franz et al. 2014. Reasoning over taxonomic change: exploring alignments for the Perelleschus use case. PLoS ONE. (in review)
• Euler/X toolkit: https://bitbucket.org/eulerx/euler-project
• Euler web service (in progress): http://euler.asu.edu/
• Concept taxonomy @ taxonbytes: http://taxonbytes.org/tag/concept-taxonomy/
Miscellaneous appended slides
The good: names refer to type specimens necessarily
Source: Witteveen. 2014. Biology & Philosophy. (in press)
The challenge: names refer to non-type specimens contingently
Source: Dubois. 2005. Zoosystema 27: 365–426.
[Figure labels: names; non-types]
We may categorize kinds of nomenclatural and taxonomic change, and the opportunities to track each, as follows:
Nomenclatural/taxonomic change & provenance tracking square
• E.g.: A binomial name is formed incorrectly; a homonym is discovered, requiring a name change.
• E.g.: A type specimen is lost, and a neotype must be designated; "One fungus (a-/sexual), one name" – Melbourne Code.
• E.g.: A heterotypic synonymy is established (inferred); a priority-carrying name is newly 'transferred'.
• E.g.: A junior genus-level name is transferred among tribes; an informal clade name is redefined across treatments.
Question: Which changes are most common in a particular group? Answer: Concept-level resolution is needed to assess this.
[Figure annotations on the square: Many changes; Some changes; Many changes; MOST CHANGES; ???]
Question: What is the proper scope of reference for representing our progress in inferring the tree of life?
Suggested answer: Even though the name–taxon mapping is the ultimate aim..
..in effect we only need to represent the name–concept mapping. Congruence over time will suggest that we are 'getting taxa right'.
R32 lattice of RCC-5 articulations (lighter color = less certainty)
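The lattice can be enumerated as below, assuming "R32" denotes all 2^5 = 32 subsets of the five base relations, ordered by set inclusion. Singletons are fully certain articulations, larger disjunctions carry less certainty, and the empty set is included in the count. The helper `more_certain` is an illustrative name.

```python
from collections import Counter
from itertools import combinations

BASE = ("==", ">", "<", "><", "|")

# Every subset of the five base relations: a disjunctive (possibly uncertain) articulation.
lattice = [frozenset(c) for k in range(len(BASE) + 1) for c in combinations(BASE, k)]


def more_certain(a: frozenset, b: frozenset) -> bool:
    """a sits below b in the lattice: it admits strictly fewer base relations."""
    return a < b


print(len(lattice))                                        # 32
print(sorted(Counter(len(s) for s in lattice).items()))    # [(0, 1), (1, 5), (2, 10), (3, 10), (4, 5), (5, 1)]
print(more_certain(frozenset({"=="}), frozenset({"==", ">"})))  # True
```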
Higher-level primate classifications – 1993 versus 2005: many recurrent names, little taxonomic congruence.
[Figure: Primates sec. 1993 & 2005, order to subfamily level. Not much is grey. Labels: Strepsirrhini sec. 2005; Haplorrhini sec. 2005; Catarrhini sec. 2005.]
Use case 2: Perelleschus sec. 2001 & 2006 (1)
Perelleschus salpinflexus sec. Franz & Cardona-Duque (2013). DOI:10.1080/14772000.2013.806371
1 Input articulations: Franz & Cardona-Duque. 2013. Description of two new species and phylogenetic reassessment of Perelleschus Wibmer & O'Brien, 1986 (Coleoptera: Curculionidae), with a complete taxonomic concept history of Perelleschus sec. Franz & Cardona-Duque, 2013. Systematics and Biodiversity 11: 209–236. Merge analyses: Franz et al. 2014. Reasoning over taxonomic change: exploring alignments for the Perelleschus use case. PLoS ONE. (in press)
Goal: align two phylogenies with differential taxon sampling
T1: Perelleschus sec. 2001
• Phylogenetic revision
• 8 ingroup species concepts
• 2 outgroup concepts
• 18 concepts total
T2: Perelleschus sec. 2006
• Exemplar analysis
• 2 ingroup species concepts
• 1 outgroup concept
• 7 concepts total
Logic representation challenge: Perelleschus sec. 2001 & 2006 concepts have incongruent sets of subordinate members, yet each concept has congruent synapomorphies.
Definitional preliminaries (1)
Ostensive alignment: the congruence among higher-level concepts is assessed in relation to their entailed members.
Ostension: giving meaning through an act of pointing out.
Intensional alignment: the congruence among higher-level concepts is assessed in relation to their properties.
Intension: giving meaning through the specification of properties.
1 See Bird & Tobin. 2012. Natural Kinds. URL: http://plato.stanford.edu/archives/win2012/entries/natural-kinds/
Ostensive alignment – members are all that counts
Challenge 1: Differential outgroup sampling (2 / 1 concepts)
T2: 2006.PHY & 2006.PHYsubcin T1: 2006.PHY only
Solution: Locally relax coverage with "nc" = "no coverage"
Result: 2006.PHY == 2001.PHY. Outgroups are held congruent.
[Figure labels: input constraints; ostensive alignment 2001 & 2006]
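This minimal Python sketch shows why relaxing coverage helps in Challenge 1. It relies on a set-based reading of the coverage constraint. With coverage, a parent must equal the union of its listed children; with "nc", it need only contain them. The function and member names are hypothetical, and the outgroup members are toy placeholders.

```python
def admissible(parent: set, children: list[set], coverage: bool = True) -> bool:
    """Check a parent extension against its listed children.

    coverage=True : parent must equal the union of its children.
    coverage=False: ("nc") parent need only contain that union.
    """
    union = set().union(*children)
    return parent == union if coverage else union <= parent


# Toy outgroup: the 2001 concept lists two members, the 2006 concept samples one.
phy_2001 = {"outgroup_sp_1", "outgroup_sp_2"}
phy_2006_children = [{"outgroup_sp_1"}]

# Can 2006.PHY be given the same extension as 2001.PHY (i.e., be congruent)?
print(admissible(phy_2001, phy_2006_children, coverage=True))   # False
print(admissible(phy_2001, phy_2006_children, coverage=False))  # True
```

Under coverage, the undersampled concept is pinned to its sampled child and cannot be congruent with the 2001 concept. Relaxing coverage locally removes that obstacle without touching the rest of the taxonomy.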
Ostensive alignment – members are all that counts
Challenge 2: Ostensive alignment
Solution: 11 ingroup concept articulations are coded ostensively – either as <, ><, or | – to represent non-congruence in the representation of child concepts.
Result: 2006.PER < 2001.PER; 2006.PER | 2001.[5 species concepts]; etc.
[Figure labels: input constraints; ostensive alignment 2001 & 2006; articulations shown: <, 5 × |, 2 × ><]
Intensional alignment – representation of congruent synapomorphies
Challenge 3: Intensional alignment
Solution: An Implied Child (_IC) concept is added to the undersampled (2006) clade concept, and the five "missing" species-level concepts are included within this Implied Child. 11 ingroup concept articulations are coded intensionally – as == or > – to reflect congruent synapomorphies of 2001 & 2006.
Result: The genus- and ingroup clade-level concepts are inferred as congruent: 2006.PER == 2001.PER; 2006.PcarPeve == 2001.PcarPsul; etc.
[Figure labels: input constraints; intensional alignment 2001 & 2006; articulation shown: ==]
Review – representing ostensive versus intensional alignments
Ostensive alignment: 2001.PER includes more species-level concepts than 2006.PER [>].
Intensional alignment: 2006.PER reconfirms the synapomorphies inferred in 2001.PER [==].
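The contrast between the two readings can be sketched in Python with toy member sets, which are not the actual Perelleschus membership. Here the Implied Child is filled by set difference, which presumes the unsampled species are known; in the workflow above, they are asserted by the user.

```python
def rcc5(a: set, b: set) -> str:
    if a == b:
        return "=="
    if a > b:
        return ">"
    if a < b:
        return "<"
    return "><" if a & b else "|"


# Toy clade, not the actual Perelleschus membership.
clade_2001 = {"sp_a", "sp_b", "sp_c", "sp_d"}   # fully sampled revision
clade_2006_sampled = {"sp_a", "sp_b"}            # exemplar analysis

# Ostensive alignment: only the sampled members count.
print(rcc5(clade_2001, clade_2006_sampled))      # >

# Intensional alignment: an Implied Child (_IC) under the 2006 concept stands in
# for the unsampled species that share the same synapomorphies.
implied_child = clade_2001 - clade_2006_sampled
clade_2006_intensional = clade_2006_sampled | implied_child
print(rcc5(clade_2001, clade_2006_intensional))  # ==
```

The same pair of clade concepts thus yields [>] or [==], depending only on whether unsampled members are represented.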
The other piece in the puzzle: Concept-to-voucher identifications
Source: Baskauf & Webb. 2014. Darwin-SW. URL: http://www.semantic-web-journal.net/system/files/swj635.pdf