Franz. 2014. Explaining taxonomy's legacy to computers – how and why?


Nico M. Franz 1,2

Arizona State University

http://taxonbytes.org/

1 Concepts and tools developed jointly with members of the Ludäscher Lab (UC Davis & UIUC):

Mingmin Chen, Parisa Kianmajd, Shizhuo Yu, Shawn Bowers & Bertram Ludäscher

2 The Meaning of Names: Naming Diversity in the 21st Century

September 30, 2014; Museum of Natural History, University of Colorado

On-line @ http://www.slideshare.net/taxonbytes/franz-2014-explaining-taxonomys-legacy-to-computers-how-and-why





Alternative title:

Concept taxonomy – now with logic reasoning.

Definitional preliminaries, 1

Taxonomic concept:¹ the circumscription of a perceived (or, more accurately, hypothesized) taxonomic group, as advocated by a particular author and source.

¹ Not the same as species concepts, which are theories about what species are and/or how they are recognized.

Definitional preliminaries, 2

Provenance:¹ information describing the origin, derivation, history, custody, or context of an entity (etc.). Provenance establishes the authenticity, integrity, and trustworthiness of information about entities.

¹ See, e.g.: http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance



Concept taxonomy in three introductory phrases

• An emerging solution to the challenge of tracking stability and change across multiple taxonomic name usages.
• Fully compatible with Linnaean nomenclature (Codes).
• The focus is on building sound provenance chains amenable to computational representation and reasoning, irrespective of whether the nomenclatural/taxonomic history of a perceived lineage of organisms has been perfectly stable since the time of Linnaeus or continues to undergo major alterations.



Overview of today's presentation

• The challenge (1.0): Limitations of the name → taxon reference model.
• The challenge (2.0): How to track taxonomic concept provenance?
• Introducing Euler/X – overview of workflow and user/reasoner interaction.
  • How does it work?
• Use case 1: Dwarf lemur classifications sec. 1993 & 2005.
  • From simple to complex merge taxonomies.
  • How can we represent taxonomic concept overlap?
  • Scalability & information gain: How many articulations?
  • Why? Insights into the performance of names as concept identifiers.
• Use case 2: Andropogon glomeratus sec. auctorum.
• In conclusion – feasibility, accessibility, and what it means.

The challenge (1.0):

Often, we make statements like this:

"Andropogon glomeratus is a species of grass (Poaceae) that occurs in the Southern U.S."

Photo by Max Licher (ASU Herbarium); Cottonwood, Arizona. http://swbiodiversity.org/seinet/imagelib/imgdetails.php?imgid=431755

Thereby we stipulate a direct name → taxon reference relationship.

Proposition 1: names refer (directly) to taxa

[Diagram: Taxonomic name; Taxon (species); Biological data]

• Reference relation: name refers to entity.
• Data transmission: facilitated by name.

Yet, the legacy of taxonomy is more complicated: the name → taxon relationship can change.¹ This poses some representation challenges…

¹ See Franz et al. 2008. On the use of taxonomic concepts in support of biodiversity research and taxonomy; pp. 63–86. In: The New Taxonomy. Systematics Association Special Volume 74. Taylor & Francis, Boca Raton.




Challenge 1: by necessity, a name refers only to a type (specimen)

[Diagram: Taxonomic name] The identity of the name/reference relation is regulated by the Codes (e.g., typification).




Challenge 2: the discovery of 'true' taxon boundaries is contingent

[Diagram: Taxonomic name; Taxon (species); the identity of the name/reference relation is regulated by the Codes]

The boundaries of taxon identity have the property of contingent scientific hypotheses, i.e., concepts.




Challenge 3: name/taxon (concept) changes are semi-independent

[Diagram: Taxonomic name; Taxon (species); precise, reliable mapping?]





Consequence: the name → taxon reference model is often too simple

[Diagram: Taxonomic name; Taxon (species); Biological data; precise, reliable mapping?]

• Name-based data transmission: reliability is also contingent.
• Reference limitations!


If we accept a contingent, changing name → concept → taxon reference model, then perhaps we should always say this:




Proposition 2: concept labels refer (directly) to taxonomic concepts

..is the (Latin) name (string), nomenclaturally anchored with a type specimen, that can participate in the (more precisely individuated) concept label "Andropogon glomeratus sec. Barkworth et al. 2014" (reference: Manual of Grasses for North America), which in turn refers to..
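Proposition 2 implies that a concept label's identity includes its source, not just the name string. A minimal sketch, assuming a label can be modeled as a (name, sec. source) pair; the `ConceptLabel` class is hypothetical and is not part of Euler/X:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptLabel:
    """Hypothetical model of a concept label: a name string plus its 'sec.' source."""

    name: str  # nomenclaturally anchored (Latin) name string
    sec: str   # the author/source whose circumscription is meant

    def __str__(self) -> str:
        return f"{self.name} sec. {self.sec}"


barkworth = ConceptLabel("Andropogon glomeratus", "Barkworth et al. 2014")
weakley = ConceptLabel("Andropogon glomeratus", "Weakley 2006")

print(barkworth)                       # Andropogon glomeratus sec. Barkworth et al. 2014
print(barkworth.name == weakley.name)  # True: same name string
print(barkworth == weakley)            # False: two distinct concept labels
```

Because the dataclass is frozen, labels are hashable and can serve as dictionary keys or set members, so two usages of the same name never collapse into one entry.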





..a feature-based circumscription ("Plants cespitose, upper portion dense, … Pedicellate spikelets vestigial or absent, sterile. 2n = 20.") – the taxonomic concept as advocated by this reference – which may or may not align accurately with a (presumably existing and) relatively stable evolutionary lineage of organisms in nature for which..







..biological occurrence data are on hand.


Hence:

The challenge (2.0):

If we individuate taxonomic concepts and their labels consistently, ..

[Figure: Chain of A. glomeratus concepts, 1889–2014: 1889, 1933, 1948, 1950, 1968, 1979, 1983, 2006, 2014.]

..then how can we track concept provenance?

[Figure: the same chain of concepts, 1889–2014, with the relationships between them marked "?"]

Provenance representation challenge: how is each concept articulated to another?

Proposed solution:

We articulate them with (RCC-5) concept-to-concept relationships..

[Figure: the same chain of concepts, 1889–2014, with articulations labeled Congruence [==] (three times), Proper inclusion [>], Inverse proper inclusion [<], Overlap [><], and Exclusion [|]; Future Floras: Congruence? [==]]

RCC-5 = Region Connection Calculus with five basic relations.
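One way to read the five RCC-5 relations is as comparisons between the extensions (member sets) of two concepts. The sketch below assumes each extension is known as a finite, non-empty set of hypothetical specimen identifiers; Euler/X itself reasons over asserted articulations and does not require member lists.

```python
from enum import Enum


class RCC5(Enum):
    CONGRUENT = "=="
    INCLUDES = ">"       # proper inclusion
    INCLUDED_IN = "<"    # inverse proper inclusion
    OVERLAPS = "><"
    EXCLUDES = "|"


def rcc5(a: frozenset, b: frozenset) -> RCC5:
    """Return the single RCC-5 relation between two non-empty extensions a and b."""
    if not a or not b:
        raise ValueError("this sketch only compares non-empty regions")
    if a == b:
        return RCC5.CONGRUENT
    if a > b:  # proper superset
        return RCC5.INCLUDES
    if a < b:  # proper subset
        return RCC5.INCLUDED_IN
    if a & b:  # shared members, but each has members the other lacks
        return RCC5.OVERLAPS
    return RCC5.EXCLUDES


lumped = frozenset({"s1", "s2", "s3"})  # hypothetical specimen identifiers
split = frozenset({"s1", "s2"})
print(rcc5(lumped, split).value)                   # >
print(rcc5(split, frozenset({"s2", "s4"})).value)  # ><
print(rcc5(split, frozenset({"s9"})).value)        # |
```

Exactly one of the five relations holds for any pair of non-empty sets, which is why they can serve as mutually exclusive articulation values.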

…and utilize logic reasoning to infer consistent merge taxonomies.

Merge – A. glomeratus sec. Blomquist (1948) / sec. Campbell (1983)

[Figure: merge view; legend: Proper inclusion [>], Congruence [==]]

We now have a tool for this: Euler/X

https://bitbucket.org/eulerx



Euler/X toolkit in a single screenshot (desktop version, IX-2014)

Euler/X applies logic reasoning to support the following workflow:

• T1 = Taxonomy 1
• T2 = Taxonomy 2
• A = Input articulations [==, >, <, ><, |]
• C = Taxonomic constraints

User/reasoner interaction: achieving well-specified alignments

Articulations are asserted by taxonomic experts.

Data format for an Euler/X alignment input file

[Figure: annotated input file with labeled parts: T2 header (Year, Author); parent concept and child concepts; T1 (same structure); T2-to-T1 articulations (as provided by the user)]
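The slide describes the input as taxonomy headers (year, author), parent/child concept lines, and user-provided articulations. The parser below is a sketch over a hypothetical line syntax modeled loosely on that layout (`taxonomy <Year> <Author>`, `(Parent Child …)`, `[T2.x rel T1.y]`); it is not the actual Euler/X grammar, and all concept names in the example are placeholders.

```python
import re
from dataclasses import dataclass, field

# Hypothetical line syntax, modeled loosely on the slide's layout; NOT the Euler/X grammar.
HEADER = re.compile(r"^taxonomy\s+(\S+)\s+(.+)$")                   # "taxonomy <Year> <Author>"
ARTICULATION = re.compile(r"^\[(\S+)\s+(==|><|>|<|\|)\s+(\S+)\]$")  # "[T2.x rel T1.y]"


@dataclass
class Taxonomy:
    year: str
    author: str
    children: dict = field(default_factory=dict)  # parent concept -> list of child concepts


@dataclass
class Alignment:
    taxonomies: list = field(default_factory=list)
    articulations: list = field(default_factory=list)  # (left, relation, right) triples


def parse_alignment(text: str) -> Alignment:
    result = Alignment()
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if (m := HEADER.match(line)):
            result.taxonomies.append(Taxonomy(year=m.group(1), author=m.group(2)))
        elif line.startswith("(") and line.endswith(")"):
            if not result.taxonomies:
                raise ValueError(f"line {number}: parent/child line before any taxonomy header")
            parent, *kids = line[1:-1].split()
            result.taxonomies[-1].children[parent] = kids
        elif (m := ARTICULATION.match(line)):
            result.articulations.append(m.groups())
        else:
            raise ValueError(f"line {number}: unrecognized input {line!r}")
    return result


EXAMPLE = """
taxonomy 2005 AuthorB
(GenusX sp1 sp2 sp3)
taxonomy 1993 AuthorA
(GenusX sp1 sp2)
[2005.sp1 == 1993.sp1]
[2005.sp2 < 1993.sp2]
[2005.sp3 < 1993.sp2]
"""

alignment = parse_alignment(EXAMPLE)
print([t.year for t in alignment.taxonomies])  # ['2005', '1993']
print(alignment.articulations[1])              # ('2005.sp2', '<', '1993.sp2')
```

Rejecting unrecognized lines with their line number, rather than skipping them, keeps a typo in an expert-asserted articulation from silently dropping out of the alignment.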


Input visualization of the 2005/1993 concept trees & articulations

[Figure: input articulations between the 2005 concepts and the 1993 concepts]

No Possible World merge [empty canvas, nothing to report]

Nine Possible World merges for an under-specified use case input

[Figure: workflow steps marked "No!" or "Yes"]

MIR = Maximally Informative Relations [==, >, <, ><, |] for each concept pair
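The workflow distinguishes three outcomes: no possible world (inconsistent input), several possible worlds (under-specified input), and exactly one possible world, where every concept pair has a single MIR. The brute-force sketch below illustrates these outcomes on two hypothetical two-leaf taxonomies. It assumes coverage and sibling disjointness (each concept is exactly the union of its leaves) and writes articulations as (T1 concept, relation, T2 concept). Euler/X delegates this step to logic reasoners instead of enumerating subsets, and the enumeration here grows exponentially with the number of leaves.

```python
from itertools import combinations, product


def leaves(tree: dict, node: str) -> set:
    """Leaf concepts below (or equal to) node; tree maps parent -> list of children."""
    kids = tree.get(node, [])
    if not kids:
        return {node}
    return set().union(*(leaves(tree, k) for k in kids))


def concepts(tree: dict, root: str) -> list:
    """All concepts in a taxonomy, root first."""
    found = [root]
    for k in tree.get(root, []):
        found.extend(concepts(tree, k))
    return found


def rcc5(a: frozenset, b: frozenset) -> str:
    if a == b:
        return "=="
    if a > b:
        return ">"
    if a < b:
        return "<"
    return "><" if a & b else "|"


def possible_worlds(t1, root1, t2, root2, articulations):
    """Brute-force the distinct merges consistent with (T1 concept, relation, T2 concept) triples.

    A cell (x, y) stands for organisms under T1 leaf x and T2 leaf y (None = outside that taxonomy).
    """
    c1, c2 = concepts(t1, root1), concepts(t2, root2)
    lv1 = {c: leaves(t1, c) for c in c1}
    lv2 = {c: leaves(t2, c) for c in c2}
    cells = [cell for cell in product(sorted(lv1[root1]) + [None], sorted(lv2[root2]) + [None])
             if cell != (None, None)]
    pairs = list(product(c1, c2))
    worlds = set()
    for k in range(1, len(cells) + 1):
        for world in combinations(cells, k):
            ext1 = {c: frozenset(x for x in world if x[0] in lv1[c]) for c in c1}
            ext2 = {c: frozenset(x for x in world if x[1] in lv2[c]) for c in c2}
            if not all(ext1.values()) or not all(ext2.values()):
                continue  # every concept must have at least one member
            relation = {(x, y): rcc5(ext1[x], ext2[y]) for x, y in pairs}
            if all(relation[(x, y)] == r for x, r, y in articulations):
                worlds.add(tuple(relation[p] for p in pairs))  # one entry per distinct merge
    return pairs, worlds


def relations_per_pair(pairs, worlds):
    """RCC-5 relations seen for each pair; with a single world, each set holds that pair's MIR."""
    return {p: {w[i] for w in worlds} for i, p in enumerate(pairs)}


T1 = {"A": ["a1", "a2"]}  # hypothetical taxonomy 1
T2 = {"B": ["b1", "b2"]}  # hypothetical taxonomy 2

pairs, pw = possible_worlds(T1, "A", T2, "B", [("A", "==", "B")])
print(len(pw))  # 7: under-specified
pairs, pw = possible_worlds(T1, "A", T2, "B", [("A", "==", "B"), ("a1", "==", "b1")])
print(len(pw), relations_per_pair(pairs, pw)[("a2", "b2")])  # 1 {'=='}: well-specified
print(len(possible_worlds(T1, "A", T2, "B", [("a1", "==", "b1"), ("a1", "|", "b1")])[1]))  # 0
```

Adding the single articulation a1 == b1 collapses seven candidate merges into one, mirroring how a few well-chosen input articulations can eliminate possible worlds.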

Use case 1: dwarf lemurs sec. 1993 & 2005¹

Chirogaleus furcifer sec. Mühel (1890) – Brehms Tierleben. Public Domain: http://books.google.com/books?id=sDgQAQAAMAAJ

¹ Franz et al. 2014. Two influential primate classifications logically aligned. (unpublished)

The 2nd & 3rd Editions of the Mammal Species of the World

• Primates sec. Groves (1993): 317 taxonomic concepts, 233 at the species level.
• Primates sec. Groves (2005): 483 taxonomic concepts, 376 at the species level.

1993 → 2005: Δ = 143 species-level concepts

Primate 1993/2005 concept alignments: from simple to complex merge taxonomies.

Microcebus rufus sec. 2005 – same name, congruent concepts [==]

1. Input concepts & articulations
2. Merge visualization

Merge View Legend: grey rectangle, round corners = taxonomic congruence

Mirza coquereli sec. 2005 – name change, congruent concepts [==]

Microcebus murinus (et al.) sec. 2005 – "lumping / splitting" [>, <]

Merge View Legend: yellow octagon = unique to T1 (1993); green rectangle = unique to T2 (2005)

Microcebus (part) & Mirza sec. 2005 – monotypic parent concepts

Mirza & M. coquereli sec. Groves (2005) are two co-extensional concepts in T2.

Three concepts are congruent!

How can we represent concept overlap?

Microcebus (all) & Mirza sec. 2005 – concept overlap [><]

Merge visualization: containment, with overlap [-e mnpw --rcgo]

Merge View Legend: dashed blue line = overlap [><]

• Unique to 1993.Microcebus (2005 Mirza/coquereli)
• Unique to 2005.Microcebus (1993 undescribed)
• Shared, congruent child concepts

We can resolve the merge overlap products.

Merge visualization: "merge concept" representation [-e mncb]

Merge View Legend: red lines = newly inferred articulations (to and from merge concepts)

• 2005.Microcebus*1993.Microcebus – shared merge concept
• 2005.Microcebus\1993.Microcebus – merge concept unique to 2005
• 1993.Microcebus\2005.Microcebus – merge concept unique to 1993
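The merge-concept labels read naturally as set operations: `*` as intersection and `\` as difference. The sketch below splits two overlapping concepts into those three parts. It assumes member sets are available, and the members shown for the two Microcebus concepts are hypothetical placeholders; in Euler/X the merge concepts are inferred by reasoning rather than computed from member lists.

```python
def merge_concepts(name_a: str, a: frozenset, name_b: str, b: frozenset) -> dict:
    """Split two overlapping [><] concepts into the three merge concepts A*B, A\\B and B\\A."""
    shared, only_a, only_b = a & b, a - b, b - a
    if not (shared and only_a and only_b):
        raise ValueError(f"{name_a} and {name_b} do not overlap [><]")
    return {
        f"{name_a}*{name_b}": shared,   # shared merge concept
        f"{name_a}\\{name_b}": only_a,  # merge concept unique to A
        f"{name_b}\\{name_a}": only_b,  # merge concept unique to B
    }


m2005 = frozenset({"x1", "x2", "new1"})  # hypothetical members of 2005.Microcebus
m1993 = frozenset({"x1", "x2", "old1"})  # hypothetical members of 1993.Microcebus
for label, members in merge_concepts("2005.Microcebus", m2005, "1993.Microcebus", m1993).items():
    print(label, sorted(members))
# 2005.Microcebus*1993.Microcebus ['x1', 'x2']
# 2005.Microcebus\1993.Microcebus ['new1']
# 1993.Microcebus\2005.Microcebus ['old1']
```

Each resulting part stands in a definite RCC-5 relation to the two input concepts. For example, the shared part is properly included in both, and each unique part is excluded from the other concept. These are the kinds of newly inferred articulations that the red lines depict.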

Scalability & information gain: how many input articulations are sufficient?

Cheirogaleoidae sec. 2005 – how many articulations are sufficient?

T2: 27 concepts; T1: 14 concepts; 22 input articulations



• 17 'non-new' 2005 species-level concepts: articulated to 1993 species-level concepts.
• 4 'new' 2005 species-level concepts: exclusion (|) from the 1993 family-level concept.
• 1 additional highest-level articulation, 2005.Cheirogaleoidae > 1993.Cheirogaleidae: eliminates 15 additional Possible Worlds.
• No genus- or subfamily-level articulations are needed.

Well-specified merge: 378 Maximally Informative Relations (27 × 14 concept pairs)

~ 17× information gain through reasoning (378 MIR / 22 input articulations)

Primates: 483 + 317 = 800 concepts; 402 input articulations; 483 × 317 = 153,111 MIR

~ 380× information gain!
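The reported figures are consistent with counting one MIR per cross-taxonomy concept pair (27 × 14 = 378; 483 × 317 = 153,111) and defining information gain as MIR divided by input articulations. The reading below is inferred from those numbers rather than stated on the slides:

```python
def information_gain(t1_concepts: int, t2_concepts: int, input_articulations: int):
    """MIR count (one relation per T1 x T2 concept pair) and its ratio to the input articulations."""
    mir = t1_concepts * t2_concepts
    return mir, mir / input_articulations


for label, t1, t2, inputs in [("Cheirogaleoidae", 14, 27, 22), ("Primates", 317, 483, 402)]:
    mir, gain = information_gain(t1, t2, inputs)
    print(f"{label}: {mir:,} MIR from {inputs} articulations, ~{gain:.1f}x")
# Cheirogaleoidae: 378 MIR from 22 articulations, ~17.2x
# Primates: 153,111 MIR from 402 articulations, ~380.9x
```

The MIR count grows with the product of the two taxonomy sizes, whereas the number of expert-asserted articulations grew far more slowly in these use cases. This difference is why the gain rises from ~17× to ~380× as the alignment scales up.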

Why?

Performance of names as concept identifiers.

MSW 2nd/3rd Edition name/concept identity relations

56.4% of the paired name lineages are taxonomically reliable.

⇒ Computers need concept resolution to track taxonomic provenance.

Use case 2: And Andropogon glomeratus sec. auctorum?¹

Photo by Max Licher (ASU Herbarium); Cottonwood, Arizona. http://swbiodiversity.org/seinet/imagelib/imgdetails.php?imgid=431755

¹ See Franz et al. 2014. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web – Interoperability, Usability, Applicability – Special Issue on Semantics for Biodiversity. (in press)




In brief: things are very messy.

Question 1: Which concept labels have included the name string "Andropogon glomeratus" in the past eight classifications?

Tabular alignment of eight Andropogon classifications: 1889 to 2006

⇒ 6 / 8 classifications are taxonomically unique for the concept of A. glomeratus sec. auctorum.

⇒ No two concepts including the "A. glomeratus" name string are taxonomically congruent.

Question 2: Which previously named concepts are congruent with Andropogon glomeratus sec. Weakley (2006)?

⇒ What Weakley (2006) refers to as "A. glomeratus" was previously referred to as:

• 1889: A. macrourus var. hirsutior + A. macrourus var. abbreviatus
• 1933: A. glomeratus (in part, I)
• 1948: A. glomeratus (?)
• 1950: A. virginicus var. hirsutior + A. glomeratus (in part, II)
• 1968: A. virginicus (in part)
• 1979: A. virginicus var. abbreviatus (in part)
• 1983: A. glomeratus (in part, I)

Logic representation: easy!

Case 1: 1948.Blomquist vs. 1950.Hitchcock & Chase (Δ = 2 years)

T2: 7 concepts (1950); T1: 7 concepts (1948) – containment view

Merge: 3 congruent regions, 3 with the same name; 6 unique regions, 4 with a non-unique name.



A. glomeratus sec. 1950 and A. glomeratus sec. 1948 are overlapping, as each concept includes a non-congruent variety-level concept.

Interestingly, the shared concept region has no unique name in either taxonomy. It is 'un-named', at least within the context of the 1950/1948 classifications.


T2: 7 concepts (1950); T1: 7 concepts (1948) – merge concept view

The shared, overlapping region is more informatively resolved and labeled in the merge concept visualization; the region 1950.A._glomeratus * 1948.A._glomeratus contains no subelements that carry the name "A. virginicus" in either classification.


Case 2: 1889.Hackel vs. 2006.Weakley (Δ = 117 years)

T2: 12 concepts (2006); T1: 12 concepts (1889)

Merge: 8 congruent regions, 0 with the same name (!); 5 unique regions, 1 with a non-unique name.

⇒ Hackel & Weakley agree very substantively on what entities are 'out there in nature'; however, more than a century of Code-compliant name changes has obscured their agreements.

Case 3: 1983.Campbell vs. 2006.Weakley (Δ = 23 years)

One of the simpler merge taxonomies in this use case, although 8 / 15 merge regions have taxonomically misleading names (i.e., congruent concepts with different names, or non-congruent concepts with the same name). This ratio is near-average across the nine pairwise alignments.
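The "misleading name" criterion can be stated as a single test: a name pairing misleads when name identity and concept congruence disagree. The sketch below applies that criterion to pairs of aligned concepts rather than to merge regions, which is a simplification. The rows are hypothetical placeholders, not data from the Andropogon alignments.

```python
def is_misleading(name_a: str, name_b: str, relation: str) -> bool:
    """True when names and concepts disagree: congruent concepts with different names,
    or non-congruent concepts with the same name."""
    return (relation == "==") != (name_a == name_b)


aligned = [  # hypothetical (T1 name, T2 name, RCC-5 relation) rows
    ("Genus sp1", "Genus sp1", "=="),  # same name, congruent
    ("Genus sp2", "Genus sp9", "=="),  # different names, congruent
    ("Genus sp3", "Genus sp3", "><"),  # same name, non-congruent
    ("Genus sp4", "Genus sp8", "|"),   # different names, non-congruent
]
misleading = sum(is_misleading(a, b, r) for a, b, r in aligned)
print(f"{misleading} / {len(aligned)} pairs have taxonomically misleading names")  # 2 / 4
```

The test needs the RCC-5 relation, and a name comparison alone cannot supply it. This is one concrete sense in which concept-level resolution is required to measure how well names perform as identifiers.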

In conclusion – feasibility, accessibility, and what it means.

• Feasibility of tracking taxonomic concept provenance in computational logic:
  • We are making great strides in feasibility (and in scalability) right now.
  • However, many interesting challenges remain (e.g., user/reasoner interaction).
• Accessibility and acceptance of the RCC-5/reasoning approach:
  • We need more use cases, and users – the Euler/X approach works!
  • It can be applied to any new or legacy systematic publication, biodiversity database, checklist, classification, phylogeny, or other kind of taxonomic synthesis (print or virtual), and versions thereof, complementing the Linnaean system while providing superior individuation of taxonomic content.
  • Having a sound web service is the next critical step in advancing the approach.
• What does it all mean?
  • The legacy of taxonomic name and concept authoring is amenable to computational logic and provenance tracking. We can likely derive much data integration power from further developments in this direction.


Acknowledgments

• Robert Guralnick, Susanna Drogsvold & all CU Museum of Natural History "The Meaning of Names" conference organizers!

• Euler/X team: Mingmin Chen, Parisa Kianmajd, Shizhuo Yu, Shawn Bowers & Bertram Ludäscher

• Juliana Cardona-Duque (weevils), Naomi Pier (primates) & Alan Weakley (grasses)

• taxonbytes lab members: Andrew Johnston & Guanyang Zhang

• NSF DEB–1155984 & DBI–1342595 (PI Franz); IIS–118088 & DBI–1147273 (PI Ludäscher)

https://sols.asu.edu/ Franz Lab: http://taxonbytes.org/




Select references on concept taxonomy and the Euler/X toolkit

• Franz & Peet. 2009. Towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity 7: 5–20. http://www.tandfonline.com/doi/abs/10.1017/S147720000800282X%23.VBZQ7y5dVVM

• Chen et al. 2014. Euler/X: a toolkit for logic-based taxonomy integration. WFLP 2013 – 22nd International Workshop on Functional and (Constraint) Logic Programming. http://arxiv.org/abs/1402.1992

• Chen et al. 2014. A hybrid diagnosis approach combining Black-Box and White-Box reasoning. Lecture Notes in Computer Science 8620: 127–141. http://link.springer.com/chapter/10.1007/978-3-319-09870-8_9

• Franz et al. 2014. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web – Interoperability, Usability, Applicability – Special Issue on Semantics for Biodiversity. (in press) http://www.semantic-web-journal.net/system/files/swj623.pdf

• Franz et al. 2014. Reasoning over taxonomic change: exploring alignments for the Perelleschus use case. PLoS ONE. (in review)

• Euler/X toolkit: https://bitbucket.org/eulerx/euler-project

• Euler web service (in progress): http://euler.asu.edu/

• Concept taxonomy @ taxonbytes: http://taxonbytes.org/tag/concept-taxonomy/

Miscellaneous appended slides

The good: names refer to type specimens necessarily

Source: Witteveen. 2014. Biology & Philosophy. (in press)

The challenge: names refer to non-type specimens contingently

Source: Dubois. 2005. Zoosystema 27: 365–426.

[Diagram: Names; Non-types]

We may categorize kinds of nomenclatural and taxonomic change, and the opportunities to track each, as follows:

Nomenclatural/taxonomic change & provenance tracking square

• E.g.: a binomial name is formed incorrectly; a homonym is discovered, requiring a name change.
• E.g.: a type specimen is lost, so a neotype must be designated; "One fungus (a-/sexual), one name" – Melbourne Code.
• E.g.: a heterotypic synonymy is established (inferred); a priority-carrying name is newly 'transferred'.
• E.g.: a junior genus-level name is transferred among tribes; an informal clade name is redefined across treatments.

[Figure quadrant labels: Many changes; Some changes; Many changes; MOST CHANGES; ???]

Question: Which changes are most common in a particular group? Answer: concept-level resolution is needed to assess this.

Question: What is the proper scope of reference for representing our progress in inferring the tree of life?

Suggested answer: Even though the name → taxon mapping is the ultimate aim, in effect we only need to represent the name → concept mapping. Congruence over time will suggest that we are 'getting taxa right'.

R32 lattice of RCC-5 articulations (lighter color = less certainty)
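Reading "R32" as the 2⁵ = 32 possible disjunctions of the five RCC-5 base relations, ordered by inclusion, is an assumption based on the count. Under that reading, the empty set denotes inconsistency and the full set denotes no information. A small sketch of that lattice:

```python
from itertools import combinations

BASE = ("==", ">", "<", "><", "|")  # the five RCC-5 base relations


def r32() -> list:
    """All 2**5 = 32 disjunctions of RCC-5 base relations, ordered from fewest to most relations."""
    return [frozenset(c) for k in range(len(BASE) + 1) for c in combinations(BASE, k)]


lattice = r32()
print(len(lattice))  # 32
# A disjunction with fewer relations is more certain; any strict superset is less certain.
less_certain_than_congruence = [d for d in lattice if frozenset({"=="}) < d]
print(len(less_certain_than_congruence))  # 15
```

In this reading, the single-relation elements are the maximally informative articulations, and each step up the lattice adds one more admissible relation.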

Higher-level primate classifications – 1993 versus 2005: many recurrent names, little taxonomic congruence.

Primates sec. 1993 & 2005, order- to subfamily-level: not much is grey.

[Figure panels: Strepsirrhini sec. 2005; Haplorrhini sec. 2005; Catarrhini sec. 2005]

Use case 2: Perelleschus sec. 2001 & 2006¹

Perelleschus salpinflexus sec. Franz & Cardona-Duque (2013). DOI:10.1080/14772000.2013.806371

¹ Input articulations: Franz & Cardona-Duque. 2013. Description of two new species and phylogenetic reassessment of Perelleschus Wibmer & O'Brien, 1986 (Coleoptera: Curculionidae), with a complete taxonomic concept history of Perelleschus sec. Franz & Cardona-Duque, 2013. Systematics and Biodiversity 11: 209–236. Merge analyses: Franz et al. 2014. Reasoning over taxonomic change: exploring alignments for the Perelleschus use case. PLoS ONE. (in press)

Goal: align two phylogenies with differential taxon sampling

T1: Perelleschus sec. 2001
• Phylogenetic revision
• 8 ingroup species concepts
• 2 outgroup concepts
• 18 concepts total

T2: Perelleschus sec. 2006
• Exemplar analysis
• 2 ingroup species concepts
• 1 outgroup concept
• 7 concepts total

Logic representation challenge: Perelleschus sec. 2001 & 2006 concepts have incongruent sets of subordinate members, yet each concept has congruent synapomorphies.

Definitional preliminaries¹

• Ostensive alignment: the congruence among higher-level concepts is assessed in relation to their entailed members. Ostension: giving meaning through an act of pointing out.
• Intensional alignment: the congruence among higher-level concepts is assessed in relation to their properties. Intension: giving meaning through the specification of properties.

¹ See Bird & Tobin. 2012. Natural Kinds. URL: http://plato.stanford.edu/archives/win2012/entries/natural-kinds/







Ostensive alignment – members are all that counts

Challenge 1: Differential outgroup sampling (2 / 1 concepts)

T2: 2006.PHY & 2006.PHYsubcin; T1: 2006.PHY only

[Figure: ostensive alignment 2001 & 2006, with input constraints]

Solution: Locally relax coverage with "nc" = "no coverage".

Result: 2006.PHY == 2001.PHY. Outgroups are held congruent.
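Relaxing coverage changes what a parent concept may contain. The sketch below assumes that coverage means "the parent is exactly the union of its listed children" and that "nc" weakens this to "the children lie within the parent". The function name and member sets are hypothetical, and sibling disjointness is not modeled here.

```python
def satisfies_parent_constraint(parent: frozenset, children: list, coverage: bool = True) -> bool:
    """Check one parent concept against its listed children.

    coverage=True: the parent is exactly the union of its children.
    coverage=False ("nc"): the children need only lie within the parent,
    so the parent may also contain members that no listed child samples.
    """
    union = frozenset().union(*children)
    return parent == union if coverage else union <= parent


outgroup = frozenset({"m1", "m2"})  # hypothetical members of an outgroup concept
sampled = [frozenset({"m1"})]       # only one child concept sampled in the other study
print(satisfies_parent_constraint(outgroup, sampled))                  # False
print(satisfies_parent_constraint(outgroup, sampled, coverage=False))  # True
```

Under strict coverage, the undersampled parent would be forced to shrink to its one sampled child. With the local "nc" relaxation, it can keep its full extent, which allows the two outgroup concepts to be held congruent.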

Challenge 2: Ostensive alignment

Solution: 11 ingroup concept articulations are coded ostensively – either as <, ><, or | – to represent non-congruence in the representation of child concepts.

Result: 2006.PER < 2001.PER; 2006.PER | 2001.[5 species concepts]; etc.

[Figure: input constraints, with articulations labeled <; 5 × |; 2 × ><]

Intensional alignment – representation of congruent synapomorphies

Challenge 3: Intensional alignment

[Figure: intensional alignment 2001 & 2006, with input constraints]

Solution: An Implied Child (_IC) concept is added to the undersampled (2006) clade concept, and the five "missing" species-level concepts are included within this Implied Child. 11 ingroup concept articulations are coded intensionally – as == or > – to reflect the congruent synapomorphies of 2001 & 2006.

Result: The genus- and ingroup clade-level concepts are inferred as congruent: 2006.PER == 2001.PER; 2006.PcarPeve == 2001.PcarPsul; etc.

Review – representing ostensive versus intensional alignments

• Ostensive alignment: 2001.PER includes more species-level concepts than 2006.PER [>].
• Intensional alignment: 2006.PER reconfirms the synapomorphies inferred in 2001.PER [==].
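The same pair of concepts can receive different articulations depending on whether members or properties are compared. The sketch below contrasts the two alignment modes. It assumes that ostensive alignment compares sets of subordinate concepts with RCC-5 and that intensional alignment treats identical sets of defining synapomorphies as congruence, leaving other cases undecided. The species and character names are hypothetical.

```python
def rcc5(a: frozenset, b: frozenset) -> str:
    """RCC-5 relation between two non-empty member sets."""
    if a == b:
        return "=="
    if a > b:
        return ">"
    if a < b:
        return "<"
    return "><" if a & b else "|"


def ostensive(members_a: frozenset, members_b: frozenset) -> str:
    """Members are all that counts: compare the sets of subordinate concepts."""
    return rcc5(members_a, members_b)


def intensional(traits_a: frozenset, traits_b: frozenset):
    """Compare defining synapomorphies: identical sets are read as congruence.
    Other cases are left undecided (None) in this sketch."""
    return "==" if traits_a == traits_b else None


per_2001 = frozenset(f"sp{i}" for i in range(1, 9))  # 8 hypothetical species-level concepts
per_2006 = frozenset({"sp1", "sp2"})                  # 2 hypothetical exemplars
synapomorphies = frozenset({"char_a", "char_b"})      # hypothetical shared synapomorphies

print(ostensive(per_2001, per_2006))                # >
print(intensional(synapomorphies, synapomorphies))  # ==
```

The sketch makes no claim about the intensional relation when synapomorphy sets differ. Inferring inclusion from differing property sets would require additional assumptions about how the properties define the concepts.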

The other piece in the puzzle: concept-to-voucher identifications

Source: Baskauf & Webb. 2014. Darwin-SW. URL: http://www.semantic-web-journal.net/system/files/swj635.pdf


