Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Explaining taxonomy's legacy to computers – how and why? Nico M. Franz (Arizona State University; http://taxonbytes.org/). Concepts and tools developed jointly with members of the Ludäscher Lab (UC Davis & UIUC): Mingmin Chen, Parisa Kianmajd, Shizhuo Yu, Shawn Bowers & Bertram Ludäscher. Presented at The Meaning of Names: Naming Diversity in the 21st Century, September 30, 2014; Museum of Natural History, University of Colorado. On-line @ http://www.slideshare.net/taxonbytes/franz-2014-explaining-taxonomys-legacy-to-computers-how-and-why

description

Slides presented on the Euler/X project (http://taxonbytes.org/prior-work-on-concept-taxonomy-2013/ & https://bitbucket.org/eulerx/euler-project) for the conference "The Meaning of Names: Naming Diversity in the 21st Century", CU Natural History Museum, September 30, 2014.

Transcript of Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Page 1: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Explaining taxonomy's legacy

to computers – how and why?

Nico M. Franz 1,2

Arizona State University

http://taxonbytes.org/

1 Concepts and tools developed jointly with members of the Ludäscher Lab (UC Davis & UIUC):

Mingmin Chen, Parisa Kianmajd, Shizhuo Yu, Shawn Bowers & Bertram Ludäscher

2 The Meaning of Names: Naming Diversity in the 21st Century

September 30, 2014; Museum of Natural History, University of Colorado

On-line @ http://www.slideshare.net/taxonbytes/franz-2014-explaining-taxonomys-legacy-to-computers-how-and-why

Page 2: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Alternative title:

Concept taxonomy –

now with logic reasoning.

Page 3: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Taxonomic concept: 1

The circumscription of a perceived

(or, more accurately, hypothesized)

taxonomic group, as advocated by

a particular author and source.

Definitional preliminaries, 1

1 Not the same as species concepts, which are theories about what species are, and/or how they are recognized.

Page 4: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Provenance: 1

Information describing the origin, derivation,

history, custody, or context of an entity (etc.).

Provenance establishes the authenticity, integrity

and trustworthiness of information about entities.

Definitional preliminaries, 2

1 See, e.g.: http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance

Page 7: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

• An emerging solution to the challenge of tracking stability

and change across multiple taxonomic name usages.

• Fully compatible with Linnaean nomenclature (Codes).

• The focus is on building sound provenance chains amenable

to computational representation and reasoning; irrespective

of whether the nomenclatural/taxonomic history of a perceived

lineage of organisms was perfectly stable since the times of

Linnaeus, or continues to undergo major alterations.

Concept taxonomy in three introductory phrases

Page 9: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Overview of today's presentation

• The challenge (1.0): Limitations of the name-to-taxon reference model.

• The challenge (2.0): How to track taxonomic concept provenance?

• Introducing Euler/X – overview of workflow and user/reasoner interaction.

• How does it work?

• Use case 1: Dwarf lemur classifications sec. 1993 & 2005.

• From simple to complex merge taxonomies.

• How can we represent taxonomic concept overlap?

• Scalability & information gain: How many articulations?

• Why? Insights into the performance of names as concept identifiers.

• Use case 2: Andropogon glomeratus sec. auctorum.

• In conclusion – feasibility, accessibility, and what it means.

~ 8 mins.

~ 15 mins.

Page 10: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

The challenge (1.0):

Often, we make statements like this:

Page 11: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

"Andropogon glomeratus

is a species of grass (Poaceae)

that occurs in the Southern U.S."

Photo by Max Licher (ASU Herbarium); Cottonwood, Arizona. http://swbiodiversity.org/seinet/imagelib/imgdetails.php?imgid=431755

Page 12: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Thereby we stipulate a direct

name-to-taxon reference relationship.

Page 14: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

"Andropogon glomeratus

is a species of grass (Poaceae)

that occurs in the Southern U.S."

Taxonomic name

Taxon (species)

Biological data

Reference relation: name refers to entity

Data transmission: facilitated by name

Proposition 1: names refer (directly) to taxa

Page 15: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Yet, the legacy of taxonomy is more complicated:

the name-to-taxon relationship can change. 1

This poses some representation challenges…

1 See Franz et al. 2008. On the use of taxonomic concepts in support of biodiversity research and taxonomy; pp. 63–86. In: The New Taxonomy, Systematics Association Special Volume 74. Taylor & Francis, Boca Raton.

Page 16: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

"Andropogon glomeratus

is a species of grass (Poaceae)

that occurs in the Southern U.S."

Taxonomic name

Identity of the name/reference relation is regulated by Codes

(e.g., Typification)

Challenge 1: by necessity, a name refers only to a type (specimen)

Page 17: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

"Andropogon glomeratus

is a species of grass (Poaceae)

that occurs in the Southern U.S."

Taxonomic name

Taxon (species)

Identity of the name/reference relation is regulated by Codes

(e.g., Typification)

Challenge 2: the discovery of 'true' taxon boundaries is contingent

The boundaries of taxon identity have the property of contingent, scientific hypotheses = concepts

Page 18: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

"Andropogon glomeratus

is a species of grass (Poaceae)

that occurs in the Southern U.S."

Taxonomic name

Taxon (species)

Identity of the name/reference relation is regulated by Codes

(e.g., Typification)

Precise, reliable mapping?

Challenge 3: name/taxon (concept) changes are semi-independent

The boundaries of taxon identity have the property of contingent, scientific hypotheses = concepts

Page 19: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

"Andropogon glomeratus

is a species of grass (Poaceae)

that occurs in the Southern U.S."

Taxonomic name

Taxon (species)

Biological data

Identity of the name/reference relation is regulated by Codes

(e.g., Typification)

Name-based data transmission: reliability is also contingent

Reference limitations!

Consequence: the name-to-taxon reference model is often too simple

Precise, reliable mapping?

The boundaries of taxon identity have the property of contingent, scientific hypotheses = concepts

Page 20: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

If we accept a contingent, changing

name-to-concept-to-taxon reference model,

then perhaps we should always say this:

Page 23: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

"Andropogon glomeratus

is a species of grass (Poaceae)

that occurs in the Southern U.S."

..is the (Latin) name (string), nomenclaturally anchored with a type specimen, that can participate in the (more precisely individuated) concept label "Andropogon glomeratus sec. Barkworth et al. 2014" (reference: Manual of Grasses for North America), which in turn refers to..

..a feature-based circumscription ("Plants cespitose, upper portion dense, …

Pedicellate spikelets vestigial or absent, sterile. 2n = 20.") – the taxonomic concept as advocated by this reference – which may or may not align accurately with a (presumably existing and) relatively stable evolutionary lineage of organisms in nature for which..

..biological occurrence data are on hand.

Proposition 2: concept labels refer (directly) to taxonomic concepts
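
The distinction between a name string and a concept label can be made concrete with a small data structure. The following is a minimal sketch, not part of Euler/X; the class name `ConceptLabel` and its fields are assumptions made for illustration. It shows that two labels can share the name "Andropogon glomeratus" and still identify different concepts, because the "sec." source differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptLabel:
    """A taxonomic name bound to the source ("sec.") that circumscribes it."""
    name: str    # name string, nomenclaturally anchored by a type specimen
    source: str  # author(s) and year of the reference advocating the concept

    def __str__(self) -> str:
        return f"{self.name} sec. {self.source}"


barkworth = ConceptLabel("Andropogon glomeratus", "Barkworth et al. 2014")
weakley = ConceptLabel("Andropogon glomeratus", "Weakley (2006)")

print(barkworth)                       # Andropogon glomeratus sec. Barkworth et al. 2014
print(barkworth.name == weakley.name)  # True: same name string
print(barkworth == weakley)            # False: different concept labels
```

Whether the two labelled concepts are congruent is a separate question that the label alone cannot answer; it has to be asserted or inferred as an articulation.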

Page 24: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Hence:

The challenge (2.0):

If we individuate taxonomic concepts

and their labels consistently, ..

Page 25: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

1889

1933

1948

1950

1968

1979

1983

2006

2014

Chain of A. glomeratus concepts, 1889-2014.

Page 26: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

..then how can we track concept provenance?

Page 27: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

1889

1933

1948

1950

1968

1979

1983

2006

2014

?

Provenance representation challenge: How is each concept articulated to another?

Page 28: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Proposed solution:

We articulate them with (RCC-5)

concept-to-concept relationships..

Page 29: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

1889

1933

1948

1950

1968

1979

1983

2006

2014

Congruence [==]

Congruence [==]

Proper inclusion [>]

Inverse proper inclusion [<]

Overlap [><]

Congruence [==]

Exclusion [|]

Future Floras: Congruence? [==]

RCC-5 = Region Connection Calculus with five basic relations.
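
One way to read the five relations is as comparisons between the sets of organisms (or specimens) that two concepts circumscribe. The sketch below assumes that set-based reading; the function name `rcc5` and the specimen identifiers are invented for illustration.

```python
def rcc5(a: frozenset, b: frozenset) -> str:
    """Return the RCC-5 symbol relating two non-empty sets."""
    if not a or not b:
        raise ValueError("RCC-5 regions must be non-empty")
    if a == b:
        return "=="  # congruence
    if a > b:
        return ">"   # proper inclusion: a properly includes b
    if a < b:
        return "<"   # inverse proper inclusion: a is properly included in b
    if a & b:
        return "><"  # overlap: shared members, but each has unique ones
    return "|"       # exclusion: nothing shared


older = frozenset({"s1", "s2", "s3"})  # hypothetical specimen identifiers

print(rcc5(older, frozenset({"s1", "s2", "s3"})))  # ==
print(rcc5(older, frozenset({"s1", "s2"})))        # >
print(rcc5(frozenset({"s1"}), older))              # <
print(rcc5(older, frozenset({"s3", "s4"})))        # ><
print(rcc5(older, frozenset({"s4", "s5"})))        # |
```

Exactly one of the five relations holds for any pair of non-empty sets, which is what makes them usable as unambiguous articulations.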

Page 30: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

…and utilize logic reasoning to

infer consistent merge taxonomies.

Page 31: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Merge – A. glomeratus sec. Blomquist (1948) / sec. Campbell (1983)

Proper inclusion [>]

Congruence [==]

Merge View Legend

Page 32: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

We now have a tool for this: Euler/X

https://bitbucket.org/eulerx

Page 33: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Euler/X toolkit in a single screenshot (desktop version, IX-2014)

Page 34: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Euler/X applies logic reasoning

to support the following workflow:

Page 36: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

T1 = Taxonomy 1

T2 = Taxonomy 2

A = Input articulations [==, >, <, ><, |]

C = Taxonomic constraints

User/reasoner interaction: achieving well-specified alignments

Articulations are asserted by taxonomic experts.

Page 40: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Data format for an Euler/X alignment input file

[Figure: annotated alignment input file. Callouts: T2 header (year, author); parent concept followed by its child concepts; T1 block; T2-to-T1 articulations, as provided by the user.]
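
The transcript preserves only the callouts of the input-file slide, not the file itself. As a rough illustration, the sketch below serializes a two-taxonomy alignment into a plain-text layout that mirrors those callouts: a taxonomy header with year and author, one line per parent concept followed by its children, and then the articulations. The keywords `taxonomy` and `articulation`, the bracket syntax, and the helper names are assumptions made for this example, not confirmed Euler/X syntax; the Euler/X documentation is the authority on the actual format. The concept trees are heavily abbreviated, and the 1993 name Microcebus_coquereli is inferred from the "name change" slide.

```python
def render_taxonomy(year, author, tree):
    """Render one taxonomy: a header line, then one '(parent child ...)' line per parent."""
    lines = [f"taxonomy {year} {author}"]
    for parent, children in tree.items():
        lines.append("(" + " ".join([parent, *children]) + ")")
    return "\n".join(lines)


def render_alignment(t2, t1, articulations):
    """Render T2, T1, and the user-provided T2-to-T1 articulations (hypothetical layout)."""
    (y2, a2, tree2), (y1, a1, tree1) = t2, t1
    arts = [f"articulation {y2}-{y1}"]
    arts += [f"[{y2}.{c2} {rel} {y1}.{c1}]" for c2, rel, c1 in articulations]
    return "\n\n".join([render_taxonomy(y2, a2, tree2),
                        render_taxonomy(y1, a1, tree1),
                        "\n".join(arts)])


# Abbreviated, illustrative excerpt of the dwarf lemur use case.
t2 = ("2005", "Groves", {"Microcebus": ["Microcebus_rufus"],
                         "Mirza": ["Mirza_coquereli"]})
t1 = ("1993", "Groves", {"Microcebus": ["Microcebus_rufus", "Microcebus_coquereli"]})
articulations = [
    ("Microcebus_rufus", "==", "Microcebus_rufus"),     # same name, congruent
    ("Mirza_coquereli", "==", "Microcebus_coquereli"),  # name change, congruent
]

alignment_text = render_alignment(t2, t1, articulations)
print(alignment_text)
```

Keeping the taxonomy identifier (here the year) as a prefix on every concept is what lets the same name string appear on both sides of an articulation without ambiguity.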

Page 41: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

User/reasoner interaction: achieving well-specified alignments

Page 42: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Input visualization of the 2005/1993 concept trees & articulations

Input articulations

2005 concepts

1993 concepts

Page 43: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

User/reasoner interaction: achieving well-specified alignments

No!

Page 44: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

No Possible World merge [empty canvas, nothing to report]

Page 45: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

User/reasoner interaction: achieving well-specified alignments

No!

Page 46: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

User/reasoner interaction: achieving well-specified alignments

No!

Yes

Page 47: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Nine Possible World merges for an under-specified use case input
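
The three outcomes of the workflow (no possible world, several possible worlds, exactly one) can be reproduced on a toy scale. Euler/X relies on logic reasoners; the sketch below swaps in a brute-force enumeration purely for illustration. It assumes that both taxonomies cover the same domain, that sibling leaf concepts are disjoint, and that no concept is empty. All function and concept names are hypothetical.

```python
from itertools import product


def rcc5(a, b):
    """RCC-5 symbol for two non-empty sets."""
    if a == b:
        return "=="
    if a > b:
        return ">"
    if a < b:
        return "<"
    return "><" if a & b else "|"


def possible_worlds(t1_leaves, t2_leaves, articulations):
    """Enumerate consistent worlds; each world maps (T2 leaf, T1 leaf) to a relation.

    A candidate world switches every (T1 leaf, T2 leaf) intersection on or off.
    Articulations are (concept_x, relation, concept_y) triples over leaf names.
    """
    cells = list(product(t1_leaves, t2_leaves))
    worlds = []
    for mask in product((False, True), repeat=len(cells)):
        occupied = {c for c, on in zip(cells, mask) if on}
        region = {a: frozenset(c for c in occupied if c[0] == a) for a in t1_leaves}
        region.update({b: frozenset(c for c in occupied if c[1] == b) for b in t2_leaves})
        if not all(region.values()):  # every concept must be non-empty
            continue
        if all(rcc5(region[x], region[y]) == rel for x, rel, y in articulations):
            worlds.append({(b, a): rcc5(region[b], region[a]) for a, b in cells})
    return worlds


t1 = ["1993.A", "1993.B"]
t2 = ["2005.A", "2005.B"]

under_specified = possible_worlds(t1, t2, [])
well_specified = possible_worlds(t1, t2, [("2005.A", "==", "1993.A")])
inconsistent = possible_worlds(t1, t2, [("2005.A", "==", "1993.A"),
                                        ("2005.A", "|", "1993.A")])

print(len(under_specified), len(well_specified), len(inconsistent))  # 7 1 0
```

With no articulations the toy input is under-specified (seven worlds); one congruence articulation pins down a single world, from which the remaining relations follow; contradictory articulations leave no world at all. Enumeration of this kind grows exponentially with the number of leaf pairs, which is one reason a real toolkit needs proper reasoning rather than brute force.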

Page 48: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

User/reasoner interaction: achieving well-specified alignments

No!

Yes

Page 49: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

User/reasoner interaction: achieving well-specified alignments

Yes

Yes

Page 50: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

User/reasoner interaction: achieving well-specified alignments

MIR = Maximally Informative Relations

[==, >, <, ><, |] for each concept pair

Yes

Yes

Page 51: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Use case 1: dwarf lemurs sec. 1993 & 2005 1

Chirogaleus furcifer sec. Mühel (1890) – Brehms Tierleben. Public Domain: http://books.google.com/books?id=sDgQAQAAMAAJ

1 Franz et al. 2014. Two influential primate classifications logically aligned. (unpublished)

Page 52: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

The 2nd & 3rd Editions of the Mammal Species of the World

Primates sec. Groves (1993)

317 taxonomic concepts, 233 at the species level.

Primates sec. Groves (2005)

483 taxonomic concepts, 376 at the species level.

1993 to 2005: Δ = 143 species-level concepts

Page 53: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Primate 1993/2005 concept alignments:

From simple to complex merge taxonomies.

Page 55: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

1. Input concepts & articulations

2. Merge visualization

Grey rectangle, round corners: Taxonomic congruence

Microcebus rufus sec. 2005 – same name, congruent concepts [==]

Merge View Legend

Page 56: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Mirza coquereli sec. 2005 – name change, congruent concepts [==]

1. Input concepts & articulations

2. Merge visualization

Merge View Legend

Page 58: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

1. Input concepts & articulations

2. Merge visualization

Yellow octagon: Unique to T1 (1993)

Green rectangle: Unique to T2 (2005)

Microcebus murinus (et al.) sec. 2005 – "lumping / splitting" [> , <]

Merge View Legend

Page 60: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Microcebus (part) & Mirza sec. 2005 – monotypic parent concepts

1. Input concepts & articulations

2. Merge visualization

Mirza & M. coquereli sec. Groves (2005) are two co-extensional concepts in T2

Three concepts are congruent!

Page 61: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

How can we represent concept overlap?

Page 65: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Microcebus (all) & Mirza sec. 2005 – concept overlap [><]

Merge visualization: containment, with overlap [-e mnpw --rcgo]

Unique to 1993.Microcebus (2005 Mirza/coquereli)

Unique to 2005.Microcebus (1993 undescribed)

Dashed blue line: Overlap [><]

Shared, congruent child concepts

Page 66: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

We can resolve the merge overlap products.

Page 69: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Microcebus (all) & Mirza sec. 2005 – concept overlap [><]

Merge visualization: "merge concept" representation [-e mncb]

2005.Microcebus\1993.Microcebus: Merge concept unique to 2005

2005.Microcebus*1993.Microcebus: Shared merge concept

Red lines: Newly inferred articulations (to and from merge concepts)

1993.Microcebus\2005.Microcebus: Merge concept unique to 1993
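
The merge concept labels use `*` and `\` in a way that reads naturally as set intersection and set difference. A minimal sketch under that reading follows; the member lists are simplified placeholders rather than the actual content of either classification, and `sp_new_2005` stands in for species that were undescribed in 1993.

```python
# Simplified, illustrative member lists for the two overlapping genus-level concepts.
microcebus_1993 = frozenset({"rufus", "murinus", "coquereli"})
microcebus_2005 = frozenset({"rufus", "murinus", "sp_new_2005"})

shared = microcebus_2005 & microcebus_1993     # 2005.Microcebus*1993.Microcebus
only_2005 = microcebus_2005 - microcebus_1993  # 2005.Microcebus\1993.Microcebus
only_1993 = microcebus_1993 - microcebus_2005  # 1993.Microcebus\2005.Microcebus

print(sorted(shared))     # ['murinus', 'rufus']
print(sorted(only_2005))  # ['sp_new_2005']
print(sorted(only_1993))  # ['coquereli']
```

The three resulting regions are pairwise disjoint and together cover both input concepts, so an overlap [><] is replaced by regions that relate to each input concept through inclusion or congruence only.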

Page 70: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Scalability & information gain:

How many input articulations are sufficient?

Page 71: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Cheirogaleoidae sec. 2005 – how many articulations are sufficient?

T2: 27 concepts; T1: 14 concepts; 22 input articulations

Page 72: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Cheirogaleoidae sec. 2005 – how many articulations are sufficient?

T2: 27 concepts; T1: 14 concepts; 22 input articulations

17 'non-new' 2005 species-level concepts: Articulated to 1993 species-level concepts

Page 73: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Cheirogaleoidae sec. 2005 – how many articulations are sufficient?

T2: 27 concepts; T1: 14 concepts; 22 input articulations

4 'new' 2005 species-level concepts: Exclusion (|) from 1993 family-level concept

Page 74: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Cheirogaleoidae sec. 2005 – how many articulations are sufficient?

T2: 27 concepts; T1: 14 concepts; 22 input articulations

1 additional highest-level articulation: 2005.Cheirogaleoidae > 1993.Cheirogaleidae

Eliminates 15 additional Possible Worlds

Page 75: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Cheirogaleoidae sec. 2005 – how many articulations are sufficient?

T2: 27 concepts; T1: 14 concepts; 22 input articulations

No genus-/subfamily level articulations are needed

Page 77: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Cheirogaleoidae sec. 2005 – how many articulations are sufficient?

Well-specified merge: 378 Maximally Informative Relations

~ 17x information gain through reasoning

Primates: 483 + 317 = 800 concepts; 402 articulations; 483 × 317 = 153,111 MIR

~ 380x information gain!
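
The information-gain figures follow from simple arithmetic, assuming (as the numbers suggest) that a well-specified merge yields one Maximally Informative Relation per pair of T2 and T1 concepts, and that the gain is the ratio of MIR to input articulations. The function names below are illustrative only.

```python
def mir_count(n_t2: int, n_t1: int) -> int:
    """One Maximally Informative Relation per (T2 concept, T1 concept) pair."""
    return n_t2 * n_t1


def information_gain(n_t2: int, n_t1: int, n_input: int) -> float:
    """Ratio of inferred pairwise relations to user-provided articulations."""
    return mir_count(n_t2, n_t1) / n_input


# Dwarf lemur subset: T2 has 27 concepts, T1 has 14, with 22 input articulations.
print(mir_count(27, 14), f"{information_gain(27, 14, 22):.1f}")      # 378 17.2

# All primates: 483 and 317 concepts, with 402 input articulations.
print(mir_count(483, 317), f"{information_gain(483, 317, 402):.1f}")  # 153111 380.9
```

Because the MIR count grows with the product of the two taxonomy sizes while the input articulations grow far more slowly, the gain increases with the size of the alignment.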

Page 78: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Why?

Performance of names as concept identifiers.

Page 79: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

MSW 2nd/3rd Edition name/concept identity relations

56.4% of the paired name lineages are taxonomically reliable.

⇒ Computers need concept resolution to track taxonomic provenance.

Page 80: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

And Andropogon glomeratus sec. auctorum? 1

Use case 2

"Andropogon glomeratus

is a species of grass (Poaceae)

that occurs in the Southern U.S."

Photo by Max Licher (ASU Herbarium); Cottonwood, Arizona. http://swbiodiversity.org/seinet/imagelib/imgdetails.php?imgid=431755

1 See Franz et al. 2014. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web – Interoperability, Usability, Applicability – Special Issue on Semantics for Biodiversity. (in press)

Page 81: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

In brief: Things are very messy.

Page 82: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Question 1: Which concept labels have included the name string "Andropogon glomeratus" in the past eight classifications?

Tabular alignment of eight Andropogon classifications: 1889 to 2006

⇒ 6 / 8 classifications are taxonomically unique for the concept of A. glomeratus sec. auctorum.

⇒ No two concepts including the "A. glomeratus" name string are taxonomically congruent.

Page 83: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Question 2: Which previously named concepts are congruent with Andropogon glomeratus sec. Weakley (2006)?

Tabular alignment of eight Andropogon classifications: 1889 to 2006

⇒ What Weakley (2006) refers to as "A. glomeratus" was previously referred to as:

1889: A. macrourus var. hirsutior + A. macrourus var. abbreviatus

1933: A. glomeratus (in part, I)

1948: A. glomeratus (?)

1950: A. virginicus var. hirsutior + A. glomeratus (in part, II)

1968: A. virginicus (in part)

1979: A. virginicus var. abbreviatus (in part)

1983: A. glomeratus (in part, I)

Page 84: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Logic representation: Easy!

Page 86: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

T2: 7 concepts (1950); T1: 7 concepts (1948) – containment view

Merge: 3 congruent regions, 3 with same name; 6 unique regions, 4 with non-unique name

A. glomeratus sec. 1950 and A. glomeratus sec. 1948 are overlapping, as each concept includes a non-congruent variety-level concept.

Interestingly, the shared concept region has no unique name in either taxonomy. It is 'un-named', at least within the context of the 1950/1948 classifications.

Case 1: 1948.Blomquist vs. 1950.Hitchcock & Chase (Δ = 2 years)

Page 87: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

T2: 7 concepts (1950); T1: 7 concepts (1948) – merge concept view

Merge: 3 congruent regions, 3 with same name; 6 unique regions, 4 with non-unique name

The shared, overlapping region is more informatively resolved and labeled in the merge concept visualization; the region 1950.A_glomeratus * 1948.A_glomeratus contains no subelements that carry the name "A. virginicus" in either classification.

Case 1: 1948.Blomquist vs. 1950.Hitchcock & Chase (Δ = 2 years)

Page 89: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Case 2: 1889.Hackel vs. 2006.Weakley (Δ = 117 years)

T2: 12 concepts (2006); T1: 12 concepts (1889)

Merge: 8 congruent regions, 0 with same name (!); 5 unique regions, 1 with non-unique name

⇒ Hackel & Weakley agree very substantively on what entities are 'out there in nature'; however, more than a century of Code-compliant name changes has obscured their agreements.

Page 91: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Case 3: 1983.Campbell vs. 2006.Weakley (Δ = 23 years)

T2: 12 concepts (2006); T1: 14 concepts (1983) – containment view

Merge: 9 congruent regions, 5 with same name; 6 unique regions, 4 with non-unique name

One of the simpler merge taxonomies in this use case, although 8 / 15 merge regions have taxonomically misleading names (i.e., congruence/different names; non-congruence/same names).

This ratio is near-average through nine pairwise alignments.
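
The 8 / 15 figure can be reproduced from the merge summary under one reading of the counts: a region's name is misleading if the region is congruent but carries different names in the two classifications, or if it is unique to one classification yet carries a name that also occurs in the other. That reading, and the function below, are assumptions for illustration; they do match the Case 3 numbers stated on the slide.

```python
def misleading_names(congruent, congruent_same_name, unique, unique_non_unique_name):
    """Return (misleading regions, total merge regions) for one pairwise alignment."""
    congruent_different_name = congruent - congruent_same_name
    misleading = congruent_different_name + unique_non_unique_name
    return misleading, congruent + unique


# Merge summaries as given on the slides:
# (congruent, with same name, unique, with non-unique name).
cases = {
    "Case 1: 1948 vs. 1950": (3, 3, 6, 4),
    "Case 2: 1889 vs. 2006": (8, 0, 5, 1),
    "Case 3: 1983 vs. 2006": (9, 5, 6, 4),
}

for label, counts in cases.items():
    misleading, total = misleading_names(*counts)
    print(f"{label}: {misleading} / {total} regions with misleading names")

case3 = misleading_names(*cases["Case 3: 1983 vs. 2006"])  # (8, 15), as on the slide
```

Only the Case 3 ratio is stated in the presentation; the values printed for Cases 1 and 2 are what the same assumed rule yields and should be treated as derived, not quoted.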

Page 92: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

In conclusion:

Page 95: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

• Feasibility of tracking taxonomic concept provenance in computational logic:

• We are making leaps and bounds in feasibility (and in scalability) right now.

• However, many interesting challenges remain (e.g., user/reasoner interaction).

• Accessibility and acceptance of the RCC-5/reasoning approach:

• We need more use cases, and users – the Euler/X approach works!

• It can be applied to any new or legacy systematic publication, biodiversity

database, checklist, classification, phylogeny, or other kinds of taxonomic

syntheses (print or virtual) and versions thereof; complementing the Linnaean

system while providing superior individuation of taxonomic content.

• Having a sound web service is the next critical step in advancing the approach.

• What does it all mean?

• The legacy of taxonomic name and concept authoring is amenable to

computational logic and provenance tracking. We can likely derive much data

integration power from further developments in this direction.

In conclusion – feasibility, accessibility, and what it means.

Page 96: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Acknowledgments

• Robert Guralnick, Susanna Drogsvold & all CU Museum of Natural History "The Meaning of Names" conference organizers!

• Euler/X team: Mingmin Chen, Parisa Kianmajd, Shizhuo Yu, Shawn Bowers

& Bertram Ludäscher

• Juliana Cardona-Duque (weevils), Naomi Pier (primates) & Alan Weakley (grasses)

• taxonbytes lab members: Andrew Johnston & Guanyang Zhang

• NSF DEB–1155984 & DBI–1342595 (PI Franz); IIS–118088 & DBI–1147273

(PI Ludäscher)

https://sols.asu.edu/ Franz Lab: http://taxonbytes.org/

Page 97: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Select references on concept taxonomy and the Euler/X toolkit

• Franz & Peet. 2009. Towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity 7: 5–20.

• Chen et al. 2014. Euler/X: a toolkit for logic-based taxonomy integration. WFLP 2013 – 22nd International Workshop on Functional and (Constraint) Logic Programming.

• Chen et al. 2014. A hybrid diagnosis approach combining Black-Box and White-Box reasoning. Lecture Notes in Computer Science 8620: 127–141.

• Franz et al. 2014. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web – Interoperability, Usability, Applicability – Special Issue on Semantics for Biodiversity. (in press)

• Franz et al. 2014. Reasoning over taxonomic change: exploring alignments for the Perelleschus use case. PLoS ONE. (in review)

• Euler/X toolkit: https://bitbucket.org/eulerx/euler-project

• Euler web service (in progress): http://euler.asu.edu/

• Concept taxonomy @ taxonbytes: http://taxonbytes.org/tag/concept-taxonomy/

Page 98: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Miscellaneous appended slides

Page 99: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

The good: names refer to type specimens necessarily

Source: Witteveen. 2014. Biology & Philosophy. (in press)

Page 100: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

The challenge: names refer to non-type specimens contingently

Source: Dubois. 2005. Zoosystema 27: 365-426.

Names

Non-types

Page 101: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

We may categorize kinds of nomenclatural

and taxonomic change, and opportunities,

to track each, as follows:

Page 102: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Nomenclatural/taxonomic change & provenance tracking square

E.g.:
- A binomial name is formed incorrectly.
- A homonym is discovered, requiring name change.

Page 103: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Nomenclatural/taxonomic change & provenance tracking square

E.g.:
- A type specimen is lost, a neotype must be designated.
- "One fungus (a-/sexual), one name" – Melbourne Code.

Page 104: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Nomenclatural/taxonomic change & provenance tracking square

E.g.:
- A heterotypic synonymy is established (inferred).
- A priority-carrying name is newly 'transferred'.

Page 105: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Nomenclatural/taxonomic change & provenance tracking square

E.g.:
- A junior genus-level name is transferred among tribes.
- An informal clade name is redefined across treatments.

Page 106: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Nomenclatural/taxonomic change & provenance tracking square

Question: Which changes are most common in a particular group?

Answer: Concept-level resolution is needed to assess this.

Figure labels: Many changes; Some changes; Many changes; MOST CHANGES; ???

Page 107: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Question: What is the proper scope of reference for representing our

progress in inferring the tree of life?

Page 108: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Suggested answer: Even though the name-to-taxon mapping is the ultimate aim...

Page 109: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

...in effect we only need to represent the name-to-concept mapping.

Congruence over time will suggest that we are 'getting taxa right'.

Page 110: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

R32 lattice of RCC-5 articulations (lighter color = less certainty)
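The "32" in R32 can be read as the number of subsets of the five RCC-5 base relations (2^5 = 32, counting the empty set). The following is a minimal Python sketch of that count, using the relation symbols that appear elsewhere in these slides (==, <, >, ><, |). The helper name `rcc5_powerset` is hypothetical and is not part of the Euler/X toolkit.

```python
from itertools import combinations

# The five RCC-5 base relations, written with the symbols used in these slides.
RCC5 = ("==", "<", ">", "><", "|")

def rcc5_powerset():
    """Return every subset of the five base relations (hypothetical helper)."""
    subsets = []
    for size in range(len(RCC5) + 1):
        for combo in combinations(RCC5, size):
            subsets.append(frozenset(combo))
    return subsets

lattice = rcc5_powerset()
print(len(lattice))  # 32

# Subsets with exactly one relation are the fully certain articulations.
certain = [s for s in lattice if len(s) == 1]
print(len(certain))  # 5
```

In this reading, a larger subset leaves more base relations open and so expresses less certainty, which matches the "lighter color = less certainty" legend.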

Page 111: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Higher-level primate classifications

– 1993 versus 2005:

Many recurrent names,

little taxonomic congruence.

Page 112: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Primates sec. 1993 & 2005

Order to Subfamily-level

Not much is grey.

Page 113: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Strepsirrhini sec. 2005

Haplorrhini sec. 2005

Catarrhini sec. 2005

Page 114: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Use case 2: Perelleschus sec. 2001 & 2006 1

Perelleschus salpinflexus sec. Franz & Cardona-Duque (2013) DOI: 10.1080/14772000.2013.806371

1 Input articulations: Franz & Cardona-Duque. 2013. Description of two new species and phylogenetic reassessment of Perelleschus Wibmer & O'Brien, 1986 (Coleoptera: Curculionidae), with a complete taxonomic concept history of Perelleschus sec. Franz & Cardona-Duque, 2013. Systematics and Biodiversity 11: 209–236. Merge analyses: Franz et al. 2014. Reasoning over taxonomic change: exploring alignments for the Perelleschus use case. PLoS ONE. (in press)

Page 115: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Goal: align two phylogenies with differential taxon sampling

T1: Perelleschus sec. 2001
• Phylogenetic revision
• 8 ingroup species concepts
• 2 outgroup concepts
• 18 concepts total

Page 116: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Goal: align two phylogenies with differential taxon sampling

T1: Perelleschus sec. 2001
• Phylogenetic revision
• 8 ingroup species concepts
• 2 outgroup concepts
• 18 concepts total

T2: Perelleschus sec. 2006
• Exemplar analysis
• 2 ingroup species concepts
• 1 outgroup concept
• 7 concepts total

Page 117: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Logic representation challenge:

Perelleschus sec. 2001 & 2006 concepts

have incongruent sets of subordinate members,

yet each concept has congruent synapomorphies.

Page 118: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Ostensive alignment: the congruence among higher-level concepts is assessed in relation to their entailed members.

Ostension: giving meaning through an act of pointing out.

Definitional preliminaries 1

1 See Bird & Tobin. 2012. Natural Kinds. URL: http://plato.stanford.edu/archives/win2012/entries/natural-kinds/

Page 119: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Ostensive alignment: the congruence among higher-level concepts is assessed in relation to their entailed members.

Ostension: giving meaning through an act of pointing out.

Intensional alignment: the congruence among higher-level concepts is assessed in relation to their properties.

Intension: giving meaning through the specification of properties.

Definitional preliminaries 1

1 See Bird & Tobin. 2012. Natural Kinds. URL: http://plato.stanford.edu/archives/win2012/entries/natural-kinds/
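The two definitions above can be contrasted with a small set-based sketch. It assumes, purely for illustration, that each concept is modeled as a set of member labels (ostensive reading) and a set of property labels (intensional reading). The function names and all labels (`sp_a`, `trait_1`, and so on) are hypothetical; this is not how the Euler/X toolkit encodes its constraints.

```python
def rcc5(a, b):
    """Return the RCC-5 relation of set a to set b (both assumed non-empty)."""
    a, b = set(a), set(b)
    if a == b:
        return "=="   # congruent
    if a < b:
        return "<"    # a is properly included in b
    if a > b:
        return ">"    # a properly includes b
    if a & b:
        return "><"   # overlapping
    return "|"        # disjoint

def intensionally_congruent(props_a, props_b):
    """Treat two concepts as congruent when they share the same property set."""
    return set(props_a) == set(props_b)

# Hypothetical data: the later concept samples fewer members,
# but reconfirms the same diagnostic properties.
earlier = {"members": {"sp_a", "sp_b", "sp_c"}, "properties": {"trait_1", "trait_2"}}
later = {"members": {"sp_a", "sp_b"}, "properties": {"trait_1", "trait_2"}}

ostensive = rcc5(later["members"], earlier["members"])
intensional = intensionally_congruent(later["properties"], earlier["properties"])

print(ostensive)    # <
print(intensional)  # True
```

The same pair of concepts therefore comes out as non-congruent under the ostensive reading and congruent under the intensional one, which is the contrast the definitions draw.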

Page 120: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Ostensive alignment – members are all that counts

Challenge 1: Differential outgroup sampling (2 / 1 concepts)

T2: 2006.PHY & 2006.PHYsubcin; T1: 2006.PHY only

Input constraints

Ostensive alignment 2001 & 2006

Page 121: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Ostensive alignment – members are all that counts

Challenge 1: Differential outgroup sampling (2 / 1 concepts)

T2: 2006.PHY & 2006.PHYsubcin; T1: 2006.PHY only

Solution: Locally relax coverage with "nc" = "no coverage"

Input constraints

Ostensive alignment 2001 & 2006

Page 122: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Ostensive alignment – members are all that counts

Challenge 1: Differential outgroup sampling (2 / 1 concepts)

T2: 2006.PHY & 2006.PHYsubcin; T1: 2006.PHY only

Solution: Locally relax coverage with "nc" = "no coverage"

Result: 2006.PHY == 2001.PHY

Outgroups are held congruent.

Input constraints

Ostensive alignment 2001 & 2006

Page 123: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Ostensive alignment – members are all that counts

Challenge 2: Ostensive alignment

Input constraints

Ostensive alignment 2001 & 2006

Page 124: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Ostensive alignment – members are all that counts

Challenge 2: Ostensive alignment

Solution: 11 ingroup concept articulations are coded ostensively – either as <, ><, or | – to represent non-congruence in the representation of child concepts

Input constraints

Ostensive alignment 2001 & 2006

Page 125: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Ostensive alignment – members are all that counts

Challenge 2: Ostensive alignment

Solution: 11 ingroup concept articulations are coded ostensively – either as <, ><, or | – to represent non-congruence in the representation of child concepts

Result:
2006.PER < 2001.PER
2006.PER | 2001.[5 species concepts]
etc.

Input constraints

Ostensive alignment 2001 & 2006

Figure labels: <; 5 x |; 2 x ><

Page 126: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Intensional alignment – representation of congruent synapomorphies

Input constraints

Intensional alignment 2001 & 2006

Challenge 3: Intensional alignment

Page 127: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Intensional alignment – representation of congruent synapomorphies

Input constraints

Intensional alignment 2001 & 2006

Challenge 3: Intensional alignment

Solution: An Implied Child (_IC) concept is added to the undersampled (2006) clade concept; and the (5) "missing" species-level concepts are included within this Implied Child

Page 128: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Intensional alignment – representation of congruent synapomorphies

Input constraints

Intensional alignment 2001 & 2006

Challenge 3: Intensional alignment

Solution: An Implied Child (_IC) concept is added to the undersampled (2006) clade concept; and the (5) "missing" species-level concepts are included within this Implied Child

11 ingroup concept articulations are coded intensionally – as == or > – to reflect congruent synapomorphies of 2001 & 2006
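The effect of the Implied Child can be sketched with plain sets. This is a toy model under stated assumptions, not the Euler/X encoding: each clade concept is a mapping from child labels to member sets, the parent's extension is the union of its children, and the species labels `sp1` to `sp7`, the label `2006.PER_IC`, and the helper names are all hypothetical.

```python
def extension(children):
    """Union of the members of all child concepts (assumes full coverage)."""
    members = set()
    for child_members in children.values():
        members |= child_members
    return members

def with_implied_child(children, missing, label="2006.PER_IC"):
    """Return a copy of the child map with one extra child holding the unsampled members."""
    extended = dict(children)
    extended[label] = set(missing)
    return extended

# Hypothetical labels: the earlier treatment samples seven species-level
# concepts, the later exemplar analysis samples only two of them.
per_2001 = {f"sp{i}": {f"sp{i}"} for i in range(1, 8)}
per_2006 = {"sp1": {"sp1"}, "sp2": {"sp2"}}

# Without the Implied Child, the later concept is properly included in the earlier one.
subset_before = extension(per_2006) < extension(per_2001)

# Collect the "missing" members and place them in a single Implied Child.
missing = extension(per_2001) - extension(per_2006)
per_2006_ic = with_implied_child(per_2006, missing)

# With the Implied Child, the two extensions coincide.
congruent_after = extension(per_2006_ic) == extension(per_2001)

print(subset_before, len(missing), congruent_after)  # True 5 True
```

The design choice in this sketch is to leave the sampled children untouched and add exactly one extra child, so the undersampled clade concept gains the missing members without inventing separate species-level concepts for the later treatment.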

Page 129: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Intensional alignment – representation of congruent synapomorphies

Input constraints

Intensional alignment 2001 & 2006

Challenge 3: Intensional alignment

Result: The genus- and ingroup clade-level concepts are inferred as congruent:

2006.PER == 2001.PER
2006.PcarPeve == 2001.PcarPsul
etc.

==

Page 130: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Review – representing ostensive versus intensional alignments

Ostensive alignment: 2001.PER includes more species-level concepts than 2006.PER [>].

Page 131: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

Review – representing ostensive versus intensional alignments

Ostensive alignment: 2001.PER includes more species-level concepts than 2006.PER [>].

Intensional alignment: 2006.PER reconfirms the synapomorphies inferred in 2001.PER [==].

Page 132: Franz. 2014. Explaining taxonomy's legacy to computers – how and why?

The other piece in the puzzle: Concept-to-voucher identifications

Source: Baskauf & Webb. 2014. Darwin-SW. URL: http://www.semantic-web-journal.net/system/files/swj635.pdf