Franz et al. 2012. Reconciling Succeeding Classifications, ESA 2012
![Page 1: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/1.jpg)
Reconciling succeeding
taxonomic classifications
Nico M. Franz
School of Life Sciences, Arizona State University
Mingmin Chen, Shizhuo Yu, Bertram Ludäscher *
Department of Computer Science, University of California at Davis
ESA Annual Meeting 2012
November 14, 2012 – Knoxville, TN
* PI – NSF-IIS 1118088: A logic-based, provenance-aware system for merging scientific data under context and classification constraints.
![Page 2: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/2.jpg)
Challenge – describing classification provenance beyond synonymy
Source: Weakley. 2005. Flora of the Carolinas, Virginia, and Georgia. Available at http://www.herbarium.unc.edu/flora.htm
Andropogon spp. in the Carolinas, from Hackel 1889 to Weakley 2005
![Page 3: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/3.jpg)
Individual columns represent past classifications of Andropogon.
![Page 4: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/4.jpg)
Individual rows represent equivalent taxonomic entities, (almost) regardless of their name labels.
![Page 5: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/5.jpg)
Name/synonymy relationships are not sufficiently granular to capture this evolution of taxonomic views of Andropogon species.
![Page 6: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/6.jpg)
Tracking classification provenance with concepts and articulations
Definition: A taxonomic concept is the underlying meaning of a scientific name as stated
by a particular author and publication. It represents the author's full-blown
view of how the name reaches out to un-/observed objects in nature.
Labeling: The abbreviation sec., for the Latin secundum ("according to"), is preceded by
the full Linnaean name and followed by the specific author and publication.
Source: Berendsohn. 1995. The concept of "potential taxa" in databases. Taxon 44: 207–212.
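To make the labeling convention concrete, here is a minimal Python sketch. The class name `TaxonomicConcept` and its two fields are hypothetical, chosen only for illustration; the point is that two concepts sharing a scientific name but differing in their sec. source remain distinct objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonomicConcept:
    """A scientific name as used by one author and publication (hypothetical model)."""

    name: str          # full Linnaean name, including the name author
    according_to: str  # the author and publication that fix the meaning

    @property
    def label(self) -> str:
        # Name, then "sec." (secundum, "according to"), then the source.
        return f"{self.name} sec. {self.according_to}"


wide = TaxonomicConcept("Andropogon virginicus L.", "Radford et al. (1968)")
narrow = TaxonomicConcept("Andropogon virginicus L.", "Weakley (2005)")

print(wide.label)      # Andropogon virginicus L. sec. Radford et al. (1968)
print(narrow.label)    # Andropogon virginicus L. sec. Weakley (2005)
print(wide == narrow)  # False: same name, different concepts
```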
![Page 7: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/7.jpg)
Examples: Andropogon virginicus L. sec. Radford et al. (1968) [earlier, wider concept]
Andropogon virginicus L. sec. Weakley (2005) [later, narrower concept]
Utility: Representing multiple classifications (revisions) through concepts makes it possible
to track their similarities and differences through articulations.
![Page 8: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/8.jpg)
Five basic articulations between two concepts C1, C2 (set theory)
Equivalence, proper inclusion, inverse proper inclusion, overlap, exclusion
Source: Franz & Peet. 2009. Towards a language for mapping relationships among taxonomic concepts. Syst. Biodiv. 7: 5–20.
Use of "OR" to express uncertainty. Example: C1 == OR > C2
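If each concept is pictured as a set of (hypothetical) specimens, the five articulations can be read off with ordinary set comparisons. This Python sketch assumes nonempty, finite extensions; it uses "==", ">" and "><" as on these slides, "<" for the inverse of ">", and "|" for exclusion as an assumed symbol. It illustrates the set-theoretic reading only: the approach presented here works from asserted articulations and treats concepts as a 'black box', not from specimen lists.

```python
def articulation(c1: frozenset, c2: frozenset) -> str:
    """Return the single base articulation holding between two extensions."""
    if not c1 or not c2:
        raise ValueError("concept extensions must be nonempty")
    if c1 == c2:
        return "=="  # equivalence
    if c1 > c2:
        return ">"   # proper inclusion: C1 properly includes C2
    if c1 < c2:
        return "<"   # inverse proper inclusion
    if c1 & c2:
        return "><"  # overlap: shared members, and each has members of its own
    return "|"       # exclusion (symbol assumed for this sketch)


def satisfies(c1: frozenset, c2: frozenset, asserted: str) -> bool:
    """Check an articulation that may use OR, e.g. "== OR >"."""
    alternatives = {part.strip() for part in asserted.split("OR")}
    return articulation(c1, c2) in alternatives


# Hypothetical specimen sets standing in for a wider and a narrower concept.
wide = frozenset({"s1", "s2", "s3"})
narrow = frozenset({"s1", "s2"})
print(articulation(wide, narrow))          # >
print(satisfies(wide, narrow, "== OR >"))  # True
print(satisfies(wide, narrow, "><"))       # False
```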
![Page 9: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/9.jpg)
How does it work? Connecting Hackel 1889 and Small 1933
Hackel 1889 (1-12)
Small 1933 (13-16)
Step 1: Transcribe the two concept hierarchies and add unique IDs
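Steps 1 and 2 amount to flattening each hierarchy into rows whose IDs stay unique across both taxonomies (hence Small 1933 continuing at 13 after Hackel 1889's 1-12). A minimal Python sketch, assuming a hierarchy is written as nested (name, children) pairs; the function name and the placeholder concept names are hypothetical and do not reproduce Small's actual treatment. The rows it returns can double as the table of concept labels.

```python
from itertools import count


def number_concepts(taxonomy, tree, start=1):
    """Assign sequential IDs to a nested hierarchy in pre-order.

    `tree` is a (name, [subtrees]) pair; returns (id, taxonomy, name) rows.
    """
    ids = count(start)
    rows = []

    def visit(node):
        name, children = node
        rows.append((next(ids), taxonomy, name))
        for child in children:
            visit(child)

    visit(tree)
    return rows


# Placeholder names, not Small's actual treatment.
small = ("Andropogon", [("Andropogon sp. 1", []),
                        ("Andropogon sp. 2", []),
                        ("Andropogon sp. 3", [])])
for row in number_concepts("Small 1933", small, start=13):
    print(row)
```

Continuing the counter from the last ID of the first taxonomy is what keeps the IDs unique when both tables are later combined.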
![Page 10: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/10.jpg)
Step 2: Create a table with all concept labels
![Page 11: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/11.jpg)
Step 3: Create a table with corresponding parent/child relationships ('is_a')
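The is_a table of Step 3 can be derived mechanically once every concept has an ID. A Python sketch, assuming the hierarchy is stored as nested (id, children) pairs; the layout and the IDs are illustrative and are not the actual Hackel 1889 hierarchy.

```python
def is_a_rows(tree):
    """Return (child_id, parent_id) rows for a nested (id, [subtrees]) hierarchy."""
    rows = []

    def visit(node):
        parent_id, children = node
        for child in children:
            rows.append((child[0], parent_id))  # child is_a parent
            visit(child)

    visit(tree)
    return rows


# Hypothetical IDs: concept 1 has children 2 and 5; concept 2 has children 3 and 4.
hierarchy = (1, [(2, [(3, []), (4, [])]), (5, [])])
print(is_a_rows(hierarchy))  # [(2, 1), (3, 2), (4, 2), (5, 1)]
```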
![Page 12: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/12.jpg)
Step 4: Create a table with a suitable set of articulations
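A Step 4 table might hold one articulation per row, pairing an ID from each taxonomy and allowing OR for uncertainty. The sketch below assumes a hypothetical whitespace-separated row format ("ID relation ID"); the rows are invented for illustration and do not reproduce the articulations on the slide, and "|" for exclusion is an assumed symbol.

```python
BASE = {"==", ">", "<", "><", "|"}  # "|" for exclusion is an assumed symbol


def parse_articulation(row: str):
    """Parse a row such as "2 == OR > 14" into (2, ("==", ">"), 14)."""
    left, *middle, right = row.split()
    relations = tuple(token for token in middle if token != "OR")
    if not relations or not set(relations) <= BASE:
        raise ValueError(f"cannot parse articulation: {row!r}")
    return int(left), relations, int(right)


# Hypothetical rows linking Hackel IDs (1-12) to Small IDs (13-16).
table = ["1 == 13", "2 == OR > 14", "3 >< 15"]
for row in table:
    print(parse_articulation(row))
```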
![Page 13: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/13.jpg)
Translation
Congruence
![Page 14: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/14.jpg)
Concept hierarchies
Articulations
![Page 15: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/15.jpg)
Technical challenges to creating articulations
Input of concept hierarchies
Lack of a server-based platform (e.g. Global Names Architecture)
Lack of user-friendly classification input / visualization tools
![Page 16: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/16.jpg)
Input of articulations (goal: achieve a complete and consistent mapping)
Taxonomic experts will not input ∞ articulations
Taxonomic experts will miss relevant articulations ("mir")
Taxonomic experts could be uncertain of articulations ("possible worlds")
Taxonomic experts could posit logically inconsistent articulations
![Page 17: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/17.jpg)
"CleanTax" is being developed to explore solutions to these challenges. 1
1 There is continuation/overlap with the "Exploring Taxonomic Concepts" project that focuses on character matching (DBI-1147266).
![Page 18: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/18.jpg)
CleanTax – technical specifications
CleanTax = a set of Python programming scripts stored on bitbucket.org
(initially developed by Dave Thau; now being developed further on many fronts)
CleanTax reads in concept/articulation tables from a PostgreSQL database
CleanTax transforms the input for processing by logic reasoners; including:
Prover9 / Mace4 theorem provers – first-order logic [thorough, yet slow]
OWL / HermiT – description logic, knowledge representation [complex]
DLV System – propositional logic, answer set programming [promising!]
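To suggest what transforming the input for a logic reasoner can look like, here is a Python sketch that writes articulations as first-order formulas in Prover9-style syntax, treating each concept as a unary predicate over the objects it includes. This is an assumed, simplified encoding for illustration only; CleanTax's actual translation (which must also encode the hierarchies) is not shown on these slides, and predicate names such as c1 and c13 are hypothetical.

```python
def formula(c1: str, rel: str, c2: str) -> str:
    """First-order reading of one base articulation (Prover9-style syntax)."""
    a, b = f"{c1}(x)", f"{c2}(x)"
    if rel == "==":
        return f"all x ({a} <-> {b})"
    if rel == ">":  # c1 properly includes c2
        return f"(all x ({b} -> {a})) & (exists x ({a} & -{b}))"
    if rel == "<":  # inverse proper inclusion
        return formula(c2, ">", c1)
    if rel == "><":  # overlap
        return (f"(exists x ({a} & {b})) & (exists x ({a} & -{b}))"
                f" & (exists x (-{a} & {b}))")
    if rel == "|":  # exclusion (assumed symbol)
        return f"all x ({a} -> -{b})"
    raise ValueError(f"unknown articulation: {rel!r}")


def axiom(c1: str, relations, c2: str) -> str:
    """Join OR-ed alternatives into one disjunctive axiom."""
    return " | ".join(f"({formula(c1, r, c2)})" for r in relations) + "."


def nonempty(c: str) -> str:
    # Concepts are assumed to have at least one member.
    return f"exists x {c}(x)."


print(nonempty("c1"))
print(axiom("c1", ["=="], "c13"))
print(axiom("c2", ["==", ">"], "c14"))
```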
![Page 19: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/19.jpg)
CleanTax assesses consistency and completeness of articulations
Output of the set of maximally informative relationships – "mir"
Report, causal explanation, interactive repair of inconsistent articulations
Calculate multiple possible worlds (if ambiguous articulations are present)
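For a handful of concepts, the consistency check and the mir output can be illustrated by brute force: every pattern of occupied Venn regions is tried, and the patterns that honor all asserted articulations are kept. This Python sketch is not CleanTax's method, which delegates to the reasoners listed above; it assumes every concept is nonempty, uses "|" as an assumed exclusion symbol, and only scales to very small inputs because the search doubles with every region. An empty result signals inconsistent articulations; a single result is the full table of base relations, including newly inferred ones.

```python
from itertools import combinations, product


def articulation(x: frozenset, y: frozenset) -> str:
    """Base articulation between two nonempty sets of regions."""
    if x == y:
        return "=="
    if x > y:
        return ">"
    if x < y:
        return "<"
    return "><" if x & y else "|"  # "|" for exclusion is an assumed symbol


def consistent_tables(concepts, asserted):
    """Return every distinct full articulation table compatible with `asserted`.

    Each nonempty Venn region is a bitmask of the concepts it lies inside.
    These relations depend only on which regions are occupied, so trying
    every occupied/empty pattern is exhaustive.
    """
    regions = range(1, 2 ** len(concepts))
    tables = []
    for pattern in product((False, True), repeat=len(regions)):
        occupied = [r for r, on in zip(regions, pattern) if on]
        ext = {c: frozenset(r for r in occupied if (r >> i) & 1)
               for i, c in enumerate(concepts)}
        if not all(ext.values()):
            continue  # every concept must have members
        if all(articulation(ext[a], ext[b]) in allowed
               for (a, b), allowed in asserted.items()):
            table = [(a, articulation(ext[a], ext[b]), b)
                     for a, b in combinations(concepts, 2)]
            if table not in tables:
                tables.append(table)
    return tables


# A > B and B > C: one consistent table, which adds the inferred A > C.
print(consistent_tables(["A", "B", "C"], {("A", "B"): {">"}, ("B", "C"): {">"}}))
# A > B, B > C and C > A cannot all hold: no table, i.e. inconsistent.
print(consistent_tables(["A", "B", "C"],
                        {("A", "B"): {">"}, ("B", "C"): {">"}, ("C", "A"): {">"}}))
```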
![Page 20: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/20.jpg)
CleanTax creates multiple user-preferred views of the input and merge taxonomies
Reduced Containment Graph – RCG; and Directed Acyclic Graph – DAG
![Page 21: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/21.jpg)
'Training' CleanTax on abstract examples
Initial expert-made set of articulations
New!
![Page 22: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/22.jpg)
'Training' CleanTax on abstract examples
Input / Output – raw HTML list of articulations ("look-up" + inferred)
![Page 23: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/23.jpg)
'Training' CleanTax on abstract examples
Input / Output – 72 maximally informative relationships (mir)
Based on the mir, all theoretically possible articulations
of the R32 lattice can be logically deduced.
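Assuming that the R32 lattice denotes the 32 (2^5) possible disjunctions of the five base articulations, ordered by set inclusion, the deduction step can be sketched in a few lines of Python: a mir relation between two concepts entails every disjunction that contains it. The relation symbols, and "|" for exclusion in particular, are assumptions of this sketch.

```python
from itertools import combinations

BASE = ("==", ">", "<", "><", "|")  # "|" for exclusion is an assumed symbol

# Every subset of the base articulations, from the empty (inconsistent)
# disjunction up to the uninformative disjunction of all five.
R32 = [frozenset(subset)
       for size in range(len(BASE) + 1)
       for subset in combinations(BASE, size)]


def entailed(mir_relation: str):
    """Disjunctive articulations implied by a single mir base relation."""
    return [d for d in R32 if mir_relation in d]


print(len(R32))                                 # 32
print(len(entailed(">")))                       # 16
print(frozenset({"==", ">"}) in entailed(">"))  # True
```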
![Page 24: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/24.jpg)
Abstract Example 1 – Reduced Containment Graph of the merge
Blue circles: shared concepts
Black circles: unique concepts
Black solid arrows: expert input
Grey dashed arrows: deducible
Red solid arrows: newly inferred
Input
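A Reduced Containment Graph keeps only the inclusion edges that cannot be derived from other edges, i.e. the transitive reduction (Hasse diagram) of the proper-inclusion relation. A minimal Python sketch, assuming congruent (==) concepts have already been merged into single nodes so that the edges are acyclic; the node names are hypothetical.

```python
from collections import defaultdict


def reduced_containment_graph(edges):
    """Transitive reduction of (parent, child) proper-inclusion edges.

    Assumes the edges are acyclic (congruent concepts already merged).
    """
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)

    def descendants(node):
        seen, stack = set(), list(children[node])
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(children[n])
        return seen

    nodes = {n for edge in edges for n in edge}
    closure = {(p, c) for p in nodes for c in descendants(p)}
    # Keep p -> c only if no intermediate m has p -> m -> c.
    return {(p, c) for p, c in closure
            if not any((p, m) in closure and (m, c) in closure for m in nodes)}


edges = {("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")}
print(sorted(reduced_containment_graph(edges)))  # [('A', 'B'), ('A', 'D'), ('B', 'C')]
```

The edge A to C is dropped because it follows from A to B and B to C; a viewer could still draw such derivable edges in a distinct style.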
![Page 25: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/25.jpg)
More CleanTax training… our infamous Abstract Example 4
Example 4 – representing multiple 'possible worlds'
3 of 5 articulations are disjunctive (OR)
![Page 26: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/26.jpg)
Reduced Containment Graphs of 7 'possible worlds' (combined ORs)
Example 4 – CleanTax infers 7 possible worlds (user can view / select / repair / rerun)
Asserted by expert
Implied articulations
Inferred by CleanTax
Shared concepts
Unique concepts
Reduced Containment Graphs (RCGs)
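When some articulations are disjunctive, each combination of their alternatives is a candidate world, and only the logically consistent combinations survive. The Python sketch below enumerates the combinations of OR-ed alternatives and tests each one by brute force over every occupancy pattern of the Venn regions. It is an illustration under small-input assumptions, not CleanTax's algorithm; the three abstract concepts are hypothetical and do not reproduce Abstract Example 4, and "|" for exclusion is an assumed symbol.

```python
from itertools import product


def articulation(x: frozenset, y: frozenset) -> str:
    """Base articulation between two nonempty sets of regions."""
    if x == y:
        return "=="
    if x > y:
        return ">"
    if x < y:
        return "<"
    return "><" if x & y else "|"  # "|" for exclusion is an assumed symbol


def satisfiable(concepts, chosen):
    """True if some pattern of occupied Venn regions realizes every chosen relation."""
    regions = range(1, 2 ** len(concepts))
    for pattern in product((False, True), repeat=len(regions)):
        occupied = [r for r, on in zip(regions, pattern) if on]
        ext = {c: frozenset(r for r in occupied if (r >> i) & 1)
               for i, c in enumerate(concepts)}
        if all(ext.values()) and all(
                articulation(ext[a], ext[b]) == rel
                for (a, b), rel in chosen.items()):
            return True
    return False


def possible_worlds(concepts, asserted):
    """Expand each OR into its alternatives and keep the consistent combinations."""
    pairs = sorted(asserted)
    worlds = []
    for combo in product(*(sorted(asserted[p]) for p in pairs)):
        chosen = dict(zip(pairs, combo))
        if satisfiable(concepts, chosen):
            worlds.append(chosen)
    return worlds


asserted = {("A", "B"): {"==", ">"}, ("B", "C"): {"==", ">"}, ("A", "C"): {"==", ">"}}
worlds = possible_worlds(["A", "B", "C"], asserted)
print(len(worlds))  # 4 of the 2 * 2 * 2 = 8 combinations are consistent
for world in worlds:
    print(world)
```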
![Page 27: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/27.jpg)
Exploring "views" of the merge - circular Euler diagrams of PW1
Table of mir / Corresponding Euler diagram (circular)
Identical information content
![Page 28: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/28.jpg)
Correspondence of circular and Directed Acyclic Diagrams
PW1: Typical Euler circles / Euler-DAG of PW1
Identical information content
![Page 29: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/29.jpg)
Real-life examples
![Page 30: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/30.jpg)
Real-life examples, I – reconciling two weevil classifications 1
Curculionoidea sec. Kuschel 1995 (concepts 117-157) / Curculionoidea sec. Marvaldi & Morrone 2000 (concepts 348-372)
1 Initial articulations provided by NMF.
![Page 31: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/31.jpg)
Merge taxonomy of Kuschel 1995 / Marvaldi & Morrone 2000
CleanTax RCG – 1 newly inferred articulation ( ) + several inconsistencies
Microcerinae sec. M&M 2000 [363] are included in Brachycerinae sec. KU 1995 [148]
(yes, I missed that; Kuschel 1995 only mentions it in the text, not in the main taxon list)
![Page 32: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/32.jpg)
Real-life examples, II – reconciling two weevil classifications
Curculionoidea sec. Crowson 1981 (concepts 1-17) / Curculionoidea sec. Marvaldi & Morrone 2000 (concepts 348-372)
![Page 33: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/33.jpg)
CleanTax RCG – 4 newly inferred articulations ( ) / does not depict overlap (><)
e.g. {Aglycyderidae [2], Allocorynidae [3], Oxycorynidae [17]} sec. Crowson 1981
are included in Belidae [353] sec. M&M 2000
Merge taxonomy of Crowson 1981 / Marvaldi & Morrone 2000
![Page 34: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/34.jpg)
Euler-DAG of the Crowson / Marvaldi & Morrone merge taxonomy
Solid lines – proper inclusion
Black solid line: given
Green solid line: inferred
Orange solid line: explanatory
[Red solid line: inconsistent]
Dashed lines – overlap
Black dashed line: given
Green dashed line: inferred
Orange dashed line: explanatory
Red dashed line: inconsistent
Concept boxes – concepts
Orange square box: shared
Black square box: unique
Dashed square box: combined
Dashed oval box: inconsistent
![Page 35: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/35.jpg)
DAGs generate "combined concepts": intersections of overlapping concepts
Belidae sec. MM2000
Belidae sec. Cro1981
"Belidae" INT(Cro/MM)
Shared – [2,3,17,357]
![Page 36: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/36.jpg)
New naming/viewing conventions – simple merges (shared, unique) *
Input: Concept A = Attelabidae CR81 (AttCR81 [9]); Concept B = Attelabidae MM00 (AttMM00 [55])
Output: Concept A – Concept B = AB: Attelabidae CR81 – Attelabidae MM00 (AttCR81.AttMM00)
* Simple extension to three or more congruent concepts.
![Page 37: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/37.jpg)
New naming/viewing conventions – combined merges (overlap; T1, T2)
Input: Concept A = Belidae CR81 (BelCR81 [10]); Concept B = Belidae MM00 (BelMM00 [353])
Euler / DAG regions: A B; Ab AB aB
Ab: BelCR81.belMM00
aB: BelMM00.belCR81
AB: BelCR81.BelMM00
![Page 38: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/38.jpg)
Input: Concept A = Curculionidae CR81 (CurCR81); Concept B = Curculionidae KU95 (CurKU95); Concept C = Curculionidae s.s. MM00 (CurMM00); T1, T2, T3
DAG / Euler regions: A B C; Abc ABc aBc abC aBC AbC ABC
Abc: CurCR81.curKU95.curMM00
aBc: CurKU95.curCR81.curMM00
abC: CurMM00.curCR81.curKU95
AbC: CurCR81.CurMM00.curKU95
aBC: CurKU95.CurMM00.curCR81
ABc: CurCR81.CurKU95.curMM00
ABC: CurCR81.CurKU95.CurMM00
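The naming convention for combined concepts is regular enough to generate. Each region of the Euler diagram is coded with one letter per input concept (uppercase if the region lies inside that concept, lowercase if outside), and its name lists the included concepts first and then the excluded ones with a lowercase initial, each in input order. This Python sketch encodes that reading; the function names are hypothetical, and the rule is inferred from the Attelabidae, Belidae and Curculionidae examples rather than stated on the slides.

```python
def region_code(mask: int, letters: str) -> str:
    """E.g. mask 0b110 over "ABC" -> "aBC" (bit i set = inside concept i)."""
    return "".join(letter.upper() if (mask >> i) & 1 else letter.lower()
                   for i, letter in enumerate(letters))


def combined_name(mask: int, abbreviations) -> str:
    """Included concepts first, then excluded ones with a lowercase initial."""
    inside = [a for i, a in enumerate(abbreviations) if (mask >> i) & 1]
    outside = [a[0].lower() + a[1:]
               for i, a in enumerate(abbreviations) if not (mask >> i) & 1]
    return ".".join(inside + outside)


concepts = ["CurCR81", "CurKU95", "CurMM00"]
for mask in range(1, 2 ** len(concepts)):
    print(region_code(mask, "ABC"), combined_name(mask, concepts))
```

The seven printed lines match the seven regions listed for the Curculionidae example; with two concepts such as BelCR81 and BelMM00, the same functions give Ab, aB and AB, and the all-inclusive region of two congruent concepts yields AttCR81.AttMM00.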
![Page 39: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/39.jpg)
Future directions
![Page 40: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/40.jpg)
Current workflow / "usability" (CleanTax on "Lore" server, UC Davis)
Input script
Output file
Inconsistency Repair, explanation
Possible worlds
Visualization: Euler-DAG
Interactive reduction of PWs (decision tree)
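One way to drive an interactive reduction of possible worlds is to ask the user about the concept pair whose answer leaves the fewest worlds in the worst case, filter, and repeat. This greedy heuristic is an assumption of the sketch, not a description of the planned CleanTax decision tree; each world is modeled as a dict from concept pair to base articulation, and the four worlds below are hypothetical.

```python
from collections import Counter


def best_question(worlds):
    """Pick the concept pair whose answer leaves the fewest worlds in the worst case."""
    pairs = sorted({pair for world in worlds for pair in world})

    def worst_case(pair):
        return max(Counter(world.get(pair) for world in worlds).values())

    return min(pairs, key=worst_case)


def answer(worlds, pair, relation):
    """Keep only the worlds in which `pair` has the relation the user chose."""
    return [world for world in worlds if world.get(pair) == relation]


worlds = [
    {("A", "B"): "==", ("A", "C"): "==", ("B", "C"): "=="},
    {("A", "B"): "==", ("A", "C"): ">", ("B", "C"): ">"},
    {("A", "B"): ">", ("A", "C"): ">", ("B", "C"): "=="},
    {("A", "B"): ">", ("A", "C"): ">", ("B", "C"): ">"},
]
question = best_question(worlds)           # ("A", "B") splits the worlds 2 / 2
remaining = answer(worlds, question, ">")  # the user picks ">"
print(question, len(remaining))            # ('A', 'B') 2
print(best_question(remaining))            # ('B', 'C')
```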
![Page 41: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/41.jpg)
Shared, real use cases (Perelleschus) with ETC feature-based project
5 taxonomies, 48 concepts, expert articulations, plus textual feature diagnoses
![Page 42: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/42.jpg)
Conclusions and outlook
Improvements to CleanTax will remove many of the technical challenges towards a
full-blown taxon concept approach (and thus improved tracking of classification provenance).
Other technical challenges are being addressed (server platform, algorithmic
scalability, intensional/ostensive articulations, visualization [Euler, combined
concepts], workflow integration).
Many non-technical challenges remain (in short: transparent/consistent use).
![Page 43: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/43.jpg)
The current approach treats concepts as a 'black box' – the input data are simple and
make no reference to type specimens, synapomorphies, diagnostic features, etc.
The "Exploring Taxonomic Concepts" project will develop tools for a balanced view.
Nevertheless, the articulations can expose deep and varied semantic links among
succeeding classifications.
![Page 44: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/44.jpg)
CleanTax may be the first attempt to 'explain' classification provenance to logic
reasoners. This could have considerable implications for future data integration.
![Page 45: Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012](https://reader031.fdocuments.in/reader031/viewer/2022020218/559b17231a28ab94308b4741/html5/thumbnails/45.jpg)
Acknowledgments
Shawn Bowers, Dave Thau, Alan Weakley
NSF-IIS 1118088: "III-SMALL: A logic-based, provenance-aware system for merging scientific data under
context and classification constraints"
"Euler" team, UC Davis