Non-unitary syntheses of systematic knowledge
Please
@taxonbytes
Nico Franz
School of Life Sciences, Arizona State University
CIRSS Seminar – Center for Informatics Research in Science and Scholarship
February 17, 2017 – iSchool, University of Illinois Urbana-Champaign
@ http://www.slideshare.net/taxonbytes/franz-2017-uiuc-cirss-non-unitary-syntheses-of-systematic-knowledge
Overview
• Why are phylogenies and classifications (so) unstable?
• How (well) can taxonomic names and relationships – the "Linnaean system" – manage the taxonomic similarities and differences across versions?
• Introducing the Euler/X alignment tool
• The primate use case (classifications)
• The avian use case (phylogenies)
• Biodiversity data aggregation
• Implications of achieving synthesis (CSCW..)
doi:10.1038/nature.2016.20567
"Here, we use new genomic data from over 1,000 uncultivated and little known organisms, together with published sequences, to infer a dramatically expanded version of the tree of life, with Bacteria, Archaea and Eukarya included."
doi:10.1038/nmicrobiol.2016.48
The pluralistic domain of human taxonomy making
• Taxonomies are endorsed by us (humans); more or less democratically.
• They consist of sets of labels, data, and theories about the natural world.
• Over time, these theories change – converge or conflict (often in parallel).
"100 years of primate taxonomies"
Source: Rylands & Mittermeier. 2014. Primate taxonomy: species and conservation. doi:10.1002/evan.21387
A model to separate the human-made versus natural domains
Natural domain ("model")
Domain of human taxonomy making ("mimic")
• While human taxonomy making unfolds (e.g. 1758 onwards), natural taxa – which 'took' millions of years to realize – tend to not change much.
• At any time, our labels and theories (concepts) aim to stand for taxa; yet the correspondence may be approximate.
Reliable?
Remsen: Using names, we're lucky when revisions are infrequent
"In biology, there are many taxa that are so under-studied that they are only known from their original description and none or very few subsequent references […]. The name alone, so long as it is a unique name, is sufficient to locate all related material."
– David Remsen 2016: 213
Source: Remsen. 2016. The use and limits of scientific names in biological informatics. doi:10.3897/zookeys.550.9546
http://taxonbytes.org/wp-content/uploads/2014/10/Peet-BIGCB-2014-Changing-Perspectives-on-Plant-Distributions.pdf
The challenge: names refer to non-type specimens contingently
Source: Dubois. 2005. Zoosystema 27: 365-426. http://sciencepress.mnhn.fr/sites/default/files/articles/pdf/z2005n2a8.pdf
Names
Non-types
Oscillating meanings of the species epithet hyalites – 1911 to 2003
4: Amauris (Amaura) (damocles) hyalites makuyuensis Carcasson (1964) sec. Vane-Wright (2003)
Name-part labels: genus, subgenus, superspecies, semispecies, subspecies
[Figure axes: Phenotypic diversity; Type-anchored name identity relations]
Narrowest holotype "region"
Source: doi:10.1017/S1477200003001063
Connecting to the occurrence level.
Introducing the SERNEC portal (sustained by Symbiota)
sernecportal.org
Holdings, May 2015
• 27 herbarium collections
• 607,300 occurrences
• 17,300 species-level units
Andropogon glomeratus – Bushy bluestem
Photo by Max Licher (ASU Herbarium); Cottonwood, Arizona.
http://swbiodiversity.org/seinet/imagelib/imgdetails.php?imgid=431755
Ok. SERNEC search!
Search for "Andropogon glomeratus" returns 255 occurrences¹
• Source herbaria: 9
• Year collected: 1885-2013
• Year identified: 1973-2010
• Identifier named: 161 occ.
Isn't that one similar to virginicus?
Search for "Andropogon virginicus" returns 442 occurrences¹
• Source herbaria: 13
• Year collected: 1873-2013
• Year identified: 1973-2015
• Identifier named: 200 occ.
What about the nominal subspecies?
Search for "A. virginicus var. virginicus" returns 101 occurrences¹
• Source herbaria: 6
• Year collected: 1920-2013
• Year identified: 2003
• Identifier named: 66 occ.
I believe some Floras recognize capillipes.
Search for "Andropogon capillipes" returns 72 occurrences¹
• Source herbaria: 5
• Year collected: 1940-2006
• Year identified: 1986
• Identifier named: 1 occ.
Show four-in-one occurrence-based maps.
Combined four-in-one search returns 769 occurrences¹
• Source herbaria: 13
• Year collected: 1873-2013
• Year identified: 1973-2015
• Identifier named: 407
¹ SERNEC portal, May 15, 2015; with synonyms, raw taxonomy.
Ready to do science?
Maybe. There are some issues.
Taxonomic concept alignment, Andropogon glomeratus-virginicus complex, spanning 11 classifications authored 1889-2015
• 36 unique taxonomic names
• 88 taxonomic concept labels (name sec. author strings)
• Alignment by A.S. Weakley; row position = congruence
• 1/36 names with unique 1 : 1 name : meaning cardinality across all classifications
  • Andropogon virginicus
• Source: Franz et al. 2016¹
¹ Franz et al. 2016. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web Journal (IOS). doi:10.3233/SW-160220
Also: This is how we built this (provenance tracking).
"When I first came here, this was all swamp. Everyone said I was daft to build a castle on a swamp, but I built it all the same, just to show them."
"It sank into the swamp."
"So I built a second one."
"That sank into the swamp."
"So I built a third."
"That burned down, fell over, then sank into the swamp."
"But the fourth one stayed up. And that's what you're going to get, Lad, the strongest castle in all of England."
Why Euler/X?
Concepts: tracking progress and conflict in the human domain
• Taxonomic names and nomenclatural relationships are only so-so in terms of tracking congruent and incongruent taxonomic perspectives.
• Logic-based multi-taxonomic alignments require better contextualization of labels and relationships, and better specification of "taxonomic sameness".
1912 vs. 1967: Logically reconcilable? Δ = ?
Querying systematic advancement – premises & questions
• The term "tree of life" characterizes a goal that we strive to reach eventually, more so than where we are now (for many perceived groups).
• Therefore it would be useful to have a "systematic knowledge advancement service". The service satisfies queries such as:
1. "Does this sequence of related systematic inferences have a stabilizing or destabilizing trend?"
2. "Are two or more tree hierarchies – each differentially sub-sampled at lower levels – in congruence or in conflict?"
3. "How can an applied comparative study tied to one (earlier) hierarchy be "updated" (integrated) with another (later) hierarchy?"
Service: We can prioritize research agendas accordingly.
Service: Sampling an issue? Or are signals complementary?
Service: Effects of "systematic variable" on conclusions can be controlled for.
An update on Euler/X:
Logic, use cases, and novel services
Euler/X – logically consistent RCC–5 alignments
• Input: multiple taxonomies and/or phylogenies; expert-provided articulations.
• Output: logic consistency checking; Maximally Informative Relations (MIR); alignment visualizations.
Products – concept taxonomy in theory and in practice
• ZooKeys. doi:10.3897/zookeys.528.6001
• Semantic Web. doi:10.3233/SW-160220
• Biological Theory. doi:10.1007/s13752-017-0259-5
• PLoS ONE. doi:10.1371/journal.pone.0118247
• Systematics Biodiv. doi:10.1080/14772000.2013.806371
• Systematic Biology. doi:10.1093/sysbio/syw023
• Biodiversity Data Journal. doi:10.3897/BDJ.5.e10469
• Research Ideas and Outcomes. doi:10.3897/rio.2.e10610
Region Connection Calculus (set constraints)
== < > >< !
• Two regions N, M are either:
  • congruent (N == M)
  • properly inclusive (N < M)
  • inversely properly inclusive (N > M)
  • overlapping (N >< M)
  • exclusive of each other (N ! M)
• RCC–5 articulations answer the query: "can we join regions N and M?"
• Taxonomies have multiple RCC–5 alignable components: nodes (parents, children), node-associated traits, even node-anchoring specimens.
Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf
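The five relations listed above can be illustrated by treating each region as a finite set of members. The following is only a minimal Python sketch under that assumption; the function name `rcc5` and the specimen identifiers are hypothetical and are not taken from Euler/X, which reasons over expert-provided articulations instead of explicit member sets.

```python
def rcc5(n, m):
    """Return the RCC-5 symbol relating region n to region m.

    Assumption: each region is modelled as a non-empty finite set of members.
    """
    n, m = frozenset(n), frozenset(m)
    if not n or not m:
        raise ValueError("regions must be non-empty")
    if n == m:
        return "=="   # congruent
    if n < m:
        return "<"    # n is properly included in m
    if n > m:
        return ">"    # n properly includes m
    if n & m:
        return "><"   # overlapping
    return "!"        # exclusive of each other


# Hypothetical regions, written as sets of made-up specimen identifiers.
a = {"s1", "s2"}
b = {"s1", "s2", "s3"}
c = {"s3", "s4"}

relations = {
    "a vs a": rcc5(a, a),
    "a vs b": rcc5(a, b),
    "b vs a": rcc5(b, a),
    "b vs c": rcc5(b, c),
    "a vs c": rcc5(a, c),
}
```

Exactly one of the five symbols holds for any pair of non-empty regions, which is why a single articulation per concept pair can be maximally informative.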
Use cases – primate classifications & avian phylogenies
1. Primate classifications sec. MSW2 (1993) versus MSW3 (2005)
a. Microcebus + Mirza sec. MSW3 (2005) with coverage constraint
b. Quantifying name (identifier) reliability
c. Reasoning achieves scalability (matrix)
2. Avian phylogenies sec. Prum et al. (2015) versus Jarvis et al. (2014)
a. Psittaciformes with & without coverage
b. Alignment of the "Neoavian explosion"
Use case 1:
Two primate classifications –
MSW2 (1993) versus MSW3 (2005)
Starts with a live demo.
Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
• Input visualization: MSW3 (2005) versus MSW2 (1993)
  • "Taxonomic concept labels" identify input concept regions
  • RCC–5 articulations provided for each species-level concept
Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023
• Alignment visualization: "grey means taxonomically congruent"
  • One name & congruent region
  • Many names & congruent region
  • One name & non-congruent regions
  • Many names & non-congruent regions
  • New names & exclusive regions
• Application of coverage constraint: parent-to-parent articulations (><) are fully defined by alignment signal propagated from their respective children.
Sensible when complete sampling of children is intended.
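The effect of the coverage constraint can be sketched with the same set-based reading of regions. This is an illustrative assumption, not the Euler/X implementation: `parent_region` is a hypothetical helper, and the child regions are made-up sets of specimen identifiers. Under coverage, each parent is exactly the union of its children, so the parent-to-parent relation follows from the children alone.

```python
def rcc5(n, m):
    """RCC-5 symbol for two regions modelled as non-empty sets (an assumption)."""
    if n == m:
        return "=="
    if n < m:
        return "<"
    if n > m:
        return ">"
    return "><" if n & m else "!"


def parent_region(child_regions):
    """Coverage constraint: a parent is exactly the union of its children."""
    return frozenset().union(*child_regions)


# Hypothetical child regions for one parent concept in each of two taxonomies.
t1_children = [frozenset({"s1", "s2"}), frozenset({"s3", "s5"})]
t2_children = [frozenset({"s1", "s2"}), frozenset({"s3"}), frozenset({"s4"})]

t1_parent = parent_region(t1_children)
t2_parent = parent_region(t2_children)

# Each parent holds a member the other lacks, so the parents overlap.
parent_relation = rcc5(t1_parent, t2_parent)
```

If coverage is relaxed, a parent may hold members beyond its sampled children, and the parent-to-parent relation can no longer be read off the children in this way.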
Use case 1.b.: Quantifying name (identifier) reliability
• Alignment visualization: RCC–5 as an identifier assessment tool [good / not]
  • One name & congruent region
  • Many names & congruent region
  • One name & non-congruent regions
  • Many names & non-congruent regions
  • New names & exclusive regions
• Query services rendered: (1) MSW3 destabilizes MSW2; (2) non-congruence is not only caused by differential low-level sampling; (3) alignment constitutes a taxonomic meaning integration map to navigate across MSW3 & MSW2.
1 in 3 names is unreliable across MSW2/MSW3 classifications
Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023
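The five name/region categories used in the alignment visualization can be tallied mechanically once each articulation records the two names and the RCC–5 symbol. The sketch below is a hedged illustration only: `classify` is a hypothetical helper, the names are placeholders, and treating a same-name exclusive pair as "non-congruent" is an assumption. It does not reproduce the metric behind the "1 in 3" figure in the cited paper.

```python
from collections import Counter

RCC5_SYMBOLS = {"==", "<", ">", "><", "!"}


def classify(name_a, name_b, relation):
    """Map one articulation to a name/region category from the slide."""
    if relation not in RCC5_SYMBOLS:
        raise ValueError(f"not an RCC-5 symbol: {relation!r}")
    same_name = name_a == name_b
    if relation == "==":
        return "one name & congruent region" if same_name else "many names & congruent region"
    if relation == "!" and not same_name:
        return "new names & exclusive regions"
    # Assumption: same-name exclusive pairs are counted as non-congruent.
    return "one name & non-congruent regions" if same_name else "many names & non-congruent regions"


# Placeholder articulations: (name in taxonomy 1, name in taxonomy 2, symbol).
articulations = [
    ("X alpha", "X alpha", "=="),
    ("X beta", "X gamma", "=="),
    ("X delta", "X delta", "><"),
    ("X epsilon", "X zeta", "<"),
    ("X eta", "X theta", "!"),
]

tally = Counter(classify(a, b, r) for a, b, r in articulations)
```

Only the first category ("one name & congruent region") corresponds to a name that works as a reliable identifier across both classifications.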
Use case 1.c.: Reasoning achieves scalability (MIR matrix)
Source: Dang et al. 2015. ProvenanceMatrix: a visualization tool for multi-taxonomy alignments. CEUR Workshop Proceedings 1456: 13–24. http://ceur-ws.org/Vol-1456/paper2.pdf
• Input: 402 articulations. Output: 153,111 Maximally Informative Relations
Salmon cells ↔ reasoning
Use case 2:
Avian phylogenies sec. Prum et al. (2015)
versus Jarvis et al. (2014)
Source: Thomas, G.H. 2015. An avian explosion. Nature 526: 516–517. doi:10.1038/nature15638
2015 2014
Phylogenetic inferences can vary over time.
Use case 2: Aves sec. Prum et al. (2015) versus Jarvis et al. (2014)
• Sampling is highly differential: 198 versus 48 species-level entities
• Only 12 species-level concept pairs are congruent [green cells]
Use case 2.a.: Psittaciformes with & without coverage constraint
• Psittaciformes sec. 2015 – with global coverage constraint
  • No low-level congruence ↔ no congruent alignment regions
  • Input visualization: only disjoint articulations
  • Alignment visualization: 108 MIR; all disjoint
• Psittaciformes sec. 2015 – with coverage locally relaxed
  • "No coverage" constraint for 2014/2015.[Psittacidae, Nestor]
  • Input visualization: allows for 3 congruent & 7 inclusive RCC–5 articulations
  • Higher-level congruence despite low-level non-congruence
  • Alignment visualization: 160 MIR: 10 congruent; 65 (inversely) properly inclusive
  • Additional 2015 low-level sampling
Use case 2.b.: Alignment of the "Neoavian explosion"
• Aves sec. 2015/2014, down to ordinal level – with coverage locally relaxed
  • Non-congruence within 2015.Paleognathae
  • Non-congruence within 2014.Pelecanimorphae
  • Non-congruence within 2015/2014.Neoaves (see next slide)
Use case 2.b.: Precise semiotics for the "avian explosion"
• Neoaves sec. 2015/2014, and 3–4 less inclusive levels
• 26 overlapping articulations in the sub-Neoavian alignment region cannot be assigned to differential sampling: 'genuine' phylogenetic conflict
Largely derived from doi:10.3897/rio.2.e10610
Thesis: Unitary hierarchies create mistrust in aggregated data
• Many aggregators are designed to impose a single taxonomic hierarchy – one at a time – onto all taxonomically annotated records.
• By design, these "backbones" are rarely attributable to individual (expert) authors, but instead are newly created systematic theories that only appear at the system level.
• Data are aggregated accordingly; yet backbone-driven modifications may newly disrupt the original integrity of submitted data packages.
• By deflecting responsibilities, aggregators may cause additional self-harm. Ultimately, the power balance – as presently built in – must shift to bring experts back into the process of licensing succinct, trustworthy data packages.
Let's re-diagnose:
What happens in dynamic,
open systems?
Charly Lewisw, CC BY-SA 3.0
Snapshot of a more frequently revised organismal lineage
• 9 schemata for the NA Cleistes/Cleistesiopsis complex (orchids, "pogonias")
• Vertical sections identify taxonomic concept regions
• Colors identify lineages of taxonomic names (epithets) in use
• There is no consensus! Five incongruent schemata are used concurrently
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
Further diagnosis:
If incongruent taxonomies are endorsed – locally, provisionally, and democratically – then what is the impact for aggregated biodiversity data?
Taxonomy becomes a variable that we need to represent, and thereby control for (at the system level).
Example: the Cleistes use case
"Controlling the taxonomic variable"
• Query: "Where do these orchid species occur?"
• Same set of 250 orchid specimens, according to 4 taxonomies.
Map panels: The 'consensus'; The 'bible'; The (formerly) federal 'standard'; The 'best', latest regional flora
"Just bad"
Expert views are in conflict
Impact: Name-based aggregation has created a novel synthesis that nobody believes in
Solution: Instead of aggregating an artificial 'consensus', build translation services
Expert views are reconciled
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
Challenges:
How can we redesign aggregation to yield high-quality biodiversity data packages?
What does this mean for Darwin Core¹ and how we use this aggregation standard?
¹ Wieczorek et al. 2012. Darwin Core: an evolving […]. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715
Preview of solution with eight steps
• DwC is insufficient, and part of the problem
# 5: Identify occurrence records only to taxonomic concept labels (TCLs)
Records: EKY39235 MTSU003611 NCSC00040204 …
Records: BOON8098 CLEMS0061133 WILLI39399 …
Records: GMUF-0039355 IBE006808 USCH58399 …
Records: CONV0006268 MDKY00006482 NCU00038930 …
Records: BRYV0023582, BRYV0023584 KHD00032030, MISS0016604 MMNS000227, NCSC00040206 USMS_000002923, USMS_000002924 VSC0053223, VSC0065528 …
Records: ARIZ393087 DBG39049 USCH51217 …
Records: NCU00040710 USCH96248 VSC0053218 …
Records: CLEMS0012881 FUGR0003293 GA023130 …
Records: BOON8100 NCSC00040210 SJNM45487 …
Records: GA023144 LSU00012494 MISS0016608 …
Records: IBE006810, IND-0012374, MMNS000227
Records: NY8654
• Syntax (ID): Occurrence / organism is identified to TCL
"CLEMS0012881" is identified to Cleistes divaricata sec. Smith et al. 2004
[additional ID metadata]
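The identification syntax above can be pictured as a small record type. The following is a sketch only: the class and field names (`TCL`, `Identification`, `name`, `sec`, `occurrence_id`) are hypothetical choices for illustration and are not Darwin Core terms or part of any existing portal schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TCL:
    """Taxonomic concept label: a name plus its 'sec.' (according to) source."""
    name: str
    sec: str

    def label(self) -> str:
        return f"{self.name} sec. {self.sec}"


@dataclass(frozen=True)
class Identification:
    """Links one occurrence record to exactly one taxonomic concept label."""
    occurrence_id: str
    tcl: TCL

    def statement(self) -> str:
        return f'"{self.occurrence_id}" is identified to {self.tcl.label()}'


# Example taken from the slide text.
ident = Identification(
    occurrence_id="CLEMS0012881",
    tcl=TCL(name="Cleistes divaricata", sec="Smith et al. 2004"),
)
sentence = ident.statement()
```

Keeping the "sec." source as its own field means two identifications that share a name but follow different treatments never compare as equal.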
# 6: Generate comprehensive, consistent RCC–5 alignments
• Euler/X is a toolkit that infers logically consistent RCC–5 alignments
• Value added: MIR – set of Maximally Informative Relations containing the RCC–5 articulation for every possible TCL pair (scalability)
Reasoner inference
# 7: Joining occurrence-to-TCL identifications & RCC–5 alignments
Records: BOON8098, CLEMS0061133, CONV0006268, EKY39235 GMUF-0039355, IBE006808, IBE006810, IND-0012374 MDKY00006482, MMNS000227, MTSU003611, NCSC00040204 NCU00038930, NY8654, USCH58399, WILLI39399 …
Records: ARIZ393087, BRYV0023582, BRYV0023584, DBG39049 KHD00032030, MISS0016604, MMNS00022, NCSC00040206 USMS_000002923, USMS_000002924, VSC0053223, VSC0065528 …
Records: BOON8100, CLEMS0012881, FUGR0003293 GA023130, GA023144, LSU00012494 MISS0016608, NCSC00040210, NCU00040710 SJNM45487, USCH96248, VSC0053218 …
• Specimen integration is fully driven by TCL-to-TCL RCC–5 signals
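One way to picture this join is as a lookup that carries each record from its source TCL to the target TCLs allowed by the alignment. The sketch below is an assumption-laden illustration, not the authors' implementation: the function names, the placeholder labels ("A sec. 1968" etc.), the record identifiers, and the articulations are all hypothetical. A record is assigned with certainty when its source region equals or lies inside the target region (==, <), and only as a possibility when the source region is broader than or overlaps the target (>, ><).

```python
CERTAIN = {"==", "<"}    # source region equals or lies within the target region
POSSIBLE = {">", "><"}   # source region is broader than, or overlaps, the target


def translate(source_label, alignment, targets):
    """Return (certain, possible) target labels for one source TCL."""
    certain = [t for t in targets if alignment[(source_label, t)] in CERTAIN]
    possible = [t for t in targets if alignment[(source_label, t)] in POSSIBLE]
    return certain, possible


def integrate(identifications, alignment, targets):
    """Join occurrence-to-TCL identifications with a TCL-to-TCL alignment."""
    return {
        occ: translate(label, alignment, targets)
        for occ, label in identifications.items()
    }


# Hypothetical alignment: one broad 1968 concept split into two 2015 concepts.
targets = ["A1 sec. 2015", "A2 sec. 2015", "B sec. 2015"]
alignment = {
    ("A sec. 1968", "A1 sec. 2015"): ">",
    ("A sec. 1968", "A2 sec. 2015"): ">",
    ("A sec. 1968", "B sec. 2015"): "!",
    ("B sec. 1968", "A1 sec. 2015"): "!",
    ("B sec. 1968", "A2 sec. 2015"): "!",
    ("B sec. 1968", "B sec. 2015"): "==",
}
identifications = {"rec-1": "A sec. 1968", "rec-2": "B sec. 1968"}

translated = integrate(identifications, alignment, targets)
```

In this toy case "rec-2" translates cleanly, while "rec-1" is flagged as ambiguous between the two narrower concepts, which is the kind of record a query for ambiguous identifications would surface.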
Map panels: The 'consensus'; The 'bible'; The (formerly) federal 'standard'; The 'best', latest regional flora
"Controlling the taxonomic variable"
Impact: "Please select your preference (A – D); we can perform all translations"
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
# 8: "Do you trust us now?" Aggregation as a translational service
• We can now respond to queries such as:
  • "Show all specimens identified to the taxonomic name Cleistes divaricata"
    • Returns many records; resolves incongruent lineage of name usages
  • "Now show specimens with the TCL Cleistesiopsis divaricata sec. Weakley 2015"
    • Returns record subset resolving only one narrowly circumscribed concept
  • "Now show specimens identified to the TCL Cleistes divaricata sec. RAB 1968, yet translated into the more granular TCLs sec. Weakley 2015"
    • Returns (again) many records, yet represents and contrasts two treatments, as opposed to providing the ambiguous lineage view (above)
  • "Show all specimens with ambiguous 2010/2015 TCL identifications…" (etc.)
"As the ongoing efforts to integrate prokaryote phylogeny into universal phylogeny demonstrate, integration does not always mean greater inclusiveness of data, methods or explanation […]. Integration may involve considerable exclusiveness to achieve the desired integrative aim."
– Maureen O'Malley 2013: 559
Source: O'Malley. 2013. When integration fails. Stud. Hist. Philos. Biol. Biomed. Sci. doi:10.1016/j.shpsc.2012.10.003
Rethinking systematic synthesis as an agreed-upon conflict alignment
Acknowledgements & links to products and references
• CIRSS hosts: Bertram L. & Janet Eke!
• Euler/X & ETC teams (extended): Shawn Bowers, Mingmin Chen, Hong Cui, Parisa Kianmajd, James Macklin, Timothy McPhillips, Robert Morris, Thomas Rodenhausen, and Shizhuo Yu.
• ProvenanceMatrix: Tuan Nhon Dang.
• NSF DEB–1155984, DBI–1342595 (PI Franz).
• NSF IIS–118088, DBI–1147273 (PI Ludäscher).
• Information @ http://taxonbytes.org/tag/concept-taxonomy/
• Euler/X code @ https://github.com/EulerProject/EulerX
Interested in exploring multi-taxonomy & -phylogeny alignments? Please contact us.
[email protected] @taxonbytes
https://biokic.asu.edu/