BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains
Donat Agosti
Plazi
SSS Day 2015, November 6, NHMB, Bern
Content:
• What is the issue?
• Where do we stand?
• What is planned?
• What can you do?
Improve the Role of Published Biodiversity Literature in Policy decisions
Published observation records (Scientific literature)
EU BON Policy Brief
The Scientific Challenge: Annotate genes
1 tnntttccca cgaataaata atataagatt ttgattatta cctccttctt taattttatt
61 attatcaaga agattagttt ataaaggagt aggaacagga tgaactgttt atcctccttt
121 atctaataat ttatatcata atggattttc aactgattta gcaatttttt ctttacatat
181 tgcaggaata tcatcaatta taggagcaat taattttatt tcaacaattt taaatataca
241 tcataaaaat ttatcattag ataaaattcc attgttagtt tgatcaattt taattacagc
301 tattttatta ttattatctt tacctgtatt agcaggtgca attactatat tattaactga
361 tcgaaatcta aatacaactt tttttgatcc ttcgggtgga ggagatccaa ttttatatca
421 acatttattt
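Before a sequence record like the one above can be annotated or linked, the position numbers and spacing of the flat-file layout have to be stripped. A minimal sketch, assuming the GenBank-style layout shown on the slide; the function name `clean_sequence` is made up for this illustration.

```python
import re


def clean_sequence(flatfile_text: str) -> str:
    """Drop position numbers and whitespace, leaving only the bases."""
    return re.sub(r"[\d\s]+", "", flatfile_text).lower()


# First line of the record shown on the slide.
first_line = "1 tnntttccca cgaataaata atataagatt ttgattatta cctccttctt taattttatt"
sequence = clean_sequence(first_line)
print(len(sequence), sequence[:10])
```

Because the pattern removes every digit and whitespace run, it also copes with input where a position number is fused to the end of the preceding block.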
Text
<tax:treatment>
<tax:nomenclature>
<tax:name>
<tax:xid source="HNS" identifier="193329"/>
<tax:xmldata>
<dc:Genus>Mystrium</dc:Genus>
<dc:Species>leonie</dc:Species>
</tax:xmldata>
Mystrium leonie
</tax:name>
<tax:status>n. sp.</tax:status>
Fig 1 D - F
</tax:nomenclature>
<tax:div type="description">
<tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL
1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin
to a sharp apical tooth, the apex parallel to the anterior
(Holotype with material in mandibles, so mandibles and
$ described below from paratypes.) Median clypeus
....
</tax:treatment>
XML: semantically enhanced text (e.g. TaxonX)
From human-readable to machine-readable text
RDF
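As a rough illustration of what machine-readable means here, the name data in the treatment markup above can be read with Python's standard library once the fragment is well formed. This is only a sketch: the namespace URIs, the trimmed sample, and the function name `read_name` are assumptions made for the example and are not taken from the TaxonX schema.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs chosen for this sketch, not the official ones.
NS = {"tax": "http://example.org/taxonx", "dc": "http://example.org/dc"}

# Trimmed, well-formed version of the fragment shown on the slide.
SAMPLE = """
<tax:treatment xmlns:tax="http://example.org/taxonx"
               xmlns:dc="http://example.org/dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
</tax:treatment>
"""


def read_name(xml_text):
    """Return genus, species, status and external id of a treatment."""
    root = ET.fromstring(xml_text)
    name = root.find("tax:nomenclature/tax:name", NS)
    xid = name.find("tax:xid", NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
        "xid": (xid.get("source"), xid.get("identifier")),
    }


print(read_name(SAMPLE))
```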
Countries (Region): Australia (Queensland)
Export species material citations (Darwin Core, DwC)
Visualization of taxonomic literature
Text mining tools: Visualization of treatment content
Summary of the content of 37 Zootaxa spider publications and 8 Biodiversity Data Journal publications (Miller et al., 2015).
Pseudomyrmex ants and Vachellia ant-acacias are a classic example of mutualism in biology.
allenii
melanoceras
ruddiae
chiapensis
collinsii
cookii
cornigera
globulifera
hindsii
janzenii
mayana
sphaerocephala
boopis
flavicornis
hesperius
ita
janzeni
kuenckeli
mixtecus
nigrocinctus
nigropilosus
opaciceps
particeps
peperi
reconditus
satanicus
simulans
spinicola
subtilissimus
veneficus
ferrugineus
gentlei
gracilis
Transbiotic link network: associated species linked through references in taxonomic treatments
Acacia-ant species: Pseudomyrmex gracilis
Treatment: redescription
Associated ant-acacia: Acacia gentlei
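The link network described above can be pictured as a small undirected graph in which each treatment contributes one edge between the treated taxon and an associated taxon it mentions. A minimal sketch under that assumption; the only pair used is the one shown on the slide, and `build_network` is a name invented for the example.

```python
from collections import defaultdict

# (taxon treated, associated taxon named in that treatment).
# The single pair below is the example given on the slide.
LINKS = [("Pseudomyrmex gracilis", "Acacia gentlei")]


def build_network(links):
    """Build an undirected adjacency map from treatment-derived pairs."""
    network = defaultdict(set)
    for treated, associated in links:
        network[treated].add(associated)
        network[associated].add(treated)
    return network


network = build_network(LINKS)
print(sorted(network["Acacia gentlei"]))
```

Adding more pairs extracted from further treatments would grow the same structure without changing the code.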
Ants Plants
Photo credits: Alex Wild
Treatment
Treatments linked through citations
The Scientific Challenge: link a name to its treatment
What is the issue? Build the necessary system
A system that allows text and data mining of the corpus of taxonomic literature.
A system that links taxonomic names to their treatments.
Machine-ready trait information
[Diagram: system components]
• Resolution / reconciliation
• TreatmentBank
• Names management
• Citation management
• RefBank
• Treatment management
• Atomization and semanticization of content: markup / initial trait extraction
• Specialist taxonomic databases
Build the necessary system
Where do we stand?
Number of citations: better than most scientific papers
The origin
Citations of the ALL-Book (Google Scholar); citation counts in parentheses
Book (458)
Chapter 1: Alonso, L.E., and D. Agosti. Biodiversity Studies, Monitoring, and Ants: An Overview (162)
Chapter 2: Kaspari, M. A Primer in Ant Ecology (85)
Chapter 3: Andersen, A.N. A Global Ecology of Rainforest Ants: Functional Groups in Relation to Environmental Stress and Disturbance (172)
Chapter 4: Schultz, T.R., and T.P. McGlynn. The Interaction of Ants with Other Organisms (94)
Chapter 5: Brown, W.L.Jr. Diversity of Ants (220)
Chapter 6: Alonso, L.E. Ants as Indicators of Diversity (164)
Chapter 7: Kaspari, M., J.D. Majer. Using Ants to Monitor Environmental Change (116)
Chapter 8: Ward, P.S. Broad-scale Patterns of Diversity in Leaf litter Ant Communities (112)
Chapter 9: Bestelmeyer, B.T., D. Agosti, L.E. Alonso, C.R.F. Brandão, W.L. Brown Jr., J.H.C. Delabie, and R. Silvestre. Field Techniques for the Study of Ground-Dwelling Ants: An Overview, Description and Evaluation (388)
Chapter 10: Delabie, J.H.C., B.L. Fisher, J.D. Majer, and I.W. Wright. Sampling Effort and Choice of Methods (117)
Chapter 11: Lattke, J.E. Specimen Processing: Building and Curating an Ant Collection (36)
Chapter 12: Brandão, C.R.F. Major Regional and Type Collections of Ants (Formicidae) of the World and Sources for the Identification of Ant Species (41)
Chapter 13: Longino, J.R. What to Do with the Data (89)
Chapter 14: Agosti, D., and L.E. Alonso. The ALL Protocol: A Standard Protocol for the Collection of Ground-Dwelling Ants (179)
Chapter 15: Fisher, B.L., A.K.F. Malsch, R. Gadagkar, J.H.C. Delabie, H.L. Vasconcelos, and J.D. Majer. Applying the ALL Protocol: Selected Case Studies (21)
Total (2454)
Thanks to
Open Access
Before antbase.org, Harvard’s Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present. Through antbase.org’s digital library, access to this body of literature is worldwide, and it is actively used (>10,000 visits in a single month). (2004)
Open Access to ca. 80% of the papers citing the Antbook, most of it not through the publishers’ websites.
But it takes time to find the articles.
Thanks to
Manual work
Access through Citation
2015
ZooKeys DOI
DOI: Digital Object Identifier
Create a citable open corpus of taxonomic publications
Content (Nov 2015)
Active:
• 4,500 Proctotrupoidea and ant articles
• DOI provider for Revue Suisse de Zoologie, Israel Journal of Entomology, Polish Forestry Institute, Revue de Paléobiologie
Pipeline:
• 5,000 ant articles
• 16,000 drosophilid articles
• All Pensoft journal articles, images, tables
Planned:
• Back issues of RSZ
• Backbone for Plazi and Pensoft cited articles
Participation:
• Please join! Make all Swiss-based treatments accessible?
Biodiversity Literature Repository: Record
Treatment
Illustration
Text
Taxonomic publication: deconstruct but keep parts linked
Journal -> has part -> Article -> has part -> Treatment -> has part -> Material citation
Treatment: a well-defined part of an article that defines the particular usage of a scientific name by an authority at a given time (one or more pages in a publication).
Treatment
The special case of taxonomic literature: the cited elements are treatments, not articles
Formica obsoleta Linnaeus, 1758: 580
Treatment
Original combinations
Reference to an original combination
Subsequent usages of names cite the referenced treatment
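A treatment citation such as "Formica obsoleta Linnaeus, 1758: 580" packs name, authority, year and page into one string, so a parser can split it into fields that point at the cited treatment. A minimal sketch, assuming only this simple binomial pattern; real citations vary far more, and the pattern and function name are illustrative rather than any existing tool's behaviour.

```python
import re

# Matches "Genus species Author, Year: Page" and nothing more elaborate.
CITATION = re.compile(
    r"^(?P<genus>[A-Z][a-z]+) (?P<species>[a-z]+) "
    r"(?P<author>[^,]+), (?P<year>\d{4}): (?P<page>\d+)$"
)


def parse_treatment_citation(text):
    """Split a simple treatment citation into fields, or return None."""
    match = CITATION.match(text.strip())
    return match.groupdict() if match else None


print(parse_treatment_citation("Formica obsoleta Linnaeus, 1758: 580"))
```

Strings that do not follow the pattern return `None`, which keeps unparsed citations visible instead of guessing at their structure.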
What is a treatment?
Treatment and treatment reference and citation
Treatment citation
Treatment references
Treatment Graph for the Malagasy Ants Aphaenogaster
[Figure: an original description and several re-descriptions, connected by links labelled "cites" and "cites / synonymizes"]
Taxonomic publication: deconstruct but keep parts linked
Material citation -> is part of -> Treatment -> is part of -> Article -> is part of -> Journal
Links: is part of, cites
Identifiers and who issues them:
• Journal: ISSN (CIEPS / ISSN)
• Article: DOI (CrossRef / DataCite; Biodiversity Literature Repository / Zenodo, CERN)
• Treatment: httpURI (client: Plazi)
• Material citation: httpURI (client: Plazi)
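The "deconstruct but keep parts linked" idea can be sketched as a chain of records in which every part carries its own identifier and a pointer to the unit it is part of. This is an illustrative data model only: the class name `Part` and all identifier values are placeholders invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Part:
    kind: str
    identifier: str
    part_of: Optional["Part"] = None

    def lineage(self):
        """Follow the is-part-of links up to the top-level unit."""
        node, chain = self, []
        while node is not None:
            chain.append(node.kind)
            node = node.part_of
        return chain


# Placeholder identifiers, not real ISSNs, DOIs or URIs.
journal = Part("Journal", "ISSN:0000-0000")
article = Part("Article", "doi:10.0000/example", journal)
treatment = Part("Treatment", "http://example.org/treatment/1", article)
material = Part("Material citation", "http://example.org/material/1", treatment)

print(" -> ".join(material.lineage()))
```

Keeping the pointer on the child means a material citation can always be traced back to its journal, even when the parts are stored and cited separately.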
Treatment citation live
article
treatment
Dikow & Agosti, 2015.
Taxonomic publication
Treatment Verbatim
Material citations
Specimen ID
Treatment citation
Bibliogr. citation
Taxonomic Name
Usages
is part of
cites
Illustration
Taxonomic publication
Material citation
GenbankID
Collection Accession #
SpecimenID
Digital Object ID
Collecting Event ID
Host ID
Verbatim
is part of
cites
Treatment: implicit and explicit links
Plazi Tools: Data extraction: tables
«Treatment», scientific species name, distribution record, bibliographic records
Cataglyphis tartessica workers
Variable: mean ± SD
Head length: 11.23 ± 0.12
Head width: 11.15 ± 0.12
Scape length: 11.47 ± 0.12
Mesosoma length: 11.94 ± 0.16
Femur length: 12.03 ± 0.14
Cephalic index: 0 93.60 ± 3.940
Scape index: 128.10 ± 7.660
Plazi tools: discovery of scientific names
Plazi tools: discovery and parsing of bibliographic references
Plazi tools: discovery and parsing of observation data
Plazi tools: discovery of treatments
Status quo
• 50,000+ treatments live, growing daily
• RDF in beta version
• GoldenGate Imagine (PDF and text mining tool) in beta version
• Data provider for NCBI, Wikidata, GBIF, EOL, AntWeb
• Biodiversity Literature Repository functional
LODPDF
HNS
HNS
The Scientific Challenge
article
treatment
cites (httpURI)
cites (DOI)
Scientific name
https://www.wikidata.org/wiki/Property:P1992
Feed Wikipedia with taxonomic data
What is planned? What can you do?
A system that allows text and data mining of the corpus of taxonomic literature.
A system that links taxonomic names to their treatments.
Machine-ready trait information
[Diagram: system components]
• Resolution / reconciliation
• TreatmentBank
• Names management
• Citation management
• RefBank
• Treatment management
• Atomization and semanticization of content: markup / initial trait extraction
• Specialist taxonomic databases
existing prototype planned
prototype
BioDiP
Program: SUK-Programm 2013-2016 P-2 «Wissenschaftliche Information: Zugang, Verarbeitung und Speicherung» (Scientific information: access, processing and storage)
Partners: HES-SO, HEG Geneva (Swiss Institute of Bioinformatics), Plazi. Participation is open at various levels, from adding content to BLR to data mining and building applications.
Submission: February 2016
Duration: 2-3 years