BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Donat Agosti Plazi [email protected] SSS Day 2015, November 6, NHMB, Bern BioDiP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Transcript of BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Page 1: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Donat Agosti

Plazi

[email protected] SSS Day 2015, November 6, NHMB, Bern

BioDiP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Page 2: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Content:

What is the issue?

Where do we stand?

What is planned?

What can you do?

Page 3: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Improve the Role of Published Biodiversity Literature in Policy decisions

Published observation records (Scientific literature)

EU BON Policy Brief

Page 4: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

The Scientific Challenge: Annotate genes

  1 tnntttccca cgaataaata atataagatt ttgattatta cctccttctt taattttatt
 61 attatcaaga agattagttt ataaaggagt aggaacagga tgaactgttt atcctccttt
121 atctaataat ttatatcata atggattttc aactgattta gcaatttttt ctttacatat
181 tgcaggaata tcatcaatta taggagcaat taattttatt tcaacaattt taaatataca
241 tcataaaaat ttatcattag ataaaattcc attgttagtt tgatcaattt taattacagc
301 tattttatta ttattatctt tacctgtatt agcaggtgca attactatat tattaactga
361 tcgaaatcta aatacaactt tttttgatcc ttcgggtgga ggagatccaa ttttatatca
421 acatttattt
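A raw sequence like this carries no taxonomic context on its own. As a minimal sketch (the function name `parse_sequence_block` is hypothetical and not part of any tool named in this talk), the numbered layout can be reduced to a plain string before it is linked to a name or a treatment:

```python
import re

# Sequence as shown on the slide: a position number followed by
# blocks of ten bases, sixty bases per full line.
SLIDE_SEQUENCE = """\
  1 tnntttccca cgaataaata atataagatt ttgattatta cctccttctt taattttatt
 61 attatcaaga agattagttt ataaaggagt aggaacagga tgaactgttt atcctccttt
121 atctaataat ttatatcata atggattttc aactgattta gcaatttttt ctttacatat
181 tgcaggaata tcatcaatta taggagcaat taattttatt tcaacaattt taaatataca
241 tcataaaaat ttatcattag ataaaattcc attgttagtt tgatcaattt taattacagc
301 tattttatta ttattatctt tacctgtatt agcaggtgca attactatat tattaactga
361 tcgaaatcta aatacaactt tttttgatcc ttcgggtgga ggagatccaa ttttatatca
421 acatttattt
"""


def parse_sequence_block(text: str) -> str:
    """Strip position numbers and spacing, returning one lowercase string."""
    # Keep letters only: this drops the position numbers and all whitespace.
    return re.sub(r"[^A-Za-z]", "", text).lower()


sequence = parse_sequence_block(SLIDE_SEQUENCE)
ambiguous = sequence.count("n")  # bases reported as unknown
print(len(sequence), ambiguous)
```

The string alone still says nothing about which species it belongs to; that link is what the rest of the talk is about.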

Page 5: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Text

<tax:treatment>

<tax:nomenclature>

<tax:name>

<tax:xid source="HNS" identifier="193329"/>

<tax:xmldata>

<dc:Genus>Mystrium</dc:Genus>

<dc:Species>leonie</dc:Species>

</tax:xmldata>

Mystrium leonie

</tax:name>

<tax:status>n. sp.</tax:status>

Fig 1 D - F

</tax:nomenclature>

<tax:div type="description">

<tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL

1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin

to a sharp apical tooth, the apex parallel to the anterior

(Holotype with material in mandibles, so mandibles and

$ described below from paratypes.) Median clypeus

....

</tax:treatment>

XML: Semantically enhanced text (e.g. TaxonX)

From human to machine readable text

RDF
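To illustrate what "machine readable" buys, here is a minimal sketch that reads the nomenclature part of the fragment above with Python's standard library. The namespace URIs are placeholders assumed for this example (the slide shows only the `tax:` and `dc:` prefixes), the elided description is left out, and `extract_name` is a hypothetical helper, not a Plazi API.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): the slide does not show the real ones.
NS = {"tax": "http://example.org/taxonx", "dc": "http://example.org/dc"}

TAXONX_FRAGMENT = """\
<tax:treatment xmlns:tax="http://example.org/taxonx"
               xmlns:dc="http://example.org/dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
</tax:treatment>
"""


def extract_name(xml_text: str) -> dict:
    """Pull the tagged name parts, external identifier and status from a treatment."""
    root = ET.fromstring(xml_text)
    name = root.find("tax:nomenclature/tax:name", NS)
    xid = name.find("tax:xid", NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "id_source": xid.get("source"),
        "identifier": xid.get("identifier"),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
    }


record = extract_name(TAXONX_FRAGMENT)
print(record)
```

Once the name parts are tagged, the same lookup works across every treatment in a corpus, which plain text does not allow.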

Page 7: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Text mining tools: Visualization of treatment content

Summary of the content of 37 Zootaxa spider publications and 8 Biodiversity Data Journal publications (Miller et al., 2015).

Page 8: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Pseudomyrmex ants and Vachellia ant-acacias are a classic example of mutualism in biology.

allenii

melanoceras

ruddiae

chiapensis

collinsii

cookii

cornigera

globulifera

hindsii

janzenii

mayana

sphaerocephala

boopis

flavicornis

hesperius

ita

janzeni

kuenckeli

mixtecus

nigrocinctus

nigropilosus

opaciceps

particeps

peperi

reconditus

satanicus

simulans

spinicola

subtilissimus

veneficus

ferrugineus

gentlei

gracilis

Transbiotic link network: associated species linked through references in taxonomic treatments

Acacia-ant species: Pseudomyrmex gracilis

Treatment: redescription

Associated ant-acacia: Acacia gentlei

Ants Plants

Photo credits: Alex Wild

Treatment

Treatments linked through citations

Text mining tools: Visualization of treatment content

Page 9: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

The Scientific Challenge: link a name to its treatment

Page 10: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

What is the issue? Build the necessary system

A system that allows text and data mining of the corpus of taxonomic literature.

A system that links taxonomic names to their treatments.

Page 11: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Trait information machine ready

Resolution

Reconciliation

TreatmentBank

NAMES MANAGEMENT

CITATION MANAGEMENT

REFBANK

TREATMENT MANAGEMENT

ATOMIZATION & SEMANTICIZATION OF CONTENT MARKUP / initial trait extraction

Specialist taxonomic databases

Build the necessary system

Page 12: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Where do we stand?

Page 13: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Number of citations: Better than most scientific papers

The origin

Where do we stand?

Page 14: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Citations of the ALL-Book (Google Scholar)

Book 458

Chapter 1: Alonso, L.E., and D. Agosti. Biodiversity Studies, Monitoring, and Ants: An Overview. 162

Chapter 2: Kaspari, M. A Primer in Ant Ecology. 85

Chapter 3: Andersen, A.N. A Global Ecology of Rainforest Ants: Functional Groups in Relation to Environmental Stress and Disturbance. 172

Chapter 4: Schultz, T.R., and T.P. McGlynn. The Interaction of Ants with Other Organisms. 94

Chapter 5: Brown, W.L.Jr. Diversity of Ants. 220

Chapter 6: Alonso, L.E. Ants as Indicators of Diversity. 164

Chapter 7: Kaspari, M., J.D. Majer. Using Ants to Monitor Environmental Change. 116

Chapter 8: Ward, P.S. Broad-scale Patterns of Diversity in Leaf litter Ant Communities. 112

Chapter 9: Bestelmeyer, B.T., D. Agosti, L.E. Alonso, C.R.F. Brandão, W.L. Brown Jr., J.H.C. Delabie, and R. Silvestre. Field Techniques for the Study of Ground-Dwelling Ants: An Overview, Description and Evaluation. 388

Chapter 10: Delabie, J.H.C., B.L. Fisher, J.D. Majer, and I.W. Wright. Sampling Effort and Choice of Methods. 117

Chapter 11: Lattke, J.E. Specimen Processing: Building and Curating an Ant Collection. 36

Chapter 12: Brandão, C.R.F. Major Regional and Type Collections of Ants (Formicidae) of the World and Sources for the Identification of Ant Species. 41

Chapter 13: Longino, J.R. What to Do with the Data. 89

Chapter 14: Agosti, D., and L.E. Alonso. The ALL Protocol: A Standard Protocol for the Collection of Ground-Dwelling Ants. 179

Chapter 15: Fisher, B.L., A.K.F. Malsch, R. Gadagkar, J.H.C. Delabie, H.L. Vasconcelos, and J.D. Majer. Applying the ALL Protocol: Selected Case Studies. 21

Total 2454

Page 15: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Thanks to

Page 16: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Open Access

Before antbase.org, Harvard's Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present.

Through antbase.org's digital library, access to this body of literature is worldwide, and it is actively used (>10,000 visits in one month alone).

2004

Page 17: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Open access to ca. 80% of the papers citing the Antbook (most of it not through the publishers' websites).

But finding the articles takes time.

Thanks to

Manual work

Page 18: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Access through Citation

2015

Zookeys doi

DOI: Digital Object Identifier

Page 19: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Create a citable open corpus of taxonomic publications

Page 20: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Create a citable open corpus of taxonomic publications

Content (Nov 2015)

Active: 4,500 Proctotrupoidea and ant articles; DOI provider for Revue Suisse de Zoologie, Israel Journal of Entomology, Polish Forestry Institute, Revue de Paléobiologie

Pipeline: 5,000 ant articles; 16,000 drosophilid articles; all Pensoft journal articles, images, tables

Planned: back issues of RSZ; backbone for Plazi and Pensoft cited articles

Participation: Please join! Make all Swiss-based treatments accessible?

Page 21: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains
Page 22: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Biodiversity Literature Repository: Record

Treatment

Illustration

Page 23: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Text

<tax:treatment>

<tax:nomenclature>

<tax:name>

<tax:xid source="HNS" identifier="193329"/>

<tax:xmldata>

<dc:Genus>Mystrium</dc:Genus>

<dc:Species>leonie</dc:Species>

</tax:xmldata>

Mystrium leonie

</tax:name>

<tax:status>n. sp.</tax:status>

Fig 1 D - F

</tax:nomenclature>

<tax:div type="description">

<tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL

1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin

to a sharp apical tooth, the apex parallel to the anterior

(Holotype with material in mandibles, so mandibles and

$ described below from paratypes.) Median clypeus

....

</tax:treatment>

XML: Semantically enhanced text (e.g. TaxonX)

From human to machine readable text

RDF

Page 24: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Taxonomic publication: deconstruct but keep parts linked

Journal has part Article has part Treatment has part Material citation

Page 25: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Treatment: a well-defined part of an article that defines the particular usage of a scientific name by an authority at a given time (one or more pages in a publication).

Treatment

The special case of taxonomic literature: the cited elements are treatments, not articles

Formica obsoleta Linnaeus, 1758: 580

Page 26: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Treatment

Page 27: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Original combinations

Reference to an original combination

Subsequent usages of names cite the referenced treatment

What is a treatment?

Page 28: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Treatment and treatment reference and citation

Treatment citation

Treatment references

Page 29: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Treatment Graph for the Malagasy Ants Aphaenogaster

Original description

Re-description

cites

cites / synonymizes

Re-description

Re-de.

Re-description

cites
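A treatment graph of this kind can be sketched as a small directed graph in which each edge records which treatment cites (or synonymizes) which earlier one. The treatment labels below are placeholders, not the actual Aphaenogaster treatments from the figure, and `build_graph` and `cited_history` are hypothetical names.

```python
from collections import defaultdict

# (citing treatment, relation, cited treatment); labels are placeholders.
EDGES = [
    ("redescription-1", "cites", "original-description"),
    ("redescription-2", "cites / synonymizes", "redescription-1"),
    ("redescription-3", "cites", "redescription-2"),
]


def build_graph(edges):
    """Map each citing treatment to the (relation, cited treatment) pairs it holds."""
    graph = defaultdict(list)
    for citing, relation, cited in edges:
        graph[citing].append((relation, cited))
    return graph


def cited_history(graph, start):
    """Return every treatment reachable from `start` by following citation edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _relation, cited in graph.get(node, []):
            if cited not in seen:
                seen.add(cited)
                stack.append(cited)
    return seen


graph = build_graph(EDGES)
history = cited_history(graph, "redescription-3")
print(sorted(history))
```

Following the edges backwards from the most recent re-description recovers the full usage history of a name down to its original description.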

Page 30: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Material citation is part of Treatment is part of Article is part of Journal

Taxonomic publication: deconstruct but keep parts linked

Page 31: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Material citation is part of Treatment is part of Article is part of Journal

cites cites cites

Taxonomic publication: deconstruct but keep parts linked

ISSN DOI httpURI httpURI

Page 32: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Material citation is part of Treatment is part of Article is part of Journal

cites cites cites

Taxonomic publication: deconstruct but keep parts linked

ISSN DOI httpURI httpURI

CIEPS / ISSN CrossRef / DataCite Client Client

Plazi Plazi Biodiversity Literature Repository / Zenodo

CERN
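The "deconstruct but keep parts linked" idea can be sketched as a chain of records, each carrying its own identifier and a pointer to the unit it is part of. Pairing ISSN with the journal, DOI with the article, and httpURI with the treatment and the material citation is this sketch's reading of the diagram; all identifier values are invented placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Part:
    kind: str                              # e.g. "Journal", "Article", "Treatment"
    identifier: str                        # placeholder value, not a real identifier
    id_scheme: str                         # "ISSN", "DOI" or "httpURI"
    is_part_of: Optional["Part"] = None    # the containing unit, if any


def containment_chain(part: Part) -> List[str]:
    """Walk the 'is part of' links from a part up to the journal."""
    chain = []
    current: Optional[Part] = part
    while current is not None:
        chain.append(current.kind)
        current = current.is_part_of
    return chain


journal = Part("Journal", "0000-0000", "ISSN")
article = Part("Article", "10.9999/example.article", "DOI", journal)
treatment = Part("Treatment", "http://example.org/treatment/1", "httpURI", article)
material = Part("Material citation", "http://example.org/material/1", "httpURI", treatment)

print(" -> ".join(containment_chain(material)))
```

Because every level has its own identifier, a treatment or a material citation can be cited directly while its link to the article and journal is kept.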

Page 33: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Treatment citation live

article

treatment

Dikow & Agosti, 2015.

Page 34: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Taxonomic publication

Treatment Verbatim

Material citations

Specimen ID

Treatment citation

Bibliogr. citation

Taxonomic Name

Usages

is part of

cites

Illustration

Page 35: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Taxonomic publication

Material citation

GenbankID

Collection Accession

#

SpecimenID

Digital Object ID

Collecting Event ID

Host ID

Verbatim

is part of

cites

Page 36: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Treatment: implicit and explicit links

Page 37: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Plazi Tools: Data extraction: tables

«Treatment»

Scientific species name

Distribution record

Bibliographic records

Cataglyphis tartessica workers

Variable, mean ± SD

Head length 11.23 ± 0.12

Head width 11.15 ± 0.12

Scape length 11.47 ± 0.12

Mesosoma length 11.94 ± 0.16

Femur length 12.03 ± 0.14

Cephalic index 0 93.60 ± 3.940

Scape index 128.10 ± 7.660
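As a rough sketch of table extraction (not the actual Plazi tooling), rows of the form "variable mean ± SD" can be turned into structured records with a regular expression. The three sample rows are copied as they appear in this transcript, so the numbers may carry extraction artifacts; `parse_rows` and `Measurement` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class Measurement(NamedTuple):
    variable: str
    mean: float
    sd: float


# Variable name, then a number, then "±", then a number.
ROW = re.compile(
    r"^(?P<variable>.+?)\s+(?P<mean>\d+(?:\.\d+)?)\s*±\s*(?P<sd>\d+(?:\.\d+)?)$"
)


def parse_rows(lines: List[str]) -> List[Measurement]:
    """Parse 'variable mean ± SD' rows, skipping lines that do not fit the pattern."""
    rows = []
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            rows.append(
                Measurement(match["variable"], float(match["mean"]), float(match["sd"]))
            )
    return rows


rows = parse_rows([
    "Variable mean ± SD",          # header line: no numbers, so it is skipped
    "Head length 11.23 ± 0.12",
    "Head width 11.15 ± 0.12",
    "Scape length 11.47 ± 0.12",
])
for row in rows:
    print(row)
```

Rows that do not match the pattern are dropped silently here; a real pipeline would more likely flag them for review.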

Page 38: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Plazi tools: discovery of scientific names

Page 39: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Plazi tools: discovery and parsing of bibliographic references

Page 40: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Plazi tools: discovery and parsing of observation data

Page 41: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Plazi tools: discovery of treatments

Page 42: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Status quo

• 50,000+ treatments live, growing daily

• RDF in beta version

• GoldenGate Imagine (PDF and text mining tool) in beta version

• Data provider for NCBI, Wikidata, GBIF, EOL, AntWeb

• Biodiversity Literature Repository functional

Page 43: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

LOD

PDF

HNS

HNS

The Scientific Challenge

Page 44: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

The Scientific Challenge

Page 45: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

article

treatment

cites (httpURI)

cites (DOI)

Scientific name

https://www.wikidata.org/wiki/Property:P1992

Feed Wikipedia with taxonomic data

Page 46: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Status quo

• 50,000+ treatments live, growing daily

• RDF in beta version

• GoldenGate Imagine (PDF and text mining tool) in beta version

• Data provider for NCBI, Wikidata, GBIF, EOL, AntWeb

• Biodiversity Literature Repository functional

Page 47: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

What is planned? What can you do?

Page 48: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

What is planned? What can you do?

A system that allows text and data mining of the corpus of taxonomic literature.

A system that links taxonomic names to their treatments.

Page 49: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Trait information machine ready

Resolution

Reconciliation

TreatmentBank

NAMES MANAGEMENT

CITATION MANAGEMENT

REFBANK

TREATMENT MANAGEMENT

ATOMIZATION & SEMANTICIZATION OF CONTENT MARKUP / initial trait extraction

Specialist taxonomic databases

What is planned? What can you do?

existing prototype

planned prototype

Page 50: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

BioDiP

Program: SUK-Programm 2013-2016 P-2 «Wissenschaftliche Information: Zugang, Verarbeitung und Speicherung» (Scientific Information: Access, Processing and Storage)

Partners: HES-SO, HEG Geneva (Swiss Institute of Bioinformatics), Plazi. Participation is open at various levels, from adding content to BLR to data mining and building applications

Submission: February 2016

Duration: 2-3 years

What is planned? What can you do?

Page 51: BioDIP - a proposed infrastructure to link the taxonomic to the genomic and other domains

Thank you!

Donat Agosti

[email protected]