Biodiversity Knowledge Graphs

Posted on 15-Jul-2015

Biodiversity knowledge graphs

@rdmpage

http://iphylo.blogspot.com

Our questions are “paths” in this network
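
As a minimal sketch of what "questions as paths" can mean, the snippet below stores a toy knowledge graph as an undirected adjacency map and answers a question such as "where was the material behind this publication collected?" by finding a shortest path with breadth-first search. The node labels (publication:P1, species:S1, and so on), the edges, and the function names are all hypothetical placeholders, not records or code from the talk.

```python
from collections import deque

# Hypothetical toy graph: labels and edges are illustrative only.
EDGES = [
    ("publication:P1", "species:S1"),
    ("species:S1", "sequence:G1"),
    ("sequence:G1", "specimen:X1"),
    ("specimen:X1", "locality:L1"),
    ("publication:P2", "species:S1"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbours so the traversal order is deterministic.
        for nxt in sorted(graph[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


GRAPH = build_graph(EDGES)
PATH = shortest_path(GRAPH, "publication:P1", "locality:L1")
print(" -> ".join(PATH))
```

In this toy graph the question "publication to locality" is answered by the chain publication, species, sequence, specimen, locality; a real graph would use persistent identifiers as node labels.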

http://iphylo.org/~rpage/geojson-phylogeny-demo/

Phylogeography

PMID:948206

http://biostor.org/reference/102054

http://data.gbif.org/occurrences/215921922/

BHL and GBIF as biomedical databases

http://iphylo.blogspot.co.uk/2012/03/bhl-and-gbif-as-biomedical-databases.html

MESH term

Metrics (counting links in the knowledge graph)

In an attempt to live up to that increasing demand for documentation, the leadership of the Natural History Museum of Denmark has issued an order to its curatorial staff - The staff members are requested to document which publications from 2011, written entirely by external scientists, that in one way or another are based on material in the collections of the Museum.

http://markmail.org/message/opv2we7fkmro2nen @TAXACOM

Use of a collection
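
One way to read "counting links" as a metric: if the graph holds links from publications to the specimens they cite, then the use of a collection is the number of distinct publications linked to its specimens. The sketch below assumes specimen identifiers of the form COLLECTION:NUMBER; the identifiers, the data, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical (publication, specimen) citation links.
CITATIONS = [
    ("pub:1", "MUSEUM-A:123"),
    ("pub:1", "MUSEUM-A:456"),
    ("pub:2", "MUSEUM-A:123"),
    ("pub:3", "MUSEUM-B:9"),
]


def publications_per_collection(citations):
    """Count distinct publications that cite at least one specimen per collection."""
    pubs = {}
    for pub, specimen in citations:
        # Assumes the collection code is the part before the first colon.
        collection = specimen.split(":", 1)[0]
        pubs.setdefault(collection, set()).add(pub)
    return {collection: len(p) for collection, p in pubs.items()}


def citations_per_specimen(citations):
    """Count how many citation links point at each specimen."""
    return Counter(specimen for _, specimen in citations)


USE = publications_per_collection(CITATIONS)
PER_SPECIMEN = citations_per_specimen(CITATIONS)
print(USE)
```

Counting distinct publications rather than raw links avoids inflating the score of a collection when one paper cites many of its specimens.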

Building the knowledge graph

BioStor http://biostor.org

BioNames http://bionames.org

Material examined

Species, sequences, publications

Biodiversity knowledge graph(s)

Implications

• Sequencing is cheap

• The flood of sequences is only going to increase

• How much of this is relevant to biodiversity?

Numbers of new animal names

[Figure annotations: 1923, WWI, WWII]

Implications

• Taxonomists are working at capacity

• Most taxonomic work is in the past

• Compare this to exponential growth of sequencing

Mammals in GenBank

[Figure: names split into proper Linnaean names and “Aus sp.”; panels: Mammals, “Invertebrates”]

BOLD

Dark taxa

• Disconnect between taxonomy and genomics

• Are “dark taxa” species we already know about or are they new diversity?

• Do we need taxonomic names?

100,000 articles from http://biostor.org (BHL)

[Figure annotations: 1923, today]

Publishers of taxonomy (# articles)

http://bionames.org

Legacy literature

• 1923 and the chilling effect of copyright

• Much of the taxonomic literature is digitally “dark”

• Commercial publishers control access to a lot of literature

Size of Wikipedia articles on mammals

Few, large articles

Many, small articles “long tail”

PanTHERIA (2009)

[Figure annotations: 1923, 2003]

Power law

• We know a lot about a few species

• For most species we know very little (even in well-known groups)

• For poorly known species we need to go to the legacy literature

GBIF.org 500 million records

GBIF

• The Global Biodiversity Information Facility is not evenly “global”

• It tells us as much about sampling as about the distribution of diversity

Flickr EOL group

Crowd sourcing

• Where is the “crowd”?

• It’s where the iPhones are…

BOLD DNA barcodes http://iphylo.org/~rpage/bold-map

GenBank host records “symbiome”

GenBank as a biodiversity database

• GenBank is about more than genes

• GenBank has a wealth of information on location and ecological interactions

Implications

• Phylogenetic data is not being archived (why not?)

• What would make archiving a “no-brainer”?