Biodiversity Knowledge Graphs
roderic-page
Biodiversity knowledge graphs
@rdmpage
http://iphylo.blogspot.com
Our questions are “paths” in this network
http://iphylo.org/~rpage/geojson-phylogeny-demo/
Phylogeography
PMID:948206
http://biostor.org/reference/102054
http://data.gbif.org/occurrences/215921922/
BHL and GBIF as biomedical databases
http://iphylo.blogspot.co.uk/2012/03/bhl-and-gbif-as-biomedical-databases.html
MeSH term
Metrics (counting links in the knowledge graph)
In an attempt to live up to that increasing demand for documentation, the leadership of the Natural History Museum of Denmark has issued an order to its curatorial staff - The staff members are requested to document which publications from 2011, written entirely by external scientists, that in one way or another are based on material in the collections of the Museum.
http://markmail.org/message/opv2we7fkmro2nen@TAXACOM
Use of a collection
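If specimens, collections and publications are linked in a graph, the museum's request above reduces to counting links. The sketch below assumes a hypothetical set of (subject, predicate, object) triples with made-up predicates `heldBy` and `cites`. It finds every publication that cites at least one specimen held by a given collection.

```python
# Hypothetical triples (subject, predicate, object); all identifiers are made up.
TRIPLES = [
    ("specimen:1", "heldBy", "collection:A"),
    ("specimen:2", "heldBy", "collection:A"),
    ("specimen:3", "heldBy", "collection:B"),
    ("paper:X", "cites", "specimen:1"),
    ("paper:Y", "cites", "specimen:2"),
    ("paper:Y", "cites", "specimen:3"),
    ("paper:Z", "cites", "specimen:3"),
]


def publications_using(collection: str, triples: list[tuple[str, str, str]]) -> list[str]:
    """Return publications that cite any specimen held by `collection`."""
    specimens = {s for s, p, o in triples if p == "heldBy" and o == collection}
    return sorted({s for s, p, o in triples if p == "cites" and o in specimens})


used = publications_using("collection:A", TRIPLES)
print(used, len(used))  # ['paper:X', 'paper:Y'] 2
```

The metric is simply the size of that set; the hard part in practice is building the `cites` links in the first place.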
Building the knowledge graph
BioStor http://biostor.org
BioNames http://bionames.org
Material examined
Species, sequences, publications
Biodiversity knowledge graph(s)
Implications
• Sequencing is cheap
• The flood of sequences is only going to increase
• How much of this is relevant to biodiversity?
Numbers of new animal names (chart; marked: 1923, WWI, WWII)
Implications
• Taxonomists are working at capacity
• Most taxonomic work is in the past
• Compare this to the exponential growth of sequencing
Mammals in GenBank (chart: proper Linnaean names vs. “Aus sp.”)
Mammals and “Invertebrates” in BOLD (chart: proper Linnaean names vs. “Aus sp.”)
Dark taxa
• Disconnect between taxonomy and genomics
• Are “dark taxa” species we already know about or are they new diversity?
• Do we need taxonomic names?
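One way to measure the size of the "dark taxa" problem is to classify the names attached to sequences as proper Linnaean binomials or as placeholders such as "Aus sp.". The sketch below uses two regular expressions; the patterns are assumptions (binomials only, with "sp.", "cf." and "aff." treated as placeholders), and real GenBank and BOLD names are messier than this.

```python
import re
from collections import Counter

# Placeholder names such as "Aus sp.", "Aus sp. BOLD:AAA1234" or "Aus cf. bus" (assumed pattern).
PLACEHOLDER = re.compile(r"^[A-Z][a-z]+ (?:sp|cf|aff)\.?(?:\s|$)")
# A "proper" Linnaean binomial: capitalised genus, space, lower-case epithet.
BINOMIAL = re.compile(r"[A-Z][a-z]+ [a-z]+")


def classify(name: str) -> str:
    """Label a name as 'placeholder', 'linnaean' or 'other'."""
    if PLACEHOLDER.match(name):
        return "placeholder"  # checked first so "Aus sp" is not taken as a binomial
    if BINOMIAL.fullmatch(name):
        return "linnaean"
    return "other"


names = ["Homo sapiens", "Aus sp.", "Aus sp. BOLD:AAA1234", "Canis lupus", "uncultured mammal"]
counts = Counter(classify(n) for n in names)
print(dict(counts))  # {'linnaean': 2, 'placeholder': 2, 'other': 1}
```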
100,000 articles from http://biostor.org (BHL), by year (chart; marked: 1923, today)
Publishers of taxonomy (# articles), from http://bionames.org
Legacy literature
• 1923 and the chilling effect of copyright
• Much of the taxonomic literature is digitally “dark”
• Commercial publishers control access to a lot of literature
Size of Wikipedia articles on mammals (chart: few, large articles; many, small articles forming a “long tail”)
PanTHERIA (2009) (chart; marked: 1923, 2003)
Power law
• We know a lot about a few species
• For most species we know very little (even in well-known groups)
• For poorly known species, we need to go to the legacy literature
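A rough check for a power law is to sort the values (for example, article sizes per species), plot log(size) against log(rank), and fit a straight line. The sketch below does a least-squares fit and also reports what share of the total the top 10% of species hold. The input is synthetic (size = 10000 / rank), not real Wikipedia data, and a least-squares fit on log-log axes is a crude diagnostic; rigorous power-law tests use maximum-likelihood methods.

```python
import math


def loglog_slope(values: list[float]) -> float:
    """Least-squares slope of log(value) vs log(rank); values must be positive."""
    ys = sorted(values, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ys) + 1)]
    ls = [math.log(y) for y in ys]
    mx, my = sum(xs) / len(xs), sum(ls) / len(ls)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ls))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def top_share(values: list[float], fraction: float) -> float:
    """Share of the total held by the top `fraction` of items."""
    ys = sorted(values, reverse=True)
    k = max(1, int(len(ys) * fraction))
    return sum(ys[:k]) / sum(ys)


# Synthetic rank-size data: 500 "species", size inversely proportional to rank.
sizes = [10000 / r for r in range(1, 501)]
print(round(loglog_slope(sizes), 3))  # -1.0
print(round(top_share(sizes, 0.10), 2))
```

With this synthetic input the top 10% of species account for roughly two thirds of the total, which is the "few, large articles; many, small articles" pattern.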
GBIF.org 500 million records
GBIF
• The Global Biodiversity Information Facility is not evenly “global”
• Tells us as much about sampling as distribution of diversity
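A simple way to see sampling bias in occurrence data is to bin records into grid cells and count them. The sketch below assumes records are plain (latitude, longitude) pairs in decimal degrees and uses 1° cells; the points are made up for illustration.

```python
import math
from collections import Counter


def grid_counts(points: list[tuple[float, float]], cell_deg: float = 1.0) -> Counter:
    """Count (lat, lon) records per grid cell of size cell_deg degrees."""
    counts: Counter = Counter()
    for lat, lon in points:
        # floor() puts each point in the cell whose south-west corner is below/left of it
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Hypothetical records: three close together, one far away.
records = [(55.7, 12.6), (55.2, 12.1), (55.9, 12.9), (-3.4, -62.2)]
cells = grid_counts(records)
print(cells.most_common())  # [((55, 12), 3), ((-4, -63), 1)]
```

Heavily sampled cells tend to mark where collectors and observers are, not necessarily where diversity is highest.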
Flickr EOL group
Crowd sourcing
• Where is the “crowd”?
• It’s where the iPhones are…
BOLD DNA barcodes http://iphylo.org/~rpage/bold-map
GenBank host records “symbiome”
GenBank as a biodiversity database
• GenBank is about more than genes
• GenBank has a wealth of information on location and ecological interactions
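Host and location data sit in the qualifiers of a GenBank record's `source` feature, such as `/host` and `/lat_lon`. The sketch below pulls qualifiers out of a made-up source feature with a regular expression and converts `/lat_lon` ("1.29 N 103.85 E" style) to signed decimal degrees. It assumes each qualifier fits on one line; real records can wrap long values, so a proper parser such as Biopython's GenBank reader is safer for bulk work.

```python
import re

# Made-up source feature in GenBank flat-file layout (illustrative only).
SOURCE_FEATURE = '''
     source          1..658
                     /organism="Aus bus"
                     /mol_type="genomic DNA"
                     /host="Rattus rattus"
                     /lat_lon="1.29 N 103.85 E"
'''

QUALIFIER = re.compile(r'/(\w+)="([^"]*)"')


def qualifiers(feature_text: str) -> dict[str, str]:
    """Map qualifier name -> value (single-line values only)."""
    return dict(QUALIFIER.findall(feature_text))


def parse_lat_lon(value: str) -> tuple[float, float]:
    """Convert e.g. '1.29 N 103.85 E' to (1.29, 103.85); S and W become negative."""
    lat, ns, lon, ew = value.split()
    return (float(lat) * (-1 if ns == "S" else 1),
            float(lon) * (-1 if ew == "W" else 1))


q = qualifiers(SOURCE_FEATURE)
print(q["host"], parse_lat_lon(q["lat_lon"]))  # Rattus rattus (1.29, 103.85)
```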
Implications
• Phylogenetic data is not being archived (why not?)
• What would make archiving a “no-brainer”?