Building the Biodiversity Knowledge Graph

Slides from the 4th Global Online Biodiversity Informatics Seminar

  • Building the Biodiversity Knowledge Graph @rdmpage
  • There are known knowns, things we know that we know. There are known unknowns, things we now know we don't know. But there are also unknown unknowns, things we do not know we don't know.
  • known unknown
  • Things we don't know that we know
  • Melissotarsus insularis
  • Melissotarsus insularis (no hit) = Melissotarsus sp. BLF m1, linked via specimen CASENT0107663-D01 and sequence DQ176312
  • We have a vast amount of old stuff
  • Numbers of new animal names (chart; annotations: 1923, WWI, WWII)
  • We are learning new stuff
  • New and old are disconnected
  • Dark taxa
  • Mammals in GenBank: proper Linnaean names vs. "Aus sp."
  • Mammals: proper Linnaean names vs. "Aus sp."
  • Invertebrates BOLD
  • Challenge: linking things together (sticky data)
  • Data is good
  • More data is better
  • but this data is not sticky
  • Location
  • name name Tags
  • Namenname
  • Identifiers
  • Shared identifiers are sticky
  • Identifiers: globally unique; resolvable (for humans and machines); use other people's identifiers to link things together
  • Human and machine readable (figure labels: machine, human)
  • { "author": [ { "family": "Page", "given": "Roderic D.M." } ], "container-title": "PeerJ", "reference-count": 60, "page": "e190", "deposited": { "date-parts": [ [ 2013, 11, 18 ] ], "timestamp": 1384732800000 }, "title": "BioNames: linking taxonomy, texts, and trees", "type": "journal-article", "DOI": "10.7717/peerj.190", "ISSN": [ "2167-8359" ], "URL": " }
  • Using other people's identifiers is hard work and scary. Hard work: you have to find their identifiers. Scary: what happens if the other person breaks their identifiers? Solution: make it easy to find them, and make them robust (e.g., CrossRef and DOIs)
  • DOI (Digital Object Identifier)
  • Biodiversity Knowledge Graph (linking things together)
  • Our questions are paths in this network
  • Phylogeography
  • Taxonomy
  • GenBank records from Spain
  • MeSH term
  • PMID:948206
  • BHL and GBIF as biomedical databases
  • Metrics (counting links in the knowledge graph)
  • In an attempt to live up to that increasing demand for documentation, the leadership of the Natural History Museum of Denmark has issued an order to its curatorial staff: the staff members are requested to document which publications from 2011, written entirely by external scientists, are in one way or another based on material in the collections of the Museum.
  • Cited, linkable specimens: NMNH Vertebrate Zoology Herpetology Collections (11194); CAS Herpetology Collection Catalog (9619); MCZ Herpetology Collection (6720); Herpetology Collection, University of Kansas Biodiversity Research Center (5818)
  • Annotation (everyone can make the knowledge graph)
  • How many people view annotation (diagram labels: Data, "Fix me!")
  • Annotation as fixing errors
  • Annotation as building the knowledge graph (diagram; nodes: paper, specimen, sequence, taxonomic name; links: cites, publishes, has voucher)
  • OK, but if the biodiversity knowledge graph is so cool, why haven't we made it already?
  • Open question: Who will build the biodiversity knowledge graph?
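
Two ideas from the slides, "shared identifiers are sticky" and "our questions are paths in this network", can be illustrated with a minimal Python sketch. It is not taken from the talk. The identifier values (CASENT0107663-D01, DQ176312, and the two Melissotarsus names) come from the slides, but the record names, the "type:value" prefixes, and which record mentions which identifier are assumptions made for illustration. Records and identifiers become nodes, and a breadth-first search looks for a path between the two names.

```python
from collections import defaultdict, deque

# Hypothetical records: each data source lists the identifiers it mentions.
# The record names and the "type:value" layout are assumptions, not a real schema.
RECORDS = {
    "specimen_record": [
        "name:Melissotarsus insularis",
        "specimen:CASENT0107663-D01",
    ],
    "sequence_record": [
        "sequence:DQ176312",
        "specimen:CASENT0107663-D01",
        "name:Melissotarsus sp. BLF m1",
    ],
}


def build_graph(records):
    """Link every record to each identifier it mentions (undirected)."""
    graph = defaultdict(set)
    for record, identifiers in records.items():
        for identifier in identifiers:
            graph[record].add(identifier)
            graph[identifier].add(record)
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns a shortest path as a list, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


if __name__ == "__main__":
    graph = build_graph(RECORDS)
    path = find_path(
        graph,
        "name:Melissotarsus insularis",
        "name:Melissotarsus sp. BLF m1",
    )
    print(" -> ".join(path) if path else "no path")
```

In this sketch the two names are connected only because both records cite the same specimen identifier. If the sequence record drops that identifier, the search returns None, which is the "not sticky" case.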