Biodiversity Knowledge Graphs

@rdmpage, http://iphylo.blogspot.com

Our questions are “paths” in this network
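To make "paths" concrete, here is a minimal sketch that assumes the graph is stored as an undirected adjacency list. The node identifiers (sequence:S1, specimen:X1 and so on) are hypothetical placeholders, not real records. Breadth-first search returns a shortest chain of links connecting two entities; neighbours are visited in sorted order only to make the result deterministic.

```python
from collections import deque

# Hypothetical toy knowledge graph: each edge links two entities
# (a sequence, the specimen it came from, a paper citing that specimen, ...).
EDGES = [
    ("sequence:S1", "specimen:X1"),
    ("specimen:X1", "publication:P1"),
    ("publication:P1", "taxon:T1"),
    ("specimen:X1", "locality:L1"),
]


def build_graph(edges):
    """Build an undirected adjacency list from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; return the nodes on a shortest path, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


graph = build_graph(EDGES)
print(shortest_path(graph, "sequence:S1", "taxon:T1"))
# ['sequence:S1', 'specimen:X1', 'publication:P1', 'taxon:T1']
```

A question such as "which taxon is this sequence from, and who described it?" then becomes a path query between two nodes.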

http://iphylo.org/~rpage/geojson-phylogeny-demo/

Phylogeography
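The geojson-phylogeny-demo link above puts phylogeny tips on a map. As a minimal sketch (not the demo's actual code), sampled localities can be written as a GeoJSON FeatureCollection. Note that GeoJSON (RFC 7946) orders coordinates as [longitude, latitude]. The tip labels and coordinates below are hypothetical.

```python
import json

# Hypothetical sampled localities: (tip label, latitude, longitude).
SAMPLES = [
    ("tip_A", -33.87, 151.21),
    ("tip_B", 51.50, -0.13),
]


def to_geojson(samples):
    """Return a GeoJSON FeatureCollection with one Point feature per sample."""
    features = []
    for label, lat, lon in samples:
        features.append({
            "type": "Feature",
            # GeoJSON puts longitude first, then latitude.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": label},
        })
    return {"type": "FeatureCollection", "features": features}


collection = to_geojson(SAMPLES)
print(json.dumps(collection, indent=2))
```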

PMID:948206

http://biostor.org/reference/102054

http://data.gbif.org/occurrences/215921922/

BHL and GBIF as biomedical databases

http://iphylo.blogspot.co.uk/2012/03/bhl-and-gbif-as-biomedical-databases.html

MeSH term

Metrics (counting links in the knowledge graph)

In an attempt to live up to that increasing demand for documentation, the leadership of the Natural History Museum of Denmark has issued an order to its curatorial staff - The staff members are requested to document which publications from 2011, written entirely by external scientists, that in one way or another are based on material in the collections of the Museum.

http://markmail.org/message/opv2we7fkmro2nen (TAXACOM)

Use of a collection
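Counting links turns the Danish museum's request quoted above into a query. A minimal sketch, assuming the graph stores hypothetical (subject, predicate, object) triples in which specimens are held by a collection and publications cite specimens. The identifiers and the predicate names heldBy and cites are invented for illustration. Filtering by year or by external authorship would need further triples.

```python
# Hypothetical triples (subject, predicate, object); names are illustrative only.
TRIPLES = [
    ("specimen:1", "heldBy", "collection:C1"),
    ("specimen:2", "heldBy", "collection:C1"),
    ("specimen:3", "heldBy", "collection:C2"),
    ("paper:A", "cites", "specimen:1"),
    ("paper:A", "cites", "specimen:2"),
    ("paper:B", "cites", "specimen:2"),
    ("paper:C", "cites", "specimen:3"),
]


def publications_using(triples, collection):
    """Return the publications that cite at least one specimen held by `collection`."""
    held = {s for s, p, o in triples if p == "heldBy" and o == collection}
    return {s for s, p, o in triples if p == "cites" and o in held}


papers = publications_using(TRIPLES, "collection:C1")
print(sorted(papers), len(papers))  # ['paper:A', 'paper:B'] 2
```

Returning a set means a paper citing several specimens from the same collection is counted once, which matches the question "which publications use this collection?".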

Building the knowledge graph

BioStor http://biostor.org

BioNames http://bionames.org

Material examined
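"Material examined" sections of taxonomic papers list the specimens a description is based on, so they are one source of publication-to-specimen links. A minimal sketch, assuming specimens are cited as an uppercase institution acronym followed by a catalogue number (for example "MCZ 12345"). Real citations are far messier, this pattern will also pick up false positives, and the example text is invented.

```python
import re

# Assumed citation pattern: 2-6 uppercase letters, optional space or colon, then digits.
SPECIMEN_CODE = re.compile(r"\b([A-Z]{2,6})[ :]?(\d+)\b")

# Hypothetical "Material examined" paragraph.
TEXT = ("Material examined. Holotype: MCZ 12345, adult male. "
        "Paratypes: USNM 67890, FMNH 4321.")


def specimen_codes(text):
    """Return (institution acronym, catalogue number) pairs found in `text`."""
    return SPECIMEN_CODE.findall(text)


print(specimen_codes(TEXT))
# [('MCZ', '12345'), ('USNM', '67890'), ('FMNH', '4321')]
```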

Species, sequences, publications

Biodiversity knowledge graph(s)

Implications

• Sequencing is cheap

• The flood of sequences is only going to increase

• How much of this is relevant to biodiversity?

[Figure: Numbers of new animal names over time, with 1923, WWI and WWII marked]

Implications

• Taxonomists are working at capacity

• Most taxonomic work is in the past

• Compare this to the exponential growth of sequencing

[Figure: Mammals in GenBank, sequences with proper Linnaean names vs. "Aus sp." names]

[Figure: Mammals, proper Linnaean names vs. "Aus sp." names]

“Invertebrates”

BOLD

Dark taxa

• Disconnect between taxonomy and genomics

• Are “dark taxa” species we already know about or are they new diversity?

• Do we need taxonomic names?
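The "Proper Linnaean names" versus "Aus sp." contrast can be approximated with a simple pattern. A rough heuristic sketch, assuming a name counts as proper when it is a capitalised genus followed by a lowercase epithet (optionally a lowercase subspecies), and that placeholders such as "sp." or "cf." do not count as epithets. It ignores hybrids, subgenera and other edge cases, and the example names are invented.

```python
import re

# Heuristic: capitalised genus, lowercase epithet, optional lowercase subspecies.
BINOMIAL = re.compile(r"^([A-Z][a-z]+) ([a-z][a-z-]+)( [a-z][a-z-]+)?$")
# Informal placeholders that should not count as species epithets.
PLACEHOLDERS = {"sp", "spp", "cf", "aff", "nr"}


def is_proper_name(name):
    """Heuristically decide whether `name` looks like a Linnaean binomial or trinomial."""
    match = BINOMIAL.match(name.strip())
    return match is not None and match.group(2) not in PLACEHOLDERS


NAMES = ["Aus bus", "Aus sp.", "Aus sp. BOLD:AAA0001", "Aus cf. bus", "Aus bus cus"]
dark = [n for n in NAMES if not is_proper_name(n)]
print(f"{len(dark)} of {len(NAMES)} names are not proper Linnaean names")
```

Run over the organism names of all sequences for a group, the fraction flagged gives a crude measure of how many "dark taxa" it contains.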

[Figure: 100,000 articles from http://biostor.org (BHL), with 1923 and today marked]

Publishers of taxonomy (# articles)

http://bionames.org

Legacy literature

• 1923 and the chilling effect of copyright (in the US, works published before 1923 are in the public domain)

• Much of the taxonomic literature is digitally “dark”

• Commercial publishers control access to a lot of literature

[Figure: Size of Wikipedia articles on mammals: few, large articles; many, small articles (the "long tail")]

[Figure: PanTHERIA (2009), with 1923 and 2003 marked]

Power law

• We know a lot about a few species

• For most species we know very little (even in well-known groups)

• For poorly known species, we need to go to the legacy literature
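To see why a power law concentrates knowledge in a few species, assume (purely for illustration) that the amount known about the species ranked k is proportional to k^(-alpha). A minimal sketch that computes the share of the total held by the top-ranked species. The species count and exponent are hypothetical, not values fitted to the Wikipedia or PanTHERIA data.

```python
def top_share(n_species, top_fraction, alpha=1.0):
    """Share of the total held by the top `top_fraction` of ranks when size(k) = k**-alpha."""
    sizes = [k ** -alpha for k in range(1, n_species + 1)]
    top = max(1, int(n_species * top_fraction))
    return sum(sizes[:top]) / sum(sizes)


# Hypothetical: 5,000 species and a Zipf exponent of 1.
share = top_share(5000, 0.01)
print(f"Top 1% of species hold {share:.0%} of the total")
```

Under these assumptions the top 1% of species hold roughly half of everything known, leaving the remaining 99% to share the rest.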

GBIF.org: 500 million records

GBIF

• The Global Biodiversity Information Facility is not evenly “global”

• Tells us as much about sampling effort as about the distribution of diversity
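One quick way to see that occurrence records reflect sampling effort is to bin them into grid cells and compare counts. A minimal sketch with hypothetical coordinates and an assumed 1-degree grid; in practice the records would come from a GBIF download. A densely populated cell may reflect where collectors have been rather than where diversity is highest.

```python
import math
from collections import Counter

# Hypothetical occurrence records as (latitude, longitude) pairs.
RECORDS = [(51.5, -0.1), (51.7, -0.4), (51.2, -0.9), (-3.4, -62.2)]


def grid_counts(records, cell_size=1.0):
    """Count records per grid cell, keyed by the cell's south-west corner."""
    counts = Counter()
    for lat, lon in records:
        cell = (math.floor(lat / cell_size) * cell_size,
                math.floor(lon / cell_size) * cell_size)
        counts[cell] += 1
    return counts


for cell, n in grid_counts(RECORDS).most_common():
    print(cell, n)
```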

Flickr EOL group

Crowd sourcing

• Where is the “crowd”?

• It’s where the iPhones are…

BOLD DNA barcodes http://iphylo.org/~rpage/bold-map

GenBank host records: the "symbiome"

GenBank as a biodiversity database

• GenBank is about more than genes

• GenBank has a wealth of information on location and ecological interactions
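GenBank source features can carry /lat_lon and /host qualifiers, which is where the location and interaction data come from. A minimal sketch that pulls qualifiers out of a hypothetical flat-file snippet and converts a lat_lon value written as "d.dd N|S d.dd E|W" into signed decimal degrees. The record is invented, and this simple regex does not handle every flat-file quirk; a real pipeline would more likely use an established GenBank parser.

```python
import re

# Hypothetical GenBank source feature in flat-file layout (values invented).
SOURCE = '''
     source          1..658
                     /organism="Aus bus"
                     /host="Cus dus"
                     /lat_lon="12.50 S 130.83 E"
'''

QUALIFIER = re.compile(r'/(\w+)="([^"]*)"')


def parse_qualifiers(feature_text):
    """Return a dict mapping qualifier name to value for one feature."""
    return dict(QUALIFIER.findall(feature_text))


def lat_lon_to_decimal(value):
    """Convert a lat_lon string such as '12.50 S 130.83 E' to (-12.5, 130.83)."""
    lat, ns, lon, ew = value.split()
    latitude = float(lat) * (-1 if ns == "S" else 1)
    longitude = float(lon) * (-1 if ew == "W" else 1)
    return latitude, longitude


quals = parse_qualifiers(SOURCE)
print(quals["host"], lat_lon_to_decimal(quals["lat_lon"]))
```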

Implications

• Phylogenetic data is not being archived (why not?)

• What would make archiving a "no-brainer"?