20140317 pi b_nmbe_journal_club

Towards an (European) Open Biodiversity Knowledge Management System Donat Agosti (Plazi, Bern) March 17, 2014 Berne, Journal Club @ NMBE


Lecture presented at the Journal Club of the Naturhistorisches Museum Bern, March 17, 2014: "Towards an (European) Open Biodiversity Knowledge Management System".


Page 1: 20140317 pi b_nmbe_journal_club

Towards an (European) Open Biodiversity Knowledge Management System

Donat Agosti (Plazi, Bern)

March 17, 2014

Berne, Journal Club @ NMBE

Page 3: 20140317 pi b_nmbe_journal_club

The cook (Ferran Adrià) wants to know when he can expect what seafood for his kitchen.

He assumes that phenological data is open and accessible to anyone.

He has a question and needs to know: What seafood at what time?

His goal is to provide a service based on the use of observation data, i.e. treat you (and make some money).

Page 4: 20140317 pi b_nmbe_journal_club

The fishmonger knows when what seafood is available.

He considers his knowledge of seafood phenology as his asset to make money.

His goal is to make money with knowledge based on observation records and understanding the characteristics of seafood.

Page 5: 20140317 pi b_nmbe_journal_club

What do YOU want to know?

How do YOU expect to get to your information?

Page 6: 20140317 pi b_nmbe_journal_club

• What are the main online resources you use?

• Do you maintain your own digital library?

• Do you participate in an online project, e.g. Scratchpads, a catalogue or a digital archive, and make your data accessible?

• … ?

Page 7: 20140317 pi b_nmbe_journal_club

What does this mean?

Meredith Lane, e-biosphere Conference, London 2009

Page 8: 20140317 pi b_nmbe_journal_club

Hardisty, Nature 502, 171 (2013)

BUT: predictive ecology has substantial data needs

Harfoot, BIH2013, Rome, 2013

The big question

What is the future of the biological world?

Imagine if we could:

…Predict community level dynamics of ecosystems at scales from local to global, based on the ecology and biology of all individual organisms

Page 9: 20140317 pi b_nmbe_journal_club

Decentralized biodiversity infrastructure

Plants

3,400 Herbaria worldwide

10,000 Associate curators and specialists

350,000,000 specimens in collections

180,000,000 specimens digitized

2,000,000,000 specimens including animals

Source: gbif.org; http://sciweb.nybg.org/science2/IndexHerbariorum.asp

Page 10: 20140317 pi b_nmbe_journal_club

200,000,000+ printed pages

1,900,000 species described

20,000,000+ species treatments

17,000 new species per year

Biodiversity libraries

BUT: The data are hidden

Incomplete digitization

Publications are unstructured

Collections are incomplete

Data is not linked

Most data are not open

Page 11: 20140317 pi b_nmbe_journal_club

Nationaal Herbarium Nederland collection on GBIF

Source: http://www.gbif.org/dataset/7b33b040-f762-11e1-a439-00145eb45e9a

One collection’s view of the world

Page 12: 20140317 pi b_nmbe_journal_club

Another collection’s view of the world

http://www.gbif.org/dataset/82b0f51c-f762-11e1-a439-00145eb45e9a

Page 13: 20140317 pi b_nmbe_journal_club

What does this mean?

The Linking Open Data cloud diagram

Linked Open Data Cloud

Page 14: 20140317 pi b_nmbe_journal_club

Names as information tags in life sciences

Names

Characteristics

Publications

Genes

Collections

Specimens

Distribution

Page 15: 20140317 pi b_nmbe_journal_club

The enhanced and linked treatments, extracted, stored on Plazi.org, and served in a human readable form, are linked to the underlying data: Fisher & Smith, 2008, PLoS ONE.

Page 16: 20140317 pi b_nmbe_journal_club

Towards an (European) Open Biodiversity Knowledge Management System

Page 17: 20140317 pi b_nmbe_journal_club

Coordination and Policy Development in Preparation for a European Open Biodiversity Knowledge Management System

Supported by the European Commission through its FP7 research funding programme

pro-iBiosphere

Page 18: 20140317 pi b_nmbe_journal_club

pro-iBiosphere: Partners

Page 19: 20140317 pi b_nmbe_journal_club

Create digital objects

+ Identifiers and resolvers

+ Open Access

+ Adequate infrastructure

+ Sustainable and permanent infrastructure

+ Reliable services for partners in research projects and society

Seamless Global Virtual Research Knowledge Management System

(European Open Biodiversity Knowledge Management System)

Biodiversity Knowledge Management System

Page 20: 20140317 pi b_nmbe_journal_club

Impact

The envisaged European Open Biodiversity Management System will:

• Support reliable and permanent open access to digital biodiversity records

• Create identifiers and link biodiversity literature, collections, digital objects, genes, etc.

• Ensure global interoperability and sharing of biodiversity data, information and knowledge

• Provide new services in support of open science

• Provide the ground for modelling the biosphere

• Develop data policies to harness the potential of open access

Page 21: 20140317 pi b_nmbe_journal_club

Convert data into machine readable data

Page 22: 20140317 pi b_nmbe_journal_club

Literature as an example

Page 23: 20140317 pi b_nmbe_journal_club

Text

<tax:treatment>
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name> Bihn &amp; Verhaagh
    <tax:status>n. sp.</tax:status>
    Fig 1 D - F
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin strongly curving to a sharp apical tooth, the apex parallel to the anterior clypeal margin. (Holotype with material in mandibles, so mandibles and anterior clypeus described below from paratypes.) Median clypeus
    ....
    </tax:p>
  </tax:div>
</tax:treatment>

Enhanced and linked text
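
A minimal sketch of what "enhanced and linked" buys a machine reader, using only Python's standard library. The markup is a shortened copy of the fragment above. The slide does not show the namespace declarations, so the URIs bound to `tax` and `dc` here are placeholders chosen for this example, and `read_name` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: the slide omits the xmlns declarations,
# so these are assumptions made only to keep the example parseable.
NS = {
    "tax": "http://example.org/taxonx",
    "dc": "http://example.org/dc",
}

TREATMENT_XML = """
<tax:treatment xmlns:tax="http://example.org/taxonx"
               xmlns:dc="http://example.org/dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
</tax:treatment>
"""


def read_name(xml_text):
    """Return the tagged name parts, status and external id of a treatment."""
    root = ET.fromstring(xml_text)
    name = root.find("tax:nomenclature/tax:name", NS)
    xid = name.find("tax:xid", NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
        "source": xid.get("source"),
        "identifier": xid.get("identifier"),
    }


record = read_name(TREATMENT_XML)
print(record["genus"], record["species"], record["status"])
```

Because genus, species and status are tagged rather than merely printed, a program can pick them out without guessing where the name ends and the author string begins.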

Page 24: 20140317 pi b_nmbe_journal_club

Treatment

A publication or section of a publication documenting the features or distribution of a related group of organisms (called a “taxon”, plural “taxa”) in ways adhering to highly formalized conventions.

http://terms.tdwg.org/wiki/tp:taxon-treatment

Catapano, 2010.

Page 25: 20140317 pi b_nmbe_journal_club

Treatment

Page 26: 20140317 pi b_nmbe_journal_club

[Diagram: four treatment boxes, labelled X-us c-us, X-us b-us, X-us b-us n. sp. and X-us b-us. Each treatment consists of a citation, a description and the materials cited.]

Treatments

Page 27: 20140317 pi b_nmbe_journal_club

[Diagram: the same four treatments (X-us c-us, X-us b-us, X-us b-us n. sp., X-us b-us), each with citation, description and materials cited, shown together with five article boxes (four labelled "Title", one labelled "Systema naturae"), each with its bibliographic references.]

Treatments

References

Page 28: 20140317 pi b_nmbe_journal_club

Treatments can be cited, like publications, with stable identifiers.

Page 29: 20140317 pi b_nmbe_journal_club

http://treatment.plazi.org/id/31F96F41E3E002BD88985A4F3A20E45A

Best practices for stable URIs:

http://wiki.pro-ibiosphere.eu/wiki/Best_practices_for_stable_URIs
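
As an illustration of how a program might handle such identifiers, the sketch below splits a treatment URI of the form shown above into its identifier and rebuilds the URI from it. The assumption that identifiers are exactly 32 hexadecimal characters is inferred from this single example, and both helper functions are hypothetical, not part of any Plazi API.

```python
import re

BASE = "http://treatment.plazi.org/id/"

# Assumed identifier shape: 32 hexadecimal characters (inferred from one example).
TREATMENT_URI = re.compile(r"^" + re.escape(BASE) + r"(?P<id>[0-9A-F]{32})$")


def parse_treatment_uri(uri):
    """Return the identifier part of a treatment URI, or raise ValueError."""
    match = TREATMENT_URI.match(uri.strip())
    if match is None:
        raise ValueError("not a treatment URI of the assumed form: " + uri)
    return match.group("id")


def build_treatment_uri(identifier):
    """Build a treatment URI from an identifier, normalising it to upper case."""
    uri = BASE + identifier.strip().upper()
    parse_treatment_uri(uri)  # raises ValueError if the identifier is malformed
    return uri


example = "http://treatment.plazi.org/id/31F96F41E3E002BD88985A4F3A20E45A"
print(parse_treatment_uri(example))
```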

Page 30: 20140317 pi b_nmbe_journal_club
Page 31: 20140317 pi b_nmbe_journal_club

Jeremy Miller, Work in Progress

Page 32: 20140317 pi b_nmbe_journal_club

Jeremy Miller, Work in Progress

Page 33: 20140317 pi b_nmbe_journal_club

Names can be linked automatically

Page 34: 20140317 pi b_nmbe_journal_club

Automated registration

MANUSCRIPT SUBMISSION

MANUSCRIPT ACCEPTED

XML Response

ARTICLE PUBLISHED

Taxon name available/valid (effectively published)

XML article metadata

XML Query

Peer review

Page 35: 20140317 pi b_nmbe_journal_club

The enhanced and linked treatments, extracted, stored on Plazi.org, and served in a human readable form, are linked to the underlying data: Fisher & Smith, 2008, PLoS ONE.

Page 36: 20140317 pi b_nmbe_journal_club

Penestomus egazini Miller, Haddad & Griswold, 2010

Progress

Treatments (% complete): 4/4 (100%)

Data summary

Specimen records: 41

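[Pie chart of specimen records by category: adult female, adult male, other. Labelled shares: 51%, 2%, 46%.]

Specimen collections

Institutions: 3

Muséum National d'Histoire Naturelle, Paris

California Academy of Sciences, San Francisco

Albany Museum, Grahamstown

[Pie chart of specimen collections. Labelled shares: 2%, 5%, 76%, 20%.]

Distribution

Countries: Lesotho, South Africa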

Georeferenced materials citations

Export species materials citations (DwC)

Export treatment materials citations (DwC)
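
A rough sketch of what a Darwin Core (DwC) export of materials citations could look like, written with Python's standard library. The column names are Darwin Core terms; the rows, including the institution codes `INST-A` and `INST-B`, are invented for illustration and are not the 41 records summarised on the slide. `to_dwc_csv` and `share_by` are hypothetical helpers.

```python
import csv
import io
from collections import Counter

# Darwin Core term names used as column headers.
DWC_FIELDS = ["scientificName", "sex", "country", "institutionCode"]

# Invented rows for illustration only; NOT the records behind the slide.
ROWS = [
    {"scientificName": "Penestomus egazini", "sex": "female",
     "country": "South Africa", "institutionCode": "INST-A"},
    {"scientificName": "Penestomus egazini", "sex": "male",
     "country": "South Africa", "institutionCode": "INST-B"},
    {"scientificName": "Penestomus egazini", "sex": "female",
     "country": "Lesotho", "institutionCode": "INST-A"},
]


def to_dwc_csv(rows):
    """Serialise materials citations as CSV with Darwin Core column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def share_by(rows, field):
    """Percentage of records per value of one field, rounded to whole numbers."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    return {key: round(100 * n / total) for key, n in counts.items()}


print(to_dwc_csv(ROWS))
print(share_by(ROWS, "sex"))
```

Summaries like the pie charts on this slide are then simple aggregations over the exported fields.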

Page 37: 20140317 pi b_nmbe_journal_club

American Museum of Natural History

Data summary

Materials citations 2004-2013: 111,364

[Chart: Materials Citations Records by Researcher. Axis: Materials Citations Records, 0 to 20,000. Series: Other, Donat Agosti, David Grimaldi, Toby Schuh, James Carpenter, Norman Platnick.]

Distribution

Georeferenced materials citations

Export species materials citations (DwC)

Page 38: 20140317 pi b_nmbe_journal_club

Zootaxa

Data summary

Materials citations 2004-2013: 11,476

[Chart: Materials Citations Records by Institution. Axis: Materials Citations Records, 0 to 2,500. Series: Other; Muséum National d'Histoire Naturelle, Paris; Natural History Museum, London; Museum of Comparative Zoology; Smithsonian Institution; American Museum of Natural History.]

Distribution

Georeferenced materials citations

Export species materials citations (DwC)

Page 39: 20140317 pi b_nmbe_journal_club
Page 40: 20140317 pi b_nmbe_journal_club
Page 41: 20140317 pi b_nmbe_journal_club
Page 42: 20140317 pi b_nmbe_journal_club

Better:

Create data as machine readable data

Page 43: 20140317 pi b_nmbe_journal_club

[Workflow diagram]

Legacy and new taxonomic literature

PROSPECTIVE PUBLISHING: TaxPub XML schema, PENSOFT MARK UP tool

HISTORICAL LITERATURE: TaxonX schema, PLAZI's GOLDEN GATE editor

Automated submission; peer review

Unified marked up final output: taxon treatments, keys, images, localities

Marked up publications: PDF, HTML and XML

Archiving

Wiki: Species-ID, Wikispecies, Wikipedia

Indexing (IPNI, ZooBank, MycoBank, GNA)

Aggregators (EOL, GBIF)

Electronic archives; Data Centers

Content management systems & repositories (e.g., Plazi, EOL, GBIF, SCRATCHPADS, EDIT)

END USERS

Page 44: 20140317 pi b_nmbe_journal_club

http://biodiversitydatajournal.com/articles.php?id=995

Page 45: 20140317 pi b_nmbe_journal_club

Access to ant taxonomic publications through antbase.org / Smithsonian Institution, currently including the entire body of non-copyrighted publications since 1758 (>4,000 publications or 85,000 pages)

Page 46: 20140317 pi b_nmbe_journal_club

Open Access

Page 47: 20140317 pi b_nmbe_journal_club

Knowledge wants to be free

Page 48: 20140317 pi b_nmbe_journal_club

Before antbase.org, Harvard's Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present.

Page 49: 20140317 pi b_nmbe_journal_club

Before antbase.org, Harvard's Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present.

Through antbase.org's digital library, access to this body of literature is worldwide, and it is actively used (>10,000 visits in one month alone).

Page 50: 20140317 pi b_nmbe_journal_club

Knowledge has to be free

Page 51: 20140317 pi b_nmbe_journal_club

Bouchout Declaration, 2014

Implementation by the Swiss National Science Foundation, 2007

Berlin Declaration, 2003

Page 52: 20140317 pi b_nmbe_journal_club

• The free and open use of content, services and other digital resources about biodiversity;

• Licenses that grant all users a free, irrevocable, world-wide right to copy, use, distribute, transmit and display the work publicly, as well as to build on the work and to make derivative works, subject to proper attribution consistent with community practices;

• Policy developments that will foster free and open access to biodiversity data;

• Tracking the use of information to ensure that sources and suppliers of data are assigned credit for their contributions;

• An agreed infrastructure, standards and protocols to improve access to and use of open data;

Bouchout Declaration, 2014 (1)

Page 53: 20140317 pi b_nmbe_journal_club

• Registers for content and services to allow discovery, access and use of open data;

• Persistent, dereferenceable identifiers for data objects and physical objects such as specimens, images and taxonomic treatments;

• Linking data using agreed vocabularies, both within and beyond biodiversity, that enable participation in the Linked Open Data Cloud;

• Dialogue coordinated by the leading signatories to refine the concept, priorities and technical requirements of Open Biodiversity Knowledge Management.

• A sustainable Open Biodiversity Knowledge Management that is attentive to scientific, sociological, legal, and financial aspects.

Bouchout Declaration, 2014 (2)
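
The points on identifiers and agreed vocabularies can be made concrete with a few subject, predicate, object statements. This is only a sketch in plain Python: `scientificName` and `country` are Darwin Core terms, while the `example.org` identifiers and the `publishedIn` and `citesSpecimen` predicates are invented for this example.

```python
# Darwin Core term namespace (an agreed vocabulary).
DWC = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical vocabulary and identifiers, used only for this sketch.
EX = "http://example.org/vocab/"
TREATMENT = "http://example.org/treatment/1"
SPECIMEN = "http://example.org/specimen/1"
ARTICLE = "http://example.org/article/1"

TRIPLES = [
    (TREATMENT, DWC + "scientificName", "X-us b-us"),
    (TREATMENT, EX + "publishedIn", ARTICLE),
    (TREATMENT, EX + "citesSpecimen", SPECIMEN),
    (SPECIMEN, DWC + "scientificName", "X-us b-us"),
    (SPECIMEN, DWC + "country", "South Africa"),
]


def objects(triples, subject, predicate):
    """All values stated for one subject under one predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects_with(triples, predicate, obj):
    """All subjects that share the same value for a predicate."""
    return [s for s, p, o in triples if p == predicate and o == obj]


# Everything tagged with the same name, regardless of where it is stored.
print(subjects_with(TRIPLES, DWC + "scientificName", "X-us b-us"))
```

Because the treatment and the specimen use the same term for the name, they can be found together without either repository knowing about the other.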

Page 54: 20140317 pi b_nmbe_journal_club

Knowledge has to be made free

Page 55: 20140317 pi b_nmbe_journal_club

You!

Page 56: 20140317 pi b_nmbe_journal_club
Page 57: 20140317 pi b_nmbe_journal_club

Reduce costs – future publishing

Page 58: 20140317 pi b_nmbe_journal_club

Don’t waste money:

Focus on Open Access enhanced linked publications, not PDF only

Page 59: 20140317 pi b_nmbe_journal_club
Page 60: 20140317 pi b_nmbe_journal_club

Founded in 2008

Swiss-based NGO with members in Switzerland, Germany, Bulgaria, US and Iran

Research-based think tank with the mission to promote open access to scientific content

Five pillars: legal advice, technical innovations and solutions, maintenance of a treatment repository and Biowikifarm, consultancy, advocacy

Page 61: 20140317 pi b_nmbe_journal_club

Modify copyright legislation to better serve scientific needs

Page 62: 20140317 pi b_nmbe_journal_club

TaxPub                     TaxonX
DTD                        Schema
Prospective publications   Legacy publications
Constrained                Loose
Derivative of JATS         Independent
Self-contained             Allows import of other schemas

Page 63: 20140317 pi b_nmbe_journal_club

Plazi workflow: overview

Page 64: 20140317 pi b_nmbe_journal_club

Plazi Search and Retrieval Server: Access to data

[Diagram: users ("You") reach the data as a human or via a machine, including as a Darwin Core Archive.]

Page 65: 20140317 pi b_nmbe_journal_club

Biowikifarm

Page 66: 20140317 pi b_nmbe_journal_club


Plazi GmbH, founded in 2012 as a service SME owned by Plazi

Page 67: 20140317 pi b_nmbe_journal_club


Funding from public donors, e.g. EU, corporate and private

Page 68: 20140317 pi b_nmbe_journal_club

Funding:

EU

EU-BON

Pro-iBiosphere

Private sector

In-kind

Voluntary work

Page 69: 20140317 pi b_nmbe_journal_club


Clients are global

Page 70: 20140317 pi b_nmbe_journal_club

Consultancies and Services:

Consulting publishers on how to produce semantically enhanced XML output (e.g. EJT, Zootaxa, Smithsonian Institution)

Service to mark up literature

Page 71: 20140317 pi b_nmbe_journal_club

http://plazi.org

Thank you very much!

Donat Agosti


This project is funded under the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement № 312848.