Scratchpad 2014-introduction

Scratchpads: Virtual Research Environments for taxonomic and biodiversity related data. Dr. Vince Smith, Informatics Research Leader, The Natural History Museum London

description

Scratchpad Introduction by Smith, V.S., Koureas, D., & Livermore, L. Updated for Feb. 2014.

Transcript of Scratchpad 2014-introduction

Page 1: Scratchpad 2014-introduction

Scratchpads: Virtual Research Environments

for taxonomic and biodiversity related data

Dr. Vince Smith, Informatics Research Leader

The Natural History Museum London

Page 2: Scratchpad 2014-introduction

Where to find and how to cite this presentation:

Smith, V.S., Koureas, D., & Livermore, L. 2014. Scratchpads introductory presentation. Slideshare.

http://www.slideshare.net/vsmithuk/Scratchpad-2014-Introduction

Page 3: Scratchpad 2014-introduction

Publications based on countless specimens, images, maps, keys and datasets

Current taxonomic data production

Typically generated by small communities for “local” research projects

Figure from Costello M.J. et al., 2013. doi: 10.1126/science.1230318

Page 4: Scratchpad 2014-introduction

However…

not publicly accessible

lack sufficient contextual metadata

published in formats that require time-consuming manual extraction

difficulty in publishing valuable datasets (e.g. local or regional Floras and Faunas)

Published knowledge cannot easily be mobilised

Vast amounts of unpublished taxonomic “knowledge”

Page 5: Scratchpad 2014-introduction

On the other hand:

Estimates of 7.5 million species still undescribed [1]

[1] Mora C. et al. How Many Species Are There on Earth and in the Ocean? doi:10.1371/journal.pbio.1001127

Page 6: Scratchpad 2014-introduction

Expected volume of taxonomic and biodiversity data

Need for extracting, aggregating and linking data on a global level

Page 7: Scratchpad 2014-introduction

The four nodes of the data cycle

1. We collect and generate data

2. We curate, link and structure data

3. We analyse data

4. We publish data

Page 8: Scratchpad 2014-introduction

The four nodes of the data cycle

Data collection & generation

Data curation

Data analysis

Data publishing

What are the bottlenecks in the workflow?

Page 9: Scratchpad 2014-introduction

What we need is… a seamless workflow

Data collection & generation

Data curation

Data analysis

Data publishing

Page 10: Scratchpad 2014-introduction

“Link together evolutionary data… by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses”

Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE. doi:10.1016/j.tree.2011.11.001

To achieve this…

This requires data, information & knowledge to be…

• Digital: not printed paper

• Openly accessible: not behind barriers (e.g. paywalls)

• Linked-up: not in silos

Page 11: Scratchpad 2014-introduction

Scratchpads: Virtual Research Environments

Making taxonomy digital, open & linked

Page 12: Scratchpad 2014-introduction

so… what are the Scratchpads?

Page 13: Scratchpad 2014-introduction

What are Scratchpads?

Hosted websites for biodiversity data

Virtual research & publication platform

Completely open access & open source

Modular & flexible

Page 14: Scratchpad 2014-introduction

What are Scratchpads?

Scratchpads facilitate the development of online research communities through a standardized environment for entering and curating data that allows sharing and interlinking and the dissemination of research products

Page 15: Scratchpad 2014-introduction

The Scratchpads concept

A Scratchpad is a website that holds data for you and your community

Your data

External data & services

Page 16: Scratchpad 2014-introduction

The Scratchpads concept

Page 17: Scratchpad 2014-introduction

Examples of use:

Taxa (classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies)

Conservation

Projects

Regions

Societies

Page 18: Scratchpad 2014-introduction

Examples of use:

Red List conservation assessments

Page 19: Scratchpad 2014-introduction

Examples of use:

Bulbous monocot genera listed in CITES

Page 20: Scratchpad 2014-introduction

Examples of use:

Global Invasive Alien Species Information Partnership

Page 21: Scratchpad 2014-introduction

Examples of use:

Belgian Network for DNA Barcoding

Page 22: Scratchpad 2014-introduction

Major integrated projects

• Online resource for monocot plants

• Collaboration between Kew, Oxford University and NHM

• Data to be open and usable by other scientists

Page 23: Scratchpad 2014-introduction

Major integrated projects

• 21+ open community sites and growing

• Over 45 internationally collaborating scientists

• Site data feeds into a “Portal”

Site List: http://about.e-monocot.org/list-emonocot-scratchpads

Page 24: Scratchpad 2014-introduction

Major integrated projects

• Retrieve information on any Monocot plant

• Rich downloadable data

• Identification keys

• Model example of linked attributed data

eMonocot Portal: http://e-monocot.org/

Page 25: Scratchpad 2014-introduction

Are Scratchpads sustainable?

665 Scratchpads communities by 7,334 active registered users, covering 162,432 taxa in 735,660 pages.

65,000 unique visitors/month

In total more than 1,300,000 visitors

81 paper citations in 2012

[Chart: unique visitors per month to Scratchpads sites]

Page 27: Scratchpad 2014-introduction

The main features

Page 28: Scratchpad 2014-introduction

Classification term oriented system

Biological classifications

Non-biological classifications

Taxonomies Hierarchical controlled vocabularies

The main features

Page 29: Scratchpad 2014-introduction

Dynamic Biological Classifications

Manually entered or imported

Auto generated

The main features

Page 30: Scratchpad 2014-introduction

Taxon pages

Overview of data related to taxon

Generated from tagged content

The main features
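
As a rough illustration of how a taxon page could be "generated from tagged content", the sketch below groups content items by the taxon names they are tagged with. It is not the Scratchpads implementation; the function name `build_taxon_pages`, the item fields (`type`, `title`, `taxa`) and the sample items are all hypothetical.

```python
from collections import defaultdict


def build_taxon_pages(items):
    """Group content items by taxon tag, then by content type.

    Each item is a dict with hypothetical keys:
      "type"  - kind of content, e.g. "image" or "literature"
      "title" - display title of the item
      "taxa"  - list of taxon names the item is tagged with
    """
    pages = defaultdict(lambda: defaultdict(list))
    for item in items:
        for taxon in item["taxa"]:
            pages[taxon][item["type"]].append(item["title"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {taxon: dict(by_type) for taxon, by_type in pages.items()}


if __name__ == "__main__":
    # Placeholder content; not taken from any real Scratchpad.
    sample_items = [
        {"type": "image", "title": "Habitus photo", "taxa": ["Taxon A"]},
        {"type": "literature", "title": "Revision paper", "taxa": ["Taxon A", "Taxon B"]},
    ]
    for taxon, content in build_taxon_pages(sample_items).items():
        print(taxon, content)
```

An item tagged with several taxa appears on each of their pages, which is the behaviour one would expect from tag-driven overview pages.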

Page 31: Scratchpad 2014-introduction

Bibliography management

Faceted browsing

An inbuilt Bibliography manager

Taxon tagging and free keywords

Import from and export to all major formats

The main features

Page 32: Scratchpad 2014-introduction

Specimen/Observation data

Linked to images and georeferenced

Annotated full specimen/observation records

The main features

Linked to GenBank accession numbers

Page 33: Scratchpad 2014-introduction
Page 34: Scratchpad 2014-introduction

Distribution maps

Google maps based

Data layers

Occurrence data

Distribution data (TDWG regions)

GBIF data

The main features

Page 35: Scratchpad 2014-introduction

Example regional distribution

The main features

Page 36: Scratchpad 2014-introduction

Create phylogenetic trees

Based on Newick/NeXML

Different views
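
For readers unfamiliar with the Newick format mentioned on this slide, here is a minimal sketch of reading a Newick string into a nested structure. It handles only plain labels and optional branch lengths (no quoted labels or comments), and it is not the parser Scratchpads uses; `parse_newick`, `leaf_names` and the example tree are illustrative assumptions.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (label, children) tuples.

    Branch lengths (the part after ":") are read and discarded.
    Quoted labels and bracketed comments are not supported.
    """
    text = text.strip()
    if text.endswith(";"):
        text = text[:-1]
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                elif text[pos] == ")":
                    pos += 1  # end of this clade
                    break
                else:
                    raise ValueError(f"unexpected character at {pos}")
        # Read the label (and optional ":length") up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        label = text[start:pos].split(":")[0].strip()
        return (label, children)

    tree = parse_node()
    if pos != len(text):
        raise ValueError("trailing characters after tree")
    return tree


def leaf_names(node):
    """Return the labels of all terminal nodes, left to right."""
    label, children = node
    if not children:
        return [label]
    names = []
    for child in children:
        names.extend(leaf_names(child))
    return names


if __name__ == "__main__":
    tree = parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4)root;")
    print(leaf_names(tree))
```

The nested tuples can then be rendered in whatever view is needed, for example as an indented list or a drawn cladogram.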

Page 37: Scratchpad 2014-introduction

Character matrices – Key construction

Quantitative or qualitative characters

Auto generation of keys

Taxon based matrices [Specimens based character matrices]

The main features
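
To make the idea of generating a key from a character matrix concrete, the sketch below treats a taxon-based matrix of qualitative characters as a multi-access key: taxa are filtered by observed character states, and the next most useful character is suggested. This is an assumption-laden illustration, not the key-generation method used by Scratchpads; the function names and the matrix contents are hypothetical.

```python
def matching_taxa(matrix, observations):
    """Return taxa whose recorded states agree with every observation.

    matrix:       {taxon: {character: state}}
    observations: {character: observed state}
    """
    return sorted(
        taxon
        for taxon, states in matrix.items()
        if all(states.get(ch) == val for ch, val in observations.items())
    )


def best_character(matrix, candidates):
    """Suggest the character that splits the candidates into most groups.

    Ties are broken alphabetically; returns None if there is nothing to score.
    """
    characters = sorted({ch for taxon in candidates for ch in matrix[taxon]})
    return max(
        characters,
        key=lambda ch: len({matrix[taxon].get(ch) for taxon in candidates}),
        default=None,
    )


if __name__ == "__main__":
    # Placeholder matrix with invented taxa and characters.
    matrix = {
        "Taxon A": {"wings": "present", "antennae": "short"},
        "Taxon B": {"wings": "absent", "antennae": "short"},
        "Taxon C": {"wings": "absent", "antennae": "long"},
    }
    remaining = matching_taxa(matrix, {"wings": "absent"})
    print(remaining)
    print(best_character(matrix, remaining))
```

Quantitative characters would need range comparisons rather than equality tests; they are left out here to keep the sketch short.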

Page 38: Scratchpad 2014-introduction

Media handling

Bulk upload

Metadata

(EXIF & Audubon Core)

Media galleries

The main features

Page 39: Scratchpad 2014-introduction

Generation of custom pages

Tagged or not

External RSS

Twitter feeds

Media files

The main features

Page 40: Scratchpad 2014-introduction

Working groups

Forums

Blog entries

Webforms

Newsletters

RSS syndication

Inbuilt comments

Enhanced communication tools

The main features

Page 41: Scratchpad 2014-introduction

analytical tools

OBOE service

including ecological informatics, phylogenetics and sequence alignment

The main features

Page 42: Scratchpad 2014-introduction

MCMC methods to estimate the posterior distribution of model parameters

Phylogenies

Sequence alignment

Multiple sequence alignment

Microsatellite repeats finder

Page 43: Scratchpad 2014-introduction

data

mobilisation

more on the way…

External services Integration

Page 44: Scratchpad 2014-introduction

IUCN data integration

Page 45: Scratchpad 2014-introduction

GBIF data integration

Page 46: Scratchpad 2014-introduction
Page 47: Scratchpad 2014-introduction

Help & Support

• In-site Support

• Wiki

• Training Courses (12 in 2012)

• Ambassadors Programme

• Embedded Issues Queue

• Sandbox Site

http://help.scratchpads.eu

Page 48: Scratchpad 2014-introduction

Data curation

Data publishing

Data collection & generation

a seamless workflow

Data analysis

Data publishing

Page 49: Scratchpad 2014-introduction

Helping researchers take

credit for all research products

The vision

Page 50: Scratchpad 2014-introduction

Publication module

Page 51: Scratchpad 2014-introduction

The Publication module

Open-access journal

The main features

Page 52: Scratchpad 2014-introduction

What does the BDJ publish?

• Single taxon treatments and nomenclatural acts

• Local or regional checklists

• Sampling reports and occasional inventories

• Habitat-based checklists and inventories

• Ecological and biological observations of species and communities

• Single identification keys

• Biodiversity-related databases, including genomic, ecological and environmental data (data papers)

• Biodiversity-related software tools

Page 53: Scratchpad 2014-introduction

How do Scratchpads and the BDJ interact?

Page 54: Scratchpad 2014-introduction

Allow submission of datasets for publication without reformatting and restructuring

Working in a single environment based on a standardised XML schema
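
The sketch below shows, in general terms, why a shared XML structure lets structured data pass from one system to another without manual reformatting. The element and attribute names (`manuscript`, `author`, `specimen`, and so on) are invented for illustration and are not the actual schema used by Scratchpads or the journal; the sample values are placeholders.

```python
import xml.etree.ElementTree as ET


def build_manuscript_xml(title, authors, specimens):
    """Serialise manuscript parts into one XML string.

    All element names here are hypothetical, chosen only for this example.
    authors:   list of {"name": ..., "affiliation": ...}
    specimens: list of {"taxon": ..., "locality": ...}
    """
    root = ET.Element("manuscript")
    ET.SubElement(root, "title").text = title

    authors_el = ET.SubElement(root, "authors")
    for author in authors:
        el = ET.SubElement(authors_el, "author", affiliation=author["affiliation"])
        el.text = author["name"]

    specimens_el = ET.SubElement(root, "specimens")
    for specimen in specimens:
        el = ET.SubElement(specimens_el, "specimen")
        ET.SubElement(el, "taxon").text = specimen["taxon"]
        ET.SubElement(el, "locality").text = specimen["locality"]

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    xml_text = build_manuscript_xml(
        "Example checklist",
        [{"name": "A. Author", "affiliation": "Example Institute"}],
        [{"taxon": "Taxon A", "locality": "Example locality"}],
    )
    # The receiving system can read the same fields back without re-keying.
    parsed = ET.fromstring(xml_text)
    print(parsed.findtext("title"))
    print([s.findtext("taxon") for s in parsed.iter("specimen")])
```

Because both ends agree on the structure, author names, specimen records and other parts stay as separate, machine-readable fields rather than being flattened into formatted text.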

Page 55: Scratchpad 2014-introduction

• Work on multiple manuscripts

• Allocate different people to different manuscripts

• Handle permissions

Assembling a manuscript

Page 56: Scratchpad 2014-introduction

Author names and affiliations

Data included in manuscript in a structured annotated format

Assembling a manuscript

Page 57: Scratchpad 2014-introduction

Taxon descriptions

Assembling a manuscript

Page 58: Scratchpad 2014-introduction

Specimen data

Assembling a manuscript

Page 59: Scratchpad 2014-introduction

Figures and Tables

Page 60: Scratchpad 2014-introduction

Supplementary files

Select from existing or upload new

Page 61: Scratchpad 2014-introduction

References

Assembling a manuscript

Easily cite bibliography

Auto compile list of references

Page 62: Scratchpad 2014-introduction

Assembling a manuscript

Texts

Page 63: Scratchpad 2014-introduction

XML

Figures and Tables

Keys

References

Texts

The publication module

Author names and affiliations

Taxon descriptions

Specimen data

Supplementary files

Page 64: Scratchpad 2014-introduction

Previewing your manuscript

Page 65: Scratchpad 2014-introduction

Submission & enhanced peer review

• Manuscript data validation

• One-click submission to BDJ

• Traditional peer review and optional panel/public review

Page 66: Scratchpad 2014-introduction

The workflow

SCRATCHPADS

XML submission

PENSOFT JOURNAL SYSTEM (PJS 2.0)

MANUSCRIPT PUBLISHED (XML, PDF)

Community

Taxon names

Occurrence data

Datasets

Archive

Taxon treatments

Plazi Wiki

Page 67: Scratchpad 2014-introduction

Scratchpads are an integrated system to enter, curate, mark up, link and publish data

The taxonomic workflow in a single virtual environment

Page 68: Scratchpad 2014-introduction

Acknowledgements

Scratchpads technical development: Vince Smith, Simon Rycroft, Ben Scott, Ed Baker, Alice Heaton, Katherine Boutton

Scratchpads outreach: Laurence Livermore, Isa van de Velde & Dimitris Koureas

e-Monocot: Paul Wilkin & the Kew team, Charles Godfray & the Oxford team

ViBRANT: Vince Smith, Dave Roberts & Lucy Reeve

Pensoft: Lyubomir Penev and the Pensoft team

Our 7000+ users

Page 69: Scratchpad 2014-introduction

Thank you

Data collection & generation

Data curation

Data analysis

Data publishing

Page 70: Scratchpad 2014-introduction