Scratchpad 2014-introduction
Scratchpads: Virtual Research Environments
for taxonomic and biodiversity-related data
Dr. Vince Smith, Informatics Research Leader
The Natural History Museum London
Where to find and how to cite this presentation:
Smith, V.S., Koureas, D. & Livermore, L. 2014. Scratchpads introductory presentation. Slideshare.
http://www.slideshare.net/vsmithuk/Scratchpad-2014-Introduction
Current taxonomic data production
Publications based on countless specimens, images, maps, keys and datasets
Typically generated by small communities for “local” research projects
Figure from Costello M.J. et al., 2013. doi: 10.1126/science.1230318
However…
not publicly accessible
lack sufficient contextual metadata
published in formats that require time-consuming manual extraction
difficulty in publishing valuable datasets (e.g. local or regional Floras and Faunas)
Published knowledge cannot easily be mobilised
Vast amounts of unpublished taxonomic “knowledge”
On the other hand:
Estimates of 7.5 million species still undescribed [1]
[1] Mora C. et al. How Many Species Are There on Earth and in the Ocean? doi:10.1371/journal.pbio.1001127
Expected volume
of taxonomic and
biodiversity data
Need for extracting, aggregating and linking data on a global level
The four nodes of the data cycle
1. We collect and generate data
2. We curate, link and structure data
3. We analyse data
4. We publish data
What are the bottlenecks in the workflow?
[Diagram of the data cycle: data collection & generation, data curation, data analysis, data publishing]
What we need is… a seamless workflow
[Diagram: data collection & generation, data analysis]
This requires data, information & knowledge to be…
• Digital: not printed paper
• Openly accessible: not behind barriers (e.g. paywalls)
• Linked-up: not in silos
“Link together evolutionary data… by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses”
Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE. doi:10.1016/j.tree.2011.11.001
To achieve this…
Scratchpads: Virtual Research Environments
Making taxonomy digital, open & linked
So… what are the Scratchpads?
What are Scratchpads?
Hosted websites for biodiversity data
Virtual research & publication platform
Completely open access & open source
Modular & flexible
Scratchpads facilitate the development of online research communities through a standardized environment for entering and curating data that allows sharing, interlinking and dissemination of research products
A Scratchpad is a website that holds data for you and your community
The Scratchpads concept
Your data; external data & services
Examples of use:
Taxa (classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies)
Conservation, Projects, Regions, Societies
Red List conservation assessments
Bulbous monocot genera listed in CITES
Global Invasive Alien Species Information Partnership
Belgian Network for DNA Barcoding
Major integrated projects
• Online resource for monocot plants
• Collaboration between Kew, Oxford University and NHM
• Data to be open and usable by other scientists
• 21+ open community sites and growing
• Over 45 internationally collaborating scientists
• Site data feeds into a “Portal”
Site List: http://about.e-monocot.org/list-emonocot-scratchpads
• Retrieve information on any Monocot plant
• Rich downloadable data
• Identification keys
• Model example of linked attributed data
eMonocot Portal: http://e-monocot.org/
65,000 unique visitors/month
Per month unique visitors to Scratchpads sites
665 Scratchpads Communities
by 7,334 active registered users
covering 162,432 taxa
in 735,660 pages.
Are Scratchpads sustainable?
81 paper citations in 2012
In total more than
1,300,000 visitors
Timeline: 2007, 2011, 2014
ViBRANT: Virtual Biodiversity Research
Other grants in the pipeline
New proposals
The main features
Classification term oriented system
Biological classifications
Non-biological classifications
Taxonomies: hierarchical controlled vocabularies
Dynamic Biological Classifications
Manually entered or imported
Auto generated
Taxon pages
Overview of data related to taxon
Generated from tagged content
Bibliography management
Faceted browsing
An inbuilt Bibliography manager
Taxon tagging and free keywords
Import from and export to all major formats
Specimen/Observation data
Linked to images and georeferenced
Annotated full specimen/observation records
Linked to GenBank accession numbers
Distribution maps
Google maps based
Data layers
Occurrence data
Distribution data (TDWG regions)
GBIF data
Example regional distribution
Create phylogenetic trees
Based on Newick/NeXML
Different views
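To illustrate what a Newick-encoded tree looks like, here is a minimal Python sketch that parses a Newick string into nested dictionaries and lists its tips. It is not how Scratchpads itself handles trees; the function names (`parse_newick`, `leaf_names`), the example taxa and the branch lengths are all assumptions made for illustration, and quoted labels and comments are not handled.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts.

    Illustrative sketch only: supports names and branch lengths,
    not quoted labels or bracketed comments.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume "," between siblings
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        # Optional node name
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        # Optional branch length after ":"
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return root


def leaf_names(node):
    """Return the tip labels of the tree in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


# Hypothetical example tree; the branch lengths are invented.
tree = parse_newick("((Pediculus:0.1,Pthirus:0.2)Anoplura:0.3,Liposcelis:0.5);")
print(leaf_names(tree))
```

The nested-dictionary form is only one possible in-memory representation; a dedicated phylogenetics library would normally be preferable for real data.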
Character matrices – Key construction
Quantitative or qualitative characters
Auto generation of keys
Taxon based matrices [Specimens based character matrices]
Media handling
Bulk upload
Metadata
(EXIF & Audubon Core)
Media galleries
Generation of custom pages
Tagged or not
External RSS
Twitter feeds
Media files
Working groups
Forums
Blog entries
Webforms
Newsletters
RSS syndication
Inbuilt comments
Enhanced communication tools
Analytical tools
OBOE service, including ecological informatics, phylogenetics and sequence alignment
MCMC methods to estimate the posterior distribution of model parameters
Phylogenies
Sequence alignment
Multiple sequence alignment
Microsatellite repeats finder
Data mobilisation
more on the way…
External services Integration
IUCN data integration
GBIF data integration
Help & Support
• In-site Support
• Wiki
• Training Courses (12 in 2012)
• Ambassadors Programme
• Embedded Issues Queue
• Sandbox Site
http://help.scratchpads.eu
A seamless workflow: data collection & generation, data curation, data analysis, data publishing
Data publishing
Helping researchers take
credit for all research products
The vision
The Publication module
Open-access journal
What does the Biodiversity Data Journal (BDJ) publish?
• Single taxon treatments and nomenclatural acts
• Local or regional checklists
• Sampling reports and occasional inventories
• Habitat-based checklists and inventories
• Ecological and biological observations of species and communities
• Single identification keys
• Biodiversity-related databases, including genomic, ecological and environmental data (data papers)
• Biodiversity-related software tools
How do Scratchpads and the BDJ interact?
Allow submission of datasets for publication without reformatting and restructuring
Working in a single environment based on a standardised XML schema
• Work on multiple manuscripts
• Allocate different people to different manuscripts
• Handle permissions
Assembling a manuscript
Author names and affiliations
Data included in manuscript in a structured annotated format
Taxon descriptions
Specimen data
Figures and Tables
Supplementary files
Select from existing or upload new
References
Easily cite bibliography
Auto compile list of references
Texts
The publication module
[Diagram: author names and affiliations, taxon descriptions, specimen data, figures and tables, keys, references, texts and supplementary files combined into XML]
Previewing your manuscript
Submission & enhanced peer review
• Manuscript data validation
• One-click submission to BDJ
• Traditional peer review and optional panel/public review
The workflow
SCRATCHPADS
XML submission
PENSOFT JOURNAL SYSTEM (PJS 2.0)
MANUSCRIPT PUBLISHED (XML, PDF)
Community
Taxon names, occurrence data, datasets, archive, taxon treatments
Plazi Wiki
Scratchpads are an integrated system to enter, curate, mark up, link and publish data:
a taxonomic workflow in a single virtual environment
Acknowledgements
Scratchpads technical development: Vince Smith, Simon Rycroft, Ben Scott, Ed Baker, Alice Heaton, Katherine Boutton
Scratchpads outreach: Laurence Livermore, Isa van de Velde & Dimitris Koureas
e-Monocot: Paul Wilkin & the Kew team, Charles Godfray & the Oxford team
ViBRANT: Vince Smith, Dave Roberts & Lucy Reeve
Pensoft: Lyubomir Penev and the Pensoft team
Our 7000+ users
Thank you