Digitised collections: Toward a digital strategy for the NHM, London
![Page 1: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/1.jpg)
Digitised collections: Toward a digital strategy for the NHM, London
Vince Smith
Workshop 3, pro-iBiosphere, Berlin, 23 May 2013
![Page 2: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/2.jpg)
Digital Ambition: NHM Science Strategy 2013-2017
A New Voyage of Discovery
Three Focal Areas
1. Scientific discovery
2. Scientific infrastructure
3. Scientific engagement
Five Challenges
1. The Digital NHM
2. Origins, evolution & futures
3. Biodiversity discovery
4. Natural resources & hazards
5. Science, society & skills
Resources & funding
Measuring success
![Page 3: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/3.jpg)
data.nhm.ac.uk/globe/
![Page 4: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/4.jpg)
Digital Ambition: NHM Science Strategy 2013-2017
• Scientific impact: 1,000 papers in leading journals
• Digital access: 20M specimens available digitally
• Engagement: 1M face-to-face engagements
• Collections: Globally important collections
• Diagnostic tools: Diagnostic tools for key groups
• Deep time: Timeline of key transitions
• Science & society: Articulate the role of science
• UK network: Act as a national museum
• Earth sciences: Earth Sciences Centre
• Funding: £10M for Five Challenge Areas
![Page 5: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/5.jpg)
Overview
1. Existing digital content, sources & formats
  • Research data
  • Collections data
2. Making collections data digital
  • Priorities
  • Protocols & pathfinder activities
  • Crowdsourcing transcription
3. Aggregation & delivery
  • The NHM data portal
  • Data visualisation, data sub-portals
4. Identifiers, links & interoperability
  • DataCite DOIs
  • Third party aggregators
  • Portal APIs, download & analytical functions
5. Timeline & constraints
  • Data policies
  • Next steps
Digitisation activities
Data portal
![Page 6: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/6.jpg)
NHM Research Outputs
• 49 papers, 45 available online (4 print only or behind paywalls)
• 9 had supplementary data files
• 39 papers with tables, charts & other data
  o >1000 sequences
  o 826 figures
  o 76 tables
  o 1 genome
• No collective view of these data (37 journals)
• No consistent way of citing NHM data
• No consistent mechanism to access data
• Effectively invisible at the institutional level
One Month of NHM Science group papers
Data via Carolyn Lowry e-mail, 13th Feb. 2013
1. Existing digital content
![Page 7: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/7.jpg)
NHM Collections Outputs: data
• Huge investment in NHM collection management system
• ≠ Imaging
• Most research projects need spatio-temporal records
• Different requirements for different purposes
NHM COLLECTIONS April 2013

| Collection area | Estimated no. of specimens | No. records in database | % collection in database | % records with location info |
|---|---|---|---|---|
| Botany | 6,000,000 | 626,000 | ~10% | 96% |
| Entomology | 32,000,000 | 316,000 | <1% | 68% |
| Mineralogy | 500,000 | 422,000 | ~95% | 79% |
| Palaeontology | 9,000,000 | 342,000 | ~3% | 89% |
| Zoology | 28,000,000 | 1,131,000 | ~60% (via lots) | 69% |
| TOTAL | 76,000,000 | 2,837,000 | 3% (23%) | |
![Page 8: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/8.jpg)
• Many, many imaging projects (highly fragmented)
• Circa 40 TB for major collections (excluding library)
• 120,000 images in KE EMu (many others not in KE!)
• Circa 250,000 via NHM Photo unit (limited metadata)

| Collection area | No. image files | Disk space (GB) |
|---|---|---|
| Botany | 140,133 | 35,302 |
| Entomology | 529,106 | 3,172 |
| Mineralogy | 14,000 | 6 |
| Palaeontology | 122,548 | 993 |
| Zoology | 12,975 | 1,598 |
| TOTAL | 818,762 | 41,070 |

NHM Collections Outputs: images
![Page 9: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/9.jpg)
Current data formats
• Darwin Core Archive (DwCA) & extensions (collections)
• Circa 2020 fields mapped to 50 fields to generate archive
• Images mainly JPG & TIFF
• Metadata using EML & Genesis II standard
• Research data files in a wide array of formats (blob files)
  o Nexus (character data and Newick formatted phylogenetic trees)
  o Non-NHM specimen lists (as Darwin Core Archive files)
  o PhyloXML (an XML standard for representing phylogenetic trees)
  o Output from the Imaging and Analysis Centre (Micro CT datafile formats)
  o NeXML (an XML standard for representing character data)
  o Collections of images from digitisation projects (as a collection of links or a zipped archive)
  o Sequence trace files (.scf sequence chromatogram format files)
  o Environmental sequence files
  o Taxon checklists (as Darwin Core Archive files)
  o Collection level descriptions
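Since Darwin Core Archives carry the collections data, specimen lists and checklists above, a minimal sketch of reading one may help. It assumes the usual DwCA layout (a zip holding a `meta.xml` descriptor plus a delimited core data file) and uses only the Python standard library. The helper names and the single example record are invented for illustration and are not NHM data.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

NS = "{http://rs.tdwg.org/dwc/text/}"  # namespace of the DwCA meta.xml descriptor


def read_dwca_core(archive_bytes):
    """Return the core data file of a Darwin Core Archive as a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find(NS + "core")
        location = core.find(NS + "files").find(NS + "location").text
        delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        skip = int(core.get("ignoreHeaderLines", "0"))
        # Map column index -> short term name (last segment of the term URI).
        columns = {}
        id_el = core.find(NS + "id")
        if id_el is not None and id_el.get("index") is not None:
            columns[int(id_el.get("index"))] = "id"
        for field in core.findall(NS + "field"):
            if field.get("index") is not None:  # index-less fields only carry defaults
                columns[int(field.get("index"))] = field.get("term").rsplit("/", 1)[-1]
        text = zf.read(location).decode(core.get("encoding", "utf-8"))
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))[skip:]
    return [{name: row[i] for i, name in columns.items()} for row in rows]


def make_example_archive():
    """Build a tiny in-memory archive with one made-up record (illustration only)."""
    meta = (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">'
        '<core encoding="UTF-8" fieldsTerminatedBy="," ignoreHeaderLines="1" '
        'rowType="http://rs.tdwg.org/dwc/terms/Occurrence">'
        "<files><location>occurrence.csv</location></files>"
        '<id index="0"/>'
        '<field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>'
        '<field index="2" term="http://rs.tdwg.org/dwc/terms/country"/>'
        "</core></archive>"
    )
    data = "id,scientificName,country\nEXAMPLE-1,Papilio machaon,United Kingdom\n"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("meta.xml", meta)
        zf.writestr("occurrence.csv", data)
    return buffer.getvalue()


records = read_dwca_core(make_example_archive())
```

The descriptor drives the parsing, so the same function should cope with a different column order or a tab-delimited core file; extension files and default values are deliberately left out of this sketch.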
![Page 10: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/10.jpg)
![Page 11: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/11.jpg)
• Priorities linked to science strategic priorities
  o Disease, sustainability, crop wild relatives, pests etc.
• Tiered approach, different needs for different collections
• Low hanging fruit (2D objects, e.g. herbarium sheets & slides)
• Linked to strategic collaborations & financial opportunities
  o e.g. RBG Kew, RBG Edinburgh, Nat. Mus. Wales, Hunterian etc.
• Priorities dictate order – we plan to do it all (eventually)!
2. Making collections data digital
Digitisation Priorities
![Page 12: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/12.jpg)
![Page 13: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/13.jpg)
• Exercise to develop digitisation protocols across the collection
  o Slides, spirit, herbarium sheets, pinned, multispecimen/drawer
• Protocols mapped to high level collections descriptions
• Workflow software supporting rapid digitisation (to KE & DAMS)
• Pathfinder activities for less well understood projects
  o Entomological dry material (30 M specimens)
    - iCollections (specimen-by-specimen) approach
    - SatScan (drawer level multi-specimen) approach
Digitisation Protocols
![Page 14: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/14.jpg)
• Specimen-by-specimen, traditional, dedicated 6-person team
• Digitising the British Isles Lepidoptera collection
• ~500,000 specimens, 5,000 drawers
• Re-curation & specimen imaging
• Complete label information including georeferencing
• For use in Climate Change initiative
iCollections Initiative
![Page 15: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/15.jpg)
• 4-6 people over 3 years, work broken into small tasks by teams
• Average imaging rate: 163 specimens per person per day
• Averaging >3 min per specimen (prep., imaging & databasing)
• >£1 per specimen
• BUT: 6,800 person-years for the entire collection
iCollections Initiative
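The rates on this slide can be sanity-checked with a little arithmetic. The sketch below takes the quoted 163 specimens per person per day and 3 minutes per specimen, and assumes 220 working days per year, a figure that is not on the slide. The slide's 6,800 person-year estimate for the entire collection depends on assumptions the slide does not state, so it is not reproduced here.

```python
def hours_per_day(specimens_per_day, minutes_per_specimen):
    """Daily effort implied by a throughput and a per-specimen handling time."""
    return specimens_per_day * minutes_per_specimen / 60


def person_years(n_specimens, specimens_per_person_day, working_days_per_year):
    """Person-years needed to process n_specimens at a fixed daily rate."""
    return n_specimens / (specimens_per_person_day * working_days_per_year)


WORKING_DAYS = 220  # assumption, not from the slide

daily_hours = hours_per_day(163, 3)                       # about 8.15 hours
lepidoptera = person_years(500_000, 163, WORKING_DAYS)    # about 13.9 person-years
```

Under these assumptions the ~500,000 British Isles Lepidoptera come to roughly 14 person-years, which sits inside the 12 to 18 person-years implied by 4-6 people over 3 years.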
![Page 16: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/16.jpg)
• Drawer level digitisation, segmented down to specimens
• Very fast imaging, no specimen handling, just one view
• No label information, but some data extracted from drawer
• Specimens retrospectively cropped & annotated
SatScan Initiative
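To make "segmented down to specimens" concrete, here is a minimal sketch of one way a drawer image could be split into per-specimen crop boxes: label the connected foreground regions of a binary mask and take each region's bounding box. This is an illustrative technique only and is not presented as the method SatScan actually uses; the function name and the toy mask are invented.

```python
from collections import deque


def specimen_boxes(mask):
    """Return (top, left, bottom, right) boxes of 4-connected foreground regions."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy "drawer": 1 marks specimen pixels, 0 marks background.
drawer = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0],
]
boxes = specimen_boxes(drawer)
```

Each box could then be used to crop the drawer image for the retrospective annotation step; touching or overlapping specimens would need something more robust than this.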
![Page 17: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/17.jpg)
![Page 18: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/18.jpg)
• Dedicated specimen-level rapid annotation software
SatScan Initiative
![Page 19: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/19.jpg)
Crowdsourcing & Transcription
• We have a massive transcription problem
• Experiments via Notes-from-Nature (a Zooniverse project)
• Transcribing the NHM ornithological accession registers
• Wikimedian in Residence (Wikisource transcription)
• 4-month project, including specimen label transcription
![Page 20: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/20.jpg)
data.nhm.ac.uk
• A focus for deposition and discovery of major NHM data sets
• Promote innovation through re-use of museum data
• Open Access, at a dedicated subdomain of the NHM website
• Started Jan. 2013 (3 years), consultation throughout 2012
NHM Data Portal
Functional components of the data portal
3. Aggregation & Delivery
![Page 21: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/21.jpg)
[Screenshot labels: Search; Datasets matching criteria; Individual dataset; Results; Browse & search criteria; Advanced display options]
• Dataset registry, for dataset discovery, modelled on data.gov.uk
• Uses CKAN, an open-source data portal software platform
NHM Data Portal: Registry
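Because the registry is built on CKAN, dataset discovery can in principle be scripted through CKAN's Action API. The sketch below builds a `package_search` URL and extracts dataset titles from a response body. It assumes the portal exposes the standard `/api/3/action/` path under `https://data.nhm.ac.uk`, which the slides do not confirm, and the sample response is a made-up stand-in rather than real portal output.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API URL for a free-text dataset search."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Pull dataset titles (falling back to names) out of a package_search reply."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [d.get("title") or d["name"] for d in payload["result"]["results"]]


url = package_search_url("https://data.nhm.ac.uk/", "lepidoptera", rows=5)

# Made-up response in the shape CKAN returns; not real portal data.
sample = json.dumps({
    "success": True,
    "result": {"count": 1, "results": [{"name": "example-dataset", "title": "Example dataset"}]},
})
titles = dataset_titles(sample)
```

Fetching `url` with any HTTP client and passing the body to `dataset_titles` would complete the round trip; the network call is left out so the sketch runs offline.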
![Page 22: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/22.jpg)
[Screenshot labels: Metadata about the dataset; Name; Geographic scope; Tags; “Social”; Authors; License; Download; Developer tools; Technical info. (extracted from data file)]
• Dataset metadata discovery
NHM Data Portal: Registry
![Page 23: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/23.jpg)
• Simple dataset upload workflow for non-collections data
1. Name the dataset
2. Upload / link the data file
3. Describe the data file
4. Theme & tag
5. Add additional resources
6. Temporal coverage
7. Geographic coverage
8. Save & finish
NHM Data Portal: Dataset upload
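The eight upload steps map fairly naturally onto the dataset dictionary that CKAN's `package_create` action accepts. The sketch below assembles such a dictionary without sending it anywhere. The keys `name`, `title`, `tags`, `resources` and `extras` follow CKAN's standard dataset structure; the extras keys (`theme`, `temporal_coverage`, `geographic_coverage`), the function names and the example values are hypothetical, since the slides do not describe how the portal stores these fields.

```python
import json
import re


def slugify(title):
    """Lower-case the title and join alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_dataset(title, file_url, file_description, tags, theme=None,
                  extra_resources=(), temporal=None, geographic=None):
    """Assemble a CKAN-style dataset dict from the upload workflow's inputs."""
    resources = [{"url": file_url, "description": file_description}]  # steps 2-3
    resources.extend(extra_resources)                                 # step 5
    extras = []
    for key, value in (("theme", theme),                  # step 4 (hypothetical key)
                       ("temporal_coverage", temporal),     # step 6 (hypothetical key)
                       ("geographic_coverage", geographic)):  # step 7 (hypothetical key)
        if value is not None:
            extras.append({"key": key, "value": value})
    return {
        "name": slugify(title),                 # step 1: URL-safe identifier
        "title": title,
        "tags": [{"name": t} for t in tags],    # step 4
        "resources": resources,
        "extras": extras,
    }


dataset = build_dataset(
    "Example Lepidoptera checklist",            # made-up example values throughout
    "https://example.org/checklist.csv",
    "Illustrative checklist file",
    ["lepidoptera"],
    temporal="1900/2000",
    geographic="British Isles",
)
body = json.dumps(dataset)  # step 8 would POST this to package_create with an API key
```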
![Page 24: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/24.jpg)
[Screenshot labels: Zoomable map; Applied filters; Toggle map, table & stats views; Search, download & display options; No. records; No. georef. records]
• Dedicated interface to visualise & explore major datasets
• Focused on collections data, based on Canadensys.net, uses CartoDB
NHM Data Portal: Data visualisation
![Page 25: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/25.jpg)
[Screenshot labels: Collections views; Statistical summary; Specimen record views; Data field mappings; Summary preview; Full record; Tables; Download]
NHM Data Portal: Data visualisation
![Page 26: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/26.jpg)
• Using DataCite DOIs in the data portal
  o Datasets (2014) & specimens (2015)
• Unique, persistent and resolvable identifiers
• Easy to cite, alias existing specimen identifiers
• Conform to minimum DataCite requirements
  o Landing page, min. metadata standard, fee, min. 10 yr. contract, DOI (pre)fixes
NHM Data Portal & DataCite
Breaks us out of the biodiversity data silo
4. Identifiers, links & interoperability
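A DOI is a prefix and a suffix joined by a slash, and it resolves through the doi.org proxy. The sketch below shows that structure and how a DOI could alias an existing specimen identifier. The prefix `10.5072`, the suffix scheme and the catalogue number are placeholders chosen for illustration; they are not NHM's registered prefix or real specimen identifiers.

```python
import re
from urllib.parse import quote

# Loose syntactic check: "10.", a 4-9 digit registrant code, "/", a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def make_doi(prefix, suffix):
    """Join a prefix and suffix into a DOI, rejecting obviously malformed input."""
    doi = f"{prefix}/{suffix}"
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a plausible DOI: {doi!r}")
    return doi


def resolver_url(doi):
    """URL at which the DOI proxy redirects to the landing page."""
    return "https://doi.org/" + quote(doi, safe="/")


# Hypothetical alias table: existing catalogue number -> DOI.
aliases = {"EXAMPLE-CAT-0001": make_doi("10.5072", "example-specimen-0001")}
landing = resolver_url(aliases["EXAMPLE-CAT-0001"])
```

Keeping the alias table separate from the DOI itself reflects the slide's point that DOIs sit alongside existing specimen identifiers rather than replacing them.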
![Page 27: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/27.jpg)
• Content within the NHM data portal will be highly accessible
  o Collections harvestable (e.g. by GBIF as a DwCA)
  o Download DwCAs on any search facet
  o Wide set of APIs available for datasets (part of CKAN)
• Sub-portals (selected content, themed by topic)
  o e.g. Virtual Herbarium, NHM Science initiatives, geographic regions
• Analytical interface planned for 2015 (but not specified)
Data Aggregation, APIs & download
![Page 28: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/28.jpg)
• Data portal will be “open-by-default”
• Ambiguity in what this means & top down schizophrenia
• Conflicting mandates on open access & revenue opportunities
• Lots of guidance available, will use to form a common policy
• A cross institutional policy would be useful (but challenging)
Data Policies & Next Steps
5. Timeline & constraints
![Page 29: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/29.jpg)
NHM Data portal timeline
[Timeline figure, Jan 2013 to Jan 2016. Milestones: project start; private alpha; stable public beta; full release & sub-portals. Phases: requirements & dataset discovery; internal feedback, data visualisation & DOIs; sub-portals & analytical tools]
Next 6 months
• More documentation (PID and Tech Spec)
• Consultation and advocacy (internal and external)
• Data mapping from KE EMu and software testing
• Development
  o Website wireframe design
  o Drafting data visualisation subcontract
  o Construction of private alpha release
![Page 30: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/30.jpg)
NHM digitisation timeline
[Timeline figure, Jan 2013 to 2018. Labels: project start; path-finding & programme development; private alpha; stable public beta; major funding applications & a new gallery?; Digitise… Digitise… Digitise…; 20 million!!]
Next 6 months
• Initial conclusions from path-finding digitisation activities
• Initial grant funding bids developed
• Advocacy, outreach & development of a digitisation “programme”
• Investigate possibilities for gallery development
• Develop crowdsourcing strategy
![Page 31: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/31.jpg)
QUESTIONS
![Page 32: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/32.jpg)
Digitisation Priorities
• Priorities linked to science strategic priorities
  o Disease, sustainability, crop wild relatives, pests etc.
[Bar chart: Crop Wild Relatives (accepted taxa only) by plant family, vertical axis 0 to 700. Families shown: Poaceae, Brassicaceae, Solanaceae, Rubiaceae, Anacardiaceae, Arecaceae, Malvaceae, Cucurbitaceae, Grossulariaceae, Aquifoliaceae, Juglandaceae, Apiaceae, Asparagaceae, Pedaliaceae, Lauraceae, Convolvulaceae, Oleaceae, Bromeliaceae, Lecythidaceae]
![Page 33: Digitised collections: Toward a digital strategy for for the NHM, London](https://reader036.fdocuments.in/reader036/viewer/2022070315/554e838bb4c90545698b5475/html5/thumbnails/33.jpg)
• Priorities linked to science strategic priorities
  o Disease, sustainability, crop wild relatives, pests etc.
• Tiered approach, different needs for different collections
Nick Poole, UK Collections Trust
Digitisation Priorities