Digitizing the legacy literature of biodiversity: An introduction to the Biodiversity Heritage...
TDWG 2006 Conference, St Louis
Digitizing the legacy literature of biodiversity
An introduction to the Biodiversity Heritage Library (BHL)
Neil Thomson, Natural History Museum, London
BHL origins and objectives
Encyclopedia of Life meeting at Telluride, 2003
Cost and storage possibilities
Natural history literature is an ideal digitization candidate
Aim: Available at point of use
Scope and IPR
Public domain (pre-1923 in USA)
Legacy literature as complement to current material
Negotiation with societies and Not-For-Profits
Creative Commons licensing: some rights reserved
Partners
10 Library partners
American Museum of Natural History
Field Museum
Harvard University Botany Library
Missouri Botanical Garden
Museum of Comparative Zoology, Ernst Mayr Library
National Museum of Natural History, Smithsonian Institution
Natural History Museum, London
New York Botanical Garden
Royal Botanic Gardens, Kew
Woods Hole Oceanographic Institution
Associates
OCLC http://www.oclc.org/
Internet Archive http://www.archive.org/index.php
Others in negotiation
Structure & funding
BHL is a founder member of the Open Content Alliance www.opencontentalliance.org/
Charitable status
English-language project
Register of intent
Funding
Digitization phases
Bibliographic record pooling
Internet Archive Pod of 10 cameras
Boutique scanning of rare, fragile or oversize material
Metadata enhancement
Service building
Digitization process
Pooled bibliographic records used for selection, matching and status
Page images and OCR
Addition of identifiers
Quality check
Return or offsite storage
Metadata repository
Bibliographic record pool
Monographs
Serial titles
Article-level metadata
OCLC analysis
Statistics - 1
Initial analysis showed:
We have 1.3 million catalogue records
73% are monographs (remainder are serials at title-level)
63% is English-language material. The next most popular language (9%) is German.
About 30% of material was published before 1923.
Statistics - 2
Overlap analysis
Of the 981,000 monograph records from all institutions:
378,000 matching pairs were found
616,000 had no matches at all and were unique to one institution.
After de-duplication of the matching pairs, the final file contains 757,000 records.
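The overlap analysis can be illustrated with a minimal Python sketch. It is not the method OCLC used, which the slides do not describe. It assumes records are matched on a normalised title, author and year key, and that a "matching pair" is any two records sharing that key. The field names, the match key and the sample records are all hypothetical. Under this way of counting, a title held by three institutions yields three pairs, which is one way a pair count (378,000) could exceed the number of records with at least one match (981,000 - 616,000 = 365,000).

```python
from collections import defaultdict


def normalise(text):
    """Lower-case the text and replace punctuation with spaces for matching."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def match_key(record):
    # Hypothetical match key: the slides do not say which fields were compared.
    return (normalise(record["title"]), normalise(record["author"]), record["year"])


def overlap_analysis(records):
    """Group records by match key and report overlap statistics."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    # Records whose key occurs only once are unique to one institution.
    unique = sum(1 for group in groups.values() if len(group) == 1)
    # Assumption: every pair of records sharing a key counts as one matching pair.
    matching_pairs = sum(len(g) * (len(g) - 1) // 2 for g in groups.values())
    # De-duplication keeps one record per key.
    deduplicated = [group[0] for group in groups.values()]

    return {
        "total": len(records),
        "unique": unique,
        "matching_pairs": matching_pairs,
        "deduplicated": len(deduplicated),
    }


# Invented sample records, for illustration only.
SAMPLE_RECORDS = [
    {"title": "A Flora of Example Island", "author": "Smith, J.", "year": 1890, "institution": "Library A"},
    {"title": "A flora of Example Island.", "author": "Smith, J.", "year": 1890, "institution": "Library B"},
    {"title": "A Flora of Example Island", "author": "Smith, J", "year": 1890, "institution": "Library C"},
    {"title": "Notes on Sample Beetles", "author": "Jones, A.", "year": 1902, "institution": "Library A"},
]

result = overlap_analysis(SAMPLE_RECORDS)
print(result)
```

In the sample, three copies of one title collapse to a single record, so four input records become two after de-duplication.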
Metadata development
Data standards
METS
DOIs
LSIDs
Indexes and taxonomic intelligence
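As an illustration of one of the identifier schemes listed above, the sketch below parses a Life Science Identifier (LSID), which has the general form urn:lsid:authority:namespace:object, with an optional revision. It is only a sketch: the example identifier, its authority and its namespace are hypothetical, and the slides do not say how BHL would assign LSIDs.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The "urn" and "lsid" prefixes are compared case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    # The revision is optional; treat a missing or empty one as None.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


# Hypothetical identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:pages:12345")
print(example)
```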
The future
What do scientists want from a digital library?
What will the BHL look like?
http://bhl.si.edu/index.cfm