A Global Library of Life: The Biodiversity Heritage Library
Transcript of A Global Library of Life: The Biodiversity Heritage Library
MARTIN R. KALFATOVIC :: SMITHSONIAN INSTITUTION LIBRARIES :: NFAIS 2008 ANNUAL CONFERENCE :: 25 FEBRUARY 2008
A Global Library for Life
Martin R. Kalfatovic
Smithsonian Institution Libraries
25 February 2008
The cultivation of natural science cannot be efficiently carried on without reference to an extensive library
Charles Darwin et al. (1847)
Darwin, C. R. et al. 1847. Copy of Memorial to the First Lord of the Treasury [Lord John Russell], respecting the Management of the British Museum. Parliamentary Papers, Accounts and Papers 1847, paper number (268), volume XXXIV.253 (13 April): 1-3. [Complete Works of Charles Darwin Online]
Taxonomic descriptions must be published for the name to be valid
Publications must be available to the public through trusted sources
Libraries have been the traditional place
Taxonomic Literature
The cited half-life of publications in taxonomy is longer than in any other scientific discipline
* * * The decay rate is longer than in any scientific discipline
~ Macro-economic case for open access, Tom Moritz
Taxonomic Literature
Over 250 years of systematic description of life
Systema naturae (10th ed. 1758) by Carl von Linné
Taxonomic Literature
2003. Telluride. Encyclopedia of Life meeting
February 2005. London. Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature
May 2005. Washington. Ground work for the Biodiversity Heritage Library
June 2006. Washington. Organizational and Technical meeting
August 2006. New York Botanical Garden. BHL Director’s Meeting.
October 2006. St. Louis/San Francisco. Technical meetings
February 2007. Museum of Comparative Zoology. Organizational meeting
May 2007. Encyclopedia of Life and BHL Portal Launch. Washington DC.
BHL Timeline
American Museum of Natural History (New York)
Field Museum (Chicago)
Natural History Museum (London)
Smithsonian Institution Libraries (Washington)
Missouri Botanical Garden (St. Louis)
New York Botanical Garden (New York)
BHL Members
Royal Botanic Garden, Kew
Botany Libraries, Harvard University
Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
Marine Biological Laboratory / Woods Hole Oceanographic Institution
BHL Members
BHL Members
University of Illinois, Urbana-Champaign (contributing member)
Scheme for addition of European and Asian partners underway
Additional categories of membership under consideration
BHL Focus: Literature
• Core literature pre-1923: 100 million pages (?)
• All pre-1923: 120-150 million pages
• All literature: 280-320 million pages
BHL Focus: Literature
• 1.3 million catalogue records
• 73% are monographs (remainder are serials at title-level)
• 63% is English language material
• The next most popular language (9%) is German
• About 30% of material was published before 1923
BHL Collections
The Internet Archive
• 501(c)(3) organization
• Dedicated to “Universal Access to Human Knowledge”
• Founder of the Open Content Alliance
• Provides:
– Mass scanning
– Archival storage of files
– Image processing
– Technology development
Scribe Scanner
• Single Scribe Machine
– Custom built by the Internet Archive
– Human operated
– 3,500 pages per shift per day
BHL Scanning Centers
Northeast Regional Scanning Center: 10 Scribe machines (MBL/WHOI, Harvard)
New York Public Library: 10 Scribe machines (AMNH, NYBG)
BHL Scanning Centers
University of Illinois: 2 Scribe machines
Natural History Museum, London: 1 Scribe machine
Missouri Botanical Garden: Non-Scribe operation
BHL Scanning Centers
Washington, DC:
• 1 Scribe machine at Smithsonian Libraries
• 10-Scribe facility at Library of Congress with Fedlink (operational Spring 2008)
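As a rough back-of-envelope check, the slide figures can be combined to estimate scanning capacity. The sketch below is illustrative only and rests on assumptions that are not in the talk: every Scribe machine runs exactly one shift per day with no downtime, only the machines listed without a future start date are counted (the Library of Congress facility and the non-Scribe operation at Missouri Botanical Garden are left out), and the 100 million page target is the talk's own uncertain estimate for core pre-1923 literature. The function and variable names are hypothetical.

```python
# Back-of-envelope scanning capacity, using figures quoted on the slides.
SCRIBE_PAGES_PER_SHIFT = 3_500  # "3,500 pages per shift per day"

# Scribe machines listed on the scanning-center slides without a future
# start date (assumption: all of them are running).
SCRIBE_MACHINES = {
    "Northeast Regional Scanning Center": 10,
    "New York Public Library": 10,
    "University of Illinois": 2,
    "Natural History Museum, London": 1,
    "Smithsonian Libraries": 1,
}

# The talk's (uncertain) estimate for core pre-1923 literature.
CORE_PRE_1923_PAGES = 100_000_000


def pages_per_day(machines, shifts_per_day=1):
    """Total pages per day, assuming every machine runs every shift."""
    return sum(machines.values()) * SCRIBE_PAGES_PER_SHIFT * shifts_per_day


def days_to_scan(total_pages, daily_pages):
    """Whole days needed to scan total_pages (rounded up)."""
    return -(-total_pages // daily_pages)


daily = pages_per_day(SCRIBE_MACHINES)
print(daily)                                     # 84000
print(days_to_scan(CORE_PRE_1923_PAGES, daily))  # 1191
```

Under these assumptions the listed machines handle about 84,000 pages a day, so the core pre-1923 estimate would take roughly 1,191 scanning days at a single shift. Additional shifts or the planned Library of Congress facility would shorten that.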
Scanning Stats: Now 5.5 million plus total pages scanned (and growing daily)
• <90,000 Fieldiana (via UIUC)
• >100,000 pages each from Harvard and New York Botanical Garden
• 225,000+ pages from the American Museum of Natural History
• 400,000+ from Smithsonian Libraries
• 500,000+ from the Natural History Museum, London
• 800,000 from Missouri Botanical Garden Library
• 1,000,000+ from the MBL/WHOI library
But what about ...
BHL \ Google (the difference between)
Bibliographic accuracy for all materials
Ability to re-purpose and reuse all data as needed
Congruence of original printed materials to digital versions
Persistent Identifiers
• Stable URL
• Handle
• DOI
• BICI/SICI
• ISSN
• ISBN
• LSIDs
http://www.biodiversitylibrary.org
Structural Markup
<article>
  <title>A BRIEF CONSIDERATION OF CERTAIN POINTS IN THE MORPHOLOGY OF THE FAMILY CHALCIDID^E.*.</title>
  <author>L. O. HOWARD.</author>
  <volume>1</volume>
  <issue>2</issue>
  <start_page>65</start_page>
  <end_page>86</end_page>
  <start_count_page>85</start_count_page>
  <end_count_page>106</end_count_page>
  <start_page_image_file>3908800908001101smthrich_0085.djvu</start_page_image_file>
  <end_page_image_file>3908800908001101smthrich_0106.djvu</end_page_image_file>
</article>
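To make the record above concrete, here is a minimal sketch of how a consumer might read it, using Python's standard `xml.etree.ElementTree`. It is not BHL's own tooling. The helper names (`parse_article`, `page_offset`, `image_files`) are hypothetical, and the assumption that image files are numbered consecutively as `<prefix>_NNNN.djvu` is inferred only from the two file names in the record.

```python
import xml.etree.ElementTree as ET

ARTICLE_XML = """<article>
  <title>A BRIEF CONSIDERATION OF CERTAIN POINTS IN THE MORPHOLOGY OF THE FAMILY CHALCIDID^E.*.</title>
  <author>L. O. HOWARD.</author>
  <volume>1</volume>
  <issue>2</issue>
  <start_page>65</start_page>
  <end_page>86</end_page>
  <start_count_page>85</start_count_page>
  <end_count_page>106</end_count_page>
  <start_page_image_file>3908800908001101smthrich_0085.djvu</start_page_image_file>
  <end_page_image_file>3908800908001101smthrich_0106.djvu</end_page_image_file>
</article>"""

INT_FIELDS = ("start_page", "end_page", "start_count_page", "end_count_page")


def parse_article(xml_text):
    """Return the article record as a dict, with page fields as integers."""
    root = ET.fromstring(xml_text)
    record = {child.tag: (child.text or "").strip() for child in root}
    for field in INT_FIELDS:
        record[field] = int(record[field])
    return record


def page_offset(record):
    """Difference between scan sequence number and printed page number."""
    return record["start_count_page"] - record["start_page"]


def image_files(record):
    """List page image files, ASSUMING consecutive <prefix>_NNNN.djvu names."""
    prefix = record["start_page_image_file"].rsplit("_", 1)[0]
    first, last = record["start_count_page"], record["end_count_page"]
    return [f"{prefix}_{n:04d}.djvu" for n in range(first, last + 1)]


record = parse_article(ARTICLE_XML)
print(record["author"], page_offset(record), len(image_files(record)))
```

For this record the printed pages 65 to 86 map to scan positions 85 to 106, an offset of 20, which is the kind of mapping the portal needs in order to deliver the right page image for a cited page.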
Semantic Markup
GoldenGATE
The intention of the GoldenGATE editor is to build a bridge between NLP components and XML markup of natural language text according to arbitrary XML schemas. It allows NLP components to be deployed to mark up the bodies of literature they were designed for. In this way, it enables transforming the texts into XML content according to an XML schema that was designed to gain maximum benefit from the knowledge provided in them.
Integrated Open Taxonomic Access (INOTAXA)
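The general idea of bridging NLP output and XML markup can be illustrated with a toy sketch. This is not GoldenGATE's implementation or API. It only shows how character spans produced by some annotator could be wrapped in XML elements. The function name `mark_up` and the tag name `taxonName` are hypothetical choices for illustration.

```python
from xml.sax.saxutils import escape


def mark_up(text, spans, tag):
    """Wrap each (start, end) character span of text in <tag>...</tag>.

    Text outside and inside the spans is XML-escaped. Overlapping spans
    are rejected to keep the sketch simple.
    """
    pieces = []
    pos = 0
    for start, end in sorted(spans):
        if start < pos:
            raise ValueError("overlapping spans are not supported in this sketch")
        pieces.append(escape(text[pos:start]))
        pieces.append(f"<{tag}>{escape(text[start:end])}</{tag}>")
        pos = end
    pieces.append(escape(text[pos:]))
    return "".join(pieces)


sentence = "Females of Apis mellifera & allies"
start = sentence.index("Apis mellifera")
print(mark_up(sentence, [(start, start + len("Apis mellifera"))], "taxonName"))
```

A real schema-driven editor would also validate the result against the target XML schema, which this sketch does not attempt.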
10.7 million name strings in NameBank
Uses a sophisticated algorithm (TaxonGrab) to locate likely name strings in OCR text
Iterative processing of BHL texts will both increase the number of name strings in NameBank and increase the accuracy of name string recognition
Taxonomic Intelligence
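To give a feel for what locating name strings in OCR text involves, here is a deliberately naive stand-in. It is not TaxonGrab and does not reflect how NameBank is queried. It assumes that a candidate name is a capitalized word followed by a lowercase word, and that a small in-memory set of genus names (the hypothetical `known_genera`) stands in for a name repository.

```python
import re

# Capitalized genus-like word followed by a lowercase epithet-like word.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")


def candidate_names(ocr_text, known_genera):
    """Return 'Genus epithet' strings whose genus is in known_genera."""
    found = []
    for match in BINOMIAL.finditer(ocr_text):
        genus, epithet = match.groups()
        if genus in known_genera:  # crude filter against ordinary prose
            found.append(f"{genus} {epithet}")
    return found


page_text = (
    "The larva of Apis mellifera was described. "
    "In this paper Bombus terrestris is compared."
)
print(candidate_names(page_text, {"Apis", "Bombus"}))
```

The slide's point about iterative processing fits this picture: each newly confirmed name would enlarge the lookup set, so later passes over the same text could recognize names an earlier pass missed.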
BHL & Publishers
Permissions
• Seek permissions from copyright holders
• Opt-in copyright model: The BHL will actively work with professional societies and associations to integrate their publications into the BHL in a way that serves the societies’ missions and goals
• BHL will digitize learned society backfiles and mount them through the BHL Portal at no cost.
• Will provide a set of files to the publishers for reuse as they see fit
BHL Advantages
• Use of the articles will increase, as evidenced by citation upsurge
• Long-term management of the digital assets is provided by the BHL at no cost
• Publishers’ content is embedded in the emerging knowledge ecology that is sweeping biology in this century
• Structural markup of backfiles into conformance with NLM DTD (just starting)
Successes
• Entomological News
• Journal of Hymenoptera Research
• Herpetological Review
• Publications of the San Diego Natural History Museum
• California Academy of Sciences publications
• And more ...
BHL Portal
• Library catalog-like interface to BHL literature
• Enhanced structural analysis to provide volume/issue/article page access to the literature
• Iterative development based on feedback from user community
• Provide access to two key audiences:
– Humans
– Machines
Page Delivery
Taxonomic Intelligence
Search Browse
Web 2.0 Features
Discovered Bibliographies
Initial grant from the MacArthur and Sloan Foundations (as part of the Encyclopedia of Life grant)
Additional support from parent institutions
Additional grants being actively pursued by BHL and individual members
Funding
Structure of the Encyclopedia of Life
Serine Molecule
• Synthesis Center: Field Museum
• Biodiversity Heritage Library
• Secretariat: Smithsonian
• Education & Outreach: Smithsonian/Harvard
• Informatics: Marine Biological Laboratory & MOBOT
EOL Species Pages
Built from a variety of new and existing sources
Views available for varying levels of expertise from novice to expert
Legacy literature a key component of the EOL species pages
• Co-evolving bioinformatics resources produce a rich information ecology:
– Consortium for the Barcoding of Life (CBOL) with gene sequences deposited in GenBank.
– GBIF’s Electronic Catalog of Taxonomic Names
– Herbaria and museum specimen databases
Looking Forward
• Quick ramp-up with high early costs (development, mass scanning, etc.)
• Derive some long-term costs from the operating budgets of the member institutions (Examples under consideration: acquisitions budget, staff positions, etc.)
• Integrate functions/tasks with wider efforts where appropriate, e.g. mass storage
Looking Forward
Institutions that are creating the BHL exist to persist through time. That’s an important part of their business
The future is uncertain, the technology landscape changes, people pass on. So create consortial structures that are low-overhead, flexible, and can respond quickly
The Long Now Strategy
In any well-appointed Natural History Library there should be found every book and every edition of every book dealing in the remotest way with the subjects concerned.
Charles Davies Sherborn, Epilogue to Index Animalium, March 1922
A Global Library for Life
Thank You ... for sticking around!
Biodiversity Heritage Library: http://www.biodiversitylibrary.org/
Biodiversity Heritage Library Blog: http://biodiversitylibrary.blogspot.com
Encyclopedia of Life: http://www.eol.org/
Smithsonian Institution Libraries: http://www.sil.si.edu/
Universal Biological Indexer and Organizer: http://www.ubio.org/
Biologia Centrali-Americana: http://www.sil.si.edu/digitalcollections/bca/
LINKS
Thanks to:
• Chris Freeland, Missouri Botanical Garden
• Tom Garnett, The Biodiversity Heritage Library
• The staff at the Internet Archive
Images from:
• The Galaxy of Images, Smithsonian Libraries (www.sil.si.edu/imagegalaxy)
• Martin R. Kalfatovic
• Suzanne C. Pilsk
• Bernard Scaife
CREDITS