Technology Bootcamp January 18, 2014 Large-Scale Digital Libraries Digitization Process
Technology Bootcamp
January 18, 2014
Large-Scale Digital Libraries
Digitization Process
Krystyna K. Matusiak, Ph.D.
Assistant Professor
Library & Information Science Program
Overview of Digitization 2
Overview
• Large-scale digital libraries (DLs)
  ◦ The National Science Digital Library (NSDL)
  ◦ HathiTrust
  ◦ Europeana
  ◦ The Digital Public Library of America (DPLA)
• Digitization as a conversion process
• Fundamental questions
  ◦ What?
  ◦ Why?
  ◦ How?
• Digitization as a multi-step process
• Digitization standards and guidelines
  ◦ The notion of archival master files and derivatives
  ◦ Image capture: technical factors
• Digitization technology
LARGE-SCALE DIGITAL LIBRARIES
Large-Scale Digital Libraries
• Massive aggregations of scientific and cultural heritage content with millions of digital objects
  ◦ Offer a new centralized approach to providing access to scientific and cultural materials
  ◦ Aggregate content (or metadata) from smaller individual DLs and provide portals for global searching and retrieval
  ◦ Address the limitations of the resource discovery in the DL environment
  ◦ Build upon over two decades of extensive digitization efforts
• Types of content
  ◦ Born-digital
  ◦ Digitized
Large-Scale Digital Libraries
• Sources of content
  ◦ Local digitization: individual DLs created by academic and public libraries, archives, historical societies, and other cultural heritage and research organizations
  ◦ Mass digitization: Google Book Project; Open Content Alliance
• Information ecosystem – multilayered trusted networks
• Models
  ◦ Distributed (DPLA, Europeana, NSDL)
  ◦ Centralized (HathiTrust)
• Coverage
• Goals
  ◦ Expanding access
  ◦ Supporting digital preservation
DIGITIZATION PROCESS
How have we created this critical mass of digitized content?
Digitization is a process of conversion of analog information into a digital format through scanning or digital photography. It is a multi-step process that involves selection, image capture, creation of descriptive and technical metadata, and digital preservation of the objects created as a result of the conversion process.
Basic Digitization Workflow
Digitization is More than Scanning
• Selection
• Image capture
• Digital processing
• Indexing and metadata
• Ingesting
• Preservation and maintenance
What?
Manuscripts * Books * Journals * Maps
What?
Archival Materials
What?
Cultural Heritage Materials on Tape and Film
Why?
• Expand access – 24/7
• Provide access to unique primary sources held in local archives
• Extend search capabilities of digital text
• Improve resource discovery
• Provide access to high-resolution images
• Integrate resources in multiple modes of representation
• Bring together dispersed collections
• Assist preservation and conservation efforts
How?
General Guidelines
• Digitize at the highest resolution appropriate to the nature of the source material
  ◦ Avoid rescanning and handling of the originals in the future
• Create digital objects that are accessible and interoperable across platforms and devices
  ◦ High-quality
  ◦ Consistent
  ◦ Authentic
• Produce digital objects that support the intended current and future use
  ◦ Build a repository of digital master files to facilitate reprocessing and maintaining digital collections over time
  ◦ Provide derivative access files for current use
• Create backup copies of all files on servers and have an off-site backup strategy
Digital Master Files
• Created as a direct result of the image capture process, either through scanning or photographing with a digital camera
• Should represent the visual information of the original material
• Serve as a long-term archival file and a source for derivative images
  ◦ Digital masters are not used for online delivery or print output
• General recommendations for digital master file creation include:
  ◦ Scanning at the highest quality affordable
  ◦ No compression or lossless compression
  ◦ Non-proprietary archival formats
    TIFF – text or still images
    WAV – audio
    AVI or Motion JPEG 2000 or MXF – moving images *
* Unlike text, still image, or audio, there is no archival file format that has been definitively established for moving images
Digital Masters
• Examples
  ◦ Photographic print 5 x 7 in. scanned in RGB mode at 600 ppi → 35 MB TIFF file, e.g. kw000010.tif
  ◦ Large map 63 x 56 cm (24.8 x 22 in.) scanned in RGB mode at 300 ppi → 185 MB TIFF file, e.g. am001385.tif
  ◦ Monograph page 23 cm (approx. 9 in.) scanned in RGB mode at 400 ppi → 25 MB TIFF file, e.g. 001_Front cover.tif
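The file sizes in the examples follow roughly from the scan settings. The sketch below estimates the uncompressed size of a master file from the physical dimensions, the resolution, and the capture mode. The function name and the mode table are assumptions made for this illustration; a real TIFF also carries header and metadata overhead, and the scanned area is often larger than the nominal size of the original, so actual files will differ somewhat.

```python
# Illustrative sketch: estimate the uncompressed size of a scanned master file.
# The names below are hypothetical and chosen only for this example.

BITS_PER_PIXEL = {"bitonal": 1, "grayscale": 8, "rgb": 24}

def uncompressed_size_bytes(width_in, height_in, ppi, mode="rgb"):
    """Pixels across times pixels down times bits per pixel, converted to bytes."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * BITS_PER_PIXEL[mode]
    return (total_bits + 7) // 8  # round up to whole bytes

# Photographic print 5 x 7 in. scanned in RGB mode at 600 ppi
size = uncompressed_size_bytes(5, 7, 600, "rgb")
print(f"{size:,} bytes = {size / 2**20:.1f} MiB")
# prints: 37,800,000 bytes = 36.0 MiB
```

The result is close to the 35 MB quoted for the photographic print, which suggests the slide's figures are approximate uncompressed sizes.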
Derivative Files
• Created from digital master files for specific use including
  ◦ Access images for digital collections or other types of Web delivery
  ◦ User requests
  ◦ High resolution prints
• General recommendations for derivative files:
  ◦ Reduce the resolution depending on the intended use
    72 dpi or 96 dpi for Web access
    300 dpi for print output or for high-resolution viewers
  ◦ Compress files to reduce their size
  ◦ Select appropriate access formats
    PDF – text
    JPEG or JPEG 2000 – still images
    MP3 – audio
    MPEG-4 (MP4) or QuickTime or Real Video – moving images
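Reducing the resolution of a derivative amounts to scaling the master's pixel dimensions by the ratio of target to master resolution. The sketch below shows only that arithmetic; the function name and the sample master (a 5 x 7 in. print scanned at 600 ppi, i.e. 3000 x 4200 pixels) are assumptions for illustration, and the actual resampling would be done in imaging software.

```python
# Illustrative sketch: pixel dimensions of a derivative made from a master scan.
# The function name is hypothetical; only the scaling arithmetic is shown.

def derivative_dimensions(master_width_px, master_height_px, master_ppi, target_ppi):
    """Scale the master's pixel dimensions down to the target resolution."""
    if target_ppi > master_ppi:
        raise ValueError("a derivative should not exceed the master's resolution")
    width = round(master_width_px * target_ppi / master_ppi)
    height = round(master_height_px * target_ppi / master_ppi)
    return width, height

master = (3000, 4200, 600)  # width px, height px, ppi (assumed sample master)
for use, ppi in [("Web access", 72), ("Web access", 96), ("print output", 300)]:
    w, h = derivative_dimensions(*master, ppi)
    print(f"{ppi} ppi for {use}: {w} x {h} pixels")
# prints:
# 72 ppi for Web access: 360 x 504 pixels
# 96 ppi for Web access: 480 x 672 pixels
# 300 ppi for print output: 1500 x 2100 pixels
```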
Image Capture
Technical Factors
• Mode of capture
  ◦ Bitonal – one bit per pixel representing black and white
  ◦ Grayscale – multiple bits per pixel representing shades of gray
  ◦ RGB (red-green-blue) – multiple bits per pixel representing color
• File formats
  ◦ TIFF
  ◦ JPEG
  ◦ JPEG 2000
  ◦ RAW and DNG
• No compression
• Compression
  ◦ Lossless
  ◦ Lossy
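The difference between lossless and lossy compression can be shown with a toy example. The sketch below uses Python's zlib module as a stand-in lossless codec and a crude bit-dropping step as a stand-in for lossy compression. Both choices are assumptions for illustration: real image formats such as JPEG use different and far more sophisticated techniques, and the sample pixel values are made up.

```python
import zlib

# A tiny synthetic run of 8-bit grayscale samples (values are invented).
pixels = bytes([12, 12, 13, 200, 201, 201, 199, 12] * 32)

# Lossless: the decompressed data is byte-for-byte identical to the original.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)

# Lossy (toy stand-in): keep only the top 4 bits of each sample.
# The discarded detail cannot be recovered afterwards.
quantized = bytes(p & 0xF0 for p in pixels)

print("smaller after lossless compression:", len(packed) < len(pixels))
print("lossless round trip identical:", restored == pixels)
print("lossy version identical:", quantized == pixels)
# prints:
# smaller after lossless compression: True
# lossless round trip identical: True
# lossy version identical: False
```

This is why master files are kept uncompressed or losslessly compressed, while lossy compression is reserved for derivatives.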
Image Capture
Technical Factors
• Resolution (ppi – pixels per inch; dpi – dots per inch)
  ◦ An image 1500 x 2100 pixels displayed at 100 ppi = ? in.
  ◦ The same image 1500 x 2100 pixels displayed at 300 ppi = ? in.
• Bit depth
  ◦ The number of bits used to represent each pixel determines how many colors can appear in a digital image
Source: BCR’s CDP Digital Imaging Best Practices.
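Both questions on this slide reduce to simple arithmetic: displayed size is pixel dimension divided by resolution, and the number of representable colors or tones is 2 raised to the bit depth. A minimal sketch, with function names chosen only for this illustration:

```python
# Illustrative sketch of the resolution and bit-depth arithmetic.
# Function names are hypothetical.

def displayed_size_inches(width_px, height_px, ppi):
    """Physical size at which an image appears at a given resolution."""
    return width_px / ppi, height_px / ppi

def color_count(bit_depth):
    """Number of distinct values a pixel can take at a given bit depth."""
    return 2 ** bit_depth

for ppi in (100, 300):
    w, h = displayed_size_inches(1500, 2100, ppi)
    print(f"1500 x 2100 px at {ppi} ppi -> {w:g} x {h:g} in.")

for bits in (1, 8, 24):
    print(f"{bits}-bit -> {color_count(bits):,} possible values per pixel")
# prints:
# 1500 x 2100 px at 100 ppi -> 15 x 21 in.
# 1500 x 2100 px at 300 ppi -> 5 x 7 in.
# 1-bit -> 2 possible values per pixel
# 8-bit -> 256 possible values per pixel
# 24-bit -> 16,777,216 possible values per pixel
```

The same pixel data therefore covers a smaller area at a higher resolution, and 1-bit, 8-bit, and 24-bit depths correspond to the bitonal, grayscale, and RGB capture modes.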
Scanning Specifications
• Digital Masters – Photographs and Text

Original material       Scanning resolution   Bit depth                         Approximate scanned dimensions   Approx. size of preview image
Photographs
16” x 20” +             200 ppi               24-bit color                      6400 x 8000 pixels               146 MB
8 ½” x 11”–16” x 20”    300 ppi               24-bit color                      3200 x 4000 pixels               36 MB
8” x 10”                400 ppi               24-bit color                      3200 x 4000 pixels               36 MB
5” x 7”                 625 ppi               24-bit color                      3200 x 4000 pixels               36 MB
4” x 5”                 800 ppi               24-bit color                      3200 x 4000 pixels               36 MB
4” x 2 ½”               1200 ppi              24-bit color                      3200 x 4000 pixels               36 MB
Text
Print – no images       600 ppi               1-bit bitonal                     Varies                           Varies
Print – with images     300 ppi               8-bit grayscale or 24-bit color   Varies                           Varies
Manuscript              400 ppi               8-bit grayscale or 24-bit color   Varies                           Varies

Source: Wisconsin Heritage Online Digital Imaging Guidelines (2009). Version 2.0.
Scanners
• Source materials in a variety of formats require versatile scanning equipment
  ◦ Photographs (reflective and transparent materials)
    Photographic prints → flatbed scanners
    Film negatives and slides → film scanners, flatbed scanners with transparency adapters
  ◦ Text (reflective materials)
    Single leaf documents → flatbed scanners, sheet-fed scanners
    Bound materials → overhead scanners or digital cameras
  ◦ Oversize materials (reflective materials)
    Maps, charts, etc. → large format scanners or digital cameras
  ◦ Microfilm (transparent)
    Newspapers → microfilm scanners
Book scanner for Books, Oversized Prints, and Maps
DSLR for Oversized Prints, Maps, Scrolls, and 3D objects
Film and Slide scanner
Film and Slide Scanner with auto-feeder
Flatbed scanner for Prints, Glass, and Transparent objects
Video conversion
Audio conversion
Large format scanner for maps and oversized materials
Resources
General Digitization Guides, Standards, and Best Practices
Association for Library Collections & Technical Services (ALCTS). (2013). Minimum Digitization Capture Recommendations. http://www.ala.org/alcts/resources/preserv/minimum-digitization-capture-recommendations
BCR’s CDP Digital Imaging Best Practices (2008). [updated version of Western States Digital Imaging Best Practices] BCR CDP Digital Imaging Best Practices_2008.pdf
Besser, Howard. Introduction to Imaging, Revised Edition (2003). The J. Paul Getty Trust. This book is free as a downloadable PDF. http://www.getty.edu/research/conducting_research/standards/introimages/
A Framework of Guidance for Building Good Digital Collections. 3rd Edition (2007). NISO Framework Advisory Group. http://www.niso.org/publications/rp/framework3.pdf
Handbook for Digital Projects: A Management Tool for Preservation and Access. (2000). Northeast Document Conservation Center. http://www.nedcc.org/oldnedccsite/digital/dman.pdf
Moving Theory into Practice: Digital Imaging Tutorial. (2000). Cornell University Library/Research Department. http://www.library.cornell.edu/preservation/tutorial/contents.html
The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials. (2002). The National Initiative for a Networked Cultural Heritage (NINCH). http://www.nyu.edu/its/humanities/ninchguide/