Technology Bootcamp January 18, 2014 Large-Scale Digital Libraries Digitization Process

Krystyna K. Matusiak, Ph.D., Assistant Professor, Library & Information Science Program



Overview
• Large-scale digital libraries (DLs)
  - The National Science Digital Library (NSDL)
  - HathiTrust
  - Europeana
  - The Digital Public Library of America (DPLA)
• Digitization as a conversion process
• Fundamental questions
  - What?
  - Why?
  - How?
• Digitization as a multi-step process
• Digitization standards and guidelines
  - The notion of archival master files and derivatives
  - Image capture: technical factors
• Digitization technology


LARGE-SCALE DIGITAL LIBRARIES


Large-Scale Digital Libraries
• Massive aggregations of scientific and cultural heritage content with millions of digital objects
  - Offer a new centralized approach to providing access to scientific and cultural materials
  - Aggregate content (or metadata) from smaller individual DLs and provide portals for global searching and retrieval
  - Address the limitations of resource discovery in the DL environment
  - Build upon over two decades of extensive digitization efforts
• Types of content
  - Born-digital
  - Digitized


Large-Scale Digital Libraries
• Sources of content
  - Local digitization: individual DLs created by academic and public libraries, archives, historical societies, and other cultural heritage and research organizations
  - Mass digitization: Google Book Project; Open Content Alliance
• Information ecosystem – multilayered trusted networks
• Models
  - Distributed (DPLA, Europeana, NSDL)
  - Centralized (HathiTrust)
• Coverage
• Goals
  - Expanding access
  - Supporting digital preservation

The National Science Digital Library (NSDL): http://nsdl.org/

HathiTrust: http://www.hathitrust.org/

Europeana: http://www.europeana.eu/portal/

The Digital Public Library of America (DPLA): http://dp.la/


DIGITIZATION PROCESS

How have we created this critical mass of digitized content?


Digitization is a process of conversion of analog information into a digital format through scanning or digital photography. It is a multi-step process that involves selection, image capture, creation of descriptive and technical metadata, and digital preservation of the objects created as a result of the conversion process.


Basic Digitization Workflow: Digitization is More than Scanning
• Selection
• Image capture
• Digital processing
• Indexing and metadata
• Ingesting
• Preservation and maintenance

What?
Manuscripts * Books * Journals * Maps

What?
Archival Materials

What?
Cultural Heritage Materials on Tape and Film


Why?
• Expand access – 24/7
• Provide access to unique primary sources held in local archives
• Extend search capabilities of digital text
• Improve resource discovery
• Provide access to high-resolution images
• Integrate resources in multiple modes of representation
• Bring together dispersed collections
• Assist preservation and conservation efforts


How?
General Guidelines
• Digitize at the highest resolution appropriate to the nature of the source material
  - Avoid rescanning and handling of the originals in the future
• Create digital objects that are accessible and interoperable across platforms and devices
  - High-quality
  - Consistent
  - Authentic
• Produce digital objects that support the intended current and future use
  - Build a repository of digital master files to facilitate reprocessing and maintaining digital collections over time
  - Provide derivative access files for current use
• Create backup copies of all files on servers and have an off-site backup strategy


Digital Master Files
• Created as a direct result of the image capture process, either through scanning or photographing with a digital camera
• Should represent the visual information of the original material
• Serve as a long-term archival file and a source for derivative images
  - Digital masters are not used for online delivery or print output
• General recommendations for digital master file creation include:
  - Scanning at the highest quality affordable
  - No compression or lossless compression
  - Non-proprietary archival formats
    TIFF – text or still images
    WAV – audio
    AVI or Motion JPEG 2000 or MXF – moving images *

* Unlike text, still image, or audio, there is no archival file format that has been definitively established for moving images

Digital Masters
• Examples
  - Photographic print, 5 x 7 in., scanned in RGB mode at 600 ppi → 35 MB TIFF file, e.g. kw000010.tif
  - Large map, 63 x 56 cm (24.8 x 22 in.), scanned in RGB mode at 300 ppi → 185 MB TIFF file, e.g. am001385.tif
  - Monograph page, 23 cm (approx. 9 in.), scanned in RGB mode at 400 ppi → 25 MB TIFF file, e.g. 001_Front cover.tif


Derivative Files
• Created from digital master files for specific uses, including
  - Access images for digital collections or other types of Web delivery
  - User requests
  - High-resolution prints
• General recommendations for derivative files:
  - Reduce the resolution depending on the intended use
    72 dpi or 96 dpi for Web access
    300 dpi for print output or for high-resolution viewers
  - Compress files to reduce their size
  - Select appropriate access formats
    PDF – text
    JPEG or JPEG 2000 – still images
    MP3 – audio
    MPEG-4 (MP4) or QuickTime or Real Video – moving images
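The recommendation to reduce resolution for derivatives can be illustrated with a little arithmetic. The sketch below is only an illustration, not a prescribed workflow: the function name `derivative_dimensions` and the sample master (a 5 x 7 in. print scanned at 600 ppi) are assumptions made for the example. It shows how many pixels a derivative would keep if the master were resampled to a lower target resolution.

```python
def derivative_dimensions(master_px, master_ppi, target_ppi):
    """Scale a master's pixel dimensions down to a lower target resolution."""
    if target_ppi > master_ppi:
        raise ValueError("a derivative should not exceed the master's resolution")
    width, height = master_px
    # Pixel counts shrink in proportion to the resolution ratio.
    return (round(width * target_ppi / master_ppi),
            round(height * target_ppi / master_ppi))


# Hypothetical master: 5 x 7 in. print scanned at 600 ppi = 3000 x 4200 pixels.
master = (3000, 4200)
web_copy = derivative_dimensions(master, 600, 72)     # Web access
print_copy = derivative_dimensions(master, 600, 300)  # print output

print("Web access copy:", web_copy)
print("Print copy:", print_copy)
```

Because the derivative is always computed from the master, the master itself never needs to be rescanned when a new access size is required.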


Image Capture: Technical Factors
• Mode of capture
  - Bitonal – one bit per pixel representing black and white
  - Grayscale – multiple bits per pixel representing shades of gray
  - RGB (red-green-blue) – multiple bits per pixel representing color
• File formats
  - TIFF
  - JPEG
  - JPEG 2000
  - RAW and DNG
• No compression
• Compression
  - Lossless
  - Lossy
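The difference between lossless and lossy compression can be shown with a toy example. This is a sketch under stated assumptions: Python's `zlib` (DEFLATE) stands in for a lossless scheme and is not the codec any particular image format necessarily uses, and the "lossy" step is a crude reduction to 16 gray levels, which is not how JPEG works. The pixel values are made up.

```python
import zlib

# A made-up run of 8-bit grayscale pixel values standing in for image data.
pixels = bytes([12, 12, 13, 200, 201, 199, 12, 12] * 64)

# Lossless: after compressing and decompressing, every byte is unchanged.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)
lossless_ok = restored == pixels

# Lossy (illustrative only): keep 16 gray levels instead of 256.
# The discarded detail cannot be recovered from the quantized values.
quantized = bytes((value // 16) * 16 for value in pixels)
lossy_changed = quantized != pixels

print("original bytes:", len(pixels), "compressed bytes:", len(packed))
print("lossless round trip identical:", lossless_ok)
print("lossy version differs from original:", lossy_changed)
```

This is why lossless or no compression is recommended for master files, while lossy compression is acceptable for access derivatives that can be regenerated.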


Image Capture: Technical Factors
• Resolution (ppi – pixels per inch; dpi – dots per inch)
  - An image 1500 x 2100 pixels displayed at 100 ppi = ? in.
  - The same image 1500 x 2100 pixels displayed at 300 ppi = ? in.
• Bit depth
  - The number of bits used to represent each pixel determines how many colors can appear in a digital image

Source: BCR’s CDP Digital Imaging Best Practices.
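The two resolution questions above, and the bit depth statement, reduce to simple arithmetic: displayed size is pixel count divided by ppi, and the number of representable colors is 2 raised to the bit depth. The helper names below are hypothetical and chosen only for this worked example.

```python
def display_size_inches(width_px, height_px, ppi):
    """Physical size of an image when shown at a given pixels-per-inch."""
    return width_px / ppi, height_px / ppi


def color_count(bit_depth):
    """Number of distinct values a pixel can take at a given bit depth."""
    return 2 ** bit_depth


print(display_size_inches(1500, 2100, 100))  # (15.0, 21.0) inches
print(display_size_inches(1500, 2100, 300))  # (5.0, 7.0) inches

for bits in (1, 8, 24):  # bitonal, grayscale, RGB color
    print(f"{bits}-bit: {color_count(bits):,} possible values per pixel")
```

The same 1500 x 2100 pixel image therefore covers 15 x 21 in. at 100 ppi but only 5 x 7 in. at 300 ppi.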


Scanning Specifications
• Digital Masters – Photographs and Text

Original material | Scanning resolution | Bit depth | Approximate scanned dimensions | Approx. size of preview image
Photographs
16” x 20” + | 200 ppi | 24-bit color | 6400 x 8000 pixels | 146 MB
8 ½” x 11”–16” x 20” | 300 ppi | 24-bit color | 3200 x 4000 pixels | 36 MB
8” x 10” | 400 ppi | 24-bit color | 3200 x 4000 pixels | 36 MB
5” x 7” | 625 ppi | 24-bit color | 3200 x 4000 pixels | 36 MB
4” x 5” | 800 ppi | 24-bit color | 3200 x 4000 pixels | 36 MB
4” x 2 ½” | 1200 ppi | 24-bit color | 3200 x 4000 pixels | 36 MB
Text
Print (no images) | 600 ppi | 1-bit bitonal | Varies | Varies
Print (with images) | 300 ppi | 8-bit grayscale or 24-bit color | Varies | Varies
Manuscript | 400 ppi | 8-bit grayscale or 24-bit color | Varies | Varies

Source: Wisconsin Heritage Online Digital Imaging Guidelines (2009). Version 2.0.
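The sizes in the last column of the table are consistent with the size of an uncompressed image, which can be estimated as width x height x bit depth / 8 bytes. The sketch below is a rough estimate under that assumption; it ignores file headers and embedded metadata, and the function names are hypothetical.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixel dimensions produced by scanning an original at a given ppi."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_px, height_px, bit_depth):
    """Approximate size of uncompressed pixel data, rounded up to whole bytes."""
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8


def to_mib(size_bytes):
    """Convert bytes to binary megabytes (1 MiB = 2**20 bytes)."""
    return size_bytes / 2 ** 20


# 8 x 10 in. photograph at 400 ppi, 24-bit color (a row from the table).
width, height = pixel_dimensions(8, 10, 400)
print(width, height, round(to_mib(uncompressed_bytes(width, height, 24)), 1))

# The 6400 x 8000 pixel row, 24-bit color.
print(round(to_mib(uncompressed_bytes(6400, 8000, 24)), 1))
```

The results come out at roughly 36.6 and 146.5 binary megabytes, in line with the 36 MB and 146 MB listed in the table.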


Scanners
• Source materials in a variety of formats require versatile scanning equipment
  - Photographs (reflective and transparent materials)
    Photographic prints → flatbed scanners
    Film negatives and slides → film scanners, flatbed scanners with transparency adapters
  - Text (reflective materials)
    Single leaf documents → flatbed scanners, sheet-fed scanners
    Bound materials → overhead scanners or digital cameras
  - Oversize materials (reflective materials)
    Maps, charts, etc. → large format scanners or digital cameras
  - Microfilm (transparent)
    Newspapers → microfilm scanners
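The material-to-equipment pairings above can be captured as a simple lookup table, which may be handy when planning a project with mixed source formats. This is a minimal sketch: the dictionary keys and the function name `suggest_equipment` are labels invented for the example, and the values simply restate the pairings listed on the slide.

```python
# Pairings restated from the slide; keys are illustrative labels.
SCANNER_OPTIONS = {
    "photographic print": ["flatbed scanner"],
    "film negative or slide": ["film scanner",
                               "flatbed scanner with transparency adapter"],
    "single leaf document": ["flatbed scanner", "sheet-fed scanner"],
    "bound material": ["overhead scanner", "digital camera"],
    "oversize material": ["large format scanner", "digital camera"],
    "microfilm": ["microfilm scanner"],
}


def suggest_equipment(material):
    """Return the equipment options listed for a kind of source material."""
    key = material.strip().lower()
    if key not in SCANNER_OPTIONS:
        raise KeyError(f"no equipment listed for {material!r}")
    return SCANNER_OPTIONS[key]


print(suggest_equipment("Bound material"))
print(suggest_equipment("microfilm"))
```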


• Book scanner for books, oversized prints, and maps
• DSLR for oversized prints, maps, scrolls, and 3D objects
• Film and slide scanner
• Film and slide scanner with auto-feeder
• Flatbed scanner for prints, glass, and transparent objects
• Video conversion
• Audio conversion
• Large format scanner for maps and oversized materials


Resources
General Digitization Guides, Standards, and Best Practices

Association for Library Collections & Technical Services (ALCTS). (2013). Minimum Digitization Capture Recommendations. http://www.ala.org/alcts/resources/preserv/minimum-digitization-capture-recommendations

BCR’s CDP Digital Imaging Best Practices (2008). [updated version of Western States Digital Imaging Best Practices] BCR CDP Digital Imaging Best Practices_2008.pdf

Besser, Howard. Introduction to Imaging, Revised Edition (2003). The J. Paul Getty Trust. This book is free as a downloadable PDF. http://www.getty.edu/research/conducting_research/standards/introimages/

A Framework of Guidance for Building Good Digital Collections. 3rd Edition (2007). NISO Framework Advisory Group. http://www.niso.org/publications/rp/framework3.pdf

Handbook for Digital Projects: A Management Tool for Preservation and Access. (2000). Northeast Document Conservation Center. http://www.nedcc.org/oldnedccsite/digital/dman.pdf

Moving Theory into Practice: Digital Imaging Tutorial. (2000). Cornell University Library/Research Department. http://www.library.cornell.edu/preservation/tutorial/contents.html

The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials. (2002). The National Initiative for a Networked Cultural Heritage (NINCH). http://www.nyu.edu/its/humanities/ninchguide/