Northwestern digital repository initiative: platform and persistence

Post on 25-May-2015



Introduction to and overview of digital repository projects at Northwestern University, developed for a guest lecture in the Digital Curation course at the Dominican University Graduate School of Library and Information Science. The presentation is based in part on an earlier presentation developed by Steve DiDomenico and Claire Stewart.

Transcript of Northwestern digital repository initiative: platform and persistence

Northwestern digital repository initiative:

Platform and persistence

Claire Stewart
Director, Center for Scholarly Communication and Digital Curation
Head, Digital Collections, Library Technology Division
Northwestern University
claire-stewart@northwestern.edu

What is a repository and why should I care?

Library as institutional memory

Tweeted in 2012 by Gail Steinhart, Head of Research Services, Mann Library, Cornell University

Vines, T. H., Albert, A. Y. K., Andrew, R. L., Débarre, F., Bock, D. G., Franklin, M. T., … Rennison, D. J. (2013). The Availability of Research Data Declines Rapidly with Article Age. Current Biology, 24(1), 94–97. doi:10.1016/j.cub.2013.11.014

“The major cause of the reduced data availability for older papers was the rapid increase in the proportion of data sets reported as either lost or on inaccessible storage media. For papers where authors reported the status of their data, the odds of the data being extant decreased by 17% per year (Figure 1D).” [emphasis added]

The Availability of Research Data Declines Rapidly with Article Age
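
To make the quoted figure concrete, here is a minimal sketch of how a 17% yearly drop in odds compounds over time. The baseline odds are a hypothetical value chosen for illustration, not a number from the paper, and the function names are ours.

```python
def relative_odds(years, yearly_decline=0.17):
    """Fraction of the starting odds that remains after `years` years."""
    return (1.0 - yearly_decline) ** years


def odds_to_probability(odds):
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


# Hypothetical baseline: odds of 4 to 1 (an 80% chance) that the data
# are still extant when the article is published.
BASELINE_ODDS = 4.0

for years in (0, 5, 10, 20):
    odds = BASELINE_ODDS * relative_odds(years)
    print(f"{years:>2} years: odds {odds:.2f}, "
          f"probability {odds_to_probability(odds):.0%}")
```

Note that the decline applies to the odds, not to the probability itself, so the probability falls slowly at first and faster once the odds drop below even.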

What is a repository and why should I care?

A concept

The Repository

All the stuff

A set of technologies

Technologies and architecture

Repository as service

• Description and characterization: descriptive, provenance and technical metadata
• Selection, conversion, digitization
• Deposit and versioning
• Interoperability, APIs for ingest, discovery
• Access control, copyright support and other legal/regulatory compliance
• Persistence
  – Stable, permanent links (URLs, DOIs, etc.)
  – Health of digital objects
  – Replication and dark archiving
  – Migration or emulation, virtualization
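
As a small illustration of the "stable, permanent links" item, the sketch below turns a DOI into a resolver URL using the public doi.org resolver. The helper name `doi_to_url` is hypothetical, and the DOI used is the one from the Vines et al. citation earlier in the slides.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Build a resolver URL for a DOI, dropping an optional 'doi:' prefix."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Keep '/' readable; percent-encode anything unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1016/j.cub.2013.11.014"))
```

The point of the indirection is that the DOI stays fixed while the resolver can be updated to point at wherever the object currently lives.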

What’s already in our repository

digital.library.northwestern.edu

Maps of Africa

First Fedora project @ NU

2006 project, internally funded

116 antique maps at high resolution

Archival finding aids

findingaids.library.northwestern.edu Archon for EAD, Fedora + Blacklight for storage and discovery, Primo syndication

Northwestern Books and the Book Workflow Interface

2009

Mellon-funded

Now used for all in-house book digitization

books.northwestern.edu

Every page of each digitized book has this information:

Content                                    Datastream ID  MIME type        Schema/ontology
Dublin Core metadata                       DC             text/xml         OAI_DC
MODS metadata                              MODS           text/xml         MODS
Relationship metadata                      RELS-EXT       text/xml         RELS-EXT
OCR PDF file                               PDF            application/pdf
OCR XML                                    OCR XML        text/xml         ABBYY OCR
OCR Text                                   OCR TEXT       text/plain
Source camera image file                   ARCHV-IMG      image/jpeg
Source technical metadata in MIX           ARCHIV-TECHMD  text/xml         MIX
Source camera technical metadata in EXIF   ARCHV-EXIF     text/xml         Exif as XML
Corrected image file                       PROC-IMG       image/jpeg
Corrected image technical metadata in MIX  PROC-TECHMD    text/xml         MIX
Delivery image JPEG2000 file               DELIV-IMG      image/jp2
Delivery image technical metadata in MIX   DELIV-TECHMD   text/xml         MIX
SVG for delivery mechanism                 DELIV-OPS      text/xml         SVG
Viewer html                                HTML           text/html        HTML
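
One way to picture the per-page datastream list is as a small data structure. The sketch below models a subset of the datastreams listed above and groups them by workflow stage. The `Datastream` class and helper functions are hypothetical illustrations, not the Fedora API, and reading the `PROC-` and `DELIV-` prefixes as stage names is our interpretation of the "Corrected" and "Delivery" labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datastream:
    ds_id: str        # datastream ID as listed on the slide
    mime_type: str
    schema: str = ""  # empty when the slide lists no schema/ontology


# A subset of the per-page datastreams from the slide.
PAGE_DATASTREAMS = [
    Datastream("DC", "text/xml", "OAI_DC"),
    Datastream("MODS", "text/xml", "MODS"),
    Datastream("RELS-EXT", "text/xml", "RELS-EXT"),
    Datastream("ARCHV-IMG", "image/jpeg"),
    Datastream("PROC-IMG", "image/jpeg"),
    Datastream("PROC-TECHMD", "text/xml", "MIX"),
    Datastream("DELIV-IMG", "image/jp2"),
    Datastream("DELIV-TECHMD", "text/xml", "MIX"),
]


def by_stage(datastreams, prefix):
    """Return datastreams whose ID starts with '<prefix>-'."""
    return [ds for ds in datastreams if ds.ds_id.startswith(prefix + "-")]


def images(datastreams):
    """Return datastreams that hold image files rather than metadata."""
    return [ds for ds in datastreams if ds.mime_type.startswith("image/")]


print([ds.ds_id for ds in by_stage(PAGE_DATASTREAMS, "DELIV")])
print([ds.ds_id for ds in images(PAGE_DATASTREAMS)])
```

Seen this way, each page carries three image files (source, corrected, delivery), each paired with its own technical metadata.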

By the numbers: # of objects

As of November 2013:

• Finding aids: 1,114

• Digitized books: 3,491

• Digitized book pages: 835,806

• Image objects: 216,271

• A few others, including 3D objects, and collection objects

A total of 1,187,414 objects in the repository

Every object has several datastreams (files, descriptive metadata, technical metadata, etc.)

By the numbers: storage

As of Feb 5, 2014: 97.1 TB of content on repository (including digitized collections queued for ingestion) and JPEG2000 server.

Library & NUIT purchased 200 TB of storage replicated between Evanston and Chicago campuses (that is over 400 TB in total).
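
The storage figures above can be turned into a quick capacity check. This is a rough sketch that assumes all 97.1 TB would be held on the new replicated storage, which the slides do not state. The function names are ours.

```python
USED_TB = 97.1                # content reported as of Feb 5, 2014
CAPACITY_PER_SITE_TB = 200.0  # purchased storage at each campus
SITES = 2                     # Evanston and Chicago


def raw_capacity_tb(per_site, sites):
    """Total disk purchased across all replicas."""
    return per_site * sites


def headroom_tb(per_site, used):
    """Space left at one site; replication means each site holds a full copy."""
    return per_site - used


def utilization(per_site, used):
    """Share of one site's capacity already in use."""
    return used / per_site


print(f"Raw capacity: {raw_capacity_tb(CAPACITY_PER_SITE_TB, SITES):.0f} TB")
print(f"Headroom per site: {headroom_tb(CAPACITY_PER_SITE_TB, USED_TB):.1f} TB")
print(f"Utilization: {utilization(CAPACITY_PER_SITE_TB, USED_TB):.1%}")
```

Because the two sites mirror each other, the usable capacity is that of a single site, so planning for growth is measured against 200 TB rather than 400 TB.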

Digital preservation/persistence

• Persistent URLs
• Mirrored storage (as of fall 2014)
• PREMIS (preservation) metadata
• Routine health checks for data
• Geographically distributed storage
• Dark archives
• Migration/virtualization services
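
A routine health check for data is commonly implemented as a fixity check: recompute each file's checksum and compare it with a stored value. The slides do not say how Northwestern's checks work, so the sketch below is only an illustration. The choice of SHA-256, the manifest format, and the function names are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-256 digest, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest):
    """Return the paths that are missing or whose digest has changed.

    `manifest` maps a file path to the digest recorded at ingest.
    """
    failures = []
    for path, expected in manifest.items():
        target = Path(path)
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(str(path))
    return failures


# Demonstration with a throwaway file standing in for a repository object.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "page-0001.jp2"
    page.write_bytes(b"example image bytes")
    manifest = {str(page): sha256_of(page)}

    print("failures before change:", check_fixity(manifest))
    page.write_bytes(b"silently corrupted bytes")
    print("failures after change:", check_fixity(manifest))
```

Run on a schedule against every copy, a check like this is what lets mirrored or geographically distributed storage repair a damaged file from a healthy replica.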

Distributed storage and dark archives

• DuraCloud
• Amazon Glacier
• Digital Preservation Network (DPN)

Current repository projects

• Digital Image Library (DIL)

• Avalon

• Hydramata

Hydra

Northwestern joined 2011

Framework for repository applications using Ruby on Rails

Community with 22 partners

2007 Provost funded move from Art History to the Library, expansion to other disciplines

115,000 images in Hydra + Fedora

Moving all legacy digital collections into DIL & its Hydra counterparts in 2014-2015

images.northwestern.edu

Digital Image Library (DIL)

Avalon

IMLS-funded project with Indiana University

Releases:
• 0 July 2012

• .5 October 2012

• 1.0 May 2013

• 2.0 October 2013 (NU pilot)

First NU production with R3, expected in the next month

media.northwestern.edu (dev/demo)

Scholarly communication and digital curation

• Options for archiving scholarly materials

• Authors' rights, copyright help and education, open access support

• E-science and research data life cycle

• Digital humanities

• Library-based publishing

• Responding to funder requirements

Hydramata (formerly Shared IR)

Five-institution project to develop a next-generation institutional repository solution in Hydra

Expanding our repository program

• Massive storage, planning for growth, sustainability
• Digital preservation services
  o Offsite third copy (DPN, DuraCloud, Glacier)
  o Verification services
• Research computing
  o Research data lifecycle: how to capture metadata early? what to keep?
  o Automate deposit from Vault?

• Shared infrastructure and services whenever possible

• Deeper collaboration with NUIT, Research, central admin, schools

Discussion and questions

Claire Stewart
Director, Center for Scholarly Communication and Digital Curation
Head, Digital Collections, Library Technology Division
Northwestern University
claire-stewart@northwestern.edu