
Metadata Tools for JISC Digitisation Projects of still images and text

Ed Fay

BOPCRIS, Hartley Library

University of Southampton


Overview: BOPCRIS today

Move to work natively with standards
• Interoperability
• Preservation

Design project procedures from ground up with metadata in mind
• File-naming and directory structuring
• Metadata capture processes

Production workflow that automates where possible

Minimize possibility for human error / subjectivity

“Final package” of digital object that records preservation information on the “digital shelf” and aims for maximum interoperability between systems, all in one place
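
As an illustration of how a file-naming convention can carry metadata and reduce operator error, here is a minimal Python sketch. The pattern (project code, volume id, page sequence) and the names `SCAN_NAME`, `ScanName` and `parse_scan_name` are hypothetical; the slides do not state the actual BOPCRIS convention.

```python
import re
from typing import NamedTuple

# Hypothetical convention: <project>_<volume id>_<4-digit page>.tif
# e.g. "ecpp_v012_0007.tif". This is an assumption, not the BOPCRIS scheme.
SCAN_NAME = re.compile(
    r"^(?P<project>[a-z]+)_(?P<volume>v\d{3})_(?P<page>\d{4})\.tif$"
)


class ScanName(NamedTuple):
    project: str
    volume: str
    page: int


def parse_scan_name(filename: str) -> ScanName:
    """Return the metadata encoded in a file name, or raise ValueError."""
    match = SCAN_NAME.match(filename)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {filename!r}")
    return ScanName(match["project"], match["volume"], int(match["page"]))
```

Rejecting non-conforming names at capture time is one way to catch mistakes before they reach the final package.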


Overview: technical details

File-naming / directory structure
• Incorporating project-specific “unique ids”

Final package (digital object)
• Internally consistent “tarball” [*.TAR]
• Relative path-naming conventions
• METS wrapper
• Extension formats for metadata: descriptive (MODS); technical (MIX); process (PREMIS)

Production workflow
• Automated production of final package

Metadata recording
• Dynamic input by scanner operators
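
The idea of an internally consistent tarball with relative path names can be sketched with Python's standard `tarfile` module. The function name `build_package` and the directory layout are assumptions made for illustration; the slides do not describe the actual packaging script.

```python
import tarfile
from pathlib import Path


def build_package(object_dir: Path, tar_path: Path) -> list[str]:
    """Write every file under object_dir into an uncompressed tarball.

    Member names are stored relative to object_dir, so the package does
    not depend on where it was built. tar_path should lie outside
    object_dir. Returns the member names that were written.
    """
    names = []
    with tarfile.open(tar_path, "w") as tar:
        for path in sorted(object_dir.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(object_dir).as_posix()
                tar.add(path, arcname=arcname)
                names.append(arcname)
    return names
```

Because only relative names are stored, a METS file inside the package can refer to its sibling files by the same relative paths wherever the tarball is unpacked.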


History

Eighteenth Century Parliamentary Papers
• Project under Phase 1 of JISC Digitization Programme
• Proprietary system and data formats (Agora)
• Manual input of metadata
  o Descriptive and Structural
• Advantages and Disadvantages


History: Advantages

Proprietary system with advanced functionality:
• OCR workflow
• Web presentation

Highly customizable
• Metadata fields specified and modified at will


History: Disadvantages

Non-standard metadata fields
• No mapping to standard formats → difficulties: interoperability; metadata harvesting

Translation
• Between systems, or between “use” and “archive” formats, introduces possibility of versioning issues

No scope for preservation metadata
• Separation between workflow / presentation system and preservation strategy

Resulted in a disparate collection of scripts and tools to manage data


Present: Metadata Standards

Bibliographic database export

File-system level
• Directory structure
• File-naming conventions

Scanning level
• TIFF headers
• Additional descriptive metadata

METS profile
• Tailored to project needs
• Extension formats (MODS, MIX, PREMIS)

Checksums (MD5)
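
For the MD5 checksums listed above, a minimal sketch using Python's standard `hashlib` might look like the following. The function names `md5_of_file` and `verify` are illustrative, and the slides do not say how or where the checksums are recorded.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large TIFF masters need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's current digest with a previously recorded one."""
    return md5_of_file(path) == expected_hex.lower()
```

A digest recorded at scan time can then be re-checked whenever the package is moved or ingested.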


Present: Metadata Origins

[Diagram: metadata precursors and the outputs generated from them]

PRECURSORS
• Scanned images: TIFF headers
• OCR (Agora / ABBYY)
• File-naming
• Directory structure
• Other metadata: process; additional descriptive
• Bibliographic metadata
• File formats: TIFF master / derived JPEG; flat text (TXT) & word-co-ordinated OCR

GENERATED
• METS
• MIX (Z39.87)
• TAR
• PREMIS
• MARC21 / MODS / etc.
• Custom dmdSec


Future

One tool for entire process, from scanned images to METS

Tool would:
• Extract technical metadata
• Include descriptive metadata
• Build flat-structure METS

Tool would require:
• File-naming, directory-structuring conventions
• Image file sources
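
To make "flat-structure METS" concrete, here is a rough sketch that builds a minimal METS tree from a list of image paths with Python's standard `xml.etree.ElementTree`. The function name `build_flat_mets`, the `IMG0001`-style identifiers, and the `USE` / `TYPE` values are assumptions; the output is not validated against the METS schema and omits the MODS, MIX and PREMIS sections that the proposed tool would add.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def _m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_flat_mets(object_id: str, image_paths: list[str]) -> ET.Element:
    """Build a minimal METS tree: one file group plus a structMap
    with a single level of page divisions (the "flat" structure)."""
    mets = ET.Element(_m("mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(mets, _m("fileSec"))
    group = ET.SubElement(file_sec, _m("fileGrp"), {"USE": "master"})
    struct_map = ET.SubElement(mets, _m("structMap"), {"TYPE": "physical"})
    top = ET.SubElement(struct_map, _m("div"), {"TYPE": "volume"})

    # Page order is taken from the sorted file names, which is why the
    # tool depends on a file-naming convention.
    for order, rel_path in enumerate(sorted(image_paths), start=1):
        file_id = f"IMG{order:04d}"
        file_el = ET.SubElement(
            group, _m("file"), {"ID": file_id, "MIMETYPE": "image/tiff"}
        )
        ET.SubElement(
            file_el,
            _m("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": rel_path},
        )
        page = ET.SubElement(top, _m("div"), {"TYPE": "page", "ORDER": str(order)})
        ET.SubElement(page, _m("fptr"), {"FILEID": file_id})
    return mets
```

Calling `ET.tostring(build_flat_mets("obj001", paths))` serializes the tree; relative paths in `xlink:href` keep the document consistent with the files packaged alongside it.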


Future: Advantages

Abstraction = standardization

All digitization projects will produce metadata in similar formats → interoperability

Certain technical base-standards will be present → preservation

Any centrally developed preservation or presentation systems would be able to ingest output from any project

Saves wasted effort developing similar solutions many times, when one solution can be developed once and adapted


Future: Questions…

Usefulness of such a tool?
Relevance to your project?
Problems / obstacles?
How much flexibility is necessary?
Manual input / editing?

Main points: Abstraction, functionality, flexibility


Further information

Ed Fay, Software Developer
• BOPCRIS, Hartley Library
• University of Southampton
• [email protected]
• 023 8059 3575