Page 1: Focus on Your Content, Not on Ingesting Your Content Terry Brady Applications Programmer Analyst Georgetown University Library twb27@georgetown.edu .

Focus on Your Content, Not on Ingesting Your Content

Terry Brady

Applications Programmer Analyst

Georgetown University Library

twb27@georgetown.edu

https://github.com/organizations/Georgetown-University-Libraries

Page 2:

Goals of our Repository Managers

Create new collections

Grow collections

Accurately describe collection contents

Showcase our repository content

Page 3:

Our story

Using simple tools to facilitate these goals

Page 4:

Imagine that you have content to load into your repository

Page 5:

Scenario: One Item to Add to DSpace

Page 6:

One Item to Add: Item Submission

Click through 7 item submission screens, authoring metadata as you go

Page 7:

Scenario: Three Items to Add to DSpace

Page 8:

Three Items to Add: Item Submission

Click through 3 x 7 item submission screens, authoring metadata as you go

Page 9:

50 Items

Scenario: 50 newspaper issues to add to DSpace (very similar metadata)

Page 10:

50 Items to Add: Individual Item Submission is impractical

Page 11:

Next Option

DSpace Bulk Ingest Process

Page 12:

DSpace Bulk Ingest

50 Items

Page 13:

Ingest Folder

Media File

Thumbnail (optional)

Contents File

Metadata File

License File (optional)

Page 14:

Bulk Ingest: Build a Metadata Spreadsheet

50 Items

Page 15:

Bulk Ingest: Build Ingest Folders

50 Items

Page 16:

Bulk Ingest: For Each Item

Copy Item to Folder

50 Items

.PDF

Page 17:

Bulk Ingest: For Each Item

Create a unique Contents File

50 Items

.TXT

.PDF
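The contents file can be sketched as follows. This assumes DSpace's Simple Archive Format, in which (as I understand it) the file is named `contents`, lists one filename per line, and may tag a file with a tab-separated `bundle:` directive. The function name and the sample filenames are hypothetical.

```python
from pathlib import Path
from typing import Optional


def write_contents_file(item_dir: Path, media_name: str,
                        thumbnail_name: Optional[str] = None) -> Path:
    """Write the 'contents' file that lists the files in one ingest folder."""
    lines = [media_name]
    if thumbnail_name:
        # A tab-separated "bundle:" directive assigns the file to a named bundle.
        lines.append(f"{thumbnail_name}\tbundle:THUMBNAIL")
    contents_path = item_dir / "contents"
    contents_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return contents_path
```

Doing this by hand for 50 folders is where filename typos tend to creep in, which is part of why the step is worth automating.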

Page 18:

Bulk Ingest: For Each Item

Create a Dublin Core File

50 Items

.PDF

.TXT

.XML
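A minimal sketch of generating the Dublin Core file for one item, assuming the Simple Archive Format layout of a `dublin_core` root element with `dcvalue` children carrying `element` and `qualifier` attributes. The function name and the "element.qualifier" key convention are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_dublin_core(item_dir: Path, fields: dict) -> Path:
    """Write dublin_core.xml from {"element.qualifier": value} pairs."""
    root = ET.Element("dublin_core")
    for key, value in fields.items():
        # "date.issued" -> element "date", qualifier "issued";
        # a bare "title" gets the qualifier "none".
        element, _, qualifier = key.partition(".")
        dcvalue = ET.SubElement(root, "dcvalue",
                                element=element,
                                qualifier=qualifier or "none")
        dcvalue.text = value
    dc_path = item_dir / "dublin_core.xml"
    ET.ElementTree(root).write(str(dc_path), encoding="utf-8",
                               xml_declaration=True)
    return dc_path
```

Building the XML through a library rather than by string concatenation means characters such as `&` in a title are escaped automatically.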

Page 19:

Bulk Ingest: Initiate Import from a Terminal Window

50 Items

.TXT

.PDF

.XML
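The terminal step can be sketched as a command line built in code. The `import --add` subcommand and the `--eperson`, `--collection`, `--source` and `--mapfile` options are the ones the DSpace batch importer documents; every value passed in (install path, email, handle, directories) is a hypothetical placeholder.

```python
from typing import List


def build_import_command(dspace_bin: str, eperson: str, collection: str,
                         source_dir: str, mapfile: str) -> List[str]:
    """Assemble the argument list for a DSpace batch import."""
    return [
        dspace_bin, "import", "--add",
        f"--eperson={eperson}",        # account that will own the submission
        f"--collection={collection}",  # handle of the target collection
        f"--source={source_dir}",      # directory holding the ingest folders
        f"--mapfile={mapfile}",        # records which folder became which item
    ]


if __name__ == "__main__":
    # Hypothetical values; print the command rather than running it.
    print(" ".join(build_import_command(
        "/dspace/bin/dspace", "admin@example.edu", "123456789/42",
        "/data/ingest/newspapers", "/data/ingest/newspapers.map")))
```

On a server with DSpace installed, the list could be handed to `subprocess.run(cmd, check=True)`. Keeping it as a list avoids shell-quoting problems when paths contain spaces.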

Page 20:

Bulk Ingest: For Each Item

Create a Dublin Core File

50 Items

.TXT

.PDF

.XML

What if you make a mistake?

What if you need to refine the metadata?

Page 21:

The Challenge

Want to grow the collections

But, the ingest process is daunting

Page 22:

The conversation focused on HOW to ingest the content

Rather than on the content itself

Page 23:

Our Approach

Page 24:

Our Approach: Empower Content Owners

• Automate the tedious tasks

• Make metadata entry the focus of the effort

• Hide the command line from content owners

Page 25:

Our Approach: Simple Tools

Work around the tedious steps

Without constructing a complex workflow

Page 26:

Our Tools

• File Analyzer

o Desktop Application for File System Traversal

• DSpace QC Tools

o Web application for Batch Process Submission

Both of these tools are available on GitHub

• Georgetown-University-Libraries

Page 27:

File Analyzer

Desktop Application for File Processing

Page 28:
Page 29:

What we need

50 Items

Page 30:

Step 1: Automatically Generate an Ingest Inventory based on existing files

50 Items
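As a rough illustration of Step 1, the sketch below scans a folder of existing files and writes one inventory row per file. It is not the File Analyzer's actual implementation; the column names, the `*.pdf` pattern and the function names are all assumptions.

```python
import csv
from pathlib import Path
from typing import List

# Hypothetical inventory columns: one ingest folder per file plus metadata cells.
INVENTORY_COLUMNS = ["folder", "file", "dc.title", "dc.date.issued"]


def build_inventory(source_dir: Path, pattern: str = "*.pdf") -> List[dict]:
    """Return one inventory row per matching file, sorted by filename."""
    rows = []
    for path in sorted(source_dir.glob(pattern)):
        rows.append({
            "folder": path.stem,      # name of the ingest folder to create
            "file": path.name,
            "dc.title": path.stem,    # placeholder for the content owner to refine
            "dc.date.issued": "",
        })
    return rows


def export_inventory(rows: List[dict], csv_path: Path) -> None:
    """Write the inventory as a CSV file that opens in a spreadsheet."""
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=INVENTORY_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Starting from the files that actually exist means the spreadsheet never refers to a filename that was typed by hand.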

Page 31:
Page 32:

Export the Generated Inventory

Page 33:

Step 2: Edit the Ingest Inventory as a Spreadsheet

Page 34:

Step 3: Generate the Ingest Folders from the Inventory Spreadsheet

• Generate Contents File

• Generate Dublin Core Metadata File

• Include custom thumbnails if applicable

Page 35:
Page 36:

Create Ingest Folders

• An error message will appear if files are missing (or misspelled)

• Process can be rerun if the metadata spreadsheet needs to change
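The two behaviours above (report missing or misspelled files, and allow reruns) can be sketched as follows. This is an illustration rather than the File Analyzer's code; the `folder` and `file` row keys and the function name are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def create_ingest_folders(rows: Iterable[dict], source_dir: Path,
                          ingest_root: Path) -> List[str]:
    """Copy each inventoried file into its own folder; return error messages."""
    errors = []
    for row in rows:
        source_file = source_dir / row["file"]
        if not source_file.is_file():
            # Covers both absent files and misspelled names in the spreadsheet.
            errors.append(f"Missing file: {row['file']}")
            continue
        item_dir = ingest_root / row["folder"]
        # Rebuild from scratch so the process can be rerun after spreadsheet edits.
        if item_dir.exists():
            shutil.rmtree(item_dir)
        item_dir.mkdir(parents=True)
        shutil.copy2(source_file, item_dir / row["file"])
    return errors
```

Because each run rebuilds the folder from the spreadsheet, correcting metadata means editing a cell and rerunning, not repairing 50 folders by hand.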

Page 37:

Ingest Folder Creation Report

Page 38:

Step 4: Validate Ingest Folders

• Identify Missing Files

• Required Metadata

• Validate Files

o Contents

o Dublin Core
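The checks listed above could look roughly like this for a single ingest folder. The sketch assumes Simple Archive Format file names (`contents`, `dublin_core.xml`). The required-field list is a hypothetical local policy, not a DSpace rule.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

# Hypothetical policy: (element, qualifier) pairs every item must carry.
REQUIRED_FIELDS = [("title", "none")]


def validate_ingest_folder(item_dir: Path) -> List[str]:
    """Return a list of problems found in one ingest folder (empty if valid)."""
    problems = []
    contents = item_dir / "contents"
    dublin_core = item_dir / "dublin_core.xml"

    if not contents.is_file():
        problems.append("contents file is missing")
    else:
        for line in contents.read_text(encoding="utf-8").splitlines():
            # The filename is the first tab-separated field on each line.
            name = line.split("\t")[0].strip()
            if name and not (item_dir / name).is_file():
                problems.append(f"listed file not found: {name}")

    if not dublin_core.is_file():
        problems.append("dublin_core.xml is missing")
        return problems
    try:
        root = ET.parse(str(dublin_core)).getroot()
    except ET.ParseError as exc:
        problems.append(f"dublin_core.xml is not well-formed: {exc}")
        return problems

    # Collect the fields that are present with a non-empty value.
    present = {(dc.get("element"), dc.get("qualifier", "none"))
               for dc in root.iter("dcvalue") if (dc.text or "").strip()}
    for element, qualifier in REQUIRED_FIELDS:
        if (element, qualifier) not in present:
            problems.append(f"required metadata missing: {element}.{qualifier}")
    return problems
```

Running a check like this on the desktop, before anything is copied to the server, keeps a failed import from being the first sign of a problem.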

Page 39:
Page 40:

Validation Status Report

Page 41:

Step 5: Move Ingest Folders to Server and Initiate Bulk Ingest

Page 42:

Web Tools

for Batch Process Submission

Page 43:
Page 44:

Web Tools, Tutorials co-located with tools

Page 45:

Collection

Folder Location

Page 46:

Processes run by Bulk Ingest

• import

• filter-media [collection]

• update-discovery-index

• oai-import

• stats-util

Content is visible, searchable, and thumbnails are present!
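One way to picture the chain of processes is a small runner that executes each step in order and stops at the first failure. This is a sketch, not the DSpace QC Tools code (which is PHP). The step names are copied from the slide, their arguments are deliberately omitted, and exact command spellings may differ between DSpace versions.

```python
import subprocess
from typing import Callable, List, Sequence

# Step names as written on the slide; arguments are intentionally left out.
SLIDE_STEPS = ["import", "filter-media", "update-discovery-index",
               "oai-import", "stats-util"]


def build_steps(dspace_bin: str) -> List[List[str]]:
    """Pair the launcher path with each step name, in slide order."""
    return [[dspace_bin, name] for name in SLIDE_STEPS]


def default_runner(argv: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_pipeline(steps: List[List[str]],
                 runner: Callable[[Sequence[str]], int] = default_runner
                 ) -> List[str]:
    """Run steps in order; stop at the first non-zero exit code.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for argv in steps:
        if runner(argv) != 0:
            break
        completed.append(argv[1])
    return completed
```

Stopping at the first failure matters here because later steps such as indexing and thumbnail generation only make sense once the import itself has succeeded.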

Page 47:
Page 48:

Results

Empowered Librarians

Iterative metadata refinement

At the right point of the workflow

Significant growth in repository content

Decreasing IT involvement

Rapid development of support tools

Page 49:

Derived Tools

Generate Ingest Folders for ProQuest ETDs

Filter Media

Page 50:

Ingest ETDs from ProQuest

Page 51:

ProQuest ETD Ingest Rule

Page 52:

Filter Media Tool

for Items Submitted One by One

Collection

Filter Media Tasks

Re-index?

Page 53:

Benefits

Companion tools easy to learn

Users are very comfortable with them

Demystify DSpace specifics

Users trained other users!

Page 54:

Other Tools Created

Automation

• Undo Bulk Ingest

• Update Metadata

• Move Community/Collection

Reporting

• Data Quality Reports

• Statistics Reports

Page 55:

More Tools (time permitting)

Page 56:

Data Quality Reports

• Items with multiple media files

• Non-PDF Document Items

• Items missing a Thumbnail

• "Non-standard" Media Types

• Items modified in the last 30 days

• Items with Embargo

• Items missing a metadata field

• Item metadata containing a URL
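A few of these reports can be expressed as simple filters over item records. The sketch below uses a hypothetical in-memory `ItemRecord` and does not query a DSpace database, so it shows only the logic of each check. All names and sample handles are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class ItemRecord:
    """Hypothetical summary of one repository item."""
    handle: str
    media_files: List[str] = field(default_factory=list)
    thumbnails: List[str] = field(default_factory=list)


def items_with_multiple_media_files(items: Iterable[ItemRecord]) -> List[str]:
    """Handles of items holding more than one media file."""
    return [item.handle for item in items if len(item.media_files) > 1]


def items_missing_thumbnail(items: Iterable[ItemRecord]) -> List[str]:
    """Handles of items with no thumbnail at all."""
    return [item.handle for item in items if not item.thumbnails]


def non_pdf_items(items: Iterable[ItemRecord]) -> List[str]:
    """Handles of items with at least one media file that is not a PDF."""
    return [item.handle for item in items
            if any(not name.lower().endswith(".pdf")
                   for name in item.media_files)]
```

Each report returns a list of handles, so a collection manager can go straight from the report to the items that need attention.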

Page 57:

Collection QC Report

Page 58:

Item QC Report

Page 59:

Usage Statistics Reports

• Not confident in the out-of-the-box reports

• Wanted to understand underlying data

• Filter Stats

o On campus

o Within the library
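Filtering usage statistics by location typically comes down to classifying each request's IP address. The sketch below shows that idea with Python's `ipaddress` module. The network ranges are reserved documentation addresses standing in for real campus and library ranges, and the category labels are assumptions.

```python
import ipaddress
from collections import Counter
from typing import Iterable

# Placeholder ranges (reserved for documentation), not real campus networks.
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
LIBRARY_NETWORKS = [ipaddress.ip_network("192.0.2.128/25")]


def classify(ip: str) -> str:
    """Label a request as 'library', 'campus' or 'off-campus'."""
    address = ipaddress.ip_address(ip)
    # Check the narrower library range first, since it sits inside the campus range.
    if any(address in network for network in LIBRARY_NETWORKS):
        return "library"
    if any(address in network for network in CAMPUS_NETWORKS):
        return "campus"
    return "off-campus"


def count_by_location(ips: Iterable[str]) -> Counter:
    """Tally requests per location label."""
    return Counter(classify(ip) for ip in ips)
```

Separating library and campus traffic from everything else gives a clearer picture of how much usage comes from outside the institution.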

Page 60:
Page 61:

Try it yourself

GitHub: Georgetown-University-Libraries

• File Analyzer & Metadata Harvester

o Just need a Java Compiler

o Contains several utilities for digitization workflows

o Links to tutorials

• DSpace QC Tools

o PHP Code

o Sample code, not ready to run

o Links to tutorials

Please let me know how these work for you!

Page 62:

Terry Brady

Applications Programmer Analyst

Georgetown University Library

twb27@georgetown.edu

https://github.com/organizations/Georgetown-University-Libraries