Rutgers Community Repository (RUcore): WMS, RUcore and Fedora Mini-Conference

Date posted: 18-Dec-2015


WMS, RUcore and Fedora Mini-Conference

Wednesday Morning
• Greetings and Introduction – Grace
• Collaboration and Architecture Overview – Ron
• RUcore Data Model – Grace
• WMS Tutorial – Mary Beth, Kalaivani, Sharon

Lunch (box lunch in conference room)

Wednesday Afternoon
• Hands-On Experience – Mary Beth, Kalaivani, Sharon
• Feedback from WMS sessions
• Collaboration Discussion – All


WMS, RUcore and Fedora Mini-Conference

Thursday Morning
• Brief Recap – Ron
• WMS architecture – Yang
• User Interface, Search engine and collections – Chad
• Management services – Ron

Lunch (on your own)

Thursday Afternoon
• Further collaboration discussion
• Wrap-up and next steps


Possible Areas for Collaboration

Data Registries
• File formats
• Content Models

Software Development
• Requirements
• Sharing software
• Joint development
• Life cycle support

Sharing Content
• Exchange, harvesting
• Federated Searching

Fedora Experimentation
• Relationship services
• Directory ingest
• Use of XACML
• Very large files
• Event management


Fedora Enterprise Architecture Major Goals – 2007 through 2009

Paradigm Focus
• Scholarly Communication Collaboration
• Libraries and Museums Access and Publishing

Infinite Scalability
• Size of and number of objects
• Capacity and throughput (e.g. ingest 20TB a day)
• Life cycle preservation

Trust Model
• Transactions – Begin/Commit
• Transactions across repositories
• Enable graph based objects (compound objects)


Persistence and Layered Architecture

[Figure: layers labeled Data, Repository, Middleware, and Applications, with an Application Programming Interface]


Layered Architecture - RUcore

• FOXML & Datastreams
• Fedora Core & Framework
• Middleware Services (searching, alerting, integrity, etc)
• Applications and Portals (NJDH, RUcore, workflow, etc)
• API

RUcore - How it Works

[Figure labels: User Input (metadata and archival masters); Workflow Management System; XML; Digital Object Ingest; Digital Object; Repository (Fedora); Fedora Repository Service; User, Collection, & Preservation Services; portals: RUCORE Portal, NJ Digital Highway, Dissertations, Faculty Submissions, Custom Portals]


Simple and Compound Objects

Article Object (Simple)
• Persistent ID
• Metadata
• Behaviors (Disseminators)
• Data streams:
  • PDF1 – presentation
  • XML1 – OCR text
  • ARCH1 – Archival master (tiffs of each page)
  • DJVU1 – presentation
  • SMAP1 – StrMap (TOC)

Compound Object - Graph Model

[Figure: an article object and objects A1 and A2, connected by two IsAnnotationOf relationships]
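The simple object and the graph model above can be sketched in a few lines of Python. This is only an illustration, not RUcore or Fedora code: the class names, the `example:` PIDs, and the helper `annotations_of` are hypothetical, and the sketch assumes that both A1 and A2 annotate the article, which the transcript of the figure does not make explicit.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    ds_id: str   # datastream ID, e.g. "PDF1"
    role: str    # what the datastream is for, e.g. "presentation"


@dataclass
class DigitalObject:
    pid: str                                        # persistent ID (hypothetical values)
    metadata: dict = field(default_factory=dict)
    datastreams: list = field(default_factory=list)
    # Outgoing relationships as (predicate, target PID) pairs.
    relations: list = field(default_factory=list)


def annotations_of(target_pid, objects):
    """Return PIDs of objects that declare IsAnnotationOf the target."""
    return [
        obj.pid
        for obj in objects
        if ("IsAnnotationOf", target_pid) in obj.relations
    ]


# Simple object: one article with the five datastreams from the slide.
article = DigitalObject(
    pid="example:article",
    metadata={"title": "Sample article"},
    datastreams=[
        Datastream("PDF1", "presentation"),
        Datastream("XML1", "OCR text"),
        Datastream("ARCH1", "archival master (TIFFs of each page)"),
        Datastream("DJVU1", "presentation"),
        Datastream("SMAP1", "structure map (TOC)"),
    ],
)

# Compound object: separate objects linked to the article by relationships.
a1 = DigitalObject(pid="example:A1", relations=[("IsAnnotationOf", article.pid)])
a2 = DigitalObject(pid="example:A2", relations=[("IsAnnotationOf", article.pid)])

print(annotations_of(article.pid, [article, a1, a2]))
```

The point of the graph model is that the annotations stay independent objects; the compound object exists only as the set of relationships between them.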


Collections In RUcore

A digital collection is simply a grouping of objects according to some criteria.

Types of digital collections in RUcore

• Explicit – A digital collection whose object membership is specified explicitly within the descriptive metadata.

• Dynamic – A digital collection of objects which are grouped according to user-specified criteria.
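The difference between the two collection types can be illustrated with a small Python sketch. The record layout, the `coll:dept` identifier, and both function names are assumptions made for the example; RUcore's actual metadata schema is not shown on the slide.

```python
def explicit_members(collection_id, objects):
    """Objects whose descriptive metadata names the collection directly."""
    return [o["pid"] for o in objects if collection_id in o.get("collections", [])]


def dynamic_members(criteria, objects):
    """Objects whose metadata matches every user-specified criterion."""
    return [
        o["pid"]
        for o in objects
        if all(o.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical object records.
objects = [
    {"pid": "example:1", "collections": ["coll:dept"], "type": "ETD"},
    {"pid": "example:2", "collections": [], "type": "ETD"},
    {"pid": "example:3", "collections": ["coll:dept"], "type": "report"},
]

print(explicit_members("coll:dept", objects))
print(dynamic_members({"type": "ETD"}, objects))
```

Explicit membership is fixed until someone edits an object's metadata, while dynamic membership is recomputed from the criteria each time the collection is evaluated.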


Using Explicit and Dynamic Collections

Personal Collections

Department Collections
• Including Faculty Personal collections (e.g. preprints, reports, etc)
• ETDs for the Department

Centers and Grant Funded Research
• New Jersey Digital Highway
• Center for Remote Sensing and Spatial Analysis (CRSSA) – Access and preservation of GIS resources related to New Jersey

RUcore Collection Architecture

[Figure: collection hierarchy. Labeled nodes include Rutgers University, Rutgers University Libraries, RUCORE, NJDH (Grant Project), New Jersey Historical Society, General Collections, Special Collections, Eagleton Archive, Roosevelt, and Centers/Departments; other nodes are labeled B1, M1, N1, N2, N3, O1, O2, P1, and P2.]

Legend
• Solid line – explicit membership
• Dashed line – dynamic membership
• Circles – collection objects
• Rectangles – content objects

Collection Architecture - Lefty

[Figure: collection hierarchy. Labeled nodes include Rutgers University (1782.2), RUL (1782.1), Princeton (1782.1), Penn State (1782.1), N’Western (1782.1), RUCORE, Center/Dept Collections, Department, FacCollOne, FacCollTwo, ETDs (Graduate School), Dept. ETDs, and RU ETDs; other nodes are labeled D1, D2, and D3.]

Legend
• Solid line – explicit membership
• Dashed line – dynamic membership

• http://hdl.rutgers.edu/1782.1/NorthwesternU.collection.165
• http://hdl.rutgers.edu/1782.1/PennStateUniv.collection.164
• http://hdl.rutgers.edu/1782.1/PrincetonUniv.collection.166


Management Services (incl. Collection and Preservation)

Management
• Super-user editing (handles, datastreams, metadata)
• Purging an object
• Export (FOXML, METS)

Collections
• Collection administration
• Statistics

Preservation
• Creation of archival master
• Creation of persistent ID (handle)
• Checksum verification


Management Services

Access to individual objects is provided by a special search portal using the same indexes as the public search but providing Fedora API management functionality:

• Viewing, exporting, and/or purging objects
• Editing metadata, adding/changing datastreams
• Validating objects, checking audit trails, testing signatures

There is a special Fedora database search allowing access to all objects whether or not they are members of an active collection.


Collection Administration

Edit collection information

Add parents to a collection

Add dynamic search terms to a collection

Generate an XML structure map


Collections - Indexing and Ingest

Active Collections may be indexed individually or all together at any time, though this is typically done using a nightly cron job.

Ingest is done through the management API and is typically called by the WMS program, but may be called directly from the management interface as well.


Preservation - Alerting

Every Fedora API management call triggers an alerting message, is recorded in the Fedora audit trail, and is registered in the collection statistics database.

Statistics are kept for all object downloads as well as editing activities and may be accessed at collection or repository levels.


Preservation – PIDs and Handles

Handles are normally created as part of the ingest process, but may be manually created, changed, or purged on a per-object basis using the management interface.

Three global registries for RU
• 1782.1 – Rutgers University Libraries
• 1782.2 – Rutgers University
• 1782.3 – NJ Digital Highway
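As a small illustration of how the registry prefix appears in a handle, the following Python sketch splits a handle URL into prefix and suffix and looks up the registry. The function name `split_handle` and the lookup table are hypothetical; the prefixes and the sample URL are the ones given in these slides.

```python
from urllib.parse import urlparse

# Prefixes and owners as listed on the slide.
HANDLE_REGISTRIES = {
    "1782.1": "Rutgers University Libraries",
    "1782.2": "Rutgers University",
    "1782.3": "NJ Digital Highway",
}


def split_handle(url):
    """Split a handle URL into (prefix, suffix, registry owner)."""
    path = urlparse(url).path.lstrip("/")
    prefix, _, suffix = path.partition("/")
    return prefix, suffix, HANDLE_REGISTRIES.get(prefix, "unknown registry")


print(split_handle("http://hdl.rutgers.edu/1782.1/PennStateUniv.collection.164"))
```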


Object Integrity – Verifying Checksums

Archival datastreams have SHA-1 checksums, created during the WMS pipeline process, as well as file size data, both stored in the technical metadata section of each object.

SHA-1 checksums are tested with sha1sum, in conjunction with a management function that polls the repository and extracts the sha1sum character strings from the techMD of individual objects or groups of objects. The function has a calendar feature that allows it to be run from cron on a different subset of objects for each day of the week, with result reports emailed to the appropriate data managers.
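The check described above can be sketched in Python as follows. This is not the RUcore management function: the record layout, the function names, and in particular the modulo rule used to assign objects to weekdays are assumptions chosen to make the idea concrete.

```python
import hashlib


def sha1_of(data: bytes) -> str:
    """Hex SHA-1 digest, the same string format that sha1sum prints."""
    return hashlib.sha1(data).hexdigest()


def verify(record, data: bytes) -> bool:
    """Compare the stored file size and SHA-1 (as kept in techMD) to the actual bytes."""
    return len(data) == record["filesize"] and sha1_of(data) == record["sha1"]


def due_today(pids, weekday):
    """Pick a stable one-seventh of the objects for a weekday (0 = Monday ... 6 = Sunday)."""
    return [pid for i, pid in enumerate(sorted(pids)) if i % 7 == weekday]


# Hypothetical archival datastream and its stored technical metadata.
payload = b"archival master bytes"
record = {"filesize": len(payload), "sha1": sha1_of(payload)}

print(verify(record, payload))           # intact copy
print(verify(record, payload + b"x"))    # altered copy
print(due_today([f"example:{n}" for n in range(10)], 0))
```

Spreading the objects across the week keeps each nightly run short while still touching every archival datastream once every seven days.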


Certification as a Trusted Repository*

Ultimately, we want to become certified as a trusted repository. There are four major areas, each shown here with an example criterion:

A. Organization: Repository staff have skills appropriate to their duties.
B. Repository Functions: Repository actively monitors Archival Information Package integrity.
C. Designated Community: Repository defines its Designated Community.
D. Technologies: Repository has technologies to monitor security.

* RLG/NARA draft “An Audit Checklist for the Certification of Trusted Digital Repositories”

Preservation Services Architecture

[Figure labels: Digital Object; Repository; Fedora Repository Service; Fedora Service Framework; Preservation Services (Alerting, Migration, Statistics, Preservation Monitoring, Event Messaging, Preservation Integrity); Preservation Portal; Content Models; Format Registry; Monitoring]


Content Models (Content Model Dissemination Architecture – CMDA)

The CM object specifies constraints on the digital object (DO)
• MIME type and format
• Minimum/maximum number of datastreams
• Whether multiple datastreams are ordered

The CM is used to determine runtime behavior
• On ingest, Fedora validates the DO based on CM constraints
• Disseminators are not bound into the DO
• Run-time binding occurs through the CM object and the rels-ext datastream
• The CM can point to a format registry
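To make the idea of validating a DO against CM constraints concrete, here is a minimal Python sketch. It is not Fedora's validator: `TypeModel` and `validate` are hypothetical names, and the two constraints (PDF1 as application/pdf, ARCH1 as application/tar, exactly one of each) are taken from the composite model example elsewhere in these slides.

```python
from dataclasses import dataclass


@dataclass
class TypeModel:
    ds_id: str       # datastream ID the constraint applies to
    mime: str        # required MIME type
    min_count: int   # minimum number of such datastreams
    max_count: int   # maximum number of such datastreams


def validate(datastreams, content_model):
    """Return a list of constraint violations (empty means valid).

    datastreams: list of (ds_id, mime) pairs carried by the digital object.
    """
    errors = []
    for tm in content_model:
        matches = [mime for ds_id, mime in datastreams if ds_id == tm.ds_id]
        if not tm.min_count <= len(matches) <= tm.max_count:
            errors.append(
                f"{tm.ds_id}: expected {tm.min_count}-{tm.max_count}, found {len(matches)}"
            )
        for mime in matches:
            if mime != tm.mime:
                errors.append(f"{tm.ds_id}: MIME {mime} is not {tm.mime}")
    return errors


book_model = [
    TypeModel("PDF1", "application/pdf", 1, 1),
    TypeModel("ARCH1", "application/tar", 1, 1),
]

print(validate([("PDF1", "application/pdf"), ("ARCH1", "application/tar")], book_model))
print(validate([("PDF1", "text/html")], book_model))
```

Returning a list of violations rather than a single boolean lets an ingest tool report every problem with an object in one pass.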

Content Models, Formats, and Disseminators

Book Object
• Persistent ID
• Metadata
• Rels-Ext (cmodel: book)
• Data streams:
  • PDF1 – presentation
  • XML1 – OCR text
  • ARCH1 – Archival master (tiffs of each page)
  • DJVU1 – presentation
  • SMAP1 – StrMap (TOC)

Content Model
• Persistent ID
• Metadata
• Rels-Ext
• Composite Model

Bmech Object
• Persistent ID
• Metadata
• Rels-Ext
• WSDL

Bdef Object
• Persistent ID
• Metadata
• MethodMap

Relationships between the objects: hasCM, hasBmech, hasBdef

<dsCompositeModel>
  <dsTypeModel ID="PDF1" ordered="false" min="1" max="1">
    <form MIME="application/pdf"></form>
  </dsTypeModel>
  <dsTypeModel ID="ARCH1" ordered="false" min="1" max="1">
    <form MIME="application/tar"></form>
  </dsTypeModel>
  ..
</dsCompositeModel>

[Figure: Format Registry with entries tiff, tar, and pdf]


Events and Outcomes

An event is:
• an action that involves at least one object, agent, and/or rights entity (PREMIS)
• an occurrence that is significant to the performance of a task

Event outcome – a situation or state that follows an event and is a result of the event.


Fedora Event Management

Generic Framework
• Events can have messages which are associated with all types of services (preservation, collection, user, etc)
• Messages represent events with actions and outcomes
• Fedora will provide a middleware messaging solution based on open-source Java Message Service (JMS)

Fedora Working Group Focus
• Preservation events are atomic (i.e. associated with a Fedora API)
• The event message will be based on the PREMIS event entity
• Initial types: ingest, delete, modify, fixityCheck


The Event Message

Event message structure
• The message payload will be XML-based and use the PREMIS event entity semantic units
• Global identifiers (URIs) will be used for event type and outcome

An example might look like the following:

<event>
  <eventIdentifier>
    <eventIdentifierType>Rucore event</eventIdentifierType>
    <eventIdentifierValue>30169</eventIdentifierValue>
  </eventIdentifier>
  <eventType>info:premis/preservation/event/ingest</eventType>
  <eventDateTime>2006-07-16T19:20:30</eventDateTime>
  <eventDetail>(to be used for general information)</eventDetail>
  <eventOutcomeInformation>
    <eventOutcome>info:premis/preservation/outcome/success</eventOutcome>
    <eventOutcomeDetail>(more text)</eventOutcomeDetail>
  </eventOutcomeInformation>
  <linkingAgentIdentifier>rutgers-lib:200</linkingAgentIdentifier>
  <linkingAgentIdentifier>rutgers-lib:400</linkingAgentIdentifier>
  <linkingObjectIdentifier>rutgers-lib:4291</linkingObjectIdentifier>
</event>
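A message of this shape can be built and read back with Python's standard library. The sketch below is illustrative only: `build_event` is a hypothetical helper, it omits the two free-text detail elements, and it does not apply the PREMIS namespace or schema validation; the element names and sample values are the ones in the example above.

```python
import xml.etree.ElementTree as ET


def build_event(event_id, event_type, when, outcome, agents, objects):
    """Build a PREMIS-style event element with the fields shown on the slide."""
    event = ET.Element("event")
    ident = ET.SubElement(event, "eventIdentifier")
    ET.SubElement(ident, "eventIdentifierType").text = "Rucore event"
    ET.SubElement(ident, "eventIdentifierValue").text = str(event_id)
    ET.SubElement(event, "eventType").text = event_type
    ET.SubElement(event, "eventDateTime").text = when
    info = ET.SubElement(event, "eventOutcomeInformation")
    ET.SubElement(info, "eventOutcome").text = outcome
    for agent in agents:
        ET.SubElement(event, "linkingAgentIdentifier").text = agent
    for obj in objects:
        ET.SubElement(event, "linkingObjectIdentifier").text = obj
    return event


event = build_event(
    30169,
    "info:premis/preservation/event/ingest",
    "2006-07-16T19:20:30",
    "info:premis/preservation/outcome/success",
    agents=["rutgers-lib:200", "rutgers-lib:400"],
    objects=["rutgers-lib:4291"],
)

# Round-trip through text, as a subscriber receiving the payload would.
received = ET.fromstring(ET.tostring(event, encoding="unicode"))
print(received.findtext("eventType"))
print(received.findtext("eventOutcomeInformation/eventOutcome"))
```

Because event type and outcome are plain URIs, a receiving service can dispatch on them with simple string comparison.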


Event Management - Ingest (Using the publisher/subscriber model)

[Figure labels: User Input; Workflow Management System; XML; Digital Object Ingest; Digital Object; Repository (Fedora); JMS (snd/rcv); JMS Topic Queue holding eventType messages such as ingest and delete; Preservation Service (reporting) with JMS (snd/rcv); Preservation Service (alerting) with JMS (snd/rcv)]
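The publisher/subscriber pattern in this slide can be sketched with a tiny in-memory topic. This is a stand-in written for illustration and is not a JMS client; the `Topic` class, the handler functions, and the two lists standing in for the reporting and alerting services are all hypothetical.

```python
from collections import defaultdict


class Topic:
    """In-memory stand-in for a message topic; not a JMS client."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Deliver the message to every handler registered for this event type.
        for handler in self._subscribers[event_type]:
            handler(payload)


report_log = []   # stands in for the reporting preservation service
alerts = []       # stands in for the alerting preservation service

topic = Topic()
topic.subscribe("ingest", report_log.append)
topic.subscribe("ingest", lambda pid: alerts.append(f"ingested {pid}"))
topic.subscribe("delete", lambda pid: alerts.append(f"deleted {pid}"))

# The repository publishes one ingest event; both ingest subscribers receive it.
topic.publish("ingest", "rutgers-lib:4291")

print(report_log)
print(alerts)
```

The publisher never refers to the services by name, so a new preservation service can be added by subscribing to the topic without changing the ingest code.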