Rutgers Community Repository (RUcore): WMS, RUcore and Fedora Mini-Conference
Date posted: 18-Dec-2015
Rutgers Community Repository (RUCORE)
WMS, RUcore and Fedora Mini-Conference
Wednesday Morning
• Greetings and Introduction – Grace
• Collaboration and Architecture Overview – Ron
• RUcore Data Model – Grace
• WMS Tutorial – Mary Beth, Kalaivani, Sharon
Lunch (box lunch in conference room)
Wednesday Afternoon
• Hands-On Experience – Mary Beth, Kalaivani, Sharon
• Feedback from WMS sessions
• Collaboration Discussion – All
WMS, RUcore and Fedora Mini-Conference
Thursday Morning
• Brief Recap – Ron
• WMS architecture – Yang
• User Interface, Search engine and collections – Chad
• Management services – Ron
Lunch (on your own)
Thursday Afternoon
• Further collaboration discussion
• Wrap-up and next steps
Possible Areas for Collaboration
Data Registries
• File formats
• Content Models
Software Development
• Requirements
• Sharing software
• Joint development
• Life cycle support
Sharing Content
• Exchange, harvesting
• Federated Searching
Fedora Experimentation
• Relationship services
• Directory ingest
• Use of XACML
• Very large files
• Event management
Fedora Enterprise Architecture Major Goals – 2007 thru 2009
Paradigm Focus
• Scholarly Communication Collaboration
• Libraries and Museums Access and Publishing
Infinite Scalability
• Size of and number of objects
• Capacity and throughput (e.g. ingest 20TB a day)
• Life cycle preservation
Trust Model
• Transactions - Begin/Commit
• Transactions across repositories
• Enable graph based objects (compound objects)
Persistence and Layered Architecture
[Diagram: layered architecture. Labels: Data, Repository, Middleware, Applications, Application Programming Interface.]
Layered Architecture - RUcore
• FOXML & Datastreams
• Fedora Core & Framework
• Middleware Services (searching, alerting, integrity, etc.)
• Applications and Portals (NJDH, RUcore, workflow, etc.)
• API
RUcore - How it Works
[Diagram labels: User Input; Faculty Submissions; Dissertations; Metadata and Archival masters; Workflow Management System; XML; Digital Object Ingest; Fedora Repository Service; Digital Object Repository (Fedora); User, Collection, & Preservation Services; RUCORE Portal; NJ Digital Highway; Custom Portals.]
Simple and Compound Objects
Article Object (Simple)
• Persistent ID
• Metadata
• Behaviors (Disseminators)
• Data streams:
    PDF1 – presentation
    XML1 – OCR text
    ARCH1 – Archival master (tiffs of each page)
    DJVU1 – presentation
    SMAP1 – StrMap (TOC)
Compound Object - Graph Model
[Diagram: objects A1 and A2 are each linked to the article object by an IsAnnotationOf relationship.]
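The simple object and the annotation graph can be sketched as plain data structures. This is only an illustration, not the Fedora object model: the class names, the `example:` PIDs and the `annotations_of` helper are assumptions made for the sketch, while the datastream IDs and the IsAnnotationOf relationship come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    ds_id: str   # e.g. "PDF1"
    role: str    # e.g. "presentation"


@dataclass
class DigitalObject:
    pid: str                                          # persistent ID (hypothetical values below)
    metadata: dict = field(default_factory=dict)
    datastreams: list = field(default_factory=list)
    # Outgoing relationships stored as (predicate, target PID) pairs.
    relations: list = field(default_factory=list)


def annotations_of(target_pid, objects):
    """Return the PIDs of objects that declare IsAnnotationOf -> target_pid."""
    return [o.pid for o in objects
            if ("IsAnnotationOf", target_pid) in o.relations]


# Simple object: one article with its datastreams.
article = DigitalObject(
    pid="example:article",
    metadata={"title": "Sample article"},
    datastreams=[
        Datastream("PDF1", "presentation"),
        Datastream("XML1", "OCR text"),
        Datastream("ARCH1", "archival master"),
        Datastream("DJVU1", "presentation"),
        Datastream("SMAP1", "structure map (TOC)"),
    ],
)

# Compound object: two annotations pointing at the article.
a1 = DigitalObject(pid="example:A1",
                   relations=[("IsAnnotationOf", "example:article")])
a2 = DigitalObject(pid="example:A2",
                   relations=[("IsAnnotationOf", "example:article")])

print(annotations_of("example:article", [article, a1, a2]))
```

Keeping the relationships on the annotating objects means the article itself does not have to change when a new annotation is added.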
Collections In RUcore
A digital collection is simply a grouping of objects according to some criteria.
Types of digital collections in RUcore
• Explicit – A digital collection whose object membership is specified explicitly within the descriptive metadata.
• Dynamic – A digital collection of objects which are grouped according to user-specified criteria.
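The difference between the two collection types can be sketched as two membership tests. This is a minimal illustration under stated assumptions: the record layout, the field names (`collections`, `genre`, `dept`) and the `ex:` identifiers are hypothetical, not RUcore's metadata schema.

```python
def explicit_members(collection_id, objects):
    """Objects whose descriptive metadata names the collection explicitly."""
    return [o["pid"] for o in objects
            if collection_id in o.get("collections", [])]


def dynamic_members(criteria, objects):
    """Objects whose metadata matches every user-specified field/value pair."""
    return [o["pid"] for o in objects
            if all(o.get(key) == value for key, value in criteria.items())]


# Hypothetical object records.
objects = [
    {"pid": "ex:1", "collections": ["ex:coll.A"], "genre": "thesis", "dept": "Physics"},
    {"pid": "ex:2", "collections": [], "genre": "thesis", "dept": "Physics"},
    {"pid": "ex:3", "collections": ["ex:coll.A"], "genre": "report", "dept": "History"},
]

print(explicit_members("ex:coll.A", objects))
print(dynamic_members({"genre": "thesis", "dept": "Physics"}, objects))
```

An object can belong to both kinds of collection at once: `ex:1` is an explicit member of `ex:coll.A` and also matches the dynamic criteria.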
Using Explicit and Dynamic Collections
Personal Collections
Department Collections
• Including Faculty Personal collections (e.g. preprints, reports, etc.)
• ETDs for the Department
Centers and Grant Funded Research
• New Jersey Digital Highway
• Center for Remote Sensing and Spatial Analysis (CRSSA) – Access and preservation of GIS resources related to New Jersey
RUcore Collection Architecture
[Diagram. Legend: solid line – explicit membership; dashed line – dynamic membership; circles – collection objects; rectangles – content objects.
Collections shown: Rutgers University; Rutgers University Libraries; RUCORE; NJDH (Grant Project); General Collections; Special Collections; Centers/Departments; New Jersey Historical Society; Eagleton Archive; Roosevelt.
Objects shown: M1, N1, N2, N3, O1, O2, B1, P1, P2.]
Collection Architecture - Lefty
[Diagram. Legend: solid line – explicit membership; dashed line – dynamic membership.
Collections shown: Rutgers University (1782.2); RUL (1782.1); Princeton (1782.1); Penn State (1782.1); N'Western (1782.1); RUCORE; Center/Dept Collections; Department; FacCollOne; FacCollTwo; ETDs (Graduate School); Dept. ETDs; RU ETDs.
Objects shown: D1, D2, D3.]
• http://hdl.rutgers.edu/1782.1/NorthwesternU.collection.165
• http://hdl.rutgers.edu/1782.1/PennStateUniv.collection.164
• http://hdl.rutgers.edu/1782.1/PrincetonUniv.collection.166
Management Services (incl. Collection and Preservation)
Management
• Super-user editing (handles, datastreams, metadata)
• Purging an object
• Export (FOXML, METS)
Collections
• Collection administration
• Statistics
Preservation
• Creation of archival master
• Creation of persistent ID (handle)
• Checksum verification
Management Services
Access to individual objects is provided by a special search portal using the same indexes as the public search but providing Fedora API management functionality:
• Viewing, exporting and/or purging objects
• Editing metadata, adding/changing datastreams
• Validating objects, checking audit trails, testing signatures
There is a special Fedora database search allowing access to all objects whether or not they are members of an active collection.
Collection Administration
Edit collection information
Add parents to a collection
Add dynamic search terms to a collection
Generate an XML structure map
Collections - Indexing and Ingest
Active Collections may be indexed individually or all together at any time, though this is typically done using a nightly cron job.
Ingest is done through the management API and is typically called by the WMS program, but may be called directly from the management interface as well.
Preservation - Alerting
All Fedora API management functions trigger alerting messages, are stored in the Fedora audit trails, and are registered in the collection statistics database.
Statistics are kept for all object downloads as well as editing activities and may be accessed at collection or repository levels.
Preservation – PIDs and Handles
Handles are normally created as part of the ingest process, but may be manually created, changed, or purged on a per object basis using the management interface.
Three global registries for RU
• 1782.1 – Rutgers University Libraries
• 1782.2 – Rutgers University
• 1782.3 – NJ Digital Highway
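As an illustration of how the three prefixes partition the handle space, the sketch below splits a handle URL into prefix and suffix and looks up the owning registry. The function names are assumptions made for this example; it does not talk to a handle server.

```python
from urllib.parse import urlparse

# Prefix-to-registry table taken from the slide.
REGISTRIES = {
    "1782.1": "Rutgers University Libraries",
    "1782.2": "Rutgers University",
    "1782.3": "NJ Digital Highway",
}


def split_handle(url):
    """Split a handle URL into (prefix, suffix)."""
    path = urlparse(url).path.lstrip("/")
    prefix, _, suffix = path.partition("/")
    return prefix, suffix


def registry_for(url):
    """Name the registry that owns the handle, or report it as unknown."""
    prefix, _ = split_handle(url)
    return REGISTRIES.get(prefix, "unknown registry")


url = "http://hdl.rutgers.edu/1782.1/PrincetonUniv.collection.166"
print(split_handle(url))
print(registry_for(url))
```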
Object Integrity – Verifying Checksums
Archival datastreams have SHA1 checksums, created during the WMS pipeline process, as well as file size data stored in the technical metadata section of each object.
SHA1 checksums are tested using sha1sum in conjunction with a management function that polls the repository and extracts the sha1sum strings from the techMD of individual objects or groups of objects. The function has a calendar feature that allows it to be run as a cron job on a subset of objects for each day of the week, with result reports emailed to the appropriate data managers.
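The check described above can be sketched with the Python standard library. This is an assumption-laden illustration, not the RUcore management function: it uses `hashlib` rather than the sha1sum program, and the weekday partitioning rule in `todays_subset` is a hypothetical stand-in for the calendar feature.

```python
import hashlib
import os
import tempfile


def sha1_of(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha1, expected_size):
    """Return True if both the file size and the SHA-1 match the recorded values."""
    return (os.path.getsize(path) == expected_size
            and sha1_of(path) == expected_sha1.lower())


def todays_subset(pids, weekday):
    """Pick the objects assigned to one weekday (0 = Monday .. 6 = Sunday).

    Hypothetical rule: sort the PIDs and deal them out round-robin over 7 days.
    """
    return [pid for i, pid in enumerate(sorted(pids)) if i % 7 == weekday]


# Demonstration on a throwaway file standing in for an archival datastream.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
ok = verify(tmp.name, "a9993e364706816aba3e25717850c26c9cd0d89d", 3)
os.remove(tmp.name)
print(ok)
```

Comparing the file size first is cheap and catches truncated files before any hashing is done.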
Certification as a Trusted Repository*
Ultimately, we want to become certified as a trusted repository. There are four major areas:
A. Organization
B. Repository Functions
C. Designated Community
D. Technologies
* RLG/NARA draft “An Audit Checklist for the Certification of Trusted Digital Repositories”
Repository staff have skills appropriate to their duties.
Repository actively monitors Archival Information Package integrity.
Repository defines its Designated Community.
Repository has technologies to monitor security.
Preservation Services Architecture
[Diagram labels: Preservation Portal; Preservation Services; Alerting; Migration; Statistics; Preservation Monitoring; Preservation Integrity; Event Messaging; Fedora Repository Service; Digital Object Repository; Fedora Service Framework; Content Models; Format Registry; Monitoring; . . .]
Content Models (Content Model Dissemination Architecture – CMDA)
The CM object specifies constraints on the digital object (DO)
• MIME type and format
• Min/max number of datastreams
• Whether multiple datastreams are ordered
The CM is used to determine runtime behavior
• On ingest, Fedora validates the DO based on CM constraints
• Disseminators are not bound into the DO
• Run time binding occurs through the CM object and the rels-ext datastream
• The CM can point to a format registry
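The ingest-time check can be sketched as a small validator. This is a simplified illustration, not Fedora's validation code: `TypeModel`, `validate` and `BOOK_MODEL` are names invented for the sketch, and only the min/max and MIME constraints are modelled (ordering is left out).

```python
from dataclasses import dataclass


@dataclass
class TypeModel:
    ds_id: str          # datastream ID the constraint applies to
    mime: str           # required MIME type
    min_count: int = 1  # minimum number of datastreams with this ID
    max_count: int = 1  # maximum number of datastreams with this ID


def validate(datastreams, type_models):
    """Check (ds_id, mime) pairs against the content model.

    Returns a list of error strings; an empty list means the object conforms.
    """
    errors = []
    for tm in type_models:
        matches = [mime for ds_id, mime in datastreams if ds_id == tm.ds_id]
        if not tm.min_count <= len(matches) <= tm.max_count:
            errors.append(f"{tm.ds_id}: expected {tm.min_count}..{tm.max_count}, "
                          f"found {len(matches)}")
        for mime in matches:
            if mime != tm.mime:
                errors.append(f"{tm.ds_id}: MIME {mime} != {tm.mime}")
    return errors


# A hypothetical content model for a book: one PDF and one tar archival master.
BOOK_MODEL = [
    TypeModel("PDF1", "application/pdf"),
    TypeModel("ARCH1", "application/tar"),
]

good = [("PDF1", "application/pdf"), ("ARCH1", "application/tar")]
bad = [("PDF1", "image/tiff")]
print(validate(good, BOOK_MODEL))
print(validate(bad, BOOK_MODEL))
```

Returning a list of errors rather than raising on the first problem lets an ingest report show every violation at once.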
Content Models, Formats, and Disseminators
[Diagram: four linked objects.
Book Object: Persistent ID; Metadata; Rels-Ext (cmodel: book); Data streams: PDF1 – presentation, XML1 – OCR text, ARCH1 – Archival master (tiffs of each page), DJVU1 – presentation, SMAP1 – StrMap (TOC).
Content Model: Persistent ID; Metadata; Rels-Ext; Composite Model.
Bmech Object: Persistent ID; Metadata; Rels-Ext; WSDL.
Bdef Object: Persistent ID; Metadata; MethodMap.
Relationships: hasCM, hasBmech, hasBdef.]
<dsCompositeModel>
  <dsTypeModel ID="PDF1" ordered="false" min="1" max="1">
    <form MIME="application/pdf"></form>
  </dsTypeModel>
  <dsTypeModel ID="ARCH1" ordered="false" min="1" max="1">
    <form MIME="application/tar"></form>
  </dsTypeModel>
  ..
</dsCompositeModel>
Format Registry: tiff, tar
Events and Outcomes
An event is an:
• . . . action that involves at least one object, agent, and/or rights entity (PREMIS).
• . . . occurrence that is significant to the performance of a task
Event outcome – a situation or state that follows an event and is a result of the event.
Fedora Event Management
Generic Framework
• Events can have messages which are associated with all types of services (preservation, collection, user, etc.)
• Messages represent events with actions and outcomes
• Fedora will provide a middleware messaging solution based on an open-source Java Message Service (JMS) implementation
Fedora Working Group Focus
• Preservation events are atomic (i.e. associated with a Fedora API)
• The event message will be based on the PREMIS event entity
• Initial types: ingest, delete, modify, fixityCheck
The Event Message
Event message structure
• The message payload will be XML-based and use the PREMIS event entity semantic units
• Global identifiers (URIs) will be used for event type and outcome
An example might look like the following:
<event>
  <eventIdentifier>
    <eventIdentifierType>Rucore event</eventIdentifierType>
    <eventIdentifierValue>30169</eventIdentifierValue>
  </eventIdentifier>
  <eventType>info:premis/preservation/event/ingest</eventType>
  <eventDateTime>2006-07-16T19:20:30</eventDateTime>
  <eventDetail>(to be used for general information)</eventDetail>
  <eventOutcomeInformation>
    <eventOutcome>info:premis/preservation/outcome/success</eventOutcome>
    <eventOutcomeDetail>(more text)</eventOutcomeDetail>
  </eventOutcomeInformation>
  <linkingAgentIdentifier>rutgers-lib:200</linkingAgentIdentifier>
  <linkingAgentIdentifier>rutgers-lib:400</linkingAgentIdentifier>
  <linkingObjectIdentifier>rutgers-lib:4291</linkingObjectIdentifier>
</event>
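A subscriber receiving such a message might unpack it roughly as follows. This is a sketch using Python's standard `xml.etree.ElementTree`; the `parse_event` function and the shape of the returned dictionary are assumptions for illustration, and the sample message is an abbreviated copy of the one on the slide.

```python
import xml.etree.ElementTree as ET

EVENT_XML = """<event>
  <eventIdentifier>
    <eventIdentifierType>Rucore event</eventIdentifierType>
    <eventIdentifierValue>30169</eventIdentifierValue>
  </eventIdentifier>
  <eventType>info:premis/preservation/event/ingest</eventType>
  <eventDateTime>2006-07-16T19:20:30</eventDateTime>
  <eventOutcomeInformation>
    <eventOutcome>info:premis/preservation/outcome/success</eventOutcome>
  </eventOutcomeInformation>
  <linkingAgentIdentifier>rutgers-lib:200</linkingAgentIdentifier>
  <linkingAgentIdentifier>rutgers-lib:400</linkingAgentIdentifier>
  <linkingObjectIdentifier>rutgers-lib:4291</linkingObjectIdentifier>
</event>"""


def parse_event(xml_text):
    """Extract the main PREMIS-style fields from an event message."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.findtext("eventIdentifier/eventIdentifierValue"),
        # Keep only the last path segment of the URI, e.g. "ingest".
        "type": root.findtext("eventType").rsplit("/", 1)[-1],
        "when": root.findtext("eventDateTime"),
        "outcome": root.findtext("eventOutcomeInformation/eventOutcome").rsplit("/", 1)[-1],
        "agents": [e.text for e in root.findall("linkingAgentIdentifier")],
        "objects": [e.text for e in root.findall("linkingObjectIdentifier")],
    }


event = parse_event(EVENT_XML)
print(event["type"], event["outcome"], event["objects"])
```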
Event Management - Ingest (Using the publisher/subscriber model)
[Diagram labels: User Input; Workflow Management System; XML; Digital Object Ingest; Digital Object Repository (Fedora) with JMS (snd/rcv); JMS Topic Queue holding <eventType> messages (ingest, delete, ...); Preservation Service (reporting) with JMS (snd/rcv); Preservation Service (alerting) with JMS (snd/rcv).]
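The publisher/subscriber model can be illustrated with a minimal in-process topic. This is not JMS and not Fedora code: the `Topic` class is a stand-in written for this sketch, and the choice of which service listens to which event type is an assumption.

```python
from collections import defaultdict


class Topic:
    """Minimal in-process stand-in for a JMS topic (not real JMS)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        """Register a callback for one event type."""
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, message):
        """Deliver the message to every subscriber of that event type."""
        for callback in self._subscribers[event_type]:
            callback(message)


report_log, alerts = [], []
topic = Topic()

# Assumed wiring: reporting listens to both event types, alerting only to deletes.
topic.subscribe("ingest", report_log.append)
topic.subscribe("delete", report_log.append)
topic.subscribe("delete", alerts.append)

# The repository publishes one message per management action.
topic.publish("ingest", {"object": "rutgers-lib:4291", "outcome": "success"})
topic.publish("delete", {"object": "rutgers-lib:4291", "outcome": "success"})

print(len(report_log), len(alerts))
```

The publisher does not know which services are listening, so a new preservation service can be added by subscribing to the topic without touching the ingest code.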