Fedora Service Framework Simple Queue Services For fulfillment of the Mellon Grant June 29, 2009.

Transcript of Fedora Service Framework Simple Queue Services For fulfillment of the Mellon Grant June 29, 2009.

Page 1:

Fedora Service Framework Simple Queue Services

For fulfillment of the Mellon Grant

June 29, 2009

Page 2:

Simple Queue Services

• Provide a simple, reliable way to connect content-related infrastructure services to:
  – Enable moving notifications and content between services and repositories
  – Perform tasks using decoupled, reusable services
  – Enable easy reuse and repurposing of services as programmable flows
• Inspirations
  – Amazon “Simple Queue Services” (FOSS Implementation)
  – Tom Cramer, Stanford Library “Work Do” workflow (via Hydra)
  – Richard Rogers, MIT Libraries “Cloud Task Replica”
  – NSDL NCORE

Page 3:

Example FSF-SQS Application

[Diagram. Labels: Browser; Custom Ingest Client; Portable Ingest Client; Request Queue; Response Queue; Simple Ingest Service; Validation Service (e.g.); File System or Duraspace or Naked Akubra or Fedora Repository]

Page 4:

Example Chained FSF-SQS Application

[Diagram. Labels: Portable Ingest Client; Request Queue and Response Queue (two pairs); Simple Ingest Service; Appraisal Service (e.g.); Validation Service (e.g.); Staging or Institutional Store; Fedora Repository Service]

Page 5:

Example Replication FSF-SQS Application

[Diagram. Labels: Existing Client; Request Queue and Response Queue (two pairs); Notification Polling Service; Transform Service; Fedora Ingest Service; Metadata; Bitstreams; DSpace; Fedora Repository]

Page 6:

Original FSF Messaging Concept

[Diagram. Labels: Fedora Repository Service; GSearch; OAI; Ingest; SimpleJMS; Service Integration; More…]

First, we are providing simple messaging (via ActiveMQ in Fedora 3.0)
Repository publishes events
Services listen and consume events or other messages
Next, lightweight integration with workflow engine(s); orchestration
Did not get implemented
No message ingest method

Page 7:

Collective Experience

• Domain Characterization (reference Mellon ESB Study):
  – Limited governance structures
  – High developer turnover
  – Rapid environment changes
  – Cost-sensitive
• Examples:
  – RepoMMan and Remap (BPEL)
  – Hydra (three approaches) (Dlib)
  – eSciDoc plus others (Red Hat jBPM)
• Northwestern Books
• Trident Project
• Conclusion:
  – Using full-featured workflow systems will be difficult for the majority of our targeted organizations

Page 8:

Amazon’s Simple Queue Service

• Amazon SQS
• Implemented as a service within Amazon’s Cloud
• Less capable but much simpler than direct JMS
• Limited to an 8K message body with no attachments
• SOAP and Query (aka Web) API
• Messages are durable for 4 days
• Messages are locked while processing

Page 9:

Amazon’s SQS API

• CreateQueue: Create queues for use with your AWS account.
• ListQueues: List your existing queues.
• DeleteQueue: Delete one of your queues.
• SendMessage: Add any data entries to a specified queue.
• ReceiveMessage: Return one or more messages from a specified queue.
• ChangeMessageVisibility: Change the visibility timeout of a previously received message.
• DeleteMessage: Remove a previously received message from a specified queue.
• SetQueueAttributes: Control queue settings like the amount of time that messages are locked after being read so they cannot be read again.
• GetQueueAttributes: See information about a queue like the number of messages in it.
• AddPermission: Add queue sharing for another AWS account for a specified queue.
• RemovePermission: Remove an AWS account from queue sharing for a specified queue.
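The send, receive, lock and delete operations listed above can be illustrated with a minimal in-memory sketch. It is not Amazon's implementation or API: the class name `SimpleQueue`, its method names, and the injectable `clock` parameter are assumptions made for illustration, and this sketch hands out the oldest visible message first, which the real service does not promise.

```python
import itertools
import time


class SimpleQueue:
    """In-memory sketch of SQS-style semantics (hypothetical, not Amazon's code)."""

    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self.visibility_timeout = visibility_timeout
        self._clock = clock              # injectable so timeouts can be simulated
        self._messages = {}              # message id -> [body, invisible_until]
        self._ids = itertools.count(1)

    def send_message(self, body):
        """Add a message to the queue and return its id."""
        message_id = next(self._ids)
        self._messages[message_id] = [body, float("-inf")]
        return message_id

    def receive_message(self):
        """Return (id, body) of the oldest visible message and lock it, or None."""
        now = self._clock()
        for message_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return message_id, entry[0]
        return None

    def change_message_visibility(self, message_id, timeout):
        """Reset the lock on a received message to `timeout` seconds from now."""
        self._messages[message_id][1] = self._clock() + timeout

    def delete_message(self, message_id):
        """Remove a message once processing has succeeded."""
        del self._messages[message_id]


# Simulated clock: a one-element list that the lambda reads on every call.
now = [0.0]
queue = SimpleQueue(visibility_timeout=30.0, clock=lambda: now[0])
queue.send_message("ingest demo:1")
first = queue.receive_message()    # locks the message for 30 seconds
hidden = queue.receive_message()   # nothing visible while the lock holds
now[0] = 31.0                      # the consumer never deleted it, lock expires
again = queue.receive_message()    # the same message is delivered again
queue.delete_message(again[0])     # successful processing removes it
```

The point of the lock is that a consumer that crashes before calling delete does not lose the message; it simply becomes visible to another consumer after the timeout.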

Page 10:

Rogers’ “Cloud Task Replica”

• OR09 Presentation
• Oriented to Cloud characteristics
• Uses lightweight interfaces and queuing, highly-decoupled
• Primarily focuses on replication use cases
• At prototype stage

Page 11:

“CTR” - Roles

• decompose work into distinct replaceable agents
• archive = content home
• replicator = manages copies
• auditor = implements and enforces policy
• role != institution

Page 12:

“CTR” - Process Model

• a message queue for each role
• message post triggers activity asynchronously
• bucket brigade - message is a handoff or acknowledgment
• storage is abstracted
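A rough sketch of this process model, with one queue per role and messages acting as handoffs or acknowledgments, might look like the following. The handler bodies, the message fields (`type`, `package`), and the order archive, replicator, auditor are assumptions for illustration; the slides do not specify them.

```python
from collections import deque

ROLES = ("archive", "replicator", "auditor")
queues = {role: deque() for role in ROLES}   # a message queue for each role
log = []                                     # records what each role did


def post(role, message):
    """Posting a message is the only way to trigger work for a role."""
    queues[role].append(message)


def archive(message):
    if message["type"] == "acknowledgment":
        log.append(f"archive: {message['package']} acknowledged")
        return
    log.append(f"archive: handing off {message['package']}")
    post("replicator", {"type": "handoff", "package": message["package"]})


def replicator(message):
    log.append(f"replicator: copied {message['package']}")
    post("auditor", {"type": "handoff", "package": message["package"]})


def auditor(message):
    log.append(f"auditor: verified {message['package']}")
    post("archive", {"type": "acknowledgment", "package": message["package"]})


HANDLERS = {"archive": archive, "replicator": replicator, "auditor": auditor}


def run_until_idle():
    """Drain every queue; handlers may post further messages as they run."""
    while any(queues.values()):
        for role in ROLES:
            while queues[role]:
                HANDLERS[role](queues[role].popleft())


post("archive", {"type": "request", "package": "pkg-1"})
run_until_idle()
```

Because each role only reads its own queue and posts to another, any one agent can be replaced without the others changing.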

Page 13:

“CTR” - Workflow: Replication

[Diagram. Labels: archive; replicator; auditor; S3]

Page 14:

“CTR” - Message Semantics

• web-standard URI addressing
• entities: packages, ORE maps
• content model agnostic
• entity checksums for integrity
• standard identifiers for actors
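As an illustration only, a message carrying a URI-addressed entity, an actor identifier, and a checksum could be modelled as below. The field names, the example URIs, and the choice of SHA-256 are assumptions; the slides do not name a checksum algorithm or a message format.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    actor: str        # identifier of the sending actor (hypothetical URI)
    entity_uri: str   # URI of the package or ORE map being referred to
    checksum: str     # hex SHA-256 of the entity bytes (algorithm assumed)


def make_message(actor, entity_uri, content: bytes) -> Message:
    """Build a message whose checksum describes the entity's bytes."""
    return Message(actor, entity_uri, hashlib.sha256(content).hexdigest())


def verify(message: Message, content: bytes) -> bool:
    """True if the fetched bytes match the checksum carried in the message."""
    return hashlib.sha256(content).hexdigest() == message.checksum


content = b"example package bytes"
msg = make_message("http://example.org/actors/archive",
                   "http://example.org/packages/1", content)
```

A receiving role would fetch the entity from `entity_uri`, call `verify`, and only then hand off or acknowledge.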

Page 15:

Stanford’s “Work Do” Workflow

• Puts the resource management state inside the Fedora digital object
• Each application reads the object and performs its function
• Able to support both human workflow and BPE
• Uses logical queues to manage workflow (no messaging SW)
• Depends on applications doing the right thing
• Simplifies governance to resource management semantics and representation

Page 16:

“Work Do” - Approach

• Each object in DOR has:
  – locally defined resource-management metadata
  – a special Datastream to describe processing conditions and their state for that object.
• Places work-related information in the object:
  – it can be indexed (using SOLR or other search engines)
  – co-located alongside other useful processing information
  – contains collection and selector identity to mark records ready for a particular process.

Page 17:

“Work Do” – Process Model

• Simple queries are used to:
  – establish logical queues
  – queues define the work ready for a particular robot or human interaction at any given time.
• Queries also provide:
  – ongoing management information about the flow of objects through the system
  – can be exposed as facets in an administrative discovery environment
• Simple REST based interactions based on Fedora service calls are used to identify queues and update state.

Page 18:

“Work Do” – Process Data

A workflow datastream in each object describes processing requirements and status

<workflow id="googleScannedBookWF" status="active" …>
  <process name="register-object" status="completed" attempts="1" />
  <process name="desc-metadata" status="completed" attempts="1" />
  <process name="google-convert" status="completed" attempts="1" />
  <process name="google-download" status="exception" message="Item for barcode 0339518 not found" attempts="3" />
  <process name="create-pages" status="waiting" attempts="0" />
  <process name="ingest" status="waiting" attempts="0" />
  <process name="shelve" status="waiting" attempts="0" />
  <process name="cleanup" status="waiting" attempts="0" />
</workflow>
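To make the idea of a logical queue concrete, the sketch below parses a workflow datastream like the one above and decides whether an object is ready for a given process. It assumes the processes run strictly in document order and that the elided attributes can be ignored; the function names are hypothetical, and Stanford's system is described as using index queries rather than per-object parsing.

```python
import xml.etree.ElementTree as ET

# The datastream from the slide, without the attributes elided there.
WORKFLOW_XML = """
<workflow id="googleScannedBookWF" status="active">
  <process name="register-object" status="completed" attempts="1" />
  <process name="desc-metadata" status="completed" attempts="1" />
  <process name="google-convert" status="completed" attempts="1" />
  <process name="google-download" status="exception"
           message="Item for barcode 0339518 not found" attempts="3" />
  <process name="create-pages" status="waiting" attempts="0" />
  <process name="ingest" status="waiting" attempts="0" />
  <process name="shelve" status="waiting" attempts="0" />
  <process name="cleanup" status="waiting" attempts="0" />
</workflow>
""".strip()


def statuses(xml_text):
    """Map each process name to its current status."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("status") for p in root.findall("process")}


def in_logical_queue(xml_text, process_name):
    """True if the object waits on `process_name` and every earlier step completed."""
    for process in ET.fromstring(xml_text).findall("process"):
        if process.get("name") == process_name:
            return process.get("status") == "waiting"
        if process.get("status") != "completed":
            return False   # an earlier step is unfinished or failed
    return False           # no such process in this workflow


def exceptions(xml_text):
    """Names of processes that need attention, e.g. for an admin view."""
    return [name for name, status in statuses(xml_text).items()
            if status == "exception"]
```

For the object shown, `in_logical_queue(WORKFLOW_XML, "create-pages")` is false because `google-download` is in an exception state, and `exceptions(WORKFLOW_XML)` reports that step.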

Page 19:

FSF-SQS Development Approach

• Merge selected aspects of Amazon, Stanford “Work Do”, and MIT “Cloud Task Replica” approaches
• Enable moving notifications and data between repository services
• Mostly integration of existing FOSS, minimal new build
• Extends existing ActiveMQ implementation
  – Adds tools for moving data
  – Adds additional language bindings likely using Stomp
  – Realizes promise of completing asynchronous messaging
  – Can be extended later to include business rules engine, full workflow
  – Can be extended to Cloud implementations (Amazon, Eucalyptus)
  – Note: No FOSS implementation currently available for Amazon SQS
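Stomp is attractive for language bindings because its wire format is plain text: a command line, header lines, a blank line, the body, and a terminating NUL byte. The sketch below encodes and decodes such a frame. The helper names and the queue name `/queue/fsf.ingest.request` are hypothetical, and a real client would also need the connection handshake and header escaping, which are omitted here.

```python
def encode_frame(command, headers, body=""):
    """Encode a Stomp frame: command, header lines, blank line, body, NUL."""
    lines = [command] + [f"{key}:{value}" for key, value in headers.items()]
    return ("\n".join(lines) + "\n\n" + body + "\x00").encode("utf-8")


def decode_frame(data):
    """Parse one frame produced by encode_frame back into its parts."""
    text = data.decode("utf-8").rstrip("\x00")
    head, _, body = text.partition("\n\n")
    command, *header_lines = head.split("\n")
    headers = dict(line.split(":", 1) for line in header_lines)
    return command, headers, body


# A SEND frame addressed to a hypothetical request queue.
frame = encode_frame("SEND", {"destination": "/queue/fsf.ingest.request"},
                     "info:fedora/demo:1")
```

Since a frame is just bytes on a socket, clients in Python, Ruby, or other languages can talk to the same broker without a JMS library.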

Page 20:

Targeted Use Cases

• Bi-directional replication between Fedora repositories
  – initial and ongoing
  – possibly update
• Unidirectional replication from DSpace to Fedora
  – initial and ongoing
• One-time ingest (ETL) from legacy repositories
• Validation services
• Selected workflows (TBD)

Page 21:

FSF-SQS Implementation

• Would prefer to use FOSS implementation of Amazon SQS interface
• Fallback is to use other products directly
• Under investigation:
  – ActiveMQ integrations including Apache CXF
  – Mule
  – Apache Camel
  – FUSE ESB 4 (Apache ServiceMix – Mellon ESB top recommendation)
• Note: “Bus In the Cloud”
• Note: “Is Eucalyptus ready to be your private cloud?”

Page 22:

Don’t Need to Build

• Messaging (ActiveMQ)
• Language Bindings, Brokers/Gateways (e.g. Stomp)
• ESB (e.g. Camel, Mule) or Workflow (e.g. jBPM, Kepler)
• Most services
• Business integration patterns (but will have to choose)
  – Document (send object, action and content through)
  – Disconnected (temporarily put the content in storage or in Fedora and incrementally perform actions)
  – Notification (events only)

Page 23:

Do Need to Build

• Service Wrappers (or request from community)
• FSF-SQS based on Amazon SQS in ActiveMQ possibly with Mule
• Message payload formats include resource processing state
• DSpace to Fedora extract, transform, transfer and load flow
• Replacement for Diringest service (maybe)
  – Chris Wilper wants this work done
  – Needs to handle content without requiring FOXML wrapper, manifest
  – Good to use Fedora Content Models where feasible
  – Be extensible
  – Needs some common components with FC-REPO WebDAV
  – Support Messaging and Web end-point (brokers/gateways)
• Portable client (partial SIP builder replacement) (maybe)
  – Works both client-side and server-side (consider Python, Ruby, Flex)
  – Works with or without manifest, synchronous and asynchronous
  – Simple, Simple, Simple on-ramp client for entry-level users

Page 24:

Advantages and Drawbacks

• Advantages
  – Messaging is the simplest of the enterprise methods
  – Low risk since simplifying approaches may be taken at many points
  – Has been requested many times by large repository users
  – Immediately useful
  – Fits overall Mellon goals
• Drawbacks
  – Does not include a named “workflow” product though workflow term used by Amazon and others to describe this approach
  – Meat and potatoes type implementation does not excite people

Page 25:

Details

Page 26:

Integrate a Simple Queue Service

• Demonstrates a lightweight ingest pipeline using off-the-shelf open source technology (ActiveMQ with REST brokers/gateways)
• Performs the services selected by the Simple Ingest Service web application
• Work consists mostly of integration tasks with building some service wrappers
• Service code is to be selected only from existing off-the-shelf FOSS
• Provides a model for integration with the Fedora Repository
• The specific products/languages for services to be determined when the use cases and partners are well characterized

Page 27:

FSF-SQS Integration Patterns

• Enterprise Integration Patterns
• Document (object, actions/state and content in message)
• Disconnected (object and content stored in file systems, Akubra, DuraCloud or Fedora during processing, actions/state in message)
• Notification (actions in message, state, object and content elsewhere)
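The three patterns differ only in what travels inside the message and what stays elsewhere. A small sketch, assuming a hypothetical `FlowMessage` shape and a plain dictionary standing in for the store, makes the difference visible:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowMessage:
    actions: list                      # what the receiving service should do
    state: Optional[dict] = None       # processing state, when carried inline
    content: Optional[bytes] = None    # object content, when carried inline
    ref: Optional[str] = None          # where object/content lives otherwise


def document_message(actions, state, content):
    """Document: object, actions/state and content all travel in the message."""
    return FlowMessage(actions=actions, state=state, content=content)


def disconnected_message(actions, state, store, key, content):
    """Disconnected: content is parked in a store; actions/state travel."""
    store[key] = content
    return FlowMessage(actions=actions, state=state, ref=key)


def notification_message(actions, ref):
    """Notification: only the actions travel; everything else is elsewhere."""
    return FlowMessage(actions=actions, ref=ref)


store = {}  # stands in for a file system, Akubra, DuraCloud or Fedora
doc = document_message(["validate"], {"validate": "waiting"}, b"payload")
dis = disconnected_message(["validate"], {"validate": "waiting"},
                           store, "demo:1", b"payload")
note = notification_message(["ingested"], "demo:1")
```

The Document form is simplest for small objects, while the Disconnected and Notification forms keep messages small when content is large, at the cost of a shared store that every service must be able to reach.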

Page 28:

Potential Demonstration Services

• Create derivative forms
• Format conversion
• Verify Checksum
• Virus scanning
• Validate object
• Validate datastream format (and label or check FORMAT_URI and MIME-type)
• Get non-Fedora PID
• Metadata feature services (feature extraction with write into FOXML or datastream)
  – JHOVE
  – iVia (Descriptive metadata generation plus other services)
• Many other services possible but a few key selections should be incorporated leaving room for later additions

Page 29:

Workflow States

• Object State
  – State of a data object at a point in time
  – Can be contained in the object and reflected on
• Process State
  – State of an instance of a processing flow
  – Workflow engines designed to handle this
  – Long running vs. short running
• Event State
  – General notion of “event” is a statement which is reflected on
  – PREMIS-like “preservation event” is more of a process
• Person State
  – Characteristics of a person (actor) with respect to objects, processes, or events
  – (e.g. requirements fulfilled by a Ph.D. student to graduate)

Page 30:

Build a Simple Ingest Service

• Directory/file ingest (Diringest replacement)
• Web application (server-side service)
• Generates FOXML for transferred content
• Supports content models where practical (also needed for WebDAV interface)
• Use lightweight ingest pipeline described below to perform the pre-ingest preparation services

Page 31:

Build a Portable Ingest Client

• Ingest a single file or a directory
• Choose the content model (if any) from menu
• Choose what pre-ingest services to perform on the content from menu
• Works both as a Web App and as a Desktop App
• Communicates by Web (REST) and messaging via broker/gateway
• Later can be extended more towards FedoraShare concept
• Consider scripting framework Python, Ruby, Flex

Page 32:

Content Models

• Hydra Content Models