Islandora Overview: PASIG May 2013

Post on 29-Jan-2015

An overview of the Islandora project and open source framework, including sample production sites. Islandora is a digital asset management system that can accommodate any type of data, and is designed for digital library collections, research data, enterprise document management, and more.

Islandora Overview

Mark Leggott, University of PEI/DiscoveryGarden

PASIG - Washington DC May, 2013

Note: Red text indicates a link.

Open Source

Islandora 101

Project Foundations

• Developed at University of PEI (2007)

• UPEI has FT staff and project staff (AIF)

• DiscoveryGarden is commercial services/support company - sustainability

• 25+ staff at DGI, 6 at UPEI

• Both teams maintain/contribute to code

Conceptualizing

Initializing

Creating/Analyzing

Reporting

Formalizing

Popularizing

Research Institutes

Libraries & Archives

Museums

Media Organizations

Health Centres

Government Agencies

Private Companies

Universities & Colleges

NGOs & Non-Profits

Other

Access Collaboration Preservation

E-Mail, Letters, Published Research, Requirements

Meeting Minutes, Grants, Data Collection, Acquisitions

Forms, Data, Cataloguing, Findings, Discussion

Reports, Theses, Datasets, Visualizations

Articles, Curricular Content, Policies, Exhibits

Blogs, Twitter, Newspapers, iTunesU, Flickr

Information Life Cycle

Object Space

User Space

Individual

Group

Department

Museum

University

External

Private Shared Open

Collaborate

Publish

Re-Use

Create

Preservation, Migration, Transformation

Basics

• Drupal + Fedora + other open source = ecosystem

• Flexible UI on top of Fedora + other apps

• Support for 180+ languages via Drupal

• Focus on robust preservation features and services + flexibility in data models and UI

• VM/code, documentation, lists, Camps

Key Components

• Core - Islandora, Tuque, Solr, XML Forms, FITS, Workflow, Solution Packs (SPs)

• SPs - add specific+tested functionality

• Image, Large Image, PDF, Audio, Video, Book/Paged Image Document

• Newspapers, Digital Humanities, IR, Chem

• DuraCloud integration via Vault Module

Drupal UI

HTML

CSS/Themes

User Roles/Permissions

Editorial Workflow

Modules (LDAP/BibUtils/etc.)

Hooks

SPARQL

LDAP

FCK

BibUtils

FormsAPI

ImageAPI

Tabs

XSLTs, PHP/Python Snippets

Micro Services Engine

Tika, Kakadu, SWFTools, OOffice, Djatoka, Tesseract, R, Wowza

Islandora

JMS

Code Snippets/Applications

Process

Any Metadata & Any Data

Solr/Lucene, GSearch, Mulgara, MySQL

Content Models, XACML Policies

Fedora

SPARQL, REST, SOAP

• Fedora Object Model

• Flexibility supports any data model

• Atomistic and compound objects

• Support for RDF allows integration of specific ontologies

imagined:208361 (PID)

Object Properties

Relations (RELS-EXT)

Dublin Core (DC)

Audit Trail (AUDIT)

JP2K Web (JP2)

JP2K Archival (LOSSLES_JP2)

Low Res JPEG (JPG)

Thumbnail (TN)

Descriptive Metadata (MODS)

Object Model - IslandImagined/Large Image

Digital Object Identifier

System Properties: Manage & Track Object

Reserved Datastreams: Key Object Metadata

Datastreams: Aggregates Content Items
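
The object model above can be pictured as a small data structure. The following is a minimal sketch, not Islandora or Tuque code: the class names, the `content_url` helper and the `http://localhost:8080/fedora` base URL are assumptions made for illustration. The PID and datastream IDs come from the slide, and the URL pattern follows the Fedora 3 REST convention for datastream content.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    dsid: str               # datastream ID, e.g. "MODS" or "TN"
    label: str              # human-readable label taken from the slide
    reserved: bool = False  # True for reserved datastreams (key object metadata)


@dataclass
class FedoraObject:
    pid: str                                          # digital object identifier
    properties: dict = field(default_factory=dict)    # system properties (illustrative)
    datastreams: dict = field(default_factory=dict)   # dsid -> Datastream

    def add(self, ds: Datastream) -> None:
        self.datastreams[ds.dsid] = ds

    def content_url(self, base_url: str, dsid: str) -> str:
        """Build a Fedora 3 style REST URL for one datastream's content."""
        if dsid not in self.datastreams:
            raise KeyError(f"{self.pid} has no datastream {dsid!r}")
        return f"{base_url.rstrip('/')}/objects/{self.pid}/datastreams/{dsid}/content"


def build_large_image_object(pid: str) -> FedoraObject:
    """Assemble an object shaped like the Large Image example on the slide."""
    obj = FedoraObject(pid=pid)
    # Reserved datastreams carry key object metadata.
    obj.add(Datastream("RELS-EXT", "Relations", reserved=True))
    obj.add(Datastream("DC", "Dublin Core", reserved=True))
    obj.add(Datastream("AUDIT", "Audit Trail", reserved=True))
    # Ordinary datastreams aggregate the content items.
    obj.add(Datastream("JP2", "JP2K Web"))
    obj.add(Datastream("JPG", "Low Res JPEG"))
    obj.add(Datastream("TN", "Thumbnail"))
    obj.add(Datastream("MODS", "Descriptive Metadata"))
    return obj


if __name__ == "__main__":
    image = build_large_image_object("imagined:208361")
    print(image.content_url("http://localhost:8080/fedora", "TN"))
```

Keeping datastreams in a dictionary keyed by ID mirrors how a client usually addresses them: by PID plus datastream ID.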

• MicroServices

• PHP/Python/Java

• Drives integration of external services for data transformation +

• Log via Fedora audit

• Taverna integration
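
One way to picture the microservices idea is a dispatcher that reacts to repository messages and records what it did. This is a hypothetical sketch only: the message shape (`pid`, `dsid`), the `MicroserviceEngine` class and the handler are invented for illustration, and the in-memory `audit_log` list merely stands in for logging via the Fedora audit trail.

```python
from typing import Callable, Dict, List

# A handler receives (pid, dsid) and returns a short description of its work.
Handler = Callable[[str, str], str]


class MicroserviceEngine:
    """Routes repository messages to registered transformation handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}
        self.audit_log: List[str] = []  # stand-in for the Fedora audit trail

    def register(self, dsid: str, handler: Handler) -> None:
        """Run `handler` whenever a message arrives for datastream `dsid`."""
        self._handlers.setdefault(dsid, []).append(handler)

    def on_message(self, message: dict) -> int:
        """Dispatch one message; return how many handlers ran."""
        pid, dsid = message["pid"], message["dsid"]
        ran = 0
        for handler in self._handlers.get(dsid, []):
            result = handler(pid, dsid)
            self.audit_log.append(f"{pid}/{dsid}: {result}")
            ran += 1
        return ran


def request_thumbnail(pid: str, dsid: str) -> str:
    # Placeholder: a real service would call an external tool here.
    return "TN derivative requested"


if __name__ == "__main__":
    engine = MicroserviceEngine()
    engine.register("JPG", request_thumbnail)
    engine.on_message({"pid": "imagined:208361", "dsid": "JPG"})
    print(engine.audit_log)
```

Registering handlers per datastream ID keeps each external service independent, so adding a new transformation does not touch existing ones.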

• Simple Workflow Module

• Simple approach to Editorial Workflow

• Provides “human” nodes in the services framework

• Upcoming version supports more granular controls and workflow states/actions

• XML Form Builder

• Create a rich form for any XML schema

• Multiple forms for specific schemas

• Control access via security policies

Administration

• Flexible admin options

• Standard Drupal admin functions

• + ability to maintain aspects of Fedora and other apps via Admin interfaces

• Solution Packs increasingly adding greater configuration options

Preservation Services

• Fedora provides robust service framework

• TechDS+DescDS+RightsDS+AuditDSs transformed to a Dynamic PREMIS record

• Adding DuraCloud support via “Vault”

• Adding Archivematica integration as an optional preservation component

Islandora Community

Community

• Estimate 150+ Islandora sites worldwide in production or development

• 500+ people on Google Groups List

• Some projects starting to contribute back

• Libraries make up the bulk of use now, but users include museums, archives, private companies

discoverygarden

• Commercial UPEI spin-off - full service

• Installation, Configuration, Customization

• Support, System Audit, Consulting

• Hosting, Platforms, Vendor partnerships

• Primary codebase contributor

DGI and Oracle

• discoverygarden working with Oracle to test/certify Islandora on Oracle systems

• SAM/QFS optimization for HFS

• Non-profit membership organization

• Provides members with a range of services, including Islandora hosting/setup

• Shared/Individual/Group repositories

• Working with discoverygarden to provide customization services when desired

Code

Releases

• General goal is to release 4 times per year, or now 7-8 with 2 versions

• Latest “Islandora 6” for March

• First full “Islandora 7” for March

• Goal is to release bug fixes for 6, focus on new developments in 7

Islandora 6

• March 2013 Release

• Improved documentation, print book

• XACML Editor, Workflows

• Forms Autocomplete, FITS integration

• Smoother SP Installation

Islandora 7

• 1st full release for Drupal 7

• New admin interface/functions

• All new SPs, SeaDragon, IAV

• Complete integration of Tuque API

• Clip tool for SeaDragon

Contribs

• WARC SP (Nick Ruest, York)

• Administrative Dashboard (Peter MacDonald, Hamilton)

• Relationship Editor/Ontology Management (Giancarlo Birello and Rosie Le Faive)

• Batch Ingester (Colorado Alliance)

• Black Thumbnail Bug (Aaron Collie)

Standard SPs

• Image, Large Image, Audio, Video, Book, PDF, Newspaper

• Includes MODS form, DC mapping, sample data, viewer(s), TechMD extraction

• Solution Pack module makes it easier to create new ones, modify existing

Book SP

• Code simplified and made more modular

• Can enable IA viewer for books, Open Seadragon for page images

• Tesseract OCR support standard

• Page manipulation, PDF creation

Image/Large Image SP

• GIF/PNG/JPG + TIFF/J2K support

• Conversion of TIFF to J2K

• DC + MODS

• Option to use OpenSeadragon viewer

Tools Modules

• FITS Extractor, creates technical metadata

• Batch Import (RIS, EndNote, PubMed, DOI)

• OCR, Tesseract with OCR/HOCR

• MARCXML, ingest and view MARC data

• XACML Editor, rebuilt XML FormBuilder

Bridge

• Upcoming module which will allow tighter integration with Drupal, using Ver 7 Nodes or Entities

• Create content via Drupal - sync’d to Fedora and vice versa

• Facilitating re-creation of the entire repo, including the interface, is a future goal

DropBox

• Alpha module provides sync between DropBox and Islandora

• Creates Collection objects for each folder and a separate file object for each contained file with all relationships

• Provides basic DC record for metadata

• Upcoming for Google Drive, DataFlow ++
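
The folder-to-collection mapping described above can be sketched as a directory walk. This is an illustrative assumption about the approach, not the alpha module's code: the function name, the dictionary keys (`type`, `id`, `member_of`, `dc`) and the use of a relative path as an identifier are all hypothetical.

```python
import os


def plan_objects(root: str) -> list:
    """Walk a synced folder and describe the repository objects to create.

    Each folder becomes a collection object, each file becomes a separate
    file object, and `member_of` records the relationship to the parent
    collection. The `dc` entry is a basic Dublin Core record with a title.
    """
    root = os.path.abspath(root)
    objects = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        rel_dir = os.path.relpath(dirpath, root)
        if rel_dir == ".":
            collection_id, parent = ".", None
        else:
            collection_id = rel_dir.replace(os.sep, "/")
            parent = os.path.dirname(rel_dir).replace(os.sep, "/") or "."
        objects.append({
            "type": "collection",
            "id": collection_id,
            "member_of": parent,
            "dc": {"title": os.path.basename(dirpath)},
        })
        for name in sorted(filenames):
            file_id = name if collection_id == "." else f"{collection_id}/{name}"
            objects.append({
                "type": "file",
                "id": file_id,
                "member_of": collection_id,
                "dc": {"title": name},
            })
    return objects


if __name__ == "__main__":
    for obj in plan_objects("."):
        print(obj["type"], obj["id"], "->", obj["member_of"])
```

Producing a plan first, rather than writing to the repository during the walk, makes it easy to review or test the mapping before anything is ingested.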

Fedora 4

• Fedora Futures project to review & rebuild Fedora for next major release

• Looking to provide better support for large files, large collections and optimized ingest

• Pilot project is using ModeShape as the core repository

• Islandora team already has pilot integration

Roadmap

• More SPs: Research Data, Digital Humanities, Chemistry, Conferences

• Image Annotation tool (Shared Canvas from Stanford - OAC compliant)

• Full Bridge development

• Integration of Microservices + Taverna

Trying Islandora

• Try production sites (list on last slide)

• Play in sandbox.islandora.ca (cleaned daily)

• Download VM from islandora.ca

• Install code referring to documentation

• iCamps: PEI, Europe, Australia, US east+west

• Documentation: Jira, videos, GitHub, Jenkins

Islandora in the Cloud

DuraCloud

• UPEI and DGI committed to supporting DuraCloud in the Islandora interface

• Works with CloudSync as the bridge between Fedora and DuraCloud

• Can be used with or without Islandora managed collections

DGI Examples

• DGI has 5 Islandora clients using DC + backup

• Largest has 2 TB of mostly image J2Ks

• full site (objects/MySQL/Drupal) with DC and DCStool using Continuous mode

• + backed up using Zmanda/S3: D/W/M/Y

• + experimental backup to Glacier

• 3 sites using DC/DCS for full backup of IR

Islandora Vault Module

• New module for managing DC+CS services

• “Vault” component on Manage Tab

• Manage CS sets/tasks for Collections

• View Health Check at Object level (e.g. check for matching checksums)

• Defining default actions for mismatches
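
The object-level health check can be illustrated with a checksum comparison. This is a minimal sketch under stated assumptions: it does not call DuraCloud or CloudSync, the function names and the `"report"` default action are hypothetical, and SHA-256 is chosen only as an example algorithm.

```python
import hashlib


def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of `data` using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def health_check(local_content: dict, recorded_checksums: dict,
                 on_mismatch: str = "report", algorithm: str = "sha256") -> dict:
    """Compare local datastream content against checksums recorded elsewhere.

    local_content:      dsid -> bytes held in the repository
    recorded_checksums: dsid -> hex digest recorded by the backup service
    on_mismatch:        default action attached to any failed check
    """
    report = {}
    for dsid, expected in recorded_checksums.items():
        if dsid not in local_content:
            report[dsid] = {"status": "missing", "action": on_mismatch}
        elif checksum(local_content[dsid], algorithm) == expected:
            report[dsid] = {"status": "ok", "action": None}
        else:
            report[dsid] = {"status": "mismatch", "action": on_mismatch}
    return report


if __name__ == "__main__":
    stored = {"JP2": b"image bytes", "TN": b"thumbnail bytes"}
    recorded = {"JP2": checksum(b"image bytes"), "TN": checksum(b"other bytes")}
    for dsid, result in health_check(stored, recorded).items():
        print(dsid, result)
```

Attaching the default action to each failed check keeps the policy decision (report only, or restore) separate from the comparison itself.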

Vault Setup

Collection Restore

Object Health

Reports

Next Steps

• Tighter integration and more UI functions

• Automated recovery (Auto vs Manual)

• Full Fedora/Collection restore

• Support for private DuraCloud instances

• Add integration with Glacier+

DGI Platforms

• Islandora Platform solutions from discoverygarden released at OR in July

• Initial offerings IR and Digital Collections

• 1-button setup/payment/management

• Additional platforms before end of 2013

• Research Data and Digital Humanities

Sustainability

• Non-profit Islandora Foundation will help maintain code, documentation, training, community participation and more

• Membership model

• Partner - $10K, Board, Resources, Camps

• Collaborator - $4K, Roadmap

• Member - $2K, links

Progress

• Non-profit registered

• UPEI and discoverygarden Partners

• Commitment from other members in 1st month sufficient to hire 1 staff person

• Goal is to have 2 FT staff by Fall 2013

Research Data

Physical Data Model

• UPEI/DGI developing a generic data tool to work with systems researchers use now

• Provide a range of filesystem sync tools

• Minimal service - store data in repository

• Enhance with metadata, transform services

• Project metadata CASRAI/VIVO/CERIF +

Fedora Repository

DescMD, TechMD, AdminMD, Assets

Local File System

DropBox

Box.net

DataStage

Google Drive

Private Cloud

Storage

Generic Research Data SP

(+ Standard SPs, Viewers)

Sync

Extract

Transform

Enrich

Check

Mint, Taverna, DataCite

FITS + Authority

Islandora Generic Research Data Architecture

Islandora Framework

Islandora VRE (Virtual Research Environment)

Islandora IR (Research Articles)

Backups

Regional & National TDRs

Intellectual Data Model

• Smithsonian/DGI developing Sidora system to respond to specific research data needs

• Custom interface, Content Models and Forms, adding Taverna/R integration

• Camera trap images, archaeological data, carbon sequestration data

• File browse interface for all operations

Fedora Repository

DescMD, TechMD, AdminMD, Assets

Image SP + FGDC, DwC

Numeric Data SP + FGDC, DDI

Panama Dig Data + LIDO

Research Articles

Sidora Application

Taverna R

FITS + Authority

The Smithsonian Data Architecture

Islandora Framework

Sidora

Additional

• Domain-specific Solution Packs for 2013

• Digital Humanities

• Chemistry

• Biodiversity

• Taverna+R++ integration

Examples

Institutional Repository

Digital Collections

Research Data

UPEI VRE

• Rich implementation of Islandora

• Used for digital stewardship of research, administrative and learning assets of UPEI

• Over 150 VREs with wide range of features

• VRE Management Team with 4 librarians

• Standard no cost, extra features charged

Consortia

Admin Collections

Links

• General: islandora.ca, discoverygarden.ca, islandora.ca/if, sandbox.islandora.ca, wiki.duraspace.org/display/FF/Fedora+Futures+Home, duracloud.org

• Code: github.com/Islandora, jenkins.discoverygarden.ca, travis-ci.org/Islandora/islandora/pull_requests, wiki.duraspace.org/display/ISLANDORA/Islandora, jira.duraspace.org/browse/ISLANDORA

• Institutional Repositories: islandscholar.ca, digital.march.es/ceacs-ir, digital.grinnell.edu/drupal/, digitalunc.coalliance.org/

• Digital Library Collections: peildo.ca, digital.march.es/clamor, digital.march.es/merce, newspapers.vre.upei.ca, mirc.sc.edu, islandimagined.ca, vre2.upei.ca/pwc/, atmintis.mb.vu.lt/en, unbound.williams.edu

• Research Data: library.upei.ca/vre, www.taverna.org.uk/, vdp.vre3.upei.ca/, modernistcommons.ca, vre2.upei.ca/herbarium/, discoveryspace.upei.ca/parca, discoveryspace.upei.ca/quantumchem/, upeikerrlab.ca

• Consortia: cairnrepo.ca, adrresources.coalliance.org

Note: some of these sites require authentication access - contact Mark for more information.

Questions?

Mark Leggott - University of PEI/discoverygarden

mleggott@upei.ca

Kathleen Van Ekris - discoverygarden

kathleen@discoverygarden.ca