OAI-PMH harvesting metadata and virtual datastream G.Birello, I.Fucile, V.Giovanetti, A.Perin Open...

Page 1:

OAI-PMH harvesting metadata and virtual datastream

G.Birello, I.Fucile, V.Giovanetti, A.Perin

Open Repository 2013 - Charlottetown, Prince Edward Island (Canada) July 8 - 12

The DIGIBESS repository (economic and social science books from the Piedmont area, Italy) is indexed in the international portals OpenDOAR, BASE, IESR and ROAR. Its books are indexed in the WorldCat Catalogue, PLEIADI and Cultura ITALIA, and will soon be indexed in Europeana.

OAI-PMH

OAI-PMH Data Provider

Metadata Formats: OAI_DC and PICO

#proai.properties
...
driver.fedora.md.formats = oai_dc pico
...
driver.fedora.md.format.oai_dc.dissType = info:fedora/*/DC
driver.fedora.md.format.pico.dissType = info:fedora/*/openbess:dc2picoSdef/dc2pico
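As a rough illustration of what the two dissType settings express, the sketch below expands the `*` wildcard for a single object. It assumes that `*` stands for the object PID; the PID `openbess:123` and the helper name `dissemination_uri` are made up for the example and do not come from the poster.

```python
# Sketch only: expand a dissType pattern from proai.properties for one object.
# The PID used below is hypothetical, not a real DIGIBESS object.

DISS_TYPES = {
    "oai_dc": "info:fedora/*/DC",
    "pico": "info:fedora/*/openbess:dc2picoSdef/dc2pico",
}


def dissemination_uri(metadata_prefix: str, pid: str) -> str:
    """Replace the '*' wildcard in the configured dissType with an object PID."""
    pattern = DISS_TYPES[metadata_prefix]
    return pattern.replace("*", pid, 1)


# oai_dc points at the DC datastream, pico at the dc2pico service method.
print(dissemination_uri("oai_dc", "openbess:123"))
print(dissemination_uri("pico", "openbess:123"))
```

The point of the comparison is that the OAI_DC format resolves to a stored datastream, while the PICO format resolves to a method call on the object.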

BOOK Item

The repository exposes an OAI-PMH interface for external metadata harvesting. Metadata can be disseminated in two formats: OAI_DC and PICO. OAI_DC records are extracted from the object's DC datastream. PICO records are generated on the fly by an object service.
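From a harvester's point of view, the two formats differ only in the `metadataPrefix` argument of the request. The following sketch builds OAI-PMH `ListRecords` request URLs; the base URL is a placeholder, since the poster does not give the DIGIBESS endpoint, and the function name is an assumption for illustration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real DIGIBESS OAI-PMH base URL is not stated in the poster.
BASE_URL = "http://example.org/oaiprovider/"


def list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for one metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # 'set' is an optional OAI-PMH argument for selective harvesting.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# One request per format exposed by the data provider.
for prefix in ("oai_dc", "pico"):
    print(list_records_url(BASE_URL, prefix))
```

A real harvester would also follow `resumptionToken` values in the responses; that part is left out here.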

Page 2:

SERVICE DEPLOYMENT: openbess:dc2picoSdep-bookCModel

BOOK MODEL: islandora:bookCModel

getViewer

DC

RELS-EXT

hasService → openbess:dc2picoSdef

INDEX

PDF

TN

Dc2pico

RELS-EXT

hasModel → fedora-system:ServiceDeployment-3.0

isDeploymentOf → openbess:dc2picoSdef

isContractorOf → islandora:bookCModel

WSDL

address location="http://fc1.to.cnr.it:8080/saxon/"

binding verb="GET"

operation location="SaxonServlet?source=(DC)&style=(XSL)&clear-stylesheet-cache=yes"

METHODMAP

DatastreamInputParm parmName="DC"

DatastreamInputParm parmName="XSL"

MethodReturnType wsdlMsgName="response" wsdlMsgTOMIME="text/xml"

XSL: XSLT for DC to PICO transformation

REFERENCE

web site: http://www.digibess.it

web development: http://dev.digibess.it

An object method is made up of two objects: a Service Definition and a Service Deployment. Model and service objects are connected by semantic relationships. The Service Definition describes the method. The Service Deployment describes how the method is executed: input and output parameters, web service location, and a static XSLT datastream.
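To make the role of the Service Deployment more concrete, the sketch below shows how the WSDL operation location with its `(DC)` and `(XSL)` placeholders could be bound to actual datastream locations before the Saxon servlet is called. It is an illustration of the substitution idea only, not the repository's internal code; the host `repo.example.org`, the object PID `openbess:123` and the datastream URLs are assumptions.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Values taken from the WSDL fragment of the service deployment.
ADDRESS = "http://fc1.to.cnr.it:8080/saxon/"
OPERATION = "SaxonServlet?source=(DC)&style=(XSL)&clear-stylesheet-cache=yes"


def bind_operation(address, operation, inputs):
    """Replace each (NAME) placeholder with the URL-encoded input value."""
    url = address + operation
    for name, value in inputs.items():
        url = url.replace(f"({name})", quote(value, safe=""))
    return url


# Hypothetical datastream locations: DC comes from the book object,
# XSL is the static stylesheet stored in the service deployment object.
inputs = {
    "DC": "http://repo.example.org/fedora/objects/openbess:123/datastreams/DC/content",
    "XSL": "http://repo.example.org/fedora/objects/openbess:dc2picoSdep-bookCModel/datastreams/XSL/content",
}

url = bind_operation(ADDRESS, OPERATION, inputs)
print(url)

# Decoding the query string recovers the original datastream URLs.
decoded = parse_qs(urlsplit(url).query)
print(decoded["source"][0])
```

The DC input varies per book, while the XSL input always points at the same stylesheet, which is why the DSInputLabel for XSL carries a fixed pid.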

SERVICE DEFINITION: openbess:dc2picoSdef

DC

RELS-EXT

METHOD MAP

hasModel → fedora-system:ServiceDefinition-3.0

Method operationName="dc2pico"

DC

DSInputLabel → DC

DSInputLabel → XSL (pid="openbess:dc2picoSdep-bookCModel")

DSINPUTSPEC

Page 3:

Is this your hardware plan?

1. Buy expensive hardware.

2. Pay expensive annual support fees.

3. Worry that you’ll outgrow your hardware.

4. Save up to buy more hardware in 4 years.

Page 4:

You need a new plan. (So did we!)

Digital Repository Infrastructure: Rent or Buy?

Robin Dean & Ed Fugikawa

Alliance Digital Repository

Colorado Alliance of Research Libraries

Save Money

Improve Performance

Prepare for Growth

Page 5:

Clever Crosswalking

Starting point
• Repository: mainly theses
• Research management system

Challenges
• Different metadata granularity
• Good repo workflows but RMS now primary data source

Y. Zhao, K. Shepherd, L. Hayes – The University of Auckland, New Zealand

A. Schweer – Library Consortium of New Zealand

Integrate repository & RMS (once-off sync, then ongoing)

Page 6:

Bending the rules without breaking the repo: Using free RDF description in Fedora Commons repositories

Adam Soroka

University of Virginia

Page 7:

A mash-up of a Japanese Open Repository and a Researcher CV Platform

[Diagram labels: Backend; Repository Cloud Service in Japan; Institutional Repositories; CRUD; SWORD 2.0; Search; OpenSearch; Society Copyright Policy KB; Email alert or SWORD deposit; Researcher CV Platform; Open Repository; 200000+ Users]

Page 8:

Crowdsourcing HCI for the institutional repository

Stephanie Taylor, Critical Eye Comms

Emma Tonkin, University of Bristol

Page 9:

Learn about the NEW hosted service from DuraSpace!

And how you can save yourself time by letting DuraSpace handle managing your DSpace repository software.

DSpaceDirect includes:
• Repository quick-start
• You-pick features
• No-cost upgrades
• Content safeguards
• Anytime data access

There will be aliens! And roses, too!

Page 10:

We propose that new metadata/annotations can be collected from the access logs of an IR's HTTP server. Access-log entries for requests that arrive from search engines contain search queries that relate to the contents of the IR. A logger program sends the obtained metadata to the IR using SWORD.
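The log-analysis step can be sketched roughly as follows. This is not the authors' logger program: it assumes the server writes Combined Log Format lines and that the search engine passes the query in a `q` parameter of the referrer URL. The sample line and the function name are invented for illustration, and the SWORD deposit itself is left out.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches the request path and the referrer field of a Combined Log Format line.
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)

# Invented sample entry: a visitor arriving at a repository item from a search engine.
SAMPLE = (
    '203.0.113.7 - - [09/Jul/2013:10:15:32 +0000] '
    '"GET /handle/123/456 HTTP/1.1" 200 5120 '
    '"http://www.google.com/search?q=open+repository+metadata" "Mozilla/5.0"'
)


def extract_search_terms(line):
    """Return (resource path, search query) or None if the line has no query."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    referrer = urlsplit(match.group("referrer"))
    queries = parse_qs(referrer.query).get("q")
    if not queries:
        return None
    return match.group("path"), queries[0]


print(extract_search_terms(SAMPLE))
```

Each (resource, query) pair found this way is a candidate annotation for the resource; how the pairs are filtered and packaged for deposit is a separate design decision.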

Automatic reproduce metadata from the log of HTTP server

[Diagram labels: HTTP Server Log; Log analysis (query, resource id); New metadata; SWORD; Repository System; Resource metadata]

Toshihiro Aoyama, Yuta Suzuki, Kazu Yamaji

Page 11:

P-CUBE Major Modules

Object Relations

What is P-CUBE?

P-CUBE Data Model Architecture

P-CUBE Lifecycle Actors’ Role

Suntae Kim

Page 12:

Redirecting Web service for ORCID to scholarly systems via the Researcher Name Resolver

Kei Kurakawa and Hideaki Takeda

National Institute of Informatics, Japan

National Researcher identifier

International Researcher identifier

National scholarly systems

Campus Directories

Page 13:

Challenge to Data-intensive science: cooperation of metadata database for upper atmospheric research and author ID

Yuki KOYAMA* et al.

*World Data Center for Geomag., Kyoto & Graduate School of Sci. Kyoto Univ.

DOI

ORCID with Role (e.g., PI, Archive Specialist)

DataCite

Japan Link Center (JaLC)

Granule

Page 14:

Client based interface and proxy server for content re-use framework based on OAI-PMH

Takao Namiki

Hokkaido University

Page 15:

FEDORA COMMONS BASED FRAMEWORK FOR AGGREGATION, REUSE AND DISSEMINATION OF THE DIGITAL CONTENT

Martin Lhoták

Library of the Academy of Sciences

Czech Republic

Open Repositories 2013

Page 16:

FEDORA COMMONS BASED FRAMEWORK FOR AGGREGATION, REUSE AND DISSEMINATION OF THE DIGITAL CONTENT

- open source system for a digital library

- digitization workflow monitoring system

- digital document production and archiving system

http://www.czechdigitallibrary.cz

Page 17:

Collaborative repository to support food and feed safety risk assessment in Europe

Jane Richardson, Lara Congiu, Cristiano Morganti, Patrizia Pirro, Elisa Aiassa, Sadia Noorani, Diane Lefebrve, Didier Verloo

European Food Safety Authority

Page 18:

Defiant Objects

Managing non-standard research outputs in institutional repositories

www.sherpa-leap.ac.uk

Page 19:

Giving them what they want: Using Data Curation Profiles to guide Datastar development

Sarah Wright1, Dianne Dietrich1, Huda Khan1, Wendy Kozlowski1, Leslie McIntosh2, Gail Steinnhart1

1: Cornell University, 2: Washington University in Saint Louis

Page 20:

What did they want?

[Bar chart, axis 0 to 8. Legend: High priority, Medium priority, Low priority, Not a priority, I Don't Know or N/A. Items rated:]

• The ability to apply standardized metadata from your field or discipline to the dataset.
• The ability of the general public to easily find the data set.
• Documentation of any and all changes that were made to the dataset over time.
• A requirement that others cite the data set if they were to use it in their research.
• The ability to enable version control for the data set.
• The ability to track data citations.*
• The ability to cite the dataset in my publications.
• The ability for people to easily discover the dataset using Internet search engines.
• The ability to create a basic, public description of (and provide a link to) my data.*

Page 21:

AN INVESTIGATION INTO JOURNAL RESEARCH DATA POLICIES

jord

The Findings of the JoRD Project

Azhar Hussain, Marianne Bamkin, Paul Sturges*, Jane H Smith, Bill Hubbard

Centre for Research Communications, University of Nottingham

* Loughborough University

OR2013

Page 22:

AN INVESTIGATION INTO JOURNAL RESEARCH DATA POLICIES

jord

What is the JoRD project?

• Scholarly journals are increasingly recommending or requiring as a condition of publication that research data should be made available in an appropriate repository

• Different journals, different requirements and recommendations

• JoRD was a 6 month feasibility study (July-Dec 2012) commissioned by JISC

• Tasked to scope the shape of a potential service to collate and summarise journal research data policies to provide an easy source of reference to understand requirements and recommendations made by journal editorial boards with regard to data sharing

Page 23:

Wendy Watkins

Ernie Boyko

Carleton University

CANADA

Page 24:

Introducing new technology is 20% tech and 80% culture.

Page 25:

Customizing STEM Instruction with Educational Digital Libraries

Open Repositories 2013

[Diagram labels: District; Teacher; Students; Student; Group of Students; Publisher Materials (Purchased by Districts); Teachers’ private materials; Shared materials among teachers; Physics and Astronomy (comPADRE); Digital Library for Earth System Education (DLESE); National Science Digital Library (NSDL); Other Resources]

Page 26:

Customizing STEM Instruction with Educational Digital Libraries

Open Repositories 2013

The Curriculum Customization Service

[Diagram labels: District; Teacher; Students; Student; Group of Students; Publisher Materials (Purchased by Districts); Teachers’ private materials; Shared materials among teachers; Physics and Astronomy (comPADRE); Digital Library for Earth System Education (DLESE); National Science Digital Library (NSDL); Other Resources. Inside the Curriculum Customization Service: NSDL; DLESE; Publisher Materials; Teachers’ private materials; Shared materials]

Page 27:

Link it or Don't Use It

Transitioning Metadata to Linked Data in Hydra

Karen Estlund

Head, Digital Scholarship Center

University of Oregon Libraries

Tom Johnson

Digital Applications Librarian

Oregon State University Libraries

Page 28:

The Repository as Data (Re) User: Hand Curating for Replication

Yale University

Institution for Social and Policy Studies

Limor Peer

How does the ISPS Data Archive re-use data?

How does replication drive curation at the ISPS Data Archive?

Page 29:

http://livelymorgue.tumblr.com

http://reusesymbol.maker.good.is/projects/CedricCummings

From this…

To this…

Page 30:

Page 31:

Phase One of CED2AR Comprehensive Extensible Data Documentation and Access Repository

Block, William; Lagoze, Carl; Brown, Warren; Williams, Jeremy; Vilhuber, Lars; Abowd, John; Arguillas, Florio

Open Repositories 2013, Repository Island, Charlottetown, PEI, Canada, July 8-11, 2013

Page 32:

Cornell Institute for Social and Economic Research

A LEADER IN SOCIAL SCIENCE AND DATA COMPUTING

• $3 million spread over 5 years
• CED2AR is a metadata repository that integrates metadata of multiple versions and derivatives of datasets produced or managed by the U.S. Census Bureau that reside in public and/or restricted spheres.
• Phase one addresses the challenges faced by the NSF-Census Research Network (NCRN) in terms of integrating metadata from disparate sources, such as format disparity, schema disparity, and sparseness of metadata
• See our solution for syncing metadata records from restricted and public-use datasets
• See how we standardized disparate metadata resources (such as metadata from SSB, IPUMS, and ACS)
• See how we switch on and off confidential metadata at the variable and value label
• See our user interface for searching variables across datasets

Open Repositories 2013, Repository Island, Charlottetown, PEI, Canada July 8-11, 2013

Page 33:

Development of the Health Research Data Repository and the TREC Longitudinal Monitoring System

James Doiron, Manager, HRDR, University of Alberta, Canada

Open Repositories 2013

Charlottetown, PEI

Page 34:

Health Research Data Repository (HRDR)

Based in the Faculty of Nursing, University of Alberta

Secure virtual environment for housing and managing health research data and metadata throughout their lifecycle

Supports and promotes health research and multi-disciplinary collaboration

Provides secure remote access to data and a regular suite of analytic software

Clearly operationalized policies and procedures, with user orientation process

Operates on a ‘minimal cost recovery’ basis

Promotes and provides educational opportunities regarding data management based on best practices

Promotes secondary use and re-purposing of research data

Page 35:

Holistically Preserving and Presenting Complex Research Data

Uploading

Modeling

Storage

Accessing

Using RDF

Researchers submit their research data in whatever form it is captured, recorded, or saved. The file structure can be complex and deeply nested. All file formats are accepted.

The file structure is captured and stored in a METS Structural Map. The Structural Map stores the directory names, system identifiers to the files and the file hierarchy n-levels deep.

At this point the files can be stored simply and the underlying storage mechanism does not need to preserve the original resource's hierarchy or directory structure.

Using Fedora the files are stored as separate datastreams and are managed as a single object.

Managing the resource as a single object reduces the burden on the researcher of providing item-level cataloging.

If the resource does require item-level cataloging the RUcore metadata model does support source, technical, and event-based metadata.

Chad Mills

When the resource is accessed through a public interface the existence of the Structural Map indicates that it is a complex resource. Using the Structural Map, JSON and JavaScript an interface is rendered that allows the users to select all or parts of the resource for downloading.

Once selections are made a package file, TAR/ZIP, is generated and streamed back to the user. The package file follows the BagIt file packaging format, a specification published through the Internet Engineering Task Force (IETF).
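For readers unfamiliar with BagIt, the sketch below writes a minimal bag: a `data/` payload directory, a `bagit.txt` declaration and a checksum manifest. It is not RUcore's packaging code. The file names, the choice of SHA-256, and the declared version 0.97 (the draft current around 2013) are assumptions, and the TAR/ZIP streaming step is omitted.

```python
import hashlib
import tempfile
from pathlib import Path


def write_bag(bag_dir: Path, payload: dict) -> None:
    """Write payload files under data/ plus bagit.txt and a SHA-256 manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for relative_name, content in sorted(payload.items()):
        target = data_dir / relative_name
        # Nested directories of the original resource are kept inside data/.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{relative_name}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


# Hypothetical selection of two files from a nested research data set.
with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp) / "example-bag"
    write_bag(bag, {"images/scan01.txt": b"page one", "notes/readme.txt": b"hello"})
    print(sorted(p.relative_to(bag).as_posix() for p in bag.rglob("*") if p.is_file()))
```

Because the manifest lists every payload file with its checksum, the recipient can verify that the selected parts of the resource arrived intact.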

Research project resources are related to the project using RDF statements. Using similar concepts an entire project can be packaged and downloaded based on the project's RDF statements. When a Structural Map is detected it is used on that portion of the package.

The example screenshot in the middle is the user interface when downloading an entire research project from RUcore.

Page 36:

Addressing Impediments to Reuse – The Open Folklore Portal

James Halliday (Programmer/Analyst)

Julie Hardesty (Metadata Analyst)

Jennifer Laherty (Digital Publishing Librarian)

Garett Montanez (Lead Web Architect)

Page 37:

Open Repositories 2013

Nikos Kasioumis, CERN, Digital Library Services
