NISO ResourceSync Training Session

NISO Training: ResourceSync: A Web-Based Resource Synchronization Framework, December 3, 2013. Speakers: Bernhard Haslhofer, Postdoc Research Associate, Department of Computer Science, University of Vienna; Simeon Warner, Information Science, Cornell University. http://www.niso.org/workrooms/onixpl-encoding/

Description

ResourceSync core team members Bernhard Haslhofer and Simeon Warner will present on the ResourceSync specification and provide practical examples and scenarios for its application.

Transcript of NISO ResourceSync Training Session

Page 1: NISO ResourceSync Training Session

NISO Training

ResourceSync: A Web-Based Resource Synchronization Framework

December 3, 2013

Speakers:

Bernhard Haslhofer, Postdoc Research Associate, Department of Computer Science, University of Vienna

Simeon Warner, Information Science, Cornell University

http://www.niso.org/workrooms/onixpl-encoding/

Page 2: NISO ResourceSync Training Session

ResourceSync Webinar, December 3, 2013

ResourceSync: A Web-Based Resource Synchronization Framework

ResourceSync is funded by the Sloan Foundation & JISC

#resourcesync


Page 3: NISO ResourceSync Training Session


This is a short version of the complete ResourceSync tutorial, which is available at

http://www.slideshare.net/OpenArchivesInitiative/resourcesync-tutorial

Page 4: NISO ResourceSync Training Session


ResourceSync Tutorial History

• OAI8, June 2013; Open Repositories, July 2013; JCDL, July 2013; TPDL 2013, September 2013; LITA Forum, November 2013; SWIB, November 2013; …

Presenters

Bernhard Haslhofer, University of Vienna

Simeon Warner, Cornell University

Page 5: NISO ResourceSync Training Session

ResourceSync Tutorial Contributors

Martin Klein, Los Alamos National Laboratory (@mart1nkle1n)

Simeon Warner, Cornell University (@zimeon)

Herbert Van de Sompel, Los Alamos National Laboratory (@hvdsomp)

Robert Sanderson, Los Alamos National Laboratory (@azaroth24)

Richard Jones, Cottage Labs (@cottagelabs)

Michael L. Nelson, Old Dominion University (@phonedude_mln)

Page 6: NISO ResourceSync Training Session


OAI

Herbert Van de Sompel, Martin Klein, Robert Sanderson (Los Alamos National Laboratory)

Simeon Warner (Cornell University)

Bernhard Haslhofer (University of Vienna)

Michael L. Nelson (Old Dominion University)

Carl Lagoze (University of Michigan)

NISO

Todd Carpenter, Nettie Lagace

University of Oxford

Graham Klyne

Lyrasis

Peter Murray

Page 7: NISO ResourceSync Training Session

ResourceSync Technical Group

JISC

Richard Jones

Stuart Lewis

OCLC

Jeff Young

LOCKSS

David Rosenthal

RedHat

Christian Sadilek

Ex Libris Inc.

Shlomo Sanders

Library of Congress

Kevin Ford

Paul Walk

Page 8: NISO ResourceSync Training Session


Timeline, Status of Specification(s)

• August 2013

o Release of ResourceSync framework Core specification, Version 0.9.1

o Public draft of ResourceSync Archives specification released

• September 2013

o Core specification on its way to becoming an ANSI standard

• November 2013

o Internal draft of ResourceSync Notification specification

• January 2014

o Public draft of ResourceSync Notification specification

• Mid 2014

o Core specification becomes ANSI/NISO standard

Page 9: NISO ResourceSync Training Session


Pointers

• Specification

http://www.openarchives.org/rs/

http://www.openarchives.org/rs/resourcesync

http://www.openarchives.org/rs/archives

• List for public comment

https://groups.google.com/d/forum/resourcesync

• Client and simulator code

http://github.com/resync/resync

http://github.com/resync/simulator

Page 10: NISO ResourceSync Training Session


ResourceSync - Agenda

1. ResourceSync: Problem Perspective & Conceptual Approach

2. Motivation & Use Cases

3. Framework Walkthrough

4. Framework (Technical) Details

5. Implementation

6. Q&A


Page 11: NISO ResourceSync Training Session


ResourceSync - Agenda

1. ResourceSync: Problem Perspective & Conceptual Approach

2. Motivation & Use Cases

3. Framework Walkthrough

4. Framework (Technical) Details

5. Implementation

6. Q&A


Page 12: NISO ResourceSync Training Session


Synchronize What?

• Web resources

o things with a URI that can be dereferenced

• Focus on needs of research communication and cultural heritage organizations

o but aim for generality

Page 13: NISO ResourceSync Training Session


Synchronize What?

• Small websites/repositories (a few resources) to large repositories/datasets/linked data collections (many millions of resources)


Page 14: NISO ResourceSync Training Session

Synchronize What?

• Low change frequency (weeks/months) to high change frequency (seconds)

• Synchronization latency and accuracy needs may vary


Page 15: NISO ResourceSync Training Session


Why?

… because lots of projects and services are doing synchronization but have to resort to ad-hoc, case-by-case approaches!

• Project team involved with projects that need this

• Experience with OAI-PMH: widely used in repositories, but

o XML metadata only

o Attempts at synchronizing actual content via OAI-PMH (complex object formats, dc:identifier) were not successful

o Web technology has moved on since 1999

• Devise a shared solution for data, metadata, linked data?


Page 16: NISO ResourceSync Training Session

ResourceSync Problem

• Consideration:

o Source (server) A has resources that change over time: they get created, modified, deleted

o Destinations (servers) X, Y, and Z leverage (some) resources of Source A

• Problem:

o Destinations want to keep in step with the resource changes at Source A: resource synchronization

• Goal:

o Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities

o The approach must scale better than recurrent HTTP HEAD/GET on resources

Page 17: NISO ResourceSync Training Session


Source: Core Synchronization Capabilities

1. Describing content – publish a list of resources available for synchronization to enable Destinations to perform an initial load or catch-up with a Source

2. Packaging content – bundle resources to enable bulk download by destinations

3. Describing changes – publish a list of resource changes to enable destinations to stay synchronized and decrease latency

4. Packaging changes – bundle resource changes for bulk download by destinations

PULL

Page 18: NISO ResourceSync Training Session

Source: Notification Capabilities

To reduce synchronization latency and to optimize the synchronization process, the Source can support:

1. Change Notification

o Notifies about changes to particular resources

o e.g., resource A has been updated | created | deleted

2. Framework Notification

o Notifies about changes to capabilities, i.e., their documents

o e.g., a Change List has been updated | created | deleted

PUSH

Page 19: NISO ResourceSync Training Session


Source: Synchronization Features

1. Discovery of capabilities – support Destinations in discovering all offered capabilities

o Applies to PULL and PUSH capabilities

2. Linking to related resources – provide links from resources subject to synchronization to related resources

o Applies to PULL and PUSH capabilities

Page 20: NISO ResourceSync Training Session


Destination: Synchronization Needs

1. Baseline synchronization – A destination must be able to perform an initial load or catch-up with a source

- avoid out-of-band setup

2. Incremental synchronization – A destination must have some way to keep up-to-date with changes at a source

- subject to some latency; minimal: create/update/delete

- allow catch-up after the destination has been offline

3. Audit – A destination should be able to determine whether it is synchronized with a source

- regarding coverage and accuracy

Page 21: NISO ResourceSync Training Session


ResourceSync - Agenda

1. ResourceSync: Problem Perspective & Conceptual Approach

2. Motivation & Use Cases

3. Framework Walkthrough

4. Framework (Technical) Details

5. Implementation

6. Q&A


Page 22: NISO ResourceSync Training Session


Use Case 1: arXiv Mirroring and Data Sharing

• Repository of scholarly articles in physics, mathematics, computer science, etc.

• > 850k articles

• approx. 1.5 revisions per article on average

• approx. 75k new articles per year

• Each article has full text and a separate metadata record

• approx. 3.8M resources

Page 23: NISO ResourceSync Training Session


Use Case 1: arXiv Mirroring and Data Sharing

• 2,700 updates daily

o at 8pm EST

o Currently using homebrew mirroring solution (running with minor modifications since 1994!)

o occasional rsync (file system-specific, auth issues)

Page 24: NISO ResourceSync Training Session

Use Case 1: arXiv Mirroring and Data Sharing

• GOAL: Keep mirror sites synchronized with daily changes

• WANT:

o high consistency

o moderate latency

o robustness to global network outages (low admin effort)

o ability to verify sync status in case of questions

o low admin effort (i.e. standard approach, standard tools)

o reasonable consistency, latency, efficiency

Page 25: NISO ResourceSync Training Session


Use Case 2: DBpedia Live Duplication

• Average of 2 updates per second

• Low latency desirable => need for a push technology

Page 26: NISO ResourceSync Training Session


ResourceSync - Agenda

1. ResourceSync: Problem Perspective & Conceptual Approach

2. Motivation & Use Cases

3. Framework Walkthrough

4. Framework (Technical) Details

5. Implementation

6. Q&A


Page 27: NISO ResourceSync Training Session


Source Capability 1: Describing Content

In order to advertise the resources that a source wants destinations to know about, it may describe them:

o Publish a Resource List, a list of resource URIs and possibly associated metadata

- Destination GETs the Resource List

- Destination GETs listed resources by their URI

o A Resource List describes the state of a set of resources at one point in time (snapshot)
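A minimal Python sketch of baseline synchronization against a Resource List, for illustration only (it is not the resync client's code). It assumes the Resource List has already been retrieved as a string, and `fetch` is a hypothetical caller-supplied function standing in for the HTTP GET of each listed URI, so the example runs offline:

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def baseline_sync(resource_list_xml: str,
                  fetch: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Baseline synchronization: GET every resource named in a Resource List.

    `fetch` is a hypothetical stand-in for an HTTP GET; it is injected so
    that the sketch does not depend on the network.
    """
    root = ET.fromstring(resource_list_xml)
    local_copy: Dict[str, bytes] = {}
    # Each <url><loc> holds the URI of one resource subject to synchronization.
    for loc in root.iter(f"{{{SITEMAP_NS}}}loc"):
        uri = loc.text.strip()
        local_copy[uri] = fetch(uri)
    return local_copy


example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/res1</loc></url>
  <url><loc>http://example.com/res2</loc></url>
</urlset>"""

mirror = baseline_sync(example, fetch=lambda uri: f"content of {uri}".encode())
print(sorted(mirror))
```

Injecting `fetch` keeps the parsing logic separate from transport, so the same function could be driven by a real HTTP client or by a test double.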

Page 28: NISO ResourceSync Training Session


Page 29: NISO ResourceSync Training Session


Page 30: NISO ResourceSync Training Session


Source Capability 2: Packaging Content

By default, content is transferred in response to a GET issued by a destination against a URI of a source’s resource. But a source may support additional mechanisms:

o Publish a Resource Dump, a document that points to packages of resource representations and necessary metadata

- Destination GETs the package

- Destination unpacks the package

- ZIP format supported

o A Resource Dump and the packages it points to reflect the state of a set of resources at one point in time (snapshot)
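The destination-side unpacking step could look like the following Python sketch, which uses the standard `zipfile` module. It assumes the package carries its manifest as `manifest.xml` at the package root and that each `<rs:md path=…>` value locates a bitstream relative to that root, as in the Resource Dump Manifest example in this session. The function name and the in-memory test package are hypothetical:

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"


def unpack_dump_package(package: bytes) -> Dict[str, bytes]:
    """Map each resource URI listed in a dump package's manifest to its bitstream.

    Assumption: the manifest is stored as 'manifest.xml' at the package root.
    """
    result: Dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        manifest = ET.fromstring(zf.read("manifest.xml"))
        for url in manifest.iter(f"{SM}url"):
            uri = url.find(f"{SM}loc").text.strip()
            md = url.find(f"{RS}md")
            # path is relative to the package root, e.g. "/resources/res1";
            # ZIP member names carry no leading slash.
            member = md.get("path").lstrip("/")
            result[uri] = zf.read(member)
    return result


# Build a tiny package in memory to exercise the function.
MANIFEST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump-manifest" at="2013-01-02T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <rs:md type="text/html" path="/resources/res1"/>
  </url>
</urlset>"""

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("manifest.xml", MANIFEST)
    zf.writestr("resources/res1", "<html>res1</html>")

print(unpack_dump_package(buf.getvalue()))
```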

Page 31: NISO ResourceSync Training Session


Page 32: NISO ResourceSync Training Session


Page 33: NISO ResourceSync Training Session


Source Capability 3: Describing Changes

In order to achieve lower latency and/or greater efficiency, a source may communicate about changes to its resources:

o Publish a Change List, a list of recent change events (created, updated, deleted resource)

- Destination acts upon change events, e.g. GETs created/updated resources, removes deleted resources

o A Change List pertains to resources that changed in a temporal interval with a start- and an end-date

- If a resource changed more than once, it will be listed more than once
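A Python sketch of how a destination might act on Change List events. It assumes the events have already been extracted as `(uri, change)` pairs in chronological order and that `fetch` is a hypothetical caller-supplied GET function. Processing the events in order means that, when a resource is listed more than once, its last change determines the outcome:

```python
from typing import Callable, Dict, Iterable, Tuple


def apply_changes(local: Dict[str, bytes],
                  events: Iterable[Tuple[str, str]],
                  fetch: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Apply (uri, change) events, in chronological order, to a local copy."""
    for uri, change in events:
        if change in ("created", "updated"):
            local[uri] = fetch(uri)        # GET the current representation
        elif change == "deleted":
            local.pop(uri, None)           # remove; tolerate URIs never copied
        else:
            raise ValueError(f"unknown change type: {change}")
    return local


local = {"http://example.com/res1": b"old"}
events = [
    ("http://example.com/res2", "created"),
    ("http://example.com/res1", "updated"),
    ("http://example.com/res2", "deleted"),  # res2 listed twice; last event wins
]
apply_changes(local, events, fetch=lambda uri: b"new")
print(local)
```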

Page 34: NISO ResourceSync Training Session


Page 35: NISO ResourceSync Training Session


Page 36: NISO ResourceSync Training Session


Page 37: NISO ResourceSync Training Session


Source Capability 4: Packaging Changes

In order to reduce the number of requests to obtain resource changes, a source may provide packaged bitstreams for changed resources:

o Publish a Change Dump, a document that points to packages containing bitstreams of recently changed resources and necessary metadata

- Destination GETs the package

- Destination unpacks the package

- ZIP format supported

o A Change Dump and its packages pertain to resources that changed in a temporal interval with a start- and an end-date

- If a resource changed more than once, it will be included more than once

Page 38: NISO ResourceSync Training Session


Page 39: NISO ResourceSync Training Session

Destination: Key Processes

Page 40: NISO ResourceSync Training Session


ResourceSync - Agenda

1. ResourceSync: Problem Perspective & Conceptual Approach

2. Motivation & Use Cases

3. Framework Walkthrough

4. Framework (Technical) Details

5. Implementation

6. Q&A


Page 41: NISO ResourceSync Training Session

So Many Choices

Figure: existing synchronization approaches and technologies (diagram labelled Push and Pull): XMPP, AtomPub, SDShare, RSS, Atom, PubSubHubbub, Sitemap, rsync, OAI-PMH, WebDAV Col. Syn., OAI-ORE, DSNotify, RDFsync, Crawl, SWORD, SPARQLpush

Page 42: NISO ResourceSync Training Session


Page 43: NISO ResourceSync Training Session


Page 44: NISO ResourceSync Training Session


Sitemap Format

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
  </url>
  …
</urlset>
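For illustration, a Sitemap like the one above can be produced with Python's standard `xml.etree.ElementTree`. This sketch takes `(loc, lastmod)` pairs and emits a `<urlset>` in the default Sitemap namespace; the function name is hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: Iterable[Tuple[str, str]]) -> str:
    """Serialize (loc, lastmod) pairs as a Sitemap <urlset> document."""
    # Register the Sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_sitemap([
    ("http://example.com/res1", "2013-01-02T13:00:00Z"),
    ("http://example.com/res2", "2013-01-02T14:00:00Z"),
])
print(xml_text)
```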

Page 45: NISO ResourceSync Training Session


ResourceSync Sitemap Extensions

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln …/>
  <rs:md …/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:ln …/>
    <rs:md …/>
  </url>
  <url>
    …
  </url>
</urlset>

Page 46: NISO ResourceSync Training Session


Related Resource Metadata Summary

• Attributes of the <rs:ln> element: cf. resource metadata, plus pri

Element/Attribute | Description | Defined by
<rs:ln> | | ResourceSync
encoding | HTTP Content-Encoding header value | RFC2616
hash | One or more content digests (md5, sha-1, sha-256) | Atom Link Ext.
href | Related resource URI (identity) | RFC4287
length | HTTP Content-Length header value | RFC4287
modified | Timestamp of last change (cf. <lastmod>) | Atom Link Ext.
path | Path in ZIP package (Dump Manifests only) | ResourceSync
pri | Priority of link | RFC6249
rel | Relation: IANA-registered or URI | RFC4287
type | HTTP Content-Type header value | RFC4287

Page 47: NISO ResourceSync Training Session

Resource Metadata Summary

Element/Attribute | Description | Defined by
<loc> | Resource URI (identity) | sitemaps
<lastmod> | Timestamp of last change | sitemaps
<changefreq> | Expected update frequency | sitemaps
<rs:md> | | ResourceSync
change | Change type (Change List & Change Dump Manifest only) | ResourceSync
encoding | HTTP Content-Encoding header value | RFC2616
hash | One or more content digests (md5, sha-1, sha-256) | Atom Link Ext.
length | HTTP Content-Length header value | RFC4287
path | Path in ZIP package (Dump Manifests only) | ResourceSync
type | HTTP Content-Type header value | RFC4287
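As a sketch of how a destination might use the `hash` and `length` attributes, the Python below computes and checks digests with the standard `hashlib` module. It assumes the `hash` value is a whitespace-separated list of `algorithm:hexdigest` tokens using the three algorithm names in the table; the function names are hypothetical:

```python
import hashlib
from typing import Iterable, Optional

# Digest names as written in the hash attribute, mapped to hashlib names.
ALGORITHMS = {"md5": "md5", "sha-1": "sha1", "sha-256": "sha256"}


def hash_attribute(content: bytes, algs: Iterable[str] = ("md5", "sha-256")) -> str:
    """Build a hash attribute value of the form 'md5:<hex> sha-256:<hex>'."""
    return " ".join(
        f"{alg}:{hashlib.new(ALGORITHMS[alg], content).hexdigest()}" for alg in algs
    )


def verify(content: bytes, hash_value: str, length: Optional[str] = None) -> bool:
    """Check downloaded content against hash (and optional length) metadata."""
    if length is not None and int(length) != len(content):
        return False
    for token in hash_value.split():
        alg, _, digest = token.partition(":")
        if hashlib.new(ALGORITHMS[alg], content).hexdigest() != digest:
            return False
    return True


data = b"hello"
value = hash_attribute(data)
print(value)
print(verify(data, value, length=str(len(data))))
```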

Page 48: NISO ResourceSync Training Session

Link Relation Summary

Relation | Use in ResourceSync | Defined in
rel="alternate" | Link from generic to specific URI | HTML 5
rel="canonical" | Link from specific to generic URI | RFC6596
rel="collection" | Resource is member of collection | RFC6573
rel="contents" | Link from dump to manifest | HTML4
rel="describedby" | Has metadata | Protocol for Web Description Resources (POWDER): Description Resources
rel="describes" | Is metadata for | The 'describes' Link Relation Type
rel="duplicate" | Mirror or alternative copy | RFC6249
rel=".../rs/terms/patch" | A patch: efficient change information | This specification
rel="memento" | Link to time-specific URI | Memento Internet Draft
rel="timegate" | Link to timegate | Memento Internet Draft
rel="via" | Provenance chain, came from | RFC4287

Page 49: NISO ResourceSync Training Session


Resource List

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"
         at="2013-01-03T09:00:00Z"
         completed="2013-01-03T09:01:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="8876"
           type="text/html"/>
  </url>
  <url>
    …
  </url>
</urlset>
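A minimal Python sketch that reads the Resource List above, separating the list-level `<rs:md>` (capability, `at`, `completed`) from the per-resource metadata. The function name is hypothetical, and only the element and attribute names shown in the example are handled:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"


def parse_resource_list(xml_text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Return (list-level <rs:md> attributes, one metadata dict per resource)."""
    root = ET.fromstring(xml_text)
    # The <rs:md> directly under <urlset> describes the list itself;
    # find() only searches direct children, so per-URL <rs:md> is not matched.
    list_md = root.find(f"{RS}md")
    header = dict(list_md.attrib) if list_md is not None else {}
    resources: List[Dict[str, str]] = []
    for url in root.findall(f"{SM}url"):
        entry = {"loc": url.findtext(f"{SM}loc", "").strip()}
        lastmod = url.findtext(f"{SM}lastmod")
        if lastmod is not None:
            entry["lastmod"] = lastmod.strip()
        md = url.find(f"{RS}md")
        if md is not None:
            entry.update(md.attrib)  # e.g. hash, length, type
        resources.append(entry)
    return header, resources


RESOURCE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2013-01-03T09:00:00Z"
         completed="2013-01-03T09:01:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="8876"
           type="text/html"/>
  </url>
</urlset>"""

header, resources = parse_resource_list(RESOURCE_LIST)
print(header["at"], header["completed"])
print(resources[0]["loc"], resources[0]["length"])
```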

Page 50: NISO ResourceSync Training Session


Resource List

• Describe the Source’s resources that are subject to synchronization

o At one point in time (snapshot)

o Creation can take some time; duration can be conveyed

• Typical Destination use: Baseline Synchronization, Audit

• Each URI typically listed only once

• Might be expensive to generate

• Destinations use @at to determine freshness

• [@at, @completed]: interval of uncertainty

• Destination issues GETs against URIs to obtain resources

• Very similar to current Sitemaps
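The Audit use can be sketched as a comparison between what the Resource List advertises and what the destination holds. In this hypothetical Python sketch both sides are plain `{uri: hash}` dictionaries; the digests are placeholders, and a real destination would build them from the parsed Resource List and its own store:

```python
from typing import Dict, List


def audit(source: Dict[str, str], local: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare {uri: hash} maps from the Resource List and the local copy."""
    shared = set(source) & set(local)
    return {
        # listed at the Source but absent locally (coverage problem)
        "missing": sorted(set(source) - set(local)),
        # held locally but no longer listed at the Source
        "extra": sorted(set(local) - set(source)),
        # present on both sides with different content (accuracy problem)
        "mismatched": sorted(u for u in shared if source[u] != local[u]),
    }


# Placeholder digests, for illustration only.
source = {"http://example.com/res1": "md5:aaa", "http://example.com/res2": "md5:bbb"}
local = {"http://example.com/res1": "md5:xxx", "http://example.com/res3": "md5:ccc"}
report = audit(source, local)
in_sync = not any(report.values())
print(report, in_sync)
```

Because the Resource List describes a snapshot taken during [@at, @completed], a discrepancy for a resource that changed after that interval is not necessarily an error; a later Change List can confirm or resolve it.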

Page 51: NISO ResourceSync Training Session


Resource Dump

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump" at="2013-01-02T09:00:00Z"/>
  <url>
    <loc>http://example.com/resourcedump_part1.zip</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md length="97553" type="application/zip"/>
    <rs:ln rel="contents"
           href="http://example.com/resourcedump_manifest-part1.xml"
           type="application/xml"/>
  </url>
  <url>
    <loc>http://example.com/resourcedump_part2.zip</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
</urlset>

Page 52: NISO ResourceSync Training Session


Resource Dump Manifest

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump-manifest" at="2013-01-02T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md type="text/html" path="/resources/res1"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md type="application/pdf" path="/resources/res2"/>
  </url>
</urlset>

Page 53: NISO ResourceSync Training Session


Resource Dump

• A Resource Dump points to packages (ZIP files) that contain representations of the Source’s resources

o At one point in time (snapshot)

• Resource Dump is mandatory, even if there is only one ZIP file

• ZIP package contains manifest, listing contained bitstreams

• Typical Destination use: Baseline Synchronization, bulk download

• Each URI typically listed only once

• Might be expensive to generate

• Destinations use @at to determine freshness

• [@at, @completed]: interval of uncertainty

• GETs against individual URIs from the Resource List achieve the same result (ignoring varying freshness)

Page 54: NISO ResourceSync Training Session


Change List

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist"
         from="2013-01-02T09:00:00Z"
         until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"
           hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="8876"
           type="text/html"/>
  </url>
  <url>
    …
  </url>
</urlset>

Page 55: NISO ResourceSync Training Session


Change List

• A Change List pertains to a Source’s resources that changed

o Changes that occurred during a temporal interval with start- and end-date

• Typical Destination use: Incremental Synchronization, Audit

• Changes are listed in chronological order

• Multiple changes to one resource result in the resource being listed multiple times, once per change

• Source determines duration of temporal interval

• Destinations use @from and @until to determine freshness

• Destinations issue GETs against URIs to obtain changed resources
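A hedged Python sketch of reading a Change List: it extracts `(uri, change, lastmod)` events, checks that each falls within [@from, @until], and sorts them chronologically as a safeguard. It assumes full timestamps ending in `Z`, as in the example; the function names are hypothetical, and the second entry (a deletion of `res2`) is invented only to show the ordering:

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List, Tuple

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"

Event = Tuple[str, str, datetime]  # (uri, change, lastmod)


def parse_w3c_datetime(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.strip().replace("Z", "+00:00"))


def parse_change_list(xml_text: str) -> Tuple[datetime, datetime, List[Event]]:
    root = ET.fromstring(xml_text)
    md = root.find(f"{RS}md")
    start = parse_w3c_datetime(md.get("from"))
    end = parse_w3c_datetime(md.get("until"))
    events: List[Event] = []
    for url in root.findall(f"{SM}url"):
        when = parse_w3c_datetime(url.findtext(f"{SM}lastmod"))
        if not start <= when <= end:
            raise ValueError(f"change outside [from, until]: {when}")
        events.append((url.findtext(f"{SM}loc").strip(),
                       url.find(f"{RS}md").get("change"),
                       when))
    # The list should already be chronological; a stable sort is a cheap safeguard.
    events.sort(key=lambda event: event[2])
    return start, end, events


CHANGE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist"
         from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated" hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="8876" type="text/html"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T11:00:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>"""

start, end, events = parse_change_list(CHANGE_LIST)
for uri, change, when in events:
    print(when.isoformat(), change, uri)
```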

Page 56: NISO ResourceSync Training Session


Discovery of Capabilities

Requirements:

• Need to discover capabilities, i.e. Resource List, Resource Dump, Change List, Change Dump, Archives, Notification channels

• Need to know the type of capability each document represents

Approach:

• The Source publishes a Capability List that enumerates the capabilities it supports, by pointing at the Resource List, Change List, Resource Dump, etc. using appropriate relation types, e.g. “resourcelist”, “changelist”, “resourcedump”, etc.

http://www.openarchives.org/rs/resourcesync#CapabilityList

Page 57: NISO ResourceSync Training Session


Discovery of Capabilities

Page 58: NISO ResourceSync Training Session


Capability List

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="capabilitylist"/>
  <url>
    <loc>http://example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/resourcedump.xml</loc>
    <rs:md capability="resourcedump"/>
  </url>
</urlset>
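A short Python sketch of how a destination might read a Capability List into a lookup table from capability type to document URI. It assumes, as in the example, at most one document per capability type; the function name is hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import Dict

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"


def parse_capability_list(xml_text: str) -> Dict[str, str]:
    """Map each advertised capability (e.g. 'resourcelist') to its document URI."""
    root = ET.fromstring(xml_text)
    md = root.find(f"{RS}md")
    if md is None or md.get("capability") != "capabilitylist":
        raise ValueError("document is not a Capability List")
    capabilities: Dict[str, str] = {}
    for url in root.findall(f"{SM}url"):
        capability = url.find(f"{RS}md").get("capability")
        capabilities[capability] = url.findtext(f"{SM}loc").strip()
    return capabilities


CAPABILITY_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="capabilitylist"/>
  <url>
    <loc>http://example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/resourcedump.xml</loc>
    <rs:md capability="resourcedump"/>
  </url>
</urlset>"""

caps = parse_capability_list(CAPABILITY_LIST)
print(caps["changelist"])
```

A destination would typically use the Resource List (or Resource Dump) entry for baseline synchronization and then poll the Change List entry for incremental updates.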

Page 59: NISO ResourceSync Training Session

Discovery of Capability Lists

Requirements:

• Need to discover a Capability List

Approaches:

• Introduce a link in the HTTP Link header of a resource that is subject to synchronization, pointing at the Capability List with the relation type “resourcesync”

• Introduce a link from an HTML document that is subject to synchronization (<head> section), pointing at the Capability List with the relation type “resourcesync”

• Link from a Resource List, etc. to the Capability List with the relation type “up”

Link header on example.com/res1.pdf:

Link: <example.com/dataset1/capabilitylist.xml>; rel="resourcesync"
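Finding the Capability List from an HTTP `Link` header can be sketched in Python as follows. The parser is a deliberate simplification (hypothetical, not a full RFC 5988 implementation): quoted parameter values that contain `,` or `;` are not handled. A relative target, such as the one in the example above, would still need to be resolved against the resource's URI, for instance with `urllib.parse.urljoin`:

```python
import re
from typing import List

# <target> followed by zero or more ";param" parts, up to the next comma.
# Simplification: quoted parameter values containing ',' or ';' are not handled.
LINK_VALUE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")


def links_with_rel(header_value: str, rel: str) -> List[str]:
    """Return targets of Link header entries whose rel includes `rel`."""
    targets = []
    for match in LINK_VALUE.finditer(header_value):
        target, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            # rel may hold several space-separated relation types
            if name.strip().lower() == "rel" and rel in value.strip().strip('"').split():
                targets.append(target)
    return targets


header = ('<http://example.com/dataset1/capabilitylist.xml>; rel="resourcesync", '
          '<http://example.com/res1.xml>; rel="describedby"')
print(links_with_rel(header, "resourcesync"))
```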

Page 60: NISO ResourceSync Training Session

Discovery via robots.txt

• Resource Lists are (enhanced) Sitemaps

• Sitemaps can be discovered via robots.txt

• Ergo, Resource Lists should be discoverable via robots.txt

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Sitemap: http://example.com/dataset1/resourcelist.xml
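Python's standard library can already read this directive: `urllib.robotparser.RobotFileParser.site_maps()` (Python 3.8 and later) returns the `Sitemap:` URLs found in a robots.txt. This sketch parses the example file above from a string instead of fetching it:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Sitemap: http://example.com/dataset1/resourcelist.xml
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())  # parse text already in hand; no network access
print(robots.site_maps())              # list of Sitemap URLs, or None if there are none
```

In a live setting one would instead call `set_url()` with the site's robots.txt URL followed by `read()`, then treat each returned URL as a candidate Resource List.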

Page 61: NISO ResourceSync Training Session


Framework Structure

Page 62: NISO ResourceSync Training Session

Motivation for Notifications

• Reduce synchronization latency by having the Source push out resource change information

o To avoid continuous pull of Change Lists by Destinations

• Share information about changes to the Source’s ResourceSync implementation, e.g. announcement of new Resource List, new Capability List, etc.

o To avoid continuous polling of e.g. Resource Lists, ResourceSync Description

Page 63: NISO ResourceSync Training Session

Source: Notification Capabilities

1. Change Notification

o Notifies about changes to particular resources

o e.g., resource A has been updated | created | deleted

2. Framework Notification

o Notifies about changes to capabilities, i.e., their documents

o e.g., a Change List has been updated | created | deleted

o Also for Capability Lists and Source Description

PUSH

Page 64: NISO ResourceSync Training Session

Notification Channels

• Notifications sent via channels

o Resource Notification: one channel per set of resources

o Framework Notification: one channel per set of resources

• Sent on level of capability document, not on index-level

• Notifications about changes to Source Description sent on all Framework Notification channels

• Payload for notifications: <urlset> documents

• Transport protocol for notifications under discussion:

o PubSubHubbub (https://pubsubhubbub.googlecode.com/git/pubsubhubbub-core-0.4.html): current choice

o WebSockets (http://tools.ietf.org/html/rfc6455): may be added later

Page 65: NISO ResourceSync Training Session


Change Notification Payload

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="up" href="http://example.com/dataset1/capabilitylist.xml"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T09:07:00Z</lastmod>
    <rs:md change="updated"
           hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="8876"
           type="text/html"/>
  </url>
  <url>
    …
  </url>
</urlset>

Page 66: NISO ResourceSync Training Session


ResourceSync - Agenda

1. ResourceSync: Problem Perspective & Conceptual Approach

2. Motivation & Use Cases

3. Framework Walkthrough

4. Framework (Technical) Details

5. Implementation

6. Q&A


Page 67: NISO ResourceSync Training Session


DSpace support for metadata harvesting use case

Example resource URL: http://mydspace.edu/dspace-rs/resource/123456789/7/qdc (ResourceSync webapp: dspace-rs; item handle: 123456789/7; metadata format: qdc)

DSpace Module: https://github.com/CottageLabs/DSpaceResourceSync

PHP client: https://github.com/stuartlewis/resync-php

Page 68: NISO ResourceSync Training Session


ResourceSync @ arXiv

• Use ResourceSync for both mirroring and public data access

o efficient updates

o ability to do periodic audits

o public synchronization capability

o reduce admin burden

• Start with metadata + source for mirroring use case (doing experiments now)

• Open Access use cases require processed PDF also

Page 69: NISO ResourceSync Training Session


Getting a copy of arXiv

It might be as easy as:

(Of course, you will probably have to wait a while, but it is nice to know that ResourceSync is stateless, so one can efficiently restart.)

Page 70: NISO ResourceSync Training Session


Python Library and Client

• Aim to provide library code implementing all ResourceSync facilities for use in both source and destination implementations

o Designed for Python 2.6 (RHEL6) and 2.7

• Client (resync) supports many destination operations, inspired by the common Unix rsync program

• Client also supports some operations that might be useful in a source, such as generation of static Resource Lists or periodic Change Lists (used in arXiv experiments)

• Explorer (resync-explorer) intended to allow easy inspection of a source’s resource sets and capabilities

• Developed since ResourceSync v0.5, updated for v0.9.1

http://github.com/resync/resync

On PyPI: easy_install resync

Page 71: NISO ResourceSync Training Session


ResourceSync Source Simulator

• Python code using Tornado server

• Provides random set of resources of different sizes updated at a particular rate

• Very useful for testing Destination code

http://github.com/resync/simulator

Page 72: NISO ResourceSync Training Session


ResourceSync - Agenda

1. ResourceSync: Problem Perspective & Conceptual Approach

2. Motivation & Use Cases

3. Framework Walkthrough

4. Framework (Technical) Details

5. Implementation

6. Q&A


Page 73: NISO ResourceSync Training Session


Page 74: NISO ResourceSync Training Session

We look forward to seeing you at a future NISO training event.

THANK YOU