Introduction to SDshare

An introduction to SDshare
2011-03-15
Lars Marius Garshol, <[email protected]>
http://twitter.com/larsga

Description:

An introduction to the SDshare protocol for replicating/syndicating Atom feeds of changes in Topic Maps or RDF stores

Transcript of Introduction to SDshare

Page 1: Introduction to SDshare

An introduction to SDshare

2011-03-15
Lars Marius Garshol, <[email protected]>
http://twitter.com/larsga

Page 2: Introduction to SDshare


Overview of SDshare

Page 3: Introduction to SDshare


SDshare

• A protocol for tracking changes in a semantic datastore
  – essentially allows clients to keep track of all changes, for replication purposes
• Supports both Topic Maps and RDF
• Based on Atom
• Highly RESTful
• A CEN specification

Page 4: Introduction to SDshare


Basic workings

[Diagram: Server → Client, a stream of Fragments]

Server publishes fragments representing changes in the datastore.

Client pulls these in, updates its local copy of the dataset.

There is, however, more to it than just this
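The pull loop sketched above might look as follows; `fetch_feed`, `fetch_fragment`, and `apply_fragment` are hypothetical stand-ins for the HTTP and datastore layers, which the protocol leaves to implementations:

```python
def poll_once(fetch_feed, fetch_fragment, apply_fragment):
    """One polling cycle of a minimal SDshare client (sketch).

    fetch_feed() lists URLs of fragments published since the last
    poll, fetch_fragment(url) downloads one fragment, and
    apply_fragment(data) merges it into the local copy. Returns
    the number of fragments applied."""
    count = 0
    for url in fetch_feed():
        apply_fragment(fetch_fragment(url))
        count += 1
    return count

# Toy run with in-memory stubs instead of real HTTP:
published = {"u1": "<fragment 1>", "u2": "<fragment 2>"}
local = []
n = poll_once(lambda: list(published), published.get, local.append)
print(n, local)  # 2 ['<fragment 1>', '<fragment 2>']
```

A real client would loop over this with a polling interval, remembering the timestamp of the last fragment it applied.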

Page 5: Introduction to SDshare


What more is needed?

• Support for more than one dataset per server
  – this means: more than one fragment stream
• How do clients get started?
  – a change feed is nice once you've got a copy of the dataset, but how do you get a copy?
• What if you miss out on some changes and need to restart?
  – must be a way to reset local copy
• The protocol supports all this

Page 6: Introduction to SDshare


Two new concepts

• Collection
  – essentially a dataset inside the server
  – exact meaning is not defined in spec
  – will generally be a topic map (TMs) or a graph (RDF)
• Snapshot
  – a complete copy of a collection at some point in time

Page 7: Introduction to SDshare


Feeds in the server

[Diagram: overview feed → collection feeds; each collection feed → a fragment feed (→ fragments) and a snapshot feed (→ snapshots)]

Page 8: Introduction to SDshare


An overview feed

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>SDshare feeds from localhost</title>
  <updated>2011-03-15T18:55:38Z</updated>
  <author>
    <name>Ontopia SDshare server</name>
  </author>
  <id>http://localhost:8080/sdshare/</id>
  <link href="http://localhost:8080/sdshare/"></link>
  <entry>
    <title>beer.xtm</title>
    <updated>2011-03-15T18:55:38Z</updated>
    <id>http://localhost:8080/sdshare/beer.xtm</id>
    <link href="collection.jsp?topicmap=beer.xtm"
          type="application/atom+xml"
          rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
  </entry>
  <entry>
    <title>metadata.xtm</title>
    <updated>2011-03-15T18:55:38Z</updated>
    <id>http://localhost:8080/sdshare/metadata.xtm</id>
    <link href="collection.jsp?topicmap=metadata.xtm"
          type="application/atom+xml"
          rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
  </entry>
</feed>
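A client discovers everything else from this one feed: it follows the links whose rel value is the SDshare collectionfeed relation. A sketch using Python's standard xml.etree, run against an abbreviated version of the feed above:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
COLLECTIONFEED = "http://www.egovpt.org/sdshare/collectionfeed"

def collection_feeds(feed_xml):
    """Return (title, href) for each collection in an overview feed."""
    root = ET.fromstring(feed_xml)
    result = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.find(ATOM + "title").text
        for link in entry.findall(ATOM + "link"):
            if link.get("rel") == COLLECTIONFEED:
                result.append((title, link.get("href")))
    return result

# Abbreviated version of the overview feed shown above:
overview = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>beer.xtm</title>
    <link href="collection.jsp?topicmap=beer.xtm"
          type="application/atom+xml"
          rel="http://www.egovpt.org/sdshare/collectionfeed"/>
  </entry>
</feed>"""
print(collection_feeds(overview))
# [('beer.xtm', 'collection.jsp?topicmap=beer.xtm')]
```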

Page 9: Introduction to SDshare


The snapshot feed

• A list of links to snapshots of the entire dataset (collection)

• The spec doesn't say anything about how and when snapshots are produced

• It's up to implementations to decide how they want to do this

• It makes sense, though, to always have a snapshot for the current state of the dataset

Page 10: Introduction to SDshare


Example snapshot feed

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>Snapshots feed for beer.xtm</title>
  <updated>2011-03-15T19:12:34Z</updated>
  <author>
    <name>Ontopia SDshare server</name>
  </author>
  <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshots</id>
  <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
  <entry>
    <title>Snapshot of beer.xtm</title>
    <updated>2011-03-15T19:12:34Z</updated>
    <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshot/0</id>
    <link href="snapshot.jsp?topicmap=beer.xtm"
          type="application/x-tm+xml; version=1.0"
          rel="alternate"></link>
  </entry>
</feed>

Page 11: Introduction to SDshare


The fragment feed

• For every change in the topic map, there is one fragment
  – the granularity of changes is not defined by the spec
  – it could be per transaction, or per topic changed
• The fragment is basically a link to a URL that produces a part of the dataset

Page 12: Introduction to SDshare


An example fragment feed

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>Fragments feed for beer.xtm</title>
  <updated>2011-03-15T19:21:20Z</updated>
  <author>
    <name>Ontopia SDshare server</name>
  </author>
  <id>file:/Users/larsga/data/topicmaps/beer.xtm/fragments</id>
  <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
  <entry>
    <title>Topic with object ID 4521</title>
    <updated>2011-03-15T19:20:03Z</updated>
    <id>file:/Users/larsga/data/topicmaps/beer.xtm/4521/1300216803730</id>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf"
          type="application/rdf+xml" rel="alternate"/>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm"
          type="application/x-tm+xml; version=1.0" rel="alternate"/>
    <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
  </entry>
</feed>
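Each entry offers the fragment in several syntaxes, distinguished by the type attribute on the links, and names the affected topic in sdshare:TopicSI. A client picks the representation it understands; a sketch run against an abbreviated version of the feed above:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
SDSHARE = "{http://www.egovpt.org/sdshare}"

def fragment_entries(feed_xml, preferred_type):
    """Return (topic_si, href) pairs from a fragment feed, choosing
    the link whose media type matches preferred_type."""
    root = ET.fromstring(feed_xml)
    result = []
    for entry in root.findall(ATOM + "entry"):
        si = entry.find(SDSHARE + "TopicSI").text
        for link in entry.findall(ATOM + "link"):
            if link.get("type") == preferred_type:
                result.append((si, link.get("href")))
    return result

# Abbreviated version of the fragment feed shown above:
feed = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <entry>
    <link href="fragment.jsp?topic=4521&amp;syntax=rdf"
          type="application/rdf+xml" rel="alternate"/>
    <link href="fragment.jsp?topic=4521&amp;syntax=xtm"
          type="application/x-tm+xml; version=1.0" rel="alternate"/>
    <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
  </entry>
</feed>"""
print(fragment_entries(feed, "application/rdf+xml"))
# [('http://psi.example.org/12', 'fragment.jsp?topic=4521&syntax=rdf')]
```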

Page 13: Introduction to SDshare


What is a fragment?

• Essentially, a piece of a topic map
  – that is, a complete XTM file that contains only part of a bigger topic map
  – typically, most of the topic references will point to topics not in the XTM file
• Downloading more fragments will yield a bigger subset of the topic map
  – the automatic merging in Topic Maps will cause the fragments to match up
• Exactly the same applies in RDF

Page 14: Introduction to SDshare


An example fragment

<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="id4521">
    <instanceOf>
      <subjectIndicatorRef
          xlink:href="http://psi.garshol.priv.no/beer/pub"></subjectIndicatorRef>
    </instanceOf>
    <subjectIdentity>
      <subjectIndicatorRef
          xlink:href="http://psi.example.org/12"></subjectIndicatorRef>
      <topicRef
          xlink:href="file:/Users/larsga/data/topicmaps/beer.xtm#id2662"></topicRef>
    </subjectIdentity>
    <baseName>
      <baseNameString>Amundsen Bryggeri og Spiseri</baseNameString>
    </baseName>
    <occurrence>
      <instanceOf>
        <subjectIndicatorRef
            xlink:href="http://psi.ontopia.net/ontology/latitude"></subjectIndicatorRef>
      </instanceOf>
      <resourceData>59.913816</resourceData>
    </occurrence>
    ...
  </topic>
  ...
</topicMap>

Page 15: Introduction to SDshare


Applying a fragment

• The feed contains a URI prefix
  – this is used to create item identifiers tagging statements with their origin
• For each TopicSI find that topic, then
  – for each statement, remove matching item identifier
  – if statement now has no item identifiers, delete it
• Merge in the received fragment
  – then tag all statements in it with matching item identifier
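A minimal sketch of the algorithm above, with the local store modelled as a dict mapping statements to the set of item identifiers (origin tags) attached to them. All names and values are made up, and deriving the item identifier from the feed's URI prefix is simplified to concatenation:

```python
def apply_fragment(store, prefix, topic_si, fragment_statements):
    """Apply one SDshare fragment (sketch).

    store maps each statement, here a (subject, predicate, object)
    tuple with subject identifiers as subjects, to the set of item
    identifiers tagging its origins."""
    origin = prefix + topic_si
    # Remove this source's tag from the topic's old statements;
    # a statement left with no origin tags is deleted outright.
    for stmt in list(store):
        if stmt[0] == topic_si:
            store[stmt].discard(origin)
            if not store[stmt]:
                del store[stmt]
    # Merge in the received fragment, tagging every statement.
    for stmt in fragment_statements:
        store.setdefault(stmt, set()).add(origin)

# Made-up example: a fragment replacing a topic's name
si = "http://psi.example.org/12"
prefix = "http://server/prefix#"
store = {(si, "name", "Old name"): {prefix + si}}
apply_fragment(store, prefix, si, [(si, "name", "New name")])
after_once = {k: set(v) for k, v in store.items()}
apply_fragment(store, prefix, si, [(si, "name", "New name")])
assert store == after_once  # applying a fragment is idempotent
```

Because only statements tagged with this source's origin are touched, data merged in from other sources survives the update.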

Page 16: Introduction to SDshare


Properties of the protocol

• HATEOAS
  – uses hypertext principles
  – only endpoint is that of the overview feed
  – all other URLs available via hypertext
• Applying a fragment is idempotent
  – i.e. the result is the same, no matter how many times you do it
• Loose binding
  – very loose binding between server and client
• Supports federation of data
  – client can safely merge data from different sources

Page 17: Introduction to SDshare


SDshare push

• In normal SDshare data receivers connect to the data source
  – basically, they poll the source with GET requests
• However, the receiver is not always allowed to make connections to the source
  – SDshare push is designed for this situation
• Solution is a slightly modified protocol
  – source POSTs Atom feeds with inline fragments to recipient
  – this flips the server/client relationship
• Not part of the spec; unofficial Ontopia extension

Page 18: Introduction to SDshare


Uses of SDshare

Page 19: Introduction to SDshare


Example use case #1

[Diagram: Database → JDBC → Ontopia DB2TM → Portal frontend]

Page 20: Introduction to SDshare


Example use case #1

[Diagram: as above, plus an ESB connecting Service #1 and Service #3 to Ontopia via SDshare]

Page 21: Introduction to SDshare


NRK/Skole today

[Diagram: editorial server (MediaDB, DB2TM, JDBC against DB server 1) exports nrk-grep.xtm; the file is imported across the firewall into the production environment (DB server 2, Prod #1, Prod #2)]

Page 22: Introduction to SDshare


NRK/Skole with SDshare push

[Diagram: as above, but the nrk-grep.xtm export/import across the firewall is replaced by SDshare push]

Page 23: Introduction to SDshare


Hafslund

[Diagram: ERP, GIS, CRM, and other systems connected via UMIC to the archive and the search engine]

Page 24: Introduction to SDshare


Hafslund architecture

• The beauty of this architecture is that SDshare insulates the different systems from one another

• More input systems can be added without hassle

• Any component can be replaced without affecting the others

• Essentially, a plug-and-play architecture

Page 25: Introduction to SDshare


A Hafslund problem

• There are too many duplicates in the data
  – duplicates within each system
  – also duplication across systems
• How to get rid of the duplicates?
  – unrealistic to expect cleanup across systems
• So, we build a deduplicator
  – and plug it in...

Page 26: Introduction to SDshare


DuKe plugged in

[Diagram: the same architecture with the Dupe Killer plugged in alongside the search engine]

Page 27: Introduction to SDshare


Implementations

Page 28: Introduction to SDshare


Current implementations

• Web3
  – both client and server
• Ontopia
  – ditto, plus SDshare push
• Isidorus
  – don't know
• Atomico
  – server framework only; no actual implementation

Page 29: Introduction to SDshare


Ontopia SDshare server

• Event tracker
  – taps into event API where it listens for changes
  – maintains in-memory list of changes
  – writes all changes to disk as well
  – removes duplicate changes and discards old changes
• Web application based on tracker
  – JSP pages producing feeds and fragments
  – one fragment per changed topic, sorted by time
  – only a single snapshot of current state of the topic map

Page 30: Introduction to SDshare


Ontopia SDshare client

• Web UI for management
• Pluggable frontends
• Pluggable backends
• Combine at will
• Frontends
  – Ontopia: event listener
  – SDshare: polls Atom feeds
• Backends
  – Ontopia: applies changes to Ontopia locally
  – SPARQL: writes changes to RDF repo via SPARUL
  – push: pushes changes over SDshare push

[Diagram: SDshare client with a web UI; frontends (Ontopia events, SDshare polling) feed the core logic, which drives the backends (Ontopia, SPARQL Update, SDshare push)]
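The SPARQL backend's write, for instance, can be sketched as building one SPARQL Update (SPARUL) request per changed resource: delete what this source's named graph says about the resource, then insert the fragment's triples. This matches the RDF interpretation discussed under "Our interpretation" later; all graph and resource URIs here are made up.

```python
def sparul_for_fragment(graph, resource, triples):
    """Build a SPARQL Update request applying one fragment (sketch):
    wipe what this source's named graph says about the resource,
    then insert the fragment's triples. Objects are passed as
    already-serialized terms (quoted literals or <uri>s)."""
    delete = ("DELETE WHERE { GRAPH <%s> { <%s> ?p ?o } }"
              % (graph, resource))
    lines = "\n".join("  <%s> <%s> %s ." % t for t in triples)
    insert = "INSERT DATA { GRAPH <%s> {\n%s\n} }" % (graph, lines)
    return delete + ";\n" + insert

# Made-up graph, resource, and triple:
req = sparul_for_fragment(
    "http://example.org/graphs/source1",
    "http://psi.example.org/12",
    [("http://psi.example.org/12",
      "http://example.org/vocab/name", '"New name"')])
print(req)
```

A production backend would of course escape terms properly and send the request to the repository's update endpoint.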

Page 31: Introduction to SDshare


Web UI to client

Page 32: Introduction to SDshare


Problems with the spec

Page 33: Introduction to SDshare


What if many fragments?

• The size of the fragments feed grows enormous
  – expensive if polled frequently
• Paging might be one solution
  – basically, end of feed contains pointer to more
• A "since" parameter might be another
  – allows clients to say "only show me changes since ..."
• Probably need both in practice

http://projects.topicmapslab.de/issues/3675

Page 34: Introduction to SDshare


Ordering of fragments

• Should the spec require that fragments be ordered?
  – not really necessary if all fragment URIs return current state (instead of state at the time the fragment entry was created)

Page 35: Introduction to SDshare


RDF fragment algorithm

• The one given in the spec makes no sense

• Relies on Topic Maps constructs not found in RDF

• Really no way to make use of it

http://projects.topicmapslab.de/issues/4013

Page 36: Introduction to SDshare


Our interpretation

• Server prefix is URI of RDF named graph
• Fragment algorithm therefore becomes
  – delete all statements about changed resources
  – then add all statements in fragment
• Means each source gets a different graph
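With the graph modelled as a plain set of triples, the two steps reduce to set operations; a sketch with made-up URIs:

```python
def apply_rdf_fragment(graph, changed_resources, fragment):
    """Apply an SDshare fragment to one source's named graph,
    modelled as a set of (s, p, o) triples: drop every statement
    about the changed resources, then add the fragment's triples."""
    graph -= {t for t in graph if t[0] in changed_resources}
    graph |= set(fragment)

g = {("ex:12", "ex:name", "Old"), ("ex:13", "ex:name", "Kept")}
apply_rdf_fragment(g, {"ex:12"}, [("ex:12", "ex:name", "New")])
print(sorted(g))
# [('ex:12', 'ex:name', 'New'), ('ex:13', 'ex:name', 'Kept')]
```

Since each source writes only to its own named graph, one source's deletes can never clobber another source's statements.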

Page 37: Introduction to SDshare


TopicSL/TopicII

• Currently, topics can only be identified by subject identifier
  – but not all topics have one
• Solution
  – add elements for subject locators and item identifiers

http://projects.topicmapslab.de/issues/3667

Page 38: Introduction to SDshare


Paging of snapshots?

• What if the snapshot is vast?
  – clients probably won't be able to download and store the entire thing in one go

• Could we page the snapshot into fragments?

• Or is there some other solution?

http://projects.topicmapslab.de/issues/4307

Page 39: Introduction to SDshare


How to tell if the fragment feed is complete?

• When reading the fragment feed, how can we tell if there are older fragments that were discarded?
  – and how can we tell which fragment was the newest to be thrown away?
• Without this there's no way to know for certain whether you've lost fragments if the feed stops before the newest fragment you've got
  – and if you're using "since" it always will stop before the newest fragment...

• Make new sdshare:foo element on feed level for this information?

http://projects.topicmapslab.de/issues/4308

Page 40: Introduction to SDshare


Blank nodes are not supported

• What to do?

http://projects.topicmapslab.de/issues/4306

Page 41: Introduction to SDshare


More information

• SDshare spec
  – http://www.egovpt.org/fg/CWA_Part_1b
• SDshare issue tracker
  – http://projects.topicmapslab.de/projects/sdshare
• SDshare use cases
  – http://www.garshol.priv.no/blog/215.html