Introduction to SDshare

Post on 15-Jan-2015


An introduction to the SDshare protocol for replicating/syndicating Atom feeds of changes in Topic Maps or RDF stores

Transcript of Introduction to SDshare

1

An introduction to SDshare

2011-03-15
Lars Marius Garshol, <larsga@bouvet.no>
http://twitter.com/larsga

2

Overview of SDshare

3

SDshare

• A protocol for tracking changes in a semantic datastore
  – essentially allows clients to keep track of all changes, for replication purposes
• Supports both Topic Maps and RDF
• Based on Atom
• Highly RESTful
• A CEN specification

4

Basic workings

[Diagram: the Server sends a stream of Fragments to the Client]

Server publishes fragments representing changes in the datastore.

Client pulls these in, updating its local copy of the dataset.

There is, however, more to it than just this

5

What more is needed?

• Support for more than one dataset per server
  – this means: more than one fragment stream
• How do clients get started?
  – a change feed is nice once you've got a copy of the dataset, but how do you get a copy?
• What if you miss out on some changes and need to restart?
  – must be a way to reset the local copy
• The protocol supports all this

6

Two new concepts

• Collection
  – essentially a dataset inside the server
  – exact meaning is not defined in the spec
  – will generally be a topic map (TMs) or a graph (RDF)
• Snapshot
  – a complete copy of a collection at some point in time

7

Feeds in the server

[Diagram: Overview feed → Collection feeds; each collection feed links to a Fragment feed (→ fragments) and a Snapshot feed (→ snapshots)]

8

An overview feed

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>SDshare feeds from localhost</title>
  <updated>2011-03-15T18:55:38Z</updated>
  <author>
    <name>Ontopia SDshare server</name>
  </author>
  <id>http://localhost:8080/sdshare/</id>
  <link href="http://localhost:8080/sdshare/"></link>
  <entry>
    <title>beer.xtm</title>
    <updated>2011-03-15T18:55:38Z</updated>
    <id>http://localhost:8080/sdshare/beer.xtm</id>
    <link href="collection.jsp?topicmap=beer.xtm"
          type="application/atom+xml"
          rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
  </entry>
  <entry>
    <title>metadata.xtm</title>
    <updated>2011-03-15T18:55:38Z</updated>
    <id>http://localhost:8080/sdshare/metadata.xtm</id>
    <link href="collection.jsp?topicmap=metadata.xtm"
          type="application/atom+xml"
          rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
  </entry>
</feed>
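As a sketch of the hypertext navigation this feed enables, the following Python snippet (an illustration, not part of the spec) parses a trimmed copy of the overview feed and extracts the collection feed links via their rel value:

```python
# Sketch: discovering collection feeds from an SDshare overview feed.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
COLLECTION_REL = "http://www.egovpt.org/sdshare/collectionfeed"

overview = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>SDshare feeds from localhost</title>
  <entry>
    <title>beer.xtm</title>
    <link href="collection.jsp?topicmap=beer.xtm"
          type="application/atom+xml"
          rel="http://www.egovpt.org/sdshare/collectionfeed"/>
  </entry>
</feed>"""

def collection_links(feed_xml):
    """Return (title, href) pairs for every collection feed in the overview."""
    root = ET.fromstring(feed_xml)
    result = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        for link in entry.iter(ATOM + "link"):
            if link.get("rel") == COLLECTION_REL:
                result.append((title, link.get("href")))
    return result

links = collection_links(overview)  # each pair: (collection title, feed URL)
```

Note that only the overview URL needs to be known in advance; everything else is discovered from link elements.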

9

The snapshot feed

• A list of links to snapshots of the entire dataset (collection)

• The spec doesn't say anything about how and when snapshots are produced

• It's up to implementations to decide how they want to do this

• It makes sense, though, to always have a snapshot for the current state of the dataset

10

Example snapshot feed

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>Snapshots feed for beer.xtm</title>
  <updated>2011-03-15T19:12:34Z</updated>
  <author>
    <name>Ontopia SDshare server</name>
  </author>
  <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshots</id>
  <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
  <entry>
    <title>Snapshot of beer.xtm</title>
    <updated>2011-03-15T19:12:34Z</updated>
    <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshot/0</id>
    <link href="snapshot.jsp?topicmap=beer.xtm"
          type="application/x-tm+xml; version=1.0" rel="alternate"></link>
  </entry>
</feed>
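To bootstrap, a client would pick a snapshot from this feed and download it. A minimal Python sketch against a trimmed copy of the feed above; sorting by timestamp to find the newest entry is an assumption, not something the spec mandates:

```python
# Sketch: picking the newest snapshot from an SDshare snapshot feed.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

snapshot_feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Snapshot of beer.xtm</title>
    <updated>2011-03-15T19:12:34Z</updated>
    <link href="snapshot.jsp?topicmap=beer.xtm"
          type="application/x-tm+xml; version=1.0" rel="alternate"/>
  </entry>
</feed>"""

def newest_snapshot(feed_xml):
    """Return the href of the most recently updated snapshot entry.
    ISO 8601 timestamps in the same zone sort correctly as strings."""
    root = ET.fromstring(feed_xml)
    entries = sorted(root.iter(ATOM + "entry"),
                     key=lambda e: e.findtext(ATOM + "updated") or "")
    link = entries[-1].find(ATOM + "link")
    return link.get("href")
```

The client would then GET that href and load the result as its initial local copy.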

11

The fragment feed

• For every change in the topic map, there is one fragment
  – the granularity of changes is not defined by the spec
  – it could be per transaction, or per topic changed
• The fragment is basically a link to a URL that produces a part of the dataset

12

An example fragment feed

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>Fragments feed for beer.xtm</title>
  <updated>2011-03-15T19:21:20Z</updated>
  <author>
    <name>Ontopia SDshare server</name>
  </author>
  <id>file:/Users/larsga/data/topicmaps/beer.xtm/fragments</id>
  <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
  <entry>
    <title>Topic with object ID 4521</title>
    <updated>2011-03-15T19:20:03Z</updated>
    <id>file:/Users/larsga/data/topicmaps/beer.xtm/4521/1300216803730</id>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf"
          type="application/rdf+xml" rel="alternate"/>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm"
          type="application/x-tm+xml; version=1.0" rel="alternate"/>
    <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
  </entry>
</feed>
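A client consuming this feed needs each entry's TopicSI and a fragment link in a syntax it understands. A Python sketch over a trimmed copy of the feed; preferring application/rdf+xml is just an example choice:

```python
# Sketch: extracting (TopicSI, fragment URL) pairs from a fragment feed.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
SD = "{http://www.egovpt.org/sdshare}"

fragment_feed = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <entry>
    <title>Topic with object ID 4521</title>
    <updated>2011-03-15T19:20:03Z</updated>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf"
          type="application/rdf+xml" rel="alternate"/>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm"
          type="application/x-tm+xml; version=1.0" rel="alternate"/>
    <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
  </entry>
</feed>"""

def fragments(feed_xml, preferred_type="application/rdf+xml"):
    """Yield (topic_si, fragment_href), picking the link whose MIME type matches."""
    root = ET.fromstring(feed_xml)
    for entry in root.iter(ATOM + "entry"):
        si = entry.findtext(SD + "TopicSI")
        for link in entry.iter(ATOM + "link"):
            if link.get("type") == preferred_type:
                yield si, link.get("href")

pairs = list(fragments(fragment_feed))
```

The same entry offering both XTM and RDF links is how one server serves both kinds of clients.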

13

What is a fragment?

• Essentially, a piece of a topic map
  – that is, a complete XTM file that contains only part of a bigger topic map
  – typically, most of the topic references will point to topics not in the XTM file
• Downloading more fragments will yield a bigger subset of the topic map
  – the automatic merging in Topic Maps will cause the fragments to match up
• Exactly the same applies in RDF

14

An example fragment

<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="id4521">
    <instanceOf>
      <subjectIndicatorRef xlink:href="http://psi.garshol.priv.no/beer/pub"></subjectIndicatorRef>
    </instanceOf>
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://psi.example.org/12"></subjectIndicatorRef>
      <topicRef xlink:href="file:/Users/larsga/data/topicmaps/beer.xtm#id2662"></topicRef>
    </subjectIdentity>
    <baseName>
      <baseNameString>Amundsen Bryggeri og Spiseri</baseNameString>
    </baseName>
    <occurrence>
      <instanceOf>
        <subjectIndicatorRef xlink:href="http://psi.ontopia.net/ontology/latitude"></subjectIndicatorRef>
      </instanceOf>
      <resourceData>59.913816</resourceData>
    </occurrence>
    ...
  </topic>
  ...
</topicMap>

15

Applying a fragment

• The feed contains a URI prefix
  – this is used to create item identifiers tagging statements with their origin
• For each TopicSI, find that topic, then
  – for each statement, remove the matching item identifier
  – if the statement now has no item identifiers, delete it
• Merge in the received fragment
  – then tag all statements in it with the matching item identifier
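The bookkeeping above can be sketched with a deliberately simplified data model: statements as plain tuples, and a dict mapping each statement to its set of item identifiers (origins). This is an illustration of the algorithm, not a real Topic Maps implementation:

```python
# Sketch of applying a fragment, on a toy statement store.
# store: dict {statement tuple: set of item identifiers (origins)}

def apply_fragment(store, topic_sis, fragment, prefix):
    """topic_sis: subject identifiers of the changed topics.
    fragment: the statements received for those topics.
    prefix: the feed's ServerSrcLocatorPrefix, identifying the source."""
    for stmt in list(store):
        subject = stmt[0]
        if subject in topic_sis:
            # remove this source's item identifier from the statement
            store[stmt].discard(prefix)
            # a statement no source vouches for any more is deleted
            if not store[stmt]:
                del store[stmt]
    # merge in the fragment, tagging each statement with this source
    for stmt in fragment:
        store.setdefault(stmt, set()).add(prefix)
    return store

# usage: the old statement from this source is replaced by the new one
store = {("http://psi.example.org/12", "name", "Old name"): {"src"}}
apply_fragment(store,
               {"http://psi.example.org/12"},
               [("http://psi.example.org/12", "name",
                 "Amundsen Bryggeri og Spiseri")],
               "src")
```

Note how statements also tagged by other sources survive the deletion step, and how applying the same fragment twice leaves the store unchanged, which is the idempotence property claimed on the next slide.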

16

Properties of the protocol

• HATEOAS
  – uses hypertext principles
  – only endpoint is that of the overview feed
  – all other URLs available via hypertext
• Applying a fragment is idempotent
  – i.e. the result is the same, no matter how many times you do it
• Loose binding
  – very loose binding between server and client
• Supports federation of data
  – client can safely merge data from different sources

17

SDshare push

• In normal SDshare, data receivers connect to the data source
  – basically, they poll the source with GET requests
• However, the receiver is not always allowed to make connections to the source
  – SDshare push is designed for this situation
• Solution is a slightly modified protocol
  – source POSTs Atom feeds with inline fragments to the recipient
  – this flips the server/client relationship
• Not part of the spec; unofficial Ontopia extension

18

Uses of SDshare

19

Example use case #1

[Architecture diagram: Frontend · Portal · Ontopia DB2TM · JDBC · Database]

20

Example use case #1

[Architecture diagram: Frontend · Portal · Ontopia DB2TM · Database · ESB · Service #1 · Service #3 · SDshare · Ontopia SDshare]

21

NRK/Skole today

[Architecture diagram: Editorial server (MediaDB, DB2TM) · DB server 1 · DB server 2 · JDBC · Prod #1 · Prod #2 · Database · Server · Production environment · nrk-grep.xtm export/import across the firewall]

22

NRK/Skole with SDshare push

[Architecture diagram: Editorial server (MediaDB, DB2TM) · DB server 1 · DB server 2 · JDBC · Prod #1 · Prod #2 · Database · Server · Production environment · SDshare push across the firewall]

23

Hafslund

[Architecture diagram: ERP · GIS · CRM · ... · UMIC · Archive · Search engine]

24

Hafslund architecture

• The beauty of this architecture is that SDshare insulates the different systems from one another

• More input systems can be added without hassle

• Any component can be replaced without affecting the others

• Essentially, a plug-and-play architecture

25

A Hafslund problem

• There are too many duplicates in the data
  – duplicates within each system
  – also duplication across systems
• How to get rid of the duplicates?
  – unrealistic to expect cleanup across systems
• So, we build a deduplicator
  – and plug it in...

26

DuKe plugged in

[Architecture diagram: ERP · GIS · CRM · ... · UMIC · Archive · Search engine · Dupe Killer]

27

Implementations

28

Current implementations

• Web3
  – both client and server
• Ontopia
  – ditto, plus SDshare push
• Isidorus
  – don't know
• Atomico
  – server framework only; no actual implementation

29

Ontopia SDshare server

• Event tracker
  – taps into the event API, where it listens for changes
  – maintains in-memory list of changes
  – writes all changes to disk as well
  – removes duplicate changes and discards old changes
• Web application based on tracker
  – JSP pages producing feeds and fragments
  – one fragment per changed topic, sorted by time
  – only a single snapshot of the current state of the TM

30

Ontopia SDshare client

• Web UI for management
• Pluggable frontends
• Pluggable backends
• Combine at will
• Frontends
  – Ontopia: event listener
  – SDshare: polls Atom feeds
• Backends
  – Ontopia: applies changes to Ontopia locally
  – SPARQL: writes changes to RDF repo via SPARUL
  – push: pushes changes over SDshare push

[Diagram: SDshare client · Web UI · Ontopia events · Core logic · Ontopia backend · SPARQL Update · SDshare push]

31

Web UI to client

32

Problems with the spec

33

What if many fragments?

• The size of the fragments feed grows enormous
  – expensive if polled frequently
• Paging might be one solution
  – basically, end of feed contains pointer to more
• "since" parameter might be another
  – allows client to say "only show me changes since ..."
• Probably need both in practice

http://projects.topicmapslab.de/issues/3675
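To illustrate the proposed "since" parameter (not part of the current spec, so this helper is purely hypothetical), a client might construct its polling URL like this:

```python
# Sketch: building a fragment-feed polling URL with the proposed
# "since" parameter. Illustrative only; the spec does not define it.
from urllib.parse import urlencode

def fragments_url(base, since=None):
    """Append ?since=... or &since=... to the feed URL, if given."""
    if since is None:
        return base
    sep = "&" if "?" in base else "?"
    return base + sep + urlencode({"since": since})

# timestamps get percent-encoded, e.g. the colons in 2011-03-15T19:21:20Z
url = fragments_url("fragments.jsp?topicmap=beer.xtm", "2011-03-15T19:21:20Z")
```

The client would remember the newest "updated" value it has seen and pass it back on the next poll.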

34

Ordering of fragments

• Should the spec require that fragments be ordered?
  – not really necessary if all fragment URIs return the current state (instead of the state at the time the fragment entry was created)

35

RDF fragment algorithm

• The one given in the spec makes no sense

• Relies on Topic Maps constructs not found in RDF

• Really no way to make use of it

http://projects.topicmapslab.de/issues/4013

36

Our interpretation

• Server prefix is URI of RDF named graph

• Fragment algorithm therefore becomes
  – delete all statements about changed resources
  – then add all statements in fragment

• Means each source gets a different graph
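Under this interpretation, applying a fragment maps naturally onto a SPARQL 1.1 Update request. A sketch that builds such a request as a string; the predicate in the usage example is hypothetical:

```python
# Sketch: turning one RDF fragment into a SPARQL 1.1 Update under the
# interpretation above (server prefix = named graph URI).

def fragment_update(graph_uri, resources, triples):
    """First delete every statement about the changed resources in this
    source's named graph, then insert the fragment's triples.
    'triples' are (s, p, o) with terms already formatted for SPARQL."""
    deletes = "\n".join(
        "DELETE WHERE { GRAPH <%s> { <%s> ?p ?o } };" % (graph_uri, r)
        for r in resources)
    inserts = "\n  ".join("%s %s %s ." % t for t in triples)
    return "%s\nINSERT DATA { GRAPH <%s> {\n  %s\n} }" % (
        deletes, graph_uri, inserts)

update = fragment_update(
    "file:/Users/larsga/data/topicmaps/beer.xtm",  # the ServerSrcLocatorPrefix
    ["http://psi.example.org/12"],
    [("<http://psi.example.org/12>",
      "<http://example.org/name>",                 # hypothetical predicate
      '"Amundsen Bryggeri og Spiseri"')])
```

Because all of a source's statements live in its own named graph, deleting "all statements about the resource" cannot disturb data federated in from other sources.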

37

TopicSL/TopicII

• Currently, topics can only be identified by subject identifier
  – but not all topics have one
• Solution
  – add elements for subject locators and item identifiers

http://projects.topicmapslab.de/issues/3667

38

Paging of snapshots?

• What if the snapshot is vast?
  – clients probably won't be able to download and store the entire thing in one go

• Could we page the snapshot into fragments?

• Or is there some other solution?

http://projects.topicmapslab.de/issues/4307

39

How to tell if the fragment feed is complete?

• When reading the fragment feed, how can we tell if there are older fragments that have been discarded?
  – and how can we tell which fragment was the newest to be thrown away?
• Without this there's no way to know for certain whether you've lost fragments if the feed stops before the newest fragment you've got
  – and if you're using "since" it always will stop before the newest fragment...

• Make new sdshare:foo element on feed level for this information?

http://projects.topicmapslab.de/issues/4308

40

Blank nodes are not supported

• What to do?

http://projects.topicmapslab.de/issues/4306

41

More information

• SDshare spec
  – http://www.egovpt.org/fg/CWA_Part_1b
• SDshare issue tracker
  – http://projects.topicmapslab.de/projects/sdshare
• SDshare use cases
  – http://www.garshol.priv.no/blog/215.html