Introduction to SDshare
1
An introduction to SDshare
2011-03-15
Lars Marius Garshol, <[email protected]>
http://twitter.com/larsga
2
Overview of SDshare
3
SDshare
• A protocol for tracking changes in a semantic datastore
  – essentially allows clients to keep track of all changes, for replication purposes
• Supports both Topic Maps and RDF
• Based on Atom
• Highly RESTful
• A CEN specification
4
Basic workings
[Diagram: a server sends a stream of fragments to a client]
Server publishes fragments representing changes in the datastore
Client pulls these in and updates its local copy of the dataset
There is, however, more to it than just this
5
What more is needed?
• Support for more than one dataset per server
  – this means: more than one fragment stream
• How do clients get started?
  – a change feed is nice once you've got a copy of the dataset, but how do you get a copy?
• What if you miss out on some changes and need to restart?
  – must be a way to reset local copy
• The protocol supports all this
6
Two new concepts
• Collection
  – essentially a dataset inside the server
  – exact meaning is not defined in spec
  – will generally be a topic map (TMs) or a graph (RDF)
• Snapshot
  – a complete copy of a collection at some point in time
7
Feeds in the server
[Diagram: the overview feed links to collection feeds; each collection feed links to a fragment feed (listing fragments) and a snapshot feed (listing snapshots)]
8
An overview feed
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>SDshare feeds from localhost</title>
  <updated>2011-03-15T18:55:38Z</updated>
  <author>
    <name>Ontopia SDshare server</name>
  </author>
  <id>http://localhost:8080/sdshare/</id>
  <link href="http://localhost:8080/sdshare/"></link>
  <entry>
    <title>beer.xtm</title>
    <updated>2011-03-15T18:55:38Z</updated>
    <id>http://localhost:8080/sdshare/beer.xtm</id>
    <link href="collection.jsp?topicmap=beer.xtm"
          type="application/atom+xml"
          rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
  </entry>
  <entry>
    <title>metadata.xtm</title>
    <updated>2011-03-15T18:55:38Z</updated>
    <id>http://localhost:8080/sdshare/metadata.xtm</id>
    <link href="collection.jsp?topicmap=metadata.xtm"
          type="application/atom+xml"
          rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
  </entry>
</feed>
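A client could discover the collection feeds by parsing the overview feed and following every link whose rel is the collection-feed URI used in the example. The following is a minimal Python sketch, assuming the feed has already been fetched into a string. The function name collection_feeds and the trimmed sample document are illustrative and not taken from the spec.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
COLLECTION_REL = "http://www.egovpt.org/sdshare/collectionfeed"

# Trimmed copy of the overview feed (one entry only).
OVERVIEW = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://localhost:8080/sdshare/</id>
  <entry>
    <title>beer.xtm</title>
    <link href="collection.jsp?topicmap=beer.xtm"
          type="application/atom+xml"
          rel="http://www.egovpt.org/sdshare/collectionfeed"/>
  </entry>
</feed>"""


def collection_feeds(feed_xml, base_url):
    """Return (title, absolute URL) for every collection feed link."""
    root = ET.fromstring(feed_xml)
    found = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        for link in entry.findall(ATOM + "link"):
            if link.get("rel") == COLLECTION_REL:
                # hrefs in the example are relative, so resolve them
                # against the URL the overview feed was fetched from
                found.append((title, urljoin(base_url, link.get("href"))))
    return found


feeds = collection_feeds(OVERVIEW, "http://localhost:8080/sdshare/")
```

Only the overview URL is hard-coded here; every other URL is obtained from the feed itself.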
9
The snapshot feed
• A list of links to snapshots of the entire dataset (collection)
• The spec doesn't say anything about how and when snapshots are produced
• It's up to implementations to decide how they want to do this
• It makes sense, though, to always have a snapshot for the current state of the dataset
10
Example snapshot feed
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>Snapshots feed for beer.xtm</title>
  <updated>2011-03-15T19:12:34Z</updated>
  <author>
    <name>Ontopia SDshare server</name>
  </author>
  <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshots</id>
  <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
  <entry>
    <title>Snapshot of beer.xtm</title>
    <updated>2011-03-15T19:12:34Z</updated>
    <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshot/0</id>
    <link href="snapshot.jsp?topicmap=beer.xtm"
          type="application/x-tm+xml; version=1.0"
          rel="alternate"></link>
  </entry>
</feed>
11
The fragment feed
• For every change in the topic map, there is one fragment
  – the granularity of changes is not defined by the spec
  – it could be per transaction, or per topic changed
• The fragment is basically a link to a URL that produces a part of the dataset
12
An example fragment feed
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>Fragments feed for beer.xtm</title>
  <updated>2011-03-15T19:21:20Z</updated>
  <author>
    <name>Ontopia SDshare server</name>
  </author>
  <id>file:/Users/larsga/data/topicmaps/beer.xtm/fragments</id>
  <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
  <entry>
    <title>Topic with object ID 4521</title>
    <updated>2011-03-15T19:20:03Z</updated>
    <id>file:/Users/larsga/data/topicmaps/beer.xtm/4521/1300216803730</id>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf"
          type="application/rdf+xml" rel="alternate"/>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm"
          type="application/x-tm+xml; version=1.0" rel="alternate"/>
    <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
  </entry>
</feed>
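To process a feed like this, a client needs three things from it: the server prefix, the subject identifier(s) of each changed topic, and the fragment link in a syntax it can read. A rough Python sketch follows, assuming the feed text is already in memory. The function name read_fragment_feed and the dictionary keys are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom",
      "sd": "http://www.egovpt.org/sdshare"}

# Trimmed copy of the example fragment feed.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
  <entry>
    <updated>2011-03-15T19:20:03Z</updated>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf"
          type="application/rdf+xml" rel="alternate"/>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm"
          type="application/x-tm+xml; version=1.0" rel="alternate"/>
    <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
  </entry>
</feed>"""


def read_fragment_feed(feed_xml, wanted_type):
    """Return (server prefix, list of entry dicts) for one fragment feed."""
    root = ET.fromstring(feed_xml)
    prefix = root.findtext("sd:ServerSrcLocatorPrefix", namespaces=NS)
    entries = []
    for entry in root.findall("atom:entry", NS):
        # pick the first link offered in the syntax this client understands
        href = next((link.get("href")
                     for link in entry.findall("atom:link", NS)
                     if link.get("type") == wanted_type), None)
        entries.append({
            "updated": entry.findtext("atom:updated", namespaces=NS),
            "topic_sis": [e.text for e in entry.findall("sd:TopicSI", NS)],
            "href": href,
        })
    return prefix, entries


prefix, entries = read_fragment_feed(FEED, "application/rdf+xml")
```

An RDF client asks for application/rdf+xml and a Topic Maps client for the XTM type; an entry with no matching link gets href None.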
13
What is a fragment?
• Essentially, a piece of a topic map
  – that is, a complete XTM file that contains only part of a bigger topic map
  – typically, most of the topic references will point to topics not in the XTM file
• Downloading more fragments will yield a bigger subset of the topic map
  – the automatic merging in Topic Maps will cause the fragments to match up
• Exactly the same applies in RDF
14
An example fragment
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="id4521">
    <instanceOf>
      <subjectIndicatorRef xlink:href="http://psi.garshol.priv.no/beer/pub"></subjectIndicatorRef>
    </instanceOf>
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://psi.example.org/12"></subjectIndicatorRef>
      <topicRef xlink:href="file:/Users/larsga/data/topicmaps/beer.xtm#id2662"></topicRef>
    </subjectIdentity>
    <baseName>
      <baseNameString>Amundsen Bryggeri og Spiseri</baseNameString>
    </baseName>
    <occurrence>
      <instanceOf>
        <subjectIndicatorRef xlink:href="http://psi.ontopia.net/ontology/latitude"></subjectIndicatorRef>
      </instanceOf>
      <resourceData>59.913816</resourceData>
    </occurrence>
    ...
  </topic>
  ...
</topicMap>
15
Applying a fragment
• The feed contains a URI prefix
  – this is used to create item identifiers tagging statements with their origin
• For each TopicSI find that topic, then
  – for each statement, remove matching item identifier
  – if statement now has no item identifiers, delete it
• Merge in the received fragment
  – then tag all statements in it with matching item identifier
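The steps above can be sketched in Python on a deliberately simplified model. The assumptions are loud ones: statements are plain (subject, property, value) tuples, and each statement is tagged with the bare server prefix rather than a per-statement item identifier built from it. The name apply_fragment and the sample data are hypothetical.

```python
def apply_fragment(store, prefix, topic_sis, fragment):
    """Apply one fragment to a local store.

    store     -- dict mapping statement -> set of origin tags
    prefix    -- server prefix from the feed, used here as the origin tag
    topic_sis -- subject identifiers of the changed topics (TopicSI values)
    fragment  -- statements received in the fragment
    """
    # Step 1: untag this source's old statements about the changed topics,
    # and delete statements that no source vouches for any more.
    for stmt in list(store):
        if stmt[0] in topic_sis:
            store[stmt].discard(prefix)
            if not store[stmt]:
                del store[stmt]
    # Step 2: merge in the fragment and tag its statements with the origin.
    for stmt in fragment:
        store.setdefault(stmt, set()).add(prefix)
    return store


A = "http://a.example.org/tm"   # hypothetical source prefixes
B = "http://b.example.org/tm"
pub = "http://psi.example.org/12"

store = {
    (pub, "name", "Amundsen"): {A},               # known only from A
    (pub, "latitude", "59.913816"): {A, B},       # stated by both sources
}
fragment = [(pub, "name", "Amundsen Bryggeri og Spiseri"),
            (pub, "latitude", "59.913816")]

apply_fragment(store, A, {pub}, fragment)
once = {stmt: set(tags) for stmt, tags in store.items()}
apply_fragment(store, A, {pub}, fragment)         # applying again changes nothing
```

The old name disappears because only source A had stated it, while the latitude survives the untagging step because source B still vouches for it. This origin tagging is what lets one client merge data from several servers.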
16
Properties of the protocol
• HATEOAS
  – uses hypertext principles
  – only endpoint is that of the overview feed
  – all other URLs available via hypertext
• Applying a fragment is idempotent
  – i.e., the result is the same no matter how many times you do it
• Loose binding
  – very loose binding between server and client
• Supports federation of data
  – client can safely merge data from different sources
17
SDshare push
• In normal SDshare, data receivers connect to the data source
  – basically, they poll the source with GET requests
• However, the receiver is not always allowed to make connections to the source
  – SDshare push is designed for this situation
• Solution is a slightly modified protocol
  – source POSTs Atom feeds with inline fragments to recipient
  – this flips the server/client relationship
• Not part of the spec; unofficial Ontopia extension
18
Uses of SDshare
19
Example use case #1
[Diagram labels: Portal, Ontopia, DB2TM, Frontend, JDBC, Database]
20
Example use case #1
[Diagram labels: Portal, Ontopia, DB2TM, Frontend, Database, ESB, Service #1, Service #3, SDshare, Ontopia, SDshare]
21
NRK/Skole today
[Diagram labels: Editorial server, MediaDB, DB2TM, DB server 1, JDBC, DB server 2, Prod #1, Prod #2, JDBC, nrk-grep.xtm, Database, Server, Production environment, Export, Import, Firewall]
22
NRK/Skole with SDshare push
[Diagram labels: Editorial server, MediaDB, DB2TM, DB server 1, JDBC, DB server 2, Prod #1, Prod #2, JDBC, Database, Server, Production environment, Firewall, SDshare PUSH]
23
Hafslund
[Diagram labels: ERP, GIS, CRM, ..., UMIC, Archive, Search engine]
24
Hafslund architecture
• The beauty of this architecture is that SDshare insulates the different systems from one another
• More input systems can be added without hassle
• Any component can be replaced without affecting the others
• Essentially, a plug-and-play architecture
25
A Hafslund problem
• There are too many duplicates in the data
  – duplicates within each system
  – also duplication across systems
• How to get rid of the duplicates?
  – unrealistic to expect cleanup across systems
• So, we build a deduplicator
  – and plug it in...
26
DuKe plugged in
[Diagram labels: ERP, GIS, CRM, ..., UMIC, Archive, Search engine, Dupe Killer]
27
Implementations
28
Current implementations
• Web3
  – both client and server
• Ontopia
  – ditto + SDshare push
• Isidorus
  – don't know
• Atomico
  – server framework only; no actual implementation
29
Ontopia SDshare server
• Event tracker
  – taps into event API where it listens for changes
  – maintains in-memory list of changes
  – writes all changes to disk as well
  – removes duplicate changes and discards old changes
• Web application based on tracker
  – JSP pages producing feeds and fragments
  – one fragment per changed topic, sorted by time
  – only a single snapshot of current state of TM
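The tracker's bookkeeping (one entry per changed topic, duplicates collapsed, old entries discarded) can be illustrated with a small Python sketch. This is not Ontopia's code: the class name ChangeTracker, its methods, and the expiry rule are assumptions. It also assumes that timestamps never go backwards, and it leaves out writing the changes to disk.

```python
import time


class ChangeTracker:
    """Keeps at most one change record per topic, oldest first."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self.changes = {}  # topic id -> time of latest change

    def topic_changed(self, topic_id, now=None):
        now = time.time() if now is None else now
        # Removing and re-inserting moves the topic to the end of the dict,
        # so insertion order doubles as "sorted by time" and a repeated
        # change to the same topic replaces the earlier record.
        self.changes.pop(topic_id, None)
        self.changes[topic_id] = now
        self._expire(now)

    def _expire(self, now):
        cutoff = now - self.max_age
        for topic_id in [t for t, ts in self.changes.items() if ts < cutoff]:
            del self.changes[topic_id]

    def fragments(self):
        """One (topic id, timestamp) pair per changed topic, oldest first."""
        return list(self.changes.items())


tracker = ChangeTracker(max_age_seconds=100)
tracker.topic_changed("4521", now=0)
tracker.topic_changed("2662", now=50)
tracker.topic_changed("4521", now=60)    # duplicate: replaces the record from t=0
tracker.topic_changed("17", now=155)     # cutoff is now 55, so "2662" is discarded
```

Keeping only the latest change per topic is sufficient when each fragment URL returns the current state of the topic, since the intermediate states are then never needed.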
30
Ontopia SDshare client
• Web UI for mgmt
• Pluggable frontends
• Pluggable backends
• Combine at will
• Frontends
  – Ontopia: event listener
  – SDshare: polls Atom feeds
• Backends
  – Ontopia: applies changes to Ontopia locally
  – SPARQL: writes changes to RDF repo via SPARUL
  – push: pushes changes over SDshare push
[Diagram labels: SDshare client, Web UI, Ontopia events, Core logic, Ontopia backend, SPARQL Update, SDshare push]
31
Web UI to client
32
Problems with the spec
33
What if many fragments?
• The size of the fragments feed grows enormous
  – expensive if polled frequently
• Paging might be one solution
  – basically, end of feed contains pointer to more
• "since" parameter might be another
  – allows client to say "only show me changes since ..."
• Probably need both in practice
http://projects.topicmapslab.de/issues/3675
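How a server might combine the two ideas can be sketched as follows. Neither mechanism was part of the spec when these slides were written, so everything here is an assumption: the function name select_entries, the strict "newer than" comparison, and signalling further pages with a boolean instead of a link.

```python
from datetime import datetime, timezone


def parse_time(stamp):
    """Parse an Atom timestamp of the form used in the example feeds."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def select_entries(entries, since=None, page_size=None):
    """Pick the feed entries to send to a client.

    entries   -- list of (updated, fragment id) pairs, newest first
    since     -- only return entries strictly newer than this timestamp
    page_size -- maximum number of entries per response
    Returns (page, has_more); has_more tells the server to add a
    pointer to the next page at the end of the feed.
    """
    if since is not None:
        cutoff = parse_time(since)
        entries = [e for e in entries if parse_time(e[0]) > cutoff]
    if page_size is None or len(entries) <= page_size:
        return entries, False
    return entries[:page_size], True


all_entries = [("2011-03-15T19:20:03Z", "4521"),
               ("2011-03-15T19:10:00Z", "2662"),
               ("2011-03-15T18:55:38Z", "17")]
page, has_more = select_entries(all_entries,
                                since="2011-03-15T19:00:00Z", page_size=1)
```

With a strict comparison a client never sees the newest fragment it already has, so it has no overlap to check against. A server could use "at or after" instead; the repeated fragment is harmless because applying a fragment is idempotent.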
34
Ordering of fragments
• Should the spec require that fragments be ordered?
  – not really necessary if all fragment URIs return current state (instead of state at time fragment entry was created)
35
RDF fragment algorithm
• The one given in the spec makes no sense
• Relies on Topic Maps constructs not found in RDF
• Really no way to make use of it
http://projects.topicmapslab.de/issues/4013
36
Our interpretation
• Server prefix is URI of RDF named graph
• Fragment algorithm therefore becomes
  – delete all statements about changed resources
  – then add all statements in fragment
• Means each source gets a different graph
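One way to express this interpretation is as a generated update request against the named graph. The sketch below emits the W3C SPARQL 1.1 Update syntax, which was standardized after these slides and may differ from the SPARUL dialect the Ontopia backend actually sent. The function name sparql_update is hypothetical, and the URIs are assumed to be valid already, since no escaping is done.

```python
def sparql_update(graph_uri, changed_resources, triples):
    """Build one update request that applies a fragment to a named graph.

    graph_uri         -- the server prefix, used as the graph name
    changed_resources -- URIs of the resources the fragment is about
    triples           -- (subject, predicate, object) strings, each term
                         already written in N-Triples syntax
    """
    parts = []
    # Delete everything this source has said about each changed resource.
    for resource in changed_resources:
        parts.append("DELETE WHERE { GRAPH <%s> { <%s> ?p ?o } }"
                     % (graph_uri, resource))
    # Then add all statements from the fragment to the same graph.
    body = "\n".join("    %s %s %s ." % triple for triple in triples)
    parts.append("INSERT DATA { GRAPH <%s> {\n%s\n} }" % (graph_uri, body))
    return " ;\n".join(parts)


update = sparql_update(
    "file:/Users/larsga/data/topicmaps/beer.xtm",
    ["http://psi.example.org/12"],
    [("<http://psi.example.org/12>",
      "<http://psi.ontopia.net/ontology/latitude>",
      '"59.913816"')],
)
```

Because the delete is scoped to the source's own graph, statements about the same resource received from other sources are left alone.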
37
TopicSL/TopicII
• Currently, topics can only be identified by subject identifier
  – but not all topics have one
• Solution
  – add elements for subject locators and item identifiers
http://projects.topicmapslab.de/issues/3667
38
Paging of snapshots?
• What if the snapshot is vast?
  – clients probably won't be able to download and store the entire thing in one go
• Could we page the snapshot into fragments?
• Or is there some other solution?
http://projects.topicmapslab.de/issues/4307
39
How to tell if the fragment feed is complete?
• When reading the fragment feed, how can we tell if there are older fragments that have been discarded?
  – and how can we tell which fragment was the newest to be thrown away?
• Without this there's no way to know for certain if you've lost fragments if the feed stops before the newest fragment you've got
  – and if you're using "since" it always will stop before the newest fragment...
• Make new sdshare:foo element on feed level for this information?
http://projects.topicmapslab.de/issues/4308
40
Blank nodes are not supported
• What to do?
http://projects.topicmapslab.de/issues/4306
41
More information
• SDshare spec
  – http://www.egovpt.org/fg/CWA_Part_1b
• SDshare issue tracker
  – http://projects.topicmapslab.de/projects/sdshare
• SDshare use cases
  – http://www.garshol.priv.no/blog/215.html
• SDshare use cases– http://www.garshol.priv.no/blog/215.html