A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
Tobias Kuhn
http://www.tkuhn.org
@txkuhn
Department of Computer Science, VU University Amsterdam
Open Science for an Open Society Workshop, 2016 Conference on Complex Systems
Amsterdam, Netherlands, 20 September 2016
Increasing Importance of Scientific Data
https://www.google.com/trends/explore#q=%22data%20science%22
Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 2 / 15
Scientific Data as Supplemental Material
...
http://www.nature.com/ni/journal/v16/n10/full/ni.3267.html#supplementary-information
Scientific Data in Open Repositories
We Need Better Data Publishing!
Published data should be:
• Verifiable (Is this really the data I am looking for?)
• Immutable (Can I be sure that it hasn’t been modified?)
• Permanent (Will it be available in 1, 5, 20 years from now?)
• Reliable (Can it be efficiently retrieved whenever needed?)
• Granular (Can I refer to individual data entries?)
• Semantic (Can it be automatically interpreted?)
• Linked (Does it use established identifiers and ontologies?)
• Trustworthy (Can I trust the source?)
Requirement: Automated Low-Level Operations
We need automated low-level operations to publish and retrieve data entries and datasets:
publish <dataset-identifier>
get <dataset-identifier>
(like HTTP POST/GET but verifiable, immutable, permanent, reliable, ...)
Approach: Linked Data + Cryptography + Decentralization
Nanopublications: Linked Data Containers for Provenance-Aware Semantic Publishing
assertion
provenance
publication info
nanopublication
http://nanopub.org / @nanopub_org
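The three-part structure above can be sketched as a small data type. This is only an illustration, not the nanopublication data model itself: the class name, the example URIs, and the predicates are assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple

# A triple is (subject, predicate, object); plain strings stand in for RDF terms.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Nanopublication:
    """Hypothetical container mirroring the three parts named on the slide."""
    uri: str
    assertion: Tuple[Triple, ...]     # the actual claim or data entry
    provenance: Tuple[Triple, ...]    # where the assertion comes from
    pubinfo: Tuple[Triple, ...]       # metadata about the nanopublication itself

    def is_well_formed(self) -> bool:
        # In this sketch, every part must contain at least one triple.
        return all(len(part) > 0
                   for part in (self.assertion, self.provenance, self.pubinfo))


# Illustrative content only; all URIs below are made up.
example = Nanopublication(
    uri="http://example.org/np1",
    assertion=(("ex:malaria", "ex:isTransmittedBy", "ex:mosquitoes"),),
    provenance=(("ex:np1-assertion", "ex:wasDerivedFrom", "ex:some-study"),),
    pubinfo=(("ex:np1", "ex:createdBy", "ex:some-author"),),
)
```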
Trusty URIs: Cryptographic Hash Values for Verifiable and Immutable Web Identifiers
Nanopublications with Trusty URIs are ...
Verifiable + Immutable + Permanent
http://example.org/r1.RA5AbXdpz5DcaYXCh9l3eI9ruBosiL5XDU3rxBbBaUO70.trig
http://trustyuri.net/
Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014.
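The idea of a hash-bearing identifier can be sketched as follows. This is a simplified illustration and not the Trusty URI algorithm: it hashes raw bytes, whereas the scheme for nanopublications works on RDF content; the function names are assumptions, and the "RA" prefix is copied from the slide's example only for visual similarity.

```python
import base64
import hashlib

# Two-character prefix as seen in the slide's example URI. In this sketch it is
# just a label; the code below does NOT reproduce the real module's hashing.
MODULE_ID = "RA"


def artifact_code(content: bytes) -> str:
    """Return prefix + 43-character URL-safe Base64 encoding of a SHA-256 digest."""
    digest = hashlib.sha256(content).digest()
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return MODULE_ID + encoded


def make_hash_uri(base_uri: str, content: bytes) -> str:
    """Append the artifact code to a base URI (hypothetical helper)."""
    return base_uri + "." + artifact_code(content)


def verify(uri: str, content: bytes) -> bool:
    """Check that the content still matches the hash embedded in the URI."""
    return uri.endswith(artifact_code(content))


data = b"some data entry"
uri = make_hash_uri("http://example.org/r1", data)
```

Because the identifier contains the hash, anyone holding the URI can check retrieved content (verifiable), and any modification yields a different identifier (immutable).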
Decentralized and Reliable Publishing with a Nanopublication Server Network
Nanopublications with Trusty URIs
Publication
Retrieval
Propagation / Archiving
http://npmonitor.inn.ac
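The publish / propagate / retrieve cycle can be sketched with an in-memory simulation. This is not the server protocol: the class and function names are assumptions, a hex SHA-256 digest stands in for a Trusty URI, and propagation is modeled as naive full replication.

```python
import hashlib


def content_id(content: bytes) -> str:
    # Stand-in for a hash-based identifier.
    return hashlib.sha256(content).hexdigest()


class Server:
    """Hypothetical in-memory server holding content by identifier."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}
        self.online = True

    def publish(self, content: bytes) -> str:
        cid = content_id(content)
        self.store[cid] = content
        return cid

    def get(self, cid: str):
        if not self.online:
            raise ConnectionError(self.name + " is offline")
        return self.store.get(cid)


def propagate(servers):
    """Copy every entry to every server (naive full replication)."""
    union = {}
    for server in servers:
        union.update(server.store)
    for server in servers:
        server.store.update(union)


def retrieve(servers, cid: str):
    """Ask servers in turn; accept only content whose hash matches the identifier."""
    for server in servers:
        try:
            content = server.get(cid)
        except ConnectionError:
            continue  # server unreachable, try the next one
        if content is not None and content_id(content) == cid:
            return content
    return None


a, b, c = Server("a"), Server("b"), Server("c")
cid = a.publish(b"nanopub content")
propagate([a, b, c])
a.online = False                      # the original server goes down
b.store[cid] = b"tampered"            # one replica is corrupted
result = retrieve([a, b, c], cid)     # still served, and verified, by server c
```

Since the identifier fixes the content, a client does not need to trust any individual server: it can accept a copy from whichever server answers first with matching content.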
Defining Datasets with Nanopublication Indexes (which are themselves Nanopublications)
[Figure: example nanopublication indexes, panels (a) to (f); relations shown: appends, has sub-index, has element]
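How the three relations (has element, has sub-index, appends) define a dataset can be sketched as a recursive traversal. The class, the field names, and the example URIs are assumptions for illustration; real indexes are nanopublications expressed in RDF.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Index:
    """Hypothetical in-memory view of a nanopublication index."""
    uri: str
    elements: Tuple[str, ...] = ()        # "has element": URIs of member nanopubs
    sub_indexes: Tuple[Index, ...] = ()   # "has sub-index": nested indexes
    appends: Optional[Index] = None       # "appends": earlier index this one extends


def all_elements(index: Index) -> Set[str]:
    """Collect every nanopublication URI reachable from an index."""
    result = set(index.elements)
    for sub in index.sub_indexes:
        result |= all_elements(sub)
    if index.appends is not None:
        result |= all_elements(index.appends)
    return result


# Illustrative dataset; all URIs are made up.
part1 = Index("ex:index-part1", elements=("ex:np1", "ex:np2"))
part2 = Index("ex:index-part2", elements=("ex:np3",), appends=part1)
top = Index("ex:index-top", elements=("ex:np4",), sub_indexes=(part2,))
dataset = all_elements(top)
```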
Nanopublication Server Network is Efficient and Scalable
Our servers can deliver nanopublications about 100 times faster than when a triple store is used (and need far fewer resources):
[Figure: response time in seconds (log scale, 0.1 to 100), plotted against time from start of test in seconds (0 to 300) and against number of clients accessing the service in parallel (0 to 100); series: Virtuoso triple store with SPARQL endpoint, nanopublication server]
Nanopublication Datasets
Reliable Low-Level Publish/Retrieve Operations!
Operation to publish data:
$ np publish nanopubs.trig
156026 nanopubs published at http://np.inn.ac/
which can also be used to publish dataset definitions (indexes):
$ np publish index.trig
157 nanopubs published at http://np.inn.ac/
Operation to retrieve data entries:
$ np get http://np.inn.ac/RA7Kmmugi8OuCirfe5WKchnJhC3FuhQDi6M4O8mgR0CqE
and to retrieve entire datasets:
$ np get -c http://np.inn.ac/RAY_lQruuagCYtAcKAPptkY7EpITwZeUilGHsWGm9ZWNI
https://github.com/Nanopublication/nanopub-java
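Because the identifier in the np get commands above ends in a hash-based code, the same entry can in principle be requested from any server that holds a copy. A minimal sketch of that idea follows; it is not how the np tool is implemented. The helper names and the second server URL are assumptions, and the URL pattern (server base plus code) is taken only from the np.inn.ac examples on this slide.

```python
import re

# "RA" followed by 43 URL-safe Base64 characters, as in the identifiers above.
ARTIFACT_CODE = re.compile(r"RA[A-Za-z0-9_-]{43}")


def extract_artifact_code(uri: str):
    """Return the hash-based code contained in a URI, or None if there is none."""
    match = ARTIFACT_CODE.search(uri)
    return match.group(0) if match else None


def candidate_urls(uri: str, servers):
    """Build one URL per server for the same artifact code (hypothetical helper)."""
    code = extract_artifact_code(uri)
    if code is None:
        raise ValueError("no artifact code found in " + uri)
    return [server.rstrip("/") + "/" + code for server in servers]


uri = "http://np.inn.ac/RA7Kmmugi8OuCirfe5WKchnJhC3FuhQDi6M4O8mgR0CqE"
servers = [
    "http://np.inn.ac/",               # server named on the slide
    "http://server2.example.org/",     # made-up second server
]
urls = candidate_urls(uri, servers)
```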
Future Work
• Improve server protocol
• Develop services on top of the server network
• Establish best practices for versioning, retractions, reviews, etc.
• Connect it all to the scientific publishing workflow
Thank you for your attention!
Questions?
Further information:
• Paper on the approach: https://peerj.com/articles/cs-78/
• Nanopublications: http://nanopub.org
• Trusty URIs: http://trustyuri.net
• Nanopublication Server Network: http://npmonitor.inn.ac