DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
Access to DBpedia Versions using Memento and Triple Pattern Fragments
Herbert Van de Sompel (@hvdsomp), Los Alamos National Laboratory
Miel Vander Sande (@Miel_vds), Ghent University
CNI Spring Meeting, San Antonio, TX, April 5 2016
Acknowledgments: Lyudmila Balakireva, Harihar Shankar, Ruben Verborgh
Outline
• Prelude: Memento and Linked Data
• First Generation DBpedia Archive
• Devising Affordable/Useful Linked Data Archives
• Intermezzo: Triple Pattern Fragments (TPF)
• Intermezzo: Binary RDF Representation (HDT)
• Devising Affordable/Useful Linked Data Archives
• Second Generation DBpedia Archive
• Try this At Home
Memento Framework
Memento LDOW 2010 Submission
Herbert Van de Sompel et al. (2010) An HTTP-Based Versioning Mechanism for Linked Data. http://arxiv.org/abs/1003.3661
Memento and Linked Data
Time-Series Analysis across DBpedia Versions
Data collected through “follow your nose” HTTP Navigation
First Generation DBpedia Archive: Storage
First Generation DBpedia Archive: Storage Characteristics
• upload software: custom
• upload time: ~ 24 hours per version
• storage software: MongoDB
• storage space: 383 GB for 10 versions
• DBpedia versions: 10 versions, 2.0 through 3.9
• number of triples: ~ 3 billion
First Generation DBpedia Archive: Subject-URI Access
http://dbpedia.mementodepot.org/memento/2009052/http://dbpedia.org/page/Oaxaca
First Generation DBpedia Archive: Subject-URI Access Characteristics
• TimeGate software: custom
• access type: Subject URI & datetime
• external integration: current DBpedia
• clients:
  • all clients: direct access to Memento Subject-URI
  • Memento clients: datetime negotiation with Subject-URI
DBpedia Archive @ LANL Since 2010
• Access based on Subject-URI (DBpedia Topic URI) only
• MongoDB storage
• A blob per Subject-URI per version
• Dynamically transformed to other RDF serializations
• No updates since version 3.9 (2013) of DBpedia as a result of scalability problems
Affordable & Useful Linked Data Archives
• A Linked Data Archive consists of temporal snapshots of one or more Linked Data sets, whereby each temporal snapshot reflects the state of a Linked Data set at a specific moment or interval in time.
• How to make Linked Data Archives accessible in a manner that is
  • affordable/sustainable for the publisher
  • useful for the consumer
Linked Data Archive: Characteristics
General Characteristics (Publisher / Consumer)
• Availability
• Bandwidth
• Cost
Functionality
• Interface Expressiveness
• LOD Integration
• Memento Support
• Cross Time/Data
Verdict:
• Publication perspective: $$$$
• Access perspective: ++++
Linked Data Publishing
• The typical ways of publishing Linked Data on the Web:
  • Subject URI access
  • Data dump
  • SPARQL endpoint
Let’s consider these from the perspective of Linked Data Archives, i.e. archival storage and access
Linked Data Archive with Subject-URI Access
• For each temporal snapshot of a Linked Data set, and for each Subject in that snapshot, publish an RDF description (of the Subject) at a URI that is specific per snapshot/subject
Linked Data Archive with Subject-URI Access: Characteristics
General Characteristics (Publisher / Consumer)
• Availability: rather high / rather high
• Bandwidth: ~ description / ~ description
• Cost: rather low / rather high
Functionality
• Interface Expressiveness: rather low
• LOD Integration: yes
• Memento Support: possible
• Cross Time/Data: follow your nose
Verdict:
• Publication perspective: $$$$
• Access perspective: ++++
Linked Data Archive Using Dumps
• Renders each temporal snapshot of a Linked Data set as a data dump that places all temporal dataset triples (as they were at a specific moment in time) into one or more files
Linked Data Archive Using Dumps: Characteristics
General Characteristics (Publisher / Consumer)
• Availability: high / high
• Bandwidth: high / high
• Cost: low / high
Functionality
• Interface Expressiveness: download dataset
• LOD Integration: no
• Memento Support: not possible
• Cross Time/Data: download various datasets
Verdict:
• Publication perspective: $$$$
• Access perspective: ++++
Linked Data Archive with SPARQL Endpoint(s)
• For each temporal snapshot of a Linked Data set, supports arbitrary SPARQL queries.
• Different architectural set-ups possible; no standard approach
Linked Data Archive Using SPARQL Endpoint(s): Characteristics
General Characteristics (Publisher / Consumer)
• Availability: problematic / problematic
• Bandwidth: ~ query / ~ query
• Cost: high / low
Functionality
• Interface Expressiveness: highly expressive
• LOD Integration: no
• Memento Support: hard
• Cross Time/Data: custom distributed queries
Verdict:
• Publication perspective: $$$$
• Access perspective: ++++
Affordable & Useful Linked Data Archives
Linked Data Archive Type: Publishing / Consuming
• Data Dump: $$$$ / ++++
• SPARQL Endpoint(s): $$$$ / ++++
• Subject URI Access: $$$$ / ++++
Linked Data Fragments (Ghent U)
• Every Linked Data interface offers specific fragments of a Linked Data set
• A fragment is described by
  • Selector: what questions can I ask?
  • Controls: how do I get more fragments?
  • Metadata: helpful information for consumption?
• Each interface type comes with tradeoffs
  • cf. the analysis thus far
http://linkeddatafragments.org
Verborgh, R. et al. (2014) Querying datasets on the web with high availability. ISWC 2014. http://ruben.verborgh.org/publications/verborgh_iswc_2014/
Triple Pattern Fragments (Ghent U)
• Triple Pattern Fragments is a new interface with a different set of tradeoffs that are attractive from an archival perspective
http://www.hydra-cg.com/spec/latest/triple-pattern-fragments/
• Allows querying a Linked Data set according to ?subject ?predicate ?object patterns
Controls: Responses provide navigational help for clients
• Based on emerging Hydra vocabulary for self-describing Hypermedia-Driven Web APIs
Metadata: dataset info, estimated count (to aid client applications)
http://www.hydra-cg.com/spec/latest/core/
Binary RDF Representation for Publication and Exchange (HDT)
http://www.w3.org/Submission/HDT/
• Header-Dictionary-Triple (HDT) is a compact, binary representation of RDF datasets.
• Able to represent massive data sets
• Dictionary/Triples structure achieves
  • rapid search for ?subject ?predicate ?object pattern
  • high compression rates
• Header provides metadata about the dataset
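The Dictionary/Triples idea can be illustrated with a toy sketch. This is not the HDT binary format, which uses compressed dictionary sections and indexed triple structures. It only shows the principle: each term is stored once and triples become tuples of integer IDs. The function names `encode` and `match` and the `ex:` sample triples are hypothetical, and the linear scan in `match` stands in for HDT's much faster indexed lookup.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def encode(triples: List[Triple]) -> Tuple[Dict[str, int], List[IdTriple]]:
    """Assign each distinct term an integer ID and rewrite triples as ID tuples."""
    dictionary: Dict[str, int] = {}
    encoded: List[IdTriple] = []
    for s, p, o in triples:
        # setdefault stores a new ID only the first time a term is seen.
        ids = [dictionary.setdefault(t, len(dictionary) + 1) for t in (s, p, o)]
        encoded.append((ids[0], ids[1], ids[2]))
    return dictionary, sorted(encoded)


def match(dictionary: Dict[str, int], encoded: List[IdTriple],
          s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a ?s ?p ?o pattern; None acts as a variable."""
    wanted: List[Optional[int]] = []
    for term in (s, p, o):
        if term is None:
            wanted.append(None)
        elif term in dictionary:
            wanted.append(dictionary[term])
        else:
            return []  # a term absent from the dictionary cannot match
    terms = {i: t for t, i in dictionary.items()}
    return [
        (terms[a], terms[b], terms[c])
        for a, b, c in encoded
        if all(w is None or w == v for w, v in zip(wanted, (a, b, c)))
    ]


# Made-up sample data, for illustration only.
SAMPLE: List[Triple] = [
    ("ex:alice", "ex:birthPlace", "ex:Ghent"),
    ("ex:bob", "ex:birthPlace", "ex:Ghent"),
    ("ex:alice", "ex:knows", "ex:bob"),
]
dictionary, encoded = encode(SAMPLE)
print(match(dictionary, encoded, p="ex:birthPlace", o="ex:Ghent"))
```

Long URIs that recur across many triples are stored once in the dictionary, which is where much of the space saving comes from.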
HDT Linked Data Archive with TPF Support
• For each temporal snapshot of a Linked Data set, generate an HDT serialization that provides access according to ?subject ?predicate ?object patterns
Linked Data Archive with ?s?p?o Access: Characteristics
General Characteristics (Publisher / Consumer)
• Availability: high / high
• Bandwidth: ~ query / ~ query
• Cost: low / medium
Functionality
• Interface Expressiveness: better than subject-URI only
• LOD Integration: yes
• Memento Support: possible
• Cross Time/Data: follow your nose
Verdict:
• Publication perspective: $$$$
• Access perspective: ++++
Affordable & Useful Linked Data Archives
Linked Data Archive Type: Publishing / Consuming
• Data Dump: $$$$ / ++++
• SPARQL Endpoint(s): $$$$ / ++++
• Subject URI Access: $$$$ / ++++
• HDT & TPF: $$$$ / ++++
Second Generation DBpedia Archive: Storage
Second Generation DBpedia Archive: Storage Characteristics
• upload software: HDT-CPP
• upload time: ~ 4 hours per version
• storage software: HDT binary files
• storage space: 70 GB for 12 versions
• DBpedia versions: 12 versions, 2.0 through 2015
• number of triples: ~ 5 billion
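To put these figures next to the first generation's (383 GB for 10 versions, about 3 billion triples, about 24 hours of upload per version), a small calculation helps. The sketch below uses only the numbers quoted on the two storage slides. The triple counts and upload times are approximate, so the ratios are rough indications rather than measurements, and the dictionary keys are hypothetical names.

```python
# Figures copied from the two "Storage Characteristics" slides (approximate).
first_gen = {"versions": 10, "storage_gb": 383,
             "upload_hours_per_version": 24, "triples_billion": 3}
second_gen = {"versions": 12, "storage_gb": 70,
              "upload_hours_per_version": 4, "triples_billion": 5}


def gb_per_version(archive: dict) -> float:
    """Average storage per DBpedia version."""
    return archive["storage_gb"] / archive["versions"]


def gb_per_billion_triples(archive: dict) -> float:
    """Average storage per billion triples held."""
    return archive["storage_gb"] / archive["triples_billion"]


for label, archive in (("first", first_gen), ("second", second_gen)):
    print(label,
          round(gb_per_version(archive), 1), "GB/version,",
          round(gb_per_billion_triples(archive), 1), "GB/billion triples")

upload_speedup = (first_gen["upload_hours_per_version"]
                  / second_gen["upload_hours_per_version"])
print("upload time ratio:", upload_speedup)
```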
Second Generation DBpedia Archive: ?s?p?o Query-URI Access
http://fragments.mementodepot.org/dbpedia_3_8?subject=&predicate=http://dbpedia.org/ontology/birthPlace&object=http://dbpedia.org/resource/Ghent
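A Query-URI like the one above can be assembled programmatically. The following is a minimal sketch that takes the `subject`, `predicate` and `object` parameter names from the slide's URL; the helper name `fragment_url` is hypothetical. `urlencode` percent-encodes the DBpedia URIs, which the slide shows unencoded for readability. The sketch only builds the URL and does not assume the service is reachable.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base of the dbpedia_3_8 fragments interface, as shown on the slide.
BASE = "http://fragments.mementodepot.org/dbpedia_3_8"


def fragment_url(base: str, subject: str = "", predicate: str = "",
                 obj: str = "") -> str:
    """Build a ?s?p?o Query-URI; an empty string leaves that position open."""
    query = urlencode({"subject": subject, "predicate": predicate, "object": obj})
    return f"{base}?{query}"


# "Who was born in Ghent?": subject is the variable.
url = fragment_url(
    BASE,
    predicate="http://dbpedia.org/ontology/birthPlace",
    obj="http://dbpedia.org/resource/Ghent",
)
print(url)
# Decoding the query string gives back the original pattern.
print(parse_qs(urlsplit(url).query, keep_blank_values=True))
```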
?s?p?o Query-URI Access
• TimeGate URI: http://fragments.mementodepot.org/timegate/dbpedia?subject={DBpediaURI}&predicate={DBpediaURI}&object={DBpediaURI}
  Example: http://fragments.mementodepot.org/timegate/dbpedia?subject=&predicate=&object=http://dbpedia.org/resource/Ghent
• TimeMap URI: not supported
• Memento URI: http://fragments.mementodepot.org/{DBpediaVersion}?subject={DBpediaURI}&predicate={DBpediaURI}&object={DBpediaURI}
  Example: http://fragments.mementodepot.org/dbpedia_3_0?subject=&predicate=&object=http://dbpedia.org/resource/Ghent
• Further info: http://mementoweb.org/depot/native/fragments/
Try it with Memento for Chrome: http://bit.ly/memento-for-chrome
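Outside a browser, datetime negotiation comes down to sending an `Accept-Datetime` header (RFC 7089) to the TimeGate URI, which is expected to redirect to the Memento closest to that datetime. The sketch below only constructs such a request with the standard library and does not send it. The helper name `timegate_request` and the chosen date are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.parse import urlencode
from urllib.request import Request

# TimeGate base taken from the slide.
TIMEGATE = "http://fragments.mementodepot.org/timegate/dbpedia"


def timegate_request(when: datetime, subject: str = "", predicate: str = "",
                     obj: str = "") -> Request:
    """Prepare a datetime-negotiation request for a ?s?p?o pattern."""
    query = urlencode({"subject": subject, "predicate": predicate, "object": obj})
    request = Request(f"{TIMEGATE}?{query}")
    # Accept-Datetime uses the HTTP date format, always expressed in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    request.add_header("Accept-Datetime", http_date)
    return request


request = timegate_request(
    datetime(2008, 1, 3, tzinfo=timezone.utc),  # arbitrary example datetime
    obj="http://dbpedia.org/resource/Ghent",
)
print(request.full_url)
print(request.header_items())
```

Passing the request to `urllib.request.urlopen` would follow the TimeGate's redirect; the selected version can then be read from the final URL and the `Memento-Datetime` response header.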
Second Generation DBpedia Archive: Subject-URI Access
Subject-URI Access
• TimeGate URI: http://dbpedia.mementodepot.org/timegate/{DBpediaURI}
  Example: http://dbpedia.mementodepot.org/timegate/http://dbpedia.org/data/Ghent
• TimeMap URI: http://dbpedia.mementodepot.org/timemap/link/{DBpediaURI}
  Example: http://dbpedia.mementodepot.org/timemap/link/http://dbpedia.org/data/Ghent
• Memento URI: http://dbpedia.mementodepot.org/{yyyymmdd}/{DBpediaURI}
  Example: http://dbpedia.mementodepot.org/20080103/http://dbpedia.org/data/Ghent
• Further info: http://mementoweb.org/depot/native/dbpedia/
Try it with Memento for Chrome: http://bit.ly/memento-for-chrome
Second Generation DBpedia Archive: Access Characteristics
• TimeGate software: ① node.js LDF server 2.0.0; ② LDF js client
• access type: ① ?s?p?o Query-URI & datetime; ② Subject-URI & datetime
• external integration: ① DBpedia LDF server; ② current DBpedia
• clients:
  • all clients: direct access to Mementos of Subject-URI and ?s?p?o Query-URI
  • Memento clients: datetime negotiation with Subject-URI and ?s?p?o Query-URI
Building a Linked Data Archive
• Convert the archival data set(s) to HDT using HDT-CPP
HDT Software (C++)
https://github.com/rdfhdt/hdt-cpp
• input data requires cleaning before processing, especially regarding URI characters
  • DBpedia data not clean
  • DBpedia v3.5 was not successfully processed
  • No meaningful error messages to help locate problems
• memory intensive
  • Kyoto Cabinet was used to optimize storage requirement and speed during processing
• Java version exists but has memory problems
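A pre-check of the input can make the cleaning step less painful by reporting the offending line numbers before conversion. The slides do not say which URI characters caused trouble. As an assumption, the sketch below flags the characters that the N-Triples grammar forbids unescaped inside `<...>`: control characters, space, and `< > " { } | ^ \` plus the backtick. It is a line-based heuristic, not a full parser, and all names in it are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Quoted literals are blanked first so that "<...>" inside text is ignored.
_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
_IRI = re.compile(r"<([^>]*)>")
# \uXXXX and \UXXXXXXXX escapes are legal inside N-Triples IRIs.
_UCHAR = re.compile(r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}")
# Characters not allowed unescaped in an N-Triples IRI (assumed culprits).
_BAD_IRI_CHAR = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def iri_problems(line: str) -> List[str]:
    """Return the IRIs on one N-Triples line that contain disallowed characters."""
    without_literals = _LITERAL.sub('""', line)
    return [
        iri for iri in _IRI.findall(without_literals)
        if _BAD_IRI_CHAR.search(_UCHAR.sub("", iri))
    ]


def split_clean(lines: Iterable[str]) -> Tuple[List[str], List[Tuple[int, List[str]]]]:
    """Separate usable lines from rejected ones, keeping 1-based line numbers."""
    clean: List[str] = []
    rejected: List[Tuple[int, List[str]]] = []
    for number, line in enumerate(lines, start=1):
        bad = iri_problems(line)
        if bad:
            rejected.append((number, bad))
        else:
            clean.append(line)
    return clean, rejected


# Made-up sample lines, for illustration only.
SAMPLE = [
    '<http://example.org/a> <http://example.org/p> "text with <angle> brackets" .',
    '<http://example.org/bad iri> <http://example.org/p> <http://example.org/o> .',
    '<http://example.org/caf\\u00E9> <http://example.org/p> "x"@en .',
]
clean, rejected = split_clean(SAMPLE)
print(len(clean), "clean lines; rejected:", rejected)
```

Rejected lines can then be repaired or dropped before the dump is handed to the converter.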
Building a Linked Data Archive
• Download the Triple Fragment Server code
Linked Data Fragment Server (Node.js)
https://github.com/LinkedDataFragments/Server.js
• provides ?s?p?o access to local and/or remote Linked Data sets
• supports HDT, Turtle files, N-Triples files, JSON-LD files, SPARQL endpoints, in-memory store, and BlazeGraph Linked Data sets
• version 2.0.0 (released March 31 2016) has built-in Memento support
Building a Linked Data Archive
• Create the JSON config file for Memento
Linked Data Fragment Server, Memento Configuration
https://github.com/LinkedDataFragments/Server.js/wiki/Configuring-Memento
• declare archival data set(s)
• add datetime ranges for the archival data set(s)
• add a TimeGate
• list the archival data set(s) for which the TimeGate should support datetime negotiation
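Conceptually, the datetime ranges are what let a TimeGate map a requested datetime to one archival data set. The sketch below illustrates that selection logic in plain Python. It is not the server's configuration format or its code, for which the wiki page above is the reference. The dataset names follow the `dbpedia_3_0` / `dbpedia_3_8` pattern seen in the Memento URIs. The datetime ranges are invented, and the closest-start fallback is an assumption.

```python
from datetime import datetime, timezone
from typing import List, Tuple

Version = Tuple[str, datetime, datetime]  # (dataset name, range start, range end)


def utc(year: int, month: int, day: int) -> datetime:
    """Shorthand for a timezone-aware UTC datetime at midnight."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical datetime ranges, for illustration only.
VERSIONS: List[Version] = [
    ("dbpedia_3_0", utc(2008, 2, 1), utc(2008, 8, 1)),
    ("dbpedia_3_8", utc(2012, 8, 1), utc(2013, 9, 1)),
]


def negotiate(when: datetime, versions: List[Version]) -> str:
    """Pick the data set whose range contains `when`, else the closest start."""
    for name, start, end in versions:
        if start <= when < end:
            return name
    # Outside every declared range: fall back to the nearest range start.
    return min(versions, key=lambda v: abs(v[1] - when))[0]


print(negotiate(utc(2013, 1, 1), VERSIONS))
print(negotiate(utc(2016, 4, 5), VERSIONS))
```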
Building a Linked Data Archive
• Run the server