Rethinking Online SPARQL Querying to Support Incremental Result Visualization

Olaf Hartig

http://olafhartig.de

@olafhartig

Prologue

Live Querying the Web of Data

● Federated query processing
  – i.e., querying a federation of SPARQL endpoints

● Linked Data query processing
  – i.e., querying Linked Data by relying only on the Linked Data principles (interface: URI lookups)
  – e.g., traversal-based query execution

● Querying other Linked Data fragment servers
  – e.g., triple pattern fragments

Chapter 1

“Can the progress that has been made on (Read/Write) Linked Data change the way we interact with the Web […]?”

Information in Dynamic Web Pages

Support for such an incremental visualization has not received much attention in existing work on querying the Web of Data

I think we have not made enough progress to even enable well-understood interaction techniques that are widely applied in “traditional” Web applications

Topics

Opportunities to Optimize the Response Times of Traversal-based Query Executions

Making the Core Fragment of SPARQL Suitable for the Task

Chapter 2

Implementation Approach

[Diagram: Data Retrieval Operator, two Triple Pattern Operators, and a Dispatcher]

Data Retrieval Operator

[Diagram: Data Retrieval Operator, Dispatcher, and two Triple Pattern Operators]

● Performs URI lookups, e.g., GET http://example.org/...

● Example of a retrieved RDF triple: ( Bob, knows, Alice ), which matches the triple pattern ( ?v1, knows, ?v2 )

Triple Pattern Operator

[Diagram: Dispatcher, Triple Pattern Operator for the triple pattern ( ?v1, knows, ?v2 ), and Output]

RDF triple: ( Bob, knows, Alice )

Intermediate Solution
  Timestamp: 1
  Bindings: ?v1 → Bob, ?v2 → Alice
  Flags: [ ∙ | √ | ∙ | ∙ ]

Intermediate Solution
  Timestamp: 1
  Bindings: ?v1 → Alice, ?v2 → Bob
  Flags: [ ∙ | √ | ∙ | ∙ ]

Triple Pattern Operator cont'd

Intermediate Solution
  Timestamp: 461
  Bindings: ?v1 → Bob, ?v2 → Steve
  Flags: [ ∙ | √ | ∙ | ∙ ]

Intermediate Solution
  Timestamp: 327
  Bindings: ?v1 → Bob, ?v3 → Berlin
  Flags: [ √ | ∙ | ∙ | ∙ ]

Intermediate Solution
  Timestamp: 461
  Bindings: ?v1 → Bob, ?v2 → Steve, ?v3 → Berlin
  Flags: [ √ | √ | ∙ | ∙ ]

Output
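The three intermediate solutions above suggest how two compatible solutions are combined. The following is a minimal sketch of that step, not the actual implementation: the class and function names are hypothetical, and the rules that the merged timestamp is the larger of the two and that the flags are OR-ed are assumptions read off the example (461 and 327 give 461; the two flag vectors give [ √ | √ | ∙ | ∙ ]).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class IntermediateSolution:
    timestamp: int
    bindings: Dict[str, str]   # variable name -> bound term
    flags: Tuple[bool, ...]    # flags[i] is True once triple pattern i is covered


def compatible(a: IntermediateSolution, b: IntermediateSolution) -> bool:
    """Two solutions are compatible if they agree on every shared variable."""
    shared = a.bindings.keys() & b.bindings.keys()
    return all(a.bindings[v] == b.bindings[v] for v in shared)


def merge(a: IntermediateSolution, b: IntermediateSolution) -> Optional[IntermediateSolution]:
    """Combine two compatible solutions; return None if they conflict."""
    if not compatible(a, b):
        return None
    return IntermediateSolution(
        timestamp=max(a.timestamp, b.timestamp),                # assumption: keep the later one
        bindings={**a.bindings, **b.bindings},
        flags=tuple(x or y for x, y in zip(a.flags, b.flags)),  # assumption: OR the flags
    )


s1 = IntermediateSolution(461, {"?v1": "Bob", "?v2": "Steve"}, (False, True, False, False))
s2 = IntermediateSolution(327, {"?v1": "Bob", "?v3": "Berlin"}, (True, False, False, False))
merged = merge(s1, s2)
```

With the two inputs from the slide, `merged` carries timestamp 461, all three bindings, and the first two flags set.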

Properties

[Diagram: Data Retrieval, Dispatcher, and two TP Operators]

● Supports:
  – any reachability-based query semantics

● Highly flexible
  – routing of intermediate solutions

● Inspired by “Eddies”
  – Avnur & Hellerstein, SIGMOD 2000
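To illustrate what flexible routing of intermediate solutions can mean, here is a minimal sketch under stated assumptions: the dispatcher looks at the flags of a solution, treats every triple pattern operator whose flag is not yet set as a candidate, and lets a pluggable routing policy pick one. The function names and both policies are hypothetical and only illustrate the idea.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Flags = Tuple[bool, ...]
RoutingPolicy = Callable[[Sequence[int]], int]


def pending_operators(flags: Flags) -> List[int]:
    """Indices of the triple pattern operators that have not yet handled the solution."""
    return [i for i, done in enumerate(flags) if not done]


def route(flags: Flags, policy: RoutingPolicy) -> Optional[int]:
    """Return the index of the next operator, or None if the solution is complete."""
    candidates = pending_operators(flags)
    if not candidates:
        return None  # every flag is set: the solution goes to the output
    return policy(candidates)


def first_pending(candidates: Sequence[int]) -> int:
    """Hypothetical policy: always pick the lowest-numbered pending operator."""
    return candidates[0]


def random_pending(candidates: Sequence[int]) -> int:
    """Hypothetical policy: pick a pending operator uniformly at random."""
    return random.choice(candidates)


next_op = route((False, True, False, False), first_pending)
```

Swapping `first_pending` for another function changes the routing policy without touching the operators, which is the kind of flexibility the slide refers to.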

Hypothesis 1

Response times can be reduced by applying a suitable routing policy.

Test of Different Routing Policies

Setup:

● Data retrieval operator simply appends to its lookup queue

● Web simulation environment (test Web: W-62-47, test query: Q1, details: [Hartig and Özsu 2014])

● Each bar represents geometric mean of 5 separate executions
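As a reminder of how such an aggregate is computed, the sketch below expresses a response time relative to the overall query execution time (QET) and takes the geometric mean over several executions. The helper names are hypothetical and the numbers are made up for illustration; they are not measurements from the experiments.

```python
import math
from typing import Sequence


def relative_response_time(response_time: float, qet: float) -> float:
    """Response time as a fraction of the overall query execution time."""
    return response_time / qet


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of strictly positive values, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty sequence of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (response time, QET) pairs for 5 executions, in seconds.
runs = [(2.0, 10.0), (2.5, 10.0), (3.0, 12.0), (2.2, 11.0), (2.4, 9.6)]
ratios = [relative_response_time(rt, qet) for rt, qet in runs]
bar_height = geometric_mean(ratios)
```

The geometric mean is a common choice for averaging ratios because it is less sensitive to a single outlier execution than the arithmetic mean.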

[Charts: response time for first reported solution, relative to overall QET; response time for last reported solution, relative to overall QET]

Routing policy has no impact!

Hypothesis 1

Response times can be reduced by applying a suitable routing policy.

Data Retrieval Dominates!!!

[Chart: Query 1, Query 4, Query 5, Query 9, and Query 10; series: 10 threads, 20 threads, cache; value axis from 0.1 to 100000]

5 queries of the FedBench benchmark suite, executed over real Linked Data on the WWW

● 10 threads, 20 threads: different number of lookup threads used by the data retrieval operator

● cache: data retrieval operator equipped with a cache
  – Cache populated by a first execution
  – Times measured for a 2nd, cache-only execution (i.e., data retrieval deactivated)

Hypothesis 2

Response times can be reduced by choosing a “good” strategy of prioritizing URI lookups.
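One way to make the lookup strategy exchangeable is a lookup queue with a pluggable priority function. The sketch below is an illustration under assumptions, not the system's actual code: the `LookupQueue` class, the two priority functions, the de-duplication of URIs, and the example URIs are all hypothetical. A constant priority yields plain append-order (FIFO) behavior, and a random priority yields random prioritization.

```python
import heapq
import itertools
import random
from typing import Callable, List, Set, Tuple


class LookupQueue:
    """Queue of URIs to look up; lower priority values are served first."""

    def __init__(self, priority: Callable[[str], float]) -> None:
        self._priority = priority
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = itertools.count()  # tie-breaker: insertion order
        self._seen: Set[str] = set()       # assumption: each URI is looked up at most once

    def add(self, uri: str) -> None:
        if uri in self._seen:
            return
        self._seen.add(uri)
        heapq.heappush(self._heap, (self._priority(uri), next(self._counter), uri))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def fifo_priority(uri: str) -> float:
    """Same priority for every URI, so the insertion order decides."""
    return 0.0


def random_priority(uri: str) -> float:
    """Random priority, i.e., lookups are served in random order."""
    return random.random()


queue = LookupQueue(fifo_priority)
for uri in ["http://example.org/bob", "http://example.org/alice", "http://example.org/bob"]:
    queue.add(uri)
```

Here `queue` holds two URIs and serves `http://example.org/bob` first; passing `random_priority` instead changes only the order in which the same lookups are issued.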

Prioritizing Lookups Randomly

[Chart: result elements (0 to 6) reported over time for five executions (exec1 to exec5), relative to QET; annotations: ca. 25% of QET, ca. 58%]

Setup:

● LD10 of the FedBench benchmark suite, over real Linked Data on the WWW

Hypothesis 2

Response times can be reduced by choosing a “good” strategy of prioritizing URI lookups. √

Question

What is a “good” strategy of prioritizing URI lookups?

Chapter 3

Topics

Opportunities to Optimize the Response Times of Traversal-based Query Executions √

Making the Core Fragment of SPARQL Suitable for the Task (by making it monotonic)

Monotonicity?

● Query Q is monotonic if for every pair ( D1, D2 ) of possible databases, it holds that:

  D1 ⊆ D2 ⟹ Q( D1 ) ⊆ Q( D2 )

● Example: the SPARQL pattern

  P = (a, p, ?x) OPT (?x, p, ?y)

  is not monotonic
  – G1 = { (a, p, b) }
  – G2 = { (a, p, b), (b, p, c) }
  – ⟦P⟧G1 = { μ }, where μ = { ?x → b }
  – ⟦P⟧G2 = { μ' }, where μ' = { ?x → b, ?y → c } ≠ μ !
  – Hence G1 ⊆ G2, but ⟦P⟧G1 ⊈ ⟦P⟧G2
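The example can be checked mechanically. The following sketch evaluates the pattern over G1 and G2 using the standard set semantics of OPT (join, plus those left-hand solutions that are compatible with no right-hand solution). The function names and the encoding of solution mappings as frozensets of (variable, value) pairs are illustrative choices, not part of the talk.

```python
def match(pattern, graph):
    """Solution mappings of one triple pattern; terms starting with '?' are variables."""
    results = set()
    for triple in graph:
        mu, ok = {}, True
        for p_term, t_term in zip(pattern, triple):
            if p_term.startswith("?"):
                # Bind the variable, or check consistency if it is already bound.
                if mu.setdefault(p_term, t_term) != t_term:
                    ok = False
                    break
            elif p_term != t_term:
                ok = False
                break
        if ok:
            results.add(frozenset(mu.items()))
    return results


def compatible(m1, m2):
    """Mappings are compatible if they agree on all shared variables."""
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}


def diff(o1, o2):
    return {m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)}


def opt(o1, o2):
    """Standard OPT: (o1 join o2) union (o1 minus o2)."""
    return join(o1, o2) | diff(o1, o2)


G1 = {("a", "p", "b")}
G2 = {("a", "p", "b"), ("b", "p", "c")}
P1, P2 = ("a", "p", "?x"), ("?x", "p", "?y")

result_g1 = opt(match(P1, G1), match(P2, G1))  # { μ }
result_g2 = opt(match(P1, G2), match(P2, G2))  # { μ' }
is_monotonic_here = result_g1 <= result_g2     # False: μ is lost when the graph grows
```

Because μ disappears from the result once (b, p, c) is discovered, an engine that had already reported μ after seeing only G1 would have reported a wrong answer.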

What is the Issue?

● For any non-monotonic query, elements of the result set can be output only after we have seen all query-relevant parts of the DB
  – Hence, since we discover our DB (the Web of Data) at runtime, we can output result elements only after completing the discovery process

● Good news: the AND-UNION-FILTER fragment of SPARQL is monotonic [Arenas and Perez 2011]

● Bad news: for the AND-UNION-FILTER-OPT fragment, monotonicity is undecidable [Hartig 2014]
  – i.e., queries with OPT may be non-monotonic

What is the Usage of OPT?

● DBpedia
  – 46.4% of ca. 1.3M unique queries (logs from Apr. – Jul. 2010) [Picalausa and Vansummeren, in SWIM 2011]
  – 16.6% (logs from USEWOD 2011 dataset) [Gallego et al., in USEWOD 2011]
  – 15% (logs from USEWOD 2011 dataset) [Elbedweihy et al., in COLD 2011]

● Semantic Web conference corpus (SWDF)
  – 0.4% (logs from USEWOD 2011 dataset) [Gallego et al., in USEWOD 2011]

A Proposal: The OPT+ Operator

● Recall our example: the SPARQL pattern

  P = (a, p, ?x) OPT (?x, p, ?y)

  is not monotonic
  – G1 = { (a, p, b) }, G2 = { (a, p, b), (b, p, c) }
  – ⟦P⟧G1 = { μ }, where μ = { ?x → b }
  – ⟦P⟧G2 = { μ' }, where μ' = { ?x → b, ?y → c } ≠ μ !

● Semantics of OPT:

  ⟦P1 OPT P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ( ⟦P1⟧G \ ⟦P2⟧G )

● Semantics of OPT+:

  ⟦P1 OPT+ P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ⟦P1⟧G

➔ P1 OPT+ P2 ≡ (P1 AND P2) UNION P1

● With OPT+, the pattern of our example

  P' = (a, p, ?x) OPT+ (?x, p, ?y)

  is monotonic √
  – G1 = { (a, p, b) }, G2 = { (a, p, b), (b, p, c) }
  – ⟦P'⟧G1 = { μ }, where μ = { ?x → b }
  – ⟦P'⟧G2 = { μ, μ' }, where μ' = { ?x → b, ?y → c }
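A small sketch can confirm the effect of OPT+ on the running example. It evaluates (a, p, ?x) OPT+ (?x, p, ?y) over G1 and G2 with the definition given above, join united with the left-hand solutions, and checks that no solution is lost when the graph grows. The function names and the frozenset encoding of solution mappings are illustrative assumptions.

```python
def match(pattern, graph):
    """Solution mappings of one triple pattern; terms starting with '?' are variables."""
    out = set()
    for triple in graph:
        mu = {}
        if all(
            (mu.setdefault(p, t) == t) if p.startswith("?") else (p == t)
            for p, t in zip(pattern, triple)
        ):
            out.add(frozenset(mu.items()))
    return out


def compatible(m1, m2):
    """Mappings are compatible if they agree on all shared variables."""
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}


def opt_plus(o1, o2):
    """OPT+: (o1 join o2) union o1, i.e., (P1 AND P2) UNION P1."""
    return join(o1, o2) | o1


G1 = {("a", "p", "b")}
G2 = {("a", "p", "b"), ("b", "p", "c")}
P1, P2 = ("a", "p", "?x"), ("?x", "p", "?y")

plus_g1 = opt_plus(match(P1, G1), match(P2, G1))  # { μ }
plus_g2 = opt_plus(match(P1, G2), match(P2, G2))  # { μ, μ' }
grows_only = plus_g1 <= plus_g2                   # True: μ survives in the larger graph
```

Since every solution of P1 is always kept, a solution reported early never has to be retracted; the price is that μ stays in the result even after its extension μ' has been found.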

Epilogue

Conclusions

● Returning result elements early has not yet received sufficient attention in existing work on live querying the Web of Data

● Prioritizing data retrieval can reduce response times of traversal-based query executions

What approaches are suitable and effective?

Similar for federated query processing, LDFs?

● Language features have to be chosen with care

Their impact has to be studied

Dedicated optimization techniques are possible
