Post on 28-Jul-2015
Rethinking Online SPARQL Querying to Support
Incremental Result Visualization
Olaf Hartig
http://olafhartig.de
@olafhartig
Rethinking Online SPARQL Querying to Support Incremental Result Visualization - Olaf Hartig 2
Prologue
Live Querying the Web of Data
● Federated query processing
– i.e., querying a federation of SPARQL endpoints
● Linked Data query processing
– i.e., querying Linked Data by relying only on the Linked Data principles (interface: URI lookups)
– e.g., traversal-based query execution
● Querying other Linked Data fragment servers
– e.g., triple pattern fragments
Chapter 1
“Can the progress that has been made on (Read/Write) Linked Data change the way we interact with the Web […]?”
Information in Dynamic Web Pages
Support for such an incremental visualization has not received much attention in existing work on querying the Web of Data
“Can the progress that has been made on (Read/Write) Linked Data change the way we interact with the Web […]?”
“I think we have not made enough progress to even enable well-understood interaction techniques that are widely applied in ‘traditional’ Web applications”
Topics
Opportunities to Optimize the Response Times of Traversal-based Query Executions
Making the Core Fragment of SPARQL Suitable for the Task
Chapter 2
Implementation Approach
[Diagram: Data Retrieval Operator, Dispatcher, and multiple Triple Pattern Operators; one Triple Pattern Operator is shown for the triple pattern ( ?v1, knows, ?v2 )]
Data Retrieval Operator
[Diagram: Data Retrieval Operator (. . . GET http://example.org/...), Dispatcher, Triple Pattern Operators]
RDF triple: ( Bob, knows, Alice )
Triple pattern: ( ?v1, knows, ?v2 )
Triple Pattern Operator
[Diagram: Dispatcher and Triple Pattern Operator]
Triple pattern: ( ?v1, knows, ?v2 )
RDF triple: ( Bob, knows, Alice )
Intermediate Solution
  Timestamp: 1
  Bindings: ?v1 → Bob, ?v2 → Alice
  Flags: [ ∙ | √ | ∙ | ∙ ]
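A minimal Python sketch of how a triple pattern operator could derive the bindings of an intermediate solution from a matching RDF triple. The function names (is_var, match_triple) and the tuple/dict representation are assumptions made for illustration; they are not taken from the slides.

```python
from typing import Dict, Optional, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """Variables are written with a leading '?', as on the slides."""
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple) -> Optional[Dict[str, str]]:
    """Return the bindings under which `triple` matches `pattern`, or None."""
    bindings: Dict[str, str] = {}
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable that occurs twice must be bound to the same term.
            if bindings.get(p_term, t_term) != t_term:
                return None
            bindings[p_term] = t_term
        elif p_term != t_term:
            # A constant in the pattern must equal the term in the triple.
            return None
    return bindings


print(match_triple(("?v1", "knows", "?v2"), ("Bob", "knows", "Alice")))
# prints {'?v1': 'Bob', '?v2': 'Alice'}
```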
[Diagram: Dispatcher and Output]
Intermediate Solution
  Timestamp: 1
  Bindings: ?v1 → Alice, ?v2 → Bob
  Flags: [ ∙ | √ | ∙ | ∙ ]
Triple Pattern Operator cont'd
[Diagram: Triple Pattern Operator and Output]
Triple Pattern Operator cont'd
[Diagram: Triple Pattern Operator and Output]
Intermediate Solution
  Timestamp: 461
  Bindings: ?v1 → Bob, ?v2 → Steve
  Flags: [ ∙ | √ | ∙ | ∙ ]
Intermediate Solution
  Timestamp: 327
  Bindings: ?v1 → Bob, ?v3 → Berlin
  Flags: [ √ | ∙ | ∙ | ∙ ]
Intermediate Solution
  Timestamp: 461
  Bindings: ?v1 → Bob, ?v2 → Steve, ?v3 → Berlin
  Flags: [ √ | √ | ∙ | ∙ ]
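The three intermediate solutions above suggest how two compatible solutions are combined: the bindings are merged, the larger timestamp is kept, and the flags are combined position by position. The Python sketch below encodes exactly that reading. The merge rules are inferred from this single example, and the names IntermediateSolution and merge are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class IntermediateSolution:
    timestamp: int
    bindings: Dict[str, str]
    flags: Tuple[bool, ...]  # assumed: one flag per triple pattern of the query


def merge(a: IntermediateSolution, b: IntermediateSolution) -> Optional[IntermediateSolution]:
    """Combine two intermediate solutions, or return None if they disagree."""
    # Two solutions are compatible if they agree on every shared variable.
    for var in a.bindings.keys() & b.bindings.keys():
        if a.bindings[var] != b.bindings[var]:
            return None
    return IntermediateSolution(
        timestamp=max(a.timestamp, b.timestamp),  # assumption: keep the later one
        bindings={**a.bindings, **b.bindings},
        flags=tuple(x or y for x, y in zip(a.flags, b.flags)),
    )


a = IntermediateSolution(461, {"?v1": "Bob", "?v2": "Steve"}, (False, True, False, False))
b = IntermediateSolution(327, {"?v1": "Bob", "?v3": "Berlin"}, (True, False, False, False))
merged = merge(a, b)
print(merged)
```

With the two inputs from the slide, merged carries timestamp 461, all three bindings, and the flags [ √ | √ | ∙ | ∙ ].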
Properties
[Diagram: Data Retrieval, Dispatcher, TP Operators, Output]
● Supports:
– any reachability-based query semantics
● Highly flexible
– routing of intermediate solutions
● Inspired by “Eddies”
– Avnur & Hellerstein, SIGMOD 2000
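The flexible routing of intermediate solutions can be pictured with a small Python sketch in the spirit of Eddies. It assumes that each flag records whether the corresponding triple pattern operator has already contributed to the solution, and that a pluggable policy picks the next operator among the remaining ones. Both the interpretation of the flags and all names below (route, first_pending, random_pending) are assumptions, not the implementation described in the talk.

```python
import random
from typing import Callable, List, Optional, Sequence

# A routing policy picks one operator index out of the candidates.
RoutingPolicy = Callable[[List[int]], int]


def pending_operators(flags: Sequence[bool]) -> List[int]:
    """Indexes of the operators that have not yet contributed to a solution."""
    return [i for i, done in enumerate(flags) if not done]


def first_pending(candidates: List[int]) -> int:
    """Deterministic policy: always route to the lowest-numbered operator."""
    return candidates[0]


def random_pending(candidates: List[int]) -> int:
    """Randomized policy: route to any pending operator."""
    return random.choice(candidates)


def route(flags: Sequence[bool], policy: RoutingPolicy) -> Optional[int]:
    """Return the operator to send a solution to, or None if it is complete."""
    candidates = pending_operators(flags)
    if not candidates:
        return None  # every flag is set: the solution can go to the output
    return policy(candidates)


print(route((True, True, False, False), first_pending))  # prints 2
print(route((True, True, True, True), first_pending))    # prints None
```

Swapping the policy function changes the routing behavior without touching the dispatcher logic, which is the kind of flexibility a routing-policy comparison needs.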
Hypothesis 1
Response times can be reduced by applying a suitable routing policy.
Test of Different Routing Policies
Setup:
● Data retrieval operator simply appends to its lookup queue
● Web simulation environment (test Web: W-62-47, test query: Q1, details: [Hartig and Özsu 2014])
● Each bar represents geometric mean of 5 separate executions
[Charts: response time for first reported solution, relative to overall QET; response time for last reported solution, relative to overall QET]
Routing policy has no impact!
Hypothesis 1
Response times can be reduced by applying a suitable routing policy.
No!
Why?
Data Retrieval Dominates!!!
[Chart: avg. query exec. time (seconds), log scale from 0.1 to 100000, for Query 1, Query 4, Query 5, Query 9, and Query 10; series: 10 threads, 20 threads, cache]
● 5 queries of the FedBench benchmark suite, executed over real Linked Data on the WWW
● 10 threads, 20 threads: different number of lookup threads used by the data retrieval operator
● cache: data retrieval operator equipped with a cache
– Cache populated by a first execution
– Times measured for a 2nd, cache-only execution (i.e., data retrieval deactivated)
Hypothesis 2
Response times can be reduced by choosing a “good” strategy of prioritizing URI lookups.
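A minimal Python sketch of a lookup queue in which the order of URI lookups is controlled by a priority function, as an alternative to simply appending to the queue. The class name LookupQueue and the priority functions are hypothetical; the slides do not say which strategy is “good”, so the strategy is left as a parameter.

```python
import heapq
import itertools
import random
from typing import Callable, List, Optional, Set, Tuple


class LookupQueue:
    """Queue of URIs to look up; smaller priority values are served first."""

    def __init__(self, priority: Callable[[str], float]) -> None:
        self._priority = priority
        self._heap: List[Tuple[float, int, str]] = []
        self._seen: Set[str] = set()
        self._counter = itertools.count()  # tie-breaker: FIFO among equal priorities

    def add(self, uri: str) -> None:
        """Schedule a URI for lookup unless it was scheduled before."""
        if uri in self._seen:
            return
        self._seen.add(uri)
        heapq.heappush(self._heap, (self._priority(uri), next(self._counter), uri))

    def pop(self) -> Optional[str]:
        """Return the next URI to look up, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


# A constant priority degenerates to plain FIFO (append-only) behavior.
fifo = LookupQueue(lambda uri: 0.0)
# A random priority corresponds to prioritizing lookups randomly.
randomized = LookupQueue(lambda uri: random.random())

for uri in ("http://example.org/a", "http://example.org/b", "http://example.org/a"):
    fifo.add(uri)
print(fifo.pop())  # prints http://example.org/a
```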
Prioritizing Lookups Randomly
[Chart: x-axis: result elements; y-axis: time from begin of the query execution (in minutes); series: QET, exec1, exec2, exec3, exec4, exec5; annotations: ca. 25% of QET, ca. 58%]
Setup:
● LD10 of the FedBench benchmark suite, over real Linked Data on the WWW
Hypothesis 2
Response times can be reduced by choosing a “good” strategy of prioritizing URI lookups. √
Question
Response times can be reduced by choosing a “good” strategy of prioritizing URI lookups. √
What is a “good” strategy?
Chapter 3
Topics
Opportunities to Optimize the Response Times of Traversal-based Query Executions √
Making the Core Fragment of SPARQL Suitable for the Task
(by making it monotonic)
Monotonicity?
● Query Q is monotonic if for every pair ( D1, D2 ) of possible databases, it holds that:
  D1 ⊆ D2 ⟹ Q( D1 ) ⊆ Q( D2 )
● Example: the SPARQL pattern
  P = (a, p, ?x) OPT (?x, p, ?y)
  is not monotonic
– G1 = { (a, p, b) }
– G2 = { (a, p, b), (b, p, c) }
– ⟦P⟧G1 = { μ }, where μ = { ?x → b }
– ⟦P⟧G2 = { μ' }, where μ' = { ?x → b, ?y → c } ≠ μ !
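The example can be checked mechanically. The Python sketch below evaluates (a, p, ?x) OPT (?x, p, ?y) over G1 and G2 using the usual set-based semantics of OPT (join, plus the left-hand mappings that are compatible with no right-hand mapping). The helper names and the frozenset representation of solution mappings are assumptions made for illustration.

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
Mapping = FrozenSet[Tuple[str, str]]  # solution mapping as (variable, term) pairs


def eval_tp(tp: Triple, graph: Set[Triple]) -> Set[Mapping]:
    """Evaluate a single triple pattern over a set of triples."""
    results: Set[Mapping] = set()
    for triple in graph:
        mu = {}
        for p_term, t_term in zip(tp, triple):
            if p_term.startswith("?"):
                if mu.setdefault(p_term, t_term) != t_term:
                    break  # same variable bound to two different terms
            elif p_term != t_term:
                break  # constant does not match
        else:
            results.add(frozenset(mu.items()))
    return results


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Two mappings are compatible if they agree on all shared variables."""
    d2 = dict(m2)
    return all(d2[var] == term for var, term in m1 if var in d2)


def join(o1: Set[Mapping], o2: Set[Mapping]) -> Set[Mapping]:
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}


def diff(o1: Set[Mapping], o2: Set[Mapping]) -> Set[Mapping]:
    return {m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)}


def opt(o1: Set[Mapping], o2: Set[Mapping]) -> Set[Mapping]:
    return join(o1, o2) | diff(o1, o2)


G1 = {("a", "p", "b")}
G2 = {("a", "p", "b"), ("b", "p", "c")}
tp1, tp2 = ("a", "p", "?x"), ("?x", "p", "?y")

r1 = opt(eval_tp(tp1, G1), eval_tp(tp2, G1))
r2 = opt(eval_tp(tp1, G2), eval_tp(tp2, G2))
print(G1 <= G2, r1 <= r2)  # prints True False
```

Although G1 is a subset of G2, the result over G1 is not a subset of the result over G2: μ is lost once (b, p, c) becomes known.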
What is the Issue?
● For any non-monotonic query, elements of the result set can be output only after we have seen all query-relevant parts of the DB
– Hence, since we discover our DB (the Web of Data) at runtime, we can output result elements only after completing the discovery process
● Good news: the AND-UNION-FILTER fragment of SPARQL is monotonic [Arenas and Perez 2011]
● Bad news: for the AND-UNION-FILTER-OPT fragment, monotonicity is undecidable [Hartig 2014]
– i.e., queries with OPT may be non-monotonic
What is the Usage of OPT?
● DBpedia
– 46.4% of ca. 1.3M unique queries (logs from Apr. – Jul. 2010); Picalausa and Vansummeren, in SWIM 2011
– 16.6% (logs from USEWOD 2011 dataset); Gallego et al., in USEWOD 2011
– 15% (logs from USEWOD 2011 dataset); Elbedweihy et al., in COLD 2011
● Semantic Web conference corpus (SWDF)
– 0.4% (logs from USEWOD 2011 dataset); Gallego et al., in USEWOD 2011
A Proposal: The OPT+ Operator
● Query Q is monotonic if for every pair ( D1, D2 ) of possible databases, it holds that:
  D1 ⊆ D2 ⟹ Q( D1 ) ⊆ Q( D2 )
● Recall our example: the SPARQL pattern
  P = (a, p, ?x) OPT (?x, p, ?y)
  is not monotonic
– G1 = { (a, p, b) }, G2 = { (a, p, b), (b, p, c) }
– ⟦P⟧G1 = { μ }, where μ = { ?x → b }
– ⟦P⟧G2 = { μ' }, where μ' = { ?x → b, ?y → c } ≠ μ !
● ⟦P1 OPT P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ( ⟦P1⟧G \ ⟦P2⟧G )
● ⟦P1 OPT+ P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ⟦P1⟧G
➔ P1 OPT+ P2 ≡ (P1 AND P2) UNION P1
A Proposal: The OPT+ Operator
● Query Q is monotonic if for every pair ( D1, D2 ) of possible databases, it holds that:
  D1 ⊆ D2 ⟹ Q( D1 ) ⊆ Q( D2 )
● Recall our example: the SPARQL pattern
  P' = (a, p, ?x) OPT+ (?x, p, ?y)
  is monotonic √
– G1 = { (a, p, b) }, G2 = { (a, p, b), (b, p, c) }
– ⟦P'⟧G1 = { μ }, where μ = { ?x → b }
– ⟦P'⟧G2 = { μ, μ' }, where μ' = { ?x → b, ?y → c } ≠ μ !
● ⟦P1 OPT P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ( ⟦P1⟧G \ ⟦P2⟧G )
● ⟦P1 OPT+ P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ⟦P1⟧G
➔ P1 OPT+ P2 ≡ (P1 AND P2) UNION P1
A Proposal: The OPT+ Operator
● ⟦P1 OPT P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ( ⟦P1⟧G \ ⟦P2⟧G )
● ⟦P1 OPT+ P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ⟦P1⟧G
➔ P1 OPT+ P2 ≡ (P1 AND P2) UNION P1
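A minimal Python sketch of the proposed OPT+ semantics, ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ⟦P1⟧G, applied to the running example with G1 = { (a, p, b) } and G2 = { (a, p, b), (b, p, c) }. The helper names and the frozenset representation of solution mappings are assumptions made for illustration.

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
Mapping = FrozenSet[Tuple[str, str]]  # solution mapping as (variable, term) pairs


def eval_tp(tp: Triple, graph: Set[Triple]) -> Set[Mapping]:
    """Evaluate a single triple pattern over a set of triples."""
    results: Set[Mapping] = set()
    for triple in graph:
        mu = {}
        for p_term, t_term in zip(tp, triple):
            if p_term.startswith("?"):
                if mu.setdefault(p_term, t_term) != t_term:
                    break  # same variable bound to two different terms
            elif p_term != t_term:
                break  # constant does not match
        else:
            results.add(frozenset(mu.items()))
    return results


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Two mappings are compatible if they agree on all shared variables."""
    d2 = dict(m2)
    return all(d2[var] == term for var, term in m1 if var in d2)


def join(o1: Set[Mapping], o2: Set[Mapping]) -> Set[Mapping]:
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}


def opt_plus(o1: Set[Mapping], o2: Set[Mapping]) -> Set[Mapping]:
    """OPT+: the join, plus every left-hand mapping (no set difference)."""
    return join(o1, o2) | o1


G1 = {("a", "p", "b")}
G2 = {("a", "p", "b"), ("b", "p", "c")}
tp1, tp2 = ("a", "p", "?x"), ("?x", "p", "?y")

r1 = opt_plus(eval_tp(tp1, G1), eval_tp(tp2, G1))
r2 = opt_plus(eval_tp(tp1, G2), eval_tp(tp2, G2))
print(r1 <= r2)  # prints True
```

Because nothing is ever subtracted, μ = { ?x → b } can be reported as soon as (a, p, b) is seen, and μ' can follow once (b, p, c) has been discovered.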
Epilogue
Conclusions
● Returning result elements early has not yet received sufficient attention in existing work on live querying the Web of Data
● Prioritizing data retrieval can reduce response times of traversal-based query executions
– What approaches are suitable and effective?
– Similar for federated query processing, LDFs?
● Language features have to be chosen with care
– Their impact has to be studied
– Dedicated optimization techniques are possible