balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information
![Page 1: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/1.jpg)
DESWeb 2014
ICDE 2014, Chicago IL, USA, March 3
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information
Kai Schlegel, Florian Stegmaier, Sebastian Bayerl, Michael Granitzer, Harald Kosch
![Page 2: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/2.jpg)
Outline
• Motivation
• SPARQL Rewriting & Federation
• Intermediate Results
Supported by the European Commission under the Seventh Framework Program
![Page 3: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/3.jpg)
“Linked Data is the heart of Semantic Web” - W3C Semantic Web Group
![Page 4: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/4.jpg)
Huge Potential!
![Page 5: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/5.jpg)
Developing with Linked Open Data
![Page 6: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/6.jpg)
Motivation
• Easy access to Linked Data
  • Query Linked Open Data with SPARQL
  • Plethora of tools available
• Problems:
  • Business oriented
  • Complex setup
  • Maintenance
  • “Paper-only”
  • Not developer friendly
• Simple and “instant” SPARQL Query Federation (-as-a-Service)
Nothing-as-a-Service
![Page 7: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/7.jpg)
Querying LOD
• How to get information about the German city “Passau”?
• Problem: LOD is not a single database!
SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
[Figure: the SPARQL query is sent to de.dbpedia.org, which returns RDF: relations, coordinates, leader, etc.]
What about the population?
![Page 8: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/8.jpg)
Distributed Querying!
• Problem: Selection of appropriate endpoints
• Send query to some endpoints and aggregate the results?
SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
[Figure: the SPARQL query is sent to de.dbpedia.org and to linkedgeodata.org; linkedgeodata.org responds "WHAT?"]
![Page 9: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/9.jpg)
Misunderstanding: Co-Referencing
• Problem: Different identifiers for the same semantic concept
SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
[Figure: the SPARQL query is sent to de.dbpedia.org and to linkedgeodata.org; linkedgeodata.org responds "WHAT?"]
Known problem in linguistics:
“It’s a spud!” “What?” “I mean potato!”
Co-Referencing: Multiple expressions refer to the same thing.
![Page 10: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/10.jpg)
Problem = Solution?
Basic idea: Focusing distributed co-reference information
• SPARQL-based crawling of co-reference information
Exploit co-reference information for
• accomplishing immediate SPARQL rewriting
• performing endpoint selection
• executing automatic query federation
Main principle: Semantic entities over identifiers!
![Page 11: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/11.jpg)
Components
balloon toolsuite
![Page 12: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/12.jpg)
balloon Overflight
• SPARQL-based crawling of LOD endpoints
  • Query: Ask for subjects and objects which are related by a special predicate
• Simplified global view on
  • Equivalence: owl:sameAs, skos:exactMatch, coref:coreferenceData, ...
• Graph database Neo4j
  • Equivalence cluster: Multiple synonym URIs representing the same semantic entity, including provenance
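The crawl-and-cluster idea on this slide can be illustrated with a small Python sketch. It is not the balloon Overflight implementation: the slide stores clusters in Neo4j, whereas this sketch swaps in an in-memory union-find structure. The names `crawl_query`, `EquivalenceClusters`, and `add_statement` are hypothetical, and only the two equivalence predicates whose IRIs are standardized are listed.

```python
from collections import defaultdict

# Assumption: a subset of the predicates named on the slide (owl:sameAs, skos:exactMatch).
EQUIVALENCE_PREDICATES = [
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://www.w3.org/2004/02/skos/core#exactMatch",
]


def crawl_query(predicate):
    """Build a query asking for all subject/object pairs linked by `predicate`."""
    return f"SELECT ?s ?o WHERE {{ ?s <{predicate}> ?o . }}"


class EquivalenceClusters:
    """Union-find over URIs; each cluster remembers which endpoints contributed."""

    def __init__(self):
        self._parent = {}
        self._provenance = defaultdict(set)

    def _find(self, uri):
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the walked path at the root.
        while self._parent[uri] != root:
            self._parent[uri], uri = root, self._parent[uri]
        return root

    def add_statement(self, subject, obj, endpoint):
        """Record one crawled co-reference statement and where it came from."""
        a, b = self._find(subject), self._find(obj)
        if a != b:
            self._parent[b] = a
            self._provenance[a] |= self._provenance.pop(b, set())
        self._provenance[a].add(endpoint)

    def synonyms(self, uri):
        """All URIs in the same equivalence cluster as `uri` (itself included)."""
        root = self._find(uri)
        return {u for u in list(self._parent) if self._find(u) == root}

    def provenance(self, uri):
        """Endpoints whose statements were involved in composing the cluster."""
        return set(self._provenance[self._find(uri)])
```

Feeding every `(?s, ?o)` row returned by `crawl_query` into `add_statement`, together with the endpoint it was crawled from, yields clusters that carry both synonym URIs and provenance.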
![Page 13: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/13.jpg)
balloon Fusion
SPARQL Federation setup using co-reference information
SPARQL Transformation for each BGP (basic graph pattern)
1. Determine synonym URIs
2. Select suitable endpoints
3. Adapt sub-queries to endpoints
4. Federated querying
SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
![Page 14: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/14.jpg)
1. Determine synonym URIs
SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
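A minimal Python sketch of this lookup step, under stated assumptions: `CLUSTER_INDEX` is a hypothetical in-memory stand-in for the crawled equivalence clusters, filled with the three Passau identifiers that appear later in the deck, and IRIs are pulled out of the query with a simple regular expression instead of a real SPARQL parser.

```python
import re

# Naive IRI matcher; a real rewriter would parse the query instead.
IRI_PATTERN = re.compile(r"<([^<>\s]+)>")

# Hypothetical cluster index: every URI maps to its full equivalence cluster.
_PASSAU = frozenset({
    "http://de.dbpedia.org/resource/Passau",
    "http://linkedgeodata.org/triplify/node240057351",
    "http://rdf.freebase.com/ns/m.01h5td",
})
CLUSTER_INDEX = {uri: _PASSAU for uri in _PASSAU}


def synonym_uris(query, index):
    """Map each IRI found in the query to its sorted synonyms (itself included)."""
    return {
        iri: sorted(index.get(iri, {iri}))
        for iri in IRI_PATTERN.findall(query)
    }
```

An IRI without a known cluster maps to itself, so the query stays unchanged for entities the crawl has never seen.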
![Page 15: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/15.jpg)
2. Select suitable endpoints
• Provenance-based selection (PBS)
  • Endpoints which are involved in cluster composition
• Namespace-based selection (NBS)
  • Prefix and namespace matching of synonym URIs
Summarized: origin of co-reference information and origin of synonym URIs
![Page 16: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/16.jpg)
2. Select suitable endpoints (2)
Assumption:
• Provenance information only contains “linkedgeodata.org” as co-reference origin
• Namespaces for Freebase and DBpedia available (datahub.io)
PBS: Linked-Geo-Data endpoint
NBS: DBpedia endpoint
NBS: Freebase endpoint
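The two selection strategies can be sketched in Python as follows. This is an illustration under assumptions, not the balloon Fusion code: the registries `PROVENANCE_ENDPOINTS` and `NAMESPACE_ENDPOINTS` are hypothetical lookup tables, and the namespace prefixes are guessed from the example URIs; the endpoint URLs are the ones used in the federated query later in the deck.

```python
# Hypothetical registry: co-reference origin -> SPARQL endpoint (for PBS).
PROVENANCE_ENDPOINTS = {
    "linkedgeodata.org": "http://linkedgeodata.org/sparql/",
}

# Hypothetical registry: URI namespace -> SPARQL endpoint (for NBS).
NAMESPACE_ENDPOINTS = {
    "http://de.dbpedia.org/resource/": "http://dbpedia.org/sparql",
    "http://rdf.freebase.com/ns/": "http://www.freebase.com/base/sparql",
}


def select_endpoints(synonyms, provenance):
    """Return (PBS endpoints, NBS endpoints with the synonym URIs they match)."""
    # PBS: endpoints that contributed co-reference statements to the cluster.
    pbs = {PROVENANCE_ENDPOINTS[o] for o in provenance if o in PROVENANCE_ENDPOINTS}
    # NBS: endpoints whose namespace is a prefix of one of the synonym URIs.
    nbs = {}
    for uri in synonyms:
        for namespace, endpoint in NAMESPACE_ENDPOINTS.items():
            if uri.startswith(namespace):
                nbs.setdefault(endpoint, []).append(uri)
    return pbs, nbs
```

With the assumption from the slide (provenance contains only linkedgeodata.org), this selects the Linked-Geo-Data endpoint via PBS and the DBpedia and Freebase endpoints via NBS.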
![Page 17: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/17.jpg)
3. Adapt sub-queries to endpoints
Original query:
SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
PBS: Linked-Geo-Data endpoint
SELECT ?p ?o WHERE { { <http://rdf.freebase.com/ns/m.01h5td> ?p ?o. } UNION { <http://linkedgeodata.org/triplify/node240057351> ?p ?o. } UNION { <http://de.dbpedia.org/resource/Passau> ?p ?o. } }
NBS: DBpedia endpoint
SELECT ?p ?o WHERE { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
NBS: Freebase endpoint
SELECT ?p ?o WHERE { <http://rdf.freebase.com/ns/m.01h5td> ?p ?o. }
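The adaptation shown on this slide can be sketched in Python. The function names `union_of` and `adapt_subqueries` are hypothetical, the rewriting is plain string substitution on a single triple pattern (a real rewriter would work on the parsed query), and the `?p ?o` projection is hard-coded to match the running example.

```python
def union_of(pattern, original_uri, uris):
    """Instantiate `pattern` once per URI; join several instances with UNION."""
    variants = [pattern.replace(f"<{original_uri}>", f"<{u}>") for u in uris]
    if len(variants) == 1:
        return variants[0]
    return " UNION ".join("{ " + v + " }" for v in variants)


def adapt_subqueries(pattern, original_uri, synonyms, pbs_endpoints, nbs_endpoints):
    """Return {endpoint: sub-query}.

    NBS endpoints only receive the URIs from their own namespace; PBS endpoints
    receive the UNION over every synonym, since they may know any of them.
    An endpoint selected by both strategies gets the PBS form.
    """
    bodies = {
        endpoint: union_of(pattern, original_uri, uris)
        for endpoint, uris in nbs_endpoints.items()
    }
    for endpoint in pbs_endpoints:
        bodies[endpoint] = union_of(pattern, original_uri, synonyms)
    return {ep: f"SELECT ?p ?o WHERE {{ {body} }}" for ep, body in bodies.items()}
```

Called with the Passau pattern, the three synonym URIs, the Linked-Geo-Data endpoint as PBS, and DBpedia and Freebase as NBS, this produces three sub-queries of the same shape as the ones on the slide.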
![Page 18: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/18.jpg)
4. Federated Querying
• W3C SPARQL 1.1 Federated Query Extension (SERVICE)
  • (Partial) query can be executed against a remote SPARQL endpoint
• Distributed sub-queries don't contain SPARQL 1.1 features
SELECT ?p ?o WHERE {
  { SERVICE <http://dbpedia.org/sparql> { <http://de.dbpedia.org/resource/Passau> ?p ?o. } }
  UNION
  { SERVICE <http://www.freebase.com/base/sparql> { <http://rdf.freebase.com/ns/m.01h5td> ?p ?o. } }
  UNION
  { SERVICE <http://linkedgeodata.org/sparql/> {
      { <http://rdf.freebase.com/ns/m.01h5td> ?p ?o. }
      UNION { <http://linkedgeodata.org/triplify/node240057351> ?p ?o. }
      UNION { <http://de.dbpedia.org/resource/Passau> ?p ?o. }
  } }
}
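As a rough Python sketch of how such a federated query could be assembled: each per-endpoint group pattern is wrapped in a `SERVICE` clause, and each clause is wrapped in its own group so that it is a valid `UNION` operand. The function name `federated_query` and its parameters are hypothetical.

```python
def federated_query(group_patterns, projection="?p ?o"):
    """Combine {endpoint: group pattern body} into one SERVICE/UNION query.

    Every SERVICE clause is enclosed in braces because UNION joins group
    graph patterns, not bare SERVICE clauses.
    """
    branches = [
        f"{{ SERVICE <{endpoint}> {{ {body} }} }}"
        for endpoint, body in group_patterns.items()
    ]
    return f"SELECT {projection} WHERE {{ " + " UNION ".join(branches) + " }"
```

The bodies passed in are plain triple patterns or `UNION` groups without SPARQL 1.1 features, so only the federating engine itself has to understand `SERVICE`.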
![Page 19: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/19.jpg)
Optimizations (ongoing)
• Endpoint status check
  • Check routine in terms of availability and latency
• Minimize sub-queries
  • Group sub-queries with common endpoint
  • Push join to endpoint
• SPARQL features
  • Condense PBS UNION-construct of synonym URIs
  • SPARQL 1.1 VALUES or FILTER with IN operator
  • Not well implemented in Linked Data endpoints
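The condensed forms mentioned under "SPARQL features" could look like the output of the following Python sketch. Both helpers are hypothetical and hard-code the `?p ?o` projection of the running example; the variable name `?s` is an assumption introduced for illustration.

```python
def with_values(uris, var="s"):
    """Replace a UNION over synonym URIs with a SPARQL 1.1 VALUES block."""
    terms = " ".join(f"<{u}>" for u in uris)
    return f"SELECT ?p ?o WHERE {{ VALUES ?{var} {{ {terms} }} ?{var} ?p ?o . }}"


def with_filter_in(uris, var="s"):
    """Same idea, expressed as a FILTER with the IN operator."""
    terms = ", ".join(f"<{u}>" for u in uris)
    return f"SELECT ?p ?o WHERE {{ ?{var} ?p ?o . FILTER (?{var} IN ({terms})) }}"
```

Either form keeps the sub-query length linear in the number of synonyms without repeating the triple pattern, but, as the slide notes, it depends on SPARQL 1.1 support at the remote endpoint.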
![Page 20: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/20.jpg)
balloon Overflight Results
![Page 21: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/21.jpg)
Results from a sounding balloon
![Page 22: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/22.jpg)
balloon toolsuite
![Page 23: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/23.jpg)
Statistics
• Datahub.io: Linked Open Data Cloud catalog
  • 337 datasets in total
  • 237 expose a SPARQL endpoint
  • 112 successfully queried for co-reference information
• Balloon dataset (first run)
  • 17.6M co-reference statements
  • 22.4M distinct URLs
  • 8.4M equivalence clusters (~2.68 identifiers per cluster)
• Pending analysis
  • Distribution of cluster sizes, number of different hosts per cluster
  • Main representative per cluster & false friends
![Page 24: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/24.jpg)
Open Source:
• Demo, information and sources available (MIT License)
• X as a Service
  • SPARQL Rewriting (HTTP API)
  • Query Federation (SPARQL)
http://schlegel.github.io/balloon
![Page 25: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/25.jpg)
Summary:
• SPARQL-based crawling of distributed co-reference information
• Exploit co-reference information for SPARQL federation
Single Point of Access
![Page 26: balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information](https://reader035.fdocuments.in/reader035/viewer/2022070316/555d4ddcd8b42a9d3b8b4750/html5/thumbnails/26.jpg)
Any questions?
“Research is formalized curiosity. It is poking and prying with a purpose.” - Zora Neale Hurston