RDF Analytics... SPARQL and Beyond
![Page 1: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/1.jpg)
Digital Enterprise Research Institute www.deri.ie
Enabling networked knowledge
© Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
@fsheer
Fadi Maali
RDF Analytics… SPARQL and Beyond…
![Page 2: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/2.jpg)
Why analytics (1/2)
![Page 3: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/3.jpg)
Why analytics (2/2)
![Page 4: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/4.jpg)
Appetite Whetting (1/3)
Google accurately detects flu trends ahead of the U.S. Centers for Disease Control and Prevention (CDC).
http://www.google.org/flutrends/about/how.html
![Page 5: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/5.jpg)
Appetite Whetting (2/3)
http://www.dailymail.co.uk/sciencetech/article-2120416/Twitter-predicts-stock-prices-accurately-investment-tactic-say-scientists.html
![Page 6: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/6.jpg)
Appetite Whetting (3/3)
http://www.nature.com/srep/2011/111215/srep00196/full/srep00196.html
Flavor pyramids for North American and East Asian cuisines
![Page 7: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/7.jpg)
Data Science and RDF
- Can we do “data science” using RDF data?
  - Do we have the data?
  - Do we have the tools?
- Why should we use RDF?
![Page 8: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/8.jpg)
RDF Characteristics
- Graph data model
- Clearly defined semantics
- Supports Web-scale distributed publication
![Page 9: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/9.jpg)
Available RDF Data
- Freebase has 1.2 billion triples (Google)
- The LOD Cloud has more than 31 billion triples
- Embedded RDF data: schema.org, Drupal…
http://lod-cloud.net/
![Page 10: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/10.jpg)
Available RDF Tools
In this presentation we focus on the standard, SPARQL:
- W3C Recommendation
- Supports querying, transforming and updating RDF data
- Large number of available implementations
- Defines a communication protocol
- 427 public SPARQL endpoints registered on the DataHub*
* http://sw.deri.org/~aidanh/docs/epmonitorISWC.pdf
![Page 11: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/11.jpg)
RDF Data… a graph
![Page 12: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/12.jpg)
SPARQL… Simple queries
SELECT ?name
WHERE {
  ?p :name ?name .
}
ORDER BY ?name
![Page 13: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/13.jpg)
SPARQL… BI queries
SELECT ?gender (COUNT(*) AS ?count)
WHERE {
  ?p :gender ?gender
}
GROUP BY ?gender
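To make the aggregation concrete, here is a minimal Python sketch of what the GROUP BY query computes. The triples and the `count_by_object` helper are hypothetical illustrations and are not taken from the presentation's repository.

```python
from collections import Counter

# Hypothetical sample data: (subject, predicate, object) triples.
triples = [
    ("alice", "gender", "female"),
    ("bob", "gender", "male"),
    ("carol", "gender", "female"),
    ("alice", "name", "Alice"),
]


def count_by_object(triples, predicate):
    """Count how many triples with the given predicate share each object value."""
    return Counter(o for _, p, o in triples if p == predicate)


# Mirrors: SELECT ?gender (COUNT(*) AS ?count) ... GROUP BY ?gender
gender_counts = count_by_object(triples, "gender")
print(gender_counts)
```

Each distinct object of the chosen predicate becomes one group, which is the role `?gender` plays in the query.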
![Page 14: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/14.jpg)
SPARQL… BI queries
![Page 15: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/15.jpg)
SPARQL… BI queries
SELECT ?name (COUNT(?n) AS ?neighbours)
WHERE {
  ?p :knows ?n .
  ?p :name ?name .
}
GROUP BY ?p ?name
ORDER BY DESC(?neighbours)
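The neighbour count is an out-degree per person, sorted from highest to lowest. A small Python sketch of the same computation follows. It assumes one name per person, and both the sample triples and the `neighbour_counts` helper are hypothetical.

```python
from collections import defaultdict


def neighbour_counts(triples):
    """Return (name, out-degree) rows for people who have a name and at least one 'knows' edge."""
    names = {}
    counts = defaultdict(int)
    for s, p, o in triples:
        if p == "knows":
            counts[s] += 1
        elif p == "name":
            names[s] = o  # assumption: at most one name per person
    # Like the join in the query, people without a name or without 'knows' edges drop out.
    rows = [(names[s], n) for s, n in counts.items() if s in names]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical sample data.
sample = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "alice"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
]
print(neighbour_counts(sample))
```

Carol does not appear in the output because she has no outgoing `knows` edge, which matches the behaviour of the basic graph pattern in the query.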
![Page 16: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/16.jpg)
SPARQL… BI queries
![Page 17: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/17.jpg)
SPARQL… BI queries
- How influential a person is within a social network
- How central a road is within an urban network
- How central an employee is in an enterprise
![Page 18: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/18.jpg)
SPARQL… Graph measure
Can we use SPARQL to compute shortest paths in the graph?
Short answer: NO!
Long answer: Let’s try!
![Page 19: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/19.jpg)
SPARQL… graph measure
SELECT ?v1 ?v2 (MIN(?l) AS ?shortestPath)
WHERE {
  { ?v1 :knows ?v2 BIND (1 AS ?l) }                  # paths of length 1
  UNION
  { ?v1 :knows/:knows ?v2 BIND (2 AS ?l) }           # paths of length 2
  UNION
  { ?v1 :knows/:knows/:knows ?v2 BIND (3 AS ?l) }    # paths of length 3
  FILTER (?v1 != ?v2)
}
GROUP BY ?v1 ?v2
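The query only tries path lengths 1 to 3, so any pair that is further apart is missed. For comparison, here is a minimal breadth-first search sketch in Python that has no such bound. The edge list and the `shortest_path_lengths` helper are hypothetical, and edges are treated as directed, as `:knows` is in the query.

```python
from collections import deque


def shortest_path_lengths(edges, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    adjacency = {}
    for s, o in edges:
        adjacency.setdefault(s, []).append(o)

    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in dist:  # first visit is the shortest in an unweighted graph
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    del dist[source]  # same effect as FILTER (?v1 != ?v2)
    return dist


# Hypothetical chain a -> b -> c -> d -> e.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(shortest_path_lengths(edges, "a"))
```

In this sample, `e` is four hops from `a`, so the three-branch UNION would return no row for that pair while the search still finds it.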
![Page 20: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/20.jpg)
SPARQL… graph measure
![Page 21: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/21.jpg)
SPARQL… graph measure
- finding directions between physical locations
- finding the most direct way to contact a person
- finding the min-delay communication path
![Page 22: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/22.jpg)
SPARQL… clustering
Can we do clustering using SPARQL?
YES!
Peer-pressure algorithm implemented using (almost only) SPARQL*
* http://yarcdata.com/blog/?p=318
![Page 23: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/23.jpg)
SPARQL… clustering
DROP GRAPH <urn:ga/g/xjz1> ;
CREATE GRAPH <urn:ga/g/xjz1> ;
INSERT { GRAPH <urn:ga/g/xjz1> { ?s :cluster ?clus3 } }
WHERE {
  SELECT ?s (SAMPLE(?clus) AS ?clus3) {
    {
      SELECT ?s (MAX(?clusCt) AS ?maxClusCt) {
        SELECT ?s ?clus (COUNT(?clus) AS ?clusCt)
        WHERE {
          ?s :knows ?o .
          GRAPH <urn:ga/g/xjz0> { ?o :cluster ?clus }
        }
        GROUP BY ?s ?clus
      }
      GROUP BY ?s
    }
    {
      SELECT ?s ?clus (COUNT(?clus) AS ?clusCt)
      WHERE {
        ?s :knows ?o .
        GRAPH <urn:ga/g/xjz0> { ?o :cluster ?clus }
      }
      GROUP BY ?s ?clus
    }
    FILTER (?clusCt = ?maxClusCt)
  }
  GROUP BY ?s
}
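The update performs one round of peer pressure: each node adopts the cluster that is most common among the nodes it knows. A rough Python sketch of a single round is below. The data and the `peer_pressure_step` helper are hypothetical, and ties are broken with `min()` purely to keep the example deterministic, whereas `SAMPLE` in the query may return any of the tied clusters.

```python
from collections import Counter


def peer_pressure_step(knows, cluster):
    """One iteration: every node takes the most frequent cluster among its neighbours."""
    new_cluster = {}
    for s, neighbours in knows.items():
        votes = Counter(cluster[o] for o in neighbours if o in cluster)
        if not votes:
            continue  # no clustered neighbours, so no row is produced for this node
        top = max(votes.values())  # corresponds to MAX(?clusCt)
        # Assumption: min() stands in for SAMPLE so the result is reproducible.
        new_cluster[s] = min(c for c, n in votes.items() if n == top)
    return new_cluster


# Hypothetical input: adjacency lists and the previous round's assignment.
knows = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
previous = {"a": "a", "b": "x", "c": "x", "d": "y"}
print(peer_pressure_step(knows, previous))
```

Repeating the step, with the output of one round used as the input of the next, corresponds to re-running the update with the two graph names swapped until the assignment stops changing.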
![Page 24: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/24.jpg)
SPARQL… clustering
![Page 25: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/25.jpg)
SPARQL Expressivity
- BI-like operations (rollup and drilldown)
- Graph Measures
- Iterative algorithms (Clustering)
![Page 26: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/26.jpg)
SPARQL Scalability…
One approach is to use a scale-out architecture… think MapReduce or Hadoop
- Translate SPARQL into MapReduce
- Process RDF data directly in MapReduce
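As an illustration of the second option, the gender count can be phrased as a map step and a reduce step. The sketch below simulates both phases in a single Python process. It is not Hadoop code, and the function names and sample triples are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(triple):
    """Emit (gender, 1) for every gender triple and nothing for other triples."""
    _, predicate, obj = triple
    if predicate == "gender":
        yield (obj, 1)


def reduce_phase(key, values):
    """Sum the partial counts collected for one key."""
    return (key, sum(values))


def run_job(triples):
    pairs = [kv for t in triples for kv in map_phase(t)]
    pairs.sort(key=itemgetter(0))  # stands in for the shuffle/sort between phases
    return [
        reduce_phase(key, (value for _, value in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


# Hypothetical sample data.
sample_triples = [
    ("alice", "gender", "female"),
    ("bob", "gender", "male"),
    ("carol", "gender", "female"),
    ("alice", "knows", "bob"),
]
print(run_job(sample_triples))
```

In a real cluster the map calls run in parallel over partitions of the triples, and the framework does the grouping that the sort and `groupby` imitate here.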
![Page 27: RDF Analytics... SPARQL and Beyond](https://reader034.fdocuments.in/reader034/viewer/2022051109/548473f1b4af9f690d8b4bc8/html5/thumbnails/27.jpg)
Conclusion
- Can we do “data science” using RDF data?
  - Do we have the data? YES
  - Do we have the tools? Almost
    - Is SPARQL expressive enough? Almost
    - Does it scale? Yes… in principle, No in practice
    - Is it usable/easy? Not really
All examples used in this presentation, and Pig Latin equivalents of some of them, are available at: https://github.com/fadmaa/rdf-analytics