Semantic Web - TKK Semantic We…
How to use it in network application development
Semantic Web
A sneak peek at the Semantic Web: RDF and SPARQL
SPARQL query
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE { ?x foaf:name ?name .
        ?x foaf:mbox ?mbox }
RDF data
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix people: <http://example.com/People/> .
people:John foaf:name "Johnny Lee Outlaw" .
people:John foaf:mbox <mailto:[email protected]> .
people:Peter foaf:name "Peter Goodguy" .
people:Peter foaf:mbox <mailto:[email protected]> .
people:Carol foaf:mbox <mailto:[email protected]> .
Query result
name                  mbox
"Johnny Lee Outlaw"   <mailto:[email protected]>
"Peter Goodguy"       <mailto:[email protected]>
Content
• Semantic Web – A simple idea largely unrealized
• Processing structured data in the web
• Semantic Web stack
• Linked Data
• W3C Data Activity
• RDF Streams
• Continuous SPARQL queries
Semantic Web – A simple idea largely unrealized
10.3.2014 Laitoksen nimi
A simple idea largely unrealized
• Tim Berners-Lee defines the Semantic Web as "a web of data that can be processed directly and indirectly by machines." (Wikipedia)
• By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web, dominated by unstructured and semi-structured documents, into a "web of data".
• Berners-Lee described an expected evolution of the existing Web to a Semantic Web,[5] but this has yet to happen.
• In 2006, Berners-Lee and colleagues stated that: "This simple idea ... remains largely unrealized."[6]
So is the whole Semantic Web thing a huge failure?
Processing Structured Data in the Web
Structured data in the web – an example
Structured data example – HTML browser
[Diagram: the web server queries the DB with SQL and receives rows; the HTML browser fetches pages from the web server with HTTP GET.]
Structured data processing by HTML scraping
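[Diagram: the web server queries the DB with SQL and receives rows; an HTML scraper fetches the pages with HTTP GET, applies XPath rules, and hands the extracted data (JSON, CSV, XML …) to the application.]
Problems:
- Tight coupling
- HTML output formats change
- The changes are not documented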
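To make the tight-coupling problem concrete, here is a minimal Python sketch of an XPath-based scraper using the limited XPath subset of the standard library's `xml.etree.ElementTree`. The page markup, the `id="people"` table and the `class` names are hypothetical; real HTML is often not well-formed XML and would need a more forgiving HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML page served by the web server (well-formed for ElementTree).
PAGE_V1 = """
<html><body>
  <table id="people">
    <tr><td class="name">Johnny Lee Outlaw</td><td class="email">johnny@example.org</td></tr>
    <tr><td class="name">Peter Goodguy</td><td class="email">peter@example.org</td></tr>
  </table>
</body></html>
"""

# The same page after an undocumented markup change on the server side.
PAGE_V2 = PAGE_V1.replace('class="email"', 'class="mail"')


def scrape(page):
    """Extract (name, email) rows using hard-coded XPath rules."""
    root = ET.fromstring(page.strip())
    rows = []
    # These paths encode our assumptions about the page layout.
    for tr in root.findall(".//table[@id='people']/tr"):
        name = tr.find("td[@class='name']")
        email = tr.find("td[@class='email']")
        if name is None or email is None:
            raise ValueError("page layout changed: expected td.name and td.email")
        rows.append((name.text, email.text))
    return rows
```

Renaming a single `class` attribute on the server is enough to break `scrape(PAGE_V2)`, and nothing tells the application that the layout changed.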
Side note: Scrapy
Structured data processing with a REST API
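[Diagram: the web server queries the DB with SQL, receives rows and exposes them through a REST API; the application calls the API with HTTP GET and receives JSON, CSV, XML …]
Problems:
- Tight coupling
- The API calls are hard-coded in our application
- APIs change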
Structured data – multiple data sources
The problems that we had with scraping or APIs multiply:
• All the HTML formats or the APIs are hard-coded in our application
• Every change means editing our program
Combining the data from heterogeneous sources:
• Each source has its own data format and its own naming conventions
[Diagram: three DB and web server pairs, all accessed by a single application.]
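A minimal Python sketch of what combining heterogeneous sources looks like without shared naming. The two sources and their field names (`fullName`, `person_name`, `mail_address`) are hypothetical; the point is that each source needs its own hard-coded adapter.

```python
# Hypothetical records from two sources with their own naming conventions.
SOURCE_A = [{"fullName": "Johnny Lee Outlaw", "email": "johnny@example.org"}]
SOURCE_B = [{"person_name": "Peter Goodguy", "mail_address": "peter@example.org"}]

# One hard-coded adapter per source. A new source, or a renamed field in an
# existing one, means editing this table and redeploying the application.
ADAPTERS = {
    "A": lambda r: {"name": r["fullName"], "mbox": r["email"]},
    "B": lambda r: {"name": r["person_name"], "mbox": r["mail_address"]},
}


def combine(sources):
    """Map every source's records into one common shape."""
    combined = []
    for source_id, records in sources.items():
        adapt = ADAPTERS[source_id]  # KeyError for an unknown source
        combined.extend(adapt(record) for record in records)
    return combined
```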
Structured data with Semantic Web technologies
• Refer to objects using URIs (IRIs)
• Represent structured data using RDF triples
• Use OWL ontologies for shared concepts between different data sources
• Access the RDF data in RDF graph stores or SPARQL endpoints
• Query the data using the SPARQL query language
• Merge structured data from different data sources
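With shared URIs the merge step becomes almost trivial. A minimal Python sketch, modelling an RDF graph as a set of (subject, predicate, object) string tuples; literals are plain strings here, which is a simplification. The two source graphs are hypothetical. Blank nodes are ignored: a real RDF merge must rename them apart so that `_:a` in one source is not confused with `_:a` in another.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
JOHN = "http://example.com/People/John"
BOB = "http://example.com/People/Bob"

# Two hypothetical sources that both use the same URI for John.
source_a = {(JOHN, FOAF + "name", "Johnny Lee Outlaw")}
source_b = {(JOHN, FOAF + "name", "Johnny Lee Outlaw"),
            (JOHN, FOAF + "knows", BOB)}

# Shared URIs make merging a set union: no per-source adapter is needed,
# and statements present in both sources collapse into one.
merged = source_a | source_b


def describe(graph, subject):
    """All (predicate, object) pairs known about a subject, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)
```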
Structured data – Graph store HTTP access
[Diagram: the application sends an HTTP GET containing a SPARQL CONSTRUCT query to an RDF store, receives RDF triples, and processes them with its own SPARQL query engine.]
Structured data – Multiple graph store HTTP access
[Diagram: as above, but the application fetches RDF triples from three RDF stores with HTTP GET requests containing SPARQL CONSTRUCT queries, and processes them with its own SPARQL query engine.]
Structured data – SPARQL endpoint
[Diagram: the application sends HTTP GET requests containing SPARQL queries to a SPARQL endpoint and receives the results as JSON, CSV, XML …]
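A sketch of a single request to a SPARQL endpoint, following the SPARQL 1.1 Protocol: the query goes URL-encoded in the `query` parameter of an HTTP GET, and the `Accept` header asks for the SPARQL 1.1 Query Results JSON format. The endpoint URL and the sample response are hypothetical; the code only builds the request and parses a canned response, so it runs without network access.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox WHERE { ?x foaf:name ?name . ?x foaf:mbox ?mbox }"""


def build_request(endpoint, query):
    """HTTP GET with the query URL-encoded in the 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the SPARQL 1.1 Query Results JSON format.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json):
    """Flatten a SPARQL JSON results document into dicts of plain values."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    # Unbound variables are simply missing from a binding.
    return [{v: b[v]["value"] for v in variables if v in b}
            for b in doc["results"]["bindings"]]


# Hypothetical response body, as an endpoint might return it.
SAMPLE = json.dumps({
    "head": {"vars": ["name", "mbox"]},
    "results": {"bindings": [
        {"name": {"type": "literal", "value": "Peter Goodguy"},
         "mbox": {"type": "uri", "value": "mailto:peter@example.org"}},
    ]},
})
```

Against a real endpoint, `urllib.request.urlopen(build_request(endpoint, QUERY)).read()` would return a results document that `rows` parses in the same way.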
Structured data – multiple SPARQL endpoints
[Diagram: the application sends HTTP GET requests containing SPARQL queries to three SPARQL endpoints and receives the results as JSON, CSV, XML …]
Semantic Web Stack
URI – Uniform Resource Identifier
• To be able to merge data from different sources we need a mechanism for naming objects uniquely
• In the Semantic Web, URIs are this naming mechanism
• More generally, we use Internationalized Resource Identifiers (IRIs)
RDF – Resource Description Framework
RDF: a set of triples
• A triple is a subject-predicate-object statement
• "John knows Bob" can be represented as a triple:
<http://example.com/People/John> <http://xmlns.com/foaf/0.1/knows> <http://example.com/People/Bob>
• Often we use prefixes to shorten the URIs:
people:John foaf:knows people:Bob
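Prefixed names are only shorthand: each prefix is replaced by its namespace URI. A minimal Python sketch using the `foaf:` and `people:` prefixes from the examples in this presentation; `expand` and `triple` are hypothetical helper names.

```python
# Prefix table from the examples above.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "people": "http://example.com/People/",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:knows' into a full URI."""
    prefix, local = name.split(":", 1)
    return prefixes[prefix] + local


def triple(subject, predicate, obj):
    """Build a (subject, predicate, object) triple from prefixed names."""
    return (expand(subject), expand(predicate), expand(obj))
```

`triple("people:John", "foaf:knows", "people:Bob")` yields exactly the three full URIs of the "John knows Bob" statement.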
RDF – Resource Description Framework
RDF: a directed labeled graph
From "What is RDF and what is it good for?"
RDF – Resource Description Framework
Part of the previous graph as triples (N3 notation)
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://www.example.org/> .
ex:vincent_donofrio ex:starred_in ex:law_and_order_ci .
ex:law_and_order_ci rdf:type ex:tv_show .
ex:the_thirteenth_floor ex:similar_plot_as ex:the_matrix .
Part of the previous graph as RDF/XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://www.example.org/">
  <rdf:Description rdf:about="http://www.example.org/vincent_donofrio">
    <ex:starred_in>
      <ex:tv_show rdf:about="http://www.example.org/law_and_order_ci" />
    </ex:starred_in>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.example.org/the_thirteenth_floor">
    <ex:similar_plot_as rdf:resource="http://www.example.org/the_matrix" />
  </rdf:Description>
</rdf:RDF>
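Both serializations describe the same three triples. As an illustration, this Python sketch extracts the triples from the RDF/XML with `xml.etree.ElementTree`. It handles only the constructs used here (`rdf:about`, `rdf:resource`, typed and nested node elements, plain text literals); blank nodes, `rdf:ID`, `rdf:parseType` and typed or language-tagged literals are out of scope, so a real RDF parser is needed for anything else.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://www.example.org/">
  <rdf:Description rdf:about="http://www.example.org/vincent_donofrio">
    <ex:starred_in>
      <ex:tv_show rdf:about="http://www.example.org/law_and_order_ci" />
    </ex:starred_in>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.example.org/the_thirteenth_floor">
    <ex:similar_plot_as rdf:resource="http://www.example.org/the_matrix" />
  </rdf:Description>
</rdf:RDF>"""


def iri(tag):
    """ElementTree names look like '{namespace}local'; the IRI is namespace + local."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def node(elem, triples):
    """Emit the triples of one node element and return its subject IRI."""
    subject = elem.get(f"{{{RDF}}}about")
    if elem.tag != f"{{{RDF}}}Description":
        # A typed node element such as <ex:tv_show> means rdf:type ex:tv_show.
        triples.add((subject, RDF + "type", iri(elem.tag)))
    for prop in elem:
        resource = prop.get(f"{{{RDF}}}resource")
        if resource is not None:
            obj = resource
        elif len(prop) > 0:
            obj = node(prop[0], triples)  # nested node element
        else:
            obj = prop.text  # plain literal
        triples.add((subject, iri(prop.tag), obj))
    return subject


def parse(text):
    triples = set()
    for elem in ET.fromstring(text):
        node(elem, triples)
    return triples
```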
SPARQL – an RDF query language
• Resembles (unfortunately?) SQL a lot
QUERY
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE { ?x foaf:name ?name .
        ?x foaf:mbox ?mbox }
DATA
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:a foaf:name "Johnny Lee Outlaw" .
_:a foaf:mbox <mailto:[email protected]> .
_:b foaf:name "Peter Goodguy" .
_:b foaf:mbox <mailto:[email protected]> .
_:c foaf:mbox <mailto:[email protected]> .
QUERY RESULT
name                  mbox
"Johnny Lee Outlaw"   <mailto:[email protected]>
"Peter Goodguy"       <mailto:[email protected]>
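The WHERE clause is evaluated by matching each triple pattern against the data and joining the variable bindings, much like a join in SQL. A minimal nested-loop sketch in Python, with the data above as string tuples (blank nodes such as `_:a` kept as plain strings). The mailbox values are hypothetical placeholders.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

DATA = {
    ("_:a", FOAF + "name", "Johnny Lee Outlaw"),
    ("_:a", FOAF + "mbox", "mailto:johnny@example.org"),
    ("_:b", FOAF + "name", "Peter Goodguy"),
    ("_:b", FOAF + "mbox", "mailto:peter@example.org"),
    ("_:c", FOAF + "mbox", "mailto:carol@example.org"),
}

# WHERE { ?x foaf:name ?name . ?x foaf:mbox ?mbox }
PATTERNS = [("?x", FOAF + "name", "?name"), ("?x", FOAF + "mbox", "?mbox")]


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must agree with any value it is already bound to.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(patterns, data):
    """Join the triple patterns left to right (nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in data:
                new = match(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings


def select(variables, patterns, data):
    """Project the solutions onto the SELECT variables, sorted for display."""
    return sorted(tuple(b[v] for v in variables) for b in evaluate(patterns, data))
```

As in the query result above, `_:c` drops out because it has no `foaf:name`.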
Ontologies
Definition:
In computer science and information science, an ontology formally represents knowledge as a set of concepts within a domain, and the relationships between pairs of concepts. It can be used to model a domain and support reasoning about entities.
RDF Schema
• Provides basic elements (classes and properties) for the description of ontologies
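The "reasoning" mentioned in the definition can be as simple as RDFS subclass inference. A minimal forward-chaining sketch in Python that applies two RDFS entailment rules (rdfs11: subclass transitivity; rdfs9: instances of a class are instances of its superclasses) until nothing new is derived. The classes `ex:Actor`, `ex:Person` and `ex:Agent` are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Forward-chain rdfs9 and rdfs11 until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p2 != SUBCLASS_OF or o != s2:
                    continue
                if p == SUBCLASS_OF:
                    # rdfs11: subClassOf is transitive.
                    new.add((s, SUBCLASS_OF, o2))
                elif p == RDF_TYPE:
                    # rdfs9: an instance of a class is an instance of its superclasses.
                    new.add((s, RDF_TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new
```

From three stated triples (vincent_donofrio is an `ex:Actor`, `ex:Actor` is a subclass of `ex:Person`, `ex:Person` of `ex:Agent`) the closure adds three more, including that vincent_donofrio is an `ex:Agent`.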
Web Ontology Language (OWL)
• A family of languages for representing ontologies
• Related to description logics
• IMO, the darker side of the Semantic Web; don't go too deep into this if you want to get things done.
Ontologies
Example ontologies
• FOAF (Friend of a Friend), an ontology describing persons, their activities and their relations to other people and objects.
• Dublin Core, a simple ontology for documents and publishing
• Cyc, a large Foundation Ontology for formal representation of the universe of discourse (don’t go in there)
• DBpedia Ontology, a shallow, cross-domain ontology, manually created based on the most commonly used infoboxes within Wikipedia.
Example SPARQL Endpoint – data.aalto.fi
RDF stores – the only standardized NoSQL solutions available at the moment
• A simple and uniform standard data model
• A powerful standard query language
• Standardized data interchange formats
- Data portability
- Toolchain interoperability
- No vendor or product lock-in
- Future-proof
From: “How RDF Databases Differ from Other NoSQL Solutions”
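Standardized interchange formats are part of what makes RDF data portable between tools. As an example, a minimal Python sketch that writes triples in the line-based N-Triples format. Treating strings that start with `http://`, `https://` or `mailto:` as IRIs and everything else as a plain literal is an assumption of this sketch, not part of the format.

```python
def term(value):
    """Serialize one RDF term for N-Triples."""
    # Assumption of this sketch: these prefixes mark IRIs, all else is a literal.
    if value.startswith(("http://", "https://", "mailto:")):
        return f"<{value}>"
    # Characters that must be escaped inside an N-Triples string literal.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples):
    """One 'subject predicate object .' line per triple, in sorted order."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n"
                   for s, p, o in sorted(triples))
```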
Linked Data
• A method of publishing structured data so that it can be interlinked and become more useful
- Builds on URIs, RDF, and SPARQL
- Term coined by Tim Berners-Lee in a 2006 design note
• For more detailed information, see the Wikipedia page and the articles referenced there
• Four principles:
1. Use URIs to denote things.
2. Use HTTP URIs so that these things can be referred to and looked up by people and user agents.
3. Provide useful information about the thing when its URI is dereferenced, leveraging standards such as RDF and SPARQL.
4. Include links to other related things (using their URIs) when publishing data on the Web.
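Principles 2 to 4 together let a client discover data by following links. A minimal breadth-first crawler sketch in Python; the `WEB` dictionary is a hypothetical stand-in for HTTP dereferencing (a real client would issue an HTTP GET with an RDF `Accept` header such as `text/turtle`), and only object URIs are followed.

```python
from collections import deque

KNOWS = "http://xmlns.com/foaf/0.1/knows"
JOHN = "http://example.com/People/John"
BOB = "http://example.com/People/Bob"
CAROL = "http://example.org/People/Carol"

# Hypothetical stand-in for the Web: what dereferencing each URI would return.
WEB = {
    JOHN: [(JOHN, KNOWS, BOB)],
    BOB: [(BOB, KNOWS, CAROL)],  # a link into another publisher's data
    CAROL: [],
}


def dereference(uri):
    """Stand-in for an HTTP GET on the URI; unknown URIs return nothing."""
    return WEB.get(uri, [])


def crawl(start, limit=100):
    """Breadth-first link following, visiting at most `limit` URIs."""
    seen, queue, triples = {start}, deque([start]), []
    while queue:
        uri = queue.popleft()
        for s, p, o in dereference(uri):
            triples.append((s, p, o))
            # Follow links to other things that are named with HTTP URIs.
            if o.startswith(("http://", "https://")) and o not in seen and len(seen) < limit:
                seen.add(o)
                queue.append(o)
    return seen, triples
```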
Linked Data (cont.)
• The four principles above were restated by Berners-Lee in 2009:
1. All kinds of conceptual things, they have names now that start with HTTP.
2. I get important information back…
3. I get back that information it's not just got somebody's height and weight and when they were born, it's got relationships. And when it has relationships, whenever it expresses a relationship then the other thing that it's related to is given one of those names that starts with HTTP.
W3C Data Activity
W3C RDF Stream Processing
W3C RSP Community Group
Continuous SPARQL queries
Traditional approach: Executing SPARQL queries on RDF data
• Just like executing SQL queries on a relational database
[Diagram: a SQL query sent to a DB returns rows of data; likewise, a SPARQL query sent to an RDF store returns JSON/CSV/XML… data.]
Continuous query approach: Running RDF data through a set of SPARQL queries
• The queries remain the same, but the data changes
• The results of the queries are updated immediately when the data changes
• This should be realized without executing the queries against the complete RDF store every time the data changes
• Continuous queries also exist in the SQL domain
• Also known as standing queries
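A minimal Python sketch of the idea, assuming queries that are plain conjunctions of triple patterns and an insert-only stream. When a triple arrives, only the results that use the new triple are computed, by matching it against each pattern and joining the remaining patterns against the stored data. This is a simple delta join for illustration, not the algorithm of INSTANS or any other engine; the join here still scans the store, whereas real engines keep indexes or intermediate join results and also handle deletions.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(patterns, data, binding):
    """Nested-loop join of the patterns against data, starting from binding."""
    bindings = [binding]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in data:
                new = match(pattern, t, b)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings


class StandingQuery:
    """A conjunctive query that stays registered while triples stream in."""

    def __init__(self, patterns, variables):
        self.patterns = patterns
        self.variables = variables
        self.store = set()
        self.results = set()

    def insert(self, triple):
        """Add one triple and return only the result rows it made true."""
        if triple in self.store:
            return set()
        self.store.add(triple)
        new_rows = set()
        # Delta join: bind the new triple to each pattern in turn and join
        # the other patterns against the store, instead of re-running the
        # whole query from scratch.
        for i, pattern in enumerate(self.patterns):
            start = match(pattern, triple, {})
            if start is None:
                continue
            others = self.patterns[:i] + self.patterns[i + 1:]
            for b in evaluate(others, self.store, start):
                new_rows.add(tuple(b[v] for v in self.variables))
        new_rows -= self.results
        self.results |= new_rows
        return new_rows
```

With the name/mbox query from earlier registered, inserting a person's `foaf:name` produces nothing, and the matching `foaf:mbox` triple produces exactly one new row.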
Continuous query example – RDF data
@prefix people: <http://example.com/People/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix event: <http://purl.org/NET/c4dm/event.owl#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix tl: <http://purl.org/NET/c4dm/timeline.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

people:John foaf:knows people:Bob .

event:e123 event:agent people:John ;
    event:place _:p789 ;
    event:time _:t790 .
_:p789 geo:lat 60.15922234475837 ;
    geo:long 24.87610729902513 .
_:t790 tl:at "2012-03-05T13:32:28"^^xsd:dateTime .

event:e124 event:agent people:Bob ;
    event:place _:p791 ;
    event:time _:t792 .
_:p791 geo:lat 60.168418510496544 ;
    geo:long 24.857417412169575 .
_:t792 tl:at "2012-03-05T13:32:36"^^xsd:dateTime .
Continuous query example – Event RDF graph
[Figure: event :e1 (rdf:type event:Event) with event:agent :p3; its event:place node has geo:lat 60.158776, geo:long 24.881490 and geo:alt; its event:time node has rdf:type tl:Instant and tl:at 2011-10-03T08:17:11.]
Continuous query example – Event RDF graph
[Figure: two events, :e1 with event:agent :p3 and :e27 with event:agent :p4, each with an rdf:type and an event:time; :p3 and :p4 are connected by foaf:knows and by the derived :nearby relation.]
Continuous query example – Remove old events
DELETE {
  ?event event:agent ?person .
  ?event event:place ?place .
  ?place geo:lat ?lat .
  ?place geo:long ?long .
  ?event event:time ?time .
  ?time tl:at ?dttm
} WHERE {
  ?event event:agent ?person .
  ?event event:place ?place .
  ?place geo:lat ?lat .
  ?place geo:long ?long .
  ?event event:time ?time .
  ?time tl:at ?dttm
  FILTER EXISTS {
    ?event2 event:agent ?person .
    ?event2 event:time ?time2 .
    ?time2 tl:at ?dttm2
    FILTER (?dttm < ?dttm2)
  }
}
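In stream terms, the DELETE rule keeps only the latest event of each person. The same policy as a minimal Python sketch, assuming the events are flattened into hypothetical (event id, person, ISO timestamp) tuples. One difference: when two events of a person carry the same timestamp, the SPARQL rule (strict `<`) keeps both, while this sketch keeps the first.

```python
from datetime import datetime


def keep_latest(events):
    """Return {person: event_id} keeping only each person's newest event.

    events: iterable of hypothetical (event_id, person, iso_timestamp) tuples.
    """
    latest = {}
    for event_id, person, stamp in events:
        when = datetime.fromisoformat(stamp)
        current = latest.get(person)
        # Like the DELETE rule: an older event gives way to a strictly newer one.
        if current is None or current[1] < when:
            latest[person] = (event_id, when)
    return {person: event_id for person, (event_id, _) in latest.items()}
```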
Continuous query example – Detect nearby
INSERT { ?person1 :nearby ?person2 }
WHERE {
  ?person1 foaf:knows ?person2 .
  ?event1 event:agent ?person1 .
  ?event1 event:place ?place1 .
  ?place1 geo:lat ?lat1 .
  ?place1 geo:long ?long1 .
  ?event1 event:time ?time1 .
  ?time1 tl:at ?dttm1 .
  ?event2 event:agent ?person2 .
  ?event2 event:place ?place2 .
  ?place2 geo:lat ?lat2 .
  ?place2 geo:long ?long2 .
  ?event2 event:time ?time2 .
  ?time2 tl:at ?dttm2 .
  FILTER ((abs(hours(?dttm2) * 60 + minutes(?dttm2) - hours(?dttm1) * 60 - minutes(?dttm1)) < 5))
  FILTER ((abs(?lat2 - ?lat1) < 0.01) && (abs(?long2 - ?long1) < 0.01))
  FILTER NOT EXISTS { ?person1 :nearby ?person2 }
}
Continuous query example – Detect not nearby
DELETE { ?person1 :nearby ?person2 }
WHERE {
  ?person1 foaf:knows ?person2 .
  ?event1 event:agent ?person1 .
  ?event1 event:place ?place1 .
  ?place1 geo:lat ?lat1 .
  ?place1 geo:long ?long1 .
  ?event1 event:time ?time1 .
  ?time1 tl:at ?dttm1 .
  ?event2 event:agent ?person2 .
  ?event2 event:place ?place2 .
  ?place2 geo:lat ?lat2 .
  ?place2 geo:long ?long2 .
  ?event2 event:time ?time2 .
  ?time2 tl:at ?dttm2 .
  FILTER ((abs(?lat2 - ?lat1) > 0.02) || (abs(?long2 - ?long1) > 0.02))
  FILTER EXISTS { ?person1 :nearby ?person2 }
}
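The two rules use different distance thresholds (0.01 and 0.02 degrees), so a pair that drifts slightly apart is not immediately reported as no longer nearby. A minimal Python sketch of the same logic, assuming each person's latest event is given as a hypothetical (lat, long, ISO timestamp) tuple; like the SPARQL filter, it compares only hours and minutes. With the John and Bob data from the RDF example, the longitude difference (about 0.019 degrees) falls between the thresholds, so neither rule fires.

```python
from datetime import datetime

NEAR = 0.01   # degrees, threshold of "Detect nearby"
FAR = 0.02    # degrees, threshold of "Detect not nearby"
WINDOW = 5    # minutes, time window of "Detect nearby"


def minute_of_day(stamp):
    """Hours * 60 + minutes, ignoring date and seconds like the SPARQL filter."""
    t = datetime.fromisoformat(stamp)
    return t.hour * 60 + t.minute


def update_nearby(nearby, pair, e1, e2):
    """Apply both rules to the latest events e1, e2 = (lat, long, iso_time)."""
    dlat = abs(e2[0] - e1[0])
    dlong = abs(e2[1] - e1[1])
    close_in_time = abs(minute_of_day(e2[2]) - minute_of_day(e1[2])) < WINDOW
    if close_in_time and dlat < NEAR and dlong < NEAR:
        nearby.add(pair)        # INSERT { ?person1 :nearby ?person2 }
    elif dlat > FAR or dlong > FAR:
        nearby.discard(pair)    # DELETE { ?person1 :nearby ?person2 }
    # Between the two thresholds neither rule fires and the state is kept.
    return nearby
```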
Continuous query example – Report nearby
SELECT ?person1 ?person2
WHERE { ?person1 :nearby ?person2 }
INSTANS – Incremental Engine for Standing SPARQL