George Anadiotis, Spyros Kotoulas and Ronny Siebes VU University Amsterdam.
Why do we need distribution…
Why do we need anytime behavior…
Why should it be (very) scalable…
Why should we drop consistency and completeness…
Why do we need trust/ontology ranking… etc
2
What is P2P? (1 slide)
Relationship between P2P and SW (3 slides)
Our Goal (1 slide)
Distributed SW stores (1 slide)
◦ Structured P2P stores (3 slides)
◦ Federated stores (2 slides)
Our approach (6 slides)
Future work (1 slide)
3
Class of distributed systems
Most important characteristics
◦ Same functionality across peers
◦ Peer autonomy
◦ Formation of overlay networks
◦ Common interface
◦ They respect some agreed-upon way to organize
File-sharing networks are NOT the only Peer-to-Peer systems.
4
5
Source of semantic information to self-organize
Interoperability
6
Scalable infrastructure for
◦ Storage
◦ Reasoning
◦ Collaboration
Self-organization
Autonomy: control of data
Privacy
Scalable algorithms
Robustness
No censorship
No preferential treatment of information
7
Common misconception: All Peer-to-Peer systems can offer the above
Global-scale semantic web storage and reasoning
◦ Scalability
  Computation
  Administration
8
Structured peer-to-peer
◦ Use DHTs
◦ One global distributed store
◦ Peers do not maintain their own data
Federated stores
◦ Each peer maintains its own store
◦ Stores are interconnected
◦ Either global schema or mappings between schemata
9
• The mathematical abstraction for hash tables is a Map
• Functionality:
  • put(key, value)
  • get(key)
• Similar to normal hash tables, with the difference that each bucket is now a peer
• Accessing different buckets involves network traffic
• Routing to a bucket involves contacting approx. log(N) peers, where N is the network size
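The put/get interface above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `ToyDHT`, the SHA-1 based key placement and the base-2 logarithm in `expected_hops` are choices made for this sketch, not something the slides specify, and all "peers" live in one process rather than on a network.

```python
import hashlib
import math


class ToyDHT:
    """A hash table whose buckets are peers (illustrative, single-process sketch)."""

    def __init__(self, peer_ids):
        # Each peer owns one bucket; in a real DHT these sit on different machines.
        self.peer_ids = sorted(peer_ids)
        self.buckets = {pid: {} for pid in self.peer_ids}

    def responsible_peer(self, key):
        # Assumption: hash the key and map it onto one of the peers.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return self.peer_ids[int(digest, 16) % len(self.peer_ids)]

    def put(self, key, value):
        self.buckets[self.responsible_peer(key)][key] = value

    def get(self, key):
        return self.buckets[self.responsible_peer(key)].get(key)

    def expected_hops(self):
        # Rough routing cost: about log(N) peers (base 2 assumed here).
        return math.ceil(math.log2(len(self.peer_ids)))


dht = ToyDHT(list("abcdefghijklmnopqrstuvwx"))
dht.put("horse", "the horse is an animal")
print(dht.get("horse"))
print(dht.expected_hops())
```

In a deployed DHT, `put` and `get` would each trigger routing messages between peers; the sketch only models which peer ends up holding a key.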
10
Values are stored in the peer with ID starting with the first letter of the key
11
[Figure: DHT with 24 peers, IDs a to x. Example entry: <Key=horse, Value=the horse is an animal>]
Peer 1: <rabbit, subClassOf, animal>, <seal, subClassOf, animal>, <animal, lives_in, habitat>
Peer 2: <monk_seal, subClassOf, seal>, <mseal1, type, monk_seal>
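As a rough illustration of the placement rule (the peer ID is the first letter of the key), the sketch below distributes the example triples over letter-named peers. It assumes that every triple is stored once under each of its three terms (subject, predicate, object); the slides do not state this explicitly, and the function names are hypothetical.

```python
from collections import defaultdict

TRIPLES = [
    ("rabbit", "subClassOf", "animal"),
    ("seal", "subClassOf", "animal"),
    ("animal", "lives_in", "habitat"),
    ("monk_seal", "subClassOf", "seal"),
    ("mseal1", "type", "monk_seal"),
]


def peer_for(term):
    # Peer ID = first letter of the key, lower-cased.
    return term[0].lower()


def distribute(triples):
    # Assumption: a triple is stored at the peer of each of its three terms.
    store = defaultdict(set)
    for triple in triples:
        for term in triple:
            store[peer_for(term)].add(triple)
    return store


store = distribute(TRIPLES)
for peer_id in sorted(store):
    print(peer_id, sorted(store[peer_id]))
```

Note how peer `s` receives every `subClassOf` triple: popular terms concentrate load on a single peer.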
12
[Figure: DHT peers a to x. The triples of Peer 1 and Peer 2 are inserted into the DHT, and copies of the same triple appear at several peers.]
<rabbit, subClassOf, animal>
<seal, subClassOf, animal>
<animal, lives_in, habitat>
<monk_seal, subClassOf, seal>
<mseal1, type, monk_seal>
13
[Figure: DHT peers a to x holding the triples <rabbit, subClassOf, animal>, <seal, subClassOf, animal>, <animal, lives_in, habitat>, <monk_seal, subClassOf, seal> and <mseal1, type, monk_seal>, with the inferred triple added at several peers.]
RDFS class axioms
(1) <X, subClassOf, Z> <- <X, subClassOf, Y>, <Y, subClassOf, Z>
(2) <X, type, Z> <- <X, type, Y>, <Y, subClassOf, Z>
Inferred triple: <monk_seal, subClassOf, animal>
14
[Figure: DHT peers a to x holding the triples <rabbit, subClassOf, animal>, <seal, subClassOf, animal>, <animal, lives_in, habitat>, <monk_seal, subClassOf, seal> and <mseal1, type, monk_seal>, with the inferred triples added at several peers.]
RDFS class axioms
(1) FORALL O,V O[rdfs:subClassOf->V] <- EXISTS W (O[rdfs:subClassOf->W] AND W[rdfs:subClassOf->V]).
(2) FORALL O,T O[rdf:type->T] <- EXISTS S (S[rdfs:subClassOf->T] AND O[rdf:type->S]).
Inferred triples: <monk_seal, subClassOf, animal>, <mseal1, type, animal>
As shown, the transitive closure has to be calculated up front, since backward chaining would require many DHT messages.
But this does not scale to a large number of ontologies.
◦ E.g. an animal hierarchy: adding the triple <animal, subClassOf, living_organism> means that, for all triples with animal, we need to insert an additional triple.
Control over ontologies
◦ Provenance of information
◦ Ontologies and instance data are made public
◦ Publishers are not in control of their ontologies/data
One super-user inserts all data
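To make the cost of materializing the closure concrete, here is a minimal forward-chaining sketch of the two RDFS class axioms (subClassOf transitivity and type propagation), applied to the example triples. It runs on a plain in-memory set and ignores the DHT entirely; the function name `rdfs_closure` is hypothetical.

```python
TRIPLES = {
    ("rabbit", "subClassOf", "animal"),
    ("seal", "subClassOf", "animal"),
    ("animal", "lives_in", "habitat"),
    ("monk_seal", "subClassOf", "seal"),
    ("mseal1", "type", "monk_seal"),
}


def rdfs_closure(triples):
    """Apply rules (1) and (2) repeatedly until no new triple is derived."""
    closure = set(triples)
    while True:
        sub = [(s, o) for s, p, o in closure if p == "subClassOf"]
        typ = [(s, o) for s, p, o in closure if p == "type"]
        derived = set()
        # Rule (1): <X, subClassOf, Z> <- <X, subClassOf, Y>, <Y, subClassOf, Z>
        derived |= {(x, "subClassOf", z) for x, y in sub for y2, z in sub if y == y2}
        # Rule (2): <X, type, Z> <- <X, type, Y>, <Y, subClassOf, Z>
        derived |= {(x, "type", z) for x, y in typ for y2, z in sub if y == y2}
        new = derived - closure
        if not new:
            return closure
        closure |= new


before = rdfs_closure(TRIPLES)
after = rdfs_closure(TRIPLES | {("animal", "subClassOf", "living_organism")})
print(sorted(before - TRIPLES))  # triples inferred from the original data
print(len(after) - len(before))  # growth caused by one extra schema triple
```

In this small example the single added triple brings four further derived triples with it (one for each class or instance below animal), and in a DHT each of them would have to be inserted at the peers responsible for its terms.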
15
Each peer maintains its ontology and instance data
Mappings are (manually) defined between ontologies
Thus, a semantic topology is created
Queries are posted according to such a schema and forwarded following these mappings
Semantic Web counterpart of Federated Databases
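A minimal sketch of how a query might be forwarded along manually defined mappings is given below. Everything in it is hypothetical: the `Peer` class, the term-to-term dictionary used as a mapping, and the second peer's vocabulary (`kindOf`, `creature`) are invented for illustration and do not come from any particular federated system.

```python
def matches(pattern, triple):
    # None in the pattern acts as a variable.
    return all(p is None or p == t for p, t in zip(pattern, triple))


class Peer:
    def __init__(self, name, triples):
        self.name = name
        self.triples = set(triples)
        self.mappings = []  # list of (neighbour, {local term: neighbour term})

    def map_to(self, neighbour, term_map):
        # A manually defined mapping from this peer's ontology to a neighbour's.
        self.mappings.append((neighbour, dict(term_map)))

    def query(self, pattern, visited=None):
        visited = set() if visited is None else visited
        if self.name in visited:
            return set()  # avoid loops in the semantic topology
        visited.add(self.name)
        answers = {(self.name, t) for t in self.triples if matches(pattern, t)}
        for neighbour, term_map in self.mappings:
            # Rewrite the query into the neighbour's vocabulary, then forward it.
            translated = tuple(term_map.get(term, term) for term in pattern)
            answers |= neighbour.query(translated, visited)
        return answers


peer_a = Peer("A", [("seal", "subClassOf", "animal")])
peer_b = Peer("B", [("otter", "kindOf", "creature")])
peer_a.map_to(peer_b, {"subClassOf": "kindOf", "animal": "creature"})

results = peer_a.query((None, "subClassOf", "animal"))
print(sorted(results))
```

Answers come back in the vocabulary of the peer that produced them; translating them back to the asker's schema is left out of the sketch.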
16
Bootstrapping
◦ New peers have to manually map their ontologies to the ontology of a peer already in the network
◦ Finding relevant ontologies requires flooding
Routing
◦ The overlay is created according to the ontologies understood by peers, not the data they contain. Possible scalability problem.
◦ Searching for instances requires flooding
17
Effort to combine both approaches
◦ Use a DHT to efficiently find ontologies and instance data
◦ Exploit semantic locality by keeping ontologies local to the publisher
◦ Whenever possible, perform reasoning peer-to-peer
18
[Figure: DHT peers a to x used as an index from terms to the peers that hold triples mentioning them.]
Peer 1: <rabbit, subClassOf, animal>, <seal, subClassOf, animal>, <animal, lives_in, habitat>
Peer 2: <monk_seal, subClassOf, seal>, <mseal1, type, monk_seal>
Index entries in the DHT:
animal:P1
rabbit:P1
monk_seal:P2
mseal1:P2
habitat:P1
lives_in:P1
seal:P1,P2
subClassOf:P1,P2
19
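The term-to-peer index of this approach can be sketched as follows. The triples stay with their publishers; only a pointer from each term to the publishing peers is shared. A plain dictionary stands in for the DHT here, and the function name `build_index` is hypothetical. Unlike the slide, the sketch indexes every term, so it also produces an entry for `type`.

```python
from collections import defaultdict

# Data stays local to each publisher.
PEER_DATA = {
    "P1": [
        ("rabbit", "subClassOf", "animal"),
        ("seal", "subClassOf", "animal"),
        ("animal", "lives_in", "habitat"),
    ],
    "P2": [
        ("monk_seal", "subClassOf", "seal"),
        ("mseal1", "type", "monk_seal"),
    ],
}


def build_index(peer_data):
    # Only (term -> peers) pointers are published, not the triples themselves.
    index = defaultdict(set)
    for peer, triples in peer_data.items():
        for triple in triples:
            for term in triple:
                index[term].add(peer)
    return index


index = build_index(PEER_DATA)
for term in sorted(index):
    print(f"{term}:{','.join(sorted(index[term]))}")
```

In a real deployment each index entry would live at the DHT peer responsible for that term's key, so looking up a term costs one DHT lookup.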
[Figure: Peer 3 looks up a term in the DHT index (peers a to x) and then contacts the peers returned.]
Peer 1: <rabbit, subClassOf, animal>, <seal, subClassOf, animal>, <animal, lives_in, habitat>
Peer 2: <monk_seal, subClassOf, seal>, <mseal1, type, monk_seal>
Index entries in the DHT:
animal:P1
rabbit:P1
seal:P1,P2
subClassOf:P1,P2
monk_seal:P2
mseal1:P2
habitat:P1
lives_in:P1
Query (Peer 3): <seal, subClassOf, X?> <Y?, subClassOf, seal>
Index lookup: seal? returns P1, P2
Answers: <seal, subClassOf, animal> (from Peer 1), <monk_seal, subClassOf, seal> (from Peer 2)
20
[Figure: Peer 3 looks up a term in the DHT index (peers a to x); the answer triggers a second lookup.]
Peer 1: <rabbit, subClassOf, animal>, <seal, subClassOf, animal>, <animal, lives_in, habitat>
Peer 2: <monk_seal, subClassOf, seal>, <mseal1, type, monk_seal>
Index entries in the DHT:
animal:P1
rabbit:P1
seal:P1,P2
subClassOf:P1,P2
monk_seal:P2
mseal1:P2
habitat:P1
lives_in:P1
Query (Peer 3): <monk_seal, subClassOf, X?>
Index lookup: monk_seal? returns P2
Answer: <monk_seal, subClassOf, seal>
Follow-up query: <seal, subClassOf, X?>
Index lookup: seal? returns P1
Answer: <seal, subClassOf, animal>
21
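The two-step walk-through (look up a term in the index, ask the listed peers, then repeat for the newly found class) can be sketched as below. This is a simplified, single-process illustration: the helper names `ask` and `superclasses` are hypothetical, the index is an ordinary dictionary, and the sketch asks every peer listed for a term rather than filtering out any of them.

```python
PEER_DATA = {
    "P1": [
        ("rabbit", "subClassOf", "animal"),
        ("seal", "subClassOf", "animal"),
        ("animal", "lives_in", "habitat"),
    ],
    "P2": [
        ("monk_seal", "subClassOf", "seal"),
        ("mseal1", "type", "monk_seal"),
    ],
}

# Term -> peers index; a dict stands in for the DHT.
INDEX = {}
for peer, triples in PEER_DATA.items():
    for triple in triples:
        for term in triple:
            INDEX.setdefault(term, set()).add(peer)


def match(pattern, triple):
    # None in the pattern acts as a variable.
    return all(p is None or p == t for p, t in zip(pattern, triple))


def ask(pattern):
    """Look up a constant term in the index, then query only the listed peers."""
    # The sketch needs a constant subject or object to pick the index key.
    key = pattern[0] if pattern[0] is not None else pattern[2]
    answers = set()
    for peer in sorted(INDEX.get(key, ())):
        answers |= {t for t in PEER_DATA[peer] if match(pattern, t)}
    return answers


def superclasses(cls):
    """Follow subClassOf links peer by peer instead of storing the closure."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for _, _, parent in ask((current, "subClassOf", None)):
            if parent not in found:
                found.add(parent)
                todo.append(parent)
    return found


print(ask(("seal", "subClassOf", None)))
print(ask((None, "subClassOf", "seal")))
print(sorted(superclasses("monk_seal")))
```

Each call to `ask` corresponds to one index lookup plus direct messages to the returned peers, so the inferred relation between monk_seal and animal is computed at query time instead of being stored.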
Control
◦ Access Control
◦ Select which data is published on the index
◦ Trust: ban spammers, remember good peers
Privacy
◦ It is possible to obfuscate descriptors stored in the DHT
Responsibility
◦ Publisher has the responsibility to maintain own data
Scalability
◦ DHTs can scale to millions of nodes
Data is up-to-date
22
Based on data from Swoogle, there is currently little overlap between ontologies
The distribution of ontology popularity follows a power-law pattern
If most answers reside on the same peer, our approach outperforms those that rely on triple distribution on top of a DHT
23
Simulations using SWDs (Semantic Web documents) from Swoogle and Watson (around 25,000)
Integration of privacy in the index
Selecting the right ontologies/peers
24
?
25