On Triple Dissemination, Forward-Chaining and Load Balancing in DHT Based RDF Stores
Dominic Battre, Felix Heine, Andre Höing, and Odej Kao
Presented by Aldarwich Yaser
Albert-Ludwigs-University Freiburg, SS 2009, Department of Computer Science
Computer Networks and Telematics, Prof. Christian Schindelhauer
Overview
Motivation
Introduction
• RDF
• DHT
• Pastry
Triple dissemination
Reasoning
Load balancing
References
Motivation
Centralized databases
• Shortcoming: unable to handle high load
• Limited capacity, as in Sesame and Jena
Decentralized databases
• Examples: BabelPeers, RDFPeers and Edutella
• Provide scalability, efficiency and capacity
Reasoning
• Infer new data from existing information
Load balancing
RDF Introduction
Resource Description Framework (RDF)
• Used for representing information on the Web
• RDFS (RDF Schema) provides a powerful model for storing and inferencing knowledge
• In RDF everything is represented by triples of the form (S, P, O)
Example: Germany (S) has capital (P) Berlin (O)
DHT Introduction
A distributed hash table (DHT) solves the item location problem in a distributed network of nodes
Use a key k to calculate the ID: ID = hash(k)
Operations:
• Put(k, x)
• Get(k)
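The Put and Get operations can be illustrated with a minimal in-process sketch. It is not the implementation used by any of the systems named in these slides. The class name `ToyDHT`, the choice of SHA-1, the 32-bit ID space and the "first node clockwise" placement rule are all assumptions made for illustration.

```python
import hashlib
from bisect import bisect_left


def dht_id(key: str, bits: int = 32) -> int:
    """Map a key to a numeric ID, i.e. ID = hash(k). SHA-1 is an assumption."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class ToyDHT:
    """All nodes live in one process; a real DHT would route messages."""

    def __init__(self, node_names):
        # Each node gets an ID from the same hash function as the keys.
        self.ring = sorted((dht_id(name), name) for name in node_names)
        self.storage = {name: {} for name in node_names}

    def responsible_node(self, key: str) -> str:
        # Assumption: the first node at or after the key's ID is responsible,
        # wrapping around to the first node at the end of the ring.
        ids = [node_id for node_id, _ in self.ring]
        index = bisect_left(ids, dht_id(key)) % len(self.ring)
        return self.ring[index][1]

    def put(self, key: str, value) -> None:
        node = self.responsible_node(key)
        self.storage[node].setdefault(key, []).append(value)

    def get(self, key: str):
        node = self.responsible_node(key)
        return self.storage[node].get(key, [])


dht = ToyDHT(["node1", "node2", "node3", "node4"])
dht.put("Berlin", "capital of Germany")
berlin_values = dht.get("Berlin")
```

Because both operations hash the key the same way, Get(k) reaches the node that Put(k, x) stored the value on, without any node needing a global directory.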
Triple dissemination
Triple T = (s, p, o)
• identifier = hash(s): responsible node for s
• identifier = hash(p): responsible node for p
• identifier = hash(o): responsible node for o
Query q = (s, p, o)
• identifier = hash(p)
Source: http://videolectures.net/iswc08_kaoudi_rdfs/
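The dissemination scheme above can be sketched as follows: each triple is stored at the nodes responsible for hash(s), hash(p) and hash(o), and a query with a known predicate is sent to the node responsible for hash(p). The helper names, the SHA-1 hash, the modulo placement and the second example triple are assumptions for illustration, not details taken from the slides.

```python
import hashlib
from collections import defaultdict


def identifier(term: str) -> int:
    """identifier = hash(term); SHA-1 is an assumption of this sketch."""
    return int.from_bytes(hashlib.sha1(term.encode("utf-8")).digest(), "big")


def responsible_node(term: str, node_count: int) -> int:
    # Assumption: modulo placement stands in for real DHT routing.
    return identifier(term) % node_count


def disseminate(triple, stores, node_count):
    """Store the triple at the nodes responsible for s, p and o."""
    for term in triple:
        stores[responsible_node(term, node_count)].add(triple)


def query_by_predicate(predicate, stores, node_count):
    """Answer a query with known p by asking only the node responsible for p."""
    node = responsible_node(predicate, node_count)
    return {t for t in stores[node] if t[1] == predicate}


NODE_COUNT = 4
stores = defaultdict(set)
germany = ("Germany", "hasCapital", "Berlin")
france = ("France", "hasCapital", "Paris")  # hypothetical second triple
disseminate(germany, stores, NODE_COUNT)
disseminate(france, stores, NODE_COUNT)
result = query_by_predicate("hasCapital", stores, NODE_COUNT)
```

Every triple with the same predicate lands on the same node, which makes the lookup a single hop but also concentrates load on that node.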
Pastry Protocol
Each peer has a 128-bit ID (nodeID)
• Unique and uniformly distributed
• Computed by applying a cryptographic hash function to the IP address
A message takes O(log N) steps to reach its destination
Node state contains:
• Leaf set
• Routing table
• Neighborhood set
Pastry (prefix-matching)
[Figure: prefix-matching example for Route(m, 323310). Labels: Node-id, Key. IDs shown: 323310, 323211, 322021, 313221, 103231]
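The prefix-matching idea can be sketched with the IDs from the figure. This is a simplification, not the Pastry algorithm itself: it assumes every node knows all other nodes, whereas real Pastry consults its routing table and leaf set. The function names and the choice of 103231 as the starting node are assumptions.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two IDs have in common."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def next_hop(current: str, key: str, known_nodes):
    """Pick a known node whose ID shares a longer prefix with the key."""
    current_len = shared_prefix_len(current, key)
    better = [n for n in known_nodes if shared_prefix_len(n, key) > current_len]
    if not better:
        return None  # no node with a longer matching prefix: deliver here
    # Take the smallest improvement, which mimics one routing-table step.
    return min(better, key=lambda n: shared_prefix_len(n, key))


def route(start: str, key: str, known_nodes):
    """Follow next_hop until no better node exists; return the path taken."""
    path = [start]
    while True:
        hop = next_hop(path[-1], key, known_nodes)
        if hop is None:
            return path
        path.append(hop)


NODES = ["103231", "313221", "322021", "323211"]
path = route("103231", "323310", NODES)
# path == ["103231", "313221", "322021", "323211"]
```

Each hop matches at least one more leading digit of the key (0, 1, 2, then 3 digits here), which is where the O(log N) bound on routing steps comes from.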
RDF Reasoning
Queries are formulated generally
RDFS extracts data even if the description does not exactly match the query
Example:
• Christian fatherOf Schindelhauer
• fatherOf subPropertyOf relativeOf
• => Christian relativeOf Schindelhauer
RDFS Rules
Rule Name   Precondition                                            Generated Triple
rdfs2       (a, rdfs:domain, x), (u, a, v)                          (u, rdf:type, x)
rdfs3       (a, rdfs:range, x), (u, a, v)                           (v, rdf:type, x)
rdfs5       (u, rdfs:subPropertyOf, v), (v, rdfs:subPropertyOf, x)  (u, rdfs:subPropertyOf, x)
rdfs9       (u, rdfs:subClassOf, x), (v, rdf:type, u)               (v, rdf:type, x)
rdfs11      (u, rdfs:subClassOf, v), (v, rdfs:subClassOf, x)        (u, rdfs:subClassOf, x)
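The five rules listed above can be applied by forward chaining, that is, repeatedly until no new triple appears. The sketch below runs on a single in-memory set and ignores distribution entirely. The function names and the example classes `Country` and `Place` are assumptions for illustration.

```python
RDF_TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
SUB_PROPERTY = "rdfs:subPropertyOf"
SUB_CLASS = "rdfs:subClassOf"


def apply_rules(triples):
    """One pass of rdfs2, rdfs3, rdfs5, rdfs9 and rdfs11 over a triple set."""
    derived = set()
    for (s1, p1, o1) in triples:
        for (s2, p2, o2) in triples:
            if p1 == DOMAIN and p2 == s1:  # rdfs2
                derived.add((s2, RDF_TYPE, o1))
            if p1 == RANGE and p2 == s1:  # rdfs3
                derived.add((o2, RDF_TYPE, o1))
            if p1 == SUB_PROPERTY and p2 == SUB_PROPERTY and o1 == s2:  # rdfs5
                derived.add((s1, SUB_PROPERTY, o2))
            if p1 == SUB_CLASS and p2 == RDF_TYPE and o2 == s1:  # rdfs9
                derived.add((s2, RDF_TYPE, o1))
            if p1 == SUB_CLASS and p2 == SUB_CLASS and o1 == s2:  # rdfs11
                derived.add((s1, SUB_CLASS, o2))
    return derived


def forward_chain(triples):
    """Apply the rules until no new triple appears (fixpoint)."""
    closure = set(triples)
    while True:
        new = apply_rules(closure) - closure
        if not new:
            return closure
        closure |= new


facts = {
    ("hasCapital", DOMAIN, "Country"),   # hypothetical schema triple
    ("Country", SUB_CLASS, "Place"),     # hypothetical schema triple
    ("Germany", "hasCapital", "Berlin"),
}
closure = forward_chain(facts)
inferred = closure - facts
```

Here rdfs2 first derives (Germany, rdf:type, Country), and rdfs9 then uses that generated triple to derive (Germany, rdf:type, Place), which shows why the rules have to be re-applied to their own output.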
Node Architecture
Each node hosts multiple RDF databases:
• Local triples database
• Received triples database
• Replica database
• Generated triples database
[Figure: a node containing the Generated Triples, Local Triples, Received Triples and Replica databases]
Triple dissemination in DHT
[Figure: Node1, Node2, Node3 and Node4, each with Generated Triples, Local Triples, Received Triples and Replica databases]
Triples life-cycle
Triples are subject to different events, such as node joining and departure
Triple lifetime
• Long-lived triples need few refreshes
• Short-lived triples (generated triples)
Updating triples updates the inferred triples
Soft state
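One way to picture the soft-state idea is a store in which every triple carries an expiry time and disappears unless it is refreshed. This is a sketch under assumptions: the class name `SoftStateStore`, the explicit `now` parameter and the lifetimes of 100 and 10 time units are hypothetical and not taken from the slides.

```python
class SoftStateStore:
    """Triples expire unless their source refreshes them in time."""

    def __init__(self):
        self.expires_at = {}  # triple -> absolute expiry time

    def put(self, triple, now: float, lifetime: float) -> None:
        # Re-inserting an existing triple acts as a refresh.
        self.expires_at[triple] = now + lifetime

    def purge(self, now: float) -> None:
        # Drop every triple whose expiry time has passed.
        self.expires_at = {t: e for t, e in self.expires_at.items() if e > now}

    def triples(self, now: float):
        self.purge(now)
        return set(self.expires_at)


store = SoftStateStore()
local = ("Germany", "hasCapital", "Berlin")
generated = ("Germany", "rdf:type", "Country")  # hypothetical inferred triple

store.put(local, now=0, lifetime=100)     # long lifetime, few refreshes
store.put(generated, now=0, lifetime=10)  # short lifetime, needs refreshing

at_5 = store.triples(now=5)
at_20 = store.triples(now=20)
```

At time 5 both triples are present; at time 20 only the long-lived one remains, because the generated triple was never refreshed. A stale inferred triple therefore vanishes on its own once its source stops refreshing it.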
Node Departure
• Node substitution
• Correction of routing tables
• Replica duty
• Decreasing number of replicas
[Figure: nodes n1, n2, n3, n4 and n9]
Node Arrival
• More complicated
• Query receiving
• Task of replica nodes
• Time reduction
[Figure: nodes n1, n2, n3, n4, n6 and n9]
Load balancing
Major criticism of DHT-based RDF stores
• Many collisions are unavoidable
Example:
• The DHT stores many triples with the predicate rdf:type
• rdfs:subClassOf creates many triples with the predicate rdf:type
Overlay tree
• Built for discrete DHT positions, such as the one that stores triples with rdf:type
[Figure: Load balancing with remote triples database. Node1, Node2, Node3 and Node4 with Local Triples, Received Triples, Generated Triples, Local and Remote Triples databases, linked by references]
Replicated overlay tree
[Figure: overlay tree with Root, Rank1 and Rank2]
Query routing in overlay tree
[Figure: Root, Rank1 and Rank2 with Query and Result messages]
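A rough sketch of how an overlay tree might spread the triples of one overloaded DHT position and still answer queries: the root keeps triples up to a limit, hands the rest to child nodes, and a query entering at the root is forwarded down the tree with the partial results merged on the way back. The slides do not specify the splitting or forwarding policy, so the capacity limit, the round-robin delegation, the two-children fan-out and all names below are assumptions.

```python
import itertools


class OverlayNode:
    """One node of an overlay tree built for a single hot DHT position."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # hypothetical per-node triple limit
        self.triples = []
        self.children = []
        self.delegated = 0  # number of triples handed to children so far

    def insert(self, triple, spawn):
        # Keep the triple locally while there is room.
        if len(self.triples) < self.capacity:
            self.triples.append(triple)
            return
        # Otherwise delegate to a child, creating two children on demand.
        if not self.children:
            self.children = [spawn(), spawn()]
        child = self.children[self.delegated % len(self.children)]
        self.delegated += 1
        child.insert(triple, spawn)

    def query(self, matches):
        # Collect local matches, then merge the results of all subtrees.
        results = [t for t in self.triples if matches(t)]
        for child in self.children:
            results.extend(child.query(matches))
        return results


counter = itertools.count(1)


def spawn():
    """Create a fresh helper node (names are illustrative only)."""
    return OverlayNode(f"helper{next(counter)}", capacity=2)


root = OverlayNode("root", capacity=2)
for i in range(6):
    root.insert((f"item{i}", "rdf:type", "Thing"), spawn)

all_types = root.query(lambda t: t[1] == "rdf:type")
```

With a capacity of two, the six rdf:type triples end up as two on the root and two on each helper, yet a single query at the root still returns all six.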
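Handling RDFS rules in load balancing
Problem of RDFS rules
• When a node is overloaded, its triples are split across other nodes
• Example: the preconditions (a, rdfs:domain, x) and (u, a, v) can end up on different nodes, so the rule cannot fire locally
[Figure: Node1, Node2 and Node3 holding (a, rdfs:domain, x) and (u, a, v) triples]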
Handling RDFS rules in load balancing
Solution
• Copy the most common RDFS schema triples to each node in the overlay tree
[Figure: Node1, Node2, Node3 and Node4, each holding a copy of (a, rdfs:domain, x); (u, a, v) triples are spread across the nodes]
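The effect of copying schema triples can be shown with rule rdfs2 alone. In the sketch below each node applies the rule only to the triples it holds itself. The round-robin split, the function names and the instance triples (u1, a, v1) to (u3, a, v3) are assumptions that extend the slide's (u, a, v) example.

```python
RDF_TYPE = "rdf:type"
DOMAIN = "rdfs:domain"


def rdfs2_local(triples):
    """Apply rdfs2 using only the triples available on one node."""
    domains = [(s, o) for (s, p, o) in triples if p == DOMAIN]
    return {(u, RDF_TYPE, x)
            for (a, x) in domains
            for (u, p, v) in triples if p == a}


def split(instance_triples, node_count):
    """Spread instance triples over the nodes (round-robin assumption)."""
    nodes = [set() for _ in range(node_count)]
    for i, triple in enumerate(sorted(instance_triples)):
        nodes[i % node_count].add(triple)
    return nodes


schema = {("a", DOMAIN, "x")}
instances = {("u1", "a", "v1"), ("u2", "a", "v2"), ("u3", "a", "v3")}

# Without replication the schema triple lives on the first node only.
nodes = split(instances, 3)
nodes[0] |= schema
without_copy = set().union(*(rdfs2_local(n) for n in nodes))

# With replication every node holds its own copy of the schema triple.
nodes = [n | schema for n in split(instances, 3)]
with_copy = set().union(*(rdfs2_local(n) for n in nodes))
```

Without the copy only (u1, rdf:type, x) is derived, because the other two nodes never see the rdfs:domain triple. With the copy on every node, all three rdf:type triples are generated locally.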
Conclusion
• P2P-based distributed databases offer better scalability and source integration
• The real power of RDF stems from the possibility of deriving new data from explicit knowledge
• The overlay tree is the solution to the overloading problem
References
• http://www.videolectures.net
• http://cone.informatik.uni-freiburg.de
• http://www.w3schools.com
• http://www.w3.org/TR/rdf-schema/
• http://peersim.sourceforge.net/
• http://infolab.stanford.edu
• http://www.edutella.org/edutella.shtml
• Battre, Heine, Kao: Top k RDF query evaluation in p2p
Thanks for your Attention