P2P Databases. Overview 0. Data objects, pointers (URLs), and attributes 1. Freeform versus...
Date posted: 21-Dec-2015
P2P Databases
P2P Today
napster
gnutella
morpheus
kazaa
bearshare
seti@home
folding@home
ebay
limewire
icq
fiorana
mojo nation
jxta
united devices
open cola
uddi
process tree
can
chord
ocean store
farsite
pastry
tapestry
?grove
netmeeting
freenet
popular power
aim
jabber
bittorrent
edonkey
Object representation and storage
Attributes: Name, Artist, Album, Genre
Objects
Pointer to object
P2P vs. Distributed DBMS
Traditional DDBMS Issues:
• Transactions
• Distributed Query Optimization
• Interoperation of heterogeneous data sources
• Reliability/failure of nodes
Complex features do not scale
P2P vs. Distributed DBMS
Example application: file-sharing
• Simple data model and query language
  – No complex query optimization
  – Easy interoperation
• No guarantee on quality of results
  – Individual site availability unimportant
• Local updates
  – No transactions
  – Network partitions OK
Simple: amenable to a large-scale network of PCs
Example: file sharing
• Challenge #1: Performance
  – Asking everyone is expensive!
  – If I am smart, I only need to ask one peer
  – How can I be smart?
[Figure: a peer asking other peers "File X?"]
Search in P2P
• System can control:
  – Connections made by users/topology
  – Data placement
  – Query type
• Tight control: “Structured”
  – Efficient, comprehensive
• Loose control: “Unstructured”
  – Inefficient, not comprehensive, simple, expressive
  – Used in real life
Both are useful to study
Centralized
• Napster model
• Benefits:
  – Efficient search
  – Limited bandwidth usage
  – No per-node state
• Drawbacks:
  – Central point of failure
  – Limited scale
[Figure: peers Bob, Alice, Jane, and Judy]
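A minimal sketch of the centralized model, assuming a single in-memory index that maps attribute values to pointers. The `Pointer` and `CentralIndex` names, the `p2p://` URLs, and the attribute values are hypothetical and chosen only for illustration; they do not describe Napster's actual protocol.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    """A pointer to an object held by some peer (hypothetical layout)."""
    peer: str  # name of the peer that stores the object
    url: str   # where the object can be fetched from that peer


class CentralIndex:
    """Single server mapping attribute values to object pointers."""

    def __init__(self):
        self._index = defaultdict(set)

    def publish(self, pointer, **attributes):
        # Peers upload attributes only; the object itself stays on the peer.
        for name, value in attributes.items():
            self._index[(name, value.lower())].add(pointer)

    def search(self, name, value):
        # One request to the server answers the query; peers keep no routing state.
        hits = self._index.get((name, value.lower()), set())
        return sorted(hits, key=lambda p: p.peer)


index = CentralIndex()
index.publish(Pointer("Alice", "p2p://alice/song1"), artist="Some Artist", genre="Jazz")
index.publish(Pointer("Bob", "p2p://bob/song2"), artist="Other Artist", genre="Jazz")
jazz = index.search("genre", "jazz")
```

Every query and every publish goes through the one `CentralIndex` instance, which is exactly where the central point of failure and the scale limit come from.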
http://www.snocap.com/
Unstructured – Query Flooding
[Figure legend: forward query, processed query, query source, found result, forward response]
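Query flooding can be sketched as a breadth-first traversal of the overlay. The sketch below is an illustration under assumptions: the five-peer `overlay`, the optional `ttl` hop limit, and the choice to count a message for every forward (including forwards to peers that have already seen the query) are not taken from the slides.

```python
from collections import deque


def flood(neighbors, source, has_result, ttl=None):
    """Breadth-first flood of a query from `source`.

    neighbors: dict mapping each peer to a list of its neighbors.
    has_result: set of peers that hold a matching object.
    ttl: optional hop limit (an assumption; the slide does not mention one).
    Returns (peers with results, number of query messages forwarded).
    """
    seen = {source}
    queue = deque([(source, 0)])
    found, messages = [], 0
    while queue:
        peer, hops = queue.popleft()
        if peer in has_result:
            found.append(peer)
        if ttl is not None and hops >= ttl:
            continue  # hop limit reached, do not forward further
        for nxt in neighbors[peer]:
            messages += 1        # every forward costs a message
            if nxt not in seen:  # duplicate queries are dropped on arrival
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return found, messages


overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
found_all, msgs_all = flood(overlay, "A", has_result={"E"})
found_ttl, msgs_ttl = flood(overlay, "A", has_result={"E"}, ttl=2)
```

Without a hop limit every peer processes the query; with `ttl=2` the query never reaches peer E, so the result is missed even though it exists.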
Problems with unstructured
• Inefficient
  – Query messages are flooded
  – Even if routing is intelligent, worst case load is still O(n), where n is # nodes in system
• Not comprehensive
  – If I do not get a result for my query, is it because none exists?
• (Of course, many optimizations are possible…)
Structured systems address these problems
Distributed Hash Tables (DHTs)
• Model:
  – Key/Object pair, the key is hashed to get an ID
  – Example:
    • Objects are files
    • The key is the content of the file
    • The ID is the hash of the file contents
• Single operation: Lookup(ID)
  – Input: integer ID
  – Output: the object with the corresponding ID
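The model above can be sketched in a few lines. This is an illustration only: SHA-1, the 16-bit ID width, and the `ToyTable` class (a single dictionary standing in for the whole network) are assumptions, not details from the slides.

```python
import hashlib

M = 16  # number of ID bits; an arbitrary choice for illustration


def object_id(contents: bytes, m: int = M) -> int:
    """Hash file contents to an m-bit integer ID (SHA-1 is an assumption)."""
    digest = hashlib.sha1(contents).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


class ToyTable:
    """Stand-in for the whole DHT: one dict, no distribution across nodes."""

    def __init__(self):
        self._store = {}

    def insert(self, contents: bytes) -> int:
        oid = object_id(contents)
        self._store[oid] = contents
        return oid

    def lookup(self, oid: int):
        # The single DHT operation: integer ID in, object out.
        return self._store.get(oid)


table = ToyTable()
oid = table.insert(b"hello")
```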
Identifiers
• IDs are m-bit integers
• Nodes are also assigned IDs
  – Commonly assigned by hashing a node’s IP address, although this has many problems
• An object is stored on the node with the smallest ID greater than or equal to the object’s ID
  – This node is called the successor of the object’s ID
  – IDs are arranged on a circle, so 0 follows 2^m - 1 (i.e. 0 > 2^m - 1)
Data Placement
m = 3
Nodes: 0, 1, 3
Data: 1, 2, 6
[Figure: identifier circle 0 to 7 showing the node on which each data item is placed]
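The placement for this example can be computed with a small successor function. The sketch assumes, as in Chord, that an object whose ID equals a node ID is stored on that node; the function name and the sorted-list representation are illustrative choices.

```python
from bisect import bisect_left


def successor(node_ids, object_id):
    """Return the node responsible for object_id.

    node_ids must be sorted. The responsible node is the first node at or
    after object_id on the circle, wrapping around to the smallest node ID.
    """
    i = bisect_left(node_ids, object_id)
    return node_ids[i % len(node_ids)]


nodes = [0, 1, 3]  # m = 3, so IDs live in 0..7
placement = {obj: successor(nodes, obj) for obj in (1, 2, 6)}
```

Here object 6 wraps around the circle and lands on node 0, because no node has an ID of 6 or 7.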
Connections
“Finger pointers”
Distance:
• 2^0
• 2^1
• …
• 2^(m-1)
[Figure: identifier circle 0 to 7 with finger pointers]
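A sketch of how the finger pointers could be computed from those distances, assuming each finger points to the successor of (node + 2^k) on the circle. The three-node ring with m = 3 is an illustrative input, and the helper names are hypothetical.

```python
from bisect import bisect_left


def successor(node_ids, ident):
    """First node at or after ident on the circle (node_ids sorted)."""
    i = bisect_left(node_ids, ident)
    return node_ids[i % len(node_ids)]


def finger_table(node, node_ids, m):
    """Fingers of `node`: successor(node + 2^k) for k = 0 .. m-1."""
    size = 2 ** m
    return [successor(node_ids, (node + 2 ** k) % size) for k in range(m)]


nodes = [0, 1, 3]
fingers_of_0 = finger_table(0, nodes, m=3)
fingers_of_3 = finger_table(3, nodes, m=3)
```

Each node keeps only m entries, one per power-of-two distance, regardless of how many nodes are in the network.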
Query
• Lookup(objectID)
  – objectID is typically the ID of the object you are looking for, but not necessarily
• Approach:
  – Find the predecessor of the object
    • I.e. the node with the largest ID that is smaller than the object ID
  – Return the successor of the predecessor
Query Example
• Say node 0 wants to find the object with ID = 7
• For simplicity, we will assume a node exists at every ID in the space
[Figure: identifier circle with a node at every ID from 0 to 7]
Node 0: Lookup(7)
Node 0: FindPred(7)
Node 4: FindPred(7)
Node 6: FindPred(7)
Node 6 is predecessor
Return successor node 7
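The walk in this example can be reproduced with a short routing sketch. It relies on the example's simplifying assumption that a node exists at every ID, so finger k of node n is just (n + 2^k) mod 2^m. The function names and the greedy "farthest finger that does not pass the target" rule are a sketch of Chord-style routing, not code from the slides.

```python
M = 3
SIZE = 2 ** M


def between(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def find_pred(start, target):
    """Route from `start` to the predecessor of `target`.

    Assumes a node at every ID, so the successor of node n is n + 1.
    Returns (predecessor, list of nodes visited).
    """
    node, path = start, [start]
    while not between(target, node, (node + 1) % SIZE):
        # Take the farthest finger that lands strictly before the target.
        for k in reversed(range(M)):
            finger = (node + 2 ** k) % SIZE
            if between(finger, node, target) and finger != target:
                node = finger
                break
        path.append(node)
    return node, path


def lookup(start, target):
    """Return (successor of the predecessor, nodes visited on the way)."""
    pred, path = find_pred(start, target)
    return (pred + 1) % SIZE, path


owner, visited = lookup(0, 7)
```

The visited nodes are 0, 4, and 6, matching the three FindPred steps in the example, and the distance to the target is at least halved on every hop.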
Query characteristics
• With high probability, a query can be answered by contacting O(log N) nodes
  – N total nodes in the network
  Efficient!
• Also notice: if an object with the ID exists in the network, it will be found
  Comprehensive!
• State is also O(log N) in size
Query characteristics
• Note that finger pointers are not required for correct operation
  – Only successor pointers are needed
  – But then cost of query increases
    • O(N) in worst case
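To illustrate the O(N) worst case, the sketch below counts hops when a query follows successor pointers only. It assumes a node at every ID on the ring; the function name and the ring sizes are illustrative.

```python
def hops_successor_only(start, target, size):
    """Hops needed to reach the predecessor of `target` by following
    successor pointers only, assuming a node at every ID on a ring of
    `size` IDs."""
    hops, node = 0, start
    while (node + 1) % size != target:
        node = (node + 1) % size
        hops += 1
    return hops


small = hops_successor_only(0, 7, 8)     # the 8-ID ring used in the query example
worst = hops_successor_only(0, 0, 1024)  # target just behind the start node
```

On the 8-ID ring the successor-only walk from node 0 to the predecessor of ID 7 takes 6 hops, and on a 1024-ID ring the worst case is 1023 hops, which grows linearly with the number of nodes.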
Advantages of Structured?
• Scalability/Efficiency – load grows with O(log N)
• Comprehensiveness
Disadvantages? (cont)
• Availability of Data
  – If a node dies suddenly, what happens to the data it was storing?
  – MUST replicate data across multiple nodes
• Query Language
  – How can we express keyword queries efficiently?
  – Many useful applications require different languages
Magnolia
Current approach: Hash each keyword separately and store pointers at h(keyword)
[Figure: the title “Seven Innovation Myths” with h(title) = 1100100101; its keywords Seven, Innovation, and Myths are hashed as h(some), h(innovation), and h(myths); the keyword “Innovation” is shown as an example]
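A minimal sketch of the per-keyword approach, assuming a dictionary stands in for the DHT and SHA-1 truncated to 10 bits stands in for h. The hash function, the whitespace tokenization, and the `publish` and `search` names are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

M = 10  # ID bits; the ID shown in the figure has 10 bits


def h(text: str, m: int = M) -> int:
    """m-bit hash of a string (SHA-1 is an assumption, not from the slides)."""
    digest = hashlib.sha1(text.lower().encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


# Stand-in for the DHT: ID -> set of pointers stored under that ID.
dht = defaultdict(set)


def publish(title: str) -> int:
    """Store a pointer to the title under the hash of each of its keywords."""
    title_id = h(title)
    for keyword in title.split():
        dht[h(keyword)].add(title_id)  # pointer stored at h(keyword)
    return title_id


def search(keyword: str):
    """Return the pointers stored at h(keyword)."""
    return dht.get(h(keyword), set())


tid = publish("Seven Innovation Myths")
hits = search("Innovation")
```

All pointers for one keyword end up at a single ID, so the node responsible for a popular keyword receives every insert and every query for it.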
Resulting Distribution
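Prefix hashing
[Figure: an m-bit ID whose first m’ bits form the prefix; the keyword “Innovation” is hashed to the prefix 100]
hP(innovation), where hP is an m’-bit hash function
Partitions the network into 2^m’ separate sibling groups of ~ n/2^m’ nodes each
n = number of nodes, m’ = partitioning factor
For m’ = 12 and n = 1 million, ~ 256 nodes will share the same prefix (taking 1 million ≈ 2^20, 2^20/2^12 = 256)
Assumption: h is uniformly distributed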
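The group arithmetic can be checked with a short sketch. SHA-1 as the underlying hash and the function names are assumptions; the only facts taken from the slide are that a keyword maps to an m’-bit prefix and that nodes sharing that prefix form one sibling group.

```python
import hashlib


def prefix_hash(keyword: str, m_prime: int) -> int:
    """m'-bit hash of a keyword, used as the sibling group ID."""
    digest = hashlib.sha1(keyword.lower().encode("utf-8")).digest()
    # Keep only the m' most significant bits of the digest.
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - m_prime)


def group_of(node_id: int, m: int, m_prime: int) -> int:
    """Sibling group of a node: the m' most significant bits of its m-bit ID."""
    return node_id >> (m - m_prime)


def expected_group_size(n: int, m_prime: int) -> float:
    """Average nodes per group when IDs are uniformly distributed."""
    return n / 2 ** m_prime


group_id = prefix_hash("innovation", 12)
size_pow2 = expected_group_size(2 ** 20, 12)      # n taken as 2^20
size_exact = expected_group_size(1_000_000, 12)   # n taken as exactly 10^6
```

With n taken as exactly 1,000,000 the average is about 244 nodes per group, so the slide's figure of 256 corresponds to rounding n to 2^20.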
Balancing
[Figure: the keyword “Innovation” mapped to the sibling group with ID = 100]
Balanced over the sibling group
All siblings in a group share the same prefix
[Figure: insert and lookup flows]
Insert: Keyword, hP, SiblingGroup ID; locate a sibling node via SIFT; random sibling
Lookup: Keyword; O(1); group broadcast or multicast; replies
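One way to read the insert and lookup flows is sketched below: an insert stores the pointer on one randomly chosen sibling, and a lookup asks every sibling in the group and merges the replies. This is an interpretation of the figure labels, not the authors' implementation; the `SiblingGroup` class is hypothetical, and the step of locating a sibling (labeled SIFT on the slide) is not modeled.

```python
import random


class SiblingGroup:
    """All nodes sharing one prefix; each node keeps its own keyword store."""

    def __init__(self, group_id, node_ids):
        self.group_id = group_id
        self.stores = {node: {} for node in node_ids}

    def insert(self, keyword, pointer, rng=random):
        # Store on one randomly chosen sibling to spread load over the group.
        node = rng.choice(sorted(self.stores))
        self.stores[node].setdefault(keyword, set()).add(pointer)
        return node

    def lookup(self, keyword):
        # Ask every sibling (broadcast) and merge the replies.
        replies = set()
        for store in self.stores.values():
            replies |= store.get(keyword, set())
        return replies


group = SiblingGroup(0b100, [64, 65, 66, 67])
rng = random.Random(0)  # seeded only to make the example repeatable
for i in range(20):
    group.insert("innovation", f"obj{i}", rng)
results = group.lookup("innovation")
```

No single sibling holds every pointer for a popular keyword, at the cost of contacting the whole group on lookup.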
Advantages
• Good Balancing Properties
• Low Traffic Load on nodes for popular queries
• Quick Lookup
• Popularity Ranking of Objects
• Distributed Replication for resilience
Implementing Magnolia
• Developed on top of a Chord clone written in Python
  – If you’re going to write a peer-to-peer app, why not leverage existing modules and libraries?
• Challenge: How do we implement group-based stores and queries without requiring additional network maintenance?
Chord’s Finger Table
• A Chord node maintains a finger table of M IPs pointing to nodes ahead of it in the ring.
  – A pointer at index i is the successor of node id + 2^(i-1). This lets us reach any node in the network in O(M) hops
• We use the M’ most significant bits in a node’s id to indicate its group. We want to reach any group in O(M’) hops.
  – Do we need another table?
  – Nope. The last M’ entries in our finger table provide this.
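A short sketch of why the last M’ finger entries are enough to move between groups: their distances are multiples of 2^(M - M’), so adding them changes only the group bits of the target ID. The values M = 8 and M’ = 3 and the example node ID are illustrative assumptions. The sketch works with target IDs; in a real ring each finger is the successor of its target and may land slightly past it.

```python
M = 8        # bits per node ID (illustrative)
M_PRIME = 3  # most significant bits that name the group (illustrative)


def group(node_id):
    """Group of an ID: its M' most significant bits."""
    return node_id >> (M - M_PRIME)


def offset_in_group(node_id):
    """Position of an ID inside its group: the remaining low bits."""
    return node_id & ((1 << (M - M_PRIME)) - 1)


def finger_targets(node_id):
    """Target IDs node_id + 2^(i-1) for i = 1 .. M, modulo 2^M."""
    return [(node_id + 2 ** (i - 1)) % 2 ** M for i in range(1, M + 1)]


node = 0b010_10110                # group 010, offset 10110
targets = finger_targets(node)
cross_group = targets[-M_PRIME:]  # the last M' entries
groups_reached = [group(t) for t in cross_group]
```

From group 2 the last three entries point at groups 3, 4, and 6 (distances of 1, 2, and 4 groups), which gives the same doubling structure over groups that the full table gives over nodes.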
Talking to Siblings
• How do we propagate queries through the group?
• Naïve solution: send to our predecessor and successor.
• A better solution: We can send a query throughout the group by treating the sibling group as a tree.
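Sibling Tree
N/N’ = 16; M/M’ = 4
[Figure: the 16 sibling positions 0 to 15 arranged as a tree rooted at 0; each edge is labeled with the parent plus a finger distance]
• 0+1 = 1, 0+2^3 = 8
• 1+1 = 2, 1+2^2 = 5
• 2+1 = 3, 2+2^1 = 4
• 5+1 = 6, 5+2^1 = 7
• 8+1 = 9, 8+2^2 = 12
• 9+1 = 10, 9+2^1 = 11
• 12+1 = 13, 12+2^1 = 14
• 14+2^0 = 15
Every edge can be found in the finger table!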
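The tree in the figure can be regenerated with a simple recursive rule: a node responsible for a range hands the far part of the range to the child at the largest power-of-two distance that still fits, and the near part to its immediate successor. This rule is a reconstruction that happens to reproduce the figure's edges for 16 siblings; the slides do not state the authors' actual construction, and `build_tree` is a hypothetical name.

```python
def build_tree(start, end, children=None):
    """Spanning tree over sibling positions start..end, rooted at start.

    Every edge is parent + 2^k, so it corresponds to a finger distance.
    Returns a dict mapping each position to the list of its children.
    """
    if children is None:
        children = {}
    children[start] = []
    remaining = end - start  # positions still to cover below start
    if remaining == 0:
        return children
    step = 1 << (remaining.bit_length() - 1)  # largest power of two <= remaining
    if step > 1:
        # Near child covers start+1 .. start+step-1.
        children[start].append(start + 1)
        build_tree(start + 1, start + step - 1, children)
    # Far child covers start+step .. end.
    children[start].append(start + step)
    build_tree(start + step, end, children)
    return children


tree = build_tree(0, 15)
edges_are_fingers = all(
    (child - parent) & (child - parent - 1) == 0  # distance is a power of two
    for parent, kids in tree.items()
    for child in kids
)
```

Because every parent-to-child distance is a power of two, a query can be pushed down the tree using only pointers the node already keeps.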
Sibling Tree Problems
• Problems:
  – Not every possible node will exist
  – Not every node will have results to report
  – The query maker needs to know when the search is done
• But we’re okay!
  – Nodes can determine if a child sub-tree is dead
  – Even if a child node in our sibling table is of a higher ID than expected
    • its sub-tree contains all existing descendants of the expected id
    • we can predict when a child is in a sibling our ancestor’s tree
Bigger Problems
• What if a pointer in our finger table fails?
  – We either have to find the successor to its id or fail to query the sub-tree
• What if the lowest ID node isn’t the root of our tree?
  – Some of our edges won’t be in our finger table
Popularity queries
Yulania, Demo
BitTorrent
SplitStream