Lecture 2 Distributed Hash Table
Information search in P2P
Suppose we have a P2P system with N nodes.
A file "F" is stored on one node. How can an arbitrary node find "F" in the system?
P2P: centralized index
original “Napster” design
1) when a peer connects, it informs the central server of its IP address and content
2) Alice queries for “Hey Jude”
3) Alice requests file from Bob
(Figure: peers Alice and Bob connected to the centralized directory server; Alice's query and the file request follow the numbered steps above.)
P2P: problems with centralized directory
single point of failure
performance bottleneck
copyright infringement: "target" of lawsuit is obvious
file transfer is decentralized, but locating content is highly centralized
Query flooding
fully distributed, no central server
used by Gnutella
each peer indexes the files it makes available for sharing (and no other files)
overlay network: graph with an edge between peers X and Y if there is a TCP connection between them
all active peers and edges form the overlay network
edge: virtual (not physical) link
a given peer is typically connected to < 10 overlay neighbors
Query flooding
(Figure: a Query floods across overlay edges; QueryHits return along the reverse path.)
Query message sent over existing TCP connections
peers forward Query message
QueryHit sent over reverse path
File transfer: HTTP
Scalability: limited-scope flooding (see the sketch below)
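To make the flooding mechanics concrete, here is a minimal sketch; the Peer class and the qid/ttl names are illustrative stand-ins for real Gnutella messages over TCP, not the actual protocol.

```python
# A minimal sketch of limited-scope query flooding, with in-process method
# calls standing in for Gnutella messages (names here are illustrative).
class Peer:
    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbors = []          # overlay edges = existing TCP connections
        self.seen = set()            # query IDs already handled (avoids loops)

    def query(self, qid, filename, ttl, path=()):
        if qid in self.seen:
            return                   # already forwarded this query
        self.seen.add(qid)
        if filename in self.files:
            # a QueryHit would travel back along the reverse path
            print(f"QueryHit at {self.name}, reverse path {list(path)}")
        if ttl > 0:                  # limited scope: stop when TTL expires
            for n in self.neighbors:
                n.query(qid, filename, ttl - 1, path + (self.name,))

# tiny overlay: alice - bob - carol; carol has the file
alice, bob, carol = Peer("alice"), Peer("bob"), Peer("carol", ["heyjude.mp3"])
alice.neighbors, bob.neighbors = [bob], [alice, carol]
alice.query(qid=1, filename="heyjude.mp3", ttl=2)
```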
Gnutella: Peer joining
1. joining peer Alice must find another peer already in the Gnutella network: use a list of candidate peers
2. Alice sequentially attempts TCP connections with candidate peers until a connection is set up with Bob
3. Flooding: Alice sends a Ping message to Bob; Bob forwards the Ping to his overlay neighbors (who then forward it to their neighbors...); peers receiving the Ping respond to Alice with a Pong message
4. Alice receives many Pong messages and can then set up additional TCP connections
Hierarchical Overlay
Hybrid of the centralized-index and query-flooding approaches
each peer is either a super node or assigned to a super node
TCP connection between a peer and its super node
TCP connections between some pairs of super nodes
super node tracks the content of its children
(Figure: ordinary peers, group-leader peers, and neighboring relationships in the overlay network.)
Distributed Hash Table (DHT)
DHT = distributed P2P database
database has (key, value) pairs, e.g.
  key: social security number; value: human name
  key: content type; value: IP address
peers query the database with a key; the database returns values that match the key
peers can also insert (key, value) pairs into the database
finding "needles" requires that the P2P system be structured
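As a toy illustration of this key-value service, the sketch below spreads a Python dict across a few "nodes" by hashing the key; the node count and helper names are invented for the example.

```python
# A toy (key, value) store spread over several nodes by hashing the key.
import hashlib

NODES = 4
tables = [dict() for _ in range(NODES)]   # one hash table per node

def node_for(key):                        # which node stores this key?
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % NODES

def put(key, value):
    tables[node_for(key)][key] = value

def get(key):
    return tables[node_for(key)].get(key)

put("content", "10.0.0.7")
print(get("content"))                     # -> 10.0.0.7
```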
The Principle Of Distributed Hash Tables
A dynamic distribution of a hash table onto a set of cooperating nodes

Key   Value
1     Frozen
9     Tangled
11    Mulan
12    Lion King
21    Cinderella
22    Doraemon

• Basic service: lookup operation
• Key resolution from any node
• Each node has a routing table
  • Pointers to some other nodes
  • Typically, a constant or a logarithmic number of pointers (why?)

(Figure: nodes A, B, C, and D partition the table; node D issues lookup(9).)
DHT Desirable Properties
1. Keys mapped evenly to all nodes in the network
2. Each node maintains information about only a few other nodes
3. A key can be found efficiently by querying the system
4. Node arrival/departures only affect a few nodes
Chord Identifiers
Assign an integer identifier to each peer in the range [0, 2^n − 1]; each identifier can be represented by n bits.
Require each key to be an integer in the same range.
To get integer keys, hash the original key, e.g., key = h("Led Zeppelin IV"). This is why the database is called a distributed "hash" table.
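A minimal sketch of this hashing step, assuming SHA-1 (as Chord uses) truncated to n bits; the choice n = 16 is arbitrary.

```python
# Sketch: map an arbitrary key (file name, node address, ...) to an n-bit
# integer identifier using SHA-1; n = 16 is an illustrative choice.
import hashlib

def chord_id(key: str, n: int = 16) -> int:
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** n)   # in [0, 2**n - 1]

print(chord_id("Led Zeppelin IV"))
```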
Each key must be stored in some node.
Central issue: assigning (key, value) pairs to peers.
Rule: assign each key to the peer whose ID is closest to the key.
Convention in this lecture: "closest" is the immediate successor of the key (or the peer whose ID equals the key).
Example: 4 bits; peers 1, 3, 4, 5, 8, 10, 12, 14.
  key = 13: successor peer = 14
  key = 15: successor peer = 1
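A sketch of this successor rule on the slide's 4-bit example; the successor() helper is illustrative.

```python
# The assignment rule: a key is stored at its immediate successor on the
# ring, i.e. the first peer ID >= key, wrapping around to the smallest ID.
import bisect

def successor(peers, key):
    peers = sorted(peers)
    i = bisect.bisect_left(peers, key)
    return peers[i % len(peers)]      # i == len(peers) wraps to peers[0]

peers = [1, 3, 4, 5, 8, 10, 12, 14]
print(successor(peers, 13))           # -> 14
print(successor(peers, 15))           # -> 1 (wrap around)
```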
Chord [MIT]
consistent hashing (SHA-1) assigns each node and object an m-bit ID
IDs are ordered on an ID circle ranging from 0 to 2^m − 1
new nodes assume slots on the ID circle according to their ID
key k is assigned to the first node whose ID ≥ k, i.e., successor(k)
Consistent Hashing - Successor Nodes
(Figure: 3-bit identifier circle 0–7 with identifier nodes 0, 1, 3 and keys 1, 2, 6 marked on the circle.)
successor(1) = 1
successor(2) = 3
successor(6) = 0
Consistent Hashing – Join and Departure
When a node n joins the network, certain keys previously assigned to n’s successor now become assigned to n.
When node n leaves the network, all of its assigned keys are reassigned to n’s successor.
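A small sketch of how few keys move on a join, using the 3-bit ring with nodes {0, 1, 3} and keys {1, 2, 6} from the surrounding slides; only the key claimed by the new node changes hands.

```python
# Sketch of key movement when node 6 joins the ring with nodes {0, 1, 3}.
import bisect

def successor(nodes, k):
    nodes = sorted(nodes)
    i = bisect.bisect_left(nodes, k)
    return nodes[i % len(nodes)]

keys = [1, 2, 6]
before = {k: successor([0, 1, 3], k) for k in keys}
after  = {k: successor([0, 1, 3, 6], k) for k in keys}
print(before)   # {1: 1, 2: 3, 6: 0}
print(after)    # {1: 1, 2: 3, 6: 6} -- only key 6 changed hands
```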
Consistent Hashing – Node Join
(Figure: a new node joins the identifier circle 0–7; the keys between its predecessor and itself move to it from its successor.)
Consistent Hashing – Node Departure
(Figure: a node leaves the identifier circle 0–7; all of its keys are reassigned to its successor.)
Consistent Hashing: another example
For n = 6, the number of identifiers is 2^6 = 64. The following DHT ring has 10 nodes and stores 5 keys. The successor of key 10 is node 14.
(Figure: 6-bit identifier circle with 10 nodes and 5 keys.)
Circular DHT (1)
Each peer is aware only of its immediate successor and predecessor.
Circular DHT (2)
(Figure: peers 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111 on a ring; the query "Who's responsible for key 1110?" is forwarded peer to peer until 1111 answers "I am".)
Define "closest" as the closest successor.
O(N) messages on average to resolve a query when there are N peers.
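A sketch of this O(N) walk on the ring above, assuming each peer simply forwards the query to its immediate successor; the helper names are illustrative.

```python
# With only successor pointers, the query for key 1110 walks node to node
# around the ring until it reaches peer 1111: O(N) messages on average.
import bisect

def successor(peers, key):
    i = bisect.bisect_left(peers, key)
    return peers[i % len(peers)]

def resolve(peers, start, key):
    peers = sorted(peers)
    target = successor(peers, key)
    i, hops = peers.index(start), 0
    while peers[i] != target:
        i = (i + 1) % len(peers)     # forward the query to the successor
        hops += 1
    return target, hops

peers = [0b0001, 0b0011, 0b0100, 0b0101, 0b1000, 0b1010, 0b1100, 0b1111]
print(resolve(peers, 0b0001, 0b1110))   # -> (15, 7): peer 1111 answers
```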
Circular DHT with Shortcuts
Each peer keeps track of the IP addresses of its predecessor, its successor, and some shortcut peers.
In the example, the query is resolved in 3 messages instead of 6.
Shortcuts can be designed so that each peer has O(log N) neighbors and each query takes O(log N) messages.
(Figure: the same ring of peers 0001–1111, now with shortcut edges; "Who's responsible for key 1110?")
Scalable Key Location – Finger Tables
(Figure: 3-bit identifier circle with nodes 0, 1, 3 and keys 1, 2, 6.)
Finger i of node n points to successor((n + 2^i) mod 2^m), for i = 0, ..., m − 1.

node 0 (stores key 6): start 1, 2, 4 (0+2^0, 0+2^1, 0+2^2) → succ. 1, 3, 0
node 1 (stores key 1): start 2, 3, 5 (1+2^0, 1+2^1, 1+2^2) → succ. 3, 3, 0
node 3 (stores key 2): start 4, 5, 7 (3+2^0, 3+2^1, 3+2^2) → succ. 0, 0, 0
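A sketch that reproduces the finger tables in the figure, assuming the start formula (n + 2^i) mod 2^m with i = 0, ..., m − 1.

```python
# Finger-table construction: finger i of node n points to
# successor((n + 2**i) mod 2**M); here M = 3 as in the figure.
import bisect

M = 3

def successor(nodes, k):
    nodes = sorted(nodes)
    i = bisect.bisect_left(nodes, k % (2 ** M))
    return nodes[i % len(nodes)]

def finger_table(n, nodes):
    return [successor(nodes, (n + 2 ** i) % (2 ** M)) for i in range(M)]

nodes = [0, 1, 3]
print(finger_table(0, nodes))   # -> [1, 3, 0]
print(finger_table(1, nodes))   # -> [3, 3, 0]
print(finger_table(3, nodes))   # -> [0, 0, 0]
```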
Chord key location
Look up in the finger table the furthest node that precedes the key and forward the query to it; each hop roughly halves the remaining distance.
→ O(log N) hops
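A sketch of this lookup rule on the 3-bit example, with the successor pointers and finger tables hard-coded from the figure above; the interval helpers are illustrative, not Chord's actual RPC interface.

```python
# At each hop the query jumps to the furthest finger preceding the key.
SUCC = {0: 1, 1: 3, 3: 0}
FINGERS = {0: [1, 3, 0], 1: [3, 3, 0], 3: [0, 0, 0]}

def in_half_open(x, a, b):      # ring interval (a, b] on the 2**3 circle
    return a < x <= b if a < b else (x > a or x <= b)

def in_open(x, a, b):           # ring interval (a, b)
    return a < x < b if a < b else (x > a or x < b)

def find_successor(n, key, hops=0):
    if in_half_open(key, n, SUCC[n]):
        return SUCC[n], hops            # n's successor stores the key
    # otherwise forward to the furthest finger that precedes the key
    nxt = next((f for f in reversed(FINGERS[n]) if in_open(f, n, key)), SUCC[n])
    return find_successor(nxt, key, hops + 1)

print(find_successor(1, 6))     # -> (0, 1): key 6 is stored at node 0
```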
Peer Churn
• To handle peer churn, require each peer to know the IP addresses of its two successors
• Each peer periodically pings its two successors to check that they are still alive
• This is only a limited solution, handling a single join or a single failure at a time
Node Joins and Stabilizations
The most important pointer is the successor pointer.
If the successor pointer is kept up to date, which is sufficient to guarantee correctness of lookups, then the finger table can always be verified.
Each node runs a "stabilization" protocol periodically in the background to update its successor pointer and finger table.
Node Joins and Stabilizations
The "stabilization" protocol contains 6 functions: create(), join(), stabilize(), notify(), fix_fingers(), check_predecessor().
When node n first starts, it calls n.join(n'), where n' is any known Chord node.
The join() function asks n' to find the immediate successor of n.
Node Joins – stabilize()
Each time node n runs stabilize(), it asks its successor for that node's predecessor p, and decides whether p should be n's successor instead.
stabilize() also notifies node n's successor of n's existence, giving the successor the chance to change its predecessor to n.
The successor does this only if it knows of no closer predecessor than n.
Node Joins – Join and Stabilization
(Figure: before the join, succ(np) = ns and pred(ns) = np.)
n joins:
  predecessor = nil
  n acquires ns as its successor via some n'
n runs stabilize():
  n notifies ns that n may be its new predecessor
  ns acquires n as its predecessor
np runs stabilize():
  np asks ns for its predecessor (now n)
  np acquires n as its successor
  np notifies n
  n acquires np as its predecessor
All predecessor and successor pointers are now correct.
Fingers still need to be fixed, but old fingers will still work (see the sketch below).
(Figure: after stabilization, pred(ns) = n and succ(np) = n.)
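A sketch of this join/stabilize/notify sequence with np = 1, n = 3, ns = 5 on an 8-position ring; the Node class and in_open() helper are illustrative, not Chord's actual RPC interface.

```python
def in_open(x, a, b):                     # ring interval (a, b)
    return a < x < b if a < b else (x > a or x < b)

class Node:
    def __init__(self, nid):
        self.id, self.successor, self.predecessor = nid, self, None

    def stabilize(self):
        """Ask successor for its predecessor; adopt it if it lies between us."""
        p = self.successor.predecessor
        if p is not None and in_open(p.id, self.id, self.successor.id):
            self.successor = p
        self.successor.notify(self)       # "I might be your predecessor"

    def notify(self, cand):
        """Accept cand as predecessor only if it is closer than the current one."""
        if self.predecessor is None or in_open(cand.id, self.predecessor.id, self.id):
            self.predecessor = cand

np_, ns = Node(1), Node(5)                # existing two-node ring
np_.successor, ns.successor = ns, np_
np_.predecessor, ns.predecessor = ns, np_
n = Node(3); n.successor = ns             # join(): n learned ns via some n'
n.stabilize()                             # ns adopts n as its predecessor
np_.stabilize()                           # np_ sees n, adopts it as successor;
                                          # its notify makes n adopt np_
print(np_.successor.id, n.successor.id, ns.predecessor.id)   # -> 3 5 3
```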
Node Failures
A key step in failure recovery is maintaining correct successor pointers.
To help achieve this, each node maintains a successor list of its r nearest successors on the ring.
Successor lists are stabilized as follows: node n reconciles its list with its successor s by copying s's successor list, removing its last entry, and prepending s to it.
If node n notices that its successor has failed, it replaces it with the first live entry in its successor list and reconciles its successor list with the new successor.
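A sketch of the successor-list rules just described, with r = 3; reconcile() and repair() are invented names.

```python
def reconcile(succ, succ_list):
    """Copy the successor's list, drop its last entry, prepend the successor."""
    return [succ] + succ_list[:-1]

def repair(succ_list, alive):
    """On successor failure, promote the first live entry in the list."""
    return next(s for s in succ_list if alive(s))

my_list = reconcile(5, [8, 10, 12])        # -> [5, 8, 10]
print(repair(my_list, lambda s: s != 5))   # node 5 failed -> new successor 8
```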
Handling failures: redundancy
Each node knows the IP addresses of the next r nodes.
Each key is replicated at the next r nodes.
Evaluation results
(Figures: simulation of a 10,000-node network; load distribution shown as a probability density function; failed lookups vs. failure rate; path length; failed lookups vs. churn rate, starting with 500 nodes.)
Chord main problem
Chord has no good churn-handling solution; it merely achieves "correctness".
A correct Chord is defined as one in which each node maintains its predecessor and successor pointers.
This lets a query eventually arrive at the key's location, but...
it can take up to O(N) hops to find the key, not the O(log N) the original design claimed!
Chord main problem
There is no good solution for maintaining a finger table that is both scalable and consistent under churn.
This makes Chord impractical for P2P systems, which are highly dynamic.
A paper addressing high consistency:
Simon S. Lam and Huaiyu Liu, "Failure Recovery for Structured P2P Networks: Protocol Design and Performance Evaluation," Proceedings of ACM SIGMETRICS 2004.
Chord problem 2
Chord is only good for exact search; it cannot support range search or approximate search.
BitTorrent's solution
Maintain the trackers (servers) as a DHT; trackers are more reliable than ordinary peers.
Users query the trackers to get the locations of a file.
The file sharing itself is not structured.
DHT in a cloud architecture
Servers are hosted in a cloud; data are distributed among the servers. The user is a device outside the cloud.
The user sends a query for a key (a webpage, a file, a data item, etc.) to the cloud.
The query first arrives at an arbitrary server and is routed among the servers using the DHT until it reaches the server that holds the data.
That server replies to the user.
End of Lecture 2
Next paper (read and write a review):
Vivaldi: A Decentralized Network Coordinate System, Frank Dabek, Russ Cox, Frans Kaashoek, and Robert Morris, Proceedings of ACM SIGCOMM 2004.