Page 1

Distributed Hash-based Lookup for Peer-to-Peer Systems

Mohammed Junaid Azad 09305050, Gopal Krishnan 09305915

MTech1, CSE

Page 2

Agenda

• Peer-to-Peer Systems
• Initial Approaches to Peer-to-Peer Systems
• Their Limitations
• Distributed Hash Tables
  – CAN: Content Addressable Network
  – Chord

Page 3

Peer-to-Peer Systems

• Distributed and decentralized architecture
• No centralized server (unlike the client-server architecture)
• Any peer can behave as a server

Page 4

Napster

• P2P file-sharing system
• A central server stores the index of all the files available on the network
• To retrieve a file, the central server is contacted to obtain the location of the desired file
• Not a completely decentralized system
• Central directory not scalable
• Single point of failure

Page 5

Gnutella

• P2P file-sharing system
• No central server stores an index of the files available on the network
• The file-location process is decentralized as well
• Requests for files are flooded on the network
• No single point of failure
• Flooding on every request is not scalable

Page 6

File Systems for P2P systems

• The file system would store files and their metadata across nodes in the P2P network

• The nodes containing blocks of files could be located using hash-based lookup

• The blocks would then be fetched from those nodes

Page 7

Scalable File indexing Mechanism

• In any P2P system, the file transfer process is inherently scalable

• However, the indexing scheme that maps file names to locations is crucial for scalability

• Solution: Distributed Hash Tables

Page 8

Distributed Hash Tables

• Traditional name and location services provide a direct mapping between keys and values

• What are examples of values? A value can be an address, a document, or an arbitrary data item

• Distributed hash tables such as CAN/Chord implement a distributed service for storing and retrieving key/value pairs
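As an illustration (a toy sketch, not CAN's or Chord's actual API; the class and method names are ours), the service a DHT provides is simply a hash table whose buckets live on many machines:

import hashlib

class ToyDHT:
    """Toy sketch of the key/value service a DHT exposes. A local
    dict stands in for the network of nodes over which CAN/Chord
    spread the table."""

    def __init__(self):
        self._store = {}

    def _ident(self, key):
        # Hash the key into a fixed identifier space, as both CAN
        # and Chord do before deciding which node is responsible.
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def put(self, key, value):
        self._store[self._ident(key)] = value

    def get(self, key):
        return self._store.get(self._ident(key))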

Page 9

DNS vs. Chord/CAN

DNS

• provides a host name to IP address mapping

• relies on a set of special root servers

• names reflect administrative boundaries

• is specialized to finding named hosts or services

Chord

• can provide same service: Name = key, value = IP

• requires no special servers

• imposes no naming structure

• can also be used to find data objects that are not tied to certain machines

Page 10

Example Application Using Chord: Cooperative Mirroring

• Highest layer provides a file-like interface to the user, including user-friendly naming and authentication

• This file system maps its operations to lower-level block operations

• Block storage uses Chord to identify the node responsible for storing a block, and then talks to the block storage server on that node (see the sketch after the diagram)

[Diagram: a client and two servers, each running a three-layer stack of File System / Block Store / Chord; the Chord layers of all machines communicate with one another]
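A hedged sketch of how the layers might call into each other (find_successor, block_store, put, and get are hypothetical names; the slide gives no API):

import hashlib

def block_id(block):
    # Hash the block's contents into the Chord key space.
    return int(hashlib.sha1(block).hexdigest(), 16)

def store_block(chord_node, block):
    # The Chord layer names the node responsible for the key;
    # the block store on that node keeps the actual bytes.
    owner = chord_node.find_successor(block_id(block))
    owner.block_store.put(block_id(block), block)

def fetch_block(chord_node, key):
    owner = chord_node.find_successor(key)
    return owner.block_store.get(key)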

Page 11

CAN

Page 12

What is CAN?

• CAN is a distributed infrastructure that provides hash-table-like functionality
• CAN is composed of many individual nodes
• Each CAN node stores a chunk (zone) of the entire hash table
• A request for a particular key is routed through intermediate CAN nodes towards the node whose zone contains that key
• The design can be implemented at the application level (no changes to the kernel required)

Page 13

Co-ordinate space in CAN

Page 14

Design Of CAN

• Involves a virtual d-dimensional Cartesian co-ordinate space
  – The co-ordinate space is completely logical
  – Lookup keys are hashed into this space

• The co-ordinate space is partitioned into zones among all nodes in the system
  – Every node in the system owns a distinct zone

• The distribution of zones among nodes forms an overlay network

Page 15

Design of CAN (continued)

• To store (key, value) pairs, keys are mapped deterministically onto a point P in the co-ordinate space using a hash function
• The (key, value) pair is then stored at the node that owns the zone containing P
• To retrieve the entry corresponding to a key K, the same hash function is applied to map K to the point P (a sketch of this mapping follows)
• The retrieval request is routed from the requesting node to the node owning the zone containing P
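A minimal sketch of the key-to-point mapping, assuming one independent hash per dimension (the hash construction is our choice, not prescribed by CAN):

import hashlib

def key_to_point(key, d=2, side=2**16):
    # Derive one coordinate per dimension by hashing the key together
    # with the dimension index, yielding a point P in the d-torus.
    return tuple(
        int(hashlib.sha1(f"{i}:{key}".encode()).hexdigest(), 16) % side
        for i in range(d)
    )

# Store and retrieve hash the same key to the same point P, so the
# request always lands in the zone of the node holding the entry.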

Page 16

Routing in CAN

• Every CAN node holds the IP address and virtual co-ordinates of each of its neighbours

• Every message to be routed holds the destination co-ordinates

• Using its neighbours' co-ordinate sets, a node routes a message towards the neighbour with co-ordinates closest to the destination co-ordinates (see the sketch below)

• Progress: how much closer the message gets to the destination after being routed to one of the neighbours
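A sketch of the greedy forwarding rule, assuming wrap-around (torus) distance per co-ordinate; the data structures are ours:

def torus_distance(a, b, side=2**16):
    # Distance between points a and b with per-dimension wrap-around,
    # since the CAN co-ordinate space wraps at the edges.
    return sum(min(abs(x - y), side - abs(x - y)) for x, y in zip(a, b))

def next_hop(neighbours, dest):
    # neighbours: dict mapping a neighbour's address to its co-ordinates.
    # Forward towards the neighbour closest to the destination point.
    return min(neighbours, key=lambda n: torus_distance(neighbours[n], dest))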

Page 17

Routing in CAN (continued)

• For a d-dimensional space partitioned into n equal zones, the routing path length is O(d·n^(1/d)) hops
  – As the number of nodes grows, the routing path length grows as O(n^(1/d))

• Every node has 2d neighbours
  – As the number of nodes grows, per-node state does not change
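As a rough worked example: with d = 2 and n = 10,000 equal zones, the path length is O(d·n^(1/d)) = O(2·√10000) = O(200) hops, yet each node tracks only 2d = 4 neighbours however large n grows.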

Page 18

Before a node joins CAN

Page 19

After a Node Joins

Page 20

Allocation of a new node to a zone

1. First the new node must find a node already in CAN (using bootstrap nodes)
2. The new node randomly chooses a point P in the co-ordinate space
3. It sends a JOIN request for point P via any existing CAN node
4. The request is forwarded using the CAN routing mechanism to the node D owning the zone containing P
5. D then splits its zone in half and assigns one half to the new node (see the sketch below)
6. The new neighbour information is determined for both nodes
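A toy sketch of step 5, under the assumption that a zone is an axis-aligned box split along its widest dimension (the representation is ours):

class Zone:
    def __init__(self, lo, hi):
        self.lo, self.hi = list(lo), list(hi)   # opposite corners of the box

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

    def split(self):
        # Halve the zone along its widest dimension: the existing node
        # D keeps one half and hands the other to the joining node.
        dim = max(range(len(self.lo)), key=lambda i: self.hi[i] - self.lo[i])
        mid = (self.lo[dim] + self.hi[dim]) // 2
        kept, given = Zone(self.lo, self.hi), Zone(self.lo, self.hi)
        kept.hi[dim], given.lo[dim] = mid, mid
        return kept, given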

Page 21

Failure of node

• Even if one of the neighbours fails, messages can be routed through other neighbours in that direction

• If a node leaves CAN, the zone it occupies is taken over by the remaining nodes
  – If a node leaves voluntarily, it can hand over its database to some other node
  – When a node simply becomes unreachable, the database of the failed node is lost
    • CAN depends on sources resubmitting data to recover lost data

Page 22

CHORD

Page 23

Features

• Chord is a distributed hash table implementation
• Addresses a fundamental problem in P2P: efficient location of the node that stores a desired data item
  – One operation: given a key, maps it onto a node
  – Data location by associating a key with each data item
• Adapts efficiently
  – Dynamic with frequent node arrivals and departures
  – Automatically adjusts internal tables to ensure availability
• Uses consistent hashing
  – Load balancing in assigning keys to nodes
  – Little movement of keys when nodes join and leave

Page 24

Features (continued)

• Efficient routing
  – Distributed routing table
  – Maintains information about only O(log N) nodes
  – Resolves lookups via O(log N) messages
• Scalable
  – Communication cost and state maintained at each node scale logarithmically with the number of nodes
• Flexible naming
  – Flat key space gives applications flexibility to map their own names to Chord keys
• Decentralized

Page 25

Some Terminology

• Key
  – A hash key or its image under the hash function, as per context
  – An m-bit identifier, using SHA-1 as the base hash function
• Node
  – An actual node or its identifier under the hash function
  – m is chosen large enough that the probability of a hash collision is negligible
• Chord Ring
  – The identifier circle ordering the 2^m identifiers
• Successor Node
  – The first node whose identifier is equal to or follows key k in the identifier space
• Virtual Node
  – Introduced to bound the number of keys per node to O(K/N)
  – Each real node runs Ω(log N) virtual nodes, each with its own identifier

Page 26

Chord Ring

Page 27

Consistent Hashing

• A consistent hash function is one that changes minimally as the range of keys changes, so a total remapping is not required

• Desirable properties
  – High probability that the hash function balances load
  – Minimal disruption: only O(1/N) of the keys move when a node joins or leaves
  – Every node need not know about every other node, only a small amount of "routing" information

• m-bit identifier for each node and key
• Key k is assigned to its successor node (see the sketch below)
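A centralized toy sketch of the successor rule (real Chord computes this without any node knowing the full membership; M = 16 is an arbitrary toy value):

import bisect
import hashlib

M = 16                                   # identifier bits (toy value)

def ident(name):
    # m-bit identifier from SHA-1, per the terminology slide.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % 2**M

def successor(sorted_node_ids, k):
    # First node identifier equal to or following key k on the ring.
    i = bisect.bisect_left(sorted_node_ids, k)
    return sorted_node_ids[i % len(sorted_node_ids)]   # wrap past the top

nodes = sorted(ident(f"node-{i}") for i in range(4))
print(successor(nodes, ident("some-file")))   # node that stores the key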

Page 28

Simple Key Location

Page 29

Example

Page 30

Scalable Key Location

• A very small amount of routing information suffices to implement consistent hashing in a distributed environment

• Each node need only be aware of its successor node on the circle

• Queries for a given identifier can be passed around the circle via these successor pointers

• Resolution scheme correct, BUT inefficient: it may require traversing all N nodes!
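A sketch of this correct-but-slow scheme, assuming node objects with id and successor fields:

def in_ring_interval(x, a, b):
    # True if x lies in the half-open ring interval (a, b].
    return (a < x <= b) if a < b else (x > a or x <= b)

def simple_lookup(start_node, key):
    # Walk successor pointers around the circle until the key falls
    # between a node and its successor: O(N) hops in the worst case.
    n = start_node
    while not in_ring_interval(key, n.id, n.successor.id):
        n = n.successor
    return n.successor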

Page 31

Acceleration of Lookups

• Lookups are accelerated by maintaining additional routing information

• Each node maintains a routing table with (at most) m entries, called the finger table, where 2^m is the size of the identifier space

• The ith entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle (clarification on next slide)

• s = successor(n + 2^(i-1)) (all arithmetic is mod 2^m; see the sketch below)

• s is called the ith finger of node n, denoted by n.finger(i).node
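A small sketch of the finger-interval starts; for the 3-bit ring on the next slide it reproduces the start columns shown there:

def finger_starts(n, m):
    # Start of the ith finger interval: (n + 2^(i-1)) mod 2^m, i = 1..m
    return [(n + 2**(i - 1)) % 2**m for i in range(1, m + 1)]

print(finger_starts(0, 3))   # [1, 2, 4] -- node 0 on the next slide
print(finger_starts(3, 3))   # [4, 5, 7] -- node 3 on the next slide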

Page 32

Finger Tables (1)

[Figure: a Chord ring with m = 3 (identifiers 0-7) and nodes 0, 1, and 3; the finger tables are reconstructed below]

Node 0 (stores key 6):
  start | interval | succ.
    1   |  [1,2)   |   1
    2   |  [2,4)   |   3
    4   |  [4,0)   |   0

Node 1 (stores key 1):
  start | interval | succ.
    2   |  [2,3)   |   3
    3   |  [3,5)   |   3
    5   |  [5,1)   |   0

Node 3 (stores key 2):
  start | interval | succ.
    4   |  [4,5)   |   0
    5   |  [5,7)   |   0
    7   |  [7,3)   |   0

Page 33

Finger Tables (2) - Characteristics

• Each node stores information about only a small number of other nodes, and knows more about nodes closely following it than about nodes farther away

• A node's finger table generally does not contain enough information to determine the successor of an arbitrary key k

• Repeated queries to nodes that immediately precede the given key will eventually lead to the key's successor

Page 34

Pseudo code
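The pseudocode image did not survive transcription; below is a hedged Python rendering of the scalable lookup from the Chord paper (node objects with id, successor, and a fingers list are assumed):

def in_ring_interval(x, a, b):
    # True if x lies in the half-open ring interval (a, b].
    return (a < x <= b) if a < b else (x > a or x <= b)

def in_open_interval(x, a, b):
    # True if x lies strictly between a and b on the ring.
    return (a < x < b) if a < b else (x > a or x < b)

def find_successor(n, key):
    # Jump to the closest preceding finger until the key falls
    # between a node and its successor: O(log N) hops w.h.p.
    while not in_ring_interval(key, n.id, n.successor.id):
        n = closest_preceding_node(n, key)
    return n.successor

def closest_preceding_node(n, key):
    # Scan the finger table from the farthest finger inwards.
    for f in reversed(n.fingers):
        if in_open_interval(f.id, n.id, key):
            return f
    return n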

Page 35

Example

Page 36

Node Joins – with Finger Tables

[Figure: node 6 joins the ring of nodes 0, 1, and 3; the updated finger table entries and the migrated key are reconstructed below]

Node 0 (no keys):
  start | interval | succ.
    1   |  [1,2)   |   1
    2   |  [2,4)   |   3
    4   |  [4,0)   |   6   (updated)

Node 1 (stores key 1):
  start | interval | succ.
    2   |  [2,3)   |   3
    3   |  [3,5)   |   3
    5   |  [5,1)   |   6   (updated)

Node 3 (stores key 2):
  start | interval | succ.
    4   |  [4,5)   |   6   (updated)
    5   |  [5,7)   |   6   (updated)
    7   |  [7,3)   |   0

Node 6 (stores key 6, migrated from node 0):
  start | interval | succ.
    7   |  [7,0)   |   0
    0   |  [0,2)   |   0
    2   |  [2,6)   |   3

Page 37

Node Departures – with Finger Tables

[Figure: node 1 leaves the ring of nodes 0, 1, 3, and 6; the updated finger table entries and the migrated key are reconstructed below]

Node 0 (no keys):
  start | interval | succ.
    1   |  [1,2)   |   3   (updated)
    2   |  [2,4)   |   3
    4   |  [4,0)   |   6

Node 3 (stores keys 1 and 2; key 1 migrated from node 1):
  start | interval | succ.
    4   |  [4,5)   |   6
    5   |  [5,7)   |   6
    7   |  [7,3)   |   0

Node 6 (stores key 6):
  start | interval | succ.
    7   |  [7,0)   |   0
    0   |  [0,2)   |   0
    2   |  [2,6)   |   3

Page 38

Source of Inconsistencies: Concurrent Operations and Failures

• Basic “stabilization” protocol is used to keep nodes’ successor pointers up to date, which is sufficient to guarantee correctness of lookups

• Those successor pointers can then be used to verify the finger table entries

• Every node runs stabilize periodically to find newly joined nodes

Page 39

Pseudo code
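Again the pseudocode image is missing; a hedged Python rendering of the paper's stabilize/notify pair follows (node objects with id, successor, and predecessor are assumed):

def in_open_interval(x, a, b):
    # True if x lies strictly between a and b on the identifier ring.
    return (a < x < b) if a < b else (x > a or x < b)

def stabilize(n):
    # Run periodically: if someone has slipped in between n and its
    # successor, adopt that node as the new successor, then notify it.
    x = n.successor.predecessor
    if x is not None and in_open_interval(x.id, n.id, n.successor.id):
        n.successor = x
    notify(n.successor, n)

def notify(s, n):
    # n tells s: "I believe I am your predecessor."
    if s.predecessor is None or in_open_interval(n.id, s.predecessor.id, s.id):
        s.predecessor = n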

Page 40

Pseudo Code (continued)
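For the continuation slide, a sketch of the periodic maintenance routines that accompany stabilize (this variant refreshes one random finger, whereas the paper cycles through them in turn; find_successor and in_open_interval are from the sketches above):

import random

def fix_fingers(n, m):
    # Periodically refresh one finger table entry.
    i = random.randrange(1, m + 1)
    start = (n.id + 2**(i - 1)) % 2**m
    n.fingers[i - 1] = find_successor(n, start)

def check_predecessor(n, is_alive):
    # Clear a failed predecessor so a live node can later claim
    # the slot via notify during stabilization.
    if n.predecessor is not None and not is_alive(n.predecessor):
        n.predecessor = None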

Page 41

Stabilization after Join

[Figure: node n joins between its predecessor np and successor ns; initially succ(np) = ns and pred(ns) = np. The pointer changes are reconstructed below]

• n joins
  – n.predecessor = nil
  – n acquires ns as its successor via some existing node n'
  – n notifies ns that n is its new predecessor
  – ns acquires n as its predecessor: pred(ns) = n

• np runs stabilize
  – np asks ns for its predecessor (now n)
  – np acquires n as its successor: succ(np) = n
  – np notifies n
  – n acquires np as its predecessor: pred(n) = np

• All predecessor and successor pointers are now correct

• Fingers still need to be fixed, but old fingers will still work

Page 42

Failure Recovery

• Key step in failure recovery is maintaining correct successor pointers

• To help achieve this, each node maintains a successor-list of its r nearest successors on the ring

• If node n notices that its successor has failed, it replaces it with the first live entry in the list

• stabilize will correct finger table entries and successor-list entries pointing to failed node

• Performance is sensitive to the frequency of node joins and leaves versus the frequency at which the stabilization protocol is invoked
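A minimal sketch of the failover step, assuming each node carries an ordered successor_list of its r nearest successors:

def first_live_successor(n, is_alive):
    # Replace a failed immediate successor with the first live
    # entry in the r-entry successor list.
    for s in n.successor_list:       # nearest successors first
        if is_alive(s):
            return s
    raise RuntimeError("all r successors in the list have failed")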

Page 43

Impact of Node Joins on Lookups: Correctness

• For a lookup before stabilization has finished:
  1. Case 1: All finger table entries involved in the lookup are reasonably current; the lookup finds the correct successor in O(log N) steps
  2. Case 2: Successor pointers are correct, but finger pointers are inaccurate; this yields correct lookups, but they may be slower
  3. Case 3: Successor pointers are incorrect, or keys have not yet migrated to newly joined nodes; the lookup may fail, with the option of retrying after a quick pause, during which stabilization fixes successor pointers

Page 44

Impact of Node Joins on Lookups: Performance

• After stabilization, no effect other than increasing the value of N in O(log N)
• Before stabilization is complete:
  – Possibly incorrect finger table entries
  – This does not significantly affect lookup speed, since the distance-halving property depends only on ID-space distance
  – If new nodes' IDs land between the target's predecessor and the target, lookup speed is affected
  – Still takes O(log N) time even if N new nodes join

Page 45

Handling Failures

• Problem: what if a node does not know who its new successor is after the failure of its old successor?
  – There may be a gap in the finger table
  – Chord would be stuck!
• Maintain a successor list of size r, containing the node's first r successors
  – If the immediate successor does not respond, substitute the next entry in the successor list
  – A modified version of the stabilize protocol maintains the successor list
  – closest_preceding_node is modified to search not only the finger table but also the successor list for the most immediate predecessor
  – If find_successor fails, retry after some time
• Voluntary node departures
  – Transfer keys to the successor before departure
  – Notify predecessor p and successor s before leaving

Page 46

Theorems

• Theorem IV.3: Inconsistencies in successor pointers are transient
  – If any sequence of join operations is executed interleaved with stabilizations, then at some time after the last join the successor pointers will form a cycle on all the nodes in the network

• Theorem IV.4: Lookups take O(log N) time with high probability, even if N nodes join a stable N-node network, once successor pointers are correct, even if finger pointers are not yet updated

• Theorem IV.6: If the network is initially stable, even if every node fails with probability 1/2, the expected time to execute find_successor is O(log N)

Page 47

Simulation

• Implements the iterative style (the other is the recursive style)
  – The node resolving a lookup initiates all communication, unlike the recursive style, where intermediate nodes forward the request

Optimizations
• During stabilization, a node updates only its immediate successor and one other entry in its successor list or finger table
  – Each of the k unique entries thus gets refreshed once every k stabilization rounds
• Size of the successor list is 1
• Immediate notification of a predecessor change to the old predecessor, without waiting for the next stabilization round

Page 48

Parameters

• Mean delay of each packet is 50 ms
• Round-trip time is 500 ms
• Number of nodes is 10^4
• Number of keys varies from 10^4 to 10^6

Page 49

Load Balance

• Tests the ability of consistent hashing to allocate keys to nodes evenly
• The number of keys per node exhibits large variations, which increase linearly with the number of keys
• Associating keys with virtual nodes makes the number of keys per node more uniform and significantly improves load balance
• Asymptotic value of the query path length not affected much
• Total identifier space covered remains the same on average
• Worst-case number of queries does not change
• Not much increase in routing state maintained
• Asymptotic number of control messages not affected

Page 50

In the Absence of Virtual Nodes

Page 51

In the Presence of Virtual Nodes

Page 52

Path Length

• Number of nodes that must be visited to resolve a query, measured as the query path length
• As per Theorem IV.2, the number of nodes that must be contacted to find a successor in an N-node network is O(log N)
• Observed results:
  – Mean query path length increases logarithmically with the number of nodes
  – Average is the same as the expected average query path length

Page 53

Path Length Simulator Parameters

• A network with N = 2^k nodes
• Number of keys is 100 × 2^k
• k is varied between 3 and 14, and the path length is measured

Page 54

Graph

Page 55

Future Work

• Resilience against network partitions
  – Detect and heal partitions
  – For every node, keep a set of initial nodes
  – Maintain a long-term memory of a random set of nodes
  – Likely to include nodes from the other partition
• Handle threats to availability of data
  – Malicious participants could present an incorrect view of data
  – Periodic global consistency checks by each node
• Better efficiency
  – O(log N) messages per lookup is too many for some apps
  – Increase the number of fingers

Page 56

References

• Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan. In Proc. ACM SIGCOMM 2001. Expanded version in IEEE/ACM Transactions on Networking, 11(1), February 2003.

• A Scalable Content-Addressable Network. S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker. In Proc. ACM SIGCOMM 2001.

• Querying the Internet with PIER. R. Huebsch, J. M. Hellerstein, N. Lanham, B. T. Loo, S. Shenker, I. Stoica. In Proc. VLDB 2003.

Page 57

Thank You!

Page 58

Any Questions?