Lecture 2 Distributed Hash Table
Information search in P2P
Suppose we have a P2P system with N nodes.
A file "F" is stored on one node. How can an arbitrary node find "F" in the system?
P2P: centralized index
original “Napster” design
1) when a peer connects, it informs the central server of its IP address and content
2) Alice queries for “Hey Jude”
3) Alice requests file from Bob
(Figure: peers Alice and Bob connected to the centralized directory server; Alice's query and the file request follow the numbered steps above.)
P2P: problems with centralized directory
single point of failure
performance bottleneck
copyright infringement: "target" of lawsuit is obvious
file transfer is decentralized, but locating content is highly centralized
Query flooding
fully distributed, no central server
used by Gnutella
each peer indexes the files it makes available for sharing (and no other files)
overlay network: graph with an edge between peers X and Y if there is a TCP connection between them
all active peers and edges form the overlay network
edge: virtual (not physical) link
a given peer is typically connected to < 10 overlay neighbors
Query flooding
(Figure: a Query floods across overlay edges; QueryHits return along the reverse path.)
Query message sent over existing TCP connections
peers forward Query message
QueryHit sent over reverse path
File transfer: HTTP
Scalability: limited-scope flooding (see the sketch below)
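To make the flooding mechanics concrete, here is a minimal sketch; the Peer class and the qid/ttl names are illustrative stand-ins for real Gnutella messages over TCP, not the actual protocol.

```python
# A minimal sketch of limited-scope query flooding, with in-process method
# calls standing in for Gnutella messages (names here are illustrative).
class Peer:
    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbors = []          # overlay edges = existing TCP connections
        self.seen = set()            # query IDs already handled (avoids loops)

    def query(self, qid, filename, ttl, path=()):
        if qid in self.seen:
            return                   # already forwarded this query
        self.seen.add(qid)
        if filename in self.files:
            # a QueryHit would travel back along the reverse path
            print(f"QueryHit at {self.name}, reverse path {list(path)}")
        if ttl > 0:                  # limited scope: stop when TTL expires
            for n in self.neighbors:
                n.query(qid, filename, ttl - 1, path + (self.name,))

# tiny overlay: alice - bob - carol; carol has the file
alice, bob, carol = Peer("alice"), Peer("bob"), Peer("carol", ["heyjude.mp3"])
alice.neighbors, bob.neighbors = [bob], [alice, carol]
alice.query(qid=1, filename="heyjude.mp3", ttl=2)
```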
Gnutella: Peer joining
1. joining peer Alice must find another peer already in the Gnutella network: use a list of candidate peers
2. Alice sequentially attempts TCP connections with candidate peers until a connection is set up with Bob
3. Flooding: Alice sends a Ping message to Bob; Bob forwards the Ping to his overlay neighbors (who then forward it to their neighbors...); peers receiving the Ping respond to Alice with a Pong message
4. Alice receives many Pong messages and can then set up additional TCP connections
Hierarchical Overlay
Hybrid of the centralized-index and query-flooding approaches
each peer is either a super node or assigned to a super node
TCP connection between a peer and its super node
TCP connections between some pairs of super nodes
super node tracks the content of its children
(Figure: ordinary peers, group-leader peers, and neighboring relationships in the overlay network.)
Distributed Hash Table (DHT)
DHT = distributed P2P database
database has (key, value) pairs, e.g.
  key: social security number; value: human name
  key: content type; value: IP address
peers query the database with a key; the database returns values that match the key
peers can also insert (key, value) pairs into the database
finding "needles" requires that the P2P system be structured
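As a toy illustration of this key-value service, the sketch below spreads a Python dict across a few "nodes" by hashing the key; the node count and helper names are invented for the example.

```python
# A toy (key, value) store spread over several nodes by hashing the key.
import hashlib

NODES = 4
tables = [dict() for _ in range(NODES)]   # one hash table per node

def node_for(key):                        # which node stores this key?
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % NODES

def put(key, value):
    tables[node_for(key)][key] = value

def get(key):
    return tables[node_for(key)].get(key)

put("content", "10.0.0.7")
print(get("content"))                     # -> 10.0.0.7
```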
The Principle Of Distributed Hash Tables
A dynamic distribution of a hash table onto a set of cooperating nodes

Key   Value
1     Frozen
9     Tangled
11    Mulan
12    Lion King
21    Cinderella
22    Doraemon

• Basic service: lookup operation
• Key resolution from any node
• Each node has a routing table
  • Pointers to some other nodes
  • Typically, a constant or a logarithmic number of pointers (why?)

(Figure: nodes A, B, C, and D partition the table; node D issues lookup(9).)
DHT Desirable Properties
1. Keys mapped evenly to all nodes in the network
2. Each node maintains information about only a few other nodes
3. A key can be found efficiently by querying the system
4. Node arrival/departures only affect a few nodes
Chord Identifiers
Assign an integer identifier to each peer in the range [0, 2^n − 1]; each identifier can be represented by n bits.
Require each key to be an integer in the same range.
To get integer keys, hash the original key, e.g., key = h("Led Zeppelin IV"). This is why the database is called a distributed "hash" table.
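A minimal sketch of this hashing step, assuming SHA-1 (as Chord uses) truncated to n bits; the choice n = 16 is arbitrary.

```python
# Sketch: map an arbitrary key (file name, node address, ...) to an n-bit
# integer identifier using SHA-1; n = 16 is an illustrative choice.
import hashlib

def chord_id(key: str, n: int = 16) -> int:
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** n)   # in [0, 2**n - 1]

print(chord_id("Led Zeppelin IV"))
```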
Each key must be stored in some node.
Central issue: assigning (key, value) pairs to peers.
Rule: assign each key to the peer whose ID is closest to the key.
Convention in this lecture: "closest" is the immediate successor of the key (or the peer whose ID equals the key).
Example: 4 bits; peers 1, 3, 4, 5, 8, 10, 12, 14.
  key = 13: successor peer = 14
  key = 15: successor peer = 1
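A sketch of this successor rule on the slide's 4-bit example; the successor() helper is illustrative.

```python
# The assignment rule: a key is stored at its immediate successor on the
# ring, i.e. the first peer ID >= key, wrapping around to the smallest ID.
import bisect

def successor(peers, key):
    peers = sorted(peers)
    i = bisect.bisect_left(peers, key)
    return peers[i % len(peers)]      # i == len(peers) wraps to peers[0]

peers = [1, 3, 4, 5, 8, 10, 12, 14]
print(successor(peers, 13))           # -> 14
print(successor(peers, 15))           # -> 1 (wrap around)
```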
Chord [MIT]
consistent hashing (SHA-1) assigns each node and object an m-bit ID
IDs are ordered on an ID circle ranging from 0 to 2^m − 1
new nodes assume slots on the ID circle according to their ID
key k is assigned to the first node whose ID ≥ k, i.e., successor(k)
Consistent Hashing - Successor Nodes
(Figure: 3-bit identifier circle 0–7 with identifier nodes 0, 1, 3 and keys 1, 2, 6 marked on the circle.)
successor(1) = 1
successor(2) = 3
successor(6) = 0
Consistent Hashing – Join and Departure
When a node n joins the network, certain keys previously assigned to n’s successor now become assigned to n.
When node n leaves the network, all of its assigned keys are reassigned to n’s successor.
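A small sketch of how few keys move on a join, using the 3-bit ring with nodes {0, 1, 3} and keys {1, 2, 6} from the surrounding slides; only the key claimed by the new node changes hands.

```python
# Sketch of key movement when node 6 joins the ring with nodes {0, 1, 3}.
import bisect

def successor(nodes, k):
    nodes = sorted(nodes)
    i = bisect.bisect_left(nodes, k)
    return nodes[i % len(nodes)]

keys = [1, 2, 6]
before = {k: successor([0, 1, 3], k) for k in keys}
after  = {k: successor([0, 1, 3, 6], k) for k in keys}
print(before)   # {1: 1, 2: 3, 6: 0}
print(after)    # {1: 1, 2: 3, 6: 6} -- only key 6 changed hands
```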
Consistent Hashing – Node Join
(Figure: a new node joins the identifier circle 0–7; the keys between its predecessor and itself move to it from its successor.)
Consistent Hashing – Node Departure
(Figure: a node leaves the identifier circle 0–7; all of its keys are reassigned to its successor.)
Consistent Hashing: another example
For n = 6, the number of identifiers is 2^6 = 64. The following DHT ring has 10 nodes and stores 5 keys. The successor of key 10 is node 14.
(Figure: 6-bit identifier circle with 10 nodes and 5 keys.)
Circular DHT (1)
Each peer is aware only of its immediate successor and predecessor.
Circular DHT (2)
(Figure: peers 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111 on a ring; the query "Who's responsible for key 1110?" is forwarded peer to peer until 1111 answers "I am".)
Define "closest" as the closest successor.
O(N) messages on average to resolve a query when there are N peers.
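A sketch of this O(N) walk on the ring above, assuming each peer simply forwards the query to its immediate successor; the helper names are illustrative.

```python
# With only successor pointers, the query for key 1110 walks node to node
# around the ring until it reaches peer 1111: O(N) messages on average.
import bisect

def successor(peers, key):
    i = bisect.bisect_left(peers, key)
    return peers[i % len(peers)]

def resolve(peers, start, key):
    peers = sorted(peers)
    target = successor(peers, key)
    i, hops = peers.index(start), 0
    while peers[i] != target:
        i = (i + 1) % len(peers)     # forward the query to the successor
        hops += 1
    return target, hops

peers = [0b0001, 0b0011, 0b0100, 0b0101, 0b1000, 0b1010, 0b1100, 0b1111]
print(resolve(peers, 0b0001, 0b1110))   # -> (15, 7): peer 1111 answers
```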
Circular DHT with Shortcuts
Each peer keeps track of the IP addresses of its predecessor, its successor, and some shortcut peers.
In the example, the query is resolved in 3 messages instead of 6.
Shortcuts can be designed so that each peer has O(log N) neighbors and each query takes O(log N) messages.
(Figure: the same ring of peers 0001–1111, now with shortcut edges; "Who's responsible for key 1110?")
Scalable Key Location – Finger Tables
(Figure: 3-bit identifier circle with nodes 0, 1, 3 and keys 1, 2, 6.)
Finger i of node n points to successor((n + 2^i) mod 2^m), for i = 0, ..., m − 1.

node 0 (stores key 6): start 1, 2, 4 (0+2^0, 0+2^1, 0+2^2) → succ. 1, 3, 0
node 1 (stores key 1): start 2, 3, 5 (1+2^0, 1+2^1, 1+2^2) → succ. 3, 3, 0
node 3 (stores key 2): start 4, 5, 7 (3+2^0, 3+2^1, 3+2^2) → succ. 0, 0, 0
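A sketch that reproduces the finger tables in the figure, assuming the start formula (n + 2^i) mod 2^m with i = 0, ..., m − 1.

```python
# Finger-table construction: finger i of node n points to
# successor((n + 2**i) mod 2**M); here M = 3 as in the figure.
import bisect

M = 3

def successor(nodes, k):
    nodes = sorted(nodes)
    i = bisect.bisect_left(nodes, k % (2 ** M))
    return nodes[i % len(nodes)]

def finger_table(n, nodes):
    return [successor(nodes, (n + 2 ** i) % (2 ** M)) for i in range(M)]

nodes = [0, 1, 3]
print(finger_table(0, nodes))   # -> [1, 3, 0]
print(finger_table(1, nodes))   # -> [3, 3, 0]
print(finger_table(3, nodes))   # -> [0, 0, 0]
```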
Chord key location
Look up in the finger table the furthest node that precedes the key and forward the query to it; each hop roughly halves the remaining distance.
→ O(log N) hops
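A sketch of this lookup rule on the 3-bit example, with the successor pointers and finger tables hard-coded from the figure above; the interval helpers are illustrative, not Chord's actual RPC interface.

```python
# At each hop the query jumps to the furthest finger preceding the key.
SUCC = {0: 1, 1: 3, 3: 0}
FINGERS = {0: [1, 3, 0], 1: [3, 3, 0], 3: [0, 0, 0]}

def in_half_open(x, a, b):      # ring interval (a, b] on the 2**3 circle
    return a < x <= b if a < b else (x > a or x <= b)

def in_open(x, a, b):           # ring interval (a, b)
    return a < x < b if a < b else (x > a or x < b)

def find_successor(n, key, hops=0):
    if in_half_open(key, n, SUCC[n]):
        return SUCC[n], hops            # n's successor stores the key
    # otherwise forward to the furthest finger that precedes the key
    nxt = next((f for f in reversed(FINGERS[n]) if in_open(f, n, key)), SUCC[n])
    return find_successor(nxt, key, hops + 1)

print(find_successor(1, 6))     # -> (0, 1): key 6 is stored at node 0
```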
Peer Churn
• To handle peer churn, require each peer to know the IP addresses of its two successors
• Each peer periodically pings its two successors to check that they are still alive
• This is only a limited solution, handling a single join or a single failure at a time
Node Joins and Stabilizations
The most important pointer is the successor pointer.
If the successor pointer is kept up to date, which is sufficient to guarantee correctness of lookups, then the finger table can always be verified.
Each node runs a "stabilization" protocol periodically in the background to update its successor pointer and finger table.
Node Joins and Stabilizations
The "stabilization" protocol contains 6 functions: create(), join(), stabilize(), notify(), fix_fingers(), check_predecessor().
When node n first starts, it calls n.join(n'), where n' is any known Chord node.
The join() function asks n' to find the immediate successor of n.
Node Joins – stabilize()
Each time node n runs stabilize(), it asks its successor for that node's predecessor p, and decides whether p should be n's successor instead.
stabilize() also notifies node n's successor of n's existence, giving the successor the chance to change its predecessor to n.
The successor does this only if it knows of no closer predecessor than n.
Node Joins – Join and Stabilization
(Figure: before the join, succ(np) = ns and pred(ns) = np.)
n joins:
  predecessor = nil
  n acquires ns as its successor via some n'
n runs stabilize():
  n notifies ns that n may be its new predecessor
  ns acquires n as its predecessor
np runs stabilize():
  np asks ns for its predecessor (now n)
  np acquires n as its successor
  np notifies n
  n acquires np as its predecessor
All predecessor and successor pointers are now correct.
Fingers still need to be fixed, but old fingers will still work (see the sketch below).
(Figure: after stabilization, pred(ns) = n and succ(np) = n.)
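A sketch of this join/stabilize/notify sequence with np = 1, n = 3, ns = 5 on an 8-position ring; the Node class and in_open() helper are illustrative, not Chord's actual RPC interface.

```python
def in_open(x, a, b):                     # ring interval (a, b)
    return a < x < b if a < b else (x > a or x < b)

class Node:
    def __init__(self, nid):
        self.id, self.successor, self.predecessor = nid, self, None

    def stabilize(self):
        """Ask successor for its predecessor; adopt it if it lies between us."""
        p = self.successor.predecessor
        if p is not None and in_open(p.id, self.id, self.successor.id):
            self.successor = p
        self.successor.notify(self)       # "I might be your predecessor"

    def notify(self, cand):
        """Accept cand as predecessor only if it is closer than the current one."""
        if self.predecessor is None or in_open(cand.id, self.predecessor.id, self.id):
            self.predecessor = cand

np_, ns = Node(1), Node(5)                # existing two-node ring
np_.successor, ns.successor = ns, np_
np_.predecessor, ns.predecessor = ns, np_
n = Node(3); n.successor = ns             # join(): n learned ns via some n'
n.stabilize()                             # ns adopts n as its predecessor
np_.stabilize()                           # np_ sees n, adopts it as successor;
                                          # its notify makes n adopt np_
print(np_.successor.id, n.successor.id, ns.predecessor.id)   # -> 3 5 3
```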
Node Failures
A key step in failure recovery is maintaining correct successor pointers.
To help achieve this, each node maintains a successor list of its r nearest successors on the ring.
Successor lists are stabilized as follows: node n reconciles its list with its successor s by copying s's successor list, removing its last entry, and prepending s to it.
If node n notices that its successor has failed, it replaces it with the first live entry in its successor list and reconciles its successor list with the new successor.
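A sketch of the successor-list rules just described, with r = 3; reconcile() and repair() are invented names.

```python
def reconcile(succ, succ_list):
    """Copy the successor's list, drop its last entry, prepend the successor."""
    return [succ] + succ_list[:-1]

def repair(succ_list, alive):
    """On successor failure, promote the first live entry in the list."""
    return next(s for s in succ_list if alive(s))

my_list = reconcile(5, [8, 10, 12])        # -> [5, 8, 10]
print(repair(my_list, lambda s: s != 5))   # node 5 failed -> new successor 8
```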
Handling failures: redundancy
Each node knows the IP addresses of the next r nodes.
Each key is replicated at the next r nodes.
Evaluation results
(Figures: simulation of a 10,000-node network; load distribution shown as a probability density function; failed lookups vs. failure rate; path length; failed lookups vs. churn rate, starting with 500 nodes.)
Chord main problem
Chord has no good churn-handling solution; it merely achieves "correctness".
A correct Chord is defined as one in which each node maintains its predecessor and successor pointers.
This lets a query eventually arrive at the key's location, but...
it can take up to O(N) hops to find the key, not the O(log N) the original design claimed!
Chord main problem
There is no good solution for maintaining a finger table that is both scalable and consistent under churn.
This makes Chord impractical for P2P systems, which are highly dynamic.
A paper addressing high consistency:
Simon S. Lam and Huaiyu Liu, "Failure Recovery for Structured P2P Networks: Protocol Design and Performance Evaluation," Proceedings of ACM SIGMETRICS 2004.
Chord problem 2
Chord is only good for exact search; it cannot support range search or approximate search.
BitTorrent's solution
Maintain the trackers (servers) as a DHT; trackers are more reliable than ordinary peers.
Users query the trackers to get the locations of a file.
The file sharing itself is not structured.
DHT in a cloud architecture
Servers are hosted in a cloud; data are distributed among the servers. The user is a device outside the cloud.
The user sends a query for a key (a webpage, a file, a data item, etc.) to the cloud.
The query first arrives at an arbitrary server and is routed among the servers using the DHT until it reaches the server that holds the data.
That server replies to the user.
End of Lecture 2
Next paper (read and write a review):
Vivaldi: A Decentralized Network Coordinate System, Frank Dabek, Russ Cox, Frans Kaashoek, and Robert Morris, Proceedings of ACM SIGCOMM 2004.