Principles of Reliable Distributed Systems Lecture 2: Distributed Hash Tables (DHT), Chord

Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008



Today’s Material

• Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
  – Stoica et al.


Reminder: Peer-to-Peer Lookup

• Insert (key, file)
• Lookup (key)
  – Should find keys inserted in any node


Reminder: Overlay Networks

• A virtual structure imposed over the physical network (e.g., the Internet)
  – over the Internet, there is an (IP-level) link between every pair of nodes
  – an overlay uses a fixed subset of these
• Why restrict to a subset?


Routing/Lookup in Overlays

• How does one route a packet to its destination in an overlay?
• How about lookup (key)?
• Unstructured overlay: (last week)
  – Flooding or random walks
• Structured overlay: (today)
  – The links are chosen according to some rule
  – Tables define next-hop for routing and lookup


Structured Lookup Overlays

• Many academic systems
  – CAN, Chord, D2B, Kademlia, Koorde, Pastry, Tapestry, Viceroy, …
• OverNet based on the Kademlia algorithm
• Symmetric, no hierarchy
• Decentralized self management
• Structured overlay – data stored in a defined place, search goes on a defined path
• Implement Distributed Hash Table (DHT) abstraction


Reminder: Hashing

• Data structure supporting the operations:
  – void insert( key, item )
  – item search( key )
• Implementation uses hash function for mapping keys to array cells
• Expected search time O(1)
  – provided that there are few collisions


Distributed Hash Tables (DHTs)

• Nodes store table entries
  – The role of array cells
• Good abstraction for lookup?
  – Why?


The DHT Service Interface

• lookup( key ) returns the location of the node currently responsible for this key
• key is usually numeric (in some range)


Using the DHT Interface

• How do you publish a file?
• How do you find a file?
• Requirements for an application being able to use DHTs?
  – Data identified with unique keys
  – Nodes can (agree to) store keys for each other
    • location of object (pointer) or actual object (data)
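
A minimal sketch of how an application might publish and find a file on top of the lookup( key ) interface. Everything other than lookup is hypothetical: the ToyDHT class, the publish and find helpers, the SHA-1 key derivation, and the placeholder rule that maps a key to a node are assumptions made for illustration only.

```python
import hashlib


def key_of(name: str) -> int:
    """Derive a numeric key from a file name (SHA-1 is an assumed choice)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class ToyDHT:
    """Hypothetical in-process stand-in for a DHT; real nodes would be remote."""

    def __init__(self, node_ids):
        self.node_ids = sorted(node_ids)
        # Each node keeps its own table of key -> value.
        self.tables = {n: {} for n in self.node_ids}

    def lookup(self, key: int):
        """Return the node currently responsible for key.

        Placeholder rule (an assumption): key modulo the number of nodes.
        """
        return self.node_ids[key % len(self.node_ids)]

    def publish(self, name: str, value: str):
        """Store value (a pointer or the data itself) at the responsible node."""
        key = key_of(name)
        node = self.lookup(key)
        self.tables[node][key] = value
        return node

    def find(self, name: str):
        """Ask the responsible node for the value stored under name."""
        key = key_of(name)
        return self.tables[self.lookup(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.publish("lecture2.ppt", "10.0.0.7:4000")  # store a pointer to the holder
print(dht.find("lecture2.ppt"))
```

Publishing and finding both reduce to one lookup followed by a local table access at the returned node, which is why unique keys and nodes willing to store entries for each other are the only requirements listed above.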


What Does a DHT Implementation Need to Do?

• Map keys to nodes
  – Needs to be dynamic as nodes join and leave
  – How does this affect the service interface?
• Route a request to the appropriate node
  – Routing on the overlay


Lookup Example

[Figure: a set of nodes, each holding a key/value (K, V) table; insert(K1,V1) places (K1,V1) in one node's table, and lookup(K1) locates it.]


Mapping Keys to Nodes

• Goal: load balancing
  – Why?
• Typical approach:
  – Give an m-bit id to each node and each key (e.g., using SHA-1 on the key, IP address)
  – Map key to node whose id is “close” to the key (need distance function)
  – How is load balancing achieved?
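
The typical approach above can be sketched as follows. The helper names make_id and clockwise_distance, the rule of keeping the top m bits of the digest, and the choice of clockwise distance are assumptions of this sketch; the slide only requires some m-bit id and some distance function.

```python
import hashlib

M = 160  # SHA-1 produces 160-bit digests


def make_id(text: str, m: int = M) -> int:
    """Map a key or a node's IP address to an m-bit identifier.

    Keeping the top m bits of the digest is an assumed truncation rule.
    """
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - m)


def clockwise_distance(a: int, b: int, m: int = M) -> int:
    """One possible distance function: steps from a to b going clockwise."""
    return (b - a) % (2 ** m)


node_id = make_id("192.0.2.17")    # node identified by its IP address
key_id = make_id("lecture2.ppt")   # key identified by its name
print(clockwise_distance(key_id, node_id))
```

Because a cryptographic hash spreads both node ids and key ids roughly uniformly over the id space, each node is expected to end up "close" to a similar share of the keys.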


Routing Issues

• Each node must be able to forward each lookup query to a node closer to the destination

• Maintain routing tables adaptively
  – Each node knows some other nodes
  – Must adapt to changes (joins, leaves, failures)
  – Goals?


Handling Join/Leave

• When a node joins, it needs to assume responsibility for some keys
  – Ask the application to move these keys to it
  – How many keys will need to be moved?
• When a node fails or leaves, its keys have to be moved to others
  – What else is needed in order to implement this?


P2P System Interface

• Lookup
• Join
• Move keys


Chord

Stoica, Morris, Karger, Kaashoek, and Balakrishnan


Chord Logical Structure

• m-bit ID space ($2^m$ IDs), usually m=160.
• Think of nodes as organized in a logical ring according to their IDs.

[Figure: logical ring with nodes N1, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56]


Consistent Hashing: Assigning Keys to Nodes

• Key k is assigned to the first node whose ID equals or follows k – successor(k)

[Figure: ring with nodes N1, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56 and key K54]
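
A small sketch of successor(k) on the ring from the figure. The 6-bit id space (M = 6) is an assumption chosen because the largest id shown is 56; the function name successor follows the slide, while the sorted-list representation is only one convenient way to model the ring.

```python
from bisect import bisect_left

M = 6  # assumed: 6-bit IDs (0..63) are enough for the ring in the figure
RING = [1, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]  # node IDs, sorted


def successor(k: int, nodes=RING, m: int = M) -> int:
    """Return the first node whose ID equals or follows k on the ring."""
    k %= 2 ** m
    i = bisect_left(nodes, k)      # index of the first node with ID >= k
    return nodes[i % len(nodes)]   # wrap around to the smallest ID


print(successor(54))  # 56
print(successor(57))  # 1
```

Key K54 is therefore stored at N56, and keys past the largest node id wrap around to N1.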


Moving Keys upon Join/Leave

• When a node joins, it becomes responsible for some keys previously assigned to its successor
  – Local change
  – Assuming load is balanced, how many keys should move?
• And what happens when a node leaves?
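
To make the "local change" concrete, the sketch below computes which keys a joining node takes over. The ring, the 6-bit id space, the sample keys, and the helper names in_half_open and keys_moved_on_join are assumptions for illustration; the joining id is assumed not to be on the ring already.

```python
from bisect import bisect_left

M = 6
RING = [1, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]


def in_half_open(x, a, b, m=M):
    """True if x lies in the ring interval (a, b]."""
    if a == b:
        return True
    size = 2 ** m
    return 0 < (x - a) % size <= (b - a) % size


def keys_moved_on_join(new_id, nodes, keys, m=M):
    """Keys the joining node takes over from its successor."""
    # Predecessor: the largest existing ID below new_id (wrapping around).
    pred = nodes[bisect_left(nodes, new_id) - 1]
    return [k for k in keys if in_half_open(k, pred, new_id, m)]


keys = [10, 24, 29, 38, 54]
print(keys_moved_on_join(26, RING, keys))  # [24]
```

Only keys in (predecessor, new node] move, all of them from the new node's successor; every other node is untouched. In the opposite direction, when a node leaves, the keys it held become the responsibility of its successor.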


Consistent Hashing Guarantees

• For any set of N nodes and K keys, w.h.p.:
  – Each node is responsible for at most $(1 + \varepsilon)K/N$ keys
  – When an (N + 1)st node joins or leaves, responsibility for O(K/N) keys changes hands (only to or from the joining or leaving node)
• For the scheme described above, $\varepsilon = O(\log N)$
  – $\varepsilon$ can be reduced to an arbitrarily small constant by having each node run $\Omega(\log N)$ virtual nodes, each with its own identifier
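
The effect of virtual nodes can be explored with a small experiment. This is only a sketch under assumptions: ids are 32-bit SHA-1 prefixes, node and key names are synthetic strings, and the helper names h and node_loads are made up here. It reports the maximum load relative to the average K/N for a few virtual-node counts; no particular numbers are claimed.

```python
import hashlib
from bisect import bisect_left
from collections import Counter


def h(text: str) -> int:
    """32-bit ID from SHA-1 (truncation chosen only to keep numbers small)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:4], "big")


def node_loads(n_nodes: int, n_keys: int, vnodes: int) -> Counter:
    """Count keys per physical node when each runs `vnodes` virtual nodes."""
    # Every virtual node gets its own ID; remember which physical node owns it.
    points = sorted((h(f"node{i}#{v}"), i)
                    for i in range(n_nodes) for v in range(vnodes))
    ids = [p for p, _ in points]
    loads = Counter()
    for k in range(n_keys):
        idx = bisect_left(ids, h(f"key{k}")) % len(ids)  # successor(key)
        loads[points[idx][1]] += 1
    return loads


N, K = 50, 20000
for v in (1, 6, 24):
    loads = node_loads(N, K, v)
    print(v, max(loads.values()) / (K / N))  # max load relative to K/N
```

The printed ratio plays the role of $(1 + \varepsilon)$ in the bound above: it is the factor by which the most loaded node exceeds the average.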


Simple Routing Solutions

• Each node knows only its successor
  – Routing around the circle
  – Good idea?
• Each node knows all other nodes
  – O(1) routing
  – Cost?


Chord Skiplist Routing

• Each node has “fingers” to nodes ½ way around the ID space from it, ¼ the way…
• finger[i] at n contains successor($n + 2^{i-1}$)
• successor is finger[1]

[Figure: ring with nodes N0, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56]

How many entries in the finger table?
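
A sketch of the finger table on the ring from the figure, assuming a 6-bit id space (M = 6, chosen because the largest id shown is 56). The function name finger_table is an assumption; the rule finger[i] = successor(n + 2^(i-1)) is the one stated above, so the table has one entry per bit of the id, m in total.

```python
from bisect import bisect_left

M = 6  # assumed ID length for the ring in the figure
RING = [0, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]


def successor(k, nodes=RING, m=M):
    """First node whose ID equals or follows k, wrapping around the ring."""
    i = bisect_left(nodes, k % (2 ** m))
    return nodes[i % len(nodes)]


def finger_table(n, nodes=RING, m=M):
    """finger[i] = successor(n + 2^(i-1)) for i = 1..m (list index i-1)."""
    return [successor(n + 2 ** (i - 1), nodes, m) for i in range(1, m + 1)]


print(finger_table(8))   # [10, 10, 14, 21, 30, 42]
print(finger_table(42))  # [48, 48, 48, 51, 0, 10]
```

Several of the m entries point at the same node (N10 twice for N8, N48 three times for N42), so the number of distinct fingers is smaller than m.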


Example: Chord Fingers

[Figure: ring with nodes N0, N10, N21, N30, N47, N72, N82, N90, N114; arrows labeled finger[1..4], finger[5], finger[6], finger[7]]

• m entries
• log N distinct fingers with high probability


Chord Data Structures (At Each Node)

• Finger table
• First finger is successor
• Predecessor


Forwarding Queries

• Query for key k is forwarded to finger with highest ID not exceeding k

[Figure: Lookup( K54 ) on the ring with nodes N0, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56 and key K54]
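
The forwarding rule can be sketched as a single-process simulation on the ring from the figure. Assumptions of this sketch: a 6-bit id space, N8 as the node that starts the lookup, finger tables recomputed from a global node list instead of being stored per node, and the helper names (fingers, closest_preceding_finger, lookup). On a ring, "highest ID not exceeding k" is read here as the finger that most closely precedes k going clockwise.

```python
from bisect import bisect_left

M = 6
SIZE = 2 ** M
RING = [0, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]


def successor(k):
    i = bisect_left(RING, k % SIZE)
    return RING[i % len(RING)]


def fingers(n):
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]


def in_open(x, a, b):
    """x in the ring interval (a, b); (a, a) is the whole ring except a."""
    if a == b:
        return x != a
    return 0 < (x - a) % SIZE < (b - a) % SIZE


def in_half_open(x, a, b):
    """x in the ring interval (a, b]; (a, a] is the whole ring."""
    if a == b:
        return True
    return 0 < (x - a) % SIZE <= (b - a) % SIZE


def closest_preceding_finger(n, key):
    """Finger of n that most closely precedes key on the ring."""
    for f in reversed(fingers(n)):
        if in_open(f, n, key):
            return f
    return n


def lookup(start, key):
    """Return (responsible node, list of nodes visited)."""
    n, path = start, [start]
    while True:
        succ = fingers(n)[0]            # finger[1] is the successor
        if in_half_open(key, n, succ):  # key falls between n and its successor
            path.append(succ)
            return succ, path
        n = closest_preceding_finger(n, key)
        path.append(n)


print(lookup(8, 54))  # (56, [8, 42, 51, 56])
```

Starting from N8, the query jumps to N42, then to N51, whose successor N56 is responsible for K54.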


How long does it take?

Remote Procedure Call (RPC)


Routing Time

• Node n looks up a key stored at node p
• p is in n’s i-th interval: $p \in ((n + 2^{i-1}) \bmod 2^m,\ (n + 2^i) \bmod 2^m]$
• n contacts f = finger[i]
  – The interval is not empty (because p is in it), so $f \in ((n + 2^{i-1}) \bmod 2^m,\ (n + 2^i) \bmod 2^m]$
  – RPC f
• f is at least $2^{i-1}$ away from n
• p is at most $2^{i-1}$ away from f
• The distance is halved: maximum m steps


Routing Time Refined

• Assuming uniform node distribution around the circle, the number of nodes in the search space is halved at each step:
  – Expected number of steps: log N
• Note that:
  – m = 160
  – For 1,000,000 nodes, log N ≈ 20 (base-2 logarithm)


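What About Network Distance?

[Figure: Lookup( K54 ) on the ring with nodes N0, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56; nodes on the ring are labeled with physical locations Haifa, Texas, and China]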


Joining Chord

• Goals?
• Required steps:
  – Find your successor
  – Initialize finger table and predecessor
  – Notify other nodes that need to change their finger table and predecessor pointer
    • $O(\log^2 N)$
  – Learn the keys that you are responsible for; notify others that you assume control over them


Join Algorithm: Take II

• Observation: for correctness, successors suffice
  – Fingers only needed for performance
• Upon join, update successor only
• Periodically,
  – Check that successors and predecessors are consistent
  – Fix fingers


Creation and Join
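
A minimal single-process sketch of creation and join, in the style of the create, join, stabilize, and notify procedures of the Chord paper. Assumptions of this sketch: direct method calls stand in for RPCs, fingers are omitted (lookups walk successor pointers only), the id space is 6 bits, and the names Node and ring_ids as well as the node ids used are chosen for illustration.

```python
M = 6
SIZE = 2 ** M


def in_open(x, a, b):
    """x in the ring interval (a, b); (a, a) is the whole ring except a."""
    if a == b:
        return x != a
    return 0 < (x - a) % SIZE < (b - a) % SIZE


def in_half_open(x, a, b):
    """x in the ring interval (a, b]; (a, a] is the whole ring."""
    if a == b:
        return True
    return 0 < (x - a) % SIZE <= (b - a) % SIZE


class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

    def create(self):
        """Start a new ring containing only this node."""
        self.predecessor = None
        self.successor = self

    def find_successor(self, key):
        """Walk successor pointers only: slow, but enough for correctness."""
        n = self
        while not in_half_open(key, n.id, n.successor.id):
            n = n.successor
        return n.successor

    def join(self, known):
        """Join via any known node: learn a successor, nothing else."""
        self.predecessor = None
        self.successor = known.find_successor(self.id)

    def stabilize(self):
        """Periodic: adopt a closer successor if one appeared, then notify it."""
        x = self.successor.predecessor
        if x is not None and in_open(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        """candidate believes it might be our predecessor."""
        if self.predecessor is None or in_open(
                candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate


def ring_ids(start):
    """Follow successor pointers once around the ring."""
    ids, n = [start.id], start.successor
    while n is not start:
        ids.append(n.id)
        n = n.successor
    return ids


first = Node(8)
first.create()
nodes = [first]
for ident in (21, 42, 56, 30):
    joiner = Node(ident)
    joiner.join(first)
    nodes.append(joiner)
    for _ in range(3):              # a few stabilization rounds
        for n in nodes:
            n.stabilize()

print(ring_ids(first))  # [8, 21, 30, 42, 56]
```

A join only sets the joiner's successor. The joiner's predecessor pointer and its predecessor's successor pointer are repaired later by the periodic stabilize and notify calls.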


Join Example

[Figure: four stages of a join, labeled: joiner finds successor; gets keys; stabilize fixes successor; stabilize fixes predecessor]


Join Stabilization Guarantee

• If any sequence of join operations is executed interleaved with stabilizations,
  – Then at some time after the last join
  – The successor pointers form a cycle on all the nodes in the network
• Model assumptions?


Performance with Concurrent Joins

• Assume a stable network with N nodes with correct finger pointers

• Now, another set of up to N nodes joins the network,
  – And all successor pointers (but perhaps not all finger pointers) are correct,
• Then lookups still take O(log N) time w.h.p.
• Model assumptions?


Failure Handling

• Periodically fixing fingers
• List of r successors instead of one successor
• Periodically probing predecessors


Failure Detection

• Each node has a local failure detector module
• Uses periodic probes and timeouts to check liveness of successors and fingers
  – If the probed node does not respond by a designated timeout, it is suspected to be faulty
• A node that suspects its successor (finger) finds a new successor (finger)
• False suspicion - the suspected node is not faulty
  – Suspected due to communication problems
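
A minimal sketch of a timeout-based failure detector combined with a successor list. The class and function names, the injectable clock, and the concrete timeout are assumptions for illustration; real probes would be network messages rather than the record_reply calls used here.

```python
import time


class FailureDetector:
    """Timeout-based detector: may raise false suspicions on slow links."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable so the sketch can be tested
        self.last_reply = {}        # node -> time of its last probe reply

    def record_reply(self, node):
        """Call when a probed node answers."""
        self.last_reply[node] = self.clock()

    def suspects(self, node):
        """Suspect a node that never replied or stayed silent too long."""
        last = self.last_reply.get(node)
        return last is None or self.clock() - last > self.timeout


def first_unsuspected(successor_list, detector):
    """Pick a new successor: the first list entry that is not suspected."""
    for node in successor_list:
        if not detector.suspects(node):
            return node
    return None


now = [0.0]
fd = FailureDetector(timeout=2.0, clock=lambda: now[0])
fd.record_reply("N10")
fd.record_reply("N14")
now[0] = 3.0                 # N10 stays silent past the timeout
fd.record_reply("N14")
print(first_unsuspected(["N10", "N14", "N21"], fd))  # N14
```

If N10 was merely slow, a later reply clears the suspicion. Until then the node routes around it, which is exactly the false-suspicion case listed above.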


The Model?

• Reliable messages among correct nodes
  – No network partitions
• Node failures can be accurately detected!
  – No false suspicions
• Properties hold as long as failure is bounded:
  – Assume a list of $r = \Omega(\log N)$ successors
  – Start from stable state and then each node fails with prob. 1/2
  – Then w.h.p. find_successor returns the closest living successor to the query key
  – And the expected time to execute find_successor is O(log N)


What Can Partitions Do?

[Figure: ring with nodes N0, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56; three places on the ring are marked “Suspect successor”]


What About Moving Keys?

• Left up to the application
• Solution: keep soft state, refreshed periodically
  – Every refresh operation performs lookup(key) before storing the key in the right place
• How can we increase reliability for the time between failure and refresh?
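
The soft-state idea can be sketched as follows. The SoftStateTable class, the refresh helper, the time-to-live value, the node ids, and the simplified lookup (no wrap-around) are all assumptions of this sketch; the only part taken from the slide is that each refresh performs lookup(key) and then stores the key at the node it returns.

```python
class SoftStateTable:
    """Per-node store whose entries expire unless refreshed."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}                      # key -> (value, expiry time)

    def put(self, key, value, now):
        self.entries[key] = (value, now + self.ttl)

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is None or entry[1] <= now:
            return None                        # missing or expired
        return entry[0]


def refresh(key, value, lookup, tables, now):
    """One refresh: look the key up again, then store it at that node."""
    node = lookup(key)
    tables[node].put(key, value, now)
    return node


live = [30, 38]                                # hypothetical live node IDs
tables = {n: SoftStateTable(ttl=10) for n in live}
lookup = lambda key: min(n for n in live if n >= key)  # no wrap-around here

print(refresh(24, "file-pointer", lookup, tables, now=0))   # 30
live.remove(30)                                # N30 fails
print(refresh(24, "file-pointer", lookup, tables, now=5))   # 38
```

After N30 fails, the next refresh lands the key at N38 without any explicit key transfer, and the stale copy at N30 simply expires. Between the failure and that refresh the key is unavailable, which is the gap the last question above asks about.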


Summary: DHT Advantages

• Peer-to-peer: no centralized control or infrastructure

• Scalability: O(log N) routing, routing tables, join time

• Load-balancing
• Overlay robustness


DHT Disadvantages

• No control where data is stored
• In practice, organizations want:
  – Content Locality – explicitly place data where we want (inside the organization)
  – Path Locality – guarantee that local traffic (a user in the organization looks for a file of the organization) remains local

• No prefix search