
Slide 1: Title

CS 525 Advanced Distributed Systems, Spring 2014
Indranil Gupta (Indy)
Lecture 5: Peer to Peer Systems II
February 4, 2014
All Slides © IG

Slide 2: Two Types of P2P Systems

• Non-academic P2P systems: used widely in practice, but with no provable properties
  – e.g., Gnutella, Napster, Grokster, BitTorrent (previous lecture)
• Academic P2P systems: have provable properties, but are less widely used
  – e.g., Chord, Pastry, Kelips (this lecture)

Slide 3: DHT = Distributed Hash Table

• A hash table allows you to insert, look up, and delete objects with keys
• A distributed hash table allows you to do the same in a distributed setting (objects = files)
• Performance concerns:
  – Load balancing
  – Fault-tolerance
  – Efficiency of lookups and inserts
  – Locality
• Napster, Gnutella, FastTrack are all DHTs (sort of)
• So is Chord, a structured peer-to-peer system that we study next

Slide 4: Comparative Performance

            Memory                Lookup Latency   #Messages for a lookup
Napster     O(1) (O(N) @ server)  O(1)             O(1)
Gnutella    O(N)                  O(N)             O(N)

Slide 5: Comparative Performance

            Memory                Lookup Latency   #Messages for a lookup
Napster     O(1) (O(N) @ server)  O(1)             O(1)
Gnutella    O(N)                  O(N)             O(N)
Chord       O(log(N))             O(log(N))        O(log(N))

Slide 6: Chord

• Developers: I. Stoica, D. Karger, F. Kaashoek, H. Balakrishnan, R. Morris (Berkeley and MIT)
• Intelligent choice of neighbors to reduce latency and message cost of routing (lookups/inserts)
• Uses consistent hashing on a node's (peer's) address
  – SHA-1(ip_address, port) → 160-bit string
  – Truncated to m bits
  – Called the peer id (a number between 0 and 2^m - 1)
  – Not unique, but id conflicts are very unlikely
  – Can then map peers to one of 2^m logical points on a circle
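To make the hashing step concrete, here is a minimal Python sketch of deriving an m-bit peer id; the ip address, port, and m = 7 below are made-up values matching the lecture's toy ring, not anything prescribed by Chord itself.

```python
# Minimal sketch: SHA-1(ip_address, port) truncated to m bits gives the peer id.
import hashlib

m = 7  # id bits, so the ring has 2^7 = 128 logical points (as in the lecture's figures)

def peer_id(ip_address: str, port: int) -> int:
    digest = hashlib.sha1(f"{ip_address}:{port}".encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)   # truncate to a number in [0, 2^m - 1]

print(peer_id("192.168.1.5", 4000))   # one of the 128 points on the circle
```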

Slide 7: Ring of peers

[Figure: identifier ring with m = 7; 6 nodes at N16, N32, N45, N80, N96, N112]

Slide 8: Peer pointers (1): successors

[Figure: the same ring (m = 7, peers N16, N32, N45, N80, N96, N112); each peer points to its successor, the next peer clockwise]
(similarly predecessors)

Slide 9: Peer pointers (2): finger tables

[Figure: ring with m = 7; from N80, fingers point to the peers responsible for 80 + 2^0, 80 + 2^1, ..., 80 + 2^6]

The ith entry at the peer with id n is the first peer with id >= n + 2^i (mod 2^m)

Finger Table at N80:
  i : ft[i]
  0 : 96
  1 : 96
  2 : 96
  3 : 96
  4 : 96
  5 : 112
  6 : 16
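To make the finger-table rule concrete, here is a small Python sketch that computes N80's fingers on the example ring. It assumes a globally known, sorted list of peer ids purely for illustration; a real Chord node fills in its fingers via lookups, not from a global view.

```python
# Sketch: ft[i] = first peer with id >= n + 2^i (mod 2^m), on the lecture's example ring.
m = 7
peers = sorted([16, 32, 45, 80, 96, 112])

def successor(key: int) -> int:
    """First peer whose id is >= key, wrapping around the ring."""
    for p in peers:
        if p >= key:
            return p
    return peers[0]          # wrap past 2^m - 1 back to the smallest id

def finger_table(n: int) -> list[int]:
    return [successor((n + 2 ** i) % 2 ** m) for i in range(m)]

print(finger_table(80))      # [96, 96, 96, 96, 96, 112, 16], matching the table above
```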

Slide 10: What about the files?

• Filenames are also mapped using the same consistent hash function
  – SHA-1(filename) → 160-bit string (key)
  – The file is stored at the first peer with id greater than its key (mod 2^m)
• File cnn.com/index.html that maps to key K42 is stored at the first peer with id greater than 42
  – Note that we are considering a different file-sharing application here: cooperative web caching
  – The same discussion applies to any other file-sharing application, including that of mp3 files
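A short sketch of the file-to-peer mapping described above, under the same toy assumptions (globally known peer list, keys truncated to m = 7 bits). Note that the real SHA-1 hash of cnn.com/index.html will generally not be 42; the slide's K42 is an assumed example value.

```python
# Sketch: a file is stored at the first peer whose id is past SHA-1(filename) mod 2^m.
import hashlib

m = 7
peers = sorted([16, 32, 45, 80, 96, 112])

def file_key(filename: str) -> int:
    digest = hashlib.sha1(filename.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def stored_at(key: int) -> int:
    for p in peers:
        if p >= key:
            return p
    return peers[0]          # wrap around the ring

k = file_key("cnn.com/index.html")
print(k, "-> stored at peer", stored_at(k))   # the lecture assumes this key is K42, stored at N45
```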

Slide 11: Mapping Files

[Figure: ring with m = 7, peers N16, N32, N45, N80, N96, N112; the file with key K42 is stored at N45]

Slide 12: Search

[Figure: N80 asks "Who has cnn.com/index.html?" (hashes to K42); the file is stored at N45]

Slide 13: Search

At node n, send the query for key k to the largest successor/finger entry <= k; if none exists, send the query to successor(n)

[Figure: N80's query for K42 ("Who has cnn.com/index.html?") is forwarded along fingers/successors toward N45, where the file is stored]

Slide 14: Search

At node n, send the query for key k to the largest successor/finger entry <= k; if none exists, send the query to successor(n)

All "arrows" are RPCs

[Figure: same routing example; each forwarding hop from N80 toward N45 is an RPC]
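The greedy forwarding rule on the last two slides can be sketched as follows. This is a simplified model (global knowledge of the ring, clockwise distances computed mod 2^m) rather than the Chord paper's exact pseudocode, but on the example ring it reproduces the N80 → N16 → N32 → N45 path shown in the figure.

```python
# Sketch of Chord-style greedy routing on the example ring (toy model only).
m = 7
peers = sorted([16, 32, 45, 80, 96, 112])

def successor(key):
    for p in peers:
        if p >= key:
            return p
    return peers[0]

def finger_table(n):
    return [successor((n + 2 ** i) % 2 ** m) for i in range(m)]

def next_hop(n, k):
    """Largest finger/successor entry that does not overshoot k (clockwise), else successor(n)."""
    entries = finger_table(n) + [successor((n + 1) % 2 ** m)]
    candidates = [f for f in entries if (f - n) % 2 ** m <= (k - n) % 2 ** m]
    if candidates:
        return max(candidates, key=lambda f: (f - n) % 2 ** m)
    return successor((n + 1) % 2 ** m)

def route(start, k):
    path, n = [start], start
    while n != successor(k):              # stop at the node responsible for k
        n = next_hop(n, k)
        path.append(n)
    return path

print(route(80, 42))                      # [80, 16, 32, 45] on this ring
```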

Slide 15: Analysis

Search takes O(log(N)) time.

Proof
– (intuition): at each step, the distance between the query and the peer with the file reduces by a factor of at least 2 (why?). It takes at most m steps: 2^m is at most a constant multiplicative factor above N, so lookup is O(log(N))
– (intuition): after log(N) forwardings, the distance to the key is at most 2^m / N (why?). The number of node identifiers in a range of 2^m / N is O(log(N)) with high probability (why? SHA-1!), so using successors in that range will be ok

[Figure: one forwarding hop from "Here" to "Next hop", at least halving the remaining distance to "Key"]

Slide 16: Analysis (contd.)

• O(log(N)) search time holds for file insertions too (in general, for routing to any key)
  – "Routing" can thus be used as a building block for all operations: insert, lookup, delete
• O(log(N)) time is true only if finger and successor entries are correct
• When might these entries be wrong? When you have failures

Slide 17: Search under peer failures

[Figure: N80's query for K42 ("Who has cnn.com/index.html?") cannot reach N45 because peers along the forwarding path are dead (marked X); the lookup fails since N16 does not know N45]

Slide 18: Search under peer failures

One solution: maintain r multiple successor entries; in case of failure, use the successor entries

[Figure: same ring with a failed peer (marked X); the extra successor entries let the query for K42 route around it to N45]

Slide 19: Search under peer failures

• Choosing r = 2 log2(N) suffices to maintain lookup correctness w.h.p. (i.e., the ring stays connected)
  – Say 50% of nodes fail
  – Pr(at a given node, at least one successor is alive) = 1 - (1/2)^(2 log2 N) = 1 - 1/N^2
  – Pr(the above is true at all alive nodes) = (1 - 1/N^2)^(N/2) ≈ e^(-1/(2N)) ≈ 1
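A quick numeric sanity check of the estimate above (assuming independent failures of exactly half the nodes and r = 2 log2(N) successor entries per node):

```python
# Pr(ring stays connected) ≈ (1 - 1/N^2)^(N/2) when each node keeps r = 2 log2(N) successors.
import math

def pr_ring_connected(N: int) -> float:
    r = 2 * math.log2(N)
    p_some_successor_alive = 1 - 0.5 ** r      # = 1 - 1/N^2 at a given node
    return p_some_successor_alive ** (N / 2)   # must hold at every surviving node

for N in (100, 1000, 10000):
    print(N, pr_ring_connected(N))             # all very close to 1, as argued above
```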

Slide 20: Search under peer failures (2)

[Figure: the query for K42 is routed correctly, but the lookup fails because N45, the peer storing cnn.com/index.html, is itself dead (marked X)]

Slide 21: Search under peer failures (2)

One solution: replicate the file/key at r successors and predecessors

[Figure: K42 is replicated at N45's ring neighbors (e.g., N32 and N80), so the query can still be answered even though N45 is dead (marked X)]

Slide 22: Need to deal with dynamic changes

• Peers fail
• New peers join
• Peers leave
  – P2P systems have a high rate of churn (node join, leave and failure)
    • 25% per hour in Overnet (eDonkey)
    • 100% per hour in Gnutella
    • Lower in managed clusters, e.g., CSIL
    • A common feature in all distributed systems, including wide-area (e.g., PlanetLab), clusters (e.g., Emulab), clouds (e.g., Cirrus), etc.

So, all the time, we need to update successors and fingers, and copy keys

Slide 23: New peers joining

[Figure: ring with m = 7; a new peer N40 joins between N32 and N45]

• The introducer directs N40 to N45 (and N32)
• N32 updates its successor to N40
• N40 initializes its successor to N45, and initializes its fingers from it
• N40 periodically talks to its neighbors to update its finger table

Stabilization Protocol (followed by all nodes)
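A rough, single-process sketch of the join-and-stabilize behavior described above, in the spirit of Chord's stabilization protocol. The Node class, the linear find_successor walk (instead of finger-based routing), and the absence of failure handling are all simplifications for illustration, not the paper's pseudocode.

```python
# Simplified model of Chord join + periodic stabilization (illustrative only).
M = 7

def in_interval(x: int, a: int, b: int) -> bool:
    """True if x lies in the half-open ring interval (a, b] modulo 2^M."""
    if a == b:                                   # a single-node ring covers everything
        return True
    return 0 < (x - a) % 2 ** M <= (b - a) % 2 ** M

class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor = self                    # a lone node is its own successor
        self.predecessor = None

    def find_successor(self, key: int) -> "Node":
        # Linear walk for clarity; real Chord uses fingers for O(log N) hops.
        n = self
        while not in_interval(key, n.id, n.successor.id):
            n = n.successor
        return n.successor

    def join(self, introducer: "Node"):
        """E.g., N40 joining via an introducer that knows the ring."""
        self.predecessor = None
        self.successor = introducer.find_successor(self.id)

    def stabilize(self):
        """Run periodically: adopt a closer successor if one has appeared, then notify it."""
        x = self.successor.predecessor
        if x is not None and in_interval(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate: "Node"):
        """candidate believes it may be our predecessor."""
        if self.predecessor is None or in_interval(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate
```

Calling stabilize() repeatedly at N32, N40, and N45 after N40's join converges the pointers to N32 → N40 → N45, matching the sequence of updates listed on the slide.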

Slide 24: New peers joining (2)

[Figure: same ring after N40 joins]

N40 may need to copy some files/keys from N45 (files with fileid between 32 and 40, e.g., K34 and K38)

Slide 25: New peers joining (3)

• A new peer affects O(log(N)) other finger entries in the system, on average [Why?]
• Number of messages per peer join = O(log(N)*log(N))
• A similar set of operations is used for dealing with peers leaving
  – For dealing with failures, we also need failure detectors (we'll see these later in the course!)

Slide 26: Experimental Results

• The SIGCOMM 01 paper had results from simulation of a C++ prototype
• The SOSP 01 paper had more results from a 12-node Internet testbed deployment
• We'll touch briefly on the first set
• 10,000-peer system

Slide 27: Load Balancing

[Figure: without virtual nodes, the number of keys per real node shows large variance]

Solution: each real node pretends to be r virtual nodes; this gives smaller load variation, and lookup cost becomes O(log(N*r)) = O(log(N) + log(r))
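A small sketch of the virtual-node idea: each real node derives r virtual ids (here, by hashing a hypothetical "ip:port#index" string), and each key is charged to the real owner of the nearest virtual id. The addresses, m, and r below are illustrative assumptions.

```python
# Sketch: r virtual nodes per real node to smooth out load imbalance.
import hashlib
from collections import Counter

m, r = 16, 8
real_nodes = [f"10.0.0.{i}:4000" for i in range(1, 11)]    # 10 illustrative peers

def h(s: str) -> int:
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big") % (2 ** m)

ring = {h(f"{node}#{v}"): node for node in real_nodes for v in range(r)}   # virtual id -> real owner
points = sorted(ring)

def owner(key: int) -> str:
    for p in points:
        if p >= key:
            return ring[p]
    return ring[points[0]]                                   # wrap around

load = Counter(owner(h(f"file{i}")) for i in range(10000))
print(load)   # keys per real node; raising r reduces the variance of these counts
```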

Slide 28: Lookups

[Figure: average messages per lookup vs. number of nodes; grows as log(N), as expected]

Slide 29: Stabilization Protocol

• Concurrent peer joins, leaves, and failures might cause loopiness of pointers, and failure of lookups
  – Chord peers periodically run a stabilization algorithm that checks and updates pointers and keys
  – Ensures non-loopiness of fingers, eventual success of lookups, and O(log(N)) lookups w.h.p.
  – [TechReport on Chord webpage] defines weak and strong notions of stability
  – Each stabilization round at a peer involves a constant number of messages
  – Strong stability takes O(N^2) stabilization rounds (!)

Slide 30: Churn

• When nodes are constantly joining, leaving, failing
  – Significant effect to consider: traces from the Overnet system show hourly peer turnover rates (churn) could be 25-100% of the total number of nodes in the system
  – Leads to excessive (unnecessary) key copying (remember that keys are replicated)
  – The stabilization algorithm may need to consume more bandwidth to keep up
  – The main issue is that files are replicated, while it might be sufficient to replicate only meta-information about files
  – Alternatives
    • Introduce a level of indirection (any p2p system)
    • Replicate metadata more, e.g., Kelips (later in this lecture)

Slide 31: Fault-tolerance

[Figure: lookup failure rate under churn]
• 500 nodes (avg. path length = 5); stabilization runs every 30 s
• 1 node joins & fails every 10 s (3 failures per stabilization round) => 6% of lookups fail
• Numbers look fine here, but does stabilization scale with N? What about concurrent joins & leaves?

Slide 32: Pastry – another DHT

• From Microsoft Research
• Assigns ids to nodes, just like Chord (think of a ring)
• Leaf Set: each node knows its successor(s) and predecessor(s)
• Routing tables are based on prefix matching
  – But select the closest (in network RTT) peers among those that have the same prefix
• Routing is thus based on prefix matching, and is thus log(N)
  – And hops are short (in the underlying network)
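As a rough illustration of the prefix-matching idea, here is a toy Python sketch. The 4-digit hex ids are invented, every node is assumed to know some peer sharing one more digit with the key (Pastry's routing-table row), and the leaf set and RTT-based neighbor choice are omitted.

```python
# Toy prefix routing: each hop moves to a node sharing at least one more id digit with the key.
nodes = ["3a7f", "3a12", "39c0", "a0b1", "a0ff", "37d4"]   # hypothetical 4-digit hex ids

def prefix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def route(start: str, key: str) -> list[str]:
    path, cur = [start], start
    while cur != key:
        p = prefix_len(cur, key)
        candidates = [n for n in nodes if prefix_len(n, key) > p]   # like routing-table row p
        if not candidates:
            return path            # no longer prefix known; the leaf set would finish the job
        cur = min(candidates, key=lambda n: prefix_len(n, key))     # extend the prefix by one digit
        path.append(cur)
    return path

print(route("a0b1", "3a15"))       # ['a0b1', '39c0', '3a7f', '3a12']: one more matching digit per hop
```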

Slide 33

[No text content extracted from this slide]

Slide 34: Wrap-up Notes

• Memory: O(log(N)) successor pointers, m finger/routing entries
• Indirection: store a pointer instead of the actual file
• Does not handle partitions (can you suggest a possible solution?)

Slide 35: Summary of Chord and Pastry

• Chord and Pastry protocols
  – More structured than Gnutella
  – Black-box lookup algorithms
  – Churn handling can get complex
  – O(log(N)) memory and lookup cost
• Can we do better?

Slide 36: Kelips – 1 hop Lookup

• k "affinity groups"
  – k ~ √N
• Each node is hashed to a group
• A node's neighbors:
  – (Almost) all other nodes in its own affinity group
  – One contact node per foreign affinity group
• A file can be stored at any (few) node(s)
• Each filename is hashed to a group
  – All nodes in that group replicate the tuple <filename, file location>
  – The affinity group does not store the files themselves
• Lookup = 1 hop (or a few)
• Memory cost O(√N) is low
  – 1.93 MB for 100K nodes, 10M files

[Figure: affinity groups #0, #1, ..., #k-1, each containing several nodes (e.g., 129, 30, 15, 160, 76, 18, 167)]
• PennyLane.mp3 hashes to group #k-1
• Everyone in that group stores <PennyLane.mp3, who-has-file>
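A toy model of the Kelips placement and lookup described above. The node names, the round-robin group assignment, and the file location are invented for illustration; the real system hashes node ids into groups and maintains the tuples with a gossip-style protocol.

```python
# Sketch: hash each filename to an affinity group; every node in that group
# replicates <filename, location>, so a lookup needs only one hop to a group contact.
import hashlib, math

N = 9
k = round(math.sqrt(N))                                   # number of affinity groups, ~ sqrt(N)
nodes = [f"node{i}" for i in range(N)]
groups = {g: [n for i, n in enumerate(nodes) if i % k == g] for g in range(k)}  # round-robin for simplicity

def group_of(filename: str) -> int:
    return int.from_bytes(hashlib.sha1(filename.encode()).digest(), "big") % k

tuples = {n: {} for n in nodes}                           # per-node <filename, location> metadata

def insert(filename: str, location: str):
    for member in groups[group_of(filename)]:             # every node in the file's group replicates the tuple
        tuples[member][filename] = location

def lookup(filename: str) -> str:
    contact = groups[group_of(filename)][0]               # my one contact in that affinity group
    return tuples[contact][filename]                      # one hop: the contact already has the tuple

insert("PennyLane.mp3", "node7")                          # the file itself stays where it is stored
print(lookup("PennyLane.mp3"))                            # 'node7'
```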

Slide 37: Summary – P2P

• A range of tradeoffs is available
  – Memory vs. lookup cost vs. background bandwidth (to keep neighbors fresh)

Slide 38: Announcements

• No office hours today (talk to me now)
• Next week Thu: student-led presentations start
  – Please read the instructions on the website!
• Next week Thu: reviews start
  – Please read the instructions on the website!

Slide 39: Backup Slides

Slide 40: Wrap-up Notes (3)

• Virtual nodes are good for load balancing, but
  – Effect on peer traffic?
  – Result of churn?
• Current status of the project:
  – File systems (CFS, Ivy) built on top of Chord
  – DNS lookup service built on top of Chord
  – Internet Indirection Infrastructure (I3) project at UCB
  – Spawned research on many interesting issues about p2p systems

http://www.pdos.lcs.mit.edu/chord/