1
Peer-to-Peer Networks
Overlay Network
A logical network laid on top of the Internet
A
B
C
Internet
Logical link AB Logical link BC
The Formal Model
Let V be a set of nodes. The functions
id : V → Z+ assigns a unique id to each node in V
rs : V → {0, 1}* assigns a random bit string to each node in V (optional)
A family of overlay networks is a function ON : F → G, where F is the set of all triples λ = (V, id, rs) and G is the set of all directed graphs. ON associates a unique directed graph ON(λ) ∈ G with each labeled set λ = (V, id, rs) of nodes.
Each node contains one or more objects. One important objective
is SEARCH: any node must be able to access any object as quickly
as possible
Structured vs. Unstructured
Overlay networks:
Unstructured: no restriction on the network topology. Examples: Gnutella, Kazaa, BitTorrent, Skype, etc.
Structured: the network topology satisfies specific invariants. Examples: Chord, CAN, Pastry, Skip Graph, etc.
5
Gnutella
The Gnutella network is a fully distributed
alternative to the centralized Napster.
Initial popularity of the network was spurred
on by Napster's legal demise in early 2001.
6
What is Gnutella?
A protocol for distributed search with no central authority.
[Figure: peers connected to one another, each holding objects (object1, object2).]
7
Remarks
Gnutella uses the simple idea of searching by flooding, but scalability is an issue, since query flooding wastes bandwidth. It uses a TTL (time-to-live) to control the flooding.
Sometimes, existing objects may not be located due to the limited TTL.
Subsequently, various improved search strategies have been proposed.
8
Searching in Gnutella
The topology is dynamic, i.e. constantly changing. How do
we model a constantly changing topology? Usually, we begin
with a static topology, and later account for the effect of churn.
Modeling topology
-- Random graph
-- Power law graph
(measurements provide useful inputs)
9
Random graph: Erdős–Rényi model
A random graph G(n, p) is constructed by starting with
a set of n vertices, and adding edges between pairs of
nodes at random. Every possible edge occurs independently
with probability p.
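To make the definition concrete, here is a minimal sketch of sampling G(n, p); the parameter values are arbitrary:

```python
import random

def gnp(n, p, seed=0):
    """Erdős–Rényi G(n, p): include each of the n*(n-1)/2 possible
    edges independently with probability p."""
    random.seed(seed)
    return [(u, v) for u in range(n) for v in range(u + 1, n)
            if random.random() < p]

edges = gnp(1000, 0.01)
# expected number of edges is p * n * (n-1) / 2 = 4995
print(len(edges))
```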
Q. Is Gnutella topology a random graph?
10
Gnutella topology
Gnutella topology is almost a power-law graph (also called a scale-free graph).
What is a power-law graph? The number of nodes with degree k is c·k^(-r).
(Contrast this with a Gaussian distribution, where the number of nodes with degree k is c·2^(-k).)
Many graphs in nature exhibit power-law characteristics. Examples: the world-wide web (the number of pages that have k in-links is proportional to k^(-2)); the fraction of scientific papers that receive k citations is proportional to k^(-3); etc.
11
AT&T Call Graph
[Figure: log–log plot. X-axis: # of telephone numbers called; Y-axis: # of telephone numbers from which calls were made. The plot answers: how many telephone numbers receive calls from k different telephone numbers?]
12
[Figure: Gnutella network power-law link distribution (summer 2000, data provided by Clip2). Log–log plot of proportion of nodes vs. number of neighbors; power-law fit exponent = 2.07.]
13
A possible explanation
Nodes join at different times. The more connections a node has, the more likely it is to acquire new connections ("the rich get richer"); similarly, popular web pages attract new pointers. It has been mathematically shown that such a growth process produces a power-law network.
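The "rich get richer" growth process can be sketched as a small preferential-attachment simulation; the seed triangle, n, m, and the sampling trick are illustrative choices, not from the slides:

```python
import random
from collections import Counter

def preferential_attachment(n, m=2, seed=1):
    """Grow a graph: each new node attaches to m distinct existing
    nodes, chosen with probability proportional to current degree."""
    random.seed(seed)
    edges = [(0, 1), (1, 2), (0, 2)]   # start from a triangle
    # 'repeated' lists each endpoint once per unit of degree, so a
    # uniform draw from it is a degree-proportional draw
    repeated = [0, 1, 1, 2, 2, 0]
    for new in range(3, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(random.choice(repeated))
        for t in chosen:
            edges.append((new, t))
            repeated += [new, t]
    return edges

edges = preferential_attachment(5000)
deg = Counter()
for u, v in edges:
    deg[u] += 1
    deg[v] += 1
avg = sum(deg.values()) / len(deg)
print(max(deg.values()), round(avg, 1))  # hubs: max degree >> average
```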
14
Search strategies
• Flooding
• Random walk
  – Biased random walk
  – Multiple-walker random walk
combined with:
• One-hop replication
• Two-hop replication
• k-hop replication
15
On Random walk
Let p(d) be the probability that a random walk on a d-dimensional lattice returns to the origin. In 1921, Pólya proved that
(1) p(1) = p(2) = 1, but
(2) p(d) < 1 for d > 2.
There are similar results on two walkers meeting each other via random walk.
16
Search via random walk
Existence of a path does not necessarily mean that such a path can be discovered.
17
Search via Random Walk
Search metrics
Delay = discovery time in hops
Overhead = total distance covered by the walker
Both should be as small as possible.
For a single random walker, these are equal. Using k random walkers is a compromise.
For search by flooding, if delay = h then overhead = d + d^2 + … + d^h, where d = degree of a node.
18
A simple analysis of random walk
Let p = population of the object, i.e. the fraction of nodes hosting the object, and T = TTL (time to live).

Hop count h    Probability of success
1              p
2              (1-p)·p
3              (1-p)^2·p
…              …
T              (1-p)^(T-1)·p
19
A simple analysis of random walk
Expected hop count:
E(h) = 1·p + 2·(1-p)·p + 3·(1-p)^2·p + … + T·(1-p)^(T-1)·p
     = (1/p)·(1 - (1-p)^T) - T·(1-p)^T
With a large TTL, E(h) ≈ 1/p.
With a small TTL, there is a risk that the search will time out before an existing object is located.
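A quick sanity check of the closed form against the direct sum (the values of p and T are arbitrary):

```python
def expected_hops(p, T):
    """Closed form: E(h) = (1/p)*(1 - (1-p)**T) - T*(1-p)**T."""
    return (1.0 / p) * (1 - (1 - p) ** T) - T * (1 - p) ** T

def expected_hops_direct(p, T):
    """Direct sum of h * Pr[success exactly at hop h]."""
    return sum(h * (1 - p) ** (h - 1) * p for h in range(1, T + 1))

p, T = 0.01, 1000
assert abs(expected_hops(p, T) - expected_hops_direct(p, T)) < 1e-9
print(round(expected_hops(p, T), 2))  # close to 1/p = 100 when T >> 1/p
```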
20
K random walkers
As k increases, the overhead increases, but the delay decreases: there is a tradeoff.
Assume all k walkers start in unison. The probability that none finds the object after one hop is (1-p)^k, and the probability that none has succeeded after T hops is (1-p)^(kT). So the probability that at least one walker has succeeded is 1 - (1-p)^(kT). A typical assumption is that the search is abandoned as soon as at least one walker succeeds. Using these, one can derive a new value of E(h).
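A tiny sketch of the k-walker success probability (parameter values are arbitrary):

```python
def success_prob(p, k, T):
    """Probability that at least one of k independent walkers
    finds the object within T hops: 1 - (1-p)**(k*T)."""
    return 1 - (1 - p) ** (k * T)

# more walkers -> higher success probability for the same TTL
p, T = 0.01, 50
probs = [round(success_prob(p, k, T), 3) for k in (1, 2, 4, 8)]
print(probs)
```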
21
Increasing search efficiency
Major strategies
1. Biased walk utilizing node degree heterogeneity.
2. Utilizing structural properties like random graph,
power-law graphs, or small-world properties
3. Topology adaptation for faster search
4. Introducing two layers in the graph structure using
supernodes
22
One hop replication
Each node keeps track of the indices of the files belonging to its immediate neighbors. As a result, high-capacity / high-degree nodes can provide much better clues for a large number of search queries.
23
Biased random walk
Each node records the degrees of its neighboring nodes (in the figure, the walker moves with probabilities P = 5/10, 3/10, 2/10). Search easily gravitates towards high-degree nodes that hold more clues.
24
Deterministic biased walk
[Figure: number of nodes found vs. steps on a power-law graph.]
25
The next step
This growing surge in popularity revealed the limits of the initial
protocol's scalability. In early 2001, variations on the protocol
improved the scalability. Instead of treating every node as client
and server, some resource-rich nodes were used as ultrapeers
or “supernodes,” containing indices of the objects in the local
neighborhood. Search requests and responses were routed
through them leading to faster response.
26
The KaZaA approach
Powerful nodes (supernodes) act as local index servers, and client queries are propagated to other supernodes: a two-layered architecture.
[Figure: a client asks its supernode "Where is ABC?"; the query is forwarded to another supernode, and the file ABC is then downloaded directly from the peer holding it.]
The Chord P2P Network
Some slides have been borrowed from the original presentation by the authors
Main features of Chord
-- Load balancing via consistent hashing
-- Small routing tables per node: log n
-- Small routing delay: log n hops
-- Fast join/leave protocol (polylog time)
Consistent Hashing
-- Assigns both nodes and objects an m-bit key.
-- Orders the nodes around an identifier circle (what does a circle mean here?) according to the order of their keys (0 .. 2^m - 1). This ring is known as the Chord ring.
-- An object with key k is assigned to the first node whose key is ≥ k (called the successor node of key k).
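As a minimal sketch (not the actual Chord implementation), the successor rule can be expressed with a sorted list and binary search; m = 7 and the nodes N32, N90, N105 follow the example figure:

```python
import bisect

m = 7
SPACE = 2 ** m          # 7-bit identifier circle: IDs 0..127
nodes = sorted([32, 90, 105])

def successor(key):
    """First node on the ring whose ID is >= key (wrapping around)."""
    i = bisect.bisect_left(nodes, key % SPACE)
    return nodes[i % len(nodes)]

# keys K5, K20, K80 land on N32, N32, N90 under the rule above
print([successor(k) for k in (5, 20, 80)])  # [32, 32, 90]
```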
Nodes and Objects on the Chord Ring
[Figure: circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80.]
A key k is stored at its successor (the node with key ≥ k).
Consistent Hashing [Karger 97]
Property 1. If there are N nodes and K object keys, then with high probability, each node is responsible for (1+ε)K/N objects.
Property 2. When a node joins or leaves the network, the responsibility for O(K/N) keys changes hands (only to or from the node that is joining or leaving).
When K is large, the impact is quite small.
The log N Fingers
Each node knows of only log N other nodes.
[Figure: circular ID space; the distances of N80's fingers from N80 are 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring.]
Finger i points to the successor of n + 2^i. For example, N80's finger for 112 points to N120.
Chord Finger Table
[Figure: ring with m = 7 (N = 128 IDs) and nodes N32, N40, N52, N60, N70, N79, N80, N85, N102, N113.]
N32's finger table:
33..33  N40
34..35  N40
36..39  N40
40..47  N40
48..63  N52
64..95  N70
96..31  N102
Node n's i-th entry: the first node ≥ n + 2^(i-1).
The finger table actually contains the ID and the IP address.
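The finger-table rule can be checked with a short sketch, assuming a global view of the m = 7 ring from the figure (a real Chord node would learn its fingers via the protocol, not from a global node list):

```python
import bisect

m = 7
SPACE = 2 ** m
nodes = sorted([32, 40, 52, 60, 70, 79, 80, 85, 102, 113])

def successor(key):
    i = bisect.bisect_left(nodes, key % SPACE)
    return nodes[i % len(nodes)]

def finger_table(n):
    """Entry i (1-based) is the successor of n + 2**(i-1) mod 2**m."""
    return [successor((n + 2 ** (i - 1)) % SPACE) for i in range(1, m + 1)]

print(finger_table(32))  # matches N32's table: [40, 40, 40, 40, 52, 70, 102]
```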
Lookup
Node 32, lookup(82): 32 → 70 → 80 → 85 (greedy routing).
N32's finger table:
33..33  N40
34..35  N40
36..39  N40
40..47  N40
48..63  N52
64..95  N70
96..31  N102
N70's finger table:
71..71   N79
72..73   N79
74..77   N79
78..85   N80
86..101  N102
102..5   N102
6..69    N32
N80's finger table:
81..81   N85
82..83   N85
84..87   N85
88..95   N102
96..111  N102
112..15  N113
16..79   N32
New Node Join
[Figure: ring with m = 7 and nodes N32, N40, N52, N60, N70, N80, N102, N113; new node N20 joins.]
N20's finger-table intervals (successors not yet known):
1: 21..21
2: 22..23
3: 24..27
4: 28..35
5: 36..51
6: 52..83
7: 84..19
Assume that the new node N20 knows one of the existing nodes.
New Node Join (2)
[Figure: the ring with N20 joined.]
N20's finger table:
21..21  N32
22..23  N32
24..27  N32
28..35  N32
36..51  N40
52..83  N52
84..19  N102
Node 20 asks that node to locate the successors of 21, 22, 24, 28, 36, 52, and 84.
The Join procedure
The new node id asks a gateway node n to find the successor of id.

n.find_successor(id)
    if id ∈ (n, successor]
        then return successor
        else forward the query around the circle
    fi

This needs O(n) messages for a simple Chord ring, which is slow.
Steps in join
[Figure: the new node id is spliced in between n and successor(n), like a linked-list insert. But the transition does not happen immediately.]
A More Efficient Join

// ask n to find the successor of id
n.find_successor(id)
    if id ∈ (n, successor]
        then return successor
        else n' = closest_preceding_node(id)
             return n'.find_successor(id)
    fi

// search for the highest predecessor of id
n.closest_preceding_node(id)
    for i = log N downto 1
        if finger[i] ∈ (n, id)
            then return finger[i]
    return n
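The pseudocode above can be turned into a runnable sketch. This is an illustration, not the real distributed protocol: fingers are computed from a global view, and N79 is omitted from the node set so the greedy route matches the 32 → 70 → 80 → 85 lookup shown earlier:

```python
import bisect

m = 7
SPACE = 2 ** m
nodes = sorted([32, 40, 52, 60, 70, 80, 85, 102, 113])

def in_interval(x, a, b, incl_right=False):
    """Is x in the circular interval (a, b), or (a, b] if incl_right?"""
    x, a, b = x % SPACE, a % SPACE, b % SPACE
    if a < b:
        return a < x < b or (incl_right and x == b)
    return x > a or x < b or (incl_right and x == b)

def successor(key):
    i = bisect.bisect_left(nodes, key % SPACE)
    return nodes[i % len(nodes)]

def fingers(n):
    # entry i (0-based) is the successor of n + 2**i, as in the slides
    return [successor((n + 2 ** i) % SPACE) for i in range(m)]

def find_successor(n, key, path):
    path.append(n)
    succ = successor((n + 1) % SPACE)       # n's immediate successor
    if in_interval(key, n, succ, incl_right=True):
        return succ
    for f in reversed(fingers(n)):          # closest preceding node
        if in_interval(f, n, key):
            return find_successor(f, key, path)
    return find_successor(succ, key, path)  # fall back to successor

path = []
result = find_successor(32, 82, path)
print(path, result)  # path [32, 70, 80], answer 85
```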
Example
[Figure: the ring with N20 joined, and key K65.]
N20 wants to find out the successor of key 65.
After join, move objects
[Figure: the ring with N20 joined.]
N20's finger table:
21..21  N32
22..23  N32
24..27  N32
28..35  N32
36..51  N40
52..83  N52
84..19  N102
Node 20 moves documents D114..20 (keys in 114..20) from node 32. Notify the nodes that must include N20 in their tables: N113[1] = N20, not N32.
Three steps in join
Step 1. Initialize the predecessor and fingers of the new node.
Step 2. Update the predecessor and the fingers of the existing nodes. (Thus, notify nodes that must include N20 in their table: N113[1] = N20, not N32.)
Step 3. Transfer objects to the new node as appropriate.
(Knowledge of the predecessor is useful in stabilization.)
Concurrent Join
[Figure: two new nodes n and n' join concurrently between n1 and n2; the before and after configurations are shown.]
Stabilization
[Figure: new node n has joined between n1 and its old successor n2; n2's predecessor pointer now points to n.]
Since successor(n1).predecessor ≠ n1, node n1 adopts successor(n1).predecessor = n as its new successor.
Periodic stabilization is needed to integrate the new node into the network and restore the invariant.
The complexity of join
With high probability, any node joining or leaving an N-node Chord network will use O(log² N) messages to re-establish the Chord routing invariants and finger tables.
Chord Summary
• O(log N) lookup messages and table space.
• Well-defined location for each ID.
• Natural load balance due to consistent hashing.
• No name structure imposed.
• Minimal join/leave disruption.
Chord: Advanced issues
Analysis
Theorem. Search takes O(log N) time.
(Here 2^m = size of the key space and N = number of nodes.)
Proof. Each forwarding step at least halves the distance to the key, so after log N forwarding steps the distance to the key is at most 2^m / 2^(log N) = 2^m / N. The number of nodes in that remaining range is O(log N) with high probability (a property of consistent hashing). So by following successors in that range, it will take at most an additional O(log N) forwarding steps.
Analysis (contd.)
The O(log N) search time holds if the finger and successor entries are correct. But what if these entries are wrong (which is possible during join or leave operations, or after a process crash)?
Search under peer failures
[Figure: ring with m = 7 and nodes N16, N32, N45, N80, N96, N112. The file abcnews.com, whose name hashes to key K42, is stored at N45.]
N32 has crashed, so N16's lookup for K42 fails (N16 does not know N45).
Search under peer failures
[Figure: the same ring; N32 has failed.]
One solution: maintain r multiple successor entries; in case of a failure, use the other successor entries.
Reactive vs. proactive approach.
Search under peer failures
Choosing r = 2 log N suffices to maintain correctness "with high probability." Say 50% of the nodes fail (i.e. the probability of failure is 1/2). For a given node,
Pr(at least one successor alive) = 1 - (1/2)^(2 log N) = 1 - 1/N².
Search under peer failures (2)
[Figure: the same ring; now N45, which stores the file abcnews.com with key K42, has failed.]
The lookup fails (N45 is dead).
Search under peer failures (2)
One solution: replicate the file/key at r successors and predecessors.
[Figure: K42 is replicated at nodes near N45, so the lookup for abcnews.com (key K42) succeeds even though N45 is dead.]
Dealing with dynamic issues
Peers fail
New peers join
Peers leave
Need to update successors and fingers, and ensure keys reside in the right places
New peers joining
[Figure: ring with m = 7 and nodes N16, N32, N45, N80, N96, N112; new node N40 joins.]
-- Some gateway node directs N40 to its successor N45.
-- N32 updates its successor to N40.
-- N40 initializes its successor to N45, and obtains fingers from it.
-- N40 periodically talks to its neighbors to update its finger table (the stabilization protocol).
New peers joining (2)
[Figure: the same ring, with N40 now in place between N32 and N45.]
N40 may need to copy some files/keys from N45 (files with file id between 32 and 40, e.g. K34, K38).
Concurrent join
[Figure: ring with m = 7; new nodes N20, N24, N28 join concurrently between N16 and N32; keys K24 and K38 shown.]
Argue that each node will eventually be reachable.
Effect of join on lookup
If in a stable network with N nodes, another set
of N nodes joins the network, and the join
protocol correctly sets their successors, then
lookups will take O(log N) steps w.h.p
Effect of join on lookup
[Figure: the ring with newly joined nodes N20, N24, N28 and key K24. The transfer of K24 is pending, so a linear scan via successors will still locate K24.]
Consistent hashing guarantees that there will be O(log N) new nodes w.h.p. between two consecutive old nodes.
Weak and Strong Stabilization
[Figure: a ring on nodes N1, N3, N5, N24, N63, N78, N96.]
For every node u, successor(predecessor(u)) = u. Still, the network is weakly stable but not strongly stable. Why?
Loopy network
What is funny / awkward about this?
Weakly stable: for every node u, successor(predecessor(u)) = u.
Strongly stable: in addition, there is no node v with u < v < successor(u) (the existence of such a v must be false for strong stability).
[Figure: a loopy ring on nodes N1, N3, N5, N24, N63, N78, N96 that is weakly stable but not strongly stable.]
Strong stabilization
The key idea of recovery from loopiness: let each node u ask its successor to walk around the ring until it reaches a node v : u < v ≤ successor(u). If
∃v : u < v < successor(u)
then loopiness exists, and successor(u) is reset to v.
This takes O(N²) steps, but loopiness is a rare event. No protocol exists for recovery from a split ring.
New peers joining (3)
• A new peer affects O(log N) other finger entries in the system, so the number of messages per peer join is O(log N · log N) = O(log² N).
• A similar set of operations handles peers leaving.
Bidirectional Chord
Each node u has fingers to
u+1, u+2, u+4, u+8, … as well as u-1, u-2, u-4, u-8, …
How does it help?
Skip Lists and Skip Graphs
Some slides adapted from the original slides by
James Aspnes and Gauri Shah
68
Definition of Skip List
A skip list for a set L of distinct (key, element) items is a series of linked lists L0, L1, …, Lh such that:
-- Each list Li contains the special keys -∞ and +∞.
-- List L0 contains the keys of L in non-decreasing order.
-- Each list is a subsequence of the previous one, i.e., L0 ⊇ L1 ⊇ … ⊇ Lh.
-- List Lh contains only the two special keys -∞ and +∞.
69
Skip List
A dictionary based on a probabilistic data structure that supports efficient search, insert, and delete operations. Each element in the dictionary typically stores additional useful information besides its search key. Examples:
<student id, transcript> [for the University of Iowa]
<date, news> [for the Daily Iowan]
A probabilistic alternative to a balanced tree.
70
Skip List
[Figure:
Level 2:  HEAD - J - TAIL
Level 1:  HEAD - A - J - M - TAIL
Level 0:  HEAD - A - G - J - M - R - W - TAIL]
Each node is linked at the next higher level with probability 1/2.
71
Another example
L2:  31
L1:  23  31  34  64
L0:  12  23  26  31  34  44  56  64  78
Each element of Li appears in Li+1 with probability p. Higher levels act as express lanes.
72
Searching in Skip List
Search for a key x in a skip list as follows:
-- Start at the first position of the top list.
-- At the current position P, compare x with y = key(after(P)):
   x = y → return element(after(P))
   x > y → "scan forward"
   x < y → "drop down"
-- If we move past the bottom list, then no such key exists.
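The scan-forward / drop-down loop can be sketched over explicit level lists, a stand-in for the real linked structure (the lists are the 12..78 example above):

```python
levels = [
    [12, 23, 26, 31, 34, 44, 56, 64, 78],  # L0: all keys
    [23, 31, 34, 64],                      # L1
    [31],                                  # L2
]

def search(levels, x):
    """Return True iff x is in the skip list.

    'pred' is the largest key <= x seen so far; at each level we scan
    forward past smaller keys, then drop down to the next level."""
    pred = float("-inf")
    for level in reversed(levels):         # top level first
        for key in level:
            if key < pred:                 # before our drop-down point
                continue
            if key > x:                    # overshot: drop down
                break
            pred = key                     # scan forward
    return pred == x

print(search(levels, 78), search(levels, 77))
```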
73
Example of search for 78
L2:  31
L1:  23  31  34  64
L0:  12  23  26  31  34  44  56  64  78
At L1, P reaches 64; the next key at that level is bigger than 78, so we drop down.
At L0, 78 = 78, so the search is over.
74
Insertion
• The insert algorithm uses randomization to decide at how many levels the new item <k> should be added to the skip list.
• After inserting the new item at the bottom level, flip a coin.
• If it comes up tails, the insertion is complete. Otherwise, move to the next higher level, insert <k> at the appropriate position in that level, and repeat the coin flip.
75
Insertion Example
1) Suppose we want to insert 15.
2) Do a search, and find the spot between 10 and 23.
3) Suppose the coin comes up "heads" three times.
Before:
L2:  23
L1:  23
L0:  10  23  36
After:
L3:  15
L2:  15  23
L1:  15  23
L0:  10  15  23  36
76
Deletion
• Search for the given key <k>. If a position with key <k> is not found, then no such key exists.
• Otherwise, if a position with key <k> is found (it will definitely be found on the bottom level), remove all occurrences of <k> from every level.
• If the uppermost level becomes empty, remove it.
77
Deletion Example
1) Suppose we want to delete 34.
2) Do a search, and find the spot between 23 and 45.
3) Remove all the positions holding 34, at every level.
Before:
L2:  34
L1:  23  34
L0:  12  23  34  45
After (the now-empty top level is removed):
L1:  23
L0:  12  23  45
78
Constant number of pointers
Average number of pointers per node = O(1).
Total number of pointers = 2·n + 2·n/2 + 2·n/4 + 2·n/8 + … = 4·n.
So the average number of pointers per node is 4.
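A quick Monte Carlo check of this count, assuming promotion probability 1/2 and two pointers (prev/next) per level a node appears in:

```python
import random

random.seed(0)
n = 200_000
total_pointers = 0
for _ in range(n):
    levels = 1                      # every node is at level 0
    while random.random() < 0.5:    # promoted with probability 1/2
        levels += 1
    total_pointers += 2 * levels    # 2 pointers per level
print(total_pointers / n)           # close to 4
```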
79
Number of levels
Pr[a given element x is above level c·log n] = (1/2)^(c·log n) = 1/n^c.
Pr[any element is above level c·log n] ≤ n · (1/n^c) = 1/n^(c-1).
So the number of levels is O(log n) w.h.p.
80
Search time
Consider a skip list with two levels, L0 and L1. To search for a key, first search L1 and then search L0.
Cost (i.e. search time) = length(L1) + n / length(L1).
This is minimized when length(L1) = n / length(L1), i.e. length(L1) = n^(1/2), giving cost = 2·n^(1/2).
(Three lists) minimum cost = 3·n^(1/3).
(log n lists) minimum cost = log n · n^(1/log n) = 2·log n.
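A small numeric check of the two-level tradeoff (n = 10,000 is an arbitrary choice):

```python
def two_level_cost(len_l1, n):
    """Search cost with two levels: scan L1, then one segment of L0."""
    return len_l1 + n / len_l1

n = 10_000
best = min(range(1, n + 1), key=lambda L: two_level_cost(L, n))
print(best, two_level_cost(best, n))  # best = sqrt(n) = 100, cost = 200
```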
81
Skip lists for P2P?
Advantages
• O(log n) expected search time.
• Retains locality.
• Dynamic node additions/deletions.
Disadvantages
• Heavily loaded top-level nodes.
• Easily susceptible to failures.
• Lacks redundancy.
82
A Skip Graph
[Figure: six nodes A, G, J, M, R, W with membership vectors (A:001, G:100, J:001, M:011, R:110, W:101). Level 0 is a single sorted doubly linked list of all nodes; at level i, nodes whose membership vectors share a prefix of length i form a list.]
Link at level i to nodes with a matching prefix of length i. Think of a tree of skip lists that share their lower layers.
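The level structure can be sketched by grouping nodes on membership-vector prefixes; the vectors below are the (assumed) ones from the figure:

```python
vectors = {"A": "001", "G": "100", "J": "001",
           "M": "011", "R": "110", "W": "101"}

def level_lists(i):
    """Group nodes (kept in key order) by the length-i prefix of
    their membership vector; each group is one level-i list."""
    groups = {}
    for node in sorted(vectors):            # level-0 order = key order
        groups.setdefault(vectors[node][:i], []).append(node)
    return groups

print(level_lists(0))  # one list with every node
print(level_lists(1))  # {'0': ['A', 'J', 'M'], '1': ['G', 'R', 'W']}
print(level_lists(2))
```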
83
Properties of skip graphs
1. Efficient searching.
2. Efficient node insertions & deletions.
3. Independence from system size.
4. Locality and range queries.
84
Searching: avg. O(log n)
Same performance as DHTs.
[Figure: levels 0–2 of the skip graph on A, G, J, M, R, W.]
Restricting to the lists containing the starting element of the search, we get a skip list.
85
Node Insertion – 1
[Figure: new node J (membership vector 001) joins the skip graph on A, G, M, R, W.]
Starting at the buddy node, find the nearest key at level 0. This takes O(log n) time on average.
86
Node Insertion – 2
At each level i, find the nearest node with a matching prefix of the membership vector of length i+1.
[Figure: J (vector 001) is linked in at each successive level.]
Total time for insertion: O(log n). DHTs take O(log² n).
87
Independent of system size
No need to know the size of the keyspace or the number of nodes.
[Figure: initially nodes E and Z with 1-bit membership vectors; when J is inserted, vectors are extended (e.g. to 2 bits) and a new level appears.]
Old nodes extend their membership vectors as required as new nodes arrive. DHTs require knowledge of the keyspace size initially.
88
Locality and range queries
• Find a key < F, or > F.
• Find the largest key < x.
• Find the least key > x.
• Find all keys in the interval [D..O].
• Initial node insertion is at level 0.
[Figure: level-0 lists over the keys D, F, A, I and D, F, A, I, L, O, S.]
89
Applications of locality
e.g. Find the latest news from yesterday: find the largest key < news:02/13.
Level 0: news:02/09, news:02/10, news:02/11, news:02/12
e.g. Find any copy of some Britney Spears song.
Level 0: britney01, britney02, britney03, britney04, britney05
DHTs cannot do this easily, as hashing destroys locality.
Applications: data replication, version control.
90
Load balancing
We are interested in the average load on a node u, i.e. the number of searches from a source s to a destination t that use node u.
Theorem: Let dist(u, t) = d. Then the probability that a search from s to t passes through u is < 2/(d+1), where V = {nodes v : u ≤ v ≤ t} and |V| = d+1.
91
Skip list restriction
[Figure: node u and the lists of s at levels 0, 1, 2.]
Node u is on the search path from s to t only if it is in the skip list formed from the lists of s at each level.
92
Tallest nodes
Node u is on the search path from s to t only if it is in T = the set of k tallest nodes in [u..t].
[Figure: when u is not among the tallest nodes between u and t, u is not on the path; when it is, u is on the path.]
Pr[u ∈ T] = Σ_{k=1}^{d+1} Pr[|T| = k] · k/(d+1) = E[|T|]/(d+1).
Heights are independent of position, so distances are symmetric.