1
Peer-to-Peer Networks
Overlay Network
A logical network laid on top of the Internet
A
B
C
Internet
Logical link AB Logical link BC
The Formal Model
Let V be a set of nodes. The functions
id : V → Z+ assigns a unique id to each node in V
rs : V → {0, 1}* assigns a random bit string to each node in V (optional)
A family of overlay networks is a function ON : F → G, where F is the set of all triples λ = (V, id, rs) and G is the set of all directed graphs. ON associates a unique directed graph ON(λ) ∈ G with each labeled set λ = (V, id, rs) of nodes.
Each node contains one or more objects. One important objective
is SEARCH: any node must be able to access any object as quickly
as possible
Structured vs. Unstructured
Overlay networks:
Unstructured: no restriction on the network topology. Examples: Gnutella, Kazaa, BitTorrent, Skype, etc.
Structured: the network topology satisfies specific invariants. Examples: Chord, CAN, Pastry, Skip Graph, etc.
5
Gnutella
The Gnutella network is a fully distributed
alternative to the centralized Napster.
Initial popularity of the network was spurred
on by Napster's legal demise in early 2001.
6
What is Gnutella?
A protocol for distributed search with no central authority.
[Figure: peers connected to one another, each holding objects (object1, object2).]
7
Remarks
Gnutella uses the simple idea of searching by flooding, but scalability is an issue, since query flooding wastes bandwidth. It uses a TTL (time-to-live) to control the flooding.
Sometimes, existing objects may not be located due to the limited TTL.
Subsequently, various improved search strategies have been proposed.
8
Searching in Gnutella
The topology is dynamic, i.e. constantly changing. How do
we model a constantly changing topology? Usually, we begin
with a static topology, and later account for the effect of churn.
Modeling topology
-- Random graph
-- Power law graph
(measurements provide useful inputs)
9
Random graph: Erdős–Rényi model
A random graph G(n, p) is constructed by starting with
a set of n vertices, and adding edges between pairs of
nodes at random. Every possible edge occurs independently
with probability p.
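To make the definition concrete, here is a minimal sketch of sampling G(n, p); the parameter values are arbitrary:

```python
import random

def gnp(n, p, seed=0):
    """Erdős–Rényi G(n, p): include each of the n*(n-1)/2 possible
    edges independently with probability p."""
    random.seed(seed)
    return [(u, v) for u in range(n) for v in range(u + 1, n)
            if random.random() < p]

edges = gnp(1000, 0.01)
# expected number of edges is p * n * (n-1) / 2 = 4995
print(len(edges))
```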
Q. Is Gnutella topology a random graph?
10
Gnutella topology
Gnutella topology is almost a power-law graph (also called a scale-free graph).
What is a power-law graph? The number of nodes with degree k is c·k^(-r).
(Contrast this with a Gaussian distribution, where the number of nodes with degree k is c·2^(-k).)
Many graphs in nature exhibit power-law characteristics. Examples: the world-wide web (the number of pages that have k in-links is proportional to k^(-2)); the fraction of scientific papers that receive k citations is proportional to k^(-3); etc.
11
AT&T Call Graph
[Figure: log–log plot. X-axis: # of telephone numbers called; Y-axis: # of telephone numbers from which calls were made. The plot answers: how many telephone numbers receive calls from k different telephone numbers?]
12
[Figure: Gnutella network power-law link distribution (summer 2000, data provided by Clip2). Log–log plot of proportion of nodes vs. number of neighbors; power-law fit exponent = 2.07.]
13
A possible explanation
Nodes join at different times. The more connections a node has, the more likely it is to acquire new connections ("the rich get richer"); similarly, popular web pages attract new pointers. It has been mathematically shown that such a growth process produces a power-law network.
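The "rich get richer" growth process can be sketched as a small preferential-attachment simulation; the seed triangle, n, m, and the sampling trick are illustrative choices, not from the slides:

```python
import random
from collections import Counter

def preferential_attachment(n, m=2, seed=1):
    """Grow a graph: each new node attaches to m distinct existing
    nodes, chosen with probability proportional to current degree."""
    random.seed(seed)
    edges = [(0, 1), (1, 2), (0, 2)]   # start from a triangle
    # 'repeated' lists each endpoint once per unit of degree, so a
    # uniform draw from it is a degree-proportional draw
    repeated = [0, 1, 1, 2, 2, 0]
    for new in range(3, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(random.choice(repeated))
        for t in chosen:
            edges.append((new, t))
            repeated += [new, t]
    return edges

edges = preferential_attachment(5000)
deg = Counter()
for u, v in edges:
    deg[u] += 1
    deg[v] += 1
avg = sum(deg.values()) / len(deg)
print(max(deg.values()), round(avg, 1))  # hubs: max degree >> average
```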
14
Search strategies
• Flooding
• Random walk
  – Biased random walk
  – Multiple-walker random walk
combined with:
• One-hop replication
• Two-hop replication
• k-hop replication
15
On Random walk
Let p(d) be the probability that a random walk on a d-dimensional lattice returns to the origin. In 1921, Pólya proved that
(1) p(1) = p(2) = 1, but
(2) p(d) < 1 for d > 2.
There are similar results on two walkers meeting each other via random walk.
16
Search via random walk
Existence of a path does not necessarily mean that such a path can be discovered.
17
Search via Random Walk
Search metrics
Delay = discovery time in hops
Overhead = total distance covered by the walker
Both should be as small as possible.
For a single random walker, these are equal. Using k random walkers is a compromise.
For search by flooding, if delay = h then overhead = d + d^2 + … + d^h, where d = degree of a node.
18
A simple analysis of random walk
Let p = population of the object, i.e. the fraction of nodes hosting the object, and T = TTL (time to live).

Hop count h    Probability of success
1              p
2              (1-p)·p
3              (1-p)^2·p
…              …
T              (1-p)^(T-1)·p
19
A simple analysis of random walk
Expected hop count:
E(h) = 1·p + 2·(1-p)·p + 3·(1-p)^2·p + … + T·(1-p)^(T-1)·p
     = (1/p)·(1 - (1-p)^T) - T·(1-p)^T
With a large TTL, E(h) ≈ 1/p.
With a small TTL, there is a risk that the search will time out before an existing object is located.
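A quick sanity check of the closed form against the direct sum (the values of p and T are arbitrary):

```python
def expected_hops(p, T):
    """Closed form: E(h) = (1/p)*(1 - (1-p)**T) - T*(1-p)**T."""
    return (1.0 / p) * (1 - (1 - p) ** T) - T * (1 - p) ** T

def expected_hops_direct(p, T):
    """Direct sum of h * Pr[success exactly at hop h]."""
    return sum(h * (1 - p) ** (h - 1) * p for h in range(1, T + 1))

p, T = 0.01, 1000
assert abs(expected_hops(p, T) - expected_hops_direct(p, T)) < 1e-9
print(round(expected_hops(p, T), 2))  # close to 1/p = 100 when T >> 1/p
```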
20
K random walkers
As k increases, the overhead increases, but the delay decreases: there is a tradeoff.
Assume all k walkers start in unison. The probability that none finds the object after one hop is (1-p)^k, and the probability that none has succeeded after T hops is (1-p)^(kT). So the probability that at least one walker has succeeded is 1 - (1-p)^(kT). A typical assumption is that the search is abandoned as soon as at least one walker succeeds. Using these, one can derive a new value of E(h).
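A tiny sketch of the k-walker success probability (parameter values are arbitrary):

```python
def success_prob(p, k, T):
    """Probability that at least one of k independent walkers
    finds the object within T hops: 1 - (1-p)**(k*T)."""
    return 1 - (1 - p) ** (k * T)

# more walkers -> higher success probability for the same TTL
p, T = 0.01, 50
probs = [round(success_prob(p, k, T), 3) for k in (1, 2, 4, 8)]
print(probs)
```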
21
Increasing search efficiency
Major strategies
1. Biased walk utilizing node degree heterogeneity.
2. Utilizing structural properties like random graph,
power-law graphs, or small-world properties
3. Topology adaptation for faster search
4. Introducing two layers in the graph structure using
supernodes
22
One hop replication
Each node keeps track of the indices of the files belonging to its immediate neighbors. As a result, high-capacity / high-degree nodes can provide much better clues for a large number of search queries.
23
Biased random walk
Each node records the degrees of its neighboring nodes (in the figure, the walker moves with probabilities P = 5/10, 3/10, 2/10). Search easily gravitates towards high-degree nodes that hold more clues.
24
Deterministic biased walk
[Figure: number of nodes found vs. steps on a power-law graph.]
25
The next step
This growing surge in popularity revealed the limits of the initial
protocol's scalability. In early 2001, variations on the protocol
improved the scalability. Instead of treating every node as client
and server, some resource-rich nodes were used as ultrapeers
or “supernodes,” containing indices of the objects in the local
neighborhood. Search requests and responses were routed
through them leading to faster response.
26
The KaZaA approach
Powerful nodes (supernodes) act as local index servers, and client queries are propagated to other supernodes: a two-layered architecture.
[Figure: a client asks its supernode "Where is ABC?"; the query is forwarded to another supernode, and the file ABC is then downloaded directly from the peer holding it.]
The Chord P2P Network
Some slides have been borrowed from the original presentation by the authors
Main features of Chord
-- Load balancing via consistent hashing
-- Small routing tables per node: log n
-- Small routing delay: log n hops
-- Fast join/leave protocol (polylog time)
Consistent Hashing
-- Assigns both nodes and objects an m-bit key.
-- Orders the nodes around an identifier circle (what does a circle mean here?) according to the order of their keys (0 .. 2^m - 1). This ring is known as the Chord ring.
-- An object with key k is assigned to the first node whose key is ≥ k (called the successor node of key k).
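As a minimal sketch (not the actual Chord implementation), the successor rule can be expressed with a sorted list and binary search; m = 7 and the nodes N32, N90, N105 follow the example figure:

```python
import bisect

m = 7
SPACE = 2 ** m          # 7-bit identifier circle: IDs 0..127
nodes = sorted([32, 90, 105])

def successor(key):
    """First node on the ring whose ID is >= key (wrapping around)."""
    i = bisect.bisect_left(nodes, key % SPACE)
    return nodes[i % len(nodes)]

# keys K5, K20, K80 land on N32, N32, N90 under the rule above
print([successor(k) for k in (5, 20, 80)])  # [32, 32, 90]
```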
Nodes and Objects on the Chord Ring
[Figure: circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80.]
A key k is stored at its successor (the node with key ≥ k).
Consistent Hashing [Karger 97]
Property 1. If there are N nodes and K object keys, then with high probability, each node is responsible for (1+ε)K/N objects.
Property 2. When a node joins or leaves the network, the responsibility for O(K/N) keys changes hands (only to or from the node that is joining or leaving).
When K is large, the impact is quite small.
The log N Fingers
Each node knows of only log N other nodes.
[Figure: circular ID space; the distances of N80's fingers from N80 are 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring.]
Finger i points to the successor of n + 2^i. For example, N80's finger for 112 points to N120.
Chord Finger Table
[Figure: ring with m = 7 (N = 128 IDs) and nodes N32, N40, N52, N60, N70, N79, N80, N85, N102, N113.]
N32's finger table:
33..33  N40
34..35  N40
36..39  N40
40..47  N40
48..63  N52
64..95  N70
96..31  N102
Node n's i-th entry: the first node ≥ n + 2^(i-1).
The finger table actually contains the ID and the IP address.
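The finger-table rule can be checked with a short sketch, assuming a global view of the m = 7 ring from the figure (a real Chord node would learn its fingers via the protocol, not from a global node list):

```python
import bisect

m = 7
SPACE = 2 ** m
nodes = sorted([32, 40, 52, 60, 70, 79, 80, 85, 102, 113])

def successor(key):
    i = bisect.bisect_left(nodes, key % SPACE)
    return nodes[i % len(nodes)]

def finger_table(n):
    """Entry i (1-based) is the successor of n + 2**(i-1) mod 2**m."""
    return [successor((n + 2 ** (i - 1)) % SPACE) for i in range(1, m + 1)]

print(finger_table(32))  # matches N32's table: [40, 40, 40, 40, 52, 70, 102]
```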
Lookup
Node 32, lookup(82): 32 → 70 → 80 → 85 (greedy routing).
N32's finger table:
33..33  N40
34..35  N40
36..39  N40
40..47  N40
48..63  N52
64..95  N70
96..31  N102
N70's finger table:
71..71   N79
72..73   N79
74..77   N79
78..85   N80
86..101  N102
102..5   N102
6..69    N32
N80's finger table:
81..81   N85
82..83   N85
84..87   N85
88..95   N102
96..111  N102
112..15  N113
16..79   N32
New Node Join
[Figure: ring with m = 7 and nodes N32, N40, N52, N60, N70, N80, N102, N113; new node N20 joins.]
N20's finger-table intervals (successors not yet known):
1: 21..21
2: 22..23
3: 24..27
4: 28..35
5: 36..51
6: 52..83
7: 84..19
Assume that the new node N20 knows one of the existing nodes.
New Node Join (2)
[Figure: the ring with N20 joined.]
N20's finger table:
21..21  N32
22..23  N32
24..27  N32
28..35  N32
36..51  N40
52..83  N52
84..19  N102
Node 20 asks that node to locate the successors of 21, 22, 24, 28, 36, 52, and 84.
The Join procedure
The new node id asks a gateway node n to find the successor of id.

n.find_successor(id)
    if id ∈ (n, successor]
        then return successor
        else forward the query around the circle
    fi

This needs O(n) messages for a simple Chord ring, which is slow.
Steps in join
[Figure: the new node id is spliced in between n and successor(n), like a linked-list insert. But the transition does not happen immediately.]
A More Efficient Join

// ask n to find the successor of id
n.find_successor(id)
    if id ∈ (n, successor]
        then return successor
        else n' = closest_preceding_node(id)
             return n'.find_successor(id)
    fi

// search for the highest predecessor of id
n.closest_preceding_node(id)
    for i = log N downto 1
        if finger[i] ∈ (n, id)
            then return finger[i]
    return n
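The pseudocode above can be turned into a runnable sketch. This is an illustration, not the real distributed protocol: fingers are computed from a global view, and N79 is omitted from the node set so the greedy route matches the 32 → 70 → 80 → 85 lookup shown earlier:

```python
import bisect

m = 7
SPACE = 2 ** m
nodes = sorted([32, 40, 52, 60, 70, 80, 85, 102, 113])

def in_interval(x, a, b, incl_right=False):
    """Is x in the circular interval (a, b), or (a, b] if incl_right?"""
    x, a, b = x % SPACE, a % SPACE, b % SPACE
    if a < b:
        return a < x < b or (incl_right and x == b)
    return x > a or x < b or (incl_right and x == b)

def successor(key):
    i = bisect.bisect_left(nodes, key % SPACE)
    return nodes[i % len(nodes)]

def fingers(n):
    # entry i (0-based) is the successor of n + 2**i, as in the slides
    return [successor((n + 2 ** i) % SPACE) for i in range(m)]

def find_successor(n, key, path):
    path.append(n)
    succ = successor((n + 1) % SPACE)       # n's immediate successor
    if in_interval(key, n, succ, incl_right=True):
        return succ
    for f in reversed(fingers(n)):          # closest preceding node
        if in_interval(f, n, key):
            return find_successor(f, key, path)
    return find_successor(succ, key, path)  # fall back to successor

path = []
result = find_successor(32, 82, path)
print(path, result)  # path [32, 70, 80], answer 85
```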
Example
[Figure: the ring with N20 joined, and key K65.]
N20 wants to find out the successor of key 65.
After join, move objects
[Figure: the ring with N20 joined.]
N20's finger table:
21..21  N32
22..23  N32
24..27  N32
28..35  N32
36..51  N40
52..83  N52
84..19  N102
Node 20 moves documents D114..20 (keys in 114..20) from node 32. Notify the nodes that must include N20 in their tables: N113[1] = N20, not N32.
Three steps in join
Step 1. Initialize the predecessor and fingers of the new node.
Step 2. Update the predecessor and the fingers of the existing nodes. (Thus, notify nodes that must include N20 in their table: N113[1] = N20, not N32.)
Step 3. Transfer objects to the new node as appropriate.
(Knowledge of the predecessor is useful in stabilization.)
Concurrent Join
[Figure: two new nodes n and n' join concurrently between n1 and n2; the before and after configurations are shown.]
Stabilization
[Figure: new node n has joined between n1 and its old successor n2; n2's predecessor pointer now points to n.]
Since successor(n1).predecessor ≠ n1, node n1 adopts successor(n1).predecessor = n as its new successor.
Periodic stabilization is needed to integrate the new node into the network and restore the invariant.
The complexity of join
With high probability, any node joining or leaving an N-node Chord network will use O(log² N) messages to re-establish the Chord routing invariants and finger tables.
Chord Summary
• O(log N) lookup messages and table space.
• Well-defined location for each ID.
• Natural load balance due to consistent hashing.
• No name structure imposed.
• Minimal join/leave disruption.
Chord: Advanced issues
Analysis
Theorem. Search takes O(log N) time.
(Here 2^m = size of the key space and N = number of nodes.)
Proof. Each forwarding step at least halves the distance to the key, so after log N forwarding steps the distance to the key is at most 2^m / 2^(log N) = 2^m / N. The number of nodes in that remaining range is O(log N) with high probability (a property of consistent hashing). So by following successors in that range, it will take at most an additional O(log N) forwarding steps.
Analysis (contd.)
The O(log N) search time holds if the finger and successor entries are correct. But what if these entries are wrong (which is possible during join or leave operations, or after a process crash)?
Search under peer failures
[Figure: ring with m = 7 and nodes N16, N32, N45, N80, N96, N112. The file abcnews.com, whose name hashes to key K42, is stored at N45.]
N32 has crashed, so N16's lookup for K42 fails (N16 does not know N45).
Search under peer failures
[Figure: the same ring; N32 has failed.]
One solution: maintain r multiple successor entries; in case of a failure, use the other successor entries.
Reactive vs. proactive approach.
Search under peer failures
Choosing r = 2 log N suffices to maintain correctness "with high probability." Say 50% of the nodes fail (i.e. the probability of failure is 1/2). For a given node,
Pr(at least one successor alive) = 1 - (1/2)^(2 log N) = 1 - 1/N².
Search under peer failures (2)
[Figure: the same ring; now N45, which stores the file abcnews.com with key K42, has failed.]
The lookup fails (N45 is dead).
Search under peer failures (2)
One solution: replicate the file/key at r successors and predecessors.
[Figure: K42 is replicated at nodes near N45, so the lookup for abcnews.com (key K42) succeeds even though N45 is dead.]
Dealing with dynamic issues
Peers fail
New peers join
Peers leave
Need to update successors and fingers, and ensure keys reside in the right places
New peers joining
[Figure: ring with m = 7 and nodes N16, N32, N45, N80, N96, N112; new node N40 joins.]
-- Some gateway node directs N40 to its successor N45.
-- N32 updates its successor to N40.
-- N40 initializes its successor to N45, and obtains fingers from it.
-- N40 periodically talks to its neighbors to update its finger table (the stabilization protocol).
New peers joining (2)
[Figure: the same ring, with N40 now in place between N32 and N45.]
N40 may need to copy some files/keys from N45 (files with file id between 32 and 40, e.g. K34, K38).
Concurrent join
[Figure: ring with m = 7; new nodes N20, N24, N28 join concurrently between N16 and N32; keys K24 and K38 shown.]
Argue that each node will eventually be reachable.
Effect of join on lookup
If in a stable network with N nodes, another set
of N nodes joins the network, and the join
protocol correctly sets their successors, then
lookups will take O(log N) steps w.h.p
Effect of join on lookup
[Figure: the ring with newly joined nodes N20, N24, N28 and key K24. The transfer of K24 is pending, so a linear scan via successors will still locate K24.]
Consistent hashing guarantees that there will be O(log N) new nodes w.h.p. between two consecutive old nodes.
Weak and Strong Stabilization
[Figure: a ring on nodes N1, N3, N5, N24, N63, N78, N96.]
For every node u, successor(predecessor(u)) = u. Still, the network is weakly stable but not strongly stable. Why?
Loopy network
What is funny / awkward about this?
Weakly stable: for every node u, successor(predecessor(u)) = u.
Strongly stable: in addition, there is no node v with u < v < successor(u) (the existence of such a v must be false for strong stability).
[Figure: a loopy ring on nodes N1, N3, N5, N24, N63, N78, N96 that is weakly stable but not strongly stable.]
Strong stabilization
The key idea of recovery from loopiness: let each node u ask its successor to walk around the ring until it reaches a node v : u < v ≤ successor(u). If
∃v : u < v < successor(u)
then loopiness exists, and successor(u) is reset to v.
This takes O(N²) steps, but loopiness is a rare event. No protocol exists for recovery from a split ring.
New peers joining (3)
• A new peer affects O(log N) other finger entries in the system, so the number of messages per peer join is O(log N · log N) = O(log² N).
• A similar set of operations handles peers leaving.
Bidirectional Chord
Each node u has fingers to
u+1, u+2, u+4, u+8, … as well as u-1, u-2, u-4, u-8, …
How does it help?
Skip Lists and Skip Graphs
Some slides adapted from the original slides by
James Aspnes and Gauri Shah
68
Definition of Skip List
A skip list for a set L of distinct (key, element) items is a series of linked lists L0, L1, …, Lh such that:
-- Each list Li contains the special keys -∞ and +∞.
-- List L0 contains the keys of L in non-decreasing order.
-- Each list is a subsequence of the previous one, i.e., L0 ⊇ L1 ⊇ … ⊇ Lh.
-- List Lh contains only the two special keys -∞ and +∞.
69
Skip List
A dictionary based on a probabilistic data structure that supports efficient search, insert, and delete operations. Each element in the dictionary typically stores additional useful information besides its search key. Examples:
<student id, transcript> [for the University of Iowa]
<date, news> [for the Daily Iowan]
A probabilistic alternative to a balanced tree.
70
Skip List
[Figure:
Level 2:  HEAD - J - TAIL
Level 1:  HEAD - A - J - M - TAIL
Level 0:  HEAD - A - G - J - M - R - W - TAIL]
Each node is linked at the next higher level with probability 1/2.
71
Another example
L2:  31
L1:  23  31  34  64
L0:  12  23  26  31  34  44  56  64  78
Each element of Li appears in Li+1 with probability p. Higher levels act as express lanes.
72
Searching in Skip List
Search for a key x in a skip list as follows:
-- Start at the first position of the top list.
-- At the current position P, compare x with y = key(after(P)):
   x = y → return element(after(P))
   x > y → "scan forward"
   x < y → "drop down"
-- If we move past the bottom list, then no such key exists.
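The scan-forward / drop-down loop can be sketched over explicit level lists, a stand-in for the real linked structure (the lists are the 12..78 example above):

```python
levels = [
    [12, 23, 26, 31, 34, 44, 56, 64, 78],  # L0: all keys
    [23, 31, 34, 64],                      # L1
    [31],                                  # L2
]

def search(levels, x):
    """Return True iff x is in the skip list.

    'pred' is the largest key <= x seen so far; at each level we scan
    forward past smaller keys, then drop down to the next level."""
    pred = float("-inf")
    for level in reversed(levels):         # top level first
        for key in level:
            if key < pred:                 # before our drop-down point
                continue
            if key > x:                    # overshot: drop down
                break
            pred = key                     # scan forward
    return pred == x

print(search(levels, 78), search(levels, 77))
```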
73
Example of search for 78
L2:  31
L1:  23  31  34  64
L0:  12  23  26  31  34  44  56  64  78
At L1, P reaches 64; the next key at that level is bigger than 78, so we drop down.
At L0, 78 = 78, so the search is over.
74
Insertion
• The insert algorithm uses randomization to decide at how many levels the new item <k> should be added to the skip list.
• After inserting the new item at the bottom level, flip a coin.
• If it comes up tails, the insertion is complete. Otherwise, move to the next higher level, insert <k> at the appropriate position in that level, and repeat the coin flip.
75
Insertion Example
1) Suppose we want to insert 15.
2) Do a search, and find the spot between 10 and 23.
3) Suppose the coin comes up "heads" three times.
Before:
L2:  23
L1:  23
L0:  10  23  36
After:
L3:  15
L2:  15  23
L1:  15  23
L0:  10  15  23  36
76
Deletion
• Search for the given key <k>. If a position with key <k> is not found, then no such key exists.
• Otherwise, if a position with key <k> is found (it will definitely be found on the bottom level), remove all occurrences of <k> from every level.
• If the uppermost level becomes empty, remove it.
77
Deletion Example
1) Suppose we want to delete 34.
2) Do a search, and find the spot between 23 and 45.
3) Remove all the positions holding 34, at every level.
Before:
L2:  34
L1:  23  34
L0:  12  23  34  45
After (the now-empty top level is removed):
L1:  23
L0:  12  23  45
78
Constant number of pointers
Average number of pointers per node = O(1).
Total number of pointers = 2·n + 2·n/2 + 2·n/4 + 2·n/8 + … = 4·n.
So the average number of pointers per node is 4.
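A quick Monte Carlo check of this count, assuming promotion probability 1/2 and two pointers (prev/next) per level a node appears in:

```python
import random

random.seed(0)
n = 200_000
total_pointers = 0
for _ in range(n):
    levels = 1                      # every node is at level 0
    while random.random() < 0.5:    # promoted with probability 1/2
        levels += 1
    total_pointers += 2 * levels    # 2 pointers per level
print(total_pointers / n)           # close to 4
```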
79
Number of levels
Pr[a given element x is above level c·log n] = (1/2)^(c·log n) = 1/n^c.
Pr[any element is above level c·log n] ≤ n · (1/n^c) = 1/n^(c-1).
So the number of levels is O(log n) w.h.p.
80
Search time
Consider a skip list with two levels, L0 and L1. To search for a key, first search L1 and then search L0.
Cost (i.e. search time) = length(L1) + n / length(L1).
This is minimized when length(L1) = n / length(L1), i.e. length(L1) = n^(1/2), giving cost = 2·n^(1/2).
(Three lists) minimum cost = 3·n^(1/3).
(log n lists) minimum cost = log n · n^(1/log n) = 2·log n.
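A small numeric check of the two-level tradeoff (n = 10,000 is an arbitrary choice):

```python
def two_level_cost(len_l1, n):
    """Search cost with two levels: scan L1, then one segment of L0."""
    return len_l1 + n / len_l1

n = 10_000
best = min(range(1, n + 1), key=lambda L: two_level_cost(L, n))
print(best, two_level_cost(best, n))  # best = sqrt(n) = 100, cost = 200
```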
81
Skip lists for P2P?
Advantages
• O(log n) expected search time.
• Retains locality.
• Dynamic node additions/deletions.
Disadvantages
• Heavily loaded top-level nodes.
• Easily susceptible to failures.
• Lacks redundancy.
82
A Skip Graph
[Figure: six nodes A, G, J, M, R, W with membership vectors (A:001, G:100, J:001, M:011, R:110, W:101). Level 0 is a single sorted doubly linked list of all nodes; at level i, nodes whose membership vectors share a prefix of length i form a list.]
Link at level i to nodes with a matching prefix of length i. Think of a tree of skip lists that share their lower layers.
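The level structure can be sketched by grouping nodes on membership-vector prefixes; the vectors below are the (assumed) ones from the figure:

```python
vectors = {"A": "001", "G": "100", "J": "001",
           "M": "011", "R": "110", "W": "101"}

def level_lists(i):
    """Group nodes (kept in key order) by the length-i prefix of
    their membership vector; each group is one level-i list."""
    groups = {}
    for node in sorted(vectors):            # level-0 order = key order
        groups.setdefault(vectors[node][:i], []).append(node)
    return groups

print(level_lists(0))  # one list with every node
print(level_lists(1))  # {'0': ['A', 'J', 'M'], '1': ['G', 'R', 'W']}
print(level_lists(2))
```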
83
Properties of skip graphs
1. Efficient searching.
2. Efficient node insertions & deletions.
3. Independence from system size.
4. Locality and range queries.
84
Searching: avg. O(log n)
Same performance as DHTs.
[Figure: levels 0–2 of the skip graph on A, G, J, M, R, W.]
Restricting to the lists containing the starting element of the search, we get a skip list.
85
Node Insertion – 1
[Figure: new node J (membership vector 001) joins the skip graph on A, G, M, R, W.]
Starting at the buddy node, find the nearest key at level 0. This takes O(log n) time on average.
86
Node Insertion – 2
At each level i, find the nearest node with a matching prefix of the membership vector of length i+1.
[Figure: J (vector 001) is linked in at each successive level.]
Total time for insertion: O(log n). DHTs take O(log² n).
87
Independent of system size
No need to know the size of the keyspace or the number of nodes.
[Figure: initially nodes E and Z with 1-bit membership vectors; when J is inserted, vectors are extended (e.g. to 2 bits) and a new level appears.]
Old nodes extend their membership vectors as required as new nodes arrive. DHTs require knowledge of the keyspace size initially.
88
Locality and range queries
• Find a key < F, or > F.
• Find the largest key < x.
• Find the least key > x.
• Find all keys in the interval [D..O].
• Initial node insertion is at level 0.
[Figure: level-0 lists over the keys D, F, A, I and D, F, A, I, L, O, S.]
89
Applications of locality
e.g. Find the latest news from yesterday: find the largest key < news:02/13.
Level 0: news:02/09, news:02/10, news:02/11, news:02/12
e.g. Find any copy of some Britney Spears song.
Level 0: britney01, britney02, britney03, britney04, britney05
DHTs cannot do this easily, as hashing destroys locality.
Applications: data replication, version control.
90
Load balancing
We are interested in the average load on a node u, i.e. the number of searches from a source s to a destination t that use node u.
Theorem: Let dist(u, t) = d. Then the probability that a search from s to t passes through u is < 2/(d+1), where V = {nodes v : u ≤ v ≤ t} and |V| = d+1.
91
Skip list restriction
[Figure: node u and the lists of s at levels 0, 1, 2.]
Node u is on the search path from s to t only if it is in the skip list formed from the lists of s at each level.
92
Tallest nodes
Node u is on the search path from s to t only if it is in T = the set of k tallest nodes in [u..t].
[Figure: when u is not among the tallest nodes between u and t, u is not on the path; when it is, u is on the path.]
Pr[u ∈ T] = Σ_{k=1}^{d+1} Pr[|T| = k] · k/(d+1) = E[|T|]/(d+1).
Heights are independent of position, so distances are symmetric.