LOOKING UP DATA IN P2P SYSTEMS
![Page 1: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/1.jpg)
LOOKING UP DATA IN P2P SYSTEMS
Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica
MIT LCS
![Page 2: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/2.jpg)
Key Idea
• Survey paper
• Discusses how to access data in a P2P system
• Covers four solutions
  – CAN
  – Chord
  – Pastry
  – Tapestry
![Page 3: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/3.jpg)
INTRODUCTION
• P2P systems are popular due to
  – Low startup cost
  – High scalability at very low cost
  – Use of resources that would otherwise remain unused
  – Potential for greater robustness
• Fully decentralized and distributed
![Page 4: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/4.jpg)
The lookup problem
• How do we locate data in large P2P systems?
• One solution
  – Distributed hash tables (DHT)
![Page 5: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/5.jpg)
Previous solutions (I)
• Centralized database
  – Napster
    • Not scalable
    • Vulnerable to attacks on database
![Page 6: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/6.jpg)
Previous solutions (II)
• Broadcasting
  – Customers broadcast their requests to their neighbors, which forward them to their own neighbors and so on
  – Gnutella
  – Does not scale either
    • Broadcast messages consume too much bandwidth
![Page 7: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/7.jpg)
Previous solutions (III)
• Internet DNS
  – Organizes network nodes into a hierarchy
  – All searches start at top of hierarchy
    • Propagate down
  – Used by KaZaA, Grokster and others
  – Nodes higher in the tree do much more work than lower nodes
  – Solution vulnerable to loss of root node(s)
![Page 8: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/8.jpg)
Previous solutions (IV)
• Freenet
  – Forwards queries from node to node until requested data are found
  – Emphasis is on anonymity
    • Not performance
• Unpopular documents may become inaccessible
  – Nobody cares!
![Page 9: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/9.jpg)
DISTRIBUTED HASH TABLES
• Implements primitive lookup(key)
  – Produces a path going from a node n0 to the node holding key
• Big tradeoff is between
  – Keeping paths short
  – Minimizing state information kept by nodes
![Page 10: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/10.jpg)
Main design issues
• Mapping keys to nodes in a balanced way
  – Use a hash function
• Forwarding a lookup for a key to appropriate node
  – Find at each step a node closer to the node holding the key
• Building routing tables
  – Each node should have a successor
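The first design issue, mapping keys to an ID space in a balanced way, can be illustrated with a short sketch. The choice of SHA-1, the 16-bit ID space, and the function names `key_to_id` and `bucket_counts` are assumptions made for illustration; the slides only say "use a hash function".

```python
import hashlib

def key_to_id(key: str, bits: int = 16) -> int:
    """Map a key to an integer ID in [0, 2**bits) (SHA-1 is an assumed choice)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

def bucket_counts(num_keys: int, buckets: int, bits: int = 16) -> list:
    """Count how many synthetic keys fall into each equal slice of the ID space."""
    counts = [0] * buckets
    for i in range(num_keys):
        ident = key_to_id(f"key-{i}", bits)
        counts[ident * buckets // (2 ** bits)] += 1
    return counts

if __name__ == "__main__":
    # With a good hash function the slices receive roughly equal shares.
    print(bucket_counts(10000, 8))
```

Because the hash output is close to uniform, each slice of the ID space (and hence each node responsible for a slice) ends up with a similar number of keys.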
![Page 11: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/11.jpg)
CAN
• Uses a d-dimensional key space
  – Partitioned into hyper-rectangles
    • "Zones"
  – Each node manages a zone
    • Responsible for all keys in zone
![Page 12: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/12.jpg)
Neighbors
• Each node keeps track of addresses of all its neighbors
  – Routing table
• Neighbors are defined as nodes sharing a (d-1) dimensional hyper-plane
  – Contacts with fewer dimensions in common do not count
![Page 13: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/13.jpg)
A two-dimensional example (I)
![Page 14: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/14.jpg)
A two-dimensional example (II)
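[Figure: unit square with corners (0, 0), (1, 0), (1, 1) and (0, 1), partitioned into six zones, each written as (x_min, y_min; x_max, y_max)]
• (0, 0; 0.5, 0.5)
• (0, 0.5; 0.5, 1)
• (0.5, 0.5; 1, 1)
• (0.5, 0.25; 0.75, 0.5)
• (0.5, 0; 0.75, 0.25)
• (0.75, 0; 1, 0.5)
In reality the state space wraps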
![Page 15: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/15.jpg)
A path from (0.25, 0.3) to (0.8, 0.8)
[Figure: the same six zones as in the two-dimensional example, with a path drawn from the point (0.25, 0.3) to the point (0.8, 0.8)]
In reality the state space wraps
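A minimal sketch of routing over the six zones of the two-dimensional example follows. Several things here are assumptions rather than part of the slides: the zone labels A to F, the greedy rule "forward to the neighbor whose zone is closest to the target point", and the decision to ignore the wrap-around of the coordinate space. The path it finds is therefore not necessarily the one drawn on the slide.

```python
import math

# Zones as (x_min, y_min, x_max, y_max); the labels A-F are made up here.
ZONES = {
    "A": (0.0, 0.0, 0.5, 0.5),
    "B": (0.0, 0.5, 0.5, 1.0),
    "C": (0.5, 0.5, 1.0, 1.0),
    "D": (0.5, 0.25, 0.75, 0.5),
    "E": (0.5, 0.0, 0.75, 0.25),
    "F": (0.75, 0.0, 1.0, 0.5),
}

def contains(zone, point):
    """Half-open containment, so every point in [0, 1) x [0, 1) has one owner."""
    x0, y0, x1, y1 = zone
    return x0 <= point[0] < x1 and y0 <= point[1] < y1

def owner(point):
    """Label of the zone holding the point (point must lie in [0, 1) x [0, 1))."""
    return next(name for name, zone in ZONES.items() if contains(zone, point))

def are_neighbors(a, b):
    """Zones are neighbors if they share an edge (a corner alone does not count)."""
    overlap_x = min(a[2], b[2]) - max(a[0], b[0])
    overlap_y = min(a[3], b[3]) - max(a[1], b[1])
    return (overlap_x == 0 and overlap_y > 0) or (overlap_y == 0 and overlap_x > 0)

def distance_to_zone(zone, point):
    """Euclidean distance from a point to the nearest point of a zone."""
    x0, y0, x1, y1 = zone
    dx = max(x0 - point[0], 0.0, point[0] - x1)
    dy = max(y0 - point[1], 0.0, point[1] - y1)
    return math.hypot(dx, dy)

def route(source, target, max_hops=16):
    """Greedy routing: hop to the neighbor zone closest to the target point."""
    current = owner(source)
    path = [current]
    while not contains(ZONES[current], target):
        if len(path) > max_hops:
            raise RuntimeError("no route found within max_hops")
        neighbors = [n for n in ZONES if are_neighbors(ZONES[current], ZONES[n])]
        current = min(neighbors, key=lambda n: distance_to_zone(ZONES[n], target))
        path.append(current)
    return path

if __name__ == "__main__":
    print(route((0.25, 0.3), (0.8, 0.8)))
```

Each node only needs the coordinates of its own neighbors to make the forwarding decision, which is why CAN's per-node state depends on d rather than on the number of nodes.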
![Page 16: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/16.jpg)
Lookup
• Routing tries to approximate the straight path between current zone and zone holding the key
• Various optimizations attempt to reduce lookup latency
![Page 17: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/17.jpg)
Dynamic behavior
• When a node joins the network
  – It picks a random point in space
  – Finds the node managing the zone containing that point
  – Splits that node's current zone with it
• When a node departs
  – Zones are merged
    • More complex process
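The zone split performed on a join can be sketched as below. The rule used here (halve the zone along its longer side, and along x on a tie) is an assumption chosen to keep the example short; the slides do not say how the split dimension is chosen, and `split_zone` is a hypothetical helper name.

```python
def split_zone(zone):
    """Split a 2-D zone (x_min, y_min, x_max, y_max) into two equal halves.

    Assumed rule: cut across the longer side, preferring x on a tie.
    The existing node keeps one half and the joining node takes the other.
    """
    x0, y0, x1, y1 = zone
    if (x1 - x0) >= (y1 - y0):
        mid = (x0 + x1) / 2
        return (x0, y0, mid, y1), (mid, y0, x1, y1)
    mid = (y0 + y1) / 2
    return (x0, y0, x1, mid), (x0, mid, x1, y1)

if __name__ == "__main__":
    left, right = split_zone((0.5, 0.0, 1.0, 0.5))
    print(left, right)
    print(split_zone(left))
```

Applied twice to the lower-right quarter of the unit square, this rule reproduces the three small zones of the two-dimensional example: (0.75, 0; 1, 0.5), (0.5, 0; 0.75, 0.25) and (0.5, 0.25; 0.75, 0.5).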
![Page 18: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/18.jpg)
Fault-tolerance
• When a node fails, the neighbor with the smallest zone takes over
  – Multiple failures may cause too many nodes to handle multiple zones
![Page 19: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/19.jpg)
CHORD
• Assigns ID's to keys and nodes in the same address space
• ID's are organized in a ring– ID 0 follows the highest ID
• Each node is responsible for all keys that immediately precede it in the key space
![Page 20: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/20.jpg)
Example
[Figure: Chord ring with nodes N4, N12, N20 and N24 and keys K1, K6, K10 and K15]
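The assignment rule (a key belongs to the first node at or after it on the ring) can be checked against the node and key IDs of this example with a small sketch. The function name `successor` and the use of a sorted list with binary search are illustrative choices, not something the slides prescribe.

```python
import bisect

def successor(node_ids, key_id):
    """Return the first node ID at or after key_id, wrapping around the ring."""
    ids = sorted(node_ids)
    index = bisect.bisect_left(ids, key_id)
    return ids[index % len(ids)]  # past the highest ID, wrap to the lowest

if __name__ == "__main__":
    nodes = [4, 12, 20, 24]
    for key in (1, 6, 10, 15):
        print(f"K{key} -> N{successor(nodes, key)}")
```

Under this rule K1 is stored at N4, K6 and K10 at N12, and K15 at N20; a key larger than 24 would wrap around to N4.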
![Page 21: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/21.jpg)
Finger table
• Each node keeps a table containing IP addresses of nodes
  – Halfway around in the key space
  – Quarter-of-the-way around
  – …
• Table has log N entries
  – Allows O(log N) searches
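A sketch of how a finger table could be built and used follows. It assumes a 5-bit ID space (32 IDs), which is simply the smallest power of two that fits the example node IDs 4, 12, 20 and 24; the slides do not state the ring size. Finger i of node n points at the successor of n + 2^i, so the last finger is halfway around, the one before it a quarter of the way, and so on. The names `finger_table`, `clockwise` and `lookup` are hypothetical.

```python
import bisect

M = 5                    # assumed ID length in bits
RING = 2 ** M            # 32 IDs on the ring
NODES = [4, 12, 20, 24]  # sorted node IDs from the example

def successor(ident):
    """First node at or after ident, wrapping around the ring."""
    index = bisect.bisect_left(NODES, ident % RING)
    return NODES[index % len(NODES)]

def finger_table(n):
    """Finger i is the successor of n + 2**i (distances 1, 2, 4, 8, 16)."""
    return [successor(n + 2 ** i) for i in range(M)]

def clockwise(a, b):
    """Clockwise distance from a to b; a full turn when a == b."""
    return (b - a) % RING or RING

def lookup(start, key):
    """Return (node responsible for key, nodes visited on the way)."""
    n, path = start, [start]
    # Continue while the key is not in the interval (n, successor of n].
    while clockwise(n, key) > clockwise(n, successor(n + 1)):
        # Jump to the farthest finger that still lies strictly before the key.
        n = next(f for f in reversed(finger_table(n))
                 if 0 < (f - n) % RING < clockwise(n, key))
        path.append(n)
    return successor(n + 1), path

if __name__ == "__main__":
    print(finger_table(4))
    print(lookup(4, 22))
```

Each hop at least halves the remaining clockwise distance to the key, which is where the O(log N) bound on search length comes from.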
![Page 22: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/22.jpg)
Partial example
[Figure: Chord ring with nodes N4, N12, N20 and N24]
![Page 23: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/23.jpg)
Fault-tolerance
• Each node has a successor list
  – Contains IP addresses of next r successors
• Guarantees routing progress as long as at least one of the r successors is up
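The role of the successor list can be sketched in a few lines. The function name `first_live_successor` and the `is_alive` callback are hypothetical; a real node would probe its successors over the network rather than consult a local set.

```python
def first_live_successor(successor_list, is_alive):
    """Return the first reachable entry of the successor list, or None."""
    for node in successor_list:
        if is_alive(node):
            return node
    return None  # all r successors are down: routing cannot progress

if __name__ == "__main__":
    # Successor list of N4 with r = 3, while N12 and N20 are down.
    down = {12, 20}
    print(first_live_successor([12, 20, 24], lambda n: n not in down))
```

Routing stalls only when every entry in the list has failed, so a larger r buys more robustness at the cost of more state per node.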
![Page 24: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/24.jpg)
Dynamic behavior
• New node n learns its place in the Chord ring by asking any extant node to do a lookup(n)
• Must also
  – Update successor list of its predecessor
  – Create its own successor list
![Page 25: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/25.jpg)
PASTRY
• Scalable, self-organizing, routing and object location infrastructure
• Each node has a node ID
  – IDs are uniformly distributed in the ID space
• Includes a proximity metric to measure distances between pairs of ID's
![Page 26: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/26.jpg)
Pastry Nodes
• Each node maintains three sets of nodes
  – Leaf set
    • Closest nodes in terms of node ID's
    • Same function as Chord's successor list
  – Nodes in routing table
    • Prefix routing (big idea)
  – Neighborhood set
    • Closest nodes in terms of proximity metric
![Page 27: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/27.jpg)
Dynamic behavior
• Pastry is self-organizing
  – Nodes come and go
  – Includes a seed discovery protocol
![Page 28: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/28.jpg)
Prefix Routing
• At each step, a node forwards an incoming request to a node whose node ID has the largest common prefix with the destination ID
  – Destination ID: 1230
  – Node ID: 1023
  – Next Hop: 12--
![Page 29: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/29.jpg)
Routing table for node 1023
No common prefix: 0221 2230 3120
One common digit: 1130 1233 1302
Two common digits: 1003 1013 1032
Three common digits: 1020 1022
![Page 30: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/30.jpg)
Routing request for node 1230
No common prefix: 0221 2230 3120
One common digit: 1130 1233 1302
Two common digits: 1003 1013 1032
Three common digits: 1020 1022
Request is always sent to a node having at least one more common prefix digit. Here it's node 1233
![Page 31: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/31.jpg)
At node 1233
No common prefix: 0221 2230 3120
One common digit: 1030 1130 1302
Two common digits: 1201 1211 1220
Three common digits: 1230 1232
Node with at least one more common prefix digit is node 1230
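The two routing steps above can be replayed with a short sketch. It models only the routing table (one row per shared-prefix length, taken from the tables for nodes 1023 and 1233) and ignores the leaf set and neighborhood set, so it is a simplified illustration rather than Pastry's full routing procedure. The names `ROUTING_TABLES`, `next_hop` and `route` are hypothetical.

```python
def shared_prefix_len(a, b):
    """Number of leading digits that two IDs have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

# Row i holds nodes that share exactly i leading digits with the table's owner.
ROUTING_TABLES = {
    "1023": [["0221", "2230", "3120"],
             ["1130", "1233", "1302"],
             ["1003", "1013", "1032"],
             ["1020", "1022"]],
    "1233": [["0221", "2230", "3120"],
             ["1030", "1130", "1302"],
             ["1201", "1211", "1220"],
             ["1230", "1232"]],
}

def next_hop(current, dest):
    """Pick the entry that matches dest in one more digit than current does."""
    row = shared_prefix_len(current, dest)
    if row == len(dest):
        return None  # current already is the destination
    for node in ROUTING_TABLES[current][row]:
        if node[row] == dest[row]:
            return node
    return None  # no suitable entry (full Pastry would fall back to the leaf set)

def route(start, dest):
    """Follow next_hop until the destination is reached."""
    path = [start]
    while path[-1] != dest:
        hop = next_hop(path[-1], dest)
        if hop is None:
            raise LookupError(f"no routing entry from {path[-1]} towards {dest}")
        path.append(hop)
    return path

if __name__ == "__main__":
    print(route("1023", "1230"))
```

Every hop fixes at least one more digit of the destination ID, so the number of hops is bounded by the number of digits in an ID.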
![Page 32: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/32.jpg)
TAPESTRY
• Interprets keys as sequences of digits
• Incremental prefix routing
  – Similar to Pastry
• Main contribution is emphasis on proximity
  – In the actual world
• Reduces query latency
• Makes system much more complex
![Page 33: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/33.jpg)
CONCLUSIONS
• Major issues include
  – Operational costs: searches are all O(log n); storage costs vary
  – Fault-tolerance and concurrent changes: only Chord and Tapestry can handle them
  – Proximity routing: Pastry, CAN and Tapestry have heuristics
  – Malicious nodes: Pastry checks node ID's
![Page 34: LOOKING UP DATA IN P2P SYSTEMS](https://reader036.fdocuments.in/reader036/viewer/2022062304/56814413550346895db0b1a1/html5/thumbnails/34.jpg)
Summary of costs
| | CAN | Chord | Pastry | Tapestry |
|---|---|---|---|---|
| Node state (1) | d | log N | log N | log N |
| Lookup (2) | dN^(1/d) | log N | log N | log N |
| Join (2) | dN^(1/d) + d log N | log² N | log² N | log² N |
1 number of other nodes known by a given