Large Scale Sharing
Marco F. Duarte
COMP 520: Distributed Systems
September 19, 2004
Introduction
• P2P sharing systems are very popular
• In P2P, all nodes have identical capabilities and responsibilities
• Popular approaches are partially centralized, do not scale well, or do not provide the desired anonymity
• Scalability of these systems is critical
• Need for decentralized, load-balancing architectures
Features desired in a P2P sharing system
• Decentralized architecture: no single point of failure
• Scalability: bandwidth and load balancing
• Fault tolerance: content replication
• Anonymity for users: posters, readers, storers
• Resilience against DoS attacks
Freenet provides anonymity
• No requester or provider information is implicit in communication
• Presence of a file in a node does not imply authorship
• Popular files are replicated to improve locality
• Does not intend to provide permanent storage
Freenet Queries
• Files receive FileIDs (160-bit SHA-1 hash of the “file identifier”)
• Queries have pseudo-unique random identifiers (QueryIDs) and a hops-to-live count
• Routing tables list previously retrieved FileIDs and their locations
• Queries are routed to the location with the closest FileID at each stage; loops are detected with the QueryID
FileID      Node Address
00231311    192.168.3.24
11310231    192.168.52.111
20130102    192.168.122.38
23102312    192.168.213.231
30002312    192.168.58.47
32302132    192.168.33.241
32320303    192.168.194.28
33103123    192.168.12.242
Query: 31302313?
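
The routing step on this slide can be sketched in a few lines. This is only an illustration, not Freenet's implementation: it assumes the eight-digit FileIDs in the table are base-4 numerals and that "closest" means the smallest absolute numeric difference, which the slide does not specify. The helper names `file_id_for` and `closest_entry` are hypothetical.

```python
import hashlib

# Routing table copied from the slide: FileID -> node address.
ROUTING_TABLE = {
    "00231311": "192.168.3.24",
    "11310231": "192.168.52.111",
    "20130102": "192.168.122.38",
    "23102312": "192.168.213.231",
    "30002312": "192.168.58.47",
    "32302132": "192.168.33.241",
    "32320303": "192.168.194.28",
    "33103123": "192.168.12.242",
}


def file_id_for(identifier: str) -> str:
    """Return the 160-bit SHA-1 digest of a file identifier as 40 hex digits."""
    return hashlib.sha1(identifier.encode("utf-8")).hexdigest()


def closest_entry(query_id: str, table: dict, base: int = 4) -> tuple:
    """Pick the entry whose FileID is numerically closest to query_id.

    Numeric distance on base-4 IDs is an assumed closeness metric.
    """
    target = int(query_id, base)
    best = min(table, key=lambda fid: abs(int(fid, base) - target))
    return best, table[best]


if __name__ == "__main__":
    # Route the query from the slide to the closest known FileID.
    print(closest_entry("31302313", ROUTING_TABLE))
    print(len(file_id_for("example/file")) * 4, "bits")
```

Under these assumptions the query 31302313 is forwarded to the node holding FileID 32302132; a different closeness metric could pick a different entry.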
Freenet Queries: Lookups and Stores
• Copies of the file are stored at all nodes along the query path
• File record for a is added to routing tables
• Writes perform a lookup first and insert the file along the path if no match is found
[Figure: network diagram with nodes labeled a, e, and b]
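
A minimal sketch of the lookup behavior described above, assuming a tiny three-node chain a, b, e with the file held at e. The `Node` class, `lookup`, and `build_chain` are hypothetical names; neighbors are tried in listed order rather than by FileID closeness, and a shared `seen` set stands in for QueryID-based loop detection.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    neighbors: list = field(default_factory=list)  # names of adjacent nodes
    files: dict = field(default_factory=dict)      # FileID -> file data
    routes: dict = field(default_factory=dict)     # FileID -> node where it was found


def lookup(network, start, file_id, hops_to_live, seen=None):
    """Depth-first search for file_id; returns (data, holder) or None."""
    seen = set() if seen is None else seen
    if hops_to_live <= 0 or start in seen:
        return None  # hop budget exhausted, or loop detected
    seen.add(start)
    node = network[start]
    if file_id in node.files:
        return node.files[file_id], start
    for neighbor in node.neighbors:
        found = lookup(network, neighbor, file_id, hops_to_live - 1, seen)
        if found is not None:
            data, holder = found
            node.files[file_id] = data     # copy stored on the return path
            node.routes[file_id] = holder  # routing table learns the location
            return found
    return None


def build_chain():
    """Hypothetical topology a - b - e, with file 'f1' stored only at e."""
    a = Node("a", neighbors=["b"])
    b = Node("b", neighbors=["a", "e"])
    e = Node("e", neighbors=["b"], files={"f1": b"payload"})
    return {n.name: n for n in (a, b, e)}


if __name__ == "__main__":
    net = build_chain()
    print(lookup(net, "a", "f1", hops_to_live=3))
    print(sorted(name for name, n in net.items() if "f1" in n.files))
```

After a successful lookup from a, both a and b hold a copy, so later queries for the same FileID are answered closer to the requester.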
Freenet Properties
• FileID-based clustering allows for improved routing as usage increases
• LRU-like capacity management: rarely used files are purged from the system
• Random nature of FileIDs allows for diversity of information at nodes
• Attempts to supplant existing files will lead to real file propagation
• Anonymity features:
  • File ownership assumed randomly by other nodes
  • Minimal routing information necessary at each hop
  • Hops-to-live count of 1 updated randomly
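
The LRU-like capacity management can be illustrated with a small store that evicts the least recently used file when full. This is a sketch under simplifying assumptions: capacity is counted in files rather than bytes, and `LRUStore` is a hypothetical class, not part of Freenet.

```python
from collections import OrderedDict


class LRUStore:
    """Per-node file store that purges the least recently used file when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._files = OrderedDict()  # oldest entry first, newest last

    def get(self, file_id):
        if file_id not in self._files:
            return None
        self._files.move_to_end(file_id)  # a hit makes the file most recent
        return self._files[file_id]

    def put(self, file_id, data):
        self._files[file_id] = data
        self._files.move_to_end(file_id)
        while len(self._files) > self.capacity:
            self._files.popitem(last=False)  # drop the least recently used file

    def __contains__(self, file_id):
        return file_id in self._files


if __name__ == "__main__":
    store = LRUStore(capacity=2)
    store.put("A", b"1")
    store.put("B", b"2")
    store.get("A")        # A is now more recent than B
    store.put("C", b"3")  # B is purged
    print("B" in store, "A" in store, "C" in store)
```

This is also why Freenet cannot promise permanent storage: a file that nobody requests eventually falls off the end of every node's store.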
Freenet Problems
• Files that are stored in the network may not be found
• Freenet does not provide reliable storage
• No notion of locality in routing
• Simulations do not involve file insertion or node discovery
PAST: Reliable Distributed Storage
• Customizable file persistence
• High availability and load balancing
• Efficient routing and storage allocation
• Uses FileIDs generated from hashes, as in Freenet
• Uses owner credentials to verify the identity of authors
• Interface: Insert, Lookup, Reclaim
PAST Architecture
• FileID computed from a hash of the filename, the owner’s public key, and a random salt
• Each node receives a pseudorandom NodeID, independent of the node's properties
• Owner specifies the number k of replicas of a file to store in the system on insert
• File is stored in the k nodes with NodeIDs closest to the FileID
• Routing provided by Pastry
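
A rough sketch of the two steps above: deriving a FileID and picking the k nodes that store the file. Several details are assumptions not given on the slide: SHA-1 as the hash, simple concatenation of the inputs, and plain numeric distance (ignoring wraparound) as the closeness measure. `make_file_id` and `k_closest` are hypothetical names.

```python
import hashlib


def make_file_id(filename: str, owner_public_key: bytes, salt: bytes) -> int:
    """Hash filename, owner public key, and salt into a 160-bit integer FileID."""
    digest = hashlib.sha1(filename.encode("utf-8") + owner_public_key + salt).digest()
    return int.from_bytes(digest, "big")


def k_closest(node_ids, file_id, k):
    """Return the k NodeIDs numerically closest to file_id (assumed metric)."""
    return sorted(node_ids, key=lambda nid: abs(nid - file_id))[:k]


if __name__ == "__main__":
    # Small worked example with 4-digit base-4 IDs in place of 160-bit ones.
    names = ["0231", "0302", "1033", "1123", "1202", "1311",
             "2031", "2121", "2300", "3013", "3133", "3321"]
    nodes = [int(n, 4) for n in names]
    replicas = k_closest(nodes, int("3130", 4), k=3)
    print(replicas)

    # A different salt yields a different FileID for the same file and owner.
    print(make_file_id("notes.txt", b"owner-key", b"salt-1") !=
          make_file_id("notes.txt", b"owner-key", b"salt-2"))
```

With these IDs, a file with FileID 3130 and k = 3 lands on nodes 3133, 3013, and 3321 (printed here as their integer values).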
Pastry: Routing for P2P Networks
• Paths with fewer than $\log_{2^b} N$ hops (N nodes, NodeID digits in base $2^b$)
• Delivery guaranteed under at most $l/2$ node failures
• Flexible proximity metric
• Each node contains:
  • Leaf set: l nodes with closest NodeIDs
  • Routing table: set of neighbors organized by NodeIDs
  • Neighborhood set: l closest nodes
• Each NodeID is paired with its network address
• Direct routes to neighbors and l closest NodeIDs
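
To get a feel for the hop bound, the short sketch below computes the smallest whole number of hops h such that (2^b)^h covers N nodes, which is the ceiling of the logarithm on the slide. The function name `max_hops` and the sample values of N and b are illustrative assumptions.

```python
def max_hops(num_nodes: int, b: int) -> int:
    """Smallest h with (2**b)**h >= num_nodes, i.e. ceil(log base 2**b of N)."""
    hops, reach = 0, 1
    while reach < num_nodes:
        reach *= 2 ** b  # each hop resolves one more base-2**b digit
        hops += 1
    return hops


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(n, "nodes, b = 4:", max_hops(n, 4), "hops")
```

Integer arithmetic is used instead of a floating-point logarithm so that exact powers of 2^b are not misrounded.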
Pastry: Example
• Routing table organized by similarity to NodeID
• Neighborhood set used for node addition/recovery
• Queries are forwarded to a numerically closer node (by shared NodeID prefix and NodeID proximity)
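
The forwarding rule can be sketched as follows. This is a simplification, not the full Pastry algorithm: it collapses the routing table and leaf set into one list of known nodes, prefers the node sharing the longest NodeID prefix with the key, and breaks ties by numeric distance. IDs are 4-digit base-4 strings as in the slide figures; `shared_prefix_len` and `next_hop` are hypothetical names.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current: str, key: str, known: list, base: int = 4) -> str:
    """Return the best node to forward to, or current if no known node is better."""
    def rank(node):
        # Longer shared prefix first, then smaller numeric distance to the key.
        return (-shared_prefix_len(node, key),
                abs(int(node, base) - int(key, base)))
    return min(known + [current], key=rank)


if __name__ == "__main__":
    known = ["0231", "0302", "1033", "1123", "1202", "1311",
             "2031", "2121", "3013", "3133", "3321"]
    print(next_hop("2300", "3130", known))                    # forwarded onward
    print(next_hop("3133", "3130", ["3013", "3321", "2300"]))  # delivered locally
```

When `next_hop` returns the current node, no known node is a better match for the key, so the message is delivered there.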
Pastry Routing Table
[Figure: circular NodeID space (0 = 2^M) with nodes 0231, 0302, 1033, 1123, 1202, 1311, 2031, 2121, 2300, 3013, 3133, 3321; a leaf set and a neighborhood set are marked]
Pastry Routing Example
[Figure: circular NodeID space (0 = 2^M) with nodes 0231, 0302, 1033, 1123, 1202, 1311, 2031, 2121, 2300, 3013, 3133, 3321; a query marked "?" is routed between nodes]
Other nodes exist but are not shown
Pastry Node Insertion Example
[Figure: circular NodeID space (0 = 2^M) with nodes 0231, 0302, 1033, 1123, 1202, 1311, 2031, 2121, 2300, 3013, 3133, 3321 and new node 3130; labels: Neighborhood Set, Leaf Set]
Pastry Node Removal Example
[Figure: circular NodeID space (0 = 2^M) showing nodes 3013, 3133, 3321]
PAST Insertions
• fileID = Insert(name, owner-credentials, k, file)
• Insert file k times
[Figure: circular NodeID space (0 = 2^M) with nodes 0231, 0302, 1033, 1123, 1202, 1311, 2031, 2121, 2300, 3013, 3133, 3321; the owner inserts a file with FileID 3130, and three nodes each store "3130: File, Certificate"]
PAST Insertions
• fileID = Insert(name, owner-credentials, k, file)
[Figure: same ring of nodes; k store receipts are returned to the owner]
PAST Semantics
• file = Lookup(fileID)
  • Routed to NodeID = FileID
  • First of the k closest nodes found returns the file and credentials
• Reclaim(fileID, owner-credentials)
  • Same semantics as Insert
  • Owner issues Reclaim Certificate
  • Storing nodes issue Reclaim Receipt
• Changes in leaf sets will trigger changes in replica locations
• A new node creates “pointers” to files it should contain; migration is gradual
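
The three operations can be tied together in a toy in-memory model. Everything here is a simplifying assumption made for illustration: owner credentials are a plain string rather than signed certificates, receipts are simple tuples, closeness is numeric distance, and `insert` returns the receipts alongside the FileID. `PastSketch` is a hypothetical class, not the PAST implementation.

```python
import hashlib


class PastSketch:
    """Toy model of Insert, Lookup, and Reclaim over a fixed set of nodes."""

    def __init__(self, node_ids):
        self.stores = {nid: {} for nid in node_ids}  # NodeID -> {fileID: (file, owner)}

    def _by_closeness(self, file_id):
        return sorted(self.stores, key=lambda nid: abs(nid - file_id))

    def insert(self, name, owner, k, data, salt=b""):
        digest = hashlib.sha1(name.encode() + owner.encode() + salt).digest()
        file_id = int.from_bytes(digest, "big")
        receipts = []
        for nid in self._by_closeness(file_id)[:k]:  # k closest nodes store a replica
            self.stores[nid][file_id] = (data, owner)
            receipts.append(("store-receipt", nid, file_id))
        return file_id, receipts

    def lookup(self, file_id):
        for nid in self._by_closeness(file_id):      # first holder found answers
            if file_id in self.stores[nid]:
                return self.stores[nid][file_id]
        return None

    def reclaim(self, file_id, owner):
        receipts = []
        for nid, store in self.stores.items():
            entry = store.get(file_id)
            if entry is not None and entry[1] == owner:  # only the owner may reclaim
                del store[file_id]
                receipts.append(("reclaim-receipt", nid, file_id))
        return receipts


if __name__ == "__main__":
    past = PastSketch([10, 20, 30, 40])
    fid, store_receipts = past.insert("a.txt", "alice", 3, b"hello")
    print(len(store_receipts), past.lookup(fid))
    print(len(past.reclaim(fid, "alice")), past.lookup(fid))
```

The owner can count the store receipts to confirm that all k replicas were created, mirroring the "k store receipts" step in the insertion figures.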
Load Balancing in PAST: Replica Diversion
[Figure: leaf sets labeled 3130 and 3201]
Load Balancing in PAST: File Diversion
[Figure: leaf sets labeled 3130 and 3201]
• Change ID by changing salt
• Policies for acceptance of replicas and diverted replicas, and selection of diverted replica node
• Maximum ratio of file size to free space for insertion: t_pri, t_div
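
The acceptance policy can be sketched as a threshold test on the ratio of file size to free space. The defaults t_pri = 0.1 and t_div = 0.05 are example values taken from the range plotted in the performance slides, and choosing the diverted node with the most free space is an assumed selection rule. `accepts` and `place_replica` are hypothetical names.

```python
def accepts(file_size: float, free_space: float, threshold: float) -> bool:
    """A node accepts a replica only if file_size / free_space <= threshold."""
    return free_space > 0 and file_size / free_space <= threshold


def place_replica(file_size, primary_free, leaf_free, t_pri=0.1, t_div=0.05):
    """Return 'primary', the name of a leaf-set node, or None if all refuse.

    leaf_free maps leaf-set node names to their free space.
    """
    if accepts(file_size, primary_free, t_pri):
        return "primary"
    # Replica diversion: try leaf-set nodes under the stricter t_div.
    candidates = [n for n, free in leaf_free.items()
                  if accepts(file_size, free, t_div)]
    if candidates:
        return max(candidates, key=lambda n: leaf_free[n])  # assumed: most free space
    return None  # caller would fall back to file diversion (new salt, new FileID)


if __name__ == "__main__":
    print(place_replica(10, 1000, {}))
    print(place_replica(200, 1000, {"3201": 10000, "3133": 5000}))
    print(place_replica(200, 1000, {"3201": 1000}))
```

A `None` result corresponds to file diversion: the owner picks a new salt, which changes the FileID and sends the insert to a different part of the ID space.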
Caching in PAST
• Highly popular files might demand more replicas than specified
• Files located “far away” only need to be fetched once locally
• Unused disk space is allocated as cache
• Caching performance degrades gradually with increased utilization
• Cache insertion policy similar to diversion policies
PAST Performance: t_pri comparison, t_div = 0.05
[Figure: percentage (vertical axis, 82% to 100%) versus t_pri = 0.05, 0.1, 0.2, 0.5; series: Succeed, Utilization]
PAST Performance: t_pri comparison, t_div = 0.05
PAST Performance: Ratio of File Diversions
PAST Performance: Ratio of Replica Diversions
PAST Performance: Failed Insertions
PAST Performance: Cache Hits
Conclusions
• Content-based routing improves scalability of distributed storage systems
• Need for user authentication in distributed systems
• Caching is crucial for system performance
• Diversion allows for graceful performance degradation
• Need file mutability, file search, or indexing services