
Slide 1: Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility

Gabi Kliot, Computer Science Department, Technion

Topics in Reliable Distributed Computing, 21/11/2004

Partially borrowed from Peter Druschel's presentation

Slide 2: Outline

Introduction
Pastry overview
PAST overview
Storage management
Caching
Experimental results
Conclusion

Slide 3: Sources

"Storage management and caching in PAST, a large-scale persistent peer-to-peer storage utility", by Antony Rowstron (Microsoft Research) and Peter Druschel (Rice University)

"Pastry: scalable, decentralized object location and routing for large-scale peer-to-peer systems", by Antony Rowstron (Microsoft Research) and Peter Druschel (Rice University)


Slide 4: PASTRY

Slide 5: Pastry

Generic p2p location and routing substrate (DHT)

Self-organizing overlay network (join, departures, locality repair)

Consistent hashing

Lookup/insert an object in fewer than log_{2^b} N routing steps (expected)

O(log N) per-node state

Network locality heuristics

Scalable, fault resilient, self-organizing, locality aware, secure

Slide 6: Pastry: API

nodeId = pastryInit(Credentials, Application): joins the local node to the Pastry network

route(M, X): routes message M to the node with nodeId numerically closest to X

Application callbacks:
deliver(M): delivers message M to the application
forwarding(M, X): message M is being forwarded towards key X
newLeaf(L): reports a change in the leaf set L to the application
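A minimal sketch of an application written against this API, in Python. The class names, the credential-hashing nodeId derivation, and the single-node route behavior are assumptions for illustration, not the actual Pastry implementation:

```python
import hashlib

class EchoApplication:
    """Application callbacks invoked by Pastry (deliver / forwarding / newLeaf)."""

    def deliver(self, msg):
        print("delivered:", msg)      # msg arrived at the closest node

    def forwarding(self, msg, key):
        pass                          # msg is passing through this node

    def new_leaf(self, leaf_set):
        pass                          # the leaf set has changed

class PastryNode:
    def __init__(self, credentials, application):
        # pastryInit(Credentials, Application): join the Pastry network.
        # Deriving the 128-bit nodeId by hashing the credentials is an
        # assumption of this sketch, not the mandated scheme.
        digest = hashlib.sha1(credentials.encode()).digest()
        self.node_id = int.from_bytes(digest, "big") % 2**128
        self.app = application

    def route(self, msg, key):
        # route(M, X): a single-node network here, so deliver locally; a
        # real node would consult its leaf set and routing table (slide 12).
        self.app.deliver(msg)

node = PastryNode("alice-credentials", EchoApplication())
node.route({"op": "ping"}, key=0x1234)
```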

Slide 7: Pastry: Object distribution

Consistent hashing: a 128-bit circular id space running from 0 to 2^128 - 1

nodeIds are assigned uniformly at random

objIds/keys are chosen uniformly at random

Invariant: the node with the numerically closest nodeId maintains the object

[Figure: the circular id space from 0 to 2^128 - 1, with nodeIds and objIds/keys placed on the ring]
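The placement invariant can be made concrete with a small sketch (Python; toy nodeIds and helper names are mine):

```python
ID_SPACE = 2 ** 128    # 128-bit circular id space

def circular_distance(a, b):
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)   # distance measured around the ring

def responsible_node(node_ids, obj_id):
    # The invariant above: the node with the numerically closest nodeId
    # maintains the object.
    return min(node_ids, key=lambda n: circular_distance(n, obj_id))

nodes = [0x10, 0x40, 0x90, 0xF0]            # toy nodeIds
print(hex(responsible_node(nodes, 0x42)))   # -> 0x40
```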

Slide 8: Pastry: Object insertion/lookup

A message with key X is routed to the live node whose nodeId is closest to X

Problem: a complete routing table is not feasible

[Figure: Route(X) travels the circular id space (0 to 2^128 - 1) to the node closest to X]

Slide 9: Pastry: Routing

Tradeoff:

O(log N) routing table size: 2^b * log_{2^b} N + 2L entries

O(log N) message forwarding steps

Slide 10: Pastry: Routing table (example nodeId 10233102)

L nodes in the leaf set

log_{2^b} N rows (actually log_{2^b} 2^128 = 128/b rows in full)

2^b columns

L neighbors in the neighborhood set
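A quick check of these sizes for sample parameters (a sketch; the helper name and the ceiling convention are my choices):

```python
import math

def routing_state(N, b, L):
    rows = math.ceil(math.log(N, 2 ** b))   # populated rows for N nodes
    max_rows = 128 // b                     # bound: log_{2^b} 2^128 = 128/b
    entries = (2 ** b) * rows + 2 * L       # table entries plus leaf set
    return rows, max_rows, entries

print(routing_state(N=100_000, b=4, L=16))  # -> (5, 32, 112)
```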

Slide 11: Pastry: Leaf sets

Each node maintains the IP addresses of the L nodes with numerically closest nodeIds (half larger, half smaller). These support routing efficiency and robustness, fault detection (keep-alive), and application-specific local coordination.

Slide 12: Pastry: Routing procedure

if (destination D is within range of our leaf set)
    forward to the numerically closest leaf-set member
else
    let l = length of the prefix shared with D
    let d = value of the l-th digit of D
    if (the routing table entry R[l][d] exists)
        forward to R[l][d]
    else
        forward to a known node (from L, R, and M) that
        (a) shares at least as long a prefix with D, and
        (b) is numerically closer to D than this node
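A runnable sketch of this decision in Python. The data structures (a leaf-set list, a dict-based routing table keyed by (row, digit), a flat known list standing in for L, R, and M together) are simplifications, and plain numeric distance stands in for the paper's notion of numerical closeness:

```python
B = 4                        # bits per digit, so ids are read as hex digits
NDIGITS = 128 // B

def digits_of(key):
    return f"{key:032x}"     # a 128-bit key as 32 base-16 digits

def shared_prefix_len(a, b):
    for i, (x, y) in enumerate(zip(digits_of(a), digits_of(b))):
        if x != y:
            return i
    return NDIGITS

def next_hop(local_id, key, leaf_set, table, known):
    # Case 1: key falls within the leaf set's range; route directly to
    # the numerically closest member (one hop from delivery).
    if leaf_set and min(leaf_set) <= key <= max(leaf_set):
        return min(leaf_set, key=lambda n: abs(n - key))
    l = shared_prefix_len(local_id, key)
    if l == NDIGITS:
        return None                     # we are the destination
    # Case 2: use the routing table entry R[l][d].
    d = digits_of(key)[l]
    if table.get((l, d)) is not None:
        return table[(l, d)]
    # Case 3 (rare): any known node that shares at least as long a
    # prefix and is numerically closer to the key than we are.
    candidates = [n for n in known
                  if shared_prefix_len(n, key) >= l
                  and abs(n - key) < abs(local_id - key)]
    return min(candidates, key=lambda n: abs(n - key)) if candidates else None
```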

Slide 13: Pastry: Routing

Properties: log_{2^b} N routing steps, O(log N) state

[Figure: Route(d46a1c) from node 65a1fc via d13da3, d4213f, and d462ba to d467c4 (d471f1 adjacent); each hop resolves at least one more digit of the key]

Slide 14: Pastry: Routing

Integrity of the overlay is guaranteed unless L/2 nodes with adjacent nodeIds fail simultaneously

Number of routing hops:
No failures: fewer than log_{2^b} N expected, 128/b + 1 maximum
During failure recovery: O(N) worst case, average case much better

Slide 15: Pastry: Locality properties

Assumption: a scalar proximity metric, e.g. RTT/ping delay, IP hop count (traceroute), or subnet masks; a node can probe its distance to any other node

Proximity invariant: each routing table entry refers to a node close to the local node (in the proximity space), among all nodes with the appropriate nodeId prefix

Slide 16: Pastry: Geometric routing in proximity space

[Figure: the same route, Route(d46a1c) from 65a1fc, drawn in both the proximity space and the nodeId space]

The proximity distance traveled by the message in each routing step increases exponentially (the entry in row l is chosen from a set of nodes of size N/2^(bl)), and the distance of the message from its source increases monotonically at each step: the message takes larger and larger strides.

Slide 17: Pastry: Locality properties

Each routing step is local, but there is no guarantee of a globally shortest path

Nevertheless, simulations show:
The expected distance traveled by a message in the proximity space is within a small constant of the minimum
Among the k nodes with nodeIds closest to the key, the message is likely to reach the node closest to the source first

Slide 18: Pastry: Self-organization

Initializing and maintaining routing tables and leaf sets:

Node addition
Node departure (failure)

The goal is that every routing table entry refers to a node that is near, among all live nodes with the appropriate prefix

Slide 19: Pastry: Node addition

New node X contacts a nearby node A

A routes a "join" message with key X, which arrives at Z, the node closest to X

X obtains its leaf set from Z and the i-th routing table row from the i-th node on the path from A to Z (a sketch follows below)

X informs any nodes that need to be aware of its arrival

X also improves its table's locality by requesting neighborhood sets from all nodes it knows

In practice: an optimistic approach
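A sketch of how X could assemble its initial state from the join path (Python; PathNode and its fields are illustrative stand-ins for the real protocol messages):

```python
from dataclasses import dataclass, field

@dataclass
class PathNode:
    node_id: int
    leaf_set: list = field(default_factory=list)
    rows: dict = field(default_factory=dict)   # row index -> {(row, digit): nodeId}

    def routing_row(self, i):
        return self.rows.get(i, {})

def join_state(path):
    """path[0] is A (the nearby seed), path[-1] is Z (numerically closest to X)."""
    leaf_set = list(path[-1].leaf_set)      # leaf set copied from Z
    table = {}
    for i, node in enumerate(path):         # row i comes from the i-th hop:
        table.update(node.routing_row(i))   # early rows need not match long
    return leaf_set, table                  # prefixes, and A is close to X

a = PathNode(0x65A1FC, rows={0: {(0, "d"): 0xD13DA3}})
z = PathNode(0xD467C4, leaf_set=[0xD462BA, 0xD471F1])
print(join_state([a, z]))
```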

Slide 20: Pastry: Node addition

[Figure: node addition in the nodeId space. New node X = d46a1c contacts nearby node A = 65a1fc; Route(d46a1c) passes d13da3, d4213f, and d462ba and arrives at Z = d467c4 (d471f1 adjacent), the live node numerically closest to X]

Slide 21: Pastry: Node addition

[Figure: the same node addition for d46a1c, drawn in both the proximity space and the nodeId space]

X is close to A, and B is close to its row-one entries (B1). Why is X also close to B1? Because the expected distance from B to its row-one entries is much larger than the expected distance from A to B (the entries are chosen from sets of exponentially decreasing size).

Slide 22: Node departure (failure)

Leaf set repair (eager, runs all the time): leaf set members exchange keep-alive messages; on a failure, request the leaf set from the furthest live node in the set

Routing table repair (lazy, upon failure): get a replacement entry from peers in the same row; if none is found, from higher rows

Neighborhood set repair (eager)

Slide 23: Pastry: Security

Secure nodeId assignment

Randomized routing: pick a random node among all potential next hops

Byzantine fault-tolerant leaf set membership protocol

Slide 24: Pastry: Distance traveled

|L| = 16, 100k random queries; proximity measured in an emulated network with nodes placed randomly

[Figure: relative distance traveled (y-axis, about 0.8 to 1.4) vs. number of nodes (1,000 to 100,000), comparing Pastry against a complete routing table]

Slide 25: Pastry: Summary

Generic p2p overlay network

Scalable, fault resilient, self-organizing, secure

O(log N) routing steps (expected)

O(log N) routing table size

Network locality properties

Slide 26: PAST

Slide 27: INTRODUCTION: the PAST system

An Internet-based, peer-to-peer global storage utility

Characteristics:
strong persistence and high availability (k replicas)
scalability (efficient Pastry routing)
short insert and query paths
query load balancing and latency reduction (wide dispersion, Pastry locality, and caching)
security

Composed of nodes connected to the Internet; each node has a 128-bit nodeId

Uses Pastry as an efficient routing scheme

No support for mutable files, searching, or directory lookup

Slide 28: INTRODUCTION

Function of nodes:

store replicas of files
initiate and route client requests to insert or retrieve files in PAST

File-related properties:
inserted files have a quasi-unique fileId
each file is replicated across multiple nodes
to retrieve a file, the client must know its fileId and, if necessary, a decryption key
fileId: 160 bits, computed as the SHA-1 hash of the file name, the owner's public key, and a random salt
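A sketch of this computation (Python; the 8-byte salt width and the concatenation order are assumptions, since the slide only names the inputs):

```python
import hashlib, os

def make_file_id(name: str, owner_public_key: bytes, salt: bytes = None):
    # Salt width and input concatenation order are guesses for this sketch.
    if salt is None:
        salt = os.urandom(8)
    digest = hashlib.sha1(name.encode() + owner_public_key + salt).digest()
    return int.from_bytes(digest, "big"), salt    # 160-bit fileId

file_id, salt = make_file_id("report.pdf", b"...owner public key bytes...")
print(f"{file_id:040x}")    # 40 hex digits = 160 bits
```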

Slide 29: PAST operations

Insert: fileId = Insert(name, owner-credentials, k, file)

1. The fileId is computed (hash of the file name, public key, etc.)
2. The request message reaches one of the k nodes closest to fileId
3. That node accepts a replica of the file and forwards the message to the other k-1 nodes in its leaf set
4. Once all k nodes accept, an 'ack' message with a store receipt is passed to the client

Lookup: file = Lookup(fileId)

Reclaim: Reclaim(fileId, owner-credentials)
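A toy, in-memory model of the three operations (no networking; node ids, routing, and receipts are faked so only the insert/lookup/reclaim shape is visible):

```python
import hashlib

ID_SPACE = 2 ** 160
NODES = {i * (ID_SPACE // 8): {} for i in range(8)}   # toy nodeId -> local store

def ring_dist(a, b):
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)

def insert(name, owner_key, k, data, salt=b"\x00"):
    fid = int.from_bytes(
        hashlib.sha1(name.encode() + owner_key + salt).digest(), "big")
    for n in sorted(NODES, key=lambda n: ring_dist(n, fid))[:k]:
        NODES[n][fid] = data          # a replica on each of the k closest
    return fid                        # nodes; store receipts are elided

def lookup(fid):
    for store in NODES.values():      # any replica holder (or a cache on
        if fid in store:              # the route) may answer
            return store[fid]

def reclaim(fid, owner_key):
    for store in NODES.values():      # weaker than delete: storage is freed,
        store.pop(fid, None)          # no reachability guarantee is implied

fid = insert("report.pdf", b"owner-key", k=3, data=b"file bytes")
assert lookup(fid) == b"file bytes"
reclaim(fid, b"owner-key")
assert lookup(fid) is None
```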

Slide 30: STORAGE MANAGEMENT: why?

Responsibilities:

maintain replicas of each file at the k nodes with nodeIds closest to its fileId
balance the free storage space among the nodes in PAST

Conflict: the k closest nodes may have insufficient storage while neighboring nodes have plenty

Causes of load imbalance, three differences:
the number of files assigned to each node
the size of each inserted file
the storage capacity of each node

Resolution: replica diversion and file diversion

Slide 31: STORAGE MANAGEMENT: Replica diversion

GOAL: balance the remaining free storage space among the nodes in a leaf set

Diversion steps of node A (which received an insertion request but has insufficient space); a sketch follows below:

1. choose a node B from A's leaf set, excluding the k closest nodes, such that B does not already hold a diverted replica of the file
2. ask B to store a copy
3. enter an entry for the file in A's table with a pointer to B
4. send a store receipt as usual
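A sketch of these four steps (Python; the Node fields and the pick-first-candidate choice are illustrative, and a fuller version would also weigh free space):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    store: dict = field(default_factory=dict)      # fileId -> data
    pointers: dict = field(default_factory=dict)   # fileId -> node holding diverted copy
    diverted: set = field(default_factory=set)     # fileIds held on behalf of others

def divert_replica(a, file_id, data, leaf_set, k_closest):
    # Step 1: pick B from A's leaf set, outside the k closest nodes,
    # not already holding a diverted replica of this file.
    candidates = [b for b in leaf_set
                  if b not in k_closest and file_id not in b.diverted]
    if not candidates:
        return False                 # caller falls back to file diversion
    b = candidates[0]                # real policy also weighs free space
    b.store[file_id] = data          # step 2: B stores the copy
    b.diverted.add(file_id)
    a.pointers[file_id] = b          # step 3: A's file table points to B
    return True                      # step 4: store receipt sent as usual

a, b, c = Node(1), Node(2), Node(3)
print(divert_replica(a, file_id=0xABC, data=b"bytes",
                     leaf_set=[b, c], k_closest=[a, b]))  # diverts to c
```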

Slide 32: STORAGE MANAGEMENT: Replica diversion policies

Policy for a node accepting a replica: reject the file if file_size / remaining_storage > t

The threshold t is t_pri for a primary replica and t_div for a diverted replica (t_div < t_pri)

This avoids unnecessary diversion while a node still has space

Prefers diverting large files, which minimizes the number of diversions

Prefers accepting primary replicas over diverted replicas
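The acceptance test as code, using the threshold pair reported in the experiments later (t_pri = 0.1, t_div = 0.05):

```python
T_PRI, T_DIV = 0.1, 0.05    # thresholds used in the paper's experiments

def accepts(file_size, free_space, primary):
    # Reject if file_size / remaining_storage > t, i.e. accept only when
    # the file is a small enough fraction of the node's remaining space.
    if free_space <= 0:
        return False
    t = T_PRI if primary else T_DIV
    return file_size / free_space <= t

print(accepts(file_size=50,  free_space=1000, primary=True))    # True  (0.05 <= 0.1)
print(accepts(file_size=100, free_space=1000, primary=False))   # False (0.1 > 0.05)
```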

Slide 33: STORAGE MANAGEMENT: File diversion

GOAL: balance the remaining free storage space among the nodes of the whole PAST network

Used when all k nodes and their leaf sets have insufficient space

The client node generates a new fileId using a different salt value and retries the insert, up to 3 times

After a fourth failure, the client reduces the file size by fragmenting it (a retry sketch follows below)
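A sketch of the retry loop (Python; the salt width and the exception-based fragmentation fallback are assumptions of the sketch):

```python
import hashlib, os

def new_file_id(name, owner_key):
    salt = os.urandom(8)                            # fresh random salt
    return int.from_bytes(
        hashlib.sha1(name.encode() + owner_key + salt).digest(), "big")

def insert_with_diversion(name, owner_key, k, data, try_insert, retries=3):
    for _ in range(1 + retries):                    # first try + 3 retries
        fid = new_file_id(name, owner_key)          # new salt -> new fileId,
        if try_insert(fid, k, data):                # so a different set of
            return fid                              # nodes is asked to store
    raise RuntimeError("four failures: fragment the file and re-insert")

# Example: an insert that always fails exercises the retry-then-fragment path.
try:
    insert_with_diversion("big.iso", b"key", 3, b"...",
                          try_insert=lambda fid, k, data: False)
except RuntimeError as e:
    print(e)
```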

Slide 34: STORAGE MANAGEMENT: maintaining k replicas

In Pastry, neighboring nodes exchange keep-alive messages

If a node stays silent for a period T, its neighbors remove it from their leaf sets and include the live node with the next closest nodeId

When nodes join or drop out of leaf sets: if a failed node was one of the k nodes for certain files (as a primary or diverted replica holder), the replicas it held are re-created

To cope with the failure of a diverting node, the diversion pointers are replicated

Optimization: instead of requesting all of its replicas at once, a joining node may install pointers to the previous replica holders in its file table (as in replica diversion), followed by gradual migration

Slide 35: STORAGE MANAGEMENT: Fragmenting and file encoding

Reed-Solomon encoding can be used to increase availability

Fragmentation:
improves even disk utilization
improves bandwidth through parallel download
but adds latency, since several nodes must be contacted for retrieval

Slide 36: CACHING

GOAL: minimize client access latency, maximize query throughput, and balance the query load

Create and maintain additional copies of highly popular files in the "unused" disk space of nodes

Files are cached along the route during successful insertions and lookups

GreedyDual-Size (GD-S) replacement policy: each cached file f is assigned the value H_f = cost(f)/size(f), and the file with the lowest H_f is replaced
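A sketch of the simplified GD-S policy as the slide describes it (full GreedyDual-Size also ages H values by the H of the last victim, which is omitted here; the class shape is mine):

```python
class GDSCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.files = {}                   # fileId -> (size, H_f)

    def insert(self, file_id, size, cost=1.0):
        # Evict lowest-H_f files until the new file fits.
        while self.used + size > self.capacity and self.files:
            victim = min(self.files, key=lambda f: self.files[f][1])
            self.used -= self.files.pop(victim)[0]
        if self.used + size <= self.capacity:
            self.files[file_id] = (size, cost / size)   # H_f = cost(f)/size(f)
            self.used += size

cache = GDSCache(capacity=100)
cache.insert("a", size=60)        # H = 1/60
cache.insert("b", size=30)        # H = 1/30
cache.insert("c", size=40)        # evicts "a", the lowest-H_f file
print(sorted(cache.files))        # -> ['b', 'c']
```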

Slide 37: Security in PAST

A smartcard-based private/public key scheme ensures the integrity of nodeId and fileId assignment against malicious nodes

Store receipts prevent a node from storing fewer than k replicas

File certificates verify the authenticity of file content

File privacy through client-side encryption

Routing table entries are signed

The routing scheme is randomized, to resist denial-of-service attacks

A malicious node cannot be completely prevented from suppressing valid entries

Slide 38: EXPERIMENTAL RESULTS: effects of storage management

Policy: accept a file if file_size / free_space < t

No diversion (t_pri = 1, t_div = 0):
max utilization 60.8%
51.1% of inserts failed

Leaf set size shows the effect of local load balancing

Replica/file diversion (t_pri = 0.1, t_div = 0.05):
max utilization > 98%
< 1% of inserts failed

Slide 39: EXPERIMENTAL RESULTS: determining threshold values

Policy: accept a file if file_size / free_space < t

[Tables: insertion statistics and utilization as t_pri is varied with t_div = 0.05, and as t_div is varied with t_pri = 0.1]

As t_pri increases, fewer files are successfully inserted, but higher storage utilization is achieved

The lower t_pri is, the less likely it is that a large file can be stored, so many small files are stored instead; utilization drops because large files are rejected even at low utilization levels

As t_div increases, storage utilization improves, but fewer files are successfully inserted

Slide 40: EXPERIMENTAL RESULTS: impact of file and replica diversion

File diversions are negligible for storage utilization below 83%

The number of replica diversions is small even at high utilization: at 80% utilization, fewer than 10% of replicas are diverted

=> The overhead imposed by replica and file diversions is small as long as utilization stays below 95%

Slide 41: EXPERIMENTAL RESULTS: file insertion failures

[Figures: file insertion failures vs. storage utilization; failure ratio of smaller files vs. utilization]

The failure ratio increases once utilization passes 90%

Failed insertions are heavily biased towards large files

Slide 42: EXPERIMENTAL RESULTS: caching

[Figure: global cache hit ratio and average number of message hops vs. storage utilization]

The hit ratio drops as storage utilization and the number of files grow, since cached files get replaced

As the hit ratio falls, the number of routing hops rises towards the no-cache expectation of log_{2^b} N = log_16 2250, about 3

Slide 43: CONCLUSION

Design and evaluation of PAST: storage management and caching

Nodes and files are assigned uniformly distributed IDs; replicas of a file are stored at the k nodes closest to its fileId

Experimental results:
storage utilization of 98% is achieved
low file insertion failure ratio even at high storage utilization
effective caching achieves load balancing

Slide 44: Weaknesses

Does not support mutable files (read only); no searching or directory lookup

A local fault in a network segment may leave a functioning node unable to contact the outside world, since its routing table is mainly local

No direct support for anonymity or confidentiality

Breaking a large node apart: is it good or bad?

The simulation is too sterile; there is no experimental comparison of PAST to other systems

Slide 45: Comparison to other systems

Slide 46: Comparison

Pastry compared to Freenet and Gnutella: a guaranteed answer in a bounded number of steps, while retaining the scalability of Freenet and the self-organization of Freenet and Gnutella

Pastry compared to Chord: Chord makes no explicit effort to achieve good network locality

PAST compared to OceanStore: PAST has no support for mutable files, searching, or directory lookup; more sophisticated storage semantics could be built on top of PAST

Pastry (and Tapestry) are similar to Plaxton: routing based on prefixes, a generalization of hypercube routing; but Plaxton is not self-organizing, and it associates a single node with each file, a single point of failure

Slide 47: Comparison

PAST compared to FarSite:

FarSite has traditional file system semantics and a distributed directory service to locate content
Every node maintains a partial list of live nodes, from which it chooses nodes to store replicas
FarSite's LAN assumptions may not hold in a wide-area environment

PAST compared to CFS:
CFS is built on top of Chord; it is a file sharing medium, block oriented and read only
Each block is stored on multiple nodes with adjacent Chord nodeIds, with caching of popular blocks
This increases file retrieval overhead, though parallel block retrieval works well for large files
CFS assumes an abundance of free disk space
CFS relies on hosting multiple logical nodes, with separate ids, in one physical Chord node to accommodate nodes with large storage capacity, which increases query overhead

Slide 48: Comparison

PAST compared to LAND:

LAND has an expected constant number of outgoing links per node

A constant number of pointers to each object

A constant bound on distortion (stretch): the accumulated route cost divided by the direct distance cost

The choice of links enforces a distance upper bound on each stage of the route

LAND uses a two-tier architecture with super-nodes

Slide 49: The END