Storage management and caching in PAST

Antony Rowstron and Peter Druschel. Presented to cs294-4 by Owen Cooper.



Outline

• PAST goals
• PAST API
• File storage overview
• File and replica diversion
• Replica management
• Caching
• Performance
• Discussion


PAST (non)goals

• P2P global storage network
  – Use properties of existing p2p systems (Pastry)
  – Support for strong persistence
    • Via a core set of replicas
  – High availability
    • Via local caching
  – Scalable
    • Obtain high storage utilization via local cooperation
  – Secure
• Design goals do not include
  – Replacing the file system
  – Updatable files
  – Directory or lookup service


Security Model

• Pastry node ids are a hash of a public key
• Smartcard-based security
  – Provides keys
  – Quota management
• Nodeid and fileid generation controlled
  – Try to stop nodes from getting consecutive ids
  – Or clients from overloading parts of the network
• But node id and real-world identity may not be linked
• Data not encrypted
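The first bullet can be illustrated with a short sketch. It assumes SHA-1 as the hash and a 128-bit nodeid taken from the digest's most significant bits. The slide only says the id is a hash of a public key, so both choices, and the helper name `node_id_from_public_key`, are illustrative.

```python
import hashlib

NODE_ID_BITS = 128  # assumption: the id width is not stated on the slide


def node_id_from_public_key(public_key: bytes) -> int:
    """Derive a node id by hashing the node's public key (SHA-1 assumed)."""
    digest = hashlib.sha1(public_key).digest()       # 160-bit digest
    as_int = int.from_bytes(digest, "big")
    return as_int >> (160 - NODE_ID_BITS)            # keep the top 128 bits


# Placeholder byte strings stand in for real public keys.
id_a = node_id_from_public_key(b"public-key-of-node-a")
id_b = node_id_from_public_key(b"public-key-of-node-b")
print(f"{id_a:032x}")
print(f"{id_b:032x}")
```

Because the id is the output of a hash, a node cannot pick its position in the id space directly, which is the property the "consecutive ids" bullet relies on.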


PAST API

• In PAST, files are immutable
• fileid = Insert(filename, credentials, k, file)
  – Insert k copies of the file into the network, or fail
  – Fileid is a signed (filename, credentials, salt) tuple
  – Successful if acknowledged with receipts from k nodes
• file = Lookup(fileid)
  – Return a copy of the file if it exists
• Reclaim(fileid, credentials)
  – Reclaim accepted if requested by the owner
  – Allows, but does not require, storage reclamation
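To make the shape of the three calls concrete, here is a single-process toy model. It is only a sketch: there is no network, no replication and no signing, `ToyPast` is a hypothetical class, and deriving the fileid as a SHA-1 over name, credentials and salt is an assumption rather than something the slide specifies.

```python
import hashlib
import os
from typing import Dict, Optional, Tuple


class ToyPast:
    """In-memory stand-in for Insert / Lookup / Reclaim (no replicas, no network)."""

    def __init__(self) -> None:
        # fileid -> (owner credentials, file bytes, requested replica count k)
        self._store: Dict[str, Tuple[bytes, bytes, int]] = {}

    def insert(self, filename: str, credentials: bytes, k: int, data: bytes) -> str:
        salt = os.urandom(8)
        # Assumption: fileid is a SHA-1 over name, credentials and salt.
        fileid = hashlib.sha1(filename.encode() + credentials + salt).hexdigest()
        if fileid in self._store:
            raise ValueError("fileid already in use; files are immutable")
        self._store[fileid] = (credentials, data, k)
        return fileid

    def lookup(self, fileid: str) -> Optional[bytes]:
        entry = self._store.get(fileid)
        return entry[1] if entry is not None else None

    def reclaim(self, fileid: str, credentials: bytes) -> bool:
        entry = self._store.get(fileid)
        if entry is None or entry[0] != credentials:
            return False          # only the owner may reclaim
        # The toy deletes eagerly; PAST itself allows but does not require this.
        del self._store[fileid]
        return True


past = ToyPast()
fid = past.insert("notes.txt", b"alice-key", k=3, data=b"hello")
print(past.lookup(fid))
print(past.reclaim(fid, b"mallory-key"))   # requester is not the owner
print(past.reclaim(fid, b"alice-key"))     # requester is the owner
```

Note that there is no update call: a changed file would be inserted again under a new fileid.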


File insertion

• Insert(name, c, k, file)
  – Computes a storage certificate
    • Contains fileid, hash of content, k, salt
  – Deducts k*filesize from quota
  – Routes file and storage certificate via Pastry, using the fileid as the key
  – Node verifies the integrity of the file, stores it, and asks the k-1 closest nodes to store the file
    • k-1 nodes in leaf set (k-1 <= l)
  – Node returns ack with k signed storage receipts, or a nak
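The client-side steps (build the certificate, charge the quota) and the receiving node's integrity check can be sketched as below. This is a minimal illustration, not the actual implementation: SHA-1, the fileid derivation and all names (`StorageCertificate`, `prepare_insert`, `verify_content`, `QuotaExceeded`) are assumptions, and signing and Pastry routing are left out.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StorageCertificate:
    fileid: str
    content_hash: str
    k: int
    salt: bytes


class QuotaExceeded(Exception):
    """Raised when k * filesize does not fit in the client's quota."""


def prepare_insert(name: str, owner_key: bytes, k: int, data: bytes,
                   quota: int, salt: Optional[bytes] = None
                   ) -> Tuple[StorageCertificate, int]:
    """Return (certificate, remaining quota), or raise QuotaExceeded."""
    cost = k * len(data)                       # k * filesize, as on the slide
    if cost > quota:
        raise QuotaExceeded(f"need {cost}, have {quota}")
    salt = os.urandom(8) if salt is None else salt
    # Assumption: fileid is a SHA-1 over name, owner key and salt.
    fileid = hashlib.sha1(name.encode() + owner_key + salt).hexdigest()
    cert = StorageCertificate(fileid, hashlib.sha1(data).hexdigest(), k, salt)
    return cert, quota - cost


def verify_content(cert: StorageCertificate, data: bytes) -> bool:
    """Check a receiving node could run before storing the file."""
    return hashlib.sha1(data).hexdigest() == cert.content_hash


cert, remaining = prepare_insert("a.txt", b"owner-key", 3, b"x" * 100, quota=1000)
print(remaining)                        # 1000 - 3 * 100
print(verify_content(cert, b"x" * 100))
```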


Lookup and Reclamation

• Pastry ensures replica is found
  – Since a lookup is routed to the closest nodeid
• Reclamation
  – Client generates a reclaim certificate
  – Sends it to the fileid via Pastry
  – Recipients verify the certificate & issue receipt
  – Client reclaims quota


Diversion

• A file or replica can be relocated
• For a replica, to another close node
  – If one of the k closest is overloaded
• For a file, to another set of nodes in the idspace
  – If the nodes around a fileid are (possibly locally) congested
• Why is this necessary?
  – Differing storage capacity at nodes
  – Differing file size for inserted files


Replica Diversion

• Node responsible for fileid asks k-1 neighbors to store the file
• Neighbor (N) may divert a copy to a node in its leaf set
  – Pointer to copy inserted at N
  – N issues storage certificate
  – N also inserts a pointer on the k+1th closest node
    • No orphan if N fails
• N remains responsible for pointer maintenance
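The pointer bookkeeping behind replica diversion can be sketched with a small in-memory model. Everything here is hypothetical (the `Node` class, `divert_replica`, `fetch`); it only shows why two pointers are kept, and ignores receipts, routing and failure detection.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    name: str
    replicas: Dict[str, bytes] = field(default_factory=dict)  # fileid -> data
    pointers: Dict[str, str] = field(default_factory=dict)    # fileid -> holder name


def divert_replica(fileid: str, data: bytes, n: Node, target: Node, backup: Node) -> None:
    """N cannot hold the replica: store it on `target`, point to it from N and `backup`."""
    target.replicas[fileid] = data
    n.pointers[fileid] = target.name        # N still answers for the file
    backup.pointers[fileid] = target.name   # k+1th closest node: no orphan if N fails


def fetch(fileid: str, node: Node, nodes_by_name: Dict[str, Node]) -> Optional[bytes]:
    """Serve a lookup from a local replica, or follow a diversion pointer."""
    if fileid in node.replicas:
        return node.replicas[fileid]
    holder = node.pointers.get(fileid)
    if holder is None:
        return None
    return nodes_by_name[holder].replicas.get(fileid)


n, d, b = Node("N"), Node("D"), Node("B")
nodes = {x.name: x for x in (n, d, b)}
divert_replica("f1", b"payload", n, d, b)
print(fetch("f1", n, nodes))   # found via N's pointer to D
print(fetch("f1", b, nodes))   # still reachable through the backup pointer
```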


File Diversion

• Replica diversion is local
  – Allows storage choice between nodes around fileid
• File Diversion
  – Triggered when an insert with a fileid fails
  – Insert is tried a total of three times
  – New fileid generated by changing the salt
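The retry loop behind file diversion might look like the sketch below. It follows the slide's count of three attempts in total. The fileid derivation (SHA-1 over name, owner key and salt) and the `try_insert` callback, which stands in for routing the insert through Pastry, are assumptions made for illustration.

```python
import hashlib
import os
from typing import Callable, List, Optional

MAX_ATTEMPTS = 3  # "Insert is tried a total of three times"


def insert_with_file_diversion(name: str, owner_key: bytes,
                               try_insert: Callable[[str], bool]) -> Optional[str]:
    """Retry the insert under a fresh salt (hence a fresh fileid) until it succeeds."""
    for _ in range(MAX_ATTEMPTS):
        salt = os.urandom(8)
        # Assumption: fileid is a SHA-1 over name, owner key and salt.
        fileid = hashlib.sha1(name.encode() + owner_key + salt).hexdigest()
        if try_insert(fileid):
            return fileid
    return None   # give up and report the failure to the application


attempts: List[str] = []


def flaky(fileid: str) -> bool:
    """Stand-in for the network: rejects the first two fileids."""
    attempts.append(fileid)
    return len(attempts) == 3


result = insert_with_file_diversion("a.txt", b"owner-key", flaky)
print(len(attempts), result == attempts[-1])
```

Changing the salt moves the file to a different, effectively random region of the id space, where the local nodes may have more free storage.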


Storage Policy

• How does a node choose to accept or reject a replica?
  – Computes sizeof(file)/sizeof(free_space)
  – Compares to Tpri or Tdiv, depending on the node’s role; rejects if the ratio exceeds the threshold
  – Tpri > Tdiv
• How is a node chosen for replica diversion?
  – Search leaf set for the node that
    • Has maximal free space
    • Doesn’t already hold a diverted or primary replica
• File diversion
  – When k copies cannot be located (via primary or diversion)
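A minimal sketch of the acceptance test and the diversion-target choice follows. It assumes a replica is rejected when sizeof(file)/sizeof(free_space) exceeds the threshold for the node's role. The threshold values and the names `Candidate`, `accepts` and `choose_diversion_target` are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

T_PRI = 0.1    # illustrative threshold for primary replica stores
T_DIV = 0.05   # illustrative threshold for diverted replica stores (T_PRI > T_DIV)


@dataclass
class Candidate:
    name: str
    free_space: int
    held: Set[str] = field(default_factory=set)   # fileids already stored here


def accepts(file_size: int, free_space: int, threshold: float) -> bool:
    """Accept only if the file would use a small enough share of the free space."""
    if free_space <= 0:
        return False
    return file_size / free_space <= threshold


def choose_diversion_target(leaf_set: List[Candidate], fileid: str,
                            file_size: int) -> Optional[Candidate]:
    """Pick the leaf-set node with the most free space that holds no replica yet."""
    eligible = [c for c in leaf_set if fileid not in c.held]
    if not eligible:
        return None
    best = max(eligible, key=lambda c: c.free_space)
    return best if accepts(file_size, best.free_space, T_DIV) else None


leaf_set = [Candidate("a", 1000, {"f1"}), Candidate("b", 400), Candidate("c", 300)]
print(accepts(10, 150, T_PRI), accepts(10, 150, T_DIV))
target = choose_diversion_target(leaf_set, "f1", 10)
print(target.name if target else None)
```

With Tpri > Tdiv, a node is stricter about replicas diverted to it than about replicas it is primarily responsible for, so the same file can pass the primary test and fail the diversion test on a node with equal free space.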


Replica maintenance

• Node join/leave causes responsibility shift
  – Pastry node failure detection will cause leaf set updates
    • PAST detects responsibility shifts this way
• Newly responsible node must copy files
  – Make a copy immediately, OR
  – Keep a pointer to old owner & copy lazily
• Diverted replicas
  – Target of diversion may move out of leaf set
    • Node to store replica can be any one in leaf set
  – Must exchange keepalive messages themselves
  – Should be relocated


Replica maintenance (2)

• Node failure may cause storage shortage
  – No node in leaf set can take over ownership
• Search space is widened
  – Ask most extreme nodes to locate storage
    • Increases search space to 2l nodes
  – If no storage space found, fail


Caching

• Pastry’s locality-based routing will tend to direct requests to nearby copies
• PAST also stores cached copies
  – Along routing path between client and fileid
  – For insert and lookup operations
  – Cache maintained using GD-size algorithm
    • Weight per file: 1/size(file)
    • Eviction:
      – Pick file with minimum weight
      – Subtract weight of evicted file from all others


Experiments: without diversion

• Experiments use
  – Large trace from web server
  – Files from local web server
• The case for diversion with web trace
  – Without diversion:
    • 51.1% of insertions failed
    • 60.8% storage utilization


Experiments (2): with diversion

• With diversion
  – Bigger leaf set size a plus


Experiments (3): Varying Tpri

• Effects of varying Tpri
• Number of files stored vs. size of file


Experiments (4): Varying Tdiv

• Varying Tdiv

• Tpri is constant


File and Replica Diversion


Caching

• 8 traces combined
• Requests from clients in each trace are mapped to close PAST nodes