Total Recall: System Support for Automated Availability ... · Total Recall: System Support for...

Availability Monitor

DownDown

TimeTime

Total Recall: System Support for Automated Availability ManagementRanjita Bhagwan, Kiran Tati, Yuchung Cheng, Stefan Savage, Geoffrey M. VoelkerUniversity of California, San Diego

System Evaluation

Writes are most time-consuming since they comprise of the following operations: inode read, data read, data write and inode write Reads are less time-consuming, sincethey comprise the followingoperations: inode read, data read, and inode write.Repairs are the fastest since they do not involve remote inode read/writeoperations.

System Design

EventHandler

Block Store

Policy Module

Repair MechanismRedundancy Mechanism

Redundancy Engine

Online codingRedundancy calculator

Encoder

Decoder

ReplicationRedundancy calculator

Storage System Operations - create, read, write and repair.

* Inodes are eagerly repaired.* Data may be eagerly or lazily repaired.

Current Prototype

* Runs on PlanetLab.* Exports the NFSv3 interface.* Builds on the SFS toolkit and MIT's Chord implementation. * Uses replication and Online codes as redundancy mechanisms.

Availability Management

Automated Availability ManagementAutomated Availability Management

Goal: Highly available data storage in large-scale distributed systems in which * Hosts are transiently inaccessible * Individual host failures are common

Current peer-to-peer systems are prime examples * Highly dynamic, challenging environment * hosts join and leave frequently in short-term * Hosts leave permanently over long-term * Workload varies in terms of popularity, access patterns, file size

These systems require automated availability management.* Availability prediction* Redundancy management to tolerate transient host disconnectivity. * Dynamic Repair to tolerate long-term host failures.

We are exploring the challenges of automated availability management in the design,implementation and evaluation of a read/write peer-to-peer file system called Total Recall.

For Total Recall, eager repair => replication, lazy repair => coding.

Redundancy

LAZYEager repair: System repairs data redundancyimmediately in reaction to host departures.

Lazy repair: System uses additional redundancy to mask transient host departures and defer the costs of repair.

Dynamic RepairDynamic Repair

NFSv3 interface

GETATTR, SETATTR,LOOKUP, READLINK,READ, WRITE, ...

Prototype Evaluation

* Ran Total Recall prototype on 16 PlanetLab nodes from USA and Europe.* Used "cp" command through NFS interface to measure file read and write time.* Measured file repair time. * All numbers reported for one-file read/write/repair.

Simulation

* Simulated 5500 hosts from traces obtained from Overnet P2P file-sharing system.* Simulated 5500 files. File size distribution obtained from Saroiu et al.'s description of KaZaA workload.

Repair behavior of Total Recall over time

* System bandwidth varies with host availability. Host departures trigger high-bandwidth data repairs, Host arrivals trigger lower-bandwidth metadata repairs.* Available file redundancy = amount of redundant data the system has available to it to reconstruct the file.* Avg. file redundancy achieves stable value even as host availability varies substantially.

* Total Recall trades off storage overhead and repair bandwidth.* Eager repair requires the least storage, but most repair bandwidth. Ideal for small metadata.* Lazy repair smoothly trades off storage (coding stretch factor) with repair bandwidth. Total Recall adjusts degree of redundancy to host availability characteristics.

Host bandwidth usage for different repair policies as a CDF

* Empirically predict availability based on based on measurements.* Make predictions based on aggregates rather than for individual hosts.* Short-term availability changes due to transient host failures.* Long-term availability changes due to long-term host departures/failures.

Availability PredictionAvailability Prediction

* Replication, Reed-Solomon codes, Tornado codes, Online codes.* Choose redundancy mechanism based on storage, bandwidth and performance tradeoffs.* Use availability prediction to calculate required redundancy level.

Redundancy MechanismRedundancy Mechanism

Total Recall: System Support for Automated Availability ... · Total Recall: System Support for...

Documents

Transcript of Total Recall: System Support for Automated Availability ... · Total Recall: System Support for...

University of Groningen Collaborative recall of details of ...€¦ · Collaborative recall of … 1 1 Running head: Collaborative recall and emotion Collaborative recall of details

IMPORTANT SAFETY RECALL Safety Recall: 21V-179 21V-184 …

FOOD RECALL PLAN Recall Team Personnel - HACCP Manager Software

Total Recall VR Classic Desktop - Two Way · 2013-06-20 · Total Recall VR Overview Guide Total Recall VR Deployment User Guide Total Recall VR Embedded GUI user Guide Total Recall

Safety Recall Code: 47N3 · March 2017 47N3 Page 1 of 60 Safety Recall Code: 47N3 Subject Brake Lines Release Date March 9, 2017 Criteria Repair Availability information Criteria

SAFETY RECALL BULLETIN - Mitsupartsworld Recall/SR08003.pdf · Page 1 of 16 SAFETY RECALL BULLETIN FILE UNDER: SAFETY RECALL BULLETINS, in the Dealer Service Information Binder (3306)

Focus recall. - Axis Communications 3. How does focus recall work? The focus recall feature is very easy to use. The user sets a focus recall area by clicking the focus recall button

Stroller Recall

Your guide to a safe, effective, compliant product recall · Your guide to a safe, effective, compliant product recall Recall Management: Survival of the Fittest. Recall Management:

Context availability and the recall of abstract and ...for isolatedabstract and concrete words than should ratings ofimageability. In support ofthis prediction, Schwanen flugel et

Recall: Stride Scheduling Recall: Linux Completely Fair ...

Structured Superpeers: Leveraging Heterogeneity to Provide Constant-Time Lookup Alper Mizrak (Presenter) Yuchung Cheng Vineet Kumar Stefan Savage Department.

Van Jacobson dev · PDF fileBBR Congestion Control: IETF 99 Update Neal Cardwell, Yuchung Cheng, C. Stephen Gunn, Soheil Hassas Yeganeh Ian Swett, Jana Iyengar, Victor Vasiliev

Safety Recall Communication Guide · Safety Recall Communication Guide 2016 . 2 ... Dealer Recall Communication Guide | 2016 TABLE OF CONTENTS TOC . 3 Recall/Action Population Safety

Recall Manual

Hydroxycut Recall

Patient recall

DEVELOPMENT OF A RECALL PLANFILE/chapter_10_recall.pdfDevelopment of a Recall Plan Chapter 10 DEVELOPMENT OF A RECALL PLAN 1.0 WHAT IS A RECALL? 1.1 When is a Recall Necessary? 1.2

The Recall ImPact - WordPress.com · The Recall ImPact Medical Device Manufacturing An involuntary recall which is mandated by a regulative agency involuntary recall A voluntary recall,

2006/12/04NTU/nslab1 Jigsaw: Solving the Puzzle of Enterprise 802.11 Analysis YuChung ChengYuChung Cheng, John Bellardo, P´eter Benk¨o Alex C. Snoeren,