Copysets: Reducing the Frequency of Data Loss in Cloud Storage




Transcript of Copysets: Reducing the Frequency of Data Loss in Cloud Storage

Page 1: Copysets: Reducing the Frequency of Data Loss in Cloud Storage

Copysets: Reducing the Frequency of Data Loss in Cloud Storage

Asaf Cidon, Stephen Rumble, Ryan Stutsman,

Sachin Katti, John Ousterhout, and Mendel Rosenblum
Stanford University

Each Power Outage Causes Data Loss

Minimize Copysets → Minimize Data Loss Events

• Cloud storage systems use random replication
• Random replication is vulnerable to power outages
• ~1% of nodes fail to reboot after a power outage
• Each data loss event has a fixed cost:

• Better to lose data infrequently at the expense of losing more data in each event

• Copyset: a unique set of nodes that together contain all replicas of a chunk of data
• The system loses data when all nodes of at least one copyset fail simultaneously
• Random replication creates too many copysets
• Minimum Copysets: statically split the nodes into copysets; each node belongs to a single copyset

• Place the first replica on a random node
• Place the other replicas deterministically on the first node's copyset
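The two placement rules above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the partitioning helper and node IDs are assumptions:

```python
import random

def split_into_copysets(nodes, r):
    """Statically partition the nodes into disjoint copysets of size r,
    and map each node to the single copyset it belongs to."""
    shuffled = list(nodes)
    random.shuffle(shuffled)
    usable = len(shuffled) // r * r  # ignore leftover nodes for simplicity
    groups = [tuple(shuffled[i:i + r]) for i in range(0, usable, r)]
    return {node: group for group in groups for node in group}

def place_chunk(copyset_of, r):
    """Place the first replica on a random node, then place the remaining
    replicas deterministically on that node's copyset."""
    first = random.choice(list(copyset_of))
    return list(copyset_of[first])  # all r replicas share one copyset

copyset_of = split_into_copysets(range(9), r=3)
replicas = place_chunk(copyset_of, r=3)
```

Because every chunk's replicas land on exactly one of the N/R static groups, the cluster has only N/R copysets in total, rather than one per chunk.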

• On a 5,000-node cluster, a data loss event occurs only every 625 years, but each event loses the data of an entire node

Copyset Replication

• Problem: most systems need to scatter data across a number of nodes (the scatter width)

• Otherwise, recovery time increases and load balancing suffers
• Copyset Replication: given a scatter width, minimize the number of copysets
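The paper's permutation-based construction can be sketched as follows. This is a simplified version: it assumes the node count is divisible by R, and the helper names are mine. Each random permutation gives every node R−1 new copyset neighbors, so about S/(R−1) permutations reach scatter width S:

```python
import random

def generate_copysets(nodes, r, scatter_width):
    """Create ceil(S / (r - 1)) random permutations of the nodes and chop
    each permutation into consecutive copysets of size r."""
    num_perms = -(-scatter_width // (r - 1))  # ceiling division
    copysets = []
    for _ in range(num_perms):
        perm = list(nodes)
        random.shuffle(perm)
        copysets += [tuple(perm[i:i + r]) for i in range(0, len(perm), r)]
    return copysets

def place_chunk(nodes, copysets):
    """First replica on a random node; the remaining replicas fill one of
    that node's few copysets, chosen at random."""
    first = random.choice(list(nodes))
    return list(random.choice([cs for cs in copysets if first in cs]))

nodes = range(12)
copysets = generate_copysets(nodes, r=3, scatter_width=4)  # 2 permutations
replicas = place_chunk(nodes, copysets)
```

With S = 4 and R = 3, each node appears in only 2 copysets here, whereas random replication with the same scatter width can expose each node to up to C(S, R−1) = 6 distinct copysets.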

• Copyset Replication significantly reduces data loss
• Preserves the system's scatter width and node recovery time

• Implemented and evaluated on HDFS and RAMCloud
• Minimal overhead on normal operations and recovery