Fault Tolerance in Cassandra
Richard Low
[email protected]
@acunu @richardalow
Cassandra London Meetup, 5 Sept 2011
Menu
• Failure modes
• Maintaining availability
• Recovery
Failure modes
Failures are the norm
• With more than a few nodes, something goes wrong all the time
• Don’t want to be down all the time
Failure causes
• Hardware failure
• Bug
• Power
• Natural disaster
Failure modes
• Data centre failure
• Node failure
• Disk failure
• Temporary
• Permanent
Failure modes
• Network failure
• One node
• Network partition
• Whole data centre
Failure modes
• Operator failure
• Delete files
• Delete entire database
• Incorrect configuration
Failure modes
• Want a system that can tolerate all the above failures
• Make assumptions about probabilities of multiple events
• Be careful when assuming independence
Solutions
• Do nothing
• Make boxes bullet proof
• Replication
Availability
How do we maintain availability in the presence of failure?
Replication
• Buy cheap nodes and cheap disks
• Store multiple copies of the data
• Don’t care if some disappear
Replication
• What about consistency?
• What if I can’t tolerate out-of-date reads?
• How do we restore a replica?
RF and CL
• Replication factor
• How many copies
• How much failure can be tolerated
• Consistency Level
• How many nodes must be contactable for an operation to succeed
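As a concrete illustration (not from the slides): with the DataStax Python driver, the replication factor is a property of the keyspace, while the consistency level is chosen per request. The cluster address, keyspace, and table names below are hypothetical.

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    # Connect to a (hypothetical) cluster and keyspace.
    session = Cluster(["127.0.0.1"]).connect("demo")

    # QUORUM requires a majority of the RF replicas to acknowledge the write.
    stmt = SimpleStatement(
        "INSERT INTO users (id, name) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM,
    )
    session.execute(stmt, (42, "richard"))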
Simple example
• Replication factor 3
• Uniform network topology
• Read and write at CL.QUORUM
• Strong consistency
• Available if any one node is down
• Can recover if any two nodes fail
In general
• RF N, reads and writes at CL.QUORUM
• Available if up to ceil(N/2)-1 nodes fail
• Can recover if up to N-1 nodes fail
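A small sketch of the arithmetic behind these claims (plain Python, not Cassandra code):

    def quorum(rf):
        # QUORUM = floor(RF/2) + 1 replicas must respond.
        return rf // 2 + 1

    def failures_tolerated(rf):
        # Nodes that can be down while a quorum is still reachable:
        # RF - quorum(RF) = ceil(RF/2) - 1.
        return rf - quorum(rf)

    # RF 3: quorum 2, available with 1 node down, recoverable with 2 down.
    # RF 5: quorum 3, available with 2 nodes down, recoverable with 4 down.
    for rf in (3, 5):
        print(rf, quorum(rf), failures_tolerated(rf), rf - 1)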
Multi data centre
• Cassandra knows location of hosts
• Through the snitch
• Can ensure replicas in each DC
• NetworkTopologyStrategy
• => can cope with whole DC failure
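For example, a keyspace that places replicas in two data centres might be created like this (modern CQL syntax; the keyspace and data centre names are hypothetical):

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect()
    # Three replicas in each of two (hypothetical) data centres, so the
    # keyspace can survive the loss of a whole DC.
    session.execute(
        "CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
        "{'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}"
    )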
Recovery
Recovery
• Want to maintain replication factor
• Ensures recovery guarantees
• Methods:
• Automatic
• Manual
Automatic
Automatic processes
• Eventually moves replicas towards consistency
• The ‘eventual’ in ‘eventual consistency’
Hinted Handoff
• Hints
• Stored on any node
• When a node is temporarily unavailable
• Delivered when the node comes back
• Can use CL.ANY
• Writes not immediately readable
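A sketch of a write at CL.ANY with the Python driver (table name hypothetical): the write can succeed even when a stored hint is the only copy, which is why it is not immediately readable.

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["127.0.0.1"]).connect("demo")
    stmt = SimpleStatement(
        "INSERT INTO events (id, payload) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ANY,  # a stored hint counts as success
    )
    session.execute(stmt, (1, "hello"))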
Read Repair
• Since we've done a read, we might as well repair any old copies
• Compare values, update any out of sync
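A toy sketch of the idea (not Cassandra's internals): keep the copy with the newest timestamp and write it back to any replica holding an older one.

    def read_repair(copies):
        """copies: dict mapping replica name -> (timestamp, value)."""
        newest_ts, newest_val = max(copies.values())
        for replica, (ts, _) in copies.items():
            if ts < newest_ts:
                # In Cassandra the coordinator sends this replica a repair write.
                copies[replica] = (newest_ts, newest_val)
        return newest_val

    print(read_repair({"n1": (10, "old"), "n2": (12, "new"), "n3": (12, "new")}))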
Manual
Repair: method
• Ensures a node is up to date
• Run ‘nodetool -h <node> repair’
• Reads through all the data on the node
• Builds a Merkle tree
• Compares with replicas
• Streams differences
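A toy sketch of the Merkle-tree comparison (not Cassandra's implementation): hash the rows into leaves and build a tree; only ranges whose hashes differ need to be streamed.

    import hashlib

    def leaf_hashes(rows):
        # One leaf per row for simplicity; Cassandra hashes row ranges.
        return [hashlib.sha256(repr(item).encode()).digest()
                for item in sorted(rows.items())]

    def merkle_root(leaves):
        level = list(leaves)
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])  # pad odd-sized levels
            level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                     for i in range(0, len(level), 2)]
        return level[0]

    local = {"a": 1, "b": 2, "c": 3, "d": 4}
    replica = {"a": 1, "b": 2, "c": 99, "d": 4}
    # Differing roots mean the replicas are out of sync somewhere.
    print(merkle_root(leaf_hashes(local)) == merkle_root(leaf_hashes(replica)))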
Repair: when
• After node has been down a long time
• After increasing replication factor
• Every 10 days to ensure tombstones are propagated
• Can be used to restore a failed node
Replace a node: method
• Bootstrap the new node with token <old_token>-1
• Tell existing nodes old node is dead
• nodetool remove
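A sketch of the token arithmetic (RandomPartitioner tokens are integers in [0, 2^127); the old token value below is hypothetical):

    # The dead node's token (hypothetical value).
    OLD_TOKEN = 85070591730234615865843651857942052864

    # Bootstrap the replacement at old_token - 1, wrapping around the ring,
    # so it claims essentially the same range; set this as initial_token in
    # the new node's cassandra.yaml before starting it.
    new_initial_token = (OLD_TOKEN - 1) % (2 ** 127)
    print(new_initial_token)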
Replace a node: when
• Complete node failure
• Cannot replace failed disk
• Corruption
Restore from backup: method
• Stop Cassandra on the node
• Copy SSTables from backup
• Restart Cassandra
• May take a while reading indexes
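A minimal sketch with hypothetical paths: with Cassandra stopped, copy the snapshot's SSTable files back into the data directory, then restart.

    import shutil
    from pathlib import Path

    backup_dir = Path("/backups/demo/users/snapshot-2011-09-05")  # hypothetical
    data_dir = Path("/var/lib/cassandra/data/demo/users")         # hypothetical layout

    for sstable_file in backup_dir.iterdir():
        if sstable_file.is_file():
            shutil.copy2(sstable_file, data_dir / sstable_file.name)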
Restore from backup: when
• Disk failure
• with no RAID rebuild available
• Operator error
• Corruption
• Hacker
Thanks :)
@acunu @richardalow