OPODIS 05
Reconfigurable Distributed Storage for Dynamic Networks
Gregory Chockler, Seth Gilbert, Vincent Gramoli, Peter M Musial, Alexander A Shvartsman
Goals
Reconfigurable Distributed Storage (RDS)
• Atomic consistency (read/write)
• Fault tolerance
…in dynamic and asynchronous systems.
Model
• Distributed
– Connected set of processors
– Each processor has a unique id i ∈ I
– MWMR (multi-writer, multi-reader): any processor is a potential client
• Asynchronous
– Asynchronous processors
– Point-to-point asynchronous unreliable channels
• Dynamic
– Processors join and leave the system
– Processors may crash
What is a configuration?
• Configuration <members, read-quorums, write-quorums>
– members is a set of processors,
– read-quorums and write-quorums are two sets of quorums; for every RQ ∈ read-quorums and WQ ∈ write-quorums:
• RQ ⊆ members
• WQ ⊆ members
• RQ ∩ WQ ≠ ∅ (only within a given configuration)
• Every client maintains a set of configurations, initially containing the default one.
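The configuration triple and its quorum conditions can be sketched in code. This is a minimal illustration, not the authors' implementation: the class name `Configuration`, the method `is_valid`, the helper `fs`, and the processor ids `p1`..`p3` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Configuration:
    """Hypothetical encoding of <members, read-quorums, write-quorums>."""
    members: FrozenSet[str]
    read_quorums: FrozenSet[FrozenSet[str]]
    write_quorums: FrozenSet[FrozenSet[str]]

    def is_valid(self) -> bool:
        # Every quorum must be a subset of members.
        subsets_ok = all(q <= self.members
                         for q in self.read_quorums | self.write_quorums)
        # Every read quorum must intersect every write quorum.
        intersect_ok = all(rq & wq
                           for rq in self.read_quorums
                           for wq in self.write_quorums)
        return subsets_ok and intersect_ok


def fs(*ids: str) -> FrozenSet[str]:
    """Small helper (assumption) to build frozensets tersely."""
    return frozenset(ids)


default = Configuration(
    members=fs("p1", "p2", "p3"),
    read_quorums=frozenset({fs("p1", "p2"), fs("p2", "p3")}),
    write_quorums=frozenset({fs("p1", "p2"), fs("p2", "p3")}),
)

# A read quorum {p1} and a write quorum {p2, p3} never meet: invalid.
broken = Configuration(
    members=fs("p1", "p2", "p3"),
    read_quorums=frozenset({fs("p1")}),
    write_quorums=frozenset({fs("p2", "p3")}),
)

# Each client starts with a set containing only the default configuration.
configs = {default}
```

The intersection check is applied only inside one configuration, matching the slide's note that quorums of different configurations need not intersect.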
Single Object Operations Overview
After [ABD95]
• tag = <c,i> ∈ N × I; val is a possible value
• val = Read()_i: (<c,j>,val) = query(); [prop(<c,j>,val);]
• Write(val)_i: (<c’,j>,val’) = query(); prop(<c’+1,i>,val);
1. (tag,val) ← query(NULL): gathers the (tag,val) pairs of all processors of an RQ and returns the pair with the largest tag.
2. NULL ← prop(tag,val): updates the (tag,val) pairs at all processors of a WQ.
[Figure: write tag and read tag]
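The two-phase read and write can be sketched as follows. This is a simplified, single-process illustration under strong assumptions: replicas live in one dictionary, no messages are lost, and a replica adopts a pair only when the incoming tag is larger. The names `Replica`, `query`, `prop`, `read`, and `write` mirror the slide but the signatures are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Tuple

# tag = <c, i>: counter first, then processor id. Python compares tuples
# lexicographically, which gives a total order on tags.
Tag = Tuple[int, str]


@dataclass
class Replica:
    tag: Tag = (0, "")
    val: Any = None


def query(replicas: Dict[str, Replica], rq: Iterable[str]) -> Tuple[Tag, Any]:
    """Gather (tag, val) from every processor of a read quorum, keep the largest tag."""
    best = max((replicas[p] for p in rq), key=lambda r: r.tag)
    return best.tag, best.val


def prop(replicas: Dict[str, Replica], wq: Iterable[str], tag: Tag, val: Any) -> None:
    """Update (tag, val) at every processor of a write quorum (assumed: only if newer)."""
    for p in wq:
        r = replicas[p]
        if tag > r.tag:
            r.tag, r.val = tag, val


def read(replicas, rq, wq) -> Any:
    tag, val = query(replicas, rq)
    prop(replicas, wq, tag, val)  # the propagation the slide marks as optional
    return val


def write(replicas, rq, wq, writer_id: str, val: Any) -> None:
    (c, _j), _old = query(replicas, rq)
    prop(replicas, wq, (c + 1, writer_id), val)  # new tag <c+1, i>


replicas = {p: Replica() for p in ("p1", "p2", "p3")}
write(replicas, ["p1", "p2"], ["p2", "p3"], "p1", "x")
first = read(replicas, ["p1", "p2"], ["p1", "p2"])
write(replicas, ["p2", "p3"], ["p1", "p2"], "p3", "y")
second = read(replicas, ["p2", "p3"], ["p2", "p3"])
```

Because every read quorum shares a processor with every write quorum, `query` always sees the most recently propagated tag in this sketch.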
Reconfiguration Design Goals
• Sound
– Totally ordered configurations
• Flexible
– No dependences between configurations
• Non-intrusive
– Makes concurrent read/write operations possible
• Fast
– Strengthening fault tolerance
Decoupling Reconfiguration
• Reconfiguration = Replacing Configurations
– {I} Installing a new configuration
– {R} Removing old configuration(s)
• If {R} ≺ {I}: operations are delayed
• If {I} ≺ {R}: a stronger configuration viability assumption is required
Solution
Neither ({R} ≺ {I}) nor ({I} ≺ {R}), but {I} // {R}: installation and removal proceed in parallel.
Tighter coupling between removal and installation
RDS Reconfiguration
• Reconfiguration is based on Paxos (a three-phase, leader-based consensus algorithm)
• l is the leader
• c is the current configuration
• configs is the set of active configurations
• A ballot has a unique identifier b and a value v, which is a configuration
• Paxos phases:
– Prepare: l creates a new ballot and chooses/gets the value to propose.
– Propose: l proposes <b,v> and gathers votes from a majority.
– Propagate: l propagates the decision.
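How the leader "chooses/gets" a value in the Prepare phase can be illustrated with the usual Paxos rule: adopt the value of the highest-numbered ballot reported by the repliers, and fall back to the leader's own proposal if nobody has voted yet. The sketch below assumes that rule applies here; the ballot encoding `(round, leader id)` and the function names `new_ballot` and `choose_value` are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical ballot identifier: (round number, leader id).
# Tuples compare lexicographically, so identifiers are unique and ordered.
BallotId = Tuple[int, str]
Vote = Tuple[BallotId, str]  # <b'', c''>: last ballot voted for and its configuration


def new_ballot(last_seen: BallotId, leader: str) -> BallotId:
    """Create an identifier larger than any ballot the leader has seen."""
    return (last_seen[0] + 1, leader)


def choose_value(own_proposal: str, replies: Iterable[Optional[Vote]]) -> str:
    """Pick the value to propose from the last votes reported in the replies.

    A reply of None means the replier has not voted in any ballot yet.
    """
    voted = [r for r in replies if r is not None]
    if not voted:
        return own_proposal  # free choice: propose the requested configuration
    # Otherwise adopt the value attached to the highest ballot reported.
    return max(voted, key=lambda r: r[0])[1]


b = new_ballot((2, "p3"), "p1")
v_free = choose_value("c2", [None, None])
v_bound = choose_value("c2", [((1, "p1"), "c1"), None, ((2, "p3"), "c3")])
```

Here configurations are represented by plain strings such as `"c2"` purely for brevity.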
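RDS Reconfiguration
Prepare phase, Recon(c,c’)
[Diagram: messages exchanged between the leader l and the quorums RQ and WQ]
• <1a, b>
• <1b, b, configs, <b’’, c’’>>
On receiving the <1b> replies, l:
• Updates its ballot’s value v with the one received
• Updates its configs set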
RDS Reconfiguration
Propose phase, Recon(c,c’)
[Diagram: messages exchanged between the leader l and the quorums RQ and WQ]
• <1a, b>
• <1b, b, configs, <b’’, c’’>>
• <2a, b, c, v>
RDS Reconfiguration
Propose phase (continued), Recon(c,c’)
[Diagram: messages exchanged between the leader l and the quorums RQ and WQ]
• <1a, b>
• <1b, b, configs, <b’’, c’’>>
• <2a, b, c, v>
• <2b, b, c, v, tag, val>
Effect on processors:
• Update their tag and val
• Add v to their configs set
RDS Reconfiguration
Propagation phase, Recon(c,c’)
[Diagram: messages exchanged between the leader l and the quorums RQ and WQ]
• <1a, b>
• <1b, b, configs, <b’’, c’’>>
• <2a, b, c, v>
• <2b, b, c, v, tag, val>
• <3a, c, v, tag, val>
Effect on processors:
• Update their tag and val
• Remove configuration c from their configs set
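The local state changes listed for the Propose and Propagation phases can be sketched as two update steps. This is an illustration only: the class `ProcessorState` and its method names are hypothetical, which message triggers each step is an assumption, and the rule "adopt (tag, val) only if the tag is larger" is borrowed from the read/write protocol rather than stated on these slides.

```python
from dataclasses import dataclass, field
from typing import Any, Set, Tuple

Tag = Tuple[int, str]  # <counter, processor id>, compared lexicographically


@dataclass
class ProcessorState:
    tag: Tag = (0, "")
    val: Any = None
    configs: Set[str] = field(default_factory=set)  # active configurations

    def _adopt(self, tag: Tag, val: Any) -> None:
        # Assumption: keep the pair carrying the largest tag.
        if tag > self.tag:
            self.tag, self.val = tag, val

    def apply_propose_update(self, c: str, v: str, tag: Tag, val: Any) -> None:
        """Propose phase: update tag/val and add the new configuration v."""
        self._adopt(tag, val)
        self.configs.add(v)

    def apply_propagate_update(self, c: str, v: str, tag: Tag, val: Any) -> None:
        """Propagation phase: update tag/val and remove the old configuration c."""
        self._adopt(tag, val)
        self.configs.discard(c)


state = ProcessorState(configs={"c"})
state.apply_propose_update("c", "c_new", (3, "p2"), "x")
after_propose = set(state.configs)      # both configurations are active
state.apply_propagate_update("c", "c_new", (2, "p1"), "stale")
after_propagate = set(state.configs)    # only the new configuration remains
```

Between the two steps both configurations are active, which is the window in which installation and removal overlap.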
Proving Atomicity
• Ordering configurations
Theorem 1: The set of installed configurations in the system is totally ordered.
• Ordering operations
Theorem 2: If operation 1 precedes operation 2, then operation 1’s tag is not larger than operation 2’s tag.
Additional Assumptions
• Eventual stabilization with
– Unique leader l
– Message delay bound d (unknown to the algorithm)
– Gossip with frequency d
– Restricted reconfiguration rate
– Some quorums remain alive in active configurations
ts: system stabilization time
tl: algorithm stabilization time
Let tr be the request time
[Timeline figure: ts and tl on a time axis, with an interval labeled 2d]
Reconfiguration Latency
Worst case scenario: the last reconfiguration was done by a different leader.
[Timeline figure: starting at max(tl, tr), Prepare (2d), Propose (2d), Propagate (d), ending at te; 5d in total]
te: end time, when reconfiguration is complete
Reconfiguration Latency
Other cases: the leader made the previous reconfiguration.
[Timeline figure: starting at max(tl, tr), Propose (2d), Propagate (d), ending at te; 3d in total]
te: end time, when reconfiguration is complete
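The two latency bounds are simple sums of per-phase delays, which a short calculation makes explicit. The function name `reconfiguration_end_time` and its parameters are hypothetical; the per-phase costs (2d, 2d, d) are the ones shown on the timelines.

```python
def reconfiguration_end_time(t_l: float, t_r: float, d: float,
                             same_leader: bool) -> float:
    """Bound on t_e, the time at which reconfiguration is complete.

    t_l: algorithm stabilization time
    t_r: request time
    d:   message delay bound
    same_leader: True if the leader also made the previous reconfiguration,
                 in which case the Prepare phase is skipped.
    """
    phases = {"propose": 2 * d, "propagate": d}
    if not same_leader:
        phases["prepare"] = 2 * d  # worst case: a new ballot must be prepared
    return max(t_l, t_r) + sum(phases.values())


worst = reconfiguration_end_time(t_l=10, t_r=12, d=1, same_leader=False)
other = reconfiguration_end_time(t_l=10, t_r=12, d=1, same_leader=True)
```

With d = 1 and a request at time 12 after stabilization at time 10, the worst case ends by 12 + 5 = 17 and the same-leader case by 12 + 3 = 15.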
Operation Latency
Phase latency:
• 2d is sufficient for the phase round trip.
• In some cases (pending reconfiguration), the phase might be delayed twice.
Operation latency:
• Operations are bounded by 8d.
• In some cases, the propagation phase of the read operation can be skipped, leading to a possible bound of 2d.
[Timeline figure: 1st round trip (2d), new configuration discovered, 2nd round trip (2d)]
Experimental Results
• The IOA specification was translated to Java code following a set of rules.
• Implementation of the Attiya, Bar-Noy, and Dolev algorithm ("ABD", without reconfiguration) and of RDS, which shares parts of the ABD code.
• Using majority-based configurations.
• Measuring operation latency
1. While varying configuration size
2. While varying algorithm instances
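A majority-based configuration can be generated mechanically from a member set. The sketch below assumes "majority-based" means that every minimal majority of the members serves as both a read quorum and a write quorum; the experiments' actual construction (in Java) is not shown on the slides, and the name `majority_quorums` is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def majority_quorums(members: Iterable[str]) -> Set[FrozenSet[str]]:
    """Return all minimal majorities (size floor(n/2) + 1) of the member set."""
    ordered = sorted(set(members))
    size = len(ordered) // 2 + 1
    return {frozenset(q) for q in combinations(ordered, size)}


members = ["p1", "p2", "p3", "p4", "p5"]
quorums = majority_quorums(members)

# Any two majorities of the same set share at least one member, so using
# the same family for reads and writes satisfies the intersection property.
all_intersect = all(a & b for a in quorums for b in quorums)
```

Varying `members` is one way to reproduce the "varying configuration size" axis of the measurements in a test harness.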
Experimental Results
• Operation latency of RDS is competitive with ABD, confirming the theory.
• Reconfiguration messages contain operation information which might accelerate operations in RDS.
Conclusion
• RDS: Reconfigurable Distributed Storage.
• With sound, flexible, non-intrusive and fast reconfiguration.
• It solves two problems in one: configuration replacement and consensus.
• Reconfiguration is inexpensive (in time).
• Fault tolerance is strengthened.
• RAMBO can become more aggressive: it is exactly what we did here!