Dynamic atomic storage without consensus

DYNAMIC ATOMIC STORAGE WITHOUT CONSENSUS Aguilera, Keidar, Malkhi, Shraer, J. ACM 58, 2, 2011 Sarai Duek


Page 1: Dynamic atomic storage without consensus

DYNAMIC ATOMIC STORAGE WITHOUT CONSENSUS

Aguilera, Keidar, Malkhi, Shraer, J. ACM 58, 2, 2011

Sarai Duek

Page 2: Dynamic atomic storage without consensus

THE PROBLEM

Implement an atomic read/write register in a dynamic system, supporting three operations:

- Read
- Write
- Reconfig

Page 3: Dynamic atomic storage without consensus

THE PROBLEM

What is atomicity?

Page 4: Dynamic atomic storage without consensus

THE PROBLEM

Atomicity is when each operation appears to occur at some point between its invocation and response.

[Figure: timeline of interleaved read (R) and write (W) operations.]

Page 5: Dynamic atomic storage without consensus

THE PROBLEM

What is liveness?

Page 6: Dynamic atomic storage without consensus

THE PROBLEM

Liveness is a guarantee that the system will make progress under some conditions (e.g., while a majority of processes remain correct).

Page 7: Dynamic atomic storage without consensus

THE PROBLEM

t-resilient R/W storage guarantees progress if fewer than t processes crash. For an n-process system, it is well known that t-resilient R/W storage exists when t < n/2, and does not exist when t ≥ n/2.

[Figure: four processes P0 to P3 with a write (W) and a read (R); the figure also carries the symbols ⊥ and × next to P2 and P3.]

Page 8: Dynamic atomic storage without consensus

THE PROBLEM

In a dynamic system the majority can change, and liveness is achieved through the reconfig operation.

[Figure: processes P0 to P3; reconfig1(+,4) adds P4.]

Page 9: Dynamic atomic storage without consensus

THE PROBLEM

The model

- Unknown and unbounded universe of processes Π.
- Asynchronous reliable communication channels between each pair of processes.
- Processes can be added, removed, crash or halt.

[Figure: an open-ended set of processes p1, p2, ..., p9, ...]

Page 10: Dynamic atomic storage without consensus

THE PROBLEM

A view is a set of changes. Changes lead to a new configuration of processes.

Liveness conditions

- The set of crashed processes and those whose removal is pending is a minority of the current view or of any pending future view.
- No new reconfig operations will be invoked for “sufficiently long” for the started operations to complete.

[Figure: processes p0 to p5.]

Page 11: Dynamic atomic storage without consensus

THE PROBLEM

- MWMR: any process can write and read.
- Written values are unique: (val, pid, ts).
- Every process in the system knows the initial view.
- By convention, a reconfig(Init) completes by time 0.
- Members of view w store information about the current view.

Definitions

- Changes: {Remove, Add}
- View: a set of changes
- For a view w: w.remove is the removal set, w.join is the join set, and w.members is the set w.join \ w.remove
- V(t): the union of all sets c such that a reconfig(c) completes by time t
- Init = V(0)
- P(t): the set of pending changes at time t
- F(t): the set of processes that crashed by time t
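The view definitions above can be made concrete with a small sketch. It assumes a change is encoded as a pair ("+", pid) or ("-", pid), matching the (+,4) notation used elsewhere in the slides; the `View` class and its method names are illustrative and not taken from the paper.

```python
from dataclasses import dataclass

# Assumed encoding: ("+", pid) adds a process, ("-", pid) removes it.
Change = tuple[str, int]


@dataclass(frozen=True)
class View:
    """A view is just a set of changes."""
    changes: frozenset[Change]

    @property
    def join(self) -> frozenset[int]:
        return frozenset(p for op, p in self.changes if op == "+")

    @property
    def remove(self) -> frozenset[int]:
        return frozenset(p for op, p in self.changes if op == "-")

    @property
    def members(self) -> frozenset[int]:
        # w.members = w.join \ w.remove
        return self.join - self.remove

    def union(self, cng) -> "View":
        # Views only grow: applying changes is a set union.
        return View(self.changes | frozenset(cng))


init = View(frozenset({("+", 0), ("+", 1), ("+", 2)}))
v1 = init.union({("+", 3), ("-", 1)})
print(sorted(v1.members))  # [0, 2, 3]
```

Because a removal is itself a recorded change, a removed process can never rejoin under the same identifier in this encoding.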

Page 12: Dynamic atomic storage without consensus

THE PROBLEM

Dynamic Service Liveness

If at every time t in the execution, fewer than |V(t).members|/2 processes out of V(t).members ∪ P(t).join are in F(t) ∪ P(t).remove, and the number of different changes proposed in the execution is finite, then the following hold:

- Eventually, the enable operations event occurs at every active process that was added by a complete reconfig operation.
- Every operation invoked at an active process eventually completes.


Page 13: Dynamic atomic storage without consensus

THE PROBLEM

Dynamic Service Liveness

At every time t in the execution, fewer than |V(t).members|/2 processes out of V(t).members ∪ P(t).join are in F(t) ∪ P(t).remove.

[Figure: processes p0 to p10 grouped into V(t), P(t).join, P(t).remove and F(t), with crashed processes marked ×; the annotations compare the values 3 and 4.5.]
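As a rough illustration of the condition on this slide, the check can be written as a predicate over the four sets at a single time t. The function name and the example process ids are hypothetical; the sizes are chosen so that the comparison is 3 against 4.5, the two values that appear in the slide's figure.

```python
def liveness_condition_holds(v_members, p_join, p_remove, crashed):
    """True if fewer than |V(t).members|/2 processes out of
    V(t).members ∪ P(t).join are in F(t) ∪ P(t).remove."""
    universe = set(v_members) | set(p_join)
    unavailable = (set(crashed) | set(p_remove)) & universe
    return len(unavailable) < len(v_members) / 2


# Hypothetical snapshot: 9 current members, 2 pending joins.
v_members = set(range(9))
p_join = {9, 10}

# One pending removal plus two crashes: 3 < 4.5, so the condition holds.
ok = liveness_condition_holds(v_members, p_join, p_remove={2}, crashed={7, 8})

# One pending removal plus four crashes: 5 is not below 4.5.
bad = liveness_condition_holds(v_members, p_join, p_remove={2}, crashed={5, 6, 7, 8})
print(ok, bad)  # True False
```

Note that the bound depends only on the current members, while the processes being counted also include those whose join is still pending.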

Page 14: Dynamic atomic storage without consensus

THE ALGORITHM OUTLINE

Read-phase

- Read configuration information.
- If a new view was discovered, restart the read-phase in the new view.
- Send a request to all processes.
- Each recipient sends back the current value of its replica.
- Wait for a majority to reply.
- Return the value associated with the largest sequence number.

Write-phase

- Generate the next sequence number.
- Send a message with the value and the sequence number to all processes.
- Each recipient updates its replica and sends an ack.
- The writer waits for a majority of acks.
- Read configuration information.
- If a new view was discovered, restart the read-phase in the new view (followed by a write-phase again).

Without the configuration steps, these are the read and write phases of the static algorithm.
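The static core of the two phases (without the configuration steps) can be sketched as a single-process simulation. This is only an illustration under stated assumptions: messages are replaced by direct method calls, a "majority" is a randomly sampled subset of replicas, timestamps are (sequence number, writer id) pairs, and all names (`Replica`, `read_phase`, `write_phase`) are invented for the example.

```python
import random


class Replica:
    """One process's copy of the register."""

    def __init__(self):
        self.ts = (0, 0)   # (sequence number, writer id)
        self.val = None

    def update(self, ts, val):
        # Adopt the value only if it carries a larger timestamp, then ack.
        if ts > self.ts:
            self.ts, self.val = ts, val
        return "ack"

    def read(self):
        return self.ts, self.val


def majority(replicas, rng):
    # Stand-in for "wait for a majority to reply": pick any majority.
    return rng.sample(replicas, len(replicas) // 2 + 1)


def read_phase(replicas, rng):
    replies = [r.read() for r in majority(replicas, rng)]
    return max(replies, key=lambda reply: reply[0])  # largest timestamp wins


def write_phase(replicas, ts, val, rng):
    acks = [r.update(ts, val) for r in majority(replicas, rng)]
    assert len(acks) > len(replicas) / 2


def write(replicas, pid, val, rng):
    (num, _), _ = read_phase(replicas, rng)           # learn the latest number
    write_phase(replicas, (num + 1, pid), val, rng)   # next sequence number


def read(replicas, rng):
    ts, val = read_phase(replicas, rng)
    write_phase(replicas, ts, val, rng)               # write back before returning
    return val


rng = random.Random(0)
replicas = [Replica() for _ in range(5)]
write(replicas, pid=1, val="a", rng=rng)
write(replicas, pid=2, val="b", rng=rng)
print(read(replicas, rng))  # b
```

The result does not depend on which majorities are sampled, because any two majorities of the same replica set intersect.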

Page 15: Dynamic atomic storage without consensus

THE ALGORITHM OUTLINE

Reconfiguration

- Write information about the new view to a quorum of the old view.
- Execute the read and write phases, starting in the old view.

Page 16: Dynamic atomic storage without consensus

WEAK OBJECT

Allows a fixed set of processes P to use two operations:

- arrive_i(c)
- query_i()

Arrive and query obey the following semantics:

- Integrity
- Validity
- Monotonicity of queries
- Non-empty common intersection
- Termination

Page 17: Dynamic atomic storage without consensus

WEAK OBJECT

Each process p_i in P has a value field p_i.val.

SWMR: only p_i can use p_i.val.write(c), but all processes can use p_i.val.read().

The weak object algorithm

operation arrive_i(c)
    if collect() = ∅ then p_i.val.write(c)
    return OK

operation query_i()
    C1 ← collect()
    if C1 = ∅ then return ∅
    C2 ← collect()
    return C2

procedure collect()
    C ← ∅
    foreach p_k ∈ P
        c ← p_k.val.read()
        if c ≠ ⊥ then C ← C ∪ {c}
    return C
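The weak object pseudocode above can be mirrored in a short sequential sketch. It assumes that the SWMR registers are modeled by a plain dictionary, that ⊥ is represented by `None`, and that operations run one at a time; the class name `WeakObject` is made up for the example.

```python
class WeakObject:
    """Sequential model of the weak object; None plays the role of ⊥."""

    def __init__(self, processes):
        # One single-writer register per process in the fixed set P.
        self.val = {p: None for p in processes}

    def collect(self):
        # Read every register and gather the non-⊥ values.
        return {c for c in self.val.values() if c is not None}

    def arrive(self, i, c):
        # Write only if no value has been seen yet.
        if not self.collect():
            self.val[i] = c
        return "OK"

    def query(self):
        if not self.collect():
            return set()
        # A second collect is what the returned set is based on.
        return self.collect()


obj = WeakObject(range(6))
print(obj.query())          # set()
obj.arrive(0, "v1")
obj.arrive(5, "v2")         # sees v1 already stored, so it writes nothing
print(obj.query())          # {'v1'}
```

In this sequential run the second arrive finds a non-empty collect and does not write. Two values can both end up stored only when arrives overlap in time, which this simplified model does not simulate.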

Page 18: Dynamic atomic storage without consensus

WEAK OBJECT


[Figure: processes P0 to P5; P0 invokes arrive(v1) and P5 invokes arrive(v2); the collect returns C = { }, and P0 stores v1 while P5 stores v2.]

Page 19: Dynamic atomic storage without consensus

WEAK OBJECT


[Figure: a query() collects over processes P0 to P5, where P0 holds v1 and P5 holds v2; the collected set C grows from { } to {v1} to {v1, v2}.]

Page 20: Dynamic atomic storage without consensus

WEAK OBJECT

[Figure: two queries, query_a and query_b, each shown with its collects; labels in the figure: query_a { }, query_b { }, collect {a}, collect {a, b}, collect {a}, collect {b}.]

Page 21: Dynamic atomic storage without consensus

THE ALGORITHM

operation read_i():
    pickNewTS_i ← FALSE
    newView ← Traverse(∅, ⊥)
    NotifyQ(newView)
    return v_i^max

operation write_i(v):
    pickNewTS_i ← TRUE
    newView ← Traverse(∅, v)
    NotifyQ(newView)
    return OK

operation reconfig_i(cng):
    pickNewTS_i ← FALSE
    newView ← Traverse(cng, ⊥)
    NotifyQ(newView)
    return OK

procedure NotifyQ(w)
    if did not receive {NOTIFY, w} then
        send {NOTIFY, w} to w.members
    wait for {NOTIFY, w} from majority of w.members

Page 22: Dynamic atomic storage without consensus

THE ALGORITHM

procedure Traverse(cng, v)
    desiredView ← curView_i ∪ cng
    Front ← {curView_i}
    do
        s ← min{|ℓ| : ℓ ∈ Front}
        w ← any ℓ ∈ Front s.t. |ℓ| = s
        if (i ∉ w.members) then halt_i
        if w ≠ desiredView then
            arrive_i(w, desiredView \ w)
        ChangeSets ← ReadInView(w)
        if ChangeSets ≠ ∅ then
            Front ← Front \ {w}
            foreach c ∈ ChangeSets
                desiredView ← desiredView ∪ c
                Front ← Front ∪ {w ∪ c}
        else ChangeSets ← WriteInView(w, v)
    while ChangeSets ≠ ∅
    curView_i ← desiredView
    return desiredView

Traverse is used to look for the next view considering all the changes suggested so far.
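To see how the Front evolves, here is a heavily simplified, single-process sketch of the Traverse loop. It is not the paper's distributed procedure: the weak objects of all views are collapsed into one shared dictionary `store` (view to set of proposed change sets), ReadInView and WriteInView are reduced to a lookup in that dictionary, and no values or quorums are involved. Changes are assumed to be ("+", pid) or ("-", pid) pairs, and all function names are illustrative.

```python
def members(view):
    # w.members = w.join minus w.remove, for a view given as a set of changes.
    join = {p for op, p in view if op == "+"}
    remove = {p for op, p in view if op == "-"}
    return join - remove


def traverse(i, cur_view, cng, store):
    desired = frozenset(cur_view) | frozenset(cng)
    front = {frozenset(cur_view)}
    while True:
        w = min(front, key=len)              # any smallest view in Front
        if i not in members(w):
            raise RuntimeError(f"process {i} was removed and halts")
        if w != desired:
            # arrive_i(w, desiredView \ w): propose only if nothing is there yet.
            proposals = store.setdefault(w, set())
            if not proposals:
                proposals.add(desired - w)
        change_sets = set(store.get(w, set()))      # stand-in for ReadInView(w)
        if change_sets:
            front.discard(w)
            for c in change_sets:
                desired = desired | c
                front.add(w | c)
        else:
            change_sets = set(store.get(w, set()))  # stand-in for WriteInView(w, v)
        if not change_sets:
            return desired


init = frozenset({("+", 0), ("+", 1), ("+", 2)})
# Assume some other process already proposed adding process 3 in the initial view.
store = {init: {frozenset({("+", 3)})}}
result = traverse(0, init, {("+", 4)}, store)
print(sorted(members(result)))  # [0, 1, 2, 3, 4]
```

In this run process 0 first learns about (+,3) in the initial view, moves to the view that includes it, proposes its own (+,4) there, and stops at the view containing both changes, where no further proposals are found.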

Page 23: Dynamic atomic storage without consensus

THE ALGORITHM


[Figure: the initial view, Init.]

Page 24: Dynamic atomic storage without consensus

THE ALGORITHM


[Figure: a graph of views starting at the initial view and passing through V1 to V6, showing the initial Front and the Front after iterations 1, 4 and 6. Edge labels: {(+,3)}, {(+,3), (-,1), (+,4)}, {(-,1), (+,4)}, {(+,5), (-,1), (+,4)}, {(+,7)}, {(+,5)}, {(+,7)}, {(+,3), (+,5)}. The final view equals InitView ∪ {(+,3), (+,5), (-,1), (+,4), (+,7)}. Legend: edge returned from ReadInView; edge updated by Pi.]

Page 25: Dynamic atomic storage without consensus

THE ALGORITHM

procedure ReadInView(w)
    ChangeSets ← query_i(w)
    ContactQ(R, w.members)
    return ChangeSets

procedure WriteInView(w, v)
    if pickNewTS_i then
        (pickNewTS_i, v_i^max, ts_i^max) ← (FALSE, v, (ts_i^max.num + 1, i))
    ContactQ(W, w.members)
    ChangeSets ← query_i(w)
    return ChangeSets

Procedure ContactQ sends a write-request including v_i^max and ts_i^max when writing to a quorum, and a read-request when reading from a quorum.
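The timestamp step at the top of WriteInView can be sketched on its own. The sketch assumes a timestamp is a (num, pid) pair compared lexicographically, so that two writers choosing the same number are still ordered by process id; the function name `pick_timestamp` is hypothetical.

```python
def pick_timestamp(pick_new_ts, v_max, ts_max, v, i):
    """Return the updated (pickNewTS, v_max, ts_max) triple for process i."""
    if pick_new_ts:
        # First write attempt of this operation: adopt v with a fresh timestamp.
        return False, v, (ts_max[0] + 1, i)
    # Otherwise keep propagating the latest known value unchanged.
    return pick_new_ts, v_max, ts_max


state = pick_timestamp(True, "old", (4, 2), "new", 7)
print(state)              # (False, 'new', (5, 7))
print((5, 7) > (5, 3))    # True
```

Resetting the flag to FALSE means that if the write phase is retried in a later view, the same timestamp is reused rather than a new one being generated.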

Page 26: Dynamic atomic storage without consensus

ESTABLISHED VIEWS

The unique sequence of established views E is constructed as follows:

- The first view in E is the initial view Init.
- If w is in E, then the next view after w in E is w' = w ∪ c, where c is an element chosen arbitrarily from the intersection of all sets C ≠ ∅ returned by some query(w) operation in the execution.
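The construction of the next established view can be illustrated with a small helper. It assumes views and change sets are frozensets of ("+", pid) or ("-", pid) pairs and that the results of all query(w) operations are available as a list; the function name and the deterministic tie-break are choices made for the example, since the definition allows an arbitrary pick.

```python
def next_established_view(w, query_results):
    """Return w ∪ c for some c common to all non-empty query(w) results,
    or None if no query on w returned a non-empty set."""
    non_empty = [set(C) for C in query_results if C]
    if not non_empty:
        return None
    common = set.intersection(*non_empty)
    if not common:
        raise ValueError("non-empty common intersection property violated")
    c = min(common, key=sorted)   # any element would do; this pick is deterministic
    return w | c


w = frozenset({("+", 0), ("+", 1)})
results = [
    {frozenset({("+", 3)})},
    {frozenset({("+", 3)}), frozenset({("+", 5)})},
    set(),                         # empty results are ignored
]
w_next = next_established_view(w, results)
print(sorted(w_next))  # [('+', 0), ('+', 1), ('+', 3)]
```

The helper relies on the non-empty common intersection property of the weak object: without it, the set `common` could be empty and no next view would be defined.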

Page 27: Dynamic atomic storage without consensus

THANK YOU