Ordering and Consistent Cuts Presented by Chi H. Ho.

Post on 21-Dec-2015


Ordering and Consistent Cuts

Presented by Chi H. Ho

Time, Clocks, and the Ordering of Events in a Distributed System

Leslie Lamport

Introduction

• 2000 PODC Influential Paper Award
• Outline of the paper (not in the order presented):
– Partial and Total Orderings
– Logical and Physical Clocks
– Clock and Strong Clock Conditions
– Synchronizing Physical Clocks

• Beyond…

“Happened Before”

• a → b if:
– a and b are events in the same process and a comes before b, or
– a is the send event of some message, and b is the receive event of the same message.
• Transitive: (a → b) & (b → c) ⟹ (a → c)
• Concurrent: ¬(a → b) & ¬(b → a)

Partial Ordering

Examples

• q5 → p4
• q2 → q3
• p1 → r3
• q2 // p2
• q2 // p3

Partial Ordering

Logical Clock

• Clock Condition:

∀a,b: a → b ⟹ C(a) < C(b)

Partial Ordering Implementation

Logical Clock

• Implementation Rules:
– IR1:
• Each process Pi increments Ci between any two successive events.
– IR2:
• If event a is the sending of a message m by process Pi, then the message contains a timestamp Tm = Ci(a).
• Upon receiving a message m, process Pj sets Cj greater than or equal to its present value and greater than Tm.
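Rules IR1 and IR2 can be sketched in a few lines of Python. This is a minimal illustration, not the paper's own code: the class name `LamportClock` and its method names are assumptions, and the receive rule is implemented as `max(C, Tm) + 1`, which is one simple way to satisfy "greater than or equal to its present value and greater than Tm" while also counting the receive as an event.

```python
class LamportClock:
    """Logical clock for one process (hypothetical helper following IR1/IR2)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # IR1: increment the clock between any two successive events.
        self.time += 1
        return self.time

    def send(self):
        # IR2 (send): sending is an event; the message carries Tm = C(a).
        return self.tick()

    def receive(self, tm):
        # IR2 (receive): move to a value >= the present value and > Tm.
        self.time = max(self.time, tm) + 1
        return self.time


p0, p1 = LamportClock(), LamportClock()
a = p0.tick()        # local event on P0
tm = p0.send()       # send event on P0; tm travels with the message
b = p1.receive(tm)   # receive event on P1
```

Because the send happened before the receive, the clock condition requires `tm < b`, which the `max(...) + 1` step guarantees.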

Partial Ordering Implementation

Examples

[Figure: step-by-step build of a space-time diagram for two processes, P0 and P1. Events are assigned logical clock values (0 through 4 in the final frame); the bracketed values [3] and [2] label the messages.]

Extended “Happened Before”

• a => b (for a in Pi, b in Pj) iff:
– Ci(a) < Cj(b), or
– (Ci(a) = Cj(b)) & (Pi ≺ Pj)
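The total order above amounts to comparing (clock value, process) pairs lexicographically. The sketch below assumes that processes are identified by integers and that the arbitrary ordering ≺ on processes is plain integer comparison; both are illustrative choices, not something the slides specify.

```python
def total_order_key(event):
    """Sort key for a => b: compare clock values first, then process ids."""
    timestamp, pid = event
    return (timestamp, pid)


# Each event is (C(a), process id); the values are made up for illustration.
events = [(2, 1), (1, 1), (2, 0), (3, 0)]
ordered = sorted(events, key=total_order_key)
```

Ties on the clock value, here the two events with timestamp 2, are broken by the process id, so every pair of distinct events is ordered.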

Total Ordering

Example Application

• Shared resource granting
– Fixed number of processes
– Single shared resource
– Requirements:
I. Mutually Exclusive
II. Fair
III. Exhaustive

Total Ordering

Example Application

• Solution: Distributed algorithm
• Model:
– Channels are FIFO
– Each process maintains a process queue
• Algorithm:
– Request: broadcast “Tm:Pi request resource”
– Release: broadcast “Tm:Pi release resource”
– Receive request: enqueue
– Receive release: dequeue
– Resource granted to Pi (local decision) when:
• “Tm:Pi request resource” has the minimum Tm in the queue, and
• Pi has received from every process a msg timestamped later than Tm
• Note:
– Can be generalized to solve Replicated State Machine!
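The local grant decision above can be sketched as a pure function. The names `may_enter`, `queue`, and `latest_seen` are hypothetical; requests are modelled as `(Tm, pid)` tuples so that Python's tuple comparison plays the role of the total order, and integer process ids are an assumption.

```python
def may_enter(me, processes, queue, latest_seen):
    """Return True if process `me` may take the resource.

    queue: set of (Tm, pid) requests currently known to `me`.
    latest_seen: dict pid -> largest timestamp received from that process.
    """
    mine = [req for req in queue if req[1] == me]
    if not mine:
        return False
    my_request = min(mine)
    # Condition 1: my request is the minimum in the queue.
    if my_request != min(queue):
        return False
    # Condition 2: every other process has sent a message timestamped later.
    for pid in processes:
        if pid == me:
            continue
        if pid not in latest_seen or (latest_seen[pid], pid) <= my_request:
            return False
    return True


processes = [0, 1, 2]
queue = {(1, 0), (2, 1)}
granted_p0 = may_enter(0, processes, queue, {1: 2, 2: 3})
granted_p1 = may_enter(1, processes, queue, {0: 1, 2: 3})
waiting_p0 = may_enter(0, processes, queue, {1: 2})  # nothing heard from 2 yet
```

Each process evaluates this on its own copy of the queue; FIFO channels are what make "a later-timestamped message from everyone" imply that no earlier request is still in transit.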

Total Ordering

Anomaly

[Figure: step-by-step build of two requests to Amazon.com, carrying timestamps [19] and [7], connected by an external event.]

Strong Clock Condition

• S = {events in the system}
• S′ = S ⋃ {relevant external events}
• ⇝ is “happened before” for S′
• ∀a,b ∈ S: a ⇝ b ⟹ C(a) < C(b)

Avoid Anomaly

Physical Clocks

• PC1 (drift rate bound):
∃κ << 1 such that ∀i: |dCi(t)/dt – 1| < κ
• PC2 (drift bound):
∀i,j: |Ci(t) – Cj(t)| < ε

Avoid Anomaly

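• μ < shortest msg transmission time
• ∀i,j,t: Ci(t+μ) – Cj(t) > 0

Physical Clocks

• Sufficient condition: ε/(1–κ) ≤ μ

[Figure: the Amazon.com example with processes i and j; the anomaly corresponds to Cj(t) > Ci(t+μ).]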

Implementation Rules

• IR1’:
– For each i, if Pi does not receive a message at physical time t, then Ci is differentiable at t and dCi(t)/dt > 0.
• IR2’:
– (a) If Pi sends a message m at physical time t, then m contains a timestamp Tm = Ci(t).
– (b) Upon receiving a message m at time t’, process Pj sets Cj(t’) equal to maximum (Cj(t’ – 0), Tm + μm).
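Rule IR2’(b) is a single `max` over two candidates. In the sketch below, `mu_m` stands for the minimum delay assumed known for message m, as in Lamport's paper; the function name and the numeric values are illustrative assumptions.

```python
def on_receive(cj_before, tm, mu_m):
    """IR2'(b): value of Cj right after receiving a message stamped tm."""
    return max(cj_before, tm + mu_m)


# Receiver is behind the sender: the clock jumps forward.
advanced = on_receive(cj_before=10.0, tm=9.5, mu_m=1.0)
# Receiver is already ahead: the clock is left alone (it never moves back).
unchanged = on_receive(cj_before=12.0, tm=9.5, mu_m=1.0)
```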

Physical Clocks

Synchronize Physical Clocks

Physical Clocks

• Problem statement:
– IR1’ and IR2’ are followed,
– Message delay is bounded,
– Clocks satisfy PC1,
– Goal: PC2
• Algorithm:
– Every τ seconds, a message is sent over every arc.
• Guarantees:
– Clocks are synchronized after t0 + τd, with ε ≈ d(2κτ + ξ)
(d: diameter of the process graph; ξ: bound on the unpredictable part of the message delay)
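As a worked example of the guarantee, Lamport's theorem gives ε ≈ d(2κτ + ξ) for times after roughly t0 + τd, where d is the diameter of the process graph, κ the drift rate bound from PC1, τ the resend period, and ξ the bound on unpredictable message delay. The numbers below are made up purely to show the arithmetic.

```python
def sync_bound(d, kappa, tau, xi):
    """Approximate PC2 bound: epsilon ~ d * (2 * kappa * tau + xi)."""
    return d * (2 * kappa * tau + xi)


def sync_time(t0, d, tau):
    """Approximate time after which the bound holds: t0 + tau * d."""
    return t0 + tau * d


# Assumed values: diameter 3, drift 1e-6, a message every 10 s, 1 ms jitter.
epsilon = sync_bound(d=3, kappa=1e-6, tau=10.0, xi=0.001)
settled_at = sync_time(t0=0.0, d=3, tau=10.0)
```

With these inputs the jitter term ξ dominates the drift term 2κτ, which is typical when messages are exchanged frequently.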

Beyond…

• Shortcomings:
– No gap-detection property
– C(a) < C(b) ⟹ ??? (it does not imply a → b)
– Bounds are not practical (the same goes for PC!)

Gap Detection Property

• Problem statement:
– Given: a, b, C(a), C(b), with C(a) < C(b)
– Determine whether c exists, where C(a) < C(c) < C(b)

Beyond…

Another Strong Clock Condition

a → b ⟺ C(a) < C(b)

Beyond…

What clock, then?

• Causal histories:

Beyond…

• Vector Clocks:
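Vector clocks, as named above, keep one counter per process and merge component-wise on receive. The following is a minimal sketch under the usual textbook rules; the class and function names are assumptions, and timestamps are returned as tuples so they can be compared and stored.

```python
class VectorClock:
    """Vector clock for process `pid` in a system of `n` processes."""

    def __init__(self, n, pid):
        self.v = [0] * n
        self.pid = pid

    def tick(self):
        # Every event increments the process's own component.
        self.v[self.pid] += 1
        return tuple(self.v)

    def send(self):
        return self.tick()

    def receive(self, other):
        # Component-wise maximum, then count the receive event itself.
        self.v = [max(x, y) for x, y in zip(self.v, other)]
        return self.tick()


def happened_before(u, v):
    """u -> v iff u <= v component-wise and u != v."""
    return all(x <= y for x, y in zip(u, v)) and u != v


def concurrent(u, v):
    return not happened_before(u, v) and not happened_before(v, u)


p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
a = p0.tick()        # local event on p0
b = p1.tick()        # local event on p1, unrelated to a
m = p0.send()        # p0 sends a message
c = p1.receive(m)    # p1 receives it
```

Unlike a scalar logical clock, comparing two vector timestamps decides both directions: `a` precedes `c`, while `a` and `b` are concurrent.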

More on Vector Clocks

• Strong Clock Condition
• Concurrent
• Pair-wise Inconsistent
• Consistent Cut
• Counting
• Gap Detection, but…
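The "Consistent Cut" item can be illustrated with vector timestamps: a cut is consistent when no process's frontier event knows of more events at process i than process i's own frontier event does. The sketch below assumes each cut is given as a list whose i-th entry is the vector timestamp of the last event of process i included in the cut; the function name and the sample timestamps are illustrative.

```python
def is_consistent_cut(frontier):
    """frontier[i] is the vector timestamp of process i's last event in the cut."""
    n = len(frontier)
    return all(
        frontier[i][i] >= frontier[j][i]
        for i in range(n)
        for j in range(n)
    )


# Two processes: process 0 sends a message at (2, 0), process 1 receives it at (2, 2).
includes_send_only = is_consistent_cut([(2, 0), (0, 1)])
includes_receive_without_send = is_consistent_cut([(1, 0), (2, 2)])
```

The second cut contains the receive but not the matching send, which is exactly the situation a consistent cut rules out.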

Beyond…

• Weak Gap-Detection

Given a, b, can detect the existence of c such that

¬(c → a) & (c → b)

Reference

• O. Babaoglu and K. Marzullo. Consistent global states of distributed systems: Fundamental concepts and mechanisms. In Sape Mullender, editor, Distributed Systems, ch. 4, pages 55--96. Addison Wesley, 2nd ed., 1993. http://citeseer.ist.psu.edu/babaoglu93consistent.html

• Note: some materials in this paper are used to clarify a few concepts in the next paper.

Beyond…

Distributed Snapshots: Determining Global States of Distributed Systems

K. Mani Chandy

Leslie Lamport

Introduction

• Outline of the paper:
– Motivation
– Model
– Algorithm
– Correctness
– Other issues

• Beyond…

Motivation

• Capture the global state of a system.

• Really? True global state: impossible!

[Figure: space-time diagram of two processes, p1 with events e1^1 … e1^6 and p2 with events e2^1 … e2^5, shown twice.]

Motivation

• Capture the global state of a system.
• Really? These are what can be done. Are they useful?
• Useful? Three cases, each shown on the same p1/p2 space-time diagram:
– Equivalent!
– Consistent, but does not happen in reality.
– Not even consistent!

Motivation

• Capture the global state of a system.

• Useful? Yes:
– To detect stable properties of a system: y(S) ⟹ y(S’) for all S’ reachable from S.
– E.g.:
• “computation has terminated,”
• “the system is deadlocked,”
• “all tokens in a token ring have disappeared.”

Model

A distributed system

• A distributed system (on the right).

• A global state = set of processes’ and channels’ states.

• Event:
– atomic
– e = <p, S, S’, m, c>
• Computation:
– seq = (ei: 0 ≤ i ≤ n)
– Si+1 = next(Si, ei)
• Channels’ assumptions:
– Singly directed
– FIFO
– Asynchronous
– Error free
– Infinite buffer

Algorithm

• Invoker: behaves as if it had received a marker from a virtual node.
• Receiving rule for process q receiving a marker along channel c:
  if q has not recorded its state then
  begin
    q records its state;
    q records the state of c as the empty sequence
  end
  else q records the state of c as the sequence of messages received along c after q’s state was recorded and before q received the marker along c.
• Sending rule for a process p: for each outgoing channel c, p sends one marker along c after p records its state and before p sends further messages along c.
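The marker rules above can be exercised in a small single-threaded simulation. This is a sketch under stated assumptions, not the paper's code: the `Process` and `Network` classes, the bank-balance application state, and the two-process scenario are all invented for illustration, and FIFO channels are modelled with `collections.deque`.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance   # assumed application state
        self.recorded = None     # recorded local state (None until recorded)
        self.chan_state = {}     # incoming channel -> recorded messages
        self.open = set()        # incoming channels still being recorded


class Network:
    def __init__(self, balances):
        self.procs = {n: Process(n, b) for n, b in balances.items()}
        names = list(balances)
        # One FIFO channel per ordered pair (src, dst).
        self.chan = {(s, d): deque() for s in names for d in names if s != d}

    def send(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def record(self, name):
        # Record local state, then send a marker on every outgoing channel
        # before any further message (sending rule).
        p = self.procs[name]
        p.recorded = p.balance
        p.open = {c for c in self.chan if c[1] == name}
        p.chan_state = {c: [] for c in p.open}
        for c in self.chan:
            if c[0] == name:
                self.chan[c].append(MARKER)

    def deliver(self, src, dst):
        c = (src, dst)
        msg = self.chan[c].popleft()
        q = self.procs[dst]
        if msg == MARKER:
            if q.recorded is None:
                self.record(dst)      # channel c ends up recorded as empty
            q.open.discard(c)         # stop recording channel c
        else:
            q.balance += msg
            if q.recorded is not None and c in q.open:
                q.chan_state[c].append(msg)   # message was in flight


net = Network({"p": 100, "q": 50})
net.send("p", "q", 10)
net.record("p")            # p initiates the snapshot
net.send("q", "p", 5)      # sent before q has seen the marker
net.deliver("p", "q")      # q receives 10
net.deliver("p", "q")      # q receives the marker and records
net.deliver("q", "p")      # p receives 5: recorded as channel state
net.deliver("q", "p")      # p receives q's marker: snapshot complete

snapshot_total = sum(p.recorded for p in net.procs.values()) + sum(
    sum(msgs) for p in net.procs.values() for msgs in p.chan_state.values()
)
```

The recorded process states plus the recorded channel contents add up to the initial total of 150, even though no single instant of the run had p at 90 and q at 55 with exactly that message in flight.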

Illustration

• Next 14 slides, courtesy of Professor Birman.

Chandy/Lamport

[Figure sequence: a network of processes p, q, r, s, t, u, v, w, x, y, z. The slide captions, in order:]

• A network
• “I want to start a snapshot”
• p records local state
• p starts monitoring incoming channels
• “contents of channel p-y”
• p floods message on outgoing channels…
• q is done
• The set of labelled processes grows: q; then q, z, s; then q, v, z, x, u, s; then q, v, w, z, x, u, s, y, r
• A snapshot of a network (all of p … z labelled): Done!

Correctness

• Consistency

• Termination

Consistency

• m is recorded iff send(m) is recorded:
– sender’s state recording and marker sending are done atomically.
• m is not recorded more than once:
– if channel is recorded before receiver, it will be empty.
– if channel is recorded after receiver, none of the in-channel messages will be recorded as part of the receiver’s state.

Correctness:

Termination

• Assumptions:
– L1: no marker remains forever in a channel.
– L2: processes’ states are recorded in finite time.
• Every process either spontaneously records its state, or is reachable by a path from such a process.
• Every channel is flushed by a marker after the sender records its state.

Correctness:

Remaining Issues

• Property of recorded state: Si --> S* --> Sf
• Stable detection:
– For a stable property y, the output definite satisfies:
• y(Si) ⟹ definite
• definite ⟹ y(Sf)
– Algorithm:
  begin
    record a global state S*;
    definite := y(S*)
  end.
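The stable detection algorithm above is just "record a global state, then evaluate y on it". The sketch below assumes a snapshot is a dict with `processes` and `channels` entries and uses "computation has terminated" as the example predicate; the snapshot layout, the `"idle"`/`"busy"` state labels, and the function names are all illustrative assumptions.

```python
def terminated(snapshot):
    """Example stable predicate y: every process idle, every channel empty."""
    return all(
        state == "idle" for state in snapshot["processes"].values()
    ) and all(len(msgs) == 0 for msgs in snapshot["channels"].values())


def detect(record_global_state, y):
    """record a global state S*; definite := y(S*)."""
    s_star = record_global_state()   # e.g. produced by the snapshot algorithm
    definite = y(s_star)
    return definite


quiet = {
    "processes": {"p": "idle", "q": "idle"},
    "channels": {("p", "q"): [], ("q", "p"): []},
}
busy = {
    "processes": {"p": "idle", "q": "idle"},
    "channels": {("p", "q"): ["work"], ("q", "p"): []},
}
definite_quiet = detect(lambda: quiet, terminated)
definite_busy = detect(lambda: busy, terminated)
```

Checking channel contents matters: in the second snapshot both processes look idle, but a message in flight can still restart the computation.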

Beyond…

• Channels’ assumptions:
– Singly directed
– FIFO
– Asynchronous
– Error free
– Infinite buffer

Non-FIFO

• What is FIFO for?
– To separate before-snapshot messages from after-snapshot messages.
• A snapshot counter piggybacked on messages would do just fine!
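One way to read the piggybacked snapshot counter: every message carries the sender's snapshot number, and the receiver uses it to decide on which side of the snapshot the message falls, regardless of arrival order. The sketch below is an assumption-laden illustration: the `Proc` class, its fields, and the "count of delivered messages" state are invented, and it does not address how a process learns that a channel's recording is complete.

```python
class Proc:
    def __init__(self):
        self.snapshot_no = 0   # number of snapshots recorded so far
        self.state = 0         # assumed application state: messages delivered
        self.recorded = {}     # snapshot number -> recorded local state
        self.in_flight = {}    # snapshot number -> messages caught in channels

    def record(self):
        self.snapshot_no += 1
        self.recorded[self.snapshot_no] = self.state
        self.in_flight[self.snapshot_no] = []

    def send(self, payload):
        # Piggyback the sender's snapshot counter on every message.
        return (self.snapshot_no, payload)

    def receive(self, message):
        tag, payload = message
        while tag > self.snapshot_no:
            # After-snapshot message: record local state before delivering it.
            self.record()
        if tag < self.snapshot_no:
            # Before-snapshot message arriving after we recorded: channel state.
            self.in_flight[self.snapshot_no].append(payload)
        self.state += 1


p, q = Proc(), Proc()
m1 = p.send("m1")    # sent before p's snapshot
p.record()
m2 = p.send("m2")    # sent after p's snapshot
q.receive(m2)        # non-FIFO: m2 overtakes m1
q.receive(m1)
```

Even though `m2` arrives first, q records its state before delivering it, and the late `m1` is correctly classified as in-channel for snapshot 1.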

Beyond:

Beyond…

• Channels’ assumptions:
– Singly directed
– FIFO
– Asynchronous
– Error free
– Infinite buffer

Messages can be corrupted/duplicated

Messages can be dropped

Unreliable channels

• How to deal with corruption?
– Checksum/ECC; reduces to dropping.
• How to deal with duplication?
– Message ID
• How to deal with dropping?
– Channel states are not needed anymore.
– Markers indicate completion.

Beyond:

Even More Aggressive…

• Don’t want to piggyback!
• Step 1: no piggybacking:
– Block all messages sent after recording local state and before receiving marker from all neighbors.
• Step 2: no blocking, min piggybacking
– Blocked messages are sent with piggybacked snapshot info.

Beyond:

Conclusion

• Two influential papers.
• Much work built upon these results.
• Can be improved significantly when adapted to particular systems.
• Additional comments/suggestions?