Time and Global States - Tulane University
Transcript of Time and Global States - Tulane University
Time and Global StatesCMPS 4760/6760: Distributed Systems
1
Acknowledgement: slides adapted from Indranil Guptaβs lecture notes: https://courses.engr.illinois.edu/cs425/fa2019/index.html
Overview
Β§ Physical clocks
Β§ States and events
Β§ Logical clocks and vector clocks
Β§ Global states and snapshot algorithm
2
Time and Clock
Β§ Primary standard of time: rotation of earth β’ 1 solar second = 1/86,400th of a solar day that the Earth takes to complete one
revolution around its axis
Β§ De facto primary standard of time: atomic clocksβ’ 1 atomic second = 9,192,631,770 orbital transitions of Cesium-133 atom. β’ 86400 atomic sec = 1 solar day β approx. 3 ms (leap second correction each year)
Β§ Coordinated Universal Time (UTC) does the adjustment for leap seconds Β±number of hours in your time zone
3
Global Posi6oning System: GPS
A system of 30+ satellites broadcasting accurate spatial coordinates and atomic times
β’ Location and precise time computed by triangulation
β’ no leap sec. correction => 18 seconds ahead of UTC (as of 2017)
β’ Per the theory of relativity, an additional correction is needed. Locally compensated by the receivers
4
TerminologyDrift rate π = ! " # $#
!#Clock skew πΏResynchronization interval π
Max drift rate π%&' implies:
1 β π%&' β€ !"(#)!#
β€ 1 + π%&'
Drift is unavoidable:Β§ Ordinary quartz-oscillators clocks: 10$*
Β§ βHigh precisionβ quartz clocks: 10$+
Β§ Atomic clocks: 10$,-
5
clock drift
Physical clock synchronization
Β§ Why accurate physical Bme is important?
β’ Accurate Ime keeping: air-traffic control systems
β’ Accurate Imestamps: mulI-version objects
β’ Some security mechanisms depend on the physical Imes of events, e.g., Kerberos
6
Physical clock synchronization
Β§ External Synchronization: πΆ! π‘ β π π‘ < πΏ/2
β’ π: a source of UTC time
Β§ Internal Synchronization: πΆ! π‘ β πΆ" π‘ < πΏ
Β§ Challenges: account for propagation delay, processing delay, and faulty clocks
7
Physical clock synchronization
Β§ Bounded drift rate
Β§ Monotonicity: π‘# > π‘ β πΆ π‘# > πΆ(π‘)
β’ Y2K bug
Β§ Hardware clock vs. Software clock
β’ πΆ! π‘ = πΌπ»! π‘ + π½
8
External Synchronization: Cristianβs method
9
Β§ Developed by Cristian in 1989
Β§ Client sends a request to a time server at π$Β§ Server receives the request at π% and sends it back
Β§ Client receives the response at π&, estimates the round triptime π ππ = π& β π$, and sets πΆ! = π% + π ππ/2
β’ Q: Assume that the minimum transmission πππ is known, what is the accuracy of clientβs clock right after synchronization ?
β’ A: Β±(π ππ/2 βπππ)
External Synchronization: Cristianβs method
Β§ πΏ/2 β desired accuracy bound
Β§ π β clock drift rate
Β§ Client pulls data from a time server every π unit of time, whereπ < πΏ/2π (why?)
β’ RTT should be sufficiently short compared with the required accuracy
Β§ Improve accuracy and fault tolerance
β’ Query multiple times & take minimum RTT
β’ Query multiple time servers
10
Internal Synchronization: The Berkeley Algorithm
11
Β§ Initially designed by Gusella and Zatti in 1989 for internal synchronization of a collection of computers running Berkeley UNIX
Β§ The participants elect a master (leader)
Β§ The master coordinates the synchronization
Step 1. Salves send their clock values to the master
Step 2. Master discards outliers and computes the average
Step 3. Master sends the needed adjustment to the slaves
The Berkeley Algorithm
To maintain Monotonicity
Β§ Negative correction => slowdown
Β§ Positive correction => speedup
12
RTT adjustment as in Cristianβs method (not shown in the figure)
Internal synchronization with Byzantine clocks
Β§ Lamport and Melliar-Smithβs algorithm
13
A faulty clock exhibits 2-faced or byzantine behavior
Bad clock
j
k
i
c +Ξ΄
c c βΞ΄
c β 2Ξ΄
Assumeπ clocks, at most π are faulty
Clock π runs the following algorithm:
Step 1. Read every clock in the system
β’ π![π]: clock πβs reading of clock πβs value
Step 2. if π.[π] β π.[π] > πΏ, π.[π] = π.[π]
Step 3. Update the clock using the average ofthese values
Synchronization is maintained if π΅ > ππ
Lamport and Melliar-Smithβs algorithm
14
Bad clocks
j
k
i
c +Ξ΄
c c βΞ΄
c β 2Ξ΄
The maximum difference between the averages computed by two non-faulty nodes is (3ππΏ/π)
To keep the clocks synchronized,'()* < πΏ βΊ π > 3π
Overview
Β§ Physical clocks
Β§ States and events
Β§ Logical clocks and vector clocks
Β§ Global states and snapshot algorithm
15
System Model
Β§ A distributed program consists of a set of π processes, {π&, π', β¦π(}, and a set of unidirectional channels.
β’ Message passing only, no shared memory, no global clock
β’ Channel model: error free, arbitrary but finite delay, no assumptions on ordering
16
π$ π&
π'
States and Actions
Β§ Each process π! has a state π ! that, in general, it transforms as it executes
β’ a state includes the values of all the variables within it (including the program counter)
β’ may also include the values of any objects in its local operating system environment that it affects, such as files
Β§ As each process π! executes it takes a series of actions, each of which is either
β’ message send or receive operation
β’ or an operation that transforms π.βs state β one that changes one or more of the values in π .
Β§ State of a channel: sequence of messages set along the channel but not received
β’ A process may record messages sent and received as part of its local state
17
States and Events
Β§ An event π β πΈ corresponds to an acIon and may change the state of a process and the state of at most one channel incident on that process
β’ Internal events only change the state of a process
β’ External events: sends to/receives from other process
Β§ A set of events (π!+, π!$, π!&, β¦ ) in a single process is called sequenIal, and their occurrences can be totally ordered in Ime using the clock at that process
Β§ A run or a computaIon of a process π! is defined as a sequence of local states and events: π !+π!+π !$β¦π!,π !,
18
Global States
Β§ The set of global states = the cross product of local states and the states of channels.
Β§ An initial global state is one in which all local states are initial, and all the channels are empty
19
Global States
Β§ βtime-based modelβ: a global state is a set of local states that occur simultaneously
Β§ βhappened-before modelβ: a global state is a set of local states that are all concurrentwith each other
20
Alice
BobExplosion 1
Explosion 2
There is nothing called simultaneous in the physical world.
Global States
Β§ Questions:
β’ How do we decide if an event happened before another event when the two events are on different processes?
β’ How do we decide if two events are concurrent?
β’ Can we do these without using physical clocks, since physical clocks are not be perfectly synchronized?
21
Causality
Β§ The event of sending message must have happened before the event of receiving that message
22
Happened Before Model
Notations:
π βΊ π iff π, π are two events in a single process π, and π proceeds π
π βΌ π iff π βΊ π β¨ π = π
π β π iff π = sending a message, and π = receipt of that message
These definitions also apply to states
23
Happened Before Model
The happened-before relation (β) is the smallest relation that satisfies
Rule 1. if (π βΊ π) β¨ (π β π) then π β π.
Rule 2. π β π β§ π β π β π β π
β defines a partial order on πΈ (the set of all events) β why?
π and π are concurrent (denoted by π||π) if Β¬ π β π β§ Β¬ π β π
Again, we can similarly define happened-before relation for states
24
Overview
Β§ Physical clocks
Β§ States and events
Β§ Logical clocks and vector clocks
Β§ Global states and snapshot algorithm
25
Logical Clocks
Β§ A logical clock πΏπΆ is a map from πΈ to β with the constraint:
β π, π β πΈ, π β π β πΏπΆ π < πΏπΆ(π)
Β§ The constraint models:
β’ sequential nature of execution at each process
β’ Physical requirement that any message transmission requires a nonzero amount of time
26
Leslie B. Lamport
Lamport 6mestamps
27
Implementationπ·π:: var
πΏπΆ: integer initially 0;
internal event (): πΏπΆ = πΏπΆ + 1;
send event (π): πΏπΆ = πΏπΆ + 1;piggybacks πΏπΆ on π;
receive event (π, π‘): πΏπΆ = max(πΏπΆ, π‘) + 1;
28
A total ordering on events
Β§ Let π and π be two events at processes π and π, respectively. We can define a total order βͺ of events as:
π βͺ π iff either πΏπΆ π < πΏπΆ π
or πΏπΆ π = πΏπΆ π and π < π
29
Logical Clocks and Causality
Β§ Logical clocks cannot detect causality (why?)
β’ β π, π β πΈ, πΏπΆ π < πΏπΆ π β§ (π β π)
30
Vector Clocks
Β§ A vector clock ππΆ is a map from πΈ to β! with the constraint:
β π, π β πΈ, π β π β ππΆ π < ππΆ(π)
Β§ Given two vectors π₯ and π¦ of dimension π, we compare them as follows:
π₯ < π¦ βΆ= βπ: 0 β€ π β€ π β 1: π₯ π β€ π¦ π β§ βπ: 0 β€ π β€ π β 1: π₯ π < π¦ π
E.g., (2,1,0)<(2,1,1)
π₯ β€ π¦ βΆ= π₯ < π¦ β¨ (π₯ = π¦)
31
Vector timestamps
32
Implementationπ·π:: var
ππΆ: array[1. . π] of integer; initially ππΆ π = 0 βπ;
internal event (): ππΆ[π] = ππΆ[π] + 1;
send event (π): ππΆ[π] = ππΆ[π] + 1;
piggybacks ππΆ on π;
receive event (π, π‘): ππΆ[π] = ππΆ[π] + 1;
βπ: ππΆ π = max ππΆ π , π‘ π ;33
Properties
β π, π β πΈ, π β π β ππΆ π < ππΆ(π)
Theorem: For any two events π and π occurred at two processes π and πrespectively, we have
π β π β βπ β π: ππΆ π π β€ ππΆ π π β§ (ππΆ π π < ππΆ π π )
34
Overview
Β§ Physical clocks
Β§ States and events
Β§ Logical clocks and vector clocks
Β§ Global states and snapshot algorithm
35
Detecting Global Properties
36
Global State
Β§ Current global state is hard to get
β’ No process has global knowledge
Β§ Past global state is often sufficient
β’ Failure recovery
β’ Monitoring stable properties
37
Global State Predicates
Β§ A global state predicate that maps from the set of global states of processes in the system to {True, False}
Β§ A property is called stable if once it is true it stays true forever
β’ E.g., Is the object garbage? Is the system deadlocked? Or Is the system terminated?
38
Global Snapshot
39
Global State
Β§ βtime-based modelβ: a global state is a set of local states that occur simultaneously.
Β§ βhappened-before modelβ: a global state is a set of local states that are all concurrent with each other.
40
Local States
Β§ A distributed system β of π processes, {π$, π&, β¦π*}, connected by unidirectional channels.
Β§ Event ordering on a single process
β’ π β! π#: event π occurs before πβ² at π!β’ history(π!) = β! =< π!+, π!$, π!&, β¦ >
β’ β!, =< π!+, π!$, π!&, β¦ , π!, >β’ π !, - the state of process π! immediately before the πth event occurs, so that π !+ is the initial state of π!β’ processes record the sending or receipt of all messages as part of their state
41
Cuts
Β§ A cut of the systemβs execution is a subset of its global history that is a union of prefixes of process histories:
πΆ = β$-! βͺ β&
-" βͺβ―βͺ β*-#
Β§ π!-$ - the last event processed by π! in the cut πΆ
Β§ The set of events π!-$: π = 1,2, β¦ ,π is called the frontier of the cut
Β§ A cut defines a global state π = (π $, π &, β¦ , π *) where π ! is the state of π!immediately after event π!
-$
42
Cuts
43
m1 m2
p1
p2Physical
time
e10
Consistent cutInconsistent cut
e 11
e 12
e 13
e 20
e 21
e 22
A cut πΆ is consistent, if for each event it contains, it also contains all the events that happened-before that event: βπ β πΆ, π β π β π β πΆ
A consistent global state (or a snapshot) is one that corresponds to a consistent cut
Runs and Linearizations
Β§ A run is a total ordering of all the events in a global history that is consistent with each local historyβs ordering
Β§ A linearization (or a consistent run) is a total ordering of all the events in a global history that is consistent with the happened-before relation
Β§ A state πβ² is reachable from a state π if there is a linearization that passes through π and then πβ
44
Safety and Liveness
Β§ Safety β ``something bad does no occurββ
β’ A system is safe with respect to an undesirable property πΌ if the πΌ evaluates to
False for all states π reachable from the original state π0
Β§ Liveness β ``something good will eventually occurββ
β’ For any linearization πΏ starting in the state π0, a desirable property π½ evaluates to True for some state π1 reachable from π0
45
Chandy and Lamportβs snapshot algorithm
Β§ Assumptionsβ’ Neither channels nor processes failβ’ Channels are unidirectional and FIFO-ordered (First in First out)β’ The graph of processes and channels is strongly connected (there is a path
between any two processes)β’ Any process may initiate a global snapshot at any timeβ’ The processes may continue their execution and send and receive normal
messages while the snapshot takes place
46
Chandy and Lamportβs snapshot algorithm
Β§ Informal description
β’ Each process is either white or red. All processes are initially white
β’ After recording the local state, a process turns red
Β§ Two difficulties
β’ Need to ensure that the recorded local states are mutually concurrent
β’ Need to capture the state of channels
47
Classification of Messages
48
π
π
ww
wr
rrrw
Chandy and Lamportβs snapshot algorithm
Β§ Informal descriptionβ’ Each process is either white or red. All processes are initially whiteβ’ After recording the local state, a process turns redβ’ Once a process turns red, it is required to send a special message called a marker along all its outgoing channels before it sends out any normal message, and start recording messages from all incoming channels
β’ Once π! receives a marker from π"β’ it is required to turn red if it has not already done so
β’ it stops recording messages from π"
49
Chandy and Lamportβs snapshot algorithm
50
π·π::
varπππππ: {π€βππ‘π, πππ} initially π€βππ‘π;// assume π incoming channelsπβππ: array[1. . π] queues of messages initially ππ’ππ;ππππ ππ: array[1. . π] of boolean initially ππππ π;
π‘π’ππ_πππ() enabled if (πππππ == π€βππ‘π):save_local_state; πππππ = πππ;send (ππππππ) to all neighbors;
Upon ππππππ£π(ππππππ) on channel πif(πππππ == π€βππ‘π) π‘π’ππ_πππ();ππππ ππ[π] = π‘ππ’π;
Upon ππππππ£π(ππππ_πππ π πππ) on channel πif(πππππ == πππ β§ Β¬ππππ ππ[π])πβππ[π] = ππππππ(πβππ[π],
ππππ_πππ π πππ)
Chandy and Lamportβs snapshot algorithm
Β§ The algorithm terminates when β’ All processes have received a Marker β’ To record their own state
β’ All processes have received a Marker on all the (N-1) incoming channels at each β’ To record the state of all channels
Β§ Then, (if needed), a central server collects all these partial state pieces to obtain the full global snapshot
51
Example 1
P2
TimeP1
P3
A B C D E
F G H
I J K
MessageInstruction or Step
52
P1 is Initiator: β’ Record local state S1,β’ Send out markersβ’ Turn on recording on channels C21, C31
P2
TimeP1
P3
A B C D E
F G H
I J K
S1, Record C21, C31
β’ First Marker!β’ Record own state as S3β’ Mark C13 state as empty β’ Turn on recording on other incoming C23β’ Send out Markers
P2
TimeP1
P3
A B C D E
F G H
I J K
S1, Record C21, C31
β’ S3β’ C13 = < >β’ Record C23
P2
TimeP1
P3
A B C D E
F G H
I J K
S1, Record C21, C31
β’ S3β’ C13 = < >β’ Record C23
Duplicate Marker!State of channel C31 = < >
P2
TimeP1
P3
A B C D E
F G H
I J K
P2
TimeP1
P3
A B C D E
F G H
I J K
S1, Record C21, C31
β’ S3β’ C13 = < >β’ Record C23
C31 = < >
β’ First Marker!β’ Record own state as S2β’ Mark C32 state as empty β’ Turn on recording on C12β’ Send out Markers
P2
TimeP1
P3
A B C D E
F G H
I J K
S1, Record C21, C31
β’ S3β’ C13 = < >β’ Record C23
C31 = < >
β’ S2β’ C32 = < >β’ Record C12
P2
TimeP1
P3
A B C D E
F G H
I J K
S1, Record C21, C31
β’ S3β’ C13 = < >β’ Record C23
C31 = < >
β’ S2β’ C32 = < >β’ Record C12
β’ Duplicate! β’ C12 = < >
P2
TimeP1
P3
A B C D E
F G H
I J K
S1, Record C21, C31
β’ S3β’ C13 = < >β’ Record C23
C31 = < >
β’ S2β’ C32 = < >β’ Record C12
C12 = < >
β’ Duplicate! β’ C21 = <message HΓ D >
P2
TimeP1
P3
A B C D E
F G H
I J K
S1, Record C21, C31
β’ S3β’ C13 = < >β’ Record C23
C31 = < >
β’ S2β’ C32 = < >β’ Record C12
C12 = < >
C21 = <message HΓ D >
β’ Duplicate!β’ C23 = < >
P2
TimeP1
P3
A B C D E
F G H
I J K
S1
β’ S3β’ C13 = < >
C31 = < >
β’ S2β’ C32 = < > C12 = < >
C21 = <message HΓ D >
β’ C23 = < >
Algorithm has Terminated
P2
TimeP1
P3
A B C D E
F G H
I J K
S1
S3 C13 = < >
C31 = < >
S2 C32 = < >C12 = < >
C21 = <message HΓ D >
C23 = < >
Collect the Global Snapshot Pieces
Example 2
Β§ Two processes trade in βwidgetsβ
Β§ Process π$ sends orders for widgets over π& to π&, enclosing payment at the rate of $10 per widget
Β§ Some time later, process π&sends widgets along channel π$ to π$
64
An Execution of the System
65
An Execution of the System
66
The snapshot state is π$: $1000,0 , π&: $50,1995 , π1: bive widgets , π&:
Β§ this state differs from all the global states through which the system actually passed
Termination
Β§ Theorem: The ChandyβLamport algorithm ensures that eventually all processes turn redand all channels are closed
Β§ Proof sketch:
β’ We assume that a process that has received a marker message records its state within a finite time and sends marker messages over each outgoing channel within a finite time.
β’ If there is a path of communication channels and processes from a process π% to a process π&, then π& will record its state a finite time after π% recorded its state and close its incoming channel from π%. It will then send a marker to π% (if π% is an outgoing neighbor of π&) so that π% will close its incoming channel from π& within a finite time.
β’ Since the graph of processes and channels to be strongly connected, all processes will have recorded their states and the states of incoming channels a finite time after some process initially records its state.
67
Correctness
Β§ Theorem: The ChandyβLamport algorithm records a consistent cut/global state and all the WR messages
Β§ Proof sketch (of the first part):β’ Let π! and π" be events occurring at π! and π", respectively, such that π! β π". We assert that
if π" is in the cut, then π! is in the cutβ’ That is, if π" occurred before π" recorded its state, then π! must have occurred before π!
recorded its state. This is obvious if π = π. Assume π β π.β’ Assume π! recorded its state before π! occurred (proof by contradiction)β’ Consider the sequence of π» messages π#, π$, β¦ ,π%, giving rise to π! β π". β’ By FIFO ordering of channels and the marker sending and receiving rules, a marker message
would have reached π" ahead of each of π#, π$, β¦ ,π%, so that π" would have recorded its state before π", a contradiction.
68
Understanding snapshot
Β§ The snapshot state is a consistent global state that is reachablefrom the iniWal state. It may not actually be visited during a specific execuWon
Β§ The final state of the original computaWon is always reachablefrom the snapshot state
Β§ Proof in the textbook
69
Reachability
Β§ If a stable predicate is True in the state π%./0 then we may conclude that the predicate is True in the state π(!./1
Β§ If the predicate evaluates to False for π%./0 , then it must also be False for π!.!2
70