8/3/2019 D MUTEX (1)
Mutual Exclusion in Distributed Systems
Single Processor Systems
use semaphores, monitors, etc.
Distributed Systems
centralized algorithm: a central server coordinates the order in which processors enter the critical section (CS)
can overload the central site
introduces a single point of failure in the system
Mutual Exclusion in Distributed Systems
decentralized algorithms
non-token based algorithms
Lamport's algorithm
Ricart-Agrawala's algorithm
Maekawa's algorithm
token based algorithms
token-ring algorithm
broadcast algorithm
tree-based algorithm
self-stabilizing algorithm
Lamport's Algorithm
Request the CS:
1. Pi broadcasts request (ti, i) to all processors and puts the request in its local
queue (ordered by the timestamps t of the requests)
2. Pj, upon receiving request (ti, i), puts the request in its local queue (ordered
by the timestamps t of the requests) and sends reply (tj, j) to Pi
Enter the CS:
1. if Pi has received reply messages with timestamps larger than ti from all sites
and its request is at the top of the queue, then it enters the CS
Release the CS:
1. Pi, upon exiting the CS, removes its request from the queue and sends release (ti)
to all processors
2. Pj, upon receiving the release message, removes Pi's request from the top of its
queue
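The request, enter, and release steps above can be sketched as the local state of a single site. This is a minimal illustration, not a full implementation: the class name `LamportSite`, its method names, and the `now` parameter (the receiver's current logical clock value) are hypothetical, message transport is left out, and ties between equal timestamps are assumed to be broken by site id.

```python
import heapq

class LamportSite:
    """Local state of one site Pi (hypothetical helper, not from the slides)."""

    def __init__(self, site_id, n_sites):
        self.id = site_id
        self.n = n_sites
        self.queue = []         # min-heap of (timestamp, site_id) requests
        self.latest = {}        # site_id -> largest timestamp received from it
        self.my_request = None

    def request(self, ts):
        # Step 1: queue the request locally; the returned message is broadcast.
        self.my_request = (ts, self.id)
        heapq.heappush(self.queue, self.my_request)
        return ("request", ts, self.id)

    def on_request(self, ts, sender, now):
        # Step 2: queue the remote request and answer with a timestamped reply.
        heapq.heappush(self.queue, (ts, sender))
        self.latest[sender] = max(self.latest.get(sender, 0), ts)
        return ("reply", now, self.id)

    def on_reply(self, ts, sender):
        self.latest[sender] = max(self.latest.get(sender, 0), ts)

    def on_release(self, sender):
        # Drop the released request from the local queue.
        self.queue = [r for r in self.queue if r[1] != sender]
        heapq.heapify(self.queue)

    def can_enter(self):
        # Own request on top, and a later-stamped message seen from every other site.
        if self.my_request is None or not self.queue:
            return False
        if self.queue[0] != self.my_request:
            return False
        ts = self.my_request[0]
        others = (s for s in range(self.n) if s != self.id)
        return all(self.latest.get(s, -1) > ts for s in others)

    def release(self):
        ts = self.my_request[0]
        self.on_release(self.id)
        self.my_request = None
        return ("release", ts)   # to be broadcast to all processors
```

With two sites, the site whose request carries the smaller timestamp passes `can_enter()` first; the other passes only after it processes the release.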
Lamport's Algorithm -- Properties
this algorithm requires
a total ordering of events
all sites to be alive
3(N-1) messages per request: N-1 requests, N-1 replies, N-1 releases
response time under very low load: 2T
T: per-message communication latency
assume no processor is in the CS
N-1 request messages are sent in parallel (T)
N-1 reply messages are sent in parallel (T)
so the requester enters the CS after 2T
Ricart-Agrawala's Algorithm
Request the CS:
1. Pi broadcasts request (ti, i) to all processors
2. Pj, upon receiving the request,
a) sends reply (tj, j) to Pi if Pj is neither requesting nor executing in the CS
b) sends reply (tj, j) to Pi if Pj is requesting the CS but the timestamp of Pj's request is larger than ti
c) defers the request otherwise
Enter the CS:
1. if Pi has received reply messages from all sites, then it enters the CS
Release the CS:
1. Pi, upon exiting the CS, sends reply (i) to all the deferred requests
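The reply-or-defer rule in step 2 is the core of the algorithm, and a small sketch may make it concrete. The class `RicartAgrawalaSite` and its method names are hypothetical, message transport is omitted, and equal timestamps are assumed to be ordered by site id.

```python
class RicartAgrawalaSite:
    """Local state of one site (hypothetical helper, not from the slides)."""

    def __init__(self, site_id, n_sites):
        self.id = site_id
        self.n = n_sites
        self.request_ts = None   # timestamp of own pending request, if any
        self.in_cs = False
        self.replies = set()     # sites that have replied to the own request
        self.deferred = []       # senders whose requests were deferred

    def request(self, ts):
        self.request_ts = ts
        self.replies = set()
        return ("request", ts, self.id)   # to be broadcast

    def on_request(self, ts, sender):
        """Return True to reply immediately, False if the request is deferred."""
        mine_first = (self.request_ts is not None
                      and (self.request_ts, self.id) < (ts, sender))
        if self.in_cs or mine_first:      # case c): defer
            self.deferred.append(sender)
            return False
        return True                       # cases a) and b): reply now

    def on_reply(self, sender):
        """Record a reply; returns True once the site may enter the CS."""
        self.replies.add(sender)
        if len(self.replies) == self.n - 1:
            self.in_cs = True
        return self.in_cs

    def release(self):
        """Leave the CS and return the senders that now get their deferred reply."""
        self.in_cs = False
        self.request_ts = None
        to_reply, self.deferred = self.deferred, []
        return to_reply
```

The site with the smaller timestamp defers the competing request and answers it only on release, which is why no separate release message is needed.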
Ricart-Agrawala's Algorithm -- Properties
this algorithm requires
a total ordering of events
all sites to be alive
2(N-1) messages per request
response time under very low load: 2T
N-1 request messages are sent in parallel (T)
N-1 reply messages are sent in parallel (T)
Maekawa's Algorithm
Request set
each node has a request set
when the node wants to enter the critical section, it sends its request to all
nodes in its request set
the request set of each node does not include all nodes in the system
the intersection of any two request sets is non-empty
Example
consider three nodes, X, Y, and Z
X's request set includes nodes X and Y
Y's request set includes nodes Y and Z
Z's request set includes nodes Z and X
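The two conditions on request sets (every pair intersects, and each node's set contains the node itself) can be checked mechanically. The sketch below uses the three-node example above; the function name `is_valid` is hypothetical.

```python
from itertools import combinations

# Request sets from the three-node example above.
request_sets = {
    "X": {"X", "Y"},
    "Y": {"Y", "Z"},
    "Z": {"Z", "X"},
}

def is_valid(request_sets):
    """True if every pair of request sets intersects and each node is in its own set."""
    pairwise = all(a & b for a, b in combinations(request_sets.values(), 2))
    contains_self = all(node in s for node, s in request_sets.items())
    return pairwise and contains_self
```

Here `is_valid(request_sets)` holds, while two singleton sets such as `{"X": {"X"}, "Y": {"Y"}}` fail the intersection test.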
Maekawa's Algorithm
Request the CS:
1. Pi multicasts request (ti, i) to its request set, including itself
2. Pj upon receiving the request
a) if it is not currently locked, then locks itself and sends reply (j) to Pi
b) otherwise, puts the request in a queue (in the order of the timestamp)
Enter the CS:
1. if Pi has received reply messages from all sites in its request set, then it
enters the CS
Release the CS:
1. Pi upon exiting CS, sends release (ti) to all processors in its request set
2. Pj upon receiving the message
a) if the waiting queue is not empty, then it removes the entry at the top of the queue
and sends reply (j) to that node
b) otherwise, unlocks itself
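Each member Pj of a request set behaves like a small lock manager. A minimal sketch of that role, without any deadlock handling, follows; the class name `Arbiter` and its methods are hypothetical, and the return value stands in for "send reply (j) to this node".

```python
import heapq

class Arbiter:
    """Role of Pj when it receives request and release messages (hypothetical helper)."""

    def __init__(self, site_id):
        self.id = site_id
        self.locked_for = None   # requester currently holding this arbiter's lock
        self.waiting = []        # min-heap of (timestamp, requester)

    def on_request(self, ts, requester):
        """Return the node that receives a reply now, or None if the request is queued."""
        if self.locked_for is None:
            self.locked_for = requester
            return requester
        heapq.heappush(self.waiting, (ts, requester))
        return None

    def on_release(self):
        """Hand the lock to the oldest waiting request, or unlock if none waits."""
        if self.waiting:
            _, nxt = heapq.heappop(self.waiting)
            self.locked_for = nxt
            return nxt
        self.locked_for = None
        return None
```

Note that the lock goes to whichever request arrives first, not to the one with the smallest timestamp; only the waiting queue is timestamp-ordered.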
Maekawa's Algorithm -- Properties
requires a total ordering of events
requires 3√N messages per request
response time under very low load: 2T
K-1 request messages are sent in parallel (T)
K-1 reply messages are sent in parallel (T)
has the potential deadlock problem
Potential Deadlock Problem in Maekawa's Algorithm
requests reach different sites in different order
consider nodes X, Y, and Z, which issue requests to enter the critical section
X's request has the lowest timestamp, Z's request has the highest
A is the mediator of requests from X and Y
B is the mediator of requests from Y and Z
C is the mediator of requests from X and Z
A received X's request first and locked itself for X
B received Y's request first and locked itself for Y
C received Z's request first and locked itself for Z
X will not get a reply from C
Y will not get a reply from A
Z will not get a reply from B
result: deadlock
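The example can be read as a wait-for cycle among the three requesters. The sketch below encodes the lock state from the example and follows the wait-for edges; the names `locked_for`, `request_set`, `waits_for`, and `has_cycle` are hypothetical, and `has_cycle` assumes each requester waits on at most one other, as is the case here.

```python
# Mediator -> requester it is locked for, as in the example above.
locked_for = {"A": "X", "B": "Y", "C": "Z"}
# Requester -> mediators it needs replies from.
request_set = {"X": ["A", "C"], "Y": ["A", "B"], "Z": ["B", "C"]}

def waits_for(requester):
    """Requesters that hold a lock which `requester` still needs."""
    return {locked_for[m] for m in request_set[requester]
            if locked_for[m] != requester}

def has_cycle(start):
    """Follow wait-for edges from `start`; True if a requester is revisited."""
    seen, node = set(), start
    while node not in seen:
        seen.add(node)
        blockers = waits_for(node)
        if not blockers:
            return False
        node = next(iter(blockers))   # single blocker in this example
    return True
```

X waits for Z, Z waits for Y, and Y waits for X, so `has_cycle("X")` is true.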
Solution to the Potential Deadlock Problem
detect the potential deadlock
when a request with a smaller timestamp is received, while the node is
locked for a request with a larger timestamp
resolution
ask the requester with a larger timestamp to give up its granted privilege if
it has not already gotten all replies
for the previous example, C asks Z to give up the granted privilege
Resolve the Potential Deadlock Problem
Request the CS:
1. Pi multicasts request (ti, i) to its request set, including itself
2. Pz, upon receiving the request,
a) if it is not currently locked, then locks itself and sends reply (z) to Pi
b) if it is currently locked for Pk, then
if the request from Pk has a smaller timestamp, then puts the new
request in a waiting queue (ordered by timestamp) and sends
failed (z) to Pi
otherwise (Pi's request has a smaller timestamp), sends inquire (z)
to Pk
Resolve the Potential Deadlock Problem
Request the CS:
3. Pk, upon receiving inquire (z),
a) if it has received a failed message, then sends relinquish (k) to all sites in
its request set
b) if it has received all reply messages, then ignores the inquire message
c) otherwise, simply waits
4. Pz, upon receiving relinquish (k),
a) changes its lock from Pk to Pi and sends reply (z) to Pi
Property
requires at most 5√N messages per request
response time under very low load: 2T
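The mediator side of the failed / inquire / relinquish exchange can be sketched as follows. The class `Arbiter` and its methods are hypothetical, and two details are assumptions the slides do not spell out: the request that triggers an inquire is also kept in the waiting queue, and on relinquish the request that gave up the lock is put back in that queue before the lock goes to the oldest waiting request.

```python
import heapq

class Arbiter:
    """Mediator Pz with deadlock resolution (hypothetical helper)."""

    def __init__(self, site_id):
        self.id = site_id
        self.locked = None     # (timestamp, requester) holding the lock
        self.waiting = []      # min-heap of (timestamp, requester)

    def on_request(self, ts, requester):
        """Return (message, destination) for the step-2 rules above."""
        new = (ts, requester)
        if self.locked is None:
            self.locked = new
            return ("reply", requester)
        heapq.heappush(self.waiting, new)       # assumption: queued in both cases
        if self.locked < new:
            return ("failed", requester)        # lock holder is older
        return ("inquire", self.locked[1])      # new request is older

    def on_relinquish(self):
        """Lock holder gave up: re-queue it and grant the oldest request."""
        heapq.heappush(self.waiting, self.locked)   # assumption: re-queued
        self.locked = heapq.heappop(self.waiting)
        return ("reply", self.locked[1])
```

In the earlier example, C is locked for Z when X's older request arrives, so C sends inquire to Z; once Z relinquishes, C replies to X and Z waits in the queue.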
Request Set Generation
Assume
N nodes in total
Let Si denote the request set for Pi; the request sets have to satisfy
Si ∩ Sj ≠ ∅, for all i, j
Si always contains Pi, for all i
additional desirable properties
|Si| = |Sj| = K, for all i, j, and for some K
i.e., the request sets are of equal size, and each is of size K
O(Pi) = O(Pj) = D, for all i and j
O(Pi) denotes the number of occurrences of Pi in all request sets
i.e., each node is involved in D request sets
Request Set Generation
relationship between K and D
N nodes, each has a request set of size K
N·K entries in total across all request sets (with duplicates)
since there are N nodes, each node has to appear D times, so N·D = N·K
K = D
request set size K
consider the first request set: it has K nodes, and each of them can be in (K-1)
other request sets
each other request set should contain at least one of the nodes in the first
request set
in total, K(K-1) request sets other than the first one
N = K(K-1) + 1, so K ≈ √N
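Solving N = K(K-1) + 1 for K gives K = (1 + sqrt(4N - 3)) / 2, which is where the √N growth comes from. A small sketch of that calculation follows; the function name `request_set_size` is hypothetical.

```python
import math

def request_set_size(n):
    """Return K with n == K*(K-1) + 1, or None if no such integer K exists."""
    disc = 4 * n - 3                 # discriminant of K^2 - K + (1 - n) = 0
    root = math.isqrt(disc)
    if root * root != disc or (1 + root) % 2:
        return None
    return (1 + root) // 2
```

For example, N = 31 gives K = 6 and N = 7 gives K = 3, while N = 5 has no exact solution.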
Request Set Generation
assume N = K(K-1) + 1, for some K, and K-1 is a prime number
consider a matrix of size (K-1) by (K-1)
it can generate K groups of K-1 nonintersecting sets
K-1 nonintersecting rows
K-1 nonintersecting columns
(K-2) groups of (K-1) nonintersecting diagonals
different diagonals: jump 1 on each row (the real diagonal), jump 2, ...,
jump (K-1)-1
each number (out of the first K numbers) can be combined with each of
the K-1 nonintersecting sets of one group to produce K-1 sets that
intersect one another in exactly 1 element
Request Set Generation Example -- K=6
N= 6 * 5 + 1 = 31, K= 6, matrix is 5 by 5
the first K numbers, 123456, form one set
1 combined with each row forms one set
2 combined with each column forms one set
3 combined with each jump-1 diagonal forms one set
jump-1 diagonals: 7djpv, 8ekqr, 9flms, ...
4 combined with each jump-2 diagonal forms one set
jump-2 diagonals: 7elnu, 8fhov, 9gipr, ...
5 combined with each jump-3 diagonal forms one set
jump-3 diagonals: 7fiqt, 8gjmu, ...
6 combined with each jump-4 diagonal forms one set
jump-4 diagonals: 7gkos, 8clpt, ..., bfjnr
total K(K-1)+1 = 31 sets
1 2 3 4 5 6
7 8 9 a b
c d e f g
h i j k l
m n o p q
r s t u v
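The row, column, and jump-diagonal construction can be sketched in a few lines. The function name `generate_request_sets` is hypothetical, the sketch assumes K-1 is prime, and nodes are numbered 1..N instead of using the letters a..v (so a = 10, b = 11, ..., v = 31).

```python
def generate_request_sets(k):
    """Build the K(K-1)+1 sets of size K described above; assumes K-1 is prime."""
    p = k - 1
    head = list(range(1, k + 1))     # the first K numbers
    # (K-1) x (K-1) matrix holding the remaining numbers K+1 .. N, row by row
    m = [[k + 1 + r * p + c for c in range(p)] for r in range(p)]

    sets = [set(head)]                                                   # first set
    sets += [{head[0], *row} for row in m]                               # 1 + each row
    sets += [{head[1], *(m[r][c] for r in range(p))} for c in range(p)]  # 2 + each column
    for jump in range(1, p):                                             # jump-1 .. jump-(K-2)
        for start in range(p):
            # On row r the diagonal sits in column (start + jump * r) mod (K-1).
            diag = {m[r][(start + jump * r) % p] for r in range(p)}
            sets.append({head[jump + 1], *diag})
    return sets
```

For K = 6 this yields 31 sets of size 6, any two of which share exactly one node; the set 37djpv from the example appears as {3, 7, 13, 19, 25, 31}.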
Request Set Assignment Example -- K=6
How to assign the 31 sets to the 31 nodes
node 1 gets the first set: 123456
the request set constructed from each row is assigned to
the 2nd node in the set
e.g., request set 1789ab is assigned to node 7
now, all nodes in the first column have their request sets
node 2 gets the set of 2 and first column
the request set constructed from each column is assigned
to the 2nd node in the set
e.g., node 8 has request set 28dins
note that, set 27chmr is assigned to node 2, not 7
now, the first node of each column and each row have
their request sets
the jump-X diagonals will be assigned to the rest of the
nodes
nodes still without a request set:
3 4 5 6
d e f g
i j k l
n o p q
s t u v
Request Set Assignment Example -- K=6
the request set constructed from each jump-1 diagonal is
assigned to the 3rd node in the request set
e.g., request set 37djpv is assigned to node d
but, set 3bciou is assigned to node 3, not node c
the request set constructed from each jump-2 diagonal is
assigned to the 4th node in the request set
e.g., request set 47elnu is assigned to node l
but, set 48fhov is assigned to node 4, not node h
the request set constructed from each jump-3 diagonal is
assigned to the 5th node in the request set
e.g., request set 57fiqt is assigned to node q
but, set 58gjmu is assigned to node 5, not node m
the request set constructed from each jump-4 diagonal is
assigned to the last node in the request set
e.g., request set 67gkos is assigned to node s
but, set 6bfjnr is assigned to node 6, not node r
Request Sets Generation Algorithm (Cont.)
if K-1 is a power of a prime number
it is possible to generate optimal request sets
if K-1 is not a power of a prime number or N cannot be expressed as
K(K-1)+1
find a number M, where M is the smallest integer that is greater than N
and can be expressed as K(K-1)+1, for some K, where K-1 is a power of a
prime number
generate the required sets for M processors
replace numbers N+1..M by 1..M-N
remove M-N sets
the same thing can be done for site failures
Request Set Generation Example -- N=5
consider the closest number M that can be expressed as K(K-1)+1
N = 5, M = 7 (K = 3)
derive the sets from M = 7 and remove the duplicated nodes
1 2 3
4 5
1 2 -- replace nodes 6 and 7 by 1 and 2
S1 = {1, 2, 3}
S4 = {1, 4, 5}
S6 = {1, 1, 2} -- remove
S2 = {2, 4, 1}
S5 = {2, 5, 2} = {2, 5}
S7 = {3, 4, 2} -- remove
S3 = {3, 5, 1}
Token Ring Algorithm
a unique token is associated with the CS
Pi enters CS only if it owns the token
Request to enter CS:
1. if Pj owns the token and does not need to enter the CS, then it passes the token to P(j+1) mod N
2. Pi will sooner or later get the token
Enter the CS:
1. when Pi owns the token, it enters CS
Release the CS:
1. pass the token to the next processor
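A minimal sketch of the token circulating until it reaches a node that wants the CS; the function name `pass_token` and the boolean list `wants_cs` are hypothetical, and each hop counts as one message.

```python
def pass_token(holder, wants_cs, n):
    """Pass the token from `holder` around a ring of n nodes.

    Returns (node that enters the CS, number of token messages sent).
    If no node wants the CS, the token stays where it is.
    """
    if not any(wants_cs):
        return holder, 0
    hops = 0
    while not wants_cs[holder]:
        holder = (holder + 1) % n    # pass to P(j+1) mod N
        hops += 1
    return holder, hops
```

With four nodes, a requester that already holds the token needs 0 messages, and a requester just behind the holder needs N-1 = 3.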
Token Ring Algorithm -- Properties
simple and no deadlock or starvation
number of messages and response time
if only one node needs the token, the token will traverse N/2 nodes on
average
best case: 0 messages (the node has the token), 0 delay
worst case: N-1 messages (sent sequentially), (N-1)T delay
tolerable overhead with small N
cannot scale up for large N
it is difficult to design a fault-tolerant algorithm for this scheme
the concept of a token is similar to centralized control; however, the central site is moving
Suzuki-Kasami's Broadcast Algorithm
data structures:
vector X: associated with the token
X[i]: the timestamp of the last request from Pi that has been served
vector RTj: associated with node Pj
RTj[i]: the timestamp of the most current request from Pi known by Pj
node j determines whether a node k has an outstanding request by checking
whether RTj[k] > X[k]
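The outstanding-request test is a simple element-wise comparison of the two vectors. A minimal sketch, with the hypothetical function name `outstanding_requests` and plain lists standing in for X and RTj:

```python
def outstanding_requests(rt_j, x):
    """Indices k for which node j sees an unserved request: RTj[k] > X[k]."""
    return [k for k in range(len(x)) if rt_j[k] > x[k]]
```

For instance, with RTj = [2, 1, 3] and X = [2, 0, 3], only node 1 has a request that has not been served yet.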
Suzuki-Kasami's Broadcast Algorithm
Enter the CS:
if Pi has received the token, then it enters the CS
Release the CS:
Pi, upon exiting the CS, sets X[i] = RTi[i]
execute (A)
Suzuki-Kasami's Broadcast Algorithm -- Properties
this algorithm gives better fault tolerance in the sense of handling requests
as long as the request is received by some processor that will possess the
token, the request will be processed
however, the problem of a missing token is still there
e.g., the token is held by a dead processor or is sent to a dead processor
requires N messages per request
N-1 messages for broadcasting the request
1 message for sending the token
if the node that wants to enter the critical section happens to have the token,
then no message is needed
response time
in general, there is a delay of 2T
in best case, there is no delay
Raymond's Tree-Based Algorithm
the processors are structured as a tree and the token is placed at the root node
the tree restructures when the token moves
Request the CS (going up the tree):
1. Pi sends request (i) to its parent and puts the request in its queue if it does not
hold the token
2. Pj upon receiving the request
a) puts the request in its queue
b) if it has not sent a request to its parent then
sends request(j) to its parent
c) otherwise (a request has already been sent to its parent for another
child node)
does nothing
Raymond's Tree-Based Algorithm
Request the CS (going down the tree):
3. root site upon receiving the request
a) puts the request in its queue
b) executes (DTPR)
4. Pj, upon receiving the token,
a) if it was not requesting to enter CS or its request was not on the top of
its queue then executes (DTPR)
D. delete the top entry from its requesting queue
T. send the token to the requesting child
P. update parent pointer to point to the requesting child
R. if its request queue is non-empty then send a request to the new
parent
Raymond's Tree-Based Algorithm
Enter the CS:
1. if Pi has received the token and its request is on the top of its queue, then it
enters the CS
Release the CS:
1. Pi upon exiting CS
a) if its queue is not empty, then executes (DTPR)
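The request, DTPR, enter, and release rules can be put together in a small single-process sketch. The class `Node` and its methods are hypothetical, messages are modelled as direct method calls (so delivery is instantaneous and in order), and a parent of `None` is used to mark the node that currently holds the token.

```python
from collections import deque

class Node:
    """One processor in the tree (hypothetical helper, not from the slides)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent     # None: this node is the root and holds the token
        self.queue = deque()     # pending requests: this node itself or child nodes
        self.asked = False       # True once a request has been sent to the parent
        self.in_cs = False

    def request(self, requester=None):
        """Queue a request from `requester` (default: this node itself)."""
        self.queue.append(self if requester is None else requester)
        if self.parent is None:
            if not self.in_cs:           # idle token holder serves it at once
                self._dtpr()
        elif not self.asked:             # forward at most one request upward
            self.asked = True
            self.parent.request(self)

    def _dtpr(self):
        target = self.queue.popleft()    # D: delete the top entry
        if target is self:               # own request on top: enter the CS
            self.in_cs = True
            return
        self.parent = target             # P: point to the requesting child
        self.asked = False
        target.receive_token()           # T: send the token to that child
        if self.queue:                   # R: others still wait, ask the new parent
            self.asked = True
            self.parent.request(self)

    def receive_token(self):
        self.parent = None               # the token holder becomes the root
        self.asked = False
        self._dtpr()

    def release(self):
        self.in_cs = False
        if self.queue:
            self._dtpr()
```

Because the calls are synchronous, a request made while the token is idle is served immediately; interleavings such as a second request arriving while the token is still in transit are not modelled here.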
Raymond's Tree-Based Algorithm -- Example
tree used in the example:
1
2 3
4 5 6 7
1. token is at node 1; node 5 makes a request
2. node 2 receives the request and sends the request to node 1
3. node 4 also sends a request; node 2 receives it
4. token is at node 2 now; node 2 becomes the root
5. node 5 gets the token and enters the CS
6. node 2 sends a request to node 5
Raymond's Tree-Based Algorithm -- Example
7. node 5 sends the token to node 2
8. node 4 gets the token and enters the CS
9. node 3 sends a request
10. the request from node 2 comes to node 4
11. node 3 gets the token and becomes the root
Raymond's Tree-Based Algorithm -- Properties
the node with the token is always the root node
requires the nodes on the entire path, from requester to root, to be alive
in order to process a request
still has the lost token problem
requires 2 logN messages per request on average
longest path: 2 logN (when the root is at a leaf of the original tree)
best case: 0 messages
worst case: 4 logN messages (2 logN to the root, 2 logN back with the token)
response time
the message passing has to be done sequentially
the average response time: T logN
the best case response time: 0
the worst case response time: 4T logN
Performance Comparisons
T: per message transmission time
E: computation time
response time: consider low load
algorithm         response time        # messages
Lamport           2T+E                 3(N-1)
Ricart-Agrawala   2T+E                 2(N-1)
Maekawa           2T+E                 3√N to 5√N
token-ring        [0 to NT]+E          0 to N
broadcast         [0 or 2T]+E          0 or N
tree-based        [0 to 4T logN]+E     0 to 4 logN
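To compare the message counts for a concrete system size, the formulas can be evaluated directly. In this sketch the function name `messages_per_request` is hypothetical, Maekawa's counts are taken as 3√N to 5√N, logN is assumed to be base 2, and ranges are returned as (low, high) pairs.

```python
import math

def messages_per_request(n):
    """Messages per CS request for n nodes, following the comparison table."""
    log_n = math.log2(n)             # assumption: logarithm base 2
    return {
        "Lamport": 3 * (n - 1),
        "Ricart-Agrawala": 2 * (n - 1),
        "Maekawa": (3 * math.sqrt(n), 5 * math.sqrt(n)),
        "token-ring": (0, n),
        "broadcast": (0, n),
        "tree-based": (0, 4 * log_n),
    }
```

For N = 16 this gives 45 messages for Lamport and 30 for Ricart-Agrawala, against 12 to 20 for Maekawa and at most 16 for the token-based schemes.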