Università degli studi di Udine Sistemi operativi – Operating Systems
Synchronization
• Synchronization primitives
  • HW primitives
    • Atomic operations
  • Low-level synchronization primitives
    • Exclusive locks, rwlocks, seq. locks, non-blocking data structures
    • Locking strategies and issues
  • High-level synchronization primitives
• Synchronization patterns
• Classical problems
• Deadlock management
Concurrency
• Multiple applications (multiprogramming)
  • independent applications
    • processes unaware of others
    • competition for shared resources
  • cooperating applications
    • processes indirectly aware of others
    • cooperation by sharing resources
    • synchronization
• Parallel applications
  • processes/threads directly aware of others
  • cooperation by communication (messages or shared variables)
  • synchronization
Concurrency issues
• Race conditions
  • final results depend on execution order
• Starvation
  • some task waits indefinitely
• Deadlock
  • a circular waiting dependency prevents work from proceeding
Race condition
• Results depend on the order of execution
shared vars: a=1 ; b=2
process A: a=a+b
process B: b=a+b
a=? ; b=?
  a=3 ; b=5   (A runs before B)
  a=4 ; b=3   (B runs before A)
  a=3 ; b=3   (both processes read a and b before either one writes)
Race condition
• Results depend on the order of execution
shared var: count=0
process A:
    local tmpA
    tmpA=count
    tmpA=tmpA+1
    count=tmpA
process B:
    local tmpB
    tmpB=count
    tmpB=tmpB+1
    count=tmpB
count=?
  count=2  OK
  count=1  NO (both processes read count=0 before either one writes it back)
Mutual exclusion
• A group of instructions must be executed atomically
shared var: count=0
process A:
    local tmpA
    BeginSection / Lock
        tmpA=count          (critical section)
        tmpA=tmpA+1
        count=tmpA
    EndSection / Unlock
process B:
    local tmpB
    BeginSection / Lock
        tmpB=count          (critical section)
        tmpB=tmpB+1
        count=tmpB
    EndSection / Unlock
Result: count=2
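The BeginSection / EndSection pair above can be tried out with POSIX threads. The following is a minimal sketch, assuming a POSIX system; the names `count_lock`, `worker` and `run_two_workers` are illustrative and do not come from the slides.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static long count;                                   /* shared variable */
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker increments the shared counter `iterations` times. */
static void *worker(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&count_lock);             /* BeginSection / Lock */
        long tmp = count;                            /* critical section */
        tmp = tmp + 1;
        count = tmp;
        pthread_mutex_unlock(&count_lock);           /* EndSection / Unlock */
    }
    return NULL;
}

/* Runs "process A" and "process B" as two threads; returns the final count. */
long run_two_workers(long iterations)
{
    pthread_t a, b;
    int rc;

    count = 0;
    rc = pthread_create(&a, NULL, worker, &iterations);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, &iterations);
    assert(rc == 0);
    (void)rc;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return count;
}
```

With the lock in place, `run_two_workers(n)` should always return `2 * n`; removing the lock/unlock calls reintroduces the lost-update race shown earlier.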
Starvation
[Figure: ready queue and running process at two successive scheduling steps. Step 1: ready processes D C B A, E is executing. Step 2: ready processes D E C B, A is executing.]
Deadlock
[Figure: Task A, Task B, Task C and Task D, each waiting for another task, forming a cycle.]
Wrong synchronization!
System is blocked!
Synchronization: Synchronization primitives
Synchronization
• HW primitives
  • processor instructions
  • usually not privileged
• Low-level synchronization primitives
  • built on top of HW primitives
  • do not require scheduler intervention
    • can be implemented at user level
• High-level synchronization primitives
  • built on top of low-level primitives
  • interact with the scheduler
    • from user level, imply syscalls
Synchronization primitives: HW primitives
HW primitives
• Atomic Read, atomic Write
  • not practical
    • requires N accesses to synchronize N tasks
    • requires unique IDs for tasks
• Atomic Read-Modify-Write
  • allows implementing simple spin-locks
    • implementation independent of the tasks involved
    • no "unique IDs" requirement
  • a clever implementation can reduce contention
    • ticket, array-based, queue-based locks
• Atomic Read-Test-Modify-Write
  • allows wait-free synchronization
• Load-Link and Store-Conditional
  • do not require a double memory access in a single instruction
HW primitives
• Atomic Read-Modify-Write
  • minimal feature to implement practical locks
  • Test-and-Set
  • Read-and-Increment
    • x86: lock xadd
  • Exchange
    • x86: xchg
    • ARM: swp
      • deprecated since ARMv6
  • Others:
    • fetch_and_sub, fetch_and_or, ...
pseudo-code (each function body executes atomically):
    int test_and_set(int *ptr)
    {
        int old = *ptr;
        *ptr = 1;
        return old;
    }

    int read_and_increment(int *ptr, int inc)
    {
        int old = *ptr;
        *ptr = old + inc;
        return old;
    }

    int exchange(int *ptr, int newval)
    {
        int old = *ptr;
        *ptr = newval;
        return old;
    }
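In portable C these read-modify-write operations are available through C11 `<stdatomic.h>`. The sketch below wraps the standard calls under the pseudo-code names; the wrapper names and `rmw_demo` are illustrative, and the compiler decides which processor instruction implements each call.

```c
#include <assert.h>
#include <stdatomic.h>

/* Test-and-Set: set the flag, return its previous state (0 = was clear). */
static int test_and_set(atomic_flag *f)
{
    return atomic_flag_test_and_set(f);
}

/* Read-and-Increment (fetch-and-add): add inc, return the old value. */
static int read_and_increment(atomic_int *ptr, int inc)
{
    return atomic_fetch_add(ptr, inc);
}

/* Exchange: store newval, return the old value. */
static int exchange(atomic_int *ptr, int newval)
{
    return atomic_exchange(ptr, newval);
}

/* Single-threaded walk-through of the returned values. */
void rmw_demo(void)
{
    atomic_flag f = ATOMIC_FLAG_INIT;
    atomic_int x = 5;
    int old;

    old = test_and_set(&f);            /* flag was clear, is now set */
    assert(old == 0);
    old = test_and_set(&f);            /* flag was already set */
    assert(old == 1);
    old = read_and_increment(&x, 3);   /* x goes from 5 to 8 */
    assert(old == 5);
    old = exchange(&x, 1);             /* x goes from 8 to 1 */
    assert(old == 8);
    assert(atomic_load(&x) == 1);
    (void)old;
}
```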
HW primitives
• Atomic Read-Test-Modify-Write
  • allows wait-free and lock-free synchronization
  • Compare-and-Exchange or Compare-and-Swap (CAS)
    • x86: lock cmpxchg
pseudo-code (the function body executes atomically):
    int compare_exchange(int *ptr, int testval, int newval)
    {
        int old = *ptr;
        if (old == testval)
            *ptr = newval;
        return old;
    }
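A possible C11 rendering of Compare-and-Exchange is sketched below. `atomic_compare_exchange_strong` is the standard call; the wrapper `compare_exchange`, the helper `atomic_store_max` and `cas_demo` are names made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Same contract as the pseudo-code: returns the value found in *ptr;
 * the store happened if and only if that value equals testval. */
static int compare_exchange(atomic_int *ptr, int testval, int newval)
{
    int expected = testval;
    /* On failure, C11 writes the value actually found into `expected`. */
    atomic_compare_exchange_strong(ptr, &expected, newval);
    return expected;
}

/* Typical use: a retry loop that raises *ptr to at least `candidate`. */
static void atomic_store_max(atomic_int *ptr, int candidate)
{
    int seen = atomic_load(ptr);
    while (seen < candidate) {
        int found = compare_exchange(ptr, seen, candidate);
        if (found == seen)
            break;              /* our store went through */
        seen = found;           /* someone else changed *ptr: test again */
    }
}

void cas_demo(void)
{
    atomic_int x = 10;
    int old;

    old = compare_exchange(&x, 10, 20);     /* match: x becomes 20 */
    assert(old == 10 && atomic_load(&x) == 20);

    old = compare_exchange(&x, 10, 30);     /* mismatch: x is unchanged */
    assert(old == 20 && atomic_load(&x) == 20);

    atomic_store_max(&x, 15);               /* 15 < 20: nothing to do */
    assert(atomic_load(&x) == 20);
    atomic_store_max(&x, 42);
    assert(atomic_load(&x) == 42);
    (void)old;
}
```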
HW primitives
• Load-Link and Store-Conditional
  • do not require a double memory access in a single instruction
  • MIPS:
    • ll, sc
  • ARM:
    • ldrex, strex
pseudo-code (each function body executes atomically):
    int LL(int *ptr)
    {
        /* remember this access */
        return *ptr;
    }

    int SC(int *ptr, int val)
    {
        if (/* this cpu has executed LL on ptr */) {
            if (/* *ptr written since the last LL performed by this cpu */)
                return SC_FAILURE;      /* fail */
            else {                      /* *ptr has not changed */
                *ptr = val;
                return SC_SUCCESS;      /* success */
            }
        }
        /* otherwise: unspecified behavior */
    }
HW primitives
• Load-Link and Store-Conditional: building an atomic Read-Modify-Write
    PROCESSOR A                       PROCESSOR B
    LL x                              LL x
    Modify x                          Modify x
    SC x                              SC x
    failure → the operation           success → the operation
    was not atomic: retry             was atomic: go on
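C has no portable LL/SC intrinsics, so the retry pattern can only be approximated. The sketch below uses C11 `atomic_compare_exchange_weak`, which, like SC, is allowed to fail spuriously; compilers are free to implement it with an LL/SC pair on processors that provide one (an assumption about the toolchain, not something the slides state). Unlike a real SC, the CAS compares values, so it cannot notice a location that was changed and then changed back. The names `fetch_and_add_retry` and `llsc_demo` are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>

/* fetch_and_add built from a load / modify / conditional-store retry loop. */
static int fetch_and_add_retry(atomic_int *ptr, int inc)
{
    int old = atomic_load(ptr);                 /* plays the role of LL */
    /* Plays the role of SC: on failure `old` is refreshed with the
     * current value of *ptr and the modification is recomputed. */
    while (!atomic_compare_exchange_weak(ptr, &old, old + inc))
        ;                                       /* not atomic: retry */
    return old;                                 /* was atomic: go on */
}

void llsc_demo(void)
{
    atomic_int x = 7;
    int old = fetch_and_add_retry(&x, 5);

    assert(old == 7);
    assert(atomic_load(&x) == 12);
    (void)old;
}
```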
HW primitives: summary
• Atomic accesses:
  • Read-Modify-Write operations
    • swap, fetch_and_add, fetch_and_sub, fetch_and_or, fetch_and_and, ...
      • perform the operation suggested by the name, and return the old value
    • add_and_fetch, sub_and_fetch, or_and_fetch, and_and_fetch, ...
      • perform the operation suggested by the name, and return the new value
  • Read-Test-Modify-Write operations
    • compare_and_swap
Operation costs
• Typical values:
  • Best-case atomic increment: 50-100 cycles
  • Best-case Compare-and-Exchange: 50-100 cycles
    • CAS on a variable in cache
  • Memory barrier: 100-150 cycles
  • Single cache miss: 200-300 cycles
  • Compare-and-Exchange cache miss: 500-1000 cycles
Synchronization primitives: Low-level synchronization primitives
Low-level synchronization primitives
• Exclusive locks
• Reader-Writer locks
• Sequential locks
• Non-blocking data structures
Low-level synchronization primitives
• Exclusive locks
  • allow only one task to proceed, others must wait
  • several tasks compete to acquire a lock
    • only one wins (acquires the lock)
    • others wait until the lock is released
  • e.g.,
    • entering a critical section → lock acquisition
    • exiting the critical section → lock release
Exclusive lock
• Binary variable
• States:
  • locked (or acquired, or held)
  • unlocked (or free, or available)
• Operations
  • lock(lock_var)
      if lock_var is unlocked then lock_var becomes locked
      else the calling task cannot proceed until lock_var becomes unlocked
  • unlock(lock_var)
      lock_var becomes unlocked
      note: unlock should be called by the task that holds the lock
Exclusive lock implementations
• Classical locking algorithms
  • Dekker's algorithm
  • Peterson's algorithm
  • Lamport's bakery algorithm
• Spinlocks
  • Polling on a variable
    • Basic implementation
    • Ticket spinlock
    • Array spinlock
Exclusive lock implementations
• Implementation
  • based on atomic Read and atomic Write
    • Classical locking algorithms
      • Dekker's algorithm
      • Peterson's algorithm
      • Lamport's bakery algorithm
  • based on atomic Read-Modify-Write, or atomic Read-Test-Modify-Write, or Load-Link and Store-Conditional
    • spinlock
• Annotations on the slide:
  • Needs unique task id (no id reuse)
  • Must read N memory locations
  • Requires sequential consistency
  • Limited to 2 tasks
  • Requires processor consistency
Classical locking algorithms
• Lock algorithm properties
  • Mutual exclusion (safety property)
    • critical sections of different threads do not overlap
    • the integrity of the computation cannot be guaranteed without this property!
  • No deadlock
    • if someone attempts to acquire the lock, then someone will acquire it
    • does not imply deadlock-free programs
  • No starvation
    • every thread that attempts to acquire the lock eventually succeeds
    • implies no deadlock
    • desirable but not essential
      • practical locks: many permit starvation, if it is unlikely to occur
    • without a real-time guarantee, starvation freedom is weak
Classical locking algorithms
• Dekker's algorithm (1964)
  • for 2 tasks
• Peterson's algorithm (1981)
  • for 2 tasks
  • generalizable to N tasks (filter algorithm)
• Lamport's "bakery" algorithm (1974)
  • for N tasks
Classical locking algorithms
• Use atomic load and store only, no stronger atomic primitives
• Not used in practice
  • locks based on stronger atomic primitives are more efficient
• Why study classical lock algorithms?
  • understand the principles underlying synchronization
    • ubiquitous in parallel programs
  • appreciate their subtlety
  • motivate the need for hardware support
Wrong algorithm - 1
    #define N 2     /* number of processes */
    int flag[N];    /* initialized to all 0s */

    void lock(int process /* 0 or 1 */)
    {
        int other = 1 - process;
        flag[process] = 1;              /* I'm interested */
        while (flag[other] == 1)
            ;                           /* wait */
    }

    void unlock(int process /* 0 or 1 */)
    {
        flag[process] = 0;
    }
Wrong algorithm - 1
• Case 1 (OK): task A (process=0) sets flag[0] = 1, finds flag[1] == 0 and enters its critical section; task B (process=1) waits in its while loop until A's unlock executes flag[0] = 0.
• Case 2 (deadlock): both tasks execute flag[process] = 1 before either one tests flag[other]; both then find the other flag set and spin forever in
    while (flag[other] == 1) ; /* wait */
Wrong algorithm - 2
    #define N 2     /* number of processes */
    int turn = 0;   /* who has reserved access */

    void lock(int process /* 0 or 1 */)
    {
        turn = process;                 /* other goes first */
        while (turn == process)
            ;                           /* wait */
    }

    void unlock(int process /* 0 or 1 */)
    {
    }
Wrong algorithm - 2
• A task that calls lock waits until the other task calls lock too: the write turn = process made by one task is what unlocks the other.
• Trace: task A (process=0) calls lock and waits; task B (process=1) calls lock and sets turn = 1, so A enters its critical section; when A calls lock again it sets turn = 0, so B enters its critical section.
• OK only while both tasks keep requesting the lock: a task that requests the lock on its own waits forever.
Dekker's algorithm
    #define N 2     /* number of processes */
    int flag[N];    /* initialized to all 0s */
    int turn = 0;   /* who has reserved access */

    void lock(int process /* 0 or 1 */)
    {
        int other = 1 - process;
        flag[process] = 1;
        while (flag[other]) {
            flag[process] = 0;
            while (turn != process)
                ;                       /* wait */
            flag[process] = 1;
        }
    }

    void unlock(int process /* 0 or 1 */)
    {
        turn = 1 - process;             /* other */
        flag[process] = 0;
    }
Peterson's algorithm
    #define N 2     /* number of processes */
    int flag[N];    /* initialized to all 0s */
    int turn = 0;   /* who has reserved access */

    void lock(int process /* 0 or 1 */)
    {
        int other = 1 - process;
        flag[process] = 1;              /* I'm interested */
        turn = process;                 /* other goes first */
        while (turn == process && flag[other] == 1)
            ;                           /* wait */
    }

    void unlock(int process /* 0 or 1 */)
    {
        flag[process] = 0;
    }
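Plain `int` variables are not enough to run Peterson's algorithm on current compilers and processors, because loads and stores may be reordered. The sketch below is one way to try it in C11, assuming sequentially consistent atomics (the default for `atomic_store` and `atomic_load`) and POSIX threads. The `sched_yield()` call in the wait loop, the counter and the `peterson_demo` harness are additions for this example only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int flag[2];      /* flag[i] == 1: task i is interested */
static atomic_int turn;         /* the task that wrote turn last waits */
static long shared_count;       /* protected by the Peterson lock */

static void peterson_lock(int process)
{
    int other = 1 - process;
    atomic_store(&flag[process], 1);
    atomic_store(&turn, process);
    while (atomic_load(&turn) == process && atomic_load(&flag[other]) == 1)
        sched_yield();          /* wait */
}

static void peterson_unlock(int process)
{
    atomic_store(&flag[process], 0);
}

#define ITERATIONS 100000L

static void *task(void *arg)
{
    int process = *(int *)arg;
    for (long i = 0; i < ITERATIONS; i++) {
        peterson_lock(process);
        shared_count++;         /* critical section */
        peterson_unlock(process);
    }
    return NULL;
}

/* Runs task A (process=0) and task B (process=1); returns the final count. */
long peterson_demo(void)
{
    pthread_t t[2];
    int id[2] = { 0, 1 };

    shared_count = 0;
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, task, &id[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return shared_count;
}
```

If mutual exclusion holds, `peterson_demo()` returns `2 * ITERATIONS`.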
Peterson's algorithm
• Case 1 (OK): task A (process=0) executes lock first, finds flag[1] == 0 or turn != 0, and enters its critical section; task B (process=1) waits in its while loop until A's unlock executes flag[0] = 0.
• Case 2 (OK): both tasks set their flag, then both write turn. The task that writes turn last finds turn == process and flag[other] == 1, so it waits; the other task finds turn != process and enters its critical section. Its unlock clears its flag and releases the waiting task.
Lamport's bakery algorithm
• On arrival get an (incremental) ticket
• The bakery serves whoever has the smallest ticket
[Figure: arriving tasks take tickets 10, 11, 12, 13; waiting tasks hold tickets 7, 8, 9; the served task holds ticket 6 ("now serving: 6").]
Lamport's bakery algorithm
    int flag[N];    /* initialized to all 0s */
    int ticket[N];  /* initialized to all 0s */

    void lock(int process)
    {
        int j;
        flag[process] = 1;
        ticket[process] = 1 + max(ticket[0], ..., ticket[N-1]);
        flag[process] = 0;
        for (j = 0; j < N; j++) {
            while (flag[j])
                ;   /* wait if task j is getting its ticket */
            /* wait for threads with higher priority */
            while (ticket[j] != 0 &&
                   ((ticket[j] < ticket[process]) ||
                    ((ticket[j] == ticket[process]) && j < process)))
                ;   /* wait */
        }
    }

    void unlock(int process)
    {
        ticket[process] = 0;
    }
Observations
• Bakery algorithm is concise, elegant and fair
• Why is it not practical?
  • must read N distinct locations (N could be very large)
  • threads must be assigned unique IDs between 0 and N-1
    • awkward for dynamic threads
  • the value of a label is monotonically increasing and unbounded
• Can there be a cleverer lock that uses only atomic load/store and avoids these problems?
  • No. Any deadlock-free algorithm requires reading or writing at least N distinct locations in the worst case.
Low-level synchronization primitives: Spinlocks
Spinlocks
• Repeatedly check the lock variable
  • loop while it is locked
  • at kernel level: disable preemption
    • note: on uni-processor systems, just disable preemption
• A clever implementation can reduce contention
  • ticket locks
  • array-based locks
  • queue-based locks
• Whenever possible, the processor is put into a low-power state while waiting
  • e.g., with wfe or wfi on ARM
Spinlock implementation (example)
Leveraging Test-and-Set:
    void lock(int *lck)
    {
        while (test_and_set(lck) == 1)
            continue;   /* wait! */
        /* memory barrier if needed */
    }

    void unlock(int *lck)
    {
        /* memory barrier if needed */
        *lck = 0;
    }
Leveraging Exchange:
    void lock(int *lck)
    {
        while (exchange(lck, 1) == 1)
            continue;   /* wait! */
        /* memory barrier if needed */
    }

    void unlock(int *lck)
    {
        /* memory barrier if needed */
        *lck = 0;
    }
Spinlock implementation (example)
Leveraging Test-and-Set (reducing communication):
    void lock(int *lck)
    {
        while ((*lck == 1) || (test_and_set(lck) == 1))
            continue;   /* wait! */
        /* memory barrier if needed */
    }

    void unlock(int *lck)
    {
        /* memory barrier if needed */
        *lck = 0;
    }
Leveraging Exchange (reducing communication):
    void lock(int *lck)
    {
        do {
            while (*lck == 1)
                continue;   /* wait! */
        } while (exchange(lck, 1) == 1);
        /* memory barrier if needed */
    }

    void unlock(int *lck)
    {
        /* memory barrier if needed */
        *lck = 0;
    }
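The "reducing communication" variant can be sketched in C11 as follows. This is an illustration, not the slides' implementation: the acquire/release orderings stand in for the "memory barrier if needed" comments, and the names `ttas_lock_t`, `ttas_lock`, `ttas_unlock` and the `ttas_demo` harness are assumptions. The `sched_yield()` in the wait loop only keeps the demo well behaved on machines with few cores.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct { atomic_int locked; } ttas_lock_t;   /* 0 = free, 1 = held */

static void ttas_lock(ttas_lock_t *l)
{
    for (;;) {
        /* Poll with a plain read first: it is served by the local cache. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) == 1)
            sched_yield();      /* wait! */
        /* The lock looks free: try to take it with an atomic exchange. */
        if (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0)
            return;
    }
}

static void ttas_unlock(ttas_lock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

#define NTHREADS 4
#define ITERATIONS 50000L

static ttas_lock_t demo_lock;   /* static storage: starts free */
static long demo_count;

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERATIONS; i++) {
        ttas_lock(&demo_lock);
        demo_count++;           /* critical section */
        ttas_unlock(&demo_lock);
    }
    return NULL;
}

long ttas_demo(void)
{
    pthread_t t[NTHREADS];

    demo_count = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return demo_count;
}
```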
Spinlock implementation (example)
Using LL and SC (exercise):
    void lock(int *lck)
    {
        /*
         * write code here
         */
        /* memory barrier if needed */
    }

    void unlock(int *lck)
    {
        /* memory barrier if needed */
        *lck = 0;
    }
Spinlock implementation (ticket)
• Same principle as Lamport's bakery algorithm
  • arriving tasks get a ticket
    • atomically
  • there is a global indicator: the current turn
  • each task waits until the current turn is equal to its own ticket
  • a leaving task increments the current turn
Spinlock implementation (ticket)
    typedef struct {
        int next;       /* next available ticket */
        int current;    /* current turn */
    } lock_t;

    void init_lock(lock_t *lck)
    {
        lck->next = lck->current = 0;
    }

    void lock(volatile lock_t *lck)
    {
        int myturn;
        myturn = /* get a ticket: must be an atomic operation */
        /* wait until it's my turn */
    }

    void unlock(lock_t *lck)
    {
        /* increment current turn
           (atomicity not required: only one task here) */
    }
Spinlock implementation (ticket)
    typedef struct {
        int next;
        int current;
    } lock_t;

    void init_lock(lock_t *lck)
    {
        lck->next = lck->current = 0;
    }

    void lock(volatile lock_t *lck)
    {
        int myturn;
        /* atomically acquire the current value of the field next
           and store a new value in it */
        myturn = fetch_and_add(&lck->next, 1);
        /* loop until the field current becomes equal to my ticket;
           lck must be volatile for this test */
        while (myturn != lck->current)
            continue;   /* wait! */
        /* memory barrier */
    }

    void unlock(lock_t *lck)
    {
        /* memory barrier */
        lck->current++;
        /* memory barrier, for efficiency: the previous write becomes
           visible on all CPUs as soon as possible, so spinning is reduced */
    }
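A runnable C11 counterpart of the ticket lock is sketched below. It is an illustration under assumptions: atomics replace `volatile` and the explicit barriers, unsigned counters make ticket wrap-around well defined, and the names `ticket_lock_t`, `ticket_lock`, `ticket_unlock` and the `ticket_demo` harness are made up for the example. `sched_yield()` is used while waiting so that the demo also behaves on machines with few cores.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_uint next;       /* next available ticket */
    atomic_uint current;    /* current turn */
} ticket_lock_t;

static void ticket_init(ticket_lock_t *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->current, 0);
}

static void ticket_lock(ticket_lock_t *l)
{
    /* Get a ticket: the only atomic read-modify-write per acquisition. */
    unsigned myturn = atomic_fetch_add(&l->next, 1);
    /* Wait until it's my turn. */
    while (atomic_load_explicit(&l->current, memory_order_acquire) != myturn)
        sched_yield();
}

static void ticket_unlock(ticket_lock_t *l)
{
    /* Only the lock holder writes `current`: no atomic increment needed. */
    unsigned cur = atomic_load_explicit(&l->current, memory_order_relaxed);
    atomic_store_explicit(&l->current, cur + 1, memory_order_release);
}

#define NTHREADS 4
#define ITERATIONS 20000L

static ticket_lock_t demo_lock;
static long demo_count;

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERATIONS; i++) {
        ticket_lock(&demo_lock);
        demo_count++;           /* critical section */
        ticket_unlock(&demo_lock);
    }
    return NULL;
}

long ticket_demo(void)
{
    pthread_t t[NTHREADS];

    ticket_init(&demo_lock);
    demo_count = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return demo_count;
}
```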
Spinlock implementation (ticket)
With a back-off delay, to reduce contention (delay can be a simple empty loop); init_lock and unlock are unchanged:
    void lock(volatile lock_t *lck)
    {
        int myturn;
        myturn = fetch_and_add(&lck->next, 1);
        while (myturn != lck->current)
            delay(myturn - lck->current);
        /* memory barrier */
    }
Spinlock implementation (ticket)
With sleeping, to reduce contention (if there is architectural support); init_lock is unchanged:
    void lock(volatile lock_t *lck)
    {
        int myturn;
        myturn = fetch_and_add(&lck->next, 1);
        while (myturn != lck->current)
            wait_for_event();
        /* memory barrier */
    }

    void unlock(lock_t *lck)
    {
        /* memory barrier */
        lck->current++;
        /* memory barrier (needed) */
        send_event();
    }
Spinlock implementation (ticket)
Exercise: implement fetch_and_add using LL and SC; init_lock and unlock are unchanged:
    void lock(volatile lock_t *lck)
    {
        int myturn;
        /* myturn = fetch_and_add(&lck->next, 1), written with LL and SC */
        while (myturn != lck->current)
            continue;   /* wait! */
        /* memory barrier */
    }
Spinlock implementation (ticket)
• Only 1 atomic instruction executed per lock acquisition
• Fair, locks granted in order of request: no starvation
• Back-off delay proportional to the position in the queue
  • if the time in the critical section is constant, the delay can be calculated so that the subsequent test of lck->current will just succeed
• Polling on a single shared location
  • bus traffic with an invalidate cache coherency protocol (e.g., MESI)
  • delay not necessary with a write-update protocol (e.g., Firefly)
Spinlock implementation (array-based)
• Each task must poll a different location
  • arriving tasks get an index
    • atomically
  • each task waits until its own slot becomes free
  • a leaving task unlocks the following one
Spinlock implementation (array-based)
[Figure: array of slots, one marked Unlocked and all the others Locked, with an index idx. Task 1: releases the lock. Task 2: has acquired the lock. Task 3: waiting.]
Spinlock implementation (array-based)
    typedef struct {
        int flags[N];   /* should be padded: each element in a different
                           cache line; N must be a power of 2 */
        int queuelast, winner_idx;
    } lock_t;

    void init_lock(lock_t *lck)
    {
        int i;
        lck->flags[0] = HAS_LOCK;
        for (i = 1; i < N; i++)
            lck->flags[i] = MUST_WAIT;
        lck->queuelast = 0;
    }

    void lock(volatile lock_t *lck)
    {
        int myplace;
        /* get a location to poll (each task obtains a different index) */
        myplace = fetch_and_add(&lck->queuelast, 1);
        while (lck->flags[myplace % N] == MUST_WAIT)
            continue;
        /* only one task here: record the index used (needed in unlock) */
        lck->winner_idx = myplace;
        /* memory barrier */
    }

    void unlock(lock_t *lck)
    {
        /* memory barrier */
        lck->flags[lck->winner_idx % N] = MUST_WAIT;
        /* allows the next waiting task to proceed */
        lck->flags[(lck->winner_idx + 1) % N] = HAS_LOCK;
        /* memory barrier */
    }
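The array-based lock can be tried out with the C11 sketch below. Several details are assumptions made for the example: a 64-byte cache line for the padding, `SLOTS` = 8 as the power-of-two array size (it must be at least the number of contending threads), sequentially consistent atomics in place of the explicit barriers, and the names `array_lock_t`, `array_lock`, `array_unlock` and `array_demo`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define SLOTS 8u                        /* power of 2, >= number of threads */
enum { MUST_WAIT = 0, HAS_LOCK = 1 };

struct slot { _Alignas(64) atomic_int flag; };  /* one cache line per slot */

typedef struct {
    struct slot slots[SLOTS];
    atomic_uint queuelast;
    unsigned winner_idx;                /* touched by the lock holder only */
} array_lock_t;

static void array_init(array_lock_t *l)
{
    atomic_init(&l->slots[0].flag, HAS_LOCK);
    for (unsigned i = 1; i < SLOTS; i++)
        atomic_init(&l->slots[i].flag, MUST_WAIT);
    atomic_init(&l->queuelast, 0);
}

static void array_lock(array_lock_t *l)
{
    /* Get a private location to poll. */
    unsigned myplace = atomic_fetch_add(&l->queuelast, 1) % SLOTS;
    while (atomic_load(&l->slots[myplace].flag) == MUST_WAIT)
        sched_yield();
    l->winner_idx = myplace;            /* needed in unlock */
}

static void array_unlock(array_lock_t *l)
{
    unsigned idx = l->winner_idx;
    atomic_store(&l->slots[idx].flag, MUST_WAIT);
    /* Let the next waiting task proceed. */
    atomic_store(&l->slots[(idx + 1) % SLOTS].flag, HAS_LOCK);
}

#define NTHREADS 4
#define ITERATIONS 20000L

static array_lock_t demo_lock;
static long demo_count;

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERATIONS; i++) {
        array_lock(&demo_lock);
        demo_count++;                   /* critical section */
        array_unlock(&demo_lock);
    }
    return NULL;
}

long array_demo(void)
{
    pthread_t t[NTHREADS];

    array_init(&demo_lock);
    demo_count = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return demo_count;
}
```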
Spinlock implementation (array-based)
• Tasks do not poll a single shared location
  • reduced bus traffic for a write-invalidate cache coherency protocol
• Lock is passed from a task to the next
  • through a shared slot in an array
  • this slot is not shared with any other thread
• Only 1 atomic instruction executed per lock acquisition
• Fair, lock is granted in order of request: no starvation
• Need to know the max number of threads
Spinlocks
• Applicable to any number of tasks
• Applicable to any number of processors (shared memory)
• Simple
  • thus easy to verify
• Support multiple critical sections
  • each critical section is identified by its own lock variable
Spinlocks
• Process waits by executing a loop
  • Can be implemented at user level
    • no syscalls are required by user level code
  • CPU time is wasted
Low-level synchronization primitives: Reader-writer locks
Low-level synchronization primitives
• Reader-Writer locks
  • 2 categories of tasks: readers and writers
    • readers can proceed concurrently
    • a writer must have exclusive access
  • increase parallelism
    • readers advance in parallel
    • a new reader can proceed if other readers are accessing data
  • tasks must be specialized
    • readers do not modify data!
  • writers may starve
    • a writer must wait until there are no more readers, but a new reader can steal the waiting writer's turn
    • give priority to writers → increased complexity (thus overhead)
Reader-Writer lock
• 3-state variable
• States:
  • unlocked
  • reader_locked
  • writer_locked
• Operations
  • read_lock(lock_var)
      if lock_var is writer_locked the calling task cannot proceed
      else lock_var becomes (or stays) reader_locked
  • write_lock(lock_var)
      if lock_var is unlocked then lock_var becomes writer_locked
      else the calling task cannot proceed
  • unlock(lock_var)
      if lock_var is reader_locked and there are no more readers → lock_var becomes unlocked
      if lock_var is writer_locked → lock_var becomes unlocked
RWlock implementation (example)
    const int W = 1;
    const int R = 2;

    typedef int lock_t;

    void init_lock(lock_t *lck)
    {
        *lck = 0;
    }

    void read_lock(volatile lock_t *lck)
    {
        fetch_and_add(lck, R);
        while (*lck & W)
            continue;
    }

    void write_lock(lock_t *lck)
    {
        while (CAS(lck, 0, W) != 0)
            continue;
    }

    void read_unlock(lock_t *lck)
    {
        fetch_and_add(lck, -R);
    }

    void write_unlock(lock_t *lck)
    {
        fetch_and_add(lck, -W);
    }
• Simple
• Not efficient
  • Polling CAS
• Not fair
  • Readers are preferred
  • Writers can starve
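The same reader-writer lock can be written with C11 atomics, as in the sketch below. Bit 0 of the lock word is the writer flag and the remaining bits count readers in units of `R`. The type name `simple_rwlock_t`, the `sched_yield()` calls and `rwlock_demo` are assumptions of this example; the demo only checks the state encoding from a single thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

enum { W = 1, R = 2 };          /* W: writer flag, R: one reader */

typedef atomic_int simple_rwlock_t;

static void rw_init(simple_rwlock_t *lck)
{
    atomic_init(lck, 0);
}

static void read_lock(simple_rwlock_t *lck)
{
    atomic_fetch_add(lck, R);           /* announce the reader first */
    while (atomic_load(lck) & W)        /* then wait for the writer to leave */
        sched_yield();
}

static void write_lock(simple_rwlock_t *lck)
{
    int expected = 0;
    /* Succeeds only when there is no reader and no writer. */
    while (!atomic_compare_exchange_weak(lck, &expected, W)) {
        expected = 0;                   /* the failed CAS overwrote it */
        sched_yield();
    }
}

static void read_unlock(simple_rwlock_t *lck)
{
    atomic_fetch_sub(lck, R);
}

static void write_unlock(simple_rwlock_t *lck)
{
    atomic_fetch_sub(lck, W);
}

void rwlock_demo(void)
{
    simple_rwlock_t l;

    rw_init(&l);
    read_lock(&l);                      /* two readers at the same time */
    read_lock(&l);
    assert(atomic_load(&l) == 2 * R);
    read_unlock(&l);
    read_unlock(&l);
    assert(atomic_load(&l) == 0);

    write_lock(&l);                     /* exclusive access */
    assert(atomic_load(&l) == W);
    write_unlock(&l);
    assert(atomic_load(&l) == 0);
}
```

As on the slide, readers are preferred: a stream of readers keeps the word different from 0, so the writer's CAS can keep failing.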
Reader-Writer lock
• Variants
  • more states
    • VAX/VMS Distributed Lock Manager: 6-state lock
      • states: Unlocked, Concurrent-Read, Concurrent-Write, Protected-Read, Protected-Write, Exclusive
    • DBMS: even more than 30 states!
[Table: compatibility of the six lock modes; each combination is marked as allowed or blocked.]
Low-level synchronization primitives
• Sequential locks (seqlocks)
  • Similar to reader-writer locks, but writers have priority
    • a writer is never blocked by readers
      • writers do not starve
    • a writer is only serialized with respect to other writers
    • readers try to get data
      • the operation is restarted if a conflict with a writer is detected
Example
Reader:
    do {
        seq = read_seqbegin(&test_seqlock);
        ...
    } while (read_seqretry(&test_seqlock, seq));
Writer:
    write_seqlock(&test_seqlock);
    ... /* update data */
    write_sequnlock(&test_seqlock);
Low-level synchronization primitives: Locking strategies and issues
Locking strategies
• Giant lock
  • the whole code (e.g., a library) is protected with a single lock
  • simplest approach
    • allows porting non-parallel code to parallel architectures
  • available parallelism is lost
Locking strategies
• Coarse-grained locking
  • code is split into subsystems
    • e.g., for an OS kernel
      • filesystems
      • memory management
      • network stack
      • video drivers
      • input drivers
      • ...
  • each subsystem is protected with its own lock
  • calls to different subsystems can proceed concurrently
  • communication between different subsystems can still require a global lock
Locking strategies
• Fine-grained locking
  • locks protect individual data structures
  • scalable
  • several locks must be managed
    • need to understand which locks are required
    • order of lock requests
    • management of a hierarchy of locks
    • rules!
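One common rule for ordering lock requests is to acquire locks in a fixed global order, for example by address. The sketch below illustrates that rule with two hypothetical bank accounts; `struct account`, `lock_pair`, `transfer` and `transfer_demo` are names invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct account {
    pthread_mutex_t lock;       /* fine-grained: one lock per account */
    long balance;
};

/* Rule: when two account locks are needed, take the lower address first.
 * Every task then requests the locks in the same order, so no circular
 * wait can form between them. */
static void lock_pair(struct account *a, struct account *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        struct account *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->lock);
    pthread_mutex_lock(&b->lock);
}

static void unlock_pair(struct account *a, struct account *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

static void transfer(struct account *from, struct account *to, long amount)
{
    assert(from != to);
    lock_pair(from, to);
    from->balance -= amount;
    to->balance += amount;
    unlock_pair(from, to);
}

void transfer_demo(void)
{
    struct account x, y;

    pthread_mutex_init(&x.lock, NULL);
    pthread_mutex_init(&y.lock, NULL);
    x.balance = 100;
    y.balance = 50;

    transfer(&x, &y, 30);       /* x: 70, y: 80 */
    transfer(&y, &x, 5);        /* x: 75, y: 75 */
    assert(x.balance == 75 && y.balance == 75);

    pthread_mutex_destroy(&x.lock);
    pthread_mutex_destroy(&y.lock);
}
```

Without the ordering rule, `transfer(&x, &y, ...)` and `transfer(&y, &x, ...)` running in two threads could each hold one lock and wait for the other.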
Locking issues
• Deadlock
  • circular waiting dependency that prevents work from proceeding
    • tasks blocked on a lock held by a task waiting for another lock held...
• Convoying
  • set of tasks repeatedly competing for a lock
    • progression speed is limited by the slowest task
    • fast tasks are forced to slow down
  • similar to a column of cars in a single lane
Locking issues
• Priority inversion
  • a high-priority task (TH) is blocked on a lock held by a low-priority task (TL)
  • an independent medium-priority task (TM) is ready
    → TM is scheduled to run
    → TH effectively gets a lower priority than TM
• Workarounds
  • disable preemption while a lock is held
    • requires disabling interrupts
  • priority ceiling
    • give the highest priority to a task that holds a lock
  • priority inheritance
    • when a task blocks on a lock, its priority passes to the lock owner (if higher)
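On POSIX systems that implement the option, priority inheritance can be requested per mutex through the mutex attributes. The sketch below uses the standard `pthread_mutexattr_setprotocol` call with `PTHREAD_PRIO_INHERIT`; the helper names `init_pi_mutex` and `pi_mutex_demo` are illustrative, and platforms without the option are expected to report `ENOTSUP`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initializes *m as a priority-inheritance mutex.
 * Returns 0 on success, or the error number of the failing call. */
int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

void pi_mutex_demo(void)
{
    pthread_mutex_t m;
    int rc = init_pi_mutex(&m);

    /* ENOTSUP: the platform does not offer priority inheritance. */
    assert(rc == 0 || rc == ENOTSUP);
    if (rc == 0) {
        pthread_mutex_lock(&m);     /* used like any other mutex */
        pthread_mutex_unlock(&m);
        pthread_mutex_destroy(&m);
    }
}
```

`PTHREAD_PRIO_PROTECT` selects the priority ceiling protocol in the same way, together with `pthread_mutexattr_setprioceiling`.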
Locking issues
• Signal-safety
  • signal handlers (and interrupt handlers) cannot share locks with the other code
  • e.g.,
    1. task1 holds lockA
    2. task1 is interrupted by a signal
    3. signal handler requires lockA
       • signal handler blocks, task1 cannot proceed
       → deadlock
  • disable signals (or interrupts) when lockA is acquired
    • not required for locks that are not also used in signal handlers
Locking issues
• Kill-tolerant availability
  • tasks killed while holding a lock
• Pre-emption tolerance
  • tasks pre-empted while holding a lock
Locking issues
• Overall performance
  • overhead of lock primitives
    • global communication
    • memory barriers
  • depends on lock contention
    • a non-contended lock is stored only in a CPU cache
      • still, not for free: memory barrier
    • contended locks bounce from one cache to other caches
      • cache misses
  • look for efficient algorithms
  • use specialized locks
    • e.g., reader-writer locks
Low-level synchronization primitives: Non-blocking data structures
Low-level synchronization primitives
• Non-blocking data structures
  • Lock-free data structures
    • e.g. lock-free linked lists
      • see Linux llist
    • others: buffer, stack, queue, map, snapshot
  • Wait-free data structures
    • much harder than lock-free
    • not always possible
Lock- and Wait-free synchronization
• Lock-free synchronization:
  • At least one thread will make progress in finite time
  • A data structure is lock-free if and only if some operation completes after a finite number of steps system-wide have been executed on the structure
• Wait-free synchronization:
  • Every thread will make progress in finite time
  • A data structure is wait-free if and only if every operation on the structure completes after it has executed a finite number of steps
Lock-free stack (example)
Not concurrent: a lock is needed to make push and pop atomic.
    struct NodeType {
        Datatype data;
        struct NodeType *next;
    };

    struct NodeType *Head;          /* top of the stack */

    void init() {
        Head = NULL;
    }

    void push(struct NodeType *n) {
        n->next = Head;
        Head = n;                   /* global data is changed here */
    }

    struct NodeType *pop() {
        struct NodeType *n;
        n = Head;
        if (n != NULL)
            Head = n->next;         /* global data is changed here */
        return n;
    }
Idea for a lock-free version: if nobody else has changed the global data, the changes are valid; otherwise, abort and retry.
Lock-free stack (example)
Lock free:
    struct NodeType {
        Datatype data;
        struct NodeType *next;
    };

    struct NodeType *Head;          /* top of the stack */

    void init() {
        Head = NULL;
    }

    void push(struct NodeType *n) {
        do {
            n->next = Head;
        } while (CAS(&Head, n->next, n) != n->next);
    }

    struct NodeType *pop() {
        struct NodeType *n;
        do {
            n = Head;
        } while (n != NULL && CAS(&Head, n, n->next) != n);
        return n;
    }
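The CAS-based stack above can be expressed with C11 atomics as sketched below. The node type with an `int` payload, the function names and `stack_demo` are assumptions of this example; the demo only checks LIFO order from a single thread. Nodes are never freed or reused here, which sidesteps the ABA problem that a real implementation has to address.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int data;
    struct node *next;
};

static _Atomic(struct node *) head = NULL;      /* top of the stack */

static void push(struct node *n)
{
    n->next = atomic_load(&head);
    /* If head still equals n->next, install n as the new top.
     * On failure the CAS stores the current head into n->next: retry. */
    while (!atomic_compare_exchange_weak(&head, &n->next, n))
        ;
}

static struct node *pop(void)
{
    struct node *n = atomic_load(&head);
    /* If head still equals n, unlink it.
     * On failure the CAS stores the current head into n: retry. */
    while (n != NULL && !atomic_compare_exchange_weak(&head, &n, n->next))
        ;
    return n;
}

void stack_demo(void)
{
    struct node a = { 1, NULL }, b = { 2, NULL }, c = { 3, NULL };
    struct node *n;

    push(&a);
    push(&b);
    push(&c);

    n = pop();
    assert(n == &c && n->data == 3);    /* last in, first out */
    n = pop();
    assert(n == &b);
    n = pop();
    assert(n == &a);
    n = pop();
    assert(n == NULL);                  /* stack is empty */
    (void)n;
}
```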
Lock-Free issues
• Designing generalized lock-free algorithms is hard
  → design lock-free data structures instead
    • buffer, list, stack, queue, map, deque, snapshot
• ABA problem
  • typical lock-free operation
    • task1:
      1. atomically read a flag (finds the value A)
      2. use data
      3. test the current value of the flag
         → if it is still A, data has not changed: ok to proceed; else, repeat the operation
  • problem:
    • after task1.1, task2 stores B to the flag
    • before task1.3, task2 changes data and stores A to the flag
      → task1 is not aware of the changes
      → data inconsistency
Lock-free stack (example): ABA
struct NodeType {
    Datatype data;
    struct NodeType *next;
};
struct NodeType *Head;
void init() {
    Head = NULL;
}
void push(struct NodeType *n) {
    do {
        n->next = Head;
    } while (CAS(&Head, n->next, n) != n->next);
}
struct NodeType *pop() {
    struct NodeType *n, *next;
    do {
        n = Head;
        if (n == NULL)
            break;              /* empty stack: do not dereference n */
        next = n->next;         /* next is read before the CAS and may become stale */
    } while (CAS(&Head, n, next) != n);
    return n;
}
Lock free
Stack state: Head points to A, followed by B, C and NULL (A is the top of the stack)
Lock-free stack (example): ABA
Initial state: Head points to A, followed by B, C and NULL
TASK1 executes n = pop(); TASK2 executes n1 = pop(); n2 = pop(); push(n1);
TASK1:                          TASK2:
n = Head;
                                n1 = Head;
                                CAS(&Head, n1, n1->next)
                                n2 = Head;
                                CAS(&Head, n2, n2->next)
                                n1->next = Head;
                                CAS(&Head, n1->next, n1)
CAS(&Head, n, n->next)
Pointers: n and n1 point to A; n2 points to B
Lock-free stack (example): ABA
Initial state: Head points to A, followed by B, C and NULL
TASK1 executes n = pop(); TASK2 executes n1 = pop(); n2 = pop(); push(n1);
TASK1:                          TASK2:
n = Head;
next = n->next;
                                n1 = Head;
                                CAS(&Head, n1, n1->next)
                                n2 = Head;
                                CAS(&Head, n2, n2->next)
                                n1->next = Head;
                                CAS(&Head, n1->next, n1)
CAS(&Head, n, next)
Pointers: n and n1 point to A; n2 and next point to B
CAS is successful: Head holds A again, so TASK1 installs its stale next (B) as the new Head, although B has already been popped by TASK2
Lock-free stack (example): ABA
• Do not reuse nodes
  • task2, instead of:
        n1 = pop();
        n2 = pop();
        push(n1);
  • executes:
        n1 = pop();
        n2 = pop();
        n3 = new node;
        n3->data = n1->data;
        push(n3);
• When can n1 be freed?
  • after n1 is released, another task can obtain that memory as a new node
  • ABA can still happen
ABA solutions
• Deferred reclamation
  • Do not reuse nodes
  • Don't recycle the memory “too soon”
  • Garbage collector
  • Hazard pointers
  • Read-Copy-Update
• Use the same CAS for 2 pointers
  • needs a double-word CAS
• Tagged pointers
  • some bits of a pointer are used as a counter
  • beware the wrap-around
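The tagged-pointer idea can be sketched as follows. To stay portable, this illustration replaces pointers with 32-bit indexes into a fixed node pool and packs index and counter into one 64-bit atomic word; the pool, the names and the 32/32 split are all assumptions of the sketch, not part of the slides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define POOL_SIZE 8
#define NIL UINT32_MAX                      /* plays the role of NULL */

struct Node { int data; _Atomic uint32_t next; };
static struct Node pool[POOL_SIZE];         /* nodes are addressed by index */

/* Head = tag (high 32 bits) + index of the top node (low 32 bits). */
static _Atomic uint64_t Head = NIL;

static uint64_t pack(uint32_t tag, uint32_t idx) { return ((uint64_t)tag << 32) | idx; }
static uint32_t idx_of(uint64_t h) { return (uint32_t)h; }
static uint32_t tag_of(uint64_t h) { return (uint32_t)(h >> 32); }

void push(uint32_t i) {
    assert(i < POOL_SIZE);
    uint64_t old = atomic_load(&Head);
    do {
        atomic_store(&pool[i].next, idx_of(old));
        /* every successful update bumps the tag (it wraps around at 2^32) */
    } while (!atomic_compare_exchange_weak(&Head, &old, pack(tag_of(old) + 1, i)));
}

uint32_t pop(void) {
    uint64_t old = atomic_load(&Head);
    while (idx_of(old) != NIL) {
        uint32_t next = atomic_load(&pool[idx_of(old)].next);
        if (atomic_compare_exchange_weak(&Head, &old, pack(tag_of(old) + 1, next)))
            return idx_of(old);
        /* CAS failed: old now holds the current Head, retry */
    }
    return NIL;
}
```

After a pop, pop, push sequence by another task the same node may be on top again, but the tag has changed, so a CAS prepared with the stale Head value fails instead of silently succeeding.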
Lock-free list (example)
struct NodeType {
    Datatype data;
    struct NodeType *next;
};
struct NodeType *Head, *Tail;
void init() {
    struct NodeType *dummynode;
    dummynode = malloc(sizeof(struct NodeType));
    dummynode->next = NULL;
    Head = Tail = dummynode;
}
void insert(struct NodeType *n) {
    struct NodeType *tmp;
    n->next = NULL;
    tmp = Tail;
    tmp->next = n;
    Tail = n;
}
struct NodeType *remove() {
    struct NodeType *n;
    n = Head->next;             /* first node (Head points to the dummy node) */
    if (n != NULL) {
        Head = n;               /* discard the dummy node; n becomes the new dummy node */
    }
    return n;
}
Figure: Head points to the dummy node, which is followed by the first node; the next field of the last node is NULL and Tail points to the last node
Not concurrent: a lock is needed to make insert and remove atomic
after every step, the list must remain consistent:
- nodes are all linked
- Head points to the dummy node
- Tail is after Head
concurrent tasks must “cooperate”
Lock-free list (example)
struct NodeType {
    Datatype data;
    struct NodeType *next;
};
struct NodeType *Head, *Tail;
void init() {
    struct NodeType *dummynode;
    dummynode = malloc(sizeof(struct NodeType));
    dummynode->next = NULL;
    Head = Tail = dummynode;
}
void insert(struct NodeType *n) {
    struct NodeType *tmp, *ntmp;
    n->next = NULL;
    for (;;) {
        tmp = Tail;
        ntmp = tmp->next;
        if (Tail != tmp) continue;              /* Tail has moved: retry from the top */
        if (ntmp != NULL) {                     /* Tail is lagging: help to advance it */
            CAS(&Tail, tmp, ntmp);
            continue;
        }
        if (CAS(&tmp->next, NULL, n) == NULL)   /* n is linked after the last node */
            break;
    }
    CAS(&Tail, tmp, n);                         /* may fail if another task already helped */
}
struct NodeType *remove() {
    struct NodeType *n, *h, *t;
    for (;;) {
        h = Head;
        t = Tail;
        n = h->next;
        if (Head != h) continue;                /* Head has moved: retry from the top */
        if (n == NULL)
            break;                              /* empty list */
        if (h == t) {                           /* Tail is lagging: help to advance it */
            CAS(&Tail, t, n);
            continue;
        }
        if (CAS(&Head, h, n) == h)              /* n becomes the new dummy node */
            break;
    }
    return n;
}
Lock free
Synchronization
High-level synchronization primitives
• Semaphores
  • Semaphores (Counting semaphores)
  • Binary semaphores
• Mutexes
• Condition variables
• Monitors
• Deferred processing
  • e.g., Read-Copy-Update (RCU)
High-level synchronization primitives
• Semaphore
  • Integer variable
  • Operations (all atomic)
    • initialize
      • set the initial value: an arbitrary non-negative value
    • semWait (also: P)
      • decrement value; if the result is negative, then suspend the calling process
      • if suspended, the process is stored on a list associated with the semaphore
      • used to enter a critical section
    • semSignal (also: V)
      • increment value; if the result is non-positive, then resume a suspended process
      • the process to be resumed is taken from the list associated with the semaphore
      • used to leave a critical section
Semaphores
• Strong semaphore
  • tasks are resumed in FIFO order
  • fair implementation
• Weak semaphore
  • no order is imposed on task reactivations
Semaphores
• Binary semaphore
  • Semaphore that can only assume values 0 or 1
  • initialize
    • Only 0 and 1 are valid initial values
  • semSignal
    • if a process is waiting, then resume it; otherwise set value to 1
    • the process to be resumed is taken from the list associated with the semaphore
  • semWait
    • if value is 0, then suspend the process, else decrement value
    • if suspended, the process is stored on a list associated with the semaphore
Semaphore implementation (example)
typedef struct semaphore_t {
    int count;
    int lock;                   /* spinlock: access to sem must be atomic */
    QUEUE suspended;
} semaphore;
void semWait(semaphore *sem)
{
    lock(&sem->lock);           /* start of the spinlock-protected section */
    sem->count--;
    if (sem->count < 0) {
        place this process in sem->suspended        /* kernel-level operation */
        unlock(&sem->lock);
        suspend this process                        /* kernel-level operation */
    } else {
        unlock(&sem->lock);
    }
}
void semSignal(semaphore *sem)
{
    lock(&sem->lock);           /* start of the spinlock-protected section */
    sem->count++;
    if (sem->count <= 0) {
        remove a process P from sem->suspended      /* kernel-level operation */
        place process P on the ready list           /* kernel-level operation */
    }
    unlock(&sem->lock);
}
High-level synchronization primitives
• Mutex
  • Similar to a binary semaphore, but only the task owning the mutex can unlock it
  • Same semantics as low-level locks, but the scheduler is involved
• Reentrant (or recursive) mutex
  • a task can acquire the mutex multiple times
  • multiple levels of ownership
  • must be released the same number of times
High-level synchronization primitives
• Monitor
  • abstract data type
  • accessible only through “access procedures” (all atomic and exclusive)
  • Only one task at a time can access the monitor
  • Object-oriented approach
    • e.g., in C++ a monitor can be implemented with a class where:
      • there is a reentrant mutex as a field
      • all methods get the mutex on entry
      • all methods release the mutex on exit
      • signaling is realized with explicit condition variables
High-level synchronization primitives
• Condition variables
  • Condition to test
  • Operations (all atomic)
    • cond_wait
      • sleep until another task calls signal or broadcast
    • cond_notify (also: signal)
      • wake up a waiting task
    • cond_notifyAll (also: broadcast)
      • wake up all waiting tasks
Condition variables
• Using condition variables
  • on waiting: use a loop
    • the condition set by the signaling task may no longer be true
    • another task could have changed the condition after the signaling
    • MESA semantics
  • Hoare semantics:
    • after the signaling, the waiting thread is woken up
    • nobody else gets control of the condition
    • hard to implement (rarely used in practice)
  • a lock (or a mutex) is required
    • prevents the race condition:
      • sequence: task A tests (done is 0), task B sets (done becomes 1), task B calls cond_signal, task A calls cond_wait
      • the signaling is lost: the waiting thread is never woken up
Condition variables pseudocode (example 1)
typedef struct {
    lock_t lock;
    QUEUE waiting;
} cond_t;
void cond_wait(cond_t *cond_var)
{
    atomically add this task to cond_var->waiting
    unlock(&cond_var->lock);
    suspend this task
    lock(&cond_var->lock);
}
void cond_notify(cond_t *cond_var)
{
    atomically remove a task T from cond_var->waiting
    resume T (place T on the ready list)
}
void cond_notify_all(cond_t *cond_var)
{
    for each task T in cond_var->waiting {
        atomically remove T from cond_var->waiting
        resume T (place T on the ready list)
    }
}
Condition variables (example 1)
Task A:
    /* do something */
    /* need to wait for task B */
    lock(&cv->lock);
    while (done == 0) {
        cond_wait(cv);
    }
    unlock(&cv->lock);
Task B:
    /* do something */
    lock(&cv->lock);
    done = 1;
    cond_notify(cv);
    unlock(&cv->lock);
    /* now task A can advance */
done is initially 0
The test on the condition and the call to cond_wait must be atomic
The change of the condition and the call to cond_notify must be atomic
Both are protected by the same lock
Condition variables pseudocode (example 2)
typedef struct {
    QUEUE waiting;
} cond_t;
void cond_wait(cond_t *cond_var, lock_t *lck)
{
    atomically add this task to cond_var->waiting
    unlock(lck);
    suspend this task
    lock(lck);
}
void cond_notify(cond_t *cond_var, lock_t *lck)
{
    atomically remove a task T from cond_var->waiting
    resume T (place T on the ready list)
}
void cond_notify_all(cond_t *cond_var, lock_t *lck)
{
    for each task T in cond_var->waiting {
        atomically remove T from cond_var->waiting
        resume T (place T on the ready list)
    }
}
Condition variables (example 2)
Task A:
    /* do something */
    /* need to wait for task B */
    lock(&lck);
    while (done == 0) {
        cond_wait(&cv, &lck);
    }
    unlock(&lck);
Task B:
    /* do something */
    lock(&lck);
    done = 1;
    cond_notify(&cv, &lck);
    unlock(&lck);
    /* now task A can advance */
done is initially 0
The test on the condition and the call to cond_wait must be atomic
The change of the condition and the call to cond_notify must be atomic
Both are protected by the same lock
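Example 2 closely resembles the POSIX threads interface, where the lock is passed to the wait call. The following is a minimal sketch with pthreads; the function names wait_for_b, task_b and run_example are hypothetical, and the program is assumed to be built with thread support (e.g., -pthread).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lck = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int done = 0;                    /* the condition, protected by lck */

/* Task A: block until task B has set done. */
void wait_for_b(void) {
    pthread_mutex_lock(&lck);
    while (done == 0)                   /* loop: the condition is re-tested after every wakeup */
        pthread_cond_wait(&cv, &lck);   /* releases lck while sleeping, reacquires it before returning */
    pthread_mutex_unlock(&lck);
}

/* Task B: change the condition and signal under the same lock. */
void *task_b(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lck);
    done = 1;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&lck);
    return NULL;
}

/* Usage sketch: the caller acts as task A, a second thread as task B. */
void run_example(void) {
    pthread_t b;
    int rc = pthread_create(&b, NULL, task_b, NULL);
    assert(rc == 0);
    (void)rc;
    wait_for_b();
    pthread_join(b, NULL);
}
```

Whichever task runs first, the signal cannot be lost: either task A finds done already set, or it is already queued on the condition variable when task B signals.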
Read-Copy Update
• Synchronization for read-mostly data
• Update is split in:
  • removal
  • reclamation
• Publish-Subscribe Mechanism
• Simple to apply to data structures
  • lists, arrays
Read-Copy Update
• Update is split in:
  • removal
    • remove the reference to old data
  • reclamation
    • free memory
• removal does not need to wait for running readers
• reclamation must wait until readers have finished
Read-Copy Update
• Publish-Subscribe Mechanism
  • subscribe to data
    • rcu_dereference
  • publish new data
    • rcu_assign_pointer
  • old data is “reclaimed” when it is no longer needed
    • after a “grace” period
Readers-Writers with RCU
Reader(s):
    rcu_read_lock();
    p1 = rcu_dereference(dataptr);
    ... do something with data
    rcu_read_unlock();
Writer(s):
    lock(&writers_lock);
    ... prepare new data pointed to by p2
    oldp = dataptr;
    rcu_assign_pointer(dataptr, p2);
    unlock(&writers_lock);
    synchronize_rcu();
    ... free data pointed to by oldp
p1 is only valid between rcu_read_lock and rcu_read_unlock
Figure: dataptr and the reader's p1 refer to data; the writer's p2 refers to the new data2
Readers-Writers with RCU
• Reader (no blocking operations here!)
    rcu_read_lock();                    /* reader signals its arrival */
    p1 = rcu_dereference(dataptr);      /* reader gets a reference to data */
    ... do something with data
    rcu_read_unlock();                  /* reader signals its leaving */
• Writer
    lock(&writers_lock);                /* writer synchronizes with other writers */
    ... prepare new data pointed to by p2
    oldp = dataptr;
    rcu_assign_pointer(dataptr, p2);    /* writer “publishes” new data */
    unlock(&writers_lock);              /* writer synchronizes with other writers */
    synchronize_rcu();                  /* writer waits until “active” readers complete */
    ... free data pointed to by oldp    /* old data is no longer needed (can be freed) */
Readers-Writers with RCU
• Reader (no blocking operations here!)
    rcu_read_lock();
    p1 = rcu_dereference(dataptr);
    ... do something with data
    rcu_read_unlock();
• Writer
    lock(&writers_lock);
    ... prepare new data pointed to by p2
    oldp = dataptr;
    rcu_assign_pointer(dataptr, p2);
    unlock(&writers_lock);
    call_rcu(..., reclaim_func);        /* to avoid waiting on writers */
• Called asynchronously when readers have completed:
    reclaim_func(...)
    {
        ... free data
    }
Read-Copy Update
• Performance
  • Readers
    • do not acquire locks
    • do not perform atomic instructions
    • do not need memory barriers (except on Alpha)
• Deadlock immunity
• Realtime latency
Read-Copy Update
• Readers and updaters run concurrently
  • Readers can obtain old data
• Low-priority RCU readers can block high-priority reclaimers
• Grace-period latencies can extend for many milliseconds
Synchronization
Synchronization patterns
Synchronization patterns
• Signaling
  • instruction (or block of instructions) A1 must be executed before B1
task A:
    instruction A1;
    semSignal(sem);
task B:
    semWait(sem);
    instruction B1;
sem is initialized with 0
Synchronization patterns
• Mutual exclusion (Mutex)
  • A1 and B1 cannot overlap
task A:
    semWait(sem_mutex);
    instruction A1;
    semSignal(sem_mutex);
task B:
    semWait(sem_mutex);
    instruction B1;
    semSignal(sem_mutex);
sem_mutex is initialized with 1
Synchronization patterns
• Multiplex
  • generalized mutex
  • no more than k tasks can be in the critical section at the same time
  • same structure as the mutex (initialize sem_mutex with k)
Synchronization patterns
• Rendezvous
  • both A1 and B1 must be executed before A2 and B2
task A:
    instruction A1;
    semSignal(semBgo);
    semWait(semAgo);
    instruction A2;
task B:
    instruction B1;
    semSignal(semAgo);
    semWait(semBgo);
    instruction B2;
semAgo and semBgo are initialized with 0
Synchronization patterns
• Barrier
  • generalized rendezvous (to k tasks)
  • use a barrier object
  • implemented on top of semaphores
task A1:
    instruction A1_before;
    barrier(B, k);
    instruction A1_after;
task A2:
    instruction A2_before;
    barrier(B, k);
    instruction A2_after;
...
task Ak:
    instruction Ak_before;
    barrier(B, k);
    instruction Ak_after;
Barriers
An implementation
typedef struct barr_t {
    int arrived;
    semaphore mutex, sem;
} barr;
void barrier(barr *b, int n_proc)       /* b is a pointer: all tasks share the same barrier */
{
    int tmp;
    semWait(&b->mutex);
    tmp = ++b->arrived;
    semSignal(&b->mutex);
    if (tmp != n_proc) {
        semWait(&b->sem);               /* wait for the last task */
        semSignal(&b->sem);             /* let the next waiting task through */
    } else {
        semWait(&b->mutex);
        b->arrived = 0;
        semSignal(&b->mutex);
        semSignal(&b->sem);             /* last task: open the barrier */
    }
}
arrived and sem are initialized with 0; mutex is initialized with 1
not reusable (after all tasks have left, sem == 1): an additional semWait(sem) is needed
Barriers
• Reusable barrier (two phases: phase 1 and phase 2)
  • where should the final semWait be issued?
    • after all tasks have left the barrier
      • otherwise one waiting task will not resume
    • before tasks leave the barrier
      • otherwise a task can reenter the barrier before the final semWait
Barriers
An implementation
typedef struct barr_t {
    int arrived;
    semaphore mutex, phase1, phase2;
} barr;
void barrier(barr *b, int n_proc)       /* b is a pointer: all tasks share the same barrier */
{
    int tmp;
    /* phase 1 */
    semWait(&b->mutex);
    tmp = ++b->arrived;
    semSignal(&b->mutex);
    if (tmp != n_proc) {
        semWait(&b->phase1);
        semSignal(&b->phase1);
    } else {
        semWait(&b->phase2);            /* semaphore phase2 needs an additional semWait too */
        semSignal(&b->phase1);
    }
    /* phase 2 */
    semWait(&b->mutex);
    tmp = --b->arrived;
    semSignal(&b->mutex);
    if (tmp != 0) {
        semWait(&b->phase2);
        semSignal(&b->phase2);
    } else {
        semWait(&b->phase1);            /* additional semWait: puts phase1 back at its initial value */
        semSignal(&b->phase2);
    }
}
arrived and phase1 are initialized with 0; mutex and phase2 are initialized with 1
Synchronization
Classical problems
Classical problems
• Use semaphores to solve:
  • Producer – Consumer (with a bounded buffer)
  • Readers – Writers
    • no priority
    • no-starve writers
    • writers with priority
  • Dining philosophers
Producer – Consumer
• Some tasks produce data
• Some tasks consume data
• Data are consumed in the same order they are produced
• The queue size is known and limited (e.g., a circular buffer)
Figure: Producers feed a FIFO queue, Consumers read from it
Producer – Consumer
• Grant exclusive access to the queue
• Producer signals to consumers that new data is ready
• Consumer signals that space is available in the queue
Producer:
    semWait(space);
    semWait(mutex);
    queue.insert(data);
    semSignal(mutex);
    semSignal(inqueue);
Consumer:
    semWait(inqueue);
    semWait(mutex);
    data = queue.get();
    semSignal(mutex);
    semSignal(space);
mutex is initialized with 1
inqueue is initialized with 0 (initial data in the queue)
space is initialized with the queue size (initial room in the queue)
Producer – Consumer
• swap semWaits? WRONG
  • consumer waits inside the critical section
  • producer cannot enter the critical section
  • producer cannot send semSignal to the consumer
  • DEADLOCK
Producer:
    semWait(space);
    semWait(mutex);
    queue.insert(data);
    semSignal(mutex);
    semSignal(inqueue);
Consumer:
    semWait(mutex);
    semWait(inqueue);
    data = queue.get();
    semSignal(mutex);
    semSignal(space);
Producer – Consumer
• swap semSignal?
  • No deadlock
  • Additional context switches can occur
  • non-optimal implementation
Producer:
    semWait(space);
    semWait(mutex);
    queue.insert(data);
    semSignal(mutex);
    semSignal(inqueue);
Consumer:
    semWait(inqueue);
    semWait(mutex);
    data = queue.get();
    semSignal(space);
    semSignal(mutex);
mutex is initialized with 1
inqueue is initialized with 0 (initial data in the queue)
space is initialized with the queue size (initial room in the queue)
Producer – Consumer
• Using condition variables and mutexes
• Grant exclusive access to the queue
• Producer signals to consumers that new data is ready
• Consumer signals that space is available in the queue
Producer:
    mutex_lock(mutex);
    while (count == MAX)
        cond_wait(space, mutex);
    queue.insert(data);
    count++;
    cond_signal(datain, mutex);
    mutex_unlock(mutex);
Consumer:
    mutex_lock(mutex);
    while (count == 0)
        cond_wait(datain, mutex);
    data = queue.get();
    count--;
    cond_signal(space, mutex);
    mutex_unlock(mutex);
mutex is initially unlocked
count is initialized with 0 (initial data in the queue)
datain is used to signal that there is some data in the queue
space is used to signal that there is free space in the queue
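The condition-variable version can be sketched with POSIX threads as follows. The circular buffer of int values, its size MAX = 4 and the helper producer are assumptions made for the illustration; the program is assumed to be built with thread support (e.g., -pthread).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX 4                           /* queue size (assumed) */

static int buf[MAX];                    /* circular buffer */
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t space = PTHREAD_COND_INITIALIZER;
static pthread_cond_t datain = PTHREAD_COND_INITIALIZER;

void produce(int data) {
    pthread_mutex_lock(&mutex);
    while (count == MAX)                /* queue full: wait for free space */
        pthread_cond_wait(&space, &mutex);
    assert(count < MAX);
    buf[tail] = data;
    tail = (tail + 1) % MAX;
    count++;
    pthread_cond_signal(&datain);       /* there is data in the queue */
    pthread_mutex_unlock(&mutex);
}

int consume(void) {
    pthread_mutex_lock(&mutex);
    while (count == 0)                  /* queue empty: wait for data */
        pthread_cond_wait(&datain, &mutex);
    int data = buf[head];
    head = (head + 1) % MAX;
    count--;
    pthread_cond_signal(&space);        /* there is free space in the queue */
    pthread_mutex_unlock(&mutex);
    return data;
}

/* Usage sketch: a producer thread inserting the values 0..9 in order. */
void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < 10; i++)
        produce(i);
    return NULL;
}
```

With one producer and one consumer the values are consumed in the order in which they were produced, even though the producer has to block whenever the four slots are full.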
Readers – Writers
• Some tasks write in a shared area
• Some tasks read from the shared area
• No order must be enforced
• Data integrity must be preserved
  • do not read half-written data
Figure: Writers and Readers access the same shared data
Readers – Writers - 1
• Grant exclusive access for writers
• The first arriving reader must signal that data is in use (and wait for the writer)
• The last leaving reader must signal that nobody is using data
Writer:
    semWait(noWriters);
    write data
    semSignal(noWriters);
Reader:
    semWait(mutex);
    if (readers==0) semWait(noWriters);
    readers++;
    semSignal(mutex);
    read data
    semWait(mutex);
    readers--;
    if (readers==0) semSignal(noWriters);
    semSignal(mutex);
noWriters is initialized with 1 (initially nobody is accessing shared data)
mutex is initialized with 1
readers is initialized with 0 (initially no readers are reading)
Readers – Writers - 2
• Writers have less chance to access data
  • possible starvation
• Solution:
  • do not allow incoming readers to access data until waiting writers have been served
Figure: readers are using the shared data; a writer is waiting; incoming readers must queue behind it
Readers – Writers - 2
• Block readers when a writer is waiting
• Resume readers when a writer finishes
• Readers must not hold the writer_in semaphore
Writer:
    semWait(writer_in);
    semWait(noWriters);
    write data
    semSignal(noWriters);
    semSignal(writer_in);
Reader:
    semWait(writer_in);
    semSignal(writer_in);
    semWait(mutex);
    if (readers==0) semWait(noWriters);
    readers++;
    semSignal(mutex);
    read data
    semWait(mutex);
    readers--;
    if (readers==0) semSignal(noWriters);
    semSignal(mutex);
noWriters is initialized with 1; mutex is initialized with 1; readers is initialized with 0
writer_in is initialized with 1 (readers and writers can try to proceed)
• readers and writers wait on writer_in
  • one writer or one reader is selected
  • how to grant priority to writers?
Readers – Writers - 3
Writer:
    semWait(mutexW);
    if (writers==0) semWait(noReaders);
    writers++;
    semSignal(mutexW);
    semWait(noWriters);
    write data
    semSignal(noWriters);
    semWait(mutexW);
    writers--;
    if (writers==0) semSignal(noReaders);
    semSignal(mutexW);
Reader:
    semWait(noReaders);
    semWait(mutexR);
    if (readers==0) semWait(noWriters);
    readers++;
    semSignal(mutexR);
    semSignal(noReaders);
    read data
    semWait(mutexR);
    readers--;
    if (readers==0) semSignal(noWriters);
    semSignal(mutexR);
writers wait on noWriters while readers are in; the first writer already holds noReaders, so new readers are blocked
readers wait on noReaders while a writer is in; they release it before reading, so incoming writers can still block later readers
noWriters is initialized with 1; mutexR and mutexW are initialized with 1; readers is initialized with 0
noReaders is initialized with 1; writers is initialized with 0
Readers – Writers
• Readers – Writers – 1
  • Simple
  • Writers can starve
• Readers – Writers – 2
  • No starvation for writers
• Readers – Writers – 3
  • Writers have priority over readers
Dining philosophers
• Five plates: one for each philosopher
• Five forks
• To eat, two forks are needed
philosopher:
    for (;;) {
        think();            /* non-critical section */
        get_forks();
        eat();              /* critical section */
        release_forks();
    }
Requirements:
• Only one philosopher can hold a fork at a time
• No deadlock
• No starvation
• Allow more than one philosopher to eat at the same time
Figure: round table with five philosophers and five forks, each numbered 0 to 4
Dining philosophers
philosopher:
    for (;;) {
        think();
        /* get_forks(); */
        semWait(fork[(i+1)%5]);
        semWait(fork[i]);
        eat();
        /* release_forks(); */
        semSignal(fork[(i+1)%5]);
        semSignal(fork[i]);
    }
fork[0..4] are initialized with 1
Dining philosophers
• Deadlock
philosopher 0: think(); semWait(fork[1]); semWait(fork[0]); eat(); semSignal(fork[1]); semSignal(fork[0]);
philosopher 1: think(); semWait(fork[2]); semWait(fork[1]); eat(); semSignal(fork[2]); semSignal(fork[1]);
philosopher 2: think(); semWait(fork[3]); semWait(fork[2]); eat(); semSignal(fork[3]); semSignal(fork[2]);
philosopher 3: think(); semWait(fork[4]); semWait(fork[3]); eat(); semSignal(fork[4]); semSignal(fork[3]);
philosopher 4: think(); semWait(fork[0]); semWait(fork[4]); eat(); semSignal(fork[0]); semSignal(fork[4]);
If every philosopher completes its first semWait before any of them issues the second one, each holds one fork and waits forever for the fork held by its neighbor
Dining philosophers
• Allow only 4 philosophers to try to acquire forks
  • the code of each philosopher is unchanged
  • with at most 4 philosophers competing for 5 forks, at least one of them can always get both its forks
Dining philosophers
• Symmetric solution
philosopher:
    for (;;) {
        think();
        /* get_forks(); */
        semWait(mutex4);
        semWait(fork[(i+1)%5]);
        semWait(fork[i]);
        eat();
        /* release_forks(); */
        semSignal(fork[(i+1)%5]);
        semSignal(fork[i]);
        semSignal(mutex4);
    }
fork[0..4] are initialized with 1
mutex4 is initialized with 4
Dining philosophers
• Asymmetric solution
L-philosopher:
    for (;;) {
        think();
        /* get_forks(); */
        semWait(fork[(i+1)%5]);
        semWait(fork[i]);
        eat();
        /* release_forks(); */
        semSignal(fork[(i+1)%5]);
        semSignal(fork[i]);
    }
R-philosopher:
    for (;;) {
        think();
        /* get_forks(); */
        semWait(fork[i]);
        semWait(fork[(i+1)%5]);
        eat();
        /* release_forks(); */
        semSignal(fork[(i+1)%5]);
        semSignal(fork[i]);
    }
fork[0..4] are initialized with 1; there is at least one L-philosopher and one R-philosopher
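One way to see why the asymmetric solution cannot deadlock: when philosophers 0..3 are L-philosophers and philosopher 4 is the R-philosopher, every philosopher takes its higher-numbered fork first, which amounts to a global ordering on the forks. The sketch below only computes the acquisition order; the helper names are hypothetical.

```c
#include <assert.h>

#define N 5

/* Philosophers 0..N-2 are L-philosophers (fork[(i+1)%N] first);
 * philosopher N-1 is the R-philosopher (fork[i] first). */
int first_fork(int i) {
    assert(i >= 0 && i < N);
    return (i == N - 1) ? i : (i + 1) % N;
}

int second_fork(int i) {
    assert(i >= 0 && i < N);
    return (i == N - 1) ? (i + 1) % N : i;
}

/* Returns 1 if every philosopher acquires its forks in decreasing index
 * order, i.e., no circular wait can be formed. */
int acquires_in_descending_order(void) {
    for (int i = 0; i < N; i++)
        if (first_fork(i) <= second_fork(i))
            return 0;
    return 1;
}
```

Philosopher 4, for instance, takes fork[4] and then fork[0], while philosopher 0 takes fork[1] and then fork[0].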
Dining philosophers
• No more than 4 philosophers can try to acquire forks
L-philosopher 0: think(); semWait(fork[1]); semWait(fork[0]); eat(); semSignal(fork[1]); semSignal(fork[0]);
L-philosopher 1: think(); semWait(fork[2]); semWait(fork[1]); eat(); semSignal(fork[2]); semSignal(fork[1]);
L-philosopher 2: think(); semWait(fork[3]); semWait(fork[2]); eat(); semSignal(fork[3]); semSignal(fork[2]);
L-philosopher 3: think(); semWait(fork[4]); semWait(fork[3]); eat(); semSignal(fork[4]); semSignal(fork[3]);
R-philosopher 4: think(); semWait(fork[4]); semWait(fork[0]); eat(); semSignal(fork[0]); semSignal(fork[4]);
Example: ph 3 takes fork[4] first, so ph 4 is blocked by ph 3
Dining philosophers
• No more than 4 philosophers can try to acquire forks
  • L-philosopher 3 and R-philosopher 4 both take fork[4] as their first fork
    • if ph 4 gets it, ph 3 is blocked by ph 4 while holding no fork
  • L-philosopher 0 and R-philosopher 4 both take fork[0] as their second fork
    • ph 0 is blocked by ph 4, or ph 4 is blocked by ph 0
Synchronization
Deadlock management
Deadlock conditions
• Mutual exclusion
  • A resource can be assigned only to a fixed, finite number of processes at a time. No other process may access a resource unit that has reached the maximum number of assignments.
  • Needed (to enforce synchronization)
• No preemption
  • No resource can be forcibly removed from a process holding it.
  • Difficult to avoid (a rollback is needed to implement resource preemption)
• Hold and wait
  • A process may hold allocated resources while awaiting assignment of other resources.
• Circular wait
  • A closed chain of processes exists, such that each process holds at least one resource needed by the next process in the chain
Deadlock is possible only if these conditions hold
Resources
• Swappable space
• Devices
  • physical drives
  • files
• Main memory blocks
• Internal resources
  • I/O interrupts handling
Resources are required asynchronously by independent processes
Resource allocation graph
• Nodes: tasks and resources
• Edges between a Task and a Resource: “Task requires Resource” or “Task holds Resource”
• Example: A holds R1 and B requires R1
  • B has to wait until A releases R1
• Example: A, B, R1 and R2 form a closed chain of “holds” and “requires” edges
  • circular dependency: deadlock
Deadlock handling
• Prevention
  • make deadlock impossible
• Avoidance
  • disallow operations that may lead to a deadlock
• Detection
  • periodically check for deadlock and recover
Deadlock prevention
• Disallow hold-and-wait
  • all the needed resources must be requested simultaneously
  • a process is blocked until all the requested resources are available
  • inefficient
    • a process must also acquire resources that it needs only for short time intervals, or that it may not actually need
• Allow preemption
  • when a request is refused, a process must release all its resources
  • the OS may request a process to release resources
  • practical only for resources with an easily restored state (e.g., processor)
• Disallow circular waits
  • define an ordering on resources
  • a process that owns a resource R can request a resource Q only if ord(R) < ord(Q)
  • restricts incremental resource requests (they must follow the ordering)
Deadlock avoidance
• Evaluate resource requests
  • grant a resource request only if a deadlock cannot occur
  • the OS must know all the future requests
  • banker's algorithm (Dijkstra)
Deadlock detection
• Periodically check for deadlock
  • grant resource requests whenever possible
  • if a deadlock is detected
    • kill all deadlocked processes
      • most common approach
    • successively abort deadlocked processes (until the deadlock no longer exists)
      • selection order can be a key factor
    • roll back all deadlocked processes to a previous state
      • a backup and restore mechanism must be implemented
    • force a deadlocked process to release resources
      • preemption
      • roll back the process to a point prior to the resource acquisition
Banker's algorithm
• For a single resource type
  • Process
    • resources used
    • resources needed
  • Available resources
  • grant a request only if it leads to a safe state
  • safe state:
    • there exists at least one process whose remaining need does not exceed the available resources
  • unsafe state:
    • deadlock is possible (the absence of deadlock cannot be ensured)
Banker's algorithm
• For a single resource type

Initial state:
             Allocated   Needed
Process A        0         15
Process B        0          7
Process C        0          4
Process D        0         12
Available       20

Current state:
             Allocated   Needed
Process A        8          7
Process B        4          3
Process C        1          3
Process D        6          6
Available        1

unsafe: with the available resources no process is guaranteed to terminate
Banker's algorithm
• For a single resource type

Initial state:
             Allocated   Needed
Process A        0         15
Process B        0          7
Process C        0          4
Process D        0         12
Available       20

Current state:
             Allocated   Needed
Process A        7          8
Process B        4          3
Process C        2          2
Process D        5          7
Available        2

safe: with the available resources process C can surely terminate
Banker's algorithm
• For a single resource type
  • safe state: $\exists\, i : \mathrm{Needed}(i) \le \mathrm{Available}$
    • Needed(i): resources still needed by process i
    • Available: resources still available on the system
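The safe-state test for a single resource type can be sketched in a few lines. The condition on the slide covers the first process that can terminate; the usual formulation of the banker's algorithm repeats the test after each termination, since a terminating process gives back what it holds. The sketch below follows that repeated form; `is_safe` and `MAX_PROCS` are illustrative names, and a process whose need equals the available amount is treated as able to finish.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 16

/* Returns true if the processes can terminate one after another:
 * each one, in turn, must find its remaining need covered by the free
 * resources, and releases its allocation when it terminates. */
bool is_safe(const int allocated[], const int needed[], int n, int available)
{
    bool done[MAX_PROCS] = { false };
    assert(n >= 0 && n <= MAX_PROCS);

    for (int finished = 0; finished < n; ) {
        bool progress = false;
        for (int i = 0; i < n; i++) {
            if (!done[i] && needed[i] <= available) {
                available += allocated[i];   /* process i terminates */
                done[i] = true;
                finished++;
                progress = true;
            }
        }
        if (!progress)
            return false;    /* no remaining process can finish: unsafe */
    }
    return true;
}
```

With the values of the two examples, allocated {8, 4, 1, 6}, needed {7, 3, 3, 6} and 1 available is reported unsafe, while allocated {7, 4, 2, 5}, needed {8, 3, 2, 7} and 2 available is reported safe (C, then B, then D and A can terminate).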
Banker's algorithm
• For several resource types
  • replicate information for each resource type

                  Type-1               Type-2               Type-3
             Allocated  Needed    Allocated  Needed    Allocated  Needed
Process A        0        15          0         1          0         3
Process B        0         7          0         2          0         7
Process C        0         4          0         4          0         4
Process D        0        12          0         1          0         9
Available       20                    4                   10

• safe state: $\exists\, i\ \forall\, j : \mathrm{Needed}(i,j) \le \mathrm{Available}(j)$
  • Needed(i,j): resources of type j still needed by process i
  • Available(j): resources of type j still available on the system
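For several resource types the same test is applied per type: a process can terminate only if its need is covered for every type j. A minimal sketch under the same assumptions as the single-type case (repeated test, need equal to availability is acceptable); `is_safe_multi`, `fits`, `N_TYPES` and `MAX_PROCS` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 16
#define N_TYPES   3

/* True if the remaining need fits in the free resources for every type j. */
static bool fits(const int need[N_TYPES], const int free_now[N_TYPES])
{
    for (int j = 0; j < N_TYPES; j++)
        if (need[j] > free_now[j])
            return false;
    return true;
}

/* Repeated safe-state test over n processes and N_TYPES resource types. */
bool is_safe_multi(int n, int allocated[][N_TYPES], int needed[][N_TYPES],
                   const int available[N_TYPES])
{
    bool done[MAX_PROCS] = { false };
    int free_now[N_TYPES];
    assert(n >= 0 && n <= MAX_PROCS);
    for (int j = 0; j < N_TYPES; j++)
        free_now[j] = available[j];

    for (int finished = 0; finished < n; ) {
        bool progress = false;
        for (int i = 0; i < n; i++) {
            if (!done[i] && fits(needed[i], free_now)) {
                for (int j = 0; j < N_TYPES; j++)
                    free_now[j] += allocated[i][j];   /* process i terminates */
                done[i] = true;
                finished++;
                progress = true;
            }
        }
        if (!progress)
            return false;    /* no remaining process fits: unsafe */
    }
    return true;
}
```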
Synchronization
User level (POSIX) synchronization primitives
GCC builtins for atomic accesses
• Read-Modify-Write operations
  • __sync_fetch_and_add(type *ptr, type value);
  • __sync_fetch_and_sub(type *ptr, type value);
  • __sync_fetch_and_or(type *ptr, type value);
  • __sync_fetch_and_and(type *ptr, type value);
  • __sync_fetch_and_xor(type *ptr, type value);
  • __sync_fetch_and_nand(type *ptr, type value);
    • perform the operation suggested by the name, and return the old value
    • imply a full memory barrier
GCC builtins for atomic accesses
• Read-Modify-Write operations
  • __sync_add_and_fetch(type *ptr, type value);
  • __sync_sub_and_fetch(type *ptr, type value);
  • __sync_or_and_fetch(type *ptr, type value);
  • __sync_and_and_fetch(type *ptr, type value);
  • __sync_xor_and_fetch(type *ptr, type value);
  • __sync_nand_and_fetch(type *ptr, type value);
    • perform the operation suggested by the name, and return the new value
    • imply a full memory barrier
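These builtins remove the race on a shared counter without any lock. The sketch below assumes GCC (or a compatible compiler) and POSIX threads; `run_counter`, `worker`, `N_THREADS` and `N_INCREMENTS` are illustrative names.

```c
#include <assert.h>
#include <pthread.h>

#define N_THREADS    4
#define N_INCREMENTS 100000

static long count = 0;   /* shared counter */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < N_INCREMENTS; i++)
        __sync_fetch_and_add(&count, 1);   /* atomic count = count + 1 */
    return NULL;
}

/* Runs N_THREADS workers and returns the final value of the counter. */
long run_counter(void)
{
    pthread_t tid[N_THREADS];
    count = 0;
    for (int i = 0; i < N_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(tid[i], NULL);
    return count;
}
```

The two families differ only in the returned value: with `x == 5`, `__sync_fetch_and_add(&x, 3)` returns 5 and `__sync_add_and_fetch(&x, 3)` would return 8; in both cases `x` becomes 8.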
GCC builtins for atomic accesses
• Read-Modify-Write operations
  • __sync_lock_test_and_set(type *ptr, type value);
    • performs an atomic exchange: writes value into *ptr and returns the previous contents of *ptr
    • implies an acquire barrier
• Read-Test-Modify-Write operations
  • __sync_val_compare_and_swap(type *ptr, type oldval, type newval);
  • __sync_bool_compare_and_swap(type *ptr, type oldval, type newval);
    • perform an atomic compare-and-swap: if the current value of *ptr is oldval, then write newval into *ptr
    • __sync_val_compare_and_swap returns the value of *ptr before the operation
    • __sync_bool_compare_and_swap returns true if the comparison succeeded and newval was written
    • imply a full memory barrier
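Compare-and-swap is typically used in a retry loop: read the current value, compute the new one, and write it only if nobody changed the variable in between. As an illustration, the sketch below builds an atomic "maximum" update; `atomic_max` is a hypothetical helper, not a GCC builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Atomically raise *ptr to at least 'candidate'.
 * Returns the value that *ptr held before this call took effect. */
int atomic_max(int *ptr, int candidate)
{
    assert(ptr != NULL);
    int old = *ptr;                 /* initial guess, validated by the CAS */
    while (old < candidate) {
        int seen = __sync_val_compare_and_swap(ptr, old, candidate);
        if (seen == old)
            break;                  /* *ptr was still 'old': swap done */
        old = seen;                 /* another thread changed *ptr: retry */
    }
    return old;
}
```

The loop terminates either because the swap succeeded or because another thread has already stored a value that is not smaller than `candidate`.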
GCC builtins for atomic accesses
• Others:
  • __sync_lock_release(type *ptr);
    • writes 0 to *ptr
    • implies a release barrier
  • __sync_synchronize();
    • issues a full memory barrier
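`__sync_lock_test_and_set` and `__sync_lock_release` are designed to work as a pair, and together they are enough to build a simple spinlock. A minimal sketch, assuming GCC and POSIX threads; `spin_t`, `spin_acquire`, `spin_release` and `run_spin_demo` are illustrative names, and the busy-wait loop makes no attempt at fairness or back-off.

```c
#include <assert.h>
#include <pthread.h>

#define N_THREADS    4
#define N_INCREMENTS 50000

typedef volatile int spin_t;    /* 0 = free, 1 = held */

static void spin_acquire(spin_t *l)
{
    /* Atomically write 1 and look at the previous value:
     * if it was already 1, somebody else holds the lock. */
    while (__sync_lock_test_and_set(l, 1) != 0)
        ;                       /* busy-wait */
}

static void spin_release(spin_t *l)
{
    __sync_lock_release(l);     /* writes 0 with release semantics */
}

static spin_t lock = 0;
static long shared = 0;         /* protected by 'lock' */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < N_INCREMENTS; i++) {
        spin_acquire(&lock);
        shared++;               /* critical section */
        spin_release(&lock);
    }
    return NULL;
}

/* Runs N_THREADS workers and returns the final value of 'shared'. */
long run_spin_demo(void)
{
    pthread_t tid[N_THREADS];
    shared = 0;
    for (int i = 0; i < N_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(tid[i], NULL);
    return shared;
}
```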
User level (POSIX) synchronization primitives
• Low-level primitives
  • spinlocks
• High-level primitives
  • semaphores
  • mutexes
  • reader-writer locks
  • condition variables
  • barriers
User level (POSIX) synchronization primitives
• Low-level primitives
  • spinlocks
    • type
      • pthread_spinlock_t
    • operations:
      • pthread_spin_init: initialization
      • pthread_spin_destroy: deallocation
      • pthread_spin_lock: locking
      • pthread_spin_unlock: unlocking
      • pthread_spin_trylock: tentative locking
User level (POSIX) synchronization primitives
• High-level primitives
  • semaphores
    • type
      • sem_t
    • operations:
      • sem_init: initialization
      • sem_destroy: deallocation
      • sem_getvalue
      • sem_wait, sem_timedwait: waiting
      • sem_trywait: tentative waiting
      • sem_post: unlocking
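A semaphore initialized to 0 is a simple way to make one thread wait for an event produced by another. The sketch below assumes unnamed semaphores are available (as on Linux); `run_handoff`, `producer`, `ready` and `result` are illustrative names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t ready;    /* counts the results available to the consumer */
static int result;     /* value handed from the producer to the consumer */

static void *producer(void *arg)
{
    result = *(int *)arg * 2;   /* produce the value ... */
    sem_post(&ready);           /* ... then signal that it is available */
    return NULL;
}

/* Starts a producer thread and blocks until its result is available. */
int run_handoff(int input)
{
    pthread_t tid;
    int rc = sem_init(&ready, 0, 0);   /* pshared = 0 (threads), value 0 */
    assert(rc == 0);
    rc = pthread_create(&tid, NULL, producer, &input);
    assert(rc == 0);
    (void)rc;

    while (sem_wait(&ready) == -1 && errno == EINTR)
        continue;                       /* retry if interrupted by a signal */

    int value = result;
    pthread_join(tid, NULL);
    sem_destroy(&ready);
    return value;
}
```

`sem_trywait` follows the same pattern but returns -1 immediately instead of blocking when the value is 0.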
User level (POSIX) synchronization primitives
• High-level primitives
  • mutexes
    • type
      • pthread_mutex_t
    • operations:
      • pthread_mutex_init: initialization
      • pthread_mutex_destroy: deallocation
      • pthread_mutex_lock: locking
      • pthread_mutex_unlock: unlocking
      • pthread_mutex_trylock: tentative locking
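A typical use of these operations is to protect a small piece of shared state. The sketch below is illustrative only: `account_t` and the `account_*` and `run_deposits` functions are hypothetical names, and error handling is reduced to a minimum.

```c
#include <assert.h>
#include <pthread.h>

#define N_THREADS  4
#define N_DEPOSITS 50000

typedef struct {
    pthread_mutex_t lock;
    long balance;               /* protected by 'lock' */
} account_t;

int account_init(account_t *a, long initial)
{
    a->balance = initial;
    return pthread_mutex_init(&a->lock, NULL);   /* default attributes */
}

void account_deposit(account_t *a, long amount)
{
    pthread_mutex_lock(&a->lock);
    a->balance += amount;       /* critical section */
    pthread_mutex_unlock(&a->lock);
}

/* Tentative variant: returns 0 on success, non-zero if the lock is busy. */
int account_try_deposit(account_t *a, long amount)
{
    int rc = pthread_mutex_trylock(&a->lock);
    if (rc != 0)
        return rc;
    a->balance += amount;
    pthread_mutex_unlock(&a->lock);
    return 0;
}

static void *depositor(void *arg)
{
    for (int i = 0; i < N_DEPOSITS; i++)
        account_deposit(arg, 1);
    return NULL;
}

/* Runs N_THREADS depositors on one account and returns the final balance. */
long run_deposits(void)
{
    account_t acc;
    pthread_t tid[N_THREADS];
    int rc = account_init(&acc, 0);
    assert(rc == 0);
    for (int i = 0; i < N_THREADS; i++) {
        rc = pthread_create(&tid[i], NULL, depositor, &acc);
        assert(rc == 0);
    }
    (void)rc;
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(tid[i], NULL);
    long final_balance = acc.balance;
    pthread_mutex_destroy(&acc.lock);
    return final_balance;
}
```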
User level (POSIX) synchronization primitives
• High-level primitives
  • reader-writer locks
    • type
      • pthread_rwlock_t
    • operations:
      • pthread_rwlock_init: initialization
      • pthread_rwlock_destroy: deallocation
      • pthread_rwlock_rdlock, pthread_rwlock_wrlock: locking
      • pthread_rwlock_unlock: unlocking
      • pthread_rwlock_tryrdlock, pthread_rwlock_trywrlock: tentative locking
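A reader-writer lock lets several readers proceed together while a writer gets exclusive access. A minimal sketch; `shared_int_t` and the `shared_*` functions are hypothetical names, and the feature-test macro at the top is an assumption that only matters when compiling in strict ISO C mode.

```c
#define _POSIX_C_SOURCE 200809L   /* expose pthread_rwlock_* in strict modes */
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_rwlock_t lock;
    int value;                    /* protected by 'lock' */
} shared_int_t;

int shared_init(shared_int_t *s, int initial)
{
    s->value = initial;
    return pthread_rwlock_init(&s->lock, NULL);   /* default attributes */
}

/* Readers take the lock in shared mode: they do not block each other. */
int shared_read(shared_int_t *s)
{
    int rc = pthread_rwlock_rdlock(&s->lock);
    assert(rc == 0);
    (void)rc;
    int v = s->value;
    pthread_rwlock_unlock(&s->lock);
    return v;
}

/* A writer takes the lock in exclusive mode. */
void shared_write(shared_int_t *s, int v)
{
    int rc = pthread_rwlock_wrlock(&s->lock);
    assert(rc == 0);
    (void)rc;
    s->value = v;
    pthread_rwlock_unlock(&s->lock);
}

void shared_destroy(shared_int_t *s)
{
    pthread_rwlock_destroy(&s->lock);
}
```

While a read lock is held, `pthread_rwlock_tryrdlock` can still succeed, whereas `pthread_rwlock_trywrlock` fails until every reader has unlocked.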
User level (POSIX) synchronization primitives
• High-level primitives
  • condition variables
    • type
      • pthread_cond_t
    • operations:
      • pthread_cond_init: initialization
      • pthread_cond_destroy: deallocation
      • pthread_cond_wait, pthread_cond_timedwait: waiting
      • pthread_cond_signal, pthread_cond_broadcast: notifying
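A condition variable is always used together with a mutex and a predicate that is re-checked in a loop, because a waiter may wake up without the condition being true. A minimal sketch of the pattern; `wait_for_payload`, `notifier`, `mtx`, `changed`, `ready` and `payload` are illustrative names.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t mtx     = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  changed = PTHREAD_COND_INITIALIZER;
static int ready = 0;     /* the predicate, protected by 'mtx' */
static int payload;       /* data published together with 'ready' */

static void *notifier(void *arg)
{
    pthread_mutex_lock(&mtx);
    payload = *(int *)arg;
    ready = 1;                        /* change the predicate ... */
    pthread_cond_signal(&changed);    /* ... and notify one waiter */
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Starts a notifier thread and waits until it has published 'input'. */
int wait_for_payload(int input)
{
    pthread_t tid;
    ready = 0;                        /* no other thread is running yet */
    int rc = pthread_create(&tid, NULL, notifier, &input);
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&mtx);
    while (!ready)                    /* re-check after every wake-up */
        pthread_cond_wait(&changed, &mtx);   /* releases mtx while waiting */
    int value = payload;
    pthread_mutex_unlock(&mtx);

    pthread_join(tid, NULL);
    return value;
}
```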
User level (POSIX) synchronization primitives
• High-level primitives
  • barriers
    • type
      • pthread_barrier_t
    • operations:
      • pthread_barrier_init: initialization
      • pthread_barrier_destroy: deallocation
      • pthread_barrier_wait: waiting
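A barrier separates the phases of a parallel computation: no thread starts phase 2 before every thread has completed phase 1. A minimal sketch, assuming a platform that provides POSIX barriers (for example Linux); `run_phases`, `worker`, `partial` and `totals` are illustrative names, and the feature-test macro only matters in strict ISO C mode.

```c
#define _POSIX_C_SOURCE 200809L   /* expose pthread_barrier_* in strict modes */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define N_THREADS 4

static pthread_barrier_t phase_barrier;
static int partial[N_THREADS];    /* written in phase 1 */
static int totals[N_THREADS];     /* written in phase 2 */

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;

    partial[id] = id + 1;                    /* phase 1: own slot only */
    pthread_barrier_wait(&phase_barrier);    /* wait for all the threads */

    int sum = 0;                             /* phase 2: read every slot */
    for (int k = 0; k < N_THREADS; k++)
        sum += partial[k];
    totals[id] = sum;
    return NULL;
}

/* Runs the two phases and returns the total computed by thread 0. */
int run_phases(void)
{
    pthread_t tid[N_THREADS];
    int rc = pthread_barrier_init(&phase_barrier, NULL, N_THREADS);
    assert(rc == 0);
    for (int i = 0; i < N_THREADS; i++) {
        rc = pthread_create(&tid[i], NULL, worker, (void *)(intptr_t)i);
        assert(rc == 0);
    }
    (void)rc;
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&phase_barrier);
    return totals[0];
}
```

Without the barrier, a fast thread could sum `partial` before the other threads have written their slots.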
User level (POSIX) synchronization primitives

Low-level (no sleeping)
  Lock                 pthread_spinlock_t
  RW-Lock              not available
High-level (may sleep)
  Mutex                pthread_mutex_t
  Semaphore            sem_t
  RW-mutex             pthread_rwlock_t
  Condition Variable   pthread_cond_t
  Barrier              pthread_barrier_t