ECE729 : Advance Computer Architecture

Anshul Kumar, CSE IITD


Lecture 26: Synchronization, Memory Consistency

25th March, 2010


Synchronization Problem

• Processes run on different processors independently

• At some point they need to know the status of each other for
  – communication
  – mutual exclusion etc.

• Hardware primitives required for these operations


Consider an example

Bank transaction from account number A :

• b = read_bal (A)

• b = b – debit_amt

• if b >= bmin

update_bal (A, b)


Two concurrent transactions

Transaction 1 :
• b1 = read_bal (A)
• b1 = b1 – debit_amt1
• if b1 >= bmin
    update_bal (A, b1)

Transaction 2 :
• b2 = read_bal (A)
• b2 = b2 – debit_amt2
• if b2 >= bmin
    update_bal (A, b2)


Two concurrent transactions

Serialize reads and writes:

Transaction 1 :
• b1 = read_bal (A)
• b1 = b1 – debit_amt1
• if b1 >= bmin
    update_bal (A, b1)

Transaction 2 :
• b2 = read_bal (A)
• b2 = b2 – debit_amt2
• if b2 >= bmin
    update_bal (A, b2)


Lock for mutual exclusion

Transaction 1 :
acquire: x1 = read (lock)
         if x1 = 1 then goto acquire
         set (lock)
         do transaction1
         ……
release: clear (lock)
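Note that the read and the set in the acquire sequence above are separate steps, so two processors can both read the lock as free before either one sets it. The following is a minimal sketch of the same acquire/transaction/release pattern using `std::mutex`, which supplies the atomicity the plain read-then-set lacks. The `Account` type and the functions `debit` and `run_two_debits` are hypothetical names introduced for illustration; the slides only name `read_bal` and `update_bal`.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical account type (not from the slides).
struct Account {
    long balance;
    std::mutex lock;   // plays the role of the slide's lock variable
    explicit Account(long b) : balance(b) {}
};

// Debit amt from acct if the balance stays at or above bmin.
// Returns true if the update was applied.
bool debit(Account& acct, long amt, long bmin) {
    assert(amt >= 0);
    std::lock_guard<std::mutex> guard(acct.lock);  // acquire
    long b = acct.balance;                         // b = read_bal(A)
    b = b - amt;                                   // b = b - debit_amt
    if (b >= bmin) {
        acct.balance = b;                          // update_bal(A, b)
        return true;
    }
    return false;
}                                                  // release when guard goes out of scope

// Run two concurrent debits on one account and return the final balance.
long run_two_debits(long start, long amt1, long amt2, long bmin) {
    Account acct(start);
    std::thread t1([&] { debit(acct, amt1, bmin); });
    std::thread t2([&] { debit(acct, amt2, bmin); });
    t1.join();
    t2.join();
    return acct.balance;
}
```

With the lock held across the whole read, check, and update, the two transactions are serialized in one order or the other, so the balance can never drop below `bmin`.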











Synchronization Primitives

Hardware primitive required

• Should have atomic read+write operation

• Examples:
  – test&set
  – exchange
  – fetch&increment
  – load linked, store conditional

Spin Lock with Exchange Instr.

Lock: 0 indicates free and 1 indicates locked

Code to lock X :
        r2 ← 1
lockit: r2 ↔ X            ;atomic exchange
        if(r2≠0)lockit    ;already locked

Locks are cached for efficiency; coherence is used

Better code to lock X :
lockit: r2 ← X            ;read lock
        if(r2≠0)lockit    ;not available
        r2 ← 1
        r2 ↔ X            ;atomic exchange
        if(r2≠0)lockit    ;already locked
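The "better code" above spins on ordinary reads and attempts the atomic exchange only when the lock looks free. A minimal sketch of that idea with `std::atomic<int>::exchange` is shown here; the class name `SpinLock` and the driver `count_with_spinlock` are assumptions made for this example, and the memory orderings are one reasonable choice rather than the only one.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Lock word x: 0 means free, 1 means locked (as on the slide).
class SpinLock {
    std::atomic<int> x{0};
public:
    void lock() {
        for (;;) {
            // Spin on plain reads first; these are typically served
            // from the local cached copy of the lock.
            while (x.load(std::memory_order_relaxed) != 0) { }
            // Lock looks free: try the atomic exchange (r2 <-> X).
            if (x.exchange(1, std::memory_order_acquire) == 0) return;
            // Another thread won the race; go back to read-only spinning.
        }
    }
    void unlock() { x.store(0, std::memory_order_release); }
};

// Hypothetical driver: each thread increments a shared counter
// iters times while holding the lock; returns the final count.
long count_with_spinlock(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    SpinLock lk;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lk.lock();
                ++counter;      // critical section
                lk.unlock();
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

If the lock provides mutual exclusion, `count_with_spinlock(4, 2000)` returns exactly 8000, because no increment is lost.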


LD Linked & ST conditional

Simpler to implement

• atomic exchange r2 ↔ X using LL and SC
try: r3 ← r2          ;move exchange value
     LL r1, X         ;load linked
     SC r3, X         ;store conditional
     if(r3=0)try      ;branch, store fails
     r2 ← r1          ;put loaded value in r2

• fetch&increment using LL and SC
try: LL r1, X         ;load linked
     r3 ← r1 + 1      ;increment
     SC r3, X         ;store conditional
     if(r3=0)try      ;branch, store fails
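C++ does not expose LL and SC directly, so the sketch below uses `compare_exchange_weak` as a stand-in for the retry loop. This is a substitution, not the slide's instruction sequence: compare-and-swap checks the value, whereas SC checks whether the location was written since the LL. The weak form may fail spuriously, which plays the same role as a failed SC. The function names `fetch_and_increment` and `count_with_fetch_inc` are assumptions for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Stores old + 1 into x and returns old, retrying until the
// conditional update succeeds (the slide's "if(r3=0)try" branch).
int fetch_and_increment(std::atomic<int>& x) {
    int old = x.load(std::memory_order_relaxed);            // analogous to LL r1, X
    while (!x.compare_exchange_weak(old, old + 1,           // analogous to SC r3, X
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) {
        // On failure, old now holds the current value of x; retry.
    }
    return old;
}

// Hypothetical driver: threads * iters increments, returns the final value.
int count_with_fetch_inc(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    std::atomic<int> x{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) fetch_and_increment(x);
        });
    for (auto& th : pool) th.join();
    return x.load();
}
```

In practice `std::atomic<int>::fetch_add` does the same job in one call; the explicit loop is kept here only to mirror the load, compute, conditional-store, retry structure.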


Spin Lock with LL & SC

lockit: LL r2, X          ;load linked
        if(r2≠0)lockit    ;not available
        r2 ← 1
        SC r2, X          ;store cond
        if(r2=0)lockit    ;branch, store fails

• performance in presence of contention?
• spin lock with exponential back-off reduces contention
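Exponential back-off can be sketched as follows: after each failed attempt the thread waits, and the wait doubles up to a cap, so contending processors retry less and less often. The starting delay, the cap, and the names `BackoffLock` and `count_with_backoff` are assumptions chosen for illustration; the slides do not give concrete values.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

class BackoffLock {
    std::atomic<int> x{0};   // 0 = free, 1 = locked
public:
    void lock() {
        int delay_us = 1;                 // assumed initial delay
        const int max_delay_us = 1024;    // assumed upper bound
        while (x.exchange(1, std::memory_order_acquire) != 0) {
            // Failed to get the lock: wait, then double the delay.
            std::this_thread::sleep_for(std::chrono::microseconds(delay_us));
            delay_us = std::min(delay_us * 2, max_delay_us);
        }
    }
    void unlock() { x.store(0, std::memory_order_release); }
};

// Hypothetical driver: shared counter incremented under the lock.
long count_with_backoff(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    BackoffLock lk;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lk.lock();
                ++counter;
                lk.unlock();
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

The trade-off is latency: a waiting thread may sleep past the moment the lock becomes free, which is why the delay is capped.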


Barrier Synchronization

lock (X)
if(count=0) release ← 0
count++
unlock(X)
if(count=total) {count ← 0; release ← 1}
else spin(release=1)


Improved Barrier Synch.

local_sense ← !local_sense
lock (X)
count++
unlock(X)
if(count = total) {count ← 0; release ← local_sense}
else {spin(release = local_sense)}

Tree-based barrier reduces contention
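The sense-reversing barrier above can be sketched with atomics as follows. One deliberate difference from the slide: an atomic `fetch_add` replaces the `lock(X); count++; unlock(X)` sequence. The class name `SenseBarrier` and the checker `barrier_phases_ok` are assumptions for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SenseBarrier {
    const int total;
    std::atomic<int> count{0};
    std::atomic<bool> release{false};
public:
    explicit SenseBarrier(int n) : total(n) { assert(n > 0); }

    // local_sense is private to each thread and must start as false.
    void wait(bool& local_sense) {
        local_sense = !local_sense;
        if (count.fetch_add(1, std::memory_order_acq_rel) + 1 == total) {
            // Last arrival: reset the count, then release everyone.
            count.store(0, std::memory_order_relaxed);
            release.store(local_sense, std::memory_order_release);
        } else {
            while (release.load(std::memory_order_acquire) != local_sense)
                std::this_thread::yield();   // spin(release = local_sense)
        }
    }
};

// Hypothetical check: no thread leaves phase p before all threads
// have arrived at phase p. Returns true if that always held.
bool barrier_phases_ok(int threads, int phases) {
    SenseBarrier bar(threads);
    std::atomic<int> arrived{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            bool local_sense = false;
            for (int p = 0; p < phases; ++p) {
                arrived.fetch_add(1);
                bar.wait(local_sense);
                if (arrived.load() < threads * (p + 1)) ok = false;
            }
        });
    for (auto& th : pool) th.join();
    return ok.load();
}
```

Because each phase waits for `release` to equal a sense value that flips every time, a fast thread entering the next barrier cannot be confused by the release flag left over from the previous one.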


Memory Consistency Problem

• When must a processor see the value that has been written by another processor? Atomicity of operations – system wide?

• Can memory operations be re-ordered?

Various models :

http://rsim.cs.uiuc.edu/~sadve/Publications/models_tutorial.ps


Example

P1:                 P2:
A = 0               B = 0
...                 ...
A = 1               B = 1
L1: if(B=0)S1       L2: if(A=0)S2

Which statements among S1 and S2 are done?

Both S1, S2 may be done if writes are delayed
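The example can be tried out in C++ as a small litmus test. The sketch below uses sequentially consistent atomics (`std::memory_order_seq_cst`), under which the outcome "both S1 and S2 done" is forbidden by the language, so the function should always return 0. Replacing the orderings with `std::memory_order_relaxed` would permit that outcome. The function name `count_both_taken` is an assumption for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the two-processor example `rounds` times and returns the
// number of rounds in which both S1 and S2 were executed.
int count_both_taken(int rounds) {
    assert(rounds >= 0);
    int both = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> A{0}, B{0};          // A = 0, B = 0
        bool s1 = false, s2 = false;
        std::thread p1([&] {
            A.store(1, std::memory_order_seq_cst);                  // A = 1
            if (B.load(std::memory_order_seq_cst) == 0) s1 = true;  // L1: if(B=0)S1
        });
        std::thread p2([&] {
            B.store(1, std::memory_order_seq_cst);                  // B = 1
            if (A.load(std::memory_order_seq_cst) == 0) s2 = true;  // L2: if(A=0)S2
        });
        p1.join();
        p2.join();
        if (s1 && s2) ++both;
    }
    return both;
}
```

The reasoning is that in any single total order of the four operations, whichever load comes last must follow the other thread's store, so at least one of the two tests fails.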


Sequential Consistency

• result of any execution is same as if the operations of all processors were executed in some sequential order

• operations of each processor occur in the order specified by its program

- it requires all memory operations to be atomic

- too restrictive, high overheads


Relaxing W→R order

Loads are allowed to overtake stores

Write buffering is permitted

1. Total Store Ordering : Writes are atomic

2. Processor Consistency : Writes need not be atomic - Invalidations may gradually propagate


Relaxing W→R & W→W order

Partial Store Ordering

• Loads are allowed to overtake stores

• Writes can be re-ordered

• Memory barriers (fences) are used to order operations explicitly where needed

Further improves the performance


Examples

P1                  P2
A = 1;              while(flag=0);
flag = 1;           print A;

SC ensures that “1” is printed
TSO, PC also do so
PSO does not

P1                  P2
A = 1;              print B;
B = 1;              print A;

SC ensures that if B is printed as “1” then A is also printed as “1”
TSO, PC also do so
PSO does not


Examples - continued

P1                  P2                  P3
A = 1;              while(A=0);         while(B=0);
                    B = 1;              print A;

SC ensures that “1” is printed. TSO and PSO also do that but PC does not

P1                  P2
A = 1;              B = 1;
print B;            print A;

SC ensures that both can’t be printed as “0”. TSO, PC and PSO do not


Relaxing all R/W order

Weak Ordering or Weak Consistency

• Loads and Stores are not restricted to follow an order

• Explicit synchronization primitives are used

• Synchronization primitives follow a strict order

• Easy to achieve

• Low overhead


Release Consistency

• Further relaxation of weak ordering

• Synch primitives are divided into acquire and release operations

• R/W operations after an acquire cannot move before it, but those before it can be moved after

• R/W operations before a release cannot move after it, but those after it can be moved before
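C++ atomics offer acquire and release orderings in the same spirit, which makes it possible to sketch the flag example from the earlier Examples slide. Here the write to `A` cannot move after the release store to `flag`, and the read of `A` cannot move before the acquire load, so the reader is guaranteed to see 1. The function name `message_pass_once` is an assumption for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the A/flag example once and returns the value P2 would print.
int message_pass_once() {
    int A = 0;                        // ordinary, non-atomic data
    std::atomic<int> flag{0};
    int seen = -1;
    std::thread p1([&] {
        A = 1;                                        // A = 1
        flag.store(1, std::memory_order_release);     // flag = 1 (release)
    });
    std::thread p2([&] {
        while (flag.load(std::memory_order_acquire) == 0)   // while(flag=0) (acquire)
            std::this_thread::yield();
        seen = A;                                     // print A
    });
    p1.join();
    p2.join();
    assert(seen != -1);               // P2 always runs to completion
    return seen;
}
```

If both orderings were weakened to `std::memory_order_relaxed`, the accesses to `A` would no longer be ordered with respect to `flag`, and the program would contain a data race on `A`.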


WC and RC Comparison

[Figure: two panels, WC and RC. Each shows three blocks of R/W operations, numbered 1, 2 and 3. In the WC panel the blocks are separated by synch operations; in the RC panel they are separated by an acquire and a release.]
