ECE729 : Advance Computer Architecture
Anshul Kumar, CSE IITD
Lecture 26: Synchronization, Memory Consistency
25th March, 2010
Synchronization Problem
• Processes run on different processors independently
• At some point they need to know the status of each other for
  – communication
  – mutual exclusion etc
• Hardware primitives required for these operations
Consider an example
Bank transaction from account number A :
• b = read_bal (A)
• b = b – debit_amt
• if b >= bmin
update_bal (A, b)
Two concurrent transactions
Transaction 1 :
• b1 = read_bal (A)
• b1 = b1 – debit_amt1
• if b1 >= bmin
    update_bal (A, b1)
Transaction 2 :
• b2 = read_bal (A)
• b2 = b2 – debit_amt2
• if b2 >= bmin
    update_bal (A, b2)
Two concurrent transactions - continued
• The reads and writes of the two transactions must be serialized: one transaction's read_bal … update_bal sequence has to complete before the other's begins
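The lost update behind these two slides can be replayed deterministically. The following is a minimal sketch in Python, not part of the lecture: the `Account` class, the `BMIN` value and the amounts used are assumptions chosen for illustration. It runs the two transactions once with their steps interleaved and once serialized.

```python
class Account:
    """Toy account; read_bal/update_bal mirror the slide's operations."""

    def __init__(self, balance):
        self.balance = balance

    def read_bal(self):
        return self.balance

    def update_bal(self, b):
        self.balance = b


BMIN = 0  # assumed minimum balance


def interleaved(start, debit1, debit2):
    """Both transactions read before either writes, so one debit is lost."""
    acct = Account(start)
    b1 = acct.read_bal()          # Transaction 1 reads
    b2 = acct.read_bal()          # Transaction 2 reads the same balance
    b1 -= debit1
    b2 -= debit2
    if b1 >= BMIN:
        acct.update_bal(b1)       # Transaction 1 writes
    if b2 >= BMIN:
        acct.update_bal(b2)       # Transaction 2 overwrites that update
    return acct.balance


def serialized(start, debit1, debit2):
    """Transaction 1 runs to completion, then Transaction 2."""
    acct = Account(start)
    for debit in (debit1, debit2):
        b = acct.read_bal() - debit
        if b >= BMIN:
            acct.update_bal(b)
    return acct.balance


print(interleaved(100, 30, 50))   # 50: the debit of 30 is lost
print(serialized(100, 30, 50))    # 20
```

With the interleaved schedule the final balance reflects only one of the two debits, which is the outcome serialization is meant to rule out.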
Lock for mutual exclusion
Transaction 1 :
acquire: x1 = read (lock)
         if x1 = 1 then goto acquire
         set (lock)
         do transaction1
         ……
release: clear (lock)
Transaction 2 :
acquire: x2 = read (lock)
         if x2 = 1 then goto acquire
         set (lock)
         do transaction2
         ……
release: clear (lock)
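Because read (lock) and set (lock) are separate steps here, both transactions can see the lock as free. The sketch below is an illustrative Python model (the function names and the single-attempt simplification are assumptions, not from the slides). It enumerates every interleaving of the two acquire sequences and reports those in which both threads end up holding the lock.

```python
from itertools import combinations


def run(schedule):
    """Run one interleaving of the non-atomic acquire; return who got the lock.

    schedule is a tuple of thread ids (0 or 1). Each appearance executes that
    thread's next step: step 0 is x = read(lock), step 1 is set(lock) if x was 0.
    Each thread makes a single attempt; a real thread would go back to acquire.
    """
    lock = 0
    seen = [None, None]       # value each thread read (x1, x2 on the slide)
    step = [0, 0]
    holders = []
    for t in schedule:
        if step[t] == 0:
            seen[t] = lock                # x = read(lock)
        elif seen[t] == 0:
            lock = 1                      # set(lock)
            holders.append(t)             # thread believes it owns the lock
        step[t] += 1
    return holders


def all_schedules():
    """All orderings of two steps of thread 0 and two steps of thread 1."""
    for pos in combinations(range(4), 2):
        yield tuple(0 if i in pos else 1 for i in range(4))


bad = [s for s in all_schedules() if len(run(s)) == 2]
print(len(bad), "of 6 schedules break mutual exclusion:", bad)
```

Every schedule in which both reads happen before either set fails, which is why the read and the write have to be a single atomic operation.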
Synchronization Primitives
Hardware primitive required
• Should have atomic read+write operation
• Examples:
  – test&set
  – exchange
  – fetch&increment
  – load linked, store conditional
Spin Lock with Exchange Instr.
Lock: 0 indicates free and 1 indicates locked
Code to lock X :
        r2 ← 1
lockit: r2 ↔ X          ;atomic exchange
        if(r2≠0)lockit  ;already locked
Locks are cached for efficiency; coherence is used
Better code to lock X :
lockit: r2 ← X          ;read lock
        if(r2≠0)lockit  ;not available
        r2 ← 1
        r2 ↔ X          ;atomic exchange
        if(r2≠0)lockit  ;already locked
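The "better code" can be imitated in Python as a rough sketch. Python exposes no atomic exchange instruction, so the hypothetical `AtomicCell` class below uses an internal `threading.Lock` to stand in for the atomicity the hardware would provide. The class, the thread count and the iteration count are all assumptions made for illustration.

```python
import threading


class AtomicCell:
    """Models one memory word that supports an atomic exchange."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()   # stands in for hardware atomicity

    def load(self):
        return self._value               # plain read, as in r2 <- X

    def store(self, value):
        with self._guard:
            self._value = value

    def exchange(self, new):
        with self._guard:                # read and write as one indivisible step
            old, self._value = self._value, new
            return old


def lock(x):
    """Spin on ordinary reads, then attempt the exchange."""
    while True:
        while x.load() != 0:             # not available: keep reading
            pass
        if x.exchange(1) == 0:           # old value 0 means we took the lock
            return


def unlock(x):
    x.store(0)


def worker(x, counter, n):
    for _ in range(n):
        lock(x)
        counter[0] += 1                  # critical section
        unlock(x)


X = AtomicCell(0)
counter = [0]
threads = [threading.Thread(target=worker, args=(X, counter, 200))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter[0])                        # 800
```

Spinning on `load()` first keeps waiting threads reading rather than writing the lock word; on a cache-coherent machine that is what avoids generating invalidation traffic on every retry.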
LD Linked & ST conditional
Simpler to implement
• atomic exchange r2 ↔ X using LL and SC
try:    r3 ← r2         ;move exchange value
        LL r1, X        ;load linked
        SC r3, X        ;store conditional
        if(r3=0)try     ;branch, store fails
        r2 ← r1         ;put loaded value in r2
• fetch&increment using LL and SC
try:    LL r1, X        ;load linked
        r3 ← r1 + 1     ;increment
        SC r3, X        ;store conditional
        if(r3=0)try     ;branch, store fails
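The retry behaviour of LL and SC can be traced with a small single-threaded simulation. This is a simplified sketch, not a description of any particular processor: the `Word` class, the processor ids and the `interfere` hook are hypothetical, and the only thing that breaks a link here is a successful write to the word.

```python
class Word:
    """Models one memory word with load-linked / store-conditional."""

    def __init__(self, value=0):
        self.value = value
        self.reserved = set()            # ids of processors holding a link

    def ll(self, pid):
        self.reserved.add(pid)           # LL sets this processor's link
        return self.value

    def sc(self, pid, value):
        if pid not in self.reserved:
            return 0                     # link was broken: store fails
        self.value = value
        self.reserved.clear()            # a write breaks every link
        return 1


def exchange(x, pid, r2):
    """Atomic exchange built from LL and SC; returns the old value."""
    while True:
        r1 = x.ll(pid)                   # LL r1, X
        if x.sc(pid, r2):                # SC r3, X (r3 holds r2)
            return r1


def fetch_and_increment(x, pid, interfere=None):
    """fetch&increment built from LL and SC; returns (old value, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        r1 = x.ll(pid)                   # LL r1, X
        if interfere and attempts == 1:
            interfere(x)                 # another processor writes X in between
        if x.sc(pid, r1 + 1):            # SC r3, X (r3 = r1 + 1)
            return r1, attempts


def other_processor_increment(x):
    """A second processor (id 1) completes its own LL/SC pair."""
    v = x.ll(1)
    x.sc(1, v + 1)


X = Word(5)
print(fetch_and_increment(X, pid=0))                                       # (5, 1)
print(fetch_and_increment(X, pid=0, interfere=other_processor_increment))  # (7, 2)
print(X.value)                                                             # 8
```

In the second call the intervening write makes the first SC fail, so the loop reloads the new value and succeeds on the second attempt; no increment is lost.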
Spin Lock with LL & SC
lockit: LL r2, X        ;load linked
        if(r2≠0)lockit  ;not available
        r2 ← 1
        SC r2, X        ;store cond
        if(r2=0)lockit  ;branch, store fails
• performance in presence of contention?
• spin lock with exponential back-off reduces contention
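Exponential back-off can be sketched as a retry loop whose waiting time doubles after each failed attempt. The code below is illustrative only: `backoff_delays`, `lock_with_backoff`, the base delay and the cap are assumptions, and a non-blocking `threading.Lock.acquire` stands in for the LL/SC or exchange attempt.

```python
import threading
import time


def backoff_delays(base=1e-6, cap=1e-3):
    """Yield delays base, 2*base, 4*base, ... capped at cap (seconds)."""
    delay = base
    while True:
        yield delay
        delay = min(2 * delay, cap)


def lock_with_backoff(try_acquire, sleep=time.sleep):
    """Retry try_acquire(), waiting longer after every failed attempt.

    try_acquire stands in for the lock attempt on the slide.
    Returns the number of failed attempts before success.
    """
    failures = 0
    for delay in backoff_delays():
        if try_acquire():
            return failures
        failures += 1
        sleep(delay)                     # back off before touching the lock again


mutex = threading.Lock()
mutex.acquire()                                      # lock is currently held
threading.Timer(0.01, mutex.release).start()         # released about 10 ms later
failed = lock_with_backoff(lambda: mutex.acquire(blocking=False))
print("acquired after", failed, "failed attempts")
```

Capping the delay is a design choice in this sketch: it bounds how long a waiter sleeps after the lock has already become free.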
Barrier Synchronization
lock (X)
if(count=0)release ← 0
count++
unlock(X)
if(count=total){count ← 0; release ← 1}
else spin(release=1)
Improved Barrier Synch.
local_sense ← !local_sense
lock (X)
count++
unlock(X)
if(count = total) {count ← 0; release ← local_sense}
else {spin(release = local_sense)}
tree based barrier reduces contention
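The sense-reversing barrier can be tried out with Python threads. This is a sketch under assumptions, not the lecture's code: the `SenseBarrier` class and the `run_phases` harness are hypothetical, and `time.sleep(0)` in the spin loop only yields to other threads. It deviates from the pseudocode in one respect, which is that the "am I last?" decision and the reset of count are made while X is still held.

```python
import threading
import time


class SenseBarrier:
    """Sense-reversing barrier for a fixed number of threads."""

    def __init__(self, total):
        self.total = total
        self.count = 0
        self.release = False
        self.x = threading.Lock()            # lock X
        self.local = threading.local()       # holds each thread's local_sense

    def wait(self):
        sense = not getattr(self.local, "sense", False)
        self.local.sense = sense             # local_sense <- !local_sense
        with self.x:
            self.count += 1
            last = self.count == self.total
            if last:
                self.count = 0               # reset for the next use
        if last:
            self.release = sense             # let everyone go
        else:
            while self.release != sense:     # spin(release = local_sense)
                time.sleep(0)


def run_phases(n_threads=4, phases=3):
    """Return True if no thread passed barrier p before all had arrived."""
    barrier = SenseBarrier(n_threads)
    arrived = [0] * phases
    guard = threading.Lock()
    checks = []

    def worker():
        for p in range(phases):
            with guard:
                arrived[p] += 1
            barrier.wait()
            checks.append(arrived[p] == n_threads)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(checks) == n_threads * phases and all(checks)


print(run_phases())
```

Because each thread flips its own sense on every use, a thread still spinning on the previous barrier cannot be confused by a fast thread that has already reached the next one.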
Memory Consistency Problem
• When must a processor see the value that has been written by another processor? Atomicity of operations – system wide?
• Can memory operations be re-ordered?
Various models :
http://rsim.cs.uiuc.edu/~sadve/Publications/models_tutorial.ps
Example
P1: A = 0               P2: B = 0
    ...                     ...
    A = 1                   B = 1
L1: if(B=0)S1           L2: if(A=0)S2
Which statements among S1 and S2 are done?
Both S1, S2 may be done if writes are delayed
Sequential Consistency
• result of any execution is same as if the operations of all processors were executed in some sequential order
• operations of each processor occur in the order specified by its program
- it requires all memory operations to be atomic
- too restrictive, high overheads
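For the two-processor example (P1: A = 1; if(B=0)S1 and P2: B = 1; if(A=0)S2), the definition can be checked by brute force. The sketch below is an illustrative Python model with assumed names. It enumerates every interleaving that keeps each processor's program order and records which of S1 and S2 would run.

```python
from itertools import combinations


def sc_outcomes():
    """Enumerate every sequentially consistent execution of the example.

    Each processor performs two operations in program order: its write,
    then the test guarding S1 or S2. Returns the set of
    (S1 executed, S2 executed) pairs over all interleavings.
    """
    outcomes = set()
    for p1_slots in combinations(range(4), 2):   # positions of P1's two ops
        mem = {"A": 0, "B": 0}
        pc = {1: 0, 2: 0}
        done = {1: False, 2: False}
        for slot in range(4):
            p = 1 if slot in p1_slots else 2
            mine, other = ("A", "B") if p == 1 else ("B", "A")
            if pc[p] == 0:
                mem[mine] = 1                    # the write A = 1 or B = 1
            else:
                done[p] = mem[other] == 0        # the test guarding S1 / S2
            pc[p] += 1
        outcomes.add((done[1], done[2]))
    return outcomes


print(sorted(sc_outcomes()))
# [(False, False), (False, True), (True, False)]
```

The pair (True, True) never appears: under sequential consistency at most one of S1 and S2 is executed.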
Relaxing W→R order
Loads are allowed to overtake stores
Write buffering is permitted
1. Total Store Ordering : Writes are atomic
2. Processor Consistency : Writes need not be atomic - Invalidations may gradually propagate
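The effect of write buffering on the earlier S1/S2 example can be shown by adding a buffer to the enumeration. The sketch below is a deliberately simplified Python model (one buffered write per processor, names assumed). A write first enters the processor's private buffer ("W") and reaches memory only at a later drain step ("D"), so the following load ("R") may overtake it.

```python
from itertools import combinations


def buffered_outcomes():
    """Outcomes of the S1/S2 example when each write sits in a store buffer.

    Per processor the write W comes first; the read R and the drain D may
    then happen in either order. Reads go to memory, because each processor
    reads only the variable the other one writes, so no forwarding from its
    own buffer is needed. Returns the set of (S1 executed, S2 executed) pairs.
    """
    outcomes = set()
    orders = (("W", "R", "D"), ("W", "D", "R"))
    for o1 in orders:
        for o2 in orders:
            for p1_slots in combinations(range(6), 3):
                mem = {"A": 0, "B": 0}
                ops = {1: iter(o1), 2: iter(o2)}
                done = {}
                for slot in range(6):
                    p = 1 if slot in p1_slots else 2
                    mine, other = ("A", "B") if p == 1 else ("B", "A")
                    op = next(ops[p])
                    if op == "D":
                        mem[mine] = 1                # write becomes visible
                    elif op == "R":
                        done[p] = mem[other] == 0    # test guarding S1 / S2
                    # "W" only places the value in the private buffer
                outcomes.add((done[1], done[2]))
    return outcomes


print(sorted(buffered_outcomes()))
# [(False, False), (False, True), (True, False), (True, True)]
```

The extra outcome (True, True) is the case in which both loads overtake the buffered stores, so both S1 and S2 are executed.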
Relaxing W→R & W→W order
Partial Store Ordering
• Loads are allowed to overtake stores
• Writes can be re-ordered
• Memory barriers (fences) are used to explicitly order operations
Further improves the performance
Examples
P1              P2
A = 1;          while(flag=0);
flag = 1;       print A;
SC ensures that “1” is printed
TSO, PC also do so
PSO does not
P1              P2
A = 1;          print B;
B = 1;          print A;
SC ensures that if B is printed as “1” then A is also printed as “1”
TSO, PC also do so
PSO does not
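The flag example (P1: A = 1; flag = 1 and P2: while(flag=0); print A) can be modelled by letting P1's two writes become visible either in program order or in any order. This is an illustrative Python sketch with assumed names; it assumes P2 performs its own reads in program order and only varies the order in which P1's writes reach memory.

```python
from itertools import permutations


def printed_values(allow_write_reorder):
    """Values P2 can print for A once it has seen flag = 1."""
    program_order = (("A", 1), ("flag", 1))
    if allow_write_reorder:
        orders = list(permutations(program_order))   # W→W order relaxed
    else:
        orders = [program_order]                     # writes stay in order
    results = set()
    for visible_order in orders:
        mem = {"A": 0, "flag": 0}
        for var, val in visible_order:
            mem[var] = val                   # this write becomes visible
            if mem["flag"] == 1:             # P2 may leave its spin loop now
                results.add(mem["A"])        # ... and print A
    return results


print(printed_values(False))   # {1}
print(printed_values(True))    # {0, 1}
```

When the writes stay in order, seeing flag = 1 implies A = 1; once they may be reordered, P2 can leave the loop and still print 0 unless a fence is placed between the two writes.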
Examples - continued
P1              P2              P3
A = 1;          while(A=0);     while(B=0);
                B = 1;          print A;
SC ensures that “1” is printed. TSO and PSO also do that but PC does not
P1              P2
A = 1;          B = 1;
print B;        print A;
SC ensures that both can’t be printed as “0”. TSO, PC and PSO do not
Relaxing all R/W order
Weak Ordering or Weak Consistency
• Loads and Stores are not restricted to follow an order
• Explicit synchronization primitives are used
• Synchronization primitives follow a strict order
• Easy to achieve
• Low overhead
Release Consistency
• Further relaxation of weak ordering
• Synch primitives are divided into acquire and release operations
• R/W operations after an acquire cannot move before it but those before it can be moved after
• R/W operations before a release cannot move after it but those after it can be moved before
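The reordering rules above can be written down as a small predicate. The sketch below is a hypothetical Python helper (`may_swap` and its string labels are assumptions). It models only ordering with respect to synchronization operations, ignoring data dependences and same-address ordering, and it treats every synchronization operation under weak consistency (WC) as a full fence.

```python
SYNC = {"acquire", "release"}


def may_swap(first, second, model):
    """May two operations adjacent in program order (first, second) be swapped?

    Operations are "rw", "acquire" or "release"; model is "WC" or "RC".
    """
    if model not in ("WC", "RC"):
        raise ValueError("model must be 'WC' or 'RC'")
    if first not in SYNC and second not in SYNC:
        return True                  # ordinary accesses are unordered
    if first in SYNC and second in SYNC:
        return False                 # synch operations keep a strict order
    if model == "WC":
        return False                 # nothing moves across a synch operation
    if first == "acquire":
        return False                 # R/W after an acquire cannot move before it
    if second == "release":
        return False                 # R/W before a release cannot move after it
    return True                      # moving into the acquire..release region


pairs = (("rw", "acquire"), ("acquire", "rw"), ("rw", "release"), ("release", "rw"))
for model in ("WC", "RC"):
    for first, second in pairs:
        print(model, first, second, may_swap(first, second, model))
```

Under RC the two permitted moves both carry an ordinary access into the region between an acquire and a release, never out of it.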
WC and RC Comparison
[Figure: under WC, three groups of R/W operations (1, 2, 3) are separated by two synch operations; under RC, the same three groups are separated by an acquire and a release.]