Post on 22-Feb-2016
Getting Rid of Store-Buffers in TSO Analysis
Mohamed Faouzi Atig Uppsala University, Sweden
Ahmed Bouajjani LIAFA, University of Paris 7, France
Gennaro Parlato University of Southampton, UK
Sequential consistency memory model (SC)
Write(var, val): sh_mem[var] := val (immediately visible to all threads)
Read(var): returns sh_mem[var]
SC =
• actions of different threads are interleaved in any order
• actions of the same thread maintain their execution order
WMM (weak memory models) = for performance reasons, modern multiprocessors reorder memory operations of the same thread
[Figure: threads T1, …, Tn connected directly to the shared memory]
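Total Store Ordering (TSO)
[Figure: each thread Ti (T1, …, Tn) reaches the shared memory through its own store-buffer Mi; example buffered writes shown: (x,4), (z,7), (y,3), (z,4), (y,4)]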
• Each thread has its store-buffer (FIFO)
• Write(var, val): the pair (var, val) is sent to the buffer
• Memory update = execution of a Write taken from some buffer
• Read(var) returns val if either
  - (var, val) is the last write to var still in the thread's store-buffer, or
  - the buffer does not contain any Write to var, and sh_mem[var] = val
• fence requires that the store-buffer is empty
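The rules above can be sketched as a small executable model. This is a minimal sketch, not a definitive implementation: the class name `TSOMemory` and its method names are illustrative, and modelling a fence by draining the buffer is an assumption (the slide only says a fence requires the buffer to be empty).

```python
from collections import deque


class TSOMemory:
    """Toy TSO model: one FIFO store-buffer per thread plus a shared memory."""

    def __init__(self, n_threads, init):
        self.sh_mem = dict(init)
        self.buffers = [deque() for _ in range(n_threads)]

    def write(self, tid, var, val):
        # The pair (var, val) is sent to the writing thread's own buffer.
        self.buffers[tid].append((var, val))

    def read(self, tid, var):
        # The last buffered write to var by this thread wins; else shared memory.
        for v, val in reversed(self.buffers[tid]):
            if v == var:
                return val
        return self.sh_mem[var]

    def update(self, tid):
        # Memory update: execute the oldest write of this thread's buffer.
        var, val = self.buffers[tid].popleft()
        self.sh_mem[var] = val

    def fence(self, tid):
        # Assumption: the fence is modelled by draining the buffer until empty.
        while self.buffers[tid]:
            self.update(tid)


mem = TSOMemory(2, {"x": 0, "y": 0})
mem.write(0, "x", 4)
print(mem.read(0, "x"), mem.read(1, "x"))  # 4 0
mem.fence(0)
print(mem.read(1, "x"))  # 4
```

Thread 0 sees its own buffered write immediately, while thread 1 sees it only after the memory update.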
Dekker's mutual exclusion protocol: correct under SC, wrong under TSO

Shared memory: x = 0, y = 0

Thread 1
  a: y := 1
  b: r1 := x
  c: if (r1 == 0) then
  d:   critical section

Thread 2
  1: x := 1
  2: r2 := y
  3: if (r2 == 0) then
  4:   critical section

Bad schedule for TSO: a b c d 1 2 3 4. The write y := 1 can remain in Thread 1's store-buffer while Thread 2 runs, so r2 also reads 0 and both threads are in the critical section!
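To check the claim about Dekker's protocol mechanically, the following sketch explores every schedule of the two-thread fragment, once with writes applied immediately (SC) and once with per-thread FIFO buffers (TSO). The state encoding and the function name `both_in_critical_section` are assumptions made for this illustration only.

```python
def both_in_critical_section(tso):
    """Explore all schedules of the Dekker fragment.

    Returns True if both threads can be in the critical section together.
    With tso=False every write reaches the shared memory at once (SC).
    """
    prog = [("y", "x"), ("x", "y")]  # (flag written, flag read) per thread
    init = ((0, 0), (None, None), ((), ()), (("x", 0), ("y", 0)))
    seen, stack = {init}, [init]
    while stack:
        pcs, regs, bufs, mem = stack.pop()
        memd = dict(mem)
        if all(pcs[t] == 2 and regs[t] == 0 for t in (0, 1)):
            return True
        succs = []
        for t in (0, 1):
            wvar, rvar = prog[t]
            if bufs[t]:  # memory update: flush the oldest buffered write
                (var, val), rest = bufs[t][0], bufs[t][1:]
                m2 = dict(memd)
                m2[var] = val
                nb = list(bufs)
                nb[t] = rest
                succs.append((pcs, regs, tuple(nb), tuple(sorted(m2.items()))))
            if pcs[t] == 0:  # write own flag := 1
                npc = list(pcs)
                npc[t] = 1
                if tso:
                    nb = list(bufs)
                    nb[t] = bufs[t] + ((wvar, 1),)
                    succs.append((tuple(npc), regs, tuple(nb), mem))
                else:
                    m2 = dict(memd)
                    m2[wvar] = 1
                    succs.append((tuple(npc), regs, bufs, tuple(sorted(m2.items()))))
            elif pcs[t] == 1:  # read other flag: own buffer first, then memory
                val = next((v for n, v in reversed(bufs[t]) if n == rvar), memd[rvar])
                npc = list(pcs)
                npc[t] = 2
                nr = list(regs)
                nr[t] = val
                succs.append((tuple(npc), tuple(nr), bufs, mem))
        for s in succs:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return False


print(both_in_critical_section(tso=False))  # False
print(both_in_critical_section(tso=True))   # True
```

Under SC each write precedes the same thread's read in memory order, so at least one thread reads 1; under TSO both writes may still be buffered when both reads happen.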
Verification for TSO?
• For finite-state programs, reachability is non-primitive recursive [Atig, Bouajjani, Burckhardt, Musuvathi, POPL’10]
• What shall we do?
  • Symbolic representation of the store-buffers? [Linden, Wolper, SPIN’10]: regular model checking
  • Our approach: reduce the analysis from TSO to SC
    • can be done only with approximations …
What is this talk about?
If we restrict to executions in which each thread runs at most k times without interruption (for a fixed k),
we can translate any concurrent program P_TSO (recursion, thread creation, heap, …) into another program P_SC such that:
• P_SC (under SC) simulates all possible executions of P_TSO (under TSO) in which each thread runs at most k times
• P_SC has no buffer at all! The store-buffers are simulated using 2k copies of the shared variables as locals
• the size of P_SC is linear in the size of P_TSO
• Advantage: use off-the-shelf SC tools for the analysis of TSO programs
Code-to-code translation from TSO to SC
k-round (for each thread) reachability
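Run = (T_i1 + M_i1)^+ (T_i2 + M_i2)^+ ...
where the first block is a round of P_i1, the second a round of P_i2, and so on (T_i: a step of thread i; M_i: a memory update from its store-buffer)
A k-round run: for all i, the number of rounds of P_i is ≤ k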
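The k-round condition can be checked on a concrete schedule. In this sketch a schedule is assumed to be a list of process ids, one per action (thread step or memory update), and a round is taken to be a maximal block of consecutive actions of the same process; the function names are illustrative.

```python
from collections import Counter
from itertools import groupby


def rounds_per_process(schedule):
    """Count, per process id, the maximal blocks of consecutive actions."""
    return Counter(pid for pid, _ in groupby(schedule))


def is_k_round(schedule, k):
    """True if every process is scheduled for at most k rounds."""
    return all(n <= k for n in rounds_per_process(schedule).values())


sched = [1, 1, 2, 2, 2, 1, 3, 1]  # process 1 runs in three separate rounds
print(dict(rounds_per_process(sched)))
print(is_k_round(sched, 2), is_k_round(sched, 3))  # False True
```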
[Figure: processes P1, …, Pi, …, each made of a thread Ti and its store-buffer Mi, attached to the shared memory]
Compositional reasoning
[(T_i + M_i)^*]^k
[Figure: rounds 0, 1, 2 of one thread, annotated with the pairs (Mask0, Buff0), (Mask1, Buff1), (Mask2, Buff2)]
Getting rid of store-buffers
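One pair (Mask_i, Buff_i) per round: (Mask0, Buff0), (Mask1, Buff1), (Mask2, Buff2)
• Buff_i is a copy of the shared vars (as locals)
• Mask_i is a copy of the shared vars as Booleans (as locals)
[Figure: example over shared vars x, y, z, where Buff_i holds the values -, 6, - and Mask_i holds the corresponding Boolean flags]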
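Invariant:
[Figure: a store-buffer containing the writes (x0) (y1) (z4) (y7) (x0) (x4) (x7) (x3) (x7) (y5), partitioned according to the round (0, 1 or 2) in which each write updates the shared memory, together with the resulting Mask_i and Buff_i]

         x   y   z
Buff0    3   5   -
Buff1    0   -   -
Buff2    0   1   4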
At every point of the simulation:
• Mask_i[var] = 1 iff there is a store to var in the store-buffer that updates the shared memory at round i
• Buff_i[var] then contains the last value sent for var among those stores
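The invariant can be illustrated by deriving the Mask and Buff arrays from a store-buffer whose entries are tagged with the round in which they reach the shared memory. This is a sketch under assumptions: the function name is hypothetical, and the round assignment in the example is invented for illustration, chosen only so that the result matches the Buff values shown on the slide.

```python
def masks_and_buffs(store_buffer, shared_vars, n_rounds):
    """store_buffer: list of (var, val, round), oldest write first.

    round is the round at which that write updates the shared memory.
    """
    mask = [{v: False for v in shared_vars} for _ in range(n_rounds)]
    buff = [{v: None for v in shared_vars} for _ in range(n_rounds)]
    for var, val, rnd in store_buffer:
        mask[rnd][var] = True
        buff[rnd][var] = val  # a later write to var overwrites an earlier one
    return mask, buff


# Hypothetical tagging of buffered writes (oldest first).
writes = [("x", 7, 0), ("y", 5, 0), ("x", 3, 0),
          ("x", 0, 1),
          ("y", 7, 2), ("x", 0, 2), ("y", 1, 2), ("z", 4, 2)]
mask, buff = masks_and_buffs(writes, ["x", "y", "z"], 3)
for i in range(3):
    print(i, mask[i], buff[i])
```

Only the last value per variable and per round is kept, which is why the FIFO buffer itself is no longer needed.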
Simulation
Before simulation:
• Masks set to False
• r_SC := 0; r_TSO := 0;
Simulation:
• All statements not involving shared vars are executed unchanged
Write(var, val)
• Mask_{r_TSO}[var] := T;
• Buff_{r_TSO}[var] := val;
Read(var)
Let i be the greatest index s.t. i >= r_SC and Mask_i[var] = 1;
if such an i exists, return Buff_i[var], else return var;
[Figure: Buff_i for round 0, round 1, round 2]
End of round (update shared vars):
for all var, if Mask_{r_SC}[var] == 1 then var := Buff_{r_SC}[var];
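The Write, Read and end-of-round rules above can be sketched as a small class. This is an illustration under assumptions: `RoundSim` and its method names are hypothetical, shared variables are kept in a dict, and advancing r_SC (and raising r_TSO to at least r_SC) is folded into `end_round` for brevity rather than done at the start of the next round.

```python
class RoundSim:
    """Per-thread simulation state: k+1 (Mask, Buff) pairs and no FIFO buffer."""

    def __init__(self, shared, k):
        self.shared = shared  # dict of shared variables, common to all threads
        self.mask = [dict.fromkeys(shared, False) for _ in range(k + 1)]
        self.buff = [dict.fromkeys(shared) for _ in range(k + 1)]
        self.r_sc = 0   # round currently being simulated
        self.r_tso = 0  # round at which new writes will reach the memory

    def write(self, var, val):
        self.mask[self.r_tso][var] = True
        self.buff[self.r_tso][var] = val

    def read(self, var):
        # Greatest i >= r_sc with Mask_i[var] set; otherwise the shared value.
        for i in range(len(self.mask) - 1, self.r_sc - 1, -1):
            if self.mask[i][var]:
                return self.buff[i][var]
        return self.shared[var]

    def end_round(self):
        # Update the shared vars with the writes scheduled for round r_sc.
        for var in self.shared:
            if self.mask[self.r_sc][var]:
                self.shared[var] = self.buff[self.r_sc][var]
        # Simplification (assumption): move to the next round right away.
        self.r_sc += 1
        self.r_tso = max(self.r_tso, self.r_sc)


shared = {"x": 0, "y": 0}
t = RoundSim(shared, k=2)
t.write("x", 1)   # will reach memory at round 0
t.r_tso = 1       # guess: later writes reach memory one round later
t.write("x", 2)
print(t.read("x"), shared["x"])  # 2 0
t.end_round()
print(shared["x"])  # 1
t.end_round()
print(shared["x"])  # 2
```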
Skeleton of the translation

Shared sh_vars;
Thread_i()
Begin
  locals l_vars, r_TSO, r_SC, sim, Mask0, Buff0, …, Maskk, Buffk;
  Init();  // initialize Masks to False, r_SC=0, r_TSO=0, sim=0
  stmt_1;
  stmt_2;
  …
  stmt_n;
end

Each stmt_j is replaced by: before(); stmt_j; after();

before() {
  // start round
  if (!sim) {
    lock; sim=1; r_SC++;
    if (r_TSO < r_SC) r_TSO = r_SC;
  }
  while (*) r_TSO++;
}

after() {
  if (*) {
    // end round
    Update_shared(r_SC, Mask, Buff);
    sim=0; unlock;
  }
}
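The statement-level instrumentation in the skeleton can be sketched as a purely textual rewrite. This is only an illustration: `instrument` is a hypothetical helper that treats statements as opaque strings, and it does not rewrite reads and writes of shared variables.

```python
def instrument(stmts):
    """Wrap each statement of a thread body as: before(); stmt after();"""
    out = ["Init();"]
    for s in stmts:
        out.append(f"before(); {s} after();")
    return out


body = ["y := 1;", "r1 := x;", "if (r1 == 0) then critical_section();"]
for line in instrument(body):
    print(line)
```

The output has one line per input statement plus the initialization, which reflects why the translated program stays linear in the size of the original.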
Characteristics of the translation
• For fixed k, the size of P_SC is linear in the size of P_TSO
• 2k copies of the shared variables as locals (no store-buffer)
• P_SC and P_TSO are in the same class
  • no restriction on the programs is imposed
• The reachable shared states are the same in P_SC and P_TSO:
  a state S is reachable in P_TSO with at most k rounds per thread iff S is reachable in P_SC
Bounding Store Ages
Observation: when r_SC = 1, (Mask0, Buff0) are not used any longer.
Reuse the Mask and Buff variables:
Translation: (Mask_j, Buff_j) are used circularly (modulo k+1).
k store-ages:
• Unbounded rounds!
• Constraint: each write pair remains in the store-buffer for at most k rounds
[Figure: the pairs (Mask0, Buff0), (Mask1, Buff1), (Mask2, Buff2), (Mask0, Buff0), … reused circularly over successive rounds]
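The circular reuse can be sketched as follows. This is a minimal sketch under assumptions: `CircularSlots` and its methods are hypothetical names, rounds are plain integers, and a slot is cleared when its round is flushed so that it can serve round r + k + 1.

```python
class CircularSlots:
    """k+1 (Mask, Buff) pairs reused modulo k+1 over unboundedly many rounds."""

    def __init__(self, shared_vars, k):
        self.n = k + 1
        self.mask = [dict.fromkeys(shared_vars, False) for _ in range(self.n)]
        self.buff = [dict.fromkeys(shared_vars) for _ in range(self.n)]

    def slot(self, rnd):
        return rnd % self.n

    def write(self, r_sc, r_tso, var, val):
        # Store-age constraint: a write waits at most k rounds in the buffer.
        if not (r_sc <= r_tso <= r_sc + self.n - 1):
            raise ValueError("write would stay buffered for more than k rounds")
        j = self.slot(r_tso)
        self.mask[j][var] = True
        self.buff[j][var] = val

    def flush(self, r_sc, shared):
        # End of round r_sc: update the shared vars and free the slot.
        j = self.slot(r_sc)
        for var in list(self.mask[j]):
            if self.mask[j][var]:
                shared[var] = self.buff[j][var]
                self.mask[j][var] = False


shared = {"x": 0}
s = CircularSlots(["x"], k=2)
s.write(0, 0, "x", 1)
s.flush(0, shared)
s.write(1, 3, "x", 9)           # round 3 reuses the slot of round 0
print(s.slot(3), shared["x"])   # 0 1
s.flush(3, shared)
print(shared["x"])              # 9
```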
How can we use this code-to-code translation?
Corollaries

Schedules (k fixed)    Concurrent Boolean prog.                            Complexity   References
k-store-ages           no recursion                                        Pspace
k context-switches     recursion                                           Exptime      [Qadeer, Rehof, TACAS’05]
k round-robin          recursion; finite # threads | parameterized         Exptime      [Lal, Reps, CAV’08] [La Torre, P., Madhusudan, CAV’09] [La Torre, P., Madhusudan, CAV’10]
k-rounds per thread    recursion, thread-creation                          2-Expspace   [Atig, Bouajjani, Qadeer, TACAS’09]
k-delay bound          recursion, thread-creation                          Exptime      [Emmi, Qadeer, Rakamaric, POPL’11]
k-compositional        recursion, thread-creation                          Exptime      [Bouajjani, Emmi, P., SAS’11]
Decidability results for TSO reachability
Our code-to-code translation is a linear reduction from TSO to SC, so the decidability results are inherited from SC.
Tools for SC -> Tools for TSO (our code-to-code translation as a plug-in)
A convenient way to get new tools for TSO …
[Figure: tool pipeline with the boxes Concurrent Program, TSO-to-SC translation, Instrumentation for the SC tool, SC tool]
Experiments
Mutual exclusion protocols
POIROT (by MSR). Loop unrolling: 2. D stands for delay bound.

             No fences (buggy for TSO)   With fences (correct for TSO)
Protocol     D=1                         D=1        D=2
Dekker       7 s                         6 s        72 s
Lamport      26 s                        110 s      1608 s
Peterson     5 s                         6 s        47 s
Szymanski    8 s                         6 s        978 s
POIROT: SMT-based bounded model checker for SC programs
Errors due to TSO discovered in a few seconds!
POIROT can also be a model checker for TSO!
Conclusions
We have proposed a code-to-code translation from TSO to SC
• allows existing and future tools designed for SC to be used to analyze programs running under TSO
• under-approximation (error finding)
  • the restriction imposed on the analyzed runs is useful for finding errors in programs
Beyond TSO? Generic approach?
Thanks!