On Sequentializing Concurrent Programs Gennaro Parlato University of Southampton, UK UPMARC 7 th...
-
Upload
leslie-tyler -
Category
Documents
-
view
215 -
download
2
Transcript of On Sequentializing Concurrent Programs Gennaro Parlato University of Southampton, UK UPMARC 7 th...
On Sequentializing Concurrent Programs
Gennaro ParlatoUniversity of Southampton, UK
UPMARC 7th Summer School on Multicore Computing, June 8-10, 2015
Concurrency for better performace
Clock rates are stalling, but Moore’s Law is still alive…• no longer feasible to increase the speed of individual
processors• additional gates turned into (caches and) multiple cores
⇒ For better performance programs must be concurrent!
⇒ Concurrency has become an important aspect for many areas of computer science:
algorithms, data structures, programming languages, software engineering, testing, verification, …
Writing concurrent programs is DIFFICULT
Programmers have to guarantee• correctness of sequential execution
of each individual thread• under nondeterministic interferences
from other threads (schedules)
Rare schedules result in errors that are difficult to find, reproduce, and repair• developers / testers can spend weeks chasing a single bug
⇒ huge productivity problem
communication mechanism
…T2 TN
T2
threads
Testing
Testing remains the most used (and often the only known) paradigm in industry...
… but is ineffective for concurrent programs:• large number of schedules makes scaling-up difficult• non-deterministic nature of scheduling makes repeatability
difficult
⇒ needs to be complemented by automated analyses that handle schedules symbolically
Verification approach
develop practical but theoretically well-founded symbolic verification techniques based on the idea of
Sequentialization
Sequentialization: motivations
Building verification tools for full-fledged concurrent languages is difficult and expensive...
… but scalable verification techniques exist for sequential languages
• Abstraction• SAT/SMT techniques (i.e., bounded model checking)• …
⇒ Can we leverage these?
Sequentialization as a code-to-code translation
Code-to-code translation from multithreaded recursive programs to sequential programs that preserves reachability
Conc.program
“equivalent”
Sequential program with non determinism
shared variables
…T2 TNT1
Use existing automatic verification techniques designed for sequential programs to analyze concurrent programs
From concurrent to sequential
Always possible but can be inefficient•simulate the global behavior (track all locals of each thread)•current techniques do not work
What do we want?•avoid the extreme blow-up•track at any point only the locals of one thread
What we want is not always possible
But it is possible if we restrict behaviors•Bug-finding
Sequentialization: advantages
Keep focus on the concurrency aspects of programs, delegate sequential reasoning to an existing analysis tool
•code-to-code translation is much easier to implement than a full-fledged analysis tool
•simplifies experimentation with different approaches
•can be designed to target multiple backends for sequential program analysis
Concurrent (shared-memory) Programs
Formed of sequential programs T1 , … , TN
(each possibly with recursive function calls)
shared variables
…T2 TN
T1
threads
• each program Ti can read and write shared vars
• we assume sequential consistency (SC) (writes are immediately visible to all the other programs)
• an execution is an interleaving of the executions of each Ti
A first sequentialization: KISS
KISS: Keep It Simple and Sequential [Quadeer-Wu, PLDI’04]
Under-approximation (subset of interleavings)
Thread creation function call•at context-switches either:
- the active thread is terminated or- a not yet scheduled thread is started (by calling its main function)
•when a thread is terminated either:- the thread that has called it is resumed (if any) or- a not yet scheduled thread is started
KISS schedules
(l1,s1)
T1
(l1,s3)
T2(l2,s1)
T3
(l3,s2)
(l4,s2)
(l5,s3)
Scheduling 1:1. Start T1
2. Start T2
3. Terminate T2
4. start T3
5. terminate T3
6. Resume T1
T1 T2 T3
Scheduling 2:1. start T1
2. start T2
3. start T3 4. terminate T3
5. resume T2
6. terminate T2
7. resume T1
T1 T2 T3
Scheduling 3:1. start T1
2. start T2
3. terminate T2
4. resume T1
5. start T3
6. terminate T3
7. resume T1
More on KISS
Allows dynamic thread creation in form of asynchronous calls
Bounds the number of threads that have been created but not started yet
-scheduler nondeterministically starts a thread from this set, or -resumes the last suspended thread (if any)
State space: no cross product
Context-switches:-does allow an unbounded number of context-switches-does not allow a bounded number context-switches between any two threads (for more than 1 interaction)
Bounded Context-Switching (CS) is essential
Switching between threads is allowed only a bounded number of times [Qadeer-Rehof, TACAS’05]
Systematic bounded CS is useful for bug hunting:
most concurrency related bugs manifest themselves within few CS [Musuvathi-Qadeer, PLDI’07]
Efficient sequentializations for bounded CS•Eager approach [Lal-Reps, CAV’08]
•Lazy approach [La Torre-Madhusudan-Parlato, CAV’09]
LR sequentialization:
Bounded Round-Robin schedules
T1 TNTN-1T2
…
round 1
round 2
round k
round 3
… …
Bounded Round- Robin captures bounded context-switches
Schedule: T2 T3 T4 T1 T3 T2 T1; minimal number of rounds ???
LR sequentialization: simulation
Sequential program (k-rounds)
1. Guess a2, …, ak
2. Execute T1 to completion
Computes
- local states l1, .., lk, and
- global states b1, …, bk
(l1,b1)
T1
(l1,a2)
(l2,b2)
(l2,a3)
(l3,b3)
T2 T3
LR sequentialization: simulation
Sequential program (k-rounds)
1. Guess a2,…,ak
2. Execute T1 to completion
3. Pass b1,…,bk to T2
(l1,b1)
T1
(l1,a2)
(l2,b2)
(l2,a3)
(l3,b3)
T2 T3
b1
b2
b3
LR sequentialization: simulation
Sequential program (k-rounds)
1. Guess a2,…,ak
2. Execute T1 to completion
3. Pass b1,…,bk to T2
We can dismiss locals of T1
T1
a2
a3
T2 T3
b1
b2
b3
LR sequentialization: simulation
Sequential program (k-rounds)
• Guess a2,…,ak
• Execute T1 to completion
• Pass b1,…,bk to T2
• Execute T2 to completion
1. Dismiss locals of T2
2. Dismiss b1,…,bk
T1
a2
a3
T2 T3
b1
b2
b3
(l1’,c1)
c3
(l1’,b2)
(l0’,b1)
(l2’,c2)
(l2’,b3)
LR sequentialization: simulation
Sequential program (k-rounds)
• Guess a2,…,ak
• Execute T1 to completion
• Pass b1,…,bk to T2
• Execute T2 to completion
T1
a2
a3
T2 T3
b1
b2
b3
c1
c2
c3
LR sequentialization: simulation
Sequential program (k-rounds)
1. Guess a2,…,ak
2. Execute T1 to completion
3. Pass b1,…,bk to T2
4. Execute T2 to completion
5. Pass c1,…,ck to T3
6. Execute T3 to completion
1. Dismiss locals of T3
2. Dismiss c1,…,ck
T1
a2
a3
T2 T3
b1
b2
b3
c1
c2
c3
d3
(l0’’, c1)
(l1’,d1)(l1’,c2)
(l1’,d2)(l1’,c3)
LR sequentialization: simulation
Sequential program (k-rounds)
1. Guess a2,…,ak
2. Execute T1 to completion
3. Pass b1,…,bk to T2
4. Execute T2 to completion
5. Pass c1,…,ck to T3
6. Execute T3 to completion
T1
a2
a3
T2 T3
b1
b2
b3
c1
c2
c3
d1
d2
d3
LR sequentialization: simulation
Sequential program (k-rounds)
1. Guess a2,…,ak
2. Execute T1 to completion
3. Pass b1,…,bk to T2
4. Execute T2 to completion
5. Pass c1,…,ck to T3
6. Execute T3 to completion
7. Computation iff di = ai+1
i [1,k-1]
T1
a2
a3
T2 T3
b1
b2
b3
c1
c2
c3
d1
d2
d3
LR sequentialization: simulation
Sequential program (k-rounds)
1. Guess a2,…,ak
2. Execute T1 to completion
3. Pass b1,…,bk to T2
4. Execute T2 to completion
5. Pass c1,…,ck to T3
6. Execute T3 to completion
7. Computation iff di = ai+1
i [1,k-1]
1. Report an error if a bug occurs during the simulation
T1
a2
a3
T2 T3
b1
b2
b3
c1
c2
c3
d1
d2
d3
LR seq. as a code-to-code translation for C programs + PthreadCSeq website: users.ecs.soton.ac.uk/gp4/cseq/
[Fischer-Inverso-Parlato, ASE’13]
…T1 T2
TN
…F1() F2() FN() main()
Concurrent program
“equivalent”Sequential program with non determinism
Sequentialization(code-to-code translation)
Simulation functions
translates
…
translates
translates
LR sequentialization as a code-to-code translation
LR sequentialization as a code-to-code translation
• considers round-robin scheduleswith k rounds- thread → function, run to completion
• global memory copy for each round - scalar → array
• context switch → round counter++• first thread starts with
nondeterministic memory contents- other threads continue with content left by predecessor
T1 T2
S2,2
S0,2
S1,2
SK-1,2
TN
...
... ...
...
...
...
S2,1
S0,1
S1,1
SK-1,1
S2,N
S0,N
S1,N
SK-1,N
LR sequentialization as a code-to-code translation
• considers round-robin scheduleswith k rounds- thread → function, run to completion
• global memory copy for each round - scalar → array
• context switch → round counter++• first thread starts with
nondeterministic memory contents- other threads continue with content left by predecessor
• checker prunes away inconsistent simulations
- assume(Si+1,0 == Si,N);
- requires second set of memory copies- errors can only be checked at end of simulation
• requires explicit error checks
T1 T2
S2,2
S0,2
S1,2
Sk-1,2
TN
...
... ...
...
...
...
S2,1
S0,1
S1,1
Sk-1,1
S2,N
S0,N
S1,N
Sk-1,N
LR sequentialization as a code-to-code translation
//shared varstypeg1 g1; typeg2 g2; …
//thread functionst(){ typex1 x1; typex2 x2; … stmt1 ; stmt2 ; …} …
main(){ …}
//shared varstypeg1 g1[K]; typeg2 g2[K]; …uint round=1; bool ret=0; //aux vars
// context-switch simulationcs() { unsigned int j; j= nondet(); assume(round +j < K); round+=j; if (round==K-1 && nondet()) ret=1;}
//thread functionst(){ typex1 x1; typex2 x2; … cs(); if ret return; stmt1[round]; cs(); if ret return; stmt2[round]; …} …
main_thread(){ …}
main(){ … } //next slide
LR sequentialization as a code-to-code translation
main(){ typeg1 _g1[K]; typeg2 _g2[K]; … // first thread starts with non-deterministic memory contents for (i=1;i++;i<K){ _g1[i] = g1[i] = nondet(); _g2[i] = g2[i] = nondet(); … } // thread simulations t[0] = main_thread; born[0] = ACTIVE; for (i=0;i++;i<N){ if(born[i]>NO_ACTIVE){ ret=0; round = born[i]; t[i](); } } // consistency check for (i=0;i++;i<K-1){ assume(_g1[i+1] == g1[i]); assume(_g2[i+1] == g2[i]); … } // error detection assert(err ==0); }
Implementations of variants of LR schema (SMT-based)
Corral (SMT-based analysis for Boogie programs)
– [ Lal–Qadeer–Lahiri, CAV’12 ]– [ Lal–Qadeer, FSE’14 ]
CSeq (code-to-code translation for C + PThread) – [ Fischer–Inverso–Parlato, ASE’13 ]
Rek (for Real-time Embedded Software Systems)
– [ Chaki–Gurfinkel–Strichman, FMCAD’11 ]
Storm: implementation for C programs – [ Lahiri–Qadeer–Rakamaric, CAV’09 ]– [Rakamaric, ICSE’10]
•General eager translation representing thread interactions using bounded DAGs [Bouajjani-Emmi-Parlato, SAS’11]
LR sequentialization is “eager”
T1
a2
T2 T3
b1
a3
b2
a2 is guesseda2 may be unreachable
EAGER
[Lal, Reps CAV’08]
LR sequentialization does not preserve assertions
void thread1() { while (blocked); x = x/y; if (x%2==1) ERROR; }
void thread2() { x=12; y=2;
//unblock thread2 blocked=false;}
// shared variablesbool blocked=true;int x=0, y=0;
Inv: y != 0
blocked=true
blocked=false
guess x=13
y=0
A lazy transformation is desirable
A lazy sequential program explores only reachable states of the concurrent program
Why is it desirable? • In model-checking it can drastically reduce the explored state-
space• Better invariants for deductive verification / abstract interpretation
We now illustrate a lazy transformation:
[La Torre-Madhusudan-Parlato, CAV’09]
Lazy transformation: main idea
Execute T1
Context-switch:
store s1 and abort
Execute T2 from s1
store s2 and abort
(l1,s1)
(l’1,s1)
(l’2,s2)
T1
(l0,s0)
T2
store s1
& abort store s2
& abort
Lazy transformation: main idea
Re-execute T1 till it reaches s1
May reach a new local state!
Anyway it is correct !!
Restart from global s2 and compute s3
(l1,s1)
(l’1,s1)
(l’2,s2)
T1
(l0,s0)
T2
store s1
& abort store s2
& abort
(l’’1,s1)
store s3
& abort
(l’’1,s2)
Lazy transformation: main idea
Switch to T2
Execute till it reaches s2
Continue computation from global s3
(l1,s1)
(l’1,s1)
(l’2,s2)
T1
(l0,s0)
T2
store s1
& abort store s2
& abort
(l’’1,s1)
store s3
& abort
(l’’’1,s2)
(l’’1,s2) (l’’’1,s3)
Lazy transformation: main idea
T1 T2
store s1
store s2
store s3
store s4
store s5
end
s1
s2
s3s4
s1
s2
s3
s4
s5
Lazy transformation: features
• Explores only reachable states
• Preserves invariants across the translation
• Tracks local state of one thread at any time
• Tracks values of shared variables at context switches
(s1, s2, …, sk)
• Requires recomputation of local states
…T1 T2
TN
…F1() F2() FN() main()
Concurrent program
“equivalent”Sequential program with non determinism
Sequentialization(code-to-code translation)
Simulation functions
translates
…
translates
translates
Lazy sequentialization as a code-to-code translation
Guess scheduling Orchestrate calls to threads (Fi)
Nondet jump to next context where this thread is active At last context-switch, store shared state, abort, and return to main
Lazy translation for Concurrent Boolean programs
• Concurrent Boolean programs Boolean programs
• We have implemented an eager and a lazy translator for concurrent Boolean programs- Download: http://www.cs.uiuc.edu/~madhu/getafix/cbp2bp
Experiments: Windows NT Bluetooth driver Context
switches
1-adder
1-stopper
2-adders
1-stopper
1-adder
2-stoppers
2-adders
2-stoppers
eager lazy eager lazy eager lazy eager lazy
1
2
3
4
5
6
N
N
N
N
N
N
0.1
0.3
43.3
73.6
930.0
-
0.1
0.2
1.4
5.5
20.2
66.8
N
N
N
Y
Y
Y
0.2
0.9
135.9
1601.0
-
-
0.1
0.8
6.3
2.6
18.0
122.9
N
N
Y
Y
Y
Y
0.1
0.7
70.1
597.2
-
-
0.1
0.9
0.4
2.9
14.0
66.1
N
N
Y
Y
Y
Y
0.2
1.6
177.6
out of mem.
out of mem.
out of mem.
0.1
2.0
0.8
7.5
66.5
535.9
• Backend sequential analysis: GetAFix [La Torre-Madhusudan-Parlato, PLDI’09]- BDD-based analysis: it stores summaries recomputations => no multiple thread explorations
• Lazy outperforms Eager
Lazy translation for Concurrent Boolean programs
• Eager and Lazy sequentializations can be extended to parameterized programs: [La Torre-Madhusudan-Parlato, CAV’10, FIT’12]
T1 T2Tm
in1
in2
in3
out1
out2
out3
Correctness of abstractions of several Linux device drivers
Lazy translation for Concurrent Boolean programs
• Eager and Lazy sequentializations can be extended to parameterized programs: [La Torre-Madhusudan-Parlato, CAV’10, FIT’12]
• Delay-bounded scheduling [Emmi-Qadeer-Rakamaric, POPL’11]
- Programs with asynchronous calls (creating tasks) - Each task is executed to completion (no interleaving with other tasks)- Sequentialization is according to a DFS scheduler of tasks- When dispatched, a task can be delayed to next round
– the total number of delays in a task-creation tree is bounded by k– total number of explored rounds is k+1
– The beginning of each round is guessed (eager)
Lazy translation for Concurrent Boolean programs
• Eager and Lazy sequentializations can be extended to parameterized programs: [La Torre-Madhusudan-Parlato, CAV’10, FIT’12]
• Delay-bounded scheduling [Emmi-Qadeer-Rakamaric, POPL’11]
• General sequentialization [Bouajjani-Emmi-Parlato, SAS’11]
- Programs with asynchronous calls
- Tasks can be interleaved with other ones
- Sequentialization based on– DAGs of contexts – Composition and compression operations
– Bound on the size of the DAGs
– Generalizes k-rounds Eager e delay bounded-scheduling sequentialization
aa
cc
dd
bb
ee
More Sequentializations
• Eager and Lazy sequentializations can be extended to parameterized programs: [La Torre-Madhusudan-Parlato, CAV’10, FIT’12]
• Delay-bounded scheduling [Emmi-Qadeer-Rakamaric, POPL’11]
• General sequentialization [Bouajjani-Emmi-Parlato, SAS’11]
• Scope-bounded sequentialization [La Torre-Napoli-Parlato, FSTTCS’12]
• Budget-bounded sequentialization [Abdulla-Atig- Rezine-Stenman, FMCAD’12]
• Sequentialization for proving correctness [Garg-Madhusudan, TACAS’11]
• Bounded-phase sequentialization of message passing programs [Bouajjani-Emmi,
TACAS’12]
• Eager sequentialization for periodic programs[Chaki-Gurfinkel-Sinha, FMCAD’14], [Chaki-Gurfinkel-Strichman, FMCAD’13]
• …
Conclusions
Sequentialization is an effective approach to analyze concurrent programs–Fast prototyping–Re-use of mature technologies (tools designed for sequential programs)–Code-to-code translation
Presented translations:–keep track only of the local state of the current thread (no cross product)–Keep track of a finite number of copies of shared states –thread creation is implemented with calls
• KISS is a lazy sequentialization that allows an unbounded number of context-switches, but does not allow a bounded number context-switches between any two threads
•Bounded context-switches:- Eager translations require guessing of values of the shared variables and may
explore unreachable states- Lazy translations preserve the invariants and introduces many recursive calls
(thread re-computations)
Tomorrow’s lecture
Sequentializations for Bounded Model Checking backends:• Lazy-CSeq [ Inverso-Tomasco-Fischer-La Torre-Parlato, CAV’14 ]
• MU-Cseq [ Tomasco-Inverso-Fischer-La Torre-Parlato, TACAS’15 ]
framework for developing sequentialization of
concurrent C programs :
users.ecs.soton.ac.uk/gp4/cseq/