Effective Program Verification for Relaxed Memory Models
Sebastian Burckhardt, Madanlal Musuvathi
Microsoft Research
CAV, July 10, 2008
Motivation: Memory Model Vulnerabilities
Programmers do not always follow strict locking discipline in performance-critical code
◦ Ad-hoc synchronization with normal loads and stores or interlocked operations is faster
◦ Result: “benign” or “intentional” data races
Such code can break on relaxed memory models
◦ Most multicore machines are not sequentially consistent
◦ Both compilers and actual hardware can contribute to the effect
Vulnerabilities are hard to find, reproduce, and analyze
◦ May require specific hardware configuration and schedule
C# Example

    volatile bool isIdling;
    volatile bool hasWork;

    // Consumer thread
    void BlockOnIdle() {
        lock (condVariable) {
            isIdling = true;
            if (!hasWork)
                Monitor.Wait(condVariable);
            isIdling = false;
        }
    }

    // Producer thread
    void NotifyPotentialWork() {
        hasWork = true;
        if (isIdling)
            lock (condVariable) {
                Monitor.Pulse(condVariable);
            }
    }
4
Example: Store Buffer Vulnerability
Key pieces of code on previous slide:
volatile int ii = 0; volatile int hw = 0;
Consumer: Store ii, 1; Load hw, 0
Producer: Store hw, 1; Load ii, 1 (under SC) or 0 (if the Consumer's store is performed late)
On x86, hardware may perform store late
Bug: Producer thread does not notice waiting Consumer, does not send signal
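The store buffer effect on this example can be reproduced with a small exhaustive simulation. The sketch below is an illustration, not part of the talk: the function `outcomes`, the tuple encoding of instructions, and the simple "FIFO buffer per thread, flushed at arbitrary times" model are assumptions chosen to mimic the TSO description. It enumerates every schedule of the two threads and collects the pair (value of hw read by Consumer, value of ii read by Producer).

```python
# Two-thread litmus test from the slide; all memory starts at 0.
CONSUMER = (("store", "ii", 1), ("load", "hw"))
PRODUCER = (("store", "hw", 1), ("load", "ii"))


def outcomes(programs, buffered):
    """Explore all schedules and return the set of observed load results.

    buffered=False models SC: stores reach memory immediately.
    buffered=True gives each thread a FIFO store buffer (a TSO-like model).
    """
    n = len(programs)
    # State: program counters, store buffers, memory, loads seen so far.
    start = ((0,) * n, ((),) * n, (("hw", 0), ("ii", 0)), ())
    stack, seen, results = [start], {start}, set()
    while stack:
        pcs, bufs, mem, loads = stack.pop()
        memory = dict(mem)
        successors = []
        for t in range(n):
            if pcs[t] < len(programs[t]):  # thread t runs its next instruction
                op = programs[t][pcs[t]]
                pcs2 = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op[0] == "store" and buffered:
                    bufs2 = bufs[:t] + (bufs[t] + (op[1:],),) + bufs[t + 1:]
                    successors.append((pcs2, bufs2, mem, loads))
                elif op[0] == "store":
                    mem2 = tuple(sorted({**memory, op[1]: op[2]}.items()))
                    successors.append((pcs2, bufs, mem2, loads))
                else:
                    # A load sees the newest store in its own buffer, else memory.
                    value = memory[op[1]]
                    for addr, v in bufs[t]:
                        if addr == op[1]:
                            value = v
                    successors.append((pcs2, bufs, mem, loads + ((t, value),)))
            if bufs[t]:  # the oldest buffered store of thread t reaches memory
                addr, v = bufs[t][0]
                mem2 = tuple(sorted({**memory, addr: v}.items()))
                bufs2 = bufs[:t] + (bufs[t][1:],) + bufs[t + 1:]
                successors.append((pcs, bufs2, mem2, loads))
        if not successors:  # both threads finished and all buffers drained
            results.add(tuple(v for _, v in sorted(loads)))
        for s in successors:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return results


sc = outcomes((CONSUMER, PRODUCER), buffered=False)
tso = outcomes((CONSUMER, PRODUCER), buffered=True)
print("SC outcomes: ", sorted(sc))
print("extra outcomes with store buffers:", sorted(tso - sc))
```

In this model the only outcome that store buffers add is the one where both loads return 0, which is the situation in which the Producer misses the waiting Consumer.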
Abstract View of Memory Models
Given a program P, a memory model Y defines the subset $T_{P,Y} \subseteq T$ of traces corresponding to some (partial or complete) execution of P on Y.
SC (sequential consistency) is the strongest memory model.
More executions may be possible on a relaxed memory model Y: $T_{P,SC} \subseteq T_{P,Y} \subseteq T$
Example: TSO
Under TSO, processors can buffer stores in a FIFO queue: $T_{P,SC} \subseteq T_{P,TSO} \subseteq T$
Trace corresponding to code on slide 4 (in $T_{P,TSO}$ but not in $T_{P,SC}$):
2.1 Store hw, 1
2.2 Load ii, 0
1.1 Store ii, 1
1.2 Load hw, 0
Why TSO?
Memory models are platform dependent & ridden with details
We focus on TSO because it models store buffers, the most common relaxation
In practice, TSO is almost the same as the x86 hardware model
[Figure: memory models compared: SC, z6, TSO, PSO, RMO, IA-32, IA-64, Alpha]
Model Checking Programs on Relaxed Memory Models
Covering all relaxed executions is challenging
◦ Highly nondeterministic (exposed to low-level hardware concurrency)
◦ Memory models are usually not finite-state
◦ Memory models are often a matter of negotiation (formal descriptions are the exception)
State of the art has limited scalability
◦ Model checking using simplified operational models
◦ Bounded model checking using axiomatic models (CheckFence)
Memory Model Safety
Observation: Programmer writes code for SC
◦ Resorts to {locks, fences, volatiles, interlocked operations} to maintain SC behavior where needed
◦ If program P exhibits non-SC behavior, it is most likely a bug
Definition: A program P is Y-safe if $T_{P,SC} = T_{P,Y}$
Decomposed Program Verification on Relaxed Memory Models
1. Verify sequentially consistent executions (show that all executions in $T_{P,SC}$ are correct)
2. Verify memory model safety (show that $T_{P,SC} = T_{P,Y}$)
Can we do 1 and 2 at the same time? Yes.
Borderline Executions
Def.: A borderline execution for P is an execution with a successor in $T_{P,TSO} \setminus T_{P,SC}$
Thm.: A program P is TSO-safe if and only if it has no borderline executions.
We can verify / falsify this as a safety property of sequentially consistent executions!
Example: TSO Borderline Execution
Successor traces are traces with one more instruction.
Borderline execution (in $T_{P,SC}$):
1.1 Store ii, 1
1.2 Load hw, 0
2.1 Store hw, 1
Successor in $T_{P,SC}$:
1.1 Store ii, 1
1.2 Load hw, 0
2.1 Store hw, 1
2.2 Load ii, 1
Successor in $T_{P,TSO} \setminus T_{P,SC}$:
1.1 Store ii, 1
1.2 Load hw, 0
2.1 Store hw, 1
2.2 Load ii, 0
Sober Tool Structure
[Diagram: the Instrumented Program runs under the Stateless Model Checker (CHESS), whose scheduler enumerates traces; the event stream (shared memory accesses, sync ops) is passed to the Borderline Monitor]
Outputs: (1) P correct (2) P not TSO-safe (+cex) (3) P has SC-bug (+cex)
Output is always sound.
Tool may not terminate exploration if # of executions is too large.
Define SC using hb relation
Trace = Set of Instructions (Vertices) with attributes
◦ [processor].[issue index] [operation] [address], [coherence index]
◦ coh. index is the position of the value within the sequence of values written to the same location (i.e., “we replace each value with its sequence number”)
Add edges: program order $\to_p$ / conflict order $\to_c$
Define happens-before order $\to_{hb} = (\to_p \cup \to_c)$
Trace is sequentially consistent if and only if $\to_{hb}$ is acyclic.
This trace is SC:
1.1 Store ii, 1
1.2 Load hw, 0
2.1 Store hw, 1
2.2 Load ii, 1
This trace is not SC:
1.1 Store ii, 1
1.2 Load hw, 0
2.1 Store hw, 1
2.2 Load ii, 0
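The acyclicity criterion can be checked mechanically on the two example traces. The following is a minimal sketch under stated assumptions: the `Instr` record and all function names are hypothetical, the initial value of every location is given coherence index 0, and the conflict order is taken to relate two accesses to the same address (at least one a store) by increasing coherence index, with a store ordered before a load that reads its value. The slide does not spell out that rule, so treat it as one plausible reading.

```python
from collections import namedtuple

# One vertex of a trace: [processor].[issue index] [operation] [address], [coh. index]
Instr = namedtuple("Instr", "proc idx op addr coh")


def program_order(trace):
    """Edges between instructions of the same processor, in issue order."""
    return {(x, y) for x in trace for y in trace
            if x.proc == y.proc and x.idx < y.idx}


def conflict_order(trace):
    """Edges between same-address accesses (assumed rule, see lead-in)."""
    return {(x, y) for x in trace for y in trace
            if x != y and x.addr == y.addr and "store" in (x.op, y.op)
            and (x.coh < y.coh
                 or (x.coh == y.coh and x.op == "store" and y.op == "load"))}


def is_acyclic(nodes, edges):
    """Kahn's algorithm: a graph is acyclic iff every node can be removed."""
    indegree = {n: 0 for n in nodes}
    for _, y in edges:
        indegree[y] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for x, y in edges:
            if x == n:
                indegree[y] -= 1
                if indegree[y] == 0:
                    ready.append(y)
    return removed == len(nodes)


def is_sc(trace):
    """A trace is SC iff hb = (program order) U (conflict order) is acyclic."""
    return is_acyclic(trace, program_order(trace) | conflict_order(trace))


SC_TRACE = [
    Instr(1, 1, "store", "ii", 1), Instr(1, 2, "load", "hw", 0),
    Instr(2, 1, "store", "hw", 1), Instr(2, 2, "load", "ii", 1),
]
NON_SC_TRACE = SC_TRACE[:3] + [Instr(2, 2, "load", "ii", 0)]

print(is_sc(SC_TRACE), is_sc(NON_SC_TRACE))
```

In the second trace the cycle is 1.1, 1.2, 2.1, 2.2 and back to 1.1: the final load reads the initial value of ii, so it is conflict-ordered before the store 1.1.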
Define TSO by Relaxing hb
Define relaxed happens-before order $\to_{rhb} = (\to_p \cup \to_c) \setminus \{ (s,l) \mid s \text{ is store, } l \text{ is load, and } s \to_p l \}$
Trace is possible on TSO if and only if
(1) $\to_{rhb}$ is acyclic
(2) there do not exist s, l such that $s \to_p l$ and $l \to_c s$
This trace is TSO, but not SC ($\to_{hb}$ has a cycle, $\to_{rhb}$ is acyclic):
1.1 Store ii, 1
1.2 Load hw, 0
2.1 Store hw, 1
2.2 Load ii, 0
Thm.: Def. is equivalent to operational TSO model (see Tech Report)
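The two TSO conditions can be tested on small traces in the same style. This is a sketch, not the tool's code: `Instr`, `edges`, `is_acyclic` and `is_tso` are hypothetical names, initial values are assumed to have coherence index 0, and the conflict-order rule (increasing coherence index, store before the load that reads it) is an assumed reading of the definition.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "proc idx op addr coh")


def edges(trace):
    """Return (program order, conflict order) as sets of instruction pairs."""
    po = {(x, y) for x in trace for y in trace
          if x.proc == y.proc and x.idx < y.idx}
    co = {(x, y) for x in trace for y in trace
          if x != y and x.addr == y.addr and "store" in (x.op, y.op)
          and (x.coh < y.coh
               or (x.coh == y.coh and x.op == "store" and y.op == "load"))}
    return po, co


def is_acyclic(nodes, edge_set):
    """Kahn's algorithm: acyclic iff all nodes can be removed in topological order."""
    indegree = {n: 0 for n in nodes}
    for _, y in edge_set:
        indegree[y] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for x, y in edge_set:
            if x == n:
                indegree[y] -= 1
                if indegree[y] == 0:
                    ready.append(y)
    return removed == len(nodes)


def is_tso(trace):
    po, co = edges(trace)
    # Store-to-load pairs in program order are the ones TSO may reorder.
    relaxed = {(s, l) for (s, l) in po if s.op == "store" and l.op == "load"}
    rhb = (po | co) - relaxed
    # Condition (2): a load must not read a value older than an earlier
    # store of its own processor.
    stale_read = any((l, s) in co for (s, l) in relaxed)
    return is_acyclic(trace, rhb) and not stale_read


# The trace from the slide: possible on TSO although it is not SC.
STORE_BUFFER = [
    Instr(1, 1, "store", "ii", 1), Instr(1, 2, "load", "hw", 0),
    Instr(2, 1, "store", "hw", 1), Instr(2, 2, "load", "ii", 0),
]
# Violates (2): processor 1 reads the initial value after its own store.
OWN_STORE_MISSED = [Instr(1, 1, "store", "x", 1), Instr(1, 2, "load", "x", 0)]
# Violates (1): store-store and load-load order are not relaxed on TSO.
MESSAGE_PASSING = [
    Instr(1, 1, "store", "x", 1), Instr(1, 2, "store", "y", 1),
    Instr(2, 1, "load", "y", 1), Instr(2, 2, "load", "x", 0),
]

print(is_tso(STORE_BUFFER), is_tso(OWN_STORE_MISSED), is_tso(MESSAGE_PASSING))
```

The two extra traces are illustrative additions: they show that each condition rules out something the other does not.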
Borderline Monitor Implementation
Receiving a stream of memory accesses:
◦ Record all stores to all locations.
◦ For each load L, check if there exists a reordering of L with prior stores to the same location such that
(1) $\to_{hb}$ has a cycle
(2) $\to_{rhb}$ is acyclic
(3) there do not exist s, l such that $s \to_p l$ and $l \to_c s$
Implementation: use standard vector clock to compute $\to_{hb}$, and custom vector clock (twice the width) to compute $\to_{rhb}$
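The monitor's check for a single load can be illustrated by brute force. The sketch below deliberately swaps the vector clocks for explicit graph search, so it shows what is being decided rather than how the tool decides it efficiently. All names (`Instr`, `borderline_witnesses`, and the helpers) are hypothetical, initial values are assumed to have coherence index 0, and the conflict-order rule is an assumed reading of the definitions. For a load about to be appended to a prefix, it tries every coherence index the load could observe and reports those that yield a trace that is possible on TSO but not SC.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "proc idx op addr coh")


def edges(trace):
    """Return (program order, conflict order) as sets of instruction pairs."""
    po = {(x, y) for x in trace for y in trace
          if x.proc == y.proc and x.idx < y.idx}
    co = {(x, y) for x in trace for y in trace
          if x != y and x.addr == y.addr and "store" in (x.op, y.op)
          and (x.coh < y.coh
               or (x.coh == y.coh and x.op == "store" and y.op == "load"))}
    return po, co


def is_acyclic(nodes, edge_set):
    """Kahn's algorithm on an explicit edge set."""
    indegree = {n: 0 for n in nodes}
    for _, y in edge_set:
        indegree[y] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for x, y in edge_set:
            if x == n:
                indegree[y] -= 1
                if indegree[y] == 0:
                    ready.append(y)
    return removed == len(nodes)


def classify(trace):
    """Return (is_sc, is_tso) for a trace, following the two definitions."""
    po, co = edges(trace)
    relaxed = {(s, l) for (s, l) in po if s.op == "store" and l.op == "load"}
    sc = is_acyclic(trace, po | co)
    tso = (is_acyclic(trace, (po | co) - relaxed)
           and not any((l, s) in co for (s, l) in relaxed))
    return sc, tso


def borderline_witnesses(prefix, proc, idx, addr):
    """Coherence indices the next load could return on TSO but not on SC."""
    newest = max((i.coh for i in prefix if i.op == "store" and i.addr == addr),
                 default=0)
    found = []
    for k in range(newest + 1):  # index 0 is the initial value
        sc, tso = classify(prefix + [Instr(proc, idx, "load", addr, k)])
        if tso and not sc:
            found.append(k)
    return found


PREFIX = [
    Instr(1, 1, "store", "ii", 1),
    Instr(1, 2, "load", "hw", 0),
    Instr(2, 1, "store", "hw", 1),
]
print(borderline_witnesses(PREFIX, proc=2, idx=2, addr="ii"))
```

A non-empty result means the prefix is a borderline execution; for the store buffer example the witness is the load of ii that returns the initial value.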
Equivalent Interleavings
Typically, many different interleavings map to the same (Mazurkiewicz) trace.
By construction, our monitor is insensitive to the choice of interleaving
◦ Checks all hb-equivalent ones simultaneously
◦ Makes it compatible with partial order reduction
◦ Improves probability of finding bugs
Results
Good at finding bugs even if only a small number of schedules is explored
◦ Monitor checks all hb-equivalent interleavings
◦ CHESS heuristic (iterative context bounding) seems to mix well
Found expected store buffer vulnerabilities in standard examples (Dekker, Bakery)
Detected 2 store buffer vulnerabilities in a production-level concurrency library.
◦ Overall code size ~ 33 kloc
◦ Used existing test harness written by product team (slightly adapted for use with CHESS)
◦ Bugs not previously known
Some Numbers

program        context   # interleavings         time     ver. time [s]
name           bound     total     borderline    [s]      SoBeR    CHESS
Fig. 1(b)      ∞         10        4             < 0.1    < 0.2    < 0.2
dekker         1         5         4             < 0.1    < 0.2    < 0.2
(2 threads,    2         36        23            < 0.1    0.39     0.37
2 crit-sec)    3         183       50            < 0.1    1.9      1.8
(loc 82)       4         1,219     124           < 0.1    13.2     13.0
               5         8,472     349           < 0.1    106.0    100.6
bakery         0         1         1             < 0.1    < 0.2    < 0.2
(2 threads,    1         25        20            < 0.1    0.47     0.43
3 crit-sec)    2         742       533           < 0.1    10.3     9.8
(loc 122)      3         12,436    8,599         < 0.1    189.0    181.0
takequeue      0         3         0             n.a.     < 0.3    < 0.3
(2 threads,    1         47        14            0.34     0.72     0.69
6 ops)         2         402       189           0.43     5.2      4.9
(loc 374)      3         2,318     1,197         0.74     28.9     27.8
               4         9,147     5,321         0.84     125.5    118.9
               5         29,821    17,922        0.86     481.5    461.6
Conclusion
With increasing use of multicores, more and more programs are likely to exhibit failures caused by the memory model.
Such failures are hard to find by conventional means (code inspection, testing).
Our combination of borderline monitor & stateless model checking makes it practical to detect memory model safety violations in a unit test environment.
Future Work
Run on larger programs (runtime verification)
Handle more memory models
◦ Which memory models guarantee borderline executions?
Prove memory model safety of concurrent data type implementations
Develop borderline monitors for other relaxed concurrent APIs
◦ Transactional memory
◦ Concurrency Libraries