1
Shared Memory Consistency Protocol Verification against Weak Memory
Models: Refinement via Model-checking
Prosenjit Chatterjee,
Hemanthkumar Sivaraj,
Ganesh Gopalakrishnan
School of Computing, University of Utah
http://www.cs.utah.edu/formal_verification/
Supported by NSF awards CCR 9987516 and 0081406, and an equipment gift from Intel Corp.
{ hemanth, ganesh } @ cs.utah.edu
2
Shared memory multiprocessors
[Figure: Desktop machines (cpu, cpu, cpu, … and one mem on a snoopy bus); Servers and Supercomputers (… with directories, dir dir)]
3
How is the programmer’s view classically specified?
“Sequential consistency”
Logical View: Processors (cpu, cpu) sharing one Memory (mem)
(“Coherence” means “per-location SC”)
Initial memory contents = 0
One disallowed scenario:
P1: st(a,1); ld(b,0);
P2: st(b,2); ld(a,0);
Peterson? No!
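The disallowed scenario can be checked by brute force. The following is a minimal sketch, not part of the original slides: it enumerates every interleaving of the two-processor program over a single memory and collects the load values that result. The helper names (`interleavings`, `sc_outcomes`) and the tuple encoding of instructions are illustrative assumptions.

```python
from itertools import combinations

# Litmus test from the slide; all memory locations start at 0.
# The encoding ("st", addr, value) / ("ld", addr) is an assumption of this sketch.
P1 = [("st", "a", 1), ("ld", "b")]
P2 = [("st", "b", 2), ("ld", "a")]

def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that preserves each thread's program order."""
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):
        it1, it2 = iter(t1), iter(t2)
        yield [("P1", next(it1)) if i in slots else ("P2", next(it2))
               for i in range(n)]

def sc_outcomes(t1, t2):
    """Return the set of (P1's ld(b) value, P2's ld(a) value) reachable under SC."""
    results = set()
    for order in interleavings(t1, t2):
        mem, loads = {}, {}
        for proc, op in order:
            if op[0] == "st":
                mem[op[1]] = op[2]
            else:
                loads[(proc, op[1])] = mem.get(op[1], 0)
        results.add((loads[("P1", "b")], loads[("P2", "a")]))
    return results

outcomes = sc_outcomes(P1, P2)
print(sorted(outcomes))
print((0, 0) in outcomes)
```

Under these assumptions, the pair in which both loads return 0 never appears, which is why the slide marks it as disallowed under SC.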
4
Growing CPU / Memory performance gap necessitates weakenings…
[Figure: cpu, cpu, cpu, … with mem; … with directories, dir dir]
Aggressive load/store reorderings
‘Bypassing’ (read back own store before others)
Strong orderings only at acquires/releases
…all that and more!
5
Overall Features of Weak Memory Models
• Support ‘ordinary’ as well as ‘special’ loads and stores
• Support fences and synchronization primitives
• Orderings may even depend on dynamic context
=> Provide a much larger range of load-values
Therefore…
• Writing a formal specification is highly non-trivial
• Writing a spec that supports verification is even trickier
7
“Sequential consistency IS good”
One almost wishes to go back to SC…
It does not seem a realistic goal for now…
• Simplifies programming
• Some hardware tricks to hide latencies
• Range of such tricks limited
• Complexity for end-users is containable
8
The Verification Problem
Given
• a formal specification of a weak consistency model (SPEC)
• a finite-state model of the shared memory system (IMP)
Verify that
• the executions of IMP are a subset of the executions allowed by SPEC
Our work enables this checking to be achieved using finite-state reachability
9
Related Work
For SC
• Qadeer [CAV’99, SRC TR #176]
• Condon, Hu, et al. [SPAA’01]
• Nalumasu et al. [CAV’98]
• Dist. Computing Special Issue ‘99
• MPV Workshop [Post FMCAD’00]
For Weak Models
• Qadeer [MPV workshop]
• Condon et al. [HPCA’99]
• Ghughal and Gopalakrishnan [FMPPTA’00]
10
Our Emphasis
• Simple and intuitive SPECs
• Support a wide range of memory models
• Support automated (finite-state) verification
• Avoid backtracking search over SPEC’s executions
• Avoid bloating state-space beyond that of IMP
11
Verification Criterion Illustrated…
Same program, run on IMP and on Spec:
P1: st(a,1); ld(a);
P2: ld(a);
Same execution: st(a,1) ; ld(a,1) ; ld(a,0)
Show that: IMP produces this execution Implies Spec allows the same execution
12
Idea: Employ a model-checker to establish refinement
[Figure: IMP and the Executable SPEC stepping through matching load and store events; load values agree … load values agree]
• Must do a non-backtracking search over SPEC’s executions
therefore
• SPEC must be deterministic with respect to recorded events
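A minimal sketch of what a non-backtracking check could look like, assuming a deliberately simple SPEC (a single shared memory) and recorded events of the form (kind, address, value). The names `spec_step` and `refines` are hypothetical and are not taken from the authors' tool; the point is only that a deterministic SPEC can be replayed event by event, with no search.

```python
def spec_step(mem, event):
    """Deterministic toy SPEC: return the next state, or None if the event is not allowed.

    Assumption of this sketch: the SPEC is one shared memory with initial contents 0.
    """
    kind, addr, val = event
    if kind == "st":
        new_mem = dict(mem)
        new_mem[addr] = val
        return new_mem
    if kind == "ld":
        return mem if mem.get(addr, 0) == val else None
    return None

def refines(imp_trace, step=spec_step):
    """Replay one IMP trace on the SPEC; never backtracks because step is deterministic."""
    state = {}
    for index, event in enumerate(imp_trace):
        state = step(state, event)
        if state is None:
            return False, index  # position of the first event the SPEC cannot match
    return True, None

ok_trace = [("st", "a", 1), ("ld", "a", 1)]
bad_trace = [("st", "a", 1), ("ld", "a", 1), ("ld", "a", 0)]
print(refines(ok_trace))
print(refines(bad_trace))
```

With only loads and stores recorded, this toy SPEC rejects the second trace even though a machine with store buffers can produce it, which is one way to see why more events than loads and stores may need to be recorded.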
13
What events do we record? Not just Loads and Stores!
P1: st(a,1); ld(a);
P2: ld(a);
[Figure: Imp, and SPEC = carbon copy of Imp; P1 and P2 each have a load buffer (LB) and a store buffer (SB) with ld and st ports, in front of memory M]
Imp: st(a,1) ; ld(a,1) ; ld(a,0)
SPEC (carbon copy of Imp): st(a,1) ; ld(a,1) ; ld(a,1)
eh?
st(a,1) ; ld(a,1) ; ld(a,1)
• st(a,1) drained to M
• ld(a,1), ld(a,1) read from M
st(a,1) ; ld(a,1) ; ld(a,0)
• st(a,1) in SB ; ld(a,1) from SB
• ld(a,0) from M
phew!
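The two explanations above can be reproduced with a toy store-buffer machine. This is a sketch under stated assumptions: only P1 has a store buffer, a `drain` action moves the oldest buffered store to M, and the schedule names (`P1.st`, `P1.ld`, `P2.ld`, `drain`) are invented for illustration.

```python
def run(schedule):
    """Execute a fixed schedule on a toy machine and return the recorded ld/st trace.

    Assumptions of this sketch: one address `a`, initial value 0, and a FIFO
    store buffer (SB) in front of memory M for P1 only.
    """
    mem, store_buffer, trace = {"a": 0}, [], []
    for action in schedule:
        if action == "P1.st":
            store_buffer.append(("a", 1))
            trace.append("st(a,1)")
        elif action == "drain":
            # Internal event: not recorded in the ld/st trace.
            if store_buffer:
                addr, val = store_buffer.pop(0)
                mem[addr] = val
        elif action == "P1.ld":
            # Local bypassing: newest buffered store to `a`, else memory.
            val = next((v for ad, v in reversed(store_buffer) if ad == "a"), mem["a"])
            trace.append(f"ld(a,{val})")
        elif action == "P2.ld":
            trace.append(f"ld(a,{mem['a']})")
    return trace

early = run(["P1.st", "drain", "P1.ld", "P2.ld"])
late = run(["P1.st", "P1.ld", "P2.ld", "drain"])
print(early)
print(late)
```

The two schedules differ only in where the unrecorded `drain` falls, yet they yield different values for P2's load. A SPEC that sees only loads and stores cannot tell which case it is in without guessing.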
14
Use Visibility Order style SPECs
-- Already growing in use (Itanium spec, Neiger, Condon, …)
-- Helps export internal events to determinize SPEC’s executions
-- Defines Read Values to depend on most recent write
P1: st(a,1); ld(a);
P2: ld(a);
[Figure: Imp, and SPEC = carbon copy of Imp]
Choices revealed:
st_L(a,1) ; ld(a,1) ; st_G(a,1) ; ld(a,1)
st_L(a,1) ; ld(a,1) ; ld(a,0) ; st_G(a,1)
15
Example of Visibility Order Spec (Condon, HPCA’99)
In non-Visibility Order style
<p : program order    <m : memory order
An execution is in TSO if
(Memory order constraints)
• X <p Y /\ isLD(X) /\ isST(Y) => X <m Y
• X <p MB <p Y => X <m Y
Read value rule
Value of LD, ‘X’ == Value of closest store ‘Y’ before or after ‘X’ in <m (local bypassing detail is messy)
In Visibility Order style
<p : program order    <v : a total order of LD, ST_L, ST_G
An execution is in TSO if
(Memory order constraints)
• conditions on split stores
Read value rule
Value of LD, ‘X’ ==
-- most recent ST_L, when ST_G is after X (local bypassing)
-- most recent ST_G, otherwise (local bypassing not exercised)
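The visibility-order read-value rule is simple enough to state as a checker. The sketch below is an illustration, not the authors' executable SPEC: it assumes events are tuples (kind, processor, address, value), that each processor's ST_G events for an address occur in the same order as its ST_L events, and that memory starts at 0. The function name `check_read_values` is hypothetical.

```python
from collections import defaultdict

def check_read_values(order, init=0):
    """Return True if every load in the total order obeys the split-store read-value rule."""
    mem = defaultdict(lambda: init)   # values made visible by ST_G
    pending = defaultdict(list)       # (proc, addr) -> ST_L values whose ST_G is still to come
    for kind, proc, addr, val in order:
        if kind == "st_L":
            pending[(proc, addr)].append(val)
        elif kind == "st_G":
            queue = pending[(proc, addr)]
            # Assumption: FIFO matching of ST_L and ST_G per processor and address.
            if not queue or queue.pop(0) != val:
                return False          # ST_G with no matching earlier ST_L
            mem[addr] = val
        elif kind == "ld":
            queue = pending[(proc, addr)]
            # Most recent ST_L while its ST_G is still ahead, else most recent ST_G.
            expected = queue[-1] if queue else mem[addr]
            if val != expected:
                return False
    return True

bypass_then_global = [("st_L", "P1", "a", 1), ("ld", "P1", "a", 1),
                      ("st_G", "P1", "a", 1), ("ld", "P2", "a", 1)]
bypass_before_global = [("st_L", "P1", "a", 1), ("ld", "P1", "a", 1),
                        ("ld", "P2", "a", 0), ("st_G", "P1", "a", 1)]
stale_read = [("st_L", "P1", "a", 1), ("ld", "P1", "a", 1),
              ("st_G", "P1", "a", 1), ("ld", "P2", "a", 0)]
print(check_read_values(bypass_then_global))
print(check_read_values(bypass_before_global))
print(check_read_values(stale_read))
```

Because the split events fix where each store becomes globally visible, the check is a single left-to-right pass with no case analysis over bypassing.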
16
Our Contributions
• Visibility order SPECs for a wide range of mem models
• Built executable SPEC generator prototype (runnable over web)
• Verification of refinement using Parallel Murphi (ported to MPI at Utah)
• Verification without bloating IMP’s state-space and without backtracking on SPEC’s executions
• Two snoopy-bus protocols modeled after Alpha and Itanium
• Two snoopy protocols where temporal order != visibility order
• One directory-based protocol (‘Avalanche’ multiprocessor)
18
Approach: Exploit Bug-classification
[Figure: cpu with ld, SB, LB, execution pipeline, L1 cache, L2 cache, …]
Inside CPU chips (Fewer design groups have control over this)
Inside Directories, Interconnects, … (More design groups have control over this)
So… develop Intermediate Abstraction that Retains Internal Partition
19
The Intermediate Abstraction
[Figure: IMP (… dir dir), Intermediate Abstraction, and SPEC (Visibility order, Read-value rule); links labeled THIS PAPER and FUTURE WORK]
• Retain internal partition
• Simplify external partition
20
External Partition Replacement Depends on SPEC Memory Model
( ‘C*’ means ‘Consistency’ )
Strong: S C*, IBM370
Weak: TSO, PSO, RMO, Alpha
Weakest: PC, PowerPC, PRAM, Slow Memory, Cache C*, Causal C*
Hybrid: Itanium, Weak C*, Entry C*, Release C*
21
Abstraction Method for External Partition
Memory Model | Splitting of store instructions | External Partition
Strong       | store unsplit                   | single port memory
Weak         | store -> local, global          | single port memory
Weakest      | store -> local, global, global  | Memory & re-order buffer per processor
Hybrid       | store -> local, global, global  | Memory & re-order buffer per processor
22
Creating the Intermediate Abstraction
[Figure: CPU1 and CPU2, each with Pipe, RB, SB and ld/st ports, on a Snoopy-bus or Directory-based Memory Subsystem]
[Figure: the same CPU1 and CPU2, on One memory (strong/weak) or One memory per CPU (weakest/hybrid)]
23
Overall approach
[Flowchart with Start, Success, and Failure nodes across Phase 1, Phase 2, and Phase 3. Box labels: Define Spec; Generate Executable Spec; Run it, and gain understanding; Final Spec; Design Imp; Annotate Imp with events; Annotated Imp; Derive Impabs; Verify against Impabs; Verify Impabs; Final Imp]
24
Verification: IMP against Intermediate Abstraction
[Figure: events st_L, st_G, and ld matched between the two models; annotations: store in SB, store in M, store in Cache, store in SB, load from LB or from Cache, load from M]
load values agree?
25
Protocol                 | Alpha model w/o Barriers and LL/SC | Itanium w/o weak ld/st, Semaphores (RC_tso)
                         | States (M)  Trans (M)  Time (h)    | States (M)  Trans (M)  Time (h)
Split Trans Bus          | 64          470        0.95        | 111         985        1.75
-- with Scheurich Opt    | 251         1794       3.4         | 325         2769       4.8
Multiple Interleaved Bus | 255         1820       3.6         | 773         2686       11
-- with Scheurich        | 278         1946       3.9         | 927         3402       12
Runs on 16 CPU Parallel Murphi ported to MPI at Utah
Each CPU @ 850 MHz, 256 MB per node (LAN communication)
26
Features of Examples
• Examples with Scheurich’s optimization: -- Logical order != Temporal order
• Directory Protocols: -- verifying a Migratory directory protocol using PV and SPIN found no errors (parallel search not tried)
• Other directory protocols as well as Itanium (hybrid) memory model soon to be tried
27
Bugs likely to be caught
- Not just coherence
- SC violations
- Write atomicity violations
- Hybrid memory ordering violations
- Bugs in internal partition: will be caught when intermediate abstraction compared against SPEC
28
How to scale up?
- Improve parallel model-checker
- Approximate search (e.g., parallel random-walk)
- Bounded model-checking (enumerative or SAT)
- Exploit data independence
- Try many examples, and refine methodology
29
Conclusions
- Efficient use of reachability analysis to verify IMP against weak memory model SPEC
- Applicable to a whole range of weak models
- Selection of Intermediate Abstraction is systematic
- Annotating Intermediate Abstractions is not hard
- State explosion problem is not worsened
An easy-to-use verification technique that multiprocessor designers can use readily.
31
“Visibility Order” explained using SC
• SC executions have a single visibility order, V
• Stores present in V consistent with prog. order (single store order)
• Loads present in V consistent with prog. order
• Each load to address A returns value D that the most recent store in V to A wrote
NON-SC
P1: st(a,1); ld(b,0);
P2: st(b,2); ld(a,0);
st(a,1); ld(b,0); st(b,2); ld(a,0)   whoops!
SC
P1: st(p,1); st(q,2);
P2: ld(q,2); ld(p,1);
st(p,1); st(q,2); ld(q,2); ld(p,1)   OK!
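The rules for V can be turned into a small checker. This is a sketch with assumed names (`is_sc_visibility_order`, `has_sc_order`) and an assumed event encoding (processor, (kind, address, value)): it tests one candidate order against program order and the most-recent-store rule, then tries every permutation to decide whether any legal V exists.

```python
from itertools import permutations

def is_sc_visibility_order(order, programs, init=0):
    """Check one candidate V: program order respected, loads read the most recent store."""
    for proc, prog in programs.items():
        if [ev for p, ev in order if p == proc] != prog:
            return False
    mem = {}
    for _proc, (kind, addr, val) in order:
        if kind == "st":
            mem[addr] = val
        elif mem.get(addr, init) != val:
            return False
    return True

def has_sc_order(programs):
    """Brute-force search over all permutations; fine for four events, not for real traces."""
    events = [(proc, ev) for proc, prog in programs.items() for ev in prog]
    return any(is_sc_visibility_order(list(cand), programs)
               for cand in permutations(events))

non_sc = {"P1": [("st", "a", 1), ("ld", "b", 0)],
          "P2": [("st", "b", 2), ("ld", "a", 0)]}
sc = {"P1": [("st", "p", 1), ("st", "q", 2)],
      "P2": [("ld", "q", 2), ("ld", "p", 1)]}
ok_order = [("P1", ("st", "p", 1)), ("P1", ("st", "q", 2)),
            ("P2", ("ld", "q", 2)), ("P2", ("ld", "p", 1))]
print(has_sc_order(non_sc))
print(is_sc_visibility_order(ok_order, sc))
```

The exhaustive search here is exactly the backtracking that the approach in these slides aims to avoid; it is used only to confirm the two small examples.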
32
Writing visibility order specs for weak memory models…
Can use single or multiple visibility orders [MPV workshop slides, see http://www.cs.utah.edu/mpv]
Single visibility order for TSO
P1: st(a,1); ld(a,1);
P2: xxx; ld(a,0); ld(a,1);
st_L(a,1) ld(a,1) ld(a,0) st_G(a,1) ld(a,1)
Split stores into Local and Global; Single Global-store Order
Multiple VO needed for some weak mem models….
P1: ld(p,1); ld(p,2);
P2: st(p,1); st(p,2);
P3: ld(p,2); ld(p,1);
Visibility order of P1: st(p,1) ld(p,1) st(p,2) ld(p,2)
..of P3: st(p,2) ld(p,2) st(p,1) ld(p,1)
Stores kept unsplit
33
Our main idea
P1: ld(p,1); ld(p,2);
P2: st(p,1); st(p,2);
P3: ld(p,2); ld(p,1);
Single visibility order for Itanium, obtained by splitting every Store into N copies:
st_1(p,1) ld(p,1) st_1(p,2) ld(p,2) st_2(p,2) ld(p,2) st_2(p,1) ld(p,1)
• Always use single Visibility Order
• Makes specification more intuitive
• Can annotate Implementation model with coherency events to obtain generated VO
• Can compare against reliable Spec that encompasses all legal VO using reachability analysis
34
Related Work on Verifying Against Weak Memory Models
• Ghughal et al. [FMPPTA’00]:
-- Extension of Collier’s work to weak memory models
-- Finite-state abstraction of “ARCHTESTs” to detect ordering violations
• Condon, Hill, Plakal, Sorin et al. [HPCA’99]:
-- Idea based on “Lamport Clocks”
-- Define “Wisconsin TSO” ordering for execution events
-- Assign Lamport Clock values to coherency events
-- Manual proof that Lamport Ordering (which traces causalities, and hence read values) implies Wisconsin TSO
-- Defines single visibility order idea, but shows it only for subsets of TSO and Alpha
Main inspiration for our work
35
What are the observable effects on programs?
[Figure: cpu, cpu, … with mem]
P1: ld(a,2); st(b,1);
P2: ld(b,1); st(a,2);
lost atomicity
[Figure: cpu, cpu, … with mem]
P1: st(a,1); ld(b,0);
P2: st(b,2); ld(a,0);
only certain guarantees on executions
[Figure: cpu, cpu]
P1: st(p,1); st.rel(q,2);
P2: ld.acq(q,2); ld(p,1);
36
The Verification Problem
• Shared Memory Implementations are very complex
• Spec (shared memory consistency models) also highly non-trivial
=> Verification engineers face a “double-whammy”
Mini Roadmap:
… Identifying the sources of memory model related bugs
… Related work on verifying against weak memory models
… How to verify against a broad taxonomy of mem models
39
Where are Ordering Relaxations Made?
[Figure: cpu with ld, SB, LB, execution pipeline, L1 cache, L2 cache, …]
Inside CPU chips (Fewer design groups have control over this)
Inside Directories, Interconnects, … (More design groups have control over this)
Techniques that focus on the “external partition” can still be quite useful…
40
Methodology
[Figure: IMP and Intermediate Abstraction]
• Annotate Imp protocol with events of visibility order
-- designer reflects his/her understanding of mem model and Imp
• Replace external partition specific to target memory model
• Annotate intermediate abstraction thus obtained
• Run reachability, matching every visibility event of Imp by one produced by Intermediate Abstraction
41
Taxonomy of memory models, and external partitions for them (can use these in combination for hybrid models)
Strong: Write Atomicity, No local bypassing
Weak: Write Atomicity, Local bypassing
Weakest: No Write Atomicity, Coherence
Hybrid: Instructions of many varieties; Fences, Acq / Rel
Pictures of ext partitions as well as brief explanation (pictorial) of how event-splitting is done