Towards Certified Separate Compilation for
Concurrent ProgramsHanru Jiang* Hongjin Liang†
Siyang Xiao* Junpeng Zha* Xinyu Feng†
* University of Science and Technology of China† Nanjing University
Compilers are NOT Trustworthy
[PLDI 2011]
• 11 open-source/commercial compilers were tested
• Found 325 bugs, in EVERY compiler!
Compilers are NOT Trustworthy
[PLDI 2011]
• 11 open-source/commercial compilers were tested
• Found 325 bugs, in EVERY compiler!
Verification of compiler correctness helps:
“The striking thing about our CompCert results is that the middle end bugs we found in all other compilers are absent.”
Compilation Correctness
S
T
Compiler
Source (e.g. C)
Target (e.g. assembly)
∀S, T . T = Compiler(S) ⟹ T ⊆ S
Correct(Compiler) :
Semantic preservation: T has no more observable behaviors (e.g. I/O events by print) than S.
Compiler Verification
Compiler Verification• Leroy’06: Formal certification of a compiler back-end
• Lochbihler’10: Verifying a compiler for Java threads
• Myreen’10: Verified just-in-time compiler on x86
• Sevcik et al.’11: Relaxed-memory concurrency and verified compilation
• Zhao et al.’13: Formal verification of SSA-based optimizations for LLVM
• Kumar et al.’14: CakeML: A verified implementation of ML
• Stewart et al.’15: Compositional CompCert
• Kang et al.’16: Lightweight Verification of Separate Compilation
• …
Compiler Verification• Leroy’06: Formal certification of a compiler back-end
• Lochbihler’10: Verifying a compiler for Java threads
• Myreen’10: Verified just-in-time compiler on x86
• Sevcik et al.’11: Relaxed-memory concurrency and verified compilation
• Zhao et al.’13: Formal verification of SSA-based optimizations for LLVM
• Kumar et al.’14: CakeML: A verified implementation of ML
• Stewart et al.’15: Compositional CompCert
• Kang et al.’16: Lightweight Verification of Separate Compilation
• …
Limited support of separate compilation and concurrency!
Separate Compilation
S1 S2Source (e.g. C)
Real-world programs may consist of multiple components, which will be compiled independently.
Separate Compilation
S1 S2Source (e.g. C)
Interaction
Real-world programs may consist of multiple components, which will be compiled independently.
// Module S1 extern void g(int *x); int f(){ int a = 0, b = 0; g(&b); return a + b; }
// Module S2 void g(int *x){ *x = 3; }
Separate Compilation
S1 S2Source (e.g. C)
Interaction
Separate Compilation
S1 S2Source (e.g. C)
Interaction
T1Target (e.g. assembly)
Compiler-1
Separate Compilation
S1 S2Source (e.g. C)
Interaction
T1 T2Target (e.g. assembly)
Compiler-1 Compiler-2
Separate Compilation
S1 S2Source (e.g. C)
Interaction
T1 T2Target (e.g. assembly)
Compiler-1 Compiler-2
Interaction
Separate Compilation
S1 S2Source (e.g. C)
Interaction
T1 T2Target (e.g. assembly)
Compiler-1 Compiler-2
Interaction
Different compilersDifferent
compilers
Separate Compilation
S1 S2Source (e.g. C)
Interaction
T1 T2Target (e.g. assembly)
Compiler-1 Compiler-2
Interaction
Different compilersDifferent
compilers
Different Languages
Different languages
Separate Compilation of Concurrent Programs
S1 S2Source (e.g. C)
T1 T2Target (e.g. assembly)
Compiler-1 Compiler-2
Interaction
Interaction
Separate Compilation of Concurrent Programs
S1 S2Source (e.g. C)
Parallel Composition
T1 T2Target (e.g. assembly)
Parallel Composition
||
||
Compiler-1 Compiler-2
Separate Compilation of Concurrent Programs
S1 S2Source (e.g. C)
Parallel Composition
T1 T2Target (e.g. assembly)
Parallel Composition
||
||
Can we reuse existing certified compilers (e.g. CompCert) for separate compilation of concurrent programs?
Compiler-1 Compiler-2
Compiler-1 Compiler-2
Compositional CompCert’s Argument…[Stewart et al. POPL’15]
DRFS1 S2Source (e.g. C)
T1 T2Target (e.g. assembly)
||
||
YES, for data-race-free (DRF) concurrent programs
r1 = 1; r1 = r1 + 1; lock(); x = 1; y = x + 1; unlock();
r2 = 2; r2 = r2 + 1; lock(); x = 2; y = x + 1; unlock();
interleaving
Intuition of the Argument: Interleaving <=> Non-preemptive for DRF Programs
r1 = 1; r1 = r1 + 1; lock(); x = 1; y = x + 1; unlock();
r2 = 2; r2 = r2 + 1; lock(); x = 2; y = x + 1; unlock();
interleaving
No race
Intuition of the Argument: Interleaving <=> Non-preemptive for DRF Programs
r1 = 1; r1 = r1 + 1; lock(); x = 1; y = x + 1; unlock();
r2 = 2; r2 = r2 + 1; lock(); x = 2; y = x + 1; unlock();
interleaving
No race
Intuition of the Argument: Interleaving <=> Non-preemptive for DRF Programs
r1 = 1; r1 = r1 + 1; lock(); x = 1; y = x + 1; unlock();
r2 = 2; r2 = r2 + 1; lock(); x = 2; y = x + 1; unlock();
interleaving
No race Non-preemptive: yield control at
certain points only
sequential
sequentialr1 = 1; r1 = r1 + 1; yield; x = 1; y = x + 1; yield;
r2 = 2; r2 = r2 + 1; yield; x = 2; y = x + 1; yield;
Intuition of the Argument: Interleaving <=> Non-preemptive for DRF Programs
r1 = 1; r1 = r1 + 1; lock(); x = 1; y = x + 1; unlock();
r2 = 2; r2 = r2 + 1; lock(); x = 2; y = x + 1; unlock();
interleaving
No race Non-preemptive: yield control at
certain points only
sequential
sequentialr1 = 1; r1 = r1 + 1; yield; x = 1; y = x + 1; yield;
r2 = 2; r2 = r2 + 1; yield; x = 2; y = x + 1; yield;
Plausible, but need to address several key challenges
Intuition of the Argument: Interleaving <=> Non-preemptive for DRF Programs
Challenges
Challenges• How to formulate DRF in language independent manner?
DRFS1 S2||
Different Languages
Different languages
Challenges• How to formulate DRF in language independent manner?
• How to prove DRF-preservation, compositionally?
DRF
||
S1 S2
T1 T2
||
Challenges• How to formulate DRF in language independent manner?
• How to prove DRF-preservation, compositionally?
DRF
||
S1 S2
T1 T2
||
||T1’ T2’
… …
Challenges• How to formulate DRF in language independent manner?
• How to prove DRF-preservation, compositionally?
DRF
||
S1 S2
T1 T2
||
||T1’ T2’
… …
DRF?
Challenges• How to formulate DRF in language independent manner?
• How to prove DRF-preservation, compositionally?
Challenges• How to formulate DRF in language independent manner?
• How to prove DRF-preservation, compositionally?
• How to support benign-race and relaxed memory models?
Challenges• How to formulate DRF in language independent manner?
• How to prove DRF-preservation, compositionally?
• How to support benign-race and relaxed memory models?
“… synchronization primitives are commonly implemented with assembly code that has data races.”
—— Hans-J. Boehm, HotPar’11
Our Work
Our Work• Language independent verification framework
- Key semantics components + proof structures
- Supports separate compilation for race-free concurrent programs
- With both external function calls & multi-threaded code
Our Work• Language independent verification framework
- Key semantics components + proof structures
- Supports separate compilation for race-free concurrent programs
- With both external function calls & multi-threaded code
• Framework extension:
- Supports x86-TSO + confined benign-races
Our Work• Language independent verification framework
- Key semantics components + proof structures
- Supports separate compilation for race-free concurrent programs
- With both external function calls & multi-threaded code
• Framework extension:
- Supports x86-TSO + confined benign-races
• CASCompCert:
- Extends CompCert with Concurrency + Abstraction + Separate compilation
- Reuses considerable amount of CompCert proofs
- Racy x86-TSO impl. of locks as synchronization library
Outline of this Talk
• Language-independent DRF formulation
• DRF-preservation and key proof structures
• Supporting x86-TSO and confined benign-races in CASCompCert
Outline of this Talk
• Language-independent DRF formulation
• DRF-preservation and key proof structures
• Supporting x86-TSO and confined benign-races in CASCompCert
Language-Independent DRF Data-race: read-write / write-write conflicts
Memory …
write read/write
Language-Independent DRFData-race: read-write / write-write conflicts
Language-Independent DRF
Why language-independent?To support cross-language interaction
Data-race: read-write / write-write conflicts
S1 S2Interaction
May in different languagesMay in different languages
Language-Independent DRF
Why language-independent?To support cross-language interactionabstract away lang. details
e.g. interaction semantics[Stewart et al. POPL’15]
Data-race: read-write / write-write conflicts
S1 S2Interaction
May in different languagesMay in different languages
Language-Independent DRF
Why language-independent?To support cross-language interactionabstract away lang. details
e.g. interaction semantics[Stewart et al. POPL’15]
Data-race: read-write / write-write conflicts
S1 S2Interaction
May in different languagesMay in different languages
NO concrete reads/writes
Language-Independent DRF
Why language-independent?To support cross-language interactionabstract away lang. details
e.g. interaction semantics[Stewart et al. POPL’15]
Data-race: read-write / write-write conflicts
S1 S2Interaction
May in different languagesMay in different languages
NO concrete reads/writes
How to formulate DRF if we do not even know the concrete reads/writes?
Solution: Abstract Footprints
DRF(S1 || S2)
Solution: Abstract Footprints
DRF(S1 || S2)
Defined in terms of footprint disjointness
• footprints δ ::= (rs, ws) read-set write-set
Solution: Abstract Footprints
DRF(S1 || S2)
Defined in terms of footprint disjointness
• footprints δ ::= (rs, ws)
• well-defined language extensional characterization of footprints
read-set write-set
Solution: Abstract Footprints
DRF(S1 || S2)
Defined in terms of footprint disjointness
• footprints δ ::= (rs, ws)
• well-defined language extensional characterization of footprints
read-set write-set
rs
Solution: Abstract Footprints
DRF(S1 || S2)
Defined in terms of footprint disjointness
• footprints δ ::= (rs, ws)
• well-defined language extensional characterization of footprints
read-set write-set
rs
rs
Solution: Abstract Footprints
DRF(S1 || S2)
Defined in terms of footprint disjointness
• footprints δ ::= (rs, ws)
• well-defined language extensional characterization of footprints
read-set write-set
rs
rs
Arbitrarily differentArbitrarily different
rs
Solution: Abstract Footprints
DRF(S1 || S2)
Defined in terms of footprint disjointness
• footprints δ ::= (rs, ws)
• well-defined language extensional characterization of footprints
read-set write-set
rsS
rs
ws
Arbitrarily differentArbitrarily different
rs
Solution: Abstract Footprints
DRF(S1 || S2)
Defined in terms of footprint disjointness
• footprints δ ::= (rs, ws)
• well-defined language extensional characterization of footprints
read-set write-set
rsS
rs rsS
ws
ws
Arbitrarily differentArbitrarily different Same
Solution: Abstract Footprints
DRF(S1 || S2)
Defined in terms of footprint disjointness
• footprints δ ::= (rs, ws) read-set write-set
• well-defined language extensional characterization of footprints
Solution: Abstract Footprints
DRF(S1 || S2)
Defined in terms of footprint disjointness
• footprints δ ::= (rs, ws) read-set write-set
• well-defined language extensional characterization of footprints
Outline of this Talk
• Language-independent DRF formulation
• DRF-preservation and key proof structures
• Supporting x86-TSO and confined benign-races in CASCompCert
Outline of this Talk
• Language-independent DRF formulation
• DRF-preservation and key proof structures
• Supporting x86-TSO and confined benign-races in CASCompCert
Compositional CompCert’s Argument…
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
?
Compositional CompCert’s Argument…
T1 | T2 ⊆ S1 | S2
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
?
Non-preemptive
Compositional CompCert’s Argument…
T1 | T2 ⊆ S1 | S2
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
?
?
Non-preemptive
Our Ideas
T1 | T2 ⊆ S1 | S2
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
?
Non-preemptive
?
Our Ideas
T1 | T2 ⊆ S1 | S2
⊆
⊆
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
?
Non-preemptive
Our Ideas
T1 | T2 ⊆ S1 | S2
⊆
⊆
Trivial
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
?
Non-preemptive
Our Ideas
T1 | T2 ⊆ S1 | S2
⊆
⊆
Trivial
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
?
Non-preemptive
?
Our Ideas
T1 | T2 ⊆ S1 | S2
⊆
⊆
TrivialDRF(T1 || T2)
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
?
Non-preemptive
Our Ideas
T1 | T2 ⊆ S1 | S2
⊆
⊆
TrivialDRF(T1 || T2)
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
?
Non-preemptive
?
Our Ideas
T1 | T2 ⊆ S1 | S2
⊆
⊆
TrivialDRF(T1 || T2)
?
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
?
Non-preemptive
Our Ideas
T1 | T2 ⊆ S1 | S2
⊆
⊆
TrivialDRF(T1 || T2)
?
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
?
Non-preemptive
How to prove DRF-preservation?
Our Ideas
T1 | T2 ⊆ S1 | S2
⊆
⊆
TrivialDRF(T1 || T2)
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
?
Our Ideas
T1 | T2 ⊆ S1 | S2
⊆
⊆
TrivialDRF(T1 || T2)
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
T1 | T2 ≤ S1 | S2 ≲
Footprint- preserving simulation
?
Our Ideas
T1 | T2 ⊆ S1 | S2
⊆
⊆
TrivialDRF(T1 || T2)
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
T1 | T2 ≤ S1 | S2 ≲
Footprint- preserving simulation
?
Our Ideas
T1 | T2 ⊆ S1 | S2
⊆
⊆
TrivialDRF(T1 || T2)
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
T1 | T2 ≤ S1 | S2 ≲
Footprint- preserving simulation
DRF preservation
Our Ideas
T1 | T2 ⊆ S1 | S2
T1 ≤ S1 /\ T2 ≤ S2
⊆
⊆
TrivialDRF(T1 || T2)
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
≲ ≲
T1 | T2 ≤ S1 | S2 ≲
Footprint- preserving simulation
DRF preservation Compositionality
Our Ideas
T1 | T2 ⊆ S1 | S2
T1 ≤ S1 /\ T2 ≤ S2
⊆
⊆
TrivialDRF(T1 || T2)
T1 || T2 ⊆ S1 || S2
DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)
≲ ≲
T1 | T2 ≤ S1 | S2 ≲
Footprint- preserving simulation
DRF preservation Compositionality
Our compiler correctness
(T, σ)
(S, Σ)
≤
Solution: Footprint-Preserving Simulation
Source state
Target state≲
Target has smaller footprints, so cannot introduce more races
(T, σ)
(S, Σ)
≤
Solution: Footprint-Preserving Simulation
≲
Target has smaller footprints, so cannot introduce more races
(T, σ)
(S, Σ)
≤
(T’, σ’)
Solution: Footprint-Preserving Simulation
≲
Target has smaller footprints, so cannot introduce more races
(T, σ)
(S, Σ)
≤
(S’, Σ’)*
(T’, σ’)
Solution: Footprint-Preserving Simulation
Zero-or-multiple steps
≲
Target has smaller footprints, so cannot introduce more races
(T, σ)
(S, Σ)
≤ ≤
(S’, Σ’)*
(T’, σ’)
Solution: Footprint-Preserving Simulation
Zero-or-multiple steps
≲ ≲
Target has smaller footprints, so cannot introduce more races
(T, σ)
(S, Σ) …
…
≤ ≤
(S’, Σ’)*
(T’, σ’)
Solution: Footprint-Preserving Simulation
≲ ≲
Target has smaller footprints, so cannot introduce more races
(T, σ)
(S, Σ) …
…
≤ ≤
(S’, Σ’)*
(T’, σ’)
Solution: Footprint-Preserving Simulation
≲ ≲
Δ
δ
Δ, δ: Footprints
Target has smaller footprints, so cannot introduce more races
(T, σ)
(S, Σ) …
…
≤ ≤
(S’, Σ’)*
(T’, σ’)
Solution: Footprint-Preserving Simulation
≲ ≲
Δ
δ
Δ, δ: Footprints ⊆
Target has smaller footprints, so cannot introduce more races
Outline of this Talk
• Language-independent DRF formulation
• DRF-preservation and key proof structures
• Supporting x86-TSO and confined benign-races in CASCompCert
Outline of this Talk
• Language-independent DRF formulation
• DRF-preservation and key proof structures
• Supporting x86-TSO and confined benign-races in CASCompCert
DRF Imposes Strong Restriction on Libraries
Lib
zCall
T1 ||
Callz
T2
DRFClient
DRF Imposes Strong Restriction on Libraries
lock_rel: … mov $1, %eax lock xchg %eax, L …
Lib
zCall
T1 ||
Callz
T2
DRFClient
DRF Imposes Strong Restriction on Libraries
lock_rel: … mov $1, %eax lock xchg %eax, L …
Lib
zCall
T1 ||
Callz
T2
DRF ! inefficient
lock_rel: … mov $1, L …
[spin-lock impl. in Linux 2.6]
Client
DRF Imposes Strong Restriction on Libraries
lock_rel: … mov $1, %eax lock xchg %eax, L …
Lib
zCall
T1 ||
Callz
T2
DRF ! inefficient
lock_rel: … mov $1, L …
[spin-lock impl. in Linux 2.6]
Client
Racy
DRF Imposes Strong Restriction on Libraries
lock_rel: … mov $1, %eax lock xchg %eax, L …
Lib
zCall
T1 ||
Callz
T2
DRF ! inefficient
lock_rel: … mov $1, L …
[spin-lock impl. in Linux 2.6]
Client
Relaxed memory model, e.g. x86-TSO
Racy
DRF Imposes Strong Restriction on Libraries
lock_rel: … mov $1, %eax lock xchg %eax, L …
Lib
zCall
T1 ||
Callz
T2
DRF ! inefficient
lock_rel: … mov $1, L …
[spin-lock impl. in Linux 2.6]
?
Client
Relaxed memory model, e.g. x86-TSO
Racy
Our Idea
Racy RF
Our Idea
Racy Lib’
zCall
T1 ||
Callz
T2
Client
Racy RF
Our Idea• Confined benign-races:
• Racy libraries and client code run in separate memory regions
Racy Lib’
zCall
T1 ||
Callz
T2
Client
Racy RF
Our Idea• Confined benign-races:
• Racy libraries and client code run in separate memory regions• Client code be well-synchronized
Racy Lib’
zCall
T1 ||
Callz
T2
Client Well-synchronized
Racy RF
Our Idea• Confined benign-races:
• Racy libraries and client code run in separate memory regions• Client code be well-synchronized• Racy libraries have race-free abstraction
Racy Lib’
zCall
T1 ||
Callz
T2
z
Call
T1 ||Call
z
T2
DRF
LibRace-free
abstraction
⊆
Client Well-synchronized
Racy RF
Source P Clight CallLock…
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Source P Clight CallLock…
Multi-threaded
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Source P Clight CallLock…
Multi-threaded race-free abstraction of spin-locks for synchronization
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Source P Clight CallLock…
Multi-threaded race-free abstraction of spin-locks for synchronization
DRF
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Clight
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Source PDRF
CallLock…
Race-free
Clight
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Target P
CompCert …
Source PDRF
CallLock…
x86…
Race-free
Clight
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Target P
CompCert … identity
Source PDRF
Call
Call
Lock
Lock
…
x86…
Race-free
Clight
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Target P
CompCert … identity
Source PDRF
Call
Call
Lock
Lock
…
x86…
Race-free
x86-SC
Clight
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Target P
CompCert … identity
Source P
Manually impl.
Lock-Impl
DRF
with benign-races
Call
Call
Lock
Lock
…
x86…
Race-free
x86-SC
Clight
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Target P
CompCert … identity
Source P
Manually impl.
identity …
Lock-ImplTarget P’TSO
DRF
with benign-races
Call
Call
Call
Lock
Lock
…
x86…
x86-TSO…
Race-free
x86-SC
x86-TSO semantics
Clight
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Target P
CompCert … identity
Source P
Manually impl.
identity …
Lock-ImplTarget P’TSO
DRF
with benign-races
Call
Call
Call
Lock
Lock
…
x86…
x86-TSO…
Race-free
x86-SC
x86-TSO semantics
Clight
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Target P
CompCert … identity
Source P
Manually impl.
identity …
Lock-Impl
⊆
Target P’TSO
DRF
with benign-races
Call
Call
Call
Lock
Lock
…
x86…
x86-TSO…
Race-free
x86-SC
x86-TSO semantics
Clight
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Target P
CompCert … identity
Source P
≤oidentity …
Lock-Impl
⊆
Target P’TSO
DRF
with benign-races
Call
Call
Call
Lock
Lock
…
x86…
x86-TSO…
Race-free
x86-SC
x86-TSO semantics
Clight
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Target P
CompCert … identity
Source P
≤oidentity …
Lock-Impl
⊆
Target P’TSO
DRF
with benign-races
Call
Call
Call
Lock
Lock
…
x86…
x86-TSO…
Race-free
x86-SC
x86-TSO semantics
Clight
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Target P
CompCert … identity
Source P
≤oidentity …
Lock-Impl
⊆
Target P’TSO
⊆DRF
with benign-races
Call
Call
Call
Lock
Lock
…
x86…
x86-TSO…
Race-free
x86-SC
x86-TSO semantics
Clight
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Target P
CompCert … identity
Source P
≤oidentity …
Lock-Impl
⊆
Target P’TSO
⊆DRF
with benign-races
Call
Call
Call
Lock
Lock
…
x86…
x86-TSO…
Race-free
x86-SC
x86-TSO semantics
Clight
Supporting Confined Benign-Race & x86-TSO in Our CASCompCert
Target P
CompCert … identity
Source P
≤oidentity …
Lock-Impl
⊆
Target P’TSO
⊆DRF
with benign-races
Call
Call
Call
Lock
Lock
…
x86…
x86-TSO…
Race-free
x86-SC
x86-TSO semantics
CompCert PassesCompCert C Clight C#minor Cminor
CminorSel
RTL
LTLLinearMachPower PC
RISC-V
x86
ARM
SimplLocals
TunnelingCleanupLabels
Tailcall, Renumber, Inlining, Constprop, CSE,
Deadcode,Unusedglob
20 passes
Verified 12/20 Passes in CASCompCert
CompCert C Clight C#minor Cminor
CminorSel
RTL
LTLLinearMach
SimplLocals
TunnelingCleanupLabels
Tailcall, Renumber, Inlining, Constprop, CSE,
Deadcode,Unusedglob
20 passesVerified 12/
Power PC
RISC-V
x86
ARM
Verified 12/20 Passes in CASCompCert
CompCert C Clight C#minor Cminor
CminorSel
RTL
LTLLinearMach
SimplLocals
TunnelingCleanupLabels
Tailcall, Renumber, Inlining, Constprop, CSE,
Deadcode,Unusedglob
20 passesVerified 12/
Power PC
RISC-V
x86
ARM
Including all the translation passes from Clight to x86
Reused Considerable Amount of CompCert Proofs. Framework is Challenging to Implement.Towards Certified Separate Compilation for Concurrent Programs PLDI ’19, June 22–26, 2019, Phoenix, AZ, USA
Compilation passes andframework
Spec ProofCompCert Ours CompCert Ours
Cshmgen 515 1021 1071 1503Cminorgen 753 1556 1152 1251Selection 336 500 647 783RTLgen 428 543 821 862Tailcall 173 328 275 405Renumber 86 245 117 358Allocation 704 785 1410 1700Tunneling 131 339 166 475Linearize 236 371 349 733CleanupLabels 126 387 161 388Stacking 730 1038 1108 2135Asmgen 208 338 571 1128Compositionality (Lem. 6) 580 2249DRF preservation (Lem. 8) 358 1142Semantics equiv. (Lem. 9) 1540 4718Lifting 813 1795
Figure 13. Lines of code (using coqwc) in Coq
Here the premises 1-3 are similar to those required inDef. 11. In addition, the premise 4 requires that the x86-TSOcode πo of the object be simulated by γo . The simulation(tl, geo, πo) !
o (slCImp, geo,γo) is an extension of Liang andFeng [19] with the support of TSO semantics for the low-level code. Due to space limit, we omit the definition here.The refinement relation ⊑′ is a weaker version of ⊑ (see
Sec. 3.2). It does not preserve termination (the formal defini-tion omitted here). This is because our simulation!o for theobject code does not preserve termination for now, whichwe leave as future work.
Theorem 15 can be derived from Thm. 14 (for the compila-tion from Clight to x86-SC), and from Lem. 16 below, sayingthe x86-TSO code refines the x86-SC client code and thesource object code (we use tlsc and tltso as shorter notationsfor tlx86-SC and tlx86-TSO respectively).
Lemma 16 (Restore SC semantics for DRF x86 programs).Let Πsc = {(tlsc, ge1, π1), . . . , (tlsc, gem, πm ), (slCImp, geo,γo )},and Πtso = {(tltso, ge1, π1), . . . , (tltso, gem, πm ), (tltso, geo, πo )}.For any f1 . . . fn , if
1. Safe(let Πsc in f1 ∥ . . . ∥ fn ) and DRF(let Πsc in f1 ∥ . . . ∥ fn ),2. (tltso, geo, πo ) !
o (slCImp, geo,γo ),
then let Πtso in f1 ∥ . . . ∥ fn ⊑′ let Πsc in f1 ∥ . . . ∥ fn .
As explained before, Lem. 16 can be viewed as a strength-ened DRF-guarantee theorem for x86-TSO in that, if we letγo contain only skip and geo = ∅, Lem. 16 implies the DRF-guarantee of x86-TSO.
7.4 Proof Efforts in Coq
In Coq we have mechanized the framework (Fig. 2) andthe extended framework (Fig. 3) and proved all the relatedlemmas. We have verified all the CompCert passes in Fig. 11.Statistics of our Coq implementation and proofs are de-
picted in Fig. 13. Adapting the compilation correctness proofsfrom CompCert is relatively lightweight. For most passes
our proofs are within 300 lines of code more than the origi-nal CompCert proofs. The Stacking pass introduces moreadditional proofs, mostly caused by arguments marshallingfor supporting cross-language linking. In our experience,adapting CompCert’s original compilation proofs to our set-tings takes less than one person week per translation pass(except for Stacking). For simpler passes such as Tailcall,Linearize, Allocation, and RTLgen, it takes less than oneperson day per pass.By contrast, implementing our framework is more chal-
lenging, which took us about 1 person year. In particular,proving the equivalence between non-preemptive and pre-emptive semantics for DRF programs took us more time thanexpected, although it seems to be a well-known folklore the-orem. The co-inductive proofs there involve a large numberof non-trivial cases of reordering threads’ executions.
8 Related Work and ConclusionCompiler verification. Variouswork extends CompCert [16]to support separate compilation or concurrency. We havediscussed Compositional CompCert [2, 29] in Sec. 1 and 2.SepCompCert [15] extends CompCert with the support ofsyntactical linking. Their approach requires all the compila-tion units be compiled by CompCert. They do not supportcross-language linking or concurrency as we do.
CompCertTSO [27] compiles ClightTSO programs to thex86-TSO machine. It does not support cross-language link-ing, and its proof for the two CompCert passes Stackingand Cminorgen are not compositional. By contrast, we haveverified these two passes using our compositional simulation.For the other compositional passes, CompCertTSO relies ona thread-local simulation, which is stronger than ours. Itrequires that the source and the target always generate thesame memory events (excepts for those local variables thatcan be stored in registers). As a result, some optimizations(such as constant propagation and CSE) in CompCertTSOhave to be more restrictive.
As an extension of CompCertTSO, Jagannathan et al. [12]allow the compiler to inject racy code such as the efficientspin lock in Fig. 10. They propose a refinement calculus onthe racy code to ensure the compilation correctness. Theirwork looks similar to our extended framework in Fig. 3, butsince they use TSO semantics for both the source and targetprograms, they do not need to handle the gap between theSC and TSO semantics, so they do not need the source to beDRF as in our work.Podkopaev et al. [24] prove correctness of the compila-
tion from the promising semantics (which is a high-leveloperational relaxed model) to the operational ARMv8-POPmachine. They develop whole-program simulations to dealwith the complicated relaxed behaviors. Later on they verifycompilations from the promising semantics to declarativehardware models such as POWER, ARMv7 and ARMv8 [25].
156
Reused Considerable Amount of CompCert Proofs. Framework is Challenging to Implement.
Compiler verification: 100 - 400 more lines of Coq proof for most passes
Towards Certified Separate Compilation for Concurrent Programs PLDI ’19, June 22–26, 2019, Phoenix, AZ, USA
Compilation passes andframework
Spec ProofCompCert Ours CompCert Ours
Cshmgen 515 1021 1071 1503Cminorgen 753 1556 1152 1251Selection 336 500 647 783RTLgen 428 543 821 862Tailcall 173 328 275 405Renumber 86 245 117 358Allocation 704 785 1410 1700Tunneling 131 339 166 475Linearize 236 371 349 733CleanupLabels 126 387 161 388Stacking 730 1038 1108 2135Asmgen 208 338 571 1128Compositionality (Lem. 6) 580 2249DRF preservation (Lem. 8) 358 1142Semantics equiv. (Lem. 9) 1540 4718Lifting 813 1795
Figure 13. Lines of code (using coqwc) in Coq
Here the premises 1-3 are similar to those required inDef. 11. In addition, the premise 4 requires that the x86-TSOcode πo of the object be simulated by γo . The simulation(tl, geo, πo) !
o (slCImp, geo,γo) is an extension of Liang andFeng [19] with the support of TSO semantics for the low-level code. Due to space limit, we omit the definition here.The refinement relation ⊑′ is a weaker version of ⊑ (see
Sec. 3.2). It does not preserve termination (the formal defini-tion omitted here). This is because our simulation!o for theobject code does not preserve termination for now, whichwe leave as future work.
Theorem 15 can be derived from Thm. 14 (for the compila-tion from Clight to x86-SC), and from Lem. 16 below, sayingthe x86-TSO code refines the x86-SC client code and thesource object code (we use tlsc and tltso as shorter notationsfor tlx86-SC and tlx86-TSO respectively).
Lemma 16 (Restore SC semantics for DRF x86 programs).Let Πsc = {(tlsc, ge1, π1), . . . , (tlsc, gem, πm ), (slCImp, geo,γo )},and Πtso = {(tltso, ge1, π1), . . . , (tltso, gem, πm ), (tltso, geo, πo )}.For any f1 . . . fn , if
1. Safe(let Πsc in f1 ∥ . . . ∥ fn ) and DRF(let Πsc in f1 ∥ . . . ∥ fn ),2. (tltso, geo, πo ) !
o (slCImp, geo,γo ),
then let Πtso in f1 ∥ . . . ∥ fn ⊑′ let Πsc in f1 ∥ . . . ∥ fn .
As explained before, Lem. 16 can be viewed as a strength-ened DRF-guarantee theorem for x86-TSO in that, if we letγo contain only skip and geo = ∅, Lem. 16 implies the DRF-guarantee of x86-TSO.
7.4 Proof Efforts in Coq
In Coq we have mechanized the framework (Fig. 2) andthe extended framework (Fig. 3) and proved all the relatedlemmas. We have verified all the CompCert passes in Fig. 11.Statistics of our Coq implementation and proofs are de-
picted in Fig. 13. Adapting the compilation correctness proofsfrom CompCert is relatively lightweight. For most passes
our proofs are within 300 lines of code more than the origi-nal CompCert proofs. The Stacking pass introduces moreadditional proofs, mostly caused by arguments marshallingfor supporting cross-language linking. In our experience,adapting CompCert’s original compilation proofs to our set-tings takes less than one person week per translation pass(except for Stacking). For simpler passes such as Tailcall,Linearize, Allocation, and RTLgen, it takes less than oneperson day per pass.By contrast, implementing our framework is more chal-
lenging, which took us about 1 person year. In particular,proving the equivalence between non-preemptive and pre-emptive semantics for DRF programs took us more time thanexpected, although it seems to be a well-known folklore the-orem. The co-inductive proofs there involve a large numberof non-trivial cases of reordering threads’ executions.
8 Related Work and ConclusionCompiler verification. Variouswork extends CompCert [16]to support separate compilation or concurrency. We havediscussed Compositional CompCert [2, 29] in Sec. 1 and 2.SepCompCert [15] extends CompCert with the support ofsyntactical linking. Their approach requires all the compila-tion units be compiled by CompCert. They do not supportcross-language linking or concurrency as we do.
CompCertTSO [27] compiles ClightTSO programs to thex86-TSO machine. It does not support cross-language link-ing, and its proof for the two CompCert passes Stackingand Cminorgen are not compositional. By contrast, we haveverified these two passes using our compositional simulation.For the other compositional passes, CompCertTSO relies ona thread-local simulation, which is stronger than ours. Itrequires that the source and the target always generate thesame memory events (excepts for those local variables thatcan be stored in registers). As a result, some optimizations(such as constant propagation and CSE) in CompCertTSOhave to be more restrictive.
As an extension of CompCertTSO, Jagannathan et al. [12]allow the compiler to inject racy code such as the efficientspin lock in Fig. 10. They propose a refinement calculus onthe racy code to ensure the compilation correctness. Theirwork looks similar to our extended framework in Fig. 3, butsince they use TSO semantics for both the source and targetprograms, they do not need to handle the gap between theSC and TSO semantics, so they do not need the source to beDRF as in our work.Podkopaev et al. [24] prove correctness of the compila-
tion from the promising semantics (which is a high-leveloperational relaxed model) to the operational ARMv8-POPmachine. They develop whole-program simulations to dealwith the complicated relaxed behaviors. Later on they verifycompilations from the promising semantics to declarativehardware models such as POWER, ARMv7 and ARMv8 [25].
156
Reused Considerable Amount of CompCert Proofs. Framework is Challenging to Implement.
Compiler verification: 100 - 400 more lines of Coq proof for most passes
Towards Certified Separate Compilation for Concurrent Programs PLDI ’19, June 22–26, 2019, Phoenix, AZ, USA
Compilation passes andframework
Spec ProofCompCert Ours CompCert Ours
Cshmgen 515 1021 1071 1503Cminorgen 753 1556 1152 1251Selection 336 500 647 783RTLgen 428 543 821 862Tailcall 173 328 275 405Renumber 86 245 117 358Allocation 704 785 1410 1700Tunneling 131 339 166 475Linearize 236 371 349 733CleanupLabels 126 387 161 388Stacking 730 1038 1108 2135Asmgen 208 338 571 1128Compositionality (Lem. 6) 580 2249DRF preservation (Lem. 8) 358 1142Semantics equiv. (Lem. 9) 1540 4718Lifting 813 1795
Figure 13. Lines of code (using coqwc) in Coq
Here the premises 1-3 are similar to those required inDef. 11. In addition, the premise 4 requires that the x86-TSOcode πo of the object be simulated by γo . The simulation(tl, geo, πo) !
o (slCImp, geo,γo) is an extension of Liang andFeng [19] with the support of TSO semantics for the low-level code. Due to space limit, we omit the definition here.The refinement relation ⊑′ is a weaker version of ⊑ (see
Sec. 3.2). It does not preserve termination (the formal defini-tion omitted here). This is because our simulation!o for theobject code does not preserve termination for now, whichwe leave as future work.
Theorem 15 can be derived from Thm. 14 (for the compila-tion from Clight to x86-SC), and from Lem. 16 below, sayingthe x86-TSO code refines the x86-SC client code and thesource object code (we use tlsc and tltso as shorter notationsfor tlx86-SC and tlx86-TSO respectively).
Lemma 16 (Restore SC semantics for DRF x86 programs).Let Πsc = {(tlsc, ge1, π1), . . . , (tlsc, gem, πm ), (slCImp, geo,γo )},and Πtso = {(tltso, ge1, π1), . . . , (tltso, gem, πm ), (tltso, geo, πo )}.For any f1 . . . fn , if
1. Safe(let Πsc in f1 ∥ . . . ∥ fn ) and DRF(let Πsc in f1 ∥ . . . ∥ fn ),2. (tltso, geo, πo ) !
o (slCImp, geo,γo ),
then let Πtso in f1 ∥ . . . ∥ fn ⊑′ let Πsc in f1 ∥ . . . ∥ fn .
As explained before, Lem. 16 can be viewed as a strength-ened DRF-guarantee theorem for x86-TSO in that, if we letγo contain only skip and geo = ∅, Lem. 16 implies the DRF-guarantee of x86-TSO.
7.4 Proof Efforts in Coq
In Coq we have mechanized the framework (Fig. 2) andthe extended framework (Fig. 3) and proved all the relatedlemmas. We have verified all the CompCert passes in Fig. 11.Statistics of our Coq implementation and proofs are de-
picted in Fig. 13. Adapting the compilation correctness proofsfrom CompCert is relatively lightweight. For most passes
our proofs are within 300 lines of code more than the origi-nal CompCert proofs. The Stacking pass introduces moreadditional proofs, mostly caused by arguments marshallingfor supporting cross-language linking. In our experience,adapting CompCert’s original compilation proofs to our set-tings takes less than one person week per translation pass(except for Stacking). For simpler passes such as Tailcall,Linearize, Allocation, and RTLgen, it takes less than oneperson day per pass.By contrast, implementing our framework is more chal-
lenging, which took us about 1 person year. In particular,proving the equivalence between non-preemptive and pre-emptive semantics for DRF programs took us more time thanexpected, although it seems to be a well-known folklore the-orem. The co-inductive proofs there involve a large numberof non-trivial cases of reordering threads’ executions.
8 Related Work and ConclusionCompiler verification. Variouswork extends CompCert [16]to support separate compilation or concurrency. We havediscussed Compositional CompCert [2, 29] in Sec. 1 and 2.SepCompCert [15] extends CompCert with the support ofsyntactical linking. Their approach requires all the compila-tion units be compiled by CompCert. They do not supportcross-language linking or concurrency as we do.
CompCertTSO [27] compiles ClightTSO programs to thex86-TSO machine. It does not support cross-language link-ing, and its proof for the two CompCert passes Stackingand Cminorgen are not compositional. By contrast, we haveverified these two passes using our compositional simulation.For the other compositional passes, CompCertTSO relies ona thread-local simulation, which is stronger than ours. Itrequires that the source and the target always generate thesame memory events (excepts for those local variables thatcan be stored in registers). As a result, some optimizations(such as constant propagation and CSE) in CompCertTSOhave to be more restrictive.
As an extension of CompCertTSO, Jagannathan et al. [12]allow the compiler to inject racy code such as the efficientspin lock in Fig. 10. They propose a refinement calculus onthe racy code to ensure the compilation correctness. Theirwork looks similar to our extended framework in Fig. 3, butsince they use TSO semantics for both the source and targetprograms, they do not need to handle the gap between theSC and TSO semantics, so they do not need the source to beDRF as in our work.Podkopaev et al. [24] prove correctness of the compila-
tion from the promising semantics (which is a high-leveloperational relaxed model) to the operational ARMv8-POPmachine. They develop whole-program simulations to dealwith the complicated relaxed behaviors. Later on they verifycompilations from the promising semantics to declarativehardware models such as POWER, ARMv7 and ARMv8 [25].
156
Reused Considerable Amount of CompCert Proofs. Framework is Challenging to Implement.
Compiler verification: 100 - 400 more lines of Coq proof for most passes
Towards Certified Separate Compilation for Concurrent Programs PLDI ’19, June 22–26, 2019, Phoenix, AZ, USA
Compilation passes andframework
Spec ProofCompCert Ours CompCert Ours
Cshmgen 515 1021 1071 1503Cminorgen 753 1556 1152 1251Selection 336 500 647 783RTLgen 428 543 821 862Tailcall 173 328 275 405Renumber 86 245 117 358Allocation 704 785 1410 1700Tunneling 131 339 166 475Linearize 236 371 349 733CleanupLabels 126 387 161 388Stacking 730 1038 1108 2135Asmgen 208 338 571 1128Compositionality (Lem. 6) 580 2249DRF preservation (Lem. 8) 358 1142Semantics equiv. (Lem. 9) 1540 4718Lifting 813 1795
Figure 13. Lines of code (using coqwc) in Coq
Here the premises 1-3 are similar to those required inDef. 11. In addition, the premise 4 requires that the x86-TSOcode πo of the object be simulated by γo . The simulation(tl, geo, πo) !
o (slCImp, geo,γo) is an extension of Liang andFeng [19] with the support of TSO semantics for the low-level code. Due to space limit, we omit the definition here.The refinement relation ⊑′ is a weaker version of ⊑ (see
Sec. 3.2). It does not preserve termination (the formal defini-tion omitted here). This is because our simulation!o for theobject code does not preserve termination for now, whichwe leave as future work.
Theorem 15 can be derived from Thm. 14 (for the compila-tion from Clight to x86-SC), and from Lem. 16 below, sayingthe x86-TSO code refines the x86-SC client code and thesource object code (we use tlsc and tltso as shorter notationsfor tlx86-SC and tlx86-TSO respectively).
Lemma 16 (Restore SC semantics for DRF x86 programs).Let Πsc = {(tlsc, ge1, π1), . . . , (tlsc, gem, πm ), (slCImp, geo,γo )},and Πtso = {(tltso, ge1, π1), . . . , (tltso, gem, πm ), (tltso, geo, πo )}.For any f1 . . . fn , if
1. Safe(let Πsc in f1 ∥ . . . ∥ fn ) and DRF(let Πsc in f1 ∥ . . . ∥ fn ),2. (tltso, geo, πo ) !
o (slCImp, geo,γo ),
then let Πtso in f1 ∥ . . . ∥ fn ⊑′ let Πsc in f1 ∥ . . . ∥ fn .
As explained before, Lem. 16 can be viewed as a strength-ened DRF-guarantee theorem for x86-TSO in that, if we letγo contain only skip and geo = ∅, Lem. 16 implies the DRF-guarantee of x86-TSO.
7.4 Proof Efforts in Coq
In Coq we have mechanized the framework (Fig. 2) andthe extended framework (Fig. 3) and proved all the relatedlemmas. We have verified all the CompCert passes in Fig. 11.Statistics of our Coq implementation and proofs are de-
picted in Fig. 13. Adapting the compilation correctness proofsfrom CompCert is relatively lightweight. For most passes
our proofs are within 300 lines of code more than the origi-nal CompCert proofs. The Stacking pass introduces moreadditional proofs, mostly caused by arguments marshallingfor supporting cross-language linking. In our experience,adapting CompCert’s original compilation proofs to our set-tings takes less than one person week per translation pass(except for Stacking). For simpler passes such as Tailcall,Linearize, Allocation, and RTLgen, it takes less than oneperson day per pass.By contrast, implementing our framework is more chal-
lenging, which took us about 1 person year. In particular,proving the equivalence between non-preemptive and pre-emptive semantics for DRF programs took us more time thanexpected, although it seems to be a well-known folklore the-orem. The co-inductive proofs there involve a large numberof non-trivial cases of reordering threads’ executions.
8 Related Work and ConclusionCompiler verification. Variouswork extends CompCert [16]to support separate compilation or concurrency. We havediscussed Compositional CompCert [2, 29] in Sec. 1 and 2.SepCompCert [15] extends CompCert with the support ofsyntactical linking. Their approach requires all the compila-tion units be compiled by CompCert. They do not supportcross-language linking or concurrency as we do.
CompCertTSO [27] compiles ClightTSO programs to thex86-TSO machine. It does not support cross-language link-ing, and its proof for the two CompCert passes Stackingand Cminorgen are not compositional. By contrast, we haveverified these two passes using our compositional simulation.For the other compositional passes, CompCertTSO relies ona thread-local simulation, which is stronger than ours. Itrequires that the source and the target always generate thesame memory events (excepts for those local variables thatcan be stored in registers). As a result, some optimizations(such as constant propagation and CSE) in CompCertTSOhave to be more restrictive.
As an extension of CompCertTSO, Jagannathan et al. [12]allow the compiler to inject racy code such as the efficientspin lock in Fig. 10. They propose a refinement calculus onthe racy code to ensure the compilation correctness. Theirwork looks similar to our extended framework in Fig. 3, butsince they use TSO semantics for both the source and targetprograms, they do not need to handle the gap between theSC and TSO semantics, so they do not need the source to beDRF as in our work.Podkopaev et al. [24] prove correctness of the compila-
tion from the promising semantics (which is a high-leveloperational relaxed model) to the operational ARMv8-POPmachine. They develop whole-program simulations to dealwith the complicated relaxed behaviors. Later on they verifycompilations from the promising semantics to declarativehardware models such as POWER, ARMv7 and ARMv8 [25].
156
Framework impl.: > 60k LoC, ~ 1 person year
Conclusion
Conclusion• Language independent verification framework
- Key semantics components + proof structures
- Supports separate compilation for race-free concurrent programs
- Well-defined language for language-independent DRF
- Footprint-preserving simulation for DRF preservation
Conclusion• Language independent verification framework
- Key semantics components + proof structures
- Supports separate compilation for race-free concurrent programs
- Well-defined language for language-independent DRF
- Footprint-preserving simulation for DRF preservation
• Framework extension:
- Support x86-TSO + confined benign-races
Conclusion• Language independent verification framework
- Key semantics components + proof structures
- Supports separate compilation for race-free concurrent programs
- Well-defined language for language-independent DRF
- Footprint-preserving simulation for DRF preservation
• Framework extension:
- Support x86-TSO + confined benign-races
• CASCompCert:
- Reused considerable amount of CompCert proofs
- Racy x86-TSO impl. of locks as synchronization library
Thank you!
Top Related