Towards Certiﬁed Separate Compilation for Concurrent Programs · S1 S2 Source (e.g. C) Real-world...

Towards Certified Separate Compilation for

Concurrent ProgramsHanru Jiang* Hongjin Liang†

Siyang Xiao* Junpeng Zha* Xinyu Feng†

* University of Science and Technology of China† Nanjing University

Compilers are NOT Trustworthy

[PLDI 2011]

• 11 open-source/commercial compilers were tested

• Found 325 bugs, in EVERY compiler!

Compilers are NOT Trustworthy

[PLDI 2011]

• 11 open-source/commercial compilers were tested

• Found 325 bugs, in EVERY compiler!

Verification of compiler correctness helps:

“The striking thing about our CompCert results is that the middle end bugs we found in all other compilers are absent.”

Compilation Correctness

S

T

Compiler

Source (e.g. C)

Target (e.g. assembly)

∀S, T . T = Compiler(S) ⟹ T ⊆ S

Correct(Compiler) :

Semantic preservation: T has no more observable behaviors (e.g. I/O events by print) than S.

Compiler Verification

Compiler Verification• Leroy’06: Formal certification of a compiler back-end

• Lochbihler’10: Verifying a compiler for Java threads

• Myreen’10: Verified just-in-time compiler on x86

• Sevcik et al.’11: Relaxed-memory concurrency and verified compilation

• Zhao et al.’13: Formal verification of SSA-based optimizations for LLVM

• Kumar et al.’14: CakeML: A verified implementation of ML

• Stewart et al.’15: Compositional CompCert

• Kang et al.’16: Lightweight Verification of Separate Compilation

• …

Compiler Verification• Leroy’06: Formal certification of a compiler back-end

• Lochbihler’10: Verifying a compiler for Java threads

• Myreen’10: Verified just-in-time compiler on x86

• Sevcik et al.’11: Relaxed-memory concurrency and verified compilation

• Zhao et al.’13: Formal verification of SSA-based optimizations for LLVM

• Kumar et al.’14: CakeML: A verified implementation of ML

• Stewart et al.’15: Compositional CompCert

• Kang et al.’16: Lightweight Verification of Separate Compilation

• …

Limited support of separate compilation and concurrency!

Separate Compilation

S1 S2Source (e.g. C)

Real-world programs may consist of multiple components, which will be compiled independently.



Interaction

Real-world programs may consist of multiple components, which will be compiled independently.

// Module S1 extern void g(int *x); int f(){ int a = 0, b = 0; g(&b); return a + b; }

// Module S2 void g(int *x){ *x = 3; }



Interaction



Interaction

T1Target (e.g. assembly)

Compiler-1



Interaction

T1 T2Target (e.g. assembly)

Compiler-1 Compiler-2



Interaction



Interaction



Interaction



Interaction

Different compilersDifferent

compilers



Interaction



Interaction

Different compilersDifferent

compilers

Different Languages

Different languages

Separate Compilation of Concurrent Programs




Interaction

Interaction



Parallel Composition



||

||







||

||

Can we reuse existing certified compilers (e.g. CompCert) for separate compilation of concurrent programs?



Compositional CompCert’s Argument…[Stewart et al. POPL’15]

DRFS1 S2Source (e.g. C)


||

||

YES, for data-race-free (DRF) concurrent programs

r1 = 1; r1 = r1 + 1; lock(); x = 1; y = x + 1; unlock();


interleaving

Intuition of the Argument: Interleaving <=> Non-preemptive for DRF Programs



interleaving

No race




interleaving

No race Non-preemptive: yield control at

certain points only

sequential

sequentialr1 = 1; r1 = r1 + 1; yield; x = 1; y = x + 1; yield;

r2 = 2; r2 = r2 + 1; yield; x = 2; y = x + 1; yield;




interleaving

No race Non-preemptive: yield control at

certain points only

sequential

sequentialr1 = 1; r1 = r1 + 1; yield; x = 1; y = x + 1; yield;

r2 = 2; r2 = r2 + 1; yield; x = 2; y = x + 1; yield;

Plausible, but need to address several key challenges


Challenges

Challenges• How to formulate DRF in language independent manner?

DRFS1 S2||

Different Languages

Different languages


• How to prove DRF-preservation, compositionally?

DRF

||

S1 S2

T1 T2

||



DRF

||

S1 S2

T1 T2

||

||T1’ T2’

… …



DRF

||

S1 S2

T1 T2

||

||T1’ T2’

… …

DRF?



• How to support benign-race and relaxed memory models?



• How to support benign-race and relaxed memory models?

“… synchronization primitives are commonly implemented with assembly code that has data races.”

—— Hans-J. Boehm, HotPar’11

Our Work

Our Work• Language independent verification framework

- Key semantics components + proof structures

- Supports separate compilation for race-free concurrent programs

- With both external function calls & multi-threaded code





• Framework extension:

- Supports x86-TSO + confined benign-races






- Supports x86-TSO + confined benign-races

• CASCompCert:

- Extends CompCert with Concurrency + Abstraction + Separate compilation

- Reuses considerable amount of CompCert proofs

- Racy x86-TSO impl. of locks as synchronization library

Outline of this Talk

• Language-independent DRF formulation

• DRF-preservation and key proof structures

• Supporting x86-TSO and confined benign-races in CASCompCert

Language-Independent DRF Data-race: read-write / write-write conflicts

Memory …

write read/write

Language-Independent DRFData-race: read-write / write-write conflicts

Language-Independent DRF

Why language-independent?To support cross-language interaction

Data-race: read-write / write-write conflicts

S1 S2Interaction

May in different languagesMay in different languages


Why language-independent?To support cross-language interactionabstract away lang. details

e.g. interaction semantics[Stewart et al. POPL’15]


S1 S2Interaction






S1 S2Interaction


NO concrete reads/writes





S1 S2Interaction


NO concrete reads/writes

How to formulate DRF if we do not even know the concrete reads/writes?

Solution: Abstract Footprints

DRF(S1 || S2)


DRF(S1 || S2)

Defined in terms of footprint disjointness

• footprints δ ::= (rs, ws) read-set write-set


DRF(S1 || S2)


• footprints δ ::= (rs, ws)

• well-defined language extensional characterization of footprints

read-set write-set


DRF(S1 || S2)




read-set write-set

rs


DRF(S1 || S2)




read-set write-set

rs

rs


DRF(S1 || S2)




read-set write-set

rs

rs

Arbitrarily differentArbitrarily different

rs


DRF(S1 || S2)




read-set write-set

rsS

rs

ws

Arbitrarily differentArbitrarily different

rs


DRF(S1 || S2)




read-set write-set

rsS

rs rsS

ws

ws

Arbitrarily differentArbitrarily different Same


DRF(S1 || S2)


• footprints δ ::= (rs, ws) read-set write-set


Compositional CompCert’s Argument…

T1 || T2 ⊆ S1 || S2

DRF(S1 || S2) T1 = Comp(S1) T2 = Comp(S2)

?


T1 | T2 ⊆ S1 | S2

T1 || T2 ⊆ S1 || S2


?

Non-preemptive


T1 | T2 ⊆ S1 | S2

T1 || T2 ⊆ S1 || S2


?

?

Non-preemptive

Our Ideas

T1 | T2 ⊆ S1 | S2

T1 || T2 ⊆ S1 || S2


?

Non-preemptive

?

Our Ideas

T1 | T2 ⊆ S1 | S2

⊆

⊆

T1 || T2 ⊆ S1 || S2


?

Non-preemptive

Our Ideas

T1 | T2 ⊆ S1 | S2

⊆

⊆

Trivial

T1 || T2 ⊆ S1 || S2


?

Non-preemptive

Our Ideas

T1 | T2 ⊆ S1 | S2

⊆

⊆

Trivial

T1 || T2 ⊆ S1 || S2


?

Non-preemptive

?

Our Ideas

T1 | T2 ⊆ S1 | S2

⊆

⊆

TrivialDRF(T1 || T2)

T1 || T2 ⊆ S1 || S2


?

Non-preemptive

Our Ideas

T1 | T2 ⊆ S1 | S2

⊆

⊆


T1 || T2 ⊆ S1 || S2


?

Non-preemptive

?

Our Ideas

T1 | T2 ⊆ S1 | S2

⊆

⊆


?

T1 || T2 ⊆ S1 || S2


?

Non-preemptive

Our Ideas

T1 | T2 ⊆ S1 | S2

⊆

⊆


?

T1 || T2 ⊆ S1 || S2


?

Non-preemptive

How to prove DRF-preservation?

Our Ideas

T1 | T2 ⊆ S1 | S2

⊆

⊆


T1 || T2 ⊆ S1 || S2


?

Our Ideas

T1 | T2 ⊆ S1 | S2

⊆

⊆


T1 || T2 ⊆ S1 || S2


T1 | T2 ≤ S1 | S2 ≲

Footprint- preserving simulation

?

Our Ideas

T1 | T2 ⊆ S1 | S2

⊆

⊆


T1 || T2 ⊆ S1 || S2


T1 | T2 ≤ S1 | S2 ≲


DRF preservation

Our Ideas

T1 | T2 ⊆ S1 | S2

T1 ≤ S1 /\ T2 ≤ S2

⊆

⊆


T1 || T2 ⊆ S1 || S2


≲ ≲

T1 | T2 ≤ S1 | S2 ≲


DRF preservation Compositionality

Our Ideas

T1 | T2 ⊆ S1 | S2

T1 ≤ S1 /\ T2 ≤ S2

⊆

⊆


T1 || T2 ⊆ S1 || S2


≲ ≲

T1 | T2 ≤ S1 | S2 ≲


DRF preservation Compositionality

Our compiler correctness

(T, σ)

(S, Σ)

≤

Solution: Footprint-Preserving Simulation

Source state

Target state≲

Target has smaller footprints, so cannot introduce more races

(T, σ)

(S, Σ)

≤


≲


(T, σ)

(S, Σ)

≤

(T’, σ’)


≲


(T, σ)

(S, Σ)

≤

(S’, Σ’)*

(T’, σ’)


Zero-or-multiple steps

≲


(T, σ)

(S, Σ)

≤ ≤

(S’, Σ’)*

(T’, σ’)


Zero-or-multiple steps

≲ ≲


(T, σ)

(S, Σ) …

…

≤ ≤

(S’, Σ’)*

(T’, σ’)


≲ ≲


(T, σ)

(S, Σ) …

…

≤ ≤

(S’, Σ’)*

(T’, σ’)


≲ ≲

Δ

δ

Δ, δ: Footprints


(T, σ)

(S, Σ) …

…

≤ ≤

(S’, Σ’)*

(T’, σ’)


≲ ≲

Δ

δ

Δ, δ: Footprints ⊆


DRF Imposes Strong Restriction on Libraries

Lib

zCall

T1 ||

Callz

T2

DRFClient


lock_rel: … mov $1, %eax lock xchg %eax, L …

Lib

zCall

T1 ||

Callz

T2

DRFClient



Lib

zCall

T1 ||

Callz

T2

DRF ! inefficient

lock_rel: … mov $1, L …

[spin-lock impl. in Linux 2.6]

Client



Lib

zCall

T1 ||

Callz

T2

DRF ! inefficient



Client

Racy



Lib

zCall

T1 ||

Callz

T2

DRF ! inefficient



Client

Relaxed memory model, e.g. x86-TSO

Racy



Lib

zCall

T1 ||

Callz

T2

DRF ! inefficient



?

Client

Relaxed memory model, e.g. x86-TSO

Racy

Our Idea

Racy RF

Our Idea

Racy Lib’

zCall

T1 ||

Callz

T2

Client

Racy RF

Our Idea• Confined benign-races:

• Racy libraries and client code run in separate memory regions

Racy Lib’

zCall

T1 ||

Callz

T2

Client

Racy RF


• Racy libraries and client code run in separate memory regions• Client code be well-synchronized

Racy Lib’

zCall

T1 ||

Callz

T2

Client Well-synchronized

Racy RF


• Racy libraries and client code run in separate memory regions• Client code be well-synchronized• Racy libraries have race-free abstraction

Racy Lib’

zCall

T1 ||

Callz

T2

z

Call

T1 ||Call

z

T2

DRF

LibRace-free

abstraction

⊆

Client Well-synchronized

Racy RF

Source P Clight CallLock…

Supporting Confined Benign-Race & x86-TSO in Our CASCompCert


Multi-threaded



Multi-threaded race-free abstraction of spin-locks for synchronization



Multi-threaded race-free abstraction of spin-locks for synchronization

DRF


Clight


Source PDRF

CallLock…

Race-free

Clight


Target P

CompCert …

Source PDRF

CallLock…

x86…

Race-free

Clight


Target P

CompCert … identity

Source PDRF

Call

Call

Lock

Lock

…

x86…

Race-free

Clight


Target P


Source PDRF

Call

Call

Lock

Lock

…

x86…

Race-free

x86-SC

Clight


Target P


Source P

Manually impl.

Lock-Impl

DRF

with benign-races

Call

Call

Lock

Lock

…

x86…

Race-free

x86-SC

Clight


Target P


Source P

Manually impl.

identity …

Lock-ImplTarget P’TSO

DRF

with benign-races

Call

Call

Call

Lock

Lock

…

x86…

x86-TSO…

Race-free

x86-SC

x86-TSO semantics

Clight


Target P


Source P

Manually impl.

identity …

Lock-Impl

⊆

Target P’TSO

DRF

with benign-races

Call

Call

Call

Lock

Lock

…

x86…

x86-TSO…

Race-free

x86-SC

x86-TSO semantics

Clight


Target P


Source P

≤oidentity …

Lock-Impl

⊆

Target P’TSO

DRF

with benign-races

Call

Call

Call

Lock

Lock

…

x86…

x86-TSO…

Race-free

x86-SC

x86-TSO semantics

Clight


Target P


Source P

≤oidentity …

Lock-Impl

⊆

Target P’TSO

⊆DRF

with benign-races

Call

Call

Call

Lock

Lock

…

x86…

x86-TSO…

Race-free

x86-SC

x86-TSO semantics

CompCert PassesCompCert C Clight C#minor Cminor

CminorSel

RTL

LTLLinearMachPower PC

RISC-V

x86

ARM

SimplLocals

TunnelingCleanupLabels

Tailcall, Renumber, Inlining, Constprop, CSE,

Deadcode,Unusedglob

20 passes

Verified 12/20 Passes in CASCompCert

CompCert C Clight C#minor Cminor

CminorSel

RTL

LTLLinearMach

SimplLocals



Deadcode,Unusedglob

20 passesVerified 12/

Power PC

RISC-V

x86

ARM

Verified 12/20 Passes in CASCompCert

CompCert C Clight C#minor Cminor

CminorSel

RTL

LTLLinearMach

SimplLocals



Deadcode,Unusedglob

20 passesVerified 12/

Power PC

RISC-V

x86

ARM

Including all the translation passes from Clight to x86

Reused Considerable Amount of CompCert Proofs. Framework is Challenging to Implement.Towards Certified Separate Compilation for Concurrent Programs PLDI ’19, June 22–26, 2019, Phoenix, AZ, USA

Compilation passes andframework

Spec ProofCompCert Ours CompCert Ours

Cshmgen 515 1021 1071 1503Cminorgen 753 1556 1152 1251Selection 336 500 647 783RTLgen 428 543 821 862Tailcall 173 328 275 405Renumber 86 245 117 358Allocation 704 785 1410 1700Tunneling 131 339 166 475Linearize 236 371 349 733CleanupLabels 126 387 161 388Stacking 730 1038 1108 2135Asmgen 208 338 571 1128Compositionality (Lem. 6) 580 2249DRF preservation (Lem. 8) 358 1142Semantics equiv. (Lem. 9) 1540 4718Lifting 813 1795

Figure 13. Lines of code (using coqwc) in Coq

Here the premises 1-3 are similar to those required inDef. 11. In addition, the premise 4 requires that the x86-TSOcode πo of the object be simulated by γo . The simulation(tl, geo, πo) !

o (slCImp, geo,γo) is an extension of Liang andFeng [19] with the support of TSO semantics for the low-level code. Due to space limit, we omit the definition here.The refinement relation ⊑′ is a weaker version of ⊑ (see

Sec. 3.2). It does not preserve termination (the formal defini-tion omitted here). This is because our simulation!o for theobject code does not preserve termination for now, whichwe leave as future work.

Theorem 15 can be derived from Thm. 14 (for the compila-tion from Clight to x86-SC), and from Lem. 16 below, sayingthe x86-TSO code refines the x86-SC client code and thesource object code (we use tlsc and tltso as shorter notationsfor tlx86-SC and tlx86-TSO respectively).

Lemma 16 (Restore SC semantics for DRF x86 programs).Let Πsc = {(tlsc, ge1, π1), . . . , (tlsc, gem, πm ), (slCImp, geo,γo )},and Πtso = {(tltso, ge1, π1), . . . , (tltso, gem, πm ), (tltso, geo, πo )}.For any f1 . . . fn , if

1. Safe(let Πsc in f1 ∥ . . . ∥ fn ) and DRF(let Πsc in f1 ∥ . . . ∥ fn ),2. (tltso, geo, πo ) !

o (slCImp, geo,γo ),

then let Πtso in f1 ∥ . . . ∥ fn ⊑′ let Πsc in f1 ∥ . . . ∥ fn .

As explained before, Lem. 16 can be viewed as a strength-ened DRF-guarantee theorem for x86-TSO in that, if we letγo contain only skip and geo = ∅, Lem. 16 implies the DRF-guarantee of x86-TSO.

7.4 Proof Efforts in Coq

In Coq we have mechanized the framework (Fig. 2) andthe extended framework (Fig. 3) and proved all the relatedlemmas. We have verified all the CompCert passes in Fig. 11.Statistics of our Coq implementation and proofs are de-

picted in Fig. 13. Adapting the compilation correctness proofsfrom CompCert is relatively lightweight. For most passes

our proofs are within 300 lines of code more than the origi-nal CompCert proofs. The Stacking pass introduces moreadditional proofs, mostly caused by arguments marshallingfor supporting cross-language linking. In our experience,adapting CompCert’s original compilation proofs to our set-tings takes less than one person week per translation pass(except for Stacking). For simpler passes such as Tailcall,Linearize, Allocation, and RTLgen, it takes less than oneperson day per pass.By contrast, implementing our framework is more chal-

lenging, which took us about 1 person year. In particular,proving the equivalence between non-preemptive and pre-emptive semantics for DRF programs took us more time thanexpected, although it seems to be a well-known folklore the-orem. The co-inductive proofs there involve a large numberof non-trivial cases of reordering threads’ executions.

8 Related Work and ConclusionCompiler verification. Variouswork extends CompCert [16]to support separate compilation or concurrency. We havediscussed Compositional CompCert [2, 29] in Sec. 1 and 2.SepCompCert [15] extends CompCert with the support ofsyntactical linking. Their approach requires all the compila-tion units be compiled by CompCert. They do not supportcross-language linking or concurrency as we do.

CompCertTSO [27] compiles ClightTSO programs to thex86-TSO machine. It does not support cross-language link-ing, and its proof for the two CompCert passes Stackingand Cminorgen are not compositional. By contrast, we haveverified these two passes using our compositional simulation.For the other compositional passes, CompCertTSO relies ona thread-local simulation, which is stronger than ours. Itrequires that the source and the target always generate thesame memory events (excepts for those local variables thatcan be stored in registers). As a result, some optimizations(such as constant propagation and CSE) in CompCertTSOhave to be more restrictive.

As an extension of CompCertTSO, Jagannathan et al. [12]allow the compiler to inject racy code such as the efficientspin lock in Fig. 10. They propose a refinement calculus onthe racy code to ensure the compilation correctness. Theirwork looks similar to our extended framework in Fig. 3, butsince they use TSO semantics for both the source and targetprograms, they do not need to handle the gap between theSC and TSO semantics, so they do not need the source to beDRF as in our work.Podkopaev et al. [24] prove correctness of the compila-

tion from the promising semantics (which is a high-leveloperational relaxed model) to the operational ARMv8-POPmachine. They develop whole-program simulations to dealwith the complicated relaxed behaviors. Later on they verifycompilations from the promising semantics to declarativehardware models such as POWER, ARMv7 and ARMv8 [25].

156

Reused Considerable Amount of CompCert Proofs. Framework is Challenging to Implement.

Compiler verification: 100 - 400 more lines of Coq proof for most passes

Towards Certified Separate Compilation for Concurrent Programs PLDI ’19, June 22–26, 2019, Phoenix, AZ, USA























156

Reused Considerable Amount of CompCert Proofs. Framework is Challenging to Implement.

Compiler verification: 100 - 400 more lines of Coq proof for most passes

Towards Certified Separate Compilation for Concurrent Programs PLDI ’19, June 22–26, 2019, Phoenix, AZ, USA























156

Framework impl.: > 60k LoC, ~ 1 person year

Conclusion

Conclusion• Language independent verification framework



- Well-defined language for language-independent DRF

- Footprint-preserving simulation for DRF preservation







- Support x86-TSO + confined benign-races







- Support x86-TSO + confined benign-races

• CASCompCert:

- Reused considerable amount of CompCert proofs

- Racy x86-TSO impl. of locks as synchronization library

Thank you!

Towards Certiﬁed Separate Compilation for Concurrent Programs · S1 S2 Source (e.g. C) Real-world...

Documents

Transcript of Towards Certiﬁed Separate Compilation for Concurrent Programs · S1 S2 Source (e.g. C) Real-world...