
Towards Certified Separate Compilation for Concurrent Programs

Extended Version

Hanru Jiang∗ University of Science and Technology of China, China

[email protected]

Hongjin Liang∗‡ Nanjing University, China

[email protected]

Siyang Xiao, Junpeng Zha University of Science and Technology of China, China

{yutio888,jpzha}@mail.ustc.edu.cn

Xinyu Feng†‡ Nanjing University, China

[email protected]

Abstract  Certified separate compilation is important for establishing end-to-end guarantees for certified systems consisting of multiple program modules. There has been much work building certified compilers for sequential programs. In this paper, we propose a language-independent framework consisting of the key semantics components and lemmas that bridge the verification gap between the compilers for sequential programs and those for (race-free) concurrent programs, so that the existing verification work for the former can be reused. One of the key contributions of the framework is a novel footprint-preserving compositional simulation as the compilation correctness criterion. The framework also provides a new mechanism to support confined benign races, which are usually found in efficient implementations of synchronization primitives.

With our framework, we develop CASCompCert, which extends CompCert for certified separate compilation of race-free concurrent Clight programs. It also allows linking of concurrent Clight modules with x86-TSO implementations of synchronization primitives containing benign races. All our work has been implemented in the Coq proof assistant.

CCS Concepts  • Theory of computation → Program verification; Abstraction; • Software and its engineering → Correctness; Semantics; Concurrent programming languages; Compilers.

Keywords Concurrency, Certified Compilers, Data-Race-Freedom, Simulations

∗The first two authors contributed equally and are listed alphabetically.
†Corresponding author.
‡Also with State Key Laboratory for Novel Software Technology.

1 Introduction
Separate compilation is important for real-world systems, which usually consist of multiple program modules that need to be compiled independently. Correct compilation then needs to guarantee that the target modules can work together and preserve the semantics of the source program as a whole. It requires not only that individual modules be compiled correctly, but also that the expected interactions between modules be preserved at the target.

CompCert [16], the most well-known certified realistic compiler, establishes the semantics preservation property for compilation of sequential Clight programs, but with no explicit support for separate compilation. To support general separate compilation, Stewart et al. [29] develop Compositional CompCert, which allows the modules to call each other's external functions. Like CompCert, Compositional CompCert supports only sequential programs.

Stewart et al. [29] do argue that Compositional CompCert may be applied to data-race-free (DRF) concurrent programs, since they behave the same in the standard interleaving semantics as in certain non-preemptive semantics where switches between threads occur only at certain designated program points. The sequential compilation is sound as long as the switch points are viewed as external function calls so that optimizations do not go beyond them.

Although the argument is plausible, there are still significant challenges to actually implement it. We need proper formulation of the non-preemptive semantics and the notion of DRF. Then we need to indeed prove that DRF programs have the same behaviors (including termination) in the interleaving semantics as in the non-preemptive semantics, and verify that the compilation preserves DRF. More importantly, the formulation and the proofs should be done in a compositional and language-independent way, to allow separate compilation where the modules can be implemented in different languages. Reusing the proofs of CompCert is challenging too, as memory allocation for multiple threads may require a different memory model from that of CompCert, and the non-deterministic semantics also makes it difficult to reuse the downward simulation in CompCert.

In addition, the requirement of DRF can sometimes be overly restrictive. Although Boehm [3] points out that there are essentially no "benign" races in source programs, and languages like C/C++ essentially give no semantics to racy programs [4], it is still meaningful to allow benign races in machine language code for better performance (e.g., the spin locks in Linux). Also, machine languages commonly allow relaxed behaviors. Can we support realistic target code with potential benign races in relaxed machine models?

There has been work on verified compilation for concurrent programs [20, 27], but with only limited support of compositionality. We explain the major challenges in detail in Sec. 2, and discuss more related work in Sec. 8.

In this paper, we propose a language-independent framework consisting of the key semantics components and verification steps that bridge the gap between compilation for sequential programs and for DRF concurrent programs. We apply our framework to verify the correctness of CompCert for separate compilation of DRF programs to x86 assembly. We also extend the framework to allow confined benign races in x86-TSO (confined in that the racy code must execute in a separate region of memory and have a race-free abstraction), so that optimized implementations of synchronization primitives like spin locks in Linux can be supported. Our work is based on previous work on certified compilation, but makes the following new contributions:

• We design a compositional footprint-preserving simulation as the correctness formulation of separate compilation for sequential modules (Sec. 4). It considers module interactions at both external function calls and synchronization points, and thus is compositional with respect to both module linking and non-preemptive concurrency. It also requires the footprints (i.e., the sets of memory locations accessed during execution) of the source and target modules to be related. This way we effectively reduce the proof of the compiler's preservation of DRF, a whole-program property, to the proof of local footprint preservation.

• We work with an abstract programming language (Sec. 3), which is not tied to any specific synchronization constructs such as locks but uses abstract labels to model how such constructs interact with other modules. It also abstracts away the concrete primitives that access memory. We introduce the notion of well-defined languages to enforce a set of constraints over the state transitions and the related footprints, which actually give an extensional interpretation of footprints. These constraints are satisfied by various real languages such as Clight and x86 assembly. With the abstract language, we study the equivalence between preemptive and non-preemptive semantics (Sec. 3.3), the equivalence between DRF and NPDRF (the notion of race-freedom defined in the non-preemptive setting, shown in Sec. 5), and the properties of our new simulation. As a result, the lemmas in our proof framework are reusable when instantiating to real languages.

• We prove a strengthened DRF-guarantee theorem for the x86-TSO model (Sec. 7.3). It allows an x86-TSO program to call an (x86-TSO) module with benign races. If one can replace the racy module with a more abstract version so that the resulting program is DRF in the sequentially consistent (SC) semantics, then the original racy x86-TSO program behaves the same as the more abstract program in the SC semantics. This way we allow the target programs to have confined benign races in the relaxed x86-TSO model. We use this approach to support efficient x86-TSO implementations of locks.

• Putting all these together, our framework (see Fig. 2 and Fig. 3) successfully builds certified separate compilation for concurrent programs from sequential compilation. It highlights the importance of DRF preservation for correct compilation. It also shows a possible way to adapt the existing work of CompCert and Compositional CompCert to interleaving concurrency.

• As an instantiation of our language-independent compilation verification framework, we develop the certified compiler CASCompCert (short for an extended CompCert with the support of Concurrency, Abstraction and Separate compilation) (Sec. 7). We instantiate the source and target languages as Clight and x86 assembly. We also provide an efficient x86-TSO implementation of locks as a synchronization library. CASCompCert reuses the compilation passes of CompCert-3.0.1 [6] (including all the translation passes and four optimization passes). We prove that they satisfy our new compilation correctness criterion, where we reuse a considerable amount of the original CompCert proofs, with minor adjustments for footprint preservation. The proofs for each pass take less than one person-week on average.

This technical report (TR) is an extended version of our PLDI'19 paper. The Coq development and the paper are publicly available at https://plax-lab.github.io/publications/ccc/.

2 Informal Development
Below we first give an overview of the main ideas in CompCert [16, 17] and Compositional CompCert [29] as starting points for our work. Then we explain the challenges and our ideas in reusing them to build certified separate compilation for concurrent programs.

2.1 CompCert
Figure 1(a) shows the key proof structure of CompCert. The compilation Comp is correct if, for every source program S, the compiled code C preserves the semantics of S. That is,

Correct(Comp) ≝ ∀S,C. Comp(S) = C =⇒ S ≈ C.

The semantics preservation S ≈ C requires that S and C have the same set of observable event traces:

S ≈ C iff ∀B. Etr(S,B) ⇐⇒ Etr(C,B).

Here we write Etr(S,B) to mean that an execution of S produces the observable event trace B, and likewise for C.

To verify S ≈ C, CompCert relies on the determinism of the target language (written as Det(C) in Fig. 1(a)) and proves only the downward direction S ⊑ C, i.e., S refines C:

S ⊑ C iff ∀B. Etr(S,B) =⇒ Etr(C,B).


[Figure 1. Proof structures of certified compilation: (a) CompCert derives S ≈ C from S ⊑ C and Det(C), and proves S ⊑ C via a whole-program simulation S ≲ C; (b) Compositional CompCert derives S1 ◦ . . . ◦ Sn ≈ C1 ◦ . . . ◦ Cn from Det(C1 ◦ . . . ◦ Cn) and the module-local simulations ∀i. R,G ⊢ Si ≲′ Ci; (c) the simulation diagram for S ≲ C; (d) the rely/guarantee simulation diagram for R,G ⊢ S ≲′ C, with R steps (thick arrows) at external calls and G steps (thin arrows) by the module itself.]

The determinism Det(C) ensures that C admits only one event trace, so we can derive the upward refinement S ⊒ C from S ⊑ C. The latter is then proved by constructing a (downward) simulation relation S ≲ C, depicted in Fig. 1(c). However, the simulation ≲ relates whole programs only, and it does not take into account the interactions with other modules, which may update shared resources. So it is not compositional and does not support separate compilation.
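Why the determinism premise suffices can be spelled out in a short derivation. The following is an illustrative reconstruction (not taken verbatim from the paper), additionally assuming that the safe source program S has at least one event trace:

    Suppose Etr(C, B). By assumption, S has some trace B′.
    From S ⊑ C we get Etr(C, B′). By Det(C), the trace of C is unique, so B′ = B.
    Hence Etr(S, B), which gives S ⊒ C; together with S ⊑ C this yields S ≈ C.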

2.2 Compositional CompCert
Compositional CompCert supports separate compilation by re-defining the simulation relation for modules. Figure 1(b) shows its proof structure. Suppose we perform separate compilations Comp1, . . . , Compn. Each Compi transforms a source module Si into a target module Ci. The overall compilation is correct if, when linked together, the target modules C1 ◦ . . . ◦ Cn preserve the semantics of the source modules S1 ◦ . . . ◦ Sn (here we write ◦ for the module-linking operator). For example, the following program consists of two modules. The function f in module S1 calls the external function g. The external module S2 may access the variable b in S1.

// Module S1
extern void g(int *x);
int f(){
  int a = 0, b = 0;
  g(&b);
  return a + b;
}

// Module S2
int g(int *x){
  *x = 3;
}
                                                        (2.1)

Suppose the two modules S1 and S2 are independently compiled to the target modules C1 and C2. The correctness of the overall compilation requires (S1 ◦ S2) ≈ (C1 ◦ C2).

With the determinism of the target modules, we only need to prove the downward refinement (S1 ◦ S2) ⊑ (C1 ◦ C2), which is reduced to proving (S1 ◦ S2) ≲ (C1 ◦ C2), just as in CompCert. Ideally we hope to derive (S1 ◦ S2) ≲ (C1 ◦ C2) from S1 ≲ C1 and S2 ≲ C2, and to ensure the latter two by the correctness of Comp1, . . . , Compn. However, the CompCert simulation ≲ is not compositional.

To achieve compositionality, Compositional CompCert defines the simulation relation ≲′ shown in Fig. 1(d). It is parameterized with the interactions between modules, formulated as the rely/guarantee conditions R and G [14]. The rely condition R of a module specifies the general callee behaviors that happen at the external function calls of the current module (shown as the thick arrows in Fig. 1(d)). The guarantee G specifies the possible transitions made by the module itself (the thin arrows). The simulation is compositional as long as the R and G of linked modules are compatible.

Compositional CompCert proves that the CompCert compilation passes satisfy the new simulation ≲′. The intuition is that compiler optimizations do not go beyond external calls (unless only local variables are involved). That is, for the example (2.1), the compiler cannot do optimizations based on the (wrong) assumption that b is 0 when f returns.
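To make this concrete, here is a hypothetical sketch (not from the paper) of module S1 from example (2.1), showing the kind of optimization that is ruled out: propagating the initial value of b across the opaque external call would change the result once S2 writes through the escaped pointer.

// Hypothetical illustration of an invalid optimization of module S1.
extern void g(int *x);

int f(void){
  int a = 0, b = 0;
  g(&b);            /* g may write to b through the escaped pointer &b   */
  return a + b;     /* correct: the compiled code must reload b here     */
  /* return a + 0;     WRONG: assumes b is still 0 when f returns        */
}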

Since the R steps happen only at external calls, this approach cannot be directly applied to concurrent programs, where module/thread interactions may occur at any program point. However, if we consider race-free concurrent programs only, where threads are properly synchronized, we may consider the interleaving at certain synchronization points only. It has been a folklore theorem that DRF programs in interleaving semantics behave the same as in non-preemptive semantics. For instance, the following program (2.2) uses a lock to synchronize the accesses to the shared variables, and it is race-free. Its behaviors are the same as those where the threads yield control at lock() and unlock() only. That is, we can view lines 1-2 and lines 4-5 in either thread as sequential code that will not be interfered with by the other. The interactions occur only at the boundaries of the critical regions.

// Thread 1              // Thread 2
1 r1 = 1;                r2 = 2;
2 r1 = r1 + 1;           r2 = r2 + 1;
3 lock();                lock();
4 x = 1;                 x = 2;
5 y = x + 1;             y = x + 1;
6 unlock();              unlock();
                                                        (2.2)

Intuitively, we can use Compositional CompCert to compile the program, where the code segment between two consecutive switch points is compiled as sequential code. By viewing the switch points as special external function calls, the simulation ≲′ can be applied to relate the non-preemptive executions of the source and target modules.
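As a concrete, assumed illustration of this view (the code below does not appear in the paper), a client thread of (2.2) can be written as plain sequential C against externally declared synchronization functions, so that a sequential compiler treats each lock()/unlock() boundary as an opaque call it must not optimize across:

// Client thread written in plain sequential C; lock/unlock are external.
// (Illustrative sketch; the synchronization module itself is provided
//  and verified separately, e.g. the x86-TSO lock library of Sec. 7.)
extern void lock(void);
extern void unlock(void);
extern int x, y;

void thread1(void){
  int r1 = 1;
  r1 = r1 + 1;
  lock();        /* switch point: treated like an external call   */
  x = 1;
  y = x + 1;     /* lines 4-5 run without interference             */
  unlock();      /* switch point                                   */
}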

2.3 Challenges and Our Approaches
Although the idea of applying Compositional CompCert to compile DRF programs is plausible, we have to address several key challenges to really implement it.

Q: How to give language-independent formulations of DRF and non-preemptive semantics? The interaction semantics introduced in Compositional CompCert describes modules' interactions without referring to the concrete languages used to implement the modules. This allows composition of modules implemented in different languages. We would like to follow this semantics framework, but how do we define DRF and non-preemptive semantics if we do not even know the concrete synchronization constructs and the commands that access memory?

A: We extend the module-local semantics of Compositional CompCert so that each local step of a module reports its footprint, i.e., the memory locations it accesses. Instead of relying on concrete memory-access commands to define what valid footprints are, we introduce the notion of well-defined languages (in Sec. 3) to specify the requirements over the state transitions and the related footprints. For instance, we require that the behavior of each step be affected by the read set only, and that each step not touch the memory outside of the write set. When we instantiate the framework with real languages, we prove that they satisfy these requirements.
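A minimal sketch of the footprint-reporting idea, under toy assumptions (memory as a small array, addresses as indices, address sets as bit masks); this is only an illustration, not the paper's Coq semantics:

#include <stdint.h>

/* A footprint records the read set and write set of one local step. */
typedef struct { uint64_t rs, ws; } footprint;
typedef struct { int mem[64]; } state;

/* Executing "mem[dst] = mem[src] + 1" reports reads {src}, writes {dst}. */
static footprint step_incr(state *s, int dst, int src){
  footprint fp = { (uint64_t)1 << src, (uint64_t)1 << dst };
  s->mem[dst] = s->mem[src] + 1;
  return fp;
}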

Besides, we also allow module-local steps to generate messages EntAtom and ExtAtom to indicate the boundaries of the atomic operations inside the module. The concrete commands that generate these messages are unspecified and can be different in different modules.

Q: What memory model should we use in the proofs? The choice of memory model can greatly affect the complexity of the proofs. For instance, using the same memory model as CompCert allows us to reuse CompCert proofs, but it also causes many problems. The CompCert memory model records the next available block number nextblock for memory allocation. Using the model in a concurrent setting would make all threads share one nextblock. Then allocation in one thread would affect the subsequent allocations in other threads. This breaks the property that re-ordering non-conflicting operations from different threads does not affect the final states, which is a key lemma we rely on to prove the equivalence between preemptive and non-preemptive semantics for DRF programs. In addition, sharing the nextblock among all threads also means we have to keep track of the ownership of each allocated block when we reason about footprints.

A: We decide to use a different memory model. We reserve separate address spaces F for memory allocation in different threads (see Sec. 3). Therefore, allocation in one thread does not affect the behaviors of others. This greatly simplifies the semantics and the proofs, but also makes it almost impossible to reuse CompCert proofs directly (see Sec. 7.2). We address this problem by establishing a semantics equivalence between our memory model and the CompCert memory model (shown in Sec. 7.2).
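A toy sketch of the idea, under assumed simplifications (addresses are plain integers and each thread's reserved space is a contiguous, disjoint range); this is not the paper's actual Coq memory model:

#include <assert.h>
#include <stddef.h>

/* Each thread allocates stack-frame memory only from its own reserved
   address range F_t, so allocation in one thread never influences
   allocation in another. */
typedef struct {
  size_t next;   /* next free address in this thread's range  */
  size_t limit;  /* exclusive upper bound of the range         */
} free_list;

/* Hypothetical bump allocator over a thread-private range. */
static size_t alloc_local(free_list *fl, size_t n){
  assert(fl->next + n <= fl->limit);   /* stay inside this thread's F */
  size_t addr = fl->next;
  fl->next += n;
  return addr;
}

/* Usage sketch: thread 1 and thread 2 draw from disjoint ranges,
   e.g. F_1 = [0x1000, 0x2000) and F_2 = [0x2000, 0x3000). */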

Q: How to compositionally prove DRF preservation? The simulation ≲′ in Compositional CompCert cannot ensure DRF preservation. As we have explained, DRF is a whole-program property, and so is DRF preservation. To support separate compilation, we need to reduce DRF preservation to some thread-local property.

A: We propose a new compositional simulation ≼ (see Sec. 4). Based on the simulation in Fig. 1(d), we additionally require footprint consistency, saying that the target C should have the same or smaller footprints than the source S during related transitions. For instance, when compiling lines 4-5 of the left thread in (2.2), the target is only allowed to read x and to write to x and y. Note that we check footprint consistency at switch points only. This way we allow compiler optimizations as long as they do not go beyond the switch points. For lines 4-5 of the left thread in (2.2), the target could be y=2;x=1, where the writes to x and y are swapped.
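A small worked instance of the check (footprints given informally in the comments; the formal FPmatch relation is defined in Sec. 4):

/* Source, lines 4-5 of the left thread in (2.2):                      */
x = 1;
y = x + 1;      /* footprint: reads {x}, writes {x, y}                 */

/* A legal optimized target (constant-folded and reordered):           */
y = 2;
x = 1;          /* footprint: reads {}, writes {x, y}, contained in
                   the source footprint, so no new races are introduced */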

Q: Can we flip the refinement/simulation in spite of non-determinism? As we explained, the last steps of CompCert and Compositional CompCert in Fig. 1 derive the semantics equivalence ≈ (or the upward refinement ⊒) from the downward refinement ⊑ using determinism of the target programs. Actually, the simulations ≲ and ≲′ can also be flipped if the target programs are deterministic. It is unclear whether the refinement or simulation can still be flipped in concurrent settings, where programs have non-deterministic behaviors. The problem is that the target program can be more fine-grained and have more non-deterministic interleavings than the source.

A: With non-preemptive semantics, non-determinism occurs only at certain switch points. Then, in our simulation, we require the source and target to switch at the same time and to the same thread. As a result, although the switching step is still non-deterministic, there is a one-to-one correspondence between the switching steps of the source and of the target. Thus such non-determinism does not affect the flip of our simulation.

Q: How to support benign races and relaxed machine models? It is difficult to write useful DRF programs within sequential languages (e.g., CompCert Clight) because they do not provide synchronization primitives (e.g., locks). Nevertheless, we can implement synchronization primitives as external modules so that the threads written in the sequential languages can call functions (e.g., lock-acquire and lock-release) in the external modules. In practice, an efficient implementation πo of the synchronization primitives may be written directly in assembly, and may introduce benign races when used by multiple threads simultaneously (e.g., see Fig. 10 for such an efficient implementation of spin locks). We may view this as a separate compiler transforming the abstract primitives γo to their implementations πo. But how do we prove compilation correctness when the target code may contain races?

In addition, our previous discussions all assumed that the source and target programs have sequentially consistent (SC) behaviors. But real-world target machines commonly use relaxed semantics. Although most relaxed models have DRF guarantees saying that DRF programs have SC behaviors in the relaxed semantics, there are no such guarantees for racy programs. It is unclear whether the compilation is still correct when the target code (which may have benign races due to the use of efficient implementations of synchronization primitives) uses relaxed semantics.


[Figure 2. Our basic framework. The top-level goal S1 ∥ ... ∥ Sn ≈ C1 ∥ ... ∥ Cn is connected (steps 1-3) to the non-preemptive programs S1 | ... | Sn and C1 | ... | Cn, which are related by the downward and upward whole-program simulations ≼ and ≽ (steps 4-5, the flip using ∀i. Det(Ci)), derived from the module-local simulations ∀i. R,G ⊢ Si ≼ Ci. DRF(S1 ∥ ... ∥ Sn), NPDRF(S1 | ... | Sn), NPDRF(C1 | ... | Cn) and DRF(C1 ∥ ... ∥ Cn) are connected by steps 6-8.]

P ≝ let {γ1, . . . , γm, γo} in f1 ∥ . . . ∥ fn
Psc ≝ let {π1^sc, . . . , πm^sc, γo} in f1 ∥ . . . ∥ fn
Prmm ≝ let {π1^rmm, . . . , πm^rmm, πo^rmm} in f1 ∥ . . . ∥ fn

[Figure 3. The extended framework. Step 1 relates P and Psc, given ∀i. SeqCompi(γi) = πi, Correct(SeqCompi), and DRF(P); step 2 yields DRF(Psc) from DRF(P); step 3 relates Psc and Prmm, given πo^rmm ≼_o γo and DRF(Psc).]


A: Although the target assembly program using πo may contain races, we do know that the assembly program using γo is DRF if the source program using γo is DRF and the compilation is DRF-preserving. We propose a compositional simulation πo ≼_o γo, which ensures a strengthened DRF-guarantee theorem for the x86-TSO model. It says that the x86-TSO program using πo behaves the same as the program using γo in the SC semantics, if the latter is DRF.
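For concreteness, an implementation of the kind πo stands for might look like the following C11-atomics sketch (assumed for illustration; the paper's actual implementation, Fig. 10, is written in x86 assembly and is not shown in this excerpt). On x86 the relaxed operations compile to plain loads and stores, and the spinning read races with the releasing store; the race is benign and confined to the lock's own memory:

#include <stdatomic.h>

/* Hypothetical spin lock illustrating a confined benign race. */
static atomic_int lock_bit = 0;   /* 0 = free, 1 = held */

void lock(void){
  for (;;){
    /* relaxed read while spinning: races with unlock's store */
    while (atomic_load_explicit(&lock_bit, memory_order_relaxed) != 0)
      ;  /* spin */
    /* attempt to grab the lock with an atomic exchange */
    if (atomic_exchange_explicit(&lock_bit, 1, memory_order_acquire) == 0)
      return;
  }
}

void unlock(void){
  atomic_store_explicit(&lock_bit, 0, memory_order_release);
}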

2.4 Frameworks and Key Semantics Components
Figure 2 shows the semantics components and proof steps in our basic framework for DRF and SC target code. The goal of our compilation correctness proof is to show the semantics preservation (i.e., S1 ∥ . . . ∥ Sn ≈ C1 ∥ . . . ∥ Cn at the top of the figure). This follows from the correctness of the separate compilation of each module, formulated as R,G ⊢ Si ≼ Ci (at the bottom left), which is our new footprint-preserving module-local simulation. We do the proofs in the following steps. Double arrows in the figure are logical implications.

First, we restrict the compilation to source preemptive code that is race-free (i.e., DRF(S1 ∥ . . . ∥ Sn) at the bottom right of the figure). Then from the equivalence between preemptive and non-preemptive semantics, we derive step 1, the equivalence between S1 ∥ . . . ∥ Sn and S1 | . . . | Sn, the latter representing non-preemptive execution of the threads. Similarly, if we have DRF(C1 ∥ . . . ∥ Cn) (at the top right), we can derive step 2. With steps 1 and 2, we can derive the semantics preservation ≈ between the preemptive programs from ≈ between their non-preemptive counterparts.

Second, DRF of the target programs is obtained through the path of steps 6, 7 and 8. We define a notion of DRF for non-preemptive programs (called NPDRF), making it equivalent to DRF, from which we derive steps 6 and 8. To know NPDRF(C1 | . . . | Cn) from NPDRF(S1 | . . . | Sn), we need a DRF-preserving simulation ≼ between non-preemptive programs (see step 7).

Third, by composing our footprint-preserving local simulation, the DRF-preserving simulation ≼ for non-preemptive whole programs can be derived (step 5). Given the downward whole-program simulation, we flip it to get an upward one (step 4), with the extra premise that the local execution in each target module is deterministic. Using the simulation in both directions, we derive the equivalence (step 3).

The extended framework. Figure 3 shows our ideas for adapting the results of Fig. 2 to relaxed target models and allowing the target programs to contain confined benign races. The goal of the compilation correctness proof is still to show the refinement P ⊒ Prmm. Here the source P contains a library module γo of the abstract synchronization primitives, as well as normal client modules γ1, . . . , γm. Threads in P can call functions in these modules. In the target code Prmm, each client module γi is compiled to πi using a sequential compiler SeqCompi. The library module γo is implemented as πo, which may contain benign races. The superscript rmm indicates the use of relaxed memory models.

To prove the refinement, we first apply the results of Fig. 2 and prove P ⊒ Psc, assuming DRF(P) and the correctness of each SeqCompi (step 1 in Fig. 3). Here the superscript sc means the use of SC assembly semantics. Each client module πi in Psc has the same code as in Prmm but uses the SC semantics. Note that Psc still uses the abstract library γo rather than its racy implementation πo. We can view that at this step the library is compiled by an identity transformation.

Figure 2 also shows DRF preservation, so we know DRF(Psc) given DRF(P) (step 2 in Fig. 3). The last step, step 3, is our strengthened DRF-guarantee theorem. We require that a compositional simulation relation πo^rmm ≼_o γo holds, saying that πo in the relaxed semantics indeed implements γo.

The approach not only supports racy implementations of synchronization primitives, but also applies in more general cases where πo is a racy implementation of a general concurrent object such as a stack or a queue. For instance, πo could be the Treiber stack implementation [30], and then γo could be an atomic abstract stack.


(Entry)    f ∈ String
(Prog)     P, P ::= let Π in f1 ∥ . . . ∥ fn
(GEnv)     ge ∈ Addr ⇀fin Val
(MdSet)    Π, Γ ::= {(tl1, ge1, π1), . . . , (tlm, gem, πm)}
(Lang)     tl, sl ::= (Module, Core, InitCore, ↦−→)
(Module)   π, γ ::= . . .
(Core)     κ, k ::= . . .
           InitCore ∈ Module → Entry ⇀ Core
           ↦−→ ∈ FList × (Core × State) → P((Msg × FtPrt) × ((Core × State) ∪ abort))
(ThrdID)   t ∈ N
(Addr)     l ::= . . .
(Val)      v ::= l | . . .
(FList)    F, F ∈ Pω(Addr)
(State)    σ, Σ ∈ Addr ⇀fin Val
(FtPrt)    δ, ∆ ::= (rs, ws)   where rs, ws ∈ P(Addr)
(Msg)      ι ::= τ | e | ret | EntAtom | ExtAtom
(Event)    e ::= . . .
(Config)   ϕ, Φ ::= (κ, σ) | abort

Figure 4. The abstract concurrent language

Then πo^rmm ≼_o γo can be viewed as a correctness criterion of the object implementation in a relaxed model. It ensures a kind of "contextual refinement", i.e., any client threads using πo (in relaxed semantics) generate no more observable behaviors than when using γo instead (in SC semantics), as long as the client program using γo is DRF.

Although we expect the approach to be general enough to work for different relaxed machine models, so far we have only proved the result for x86-TSO, as shown in Sec. 7.3.

Note that the notations used here are simplified, to give a semi-formal overview of the key ideas. We may use different notations in the formal development below.

3 The Language and Semantics
3.1 The Abstract Language
Figure 4 shows the syntax and the state model of an abstract language for preemptive concurrent programming. A program P consists of n threads, each with an entry f that labels a code segment in a module in Π. In high-level languages such as Clight, f is usually the name of a function in a module. A module declaration in Π is a triple consisting of the language declaration tl, the global environment ge, and the code π. Here ge contains the global variables declared in the module. It is a finite partial mapping from a global variable's address to its initial value.

The abstract module language tl is defined as a tuple (Module, Core, InitCore, ↦−→), whose components can be instantiated for different concrete languages. Module describes the syntax of programs. As in Compositional CompCert [29], Core is the set of internal "core" states κ, such as control continuations or register files. The function InitCore returns an initial "core" state κ whenever a thread is created or an external function call is made. In this paper we mainly focus on threads as different modules, and omit external calls to simplify the presentation. We do support external calls in our Coq implementation, in the same way as in Compositional CompCert. The labelled transition "↦−→" models the local execution of a module, which we explain below. We use P(S) to represent the powerset of the set S.

[Figure 5. The state model: the memory accessible to a module consists of the shared part S (the statically allocated globals) and the module's local memory allocated from its free list F; "F − dom(σ)" is the free, not-yet-allocated part outside the boundary of σ.]

Module-local semantics. The local execution step inside a module has the form F ⊢ (κ,σ) −ι→_δ (κ′,σ′). The global memory state σ is a finite partial mapping from memory addresses to values (see Footnote 2). Each step is also labeled with a message ι and a footprint δ.

Each module also has a free list F, an infinite set of memory addresses. It models the reserved space for allocating local stack frames. Initially we require F ∩ dom(σ) = ∅, and the only memory accessible by the module is the statically allocated global variables declared in the ge of all modules, represented as the shared part S in Fig. 5. The local execution of a module may allocate memory from its F (see Footnote 3), which enlarges the state σ. So "F − dom(σ)" is the set of free addresses, depicted in Fig. 5 as the part outside of the boundary of σ. The memory allocated from F is exclusively owned by this module. The F of different modules must be disjoint (see the Load rule in Fig. 7).

The messages ι carry information about the module-local steps. Here we only consider externally observable events e (such as outputs), termination of threads (ret), and the beginning and end of atomic blocks (EntAtom and ExtAtom). Any other step is silent, labeled with the special symbol τ. The label τ is often omitted for cleaner presentation. Atomic blocks are the language constructs that ensure sequential execution of the code blocks inside them. They can be implemented differently in different module languages. The messages define the protocol of communication with the global whole-program semantics (presented below).

Footnote 2: In our Coq implementation, we use CompCert's more concrete block-based memory model, where memory addresses l are instantiated as pairs of block IDs and offsets. This allows us to reuse the CompCert code.
Footnote 3: The reader should not confuse allocation from F with dynamic heap memory allocation like malloc. The former is for the allocation of stack frames only. For the latter, we assume there is a designated module implementing malloc, whose free memory blocks are pointed to by a global variable in its ge. Therefore, these memory blocks are in S in Fig. 5, but not in F.


The footprint δ is a pair (rs, ws) of the read set and the write set of memory locations accessed in this step (see Footnote 4). We write emp for the footprint where both sets are empty.
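A minimal sketch of footprints and of the union and subset operations used in Fig. 6 below, with the assumed simplification that addresses range over 0..63 and sets are bit masks (illustrative only, not the Coq definition):

#include <stdbool.h>
#include <stdint.h>

/* A footprint is a pair of a read set and a write set of addresses. */
typedef struct { uint64_t rs, ws; } footprint;

static const footprint emp = {0, 0};

/* δ ∪ δ′ : componentwise union of read and write sets */
static footprint fp_union(footprint a, footprint b){
  footprint r = { a.rs | b.rs, a.ws | b.ws };
  return r;
}

/* δ ⊆ δ′ : componentwise subset test */
static bool fp_subset(footprint a, footprint b){
  return (a.rs & ~b.rs) == 0 && (a.ws & ~b.ws) == 0;
}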

Note that we use two sets of symbols to distinguish the source- and target-level notations. Symbols such as P, F, k, Σ, Γ and γ are used for the source. Their counterparts P, F, κ, σ, Π and π are for the target.

Well-defined languages. Although the abstract module language tl can be instantiated with different real languages, the instantiation needs to satisfy certain basic requirements. We formulate these requirements in Def. 1. It gives us an extensional interpretation of footprints.

Definition 1 (Well-Defined Languages). wd(tl) iff, for any execution step F ⊢ (κ,σ) −ι→_δ (κ′,σ′) in this language, all of the following hold (some auxiliary definitions are in Fig. 6):
(1) forward(σ, σ′);
(2) LEffect(σ, σ′, δ, F);
(3) for any σ1, if LEqPre(σ, σ1, δ, F), then there exists σ1′ such that F ⊢ (κ,σ1) −ι→_δ (κ′,σ1′) and LEqPost(σ′, σ1′, δ, F);
(4) let δ0 = ⋃{δ | ∃κ′,σ′. F ⊢ (κ,σ) −τ→_δ (κ′,σ′)}. For any σ1, if LEqPre(σ, σ1, δ0, F), then for any κ1′, σ1′, ι1, δ1,
    F ⊢ (κ,σ1) −ι1→_δ1 (κ1′,σ1′) =⇒ ∃σ′. F ⊢ (κ,σ) −ι1→_δ1 (κ1′,σ′).

Definition 1 requires that a step may enlarge the memory domain but cannot shrink it (Item (1)), and that the additional memory be allocated from F and included in the write set (Item (2)). Item (2) also requires that the memory outside the write set remain unchanged. Item (3) says that the memory updates and allocation only depend on the memory content in the read set, the availability of the memory cells in the write set, and the set of memory locations already allocated from F. Item (4) requires that the non-determinism of each step not be affected by memory contents outside of all the possible read sets in δ0. Here we do not relate σ1′ and σ′, which can be derived from Item (3).

We have proved in Coq that several real languages satisfy wd, including Clight, Cminor, and x86 assembly [13].

3.2 The Global Preemptive Semantics
Figure 7 shows the global states and selected global semantics rules modeling the interaction between modules. The world W consists of the thread pool T, the ID t of the current thread, a bit d indicating whether the current thread is in an atomic block or not, and the memory state σ. The thread pool T maps a thread ID to a triple recording the module language tl, the free list F, and the current core state κ (see Footnote 5).

Footnote 4: In our Coq code, a footprint contains two additional fields cmps and frees for operations that observe and modify memory permissions. They allow footprints to capture memory operations more precisely, but they are orthogonal to our main ideas for supporting concurrency and hence are merged into rs and ws in the paper to simplify the presentation.
Footnote 5: As we mentioned, our Coq implementation supports external function calls, so T actually maps a thread ID to a stack of triples (tl, F, κ). The Coq code also contains additional semantics rules for external calls. The formalization reuses the definitions in Compositional CompCert.

σ =rs= σ′  iff  ∀l ∈ rs. l ∉ (dom(σ) ∪ dom(σ′)) ∨ (l ∈ (dom(σ) ∩ dom(σ′)) ∧ σ(l) = σ′(l))
δ ⊆ δ′  iff  (δ.rs ⊆ δ′.rs) ∧ (δ.ws ⊆ δ′.ws)
δ ∪ δ′  ≝  (δ.rs ∪ δ′.rs, δ.ws ∪ δ′.ws)
forward(σ, σ′)  iff  dom(σ) ⊆ dom(σ′)
LEqPre(σ1, σ2, δ, F)  iff  σ1 =δ.rs= σ2 ∧ (dom(σ1) ∩ δ.ws) = (dom(σ2) ∩ δ.ws) ∧ (dom(σ1) ∩ F) = (dom(σ2) ∩ F)
LEqPost(σ1, σ2, δ, F)  iff  σ1 =δ.ws= σ2 ∧ (dom(σ1) ∩ F) = (dom(σ2) ∩ F)
LEffect(σ1, σ2, δ, F)  iff  σ1 =dom(σ1)−δ.ws= σ2 ∧ (dom(σ2) − dom(σ1)) ⊆ (δ.ws ∩ F)

Figure 6. Auxiliary definitions about states and footprints

The Load rule in Fig. 7 shows the initialization of the world from the program. The memory σ is initialized as GE(Π), the union of the ge of all the modules. The union is defined only if all the ge's are compatible. The rule also requires that σ contain no wild pointers. This requirement is formalized as closed(σ) in Fig. 7, which says the addresses stored in σ must themselves be in dom(σ).

Global transitions of W are also labeled with footprints δ and messages o. Each global step executes the module locally and processes the message of the local transition. The EntAt and ExtAt rules correspond to the entry and exit of atomic blocks, respectively. The flag d is flipped in these two steps. Since a context switch can be done only when d is 0, as required by the Switch rule below, a thread in its atomic block cannot be preempted. The Switch rule shows that a context switch can occur at any program point outside of atomic blocks (d = 0).

Below we write F ⊢ ϕ −τ→_δ * ϕ′ for zero or multiple silent steps, where δ is the accumulated footprint of the steps, and F ⊢ ϕ −τ→_δ + ϕ′ for at least one step. Similar notations are used for global steps. Also, W =⇒∗ W′ stands for zero or multiple steps that either are silent or produce sw events.

Event-trace refinement and equivalence. An externally observable event trace B is a finite or infinite sequence of external events e, and may end with a termination marker done or an abortion marker abort. Following the definition in CompCert, we use P ⊑ P to represent the event-trace refinement, and P ≈ P for equivalence.

3.3 The Non-Preemptive Semantics
A key step in our framework is to reduce the semantics preservation under the preemptive semantics to the semantics preservation under the non-preemptive semantics. We write let Π in f1 | . . . | fn, or P, for the program with non-preemptive semantics, to distinguish it from the preemptive concurrency.



(World)     W, W ::= (T, t, d, σ)          (AtomBit)   d ::= 0 | 1
(NPWorld)   W, W ::= (T, t, 𝕕, σ)          (GMsg)      o ::= τ | e | sw
(AtomBits)  𝕕 ::= {t1 ↝ d1, . . . , tn ↝ dn}
(ThrdPool)  T, T ::= {t1 ↝ (tl1, F1, κ1), . . . , tn ↝ (tln, Fn, κn)}

GE({(tl1, ge1, π1), . . . , (tlm, gem, πm)}) ≝
    ⋃_{i=1..m} gei,   if ∀i, j. gei =dom(gei)∩dom(gej)= gej
    undefined,        otherwise
closed(S, σ) iff ∀l, l′. l ∈ S ∧ l′ = σ(l) =⇒ l′ ∈ S
closed(σ) iff closed(dom(σ), σ)

(Load)
    for all i, j ∈ {1, . . . , n} with i ≠ j:  Fi ∩ Fj = ∅,  dom(σ) ∩ Fi = ∅
    tli.InitCore(πi, fi) = κi, where (tli, gei, πi) ∈ Π
    T = {1 ↝ (tl1, F1, κ1), . . . , n ↝ (tln, Fn, κn)}
    t ∈ dom(T)    σ = GE(Π)    closed(σ)
    ──────────────────────────────────────────────
    let Π in f1 ∥ . . . ∥ fn =load⇒ (T, t, 0, σ)

(τ-step)
    T(t) = (tl, F, κ)    F ⊢ (κ, σ) −τ→_δ (κ′, σ′)
    ──────────────────────────────────────────────
    (T, t, d, σ) =τ⇒_δ (T{t ↝ (tl, F, κ′)}, t, d, σ′)

(EntAt)
    T(t) = (tl, F, κ)    F ⊢ (κ, σ) −EntAtom→_emp (κ′, σ)
    ──────────────────────────────────────────────
    (T, t, 0, σ) =τ⇒_emp (T{t ↝ (tl, F, κ′)}, t, 1, σ)

(ExtAt)
    T(t) = (tl, F, κ)    F ⊢ (κ, σ) −ExtAtom→_emp (κ′, σ)
    ──────────────────────────────────────────────
    (T, t, 1, σ) =τ⇒_emp (T{t ↝ (tl, F, κ′)}, t, 0, σ)

(Switch)
    t′ ∈ dom(T)
    ──────────────────────────────────────────────
    (T, t, 0, σ) =sw⇒_emp (T, t′, 0, σ)

(EntAtnp)
    T(t) = (tl, F, κ)    𝕕(t) = 0    t′ ∈ dom(T)
    F ⊢ (κ, σ) −EntAtom→_emp (κ′, σ)    T′ = T{t ↝ (tl, F, κ′)}
    ──────────────────────────────────────────────
    (T, t, 𝕕, σ) :=sw⇒_emp (T′, t′, 𝕕{t ↝ 1}, σ)

(ExtAtnp)
    T(t) = (tl, F, κ)    𝕕(t) = 1    t′ ∈ dom(T)
    F ⊢ (κ, σ) −ExtAtom→_emp (κ′, σ)    T′ = T{t ↝ (tl, F, κ′)}
    ──────────────────────────────────────────────
    (T, t, 𝕕, σ) :=sw⇒_emp (T′, t′, 𝕕{t ↝ 0}, σ)

Figure 7. Preemptive and non-preemptive global semantics

The global world W is defined similarly to the preemptive world W, except that W keeps an atomic bit map 𝕕 recording whether each thread's next step is inside an atomic block. We need to record the atomic bits of all threads because a context switch may occur right after a thread enters an atomic block.

The last two rules in Fig. 7 define the non-preemptive global steps W :=o⇒_δ W′. More rules are shown in Appendix A. There is no rule like Switch of the preemptive semantics, since context switches occur only at synchronization points. EntAtnp and ExtAtnp execute one step of the current thread t, and then non-deterministically switch to a thread t′. The corresponding global steps produce sw events.

4 The Footprint-Preserving Simulation
In this section, we define a module-local simulation as the correctness obligation for each module's compilation. It is compositional and preserves footprints, allowing us to derive a whole-program simulation that preserves DRF.

Footprint consistency. As in CompCert, the simulation requires that the source and the target generate the same external events. In addition, it also requires that the target have the same or smaller footprints than the source, which is important to ensure DRF preservation. Recall that the memory accessible by a thread ti consists of two parts, the shared memory S and the local memory allocated from Fi, as shown in Fig. 5. Informally, DRF requires that the threads never have conflicting accesses to the memory in S at the same time.

We introduce the triple µ below to record the key information about the shared memory at the source and the target:

µ ≝ (S, S, f), where S, S ∈ P(Addr) and f ∈ Addr ⇀ Addr.

Here S and S specify the shared memory locations at the source and the target, respectively. The partial mapping f maps locations at the source level to those at the target. We require µ to be well-formed, defined as wf(µ) in Fig. 8.

Then, given footprints ∆ and δ, we define their consistency with respect to µ as FPmatch(µ, ∆, δ) in Fig. 8. It says the shared locations in δ must be contained in ∆, modulo the mapping µ.f. We consider only the shared locations in µ.S, because accesses of local memory cannot cause races. The shared locations in δ.rs of the target module are allowed to come from ∆.ws of the source, since transforming a write into a read would not introduce more races.

Rely/guarantee conditions. We use rely and guarantee conditions to specify the interaction between modules. They need to enforce the view of accessibility of shared and local memory in Fig. 5. More specifically, a module expects others to keep its local memory (in F) intact. In addition, although other modules may update the shared memory S, they must preserve certain properties of S. One such property is closed(S, Σ) (defined in Fig. 7), which ensures that S cannot contain memory pointers pointing to local memory cells in any Fi; otherwise a thread tj could update the memory in Fi by following these pointers (see Footnote 6). Also, the invariant Inv should be preserved, which relates the contents of the corresponding memory locations in Σ and σ. It expresses the same thing as the memory injection in CompCert [6].

Footnote 6: We disallow cross-module escape of pointers pointing to stack-allocated variables. (We still allow the escape within a module and the transfer of dynamically allocated heap data structures.) Appendix E presents the framework with the support of stack pointer escape. Adding this support is relatively orthogonal to our main ideas for supporting concurrency, and we can follow the approach of Compositional CompCert (the part for stack pointer escape).
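For intuition, the FPmatch check of Fig. 8 below can be sketched in code as follows, under assumed simplifications (addresses 0..63, sets as bit masks, and the injection f given as an array); this is an illustration, not the Coq definition:

#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t rs, ws; } footprint;

/* image of a source address set under f */
static uint64_t f_image(const int f[64], uint64_t src_set){
  uint64_t img = 0;
  for (int i = 0; i < 64; i++)
    if (src_set & ((uint64_t)1 << i)) img |= (uint64_t)1 << f[i];
  return img;
}

/* FPmatch: shared target reads must come from source reads or writes,
   and shared target writes must come from source writes. */
static bool fp_match(uint64_t S_tgt, const int f[64],
                     footprint src, footprint tgt){
  uint64_t src_rw = f_image(f, src.rs | src.ws);
  uint64_t src_w  = f_image(f, src.ws);
  return ((tgt.rs & S_tgt) & ~src_rw) == 0
      && ((tgt.ws & S_tgt) & ~src_w)  == 0;
}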


f{{S}} ≝ {l′ | ∃l. l ∈ S ∧ f(l) = l′}
f|S ≝ {(l, f(l)) | l ∈ (S ∩ dom(f))}
wf(µ) iff injective(µ.f) ∧ dom(µ.f) = µ.S ∧ µ.f{{µ.S}} = µ.S
FPmatch(µ, ∆, δ) iff (δ.rs ∩ µ.S ⊆ µ.f{{∆.rs ∪ ∆.ws}}) ∧ (δ.ws ∩ µ.S ⊆ µ.f{{∆.ws}})
f(v) ≝ v, if v ∉ Addr;  f(v), if v ∈ Addr ∧ v ∈ dom(f);  undefined, otherwise
Inv(f, Σ, σ) iff ∀l, l′. l ∈ dom(Σ) ∧ f(l) = l′ =⇒ l′ ∈ dom(σ) ∧ f(Σ(l)) = σ(l′)
HG(∆, Σ, F, S) iff ∆ ⊆ (F ∪ S) ∧ closed(S, Σ)
LG(µ, (δ, σ, F), (∆, Σ)) iff δ ⊆ (F ∪ µ.S) ∧ closed(µ.S, σ) ∧ FPmatch(µ, ∆, δ) ∧ Inv(µ.f, Σ, σ)
R(Σ, Σ′, F, S) iff (Σ =F= Σ′) ∧ closed(S, Σ′) ∧ forward(Σ, Σ′)
Rely(µ, (Σ, Σ′, F), (σ, σ′, F)) iff R(Σ, Σ′, F, µ.S) ∧ R(σ, σ′, F, µ.S) ∧ Inv(µ.f, Σ′, σ′)
⌊φ⌋(ge) ≝ {(φ(l), φ(v)) | (l, v) ∈ ge}, if (dom(ge) ∪ (range(ge) ∩ Addr)) ⊆ dom(φ);  undefined, otherwise
initM(φ, ge, Σ, σ) iff ge ⊆ Σ ∧ closed(Σ) ∧ dom(σ) = φ{{dom(Σ)}} ∧ Inv(φ, Σ, σ)

Figure 8. Footprint matching and rely/guarantee conditions

We encode these requirements in the rely condition Rely in Fig. 8, and define the guarantee conditions HG and LG correspondingly.

The simulation. Below we define (sl, ge, γ) ≼_φ (tl, ge′, π) to relate the non-preemptive executions of the source module (sl, ge, γ) and the target one (tl, ge′, π). The injective function φ maps source addresses to target ones.

Definition 2 (Module-Local Downward Simulation).
(sl, ge, γ) ≼_φ (tl, ge′, π) iff
1. ⌊φ⌋(ge) = ge′; and
2. for all f, k, Σ, σ, F, F, and µ = (dom(Σ), dom(σ), φ|dom(Σ)), if sl.InitCore(γ, f) = k, F ∩ dom(Σ) = F ∩ dom(σ) = ∅, and initM(φ, ge, Σ, σ), then there exist i ∈ index and κ such that tl.InitCore(π, f) = κ, and
   (F, (k, Σ), emp) ≼^i_µ (F, (κ, σ), emp),
where (F, (k, Σ), ∆) ≼^i_µ (F, (κ, σ), δ) is defined in Def. 3.

It says that, starting from some core states k and κ, with any states (Σ and σ) and free lists (F and F) satisfying some initial constraints, we have the simulation (F, (k, Σ), emp) ≼^i_µ (F, (κ, σ), emp) defined in Def. 3. Here ge and ge′ must be related through ⌊φ⌋ (see Fig. 8). The initial states Σ and σ also need to be related with ge and φ through initM.

Definition 3. (F, (k, Σ), ∆0) ≼^i_µ (F, (κ, σ), δ0) is the largest relation such that, whenever (F, (k, Σ), ∆0) ≼^i_µ (F, (κ, σ), δ0), the following are true:

1. for all k′, Σ′ and ∆, if F ⊢ (k, Σ) −τ→_∆ (k′, Σ′) and (∆0 ∪ ∆) ⊆ (F ∪ µ.S), then one of the following holds:
   a. ∃j < i. (F, (k′, Σ′), ∆0 ∪ ∆) ≼^j_µ (F, (κ, σ), δ0), or
   b. there exist κ′, σ′, δ and j such that:
      i. F ⊢ (κ, σ) −τ→_δ + (κ′, σ′);
      ii. (δ0 ∪ δ) ⊆ (F ∪ µ.S) and FPmatch(µ, ∆0 ∪ ∆, δ0 ∪ δ); and
      iii. (F, (k′, Σ′), ∆0 ∪ ∆) ≼^j_µ (F, (κ′, σ′), δ0 ∪ δ).
2. for all k′ and ι, if F ⊢ (k, Σ) −ι→_emp (k′, Σ), ι ≠ τ, and HG(∆0, Σ, F, µ.S), then there exist κ′, δ, σ′ and κ′′ such that:
   a. F ⊢ (κ, σ) −τ→_δ * (κ′, σ′), and F ⊢ (κ′, σ′) −ι→_emp (κ′′, σ′), and
   b. LG(µ, (δ0 ∪ δ, σ′, F), (∆0, Σ)), and
   c. for all σ′′ and Σ′, if Rely(µ, (Σ, Σ′, F), (σ′, σ′′, F)), then there exists j such that (F, (k′, Σ′), emp) ≼^j_µ (F, (κ′′, σ′′), emp).

The simulation (F, (k, Σ), ∆0) ≼^i_µ (F, (κ, σ), δ0) carries ∆0 and δ0, the footprints accumulated at the source and the target, respectively. The definition follows the diagram in Fig. 1(d). For every τ-step in the source (case 1), if the newly generated footprint and the accumulated ∆0 are in scope (i.e., every location must be either from the free-list space F of the current thread or from the shared memory µ.S), then the step corresponds to zero or multiple τ-steps in the target, and the simulation holds over the resulting states with the accumulated footprints and a new index j. We carry the well-founded index to ensure that the simulation preserves termination.

If the source step corresponds to at least one target step (case 1-b), the footprints at the target must also be in scope and be consistent with the source-level footprints. The accumulation of footprints allows us to establish FPmatch for compiler optimizations that reorder instructions.

At the switch points, when the source generates a non-silent message ι (case 2), if the footprints and states satisfy the high-level guarantee HG, the target must be able to generate the same ι, and the accumulated footprints and the state must satisfy the low-level guarantee LG. We also need to consider the interaction with other modules or threads. For any environment steps satisfying Rely, the simulation must hold over the new states, with some index j and empty footprints: since the effects of the current thread have been made visible to the environment at the switch point, we can clear the accumulated footprints. Rely and the guarantees HG and LG are defined in Fig. 8, as explained before.

Note that each case in Def. 3 has prerequisites about the source-level footprints (e.g., the footprints are in scope or satisfy HG). We need to prove that these requirements indeed hold at the source level, to make the simulation meaningful instead of vacuously true. We formalize these requirements separately as ReachClose in Def. 4. It is a simplified version of the reach-close concept by Stewart et al. [29] (simplified because we disallow the escape of local stack pointers into the shared memory). The compilation correctness assumes that all the source modules satisfy ReachClose (see Lem. 6 and Def. 11 below).

Definition 4 (Reach-Closed Module). ReachClose(sl, ge, γ) iff, for all f, k, Σ, F and S, if sl.InitCore(γ, f) = k, S = dom(Σ), ge ⊆ Σ, F ∩ S = ∅, and closed(S, Σ), then RC(F, S, (k, Σ)).

Here RC is defined as the largest relation such that, whenever RC(F, S, (k, Σ)), then for all Σ′ such that R(Σ, Σ′, F, S), and for all k′, Σ′′, ι and ∆ such that F ⊢ (k, Σ′) −ι→_∆ (k′, Σ′′), we have HG(∆, Σ′′, F, S) and RC(F, S, (k′, Σ′′)).

The relation RC(F, S, (k, Σ)) essentially says that during every step of the execution of (k, Σ), HG always holds over the resulting footprint ∆ and states, even with possible interference from the environment, as long as the environment steps satisfy the rely condition R defined in Fig. 8.

Our simulation is transitive. One can therefore decompose the whole compiler correctness proof into proofs for the individual compilation passes.

Lemma 5 (Transitivity). For all sl, sl′, tl, γ, γ′, π, if (sl, ge, γ) ≼_φ (sl′, ge′, γ′) and (sl′, ge′, γ′) ≼_φ′ (tl, ge′′, π), then (sl, ge, γ) ≼_{φ′∘φ} (tl, ge′′, π).

The proof of Lem. 5 relies on auxiliary lemmas similar to the "memory interpolations" in Compositional CompCert.

Lemma 6 (Compositionality, step 5 in Fig. 2). For any f1, . . . , fn, φ, Γ = {(sl1, ge1, γ1), . . . , (slm, gem, γm)}, and Π = {(tl1, ge′1, π1), . . . , (tlm, ge′m, πm)}, if

∀i ∈ {1, . . . , m}. wd(sli) ∧ wd(tli) ∧ ReachClose(sli, gei, γi) ∧ (sli, gei, γi) ≼_φ (tli, ge′i, πi),

then let Γ in f1 | . . . | fn ≼ let Π in f1 | . . . | fn.

Lemma 6 shows the compositionality of our simulation. The whole-program downward simulation P ≼ P relates the non-preemptive execution of the whole source program P and the whole target program P. The definition is given in Appendix B.

With determinism of the target module languages (written as det(tl)), we can flip P ≼ P to derive the upward simulation P ≽ P. We give the definition of det(tl) and the Flip Lemma (step 4 in Fig. 2) in Appendix B. Lemma 7 shows that the non-preemptive global simulation ensures the refinement.

Lemma 7 (Soundness, step 3 in Fig. 2). If P ≽ P, then P ⊑ P.

5 Data-Race-Freedom
Below we first define the conflict of footprints:

δ1 ⌢ δ2 iff (δ1.ws ∩ δ2 ≠ ∅) ∨ (δ2.ws ∩ δ1 ≠ ∅)
(δ1, d1) ⌢ (δ2, d2) iff (δ1 ⌢ δ2) ∧ (d1 = 0 ∨ d2 = 0)

Recall that, when used as a set, δ represents δ.rs ∪ δ.ws. Since we do not treat accesses of the same memory location inside atomic blocks as a race, we instrument a footprint δ with the atomic bit d to record whether the footprint is generated inside an atomic block (d = 1) or not (d = 0). Two instrumented footprints (δ1, d1) and (δ2, d2) are conflicting if δ1 and δ2 are conflicting and at least one of d1 and d2 is 0.
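The conflict check can be sketched in code as follows, again assuming address sets encoded as 64-bit masks (illustrative only):

#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t rs, ws; } footprint;

/* When used as a set, a footprint means rs ∪ ws. */
static uint64_t fp_locs(footprint d){ return d.rs | d.ws; }

/* δ1 ⌢ δ2: one footprint writes a location the other touches. */
static bool fp_conflict(footprint d1, footprint d2){
  return (d1.ws & fp_locs(d2)) != 0 || (d2.ws & fp_locs(d1)) != 0;
}

/* Instrumented conflict: ignore conflicts where both accesses happen
   inside atomic blocks (atomic bits a1 = a2 = 1). */
static bool race(footprint d1, int a1, footprint d2, int a2){
  return fp_conflict(d1, d2) && (a1 == 0 || a2 == 0);
}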

    P =load⇒ W    W =⇒∗ W′    W′ ⇒ Race
    ─────────────────────────────────────
    P ⇒ Race

(Race)
    predict(W, t1, (δ1, d1))    predict(W, t2, (δ2, d2))
    t1 ≠ t2    (δ1, d1) ⌢ (δ2, d2)
    ─────────────────────────────────────
    W ⇒ Race

(Predict-0)
    W = (T, _, 0, σ)    T(t) = (F, κ)    F ⊢ (κ, σ) −τ→_δ (κ′, σ′)
    ─────────────────────────────────────
    predict(W, t, (δ, 0))

(Predict-1)
    W = (T, _, 0, σ)    T(t) = (F, κ)
    F ⊢ (κ, σ) −EntAtom→_emp (κ′, σ)    F ⊢ (κ′, σ) −τ→_δ * (κ′′, σ′′)
    ─────────────────────────────────────
    predict(W, t, (δ, 1))

Figure 9. Data races in preemptive semantics

instrumented footprints (δ1,d1) and (δ2,d2) are conflicting ifδ1 and δ2 are conflicting and at least one of d1 and d2 is 0.

We define data races in Fig. 9 for the preemptive semantics. In the Race rule, W steps to Race if there are conflicting footprints of two threads predicted from the current configuration through predict(W, t, (δ, d)). Then we define DRF(P):

DRF(P) iff ¬(P =⇒ Race)

NPDRF(P) is defined similarly. We can prove NPDRF is equivalent to DRF (6 and 8 in Fig. 2). The following Lem. 8 shows that the simulation preserves NPDRF. Given the equivalence, we know it also preserves DRF.

Lemma 8 (NPDRF Preservation, 7 in Fig. 2). For any P and P, if P ≽ P and NPDRF(P), then NPDRF(P).

Lemma 9 (Semantics Equivalence, 1 and 2 in Fig. 2). For any Π, f1, . . . , fm, if DRF(let Π in f1 ∥ . . . ∥ fm), then let Π in f1 | . . . | fm ≈ let Π in f1 ∥ . . . ∥ fm.

6 The Final Theorem
Putting all the previous results together, we are able to prove our final theorem in the basic framework (Fig. 2). We first model a sequential compiler SeqComp as follows:

SeqComp ::= (CodeT, φ),  where CodeT ∈ Module ⇀ Module
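The following Coq-style sketch illustrates this model; the record and field names are illustrative placeholders rather than the definitions in our Coq development.

  Parameter module injection : Type.

  Record SeqComp := {
    CodeT : module -> option module;  (* partial: compilation may fail *)
    phi   : injection                 (* transformation of global environments *)
  }.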

As the key proof obligation, we need to verify that each SeqComp is Correct. The correctness is defined based on our footprint-preserving module-local simulation. Recall that ⌊φ⌋ is defined in Fig. 8.

Definition 10 (Sequential Compiler Correctness). Correct(SeqComp, sl, tl) iff

∀γ, π, ge, ge′. SeqComp.CodeT(γ) = π ∧ ⌊SeqComp.φ⌋(ge) = ge′ =⇒ (sl, ge, γ) ≼SeqComp.φ (tl, ge′, π).

The desired correctness GCorrect (Def. 11) of concurrent program compilation is the semantics preservation of whole programs, i.e., every target concurrent program is a refinement of the source. Here all the SeqComp must agree on the transformation φ of global environments (see item 1 below).


Definition 11 (Concurrent Compiler Correctness). GCorrect((SeqComp1, sl1, tl1), . . . , (SeqCompm, slm, tlm)) iff for any φ, f1, . . . , fn, Γ = {(sl1, ge1, γ1), . . . , (slm, gem, γm)}, and Π = {(tl1, ge′1, π1), . . . , (tlm, ge′m, πm)}, if
1. ∀i ∈ {1, . . . , m}. (SeqCompi.CodeT(γi) = πi) ∧ injective(φ) ∧ (SeqCompi.φ = φ) ∧ ⌊φ⌋(gei) = ge′i,
2. Safe(let Γ in f1 ∥ . . . ∥ fn), and DRF(let Γ in f1 ∥ . . . ∥ fn),
3. ∀i ∈ {1, . . . , m}. ReachClose(sli, gei, γi),
then let Π in f1 ∥ . . . ∥ fn ⊑ let Γ in f1 ∥ . . . ∥ fn.

Our final theorem is then formulated as Thm. 12. It says that if a set of sequential compilers are certified to satisfy our correctness obligation Correct, the source and target languages sli and tli are well-defined, and the target languages are deterministic, then the sequential compilers as a whole are GCorrect for compiling concurrent programs. The proof simply applies the lemmas that correspond to 1 - 8 in Fig. 2.

Theorem 12 (Final Theorem). For any SeqComp1, . . . , SeqCompm, sl1, . . . , slm, tl1, . . . , tlm such that for any i ∈ {1, . . . , m} we have wd(sli), wd(tli), det(tli), and Correct(SeqCompi, sli, tli), then GCorrect((SeqComp1, sl1, tl1), . . . , (SeqCompm, slm, tlm)).

7 The CASCompCert Compiler
We apply our framework to develop CASCompCert. It uses CompCert-3.0.1 [6] for compilation of multi-threaded Clight programs to x86-SC, i.e., x86 with SC semantics. We further extend the framework (as shown in Fig. 3) to support x86-TSO [28] as the target language. The source program we compile consists of multiple sequential Clight threads. Inter-thread synchronization can be achieved through external calls to an external module. Since the external synchronization module can be shared by the threads, below we also refer to it as an object and the Clight threads as clients.

As a tiny example, Fig. 10(a) shows a spin-lock implementation in a simple imperative language which we call CImp. ⟨C⟩ is an atomic block, which cannot be interrupted by other threads. The EntAtom and ExtAtom events are generated at the beginning and the end of the atomic block respectively. The command assert(B) aborts if B is false. This source implementation of locks serves as an abstract specification, which is manually translated to x86-TSO, as shown in Fig. 10(b). The implementation is similar to the Linux spin-lock implementation (a.k.a. the TTAS lock [11]). To acquire the lock, the l_acq block reads and resets the lock bit to 0 atomically by the lock-prefixed cmpxchgl, which ensures mutual exclusion. The spin loop reads the lock bit until it appears to be available (i.e., not 0). Note that the load and store operations in the spin and unlock blocks are not lock-prefixed. This optimization introduces benign races. With the external lock module, we can implement a DRF counter inc in Clight, as shown in Fig. 10(c). An example whole program P is let {γC, γlock} in inc() ∥ inc().

lock()   { r := 0; while (r == 0) { < r := [L]; [L] := 0; > } }
unlock() { < r := [L]; assert(r == 0); [L] := 1; > }

(a) lock specification γlock

lock:    movl  $L, %ecx
         movl  $0, %edx
l_acq:   movl  $1, %eax
         lock cmpxchgl %edx, (%ecx)
         je    enter
spin:    movl  (%ecx), %ebx
         cmp   $0, %ebx
         je    spin
         jmp   l_acq
enter:   retl
unlock:  movl  $L, %eax
         movl  $1, (%eax)
         retl

(b) lock implementation πlock

void inc() {
  int32_t tmp;
  lock();
  tmp = x;
  x++;
  unlock();
  print(tmp);
}

(c) a Clight client γC

Figure 10. Lock as an external module of Clight programs

As shown in Fig. 3, we compile the code in two steps. First, we use CompCert to compile the Clight clients to x86-SC, but leave the object code (e.g., γlock in Fig. 10) untouched. Second, we transform the resulting x86-SC client code to x86-TSO. Syntactically this is an identity transformation, but the semantics changes. Then we manually transform the source object code to x86-TSO code. We can prove that the resulting whole program in x86-TSO preserves the source semantics as long as the x86-TSO object code refines its source.

7.1 Language Instantiations
We need to first instantiate our abstract languages of Fig. 4 with Clight, x86-SC and the intermediate languages introduced in CompCert. We also instantiate it with the simple language CImp for the source object code.

The Clight language. The Module in Fig. 4 is instantiated with the same Clight syntax as in CompCert. The core state κ is a pair of a local state c and an index N indicating the position of the next block in the freelist F to be allocated, as shown below. InitCore initializes N to 0. Local transitions of Clight are instrumented with footprints.

(FList) F ::= b1 ::b2 :: . . . (Block) b ∈ N+

(BIndex) N ∈ N (Core) κ ::= (c,N )

(Mem) σ ∈ Block⇀fin (N⇀ val)
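As a simplified illustration only, the core state above can be sketched in Coq as follows; blocks are represented as nat here for brevity, and the names are placeholders rather than our actual definitions.

  Parameter clight_state : Type.          (* the Clight local state c *)
  Definition block    := nat.             (* block identifiers *)
  Definition freelist := nat -> block.    (* the infinite freelist F *)

  Record core := {
    local : clight_state;                 (* c *)
    next  : nat                           (* index N into the freelist *)
  }.

  (* InitCore starts with N = 0, i.e., no block allocated yet *)
  Definition init_index : nat := 0.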

The source language CImp for objects. The instantiation of the abstract language with CImp lets the atomic blocks generate the EntAtom and ExtAtom events. To allow benign races in the target x86-TSO code, we need to ensure the partition of the client data and the object data. This can be enforced through the permissions in the CompCert memory model. The client programs can only access memory locations whose permission is not None. Therefore we set the permission of the object data (the memory at the location L in our example in Fig. 10) to None. Also, we require that the CImp program can only access memory locations with None


Clight -[Cshmgen]-> C#minor -[Cminorgen]-> Cminor -[Selection]-> CminorSel -[RTLgen]-> RTL -[Tailcall, Renumber]-> RTL -[Allocation]-> LTL -[Tunneling]-> LTL -[Linearize]-> Linear -[CleanupLabels]-> Linear -[Stacking]-> Mach -[Asmgen]-> x86-SC assembly

Figure 11. Proved CompCert compilation passes

permission. It aborts if trying to access memory locations whose permissions are not None.

7.2 Adapting CompCert
Given the program let {γ1, . . . , γl, γo} in f1 ∥ . . . ∥ fn consisting of Clight modules γi and the module γo (we omit sl and ge in the modules to simplify the presentation), the compilation Comp is defined as

Comp(let {γ1, . . . , γl, γo} in f1 ∥ . . . ∥ fn) def= let {CompCert(γ1), . . . , CompCert(γl), IdTrans(γo)} in f1 ∥ . . . ∥ fn

where CompCert is the adapted compilation consisting of the original CompCert passes, and IdTrans is the identity translation which returns the object module unchanged.

Lemma 13. Correct(CompCert, Clight, x86-SC).

We have proved Lem. 13, i.e., the original CompCert-3.0.1 passes satisfy our Correct in Def. 10. The verified compilation passes (shown in Fig. 11) include all the translation passes and four optimization passes (Tailcall, Renumber, Tunneling and CleanupLabels).⁷ Proving other optimization passes would be similar and is left as future work.

⁷The compiler option -g for insertion of debugging information is disabled.

We also prove the well-definedness of Clight, x86-SC, and CImp, the determinism of x86-SC and CImp, and the correctness of IdTrans for CImp. Together with our framework's final theorem (Thm. 12), we derive the following result:

Theorem 14 (Correctness with x86-SC backend and obj.). GCorrect((CompCert, Clight, x86-SC), (IdTrans, CImp, CImp)).

To prove Lem. 13, we try to reuse as much of the original CompCert correctness proofs as possible. We address the following two main challenges in reusing CompCert proofs.

Converting memory layout. Many CompCert lemmas rely on the specific definition of the CompCert memory model, which is different from ours. In CompCert, memory allocations in an execution get consecutive natural numbers as block numbers. This fact is used extensively in CompCert's fundamental libraries and its compilation correctness proofs. But it does not hold in our model, where each thread has its own freelist F (an infinite sequence of block numbers). Since the F of different threads must be disjoint, we cannot make each F an infinite sequence of consecutive natural numbers to directly simulate CompCert. Our solution is to define a bijection between memories under the two models. As a result, the behaviors of a thread under our model are equivalent to its behaviors under the CompCert model, and our module-local simulation can be derived from a simulation based on the CompCert model. This way we reuse most CompCert libraries and compilation proofs without modification.

Lemma sel_expr_correct:
  forall sp e m a v fp,
    Cminor.eval_expr sge sp e m a v ->
    Cminor.eval_expr_fp sge sp e m a fp ->
    forall e' le m', env_lessdef e e' -> Mem.extends m m' ->
    exists v', exists fp',
      FP.subset fp' fp /\
      eval_expr_fp tge sp e' m' le (sel_expr a) fp' /\
      eval_expr tge sp e' m' le (sel_expr a) v' /\
      Val.lessdef v v'.

Figure 12. Coq code example

Footprint preservation. CompCert does not model footprints. Fortunately many of its definitions and lemmas can be slightly modified to support footprint preservation. For instance, Fig. 12 shows a key lemma in the proof of the Selection pass, sel_expr_correct, with our newly-added code highlighted. It says the selected expression must evaluate to a value refined by the Cminor expression. We simply extend the lemma by requiring that the selected expression have a smaller footprint while evaluating on related memory.

7.3 x86-TSO as the Target Language
Now we show how to generate x86-TSO code while preserving the behaviors of the source program. The theorem below shows our final goal. To avoid clutter, below we use sl and tl to represent slClight and tlx86-TSO respectively.

Theorem 15 (Correctness with x86-TSO backend and obj.). For any f1 . . . fn, Γ = {(sl, ge1, γ1), . . . , (sl, gem, γm), (slCImp, geo, γo)}, and Π = {(tl, ge′1, π1), . . . , (tl, ge′m, πm), (tl, geo, πo)}, if
1. ∀i ∈ {1, . . . , m}. (CompCert.CodeT(γi) = πi) ∧ injective(φ) ∧ (CompCert.φ = φ) ∧ ⌊φ⌋(gei) = ge′i,
2. Safe(let Γ in f1 ∥ . . . ∥ fn) and DRF(let Γ in f1 ∥ . . . ∥ fn),
3. ∀i ∈ {1, . . . , m}. ReachClose(sl, gei, γi), and ReachClose(slCImp, geo, γo),
4. (tl, geo, πo) ≼^o (slCImp, geo, γo),
then let Π in f1 ∥ . . . ∥ fn ⊑′ let Γ in f1 ∥ . . . ∥ fn.

Here the premises 1-3 are similar to those required in Def. 11. In addition, premise 4 requires that the x86-TSO code πo of the object be simulated by γo. The simulation (tl, geo, πo) ≼^o (slCImp, geo, γo) is an extension of Liang and Feng [19] with support of TSO semantics for the low-level code. We give the definition in Appendix D.

The refinement relation ⊑′ is a weaker version of ⊑ (see Sec. 3.2). It does not preserve termination (the formal definition is omitted here). This is because our simulation ≼^o for the object code does not preserve termination for now, which we leave as future work.


Compilation passes and framework      Spec (CompCert / Ours)   Proof (CompCert / Ours)
Cshmgen                               515 / 1021               1071 / 1503
Cminorgen                             753 / 1556               1152 / 1251
Selection                             336 / 500                647 / 783
RTLgen                                428 / 543                821 / 862
Tailcall                              173 / 328                275 / 405
Renumber                              86 / 245                 117 / 358
Allocation                            704 / 785                1410 / 1700
Tunneling                             131 / 339                166 / 475
Linearize                             236 / 371                349 / 733
CleanupLabels                         126 / 387                161 / 388
Stacking                              730 / 1038               1108 / 2135
Asmgen                                208 / 338                571 / 1128
Compositionality (Lem. 6)             - / 580                  - / 2249
DRF preservation (Lem. 8)             - / 358                  - / 1142
Semantics equiv. (Lem. 9)             - / 1540                 - / 4718
Lifting                               - / 813                  - / 1795

Figure 13. Lines of code (using coqwc) in Coq


Theorem 15 can be derived from Thm. 14 (for the compilation from Clight to x86-SC), and from Lem. 16 below, saying that the x86-TSO code refines the x86-SC client code and the source object code (we use tlsc and tltso as shorter notations for tlx86-SC and tlx86-TSO respectively).

Lemma 16 (Restore SC semantics for DRF x86 programs). Let Πsc = {(tlsc, ge1, π1), . . . , (tlsc, gem, πm), (slCImp, geo, γo)}, and Πtso = {(tltso, ge1, π1), . . . , (tltso, gem, πm), (tltso, geo, πo)}. For any f1 . . . fn, if
1. Safe(let Πsc in f1 ∥ . . . ∥ fn) and DRF(let Πsc in f1 ∥ . . . ∥ fn),
2. (tltso, geo, πo) ≼^o (slCImp, geo, γo),
then let Πtso in f1 ∥ . . . ∥ fn ⊑′ let Πsc in f1 ∥ . . . ∥ fn.

As explained before, Lem. 16 can be viewed as a strengthened DRF-guarantee theorem for x86-TSO in that, if we let γo contain only skip and geo = ∅, Lem. 16 implies the DRF-guarantee of x86-TSO.

7.4 Proof Efforts in Coq
In Coq we have mechanized the framework (Fig. 2) and the extended framework (Fig. 3) and proved all the related lemmas. We have verified all the CompCert passes in Fig. 11.

Statistics of our Coq implementation and proofs are depicted in Fig. 13. Adapting the compilation correctness proofs from CompCert is relatively lightweight. For most passes our proofs are within 300 lines of code more than the original CompCert proofs. The Stacking pass introduces more additional proofs, mostly caused by argument marshalling for supporting cross-language linking. In our experience, adapting CompCert's original compilation proofs to our settings takes less than one person-week per translation pass (except for Stacking). For simpler passes such as Tailcall, Linearize, Allocation, and RTLgen, it takes less than one person-day per pass.

By contrast, implementing our framework is more challenging; it took us about one person-year. In particular, proving the equivalence between non-preemptive and preemptive semantics for DRF programs took us more time than expected, although it seems to be a well-known folklore theorem. The co-inductive proofs there involve a large number of non-trivial cases of reordering threads' executions.

8 Related Work and Conclusion
Compiler verification. Various work extends CompCert [16] to support separate compilation or concurrency. We have discussed Compositional CompCert [2, 29] in Sec. 1 and 2. SepCompCert [15] extends CompCert with the support of syntactical linking. Their approach requires all the compilation units be compiled by CompCert. They do not support cross-language linking or concurrency as we do.

CompCertTSO [27] compiles ClightTSO programs to the x86-TSO machine. It does not support cross-language linking, and its proofs for the two CompCert passes Stacking and Cminorgen are not compositional. By contrast, we have verified these two passes using our compositional simulation. For the other compositional passes, CompCertTSO relies on a thread-local simulation, which is stronger than ours. It requires that the source and the target always generate the same memory events (except for those local variables that can be stored in registers). As a result, some optimizations (such as constant propagation and CSE) in CompCertTSO have to be more restrictive.

As an extension of CompCertTSO, Jagannathan et al. [12] allow the compiler to inject racy code such as the efficient spin lock in Fig. 10. They propose a refinement calculus on the racy code to ensure compilation correctness. Their work looks similar to our extended framework in Fig. 3, but since they use TSO semantics for both the source and target programs, they do not need to handle the gap between the SC and TSO semantics, so they do not need the source to be DRF as in our work.

Podkopaev et al. [24] prove correctness of the compilation from the promising semantics (which is a high-level operational relaxed model) to the operational ARMv8-POP machine. They develop whole-program simulations to deal with the complicated relaxed behaviors. Later on they verify compilations from the promising semantics to declarative hardware models such as POWER, ARMv7 and ARMv8 [25]. The simulations are tied to the promising semantics. They do not aim for reusing existing sequential compilations, but handle a lot of complicated issues in relaxed models.

As part of their CCAL framework, Gu et al. [10] develop thread-safe CompCertX (TSCompCertX), which supports separate compilation of concurrent C programs, and the linking of C programs with assembly modules. However, the compositionality of TSCompCertX is tied to the specific settings of the CCAL model. In particular, it relies on the


abstraction of concurrent objects to derive the partition of private and shared memory, and uses the auxiliary push and pull instructions to ensure race freedom. Also, TSCompCertX supports only race-free programs in the sequentially consistent memory model. Our work does not need source-level specifications for race-free programs. Besides, we support confined benign races in x86-TSO (a feature not supported in TSCompCertX), where the racy objects are required to have race-free abstractions.

Vellvm [35, 36] proves correctness of several optimization passes for sequential LLVM programs. Wang et al. [32] verify a separate compiler from Cito to Bedrock, which relies on axiomatic specifications for cross-language external calls. It is unclear how to adapt their work to concurrency. Perconti and Ahmed [23] verify separate compilation by embedding languages in a combined language. They do not support concurrency either. Ševčík [26] studies safety of a class of optimizations in concurrent settings using an abstract trace semantics. It is unclear if his approach can be applied to verify general compilation. Lochbihler [20] verifies a compiler for concurrent Java programs. His simulation has similar restrictions as CompCertTSO.

Non-preemptive semantics and data-race-freedom. Non-preemptive (or cooperative) semantics has been developed in various settings for various purposes (e.g., [1, 5, 18, 31, 34]). Both Ferreira et al. [9] and Xiao et al. [33] study the relationships between non-preemptive semantics and DRF, but they do not give any mechanized proofs of termination-preserving semantics equivalence as in our work. DRFx [21] proposes a concept called region-conflict-freedom, which looks similar to our NPDRF, but there is no formal operational formulation as we give. Owens [22] proposes triangular-race-freedom (TRF) and proves that TRF programs behave the same in x86-SC and x86-TSO. TRF is weaker than DRF and can be satisfied by the efficient spin-lock code in Fig. 10.

Conclusion and future work. We present a framework for building certified compilation of concurrent programs from sequential compilation. We develop CASCompCert, which reuses CompCert to compile DRF programs, with support of confined benign races in manually written assembly (x86-TSO). We believe our work is a promising start for certified separate compilation of general concurrent programs. The latter goal requires longer-term work and has more problems to address, as discussed below.

First, our work is limited in its support of general concurrent languages. For instance, we have not yet considered thread spawn, though we do not see any particular challenges. The spawn step in the operational semantics needs to assign a new F to each newly created thread. In simulations, spawns should be handled in a similar way as context switches. Besides, our extended framework currently

does not support multiple objects because it lacks a mechanism to ensure the partition between objects' data. To address the problem, we may follow the ideas in LRG [8] and CAP [7] to set the logical boundaries between objects. Also it is worthwhile to support relaxed concurrency, such as C11-style memory models which have relaxed atomics.

Second, it is also interesting to explore other target machine models, such as PowerPC and ARM, which have LL/SC instructions rather than lock-prefixed atomic instructions as on x86.

Third, we would like to verify more optimization passes, including those relying on concurrency features. For instance, our work may be modified to support roach-motel reorderings, by distinguishing EntAtom and ExtAtom in the local simulation and recording the footprints that are moved across EntAtom or ExtAtom. We also would like to finish the proofs for the remaining CompCert optimization passes (where we do not expect any major technical challenges) and add the support of stack-pointer escape following the on-paper proofs in Appendix E.

Finally, we are also curious about extensional (language-independent) characterizations of footprints. Our formulation of wd (Def. 1) is one way to characterize footprints, but it is not restrictive enough to rule out all benign races. This in practice is harmless since the current wd already allows us to prove our final theorem of the framework (Thm. 12), and we use properly defined concrete languages to prove correctness of specific compilers (Thm. 14). Nevertheless, it is interesting to explore better formulations of wd.

Acknowledgments
We thank our shepherd John Reppy, Zhong Shao and anonymous referees for their suggestions and comments on earlier versions of this paper. We especially thank Weijie Fan and Jianwei Song for their help with the Coq proofs of the optimization passes Tunneling and CleanupLabels. This work is supported in part by grants from the National Natural Science Foundation of China (NSFC) under Grant No. 61632005, and from the Huawei Innovation Research Program (HIRP).

References
[1] Martin Abadi and Gordon Plotkin. 2009. A Model of Cooperative Threads. In POPL. 29-40.
[2] Lennart Beringer, Gordon Stewart, Robert Dockins, and Andrew W. Appel. 2014. Verified Compilation for Shared-Memory C. In ESOP. 107-127.
[3] Hans-Juergen Boehm. 2011. How to Miscompile Programs with "Benign" Data Races. In HotPar. 3-3.
[4] Hans-Juergen Boehm and Sarita V. Adve. 2008. Foundations of the C++ Concurrency Memory Model. In PLDI. 68-78.
[5] Gérard Boudol. 2007. Fair Cooperative Multithreading. In CONCUR. 272-286.
[6] CompCert Developers. 2017. CompCert-3.0.1. http://compcert.inria.fr/release/compcert-3.0.1.tgz
[7] Thomas Dinsdale-Young, Mike Dodds, Philippa Gardner, Matthew J. Parkinson, and Viktor Vafeiadis. 2010. Concurrent Abstract Predicates. In ECOOP. 504-528.


[8] Xinyu Feng. 2009. Local Rely-Guarantee Reasoning. In POPL. 315-327.
[9] Rodrigo Ferreira, Xinyu Feng, and Zhong Shao. 2010. Parameterized Memory Models and Concurrent Separation Logic. In ESOP. 267-286.
[10] Ronghui Gu, Zhong Shao, Jieung Kim, Xiongnan (Newman) Wu, Jérémie Koenig, Vilhelm Sjöberg, Hao Chen, David Costanzo, and Tahina Ramananandro. 2018. Certified Concurrent Abstraction Layers. In PLDI. 646-661.
[11] Maurice Herlihy and Nir Shavit. 2008. The Art of Multiprocessor Programming. Morgan Kaufmann.
[12] Suresh Jagannathan, Vincent Laporte, Gustavo Petri, David Pichardie, and Jan Vitek. 2014. Atomicity Refinement for Verified Compilation. ACM Trans. Program. Lang. Syst. 36, 2 (2014), 6:1-6:30.
[13] Hanru Jiang, Hongjin Liang, Siyang Xiao, Junpeng Zha, and Xinyu Feng. 2019. Towards Certified Separate Compilation for Concurrent Programs (Technical Report and Coq Implementations). https://plax-lab.github.io/publications/ccc/
[14] Cliff B. Jones. 1983. Tentative Steps Toward a Development Method for Interfering Programs. ACM Trans. Program. Lang. Syst. 5, 4 (1983), 596-619.
[15] Jeehoon Kang, Yoonseung Kim, Chung-Kil Hur, Derek Dreyer, and Viktor Vafeiadis. 2016. Lightweight Verification of Separate Compilation. In POPL. 178-190.
[16] Xavier Leroy. 2009. Formal Verification of a Realistic Compiler. Commun. ACM 52, 7 (2009), 107-115.
[17] Xavier Leroy. 2009. A Formally Verified Compiler Back-end. J. Autom. Reason. 43, 4 (December 2009), 363-446.
[18] Peng Li and Steve Zdancewic. 2007. Combining Events and Threads for Scalable Network Services: Implementation and Evaluation of Monadic, Application-level Concurrency Primitives. In PLDI. 189-199.
[19] Hongjin Liang and Xinyu Feng. 2013. Modular Verification of Linearizability with Non-Fixed Linearization Points. In PLDI. 459-470.
[20] Andreas Lochbihler. 2010. Verifying a Compiler for Java Threads. In ESOP. 427-447.
[21] Daniel Marino, Abhayendra Singh, Todd Millstein, Madanlal Musuvathi, and Satish Narayanasamy. 2010. DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages. In PLDI. 351-362.
[22] Scott Owens. 2010. Reasoning about the Implementation of Concurrency Abstractions on x86-TSO. In ECOOP. 478-503.
[23] James T. Perconti and Amal Ahmed. 2014. Verifying an Open Compiler Using Multi-language Semantics. In ESOP. 128-148.
[24] Anton Podkopaev, Ori Lahav, and Viktor Vafeiadis. 2017. Promising Compilation to ARMv8 POP. In ECOOP. 22:1-22:28.
[25] Anton Podkopaev, Ori Lahav, and Viktor Vafeiadis. 2019. Bridging the Gap Between Programming Languages and Hardware Weak Memory Models. PACMPL 3, POPL (2019), 69:1-69:31.
[26] Jaroslav Ševčík. 2011. Safe Optimisations for Shared-Memory Concurrent Programs. In PLDI. 306-316.
[27] Jaroslav Ševčík, Viktor Vafeiadis, Francesco Zappa Nardelli, Suresh Jagannathan, and Peter Sewell. 2013. CompCertTSO: A Verified Compiler for Relaxed-Memory Concurrency. J. ACM 60, 3 (2013), 22.
[28] Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli, and Magnus O. Myreen. 2010. x86-TSO: A Rigorous and Usable Programmer's Model for x86 Multiprocessors. Commun. ACM 53, 7 (2010), 89-97.
[29] Gordon Stewart, Lennart Beringer, Santiago Cuellar, and Andrew W. Appel. 2015. Compositional CompCert. In POPL. 275-287.
[30] R. K. Treiber. 1986. System Programming: Coping with Parallelism. Technical Report RJ 5118. IBM Almaden Research Center.
[31] Jérôme Vouillon. 2008. Lwt: A Cooperative Thread Library. In ML. 3-12.
[32] Peng Wang, Santiago Cuellar, and Adam Chlipala. 2014. Compiler Verification Meets Cross-language Linking via Data Abstraction. In OOPSLA. 675-690.
[33] Siyang Xiao, Hanru Jiang, Hongjin Liang, and Xinyu Feng. 2018. Non-Preemptive Semantics for Data-Race-Free Programs. In ICTAC. 513-531.
[34] Jaeheon Yi, Caitlin Sadowski, and Cormac Flanagan. 2011. Cooperative Reasoning for Preemptive Execution. In PPoPP. 147-156.
[35] Jianzhou Zhao, Santosh Nagarakatte, Milo M. K. Martin, and Steve Zdancewic. 2012. Formalizing the LLVM Intermediate Representation for Verified Program Transformations. In POPL. 427-440.
[36] Jianzhou Zhao, Santosh Nagarakatte, Milo M. K. Martin, and Steve Zdancewic. 2013. Formal Verification of SSA-based Optimizations for LLVM. In PLDI. 175-186.

A More about the Language Settings
Global preemptive semantics. Fig. 14 gives the omitted rules of the global preemptive semantics.

  (Print)  (T, t, 0, σ) =e⇒emp (T′, t, 0, σ),  if T(t) = (tl, F, κ), F ⊢ (κ, σ) −e→emp (κ′, σ), and T′ = T{t ↦ (tl, F, κ′)}.
  (Term)   (T, t, 0, σ) =sw⇒emp (T\t, t′, 0, σ),  if T(t) = (tl, F, κ), t′ ∈ dom(T\t), and F ⊢ (κ, σ) −ret→emp (κ′, σ).
  (Done)   (T, t, 0, σ) =τ⇒emp done,  if T(t) = (tl, F, κ), dom(T) = {t}, and F ⊢ (κ, σ) −ret→emp (κ′, σ).
  (Abort)  (T, t, 0, σ) =τ⇒δ abort,  if T(t) = (tl, F, κ) and F ⊢ (κ, σ) −→δ abort.

Figure 14. More rules of the preemptive global semantics

The Print rule shows the step generating an external event. It requires that the flag d must be 0, i.e., external events can be generated only outside of atomic blocks. Also, the step generating an external event does not access memory, so its footprint is empty (emp).

When the current thread terminates, we either remove it from the thread pool and then switch to another thread if there is one (see the Term rule), or terminate the whole program if there is none, as shown in the Done rule. The Term rule generates the sw message to record the context switch. The Abort rule says the whole program aborts if a local module aborts.

We write F ⊢ ϕ −τ→δ∗ ϕ′ for zero or multiple silent steps, where δ is the accumulated footprint of the steps. F ⊢ ϕ −τ→δ+ ϕ′ represents at least one step. Similar notations are used for global steps. We also write W =⇒+ W′ for multiple steps that either are silent or produce sw events; it must contain at least one silent step. W =e⇒+ W′ represents multiple steps with exactly one e event produced (while the other steps either are silent or produce sw events).


  (Loadnp)   let Π in f1 | . . . | fn :=load⇒ (T, t, 𝕕, σ),  if for all i and j in {1, . . . , n} with i ≠ j: Fi ∩ Fj = ∅ and dom(σ) ∩ Fi = ∅; tli.InitCore(πi, fi) = κi, where (tli, gei, πi) ∈ Π; T = {1 ↦ (tl1, F1, κ1), . . . , n ↦ (tln, Fn, κn)}; σ = GE(Π); closed(σ); t ∈ dom(T); and 𝕕 = {t1 ↦ 0, . . . , tn ↦ 0}.
  (τ-stepnp) (T, t, 𝕕, σ) :=τ⇒δ (T′, t, 𝕕, σ′),  if T(t) = (tl, F, κ), F ⊢ (κ, σ) −τ→δ (κ′, σ′), and T′ = T{t ↦ (tl, F, κ′)}.
  (Printnp)  (T, t, 𝕕, σ) :=e⇒emp (T′, t′, 𝕕, σ),  if T(t) = (tl, F, κ), 𝕕(t) = 0, t′ ∈ dom(T), F ⊢ (κ, σ) −e→emp (κ′, σ), and T′ = T{t ↦ (tl, F, κ′)}.
  (Termnp)   (T, t, 𝕕, σ) :=sw⇒emp (T\t, t′, 𝕕\t, σ),  if T(t) = (tl, F, κ), 𝕕(t) = 0, F ⊢ (κ, σ) −ret→emp (κ′, σ), and t′ ∈ dom(T\t).
  (Donenp)   (T, t, 𝕕, σ) :=τ⇒emp done,  if T = {t ↦ (tl, F, κ)}, 𝕕 = {t ↦ 0}, and F ⊢ (κ, σ) −ret→emp (κ′, σ).
  (Abortnp)  (T, t, 𝕕, σ) :=τ⇒δ abort,  if T(t) = (tl, F, κ) and F ⊢ (κ, σ) −→δ abort.

Figure 15. More rules of the non-preemptive global semantics

Event-trace refinement and equivalence. Below we give the omitted definitions of event-trace refinement and equivalence.

PEtr(P, B) iff ∃W. (P =load⇒ W) ∧ Etr(W, B)

Etr(W, B) is defined co-inductively by the following rules:
  Etr(W, abort),  if W =⇒+ abort.
  Etr(W, e::B),   if W =e⇒+ W′ and Etr(W′, B).
  Etr(W, done),   if W =⇒+ done.
  Etr(W, ϵ),      if W =⇒+ W′ and Etr(W′, ϵ).

The co-inductive definition of Etr(W, B) says that B can be produced by executing W. Note that we distinguish the traces of non-terminating (diverging) executions from those of terminating ones. If the execution of W diverges, its observable event trace B is either of infinite length, or finite but not ending with done or abort (called silent divergence, see the last rule above).

Then we define the refinement P ⊑ P and the equivalence P ≈ P below. They ensure that if P has a diverging execution, so does P. Thus the refinement and the equivalence relations preserve termination.

Definition 17 (Event-Trace Refinement and Equivalence).
P ⊑ P iff ∀B. PEtr(P, B) =⇒ PEtr(P, B).
P ≈ P iff ∀B. PEtr(P, B) ⇐⇒ PEtr(P, B).

Safety. Below we use the event traces to define Safe(W) and Safe(P).

Definition 18 (Safety). Safe(W) iff ¬∃tr. Etr(W, tr::abort). Safe(P) iff (∃W. P =load⇒ W) and ∀W. (P =load⇒ W) =⇒ Safe(W).

Non-preemptive semantics. Fig. 15 gives the omitted non-preemptive semantics rules. Like EntAtnp and ExtAtnp, the rules Printnp and Termnp execute one step of the current thread t, and then non-deterministically switch to a thread t′ (which could just be t). The corresponding global steps produce the sw events (or the external event e in the Printnp rule). The other rules are very similar to their counterparts in the preemptive semantics.

B More about the Simulation
Proof of Lemma 6. The proof of Lemma 6 relies on the following non-interference lemma.

Lemma 19 (Non-Interference). For any k, k′ of some well-defined language sl, and any κ, κ′ of some well-defined language tl, if F1 ⊢ (k, Σ0) −ι→∆0+ (k′, Σ) and F1 ⊢ (κ, σ0) −ι→δ0+ (κ′, σ), then for any µ, F2 and F2 such that (µ.S ∪ F1) ∩ F2 = ∅ and (µ.S ∪ F1) ∩ F2 = ∅, we have

HG(∆0, Σ, F1, µ.S) ∧ LG(µ, (δ0, σ, F1), (∆0, Σ)) =⇒ Rely(µ, (Σ0, Σ, F2), (σ0, σ, F2)).

Here we use F ⊢ (k, Σ0) −ι→∆0+ (k′, Σ) to say: either F ⊢ (k, Σ0) −τ→∆0+ (k′, Σ), or F ⊢ (k, Σ0) −τ→∆0∗ (k″, Σ) and F ⊢ (k″, Σ) −ι→emp (k′, Σ) for some k″ and ι ≠ τ.

The non-interference lemma says that, after the executions of (k, Σ0) and (κ, σ0) until some interaction point with resulting states Σ and σ, we are able to establish the Rely condition over the initial and resulting states with respect to µ and some freelists F2 and F2, if the freelists F2 and F2 contain no shared locations and are disjoint from the freelists F1 and F1 respectively, and the executions guarantee HG and LG. Here LG is established by our module-local simulation if HG is guaranteed, and HG is established by RC via the following lemma, which says we can establish HG for multiple steps if the module and the corresponding state satisfy RC.

Lemma 20 (RC-guarantee). If RC(F, S, (k, Σ0)) and F ⊢ (k, Σ0) −ι→∆0+ (k′, Σ), then HG(∆0, Σ, F, S).


The non-preemptive global downward simulation. The definition of the whole-program simulation is similar to the module-local simulation, except for the case of environment interference (Rely steps), which is unnecessary for a whole-program simulation. Every source step should correspond to multiple target steps. As in the module-local simulation, we always require that the target footprints match those of the source, expressed as FPmatch. Also, the footprints should always be in scope. As a special requirement for the whole-program simulation, we always require the source and the target to perform lock-step context switches and to always switch to the same thread.

Below we use curF(W) for the freelist of the current thread, i.e., curF((T, t, _, _)) def= T(t).F.

Definition 21 (Whole-Program Downward Simulation). Let P = let Γ in f1 | . . . | fm, P = let Π in f1 | . . . | fm, Γ = {(sl1, ge1, γ1), . . . , (slm, gem, γm)} and Π = {(tl1, ge′1, π1), . . . , (tlm, ge′m, πm)}. We say P ≼ P iff
• ∃φ. (∀1 ≤ i ≤ m. ⌊φ⌋(gei) = ge′i), and
• for any W, if P :=load⇒ W, then there exist W, i ∈ index and µ such that P :=load⇒ W and (W, emp) ≼^i_µ (W, emp).

Here we define (W, ∆0) ≼^i_µ (W, δ0) as the largest relation such that whenever (W, ∆0) ≼^i_µ (W, δ0), the following are true:
1. W.t = W.t, and W.𝕕 = W.𝕕;
2. for all W′ and ∆, if W :=τ⇒∆ W′, then one of the following holds:
   a. ∃j. j < i, and (W′, ∆0 ∪ ∆) ≼^j_µ (W, δ0); or
   b. ∃W′, δ, j.
      i. W :=τ⇒δ+ W′, and
      ii. (δ0 ∪ δ) ⊆ (curF(W) ∪ µ.S), and FPmatch(µ, ∆0 ∪ ∆, δ0 ∪ δ); and
      iii. (W′, ∆0 ∪ ∆) ≼^j_µ (W′, δ0 ∪ δ);
3. for all T′, 𝕕′, Σ′, o and t′, if o ≠ τ and W :=o⇒emp (T′, t′, 𝕕′, Σ′), then there exist W′, δ, T′, σ′ and j such that for any t″ ∈ dom(T′) we have
   a. W :=τ⇒δ∗ W′, and W′ :=o⇒emp (T′, t″, 𝕕′, σ′), and
   b. (δ0 ∪ δ) ⊆ curF(W) ∪ µ.S, and FPmatch(µ, ∆0, δ0 ∪ δ); and
   c. ((T′, t″, 𝕕′, Σ′), emp) ≼^j_µ ((T′, t″, 𝕕′, σ′), emp);
4. if W :=τ⇒emp done, then ∃W′, δ.
   a. W :=τ⇒δ∗ W′, and W′ :=τ⇒emp done, and
   b. (δ0 ∪ δ) ⊆ curF(W) ∪ µ.S, and FPmatch(µ, ∆0, δ0 ∪ δ).

same thread) in our whole-program simulations is not sostrong as it sounds. It can be naturally derived from com-posing our module-local simulations. In our module-local

simulation, we already have lock-step EntAtom/ExtAtomsteps (see case 2 in Def. 3), which will be switch points inwhole programs. Besides, since the local simulation is sup-posed to relate the source and target of the same thread, itseems natural to require the source and target to alwaysswitch to the same thread in the whole-program simulation.

The switching-to-the-same-thread condition makes it easyto flip the whole-program simulation (i.e. derive the whole-program upward simulation from the downward one). Theupward and downward simulations require us to map eachtarget thread to some source thread and also map each sourcethread to a target one. Since the number of target threads isthe same as the number of source threads, we would need torequire the mapping between the source and target threadsto be a bijection. The easiest way to ensure the bijection isto let the thread identifiers to be the same.

The non-preemptive global upward simulation. We define the whole-program upward simulation P ≽ P in Def. 22. It is similar to the downward simulation P ≼ P, with the positions of source and target swapped. Note that we do not flip the FPmatch condition, since we always require the footprint of the target program to be a refinement of the footprint of the source program, in order to prove DRF of the target program. Correspondingly, the address mapping φ is not flipped either.

Definition 22 (Whole-Program Upward Simulation). Let P = let Γ in f1 | . . . | fm, P = let Π in f1 | . . . | fm, Γ = {(sl1, ge1, γ1), . . . , (slm, gem, γm)} and Π = {(tl1, ge′1, π1), . . . , (tlm, ge′m, πm)}. We say P ≽ P iff
• ∃φ. (∀1 ≤ i ≤ m. ⌊φ⌋(gei) = ge′i), and
• for any W, if P :=load⇒ W, then there exist W, i ∈ index and µ such that P :=load⇒ W and (W, emp) ≽^i_µ (W, emp).

Here we define (W, δ0) ≽^i_µ (W, ∆0) as the largest relation such that whenever (W, δ0) ≽^i_µ (W, ∆0), the following are true:
1. W.t = W.t, and W.𝕕 = W.𝕕;
2. for all W′ and δ, if W :=τ⇒δ W′, then one of the following holds:
   a. ∃j. j < i, and (W′, δ0 ∪ δ) ≽^j_µ (W, ∆0); or
   b. ∃W′, ∆, j.
      i. W :=τ⇒∆+ W′, and
      ii. (δ0 ∪ δ) ⊆ curF(W) ∪ µ.S, and FPmatch(µ, ∆0 ∪ ∆, δ0 ∪ δ); and
      iii. (W′, δ0 ∪ δ) ≽^j_µ (W′, ∆0 ∪ ∆);
3. for all T′, 𝕕′, σ′, o and t′, if o ≠ τ and W :=o⇒emp (T′, t′, 𝕕′, σ′), then there exist W′, ∆, T′, Σ′ and j such that for any t″ ∈ dom(T′) we have
   a. W :=τ⇒∆∗ W′, and W′ :=o⇒emp (T′, t″, 𝕕′, Σ′), and
   b. δ0 ⊆ curF(W) ∪ µ.S, and FPmatch(µ, ∆0 ∪ ∆, δ0); and
   c. ((T′, t″, 𝕕′, σ′), emp) ≽^j_µ ((T′, t″, 𝕕′, Σ′), emp);
4. if W :=τ⇒emp done, then ∃W′, ∆.
   a. W :=τ⇒∆∗ W′, and W′ :=τ⇒emp done, and
   b. δ0 ⊆ curF(W) ∪ µ.S, and FPmatch(µ, ∆0 ∪ ∆, δ0).


  P :=⇒ Race,  if P :=load⇒ W, W :=⇒∗ W′, and W′ :=⇒ Race.

  P :=⇒ Race,  if P :=load⇒ W, NPpred(W, t1, (δ1, d1)), NPpred(W, t2, (δ2, d2)), t1 ≠ t2, and (δ1, d1) ⌢ (δ2, d2).

  (Racenp)  W :=⇒ Race,  if W :=o⇒emp W′, o ≠ τ, t1 ≠ t2, (δ1, d1) ⌢ (δ2, d2), NPpred(W′, t1, (δ1, d1)), and NPpred(W′, t2, (δ2, d2)).

  (Predictnp)  NPpred(W, t, (δ, d)),  if W = (T, _, 𝕕, σ), T(t) = (F, κ), F ⊢ (κ, σ) −τ→δ∗ (κ′, σ′), and 𝕕(t) = d.

Figure 16. Data races in the non-preemptive semantics


Determinism of the target language, and flip of the global simulation.

Definition 23 (Deterministic Languages). det(tl) iff, for all configurations ϕ in tl (see the definition of ϕ in Fig. 4) and for all F, if F ⊢ ϕ −ι1→δ1 ϕ1 and F ⊢ ϕ −ι2→δ2 ϕ2, then ϕ1 = ϕ2, ι1 = ι2 and δ1 = δ2.

Lemma 24 (Flip, 4 in Fig. 2). For any f1, . . . , fn, ge, φ, Γ = {(sl1, ge1, γ1), . . . , (slm, gem, γm)}, and Π = {(tl1, ge′1, π1), . . . , (tlm, ge′m, πm)}, if ∀i. det(tli) and let Γ in f1 | . . . | fm ≼φ let Π in f1 | . . . | fm, then let Π in f1 | . . . | fm ≽φ let Γ in f1 | . . . | fm.

C More about DRF and NPDRF
Data races in the non-preemptive semantics (P :=⇒ Race) are defined in Fig. 16. Similar to the definition for the preemptive semantics, a non-preemptive program configuration W steps to Race (W :=⇒ Race) if two threads are predicted to have conflicting footprints, as shown in rule Racenp. Predictions are made only at the switch points of the non-preemptive semantics (W :=o⇒emp W′ where o ≠ τ), i.e., program points at atomic block boundaries, after an observable event, or after thread termination. The prediction rule Predictnp is a unified version of its counterparts Predict-0 and Predict-1 in Fig. 9, where 𝕕 indicates whether the predicted steps are inside an atomic block. Similar to the Predict-1 rule, we do not insist on predicting the footprint generated by the execution reaching the next switch point, because the code segments between switch points could be non-terminating. In addition to the switch points, we also need to be able to perform a prediction at the initial state (the second rule in Fig. 16), because the first executing thread is non-deterministically picked at the beginning, which has a similar effect as thread switching.

A program P is NPDRF if it never steps to Race:

NPDRF(P) iff ¬(P :=⇒ Race)

Note that defining NPDRF is not for studying the absence of data races in the non-preemptive semantics (which is probably not very interesting since the execution is non-preemptive anyway). Rather, it is intended to serve as an equivalent notion of DRF but formulated in the non-preemptive semantics. The following lemma shows the equivalence.

Lemma 25 (Equivalence between DRF and NPDRF, 6 and 8 in Fig. 2). For any f1, . . . , fm and Π = {(tl1, π1), . . . , (tlm, πm)} such that ∀i. wd(tli), and P = let Π in f1 ∥ . . . ∥ fm, we have DRF(P) ⇐⇒ NPDRF(P).

D More about the Extended Framework

D.1 x86-TSO semantics
Our x86-TSO semantics follows the x86-TSO model presented in [22, 28] and CompCertTSO [27].

To make the refinement proof easy, we choose to present the x86-TSO global semantics in a similar way as our preemptive global semantics. The machine states and the global transition rules are shown in Fig. 17 and 18 respectively. Here
• all modules are x86-TSO modules;
• the whole-program configuration W is instrumented with write buffers B, where each thread t has its own write buffer B(t);
• we do not need the atomic bit or the EntAtom and ExtAtom events in the x86-TSO model.

The Unbuffer rule in the global semantics in Fig. 18 says that the machine non-deterministically applies the writes in a thread's buffer to the real memory.

The x86 instruction set and the local semantics are shown in Fig. 19. For normal writes, a thread always writes to its buffer rather than the real memory. For lock-prefixed instructions, the thread requires that the buffer is empty, and it updates the memory directly (see the LOCKDEC rule).
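For intuition, here is a minimal Coq sketch of this buffer machinery: each thread has a FIFO write buffer, a non-lock-prefixed store only appends to the buffer, and flushing applies the buffered writes to memory in order. Addresses and values are nat for simplicity, and the names are ours rather than the definitions in our Coq development.

  Require Import List Arith.
  Import ListNotations.

  Definition addr := nat.
  Definition val  := nat.
  Definition mem  := addr -> val.
  Definition buffer := list (addr * val).   (* oldest write first *)

  Definition upd (m : mem) (l : addr) (v : val) : mem :=
    fun l' => if Nat.eqb l' l then v else m l'.

  (* flushing applies all buffered writes to memory, in FIFO order *)
  Fixpoint apply_buffer (b : buffer) (m : mem) : mem :=
    match b with
    | []           => m
    | (l, v) :: b' => apply_buffer b' (upd m l v)
    end.

  (* a non-lock-prefixed store only enqueues the write *)
  Definition buffered_store (b : buffer) (l : addr) (v : val) : buffer :=
    b ++ [(l, v)].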


(Prog)     P ::= let Π in f1 ∥ . . . ∥ fn          (Entry) f ∈ String
(MdSet)    Π ::= {(ge1, π1), . . . , (gem, πm)}     (Module) π ::= . . .
(Lang)     tlx86-TSO ::= (Module, Core, InitCore, −→tso)    (Core) κ ::= (π, . . .)
           ge ∈ Addr ⇀fin Val
           InitCore ∈ Module → Entry ⇀ list(Val) ⇀ Core
           −→tso ∈ FList × (Core × State × Buf) → {P((Msg × FtPrt) × (Core × State × Buf)), abort}
(ThrdID)   t ∈ N                                   (Addr) l ::= . . .
(State)    σ ∈ Addr ⇀fin Val                       (Val) v ::= l | . . .
(Buf)      b ::= ϵ | (l, v)::b                     (FList) F, F ∈ Pω(Addr)
(Msg)      ι ::= τ | e | call(f, ®v) | ret(v)       (Event) e ::= print(v)
(Config)   ϕ, Φ ::= (κ, σ) | abort
(World)    Wtso ::= (Π, F, T, t, B, σ)
(ThrdPool) T, T ::= {t1 ↦ K1, . . . , tn ↦ Kn}
(RtMdStk)  K, K ::= ϵ | (F, κ)::K
(Buffers)  B ::= {t1 ↦ b1, . . . , tn ↦ bn}
(FSpace)   F, FS ::= F::F  (co-inductive)
(GMsg)     o ::= τ | e | sw | ub                   (ASet) S, S ∈ P(Addr)

Figure 17. The x86-TSO machine


  (Load)     let Π in f1 ∥ . . . ∥ fn =load⇒tso (Π, F, T, t, {t1 ↦ ϵ, . . . , tn ↦ ϵ}, σ),
             if F0 = F1 :: . . . :: Fn :: F, F = Fn+1 :: Fn+2 :: . . ., for all i ≠ j: Fi ∩ Fj = ∅ and dom(σ) ∩ Fi = ∅; for all i in {1, . . . , n}: tlx86-TSO.InitCore(πi, fi) = κi where (tlx86-TSO, gei, πi) ∈ Π; T = {1 ↦ (tlx86-TSO, F1, κ1)::ϵ, . . . , n ↦ (tlx86-TSO, Fn, κn)::ϵ}; t ∈ {1, . . . , n}; σ = GE(Π); and closed(σ).
  (τ-step)   (Π, F, T, t, B, σ) =τ⇒δ,tso (Π, F, T′, t, B{t ↦ b′}, σ′),
             if T(t) = (F, κ)::K, B(t) = b, F ⊢ (κ, (b, σ)) −ι→δ,tso (κ′, (b′, σ′)), T′ = T{t ↦ (F, κ′)::K}, and ι = τ.
  (Print)    (Π, F, T, t, B, σ) =e⇒emp,tso (Π, F, T′, t, B, σ),
             if T(t) = (F, κ)::K, B(t) = ϵ, F ⊢ (κ, (ϵ, σ)) −e→emp,tso (κ′, (ϵ, σ)), and T′ = T{t ↦ (F, κ′)::K}.
  (EFCall)   (Π, F, T, t, B, σ) =τ⇒emp,tso (Π, F′, T′, t, B, σ),
             if T(t) = (F, κ)::K, B(t) = b, F ⊢ (κ, (b, σ)) −call(f, ®v)→emp,tso (κ′, (b, σ)), F = F1 :: F′, initRtMd(Π, f, ®v, κ1), and T′ = T{t ↦ (F1, κ1)::(F, κ′)::K}.
  (EFRet)    (Π, F, T, t, B, σ) =τ⇒emp,tso (Π, F, T′, t, B, σ),
             if T(t) = (F1, κ1)::(F, κ)::K, F1 ⊢ (κ1, (b, σ)) −ret(v)→emp,tso (κ′1, (b, σ)), AftExtn(κ, v) = κ′, and T′ = T{t ↦ (F, κ′)::K}.
  (Term)     (Π, F, T, t, B, σ) =sw⇒emp,tso (Π, F, T\t, t′, B\t, σ),
             if T(t) = (F, κ)::ϵ, B(t) = ϵ, t′ ∈ dom(T\t), and F ⊢ (κ, (ϵ, σ)) −ret(v)→tso (κ′, (ϵ, σ)).
  (Done)     (Π, F, T, t, B, σ) =τ⇒tso done,
             if T(t) = (F, κ)::ϵ, B(t) = ϵ, dom(T) = dom(B) = {t}, and F ⊢ (κ, (ϵ, σ)) −ret(v)→tso (κ′, (ϵ, σ)).
  (Switch)   (Π, F, T, t, B, σ) =sw⇒emp,tso (Π, F, T, t′, B, σ),  if t′ ∈ dom(T).
  (Unbuffer) (Π, F, T, t, B, σ) =ub⇒emp,tso (Π, F, T, t, B′, σ′),
             if B(t′) = (l, v)::b, σ′ = apply_buffer((l, v), σ), and B′ = B{t′ ↦ b}.
  (Abort)    (Π, F, T, t, B, σ) =τ⇒tso abort,
             if T(t) = (F, κ)::ϵ, B(t) = b, and F ⊢ (κ, (b, σ)) −→tso abort.

Figure 18. The x86-TSO global semantics


(Instr)    c ::= mov rd, rs | Iadd rd, rs | call f | ret | . . . | lock xchg l rs | lock xadd l rs | lock dec l | mfence
(Core)     κ ::= Register → Val
(Register) r ::= PC | SP | . . .

  (READMEM)  (ge, F) ⊢ (κ, (b, σ)) −τ→δ,tso (κ{rd ↦ v, PC ↦ PC + 1}, (b, σ)),
             if find_instr(ge, κ(PC)) = (mov rd, [rs]), κ(rs) = l, l ∉ dom(b), σ(l) = v, and δ = ({l}, ∅).
  (READBUF)  (ge, F) ⊢ (κ, (b, σ)) −τ→δ,tso (κ{rd ↦ v, PC ↦ PC + 1}, (b, σ)),
             if find_instr(ge, κ(PC)) = (mov rd, [rs]), κ(rs) = l, b = b1 ++ (l, v)::b0, l ∉ dom(b0), and δ = ({l}, ∅).
  (WRITEBUF) (ge, F) ⊢ (κ, (b, σ)) −τ→δ,tso (κ{PC ↦ PC + 1}, (b ++ (l, v), σ)),
             if find_instr(ge, κ(PC)) = (mov [rd], rs), κ(rs) = v, κ(rd) = l, δ = (∅, {l}), and apply_buffer(b ++ (l, v), σ) = σ′.
  (LOCKDEC)  (ge, F) ⊢ (κ, (ϵ, σ)) −τ→δ,tso (κ′, (ϵ, σ′)),
             if find_instr(ge, κ(PC)) = lock dec l r, κ′ = κ{PC ↦ PC + 1, r ↦ σ(l)} with the SF flag set iff σ(l) ≤ 0, σ′ = σ{l ↦ σ(l) − 1}, and δ = ({l}, {l}).
  (BARRIER)  (ge, F) ⊢ (κ, (ϵ, σ)) −τ→emp,tso (κ{PC ↦ PC + 1}, (ϵ, σ)),  if find_instr(ge, κ(PC)) = mfence.
  (PRINT)    (ge, F) ⊢ (κ, (ϵ, σ)) −print(v)→emp,tso (κ′, (ϵ, σ)),
             if find_instr(ge, κ(PC)) = call print(), κ′ = κ{PC ↦ PC + 1}, and σ(first argument) = v.

apply_buffer((l, v)::b, σ) ::= apply_buffer(b, σ{l ↦ v})
apply_buffer(ϵ, σ) ::= σ

Figure 19. The x86-TSO thread-local semantics


objM(l, Σ) def= l ∈ dom(Σ) ∧ Σ(l) = (_, None)
objFP(δ, Σ) def= ∀l ∈ (δ.rs ∪ δ.ws). objM(l, Σ)
clientFP(δ, Σ) def= ∀l ∈ (δ.rs ∪ δ.ws). ¬objM(l, Σ)
conflictc(δ, t, B) def= (δ.rs ∪ δ.ws) ∩ ⋃t′≠t dom(B(t′)) ≠ ∅
bufConflict(B, Σ) def= ∃l, t1, t2. l ∈ dom(B(t1)) ∩ dom(B(t2)) ∧ ¬objM(l, Σ)
G ⇒ R iff G((Σ, σ), (Σ′, σ′)) =⇒ R((Σ, σ), (Σ′, σ′))
(I ⋉ I′)((Σ, σ), (Σ′, σ′)) def= I((Σ, σ)) ∧ I′((Σ′, σ′))

Figure 20. Auxiliary definitions for module-local simulations

D.2 Object correctness definition

Definition 26 (Object correctness). (tltso, geo, πo) ≼^o (slCImp, geo, γo) iff there exist Ro, Go, Io such that
1. for any t, (slCImp, geo, γo) ≾^o_{R_o^t; G_o^t; I_o} (tltso, geo, πo), and
2. RespectPartitiono(Ro, Go, objM), and
3. Sta(Io, R_o^t), and
4. ∀t1 ≠ t2. G_o^{t1} ⇒ R_o^{t2}.

Here RespectPartitiono is defined in Def. 27. The simulation (slCImp, geo, γo) ≾^o_{R_o^t; G_o^t; I_o} (tltso, geo, πo) is an extension of Liang and Feng [19] with support of TSO semantics for the low-level code, and is defined in Def. 28.

Definition 27 (Respect partition). RespectPartitiono(Ro, Go, po) iff
1. for all Σ, B, σ, Σ′, B′, σ′ and t, if Σ′ agrees with Σ on {l | po(l, Σ)}, σ′ agrees with σ on {l | po(l, Σ)}, and either B′ = B or B′ = B{t′ ↦ B(t′) ++ (l, v)} for some l, v and t′ ≠ t with ¬po(l, Σ), then R_o^t((Σ, (B, σ)), (Σ′, (B′, σ′)));
2. for all Σ, Σ′, B, σ, B′, σ′ and t, if G_o^t((Σ, (B, σ)), (Σ′, (B′, σ′))), then Σ′ agrees with Σ on {l | ¬po(l, Σ)}, σ′ agrees with σ on {l | ¬po(l, Σ)}, and either B′ = B or B′ = B{t ↦ B(t) ++ (l, v)} for some l and v with po(l, Σ).

Here we partition memory into object memory and client memory using the assertion po. The definition says that the object rely/guarantee relations Ro/Go respect this partition if and only if the object rely Ro holds for any change of non-object memory, and the object guarantee Go guarantees that non-object memory is unchanged.

We instantiate the partition as objM, defined in Fig. 20, which determines whether a location is in object memory by the location's permission.

⟦·⟧ lifts the object global environment geo to memory with None permission:
⟦geo⟧(l) def= (geo(l), None) if l ∈ dom(geo), and undefined otherwise.

Definition 28 (Object module simulation). (slCImp, geo, γo) ≾^o_{R_o^t; G_o^t; I_o} (tltso, geo, πo) iff
1. for all Σ and σ, if ⟦geo⟧ ⊆ Σ and geo ⊆ σ, then Io(Σ, ({t1 ↦ ϵ, . . . , tn ↦ ϵ}, σ)), and
2. for all Σ, B, σ, f, ®v and κ, if InitCore(π, f, ®v) = κ and Io(Σ, (B, σ)), then there exists k such that InitCore(γ, f, ®v) = k and (k, Σ) ≾^o_{t; R_o^t; G_o^t; I_o} (κ, (B, σ)).

Here (k, Σ) ≾^o_{t; R_o^t; G_o^t; I_o} (κ, (B, σ)) is defined as the largest relation such that whenever (k, Σ) ≾^o_{t; R_o^t; G_o^t; I_o} (κ, (B, σ)), all the following hold:
a. if (κ, (B(t), σ)) −τ→δ (κ′, (b′, σ′)), then either (k, Σ) −→∗ abort, or there exist k′, Σ′ and ∆ such that
   i. (k, Σ) −τ→∆∗ (k′, Σ′), or (k, Σ) −EntAtom→ (k0, Σ) −τ→∆∗ (k1, Σ′) −ExtAtom→ (k′, Σ′), and
   ii. G_o^t((Σ, (B, σ)), (Σ′, (B{t ↦ b′}, σ′))) and objFP(δ, Σ), and
   iii. (k′, Σ′) ≾^o_{t; R_o^t; G_o^t; I_o} (κ′, (B{t ↦ b′}, σ′));
b. if (κ, (B(t), σ)) −ret(v)→emp (κ′, (B(t), σ)), then either (k, Σ) −→∗ abort, or there exists k′ such that
   i. (k, Σ) −ret(v)→emp∗ (k′, Σ), and
   ii. (k′, Σ) ≾^o_{t; R_o^t; G_o^t; I_o} (κ′, (B, σ));
c. if R_o^t((Σ, (B, σ)), (Σ′, (B′, σ′))), then (k, Σ′) ≾^o_{t; R_o^t; G_o^t; I_o} (κ, (B′, σ′));
d. there is no step (κ, (B(t), σ)) −call(f, ®v)→emp (κ′, (B(t), σ));
e. if (κ, (B(t), σ)) −→ abort, then (k, Σ) −→ abort.

D.3 Proof of Theorem 15

Theorem 29 (Correctness with x86-TSO backend and obj.). For any f1 . . . fn, Γ = {(sl, ge1, γ1), . . . , (sl, gem, γm), (slCImp, geo, γo)}, and Π = {(tl, ge′1, π1), . . . , (tl, ge′m, πm), (tl, geo, πo)}, if
1. ∀i ∈ {1, . . . , m}. (CompCert.CodeT(γi) = πi) ∧ injective(φ) ∧ (CompCert.φ = φ) ∧ ⌊φ⌋(gei) = ge′i,
2. Safe(let Γ in f1 ∥ . . . ∥ fn),
3. DRF(let Γ in f1 ∥ . . . ∥ fn),
4. ∀i ∈ {1, . . . , m}. ReachClose(sl, gei, γi), and ReachClose(slCImp, geo, γo),
5. (tl, geo, πo) ≼^o (slCImp, geo, γo),
then let Γ in f1 ∥ . . . ∥ fn ⊒′ let Π in f1 ∥ . . . ∥ fn.

Proof. By transitivity of refinement, it suffices to prove

let Γ in f1 ∥ . . . ∥ fn ⊒ let Πsc in f1 ∥ . . . ∥ fn    (D.1)

and

let Πsc in f1 ∥ . . . ∥ fn ⊒′ let Π in f1 ∥ . . . ∥ fn    (D.2)

where Πsc = {(tlsc, ge′1, π1), . . . , (tlsc, ge′m, πm), (slCImp, geo, γo)}.

(D.1) is proved by applying Thm. 14 and premises 1-4.


To prove (D.2), by Lemma 30 below, it suffices to prove Safe(let Πsc in f1 ∥ . . . ∥ fn) and DRF(let Πsc in f1 ∥ . . . ∥ fn). The first is implied by premise 2 and (D.1); the second is implied by premise 3 and the DRF preservation results discussed in Section 5. □

Lemma 30 (Restore SC semantics for DRF x86 programs). For any Πsc = {(tlsc, ge1, π1), . . . , (tlsc, gem, πm), (slCImp, geo, γo)}, Πtso = {(tltso, ge1, π1), . . . , (tltso, gem, πm), (tltso, geo, πo)}, and for any f1, . . . , fn, if
1. Safe(let Πsc in f1 ∥ . . . ∥ fn), and
2. DRF(let Πsc in f1 ∥ . . . ∥ fn), and
3. (tltso, geo, πo) ≼^o (slCImp, geo, γo),
then let Πsc in f1 ∥ . . . ∥ fn ⊒′ let Πtso in f1 ∥ . . . ∥ fn.

Proof. By Lemma 32, it suffices to prove let Πtso in f1 ∥ . . . ∥ fn ≽id let Πsc in f1 ∥ . . . ∥ fn, as defined below. By the compositionality theorem (Thm. 37) and Lemmas 35, 36 and 42, it suffices to prove DRF(let Πsc in f1 ∥ . . . ∥ fn) and Safe(let Πsc in f1 ∥ . . . ∥ fn), which are exactly premises 1 and 2. □

D.3.1 Whole-program simulation between TSO and SC

Definition 31 (Whole-program simulation). For any f1, . . . , fn, Πsc, Πtso and geo, let Πtso in f1 ∥ . . . ∥ fn ≽id let Πsc in f1 ∥ . . . ∥ fn iff for all σ, T, t and B, if let Πtso in f1 ∥ . . . ∥ fn =load⇒tso (Πtso, F, T, t, (B, σ)), then there exist Σ and T such that let Πsc in f1 ∥ . . . ∥ fn =load⇒ (Πsc, F, T, t, 0, Σ) and (Πtso, F, T, t, (B, σ)) ≽id (Πsc, F, T, t, 0, Σ).

Here W ≽id W is defined as the largest relation such that whenever W ≽id W, the following hold:
1. if W =o⇒ W′ with o ∈ {τ, ub}, then there exists W′ such that W =τ⇒∗ W′ and W′ ≽id W′;
2. if W =ι⇒ W′, ι ≠ τ and ι ≠ ub, then there exists W′ such that W =ι⇒ W′ and W′ ≽id W′;
3. if W =⇒ abort, then W =⇒∗ abort;
4. if W =⇒ done, then W =⇒∗ done.

Lemma 32 (Whole-program simulation implies refinement). For any f1, . . . , fn, Πsc and Πtso, if let Πtso in f1 ∥ . . . ∥ fn ≽id let Πsc in f1 ∥ . . . ∥ fn, then let Πsc in f1 ∥ . . . ∥ fn ⊒id let Πtso in f1 ∥ . . . ∥ fn.

D.3.2 Client simulation and composition

Definition 33 (Client module simulation). (tlsc, ge, π) ≾^c_{R_c^t; G_c^t; I_c} (tltso, ge, π) iff for all Σ, B, σ, f, ®v and κ, if InitCore(π, f, ®v) = κ and Ic(Σ, (B, σ)), then (κ, Σ) ≾^c_{t; R_c^t; G_c^t; I_c} (κ, (B, σ)).

Here (κ, Σ) ≾^c_{t; R_c^t; G_c^t; I_c} (κ, (B, σ)) is the largest relation such that whenever (κ, Σ) ≾^c_{t; R_c^t; G_c^t; I_c} (κ, (B, σ)), the following hold:
1. if (κ, (B(t), σ)) −ι→δ (κ′, (b′, σ′)), then either (κ, Σ) −→ abort, or there exist κ″, Σ′ and ∆ such that (κ, Σ) −ι→∆+ (κ″, Σ′), δ.rs ∪ δ.ws ⊆ ∆.rs ∪ ∆.ws, clientFP(δ, Σ), and one of the following (a) and (b) holds:
   a. ¬conflictc(δ, t, B), κ″ = κ′, ∆ = δ, G_c^t((Σ, (B, σ)), (Σ′, (B{t ↦ b′}, σ′))), and (κ′, Σ′) ≾^c_{t; R_c^t; G_c^t; I_c} (κ′, (B{t ↦ b′}, σ′)); or
   b. conflictc(δ, t, B, b′);
2. if (κ, (B(t), σ)) −→ abort, then (κ, Σ) −→∗ abort;
3. if R_c^t((Σ, (B, σ)), (Σ′, (B′, σ′))), then (κ, Σ′) ≾^c_{t; R_c^t; G_c^t; I_c} (κ, (B′, σ′)).

Definition 34 (Definitions of Ic, Rc and Gc).

Ic(Σ, (B, σ)) def= ¬bufConflict(B, Σ) ∧ (∀σ′, l. apply_buffers(B, σ, σ′) ∧ ¬objM(l, Σ) =⇒ Σ(l) = σ′(l))

R_c^t def= Ic ⋉ Ic

G_c^t((Σ, (B, σ)), (Σ′, (B′, σ′))) def= Ic(Σ, (B, σ)) ∧ Ic(Σ′, (B′, σ′)) ∧ σ′ = σ ∧ Σ′ agrees with Σ on {l | objM(l, Σ)} ∧ ((∃l, v. B′ = B{t ↦ B(t) ++ (l, v)} ∧ ¬objM(l, Σ)) ∨ B′ = B)

Here apply_buffers is inductively defined as follows: apply_buffers(B, σ, σ) if B = {t1 ↦ ϵ, . . . , tn ↦ ϵ}; and apply_buffers(B, σ, σ′) if B(t) = (l, v)::b and apply_buffers(B{t ↦ b}, σ{l ↦ v}, σ′).

Ic says that buffered writes in different threads do not conflict (unless they are object writes, i.e., writes to a location l such that objM(l, Σ)), and that the client memory is equal between SC and TSO when the buffers are applied to the TSO memory.

G_c^t says that a thread can append a write (l, v) to its buffer only when the location l is not an object memory location.

Lemma 35 (Client-object R/G properties). For any Ro, Go and Io, if RespectPartitiono(Ro, Go, objM), then ∀t1 ≠ t2. G_c^{t1} ⇒ R_o^{t2}, and ∀t1, t2. G_o^{t1} ⇒ R_c^{t2}.

Proof. Trivial by the definitions of RespectPartitiono and Rc, Gc. □

Lemma 36 (Client R/G/I properties). Sta(Ic, R_c^t) ∧ ∀t1, t2. G_c^{t1} ⇒ R_c^{t2}.

23

Page 24: Towards Certified Separate Compilation for Concurrent Programs · 2019-09-01 · Towards Certified Separate Compilation for Concurrent Programs Extended Version Hanru Jiang∗ University

Hanru Jiang, Hongjin Liang, Siyang Xiao, Junpeng Zha, and Xinyu Feng

Proof. Trivial by definition. �

Theorem 37 (TSO object compositionality).
For any f1, . . . , fn,
Πsc = {(tlsc, ge1, π1), . . . , (tlsc, gem, πm), (slCImp, geo, γo)},
Πtso = {(tltso, ge1, π1), . . . , (tltso, gem, πm), (tltso, geo, πo)},
if
  1. ∀i ∈ {1, . . . , m}. (tlsc, gei, πi) <c_{Rtc;Gtc;Ic} (tltso, gei, πi), and
  2. (tltso, geo, πo) 4o (slCImp, geo, γo), and
  3. DRF(let Πsc in f1 ∥ . . . ∥ fn), and
  4. Safe(let Πsc in f1 ∥ . . . ∥ fn),
then let Πtso in f1 ∥ . . . ∥ fn 6id let Πsc in f1 ∥ . . . ∥ fn.

Proof. The proof is similar to the RGSim compositionality proof, except that we additionally need to show that the TSO execution has no conflicting client footprints, which is the result of Theorem 38. □

D.3.3 DRF Implies No Conflicting Client Footprint on the TSO Machine

Theorem 38 (DRF implies no conflicting client footprint).
For any f1, . . . , fn,
Πsc = {(tlsc, ge1, π1), . . . , (tlsc, gem, πm), (slCImp, geo, γo)},
Πtso = {(tltso, ge1, π1), . . . , (tltso, gem, πm), (tltso, geo, πo)},
if
  1. ∀i ∈ {1, . . . , m}. (tlsc, gei, πi) <c_{Rtc;Gtc;Ic} (tltso, gei, πi), and
  2. (tltso, geo, πo) 4o (slCImp, geo, γo), and
  3. Safe(let Πsc in f1 ∥ . . . ∥ fn), and
  4. DRF(let Πsc in f1 ∥ . . . ∥ fn),
then for all 𝕎0 and W0, if let Πsc in f1 ∥ . . . ∥ fn =load=⇒ 𝕎0 and let Πtso in f1 ∥ . . . ∥ fn =load=⇒tso W0, then for all W′ and W′′, W0 =⇒∗ W′ ι=⇒δ W′′ implies
    ¬(clientFP(δ, Σ) ∧ conflictc(δ, W′.t, W′.B)).

Proof. We prove it by contradiction. Assume clientFP(δ, Σ) ∧ conflictc(δ, W′.t, W′.B); we show that we can then construct an SC execution with a data race, which contradicts premise 4.

By clientFP(δ, Σ) ∧ conflictc(δ, W′.t, W′.B), we know there exist W1, . . . , WN, δ0, . . . , δN−1, ι0, . . . , ιN−1 such that
    ∀i ∈ {0, . . . , N − 2}. Wi ιi=⇒δi Wi+1 ∧ ¬(clientFP(δi, Σ) ∧ conflictc(δi, Wi.t, Wi.B)),
and WN−1 ιN−1=⇒δN−1 WN and clientFP(δN−1, Σ) ∧ conflictc(δN−1, WN−1.t, WN−1.B).

By conflictc(δN−1, WN−1.t, WN−1.B), there is a buffer item (l,v) in the buffer of some thread such that l ∈ δN−1.rs ∪ δN−1.ws. The buffer item (l,v) must have been inserted during the execution from W0 to WN, i.e., there exists M < N such that WM.t ≠ WN−1.t, δM ⌢ δN−1, clientFP(δM, Σ), l ∈ δM.ws, and (l,v) is not unbuffered during the execution from WM to WN:

    W0 =⇒∗ WM =⇒δM WM+1 =⇒∗ WN−1 =⇒δN−1 WN

Now we reorder the execution, postponing the execution of thread WM.t until after WM, so that we obtain an execution trace of the shape below, in which ∀i ∈ {0, . . . , N − 2}. ¬(clientFP(δ′i, Σ) ∧ conflictc(δ′i, W′i.t, W′i.B)), δ′ ⌢ δ′′, l ∈ δ′.ws, and ¬conflictc(δ′, t′, W′N−1.B), where clientFP(δ′, Σ) and clientFP(δ′′, Σ):

    W0 =⇒∗ W′N−1, and from W′N−1 thread t′ can step with footprint δ′ to W′N, while thread t′′ can step with footprint δ′′ to W′′N.

Since thread-local steps do not change the TSO memory (they only insert items into buffers), we can safely reorder steps of different threads, except for unbuffer steps. To reorder a normal (non-unbuffer) step with unbuffer steps, we apply Lemma 39 to commute a non-unbuffer step with an unbuffer of the same thread, and Lemma 40 to commute a non-unbuffer step with an unbuffer of another thread.

Since (l,v) is not unbuffered during the execution from W0 to WN, we can always apply Lemma 39 to reorder step M with unbuffer steps of the same thread. To show that the premise of Lemma 40 holds, it suffices to prove ∀i. ¬bufConflict(Wi.B, Σ). Consider the invariant R between SC and TSO program configurations, as defined in Definition 41 below. It is obvious that R(𝕎0, W0) holds. By ∀i ∈ {0, . . . , N − 1}. ¬(clientFP(δi, Σ) ∧ conflictc(δi, Wi.t, Wi.B)) and the safety premise Safe(let Πsc in f1 ∥ . . . ∥ fn), we can do induction on N and apply the client or object local simulations to construct an SC execution 𝕎1, . . . , 𝕎N−1 such that ∀i ∈ {0, . . . , N − 1}. R(𝕎i, Wi, Ro, Go, Io), as shown below:

    (Each TSO configuration Wi of W0 =⇒ · · · =⇒ WN−1 =⇒ WN is matched by an SC configuration 𝕎i of 𝕎0 =⇒ · · · =⇒ 𝕎N−1, related by R.)

By the definitions of R and Ic, ∀i < N. ¬bufConflict(Wi.B, Σ). Now we have reordered the execution into W′1, . . . , W′N−1. We do the above induction again on W′1, . . . , W′N−1, and construct an SC execution 𝕎′1, . . . , 𝕎′N−1 such that ∀i ∈ {0, . . . , N − 1}. R(𝕎′i, W′i, Ro, Go, Io), as shown below:

    (The reordered TSO execution W0 =⇒ · · · =⇒ W′N−1 is matched by an SC execution 𝕎0 =⇒ · · · =⇒ 𝕎′N−1, related pointwise by R.)


By R(𝕎′N−1, W′N−1, Ro, Go, Io), clientFP(δ′, Σ) and clientFP(δ′′, Σ), we can apply the client simulation, and there exist 𝕎′N, 𝕎′′N, ∆′, ∆′′ such that ∆′ = δ′, δ′′.rs ∪ δ′′.ws ⊆ ∆′′.rs ∪ ∆′′.ws, and from 𝕎′N−1 thread t′ can step with footprint ∆′ to 𝕎′N while thread t′′ can step with footprint ∆′′ to 𝕎′′N.

Since l ∈ δ′.ws and l ∈ δ′′.rs ∪ δ′′.ws, we have ∆′ ⌢ ∆′′, which gives a racing SC execution and contradicts premise 4. □

Lemma 39 (Reordering an unbuffer and a write-buffer step of the same thread).
For all κ, b, σ, δ, l, v, l′, v′,
if (κ, ((l,v) :: b, σ)) τ7−→δ,tso (κ, ((l,v) :: b ++ (l′,v′), σ)),
then (κ, (b, σ{l ; v})) τ7−→δ,tso (κ, (b ++ (l′,v′), σ{l ; v})).

Proof. Trivial by the x86-TSO semantics. □

Lemma 40 (Reordering an unbuffer and a write-buffer step of different threads).
For all κ, b, σ, δ, l, v, l′, v′,
if (κ, (b, σ)) τ7−→δ,tso (κ, (b ++ (l′,v′), σ)) and l ≠ l′,
then (κ, (b, σ{l ; v})) τ7−→δ,tso (κ, (b ++ (l′,v′), σ{l ; v})).

Proof. It suffices to guarantee that l ∉ δ.rs ∪ δ.ws. By the x86-TSO semantics, a step with non-empty read and write footprints must have the same read and write sets; since this step buffers a write to l′, this gives δ.rs ∪ δ.ws ⊆ {l′}, and hence l ∉ δ.rs ∪ δ.ws because l ≠ l′ by the premise. □

Definition 41 (Whole-program simulation invariant).
R((Γ, FS, 𝕋, t, d, Σ), (Π, F, T, t′, B, σ), Ro, Go, Io) iff
  1. t = t′, and d = 0, and FS = F, and Ic(Σ, (B,σ)), and Io(Σ, (B,σ)), and
  2. ∀t1. (𝕋(t1), Σ) ≈ (T(t1), B, σ), and
  3. ¬((Γ, FS, 𝕋, t, d, Σ) ==⇒ Race), and ¬((Γ, FS, 𝕋, t, d, Σ) ==⇒∗ abort).

Here (𝕋(t), Σ) ≈ (T(t), B, σ) iff, whenever T(t) = (F1, κ1) :: . . . :: (Fk, κk) :: ϵ and 𝕋(t) = (F′1, k1) :: . . . :: (F′k′, kk′) :: ϵ, the following hold:
  1. k = k′, and
  2. F′j = Fj for all j = 1 . . . k, and
  3. (kj, Σ) <c_{t;Rtc;Gtc;Ic} (κj, (B,σ)) for all j > 1, and
  4. (k1, Σ) <c_{t;Rtc;Gtc;Ic} (κ1, (B,σ)) or (k1, Σ) <o_{t;Rto;Gto;Io} (κ1, (B,σ)).

D.3.4 x86 Assembly Module Client Simulation

Lemma 42 (Client module simulation).
For any ge, π and t, (tlx86-SC, ge, π) <c_{Rtc;Gtc;Ic} (tltso, ge, π).

Proof. Since the client code is identical on both sides, and the CompCert memory model guarantees that the x86-SC semantics aborts when accessing a memory location with None permission (i.e., objM), it suffices to prove that if Ic(Σ, (B,σ)), then reading/writing the same location l from/to Σ or (B,σ) yields results related by Ic and Gtc, provided l does not conflict with B. That is,
  1. ∀l, t. ¬objM(l, Σ) =⇒ Σ(l) = apply_buffer(B(t), σ)(l), and
  2. ∀l, v, t. ¬objM(l, Σ) ∧ ¬conflictc((∅, {l}), t, B) =⇒ Ic(Σ{l ; v}, (B{t ; B(t) ++ (l,v)}, σ)) ∧ Gtc((Σ, (B,σ)), (Σ{l ; v}, (B{t ; B(t) ++ (l,v)}, σ))).

Item 1 is obvious by the definition of Ic. For item 2, to show Ic(Σ{l ; v}, (B{t ; B(t) ++ (l,v)}, σ)), it suffices to prove ¬bufConflict(B{t ; B(t) ++ (l,v)}), which is obvious from ¬conflictc((∅, {l}), t, B). It remains to show Gtc((Σ, (B,σ)), (Σ{l ; v}, (B{t ; B(t) ++ (l,v)}, σ))), which is also obvious by ¬objM(l, Σ). □
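Obligation 1 above can be read operationally: a load that sees the thread's own store buffer returns the same value as apply_buffer(B(t), σ) at that location, and hence agrees with the SC memory Σ on client locations when Ic holds. A small illustrative helper (our own name, reusing the toy representation from the sketch after Definition 34, and not part of the formal semantics) makes this read discipline explicit:

    module LocMap = Map.Make (Int)
    type mem = int LocMap.t
    type buffer = (int * int) list

    (* Read location l as the owning thread does under x86-TSO: the newest
       pending write to l in the thread's own buffer wins, otherwise fall
       back to the shared memory. *)
    let tso_read (b : buffer) (m : mem) (l : int) : int option =
      match List.rev (List.filter (fun (l0, _) -> l0 = l) b) with
      | (_, v) :: _ -> Some v              (* newest buffered write to l *)
      | []          -> LocMap.find_opt l m (* nothing buffered: read memory *)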

D.4 The Spinlock Spec. and Impl. Satisfy Our Object Correctness

We prove that the lock specification and implementation in Fig. 10 satisfy our object correctness, i.e., (tltso, gelock, πlock) 4o (slCImp, gelock, γlock).

Lemma 43 (Lock implementation correctness).
(tltso, gelock, πlock) 4o (slCImp, gelock, γlock).

Proof. To prove object correctness, the key is to find the right Ro, Go, Io. For the spinlock spec. and impl., we define Ro, Go, Io in Fig. 21, where L is the lock variable.

The invariant Io says the lock variable can be in three possible states:
  1. Iavailable: L is available in both the SC and the TSO memory, and there is no buffered write to L;
  2. Iunavailable: L is unavailable in both the SC and the TSO memory, and there is no buffered write to L;
  3. Ibuffered: L is available in the SC memory but unavailable in the TSO memory, and there is a single buffered unlock write to L in some thread's buffer.

It is obvious that RespectPartitiono(Ro, Go, objM), Sta(Io, Rto), and ∀t1 ≠ t2. Gt1o ⇒ Rt2o.

The proof that the lock spec. and impl. are related by the simulation (slCImp, geo, γo) <o_{Rto;Gto;Io} (tltso, geo, πo) is also straightforward, and is mechanized in Coq. □


Io def= Iavailable ∨ Ibuffered ∨ Iunavailable

Gto def= Gtlock ∨ Gtunlock ∨ Id

Rto def= (⋁t′≠t Gt′o) ∨ Rbuffer

Iavailable(Σ, (B,σ)) def= Σ(L) = (1, None) ∧ σ(L) = 1 ∧ (∀t. L ∉ dom(B(t)))

Ibuffered(Σ, (B,σ)) def= Σ(L) = (1, None) ∧ σ(L) = 0 ∧ ∃t. buffered_unlock(t, B)

Iunavailable(Σ, (B,σ)) def= Σ(L) = (0, None) ∧ σ(L) = 0 ∧ (∀t. L ∉ dom(B(t)))

buffered_unlock(t, B) def= B(t) = b ++ (L, 1) :: b′ ∧ L ∉ dom(b) ∪ dom(b′) ∧ ∀t′ ≠ t. L ∉ dom(B(t′))

Id((Σ, (B,σ)), (Σ′, (B′,σ′))) def= Σ = Σ′ ∧ σ = σ′ ∧ B = B′

Gtlock((Σ, (B,σ)), (Σ′, (B′,σ′))) def= Σ(L) = (1, None) ∧ σ(L) = 1 ∧ Σ′ = Σ{L ; (0, None)} ∧ σ′ = σ{L ; 0} ∧ B = B′

Gtunlock((Σ, (B,σ)), (Σ′, (B′,σ′))) def= Σ(L) = (0, None) ∧ σ(L) = 0 ∧ Σ′ = Σ{L ; (1, None)} ∧ σ = σ′ ∧ B′ = B{t ; B(t) ++ (L, 1)}

Rbuffer((Σ, (B,σ)), (Σ′, (B′,σ′))) def= Ibuffered ⋉ Iavailable ∨ Iavailable ⋉ Iavailable ∨ Ibuffered ⋉ Ibuffered ∨ Iunavailable ⋉ Iunavailable

Figure 21. Ro, Go, Io instantiations for the spinlock
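To make the three disjuncts of Io concrete, the following sketch classifies the TSO part of a lock configuration on the illustrative buffer model used earlier (the function and constructor names are ours; the real Io additionally constrains the SC memory Σ and the None permission):

    module LocMap = Map.Make (Int)
    type mem = int LocMap.t
    type buffer = (int * int) list
    type buffers = (int * buffer) list

    (* Which disjunct of Io the TSO side of a configuration is in, for the
       lock location. *)
    type lock_state = Available | Buffered | Unavailable | Outside_Io

    let classify (lock_loc : int) (bs : buffers) (m : mem) : lock_state =
      let pending =                        (* all buffered writes to the lock *)
        List.concat_map
          (fun (_, b) -> List.filter (fun (l, _) -> l = lock_loc) b) bs
      in
      match LocMap.find_opt lock_loc m, pending with
      | Some 1, []         -> Available    (* unlocked, nothing buffered      *)
      | Some 0, []         -> Unavailable  (* locked, nothing buffered        *)
      | Some 0, [ (_, 1) ] -> Buffered     (* one pending unlock write        *)
      | _                  -> Outside_Io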


(BlockID)  b ∈ N
(State)    σ, Σ ∈ BlockID ⇀fin N ⇀fin Value
(U)        l ::= (b, i)   where i ∈ N
(Value)    v ::= l | . . .
(BSet)     S, 𝕊 ∈ P(BlockID)
(FList)    F, 𝔽 ∈ Pω(BlockID)
(LocSet)   rs, ws, ls, ge ∈ P(U)
(FtPrt)    δ, ∆ ::= (rs, ws)

locs(σ) def= {(b, i) | b ∈ dom(σ) ∧ i ∈ dom(σ(b))}
blocks(ls) def= {b | ∃i. (b, i) ∈ ls}
TSU def= S × N

forward(σ, σ′) iff (dom(σ) ⊆ dom(σ′)) ∧ (locs(σ′) ∩ Tdom(σ)U ⊆ locs(σ))

σ =ls= σ′ iff ∀(b, i) ∈ ls. ((b, i) ∉ locs(σ) ∪ locs(σ′)) ∨ ((b, i) ∈ locs(σ) ∩ locs(σ′) ∧ σ(b)(i) = σ′(b)(i))

LEqPre(σ1, σ2, δ, F) iff σ1 =δ.rs= σ2 ∧ (locs(σ1) ∩ δ.ws) = (locs(σ2) ∩ δ.ws) ∧ (dom(σ1) ∩ F) = (dom(σ2) ∩ F)

LEqPost(σ1, σ2, δ, F) iff σ1 =δ.ws= σ2 ∧ (dom(σ1) ∩ F) = (dom(σ2) ∩ F)

LEffect(σ1, σ2, δ, F) iff σ1 =U−δ.ws= σ2 ∧ (locs(σ2) − locs(σ1)) ⊆ δ.ws ∧ (locs(σ1) − locs(σ2)) ⊆ δ.ws ∩ δ.rs ∧ (dom(σ2) − dom(σ1)) ⊆ F

Figure 22. The Abstract Concurrent Language

E The Framework Supporting Stack Pointer Escape

Our framework presented earlier disallows the escape of pointers to stack variables. It still works when escaping stack pointers are added back in, though we need to revise the definitions of the simulations, and the proofs become more involved. In this section, we give the revised definitions of the simulations and the on-paper proofs of the key lemmas. Our way of supporting stack pointer escape is very similar to the approach in Compositional CompCert, and is relatively orthogonal to our new framework for concurrent compiler correctness.

E.1 Revised Definitions and Key Lemmas

The state model. To make the problem of escaping stack pointers more concrete, we follow Compositional CompCert and instantiate the state model as a block-based state model. Fig. 22 gives the instantiation of Fig. 4. Now a memory location l is a pair of a block ID b and an offset (a natural number). U is the set of all possible memory locations; we need U to take the complement of a set of memory locations. S and F are sets of block IDs, and footprints and ge are sets of locations. We use dom(σ) to get a set of block IDs, and locs(σ) to get a set of locations. The module-local semantics is still abstract. The preemptive and the non-preemptive global semantics are not changed.

We also adapt the definition of "well-defined languages" to the block-based state model. The definition of wd(tl) given in Def. 1 is not changed, but we need to adapt the predicates forward(σ, σ′), LEffect(σ, σ′, δ, F), LEqPre(σ, σ1, δ, F) and LEqPost(σ′, σ′1, δ, F) used in the definition. Their new definitions are given at the bottom of Fig. 22.
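For concreteness, the block-based state model of Fig. 22 can be written down with ordinary finite maps. The OCaml below is an illustrative sketch only; the Coq development works with CompCert's memory model rather than these types:

    type block_id = int
    type loc = block_id * int                   (* l ::= (b, i) *)

    module IntMap = Map.Make (Int)

    type value = VLoc of loc | VInt of int      (* v ::= l | . . . *)
    type state = value IntMap.t IntMap.t        (* sigma : block -> offset -> value *)

    (* locs(sigma): the set of locations defined in a state. *)
    let locs (s : state) : loc list =
      IntMap.fold
        (fun b contents acc ->
          IntMap.fold (fun i _ acc -> (b, i) :: acc) contents acc)
        s []

    (* blocks(ls): the block IDs mentioned in a set of locations. *)
    let blocks (ls : loc list) : block_id list =
      List.sort_uniq compare (List.map fst ls)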

Module-local downward simulation. The revised simulation is still parameterized with µ. But here 𝕊 and S are sets of block IDs and f is a function mapping block IDs to addresses:

    µ def= (𝕊, S, f), where 𝕊, S ∈ P(BlockID) and f ∈ BlockID ⇀ BlockID × N.

Since we allow stack pointer escape, 𝕊 (and S) may include escaped stack pointers, as well as ge. As a consequence, 𝕊 (and S) may change during the executions of the source and target code. The function f may include mappings from source-level escaped stack pointers to the corresponding target-level escaped stack pointers.

We give the useful auxiliary definitions in Fig. 23. Most definitions are similar to their counterparts in Fig. 8. Definition 44 defines the module-local downward simulation, which supports the escape of stack pointers.


wf(µ, Σ, σ) iff injective(µ.f, Σ) ∧ (µ.𝕊 = dom(µ.f)) ∧ (µ.𝕊 ⊆ dom(Σ)) ∧ (µ.f{{µ.𝕊}} ⊆ µ.S) ∧ (µ.f⟨⟨locs(Σ)⟩⟩ ⊆ locs(σ))

wf(µ, (Σ, 𝔽), (σ, F)) iff wf(µ, Σ, σ) ∧ (µ.f{{µ.𝕊 ∩ 𝔽}} ⊆ F) ∧ (µ.f{{µ.𝕊 − 𝔽}} ∩ F = ∅)

FPmatch(µ, (∆, Σ), (δ, F)) iff G(δ, F, LShare(µ, Σ)) ∧ (δ.rs ∩ LShare(µ, Σ) ⊆ µ.f⟨⟨locs(Σ) ∩ ∆.rs⟩⟩) ∧ (δ.ws ∩ LShare(µ, Σ) ⊆ µ.f⟨⟨locs(Σ) ∩ ∆.ws⟩⟩)

LShare(µ, Σ) def= µ.f⟨⟨locs(Σ) ∩ Tµ.𝕊U⟩⟩

HGuar(µ, ∆, 𝔽) iff G(∆, 𝔽, Tµ.𝕊U)

G(∆, F, ls) iff ∆.rs ∪ ∆.ws ⊆ TFU ∪ ls

Rely((µ, µ′), (Σ, Σ′, 𝔽), (σ, σ′, F)) iff R(Σ, Σ′, 𝔽, Tµ.𝕊U) ∧ R(σ, σ′, F, LShare(µ, Σ)) ∧ EvolveR(µ, µ′, Σ′, σ′, 𝔽, F)

R(Σ, Σ′, F, ls) iff (Σ =TFU−ls= Σ′) ∧ forward(Σ, Σ′) ∧ ((dom(Σ′) − dom(Σ)) ∩ F = ∅)

EvolveG(µ, µ′, Σ′, σ′, 𝔽, F) iff Evolve(µ, µ′, Σ′, σ′, 𝔽, F) ∧ (µ′.𝕊 − µ.𝕊 ⊆ 𝔽)

EvolveR(µ, µ′, Σ′, σ′, 𝔽, F) iff Evolve(µ, µ′, Σ′, σ′, 𝔽, F) ∧ ((µ′.𝕊 − µ.𝕊) ∩ 𝔽 = ∅)

Evolve(µ, µ′, Σ′, σ′, 𝔽, F) iff wf(µ′, (Σ′, 𝔽), (σ′, F)) ∧ (µ.f ⊆ µ′.f) ∧ Inv(µ′.f, Σ′, σ′) ∧ µ′.𝕊 = cl(µ.𝕊, Σ′) ∧ µ′.S = cl(µ.S, σ′)

Inv(f, Σ, σ) iff ∀b, n, b′, n′. ((b, n) ∈ locs(Σ)) ∧ (f⟨(b, n)⟩ = (b′, n′)) =⇒ ((b′, n′) ∈ locs(σ)) ∧ (Σ(b)(n) f↪→ σ(b′)(n′))

v1 f↪→ v2 iff ((v1 ∉ U) ∧ (v1 = v2)) ∨ (v1 ∈ U ∧ v2 ∈ U ∧ f⟨v1⟩ = v2)

injective(f, Σ) iff ∀l1, l2, l′1, l′2. l1 ≠ l2 ∧ l1 ∈ locs(Σ) ∧ l2 ∈ locs(Σ) ∧ f⟨l1⟩ = l′1 ∧ f⟨l2⟩ = l′2 =⇒ l′1 ≠ l′2

injective(f) iff ∀l1, l2, l′1, l′2. l1 ≠ l2 ∧ f⟨l1⟩ = l′1 ∧ f⟨l2⟩ = l′2 =⇒ l′1 ≠ l′2

initM(φ, ge, Σ, σ) iff ge ⊆ locs(Σ) ∧ closed(dom(Σ), Σ) ∧ locs(σ) = φ⟨⟨locs(Σ)⟩⟩ ∧ dom(σ) = φ{{dom(Σ)}} ∧ Inv(φ, Σ, σ)

f{{S}} def= {b′ | ∃b. (b ∈ S) ∧ f(b) = (b′, _)}

f⟨l⟩ def= (b′, n + n′) , if l = (b, n) ∧ f(b) = (b′, n′)

f⟨⟨ws⟩⟩ def= {l′ | ∃l. (l ∈ ws) ∧ f⟨l⟩ = l′}

f|S def= {(b, f(b)) | b ∈ (S ∩ dom(f))}

f2 ◦ f1 def= {(b1, (b3, n2 + n3)) | ∃b2. f1(b1) = (b2, n2) ∧ f2(b2) = (b3, n3)}

closed(S, Σ) iff cl(S, Σ) ⊆ S

cl(S, Σ) def= ⋃k clk(S, Σ) , where clk(S, Σ) is inductively defined:
    cl0(S, Σ) def= S
    clk+1(S, Σ) def= {b′ | ∃b, n, n′. (b ∈ clk(S, Σ)) ∧ Σ(b)(n) = (b′, n′)}

Figure 23. Footprint Matching and Evolution of µ in Our Simulation
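The closure operator cl(S, Σ) of Fig. 23, also used in Definition 47, is an ordinary reachability fixpoint over blocks: starting from S, repeatedly add every block mentioned by a location value stored in an already-reachable block. An illustrative OCaml version (our own representation, not the mechanized one) is:

    module IntSet = Set.Make (Int)
    module IntMap = Map.Make (Int)

    type value = VLoc of int * int | VInt of int   (* a stored value: a location or not *)
    type state = value IntMap.t IntMap.t           (* block -> offset -> value *)

    (* One step of reachability: add every block pointed to from s. *)
    let step (sigma : state) (s : IntSet.t) : IntSet.t =
      IntSet.fold
        (fun b acc ->
          match IntMap.find_opt b sigma with
          | None -> acc
          | Some contents ->
              IntMap.fold
                (fun _ v acc ->
                  match v with VLoc (b', _) -> IntSet.add b' acc | VInt _ -> acc)
                contents acc)
        s s

    (* cl(S, Sigma): iterate to the least fixpoint containing s. *)
    let rec cl (sigma : state) (s : IntSet.t) : IntSet.t =
      let s' = step sigma s in
      if IntSet.equal s' s then s else cl sigma s'

closed(S, Σ) then holds exactly when this iteration returns its input unchanged.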


The key to supporting stack pointer escape is to describe the possible changes of µ (i.e., 𝕊, S and the mapping f between them) during executions. In Definition 44, we are allowed to pick a new µ′ at an interaction point (see case 2). The new µ′ and the earlier µ should be related by EvolveG(µ, µ′, . . .), which is defined in Fig. 23. In its definition, Evolve(µ, µ′, . . .) requires that µ′ is well-formed, that µ′.𝕊 and µ′.S are closed, and that the mapping µ′.f can only be enlarged. The condition (µ′.𝕊 − µ.𝕊 ⊆ 𝔽) says that a module cannot leak out arbitrary addresses; instead, it can only leak out pointers pointing to its own stack.

Similarly, the environment step (see case 3 in Definition 44) may also modify µ, but the updates are confined using EvolveR(µ, µ′, . . .), which is also defined in Fig. 23. Here the condition ((µ′.𝕊 − µ.𝕊) ∩ 𝔽 = ∅) says that it is impossible for the environment to leak out pointers pointing to the stack of the current module.

Other parts of the simulation definition (Definition 44) are very similar to Definition 2. In particular, the silent step case (case 1) is not changed (though we slightly modified the presentation), since the shared state updates are not exposed to the environment between two interaction points.

Definition 44 (Module-Local Downward Simulation).
(sl, ge, γ) 4φ (tl, ge′, π) iff
for all f, k, Σ, σ, 𝔽, F, and µ = (dom(Σ), dom(σ), φ|dom(Σ)), if sl.InitCore(γ, f) = k, φ⟨⟨ge⟩⟩ = ge′, initM(φ, ge, Σ, σ), and 𝔽 ∩ dom(Σ) = F ∩ dom(σ) = ∅, then there exist i ∈ index and κ such that tl.InitCore(π, f) = κ and (𝔽, (k, Σ), emp, •) 4iµ (F, (κ, σ), emp, •).

Here we define (𝔽, (k, Σ), ∆0, β) 4iµ (F, (κ, σ), δ0, β) (where the bit β is either ◦ or •, see footnote 8) as the largest relation such that, whenever (𝔽, (k, Σ), ∆0, β) 4iµ (F, (κ, σ), δ0, β), then wf(µ, (Σ, 𝔽), (σ, F)) and the following are true:
  1. ∀k′, Σ′, ∆. if β = ◦ and 𝔽 ⊢ (k, Σ) τ7−→∆ (k′, Σ′) and HGuar(µ, ∆0 ∪ ∆, 𝔽), then one of the following holds:
     a. there exists j such that
        i. j < i, and
        ii. (𝔽, (k′, Σ′), ∆0 ∪ ∆, ◦) 4jµ (F, (κ, σ), δ0, ◦);
     b. or, there exist κ′, σ′, δ and j such that
        i. F ⊢ (κ, σ) τ7−→δ+ (κ′, σ′), and
        ii. FPmatch(µ, (∆0 ∪ ∆, Σ), (δ0 ∪ δ, F)), and
        iii. (𝔽, (k′, Σ′), ∆0 ∪ ∆, ◦) 4jµ (F, (κ′, σ′), δ0 ∪ δ, ◦);
  2. ∀k′, ι. if β = ◦, ι ≠ τ, 𝔽 ⊢ (k, Σ) ι7−→emp (k′, Σ) and HGuar(µ, ∆0, 𝔽), then there exist κ′, δ, σ′, κ′′, µ′ and j such that
     a. F ⊢ (κ, σ) τ7−→δ∗ (κ′, σ′), and F ⊢ (κ′, σ′) ι7−→emp (κ′′, σ′), and
     b. FPmatch(µ, (∆0, Σ), (δ0 ∪ δ, F)), and
     c. EvolveG(µ, µ′, Σ, σ′, 𝔽, F), and
     d. (𝔽, (k′, Σ), emp, •) 4jµ′ (F, (κ′′, σ′), emp, •);
  3. ∀σ′, Σ′, µ′. if β = • and Rely((µ, µ′), (Σ, Σ′, 𝔽), (σ, σ′, F)), then there exists j such that (𝔽, (k, Σ′), emp, ◦) 4jµ′ (F, (κ, σ′), emp, ◦).

8 The bit β is introduced only to simplify the presentation. It is used to make the rely step explicit (i.e., to take clause 3 out of clause 2 in Def. 44), so the proofs can be cleaner.

Lemma 45 (Transitivity).
If (sl, ge, γ) 4φ (sl1, ge1, γ1), (sl1, ge1, γ1) 4φ′ (tl, ge′, π), φ⟨⟨ge⟩⟩ = ge′, and ∀S. φ{{S}} ⊆ dom(φ′), then (sl, ge, γ) 4φ′◦φ (tl, ge′, π).
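The composed injection φ′ ◦ φ used in Lemma 45 is the composition defined at the bottom of Fig. 23. Representing injections as finite maps (an assumption of this sketch, not the paper's representation), the composition and the lifting f⟨l⟩ to locations look as follows:

    (* A memory injection f : BlockID -> BlockID * offset, as in Fig. 23. *)
    module IntMap = Map.Make (Int)
    type inj = (int * int) IntMap.t

    (* f<l>: apply an injection to a location (b, n), shifting the offset. *)
    let apply (f : inj) ((b, n) : int * int) : (int * int) option =
      match IntMap.find_opt b f with
      | Some (b', n') -> Some (b', n + n')
      | None -> None

    (* f2 o f1 = {(b1, (b3, n2 + n3)) | f1(b1) = (b2, n2) /\ f2(b2) = (b3, n3)} *)
    let compose (f2 : inj) (f1 : inj) : inj =
      IntMap.filter_map
        (fun _ (b2, n2) ->
          match IntMap.find_opt b2 f2 with
          | Some (b3, n3) -> Some (b3, n2 + n3)
          | None -> None)
        f1

With this representation, applying compose f2 f1 to a location gives the same result as applying f1 and then f2, which mirrors the fact used repeatedly in the proof of Lemma 52.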

Compositionality and the non-preemptive global downward simulation. To support stack pointer escape, the whole-program simulations also allow one to pick a new µ′ at interaction points (see case 3 in Definition 46). The new definition of reach-closed modules (see Definition 47) also allows 𝕊 to change, which is similar to the reach-closed definition in Compositional CompCert.

Definition 46 (Whole-Program Downward Simulation).
Let ℙ = let Γ in f1 | . . . | fm, P = let Π in f1 | . . . | fm, Γ = {(sl1, ge1, γ1), . . . , (slm, gem, γm)} and Π = {(tl1, ge′1, π1), . . . , (tlm, ge′m, πm)}.


ℙ 4φ P iff ∀i ∈ {1, . . . , m}. φ⟨⟨gei⟩⟩ = ge′i, and
for any Σ, σ, µ, 𝕎, if initM(φ, ⋃i gei, Σ, σ), µ = (dom(Σ), dom(σ), φ), and (ℙ, Σ) :load==⇒ 𝕎, then there exist W and i ∈ index such that (P, σ) :load==⇒ W and (𝕎, emp) 4iµ (W, emp).

Here we define (𝕎, ∆0) 4iµ (W, δ0) as the largest relation such that whenever (𝕎, ∆0) 4iµ (W, δ0), the following are true:
  1. 𝕎.t = W.t, and 𝕎.𝕕 = W.𝕕, and wf(µ, 𝕎.Σ, W.σ);
  2. ∀𝕎′, ∆. if 𝕎 :τ=⇒∆ 𝕎′, then one of the following holds:
     a. there exists j such that
        i. j < i, and
        ii. (𝕎′, ∆0 ∪ ∆) 4jµ (W, δ0);
     b. or, there exist W′, δ and j such that
        i. W :τ=⇒δ+ W′, and
        ii. FPmatch(µ, (∆0 ∪ ∆, 𝕎.Σ), (δ0 ∪ δ, curF(W))), and
        iii. (𝕎′, ∆0 ∪ ∆) 4jµ (W′, δ0 ∪ δ);
  3. ∀𝕋′, 𝕕′, Σ′, o. if o ≠ τ and ∃t′. 𝕎 :o==⇒emp (𝕋′, t′, 𝕕′, Σ′), then there exist W′, δ, µ′, T′, σ′ and j such that for any t′ ∈ dom(𝕋′), we have
     a. W :τ=⇒δ∗ W′, and W′ :o==⇒emp (T′, t′, 𝕕′, σ′), and
     b. FPmatch(µ, (∆0, 𝕎.Σ), (δ0 ∪ δ, curF(W))), and
     c. ((𝕋′, t′, 𝕕′, Σ′), emp) 4jµ′ ((T′, t′, 𝕕′, σ′), emp);
  4. if 𝕎 :τ==⇒emp done, then there exist W′ and δ such that
     a. W :τ=⇒δ∗ W′, and W′ :τ==⇒emp done, and
     b. FPmatch(µ, (∆0, 𝕎.Σ), (δ0 ∪ δ, curF(W))).

Definition 47 (Reach-Closed Module). ReachClose(sl, ge, γ) iff, for all f, k, Σ, 𝔽 and 𝕊, if sl.InitCore(γ, f) = k, 𝕊 = dom(Σ), ge ⊆ 𝕊, 𝔽 ∩ 𝕊 = ∅, and closed(𝕊, Σ), then RC(𝔽, 𝕊, (k, Σ), •).

Here RC is defined as the largest relation such that, whenever RC(𝔽, 𝕊, (k, Σ), β), all the following hold:
  1. for all k′, Σ′ and ∆, if β = ◦ and 𝔽 ⊢ (k, Σ) τ7−→∆ (k′, Σ′), then G(∆, 𝔽, 𝕊) and RC(𝔽, 𝕊, (k′, Σ′), ◦).
  2. for all k′ and ι, if β = ◦, ι ≠ τ and 𝔽 ⊢ (k, Σ) ι7−→emp (k′, Σ), then there exists 𝕊′ such that 𝕊′ = cl(𝕊, Σ), 𝕊′ − 𝕊 ⊆ 𝔽, and RC(𝔽, 𝕊′, (k′, Σ), •).
  3. for all Σ′ and 𝕊′, if β = • and R(Σ, Σ′, 𝔽, 𝕊) and 𝕊′ = cl(𝕊, Σ′) and (𝕊′ − 𝕊) ∩ 𝔽 = ∅, then RC(𝔽, 𝕊′, (k, Σ′), ◦).

Lemma 48 (Compositionality).
For any f1, . . . , fn, φ, Γ = {(sl1, ge1, γ1), . . . , (slm, gem, γm)} and Π = {(tl1, ge′1, π1), . . . , (tlm, ge′m, πm)}, if
    ∀i ∈ {1, . . . , m}. wd(sli) ∧ wd(tli) ∧ ReachClose(sli, gei, γi) ∧ (sli, gei, γi) 4φ (tli, ge′i, πi),
then
    let Γ in f1 | . . . | fn 4φ let Π in f1 | . . . | fn.

Flip of the non-preemptive global simulation.

Definition 49 (Whole-Program Upward Simulation).
Let ℙ = let Γ in f1 | . . . | fm, P = let Π in f1 | . . . | fm, Γ = {(sl1, ge1, γ1), . . . , (slm, gem, γm)} and Π = {(tl1, ge′1, π1), . . . , (tlm, ge′m, πm)}.

P 6φ ℙ iff ∀i ∈ {1, . . . , m}. φ⟨⟨gei⟩⟩ = ge′i, and
for any Σ, σ, µ, W, if initM(φ, ⋃i gei, Σ, σ), µ = (dom(Σ), dom(σ), φ), and (P, σ) :load==⇒ W, then there exist 𝕎 and i ∈ index such that (ℙ, Σ) :load==⇒ 𝕎 and (W, emp) 6iµ (𝕎, emp).

Here we define (W, δ0) 6iµ (𝕎, ∆0) as the largest relation such that whenever (W, δ0) 6iµ (𝕎, ∆0), the following are true:
  1. 𝕎.t = W.t, and 𝕎.𝕕 = W.𝕕, and wf(µ, 𝕎.Σ, W.σ);
  2. ∀W′, δ. if W :τ=⇒δ W′, then one of the following holds:
     a. there exists j such that
        i. j < i, and
        ii. (W′, δ0 ∪ δ) 6jµ (𝕎, ∆0);
     b. there exist 𝕎′, ∆ and j such that
        i. 𝕎 :τ=⇒∆+ 𝕎′, and
        ii. FPmatch(µ, (∆0 ∪ ∆, 𝕎.Σ), (δ0 ∪ δ, curF(W))), and
        iii. (W′, δ0 ∪ δ) 6jµ (𝕎′, ∆0 ∪ ∆);
  3. ∀T′, 𝕕′, σ′, o. if o ≠ τ and ∃t′. W :o==⇒emp (T′, t′, 𝕕′, σ′), then there exist 𝕎′, ∆, µ′, 𝕋′, Σ′ and j such that for any t′ ∈ dom(T′), we have
     a. 𝕎 :τ=⇒∆∗ 𝕎′, and 𝕎′ :o==⇒emp (𝕋′, t′, 𝕕′, Σ′), and
     b. FPmatch(µ, (∆0 ∪ ∆, 𝕎.Σ), (δ0, ((W.T)(W.t)).F)), and
     c. ((T′, t′, 𝕕′, σ′), emp) 6jµ′ ((𝕋′, t′, 𝕕′, Σ′), emp);
  4. if W :τ==⇒emp done, then there exist 𝕎′ and ∆ such that
     a. 𝕎 :τ=⇒∆∗ 𝕎′, and 𝕎′ :τ==⇒emp done, and
     b. FPmatch(µ, (∆0 ∪ ∆, 𝕎.Σ), (δ0, curF(W))).

Lemma 50 (Flip).
For any f1, . . . , fn, φ, Γ = {(sl1, ge1, γ1), . . . , (slm, gem, γm)} and Π = {(tl1, ge′1, π1), . . . , (tlm, ge′m, πm)}, if ∀i. det(tli) and
    let Γ in f1 | . . . | fm 4φ let Π in f1 | . . . | fm,
then
    let Π in f1 | . . . | fm 6φ let Γ in f1 | . . . | fm.

NPDRF preservation.

Lemma 51 (NPDRF Preservation).
For any ℙ, P, φ, Σ and σ, if P 6φ ℙ, initM(φ, GE(ℙ.Γ), Σ, σ), and NPDRF(ℙ, Σ), then NPDRF(P, σ).

E.2 Proof of Transitivity of Simulation (Lemma 45)

For all f, k, Σ, σ, 𝔽, F, and 𝕊 = dom(Σ), S = dom(σ), and µ = (𝕊, S, (φ′ ◦ φ)|𝕊), if sl.InitCore(γ, f) = k, (φ′ ◦ φ)⟨⟨ge⟩⟩ = ge′, initM(φ′ ◦ φ, ge, Σ, σ), and 𝔽 ∩ dom(Σ) = F ∩ dom(σ) = ∅, we know S = (φ′ ◦ φ){{𝕊}}. Let S1 = φ{{𝕊}} and L1 = φ⟨⟨locs(Σ)⟩⟩. We know
    S1 ⊆ dom(φ′) , φ′{{S1}} = (φ′ ◦ φ){{𝕊}} = dom(σ) , locs(σ) = φ′⟨⟨L1⟩⟩ .
Since Inv(φ′ ◦ φ, Σ, σ), we know there exists Σ1 such that
    dom(Σ1) = S1 , locs(Σ1) = L1 , Inv(φ, Σ, Σ1) , Inv(φ′, Σ1, σ) .
Since φ⟨⟨ge⟩⟩ = ge′ and dom(φ′ ◦ φ) ⊆ dom(φ), we know
    ge ⊆ locs(Σ) , dom(Σ) = 𝕊 ⊆ dom(φ) , ge′ ⊆ locs(Σ1) , dom(Σ1) = S1 ⊆ dom(φ′) .
Pick F1 such that F1 ∩ S1 = ∅. Let µ1 = (𝕊, S1, φ|𝕊) and µ2 = (S1, φ′{{S1}}, φ′|S1). From (sl, ge, γ) 4φ (sl1, ge1, γ1), we know there exists k1 such that sl1.InitCore(γ1, f) = k1 and there exists i1 ∈ index such that
    (𝔽, (k, Σ), emp, ◦) 4i1µ1 (F1, (k1, Σ1), emp, ◦) .
Thus we know S1 = cl(S1, Σ1). From (sl1, ge1, γ1) 4φ′ (tl, ge′, π), we know there exists κ such that tl.InitCore(π, f) = κ and there exists i2 ∈ index such that
    (F1, (k1, Σ1), emp, ◦) 4i2µ2 (F, (κ, σ), emp, ◦) .
By Lemma 52, letting i = (i2, i1), we know
    (𝔽, (k, Σ), emp, ◦) 4iµ (F, (κ, σ), emp, ◦) .
Thus we are done.


Lemma 52. If
  1. (𝔽, (k, Σ), ∆0, β) 4i1µ1 (F1, (k1, Σ1), ∆10, β);
  2. (F1, (k1, Σ1), ∆10, β) 4i2µ2 (F, (κ, σ), δ0, β);
  3. µ1 = (𝕊, S1, f1), µ2 = (S1, S, f2), µ = (𝕊, S, f2 ◦ f1), and i = (i2, i1);
  4. if β = •, then Inv(f1, Σ, Σ1) and Inv(f2, Σ1, σ);
then (𝔽, (k, Σ), ∆0, β) 4iµ (F, (κ, σ), δ0, β).

Proof. By co-induction. From the premises, we know

wf(µ1, (Σ, F), (Σ1, F1)), S ⊆ dom(Σ), S1 ⊆ dom(Σ1), wf(µ2, (Σ1, F1), (σ , F )).

Thus we know

injective(f1, Σ), S = dom(f1), f1{{S}} ⊆ S1, f1{{S ∩ F}} ⊆ F1, f1{{S − F}} ∩ F1 = ∅,injective(f2, Σ1), S1 = dom(f2), f2{{S1}} ⊆ S , f2{{S1 ∩ F1}} ⊆ F , f2{{S1 − F1}} ∩ F = ∅.

From injective(f1, Σ), injective(f2, Σ1) and Inv(f1, Σ, Σ1), we know

injective(f2 ◦ f1, Σ) .

Thus we know

wf(µ, (Σ, F), (σ , F )) .

Also, since Inv(f1, Σ, Σ1) and Inv(f2, Σ1,σ ), we know Inv(f2 ◦ f1, Σ,σ ). Also we need to prove the following:1. ∀k′, Σ′,∆. if β = ◦ and F ⊢ (k, Σ) τ

7−→∆

(k′, Σ′) and HGuar(µ,∆0 ∪ ∆, F), then one of the following holds:a. there exists j such that

i. j < i , andii. (F, (k′, Σ′),∆0 ∪ ∆, ◦) 4j

µ (F , (κ,σ ), δ0, ◦);b. or, there exist κ ′, σ ′, δ and j such that

i. F ⊢ (κ,σ )τ7−→δ+(κ ′,σ ′), and

ii. FPmatch(µ, (∆0 ∪ ∆, Σ), (δ0 ∪ δ , F )), andiii. (F, (k′, Σ′),∆0 ∪ ∆, ◦) 4j

µ (F , (κ ′,σ ′), δ0 ∪ δ , ◦).Proof. From (F, (k, Σ),∆0, β) 4

i1µ1 (F1, (k1, Σ1),∆10, β), we know one of the following holds:

(A) there exists j1 such that(I) j1 < i1, and(II) (F, (k′, Σ′),∆0 ∪ ∆, ◦) 4j1

µ1 (F1, (k1, Σ1),∆10, ◦);(B) or, there exist k′

1, Σ′1, ∆1 and j1 such that

(I) F1 ⊢ (k1, Σ1)τ7−→∆1

+(k′1, Σ

′1), and

(II) FPmatch(µ1, (∆0 ∪ ∆, Σ), (∆10 ∪ ∆1, F1)), and(III) (F, (k′, Σ′),∆0 ∪ ∆, ◦) 4j1

µ1 (F1, (k′1, Σ

′1),∆10 ∪ ∆1, ◦).

For case (A), let j = (i2, j1). By the co-induction hypothesis, we know(F, (k′, Σ′),∆0 ∪ ∆, ◦)4j

µ(F , (κ,σ ), δ0, ◦) .Since j1 < i1, we know

j < i .Thus we are done.For case (B), from FPmatch(µ1, (∆0 ∪ ∆, Σ), (∆10 ∪ ∆1, F1)), we know

G(∆10 ∪ ∆1, F1, LShare(µ1, Σ)) .Thus we know

HGuar(µ2,∆10 ∪ ∆1, F1) .From (F1, (k1, Σ1),∆10, β) 4

i2µ2 (F , (κ,σ ), δ0, β), by Lemma 53, we know one of the following holds:

(B1) there exists j2 such thati. j2 < i2, andii. (F1, (k′

1, Σ′1),∆10 ∪ ∆1, ◦) 4

j2µ2 (F , (κ,σ ), δ0, ◦);


(B2) or, there exist κ ′, σ ′, δ and j2 such thati. F ⊢ (κ,σ )

τ7−→δ+(κ ′,σ ′), and

ii. FPmatch(µ2, (∆10 ∪ ∆1, Σ1), (δ0 ∪ δ , F )), andiii. (F1, (k′

1, Σ′1),∆10 ∪ ∆1, ◦) 4

j2µ2 (F , (κ

′,σ ′), δ0 ∪ δ , ◦).For case (B1), let j = (j2, j1). By the co-induction hypothesis, we know

(F, (k′, Σ′),∆0 ∪ ∆, ◦)4jµ (F , (κ,σ ), δ0, ◦) .

Since j2 < i2, we knowj < i .

For case (B2), let j = (j2, j1). By the co-induction hypothesis, we know(F, (k′, Σ′),∆0 ∪ ∆, ◦)4j

µ (F , (κ′,σ ′), δ0 ∪ δ , ◦) .

From FPmatch(µ1, (∆0 ∪ ∆, Σ), (∆10 ∪ ∆1, F1)) and FPmatch(µ2, (∆10 ∪ ∆1, Σ1), (δ0 ∪ δ , F )), by Lemma 55, we knowFPmatch(µ, (∆0 ∪ ∆, Σ), (δ0 ∪ δ , F )) .

Thus we are done. �

2. ∀k′, ι. if β = ◦, ι , τ and F ⊢ (k, Σ) ι7−→emp

(k′, Σ) and HGuar(µ,∆0, F), then there exist κ ′, δ , σ ′, κ ′, µ ′ and j such that

a. F ⊢ (κ,σ )τ7−→δ∗(κ ′,σ ′), and

F ⊢ (κ ′,σ ′)ι7−→emp

(κ ′′,σ ′), andb. FPmatch(µ, (∆0, Σ), (δ0 ∪ δ , F )), andc. EvolveG(µ, µ ′, Σ,σ ′, F, F ), andd. (F, (k′, Σ), emp, •) 4j

µ′ (F , (κ′′,σ ′), emp, •).

Proof. From (F, (k, Σ),∆0, β) 4i1µ1 (F1, (k1, Σ1),∆10, β), we know there exist k′

1, ∆1, Σ′1, k

′1, µ

′1 and j1 such that

a. F1 ⊢ (k1, Σ1)τ7−→∆1

∗(k′1, Σ

′1), and F1 ⊢ (k

′1, Σ

′1)

ι7−→emp

(k′′1 , Σ

′1), and

b. FPmatch(µ1, (∆0, Σ), (∆10 ∪ ∆1, F1)), andc. EvolveG(µ1, µ ′1, Σ, Σ

′1, F, F1), and

d. (F, (k′, Σ), emp, •) 4j1µ′1

(F1, (k′′1 , Σ

′1), emp, •).

From FPmatch(µ1, (∆0, Σ), (∆10 ∪ ∆1, F1)), we knowG(∆10 ∪ ∆1, F1, LShare(µ1, Σ)) .

Thus we knowHGuar(µ2,∆10 ∪ ∆1, F1) .

From (F1, (k1, Σ1),∆10, β) 4i2µ2 (F , (κ,σ ), δ0, β), by Lemma 54, we know there exist κ ′, δ , σ ′, κ ′, µ ′2 and j2 such that

a. F ⊢ (κ,σ )τ7−→δ∗(κ ′,σ ′), and F ⊢ (κ ′,σ ′)

ι7−→emp

(κ ′′,σ ′), andb. FPmatch(µ2, (∆10 ∪ ∆1, Σ1), (δ0 ∪ δ , F )), andc. EvolveG(µ2, µ ′2, Σ

′1,σ

′, F1, F ), andd. (F1, (k′′

1 , Σ′1), emp, •) 4

j2µ′2

(F , (κ ′′,σ ′), emp, •).Suppose µ ′1 = (S′, S′1, f

′1 ) and µ ′2 = (S′′1 , S

′, f ′2 ). Since EvolveG(µ1, µ ′1, Σ, Σ′1, F, F1) and EvolveG(µ2, µ ′2, Σ

′1,σ

′, F1, F ), weknow

S′1 = cl(S1, Σ′1) and S′′1 = cl(S1, Σ′

1) .Thus S′1 = S

′′1 . Let µ

′ = (S′, S ′, f ′2 ◦ f ′1 ). Let j = (j2, j1). By the co-induction hypothesis, we know(F, (k′, Σ), emp, •) 4j

µ′ (F , (κ′′,σ ′), emp, •).

From FPmatch(µ1, (∆0, Σ), (∆10 ∪ ∆1, F1)) and FPmatch(µ2, (∆10 ∪ ∆1, Σ1), (δ0 ∪ δ , F )), by Lemma 55, we knowFPmatch(µ, (∆0, Σ), (δ0 ∪ δ , F )) .

From EvolveG(µ1, µ ′1, Σ, Σ′1, F, F1) and EvolveG(µ2, µ ′2, Σ

′1,σ

′, F1, F ), by Lemma 56, we knowEvolveG(µ, µ ′, Σ,σ ′, F, F ) .

Thus we are done. �

3. ∀σ′, Σ′, µ′. if β = • and Rely((µ, µ′), (Σ, Σ′, F), (σ, σ′, F)), then there exists j such that (F, (k, Σ′), emp, ◦) 4jµ′ (F, (κ, σ′), emp, ◦).

Proof. Suppose µ ′ = (S′, S ′, f ′). Since EvolveR(µ, µ ′, Σ′,σ ′, F, F ), we knowinjective(f ′, Σ′) , S′ = dom(f ′) , f ′{{S′}} ⊆ S ′ , f ′{{S′ ∩ F}} ⊆ F , f ′{{S′ − F}} ∩ F = ∅ ,

(f2 ◦ f1) ⊆ f ′ , Inv(f ′, Σ′,σ ′) , S′ = cl(S, Σ′) ⊆ dom(Σ′) , S ′ = cl(S,σ ′) , (S′ − S) ∩ F = ∅ .We first construct f ′1 . Let S

′′ = S′ − S. Thus S′′ ∩ F = ∅. From S′ = cl(S, Σ′), we know S′ = S ⊎ S′′. Pick an arbitrary setof fresh blocks S′′1 such that

|S′′1 | = |S′′ | , S′′1 ∩ dom(Σ1) = ∅ , S′′1 ∩ S1 = ∅ , S′′1 ∩ F1 = ∅ .Pick an arbitrary function fb1 mapping blocks in S′′ to the blocks of S′′1 such that

injective(fb1) and dom(fb1) = S′′ and range(fb1) = S

′′1 .

Define the function f ′′1 as follows.∀b ∈ S′′. f ′′1 (b) = (fb1(b), 0) .

Thus we knowinjective(f ′′1 ) and dom(f ′′1 ) = S′′ and f ′′1 {{S′′}} = S′′1 .

Let f ′1 = f1 ∪ f ′′1 . Since wf(µ1, Σ, F, F1), we knowinjective(f ′1 , Σ

′) and dom(f ′1 ) = S′ and f ′1 {{S

′}} ⊆ S1 ∪ S′′1 and

f ′1 {{S′ ∩ F}} ⊆ F1 and f ′1 {{S

′ − F}} ∩ F1 = ∅ .Next we construct S′1. Let S

′1 = S1 ∪ S

′′1 .

Then we construct f ′2 . Define a function f ′′2 as follows:dom(f ′′2 ) = S′′1 and ∀b ∈ S′′1 . f

′′2 (b) = f ′(fb−11 (b)) .

Since f ′{{S′}} ⊆ S ′, we knowf ′′2 {{S′′1 }} = f ′{{S′′}} ⊆ S ′ .

Let f ′2 = f2 ∪ f ′′2 . Since wf(µ2, Σ1, F1, F ) and f ′{{S′ − F}} ∩ F = ∅, we knowdom(f ′2 ) = S

′1 and f ′2 {{S

′1}} ⊆ S ′ and

f ′2 {{S′1 ∩ F1}} ⊆ F and f ′2 {{S

′1 − F1}} ∩ F = ∅ .

Also we knowf ′2 ◦ f ′1 = (f2 ◦ f1) ∪ (f ′′2 ◦ f ′′1 ) = f ′ .

Finally we construct Σ′1. Define

dom(Σ′1) = dom(Σ1) ∪ S′′1∀b0 ∈ dom(Σ1) − S1. dom(Σ′1(b0)) = dom(Σ1(b0)) ∀i . Σ′1(b0)(i) = Σ1(b0)(i)

∀b10 ∈ S1 − f1{{S}} − F1. dom(Σ′1(b10)) = ∅

∀b11 ∈ f1{{S}} − F1. dom(Σ′1(b11)) = {i | ∃b,n. f1(b) = (b11,n) ∧ (b, i − n) ∈ locs(Σ′)}

∀i . Σ′1(b11)(i) =

{v ′ if ∃b,n. f1(b) = (b11,n) ∧ Σ′(b)(i − n) = v ′ < Uv if ∃b,n,b ′, i ′,b ′1,n

′. f1(b) = (b11,n) ∧ Σ′(b)(i − n) = (b ′, i ′) ∧ f ′1 (b′) = (b ′1,n

′) ∧v = (b ′1, i′ + n′)

∀b1 ∈ S1 ∩ F1. dom(Σ′1(b1)) = {i | (b1, i) ∈ (f1⟨⟨locs(Σ′) ∩ TSU⟩⟩) ∪ (locs(Σ1) − f1⟨⟨locs(Σ) ∩ TSU⟩⟩)}

∀i . (b1, i) ∈ f1⟨⟨locs(Σ′) ∩ TSU⟩⟩ =⇒

Σ′1(b1)(i) =

{v ′ if ∃b,n. f1⟨(b,n)⟩ = (b1, i) ∧ Σ′(b)(n) = v ′ < Uv if ∃b,n,b ′, i ′. f1⟨(b,n)⟩ = (b1, i) ∧ Σ′(b)(n) = (b ′, i ′) ∧v = f ′1 ⟨(b

′, i ′)⟩

∀i . (b1, i) ∈ locs(Σ1) − f1⟨⟨locs(Σ) ∩ TSU⟩⟩ =⇒ Σ′1(b1)(i) = Σ1(b1)(i)

∀b2 ∈ S′′1 . dom(Σ′1(b2)) = dom(Σ′(fb−11 (b2)))

∀i . Σ′1(b2)(i) =

{v ′ if ∃b ′2. fb

−11 (b2) = b ′2 ∧ Σ′(b ′2)(i) = v

′ < U

v if ∃b ′2,b′, i ′,b,n. fb−11 (b2) = b ′2 ∧ Σ′(b ′2)(i) = (b ′, i ′) ∧ f ′1 (b

′) = (b,n) ∧v = (b, i ′ + n)

Below we prove the following:• forward(Σ1, Σ

′1).

We only need to prove locs(Σ′1) ∩ Tdom(Σ1)U ⊆ locs(Σ1). By the definition of Σ′

1, we only need to prove: for any b1and i , if b1 ∈ (f1{{S}} − F1) ∪ (S1 ∩ F1) and i ∈ dom(Σ′

1(b1)), then (b1, i) ∈ locs(Σ1).If b1 ∈ (f1{{S}}−F1), we know there existb andn such that f1(b) = (b1,n) and (b, i−n) ∈ locs(Σ′). Sinceb ∈ S ⊆ dom(Σ)and locs(Σ′) ∩ Tdom(Σ)U ⊆ locs(Σ), we know (b, i − n) ∈ locs(Σ). Since Inv(f1, Σ, Σ1), we know (b1, i) ∈ locs(Σ1).


If b1 ∈ (S1∩F1), we know (b1, i) ∈ (f1⟨⟨locs(Σ′)∩TSU⟩⟩)∪(locs(Σ1)− f1⟨⟨locs(Σ)∩TSU⟩⟩). Since locs(Σ′)∩Tdom(Σ)U ⊆

locs(Σ), we know f1⟨⟨locs(Σ′) ∩ TSU⟩⟩ ⊆ f1⟨⟨locs(Σ) ∩ TSU⟩⟩. Since injective(f1, Σ, Σ1), we know f1⟨⟨locs(Σ) ∩ TSU⟩⟩ ⊆locs(Σ1). Thus (b1, i) ∈ locs(Σ1).

• injective(f ′2 , Σ′1).

We prove the following.– injective(f2, Σ′

1).From injective(f2, Σ1) and forward(Σ1, Σ

′1), we know injective(f2, Σ′

1).

– injective(f ′′2 , Σ′1).

For any b1, b2, i1, i2, b ′1, b′2, n1 and n2, if (b1, i1) , (b2, i2), (b1, i1) ∈ locs(Σ′

1), (b2, i2) ∈ locs(Σ′1), f

′′2 (b1) = (b ′1,n1) and

f ′′2 (b2) = (b ′2,n2), by the definition of f ′′2 , we knowb1 ∈ S

′′1 , b2 ∈ S

′′1 , f ′′2 (b1) = f ′(fb−11 (b1)), f ′′2 (b2) = f ′(fb−11 (b2)).

Also by the definition of Σ′1, we know

dom(Σ′1(b1)) = dom(Σ′(fb−11 (b1))) and dom(Σ′

1(b2)) = dom(Σ′(fb−11 (b2))).Thus

(fb−11 (b1), i1) ∈ locs(Σ′) and (fb−11 (b2), i2) ∈ locs(Σ′).Since injective(f ′, Σ′), we know b ′1 , b

′2 or i1 + n1 , i2 + n2. Thus injective(f

′′2 , Σ

′1).

– ∀b1,b2, i1, i2,b′1,b

′2,n1,n2. (b1, i1) ∈ locs(Σ′

1) ∧ (b2, i2) ∈ locs(Σ′1) ∧ f2(b1) = (b ′1,n1) ∧ f ′′2 (b2) = (b ′2,n2) =⇒

(b ′1 , b′2) ∨ (i1 + n1 , i2 + n2).

Since b1 ∈ S1, we consider three cases:∗ b1 ∈ S1 − f1{{S}} − F1. Since dom(Σ′

1(b1)) = ∅, the above holds trivially.

∗ b1 ∈ f1{{S}} − F1.By the definition of f ′′2 , we know b2 ∈ S

′′1 and f ′′2 (b2) = f ′(fb−11 (b2)). Also by the definition of Σ′

1, we knowdom(Σ′

1(b1)) = {i | ∃b,n. f1(b) = (b1,n) ∧ (b, i − n) ∈ locs(Σ′)} and dom(Σ′1(b2)) = dom(Σ′(fb−11 (b2))).

Since f2◦ f1 ⊆ f ′, we know there exist b and n such that f1(b) = (b1,n), (b, i1−n) ∈ locs(Σ′) and f ′(b) = (b ′1,n1+n).Also (fb−11 (b2), i2) ∈ locs(Σ′). Since injective(f ′, Σ′), we know b ′1 , b

′2 or i1 + n1 , i2 + n2.

∗ b1 ∈ S1 ∩ F1.Since f2{{S1 ∩ F1}} ⊆ F , we know b ′1 ∈ F . By the definition of f ′′2 , we know b2 ∈ S′′1 and f ′′2 (b2) = f ′(fb−11 (b2)).Since S′′ ∩ F = ∅, we know fb−11 (b2) < F. Since f ′{{S′ − F}} ∩ F = ∅, we know b ′2 < F . Thus b

′1 , b

′2.

From injective(f ′′′2 ), and the definition of f ′2 , we are done.

• σTFU−f2 ⟨⟨locs(Σ1)∩TS1U⟩⟩===================== σ ′.

Since f1{{S}} ⊆ S1 and Inv(f1, Σ, Σ1), we know f1⟨⟨locs(Σ)∩TSU⟩⟩ ⊆ locs(Σ1)∩TS1U. Thus TFU− f2⟨⟨locs(Σ1)∩TS1U⟩⟩ ⊆

TFU − (f2 ◦ f1)⟨⟨locs(Σ) ∩ TSU⟩⟩. From σTFU−(f2◦f1)⟨⟨locs(Σ)∩TSU⟩⟩======================= σ ′, we are done.

• Σ1TF1−S1U========= Σ′

1 and Σ1TF1U−f1 ⟨⟨locs(Σ)∩TSU⟩⟩==================== Σ′

1.

Since f1{{S}} ⊆ S1, we know TF1 − S1U ⊆ TF1U− f1⟨⟨locs(Σ)∩TSU⟩⟩. Thus we only need to prove Σ1TF1U−f1 ⟨⟨locs(Σ)∩TSU⟩⟩====================

Σ′1. By the above definition of Σ′

1, we are done.

• Inv(f ′2 , Σ′1,σ

′).For any b1, i , b ′1, i

′. if (b1, i) ∈ locs(Σ′1) and f ′2 ⟨(b1, i)⟩ = (b ′1, i

′), by the definition of Σ′1, we consider the following cases:

– b1 ∈ f1{{S}} − F1. Thus there exist b and n such that f1(b) = (b1,n), (b, i − n) ∈ locs(Σ′) and Σ′(b)(i − n)f ′1↪→ Σ′

1(b1)(i).Since f ′2 ⟨(b1, i)⟩ = (b ′1, i

′), we know(f ′2 ◦ f1)⟨(b, i − n)⟩ = (b ′1, i

′).

From Inv(f ′, Σ′,σ ′), we know (b ′1, i′) ∈ locs(σ ′) and Σ′(b)(i − n)

f ′↪→ σ ′(b ′1)(i

′). Since f ′2 ◦ f ′1 = f ′, we know

Σ′1(b1)(i)

f ′2↪→ σ ′(b ′1)(i

′).


– b1 ∈ S1 ∩ F1 and (b1, i) ∈ f1⟨⟨locs(Σ′) ∩ TSU⟩⟩. Thus by the definition of Σ′1, we know there exist b and n such that

f1⟨(b,n)⟩ = (b1, i) and Σ′(b)(n)f ′1↪→ Σ′

1(b1)(i). Since f′2 ⟨(b1, i)⟩ = (b ′1, i

′) and f ′2 ◦ f′1 = f ′, we know f ′⟨(b,n)⟩ = (b ′1, i

′).

From Inv(f ′, Σ′,σ ′), we know (b ′1, i′) ∈ locs(σ ′) and Σ′(b)(n)

f ′↪→ σ ′(b ′1)(i

′). Thus Σ′1(b1)(i)

f ′2↪→ σ ′(b ′1)(i

′).

– b1 ∈ S1 ∩ F1 and (b1, i) ∈ locs(Σ1) − f1⟨⟨locs(Σ) ∩ TSU⟩⟩. Thus by the definition of Σ′1, we know Σ′

1(b1)(i) = Σ1(b1)(i).

Since b1 ∈ S1, we know f2⟨(b1, i)⟩ = (b ′1, i′). Since Inv(f2, Σ1,σ ), we know (b ′1, i

′) ∈ locs(σ ) and Σ1(b1)(i)f2↪→

σ (b ′1)(i′). From σ

TFU−(f2◦f1)⟨⟨locs(Σ)∩TSU⟩⟩======================= σ ′, we know (b ′1, i

′) ∈ locs(σ ′) and σ (b ′1)(i′) = σ ′(b ′1)(i

′). Thus Σ′1(b1)(i)

f ′2↪→

σ ′(b ′1)(i′).

– b1 ∈ S′′1 . Thus by the definition of Σ′

1, we know there exists b such that fb−11 (b1) = b and Σ′(b)(i)f ′1↪→ Σ′

1(b1)(i). By

the definition of f ′′2 , we know f ′(b) = (b ′1, i′ − i). From Inv(f ′, Σ′,σ ′), we know (b ′1, i

′) ∈ locs(σ ′) and Σ′(b)(i)f ′↪→

σ ′(b ′1)(i′). Since f ′2 ◦ f ′1 = f ′, we know Σ′

1(b1)(i)f ′2↪→ σ ′(b ′1)(i

′).Thus we are done.

• Inv(f ′1 , Σ′, Σ′

1).For any b, i , b1, i1. if (b, i) ∈ locs(Σ′) and f ′1 ⟨(b, i)⟩ = (b1, i1), we know b ∈ S′. Consider the following cases:– b ∈ S − F. Thus f1(b) = (b1, i1 − i). Since f1{{S − F}} ∩ F1 = ∅, we know b1 ∈ f1{{S}} − F1. By the definition of Σ′

1, we

know (b1, i1) ∈ locs(Σ′1) and Σ′(b)(i)

f ′1↪→ Σ′

1(b1)(i1).

– b ∈ S ∩ F. Thus f1(b) = (b1, i1 − i). Since f1{{S ∩ F}} ⊆ F1, we know b1 ∈ S1 ∩ F1. Also (b1, i1) ∈ f1⟨⟨locs(Σ′) ∩ TSU⟩⟩.

By the definition of Σ′1, and since injective(f ′1 , Σ

′), we know (b1, i1) ∈ locs(Σ′1) and Σ′(b)(i)

f ′1↪→ Σ′

1(b1)(i1).

– b ∈ S′′. Thus f ′′1 (b) = (b1, i1 − i). By the definition of f ′′1 , we know i1 = i and fb1(b) = b1 ∈ S′′1 . By the definition of

Σ′1, we know (b1, i1) ∈ locs(Σ′

1) and Σ′(b)(i)f ′1↪→ Σ′

1(b1)(i1).

• S′1 ⊆ cl(S1, Σ′1).

For any b ∈ S′1, we consider the following cases:– b ∈ S1. We know b ∈ cl0(S1, Σ′

1).

– b ∈ S′′1 . Thus there exists b′ such that b ′ ∈ S′′, b = fb1(b

′) and f ′′1 (b ′) = (b, 0). Since S′ = cl(S, Σ′), we know thereexists k such that

b ′ ∈ clk+1(S, Σ′) .We need to prove

∀b ′,b . b ′ ∈ clk+1(S, Σ′) ∧ f ′1 (b′) = (b, _) =⇒ b ∈ clk+1(S1, Σ′

1) (E.1)

By induction over k .∗ k = 0. Thus there exist b0, n0 and n′ such that b0 ∈ S and Σ′(b0)(n0) = (b ′,n′). Then we know there exist b1 and n1such that f1⟨(b0,n0)⟩ = (b1,n1) and b1 ∈ S1. Since Inv(f ′1 , Σ

′, Σ′1), we know

Σ′(b0)(n0)f ′1↪→ Σ′

1(b1)(n1) .Thus there exists n such that

Σ′1(b1)(n1) = f ′1 ⟨(b

′,n′)⟩ = (b,n) .Thus b ∈ cl1(S1, Σ′

1).

∗ k =m + 1. Thus there exist b0, n0 and n′ such that b0 ∈ clm+1(S, Σ′) and Σ′(b0)(n0) = (b ′,n′). Then we know thereexist b1 and n1 such that f ′1 ⟨(b0,n0)⟩ = (b1,n1). By the induction hypothesis, we know b1 ∈ clm+1(S1, Σ′

1). SinceInv(f ′1 , Σ

′, Σ′1), we know

Σ′(b0)(n0)f ′1↪→ Σ′

1(b1)(n1) .Thus there exists n such that

Σ′1(b1)(n1) = f ′1 ⟨(b

′,n′)⟩ = (b,n) .Thus b ∈ clk+1(S1, Σ′

1).

• cl(S1, Σ′1) ⊆ S

′1.


We need to prove: for any k , clk (S1, Σ′1) ⊆ S

′1. By induction over k .

– k = 0. We know cl0(S1, Σ′1) = S1 ⊆ S

′1.

– k =m + 1. By the induction hypothesis, we knowclm+1(S1, Σ′

1) ⊆ {b ′ | ∃b, i, i ′. (b ∈ S′1) ∧ Σ′1(b)(i) = (b ′, i ′)}

We only need to prove the following:∀b, i,b ′, i ′. (b ∈ S′1) ∧ Σ′

1(b)(i) = (b ′, i ′) =⇒ b ′ ∈ S′1By the definition of Σ′

1, we consider the following cases:

∗ b ∈ f1{{S}}−F1. Thus there exist b0 and i0 such that f1⟨(b0, i0)⟩ = (b, i), (b0, i0) ∈ locs(Σ′) and Σ′(b0)(i0)f ′1↪→ Σ′

1(b)(i).Thus there exist b ′0 and i

′0 such that Σ′(b0)(i0) = (b ′0, i

′0) and f ′1 ⟨(b

′0, i

′0)⟩ = (b ′, i ′). Thus b ′ ∈ S′1.

∗ b ∈ S1 ∩ F1 and (b, i) ∈ f1⟨⟨locs(Σ′) ∩ TSU⟩⟩. The same as the earlier case.

∗ b ∈ S1 ∩ F1 and (b, i) ∈ locs(Σ1) − f1⟨⟨locs(Σ) ∩ TSU⟩⟩. Thus Σ′1(b)(i) = Σ1(b)(i). Since S1 = cl(S1, Σ1), we know

b ′ ∈ S1.

∗ b ∈ S′′1 . Thus there exists b0 such that fb−11 (b) = b0 and Σ′(b0)(i)f ′1↪→ Σ′

1(b)(i). Thus there exist b′0 and i

′0 such that

Σ′(b0)(i) = (b ′0, i′0) and f ′1 ⟨(b

′0, i

′0)⟩ = (b ′, i ′). Thus b ′ ∈ S′1.

Let µ ′1 = (S′, S′1, f′1 ) and µ ′2 = (S′1, S

′, f ′2 ). Thus we knowEvolveR(µ1, µ ′1, Σ

′, Σ′1, F, F1) and EvolveR(µ2, µ ′2, Σ

′1,σ

′, F1, F ) and µ ′ = (S′, S ′, f ′2 ◦ f ′1 ) .Thus from (F, (k, Σ),∆0, β) 4

i1µ1 (F1, (k1, Σ1),∆10, β) and (F1, (k1, Σ1),∆10, β) 4

i2µ2 (F , (κ,σ ), δ0, β), we know there exists

j1 and j2 such that(F, (k, Σ′), emp, ◦) 4j1

µ′1(F1, (k1, Σ′

1), emp, ◦) and (F1, (k1, Σ′1), emp, ◦) 4

j2µ′2

(F , (κ,σ ′), emp, ◦) .

Let j = (j2, j1). By the co-induction hypothesis, we know(F, (k, Σ′), emp, ◦) 4j

µ′ (F , (κ,σ′), emp, ◦) .

Thus we are done. �

Thus we have finished the proof. �

Lemma 53. If (𝔽, (k, Σ), ∆0, ◦) 4iµ (F, (κ, σ), δ0, ◦), 𝔽 ⊢ (k, Σ) τ7−→∆ n+1 (k′, Σ′), and HGuar(µ, ∆0 ∪ ∆, 𝔽), then one of the following holds:
  1. there exists j such that
     a. j < i, and
     b. (𝔽, (k′, Σ′), ∆0 ∪ ∆, ◦) 4jµ (F, (κ, σ), δ0, ◦);
  2. or, there exist κ′, σ′, δ and j such that
     a. F ⊢ (κ, σ) τ7−→δ+ (κ′, σ′), and
     b. FPmatch(µ, (∆0 ∪ ∆, Σ), (δ0 ∪ δ, F)), and
     c. (𝔽, (k′, Σ′), ∆0 ∪ ∆, ◦) 4jµ (F, (κ′, σ′), δ0 ∪ δ, ◦).

Proof. By induction over n.1. n = 0. Trivial.2. n =m + 1. We know there exists k′′, Σ′′, ∆′ and ∆′′ such that

F ⊢ (k, Σ) τ7−→∆′

(k′′, Σ′′) , F ⊢ (k′′, Σ′′)τ7−→∆′′

m+1 (k′, Σ′) , ∆ = ∆′ ∪ ∆′′ .

Since HGuar(µ,∆0 ∪ ∆, F), we knowHGuar(µ,∆0 ∪ ∆′, F) .

Since (F, (k, Σ),∆0, ◦)4iµ (F , (κ,σ ), δ0, ◦), we know one of the following holds:

(a) there exists i ′ such thati. i ′ < i , andii. (F, (k′′, Σ′′),∆0 ∪ ∆′, ◦) 4i′

µ (F , (κ,σ ), δ0, ◦);(b) or, there exist κ ′′, σ ′′, δ ′ and i ′ such that

i. F ⊢ (κ,σ )τ7−→δ ′

+(κ ′′,σ ′′), andii. FPmatch(µ, (∆0 ∪ ∆′, Σ), (δ0 ∪ δ ′, F )), and


iii. (F, (k′′, Σ′′),∆0 ∪ ∆′, ◦) 4i′µ (F , (κ ′′,σ ′′), δ0 ∪ δ ′, ◦).

For case (a), by the induction hypothesis, we know one of the following holds:(a1) there exists j such that

i. j < i ′, andii. (F, (k′, Σ′),∆0 ∪ ∆, ◦) 4j

µ (F , (κ,σ ), δ0, ◦);(a2) or, there exist κ ′, σ ′, δ and j such that

i. F ⊢ (κ,σ )τ7−→δ+(κ ′,σ ′), and

ii. FPmatch(µ, (∆0 ∪ ∆, Σ′′), (δ0 ∪ δ , F )), andiii. (F, (k′, Σ′),∆0 ∪ ∆, ◦) 4j

µ (F , (κ ′,σ ′), δ0 ∪ δ◦).Since F ⊢ (k, Σ) τ

7−→∆′

(k′′, Σ′′), we know forward(Σ, Σ′′) and (locs(Σ)− locs(Σ′′)) ⊆ ∆′.ws∩∆′.rs. Thus, if FPmatch(µ, (∆0∪

∆, Σ′′), (δ0 ∪ δ , F )), by Lemma 57, thenFPmatch(µ, (∆0 ∪ ∆, Σ), (δ0 ∪ δ , F )) .

Thus we are done.For case (b), by the induction hypothesis, we know one of the following holds:

(b1) there exists j such thati. j < i ′, andii. (F, (k′, Σ′),∆0 ∪ ∆, ◦) 4j

µ (F , (κ ′′,σ ′′), δ0 ∪ δ ′, ◦);(b2) or, there exist κ ′, σ ′, δ ′′ and j such that

i. F ⊢ (κ ′′,σ ′′)τ7−→δ ′′

+(κ ′,σ ′), andii. FPmatch(µ, (∆0 ∪ ∆, Σ′′), (δ0 ∪ δ ′ ∪ δ ′′, F )), andiii. (F, (k′, Σ′),∆0 ∪ ∆, ◦) 4j

µ (F , (κ ′,σ ′), δ0 ∪ δ ′ ∪ δ ′′, ◦).For case (b1), since FPmatch(µ, (∆0 ∪ ∆′, Σ), (δ0 ∪ δ ′, F )), we know

FPmatch(µ, (∆0 ∪ ∆, Σ), (δ0 ∪ δ ′, F )) .Thus we are done.For case (b2), we know

F ⊢ (κ,σ )τ7−→

δ ′∪δ ′′

+(κ ′,σ ′) .

Thus we are done.Thus we have finished the proof. �

Lemma 54. If (𝔽, (k, Σ), ∆0, ◦) 4iµ (F, (κ, σ), δ0, ◦), 𝔽 ⊢ (k, Σ) τ7−→∆ n (k′, Σ′), 𝔽 ⊢ (k′, Σ′) ι7−→emp (k′′, Σ′), ι ≠ τ, and HGuar(µ, ∆0 ∪ ∆, 𝔽), then there exist κ′, δ, σ′, κ′′, µ′ and j such that
  1. F ⊢ (κ, σ) τ7−→δ∗ (κ′, σ′), and F ⊢ (κ′, σ′) ι7−→emp (κ′′, σ′), and
  2. FPmatch(µ, (∆0 ∪ ∆, Σ), (δ0 ∪ δ, F)), and
  3. EvolveG(µ, µ′, Σ′, σ′, 𝔽, F), and
  4. (𝔽, (k′′, Σ′), emp, •) 4jµ′ (F, (κ′′, σ′), emp, •).

Proof. By induction over n.1. n = 0. Trivial.2. n =m + 1. We know there exists k′′′, Σ′′′, ∆′ and ∆′′ such that

F ⊢ (k, Σ) τ7−→∆′

(k′′′, Σ′′′) , F ⊢ (k′′′, Σ′′′)τ7−→∆′′

m (k′, Σ′) , ∆ = ∆′ ∪ ∆′′ .

Since HGuar(µ,∆0 ∪ ∆, F), we knowHGuar(µ,∆0 ∪ ∆′, F) .

Since (F, (k, Σ),∆0, ◦)4iµ (F , (κ,σ ), δ0, ◦), we know one of the following holds:

(a) there exists i ′ such thati. i ′ < i , andii. (F, (k′′′, Σ′′′),∆0 ∪ ∆′, ◦) 4i′

µ (F , (κ,σ ), δ0, ◦);(b) or, there exist κ ′′′, σ ′′′, δ ′ and i ′ such that

i. F ⊢ (κ,σ )τ7−→δ ′

+(κ ′′′,σ ′′′), and

ii. FPmatch(µ, (∆0 ∪ ∆′, Σ), (δ0 ∪ δ ′, F )), andiii. (F, (k′′′, Σ′′′),∆0 ∪ ∆′, ◦) 4i′

µ (F , (κ ′′′,σ ′′′), δ0 ∪ δ ′, ◦).For case (a), by the induction hypothesis, we are done.For case (b), by the induction hypothesis, we know there exist κ ′, δ , σ ′, κ ′, µ ′ and j such that

(1) F ⊢ (κ ′′′,σ ′′′)τ7−→δ∗(κ ′,σ ′), and F ⊢ (κ ′,σ ′)

ι7−→emp

(κ ′′,σ ′), and(2) FPmatch(µ, (∆0 ∪ ∆, Σ′′′), (δ0 ∪ δ ′ ∪ δ , F )), and(3) EvolveG(µ, µ ′, Σ′,σ ′, F, F ), and(4) (F, (k′′, Σ′), emp, •) 4j

µ′ (F , (κ′′,σ ′), emp, •).

Since F ⊢ (k, Σ) τ7−→∆′

(k′′′, Σ′′′), we know forward(Σ, Σ′′′) and (locs(Σ) − locs(Σ′′′)) ⊆ ∆′.ws ∩ ∆′.rs. Thus, fromFPmatch(µ, (∆0 ∪ ∆, Σ′′′), (δ0 ∪ δ ′ ∪ δ , F )), by Lemma 57, then

FPmatch(µ, (∆0 ∪ ∆, Σ), (δ0 ∪ δ ′ ∪ δ , F )) .Thus we are done.

Thus we have finished the proof. �

Lemma 55. If FPmatch(µ1, (∆1, Σ1), (∆2, F2)), FPmatch(µ2, (∆2, Σ2), (∆3, F3)), wf(µ1, (Σ1, F1), (Σ2, F2)), wf(µ2, (Σ2, F2), (Σ3, F3)), µ1 = (S1, S2, f1), µ2 = (S2, S3, f2), and µ = (S1, S3, f2 ◦ f1), then FPmatch(µ, (∆1, Σ1), (∆3, F3)).

Proof. From FPmatch(µ1, (∆1, Σ1), (∆2, F2)) and FPmatch(µ2, (∆2, Σ2), (∆3, F3)), we know

∆2.rs ∪ ∆2.ws ⊆ TF2U ∪ f1⟨⟨locs(Σ1) ∩ TS1U⟩⟩,∆2.rs ∩ f1⟨⟨locs(Σ1) ∩ TS1U⟩⟩ ⊆ f1⟨⟨locs(Σ1) ∩ ∆1.rs⟩⟩, ∆2.ws ∩ f1⟨⟨locs(Σ1) ∩ TS1U⟩⟩ ⊆ f1⟨⟨locs(Σ1) ∩ ∆1.ws⟩⟩,

∆3.rs ∪ ∆3.ws ⊆ TF3U ∪ f2⟨⟨locs(Σ2) ∩ TS2U⟩⟩,∆3.rs ∩ f2⟨⟨locs(Σ2) ∩ TS2U⟩⟩ ⊆ f2⟨⟨locs(Σ2) ∩ ∆2.rs⟩⟩, ∆3.ws ∩ f2⟨⟨locs(Σ2) ∩ TS2U⟩⟩ ⊆ f2⟨⟨locs(Σ2) ∩ ∆2.ws⟩⟩ .

From wf(µ1, (Σ1, F1), (Σ2, F2)) and wf(µ2, (Σ2, F2), (Σ3, F3)), we know

injective(f1, Σ1) , S1 ⊆ dom(f1) , f1{{S1}} ⊆ S2 , f1{{F1 ∩ S1}} ⊆ F2 , f1{{S1 − F1}} ∩ F2 = ∅ ,injective(f2, Σ2) , S2 ⊆ dom(f2) , f2{{S2}} ⊆ S3 , f2{{F2 ∩ S2}} ⊆ F3 , f2{{S2 − F2}} ∩ F3 = ∅ .

Thus

∆3.rs ⊆ TF3U ∪ (∆3.rs ∩ f2⟨⟨locs(Σ2) ∩ TS2U⟩⟩)⊆ TF3U ∪ f2⟨⟨∆2.rs⟩⟩⊆ TF3U ∪ f2⟨⟨(TF2U ∪ f1⟨⟨locs(Σ1) ∩ TS1U⟩⟩)⟩⟩⊆ TF3U ∪ TF3U ∪ (f2 ◦ f1)⟨⟨locs(Σ1) ∩ TS1U⟩⟩ = TF3U ∪ (f2 ◦ f1)⟨⟨locs(Σ1) ∩ TS1U⟩⟩

Similarly, we can also prove ∆3.ws ⊆ TF3U ∪ (f2 ◦ f1)⟨⟨locs(Σ1) ∩ TS1U⟩⟩.Also, since f1⟨⟨locs(Σ1)⟩⟩ ⊆ locs(Σ2) and f1{{S1}} ⊆ S2, we know

f1⟨⟨locs(Σ1) ∩ TS1U⟩⟩ ⊆ locs(Σ2) ∩ TS2U .

Since injective(f2, Σ2), we know

f2⟨⟨locs(Σ2) ∩ ∆2.rs⟩⟩ ∩ f2⟨⟨(f1⟨⟨locs(Σ1) ∩ TS1U⟩⟩)⟩⟩ ⊆ f2⟨⟨(∆2.rs ∩ f1⟨⟨locs(Σ1) ∩ TS1U⟩⟩)⟩⟩ .

Thus

∆3.rs ∩ (f2 ◦ f1)⟨⟨locs(Σ1) ∩ TS1U⟩⟩⊆ (∆3.rs ∩ f2⟨⟨locs(Σ2) ∩ TS2U⟩⟩) ∩ (f2 ◦ f1)⟨⟨locs(Σ1) ∩ TS1U⟩⟩⊆ f2⟨⟨locs(Σ2) ∩ ∆2.rs⟩⟩ ∩ f2⟨⟨(f1⟨⟨locs(Σ1) ∩ TS1U⟩⟩)⟩⟩⊆ f2⟨⟨(∆2.rs ∩ f1⟨⟨locs(Σ1) ∩ TS1U⟩⟩)⟩⟩⊆ (f2 ◦ f1)⟨⟨locs(Σ1) ∩ ∆1.rs⟩⟩

Similarly, we can also prove ∆3.ws ∩ (f2 ◦ f1)⟨⟨locs(Σ1) ∩ TS1U⟩⟩ ⊆ (f2 ◦ f1)⟨⟨locs(Σ1) ∩ ∆1.rs⟩⟩. Thus we are done. �

Lemma 56. If EvolveG(µ1, µ′1, Σ′1, Σ′2, F1, F2), EvolveG(µ2, µ′2, Σ′2, Σ′3, F2, F3), µ1 = (S1, S2, f1), µ2 = (S2, S3, f2), µ = (S1, S3, f2 ◦ f1), µ′1 = (S′1, S′2, f′1), µ′2 = (S′2, S′3, f′2), and µ′ = (S′1, S′3, f′2 ◦ f′1), then EvolveG(µ, µ′, Σ′1, Σ′3, F1, F3).

Proof. From EvolveG(µ1, µ ′1, Σ′1, Σ

′2, F1, F2) and EvolveG(µ2, µ ′2, Σ

′2, Σ

′3, F2, F3), we know


wf(µ ′1, Σ′1, F1, F2), f1 ⊆ f ′1 , Inv(f ′1 , Σ

′1, Σ

′2), S′1 = cl(S1, Σ′

1) ⊆ dom(Σ′1), S′2 = cl(S2, Σ′

2) ⊆ dom(Σ′2),

wf(µ ′2, Σ′2, F2, F3), f2 ⊆ f ′2 , Inv(f ′2 , Σ

′2, Σ

′3), S′2 = cl(S2, Σ′

2) ⊆ dom(Σ′2), S′3 = cl(S3, Σ′

3) ⊆ dom(Σ′3),

S′1 − S1 ⊆ F1, S′2 − S2 ⊆ F2 .

Thus we know

wf(µ ′, Σ′1, F1, F3), (f2 ◦ f1) ⊆ (f ′2 ◦ f ′1 ), Inv(f ′2 ◦ f ′1 , Σ

′1, Σ

′3), S′1 = cl(S1, Σ′

1) ⊆ dom(Σ′1), S′3 = cl(S3, Σ′

3) ⊆ dom(Σ′3),

S′1 − S1 ⊆ F1.

Thus we are done. �

Lemma 57. If FPmatch(µ, (∆, Σ′), (δ, F)), dom(µ.f) = µ.𝕊 ⊆ dom(Σ), forward(Σ, Σ′), and (locs(Σ) − locs(Σ′)) ⊆ ∆.ws ∩ ∆.rs, then FPmatch(µ, (∆, Σ), (δ, F)).

Proof. From FPmatch(µ, (∆, Σ′), (δ , F )), we know

δ .rs ∪ δ .ws ⊆ TFU ∪ µ . f ⟨⟨locs(Σ′) ∩ Tµ .SU⟩⟩ ,δ .rs ∩ µ . f ⟨⟨locs(Σ′) ∩ Tµ .SU⟩⟩ ⊆ µ . f ⟨⟨locs(Σ′) ∩ ∆.rs⟩⟩ ,δ .ws ∩ µ . f ⟨⟨locs(Σ′) ∩ Tµ .SU⟩⟩ ⊆ µ . f ⟨⟨locs(Σ′) ∩ ∆.ws⟩⟩ .

From forward(Σ, Σ′), we know locs(Σ′) ∩ Tdom(Σ)U ⊆ locs(Σ). Since µ .S ⊆ dom(Σ), we know

locs(Σ′) ∩ Tµ .SU ⊆ locs(Σ) ∩ Tµ .SU .

Thus

µ . f ⟨⟨locs(Σ′) ∩ Tµ .SU⟩⟩ ⊆ µ . f ⟨⟨locs(Σ) ∩ Tµ .SU⟩⟩ ,µ . f ⟨⟨locs(Σ′) ∩ ∆.ws⟩⟩ ⊆ µ . f ⟨⟨locs(Σ) ∩ ∆.ws⟩⟩ , µ . f ⟨⟨locs(Σ′) ∩ ∆.rs⟩⟩ ⊆ µ . f ⟨⟨locs(Σ) ∩ ∆.rs⟩⟩ .

Thus we have

δ .rs ∪ δ .ws ⊆ TFU ∪ µ . f ⟨⟨locs(Σ) ∩ Tµ .SU⟩⟩ .

Since (locs(Σ) − locs(Σ′)) ⊆ ∆.ws ∩ ∆.rs, we know

locs(Σ) ∩ Tµ .SU ⊆ (locs(Σ′) ∩ Tµ .SU) ∪ (∆.ws ∩ ∆.rs ∩ locs(Σ)) .

Thus

δ .rs ∩ µ . f ⟨⟨locs(Σ) ∩ Tµ .SU⟩⟩ ⊆ µ . f ⟨⟨locs(Σ) ∩ ∆.rs⟩⟩ ,δ .ws ∩ µ . f ⟨⟨locs(Σ) ∩ Tµ .SU⟩⟩ ⊆ µ . f ⟨⟨locs(Σ) ∩ ∆.ws⟩⟩ .

Thus we are done. �

E.3 Proof of Compositionality of Simulation (Lemma 48)

For any T, t, 𝕕 and Σ, if (let Γ in f1 | . . . | fn, Σ) :load==⇒ (T, t, 𝕕, Σ), we know there exist F1, . . . , Fn , K1, . . . ,Kn such that

initFList(Σ, (F1, . . . , Fn)) , ∀i ∈ {1, . . . ,n}.Ki = initRtMd(Γ, fi , Fi ) ,T = {t1 ; K1, . . . , tn ; Kn} , t ∈ {t1, . . . , tn} , 𝕕 = {t1 ; 0, . . . , tn ; 0} .

From initFList(Σ, (F1, . . . , Fn)), we know∀i ∈ {1, . . . ,n}. (dom(Σ) ∩ Fi = ∅) ∧ (∀j , i . Fi ∩ Fj = ∅) .

From ∀i ∈ {1, . . . ,n}.Ki = initRtMd(Γ, fi , Fi ), we know there exist a1, . . . ,an , k1, . . . , kn such that∀i ∈ {1, . . . ,n}. ai ∈ {1, . . . ,m} ∧ ki = slai .InitCore(γai , fi ) ∧ Ki = (slai , (γai , Fi ), ki ) .

Similarly, for anyT and σ , if (let Π in f1 | . . . | fn,σ ) :load==⇒ (T , t, 𝕕,σ ), we know there exist F1, . . . , Fn , K1, . . . ,Kn such that

initFList(σ , (F1, . . . , Fn)) , ∀i ∈ {1, . . . ,n}.Ki = initRtMd(Π, fi , Fi ) , T = {t1 ; K1, . . . , tn ; Kn} .

From initFList(σ , (F1, . . . , Fn)), we know∀i ∈ {1, . . . ,n}. (dom(σ ) ∩ Fi = ∅) ∧ (∀j , i . Fi ∩ Fj = ∅) .

From ∀i ∈ {1, . . . ,n}.Ki = initRtMd(Π, fi , Fi ), ∀i ∈ {1, . . . ,m}. dom(sli .InitCore(γi )) = dom(tli .InitCore(πi )) and the assump-tion that entries of different modules are different, we know there exist κ1, . . . ,κn such that

∀i ∈ {1, . . . , n}. κi = tlai.InitCore(πai, fi) ∧ Ki = (tlai, (πai, Fi), κi) .

For any µ, S and S , if ge ⊆ locs(Σ), dom(Σ) = S ⊆ dom(φ), cl(S, Σ) = S, locs(σ ) = φ⟨⟨locs(Σ)⟩⟩, Inv(φ, Σ,σ ) and µ =(S,φ{{S}},φ |S), since ge =

⋃i gei , we know

∀i ∈ {1, . . . ,m}. gei ⊆ locs(Σ) .From

∀ai ∈ {1, . . . ,m}. (slai , geai ,γai ) 4φ (tlai , ge′ai , πai ) ,

we know∀ai ∈ {1, . . . ,m}. ∃ji ∈ index. (Fi , (ki , Σ), emp, •) 4ji

µ (Fi , (κi ,σ ), emp, •) .

Since t ∈ {t1, . . . , tn}, from(Ft, (kt, Σ), emp, •) 4jt

µ (Ft, (κt,σ ), emp, •) ,

we know there exists j such that(Ft, (kt, Σ), emp, ◦) 4j

µ (Ft, (κt,σ ), emp, ◦) .

Then, by the following Lemma 58, we know

((T, t, 𝕕, Σ), emp) 4jµ ((T , t, 𝕕,σ ), emp) .

Lemma 58. If
  1. 𝕋 = {t1 ; (sl1, 𝔽1, k1), . . . , tn ; (sln, 𝔽n, kn)} and T = {t1 ; (tl1, F1, κ1), . . . , tn ; (tln, Fn, κn)},
  2. t ∈ {t1, . . . , tn}, (𝔽t, (kt, Σ), ∆0, ◦) 4iµ (Ft, (κt, σ), δ0, ◦), and RC(𝔽t, µ.𝕊, (kt, Σ), ◦),
  3. for any t′ ≠ t, there exists it′ ∈ index such that (𝔽t′, (kt′, Σ0), emp, •) 4it′µ (Ft′, (κt′, σ0), emp, •) and RC(𝔽t′, µ.𝕊, (kt′, Σ0), •),
  4. ∀t′ ≠ t. (𝔽t′ ∩ 𝔽t = ∅) ∧ (Ft′ ∩ Ft = ∅), and for any t, wd(slt) and wd(tlt),
  5. Σ0 =U−∆0.ws= Σ, σ0 =U−δ0.ws= σ, forward(Σ0, Σ), and forward(σ0, σ),
then ((𝕋, t, 𝕕, Σ), ∆0) 4iµ ((T, t, 𝕕, σ), δ0).

Proof. By co-induction. Let W = (T, t, 𝕕, Σ) and W = (T , t, 𝕕,σ ). We need to prove the following:

1. if (T, t, 𝕕, Σ) :τ=⇒∆W′, then one of the following holds:

a. there exists j such thati. j < i , andii. (W′,∆0 ∪ ∆) 4j

µ (W , δ0);b. or, there exist W ′, δ and j such that

i. (T , t, 𝕕,σ ) :τ=⇒δ+W ′, and

ii. FPmatch(µ, (∆0 ∪ ∆, Σ), (δ0 ∪ δ , Ft)), andiii. (W′,∆0 ∪ ∆) 4j

µ (W ′, δ0 ∪ δ );

Proof. From (T, t, 𝕕, Σ) :τ=⇒∆W′, we know there exist k′

t and Σ′ such that

W′ = (T{t ; (slt, Ft, k′t)}, t, 𝕕, Σ′) , Ft ⊢ (kt, Σ)

τ7−→∆

(k′t, Σ

′)

From (Ft, (kt, Σ),∆0, ◦) 4iµ (Ft, (κt,σ ), δ0, ◦) and RC(Ft, µ .S, (kt, Σ), ◦), we know one of the following holds:

(A) there exists j such that(1) j < i , and(2) (Ft, (k′

t, Σ′),∆0 ∪ ∆, ◦) 4j

µ (Ft, (κt,σ ), δ0, ◦);(B) or, there exist κ ′

t, σ′, δ and j such that

(3) Ft ⊢ (κt,σ )τ7−→δ+(κ ′

t,σ′), and

(4) FPmatch(µ, (∆0 ∪ ∆, Σ), (δ0 ∪ δ , Ft)), and(5) (Ft, (k′

t, Σ′),∆0 ∪ ∆, ◦) 4j

µ (Ft, (κ′t,σ

′), δ0 ∪ δ , ◦);For case (A):From Ft ⊢ (kt, Σ)

τ7−→∆

(k′t, Σ

′), by wd(slt), we know

ΣU−∆.ws======== Σ′ , forward(Σ, Σ′) .


From Σ0U−∆0 .ws========= Σ and forward(Σ0, Σ), we know

Σ0U−(∆0 .ws∪∆.ws)=============== Σ′ , forward(Σ0, Σ

′) .

From (2), by the co-induction hypothesis, we know

(W′,∆0 ∪ ∆) 4jµ (W , δ0) .

For case (B):From (3), we know there exists W ′ such that

W ′ = (T {t ; (tlt, Ft,κ ′t)}, t, 𝕕,σ ′) , (T , t, 𝕕,σ ) :

τ=⇒δ+W ′

From Ft ⊢ (kt, Σ)τ7−→∆

(k′t, Σ

′), by wd(slt), we know

ΣU−∆.ws======== Σ′ , forward(Σ, Σ′) .

From Σ0U−∆0 .ws========= Σ and forward(Σ0, Σ), we know

Σ0U−(∆0 .ws∪∆.ws)=============== Σ′ , forward(Σ0, Σ

′) .

Similarly, from σ0U−δ0 .ws========= σ , forward(σ0,σ ), and (3), by wd(tlt), we know

σ0U−(δ0 .ws∪δ .ws)============== σ ′ , forward(σ0,σ ′) .

From (5), by the co-induction hypothesis, we know

(W′,∆0 ∪ ∆) 4jµ (W ′, δ0 ∪ δ ) .

Thus we are done for this clause. �

2. ∀T′, 𝕕′, o. if o ≠ τ and ∃t′. (T, t, 𝕕, Σ) =o⇒_emp (T′, t′, 𝕕′, Σ), then there exist W̄′, δ, µ′, T̄′, σ′ and j such that for any t′ ∈ dom(T′),
a. (T̄, t, 𝕕, σ) =τ⇒_δ* W̄′, and W̄′ =o⇒_emp (T̄′, t′, 𝕕′, σ′), and
b. FPmatch(µ, (∆0, Σ), (δ0 ∪ δ, T̄(t).F)), and
c. ((T′, t′, 𝕕′, Σ), emp) ≼^j_{µ′} ((T̄′, t′, 𝕕′, σ′), emp);

Proof. Since ∃t′. (T, t, 𝕕, Σ) =o⇒_emp (T′, t′, 𝕕′, Σ), we know one of the following two cases holds:

(A) there exist ι and k′t such that

ι ≠ ret,  ι ≠ τ,  Ft ⊢ (kt, Σ) —ι→_emp (k′t, Σ),  T′ = T{t ↦ (slt, Ft, k′t)}.

From (Ft, (kt, Σ), ∆0, ◦) ≼^i_µ (F̄t, (κt, σ), δ0, ◦) and RC(Ft, µ.S, (kt, Σ), ◦), we know there exist κ′t, σ′, δ, κ″t, µ′ and jt such that
(1) F̄t ⊢ (κt, σ) —τ→_δ* (κ′t, σ′), and F̄t ⊢ (κ′t, σ′) —ι→_emp (κ″t, σ′), and
(2) blocks(∆0.rs ∪ ∆0.ws) ⊆ Ft ∪ µ.S, and
(3) FPmatch(µ, (∆0, Σ), (δ0 ∪ δ, F̄t)), and
(4) EvolveG(µ, µ′, Σ, σ′, Ft, F̄t), and
(5) (Ft, (k′t, Σ), emp, •) ≼^{jt}_{µ′} (F̄t, (κ″t, σ′), emp, •).

Let W̄′ = (T̄{t ↦ (tlt, F̄t, κ′t)}, t, 𝕕, σ′). From (1), we know there exists T̄′ such that for any t′,

(T̄, t, 𝕕, σ) =τ⇒_δ* W̄′,  W̄′ =o⇒_emp (T̄′, t′, 𝕕′, σ′),  T̄′ = T̄{t ↦ (tlt, F̄t, κ″t)}.

From (5), we know there exists j′t such that

(Ft, (k′t, Σ), emp, ◦) ≼^{j′t}_{µ′} (F̄t, (κ″t, σ′), emp, ◦).

For any t′ ≠ t, from Ft′ ∩ Ft = ∅ and blocks(∆0.ws) ⊆ Ft ∪ µ.S, we know

⌊Ft′ − µ.S⌋ ⊆ U − ∆0.ws.

From Σ0 ={U − ∆0.ws}= Σ, we know

Σ0 ={⌊Ft′ − µ.S⌋}= Σ.

On the target side, from FPmatch(µ, (∆0, Σ), (δ0 ∪ δ, F̄t)), we know

δ0.ws ∪ δ.ws ⊆ ⌊F̄t⌋ ∪ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩.

Since forward(Σ0, Σ) and µ.S ⊆ dom(Σ0), we know

locs(Σ) ∩ ⌊µ.S⌋ ⊆ locs(Σ0) ∩ ⌊µ.S⌋.

Then, from F̄t′ ∩ F̄t = ∅, we know

⌊F̄t′⌋ − µ.f⟨⟨locs(Σ0) ∩ ⌊µ.S⌋⟩⟩ ⊆ U − (δ0.ws ∪ δ.ws).

From σ0 ={U − δ0.ws}= σ, forward(σ0, σ) and (1), by wd(tlt), we know

σ0 ={U − (δ0.ws ∪ δ.ws)}= σ′,  forward(σ0, σ′).

Thus

σ0 ={⌊F̄t′⌋ − µ.f⟨⟨locs(Σ0) ∩ ⌊µ.S⌋⟩⟩}= σ′.

Also, since EvolveG(µ, µ′, Σ, σ′, Ft, F̄t) and Ft′ ∩ Ft = ∅ and F̄t′ ∩ F̄t = ∅, we know

EvolveR(µ, µ′, Σ, σ′, Ft′, F̄t′).

From (Ft′, (kt′, Σ0), emp, •) ≼^{it′}_µ (F̄t′, (κt′, σ0), emp, •) and RC(Ft′, µ.S, (kt′, Σ0), •), we know there exists jt′ such that

(Ft′, (kt′, Σ), emp, ◦) ≼^{jt′}_{µ′} (F̄t′, (κt′, σ′), emp, ◦).

Also, by Lemma 59, we know

(Ft′, (kt′, Σ), emp, •) ≼^{it′}_{µ′} (F̄t′, (κt′, σ′), emp, •).

Thus, for any t′ (no matter whether t′ = t or t′ ≠ t), by the co-induction hypothesis, we know there exists j such that

((T′, t′, 𝕕′, Σ), emp) ≼^j_{µ′} ((T̄′, t′, 𝕕′, σ′), emp).

(B) there exists k′t such that

Ft ⊢ (kt, Σ) —ret→_emp (k′t, Σ),  T′ = T\t,  𝕕′ = 𝕕\t.

From (Ft, (kt, Σ), ∆0, ◦) ≼^i_µ (F̄t, (κt, σ), δ0, ◦) and RC(Ft, µ.S, (kt, Σ), ◦), we know there exist κ′t, σ′, δ, κ″t, µ′ and jt such that
(1) F̄t ⊢ (κt, σ) —τ→_δ* (κ′t, σ′), and F̄t ⊢ (κ′t, σ′) —ret→_emp (κ″t, σ′), and
(2) blocks(∆0.rs ∪ ∆0.ws) ⊆ Ft ∪ µ.S, and
(3) FPmatch(µ, (∆0, Σ), (δ0 ∪ δ, F̄t)), and
(4) EvolveG(µ, µ′, Σ, σ′, Ft, F̄t).

Let W̄′ = (T̄{t ↦ (tlt, F̄t, κ′t)}, t, 𝕕, σ′). From (1), we know there exists T̄′ such that for any t′ ∈ dom(T\t), we have

(T̄, t, 𝕕, σ) =τ⇒_δ* W̄′,  W̄′ =o⇒_emp (T̄′, t′, 𝕕′, σ′),  T̄′ = T̄\t.

Similarly as in (A), we know there exists jt′ such that

(Ft′, (kt′, Σ), emp, ◦) ≼^{jt′}_{µ′} (F̄t′, (κt′, σ′), emp, ◦).

Thus, by the co-induction hypothesis, we know

((T′, t′, 𝕕′, Σ), emp) ≼^{jt′}_{µ′} ((T̄′, t′, 𝕕′, σ′), emp).

Thus we are done for this clause. □

3. if (T, t, 𝕕, Σ) =τ⇒_emp done, then there exist W̄′ and δ such that
a. (T̄, t, 𝕕, σ) =τ⇒_δ* W̄′, and W̄′ =τ⇒_emp done, and
b. FPmatch(µ, (∆0, Σ), (δ0 ∪ δ, F̄t)).


Figure 24. The execution in the auxiliary lemmas for the compositionality proofs. (Figure omitted: it depicts threads t and t′, the intermediate configurations A–J referred to in Lemmas 59 and 60, and the Rely transitions between them.)

Proof. From (T, t, 𝕕, Σ) =τ⇒_emp done, we know

Ft ⊢ (kt, Σ) —ret→_emp (k′t, Σ),  dom(T) = {t}.

From (Ft, (kt, Σ), ∆0, ◦) ≼^i_µ (F̄t, (κt, σ), δ0, ◦) and RC(Ft, µ.S, (kt, Σ), ◦), we know there exist κ′t, δ, σ′ and κ″t such that

F̄t ⊢ (κt, σ) —τ→_δ* (κ′t, σ′),  F̄t ⊢ (κ′t, σ′) —ret→_emp (κ″t, σ′),  FPmatch(µ, (∆0, Σ), (δ0 ∪ δ, F̄t)).

Let W̄′ = (T̄{t ↦ (tlt, F̄t, κ′t)}, t, 𝕕, σ′). Since dom(T̄) = {t}, we know

(T̄, t, 𝕕, σ) =τ⇒_δ* W̄′,  W̄′ =τ⇒_emp done.

Thus we are done. □

Thus we have finished the proof. □

Lemma 59 (From A to D in Fig. 24). If
1. (F, (k, ΣA), emp, •) ≼^i_{µA} (F̄, (κ, σA), emp, •),
2. ΣA ={⌊F − µA.S⌋}= ΣD, σA ={⌊F̄⌋ − µA.f⟨⟨locs(ΣA) ∩ ⌊µA.S⌋⟩⟩}= σD, forward(ΣA, ΣD), forward(σA, σD), and EvolveR(µA, µD, ΣD, σD, F, F̄),

then (F, (k, ΣD), emp, •) ≼^i_{µD} (F̄, (κ, σD), emp, •).

Proof. By co-induction. From (F, (k, ΣA), emp, •) ≼^i_{µA} (F̄, (κ, σA), emp, •), we know

wf(µA, (ΣA, F), (σA, F̄)).

From EvolveR(µA, µD, ΣD, σD, F, F̄), we know

wf(µD, (ΣD, F), (σD, F̄)),  µA.f ⊆ µD.f,  Inv(µD.f, ΣD, σD),
µD.S = cl(µA.S, ΣD),  µD.S̄ = cl(µA.S̄, σD),  (µD.S − µA.S) ∩ F = ∅.

For any σG, ΣG and µG, suppose

ΣD ={⌊F − µD.S⌋}= ΣG,  σD ={⌊F̄⌋ − µD.f⟨⟨locs(ΣD) ∩ ⌊µD.S⌋⟩⟩}= σG,
forward(ΣD, ΣG),  forward(σD, σG),  EvolveR(µD, µG, ΣG, σG, F, F̄).

From (µD.S − µA.S) ∩ F = ∅ and ΣD ={⌊F − µD.S⌋}= ΣG, we know

ΣD ={⌊F − µA.S⌋}= ΣG.

From ΣA ={⌊F − µA.S⌋}= ΣD, we know

ΣA ={⌊F − µA.S⌋}= ΣG.

From (µD.S − µA.S) ∩ F = ∅, we know

µD.f{{µD.S − µA.S}} ⊆ µD.f{{µD.S − F}}.

Since µD.f{{µD.S − F}} ∩ F̄ = ∅, we know

µD.f{{µD.S − µA.S}} ∩ F̄ = ∅.

Thus

µD.f⟨⟨(locs(ΣD) ∩ ⌊µD.S⌋) − (locs(ΣA) ∩ ⌊µA.S⌋)⟩⟩ ∩ ⌊F̄⌋ = ∅.

From µA.f ⊆ µD.f and µA.S ⊆ dom(µA.f), we know

(µD.f⟨⟨locs(ΣD) ∩ ⌊µD.S⌋⟩⟩ − µA.f⟨⟨locs(ΣA) ∩ ⌊µA.S⌋⟩⟩) ∩ ⌊F̄⌋ = ∅.

From σD ={⌊F̄⌋ − µD.f⟨⟨locs(ΣD) ∩ ⌊µD.S⌋⟩⟩}= σG, we know

σD ={⌊F̄⌋ − µA.f⟨⟨locs(ΣA) ∩ ⌊µA.S⌋⟩⟩}= σG.

From σA ={⌊F̄⌋ − µA.f⟨⟨locs(ΣA) ∩ ⌊µA.S⌋⟩⟩}= σD, we know

σA ={⌊F̄⌋ − µA.f⟨⟨locs(ΣA) ∩ ⌊µA.S⌋⟩⟩}= σG.

From forward(ΣA, ΣD), forward(σA, σD), forward(ΣD, ΣG) and forward(σD, σG), we know

forward(ΣA, ΣG),  forward(σA, σG).

Let µE = (SE, S̄E, fE), where SE = cl(µA.S, ΣG), S̄E = cl(µA.S̄, σG) and fE = (µG.f)|SE. Since µD.S = cl(µA.S, ΣD), we know µA.S ⊆ µD.S. Since µG.S = cl(µD.S, ΣG), we know SE ⊆ µG.S. Similarly we know S̄E ⊆ µG.S̄. Since dom(µG.f) = µG.S, we know dom(fE) = SE. Since µA.f ⊆ µD.f ⊆ µG.f and µA.S ⊆ SE, we know

µA.f ⊆ fE.

Since Inv(µG.f, ΣG, σG), dom(µA.f) = µA.S and (µA.f){{µA.S}} ⊆ µA.S̄, by Lemma 61, we know (µG.f){{SE}} ⊆ S̄E. Thus

fE{{SE}} ⊆ S̄E.

Since wf(µG, (ΣG, F), (σG, F̄)), we know

wf(µE, (ΣG, F), (σG, F̄)).

Since Inv(µG.f, ΣG, σG) and SE = cl(SE, ΣG), by Lemma 62, we know

Inv(fE, ΣG, σG).

Since SE ⊆ µG.S, (µG.S − µD.S) ∩ F = ∅ and (µD.S − µA.S) ∩ F = ∅, we know

(SE − µA.S) ∩ F = ∅.

Thus we know

EvolveR(µA, µE, ΣG, σG, F, F̄).

From (F, (k, ΣA), emp, •) ≼^i_{µA} (F̄, (κ, σA), emp, •), we know there exists j such that

(F, (k, ΣG), emp, ◦) ≼^j_{µE} (F̄, (κ, σG), emp, ◦).

Since (µG.S − µD.S) ∩ F = ∅, (µD.S − µA.S) ∩ F = ∅ and (SE − µA.S) ∩ F = ∅, we know

(µG.S − SE) ∩ F = ∅.

By Lemma 60, we know

(F, (k, ΣG), emp, ◦) ≼^j_{µG} (F̄, (κ, σG), emp, ◦).

Thus we are done. □
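Lemma 59 (and the lemmas that follow) reason purely in terms of the labelled agreement Σ ={L}= Σ′, which says that the two memories coincide on every location in L. The following Coq sketch (hypothetical names, not the development's definitions) records this reading together with the monotonicity property used above when weakening a set such as U − ∆0.ws to a smaller set such as ⌊F − µA.S⌋.

    (* Hypothetical sketch: labelled memory agreement and its monotonicity. *)
    Section Agree.
      Variables loc val : Type.

      Definition mem := loc -> option val.

      (* m1 ={L}= m2: the two memories agree on every location in L *)
      Definition agree_on (L : loc -> Prop) (m1 m2 : mem) : Prop :=
        forall l, L l -> m1 l = m2 l.

      (* agreement on a set implies agreement on any subset *)
      Lemma agree_on_mono :
        forall (L L' : loc -> Prop) (m1 m2 : mem),
          (forall l, L' l -> L l) -> agree_on L m1 m2 -> agree_on L' m1 m2.
      Proof.
        unfold agree_on; intros L L' m1 m2 Hsub Hag l Hl.
        apply Hag. apply Hsub. exact Hl.
      Qed.
    End Agree.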


Lemma 60 (From E to G in Fig. 24). If
1. (F, (k, Σ), ∆0, ◦) ≼^i_{µE} (F̄, (κ, σ), δ0, ◦),
2. ΣG ={U − ∆0.ws}= Σ, σG ={U − δ0.ws}= σ, forward(ΣG, Σ), forward(σG, σ), µE = (SE, S̄E, fE), µG = (SG, S̄G, fG), wf(µE, (ΣG, F), (σG, F̄)), wf(µG, (ΣG, F), (σG, F̄)), SE = cl(SE, ΣG), SG = cl(SG, ΣG), S̄E = cl(S̄E, σG), S̄G = cl(S̄G, σG), SE ⊆ SG, S̄E ⊆ S̄G, fE = fG|SE, (SG − SE) ∩ F = ∅,

then (F, (k, Σ), ∆0, ◦) ≼^i_{µG} (F̄, (κ, σ), δ0, ◦).

Proof. By co-induction. Since forward(ΣG, Σ), forward(σG, σ) and wf(µG, (ΣG, F), (σG, F̄)), we know

wf(µG, (Σ, F), (σ, F̄)).

Also we need to prove the following.

1. ∀k′, Σ′, ∆. if F ⊢ (k, Σ) —τ→_∆ (k′, Σ′), then one of the following holds:
a. there exists j such that
i. j < i, and
ii. (F, (k′, Σ′), ∆0 ∪ ∆, ◦) ≼^j_{µG} (F̄, (κ, σ), δ0, ◦);
b. or, there exist κ′, σ′, δ and j such that
i. F̄ ⊢ (κ, σ) —τ→_δ+ (κ′, σ′), and
ii. FPmatch(µG, (∆0 ∪ ∆, Σ), (δ0 ∪ δ, F̄)), and
iii. (F, (k′, Σ′), ∆0 ∪ ∆, ◦) ≼^j_{µG} (F̄, (κ′, σ′), δ0 ∪ δ, ◦).

Proof. From (F, (k, Σ), ∆0, ◦) ≼^i_{µE} (F̄, (κ, σ), δ0, ◦), we know one of the following holds:
(A) there exists j such that
(1) j < i, and
(2) (F, (k′, Σ′), ∆0 ∪ ∆, ◦) ≼^j_{µE} (F̄, (κ, σ), δ0, ◦);
(B) or, there exist κ′, σ′, δ and j such that
(3) F̄ ⊢ (κ, σ) —τ→_δ+ (κ′, σ′), and
(4) FPmatch(µE, (∆0 ∪ ∆, Σ), (δ0 ∪ δ, F̄)), and
(5) (F, (k′, Σ′), ∆0 ∪ ∆, ◦) ≼^j_{µE} (F̄, (κ′, σ′), δ0 ∪ δ, ◦).

For case (A): From F ⊢ (k, Σ) —τ→_∆ (k′, Σ′), by wd(sl), we know

Σ ={U − ∆.ws}= Σ′,  forward(Σ, Σ′).

Since ΣG ={U − ∆0.ws}= Σ and forward(ΣG, Σ), we know

ΣG ={U − (∆0 ∪ ∆).ws}= Σ′,  forward(ΣG, Σ′).

From (2), by the co-induction hypothesis, we know

(F, (k′, Σ′), ∆0 ∪ ∆, ◦) ≼^j_{µG} (F̄, (κ, σ), δ0, ◦).

For case (B): Since F ⊢ (k, Σ) —τ→_∆ (k′, Σ′), ΣG ={U − ∆0.ws}= Σ and forward(ΣG, Σ), we know

ΣG ={U − (∆0 ∪ ∆).ws}= Σ′,  forward(ΣG, Σ′).

From (3), σG ={U − δ0.ws}= σ and forward(σG, σ), by wd(tl), we know

σG ={U − (δ0 ∪ δ).ws}= σ′,  forward(σG, σ′).

From (5), by the co-induction hypothesis, we know

(F, (k′, Σ′), ∆0 ∪ ∆, ◦) ≼^j_{µG} (F̄, (κ′, σ′), δ0 ∪ δ, ◦).

From (4), by Lemma 63, we know

FPmatch(µG, (∆0 ∪ ∆, Σ), (δ0 ∪ δ, F̄)).

Thus we are done for this clause. □


2. ∀k′, ι. if ι ≠ τ and F ⊢ (k, Σ) —ι→_emp (k′, Σ), then there exist κ′, δ, σ′, κ″, µH and j such that
a. F̄ ⊢ (κ, σ) —τ→_δ* (κ′, σ′), and F̄ ⊢ (κ′, σ′) —ι→_emp (κ″, σ′), and
b. blocks(∆0.rs ∪ ∆0.ws) ⊆ F ∪ SG, and
c. FPmatch(µG, (∆0, Σ), (δ0 ∪ δ, F̄)), and
d. EvolveG(µG, µH, Σ, σ′, F, F̄), and
e. (F, (k′, Σ), emp, •) ≼^j_{µH} (F̄, (κ″, σ′), emp, •).

Proof. From (F, (k, Σ), ∆0, ◦) ≼^i_{µE} (F̄, (κ, σ), δ0, ◦), we know there exist κ′, δ, σ′, κ″, µF and j such that
(1) F̄ ⊢ (κ, σ) —τ→_δ* (κ′, σ′), and F̄ ⊢ (κ′, σ′) —ι→_emp (κ″, σ′), and
(2) blocks(∆0.rs ∪ ∆0.ws) ⊆ F ∪ SE, and
(3) FPmatch(µE, (∆0, Σ), (δ0 ∪ δ, F̄)), and
(4) EvolveG(µE, µF, Σ, σ′, F, F̄), and
(5) (F, (k′, Σ), emp, •) ≼^j_{µF} (F̄, (κ″, σ′), emp, •).

From (2), since SE ⊆ SG, we know blocks(∆0.rs ∪ ∆0.ws) ⊆ F ∪ SG. From (3), by Lemma 63, we know FPmatch(µG, (∆0, Σ), (δ0 ∪ δ, F̄)). Suppose µF = (SF, S̄F, fF). From (4), we know

wf(µF, (Σ, F), (σ′, F̄)),  fE ⊆ fF,  Inv(fF, Σ, σ′),
SF = cl(SE, Σ),  S̄F = cl(S̄E, σ′),  (SF − SE) ⊆ F.

Let µH = (SH, S̄H, fH), where SH = cl(SG, Σ), S̄H = cl(S̄G, σ′) and fH = fG ∪ (fF|SH−SG). Thus we know SG ⊆ SH, S̄G ⊆ S̄H and fG ⊆ fH. Since SE ⊆ SG and S̄E ⊆ S̄G, we know SF ⊆ SH and S̄F ⊆ S̄H. By Lemma 64, we know

SH − SG ⊆ SF.

Since SE ⊆ SG and (SF − SE) ⊆ F, we know

SH − SG ⊆ SF − SE ⊆ F.

Since SF ⊆ SH and (SG − SE) ∩ F = ∅, we know

SF − SE ⊆ (SH − SE) ∩ F = ((SH − SG) ∪ (SG − SE)) ∩ F = SH − SG.

Also we know dom(fH) = SH. Since fF{{SF}} ⊆ S̄F, we know

fF{{SH − SG}} ⊆ S̄F = cl(S̄E, σ′) ⊆ cl(S̄G, σ′) = S̄H.

Since fG{{SG}} ⊆ S̄G, we know

fH{{SH}} ⊆ S̄H.

Also we know

fH = fG ∪ (fF|SH−SG) = (fG|SG−SE) ∪ fE ∪ (fF|SH−SG) = (fG|SG−SE) ∪ fF.

Since injective(fG, ΣG), dom(fG) = SG ⊆ dom(ΣG) and forward(ΣG, Σ), we know

injective(fG, Σ).

Since (SG − SE) ∩ F = ∅, fG{{SG − F}} ∩ F̄ = ∅, (SH − SG) ⊆ F and fF{{SF ∩ F}} ⊆ F̄, we know

fG{{SG − SE}} ∩ F̄ = ∅,  fF{{SH − SG}} ⊆ F̄.

Since fE ⊆ fF and injective(fF, Σ), we know

injective(fH, Σ).

Since SF ⊆ dom(Σ), SG ⊆ dom(ΣG) and forward(ΣG, Σ), we know

SH ⊆ dom(Σ).

From (1), by wd(tl), we know

σ ={U − δ.ws}= σ′,  forward(σ, σ′).

Since σG ={U − δ0.ws}= σ and forward(σG, σ), we know

σG ={U − (δ0 ∪ δ).ws}= σ′,  forward(σG, σ′).

From (3), we know

(δ0 ∪ δ).ws ⊆ ⌊F̄⌋ ∪ fG⟨⟨locs(Σ) ∩ ⌊SE⌋⟩⟩.

Since fG{{SG − SE}} ∩ F̄ = ∅ and injective(fG, Σ), we know

fG⟨⟨locs(Σ) ∩ ⌊SG − SE⌋⟩⟩ ∩ (⌊F̄⌋ ∪ fG⟨⟨locs(Σ) ∩ ⌊SE⌋⟩⟩) = ∅.

Thus

fG⟨⟨locs(Σ) ∩ ⌊SG − SE⌋⟩⟩ ⊆ U − (δ0 ∪ δ).ws.

As a result, since fH = (fG|SG−SE) ∪ fF, forward(ΣG, Σ), fG⟨⟨locs(ΣG)⟩⟩ ⊆ locs(σG) and fF⟨⟨locs(Σ)⟩⟩ ⊆ locs(σ′), we know

fH⟨⟨locs(Σ)⟩⟩
= fG⟨⟨locs(Σ) ∩ ⌊SG − SE⌋⟩⟩ ∪ fF⟨⟨locs(Σ)⟩⟩
⊆ (fG⟨⟨locs(Σ) ∩ ⌊SG − SE⌋⟩⟩ ∩ fG⟨⟨locs(ΣG) ∩ ⌊SG − SE⌋⟩⟩) ∪ locs(σ′)
⊆ ((U − (δ0 ∪ δ).ws) ∩ locs(σG)) ∪ locs(σ′)
⊆ locs(σ′).

Thus we have proved wf(µH, Σ, σ′). Since fH = fG ∪ (fF|SH−SG), fG{{SG ∩ F}} ⊆ F̄ and fF{{SH − SG}} ⊆ F̄, we know

fH{{SH ∩ F}} ⊆ fG{{SG ∩ F}} ∪ fF{{(SH − SG) ∩ F}} ⊆ F̄.

Also, since fG{{SG − F}} ∩ F̄ = ∅ and (SH − SG) ⊆ F, we know

fH{{SH − F}} ∩ F̄ ⊆ (fG{{SG − F}} ∪ fF{{(SH − SG) − F}}) ∩ F̄ = ∅.

Thus we have proved wf(µH, (Σ, F), (σ′, F̄)). Since Inv(fG, ΣG, σG) and forward(ΣG, Σ), we know: for any b, n, b′ and n′, if (b, n) ∈ locs(Σ) and (fG|SG−SE)⟨(b, n)⟩ = (b′, n′), then

(b′, n′) ∈ locs(σG),  ΣG(b)(n) ↪fG σG(b′)(n′).

Since blocks(∆0.ws) ⊆ F ∪ SE and (SG − SE) ∩ F = ∅, we know

⌊SG − SE⌋ ⊆ U − ∆0.ws.

Since ΣG ={U − ∆0.ws}= Σ and b ∈ SG − SE, we know

ΣG(b)(n) = Σ(b)(n).

Since fG⟨⟨locs(Σ) ∩ ⌊SG − SE⌋⟩⟩ ⊆ U − (δ0 ∪ δ).ws and σG ={U − (δ0 ∪ δ).ws}= σ′, we know

(b′, n′) ∈ locs(σ′),  σG(b′)(n′) = σ′(b′)(n′).

Thus we know

Σ(b)(n) ↪fG σ′(b′)(n′).

Since fH = fG ∪ (fF|SH−SG) = (fG|SG−SE) ∪ fF and Inv(fF, Σ, σ′), we know

Inv(fH, Σ, σ′).

Thus we have proved EvolveG(µG, µH, Σ, σ′, F, F̄). Also, since SF − SE = SH − SG and (SG − SE) ∩ F = ∅, we know

(SH − SF) ∩ F = ∅.

Below we consider the case when (F, (k′, Σ), emp, •) ≼^j_{µF} (F̄, (κ″, σ′), emp, •). We prove

(F, (k′, Σ), emp, •) ≼^j_{µH} (F̄, (κ″, σ′), emp, •).    (E.2)

Proof: by co-induction. For any σJ, ΣJ and µJ, suppose µJ = (SJ, S̄J, fJ) and

Σ ={⌊F − SH⌋}= ΣJ,  σ′ ={⌊F̄⌋ − fH⟨⟨locs(Σ) ∩ ⌊SH⌋⟩⟩}= σJ,
forward(Σ, ΣJ),  forward(σ′, σJ),  EvolveR(µH, µJ, ΣJ, σJ, F, F̄).

Since (SH − SF) ∩ F = ∅ and Σ ={⌊F − SH⌋}= ΣJ, we know

Σ ={⌊F − SF⌋}= ΣJ.

Since fH{{SH − F}} ∩ F̄ = ∅, we know

fH{{SH − SF}} ∩ F̄ = ∅.

Since fF ⊆ fH and SF ⊆ dom(fF), we know

(fH⟨⟨locs(Σ) ∩ ⌊SH⌋⟩⟩ − fF⟨⟨locs(Σ) ∩ ⌊SF⌋⟩⟩) ∩ ⌊F̄⌋ = ∅.

Since σ′ ={⌊F̄⌋ − fH⟨⟨locs(Σ) ∩ ⌊SH⌋⟩⟩}= σJ, we know

σ′ ={⌊F̄⌋ − fF⟨⟨locs(Σ) ∩ ⌊SF⌋⟩⟩}= σJ.

Let µI = (SI, S̄I, fI), where SI = cl(SF, ΣJ), S̄I = cl(S̄F, σJ) and fI = fJ|SI. Since SF ⊆ SH, S̄F ⊆ S̄H, SJ = cl(SH, ΣJ) and S̄J = cl(S̄H, σJ), we know

SI ⊆ SJ,  S̄I ⊆ S̄J.

Since dom(fJ) = SJ, we know dom(fI) = SI. Since fF ⊆ fH, fH ⊆ fJ and SF ⊆ SI, we know

fF ⊆ fI ⊆ fJ.

Since Inv(fJ, ΣJ, σJ), dom(fF) = SF and fF{{SF}} ⊆ S̄F, by Lemma 61, we know fJ{{SI}} ⊆ S̄I. Thus

fI{{SI}} ⊆ S̄I.

Since wf(µJ, (ΣJ, F), (σJ, F̄)), we know

wf(µI, (ΣJ, F), (σJ, F̄)).

Since Inv(fJ, ΣJ, σJ) and SI = cl(SI, ΣJ), by Lemma 62, we know

Inv(fI, ΣJ, σJ).

Since SI ⊆ SJ, (SJ − SH) ∩ F = ∅ and (SH − SF) ∩ F = ∅, we know

(SI − SF) ∩ F = ∅.

Thus we know

EvolveR(µF, µI, ΣJ, σJ, F, F̄).

From (F, (k′, Σ), emp, •) ≼^j_{µF} (F̄, (κ″, σ′), emp, •), we know there exists j′ such that

(F, (k′, ΣJ), emp, ◦) ≼^{j′}_{µI} (F̄, (κ″, σJ), emp, ◦).

Since (SI − SF) ∩ F = ∅, (SJ − SH) ∩ F = ∅ and (SH − SF) ∩ F = ∅, we know

(SJ − SI) ∩ F = ∅.

Thus by the co-induction hypothesis, we know

(F, (k′, ΣJ), emp, ◦) ≼^{j′}_{µJ} (F̄, (κ″, σJ), emp, ◦).

Thus we have proved (E.2). Thus we are done for this clause. □

Thus we are done. □

Lemma 61. If f1 ⊆ f2, dom(f1) = S1, f1{{S1}} ⊆ S̄1, S2 = cl(S1, Σ) ⊆ dom(f2), S̄2 = cl(S̄1, σ) and Inv(f2, Σ, σ), then f2{{S2}} ⊆ S̄2.

Proof. We prove: for any k, f2{{cl^k(S1, Σ)}} ⊆ cl^k(S̄1, σ). By induction over k.
• Base case: k = 0. From f1 ⊆ f2, dom(f1) = S1 and f1{{S1}} ⊆ S̄1, we are done.
• Inductive step: k = n + 1. For any b′ and b′1 such that b′ ∈ cl^{n+1}(S1, Σ) and f2(b′) = (b′1, _), we know there exist b, m and m′ such that

b ∈ cl^n(S1, Σ),  Σ(b)(m) = (b′, m′).

Since cl^n(S1, Σ) ⊆ dom(f2), we know there exist b1 and m1 such that

f2⟨(b, m)⟩ = (b1, m1).

By the induction hypothesis, we know f2{{cl^n(S1, Σ)}} ⊆ cl^n(S̄1, σ). Thus

b1 ∈ cl^n(S̄1, σ).

From Inv(f2, Σ, σ), we know

f2⟨(b′, m′)⟩ = σ(b1)(m1).

Thus b′1 ∈ cl^{n+1}(S̄1, σ). Thus f2{{cl^{n+1}(S1, Σ)}} ⊆ cl^{n+1}(S̄1, σ).

Thus we are done. □
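The induction in Lemma 61 is over the level-indexed closure cl^k. As an illustration only, the Coq sketch below (hypothetical names; the development's closure is defined over CompCert-style memories rather than an abstract points-to relation) gives one way to read cl^k, its limit cl, and two of the closure facts used in this appendix.

    (* Hypothetical sketch: level-indexed reachability closure of a block set. *)
    Section Closure.
      Variable block : Type.
      (* points_to b b': some location of block b stores a pointer into block b' *)
      Variable points_to : block -> block -> Prop.

      (* cl_k k X: blocks reachable from X in at most k pointer steps *)
      Fixpoint cl_k (k : nat) (X : block -> Prop) (b : block) : Prop :=
        match k with
        | O => X b
        | S n => cl_k n X b \/ (exists b0, cl_k n X b0 /\ points_to b0 b)
        end.

      (* cl X: blocks reachable from X in finitely many pointer steps *)
      Definition cl (X : block -> Prop) (b : block) : Prop :=
        exists k, cl_k k X b.

      Lemma cl_incl : forall X b, X b -> cl X b.
      Proof. intros X b Hb. unfold cl. exists 0. simpl. exact Hb. Qed.

      Lemma cl_step : forall X b b', cl X b -> points_to b b' -> cl X b'.
      Proof.
        intros X b b' Hcl Hpt. destruct Hcl as [k Hk].
        unfold cl. exists (S k). simpl. right. exists b. split; assumption.
      Qed.
    End Closure.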


Lemma 62. If f1 ⊆ f2, dom(f1) = S1, S1 = cl(S1, Σ), and Inv(f2, Σ, σ), then Inv(f1, Σ, σ).

Proof. For any b, n, b′ and n′ such that (b, n) ∈ locs(Σ) and f1⟨(b, n)⟩ = (b′, n′), since f1 ⊆ f2, we know f2⟨(b, n)⟩ = (b′, n′). Since Inv(f2, Σ, σ), we know

(b′, n′) ∈ locs(σ),  Σ(b)(n) ↪f2 σ(b′)(n′).

Suppose Σ(b)(n) = v1 and σ(b′)(n′) = v2. Thus

(v1 ∉ U ∧ v1 = v2) ∨ (v1 ∈ U ∧ v2 ∈ U ∧ f2⟨v1⟩ = v2).

If v1 ∈ U, suppose v1 = (b1, n1). Since b ∈ dom(f1) = S1 and S1 = cl(S1, Σ), we know b1 ∈ S1. Thus f1⟨v1⟩ = v2. Thus we know

Σ(b)(n) ↪f1 σ(b′)(n′).

Thus we are done. □
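The case split on v1 in Lemma 62 is exactly the value-level injection relation v ↪f v′: non-pointer values must be equal, and pointer values must be related by the injection f on locations. A minimal Coq sketch of this relation follows (hypothetical names and a deliberately simplified value type; it is not the development's value definition).

    (* Hypothetical sketch: the value relation v ↪_f v' used in Inv and Lemma 62. *)
    Section ValInj.
      Variable block : Type.
      Definition loc := (block * nat)%type.   (* (block, offset) *)

      Inductive val :=
      | Vint (n : nat)     (* a non-pointer value, i.e. not a member of U *)
      | Vptr (l : loc).    (* a pointer value, i.e. a member of U *)

      Definition injection := loc -> option loc.

      Inductive val_inj (f : injection) : val -> val -> Prop :=
      | inj_int : forall n, val_inj f (Vint n) (Vint n)
      | inj_ptr : forall l l', f l = Some l' -> val_inj f (Vptr l) (Vptr l').
    End ValInj.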

Lemma 63. If µE = (SE, S̄E, fE), µG = (SG, S̄G, fG), SE ⊆ SG, fE ⊆ fG, SE ⊆ dom(fE), fG{{SG − F}} ∩ F̄ = ∅, (SG − SE) ∩ F = ∅ and FPmatch(µE, (∆, Σ), (δ, F̄)), then FPmatch(µG, (∆, Σ), (δ, F̄)).

Proof. From FPmatch(µE, (∆, Σ), (δ, F̄)), we know

δ.rs ∪ δ.ws ⊆ ⌊F̄⌋ ∪ fE⟨⟨locs(Σ) ∩ ⌊SE⌋⟩⟩,
δ.rs ∩ fE⟨⟨locs(Σ) ∩ ⌊SE⌋⟩⟩ ⊆ fE⟨⟨locs(Σ) ∩ ∆.rs⟩⟩,
δ.ws ∩ fE⟨⟨locs(Σ) ∩ ⌊SE⌋⟩⟩ ⊆ fE⟨⟨locs(Σ) ∩ ∆.ws⟩⟩.

Since SE ⊆ SG and fE ⊆ fG, we know

δ.rs ∪ δ.ws ⊆ ⌊F̄⌋ ∪ fG⟨⟨locs(Σ) ∩ ⌊SG⌋⟩⟩.

Since (SG − SE) ∩ F = ∅, we know

fG{{SG − SE}} ⊆ fG{{SG − F}}.

Since fG{{SG − F}} ∩ F̄ = ∅, we know

fG{{SG − SE}} ∩ F̄ = ∅.

Thus

fG⟨⟨(locs(Σ) ∩ ⌊SG⌋) − (locs(Σ) ∩ ⌊SE⌋)⟩⟩ ∩ ⌊F̄⌋ = ∅.

From fE ⊆ fG and SE ⊆ dom(fE), we know

(fG⟨⟨locs(Σ) ∩ ⌊SG⌋⟩⟩ − fE⟨⟨locs(Σ) ∩ ⌊SE⌋⟩⟩) ∩ ⌊F̄⌋ = ∅.

Thus

δ.rs ∩ fG⟨⟨locs(Σ) ∩ ⌊SG⌋⟩⟩
= (δ.rs ∩ fE⟨⟨locs(Σ) ∩ ⌊SE⌋⟩⟩) ∪ (δ.rs ∩ ⌊F̄⌋ ∩ (fG⟨⟨locs(Σ) ∩ ⌊SG⌋⟩⟩ − fE⟨⟨locs(Σ) ∩ ⌊SE⌋⟩⟩))
⊆ fE⟨⟨locs(Σ) ∩ ∆.rs⟩⟩ ⊆ fG⟨⟨locs(Σ) ∩ ∆.rs⟩⟩.

Similarly we can also prove

δ.ws ∩ fG⟨⟨locs(Σ) ∩ ⌊SG⌋⟩⟩ ⊆ fG⟨⟨locs(Σ) ∩ ∆.ws⟩⟩.

Thus we are done. □

Lemma 64. If SG = cl(SG, ΣG), SH = cl(SG, Σ), SF = cl(SE, Σ), (SG − SE) ∩ F = ∅, ΣG ={U − ∆0.ws}= Σ and blocks(∆0.ws) ⊆ F ∪ SE, then SH − SG ⊆ SF.

Proof. For any b ∈ SH − SG, since SH = cl(SG, Σ) and SG = cl(SG, ΣG), we know there exist b0, b1, …, bk, n0, n1, …, nk, n and m such that

b0 ∈ SG,  ∀i ∈ [0..k). Σ(bi)(ni) = (bi+1, ni+1),  Σ(bk)(nk) = (b, n),
0 ≤ m ≤ k,  Σ(bm)(nm) ≠ ΣG(bm)(nm),  ∀i ∈ [0..m). Σ(bi)(ni) = ΣG(bi)(ni).

Since ΣG ={U − ∆0.ws}= Σ and blocks(∆0.ws) ⊆ F ∪ SE, we know

bm ∈ F ∪ SE.

Also we know bm ∈ cl(SG, ΣG) = SG. Since (SG − SE) ∩ F = ∅, we know

bm ∈ SE.

Thus b ∈ cl(SE, Σ) = SF. Thus we are done. □

E.4 Proof of Flip of Simulation (Lemma 50)

To prove Lemma 50, we first define a relation ≃ between source and target programs. We also define a relation ≃^i which is parameterized with an index i. This index describes the number of steps at the target that correspond to zero steps of the source.

Definition 65 (Whole-Program Existential Relation). Let P = let Γ in f1 | … | fm, P̄ = let Π in f1 | … | fm, Γ = {(sl1, ge1, γ1), …, (slm, gem, γm)} and Π = {(tl1, ge′1, π1), …, (tlm, ge′m, πm)}.

P ≃φ P̄ iff ∀i ∈ {1, …, m}. φ⟨⟨gei⟩⟩ = ge′i, and for any Σ, σ, µ, W and W̄, if initM(φ, ⋃i gei, Σ, σ), µ = (dom(Σ), dom(σ), φ), (P̄, σ) =load⇒ W̄, and (P, Σ) =load⇒ W, then W ≃^{(emp,emp)}_µ W̄.

Here we define W ≃^{(∆0,δ0)}_µ W̄ as the largest relation such that whenever W ≃^{(∆0,δ0)}_µ W̄, then the following are true:

1. W.t = W̄.t, and W.𝕕 = W̄.𝕕, and wf(µ, W.Σ, W̄.σ);
2. one of the following holds:
a. there exist W′, ∆, W̄′ and δ such that
i. W =τ⇒_∆+ W′, and W̄ =τ⇒_δ+ W̄′, and
ii. FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ, ((W̄.T̄)(W̄.t)).F)), and
iii. W′ ≃^{(∆0∪∆, δ0∪δ)}_µ W̄′;
b. or, there exist W′, ∆, W̄′, δ, µ′, T′, T̄′, 𝕕′, o, Σ′ and σ′ such that
i. W =τ⇒_∆* W′, and W̄ =τ⇒_δ* W̄′, and
ii. FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ, ((W̄.T̄)(W̄.t)).F)), and
iii. for any t′ ∈ dom(T′), we have
A. W′ =o⇒_emp (T′, t′, 𝕕′, Σ′), and W̄′ =o⇒_emp (T̄′, t′, 𝕕′, σ′), and o ≠ τ, and
B. (T′, t′, 𝕕′, Σ′) ≃^{(emp,emp)}_{µ′} (T̄′, t′, 𝕕′, σ′);
c. or, there exist W′, ∆, W̄′ and δ such that
i. W =τ⇒_∆* W′, and W′ =τ⇒_emp done, and W̄ =τ⇒_δ* W̄′, and W̄′ =τ⇒_emp done, and
ii. FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ, ((W̄.T̄)(W̄.t)).F)).

With index. We also define W ≃^{(i,(∆0,δ0))}_µ W̄ as the largest relation such that whenever W ≃^{(i,(∆0,δ0))}_µ W̄, then the following are true (below, W̄ =τ⇒_δ^n W̄′ denotes n silent steps with accumulated footprint δ):

1. W.t = W̄.t, and W.𝕕 = W̄.𝕕, and wf(µ, W.Σ, W̄.σ);
2. one of the following holds:
a. there exist j, W′, ∆, W̄′ and δ such that
i. W =τ⇒_∆+ W′, and W̄ =τ⇒_δ^{i+1} W̄′, and
ii. FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ, ((W̄.T̄)(W̄.t)).F)), and
iii. W′ ≃^{(j,(∆0∪∆, δ0∪δ))}_µ W̄′;
b. or, there exist j, W′, ∆, W̄′, δ, µ′, T′, T̄′, 𝕕′, o, Σ′ and σ′ such that
i. W =τ⇒_∆* W′, and W̄ =τ⇒_δ^i W̄′, and
ii. FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ, ((W̄.T̄)(W̄.t)).F)), and
iii. for any t′ ∈ dom(T′), we have
A. W′ =o⇒_emp (T′, t′, 𝕕′, Σ′), and W̄′ =o⇒_emp (T̄′, t′, 𝕕′, σ′), and o ≠ τ, and
B. (T′, t′, 𝕕′, Σ′) ≃^{(j,(emp,emp))}_{µ′} (T̄′, t′, 𝕕′, σ′);
c. or, there exist W′, ∆, W̄′ and δ such that
i. W =τ⇒_∆* W′, and W′ =τ⇒_emp done, and W̄ =τ⇒_δ^i W̄′, and W̄′ =τ⇒_emp done, and
ii. FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ, ((W̄.T̄)(W̄.t)).F)).
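Definition 65 introduces ≃ and ≃^i as the largest relations closed under the clauses above, i.e., as greatest fixed points. The following generic Coq sketch (hypothetical names; it shows a plain weak simulation between two labelled transition systems, not the footprint-annotated relation above) illustrates how such a "largest relation such that …" is rendered as a CoInductive definition.

    (* Hypothetical sketch: a "largest relation such that ..." as a CoInductive. *)
    Section Gfp.
      Variables SA SB L : Type.
      Variable stepA : SA -> L -> SA -> Prop.
      Variable stepB : SB -> L -> SB -> Prop.

      (* every step of the left system is matched by a step of the right one,
         and the successors are again related *)
      CoInductive sim : SA -> SB -> Prop :=
      | sim_step :
          forall a b,
            (forall l a', stepA a l a' ->
               exists b', stepB b l b' /\ sim a' b') ->
            sim a b.
    End Gfp.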

Lemma 66. If (W, ∆0) ≼^i_µ (W̄, δ0) and Safe(W), then W ≃^{(∆0,δ0)}_µ W̄.

Lemma 67. If W ≃^{(∆0,δ0)}_µ W̄, then there exists i such that W ≃^{(i,(∆0,δ0))}_µ W̄.

Lemma 68. If W ≃^{(i,(∆0,δ0))}_µ W̄ and ∀t. det((W̄.T̄)(t).tl), then (W̄, δ0) ≽^i_µ (W, ∆0).

Then, for Lemma 24, from let Γ in f1 | … | fm ≼_{ge,φ} let Π in f1 | … | fm, we know

∀i ∈ {1, …, m}. dom(sli.InitCore(γi)) = dom(tli.InitCore(πi)).

For any T, T̄, t, 𝕕, Σ, σ, µ, S and S̄, if (P, Σ) =load⇒ (T, t, 𝕕, Σ), (P̄, σ) =load⇒ (T̄, t, 𝕕, σ), ge ⊆ locs(Σ), dom(Σ) = S ⊆ dom(φ), cl(S, Σ) = S, locs(σ) = φ⟨⟨locs(Σ)⟩⟩, Inv(φ, Σ, σ) and µ = (S, φ{{S}}, φ|S), then from let Γ in f1 | … | fm ≼_{ge,φ} let Π in f1 | … | fm, we know there exists i ∈ index such that

((T, t, 𝕕, Σ), emp) ≼^i_µ ((T̄, t, 𝕕, σ), emp).

By Lemmas 66, 67 and 68, we know there exists j such that ((T̄, t, 𝕕, σ), emp) ≽^j_µ ((T, t, 𝕕, Σ), emp). Thus we are done.

Proof of Lemma 66. We prove the following strengthened result:

If W =τ⇒_{∆′0}* W0, W̄ =τ⇒_{δ′0}* W̄0 and (W0, ∆0 ∪ ∆′0) ≼^i_µ (W̄0, δ0 ∪ δ′0), then W ≃^{(∆0,δ0)}_µ W̄.

By co-induction. Then, by induction over i (using the well-founded ordering of index).

1. i = 0.
From Safe(W), we know ¬(W0 =τ⇒_emp abort), so there exist W′ and ∆′ such that

W0 =τ⇒_{∆′} W′,  or  ∃o ≠ τ. W0 =o⇒_emp W′,  or  W0 =τ⇒_emp done.

a. If W0 =τ⇒_{∆′} W′, from (W0, ∆0 ∪ ∆′0) ≼^i_µ (W̄0, δ0 ∪ δ′0) (since i = 0 rules out the stuttering case j < i), we know there exist W̄′, δ′ and j such that
i. W̄0 =τ⇒_{δ′}+ W̄′, and
ii. FPmatch(µ, (∆0 ∪ ∆′0 ∪ ∆′, W0.Σ), (δ0 ∪ δ′0 ∪ δ′, ((W̄.T̄)(W̄.t)).F)), and
iii. (W′, ∆0 ∪ ∆′0 ∪ ∆′) ≼^j_µ (W̄′, δ0 ∪ δ′0 ∪ δ′).
Thus we know

W =τ⇒_{∆′0∪∆′}+ W′,  and  W̄ =τ⇒_{δ′0∪δ′}+ W̄′.

From FPmatch(µ, (∆0 ∪ ∆′0 ∪ ∆′, W0.Σ), (δ0 ∪ δ′0 ∪ δ′, ((W̄.T̄)(W̄.t)).F)), by Lemma 57, we know

FPmatch(µ, (∆0 ∪ ∆′0 ∪ ∆′, W.Σ), (δ0 ∪ δ′0 ∪ δ′, ((W̄.T̄)(W̄.t)).F)).

Since W′ =τ⇒_emp* W′, W̄′ =τ⇒_emp* W̄′ and (W′, ∆0 ∪ ∆′0 ∪ ∆′) ≼^j_µ (W̄′, δ0 ∪ δ′0 ∪ δ′), by the co-induction hypothesis, we know

W′ ≃^{(∆0∪∆′0∪∆′, δ0∪δ′0∪δ′)}_µ W̄′.

Thus we are done.

b. If o ≠ τ and W0 =o⇒_emp W′, by the operational semantics, we know there exist T′, 𝕕′ and Σ′ such that for any t′ ∈ dom(T′), we have

W0 =o⇒_emp (T′, t′, 𝕕′, Σ′).

From (W0, ∆0 ∪ ∆′0) ≼^i_µ (W̄0, δ0 ∪ δ′0), we know there exist W̄′, δ′, µ′, T̄′, σ′ and j such that for any t′ ∈ dom(T′), we have
i. W̄0 =τ⇒_{δ′}* W̄′, and W̄′ =o⇒_emp (T̄′, t′, 𝕕′, σ′), and
ii. FPmatch(µ, (∆0 ∪ ∆′0, W0.Σ), (δ0 ∪ δ′0 ∪ δ′, ((W̄.T̄)(W̄.t)).F)), and
iii. ((T′, t′, 𝕕′, Σ′), emp) ≼^j_{µ′} ((T̄′, t′, 𝕕′, σ′), emp).
Since ((T′, t′, 𝕕′, Σ′), emp) ≼^j_{µ′} ((T̄′, t′, 𝕕′, σ′), emp), by the co-induction hypothesis, we know

(T′, t′, 𝕕′, Σ′) ≃^{(emp,emp)}_{µ′} (T̄′, t′, 𝕕′, σ′).

Thus we are done.

c. If W0 =τ⇒_emp done, from (W0, ∆0 ∪ ∆′0) ≼^i_µ (W̄0, δ0 ∪ δ′0), we know there exist W̄′ and δ′ such that
i. W̄0 =τ⇒_{δ′}* W̄′, and W̄′ =τ⇒_emp done, and
ii. FPmatch(µ, (∆0 ∪ ∆′0, W0.Σ), (δ0 ∪ δ′0 ∪ δ′, ((W̄.T̄)(W̄.t)).F)).
Thus we are done.

2. i > 0.
From Safe(W), we know ¬(W0 =τ⇒_emp abort), so there exist W′ and ∆′ such that

W0 =τ⇒_{∆′} W′,  or  ∃o ≠ τ. W0 =o⇒_emp W′,  or  W0 =τ⇒_emp done.

a. If W0 =τ⇒_{∆′} W′, from (W0, ∆0 ∪ ∆′0) ≼^i_µ (W̄0, δ0 ∪ δ′0), we know one of the following holds:
i. there exists j such that
A. j < i, and
B. (W′, ∆0 ∪ ∆′0 ∪ ∆′) ≼^j_µ (W̄0, δ0 ∪ δ′0);
ii. or, there exist W̄′, δ′ and j such that
A. W̄0 =τ⇒_{δ′}+ W̄′, and
B. FPmatch(µ, (∆0 ∪ ∆′0 ∪ ∆′, W0.Σ), (δ0 ∪ δ′0 ∪ δ′, ((W̄.T̄)(W̄.t)).F)), and
C. (W′, ∆0 ∪ ∆′0 ∪ ∆′) ≼^j_µ (W̄′, δ0 ∪ δ′0 ∪ δ′).
For case (i), we know

W =τ⇒_{∆′0∪∆′}* W′,  and  W̄ =τ⇒_{δ′0}* W̄0.

Since j < i, by the induction hypothesis, we know

W ≃^{(∆0,δ0)}_µ W̄.

For case (ii), the proof is similar to the above 1(a).
b. If ∃o ≠ τ. W0 =o⇒_emp W″, the proof is similar to the above 1(b).
c. If W0 =τ⇒_emp done, the proof is similar to the above 1(c).

Thus we are done. □
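The proof of Lemma 66 combines co-induction with "induction over i using the well-founded ordering of index". As a small illustration of the latter, assuming the index order is the usual order on natural numbers (the development's index type and order may differ), the corresponding strong-induction principle can be stated and proved in Coq as follows. This is only a sketch with a placeholder property P.

    (* Hypothetical sketch: strong induction over a nat-valued index. *)
    Require Import Lia.

    Lemma index_wf_ind :
      forall (P : nat -> Prop),
        (forall i, (forall j, j < i -> P j) -> P i) ->
        forall i, P i.
    Proof.
      intros P Hstep.
      assert (Hbound : forall n i, i < n -> P i).
      { induction n as [| n IH]; intros i Hlt.
        - lia.                               (* nothing lies below 0 *)
        - apply Hstep. intros j Hj. apply IH. lia. }
      intros i. apply (Hbound (S i)). lia.
    Qed.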

Proof of Lemma 67. We prove that W ≃^{(i,(∆0,δ0))}_µ W̄ holds if W.t = W̄.t, W.𝕕 = W̄.𝕕, wf(µ, W.Σ, W̄.σ) and one of the following holds:

1. there exist W′, ∆, W̄′ and δ such that
a. W =τ⇒_∆+ W′, and W̄ =τ⇒_δ^{i+1} W̄′, and
b. FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ, ((W̄.T̄)(W̄.t)).F)), and
c. W′ ≃^{(∆0∪∆, δ0∪δ)}_µ W̄′;
2. or, there exist W′, ∆, W̄′, δ, µ′, T′, T̄′, 𝕕′, o, Σ′ and σ′ such that
a. W =τ⇒_∆* W′, and W̄ =τ⇒_δ^i W̄′, and
b. FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ, ((W̄.T̄)(W̄.t)).F)), and
c. for any t′ ∈ dom(T′), we have
i. W′ =o⇒_emp (T′, t′, 𝕕′, Σ′), and W̄′ =o⇒_emp (T̄′, t′, 𝕕′, σ′), and o ≠ τ, and
ii. (T′, t′, 𝕕′, Σ′) ≃^{(emp,emp)}_{µ′} (T̄′, t′, 𝕕′, σ′);
3. or, there exist W′, ∆, W̄′ and δ such that
a. W =τ⇒_∆* W′, and W′ =τ⇒_emp done, and W̄ =τ⇒_δ^i W̄′, and W̄′ =τ⇒_emp done, and
b. FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ, ((W̄.T̄)(W̄.t)).F)).

We prove the above by co-induction. Thus we are done. □


Proof of Lemma 68. By co-induction. We need to prove the following:

1. ∀W̄′, δ. if W̄ =τ⇒_δ W̄′, then one of the following holds:
a. there exists j such that
i. j < i, and
ii. (W̄′, δ0 ∪ δ) ≽^j_µ (W, ∆0);
b. there exist W′, ∆ and j such that
i. W =τ⇒_∆+ W′, and
ii. FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ, ((W̄.T̄)(W̄.t)).F)), and
iii. (W̄′, δ0 ∪ δ) ≽^j_µ (W′, ∆0 ∪ ∆).

Proof. By inversion of W ≃^{(i,(∆0,δ0))}_µ W̄, we know one of the following holds:

(A) there exist W1, ∆, W̄1, W̄2, δ1, δ2 and i′ such that
(1) W =τ⇒_∆+ W1, and W̄ =τ⇒_{δ1} W̄1, and W̄1 =τ⇒_{δ2}^i W̄2, and
(2) FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ1 ∪ δ2, ((W̄.T̄)(W̄.t)).F)), and
(3) W1 ≃^{(i′,(∆0∪∆, δ0∪δ1∪δ2))}_µ W̄2.

First, from (3), by the co-induction hypothesis, we know

(W̄2, δ0 ∪ δ1 ∪ δ2) ≽^{i′}_µ (W1, ∆0 ∪ ∆).    (E.3)

Then we prove:

∀W̄0, n0, n1, δ′, δ″. (W̄1 =τ⇒_{δ′}^{n0} W̄0) ∧ (W̄0 =τ⇒_{δ″}^{n1} W̄2) ⟹ (W̄0, δ0 ∪ δ1 ∪ δ′) ≽^{i′+n1}_µ (W1, ∆0 ∪ ∆).    (E.4)

By induction over n1.
i. n1 = 0. Thus W̄0 = W̄2. Since ∀t. det((W̄.T̄)(t).tl), we know δ′ = δ2. By (E.3), we are done.
ii. n1 = m + 1. Thus there exist W̄′0, δ′0 and δ′1 such that δ″ = δ′0 ∪ δ′1, W̄0 =τ⇒_{δ′0} W̄′0 and W̄′0 =τ⇒_{δ′1}^m W̄2. By the induction hypothesis, we know

(W̄′0, δ0 ∪ δ1 ∪ δ′ ∪ δ′0) ≽^{i′+m}_µ (W1, ∆0 ∪ ∆).

To prove (W̄0, δ0 ∪ δ1 ∪ δ′) ≽^{i′+m+1}_µ (W1, ∆0 ∪ ∆), we proceed by co-induction. Since ∀t. det((W̄.T̄)(t).tl), we can finish the proof.

Thus we are done with (E.4). Thus we know (E.5) holds:

(W̄1, δ0 ∪ δ1) ≽^{i′+i}_µ (W1, ∆0 ∪ ∆).    (E.5)

Finally, since ∀t. det((W̄.T̄)(t).tl), we know W̄1 = W̄′ and δ1 = δ. From (2), we know

FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ1, ((W̄.T̄)(W̄.t)).F)).

From (1) and (E.5), we are done.

(B) there exist i′, W1, ∆, W̄1, δ1, µ′, T′, T̄′, 𝕕′, o, Σ′ and σ′ such that
(1) W =τ⇒_∆* W1, and W̄ =τ⇒_{δ1}^i W̄1, and
(2) FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ1, ((W̄.T̄)(W̄.t)).F)), and
(3) for any t′ ∈ dom(T′), we have
A. W1 =o⇒_emp (T′, t′, 𝕕′, Σ′), and W̄1 =o⇒_emp (T̄′, t′, 𝕕′, σ′), and o ≠ τ, and
B. (T′, t′, 𝕕′, Σ′) ≃^{(i′,(emp,emp))}_{µ′} (T̄′, t′, 𝕕′, σ′).

First, from (3), by the co-induction hypothesis, we know

((T̄′, t′, 𝕕′, σ′), emp) ≽^{i′}_{µ′} ((T′, t′, 𝕕′, Σ′), emp).    (E.6)

Since ∀t. det((W̄.T̄)(t).tl), we know

(W̄1, δ0 ∪ δ1) ≽^0_µ (W1, ∆0 ∪ ∆).    (E.7)

Also i > 0, and there exists δ′1 such that δ1 = δ ∪ δ′1,

W̄ =τ⇒_δ W̄′  and  W̄′ =τ⇒_{δ′1}^{i−1} W̄1.

Similarly to the above case (A), we can prove (E.4). Then we know

(W̄′, δ0 ∪ δ) ≽^{i−1}_µ (W1, ∆0 ∪ ∆).    (E.8)

Since ∀t. det((W̄.T̄)(t).tl), from (1) and (2), we are done.

(C) there exist W1, ∆, W̄1 and δ1 such that
(1) W =τ⇒_∆* W1, and W1 =τ⇒_emp done, and W̄ =τ⇒_{δ1}^i W̄1, and W̄1 =τ⇒_emp done, and
(2) FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ1, ((W̄.T̄)(W̄.t)).F)).

Since ∀t. det((W̄.T̄)(t).tl), we know

(W̄1, δ0 ∪ δ1) ≽^0_µ (W1, ∆0 ∪ ∆).    (E.9)

Also i > 0, and there exists δ′1 such that δ1 = δ ∪ δ′1,

W̄ =τ⇒_δ W̄′  and  W̄′ =τ⇒_{δ′1}^{i−1} W̄1.

Similarly to the above case, we can prove (E.4). Then we know

(W̄′, δ0 ∪ δ) ≽^{i−1}_µ (W1, ∆0 ∪ ∆).    (E.10)

Since ∀t. det((W̄.T̄)(t).tl), from (1) and (2), we are done.

Thus we are done. □

2. ∀T̄′, 𝕕′, σ′, o. if o ≠ τ and ∃t′. W̄ =o⇒_emp (T̄′, t′, 𝕕′, σ′), then there exist W′, ∆, µ′, T′, Σ′ and j such that for any t′ ∈ dom(T̄′), we have
a. W =τ⇒_∆* W′, and W′ =o⇒_emp (T′, t′, 𝕕′, Σ′), and
b. FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0, ((W̄.T̄)(W̄.t)).F)), and
c. ((T̄′, t′, 𝕕′, σ′), emp) ≽^j_{µ′} ((T′, t′, 𝕕′, Σ′), emp).

Proof. By inversion of W ≃^{(i,(∆0,δ0))}_µ W̄, since ∀t. det((W̄.T̄)(t).tl), we know there exist i′, W1, ∆, µ′, T′, T̄′, 𝕕′, o, Σ′ and σ′ such that
(1) W =τ⇒_∆* W1, and
(2) FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0, ((W̄.T̄)(W̄.t)).F)), and
(3) for any t′ ∈ dom(T′), we have
i. W1 =o⇒_emp (T′, t′, 𝕕′, Σ′), and W̄ =o⇒_emp (T̄′, t′, 𝕕′, σ′), and o ≠ τ, and
ii. (T′, t′, 𝕕′, Σ′) ≃^{(i′,(emp,emp))}_{µ′} (T̄′, t′, 𝕕′, σ′).

From (3), by the co-induction hypothesis, we know

((T̄′, t′, 𝕕′, σ′), emp) ≽^{i′}_{µ′} ((T′, t′, 𝕕′, Σ′), emp).

Thus we are done. □

3. if W̄ =τ⇒_emp done, then there exist W′ and ∆ such that
a. W =τ⇒_∆* W′, and W′ =τ⇒_emp done, and
b. FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0, ((W̄.T̄)(W̄.t)).F)).

Proof. By inversion of W ≃^{(i,(∆0,δ0))}_µ W̄, since ∀t. det((W̄.T̄)(t).tl), we know there exist W1 and ∆ such that
(1) W =τ⇒_∆* W1, and W1 =τ⇒_emp done, and W̄ =τ⇒_emp done, and
(2) FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0, ((W̄.T̄)(W̄.t)).F)).
Thus we are done. □

Thus we have finished the proof. □

E.5 Proof of NPDRF Preservation (Lemma 51)

We only need to prove: for any P, P̄, φ, Σ and σ, if P̄ ≽φ P, initM(φ, GE(P.Γ), Σ, σ), and (P̄, σ) ⟹ Race, then (P, Σ) ⟹ Race.

From (P̄, σ) ⟹ Race, we have two cases:

1. there exist W̄, t1, t2, δ1, d1, δ2 and d2 such that (P̄, σ) =load⇒ W̄, t1 ≠ t2, NPpred(W̄, t1, (δ1, d1)), NPpred(W̄, t2, (δ2, d2)) and ¬((δ1, d1) ⌣ (δ2, d2)).

From NPpred(W̄, t1, (δ1, d1)) and NPpred(W̄, t2, (δ2, d2)), we know there exist T̄, 𝕕, σ, tl1, π1, F̄1, κ1, κ′1, σ′1, tl2, π2, F̄2, κ2, κ′2 and σ′2 such that

W̄ = (T̄, _, 𝕕, σ),  T̄(t1) = (tl1, (π1, F̄1), κ1),  (π1, F̄1) ⊢ (κ1, σ) —τ→_{δ1}* (κ′1, σ′1),  𝕕(t1) = d1,
T̄(t2) = (tl2, (π2, F̄2), κ2),  (π2, F̄2) ⊢ (κ2, σ) —τ→_{δ2}* (κ′2, σ′2),  𝕕(t2) = d2.

By the operational semantics, we know

(P̄, σ) =load⇒ (T̄, t1, 𝕕, σ),  (P̄, σ) =load⇒ (T̄, t2, 𝕕, σ),  d1 = d2 = 0,
(T̄, t1, 𝕕, σ) =τ⇒_{δ1}* (T̄{t1 ↦ (tl1, (π1, F̄1), κ′1)}, t1, 𝕕, σ′1),
(T̄, t2, 𝕕, σ) =τ⇒_{δ2}* (T̄{t2 ↦ (tl2, (π2, F̄2), κ′2)}, t2, 𝕕, σ′2).

From P̄ ≽φ P, we know there exist f1, …, fm, Γ = {(sl1, γ1), …, (slm, γm)} and Π = {(tl1, π1), …, (tlm, πm)} such that P = let Γ in f1 | … | fm, P̄ = let Π in f1 | … | fm and ∀i ∈ {1, …, m}. dom(sli.InitCore(γi)) = dom(tli.InitCore(πi)). Thus we know there exists T such that

(P, Σ) =load⇒ (T, t1, 𝕕, Σ),  (P, Σ) =load⇒ (T, t2, 𝕕, Σ).

Let S = dom(Σ), S̄ = φ{{S}} and µ = (S, S̄, φ|S). We know there exist i1, i2 ∈ index such that

((T̄, t1, 𝕕, σ), emp) ≽^{i1}_µ ((T, t1, 𝕕, Σ), emp),  ((T̄, t2, 𝕕, σ), emp) ≽^{i2}_µ ((T, t2, 𝕕, Σ), emp).

By Lemma 69, we know there exist W̄′1, δ′1, W′1, ∆′1, W̄′2, δ′2, W′2 and ∆′2 such that

(T̄, t1, 𝕕, σ) =τ⇒_{δ′1}* W̄′1,  (T, t1, 𝕕, Σ) =τ⇒_{∆′1}* W′1,  δ1 ⊆ δ′1,  FPmatch(µ, (∆′1, Σ), (δ′1, F̄1));
(T̄, t2, 𝕕, σ) =τ⇒_{δ′2}* W̄′2,  (T, t2, 𝕕, Σ) =τ⇒_{∆′2}* W′2,  δ2 ⊆ δ′2,  FPmatch(µ, (∆′2, Σ), (δ′2, F̄2)).

Let W = (T, t1, 𝕕, Σ). Then

NPpred(W, t1, (∆′1, d1))  and  NPpred(W, t2, (∆′2, d2)).

Since ¬((δ1, d1) ⌣ (δ2, d2)), we know

¬(δ1 ⌣ δ2)  and  ¬(d1 = d2 = 1).

Since δ1 ⊆ δ′1 and δ2 ⊆ δ′2, we know

¬(δ′1 ⌣ δ′2).

Since (P̄, σ) =load⇒ W̄, we know

F̄1 ∩ F̄2 = ∅.

Since wf(µ, Σ, σ), by Lemma 70, we know

¬(∆′1 ⌣ ∆′2).

Thus ¬((∆′1, d1) ⌣ (∆′2, d2)) (this check on footprint–bit pairs is also sketched after this proof). Thus we have

(P, Σ) ⟹ Race.

2. there exist W̄ and n such that (P̄, σ) =load⇒ W̄ and W̄ ⟹^n Race.

Suppose W̄ = (T̄, t, 𝕕, σ). From P̄ ≽φ P, we know there exist f1, …, fm, Γ = {(sl1, γ1), …, (slm, γm)} and Π = {(tl1, π1), …, (tlm, πm)} such that P = let Γ in f1 | … | fm, P̄ = let Π in f1 | … | fm and ∀i ∈ {1, …, m}. dom(sli.InitCore(γi)) = dom(tli.InitCore(πi)). Thus we know there exists T such that

(P, Σ) =load⇒ (T, t, 𝕕, Σ).

Let S = dom(Σ), S̄ = dom(σ), µ = (S, S̄, φ|S) and ∆0 = δ0 = emp. We know there exists i ∈ index such that

((T̄, t, 𝕕, σ), δ0) ≽^i_µ ((T, t, 𝕕, Σ), ∆0).

We only need to prove (T, t, 𝕕, Σ) ⟹+ Race. By induction over n.

a. n = 1.
From (T̄, t, 𝕕, σ) ⟹ Race, we know there exist W̄′, o, t1, t2, δ1, δ2, d1 and d2 such that

(T̄, t, 𝕕, σ) =o⇒_emp W̄′,  o = e ∨ o = sw,  t1 ≠ t2,
NPpred(W̄′, t1, (δ1, d1)),  NPpred(W̄′, t2, (δ2, d2)),  ¬((δ1, d1) ⌣ (δ2, d2)).

Suppose W̄′ = (T̄′, t′, 𝕕′, σ). From (T̄, t, 𝕕, σ) =o⇒_emp W̄′, o = e ∨ o = sw and ((T̄, t, 𝕕, σ), δ0) ≽^i_µ ((T, t, 𝕕, Σ), ∆0), we know there exist ∆″, W″, µ′, T′, Σ′ and j such that

(T, t, 𝕕, Σ) =τ⇒_{∆″}* W″,  W″ =o⇒_emp (T′, t1, 𝕕′, Σ′),  ((T̄′, t1, 𝕕′, σ), emp) ≽^j_{µ′} ((T′, t1, 𝕕′, Σ′), emp),
W″ =o⇒_emp (T′, t2, 𝕕′, Σ′),  ((T̄′, t2, 𝕕′, σ), emp) ≽^j_{µ′} ((T′, t2, 𝕕′, Σ′), emp).

From NPpred(W̄′, t1, (δ1, d1)) and NPpred(W̄′, t2, (δ2, d2)), we know there exist tl1, π1, F̄1, κ′1, κ″1, σ″1, tl2, π2, F̄2, κ′2, κ″2 and σ″2 such that

T̄′(t1) = (tl1, (π1, F̄1), κ′1),  (π1, F̄1) ⊢ (κ′1, σ) —τ→_{δ1}* (κ″1, σ″1),  𝕕′(t1) = d1,
T̄′(t2) = (tl2, (π2, F̄2), κ′2),  (π2, F̄2) ⊢ (κ′2, σ) —τ→_{δ2}* (κ″2, σ″2),  𝕕′(t2) = d2,  F̄1 ∩ F̄2 = ∅.

By the operational semantics, we know

(T̄′, t1, 𝕕′, σ) =τ⇒_{δ1}* (T̄′{t1 ↦ (tl1, (π1, F̄1), κ″1)}, t1, 𝕕′, σ″1),
(T̄′, t2, 𝕕′, σ) =τ⇒_{δ2}* (T̄′{t2 ↦ (tl2, (π2, F̄2), κ″2)}, t2, 𝕕′, σ″2).

By Lemma 69, we know there exist W̄′1, δ′1, W′1, ∆′1, W̄′2, δ′2, W′2 and ∆′2 such that

(T̄′, t1, 𝕕′, σ) =τ⇒_{δ′1}* W̄′1,  (T′, t1, 𝕕′, Σ′) =τ⇒_{∆′1}* W′1,  δ1 ⊆ δ′1,  FPmatch(µ′, (∆′1, Σ′), (δ′1, F̄1));
(T̄′, t2, 𝕕′, σ) =τ⇒_{δ′2}* W̄′2,  (T′, t2, 𝕕′, Σ′) =τ⇒_{∆′2}* W′2,  δ2 ⊆ δ′2,  FPmatch(µ′, (∆′2, Σ′), (δ′2, F̄2)).

Let W′ = (T′, t1, 𝕕′, Σ′). Then

NPpred(W′, t1, (∆′1, d1))  and  NPpred(W′, t2, (∆′2, d2)).

Since ¬((δ1, d1) ⌣ (δ2, d2)), we know

¬(δ1 ⌣ δ2)  and  ¬(d1 = d2 = 1).

Since δ1 ⊆ δ′1 and δ2 ⊆ δ′2, we know

¬(δ′1 ⌣ δ′2).

Since F̄1 ∩ F̄2 = ∅ and wf(µ′, Σ′, σ), by Lemma 70, we know

¬(∆′1 ⌣ ∆′2).

Thus ¬((∆′1, d1) ⌣ (∆′2, d2)). Thus we have

(T, t, 𝕕, Σ) ⟹+ Race.

b. n = n′ + 1 (with n′ ≥ 1). Thus we know there exist W̄′, o and δ such that

(T̄, t, 𝕕, σ) =o⇒_δ W̄′  and  W̄′ ⟹^{n′} Race.

Since ((T̄, t, 𝕕, σ), δ0) ≽^i_µ ((T, t, 𝕕, Σ), ∆0), we know there exist ∆, W′, µ′, j, ∆′0 and δ′0 such that

(T, t, 𝕕, Σ) =o⇒_∆* W′,  (W̄′, δ′0) ≽^j_{µ′} (W′, ∆′0).

By the induction hypothesis, we know

W′ ⟹+ Race.

Thus we have

(T, t, 𝕕, Σ) ⟹+ Race.

Thus we are done.
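The case analysis above repeatedly turns ¬((δ1, d1) ⌣ (δ2, d2)) into ¬(δ1 ⌣ δ2) and ¬(d1 = d2 = 1). The following Coq sketch (hypothetical names; the footprint type and its conflict-freedom relation are kept abstract, and the bit d is encoded as a bool with d = 1 read as true) spells out this check on footprint–bit pairs.

    (* Hypothetical sketch: conflict-freedom of (footprint, bit) predictions. *)
    Section RaceCheck.
      Variable footprint : Type.
      Variable fp_smile : footprint -> footprint -> Prop.   (* δ1 ⌣ δ2 *)

      (* (δ1,d1) ⌣ (δ2,d2): footprints conflict-free, or both bits are set *)
      Definition pred_smile (p1 p2 : footprint * bool) : Prop :=
        fp_smile (fst p1) (fst p2) \/ (snd p1 = true /\ snd p2 = true).

      Lemma pred_smile_neg :
        forall d1 f1 d2 f2,
          ~ pred_smile (f1, d1) (f2, d2) ->
          ~ fp_smile f1 f2 /\ ~ (d1 = true /\ d2 = true).
      Proof.
        intros d1 f1 d2 f2 Hn; split; intro H; apply Hn; [left | right]; simpl; exact H.
      Qed.
    End RaceCheck.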

Lemma 69. If (W̄, δ0) ≽^i_µ (W, ∆0), Safe(W) and W̄ =τ⇒_{δ1}^n W̄1, then there exist W̄′, δ, W′ and ∆ such that
1. W̄ =τ⇒_δ* W̄′, and W =τ⇒_∆* W′, and δ1 ⊆ δ, and
2. FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ, ((W̄.T̄)(W̄.t)).F)).


Proof. By induction over n.

1. n = 0. Thus δ1 = emp. By induction over i (using the well-founded order).

a. ¬∃j. j < i.
Since ¬(W̄ =τ⇒_emp abort), we know there exist W̄″ and δ″ such that

W̄ =τ⇒_{δ″} W̄″,  or  ∃o ≠ τ. W̄ =o⇒_emp W̄″,  or  W̄ =τ⇒_emp done.

i. If W̄ =τ⇒_{δ″} W̄″, we let W̄′ = W̄″ and δ = δ″. From (W̄, δ0) ≽^i_µ (W, ∆0) and ¬∃j. j < i, we know there exist W′ and ∆ such that

W =τ⇒_∆* W′,  and  FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ, ((W̄.T̄)(W̄.t)).F)).

ii. If ∃o ≠ τ. W̄ =o⇒_emp W̄″, we let W̄′ = W̄. From (W̄, δ0) ≽^i_µ (W, ∆0), we know there exist W′ and ∆ such that

W =τ⇒_∆* W′,  and  FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0, ((W̄.T̄)(W̄.t)).F)).

iii. If W̄ =τ⇒_emp done, we let W̄′ = W̄. From (W̄, δ0) ≽^i_µ (W, ∆0), we know there exist W′ and ∆ such that

W =τ⇒_∆* W′,  and  FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0, ((W̄.T̄)(W̄.t)).F)).

b. ∃j. j < i.
Since ¬(W̄ =τ⇒_emp abort), we know there exist W̄″ and δ″ such that

W̄ =τ⇒_{δ″} W̄″,  or  ∃o ≠ τ. W̄ =o⇒_emp W̄″,  or  W̄ =τ⇒_emp done.

We consider only the case of W̄ =τ⇒_{δ″} W̄″ (the other two cases are similar to the above proofs). From (W̄, δ0) ≽^i_µ (W, ∆0), we know one of the following holds:
i. there exist W″, ∆″ and j such that

W =τ⇒_{∆″}* W″,  j < i,  and  (W̄″, δ0 ∪ δ″) ≽^j_µ (W″, ∆0 ∪ ∆″).

By the induction hypothesis, we know there exist W̄′, δ, W′ and ∆ such that

W̄″ =τ⇒_δ* W̄′  and  W″ =τ⇒_∆* W′,  and  FPmatch(µ, (∆0 ∪ ∆″ ∪ ∆, W″.Σ), (δ0 ∪ δ″ ∪ δ, ((W̄″.T̄)(W̄″.t)).F)).

By the operational semantics, we know ((W̄″.T̄)(W̄″.t)).F = ((W̄.T̄)(W̄.t)).F. By Lemma 57, we know

W̄ =τ⇒_{δ″∪δ}* W̄′  and  W =τ⇒_{∆″∪∆}* W′,  and  FPmatch(µ, (∆0 ∪ ∆″ ∪ ∆, W.Σ), (δ0 ∪ δ″ ∪ δ, ((W̄.T̄)(W̄.t)).F)).

ii. there exist W′ and ∆ such that

W =τ⇒_∆* W′,  and  FPmatch(µ, (∆0 ∪ ∆, W.Σ), (δ0 ∪ δ″, ((W̄.T̄)(W̄.t)).F)).

2. n = m + 1. Thus there exist W̄2, δ2 and δ′2 such that

W̄ =τ⇒_{δ2} W̄2,  W̄2 =τ⇒_{δ′2}^m W̄1,  δ1 = δ2 ∪ δ′2.

From (W̄, δ0) ≽^i_µ (W, ∆0), we know one of the following holds:
a. there exist W2, ∆2 and j such that

W =τ⇒_{∆2}* W2,  j < i,  and  (W̄2, δ0 ∪ δ2) ≽^j_µ (W2, ∆0 ∪ ∆2).

By the induction hypothesis, we know there exist W̄′, δ, W′ and ∆ such that

W̄2 =τ⇒_δ* W̄′  and  W2 =τ⇒_∆* W′,  δ′2 ⊆ δ,  and  FPmatch(µ, (∆0 ∪ ∆2 ∪ ∆, W2.Σ), (δ0 ∪ δ2 ∪ δ, ((W̄2.T̄)(W̄2.t)).F)).

By the operational semantics, we know ((W̄2.T̄)(W̄2.t)).F = ((W̄.T̄)(W̄.t)).F. By Lemma 57, we know

W̄ =τ⇒_{δ2∪δ}* W̄′  and  W =τ⇒_{∆2∪∆}* W′,  δ1 ⊆ δ2 ∪ δ,  and  FPmatch(µ, (∆0 ∪ ∆2 ∪ ∆, W.Σ), (δ0 ∪ δ2 ∪ δ, ((W̄.T̄)(W̄.t)).F)).

b. there exist W2, ∆2 and j such that

W =τ⇒_{∆2}* W2,  FPmatch(µ, (∆0 ∪ ∆2, W.Σ), (δ0 ∪ δ2, ((W̄.T̄)(W̄.t)).F))  and  (W̄2, emp) ≽^j_µ (W2, emp).

By the induction hypothesis, we know there exist W̄′, δ, W′ and ∆ such that

W̄2 =τ⇒_δ* W̄′  and  W2 =τ⇒_∆* W′,  δ′2 ⊆ δ,  and  FPmatch(µ, (∆, W2.Σ), (δ, ((W̄2.T̄)(W̄2.t)).F)).

By the operational semantics, we know ((W̄2.T̄)(W̄2.t)).F = ((W̄.T̄)(W̄.t)).F. By Lemma 57, we know

W̄ =τ⇒_{δ2∪δ}* W̄′  and  W =τ⇒_{∆2∪∆}* W′,  δ1 ⊆ δ2 ∪ δ,  and  FPmatch(µ, (∆, W.Σ), (δ, ((W̄.T̄)(W̄.t)).F)).

From FPmatch(µ, (∆0 ∪ ∆2, W.Σ), (δ0 ∪ δ2, ((W̄.T̄)(W̄.t)).F)), we know

FPmatch(µ, (∆0 ∪ ∆2 ∪ ∆, W.Σ), (δ0 ∪ δ2 ∪ δ, ((W̄.T̄)(W̄.t)).F)).

Thus we are done. □

Lemma 70. If ∆1 ⌣ ∆2, FPmatch(µ, (∆1, Σ), (δ1, F̄1)), FPmatch(µ, (∆2, Σ), (δ2, F̄2)), injective(µ.f, Σ) and F̄1 ∩ F̄2 = ∅, then δ1 ⌣ δ2.

Proof. From FPmatch(µ, (∆1, Σ), (δ1, F̄1)) and FPmatch(µ, (∆2, Σ), (δ2, F̄2)), we know

δ1.rs ∪ δ1.ws ⊆ ⌊F̄1⌋ ∪ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩,
δ1.rs ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩ ⊆ µ.f⟨⟨locs(Σ) ∩ ∆1.rs⟩⟩,  δ1.ws ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩ ⊆ µ.f⟨⟨locs(Σ) ∩ ∆1.ws⟩⟩;
δ2.rs ∪ δ2.ws ⊆ ⌊F̄2⌋ ∪ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩,
δ2.rs ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩ ⊆ µ.f⟨⟨locs(Σ) ∩ ∆2.rs⟩⟩,  δ2.ws ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩ ⊆ µ.f⟨⟨locs(Σ) ∩ ∆2.ws⟩⟩.

Thus we know

δ1.rs ⊆ (⌊F̄1⌋ − µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) ∪ (δ1.rs ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩)
      ⊆ (⌊F̄1⌋ − µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) ∪ (µ.f⟨⟨locs(Σ) ∩ ∆1.rs⟩⟩ ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩),
δ1.ws ⊆ (⌊F̄1⌋ − µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) ∪ (δ1.ws ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩)
      ⊆ (⌊F̄1⌋ − µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) ∪ (µ.f⟨⟨locs(Σ) ∩ ∆1.ws⟩⟩ ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩),
δ2.rs ⊆ (⌊F̄2⌋ − µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) ∪ (δ2.rs ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩)
      ⊆ (⌊F̄2⌋ − µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) ∪ (µ.f⟨⟨locs(Σ) ∩ ∆2.rs⟩⟩ ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩),
δ2.ws ⊆ (⌊F̄2⌋ − µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) ∪ (δ2.ws ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩)
      ⊆ (⌊F̄2⌋ − µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) ∪ (µ.f⟨⟨locs(Σ) ∩ ∆2.ws⟩⟩ ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩).

We have

(⌊F̄1⌋ − µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) ∩ (µ.f⟨⟨locs(Σ) ∩ ∆2.rs⟩⟩ ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) = ∅,
(⌊F̄2⌋ − µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) ∩ (µ.f⟨⟨locs(Σ) ∩ ∆1.rs⟩⟩ ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) = ∅,
(⌊F̄1⌋ − µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) ∩ (µ.f⟨⟨locs(Σ) ∩ ∆2.ws⟩⟩ ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) = ∅,
(⌊F̄2⌋ − µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) ∩ (µ.f⟨⟨locs(Σ) ∩ ∆1.ws⟩⟩ ∩ µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) = ∅.

Since F̄1 ∩ F̄2 = ∅, we know

(⌊F̄1⌋ − µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) ∩ (⌊F̄2⌋ − µ.f⟨⟨locs(Σ) ∩ ⌊µ.S⌋⟩⟩) = ∅.

Since ∆1 ⌣ ∆2, we know

∆1.ws ∩ (∆2.rs ∪ ∆2.ws) = ∅  and  ∆1.rs ∩ ∆2.ws = ∅.

Since injective(µ.f, Σ), we know

µ.f⟨⟨locs(Σ) ∩ ∆1.ws⟩⟩ ∩ µ.f⟨⟨locs(Σ) ∩ ∆2.rs⟩⟩ = ∅,
µ.f⟨⟨locs(Σ) ∩ ∆1.ws⟩⟩ ∩ µ.f⟨⟨locs(Σ) ∩ ∆2.ws⟩⟩ = ∅,
µ.f⟨⟨locs(Σ) ∩ ∆1.rs⟩⟩ ∩ µ.f⟨⟨locs(Σ) ∩ ∆2.ws⟩⟩ = ∅.

Thus we know

δ1.ws ∩ (δ2.rs ∪ δ2.ws) = ∅  and  δ1.rs ∩ δ2.ws = ∅.

Thus δ1 ⌣ δ2. □
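Lemma 70 is a statement purely about read/write footprints and the conflict-freedom relation ⌣. The following Coq sketch (hypothetical names; footprints over an abstract location type) records the shape of footprints, the relation ⌣, and the footprint union used for the accumulated footprints ∆0 ∪ ∆ throughout this appendix.

    (* Hypothetical sketch: footprints, conflict-freedom, and footprint union. *)
    Section Footprint.
      Variable loc : Type.

      Record footprint := { rs : loc -> Prop; ws : loc -> Prop }.

      Definition disjoint (A B : loc -> Prop) : Prop :=
        forall l, A l -> B l -> False.

      (* d1 ⌣ d2: neither write set touches the other's reads or writes *)
      Definition conflict_free (d1 d2 : footprint) : Prop :=
        disjoint (ws d1) (rs d2) /\ disjoint (ws d1) (ws d2) /\
        disjoint (rs d1) (ws d2).

      (* pointwise union of footprints, used for ∆0 ∪ ∆ and δ0 ∪ δ above *)
      Definition fp_union (d1 d2 : footprint) : footprint :=
        {| rs := fun l => rs d1 l \/ rs d2 l;
           ws := fun l => ws d1 l \/ ws d2 l |}.
    End Footprint.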
