Automating Construction of Provably Correct Software Viktor Kuncak EPFL School of Computer and...

Post on 27-Dec-2015

218 views 0 download

Transcript of Automating Construction of Provably Correct Software Viktor Kuncak EPFL School of Computer and...

Automating Construction ofProvably Correct Software

Viktor KuncakEPFL School of Computer and Communication Sciences

Laboratory for Automated Reasoning and Analysis

http://lara.epfl.ch

Your Wish is my Command

wish

11011001 0101110111011001 0101110111011001 0101110111011001 01011101

requirementformalization

conventionalcompilation

implementation (program): p

specification (constraint): C

How to automatically transformspecifications into implementations?

This Talk

Command

Given a list of numbers, make this list sorted

8900

6000

24140

2900

2900

6000

8900

24140

Example Wish: Sorting

wish

8900 > 6000 2900 < 6000

6000 < 8900

8900 < 24140

input output

Given a list of numbers, make this list sorted

8900

6000

24140

2900

2900

6000

8900

24140

Sorting Specification as a Program

wish

8900 > 6000 2900 < 6000

6000 < 8900

8900 < 24140

def sort_spec(input : List, output : List) : Boolean = content(output)==content(input) /\ isSorted(output)

input output

Specification (for us) is a program that checks, for a given input, whether the given output is acceptable

8900

6000

24140

2900

2900

6000

8900

24140

Specification vs Implementationdef C(i : List, o : List) : Boolean = content(o)==content(i) /\ isSorted(o)

input output

implementation

specification

true / false

def p(i : List) : List = sort i using a sorting algorithm and return the result

U p C

more behaviors

fewer behaviors

constraint on the output

function that computes the output

Synthesizing Sort in Leon System

Ivan Kuraj Philippe SuterEtienne Kneuss

OOPSLA 2013: Synthesis Modulo Recursive Functions

http://leon.epfl.ch

Example Results

Techniques used:– Leon’s verification capabilities– synthesis for theory of trees– recursion schemas– case splitting– symbolic exploration of the

space of programs– synthesis based on type

inhabitation– fast falsification using previous

counterexamples– learning conditional

expressions– cost-based search over possible

synthesis steps

Approaches and Their Guarantees

a) Check assertion while program p runs: C(i,p(i))

c) Constraint programming: once i is known, find o to satisfy a given constraint: find o such that C(i,o)

b) Verify whether program always meets the spec:

i. C(i,p(i))

d) Synthesis: solve C symbolically to obtain program p that is correct by construction, for all inputs: find p such that i.C(i,p(i)) i.e. p Crun-time compile-time

both specification C and program p are given:

only specification C is given:

Runtime Assertion Checking

a) Check assertion while program p runs: C(i,p(i))

def content(lst : List) = lst match { case Nil() Set.empty⇒ case Cons(x, xs) Set(x) ++ content(xs)⇒}def isSorted(lst : List) = lst match { case Nil() true⇒ case Cons(_, Nil()) true⇒ case Cons(x, Cons(y, ys)) ⇒ x < y && isSorted(Cons(y, ys))}

def p(i : List) : List = { sort i using a sorting algorithm and return the result} ensuring (o content(i)==content(o) && isSorted(o))⇒

Already works in Scala!Key design decision: constraints are programs

Ongoing:high-level optimization of run-time checks

Can we give stronger guarantees? prove postcondition always true

Static Verification in Leonb) Verify that program always meets spec: i. C(i,p(i))

def content(lst : List) = lst match { case Nil() Set.empty⇒ case Cons(x, xs) Set(x) ++ content(xs)⇒}def isSorted(lst : List) = lst match { case Nil() true⇒ case Cons(_, Nil()) true⇒ case Cons(x, Cons(y, ys)) ⇒ x < y && isSorted(Cons(y, ys))}

def p(i : List) : List = { sort i using a sorting algorithm and return the result} ensuring (o content(i)==content(o) && isSorted(o))⇒

Type in a Scala program and spec, see it verified

timeout

proof of i. C(i,p(i))

input i such thatnot C (i,p(i))

Insertion Sort Verified as You Type It

Web interface: http://lara.epfl.ch/leon

Reported Counterexample in Case of a Bug

Approaches and Their Guarantees

a) Check assertion while program p runs: C(i,p(i))

c) Constraint programming: once i is known, find o to satisfy a given constraint: find o such that C(i,o)

both specification C and program p are given:

only specification C is given:

b) Verify that program always meets spec:

i. C(i,p(i))

d) Synthesis: solve C symbolically to obtain program p that is correct by construction, for all inputs: find p such that i.C(i,p(i)) i.e. p Crun-time compile-time

Using Assertions to Compute• what to do when assertion fail?– presumably some values are wrong– what to change (e.g. in repair)

• alternative: leave some variables unknown (logical variables); find their values to satisfy the assertions: constraint programming

• like CLP, but– richer constraints– new compilation techniques (synthesis)– embedded in Scala, no Prolog with "cut"

Programming with Specifications

c) Constraint programming: find a value that satisfies a given constraint: find o such that C(i,o)Method: use verification technology, try to prove that no such o exists, report counter-examples!

Philippe Suter Ali Sinan Köksal

Etienne Kneuss

Sorting a List Using Specificationsdef content(lst : List) = lst match { case Nil() Set.empty⇒ case Cons(x, xs) Set(x) ++ content(xs)⇒}def isSorted(lst : List) = lst match { case Nil() true⇒ case Cons(_, Nil()) true⇒ case Cons(x, Cons(y, ys)) x < y && isSorted(Cons(y,ys))⇒}((l : List) isSorted(lst) && content(lst) == Set(0, 1, -⇒3)) .solve

> Cons(-3, Cons(0, Cons(1, Nil())))

Comparison: Date Conversion in CKnowing number of days since 1980, find current year and dayBOOL ConvertDays(UINT32 days) { year = 1980; while (days > 365) { if (IsLeapYear(year)) { if (days > 366) { days -= 366; year += 1; } } else { days -= 365; year += 1; } ... }

Enter December 31, 2008All music players (of a major brand) froze in the boot sequence.

Date Conversion using Specifications

val origin = 1980 // beginning of the universedef leapsTill(y : Int) = (y-1)/4 - (y-1)/100 + (y-1)/400

val (year, day)=choose( (year:Int, day:Int) => { days == (year-origin)*365 + leapsTill(year)-leapsTill(origin) + day && 0 < day && day <= 366}) // Choose year and day such that the property holds.

• We did not write how to compute year and day• Instead, we gave a constraint they should satisfy• We defined them implicitly, though this constraint• More freedom (can still do it the old way, if needed)• Correctness, termination simpler than with loop

Knowing number of days since 1980, find current year and day

invariants -specification

Implementation:next 30 pages

Formalizing Tree Invariants (in Scala)sealed abstract class Treecase class Empty() extends Treecase class Node(color: Color, left: Tree, value: Int, right: Tree) extends Treedef blackBalanced(t : Tree) : Boolean = t match { case Node(_,l,_,r) => blackBalanced(l) && blackBalanced(r) && blackHeight(l) ==blackHeight(r) case Empty() => true}def blackHeight(t : Tree) : Int = t match { case Empty() => 1 case Node(Black(), l, _, _) => blackHeight(l) + 1 case Node(Red(), l, _, _) => blackHeight(l)}

def rb(t: Tree) : Boolean = t match { case Empty() => true case Node(Black(), l, _, r) => rb(l) && rb(r) case Node(Red(), l, _, r) => isBlack(l) && isBlack(r) && rb(l) && rb(r)} ... def isSorted(t:Tree) = ...

Define Abstraction as ‘tree fold’def content(t: Tree) : Set[Int] = t match { case Empty() => Set.empty case Node(_, l, v, r) => content(l) ++ Set(v) ++ content(r)}

7

4 9

2 5

{ 2, 4, 5, 7, 9 }

We can now define insertion

def insert(x : Int, t : Tree) = choose(t1:Tree => isRBT(t1) && content(t1) = content(t) ++ Set(x))

Objection: it took a lot of effort to write isRBT

Answer:• no more effort than implementation - wrote some functions• these invariants is what drives data structure design• this is how things are explained in a textbook• it promotes reuse!

Evolving the Program

Suppose we have a red-black tree implementation

We only implemented ‘insert’ and ‘lookup’

Now we also need to implement ‘remove’

void RBDelete(rb_red_blk_tree* tree, rb_red_blk_node* z){ rb_red_blk_node* y; rb_red_blk_node* x; rb_red_blk_node* nil=tree->nil; rb_red_blk_node* root=tree->root; y= ((z->left == nil) || (z->right == nil)) ? z : TreeSuccessor(tree,z); x= (y->left == nil) ? y->right : y->left; if (root == (x->parent = y->parent)) { /* assignment of y->p to x->p is intentional */ root->left=x; } else { if (y == y->parent->left) { y->parent->left=x; } else { y->parent->right=x; } } if (y != z) { /* y should not be nil in this case */#ifdef DEBUG_ASSERT Assert( (y!=tree->nil),"y is nil in RBDelete\n");#endif /* y is the node to splice out and x is its child */ if (!(y->red)) RBDeleteFixUp(tree,x); tree->DestroyKey(z->key); tree->DestroyInfo(z->info); y->left=z->left; y->right=z->right; y->parent=z->parent; y->red=z->red; z->left->parent=z->right->parent=y; if (z == z->parent->left) { z->parent->left=y; } else { z->parent->right=y; } free(z); } else { tree->DestroyKey(y->key); tree->DestroyInfo(y->info); if (!(y->red)) RBDeleteFixUp(tree,x); free(y); } #ifdef DEBUG_ASSERT Assert(!tree->nil->red,"nil not black in RBDelete");#endif}

void RBDeleteFixUp(rb_red_blk_tree* tree, rb_red_blk_node* x) { rb_red_blk_node* root=tree->root->left; rb_red_blk_node* w; while( (!x->red) && (root != x)) { if (x == x->parent->left) { w=x->parent->right; if (w->red) {

w->red=0;x->parent->red=1;LeftRotate(tree,x->parent);w=x->parent->right;

} if ( (!w->right->red) && (!w->left->red) ) {

w->red=1;x=x->parent;

} else {if (!w->right->red) { w->left->red=0; w->red=1; RightRotate(tree,w); w=x->parent->right;}w->red=x->parent->red;x->parent->red=0;w->right->red=0;LeftRotate(tree,x->parent);x=root; /* this is to exit while loop */

} } else { /* the code below is has left and right switched from above */ w=x->parent->left; if (w->red) {

w->red=0;x->parent->red=1;RightRotate(tree,x->parent);w=x->parent->left;

} if ( (!w->right->red) && (!w->left->red) ) {

w->red=1;x=x->parent;

} else {if (!w->left->red) { w->right->red=0; w->red=1; LeftRotate(tree,w); w=x->parent->left;}w->red=x->parent->red;x->parent->red=0;w->left->red=0;RightRotate(tree,x->parent);x=root; /* this is to exit while loop */

} } } x->red=0;#ifdef DEBUG_ASSERT Assert(!tree->nil->red,"nil not black in RBDeleteFixUp");#endif}

140 lines of tricky C, evenreusing existing functions

remove using specifications: 2 lines

def remove(x : Int, t : Tree) = choose(t1:Tree => isRBT(t1) && content(t1)=content(t) – Set(x))

The biggest expected payoff:properties are more reusable

Further Features Supported

• computing minimal / maximal solution of constraints value using binary search

• on-the-fly construction of constraints– first-class constraints, like first-class functions– but they can also be syntactically manipulated

• enumeration of all values that satisfy constraint– application in automated test input generation– can be used like Korat tool for test generation

Approaches and Their Guarantees

a) Check assertion while program p runs: C(i,p(i))

c) Constraint programming: once i is known, find o to satisfy a given constraint: find o such that C(i,o)

both specification C and program p are given:

only specification C is given:

b) Verify that program always meets spec:

i. C(i,p(i))

d) Synthesis: solve C symbolically to obtain program p that is correct by construction, for all inputs: find p such that i.C(i,p(i)) i.e. p Crun-time compile-time

Implicit Programming (ERC project)

specification(constraint)implicit

implementation(function)explicit

x2 + y2 = 1

y = sqrt(1-x2) compute missing part of a satisfying assignment (SAT)

i is assignment for some vars of a propositional formula

o is its completion to make formula true

x

y

i

o

i

o

x

U Usynthesis

def secondsToTime(totalSeconds: Int) : (Int, Int, Int) = choose((h: Int, m: Int, s: Int) (⇒ h * 3600 + m * 60 + s == totalSeconds && h ≥ 0 && m ≥ 0 && m < 60 && s ≥ 0 && s < 60 ))

Synthesis for Linear Arithmetic

def secondsToTime(totalSeconds: Int) : (Int, Int, Int) = val t1 = totalSeconds div 3600 val t2 = totalSeconds -3600 * t1 val t3 = t2 div 60 val t4 = totalSeconds - 3600 * t1 - 60 * t3 (t1, t3, t4)

close to a wish

could infer from types

Compile-time warningsdef secondsToTime(totalSeconds: Int) : (Int, Int, Int) = choose((h: Int, m: Int, s: Int) (⇒ h * 3600 + m * 60 + s == totalSeconds && h ≥ 0 && m ≥ 0 && m ≤ 60 && s ≥ 0 && s < 60 ))

Warning: Synthesis predicate has multiple solutions for variable assignment: totalSeconds = 60Solution 1: h = 0, m = 0, s = 60Solution 2: h = 0, m = 1, s = 0

Synthesis for sets (BAPA)

def splitBalanced[T](s: Set[T]) : (Set[T], Set[T]) = choose((a: Set[T], b: Set[T]) (⇒ a.size – b.size ≤ 1 && b.size – a.size ≤ 1 && a union b == s && a intersect b == empty ))

def splitBalanced[T](s: Set[T]) : (Set[T], Set[T]) = val k = ((s.size + 1)/2).floor val t1 = k val t2 = s.size – k val s1 = take(t1, s) val s2 = take(t2, s minus s1) (s1, s2) a

b

s

PhilippeSuter

RuzicaPiskac

MikaelMayer

balanced

partition

we can conjoin specs

Synthesis for Theories 3 i + 2 o = 13 o = (13 – 3 i)/2• Wanted: "Gaussian elimination" for programs

– for linear integer equations: extended Euclid’s algorithm– need to handle disjunctions, negations, more data types

• For every formula in e.g. Presburger arithmetic– synthesis algorithm terminates– produces the most general precondition

(assertion characterizing when the result exists)– generated code always terminates and gives correct result

• If there are multiple or no solutions for some input parameters, the algorithm identifies those inputs

• Works not only for arithmetic but also for e.g. sets with sizes and for trees

• Goal: lift everything done for SMT solvers to synthesizers

assert(i % 2 == 1)

Decision & Synthesis ProceduresFor a well-defined class of formulas:

Decision procedure

• Input: a formula

Synthesis procedure

• Input: a formula, with input and output variables

• Output: a modelof the formula

• Output: a program to compute output values from input values

5a + 7x = 31

a ↦ 2x ↦ 3

Inputs: { a } outputs: { x }5a + 7x = 31

x ↦ (31 – 5a) / 7

(model-generating)

a theorem prover that always succeeds a synthesizer that always succeeds

33

Framework: Transforming Relations

34

⟦ a̅ ⟨ C1 ⟩ x ̅ ⟧ ⊦ ⟦ a̅ ⟨ C2 ⟩ x ̅ ⟧

∀ a̅, x ̅. C2 ⇒ C1 Refinement

∀ a̅. (∃ x ̅ : C1) (⇒ ∃ x ̅ : C2)Domain preservation

Input variables

Output variablesSynthesis predicate

Programs as Relations

35

⟦ a̅ ⟨ C ⟩ x ̅ ⟧ ⊦ ⟨ P | T ̅ ⟩

Input variables

Output variables Precondition

Program terms

∀ a̅. P ⇒ C[x ̅ ↦T ̅]Refinement

∀ a̅. (∃ x ̅ : C) ⇒ PDomain preservation

P (∧ x ̅ = T ̅)Represents the relation:

Compare to Quantifier Elimination• A problem of the form:

• Corresponds to constructively solving the quantifier elimination problem:

⟦ a̅ ⟨ C ⟩ x ̅ ⟧

∃ x ̅ : C( a ̅, x ̅ )

36

• In the solution, P corresponds to the result of Q.E. and T ̅ are witness terms.

⟨ P | T ̅ ⟩

Transforming Relations

37

⟦ a ̅ ⟨ C[x0↦t] ⟩ x ̅ ⟧ ⊦ ⟨ P | T ̅ ⟩ x0 vars(∉ t)

⟦ a ̅ ⟨ x0 = t ∧ C ⟩ x0;x# ⟧ ⊦ ⟨ P | let x ̅ := T ̅ in t;x ̅ ⟩One-Point

⟦ a ̅ ⟨ C1 ⟩ x ̅ ⟧ ⊦ ⟨ P | T ̅ ⟩ C1 ⇔ C2

⟦ a ̅ ⟨ C2 ⟩ x# ⟧ ⊦ ⟨ P | T ̅ ⟩Equivalence

⟦ a ̅ ⟨ C1 ⟩ x ̅ ⟧ ⊦ ⟨ P1 | T ̅1 ⟩ ⟦ a ̅ ⟨ C2 ⟩ x ̅ ⟧ ⊦ ⟨ P2 | T ̅2 ⟩

⟦ a ̅ ⟨ C1 ∨ C2 ⟩ x# ⟧ ⊦ ⟨ P1 ∨ P2 | if(P1) T ̅1 else T ̅2 ⟩Case-Split

Synthesis for Linear Integer Arithmetic

38

⟦ a ⟨ 7t ≤ a ∧ 5a ≤ 12t ⟩ t ⟧ ⊦

⟦ a ⟨ 5x + 7y = a ∧ 0 ≤ x x ∧ ≤ y ⟩ x,y ⟧ ⊦

5x + 7y = aOne-dimensional solution space.

x = -7t + 3ay = 5t – 2a

is a solution for any t.

7t ≤ a ∧ 5a ≤ 12t

t is bound on both sides, and admits a solution whenever

⌈5a/12 ≤ ⌉ ⌊3a/7⌋

⟨ ⌈5a/12 ≤ ⌉ ⌊3a/7 ⌋ | let t = ⌈5a/12 ⌉ in (-7t+3a, 5t-2a) ⟩⌈5a/12 ≤ ⌉ ⌊3a/7⌋ ⌈5a/12⌉

⟨ ⌈5a/12 ≤ ⌉ ⌊3a/7 | ⌋⌈5a/12 ⌉ ⟩

And/Or Search for Rule Applications

39

1

C

A

5

4

2

3

F

E

D

G

B

6

7

H

J

……

Driven by cost- For rule applications: size of

term contributed to program.- For (sub)problems: estimate

based on variables and boolean structure.

Synthesis problem

Rule application

Synthesis in http://lara.epfl.ch/leon

Techniques used:– Leon’s verification capabilities– synthesis for theory of trees– recursion schemas– case splitting– symbolic exploration of the

space of programs– synthesis based on type

inhabitation– fast falsification using previous

counterexamples– learning conditional

expressions– cost-based search over possible

synthesis steps

Generating Expression Terms• What do we do with problems that:– do not fall in a well-defined, synthesizable subset,– do not get simplified by decomposition?

• Use counter-example guided inductive synthesis to search over small expressions

• Two algorithms– use SMT solvers to enumerate terms and evaluate

them to find new blocking clauses– type-based enumeration (Gvero,Piskac,Kuraj)

combined with discovery of preconditions41

Approaches and Their Guarantees

a) Check assertion while program p runs: C(i,p(i))

c) Constraint programming: once i is known, find o to satisfy a given constraint: find o such that C(i,o)

both specification C and program p are given:

only specification C is given:

b) Verify that program always meets spec:

i. C(i,p(i))

d) Synthesis: solve C symbolically to obtain program p that is correct by construction, for all inputs: find p such that i.C(i,p(i)) i.e. p Crun-time compile-time

Synthesis and Constraint SolvingIf we did not find an expression that solves it in all cases, we emit a runtime call to solverResult: solver invoked only in some cases– for some components of result– for some conditions on inputs

1

C

A

5

4

2

3

F

E

D

G

B

6

7

H

J

…………

after timeout, close the remaining branches by inserting a runtime solver call

Example Data Structure with Cachecase class CTree(cache : Int, data : Tree)def inv(ct : CTree) : Boolean = isRBT(data) && (ct.data = Empty || content(ct.data) contains ct.cache) def member(v : Int, ct : CTree) : Boolean = { require(inv(ct)) choose( (x:Boolean) => x == (content(ct.data) contains v)) }

ADT and equality split, one point rule, simplificationsdef member(v : Int, ct : CTree) : Boolean = { require(inv(ct)) ct.data match { case n:Node => if (ct.cache == v) true else choose( (x:Boolean) => x == (content(ct.data) contains v)) case Empty => false }

Synthesis did not solve fully but optimized spec for 2 common cases

From In-Memory to External Sorting

• transformation rules for monad algebra of nested sequences• exploration of equivalent algorithms through

performance estimation w/ non-linear constraint solving

Ioannis Klonatos Christoph Koch Andres Nötzli Andrej SpielmannSynthesis of Out-of-Core Algorithms (SIGMOD 2013)

in-memory sort external 2k-way merge sort with blocking

C implementation

treeFold[2k]([], unfoldR( funcPow[k](mrg))

Real-World Reasoning Gap between floating points and reality– input measurement error– floating-point round-off error– numerical method error

x<y need not mean x*<y*Automated verification tools to • compute upper error bound• generate code to match mathApplied to code fragments for• embedded systems (car,train)• physics simulations OOPSLA'11,RV'12, EMSOFT'13, POPL'14

Eva Darulova

wish

11011001 0101110111011001 0101110111011001 0101110111011001 01011101

requirementformalization

conventionalcompilation

implementation (program): p

specification (constraint): C

Command

Can we help with designing specification themselves, to make programming

accessible to non-experts?

Programming by Demonstration

http://www.youtube.com/watch?v=bErU--8GRsQTry "Pong Designer" in Android Play Store

Mikael Mayer and Lomig Mégard

Describe functionality by demonstrating and modifying behaviors while the program runs

– demonstrate desired actions by moving back in time and referring to past events

– system generalizes demonstrations into rules

SPLASH Onward'13