Flyspecking Flyspeck - University of Pittsburgh

Post on 23-Jun-2020


Flyspecking Flyspeck

Mark Adams
Radboud University / Proof Technologies Ltd

Tom’s 60th Birthday
Happy Birthday Tom!

Overview

● Part 1: Brief History of Tom’s Proof
● Part 2: The Flyspeck Proof
● Part 3: Possible Pitfalls in Flyspeck
● Part 4: Proof Auditing Flyspeck

PART 1

Brief History of Tom’s Proof

The Conjecture

● Kepler Conjecture (1611):
  – The best sphere packing is the Face-Centered Cubic (FCC) packing
  – Density = π/√18 ≈ 74.048%
● In other words:
  – Any cuboid box of same-sized balls has enough space to hold at least 25.951% liquid
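Both percentages come from one line of arithmetic. As a worked check (the symbol δ_FCC is introduced here only for illustration):

```latex
\[
\delta_{\mathrm{FCC}} = \frac{\pi}{\sqrt{18}} = \frac{\pi}{3\sqrt{2}}
\approx \frac{3.14159}{4.24264} \approx 0.74048,
\qquad
1 - \delta_{\mathrm{FCC}} \approx 0.25952 .
\]
```

The unfilled fraction is about 25.952%, so the quoted 25.951% is that figure rounded down, which keeps the "at least" claim safe.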

Outline Proof

● Part A (Main Text):
  – Show that the problem reduces to considering spheres within a locality of k (>2) from a central sphere.
  – For a given packing, consider the graph formed by projecting the spheres’ centres onto the surface of the central sphere, where edges connect centres within a locality of k of each other. Show that, for sufficiently small k, the graph is planar.
  – Show that, to do better than an FCC packing, the planar graph must be “tame”.

Outline Proof

● Part B (Graph Enumeration):
  – Show that there are only ≈3,000 possible tame planar graphs (modulo isomorphism)
● Part C (Non-linear Inequalities):
  – For each tame graph, find a set of non-linear inequalities that must hold, and reduce each of these down to a set of linear inequalities.
● Part D (Linear Inequalities):
  – Show that each linear inequality set is unsatisfiable.
● Thus, by contradiction, there is no packing better than FCC

Proof Size

● The main text (Part A):
  – 300 pages of “traditional” mathematical text
● Graph enumeration (Part B):
  – 2,000 lines of Java code
● Non-linear inequalities (Part C):
  – Thousands of lines of C / C++
● Linear inequalities (Part D):
  – 100,000 inequalities in 200 variables
  – Solved using CPlex and Mathematica

Publication Review

● Submitted to the Annals of Mathematics
● 5 years of reviewing (1998-2003)
● Referees found no significant error
● But only a qualified acceptance: “99%” certain of overall correctness

Formalisation

● Flyspeck project (2003 – 2014)
● Proof completely re-expressed in terms of formal logic
  – Using highly trustworthy theorem prover software
● Largest ever formalisation by most measures
  – 20-30 person-years of effort
  – 450,000 lines of proof script + 50,000 lines of theorem prover extension
  – An estimated 2,000,000,000,000 primitive inferences
  – Around 20 contributors

Outsourcing

● Main text (Part A) outsourced to Vietnam
● 2009 workshop
  – Background mathematics and overview of proof
  – How to use HOL Light
  – English lessons
● Proof chopped up into around 700 lemmas
● Bounty awarded upon completion of a lemma

PART 2

The Flyspeck Proof

Theorem Provers (Proof Assistants)

● Software for performing mechanised formal proof, rigidly adhering to a given formal logic
● User guides the proof using a proof script
  – Steps roughly correspond to “human” steps
● Breaks down proof script steps into tiny primitive inferences of the formal logic
  – So each step is justified in terms of the formal logic
● Manages joining the proof script steps together to prove a given lemma
  – Ensures that the intended result is proved by the proof script

Theorem Prover Components

● Inference kernel (LCF-style systems only)
  – Small collection of inference rules and definition commands that can create theorems
  – ‘thm’ made a private datatype, and ML type system ensures that all theorems must be created ultimately via the kernel
● Parser
  – For turning concrete syntax into abstract syntax
● Pretty printer
  – For turning abstract syntax into concrete syntax
● Derived inference rules
● Environment for managing a proof script
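The role of the private ‘thm’ type can be sketched in a few lines of OCaml. This is a toy kernel for an implication-only propositional logic, not HOL Light's kernel; the names `Kernel`, `assume`, `disch`, `mp` and `imp_refl` are chosen here for illustration only.

```ocaml
(* A toy LCF-style kernel for implication-only propositional logic.
   Illustration only: HOL Light's real kernel works on typed lambda terms. *)
type term = Var of string | Imp of term * term

module Kernel : sig
  type thm                          (* abstract: no constructor is exported *)
  val assume : term -> thm          (* {p} |- p *)
  val disch : term -> thm -> thm    (* A |- q  gives  A \ {p} |- p ==> q *)
  val mp : thm -> thm -> thm        (* A |- p ==> q, B |- p  gives  A u B |- q *)
  val hyps : thm -> term list
  val concl : thm -> term
end = struct
  (* hypotheses (kept sorted and duplicate-free by [union]) and conclusion *)
  type thm = term list * term
  let union a b = List.sort_uniq compare (a @ b)
  let assume p = ([p], p)
  let disch p (a, q) = (List.filter (fun h -> h <> p) a, Imp (p, q))
  let mp (a, pq) (b, p') =
    match pq with
    | Imp (p, q) when p = p' -> (union a b, q)
    | _ -> failwith "mp: theorems do not match"
  let hyps (a, _) = a
  let concl (_, c) = c
end

(* A derived rule written outside the kernel: |- p ==> p.
   It can only build its result by calling kernel functions. *)
let imp_refl p = Kernel.disch p (Kernel.assume p)

let th = imp_refl (Var "p")
```

Because the signature hides the representation of `thm`, code outside `Kernel` cannot write a pair such as `([], Var "q")` and have it accepted as a theorem; every value of type `Kernel.thm` has passed through `assume`, `disch` or `mp`.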

Example Proof Script Extract

let MUL_POW2 = REAL_ARITH` (a*b) pow 2 = a pow 2 * b pow 2 `;;

let x = Some(0, `x pow 4 = x pow 2`) ;;

let COMPUTE_SIN_DIVH_POW2 = prove(`! (v0: real^N) va vb vc.

let betaa = dihV v0 vc va vb in

let a = arcV v0 vc vb in

let b = arcV v0 vc va in

let c = arcV v0 va vb in

let p =

&1 - cos a pow 2 - cos b pow 2 - cos c pow 2 +

&2 * cos a * cos b * cos c in

~collinear {v0, vc, va} /\ ~collinear {v0, vc, vb} ==>

( sin betaa ) pow 2 = p / ((sin a * sin b) pow 4) `,

REPEAT STRIP_TAC THEN MP_TAC (SPEC_ALL RLXWSTK ) THEN

REPEAT LET_TAC THEN SIMP_TAC[SIN_POW2_EQ_1_SUB_COS_POW2 ] THEN

REPEAT STRIP_TAC THEN REPLICATE_TAC 2 (FIRST_X_ASSUM MP_TAC) THEN

NHANH (NOT_COLLINEAR_IMP_NOT_SIN0) THEN

EXPAND_TAC "a" THEN EXPAND_TAC "b" THEN PHA THEN

SIMP_TAC[REAL_FIELD` ~( a = &0 ) /\ ~ ( b = &0 ) ==>

&1 - ( x / ( a * b )) pow 2 = (( a * b ) pow 2 - x pow 2 ) / (( a * b ) pow 2 )`;

eval "x"] THEN

ASM_SIMP_TAC[] THEN STRIP_TAC THEN

MATCH_MP_TAC (MESON[]` a = b ==> a / x = b / x `) THEN

EXPAND_TAC "p" THEN SIMP_TAC[MUL_POW2; SIN_POW2_EQ_1_SUB_COS_POW2] THEN

REAL_ARITH_TAC);;

HOL Light

● Theorem prover for the HOL logic
● Simple logical core
  – 10 primitive inference rules
  – 3 axioms
  – 2 commands for conservative definition
● Parser and pretty printer for concrete syntax
● Theory library of 2,000 theorems
● Overall around 800 lines of trusted code
● Logical core has been proved correct

Formalisation Stages

1. Prepare the proof
  – Re-express in a “formalisable” form
  – Symbolic; No big steps; Coherent foundation
2. Prepare the theorem prover
  – Ensure the library supports the proof’s foundation
3. Only then can start proving
  – Translate the formalisable proof into proof script

Flyspeck Preparation

● Tom made significant changes to the original proof to make it more formalisable
  – Changed partition of geometric space (Voronoi → Marchal)
  – Used hypermaps instead of planar graphs
  – Detail added in various places
  – Parts B/C/D were adjusted (20,000 tame hypermaps)
● John Harrison added the HOL Light Multivariate library
  – Vectors, Determinants, Topology, Integration, Measure, …
  – 190,000 lines of proof script

Formalised Proof

● Main text (Part A) proved in 450,000 lines of HOL Light proof script
● Tame planar graphs (Part B) generated by a program proved correct in Isabelle/HOL
● Non-linear inequalities (Part C) automatically proved by 25,000 lines of HOL Light extension
  – 5,000 hours of processing
● Linear inequalities (Part D) automatically proved by a few thousand lines of HOL Light extension

PART 3

Possible Pitfalls in Flyspeck

Is the Formal Statement Correct?

● Informal statement:
  A / B ≤ π / √18
  where A is the volume occupied by a packing of same-sized spheres within a containing sphere of volume B, as the radius of B tends to infinity
● Formal statement in HOL Light:
  !V. packing V
      ==> (?c. !r. &1 <= r
               ==> &(CARD(V INTER ball(vec 0,r))) <=
                   pi * r pow 3 / sqrt(&18) + c * r pow 2)

● Are these precisely equivalent?
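One way to see how the two statements relate is to turn the count of centres into a volume ratio. The sketch below assumes that `packing V` means V is a set of centres of unit balls whose centres are pairwise at least 2 apart (an assumption about the Flyspeck definition, not stated on the slide), and writes N(r) for the number of centres in the ball of radius r about the origin:

```latex
\[
\frac{A}{B} \;\approx\; \frac{N(r)\cdot\tfrac{4}{3}\pi}{\tfrac{4}{3}\pi r^{3}}
\;=\; \frac{N(r)}{r^{3}}
\;\le\; \frac{\pi}{\sqrt{18}} + \frac{c}{r}
\;\xrightarrow[\,r\to\infty\,]{}\; \frac{\pi}{\sqrt{18}} .
\]
```

The approximation sign hides the balls that straddle the boundary of the containing sphere; their number grows like r², which is the role the c · r² term plays in the formal statement. This only indicates why the formal bound yields the informal one under the stated assumption; it does not by itself settle whether the two are precisely equivalent.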

Multiple Sessions

● The proof is spread over multiple sessions
  – One HOL Light session for Parts A and D
  – 600 parallel HOL Light sessions for Part C
  – One Isabelle/HOL session for Part B to generate a list manually transcribed for use in Part A
● How can we be sure that these sessions fit together coherently without introducing inconsistency?
● Was the process of transcribing the list correct?
● Cross checks were done
  – But this is not policed by theorem proving

Other Concerns

● Does the proof actually run without falling over?
● Was ‘new_axiom’ used in any session?
  – If used, this can introduce inconsistency
● Were any HOL Light unsoundnesses exploited?
  – Can use mutable strings to rename constants
  – Can use Obj.magic to subvert the OCaml type system
● Were any Isabelle/HOL unsoundnesses exploited?
  – Known soundness bugs in some versions
● Is the display of formulae correct?
  – HOL Light and Isabelle/HOL have known problems
  – Can mean that definitions or the top-level theorem aren’t what they seem
● [Demo ...]
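The Obj.magic concern can be illustrated with a deliberately tiny example. The `Kernel` module below is hypothetical and far simpler than HOL Light's (theorems are just strings), but it shows how an unchecked cast produces a "theorem" that no inference rule ever created:

```ocaml
(* Toy kernel: the only exported way to make a theorem is [refl]. *)
module Kernel : sig
  type thm
  val refl : string -> thm          (* |- t = t *)
  val concl : thm -> string
end = struct
  type thm = string
  let refl t = t ^ " = " ^ t
  let concl th = th
end

let honest = Kernel.refl "x"

(* The type checker rejects [let forged : Kernel.thm = "0 = 1"],
   but Obj.magic is an unchecked cast, so this compiles and runs. *)
let forged : Kernel.thm = Obj.magic "0 = 1"

let () =
  print_endline (Kernel.concl honest);   (* prints: x = x *)
  print_endline (Kernel.concl forged)    (* prints: 0 = 1 *)
```

Nothing in the running session distinguishes `forged` from a theorem produced by the kernel, which is why the use of such features anywhere in a proof script matters to an auditor.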

Pedigree of Formalisation Team

● Cannot rely on pedigree of formalisation team
● Although innocent error is unlikely, it cannot be dismissed as impossible
  – 20 contributors
  – 450,000 lines of proof script
  – 50,000 lines of complex automatic extension performing over a trillion proof steps
● Malicious error cannot be dismissed either
  – Outsourcing makes this more likely
● All it takes is one tiny exploit!

Can anyone see the exploit in the proof script extract?

(I maliciously doctored it!)

PART 4

Proof Auditing Flyspeck

Proof Auditing

● We propose the rigorous, independent assessment of important formalisation projects
  – Would result in EITHER a robust justification that a complete and correct formal proof of the original informal theorem has been performed OR exposure of flaws in the project
● Should aim to make the justification as simple as possible
● Should not assume the good intentions of the project team

Proposed Auditing Process

1. Replay original project
  – Run each of the sessions of the formal proof
2. Port the proof to a trusted target system
  – Use proof porting in the original system(s) to export proof objects
  – Consolidate the proof objects into a single session in target system
3. Examine the final state of the target system
  – Examine the display settings
  – Examine the list of axioms
  – Review the statement of the top-level theorem, and its dependency graph of supporting definitions

Requirements for the Process

● Proof porting software
  – Must be able to efficiently and reliably record and export proofs from the original system
  – Must be able to handle proofs of very large scale
● Trusted target system
  – Ideally want a system that is widely trusted not to suffer from soundness issues or display issues

Common HOL

● A standard for basic HOL system functionality
● Enables portability of proofs and source code between HOL systems
  – HOL4, HOL Light, ProofPower HOL, Isabelle/HOL, HOL Zero, hol90
  – Currently only implemented for HOL Light and HOL Zero
● Consists of:
  – Application Programming Interface (API)
  – Standard HOL theory
  – Adapted versions of various HOL systems for the API/theory
  – Proof object exporter
  – Proof object importer

Common HOL API

● Interface of around 450 ML functions/values
  – Functional programming library (100)
  – Type, term & theorem utilities (150)
  – Theory extension & listing commands (40)
  – Inference rules (100)
  – Parsing & pretty printing (20)
  – Theorems (55)
● Enables fast and reliable porting of source code between HOL systems

Common HOL: Axioms

                hol90    HOL4     ProofPower  HOL Light  HOL Zero
IMP_ANTISYM_AX  axiom    derived  axiom       -          axiom
ETA_AX          axiom    axiom    axiom       axiom      axiom
SELECT_AX       axiom    axiom    axiom       axiom      axiom
BOOL_CASES_AX   axiom    axiom    axiom       derived    derived
INFINITY_AX     axiom    axiom    axiom       axiom      axiom

Common HOL: Inference Rules

                        hol90        HOL4         ProofPower   HOL Light    HOL Zero
ABS                     prim         prim         prim         prim         prim
ASSUME                  prim         prim         prim         prim         prim
BETA (not in platform)  -            -            -            prim         -
BETA_CONV               prim         prim         prim         derived      prim
DISCH                   prim         prim         prim         derived      prim
DEDUCT_ANTISYM_RULE     -            -            -            prim         derived
EQ_MP                   k-derived    k-derived    k-derived    prim         prim
INST                    {k-derived}  k-derived    {k-derived}  prim         derived
INST_TYPE               {prim}       {prim}       {prim}       {prim}       prim
MK_COMB                 k-derived    k-derived    k-derived    prim         prim
MP                      prim         prim         prim         derived      prim
REFL                    prim         prim         prim         prim         prim
SUBST                   prim         prim         prim         derived      derived

Common HOL: Term Utilities

hol90              HOL4               ProofPower     HOL Light          HOL Zero
type_of            type_of            type_of        type_of            type_of
type_vars_in_term  type_vars_in_term  {term_tyvars}  type_vars_in_term  term_tyvars
aconv              aconv              (~=$)          aconv              alpha_eq
-                  rename_bvar        -              {alpha}            rename_bvar
free_vars          free_vars          frees          frees              free_vars
free_varsl         free_varsl         -              freesl             list_free_vars
-                  var_occurs         is_free_in     {vfree_in}         var_free_in
{free_in}          free_in            -              free_in            term_free_in
all_vars           -                  -              variables          all_vars
all_varsl          -                  -              -                  list_all_vars
inst               {inst}             {inst}         {inst}             tyvar_inst
-                  -                  {var_subst}    vsubst             var_inst
{subst}            {subst}            subst          subst              subst
variant            variant            variant        {variant}          variant

Using Common HOL

● Successfully used to record and port Flyspeck Parts A/D between two HOL Light sessions
  – 1.4 billion primitive inferences
  – Recording/exporting overhead is around 40% of execution time
  – Requires around 300MB of RAM
  – Proof objects occupy around 200MB of .tgz files
  – Replays in about 15% of the time taken by the original proof
● Scope for improvement

HOL Zero

● Another implementation of the HOL logic
● Designed for proof checking, not interactive proof
● Simple LCF-style implementation
● Extensive code comments
● Carefully designed concrete syntax and pretty printer
● Avoids OCaml exploits
● No known flaws
● $100 soundness bounty!
● [Demo …]

Using HOL Zero

● Successfully used Common HOL to port Flyspeck Parts A/D to HOL Zero
  – The first port attempt did fail, but due to a subtle error in the implementation of the Common HOL API for HOL Light
● Significantly slower than HOL Light
  – 6 hours vs 33 minutes (roughly 11 times slower)
  – Due to the highly conservative architecture of HOL Zero
  – Could perhaps be sped up
● No issues found in Parts A/D

Modus Ponens

● In Common HOL: from A ⊢ P ⇒ Q and B ⊢ P, infer A ∪ B ⊢ Q
● In HOL Light: from A ⊢ P ⇒ Q and B ⊢ P, infer (A \ {P}) ∪ B ⊢ Q
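The two versions of the rule differ only in the hypothesis set of the result, and only when P itself occurs among the hypotheses A. A small sketch makes the difference concrete; hypotheses are modelled as strings and the helper names are hypothetical, not part of any HOL system's API:

```ocaml
(* Hypothesis sets as sorted, duplicate-free string lists. *)
let union a b = List.sort_uniq compare (a @ b)
let remove p a = List.filter (fun h -> h <> p) a

(* Hypotheses of the result under each reading of the rule. *)
let mp_hyps_common_hol a b _p = union a b
let mp_hyps_hol_light a b p = union (remove p a) b

let a = ["P"; "R"]      (* A |- P ==> Q, where A happens to contain P *)
let b = ["S"]           (* B |- P *)

let () =
  Printf.printf "Common HOL: {%s}\n"
    (String.concat ", " (mp_hyps_common_hol a b "P"));
  Printf.printf "HOL Light:  {%s}\n"
    (String.concat ", " (mp_hyps_hol_light a b "P"))
```

With A = {P, R} and B = {S}, the first reading keeps P in the result ({P, R, S}) and the second drops it ({R, S}). A discrepancy of this size is the kind of detail a proof port between systems has to reconcile.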

The Future

● Full audit of Flyspeck is feasible
  – Parts A/B/C/D all in a single HOL session
● Two remaining challenges
  – Recording/replaying Non-Linear Inequalities (Part C)
  – Manual port of Graph Enumeration (Part B) to HOL Light
● Need to speed up HOL Zero
● Need to improve proof porting
● Need to port fixity as well as definitions/proofs

Try It Yourself

● Proof Technologies website contains downloads for replaying the Main Text (Part A)
  – Follow link Various / Flyspeck / Flyspeck Replay
  – HOL Zero and HOL SuperLight target systems
  – 200MB of proof modules
● [demo …]