Scalable Certification for Typed Assembly Language Dan Grossman (with Greg Morrisett) Cornell...

Scalable Certification for Typed Assembly Language

Dan Grossman (with Greg Morrisett), Cornell University

2000 ACM SIGPLAN Workshop on Types in Compilation

AFTER

September 2000, TIC00, Montreal

Types After Compilation -- Why?

Verifying that object code is “well-behaved” means we needn’t trust the code producer

• Producer-supplied types guide verification

• Encourages compiler robustness

• Promises efficient untrusted plug-ins

To maximize benefit, we want...

Certified Code Design Goals

• Low-level target language: avoids the performance / trusted computing base trade-off

• Source-language & compiler independent: avoids hacks, promotes re-use, the object-code way

• Permit efficient object code: otherwise, just interpret or monitor at run time

• Small certificates and fast verification: otherwise, only small programs are possible

Still learning how to balance these needs in practice

State of the Art

                Low-level   Compiler-independent   Efficient Code   Efficient Certification
JVML            No          No                     Yes?             Yes
PCC             Yes         No                     Yes              Yes
ECC             Yes         No                     No               Yes
Appel/Felty     Yes!        Yes                    Yes?             ???
TAL             Yes         Yes                    Yes              (This talk)

Scalable Certification in 15 mins

• Classification of Approaches

• Why Compiler Independence Makes Scalability Harder

• Techniques that Make TAL Work

• Experimental Results

• Summary of some lessons learned

See the paper for much, much more

Approach #1 -- Bake It In

If you allow only one way to do something, no annotations are needed and checking is trivial

Examples:

• Grouping code into procedures

• Function prologues

• Installing exception handlers

The type system is at a different level of abstraction

An analogy: RISC vs. CISC

Approach #2 -- Don’t Optimize

Optimizations that are expensive to prove safe are expensive to certify

Examples:

• Dynamic type tests

• Arithmetic (division by zero, array-bounds elimination)

• Memory initialized before use

Better code can make a system look worse

A new factor for where to optimize?

Approach #3 -- Reconstruct

Don’t write down what the verifier can easily determine

Examples:

• Don’t put types on every instruction/operand

• Omit proof steps where inversion suffices

• Re-verify target code at each “call” site (virtual inlining)

Can trade time for space or get a win/win

Analogy: source-level type inference w/o the human factor
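
The first example above (no types on every instruction/operand) can be sketched as follows. This is a toy illustration in Python, assuming a made-up three-instruction register machine rather than TAL's real instruction set or typing rules: only the block entry carries an annotation, and the checker recomputes the register typing after each instruction.

```python
# Toy illustration only: a made-up instruction set, not TALx86.
# Types are plain strings; the entry annotation is the only one supplied.

def reconstruct(entry_types, instructions):
    """Return the register typing after each instruction, derived from
    the block's entry annotation alone."""
    regs = dict(entry_types)
    trace = []
    for op, dst, *srcs in instructions:
        if op == "mov":            # dst takes the source register's type
            regs[dst] = regs[srcs[0]]
        elif op == "movi":         # load an integer immediate
            regs[dst] = "int"
        elif op == "add":          # both operands must already be ints
            if regs.get(dst) != "int" or regs.get(srcs[0]) != "int":
                raise TypeError(f"add needs int operands: {dst}, {srcs[0]}")
            regs[dst] = "int"
        else:
            raise ValueError(f"unknown instruction {op}")
        trace.append(dict(regs))
    return trace

block = [
    ("movi", "EAX", 1),
    ("mov", "EBX", "EAX"),
    ("add", "EAX", "EBX"),
]
trace = reconstruct({"ECX": "exn"}, block)
```

The producer ships one annotation for the block; the three intermediate typings in `trace` are never written down.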

Approach #4 -- Compress

Let gzip and domain-specific tricks solve our problems

• For annotation size, no reason not to compress

• Easy to pipeline decompression, but certification is not I/O bound

Then again, object code compresses too
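
A small sketch of why repetitive annotations compress well, using Python's standard gzip module. The annotation text here is hypothetical (one invented precondition repeated under 1000 invented labels); it is not the output of the TAL tools.

```python
import gzip

# Hypothetical annotation text: one precondition repeated for many labels,
# mimicking how a compiler emits the same invariant over and over.
precondition = "{ESP: int::r1, EBX: a, ESI: b, EDI: c, EBP: {EAX: exn, ESP: r2}::r2}"
annotations = "\n".join(
    f"label_{i}: {precondition}" for i in range(1000)
).encode()

compressed = gzip.compress(annotations)
ratio = len(compressed) / len(annotations)

# Decompression restores the annotations exactly, so the verifier sees
# the same input either way.
assert gzip.decompress(compressed) == annotations
print(f"{len(annotations)} bytes -> {len(compressed)} bytes ({ratio:.1%})")
```

Compression shrinks what is shipped, not what is checked: the verifier still processes the full decompressed annotations.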

Approach #5 -- Abbreviate

Give the code producer type-level tools for parameterization and re-use

• Just (terminating) functions at the type level

• Usually easy for the code producer

• Improves certificate size, but may hurt certification time

Not much harder than implementing the lambda-calculus

Approaches Summary

• Bake it in

• Don’t optimize

• Reconstruct

• Compress

• Abbreviate

Now let’s get our hands dirty...

An Example – Code Pre-condition

int foo(int x) { return x; }

foo:
  MOV EAX, [ESP+0]
  RETN

Pre-condition describes calling convention: where are the arguments, results, return address, exception handler (what’s an exception anyway), ...

Bake it in...

int foo(int x) { return x; }

foo: int → int
  MOV EAX, [ESP+0]
  RETN

Pre-condition describes calling convention: where are the arguments, results, return address, exception handler (what’s an exception anyway), ...

Really bake it in...

int foo(int x) { return x; }

foo_Fii:
  MOV EAX, [ESP+0]
  RETN

Pre-condition describes calling convention: where are the arguments, results, return address, exception handler (what’s an exception anyway), ...

Or spell it all out...

int foo(int x) { return x; }

foo: ∀a:T,b:T,c:T,r1:S,r2:S,e1:C,e2:C.
  {ESP: {ESP: int::r1@{EAX:exn,ESP:r2,M:e2}::r2,
         EAX: int, EBX: a, ESI: b, EDI: c, M: e1+e2,
         EBP: {EAX:exn,ESP:r2,M:e2}::r2
        }::int::r1@{EAX:exn,ESP:r2,M:e2}::r2,
   EBP: {EAX:exn,ESP:r2,M:e2}::r2,
   EBX: a, ESI: b, EDI: c, M: e1+e2}
  MOV EAX, [ESP+0]
  RETN

Pre-condition describes calling convention: arguments, results, return address pre-condition, callee-save registers, exception handler, ...

What to do?

∀a:T,b:T,c:T,r1:S,r2:S,e1:C,e2:C.
  {ESP: {ESP: int::r1@{EAX:exn,ESP:r2,M:e2}::r2,
         EAX: int, EBX: a, ESI: b, EDI: c, M: e1+e2,
         EBP: {EAX:exn,ESP:r2,M:e2}::r2
        }::int::r1@{EAX:exn,ESP:r2,M:e2}::r2,
   EBP: {EAX:exn,ESP:r2,M:e2}::r2,
   EBX: a, ESI: b, EDI: c, M: e1+e2}

• Compress (compiler invariants are very repetitious)

• Don’t optimize (fewer invariants)

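• Abbreviate:

    foo: F [int] int

    F = a type-level function of args and result that abstracts the pre-condition above: the two stack occurrences of int become args, and EAX:int becomes result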
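
A rough sketch of the abbreviation idea in Python. It is an illustration only: F is modeled as an ordinary function that builds the spelled-out pre-condition as a string, whereas a real verifier would work on type terms. The quantifier prefix is omitted, and joining several argument types with `::` is an assumption of this sketch, not something the slides state.

```python
# Sketch only: strings stand in for type terms.

HANDLER = "{EAX:exn,ESP:r2,M:e2}::r2"   # exception-handler stack, as on the slide

def F(args, result):
    """Expand the abbreviation F [args] result into a full pre-condition."""
    # Stack seen by both caller and callee: arguments, then the rest.
    stack = "::".join(args + [f"r1@{HANDLER}"])
    # Pre-condition of the return address: result in EAX, callee-saves intact.
    ret_addr = (f"{{ESP:{stack}, EAX:{result}, EBX:a, ESI:b, EDI:c, "
                f"M:e1+e2, EBP:{HANDLER}}}")
    # Pre-condition of the function: return address on top of the stack.
    return (f"{{ESP:{ret_addr}::{stack}, EBP:{HANDLER}, "
            f"EBX:a, ESI:b, EDI:c, M:e1+e2}}")

short = "F [int] int"
full = F(["int"], "int")
print(len(short), "characters abbreviated,", len(full), "expanded")
```

Every function with the same calling convention reuses F, so the producer writes the long invariant once. The verifier still has to expand it at each use, which is one way abbreviation can save space without saving time.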

And Reconstruction Too

If we elide a pre-condition, the verifier can re-verify the block for each predecessor

• Restrict to forward jumps to prevent loops

• Beware exponential blowup

• Bad news: Optimal type placement appears intractable

• Good news: Naive heuristics save significant space
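
The exponential blowup can be seen by counting re-verifications on a small control-flow graph. The sketch below is a simplified model in Python, not the TAL verifier: it assumes a chain of if/else diamonds with forward jumps only, and that a block without a pre-condition is checked once per check of each predecessor.

```python
from functools import lru_cache

def diamond_chain(n):
    """Build a forward-jump-only CFG of n if/else diamonds in sequence.
    Returns a dict mapping each block to its list of predecessors."""
    preds = {"join0": []}              # join0 is the annotated entry
    for i in range(1, n + 1):
        preds[f"then{i}"] = [f"join{i-1}"]
        preds[f"else{i}"] = [f"join{i-1}"]
        preds[f"join{i}"] = [f"then{i}", f"else{i}"]
    return preds

def verifications(preds, annotated):
    """Number of times each block is checked when every block outside
    `annotated` is re-verified once per check of each predecessor."""
    @lru_cache(maxsize=None)
    def count(block):
        if block in annotated or not preds[block]:
            return 1                   # checked once against its annotation
        return sum(count(p) for p in preds[block])
    return {b: count(b) for b in preds}

n = 10
cfg = diamond_chain(n)
elide_all = verifications(cfg, annotated={"join0"})
annotate_joins = verifications(cfg, annotated={f"join{i}" for i in range(n + 1)})
print(elide_all[f"join{n}"], max(annotate_joins.values()))
```

In this model, eliding every pre-condition makes the last join block get checked 2^n times, while annotating only the join points brings every block back to a single check. That is the kind of naive placement heuristic the slide alludes to.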

A real application

A bootstrapping compiler from Popcorn to TAL

• Popcorn:
  • “Java w/o objects, w/ polymorphism and limited pattern-matching”
  • “ML w/o closures or modules, w/ C-like core syntax”
  • “Safe C – pointerful, garbage collection, exceptions”

• Compiler:
  • Conventional
  • Graph-coloring register allocation, null-check elimination

• Verifier: OCaml 2.04

• System: Pentium II, 266MHz, 64MB, NT4.0

Bottom line – it works

• Source code: 18KLOC, 39 files

• Target code: 816 Kb (335 Kb after strip)

• Target types: 419 Kb

• Compilation: 40 secs

• Assembly: 20 secs

• Verification: 34.5 secs

And proportional to file size

The engineering matters

(Recall: 419Kb of types, 34.5 secs to verify)

• Without abbreviations: 2041Kb

• Without pre-condition elision: 550Kb

• Without either: 4500Kb

• As much elision as legal: 402Kb, 740 secs

• gzip reduces the 419Kb to 163Kb
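
To put these figures in proportion, the ratios can be worked out directly. The snippet below uses only the numbers quoted on the slides; the variable names are ours.

```python
# Figures quoted on the slides (sizes in Kb, times in seconds).
baseline_kb, baseline_secs = 419, 34.5
variants_kb = {
    "without abbreviations": 2041,
    "without pre-condition elision": 550,
    "without either": 4500,
    "gzip of baseline": 163,
}
for name, kb in variants_kb.items():
    print(f"{name}: {kb / baseline_kb:.2f}x the baseline size")

# Eliding every pre-condition that may legally be elided.
max_elision_kb, max_elision_secs = 402, 740
space_saved = 1 - max_elision_kb / baseline_kb
slowdown = max_elision_secs / baseline_secs
print(f"maximal elision: {space_saved:.1%} smaller, {slowdown:.1f}x slower")
```

Dropping both techniques costs roughly 10.7x in annotation size, while pushing elision to the legal limit buys about 4% more space at about 21x the verification time.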

Also studied...

• Differences among code styles

• Techniques for speeding up the verifier

• Other forms of reconstruction

• Being “gzip-friendly”

Some engineering lessons

• Compiler-independence produces large repetitious annotations.

• Abbreviations are easy and space-effective, but not time-effective.

• Overhead should never be proportional to the number of loop-free paths in the code.

• Certification bottlenecks often do not appear in small, simple programs.