Bytecode Decompilation: Typing Etienne M. Gagnon, Laurie J. Hendren and Guillaume Marceau McGill University COMP 621: Static Analysis & Transformations Presented by Alexandre Beaulieu March 29, 2012


Bytecode Decompilation: Typing

Etienne M. Gagnon, Laurie J. Hendren and Guillaume Marceau

McGill University

COMP 621: Static Analysis & Transformations

Presented by Alexandre Beaulieu

March 29, 2012


Preliminaries Type Inference Three Stage Algorithm Conclusion

Outline

1 Preliminaries

2 Type Inference

3 Three Stage Algorithm

4 Conclusion

Etienne M. Gagnon et al Bytecode Decompilation: Typing 03/29/2012 2 / 46


1 Preliminaries


Previously on Dava

Dava: Compiler-agnostic Java bytecode decompiler

Produces very clean, human readable high level output

Executes efficiently (Under 5 seconds per method decompilation)

Optimizes output for human readability

Handles obfuscated bytecode


Dava at a Glance

1 Bytecode

2 Jimple

3 Grimp

4 Control Flow Graph

5 Structure Encapsulation Tree

6 Abstract Syntax Tree

7 Java


Java Bytecode: A Refresher

Important IR (popular, with many JIT compilers and interpreters available for it)

Supported by modern web browsers

A lot of languages compile down to it (Ada, ML, Scheme, Eiffel, Perl, ...)

Verifiable bytecode has interesting properties

Guaranteed to be well-behaved (but not necessarily statically well-typed)

Contains some basic type information (method signatures, class hierarchy)

However, bytecode has some negative aspects

Not ideal for program analysis and optimization (Expression Stack!)

Does not work so well for register allocation (Expression Stack!)

Not easy to understand (low-level representation)


Decompiling Goals (Specific to Dava)

Compiler-Agnosticism: Any verifiable bytecode should decompile properly

Efficiency: Decompiling should be done within reasonable time

Readability: Code should be easy to read for humans

Correctness: Code should be correct and preserve original behaviour


Intermediate Representations

In order to facilitate decompilation, Dava works with multiple IRs

Need some useful type information in order to generate accurate output

Type information in bytecode is insufficient

We need some powerful type inference

Jimple: three-address code representation

This paper focuses on a static type inference algorithm for Jimple

Grimp: Aggregated Jimple, Dava’s input.


A Closer Look at Jimple

Three-address-code

Transforms stack based operations into variable based operations

Preserves all the type information provided by the bytecode

Program analyses are much easier to run on Jimple

Makes it an ideal candidate for static type inference


And from Bytecode, He Created Jimple

Transforming bytecode to Jimple is very straightforward. Here is the magical recipe:

1 Compute stack depth at each program point (Those of you who took COMP 520 are free to feel nostalgic now)

2 Introduce a new local variable for each stack depth

3 Rewrite the instruction stream using the shiny new local variables


Transformation For the Visual People

    iload_1     // 0->1
    iload_2     // 1->2
    iadd        // 2->1
    istore_1    // 1->0

    s_1 = l_1
    s_2 = l_2
    s_1 = s_1 + s_2
    l_1 = s_1
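The recipe can be sketched in a few lines of Java. This is a minimal illustration, not Soot's actual implementation: the class name `StackToJimple` and the string encoding of instructions are assumptions made for the example, and only the four opcodes from the slide are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Soot): rewrites a tiny subset of
// stack-based bytecode into three-address statements.
class StackToJimple {
    static List<String> convert(List<String> bytecode) {
        List<String> out = new ArrayList<>();
        int depth = 0; // stack depth before the current instruction
        for (String insn : bytecode) {
            if (insn.startsWith("iload_")) {
                depth++; // push: the new top of stack lives in s_<depth>
                out.add("s_" + depth + " = l_" + insn.substring(6));
            } else if (insn.startsWith("istore_")) {
                out.add("l_" + insn.substring(7) + " = s_" + depth);
                depth--; // pop
            } else if (insn.equals("iadd")) {
                // pops s_<depth-1> and s_<depth>, pushes the sum at depth-1
                out.add("s_" + (depth - 1) + " = s_" + (depth - 1) + " + s_" + depth);
                depth--;
            } else {
                throw new IllegalArgumentException("unsupported opcode: " + insn);
            }
        }
        return out;
    }
}
```

Each stack slot gets one variable name per depth, which is why the same `s_1` is reused for the operand and for the result of `iadd`.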


2 Type Inference


Type Inference: The General Idea

Using the limited type information available from bytecode, collect type constraints for each identifier in the program

Using those constraints, build a constraint problem

Formulate the problem as a graph problem using the constraints and known type hierarchy

Variable types are called soft nodes

Nodes belonging to the type hierarchy are called hard nodes

Find a coalescing of the graph such that there is only one hard node per group

Use the found coalescing of the graph to assign static types to variables


A Simple Example:

    public java.lang.String f() {
        ? a;
        ? b;
        ? c;
        c = new C();
        b = new B();
        if (...)
            a = c;
        else
            a = b;
        s = a.toString();
        return s;
    }


A Simple Example: Solution

    public java.lang.String f() {
        A a;
        B b;
        C c;
        String s;
        c = new C();
        b = new B();
        if (...)
            a = c;
        else
            a = b;
        s = a.toString();
        return s;
    }


Of course, it’s not that simple

Bytecode verification is program point specific

Multiple Inheritance due to interfaces makes type inference hairy

Arrays are not straightforward to correctly type

Solving the constraint problem is NP-hard in general


Example: Multiple Definition and Use Points

    class A extends Object { f(){} ... }
    class B extends Object { g(){} ... }

    class Multi extends Object {
        void hard() {
            ? x;
            if (...) {
                x = new A(); x.f(); }
            else {
                x = new B(); x.g(); }
            x.toString();
        }
    }


Example: Interfaces

    class Hardest {
        IC getC() { return new C(); }
        ID getD() { return new D(); }

        void hardest() {
            ? oops;
            if (...)
                oops = getC();
            else
                oops = getD();
            oops.f(); // IA.f
            oops.g(); // IB.g
        }
    }


Solution Outline

Polynomial run-time multi-stage algorithm

Bypass the complexity by using program transforms to simplify hard cases

Algorithm preserves program semantics (One would hope so)

Algorithm uses two transformations (Stage 2 and 3, respectively)

1 Variable splitting at object creation sites

2 Insertion of type casts that are guaranteed to succeed at runtime (Why?)


Step by Step Outline

1 Produce Bare Jimple (Jimple containing only type information from the bytecode)

2 Compute DU/UD chains (as seen in class)

3 Split all local variables (one per DU/UD web) (Why?)

4 Run the three-stage type inference algorithm

5 Clean up the code generated by DU/UD splitting using Copy Propagation and Elimination


Example: DU/UD Splitting

    s_1 = l_1
    s_2 = l_2
    s_1 = s_1 + s_2
    l_1 = s_1

    s_1_0 = l_1_0
    s_2_0 = l_2_0
    s_1_1 = s_1_0 + s_2_0
    l_1_1 = s_1_1
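For straight-line code like this example, the renaming can be sketched as below. This is a simplification under stated assumptions: the class `WebSplitter` and the space-separated statement encoding are hypothetical, and the sketch ignores control flow. Real DU/UD web construction needs reaching definitions over the control flow graph, since several definitions that reach a common use belong to the same web.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one new name per definition, straight-line code only.
class WebSplitter {
    static List<String> split(List<String> stmts) {
        Map<String, Integer> current = new HashMap<>(); // latest version of each variable
        List<String> out = new ArrayList<>();
        for (String stmt : stmts) {
            String[] tok = stmt.split(" ");
            // Rename the uses first (tokens after "="), using the reaching definition.
            for (int i = 2; i < tok.length; i++) {
                if (Character.isLetter(tok[i].charAt(0))) {
                    // A variable used before any definition gets version 0 (incoming value).
                    int use = current.computeIfAbsent(tok[i], k -> 0);
                    tok[i] = tok[i] + "_" + use;
                }
            }
            // The definition on the left-hand side starts a new web.
            String def = tok[0];
            int version = current.containsKey(def) ? current.get(def) + 1 : 0;
            current.put(def, version);
            tok[0] = def + "_" + version;
            out.add(String.join(" ", tok));
        }
        return out;
    }
}
```

After splitting, `s_1_0` and `s_1_1` are independent variables, so they are free to receive different types during inference.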


3 Three Stage Algorithm


Algorithm Overview

boolean, byte, char, short and int are all treated as int

GOAL: Find a static type assignment for each local variable that satisfies all of the use constraints

Each stage is run in order. Either it yields a solution, or the algorithm moves to the next stage


Stage 1 Overview

1 Construct directed graph of program constraints

2 Merge connected components in the graph

3 Remove transitive constraints

4 Merge single constraints


Stage 1: Building the Constraint Graph

The constraint graph contains the following elements:

hard node: Represents an explicit type

soft node: Represents a type variable

directed edge: Represents a constraint between two nodes.

a ← b: b is assignable to a according to Java assignment rules


Constraint Graph: Some Examples

a = b: T(a) ← T(b)

a = b + 3: T(a) ← T(b), T(a) ← int, int ← T(b)

a = b.equals(c): java.lang.Object ← T(b), java.lang.Object ← T(c), T(a) ← int
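The three cases can be written down as a small constraint generator. This is only a sketch: the class `Constraints`, its method names, and the string form `x <- y` (read "y is assignable to x") are assumptions for illustration, and only the statement shapes shown on the slide are covered.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical constraint generator for the three statement shapes above.
class Constraints {
    // "x <- y" reads: y is assignable to x.
    static String edge(String to, String from) { return to + " <- " + from; }

    // T(v) is the type variable (soft node) of local variable v.
    static String T(String var) { return "T(" + var + ")"; }

    // a = b
    static List<String> copy(String a, String b) {
        return Arrays.asList(edge(T(a), T(b)));
    }

    // a = b + <int literal>
    static List<String> addIntLiteral(String a, String b) {
        return Arrays.asList(edge(T(a), T(b)), edge(T(a), "int"), edge("int", T(b)));
    }

    // a = b.equals(c); the boolean result is treated as int at this stage
    static List<String> equalsCall(String a, String b, String c) {
        return Arrays.asList(
            edge("java.lang.Object", T(b)),
            edge("java.lang.Object", T(c)),
            edge(T(a), "int"));
    }
}
```

Names such as `int` and `java.lang.Object` become hard nodes in the graph, while every `T(...)` becomes a soft node.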


Stage 1: Merging Connected Components

There are three cases for merging connected components

All soft nodes ⇒ naive merging of all soft nodes into a single one

The component has a single hard node ⇒ Merge all soft nodes into the hard node. (Verify constraints and fail if not satisfied)

More than one hard node in the component ⇒ fail and skip to stage 2
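The three cases amount to counting hard nodes in a component. The sketch below assumes the component has already been found (for example by a cycle or strongly-connected-component search, which is not shown) and uses a hypothetical `Node` encoding where soft nodes carry no type. The constraint verification mentioned in the second case is left out.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the case analysis for one component.
class ComponentMerge {
    // Hard nodes carry a type name; soft nodes carry null.
    static final class Node {
        final String name;
        final String hardType;
        Node(String name, String hardType) { this.name = name; this.hardType = hardType; }
        boolean isHard() { return hardType != null; }
    }

    // Returns the type the merged node stands for, or empty if the merged
    // node is still a type variable. Throws when stage 1 has to give up.
    static Optional<String> merge(List<Node> component) {
        String hard = null;
        for (Node n : component) {
            if (n.isHard()) {
                if (hard != null) {
                    throw new IllegalStateException("more than one hard node: skip to stage 2");
                }
                hard = n.hardType;
            }
        }
        return Optional.ofNullable(hard);
    }
}
```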


Stage 1: Removing Transitive Constraints

Transitivity: A constraint x ← y is said to be transitive if there exists another constraint p ← y such that p ≠ x and there exists a path from p to x in the directed graph.

We eliminate any such transitive edge regardless of node type, except in the case of hard-hard constraints. We also take this opportunity to merge primitive types.
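A direct reading of this definition gives the sketch below. The class `TransitiveReduction`, the adjacency encoding (`parents.get(y)` is the set of all x with x ← y), and the `hard` set are assumptions for illustration. The sketch also assumes the graph is acyclic, which should hold once cycles have been merged. Merging of primitive types is not shown.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of transitive-constraint removal on an acyclic graph.
class TransitiveReduction {
    // parents.get(y) = { x : x <- y }
    static Map<String, Set<String>> reduce(Map<String, Set<String>> parents, Set<String> hard) {
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : parents.entrySet()) {
            String y = e.getKey();
            Set<String> kept = new HashSet<>();
            for (String x : e.getValue()) {
                // x <- y is transitive if some other parent p of y reaches x.
                boolean transitive = false;
                for (String p : e.getValue()) {
                    if (!p.equals(x) && reaches(parents, p, x)) {
                        transitive = true;
                        break;
                    }
                }
                boolean hardHard = hard.contains(x) && hard.contains(y);
                if (!transitive || hardHard) kept.add(x); // hard-hard edges are always kept
            }
            result.put(y, kept);
        }
        return result;
    }

    // Depth-first search along parent edges.
    static boolean reaches(Map<String, Set<String>> parents, String from, String to) {
        Deque<String> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.push(from);
        while (!work.isEmpty()) {
            String n = work.pop();
            if (n.equals(to)) return true;
            if (seen.add(n)) work.addAll(parents.getOrDefault(n, Collections.emptySet()));
        }
        return false;
    }
}
```

Reachability is checked against the original graph, so the order in which edges are examined does not change the result.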


Stage 1: Merging Single Constraints

Single Parent Constraint: A node x has a single parent constraint to y if y ← x and for any p ≠ y there is no constraint p ← x

Single Child Constraint: A node x has a single child constraint to y if x ← y and for any p ≠ y there is no constraint x ← p
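Both definitions are simple lookups on the constraint graph. In this sketch the class `SingleConstraints` and the encoding (`parents.get(x)` is the set of all y with y ← x) are hypothetical choices for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: detecting single parent and single child constraints.
class SingleConstraints {
    // parents.get(x) = { y : y <- x }
    static Optional<String> singleParent(Map<String, Set<String>> parents, String x) {
        Set<String> ps = parents.getOrDefault(x, Collections.emptySet());
        return ps.size() == 1 ? Optional.of(ps.iterator().next()) : Optional.empty();
    }

    // The single y such that x <- y, if there is exactly one.
    static Optional<String> singleChild(Map<String, Set<String>> parents, String x) {
        String found = null;
        for (Map.Entry<String, Set<String>> e : parents.entrySet()) {
            if (e.getValue().contains(x)) {
                if (found != null) return Optional.empty(); // a second child exists
                found = e.getKey();
            }
        }
        return Optional.ofNullable(found);
    }
}
```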


Stage 1: Single Constraints Priority

1 Merge single child constraints

2 Merge with least common ancestor

3 Merge single soft parent constraints

4 Merge remaining single parent constraints


Stage 2 Overview

1 Apply variable splitting transformations (Only known case: x = new A())

2 Run Stage 1


Stage 2: Applying Variable Splitting

    class A extends Object {}
    class B extends Object {}
    class Multi extends Object {
        void java() {
            Object y;
            if (...)
                y = new A();
            else
                y = new B();
            y.toString();
        }
    }


Stage 2: Variable Splitting (cont)

    void three_addr() {
        ? y;
        if (...) {
            y = new A();
            y.[A.<init>()]();
        } else {
            y = new B();
            y.[B.<init>()]();
        }
        y.toString();
    }


Stage 2: Variable Splitting (cont)

    void three_addr_split() {
        ? y, y1, y2;
        if (...) {
            y1 = new A();
            y = y1;
            y1.[A.<init>()]();
        } else {
            y2 = new B();
            y = y2;
            y2.[B.<init>()]();
        }
        y.toString();
    }
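Once the variable is split, each new variable has a consistent set of constraints. As an illustration, the following compilable Java shows the types one would expect for the split version: `y1` is `A`, `y2` is `B`, and `y` is `Object`. The method name `split` and the `boolean flag` parameter stand in for the elided condition and are assumptions of this sketch; constructor calls are folded back into `new` because plain Java cannot express a separate `<init>` call.

```java
class A {}
class B {}

// Illustrative typed version of the split method (names are hypothetical).
class Multi {
    static String split(boolean flag) {
        Object y;
        A y1;
        B y2;
        if (flag) {
            y1 = new A(); // the constructor needs a receiver of type A
            y = y1;       // A is assignable to Object
        } else {
            y2 = new B(); // the constructor needs a receiver of type B
            y = y2;       // B is assignable to Object
        }
        return y.toString(); // toString only requires Object
    }
}
```

Without the split, the single variable `y` would have to be an `A`, a `B` and the target of both assignments at once, which no single type satisfies.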


Stage 3 Overview

1 Construct constraint graph with only variable definition constraints

2 Ignore use constraints and assume all interfaces inherit from java.lang.Object

3 Solve the system using the least common ancestor of classes and interfaces

4 Add typecasts according to use constraints (Why can we do this safely?)
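Because step 2 makes every interface a direct child of java.lang.Object, the hierarchy becomes a tree and a least common ancestor always exists. A minimal sketch of that lookup follows; the class `LeastCommonAncestor` and the `superOf` map are assumptions for illustration, and cast insertion is not shown.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the least-common-ancestor lookup used in stage 3.
class LeastCommonAncestor {
    // superOf maps each type to its direct supertype in the simplified tree;
    // java.lang.Object is the root and has no entry.
    static String lca(Map<String, String> superOf, String a, String b) {
        Set<String> ancestors = new HashSet<>();
        for (String t = a; t != null; t = superOf.get(t)) {
            ancestors.add(t); // a itself and everything above it
        }
        for (String t = b; t != null; t = superOf.get(t)) {
            if (ancestors.contains(t)) return t; // first shared type walking up from b
        }
        throw new IllegalArgumentException("no common ancestor");
    }
}
```

For the interface example, two definitions of types IC and ID would meet at java.lang.Object under this simplification, and the calls to f and g would then need casts to IA and IB.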


Handling Arrays

A ↦ B: A is an array of B

Represented in constraint graph with dashed lines

Java property that says: (A[] ← B[]) ⇔ (A ← B) and (A ← B[]) ⇔ (A ∈ {Object, Serializable, Cloneable})

Build graph without array constraints

Solve normally

Use that solution to give arrays types
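The Java property about array assignability can be checked directly with reflection. The helper class `ArrayRules` is a hypothetical wrapper; `Class.isAssignableFrom` is the standard library call. Note that the covariance rule applies to arrays of reference types.

```java
import java.io.Serializable;

// Small check of Java's array assignability rules via reflection.
class ArrayRules {
    // True when a value of type `from` may be assigned to a variable of type `to`.
    static boolean assignable(Class<?> to, Class<?> from) {
        return to.isAssignableFrom(from);
    }

    public static void main(String[] args) {
        // Covariance: String[] is assignable to Object[] because String is assignable to Object.
        System.out.println(assignable(Object[].class, String[].class));
        // The only non-array supertypes of an array type.
        System.out.println(assignable(Object.class, String[].class));
        System.out.println(assignable(Serializable.class, String[].class));
        System.out.println(assignable(Cloneable.class, String[].class));
        // Any other class or interface is not a supertype of an array type.
        System.out.println(assignable(Comparable.class, String[].class));
    }
}
```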


Inferring Integer Types

Two phase sub-algorithm that infers the proper types

Stage 1 (Fixed Point Computation)

Constraint Collection

Merge connected components (may fail)

Merge single relations until fixed point is reached

Stage 2 (Similar, different constraints)


4 Conclusion


Results

16,492 methods extracted from JDK 1.1 were typed without ever resorting to type casting (stage 3)

Out of those 16,492, only 29 required variable splitting (stage 2)

99.8% of methods typed successfully with stage 1 (16,463 of 16,492)

0.2% of methods typed successfully with stage 2


Short Questions

What is the reason we are doing DU/UD web splitting before running the type inference algorithm?

Stack positions and local variables in the bytecode can store different types of values at different program points. Splitting along DU/UD webs ensures that this won't cause an issue for typing.

What enables us to typecast to the appropriate types without being worried about runtime casting exceptions in stage 3 of the algorithm?

We are working under the assumption that the bytecode passed verification. Because of that, we have a guarantee that the types we will be casting to are valid subtypes at runtime.


Assignment Question

You are tasked with typing the following method, given the class hierarchy. (Next slide)

Show your type constraint list

Show your type constraint graph

Show your final graph reduction

Show the typed output that the algorithm yielded.

Note: You do not have to run Integer Type Inference


    public class Topping {
        ? id;
        ? price;
        int getId() { return id; }
        int getPrice() { return price; }
        Topping(int id, int price) {
            this.id = id;
            this.price = price;
        }
    }


    public class Pizza {
        ? t1, t2, t3;
        Pizza(Topping a, Topping b, Topping c) {
            t1 = a; t2 = b; t3 = c;
        }
        int buy() {
            return t1.getPrice()
                 + t2.getPrice()
                 + t3.getPrice();
        }
    }


Bibliography

Benjamin Bellamy. Efficient local type inference. 3rd year project report, Magdalen College, Trinity Term.

Etienne M. Gagnon, Laurie J. Hendren, and Guillaume Marceau. Efficient inference of static types for Java bytecode, 2000.
