Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C....

19
Analysis of Low-Level Analysis of Low-Level Code Using Cooperating Code Using Cooperating Decompilers Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006

Transcript of Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C....

Page 1: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Analysis of Low-Level Code Analysis of Low-Level Code Using Cooperating Using Cooperating

DecompilersDecompilers

Bor-Yuh Evan ChangMatthew Harren

George C. Necula

University of California, Berkeley

SAS 2006

Page 2: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

Why Analyze Low-Level Code?Why Analyze Low-Level Code?

• Analyze what is executed– Avoids issues with compiler bugs and

underspecified source-language semantics

• Analyze when source is unavailable– Applications in mobile code safety assurance

asmcode

asmcode

source

code

source

code

compiler

Code ConsumerCode Producer

analyzer

Page 3: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

MotivationMotivation

class C extends P {void m() { … }

}

P p = new P();P c = … ? new C() :

p;…c.m();

rrc := mm[rrsp]

if (rrc = 0) Lexc

rr1 := mm[rrc]

rr1 := mm[rr1+28]

rrsp := rrsp - 4

mm[rrsp] := mm[rrsp+4]

icall [rr1]

Analyzers for low-level code are more difficult and tedious to buildExample: Java Type Analysis

hrrc : P, … i

hrrc : nonnull P, …i

hrr1 : disp(P), …i

hrr1 : meth(P,28), …i

hmm[rrsp] : nonnull P, …i

Type analysis intermixed with low-level reasoning• e.g., args on stack

UnsoundUnsound::Dependencies must be carefully tracked

Equal

Page 4: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

ObservationsObservations

• Handling low-level implementation details is common to many low-level analyzers– call stack (provides “local variables”)– register allocation– dependencies across instructions

• Ad-hoc modularization attempts failed

GoalGoal:: Design a modular framework that makes it easy to write high-level analyses– for different architectures– for the output of different compilers

Page 5: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

ObservationsObservations

• Intermediate languages abstract varying levels of detail– E.g., source language hides compiler

details– Provides well-specified interface between

(de)compiler phases

ProposalProposal:: Structure low-level analyses as small, incremental decompilation phases

Page 6: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

Basic IdeaBasic Idea static void f(C c) { c.m(); }

f:

rrc := mm[rrsp+12]

if (rrc = 0) Lexc

rr1 := mm[rrc]

rr1 := mm[rr1+28]

rrsp := rrsp - 4

mm[rrsp] :=

mm[rrsp+16]

icall [rr1]

f:

rrc := mm[rrsp+12]

if (rrc = 0) Lexc

rr1 := mm[rrc]

rr1 := mm[rr1+28]

rrsp := rrsp - 4

mm[rrsp] :=

mm[rrsp+16]

icall [rr1]

f(ttc):

rc := ttc

if (rrc = 0) Lexc

rr1 := mm[rrc]

rr1 := mm[rr1+28]

tt1 := ttc

icall [rr1](tt1)

f(ttc):

rc := ttc

if (rrc = 0) Lexc

rr1 := mm[rrc]

rr1 := mm[rr1+28]

tt1 := ttc

icall [rr1](tt1)

f(c):

if (c = 0) Lexc

icall [mm[mm[c]+28]] (c)

f(c):

if (c = 0) Lexc

icall [mm[mm[c]+28]] (c)

f(obj c):

if (c = 0) Lexc

invokevirtual [c, 28] ()

f(obj c):

if (c = 0) Lexc

invokevirtual [c, 28] ()

f(C c):

if (c = 0) Lexc

c.m()

Locals SymEval OO JavaTypesyour analyzer

Symbolic Evaluation

Dynamic Dispatch

Local Variables

Page 7: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

DifficultiesDifficulties

• Unidirectional communication + analysis is insufficient

icall [rr1](tt1) icall [mm[mm[c]+28]] (c)

invokevirtual [c, 28] ()

c.m()

icall [rr1]

Locals SymEval OO JavaTypes

Needs to knowNeeds to know “call with 1 arg”

Can answerCan answer “method call to C.m()” because knowsknows about the class table but needsneeds previous instrs for analysis

Page 8: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

Overview of the FrameworkOverview of the Framework

• To enable bidirectional communication– cannot decompile in stages– decompile at all levels simultaneously

• Each decompiler analyzes the preceding instructions before decompiling the next

• “Pipeline” “Reduced product”

Page 9: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

f: …

rc := m[rsp+12]

if (rc = 0) Lexc

r1 := m[rc]

r1 := m[r1+28]

rsp := rsp - 4

m[rsp] :=

m[rsp+16]

icall [rr1]

f(tc):

rc := tc

if (rc = 0) Lexc

r1 := m[rc]

r1 := m[r1+28]

t1 := tc

icall [rr1](tt1)

f(c):

if (c = 0) Lexc

icall [mm[mm[c]+28]] (c)

f(obj c):

if (c = 0) Lexc

invokevirtual [c, 28] ()

f(C c):

if (c = 0) Lexc

c.m()

QueriesQueries static void f(C c) { c.m(); }

Locals SymEval OO JavaTypes

rsp : sp(-12)r1 = m[m[c]+28]]t1 = c

c : nonnull obj c : C

isFunc(r1

)?isFunc(m[m[c]

+28]])?isMethod(c,2

8)?

Yes,0

args

Yes,1 arg

Yes,1 arg

Page 10: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

Communication SummaryCommunication Summary

• Low-to-High (Primary)– Decompilation Stream

• High-to-Low– Queries

• Initiated by lower-level• Questions decompiled

– Reinterpretations• Initiated by higher-level• Answers decompiled

Page 11: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

Soundness of Decompiler PipelinesSoundness of Decompiler Pipelines

• Operational semantics for each language• Safety encoded as “not getting stuck”

IL

IH

l l0

h h0

Page 12: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

Soundness of Decompiler PipelinesSoundness of Decompiler Pipelines

• Safe at high-level implies safe at low-level

a a0

IL

IH

l l0

h h0

l 2 (a)

Page 13: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

ExperimentsExperiments

• Evaluate flexibility

• Evaluate modularity

• Evaluate applicability of existing source-level tools

Page 14: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

FlexibilityFlexibility

• For the output of gcc, gcj, and compilers for Cool (a “mini-Java”)

• To implement JavaTypes (no exns, interfaces)– 3-4 hours, 500 lines

Locals SymEval JavaTypes

CoolTypes

OO

CTypes

MIPS

MIPS

x86x86C

Cool

Java

40-50%

55-60%

Page 15: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

Modularization

646 588

1060

3047497

660

358

644

0

500

1000

1500

2000

2500

3000

3500

4000

Decompilers MonolithicAnalyzer

Line

s of

Cod

e

ILs

Locals

SymEval

OO

CoolTypes

Table Parsing

Modularity: Decompilers vs. Modularity: Decompilers vs. MonolithicMonolithic

• Compared on 10,339 tests– 49 tests, 211 compilers– 182 disagreements

(pass/fail)

• Decompilers (new)– 1 incompletenesses– 0 soundness bugs

• Monolithic (used heavily)– 5 incompletenesses– 3 soundness bugs– mishandling of run-time

library functions• Calls uniform - Locals• SSA - SymEval

Type Checking Compiled CoolType Checking Compiled Cool

• Modules approx. same size

Page 16: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

Applicability of Source-Level ToolsApplicability of Source-Level Tools

• Experimental Setup– Compiled 3

previously reported benchmarks for BLAST (B) and Cqual (Q)

– Verified presence (or absence) of bugs as in the original

Test Case

Code SizeDecom

pVerification

C(kloc)

x86(kloc) (s)

Orig(s)

Decomp

(s)

qpmouse.c (B) 8.0 1.9 0.74 0.34 1.26

tlan.c (B) 10.9 10.7 8.1641.20 94.30

gamma_dma.c(Q)

11.2 5.2 2.44 0.97 1.05

• Limitations– Source-level tools

needed types– Recovered types

from debugging information

• On optimized code (qpmouse.c)– gcc –O2 except

“merge constants”– reads byte in middle

of word-sized field

Page 17: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

Conclusion: Lessons LearnedConclusion: Lessons Learned

• Need types for existing source analyses– E.g., BLAST on untyped C does not work

• Very useful low-level modules– Locals: recover statically-scoped local variables– SymEval: recover normalized expr. trees, SSA

• Defining output IL guides analysis impl.– IL specifies what analysis should guarantee– Leads to small and well-isolated modules

Page 18: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Thank You!Thank You!

http://www.cs.berkeley.edu/~bec

Page 19: Analysis of Low-Level Code Using Cooperating Decompilers Bor-Yuh Evan Chang Matthew Harren George C. Necula University of California, Berkeley SAS 2006.

Chang, Harren, Necula - Analysis of Low-Level Code Using Cooperating Decompilers

High-to-Low Communication High-to-Low Communication ExamplesExamples• Queries

– “Does e point to a function/method?”– “Does e point to an object?”

• Reinterpretations– Exceptional successor for try-catch blocks