The Design of a Resourceable and Retargetable Binary Translator. Cristina Cifuentes. Sixth Working Conference on Reverse Engineering, pp. 280-291, 6-8 Oct. 1999, Atlanta, GA, USA.

The Design of a Resourceable and Retargetable Binary Translator

Cristina Cifuentes

Sixth Working Conference on Reverse Engineering, pages 280-291

6-8 Oct. 1999, Atlanta, GA, USA

Outline

– Introduction
– Goals and objectives
– Resourceable and retargetable binary translation
– Core translator
– Implementation and preliminary results
– Discussion

Introduction

Problems of binary translation

– It is not always possible to find all the code in a binary program.

– Static analysis cannot always find the targets of jumps to addresses computed at run time.

Previous work
– Digital's VEST: OpenVMS and MIPS to Alpha
– FX!32: x86 Windows NT to Alpha
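
To see why the first two problems arise, consider a toy version of recursive traversal, one common way to find code statically: start at an entry point and follow every successor whose address is known without running the program. The mini instruction set below is hypothetical (neither SPARC nor x86). An indirect jump ends the search along that path, so code reachable only through it is never found.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Insn:
    """One instruction of a hypothetical mini-ISA, reduced to its control-flow kind."""
    kind: str                     # "op", "jump", "branch", "ijump" (indirect jump) or "ret"
    target: Optional[int] = None  # static target of "jump"/"branch"; None otherwise


def discover_code(program: Dict[int, Insn], entry: int) -> Set[int]:
    """Recursive traversal: collect addresses reachable through statically known control flow."""
    found: Set[int] = set()
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        if addr in found or addr not in program:
            continue
        found.add(addr)
        insn = program[addr]
        if insn.kind in ("jump", "branch"):
            worklist.append(insn.target)   # taken edge, known statically
        if insn.kind in ("op", "branch"):
            worklist.append(addr + 1)      # fall-through edge
        # "ijump" and "ret": the successor is computed at run time, so the search stops here
    return found


# Address 3 is reached only through the indirect jump at address 2 (say, via a jump table).
program = {
    0: Insn("op"),
    1: Insn("branch", target=4),
    2: Insn("ijump"),
    3: Insn("op"),
    4: Insn("ret"),
}
reachable = discover_code(program, entry=0)
print(sorted(reachable))                  # [0, 1, 2, 4]
print(sorted(set(program) - reachable))   # [3]
```

A dynamic translator can fill such gaps at run time; a purely static one has to fall back on heuristics or on handling the missed code some other way.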

Goals and objectives

Reuse is difficult because binary translators are highly machine-dependent. Another impediment to reuse is that many details of existing translators remain proprietary and unpublished.

The goal of the UQBT (University of Queensland Binary Translator) project is to develop a binary translator that will be constructed from well-specified components.

Resourceable and retargetable binary translation

Binary-file decoder

Reads an MS (source machine) binary and produces:
– the initial state of the MS processor that would apply when about to run this binary on a native MS processor, including at least the initial value of the program counter,
– a list of potential entry points for procedures, including at least the initial program counter as an entry point,
– the contents of the program's code and data segments, indexed by the address those contents would occupy in a running MS executable, and
– a list of procedures that are to be linked with the binary dynamically, and the names of the libraries containing them.
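
The four outputs can be gathered into one record. This is a minimal sketch, not UQBT's actual interface: the class name `DecodedBinary`, its field names, and the sample addresses and bytes are hypothetical. It enforces the two "at least" requirements above and lets later phases read segment contents by MS address.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DecodedBinary:
    """Hypothetical container for what a binary-file decoder hands to the rest of the translator."""
    initial_state: Dict[str, int]          # MS processor state at start-up; must contain "pc"
    entry_points: List[int]                # potential procedure entry points
    segments: Dict[int, bytes]             # start address -> code or data contents
    dynamic_imports: Dict[str, List[str]]  # library name -> procedures linked dynamically

    def __post_init__(self) -> None:
        if "pc" not in self.initial_state:
            raise ValueError("initial state must include the program counter")
        # The initial program counter is always a potential entry point.
        if self.initial_state["pc"] not in self.entry_points:
            self.entry_points.insert(0, self.initial_state["pc"])

    def read_byte(self, addr: int) -> int:
        """Return the byte at `addr` as it would appear in the running MS executable."""
        for start, data in self.segments.items():
            if start <= addr < start + len(data):
                return data[addr - start]
        raise KeyError(f"address {addr:#x} is not in any segment")


decoded = DecodedBinary(
    initial_state={"pc": 0x10074},                       # hypothetical values throughout
    entry_points=[0x10100],
    segments={0x10074: bytes([0x9D, 0xE3, 0xBF, 0x98]), 0x20000: b"ELF\0"},
    dynamic_imports={"libc.so.1": ["printf", "exit"]},
)
print([hex(a) for a in decoded.entry_points])   # ['0x10074', '0x10100']
print(hex(decoded.read_byte(0x20001)))          # 0x4c
```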

Mapping manager

Maps MS locations to MT locations: locations in code space, locations in data space, and locations referring to registers or other processor state.
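
One way to picture the mapping manager is as three lookup tables, one per kind of location. In this sketch, which is not the UQBT design, the policy choices are assumptions: MS data is relocated by a fixed offset, MS registers map to named MT registers or memory slots, and code addresses are recorded as code is translated.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MappingManager:
    """Hypothetical mapping from MS locations to MT locations."""
    data_offset: int                 # assumption: MS data sits at a fixed offset in MT memory
    register_map: Dict[str, str]     # MS register -> MT register or memory slot
    code_map: Dict[int, int] = field(default_factory=dict)  # MS code address -> MT code address

    def map_data(self, ms_addr: int) -> int:
        return ms_addr + self.data_offset

    def map_register(self, ms_reg: str) -> str:
        return self.register_map[ms_reg]

    def record_code(self, ms_addr: int, mt_addr: int) -> None:
        self.code_map[ms_addr] = mt_addr

    def map_code(self, ms_addr: int) -> Optional[int]:
        """MT address of translated code, or None if `ms_addr` has not been translated yet."""
        return self.code_map.get(ms_addr)


# SPARC-to-Pentium style example; every name and number here is illustrative.
mm = MappingManager(data_offset=0x1000, register_map={"%o0": "eax", "%sp": "mem[sp_slot]"})
mm.record_code(0x10074, 0x8048000)
print(hex(mm.map_data(0x20000)), mm.map_register("%o0"), hex(mm.map_code(0x10074)))
# 0x21000 eax 0x8048000
print(mm.map_code(0x10100))   # None
```

Mapping `%sp` to a memory slot rather than a register reflects that a register-rich source such as SPARC cannot place every register in the few Pentium registers.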

Target memory manager

Allocates locations in the target machine's storage space, e.g., to store translated code.
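
A simple policy for placing translated code is a bump allocator over a reserved MT region. The class name, region base and size, and alignment below are assumptions for illustration only.

```python
class TargetMemoryManager:
    """Hypothetical bump allocator over a fixed region of the MT address space."""

    def __init__(self, base: int, size: int, alignment: int = 4) -> None:
        if alignment <= 0 or alignment & (alignment - 1):
            raise ValueError("alignment must be a power of two")
        self.limit = base + size
        self.alignment = alignment
        self.next_free = base

    def allocate(self, nbytes: int) -> int:
        """Reserve `nbytes` bytes and return the aligned start address."""
        start = (self.next_free + self.alignment - 1) & ~(self.alignment - 1)
        if start + nbytes > self.limit:
            raise MemoryError(f"cannot allocate {nbytes} bytes in the target region")
        self.next_free = start + nbytes
        return start


tmm = TargetMemoryManager(base=0x40000, size=0x1000)
first = tmm.allocate(10)    # e.g. code for one translated procedure
second = tmm.allocate(8)    # rounded up to the next 4-byte boundary
print(hex(first), hex(second))   # 0x40000 0x4000c
```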

Boundary manager

Tracks the boundary between translated and untranslated code and handles flow of control across the boundary.
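
A boundary manager can be sketched as a dispatcher consulted whenever control reaches an MS address: if that address has been translated, it runs the translated code; otherwise it hands control to a fallback for untranslated code (for example, an interpreter). The class, the convention that each piece of code returns the next MS address, and the fallback are assumptions of this sketch.

```python
from typing import Callable, Dict

NextAddress = int   # MS address at which execution continues


class BoundaryManager:
    """Hypothetical dispatcher between translated and untranslated code."""

    def __init__(self, fallback: Callable[[int], NextAddress]) -> None:
        self.translated: Dict[int, Callable[[], NextAddress]] = {}
        self.fallback = fallback   # handles untranslated code, e.g. an interpreter
        self.crossings = 0         # transfers into untranslated code

    def add_translation(self, ms_addr: int, code: Callable[[], NextAddress]) -> None:
        self.translated[ms_addr] = code

    def transfer(self, ms_addr: int) -> NextAddress:
        """Transfer control to `ms_addr` and return the MS address that comes next."""
        code = self.translated.get(ms_addr)
        if code is not None:
            return code()
        self.crossings += 1        # control crosses the boundary into untranslated code
        return self.fallback(ms_addr)


# Toy setup: translated code at 0x100 jumps to 0x200; the fallback "executes" 4 bytes.
bm = BoundaryManager(fallback=lambda addr: addr + 4)
bm.add_translation(0x100, lambda: 0x200)
pc = 0x100
for _ in range(2):
    pc = bm.transfer(pc)
print(hex(pc), bm.crossings)   # 0x204 1
```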

Core translator

Translates groups of machine instructions.

Binary-file encoder
– Specifies the contents of the MT (target machine) address space at the start of execution,
– Establishes the state of the MT processor at the start of execution,
– Writes an executable file in the MT native format, and
– Sets up dynamic linking information for translated or native libraries.

Intermediate Representations

Low-level representation
– A single instruction corresponds to a register-transfer list, or RTL.
– Machine-dependent.

High-level representation
– HRTL is a higher-level language that abstracts away from the machine-dependent details of procedure calls, intraprocedural control flow, and relational expressions.
– The HRTL form is more like a compiler's intermediate code, because it uses high-level abstractions of control flow.
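
The difference between the two levels shows up clearly in a compare-and-branch pair. At the RTL level, a SPARC-style `cmp` sets condition codes and `bl` tests them; at the HRTL level the pair becomes one conditional jump on a relational expression. The classes, the string notation for expressions, and the mnemonic table are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Assign:
    """One register transfer `dest := expr`; expressions are plain strings here."""
    dest: str
    expr: str


# Low level: one RTL per MS instruction, with condition codes explicit (notation simplified).
cmp_rtl: List[Assign] = [                  # cmp %o0, %o1  (i.e. subcc %o0, %o1, %g0)
    Assign("%icc", "flags(%o0 - %o1)"),
]
bl_rtl: List[Assign] = [                   # bl 0x10a0  (annul bit ignored)
    Assign("%pc", "%npc"),
    Assign("%npc", "(%icc.N xor %icc.V) ? 0x10a0 : %npc + 4"),
]


@dataclass(frozen=True)
class CondJump:
    """HRTL: jump to `target` if `left rel right`; no condition codes, no delay slot."""
    left: str
    rel: str
    right: str
    target: int


# Signed-comparison branches and the relation each one tests after `cmp a, b`.
BRANCH_RELATION = {"be": "==", "bne": "!=", "bl": "<", "ble": "<=", "bg": ">", "bge": ">="}


def lift_cmp_branch(left: str, right: str, branch: str, target: int) -> CondJump:
    """Fold `cmp left, right; <branch> target` into one HRTL conditional jump."""
    return CondJump(left, BRANCH_RELATION[branch], right, target)


print(cmp_rtl + bl_rtl)
print(lift_cmp_branch("%o0", "%o1", "bl", 0x10A0))
# CondJump(left='%o0', rel='<', right='%o1', target=4256)
```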

Core translator

Core translator (cont.)

Decoding binary code to MS-RTLs
– SLED, SSL

Translating MS-RTLs up to HRTL
– CTL, DCTL, PAL

Manipulating HRTL
– General-purpose optimizers

Translating HRTL down to MT-RTLs
– VPO
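
The four phases compose as a pipeline. In the sketch below, small Python tables stand in for the specifications that drive each phase (SSL-like semantics for decoding, a location table for lifting, an MT register table for lowering). The RTL tuple format, the textual input in place of binary decoding, and all table contents are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

RTL = Tuple[str, str, Tuple[str, ...]]   # (destination, operator, source locations)

# Stand-in for SSL: MS mnemonic -> function from operands to an RTL.
MS_SEMANTICS: Dict[str, Callable[[List[str]], RTL]] = {
    "mov": lambda a: (a[1], "copy", (a[0],)),
    "add": lambda a: (a[2], "+", (a[0], a[1])),
}
# Stand-in for machine-dependent knowledge used when lifting: MS registers -> abstract locations.
MS_LOCATIONS = {"%o0": "v0", "%o1": "v1", "%o2": "v2"}
# Stand-in for the back end's register assignment: abstract locations -> MT registers.
MT_REGISTERS = {"v0": "eax", "v1": "ebx", "v2": "ecx"}


def rename(rtl: RTL, table: Dict[str, str]) -> RTL:
    dest, op, srcs = rtl
    return table[dest], op, tuple(table[s] for s in srcs)


def decode(asm: List[str]) -> List[RTL]:
    """MS instructions -> MS-RTLs (text input replaces SLED-driven binary decoding here)."""
    rtls = []
    for line in asm:
        mnemonic, operands = line.split(maxsplit=1)
        rtls.append(MS_SEMANTICS[mnemonic]([x.strip() for x in operands.split(",")]))
    return rtls


def lift(rtls: List[RTL]) -> List[RTL]:
    """MS-RTLs -> machine-independent form."""
    return [rename(r, MS_LOCATIONS) for r in rtls]


def optimize(hrtl: List[RTL]) -> List[RTL]:
    """A general-purpose, machine-independent pass: drop copies of a location to itself."""
    return [r for r in hrtl if not (r[1] == "copy" and r[2] == (r[0],))]


def lower(hrtl: List[RTL]) -> List[str]:
    """Machine-independent form -> MT-RTLs, printed as text."""
    out = []
    for r in hrtl:
        dest, op, srcs = rename(r, MT_REGISTERS)
        expr = srcs[0] if op == "copy" else f" {op} ".join(srcs)
        out.append(f"{dest} := {expr}")
    return out


def translate(asm: List[str]) -> List[str]:
    result = asm
    for stage in (decode, lift, optimize, lower):
        result = stage(result)
    return result


print(translate(["mov %o0, %o0", "add %o0, %o1, %o2"]))   # ['ecx := eax + ebx']
```

In this toy, only the tables mention a particular machine, so changing the source or target means swapping tables rather than rewriting the stages, which is the idea behind resourceability and retargetability.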

SLED

SLED (Specification Language for Encoding and Decoding) describes the binary representation of each instruction.

# SLED specs for immediate call instruction
fields of instruction (32) inst 0:31 op 30:31 disp30 0:29 rd 25:29
    op2 22:24 imm22 0:21 a 29:29 cond 25:28 disp22 0:21 op3 19:24
    rs1 14:18 i 13:13 asi 5:12 rs2 0:4 simm13 0:12 opf 5:13
    fd 25:29 cd 25:29 fs1 14:18 fs2 0:4
patterns
    [ TABLE_F2 CALL TABLE_F3 TABLE_F4 ] is op = {0 to 3}
constructors
    call__ addr { addr = L + 4 * disp30! } is L: CALL & disp30
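
The constructor can be read as a recipe for decoding. The sketch below hand-codes what a decoder generated from this spec would do for `call__`: bits 30:31 select one of the four op tables, the 30-bit `disp30` field is sign-extended (the `!` in the spec), and the target is L + 4 * disp30, where L is the instruction's own address. Bit 0 is taken as the least significant bit; the function names are hypothetical.

```python
from typing import Optional

OP_TABLES = ("TABLE_F2", "CALL", "TABLE_F3", "TABLE_F4")   # [ ... ] is op = {0 to 3}


def get_field(word: int, lo: int, hi: int) -> int:
    """Extract bits lo..hi (inclusive, bit 0 = least significant) of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value: int, bits: int) -> int:
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def decode_call(word: int, pc: int) -> Optional[int]:
    """Target of `call__ addr` located at `pc` (L in the spec), or None if not a call."""
    if OP_TABLES[get_field(word, 30, 31)] != "CALL":
        return None
    disp30 = sign_extend(get_field(word, 0, 29), 30)
    return (pc + 4 * disp30) & 0xFFFFFFFF       # addr = L + 4 * disp30!


print(hex(decode_call(0x40000010, pc=0x10000)))   # 0x10040  (forward call)
print(hex(decode_call(0x7FFFFFFF, pc=0x10000)))   # 0xfffc   (disp30 = -1)
print(decode_call(0x01000000, pc=0x10000))        # None     (op = 0, e.g. a nop)
```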

SSL

The semantic mapper is table-driven from an SSL (Semantic Specification Language) specification, which associates an RTL with each instruction.

# SSL specs for immediate call instruction
call__ disp30
    *32* r[15] := %pc
    *32* %pc := %npc
    *32* %npc := r[15] + (4 * disp30);
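
Read in order, the three transfers save the address of the call in r[15] (%o7), move execution into the delay slot, and make the call target the instruction after that. The sketch applies them one after another, on the assumption that the transfers take effect sequentially; that is what makes r[15] on the last line hold the call's own address. `SparcState` and `exec_call` are hypothetical names, and `disp30` is taken as already sign-extended.

```python
from dataclasses import dataclass, field
from typing import List

MASK32 = 0xFFFFFFFF   # each *32* transfer writes a 32-bit value


@dataclass
class SparcState:
    pc: int
    npc: int
    r: List[int] = field(default_factory=lambda: [0] * 32)


def exec_call(s: SparcState, disp30: int) -> None:
    """Apply the SSL semantics of call__ to `s`, one transfer at a time."""
    s.r[15] = s.pc & MASK32                   # *32* r[15] := %pc
    s.pc = s.npc & MASK32                     # *32* %pc := %npc   (delay slot runs next)
    s.npc = (s.r[15] + 4 * disp30) & MASK32   # *32* %npc := r[15] + (4 * disp30)


s = SparcState(pc=0x10000, npc=0x10004)
exec_call(s, 0x40)
print(hex(s.r[15]), hex(s.pc), hex(s.npc))    # 0x10000 0x10004 0x10100
```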

CTL

CTL (Control Transfer Language) describes which machine instructions map into the high-level control transfer instructions of HRTL, namely conditional and unconditional jumps, calls, and returns.

call mapping
      call__ addr                  & address = addr
    | JMPL dispA(rs1, i), %o7      & address = rs1 + i
    | JMPL absoluteA(i), %o7       & address = i
    | JMPL indirectA(rs1), %o7     & address = rs1
    | JMPL indexA(rs1, rs2), %o7   & address = rs1 + rs2
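
Applied to a decoded SPARC instruction, this mapping recognizes five forms of call: the immediate `call__`, and the four addressing modes of `JMPL` when the return address is linked into %o7. The sketch keeps the target as a symbolic expression, since for register-based forms it is only known at run time. The tuple encoding of decoded instructions and the `Call` class are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Call:
    """HRTL call; `address` is an expression over MS registers and constants."""
    address: str


def map_call(insn: Tuple) -> Optional[Call]:
    """Return an HRTL Call if `insn` matches an alternative of the call mapping."""
    op, *args = insn
    if op == "call__":
        (addr,) = args
        return Call(hex(addr))                 # address = addr
    if op != "JMPL" or args[-1] != "%o7":
        return None                            # e.g. `jmpl %i7+8, %g0` (ret) is not a call
    mode, *operands = args[:-1]
    if mode == "dispA":
        rs1, i = operands
        return Call(f"{rs1} + {i}")            # address = rs1 + i
    if mode == "absoluteA":
        (i,) = operands
        return Call(hex(i))                    # address = i
    if mode == "indirectA":
        (rs1,) = operands
        return Call(rs1)                       # address = rs1
    if mode == "indexA":
        rs1, rs2 = operands
        return Call(f"{rs1} + {rs2}")          # address = rs1 + rs2
    return None


print(map_call(("call__", 0x10100)))                  # Call(address='0x10100')
print(map_call(("JMPL", "indirectA", "%g1", "%o7")))  # Call(address='%g1')
print(map_call(("JMPL", "dispA", "%i7", 8, "%g0")))   # None
```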

PAL

PAL (Procedural Abstraction Language) describes how a stack frame is set up when a procedure is invoked and how values are accessed on the stack frame, according to the machine's calling convention.

PAL allows for the description of valid prologues and epilogues for callers and callees, the locations that can be used for passing parameters, the location(s) used for returning values from a function, and the location block used by locals.
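
As an illustration of the facts such a description records, the sketch below encodes a few SPARC conventions as Python data rather than PAL syntax: the callee prologue `save %sp, -N, %sp`, the `ret`/`restore` epilogue, the first six integer parameters in %o0-%o5 as seen by the caller (%i0-%i5 inside the callee after `save`), the return value in %o0 (%i0 in the callee), and locals addressed from %fp. The class, the function names, and the choice not to model stack-passed parameters are assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CallingConvention:
    """Hypothetical PAL-like record of one machine's calling convention."""
    caller_params: Tuple[str, ...]   # parameter locations as seen by the caller
    callee_params: Tuple[str, ...]   # the same values as seen by the callee
    caller_return: str
    callee_return: str
    locals_base: str                 # locals live at negative offsets from this register


SPARC = CallingConvention(
    caller_params=tuple(f"%o{i}" for i in range(6)),
    callee_params=tuple(f"%i{i}" for i in range(6)),
    caller_return="%o0",
    callee_return="%i0",
    locals_base="%fp",
)

SAVE = re.compile(r"save\s+%sp,\s*-(\d+),\s*%sp")


def prologue_frame_size(insn: str) -> Optional[int]:
    """Frame size if `insn` is a callee prologue `save %sp, -N, %sp`, else None."""
    m = SAVE.fullmatch(insn.strip())
    return int(m.group(1)) if m else None


def is_epilogue(insns: List[str]) -> bool:
    """`ret` followed by `restore` in its delay slot."""
    return [i.strip() for i in insns] == ["ret", "restore"]


def param_location(conv: CallingConvention, k: int, callee: bool = True) -> str:
    regs = conv.callee_params if callee else conv.caller_params
    if k >= len(regs):
        raise NotImplementedError("stack-passed parameters are not modelled in this sketch")
    return regs[k]


print(prologue_frame_size("save %sp, -112, %sp"))   # 112
print(is_epilogue(["ret", "restore"]))              # True
print(param_location(SPARC, 1), param_location(SPARC, 1, callee=False))   # %i1 %o1
```

Leaf procedures that skip `save` and return with `retl` form a second valid pattern that a complete description would also need.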

VPO

HRTL is translated down to MT-RTLs with VPO (Very Portable Optimizer); the back end has also been tried with C optimizing compilers (both gcc and cc).

VPO emits assembly code for the target machine.

Implementation

UQBT is written in C++ and compiles using gcc on Solaris and Linux systems. It uses SLED, SSL, and CTL descriptions for SPARC and Pentium, and PAL descriptions for SPARC and Pentium Unix systems.

Preliminary Results

The test programs are:
– Fibo(40), which calculates the Fibonacci number of 40 and has 19 lines of C code,
– Sieve(3000), which calculates the first 3000 primes and has 13 lines of C code, and
– Mbanner(500K), a modified version of banner(1), which loops 500,000 times to display argv[1] ("ELF" in this case) and has 135 lines of C code.

SPARC results were obtained on an UltraSPARC II, 250 MHz machine with 320 MB RAM running Solaris 2.6.

Pentium results were obtained on a Pentium MMX, 250 MHz machine with 128 MB RAM running Solaris 2.6.

Preliminary Results (cont.)

Preliminary Results (cont.)

Discussion

Aims of the UQBT project:
– to understand what knowledge of instruction representation, of instruction semantics, of calling conventions, and of binary-file formats is needed to perform binary translation,
– to formalize that knowledge in appropriate description languages,
– to derive components from the descriptions,
– to understand how to implement machine-dependent analyses on the RTL and HRTL representations, and
– to understand which analyses can be made machine-independent, and how.

Effort

Number of Lines of Code for SPARC and Pentium Specs.