Intraprocedural Static Slicing of Binary Executables
Cristina Cifuentes, Antoine Fraboulet
University of Queensland
International Conference on Software Maintenance (ICSM '97)
1st ICSM
Since its start in 1983, ICSM (International Conference on Software Maintenance) has grown and developed into an international forum for software maintenance researchers and practitioners to examine key issues facing the software maintenance community.
LINK: http://conferences.computer.org/icsm/
Cristina Cifuentes
Senior Staff Engineer for Sun Microsystems Laboratories, working in the Static Program Analysis group
ACM SIGPLAN Executive Committee, Treasurer and member-at-large, 2007-2012
Chair of the IEEE Reverse Engineering and Reengineering Committee, Technical Council on Software Engineering, 2002-2003
Technical Reports
User-Input Dependence Analysis via Graph Reachability By: Bernard Scholz, Chenyi Zhang and Cristina Cifuentes Report Number:TR-2008-171 Mar 31, 2008
Partitioning of Code for a Massively Parallel Machine By: Michael Ball, Cristina Cifuentes and Deepankar Bairagi Report Number:TR-2004-134 Nov 1, 2004
A Transformational Approach to Binary Translation of Delayed Branches with Applications to SPARC® and PA-RISC Instruction Sets By: Cristina Cifuentes and Norman Ramsey Report Number:TR-2002-104 Jan 1, 2002
Experience in the Design, Implementation and Use of a Retargetable Static Binary Translation Framework By: Cristina Cifuentes, Mike Van Emmerik, Brian T. Lewis and Norman Ramsey Report Number:TR-2002-105 Jan 1, 2002
Walkabout - A Retargetable Dynamic Binary Translation Framework By: Brian T. Lewis, David Ung and Cristina Cifuentes Report Number:TR-2002-106 Jan 1, 2002
Antoine Fraboulet
Associate Professor at INSA Lyon and a member of the CITI Lab and of the INRIA Amazones group
Member of the program committee of t2pWSN 07 workshop
Member of the program committee of MSN 07 conference
Member of the program committee of MSN 06 conference
Member of the committee of the TSI Journal
Organization committee of the RECAP 2006 workshop
Papers
Assembly to High-Level Language Translation. Cifuentes Cristina; Simon Doug; Fraboulet Antoine. In International Conference on Software Maintenance (11/1998) 228-237
Loop Alignment for Memory Accesses Optimization. Fraboulet Antoine; Huard Guillaume; Mignotte Anne. In International Symposium on System Synthesis (ISSS) (10/1999) 71-77
Memory Optimization of Data Flow Applications at the Codesign Level. Fraboulet Antoine; Just-Meunier Laurence; Mignotte Anne. In Sophia Antipolis Forum on MicroElectronics (SAME) (10/2000) 16-21
Loop Fusion for Memory Space Optimization. Fraboulet Antoine; Godary Karen; Mignotte Anne. In International Symposium on System Synthesis (10/2001) 95-100
Source Code Loop Transformations for Memory Hierarchy Optimizations. Fraboulet Antoine; Mignotte Anne. In International Conference on Parallel Architectures and Compilation Techniques, Workshop on MEmory access DEcoupled Architectures (MEDEA) (2001) 6
Recommended reading
Ákos Kiss, Judit Jász, Gábor Lehotai, Tibor Gyimóthy. Interprocedural Static Slicing of Binary Executables. 3rd IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2003), 26-27 September 2003, Amsterdam, The Netherlands
Ákos Kiss, Judit Jász, Tibor Gyimóthy. Using Dynamic Information in the Interprocedural Static Slicing of Binary Executables. Software Quality Journal 2005, Volume 13
3rd MOTIVATION
The primary initial goal of slicing was to assist with debugging
Programmers naturally form program slices, mentally, when they debug and understand programs
A program slice consists of the parts of a program that potentially affect the values computed at some point of interest (the slicing criterion)
4th Background
Tools for executable programs
Decode the information stored in the binary file
Decode the machine instructions and translate them to an assembly representation or an equivalent intermediate representation
Problems?
Data and code are hard to separate
There is a huge amount of code to read when debugging
A disassembler cannot determine which paths to traverse next when faced with an indexed jump instruction, or an indirect call or jump instruction on the value of a register
Representation of Binaries
The general format of a binary executable varies widely based on the binary-file format used by the OS
When running a program, the binary-file format is decoded by the operating system’s loader, which loads the program into memory and passes control to the program via its entry point
Workflow
Step 1 (binary to assembly or IL):
dasm not only decodes the binary file and its machine instructions, but also stores the instructions as low-level instructions (icode) and builds a control flow graph for each procedure
Step 2 (assembly or IL to high-level language):
Data-flow analysis is performed on the CFGs to recover high-level information
dasm: the disassembler of the dcc decompiler, which implements a decoder of the EXE format and creates an intermediate representation of the program
icode: resembles assembly instructions, with the property that each instruction performs only one operation at a time
5th BASIC KNOWLEDGE
Classification:
Backward Slicing
Forward Slicing
Or
Static Slicing
Dynamic Slicing
Program Slice
A backward slice consists of all statements that the computation at the slicing criterion may depend on
public class SimpleExample {
    static int add(int a, int b) {
        return (a + b);
    }

    public static void main(final String[] arg) {
        int i = 1;
        int sum = 0;
        while (i < 11) {
            sum = add(sum, i);
            i = add(i, 1);
        }
        System.out.println("sum = " + sum);
        System.out.println("i = " + i);
    }
}
[Callout: slicing criterion]
A forward slice includes all statements that depend on the slicing criterion
public class SimpleExample {
    static int add(int a, int b) {
        return (a + b);
    }

    public static void main(final String[] arg) {
        int i = 1;
        int sum = 0;
        while (i < 11) {
            sum = add(sum, i);
            i = add(i, 1);
        }
        System.out.println("sum = " + sum);
        System.out.println("i = " + i);
    }
}
[Callout: slicing criterion]
A static slice takes all possible executions of the program into account
A dynamic slice is constructed with respect to only one execution of the program (the iteration number is taken into account)
Slicing Method
The most popular approaches are based on dependency graphs (non-executable slices)
Q: How does a specific statement affect the others?
Construct a Program Dependence Graph
A Combination of Data Dependency Graph and Control Dependency Graph
Identify Data Dependency
1. a := 3
2. b := a
b depends on a
Identify Control Dependency
1. if a = true then
2.   b := 1
3. else
4.   c := 0
Both assignments depend on the if statement
How to Determine a Slice
movl $0x0,0xfffffff8(%ebp)
cmpl $0x0,0xfffffff8(%ebp)
jne 0x8048475 <main+49>
movl $0x1,0xfffffffc(%ebp)
jmp 0x8048485 <main+65>
movl $0x7,0xfffffffc(%ebp)
mov 0xfffffffc(%ebp),%eax
mov %eax,0xfffffff8(%ebp)
mov $0x5,%eax
sub 0xfffffffc(%ebp),%eax
mov %eax,0xfffffff4(%ebp)
mov 0xfffffffc(%ebp),%eax
sub $0x5,%eax
mov %eax,0xfffffff4(%ebp)
[Figure: Data Dependence Graph and Control Dependence Graph for the listing above]
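Once the dependence graph exists, a slice can be read off it by graph reachability. The sketch below is only an illustration under stated assumptions: the class name PdgSliceSketch, the statement numbering, and the hand-written dependence edges for SimpleExample.main are all made up here, not taken from the paper or from any tool.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: slices a hand-built dependence graph of SimpleExample.main.
class PdgSliceSketch {
    // Assumed statement numbering:
    // 1: int i = 1;           2: int sum = 0;       3: while (i < 11)
    // 4: sum = add(sum, i);   5: i = add(i, 1);
    // 6: println("sum = " + sum);                   7: println("i = " + i);
    // DEPENDS_ON.get(s) lists the statements s is data or control dependent on.
    static final Map<Integer, List<Integer>> DEPENDS_ON = Map.of(
        1, List.<Integer>of(),
        2, List.<Integer>of(),
        3, List.of(1, 5),           // the loop test reads i
        4, List.of(2, 4, 1, 5, 3),  // reads sum and i, controlled by the loop test
        5, List.of(1, 5, 3),        // reads i, controlled by the loop test
        6, List.of(2, 4),           // reads sum
        7, List.of(1, 5));          // reads i

    // Backward slice: everything reachable from the criterion along "depends on" edges.
    static Set<Integer> backwardSlice(int criterion) {
        Set<Integer> slice = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(criterion);
        while (!work.isEmpty()) {
            int s = work.pop();
            if (slice.add(s)) {
                for (int d : DEPENDS_ON.get(s)) {
                    work.push(d);
                }
            }
        }
        return slice;
    }

    // Forward slice: every statement whose backward slice contains the criterion.
    static Set<Integer> forwardSlice(int criterion) {
        Set<Integer> slice = new TreeSet<>();
        for (int s : DEPENDS_ON.keySet()) {
            if (backwardSlice(s).contains(criterion)) {
                slice.add(s);
            }
        }
        return slice;
    }

    public static void main(String[] args) {
        System.out.println(backwardSlice(7)); // [1, 3, 5, 7]
        System.out.println(forwardSlice(2));  // [2, 4, 6]
    }
}
```

With the last println as criterion, the backward slice drops both statements that touch sum; with "int sum = 0" as criterion, the forward slice keeps only the statements that sum flows into.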
Control Dependency Graph
A node V is post-dominated by a node W if every directed path from V to Stop contains W
An instruction Y is control dependent on another instruction X iff
There exists a directed path P from X to Y such that every other instruction Z in P is post-dominated by Y
X is not post-dominated by Y
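The two conditions above can be checked mechanically once post-dominator sets are known. The following is a minimal sketch, not the paper's implementation: it assumes a diamond-shaped CFG (A branches to B or C, both rejoin at D, then STOP), which may differ from the graph drawn on the slide, and it uses the equivalent successor-based formulation (Y post-dominates some successor of X but does not strictly post-dominate X). The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ControlDependenceSketch {
    // Assumed CFG: A -> B, A -> C, B -> D, C -> D, D -> STOP.
    static final Map<String, List<String>> SUCC = new LinkedHashMap<>();
    static {
        SUCC.put("A", List.of("B", "C"));
        SUCC.put("B", List.of("D"));
        SUCC.put("C", List.of("D"));
        SUCC.put("D", List.of("STOP"));
        SUCC.put("STOP", List.<String>of());
    }

    // Iterates pdom(n) = {n} united with the intersection of pdom(s) over successors s,
    // until nothing changes. Assumes every node except STOP has a successor.
    static Map<String, Set<String>> postDominators() {
        Map<String, Set<String>> pdom = new TreeMap<>();
        for (String n : SUCC.keySet()) {
            Set<String> init = new TreeSet<String>();
            if (n.equals("STOP")) {
                init.add("STOP");
            } else {
                init.addAll(SUCC.keySet());
            }
            pdom.put(n, init);
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String n : SUCC.keySet()) {
                if (n.equals("STOP")) {
                    continue;
                }
                Set<String> next = new TreeSet<String>(SUCC.keySet());
                for (String s : SUCC.get(n)) {
                    next.retainAll(pdom.get(s));
                }
                next.add(n);
                if (!next.equals(pdom.get(n))) {
                    pdom.put(n, next);
                    changed = true;
                }
            }
        }
        return pdom;
    }

    // Maps each node Y to the set of nodes X that Y is control dependent on.
    static Map<String, Set<String>> controlDependence() {
        Map<String, Set<String>> pdom = postDominators();
        Map<String, Set<String>> dependsOn = new TreeMap<>();
        for (String x : SUCC.keySet()) {
            for (String s : SUCC.get(x)) {
                for (String y : pdom.get(s)) {
                    boolean strictlyPostDominatesX = !y.equals(x) && pdom.get(x).contains(y);
                    if (!strictlyPostDominatesX) {
                        dependsOn.computeIfAbsent(y, k -> new TreeSet<String>()).add(x);
                    }
                }
            }
        }
        return dependsOn;
    }

    public static void main(String[] args) {
        System.out.println(postDominators().get("A")); // [A, D, STOP]
        System.out.println(controlDependence());       // {B=[A], C=[A]}
    }
}
```

In this assumed graph only B and C are control dependent on A; D post-dominates A, so it is not.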
[Figure: a CFG over nodes A, B, C, D and STOP, shown next to its Post Dominator Tree]
6th ALGORITHM
Determine the slice using the conventional algorithm
Add unconditional jumps and returns to the slice
Fix jump labels
BASIC COMPONENTS
Lvalue
An lvalue is expressed as a pair of a base plus an offset. The base address can be either the starting address for the storage for a variable (local or global) or any pointer expression.
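To make the base-plus-offset pair concrete, here is a small sketch that turns the stack operands from the earlier listing into such pairs. The class LvalueSketch, its fields, and the parsing rule are assumptions for illustration only; the slides do not say how the original tool stores lvalues.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical representation of an lvalue as a (base, offset) pair.
final class LvalueSketch {
    // Matches AT&T-style memory operands such as 0xfffffff8(%ebp).
    private static final Pattern MEM =
        Pattern.compile("0x([0-9a-fA-F]{1,8})\\(%(\\w+)\\)");

    final String base;  // register (or symbol) the address is computed from
    final int offset;   // signed displacement from the base

    LvalueSketch(String base, int offset) {
        this.base = base;
        this.offset = offset;
    }

    static LvalueSketch parse(String operand) {
        Matcher m = MEM.matcher(operand);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a base+offset operand: " + operand);
        }
        // The listing prints displacements as unsigned 32-bit values;
        // narrowing to int recovers the signed offset (0xfffffffc becomes -4).
        int offset = (int) Long.parseLong(m.group(1), 16);
        return new LvalueSketch(m.group(2), offset);
    }

    // Same storage only if base and offset both agree; aliasing through
    // different bases is deliberately ignored in this sketch.
    boolean sameLocation(LvalueSketch other) {
        return base.equals(other.base) && offset == other.offset;
    }

    @Override
    public String toString() {
        return base + (offset < 0 ? "" : "+") + offset;
    }

    public static void main(String[] args) {
        System.out.println(parse("0xfffffffc(%ebp)")); // ebp-4
        System.out.println(parse("0xfffffff8(%ebp)")); // ebp-8
    }
}
```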
Step 1: Start point
Using basic blocks as the node granularity
Control dependency
Control dependencies are based on the PDT of a procedure
Control flow analysis of the CFG for the purposes of recovering the underlying control structures of a graph and their nesting level
Data dependency
Data dependencies are represented as ud-chains at the procedure level
Ud-chains are generated for each register and condition code used in an instruction
Idiom analysis & dead-condition code elimination
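As a rough illustration of ud-chains, the sketch below links each register use to the definition that reaches it inside one straight-line block. It is a simplification under stated assumptions: chains at the procedure level need a reaching-definitions analysis over the whole CFG, and the Insn class with its hand-written defs and uses lists is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UdChainSketch {
    // One instruction with the registers/condition codes it defines and uses
    // (filled in by hand here; a real tool would derive them from the decoder).
    static final class Insn {
        final String text;
        final List<String> defs;
        final List<String> uses;

        Insn(String text, List<String> defs, List<String> uses) {
            this.text = text;
            this.defs = defs;
            this.uses = uses;
        }
    }

    // For each instruction, maps every used name to the index of the definition
    // reaching it within the block, or -1 if it is defined before the block.
    static List<Map<String, Integer>> udChains(List<Insn> block) {
        List<Map<String, Integer>> chains = new ArrayList<>();
        Map<String, Integer> lastDef = new HashMap<>();
        for (int i = 0; i < block.size(); i++) {
            Map<String, Integer> chain = new HashMap<>();
            for (String u : block.get(i).uses) {
                chain.put(u, lastDef.getOrDefault(u, -1));
            }
            chains.add(chain);
            // Definitions take effect only after the uses of the same instruction.
            for (String d : block.get(i).defs) {
                lastDef.put(d, i);
            }
        }
        return chains;
    }

    // Three instructions taken from the listing shown earlier.
    static List<Insn> sample() {
        return List.of(
            new Insn("mov $0x5,%eax", List.of("eax"), List.<String>of()),
            new Insn("sub 0xfffffffc(%ebp),%eax", List.of("eax", "flags"), List.of("eax", "ebp")),
            new Insn("mov %eax,0xfffffff4(%ebp)", List.<String>of(), List.of("eax", "ebp")));
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> chains = udChains(sample());
        System.out.println(chains.get(1).get("eax")); // 0: eax comes from the first mov
        System.out.println(chains.get(2).get("eax")); // 1: eax comes from the sub
        System.out.println(chains.get(1).get("ebp")); // -1: defined outside this block
    }
}
```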
Step 2 & Step 3
A lexical successor tree is used to represent the next high-level statement at the same nesting level of a given statement
Unconditional jumps and return instructions break the flow of control of the program, so they are added to the final slice if they lie on a path to the instructions in the slice
Fix target labels by checking all the jumps that belong to the slice
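One plausible reading of the label fix-up, sketched here as an assumption rather than as the paper's exact rule: when a retained jump targets an instruction that was sliced away, move the label to the nearest post-dominator of the old target that is still in the slice. The class name, the label names L1..L3, and the precomputed immediate post-dominator map are all hypothetical.

```java
import java.util.Map;
import java.util.Set;

class JumpFixupSketch {
    // Walks up the post-dominator tree from a jump target until it reaches an
    // instruction that survived slicing; returns null if none does.
    static String retarget(String target, Map<String, String> ipdom, Set<String> slice) {
        String t = target;
        while (t != null && !slice.contains(t)) {
            t = ipdom.get(t); // immediate post-dominator, null at the tree root
        }
        return t;
    }

    public static void main(String[] args) {
        // Assumed post-dominator tree: L1 -> L2 -> L3 -> STOP (child -> parent).
        Map<String, String> ipdom = Map.of("L1", "L2", "L2", "L3", "L3", "STOP");
        // Assumed slice: only L3 and STOP were kept.
        Set<String> slice = Set.of("L3", "STOP");
        System.out.println(retarget("L1", ipdom, slice)); // L3
        System.out.println(retarget("L3", ipdom, slice)); // L3
    }
}
```

A jump whose target is already in the slice is left unchanged, so the pass can be run over every jump in the slice without first checking which ones need fixing.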