Using Dyninst for Program Binary Analysis and Instrumentation
Transcript of Using Dyninst for Program Binary Analysis and Instrumentation
Paradyn Project
Paradyn / Dyninst Week
Madison, Wisconsin
April 29 - May 1, 2013
Using Dyninst for Program Binary Analysis and Instrumentation
Emily Jacobson
No Source Code, No Problem
With Dyninst we can:
o Find (stripped) code
  o in program binaries
  o in live processes
o Analyze code
  o functions
  o control-flow graphs
  o loop, dominator analyses
o Instrument code
  o statically (rewrite binary)
  o dynamically (instrument live process)
Using Dyninst for Analysis and Instrumentation
[Figure: Dyninst operates on executables (a.out, prog.exe), libraries (lib.so, lib.dll), and live processes (an executable plus Library 1 … Library N).]
Choice of Static vs. Dynamic Instrumentation
Static Rewriting
o Amortize parsing and instrumentation time.
o Potential to generate more efficient modified binaries.
o 1st party response to runtime events
Dynamic Instrumentation
o Execute instrumentation at a particular time (oneTimeCode).
o Insert and remove instrumentation at run time.
o 3rd party response to runtime events
Example Dyninst Program
• Find memory leaks
  • Add printfs to malloc, free
  • Stackwalk malloc calls that are not freed
ChaosPro ver 3.1
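The leak-finding goal above boils down to pairing each malloc with a later free. The following is a minimal sketch of that bookkeeping in plain C++, independent of Dyninst; `LeakTracker`, `onMalloc`, and `onFree` are hypothetical names chosen for illustration, and in a real mutator the events would come from instrumentation inserted into malloc and free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical bookkeeping a mutator could keep while observing malloc/free.
class LeakTracker {
public:
    // Record a block returned by malloc; a NULL result is not tracked.
    void onMalloc(std::uintptr_t addr, std::size_t size) {
        if (addr != 0) live_[addr] = size;
    }
    // Forget a block passed to free; unknown or NULL pointers are ignored.
    void onFree(std::uintptr_t addr) { live_.erase(addr); }
    // Addresses that were allocated but not yet freed, in ascending order.
    std::vector<std::uintptr_t> leaked() const {
        std::vector<std::uintptr_t> out;
        for (const auto &entry : live_) out.push_back(entry.first);
        return out;
    }
    // Total number of bytes still outstanding.
    std::size_t leakedBytes() const {
        std::size_t total = 0;
        for (const auto &entry : live_) total += entry.second;
        return total;
    }
private:
    std::map<std::uintptr_t, std::size_t> live_;  // address -> size of live block
};
```

Whatever remains in the tracker when the mutatee exits is a leak candidate; those are the allocations whose call stacks are worth recording.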
Dyninst Components
• Code Generator
• Instrumenter
• Stack Walker (StackwalkerAPI)
• Process Controller (ProcControlAPI)
• Symbol Table Parser (SymtabAPI)
• Code Parser (ParsingAPI)
• Instruction Decoder (InstructionAPI)
[Figure: the components operate on binary code and serve instrumentation requests, stack walk requests, and analysis requests.]
Process Control
• Several supported OSes (Linux, Windows)
• Broad functionality
  • Attach/create process
  • Monitor process status changes
  • Callbacks for fork/exec/exit
  • Mutatee operations: malloc, load library, inferior RPC
• Uses debugger interface
[Figure: the analyst program (mutator) links the Dyninst library, whose Process Controller drives the monitored process (mutatee, with the Dyninst runtime library loaded) through the debugger interface.]
Dyninst’s Process Interface
http://paradyn.org/html/manuals.html
Example: Create a ChaosPro.exe Process
#include <cstdio>
#include "BPatch.h"
#include "BPatch_process.h"

BPatch bpatch;

// Invoked by Dyninst when the mutatee exits.
static void exitCallback(BPatch_thread *, BPatch_exitType)
{
    printf("About to exit\n");
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "Usage: %s prog_filename\n", argv[0]);
        return 1;
    }
    // Launch the mutatee; argv + 1 becomes its argument vector.
    BPatch_process *proc =
        bpatch.processCreate(argv[1], const_cast<const char **>(argv + 1));
    bpatch.registerExitCallback(exitCallback);
    proc->continueExecution();
    // Process Dyninst events until the mutatee terminates.
    while (!proc->isTerminated())
        bpatch.waitForStatusChange();
    return 0;
}
> mutator.exe C:\Chaos\ChaosPro.exe
Unified Abstractions
BPatch_addressSpace: add/remove instrumentation, lookups by address, allocate variables in mutatee
• BPatch_process (live process: a.out, libc.so): process state, threads, one-time instrumentation
• BPatch_binaryEdit (files on disk: a.out, libc.so): write file
Symbol Table Parsing
Where are malloc, free?
[Figure: component diagram highlighting the Symbol Table Parser. The mutator links the Dyninst library; the mutatee consists of chaospro.exe, msvcrt.dll, and the runtime library.]
Supported file formats: PE, ELF, XCOFF
Information provided by the Symbol Table Parser:
• Program headers
• Shared object dependencies
• Type information
• Exception information
• Symbols
• Symbol versions
• Section headers
• Section data
• Dynamic segment information
• Relocations
• Local variable information
• Line number information
Example symbol table (columns as extracted from the slide):
Symbol: func1, func2, variable1
Address: 0x0804cd1d, 0x0804cc84, 0x0804cd00
Size: 100, 4500
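To make the "where are malloc, free?" question concrete, here is a small sketch of the two lookups a symbol table supports: by name, and by an address that falls inside a symbol's range. It is plain C++ rather than SymtabAPI code; `Symbol`, `SymbolIndex`, and the member names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// One symbol table row: name, start address, size in bytes.
struct Symbol {
    std::string name;
    std::uint64_t addr;
    std::uint64_t size;
};

// Hypothetical index over symbol rows (not a Dyninst class).
class SymbolIndex {
public:
    void add(const Symbol &s) {
        byName_[s.name] = s;
        byAddr_[s.addr] = s;
    }
    // Returns nullptr when no symbol has this name.
    const Symbol *findByName(const std::string &name) const {
        auto it = byName_.find(name);
        return it == byName_.end() ? nullptr : &it->second;
    }
    // Returns the symbol whose [addr, addr + size) range covers pc, or nullptr.
    const Symbol *findByAddress(std::uint64_t pc) const {
        auto it = byAddr_.upper_bound(pc);   // first symbol starting after pc
        if (it == byAddr_.begin()) return nullptr;
        --it;                                // last symbol starting at or before pc
        const Symbol &s = it->second;
        return pc < s.addr + s.size ? &s : nullptr;
    }
private:
    std::map<std::string, Symbol> byName_;
    std::map<std::uint64_t, Symbol> byAddr_;
};
```

The address lookup is what lets a tool map a program counter or return address back to a function name.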
Example: Find malloc
int main(int argc, char *argv[])
{
    ...
    BPatch_image *image = proc->getImage();
    BPatch_module *libc = image->findModule("msvcrt");

    // findFunction fills 'found' with the functions named malloc.
    vector< BPatch_function * > found;
    vector< BPatch_function * > *funcs = libc->findFunction("malloc", found);
    BPatch_function *bp_malloc = (*funcs)[0];

    char *start = (char *) bp_malloc->getBaseAddr();
    unsigned int size = bp_malloc->getSize();
    printf("malloc: [%p %p]\n", (void *) start, (void *) (start + size));
    ...
}
[Figure: the mutator, through the Dyninst library, looks up malloc in msvcrt.dll inside the mutatee (chaospro.exe, msvcrt.dll, runtime library).]
Decoding and Parsing of Binary Code
Get parameters, return values for malloc, free
[Figure: component diagram highlighting the Code Parser and Instruction Decoder. The mutator links the Dyninst library; the mutatee consists of chaospro.exe, msvcrt.dll, and the runtime library.]
Instruction Decoding
Supported architectures: IA32, AMD64, POWER
Mutatee bytes given to the Instruction Decoder:
8b 04 99 20 e9 3d e0 09 e8 68 c0 45 be 79 5e 80 89 08 27 c0 73
1c 88 48 6a d8 6a d0 56 4b fe 92 57 af 40 0c b6 f2 64 32 f5 07
57 af 40 0c b6 f2 64 32 f5 07 b6 66 21 0c 85 a5 94 2b 20 fd 5b
Abstract syntax tree for mov eax -> [ebx * 4 + ecx]:
mov
  eax
  deref [ebx * 4 + ecx]
    add
      mult
        ebx
        4
      ecx
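The operand tree on this slide can be modeled as a small expression tree that is evaluated against register values. The sketch below is plain C++ and does not use InstructionAPI; `Expr`, `Reg`, `Imm`, `Add`, `Mult`, and `slideAddress` are hypothetical names. It computes only the address ebx * 4 + ecx; the deref node would then read memory at that address.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

using Registers = std::map<std::string, std::uint32_t>;

// Base class for nodes of an operand expression tree.
struct Expr {
    virtual ~Expr() = default;
    virtual std::uint32_t eval(const Registers &regs) const = 0;
};
using ExprPtr = std::shared_ptr<Expr>;

// Leaf: the current value of a named register (throws if it is missing).
struct Reg : Expr {
    std::string name;
    explicit Reg(std::string n) : name(std::move(n)) {}
    std::uint32_t eval(const Registers &regs) const override { return regs.at(name); }
};

// Leaf: an immediate constant.
struct Imm : Expr {
    std::uint32_t value;
    explicit Imm(std::uint32_t v) : value(v) {}
    std::uint32_t eval(const Registers &) const override { return value; }
};

struct Add : Expr {
    ExprPtr lhs, rhs;
    Add(ExprPtr l, ExprPtr r) : lhs(std::move(l)), rhs(std::move(r)) {}
    std::uint32_t eval(const Registers &regs) const override {
        return lhs->eval(regs) + rhs->eval(regs);
    }
};

struct Mult : Expr {
    ExprPtr lhs, rhs;
    Mult(ExprPtr l, ExprPtr r) : lhs(std::move(l)), rhs(std::move(r)) {}
    std::uint32_t eval(const Registers &regs) const override {
        return lhs->eval(regs) * rhs->eval(regs);
    }
};

// Builds the address expression ebx * 4 + ecx shown on the slide.
ExprPtr slideAddress() {
    return std::make_shared<Add>(
        std::make_shared<Mult>(std::make_shared<Reg>("ebx"), std::make_shared<Imm>(4)),
        std::make_shared<Reg>("ecx"));
}
```

Keeping operands as trees rather than strings is what allows later analyses to ask which registers an instruction reads or what address it touches.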
Parsing
• Identifies basic blocks, functions
• Builds control-flow graph
• Operates on stripped code, but uses symbol information opportunistically
Parse-time analyses:
[Figure: the Instruction Decoder feeds the Code Parser; the mutatee bytes and the decoded mov eax -> [ebx * 4 + ecx] tree are repeated from the Instruction Decoding slide. Supported architectures: IA32, AMD64, POWER.]
Binary Code Parsing
Task: instrument malloc at its entry and exit points, instrument free at its entry point
Subtask: find malloc and parse it
msvcrt.dll symbols:
malloc   77C2C407
free     77C2C21B
atoi     77C1BE7B
strcpy   77C46030
memmove  77C472B0
[Figure: the Process Controller and Symbol Table Parser locate malloc in the mutatee (chaospro.exe, msvcrt.dll); the Instruction Decoder and Code Parser then turn the bytes at that address into instructions such as mov eax -> [ebx * 4 + ecx].]
Control Flow Traversal Parsing
• Function symbols may be sparse
  • Executables are required to provide only one function address
  • Libraries provide symbols for exported functions
• Parsing finds additional functions by following call edges
_start  [80483b0 80483fa]
_init   [8048354 804836b]
_fini   [8048580 804859c]
main    [8048480 80484cf]
targ3d4 [80483d4 80483fa]
targ400 [8048400 804843e]
targ440 [8048440 8048468]
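Following call edges from a few known entry points is a worklist traversal. The sketch below shows the idea on a toy call map in plain C++; it is not ParsingAPI code, `CallEdges` and `discoverFunctions` are hypothetical names, and a real parser would obtain the call targets by decoding instructions rather than from a prepared table.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using Addr = std::uint64_t;
// Toy stand-in for decoded code: function entry address -> addresses it calls.
using CallEdges = std::map<Addr, std::vector<Addr>>;

// Returns every function entry reachable from the seed addresses via call edges.
std::set<Addr> discoverFunctions(const CallEdges &calls, const std::vector<Addr> &seeds) {
    std::set<Addr> found;
    std::vector<Addr> worklist(seeds.begin(), seeds.end());
    while (!worklist.empty()) {
        Addr entry = worklist.back();
        worklist.pop_back();
        if (!found.insert(entry).second) continue;  // already visited
        auto it = calls.find(entry);
        if (it == calls.end()) continue;            // no outgoing calls recorded
        for (Addr target : it->second) worklist.push_back(target);
    }
    return found;
}
```

Seeding the traversal with only the entry point still reaches every function connected to it by direct calls; functions reached only through indirect calls or never called would need other techniques.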
Control Flow Graph
[Figure: control-flow graph of a function, with nodes marked E, C, and R.]
Example: Find malloc’s Exit Points
• Graph elements:
  • BPatch_function
  • BPatch_basicBlock
  • BPatch_edge
• Instrumentation points:
  • BPatch_point
    Address pointAddr;
    BPatch_procedureLocation type;
    enum { BPatch_entry, BPatch_exit, BPatch_subroutine, BPatch_address }
vector< BPatch_function * > found, * funcs;
• funcs = bp_image->getProcedures();
• funcs = bp_image->findFunction("malloc", found);
Parsing is triggered automatically as needed
[Figure: mutatee modules chaospro.exe, msvcrt.dll (containing malloc), and kernel32.dll; malloc's control-flow graph has nodes marked E, C, and R.]
Example: Find malloc’s Exit Points
vector< BPatch_function * > found, * funcs;
• funcs = bp_image->findFunction("malloc", found);
• funcs = libc_mod->findFunction("malloc", found);
Parsing is triggered automatically as needed
[Figure: mutatee modules chaospro.exe, msvcrt.dll (containing malloc), and kernel32.dll; malloc's control-flow graph has nodes marked E, C, and R.]
Example: Find malloc’s Exit Points
BPatch_function * bp_malloc = (*funcs)[0];
vector< BPatch_point * > * points = bp_malloc->findPoint( BPatch_exit );
• The location argument can be BPatch_entry, BPatch_subroutine, or BPatch_exit; BPatch_exit selects malloc's exit points
[Figure: malloc's control-flow graph inside msvcrt.dll, with nodes marked E, C, and R; the mutatee also contains chaospro.exe and kernel32.dll.]
Instrumentation (at last!)
[Figure: component diagram highlighting the Code Generator and Instrumenter. The mutator links the Dyninst library; the mutatee consists of chaospro.exe, msvcrt.dll, and the runtime library.]
Specifying Instrumentation Requests
Instrumentation requests go to the Code Generator and Instrumenter:
• where: instrumentation points
• what: a snippet, given as an abstract syntax tree
BPatch_snippet Subclasses
• BPatch_sequence( vector< BPatch_snippet * > items )
• BPatch_variableExpr()
• BPatch_constExpr( int value ), BPatch_constExpr( char * value ), BPatch_constExpr( void * value )
• BPatch_ifExpr( BPatch_boolExpr condition, BPatch_snippet then_clause, BPatch_snippet else_clause )
• BPatch_funcCallExpr( const BPatch_function & func, vector< BPatch_snippet * > args )
• BPatch_paramExpr( int param_number )
• BPatch_retExpr()
BPatch_snippet Classes
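A snippet is a tree of such nodes. To show how constant, parameter, comparison, and conditional nodes compose, here is a toy interpreter in plain C++. It is only an analogy: Dyninst compiles snippets to machine code instead of interpreting them, and every name here (`Context`, `Node`, `Const`, `Param`, `Greater`, `Record`, `If`, `recordLargeParam`) is hypothetical. `Record` stands in for a function-call node such as a call to printf.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// State visible at an instrumentation point (hypothetical model).
struct Context {
    std::vector<std::int64_t> params;  // arguments of the instrumented function
    std::vector<std::int64_t> log;     // values emitted by Record nodes
};

struct Node {
    virtual ~Node() = default;
    virtual std::int64_t eval(Context &ctx) const = 0;
};
using NodePtr = std::shared_ptr<Node>;

struct Const : Node {                  // plays the role of a constant expression
    std::int64_t value;
    explicit Const(std::int64_t v) : value(v) {}
    std::int64_t eval(Context &) const override { return value; }
};

struct Param : Node {                  // plays the role of a parameter expression
    std::size_t index;
    explicit Param(std::size_t i) : index(i) {}
    std::int64_t eval(Context &ctx) const override { return ctx.params.at(index); }
};

struct Greater : Node {                // boolean "greater than" comparison
    NodePtr lhs, rhs;
    Greater(NodePtr l, NodePtr r) : lhs(std::move(l)), rhs(std::move(r)) {}
    std::int64_t eval(Context &ctx) const override {
        return lhs->eval(ctx) > rhs->eval(ctx) ? 1 : 0;
    }
};

struct Record : Node {                 // stand-in for a call: appends its argument to the log
    NodePtr arg;
    explicit Record(NodePtr a) : arg(std::move(a)) {}
    std::int64_t eval(Context &ctx) const override {
        std::int64_t v = arg->eval(ctx);
        ctx.log.push_back(v);
        return v;
    }
};

struct If : Node {                     // runs the then-clause only when the condition is nonzero
    NodePtr cond, thenClause;
    If(NodePtr c, NodePtr t) : cond(std::move(c)), thenClause(std::move(t)) {}
    std::int64_t eval(Context &ctx) const override {
        return cond->eval(ctx) != 0 ? thenClause->eval(ctx) : 0;
    }
};

// Builds the tree for: if (param0 > threshold) record(param0)
NodePtr recordLargeParam(std::int64_t threshold) {
    auto cond = std::make_shared<Greater>(std::make_shared<Param>(0),
                                          std::make_shared<Const>(threshold));
    auto body = std::make_shared<Record>(std::make_shared<Param>(0));
    return std::make_shared<If>(cond, body);
}
```

Evaluating `recordLargeParam(100)` with a first parameter of 500 appends 500 to the log, while a first parameter of 50 leaves the log empty.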
Example: Forming printf Snippet
Goal: call printf( "free(%x)\n" , arg0 ); at the entry (E) of free(ptr)
BPatch_funcCallExpr ( const BPatch_function & func, vector< BPatch_snippet * > args )
• func: BPatch_function bp_printf
• args: vector holding
  • BPatch_constExpr "free(%x)\n"
  • BPatch_paramExpr arg0(0)
Example: Instrument free w/ call to printf
BPatch_function * bp_free;
vector< BPatch_point * > entryPoints;
...
BPatch_constExpr arg0( "free(%x)\n" );   // format string
BPatch_paramExpr arg1( 0 );              // first argument of free
vector< BPatch_snippet * > printf_args;
printf_args.push_back( & arg0 );
printf_args.push_back( & arg1 );
BPatch_funcCallExpr callPrintf( *bp_printf, printf_args );

// Group the insertions so they are applied together.
proc->beginInsertionSet();
for ( size_t idx = 0; idx < entryPoints.size(); idx++ )
    proc->insertSnippet( callPrintf, *entryPoints[idx] );
proc->finalizeInsertionSet( true );
[Figure: the BPatch_funcCallExpr tree (bp_printf plus a vector holding the BPatch_constExpr "free(%x)\n" and the BPatch_paramExpr for argument 0) attached to the entry point E of free(ptr).]
Using Variables
• Find / create variable
  bp_image->findVariable("global1");
  bp_proc->malloc(*bp_image->findType("int"));
• Initialization instrumentation
  • e.g., assignment at entry point of main
• Manipulation instrumentation
  • e.g., arithmetic assignment expression
• Gather / print out values
  • e.g., through callback instrumentation
Example: Instrumenting malloc
malloc instrumentation: save argument in a variable
void * malloc ( size_t size )
{
    MALLOC_ARG = size;
    ...
    if (MALLOC_ARG > 100)
        printf("%x = malloc(%x)\n", retnValue, MALLOC_ARG);
}
[Figure: malloc's entry (E) and return (R) points; the entry snippet is drawn as a BPatch_arithExpr tree with nodes labeled BPatch_assign, MALLOC_ARG, and BPatch_constExpr (1).]
Example: Instrumenting malloc
Snippet inserted at malloc's return points (R):
BPatch_ifExpr
  BPatch_boolExpr
    BPatch_gt
    MALLOC_ARG
    BPatch_constExpr(100)
  BPatch_funcCallExpr
    BPatch_function bp_printf
    vector
      BPatch_constExpr "%x = malloc(%x)\n"
      BPatch_retExpr retnValue
      MALLOC_ARG
void * malloc ( size_t size )
{
    MALLOC_ARG = size;
    ...
    if (MALLOC_ARG > 100)
        printf("%x = malloc(%x)\n", retnValue, MALLOC_ARG);
}
Generating the Instrumentation Code
[Figure: the Code Generator and Instrumenter take the instrumentation snippet (the BPatch_funcCallExpr tree for printf("free(%x)\n", arg0)) and the code at the instrumented point (mov eax -> [ebx * 4 + ecx]) and generate code for IA32, AMD64, or POWER.]
Stack Walking
[Figure: component diagram highlighting the Stack Walker. The mutator links the Dyninst library; the mutatee consists of chaospro.exe, msvcrt.dll, and the runtime library.]
Example: Stack Walk of malloc Call
• Choose instrumentation point
  • the exit points of malloc
• Insert callback instrumentation
  • use stopThreadExpr snippet
• Callback triggers stackwalk
  • BPatch_thread::getCallStack(…)
[Figure: the Stack Walker in the mutator's Dyninst library walks the mutatee's stack (chaospro.exe, msvcrt.dll, runtime library) from malloc's return points (R).]
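For readers new to stack walking, the sketch below shows one generic technique, following a chain of saved frame pointers, over a toy memory image in plain C++. It is not StackwalkerAPI code and makes no claim about how Dyninst's Stack Walker is implemented. The frame layout (caller's frame pointer at [fp], return address at [fp + 8]) and the names `Memory` and `walkStack` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using Addr = std::uint64_t;
using Memory = std::map<Addr, Addr>;  // toy memory: address -> 8-byte word

// Walks a frame-pointer chain and returns the return addresses, innermost first.
// Assumed layout: [fp] holds the caller's frame pointer, [fp + 8] the return address.
std::vector<Addr> walkStack(const Memory &mem, Addr fp) {
    std::vector<Addr> returnAddrs;
    std::set<Addr> seen;                              // guards against cyclic chains
    while (fp != 0 && seen.insert(fp).second) {
        auto saved = mem.find(fp);
        auto ret = mem.find(fp + 8);
        if (saved == mem.end() || ret == mem.end()) break;  // unreadable frame
        returnAddrs.push_back(ret->second);
        fp = saved->second;                           // move to the caller's frame
    }
    return returnAddrs;
}
```

Each return address can then be mapped back to a function through the symbol table, which turns the walk into a readable call chain for the malloc call.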
Implementation Session
Code Coverage
• Create a mutator that counts function invocations
• See description of the lab at http://www.paradyn.org/tutorial/