Paradyn/Condor Week Madison, WI March 12-14, 2001
Efficient x86 Instrumentation: Dynamic Rewriting and Function Relocation
Itai [email protected]
Computer Science Department
University of Wisconsin
1210 W. Dayton St.
Madison, WI 53706-1685
Efficient x86 Instrumentation: Dynamic Rewriting and Function Relocation Page 2
Introduction
Dynamic Instrumentation:
• Insert instrumentation into an application during execution
• Used by Paradyn to gather performance data
• Paradyn instrumentation is inserted at three types of points
  – function entry, exit, and call
Paradyn Instrumentation Points
Executable Code:
  foo () {          (Entry)
    call <bar>      (Call)
  }                 (Exit)
Instrumentation: startTimer(), stopTimer(), counter++
Goal
Transfer from function to instrumentation code as quickly as possible.
Control Transfer
To switch execution from a function to its instrumentation code:
– Overwrite instructions in the function with a control transfer instruction.
– Equivalents of the overwritten instructions are copied to the code patch area.
– On the x86, Paradyn uses, by default, a 5-byte jump to transfer control to the instrumentation code.
  • The range of a 5-byte jump is the whole address space.
– If a 5-byte instruction won't fit, we use a 1-byte trap (int3 instruction).
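The two control transfer instructions named on this slide can be sketched as byte encodings. This is a minimal illustration, not Paradyn's code: the function names and the `bytes_available` parameter are assumptions made for the example. The opcodes are the standard x86 ones (0xE9 for a jump with a 32-bit displacement, 0xCC for int3).

```python
import struct

JMP_REL32_OPCODE = 0xE9   # jmp with a 32-bit PC-relative displacement (5 bytes total)
INT3_OPCODE = 0xCC        # 1-byte breakpoint trap
JMP_SIZE = 5


def encode_jmp_rel32(site, target):
    """Encode a 5-byte jump placed at address `site` that lands on `target`.

    The displacement is measured from the end of the jump instruction and
    wraps modulo 2**32, which is why the jump can reach any 32-bit address.
    """
    disp = (target - (site + JMP_SIZE)) & 0xFFFFFFFF
    return bytes([JMP_REL32_OPCODE]) + struct.pack("<I", disp)


def control_transfer(site, target, bytes_available):
    """Hypothetical helper: prefer the jump, fall back to a trap.

    `bytes_available` is the number of bytes at `site` that may safely be
    overwritten (an assumption of this sketch, computed elsewhere).
    """
    if bytes_available >= JMP_SIZE:
        return encode_jmp_rel32(site, target)
    return bytes([INT3_OPCODE])
```

With a trap, the target is not encoded in the instruction at all, so the handler has to recover it from the trap address.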
Inserting Control Transfer Instructions
• Dynamically rewrite the function in place
• Different techniques for different types of instrumentation points
Jumps and Traps: Instrument Entry Point, Case 1
Original:     push mov sub
Instrumented: jmp <instrumentation>
Enough room to replace the instructions with a jump.
Jumps and Traps: Instrument Entry Point, Case 2
Original:     push mov jmp
With a jump:  jmp <instrumentation>
Inserting a jump instruction interferes with the target of the backwards jump.
With a trap:  int3 mov jmp
Must use a trap instruction to get to instrumentation.
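The two entry cases can be sketched as a small check. This is an illustration under stated assumptions, not Paradyn's algorithm: the function name, the instruction sizes, and the rule that a branch target strictly inside the overwritten bytes forces a trap are all choices made for this example.

```python
JMP_SIZE = 5  # size of the jump that would overwrite the entry


def entry_transfer(entry, instr_sizes, branch_targets):
    """Return "jump" or "trap" for a function entry point.

    entry          -- address of the first instruction
    instr_sizes    -- byte sizes of the instructions starting at entry
    branch_targets -- addresses that jumps elsewhere in the function target
    """
    # Whole instructions must be overwritten, so keep adding instructions
    # until at least JMP_SIZE bytes are covered.
    covered = 0
    for size in instr_sizes:
        if covered >= JMP_SIZE:
            break
        covered += size
    if covered < JMP_SIZE:
        return "trap"  # the function is too short to hold the jump

    # A branch landing in the middle of the overwritten bytes would
    # execute part of the jump's displacement as code.
    if any(entry < target < entry + covered for target in branch_targets):
        return "trap"
    return "jump"
```

For example, with illustrative sizes push = 1, mov = 2, sub = 3 and no branch targets (Case 1), the result is "jump". With push = 1, mov = 2, jmp = 2 and a backwards jump that targets the mov (Case 2), the result is "trap".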
Jumps and Traps: Instrument Call Point
Original:     call <Foo>
Instrumented: jmp <instrumentation>
Enough room to replace the instruction with a jump (a direct call is 5 bytes, the same size as the jump).
Jumps and Traps: Instrument Exit Point, Case 1
Original:     mov leave ret
Instrumented: jmp <instrumentation>
Back up far enough to replace instructions with a jump.
Jumps and Traps: Instrument Exit Point, Case 2
Original:     call <Foo> leave ret
With a jump:  call jmp <instrumentation>
Jump interferes with the preceding call: leave and ret occupy only 2 bytes, so a 5-byte jump would overwrite part of the call.
Jumps and Traps: Instrument Exit Point, Case 2a
Original:     call <Foo> leave ret ? ? ?
Compiler pads with “bonus bytes” up to the beginning of the next function (4-byte boundary).
Instrumented: call <Foo> jmp <instrumentation>
Replace instructions with a jump that extends into the bonus bytes.
Jumps and Traps: Instrument Exit Point, Case 2b
Original:     call <Foo> leave ret ?
Not enough “bonus bytes” (if any) to overwrite with a jump.
Instrumented: call <Foo> leave int3 ?
Overwrite the return with a trap.
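Cases 2a and 2b differ only in how many padding bytes follow the return. The sketch below makes that arithmetic explicit. It assumes that the next function starts at the next 4-byte boundary and that every byte between the return and that boundary is unused padding; the function names and the `overwritable` parameter are hypothetical.

```python
JMP_SIZE = 5


def bonus_bytes(ret_end, alignment=4):
    """Padding bytes between the end of ret and the next aligned boundary."""
    return (-ret_end) % alignment


def exit_transfer(overwritable, ret_end):
    """Return "jump" or "trap" for a function exit point.

    overwritable -- bytes that may safely be overwritten up to and
                    including ret (e.g. leave + ret = 2 bytes)
    ret_end      -- address of the first byte after ret
    """
    available = overwritable + bonus_bytes(ret_end)
    return "jump" if available >= JMP_SIZE else "trap"
```

With leave and ret overwritable (2 bytes), a return ending at 0x1005 leaves 3 bonus bytes, which is just enough for the jump (Case 2a). A return ending at 0x1007 leaves 1 bonus byte, so the sketch falls back to the trap (Case 2b).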
Jumps and Traps: Extra Slot
Original:     push mov sub mov
No jumps to the first ten bytes of the function.
Enough space to overwrite the entry with a jump.
Instrumented: jmp <instrumentation> jmp <instrumentation> mov
Make a 2-byte jump to the “extra slot”; overwrite the “extra slot” with a jump to instrumentation.
Control Transfer: Traps on x86
• Generate an exception that is caught by either the application (Solaris, Linux) or the Paradyn daemon (Windows NT).
• The address of the trap instruction is used to calculate which instrumentation code to execute.
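The address-based lookup can be sketched as a table keyed by trap address. This is a hypothetical illustration: `trap_table`, `register_trap`, and `handle_trap` are names invented here, and the handler is assumed to receive the program counter saved by the processor, which for int3 points at the byte after the trap.

```python
INT3_SIZE = 1  # int3 is a single byte (opcode 0xCC)

# Hypothetical table: address of each inserted trap -> instrumentation routine.
trap_table = {}


def register_trap(trap_addr, instrumentation):
    """Record which instrumentation belongs to the trap at trap_addr."""
    trap_table[trap_addr] = instrumentation


def handle_trap(reported_pc):
    """Run the instrumentation for the trap that was just hit.

    Assumes reported_pc is the address following the int3, so the trap
    itself sits one byte earlier.
    """
    trap_addr = reported_pc - INT3_SIZE
    instrumentation = trap_table.get(trap_addr)
    if instrumentation is None:
        raise LookupError(f"no instrumentation registered at {trap_addr:#x}")
    return instrumentation()
```

Every trap pays for the exception delivery plus this lookup before any instrumentation runs, whereas a jump reaches the instrumentation directly.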
Problem
Trap handling is slow:
• On Solaris 2.6, jumps are over 1000 times faster than traps.
• On Linux 2.2, jumps are over 200 times faster than traps.
Traps limit instrumentation:
• Can't insert as much, or at as fine a granularity
Trap handling logic is difficult:
• Susceptible to bugs
• Difficult to understand and maintain
Solution
Rewrite functions that do not have enough room for jumps into functions that do have enough room for jumps.
– Rewrite the function on the fly: combines dynamic instrumentation and binary rewriting.
Dynamic Rewriting
• overwrite existing instructions
• expand instrumentation points
• relocate function
Function Rewriting and Relocation
In Paradyn we rewrite a function:
– only if the function contains an instrumentation point that would require using a trap to instrument
– the first time a request to instrument the function is made
– even if the instrumentation to be inserted is not for a point that requires using a trap
  • e.g. the exit needs a trap, the entry can use a jump, and the request is to instrument the entry
When a function is rewritten, all instrumentation points that cannot use a jump are expanded.
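The rewriting policy on these slides can be sketched as a single predicate evaluated when an instrumentation request arrives. The `Point` type and `should_rewrite` function are hypothetical names for this example, and the sketch ignores any other conditions that might prevent relocation.

```python
from dataclasses import dataclass


@dataclass
class Point:
    kind: str         # "entry", "call", or "exit"
    needs_trap: bool  # True if a 5-byte jump does not fit at this point


def should_rewrite(points, already_rewritten):
    """Decide whether to rewrite a function on an instrumentation request.

    The function is rewritten once, on the first request, if any of its
    points would need a trap. The point actually being instrumented does
    not matter.
    """
    if already_rewritten:
        return False
    return any(point.needs_trap for point in points)
```

In the example from the slide, the request is for the entry, which could use a jump, but the exit needs a trap, so `should_rewrite` returns `True` for that function.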
Rewriting A Function
Original function, with its instrumentation points:
  push mov      (Entry)
  call <Bar>    (Call)
  call <Foo>    (Call)
  ret           (Exit)
Rewritten function:
• Insert nop at entry (push mov nop), then overwrite the expanded entry with jmp <instrumentation>.
• Insert nops at exit (nop nop nop nop ret), then overwrite the expanded exit with jmp <instrumentation>.
Original function:
• Overwrite the entry of the original function with a jump to the rewritten function: jmp <rewritten function>.
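The exit expansion can be sketched on a toy instruction list. This is a simplified illustration, not the Paradyn implementation: instructions are modeled as (mnemonic, size) pairs, the sizes are illustrative, and only the ret itself is counted as overwritable, so a 1-byte ret always receives four nops.

```python
JMP_SIZE = 5
NOP = ("nop", 1)  # 1-byte no-op


def expand_exit(instrs):
    """Insert nops before the final ret so a 5-byte jump fits at the exit.

    instrs -- list of (mnemonic, size_in_bytes) pairs ending with a ret
    """
    if not instrs or instrs[-1][0] != "ret":
        raise ValueError("expected a function body ending in ret")
    ret = instrs[-1]
    padding = max(0, JMP_SIZE - ret[1])
    return instrs[:-1] + [NOP] * padding + [ret]


original = [("push", 1), ("mov", 2), ("call", 5), ("call", 5), ("ret", 1)]
rewritten = expand_exit(original)
```

Here `rewritten` ends with four nops followed by ret, matching the "nop nop nop nop ret" sequence on the slide. Because the rewritten copy is longer than the original, it cannot stay in place, which is why the function is relocated and the original entry is redirected to it.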
Update Jumps and Calls
• PC-relative jump and call instructions:
  – those with destinations outside the function will have incorrect displacements
  – some jumps to locations inside the function will have incorrect displacements
• 2-byte jumps:
  – have a signed 8-bit displacement: a range of 127 bytes forward and 128 bytes backward, measured from the end of the jump
  – if the target address is no longer in range, replace the 2-byte instruction with a 5-byte instruction that has further reach
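Re-encoding a jump after relocation can be sketched as follows. The example covers only unconditional jumps, using the standard x86 opcodes 0xEB (2-byte jump, 8-bit displacement) and 0xE9 (5-byte jump, 32-bit displacement); the function name is an assumption of this sketch.

```python
import struct

JMP_REL8_OPCODE = 0xEB    # 2-byte jump, signed 8-bit displacement
JMP_REL32_OPCODE = 0xE9   # 5-byte jump, 32-bit displacement


def encode_jmp(site, target):
    """Encode an unconditional jump at `site` to `target`.

    Displacements are measured from the end of the jump instruction.
    The 2-byte form is used when the target is within reach; otherwise
    the jump is widened to the 5-byte form.
    """
    disp8 = target - (site + 2)
    if -128 <= disp8 <= 127:
        return bytes([JMP_REL8_OPCODE]) + struct.pack("<b", disp8)
    disp32 = (target - (site + 5)) & 0xFFFFFFFF
    return bytes([JMP_REL32_OPCODE]) + struct.pack("<I", disp32)
```

Widening a jump adds 3 bytes, which shifts every later instruction in the function, so a complete rewriter would have to recompute the remaining displacements after each widening. That iteration is left out of this sketch.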
Status
Dynamic rewriting and function relocation is operational in Paradyn release 3.2 for x86 (Solaris, Linux, Windows NT).
Current Limitations
We do not relocate a function if:
– the application is executing within the function we want to instrument
– it has a jump table
Jumps vs. Traps
Trap handling: average time to get to instrumentation and back (time in microseconds)

           Trap    Jump
Solaris    37.6    0.03
Linux       8.3    0.04
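The ratios quoted earlier in the talk follow directly from these timings. The short calculation below uses only the four numbers in the table; the dictionary layout and function name are assumptions of this sketch.

```python
# Average round-trip time to instrumentation and back, in microseconds.
times_us = {
    "Solaris": {"trap": 37.6, "jump": 0.03},
    "Linux": {"trap": 8.3, "jump": 0.04},
}


def trap_to_jump_ratio(platform):
    """How many times slower a trap is than a jump on the given platform."""
    timing = times_us[platform]
    return timing["trap"] / timing["jump"]


for platform in times_us:
    print(f"{platform}: traps are about {trap_to_jump_ratio(platform):.0f}x slower")
```

The Solaris ratio comes out above 1000 and the Linux ratio above 200, consistent with the "over 1000 times" and "over 200 times" figures.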
Jumps vs. Traps
• Relocating functions that are performance bottlenecks leads to the greatest speedup.
• More instrumentation can be inserted, since perturbation to the system is minimized.
• In Paradyn, the speedup ratio depends on the type of metric (e.g. CPU time, number of procedure calls).
Some Results
bubba (circuit layout)
• instrumented 9 functions for CPU
  – all required a trap for the exit point
  – 5 relocated functions
    • called 400 thousand times
    • consumed 20% of CPU
• 23 seconds to execute using relocation
• 42 seconds to execute without relocation
Some Results
fspx (2-D heat transfer simulation)
• 4 of 46 functions required traps
  – all for exit points
• instrumented __atan for CPU
  – required a trap for exit
  – called 107 million times
  – consumed 25% of CPU
• 7.5 minutes to execute using relocation
• 115 minutes to execute without relocation
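The speedups implied by the two sets of run times can be worked out directly. The calculation uses only the execution times reported on the results slides; the function name is an assumption of this sketch.

```python
def speedup(time_without_relocation, time_with_relocation):
    """Ratio of run time without relocation to run time with relocation."""
    return time_without_relocation / time_with_relocation


bubba_speedup = speedup(42, 23)    # both times in seconds
fspx_speedup = speedup(115, 7.5)   # both times in minutes

print(f"bubba: {bubba_speedup:.1f}x, fspx: {fspx_speedup:.1f}x")
```

bubba runs a little under twice as fast with relocation, while fspx runs roughly fifteen times as fast. The gap is consistent with the call counts: the instrumented fspx function is called 107 million times, against 400 thousand calls for the relocated bubba functions.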
Conclusions
Dynamic rewriting and function relocation:
• Used by Paradyn to replace traps with jumps when profiling applications, which improves performance.
• Crucial for large-scale and fine-grained instrumentation.