Automatic Extraction of Function Bodies from Software Binaries Gaurav Mittal, David Zaretsky, Gokhan Memik, Prith Banerjee (Northwestern University) ASP-DAC 2005

Page 1:

Automatic Extraction of Function Bodies from Software Binaries

Gaurav Mittal, David Zaretsky, Gokhan Memik, Prith Banerjee

(Northwestern University) ASP-DAC 2005

Page 2:

Outline

• Authors

• Motivation

• Function Extraction

• Experimental Results

• References

Page 3:

Gaurav Mittal

2009

• Gaurav Mittal, David Zaretsky, Prithviraj Banerjee: Streaming implementation of a sequential decompression algorithm on an FPGA. FPGA 2009: 283

• Lei Gao, David Zaretsky, Gaurav Mittal, Dan Schonfeld, Prith Banerjee: A software pipelining algorithm in high-level synthesis for FPGA architectures. ISQED 2009: 297-302

2007

• David Zaretsky, Gaurav Mittal, Robert P. Dick, Prith Banerjee: Balanced Scheduling and Operation Chaining in High-Level Synthesis for FPGA Designs. ISQED 2007: 595-601

• Gaurav Mittal, David Zaretsky, Xiaoyong Tang, Prithviraj Banerjee: An Overview of a Compiler for Mapping Software Binaries to Hardware. IEEE Trans. VLSI Syst. 15(11): 1177-1190 (2007)

2006

• Gaurav Mittal, Sushrutha Locharam, Sreela Sasi, Glenn R. Shaffer, Ajith K. Kumar: An Efficient Video Enhancement Method Using LA*B* Analysis. AVSS 2006: 66

• Gaurav Mittal, Sreela Sasi: Robust Preprocessing Algorithm for Face Recognition. CRV 2006: 57

• David Zaretsky, Gaurav Mittal, Robert P. Dick, Prith Banerjee: Dynamic Template Generation for Resource Sharing in Control and Data Flow Graphs. VLSI Design 2006: 465-468

2005

• Gaurav Mittal, David Zaretsky, Gokhan Memik, Prith Banerjee: Automatic extraction of function bodies from software binaries. ASP-DAC 2005: 928-931

• David Zaretsky, Gaurav Mittal, Robert P. Dick, Prith Banerjee: Generation of Control and Data Flow Graphs from Scheduled and Pipelined Assembly Code. LCPC 2005: 76-90

Page 4:

David Zaretsky

2009

• Gaurav Mittal, David Zaretsky, Prithviraj Banerjee: Streaming implementation of a sequential decompression algorithm on an FPGA. FPGA 2009: 283

• Lei Gao, David Zaretsky, Gaurav Mittal, Dan Schonfeld, Prith Banerjee: A software pipelining algorithm in high-level synthesis for FPGA architectures. ISQED 2009: 297-302

2007

• David Zaretsky, Gaurav Mittal, Robert P. Dick, Prith Banerjee: Balanced Scheduling and Operation Chaining in High-Level Synthesis for FPGA Designs. ISQED 2007: 595-601

• Gaurav Mittal, David Zaretsky, Xiaoyong Tang, Prithviraj Banerjee: An Overview of a Compiler for Mapping Software Binaries to Hardware. IEEE Trans. VLSI Syst. 15(11): 1177-1190 (2007)

2006

• David Zaretsky, Gaurav Mittal, Robert P. Dick, Prith Banerjee: Dynamic Template Generation for Resource Sharing in Control and Data Flow Graphs. VLSI Design 2006: 465-468

2005

• Gaurav Mittal, David Zaretsky, Gokhan Memik, Prith Banerjee: Automatic extraction of function bodies from software binaries. ASP-DAC 2005: 928-931

• David Zaretsky, Gaurav Mittal, Robert P. Dick, Prith Banerjee: Generation of Control and Data Flow Graphs from Scheduled and Pipelined Assembly Code. LCPC 2005: 76-90

Page 5:

Gokhan Memik

2009

• Yan Pan, Joonho Kong, Serkan Ozdemir, Gokhan Memik, Sung Woo Chung: Selective wordline voltage boosting for caches to manage yield under process variations. DAC 2009: 57-62

• Yan Pan, Prabhat Kumar, John Kim, Gokhan Memik, Yu Zhang, Alok N. Choudhary: Firefly: illuminating future network-on-chip with nanophotonics. ISCA 2009: 429-440

• Bin Lin, Arindam Mallik, Peter A. Dinda, Gokhan Memik, Robert P. Dick: User- and process-driven dynamic voltage and frequency scaling. ISPASS 2009: 11-22

• Yu Zhang, Berkin Özisikyilmaz, Gokhan Memik, John Kim, Alok N. Choudhary: Analyzing the impact of on-chip network traffic on program phases for CMPs. ISPASS 2009: 218-226

• Alex Shye, Benjamin Scholbrock, Gokhan Memik: Into the wild: studying real user activity patterns to guide power optimizations for mobile architectures. MICRO 2009: 168-178

2008

• Arindam Mallik, Jack Cosgrove, Robert P. Dick, Gokhan Memik, Peter A. Dinda: PICSEL: measuring user-perceived performance to control dynamic frequency scaling. ASPLOS 2008: 70-79

• Alex Shye, Yan Pan, Benjamin Scholbrock, J. Scott Miller, Gokhan Memik, Peter A. Dinda, Robert P. Dick: Power to the people: Leveraging human physiological traits to control microprocessor frequency. MICRO 2008: 188-199

• Abhishek Das, Berkin Özisikyilmaz, Serkan Ozdemir, Gokhan Memik, Joseph Zambreno, Alok N. Choudhary: Evaluating the effects of cache redundancy on profit. MICRO 2008: 388-398

Page 6:

Prith Banerjee

2010

• Prith Banerjee: An Intelligent IT Infrastructure for the Future. ICDCN 2010: 1

2009

• Nikolaos D. Liveris, Hai Zhou, Prithviraj Banerjee: Complete-k-distinguishability for retiming and resynthesis equivalence checking without restricting synthesis. ASP-DAC 2009: 636-641

• Prith Banerjee, Chandrakant D. Patel, Cullen Bash, Parthasarathy Ranganathan: Sustainable data centers: enabled by supply and demand side management. DAC 2009: 884-887

• Gaurav Mittal, David Zaretsky, Prithviraj Banerjee: Streaming implementation of a sequential decompression algorithm on an FPGA. FPGA 2009: 283

• Prith Banerjee: An intelligent IT infrastructure for the future. HPCA 2009: 3-4

• Lei Gao, David Zaretsky, Gaurav Mittal, Dan Schonfeld, Prith Banerjee: A software pipelining algorithm in high-level synthesis for FPGA architectures. ISQED 2009: 297-302

2008

• Nikolaos D. Liveris, Hai Zhou, Prithviraj Banerjee: A dynamic-programming algorithm for reducing the energy consumption of pipelined System-Level streaming applications. ASP-DAC 2008: 42-48

• Nikolaos D. Liveris, Hai Zhou, Robert P. Dick, Prithviraj Banerjee: State space abstraction for parameterized self-stabilizing embedded systems. EMSOFT 2008: 11-20

Page 7:

Asia and South Pacific Design Automation Conference 2010

• Deadline for Paper Submission: 5 PM JST (GMT+9) July 19 (Mon), 2010

• Deadline for University LSI Design Contest: 5 PM JST (GMT+9) July 19 (Mon), 2010

• Notification of acceptance: September 24 (Fri), 2010

• Deadline for Final Version: 5 PM JST (GMT+9) November 15 (Mon), 2010

Page 8:

Motivation

Address  Op      Operands
0x05A0   CallAddEx:
0x05A0   STW     B3,*SP--[0x4]
0x05A4   NOP     2
0x05A8   STW     B4,*+SP[0x2]
0x05AC   STW     A4,*+SP[0x1]
0x05B0   NOP     2
0x05B4   MV      A4,B4
0x05B8   B       B4
0x05BC   LDW     *+SP[0x2],A4
0x05C0   MVK     0x05cc,B3
0x05C4   MVKH    0x0000,B3
0x05C8   NOP     2
0x05CC   LDW     *++SP[0x4],B3
0x05D0   NOP     4
0x05D4   B       B3
0x05D8   NOP     5
0x05DC   add_ex:
0x05DC   SUB     SP,0x8,SP
0x05E0   STW     A4,*+SP[0x1]
0x05E4   NOP     2
0x05E8   B       B3
0x05EC   ADD     SP,0x8,SP
0x05F0   NOP     4
0x05F4   main:
0x05F4   STW     B3,*SP--[0x2]
0x05F8   NOP     2
0x05FC   ZERO    B4
0x0600   CMPGT   10,B4,B0
0x0604   [!B0] B L2
0x0608   NOP     4
0x060C   STW     B4,*+SP[0x1]
0x0610   L1:
0x0610   B       CallAddEx
0x0614   MVK     0x05dc,A4
0x0618   MVK     0x0628,B3
0x061C   MVKH    0x0000,A4
0x0620   MVKH    0x0000,B3
0x0624   NOP
0x0628   RL1:
0x0628   LDW     *+SP[0x1],B4
0x062C   NOP     4
0x0630   ADD     B4,0x1,B4
0x0634   CMPGT   10,B4,B0
0x0638   [B0] B  L1
0x063C   NOP     4
0x0640   STW     B4,*+SP[0x1]
0x0644   L2:
0x0644   ZERO    A4
0x0648   LDW     *++SP[0x2],B3
0x064C   NOP     4
0x0650   B       B3

For example, for the TI chip series, the caller prologue requires the return address to be moved into register ‘B3’ before the branch is executed, while the callee epilogue consists of a jump to register ‘B3’. However, it might not be possible to determine these destinations in all cases. For example, simple inspection does not show whether the branch at instruction 0x05B8 goes to function ‘add_ex’. Resolving it would require knowing the input parameter at compile time, which may not be possible for complicated real-world applications. Thus, if the destination of the call to ‘add_ex’ is not recognized, ‘add_ex’ will not be recognized as a function using caller prologues alone.

Page 9:

Function Extraction

• Their main contribution in this paper is an algorithm to extract function bodies from binaries in which the function boundaries are not clear.

• They use the procedure calling convention to recognize caller prologues and callee epilogues. Initially, these prologues and epilogues are assumed to determine the function bodies. The following steps refine this initial function list to extract the final list of function bodies. During this process, the heuristic needs to maintain information on the identified functions. It also needs one mapping from instruction addresses to labels and one from labels to instruction pointers. This information is maintained as three separate hash structures to reduce processing time. Finally, a function call graph is generated, which is used to identify procedures that can be moved to hardware. Ongoing work on hardware/software partitioning will try to automate the selection process; such techniques are outside the scope of this paper. They measure the success of their algorithm by the fraction of functions discovered.

• First Pass: Caller Prologues and Callee Epilogues.

• Second Pass: Merging Function Bodies.

• Third Pass: Disjoint Set formation.
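The three hash structures and the call graph can be pictured roughly as follows. This is a minimal Python sketch rather than the authors' implementation: the dictionary names, the record layout, and the helpers `enclosing` and `build_call_graph` are hypothetical, and the addresses are taken from the example listing on the Motivation slide, assuming each instruction occupies four bytes as the listing suggests.

```python
from bisect import bisect_right

BASE, WORD = 0x05A0, 4  # first address and instruction size seen in the listing

# Hash 1: instruction address -> label
addr_to_label = {0x05A0: "CallAddEx", 0x05DC: "add_ex", 0x05F4: "main",
                 0x0610: "L1", 0x0628: "RL1", 0x0644: "L2"}

# Hash 2: label -> instruction pointer (here an index into the instruction list)
label_to_insn = {lab: (addr - BASE) // WORD for addr, lab in addr_to_label.items()}

# Hash 3: information on each identified function (start address, callees)
functions = {name: {"start": start, "callees": set()}
             for name, start in [("CallAddEx", 0x05A0),
                                 ("add_ex", 0x05DC),
                                 ("main", 0x05F4)]}


def enclosing(addr):
    """Name of the function whose start address most closely precedes addr.

    Assumes addr is not below the first function start.
    """
    starts = sorted((rec["start"], name) for name, rec in functions.items())
    k = bisect_right([s for s, _ in starts], addr) - 1
    return starts[k][1]


def build_call_graph(call_sites):
    """call_sites: (call address, target) pairs; a target may be a register."""
    unresolved = []
    for site, target in call_sites:
        caller = enclosing(site)
        if target in functions:
            functions[caller]["callees"].add(target)
        else:
            unresolved.append((caller, site, target))  # e.g. a branch to B4
    graph = {name: sorted(rec["callees"]) for name, rec in functions.items()}
    return graph, unresolved


graph, unresolved = build_call_graph([(0x05B8, "B4"), (0x0610, "CallAddEx")])
print(graph)
print(unresolved)
```

Keeping the three mappings as separate dictionaries gives constant-time lookups in each direction, which matches the stated goal of reducing processing time.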

Page 10:

First Pass: Caller Prologues and Callee Epilogues

• This pass traverses the instruction list from top to bottom while searching for caller prologues.

• The purpose of this first pass is to simplify the identification process in the subsequent passes.

Page 11:

In the example code from Figure 1, the call to ‘CallAddEx’ at 0x0610 and its return address are easily recognized, as are the call at 0x05B8 and its return address 0x05CC. In the latter, however, the destination in register ‘B4’ is not clear. While identifying prologues, the last calculated value within B3 is used if one is not found in the current block. In pipelined code, it is possible for other branches to break up the caller prologue to prevent the call from being made in some circumstances [15][17]. For the TI code, the return address in B3 is compared to the projected return address; the branch is identified as a prologue only on a match.
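The B3 check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: it assumes a branch takes effect after five delay cycles with `NOP n` counting as n cycles (which is consistent with the addresses in the listing), it treats the `MVKH` operand as the upper half-word exactly as displayed, and it ignores predicates and parallel instructions. All function names are hypothetical.

```python
BRANCH_DELAY_CYCLES = 5  # assumption: cycles before a branch takes effect

# Instructions from the example listing; labels omitted, ';' separates entries.
LISTING = """
0x05A0 STW B3,*SP--[0x4]; 0x05A4 NOP 2; 0x05A8 STW B4,*+SP[0x2]; 0x05AC STW A4,*+SP[0x1]
0x05B0 NOP 2; 0x05B4 MV A4,B4; 0x05B8 B B4; 0x05BC LDW *+SP[0x2],A4; 0x05C0 MVK 0x05cc,B3
0x05C4 MVKH 0x0000,B3; 0x05C8 NOP 2; 0x05CC LDW *++SP[0x4],B3; 0x05D0 NOP 4; 0x05D4 B B3
0x05D8 NOP 5; 0x05DC SUB SP,0x8,SP; 0x05E0 STW A4,*+SP[0x1]; 0x05E4 NOP 2; 0x05E8 B B3
0x05EC ADD SP,0x8,SP; 0x05F0 NOP 4; 0x05F4 STW B3,*SP--[0x2]; 0x05F8 NOP 2; 0x05FC ZERO B4
0x0600 CMPGT 10,B4,B0; 0x0604 [!B0] B L2; 0x0608 NOP 4; 0x060C STW B4,*+SP[0x1]
0x0610 B CallAddEx; 0x0614 MVK 0x05dc,A4; 0x0618 MVK 0x0628,B3; 0x061C MVKH 0x0000,A4
0x0620 MVKH 0x0000,B3; 0x0624 NOP; 0x0628 LDW *+SP[0x1],B4; 0x062C NOP 4
0x0630 ADD B4,0x1,B4; 0x0634 CMPGT 10,B4,B0; 0x0638 [B0] B L1; 0x063C NOP 4
0x0640 STW B4,*+SP[0x1]; 0x0644 ZERO A4; 0x0648 LDW *++SP[0x2],B3; 0x064C NOP 4; 0x0650 B B3
"""


def parse(listing):
    """Turn the listing into (address, opcode, operands) tuples."""
    insns = []
    for field in listing.replace("\n", ";").split(";"):
        tok = [t for t in field.split() if not t.startswith("[")]  # drop predicates
        if tok:
            args = tuple(tok[2].split(",")) if len(tok) > 2 else ()
            insns.append((int(tok[0], 16), tok[1], args))
    return insns


def after_write(insn, b3):
    """Value of B3 after insn executes; None means the value is unknown."""
    _, op, args = insn
    if op == "B" or not args or args[-1] != "B3":
        return b3  # B3 is not the destination of this instruction
    if op == "MVK":
        return int(args[0], 16)
    if op == "MVKH" and b3 is not None:
        return (int(args[0], 16) << 16) | (b3 & 0xFFFF)
    return None  # e.g. LDW into B3: value not known statically


def cycles(insn):
    _, op, args = insn
    return int(args[0]) if op == "NOP" and args else 1


def first_pass(insns):
    calls, returns, b3 = [], [], None
    for i, insn in enumerate(insns):
        b3 = after_write(insn, b3)  # last calculated value carries across blocks
        addr, op, args = insn
        if op != "B":
            continue
        # Walk the delay slots: B3 may be set there, after the branch itself.
        val, left, j = b3, BRANCH_DELAY_CYCLES, i + 1
        while j < len(insns) and left > 0:
            val = after_write(insns[j], val)
            left -= cycles(insns[j])
            j += 1
        projected = insns[j][0] if j < len(insns) else None
        if val is not None and val == projected:
            calls.append((addr, args[0], val))  # caller prologue
        elif args[0] == "B3":
            returns.append(addr)                # callee epilogue
    return calls, returns


calls, returns = first_pass(parse(LISTING))
for addr, target, ret in calls:
    print(f"call at {addr:#06x} -> {target}, return address {ret:#06x}")
print("returns at", [f"{a:#06x}" for a in returns])
```

On this listing the sketch reports the calls at 0x05B8 (target still the unresolved register B4) and 0x0610, and treats the three `B B3` branches as returns. The conditional branch at 0x0638 is rejected because the last known B3 value, 0x0628, does not equal its projected return address.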


Page 12:

Second Pass: Merging Function Bodies

• The second pass looks for returns within function bodies. When none are found, it merges a copy of the adjacent function’s body with the function that was being processed.

• This assumes that the function bodies are not scattered as fragments within the binary. Callee epilogues are used to recognize function returns. The function returns are changed to artificial jumps to a new label, a control sink, attached to the end of the instruction list as discussed earlier. This aids the interval analysis in the third pass. The generated instruction lists are used by the next pass, which also extracts erroneously merged function bodies.

Page 13:

In the example code from Figure 1, there is no call to ‘main’ and ‘add_ex’ is not recognized as a call destination. Thus the second pass merges ‘add_ex’ and ‘main’ with ‘CallAddEx’. Hence, even though callee epilogues are recognized in pass two, there is no need to split the instruction list.
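The merging rule of the second pass can be illustrated with a small Python sketch. It is an assumption-laden simplification rather than the published algorithm: the instruction tuples, the entry addresses, the `SINK` label name, and the function names are all hypothetical, and a return is recognized simply as `B B3`.

```python
SINK = "SINK"  # hypothetical name for the control-sink label


def is_return(insn):
    _, op, operands = insn
    return op == "B" and operands == "B3"  # callee epilogue: jump to B3


def second_pass(insns, entries):
    """Merge candidate bodies that contain no return with the adjacent body.

    insns:   list of (address, opcode, operands) tuples in address order
    entries: candidate function start addresses from the first pass
             (assumed non-empty, first entry = first instruction address)
    """
    entries = sorted(entries)
    bounds = zip(entries, entries[1:] + [float("inf")])
    bodies = [[i for i in insns if lo <= i[0] < hi] for lo, hi in bounds]

    merged, current = [], bodies[0]
    for nxt in bodies[1:]:
        if any(is_return(i) for i in current):
            merged.append(current)
            current = nxt
        else:
            current = current + nxt  # no return found: absorb the neighbour
    merged.append(current)

    # Redirect every return to the control sink appended after the list.
    rewritten = [[(a, op, SINK if is_return((a, op, x)) else x)
                  for a, op, x in body] for body in merged]
    return rewritten, (None, "NOP", "")  # the sink instruction itself


# Hypothetical instruction list with three candidate entry points.
example = [(0x00, "STW", "B3,*SP--[0x2]"), (0x04, "NOP", "2"),
           (0x08, "ADD", "A4,0x1,A4"), (0x0C, "B", "B3"),
           (0x10, "SUB", "SP,0x8,SP"), (0x14, "B", "B3")]
bodies, sink = second_pass(example, [0x00, 0x08, 0x10])
for body in bodies:
    print([f"{a:#04x} {op} {x}".strip() for a, op, x in body])
```

In this made-up input the first candidate body (0x00 to 0x04) has no return, so it absorbs the body starting at 0x08; the body at 0x10 already ends in a return and stays separate.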


Page 14:

Third Pass: Disjoint Set formation

• This part of the heuristic works on the individual function bodies one by one and tries to weed out function bodies not recognized in pass one. Particularly, the third pass traverses the list of function bodies generated by pass two and analyzes the basic blocks in each function body to recognize any possible errors from the first pass.

• To perform this task, first a control and data flow graph is generated from the instruction list. Information from the first pass is used to generate some of the missing edges. The call instructions are connected to the destinations of their corresponding returns.

• After this step, induction and interval analysis are performed. Induction analysis attempts to identify the values contained in destination registers.

Page 15:

Third Pass: Disjoint Set formation

Figure 2.1 shows the CDFG for the code from Figure 1 and its interval graph. This graph is generated at the third stage. Block BBLK_8 contains the control sink. The basic block 1 (BBLK_1) and basic block 2 (BBLK_2) are disjoint because of the call in line 0x05D4 of Figure 1. After interval analysis, the third pass forms the final interval graph depicted in Figure 2.2. Block BBLK_3 in Figure 2.2 contains the instruction (NOP) bearing the control sink. It denotes the leaf node. If it is removed, the other three blocks, viz. BBLK_0, BBLK_1, BBLK_2, form disjoint sets representing the functions ‘CallAddEx’, ‘add_ex’, and ‘main’, respectively.
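The last step, removing the control sink and reading off the disjoint sets, amounts to finding connected components. The Python sketch below uses a union-find structure for that; it is one possible way to do it, not necessarily what the authors implemented. The edge list is an assumption (Figure 2.2 is not reproduced in this transcript): each of the three blocks is taken to have a single edge into the sink block BBLK_3.

```python
def disjoint_sets(nodes, edges, sink):
    """Group nodes into connected components after deleting the sink node.

    Edges are treated as undirected for the purpose of grouping.
    """
    parent = {n: n for n in nodes if n != sink}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        if sink in (a, b):
            continue  # edges into the sink disappear with it
        parent[find(a)] = find(b)

    groups = {}
    for n in parent:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Assumed shape of the final interval graph: every block jumps to the sink.
nodes = ["BBLK_0", "BBLK_1", "BBLK_2", "BBLK_3"]
edges = [("BBLK_0", "BBLK_3"), ("BBLK_1", "BBLK_3"), ("BBLK_2", "BBLK_3")]
sets = disjoint_sets(nodes, edges, sink="BBLK_3")
print(sets)
```

With these assumed edges the three remaining blocks end up in three separate sets, one per function. If two blocks were linked by an edge that does not pass through the sink, they would fall into the same set and be reported as one function body.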

Page 16:

Experimental Results

Table 1 shows the extraction results. Columns 2 and 3 show the time (in seconds) taken for call graph generation and function extraction, respectively. Columns 4-6 and 9 show the number of functions recognized by each stage of their algorithm and by calling conventions alone [9] (pass 1 – pass 2). Function calls that were identified by pass 1 but could not be assigned instruction bodies were deleted (“Null FNs”). This was due to the incomplete nature of the selected code fragments. The total number of functions found (total FNs) is ‘Pass 3 + Null FNs’. The rightmost column of Table 1 compares their algorithm with function extraction using only procedure calling conventions.

Page 17:

Related Work

• There has been some related work in the field of binary translation. CodeMorphing on the Crusoe processor [12], the Dynamo system [13], and BOA [14] are good examples. Cifuentes et al. [3][4] have presented a detailed analysis of different strategies. Our approach is unique in its choice of a reconfigurable target platform. It introduces flexibility and several new research questions.

• Cifuentes et al. [11] have reported algorithms for identifying function calls from assembly programs using predefined procedure call interfaces, an approach that is not applicable to all DSP binaries. They also discuss the use of use-def analysis for identifying function arguments. Bailey and Davidson [9] introduced a formal model to specify procedure-calling conventions. Mike Van Emmerik used idioms to identify library functions [10]. The calling conventions and idioms help in identifying caller/callee prologues and epilogues. In hand-written and/or optimized assembly, however, the code comprising these conventions may have been moved. If function pointers are passed as arguments to other functions, the called functions are very likely to be missed completely. These techniques also do not suffice to identify complete function bodies.

Page 18:

References

• V. Bala et al., “Dynamo: A Transparent Dynamic Optimization System,” Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI), June 2000.

• M. Gschwind et al., “Dynamic and Transparent Binary Translation,” IEEE Computer Magazine, Vol. 33, No. 3, pp. 54-59, March 2000.

• G. Mittal et al., “Automatic Translation of Software Binaries onto FPGAs,” Proc. Design Automation Conference, San Diego, June 2004.

• D. C. Zaretsky et al., “Evaluation of Scheduling and Allocation Algorithms While Mapping Software Assembly onto FPGAs,” Proc. Great Lakes Symp. on VLSI, Boston, MA, USA, April 2004.

• D. C. Zaretsky et al., “Overview of the FREEDOM Compiler for Mapping DSP Software to FPGAs,” IEEE Symposium on Field-Programmable Custom Computing Machines, April 21, 2004.

Page 19:

Thank you!