[IEEE 2008 Third International Conference on Risks and Security of Internet and Systems( CRiSIS) -...



Page 1: [IEEE 2008 Third International Conference on Risks and Security of Internet and Systems( CRiSIS) - Tozeur, Tunisia (2008.10.28-2008.10.30)] 2008 Third International Conference on Risks

Third International Conference on Risks and Security of Internet and Systems: CRiSIS’2008

978-1-4244-3309-4/08/$25.00 ©2008 IEEE

VisAA: Visual Analyzer for Assembler

Philippe Andouard
Serma Technologies
30 avenue Gustave Eiffel
33608 Pessac, France

[email protected]

Olivier Ly
Bordeaux 1 University, LaBRI
351 cours de la libération
33405 Talence, France

[email protected]

Davy Rouillard
Serma Technologies
30 avenue Gustave Eiffel
33608 Pessac, France

[email protected]

ABSTRACT

Reading and understanding the structure of assembly code is often a tedious and difficult task. It becomes much more difficult when exact timing analysis of control flow paths is required to detect timing attacks. We describe our semi-automated tool VisAA, which visualizes control flow information and analyzes the timing of execution paths in order to detect portions of 8-bit AVR microchip assembly code that are vulnerable to timing attacks. The system saves considerable human effort in unravelling and analyzing assembly code.

Keywords

Security, assembly code, timing attacks, visualization

1. INTRODUCTION

This paper deals with software security. Our goal is to improve the software analysis process by providing semi-automatic tools dedicated to security evaluation and audits. More precisely, our evaluation takes place in the context of the Common Criteria [1], a smart card evaluation scheme. During the evaluation process, the security expert of the Serma Technologies ITSEF¹ examines the software and tries to find security breaches against known attacks, using the facility's own database of smart card vulnerabilities. Today this task is essentially done by hand, and we want to automate the search for vulnerabilities as much as possible.

Vulnerabilities can appear at several levels of the software design. At a high level, one has to examine the completeness and consistency of security policies against identified threats. However, validating the high level design is not sufficient. Many vulnerabilities appear at lower levels of software design [2]. For instance, stack buffer overflow vulnerabilities occur at the implementation level. We are in fact interested in even lower levels of software development. Indeed, the recent advent of side channel attacks [3, 4] shows that the security of software, in particular of embedded software, also depends on the lowest level of the implementation: the assembly code.

In this article, we focus on timing attacks [3, 5]. Such attacks retrieve information from the execution time. Typically, to show that given software is not sensitive to such attacks, the expert must show that execution time does not depend on secret information. Our goal is to provide a semi-automatic tool to measure execution time from the assembly code. Note that such a tool cannot be fully automatic, since the problem is undecidable: for instance, one cannot in general compute the number of iterations of a loop. Moreover, we focus our attention on Atmel microchips [6], which do not use instruction pipelining or caches.

¹ Information Technology Security Evaluation Facility

1.1 Related work

Our tool statically analyzes the execution time of assembly code and provides features for visualizing and understanding it; the latter is not new. Software visualization tools are used for educational purposes [7], to help students understand how a compiler generates assembly code. They are also used in code simulation [8, 9], where a user can execute code step by step and see the content of a register at a precise moment, just as in a debugger. In addition to providing an overall understanding of the code, visualization systems can be used in computer security to help deobfuscate assembly code or find back doors in it [10]. While all these tools help the user understand the structure of assembly code, none of them was designed to find security flaws by examining the computation time required by specific execution paths.

1.2 Our contributions

Our approach is to provide a framework that helps unravel assembly code and detect portions of code vulnerable to timing attacks. The main components of the analysis tool can be summarized as follows:

• Understanding assembly programs: we create a graphical view of the assembly program by exhibiting the appropriate call graph and control flow graph.

• Analyzing assembly programs in order to detect loops and remove unreachable code. A regular expression evaluator helps the user evaluate the precise number of clock cycles on a specific control flow path.

The system was implemented in Objective Caml [11], since this programming language integrates solid tools for lexing and parsing source code and is well suited to manipulating recursive data structures. To visualize the call graph and the control flow graph we use Graphviz [12], a tool well suited to representing structural information as diagrams of abstract graphs. The rest of the paper is organized as follows: section 2 describes the visualization system and how the structure of assembly code is represented in our tool. Section 3 explains how to analyze precisely the number of clock cycles of a path. Section 4 describes an application to a comparison function. We discuss possible improvements in section 5 and conclude in section 6.

2. VISUAL REPRESENTATION

Nowadays, assembly is used to optimize, typically to speed up, portions of code by mixing it with high level languages. Although assembly code is very efficient, its unstructured nature makes its behavior difficult to understand. To save human effort in unravelling assembly code, we provide a visual representation based on classical graph constructions borrowed from compilers: the call graph presents the relation between procedures, while the control flow graph shows high level constructs that are not explicit in the code, such as loops, branches and procedure calls.

2.1 Call graph

A call graph is a directed graph that represents calling relationships between procedures in a program. Each node of the graph corresponds to a procedure and there is a directed edge from the caller node to the callee node, as depicted in Figure 1b.

A procedure begins with a label and ends with the RET keyword. As not all labels mark the beginning of a procedure, we identify procedures by checking whether a label is the start of the program or matches the destination of a procedure-call instruction anywhere in the program.
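The identification rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the listing, the label names and the entry label `main` are hypothetical, only direct calls (`CALL`/`RCALL`) are recognized, and indirect calls are ignored.

```python
import re

# Hypothetical AVR-like listing; labels and instructions are illustrative only.
SOURCE = """
main:
    rcall init
    rcall compare
    ret
init:
    ldi r16, 0
skip:
    ret
compare:
    rcall checksum
    ret
checksum:
    ret
"""

LABEL = re.compile(r"^\s*(\w+):")
CALL = re.compile(r"^\s*r?call\s+(\w+)", re.IGNORECASE)

def call_graph(source, entry="main"):
    """Map each procedure to the set of procedures it calls directly."""
    lines = source.splitlines()
    # A label is a procedure if it is the program start or a call target.
    targets = {m.group(1) for line in lines if (m := CALL.match(line))}
    procedures = targets | {entry}
    graph, current = {}, None
    for line in lines:
        label = LABEL.match(line)
        if label and label.group(1) in procedures:
            current = label.group(1)
            graph.setdefault(current, set())
            continue
        call = CALL.match(line)
        if call and current is not None:
            graph[current].add(call.group(1))
    return graph

GRAPH = call_graph(SOURCE)
```

In this sketch the label `skip` is never the target of a call, so it does not become a node of the graph and its instructions stay attached to `init`.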

2.2 Control flow graph

A control flow graph (CFG) is a graph where nodes represent computations and edges represent the flow of control between the nodes. In this graph, each node is called a basic block [13]. A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end, without halting or branching except at the end. Reconstructing the control flow means finding the division into basic blocks and the connections between them. To construct the control flow graph of a program, we first determine the set of leaders using the following algorithm [14]:

• The first statement is a leader.

• Any statement that is the target of a conditional or unconditional jump is a leader.

• Any statement that immediately follows a jump or a conditional jump statement is a leader.

Then, for each leader, we successively add all statements up to but not including the next leader or the end of the program. Once the basic blocks are computed, we add edges between them:

• From B1 to B2 if B2 immediately follows B1 in the order of the program and B1 does not end in an unconditional branch.

• From B1 to B2 if there is a branch from the last statement of B1 to the first statement of B2 (i.e. a leader).

• From the end of a function (i.e. its RET instruction) to the statement that follows the call in the caller's basic block.

Figure 1: (a) Program visualization. (b) Call graph. (c) Control flow graph. Back edges are drawn in red, call edges in green and return edges in orange. The top left corner of each basic block contains its labels, while the top right corner shows the block's identifier.

We assume that all resulting flow graphs are reducible [13]. It follows from this property that all loops are natural loops characterized by their back edges, and that there are no jumps into the middle of loops. Figure 1c shows the resulting control flow graph.
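The leader algorithm and the first two edge rules can be sketched as follows. The program encoding (a list of label, mnemonic, target triples), the tiny instruction subset and the example loop are assumptions made for this illustration; call and return edges are left out.

```python
UNCONDITIONAL = {"rjmp", "jmp"}
CONDITIONAL = {"breq", "brne"}

# (label, mnemonic, jump target) triples; a hypothetical loop for illustration.
PROGRAM = [
    (None,   "ldi",  None),    # 0
    ("loop", "cp",   None),    # 1
    (None,   "breq", "done"),  # 2
    (None,   "dec",  None),    # 3
    (None,   "rjmp", "loop"),  # 4
    ("done", "ret",  None),    # 5
]

def build_cfg(program):
    """Return (basic blocks as lists of instruction indices, set of edges)."""
    index_of = {label: i for i, (label, _, _) in enumerate(program) if label}
    jumps = UNCONDITIONAL | CONDITIONAL
    leaders = {0}                                # the first statement
    for i, (_, op, target) in enumerate(program):
        if op in jumps:
            leaders.add(index_of[target])        # target of a jump
            if i + 1 < len(program):
                leaders.add(i + 1)               # statement after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(program)]
    blocks = [list(range(s, e)) for s, e in zip(starts, ends)]
    block_of = {block[0]: n for n, block in enumerate(blocks)}
    edges = set()
    for n, block in enumerate(blocks):
        _, op, target = program[block[-1]]
        if op in jumps:                          # branch edge
            edges.add((n, block_of[index_of[target]]))
        if op not in UNCONDITIONAL and op != "ret" and n + 1 < len(blocks):
            edges.add((n, n + 1))                # fall-through edge
    return blocks, edges

BLOCKS, EDGES = build_cfg(PROGRAM)
```

On this example the blocks are the instruction ranges [0], [1, 2], [3, 4] and [5], and the edge from block 2 back to block 1 is the back edge of the loop.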

During the construction of the flow graph, VisAA automatically detects unreachable code, removes it from the CFG and informs the user. Unreachable code is a part of a program that will never be executed (i.e. a portion of code that cannot be executed regardless of the input data). This feature cleans up the resulting graph, giving the user a clearer view of the program.
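One common way to find unreachable blocks is a graph traversal from the entry block; the paper does not say which method VisAA uses, so the following is only a sketch, and the edge set is a made-up example.

```python
def reachable(edges, entry):
    """Return the set of blocks reachable from `entry` by following edges."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    seen, stack = set(), [entry]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(successors.get(node, []))
    return seen

# Hypothetical CFG: block 4 has an outgoing edge but no incoming one.
EDGES = {(0, 1), (1, 2), (1, 3), (2, 1), (4, 3)}
ALL_BLOCKS = {0, 1, 2, 3, 4}
UNREACHABLE = ALL_BLOCKS - reachable(EDGES, entry=0)
```

Here `UNREACHABLE` contains only block 4, which would be removed from the graph and reported to the user.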

3. PATHS EVALUATION

The main objective of our tool is to detect unbalanced paths in assembly code, that is, to find two distinct paths with a different number of clock cycles. To achieve this we need to analyze precisely the timing of specific execution paths by counting their number of clock cycles. If such paths are found in a procedure that manipulates sensitive data, they may be exploited by a timing attack. Because computing loop bounds and identifying infeasible paths are undecidable problems, we cannot expect to do this work automatically. For this reason, we provide an interactive tool in which the user manually enters the path of interest.

3.1 Regular expressions

Regular expressions are ideally suited to describing paths through an automaton. The idea is to obtain the regular expression of the control flow graph automatically, so that a user only needs to instantiate it to describe a path of interest.

We therefore consider the CFG as an automaton that recognizes the language of basic block identifiers traversed by the control flow from one basic block to another. This means that a transition between two consecutive blocks is labelled with the identifier of the source block.

Let us detail this construction on the example depicted in Figure 1c. Assuming that the first basic block is number 6 and the final one is number 7, all the execution paths are modeled by the following minimal regular expression:

E := 6 0 (1 (2 + 3) 4)∗ 5 7

where * is the Kleene star and + the alternation symbol.
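To make the reading of E concrete, the expression can be transcribed into the syntax of Python's `re` module and used to test whether a sequence of block identifiers is one of the described paths. The transcription (identifiers separated by single spaces, `|` for alternation) is a choice made for this sketch, not a feature of the tool.

```python
import re

# E := 6 0 (1 (2 + 3) 4)* 5 7, with block identifiers separated by spaces.
PATHS = re.compile(r"6 0 (?:1 (?:2|3) 4 )*5 7")

def is_execution_path(blocks):
    """True if the sequence of block identifiers belongs to the language of E."""
    return PATHS.fullmatch(" ".join(map(str, blocks))) is not None

print(is_execution_path([6, 0, 1, 2, 4, 1, 3, 4, 5, 7]))  # True
print(is_execution_path([6, 0, 1, 4, 5, 7]))              # False
```

The second sequence is rejected because block 1 must be followed by block 2 or block 3 before block 4 is reached.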

3.2 System of equations

As working out the regular expression of a whole program by hand is difficult and tedious, our tool computes it automatically. To do so, it constructs a system of language equations reflecting the relation between basic blocks, where all equations are of the form:

L0 = aL1 + bL2

L0, L1, L2 are the languages corresponding to the basic blocks B0, B1, B2 and a, b are the transitions between block B0 and blocks B1, B2. To solve such systems of language equations, we use Arden's lemma [15].

The system of equations corresponding to the graph depicted in Figure 1c is the following:

\[
\begin{cases}
L_0 = 0.L_1 \\
L_1 = 1.L_3 + 1.L_2 \\
L_2 = 2.L_4 \\
L_3 = 3.L_4 \\
L_4 = 4.L_1 + 4.L_5 \\
L_5 = 5.L_7 \\
L_6 = 6.L_0 \\
L_7 = 7
\end{cases}
\]

Figure 2 shows the solution of the above system of equations, expressed as a regular expression computed automatically by our tool.

E = ((((6 0).((1 2 4) + (1 3 4))*).(1 2 4 5 7))

+ (((6 0).((1 2 4) + (1 3 4))*).(1 3 4 5 7)))

Figure 2: Regular expression of all execution paths of the control flow graph in Figure 1c.

The obtained regular expression is not minimal, as our tool does not compute the minimal automaton corresponding to the CFG. Its syntax therefore differs from the expression in section 3.1; the two expressions agree on every path that goes through the loop body at least once.
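As a worked illustration, the system can be solved by hand by substitution. The only tool needed is Arden's lemma: if X = A.X + B and A does not contain the empty word, then X = A*.B. The names A and B below are introduced for this derivation only.

```latex
\begin{align*}
L_5 &= 5.7 \\
L_4 &= 4.L_1 + 4.5.7 \\
L_1 &= 1.(2 + 3).L_4
     = \underbrace{1.(2 + 3).4}_{A}.L_1 + \underbrace{1.(2 + 3).4.5.7}_{B} \\
L_1 &= A^{*}.B = (1.(2 + 3).4)^{*}.1.(2 + 3).4.5.7 \\
E = L_6 &= 6.0.L_1 = 6.0.(1.(2 + 3).4)^{*}.1.(2 + 3).4.5.7
\end{align*}
```

Writing 1.(2 + 3).4 as (1 2 4) + (1 3 4) and distributing the alternation over the trailing factor gives the two-term expression of Figure 2.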

3.3 Evaluation

Let us demonstrate how these features are used in an evaluation process. We want to check whether a procedure executes in constant time, since timing variations can leak information and lead to the recovery of the secret data manipulated by the program. The idea is to search manually for two different execution paths with different timing [3]. Instantiating the program's regular expression allows a user to analyze the timing of a specific path. The instantiation consists in replacing each star symbol with the number of iterations of the corresponding loop.

Suppose that, for Figure 1c, a user wants to examine the execution in which the loop iterates twice through block B2; the corresponding regular expression is then the following:

Exec := 6 0 (1 2 4) (1 2 4) 5 7

Figure 3 shows how the regular expression above is expressed in our tool.

[ Number of clock cycles ]

Please enter the path to examine:

> 6 0 [(1 2 4)*2] 5 7

--

Result: 25 Clock cycles

Figure 3: Instantiation of a regular expression. Square brackets identify a loop and its number of iterations, while parentheses identify the loop's body.
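An evaluator for the input syntax of Figure 3 can be sketched as a small recursive parser. The per-block cycle counts below are invented for the illustration, so the totals do not reproduce the 25 cycles of Figure 3. The sketch also assigns a fixed cost to each block and performs no syntax checking, whereas on AVR the cost of a conditional branch depends on whether it is taken.

```python
import re

# Hypothetical cycle counts per basic block (not taken from the paper).
CYCLES = {0: 2, 1: 3, 2: 5, 3: 4, 4: 2, 5: 1, 6: 1, 7: 4}

TOKEN = re.compile(r"\d+|[\[\]()*]")

def expand(path):
    """Unroll an instantiated path such as '6 0 [(1 2 4)*2] 5 7'."""
    tokens = TOKEN.findall(path)
    pos = 0

    def sequence(stop):
        nonlocal pos
        blocks = []
        while pos < len(tokens) and tokens[pos] != stop:
            if tokens[pos] == "[":
                pos += 2                  # skip "[" and "("
                body = sequence(")")
                pos += 2                  # skip ")" and "*"
                count = int(tokens[pos])
                pos += 2                  # skip the count and "]"
                blocks.extend(body * count)
            else:
                blocks.append(int(tokens[pos]))
                pos += 1
        return blocks

    return sequence(None)

def path_cycles(path, cycles=CYCLES):
    """Total number of clock cycles along an instantiated path."""
    return sum(cycles[block] for block in expand(path))

print(path_cycles("6 0 [(1 2 4)*2] 5 7"))  # 28 with the counts above
print(path_cycles("6 0 [(1 3 4)*2] 5 7"))  # 26 with the counts above
```

With these made-up counts the two paths differ by two cycles, which is the kind of imbalance an evaluator would look for.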

By instantiating two different regular expressions corresponding to two different execution paths and comparing their number of clock cycles, a user will be able to detect whether an assembly code procedure is vulnerable to timing attacks.

It must be emphasized that when procedures do not contain loops, our tool automatically computes and compares the number of cycles of all execution paths, thereby saving human effort.
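For loop-free code the set of paths is finite, so the comparison can be done exhaustively. The sketch below enumerates every path of an acyclic graph and collects the distinct cycle totals; the graph and the cycle counts are hypothetical, and the recursion would not terminate on a graph with a cycle.

```python
def all_paths(successors, node, final):
    """Every path from `node` to `final` in an acyclic graph."""
    if node == final:
        return [[node]]
    return [[node] + rest
            for nxt in successors.get(node, [])
            for rest in all_paths(successors, nxt, final)]

def cycle_totals(successors, cycles, entry, final):
    """Distinct total cycle counts over all paths from entry to final."""
    return {sum(cycles[block] for block in path)
            for path in all_paths(successors, entry, final)}

# Hypothetical diamond-shaped CFG: 0 -> {1, 2} -> 3.
SUCCESSORS = {0: [1, 2], 1: [3], 2: [3]}
BALANCED = {0: 2, 1: 3, 2: 3, 3: 4}
UNBALANCED = {0: 2, 1: 3, 2: 5, 3: 4}

print(cycle_totals(SUCCESSORS, BALANCED, 0, 3))           # {9}
print(len(cycle_totals(SUCCESSORS, UNBALANCED, 0, 3)) > 1)  # True
```

A procedure is balanced in this sense exactly when the returned set contains a single value.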

4. APPLICATION

Let us provide a concrete example: the verification of a comparison function. For confidentiality reasons, we cannot include the real code in this article. We consider a function of about 170 lines of assembly code that compares two arrays of bytes. To illustrate our explanation we consider the C prototype of the function: compare(byte * A, byte * B, byte len), where A and B are two arrays of bytes and len is the length of the data to compare.

By examining the function names appearing in the call graph, we can deduce which ones are related to the comparison. In our example we eliminate calls related to patch loading and error management, since they do not manipulate secret data. A schematic view of the comparison function is the following:


• check length of data

• compute a checksum

• compare the data

• set the result of the comparison in a register

The first portion of code we examine deals with the length of the data. As it does not manipulate the data itself, there is no possible timing attack on this part.

The checksum operation consists of a loop with four basic blocks and two different paths in the loop's body. Our tool helped us determine that the timing of each path is equal, so this loop does not influence the total timing of execution paths.

The comparison is coded as a single loop that processes all data bytes even if two bytes differ. The loop's body contains seven basic blocks and there are multiple execution paths, as each block has two successors. As the loop body itself contains no further loops, our tool is able to examine all its execution paths and tells us that their timing is the same.
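Why processing all bytes matters can be seen on a small model that counts loop iterations instead of clock cycles. Both functions below are illustrative assumptions written for this sketch (the real code is confidential and is not reproduced here), and both assume arrays of equal length.

```python
def early_exit_iterations(a, b):
    """Iterations of a comparison loop that stops at the first difference."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i + 1
    return len(a)

def full_scan_compare(a, b):
    """Comparison that always visits every byte; returns (equal, iterations)."""
    diff = 0
    iterations = 0
    for x, y in zip(a, b):
        diff |= x ^ y          # accumulate differences without branching
        iterations += 1
    return diff == 0, iterations

SECRET = bytes([1, 2, 3])
print(early_exit_iterations(SECRET, bytes([9, 2, 3])))  # 1
print(early_exit_iterations(SECRET, bytes([1, 2, 9])))  # 3
print(full_scan_compare(SECRET, bytes([9, 2, 3])))      # (False, 3)
print(full_scan_compare(SECRET, bytes([1, 2, 9])))      # (False, 3)
```

In the early-exit variant the iteration count reveals the position of the first mismatch, whereas the full scan performs the same number of iterations for every input of a given length.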

The last part of the function puts the result of the comparison in a register and returns. While this part is not critical, the developers had put comments in their code that count the number of cycles of each instruction. We therefore wanted to check whether all execution paths actually had the same execution time. Once again, we copied all the blocks of interest into a new file and launched the analysis, which confirmed that all the paths have the same execution time.

From the previous analysis, it follows that the comparison function is not vulnerable to timing attacks.

5. FUTURE WORK

Although our tool is already operational, there are several possible directions for future work. As our tool is semi-automated, we could improve the graphical user interface to allow more interaction between the user and the assembly code, for example by coloring the path of interest and editing portions of assembly code. Another improvement would be to integrate a description language for assembly, so that the tool can easily be adapted to any assembly language. Finally, we could transform our graph into a minimum state automaton in order to compute a more compact regular expression that is easier to instantiate.

6. CONCLUSION

In this work, we presented VisAA, a tool that can be used to visualize the structure of AVR assembly code and to analyze precisely the timing of control flow paths by instantiating regular expressions. The system is designed first to help a user understand how the assembly program is structured, by showing control structures such as loops, branches and function calls. Second, the system provides support for determining the precise timing of control flow paths. The user can focus the analysis on certain control flow paths of interest, analyze their execution timing with great precision and compare them. To evaluate the applicability of our tool, we examined real-world assembly code. The tests were performed during Common Criteria evaluations of smart card banking applications. The tool allowed the evaluator to verify whether comparison functions were vulnerable to timing attacks.

7. REFERENCES

[1] Common Criteria. Common Criteria for Information Technology Security Evaluation, Part 3: Security assurance requirements. http://www.commoncriteriaportal.com, August 2005.

[2] G. Balakrishnan, T. Reps, D. Melski, and T. Teitelbaum. WYSINWYX: What You See Is Not What You eXecute. In VSTTE 2005.

[3] Jean-Francois Dhem, Francois Koeune, Philippe-Alexandre Leroux, Patrick Mestre, Jean-Jacques Quisquater, and Jean-Louis Willems. A practical implementation of the timing attack. In CARDIS, pages 167–182, 1998.

[4] Side channel attacks database. http://www.sidechannelattacks.com.

[5] A. Hevia and M. Kiwi. Strength of two data encryption standard implementations under timing attacks. volume 1380, pages 192–205, 1998.

[6] Atmel AVR 8-bit RISC. http://www.atmel.com/products/avr/.

[7] Joshua C. Estep and Christopher A. Healy. A flexible tool for visualizing assembly code. J. Comput. Small Coll., 20(3):55–67, 2005.

[8] Patrick Borunda, Chris Brewer, and Cesim Erten. Gspim: graphical visualization tool for mips assembly programming and simulation. In SIGCSE '06: Proceedings of the 37th SIGCSE technical symposium on Computer science education, pages 244–248, New York, NY, USA, 2006. ACM.

[9] Ben L. Titzer, Daniel K. Lee, and Jens Palsberg. Avrora: scalable sensor network simulation with precise timing. In IPSN '05: Proceedings of the 4th international symposium on Information processing in sensor networks, page 67, Piscataway, NJ, USA, 2005. IEEE Press.

[10] Ida pro - the interactive disassembler. http://www.hex-rays.com/idapro/.

[11] Objective Caml. http://caml.inria.fr/ocaml/.

[12] John Ellson, Emden R. Gansner, Eleftherios Koutsofios, Stephen C. North, and Gordon Woodhull. Graphviz - open source graph drawing tools. In Graph Drawing, pages 483–484, 2001.

[13] Steven S. Muchnick. Advanced compiler design and implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1997.

[14] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: principles, techniques, and tools. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1986.

[15] John E. Hopcroft and Jeffrey D. Ullman. Formal languages and their relation to automata. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1969.


[16] 8-bit AVR Instruction Set. www.atmel.com/atmel/acrobat/doc0856.pdf/.

[17] AVR GCC compiler. http://www.nongnu.org/avr-libc/.
