
Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se

Linköping University | Department of Computer and Information Science
Master thesis, 30 ECTS | Datateknik (Computer Engineering)

2019 | LIU-IDA/LITH-EX-A--19/074--SE

Taint analysis for automotive safety using the LLVM compiler infrastructure

Éléonore Goblé

Supervisor: Ulf Kargén
Examiner: Nahid Shahmehri



Copyright

The publishers will keep this document online on the Internet, or its possible replacement, for a period of 25 years starting from the date of publication, barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law, the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Éléonore Goblé

http://www.ep.liu.se/


Abstract

Software safety is becoming increasingly important in the automotive industry as mechanical functions are replaced by complex embedded computer systems. Errors during development can lead to accidents and threaten users’ lives. The safety level of the software must therefore be monitored through Automotive Safety Integrity Levels (ASILs), according to the standard ISO 26262. This thesis presents the development of a static taint analysis tool using the LLVM compiler infrastructure in order to identify safety-critical components and analyze their dependencies in automotive software. The aim was to provide a useful visualization tool to help safety engineers in their work and save time during development. It was concluded that this static taint analysis tool can facilitate the ASIL decomposition of automotive software and improve its precision.

Acknowledgments

First and foremost, I would like to thank ARCCORE for giving me the opportunity to conduct this master thesis. In addition, I would like to thank my supervisor Daniels Umanovskis and my colleague John Tinnerholm for their valuable help. I would also like to thank all my colleagues at ARCCORE for their friendly welcome and their support.

Furthermore, I would like to thank my supervisor Ulf Kargén and my examiner Nahid Shahmehri for providing me with valuable feedback.

I would also like to thank my sister Morgane for proofreading my thesis.

Finally, I would like to thank Linköping University and the University of Technology of Compiègne for giving me the possibility to carry out this double-degree project.

Éléonore Goblé


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
1.1 Company
1.2 Motivation
1.3 Aim
1.4 Research questions
1.5 Delimitations
1.6 Outline

2 Theory
2.1 Automotive industry
2.2 Functional safety
2.3 Static Analysis
2.4 Pointer and Alias Analysis
2.5 LLVM
2.6 Related Work
2.7 Visualization
2.8 Evaluation

3 Method
3.1 LLVM
3.2 Taint analysis
3.3 Visualization
3.4 Evaluation

4 Results
4.1 LLVM
4.2 Taint analysis
4.3 Visualization
4.4 Evaluation

5 Discussion
5.1 Taint analysis
5.2 Results
5.3 Method
5.4 Source criticism
5.5 The work in a wider context

6 Conclusion
6.1 Consequences
6.2 Further work

Bibliography

List of Figures

1.1 Master thesis outline

2.1 Compilation process

3.1 An overview of the LLVM Value inheritance
3.2 UML Diagram, describing the architecture of the taint analysis pass
3.3 SafeValue and SafeInstruction classes

4.1 The list of tainted functions and global variables in each file
4.2 An example of the tree view, whose initiator is the variable safe
4.3 The alias view of the variable safe in the function testInterProcedural
4.4 Visualization tool overview
4.5 Which aspect has been used to find the ASIL rating of an object?
4.6 An overview of the result of the taint analysis pass on the project (real names have been modified)

List of Tables

3.1 Taint propagation policy
3.2 Linear scale questions
3.3 Tasks
3.4 Store test cases
3.5 Load address test case
3.6 Pointer parameter test cases
3.7 Global initialization test case
3.8 File test case
3.9 Call test case
3.10 Violation test case

4.1 Linear scale questions
4.2 LLVM IR metrics
4.3 Taint information
4.4 Taint analysis results
4.5 Results
4.6 Program execution time results

1 Introduction

The importance of safety in the automotive industry has increased significantly in recent years. Purely mechanical functions have been replaced by complex embedded computer systems, which require high levels of safety, since errors during development can lead to accidents and threaten users’ lives. The safety level of the software must therefore be assessed and monitored. ISO 26262 [1] is an industry-specific standard for the functional safety of road vehicles, adapted from the broader standard IEC 61508, which defines Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems [2]. According to ISO 26262, the safety level of an application can be measured by Automotive Safety Integrity Levels (ASILs). This standard recommends separating safety-critical objects from non-hazardous objects in memory.

1.1 Company

This master thesis was carried out in collaboration with ARCCORE AB [3], headquartered in Gothenburg, Sweden. ARCCORE is a wholly owned subsidiary of Vector Informatik GmbH, headquartered in Stuttgart, Germany. ARCCORE provides leading solutions for embedded systems development in the automotive industry, and its software aims to comply with the automotive standard AUTOSAR [4].

1.2 Motivation

In the automotive industry, the embedded code supplier needs to provide guarantees to the Original Equipment Manufacturer (OEM) with regard to safety requirements. To establish that the software is safe, the company needs to analyze the code.

Dynamic analysis techniques such as testing and verification are common ways to check software safety; however, these methods are tedious. The number of possible paths grows exponentially with the size of the program; therefore, testing only provides a “partial verification”, according to Silva et al. [5]. Hardware protection can also be used to ensure safety. AUTOSAR [4] defines a standard for the architecture of Electronic Control Units (ECUs) and recommends functional measures for safety-relevant systems. In embedded systems, a hardware Memory Protection Unit (MPU) [6] provides memory protection by defining access rights to different parts of memory. In a safety-critical system, the MPU can be used to partition the memory and prevent unsafe components from writing into the safe memory during run-time [7].
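The exponential growth of the number of paths can be made concrete with a small worked count. It assumes, for illustration only, a function made of n sequential and independent if-statements with no loops: each branch doubles the number of execution paths.

```latex
N_{\text{paths}}(n) = \underbrace{2 \times 2 \times \cdots \times 2}_{n \text{ branches}} = 2^{n},
\qquad
N_{\text{paths}}(30) = 2^{30} = 1\,073\,741\,824 \approx 1.07 \times 10^{9}
```

Exercising one path per input set would thus already require about a billion test inputs for only 30 branches; loops increase the count further, possibly without bound.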

Static analysis consists of analyzing the source code without executing it, and thus enables engineers to prove properties related to code safety. Static analysis could be used to identify the components to be placed in the safe partition. It can also be combined with dynamic analysis to improve the efficiency of the analysis [8]. However, developing a sound static analyzer is complex and therefore expensive.

Moreover, safe components with a higher ASIL need “Freedom from interference” (FFI) [9] from lower-level components, which ensures that “a fault in a less safety critical software component will not lead to a fault in a more safety critical component”, according to Leitner-Fischer et al. [7].

Nevertheless, monitoring the safety of the entire software can be costly, according to Azevedo et al. [10]. For a developer of automotive software, it is desirable to limit the number of ASIL components, since such components have to be developed according to additional requirements imposed by ISO 26262, which significantly increases the effort during the implementation and testing phases. The goal is to reduce the volume of code subject to high safety levels as much as possible, in order to study these slices precisely and to limit the risks.

Currently, a manual code inspection is performed in order to identify the dependencies related to the variables used in safety-critical modules. The challenge is to develop software that automatically identifies dependencies between the safe objects of the program, and thus gives safety engineers a basis for partitioning the memory.

Taint analysis [11] consists of detecting data coming from untrusted sources and propagating the taint to the variables related to this data. Taint analysis can be used to identify data that can influence safety-critical components.

The Low Level Virtual Machine (LLVM) [12] is a compiler infrastructure composed of a set of libraries and reusable objects. LLVM provides several modules for compiler construction, which can be used for static code analysis. The Clang compiler uses LLVM to transform C code into LLVM IR, an intermediate representation that facilitates the analysis of the relations between variables. LLVM also provides the LLVM Pass Framework [13], which makes it possible to develop an “LLVM analysis pass”, that is, a plugin developed on top of LLVM to analyze source code.

1.3 Aim

The aim of the thesis is to develop a static analysis tool composed of a taint analysis pass, based on existing static analyzers such as LLVM analysis modules and the LLVM Pass Framework, and a visualization tool to present the results. In doing so, this thesis aims to examine how taint analysis can be used to ensure embedded systems safety. This could be done by analyzing C code and generating the dependencies related to safety-critical components. The output of the program should be easily understandable for safety architects: the tool should be quick and easy to learn, and the output should be precise enough to provide them with additional information for their work.

1.4 Research questions

The first meetings and discussions made it possible to highlight the most important aspects of the thesis and to raise the following questions:

1. Is LLVM suitable to perform static analysis on automotive software?

The first task is to study the possibility of implementing a new module on top of LLVM.

2. How can static taint analysis be used to track dependencies related to safe components in automotive software?

This thesis aims at studying the best method to implement a static taint analyzer for automotive software. This analyzer should efficiently identify the components which can influence variables marked as safe.

3. How can results be represented in an understandable way so that engineers can improve the safety development process?

This thesis aims at generating an understandable output which focuses on the most important and relevant information and presents useful data for safety engineers. One task is thus to study the best way to represent the dependencies between safe and unsafe components.

4. Is the taint analysis accuracy sufficient for the application? How does taint analysis visualization affect the usefulness of the output?

The results of the tool can be compared to the results of manual analysis performed on existing projects to evaluate the accuracy. The output visualization can be submitted to safety engineers, so that they can evaluate the usefulness of the result.

1.5 Delimitations

This thesis only aims at analyzing the dependencies of safety components provided by the user. Thus, the thesis does not cover the identification of the initial components considered as safe. This thesis aims at developing a standalone tool, so the integration of the tool is not included in the development process. Moreover, this tool should be compatible with LLVM 5 and should work on Windows, in line with the company’s technical configuration. Finally, this thesis aims at analyzing embedded code for the automotive industry which follows the rules described in the MISRA C Guidelines [14].

1.6 Outline

Figure 1.1 illustrates the outline of the master thesis and highlights the main steps of the study.

First, a pre-study was conducted in order to define the subject and plan the thesis work. The Introduction (Chapter 1) and the research questions were written following this. Literature and technical research was done in order to write the Theory chapter (Chapter 2) and to start the development phase. This research was used to design the architecture of the taint analysis LLVM pass, based on the LLVM Pass Framework, which is presented in the Method chapter (Chapter 3). Then, the taint analysis LLVM pass and the visualization tool were developed iteratively, and the main functionalities of the taint analysis pass were tested. A qualitative study was performed on the visualization tool, and the taint analysis pass was tested on a real project of the company in order to evaluate its accuracy. The Results chapter (Chapter 4) presents the static taint analyzer, composed of the taint analysis pass and the visualization tool, together with the results of the evaluation. The Discussion chapter (Chapter 5) presents feedback and improvements made following the different studies. Finally, the Conclusion chapter (Chapter 6) summarizes the results of the master thesis and suggests further work.

Figure 1.1: Master thesis outline

2 Theory

This chapter presents the background and the related work relevant to this thesis. First, section 2.1 presents software development in the automotive industry. Then, section 2.2 defines functional safety standards and concepts. A review of the different types of static analysis is provided in section 2.3, and a brief explanation of pointer analysis is given in section 2.4. An overview of LLVM is provided in section 2.5. Section 2.6 presents existing studies related to this topic. Finally, section 2.7 introduces software visualization and section 2.8 reviews methods to evaluate software usability and accuracy in the context of static analyzers.

2.1 Automotive industry

The automotive industry deals with safety-critical systems whose malfunctions could lead to serious consequences, including injury to people, environmental damage and large financial losses [5]. Vehicles are increasingly automated and rely on many embedded computer systems [15]. These systems require more and more checks to ensure the safety of vehicle passengers.

Automotive systems architecture

Automotive systems are divided into a physical hardware part, such as Electronic Control Units (ECUs), and a software part [16]. ECUs are embedded systems composed of “a microcontroller and a set of sensors” [15], which control an electrical system in a vehicle through embedded software. These systems need to implement protection methods to ensure safety at both the hardware and software levels. The Memory Protection Unit (MPU) [6] is a hardware protection mechanism in ECUs that restricts access to the safe partition during run-time. A memory access violation generates an exception that terminates program execution.
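The access check performed by an MPU can be illustrated with a small software model. This is only a sketch under stated assumptions, not the interface of any real MPU: the `Region` structure, the `writeAllowed` function and the address values are hypothetical, as are the rules that supervisor mode bypasses the checks and that addresses outside every region are denied. A real MPU is configured through hardware registers and signals a violation by raising a fault exception instead of returning a value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical privilege modes of the processor.
enum class Mode { User, Supervisor };

// Hypothetical MPU region: a contiguous address range and its access right.
struct Region {
    std::uint32_t base;   // first address of the region
    std::uint32_t size;   // size of the region in bytes
    bool userWritable;    // may code running in user mode write here?
};

// Returns true if a write to `addr` is permitted in `mode`.
// A real MPU would raise a fault exception instead of returning false.
bool writeAllowed(const std::vector<Region>& regions, std::uint32_t addr, Mode mode) {
    if (mode == Mode::Supervisor) {
        return true;  // assumption: safe code in supervisor mode is not restricted
    }
    for (const Region& r : regions) {
        if (addr >= r.base && addr - r.base < r.size) {
            return r.userWritable;
        }
    }
    return false;  // assumption: addresses outside every region are denied
}
```

With a first region that is not user-writable playing the role of the safe partition, a user-mode write into it is refused, while the same write in supervisor mode is accepted.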

This thesis focuses on software methods to ensure embedded systems safety.

Embedded software development

According to Freund [17], embedded software involves many constraints such as “real-time scheduling, reliability and production requirements”, which influence software development methods. Embedded software is usually developed in C because this language has been used in critical systems for a long time, and efficient machine code can be generated from C programs [14].

The MISRA C Guidelines [14] provide “a subset of the C language” which is intended to reduce the possibility of making mistakes during development. This is done by removing C language constructs which could lead to undefined behaviour, misuse or misunderstanding. These guidelines are recommended for the development of embedded applications and safety-related systems.

2.2 Functional safety

Functional safety aims at detecting hazardous situations and applying preventive solutions. These solutions should prevent systematic or hardware failures from having serious consequences [2]. Therefore, standards have been developed to assess functional safety and to provide common methods to solve these issues.

Functional safety standards

The automotive industry is regulated by several standards which aim at standardizing product development. IEC 61508 defines Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems [2]. ISO 26262 [1] is adapted from IEC 61508 and deals with the functional safety of road vehicles.

AUTOSAR (AUTomotive Open System ARchitecture) [4] is an automotive standard for the software architecture of ECUs. This standard recommends measures and mechanisms to improve the development of safety-related software, such as memory partitioning [6]. Unsafe applications run in user mode, whereas safe applications run in supervisor mode, which gives them unrestricted access to the memory protected by the MPU.

Automotive Safety Integrity Levels (ASILs)

Part 9 of ISO 26262 defines “ASIL-oriented and safety-oriented analyses” in order to decompose the software into safety-related and non-safety-related components. Automotive Safety Integrity Levels (ASILs) have been developed to assess the safety level of an embedded system. Compliance with ASILs thus aims at convincing the manufacturers that the products meet safety requirements. In order to develop ASIL software, designers must identify the safety-critical components whose malfunctions could lead to serious issues [10], such as the brake system. Risks related to hazardous situations are then defined and classified into four levels (ASIL A, ASIL B, ASIL C, ASIL D, where ASIL D is the most stringent) according to their severity, probability, and controllability [1]. ASIL components must be monitored through safety measures and require more development effort [10]. Components which do not require specific safety measures are identified as Quality Management (QM).

Freedom from interference

Freedom from interference (FFI) is defined by ISO 26262 Part 9 Section 6.2 [1] as the absence of “cascading failures” from a lower ASIL element to a higher ASIL element. This means that components with a lower ASIL should not influence components with a higher ASIL. This should prevent an error that happens in an unsafe module from propagating to a safety-critical module [7].

Therefore, ASIL components should be separated from QM components in memory: ASIL components should be placed in a memory partition protected by the Memory Protection Unit (MPU) [6].
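A freedom-from-interference check can be phrased as a test on dependency edges: a dependency is suspicious when its source has a lower ASIL than its target. The minimal sketch below assumes that dependencies between components have already been extracted (for example by a taint analysis) and are given as pairs of component names; `Asil`, `Dependency` and `ffiViolations` are hypothetical names, not part of ISO 26262 or of the tool developed in this thesis.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// ASIL levels ordered from least to most stringent (QM < A < B < C < D).
enum class Asil { QM = 0, A, B, C, D };

// A dependency (from, to): component `from` can influence component `to`.
using Dependency = std::pair<std::string, std::string>;

// Returns every dependency in which a lower-ASIL component influences a
// higher-ASIL one, i.e. a potential breach of freedom from interference.
std::vector<Dependency> ffiViolations(const std::map<std::string, Asil>& level,
                                      const std::vector<Dependency>& deps) {
    std::vector<Dependency> violations;
    for (const Dependency& d : deps) {
        if (level.at(d.first) < level.at(d.second)) {
            violations.push_back(d);
        }
    }
    return violations;
}
```

If a QM component `radio` writes into an ASIL D component `brake`, the edge (`radio`, `brake`) is reported, whereas an edge from `brake` to `radio` is allowed.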

Finally, static code analysis can be performed in order to identify the components related to safety-critical modules.


2.3 Static Analysis

Static analysis refers to the analysis of a program without running it [18]. Unlike dynamic analysis, which is performed on programs during run-time [11], static analysis can be performed directly on source code or on intermediate code, for example on the LLVM intermediate representation (IR) [19].

Although dynamic analysis is popular, this method has some limitations. Each input set exercises a single execution path, and paths are tested one at a time. Thus, achieving high code coverage becomes challenging as the number of paths increases, and dynamic methods can “encounter [...] paths explosion problems”, according to Feng and Zhang [20]. Dynamic testing tends to provide only “partial verification” according to Silva et al. [5]: some paths can be missed and inaccurate results can be provided. Static analysis makes it possible to simulate all the execution paths of the program during compile-time, which is called symbolic execution, according to Liang et al. [21].

However, static analysis tools are not always fully reliable [11]. They rely on either over-approximation or under-approximation. Depending on the chosen approximation, these tools can be incomplete and produce false positives (reporting an error where there is none), or unsound and produce false negatives (failing to report an error). According to Mock et al. [22], if the static analysis method is too precise, its algorithmic complexity can become a limiting factor when running the analysis on large programs.

Analysis methods

Static analysis can be performed by applying formal methods, that is to say, by mathematically analyzing the source code in order to prove properties about it.

According to P. Cousot and R. Cousot [23], abstract interpretation approximates possible values using abstract sets, which aims at converting infinite spaces into finite ones. For example, when only the sign of a variable’s values is of interest, the set of integers can be abstracted to the set {(+), (-), (0)}. Another technique is deductive verification, which aims at proving the algorithm by dividing it into a list of mathematical proof obligations, according to Silva et al. [5]. Furthermore, symbolic execution consists of simulating the execution of the program during compile-time, according to Liang et al. [21].
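The sign abstraction above can be sketched in C++ as follows. This is a minimal illustration rather than the formulation of Cousot and Cousot: the names `Sign`, `alpha`, `mul` and `add` are hypothetical, and the extra element `Top` (“sign unknown”) is an added assumption, needed because the sum of a positive and a negative value can have any sign, so addition is not closed over {(+), (-), (0)}.

```cpp
#include <cassert>

// Abstract sign domain {(-), (0), (+)}, extended with Top ("unknown sign")
// so that abstract addition always has a result.
enum class Sign { Neg, Zero, Pos, Top };

// Abstraction function: maps a concrete integer to its sign.
Sign alpha(long long v) {
    if (v < 0) return Sign::Neg;
    if (v == 0) return Sign::Zero;
    return Sign::Pos;
}

// Abstract multiplication: zero absorbs everything, and otherwise the
// product is positive when both signs agree and negative when they differ.
Sign mul(Sign a, Sign b) {
    if (a == Sign::Zero || b == Sign::Zero) return Sign::Zero;
    if (a == Sign::Top || b == Sign::Top) return Sign::Top;
    return (a == b) ? Sign::Pos : Sign::Neg;
}

// Abstract addition: (+) + (-) may have any sign, so the result is Top.
Sign add(Sign a, Sign b) {
    if (a == Sign::Zero) return b;
    if (b == Sign::Zero) return a;
    if (a == b) return a;  // (+)+(+) = (+), (-)+(-) = (-), Top+Top = Top
    return Sign::Top;
}
```

Multiplication stays exact in this domain: barring overflow, `mul(alpha(x), alpha(y)) == alpha(x * y)` for all integers x and y, whereas `add` must sometimes give up precision and return `Top`.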

Static analysis can also be based on compiler technology. According to Arroyo et al. [11], modern compilers enable developers to build upon their internal structures, such as Abstract Syntax Trees (ASTs), Control Flow Graphs (CFGs) and Call Graphs (CGs), in order to perform data-flow and control-flow analysis. Data-flow analysis consists of analyzing the operations performed on a data set, whereas control-flow analysis is used to study the flow of tasks and the structure of the program.

Taint Analysis

The field of static analysis developed in this thesis work is taint analysis. According to Arroyo et al. [11], taint analysis is based on information flow and “non-interference”: information flow analysis is used to check that tainted information does not interfere with information which should not be tainted.

Usually, in software security, taint analysis consists of marking data coming from untrusted sources, such as user input, as unsafe, because external data is always a security risk [11]. As far as software safety is concerned, unsafe data does not only come from the user but also from unsafe modules. Taint analysis can then be used to track the unsafe variables which can influence the safety-critical components. In the context of this thesis, tainted data is classified into different safety levels. Data with a lower ASIL should not influence data with a higher ASIL; otherwise, both should be tainted with the higher ASIL.


Taint analysis is usually divided into three phases [20]. The first one is taint information, which aims at tainting the initiators (source objects). The second phase is taint propagation, which spreads the taint to all the other objects related to the initiators. The last phase is taint checking, which consists of checking whether an object has been tainted although it should not be, in order to detect unauthorized behavior. According to Schwarz et al. [24], the taint policy should define how new objects are tainted, which operations propagate the taint, and how the taint is checked at the end.
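The three phases can be sketched on an abstract dependency graph. The sketch is a hypothetical illustration, not the implementation of the LLVM pass developed in this thesis. It assumes that edges (src, dst), meaning “src can influence dst”, have already been extracted from the program; that the taint flows backwards from each safe initiator to every object able to influence it, following the rule that lower-ASIL data influencing higher-ASIL data must be raised to the higher ASIL; and that objects missing from the partition map are placed in QM memory. All names (`Edge`, `TaintMap`, `propagate`, `check`) are illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// ASIL levels ordered from least to most stringent.
enum class Asil { QM = 0, A, B, C, D };

// Hypothetical dependency edge (src, dst): src can influence dst,
// e.g. src is stored into dst or passed to it as an argument.
using Edge = std::pair<std::string, std::string>;
using TaintMap = std::map<std::string, Asil>;

// Phases 1 and 2: taint the initiators, then propagate each level backwards
// to every object that can influence a tainted object, until a fixed point.
TaintMap propagate(const TaintMap& initiators, const std::vector<Edge>& edges) {
    TaintMap taint = initiators;  // phase 1: taint information
    std::vector<std::string> worklist;
    for (const auto& entry : initiators) worklist.push_back(entry.first);

    while (!worklist.empty()) {  // phase 2: taint propagation
        std::string dst = worklist.back();
        worklist.pop_back();
        Asil level = taint.at(dst);
        for (const Edge& e : edges) {
            if (e.second != dst) continue;
            auto it = taint.find(e.first);
            // A missing or lower level is raised to the level of dst.
            if (it == taint.end() || it->second < level) {
                taint[e.first] = level;
                worklist.push_back(e.first);
            }
        }
    }
    return taint;
}

// Phase 3: taint checking. Reports objects whose computed ASIL is higher
// than the level of the memory partition they are placed in.
std::vector<std::string> check(const TaintMap& taint, const TaintMap& partition) {
    std::vector<std::string> misplaced;
    for (const auto& entry : taint) {
        auto it = partition.find(entry.first);
        // assumption: objects missing from `partition` live in QM memory
        Asil placed = (it == partition.end()) ? Asil::QM : it->second;
        if (placed < entry.second) misplaced.push_back(entry.first);
    }
    return misplaced;
}
```

Propagation is a worklist fixed point: an object is revisited only when its level increases, and levels can only rise along the finite chain QM < A < B < C < D, so the loop terminates. With the chain sensor → speed → brakeCmd and brakeCmd tainted ASIL D, both speed and sensor receive ASIL D, while unrelated objects stay untainted.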

2.4 Pointer and Alias Analysis

During taint analysis, propagating operations need to be identified. For the C language, the main challenge is pointer and alias analysis. According to Avots et al. [25], C is an unsafe language and is difficult to analyze: operations can be performed on pointers, and pointers can point to stack objects, heap objects or functions. There are also multi-level pointers. All of this increases the complexity of the analysis, according to Andersen [26]. Thus, a sound pointer analysis is very hard to achieve, and a pointer analyzer must make compromises to obtain readable and reasonable results. Therefore, different properties can be used to identify the level of precision needed for the pointer analysis. According to Hind [27], this level should be in line with the customer’s needs.

In his Ph.D. thesis [26], Andersen presents a pointer analysis for the C language based on subset constraints. This analysis is inter-procedural, which means that the relationships between functions are taken into account. Steensgaard [28] presents another inter-procedural pointer analysis, which is based on equality constraints.
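Andersen’s subset-based formulation can be sketched with the four classical constraint forms derived from C pointer statements. The solver below is a deliberately naive, hypothetical illustration: it re-applies every constraint until nothing changes and ignores functions, fields and heap allocation. `Kind`, `Constraint`, `PointsTo` and `andersen` are illustrative names, not an existing library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// The four constraint forms of inclusion-based (Andersen-style) analysis:
//   AddrOf: p = &x  ->  {x} <= pts(p)
//   Copy:   p = q   ->  pts(q) <= pts(p)
//   Load:   p = *q  ->  for all o in pts(q): pts(o) <= pts(p)
//   Store:  *p = q  ->  for all o in pts(p): pts(q) <= pts(o)
enum class Kind { AddrOf, Copy, Load, Store };

struct Constraint {
    Kind kind;
    std::string lhs;
    std::string rhs;
};

using PointsTo = std::map<std::string, std::set<std::string>>;

// Adds every element of `from` to `to`; returns true if `to` grew.
static bool addAll(std::set<std::string>& to, const std::set<std::string>& from) {
    std::size_t before = to.size();
    to.insert(from.begin(), from.end());
    return to.size() != before;
}

// Naive fixed-point solver: re-applies all constraints until nothing changes.
PointsTo andersen(const std::vector<Constraint>& cs) {
    PointsTo pts;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Constraint& c : cs) {
            switch (c.kind) {
            case Kind::AddrOf:
                changed |= pts[c.lhs].insert(c.rhs).second;
                break;
            case Kind::Copy:
                changed |= addAll(pts[c.lhs], pts[c.rhs]);
                break;
            case Kind::Load:  // iterate over a copy: pts may grow meanwhile
                for (const std::string& o : std::set<std::string>(pts[c.rhs]))
                    changed |= addAll(pts[c.lhs], pts[o]);
                break;
            case Kind::Store:
                for (const std::string& o : std::set<std::string>(pts[c.lhs]))
                    changed |= addAll(pts[o], pts[c.rhs]);
                break;
            }
        }
    }
    return pts;
}
```

For the statements p = &x; q = &y; r = &p; *r = q; s = p;, the store through r adds y to the points-to set of p, so both p and s may point to {x, y}. A Steensgaard-style analysis would instead unify the sets on both sides of each assignment, which is faster but generally less precise.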

Definitions

Andersen [26] defines two fundamental concepts regarding pointer analysis: “alias pair” and “point-to information”.

Alias pair: if p = &x is an assignment, then *p is aliased with x. The alias pair is written ⟨*p, x⟩. “When the lvalue of two objects coincides, the objects are said to be aliased” [26].

Point-to information: if p = &x and p = &y are two assignments, then the point-to information of p is the set {x, y}, written p ↦ {x, y}. Point-to information denotes “the set of objects a pointer may point to” [26].

Properties

Pointer analysis properties aim at defining the level of precision needed by the application.

Field-sensitivity: Field-sensitivity deals with aggregate data types such as structures and arrays. A field-sensitive analysis studies each field of each structure separately, whereas a field-insensitive analysis considers each access to aggregate data as an access to the whole structure [29].

Intra-procedural or inter-procedural: Intra-procedural pointer analysis performs data-flow analysis only inside functions. This is much easier than inter-procedural analysis, which takes the interactions between functions into account. Inter-procedural analysis consists of analyzing each function call separately [26].


Flow-sensitivity: Flow-sensitive analysis takes the execution order of the program, called control flow, into consideration. This analysis is more precise because it can detect a dependency at a given point in the program, which is also called program-point-specific analysis. In contrast, flow-insensitive analysis can only summarize the dependencies between pointers over the whole program. Pointers which are aliases only at a given moment of the program are referred to as “may-alias” [26].
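The effect of flow-sensitivity on points-to results can be shown on the two statements p = &x; p = &y;. The sketch assumes straight-line code containing only address-of assignments (real code with branches would require merging sets at join points); `AddrOf`, `flowSensitive` and `flowInsensitive` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// A statement "ptr = &target" in a straight-line program.
struct AddrOf {
    std::string ptr;
    std::string target;
};
using PointsTo = std::map<std::string, std::set<std::string>>;

// Flow-sensitive: statements are processed in order, and each assignment
// replaces the previous points-to set of the pointer (a strong update).
PointsTo flowSensitive(const std::vector<AddrOf>& prog) {
    PointsTo pts;
    for (const AddrOf& s : prog) pts[s.ptr] = {s.target};
    return pts;  // points-to sets at the end of the program
}

// Flow-insensitive: statement order is ignored, so every assignment is
// accumulated into a single summary valid for the whole program.
PointsTo flowInsensitive(const std::vector<AddrOf>& prog) {
    PointsTo pts;
    for (const AddrOf& s : prog) pts[s.ptr].insert(s.target);
    return pts;
}
```

At the end of the program, the flow-sensitive result is p ↦ {y}, because the second assignment overwrites the first, whereas the flow-insensitive summary is p ↦ {x, y}.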

2.5 LLVM

The Low Level Virtual Machine (LLVM) Project [12] is a compiler framework developed at the University of Illinois. This framework is composed of “modular and reusable compiler and toolchain technologies” [30]. LLVM aims at being a long-term code analysis and optimization system by providing built-in optimization and analysis passes, and the possibility to develop new passes.

Compilation

Compilation is usually divided into three phases [Fig. 2.1]. First, a static compiler front end, such as Clang, parses the source code and translates it into the LLVM intermediate representation (IR). Then, LLVM's analysis and optimization passes process the IR to optimize the code, and finally machine code for the chosen platform is generated.

Figure 2.1: Compilation process [31]

LLVM Intermediate representation (IR)

LLVM IR [19] is an intermediate representation used during compilation. It provides “a human readable assembly language representation” (.ll) and a binary representation called “bitcode” (.bc), which can be executed and on which optimizations are performed.

LLVM IR is a “language independent type-system”, which uses common low-level primitives to implement complex high-level functions. It has a “load/store architecture”: all memory accesses are performed by load (read from memory) or store (write to memory) instructions [32]. As a consequence, any more complex operation that accesses memory is decomposed into load and store instructions.

LLVM bitcode files can be linked into a single file by the LLVM linker [33], which resolves the definitions of functions and variables declared in different files.

Static Single Assignment (SSA)

LLVM IR is a “Static Single Assignment (SSA)” [34] based language: each new assignment of a value to a variable results in a new version of the variable being created. Data-flow analysis is facilitated by the SSA representation, which expresses a variable as a function of its previous versions.
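
As an illustration of SSA renaming, the C++ sketch below converts straight-line assignments into SSA form by giving each definition a fresh version number. It is a hypothetical toy, not LLVM's implementation: it ignores control flow, so no phi functions are needed, and the names Assignment and toSSA are illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One straight-line assignment: dst = f(uses...).
struct Assignment {
    std::string dst;
    std::vector<std::string> uses;
};

// Renames straight-line code into SSA form: every definition of a variable
// gets a fresh version (x1, x2, ...) and every use refers to the latest one.
std::vector<Assignment> toSSA(const std::vector<Assignment>& code) {
    std::map<std::string, int> version;  // latest version of each variable
    std::vector<Assignment> ssa;
    for (const auto& a : code) {
        assert(!a.dst.empty());
        Assignment renamed;
        // Uses are renamed first, so "x = x + 1" reads the previous version.
        for (const auto& u : a.uses) {
            auto it = version.find(u);
            // Operands never defined before (constants, inputs) are kept as is.
            renamed.uses.push_back(it == version.end() ? u : u + std::to_string(it->second));
        }
        int v = ++version[a.dst];  // fresh version for the assigned variable
        renamed.dst = a.dst + std::to_string(v);
        ssa.push_back(renamed);
    }
    return ssa;
}
```

For x = 1; x = x + 1; y = x, the sketch yields x1 = 1; x2 = x1 + 1; y1 = x2, so each use refers to exactly one definition.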

According to Braun et al. [35], SSA form aims at improving the efficiency of the analysis by “compactly representing use-def chains”. A use-def chain is a data structure composed of a use of a variable (an instruction) and all the possible definitions of this variable. The def-use information lists all the instructions that use a given variable. LLVM SSA is built according to Cytron et al.’s algorithm [34]. This algorithm first identifies the different definitions of the variable. Then, if there are concurrent definitions, due to an if-statement for example, the multiple definitions are merged by a phi (φ) function and propagated. Finally, the new definition of the variable replaces the old variable in its different uses.
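
The step that combines concurrent definitions corresponds to the insertion of phi (φ) functions. The C++ sketch below shows only that placement step, using the iterated dominance frontier of the blocks that define a variable; it assumes the dominance frontiers have already been computed, uses hypothetical names throughout, and omits the subsequent renaming.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Basic blocks are numbered 0..n-1; df[b] is the dominance frontier of
// block b, assumed to have been computed beforehand.
using DominanceFrontiers = std::vector<std::set<int>>;

// Phi-placement step in the style of Cytron et al.: a variable defined in
// several blocks needs a phi function in every block of the iterated
// dominance frontier of its definition sites.
std::set<int> placePhis(const DominanceFrontiers& df, const std::set<int>& defBlocks) {
    std::set<int> hasPhi;
    std::set<int> queued(defBlocks);
    std::vector<int> worklist(defBlocks.begin(), defBlocks.end());
    while (!worklist.empty()) {
        int b = worklist.back();
        worklist.pop_back();
        assert(b >= 0 && static_cast<std::size_t>(b) < df.size());
        for (int f : df[b]) {
            if (hasPhi.insert(f).second && queued.insert(f).second) {
                // The phi is itself a new definition of the variable, so its
                // block must be processed as well.
                worklist.push_back(f);
            }
        }
    }
    return hasPhi;
}
```

In a diamond-shaped control-flow graph where block 0 branches to blocks 1 and 2, which both jump to block 3, a variable defined in blocks 1 and 2 receives a single phi function in block 3.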

LLVM Pass Framework

The LLVM project provides an LLVM Pass Framework [13]. An LLVM pass can be used to transform, analyze and optimize code in its IR form. New LLVM passes can also be developed in C++. Several types of passes are available, enabling the analysis of the code at different scales, such as modules, functions or basic blocks.

2.6 Related Work

Clang static analyzer

The Clang static analyzer is an open-source tool that is part of the Clang and LLVM projects [36]. Its formal analysis is based on symbolic execution: a core engine simulates the different execution paths of the program, while a constraint manager checks whether each path is satisfiable. The algorithm is path-sensitive, so all the possible paths are explored. Arroyo et al. [11] developed a “user configurable static analyzer taint checker” plugin for the Clang static analyzer, which checks the propagation of tainted data in C, C++ and Objective-C programs. Their tool provides a configuration file in which users define the sources, propagators, sinks and filters of the taint analysis. Sinks are defined as “critical functions” which should not be influenced by tainted data. Filters are sanitizers which can generate safe data from tainted data. This tool can be used to detect security flaws that could be triggered by malicious user inputs.

Sparse Value Flow (SVF)

SVF (Sparse Value-Flow) [37] is an open-source static analysis tool developed at the School of Computer Science and Engineering, UNSW, Australia. This tool is implemented on top of LLVM and aims at analyzing inter-procedural pointer dependencies in C and C++ programs. It resolves both data and control flow dependencies, thus enabling a more precise analysis. According to Sui et al. [37], the value-flow construction module, based on Andersen’s points-to information, generates an “inter-procedural memory SSA” [37] representation providing def-use chains for pointers, whereas LLVM only provides an intra-procedural memory dependence analysis pass. The inter-procedural analysis is performed sparsely, that is to say, by first computing over-approximate def-use chains and then eliminating unnecessary propagation, thus refining the data-flow analysis. SVF can be used to detect bugs involving value-flow reachability, such as memory leaks. SABER [38] is a memory leak detector developed on top of SVF. SVF can also be used to implement “scalable and precise pointer analyses” [38].

Frama-C

The Frama-C platform [39] is an open-source static analysis tool, which aims at performing safety verification on industrial C code. This tool is designed to be sound, which means that it over-approximates program behaviour in order to guarantee that no error remains undetected. Frama-C uses abstract interpretation, deductive verification and concolic testing, which is a form of dynamic symbolic execution, to prove assertions. Frama-C is developed in OCaml and aims at being an extensible platform, composed of several plugins which enable more sophisticated approaches. The Frama-C Evolved Value Analysis plugin aims at identifying the set of possible values of a variable at a given moment of the execution. Frama-C also makes it possible to slice the program in order to simplify it, and to navigate the use-def chains. Thus, Frama-C can be used to verify that the source code respects its specifications, which can be expressed as ACSL (a formal specification language) annotations. However, Frama-C currently does not provide a taint analysis plugin, although it is possible to compute the dependencies between variables.

Assisted Assignment of Automotive Safety Requirements

Azevedo et al. [10] have developed a tool that automates ASIL allocation and decomposition during the design phase. According to ISO 26262 Part 9.5 [1], if several independent safety requirements are responsible for the ASIL rating of a common element, then it is possible to assign a lower ASIL to these requirements. For example, if an element is rated ASIL D because of two ASIL D sub-elements, then these two sub-elements can be decomposed into two ASIL B requirements, since two ASIL B sub-elements are equivalent to an ASIL D element. This is done by associating an integer with each ASIL rating (i.e. A=1, B=2, C=3, D=4). In order to compute ASIL allocation, this tool first generates the fault trees using an existing safety analyzer and design optimizer called HiP-HOPS (Hierarchically Performed Hazard Origin and Propagation Studies) [40]. Then, ASIL decomposition is computed by running a constraint-solving algorithm on the “minimal cut set” [10], which refers to the smallest set of events that causes an element to be marked as ASIL. This tool can be used to reduce development costs by limiting the number of high-ASIL elements.
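
The integer encoding turns decomposition into simple arithmetic. The C++ sketch below enumerates the two-way splits of an ASIL whose integers add up to the original one; it assumes QM = 0 and ignores the independence conditions that ISO 26262 attaches to decomposed requirements. It is not the constraint-solving algorithm of the tool described above [10], and the function name is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Integer encoding of the ratings; QM = 0 is an assumption of this sketch.
const std::array<std::string, 5> kAsilName = {"QM", "A", "B", "C", "D"};

// Two-way decompositions of an ASIL (given as its integer) into ratings
// whose integers add up to it, from the most to the least unbalanced split.
std::vector<std::pair<std::string, std::string>> decompositions(int asil) {
    assert(asil >= 1 && asil <= 4);
    std::vector<std::pair<std::string, std::string>> result;
    for (int high = asil; high >= asil - high; --high)
        result.emplace_back(kAsilName[high], kAsilName[asil - high]);
    return result;
}
```

For ASIL D (4), the sketch returns D + QM, C + A and B + B, the last one being the example given in the text.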

2.7 Visualization

Software visualization refers to the visual representation of software components [41]. Its main challenge is to provide understandable and useful information to developers so that they can work more effectively [42]. In fact, software visualization aims at reducing the effort spent by developers on development and maintenance tasks [43]. According to Shahin et al.’s systematic review [41], the most used visualization technique is graph-based visualization.

Graph representation

When static analysis is used to examine relations between objects in the source code, a graph representation can be a suitable solution. Graphs represent these relationships visually, with objects as nodes and relations as edges. Some graphs are commonly used in static analysis, such as call graphs, program dependence graphs and control-flow graphs. Call graphs display the calling relationships between functions, with functions as nodes and calls as edges. Program dependence graphs show the dependencies between variables, with statements or values as nodes and the relations between them as edges. Control-flow graphs present the different execution paths of a program [42], with instructions as nodes and jumps between instructions as edges.
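
Such graphs are often exported to a textual format before being drawn. As an example, the C++ sketch below serializes a call graph in the Graphviz DOT language, with functions as nodes and calls as directed edges; the CallGraph type and the toDot function are hypothetical, and the layout itself is left to a drawing tool such as Graphviz.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Call graph as an adjacency map: caller -> set of callees.
using CallGraph = std::map<std::string, std::set<std::string>>;

// Serializes a call graph in the DOT language. Callers are declared as
// nodes; functions that only appear as callees are created by their edges.
std::string toDot(const CallGraph& cg) {
    std::ostringstream out;
    out << "digraph callgraph {\n";
    for (const auto& [caller, callees] : cg) {
        out << "  \"" << caller << "\";\n";
        for (const auto& callee : callees)
            out << "  \"" << caller << "\" -> \"" << callee << "\";\n";
    }
    out << "}\n";
    return out.str();
}
```

The resulting text can then be rendered by Graphviz, which takes care of node placement and edge routing.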

The SVF tool [37] can generate a value-flow graph in order to display program dependencies. Different kinds of nodes exist and are highlighted by different colors in order to identify them. The dependencies between elements are represented by edges.

As far as graph representation is concerned, the developer should be able to easily find the useful information and understand the relationships between objects. Interactive features enable the user to hide information which is not currently important and to expand useful details [42]. Information visualization can be facilitated by navigation interactions such as zooming, moving or expanding nodes.

The key issues related to graph representation concern the information layout. To make a graph easy to read and understand, information should be organized clearly and follow specific rules, according to Herman et al. [44]. Graph drawing also has aesthetic and practical rules, such as equal space distribution between nodes. Moreover, edge crossings should be avoided if the graph is planar. One of the most common graph layouts is the tree layout, which is convenient for displaying hierarchical information. “Tree layout algorithms have the lowest complexity and are simpler to implement” [44].

Code annotation

As static analysis is used to analyze source code, a common visualization method is to annotate the code directly with the results. Usually, a plugin is developed and integrated into the IDE (integrated development environment).

The results of the taint checker developed by Arroyo et al. [11] are displayed as code annotations in order to warn the developer about untrusted data during development. Frama-C [39] also provides a user interface and a source code browser to display the results on the code.

The advantage of code annotation is that it lets developers see the context of a result [42], that is to say, read the code and locate the information inside the project. However, annotations on code do not provide a global representation of the dependencies.

LaToza and Myers [42] developed an Eclipse plugin combining code annotation and graph representation in order to navigate the call graph of a module. In fact, inter-procedural dependencies are easily represented through a call graph. Thus, the user can get context information from the Eclipse IDE and global information from the graph.

Useful properties in software visualization tools

According to Bassil and Keller [43], “appropriate visualization can significantly reduce the effort spent on system comprehension and maintenance”. In order to define what an “appropriate visualization” is, Bassil and Keller conducted a survey about software visualization tools. They aimed at evaluating the usefulness and the importance of different visualization aspects. Bassil and Keller [43] report the most essential properties according to the results of the questionnaire:

1. “Search tools for graphical and/or textual elements”

2. “Source code visualization (textual views)”

3. “Hierarchical representation”

4. “Use of colors”

5. “Source code browsing”

6. “Navigation across hierarchies”

7. “Easy access, from the symbol list, to the corresponding source code”

Some useful but non-essential properties have also been reported, such as “saving of views for future use”, the “possibility of having multiple [...] instances of the same object being highlighted in all the views”, or the “visualization of different levels of detail in separate window”.

Bassil and Keller [43] also questioned experts about the code analysis support of software visualization tools. The most important functionalities reported were “visualization of function calls”, “visualization of inheritance graph” and “visualization of different levels of detail in separate window”.


2.8 Evaluation

In the context of static analysis in automotive safety, the accuracy of the tool should be measured so that users can assess whether they can rely on the results. Moreover, the tool aims at helping engineers to be more efficient in their work. Thus, the usefulness of the results should be evaluated to check whether the tool fulfils its goal.

Evaluating the usefulness of the results

According to Seaman [45], qualitative research methods are increasingly used to take human behaviour into account when evaluating software. Unlike quantitative data, qualitative data cannot be represented as numbers. Two data collection methods are commonly used: “participant observation and interviewing” [45]. The first consists of observing software developers while they are working and taking notes about their behaviour and thoughts. The second consists of asking developers a series of questions. After data collection, the results should be analyzed in order to extract “a statement or proposition” [45].

LaToza and Myers [42] evaluated the “potential productivity benefits [...] and the usability” of their static analysis tool, called REACHER, by conducting a lab study with 12 participants. This tool aims at reducing the time required for a task by allowing developers to understand and navigate the code more effectively. The study compared the time the participants needed to perform a task with Eclipse to the time needed to perform the same task with REACHER. To make the two tools comparable, all the participants had completed two tutorials, on Eclipse and on REACHER, in order to become familiar with both interfaces before taking part in the study. Each task involved understanding “control flow between events” in the program and using a call graph, which is REACHER’s focus. Each task focused on a particular aspect of REACHER.

Evaluating the accuracy of the results

According to Anderson [46], ISO 26262 requires static analyzers to be qualified by assessing their tool confidence level (TCL). This level is derived from the possibility that a failure in the tool prevents the requirements from being met (tool impact, TI), and the probability that the failure can be detected (tool error detection, TD). Thus, the accuracy of the tool should be assessed and its functional requirements should be tested.
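
In ISO 26262-8, TI and TD are combined through a table to obtain the TCL. The C++ sketch below encodes that table as we understand it (TI1 always gives TCL1; under TI2, TD1, TD2 and TD3 give TCL1, TCL2 and TCL3 respectively); readers should check it against the standard itself, and the function name is hypothetical.

```cpp
#include <cassert>

// Tool confidence level from tool impact (TI1, TI2) and tool error
// detection (TD1 = high confidence ... TD3 = low confidence), following our
// reading of the classification table in ISO 26262-8.
int toolConfidenceLevel(int ti, int td) {
    assert(ti == 1 || ti == 2);
    assert(td >= 1 && td <= 3);
    if (ti == 1) return 1;  // a tool malfunction cannot violate a safety requirement
    return td;              // TI2: TD1 -> TCL1, TD2 -> TCL2, TD3 -> TCL3
}
```

For instance, an analyzer whose failures could affect the item (TI2) with low confidence that they would be detected (TD3) falls into TCL3. A tool rated TCL1 needs no qualification, whereas TCL2 and TCL3 call for qualification methods.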

Arroyo et al. [11] evaluated the accuracy of their taint checker, based on the Clang static analyzer, according to the following criteria:

• “capacity for finding usage of tainted data”: this refers, for example, to the capacity of the tool to detect the use of a tainted variable in a given instruction. Each type of usage was tested in a test case.

• “the number of false positives”: this refers to the wrong propagation of tainted data generating false errors.

• “scalability”: the tool was tested on a real case, the Heartbleed vulnerability of OpenSSL.

Sui et al. [38] performed an experimental evaluation in order to measure the accuracy of their static memory leak detector, called SABER. They define accuracy as the “ability to detect memory leaks with a low false positive rate”. To conduct the study, they tested their tool on “15 SPEC2000 C programs (620 KLOC) and seven open-source applications”. They reported the number of faults found by SABER and the number of false positives. Then, they computed the false positive rate as defined in [eq. (2.2)]. Finally, they compared these results to those obtained with other analyzers.


Note that the number of faults reported by a tool can be expressed as follows:

$$\text{number of faults reported} = \text{false positives} + \text{true positives} \tag{2.1}$$

Then, the false positive rate can be defined as:

$$\text{false positive rate} = \frac{\text{false positives}}{\text{number of faults reported}} \tag{2.2}$$

They concluded that their detector is “neither complete [...] nor sound” [38] due to some approximations, such as treating multi-dimensional arrays monolithically or bounding the number of loop iterations.

Imparato et al. [8] have reported “a comparative study of static analysis tools for AUTOSAR”. They evaluated the tools according to their precision and recall, which can be expressed as follows:

$$\text{precision} = \frac{\text{true positives}}{\text{number of faults reported}} \tag{2.3}$$

$$\text{recall} = \frac{\text{true positives}}{\text{false negatives} + \text{true positives}} \tag{2.4}$$

A high precision saves time because it limits the number of false alerts that developers have to check. The recall measures the proportion of errors detected out of the total number of errors. If the recall equals 1, then the tool detects all the errors.
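
These metrics can be computed directly from the counts of an evaluation. The C++ sketch below implements Eq. (2.1) to (2.4) with hypothetical names; because Eq. (2.2) and Eq. (2.3) share the same denominator, the false positive rate defined here equals 1 − precision.

```cpp
#include <cassert>

// Counts obtained when running an analyzer on code with known faults.
struct Counts {
    int truePositives;
    int falsePositives;
    int falseNegatives;
};

// Eq. (2.1): every reported fault is either a true or a false positive.
int faultsReported(const Counts& c) { return c.falsePositives + c.truePositives; }

// Eq. (2.2): share of reported faults that are false alarms.
double falsePositiveRate(const Counts& c) {
    assert(faultsReported(c) > 0);
    return static_cast<double>(c.falsePositives) / faultsReported(c);
}

// Eq. (2.3): share of reported faults that are real.
double precision(const Counts& c) {
    assert(faultsReported(c) > 0);
    return static_cast<double>(c.truePositives) / faultsReported(c);
}

// Eq. (2.4): share of real faults that are reported.
double recall(const Counts& c) {
    assert(c.truePositives + c.falseNegatives > 0);
    return static_cast<double>(c.truePositives) / (c.falseNegatives + c.truePositives);
}
```

With 6 true positives, 2 false positives and 2 false negatives, 8 faults are reported, the false positive rate is 0.25, and both precision and recall are 0.75.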


3 Method

This chapter describes the implementation of the taint analyzer on top of LLVM in sections 3.1 and 3.2, the development of the visualization tool in section 3.3, and the evaluation of the accuracy of the results and of the usefulness of the visualization tool in section 3.4.

3.1 LLVM

The first research question was to examine whether LLVM could be used to develop a static analysis tool for automotive software. This study was done in three steps.

The first step was to study how to develop a plugin on top of LLVM. One of the advantages of the compiler infrastructure is the LLVM Pass Framework [13], presented in section 2.5. LLVM passes can be used to transform, analyze and optimize code in a modular way. Moreover, new LLVM passes can be developed easily thanks to a set of reusable functions and application programming interfaces (APIs) written in C++. LLVM also provides detailed documentation [47] intended for developers. New passes inherit from one of the Pass child classes: ModulePass, CallGraphSCCPass, FunctionPass, LoopPass, RegionPass and BasicBlockPass. In the context of the thesis, ModulePass was selected because it can analyze the whole program. Therefore, it enables inter-procedural analysis, whereas a FunctionPass can only analyze the content of each function separately and independently. Finally, the runOnModule function must be overridden and is the entry point of the pass. Thus, any object-oriented application can be developed on top of a ModulePass.

The second step was to study how to perform the taint analysis based on the functions and APIs provided by the LLVM infrastructure. LLVM APIs make it possible to iterate over several kinds of LLVM IR objects inside the module. For example, it is possible to iterate over each instruction, each function or each global variable of the program. It is also possible to iterate over the def-use chains, defined in section 2.5, which makes LLVM especially well suited to taint analysis.

The last step was to study how to run the pass on the company's projects. Once the pass is developed, it must be compiled with Clang in order to generate a shared library. Then, the pass can be run on an LLVM bitcode file from the command line using “the modular optimizer, opt”, according to Lattner and Adve [48]. Thus, in order to analyze the source files of the company's different projects, the projects had to be compiled with Clang to obtain the corresponding bitcode file of each module, which is a single C translation unit.

3.2 Taint analysis

The second research question was to examine how taint analysis can be used to analyze the dependencies between safe variables in the automotive industry. The first phase was to define how to identify the source safe variables, called taint information, and how to implement them in the tool. The second phase was to determine the taint propagation policy, that is to say the set of operations or actions propagating the taint, according to the automotive industry requirements and ISO 26262 [1]. The last phase was to study how to implement the taint analysis algorithm to analyze the LLVM IR.

Taint information

Taint information, also called source information, represents the data set tainted at the initialization of the taint analysis algorithm. In the following, tainted data refers to safety-critical data, divided into four ASIL ratings (A, B, C, D), whereas untainted data refers to quality management (QM) data.

Specification The specifications related to taint information should state the type of objects which can be tainted by the user at the beginning. These specifications were discussed during a meeting with the safety engineers of the company. In the context of the thesis, according to the needs of the company, taint information should be user-configurable, which means that the user can define the list of tainted values as an input of the taint analysis tool. It was then decided that the source objects that a user can taint at initialization could be:

• a global variable, identified by its name,

• a memory region, identified by an address range,

• a source-code file, identified by its name.

In fact, specifying the name of a safe global variable is sufficient to identify it in the source code. Moreover, specifying a memory region can be used to taint the safe registers and the partitions which should be protected by the MPU. Specifying a file is useful if many functions that have to be tainted are located in the same file. This spares developers from writing the name of each tainted function one at a time.

Finally, each user input can be associated with an ASIL rating (A, B, C, D).

Implementation To implement a user-configurable analyzer, taint variables are defined by the user in an XML configuration file, which is read by the taint analysis pass using the C++ XML processing library Pugixml [49]. Then, the user input is converted into several instances of the Input class [Fig. 3.2]. This class is composed of the name of the object or of a memory region (start and end addresses), and an ASIL rating. All the instances of the Input class are stored in a list, which is a member of the taint analysis pass. Thus, this list represents the set of taint information.
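
As an illustration of how configured inputs could be matched against the program, the C++ sketch below models an Input with hypothetical fields (name, start, end, asil) and looks up either a hard-coded address or a name. The XML parsing with Pugixml is omitted, and when several ranges cover the same address the highest ASIL is returned, which is our assumption, consistent with the highest-ASIL rule of the propagation policy.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

enum class Asil { QM, A, B, C, D };  // ordered from lowest to highest

// Hypothetical shape of one configured taint source: either a name (global
// variable or source file) or an inclusive address range, plus its ASIL.
struct Input {
    std::string name;         // empty when the input is an address range
    std::uint64_t start = 0;  // first address of the range
    std::uint64_t end = 0;    // last address of the range
    Asil asil = Asil::QM;

    bool isRange() const { return name.empty(); }
};

// Highest ASIL among the configured ranges that contain the address, if any.
std::optional<Asil> asilOfAddress(const std::vector<Input>& inputs, std::uint64_t address) {
    std::optional<Asil> best;
    for (const auto& in : inputs) {
        if (!in.isRange()) continue;
        assert(in.start <= in.end);
        if (in.start <= address && address <= in.end && (!best || in.asil > *best))
            best = in.asil;
    }
    return best;
}

// ASIL of the global variable or file configured under this name, if any.
// Names are unique under the MISRA C assumptions, so the first match is used.
std::optional<Asil> asilOfName(const std::vector<Input>& inputs, const std::string& name) {
    for (const auto& in : inputs)
        if (!in.isRange() && in.name == name) return in.asil;
    return std::nullopt;
}
```

For example, with a range from 0x1000 to 0x1FFF rated ASIL D, the hard-coded address 0x1002 is reported as ASIL D, while 0x2000 is not tainted.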

Then, these inputs need to be associated with an LLVM class, that is to say an instance of LLVM::Value, which is the most generic LLVM class used to define a variable. An LLVM function is used to select the LLVM::Value corresponding to a name or a memory region. The child classes of LLVM::Value are presented in [Fig. 3.1]. Taint information can be a global variable (LLVM::GlobalVariable), an address (LLVM::ConstantExpr) or a function (LLVM::Function). LLVM::AllocaInst and LLVM::Argument cannot be part of taint information since they define local variables. However, they are used later in the analysis of the dependencies.

Once the LLVM::Value instance corresponding to an Input has been identified, the taint information is converted into an instance of the SafeValue class [Fig. 3.3], which is composed of:

• an LLVM::Value instance

• an instance of the enumeration ASIL (QM, A, B, C, D)

This class is the key of the taint analyzer because every LLVM::Value instance analyzed by the algorithm is stored in a SafeValue instance. All the instances rated ASIL A, B, C or D are tainted information, whereas instances rated QM are untainted information.
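
A minimal C++ sketch of such a class is given below. A small Value struct stands in for LLVM::Value so that the code compiles without LLVM, and raiseTo is a hypothetical helper applying the rule that a value tainted by several sources keeps the highest ASIL; the actual SafeValue class may differ.

```cpp
#include <algorithm>
#include <cassert>

enum class Asil { QM, A, B, C, D };  // ordered from lowest to highest

// Stand-in for LLVM::Value so that the sketch compiles without LLVM.
struct Value {
    const char* name;
};

// Pairs an analyzed value with its ASIL; QM means "not tainted".
class SafeValue {
public:
    SafeValue(const Value* value, Asil asil) : value_(value), asil_(asil) {
        assert(value != nullptr);
    }

    bool isTainted() const { return asil_ != Asil::QM; }
    Asil asil() const { return asil_; }
    const Value* value() const { return value_; }

    // When several sources taint the same value, keep the highest ASIL.
    void raiseTo(Asil other) { asil_ = std::max(asil_, other); }

private:
    const Value* value_;
    Asil asil_;
};
```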

Figure 3.1: An overview of the LLVM Value inheritance [30]


Figure 3.2: UML diagram describing the architecture of the taint analysis pass


Taint propagation policy

After defining the taint information, the second phase is to identify the kinds of operations which can propagate the taint to other objects, which can be global variables, local variables, addresses or functions.

Specification The taint propagation policy was defined with the engineers of the company, based on their experience with safety requirements. Seven cases have been defined and are presented below [Tab. 3.1]. If an object is tainted by several objects, then the highest ASIL should be assigned to it, according to ISO 26262 Part 9 [1].

Store If a new value is assigned to an ASIL variable, resulting in the variable being modified, then the function where the assignment is done should be tainted. A memory write access is always translated into a store instruction in LLVM IR [32]. An instruction is considered to modify tainted data if it overwrites the data's memory location or its content. Thus, if the tainted data is a pointer, any assignment to the pointer or to the dereferenced pointer is considered a modification.

Load address If an ASIL hard-coded address is assigned to a scalar variable, or converted and assigned to a pointer, then the variable or pointer should be tainted.

Pointer parameter If an ASIL pointer is passed as a parameter to a function, then the body of the function should be analyzed to check whether the pointer is modified inside it, that is to say, whether its memory location or its content is overwritten by another value. To do this, the function behaviour is first over-approximated: the calling function and the parameter inside the called function are tainted. Then, the body of the called function is analyzed to determine whether the pointer is effectively modified. If a modification is found, then the called function is tainted as well. If the pointer is not modified inside the function, then the called function is not tainted.
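
The two-step reasoning can be sketched in C++ as follows. Functions are reduced to a hypothetical model listing their parameters and the lvalues their bodies write to ("p" for the pointer itself, "*p" for its pointee); the caller and the parameter are tainted unconditionally, and the callee only when a write to or through the parameter is found.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a function: its parameter names and the lvalues its
// body writes to, written "p" (the pointer itself) or "*p" (its pointee).
struct Function {
    std::string name;
    std::vector<std::string> params;
    std::vector<std::string> writes;
};

struct PointerParamResult {
    bool callerTainted;
    bool parameterTainted;
    bool calleeTainted;
};

// Pointer-parameter case for a tainted pointer passed as argument argIndex.
PointerParamResult analyzePointerParam(const Function& callee, std::size_t argIndex) {
    assert(argIndex < callee.params.size());
    const std::string& p = callee.params[argIndex];
    // Step 1: over-approximate by tainting the caller and the parameter.
    PointerParamResult result{true, true, false};
    // Step 2: taint the callee only if its body overwrites p or *p.
    for (const auto& lvalue : callee.writes)
        if (lvalue == p || lvalue == "*" + p) result.calleeTainted = true;
    return result;
}
```

For called_fn in Table 3.1, whose body writes *pointer, the callee is tainted; a function that only writes a local variable is not.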

Function call If a function is tainted, then each function calling it should also be tainted. Thus, the taint is propagated through the call graph to all the functions that call this function, directly or indirectly.
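
Propagating the taint to callers amounts to a backward traversal of the call graph. The C++ sketch below performs it with a breadth-first search over a hypothetical reverse call graph (callee to callers); each function is visited once, so recursive calls cannot cause an infinite loop.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>

// Hypothetical reverse call graph: callee -> functions that call it.
using Callers = std::map<std::string, std::set<std::string>>;

// Function-call case: every direct or indirect caller of a tainted function
// is tainted too. Breadth-first search; each function is visited once.
std::set<std::string> propagateToCallers(const Callers& callers,
                                         const std::set<std::string>& tainted) {
    std::set<std::string> result(tainted);
    std::queue<std::string> work;
    for (const auto& f : tainted) work.push(f);
    while (!work.empty()) {
        std::string f = work.front();
        work.pop();
        auto it = callers.find(f);
        if (it == callers.end()) continue;
        for (const auto& caller : it->second)
            if (result.insert(caller).second) work.push(caller);
    }
    return result;
}
```

If calling_function calls function_modifying_ASIL and is itself called by main, tainting function_modifying_ASIL also taints calling_function and main.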

Global If a global value is initialized with tainted data, then this global value should also be tainted.

File A file can only be tainted if the user includes its name in the configuration file. If a file is tainted, then all global variables and functions defined in this file should also be tainted.

Violation When the scalar value of a tainted variable is assigned to a QM variable or a lower-ASIL variable, it is not a safety-critical operation, because the safe memory is not likely to be modified. So, no tainted value is added. However, if a tainted pointer is stored in a QM or lower-ASIL pointer, the safe memory could later be modified through this unsafe pointer. Thus, this case should not happen in a safe application, unless the tainted variable is a hard-coded address or a global variable definition. Assigning an ASIL pointer to a lower-ASIL or QM pointer is inconsistent with safety recommendations, so this case is considered a violation.


Table 3.1: Taint propagation policy

| Name | Description | Taint information | Taint propagation | Examples |
|---|---|---|---|---|
| Store | Modification of a safe variable inside a function | Lvalue (any type) | Function | variable_asil = variable_qm; variable_asil = function_qm(); pointer_asil = &variable_qm; |
| Load Address | A safe address is loaded into a variable inside a function | Rvalue (address) | Function and lvalue | int* pointer = (int *) 0x0F; uint32 address = 0x0F; |
| Pointer parameter | A safe pointer is passed as a parameter to a function | The pointer parameter | Parameter, calling function and called function | called_fn(&variable_asil); with the definition void called_fn(int* pointer) { *pointer = variable_qm; } |
| Call | A call to a safe function | Called function | Calling function | void calling_function() { function_modifying_ASIL(); } |
| Global | A global value definition | Rvalue | Global variable | int* global = &global_asil; int* global = (int *) 0x00001002; |
| File | A file is marked as safe | File | Global variables, functions | |
| Violation | A safe pointer is loaded into an unsafe pointer | Rvalue (not an address) | Violation | pointer_qm = pointer_asil; pointer_qm = &variable_asil; |

Implementation

The first step of the implementation was to define the scope of the taint analysis pass. Then, the second step was to develop the algorithm to parse and analyze the LLVM IR, in order to identify the different cases presented in the propagation policy [Tab. 3.1]. The last step was to compile the project with Clang to generate LLVM IR.

MISRA C Guidelines Some assumptions have been made throughout the development process of the analyzer according to the MISRA C Guidelines [14]. The following rules apply to the embedded project analyzed by the taint analysis pass:

• Each line of code is reachable.

• Variables should always have distinct names.

• Dynamic allocation and deallocation functions are not used.

These rules allow some simplifications. All the lines of the LLVM bitcode file are analyzed, as there is no unreachable code. A variable can be identified by its name since two different variables should have different names. Dynamic allocation and deallocation are not taken into account during the analysis. Only hard-coded memory addresses are studied.


Pointer analysis A pointer analysis can have different levels of accuracy, as presented in section 2.4. The level of accuracy needed by the tool has been established according to the needs of the company. The taint analyzer should be field-insensitive, which means that each access to a sub-element is equivalent to an access to the whole aggregate data. In fact, according to ISO 26262 Part 9 Section 6.2 [1], elements composed of sub-elements should be developed according to “the highest ASIL applicable to the element”. The taint analyzer should be inter-procedural so that relationships between functions can be analyzed, in order to identify when a tainted pointer parameter is modified inside a function. Finally, the taint analyzer should be flow-insensitive, which means that the execution order of the program is not taken into account. This is an over-approximation which aims at simplifying the analysis, because flow-sensitive analysis is costly in terms of complexity.

Instruction level The taint analysis pass only analyzes the code at the level of LLVM IR instructions. Thus, analyzing machine code such as assembly language is out of scope.

LLVM IR analysis At initialization time, the taint information is defined. The taint is then propagated to other data according to the taint propagation policy.

The users of each piece of taint information, that is to say, the instructions involving a given LLVM::Value instance, can be listed using the iterator over its users. Each user is then analyzed to identify the taint propagation policy case that it corresponds to. Here, a user can be either an instruction or a constant expression. The AnalyzerFactory selects the child class of the Analyzer corresponding to the LLVM IR instruction type, as described in the UML diagram [Fig. 3.2].
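
This traversal of users can be sketched as a worklist algorithm. In the C++ sketch below, a toy Node type stands in for LLVM values and their users, and a small factory returns an analyzer per opcode, mirroring the AnalyzerFactory and Analyzer design in spirit only. The class names and the two rules shown (a load forwards the taint to its users, a store through a tainted pointer taints its enclosing function) are simplifying assumptions, and violations are not modeled.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an LLVM value and its users.
struct Node {
    std::string name;
    std::string opcode;                    // "global", "load", "store", "alloca", ...
    std::string function;                  // enclosing function, empty for globals
    const Node* pointerOperand = nullptr;  // destination of a store
    std::vector<Node*> users;
};

// Hypothetical analyzer interface: decides what a tainted value implies for
// one of its users, and whether that user must in turn be explored.
struct Analyzer {
    virtual ~Analyzer() = default;
    virtual bool analyze(const Node& from, const Node& user,
                         std::set<std::string>& taintedFunctions) const = 0;
};

struct LoadAnalyzer : Analyzer {
    // Loading from tainted memory yields a value whose users must be checked.
    bool analyze(const Node&, const Node&, std::set<std::string>&) const override {
        return true;
    }
};

struct StoreAnalyzer : Analyzer {
    // Writing through the tainted pointer modifies safe memory.
    bool analyze(const Node& from, const Node& user,
                 std::set<std::string>& taintedFunctions) const override {
        if (user.pointerOperand == &from) taintedFunctions.insert(user.function);
        return false;
    }
};

// Factory choosing the analyzer from the opcode; other opcodes are ignored here.
std::unique_ptr<Analyzer> makeAnalyzer(const std::string& opcode) {
    if (opcode == "load") return std::make_unique<LoadAnalyzer>();
    if (opcode == "store") return std::make_unique<StoreAnalyzer>();
    return nullptr;
}

// Worklist over def-use chains, starting from the taint sources.
std::set<std::string> analyzeTaint(const std::vector<Node*>& sources) {
    std::set<std::string> taintedFunctions;
    std::set<const Node*> visited(sources.begin(), sources.end());
    std::vector<const Node*> work(sources.begin(), sources.end());
    while (!work.empty()) {
        const Node* value = work.back();
        work.pop_back();
        assert(value != nullptr);
        for (const Node* user : value->users) {
            auto analyzer = makeAnalyzer(user->opcode);
            if (analyzer && analyzer->analyze(*value, *user, taintedFunctions) &&
                visited.insert(user).second)
                work.push_back(user);
        }
    }
    return taintedFunctions;
}
```

With a tainted global pointer gp, a function h that loads gp and stores through the loaded pointer is tainted, while a function k that only stores the loaded value into a local variable is not.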

The LLVM language reference manual [19] describes the different LLVM IR instructions.

Listing 3.1: Store Inst

store {type} {source}, {type}* {destination}, align {type_alignment}

The store instruction writes a value to an address in memory. It is the only instruction which can modify the content of an existing variable in memory (at the LLVM IR level) [19]. Thus, this instruction is related to the Store case of the taint propagation policy if the destination operand has a higher ASIL than the source operand. Otherwise, it is a violation. Finally, if the source operand is a safety-critical address, then it is related to the Load Address case of the taint propagation policy.
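
One way to combine the Store, Load Address and Violation rules for a single store instruction is sketched below in C++. The StoreOperands fields are hypothetical annotations rather than LLVM API, and two choices are our own assumptions: a store whose destination has the same ASIL as its source is treated as a Store case, and storing a tainted scalar into a lower-ASIL variable is not a violation, following the Violation rule of the propagation policy.

```cpp
#include <cassert>

enum class Asil { QM, A, B, C, D };  // ordered from lowest to highest
enum class StoreCase { Store, LoadAddress, Violation, None };

// What the analysis knows about "store <source>, <destination>"
// (hypothetical annotations, not LLVM API).
struct StoreOperands {
    Asil sourceAsil;
    Asil destinationAsil;
    bool sourceIsSafeAddress;  // source is a hard-coded ASIL address
    bool sourceIsPointer;      // source holds a pointer value
};

StoreCase classifyStore(const StoreOperands& s) {
    // A safe hard-coded address flows into the destination: Load Address.
    if (s.sourceIsSafeAddress) return StoreCase::LoadAddress;
    // A tainted pointer copied into a lower-ASIL or QM pointer: Violation.
    if (s.sourceIsPointer && s.sourceAsil > s.destinationAsil) return StoreCase::Violation;
    // Any write to tainted memory taints the enclosing function: Store.
    if (s.destinationAsil != Asil::QM) return StoreCase::Store;
    // For example, a tainted scalar copied into a QM variable: nothing to do.
    return StoreCase::None;
}
```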

A store instruction is often preceded by a load instruction which aims at loading the destination address or the source value of the store instruction.

Listing 3.2: Load Inst

{result} = load {type}, {type}* {source}, align {type_alignment}

The load instruction reads the content of a memory address and stores it in an SSA result. This instruction is used each time the content of a memory address needs to be read. For example, a load instruction can be used to load the address stored in a pointer. To access the value pointed to by the pointer, a second load instruction is then needed to load the content stored at that address.

A load instruction does not necessarily indicate that the loaded operand will be modified. In fact, the address held by a pointer can be loaded either to modify or to read the content it points to. Therefore, the instructions following the load should be analyzed until a store instruction or a call instruction is found.
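The search for the instructions following a load can be pictured as a breadth-first walk over the def-use chains that stops at store and call instructions. In the minimal C++ sketch below, the `Inst` structure is a hypothetical, simplified model of the IR in which each instruction lists the indices of its users; unlike a real implementation, it does not check in which operand position (stored value or destination pointer) the loaded value appears in a store.

```cpp
#include <queue>
#include <set>
#include <vector>

// Hypothetical, simplified model of an IR instruction and its users.
enum class Kind { Load, GetElementPtr, Cast, Store, Call, Other };

struct Inst {
    Kind kind;
    std::vector<int> users;  // indices of the instructions using this result
};

// Starting from a load, follows the def-use chains until store or call
// instructions are reached, and returns their indices: these are the
// instructions that decide whether the loaded operand is modified.
std::vector<int> findModifyingUsers(const std::vector<Inst>& ir, int load) {
    std::vector<int> found;
    std::set<int> visited{load};
    std::queue<int> work;
    work.push(load);
    while (!work.empty()) {
        int current = work.front();
        work.pop();
        for (int user : ir[current].users) {
            if (!visited.insert(user).second) continue;  // already seen
            Kind kind = ir[user].kind;
            if (kind == Kind::Store || kind == Kind::Call)
                found.push_back(user);  // stop: candidate modification
            else
                work.push(user);        // keep following the chain
        }
    }
    return found;
}
```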

The call instruction is a special case related to inter-procedural analysis.

Listing 3.3: Call Inst

{result} = call {type} {function}({function arguments})


The call instruction is used for function calls. The return value is stored in an SSA result. When performing inter-procedural analysis, if safety-critical data is passed as a parameter to the function, then the content of the function needs to be analyzed as well. This instruction is related to the Pointer Parameter and Call cases of the taint propagation policy.
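The inter-procedural step amounts to mapping the tainted actual arguments of a call to the formal parameters of the callee. The sketch below assumes a hypothetical, name-based model of functions and call sites (`Function`, `CallSite`, `taintedParameters`); in LLVM IR, the same positional mapping relates the argument operands of a call instruction to the arguments of the called function.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, name-based model of a function definition.
struct Function {
    std::string name;
    std::vector<std::string> params;  // formal parameters, in order
};

// Hypothetical model of a call instruction.
struct CallSite {
    const Function* callee;
    std::vector<std::string> args;    // actual arguments, in order
};

// Returns the callee parameters that receive tainted data at this call.
// A non-empty result means that the body of the callee must be analyzed too.
std::set<std::string> taintedParameters(const CallSite& call,
                                        const std::set<std::string>& tainted) {
    std::set<std::string> result;
    const std::vector<std::string>& params = call.callee->params;
    // Arguments are matched to parameters by position; extra arguments of
    // variadic calls are ignored in this sketch.
    for (std::size_t i = 0; i < call.args.size() && i < params.size(); ++i) {
        if (tainted.count(call.args[i]) != 0) result.insert(params[i]);
    }
    return result;
}
```

For instance, for a call `testInterProcedural(safe_ptr, x)` to a function whose parameters are named `p` and `limit` (hypothetical names), only `p` is tainted, and the body of `testInterProcedural` then has to be analyzed.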

Listing 3.4: Alloca Inst

{result} = alloca {type}, align {type_alignment}

The alloca instruction is used to allocate memory on the stack frame during the execution of a function. It enables the declaration of local variables, which are released after the function returns. In the code generated by Clang without optimizations, each argument of a function is stored into a local variable declared with an alloca instruction.

Listing 3.5: GetElementPtr Inst

{result} = getelementptr inbounds {type}, {type}* {source}, {type} {index}

The getElementPtr instruction is used to “get the address of a sub-element of an aggregate data structure” [19], such as arrays or structures. Like the load instruction, it does not necessarily lead to the modification of the operand, so the following instructions need to be analyzed.
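Because the analyzer is field-insensitive, the result of a getElementPtr instruction can be mapped back to the aggregate it was computed from, so that an access to any element counts as an access to the whole aggregate. The following C++ sketch assumes a hypothetical map from each derived pointer to its source operand (`DerivedFrom`, `baseObject`) and assumes that these chains are acyclic.

```cpp
#include <map>
#include <string>

// Hypothetical map from a derived pointer (e.g. the result of a
// getelementptr) to the pointer operand it was computed from.
using DerivedFrom = std::map<std::string, std::string>;

// Field-insensitive resolution: an access to any sub-element is treated
// as an access to the whole aggregate, so the chain is followed back to
// the base object. Assumes the chains are acyclic.
std::string baseObject(const DerivedFrom& derivedFrom, std::string value) {
    for (auto it = derivedFrom.find(value); it != derivedFrom.end();
         it = derivedFrom.find(value)) {
        value = it->second;
    }
    return value;
}
```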

Listing 3.6: Global variable

@{globalVarName} = {global | constant} {type} {initializer}, align {type_alignment}

The global definition is used to declare a global variable. A global variable can be initialized with an initializer, which can refer to another global variable or be a constant. This is related to the Global case of the taint propagation policy.

Listing 3.7: An example of constant expression: inttoptr

{destination_type} inttoptr ({type} {value} to {destination_type})

Finally, a user can also be a constant expression, which is used to perform operations on constants [19]. If a global value, which inherits from the LLVM::Constant class, is used by a constant expression, then the users of this constant expression should also be analyzed. For example, the constant expression inttoptr [List. 3.7] can be used to convert a constant integer, such as an address, to a pointer.
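The rule that the users of a constant expression must in turn be analyzed can be sketched as a recursive look-through. In the hypothetical model below (`UseNode`, `collectInstructionUsers`), each node is either an instruction or a constant expression such as an inttoptr; the sketch assumes that constant expressions do not form cycles, and it may report an instruction twice if it is reached through two different constant expressions.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model: a node is either an instruction or a constant
// expression (e.g. an inttoptr), and lists the names of its users.
struct UseNode {
    bool isConstantExpr = false;
    std::vector<std::string> users;
};

using UseGraph = std::map<std::string, UseNode>;

// Collects the instructions that use `value`, either directly or through
// a chain of constant expressions. Constant expressions are looked
// through, because only their own users can propagate the taint.
void collectInstructionUsers(const UseGraph& graph, const std::string& value,
                             std::vector<std::string>& out) {
    auto it = graph.find(value);
    if (it == graph.end()) return;
    for (const std::string& user : it->second.users) {
        auto u = graph.find(user);
        if (u != graph.end() && u->second.isConstantExpr)
            collectInstructionUsers(graph, user, out);  // look through it
        else
            out.push_back(user);                        // instruction user
    }
}
```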

Propagation policy New tainted variables are stored in a SafeValue instance [Fig. 3.3], in the same way as taint information. Recall that the SafeValue class stores an LLVM::Value analyzed by the pass, which can thus be associated with an ASIL (A, B, C, D) or classified as QM. The SafeValue objects store a list of all their users corresponding to a propagation case, in a map whose keys are the users’ locations. In fact, each time a user is identified as a case of the taint propagation policy, it is stored in an instance of SafeInstruction [Fig. 3.3], which is composed of the tainted value, its alias, the propagation type, and its location. If, at some point, the lvalues of two variables are equal, they are said to be aliases, as explained in section 2.4. The location is a global object which refers either to the tainted function where the user is located, or to a tainted global variable if the user is a global declaration. Finally, SafeValue instances are stored in a SafeMap whose keys are the LLVM::Value instances. Thus, it is possible to find out which functions and aliases have been tainted because of a given value, and then which case of the taint propagation policy was responsible for the taint.
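A possible shape of these classes is sketched below in plain C++. The field and type names are assumptions based on the description above rather than the actual class definitions of [Fig. 3.3], the propagation types follow the test cases of Tables 3.4 to 3.10, and values are keyed by name instead of by LLVM::Value instance so that the sketch remains self-contained.

```cpp
#include <map>
#include <string>
#include <vector>

// ASIL ratings, ordered from least to most critical.
enum class Asil { QM = 0, A, B, C, D };

// Cases of the taint propagation policy recorded for a user.
enum class PropagationType { Store, LoadAddress, PointerParameter, Global, File, Call, Violation };

// One user of a tainted value that matched a propagation case.
struct SafeInstruction {
    std::string taintedValue;  // the tainted value
    std::string alias;         // its alias (may be empty)
    PropagationType type;      // propagation case that matched
    std::string location;      // tainted function or global variable
};

// A value analyzed by the pass, with its rating and its users.
struct SafeValue {
    std::string name;
    Asil asil = Asil::QM;
    bool tainted = false;
    // Users grouped by location, as in the user map described above.
    std::map<std::string, std::vector<SafeInstruction>> userMap;

    void addUser(const SafeInstruction& instruction) {
        userMap[instruction.location].push_back(instruction);
    }
};

// All analyzed values; keyed by name here instead of by LLVM::Value.
using SafeMap = std::map<std::string, SafeValue>;
```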


Figure 3.3: SafeValue and SafeInstruction classes

Taint propagation algorithm The taint propagation algorithm developed in the context of this thesis is summarized below in pseudo-code. Each piece of taint information is tainted at initialization. Then, the users of tainted variables are analyzed. If a user corresponds to a case of the taint propagation policy, the taint is propagated to the function or the alias. Finally, the user is converted to an instance of SafeInstruction, which is inserted in the user map of the SafeValue instance.

Listing 3.8: Taint propagation algorithm

// This is the initialization.
taint_information = list_of_safe_values
for each safe_value in taint_information
    propagating_taint(safe_value)

// This function propagates the taint to the safe value and analyzes its users.
void function propagating_taint(safe_value) {
    if not (safe_value.tainted) {
        safe_value.tainted = true
        for each user in safe_value.users() {
            if user corresponds to a propagation case {
                if STORE or LOAD or PARAMETER or CALL
                    propagating_taint(function)
                if LOAD or PARAMETER or GLOBAL
                    propagating_taint(alias)
                convert user to safe_instruction
                append safe_instruction to safe_value.user_map
            }
        }
    }
}
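A runnable version of Listing 3.8 is sketched below. It keeps the structure of the pseudo-code but replaces the LLVM data structures with a hypothetical graph in which values, functions and aliases are all nodes identified by name (`Graph`, `propagateTaint`, `runAnalysis`); recording the user itself in `userMap` stands in for the conversion to a SafeInstruction. The `tainted` flag, set before the users are visited, is what makes the recursion terminate on recursive functions and mutually aliasing values.

```cpp
#include <map>
#include <string>
#include <vector>

// Cases of the taint propagation policy (None: the user matches no case).
enum class Case { None, Store, Load, Parameter, Call, Global };

// Hypothetical, simplified user of a value.
struct User {
    Case propagationCase;
    std::string function;  // function containing the user
    std::string alias;     // value that becomes an alias (may be empty)
};

// A value, function or alias, identified by its name in the graph.
struct Node {
    bool tainted = false;
    std::vector<User> users;
    std::vector<User> userMap;  // stands in for the recorded SafeInstructions
};

using Graph = std::map<std::string, Node>;

// Mirrors propagating_taint() from Listing 3.8.
void propagateTaint(Graph& graph, const std::string& name) {
    // std::map does not invalidate references on insertion, so `value`
    // stays valid while the recursive calls add new nodes.
    Node& value = graph[name];
    if (value.tainted) return;  // already tainted: stops the recursion
    value.tainted = true;
    for (const User& user : value.users) {
        const Case c = user.propagationCase;
        if (c == Case::None) continue;
        if (c == Case::Store || c == Case::Load || c == Case::Parameter || c == Case::Call)
            propagateTaint(graph, user.function);
        if ((c == Case::Load || c == Case::Parameter || c == Case::Global) && !user.alias.empty())
            propagateTaint(graph, user.alias);
        value.userMap.push_back(user);
    }
}

// Initialization: every piece of taint information is propagated.
void runAnalysis(Graph& graph, const std::vector<std::string>& taintInformation) {
    for (const std::string& name : taintInformation) propagateTaint(graph, name);
}
```

With this model, `runAnalysis(graph, {"safe"})` taints every function reached through Store, Load, Parameter or Call users of `safe`, and every alias reached through Load, Parameter or Global users.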


Compiling a project with Clang To run the analysis pass on a project, the project has to be compiled with Clang in order to generate the LLVM bitcode files.

The following command should be executed for each source file in order to generate the corresponding bitcode file.

Listing 3.9: Build

clang -g -emit-llvm -o file.bc -c file.c

The linking step should be done with the LLVM linker, presented in section 2.5, which combines several bitcode files into a single bitcode file.

Listing 3.10: Linking

llvm-link *.bc -o output.bc

3.3 Visualization

The third research question was “How to represent results in an understandable way so that engineers can improve the safety development process?”. The development of the visualization tool was done in two phases: data structure and serialization, and the development of the graph representation.

Data structure and serialization

The main information to be stored is the list of tainted variables (the instances of the SafeValue class), and the userMap of each safe value, containing the list of functions, safe instructions and aliases related to this tainted value.

Listing 3.11: Example of JSON representation

trees['safeValue'] = {
  "name": "safeValue",
  "userMap": [
    {
      "name": "function1",
      "safeInstructionList": [
        {
          "alias": "alias1",
          "propagationType": "store"
        },
        [...]
      ]
    },
    {
      "name": "function2",
      "safeInstructionList": [
        [...]
      ]
    }
  ]
}


It has been decided to use the JSON (JavaScript Object Notation) format to store the information. JSON is a human-readable file format used to represent objects as pairs of keys and values. It can be used to represent simple data as well as aggregate data such as arrays and objects.

Data contained in each object has to be serialized, which means that data has to be translated into a storable format, so that the results can be read later. The result of the tool is serialized into a JSON file using a C++ function inside the pass.

In the context of this thesis, each SafeValue instance corresponds to a JSON entry, as presented in [List. 3.11]. Each safe value has at least a name and a list of users. Each user is composed of a function, which is associated with a list of safe instructions. Each safe instruction has at least an alias and a propagation type.
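The serialization step can be sketched as a small C++ function that writes one SafeValue entry with the keys of [List. 3.11]. The `*Entry` structures and `toJson` are hypothetical, the output is compact JSON on a single line (rather than the `trees['...'] = ...` assignment of the listing), and only the escaping that JSON requires for quotes, backslashes and control characters is implemented.

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one JSON entry (see [List. 3.11]).
struct SafeInstructionEntry {
    std::string alias;
    std::string propagationType;  // e.g. "store"
};

struct UserEntry {
    std::string function;  // function in which the users are located
    std::vector<SafeInstructionEntry> safeInstructions;
};

struct SafeValueEntry {
    std::string name;
    std::vector<UserEntry> userMap;
};

// Quotes a string, escaping the characters JSON does not allow verbatim.
std::string jsonString(const std::string& s) {
    std::ostringstream out;
    out << '"';
    for (unsigned char c : s) {
        if (c == '"' || c == '\\') {
            out << '\\' << c;
        } else if (c < 0x20) {
            char buf[7];
            std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
            out << buf;
        } else {
            out << c;
        }
    }
    out << '"';
    return out.str();
}

// Serializes one safe value with its user map and safe instructions.
std::string toJson(const SafeValueEntry& value) {
    std::ostringstream out;
    out << "{\"name\":" << jsonString(value.name) << ",\"userMap\":[";
    for (std::size_t i = 0; i < value.userMap.size(); ++i) {
        const UserEntry& user = value.userMap[i];
        if (i > 0) out << ',';
        out << "{\"name\":" << jsonString(user.function)
            << ",\"safeInstructionList\":[";
        for (std::size_t j = 0; j < user.safeInstructions.size(); ++j) {
            const SafeInstructionEntry& inst = user.safeInstructions[j];
            if (j > 0) out << ',';
            out << "{\"alias\":" << jsonString(inst.alias)
                << ",\"propagationType\":" << jsonString(inst.propagationType)
                << '}';
        }
        out << "]}";
    }
    out << "]}";
    return out.str();
}
```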

Graph representation

JavaScript library D3 It has been decided to use JavaScript to render the output of the tool. JavaScript makes it easy to develop interactive graphs. Moreover, it is possible to construct a JavaScript graph from a JSON file.

The JavaScript library D3 (Data-Driven Documents) [50] has been used. This library is used to create documents that visualize data and to generate SVG (Scalable Vector Graphics). It aims at simplifying the data-driven manipulation of the Document Object Model (DOM).

Visualization properties The most useful visualization properties, according to Bassil and Keller’s study [43], presented in section 2.7, have been implemented in the visualization tool.

1. “Search tools for graphical and/or textual elements”: The possibility of searching for a specific value by entering its name has been added. The target node is highlighted and its ASIL rating is displayed.

2. “Hierarchical representation”: The tree representation has been chosen in order to present the results hierarchically.

3. “Use of colors”: Colors have been used to easily identify the different cases of the taint propagation policy [Tab. 3.1] and the ASIL (A, B, C, D) or QM ratings.

4. “Navigation across hierarchies”: The user can expand some function nodes to display the aliases of the tainted value, and minimize some branches of the tree.

Access to the source code has not been implemented, since this visualization tool is not part of an IDE. It has been replaced by the possibility of accessing debugging information such as the file location.

The visualization tool was developed iteratively, by regularly presenting the results to the engineers to get their feedback. The tool was gradually improved until it met their expectations. During these meetings, the engineers could assess the user experience by navigating the website, or ask for other functionalities. Thus, new functionalities have been added, such as the possibility to display the list of tainted functions and tainted global variables in each file.

Tree layout A tree is an undirected and acyclic graph characterized by the fact that each node can only have one parent [51].

The tree layout has been chosen because it provides a clear hierarchical organization. In fact, nodes of the same hierarchical level are drawn on the same line. Moreover, an equal spatial distribution of the nodes is easy to achieve: the D3 tree layout [50] is obtained using the Reingold-Tilford “tidy” algorithm [52]. Furthermore, the tree layout avoids crossing edges, which is an important aspect according to Herman et al. [44]. Finally, in the context of this thesis, each branch of the tree is meaningful, because it represents a safety-critical path.

However, the tree layout also has some drawbacks. Since each branch represents a path, some nodes can appear several times in different branches of the tree, for example if a function is tainted by several initiators.

In the case of recursive functions, the call relationships contain cycles, which would produce an endless tree. To avoid this, if a child node is equal to one of its ancestor nodes, its descendant nodes are not displayed.
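The rule used to cut cycles can be sketched as a depth-first expansion of the call graph that keeps track of the current path. In this hypothetical C++ model (`CallGraph`, `TreeNode`, `buildTree`), a node equal to one of its ancestors is still displayed, but as a leaf; nodes reached through different branches are duplicated, which reproduces the drawback of the tree layout discussed above.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical call graph: function name -> functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

struct TreeNode {
    std::string name;
    std::vector<TreeNode> children;
};

// Expands the call graph into a display tree rooted at `name`. A node
// equal to one of its ancestors is kept as a leaf, so that recursive
// calls do not generate an endless tree. `ancestors` holds the current path.
TreeNode buildTree(const CallGraph& graph, const std::string& name,
                   std::vector<std::string>& ancestors) {
    TreeNode node{name, {}};
    for (const std::string& ancestor : ancestors)
        if (ancestor == name) return node;  // cycle: descendants hidden
    auto it = graph.find(name);
    if (it == graph.end()) return node;
    ancestors.push_back(name);
    for (const std::string& child : it->second)
        node.children.push_back(buildTree(graph, child, ancestors));
    ancestors.pop_back();
    return node;
}
```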

An alternative to the tree layout would have been to construct a spanning tree. A spanning tree is a subgraph that contains every node of the graph exactly once, linked together with the minimum number of edges. As it is a tree, each pair of nodes is linked by exactly one path, according to Graham and Kennedy [51]. Nevertheless, in the context of this thesis, the issue with a spanning tree is that some dependencies would not be displayed in the graph.

These dependencies could also have been represented with a multiple tree structure [51]. A multiple tree is a combination of several single trees. The advantage of this structure is that a node can have several parents while the hierarchical organization is kept. However, the issue of crossing edges remains.

Traditional graph structures can be complex and overwhelming, according to Mukherjea et al. [53]. Thus, dividing the graph into several simpler trees seems to be a worthwhile solution.

3.4 Evaluation

The evaluation was conducted to answer the research questions “How does taint analysis visualization affect the usefulness of the output?” and “Is the taint analysis accuracy sufficient for the application?”.

Usefulness

According to Seaman [45], qualitative research methods are suitable to evaluate human interactions with software. Thus, it has been decided to conduct a survey to assess the usefulness of the visualization tool. This survey was made on Google Forms and sent by email to the participants, who were all employees of the company.

In the context of this study, a “useful” functional aspect can be defined as a functionality which provides new and relevant information to the developer, and that the developer will be able to use to improve their work performance.

The first set of questions [Tab. 3.2] aimed at assessing the usefulness of the visualization tool. The evaluation method was based on the survey conducted by Bassil and Keller [43]. Thus, each question was written to evaluate one of the functional aspects of the visualization tool. The questions were formulated as assertions. Each question could be answered on a linear scale from one to five, one meaning that the interviewee does not agree at all with the assertion, and five meaning that the interviewee fully agrees with it. Finally, the usefulness of the tool could be assessed by computing the average grade.


Table 3.2: Linear scale questions

Q1 The results provided by the visualization tool seem to be useful
Q2 Colors make the result easier to be read
Q3 The hierarchical representation is relevant
Q4 The information displayed in each node is useful
Q5 The alias overview provides new information
Q6 The detailed overview provides new information
Q7 The graph representation is well suited to visualize the relationships between tainted variables
Q8 The tree representation (avoiding crossing edges) improves the visualization, despite the fact that some variables appear several times
Q9 It is useful to minimize some branches of the tree
Q10 The file location helps to understand the context
Q11 The search tool is useful
Q12 The files view is useful

The second set of questions [Tab. 3.3] was a list of tasks that the interviewees had to complete, based on the method used by LaToza and Myers [42]. These questions aimed at testing whether the developers could understand the results and the notation used in the visualization tool, and thus at assessing whether the user interface of the visualization tool was clear and understandable enough. All the tasks were performed on the test project created for the unit tests, which covers all cases of the taint propagation policy.

Table 3.3: Tasks

Q13 What is the ASIL rating of the following functions?
Q14 Please list one function which has been tainted by the safe variable “safe_ptr”.
Q15 Which alias is assigned to the safe variable “safe” in the function “safeInteger”?
Q16 Into which register(s) is the hard-coded address “0x4465” loaded?
Q17 Which functions and variables are tainted in the file called “file.c”?

Accuracy

Two studies were conducted to evaluate the accuracy of the tool. First, a test suite was developed to check the reliability of the functionalities required by the taint propagation policy. Then, the pass was tested on an existing project of the company to assess the scalability of the tool and to compare the results of the analysis to the manual ASIL decomposition already done on this project.

Unit tests Unit tests aim at testing the tool’s functional requirements, and their results can be used to assess the tool confidence level (TCL) [46]. Unit tests have been written using the Google Test Framework [54], which facilitates the implementation of unit tests in C++. At least one test case has been written for each case of the taint propagation policy described in [Tab. 3.1]. The unit tests are described below [Tab. 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10]. In each test case, the new objects added to the list of safe values are tested, as well as the ASIL propagation and the case of the taint propagation policy.


Table 3.4: Store test cases

Test 1 A negative constant is assigned to a safe integer variable
Test 2 A constant float is assigned to a safe float variable
Test 3 An integer variable is assigned to a safe integer variable
Test 4 The return value of a function is assigned to a safe integer variable
Test 5 An integer variable is assigned to a safe dereferenced pointer
Test 6 A pointer is assigned to a safe pointer
Test 7 A hard-coded address is assigned to a safe pointer
Test 8 An integer variable is inserted into a safe array of integers
Test 9 Two different integer variables are assigned to a safe variable in an if-statement
Test 10 An integer variable is assigned to a safe address

Table 3.5: Load address test case

Test 11 A safe hard-coded address is assigned to a pointer

Table 3.6: Pointer parameter test cases

Test 12 A safe pointer is passed as a parameter to a function and is modifiedinside the function

Test 13 A safe pointer is passed as a parameter to a function and is not modifiedinside the function

Table 3.7: Global initialization test case

Test 14 A global variable is initialized with a safe variable

Table 3.8: File test case

Test 15 A file is tainted

Table 3.9: Call test case

Test 16 A safe function is called by another function

Table 3.10: Violation test case

Test 17 The address of a safe integer is assigned to an unsafe pointer
Test 18 A safe pointer is assigned to an unsafe pointer

A case study: A real-world project The tool was tested on a real project of the company. This project, which was requested by a supplier in order to configure an ECU, was selected because it was developed according to safety rules and was qualified ASIL.

The accuracy of the taint analysis was assessed according to two criteria, “the number of false positives” and “scalability”, in the same way as Arroyo et al. [11].

The study was conducted as follows. The initial taint information was identified together with the safety engineers in order to configure and run the pass on the project.

In order to compute the false positive rate of the tool, the results obtained with the taint analysis tool were compared to the existing ASIL decomposition of this project. In fact, this project had already been decomposed with respect to the ASIL, which means that safe modules were already separated from QM modules. This decomposition had been done manually.

Thus, the ASIL functional blocks (composed of one or more modules), listed in the memory map of the project, were compared to the ASIL functional blocks detected by the taint analysis pass. The results of the taint analysis pass were taken from the file view, which summarizes the results of the analysis.


In the context of this study, a functional block was considered ASIL as long as one of its modules (a source-code file and a header file) was ASIL. A module was considered ASIL if one of its global objects was tainted by the taint analysis pass.
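The two classification rules above, and the comparison with the manual ASIL decomposition, can be summarized in the following C++ sketch. The project layout (`Blocks`, `Modules`) is a hypothetical model of the memory map, and a false positive is taken here to be a functional block reported as ASIL by the pass although the manual decomposition classifies it as QM; the exact counting used in the study may differ.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical project layout: functional block -> modules,
// module -> global objects defined in it.
using Blocks = std::map<std::string, std::vector<std::string>>;
using Modules = std::map<std::string, std::vector<std::string>>;

// A module is ASIL if one of its global objects was tainted;
// a functional block is ASIL as soon as one of its modules is ASIL.
std::set<std::string> asilBlocks(const Blocks& blocks, const Modules& modules,
                                 const std::set<std::string>& taintedObjects) {
    std::set<std::string> result;
    for (const auto& [block, moduleList] : blocks) {
        for (const std::string& moduleName : moduleList) {
            auto it = modules.find(moduleName);
            if (it == modules.end()) continue;
            bool moduleIsAsil = false;
            for (const std::string& object : it->second)
                if (taintedObjects.count(object) != 0) moduleIsAsil = true;
            if (moduleIsAsil) {
                result.insert(block);
                break;  // one ASIL module is enough
            }
        }
    }
    return result;
}

// False positives: blocks reported as ASIL by the pass although the
// manual decomposition classifies them as QM.
std::set<std::string> falsePositives(const std::set<std::string>& detected,
                                     const std::set<std::string>& manualAsil) {
    std::set<std::string> result;
    for (const std::string& block : detected)
        if (manualAsil.count(block) == 0) result.insert(block);
    return result;
}
```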

Finally, this project is also a large project composed of more than 1 000 files. Thus, it could also be used to assess the scalability of the taint analyzer and the visualization tool. The execution time of the taint analysis pass was measured using the time command. This command displays the real time, the user CPU time, and the system CPU time spent executing the command. Only the real time was considered in this study.

Listing 3.12: Program execution time

time opt -load ./ModulePass.so -TaintAnalysisPass -config-file-name "config/config.xml" project.bc 2> log.txt


4 Results

This chapter describes the results obtained at the end of this thesis. The first section, 4.1, is related to the use of LLVM to perform static analysis on automotive software. The second section, 4.2, presents the results of the implementation of the taint analysis pass. The third section, 4.3, presents the visualization tool. The last section, 4.4, presents the results of the evaluation of the usefulness and the accuracy of the taint analysis tool.

4.1 LLVM

The taint analysis pass was first tested on two embedded AUTOSAR example projects of the company: HelloWorld and SafeInteriorLight. The HelloWorld project is not safety-critical, but it contains a complete ECU configuration, used to simulate a real-world ECU. The SafeInteriorLight project is composed of safety-critical parts and aims at controlling the interior lights of a car. These projects were used to check whether it was possible to compile embedded projects with Clang to generate LLVM IR.

These projects were successfully compiled with Clang, on the condition that inline assembly parts were removed. All the bitcode files were linked together into a single LLVM IR file using the LLVM linker, as explained in section 3.2.

Then, the pass was run successfully on these two LLVM IR files. The configuration files were composed of a variable, a file, and a memory range. Minor changes were made to the taint analyzer as a result of these tests, such as increasing the size of the title frame in the visualization to fit longer names, and allowing the taint propagation when a global variable is initialized with safe aggregate data, such as a structure.

Thus, these projects were used as a proof of concept that a taint analysis pass based on LLVM can be developed to analyze automotive software.

4.2 Taint analysis

A taint analysis LLVM pass has been developed, which is an object-oriented C++ application composed of eleven classes. The taint analysis pass takes the configuration file of the user as input and provides the results of the analysis in the form of a JSON file. The pass can be run on a project using the following command line.


Listing 4.1: Running the pass

opt -load ./ModulePass.so -TaintAnalysisPass project.bc 2> log.txt

The configuration file path can also be passed as an option through the command line, by specifying -config-file-name.

4.3 Visualization

An interactive visualization tool written in JavaScript has been developed, based on the D3 tree layout [55], and is presented in [Fig. 4.4]. The tool takes as input the JSON file supplied by the pass, and generates a visualization composed of four views.

First, the main view [Fig. 4.2] is used to show the dependency tree. This view shows the functions which are tainted by the initiators, and the call graph of these functions. The aliases are not displayed in the main view, because the ASIL decomposition is usually done at the function level. So, the most important information to show is the list of tainted functions.

The alias view is therefore used to show which aliases are related to a safe variable inside a function [Fig. 4.3]. This view can be accessed by clicking on the functions tainted by the initiator. If the alias is also ASIL, then the functions tainted by the alias and their call graph are displayed in the alias view.

The detailed view, which can be accessed from a node, provides more details about the selected tainted variable, such as its name, its ASIL, its type and its file location.

Finally, the files view displays a summary of all the functions and global variables tainted in each file [Fig. 4.1].

Figure 4.1: The list of tainted functions and global variables in each file


Figure 4.2: An example of the tree view, whose initiator is the variable safe.

Figure 4.3: The alias view of the variable safe in the function testInterProcedural


Figure 4.4: Visualization tool overview


4.4 Evaluation

This section first presents the results of the survey to evaluate the usefulness of the visualization tool. Then, the results of the evaluation of the accuracy of the tool are described.

Usefulness

The survey was conducted as described in section 3.4. Ten participants took part in the evaluation. Among them, 30% had already worked with automotive safety.

Participants were asked to evaluate the usefulness of several functional aspects of the visualization tool. The results are presented in [Tab. 4.1].

The participants rated the usefulness of the visualization tool (Q1) 4.5 out of 5.

The functional aspects which received the best grade (4.8) are the search tool (Q11) and the alias overview (Q5). 80% of the participants rated the search tool’s usefulness 5 out of 5, and 90% of the participants rated the alias view’s usefulness 5 out of 5.

The usefulness of the graph representation to visualize relationships between tainted variables (Q7) was rated 4.7, whereas the choice of the tree layout to avoid crossing edges (Q8) was rated 4.4. The hierarchical representation (Q3) was rated 4.6. The usefulness of the information displayed in each node (Q4) was rated 4.4. Finally, the possibility to minimize the branches of the tree (Q9) was rated 4.3.

The file view (Q12) was rated 4.6 and the detailed overview (Q6) was rated 4.4.

The functional aspects which received the lowest grades are the use of colors (Q2, 4.2) and the file location (Q10, 4.1).

The final average grade of the visualization tool, combining all the grades of the previous questions, is 4.48 out of 5.

Table 4.1: Linear scale questions

Question Average grade
Q1 4.5
Q2 4.2
Q3 4.6
Q4 4.4
Q5 4.8
Q6 4.4
Q7 4.7
Q8 4.4
Q9 4.3
Q10 4.1
Q11 4.8
Q12 4.6
Total 4.48
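The Total row is consistent with an unweighted mean of the twelve per-question averages: (4.5 + 4.2 + 4.6 + 4.4 + 4.8 + 4.4 + 4.7 + 4.4 + 4.3 + 4.1 + 4.8 + 4.6) / 12 = 53.8 / 12 ≈ 4.48. The short C++ sketch below reproduces this computation from the values of Table 4.1; that the total was computed exactly this way is an assumption, although the mean of the per-question averages equals the mean over all individual answers whenever every participant answered every question.

```cpp
#include <array>
#include <numeric>

// Average grades of Q1 to Q12, copied from Table 4.1.
constexpr std::array<double, 12> kAverageGrades = {
    4.5, 4.2, 4.6, 4.4, 4.8, 4.4, 4.7, 4.4, 4.3, 4.1, 4.8, 4.6};

// Unweighted mean of the per-question averages (the "Total" row).
double totalGrade() {
    double sum = std::accumulate(kAverageGrades.begin(), kAverageGrades.end(), 0.0);
    return sum / kAverageGrades.size();
}
```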

Then, participants were asked to complete some tasks, described in section 3.4. The first task (Q13) was to find the ASIL rating of a list of functions. Four of the tainted functions were displayed in the main tree view, whereas one of them was displayed in the alias view. The answers given for the first four functions were 100% correct. However, the question related to the function hidden in the alias view received 90% of correct answers: one participant could not find the node in the tree.

Following this question, participants were asked which aspects they used to answer the previous question: the search tool, the tree, or both. The results are presented in [Fig. 4.5]. All the participants used the search tool to answer these questions: 30% used only the search tool, and 70% used both the search tool and the tree.


Figure 4.5: Which aspect has been used to find the ASIL rating of an object? (Search tool: 30%; Tree and Search tool: 70%)

Question 14, which was related to the functions tainted by an initiator, received 80% of correct answers. Question 15, which was related to the alias of a variable, received 100% of correct answers. However, question 16, which was related to the aliases of an address, only received 80% of correct answers. Question 17, which was related to the file view, received 90% of correct answers.

Some of the participants made comments on the visualization tool in addition to answering the questions of the survey. They suggested adding the possibility to minimize all nodes in the tree, and to display them as minimal rectangles containing only the variables, in order to make the tree shorter. They also suggested adding an explanation of the colors. Regarding the search tool, which was mainly used to find the ASIL of the objects, they suggested adding an automatic scroll to the node in the tree. They also asked for the initiators of the trees to be clearly marked as such.

Accuracy

Unit tests A set of 18 unit tests has been implemented, covering each case of the taint propagation policy described in [Tab. 3.1]. The output of the unit tests is presented below [List. 4.2].

Listing 4.2: Google Test Output

[==========] 18 tests from 1 test suite ran. (150 ms total)
[  PASSED  ] 18 tests.

A case study: A real-world project The study of the real-world project was conducted as explained in section 3.4. The project was compiled with Clang to generate the corresponding LLVM IR. The results of the conversion of the project to LLVM IR are presented in [Tab. 4.2].

Table 4.2: LLVM IR metrics

Number of files: 318
Number of global objects: 6569
Number of lines (without debugging information): 168 830
Number of lines (including debugging information): 577 668

The next step was to identify the taint information. The safety-critical functional blocks of this project were the serial peripheral interface (SPI) driver, the MPU, the mechanical sensors, and the watchdog manager (WdgM). The SPI driver is used to communicate with the system basis chip (SBC) [56] inside the ECU. The watchdog manager aims at detecting program flow errors at runtime [57].

The safety-critical source-code objects identified in those blocks were the SPI registers, the MPU registers, the mechanical sensors’ input registers and variables, and the WdgM variables. Registers here refer to hard-coded addresses.

The number of objects classified as ASIL in the source code is presented in [Tab. 4.3].

Table 4.3: Taint information

Module: Number of objects
Mechanical sensors input variables: 5
WdgM variables: 5
Mechanical sensors input registers: 1
SPI registers: 256
MPU registers: 8237

These variables and registers have been copied to a configuration file like the one presented below [List. 4.3]. The global variables are identified by their name, and the memory regions are identified by a starting and an ending address.

Listing 4.3: An example of configuration file

<config>
  <variable>
    <name>wdgM_variable</name>
    <asil>A</asil>
  </variable>
  <variable>
    <name>sensors_input</name>
    <asil>A</asil>
  </variable>
  <address>
    <start>0xffffC000</start>
    <end>0xffffE02C</end>
    <asil>C</asil>
  </address>
  <address>
    <start>0xf000B124</start>
    <end>0xf000B124</end>
    <asil>D</asil>
  </address>
</config>
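Matching a hard-coded address against the memory ranges of such a configuration file can be sketched as follows. The `AddressRange` structure and `asilOfAddress` are hypothetical, the bounds are assumed to be inclusive (as the single-address range from 0xf000B124 to 0xf000B124 suggests), and keeping the highest rating when ranges overlap is an assumption in line with the “highest ASIL applicable” rule quoted in the description of the pointer analysis.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// ASIL ratings, ordered from least to most critical.
enum class Asil { QM = 0, A, B, C, D };

// A safety-critical memory region from the configuration file
// (bounds assumed inclusive).
struct AddressRange {
    std::uint64_t start;
    std::uint64_t end;
    Asil asil;
};

// Returns the ASIL of a hard-coded address (e.g. the integer operand of an
// inttoptr constant expression), or nothing if no configured range covers
// it. If several ranges overlap, the highest rating is kept (assumption).
std::optional<Asil> asilOfAddress(const std::vector<AddressRange>& ranges,
                                  std::uint64_t address) {
    std::optional<Asil> result;
    for (const AddressRange& range : ranges) {
        if (address >= range.start && address <= range.end &&
            (!result || range.asil > *result)) {
            result = range.asil;
        }
    }
    return result;
}
```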

The visualization of the results of the taint analysis pass on this project is presented in [Fig. 4.6], and the number of objects tainted by the pass is listed in [Tab. 4.4].

Table 4.4: Taint analysis results

Number of functional blocks: 16
Number of modules: 27
Number of functions: 75
Number of global variables: 12

36

4.4. Evaluation

[Figure: screenshot of the visualization tool's tree view, in which Function, Global and Address nodes are labelled with their ASIL rating; the header shows the About, Files view and Minimize All controls, a search box, legend entries (store, load, parameter, global, violation), a colour key (ASIL A, ASIL B, ASIL C, ASIL D, QM), the node options Minimize node, Show aliases and Show detailed view, and the Initiator marker.]

Figure 4.6: An overview of the result of the taint analysis pass on the project (real names have been modified)


The taint analysis pass identified 27 modules as ASIL, corresponding to 16 functional blocks. A functional block is considered ASIL if at least one of its modules is tainted, and a module is considered ASIL if at least one of its global objects is tainted. In total, 89 global objects were classified as ASIL.
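The roll-up rule above (global objects to modules, modules to functional blocks) can be illustrated with a minimal Python sketch. It assumes hypothetical mappings from each module to the global objects it defines and from each functional block to its modules; apart from `wdgM_variable`, which appears in the configuration file, all names are invented for illustration, and this is not the pass's actual implementation.

```python
def asil_modules(module_objects, tainted_objects):
    """Return the modules containing at least one tainted global object.

    module_objects: dict mapping a module name to the global objects
    (functions, global variables, addresses) it defines.
    tainted_objects: set of global objects tainted by the analysis.
    """
    return {
        module
        for module, objects in module_objects.items()
        if any(obj in tainted_objects for obj in objects)
    }


def asil_blocks(block_modules, tainted_modules):
    """Return the functional blocks containing at least one ASIL module."""
    return {
        block
        for block, modules in block_modules.items()
        if any(m in tainted_modules for m in modules)
    }


# Hypothetical example data (not taken from the case study).
module_objects = {
    "wdg.c": ["wdgM_variable", "wdg_kick"],
    "adc.c": ["adc_read", "adc_buffer"],
    "log.c": ["log_write"],
}
block_modules = {
    "Watchdog": ["wdg.c"],
    "Sensors": ["adc.c", "log.c"],
    "Diagnostics": ["log.c"],
}
tainted = {"wdgM_variable", "adc_buffer"}

modules = asil_modules(module_objects, tainted)
blocks = asil_blocks(block_modules, modules)
print(sorted(modules))  # ['adc.c', 'wdg.c']
print(sorted(blocks))   # ['Sensors', 'Watchdog']
```

Because a single tainted object is enough to mark its module, and a single ASIL module is enough to mark its block, this rule is deliberately conservative at the coarser levels.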

Of these 27 modules, 21 were tainted without being directly influenced by the initiators, which means that none of their global objects or hard-coded addresses were tainted at initialization. Although the MPU registers were listed in the configuration file, no objects were tainted because of them. All the other variables and registers from the configuration file propagated their taint to some objects.

As described in section 3.4, the list of functional blocks that were tainted according to the manual decomposition was compared to the results of the taint analysis pass. The results of this comparison are presented in [Tab. 4.5].

Table 4.5: Results

| Metric | Number of functional blocks |
|---|---|
| True positives | 11 |
| False positives | 5 |
| False negatives | 4 |

From those results, it was possible to compute the false positives rate, the precision and the recall, as explained in section 3.4.

$$\text{false positives rate} = \frac{5}{16} = 31.25\,\% \tag{4.1}$$

$$\text{precision} = \frac{11}{16} = 68.75\,\% \tag{4.2}$$

$$\text{recall} = \frac{11}{15} \approx 73.33\,\% \tag{4.3}$$
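The three values follow directly from the counts in [Tab. 4.5] (11 true positives, 5 false positives, 4 false negatives), and the short Python check below reproduces them. Note that the "false positives rate" as computed here divides by the number of blocks tainted by the pass (11 + 5 = 16), so it equals 1 − precision (a quantity often called the false discovery rate); the conventional false positive rate FP / (FP + TN) would require the number of true negatives, which is not reported.

```python
def taint_metrics(tp, fp, fn):
    """Compute the metrics of Tab. 4.5 from confusion counts.

    'fp_rate' follows the computation used in the text, FP / (TP + FP),
    i.e. 1 - precision, not the conventional FP / (FP + TN).
    """
    flagged = tp + fp   # blocks tainted by the pass
    actual = tp + fn    # blocks tainted in the manual decomposition
    return {
        "fp_rate": fp / flagged,
        "precision": tp / flagged,
        "recall": tp / actual,
    }


m = taint_metrics(tp=11, fp=5, fn=4)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
# fp_rate: 31.25%
# precision: 68.75%
# recall: 73.33%
```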

In order to assess the scalability of the taint analysis pass, its execution time was measured as explained in section 3.4. This evaluation was conducted on a Dell M2800 computer with the following characteristics:

• Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz

• 16.0 GB RAM

• Windows 7 Professional

The results are presented in [Tab. 4.6].

Table 4.6: Program execution time results

| Average | Median | Min | Max |
|---|---|---|---|
| 4 m 45.0378 s | 4 m 45.668 s | 4 m 38.953 s | 4 m 49.912 s |
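Statistics such as those in [Tab. 4.6] can be collected with a small harness that runs the analysis several times and summarises the wall-clock durations. The sketch below is an assumption about how such a measurement could be scripted, not a reproduction of the procedure in section 3.4: the analysis is passed in as a callable, and the number of repetitions is a free parameter.

```python
import statistics
import time


def time_runs(run, repetitions=10):
    """Call `run()` `repetitions` times and summarise durations in seconds."""
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run()
        durations.append(time.perf_counter() - start)
    return {
        "average": statistics.mean(durations),
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


def fmt(seconds):
    """Format a duration in the style of Tab. 4.6 (minutes and seconds)."""
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)} m {rest:.4f} s"


print(fmt(285.0378))  # 4 m 45.0378 s
```

In practice, `run` could wrap a `subprocess.run(..., check=True)` call that launches the analysis; the exact command line depends on how the pass is invoked and is left open here.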

Regarding the scalability of the visualization tool, [Fig. 4.6] shows an overview of the tree graph. The visualization view expands with the number of initiators, so that the nodes remain sufficiently far apart and do not overlap. This shows that the tool can cope with a large number of objects.
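Because the taint graph may contain cycles (for example between mutually dependent functions), displaying it as a tree rooted at each initiator requires cutting back-edges. The sketch below assumes a hypothetical adjacency mapping from each object to the objects it taints, and builds a nested structure with `name` and `children` keys. This is the shape that JavaScript tree layouts such as `d3.hierarchy` read by default. It is an illustration, not the data format of the actual tool.

```python
import json


def build_tree(node, edges, path=()):
    """Return a nested dict rooted at `node`.

    edges: dict mapping an object to the objects it taints (hypothetical).
    A child already present on the current root-to-node path is emitted
    as a leaf marked 'cycle' instead of being expanded again.
    """
    path = path + (node,)
    children = []
    for child in edges.get(node, []):
        if child in path:
            children.append({"name": child, "cycle": True, "children": []})
        else:
            children.append(build_tree(child, edges, path))
    return {"name": node, "children": children}


# Hypothetical taint graph with a cycle between read_sensors and filter.
edges = {
    "sensors_input": ["read_sensors"],
    "read_sensors": ["filter"],
    "filter": ["read_sensors", "publish"],
}
tree = build_tree("sensors_input", edges)
print(json.dumps(tree, indent=2))
```

An object reachable along several paths appears once per path. This matches the behavior of a tree view, but the tree can grow quickly on densely connected graphs, which is one reason a layout that expands with the number of initiators matters.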


5 Discussion

This chapter first discusses possible improvements to the taint analysis pass in section 5.1. Then, the results of the evaluations, the method and the sources are discussed in sections 5.2, 5.3 and 5.4, respectively. Finally, the results of the thesis are analyzed in a wider context in section 5.5.

5.1 Taint analysis

Improvements

Several improvements could be made to the taint analysis pass.

The pass could be integrated into the company’s IDE, since using a standalone tool takes more time than using a tool already integrated in an IDE. As presented in section 2.7, developing a plugin to display the results of the pass on the source code, in addition to the graph view, which provides a clear overview of the results, would have been an interesting solution.

For the tool to be usable on a larger scale, the compilation of the projects with Clang to generate LLVM IR would need to be automated. Currently, most of the company’s projects are set up to be compiled with the GCC compiler. Although Clang is often compatible with GCC, generating LLVM IR requires additional handling. An efficient improvement could be to include the pass in the company’s continuous integration process.
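One way to automate the IR generation step is to compile each C file to LLVM bitcode with Clang and then link the results into a single module, so that an inter-procedural pass sees every function. The sketch below only builds the command lines. `clang -c -emit-llvm` and `llvm-link ... -o` are standard Clang/LLVM usage, while the file names and the `extra_flags` hook are hypothetical; the hook stands in for whatever GCC-specific options need translating or dropping.

```python
import subprocess
from pathlib import Path


def ir_build_commands(sources, output="project.bc", extra_flags=()):
    """Return the commands that compile C sources to bitcode and link them.

    Each source is compiled with `clang -c -emit-llvm`, which writes an LLVM
    bitcode (.bc) file; `llvm-link` then merges all bitcode files into one
    module. `extra_flags` is a hypothetical hook for project-specific flags.
    """
    commands = []
    bitcode_files = []
    for src in sources:
        bc = str(Path(src).with_suffix(".bc"))
        commands.append(["clang", "-c", "-emit-llvm", *extra_flags, src, "-o", bc])
        bitcode_files.append(bc)
    commands.append(["llvm-link", *bitcode_files, "-o", output])
    return commands


def run_all(commands):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to log or review the generated commands in a continuous integration job before running them.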

Currently, the pass does not analyze inline assembly or architecture-specific machine code. Inline assembly parts have to be excluded during compilation so that the pass can be run on the project, since the pass has been developed to analyze C code only. Future versions of this taint analyzer could support other languages, which would require handling some new cases. However, the algorithm used to propagate the taint should be reusable, because it does not depend on a specific language.
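Since inline assembly is not analyzed, a modest mitigation is to list the functions that contain it, so that they can be reviewed manually. This applies, for example, when the inline assembly is kept during compilation with `clang -S -emit-llvm`, which produces textual `.ll` files. In textual LLVM IR, inline assembly appears as a call whose callee is an `asm` expression, such as `call void asm sideeffect "...", ""()`. The scanner below is a heuristic sketch under these assumptions. It does not cover every corner of the IR syntax; for instance, functions with quoted names are skipped.

```python
import re

# Start of a function definition, capturing an unquoted function name.
DEFINE_RE = re.compile(r'^define\b[^@]*@([-\w.$]+)\s*\(')
# A call/invoke/callbr whose callee is an inline asm expression: the keyword
# `asm`, optional asm modifiers, then the quoted assembly string. Excluding
# '@' before `asm` avoids matching ordinary calls such as @asm_helper().
INLINE_ASM_RE = re.compile(
    r'\b(?:call|invoke|callbr)\b[^"@]*\basm\s+'
    r'(?:(?:sideeffect|alignstack|inteldialect|unwind)\s+)*"'
)


def functions_with_inline_asm(ir_text):
    """Return the names of functions whose bodies call inline assembly."""
    found = set()
    current = None
    for line in ir_text.splitlines():
        stripped = line.strip()
        match = DEFINE_RE.match(stripped)
        if match:
            current = match.group(1)
        elif stripped == "}":
            current = None
        elif current is not None and INLINE_ASM_RE.search(stripped):
            found.add(current)
    return found
```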

One limitation of the visualization tool is that the functions tainted by an alias of an initiator are not displayed in the main tree view. It can therefore be difficult to find the context of functions tainted through aliases. Moreover, the current visualization tool cannot display the root tree that a given alias belongs to. It would have been possible to add the aliases directly to the main tree view, but the advantage of the separate alias view is that it simplifies the main visualization and prevents graph explosion, which keeps the main graph easily readable.


5.2 Results

Usefulness

The results of the survey show that the participants generally considered the tool useful. Judging from the answers to the survey, the visualization tool also seems easy to understand: each task received at least 80% correct answers.

A closer look at the results for the different aspects of the tool shows that the search tool was evaluated as very useful, as predicted by the study conducted by Bassil and Keller [43]. Participants mainly used the search tool, rather than the tree view, to find the ASIL rating of a given object. The tree view is mainly useful for exploring a safety-critical path from an initiator. As suggested by the participants, the possibility to scroll to the first occurrence of the search input in the tree was added, to make the tree graph easier to read.

The alias overview was also evaluated as useful. This overview can be used to gain information about the tainting context of a function and to show the functions tainted by the aliases of the initiators. It was evaluated as more useful than the file view, which may be explained by the fact that the alias overview is accessible directly from the tree, whereas the file view is located on a different page. Despite its usefulness, some participants reported that the alias view was at first a bit hard to understand. The task about the function hidden in the alias view received only 90% correct answers, because some participants could not find it in the tree, while the tasks about functions displayed in the tree view received 100% correct answers. The reason is that the search tool only highlighted the objects displayed in the main tree view; the possibility to display the alias tree with the search tool has since been added. Moreover, the task involving the address loaded into a register, whose answer was also in the alias view, received 80% correct answers. This suggests that the propagation policy case shown in the alias view was somewhat unclear. Therefore, explanations have been added to the header.

Contrary to the survey conducted by Bassil and Keller [43], the use of colors received the lowest grade. This may be because the colors were poorly chosen, or because their meaning was not explained. To solve this problem, a caption was added to the header of the visualization tool to explain the meaning of each color.

The file view was rated 4.6. This view is useful to summarize the results of the pass and provides a global overview of the project.

The detailed overview was rated 4.4. This overview is meant to show the source-code context, that is to say the file location, which also received the lowest grade. This view was added to compensate for the lack of source-code browsing. It is possible that this information was not enough to understand the context. Some debugging information could have been added, such as the line of the definition, or the C instruction that led to the tainting of an object.

The information displayed in each node was rated 4.4. This information was redundant with the information displayed in the detailed view. Participants reported that it was not clear which objects were the initiators; their names are now written in italics to overcome this issue.

Unit tests

The unit test suite was very helpful during development. Developing unit tests is time-consuming, but worthwhile. The unit tests can be used to check that the functional requirements are fulfilled, at least on specific cases. They also guard against regressions: if additions to the code break functionality that was already developed, the failing tests reveal it immediately. Thus, the test suite increases the tool confidence level (TCL), presented in section 2.8, and also facilitates maintenance.


A case study: a real-world project

Testing the pass on this project made it possible to observe the real conditions of an embedded project. The results of the taint analysis pass were compared to the initial decomposition of this project.

Recall Four false negatives were reported: these functional blocks were tainted in the original decomposition but not detected by the taint analysis pass, which leads to a recall of 73.33%. These functional blocks were related to the functions used to access the MPU. However, the MPU addresses were not analyzed by the pass, so the taint did not propagate. The MPU registers are not accessed with C instructions, but with instructions that are specific to the architecture of the embedded system. These instructions are translated into calls in LLVM IR and are not recognized by the tool as store or load instructions.

This is an issue, as it means that some safety-critical instructions may be missed, and therefore safety-critical files may not be detected. The result of the analysis also depends on the objects initially tainted: any omission in the taint information affects the results.

False positives rate and precision Five functional blocks were identified as false positives, which leads to a false positives rate of 31.25% and a precision of 68.75%.

Some over-approximations can increase the number of functions tainted by the pass. For example, if a global pointer is passed as a parameter to a function but is not modified in that function, the calling function remains tainted, due to the over-approximation of the inter-procedural analysis.
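The effect of this over-approximation can be illustrated on a toy model. In the hypothetical sketch below, each call site records which argument position receives a pointer to a tainted global. The conservative rule taints the caller as soon as such a pointer is passed. The refined rule assumes that per-callee summaries of the parameters written through are available, and taints the caller only when the callee actually writes through that parameter. Neither rule is claimed to be the pass's actual implementation, and all names are invented.

```python
from typing import NamedTuple


class CallSite(NamedTuple):
    caller: str
    callee: str
    arg_index: int  # position of the argument in the call
    arg: str        # value passed (here, a pointer to a global)


def tainted_callers(call_sites, tainted_globals, writes_through=None):
    """Return callers tainted because they pass a pointer to a tainted global.

    If `writes_through` is None, every such call taints the caller
    (over-approximation). Otherwise `writes_through[callee]` is the set of
    parameter indices the callee writes through, and only those calls count.
    """
    result = set()
    for site in call_sites:
        if site.arg not in tainted_globals:
            continue
        if writes_through is None or site.arg_index in writes_through.get(site.callee, set()):
            result.add(site.caller)
    return result


# Hypothetical call sites: only write_u16 modifies its pointer argument.
calls = [
    CallSite("update_speed", "write_u16", 0, "&speed_limit"),
    CallSite("log_status", "print_u16", 0, "&speed_limit"),
]
tainted = {"&speed_limit"}
summaries = {"write_u16": {0}, "print_u16": set()}

print(sorted(tainted_callers(calls, tainted)))             # ['log_status', 'update_speed']
print(sorted(tainted_callers(calls, tainted, summaries)))  # ['update_speed']
```

The refinement trades extra analysis work (computing write summaries for every callee) for fewer false positives such as `log_status` in this example.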

Analyzing these five cases in detail together with the safety engineers showed that two of them should have been tainted in the initial decomposition. One tainted function was a generated function that was supposed to read a value, but actually wrote into a pointer. The other was a hand-coded function used to validate a checksum.

If these two cases are counted as true positives, the updated results are the following:

$$\text{false positives rate} = \frac{3}{16} = 18.75\,\% \tag{5.1}$$

$$\text{precision} = \frac{13}{16} = 81.25\,\% \tag{5.2}$$

Thus, the precision achieved is 81.25%, which indicates that the tool does not perform many over-approximations. The tool is therefore unlikely to increase the work of engineers, since it does not taint many more objects than necessary.
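The same reclassification would also change the recall, which is not reported above. Assuming the four false negatives are unaffected, moving the two blocks from the false positives to the true positives gives:

$$TP' = 11 + 2 = 13, \qquad FP' = 5 - 2 = 3, \qquad FN' = 4$$

$$\text{recall}' = \frac{TP'}{TP' + FN'} = \frac{13}{17} \approx 76.47\,\%$$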

ASIL decomposition Furthermore, the initial manual decomposition had been made only at the functional block level, whereas the taint analysis pass can mark modules, functions and global variables as ASIL. Thus, the taint analysis pass allows for a more detailed decomposition.

The results of the taint analysis pass show that one of the system’s main runnables (AUTOSAR terminology for a periodically scheduled C function), which was qualified ASIL, could be split into ASIL and QM parts. The tool indicates that this function was qualified ASIL because it modifies a safe global variable and calls a safe function, and the corresponding logical blocks represent around 24 of its 76 lines of code. The split would thus be around 70% QM and 30% ASIL code. The ASIL code in question is also functionally simpler, and according to the company’s safety engineers, such a split would have significantly decreased the development effort for this runnable.
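The approximate 30/70 split follows from the line counts given above (24 ASIL-relevant lines out of 76):

$$\frac{24}{76} \approx 31.6\,\% \ (\text{ASIL}), \qquad \frac{76 - 24}{76} = \frac{52}{76} \approx 68.4\,\% \ (\text{QM})$$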

On a module level, one of the system’s main modules that is qualified as ASIL could be reduced in size by moving the parts of code that do not interact with ASIL data to a separate module. Between 35% and 45% of the module-scoped functions could be moved in this way, depending on how over-approximated the approach to the functional safety architecture is. Decreasing the size of the ASIL module in this way would likely significantly decrease the time required for post-implementation activities, such as the module’s safety qualification and analysis.

Another ASIL-classified module turned out to contain only a small part of code, around 15%, that is actually safety-relevant. In this case as well, splitting off the safety-relevant functionality would have significantly reduced the later documentation, analysis and qualification effort.

Scalability The pass has been tested on a large project. As shown in [Fig. 4.6], the visualization tool is well suited to a large number of nodes. The time needed to run the pass on this project remains reasonable, as it does not exceed five minutes (the maximum in [Tab. 4.6] is 4 m 49.912 s), which is acceptable for a static analysis tool.

5.3 Method

Validity

Regarding the evaluation of the usefulness, all the participants worked at the company. The survey was therefore conducted in a controlled environment, which supports the seriousness of the participants and the reliability of their answers. Their involvement is also shown by the detailed comments they provided. To increase the validity of the experiment and obtain more significant results, the survey could have been submitted to more participants, as noted by Bassil and Keller [43].

Regarding the evaluation of the accuracy, the tool was tested on a real project, which allowed it to be evaluated under real conditions.

Replicability

Regarding the evaluation of the usefulness, the participants’ level of experience with automotive safety may affect the replicability of the evaluation. Different participants can have different opinions about the most important aspects of a visualization tool, depending on their knowledge of the needs related to software safety. Some questions, related to user experience, were also more open to personal interpretation. These factors could affect the results of a similar evaluation.

The results of the evaluation of the accuracy depend on the tested project and its previous manual decomposition. Nevertheless, the general pattern can be expected to be similar, that is to say, the tool would allow engineers to identify the safety-critical components of an application in more detail.

Reliability

Regarding the usefulness of the tool, the results were quite clear, as the average grade of the tool was above 4 out of 5. This means that participants agreed that the tool was useful.

Regarding the accuracy of the tool, the results are quite reliable because the tool has been subjected to unit tests. This tool can be used as a basis for a safety engineer during the ASIL decomposition, but it is important to compare its results with another analysis, human or automated. A best practice would be to apply the same development process to the analysis tool as to the analyzed project, in order to increase the TCL [46].


5.4 Source criticism

Peer-reviewed papers were mostly used as primary sources. It was quite easy to find information about automotive safety and software development, including static analysis, software visualization and tool evaluation.

International standards were used to obtain recommendations and detailed information about safety for road vehicles and embedded systems.

However, there is unfortunately a lack of concrete information regarding the identification of patterns generating “cascading failures” [1] at the software level. The taint propagation policy has therefore been mainly based on the engineers’ experience with automotive safety. ISO 26262 Part 9 Section 7.4.4 [1] recommends the use of “checklists based on field experience” to assess “potential dependent failures plausibility”.

Regarding the implementation of the pass, the LLVM Project provides clear and detailed documentation, which was used extensively to develop the pass.

5.5 The work in a wider context

This tool aims at facilitating the work of engineers by providing them with an analysis tool. It identifies the safety-critical components, so that engineers can focus on the safe development of these components.

Automotive safety is a societal challenge. Vehicles contain more and more embedded computer systems, and users and manufacturers require safety guarantees in order to trust the vehicles. These safety expectations are increasing, especially for autonomous driving: “The reason for the large amount of software requirement is the electrification of the automobile and autonomous driving systems” according to Sari and Reuss [58].

Vision Zero [59] is a road safety project created in Sweden in 1997. This philosophy considers serious injuries of road users, due to road vehicles or the road transport system, as “unacceptable”. Therefore, safety should not be “traded against” mobility [59]. This is a useful reminder that emphasis should be placed on safety in the automotive industry.

However, it is still hard or impossible to “reduce the risk to zero” [46]. A static analysis tool therefore aims at reducing the risk “as low as reasonably practicable”, according to Anderson [46].


6 Conclusion

This chapter summarizes the purpose of this work and the answers to the research questions:

1. Is LLVM suitable to perform static analysis on automotive software?

This research question aimed at determining whether it was possible to develop a static analyzer using the LLVM compiler infrastructure. The features offered by the LLVM compiler infrastructure, such as the LLVM Pass Framework, were studied. It was found that it was possible to compile a project with Clang to generate LLVM IR, and to develop a pass to analyze this intermediate representation. The pass was successfully run on three automotive projects. Thus, it was concluded that it was possible to develop an LLVM pass to analyze automotive software.

2. How can static taint analysis be used to track dependencies related to safe components in automotive software?

This research question aimed at examining how to use taint analysis to track the dependencies related to safe components in automotive software. An inter-procedural, field-insensitive and flow-insensitive taint analyzer was developed to analyze the dependencies between safety-critical components in automotive software. To this end, a taint propagation policy was set up and implemented, LLVM IR was analyzed to identify the safety-critical operations, and a taint analysis algorithm was developed to propagate the taint from the objects listed in the taint information to their users.

3. How to represent results in an understandable way so that engineers can improve the safety development process?

This research question aimed at studying alternative ways to represent results understandably. A JavaScript tool was developed to visualize the results provided by the LLVM pass. The dependencies between safe objects were represented in a tree graph, in order to highlight the safety-critical paths of the software. A file view, showing the functions and global variables tainted in each file, was used to summarize the results of the pass.

4. Is the taint analysis accuracy sufficient for the application? How does taint analysis visualization affect the usefulness of the output?


These research questions aimed at evaluating the results of the thesis. The usefulness of the visualization was assessed using a survey submitted to employees of the company. This survey showed that the search tool and the alias view were the most useful aspects of the visualization. Overall, the participants considered the tool useful and understandable.

The accuracy of the tool was assessed through unit tests and a case study. The unit tests were used to verify the functionality of the tool. The case study showed that the tool was incomplete due to over-approximations, and unsound because it could not detect the dependencies related to the MPU. Nevertheless, the tool was able to detect two functional blocks that should have been tainted in the initial decomposition. Finally, the case study revealed that the tool could effectively improve the precision of the ASIL decomposition.
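The propagation step described in the answer to the second research question can be illustrated with a flow-insensitive worklist algorithm. Starting from the initiators listed in the taint information, taint is pushed along def-use edges to every user until nothing changes. In the LLVM C++ API, the users of a value are available through `Value::users()`; the Python sketch below abstracts this as a hypothetical `users` mapping. It also assumes that an object reached from several initiators keeps the highest ASIL, which is an illustrative choice rather than the documented policy of the pass. The two initiators in the example are taken from the configuration file shown earlier; the other names are invented.

```python
from collections import deque

# Ordering of integrity levels; QM (quality managed) is the lowest.
ASIL_ORDER = {"QM": 0, "A": 1, "B": 2, "C": 3, "D": 4}


def propagate(users, initiators):
    """Propagate ASIL ratings from initiators to all transitive users.

    users: dict mapping an object to the objects that use it (hypothetical).
    initiators: dict mapping an initially tainted object to its ASIL.
    Returns a dict mapping every tainted object to its ASIL.
    """
    level = dict(initiators)
    worklist = deque(initiators)
    while worklist:
        node = worklist.popleft()
        for user in users.get(node, ()):
            current = level.get(user, "QM")
            # Revisit a user only when its rating increases; ratings can only
            # grow and are bounded, so the loop terminates even on cycles.
            if ASIL_ORDER[level[node]] > ASIL_ORDER[current]:
                level[user] = level[node]
                worklist.append(user)
    return level


# Hypothetical def-use graph, including a cycle between read_sensors and filter.
users = {
    "sensors_input": ["read_sensors"],
    "read_sensors": ["filter"],
    "filter": ["read_sensors"],
    "0xf000B124": ["mpu_write"],
    "mpu_write": ["filter"],
}
result = propagate(users, {"sensors_input": "A", "0xf000B124": "D"})
print(sorted(result.items()))
```

Because the order in which instructions execute is ignored, the result is the same whatever order the worklist visits nodes in, which matches the flow-insensitive design.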

6.1 Consequences

This thesis can be used as a proof of concept showing that it is possible to develop a static taint analysis tool using the LLVM compiler infrastructure to analyze automotive software.

According to Anderson [46], static analysis tools exist to check MISRA C rules and coding best practices, but fewer tools exist to check the requirements of the ISO 26262 certification.

It is hoped that this taint analysis tool can help safety engineers in their work. By highlighting the safety-critical components in automotive software, the tool should allow them to save time and development effort. Separating the safety-critical parts from the QM parts of automotive software would save time and money, and would prevent developers from classifying an entire application as ASIL.

According to Heling and Rein [6], “it is not necessary to assume that every requirement of the basic software must be generally classified as safety related”. Therefore, ASIL decomposition is important: it allows engineers to focus effort on the components that require safety-oriented development.

Thus, taint analysis can be used to support ASIL decomposition and improve its precision.

6.2 Further work

The taint analysis pass could be integrated into the development process of the company, since ASIL decomposition should be prepared early in the development cycle. This would allow engineers to identify the safety-critical components of the software iteratively and to develop them according to the ISO 26262 requirements.

The tool could also be integrated into the company’s IDE, which would facilitate the selection of taint information and make it possible to use the tool during the development phase. Since automotive projects are quite large, displaying the information directly in the IDE would improve the usability of the results: the visualization would be clearer, because the results would be annotated on the source-code files in addition to the dependency graph.

Moreover, integrating the tool would simplify the LLVM IR generation step, which could be added to the compilation process of the project, although this requires compiling with Clang instead of GCC.


Bibliography

[1] ISO 26262-9:2018(en), Road vehicles — Functional safety — Part 9: Automotive safety integrity level (ASIL)-oriented and safety-oriented analyses. URL: https://www.iso.org/obp/ui/#iso:std:iso:26262:-9:ed-2:v1:en (visited on 04/02/2019).

[2] IEC Functional Safety and IEC 61508. URL: https://www.iec.ch/functionalsafety/ (visited on 03/01/2019).

[3] ARCCORE - Company. URL: https://www.arccore.com/company (visited on 03/01/2019).

[4] AUTOSAR development cooperation. About. URL: https://www.autosar.org/about/ (visited on 03/01/2019).

[5] R. A. B. e Silva, N. N. Arai, L. A. Burgareli, J. M. P. de Oliveira, and J. S. Pinto. “Formal Verification With Frama-C: A Case Study in the Space Software Domain”. In: IEEE Transactions on Reliability 65.3 (Sept. 2016), pp. 1163–1179. ISSN: 0018-9529. DOI: 10.1109/TR.2015.2508559.

[6] Günther Heling and Jochen Rein. “SilentBSW – Silent AUTOSAR Basic Software for Safety Related ECUs”. In: 2012, p. 4. URL: https://assets.vector.com/cms/content/know-how/_technical-articles/AUTOSAR/AUTOSAR_SilentBSW_ATZ_Elektronik_201211_PressArticle_EN.pdf.

[7] Florian Leitner-Fischer, Stefan Leue, and Sirui Liu. “Automated Freedom from Interference Analysis for Automotive Software”. In: CARS 2016 - 4th International Workshop on Critical Automotive applications: Robustness & Safety. Ed. by Matthieu Roy. Göteborg, Sweden, Sept. 2016. (Visited on 02/15/2019).

[8] A. Imparato, R. R. Maietta, S. Scala, and V. Vacca. “A Comparative Study of Static Analysis Tools for AUTOSAR Automotive Software Components Development”. In: 2017 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). Oct. 2017, pp. 65–68. DOI: 10.1109/ISSREW.2017.21.

[9] A. Goebel, R. Mader, and O. Tripon. “Performance and Freedom From Interference - a contradiction in embedded automotive multi-core applications?” In: ARCS 2017; 30th International Conference on Architecture of Computing Systems. Apr. 2017, pp. 1–9.


[10] L. d S. Azevedo, D. Parker, M. Walker, Y. Papadopoulos, and R. E. Araújo. “Assisted Assignment of Automotive Safety Requirements”. In: IEEE Software 31.1 (Jan. 2014), pp. 62–68. ISSN: 0740-7459. DOI: 10.1109/MS.2013.118.

[11] M. Arroyo, F. Chiotta, and F. Bavera. “An user configurable clang static analyzer taint checker”. In: 2016 35th International Conference of the Chilean Computer Science Society (SCCC). IEEE, Oct. 2016, pp. 1–12. DOI: 10.1109/SCCC.2016.7835996.

[12] C. Lattner and V. Adve. “LLVM: A compilation framework for lifelong program analysis & transformation”. In: International Symposium on Code Generation and Optimization, 2004. CGO 2004. San Jose, CA, USA: IEEE, 2004, pp. 75–86. ISBN: 978-0-7695-2102-2. DOI: 10.1109/CGO.2004.1281665. URL: http://ieeexplore.ieee.org/document/1281665/ (visited on 03/01/2019).

[13] Writing an LLVM Pass — LLVM 9 documentation. URL: https://llvm.org/docs/WritingAnLLVMPass.html (visited on 04/16/2019).

[14] Motor Industry Software Reliability Association, ed. MISRA C:2012: guidelines for the use of the C language in critical systems. OCLC: 847117002. Nuneaton: Misra, 2013. ISBN: 978-1-906400-10-1, 978-1-906400-11-8.

[15] Rajeshwari Hegde, Geetishree Mishra, and Gurumurthy. “Software and Hardware Design Challenges in Automotive Embedded System”. In: International Journal of VLSI Design & Communication Systems 2.3 (Sept. 2011), pp. 165–174. ISSN: 09761357. DOI: 10.5121/vlsic.2011.2314. URL: http://www.aircconline.com/vlsics/V2N3/2311vlsics14.pdf (visited on 04/01/2019).

[16] K. Lind and R. Heldal. “Automotive System Development Using Reference Architectures”. In: 2012 35th Annual IEEE Software Engineering Workshop. Oct. 2012, pp. 42–51. DOI: 10.1109/SEW.2012.11.

[17] U. Freund. “Multi-level system integration based on AUTOSAR”. In: 2008 ACM/IEEE 30th International Conference on Software Engineering. May 2008, pp. 581–582. DOI: 10.1145/1368088.1368168.

[18] Static Code Analysis. URL: https://www.owasp.org/index.php/Static_Code_Analysis.

[19] Language Reference Manual — LLVM 9 documentation. URL: https://llvm.org/docs/LangRef.html (visited on 02/26/2019).

[20] C. Feng and X. Zhang. “A Static Taint Detection Method for Stack Overflow Vulnerabilities in Binaries”. In: 2017 4th International Conference on Information Science and Control Engineering (ICISCE). July 2017, pp. 110–114. DOI: 10.1109/ICISCE.2017.33.

[21] H. Liang, S. Liu, Y. Zhang, and M. Wang. “Improving the precision of static analysis: Symbolic execution based on GCC abstract syntax tree”. In: 2017 18th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). June 2017, pp. 395–400. DOI: 10.1109/SNPD.2017.8022752.

[22] Markus Mock, Manuvir Das, Craig Chambers, and Susan J. Eggers. “Dynamic points-to sets: a comparison with static analyses and potential applications in program understanding and optimization”. In: Proceedings of the 2001 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering - PASTE ’01. Snowbird, Utah, United States: ACM Press, 2001, pp. 66–72. ISBN: 978-1-58113-413-1. DOI: 10.1145/379605.379671. URL: http://portal.acm.org/citation.cfm?doid=379605.379671 (visited on 02/25/2019).


[23] Patrick Cousot and Radhia Cousot. “Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints”. In: Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. POPL ’77. Los Angeles, California. New York, NY, USA: ACM, 1977, pp. 238–252. DOI: 10.1145/512950.512973. URL: http://doi.acm.org/10.1145/512950.512973 (visited on 02/25/2019).

[24] E. J. Schwartz, T. Avgerinos, and D. Brumley. “All You Ever Wanted to Know about Dynamic Taint Analysis and Forward Symbolic Execution (but Might Have Been Afraid to Ask)”. In: 2010 IEEE Symposium on Security and Privacy. May 2010, pp. 317–331. DOI: 10.1109/SP.2010.26.

[25] D. Avots, M. Dalton, V. B. Livshits, and M. S. Lam. “Improving software security with a C pointer analysis”. In: Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005. IEEE, May 2005, pp. 332–341. DOI: 10.1109/ICSE.2005.1553576. URL: https://ieeexplore.ieee.org/document/1553576.

[26] Lars Ole Andersen. Program Analysis and Specialization for the C Programming Language.Tech. rep. 1994.

[27] Michael Hind. “Pointer analysis: Haven’t we solved this problem yet?” In: PASTE ’01. ACM Press, 2001, pp. 54–61.

[28] Bjarne Steensgaard. “Points-to Analysis in Almost Linear Time”. In: Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. POPL ’96. St. Petersburg Beach, Florida, USA. New York, NY, USA: ACM, 1996, pp. 32–41. ISBN: 978-0-89791-769-8. DOI: 10.1145/237721.237727. URL: http://doi.acm.org/10.1145/237721.237727 (visited on 03/05/2019).

[29] Sheng-Hsiu Lin. Alias Analysis in LLVM. Theses and Dissertations. Lehigh University, 2015.

[30] The LLVM Compiler Infrastructure Project. URL: https://llvm.org/ (visited on 02/26/2019).

[31] The Architecture of Open Source Applications: LLVM. URL: http://www.aosabook.org/en/llvm.html (visited on 02/21/2019).

[32] Kaleidoscope: Extending the Language: Mutable Variables — LLVM 8 documentation. URL: http://releases.llvm.org/8.0.0/docs/tutorial/LangImpl07.html (visited on 05/16/2019).

[33] llvm-link - LLVM bitcode linker — LLVM 9 documentation. URL: http://llvm.org/docs/CommandGuide/llvm-link.html (visited on 05/24/2019).

[34] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. “An efficient method of computing static single assignment form”. In: Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL ’89. Austin, Texas, United States: ACM Press, 1989, pp. 25–35. ISBN: 978-0-89791-294-5. DOI: 10.1145/75277.75280. URL: http://portal.acm.org/citation.cfm?doid=75277.75280 (visited on 02/12/2019).

[35] Matthias Braun, Sebastian Buchwald, Sebastian Hack, Roland Leißa, Christoph Mallon, and Andreas Zwinkau. “Simple and Efficient Construction of Static Single Assignment Form”. In: Proceedings of the 22nd International Conference on Compiler Construction. CC ’13. Rome, Italy. Berlin, Heidelberg: Springer-Verlag, 2013, pp. 102–122. ISBN: 978-3-642-37050-2. DOI: 10.1007/978-3-642-37051-9_6. URL: http://dx.doi.org/10.1007/978-3-642-37051-9_6 (visited on 02/19/2019).

[36] Checker Developer Manual. URL: https://clang-analyzer.llvm.org/checker_dev_manual.html#start (visited on 03/20/2019).

[37] Yulei Sui and Jingling Xue. SVF: Interprocedural Static Value-Flow Analysis in LLVM. Tech. rep. Australia: School of Computer Science and Engineering, UNSW Australia. URL: https://github.com/SVF-tools/SVF.

[38] Yulei Sui, Ding Ye, and Jingling Xue. “Detecting Memory Leaks Statically with Full-Sparse Value-Flow Analysis”. In: IEEE Transactions on Software Engineering 40.2 (Feb. 2014), pp. 107–122. ISSN: 0098-5589, 1939-3520. DOI: 10.1109/TSE.2014.2302311. URL: http://ieeexplore.ieee.org/document/6720116/ (visited on 03/21/2019).

[39] Florent Kirchner, Nikolai Kosmatov, Virgile Prevosto, Julien Signoles, and Boris Yakobowski. “Frama-C: A software analysis perspective”. In: Formal Aspects of Computing 27.3 (May 2015), pp. 573–609. ISSN: 1433-299X. DOI: 10.1007/s00165-014-0326-7. URL: https://doi.org/10.1007/s00165-014-0326-7 (visited on 03/06/2019).

[40] Yiannis Papadopoulos, Martin Walker, David Parker, Erich Rüde, Rainer Hamann, Andreas Uhlig, Uwe Grätz, and Rune Lien. “Engineering failure analysis and design optimisation with HiP-HOPS”. In: Engineering Failure Analysis. The Fourth International Conference on Engineering Failure Analysis Part 1 18.2 (Mar. 2011), pp. 590–608. ISSN: 1350-6307. DOI: 10.1016/j.engfailanal.2010.09.025. URL: http://www.sciencedirect.com/science/article/pii/S1350630710001779 (visited on 04/02/2019).

[41] Mojtaba Shahin, Peng Liang, and Muhammad Ali Babar. “A systematic review of software architecture visualization techniques”. In: Journal of Systems and Software 94 (Aug. 2014), pp. 161–185. ISSN: 0164-1212. DOI: 10.1016/j.jss.2014.03.071. URL: https://linkinghub.elsevier.com/retrieve/pii/S0164121214000831 (visited on 04/01/2019).

[42] T. D. LaToza and B. A. Myers. “Visualizing call graphs”. In: 2011 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). Sept. 2011, pp. 117–124. DOI: 10.1109/VLHCC.2011.6070388.

[43] S. Bassil and R. K. Keller. “Software visualization tools: survey and analysis”. In: Proceedings 9th International Workshop on Program Comprehension. IWPC 2001. May 2001, pp. 7–17. DOI: 10.1109/WPC.2001.921708.

[44] I. Herman, G. Melancon, and M. S. Marshall. “Graph visualization and navigation in information visualization: A survey”. In: IEEE Transactions on Visualization and Computer Graphics 6.1 (Jan. 2000), pp. 24–43. ISSN: 1077-2626. DOI: 10.1109/2945.841119.

[45] C. B. Seaman. “Qualitative methods in empirical studies of software engineering”. In: IEEE Transactions on Software Engineering 25.4 (July 1999), pp. 557–572. ISSN: 0098-5589. DOI: 10.1109/32.799955.

[46] Paul Anderson. “More Software Safety A Static Analysis Tools Perspective”. In: ATZelektronik worldwide 12.1 (Feb. 2017), pp. 16–21. ISSN: 2192-9092. DOI: 10.1007/s38314-016-0101-z. URL: https://doi.org/10.1007/s38314-016-0101-z (visited on 04/01/2019).

[47] Documentation — LLVM 9 documentation. URL: https://llvm.org/doxygen/ (visited on 04/16/2019).

[48] Chris Lattner and Vikram Adve. The LLVM Instruction Set and Compilation Strategy. Tech. Report UIUCDCS-R-2002-2292. CS Dept., Univ. of Illinois at Urbana-Champaign, Aug. 2002. URL: https://llvm.org/pubs/2002-08-09-LLVMCompilationStrategy.html (visited on 04/16/2019).

[49] Arseny Kapoulkine. Light-weight, simple and fast XML parser for C++ with XPath support: zeux/pugixml. original-date: 2012-07-06T10:51:03Z. May 2019. URL: https://github.com/zeux/pugixml (visited on 05/09/2019).

[50] Tree Layout - D3 wiki. URL: https://d3-wiki.readthedocs.io/zh_CN/master/Tree-Layout/ (visited on 04/01/2019).

[51] Martin Graham and Jessie B. Kennedy. “A survey of multiple tree visualisation”. In: Information Visualization 9 (2010), pp. 235–252. DOI: 10.1057/ivs.2009.29.

[52] E. M. Reingold and J. S. Tilford. “Tidier Drawings of Trees”. In: IEEE Transactions on Software Engineering SE-7.2 (Mar. 1981), pp. 223–228. ISSN: 0098-5589. DOI: 10.1109/TSE.1981.234519. URL: http://ieeexplore.ieee.org/document/1702828/ (visited on 04/01/2019).

[53] Sougata Mukherjea, James D. Foley, and Scott Hudson. “Visualizing complex hypermedia networks through multiple hierarchical views”. In: Proceedings of the SIGCHI conference on Human factors in computing systems - CHI ’95. Denver, Colorado, United States: ACM Press, 1995, pp. 331–337. ISBN: 978-0-201-84705-5. DOI: 10.1145/223904.223947. URL: http://portal.acm.org/citation.cfm?doid=223904.223947 (visited on 05/20/2019).

[54] Googletest: Google Testing and Mocking Framework. original-date: 2015-07-28T15:07:53Z. Apr. 2019. URL: https://github.com/google/googletest (visited on 04/17/2019).

[55] Zhulinpinyu. D3 layout tree. URL: https://codepen.io/zhulinpinyu/details/EaZrmM (visited on 05/22/2019).

[56] Markus Schwarz. SBC and CANbedded. Tech. Report. 2005, p. 4. URL: https://assets.vector.com/cms/content/know-how/_application-notes/AN-ISC-1-1027_SBC_and_CANbedded.pdf.

[57] Matthias Krause and Carsten Weich. Intrinsic Safety of AUTOSAR Basic Software. Tech. Report. 2012, p. 4.

[58] Bulent Sari and Hans-Christian Reuss. “A model-driven approach for the development of safety-critical functions using modified architecture description language (ADL)”. In: 2016 International Conference on Electrical Systems for Aircraft, Railway, Ship Propulsion and Road Vehicles & International Transportation Electrification Conference (ESARS-ITEC). Toulouse, France: IEEE, Nov. 2016, pp. 1–5. ISBN: 978-1-5090-0814-8. DOI: 10.1109/ESARS-ITEC.2016.7841346. URL: http://ieeexplore.ieee.org/document/7841346/ (visited on 05/03/2019).

[59] Claes Tingvall and Narelle Haworth. “Vision Zero - An ethical approach to safety and mobility”. In: (), p. 14.
