costa.fdi.ucm.esdamiano/pubs/phDTh.pdf · Summary This work is organized in seven chapters. The...

Damiano Zanardini

Certified

Abstract Non-Interference -

Object-Oriented Code Validation

for Information Flow Security

Ph.D. Thesis

21 Aprile 2006

Universita degli Studi di Verona

Dipartimento di Informatica

Advisor:prof. Roberto Giacobazzi

Series N◦: TD-07-06

Universita di VeronaDipartimento di InformaticaStrada le Grazie 15, 37134 VeronaItaly

Summary

This work is organized in seven chapters. The introduction, Chapter 1, illus-trates the context and highlights the importance of security properties in thepresent day practice of software developing and analysis.

Chapter 2 introduces the various programming languages which are usedthroughout the remaining chapters (excluding Haskell, which implements thedependency calculus in Appendix A). It also provides the necessary back-ground in lattice and fixpoint theory.

The original definition of Abstract Non-Interference and its foundationaltheories (in particular, Abstract Interpretation and the basics of InformationFlow analysis) are presented in Chapter 3. Since we are dealing with theframework of code certification, the promising Proof-Carrying code architec-ture is also illustrated. Proof-Carrying code is the goal of the framework whichis sketched in Chapter 6. Chapter 3 also presents, in its last section, recentresearch which is relevant to our purpose. In particular, security analysis forObject-Oriented programming languages is illustrated. Our focus will also beon the current research on modeling abstract properties.

The main part of this work begins with Chapter 4. The problem of rep-resenting abstract information flow security is here widely discussed. A type-based approach is first presented, which is an initial step towards automatizingAbstract Non-Interference. Then, a realistic approach to ANI is discussed. Weprovide a comprehensive definition of ANI for a Java-like language. For thispurpose, we take advantage of the language mechanisms for enforcing correct-ness.

The definition of the previous chapter is applied in Chapter 5, which de-velops ANI analysis on source code programs by means of boolean functions.The information flow analysis relies on the calculus of abstract dependencieson data, which are computed by an static analysis algorithm. Dependenciesare also considered in their relation with abstract program slicing.

Source code analysis is applied, in Chapter 6, to bytecode, in the directionof PCC-style code certification. Our security analysis is put in relation with

ii

a Proof-Carrying code architecture, based on the bytecode level instead ofnative code programs.

Conclusions and future research directions are outlined in Chapter 7.

Acknowledgements

Saying thank you to a lot of people would sound to me as overemphasizingthe importance of my work. Then, I’ll try to be moderate...

First, I must be grateful to Prof. Roberto Giacobazzi for accepting meas one of his students and helping me during these fruitful years. This workwould not have been possible without the help of Prof. Giuseppe Favretto.Thanks to people at Centro Docimologico, who were valuable partners andare still good friends.

I feel particularly indebted to Prof. Andrea Masini: his constant supportand a couple of friendly conversations, more or less at the α and the ω ofmy PhD, definitely helped me in my efforts. (Not only) Young people at Ca’Vignal II have been to me an example with their spirit and skill. I’d like tomention here Isabella, Mila and Rosalba.

Adding the list of who, in my personal life, has been significant to me wouldbe long and, probably, not very interesting. I only remember my parents. Theothers... indeed, they are important, and they know it.

D. Z.

Contents

Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i

Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IX

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Software security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Confidentiality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.3 Relaxing security requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.4 Code Certification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.5 The goal of this work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.1 Example languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.1.1 Imperative simple language . . . . . . . . . . . . . . . . . . . . . . . . . 72.1.2 Functional simple language . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2 The Object-Oriented Java-like language . . . . . . . . . . . . . . . . . . . . 92.2.1 Syntax and semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.2.2 Primitive types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.2.3 Classes as singletons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.3 Java bytecode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.3.1 Syntax and semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.4 Lattice and fixpoint theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.4.1 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.4.2 Orderings and lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.4.3 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.4.4 Fixpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

VI Contents

3 Theoretical foundations and state of the art . . . . . . . . . . . . . . . 173.1 Abstract Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.1.1 Program analysis frameworks . . . . . . . . . . . . . . . . . . . . . . . 183.1.2 Principles of Abstract Interpretation . . . . . . . . . . . . . . . . . 183.1.3 Abstract domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213.1.4 Property transformers and completeness . . . . . . . . . . . . . . 223.1.5 Fixpoint approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233.1.6 Program properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243.1.7 Typical (numerical) domains . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2 Information Flow Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253.3 Abstract Non-Interference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.3.1 Weakening Non-Interference . . . . . . . . . . . . . . . . . . . . . . . . 283.3.2 Attackers as abstract interpretations . . . . . . . . . . . . . . . . . 283.3.3 Narrow and Abstract Non-Interference . . . . . . . . . . . . . . . 30

3.4 Program Slicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303.4.1 An algorithm for program slicing . . . . . . . . . . . . . . . . . . . . 313.4.2 Slicing and information flow . . . . . . . . . . . . . . . . . . . . . . . . 32

3.5 Proof-Carrying Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333.5.1 The PCC architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333.5.2 A novel approach to PCC architecture design . . . . . . . . . 343.5.3 Application examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.6 The state of the art: Object-Oriented Information Flow analysis 363.6.1 Analysis with pointer aliasing . . . . . . . . . . . . . . . . . . . . . . . 363.6.2 Abstract dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393.6.3 JVM-like programming languages . . . . . . . . . . . . . . . . . . . 403.6.4 Code certification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413.6.5 Mathematical formalisms . . . . . . . . . . . . . . . . . . . . . . . . . . . 423.6.6 Proving Abstract Non-Interference . . . . . . . . . . . . . . . . . . . 43

4 Representing Abstract Non-Interference . . . . . . . . . . . . . . . . . . . 454.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454.2 Security types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.2.1 Representing abstract Information Flow with types . . . . 474.2.2 Higher-Order security types and type inference . . . . . . . . 484.2.3 Stack-based security types and type inference . . . . . . . . . 514.2.4 Brief discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.3 A realistic approach to Information Flow . . . . . . . . . . . . . . . . . . . 514.3.1 Which definition of Non-Interference . . . . . . . . . . . . . . . . . 524.3.2 What is an NI-secure program . . . . . . . . . . . . . . . . . . . . . . 53

4.4 Class-Oriented Abstract Non-Interference . . . . . . . . . . . . . . . . . . . 574.4.1 Class hierarchies as abstract domains . . . . . . . . . . . . . . . . 584.4.2 Class intersection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594.4.3 Domains and program code . . . . . . . . . . . . . . . . . . . . . . . . . 594.4.4 Class extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604.4.5 Advantages and limitations . . . . . . . . . . . . . . . . . . . . . . . . . 61

Contents VII

5 Program analysis for ANI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655.1 Preliminary definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

5.1.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655.1.2 Type environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 665.1.3 Program identifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.2 Abstract dependency calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695.2.1 Preliminary notions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695.2.2 Abstract Dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715.2.3 Computing an expression on its inputs . . . . . . . . . . . . . . . 725.2.4 Increasing efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745.2.5 The algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765.2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

5.3 Boolean functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 835.3.1 Models and Abstract Non-Interference . . . . . . . . . . . . . . . 84

5.4 Source code analysis via boolean functions . . . . . . . . . . . . . . . . . . 855.4.1 Analysis of statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 855.4.2 Whole-program analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 875.4.3 Widening. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

5.5 Correctness and further discussion . . . . . . . . . . . . . . . . . . . . . . . . . 905.5.1 The ordering on ψ-states . . . . . . . . . . . . . . . . . . . . . . . . . . . 915.5.2 Monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

5.6 The link to Program Slicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 945.6.1 Computing abstract slices . . . . . . . . . . . . . . . . . . . . . . . . . . 955.6.2 Obtaining consistent slices . . . . . . . . . . . . . . . . . . . . . . . . . . 965.6.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

6 Code certification for ANI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 996.1 Analyzing bytecode programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

6.1.1 Bytecode boolean functions . . . . . . . . . . . . . . . . . . . . . . . . . 1006.1.2 A simple source code compiler . . . . . . . . . . . . . . . . . . . . . . 1006.1.3 The Control Flow Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . 1016.1.4 Analysis of blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1046.1.5 Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1076.1.6 Program analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

6.2 The relation between source code and bytecode . . . . . . . . . . . . . 1086.2.1 Variable assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1086.2.2 Field assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1096.2.3 Conditional statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1096.2.4 Non-compiled bytecode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

6.3 Towards a Proof-Carrying code architecture . . . . . . . . . . . . . . . . 1106.3.1 What the code user receives . . . . . . . . . . . . . . . . . . . . . . . . 1106.3.2 What the code user should verify . . . . . . . . . . . . . . . . . . . . 110

7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

VIII Contents

A The abstract dependency calculus: Haskell Implementation 115

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

Sommario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

Preface

This work is the result of my investigation on abstract security propertieswith respect to code validation. Among its goals, the main one was to providethe first steps towards automatizing the design and checking of safe programs.Outside of this preface, I will refer to myself as we, even if, up to now, nothinghas been published as a joint work.

Part of this work, in particular Section 4.2, has already been published[73, 74]. Despite the advice of one of the referees, I chose to leave this sectionin its original place, although it may seem a bit incoherent with respect tothe following discussion.

Future plans include publishing some other parts of the last chapters, inparticular the class-oriented notion of Abstract Non-Interference, the calculusof abstract dependencies, and the algorithms for code certification via booleanfunctions.

For brevity, index entries are only included once (their first occurrence)for every section; however, this rule could have been violated somewhere.

1

Introduction

The mathematics is not there till we put it there.

Sir Arthur Eddington (1882 - 1944), The Philosophy of Physical Science

I have suffered a great deal from writers who have quoted this or thatsentence of mine either out of its context or in juxtaposition to someincongruous matter which quite distorted my meaning , or destroyedit altogether.

Alfred North Whitehead (1861 - 1947)

This chapter describes the context of this work. We discuss the importanceof confidentiality in present-day software and outline ow relaxing securityrequirements might lead to mere practical frameworks. Moreover, the goal ofthis work is introduced, together with its relation withe program certification.

1.1 Software security

The importance of software systems is becoming larger and larger in everydaylife. An enormous quantity of data is manipulated by information systems,whose complexity requires sophisticated tools for design and checking. Dueto the importance of manipulated data, programs must fulfill several securityrequirements in order to be safely executed. The execution of software mustnot negatively alter the surrounding environment (for example, by provokingsystem crashes or modifying/deleting relevant information).

Today, programs are much too large to be checked manually, in a reason-able amount of time and resources, by programmers. Consequently, code de-signers need automatic, flexible and efficient tools for checking security prop-erties of programs. Checking the safety of software is one of the main goals of

2 1 Introduction

Static Analysis, a class of techniques for analyzing the behavior of programswithout executing them. Static Analysis directly inspects the program codefor inferring approximated properties.

A security policy is a formalization of security requirements. Static Anal-ysis is a useful tool for checking the adherence of programs to some securitypolicy; it can also drive the code design process, by automatically assistingthe process of software production under the light provided by the policy.

1.2 Confidentiality

An important class of security properties is related to the protection of confi-dential data. It is often the case that a part of the data which are manipulatedby an information system must be protected by malicious users interactingwith the system. An attacker can be interested in private data. If they arepassive, attackers cannot modify the attacked system; yet, they can observethe execution of programs in order to acquire information about data theymanipulate.

Making the access to data secure

Observing the execution of programs amounts to watching their behavior ona given input. Attackers can only see a portion of program data, the publicpart. Their observations are limited to the public part of input and outputdata; it can possibly include auxiliary information, such as the execution timeor the amount of allocated memory.

Access Control techniques regulate the access to protected data. Their goalis to prevent users from reading forbidden data. A piece of data is forbidden fora user if accessing to it requires more privileges than those which are possessedby the user. It is quite a common situation, in today software systems, to havesome users (administrators and/or developers) which are allowed to executeprograms or manipulate data, forbidden for normal users. For example, someweb sites have several privilege levels: visitors, free members, paying membersand administrators. The security policy should make sure that a user cannotaccess data which are reserved to users with more privileges than his/hers.

Regulating the propagation of secret data

Access control prevents the direct access to private information. On the otherhand, users can freely access non-confidential data. Due to some (usuallyundesired) aspects of the program behavior, it can be the case that untrusted(possibly malicious) users are able to acquire private information withoutdirectly accessing to protected data. This can happen if

1.3 Relaxing security requirements 3

• attackers have some knowledge of the operations performed by the pro-gram, that is, they are able to correctly interpret their observation of theprogram semantics; and

• the final value of non-protected data depends on protected information,so that it is possible to guess the value of private data by only readingnon-confidential information.

The first requirement may seem, at a first glance, quite unrealistic, sincethe semantics of many commercial programs is protected by copyright and notfreely observable. However, decompiling techniques allow to guess informationabout the program semantics by re-building the program source code. Moreimportantly, a free knowledge of program behavior is made perfectly realisticby the growing importance of open source software: many systems rely onsoftware whose behavior is public, that is, can be freely observed by anyone.

A dependency between program data generates a information flow. Infor-mation is said to flow from c1 to c2 if c2 depends on c1, that is, if the initialvalue of c1 is relevant to the final value of c2. In this case, it is possible toguess something about c1 by merely observing c2. Information Flow analysischecks whether information flows dangerously from private to public data.This amounts to say that, in order to preserve confidentiality, a programshould not generate forbidden flows, that is, the possibility to acquire pro-tected information by watching public data. This security property is calledNon-Interference; the private and the public part of program data are saidnot to interfere if there is no transfer of information across the private-publicboundary.

1.3 Relaxing security requirements

If a program satisfies Non-Interference, then its publicly observable behavioris completely independent with respect to protected data. From the seman-tic point of view, private information could also be completely different, oreven disappear, without resulting in any change in the observable behavior.Actually, this complete separation between private and public data may seemtoo strong a requirement. In facts, we could ask whether such a program ispractically equivalent (regarding untrusted users) to another one which doesnot use private data at all.

Allowing selected flows

This question led computer scientists to several attempts towards the weak-ening of the standard Non-Interference notion. Weakening Non-Interferenceamounts to allow some selected flows from private to public data. Flows arepermitted in order to model realistic frameworks where interference must beregulated but not totally forbidden.

4 1 Introduction

Aggregate data (that is, those which are obtained by some computationon arrays of data, such as average, sum or standard deviation) on private in-formation are a typical example of interference which is often considered asharmless; yet, the public result clearly depends on protected data. Some ap-proaches allow everyone to read aggregate data (for example, the average ofmonthly salaries in an enterprise), provided that no knowledge about partic-ular data can be inferred (for example, the salary of the employee Xx Yy).

Limiting the power of attackers

Abstract Non-Interference approaches this problem with a different perspec-tive: attackers are assumed to have a reduced observational power, so that theycan only observe program execution up to some degree of approximation (for-malized by means of the theory of Abstract Interpretation). Dangerous flowsare permitted as long as they cannot be observed and exploited by attackers.

1.4 Code Certification

Developing secure programs is one of the main goals of program analysistechniques, such as Static Analysis. A programmer is usually interested inproducing code which fulfills some given security requirements. Yet, it canhappen that the code designer writes (by error or malice) programs which,when executed in an external system, disclose confidential data which arecontained in the system. In this case, a user, which wants to execute softwarereceived from an untrusted source, can possibly (and involuntarily) releasesome of his or her secrets by running this malicious code.

The growing importance of Web-based communications makes the problemof executing safe code crucial: most users do not pay enough attention towhat they download from the World Wide Web and run on their systems.Consequently, mechanisms for automatically deciding when a program can besafely executed represent an utterly important tool for improving the securityof systems. Proof-Carrying code follows this direction: the guarantees for thesafety of a program are contained in the program itself, so that the user cancheck the security properties (that is, certificate the program) before executingthe code. In principle, Proof-Carrying code architecture can deal with a widevariety of security policies; among them, confidentiality.

1.5 The goal of this work

When dealing with abstract information flow properties such as Abstract Non-Interference, we must face the problem of representing abstract properties ofdata and their related operations.

1.5 The goal of this work 5

Our purpose is to discuss the problem and provide a definition of AbstractNon-Interference which is more practical and easier to be used in a realisticframework. We consider an Object-Oriented programming language, modeledon Java, and describe data properties as classes. In this approach, the analysisof properties can be reduced, up to a certain extent, to class analysis.

We try to suggest an algorithm for the computation of abstract data de-pendencies. The calculus of abstract dependencies is a semantic task, thenit is, in the general case, undecidable. The provided algorithm computes anover-approximation of dependencies, relying on standard static analysis type-based techniques. Abstract dependencies are crucial in defining approximatedinformation flow properties; they are also relevant to the abstract version ofprogram slicing.

Some ideas for a certification architecture are included; in particular, thealgorithm for checking the security properties of source code is transposedto the bytecode low-level language. However, in order to obtain a realisticProof-Carrying code framework, some further work is needed.

2

Preliminaries

There is nothing more difficult to take in hand, more perilous to con-duct or more uncertain in its success than to take the lead in theintroduction of a new order of things.

Niccolo Machiavelli (1469 - 1527), The Prince

Quotation, n: The act of repeating erroneously the words of another.

Ambrose Bierce (1842 - 1914), The Devil’s Dictionary

This chapter introduces notions which are useful in the rest of the work.The IMP, FUN and JL programming languages are illustrated. IMP and FUNare used in simple examples; JL is the target language of Information Flowanalysis. Also, a description of the Java bytecode subset obtained by compilingJL programs is provided. Finally, we introduce the theoretical backgroundwhich is necessary to understand Abstract Interpretation and related topics.

2.1 Example languages

In this work, some examples will be provided, which use simple programminglanguages. An imperative and a functional language will be used. In bothcases, P (x) is a shorthand for JP K (x) and stands for the result (output) ofexecuting or evaluating the program P on the input x.

2.1.1 Imperative simple language

This language can be often found in the literature as the IMP language.Integers are the only data type; input and output data are stored in variablesx, y ∈ V, whose life covers the entire program execution. The set of variables is

8 2 Preliminaries

unlimited; if not differently specified, it is intended that the input and outputof a program are, respectively, the initial and final values of all variables.Basic statements are skip and variable assignment. Statements s ∈ S can besequentially concatenated, chosen by means of conditional and iterated in aloop; no procedures are provided. Expressions E are simple arithmetical termsobtained by combining numbers Z and operators.

E ::= Z | V | E + E | E ∗ E | (E) | . . .

B ::= true | false | E == E | E <= E | B and B | . . .

S ::= skip | V := E | S ; S | if B then S else S | while B do S

A program state is the memory configuration, that is, the values of variablesat a given time. The semantics J K is straightforward and will be omitted.

2.1.2 Functional simple language

The functional language FUN is a kind of typed lambda-calculus with int asthe only basic type and the arrow type constructor →.

Syntax

Functional expressions can be obtained by means of lambda-abstraction, ap-plication, conditional and recursion. Arithmetical operations are shorthandsfor their corresponding lambda terms.

C ::= 0, 1, 2, 3 . . . constantsV ::= x, y, F . . . variablesE ::= C | V | E + E | λV.E | E(E) | expressions

µV. E | if E then E else E

Semantics

Semantic domains have a bottom element⊥; for functions, the bottom elementis λv.⊥. The denotational semantics J K : E 7→ S follows the call-by-valueevaluation rule.

n ∈ Z⊥∆= Z ∪ {⊥} integers

f ∈ U∆= Z⊥ ∪ [U 7→ U ] values

W∆= {ω} wrong value

ε ∈ E∆= V 7→ U environments

φ ∈ S∆= E 7→ U semantic domains

2.2 The Object-Oriented Java-like language 9

JnKε∆= n

JxKε∆= ε(x) Je1 + e2Kε

∆= Je1Kε + Je2Kε

Jλx.eKε∆= λv. JeKε[x←v] Je(e′)Kε

∆= JeKε (Je′Kε)

JµF.eKε∆= lfp

(

λv. JeKε[F←v])

Je′ = e′′Kε∆= (Je′Kε = Je′′Kε)

J if e then e′ else e′′K ∆= (JeKε = 0) ? (Je′Kε) ; (Je′′Kε)Type checking is implicit: if a type error arises, expressions evaluate to ω.The type system is the monomorphic type system of simply-typed lambda-calculus.

2.2 The Object-Oriented Java-like language

The Object-Oriented language JL is a subset of the Java [41] programminglanguage, without advanced features such as concurrency and exceptions.

Remark 2.1. With abuse of notation, here and in the following the terms classand type will be used interchangeably when dealing with Java or JL.

2.2.1 Syntax and semantics

The language syntax is modeled on Java. Expressions and statements aredefined as:

v ∈ V variablesf ∈ F field namesx ∈ I = V ∪ {v .f | v ∈ V, f ∈ F} identifiersm ∈ M method names

C,D ∈ T class namesE ::= I identifiers| new T(E, . . . ,E) new objects| V .M(E, . . . ,E) method invocation

S ::= skip no operations| I=E assignment| V .M(E, . . . ,E) procedural method execution| S ; S sequential composition| if (B) S else S conditional| while (B) S loop| return E return value

Identifiers cannot take the form x .f .g. Actually, sequences of field references(dots) are not necessary; for example, the assignment

x .f .g=e

10 2 Preliminaries

can be replaced by adding a new local variable t:

t=x .f ; t .g=e

This is correct since, after the first statement, t happens to be the object inthe f field of x. On the other hand, the sequence

t1=x .f ; t2=t1 .g ; t2=e

seems to be an good alternative, with the benefit of ruling out dotted iden-tifiers on the left side of assignments. However, we can be easily see that itdoes not work since the second assignment to t2 invalidates the effects of thefirst one. Similarly, method invocation can be limited to v .m( . . . ) for somevariable v. Boolean expressions b ∈ B are simply objects with Boolean type.The language semantics is, basically, the Java semantics for the correspondinglanguage subset.

2.2.2 Primitive types

In JL, Java primitive types int and boolean are replaced by primitive classesInteger and Boolean. Actually, replacing primitive types with primitive classesdoes not preserve the language semantics. In practice, operations as the cre-ation of new objects involve using primitive values, as in new Integer(5). Wechoose to postulate the existence of an infinite set of Integer and Booleanconstants, such as Boolean true, Boolean false, Integer one, Integer two etc.Logical and arithmetical operations are also provided in form of methods.This hypothesis on language design allows to manipulate all program data asobjects.

2.2.3 Classes as singletons

The Information Flow analysis described in Chapters 4 and 5 is type-based:the type of program data is the only relevant thing when computing abstractproperties. In facts, it is its abstract property. The analysis is parametric onthe power of attackers, that is, on the precision of properties.

Standard Non-Interference (that is, when attackers can see the actual valueof data, see Section 3.2) is a special case of this general framework. Since, inthe non-abstract case, every value can be distinguished, we need to transposethis condition to types. This amounts to making every value distinguishableby its type, that is, defining classes containing one single object (singletonclasses).Example. The most precise (concrete) properties on numerical values havethe form to be 5, to be 6 etc. Similarly, the properties to be true and to befalse are the most precise properties on booleans. We want to declare classesrepresenting those properties, that is, containing one single object. They aredeclared as subclasses of primitive classes.

2.3 Java bytecode 11

public class I n t e g e r {public I n t e g e r add ( In t e g e r n ) {}

}public class Zero extends I n t e g e r {

public One add (One n ) {}public Two add (Two n) {}. . .

}public class One extends I n t e g e r {

public Two add (One n ) {}public Three add (Two n) {}. . .

}. . .

public class Boolean {public Boolean and ( Boolean b ) {}

}public class True extends Boolean {

public True and (True b ) {}public False and ( False b ) {}

}public class False extends Boolean {public False and ( Boolean b ) {}

}

As we see, methods are also specialized in new classes. In practice, singletonclasses represent constants. There is no need to have internal members suchas the numerical value, since this information is already contained in types.Clearly, an infinite set of classes should be declared, together with suitablemethod definitions. 2

Such a reformulation of program data, in which classes are ubiquitous, isirrealistic, due to the huge number of class and method declarations. However,it has a theoretical meaning: it puts standard Information Flow analysis intothe properties as classes framework. Indeed, singleton classes allow to expressthe operations on concrete values in a type-based way.

2.3 Java bytecode

Compiling a Java program results in a bytecode program. This is the formin which programs are executed on different platforms by means of the JavaVirtual Machine [52].

12 2 Preliminaries

The Java Virtual Machine is the cornerstone of the Java platform. It isthe component of the technology which is responsible for its hardware- andoperating system- independence, the small size of its compiled code, and itsability to protect users from malicious programs. The JVM has an instructionset and manipulates the memory at runtime. It knows nothing of the Javaprogramming language, only of a particular binary format, the .class fileformat. Java does not even need to be the source language; in facts, imple-mentors of other high-level languages are turning to the Java Virtual Machineas a delivery vehicle for their languages. A .class file contains Java VirtualMachine instructions (or bytecodes) and a symbol table, as well as other an-cillary information.

2.3.1 Syntax and semantics

A .class file contains information about a class and its members. The codeof a method lies in the Code attribute of some method info descriptor in the.class file. It consists of a sequence of bytecode instructions, that is, bytestrings whose first byte is the opcode and the rest identifies parameters.

States and frames

Instructions are executed on states which contain local information, globalconstants and an operand stack (which is empty at the beginning of themethod) to perform operations. A method is associated to a frame:

X ∈ Xloc : Iv 7→ V local variablesO ∈ O : stack of V operand stacksR ∈ R : Ir 7→ V runtime constant pools

F = 〈X,O,R〉 ∈ F : Xloc ×O ×R frames

Frames are put in a framestack FS ∈ SF ; in the following, when no push-ing/popping is done on the framestack, only the frame on top of the stack isshown.

Instructions

The semantics for most bytecode instructions is given. In 〈〈X,O,R〉〉, theframe 〈X,O,R〉 is on top of the framestack and the other frames are notvisualized. v : O is the operand stack obtained by pushing v onto O. X [j] isthe local variable with index j.

2.3 Java bytecode 13

Jaconst nullK〈〈X,O,R〉〉 = 〈〈X,null : O,R〉〉

JdupK〈〈X,v:O,R〉〉 = 〈〈X, v : v : O,R〉〉

JnopK〈〈X,O,R〉〉 = 〈〈X,O,R〉〉

JpopK〈〈X,v:O,R〉〉 = 〈〈X,O,R〉〉

JswapK〈〈X,v1:v2:O,R〉〉= 〈〈X, v2 : v1 : O,R〉〉

Jaload jK〈〈X,O,R〉〉 = 〈〈X,X [j] : O,R〉〉

Jastore jK〈〈X,v:O,R〉〉 = 〈〈X [j ← v] , O,R〉〉

J getfield jK〈〈X,o:O,R〉〉 = 〈〈X,u : O,R〉〉

J getstatic jK〈〈X,O,R〉〉 = 〈〈X,u : O,R〉〉

Jputfield jK〈〈X,v:o:O,R〉〉 = 〈〈X,O,R〉〉

Jputstatic jK〈〈X,v:O,R〉〉 = 〈〈X,O,R〉〉

In the rules for putfield and putstatic, modifications to instance or static dataon the heap are not shown. In the rule for getfield j (respectively, getstatic j),after the symbolic field reference R[j] has been resolved, u is the value of theinstance (resp. static) field R[j] of the object referenced by o (resp. in theresolved class).

J invokestatic jK〈〈X,v1:..:vk:O,R〉〉 =

⟨

〈X ′ [k ← vk] , ∅, R′〉

〈X,O,R〉

⟩

J invokevirtual jK〈〈X,o:v1:..:vk:O,R〉〉 =

⟨

〈X ′ [k ← vk] , ∅, R′〉

〈X,O,R〉

⟩

The pushed frame is (i) for invokestatic, the one corresponding to the classmethod; (ii) for invokevirtual, the instance method of o. After the invokedmethod has terminated its execution, the final state is 〈X,u : O,R〉 if thevalue u is returned, 〈X,O,R〉 otherwise.

JareturnKfi

〈X,v:O,R〉

〈X′,O′,R′〉

fl = 〈〈X ′, v : O′, R′〉〉

JreturnKfi

〈X,O,R〉

〈X′,O′,R′〉

fl = 〈〈X ′, O′, R′〉〉

Jldc jK〈〈X,O,R〉〉 = 〈〈X,R[j] : O,R〉〉

Jnew jK〈〈X,O,R〉〉 = 〈〈X, o : O,R〉〉

14 2 Preliminaries

2.4 Lattice and fixpoint theory

This section introduces basic notions on relations, functions, orderings andfixpoints. Its main use is in defining the Abstract Interpretation theory inSection 3.1.

2.4.1 Relations

Let S, T and U be sets. The powerset ℘(S) is the set of all subsets of S:℘(S) = {X | X ⊆ S}. The cartesian product S × T is the set of pairs (s, t)such that s ∈ S and t ∈ T . A binary relation on S × T is a subset of thecartesian product: R ∈ ℘(S × T ). Similarly, a k-ary relation is a subset ofS1 × · · · × Sk. A binary relation R on S × S can be:

• reflexive if ∀s ∈ S. (s, s) ∈ R• transitive if ∀s, t, u ∈ S. ((s, t) ∈ R ∧ (t, u) ∈ R)⇒ (s, u) ∈ R• symmetric if ∀s, t ∈ S. (s, t) ∈ R⇔ (t, s) ∈ R• antisymmetric if ∀s, t ∈ S. ((s, t) ∈ R ∧ (t, s) ∈ R)⇒ s = t.

2.4.2 Orderings and lattices

A partial order vS (or simply v when no confusion arises) is a binary relationon S × S which is reflexive, transitive and antisymmetric. Usually, s vS t isa shorthand for (s, t) ∈ vS. We write s @ t if s v t and s and t are distinct.Partial orders are also represented by the symbols ≤ (<) or � (≺).

Let S(v) be a poset (shorthand for partially ordered set), that is, a setS with a partial order v on its elements. The upper bound of a subset Xof S is an element u ∈ S which is greater or equal than all elements of X(∀t ∈ X. t v u). u is the least upper bound (lub) tX if, for every upper boundu′ of X , u v u′ holds. Dually, the lower bound and the greatest lower bound(glb) are defined by considering w instead of v.

An ascending chain (or, equivalently, an increasing chain) is a subset ofS such that v is a linear order on X , that is, any two elements of X arecomparable: ∀x′, x′′ ∈ X. x′ v x′′ ∨ x′′ v x′. A complete partial order (cpo)is a poset such that every ascending chain has a least upper bound. Givena poset S, its Moore closure M (S) is the set {uX | X ⊆ S}. S is a Moorefamily if S =M (S), that is, if it contains the glb of all its subsets.

A complete lattice L(v,>,⊥,u,t) is a poset such that uX and tX existfor every X ⊆ L (then, it is also a Moore family for both glb and lub). > and⊥ are the top (maximum) and bottom (minimum) elements; they are definedas, respectively, tL and uL.

2.4 Lattice and fixpoint theory 15

2.4.3 Functions

A function ϕ : S 7→ T is a particular binary relation on S × T , having theproperty that, for every s ∈ S, there exists at most one t ∈ T such that(s, t) ∈ ϕ. Such an element t can be written, for example, as ϕs, ϕ(s) or ϕ JsK.If ϕ : S 7→ T , it is said that ϕ maps S to T or, equivalently, that ϕ is a mapof S to T . Functions can have more than one parameter, that is, be a subsetof non-binary relations on S1 × · · · × Sk × T . If ϕ : S1, . . . , Sk 7→ T , then k issaid to be its arity, that is, the number of its parameters. For ϕ : S 7→ T andX ⊆ S, the expression ϕ(X) often stands for {ϕ(x) | x ∈ X}. This set is saidto be the image of X by ϕ.

A function ϕ : S 7→ T is total if its value is defined for every value s of theparameter (or, for every s1, . . . , sk if ϕ has arity k): ∀s ∈ S. ∃t ∈ T. (s, t) ∈ ϕ;otherwise, ϕ is said to be partial. It is monotone if s1 vS s2 implies ϕ(s1) vTϕ(s2); S

m7→ T is the set of all monotone functions from S to T .

A function is upper- (lower-) continuous if it preserves existing least upper(greatest lower) bounds of ascending (descending) chains: ϕ(tSX) = tTϕ(X)(resp. ϕ(uSX) = uTϕ(X)), where ϕ : S 7→ T and X ⊆ S is an ascending(descending) chain. Continuity implies monotonicity.

A function ϕ : S 7→ S is extensive if its result is greater than its parameter:∀s ∈ S. s v ϕ(s). Its dual notion is reductivity. Also, ϕ is idempotent if∀s ∈ S. ϕ(ϕ(s)) = ϕ(s).

If s′ 6= s′′ implies ϕ(s′) 6= ϕ(s′′), then ϕ : S 7→ T is said to be injective,or an injection of S in T . Moreover, if ∀t ∈ T. ∃s. ϕ(s) = t holds, then ϕ issurjective, or a surjection of S on T . A bijection is both injective and surjective.Given two posets S and T , the map ϕ : S 7→ T is an order isomorphism(or simply an isomorphism, when no confusion arises with isomorphisms onbinary operations) if it is a bijection and preserves the order of S in T , thatis, ϕ(s1) ≤T ϕ(s2) if and only if s1 ≤S s2. S and T are said to be isomorphic.

Given two functions ϕ′ and ϕ′′ mapping S to T , when no confusion arises,ϕ′ v ϕ′′ stands for the pointwise ordering ∀s ∈ S. ϕ′(S) vT ϕ′′(s).

Given ϕ′ : T 7→ U and ϕ′′ : S 7→ T , the composition ϕ′ ◦ ϕ′′ : S 7→ U isdefined as ∀x. (ϕ′ ◦ ϕ′′)(x) = ϕ′(ϕ′′(x)). Composition preserves monotonicity

(ϕ′ ∈ Tm7→ U and ϕ′′ ∈ S

m7→ T implies ϕ′ ◦ϕ′′ ∈ S

m7→ U), as well as continuity

(the composition of two continuous functions is continuous) and isomorphism(composing two isomorphisms yields an isomorphism).

2.4.4 Fixpoints

A fixpoint s ∈ S of ϕ : S 7→ S is such that ϕ(s) = s. Fixpoints are elementsof the set ϕ= = {s ∈ S | ϕ(s) = s}. The least fixpoint lfp (ϕ) is the uniques such that ∀t ∈ ϕ=. t w s. The notion of greatest fixpoint gfp (ϕ) is dual.By Tarski’s fixpoint theorem, if ϕ is upper continuous on a complete lattice,the least fixpoint can be obtained (in no more than ω steps, where ω is the

16 2 Preliminaries

cardinality of natural numbers) as the limit of an ascending chain startingfrom the minimum:

lfp (ϕ) = tn≥0 ϕn(⊥) where ϕ0(s) = s and ϕn+1(s) = ϕ(ϕn(s))

We write s = lfp⊥ (ϕ) to make clear that s is obtained as the limit of{ϕn(⊥)}n≥0.

3

Theoretical foundations and state of the art

It is a very sad thing that nowadays there is so little useless informa-tion.

Oscar Wilde (1854 - 1900)

People will accept your ideas much more readily if you tell them Ben-jamin Franklin said it first.

David H. Comins

This chapter introduces the main theories which are used throughout thiswork. Abstract Interpretation (Section 3.1 formalizes approximations on pro-gram data and provides a framework for program analysis. The security re-quirements we are going to investigate are described by the notion of Infor-mation Flow analysis (Section 3.2); the security property of Non-Interferenceis presented in its abstract version (Section 3.3). Next, an introduction toProgram Slicing (Section 3.4) is included, because of its tight relation withInformation Flow. Our work is directed to program certification, in the styleof Proof-Carrying code, whose basic notions are introduced in Section 3.5.Finally, the state of the art is widely discussed in Section 3.6; the focus ismainly on current research in Information Flow analysis of Object-Orientedprogramming languages.

3.1 Abstract Interpretation

Abstract Interpretation was introduced by Patrick and Radhia Cousot in1977 [19]. It is a theory of semantic approximation for systematically derivingsemantic-based program analysis algorithms. Program analyzers rely on the

18 3 Theoretical foundations and state of the art

notion of abstract property, that is, a static approximation of the dynamicprogram properties.

Abstract Interpretation was refined, extended or generalized in several pa-pers: (i) systematic derivation of program analysis frameworks [20, 22, 23, 25];(ii) higher-order functional languages [24, 46]; (iii) logic languages [21]; (iv)types and type systems [18]. Further research steps on abstract domains wereproposed by File, Giacobazzi, Mastroeni, Quintarelli, Ranzato and Scozzariin a series of papers [35, 36, 38, 37, 39, 34, 30, 33] (see [27] for a brief survey).

3.1.1 Program analysis frameworks

Abstract Interpretation is intended as a theoretical framework which allowsto express a variety of program analyses into a unified formalism. Its mainutilization is to design automatic program analyzers for statically determiningdynamic program properties. This is the main goal of Static Analysis, that is,a class of techniques which aim to study programs without executing them.Abstract Interpretation defines static analyses, which can be compared or sys-tematically modified/refined by means of the mathematical tools underlyingthe theory.

Static analysis should be decidable and efficient. The well-known, earlyfoundational works in computer science by Turing, Kleene, Church, Post,Godel and others, showed that most relevant program properties cannot beexactly decided by another program. The most famous example is the haltingproblem: no algorithm can tell, for every program P , whether the execution ofP will terminate. Undecidable properties often represent the most interestinginformation we want to acquire about a program. Static analyzers are forcedto compute approximated properties with a soundness constraint; that is, thequestion does the program P have the property π? cannot be affirmativelyanswered unless π is indeed satisfied at the semantic, unapproximated level.Due to approximation, the inverse direction can be false, that is, it is possiblethat P satisfies π even if the analyzer cannot tell. Abstract interpretationformalizes the approximation of properties, and specifies methods to derivestatic analysis algorithms.

3.1.2 Principles of Abstract Interpretation

The classical framework for Abstract Interpretation starts from a semanticsdescribing relevant aspects of program behavior. Several semantics [33] (op-erational, denotational, traces. . . ) can be chosen, depending on the programproperties we are interested in.

Semantics of Turing-equivalent programming languages are undecidable;usually, fixpoints (Section 2.4) describe the behavior of potentially non-terminating programs. Abstract Interpretation provides means for approxi-mating a concrete fixpoint semantics into an abstract semantics. This tech-nique can be seen as the simplification of the equation involved in the concretesemantics into an approximated abstract equation.

3.1 Abstract Interpretation 19

Approximation of concrete program properties

We assume concrete program properties to be described by elements of a givendomain P [, equipped with an ordering relation

4[ = {(π1, π2) | π1 is more precise than π2}

The notion more precise can be translated as is satisfied by a smaller setof values (which can be programs, program data etc.). Analogous definitionshold for abstract properties 〈P ],4]〉.Example. [Signs] Let P [ be the set

{[⊥] , [neg] , [0] , [mos] , [nonPos] , [non0] , [nonNeg] , [>]}

which is intended as the value of some numerical variable x in a program.Therefore, the meaning of properties is [⊥] = ∅1, [0] = {0}, . . . , [>] = Z;that is, a property is the set of values satisfying it, so that [⊥] 4[ [0]since ∅ ⊆ {0}. A possible approximation of P [ could be the domain P ] =

{[

⊥]]

,[

nonPos]]

,[

0]]

,[

nonNeg]]

,[

>]]

} with[

nonPos]] ∆

= [nonPos]

and[

nonNeg]] ∆

= [nonNeg]. The abstract set P ] does not consider strictinequalities, then some loss of information occurs. 2

In many cases, when semantic values belong to a domain D, the set ofconcrete properties is P [ = ℘(D), that is, all possible properties of values areconsidered. The semantics of abstract properties is given by the concretizationfunction γ : P ] 7→ P [, which yields the concrete property corresponding toa given abstract property (for example, γ(

[

nonPos]]

) = [nonPos]). The

best abstract approximation of some property π[ is given by the abstractionfunction α(π[), where α : P [ 7→ P ] formalizes the idea of approximation.Example. [Signs, continued] The abstraction and concretization functions forthe previous example are shown in Figure 3.1. 2

The soundness of approximations, that is, the fact that π] is a valid ap-proximation of π[, is expressed as α(π[) 4] π] or, equivalently, π[ 4[ γ(π]).When these two conditions are equivalent, α and γ are said to form a Galoisconnection.

Definition 3.1 (Galois connection). Given two posets 〈P [,4[〉 and 〈P ],4]

〉, a Galois connection is a pair of maps α : P [ 7→ P ] and γ : P ] 7→ P [

satisfying

∀π[ ∈ P [, π] ∈ P ]. α(π[) 4] π] ⇐⇒ π[ 4[ γ(π])

1 In the following, the assertion V = X, for an abstract value V and a set X, is tobe intended as γ(V ) = X


α γ

concrete domain abstract domain

[neg]

[⊥]

[pos]

[0]

[non0]

[

nonPos]]

[

0]]

[

nonNeg]]

[

>]]

[

⊥]]

[>]

[nonNeg]

[nonPos]

Fig. 3.1. P [, P ] and their α and γ functions

In this case, we write

〈P [,4[〉 −−−→←−−−αγ〈P ],4]〉

The functional compositions γ ◦ α and α ◦ γ are, respectively, extensive andreductive (Section 2.4):

∀π[ ∈ P [. π[ 4[ γ(α(π[)) ∀π] ∈ P ]. α(γ(π])) 4] π]

It follows that α and γ are monotone. Moreover, in a Galois connection, oneof the two functions induces the other (the dual formula also holds):

〈P [,4[〉 −−−→←−−−α1

γ1〈P ],4]〉 ∧ 〈P [,4[〉 −−−→←−−−

α2

γ2〈P ],4]〉 =⇒

(α1 = α2)⇒ (γ1 = γ2)


Galois connections define closure operators. An upper closure operator (uco)is a monotone, extensive and idempotent map; a lower closure operator (lco)is reductive instead of extensive.

P [ −−−→←−−−αγ

P ] ⇒ α ◦ γ is an lco, γ ◦ α is a uco

Upper closure operators ρ ∈ uco(P [), for a given concrete domain P [, willbe used in the rest of this work. The set ρ(P [) = {ρ(π[) | π[ ∈ P [} is the setof fixpoints of ρ (by idempotence). If P [ is a complete lattice, then ρ(P [) is aMoore family; ρ(P [) is isomorphic to P ].

The uco-based approach to approximation is equivalent to the Galois con-nection approach. The former does not need the existence of some domain ofabstract properties P ] to be made explicit. Instead, all abstract computationscan be defined on a subset of the domain of concrete properties. The domainof concrete properties is called the concrete domain; depending on the con-text, either P ] or ρ(P [) will be denoted as the abstract domain. Moreover, anabstract domain will be often identified with its upper closure operator. Theelements of an abstract domain are its abstract values.Example. The set of abstract properties P ] of Example 3.1.2 can be describedby the upper closure operator ρ (when describing the same property V ] iswritten as V ):

ρ : ℘(Z) 7→ ℘(Z)ρ(∅) = ∅ ≈ [⊥]

ρ({0}) = {0} ≈ [0]ρ(X) = {n | n ≥ 0} ≈ [nonNeg] if ∀x ∈ X. x ≥ 0ρ(X) = {n | n ≤ 0} ≈ [nonPos] if ∀x ∈ X. x ≤ 0ρ(X) = Z ≈ [>] otherwise

2

3.1.3 Abstract domains

Given a concrete domain P [ which is a complete lattice, the set of upperclosure operators (or, equivalently, of Galois connections) is also a completelattice ordered with respect to precision: the smaller a uco, the smaller theloss of information induced by the abstraction process.

ρ1 ≤ ρ2∆= ∀p ∈ P [. ρ1(p) 4

[ ρ2(p)

Some operators are defined, which systematically refine, simplify or combineabstractions: among them, reduced product, disjunctive completion, comple-mentation. We only introduce the reduced product.


Definition 3.2 (Reduced product). Let P ]i , with i ∈ I, be abstract do-mains designed over the concrete domain P [ by using Galois connectionsor upper closure operators. The reduced product ui(P

]i ) is the more abstract

among the domains which are more concrete than every P ]i . It is defined as

the Moore closure (Section 2.4) of the set⋃

i P]i or, equivalently, by the upper

closure operator

ρ(π) =∧

i

ρi(π)

The reduced product of a set of domains is the simplest domain which incor-porates all the relevant information of the P ]i .

Example. Let P ]1 be the abstract domain of signs (P ] in Example 3.1.2) and

P ]2 be the parity domain {[>] , [even] , [odd] , [>]}, with the obvious meaning

for abstract values. The reduced product P ]1 u P]2 captures both the sign and

the parity of numbers:

P ]1 u P]2 = { [⊥] , [0] , [nonPosEven] , [nonPosOdd] ,

[nonNegEven] , [nonNegOdd] , [>] }

2

3.1.4 Property transformers and completeness

Galois connections can be lifted from sets of properties to sets of monotone

property transformers: if P [A −−−−→←−−−−αA

γA

P ]A and P [B −−−−→←−−−−αB

γB

P ]B are Galois connec-

tions, then

P [Am7→ P [B −−−−−−−−−−→←−−−−−−−−−−

λφ.αB◦φ◦γA

λψ.γB◦ψ◦αA

P ]Am7→ P ]B

where the orderings 4A and 4B on functions are pointwise. The functionγB ◦ψ ◦αA is said to be the best correct approximation of ψ, that is, the mostprecise function approximating ψ with respect to αA and γB .

This lifting can be used to approximate functions, functionals, etc. Ingeneral, the approximation must be sound, that is, for some f [ ∈ P [A

m7→ P [B

and f ] ∈ P ]Am7→ P ]B , the following condition holds:

αB ◦ f[ 4B f ] ◦ αA

Soundness implies that the result of applying the abstract function is a cor-rect approximation of the concrete result. If equality holds, then we havecompleteness. Completeness is an important property of abstractions, mean-ing that, relatively to the underlying abstract domains, the abstract functionis as precise as possible.


Example. The sign abstraction (Example 3.1.2) of properties ℘(Z) is notcomplete with respect to the sum: the Galois connection is defined as

℘(Z)× ℘(Z)m7→ ℘(Z) −−−−−−−−→←−−−−−−−−

λφ.α◦φ◦γ

λψ.γ◦ψ◦αsigns× signs

m7→ signs

where α and γ induce the Galois connection ℘(Z) −−−→←−−−αγ

signs. The sum

function f and its approximation f ] = α◦f ◦γ do not satisfy the completenesscondition:

α(f(3,−2)) = [nonNeg] 6= [>] = f ](α(3), α(−2))

On the other hand, it turns out that this abstraction is complete with respectto product. 2

3.1.5 Fixpoint approximation

Let f ∈ P [ 7→ P [ be a continuous function, and assume we are interested inapproximating its fixpoint lfp⊥[ (f) (Section 2.4), obtained as the limit ofthe iteration sequence

f0 = ⊥[ fn+1 = f(fn) lfp⊥[ (f) =⊔

n≥0

fn

It is natural to obtain the approximation by computing the abstract imageof the sequence α(f0), α(f1), α(f2), . . . If the approximated function f ] iscomplete with respect to f (that is, α◦f = f ]◦α), then the fixpoint abstractionα(lfp⊥[ (f)) can be obtained by the sequence

f ]0 = α(⊥[) f ]n+1 = f ](f ]n) lfpα(⊥[)

(

f ])

=⊔

n≥0

f ]n

If f ] is not complete, then the limit of the abstract sequence can be greaterthan α(lfp⊥[ (f)).

Widening

Computing the limit of the abstract iteration sequence is not guaranteed toterminate. Widening operators [23] are used in computing abstract iterationsequences; they enforce the condition that every ascending sequence reachesa fixpoint in a finite number of steps.

Definition 3.3. Given a domain L in which there exist infinite ascendingchains (Section 2.4), the operator O : L× L 7→ L is a widening if it satisfies


∀x, y ∈ L. x v xOy∀x, y ∈ L. y v xOy

for all ascending chains x0 v x1 v . . . , the ascending chaindefined by y0 = x0, yn+1 = ynOxn+1 is not strictly ascending.

Applying O to an infinite strictly ascending chain (that is, such that ∀n. xn @nn+1) builds another chain whose fixpoint is finitely reachable (that is, thereexists an n such that yn = yn+1).

In the following, we will consider a bounded iteration, in which a maximumnumber of steps n is fixed. Unless the fixpoint has been reached before n steps,the n-th iteration is ulteriorly approximated in order to guarantee soundness.

3.1.6 Program properties

Semantics is the more concrete among the properties of a program. A pro-gram semantics identifies a set of programs which are semantically equivalent.Depending on how much the semantics is detailed [17] (for example, trace se-mantics is more detailed than input-output semantics since it can distinguishprograms by their partial computations, not only by their final result), theset of semantically equivalent programs can be larger or smaller.

Given a semantics, JK, the collecting semantics JKC is a set extension of JK:

JPKC = {JPK}The collecting semantic can be seen as a very precise property of programs. Inprogram analysis, we can be interested in less precise (but computable) prop-erties; for example, requiring that the variable x at program point p has positivevalue identifies a set of program semantics which can be non-equivalent withrespect to other properties.

Let P [ = ℘(

{JPKC}P)

be the concrete domain of program semantics. An

abstract semantics is an approximation of the collecting semantics. Given a setof abstract properties P ] and a program P , JPK] is the most precise abstract

property which is satisfied by P . Soundness implies that JPK] must be greater

than or equal to α(JPKC). In general, given an algorithm for computing theabstract semantics, it can be that the computed abstract property is lessprecise than JPK]; this is generally due to incompleteness, that is, some lossof precision induced by abstract computations.Example. Let P ] be the set {

[

⊥]]

, [pos] , [nonPos] ,[

>]]

}. Every program Pwith positive final value for x will be classified in the same class with respectto P ]. 2

Type systems are an interesting example of abstract semantics [18]: thevalues of data are abstracted into types, so that two concrete values are in-distinguishable as long as they have the same type.

3.2 Information Flow Analysis 25

3.1.7 Typical (numerical) domains

We define here some abstract domains which will be useful in the following. Ifnot differently specified, the concrete domain is intended to be C = ℘(Z). Wewill mainly refer to upper closure operators rather than Galois connections.Abstract domains are usually denoted by A; when no confusion arises, ρ willalso stand for both the uco and the set of its fixpoints (equivalently, abstractproperties or abstract values), that is, the corresponding abstract domain.For an abstract value V , we write V = X or V ≡ X , where X ∈ C, if Xis the set of concrete values which are described by V (in terms of Galoisconnections, X = γ(V )). Moreover, V = V ′ ∪ V ′′ is the value such thatγ(V ) = γ(V ′) ∪ γ(V ′′).

Numerical domains of interest are:

• Sign, zero excluded:

As = ρs = {[>] , [pos] , [neg] , [⊥]}

where [>] = Z, [⊥] = ∅, [pos] = {n | n ≥ 0} and [neg] = {n | n < 0}.• Sign, zero included:

Asz = ρsz = {[>] , [pos] , [neg] , [0] , [⊥]}

where [pos] = {n | n > 0} and [0] = {0} (the other values being equalto As). This domain might also include [nonPos] = [neg] ∪ [0] and[nonNeg] = [pos] ∪ [0].

• Parity:Ap = ρp = {[>] , [even] , [odd] , [⊥]}

where [even] = {n | n is even} and [odd] = {n | n is odd}.• Parity and Sign:

Aps = ρps = {[>] , [even] , [odd] , [pos] , [neg] ,[poseven] = [pos] ∩ [even] ,[posodd] , [negeven] , [negodd] , [⊥]}

which is the reduced product of As and Ap.

The partial ordering on abstract values is illustrated in Figure 3.2.

3.2 Information Flow Analysis

The problem of protecting the confidentiality of information manipulated bycomputing systems is becoming more and more important [67]. The currentpractice of software production gives little assurance that programs satisfyconstraints on how data are released (that is, how information flows). One ofthe most promising approaches to Information Flow security is based on the


[>]

[⊥]

[odd]

[>]

[⊥]

[pos][neg]

[even]

As

Ap

[>]

[nonNeg]

[pos]

[⊥]

[0]

Asz

[neg]

[nonPos]

[odd]

[⊥]

[neg][pos]

[negeven] [posodd]

Aps

[even]

[negodd][poseven]

[>]

Fig. 3.2. The partial ordering on the example abstract domains

use of programming-language techniques for specifying and enforcing securitypolicies (Language-based security). Language-based mechanisms are especiallyinteresting because of the inadequacy of standard security techniques in pro-tecting confidential information in large, networked information systems suchas the growing realm of web-based services.

Conventional security mechanisms, such as Access Control [48], requirethat agents have some privileges in order to access protected data. However,in a large computing system, assuming that all programs are trustworthyis simply not realistic: we are not guaranteed that checked programs (thatis, which can indeed exhibit any certificate) will not leak information whenmanipulating confidential data. Attention must also be paid to how data prop-agate, in order to avoid that secret data become visible (by error or malice) to

3.2 Information Flow Analysis 27

untrusted users. The language-based approach to these security requirementsrelies on program analysis. Analysis should guarantee that information con-trolled by a confidentiality policy will not flow to locations which violate thepolicy (that is, which can be visible to unauthorized users). The main securityproperty with respect to Information Flow is Non-Interference, introduced byGoguen and Meseguer [40] as:

One group of users [...] is noninterfering with another group ofusers if what the first group does [...] has no effect on what the secondgroup of users can see

Non-Interference relies on the separation of program data into private (orhigh-security, H) and public (or low-security, L). Public data are those whichare directly visible by external agents. On the other hand, the value of privatedata should not be acquired, directly or indirectly, by untrusted, possiblymalicious users. The usual framework underlying many formal definitions ofNon-Interference identifies data with program variables: a program preservesconfidentiality if, by observing the value of public variables after the programexecution, an attacker is not able to acquire information about the initialvalue of private variables.

A formal definition of Non-Interference in IMP is:

∀ h′, h′′ ∈ VH, ∀ l ∈ VL. (JPK (h′, l))L

= (JPK (h′′, l))L

where h and l are values for, respectively, the private variable vector hand the public variable vector l; JPK (vh, vl) is the semantics of a programP for some private and public input vh and vl; ()L is its restriction to publicvariables (that is, to what attackers can see). This means that executing twiceP with different values for the private input results in two final states whichcannot be distinguished by external, untrusted users. Then, attackers are notable to infer the value of h by watching the final value of l.Example. Let P be the IMP program below.

l 1 = 1 ;while ( h1 > 0) do {

l 1 = l 1 + 1 ;h1 = h1 − 1;

}l 2 = l 1 + h2 ;

P has two private variables h1 and h2, and two public variables l1 and l2.Both low-security variables have, after the execution, values which clearlydepend on private input values. There is a direct information flow from h2to l2, by means of the variable assignment in the last statement. Moreover,the final value of l1 clearly depends on h1 even if no forbidden assignmentto the public variable occurs. This is an indirect form of information flow, in


which confidential information is leaked to public variables whose modificationdepends on private data. In this example, whether the increment of l1 isexecuted depends on h1. Analysis should also pay attention to transitive flows:l2 depends on h1 because it depends on l1. 2

Studying the information flow of programs is essentially equivalent to com-puting dependencies between data [14, 2]: non-interference implies that publicdata do not depend on private data. One of the most practiced techniques forinformation flow analysis is the use of type systems [71, 70, 1]: a programsatisfies the security requirements if it is typable with respect to some secu-rity type definition. Program analysis for Non-Interference has been applied toprogramming languages such as, among others, Java [54, 56, 6], Java bytecode[11], lambda-calculus [43] and ML [63].

3.3 Abstract Non-Interference

This section describes security properties obtained by weakening the notionof Non-Interference in order to model realistic confidentiality policies.

3.3.1 Weakening Non-Interference

The standard notion of Non-Interference turns out to be too strong a require-ment for software production. Keeping public and private data completelyseparate can be unnatural and lead to limitations in the design of programs.In practice, some selected information leaking should often be allowed in realprograms.Example. If the access to some functionalities of the programP is regulated bypassword checking, the value of the password is clearly confidential. However,by trying to access the service, an external user acquires information aboutthe password, namely, that it is different from his or her attempts. Such aprogram would not satisfy the standard property of Non-Interference; yet, itis perfectly reasonable. 2

Weakened forms of Non-Interference aim to specify which information flowsfrom high-security to low-security data should be allowed. Many attemptshave been proposed in this direction; among them, Declassification [75] andRelaxed Non-Interference [51].

3.3.2 Attackers as abstract interpretations

Non-Interference can be weakened by assuming that some dangerous flowsare permitted since they cannot be exploited by attackers. That is, external,

3.3 Abstract Non-Interference 29

unauthorized observers can only see public data up to a certain degree of pre-cision. This is the assumption underlying Abstract Non-Interference (ANI)[31], a confidentiality policy introduced by Giacobazzi and Mastroeni in 2004.Abstract Non-Interference is based on Abstract Interpretation; it is param-eterized on the precision (that is, the observational power) of attackers inobserving the program execution. It is natural, in the Abstract Interpretationframework, to describe the observational power by means of the approxima-tion induced by Galois connections. Attackers have access to an approximatedversion of the program semantics, that is, they can be described as abstractinterpretation of programs.Example. Let l be a public integer variable of the program P . If an attackerA can observe the value of l before and after the execution of P , then he orshe has the maximum observational power on the datum. On the other hand,if A can only observe some property π of l, then two executions in which lhas the same value with respect to π (although, possibly, different concretevalues) cannot be distinguished. Since a property on data can be representedby an upper closure operator ρ, we say that the precision of A with respectto l is ρ: the attacker cannot distinguish two values v1 and v2 if ρ(v1) = ρ(v2)(that is, if they are represented by the same abstract value or, equivalently,have the same property π). 2

In general, the observational power of attackers can be different for everypublic variable, and different between input and output. The following formaldefinition is the condition for a program to satisfy Abstract Non-Interferencewith respect to observational power η on the input and ρ on the output (withthe same precision for all public variables).

Definition 3.4 (Abstract Non-Interference). A program is secure forANI with respect to η and ρ (written [η]P (ρ)) if

∀ h′, h′′ ∈ VH, ∀ l′, l′′ ∈ VL. η (l′) = η (l′′)

=⇒ ρ(

(JPK (h′, l′))L)

= ρ(

(JPK (h′′, l′′))L)

If attackers cannot distinguish the public input, it is impossible for them toacquire information about private data by observing the output.

Example. Let the observational power of an attacker A be described by ρsz .Then, the program P of Example 3.2 satisfies ANI for the variable l1. In facts,the sign of l1 is always positive and, therefore, does not depend on privatedata. On the other hand, an illicit flow is still possible with respect to l2,since no hypotheses can be made on the initial value of h2. Consequently, Pis to be rejected even for the weakened security property; yet, some flows areno longer detected as dangerous. 2


3.3.3 Narrow and Abstract Non-Interference

Definition 3.4 allows some information flows to be detected as dangerous evenif they are motivated by a change in public instead of private data. That is, itis possible that a flow from public data is considered as coming from privatevariables; these flows are named deceptive flows.Example. Let l be the only public variable in P . Suppose an attacker canobserve the parity ρp of l on the input and its sign ρs on the output. Supposealso l is not modified inside P ; then, it is clear that no forbidden informationflows should be detected. Yet, the ANI condition, as defined in Definition 3.4,does not hold since ρp(−4) = ρp(6) but ρs(−4) 6= ρs(6). 2

To deal with this problem, a new definition of Non-Interference must beprovided: namely, the Abstract Non-Interference, opposed to the Narrow Non-Interference [31] of Definition 3.4.

Definition 3.5. A program is secure (written (η)P (φ 8ρ)) if, for everyh′, h′′ ∈ VH, l′, l′′ ∈ VL:

η (l′) = η (l′′) =⇒ ρ(

(JPK (φ(h′), η(l′)))L)

= ρ(

(JPK (φ(h′′), η(l′′)))L)

The semantics of P is computed on abstract values with φ and η.

The idea is to model only information flows generated by the variationof private values. This is obtained by computing the program semantics onabstract values. This refined definition of ANI has no deceptive flows: if anattacker can observe the property η of public input and ρ of public output,then no information flow concerning the property φ of private input interferesin the observable property ρ, under the assumption that the public inputproperty η does not change. The following holds:

[id]P (id) =⇒ [η]P (ρ) =⇒ (η)P (id 8ρ) =⇒ (η)P (φ 8ρ)

where id = λπ. π is the identity domain. The standard Non-Interferenceproperty can be expressed as [id]P (id), or (id)P (id 8id).

3.4 Program Slicing

Program slicing [72, 45] is a software engineering technique which has beensuccessfully used in many areas as debugging, testing, reverse engineering,reuse and complexity analysis. Its main goal is to get a deeper understandingof a program by finding, for a given behavior (the slicing criterion), the subsetof the program (the slice) which is relevant to that behavior. Program slicingrelies on the notion of dependency between program data: a depends on b ifits value can change due to some changes in b (for dependencies, see Sections

3.4 Program Slicing 31

3.6.2 and 5.2). A statement is not relevant for a criterion if what the statementdoes has not influence on the behavior which is induced by the criterion.

Slicing comes in different versions: static slicing does not make assumptionsabout the input of a program. On the other hand, dynamic slices [47] arecomputed by relying on a specific test case, and are usually smaller. Anothervariant is conditioned slicing [13], in which an input condition represents aset of initial states; a theorem prover extracts the statements which are neverexecuted if the condition induced by the slicing criterion is satisfied.

3.4.1 An algorithm for program slicing

The literature on program slicing is well-known (see [69] for a survey). Here,we illustrate a basic, non-interprocedural algorithm for static slicing [72] ofIMP programs, based on the iterative solution of dataflow equations.

A slice is a correct program which is obtained from the original programby deleting zero or more statements, namely, those which are not relevant toa particular criterion. A slicing criterion is a pair 〈s, v〉 where s is a programpoint (or the program statement or control predicate at that point) and v isan identifier. A slice for a criterion 〈s, v〉 must compute, for v, the same valueas the original program when the program is at s. In the following, sets S ofslicing criteria will be considered.

The algorithm relies on dependencies between statements, which can bedata dependencies (occurring when a statements modifies or defines an iden-tifier which is referred to by another statement) or control dependencies (astatement is control dependent on a control predicate if the predicate hasinfluence on whether the statement is executed.).

A control flow graph CFG is computed for the program. Nodes are programstatements (identified by numbers), edges i→cfg j represent the dependencyof statement j on i. The functions def () and ref () are defined as

def (v := e) = {v}

ref (v := e) = ref ( if e then · else ·) = ref (while e do ·) = vars (e)

Let S be a set of slicing criteria, that is, S = {〈k, v〉 | v ∈ V } for a statementk and some set of identifiers V . This means that we are interested in the valueof identifiers in V at statements k. Then, the following dataflow equations aredefined: for each i→cfg j,

R0S (i) = R0

S (i) ∪ {v | v ∈ R0S (j) ∧ v /∈ def (i)}

∪ {v | v ∈ ref (i) ∧ def (i) ∩ R0S (j) 6= ∅}

S0S = {i | i→cfg j ∧ def (i) ∩ R0

S (j) 6= ∅}

The first formula is the set of directly relevant variables; the second is de-rived from the first and is the set of relevant statements. Iteration starts withR0S (n) = V and R0

S (m) = ∅ for m 6= n, where n is the end node of CFG.


Then, control dependencies are taken into account, that is, those origi-nating from control statements. Let infl (b) the set of statements which areunder the range of influence of a predicate b. Dependencies are propagated asfollows:

BkS ={

b | SkS ∩ infl (b) 6= ∅}

Rk+1S (i) = RkS (i) ∪

⋃

b∈BkS

R0{〈b,v〉|v∈ref(b)} (i)

Sk+1S = BkS ∪

{

i | i→cfg j ∧ def (i) ∩ Rk+1S (j) 6= ∅

}

Rk+1S (i) is obtained by adding to RkS (i) variables which are relevant because

they have a transitive data dependency on statements in BkS (that is, predi-

cates which have influence on statements in SkS).The set SkS is non-decreasing on k; the fixpoint of the iterative computation

is the desired program slice.

3.4.2 Slicing and information flow

An Object-Oriented approach to slicing can be found in [49] and [50]; thesecond paper highlights the link between slicing and information flow. Therelation of slicing with information flow was first outlined by Bergeretti andCarre [12].

Let P ′ be a program which is obtained by another program P by adding,for every private variable h, a statement ih : h := h at the beginning of P (ihis the corresponding program point). These statements play the role of inputstatements. Similarly, for every public variable l, the statement il : l := l isadded at the end of P (representing the output of l). We have that P satisfiesthe Non-Interference condition if the slice of P ′ with respect to the slicingcriterion {〈il, l〉 | l is public} does not contain any of the ih statements, thatis, if the initial value of the private variables does not affect the final value ofpublic variables.Example. The IMP program P , with private variables h1, h2 and publicvariables l1, l2, and its transformation P ′ are shown in Figure 3.3.

If the slice is computed with respect to the public values at the last twoassignments, then it does not contain h2 := h2. However, since it does containh1 := h1, Non-Interference is not satisfied. In facts, the public variable l1indeed depends on the initial value of h1. 2

The relation between Information Flow and Program Slicing will be alsodiscussed (with respect to the abstract notion) in Section 5.6.

3.5 Proof-Carrying Code 33

Fig. 3.3. The program P and its transformation P ′

l 1 = 1 ;while ( h1 > 0) do {

l 1 = l 1 + 1 ;h1 = h1 − 1 ;

}

h1 = h1 ;h2 = h2 ;l 1 = 1 ;while ( h1 > 0) do {

l 1 = l 1 + 1 ;h1 = h1 − 1 ;

}l 1 = l 1 ;l 2 = l 2 ;

3.5 Proof-Carrying Code

The Proof-Carrying code (PCC) framework was proposed by George Neculain 1997 [58, 57, 59, 60, 61]. Through the program certification mechanismprovided by a Proof-Carrying code architecture, a host system is able todetermine whether it is safe to execute a program which is supplied by anuntrusted source.

With the exponential growth of Web-based communication, it happensmore and more frequently that a user be asked to execute untrusted code.Certification should be provided in order to guarantee security propertiesof programs. The mechanism of certificates (that is, a program is executedonly if the source can be trusted) seems not to be sufficient, mainly due tothe impossibility to maintain a correct knowledge of trusted sources. Proof-Carrying code follows a different path: programs are executed if it is possible,for the user, to check safety with respect to some agreed security policy.

3.5.1 The PCC architecture

In Proof-Carrying code, the code producer is required to provide a safetyproof for the program with respect to some security policy. The safety proofcomes along with the program (if the producer and the user agree on thesecurity policy), or is provided on-demand to prove that the program satisfiesthe required security properties. In any case, before executing the code, theconsumer runs a proof validator to determine the validity of the providedproof; if the checks is successful, the program can be safely executed.


Basically, there are no theoretical constraints on which methods should beused in order to obtain, represent and validate the safety proof. The key pointis that validating a proof (which is on the side of the consumer) is, in general,a much cheaper task than obtaining it (which is on side of the producer).An overview of the PCC architecture is shown in Figure 3.4. The part of thediagram which is beyond the dotted line is the trusted component of the PCCarchitecture; that is, the only module the consumer is forced to consider assafe without having a proof. The key advantage of Proof-Carrying code relieson the simplicity and the efficiency of a proof validator, compared to thecomplexity of a theorem prover: the hard work is left to the code producer,which can use various optimizing techniques in order to boost the programanalysis. The verification condition (VC) is a predicate, expressed in somelogic, whose validity is a sufficient condition for ensuring compliance withthe safety policy. The code producer computes VC for proving it, while theconsumer computes it to ensure that the code contains a valid proof.

Fig. 3.4. The Proof-Carrying code architecture

source code

compilation

certification

compiled code

safety proof

security policy

ConditionVerification

executionproof

validation

3.5.2 A novel approach to PCC architecture design

Recently, a new design for a PCC architecture has been proposed by Chang,Chlipala and Necula [15]. Instead of a safety proof, a program is equippedwith a static analyzer for the security property and an analysis certifier, whosesoundness must be checked by some trusted algorithm. The code user runs theanalyzer on the program and checks the correctness of the analysis. According

3.5 Proof-Carrying Code 35

to the authors, this should lead to several benefits in modularity and flexibility;moreover, the analysis certifier is expected to be easier to prove sound thanthe analysis itself.

3.5.3 Application examples

Extension of the ML runtime system

In the original paper [58] an example is provided, in which Proof-Carrying codeallows arbitrary users to use untrusted, foreign functions in the runtime systemof a safe programming language. In practice, when programming in high-levellanguages such as ML or Haskell, the runtime system must be extended withfunctionalities which are written in an unsafe programming language, such asC or even assembly language. Here, the problem is to guarantee that foreignoperations preserve some conditions on data representation. The following MLcode defines a union type T and a sum operation, which computes the sum ofall numbers contained in the T list. Suppose that the runtime system uses, forsum, a hand-optimized assembly version. Then the problem of defining a safetypolicy for ML runtime system extensions arises; the foreign sum function mustbe proven correct with respect to the safety policy.

datatype T = Int of i n t | Pair of i n t ∗ i n tfun sum ( l : T l i s t ) =

l e tfun f o l d r f n i l a = a| f o l d r f ( h : : t ) a = f o l d r f t ( f ( a , h ) )

inf o l d r ( fn ( acc , Int i ) => acc + i

| ( acc , Pair ( i , j )) => acc + i + j )l 0

end

The safety requirement for runtime system extensions is that they preservedata-representation conventions. In ML, data representation is type-directedand can be expressed for this example as follows:

τ ::= int | τ1 ∗ τ2 | τ1 + τ2 | τ list

T represents the type int + (int ∗ int). Conditions on data-representationare expressed in terms of type safety. Integer values are k-bit machine words,depending on the chosen architecture; pairs are represented as pointers to apair of location containing the constructor value (0 for Int, 1 for Pair) andthe value contained in the constructor.

A verification condition is computed in Floyd style by a VC generator; itis based on the precondition and the postcondition for program safety:


pre =` r0 : T list post =` r0 : int

where r0 is the machine register which contains the input list at the beginning,and the final result at the end of the execution. The VC generator uses pre andpost to generate a formula which is a sufficient condition for the preservationof data representation invariants. The assembly code of the sum function isannotated with invariants, that is, conditions to be satisfied at a given programpoint ; for example,

INV r0 : T list

is the precondition before the first code instruction, while

INV r0 : T list ∧ R1 : int

must hold at every loop iteration (r1 contains the value of acc). The soundnessof the VC generator must be proven; that is, given a pre and a postconditionand a set of invariants, the validity of VC implies that the program reads onlyfrom valid memory locations as they are defined by the typing rules.

The safety proof is a derivation of the Verification Condition. EdinburghLogical Framework [42] was advocated as a formalism to represent predicatesand proofs; however, any other reasonable framework can be used. Furtherdetails are omitted.

3.6 The state of the art: Object-Oriented Information

Flow analysis

This section introduces some recent results in the field of Information Flowanalysis, with a particular interest in Object-Oriented programming lan-guages.

Section 3.6.1 largely discusses an approach to Non-Interference analysis inpresence of pointers. This work deals with aliasing, in a more refined way thanexisting type-based approaches. Current research on abstract dependenciesis illustrated in Section 3.6.2. The analysis of low-level languages, togetherwith code certification and the preservation of properties through compilation,are discussed in Sections 3.6.3 and 3.6.4. A short survey of mathematicalformalisms for describing information flow is given in Section 3.6.5. Finally,Section 3.6.6 introduces the only existent approach to ANI verification andoutlines some of its shortcomings.

3.6.1 Analysis with pointer aliasing

A recent paper [4] by Amtoft, Bandhakavi and Banerjee addresses the problemof pointer aliasing [16, 26] in Information Flow analysis of Object-Orientedprograms. Pointer aliasing occurs whenever two identifiers can refer to the

3.6 The state of the art: Object-Oriented Information Flow analysis 37

same memory location(s), as in languages with explicit pointers (C or C++)or Object-Oriented languages (C++ or Java).

In Java, an object of class C identifies a portion of memory which can bereferred by any identifier whose type is compatible with C. Consequently, anymodification to the object via one of the identifiers referring to it propagatesto the other identifiers. Program analysis must account for this problem, byspecifying when it is safe to consider two identifiers x and y as independent,that is, when x and y are guaranteed not to share any memory locations.

Assertions

Independence properties are specified by means of assertions in a Hoare-likelogic. The logic basically relies on two predicates: region assertions and inde-pendence assertions. The first has the form x L, where x is an identifier andL is an abstract location, that is, an abstract representation of a set of heaplocations. Similarly, the assertion L1 .f L2 means that, for every concretelocation l1 which is abstracted by L1, any location l2 contained in the fieldf of l1 is represented by L2. If L1 and L2 are disjoint, then the assertionsx L1 and y L2 imply that x and y cannot alias, since they do shareno heap locations. This is an independence assertion, and can be expressedas xn y (to be read as x is independent of y). Assertions and statements arecombined in triples

{φi} S {φo}

where the assertions φi and φo are valid, respectively, before and after thestatement S. In addition to assertions which are computed axiomatically, pro-grammers are allowed to insert programmer assertions are also, in order tohelp the prover to deal with independence results which are clear but noteasily provable.

Semantics of assertions

States (s, h) contain a store s and a heap h. The extraction relation η relateslocations to abstract locations. It is, actually, an upper closure operator. Theone-state semantics gives a precise meaning to assertions φ by the predicate(s, h) �η φ (l .f is the field f of the location l):

(s, h) �η x L ⇐⇒ (s(x), L) ∈ η(s, h) �η L .f L′ ⇐⇒ ∀l. (l, L) ∈ η ⇒ (l .f, L′) ∈ η

A two-state semantics is also defined, which relates two states with an asser-tion. When there is no ambiguity on y, the assertion xn stands for x n y.The meaning of xn is that the two states which are referred by the two-statesemantics agree on the value of x. As an example of semantic rule,

(s, h) & (s′, h′) �β,η,η′ xn ⇐⇒ (s(x), s′(x)) ∈ β

means that x is constant in (s, h) and (s′, h′) if its location in the two statesis related by a bijection β on locations.


Static independence checking

Dependencies are checked by means of a Hoare-like logic. A judgement of thelogic takes the form

Π ` {φi} S {φo} [X ]

where X is an over-approximation of the abstract locations modified by S,Π is an environment of method summaries {ψi} {ψo} [X ′] which describethe behavior of methods, and {φi} S {φo} is a Hoare triple. An axiomaticsemantics is provided, which computes assertions by means of a set of rules (forexample, {y L} x := y {x L} [{x}] is the rule for variable assignment).Judgements of the logic only refer to what is actually affected by a statement;they are combined by means of a sound frame rule, in the style of SeparationLogic [64].

A sound algorithm sp (S, φi) computes, for a statement S and a precon-dition φi a pair (φo, X) where φo is a postcondition and X approximates theabstract addresses modified by S. Under certain conditions, φo is proven tobe the strongest postcondition.

Discussion

This work uses Hoare logic to define an interprocedural, flow sensitive In-formation Flow analysis for Object-Oriented programs, extending a previouswork [5] on a simple imperative language. The use of region and independenceassertions allows a better treatment of aliasing than current type-based anal-yses [54, 7]. Besides describing Non-Interference issues, assertions also modelaliasing.

Methods are represented by summaries, in form of triples. A method isassigned a set of summaries, since different preconditions and postconditionsmay hold at different call sites. This polyvariance does not, however, dealwith subclassing and dynamical dispatch: these two issues are not taken intoaccount in this work. Dealing with multiple method definitions would probablyimply the necessity of disjunctions in assertions (actually, disjunctions areconfined in programmer assertions).

A collection of rules for computing postconditions is shown. The precon-dition of

{x L, y L′, L .f L′;xn, yn, L .fn} x .f=y {L .f L′;L .fn} [{L .f}]

specially in its L .f L′ component, seems to be a bit strong (assignmentcannot change the abstract location), and satisfied only if abstract locationscontain all type-compatible concrete locations. This could possibly result inlimiting the benefits of this approach in the treatment of aliasing. Also, it isnot clear how the assumption on the existence of a specific location L0 fornew objects restricts the set of programs which can be accounted for in theanalysis.


3.6.2 Abstract dependencies

Abstract dependencies are accounted for in another recent work [65], wherethey are applied to the detection of false alarms in program analysis, and inthe context of semantic slicing [66].

Classical, concrete dependencies (see Definition 5.5, and [45] for a calculuson dependence graphs) keep track of which data can have their value modified,in response of some changes in data they depend on. Given a program P , itstrace semantics collects all the traces 〈s0, . . . , sn〉 ∈ Σ, that is, the executionof P on every possible input state s0, ending in sn. Standard dependenciescan be generalized in two ways, as shown in the following paragraphs.

Observable dependencies

In program analysis, we are often not interested in all data dependencies. Onthe contrary, we want to identify a subset of the program traces and consideronly dependencies which are observable in this subset. Given a semantic sliceE ⊆ Σ and a function E] mapping program points to sets of memory config-urations, we have that E] is an abstraction of E if the memory configurationµ ∈ M belongs to E](l) for all traces 〈. . . , (l, µ), . . . 〉 ∈ E . Intuitively, a depen-dency is observable if it a dependency on the whole set of traces, and it stillexists in the restriction induced in the semantic slice. We are not interested inthis restriction, then all traces will be considered in the following paragraph.

Abstract dependencies

The other restrictive2 definition of dependency is modeled on an abstractproperty: the only dependencies which are taken into account are those mod-ifying the abstract property of dependent data.

Definition 3.6 (Abstract dependencies). Let φ be a denotation, that is,a function mapping M (memory configurations) to ℘(M), and α0 and α1 bethe abstraction functions (with the dual γ0 and γ1) of some abstract domainsD0 and D1 on data. Then, φ induces an abstract dependency of (x1, α1) on(x0, α0) if and only if there exist a memory configuration µ and two abstractvalues da, db ∈ D0 such that

α1(φ(µ [x0 ← γ0(da)])(x1)) 6= α1(φ(µ [x0 ← γ0(db)])(x1))

This definition (similar to Definition 5.6) means that there is an abstractdependency from x0 to x1 if changing the abstract property for x0 in the inputconfiguration of φ results in a different abstract value for x1 in φ output (thatis, after applying the denotation to the initial state).

2 Restrictive in the sense that it identifies a smaller set of dependencies than thestandard notion.


Comparing and computing dependencies

The abstraction functions parameterizing the above definition underline ahierarchy of abstract dependencies: the more precise α0 and α1 are, the moredependencies will be discovered. Having α0 = α1 = λv.v leads to standarddependencies.

Furthermore, definitions are provided, which show how it is possible tocombine dependencies:

• Dependencies on the denotations φ′ (on α0 and α1) and φ′′ (on α1 andα2) can be composed into an upper approximation of the dependencies ofφ2 ◦ φ1 on α0 and α2.

• Given a set of paths from l0 to ln, dependencies can be approximated byfixpoint iteration of local dependencies on all the traces from l0 to ln.

Importantly, aliasing issues are not taken into account.

Discussion

This approach is interesting since it shows how to deal with the computationof abstract dependencies. It is claimed to be tightly connected with Non-Interference and its abstract version, ANI. Moreover, strong connections withprogram slicing are outlined.

Abstract dependency is an approximated property on program data. It isa weak version of the standard notion of dependency since the informationa is said to be relevant to b only if a change in a can modify some abstractproperty (not only the concrete value) of b. In the work of Rival, a numberof techniques is provided for analyzing and composing dependencies, basedon the trace semantics of programs. For example, the dependency analysis ofthe compound statement [s1 ; s2] involves combining dependencies in s1 ands2 via some compositional rule. However, when combining dependencies, anassumption is made on the analysis of simple statements: only a mathemati-cal, set-theoretic definition is provided for the dependency calculus. Then, weneed to fill the gap between a purely semantic, mathematical notion and analgorithmic definition. To the best of our knowledge, there is no other workin the literature addressing this topic, which is among those discussed in Sec-tion 5.2. Clearly, in order to be decidable and as efficient as possible, manyapproximations are unavoidable, thus resulting in some loss of precision.

3.6.3 JVM-like programming languages

Barthe and Rezk [10, 11] propose an information flow type system for a non-trivial fragment of the Java Virtual Machine, including classes, methods andexceptions. The type system is developed in four layers:


1. operations for manipulating operand stacks and jumps [8]; every localvariable has a security level; a notion of indistinguishability is defined.Instructions inside conditional branches can have their security level liftedby the boolean guard.

2. Classes, static fields and static methods (respectively playing the same roleas global variables and procedures); methods are equipped with a security

signature k1k→ k2, where k1 and k2 are, respectively, the expected security

level of the parameters and the result, and k is the effect of the methodon the global memory.

3. Dynamically created objects, method overriding, type casts; here, themain complication is the use of a dynamically allocated heap.

4. Exceptions, which can leak information through abnormal termination.

Soundness proofs are provided for each level, based on the chosen notion ofstate indistinguishability.

Discussion

The proposed information flow analysis accounts for a large subset of Javabytecode (but the treatment of overriding is quite unclear). However, thetype system seems to be overly conservative, particularly when using securitysignatures. Moreover, a common feature of type-based approach is the lack offlow-sensitivity: the existence of an unsafe subprogram is a sufficient conditionfor rejecting a program.

Remark 3.7. Basically, security levels for variables play the same role asboolean variables in Chapter 5. In order to account for more complex in-formation flow properties, such as Abstract Non-Interference, the algorithmwith boolean variables is equipped with mechanisms for computing relevantvariables in expressions. Probably, this can also be applied to this (standard)information flow type-system, in order to analyze ANI.

3.6.4 Code certification

To the best of our knowledge, there does not exist a complete work on codecertification for Information Flow security properties. Since certification dealswith properties of executable programs, rather than source code programs3,a step in this direction can be found in a recent work by Barthe, Basu andRezk [8].

Here, the focus is set on the preservation of security program propertiesthrough . A security type system for a low-level programming language withjumps and calls is provided, modeled on its counterpart on the high-level

3 In facts, verifying source code can help the code producer to build good programs,but is not a guarantee for the end user, since the program comes as executable.


language. This work introduces a compiler, together with a proof that com-pilation preserves security properties. The proof can be seen as a procedureto compute, from a well-typing certificate for the source program, anotherwell-typing certificate for the compiled program.

3.6.5 Mathematical formalisms

Besides security type systems [71], several other techniques have been used inorder to describe the information flow properties of programs. Among them,Hoare logic [5], self-composition with Hoare logic [9] (see below), booleanfunctions [28, 29], dependency calculi [14].

To the best of our knowledge, there does not exist a formal and exhaustivecomparison between all or some of these techniques (for example, with respectto what they can prove, their efficiency, their flexibility etc.).

Self-composition

This technique relies on the following assumption: a program P is secure ifthe sequential composition P ; P ′ satisfies the Hoare triple

{∀x ∈ Xl.x = x′} P ; P ′ {∀x ∈ Xl.x = x′}

where

• P ′ is obtained by P through variable renaming;• Xl is the set of public variables;• for every x, x′ is the renaming of x in P ′

It is easy to see that this definition is precisely Non-Interference in its originalformulation [40].

Which Object-Oriented notion of Non-Interference

Existing literature on Information Flow analysis for realistic Object-Orientedprogramming languages often provides tiny examples of illicit flows (for ex-ample, a dangerous assignment of an instance field) in order to show howalgorithms behave. However, a formal characterization of Non-Interferencefor a complete Object-Oriented program (in the case of Java, a set of coop-erating class declarations with an entry point) cannot be found in the worksreferred in this section, with the exception of the work by Myers, Banerjeeand Naumann [54, 55, 6]. The discussion of Section 4.3 is in this direction.


3.6.6 Proving Abstract Non-Interference

Giacobazzi and Mastroeni [32] propose an axiomatic proof system for AbstractNon-Interference, based on Hoare logic. Assertions (for narrow ANI) takethe form [η] s (ρ), where s is a statement. Derivation rules allow to infer anassertion given one or more premises. Examples of the derivation rules are

[η] s (>)

[η] s1 (ρ) [η] s2 (ρ)

[η] s1 ; s2 (ρ)

meaning, respectively, that (i) every statement satisfies ANI if no property canbe distinguished at the output; (ii) the ANI property on η and ρ is preservedby sequential composition. There are rules for computing loop invariants, aswell as non-structural rules for extending the obtained results (here, v meansmore precise):

[η′] s (ρ′) η v η′ ρ′ v ρ

[η] s (ρ)

∀i ∈ I. [η] s (ρi)

[η] s(d

i∈I ρi)

In this system, it is possible to prove some results about invariants andthe security of programs with respect to given abstract properties. However,in the rule for public variable assignment (Π(η) refers to partitions, and willbe taken into account here)

[η] e (ρ) [Π(η) v Π(ρ)] x is public

[η]x := e (ρ)

the assertion [η] e (ρ) is defined as

∀l1, l2 public. η(l1) = η(l2) =⇒ ∀h1, h2 private. ρ(e(h1, l1)) = ρ(e(h2, l2))

It can be easily seen that this is the set-theoretical definition 3.4 of narrowAbstract Non-Interference. Consequently, the main point in proving ANI hasa mathematical, non-algorithmic formulation. Actually, this is the main diffi-culty in proving abstract information flow properties.

4

Representing Abstract Non-Interference

All the limitative Theorems of metamathematics and the theory ofcomputation suggest that once the ability to represent your own struc-ture has reached a certain critical point, that is the kiss of death: itguarantees that you can never represent yourself totally. Godel’s In-completeness Theorem, Church’s Undecidability Theorem, Turing’sHalting Problem, Turski’s Truth Theorem– all have the flavour ofsome ancient fairy tale which warns you that ”To seek self- knowledgeis to embark on a journey which . . . will always be incomplete, cannotbe charted on a map, will never halt, cannot be described.”

Douglas R. Hofstadter

What a good thing Adam had. When he said a good thing he knewnobody had said it before.

Mark Twain (1835 - 1910), Notebooks

This chapter investigates how to represent the information flow behaviorof programs with respect to abstract properties. Section 4.2 describes a type-based approach and its applications to simple languages. Next, we discussthe formalization of Abstract Non-Interference in a realistic Object-Orientedframework. Typical numerical domains are defined in Section 3.1.7.

4.1 Introduction

Checking the security property of Abstract Non-Interference relies on findinga suitable approach to the representation of abstract properties. This is tightlyrelated to the representation of abstract domains and operations on abstractvalues. ANI is defined as a denotational property of input and output data

46 4 Representing Abstract Non-Interference

(Definitions 3.4 and 3.5); its set-theoretic flavour (made visible by the use ofuniversal quantifiers) is a major obstacle towards the automatizing of analysistechniques.

In principle, if D is an infinite set, there is no way to automatically expressvalues and operations belonging to an arbitrary abstract domain on D. Inparticular, two operations are crucial: (i) computing the abstract propertyρ(v) of a concrete value v; (ii) combining two abstract values by means ofsome abstract operation V ′ ⊗ V ′′.

Values and operators

Task (i) can be accomplished by some computable function ρ(), taking v asits input and yielding a representation of the abstract values describing v.Yet, a Turing-equivalent programming language is needed in order to repre-sent arbitrary domains1. The use of a universal language involves problemswith termination, which become specially important when the process of codeanalysis is embedded into a certification architecture (Section 6.3): a user isprobably not willing to validate programs by means of some algorithm which isnot guaranteed to terminate. An extant approach to represent abstract prop-erties via a computable ρ() can be found in [51]2; in this case, many domainscannot be described since the adopted language (a typed lambda-calculuswithout recursion) is not universal. Moreover, some computable domains can-not be easily represented in analytical form. Instead, tabular representationwould be more appropriate, with the important limitation of being unable todeal with infinite abstract domains.

Task(ii) suffers from the similar problems, namely, our inability to dealwith infinite sets. In order to avoid set theoretic operations such as V ′⊗V ′′ =α({v′ ⊗ v′′ | v′ ∈ γ(V ′), v′′ ∈ γ(V ′′)}), rules must be provided which specifythe abstract behavior of ⊗. Importantly, partial knowledge about abstractoperations preserves soundness, that is, when no rule V ′ ⊗ V ′′ = U is pro-vided, the result can be safely approximated by [>] (thus inducing less precisecomputations).

4.2 Security types

This section illustrates an approach [73, 74] to the representation of Ab-stract Non-Interference, which uses security types. Section 4.2.1 defines se-curity types; Section 4.2.2 develops an algorithm for detecting ANI in the

1 Strictly speaking, we should say that, even with Turing-equivalent languages,most domains cannot be represented since they are inherently uncomputable.(Un)Fortunately, this is beyond the realm of computer scientists.

2 This paper does not rely on Abstract Interpretation; however, as pointed out in[53], Relaxed Non-Interference can be seen as a special case of ANI

4.2 Security types 47

FUN language (Section 2.1.2). Section 4.2.3 extends the discussion to a smallsubset of Java bytecode.

4.2.1 Representing abstract Information Flow with types

Dealing with Information Flow security properties involves comparing pro-gram executions. In facts, Non-Interference does not require that some pro-gram data be included in some range of values (which is, basically, the scope ofstandard type analysis). On the contrary, constraints are placed with respectto different program runs. For example, the definition of Non-Interference(Section 3.2) relates two executions which share the same input public values.Output public values are required to be the same; yet, there is no restrictionon which values (in an absolute sense) they can take.

Types as sets of pairs

If we want to describe such a requirement by means of types, the most naturalchoice is to assign a set of pairs (instead of single values) to program data.We call such a set a security type. A pair identifies two values which can bepossessed by a piece of data in any two program executions. Usually, we areinterested in computing output values.Example. If a variable x has security type τ = {(n, n) | n ∈ Z}, then it isguaranteed that, after any two executions, x has the same value:

∀v1, v2. (JPK (v1), JPK (v2)) ∈ τ

which is the same as JPK (v1) = JPK (v2) for any input values v1 and v2. Thisdoes not make assumptions on the (absolute) value of x. 2

Let c be some program component containing relevant data (a variable,a lambda-expression, etc.). If c is public, Information Flow analysis shouldprove that, at the end of the program, the value of c cannot be distinguishedif it could not be distinguished at the beginning. This amounts to say that,if the input security type for τ(c)in is a subset of {(v, v)}V (where V is someset of values), then, also, the output type τ(c)out only contains symmetricpairs. A security type which only contains pairs of the form (v, v) is said tobe public; a program satisfies the Non-Interference property if

(∀ c public. τ(c)in is public) =⇒ (∀ c public. τ(c)out is public)

The aim of this security type analysis is to infer security types for publicand private program data and verify the above condition. Obviously, this is atheoretical model since such a set representation cannot deal with infinite setsof values. Its main purpose is to make clearer the representation of abstractproperties via security types.


Since the definition of Non-Interference does not make assumptions on thevalue of private data, then any private component d is given a non-publicinput security type

τ(d)in = V × V = {(v1, v2) | v1, v2 ∈ V }

meaning that nothing is known of the compared value of c in two arbitrary exe-cutions. No constraints are set on τ(d)out: it can by anything. Non-Interferenceholds if no pair (v1, v2), with v1 6= v2, flows to public components during theprogram execution.

Abstract security types

Let the abstract domain A be generated by the upper closure operator ρ. Ifρ represents the observational power of an attacker A, then A is not able todistinguish two values v1 and v2 if ρ({v1}) = ρ({v2}). In terms of securitytypes, the value of a public c cannot be distinguished in any two executionsif its type is public, that is,

τ(c) ⊆ {(s, s) | s ∈ {ρ({v}) | v ∈ V }}

Example. Let ρp denote the abstract domain of parity. Then, ρp(c) is public ifit does not contain the pairs ([even] , [odd]) or ([odd] , [even]), so that an at-tacker cannot distinguish two executions unless it can see something more thanparity. In such a framework, a constant is assigned the type {([even] , [even])}or {([odd] , [odd])}, depending on its parity. 2

If the image of ρ is a finite set of abstract values, then security types arefinitely representable as sets of pairs of abstract values. Private componentsare given an input type ρ(V )× ρ(V ), where ρ(V ) = {ρ({v}) | v ∈ V }.

4.2.2 Higher-Order security types and type inference

The most natural choice in defining Non-Interference for the FUN language[73] is to consider, as the relevant input program components, the environmentvariables, divided into public and private. The only output value is the resultof evaluating a program (lambda-expression) P in some environment. Non-Interference is satisfied by P if attackers cannot distinguish between the valueof an expression, computed in two different environments.

Indistinguishability

Indistinguishability of the (public) input is formalized by the predicateenvE (ε1, ε2), where εi are two environments mapping variables to (concretevalues), and E is an abstract environment, mapping variables to security types

4.2 Security types 49

(depending on the functional type of variables and on whether a variable ispublic or private). Two values v1 and v2 are indistinguishable with respect tothe type σ (written stσ (v1, v2)) if both v1 and v2 are represented by σ:

stσ (v1, v2) ≡ ∃(s1, s2) ∈ σ. (v1 ∈ s1 ∧ v2 ∈ s2)

stσ→σ′ (f1, f2) ≡ ∀(s1, s2) ∈ σ → σ′. ∀v1, v2.

stσ (v1, v2)⇒ st(s1,s2)(σ) (f1(v1), f2(v2))

where

• abstract values s belong to the set ρ(V ) for some abstract domain ρ whichmodels the precision of attackers;

• v ∈ s if the concrete value v is represented by the abstract value s (formally,v ∈ γ(s), see Section 3.1.2);

• (s1, s2)(σ) is obtained as the set {(s1(t1), s2(t2)) | (t1, t2) ∈ σ} (the expres-sion s(t) is the abstract functional application of the abstract higher-ordervalue s to the abstract value t).

The indistinguishability relation on environments extends the st predicate:envE (ε1, ε2) ≡ ∀x. stE(x) (ε1(x), ε2(x)). The Abstract Non-Interference con-dition on FUN is defined as follows.

Definition 4.1 (Higher-Order Abstract Non-Interference). Let P be afunctional expression, which is to be evaluated in an abstract environment E.Let ρ be the oservational power of attackers on the output. Then, P satisfiesAbstract Non-Interference for E and ρ if

ANIE,ρ (P) ≡ ∀ε1, ε2. envE (ε1, ε2)⇒ ρ(JPKε1) = ρ(JPKε2)

If an attacker cannot distinguish two input environments (envE (ε1, ε2)), nei-ther he or she can distinguish the output computed value.

The type system

A (monomorphic) type system is provided, which infers a security type for P .The judgement E ` P : [(τ)]σ specifies that, in E, the expression P is givena (standard, monomorphic) type τ , together with a security type σ. Sometyping rules are shown below.

E ` n : [(int)](ρ(n),ρ(n))[const]

E ` x : E(x)[var]

E ` b : [(bool)]σbE ` e′ : [(τ)]σ′ E ` e′′ : [(τ)]σ′′

E ` if b then e′ else e′′ : [(τ)]σ′tσ′′[if]

E [F ← σ] ` e : σ σ minimal

E ` µF.e : σ[rec]


The rule for constants assign a constant, public type to n. Inferring thetype for a conditional involves taking both branches into account. The recur-sion rule finds a minimal type among the fixpoints.

Theorem 4.2. Let security types and abstract domains be related by

ρ σ ≡ ∀v1, v2. stσ (v1, v2)⇒ ρ(v1) = ρ(v2)

so that ρ cannot distinguish values which belong to the same type σ. Then, thefollowing holds:

ρ σ ∧ E ` P : [(τ)]σ =⇒ ANIE,ρ (P)

The security type σ is an upper approximation of the possible values of P,then the condition ρ σ implies that the output cannot be distinguished byan attacker with precision ρ. Note that, if, as it usually is, σ is built overρ(V ) = {ρ({v}) | v ∈ V }, then ρ σ only if σ is public.

An example

The outlined type system is able to correctly deal with recursive, higher-orderexpressions like

P ≡ µF. λx. if x = 0 then 2 else 2 ∗ y ∗ F (x− 1)

Here, y is a private variable; the observational power of attackers is ρp.The abstract domain is lifted the the functional order by means of domain

operators. For example, the set of functions from int to int is divided intofive abstract classes: functions (i) mapping all numbers to even numbers; (ii)leaving the parity unchanged; (iii) inverting the parity value; (iv) mappingall numbers to odds; (v) the remaining functions. On the concrete side, thevalue of P clearly depends on y, then Non-Interference is not satisfied. On theother hand, let abstract properties be taken into account. For every value ofy, the recursive function F has an even result value, regardless of x; then, itbelongs to the abstract class (i). Consequently, ANI for the domain of paritiesis satisfied by P .

Discussion

This type system can be practically improved in several ways. However, it is,in general,undecidable when the semantics domain is infinite. Difficulties areincreased by the construction of higher-order domains, which are, generally,much more complex than the primitive domains. Some typing rules, such asthe lambda-abstraction rule, use universal quantifiers on abstract values, thuspossibly resulting in non-termination.

4.3 A realistic approach to Information Flow 51

4.2.3 Stack-based security types and type inference

This approach can be also applied [74] to a small subset of Java bytecode,namely, the non procedural, sequential part which is based on the use of anoperand stack for evaluating expressions. A Control Flow Graph representsthe control flow of programs (see Chapter 6 for a similar technique). A blockis a sequence of non-jump instructions which are guaranteed to be executedsequentially (no exceptions). The security results obtained for a block arecombined in compound subprograms.

Security types are defined as in higher-order ANI (clearly, without higher-order features). In addition, some auxiliar information (prevalues) helps toimprove the type analysis by remembering the security conditions at the be-ginning of a block.

An abstract operand stack is used, which contains security types instead ofconcrete values. The abstract stack is manipulated according to the concreteone: when a binary operation is performed, two values are popped and theresult (of combining their security types) is then pushed on the stack.

4.2.4 Brief discussion

The notion of security type which was presented in this section is interest-ing since it provides a first step towards an automatic representation of ab-stract security information. Next sections in this chapter will follow a differ-ent approach and deal with more complex languages. The focus will be ondefining security properties whose analysis takes advantage of the languagemechanisms. This could help to build an integrated analysis framework whichchecks Information Flow security properties together with standard correct-ness conditions (such as type safety) which are enforced by the usual designand execution frameworks.

4.3 A realistic approach to Information Flow

Most Information Flow properties were originally defined on simple languages,similar to IMP and FUN (introduced, respectively, in Section 2.1.1 and 2.1.2).In this case, programs work on amounts of data which are fixed throughoutthe execution: it is not possible to create or delete data by new variable dec-laration, explicit memory allocation/deallocation, object creation or garbagecollection. Due to these simplifications, it is easy to split data into a public anda private part; the distinction between input and output values is also clear.Consequently, the foundational definitions of Non-Interference work well.

None of today real-life programming languages do match this framework.The ability to produce large and flexible programs is utterly increased by pro-viding mechanisms which manipulate, at runtime, the amount of data whichcan be accessed and modified. These techniques result in a more complex


behavior of programs with respect to properties about the manipulation ofprogram data. Information Flow is one of such properties; the analysis ofrealistic programming languages needs more comprehensive notions of Non-Interference to be defined. For example, the simple addition of dynamic vari-able declaration to IMP is a sufficient condition to make the original notion ofNon-Interference somehow inadequate: the variable set is no longer constantduring program execution.

In imperative languages, many features concur to increase the difficulties:

• the presence of procedures needs a consistent treatment of subprograms;• variables can be declared everywhere in the code;• program types are not limited to integer numbers;• programs can receive and release information during their execution via

I/O operations.

In addition, some problems are peculiar to the Object-Oriented framework:

• Classes identify new data types whose structure can be quite complex.• Objects can be dynamically created, so that the amount of allocated mem-

ory is not fixed.• Objects can be nested into other objects as field values.• A class can inherit data or behavior from its superclasses.• Polymorphism: when procedures take the form of instance methods, it is

not statically decidable which method will be invoked at runtime.

The specification of the Java programming language [41] includes all thesefeatures; unused memory is periodically freed by a Garbage Collection oper-ation. Java also includes interface declaration, exceptions, concurrency andmany other mechanisms which will not be addressed here.

4.3.1 Which definition of Non-Interference

The definition of Non-Interference by Goguen and Meseguer (Section 3.2) isthe guideline for a correct model of Information Flow analysis. However, dueto its genericity and informality, the problem arises of tailoring the propertyto a specific programming language and execution environment. In particular,it must be made clear what observers can see of an execution framework; thatis, the semantics which is used by observers in order to acquire knowledgeabout programs.Example. Let P be a program written in a variant of IMP with dynamicvariable declaration. h and h1 are private variables.

i f ( h == 0) {h1 = 1 ;

} else {int i = 8 ;h1 = 2 ; }


This code fragment has the property of Non-Interference if the observer canonly see public variables on input and output (input-output semantics), sincei is no longer visible outside the else branch and there is not any harmful as-signment depending on the value of h. However, two executions with differentvalues for h can be distinguished by the observer if, for example, he or shecan observe one of the following program properties:

• the value of variables at every program point (trace semantics), not onlyon input and output; in this case, the temporary value 8 for i would bedetected;

• the amount of public allocated memory; in this case, the execution of theelse branch results in one more defined memory location.

2

This discussion may seem useless if we limit ourselves to simple program-ming languages; however, it turns out to be crucial in an Object-Orientedframework, where dynamic memory allocation is a key feature.

Remark 4.3. The above example discusses Information Flow program prop-erties with respect to which parts of the execution environment an observeris able to see. Orthogonally, the ANI weakened version of Non-Interferenceis parameterized on how an observer can see data, that is, the level of preci-sion in observing public information. A working definition of Abstract Non-Interference in a real-life programming language should answer to both ques-tions.

4.3.2 What is an NI-secure program

A Java program is a collection of inter-operating classes; each class has itsown operations and contains its own data, either belonging to the class itselfor to some of its instances (objects). The notion of Non-Interference we adopthere is quite close to the original one:

what an attacker can see at the end of the execution of a programdoes not allow him or her to acquire (abstract) information about se-cret input data

In order to adapt this generic notion to Java, we must make clear thefollowing:

• what an attacker is and can do during the execution of observed programs;• what the input and output of a program precisely are;• which data an attacker can watch, in the sense of Example 4.3.1.

Actually, we will deal with the language introduced in Section 2.2, insteadof the real Java programming language. Consequently, advanced Java featuresas exceptions, interfaces, inner classes and concurrency need not to be takeninto account.


Programs and attackers

Usually, when modeling Information Flow analysis, the role of attackers isnot explicitly specified: an attacker is simply an agent which can watch theexecution of programs with the magnifying glass provided by some semantics.For example, in analyzing IMP programs, the semantics is often the input-output semantics on the values of non-protected variables. Then, attackersdo nothing but watching the value of public data at some selected programpoints (the beginning and the end).

In our Object-Oriented framework, we choose to see attackers A as pro-grams running in the same environment as the program P whose secrets areto be broken. A is supposed to include classes and methods which can interactwith P classes in order to disclose their secrets. An attacker can be seen intwo ways:

• A can invoke static or instance methods of any class of P , and freely accessto their non-protected static or instance fields. In this framework, P canbe seen as a class library whose functionalities can possibly release secrets.There is no need for P to be a complete program with a main() entryprocedure.

• A is interested to the whole execution of programs (which are completeprograms with main()); then, input and output program data are thosewhich can be accessed respectively before and after executing main().

In both cases, Non-Interference holds if A is not able, by accessing to thepublic output data of P , to acquire information about the private input data.

The input and output of programs

The picture outlined above relies on the existence of objects which encapsulaterelevant program data before methods are actually executed. Existence ofobjects can be supposed when methods are called as in class libraries (see firstformalization of attackers in the previous paragraph). On the other hand, inthe case of a standalone Java executable file, the main() method has, as itsonly input, the string sequence args of its actual parameters.

Our choice is to suppose that main() begins by the fetchData() staticmethod, which fetches all the useful input information via the invocation ofsome class methods or instance methods belonging to newly created objects.These methods can, for example, read files or perform I/O operations; thedata which are fetched can be public or private. The part of the programwhich is relevant to Information Flow analysis begins after fetchData(); theinput of a program is the amount of data which have been acquired by thismethod.

Symmetrically, the output of the program is written to persistent mem-ory or external devices by means of the procedure releaseData(). The basicstructure of main() is shown below.


static void main ( St r ing [ ] args ) {

// d e c l a r a t i o n s o f t he v a r i a b l e s which w i l l// conta in the f e t c h ed data

f e tchData ( ) ;

// the p r i n c i p a l par t o f t he program beg ins here

. . .

// the p r i n c i p a l par t o f t he program ends here

r e l ea s eData ( ) ;}

Example. Let P be a program which begins its execution by reading thefile in.txt, performs some computations and eventually writes the resultsinto in.txt. The initial procedure fetchData() loads in.txt into an objectin of some suitable class. Similarly, final data are released from an object outto out.txt by releaseData(). P can generate dangerous information flows ifout.txt can be read by an attacker, while in.txt cannot. In this case thesecurity policy must consider in as private and out as public. 2

The rest of our program analysis will suppose the existence of relevantdata before and after executing methods.

Program I/O

Direct Input/Output is one of the most usual features in realistic programminglanguages. It involves the possibility that data can be acquired or released atintermediate executions steps (not only in fetchData() or releaseData()).

Input/Output procedures are methods. We can suppose that:

• For input: data are initially acquired by fetchData() and made availableto P computations; this does not correctly describe program interactivity,since, in a real framework, program inputs may also depend on previouscomputations.

• For output: data which are output at runtime are stored into special ob-jects which cannot be further modified, so that an illicit output can bedetected by observing public data after execution.

Example. Let P be the program

public C m( ) {


C l = getPub l i c ( ) ;C h = getPr iva te ( ) ;r e l e a s e (m1(h , l ) ) ;return l ;

}

where release () is an Output procedure. The returned value of m() is clearlyunaffected by the private value h (provided that m1() has no side effects);however, since the value of h is output before the termination, an attackerwould be able to dangerously acquire protected information. We suppose thatrelease () writes the result of m1(h,l) in a public object o; the InformationFlow analysis will detect this forbidden flow in the final value of o. 2

Public and private data

The Java programming language comes with its own modifiers public, privateand protected. These modifiers model security requirements: for example, aprivate field cannot be directly accessed from outside its class. The securityproperty which is enforced by modifiers is a kind of access control [48]: directaccess to data is limited to a certain set of agents (objects). However, infor-mation flow properties should not be limited to direct access to data: secretinformation cannot be propagated to the visible part of data. The followingpiece of code outlines the difference:

public class D {public C l ;private C h ;public C m( ) { return h ; }

}

This class declaration is perfectly legal; yet, the private field h is propagatedvia the return value of the (public) method m().

For the purpose of information flow analysis, we add two further modifiersH and L, meaning, respectively, high-security and low-security level, which areorthogonal (although related) to private, public and protected. For example,the field declaration private H C f is legal.

Due to this choice, the definition of public and private data is related toclass fields: if H C f is declared in D, then all the C objects stored in thefield f of D objects are considered as private. The program memory is splitinto private and public locations. A location l is private if (i) its content isan object referred by some private field; or (ii) it is referred by a private orpublic field of some private location l′.

In the following, the notion of private and public data will refer to H andL instead of private and public Java modifiers.

4.4 Class-Oriented Abstract Non-Interference 57

The attacker semantics on observed programs

The previous paragraphs define the semantics of attackers in observing pro-grams. Basically, it is an input-output semantics restricted to variables whichare externally visible, are defined in fetchData() and released in releaseData()(then, also including explicit I/O procedures). Attackers cannot see:

• private data throughout the whole execution;• temporary data which do not reach the end of program execution; they can

be observed only if their content flows to public data which are eventuallyreleased in releaseData();

• the memory state; for example, the amount of allocated data at a givenprogram point;

• the behavior of programs with respect to computational complexity.

The last two items (which are exemplified in the following method defini-tions) represent some form of information leaking; we choose not to considerattackers which can exploit them.

This fragment describes how an attacker which can see the amount ofallocated memory is able to acquire information about the private variable h:he or she can guess the value of b(h) by observing whether a memory regionhas been allocated by someMemoryAllocation().

// example o f I n t e r f e r en c e v ia memory a l l o c a t i o nvoid m( ) {

i f ( b (h ) ) {someMemoryAllocation ( ) ;

}}

If we can observe the execution time, we are able to guess whether theboolean guard was verified at this program point.

// example o f I n t e r f e r en c e v ia the o b s e r v a t i on// o f computat iona l comp lex i t yvoid m( ) {

i f ( b (h ) ) {someExponentialComputation ( ) ;

}}

4.4 Class-Oriented Abstract Non-Interference

Section 4.3 dealt with the problem of specifying what attackers can see of theexecution environment of programs. The outlined choices apply to any analysis


framework for standard Non-Interference, in the sense of Section 3.2. AbstractNon-Interference is interested in how data can be observed by untrusted users;that is, the degree of precision attackers have in observing public information.

In some universe of entities, abstract properties identify sets of elementssharing some common behavior. In an Object-Oriented language, classes havea similar purpose: they identify collections of objects with a similar internalstructure. Then, we model properties by means of classes. This allows toreduce the checking of properties to a type-directed program analysis. Classesare ordered by the subclass relation, that is, a class represents a superset of theobjects represented by any of its subclasses. The subclass relation models sub-properties, that is, properties which are more precise since they are satisfiedby a smaller set of elements.

Depending on the context, in this section the identifiers C,D, . . . will de-note sets of semantic objects, class names or the abstract properties theymodel. The subclass relation of C ′ with respect to C is denoted by C ′ ≺ C.Given a class C, the predicate o : C holds if the object o is in class C (or,equivalently, if it satisfies the property C). o :: C means that o : C and theredoes not exist any D ≺ C such that o : D.

Remark 4.4. We use the (ρ)P (ρ 8ρ) version of Abstract Non-Interference,not excluding a priori that the observational power is not the same for allprogram .

4.4.1 Class hierarchies as abstract domains

In our Abstract Interpretation framework, given a set of objects (i.e. elementsof some universe) C, the powerset C = ℘(C) is the concrete domain. A classhierarchy with C as its root identifies a subset of abstract properties D ∈ C,that is, an abstract domain A.Example. Let C be the set of integers. In Java, it is modeled by the classC = Integer. Let the semantic abstract properties of sign be modeled by ρs;it can be represented by the following class hierarchy:

{[Integer] , [Pos] ≺ [Integer] , [Neg] ≺ [Integer]}

2

In terms of program classes, the upper closure operator ρA(C), for anabstract domain A and an object set C, is defined as the smallest class C0 ∈ Asuch that every c ∈ C satisfies c : C0. The lub t of a set of classes is the leastsuperclass of all classes in the set. No class equivalent exists for u (see Section4.4.2). The set of program classes can be always seen as a domain whose topelement is the class Object and the bottom element is an ad hoc empty classbot. We can suppose bot to exist, even if it cannot be declared in Java asa subclass of all classes (see below), since it plays no part in computations;therefore, it is not strictly required to exist.


4.4.2 Class intersection

Since abstract domains are closed under intersection (Section 3.1), for everyclasses C1 and C2 there must be a subclass representing C ′ = C1∩C2. Due tothe property of class hierarchies, C ′ should be a subclass of both C1 and C2.Unless C1 be a subclass of C2 or vice versa, C ′ cannot be represented sincethe Java language has no mechanisms for multiple inheritance.

Consequently, representable abstract domains are those in which twoclasses (sets of concrete values) are either related by ≺ or disjoint. If theyare disjoint, C1 ∩ C2 = ∅ can be represented by the bot class.

In the following, an abstract domain A is tree-like if it satisfies ∀a, b ∈A. a u b ∈ {a, b,⊥}, that is, if it is representable by a single inheritance classhierarchy. We choose this name since these domains are graphically similar totrees, provided ⊥ is excluded from the representation. In domains which arenot tree-like, some abstract values cannot be represented as classes.Example. Let the class Integer have subclasses Even and Odd, representingparities, and Pos and Neg, representing signs. A domain ρps representingproperties of numbers with respect to both sign and parity is not representableby extending the same Java class hierarchy with new classes. In facts, the classNegEven of negative even numbers should be a subclass of both Even and Neg.2

In a language with multiple inheritance as C++, the domain of the aboveexample would be representable (obviously, if attention is paid to the compli-cated language mechanisms).

4.4.3 Domains and program code

Representing properties as classes involves performing the program analysiswith the help of a type-checking algorithm. Usually, a Java program comeswith its own classes for organizing and storing structured data. In general,program analysis can be demanded for a property which is not representedby any program class.Example. A program P contains the classes E, E1, E2 and E3 with E1 ≺ E,E2 ≺ E and E3 ≺ E; the abstract domain represented by this class hierarchyis the program domain AP . Let AA be a domain which cannot distinguishbetween E1 and E2; on the other hand, let AA contain the properties E4 andE5, which are both more precise than E3. If the program designer wants toanalyze the program with respect to the analysis domain AA, he or she mustinclude the information of AA into the program classes. Then, he or she shouldbuild a new domain representing both AP and AA. The graphic representationof domains is shown in Figure 4.1.

2

The program should be modified according to the new hierarchy:


Fig. 4.1. The modified hierarchy

AP

E

E3

E2E1

⊥

AA

E

E4

E6E5

⊥

The modified hierarchy

E

E4 E3

E2E6

E1E5

⊥

• an object creation new E3(...) should be replaced by new E5(...) ornew E6(...) whenever one of the two sub-properties holds for the newobject. It should be clear that the analysis designer must know, for somex, whether PE5

(x) and PE6(x) hold or not (i.e. whether x satisfies the

abstract properties): it is part of his or her knowledge of the problem.• The analyzer does not need to distinguish E1 from E2 sinceAA can only see

E4; then, any occurrence of E1 or E2 can be replaced by E4 (for example,when inferring the type of expressions).

• overriding methods of inserted classes should incorporate the behavior ofoperators with respect to the properties to be checked. This point will bebetter explained in the example below.

Remark 4.5. In the above example, both AP and AA are downflat (see Section4.4.2) and their union is also a domain; AP ∪AA is also downflat, since thereis no p ∈ AP and a ∈ AA such that p u a /∈ {p, a,⊥}. Consequently, it isthe reduced product of AP and AA. In general, this could be false; in facts,it is reasonable (for simplicity) to require AA to be downflat (ANI uses, inpractice, flat domains which are also downflat), but this condition does notrule out the possibility to have two incomparable, non-disjoint abstract valuesin AP ∪AA.

4.4.4 Class extension

In the common practice of Object-Oriented code design, subclasses are notonly devoted to identify subsets of objects. In general, the subclass objectshave a different internal structure with respect to objects belonging to the


superclass. Modifying the structure of objects is done by adding or redefiningclass members, that is, fields and methods.

The meaning of redefined (overridden) members depends on the role ofclasses in the program.

• If a class E′ ≺ E is part of the program classes AP , then its redefinedmembers simply specify the behavior of E ′ objects when it is differentwith respect to E. In this case, program analysis must take into accountevery method which can possibly be executed at runtime.

• If E′ belongs to AA (that is, it is not part of the original program), thenits members should incorporate the knowledge of the designer about thesecurity policy to be enforced.

The second possibility is made clear by the example below: Example. Con-sider the class hierarchy of Example 4.4.1; by redefining methods, it is possibleto specify the behavior of integer operations on subclasses. In particular, theObject-Oriented addition on integers can be specialized to non-negative num-bers:

class I n t e g e r {. . .I n t e g e r add( In t e g e r n). . .

}class Pos extends I n t e g e r {

. . .I n t e g e r add( In t e g e r n ) ; // i n h e r i t e dPos add (Pos n ) ;I n t e g e r add(Neg n). . .

}

The redefined add() method encodes some knowledge of the behavior of addi-tion on positive and negative numbers. For example, adding two non-negativesyields a non-negative; on the other hand, nothing can be said if a non-negativeis added to a negative. 2

4.4.5 Advantages and limitations

Modeling abstract properties via classes allows to reduce the process of com-puting and checking properties to well-known typing algorithms.

Theoretical definitions of Abstract Non-Interference and similar proper-ties, which rely on performing operations on abstract domains, often use aset-theoretic representation of abstract values. For example, in the abstractdomain of parity, there are abstract values identifying the infinite sets of even


and odd numbers. In principle, representing infinite sets is impossible. In-tensional, algorithmic definitions can deal with some simple cases (e.g. evennumbers are those which yield no remainder when divided by 2), but in generalthe problem of verifying if an element satisfies a property is not decidable.

The modeling of abstract properties by means of classes relies on the set-theoretical flavour of Object-Oriented mechanisms. However, many Object-Oriented features do not match with this spirit. For example, the executionof some object method can result in a change in the behavior of the objectitself with respect to some property.Example. Let the class Mod3 model the properties contains an integer numberwhich is divisible by 3.

class Mod3 ( ) {int n ; // t h i s must be d i v i s i b l e by 3void m( ) { n++; }

}

It is clear that m() can break the property described by Mod3 because of thenew value of n. 2

Moreover, it is not always possible to consider classes as subsets of theirsuperclasses.Example. The following classes (which use primitive types, not included inour language) show how it is possible to declare the subclass OddButLegal,whose instances are not a subset of Pos instances.

class Pos {// p o s i t i v e i n t e g e r sprotected int x ;Pos ( ) { x = 0 ; }void m( ) { x++; }

}class OddButLegal extends Pos {

// i n t e g e r s such t ha t x <= −11OddButLegal ( ) { x = −11; }void m( ) { x−−; }

}class SuperClass extends Pos {

// i n t e g e r s such t ha t x >= −5SuperClass ( ) { x = −5; }

}

2


In facts, our properties are likely to be of the form contains a field f of classC. Then, standard type checking would guarantee the property preservation.Example. Let C1 and C2 be subclasses of C. Then, the properties D, D1 andD2 are defined as contains f of class x, where x is, respectively, C, C1 andC2.

class D { C f ; C g ; }class D1 { C1 f ; C g ; }class D2 { C2 f ; C g ; }

Subclasses D1 and D2 cannot contain methods which break their correspon-dent abstract property, since type integrity forbids an assignment to f unlessit is compatible with C1 or C2. 2

If we limit ourselves to properties of this form, the preservation of a prop-erty on a given object is automatically enforced by type checking mechanisms.Otherwise, if our purpose is to model data which can change their behaviorwith respect to properties, this can be done by created new objects of differentclass which acquire the invariant part of relevant data.Example. The following method perform a pseudo-modification of an objectby replacing it with some new object with the desired property.

D2 changeProperty (D1 d , C2 c ) {D2 aux = new D2 ( ) ;aux . f = c ;aux . g = d . g ;return aux ;

}

Clearly, changeProperty() cannot be a method of the class D1, since an objectis not able to modify its own class. 2

Under the condition of limiting abstract properties to constraints on theproperties of object members, the subset relation is correctly formalized bysubclasses: for example, the class D1 in Example 4.4.5 identifies a subset ofD, provided that the same condition is inductively satisfied by C1 and C.

5

Program analysis for Abstract

Non-Interference

Better be despised for too anxious apprehensions, than ruined by tooconfident security.

Edmund Burke (1729 - 1797)

Misquotation is, in fact, the pride and privilege of the learned. Awidely- read man never quotes accurately, for the rather obvious rea-son that he has read too widely.

Hesketh Pearson, Common Misquotations

In this chapter, the main lines of an algorithm for checking Abstract Non-Interference in JL programs are presented. This static analyzer computes aboolean function representing the behavior of programs with respect to thedesired information flow properties. Boolean functions encapsulate informa-tion about the propagation of secrets among program data. As in the previouschapter, we intend typical numerical domains as they are defined in Section3.1.7.

5.1 Preliminary definitions

5.1.1 Methods

Methods are declared as class members. The same method name can be as-signed to several subprograms. In the following, no distinction will be madebetween (i) different methods with the same name but different, incompat-ible signatures; (ii) different instances of a method (obtained, for example,by means of overriding). We simply consider the set Md as containing allthe subprograms which can be executed when invoking methods. An element

66 5 Program analysis for ANI

d ∈ Md will be referred to as a method definition, a method instance (not tobe confused with an instance method) or, simply, a method.Example. Consider the following classes:

class C {D m( . . . ) { . . . } // d1

E m( . . . ) { . . . } // d2

F n ( . . . ) { . . . }}class C1 extends C {

D m( . . . ) { . . . } // d′1G p ( . . . ) { . . . }

}

After these class declarations, Md contains three method instances for themethod name m. The method instance d′1, which redefines the correspondingC instance d1, and d2, which is another method tout court, are equally con-sidered as distinct instances with respect to d1. 2

In the following, our interest will be in computing which method instancescan be executed at runtime when a method invocation occurs.

5.1.2 Type environments

Type environments ε ∈ E are functions from identifiers to types C ∈ C. Forevery program point p in a method body, a type environment is computedin order to approximate the possible runtime types that identifiers can haveat p. The abstract semantics JsK]ε computes the type environment after thestatement s, starting from ε.

Let the method instance d be declared in the class D as C .m(P1, . . . , Pk),where P1 . . . Pk are the expected types of parameters. When analyzing thebody of d, the initial type environment ε0 is as follows:

ε0(pi) = Pi parametersε0(v) = ⊥ local variables

ε0(this) = Dε0(u .f) = declared type for f in ε0(u)

The abstract semantics computes an over-approximation of runtime types, inthe form of type environments:

5.1 Preliminary definitions 67

Jv=eK]ε = ε[

v ← JeK]ε]

Jv .f=eK]ε = ε[

v .f ← JeK]ε]

Js1 ; s2K]ε = Js2K]Js1K]ε

J if (b) s1 else s2K]ε = Js1K]ε g Js2K]εJwhile (b) sK]ε = lfpε

(

λε′. εg JsK]ε′)

Jv .m(a1 . . . ak)K]ε = JmK]ε′ ε′ = ε[

this ← JvK]ε] [

pi ← JaiK]ε]

JvK]ε = ε(v)

Jv .fK]ε = ε(v .f)

Jnew C( . . . )K]ε = C

In the following, Tε (e) will often stand for JeK]ε. The following definition usestype environments in order to identify the set of method instances which canbe executed at a given program point.

Definition 5.1 (method instances). Let the method invocation

v .m(e1, . . . , ek)

occur at the program point p, in the type environment ε. For every class C �JvK]ε, the set instε (C,m, e1 . . . ek) contains the method instances for whichthe type of formal parameters is compatible with the type of e1, . . . , ek in ε. Ifm is a static method, so that the invocation takes the form C .m(e1, . . . , ek),then C is the only class to be considered. Intuitivelly, we say that the typesPj of formal parameters are compatible with ε if they are comparable (that is,� or �) to the corresponding ε(ej), so that their runtime types C ′ � Pj andC ′′ � ε(ej) can be possibly the same.

5.1.3 Program identifiers

When executing a method, the program code of the method body can referto a set of program identifiers. This set contains the formal parameters, thelocal variables and the special variable this. Moreover, for an identifier v anda field name f , the field v .f may be referred.

In Object-Oriented languages, due to the well-known aliasing problem,a modification in a variable can affect other identifiers which refer to thesame object. We are interested in computing which program identifiers canbe affected by an assignment. The program memory can be seen as a set Lof locations l, each belonging to a class Cl (l). In a given type environmentε, a variable v is assigned to a set of locations locsε (v), namely, an upperapproximation of its possible runtime locations. We define a Galois connectionbetween sets of locations and types:


Definition 5.2.

℘(L) −−−→←−−−αγC

α(L) = g ({Cl (l) | l ∈ L})γ(C) = {l ∈ L | Cl (l) � C}

As a soundness requirement, we have Tε (v) � α(locsε (v)), that is, a typingmust be a correct description of runtime values.

Next, we compute the set of locations corresponding to identifiers andexpressions in ε:

Definition 5.3. The set of locations corresponding to variables and expres-sions is computed as follows:

locsε (l .f) = {the location referred by the f field of l}

locsε (v) = γ(Tε (v))

locsε (e) = γ(Tε (e))

locsε (v .f) =⋃

{locsε (l .f) | l ∈ locsε (v)}

When assigning an identifier by means of the statement x=e, data prop-agation to x may affect all the identifiers potentially sharing their locationwith x. These identifiers are said to be the sharing identifiers for x and arecomputed by sh (x).

Definition 5.4 (Sharing identifiers). The set of sharing identifiers for anidentifier x is defined as

shε (v) = {v}

shε (v .f) = {u.f | locsε (u) ∩ locsε (v) 6= ∅}

where ε is the type environment at the current program point. When a localvariable v is assigned, the set of its locations is completely replaced by a newone, then no other identifiers need to be considered. On the other hand, whenmodifying the field f of v, it is possible that the same field of some identifieru is also modified, because u and v can refer to the same object.

The definition of sh aims at dealing with aliasing. As it is presented inDefinition 5.3, locs is clearly a very poor choice for this purpose. In facts, x issupposed to share locations with y unless their types are completely unrelated.Example. Let x and y be defined by the assignments x=new C( . . . ) andy=new C( . . . ). Clearly, x and y do not alias even if they have the same type.2

The separation of types is too strong a requirement for dealing with alias-ing; some more refined mechanism is needed. For example, the Hoare-like logicillustrated in Section 3.6.1 can be better in telling whether program identifiersare supposed to alias.

5.2 Abstract dependency calculus 69

5.2 Abstract dependency calculus

The calculus of dependencies [14, 2] is an important task in program analysis.Data dependencies describe how the information propagates in the code ofa program during its execution. A well-known application is the analysis ofunused variables in compilers, which aims at simplifying the translated pro-gram by detecting and eliminating useless parts of the input program [3]. Ifprogram data are contained in variables, we say that the variable x dependson y at the program point p if the value of x at p is affected by y.

Since data are mainly manipulated by variable assignments, we are par-ticularly interested in deciding, in the statement x := e, which variables arerelevant to the abstract value of the expression e. This is the base level incomputing dependencies; subsequently, they need to be propagated in theprogram code by means of some control flow algorithm (which we suppose toexist and build a dependency graph, see below).

5.2.1 Preliminary notions

This section introduces some notions and notation conventions which will beuseful throughout this section. We also describe analysis tools for improvingthe results obtained with the dependency calculus of Section 5.2.5.

Dependency graphs

A dependency graph can be built out of a program by means of control flowprogram analysis; it describes how data propagate during the execution. Fol-lowing an approach similar to that of Program Slicing [72], we might be in-terested in computing dependencies on program statements: the statements2 depends on s1 if some of the variables which are used inside s2 (and notredefined before their use) are defined inside s1, and this definition reachess2 in at least one possible execution path. Graph vertexes consist of programstatements; an edge from s2 to s1 is added if there exists a dependency of s1on s2.Example. Consider the following code:

(0) : v := 1;

(1) : x := e1;

(2) : y := e2;

(3) : x := 0;

(4) : z := x+ y;

The statement (4) depends on (2) since the definition of y is relevant to theexpression x+y; similarly, (4) depends on (3). On the other hand, it does notdepend on (0) because v is not used in (4). Finally, there are no dependencies


of (4) on (1) since the definition of x does not reach the last statement (thatis, there is only one path from (1) to (4), and another assignment of x occurson this path). The dependency graph for this fragment would look as

(2) (3)

(4)

(1)(0)

2

There is a number of tools for building and analyzing dependency graphs;they allow to compute how variable definitions propagate to program state-ments. In the rest of this work, it is supposed that some standard techniqueis used for this purpose. The key point which characterizes our approach isthe definition of what using a variable means. Therefore, we will focus on thein the small level: how to compute the set of used variables in the statementx := e. As will be clear in the rest of this section, this set cannot be easilycomputed as vars (e) when abstract properties are considered.

The evaluation of expressions

We refer to the IMP language of Section 2.1.1. For a program point p and

a variable x, JxK[p (σ) is the set of values x can possibly take at p, startingfrom the initial state σ. Given an expression e ∈ E, we write e [x1, . . . , xk] tomake explicit the occurring variables (that is, vars (e) = {x1, . . . , xk}). Fore [x1, . . . , xk], the functional application e[ (v1, . . . , vk) is the result of evalu-ating e when, for every i, xi is given the value vi (in this case, variables areconsidered as an ordered sequence).

On the abstract side, when an abstract domain A (with ρ denoting theupper closure operator on the concrete domain C) is given and no confusionarises about it, e] (V1, . . . , Vk) is the abstract evaluation of e on the abstractvalues Vi belonging to A. Note that, in the following, the evaluation (v)] (·) ofa constant v will be often replaced by (α(v))] (·) (for example, in the domainof signs, evaluating (6)] is equivalent to evaluating ([pos])]).

An abstract expression e] with k variables can be seen as a function map-ping Ak to A, defined as the best correct approximation (see Section 3.1.4)of e[:

e] (V1, . . . , Vk) = α({

e[ (v1, . . . , vk) | ∀i. v1 ∈ γ(Vi)})

JxK]p (σ) is the abstract counterpart of JxK[p (σ), that is,

JxK]p (σ) = α(

JxK[p (σ))


The abstract semantics

We define a sound abstract semantics to statically infer abstract properties ofvariables. Given a variable x and a program point p, the abstract semantics,parameterized on the abstract domain A, computes an over-approximationV ′ of the minimal abstract value V ∈ A which safely represents all possibleconcrete values of x at p, so that the following must hold:

V ′ ≥A∨

{

JxK]p (σ) | σ is an input state}

An abstract environment ε ∈ E is a function mapping variables to abstractvalues (see Section 5.1.2 for a similar notion of environments with types). It

is computed compositionally: S JsK]ε0 builds the abstract environment at p,where s is a statement, ε0 is the environment before s and p is the programpoint after s. e[ (σ) and e] (ε) are, respectively, shorthands for e[ (v1, . . . , vk)and e] (V1, . . . , Vk), where, for every i, xi has value vi in σ and Vi in ε. Ab-stract environments are computed in the standard way (ε′ ∨ ε′′ is computedpointwise):

S JskipK]ε = ε

S Jx := eK]ε = ε[

x← e] (ε)]

S Js1 ; s2K]ε = S Js2K]SJs1K]ε

S J if b then s1 else s2K]ε = S Js1K]ε ∨ S Js2K]εS Jwhile b do sK]ε = lfpε

(

λε′. ε ∨ S JsK]ε′)

5.2.2 Abstract Dependencies

As explained before, we are interested in finding which variables affect thefinal value of x in the assignment x := e. We write x ∈ ref (e) if x is relevantto e, consistently with the notation of Program Slicing. Most standard depen-dency calculi compute relevant variables for e as vars (e). This is clearly anapproximation: for example, if e takes the form x + y − y, its value does notdepend on y, although the variable belongs to vars (e). However, describingdependency by means of variable occurrence (that is, by only considering thesyntactic level) is, in the average case, quite a good approximation of the real(semantic) notion of dependency, outlined in Definition 5.5.

Definition 5.5 (concrete dependencies). An expression e is said to de-pend on y (written y e) if its value can change depending on the differentvalues y takes, that is, if there exist two values for y which imply differentvalues of e.


y e [y, x1 . . . xk ]

⇐⇒

∃v′, v′′, {ui}1...k. e[ (v′, u1 . . . uk) 6= e[ (v′′, u1 . . . uk)

The picture is quite different when abstract dependencies are considered.The abstract calculus of dependencies is a weaker version of its standard(concrete) counterpart: it is very often the case that e depends, with respectto the considered abstract property, on a strict subset of its variables. Thefollowing is a semantic definition of abstract dependencies:

Definition 5.6 (abstract dependencies). e abstractly depends on y, with

respect to an abstract property ρ (written yρ e), if the property can change

depending on the abstract value of y.

yρ e [y, x1 . . . xk ]

⇐⇒

∃v′, v′′, u′1, u′′1 . . . u

′k, u′′k .

α(u′i) = α(u′′i ) = Vi ∧ e] (α(v′), V1 . . . Vk) 6= e] (α(v′′), V1 . . . Vk)

If we compute abstract dependencies by only observing syntax, then aconsiderable loss of precision occurs. In facts, any purely syntactic approachlooks at occurrence as the only criterion to decide relevance (formally, x e iffx ∈ vars (e)). As a consequence, the syntactic-directed result for any abstractproperty ρ would be the same as for concrete dependencies. Instead, we wantabstraction to make a difference with respect to dependencies.Example. Consider the expression e [x, y] ≡ (x ∗ x+ 1) ∗ y. The final value ofe clearly depends on both x and y, so that the syntax-directed dependencycalculus yields a correct result (indeed, all occurring variables are relevant).On the other hand, abstract dependencies with respect to the sign domain ρsshow how a syntactical approach may be insufficient. In facts, x has no effectson the sign of e (that is, on the abstract value of e]); yet, we need semanticsin order to detect such a false dependency. 2

The following is an easy property of abstract dependency:

Proposition 5.7. Let ρ′ and ρ′′ be two abstract domains such that ρ′ is moreprecise than ρ′′. Then, for every x ∈ vars (e),

xρ′′

e =⇒ xρ′

e

5.2.3 Computing an expression on its inputs

Given an expression, the aim of the abstract dependency calculus is to com-pute the set of variables which are indeed relevant to its abstract value. As


shown above, this is, in principle, a semantic task and cannot be done au-tomatically for arbitrary expressions. In other words, our only possibility toexactly translate Definition 5.6 into an algorithm is to acquire informationabout the abstract evaluation of e on every possible input.Example. To compute the set of ρs-dependencies on an expression e [x, y, z],we must compute e] on every possible sign of x, y and z. Thus, the value of e]

must be computed 33 = 27 times. An occurring variable (y, in this example)is not relevant to e] if the result

e] (Vx, [neg] , Vz) = e] (Vx, [0] , Vz) = e] (Vx, [pos] , Vz)

holds for every Vx and Vz, that is, changing the abstract value of y does notaffect e]. 2

We assume to have an algorithm which computes e] on an abstract input.Intuitivelly, given an abstract domain ρ, e] models the best correct approxi-mation of e[, that is1,

e] (ε) = ρ({

e[ (σ) | σ ∈ ρ(ε)})

For example, e] can have a set of abstract operators like ⊕ or ⊗, describingthe behavior of + or × in ρs:

[pos]⊕ [pos] = [pos] [pos]⊗ [pos] = [pos][neg]⊕ [neg] = [neg] [neg]⊗ [neg] = [pos]

Let |e|] be the computational complexity of e] on an expression e. Inthe example above, the abstract value of e must be computed once for ev-ery abstract input. Consequently, this technique has exponential complexity

O(

|e|] ·#(A)k)

, where k = #(vars (e)) is the number of variables occurring

in e. Although, in common expressions, both #(A) and k are usually quitesmall, an improvement is definitely needed. In facts, this calculus should alsodeal with procedures and functions, which possibly involve a larger set ofvariables.

An abstract value V ∈ A is called an atom if there is no other V ′ 6= ⊥such that V ′ ≤ V . The predicate Atom (V ) indicates atomicity.

Assumption 5.1 In Definition 5.6, the focus is on the abstraction of single-tons (see also Example 5.2.3, where > is not considered among the possibleinputs). We assume that atomic values are those which can be obtained by theabstraction of singletons:

Atom (V ) ⇐⇒ ∃v. V = α({v})

1 We leave to Remark 5.8 the discussion on how ≥ instead of = might hold in thisformula.


The assumption holds if and only if atoms form a partition of the set of con-crete values. This is not true for arbitrary domains. However, every domainA can be transformed into another domain A′ which satisfies the property. A′

is obtained by adding to A, for every V ′ ∈ A and V ′′ ≤ V ′, the abstract valuerepresenting V ′ \ V ′′, that is, by adding complements. The domain A′ is saidto be partitioning.

Example. Let the domain A on natural numbers consist of the abstract values

[2] ≡ {n | n mod 2 = 0}

[3] ≡ {n | n mod 3 = 0}

[6] ≡ [2] ∩ [3]

The only atom, [6], does not represent all numbers. In order to satisfy theproperty, we must add

[3 \ 2] ≡ [3] \ [2]

[2 \ 3] ≡ [2] \ [3]

[¬(2,3)] ≡ Z \ ([2] ∪ [3])

In the new domain, Z is indeed partitioned by the atomic abstract values [6],[3 \ 2], [2 \ 3] and [¬(2,3)]. 2

If two concrete values are both described by the same atom, then theycannot be distinguished by any property of A. In computing e], we are inter-ested in atomic inputs, that is, inputs ε which only consist of atomic values(∀x. Atom (ε(x))). These inputs are those which can be obtained by abstract-ing a single concrete input (∀x. ε(x) = ρ(σ(x)) for some σ).

5.2.4 Increasing efficiency

Instead of computing e] on every input, we aim to be more efficient. Severalimprovements can be introduced:

Limiting the set of inputs

Let the expression e [x, y, z] in Example 5.2.3 be computed at the programpoint p. Suppose some static analysis technique (for example, an abstractsemantics which over-approximates the runtime value of variables) can tellthat the value of y at p is always positive. Then, the inputs for e] at p arelimited to {(Vx, [pos] , Vz) | Vx, Vz ∈ A}, which contains 32 = 9 elementsinstead of 27. This additional knowledge can often considerably reduce theamount of computation to be performed.


Computing e on non-atomic inputs

Let ρsp capture both the parity and the sign of numbers. The set of atomicvalues consists of [poseven], [negeven], [posodd], [negodd]; atoms form apartition of numbers into four subsets. Given E = {[poseven] , [negeven]}and O = {[posodd] , [negodd]}, the assertion

∀ V ′ ∈ E. ∀ V ′′ ∈ O. e] (Vx, V′, V ′′) = U

is implied by the more generic assertion

e] (Vx, [even] , [odd]) = U

since [poseven] and [negeven] (resp. [posodd] and [negodd]) form a par-tition of [even] (resp. [odd]). Consequently, if we can prove a result for someinput ε, we do not need to prove more specific formulæ on more restrictedinputs. Then, our purpose will be to infer properties on general inputs andconsider sub-inputs only if necessary.

Remark 5.8. Actually, due to the incompleteness (in the sense of AbstractInterpretation) of most analyzers, it can happen that the above implicationdoes not hold. For example, let the input ε be partitioned by {ε1, . . . , εj}.Then, it is possible that the computed value for every εi is e] (εi) = U , bute] (ε) = U ′ with U < U ′. However, in the following we will search for genericassertions in which U ′ is atomic; then, in this case, U < U ′ cannot be true.

Example. Let e [x] = x ∗ x+ 1 be evaluated in ρsz . Let εp satisfy εp(x) = >.εp is partitioned by its sub-inputs ε1, ε2 and ε3, such that ε1(x) = [neg],ε2(x) = [0] and ε3(x) = [pos]. Suppose e] uses the usual rules about productand sum:

[pos] ∗ [pos] = [pos] [neg] ∗ [neg] = [pos] [0] ∗ [0] = [0][pos] + [pos] = [pos] [0] + [pos] = [pos] [>] ∗ [>] = [>]

This is enough in order to compute

e] (ε1) = e] (ε2) = e] (ε3) = [pos]

since, for example,

e] (ε1) = [neg] ∗ [neg] + [pos] = [pos] + [pos] = [pos]

Yet, the abstract evaluator would need some more knowledge in order to inferthe general result e] (ε) = [pos], since [>] ∗ [>] + [pos] = [>]. 2


Analyzing e compositionally

The syntactic structure of expressions can help in computing relevant vari-ables. In particular, computations performed on subexpressions can turn tobe useful in analyzing the main expression. For example, let e [x1, x2, x3, x4]be the sum of e′ [x1, x2, x3] and e′′ [x3, x4]. We have that x1 ∈ ref (e′) is anecessary condition for having x1 ∈ ref (e) . Similarly, x3 ∈ ref (e) only ifx3 ∈ ref (e′) or x3 ∈ ref (e′′) .

If it is possible, by analyzing e′, to assert x1 /∈ ref (e′), then x1 does notneed to be taken into account in the analysis of e.

Computational bounds

After formally introducing the algorithm, it will become clearer how theamount of computations can be reduced by placing bounds on the analysis ofsubexpressions. Take e as defined in the previous paragraph: if the analysisof e′′ turns out to be too expensive (that is, some fixed limit of analysis stepshas been reached), then it can be stopped and vars (e′′) can be returned.Obviously, this may result in a loss of precision, which, however, is sometimesworthy.

5.2.5 The algorithm

Notation

We write ε′ ≤ ε′′, for any two inputs of e], if ε′(x) ≤ ε′′(x) holds for everyx ∈ vars (e) (where ≤ is the partial order on the abstract domain). α(σ) isthe pointwise extension of α to the variables of σ, and σ ≤ ε is a shorthand forα(σ) ≤ ε or, equivalently, ∀x.α(σ(x)) ≤ ε(x). When e is clear by the context,∀x. is a shorthand for x ∈ vars (e). Finally, when the name of variablesinduces some ordering (as in x1, x2 etc.), ε = (V1, V2, . . . ) means that ε(xi) =Vi for every i (similarly for concrete inputs).

Computing the input set

Given a program point p, the abstract semantics of Section 5.2.1 computes anabstract environment εp. An environment can be seen as an input state foran expression, so that inputs and environments will be used interchangeably.When an expression e is evaluated at p, the possible abstract inputs are safelyapproximated by εp: for every possible concrete input σ at p,

∀x. α(σ(x)) ≤ εp(x)

Then, the set of abstract inputs is inputsp (e) = {ε | ε ≤ εp}. Atoms represent(abstract) constancy; that is, if we can infer an atomic value V for e, then


every concrete value of e is indistinguishable with respect to the abstractproperties.Example. Let x and y be two variables. If the static analysis computes anabstract value V for both x and y, then they are not guaranteed to havethe same abstract property unless V is an atom. For example, consider theproperties of sign and parity, introduced in Section 5.2.4. Knowing that x andy are both even does not exclude that they can be distinguished by someonewho can also observe signs. 2

Due to our representation of properties, this monotonicity result holds:

ε′ ≤ ε′′ =⇒ e] (ε′) ≤ e] (ε′′)

In facts, smaller abstract inputs represent smaller sets of concrete inputs, sothat {e[ (σ) | σ ≤ ε′} is a subset of {e[ (σ) | σ ≤ ε′′}. In the following, whenwe write ∀ε, it is implicit that ε ≤ εp for εp computed at the current programpoint p.

For an abstract environment ε, [ε]x is the set of environments ε′ whichare more restricted than ε at x; that is, ε′(x) < ε(x) and ε′(y) = ε(y) fory 6= x. Let X ∈ ℘(V); then, [ε]X is the straightforward set extension of therestriction operator.

The calculus at work

The proposed algorithm relies on the assumption introduced in Assumption5.1: given an abstract value V , the set of abstract values which are less thanor equal to V is a covering (not necessarily a partition) of V , that is, allconcrete values in γ(V ) are represented by at least one of its sub-values:γ(V ) =

⋃

{γ (V ′) | V ′ ≤ V }. Consequently, the following holds for every ε:

Atom

∨

ε′≤ε

∨

σ′≤ε′

{

α(

e[ (σ′))}

⇒ Atom

∨

σ≤ε

{

α(

e[ (σ))}

Note that, given an algorithm for e], the analogous result on abstract compu-tations does not hold, due to incompleteness (see Remark 5.8):

Atom

(

∨

{

e] (ε′) | ε′ ≤ ε}

)

; Atom(

e] (ε))

Anyway, since, by soundness,∨

ε′≤ε e] (ε′) ≤ e] (ε), we need to check the

atomicity of e] (ε′) only if Atom(

e] (ε))

cannot be directly proven. Conversely,since γ(V ) =

⋃

{γ(V ′)|V ′ ≤ V }, the assertion∨

ε′≤ε e] (ε′) < e] (ε) is a suf-

ficient condition for taking∨

ε′≤ε e] (ε′) as the computed value on ε (thus

avoiding incompleteness). This is properly the task of the function [·]] ([·] ↓).In Figure 5.1, we give a set of definitions for an approximated calculus of

abstract dependencies.


Fig. 5.1. The calculus

[·]] ([·] ↓) : (E × E) 7→ A

e] (ε ↓) =V

x∈vars(e)

“

W

ε′∈[ε]xe] (ε′ ↓)

”

Atom : (E × E) 7→ (A× {true, false})Atom (e, ε) =

`

e] (ε ↓) ,Atom`

e] (ε ↓)´´

ref3 : (E × E × ℘(V) × ℘(V)) 7→ (E × ℘(V))ref3 (e, ε,X, Y ) = match Atom (e, ε) with

(V,true) → (V, X)

(V, false) →T

y∈Y

“

S

ε′∈[ε]yref3 (e, ε′, {y} ∪X,Y )

”

ref2 : (E × E × ℘(V)) 7→ (E × ℘(V))ref2 (V, ε, Y ) = ref3 (V, ε, ∅, ∅)ref2 (z, ε, Y ) = ref3 (z, ε, ∅, {z})

ref2 (e1 ⊗ e2, ε, Y ) = let (e′i, Zi) = ref2 (ei, ε, Y ) in

ref3 (e′1 ⊗ e′2, ε, ∅, Z1 ∪ Z2)

ref : E 7→ (E × ℘(V))ref (e) = ref2 (e, εp,vars (e))

Code analysis

Definitions are translated (excluding optimizations) from a working imple-mentation (Appendix A), written in the Haskell [62] functional language.

The function Atom (e, ε) checks if it is possible to compute an atomicabstract value for e on ε, and computes it. As said above, the role of e] (ε ↓)is to produce, when possible, an assertion on the larger input, starting fromthose on its sub-inputs. If Atom (e, ε) is (V,true), then V is atomic; thismeans that ε is precise enough to infer an atomic value for e. Consequently, eonly depends on the variables which are specialized in ε, with respect to εp.Example. Let εp be the most general input, and ε ∈ [εp]{x,y} be specialized

on x and y, that is, ε(x) = Vx, ε(y) = Vy and ε = εp elsewhere. If e] (ε) = Veis atomic, then there is no possibility that any z /∈ {x, y} is relevant to e,since, by the monotonicity of e],

∀Vz ≤ εp(z). e] (ε [z ← Vz ]) = Ve

This means that different values of z cannot make a difference in the observedvalue of e. 2


Remark 5.9. Expressions are evaluated starting from εp, then some sub-inputsof λx.> are not considered (they should be if e would be considered as astandalone entity). Actually, this choice is correct since, in program analysis,we are interested in expressions as parts of programs. Consequently, it makessense to ignore inputs which can never occur in concrete executions.

The function ref3 takes an expression, an environment and two sets ofvariables X and Y . These sets represent, respectively, a temporary lower andupper approximation of the relevant variables of e. For example, the call ofref3 in the pattern ref2 (z, ε, Y ) takes Y = {z} since z is the only vari-able which can be relevant to the expression z. Moreover, in the patternref2 (e1 ⊗ e2, ε, Y ), Z1 ∪ Z2 is taken by ref3 since the relevant variablesof e1 ⊗ e2 are less than or equal to those of e1 or e2. The use of Y onlyimproves efficiency; vars (e) could be taken instead. On the other hand, inref3 (e, ε,X, Y ), the set X consists of the variables which are more special-ized in ε than in εp. Initially, X = ∅. If X is enough to infer a ground valuefor e] (ε ↓), then it is returned. Otherwise, a variable y is added to X andrecursion is invoked on the restrictions of ε on y.

In ref2, recursion is performed on the structure of expressions; conversely,ref3 recurs on the sub-inputs which are obtained by specializing the param-eter on each y ∈ Y . The motivation for taking the intersection on y of thesesets is made clear by the Corollary 5.12 below.

Definition 5.10. Abstract dependency can be extended to sets X ⊆ vars (e).

Xρ e [X, y1, . . . , yk] ⇐⇒ ∃{v′x, v

′′x}x∈X , U1 . . . Uk ground .

e] ({α(v′x)}x∈X , U1 . . . Uk) 6= e] ({α(v′′)}x∈X , U1 . . . Uk)

where {·}x∈X is to be seen as a tuple. Clearly, {x}ρ e iff x

ρ e (Def. 5.6).

Given a tuple X of variables and a tuple V of abstract values, ε [X ← V ] isthe set extension of the environment updating: each x ∈ X is updated with thecorresponding V ∈ V. Two inputs ε′ and ε′′ agree on Z (written ε′ =Z ε

′′) if,for every z ∈ Z, ε′(z) = ε′′(z) holds. =\Z is a shorthand for =vars(e)\Z .

Theorem 5.11.

¬(

Xρ e

)

∧ ¬(

Yρ e

)

=⇒ ¬(

X ∪ Yρ e

)

(Otherwise stated: X ∪ Yρ e =⇒ X

ρ e ∨ Y

ρ e)

Proof. For each ε, let EεX be the set {ε [X ← V ] | V ∈ A|X|}. We have

¬(

Zρ e

)

⇐⇒ ∀ε. ∀ε′, ε′′ ∈ EεZ . e] (ε′) = e] (ε′′)

Then, the right hand sides of the formulæ hold by hypothesis for both X andY . The elements of EεZ are exactly those which agree with ε on vars (e) \Z.


ε′ =\Z ε′′ ⇐⇒ ε′, ε′′ ∈ Eε

′

Z ⇐⇒ ε′, ε′′ ∈ Eε′′

Z

We must prove that ∀ε1, ε2 ∈ EεX∪Y . e] (ε′1) = e] (ε2). We already know that

ε1 =\(X∪Y ) ε2. We can take some ε′ such that ε′ =\(X∪Y ) ε1 and ε′ =\(X∪Y )

ε2. Moreover, ε′ can be chosen to verify ε′ =X\Y ε1 and ε′ =Y \X ε2. Hence,both assertions ε′ =\Y ε1 and ε′ =\X ε2 hold. By hypothesis, {ε′, ε1} ⊆ EεY , so

that e] (ε′) = e] (ε1). Similarly, e] (ε′) = e] (ε2) and, by transitivity, e] (ε1) =e] (ε2). 2

Corollary 5.12. Given e, ε, X and Y , there is a minimal element (withrespect to set intersection on variable sets) for the set of sets

⋃

ε′∈[ε]y

ref3 (e, ε′, X, Y )

y∈Y

Therefore, taking the intersection on y in the definition ref3 is meaningful.

Substitution of subexpressions

This algorithm has an important feature: it computes, together with the setof relevant variables, a modified expression which is equivalent to the givenone, with respect to the abstract domain, but possibly simpler. This is doneby replacing constant (that is, without relevant variables) subexpressions withthe corresponding atomic abstract values (see the first pattern in the definitionof ref3). The following property holds:

Proposition 5.13. Let e′ and X be the expression and the set of variablesobtained by computing ref (e). Then, y /∈ vars (e′) implies y /∈ X.

Proof. It follows easily by observing that, if y does not occur in e′, then itonly occurs in constant (without relevant variables) subexpressions of e. 2

Note that the other direction of the implication is not true. As a counterex-ample, let e = fst(x1, x2), with fst = λ(x, y).x; in this case, x2 is clearly notrelevant to e, but e′ = e since the second subexpression of e is not constantby itself.

An example

Let e [x1, x2, x3] = (6∗x1)+(4∗x2 ∗x2)+x3 be analyzed in ρps. Suppose someinformation can be inferred on the initial input εp, namely, that x3 is alwaysnegative at p. Therefore, εp = ([>] , [>] , [neg]). Clearly, e is not constant inp with respect to the abstract properties: e] (εp) = [>] (no matter how muchthe algorithm for computing e] is refined), so that both parity and sign canchange depending on the concrete value of variables. However, we can infersome properties for any of the addenda:


• 6 ∗ x1 is always even: (6 ∗ x1)] (εp) = [even].

• 4 ∗ x2 ∗ x2 is non-negative and even.• x3 is negative, due to the restriction on εp.

Every variable only occurs in one addendum, then it is relevant only if theaddendum is. Here, we have that the information about 6 ∗ x1 and x3 isnot enough to mark the variables as irrelevant ([even] and [neg] are notatomic). On the other hand, x2 cannot affect the abstract value of the secondaddendum, then it is not relevant. The algorithm is able to produce this result,since e] ((V1, [>] , V2) ↓) is atomic for every atomic values V1 and V2. In facts,the result of ref (e) is ((6 ∗x1) + [poseven] +x3, {x1, x3}). Note that, in the(less precise) abstract domain of parity, x3 is the only relevant variable, since6 ∗ x1 is constant.

Soundness

The soundness result we want to obtain is that, if xρ e, then x belongs to

the set of variables which is computed by ref (e) on the abstract domain ρ.Since the main part of the computation is implemented by ref3, the followinglemma has to be proven.

Lemma 5.14 (soundness of ref3). Let e be an expression and Y be the setof variables which has been computed on its subexpressions (as Z1 ∪Z2 in thedefinition of ref2). Then

xρ e =⇒ x ∈ X where (e′, X) = ref3 (e, εp, ∅, Y )

Proof. By induction on the structure of expressions. If e is a constant, thenits abstract value is always ground and ref3 correctly returns ∅. If e = y,then y is relevant if and only if εp(y) is not ground; again, ref3 is sound.

Let e be the composition of a set {ei} of subexpressions. By inductivehypothesis, we can take Y as a sound over approximation of the relevantvariables of e (since ref3 has been invoked on ei). Suppose, by contradiction,that x does not belong to the computed set. We define two predicates:

π(Z) ≡ ref3 (e, εZ , Z, Y ) computes Z

Where εZ ∈ [εp]Z . π(Z) is equivalent to say that e] (εZ ↓) is atomic.

µ(Z) ≡ (π(Z) ∨ ∃z 6= x. µ({z} ∪ Z)) ∧ x /∈ Z

The predicate µ(∅) means that some set Z, which does not contain x, is re-turned by the ref3 algorithm (note that the existential quantifier correspondsto set intersection in the definition of ref3). Consequently (snd = λ(a, b). b),x /∈ snd (ref3 (e, εp, ∅, Y )) implies µ(∅). The base of µ recursive definitionis π(Z) ∧ x /∈ Z for some Z (the computed set). It is easy to see that π(Z)implies that any variable outside Z is not relevant to e (see Definition 5.10),then x is not relevant. 2


Theorem 5.15 (soundness). Let ρ be the abstract domain used in comput-ing e], and x ∈ vars (e). The result of ref is a sound approximation of therelevant variables in e, under the assumption that the inputs to e are restrictedto those which are represented by εp.

xρ e =⇒ x ∈ X where (e′, X) = ref (e)

Proof. It follows quite easily from the definition of ref and ref2, and Lemma5.14. 2

Complexity issues

In the worst case, this algorithm is exponential (see Section 5.2.3. However,its complexity can be much better if some information can be inferred on εpand there are subexpressions which are equivalent to constants.

This algorithm can easily be equipped with a computational bound: if,after consuming N time-resources, ref3 (e, ε,X, Y ) has not terminated itsexecution, its evaluation can be interrupted and the upper approximation Ycan be returned (and made available on the recursion stack for the rest of thecomputation).

5.2.6 Discussion

This section defines an algorithm for computing abstract dependencies in ex-pressions. In order to improve its efficiency, it relies on the structure of theabstract domain in which properties are described. The algorithm computesan upper approximation of the set X of relevant variables, together with asimplified expression e′, which is obtained by replacing constant subexpres-sions. Even if expressions usually contain a few variables, this method canbe also applied to functional subroutines (which are, usually, larger than ex-pressions). Moreover, side effects can be taken into account by keeping trackof the variables which are used inside procedures. In view of its applicationto several Program Analysis techniques, the algorithm should be modified inorder to have vars (e′) = X where (e′, X) is the result of ref (e); furtherresearch has to be carried out in this direction.

In the following sections, we denote by refεp (e) (or ref (e), when theinput is implicit) the result of the ref (e) algorithm on the abstract input εp:

In view of the properties as classes correspondance, we can consider theabstract domain as a Java class hierarchy; this does not invalidate the resultsobtained in this section.

5.3 Boolean functions 83

5.3 Boolean functions

In this section, a formalism to describe the information flow behavior of pro-grams is presented. Boolean functions [28] keep track of the propagation ofsecret information in program data. A boolean function is a first-order logicformula ranging on a set of basic predicates: the boolean variables.

Definition 5.16 (Boolean variables). Boolean variables are identifiers witha subscript:

B = {xj | x is an identifier, j is a subscript}

The set of subscripts contains i and o, standing for input and output, togetherwith fresh subscripts which are created when required by the rules for computingboolean functions. Subscripts correspond to program points: for example, whenanalyzing a statement s, the subscript i refers to the program point immediatelybefore s (its input state).

Boolean variables can take, as their values, true or false; xj is true ifdata identified by x can have a secret informational content at the programpoint corresponding to j. A boolean variable corresponding to the programinput (xi for some x) is true if and only if the identifier x belongs to theprivate part of program data (see Section 4.3.2).Example. In analyzing the program P , suppose xi ∧ ¬yi holds, so that xis private and y is public. Let t and u be, respectively, the subscripts corre-sponding to immediately before and immediately after a variable assignments ≡ (y := x). Then, provided xt holds (that is, x has not lost its secret contentin any program point before s), yu is true because the secrets of x flowed intoy. If no further modifications occur to y, then yo is true. Consequently, thereis a forbidden information flow from x to y, because a public identifier is notguaranteed to be secret-free in the output (this is described by the formula¬yi ∧ yo). 2

An identifier x can contain secrets if some of the data represented by xbelong to the private part of program data, or if they have been assignedsome value depending on private data. We say can contain secrets, instead ofcontains secrets, because, actually, a dangerous assignment is not a sufficientcondition for a forbidden flow to exist. Yet, the analysis is interested in thenegation of this condition, that is, that data cannot contain secret information.

A typical form for boolean functions is xo ↔ xi, where ↔ is logical equiv-alence. This means that the statement does not modify the information flowbehavior of x, that is, x can contain secrets after s if and only if it can containsecrets before.

Definition 5.17 (boolean function substitutions). Given a formula ψ,the formula [ψ]xt

yuis obtained by variable substitution, that is, by replacing in


ψ all occurrences of xt with yu. The subscript substitution [ψ]tu is obtainedfrom ψ by replacing all the occurrences of the subscript t with u, that is, allvariables vt with their corresponding vu.

[ψ]xt

yu= ψ[xt → yu]

[ψ]tu = ψ[vt → vu] for each v

The special variable reto refers to the result of evaluating an expression.

5.3.1 Models and Abstract Non-Interference

A model for a boolean function ψ is an assignment M of boolean variablesmaking ψ true. ψ is satisfiable iff it has models.Example. Consider the IMP assignment x := f(y, z). The correspondingboolean function is

ψ = (xo ↔ yi ∨ zi) ∧ (yo ↔ yi) ∧ (zo ↔ zi)

meaning that the output value of x can contain secrets if either y or z can inthe input. The models for ψ are (vs ∈ M if and only if vs is true in M)

JψK =

{

∅, {yi, xo, yo}, {zi, xo, zo}, {yi, zi, xo, yo, zo},{xi}, {xi, yi, xo, yo}, {xi, zi, xo, zo}, {xi, yi, zi, xo, yo, zo}

}

As an example, there are no modelsM such that {yi, zi}∩M 6= ∅ but xo /∈ M:it is not possible that either y or z can contain secrets in the input if x cannotin the output. 2

Only a subset of the models has to be considered in Information Flowanalysis: those which are consistent with the division of identifiers into publicand private (described by the truth values of vi).

Definition 5.18. Let Ψ (P) be the boolean function associated with P. Non-Interference holds for P if

Ψ0 (P) ≡

(

∧

x∈L

¬xi

)

∧ Ψ (P) ∧

(

∨

x∈L

xo

)

is not satisfiable. In other words, P has the Non-Interference property if everymodel of its boolean function which is consistent with L and H (

∧

x∈L¬xi) does

not allow any dangerous flows to public program identifiers (∨

x∈Lxo).

In the above example, ψ is unsatisfiable iff x ∈ H or y, z ∈ L, that is, ifthe assignment is not dangerous since x is a private identifier or the assignedexpression does not contain secrets.

5.4 Source code analysis via boolean functions 85

5.4 Source code analysis via boolean functions

Example 5.3.1 refers to an imperative language where variables always denotethe same location through the whole execution. Things get more complicatedwhen dealing with an Object-Oriented programming language: identifiers v oru .f can share the same location, so that the effects of an assignment can berelevant to more than one identifier. Moreover, the boolean function for theIMP assignment was computed assuming, as in standard Non-Interference,that both y and z are relevant to x. As discussed in Section 5.2, we need totake Abstract Dependencies into account in order to model ANI. The functionΨ (P) computes the boolean function which represents the information flowproperties of P with respect to some assumption on the observational powerof attackers.

5.4.1 Analysis of statements

This section presents a compositional abstract semantics for computing theinformation flow behavior of statements. In the following, ε is the type environ-ment computed immediately before the statement to be analyzed. Definitions5.1, 5.20 and 5.21 provide the necessary background to the rules for computingΨ. They should be read whenever required by Definition 5.22.

Definition 5.19 (Method instances). Let Ψ (d) be the boolean functioncomputed for a method instance d. The function Ψ′′ε (C .m, e1 . . . ek) computesψ for a method invocation of m with actual parameters e1, . . . , ek.

Ψ′′ε (C .m, e1 . . . ek) =∨

d∈instε(C,m,e1...ek)

Ψ (d)

Since it is not statically known which d will be executed, the resulting booleanfunction is the disjunction on all the possible instances.

Note that ψ′ ∨ ψ′′ is unsatisfiable if both ψ′ and ψ′′ are; therefore, dis-junction over d is a conservative, sound choice (see Definition 5.18: Non-Interference must hold for every execution).

Definition 5.20 (Implicit flows). When the control flow of a program can-not be statically decided, as in conditionals or loops, the identifiers G in theboolean guard can determine which path will be taken at runtime. Then, ifsome x ∈ G contains secrets, there is an implicit information flow from G toidentifiers which are defined in some of the branches, since information aboutthe values in G can be acquired by watching which path is executed. Let x be arelevant (with respect to the abstract properties) identifier in a boolean guardg, and y be an identifier which is defined inside a statement s, dependingon g. Then, the implicit flow from x to y must be added to Ψ (s), yieldingΨ (s) [x y]:


ψ [x y] = [ψ]yo

y′o∧ (yo ↔ (y′o ∨ xi))

The corresponding definitions for sets of identifiers are obtained by disjunctionand iteration over X and Y .

ψ [X y] = [ψ]yo

y′o∧ yo ↔ (y′o ∨ (∨x∈Xxi))

ψ [X ∅] = ψ

ψ [X ({y} ∪ Y )] = (ψ [X y]) [X Y ]

Definition 5.21 (Expressions). The information flow behavior of an ex-pression with no side effects is computed by

Ψe (e) = reto ↔(

∨

{xi | x ∈ refε (e)})

where ret is a special boolean variable which holds if the final value of theexpression can contain secrets.

The function Ψ computes the boolean function representing the behav-ior of a statement with respect to abstract information flows. ref computesrelevant identifiers in expressions, parameterized on the desired ANI securitypolicy, that is, the abstract properties an observer should not be able to dis-tinguish. Ψ analyzes the body of a method instance compositionally, startingfrom the type environment ε (which is left implicit).

Definition 5.22 (Statements). The information flow behavior of state-ments is computed by the function Ψ. For every identifier y which is notaffected by the statement, the formula yo ↔ yi is implicitly added.

Ψ (x=e) = [Ψe (e)]ot ∧

(

rett ↔∧

{yo | y ∈ shε (x)})

Ψ (s1 ; s2) = [Ψ (s1)]ot ∧ [Ψ (s2)]

it

Ψ ( if (e) s1 else s2) = (Ψ (s1) ∨Ψ (s2)) [vars (e) def (s1) ∪ def (s2)]

Ψ (while (e) s) = lfp∧xxo↔xi

(

λψ. D ∨(

[ψ]ot ∧ [D]

it

))

where D = Ψ ( if (e) s else skip)

Ψ(

v .m(e1 . . . ek))

=∨

C�Tε(v)

Ψ′(

C .m(e1 . . . ek))

Ψ′(

C .m(e1 . . . ek))

=

∧

j=1..k

[

Ψe

(

ej)]reto

pji

o

t

∧ [Ψ′′ε (C .m, e1 . . . ek)]it

• When an identifier x is assigned, the secret content of e may propagate tothe sharing identifiers for x. Therefore, all sharing y can contain secretsif some relevant variable of e has secrets (see Definitions 5.4 and 5.21).


• The rule for sequential composition helps to understand subscript substi-tutions. The use of a fresh subscript t (which corresponds to the programpoint between s1 and s2) creates a correspondence between the output ofs1 and the input of s2. Therefore, xo holds after s2 if (i) x acquires secretsin s2; or (ii) xt holds after s1 and x is not modified in s2.

• In conditional statements, both paths must be taken into account. More-over, implicit flows keep track of secrets in the boolean guard. Note that,in computing the relevant guard identifiers, vars (e) is used instead ofrefε (e). In facts, constancy with respect to the abstract property is notenough to decide which path is to be taken at runtime (for example, inIMP, knowing that x is even gives not enough information about the test

x?= 0). In terms of dependency calculus, this amounts to say that control

dependencies are not abstracted.• Computing dependencies for a loop involves a fixpoint computation. A

loop is equivalent to every possible sequential composition of the statementif (e) s else skip. Standard fixpoint approximation techniques, such aswidening (Section 3.1.5), can be used in order to ensure that the fixpointcomputation will terminate.

• When a method is invoked, the analyzer must consider every method in-stance which is compatible with the actual parameters. This is done by thedisjunction on Tε (v) and by computing Ψ′′ε (C .m, . . . ). Actual parametersare related to formal parameters by replacing reto with the correspondinginput instance of the formal parameter pj in Ψe

(

ej)

.

Static members

Definition 5.22 can account for static fields by simply extending the definitionof sh. Static fields are guaranteed to be separated with regard to their set oflocations, so that every class has its own place in the heap.

shε (C .f) = {C .f}

Boolean functions for static methods are computed by Ψ′, which can also beseen as a one-class restriction of the method invocation Ψ rule.

5.4.2 Whole-program analysis

It is easy to see that the computation of Ψ for a method instance can lead toproblems, due to mutual recursion [68]. In facts, the information flow behaviorof d depends on the behavior of every d′ which is invoked inside d. Since d′ canin turn invoke d, the termination of the Ψ computation is not guaranteed. Toavoid this problem, we perform a fixpoint computation on ψ-states Ψ ∈ FλB ,that is, functions from method instances to boolean functions.

Ψ ∈ FλB = [Md 7→ FB]


A special ψ-state is

Ψ⊥ = λd. ψ⊥

ψ⊥ =∧

x

xo ↔ xi

in which every method instance is neutral (that is, equivalent to skip) withrespect to information flow, leaving unchanged all identifiers.

We show a technique which is quite a standard fixpoint computation onψ-states. Let Ψ⊥ be the initial state. When a method is invoked, its instancesare pushed on the method stack, which initially contains the (unique) instanceof main(). At each step, the method instance d on top of the stack is analyzed.To avoid problems related to mutual recursion, when m() is invoked inside d,every d′ ∈ inst (·,m, ·) is pushed on the stack and analyzed if and only if it isnot already somewhere on the stack. Otherwise, the current boolean functionsΨ(d′) is taken. When the analysis of d is complete, Ψ(d) is updated with thenew computed boolean function.

In this new fomulation, the definition of Ψ′′ε (C .m, e1 . . . ek) must be re-stated:

Ψ′′ε (C .m, e1 . . . ek) =∨

{dΨ | d ∈ instε (C,m, e1 . . . ek)}

dΨ = (d is on the stack) ? (Ψ(d)) ; (Ψ (d))

where Ψ is the current ψ-state. In the following algorithm, main is the methodinstance of main().

ANI {foreach d . Ψ(d) <− ψ⊥ ;do {Ψ <− ana lyze (main , Ψ )

} until ( no changes occur to Ψor n i t e r a t i o n s have been done )

return Ψ}

ana lyze (d , Ψ ) {push (d ) ;foreach method c a l l {

foreach compatible method in s tance d′ {i f ( d′ i s on the stack ) {d′Ψ <− Ψ(d′) ;

} else {d′Ψ <− ana lyze (d′ ,Ψ ) ;

}}


// d′Ψ i s used to compute the boo lean func t i on// f o r the method invoca t i on. . .

}// ana l y s i s o f o the r s ta tements. . .// the ana l y s i s o f d y i e l d s a boo lean func t i on ψΨ(d) <− ψ ;pop (d ) ;

}

The algorithm checks method instances until the stack is empty, that is, untilthe analysis of main is complete.

5.4.3 Widening

The ANI procedure terminates when a fixpoint is reached or, in alternative,the number of performed iterations reaches a fixed threshold n. In this case,the problem arises of how to deal with final Ψ -states which are not fixpoints.

Simply taking the n-th Ψ -state as the final result of ANI could result inan incorrect behavior of the analyzer. In facts, the algorithm starts under theoptimistic assumption that no dangerous flows exist, being the behavior ofmethods neutral with respect to Non-Interference (which is the meaning ofψ⊥). If the computation is artificially stopped before reaching the fixpoint,then we are not guaranteed that data propagation has been completely takeninto account. Consequently, programs could be possibly accepted even if somedangerous flow may exist. The Abstract Interpretation theory uses wideningoperators (Section 3.1.5) in order to enforce soundness in bounded fixpointcomputations.

In our framework, the widening of a Ψ -state Ψ yields boolean functionsΨ(d) which are correct with respect to the real information flow descriptions ofmethods, that is, which are more pessimistic with respect to the satisfiabilityof the security requirements. The most naıve choice is to set Ψ(d) to ψ> forevery d. The formula ψ> is one which is maximal with respect to the orderingon boolean functions (introduced in Section 5.5.1); for example, true can besuch a formula. Obviously, this choice would lead to the rejecting of everypossible program. A smarter solution is to define the widening operator asfollows:

Definition 5.23 (Finished method instances). When executing analyze()on d and Ψ , we must keep record of when the boolean function for d hasreached a fixpoint. For example, if d contains no method invocations, thenits final information flow behavior can be found at the first iteration of theANI() main loop. We need to keep a boolean value for every instance; this


value can be updated (from false to true) at the end of analyze(), based onthe following rule:

finished (d) = ∀d′ invoked inside d. finished (d′)

(clearly, the predicate is trivially true if no invocations occur in d). This meansthat d can be labeled as finished if every method invocation d′ inside d has beenpreviously labeled as finished.

Definition 5.24 (Widening). The widening operator widening : FλB 7→ FλB

is an extensive operator on ψ-states. A further iteration of the main loop isperformed, in which recursive calls are replaced by (i) if its analysis is finished:the current value for d′; (ii) otherwise: a safe, pessimistic boolean function ψ>.

widening (Ψ) = ∀d. analyze d with:

∀d′ invoked in d. (finished (d′)) ? (Ψ(d′)) ; (ψ>)

Remark 5.25. Actually, widening is not properly a widening operator, asdefined in Definition 3.3. To fit the definition, we define On : FλB ×F

λB 7→ F

λB

as operating on the ascending chain x0, . . . , xm, . . . and on the bound n.

y0 = x0

ym+1 = xm+1 if m < nym+1 = ym On x

m+1 = widening(

xm+1)

otherwise

It is easy to verify that this definition of the ym chain satisfies the wideningconditions.

Example. Let a program P be characterized by the below graph, representingthe following relation: d→ d′ if and only if the method instance d′ in invokedinside d.

d

d′′1

d′

d′′2

After the first iteration of ANI(), only d′ is labeled as finished. The othermethod instances cannot be assigned the finished label because of the mutualrecursion between d′′1 and d′′2 . 2

5.5 Correctness and further discussion

This section outlines a correctness proof for the ANI() procedure, relying onthe ordering on ψ-states and the monotonicity of a single iteration of the mainloop.

5.5 Correctness and further discussion 91

5.5.1 The ordering on ψ-states

In order to guarantee that a fixpoint on ψ-states will eventually be reached,we must prove that a single iteration of the ANI() procedure is monotone onψ-states.

Definition 5.26 (Ordering on ψ-states). The ordering on ψ-states is de-fined as follows:

ψi =

(

∧

x∈L

¬xi

)

∧

∧

y∈H

yi

Ψ ′ ≤ Ψ ′′ ≡ ∀d. Ψ ′(d) ≤ Ψ ′′(d)

ψ′ ≤ ψ′′ ≡ Jψi ∧ ψ′K ⊆ Jψi ∧ ψ′′K

Among the reasonable boolean functions (that is, those which are a descrip-tion of some legal program), the formula ψ⊥ is minimal with respect to thisordering, because the only model of ψi ∧ ψ⊥ is {yi|y ∈ H} ∪ {yo|y ∈ H}.

Since the number of identifiers is finite, the number of models is also finite(the cardinality of the powerset of boolean variables). Consequently, theredoes not exist any infinite ascending chain Ψo, . . . , Ψk, . . . ; if monotonicity canbe proven, the fixpoint computation is guaranteed to terminate in a finitenumber of steps. The widening operator of Section 5.4.3 results to affect theefficiency, rather than the termination of the algorithm.

5.5.2 Monotonicity

Proving monotonicity for an iteration of ANI() involves checking the mono-tonicity of the analyze() procedure. This can be done by demonstrating thatall rules of Ψ are monotone on their parameters. Some easy lemmata on theboolean function ordering are provided.

Lemma 5.27. Conjunction preserves monotonicity:

ψ′1 ≤ ψ′′1 ∧ ψ′2 ≤ ψ

′′2 =⇒ (ψ′1 ∧ ψ

′2) ≤ (ψ′′1 ∧ ψ

′′2 )

Proof. Trivial. 2

Lemma 5.28. Disjunction preserves monotonicity:

ψ′1 ≤ ψ′′1 ∧ ψ′2 ≤ ψ

′′2 =⇒ (ψ′1 ∨ ψ

′2) ≤ (ψ′′1 ∨ ψ

′′2 )

Proof. Trivial. 2


Lemma 5.29. Subscript substitution preserves monotonicity:

ψ′ ≤ ψ′′ =⇒ [ψ′]tu ≤ [ψ′′]

tu

Proof. Trivial (subscript substitution does not alter the nature of booleanfunctions). 2

Lemma 5.30. Let M be a variable assignment; [M]xs

ytis defined to be such

that xs ∈ [M]xs

yt⇔ yt ∈ M and, for every zu 6= xs, zu ∈ [M]xs

yt⇔ zu ∈M.

Then, for every ψ, we have

M∈r[ψ]

xs

yt

z⇐⇒ [M]

xs

yt∈ JψK

Proof. yt occurs in [ψ]xs

ytwhere (i) it occurs in ψ; and (ii) xs occurs in the

same place in ψ. If M is a model of [ψ]xs

yt, then there exists a model M′ of

ψ (which is equal to M for every variable which is not xs or yt) such thatxs ∈ M

′ ⇔ yt ∈ M′, since xs and yt are considered as a single variable in

[ψ]xs

yt. Moreover, by stating yt ∈ M′ ⇔ yt ∈ M, we have M′ = [M]

xs

yt. The

other direction of the implication is similar. 2

Lemma 5.31. Variable substitution preserves monotonicity, under the hy-pothesis that the replaced variable is not the input instance of any x:

∀ψ′, ψ′′, xs, yt. s 6= i ∧ ψ′ ≤ ψ′′ =⇒ [ψ′]xs

yt≤ [ψ′′]

xs

yt

Proof. The condition on xs implies [ψi]xs

yt= ψi. Then

M∈rψi ∧ [ψ′]

xs

yt

z⇐⇒ [Lemma 5.30]

[M]xs

yt∈ Jψi ∧ ψ′K =⇒ [hypothesis]

[M]xs

yt∈ Jψi ∧ ψ′′K ⇐⇒ [Lemma 5.30]

M∈rψi ∧ [ψ′′]

xs

yt

z

The hypothesis s 6= i is not restrictive because no substitution [ψ]yi

ztoccurs in

the definition of Ψ. 2

Lemma 5.32. Adding implicit flows to boolean functions preserves mono-tonicity:

∀X,Y. ψ′ ≤ ψ′′ =⇒ ψ′ [X Y ] ≤ ψ′′ [X Y ]

5.5 Correctness and further discussion 93

Proof. First, we prove that, for a set X and an identifier y,

ψ′ ≤ ψ′′ =⇒ ψ′ [X y] ≤ ψ′′ [X y]

This is true since

ψ′ ≤ ψ′′ =⇒ [Lemma 5.31]

[ψ′]yo

y′o≤ [ψ′′]

yo

y′o=⇒ [Lemma 5.27]

(

[ψ′]yo

y′o∧ (yo ↔ y′o ∨ (∨x∈Xxi))

)

≤

≤(

[ψ′′]yo

y′o∧ (yo ↔ y′o ∨ (∨x∈Xxi))

)

=⇒ [Def. 5.20]

ψ′ [X y] ≤ ψ′′ [X y]

The general case ψ′ [X Y ] ≤ ψ′′ [X Y ] can be obtained by iterating theabove derivation for every y ∈ Y . 2

Definition 5.33. The function Ψ [s] is a specialized version of Ψ, parame-terized on a statement s. The parameters for Ψ [s] replace the recursive callsto Ψ or Ψe inside the corresponding definition of Ψ.

Ψ [x=e] = λψ. [ψ]ot ∧

(

rett ↔∧

{yo | y ∈ shε (x)})

(5.1)

Ψ [s1 ; s2] = λψ1. λψ2. [ψ1]ot ∧ [ψ2]

it (5.2)

Ψ [ if (e) s1 else s2] = λψ1.λψ2. (ψ1 ∨ ψ2) [X Y ] (5.3)

X = refε (e)

Y = def (s1) ∪ def (s2)

Ψ [while (e) s] = λψ. lfp∧xxo↔xi

(

λφ. ψ ∨(

[φ]ot ∧ [ψ]it

))

(5.4)

Ψ[

v .m(e1 . . . ek)]

= λψ, ψ1 . . . ψk. [ψ]it ∧

∧

j=1..k

[

ψj]reto

pji

o

t

(5.5)

Ψ [return e] = λψ. [ψ]ot ∧ rett ↔ reto (5.6)

Actually, rather than a statement, s is for a statement schema: for example,if (e) s1 else s2 stands for an arbitrary conditional statement.

Theorem 5.34. For every statement schema s, Ψ [s] is monotone on its pa-rameters. That is, for Ψ [s] with k-arity,

∀j ∈ {1, . . . , k}. ψ′j ≤ ψ′′j =⇒ Ψ [s] (ψ′1, . . . , ψ′k) ≤ Ψ [s] (ψ′′1 , . . . , ψ

′′k )

Proof. By cases:


• Case (5.1): follows by Lemmata 5.31 and 5.28.• Case (5.2): follows by Lemmata 5.27 and 5.29.• Case (5.3): X and Y can be computed once for given e, s1 and s2; then,

they can be regarded as constant sets. The result follows by Lemmata 5.32and 5.28.

• Case (5.4): the function λφ. ψ∨(

[φ]ot ∧ [ψ]it

)

is monotone on φ by Lemmata

5.31, 5.27, and 5.28; then, finding its fixpoint is meaningful. It is alsomonotone on ψ, by the same lemmata. Consequently, the lfp is monotoneon ψ.

• Case (5.5): the parameter ψ is built out of all possible instances of m thatcan be called by v; the result follows by Lemmata 5.31 and 5.27.

• Case (5.6): follows by Lemmata 5.31 and 5.27.

2

Theorem 5.35. For any method instance d, the procedure analyze(d, Ψ ) ismonotone on Ψ .

Proof. Monotonicity for non-invoking statements follows directly by Theorem5.34. For each method invocation, the set of compatible method instances doesnot depend on Ψ . The boolean function for d′ is either directly taken fromΨ or computed by analyze() on d′ and Ψ . In both cases, monotonicity on Ψholds. d′Ψ is computed for every d′; the monotonicity of the disjunction followsby Lemma 5.28. 2

Theorems 5.34 and 5.35 prove that the ANI procedure indeed reaches afixpoint which describes the information flow of a program. Since no infiniteascending chains exist, termination is guaranteed. However, a widening oper-ator was defined (Section 5.4.3) in order to improve efficiency.

5.6 The link to Program Slicing

In section 3.4.2, the strong connection between Program Slicing and Informa-tion Flow was outlined. A quite natural extension of standard slicing consistsof considering abstract properties instead of concrete properties: the state-ments which are relevat to a criterion 〈j, v〉 (and, therefore, which should beincluded in the abstract slice) are those which can affect the abstract value ofv at the program point j. Rival [65] includes an abstract version of programslicing, which still lacks a comprehensive definition2, among the applicatonsof the abstract dependency calculus.

2 Despite its title, the recent work by Hong, Lee and Sokolsky [44] does not discussabstract slicing in the meaning we are investigating.

5.6 The link to Program Slicing 95

5.6.1 Computing abstract slices

Consider the simple slicing algorithm in Section 3.4.1. It iteratively computesthe functions RkS (i) (for a statement at program point i), SkS and BkS for aslicing criterion S, starting with the index k = 0. These functions provide thesolution of dataflow equations for computing the set of statements which arerelevant to S. They rely on ref () and def (), that is, respectively, the set ofreferred and defined variables in a statement. The main difference between theconcrete and the abstract case can be encapsulated into the ref () function,which captures data dependencies3 (Section 5.2.2). Given an abstract domainρ which describes the abstract properties we want to study, the followingdefinition of ref () can be given:

ref (v := e) = ref (e) (abstract, parameterized on ρ)

ref ( if e then · else ·) = vars (e)

ref (while e do ·) = vars (e)

By simply replacing the abstract version of ref, parameterized on an abstractproperty ρ, in the dataflow equations computing B , R () and S , the computedslice turns out to be smaller than the concrete one.Example. Consider the program P ′ (the same as Example 3.4.2), in Figure5.2. If the sign property is considered, then the final value of l1 does not

Fig. 5.2. The program P ′

1h1 = h1 ;2h2 = h2 ;3l 1 = 1 ;4while ( h1 > 0) do {5l 1 = l 1 + 1 ;6h1 = h1 − 1 ;7}8l 1 = l 1 ;9l 2 = l 2 ;

depend on the initial value of h1 since:

• before entering the loop, l1 is always positive, due to the assignment l1 := 1;• adding 1 to a positive number does not change its sign.

3 As explained in Definition 5.22, control dependencies are not affected by theproperty abstraction.


Consequently, the statement h1 := h1 should not be included in the abstractslice. The behavior of the slicing algorithm should be consistent with theserequirements. The statement h1 := h1 is not included in the abstract slicebecause, when l1 := l1 is executed, the value of l1 is constant with respectto the abstract property. This result is obtained by the type analysis of theprogram, which yields ε8(l1) = [pos], so that refε8 (l1) = ∅. Therefore, theloop does not change the sign of l1 and its guard is not among the relevantvariables. 2

Proposition 5.36. Let ρ′ and ρ′′ be two abstract domains, with ρ′ more pre-cise than ρ′′. Given a program P and a slicing criterion S, the abstract criteriaS ′ and S ′′ are its reformulation on, respectively, ρ′ and ρ′′. Then, the abstractslice with respect to S ′′ is smaller than the slice on S ′.

Proof. By Proposition 5.7, we have, for every expression e, refρ′′ (e) ⊆refρ′′ (e). Both R0

S (i) and S0S are monotone on ref, then, for every program

point i,R0S′′ (i) ⊆ R0

S′ (i) S0S′′ ⊆ S0

S′

This result can be rewritten as R[0] ∧ S[0], given

R[k] ≡ RkS′′ (i) ⊆ RkS′ (i)

S[k] ≡ SkS′′ ⊆ SkS′

B[k] ≡ BkS′′ ⊆ BkS′

Moreover, the following easy inductive steps hold for every k ≥ 0:

S[k] =⇒ B[k]

R[k] ∧ B[k] =⇒ R[k + 1]

B[k] ∧ R[k + 1] =⇒ S[k + 1]

Consequently, S[k] holds for every k, and, in particular, for the fixpoint of theiteration sequence. 2

5.6.2 Obtaining consistent slices

An important requirement for a slice is to be a correct program. In particular,we may ask for all variables to be initialized before being used.Example. Consider the assignment s : x := e [y, z]: data are consistent if yand z are initialized in every path from the first statement to s. We assumethis condition holds for P . In Weiser algorithm, a slice for 〈s, x〉 contains allstatements s′ such that y or z belong to def (s′), and there is at least onepath reaching s where the effects of s′ are preserved. Consequently, the slice

5.6 The link to Program Slicing 97

P〈s,x〉 is guaranteed to contain all useful initializing statements for y and z,then it is correct with respect to variable initialization. 2

In abstract slicing, this kind of consistency is no longer preserved: state-ments s′ of the previous example, defining y (resp. z) can be deleted fromthe slice even if they are not shadowed, provided that there is no abstractdependency of e on y (resp. z)4. Then, e [y, z] may have some of its variablesuninitialized. To deal with this problem, we observe that our algorithm ref (e)in Section 5.2 computes, together with the set X of relevant variables, a sim-plified expression e′ which rules out variables occurring in constant-equivalentsubexpressions. Then, assignments x := e could be replaced, in the slice, byx := e′. Unfortunately, as pointed out after Proposition 5.13, it is not guar-anteed that y ∈ vars (e′)⇔ y ∈ X ; consequently, it could happen that somevariables in e′ lack initialization. In order to compute a consistent slice, wecould take vars (e′) (instead of X) as the set of relevant variables (thus ob-taining a less precise slice). However, future research will be directed to modifyref in order to obtain y ∈ vars (e′)⇔ y ∈ X .

5.6.3 Discussion

Abstract program slicing is an interesting application of the abstract depen-dency calculus to standard program slicing. A formal and complete definitionof what abstract slicing should do is still to be provided. Future researchshould study the necessary and sufficient conditions for having a semanticallycorrect abstract slice of a program. Correcteness conditions should help in thedesigning of an algorithm which is able to keep unrelevant statements if theyare necessary for the slice correctness, and to modify statements in order toeliminate undesired syntactic (concrete) dependecies.

Moreover, the same techniques could be applied to more advanced versionsof slicing [69], such as dynamic slicing [47] or conditioned slicing [13].

4 The same problem would arise if concrete dependencies were computed semanti-cally. For example, in the program x := y ; v := x − x, the variable v does notdepend, semantically, on x, then the first statement should be deleted from theslice.

6

Code certification for Abstract

Non-Interference

No one can build his security upon the nobleness of another person.

Willa Cather (1873 - 1947)

I love quotations because it is a joy to find thoughts one might have,beautifully expressed with much authority by someone recognizedwiser than oneself.

Marlene Dietrich (1901 - 1992)

This chapter discusses how to apply the Information Flow analysis out-lined in Chapter 5 to code certification. An architecture in the style of Proof-Carrying code (Section 3.5) is introduced. In order to be executed by mis-trustful users (that is, users which are not a priori confident in the safety ofprograms), programs must be supplied together with a correctness proof forthe desired security policy: Abstract Non-Interference.

Security checking of source code programs is a valid tool for programmersto design safe software. Yet, in general, programs are sold or provided intheir executable form. Therefore, an external user which wants to safely runa program will reasonably require more than a correctness proof of the sourcecode, since he or she is not guaranteed that the security properties of thesource program are preserved in the executable code.

Since we are dealing with Java-like JL programs, software is not suppliedto users as native, machine-dependent code. Instead, Java source code is com-piled to bytecode (Section 2.3), which is executed by the Java Virtual Machine(JVM [52]) interpreter. The problem is to verify that bytecode programs (com-ing as sets of .class files) indeed satisfy the security requirements. The firststep is to define an algorithm for checking ANI on bytecode.

100 6 Code certification for ANI

6.1 Analyzing bytecode programs

The syntax and semantics of bytecode are described in Section 2.3. It shouldbe clear that bytecode, although designed as the target language of Javacompilers, is not forced to be produced from Java source code. The JavaVirtual Machine knows nothing of the Java programming language, only of aparticular binary format: the .class file format. Bytecode files can be alsoobtained from other programming languages or techniques; the JVM does notneed to know the origin of .class files. The possibility to use bytecode in non-Java programming frameworks considerably increases the range of applicationfor this security analysis.

6.1.1 Bytecode boolean functions

Boolean functions will be used to describe the information flow behavior ofbytecode programs, similarly to their role in source code analysis. Before de-scribing how boolean functions are computed, we introduce some new booleanvariables for describing semantic objects1taking part to program execution.

Boolean variables can represent field references (that is, a subset of theelements R[j] for a Runtime Constant PoolR ∈ R); operand stack values(that is, the k-th element O[k] in some operand stack O ∈ O), local variables(that is, X [l] for X ∈ Xloc). Given an operand stack O and an instruction ι, τiand τo are, respectively, the index of the top element of the stack before andafter ι. The variables O[j]i and O[j]o denote the input and output instancesof the value at position j on the stack. Typically, O[τi]i and O[τo]o represent,respectively, the boolean variables corresponding to the top of the stack oninput and output. The elements under the top of the stack are denoted byO[τt−k]t for some k > 0. Substitutions [ψ]

xt

yuand [ψ]

tu are defined as in source

code analysis (Definition 5.17).

Definition 6.1. When two formulæ ψ and ψ′ are sequentially combined, someof the output variables of ψ need to have their subscript replaced by a fresh t.The operator · encodes the concatenation of boolean formulæ.

ψ · ψ′ = [ψ]ot ∧ ψ′ ∧

∧

vt /∈vars(ψ′)

vt ↔ vo

The equivalences vt ↔ vo are added for variables which are not affected by ψ′.The substitution [ψ′]

it is not necessary since it is explicit in the rules below.

6.1.2 A simple source code compiler

A naıf, non-optimized compiler is presented. The process of compilation isschematized as C

(

sS)

= sB , where sS is a source code statement and sB is

1 Here, the word object does not refer, specifically, to a class instance.

6.1 Analyzing bytecode programs 101

the result of compiling sS . In the rules, jv stands for the index of v in the setof local variables X ; hy stands for the index of a reference to y (which can be,for example, a class or a method) in the Runtime Constant Pool R.

C (v) = aload jv

C(

new C(e1 . . . ek))

= new rC ; dup; C(

e1)

; . . . ; C(

ek)

;

invokespecial rC . initC (v .f) = C (v) ; getfield rf

C(

v .m(e1 . . . ek))

= C (v) ; C(

e1)

; . . . ; C(

ek)

; invokevirtual rm

C (v=e) = C (e) ; astore jv

C (v .f=e) = C (v) ; C (e) ; putfield rf

C (s1 ; s2) = C (s1) ; C (s2)

C ( if (e′ == e′′) s1 else s2) = C (e′) ; C (e′′) ; if acmpneq p; C (s1) ;

goto q; p : C (s2) ; q : . . .

C (while (e′ == e′′) s) = p : C (e′) ; C (e′′) ; if acmpneq q;

C (s) ; goto p; q : . . .

6.1.3 The Control Flow Graph

In a bytecode program, statements are not organized in a block structure (asin while (e) {...} ). Instead, the control flow is regulated by labels and jumps.Consequently, program analysis cannot be purely compositional.

The instruction sequence of a method body can be schematized in a Con-trol Flow Graph (CFG), in which blocks of sequential instructions are linkedby jump edges. Bytecode jump instructions are goto (unconditional jump),if acmpeq (comparison for equality on a reference value), ifnull (comparisonfor equality with the null value), if icmpeq (comparison for equality on inte-gers), and ifeq (comparison for equality with zero). We ignore the last twoinstructions because we are not interested in primitive types.

CFG jump edges

Every bytecode instruction ι has at most two successors, that is, instructionswhich can be possibly executed after ι. For non-jump instructions, the suc-cessor of ι is the instruction directly following it in the code. Control flowmust be represented as a graph having, as vertexes, sequence of non-jumpinstructions (blocks), and, as edges, the elements of the successor relation in-duced by jumps. A block b can end in a non-jump instruction ι (for example,if the successor of ι is the target of some jump) or in a jump. If ι is non-jumpor goto, then b has one successor block; otherwise, b has two successors (thebranches of the conditional jump).


Example. Let the block structure of a program P be represented by theControl Flow Graph in Figure 6.1. In this graph, the most common blockstructures are exemplified. 2

if...

if...

goto

goto

non jump

non jump

Fig. 6.1. An example of CFG

Implicit flows

An edge ω from a block b to b′ (written bω→ b′) is labeled with the set δω

of identifiers which can affect the execution of b′. Intuitively, sets δω play therole of refε (e) in the definition of Ψ (Definition 5.22) for the conditionalstatement. For example, if, at the end of b, execution branches on x, then xbelongs to the δ labels of both edges exiting from b, possibly together withidentifiers inherited from higher-level (with respect to the program structure)conditional jumps.

More formally: let b be a block ending in ι. b inherits from its predecessorblocks a set δb of identifiers, defined as: ∪({δω | b′

ω→ b}); it is the set of identi-

fiers whose value can possibly determine whether b will be actually executed.Sets of branching identifiers for b are computed as follows, depending on ι.

• ι is not a jump: b has an only outgoing edge ω with δω = δb.• ι = goto: b has an only outgoing edge ω with δω = δb (no additional

branching can be done).• ι = if acmpeq: there are two outgoing edges ω′ and ω′′, with δω′ = δω′′ =

δb∪{O[τi], O[τi−1]}, where O is the operand stack when executing ι. Thismeans that branching on comparison depends on both compared values.


• ι = ifnull : both outgoing edges are labeled with δ = δb∪{O[τi]} (branchingonly depends on one identifier).

The initial block of a method instance d (that is, the first one to be ex-ecuted) is also equipped with a set δ of branching identifiers: in facts, theexecution of d could be conditioned by control flow in the method d′ whichinvokes d.Example. Edges in the block structure of Example 6.1.3 are labeled withbranching variables as shown in Figure 6.2. The sets δ′ and δ′′ are the set ofrelevant variables for the boolean guards. 2

δ′

δ′′

δ′ δ′

δ′ ∪ δ′′ δ′ ∪ δ′′

Fig. 6.2. Branching variables

Control Flow Graph analysis

Let Ψb (◦) the boolean function computed for a block b, as defined in Sec-tion 6.1.4. Ψb (ψ0(b)) is obtained by applying the abstract semantics for theinstruction sequence of b, starting by

ψ0(b) =∨

{b′|b′ω→b}

Ψb′ (◦)

where (◦) is the initial boolean function for b′. The meaning of this formula isthat b can be executed after one of its predecessors, then it cannot be decidedstatically which boolean function has to be taken. Computing Ψ obviouslyinvolves a fixpoint computation; then, a suitable technique is needed in orderto resolve cycles in the successor relation.


6.1.4 Analysis of blocks

Since exceptions are not considered, instructions inside a block are alwaysexecuted sequentially.

The abstract semantics

Instructions inside a block b are considered sequentially via the abstract se-mantics J·K〈·,[·]·〉, which operates on states SF × FB × ℘(B). The analysis of

the block starts in a state 〈F, [ψ]δ〉 where

• F is the (concrete) framestack.• ψ is the initial boolean function, obtained by disjunction of functions com-

puted in the predecessor blocks p[b]: ψ =∨

b′∈p[b] Ψb′ . If no predecessor

blocks exist, then ψ =∧

v vo ↔ vi.• δ is the set of inherited branching identifiers, as defined above. It is not

modified inside the block.

Concrete semantic components are only shown for clarity.

Definition 6.2. The formula xt ↔δ yu stands for

xt ↔

(

yu ∨

(

∨

v∈δ

vi′

))

meaning that a modified identifier x depends on the assigned value (containedin y) and on the branching identifiers of the current block. The subscripti′ is used in order to preserve the input value of branching identifiers fromassignments possibly occurring inside the block. At the end of the block, theboolean function ψ, computed after the last instruction, is modified into ψ′ asfollows:

Ψb = ψ′ = ψ ∧

(

∧

v∈δ

vi ↔ vi′

)

The following paragraphs outline the information flow analysis of bytecodenon-jump instructions, divided in categories.

Instructions working on the operand stack

These instructions only modify the current operand stack by using informationwhich is contained in the stack itself.


Jaconst nullK〈〈X,O,R〉,[ψ]δ〉= 〈〈X,null : O,R〉 , [ψ · ¬O[τo]o]δ〉

JdupK〈〈X,v:O,R〉,[ψ]δ〉= 〈〈X, v : v : O,R〉 , [ψ · O[τo]o ↔ O[τi]i]δ〉

JnopK〈〈X,O,R〉,[ψ]δ〉= 〈〈X,O,R〉 , [ψ]δ〉

JpopK〈〈X,v:O,R〉,[ψ]δ〉= 〈〈X,O,R〉 , [ψ]δ〉

JswapK〈〈X,v1:v2:O,R〉,[ψ]δ〉= 〈〈X, v2 : v1 : O,R〉 , [ψ · ψ′]δ〉

ψ′ = O[τo]o ↔ O[τi − 1]i ∧ O[τo − 1]o ↔ O[τi]i

The computed boolean function keeps track of the modifications on the stack:for example, in swap, the information flow properties of stack values areswapped. When pushing the null constant, the resulting stack value cannotcontain secrets. δ is not used for this category of instructions, since stackvalues cannot cause interference unless they are written in local variables orfields. Note that, in JdupK〈〈X,v:O,R〉,[ψ]δ〉

, the stack indexes τo and τi are not

the same: τi = τo − 1 since an additional value has been pushed.

Instructions loading from/storing to local variables

Load and store instructions take, as a parameter, the index of the local vari-able on which they work. Information flow analysis simply keeps track of themoving of data to/from local variables from/to the operand stack.

Jaload jK〈〈X,O,R〉,[ψ]δ〉= 〈〈X,X [j] : O,R〉 , [ψ · O[τo]o ↔ X [j]t]δ〉

Jastore jK〈〈X,v:O,R〉,[ψ]δ〉= 〈〈X [j ← v] , O,R〉 , [ψ · X [j]o ↔δ O[τt]t]δ〉

The set of branching variables δ is used in the definition of astore becauselocal variables can generate flows inside the method body.

Instructions reading or writing fields

Getting from/putting in object fields is done by resolving the informationspecified in the constant pool in order to have a reference to the field. Staticfields do not have on the stack a reference to the object containing desireddata; instead, the constant pool contains a reference to the class.

J getfield jK〈〈X,o:O,R〉,[ψ]δ〉= 〈〈X,u : O,R〉 , [ψ · O[τo]o ↔ R[j]i]δ〉

J getstatic jK〈〈X,O,R〉,[ψ]δ〉= 〈〈X,u : O,R〉 , [ψ · O[τo]o ↔ R[j]i]δ〉

Jputfield jK〈〈X,v:o:O,R〉,[ψ]δ〉= 〈〈X,O,R〉 , [ψ · (O[τi]i ↔δ A)]δ〉

A =∧

{bo|b ∈ sh (j, o)}

Jputstatic jK〈〈X,v:O,R〉,[ψ]δ〉= 〈〈X,O,R〉 , [ψ · (O[τi]i ↔δ B)]δ〉

B =∧

{bo|b ∈ sh (j)}


In the put instructions, sh is the set of identifiers which can be affected by amodification in j.

Method invocation

In the static analysis for Abstract Non-Interference, instructions for methodinvocation do not really execute methods: the operand stack is modified as ifthe method had really been executed and the computed boolean function isthe one associated with the method itself. Rules for this kind of instructionsare slightly different, depending on whether the invoked method returns avalue.

• Methods returning a value:

J invokestatic jK〈〈X,v1:..:vk:O,R〉,[ψ]δ〉=

⟨

〈X,u : O,R〉 ,

[

ψ ·[

[

Ψ δ(R[j])]τo

O[τo]o

]i

t

]

δ

⟩

J invokevirtual jK〈〈X,o:v1:..:vk:O,R〉,[ψ]δ〉=

⟨

〈X,u : O,R〉 ,

[

ψ ·

(

∨

C

[

[

Ψ δ(R[j]C)]τo

O[τo]o

]i

t

)]

δ

⟩

C ranges over the possible classes o can have at runtime. In static methodinvocation, the class is statically decided.

• Methods with void return type:

J invokestatic jK〈〈X,v1:..:vk:O,R〉,[ψ]δ〉=

⟨

〈X,O,R〉 ,[

ψ ·[

Ψ δ(R[j])]i

t

]

δ

⟩

J invokevirtual jK〈〈X,o:v1:..:vk:O,R〉,[ψ]δ〉=

⟨

〈X,O,R〉 ,[

ψ ·(

∨

C

[

Ψ δ(R[j])]i

t

)]

δ

⟩

u is the return value of the method, when it exists. τ is a special variablewhich is also used in analyzing return instructions: it indicates the top ofthe operand stack which is in frame directly below the current frame in theframestack. A boolean function Ψ(d) is referred for every method instanced which is compatible with the current method invocation. We can considerΨ(d) to be parametric on the set of branching identifiers δ (i.e. Ψ(d) = λδ. ψwhere ψ ∈ FB). When the method d is called, the current value of δ replacesthe formal parameter in Ψ(d), yielding Ψ δ(d).

Remark 6.3. Substitution of δ inside Ψ(d) is performed only once. That is,further method invocations inside d are not taken into account. This avoidstermination problems due to recursion.

Example. Let Ψ(d) be ψ′∧ψ′′(Ψ(d′)), where ψ′′(Ψ(d′)) means that ψ′′ dependson the boolean function of another method d′. Then, Ψ δ(d) is obtained by


replacing placeholders for branching variables with δ. This substitution isonly syntactic and is not performed recursively on Ψ(d′). 2

Instructions returning from method execution

Rules for return instructions simply keep the security value on top of theoperand stack and put it on top of the stack of the underlying frame.

JareturnKfi

〈X,v:O,R〉

〈X′,O′ ,R′〉,[ψ]δ

fl = 〈〈X ′, v : O′, R′〉 , [ψ · τ o ↔ O[τi]i]δ〉

JreturnKfi

〈X,O,R〉

〈X′,O′ ,R′〉,[ψ]δ

fl = 〈〈X ′, O′, R′〉 , [ψ]δ〉

O[τi]i is the returned value. The boolean variable τ o represents the top ofthe stack under the current frame; it is dual to O[τi]i in the invokestatic andinvokevirtual instructions for non-void return type methods.

Other instructions pushing values on the operand stack

Both ldc (pushing a value from the runtime constant pool) and new (creatinga new object reference) instructions push a value on the operand stack. Inthe first case, the pushed value does not contain secrets, since it is constantwith respect to types. In the second case, the value is secret-free if its class isground (that is, has no subclasses).

Jldc jK〈〈X,O,R〉,[ψ]δ〉= 〈〈X,R[j] : O,R〉 , [ψ · ¬O[τo]o]δ〉

Jnew jK〈〈X,O,R〉,[ψ]δ〉= 〈〈X, o : O,R〉 , [ψ · A]δ〉

A ≡ ¬O[τo]o the new object has a ground class

A ≡ O[τo]o otherwise

6.1.5 Expressions

Our information flow analysis of source code relies on the function ref (e),which computes the set of relevant identifiers with respect to the abstractvalue of e (Section 5.2). We assume there is a similar function for bytecode,which uses local variables and field references instead of source code identifier.Compiling an assignment v=e results in the bytecode instruction sequence

C (e) ; astore jv

where jv is the index of v in the set of local variables (compiling a fieldassignment v .f=e is similar). Immediately before the store instruction, thecomputed boolean function has the form


ψ = [ψ0]ot ∧ [Ψe (e)]it

Ψe (e) = O[τo]o ↔∨

{xi | x ∈ refε0 (e)}

In the above formulæ, ψ0 and ε0 are, respectively, the boolean function andthe type environment computed before the assignment. The boolean variableO[τo]o corresponds to the result value of e. It has the same role as reto insource code analysis. The definition of Ψe (e) is clearly equivalent to its sourcecode counterpart.

6.1.6 Program analysis

A bytecode program comes as a collection of .class files, each containing a setof method instances. Program analysis for Abstract Non-Interference worksin the same way as for Java source code (Section 5.4.2). At the beginning, allmethod instances d are assigned to the boolean function

Ψ(d) = ψ⊥ =∧

v∈B

vo ↔δ vi

so that methods are supposed to be equivalent, with respect to informationflow properties, to nop. Note that every function Ψ(d) is parametric on a setδ of branching variables (see the rules for method invocation instructions, inSection 6.1.4).

6.2 The relation between source code and bytecode

Under the assumption that ref is computed in the same way, the analysis ofbytecode is equivalent to its source code counterpart.

Theorem 6.4. A source code program P is accepted, with respect to someANI security policy, if and only if its compiled version C (P) is accepted bythe bytecode analysis on the same policy.

To informally prove this, we examine three kinds of statements. We writeb′ ≈S,B b′′ when the (source code) boolean variable b′ plays (informally) thesame role as the (bytecode) boolean variable b′′. For example, after evaluatingan expression, both reto and O[τo]o represent the computed value.

6.2.1 Variable assignment

The statement s = v=e is compiled to C (e) ; astore jv . The Ψ function forsource code computes the boolean function

ψS = [Ψe (e)]ot ∧ rett ↔ vo

6.2 The relation between source code and bytecode 109

where shv = {v}. On the other hand, bytecode analysis yields

ψB = ψ0 · O[τt]t ↔δ X [jv]o

which is equivalent to ψS since:

• rett ≈S,B O[τt]t and vo ≈S,B X [jv]o;• [Ψe (e)]

ot is left implicit in ψB (it is incorporated into the initial function

ψ0);• conversely, the initial boolean function is implicit in ψS ;• implicit flows δ are not accounted for in ψS because of the compositional

structure of high-level programs (they will computed for the conditionalstatement surrounding s).

6.2.2 Field assignment

A field assignment s = v .f=e is compiled as

C (v .f=e) = C (v) ; C (e) ; putfield rf

The computed boolean function for source code and bytecode is

ψS = [Ψe (e)]ot ∧ rett ↔

(

∧

{yo|y ∈ shε (v .f)})

ψB = ψ0 · O[τi]i ↔δ

(

∧

{yo|y ∈ shε (rf , v)})

Similarly to local variable assignment, equivalence follows by ≈S,B; clearly,sh (rf , v) must be consistent with sh (v .f).

6.2.3 Conditional statement

Let s be the statementif (e) s1 else s2

It is compiled as

C (s) = Ep; C (s1) ; goto q; p : C (s2) ; q : . . .

where Ep is the result of compiling e, together with the test instruction. If e isfalse, execution continues at the statement labeled by p. The boolean functionfor s is

(Ψ (s1) ∨Ψ (s2)) [vars (e) def (s1) ∪ def (s2)]

On the other hand, the bytecode block b beginning at q has, as predecessors,the blocks b1 = C (s1) and b2 = C (s2). The initial function ψ0 of b is com-puted as the disjunction of the final boolean functions of b1 and b2, which isequivalent to Ψ (s1)∨Ψ (s2). Implicit flows are equivalent since b1 and b2 takeδ = ref (e) as their branching variables.


6.2.4 Non-compiled bytecode

In general, a user which wants to execute safe programs (see next section) isnot supposed to believe that the bytecode form of the program is the resultof a correct compiling process. As mentioned before, the JVM does not makeassumptions on how .class files are produced.

Therefore, the Control Flow Graph of programs is not forced to be astraightforward translation of high-level block structure. This is not a problemsince bytecode ANI analysis does not rely on such an assumption. It candeal with arbitrary, meaningful bytecode programs, which are a superset ofprograms obtained by compiling.

6.3 Towards a Proof-Carrying code architecture

In a Proof-Carrying code architecture (Section 3.5) for a Java-like program-ming language, the code user (CU) receives the program to be executedfrom the code producer (CP), in form of .class files. He or she wants tobe guaranteed that the program execution is safe with respect to AbstractNon-Interference for a given abstract property of data.

6.3.1 What the code user receives

Our PCC-like architecture follows the approach outlined in [15] (Section3.5.2). The following components are provided to the code user:

• The program P , as a set of .class files.• The partition {H,L} of data.• The abstract property ρ. It describes the abstraction level of attackers on

properties of program classes. Basically, in order to describe ρ, it is suf-ficient to provide an implementation of the abstract dependency calculus(see ids() and ref in Section 5.2.5). The dependency calculus is the onlycomponent of the code certification architecture which is affected by ρ.

• An implementation of the ANI() algorithm for bytecode, together with asoundness proof.

• A ψ-state ΨCP (P), which is claimed to describe the information flow be-havior of P .

6.3.2 What the code user should verify

The code producer provides, together with the program code, the assertionmy program satisfies ANI for ρ, L and H (which were made available withthe code). This assertion has the form of the ψ-state ΨCP (P). The code userneeds to check the assertion in order to safely execute the code.

6.3 Towards a Proof-Carrying code architecture 111

The security policy

The ANI security policy is encoded in ρ and can be fully specified by analgorithm for computing abstract dependencies (essentially, the function ref).We assume that ρ models a security requirement from the user (that is, aproperty the user considers as necessary in order to accept the program). CUsends ref to CP, which computes abstract dependencies accordingly. ref canbe considered as a trusted component: CU does not need to check it. Thesame algorithm is used by CU to verify ΨCP (P).

The boolean function

The code user receives the ψ-state ΨCP (P) as a representation of the securityproperties of P . Then, CU should verify:

• that Ψ0 (P), obtained from ΨCP (P ,main) with the provided H and L, isindeed unsatisfiable (Definition 5.18), that is,

(

∧

x∈L

¬xi ∧ ΨCP (P ,main) ∧∨

x∈H

xo

)

has no models

• that ΨCP (P) correctly describes the information flow behavior of P .

The first task could be accomplished by means of some algorithm for de-ciding unsatisfiability of formulæ. However, due to the complexity of such analgorithm, the burden of proving unsatisfiability is left to the code producer,following the Proof-Carrying code style. An UNSAT proof is then attached toΨCP (P); the code user only needs to verify it.

The validity of ΨCP (P) with respect to P is checked by running, on theuser side, the ANI analysis of P . The algorithm ANI() for bytecode programsis used; it can be either provided by CP (with a soundness proof) or bepossessed by CU (and be a trusted component of the PCC architecture).

Then, the code user needs to execute ANI() on the code. However, he orshe does not need to perform the whole (fixpoint) computation: it is sufficientto verify that the provided ψ-state ΨCP (P) is indeed the result of executingANI(P). To do so, one single iteration of ANI() main loop is needed, that is,an execution of analyze(main,ΨCP (P)). If ΨCP (P) happens to be a fixpoint(that is, analyze() does not change the ψ-state), then it is accepted as a correctdescription of P .

7

Conclusions

Arguments are to be avoided; they are always vulgar and often con-vincing.

Oscar Wilde

After all, all he did was string together a lot of old, well-known quo-tations.

H. L. Mencken (1880 - 1956), on Shakespeare

A static analyzer

The main purpose of this work was to provide a characterization of AbstractNon-Interference which is more likely to be automatizable than the originaldefinition. As pointed out several times throughout previous chapters, themain difficulty is in representing and computing abstract properties.

The ANI analyzer describes the information flow behavior of programsby means of boolean functions, extending to a more realistic programminglanguage previous work which uses the same technique [28]. What makes adifference between this algorithm and an analogous version dealing with stan-dard information flow mainly affects the ref function and the calculus ofdependencies. If standard Non-Interference is considered, then ref (e) comesto be simply the set of identifiers which occur in e. Conversely, abstract de-pendencies are selective in the choice of relevant identifiers: some of themplay no role in determining the abstract value of e. Our calculus of abstractdependencies clearly needs to be fully specified and made practical by meansof optimization steps and a good choice for the abstract evaluation of expres-sions. However, no other technique to achieve this result was found in theliterature (see the discussion in Section 3.6.2).

114 7 Conclusions

In addition, it is shown how abstract dependencies are the first step indefinining the promising techinque of abstract slicing. Abstract slicing stillneeds a comprehensive formulation. However, we believe many of the existingframeworks can be reused in order to develop a variant of program slicing forapproximated properties of data.

Chapter 4 proposes a notion of Abstract Non-Interference which can be ap-plied to realistic Object-Oriented programs. Aliasing is accounted for throughthe computation of sharing identifiers. A clever technique is needed not to beoverly conservative with respect to aliasing (see Sections 3.6.1 and 5.1.3), thatis, to assert the mutual independence of identifiers which cannot share anymemory location.

Code certification

Some work is still to be done in the direction of a Proof-Carrying code architec-ture for the certification of Abstract Non-Interference in Java-like programs.

A translation of boolean function analysis has been outlined in Section 6.2.Yet, a complete specification of the PCC architecture must be provided (seeSection 6.3.1), in particular, of its components. For example, (i) the represen-tation of the ψ-state which comes together with the bytecode should be madepractical and cheap; (ii) the implementation of ref should be fully specifiedand optimized; (iii) moreover, the implementation of ANI() should be madepart of the architecture, and a suitable representation should be provided forits soundness proof.

A

The abstract dependency calculus: Haskell

Implementation

Below, we give the Haskell [62] code for the dependency calculus introducedin Section 5.2. For brevity, we do not include the definitions of the abstractdomain (that is, the type ABSVAL ans its operators), so that the followingcode cannot be properly considered as a working implementation.

−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− a v a r i a b l e i s denoted by a numeral indextype VAR = Int

−− a tomic i t y : an a b s t r a c t va lue i s atomic i f f i t has no−− subva lue s −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− the type ABSVAL i s im p l i c i t −−−−−−−−−−−−−−−−−−−−−−−−−atomic : : ABSVAL −> Boolatomic v = null ( subValues v )

−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− Environments−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

−− environments map v a r i a b l e s to a b s t r a c t va lue s −−−−−−−type ENV = VAR −> ABSVAL

−− updat ing o f an environment at the v a r i a b l e x wi th v −update : : ENV −> VAR −> ABSVAL −> ENVupdate env x v = \ y −> i f ( y==x ) then v else ( env y)

−− computes the s e t o f environments env0 which are equa l−− to env on y <> x , and such t ha t env0 ( x ) < env ( x) −−−−subs : : ENV −> VAR −> [ENV]subs env x =

map ( \ v −> update env x v ) ( subValues ( env x ) )

116 A The abstract dependency calculus: Haskell Implementation

−− checks i f an environment i s atomic on each v a r i a b l e −−− o f an expre s s i on −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−atomicEnv : : ENV −> EXPR −> BoolatomicEnv env e = a l l ( \ x −> atomic ( env x ) ) ( vars e )

−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− Abstrac t e v a l u a t i on−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

−− an expre s s i on can be an a b s t r a c t constant , a var , −−−−− a sum , a product or a p r o j e c t i on −−−−−−−−−−−−−−−−−−−−data EXPR = CONST ABSVAL

| VARIABLE Int| SUM EXPR EXPR| PROD EXPR EXPR| FST EXPR EXPR| SND EXPR EXPR

−− computes the s e t o f v a r i a b l e s o f an expre s s i on −−−−−−vars : : EXPR −> [VAR]vars (CONST v ) = [ ]vars (VARIABLE x ) = [ x ]vars (SUM e1 e2 ) = union ( vars e1 ) ( vars e2 )vars (PROD e1 e2 ) = union ( vars e1 ) ( vars e2 )vars (FST e1 e2 ) = union ( vars e1 ) ( vars e2 )vars (SND e1 e2 ) = union ( vars e1 ) ( vars e2 )

−− e v a l u a t e s an expre s s i on in an a b s t r a c t environment −−−− ( r e l i e s on a b s t r a c t opera t i ons on va lue s)−−−−−−−−−−−eSharp : : EXPR −> ENV −> ABSVALeSharp (CONST v ) env = veSharp (VARIABLE x ) env = env xeSharp (SUM e1 e2 ) env =

aSum ( eSharp e1 env ) ( eSharp e2 env )eSharp (PROD e1 e2 ) env =

aProd ( eSharp e1 env ) ( eSharp e2 env )eSharp (FST e1 e2 ) env = eSharp e1 enveSharp (SND e1 e2 ) env = eSharp e2 env

−− computes the l e a s t upper bound o f the e v a l u a t i on o f e−− ( by means o f eSharp2 ) on a l l the subs o f env on x −−−−− i f no sub inpu t s e x i s t , top i s re turned ( note that , −−−− usua l l y , bottom i s taken as the lub o f an empty s e t ;−− t h i s cho i c e depends on having de f ined subs by means −−− o f < i n s t ead <=, f o r terminat ion prob lems)−−−−−−−−−−

A The abstract dependency calculus: Haskell Implementation 117

evalSubs : : EXPR −> ENV −> VAR −> ABSVALevalSubs e env x = l e t ch = subs env x in

i f ( null ch ) then TOPelse setLub (map ( \ env0 −> ( eSharp2 e env0 ) )

( subs env x ) )

−− a b s t r a c t e v a l u a t i on o f e x p r e s s i on s−−−−−−−−−−−−−−−−−−−− un l i k e eSharp , here the e v a l u a t i on can take advantage−− o f the e v a l u a t i on o f e on the sub inpu t s o f env −−−−−−eSharp2 : : EXPR −> ENV −> ABSVALeSharp2 e env = i f ( atomicEnv env e ) then ( eSharp e env )

else g lb ( eSharp e env )( setGlb (map ( evalSubs e env ) ( vars e ) ) )

−− checks i f the eva lua t ed expre s s i on i s atomic −−−−−−−−−− and re turns the computed a b s t r a c t va lue −−−−−−−−−−−−−at : : EXPR −> ENV −> (ABSVAL, Bool)at e env = l e t ee = ( eSharp2 e env ) in ( ee , atomic ee )

−− computes the union o f the s e t s o f r e l e v an t v a r i a b l e s−− which are ob ta ined by the sub inpu t s o f env−−−−−−−−−−−− i f no sub inpu t s e x i s t , r e tu rns the whole s e t o f −−−−−−− v a r i a b l e s ( which i s n eu t r a l in the i n t e r s e c t i o n −−−−−−− performed in s i d e r e f 3)−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−re fSubs : : EXPR −> ENV −> VAR −> [VAR] −> [VAR] −> [VAR]re fSubs e env z xs ys =

l e t ch = ( subs env z ) ini f ( null ch ) then ys

else setUnion (map ( \ env0 −> snd( r e f 3 e env0 ( union [ z ] xs ) ys ) ) ch )

−− computes the r e l e v an t v a r i a b l e s o f an expre s s i on −−−−−− i f the expre s s i on i s cons tant and atomic ( no r e l e v an t−− v a r i a b l e s ) , i t i s rep laced , in the re turned value , −−−− by a constant −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− ys i s the s e t o f vars which are p o s s i b l y r e l e v an t −−−−− ( some v a r i a b l e s may have been ru l ed out by prev ious −−− computat ions)−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−r e f 3 : : EXPR −> ENV −> [VAR] −> [VAR] −> (EXPR, [VAR] )r e f 3 e env xs ys =

l e t ( vv , bb ) = ( at e env ) in i f bb then (CONST vv , xs )else ( e , foldr ( \ y −>

\ zs −> re fSubs e env y xs zs ) ys ys )

−− subexp e x t r a c t i on ( on ly f o r s imp l i f y i n g the code) −−−

118 A The abstract dependency calculus: Haskell Implementation

sons : : EXPR −> (EXPR,EXPR)sons (SUM e1 e2 ) = ( e1 , e2 )sons (PROD e1 e2 ) = ( e1 , e2 )sons (FST e1 e2 ) = ( e1 , e2 )sons (SND e1 e2 ) = ( e1 , e2 )

−− b u i l d s a compound expre s s i on wi th e1 and e2 , a f t e r −−−− the s t r u c t u r e o f i t s f i r s t parameter ( on ly f o r −−−−−−−− s imp l i f y i n g the code)−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− proper t y : recombine e ( sons e ) == e−−−−−−−−−−−−−−−−−recombine : : EXPR −> (EXPR,EXPR) −> EXPRrecombine (SUM ea eb ) ( e1 , e2 ) = (SUM e1 e2 )recombine (PROD ea eb ) ( e1 , e2 ) = (PROD e1 e2 )recombine (FST ea eb ) ( e1 , e2 ) = (FST e1 e2 )recombine (SND ea eb ) ( e1 , e2 ) = (SND e1 e2 )

−− main func t i on −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− does recur s i on on the s t r u c t u r e o f the expre s s i on −−−−− t a ke s advantage o f the s u b s t i t u t i o n s which are −−−−−−−− p o s s i b l y operated by r e f 3 −−−−−−−−−−−−−−−−−−−−−−−−−−−r e f 2 : : EXPR −> ENV −> [VAR] −> (EXPR, [VAR] )r e f 2 (CONST v ) env ys = r e f 3 (CONST v ) env [ ] [ ]r e f 2 (VARIABLE z ) env ys = r e f 3 (VARIABLE z ) env [ ] [ z ]r e f 2 e env ys =

l e t ( e1 , e2 ) = sons e inlet ( ( ee1 , r1 ) , ( ee2 , r2 ) ) = ( r e f 2 e1 env ys ,

r e f 2 e2 env ys ) inlet ee = recombine e ( ee1 , ee2 ) in

r e f 3 ee env [ ] ( union r1 r2 )

−− the most genera l a b s t r a c t environment : no in format ionbaseEnv : : ENVbaseEnv = \ i −> TOP

−− c a l l s r e f 2 on the genera l environment−−−−−−−−−−−−−−−r e f : : EXPR −> (EXPR, [VAR] )r e f e = r e f 2 e baseEnv [ ]

−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− The expre s s i on in the example −−−−−−−−−−−−−−−−−−−−−−−exp = (SUM (PROD (CONST POSEVEN) (VARIABLE 1 ) )

(SUM (PROD (CONST POSEVEN)(PROD (VARIABLE 2 ) (VARIABLE 2 ) ) )

(VARIABLE 3 ) ) )

References

1. Martın Abadi. Secrecy by typing in security protocols. Journal of the Associa-tion for Computing Machinery, 46(5):749–786, September 1999.

2. Martın Abadi, Anindya Banerjee, Nevin Heintze, and Jon G. Riecke. A corecalculus of dependency. In Conference Record of the Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages147–160, San Antonio, Texas, USA, January 1999. ACM Press, New York, NY.

3. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers - Principles,Techniques and Tools. Addison-Wesley, 1985.

4. Torben Amtoft, Sruthi Bandhakavi, and Anindya Banerjee. A logic for infor-mation flow in object-oriented programs. In Simon P. Jones, editor, ConferenceRecord of the Annual ACM SIGPLAN-SIGACT Symposium on Principles ofProgramming Languages (POPL), Charleston, South Carolina, USA, January2006. ACM Press, New York, NY.

5. Torben Amtoft and Anindya Banerjee. Information flow analysis in logicalform. In Roberto Giacobazzi, editor, Proceedings of the Symposium on StaticAnalysis (SAS), volume 3148 of Lecture Notes in Computer Science, pages 100–115, Verona, Italy, August 2004. Springer-Verlag, Berlin.

6. Anindya Banerjee and David A. Naumann. Secure information flow and pointerconfinement in a Java-like language. In Proceedings of the IEEE ComputerSecurity Foundations Workshop (CSFW), pages 253–267, Cape Breton, NovaScotia, Canada, June 2002.

7. Anindya Banerjee and David A. Naumann. Stack-based access control andsecure information flow. Journal of Functional Programming, 2(15):131–177,March 2005.

8. Gilles Barthe, Amitabh Basu, and Tamara Rezk. Security Types PreservingCompilation. In Bernhard Steffen and Giorgio Levi, editors, Proceedings of theInternational Conference on Verification, Model Checking and Abstract Inter-pretation (VMCAI), volume 2934 of Lecture Notes in Computer Science, pages2–15, Venice, Italy, January 2004. Springer-Verlag, Berlin.

9. Gilles Barthe, Pedro R. D’Argenio, and Tamara Rezk. Secure information flowby self-composition. In Proceedings of the IEEE Computer Security FoundationsWorkshop (CSFW), pages 100–114, Pacific Grove, California, USA, June 2004.IEEE Computer Society Press.

120 References

10. Gilles Barthe and Tamara Rezk. Secure information flow for a sequential JavaVirtual Machine. Unpublished, 2004.

11. Gilles Barthe and Tamara Rezk. Non-interference for a JVM-like language. InProceedings of the ACM SIGPLAN International Workshop on Types in Lan-guages Design and Implementation (TLDI), pages 103–112, Long Beach, Cali-fornia, USA, 2005. ACM Press, New York, NY.

12. J. F. Bergeretti and B. A. Carre. Information-flow and data-flow analysis ofwhile-programs. ACM Transactions on Programming Languages and Systems,7(1):37–61, 1985.

13. Gerardo Canfora, Aniello Cimitile, and Andrea De Lucia. Conditioned ProgramSlicing. Information and Software Technology, 40:596–607, 1998.

14. Ian R. Cartwright and Matthias Felleisen. The semantics of program depen-dence. In Proceedings of the SIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI), pages 13–27, Portland, Oregon, USA, 1989.ACM Press, New York, NY.

15. B. Evan Chang, Adam Chlipala, and George Necula. A Framework for CertifiedProgram Analysis and Its Applications to Mobile-Code Safety. In Proceedingsof the International Conference on Verification, Model Checking and AbstractInterpretation (VMCAI), Lecture Notes in Computer Science, Charleston, SouthCarolina, USA, 2006. Springer-Verlag, Berlin.

16. Jong-Deok Choi, Michael Burke, and Paul Carini. Efficient flow-sensitive inter-procedural computation of pointer-induced aliases and side effects. In Proceed-ings of the SIGPLAN Conference on Programming Language Design and Im-plementation (PLDI), pages 232–245, Charleston, South Carolina, USA, 1993.

17. Patrick Cousot. Constructive design of a hierarchy of semantics of a transi-tion system by abstract interpretation (invited paper). In Steve Brookes andMichael W. Mislove, editors, Proceedings of the International Symposium onMathematical Foundations of Programming Semantics (MFPS), volume 6 ofElectronic Notes in Theoretical Computer Science. Elsevier, Amsterdam, 1997.URL: http://www.elsevier.nl/locate/entcs/volume6.html.

18. Patrick Cousot. Types as abstract interpretations, invited paper. In ConferenceRecord of the Annual ACM SIGPLAN-SIGACT Symposium on Principles ofProgramming Languages (POPL), pages 316–331, Paris, France, January 1997.ACM Press, New York, NY.

19. Patrick Cousot and Radhia Cousot. Abstract interpretation: a unified latticemodel for static analysis of programs by construction or approximation of fix-points. In Conference Record of the Annual ACM SIGPLAN-SIGACT Sym-posium on Principles of Programming Languages (POPL), pages 238–252, LosAngeles, California, USA, 1977. ACM Press, New York, NY.

20. Patrick Cousot and Radhia Cousot. Systematic design of program analysisframeworks. In Conference Record of the Annual ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages (POPL), pages 269–282,San Antonio, Texas, USA, 1979. ACM Press, New York, NY.

21. Patrick Cousot and Radhia Cousot. Abstract interpretation and applicationto logic programs. Journal of Logic Programming, 13(2–3):103–179, 1992. (Theeditor of Journal of Logic Programming has mistakenly published the unreadablegalley proof. For a correct version of this paper, see http://www.di.ens.fr/

~cousot.).22. Patrick Cousot and Radhia Cousot. Abstract interpretation frameworks. Jour-

nal of Logic and Computation, 2(4):511–547, August 1992.

References 121

23. Patrick Cousot and Radhia Cousot. Comparing the Galois connection andwidening/narrowing approaches to abstract interpretation, invited paper. InMaurice Bruynooghe and Martin Wirsing, editors, Proceedings of the Inter-national Workshop on Programming Language Implementation and Logic Pro-gramming (PLILP), Lecture Notes in Computer Science, pages 269–295, Leuven,Belgium, 1992. Springer-Verlag, Berlin.

24. Patrick Cousot and Radhia Cousot. Higher-order abstract interpretation (andapplication to comportment analysis generalizing strictness, termination, pro-jection and PER analysis of functional languages), invited paper. In Henri Bal,editor, Proceedings of the International Conference on Computer Languages(ICCL), pages 95–112, Toulouse, France, May 1994. IEEE Computer SocietyPress.

25. Patrick Cousot and Radhia Cousot. Systematic design of program transforma-tion frameworks by abstract interpretation. In Conference Record of the AnnualACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages(POPL), pages 178–190, Portland, Oregon, USA, January 2002. ACM Press,New York, NY.

26. Alain Deutsch. Interprocedural may-alias analysis for pointers: beyond k-limiting. In Proceedings of the SIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI), Orlando, Florida, USA, 1994.

27. Gilberto File, Roberto Giacobazzi, and Francesco Ranzato. A unifying view onabstract domain design. ACM Computing Surveys, 28(2):333–336, 1996.

28. Samir Genaim, Roberto Giacobazzi, and Isabella Mastroeni. Modeling infor-mation flow dependencies with boolean functions. In IFIP WG 1.7, ACM SIG-PLAN and GI FoMSESS Workshop on Issues in the Theory of Security, pages55–66, 2004.

29. Samir Genaim and Fausto Spoto. Information Flow Analysis for Java Byte-code. In Radhia Cousot, editor, Proceedings of the International Conferenceon Verification, Model Checking and Abstract Interpretation (VMCAI), volume3385 of Lecture Notes in Computer Science, pages 346–362, Paris, France, 2005.Springer-Verlag, Berlin.

30. Roberto Giacobazzi and Isabella Mastroeni. Domain compression for com-plete abstractions. In Lenore Zuck, Paul Attie, Agostino Cortesi, and SupratikMukhopadhyay, editors, Proceedings of the International Conference on Verifi-cation, Model Checking and Abstract Interpretation (VMCAI), volume 2575 ofLecture Notes in Computer Science, pages 146–160, New York, NY, USA, 2003.Springer-Verlag, Berlin.

31. Roberto Giacobazzi and Isabella Mastroeni. Abstract non-interference: Pa-rameterizing non-interference by abstract interpretation. In Neil D. Jonesand Xavier Leroy, editors, Conference Record of the Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages186–197, Venice, Italy, January 2004. ACM Press, New York, NY.

32. Roberto Giacobazzi and Isabella Mastroeni. Proving abstract non-interference.In Proceedings of the Annual Conference of the European Association for Com-puter Science Logic (EACSL), volume 3210 of Lecture Notes in Computer Sci-ence, pages 280–294, Karpacz, Poland, 2004. Springer-Verlag, Berlin.

33. Roberto Giacobazzi and Isabella Mastroeni. Transforming semantics by abstractinterpretation. Theoretical Computer Science, 337(1-3):1–50, 2005.

34. Roberto Giacobazzi and Elena Quintarelli. Incompleteness, counterexamplesand refinements in abstract model-checking. In Patrick Cousot, editor, Proceed-

122 References

ings of the Symposium on Static Analysis (SAS), volume 2126 of Lecture Notes inComputer Science, pages 356–373, Paris, France, 2001. Springer-Verlag, Berlin.

35. Roberto Giacobazzi and Francesco Ranzato. Completeness in abstract interpre-tation: A domain perspective. In Michael Johnson, editor, Proceedings of theInternational Conference on Algebraic Methodology and Software Technology(AMAST), volume 1349 of Lecture Notes in Computer Science, pages 231–245,Sydney, Australia, 1997. Springer-Verlag, Berlin.

36. Roberto Giacobazzi and Francesco Ranzato. On the least complete extensionof complete subsemilattices. Algebra Universalis, 38(3):235–237, 1997.

37. Roberto Giacobazzi, Francesco Ranzato, and Francesca Scozzari. Building com-plete abstract interpretations in a linear logic-based setting. In Giorgio Levi,editor, Proceedings of the Symposium on Static Analysis (SAS), volume 1503 ofLecture Notes in Computer Science, pages 215–229, Pisa, Italy, 1998. Springer-Verlag, Berlin.

38. Roberto Giacobazzi, Francesco Ranzato, and Francesca Scozzari. Complete ab-stract interpretations made constructive. In Lubos Brim, Jozef Gruska, and JiriZlatuska, editors, Proceedings of the International Symposium on Mathemati-cal Foundations of Computer Science (MFCS), volume 1450 of Lecture Notes inComputer Science, pages 366–377, Brno, Czech Republic, 1998. Springer-Verlag,Berlin.

39. Roberto Giacobazzi, Francesco Ranzato, and Francesca Scozzari. Making ab-stract interpretations complete. Journal of the Association for Computing Ma-chinery, 47(2):361–416, 2000.

40. Joseph A. Goguen and Jose Meseguer. Security policies and security models.In Proceedings of the IEEE Symposium on Security and Privacy (SSP), pages11–20. IEEE Computer Society Press, 1982.

41. James Gosling, Bill Joy, Guy Steele, and Gilad Bracha. The Java LanguageSpecification Second Edition. Addison-Wesley, 2000.

42. Robert Harper, Furio Honsell, and Gordon Plotkin. A framework for defininglogics. Journal of the Association for Computing Machinery, 40(1):143–184,January 1993.

43. Nevin Heintze and Jon G. Riecke. The SLam calculus: Programming withsecrecy and integrity. In Conference Record of the Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), SanDiego, California, USA, January 1998. ACM Press, New York, NY.

44. Hyoung Seok Hong, Insup Lee, and Oleg Sokolsky. Abstract Slicing: A new ap-proach to program slicing based on abstract interpretation and model checking.In Jens Krinke and Giulio Antoniol, editors, Proceedings of the InternationalWorkshop on Source Code Analysis and Manipulation (SCAM), pages 25–34,Budapest, Hungary, September 2005. IEEE Computer Society Press.

45. Susan Horwitz, Thomas W. Reps, and David Binkley. Interprocedural slicingusing dependence graphs. ACM Transactions on Programming Languages andSystems, 12(1):35–46, 1990.

46. Sebastian Hunt. Abstract Interpretation of Functional Languages: From The-ory to Practice. PhD thesis, Dept. of Computing, Imperial College of ScienceTechnology and Medicine, London, UK, 1991.

47. Bogdan Korel and Janusz Laski. Dynamic Program Slicing. Information Pro-cessing Letters, 29(3):155–163, 1988.

48. Butler W. Lampson. Protection. In Proceedings of the Princeton Symposium onInformation Sciences and Systems (ISS), pages 437–443, Princeton University,

References 123

March 1971. Reprinted in Operating Systems Review, vol. 8, no. 1, pp. 18–24,Jan. 1974.

49. Loren Larsen and Mary Jean Harrold. Slicing object-oriented software. InProceedings of the International Conference on Software Engineering (ICSE),pages 495–505, Berlin, Germany, 1996. IEEE Computer Society Press.

50. Bixin Li. Analyzing information-flow in java program based on slicing technique.SIGSOFT Software Engineering Notes, 27(5):98–103, 2002.

51. Peng Li and Steve Zdancewic. Downgrading policies and relaxed noninterfer-ence. In Jens Palsberg and Martın Abadi, editors, Conference Record of theAnnual ACM SIGPLAN-SIGACT Symposium on Principles of ProgrammingLanguages (POPL), Long Beach, California, USA, 2005. ACM Press, New York,NY.

52. Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification.http://java.sun.com/docs/books/vmspec/index.html.

53. Isabella Mastroeni. On the role of Abstract Non-Interference in language-basedsecurity. In Radhia Cousot, editor, Proceedings of the Asian Symposium on Pro-gramming Languages and Systems (APLAS), volume 3780 of Lecture Notes inComputer Science, pages 418–433, Tsukuba, Japan, November 2005. Springer-Verlag, Berlin.

54. Andrew C. Myers. JFlow: practical mostly-static information flow control.In Conference Record of the Annual ACM SIGPLAN-SIGACT Symposium onPrinciples of Programming Languages (POPL), pages 228–241, San Antonio,Texas, USA, January 1999. ACM Press, New York, NY.

55. Andrew C. Myers. Mostly-Static Decentralized Information Flow Control. PhDthesis, Massachusetts Institute of Technology, Cambridge, Massachusets, USA,1999.

56. Andrew C. Myers, Nathaniel Nystrom, Lantian Zheng, and Steve Zdancewic.Jif: Java information flow. Software release. http://www.cs.cornell.edu/jif, July2001.

57. George Necula. Compiling with Proofs. PhD thesis, Carnegie Mellon University,Pittsburgh, Pennsylvania, USA, May 1997.

58. George Necula. Proof-Carrying Code. In Conference Record of the AnnualACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages(POPL), Paris, France, January 1997. ACM Press, New York, NY.

59. George Necula and Peter Lee. Research on Proof-Carrying Code for mobile-code security. In Proceedings of the Workshop on Foundations of Mobile CodeSecurity (FMCS), 1997.

60. George Necula and Peter Lee. Research on Proof-Carrying Code for untrusted-code security. In Proceedings of the IEEE Symposium on Security and Privacy(SSP), Oakland, New Zealand, 1997.

61. George Necula and Peter Lee. Efficient Representation and Validation of Proofs.In Proceedings of the IEEE Symposium on Logic in Computer Science (LICS),Indianapolis, Indiana, USA, 1998.

62. John Peterson and Olaf Chitil. Haskell: a purely functional language. URL:http://www.haskell.org.

63. Francois Pottier and Vincent Simonet. Information flow inference for ML. ACMTransactions on Programming Languages and Systems, 25(1):117–158, January2003.

124 References

64. John C. Reynolds. Separation logic: a logic for shared mutable data structures.In Proceedings of the IEEE Symposium on Logic in Computer Science (LICS),pages 55–74, 2002.

65. Xavier Rival. Abstract dependences for alarm diagnosis. In Radhia Cousot,editor, Proceedings of the Asian Symposium on Programming Languages andSystems (APLAS), volume 3780 of Lecture Notes in Computer Science, pages347–363, Tsukuba, Japan, November 2005. Springer-Verlag, Berlin.

66. Xavier Rival. Understanding the origin of alarms in ASTREE. In Chris Hankin,editor, Proceedings of the Symposium on Static Analysis (SAS), volume 3672 ofLecture Notes in Computer Science, London, UK, August 2005. Springer-Verlag,Berlin.

67. Andrei Sabelfeld and Andrew C. Myers. Language-based information-flow secu-rity. IEEE Journal on Selected Areas in Communications, 21(1):5–19, January2003.

68. Q. Sun, David A. Naumann, and Anindya Banerjee. Modular and constraint-based information flow inference for an object-oriented language. In RobertoGiacobazzi, editor, Proceedings of the Symposium on Static Analysis (SAS),volume 3148 of Lecture Notes in Computer Science, pages 84–99, Verona, Italy,August 2004. Springer-Verlag, Berlin.

69. Frank Tip. A survey of program slicing techniques. Technical report, CWI(Centre for Mathematics and Computer Science), Amsterdam, The Netherlands,1994.

70. Dennis Volpano and Geoffrey Smith. A type-based approach to program secu-rity. In Michel Bidoit and Max Dauchet, editors, Proceedings of the InternationalJoint Conference on the Theory and Practice of Software Development, volume1214 of Lecture Notes in Computer Science, pages 607–621, Lille, France, April1997. Springer-Verlag, Berlin.

71. Dennis Volpano, Geoffrey Smith, and Cynthia Irvine. A sound type system forsecure flow analysis. Journal of Computer Security, 4(3):167–187, 1996.

72. Mark Weiser. Program slicing. In Proceedings of the International Conferenceon Software Engineering (ICSE), pages 439–449, San Diego, California, USA,1981. IEEE Computer Society Press.

73. Damiano Zanardini. Higher-Order Abstract Non-Interference. In Pawel Urzy-czyn, editor, Proceedings of the International Conference on Typed Lambda Cal-culi and Applications (TLCA), volume 3461 of Lecture Notes in Computer Sci-ence, Nara, Japan, April 2005. Springer-Verlag, Berlin.

74. Damiano Zanardini. Abstract Non-Interference in a fragment of Java bytecode.In Proceedings of the ACM Symposium on Applied Computing (SAC), Dijon,France, April 2006. To appear.

75. Steve Zdancewic and Andrew C. Myers. Robust declassification. In Proceedingsof the IEEE Computer Security Foundations Workshop (CSFW), pages 15–23,Cape Breton, Nova Scotia, Canada, June 2001.

Index

Ψ function, 101] function, 75Md set, 63Ψe function, 91Ψe function (bytecode), 105Ψ function, 83, 91, 100, 106Ψ [] function, 91T () function, 65Σ (set of traces), 37JK] function, 64ψ-state, 85, 88, 108ψi function, 88ψ⊥ formula, 89

abstract address, 36abstract dependency, 4, 36, 82, 108, 111abstract domain, 15, 19, 37, 43, 45, 56,

68, 113abstract environment, 69Abstract Interpretation, i, 3, 11, 15, 26,

44, 73, 87abstract location, 35Abstract Non-Interference, i, 3, 4, 26,

38, 39, 43, 44, 55, 59, 63, 82, 97,106, 108, 111

(non-narrow) ANI, 28narrow ANI, 28, 40

abstract operation, 4, 43abstract property, 4, 8, 15, 37, 43, 45,

48, 55, 83, 84, 92, 108, 111abstract semantics, 64, 83, 101abstract slicing, 111abstract value, 19, 27, 43, 56, 105, 111abstraction function, 17, 37

accept (a program), 87, 106, 108Access Control, 2, 23, 54actual parameter, 52, 83, 85address, 36administrator, 2aggregate data, 3alarm, 36aliasing, 34, 65, 111analysis certifier, 32analysis designer, 57analysis domain, 57analytical representation, 44analyze() procedure, 89, 109analyzer, 73ANI() procedure, 86, 88, 108, 112antisymmetry, 11ascending chain, 12, 21, 88, 89assembly language, 33assertion, 35, 108

independence assertion, 35programmer assertion, 35region assertion, 35

assignment, 5, 7, 25, 31, 36, 65, 67, 81,82, 84, 105, 106

(of boolean variable), 82dangerous assignment, 40, 81

atom, 71Atom () predicate, 71attacker, 2, 3, 8, 25, 26, 51, 55, 82, 108average, 3axiomatic semantics, 36

BkS function, 29

best correct approximation, 20, 68

126 Index

bijection, 13, 35binary format, 9, 97binary relation, 11block, 48, 99, 107(boolean variable) assignment, 82boolean function, i, 40, 63, 80, 82, 89,

98, 106, 111boolean guard, 38, 83boolean variable, 39, 80, 89, 98, 106bottom element, ⊥, 6, 12bound

computational bound, 80greatest lower bound, 12, 56least upper bound, 12, 56lower bound, 12upper bound, 12

bounded computation, 87branch, 38, 47, 83, 99branching variable, 107bytecode, see Java bytecode

C language, 33, 34C++ language, 34, 57call-by-value, 6can contain secrets, 81cardinality, 89cartesian product, 11certificate, 23, 31, 39certification, i, 3, 4, 31, 44, 97, 108, 112class, 4, 7, 9, 38, 49, 55, 63, 85, 98, 103,

108class hierarchy, 56, 80class library, 52classes as sets, 55empty class, 56primitive class, 8properties as classes, see properties

as classessingleton class, 8subclass, 8, 55superclass, 50, 56types as classes (equivalence), 7

.class file, 9, 97, 107, 108closure operator, 18

lower closure operator, see lowerclosure operator

upper closure operator, see upperclosure operator

code, 1, 4

code consumer, 31code producer, 31, 39, 108code user, 4, 9, 31, 39, 44, 97, 107, 108compiler, 66compiling, 9, 39, 98, 106complement, 71complementation, 19complete lattice, 12, 19completeness, 20complexity, 28, 71, 109component (of a program), 45component (of an architecture), 108composition, 38compositionality, 84, 99, 106computable, 43computational bound, 74, 80concatenation, 98concrete domain, 19, 56concrete property, 16concrete value, 43concretization function, 17concurrency, 7, 50conditional, 5, 7, 83, 91, 100, 106conditioned slicing, 28, 95confidentiality, 1, 4, 23, 26conservative, 83consistent, 82constancy, 35, 84, 105constant, 10, 46, 47, 68contains secrets, 81content (informational), 81control flow, 48, 67, 83, 99Control Flow Graph, 48, 99, 107control flow graph, 29copyright, 2correctness, 88, 97covering, 75crash, 1criterion, 92cycle, 101

dangerous assignment, 40, 81dangerous flow, 3, 82, 87data, 1, 3–5, 8, 17, 23, 26, 28, 37, 44,

56, 63, 66, 81, 108aggregate data, 3forbidden data, 2particular data, 3

dataflow equation, 29

Index 127

debugging, 28deceptive flow, 27decidability, 16declaration, 49, 64Declassification, 26decompiling, 2def () function (slicing), 29denotation, 37dependency, 2, 25, 28, 108, 111, 113

abstract dependency, see abstractdependency

concrete dependency, 69control dependency, 29, 84, 92data dependency, 4, 29, 66, 92dependency graph, 67false dependency, 70local dependency, 38

descriptor, 9disjunctive completion, 19dispatch, 36domain of parity, 20, 46, 57domain of sign, 17, 20, 56, 93downflat domain, 57download, 4dynamic, 15dynamic slicing, 28, 95

edge, 67, 99Edinburgh Logical Framework, 34efficiency, 16, 89, 92environment, 6, 36, 46equivalence, 98exception, 7, 38, 48, 50, 101executable form, 97execution time, 2exponential, 71expression, 5, 7, 65, 67, 83, 93, 106, 111external user, 97extraction relation, 35

false alarm, 36field, 7, 35, 50, 58, 65, 98, 105, 107

field name, 65instance field, 40static field, 39, 85

finished, 87finished predicate, 87first-order logic, 80

fixpoint, i, 13, 16, 21, 30, 38, 47, 84, 88,94, 101, 109

fixpoint semantics, 16greatest fixpoint, 13least fixpoint, 13

fixpoint approximation, 84flat domain, 58flow, 3flow sensitive, 36Floyd, 33forbidden data, 2forbidden flow, 81foreign functions, 32formal parameter, 65, 85formalism, 80foundational computer science, 16frame, 10frame rule, 36framestack, 10, 102free member, 2fresh, 98fst function, 78FUN language, 6, 44, 49function, 6, 12, 37, 71

arity, 12, 91bijection, 35bijective, 13composition, 13, 17continuous, 12, 21extensive, 12, 17, 87idempotent, 12image, 12, 21injective, 13monotone, 12, 18, 74, 88partial, 12pointwise ordering, 13reductive, 12, 17surjective, 13total, 12

functional language, 5, 15

Galois connection, 17, 26, 65garbage collection, 49graph, 67, 88, 99ground, 105guard, 38, 83

H modifier, 54halting problem, 16

128 Index

Haskell, 33

Haskell language, i, 75, 113heap, 10, 35, 39, 85high-level (language), 9, 33, 39, 106

high-level (program structure), 100higher-order, 15, 46

Hoare logic, 35, 40, 66Hoare triple, 35, 40

I/O, 49

identifier, 7, 29, 34, 55, 64, 81, 82, 89,90, 105, 111

dotted identifier, 7sharing identifier, see sharing

identifier

identity domain, 28ids() procedure, 108IMP language, 5, 25, 29, 49, 68, 82, 84

imperative language, 5, 36, 49, 82implicit flow, 83, 90

incompleteness, 73increasing chain, 12independent identifiers, 34

indistinguishability, 38indistinguishable, 46Information Flow, i, 3, 4, 8, 23, 30, 34,

38, 44, 49, 63, 80, 82, 92, 97, 98,105, 108, 111

direct flow, 25

implicit flow, 83, 90indirect flow, 25

information system, 1, 2informational content, 81inheritance, 50, 102

multiple inheritance, see multipleinheritance

injection, 13

input, 2, 5, 25, 27, 28, 37, 44, 49, 81, 84,90, 98

instance, 10

instruction, 48, 98instruction set, 9

interface, 50interpreter, 97interprocedural, 36

invariant, 33, 41isomorphism, 13, 19iteration, 83, 87, 88

Java bytecode, i, 4, 9, 26, 39, 44, 97,106, 108, 112

instructionsaconst null, 102aload, 99, 103areturn, 105astore, 99, 103, 105, 106dup, 99, 102getfield , 99, 103getstatic , 103goto, 99, 100, 107if acmpeq, 99, 100if acmpneq, 99ifeq , 99if icmpeq, 99ifnull , 99, 100invokespecial , 99invokevirtual, 99ldc, 105new, 99, 105nop, 102pop, 102putfield , 99, 103, 107putstatic, 103return, 105swap, 102non-jump instruction, 99

Java language, i, 4, 7, 9, 26, 34, 50, 56,80, 97, 108

Java compiler, 97Java Virtual Machine, 9, 38, 97, 107JL language, 7, 63, 97judgement, 35, 47jump, 99

L modifier, 54label, 99lambda-abstraction, 6lambda-calculus, 6, 26, 44lambda-expression, 45lattice, i, 12lifting, 20, 48list, 33local dependency, 38local variable, 98location, 34, 50, 65, 82, 111

abstract location, see abstractlocation

locs function, 65

Index 129

logic language, 15logical equivalence, 81, 98loop, 5, 7, 34, 83, 93low-level (language), 4, 39lower closure operator, 18

machine register, 33machine-dependent, 97main() procedure, 52, 85, 109main loop, 88, 109map, 12, 17member, 9, 63

static member, 85memory, 2, 6, 9, 34, 39, 65, 111

allocation/deallocation, 49memory configuration, 37

method, 7, 9, 36, 38, 50, 58, 63, 85, 98instance method, 50, 63method body, 64, 99method definition, 63method instance, 63, 83, 91, 100method invocation, 7, 64, 83, 87, 92method name, 63method overriding, 39, 57, 63method signature, 63method summary, 36static method, 39, 52, 65, 85

method stack, 85mistrustful, 97ML language, 26, 33model, 82, 89, 109modifier, 54

private modifier, see private modifierprotected modifier, see protected

modifierpublic modifier, see public modifier

monomorphic, 6, 47Moore closure, 12, 19Moore family, 12, 19multiple inheritance, 56mutual recursion, 85

native code, ii, 97Non-Interference, 3, 8, 24, 30, 38, 44,

49, 55, 82(standard) NI, 26, 111

non-jump, 48

object, 36, 39, 50, 55, 66, 98

object creation, 7, 36, 39, 49, 57, 105

Object Oriented, 111Object-Oriented, i, 4, 7, 34, 49, 55, 65,

82observable behavior, 3

observable dependency, 37observational power, 3, 26, 45, 56, 82observer, 2, 50, 84

occurrence, 68one-state semantics, 35opcode, 10open source, 2

operand stack, 10, 38, 48, 98top element, 98

operator, 98optimization, 32

order isomorphism, 13ordering, 16, 74, 88output, 2, 5, 27, 44, 49, 81, 84, 98overriding, 39, 57, 63

pair, 44

paramateractual parameter, see actual

parameterformal parameter, see formal

parameterparameter, 10, 39, 64, 89, 103parity, 46, 57

partial order, 12partially ordered set, 12particular data, 3partition, 71, 108

partition (of domain), 41partitioning domain, 71passive, 2password checking, 26

path, 38, 67, 83paying member, 2pointer aliasing, see aliasing

pointwise, 74pointwise ordering, 13polymorphism, 50polyvariance, 36

pop, 10poset, 12, 17postcondition, 33, 36powerset, 11, 56, 89

130 Index

precision, 8, 17, 27, 41, 44, 46, 55, 70,93

precondition, 33, 36predecessor, 102, 107predicate, 29, 32, 35, 56, 80, 87prevalue, 48primitive domain, 48private, 2, 3, 25, 26, 30, 45, 49, 81private modifier, 54privilege, 2privileges, 23procedural, 48procedure, 5, 7, 49, 71program data, see dataprogram domain, 57program point, 29, 37, 64, 66, 81, 84, 92program slicing, see slicingprogram state, 6proof system, 40proof validator, 31Proof-Carrying code, i, 4, 31, 97, 108,

112PCC architecture, 108

propagation, 24, 63, 66, 80, 84properties as classes, 4, 9, 55, 80property, 27, 35, 63, 108

precision, see precisionprogram property, 15properties as classes, see properties

as classesproperties as sets of values, 17property transformer, 20security property, see security

propertyundecidable property, 16

protected, 3, 23protected modifier, 54prover, 28, 35public, 2, 3, 25, 26, 30, 44, 49, 55, 81public modifier, 54push, 10, 85

RkS (i) function, 29

reached (fixpoint), 88recursion, 6, 44, 77

mutual recursion, see mutualrecursion

recursive call, 87, 91reduced product, 19

ref function, 80, 84, 93, 100, 105, 106,108, 111, 112

ref () function (slicing), 29, 92ref2 function, 76ref3 function, 76reference, 11, 98, 103reflexivity, 11reject (a program), 39, 87relation, 11, 88Relaxed Non-Interference, 26, 44relevant, 28, 39, 69, 82, 84, 92, 105, 111renaming, 40requirement, 3restriction, 75result value, 105ret variable, 81, 83, 105, 106return value, 7reverse engineering, 28runtime, 49, 58, 64, 72, 83, 84Runtime Constant Pool, 98runtime system, 32

SkS function, 29

safe, 31, 34safe program, 107safe software, 97safety, 97, 108safety proof, 31salary, 3same role, 106satisfiability, 82, 87, 109secret, 63, 80, 103security, 23

high-security, 25, 26Language-based security, 23low-security, 25, 26

security checking, 97security policy, 1, 4, 23, 31, 58, 84, 97,

106, 108security property, i, 1, 31, 44security requirement, 1, 3, 4, 44, 87, 97,

108security signature, 39security type, 44semantic approximation, 15semantic slicing, 36semantics, 2–4, 6, 25, 26, 50, 55

abstract semantics, 16, 22, 64, 83, 101axiomatic semantics, 36

Index 131

collecting semantics, 22denotational semantics, 6, 43fixpoint semantics, 16input-output semantics, 22, 50one-state semantics, 35trace semantics, 22, 37, 50two-state semantics, 35

semantics:abstract semantics, 68separation, 3, 66Separation Logic, 36sequential, 48sequential combination (of boolean

functions), 98sequential composition, 84set (representation), 45sh function, 66, 85, 103, 106sharing, 34, 66, 82, 84sharing identifier, 66, 84, 111side effect, 80, 83sign, 57, 68, 93signature, 63skip, 85slice, 92slicing, 28, 38, 67, 92, 111

abstract slicing, see abstract slicingconditioned slicing, 28, 95dynamic slicing, 28, 95slice, 28slicing criterion, 28static slicing, 28

soundness, 16, 32, 36, 39, 44, 65, 75, 83,87, 108, 112

source code, i, 2, 4, 9, 82, 97, 106stack, 10, 48, 80, 85, 98

framestack, 102standard deviation, 3state, 10, 25, 35, 37, 102statement, 5, 7, 25, 28, 35, 64, 67, 81,

83, 91, 92, 98, 106statement schema, 91static, 10, 15, 101, 103Static Analysis, 1, 4, 16static analyzer, 32, 63static field, 85static member, 85static method, 85static slicing, 28store, 35strongest postcondition, 36

subclass, 36, 105subprogram, 49, 63subroutine, 80[] function, 75subscript, 81, 84, 98

fresh subscript, 81, 84subscript substitution, 81, 84, 89substitution, 98successor, 99sum (aggregate data), 3superset, 107surjection, 13symmetric pair, 45symmetry, 11syntactic, 70system crash, 1

tabular representation, 44termination, 16, 44, 48, 84, 89test, 84test instruction, 107testing, 28theorem prover, 31this variable, 65threshold, 87time-resource, 80top element (of a stack), 98top element, >, 12, 44trace semantics, 37transitive flow, 25transitivity, 11triple, 35trusted, 108trusted component, 31truth value, 82Turing-equivalence, 16, 43two-state semantics, 35type, 5, 15, 34, 44, 49, 64, 65, 93

primitive type, 8, 99security type, 26, 44type analysis, 44type cast, 39type checking, 6, 55, 57type constructor, 6type environment, 64, 83, 105type inference, 46type safety, 49type system, 6, 15, 22, 25, 38, 47types as classes (equivalence), 7

132 Index

typing rule, 47typing rules, 34union type, 33

undecidability, 4, 48universal language, 43universal quantifier, 43, 48UNSAT (proof), 109untrusted, 2, 4, 31untrusted user, 3, 55upper closure operator, 18, 27, 35, 45,

56, 68

validation, 44value, 6variable, 5–7, 17, 25, 27, 36, 45, 49, 64,

66, 81, 82, 98global variable, 39local variable, 7, 10, 38, 98

private variable, 25public variable, 25variable declaration, 49, 64

variable renaming, 40variable substitution, 81, 90vars () function, 29, 84, 107VC predicate, 32VC generator, 33verification condition, 32vertex, 67, 99visitor, 2

Web, 2, 4web, 23, 31widening, 21, 84widening operator, 89World Wide Web, 4

wrong value, 6

Sommario

Questo lavoro e organizzato in sette capitoli. L’introduzione, Capitolo 1, il-lustra il contesto in cui la ricerca si colloca, ed evidenzia l’importanza delleproprieta di sicurezza nella pratica attuale dello sviluppo ed analisi del soft-ware.

Il Capitolo 2 introduce i diversi linguaggi di programmazione utilizzati neicapitoli successivi (tranne Haskell, usato per implementare il calcolo delledipendenze incluso nell’Appendice A). Questo capitolo contiene, inoltre, iriferimenti necessari all teoria dei reticoli e dei punti fissi.

La definizione originale di Non-Interferenza Astratta e le teorie su cui essasi basa (in particolare, l’Interpretazione Astratta e i fondamenti dell’analisidei flussi di informazione) sono presentati nel Capitolo 3. Poiche il nostrointeresse e rivolto alla certificazione del codice, introduciamo la promettentearchitettura Proof-Carrying code (codice dotato di una prova di correttezza).

Il Proof-Carrying code e l’obiettivo dell’architettura delineata nel Capi-tolo 6. Anche il Capitolo 3 presenta, nella sua ultima sezione, ricerche recentiche sono importanti per questo scopo. Un’attenzione particolare e rivoltaall’analisi di sicurezza per i linguaggi orientati agli oggetti. Ci occuperemoinoltre dell’attuale ricerca sulla modellazione di proprieta astratte.

La parte principale di questo lavoro inizia al Capitolo 4, dove si dis-cute esaustivamente il problema ri rappresentare proprieta di sicurezza rel-ative al flusso di informazioni. Nella prima sezione viene presentato un ap-proccio basato su tipi di sicurezza, che costituisce un passo iniziale versol’automatizzazione della Non-Interferenza Astratta. Successivamente, si dis-cute un approccio realistico ad ANI. Il capitolo contiene una definizione com-pleta di ANI per un linguaggio simile a Java; questo approccio sfrutta i mec-canismi propri del linguaggio per controllare le proprieta di sicurezza.

Questa definizione e applicata nel Capitolo 5, che sviluppa un’analisidi ANI su codice sorgente, ottenuta attraverso l’uso di funzioni booleane.L’analisi del flusso di informazione si basa su un calcolo delle dipendenze as-tratte sui dati; esse sono calcolate da un algoritmo descritto in questo capitolo.Le dipendenze sono inoltre considerate in relazione al Program Slicing.

134 Sommario

L’analisi del codice sorgente viene tradotta, nel Capitolo 6, al linguag-gio Java bytecode, in vista della certificazione di programmi all’interno diun’architettura PCC.

Il Capitolo 7 contiene le conclusioni e direzioni future di ricerca.

costa.fdi.ucm.esdamiano/pubs/phDTh.pdf · Summary This work is organized in seven chapters. The...

Documents

Transcript of costa.fdi.ucm.esdamiano/pubs/phDTh.pdf · Summary This work is organized in seven chapters. The...