TIP Language and Type Analysis - USTC


TIP Language and Type Analysis

Yu Zhang

Course web site: http://staff.ustc.edu.cn/~yuzhang/pldpa

Type Analysis and Unification 1


Resources

• Static Program Analysis (Anders Møller)

- http://cs.au.dk/~amoeller/

- TIPC: implemented in C++17

- tipg4: implemented using ANTLR4

Questions about Programs

• Does the program terminate on all inputs?

• How large can the heap/stack frame become during execution?

• Can sensitive information leak to non-trusted users?

• Can non-trusted users affect sensitive information?

• Data races?

• SQL injections? (Tricking the server into executing malicious SQL commands by inserting SQL commands into Web form submissions and similar inputs.)

• …

Program Points

Any point in the program = any value of the PC

Invariants: a property holds at a program point if it holds in every state that can occur at that point, for any execution with any input

Questions about Program Points

• Will the value of x be read in the future?

• Is the variable x initialized before it is read?

• What is a lower and upper bound on the value of the integer variable x?

• Can the pointer p be null?

• Which variables can p point to?

• Do p and q point to disjoint structures in the heap?

• …


Why are the Answers Interesting?

• Increase efficiency

- Resource usage

- Optimization

• Ensure correctness

- Verify behavior

- Catch bugs early

• Support program understanding

• Enable refactorings

Programs that reason about programs

• Soundness: don’t miss any errors

• Completeness: don’t raise false alarms

• Termination: always give an answer

Rice’s theorem, 1953

• H.G. Rice: Classes of Recursively Enumerable Sets and Their Decision Problems

• Rice's theorem: Any nontrivial property of the behavior of programs in a Turing-complete language is undecidable!

- Equivalently: every nontrivial property of the recursively enumerable languages is undecidable

- Trivial property: either true for all programs or false for all programs

- Nontrivial property: any property that is not trivial

Approximation

• Approximate answers may be decidable!

- Output yes/no => output yes/no/unknown

• The approximation must be conservative

• More subtle approximations if not only yes/no

- E.g. memory usage, pointer targets


False positives and false negatives

• False positives (false alarms): prevented by completeness

• False negatives (missed errors): prevented by soundness

The Engineering Challenge

• A correct but trivial approximation algorithm may just give the useless answer every time

• The engineering challenge is to give the useful answer often enough to fuel the client application

• … and to do so within reasonable time and space

• Hard (but fun) part of static analysis

A Constraint-based Approach

• Conceptually separates the analysis specification from algorithmic aspects and implementation details

Challenging Features in Modern PLs

• Higher-order functions

• Mutable records or objects, arrays

• Integer or floating-point computations

• Dynamic dispatching

• Inheritance

• Exceptions

• Reflection

• …


TIP Language

TIP: Tiny Imperative Programming language


TIP and its Implementation

• TIP language

- Minimal C-style syntax

- Enough features to make static analysis challenging and fun

• Implementation

- Scala: https://github.com/cs-au-dk/TIP/

- C++ 17: https://github.com/matthewbdwyer/tipc


Expressions in TIP


Statements in TIP

• In conditions, 0 is false, all other values are true

• The output statement writes an integer value to the output stream

Functions in TIP

• The optional var block declares a collection of uninitialized variables

• Function calls are an extra kind of expression:

Pointers

• No pointer arithmetic


Records

• Records are passed by value (like structs in C)

• For simplicity, values of record fields cannot be records

Functions as Values

• Functions are first-class values

• The name of a function is like a variable that refers to that function

• Generalized function calls

• Function values suffice to illustrate the main challenges with methods (in OO languages) and higher-order functions (in functional languages)

Programs

• A program is a collection of functions

• The function named main initiates execution

- Its arguments are taken from the input stream

- Its result is placed on the output stream

• We assume that all declared identifiers are unique


TIP Examples

• Recursive factorial function

• Iterative factorial function

Control flow graphs

• Iterative factorial function


Normalization

• Normalization: flatten nested expressions, using fresh variables
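
The flattening step can be illustrated with a small Python sketch. It is not taken from the TIP or tipc code base: the tuple encoding of expressions, the function name `normalize`, and the temporary names `t1`, `t2`, … are all assumptions made for illustration, and the sketch assumes those temporaries do not clash with program variables and that every operator is binary.

```python
from itertools import count

def normalize(expr, target):
    """Flatten the assignment `target = expr` into a list of assignments
    whose right-hand sides apply an operator to atoms only.

    Hypothetical encoding: an atom is a variable name (str) or an int;
    a compound expression is a tuple (op, left, right).
    """
    fresh = count(1)   # source of fresh temporary names t1, t2, ...
    stmts = []         # resulting flat assignments, in evaluation order

    def atom(e):
        # Atoms are already flat.
        if not isinstance(e, tuple):
            return e
        op, left, right = e
        l, r = atom(left), atom(right)
        tmp = f"t{next(fresh)}"        # fresh variable for this subexpression
        stmts.append((tmp, (op, l, r)))
        return tmp

    if isinstance(expr, tuple):
        op, left, right = expr
        stmts.append((target, (op, atom(left), atom(right))))
    else:
        stmts.append((target, expr))
    return stmts

# x = (y + 3) * (z - 1) becomes t1 = y + 3; t2 = z - 1; x = t1 * t2
flat = normalize(("*", ("+", "y", 3), ("-", "z", 1)), "x")
```

After normalization every subexpression has a name, which is convenient when an analysis wants to attach one piece of information to each program variable.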


Type analysis and unification


Type Errors

• Reasonable restrictions on operations:

- Arithmetic operators apply only to integers

- Comparisons apply only to like values

- Only integers can be input and output

- Conditions must be integers

- Only functions can be called

- The * operator only applies to pointers

- Field lookup can only be performed on records

- The fields being accessed are guaranteed to be present

• Violations result in runtime errors

• No type annotations in TIP


Type Checking

• Can type errors occur during runtime?

- undecidable

• Use conservative approximation

- A program is typable if it satisfies some type constraints

- These are systematically derived from the syntax tree

- If typable, then no runtime errors occur

- But some programs will be unfairly rejected (slack)

[Figure: the typable programs form a subset of the programs with no type errors; the region between the two is the slack]

Challenges

• Fighting slack

- Make the type checker a bit more clever

- An eternal struggle

- And a great source of publications

• The type checker may be unsound

• Ex. covariant arrays in Java

- Covariant arrays: if B is a subclass of A, then the following code is allowed in Java: A[] a = new B[];

- The inheritance relation between the classes carries over unchanged to the corresponding array types

Types

• Types describe the possible values

• These describe integers, pointers, functions, and records

• Types are terms generated by this grammar


Type constraints


Generating constraints


Generating constraints

(Slide annotation: polymorphic types)

Exercise

• Generate and solve the constraints

• Then try with y = alloc 8 replaced by y = 42


Generating constraints

• This is the idea, but not directly expressible in TIP types

Generating constraints

• Exercise: Field write statements?


General Terms


Unification

• An equality between two terms with variables

- k(X,b,Y) = k(f(Y,Z), Z, d(Z))

• A solution (a unifier) is an assignment from variables to terms that makes both sides equal

- X = f(d(b),b)

- Y = d(b)

- Z = b
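
The example above can be checked with a short Python sketch. This is a simple recursive unification with an occurs check (so it handles finite terms only) and is not a linear-time algorithm. The encoding is an assumption made for illustration: variables are plain strings such as `"X"`, and a term `k(X, b, Y)` is the tuple `("k", "X", ("b",), "Y")`, with constants written as one-element tuples. The names `unify`, `walk`, `occurs`, and `resolve` are hypothetical.

```python
def is_var(t):
    # Assumed encoding: variables are strings, constructed terms are tuples.
    return isinstance(t, str)

def walk(t, subst):
    # Follow variable bindings until an unbound variable or a term is reached.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # True if variable v occurs inside term t under the current bindings.
    t = walk(t, subst)
    if is_var(t):
        return t == v
    return any(occurs(v, a, subst) for a in t[1:])

def unify(a, b, subst=None):
    """Return a substitution making a and b equal, or raise ValueError."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if is_var(a):
        if a == b:
            return subst
        if occurs(a, b, subst):
            raise ValueError("occurs check failed: no finite solution")
        subst[a] = b
        return subst
    if is_var(b):
        return unify(b, a, subst)
    if a[0] != b[0]:
        raise ValueError(f"constructor error: {a[0]} vs {b[0]}")
    if len(a) != len(b):
        raise ValueError(f"arity error for {a[0]}")
    for x, y in zip(a[1:], b[1:]):
        unify(x, y, subst)
    return subst

def resolve(t, subst):
    # Apply the substitution exhaustively to obtain the solved term.
    t = walk(t, subst)
    if is_var(t):
        return t
    return (t[0],) + tuple(resolve(a, subst) for a in t[1:])

# k(X, b, Y) = k(f(Y, Z), Z, d(Z))
s = unify(("k", "X", ("b",), "Y"), ("k", ("f", "Y", "Z"), "Z", ("d", "Z")))
solution = {v: resolve(v, s) for v in ("X", "Y", "Z")}
# solution: X = f(d(b), b), Y = d(b), Z = b
```

Unifying `d(X)` with `e(X)` in this sketch raises the constructor error, and unifying `a` with `a(X)` raises the arity error.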


Unification errors

• Constructor error

- d(X) = e(X)

• Arity error

- a = a(X)


Linear unification algorithm

• 1978, by Paterson and Wegman

• In time O(n)

- Finds a most general unifier

- Or decides that none exists

• Can be used as a back-end for type checking

• … but only for finite terms


Recursive data structures

[[p]] = [[alloc null]] = ↑[[null]] = ↑↑t

[[p]] = ↑[[p]] = ↑↑[[p]]

Solution: [[p]] = t, where t = ↑t

Regular terms

• Infinite but (eventually) repeating

- e(e(e(e(e(e(…))))))

- d(a, d(a, d(a,…)))

- f(f(f(f(…), f(…)), f(f(…), f(…))), f(f(f(…), f(…)), f(f(…), f(…))))

• Only finitely many different subtrees

• A non-regular term

- f(a, f(d(a), f(d(d(a)), f(d(d(d(a))), …))))

See Section 3.3, Solving Constraints with Unification, at http://users-cs.au.dk/amoeller/spa/

Regular unification

• 1976, Huet

• Use a union-find algorithm to solve the unification problem for regular terms in O(n*A(n))

• A(n) is the inverse Ackermann function

- Smallest k such that n<Ack(k,k)

- This is never bigger than 5 for any realistic value of n

• See the TIP implementation tipc

Union-Find

• Make a set: add a new node x that initially is its own parent

• Find: find the canonical representative of x by traversing the path to the root, performing path compression on the way

• Union: find the canonical representatives of x and y, and make one the parent of the other unless they are already equivalent

https://github.com/matthewbdwyer/tipc/blob/main/src/semantic/types/solver/UnionFind.cpp
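
The three operations described above can be sketched in a few lines of Python. This is only an illustration and not the tipc `UnionFind.cpp` implementation: the class and method names are assumptions, and the sketch omits union by rank, which the near-linear bound in general relies on in addition to path compression.

```python
class UnionFind:
    """Minimal union-find with path compression (illustrative sketch)."""

    def __init__(self):
        self.parent = {}

    def make_set(self, x):
        # A new node is initially its own parent.
        self.parent.setdefault(x, x)

    def find(self, x):
        # First pass: walk up to the root (the canonical representative).
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, pointing every visited node at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        # Make one representative the parent of the other,
        # unless x and y are already in the same class.
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[rx] = ry

uf = UnionFind()
for name in ("a", "b", "c", "d"):
    uf.make_set(name)
uf.union("a", "b")
uf.union("c", "d")
same_ab = uf.find("a") == uf.find("b")   # a and b are now equivalent
same_ac = uf.find("a") == uf.find("c")   # a and c are still separate
```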


Union-Find (simplified)


Implementation Strategy

• Representation of the different kinds of types (including type variables)

• Map from AST nodes to type variables

• Union-Find

• Traverse AST, generate constraints, unify

- Reply type error if unification fails

- When unifying a type variable with e.g. a function type, it is useful to pick the function type as representative

- For outputting the solution, assign names to type variables (that are roots), and be careful about recursive types
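
The strategy above can be sketched as follows. This is a simplified illustration under stated assumptions, not the tipc or Scala TIP solver: the classes `Var` and `Cons`, the constructor name `"ptr"` for pointer types, and the use of `TypeError` to report failure are all hypothetical. There is deliberately no occurs check, so recursive (regular) types are allowed, and two constructed types are merged before their arguments are unified so that the recursion terminates on cyclic terms.

```python
class Node:
    def __init__(self):
        self.parent = self            # union-find: each node starts as its own root

class Var(Node):                      # a type variable
    def __init__(self, name):
        super().__init__()
        self.name = name

class Cons(Node):                     # a constructed type, e.g. int or ptr(t)
    def __init__(self, name, *args):
        super().__init__()
        self.name, self.args = name, args

def find(n):
    root = n
    while root.parent is not root:
        root = root.parent
    while n.parent is not root:       # path compression
        n.parent, n = root, n.parent
    return root

def unify(a, b):
    ra, rb = find(a), find(b)
    if ra is rb:
        return
    if isinstance(ra, Var):
        ra.parent = rb                # prefer the non-variable as representative
    elif isinstance(rb, Var):
        rb.parent = ra
    else:
        if ra.name != rb.name or len(ra.args) != len(rb.args):
            raise TypeError(f"cannot unify {ra.name} with {rb.name}")
        ra.parent = rb                # merge first, then unify the arguments
        for x, y in zip(ra.args, rb.args):
            unify(x, y)

# Constraints from the recursive data structure example:
# [[p]] = [[alloc null]], [[alloc null]] = ptr([[null]]),
# [[null]] = ptr(t), [[p]] = ptr([[p]])
p, alloc, null, t = Var("[[p]]"), Var("[[alloc null]]"), Var("[[null]]"), Var("t")
unify(p, alloc)
unify(alloc, Cons("ptr", null))
unify(null, Cons("ptr", t))
unify(p, Cons("ptr", p))
rep = find(p)   # a ptr type whose argument is equivalent to itself: t = ptr(t)
```

Because the representative of `[[p]]` is a pointer type whose argument resolves back to the same node, printing the solution naively would loop forever; this is the reason for the caution about recursive types when outputting solutions.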


The Complicated Function


Solutions

(Slide annotation: recursive types)

Infinitely many solutions

• Polymorphic function (which is not expressible in the TIP type language)

Recursive and polymorphic types


Slack – let-polymorphism


Slack – let-polymorphism


Slack – flow-insensitivity


Other programming errors
