
Type Inference: CIS Seminar, 11/3/2009

Type inference: Inside the Type Checker.

A presentation by: Daniel Tuck


Types in Computer Languages

• Dynamic languages (p*, Ruby, Lisp): provide no compile-time type checking; type errors occur at runtime.
• Static languages: eliminate most runtime type errors, because type conflicts are recognized at compile time.
  – Explicit (Java, C++): types must be declared for every name.
  – Implicit (Haskell, ML): types don't have to be declared, as they are inferred. However, types may still be declared to improve readability or remove ambiguity.


Type Inference

• Wikipedia - Type inference refers to the ability to automatically either partially or fully deduce the type of the value derived from the eventual evaluation of an expression.

• A type expresses a common property of all values an expression might assume.


History

Type checking has traditionally been done "bottom up": if you know the types of all arguments to a function, you know the type of the result.

1958: Haskell Curry and Robert Feys develop a type inference algorithm for the simply typed lambda calculus.

1969: Roger Hindley extends this work and proves his algorithm infers the most general type.

1978: Robin Milner, independently of Hindley's work, develops an equivalent algorithm.

2004: Java 5 adds generics, including inference of type arguments for generic method calls, and type inference becomes respectable.


[Image: the world's geekiest T-shirt, printed with the entire H-M algorithm expressed as logic.]


Haskell vs Java

Everything we'll do works in both Haskell and Java. We'll use Haskell because the notations are much simpler.


Haskell Expressions

• Haskell uses spaces instead of ( , , ) for arguments to a function: add 1 4 instead of add(1,4).
• Lists in Haskell are written inside square brackets, with commas as delimiters: [1,4].
• Add an element to the front of a list with ":": x:xs.
• Function definitions use pattern matching:

append [] x = x

append (y:ys) x = y : append ys x
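The two clauses of append translate fairly directly into Java. Since Java has no pattern matching on lists, this sketch assumes a tiny hypothetical immutable cons list (the class `Cons` and its helpers `of` and `show` are illustrative names, not from the talk), with `null` playing the role of `[]`.

```java
// Hypothetical immutable cons list: each node holds a head and a tail; null plays the role of [].
final class Cons<A> {
    final A head;
    final Cons<A> tail;

    Cons(A head, Cons<A> tail) {
        this.head = head;
        this.tail = tail;
    }

    static <T> Cons<T> append(Cons<T> list, Cons<T> x) {
        if (list == null) {
            return x;                                        // append [] x = x
        }
        return new Cons<>(list.head, append(list.tail, x));  // append (y:ys) x = y : append ys x
    }

    // Builds a list from its arguments, e.g. of(1, 4) is 1:4:[].
    @SafeVarargs
    static <T> Cons<T> of(T... items) {
        Cons<T> result = null;
        for (int i = items.length - 1; i >= 0; i--) {
            result = new Cons<>(items[i], result);
        }
        return result;
    }

    // Renders a list in Haskell notation, e.g. [1,4].
    static <T> String show(Cons<T> list) {
        StringBuilder sb = new StringBuilder("[");
        for (Cons<T> c = list; c != null; c = c.tail) {
            sb.append(c.head);
            if (c.tail != null) sb.append(',');
        }
        return sb.append(']').toString();
    }
}
```

For example, `Cons.show(Cons.append(Cons.of(1, 4), Cons.of(7)))` gives `[1,4,7]`: the first clause ends the recursion, and the second rebuilds the first list in front of x.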


Haskell Expressions

append [] x = x
append (y:ys) x = y : append ys x

• There are two definitions (clauses) of append; Haskell tries them in order.
• The [] in the first clause matches the first argument against the empty list.
• The pattern (y:ys) in the second clause takes a list apart into its head (y) and tail (ys).
• On the right-hand side, y : append ys x is not in a pattern, so ":" adds y to the front of the list.


Haskell Types

• Primitive types: Integer, String, Bool
• Type declaration: f :: Integer
• Structured types:
  – Function: g :: Integer -> String
  – Multiple-argument functions: h :: String -> Integer -> Bool
  – List: x :: [Integer]
  – Functional arguments: f1 :: [Integer] -> (Integer -> Integer) -> Integer


Polymorphism

The Haskell map function takes two arguments: a function, and a list of inputs to which the function should be applied. The output is a list of result values.

Input: map abs [-1,-3,4,-12] Output: [1,3,4,12]

Input: map reverse ["abc","cda","1234"] Output: ["cba","adc","4321"]

Input: map (3*) [1,2,3,4] Output: [3,6,9,12]
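Java's standard library has the same polymorphic idea built in: `Stream.map` applies a function to every element of a stream. A sketch reproducing the three examples above (the class and method names are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

class MapExamples {
    // map abs [-1,-3,4,-12]
    static List<Integer> absAll(List<Integer> xs) {
        return xs.stream().map(x -> Math.abs(x)).collect(Collectors.toList());
    }

    // map reverse ["abc","cda","1234"]
    static List<String> reverseAll(List<String> xs) {
        return xs.stream().map(s -> new StringBuilder(s).reverse().toString()).collect(Collectors.toList());
    }

    // map (3*) [1,2,3,4]
    static List<Integer> tripleAll(List<Integer> xs) {
        return xs.stream().map(x -> 3 * x).collect(Collectors.toList());
    }
}
```

For example, `MapExamples.absAll(List.of(-1, -3, 4, -12))` returns `[1, 3, 4, 12]`. One generic `map` serves all three element types; its declared signature, `<R> Stream<R> map(Function<? super T, ? extends R> mapper)`, is Java's way of spelling out that polymorphism.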


Type Variables

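Haskell determines whether a type name is an actual type (like Integer) or a parameter type (a type variable) by the case of its first character: names starting with an uppercase letter are actual types, and names starting with a lowercase letter are type variables.

append :: [a] -> [a] -> [a]

• a is a type variable: any type can be in the list.
• Reusing "a" means that both parameters to append (and its result) must share the same element type.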

Java:

public <A> List<A> append(List<A> x, List<A> y)


Type inference

We can infer the type of the following definition using the H-M algorithm:

map f [] = []

map f (x:xs) = f x : map f xs


Type inference

map f [] = []
map f (x:xs) = f x : map f xs

#1: map :: a -> b -> c
map has two arguments (observed from the pattern matching).

#2: map :: a -> [b] -> [c]
The second argument and the returned value must be lists (observing the first clause and the [ ]s).

#3: map :: (d -> e) -> [b] -> [c]
The first argument to map is used as a function.

#4: map :: (b -> e) -> [b] -> [c]
The input to the function is taken from the second argument list.

#5: map :: (b -> c) -> [b] -> [c]
The output of the function is added to the result list.

#6: map :: (b -> c) -> [b] -> [c]
This type is consistent with all other parts of map, so it is the inferred type of map.
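The inferred type (b -> c) -> [b] -> [c] lines up with a Java generic signature in which b and c become type parameters. The sketch below mirrors the two clauses of map over `java.util.List` (the class name `MapSketch` is an illustrative assumption). The difference is that Java makes us write `<B, C>` ourselves, which is exactly the bookkeeping H-M does for Haskell.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class MapSketch {
    // map :: (b -> c) -> [b] -> [c], with the type variables b and c written as type parameters B and C.
    static <B, C> List<C> map(Function<B, C> f, List<B> xs) {
        List<C> result = new ArrayList<>();
        if (xs.isEmpty()) {
            return result;                                     // map f [] = []
        }
        result.add(f.apply(xs.get(0)));                        // f x : ...
        result.addAll(map(f, xs.subList(1, xs.size())));       // ... : map f xs
        return result;
    }
}
```

For example, `MapSketch.map(x -> 3 * x, List.of(1, 2, 3, 4))` returns `[3, 6, 9, 12]`, and `MapSketch.map(String::length, List.of("abc", "de"))` returns `[3, 2]`: B and C are chosen independently at each call.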


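Your turn.

Easy

What is the type of the following function?

inc n = n + 1

Java: public int inc(int n) { return n + 1; }

This is a function that takes an Integer and results in an Integer (the type of "+" determines this):

inc :: Integer -> Integer

(This uses the talk's simplification that "+" works only on Integers; real Haskell infers the more general Num a => a -> a.)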


Your turn.

Medium

What is the type of the following function?

f n = if n==0 then 1 else n*f(n-1)

This is again a function that takes an Integer and results in an Integer.

f :: Integer -> Integer


Type Constraints

f n = if n==0 then 1
      else n*f(n-1)

• n must be an Integer because it is compared with 0.
• The same reasoning applies to the arithmetic in n*f(n-1).

EVERY one of these constraints must be satisfied, or the inference process will yield a type error.


Your turn.

Hard

What is the following function's type?

length [] = 0
length (x:xs) = 1 + length xs

The answer is that the input must be a list, but since we never use the elements of the list, it can contain anything. The result is an Integer.

length :: [a] -> Integer


Polymorphism

Here's the BIG IDEA behind H-M:

When you infer a type and parts of the type are "unconstrained", you can use ANY type for these unconstrained values. Type variables in signatures represent these unconstrained type components.

When the result of type inference contains a type variable, we can reuse the function for any type of argument.

a = length [1,2,3]

b = length [True, False, True]


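Haskell/Java Comparison

Haskell:

length [] = 0
length (x:xs) = 1 + length xs

Java:

import java.util.List;

static <T> int length(List<T> xs) {
    if (xs.isEmpty()) return 0;                   // length [] = 0
    return 1 + length(xs.subList(1, xs.size()));  // length (x:xs) = 1 + length xs
}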


Implementing H-M Typechecking

How do we actually implement H-M typechecking? Logic programming to the rescue!

We represent unknown types with logic variables.

Constraints are represented by unifications. Unlike Prolog, if unification fails you don't backtrack: you immediately quit and complain about the code you're checking.
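A minimal sketch of this idea in Java, assuming a small type language with only Integer, Bool, lists and functions. Unknown types are mutable logic variables (a `Var` whose `binding` is filled in by unification), and a mismatch throws immediately instead of backtracking. All names here (`LogicTypes`, `unify`, `show`, and so on) are illustrative, not part of any library.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LogicTypes {
    // A type is an unknown (a logic variable) or a constructor applied to argument types:
    // "Integer" and "Bool" take none, "[]" takes the element type, "->" takes argument and result.
    abstract static class Type {}
    static final class Var extends Type { Type binding; } // binding == null: still unknown
    static final class Con extends Type {
        final String name;
        final List<Type> args;
        Con(String name, Type... args) { this.name = name; this.args = List.of(args); }
    }
    static final Type INTEGER = new Con("Integer"), BOOL = new Con("Bool");
    static Type list(Type elem) { return new Con("[]", elem); }
    static Type fn(Type from, Type to) { return new Con("->", from, to); }

    // Follow bindings until we reach an unbound variable or a constructor.
    static Type resolve(Type t) {
        while (t instanceof Var && ((Var) t).binding != null) t = ((Var) t).binding;
        return t;
    }

    // Occurs check: binding v to a type that contains v would create an infinite type.
    static boolean occurs(Var v, Type t) {
        t = resolve(t);
        return t == v || (t instanceof Con && ((Con) t).args.stream().anyMatch(arg -> occurs(v, arg)));
    }

    // No backtracking: a failed unification is reported at once as a type error.
    static void unify(Type a, Type b) {
        a = resolve(a);
        b = resolve(b);
        if (a == b) return;
        if (b instanceof Var) { Type tmp = a; a = b; b = tmp; } // put a variable, if any, on the left
        if (a instanceof Var) {
            if (occurs((Var) a, b)) throw new IllegalStateException("type error: infinite type");
            ((Var) a).binding = b;
            return;
        }
        Con x = (Con) a, y = (Con) b;
        if (!x.name.equals(y.name) || x.args.size() != y.args.size())
            throw new IllegalStateException("type error: " + show(x) + " vs " + show(y));
        for (int i = 0; i < x.args.size(); i++) unify(x.args.get(i), y.args.get(i));
    }

    // Haskell-style printing; unbound variables are named a, b, c, ... in order of appearance.
    static String show(Type t) { return show(t, new HashMap<>()); }
    private static String show(Type t, Map<Var, String> names) {
        t = resolve(t);
        if (t instanceof Var) return names.computeIfAbsent((Var) t, v -> String.valueOf((char) ('a' + names.size())));
        Con c = (Con) t;
        if (c.name.equals("[]")) return "[" + show(c.args.get(0), names) + "]";
        if (!c.name.equals("->")) return c.name;
        Type from = resolve(c.args.get(0));
        boolean parens = from instanceof Con && ((Con) from).name.equals("->");
        String left = show(from, names);
        return (parens ? "(" + left + ")" : left) + " -> " + show(c.args.get(1), names);
    }
}
```

Unifying `a -> Integer` with `[b] -> c` binds a to [b] and c to Integer, so `show(fn(a, c))` prints `[a] -> Integer` (the remaining unbound variable is renamed a when printed), while `unify(INTEGER, BOOL)` throws. The occurs check also rejects equations such as a = [a], which have no finite solution.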


An H-M Typechecker

To typecheck a function:
a) Assign a new type variable (logic variable) to each name being defined.
b) Every occurrence of a known function or variable requires a fresh copy (with new logic variables) of the type of that function.
c) Every syntactic construct adds type constraints (unifications).
d) When done, any logic variable that is still unbound turns into a type variable.


Syntax Type Constraints

• Function call f x: the type of f must be a -> b, the type of x must be a, and the resulting type is b.
• If-then-else: the type of the test must be Bool, and the then clause and else clause must have the same type.
• List construction x:y: the type of x is a, the type of y must be [a], and the result has type [a].
• Operators like + or - constrain all types to Integer.
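The four rules above can be written as one method each on top of a unifier. This sketch repeats a small unifier so that it stands alone, and adds an equality rule for `==` as an assumption (the slide does not list it; here both sides must share one type and the result is Bool). It then applies the rules to the factorial function from the Type Constraints slide. All names are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SyntaxRules {
    // --- Type terms and a fail-fast unifier over logic variables ---
    abstract static class Type {}
    static final class Var extends Type { Type binding; } // binding == null: still unknown
    static final class Con extends Type {
        final String name;
        final List<Type> args;
        Con(String name, Type... args) { this.name = name; this.args = List.of(args); }
    }
    static final Type INTEGER = new Con("Integer"), BOOL = new Con("Bool");
    static Type list(Type elem) { return new Con("[]", elem); }
    static Type fn(Type from, Type to) { return new Con("->", from, to); }
    static Type resolve(Type t) {
        while (t instanceof Var && ((Var) t).binding != null) t = ((Var) t).binding;
        return t;
    }
    static boolean occurs(Var v, Type t) {
        t = resolve(t);
        return t == v || (t instanceof Con && ((Con) t).args.stream().anyMatch(arg -> occurs(v, arg)));
    }
    static void unify(Type a, Type b) {
        a = resolve(a);
        b = resolve(b);
        if (a == b) return;
        if (b instanceof Var) { Type tmp = a; a = b; b = tmp; }
        if (a instanceof Var) {
            if (occurs((Var) a, b)) throw new IllegalStateException("type error: infinite type");
            ((Var) a).binding = b;
            return;
        }
        Con x = (Con) a, y = (Con) b;
        if (!x.name.equals(y.name) || x.args.size() != y.args.size())
            throw new IllegalStateException("type error: " + show(x) + " vs " + show(y));
        for (int i = 0; i < x.args.size(); i++) unify(x.args.get(i), y.args.get(i));
    }
    static String show(Type t) { return show(t, new HashMap<>()); }
    private static String show(Type t, Map<Var, String> names) {
        t = resolve(t);
        if (t instanceof Var) return names.computeIfAbsent((Var) t, v -> String.valueOf((char) ('a' + names.size())));
        Con c = (Con) t;
        if (c.name.equals("[]")) return "[" + show(c.args.get(0), names) + "]";
        if (!c.name.equals("->")) return c.name;
        Type from = resolve(c.args.get(0));
        boolean parens = from instanceof Con && ((Con) from).name.equals("->");
        String left = show(from, names);
        return (parens ? "(" + left + ")" : left) + " -> " + show(c.args.get(1), names);
    }

    // --- One method per syntax rule; each returns the type of the construct ---
    static Type call(Type f, Type x) {          // f x: f must be a -> b, x must be a, result b
        Var result = new Var();
        unify(f, fn(x, result));
        return result;
    }
    static Type ifThenElse(Type c, Type t, Type e) { // test is Bool; both branches share one type
        unify(c, BOOL);
        unify(t, e);
        return t;
    }
    static Type cons(Type x, Type y) {          // x:y : x is a, y must be [a], result [a]
        unify(y, list(x));
        return y;
    }
    static Type arith(Type l, Type r) {         // +, -, *: everything is Integer
        unify(l, INTEGER);
        unify(r, INTEGER);
        return INTEGER;
    }
    static Type equality(Type l, Type r) {      // == (assumed rule): same type on both sides, Bool result
        unify(l, r);
        return BOOL;
    }

    // f n = if n==0 then 1 else n*f(n-1)
    static String inferFactorial() {
        Var f = new Var(), n = new Var();              // fresh logic variables for f and n
        Type body = ifThenElse(
                equality(n, INTEGER),                  // n == 0
                INTEGER,                               // then 1
                arith(n, call(f, arith(n, INTEGER)))); // else n * f(n-1); the recursive f is not copied
        unify(f, fn(n, body));                         // f : (type of n) -> (type of body)
        return show(f);
    }
}
```

`SyntaxRules.inferFactorial()` returns `Integer -> Integer`, matching the answer above, and `ifThenElse(BOOL, INTEGER, BOOL)` throws because the two branches cannot be unified.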


H-M In Action

length [ ] = 0
length (first:rest) = 1 + length rest

#1: Assign types to all names in the code: length : α0, first : α1, rest : α2. The literals 0 and 1 have type Integer, and + has type Integer -> Integer -> Integer.

#2: Function definition of length: α0 = α3 -> α4. Unification: all occurrences of α0 become α3 -> α4.

#3: [ ] argument to length: α3 = [α5]. The empty list is polymorphic, so we manufacture a new logic variable (α5) for its element type.

#4: Right-hand side of the first definition: α4 = Integer.

#5: Type of first:rest: α2 = [α6] and α1 = α6. Also α3 = [α6], because first:rest is the first argument to length.

#6: The "+" adds the following constraints:
  Integer = Integer (1st argument, the literal 1)
  Integer = α4 (2nd argument, the result of length rest)
  Integer = α4 (the result of "+" is the result of length)

Collected constraints:
  α0 = α3 -> α4
  α3 = [α5]
  α4 = Integer
  α2 = [α6]
  α1 = α6
  α3 = [α6]
  Integer = α4

In the end, α0 contains the final type: length :: [α6] -> Integer (α5 and α6 are unified, since α3 = [α5] and α3 = [α6]). This is generalized as length :: [a] -> Integer (or ∀a. [a] -> Integer).
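The walkthrough can be replayed mechanically: feed each constraint from the slides to a unifier and print α0 at the end. This sketch repeats a small illustrative unifier so that it stands alone, and adds one constraint the slides leave implicit, namely that the recursive call length rest uses the same α0. Class and method names are assumptions, not from the talk.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LengthConstraints {
    // A small type language (logic variables plus constructors) and a fail-fast unifier.
    abstract static class Type {}
    static final class Var extends Type { Type binding; } // binding == null: still unknown
    static final class Con extends Type {
        final String name;
        final List<Type> args;
        Con(String name, Type... args) { this.name = name; this.args = List.of(args); }
    }
    static final Type INTEGER = new Con("Integer");
    static Type list(Type elem) { return new Con("[]", elem); }
    static Type fn(Type from, Type to) { return new Con("->", from, to); }
    static Type resolve(Type t) {
        while (t instanceof Var && ((Var) t).binding != null) t = ((Var) t).binding;
        return t;
    }
    static boolean occurs(Var v, Type t) {
        t = resolve(t);
        return t == v || (t instanceof Con && ((Con) t).args.stream().anyMatch(arg -> occurs(v, arg)));
    }
    static void unify(Type a, Type b) {
        a = resolve(a);
        b = resolve(b);
        if (a == b) return;
        if (b instanceof Var) { Type tmp = a; a = b; b = tmp; }
        if (a instanceof Var) {
            if (occurs((Var) a, b)) throw new IllegalStateException("type error: infinite type");
            ((Var) a).binding = b;
            return;
        }
        Con x = (Con) a, y = (Con) b;
        if (!x.name.equals(y.name) || x.args.size() != y.args.size())
            throw new IllegalStateException("type error: " + show(x) + " vs " + show(y));
        for (int i = 0; i < x.args.size(); i++) unify(x.args.get(i), y.args.get(i));
    }
    static String show(Type t) { return show(t, new HashMap<>()); }
    private static String show(Type t, Map<Var, String> names) {
        t = resolve(t);
        if (t instanceof Var) return names.computeIfAbsent((Var) t, v -> String.valueOf((char) ('a' + names.size())));
        Con c = (Con) t;
        if (c.name.equals("[]")) return "[" + show(c.args.get(0), names) + "]";
        if (!c.name.equals("->")) return c.name;
        Type from = resolve(c.args.get(0));
        boolean parens = from instanceof Con && ((Con) from).name.equals("->");
        String left = show(from, names);
        return (parens ? "(" + left + ")" : left) + " -> " + show(c.args.get(1), names);
    }

    // length [ ] = 0
    // length (first:rest) = 1 + length rest
    static String inferLength() {
        Var a0 = new Var(), a1 = new Var(), a2 = new Var(); // length, first, rest
        Var a3 = new Var(), a4 = new Var();                 // length's argument and result
        Var a5 = new Var(), a6 = new Var();                 // element types manufactured on the way
        unify(a0, fn(a3, a4));    // definition of length
        unify(a3, list(a5));      // [ ] argument: the empty list gets a new element variable
        unify(a4, INTEGER);       // right-hand side 0
        unify(a2, list(a6));      // first:rest: rest : [a6] ...
        unify(a1, a6);            // ... and first : a6
        unify(a3, list(a6));      // first:rest is the first argument of length
        unify(a0, fn(a2, a4));    // the recursive call length rest (implicit on the slides)
        unify(INTEGER, INTEGER);  // + : 1st argument is the literal 1
        unify(INTEGER, a4);       // + : 2nd argument is the result of length rest
        unify(INTEGER, a4);       // + : its result is the result of length
        return show(a0);
    }
}
```

`LengthConstraints.inferLength()` returns `[a] -> Integer`: α5 and α6 end up as one unbound variable, which the printer names a, mirroring the generalization step.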


Inside H-M

The order in which constraints are added to the type environment doesn't matter – all the algorithm has to do is ensure that every constraint is accounted for.

Note that inside "length" every occurrence of length is represented by the same type variable. After generalization, each use of length elsewhere gets a fresh copy of its type, with new type variables. This is the key insight of the H-M algorithm.
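The "fresh copy per use" idea can be sketched as an `instantiate` function that copies a type, replacing each unbound variable with a new one. The small unifier is repeated so the block stands alone, and length's generalized type [a] -> Integer is simply written down rather than inferred. All names (`FreshInstances`, `instantiate`, `typeOfCall`, `monomorphicUseFails`) are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FreshInstances {
    // A small type language (logic variables plus constructors) and a fail-fast unifier.
    abstract static class Type {}
    static final class Var extends Type { Type binding; } // binding == null: still unknown
    static final class Con extends Type {
        final String name;
        final List<Type> args;
        Con(String name, Type... args) { this.name = name; this.args = List.of(args); }
    }
    static final Type INTEGER = new Con("Integer"), BOOL = new Con("Bool");
    static Type list(Type elem) { return new Con("[]", elem); }
    static Type fn(Type from, Type to) { return new Con("->", from, to); }
    static Type resolve(Type t) {
        while (t instanceof Var && ((Var) t).binding != null) t = ((Var) t).binding;
        return t;
    }
    static boolean occurs(Var v, Type t) {
        t = resolve(t);
        return t == v || (t instanceof Con && ((Con) t).args.stream().anyMatch(arg -> occurs(v, arg)));
    }
    static void unify(Type a, Type b) {
        a = resolve(a);
        b = resolve(b);
        if (a == b) return;
        if (b instanceof Var) { Type tmp = a; a = b; b = tmp; }
        if (a instanceof Var) {
            if (occurs((Var) a, b)) throw new IllegalStateException("type error: infinite type");
            ((Var) a).binding = b;
            return;
        }
        Con x = (Con) a, y = (Con) b;
        if (!x.name.equals(y.name) || x.args.size() != y.args.size())
            throw new IllegalStateException("type error: " + show(x) + " vs " + show(y));
        for (int i = 0; i < x.args.size(); i++) unify(x.args.get(i), y.args.get(i));
    }
    static String show(Type t) { return show(t, new HashMap<>()); }
    private static String show(Type t, Map<Var, String> names) {
        t = resolve(t);
        if (t instanceof Var) return names.computeIfAbsent((Var) t, v -> String.valueOf((char) ('a' + names.size())));
        Con c = (Con) t;
        if (c.name.equals("[]")) return "[" + show(c.args.get(0), names) + "]";
        if (!c.name.equals("->")) return c.name;
        Type from = resolve(c.args.get(0));
        boolean parens = from instanceof Con && ((Con) from).name.equals("->");
        String left = show(from, names);
        return (parens ? "(" + left + ")" : left) + " -> " + show(c.args.get(1), names);
    }

    // Copy t, replacing each unbound variable by a fresh one (the same old variable maps to the same new one).
    static Type instantiate(Type t, Map<Var, Var> fresh) {
        t = resolve(t);
        if (t instanceof Var) return fresh.computeIfAbsent((Var) t, old -> new Var());
        Con c = (Con) t;
        Type[] args = new Type[c.args.size()];
        for (int i = 0; i < args.length; i++) args[i] = instantiate(c.args.get(i), fresh);
        return new Con(c.name, args);
    }

    // length's generalized type [a] -> Integer; ELEM stays unbound and plays the type variable a.
    static final Var ELEM = new Var();
    static final Type LENGTH = fn(list(ELEM), INTEGER);

    // Type of "length arg" when this use gets its own fresh copy of length's type.
    static String typeOfCall(Type scheme, Type argument) {
        Var result = new Var();
        unify(instantiate(scheme, new HashMap<>()), fn(argument, result));
        return show(result);
    }

    // Without generalization, both calls share one element variable, so the second call fails.
    static boolean monomorphicUseFails() {
        Var elem = new Var();
        Type monoLength = fn(list(elem), INTEGER);
        unify(monoLength, fn(list(INTEGER), new Var()));  // length [1,2,3] binds elem to Integer
        try {
            unify(monoLength, fn(list(BOOL), new Var())); // length [True, False, True] now clashes
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

With fresh copies, `typeOfCall(LENGTH, list(INTEGER))` and `typeOfCall(LENGTH, list(BOOL))` both succeed with type Integer, just like a = length [1,2,3] and b = length [True, False, True], and `LENGTH` itself still prints as `[a] -> Integer`. `monomorphicUseFails()` returns true: once the first call binds the shared variable to Integer, the second call is a type error.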


Why?

• Polymorphic functions are extremely useful, but writing out their types is tedious. H-M can check type consistency (Java) or infer types from scratch (Haskell).
• Q: Why don't all languages have generics?
  – A: As a matter of fact, many languages are adding generics to their suite of tools. Even Visual Basic now has them!
• Q: Why aren't there implicit types in Java?
  – A: H-M doesn't "play nice" with the object-oriented part of the Java type system.
• Q: What good is all of this for a Java programmer?
  – A: Generics avoid casting to Object, something that can be unsafe in Java.


Conclusions

• As more languages adopt generic typing, the H-M algorithm has become important to understanding how languages work.
• H-M is simple (sort of!).
• Many extensions to H-M have been developed to allow more precise typing (catching more errors at compile time).
• Type checking can address security, safety (null pointers, array-out-of-bounds errors), and many other important program properties.
• Types give strong guarantees of program correctness.
• Everyone should learn more math (logic!).
