Applications of Type Constraints in Software Engineering Tools

IBM Research

© 2004 IBM Corporation


Frank Tip, IBM T.J. Watson Research Center


This Presentation is Based on Joint Work With

Ittai Balaban (New York University)

Dirk Bäumer (IBM Zurich Research Center)

Bjorn De Sutter (Ghent University)

Julian Dolby (IBM T.J. Watson Research Center)

Robert Fuhrer (IBM T.J. Watson Research Center)

Adam Kieżun (MIT)



IBM Research

about 3000 people world-wide

– 1600 at IBM T.J. Watson Research Center

– other sites: Almaden, Austin, Zurich, Haifa, China, India

Software Technology Department

– about 70 people, director Daniel Yellin

– projects on: compiler optimization (JikesRVM), aspects, performance analysis, web services, refactoring, verification, XML, ...

– www.research.ibm.com/compsci/plansoft/index.html

ARTIST project (Advanced Refactoring Tools for Improving Software archiTecture)

– Robert Fuhrer, Mandana Vaziri, Tim Klinger, Adam Kieżun (intern), Frank Tip (project leader)

– collaboration with Eclipse JDT team at IBM Zurich

– collaboration with IBM Rational

– academic collaborations with Bjorn De Sutter (Ghent University), Ittai Balaban (NYU)



Other Research Activities

change impact analysis

– given an old and a new version of a program, and a test that fails in the new version, find the subset of the source code changes responsible for the failure

– with Barbara Ryder and Xiaoxia Ren (Rutgers) and Julian Dolby (IBM), Max Stoerzer (University of Passau)

– papers: PASTE’01, OOPSLA’04

Jax: an application extractor for Java

– apply static analysis techniques to eliminate redundant functionality from Java applications, and apply size-reducing transformations

– with Peter Sweeney, Chris Laffra, Aldo Eisma, David Streeter

– transferred to IBM product (WebSphere Studio Device Developer)

– papers: CACM’03, TOPLAS’02, OOPSLA’00, FSE’00, OOPSLA’99



Outline

background

type constraints for Java programs

– notation and terminology

– constraint generation rules

applications

– generalization-related refactorings (OOPSLA’03)

– customization of library classes (ECOOP’04)

– refactorings for introducing generics (work in progress)

related work

conclusions and future work



Scope of our Research

start with a type-correct Java program P

for a given transformation that transforms P into P’

– we would like to check/guarantee that P’ is type-correct

– we would like to check/guarantee that P’ has the same behavior as P

– (in some cases) compute “maximal” P’ for which the above properties hold

we use type constraints to establish these properties

– formalism for expressing relationships between program expressions that must hold in order for a program to be type-correct

– traditionally used for type checking and type inference

transformations under consideration

– refactorings: well-known maintenance operations, usually aimed at making code more flexible/general; proposed by the programmer

– driven by static/dynamic analysis in link-time optimizer



Refactoring

refactoring: the application of behavior-preserving transformations to a program in order to improve a program’s design

– eliminating undesirable program characteristics

– e.g., duplicated code, classes/methods that are too large,...

– making existing classes/methods usable in new contexts

– preparing for extensions

– breaking up monolithic systems into components

– introduction of design patterns

refactoring (noun): a specific program transformation. Usually identified by:

– name (e.g., “Extract Method”, “Pull Up Members”, ...)

– preconditions

– a specific set of transformations to be performed by a programmer or by an automated tool



Refactoring

pioneered by Griswold [1991], Opdyke [1992] & Johnson, leading to Smalltalk Refactoring Browser [Roberts 1992]

recently popularized by continuous-refinement methodologies such as “Extreme Programming” [Beck 2000]

catalogues of common refactorings: [Fowler 1999], [Kerievsky 2003]

Fowler describes refactorings as a series of steps to be performed by the programmer

– manual refactoring is very error-prone

– renewed interest in automated refactoring support in IDEs

– refactoring support featured in Eclipse, IntelliJ IDEA, OmniCore, ...



Categories of Refactorings (see Fowler’s book)

making method calls simpler

– Rename Method, Add/Remove Parameter, ...

composing methods

– Extract Method, Inline Method, Inline Local, ...

moving features between objects

– Move Method, Move Field, Extract Class, ...

organizing data

– Self-Encapsulate Field, Replace Data Value with Object, ...

simplifying/eliminating conditionals

– Replace Conditional with Polymorphism, ...

dealing with generalization

– Extract Interface, Pull Up Members, ...



Eclipse (www.eclipse.org)

open-source (CPL) development environment

– implemented in Java, XML

– basis for commercial offerings by IBM (WSAD, WSDD) and others

plugin-architecture

– plugins contribute views/perspectives

– plugins provide extension points

state-of-the-art development environment for Java

– quick-fixes, refactoring, type hierarchy view, call hierarchy, search facilities

– support for other languages (C, Smalltalk, AspectJ)

various IBM programs focused on Eclipse

– Eclipse Innovation Grants for academics (2002, 2003)

– Eclipse Technology Exchange meetings (ICSE, OOPSLA, ECOOP)

solid basis for research/education projects

– Penumbra, Gild, Hipikat, ECESIS, ...

– Continuous Testing, Java Traits, Ownership Types, ...



Demo: Eclipse Refactorings



Type Constraints

formalism developed in 1990s

– captures relationships between types of program constructs

original purpose: type checking/inference

– prove that certain kinds of errors cannot occur at run-time

– e.g., no “message not understood” errors

we use a variation on the formalism from a book by Palsberg & Schwartzbach

– adapted/extended to capture the semantics of Java



Type Constraints Notation

[E]          the type of expression E

[M]          the declared return type of method M

[F]          the declared type of field F

Decl(M)      the type that contains method M

Param(M,i)   the i-th parameter of method M

≤, <         subtype relation (< denotes proper subtype)



Syntax of Type Constraints

[E] = [E’]      the type of expression E must be the same as the type of expression E’

[E] < [E’]      the type of expression E is a proper subtype of the type of expression E’

[E] ≤ [E’]      either [E] = [E’] or [E] < [E’]

[E] ≜ T         the type of expression E is defined to be T

[E] ≤ [E1] or ... or [E] ≤ [Ek]      disjunction: at least one of the subconstraints [E] ≤ [E1], ..., [E] ≤ [Ek] must hold



Generating Type Constraints

declaration C v                               [v] ≜ C

assignment E1 = E2                            [E2] ≤ [E1]

access E.f to field F                         [E.f] ≜ [F]
                                              [E] ≤ Decl(F)

return E in method M                          [E] ≤ [M]

method M in class C                           Decl(M) ≜ C

this in method M                              [this] ≜ Decl(M)

direct call E.m(E1,...,En) to method M        [E.m(E1,...,En)] ≜ [M]
                                              [Ei] ≤ [Param(M,i)]
                                              [E] ≤ Decl(M)
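As a rough illustration of how these rules apply, the small class below is annotated with the constraints each construct would generate. The class Cell and its members are hypothetical and not part of the talk's examples; because no method is overridden, the calls are annotated using the direct-call rule.

```java
// Hypothetical example class. The comments list the type constraints
// that the generation rules would produce for each construct.
class Cell {
    Object contents;                 // declaration:   [contents] ≜ Object

    Object get() {                   // method:        Decl(Cell.get()) ≜ Cell
        return this.contents;        // this:          [this] ≜ Decl(Cell.get())
                                     // field access:  [this.contents] ≜ [contents]
                                     //                [this] ≤ Decl(contents)
                                     // return:        [this.contents] ≤ [Cell.get()]
    }

    void set(Object o) {             // declaration:   [o] ≜ Object
        this.contents = o;           // assignment:    [o] ≤ [this.contents]
    }

    static Object roundTrip(Object value) {
        Cell c = new Cell();         // declaration:   [c] ≜ Cell
                                     // assignment:    [new Cell()] ≤ [c]
        c.set(value);                // call:          [value] ≤ [Param(Cell.set(), 1)]
                                     //                [c] ≤ Decl(Cell.set())
        return c.get();              // call:          [c.get()] ≜ [Cell.get()]
                                     //                [c] ≤ Decl(Cell.get())
                                     // return:        [c.get()] ≤ [Cell.roundTrip()]
    }
}
```

Changing a declared type (say, the parameter type of set) is allowed only if every listed constraint still holds afterwards.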



Virtual Method Calls

for a call E.m(E1,...,En) to a virtual method M

[E.m(E1,...,En)] ≜ [M]

[Ei] ≤ [Param(M,i)]

[E] ≤ Decl(M1) or ... or [E] ≤ Decl(Mk)   where RootDefs(M) = { M1,...,Mk }

RootDefs(M) = { M’ | M overrides M’, and there exists no M’’ (M’’ ≠ M’) such that M’ overrides M’’ }
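The RootDefs definition can be sketched with Java reflection. This is only an illustration under stated assumptions: the class RootDefs and its helper methods are hypothetical names, "overrides" is treated as reflexive (a method counts as overriding itself), and signatures are matched by name and erased parameter types only, ignoring visibility and static methods.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

class RootDefs {
    /** All supertypes of c (classes and interfaces), including c itself. */
    static Set<Class<?>> superTypes(Class<?> c) {
        Set<Class<?>> result = new LinkedHashSet<>();
        Deque<Class<?>> work = new ArrayDeque<>();
        work.push(c);
        while (!work.isEmpty()) {
            Class<?> t = work.pop();
            if (!result.add(t)) continue;                 // already visited
            if (t.getSuperclass() != null) work.push(t.getSuperclass());
            for (Class<?> i : t.getInterfaces()) work.push(i);
        }
        return result;
    }

    /** True if type t itself declares a method with this name and erased parameter types. */
    static boolean declares(Class<?> t, String name, Class<?>... params) {
        try {
            t.getDeclaredMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Names of the types that declare a root definition of the given method. */
    static Set<String> rootDefs(Class<?> decl, String name, Class<?>... params) {
        Set<String> roots = new TreeSet<>();
        for (Class<?> t : superTypes(decl)) {
            if (!declares(t, name, params)) continue;
            boolean overridesHigher = false;
            for (Class<?> u : superTypes(t)) {
                if (u != t && declares(u, name, params)) overridesHigher = true;
            }
            if (!overridesHigher) roots.add(t.getName());
        }
        return roots;
    }
}
```

For java.util.Hashtable and put(Object, Object), the sketch reports java.util.Dictionary and java.util.Map as the declaring types of the root definitions.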



Constraints for Virtual Method Calls

public void foo(String s1, String s2) {
  Hashtable h = new Hashtable();
  h.put(s1, s2);
}

[h] ≤ Decl(Map.put(...)) or [h] ≤ Decl(Dictionary.put(...))

[h] ≤ Map or [h] ≤ Dictionary

[figure: class hierarchy in which Hashtable extends Dictionary and implements Map; put() is declared in all three types]



Constraints for Overriding & Hiding

if method M’ overrides method M, M’ ≠ M

[Param(M’,i)] = [Param(M,i)]

[M’] = [M]

Decl(M’) < Decl(M)

if field F’ hides field F

Decl(F’) < Decl(F)



Casts

for a cast (C)E

[(C)E] ≜ C

[E] ≤ [(C)E] or [(C)E] ≤ [E]   if C is a class and [E] is a class

the latter constraint need not be generated if C or [E] is an interface

these constraints only capture the requirements for type-correctness (not necessarily program behavior)

it is possible to avoid generating disjunctions by preserving the “directionality” of the cast



Refactoring for Generalization

several refactorings are concerned with generalization

– moving methods/fields to superclasses and subclasses

– splitting & merging of classes

– manipulating the types of declarations

Chapter 11 of Fowler’s book mentions:

– Extract Interface

– Pull Up Member(s)

– Push Down Member(s)

– Extract Subclass

– Generalize Type



Extract Interface – Recipe

select class C

select subset M of C’s methods

create interface I containing declarations of the methods in M

add inheritance “C implements I”

“Adjust client type declarations to use the interface” [Fowler, p.342]



Extract Interface: An Example

List class with methods as follows:

– add(Comparable) add an element

– addAll(List) add contents of another List

– iterator() iteration support

– sort() sorts the list

ListIterator class

– implements java.util.Iterator; methods hasNext(), next()

Client class

– create List; add some elements

– add contents of another List; sort the List

– print contents of the List

extract an interface Bag from List

– declares add(Comparable), addAll(List), iterator()

List/Bag Example (1)

interface Bag {
  public Iterator iterator();
  public List add(Comparable e);
  public List addAll(List v0);
}

class List implements Bag {
  int size = 0;
  Comparable[] elems = new Comparable[10];
  public Iterator iterator(){ return new ListIterator(this); }
  public List add(Comparable e) {
    if (this.size + 1 == this.elems.length) {
      Comparable[] newElems = new Comparable[2 * this.size];
      System.arraycopy(this.elems, 0, newElems, 0, this.size);
      this.elems = newElems;
    }
    this.elems[this.size++] = e;
    return this;
  }
  public List addAll(List v1) {
    java.util.Iterator i = v1.iterator();
    for (; i.hasNext(); this.add((Comparable)i.next()));
    return this;
  }
  public void sort() { /* insertion sort */ }
}

List/Bag Example (2)

class ListIterator implements java.util.Iterator {
  private int count = 0;
  private List v2;
  ListIterator(List v3){ v2 = v3; }
  public boolean hasNext(){ return this.count < this.v2.size; }
  public Object next(){ return this.v2.elems[this.count++]; }
}

public class Client {
  public static void main(String[] args) {
    List v4 = createList();
    populate(v4);
    update(v4);
    sortList(v4);
    print(v4);
  }
  static List createList(){ return new List(); }
  static void populate(List v5){ v5.add("foo").add("bar"); }
  static void update(List v6) {
    List v7 = new List().add("zap").add("baz");
    v6.addAll(v7);
  }
  static void sortList(List v8){ v8.sort(); }
  static void print(List v9) {
    for (Iterator iter = v9.iterator(); iter.hasNext();)
      System.out.println("Object: " + iter.next());
  }
}



Problem Statement

identify all declarations that can be updated to make use of the newly extracted interface

want to be able to reason about:

– correctness of the solution

– maximality of the solution



Using Type Constraints

declared types of variables, fields, parameters constrained by:

– field access, method calls

– assignments, parameter-passing

several other invariants must be maintained to preserve type-correctness & program behavior

Observation: all these constraints can be stated succinctly and uniformly using type constraints



List.add(), Bag.add()            [Bag.add()] = [List.add()]

List.addAll(), Bag.addAll()      [v0] = [v1], [Bag.addAll()] = [List.addAll()]

List.iterator()                  List ≤ [v3]

List.add()                       List ≤ [List.add()]

List.addAll()                    [v1] ≤ Bag, List ≤ [List.addAll()]

ListIterator.ListIterator()      [v3] ≤ [v2]

ListIterator.hasNext()           [v2] ≤ List

ListIterator.next()              [v2] ≤ List

Client.main()                    [Client.createList()] ≤ [v4], [v4] ≤ [v5], [v4] ≤ [v6], [v4] ≤ [v8], [v4] ≤ [v9]

Client.createList()              List ≤ [Client.createList()]

Client.populate()                [v5] ≤ Bag, [List.add()] ≤ Bag

Client.update()                  [List.add()] ≤ [v7], [List.add()] ≤ Bag, [v6] ≤ Bag, [v7] ≤ [v1]

Client.sortList()                [v8] ≤ List

Client.print()                   [v9] ≤ Bag



Observation

the constraints for the original program contain all the information we need

some declarations cannot be updated

List ≤ [v3] ≤ [v2] ≤ List

[v4] ≤ [v8] ≤ List

other variables are less constrained

[v1] ≤ Bag



Algorithm for Determining “Updatable” Declarations

iterative algorithm for determining non-updatable declarations

– first determine declarations that cannot be updated because of member access (e.g., [v2] ≤ List, [v8] ≤ List)

– if x is non-updatable, and there is a type constraint

[y] ≤ [x], [y] = [x], or [y] < [x]

then y is non-updatable

iterate until fixed-point is reached
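The iteration can be sketched as a small worklist-free fixed-point loop. This is a minimal sketch, not the tool's implementation: the class and method names are hypothetical, each constraint [y] ≤ [x] is encoded as a pair {y, x}, an equality is encoded as two pairs, and only constraints between declaration elements of the List/Bag example are listed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class UpdatableDeclarations {
    /**
     * seeds: declarations that are non-updatable because of member access.
     * constraints: pairs {y, x} meaning [y] ≤ [x].
     * If x is non-updatable, y becomes non-updatable; repeat until nothing changes.
     */
    static Set<String> nonUpdatable(Set<String> seeds, String[][] constraints) {
        Set<String> result = new TreeSet<>(seeds);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] c : constraints) {
                if (result.contains(c[1]) && result.add(c[0])) {
                    changed = true;
                }
            }
        }
        return result;
    }

    /** Constraints between declaration elements of the List/Bag example. */
    static Set<String> listBagExample() {
        Set<String> seeds = new HashSet<>(Arrays.asList("v2", "v8"));
        String[][] constraints = {
            {"v3", "v2"},
            {"Client.createList()", "v4"},
            {"v4", "v5"}, {"v4", "v6"}, {"v4", "v8"}, {"v4", "v9"},
            {"List.add()", "v7"}, {"v7", "v1"},
            {"v0", "v1"}, {"v1", "v0"}            // [v0] = [v1]
        };
        return nonUpdatable(seeds, constraints);
    }
}
```

Starting from the seeds v2 and v8, the loop adds v3, v4 and Client.createList(); all other declarations remain updatable.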



Non-Updatable Declarations for the Example Program

{ v2, v3, v4, v8, Client.createList() }

(consistent with earlier result)



Justification (Details in Paper)

type-correctness

– updating the “updatable” declaration elements results in a program that satisfies all type constraints

preservation of behavior

– argument based on the fact that method dispatch, cast/instanceof behavior do not depend on declared types

maximality

– updating any non-updatable declarations will result in the violation of type constraints



Another Refactoring: Pull Up Members

class A { ... }

class B extends A {
  public B foo(){ return this; }
}

[this] ≜ Decl(B.foo())

Decl(B.foo()) ≜ B

[B.foo()] ≜ B

[this] ≤ [B.foo()]   (holds: B ≤ B)



Pull Up Members (2)

class A {
  public B foo(){ return this; }
}

class B extends A { ... }

[this] ≜ Decl(A.foo())

Decl(A.foo()) ≜ A

[A.foo()] ≜ B

[this] ≤ [A.foo()]   (violated: A ≤ B does not hold)



Other Refactorings

Generalize Type

– update the type of a declaration E

– use type constraints to determine allowable supertypes/subtypes

– may enable Pull Up Members in certain cases

Extract Subclass

– splitting of a class

– can be treated similarly as Extract Interface

Push Down Members

– the “inverse” of Pull Up Members

– similar issues



Perspective

infer from original program a system of ordering constraints between types of declaration elements

– original program is just one possible solution

Extract Interface

– declarations: variables

– locations of members: constants

Pull Up Members

– declarations: constants

– locations of members: variables

Generalize Type

– selected declaration: variable

– all other declarations & locations of members: constants



Demo: Extract Interface & Generalize Type



Class Libraries

class libraries improve programmer productivity

– programmers don’t have to waste time developing & debugging standard infrastructure

but... class libraries are often implemented with some typical/average usage pattern in mind

for example: container class implementations assume that:

– elements are accessed often & frequently

– a large number of elements is stored

performance loss if the actual usage of a library class differs from this typical usage pattern

– “MyHashTable”, “SmartHashtable”, ... in various benchmarks



Our Approach

derive custom versions from library classes

rewrite application to use these custom versions

ship custom library classes with application

technical foundations:

– use type constraints to determine where custom classes can be used

– use profile information to determine where introducing custom classes is profitable

– use static analysis and profile information to decide how to customize



Example Program

class Example {
  void foo(Map m){
    Hashtable r1 = new Hashtable();
    JTree tree = new JTree(r1);
    Hashtable r2 = new Hashtable();
    Hashtable r3 = new Hashtable();
    r2.put("FOO","BAR");
    bar(r3);
    r2 = r3;
    r2.putAll(m);
    bar("HELLO");
  }
  void bar(Object o){
    Hashtable r4 = (Hashtable) o;
    if (r4.contains("FOO")) {…}
  }
}

the same program with abbreviated type names:

class Example {
  void foo(M m){
    H r1 = new H();
    JTree tree = new JTree(r1);
    H r2 = new H();
    H r3 = new H();
    r2.put("FOO","BAR");
    bar(r3);
    r2 = r3;
    r2.putAll(m);
    bar("HELLO");
  }
  void bar(O o){
    H r4 = (H) o;
    if (r4.contains("FOO")) {…}
  }
}

[figure: class hierarchy; abbreviations: M = Map, H = Hashtable, O = Object, D = Dictionary, S = String]

How to customize?

• update allocations of library types

• update declarations

class Example {
  void foo(M m){
    H r1 = new H();
    JTree tree = new JTree(r1);
    AH r2 = new H1();   // was: H r2 = new H();
    AH r3 = new H2();   // was: H r3 = new H();
    r2.put("FOO","BAR");
    bar(r3);
    r2 = r3;
    r2.putAll(m);
    bar("HELLO");
  }
  void bar(O o){
    H r4 = (H) o;
    if (r4.contains("FOO")) {…}
  }
}

declaring r2 as H1 and r3 as H2 would break the assignment r2 = r3, so both declarations use the common supertype AH

[figure: class hierarchy with M, D, H, O, S, extended with the custom classes H1 and H2 and their common supertype AH]


Restrictions?

• type correctness

• interface compatibility

• preserve behavior of cast and instanceof operations

call to: javax.swing.JTree(Hashtable)

[figure: class hierarchy with M, D, H, O, S, AH, H1, H2]


Outline of Approach

generate type constraints for program

– additional constraints generated to ensure that behavior of cast/instanceof operations is preserved

constraint simplification

– rewrite/replace all constraints to use “≤” only

solve the resulting constraint system

rewrite the program’s declarations and allocation sites to use the inferred types



Preserving the Behavior of Cast & instanceof

we want to change declarations and allocation sites

– need to ensure that cast/instanceof operations succeed and fail in exactly the same cases as before

– use points-to analysis to approximate the set of objects to which the cast/instanceof is applied

– easily expressed using ≰ constraints (to be replaced with ≤ constraints)

public class Example {
  void zip(){
    zap(new Hashtable()); // A1
    zap(new String());    // A2
  }
  void zap(Object o){
    Hashtable h = (Hashtable)o; // C
  }
}

A1 ≤ C

A2 ≰ C
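The effect of these two constraints can be observed at run time. The sketch below is only an illustration: H1 is a hypothetical stand-in for a generated custom class and is modeled here as a subclass of Hashtable, which is the simplest way to keep A1 ≤ C satisfied; an actual tool could instead rewrite the cast type as well.

```java
import java.util.Hashtable;

class CastPreservation {
    /** Hypothetical custom class standing in for a generated H1. */
    static class H1 extends Hashtable<Object, Object> {}

    /** Returns true if the cast C succeeds for o, false if it fails. */
    static boolean zap(Object o) {
        try {
            Hashtable<?, ?> h = (Hashtable<?, ?>) o;   // cast C
            return h != null;
        } catch (ClassCastException e) {
            return false;
        }
    }
}
```

Replacing the allocation `new Hashtable()` (A1) by `new H1()` keeps the cast succeeding, while the String allocated at A2 keeps failing it, so the behavior of zap is the same before and after the change.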



Type constraints

class Example {
  void foo(M m){
    d1 r1 = new a1();
    JTree tree = new JTree(r1);
    d2 r2 = new a2();
    d3 r3 = new a3();
    r2.put("FOO","BAR");
    bar(r3);
    r2 = r3;
    r2.putAll(m);
    bar("HELLO");
  }
  void bar(d4 o){
    d5 r4 = (c1) o;
    if (r4.contains("FOO")) {…}
  }
}

a1 ≤ d1
d1 ≤ H
a2 ≤ d2
a3 ≤ d3
d2 ≤ D or d2 ≤ M
d3 ≤ d4
d3 ≤ d2
d2 ≤ M
S ≤ d4
c1 ≤ d4 or d4 ≤ c1
c1 ≤ d5
d5 ≤ H or d5 ≤ AH
a3 ≤ c1
S ≰ c1

d5 ≤ H

c1 ≤ H

[figure: class hierarchy with M, D, H, O, S, AH, H1, H2]

[figure: the type constraints above drawn as a graph over the variables a1, a2, a3, d1, d2, d3, d4, d5, c1 and the types M, H, S]


Constraint Solving

[figure: the constraint graph with each variable annotated by a set of candidate types, e.g. {O,S,H,H1,H2} or {O,S,H,H1,H2,D,M,AH}; the constraints a1 ≤ d1 and d1 ≤ H are among those shown being processed]


Rewriting the Example Program

inferred types: a1 = H, d1 = H, a2 = H1, a3 = H2, d2 = AH, d3 = AH, d4 = O, c1 = H2, d5 = AH

class Example {
  void foo(M m){
    H r1 = new H();
    JTree tree = new JTree(r1);
    AH r2 = new H1();
    AH r3 = new H2();
    r2.put("FOO","BAR");
    bar(r3);
    r2 = r3;
    r2.putAll(m);
    bar("HELLO");
  }
  void bar(O o){
    AH r4 = (H2) o;
    if (r4.contains("FOO")) {…}
  }
}


Creating Custom Classes

1. create custom “profiling” Hashtable

– determine how often allocation sites are executed

– simulate caching schemes

– number of succeeding/failing get/put operations

2. static analysis (using “gnosis” framework developed at IBM)

– construct call graph (0-CFA, distinct allocation sites for classes of interest)

– compute type estimates

– escape analysis

3. generate custom implementations: H1, H2, …

– generated from template (using C preprocessor)

4. rewrite bytecode for the program



Generating Custom Classes

1. lazy vs. eager allocation

2. synchronized vs. unsynchronized

3. optimizing edge cases

4. caching of frequently accessed objects

5. removal of unused fail-safe iteration code

6. …
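To make the first two items concrete, here is a minimal sketch of a lazily allocated, unsynchronized table. LazyTable is a hypothetical class, not one of the generated classes from the talk; it assumes the analysis has shown that the table is accessed by a single thread and that most instances stay empty. Null arguments are rejected up front so that the observable behavior matches java.util.Hashtable even before the backing table exists.

```java
import java.util.Hashtable;

class LazyTable<K, V> {
    // Backing store, allocated only when the first entry is added.
    private Hashtable<K, V> table;

    V get(Object key) {
        if (key == null) throw new NullPointerException();   // Hashtable rejects null keys
        return table == null ? null : table.get(key);
    }

    V put(K key, V value) {
        if (key == null || value == null) throw new NullPointerException();
        if (table == null) table = new Hashtable<>();         // lazy allocation
        return table.put(key, value);
    }

    boolean contains(Object value) {
        if (value == null) throw new NullPointerException();  // Hashtable rejects null values
        return table != null && table.contains(value);
    }

    int size() {
        return table == null ? 0 : table.size();
    }

    /** For illustration only: reports whether the backing table exists yet. */
    boolean isAllocated() {
        return table != null;
    }
}
```

Lookups on an instance that never receives an entry cost one null check and allocate nothing, which is the situation reported for benchmarks where most tables remain empty.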



Applied Customizations

_202_jess

– specialization of Hashtable keys (String/Integer)

– synchronization removal on frequently used Vectors

_209_db

– use caching to optimize consecutive Vector-retrievals

– synchronization removal on frequently used Vectors

_218_jack

– 99% of all search operations are on empty Hashtables

– lazy allocation, removal of bookkeeping for fail-safe iterators

– synchronization removal on Hashtables

Jax

– most containers remain small, decrease initial container size

HyperJ

– optimization of empty Hashtables, removal of bookkeeping for fail-safe iterators

– synchronization removal

Chess*

– frequent iteration over Hashtables of fixed, small size

– use smaller initial size

Pmd*

– the vast majority of a huge number of allocated HashSets remains empty

– lazy allocation, removal of bookkeeping for fail-safe iterators

*no synchronization removal because of GUI-related multi-threading in these benchmarks


Speedups

customization of:

– java.util.* containers

– StringBuffers (desynchronization only)

measurements taken on HyperThreaded Pentium 4 @ 2.8 GHz running Linux 2.4.21


Heap Consumption

significant reduction in heap consumption on _218_jack because of lazy allocation of many Hashtable-objects that remain empty



Impact on Application Size

note: original size of _209_db is only 6 KB

– 15 KB of custom container classes are added

on large benchmarks (>100 KB), the size increase is <= 12%



Java Generics

generics (parametric polymorphism) to be introduced in Java 1.5

– classes can have type parameters that have optional bounds

– reduces need for downcasts

class Hashtable<Key,Value> { ... }

class Tree<Elem extends Comparable<Elem>> { ... }

Hashtable<Integer,String> table = new Hashtable<Integer,String>();

...

String s = table.get(someInteger);


Generic Collections

in most Java applications, the use of Collection classes is the main source of down-casts

the standard libraries for Java 1.5 contain generic versions of existing Collection classes

– Vector<T> instead of Vector

– HashMap<K,V> instead of HashMap

goal: refactor applications that use non-generic collections

– make them use generic collections instead

– use type inference to infer element types

– remove downcasts



class A {

public void foo(){

Vector v1 = new Vector();

String s1= "aaa";

this.insert(v1, s1);

String s2= (String)v1.get(0);

}

public void insert(List v2, Object o){

v2.add(o);

}

}

Example 1



class A {

public void foo(){

Vector<String> v1 = new Vector<String>();

String s1= "aaa";

this.insert(v1, s1);

String s2= v1.get(0);

}

public void insert(List<String> v2, String o){

v2.add(o);

}

}

Example 1 (refactored)

update “collection” declarations

remove casts

note update of declaration of o



public void bar(){

List v1= new Vector();

v1.add(new Float(3.4));

this.reverse(v1);

Float f1 = (Float) v1.iterator().next();

}

public void baz(){

List v2 = new Vector();

v2.add(new Integer(17));

this.reverse(v2);

Integer i1 = (Integer) v2.iterator().next();

}

public void reverse(List v3){

for (int t=0; t < v3.size()/2; t++){

Object temp = v3.get(v3.size()-1);

v3.add(v3.size()-1, v3.get(t));

v3.add(t, temp);

}

}

Example 2



Example 2 (version 1)

public void bar(){
  List<Number> v1= new Vector<Number>();
  v1.add(new Float(3.4));
  this.reverse(v1);
  Float f1 = (Float) v1.iterator().next();
}

public void baz(){
  List<Number> v2 = new Vector<Number>();
  v2.add(new Integer(17));
  this.reverse(v2);
  Integer i1 = (Integer) v2.iterator().next();
}

public void reverse(List<Number> v3){
  for (int t=0; t < v3.size()/2; t++){
    Number temp = v3.get(v3.size()-1);
    v3.add(v3.size()-1, v3.get(t));
    v3.add(t, temp);
  }
}

element types “merged” in reverse()

cannot remove casts in callers

Example 2 (version 2)

public void bar(){
  List<Float> v1= new Vector<Float>();
  v1.add(new Float(3.4));
  this.reverse(v1);
  Float f1 = v1.iterator().next();
}

public void baz(){
  List<Integer> v2 = new Vector<Integer>();
  v2.add(new Integer(17));
  this.reverse(v2);
  Integer i1 = v2.iterator().next();
}

public <T> void reverse(List<T> v3){
  for (int t=0; t < v3.size()/2; t++){
    T temp = v3.get(v3.size()-1);
    v3.add(v3.size()-1, v3.get(t));
    v3.add(t, temp);
  }
}

obs: no flow of values between different invocations of reverse()

need for context-sensitive analysis

introduction of type parameters
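A compilable variant of the type-parameter solution is sketched below. The class name GenericReverse and the second element added to each list are assumptions made for the demonstration, and the loop body swaps elements with set() rather than inserting them with add(), so that the method terminates on lists with more than one element; the point being illustrated is only that a method-level type parameter lets both callers drop their casts.

```java
import java.util.List;
import java.util.Vector;

class GenericReverse {
    /** Reverses the list in place; T is bound separately at each call site. */
    static <T> void reverse(List<T> v3) {
        for (int t = 0; t < v3.size() / 2; t++) {
            int mirror = v3.size() - 1 - t;
            T temp = v3.get(mirror);
            v3.set(mirror, v3.get(t));
            v3.set(t, temp);
        }
    }

    static Float bar() {
        List<Float> v1 = new Vector<Float>();
        v1.add(Float.valueOf(3.4f));
        v1.add(Float.valueOf(1.2f));
        reverse(v1);                      // T inferred as Float
        return v1.iterator().next();      // no cast needed
    }

    static Integer baz() {
        List<Integer> v2 = new Vector<Integer>();
        v2.add(Integer.valueOf(17));
        v2.add(Integer.valueOf(42));
        reverse(v2);                      // T inferred as Integer
        return v2.iterator().next();      // no cast needed
    }
}
```

Had reverse been declared with List<Number>, neither call would compile with List<Float> or List<Integer> arguments, which is why the callers in version 1 fall back to List<Number> and keep their casts.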



Outline of Approach

context inference

– use low-cost variation on Agesen’s Cartesian Product Algorithm (CPA) [Agesen:95] for inferring relevant contexts

– simultaneously computes points-to information for expressions and a set of contexts for each method

type inference

– generate type constraints for the program that explicitly encode context information

– solving the type constraints produces element types for declarations and allocations of container class types

source rewriting

– analyze (element) types inferred for different contexts, introduce type parameter if necessary



Context Inference

public void bar(){
  List v1= new Vector(); // L1
  ...
  this.reverse(v1);
  ...
}

public void baz(){
  List v2 = new Vector(); // L2
  ...
  this.reverse(v2);
  ...
}

public void reverse(List v3){
  for (int t=0; t < v3.size()/2; t++){
    ...
    v3.add(t, temp);
  }
}

contexts: bar() [●]; baz() [●]; reverse() [●,Lext], [●,L1], [●,L2]


Example Constraints

contexts: bar() [●]; baz() [●]; reverse() [●,L1], [●,L2], [●,Lext]

|new Vector()|[●] ≜ Vector<X1>

|new Vector()|[●] ≤ |v1|[●]

|new Float(3.4)|[●] ≜ Float

|new Float(3.4)|[●] Types[●](v1)

|v1|[●] ≤ |v3|[●,L1]

|new Vector()|[●] ≜ Vector<X2>

|new Vector()|[●] ≤ |v2|[●]

|new Integer(17)|[●] ≜ Integer

|new Integer(17)|[●] Types[●](v2)

|v2|[●] ≤ |v3|[●,L2]

|v3.get()|[●,L1] ≜ Elem[●,L1](v3)

|v3.get()|[●,L1] ≤ |temp|[●,L1]

|v3.get()|[●,L1] ≤ Elem[●,L1](v3)

|temp|[●,L1] ≤ Elem[●,L1](v3)

|v3.get()|[●,L2] ≜ Elem[●,L2](v3)

|v3.get()|[●,L2] ≤ |temp|[●,L2]

|v3.get()|[●,L2] ≤ Elem[●,L2](v3)

|temp|[●,L2] ≤ Elem[●,L2](v3)

|v3.get()|[●,Lext] ≜ Elem[●,Lext](v3)

|v3.get()|[●,Lext] ≤ |temp|[●,Lext]

|v3.get()|[●,Lext] ≤ Elem[●,Lext](v3)

|temp|[●,Lext] ≤ Elem[●,Lext](v3)



Constraint Solving

standard propagation-based solver

– computes a type for each constraint variable |E|

– in cases where multiple types can be chosen for an expression E, a heuristics-based choice is made (a least specific type for container-related expressions, a most specific type for other expressions)

– different types may be computed for the same expression in different contexts (e.g., |E|1 and |E|2)

element types are unified across ≤ constraints

processing type variables

– a type variable is bound by matching it with a concrete set of types

– matching two type variables results in their unification

– type variables may be left unbound (e.g., in incomplete programs)

– use approximate solution (e.g., element type Object) when processing programs with code like v.add(v)



Constraint Solving

contexts: bar() [●]; baz() [●]; reverse() [●,L1], [●,L2], [●,Lext]

Elem[●](v1) = Float

Elem[●](v2) = Integer

Elem[●,L1](v3) = Float

Elem[●,L2](v3) = Integer

Elem[●,Lext](v3) = Object
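The difference between merging contexts and keeping them apart can be illustrated with a small sketch. ElementTypeMerge and its methods are hypothetical names, and the merge shown here simply walks up the superclass chain (ignoring interfaces); it is a stand-in for the solver's unification step, not a description of it.

```java
class ElementTypeMerge {
    /** Most specific common superclass of two class types (interfaces are ignored). */
    static Class<?> commonSuperclass(Class<?> a, Class<?> b) {
        Class<?> c = a;
        while (!c.isAssignableFrom(b)) {
            c = c.getSuperclass();        // ends at Object for class types
        }
        return c;
    }

    /** Element type obtained when all given contexts are merged into one. */
    static Class<?> merge(Class<?> first, Class<?>... rest) {
        Class<?> result = first;
        for (Class<?> t : rest) {
            result = commonSuperclass(result, t);
        }
        return result;
    }
}
```

Merging the element types seen in contexts [●,L1] and [●,L2] gives Number, as in version 1 of Example 2; solving each context separately keeps Float and Integer, which is what allows a type parameter to be introduced for reverse().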



public void bar(){

List<Float> v1= new Vector<Float>();


this.reverse(v1);


}

public void baz(){

List<Integer> v2 = new Vector<Integer>();


this.reverse(v2);


}

public <T> void reverse(List<T> v3){

for (int t=0; t < v3.size()/2; t++){

T temp = v3.get(v3.size()-1);


v3.add(t, temp);

}

}

Code Generation



Results

benchmark   LOC     #container    #container     #casts   #casts    %casts
                    allocations   declarations            removed   removed

Hanoi       4028    3             6              20       14        70

JUnit       5317    24            63             54       21        39

JLex        7841    17            45             71       53        75

JavaCup     10598   19            78             502      373       74

Mango1      2808    2             9              2        2         100

Mango2      2808    3             13             4        2         50

Mango3      2808    1             17             10       0         0



Demo: Prototype “Genericize” Refactoring



Related Work on Customization

automatic data structure selection for SETL

– see [Schonberg et al. ’81]

automatic component selection

– see, e.g., [Hogstedt et al. ’01, Yellin ’03]

– purely profile-based, no static analysis

– all possible component implementations supplied up-front

automatic optimization of data structures in specific domains

– e.g., data structure selection for sparse matrix problems

optimizations applied to specific container classes

– see, e.g., [Beckmann & Wang, Friedman et al. ’01]

– e.g., prefetching, incrementalizing rehash operations

much related work on partial evaluation and program specialization

– see e.g., [Schultz, Lawall, Consel ’03]

IBM Research


Other Related Work

type inference and type-directed transformation have been used in the translation of large COBOL programs for Y2K compliance [Eidorff et al. 99, Ramalingam et al. 99]

informal characterization of type constraints [Opdyke’92, Seguin’00, Tokuda & Batory’01]

detecting overspecific variables [Halloran & Scherlis’02]

generating proposals for refactoring class hierarchies using concept analysis [Snelting & Tip’00]

inferring generic types in Java programs [Duggan’99, Donovan et al.’04, Von Dincklage & Diwan’04]


Future Work

in progress: support for migration between functionally equivalent classes

– e.g., from Vector to ArrayList, Hashtable to HashMap

– limitations on migration due to interaction with external code

– application: upgrading of “legacy” applications

variation on Java in which programmers only refer to interface types such as Set, Map, List instead of concrete types such as HashSet, TreeMap, ArrayList

– use customization techniques to select implementation

– similar in spirit to the SETL work at NYU by Paige, Schonberg, et al. in the 1970s and 1980s

other generics-related refactorings

– select a declaration & change its type into a type parameter


Conclusions

type constraints are a useful tool for supporting refactorings and related program transformations

– checking of preconditions

– determining allowable source-code modifications

– enables reasoning about program behavior

applications

– refactorings related to generalization

– customization of library classes

– refactorings for introducing generics

– more refactorings in the works

implemented in Eclipse

– Extract Interface, Generalize Type available now

– generics refactorings planned for Eclipse 3.1

– freely available from www.eclipse.org

EXTRA SLIDES


Typical Refactoring Scenario

user proposes a transformation by interacting with GUI/Wizards in IDE

system checks if preconditions are met

system determines necessary/allowable source code updates

system shows before/after “diff” view

user confirms

program works as before


Solving the Constraints

naive approach

– explicitly enumerate all values; each expression type in { C, I }

– for each solution, determine if constraints are satisfied

cost: O(2^n), where n is the number of declarations of type C
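The naive approach can be sketched roughly as follows. This is only an illustrative sketch, not the solver used in the tool: the class name NaiveSolver, the bit-mask encoding (bit i set means declaration i gets type I, otherwise C), and the modeling of constraints as predicates are all assumptions made for the example. The loop visits every one of the 2^n candidate assignments, which is where the exponential cost comes from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the naive approach: each of n declarations is typed
// either C (bit = 0) or I (bit = 1); a constraint is a predicate over the
// whole assignment, encoded as an int bit mask.
class NaiveSolver {
    /** Returns every assignment (as a bit mask) that satisfies all constraints. */
    static List<Integer> solve(int n, List<Predicate<Integer>> constraints) {
        List<Integer> solutions = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) { // 2^n candidates
            final int candidate = mask;
            if (constraints.stream().allMatch(c -> c.test(candidate))) {
                solutions.add(mask);
            }
        }
        return solutions;
    }

    /** True if declaration i is given type I in the assignment. */
    static boolean isI(int mask, int i) {
        return (mask & (1 << i)) != 0;
    }

    public static void main(String[] args) {
        // Assumed example with C a subtype of I:
        // declaration 0 must stay C (a method not declared in I is called on it);
        // declaration 1 is assigned to declaration 0, so [d1] <= [d0]:
        // if d0 is C, then d1 must be C as well.
        List<Predicate<Integer>> constraints = List.of(
            m -> !isI(m, 0),
            m -> isI(m, 0) || !isI(m, 1));
        System.out.println(solve(3, constraints)); // prints [0, 4]
    }
}
```

In this toy instance only declaration 2 is free to become I, so two of the eight assignments survive.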


Object-Oriented Type Systems

“A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute”

[Benjamin C. Pierce, 2002]

Traditional applications of type systems:

– enhance readability/understandability

– prove/guarantee that certain kinds of run-time errors will not occur during program execution (e.g., “message not understood”)

– foundation for abstractions & language features (e.g., module systems)

– enable optimizations (e.g., replace dynamic dispatch with direct call)


Some Terminology

type: set of objects that share properties (e.g., supported operations)

– in Java, there is a direct correspondence between types and classes and interfaces in the inheritance hierarchy

static typing: type information is explicit in the source code

– consistency checks can be performed by a compiler (type checking)

– note: some run-time checking may still be needed

type checking: checking certain consistency properties of programs that contain explicit type declarations

– to guarantee the absence of run-time errors

– a program that type-checks is (statically) type-correct

type inference

– in dynamically typed languages, types of expressions are inferred from their usage

– also used in statically typed languages for optimization (e.g., certain run-time checks may be proven obsolete through analysis)

type constraints

– formalism for expressing relationships between program expressions that must hold in order for a program to be type-correct

– used for type checking as well as for type inference
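As a rough illustration of what type constraints look like, the small program below annotates each statement with the constraint it would give rise to, writing [E] for the declared type of expression E and <= for the subtype relation. The class and method names are made up for this example, and the comments paraphrase the constraints informally rather than quoting any particular rule set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: each comment states, informally, the type constraint
// implied by the statement on that line.
class TypeConstraintExample {
    // parameter declared with the interface type List
    static int count(List<String> names) {
        return names.size();          // method call: [names] must declare size()
    }

    public static void main(String[] args) {
        ArrayList<String> a = new ArrayList<>(); // [new ArrayList<>()] <= [a]
        a.add("x");                              // method call: [a] must declare add()
        List<String> b = a;                      // assignment: [a] <= [b]
        System.out.println(count(b));            // argument passing: [b] <= [names]
    }
}
```

A program is type-correct when all such constraints hold; a refactoring that changes a declared type has to keep them satisfied.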


Observations

cannot update variable e1 because method getName() is called on e1, which is not declared in Billable

cannot update variable e2 because method getAddress() is called on e2, which is not declared in Billable

updating the return type of findEmployee() produces type mismatch in assignment to e2

updating the cast produces type mismatch in assignment to e1


Observations

Observations:

– type of v2 must be List, because of field access v2.size

– type of v3 must be List, because of assignment v2 = v3

– type of v8 must be List, because of call v8.sort()

– type of v4 must be List because it is passed as an argument to Client.sortList(), implying an assignment v8 = v4

– return type of Client.createList() must be List because of assignment v4 = Client.createList()

Conclusion:

– v0, v1, v5, v6, v7, v9, and the return types of List.add(), List.addAll(), Bag.add(), Bag.addAll() can be given type Bag


Conclusions & Future Work

customization: a technique for library-level optimizations

– use type constraints to determine where applicable

– use profile information to determine where useful

– use static analysis and profile information to select optimizations

strong results

– speedups up to 76.7% (18.8-24.1% on average)

– heap consumption reduced by up to 45.9% (11.9% on average)

– modest increase in app. size (<12% on large applications)

future work:

– apply additional optimizations

– apply to additional library classes

– self-customizing classes

– incorporate into whole-program optimizers

• e.g., Jax [Tip et al. 02], IBM WSDD SmartLinker


Detailed Speedup Results


Detailed Heap/Size Results


Implementation

implemented in Eclipse using existing refactoring framework [Baeumer et al. 01]

– Extract Interface

– Generalize Type

– Pull Up Members

– Push Down Members

determining type constraints nontrivial for several language features

– arrays

– member types (inner classes)

– exceptions

– overloading


Demonstration of Eclipse Refactoring Support

Basic Stuff:

– text hovers: JavaDoc

– ctrl-hover: Code + HyperLink

– Ctrl-T: hierarchy

– code completion

Rename Class

– remove ugly prefix: JX_RTA -> RTA

Extract Method

– method RTA.process() too long

– extract processCurrentCallSitesWrtProcessedClasses()

– estIterations()

– undo

– estIterations() with next line: two return values

– convert local to field

– estIterations() with next line OK now

Inline Method

– RTA.moveNewToCurrentClasses()

Inline Local Variable

– inline “callSite” in processCurrentCallSitesWrtProcessedClasses()

Extract Constant

– DONE_ESTIMATE at end of RTA.process()

Pull Up Members

– getIndex() in JX_MethodCallSite


public class Employee {
    public String getName(){ return _name; }
    public String getAddress(){ return _address; }
    public int getRate(){ return _rate; }
    public boolean hasSpecialSkill(){ return _hasSpecialSkill; }
    private int _rate;
    private boolean _hasSpecialSkill;
    private String _name;
    private String _address;
}

public class TimeSheet {
    public double charge(Employee emp, int days){
        int base = emp.getRate() * days;
        if (emp.hasSpecialSkill())
            return base * 1.05;
        else
            return base;
    }
}

Example

Example taken from Fowler’s “Refactoring”, p.342


Example

public interface Billable {
    int getRate();
    boolean hasSpecialSkill();
}

public class Employee implements Billable {
    // contents of this class same as before
}

public class TimeSheet {
    public double charge(Billable emp, int days){
        int base = emp.getRate() * days;
        if (emp.hasSpecialSkill())
            return base * 1.05;
        else
            return base;
    }
}

Example taken from Fowler’s “Refactoring”, p.342


But updating any of these references to Employee leads to compilation errors...

public class Personnel {
    public static Employee findEmployee(String name)
            throws NotFoundException {
        for (int t=0; t < employees.size(); t++){
            Employee e1 = (Employee)employees.elementAt(t);
            if (e1.getName().equals(name))
                return e1;
        }
        throw new NotFoundException();
    }
    public static String findAddress(String name)
            throws NotFoundException {
        Employee e2 = findEmployee(name);
        return e2.getAddress();
    }
    private static Vector employees;
}


Context Inference

assume that allocation sites in a program are labeled

– distinct labels L1, ..., Lk for container-related allocation sites

– a single “blob” label ● used for all other allocation sites

– distinct label Lext represents collections created outside the application

for each method m, infer a set of contexts Contexts(m)

– each context represents a set of callers of a method

– identified by a list of labels, one for each parameter; e.g., [L1, L2, ●, ●]

for each expression E that occurs in the body of method m, and for each context α ∈ Contexts(m), infer a points-to set Objects_α(E)

– set of labels; e.g., Objects_α(E) = {L1, L2, L9, ●}

compute context-sensitive call graph

– compute for each pair <call-site, context>, a set of <method, context> pairs

– make conservative assumptions about entry point methods


Context Inference

we assume a given set of entry points

– e.g., all public methods

– to be specified by the user of the refactoring tool

conservative assumptions about objects bound to parameters of entry point methods

– depends on declared type of the parameter

conservative assumptions about calls to external methods for which source code is unavailable

use Class Hierarchy Analysis (CHA) [Grove et al. 95] to approximate behavior of dynamic dispatch

null constants, literals, primitive values modeled as objects


Auxiliary Definitions for Context Inference Rules

ExternalObjects(T): set of objects assumed to be bound to parameters of entry-point methods

ExternalObjects(T) = { Lext }      if T ≤ Collection
                     { ● }         if T ≰ Collection and Collection ≰ T
                     { Lext, ● }   otherwise

SelectContexts(α, E0,...,Ek): construct contexts for call sites that occur in method m, for α ∈ Contexts(m)

SelectContexts(α, E0,...,Ek) = { [p0,...,pk] | pi ∈ Objects_α(Ei), 0 ≤ i ≤ k }
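SelectContexts builds one context per combination of labels drawn from the points-to sets of the call's expressions, i.e. a Cartesian product of those sets. A minimal sketch, assuming labels are represented as plain strings (with "BLOB" standing in for ●) and that the points-to sets have already been computed; the class and method names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: labels are plain strings ("L1", "L2", "BLOB" for the
// blob label); objectsPerExpr.get(i) plays the role of Objects(Ei).
class SelectContextsSketch {
    /** Returns all label lists [p0,...,pk] with pi taken from objectsPerExpr.get(i). */
    static List<List<String>> selectContexts(List<Set<String>> objectsPerExpr) {
        List<List<String>> contexts = new ArrayList<>();
        contexts.add(new ArrayList<>()); // start from the empty prefix
        for (Set<String> labels : objectsPerExpr) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> prefix : contexts) {
                for (String label : labels) {
                    List<String> context = new ArrayList<>(prefix);
                    context.add(label); // extend the prefix by one position
                    extended.add(context);
                }
            }
            contexts = extended;
        }
        return contexts;
    }

    public static void main(String[] args) {
        // Objects(E0) = {L1, L2}, Objects(E1) = {BLOB}: two contexts result
        List<Set<String>> objects = List.of(Set.of("L1", "L2"), Set.of("BLOB"));
        System.out.println(selectContexts(objects).size()); // prints 2
    }
}
```

The number of contexts is the product of the set sizes, so an empty points-to set for any expression yields no contexts at all.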


Some of the Context Inference Rules

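T0.m(T1,...,Tn) is an entry point, pi ∈ ExternalObjects(Ti), α = [p0,...,pn], 1 ≤ i ≤ n
----------------------------------------------------------------------
α ∈ Contexts(T0.m(T1,...,Tn))                      (C1)
pi ∈ Objects_α(Param(T0.m(T1,...,Tn), i))          (C2)

m contains assignment E1=E2, α ∈ Contexts(m)
----------------------------------------------------------------------
Objects_α(E2) ⊆ Objects_α(E1)                      (C3)

m contains call E0 ≡ new T^L(E1,...,En) to constructor m’, T ≤ Collection, α ∈ Contexts(m)
----------------------------------------------------------------------
L ∈ Objects_α(E0)                                  (C4)

m contains call E0 ≡ new T^L(E1,...,En) to constructor m’, T ≰ Collection, α ∈ Contexts(m),
α’ ∈ SelectContexts(α, E0,...,En), 0 ≤ i ≤ n
----------------------------------------------------------------------
α’ ∈ Contexts(m’)                                  (C5)
● ∈ Objects_α(E0)                                  (C6)
Objects_α(Ei) ⊆ Objects_α’(Param(m’,i))            (C7)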


Constraint Generation

constraint generation rules similar to those used for generalization-related refactorings

– constraint variables annotated with subscript that identifies their “containing” context

– additional rules that model the behavior of operations on collections

constraint variable Elem(E) represents the element type of container objects in Objects(E)

– similarly, Key(E) and Value(E) represent the key and value types of Map-style collections

notation: NewType(T) denotes a parameterized version of type T with a fresh type variable


Some of the Constraint Generation Rules

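m contains assignment E1=E2, α ∈ Contexts(m)
----------------------------------------------------------------------
|E2|_α ≤ |E1|_α                                    (B1)

m contains direct call E ≡ T.n(E1,...,Ek) to method m’, T ≰ Collection,
α ∈ Contexts(m), α’ ∈ SelectContexts(α, E1,...,Ek), E’i = Param(m’,i), 1 ≤ i ≤ k
----------------------------------------------------------------------
|E|_α = |m’|_α’                                    (B4)
|Ei|_α ≤ |E’i|_α’                                  (B5)

m contains call E0.add(E1) to method m’, α ∈ Contexts(m), Decl(m’) ≤ Collection
----------------------------------------------------------------------
|E1|_α ∈ Types_α(E0)                               (B16)

T ∈ Types_α(E)
----------------------------------------------------------------------
T ≤ Elem_α(E)                                      (B24)

|E1|_α ≤ |E2|_α’
----------------------------------------------------------------------
Elem_α(E1) = Elem_α’(E2)                           (B27)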


Constraint Generation for new Expressions

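m contains expression E0 ≡ new T(E1,...,Ek) to constructor m’, T ≰ Collection,
α ∈ Contexts(m), α’ ∈ SelectContexts(α, E0,...,Ek), E’i = Param(m’,i), 0 ≤ i ≤ k
----------------------------------------------------------------------
|E0|_α = T                                         (B2)
|Ei|_α ≤ |E’i|_α’                                  (B3)

m contains expression E0 ≡ new T(E1,...,Ek), T ≤ Collection,
α ∈ Contexts(m), T’ = NewType(T)
----------------------------------------------------------------------
|E0|_α = T’                                        (B14)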


Code Generation

source code updating for a method m is trivial if there is one context for m, or if the types inferred for the expressions in m are the same in all contexts

if, for a given expression E in method m, different types are computed in different contexts for m, we attempt to introduce a type parameter for E

– need to determine which (if any) other expressions must have the same type as E

– a bound on a type parameter T of method m is needed if expressions of type T are constrained to be of a type X more specific than Object in some context of m

• use a common upper bound of all such types X

in programs with failing casts, the type constraint system may not have a solution in a given context

– approach: merge all contexts for methods with failing casts, and continue solving (context-insensitive solution)

a down-cast (T)E is redundant if the inferred type for E is a subtype of T

– in all contexts for E
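The redundancy condition for down-casts can be sketched as a simple check over the types inferred per context. The sketch below is an assumption-laden illustration, not the tool's implementation: it models types with java.lang.Class objects and takes the per-context inferred types as a ready-made list; the names CastCheckSketch and isRedundantCast are invented.

```java
import java.util.List;

// Hypothetical sketch: types are modeled as java.lang.Class objects, and
// inferredPerContext holds the type inferred for E in each context.
class CastCheckSketch {
    /** A down-cast (T)E is redundant if the inferred type of E is a subtype of T in every context. */
    static boolean isRedundantCast(Class<?> castType, List<Class<?>> inferredPerContext) {
        for (Class<?> inferred : inferredPerContext) {
            // isAssignableFrom is true when 'inferred' is castType or a subtype of it
            if (!castType.isAssignableFrom(inferred)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // E is inferred as Integer in one context and Float in another
        List<Class<?>> inferred = List.of(Integer.class, Float.class);
        System.out.println(isRedundantCast(Number.class, inferred));  // prints true
        System.out.println(isRedundantCast(Integer.class, inferred)); // prints false
    }
}
```

The second call shows why the check must cover all contexts: the cast to Integer is safe to drop in one context but not in the other.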
