A Survey of Rewriting Strategies in Program Transformation Systems Part 1 Author: Eelco Visser...

31
A Survey of Rewriting Strategies in Program Transformation Systems Part 1 Author: Eelco Visser Speaker: Wei Zhao

Transcript of A Survey of Rewriting Strategies in Program Transformation Systems Part 1 Author: Eelco Visser...

A Survey of Rewriting Strategies in Program

Transformation SystemsPart 1

Author: Eelco VisserSpeaker: Wei Zhao

A Survey of Rewriting Strategies in Program

Transformation SystemsPart 1

Author: Eelco VisserSpeaker: Wei Zhao

2/29

Outline

Section 1: introduction Section 2: A Taxonomy of program

transformation Section 3: Program Representation Section 4: Implementation of Program

Transformation Section 5: Program Transformation

Paradigms Section 6: conclusions

3/29

Section 1: Introduction

Definition of program transformation (PT)

4/29

Definitions

Syntax allow us to transform a program

Semantics (intentional and extensional) to reason the validity of transformation

PT: is the act of changing one program into another

Source & target languages

5/29

Section 1: Introduction Definition of program transformation (PT) The aim of PT: programmer productivity,

maintainability, re-usability The aim of PT research:

The exist PT systems are specialized Programming languages have structural/semantic

similarities To capture generic, language independent schemas of

transformation; to develop efficient implementation that scales up to large programs

The aim of the paper: understand the similarities/differences among PT systems; concentrate on the transformation strategies

6/29

Section 2: Taxonomy of PT

Two main scenarios: Translation: source and target

languages are different Rephrasing: source and target

languages are the same Sub-scenarios based on the level of

abstraction of a program and to what extent they preserve the semantics of a program

7/29

2.1 Translation scenarios

Sub-scenarios are based on the level of abstraction of a program

Aims at preserving the extensional semantics, but usually not possible

Sub-scenarios: Synthesis, migration, reverse engineering, analysis

8/29

Synthesis

Translate from the high level to lower level abstraction

Design is traded for increased efficiency Examples:

Refinement: deriving implementation from high level specification

Compilation: Parser/pretty-printer generation from context-

free grammar

9/29

Migration

Translate a program into another language at the same level of abstraction

Examples: transformations between dialects (from Fortran 77 to Fortran 90, from Pascal to C)

10/29

Reverse Engineering

To extract the program from a lower-level to a high-level program or some higher-level aspects

The dual of synthesis Examples:

Decompilation of an object program into a high-level program

Architecture (design) extraction Documentation generation Software visualization (some aspects are depicted

in an abstract way)

11/29

Analysis

Reduce a program to one aspect such as its control- or data- flow.

A transformation to a sub-language to an aspect language

12/29

2.2 Rephrasing scenarios

Rephrasing says the same thing in different words

Improving some aspect of the program The semantics of the program are

changed Sub-scenarios: normalization,

optimization, refactoring, and renovation

13/29

Normalization

Reduces a program to a program in a sub-language decreasing its syntactic complexity

Examples Desugaring: syntactic sugar of a language is

transformed into more fundamental constructs Desugar Haskell to its kernel language EBNF to BNF

Simplification: a program is reduced to a normal (standard form)

Canonical form of intermediate representation Algebraic simplification of expressions

14/29

Optimization

Improves the run-time/space performance

Examples: Fusion Inlining Constant propagation Constant folding Common-subexpression elimination Dead code elimination

15/29

refactoring

Refactoring improves the design of a program by restructuring, preserving functionality

Obfuscation makes program harder to understand by renaming variables, inserting dead code preventing reverse engineering

16/29

renovation

Renovation repairs an error or bring the program up to date with changed requirements

The extensional semantics is changed

Example: Y2K

17/29

Section3: program representation

Some PT systems directly work on text Most systems use structured representation so parser and unparser are needed for

conversion between text and structure In some programming framework (IP),

programs are stored, edited and processed as source graphs.

What representation form should be used?

18/29

3.1 parse tree or AST Internal representation of a program to be

transformed Parser tree contain syntactic information such as layout (white

space and comments), parentheses, and extra nodes disambiguating grammar transformation

Normally AST is used since this info is irrelevant for the transformation

Layout Some applications (renovation, refactoring) need restore the layout

through the transformation. parse tree is used Where to insert the comments in a generic manner? (origin tracking)

Types Optimization and compilation might need type info to be stored in an

extension of a tree format

19/29

3.2 concrete or abstract syntax

The representation of program fragments in the specification of transformation rules

ASTs are usually represented using data structure: records, objects, algebraic data types, terms

When the ASTs (in a data structure facility) are easy to be composed and decomposed Such as in compiler, small fragments are manipulated each time AST is used directly in the specification So, the AST can be used as an intermediate language such as multiple

language can be expressed in. Otherwise (when the conceptual distance between concrete

program and the data structure access operations used in AST is too large) Transformation languages support concrete object syntax for the

programmer to define the transformations while internally use the AST.

20/29

3.3 Trees or Graphs Program structures can be represented by trees, DAGs, or full fledged

graphs with cycles Copy

Tree requires a complete copy For DAG, only a pointer to the tree gets copied

sub-tree shared by multiple context; reduced memory usage; testing of equality is cheap But the context need to be rebuilt when a tree is transformed because two occurrences of a

shared tree that are syntactically the same can have a different meaning depending on their context

Annotation DAG: annotations associated with each occurrences of the tree, Not desirable

Full fledged graph for back links (loops, link to declarations) More problematic: sub-graph can have links to the entire graph, it may be required to

reconstruct the entire graph after a transformation if it is necessary to keep the original graph.

21/29

3.4 Variable binding

Transformation system has no special knowledge for the variables in the syntax tree

Transformations need to be aware of variables by means of extra conditions to avoid the problem of free variable capture and lifting the variable occurrences out of binding

Higher-order abstract syntax (HOAS) : transparent handling of variable binding by encoding variable binding as lambda abstractions

FreshML has a weaker mechanism by transparently refreshes variable names

All approaches that rename variables are in conflict with requirements of preserving the original name (refactoring, renovation)

Associating declaration info (symbol table, annotating the usage occurrences of a symbol with the declaration)

22/29

3.5 exchange format

Program representation should be exchangeable among the transformation components

Examples XML: supports exchange of tree shaped

data Annotated Term Format: supports

exchange of DAGs maintaining maximal sharing

23/29

Section4 implementation of PT

A complex program transformation is achieved through a number of consecutive modifications of a program

A Rule: defines a basic step in the transformation of a program A Strategy: is a plan for achieving a complex transformation

using a set of rules Strategy for the reproducible and automated transformations

versus the interactive transformation Example

let x=3 in x+y let x=3 in 3+y 3+y (inline, dead variable)

Let x=3 in x+ylet x=3 in let newfun(z)=z+y in newfun(x)let newfun(z)=z+y in let x=3 in newfun(x)(extract function, hoist)

24/29

4.1. Transformation rules

Rules preserves the extensional semantics Some other aspects change (might be

intentional semantics): time/space resource usage

A rule: Syntactic recognition Some semantic condition verification The replacement contains

A term pattern A function constructing a new tree or graph A semantic action with arbitrary side-effects

25/29

4.2 transformation strategies A set of transformation rules for a programming

language induces a rewrite relation on programs If the relation is confluent and terminating, there is a

unique normal form for every program; the matter is applying the rules most efficiently to reach the normal form

In PT, it is usually not the case: A set of transformation rules can give rise to infinite

branches (by inlining recursive function) Inverses to undone a transformation(by distribution or

commutatively rules) Non-confluence in which a program can be transformed

into different programs

26/29

4.2 cont’ For a specific program it is always possible to find the shortest

path to the optimal solution for a specific transformation task A strategy is a algorithm for choosing a path in the rewrite

relation Given one set of rules, there can be many strategies, each

achieving a different goal A strategy can be provided by the engine or the user:

Fixed application order: the engine applies rules exhaustively according to a built-in strategy

Automatic dependency analysis: based on the analysis of rules Goal driven: to apply rules to achieve a user defined goal Strategy menu Completely programmable in a strategy language

The strategy needs to be expressed and implemented formally

27/29

4.2 cont’ strategies languages ingredients Sequential composition

Consecutively, conditionally, iteratively or recursively compose the rules

Non-deterministic programming Speculatively explore paths until an acceptable

solution is found Explore all paths in parallel and choose the

best (cost function to compare the solution) Goal based exploration: discard the path

inconsistent with a set of constraints

28/29

4.2 cont’ strategies languages ingredients Structural traversal:

A rewrite relation includes application of rules in any context

The strategy should determine the location where the rule is applied

find the location in the program representation structure, apply the rule, rebuild the context

It is inefficient to directly use the tree paths (context) Some mechanism is needed to traverse syntax trees and

to apply transformation rules at subtree Language specific/generic transformation mechanism Top-down / bottom-up traversal Primitive and composite traversal

29/29

4.2 cont’ strategies languages ingredients Information carrying strategies:

strategies may carry information that can be used in making decisions about paths to take an in passing context-sensitive information to rules An important aspect: the scope in which

the information is valid

30/29

4.2 cont’ strategies languages ingredients Separation of rules and strategies

In the implementation, the rule can be hardwired in the definition of the strategies

Strategies are parameterized with a set of rules Clearer specification that allow reasoning about smaller entities (rules,

strategies) separately Separation definition enables the reuse of rules, strategies, and the

generic implementation of aspects of transformation system for a class of languages

Intertwining may sometimes be required for efficiency reason, but should be done by a compiler rather than the specifier

Abstraction The strategy language should have proper abstraction over the rules

and strategy, I.e. it should be possible to name and parameterize the rules and strategies

31/29

Section 5: PT paradigms

…….