Compilation 2007 Abstract Syntax Trees Michael I. Schwartzbach BRICS, University of Aarhus.

Syntax Trees Carry Information
[Figure: syntax tree of the method printNumber(int number, int base); each expression node is annotated with its type (int, boolean, or String). Node structure:]
method printNumber
  type void
  args
    int number
    int base
  sequence
    decl
      type String
      name ns
      const ""
    if
      ==
        lvalue number
        const 0
      assign
        lvalue ns
        const "0"
    while
      >
        lvalue number
        const 0
      sequence
        assign
          lvalue ns
          concat
            %
              lvalue number
              lvalue base
            lvalue ns
        assign
          lvalue number
          /
            lvalue number
            lvalue base
    System.out.print
      concat
        lvalue ns
        const "\n"

Syntax Trees in SableCC
SableCC creates the parse tree automatically.
All nodes share a common superclass:
public abstract class Node implements Switchable, Cloneable {
    public abstract Object clone();
    public Node parent() {...}                  // returns the parent node
    void parent(Node parent) {...}              // sets the parent (package-private)
    abstract void removeChild(Node child);
    abstract void replaceChild(Node oldChild, Node newChild);
    public void replaceBy(Node node) {...}
    protected String toString(Node node) {...}
    protected String toString(List list) {...}
    protected Node cloneNode(Node node) {...}
    protected List cloneList(List list) {...}
    ...
}

Tokens
Tokens are a special kind of node:
public abstract class Token extends Node {
    public String getText() {...}
    public void setText(String text) {...}
    public int getLine() {...}
    public void setLine(int line) {...}
    public int getPos() {...}
    public void setPos(int pos) {...}
    public String toString() {...}
    ...
}

Tree Invariant
SableCC trees are guaranteed to be tree shaped: every node has at most one parent.
If a node is attached somewhere else, it is removed from its old parent.
Use the clone() method instead of sharing a node between two places.
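
The invariant can be illustrated with a small stand-alone sketch. The class TreeNode and its methods addChild and deepClone are hypothetical names invented for this example; they are not the SableCC-generated API, only a minimal model of the behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature node class; NOT SableCC's generated code.
class TreeNode {
    final String label;
    private TreeNode parent;
    private final List<TreeNode> children = new ArrayList<>();

    TreeNode(String label) { this.label = label; }

    TreeNode parent() { return parent; }

    List<TreeNode> children() { return children; }

    // Attaching a child first detaches it from its old parent,
    // so a node can never have two parents.
    void addChild(TreeNode child) {
        if (child.parent != null) {
            child.parent.children.remove(child);
        }
        child.parent = this;
        children.add(child);
    }

    // Deep copy; the copy has no parent until it is attached somewhere.
    TreeNode deepClone() {
        TreeNode copy = new TreeNode(label);
        for (TreeNode c : children) {
            copy.addChild(c.deepClone());
        }
        return copy;
    }
}

class TreeInvariantDemo {
    public static void main(String[] args) {
        TreeNode left = new TreeNode("left");
        TreeNode right = new TreeNode("right");
        TreeNode x = new TreeNode("x");

        left.addChild(x);
        right.addChild(x);                            // moves x: left loses it
        System.out.println(left.children().size());   // 0
        System.out.println(x.parent() == right);      // true

        left.addChild(x.deepClone());                 // share by cloning instead
        System.out.println(left.children().size());   // 1
        System.out.println(right.children().size());  // 1
    }
}
```

Detaching on attach is what keeps the structure a tree rather than a DAG; cloning is the price of wanting the same subtree in two places.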

Our Favorite Grammar in SableCC
Helpers
  tab = 9;
  cr  = 13;
  lf  = 10;

Tokens
  eol   = cr | lf | cr lf;
  blank = ' ' | tab;
  star  = '*';
  slash = '/';
  plus  = '+';
  minus = '-';
  l_par = '(';
  r_par = ')';
  id    = 'x' | 'y' | 'z';

Ignored Tokens
  blank, eol;

Productions
  start  = {plus}   start plus term |
           {minus}  start minus term |
           {term}   term;
  term   = {mult}   term star factor |
           {div}    term slash factor |
           {factor} factor;
  factor = {id}     id |
           {paren}  l_par start r_par;

Concrete Classes for Tokens
public final class TId extends Token {...}
public final class TBlank extends Token {...}
public final class TEol extends Token {...}
public final class TLPar extends Token {...}
public final class TMinus extends Token {...}
public final class TPlus extends Token {...}
public final class TRPar extends Token {...}
public final class TSlash extends Token {...}
public final class TStar extends Token {...}

Abstract Classes for Nonterminals
public abstract class PFactor extends Node {...}
public abstract class PStart extends Node {...}
public abstract class PTerm extends Node {...}

Concrete Classes for Productions
public final class APlusStart extends PStart {...}
public final class AMinusStart extends PStart {...}
public final class ATermStart extends PStart {...}
public final class AMultTerm extends PTerm {...}
public final class ADivTerm extends PTerm {...}
public final class AFactorTerm extends PTerm {...}
public final class AIdFactor extends PFactor {...}
public final class AParenFactor extends PFactor {...}

Naming Conventions
Production: foo = {bar} baz | {qux} quux
Abstract class: PFoo
Concrete classes: ABarFoo, AQuxFoo
Generated enum: EFoo = {BAR, QUX}
Every PFoo node has a kindPFoo() method that returns its EFoo kind

Parse Trees Use T-Nodes and A-Nodes
[Figure: parse tree for x*y+z]
APlusStart
  ATermStart
    AMultTerm
      AFactorTerm
        AIdFactor
          TId (x)
      TStar
      AIdFactor
        TId (y)
  TPlus
  AFactorTerm
    AIdFactor
      TId (z)

Irrelevant Details
LALR(1) parse trees have irrelevant details
There is no semantic distinction between:
• PStart
• PTerm
• PFactor
The extra structure complicates traversals...

Abstract Syntax Trees
An AST records only semantically relevant information:
[Figure: AST for x*y+z]
ABinopExp
  ABinopExp
    AVarExp
      TId (x)
    AMulBinop
    AVarExp
      TId (y)
  AAddBinop
  AVarExp
    TId (z)

ASTs in SableCC
The AST could be built by hand, by traversing the parse tree and creating the new nodes
We would quickly get tired of this...
SableCC allows another grammar for the ASTs
Productions define an inductive mapping
An AST grammar is a recursive datatype
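
To see why building the AST by hand gets tiresome, here is a minimal sketch of such a conversion for a fragment of the grammar (only plus and mult). All classes below are hand-written stand-ins whose names echo the slides; they are assumptions made for illustration and are not the SableCC-generated classes. The AST classes VarExp and BinopExp and the method convert are likewise hypothetical.

```java
// Hand-written stand-ins for a few parse-tree classes (NOT generated by SableCC).
abstract class PStart {}
abstract class PTerm {}
abstract class PFactor {}

class APlusStart extends PStart {
    final PStart start; final PTerm term;
    APlusStart(PStart start, PTerm term) { this.start = start; this.term = term; }
}
class ATermStart extends PStart {
    final PTerm term;
    ATermStart(PTerm term) { this.term = term; }
}
class AMultTerm extends PTerm {
    final PTerm term; final PFactor factor;
    AMultTerm(PTerm term, PFactor factor) { this.term = term; this.factor = factor; }
}
class AFactorTerm extends PTerm {
    final PFactor factor;
    AFactorTerm(PFactor factor) { this.factor = factor; }
}
class AIdFactor extends PFactor {
    final String id;
    AIdFactor(String id) { this.id = id; }
}

// Hypothetical AST classes: one expression type, no start/term/factor layers.
abstract class Exp {}
class VarExp extends Exp {
    final String id;
    VarExp(String id) { this.id = id; }
    @Override public String toString() { return id; }
}
class BinopExp extends Exp {
    final Exp l; final char op; final Exp r;
    BinopExp(Exp l, char op, Exp r) { this.l = l; this.op = op; this.r = r; }
    @Override public String toString() { return "(" + l + op + r + ")"; }
}

class HandBuiltAst {
    // One conversion method per nonterminal, one branch per alternative.
    static Exp convert(PStart s) {
        if (s instanceof APlusStart) {
            APlusStart p = (APlusStart) s;
            return new BinopExp(convert(p.start), '+', convert(p.term));
        }
        return convert(((ATermStart) s).term);
    }

    static Exp convert(PTerm t) {
        if (t instanceof AMultTerm) {
            AMultTerm m = (AMultTerm) t;
            return new BinopExp(convert(m.term), '*', convert(m.factor));
        }
        return convert(((AFactorTerm) t).factor);
    }

    static Exp convert(PFactor f) {
        return new VarExp(((AIdFactor) f).id);
    }

    // Parse tree for x*y+z, built by hand for the demo.
    static PStart sampleParseTree() {
        PTerm xTimesY = new AMultTerm(new AFactorTerm(new AIdFactor("x")), new AIdFactor("y"));
        return new APlusStart(new ATermStart(xTimesY), new AFactorTerm(new AIdFactor("z")));
    }

    public static void main(String[] args) {
        System.out.println(convert(sampleParseTree()));   // ((x*y)+z)
    }
}
```

Every alternative of every production needs its own branch, which is the boilerplate that an AST grammar with inductive mappings is meant to remove.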

Our Favorite Grammar with ASTs (1/2)
Helpers
  tab = 9;
  cr  = 13;
  lf  = 10;

Tokens
  eol   = cr | lf | cr lf;
  blank = ' ' | tab;
  star  = '*';
  slash = '/';
  plus  = '+';
  minus = '-';
  l_par = '(';
  r_par = ')';
  id    = 'x' | 'y' | 'z';

Ignored Tokens
  blank, eol;

Our Favorite Grammar with ASTs (2/2)
Productions
  start {-> exp} =
      {plus}   start plus term
               {-> New exp.binop(start.exp, New binop.add(), term.exp)} |
      {minus}  start minus term
               {-> New exp.binop(start.exp, New binop.sub(), term.exp)} |
      {term}   term {-> term.exp};

  term {-> exp} =
      {mult}   term star factor
               {-> New exp.binop(term.exp, New binop.mul(), factor.exp)} |
      {div}    term slash factor
               {-> New exp.binop(term.exp, New binop.div(), factor.exp)} |
      {factor} factor {-> factor.exp};

  factor {-> exp} =
      {id}     id {-> New exp.var(id)} |
      {paren}  l_par start r_par {-> start.exp};

Abstract Syntax Tree
  exp   = {binop} [l]:exp binop [r]:exp | {var} id;
  binop = {add} | {sub} | {mul} | {div};

Traversal Support
Many applications need to traverse the AST
SableCC adds automatic support
An extended visitor pattern: AnalysisAdapter
A specialization: DepthFirstAdapter
These are generated for the given AST

AnalysisAdapter
For each node class XYZ, AnalysisAdapter has a method
public void caseXYZ(XYZ node)
that may be overridden.
Each node class has a method
public void apply(...)
that accepts an AnalysisAdapter and invokes the appropriate caseXYZ method.
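
The interplay between apply and the case methods is a double dispatch. The following stand-alone sketch shows the idea on a two-node toy language; Analysis, Adapter, SNode, Num, Neg and Printer are hypothetical names chosen for this example and do not correspond to generated SableCC classes.

```java
// Hypothetical miniature of the pattern; hand-written, not SableCC output.
interface Analysis {
    void caseNum(Num node);
    void caseNeg(Neg node);
}

abstract class SNode {
    // Each concrete node calls back the case method for its own class.
    abstract void apply(Analysis sw);
}

class Num extends SNode {
    final int value;
    Num(int value) { this.value = value; }
    @Override void apply(Analysis sw) { sw.caseNum(this); }
}

class Neg extends SNode {
    final SNode operand;
    Neg(SNode operand) { this.operand = operand; }
    @Override void apply(Analysis sw) { sw.caseNeg(this); }
}

// Adapter with do-nothing defaults, so subclasses override only what they need.
class Adapter implements Analysis {
    @Override public void caseNum(Num node) {}
    @Override public void caseNeg(Neg node) {}
}

class Printer extends Adapter {
    final StringBuilder out = new StringBuilder();

    @Override public void caseNum(Num node) {
        out.append(node.value);
    }

    @Override public void caseNeg(Neg node) {
        out.append("-(");
        node.operand.apply(this);   // traversal is explicit in this style
        out.append(")");
    }
}

class DoubleDispatchDemo {
    static String render(SNode n) {
        Printer p = new Printer();
        n.apply(p);
        return p.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(new Neg(new Num(7))));   // -(7)
    }
}
```

Because the adapter does not traverse on its own, each case method decides whether and in what order to visit children.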

DepthFirstAdapter
A subclass of AnalysisAdapter.
For each node type XYZ, it has two further methods:
public void inXYZ(XYZ node)
public void outXYZ(XYZ node)
The caseXYZ methods are implemented to perform a depth-first traversal of the AST, calling inXYZ before and outXYZ after visiting the children of each node.
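
A rough model of how in/out hooks wrap the traversal is sketched below. DNode, Leaf, Pair, DepthFirst and Trace are hypothetical names for this example only; the real adapter is generated from the AST grammar and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy tree; hand-written, not SableCC output.
abstract class DNode {
    abstract void apply(DepthFirst v);
}

class Leaf extends DNode {
    final String text;
    Leaf(String text) { this.text = text; }
    @Override void apply(DepthFirst v) { v.caseLeaf(this); }
}

class Pair extends DNode {
    final DNode left, right;
    Pair(DNode left, DNode right) { this.left = left; this.right = right; }
    @Override void apply(DepthFirst v) { v.casePair(this); }
}

// The case methods do the traversal; the in/out hooks are empty by default.
class DepthFirst {
    void inLeaf(Leaf node) {}
    void outLeaf(Leaf node) {}
    void inPair(Pair node) {}
    void outPair(Pair node) {}

    void caseLeaf(Leaf node) {
        inLeaf(node);
        outLeaf(node);
    }

    void casePair(Pair node) {
        inPair(node);
        node.left.apply(this);
        node.right.apply(this);
        outPair(node);
    }
}

// A client overrides only the hooks it cares about.
class Trace extends DepthFirst {
    final List<String> events = new ArrayList<>();
    @Override void inPair(Pair node) { events.add("in Pair"); }
    @Override void outPair(Pair node) { events.add("out Pair"); }
    @Override void inLeaf(Leaf node) { events.add("leaf " + node.text); }
}

class DepthFirstDemo {
    static List<String> trace(DNode n) {
        Trace t = new Trace();
        n.apply(t);
        return t.events;
    }

    public static void main(String[] args) {
        DNode tree = new Pair(new Leaf("a"), new Pair(new Leaf("b"), new Leaf("c")));
        // Prints: [in Pair, leaf a, in Pair, leaf b, leaf c, out Pair, out Pair]
        System.out.println(trace(tree));
    }
}
```

Work placed in an out hook runs after all children have been processed, which is what makes bottom-up computations such as evaluation fit this adapter.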

Pretty Printing (1/3)
import analysis.*;
import node.*;
import java.io.*;

public class PrettyPrint extends AnalysisAdapter {
    private PrintStream out;

    public PrettyPrint(PrintStream out) {
        this.out = out;
    }

    // Tokens are printed as their text.
    private void print(Token t) {
        print(t.getText());
    }

    // Other nodes are printed by dispatching to the matching case method.
    private void print(Node n) {
        if (n == null) print("<<<null>>>");
        else n.apply(this);
    }

Pretty Printing (2/3)
    private void print(Object o) {
        out.print(o.toString());
    }

    // Binary expressions are fully parenthesized.
    @Override
    public void caseABinopExp(ABinopExp binopexp) {
        out.print("(");
        print(binopexp.getL());
        print(binopexp.getBinop());
        print(binopexp.getR());
        out.print(")");
    }

    @Override
    public void caseAVarExp(AVarExp varexp) {
        print(varexp.getId());
    }

Pretty Printing (3/3)
    @Override
    public void caseAAddBinop(AAddBinop addbinop) {
        out.print("+");
    }

    @Override
    public void caseASubBinop(ASubBinop subbinop) {
        out.print("-");
    }

    @Override
    public void caseAMulBinop(AMulBinop mulbinop) {
        out.print("*");
    }

    @Override
    public void caseADivBinop(ADivBinop divbinop) {
        out.print("/");
    }
}

Evaluating the Expressions
A typical task for the DepthFirstAdapter
But it only has void methods...
We must add a value field to all AST nodes
(and other compiler phases will add further fields)
But this means changing all the generated files
What happens when they are regenerated?

Aspects
An aspect is in most ways similar to a class
It has only one instance, obtained with aspectOf()
Its new ability is to add to other classes:
• new fields
• new methods
• new interfaces
The AspectJ compiler weaves things together
The full AspectJ language is much more general

The Evaluate Aspect (1/2)
import analysis.*;
import node.*;

public aspect Evaluate extends DepthFirstAdapter {
    int x, y, z;

    public Evaluate setEnv(int x, int y, int z) {
        this.x = x;
        this.y = y;
        this.z = z;
        return this;
    }

    public int PExp.value; /* inject a value field into the PExp class */

    // Runs after both operands have been visited, so their values are set.
    @Override
    public void outABinopExp(ABinopExp binopexp) {
        binopexp.value = eval(binopexp.getL().value,
                              binopexp.getBinop().kindPBinop(),
                              binopexp.getR().value);
    }

The Evaluate Aspect (2/2)
    // Variables other than x, y, z keep the default value 0.
    @Override
    public void outAVarExp(AVarExp varexp) {
        String id = varexp.getId().getText();
        if (id.equals("x")) varexp.value = x;
        else if (id.equals("y")) varexp.value = y;
        else if (id.equals("z")) varexp.value = z;
    }

    int eval(int l, EBinop op, int r) {
        switch (op) {
            case ADD: return l + r;
            case SUB: return l - r;
            case MUL: return l * r;
            case DIV: return l / r;
        }
        return 0;
    }
}
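
The aspect needs the AspectJ compiler to run. As a plain-Java approximation of the same post-order idea, the sketch below keeps the computed values in a side table keyed by node identity instead of an injected field. This is a different technique from the one on the slides, shown only to make the evaluation order concrete; EExp, EVar, EBin and SideTableEval are hypothetical names.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical toy AST; hand-written, not SableCC output.
abstract class EExp {}

class EVar extends EExp {
    final String id;
    EVar(String id) { this.id = id; }
}

class EBin extends EExp {
    final EExp l; final char op; final EExp r;
    EBin(EExp l, char op, EExp r) { this.l = l; this.op = op; this.r = r; }
}

class SideTableEval {
    // Plays the role of the injected PExp.value field.
    private final Map<EExp, Integer> value = new IdentityHashMap<>();
    private final int x, y, z;

    SideTableEval(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }

    // Post-order: both children are evaluated before the parent's value is stored.
    void visit(EExp e) {
        if (e instanceof EBin) {
            EBin b = (EBin) e;
            visit(b.l);
            visit(b.r);
            value.put(b, eval(value.get(b.l), b.op, value.get(b.r)));
        } else {
            EVar v = (EVar) e;
            value.put(v, lookup(v.id));
        }
    }

    int valueOf(EExp e) { return value.get(e); }

    private int lookup(String id) {
        if (id.equals("x")) return x;
        if (id.equals("y")) return y;
        if (id.equals("z")) return z;
        return 0;   // unknown names evaluate to 0
    }

    private static int eval(int l, char op, int r) {
        switch (op) {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            case '/': return l / r;
            default:  return 0;
        }
    }

    static int run(EExp e, int x, int y, int z) {
        SideTableEval ev = new SideTableEval(x, y, z);
        ev.visit(e);
        return ev.valueOf(e);
    }

    public static void main(String[] args) {
        EExp e = new EBin(new EBin(new EVar("x"), '*', new EVar("y")), '+', new EVar("z"));
        System.out.println(run(e, 2, 3, 4));   // 10
    }
}
```

A side table leaves the node classes untouched, at the cost of a map lookup wherever the aspect version reads a field.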

Using Aspects and Adapters
import parser.*;
import lexer.*;
import node.*;
import java.io.*;

class Main {
    public static void main(String args[]) {
        try {
            Parser p =
                new Parser(
                    new Lexer(
                        new PushbackReader(new InputStreamReader(System.in))));
            int x, y, z;
            x = Integer.parseInt(args[0]);
            y = Integer.parseInt(args[1]);
            z = Integer.parseInt(args[2]);
            Start tree = p.parse();                        /* parse the input */
            PExp exp = tree.getExp();
            exp.apply(new PrettyPrint(System.out));        /* pretty print */
            exp.apply(Evaluate.aspectOf().setEnv(x,y,z));  /* evaluate */
            System.out.println(exp.value);
        }
        catch (Exception e) { System.out.println(e); }
    }
}

Manipulating ASTs
Desugaring: locally translate constructs into simpler forms
Weeding: reject unwanted ASTs
Transforming: rewrite sub-ASTs

An HTML Subset
HTML → word* |
       <a href="word"> HTML </a> |
       <b> HTML </b> |
       <i> HTML </i> |
       <em> HTML </em>

HTML in SableCC (1/2)
Helpers
  tab  = 9;
  cr   = 13;
  lf   = 10;
  char = ['a'..'z'] | ['A'..'Z'] | ['0'..'9'];

Tokens
  eol     = cr | lf | cr lf;
  blank   = ' ' | tab;
  starta  = '<a';
  href    = 'href';
  eq      = '=';
  quote   = '"';
  gt      = '>';
  enda    = '</a>';
  startb  = '<b>';
  starti  = '<i>';
  startem = '<em>';
  endb    = '</b>';
  endi    = '</i>';
  endem   = '</em>';
  word    = char+;

Ignored Tokens
  blank, eol;

HTML in SableCC (2/2)
Productions
  html = {word} word* |
         {a}    starta href eq [quote1]:quote word [quote2]:quote gt html enda |
         {b}    startb html endb |
         {i}    starti html endi |
         {em}   startem html endem;

Desugaring
View <em> as syntactic sugar for <i>
Just perform the translation during AST building:
Productions
  html {-> html} =
      {word} word*
             {-> New html.word([word])} |
      {a}    starta href eq [quote1]:quote word [quote2]:quote gt html enda
             {-> New html.a(word, html.html)} |
      {b}    startb html endb
             {-> New html.b(html.html)} |
      {i}    starti html endi
             {-> New html.i(html.html)} |
      {em}   startem html endem
             {-> New html.i(html.html)};

Abstract Syntax Tree
  html = {word} word* | {a} word html | {b} html | {i} html;

Weeding
Don't allow nested anchors
One solution is to rewrite the grammar:
HTML → word* |
       <a href="word"> HTMLNoAnchor </a> |
       <b> HTML </b> |
       <i> HTML </i> |
       <em> HTML </em>
HTMLNoAnchor → word* |
       <b> HTMLNoAnchor </b> |
       <i> HTMLNoAnchor </i> |
       <em> HTMLNoAnchor </em>

Combinatorial Explosion
We just doubled the size of the grammar
Enforcing 10 constraints like this makes the grammar 2^10 = 1024 times larger
And impossible to maintain...

A Weeding Phase
import node.*;
import analysis.*;

public class Weeding extends DepthFirstAdapter {
    int aHeight = 0;   // number of enclosing <a> elements

    @Override
    public void inAAHtml(AAHtml node) {
        if (aHeight > 0) System.out.println("Nested anchors");
        aHeight++;
    }

    @Override
    public void outAAHtml(AAHtml node) {
        aHeight--;
    }
}
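
The counter technique can be tried without SableCC on a toy tree. In the sketch below, WHtml and WeedSketch are hypothetical names; the if-blocks before and after the recursive call stand in for the inAAHtml and outAAHtml hooks, and errors are collected in a list rather than printed so that the result can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini HTML tree; NOT the SableCC-generated node classes.
class WHtml {
    final String tag;                                // "a", "b", "i" or "word"
    final List<WHtml> children = new ArrayList<>();

    WHtml(String tag, WHtml... kids) {
        this.tag = tag;
        for (WHtml k : kids) children.add(k);
    }
}

class WeedSketch {
    private int aHeight = 0;                         // number of enclosing <a> elements
    final List<String> errors = new ArrayList<>();

    void visit(WHtml n) {
        boolean anchor = n.tag.equals("a");
        if (anchor) {                                // corresponds to inAAHtml
            if (aHeight > 0) errors.add("Nested anchors");
            aHeight++;
        }
        for (WHtml c : n.children) visit(c);
        if (anchor) aHeight--;                       // corresponds to outAAHtml
    }

    static int countErrors(WHtml root) {
        WeedSketch w = new WeedSketch();
        w.visit(root);
        return w.errors.size();
    }

    public static void main(String[] args) {
        WHtml nested = new WHtml("a", new WHtml("b", new WHtml("a", new WHtml("word"))));
        WHtml siblings = new WHtml("b", new WHtml("a", new WHtml("word")),
                                        new WHtml("a", new WHtml("word")));
        System.out.println(countErrors(nested));     // 1
        System.out.println(countErrors(siblings));   // 0
    }
}
```

One integer of state replaces an entire duplicated set of grammar rules, and each further constraint adds a counter rather than doubling the grammar.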

Transformation
Eliminate nested <b> tags
Again, one solution is to rewrite the grammar:
HTML → word* |
       <a href="word"> HTML </a> |
       <b> HTMLInsideB </b> |
       <i> HTML </i> |
       <em> HTML </em>
HTMLInsideB → word* |
       <a href="word"> HTMLInsideB </a> |
       <b> HTMLInsideB </b> |        (ignore this <b> in the AST)
       <i> HTMLInsideB </i> |
       <em> HTMLInsideB </em>

Combinatorial Explosion
This also doubles the size of the grammar
Detecting 7 conditions like this makes the grammar 2^7 = 128 times larger
Combined with the earlier 10 constraints, the grammar is now 2^17 = 131,072 times larger, with nonterminals such as:
HTMLInsideBNotInsideINoAnchor...

A Transformation Phase
import node.*;
import analysis.*;

public class Transform extends DepthFirstAdapter {
    int bHeight = 0;   // number of enclosing <b> elements

    @Override
    public void inABHtml(ABHtml node) {
        if (bHeight > 0) node.replaceBy(node.getHtml());
        bHeight++;
    }

    @Override
    public void outABHtml(ABHtml node) {
        bHeight--;
    }
}
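
For experimenting with the transformation outside SableCC, here is a sketch on a toy tree that removes every <b> occurring inside another <b>, at any depth. It rebuilds the tree instead of rewriting it in place with replaceBy, so it is a variation on the phase above rather than a copy of it; whether the replaceBy version also reaches <b> elements nested more than two deep depends on how the generated traversal treats a node whose child has just been moved. THtml and UnnestB are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini HTML tree; NOT the SableCC-generated node classes.
class THtml {
    final String tag;                                // "a", "b", "i" or "word"
    final List<THtml> children = new ArrayList<>();

    THtml(String tag, THtml... kids) {
        this.tag = tag;
        for (THtml k : kids) children.add(k);
    }

    // Compact rendering used to inspect results; every word prints as "w".
    String show() {
        if (tag.equals("word")) return "w";
        StringBuilder sb = new StringBuilder("<" + tag + ">");
        for (THtml c : children) sb.append(c.show());
        return sb.append("</" + tag + ">").toString();
    }
}

class UnnestB {
    // Returns what should stand in n's place: usually a copy of n,
    // but for a <b> inside another <b>, just its rewritten children.
    static List<THtml> rewrite(THtml n, int bHeight) {
        boolean isB = n.tag.equals("b");
        int inner = isB ? bHeight + 1 : bHeight;

        List<THtml> kids = new ArrayList<>();
        for (THtml c : n.children) kids.addAll(rewrite(c, inner));

        if (isB && bHeight > 0) return kids;         // drop the redundant <b>

        THtml copy = new THtml(n.tag);
        copy.children.addAll(kids);
        List<THtml> result = new ArrayList<>();
        result.add(copy);
        return result;
    }

    static THtml unnest(THtml root) {
        return rewrite(root, 0).get(0);              // the root is never dropped
    }

    public static void main(String[] args) {
        THtml deep = new THtml("b", new THtml("b", new THtml("b", new THtml("word"))));
        THtml mixed = new THtml("b", new THtml("i", new THtml("b", new THtml("word"))));
        System.out.println(unnest(deep).show());     // <b>w</b>
        System.out.println(unnest(mixed).show());    // <b><i>w</i></b>
    }
}
```

Passing bHeight as a parameter plays the same role as the counter field incremented in inABHtml and decremented in outABHtml.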

An Outline Phase (1/2)
import node.*;
import analysis.*;

public class Outline extends DepthFirstAdapter {
    int indent = 0;   // current nesting depth

    String indentString() {
        String s="";
        for (int i=0; i<indent; i++) s=s+" ";
        return s;
    }

    public void inAWordHtml(AWordHtml node) {
        indent++;
        System.out.println(indentString()+node.toString());
    }

    public void outAWordHtml(AWordHtml node) {
        indent--;
    }

An Outline Phase (2/2)
    public void inAAHtml(AAHtml node) {
        System.out.println(indentString()+"a"+" "+node.getWord().toString());
        indent++;
    }

    public void outAAHtml(AAHtml node) { indent--; }

    public void inABHtml(ABHtml node) {
        System.out.println(indentString()+"b");
        indent++;
    }

    public void outABHtml(ABHtml node) { indent--; }

    public void inAIHtml(AIHtml node) {
        System.out.println(indentString()+"i");
        indent++;
    }

    public void outAIHtml(AIHtml node) { indent--; }
}

The Main Application
import parser.*;
import lexer.*;
import node.*;
import java.io.*;

class Main {
    public static void main(String args[]) {
        try {
            Parser p =
                new Parser(
                    new Lexer(
                        new PushbackReader(new InputStreamReader(System.in))));
            Start tree = p.parse();        /* parse the input */
            tree.apply(new Weeding());     /* check nested anchors */
            tree.apply(new Transform());   /* eliminate nested b tags */
            tree.apply(new Outline());     /* print an outline */
        }
        catch (Exception e) { System.out.println(e); }
    }
}