Abstract Syntax Trees Compiler Baojian Hua [email protected].

  • date post

    20-Dec-2015

Abstract Syntax Trees

Compiler

Baojian Hua

[email protected]

Front End

source code → lexical analyzer → tokens → parser → abstract syntax tree → semantic analyzer → IR

Recap

Lexer: program source to token sequence

Parser: token sequence in, answers Y or N (does the input conform to the grammar?)

Today’s topic: abstract syntax trees

Abstract Syntax Trees

Parse trees encode the grammatical structure of the source program

However, they contain a lot of unnecessary information

What is essential here?

Parse tree for 15 * (3 + 4):

        E
      / | \
     E  *  E
     |   / | \
    15  (  E  )
         / | \
        E  +  E
        |     |
        3     4

Abstract Syntax Trees

For the compiler to understand an expression, it only needs to know the operators and operands

Punctuation, parentheses, etc. are not needed

The same holds for statements, functions, etc.


Abstract Syntax Trees

Parse tree:

        E
      / | \
     E  *  E
     |   / | \
    15  (  E  )
         / | \
        E  +  E
        |     |
        3     4

Abstract syntax tree:

      Times
      /   \
  Int 15  Plus
          /  \
      Int 3  Int 4
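To make the point concrete, here is a minimal sketch in C of the Times / Plus / Int tree for 15 * (3 + 4). The names `struct node`, `eval` and `eval_example` are assumptions made for this illustration, not part of the slides. Evaluation is only used to show that the nesting of the tree carries the grouping, so no parenthesis node is needed.

```c
#include <stdio.h>

/* Hypothetical node kinds mirroring the Int / Plus / Times labels. */
enum kind { INT, PLUS, TIMES };

struct node {
    enum kind kind;
    int value;                 /* used when kind == INT */
    const struct node *left;   /* used by PLUS and TIMES */
    const struct node *right;
};

/* Evaluate a tree bottom-up; grouping comes from the nesting alone. */
int eval(const struct node *n) {
    switch (n->kind) {
    case INT:   return n->value;
    case PLUS:  return eval(n->left) + eval(n->right);
    case TIMES: return eval(n->left) * eval(n->right);
    }
    return 0; /* not reached for well-formed trees */
}

/* Build Times (Int 15, Plus (Int 3, Int 4)) from static nodes and evaluate it. */
int eval_example(void) {
    static const struct node fifteen = { INT, 15, NULL, NULL };
    static const struct node three   = { INT, 3, NULL, NULL };
    static const struct node four    = { INT, 4, NULL, NULL };
    static const struct node plus    = { PLUS, 0, &three, &four };
    static const struct node times   = { TIMES, 0, &fifteen, &plus };
    return eval(&times);
}
```

Calling `eval_example()` yields 105, the value of 15 * (3 + 4), even though the tree contains no node for the parentheses.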

Concrete and Abstract Syntax

Concrete syntax is needed for parsing: it includes punctuation symbols, factoring, and elimination of left recursion, and it depends on the format of the input

Abstract syntax is a simpler, more convenient internal representation: a clean interface between the parser and the later phases of the compiler

Concrete and Abstract Syntax

Concrete syntax:

E ::= E + T
    | T
T ::= T * F
    | F
F ::= id
    | num
    | ( E )

Parse tree for 2 + 3 * x:

S
  E
    E
      T
        F
          2
    +
    T
      T
        F
          3
      *
      F
        x

Concrete and Abstract Syntax

Abstract syntax:

E ::= id
    | num
    | E + E
    | E * E
    | ( E )

Abstract syntax tree for 2 + 3 * x:

      Plus
      /  \
  Int 2  Times
         /   \
     Int 3   Id x

AST Data Structures

In the compiler, abstract syntax uses the data structures of the implementation language to represent aspects of the grammatical structure

The design depends heavily on the target and implementation languages: more art than science

AST in SML

E ::= id
    | num
    | E + E
    | E * E
    | ( E )

(* data structures *)
datatype exp
  = Int of int
  | Id of string
  | Add of exp * exp
  | Times of exp * exp

(* to encode "2+3*x" *)
val prog = Add (Int 2, Times (Int 3, Id "x"))

(* Compile "2+3*x". To be covered later… *)
val x86 = compile (prog)

AST in SML

(* calculate number of nodes in an ast *)
fun numNodes e =
  case e
   of Int _ => 1
    | Id _ => 1
    | Add (e1, e2) =>
        1 + numNodes e1 + numNodes e2
    | Times (e1, e2) =>
        1 + numNodes e1 + numNodes e2

(* Note: this may be too inefficient. Why? *)

AST in SML

(* tail-recursion: the count so far is passed along in n *)
fun numNodes (e, n) =
  case e
   of Int _ => 1 + n
    | Id _ => 1 + n
    | Add (e1, e2) =>
        let val n' = numNodes (e1, n)
        in numNodes (e2, 1 + n')
        end
    | Times (e1, e2) => … (* similar *)

AST in SML

(* yet another version using reference *)
val nodes = ref 0;
val op ++ = fn x => x := !x + 1
fun numNodes e =
  case e
   of Int _ => ++ nodes
    | Id _ => ++ nodes
    | Add (e1, e2) => (numNodes e1; ++ nodes; numNodes e2)
    | Times (e1, e2) => … (* similar *)

AST in C

E ::= id
    | num
    | E + E
    | E * E
    | ( E )

/* data structures */
typedef struct exp *exp;
enum expKind {INT, ID, ADD, TIMES};
struct exp {
  enum expKind kind;
  union {
    int i;
    char *id;
    struct {exp e1; exp e2;} add;
    struct {exp e1; exp e2;} times;
  } u;
};

AST in C

/* sample program "2+3*x" */
exp e1 = malloc (sizeof (*e1));
e1->kind = INT;
e1->u.i = 3;
exp e2 = malloc (sizeof (*e2));
e2->kind = ID;
e2->u.id = "x";
exp e3 = malloc (sizeof (*e3));
e3->kind = TIMES;
e3->u.times.e1 = e1;
e3->u.times.e2 = e2;
…
/* really boring and error-prone :-( */
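One common way to tame the hand-written malloc code is to wrap each node kind in a small constructor function. The sketch below assumes the `struct exp` layout from the slides; the helper names `newNode`, `newInt`, `newId`, `newAdd`, `newTimes` and `sampleProgram` are hypothetical and chosen only for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct exp *exp;
enum expKind { INT, ID, ADD, TIMES };
struct exp {
    enum expKind kind;
    union {
        int i;
        char *id;
        struct { exp e1; exp e2; } add;
        struct { exp e1; exp e2; } times;
    } u;
};

/* Allocate a node of the given kind; abort if memory runs out. */
static exp newNode(enum expKind kind) {
    exp e = malloc(sizeof(*e));
    if (e == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    e->kind = kind;
    return e;
}

/* One constructor per alternative of the abstract grammar. */
exp newInt(int i)     { exp e = newNode(INT); e->u.i = i;   return e; }
exp newId(char *id)   { exp e = newNode(ID);  e->u.id = id; return e; }

exp newAdd(exp e1, exp e2) {
    exp e = newNode(ADD);
    e->u.add.e1 = e1;
    e->u.add.e2 = e2;
    return e;
}

exp newTimes(exp e1, exp e2) {
    exp e = newNode(TIMES);
    e->u.times.e1 = e1;
    e->u.times.e2 = e2;
    return e;
}

/* "2+3*x" now reads almost like the SML version. */
exp sampleProgram(void) {
    return newAdd(newInt(2), newTimes(newInt(3), newId("x")));
}
```

With constructors in place, each field is assigned in exactly one function, which removes the opportunity to write to the wrong node by mistake. Freeing the tree is left out of this sketch.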


AST in C

/* number of nodes again */
int numNodes (exp e) {
  switch (e->kind) {
  case INT: return 1;
  case ID: return 1;
  case ADD:   /* add and times have the same layout, so one case serves both */
  case TIMES:
    return 1 + numNodes (e->u.add.e1) + numNodes (e->u.add.e2);
  default:
    error ("impossible");
  }
}

Aha, C compiler is stupid!

AST in OO

E ::= id
    | num
    | E + E
    | E * E
    | ( E )

/* data structures */
abstract class Exp {}
class Int extends Exp {…}
class Id extends Exp {…}
class Add extends Exp {…}
class Times extends Exp {…}

/* to encode "2+3*x" */
Exp prog = new Add (new Int (2), new Times (new Int (3), new Id ("x")));

/* Not so ugly as C, but still boring */

AST in OO

/* number of nodes again */
int numNodes (Exp e) {
  if (e instanceof Int)
    return 1;
  else if (e instanceof Id)
    return 1;
  else if (e instanceof Add) {
    Add f = (Add)e;
    return 1 + numNodes (f.e1) + numNodes (f.e2);
  }
  …
}

AST Generation

ML-Yacc uses an attribute-grammar scheme:

each nonterminal may have a semantic value associated with it

when the parser reduces with (X ::= s1…sn), a semantic action is executed; it uses the semantic values of the symbols si

when parsing completes successfully, the parser returns the semantic value associated with the start symbol, usually an abstract syntax tree
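The scheme can be sketched in C with an explicit stack of semantic values. This is not how ML-Yacc is implemented; it is a hand-written illustration under simplifying assumptions: only operands carry a semantic value, the grammar is the ambiguous `e ::= e + e | e * e | num` with `*` binding tighter, and the parser's decisions are replayed by hand instead of being driven by a table. All names (`mk`, `shiftNum`, `reduceBinary`, `parseExample`) are hypothetical.

```c
#include <stdlib.h>

enum kind { NUM, ADD, TIMES };
struct exp {
    enum kind kind;
    int num;                /* used when kind == NUM */
    struct exp *e1, *e2;    /* used by ADD and TIMES */
};

static struct exp *mk(enum kind k, int num, struct exp *e1, struct exp *e2) {
    struct exp *e = malloc(sizeof(*e));
    if (e == NULL) exit(1);
    e->kind = k; e->num = num; e->e1 = e1; e->e2 = e2;
    return e;
}

/* Semantic-value stack. In this sketch only operands are pushed;
   operator tokens carry no value. */
static struct exp *stack[16];
static int top = 0;
static void push(struct exp *v) { stack[top++] = v; }
static struct exp *pop(void)    { return stack[--top]; }

/* Action for e ::= NUM: the semantic value is a leaf node. */
void shiftNum(int n) { push(mk(NUM, n, NULL, NULL)); }

/* Action for e ::= e op e: pop both operand trees, push the combined tree. */
void reduceBinary(enum kind op) {
    struct exp *right = pop();
    struct exp *left = pop();
    push(mk(op, 0, left, right));
}

/* Replay the actions for "2 + 3 * 4" in the order a bottom-up parser
   would run them when * binds tighter than +. */
struct exp *parseExample(void) {
    top = 0;
    shiftNum(2);
    shiftNum(3);
    shiftNum(4);
    reduceBinary(TIMES);   /* 3 * 4 is reduced first */
    reduceBinary(ADD);     /* then 2 + (3 * 4) */
    return pop();          /* value of the start symbol: the whole AST */
}
```

The value left on the stack when parsing completes is the tree Add (2, Times (3, 4)), which corresponds to the semantic value the parser returns for the start symbol.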

Attribute Grammars

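Bottom-up parse of 2 + 3 * 4 (parse stack on the left, remaining input on the right):

Stack           Remaining input
(empty)         2 + 3 * 4
2               + 3 * 4
factor          + 3 * 4
term            + 3 * 4
exp             + 3 * 4
exp +           3 * 4
exp + 3         * 4
exp + factor    * 4
exp + term      * 4

Each nonterminal is associated with a tree.

[Figure: parse tree for 2 + 3 * 4 over S, E, T and F; each nonterminal carries the tree built so far, from the leaves 2, 3 and 4 up to the * and + nodes.]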

Attribute Grammars

datatype exp
  = Id of string
  | Num of int
  | Add of exp * exp
  | Times of exp * exp

%%

%%

e -> e PLUS e   (Add (e1, e2))
   | e TIMES e  (Times (e1, e2))
   | ID         (Id ID)
   | NUM        (Num NUM)

Source Position

In a one-pass compiler, error messages are precise: errors are reported while the parser is still at the offending position, so early compilers never had to worry about this

But in a multi-pass compiler, source positions must be stored in the AST itself

(* Example *)
type pos = …
datatype exp
  = Int of int * pos
  | Id of string * pos
  | Add of exp * exp * pos
  | Times of exp * exp * pos
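The same idea can be sketched in C: each node stores a position, and a later pass uses it when reporting an error. Everything here is an assumption made for illustration: `struct pos` holds a line and column, `reportFirstId` stands in for a real semantic check by treating every identifier as undeclared, and the positions in `checkExample` are written by hand rather than produced by a lexer.

```c
#include <stdio.h>

struct pos { int line; int col; };

enum kind { INT, ID, ADD, TIMES };
struct exp {
    enum kind kind;
    struct pos pos;              /* where the construct starts in the source */
    int i;                       /* used when kind == INT */
    const char *id;              /* used when kind == ID */
    const struct exp *e1, *e2;   /* used by ADD and TIMES */
};

/* A stand-in for a later pass: report the first identifier found,
   using the position stored in its node. Returns 1 if a message was written. */
int reportFirstId(const struct exp *e, char *buf, size_t n) {
    switch (e->kind) {
    case INT:
        return 0;
    case ID:
        snprintf(buf, n, "%d:%d: undeclared identifier '%s'",
                 e->pos.line, e->pos.col, e->id);
        return 1;
    case ADD:
    case TIMES:
        return reportFirstId(e->e1, buf, n) || reportFirstId(e->e2, buf, n);
    }
    return 0;
}

/* "2 + 3 * x" on line 1; columns are 1-based and filled in by hand. */
int checkExample(char *buf, size_t n) {
    static const struct exp two   = { INT,   {1, 1}, 2, NULL, NULL, NULL };
    static const struct exp three = { INT,   {1, 5}, 3, NULL, NULL, NULL };
    static const struct exp x     = { ID,    {1, 9}, 0, "x",  NULL, NULL };
    static const struct exp times = { TIMES, {1, 7}, 0, NULL, &three, &x };
    static const struct exp add   = { ADD,   {1, 3}, 0, NULL, &two, &times };
    return reportFirstId(&add, buf, n);
}
```

Because the position travels with the node, the check can run long after parsing has finished and still point at line 1, column 9.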

Source Position

datatype exp
  = Id of string * pos
  | Num of int * pos
  | Add of exp * exp * pos
  | Times of exp * exp * pos

%%

%%

e -> e PLUS e   (Add (e1, e2, PLUSleft))
   | e TIMES e  (Times (e1, e2, TIMESleft))
   | ID         (Id (ID, IDleft))
   | NUM        (Num (NUM, NUMleft))

Labs

For lab #4, your job is to produce abstract syntax trees from source programs

We've offered a code skeleton; you should first familiarize yourself with it

Your job is to understand the "layout" function etc. and glue the parser together by adding semantic actions

Test your compiler carefully to make sure it parses the source programs correctly

Summary

Abstract syntax trees are the compiler's internal representation of source programs: the interface between the front end and the later parts of the compiler

Abstract syntax tree design is language-dependent, and more art than science