2 Compiler Overview

  • 8/2/2019 2 Compiler Overview

    Contents

    • Introduction
    • The Scanner
    • The Parser
    • The Semantic Analyzer
    • The Intermediate Code Generator
    • The Optimizer
    • The Code Generator

    Introduction

    What is a compiler?

    • A recognizer (of some source language L).
    • A translator (of programs written in L into programs written in some object or
      target language L').

    Here's a simple pictorial view:

        source program --> COMPILER --> object program
                               |
                               --> error messages

    A compiler is itself a program, written in some host language. (In cs536, students will
    implement a compiler for a simple source language using Java as the host language.)

    A compiler operates in phases; each phase translates the source program from one
    representation to another. Different compilers may include different phases, and/or
    may order them somewhat differently. A typical organization is shown below.

    source code
    (sequence of characters)
         ||
         ||
         \/
    ----------------------
    |  lexical analyzer  |
    |     (scanner)      |
    ----------------------
         ||
         ||  sequence of tokens
         \/
    ----------------------
    |  syntax analyzer   |
    |      (parser)      |
    ----------------------
         ||
         ||  abstract-syntax tree
         \/
    ----------------------
    | semantic analyzer  |
    ----------------------
         ||
         ||  augmented, annotated abstract-syntax tree
         \/
    ----------------------
    | intermediate code  |
    |     generator      |                      /\
    ----------------------                      ||
         ||                                  FRONT END
         ||  intermediate code    ----------------------------------
         \/                                  BACK END
    ----------------------                      ||
    |     optimizer      |                      \/
    ----------------------
         ||
         ||  optimized intermediate code
         \/
    ----------------------
    |        code        |
    |     generator      |
    ----------------------
         ||
         ||
         \/
    object program (might be assembly code or machine code)

    Below, we look at each phase of the compiler.

    The Scanner

    The scanner is called by the parser; here's how it works:

    • The scanner reads characters from the source program.
    • The scanner groups the characters into lexemes (sequences of characters that
      "go together").
    • Each lexeme corresponds to a token; the scanner returns the next token (plus
      maybe some additional information) to the parser.
    • The scanner may also discover lexical errors (e.g., erroneous characters).

    The definitions of what is a lexeme, token, or bad character all depend on the source
    language.

    Example

    Here are some Java lexemes and the corresponding tokens:

    lexeme:               ;           =       index  tmp    37       102
    corresponding token:  SEMI-COLON  ASSIGN  IDENT  IDENT  INT-LIT  INT-LIT

    Note that multiple lexemes can correspond to the same token (e.g., there are many
    identifiers).

    Given the source code:

        position = initial + rate * 60 ;

    a Java scanner would return the following sequence of tokens:

        IDENT ASSIGN IDENT PLUS IDENT TIMES INT-LIT SEMI-COLON

    Erroneous characters for Java source include # and control-a.
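
    The scanning steps described in this section can be sketched in a few lines. This is a
    minimal illustration, not the cs536 scanner: it is written in Python for brevity (the
    course uses Java), it covers only the six token kinds used in the example, and the
    function name scan and the regular expressions are assumptions made for this sketch.

```python
import re

# Token names follow the example above; the patterns are simplified assumptions
# (a real Java scanner handles many more tokens, e.g. "==" and keywords).
TOKEN_SPEC = [(name, re.compile(pattern)) for name, pattern in [
    ("INT-LIT", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SEMI-COLON", r";"),
]]


def scan(source):
    """Yield (token, lexeme) pairs; raise ValueError on an erroneous character."""
    pos = 0
    while pos < len(source):
        if source[pos].isspace():        # whitespace separates lexemes, yields no token
            pos += 1
            continue
        for token, regex in TOKEN_SPEC:
            match = regex.match(source, pos)
            if match:
                yield token, match.group()
                pos = match.end()
                break
        else:                            # no pattern matched: lexical error
            raise ValueError(f"bad character {source[pos]!r} at offset {pos}")


tokens = list(scan("position = initial + rate * 60 ;"))
print(" ".join(token for token, _ in tokens))
# prints: IDENT ASSIGN IDENT PLUS IDENT TIMES INT-LIT SEMI-COLON
```

    Each pair carries the lexeme as the "additional information" mentioned above, so the
    parser can tell one IDENT from another. Scanning a string that contains # raises
    ValueError in this sketch.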

    The Parser

    The parser:

    • Groups tokens into "grammatical phrases", discovering the underlying structure
      of the source program.
    • Finds syntax errors. For example, in Java the source code

          position = * 5 ;

      corresponds to the sequence of tokens:

          IDENT ASSIGN TIMES INT-LIT SEMI-COLON

      All are legal tokens, but that sequence of tokens is erroneous.
    • Might find some "static semantic" errors, e.g., a use of an undeclared variable,
      or variables that are multiply declared.
    • Might generate code, or build some intermediate representation of the program
      such as an abstract-syntax tree.

    Example

    source code:  position = initial + rate * 60 ;

    abstract-syntax tree:

                 =
                / \
        position   +
                  / \
           initial   *
                    / \
                rate   60

    Notes:

    • The interior nodes of the tree are operators.
    • A node's children are its operands.
    • Each subtree forms a "logical unit", e.g., the subtree with * at its root shows
      that because multiplication has higher precedence than addition, this operation
      must be performed as a unit (not initial+rate).
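
    One common way to obtain this tree shape is a recursive-descent parser with one
    function per precedence level. The sketch below is an illustration only (Python, with
    a hypothetical Parser class and a grammar limited to a single assignment statement);
    the notes do not say which parsing technique the course compiler uses. Tree nodes are
    plain tuples of the form (operator, left, right), and leaves are lexemes.

```python
class Parser:
    """Parses: IDENT ASSIGN expr SEMI-COLON, where * binds tighter than +."""

    def __init__(self, tokens):
        self.tokens = tokens             # list of (token, lexeme) pairs
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token}, found {self.peek()}")
        lexeme = self.tokens[self.pos][1]
        self.pos += 1
        return lexeme

    def statement(self):
        target = self.expect("IDENT")
        self.expect("ASSIGN")
        value = self.expr()
        self.expect("SEMI-COLON")
        return ("=", target, value)

    def expr(self):                      # expr -> term (PLUS term)*
        node = self.term()
        while self.peek() == "PLUS":
            self.expect("PLUS")
            node = ("+", node, self.term())
        return node

    def term(self):                      # term -> factor (TIMES factor)*
        node = self.factor()
        while self.peek() == "TIMES":
            self.expect("TIMES")
            node = ("*", node, self.factor())
        return node

    def factor(self):                    # factor -> INT-LIT | IDENT
        if self.peek() == "INT-LIT":
            return self.expect("INT-LIT")
        return self.expect("IDENT")


good = [("IDENT", "position"), ("ASSIGN", "="), ("IDENT", "initial"), ("PLUS", "+"),
        ("IDENT", "rate"), ("TIMES", "*"), ("INT-LIT", "60"), ("SEMI-COLON", ";")]
print(Parser(good).statement())
# prints: ('=', 'position', ('+', 'initial', ('*', 'rate', '60')))
```

    Because term is called from inside expr, the multiplication is grouped before the
    addition, which gives the subtree rooted at * shown above. Feeding the token sequence
    for position = * 5 ; to this sketch raises SyntaxError, since TIMES cannot start a
    factor.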

    The Semantic Analyzer

    The semantic analyzer checks for (more) "static semantic" errors, e.g., type errors. It
    may also annotate and/or change the abstract syntax tree (e.g., it might annotate each
    node that represents an expression with its type).

    Example:

    Abstract syntax tree before semantic analysis

                 =
                / \
               /   \
        position    +
                   / \
                  /   \
           initial     *
                      / \
                     /   \
                  rate    60

    Abstract syntax tree after semantic analysis

                 = (float)
                / \
               /   \
        position    + (float)
        (float)    / \
                  /   \
           initial     * (float)
           (float)    / \
                     /   \
                  rate    intToFloat (float)
                              |
                              |
                            60 (int)
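
    The annotation step can be sketched as a bottom-up walk over the tree. This is a
    hedged illustration in Python, not the course's analyzer: the symbol table SYMBOLS
    (which assumes all three variables were declared float), the tuple layout, and the
    rule that an int operand is widened whenever the other operand is float are all
    assumptions made for this example.

```python
# Assumed declarations; a real analyzer would build this table from the program.
SYMBOLS = {"position": "float", "initial": "float", "rate": "float"}


def to_float(node):
    """Wrap an int-typed node in an intToFloat conversion node."""
    return node if node[1] == "float" else ("intToFloat", "float", node)


def analyze(node):
    """Return an annotated copy of the tree; element 1 of every node is its type."""
    if isinstance(node, str):                     # leaf: literal or identifier
        if node.isdigit():
            return (node, "int")
        if node not in SYMBOLS:
            raise NameError(f"undeclared variable: {node}")
        return (node, SYMBOLS[node])
    op, left, right = node
    left, right = analyze(left), analyze(right)
    if op == "=":
        result = left[1]                          # assignment has the target's type
        if result == "int" and right[1] == "float":
            raise TypeError("cannot assign a float to an int variable")
    else:
        result = "float" if "float" in (left[1], right[1]) else "int"
    if result == "float":                         # insert conversions where needed
        left, right = to_float(left), to_float(right)
    return (op, result, left, right)


tree = ("=", "position", ("+", "initial", ("*", "rate", "60")))
annotated = analyze(tree)
print(annotated[3][3][3])
# prints: ('intToFloat', 'float', ('60', 'int'))
```

    The printed node is the right operand of *, matching the intToFloat node in the tree
    above. The NameError branch corresponds to the "use of an undeclared variable" error
    mentioned earlier.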

    The Intermediate Code Generator

    The intermediate code generator translates from abstract-syntax tree to intermediate
    code. One possibility is 3-address code (code in which each instruction involves at
    most 3 operands). Below is an example of 3-address code for the abstract-syntax tree
    shown above. Note that in this example, the second and third instructions each have
    exactly three operands (the location where the result of the operation is stored, and
    two source operands); the first and fourth instructions have just two operands
    ("temp1" and "60" for instruction 1, and "position" and "temp3" for instruction 4).

        temp1 = inttofloat(60)
        temp2 = rate * temp1
        temp3 = initial + temp2
        position = temp3
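
    A post-order walk of the tree is one simple way to produce code of this form: each
    interior node emits one instruction after its operands have been handled. The sketch
    below is an assumption-laden illustration in Python (the function name gen_3addr and
    the tuple tree layout are invented for this example), with type annotations left out
    and the conversion node written as ("inttofloat", operand).

```python
import itertools


def gen_3addr(tree):
    """Return a list of 3-address instructions (as strings) for an assignment tree."""
    code = []
    counter = itertools.count(1)

    def walk(node):
        """Emit code for node and return the name that holds its value."""
        if isinstance(node, str):                 # leaf: variable or literal
            return node
        if node[0] == "inttofloat":               # unary conversion node
            operand = walk(node[1])
            temp = f"temp{next(counter)}"
            code.append(f"{temp} = inttofloat({operand})")
            return temp
        op, left, right = node
        if op == "=":                             # assignment: copy into the target
            code.append(f"{left} = {walk(right)}")
            return left
        left_name, right_name = walk(left), walk(right)
        temp = f"temp{next(counter)}"             # numbered after the operands
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    walk(tree)
    return code


tree = ("=", "position", ("+", "initial", ("*", "rate", ("inttofloat", "60"))))
print("\n".join(gen_3addr(tree)))
# prints:
# temp1 = inttofloat(60)
# temp2 = rate * temp1
# temp3 = initial + temp2
# position = temp3
```

    Every interior node gets a fresh temporary, which is why the output contains the
    redundant temp1 and temp3 that a later phase can remove.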

    The Optimizer

    The optimizer tries to improve code generated by the intermediate code generator.
    The goal is usually to make code run faster, but the optimizer may also try to make the
    code smaller. In the example above, an optimizer might first discover that the
    conversion of the integer 60 to a floating-point number can be done at compile time
    instead of at run time. Then it might discover that there is no need for "temp1" or
    "temp3". Here's the optimized code:

        temp2 = rate * 60.0
        position = initial + temp2
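
    The two discoveries described above correspond roughly to constant folding and to
    merging a copy into the instruction that produced its source. The Python sketch below
    shows one way these could work on this example; it is not a general optimizer. The
    instruction layout (dest, op, arg1, arg2) and the "copy" opcode are assumptions, and
    the copy merge is only safe because each temporary here is used exactly once.

```python
def optimize(code):
    """Fold inttofloat of integer literals, then merge trailing copies of temps."""
    constants = {}                    # temp name -> literal known at compile time
    folded = []
    for dest, op, a, b in code:
        a, b = constants.get(a, a), constants.get(b, b)   # propagate known constants
        if op == "inttofloat" and a.isdigit():
            constants[dest] = str(float(int(a)))          # convert at compile time
            continue                                      # instruction no longer needed
        folded.append((dest, op, a, b))

    result = []
    for dest, op, a, b in folded:
        previous_defines_source = bool(result) and result[-1][0] == a
        if op == "copy" and a.startswith("temp") and previous_defines_source:
            # Assumes the temp has no other use: write the result to dest directly.
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dest, prev_op, prev_a, prev_b))
        else:
            result.append((dest, op, a, b))
    return result


unoptimized = [
    ("temp1", "inttofloat", "60", None),
    ("temp2", "*", "rate", "temp1"),
    ("temp3", "+", "initial", "temp2"),
    ("position", "copy", "temp3", None),
]
for dest, op, a, b in optimize(unoptimized):
    print(f"{dest} = {a} {op} {b}")
# prints:
# temp2 = rate * 60.0
# position = initial + temp2
```

    A production optimizer would first check how often each temporary is used before
    removing it; that analysis is omitted here.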

    The Code Generator

    The code generator generates object code from (optimized) intermediate code. For
    example, the following code might be generated for our running example:

            .data
    c1:
            .float  60.0
            .text
            l.s     $f0,rate
            mul.s   $f0,c1
            l.s     $f2,initial
            add.s   $f0,$f0,$f2
            s.s     $f0,position
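
    To show the general idea of this phase, here is a deliberately naive code generator
    sketch in Python. It is an assumption-heavy illustration, not the generator that
    produced the listing above: it handles only float + and *, always uses registers $f0
    and $f2, stores every temporary back to memory, and assumes each temporary is assigned
    once. The function name generate and the tuple instruction layout are invented for
    this example, and the variables rate, initial and position are assumed to be declared
    elsewhere.

```python
OPCODES = {"+": "add.s", "*": "mul.s"}


def generate(code):
    """Translate (dest, op, left, right) instructions into MIPS-style assembly lines."""
    data, text = [], []
    labels = {}                                   # float literal -> data label

    def address(operand):
        """Return a label for the operand, allocating one for new literals."""
        if operand[0].isdigit():
            if operand not in labels:
                labels[operand] = f"c{len(labels) + 1}"
                data.append(f"{labels[operand]}: .float {operand}")
            return labels[operand]
        return operand                            # variables and temps are labels already

    for dest, op, left, right in code:
        text.append(f"l.s $f0,{address(left)}")   # load both operands
        text.append(f"l.s $f2,{address(right)}")
        text.append(f"{OPCODES[op]} $f0,$f0,$f2") # compute into $f0
        if dest.startswith("temp"):
            data.append(f"{dest}: .float 0.0")    # reserve memory for the temporary
        text.append(f"s.s $f0,{dest}")            # store the result
    return [".data", *data, ".text", *text]


optimized = [("temp2", "*", "rate", "60.0"), ("position", "+", "initial", "temp2")]
print("\n".join(generate(optimized)))
```

    The output is longer than the listing above because temp2 is stored to memory and
    reloaded; keeping it in $f0, as the listing does, is the job of register allocation,
    which this sketch leaves out.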