Compiler II: Code GenerationThe Big Picture (Chapter 11) Keyboar d Jack Program Toke-nizer Parser...
Transcript of Compiler II: Code GenerationThe Big Picture (Chapter 11) Keyboar d Jack Program Toke-nizer Parser...
Foundations of Global Networked Computing:
Building a Modern Computer From First Principles
IWKS 3300: NAND to Tetris
Spring 2019
John K. Bennett
This course is based upon the work of Noam Nisan and Shimon Schocken.
More information can be found at (www.nand2tetris.org).
Compiler II: Code Generation
Where We Are
Assembler
Chapter 6
H.L. Language
&
Operating Sys.
abstract interface
Compiler
Chapters 10 - 11
VM Translator
Chapters 7 - 8
Computer
Architecture
Chapters 4 - 5
Gate Logic
Chapters 1 - 3 Electrical
EngineeringPhysics
Virtual
Machine
abstract interface
Software
Hierarchy
Assembly
Language
abstract interface
Hardware
Hierarchy
Machine
Code
abstract interface
Hardware
Platform
abstract interface
Chips &
Logic Gates
abstract interface
Human
Thought
Abstract design
Chapters 9, 12
We are still here
The Big Picture
(Chapter 11)
Keyboar
d
Jack
Program
Toke-
nizerParser
Code
Gene
-ration
Syntax Analyzer
Jack Compiler
VM
code
XML
code
(Chapter 10)
1. Syntax analysis: extracting the semantics from the source code
2. Code generation: expressing the semantics using the target languagethis
week
last
week
What We Did Last Week
void WriteXML(string tag, string value){
if (DoXML){
for (int i = 0; i < xmlIndent; i++) XMLFile.Write(" ");XMLFile.Write("<" + tag + "> ");// get rid of quotes around string constants// fix the XML special characters, if anyif (value == "") value = tokenizer.currentToken;if (tag == "symbol"){
value = value.Replace("&", "&");value = value.Replace("<", "<");value = value.Replace(">", ">");value = value.Replace("\"", """);
}XMLFile.Write(value);XMLFile.WriteLine(" </" + tag + ">");
}}
What We’ll Do This Week
...public VMWriter(StreamWriter VMFile){
VMOutFile = VMFile;}
public void writePush(string segment, int index){
if (validSegNames.ContainsKey(segment)){
VMOutFile.WriteLine(" push {0} {1}", segment, index);}else ReportError("Bad Segment Name", segment);
}
public void writePop(string segment, int index){
if (validSegNames.ContainsKey(segment)){
VMOutFile.WriteLine(" pop {0} {1}", segment, index);}else ReportError("Bad Segment Name", segment);
}...
Moving On to Code Generation
Class Bar {method Fraction foo(int y) {
var int temp; // a variablelet temp = (xxx+12)*-63;...
...
<varDec>
<keyword> var </keyword>
<keyword> int </keyword>
<identifier> temp </identifier>
<symbol> ; </symbol>
</varDec>
<statements>
<letStatement>
<keyword> let </keyword>
<identifier> temp </identifier>
<symbol> = </symbol>
<expression>
<term>
<symbol> ( </symbol>
<expression>
<term>
<identifier> xxx </identifier>
</term>
<symbol> + </symbol>
<term>
<int.Const.> 12 </int.Const.> </term>
</expression> ...
Syntax analyzer
The code generation challenge:
Program = a series of operations that
manipulate data
Compiler: converts each “understood”
(parsed) source operation and data item
into corresponding operations and data
items in the target language
Thus, we have to generate code for
o handling data
o handling operations
Our approach: morph the syntax
analyzer (Project 10) into a full-blown
compiler: instead of generating XML,
we’ll make it generate VM code.
The Jack Grammar
’x’: x appears verbatim
x: x is a language construct
x?: x appears 0 or 1 times
x*: x appears 0 or more times
x|y: either x or y appears
(x,y): x appears, then y.
Notation
The Jack Grammar (cont.)
’x’: x appears verbatim
x: x is a language construct
x?: x appears 0 or 1 times
x*: x appears 0 or more times
x|y: either x or y appears
(x,y): x appears, then y.
Memory Segments (review)
Where i is a non-negative integer and segment is one of the following:
static: holds values of global variables, shared by all functions in the same class
argument: holds values of the argument variables of the current function
local: holds values of the local variables of the current function
this: holds values of the private field variables of the current object
that: holds array values
constant: holds all the constants in the range 0 … 32767 (pseudo memory segment)
pointer: used to anchor this and that to various areas in the heap
temp: fixed 8-entry segment that holds temporary variables for general use;
Shared by all VM functions in the program.
VM memory Commands:
pop segment i
push segment i
“Standard” Hack Memory Map
R0R1R2R3R4
SP
LCL Base
ARG Base
THIS Base
THAT Base
R5R6R7R8R9
TEMP
R10R11R12R13R14
N/A
N/A
R15
N/AN/AN/A
N/A
Stack
Heap
Scrn. & Kbd.
0123456789
101112131415
256-2047 *
2048-16383 *
16384-24575
Stack PointerPtr. To LCL Vars (in Stack Frame)
Ptr. To Args (in Stack Frame)Pointer 0 (ptr to This segment)Pointer 1 (ptr to That segment)
Temp 0Temp 1Temp 2Temp 3Temp 4Temp 5Temp 6Temp 7
GP registerGP registerGP register
Stack (including LCL and ARG)Heap (including THIS and THAT)
Memory-Mapped I/O
N/A Static16-255 * static vars.
RAM[n] Register n Seg. Name Use
*The memory addresses in purple can be changed if there is a reason to do so.
Implementation of the VM’s Stack on the Hack RAM
working stack of
the current function
argument nArgs-1
ARG
saved state of the calling
function. Used by the VM
implementation to restore
the segments of the calling
function just after the
current function returns. saved THIS
saved ARG
saved returnAddress
saved LCL
local 0
local 1
. . .local nVars-1
argument 0
argument 1
. . .
frames of all the functions
up the calling chain
LCL
SP
saved THAT
local variables of
the current function
arguments pushed by the
caller for the current function
Global stack:
the entire RAM area dedicated
for holding the stack
Working stack (“stack frame”):
The stack that the current
function sees
At any point of time, only one
function (the current function)
is executing; other functions
may be waiting up the calling
chain
Shaded areas: irrelevant to the
current function
The current function sees only
the working stack, and has
access only to its memory
segments
The rest of the stack holds the
saved states of all the
functions up the calling
hierarchy.
Code Generation Example
method int foo() {var int x;let x = x + 1;...
<letStatement>
<keyword> let </keyword>
<identifier> x </identifier>
<symbol> = </symbol>
<expression>
<term>
<identifier> x </identifier>
</term>
<symbol> + </symbol>
<term>
<constant> 1 </constant> </term>
</expression>
</letStatement>
Syntax
analysis
(Assume x is the first local variable declared in the method)
push local 0
push constant 1
add
pop local 0
Code
generation
(we usually skip this part)
Handling Variables
When the compiler encounters a variable, say x, in the source code, it has to know:
What is x’s data type?
Primitive (int, bool or char), or user-defined (instance of a class)?
(We need to know this in order to properly allocate RAM resources for x’s
representation)
What kind of variable is x?
local, static, field, argument ?
( We need to know this in order to properly allocate x to the right memory
segment; this allocation also implies the variable’s life cycle, e.g., life of function
call or life of instance).
Handling Variables: mapping to memory segments (example)
When compiling this class, we have to create the following mappings:
The class variables nAccounts , bankCommission are mapped to static 0,1
The object fields id, owner, balance are mapped to this 0,1,2
The argument variables sum, BankAccount, when are mapped to arg 0,1,2
The local variables i, j, due are mapped to local 0,1,2
The target language uses 8 memory segments
Each memory segment, e.g. static,
is an indexed sequence of 16-bit values
that can be referred to as
static 0, static 1, static 2, etc.
Handling Variables: Symbol Tables
How the compiler uses symbol tables:
In a classic implementation, the compiler builds
several linked symbol tables, each reflecting a
single scope nested within the enclosing scope
Identifier lookup works from the current symbol
table back to the list’s head
We don’t have to work quite this hard…
Handling Variables: Symbol Table
class SymbolTable{
int staticIndex = 0;int fieldIndex = 0;int argIndex = 0;int varIndex = 0;
public static Dictionary<string, SymbolTableEntry> classSymbolTable =new Dictionary<string, SymbolTableEntry>() {};
public static Dictionary<string, SymbolTableEntry> subroutineSymbolTable =new Dictionary<string, SymbolTableEntry>() {};
public void startSubroutine(){
subroutineSymbolTable = new Dictionary<string, SymbolTableEntry>();argIndex = 0;varIndex = 0;
}
public void startClass(){
classSymbolTable = new Dictionary<string, SymbolTableEntry>();staticIndex = 0;fieldIndex = 0;
}...
Handling Variables: Symbol Table Entry
// In addition to the four possibilities of "KIND" of variable, (var, argument, static, field) the book// talks about a "Category." The possibilities for Category are "var, argument, static, field, class,// and subroutine" (page 243).// Instead of doing both, we just extend the KIND enum.public enum ST_KIND {STATIC, FIELD, ARG, VAR, CLASS_NAME, SUB_NAME, UND };
// The book also talks about the "Scope" of a variable// Scope can be "subroutine", "class", "instance", "program" or, in cases where variables of// the same name are redefined in different contexts, "subroutine + class", "subroutine + instance",// or "subroutine + program"// Scope is "None" if something goes wrong.
class SymbolTableEntry{
public string ST_TYPE; // int, bool, char or classnamepublic ST_KIND ST_KIND; // see abovepublic int ST_INDEX; // index into the segment named in KINDpublic string ST_SCOPE; // see above
public SymbolTableEntry(string type, ST_KIND kind, int index, string scope){
ST_TYPE = type;ST_KIND = kind;ST_INDEX = index;ST_SCOPE = scope;
}}
Managing Variable Life Cycle
Variable life cycle
static variables: single copy must be kept alive throughout the program duration
field variables: different copies must be kept for each object instance
local variables: created on subroutine entry, killed on exit
argument variables: similar to local variables.
The good news: the VM implementation
already handles almost all of these details!
class Complex {// Fields:field int re, im; // Real & Img. parts
.../** Constructs a new Complex number */constructor Complex new (int real, int img){
let re = real;let im = img;return this;
}...
}
class Foo {function void bla() {
var Complex a, b, c;...let a = Complex.new (5,17);let b = Complex.new (12,192);...let c = a; // Only reference is copied...
}
Jack code
Handling Objects: construction / memory allocation
How to compile: let a = Complex.new (5,17)
The compiler generates code to push
n, and then calls Memory.alloc*, where
n is the number of words necessary to
represent the object in question;
Memory.alloc is an OS function that
returns the base address of a free
memory block of size n words.
Following
compilation:
* The book’s Jack OS sometimes chokes on alloc(0), so always alloc at least one word.
Handling objects: accessing fields
class Complex {// Fields:field int re, im; // Real & Img. parts
.../** Constructs a new Complex number */constructor Complex new (int real, int img){
let re = real;let im = img;return this;
}... .../** Multiplies this Complex number
by the given scalar */method void mult (int c) {
let re = re * c;let im = im * c;return;
}...
}
Jack code
*(this + 1) = *(this + 1)times(argument 0)
How to compile:
let im = im * c
1. Look up these two variables in
the symbol table:
im:
Segment: field
Index: 1
c:
Segment: arg
Index: 0
2. Generate VM code that
implements this pseudo-code :
Assume that b and r were passed to a
function as its first two arguments, and
the code let b.radius = r; appears in
the function. How do we compile?
// Get b's base address:
push argument 0
// Point the this segment to b:
pop pointer 0
// Get r's value (r is an arg)
push argument 1
// Set b's third field to r:
pop this 2
120
80
50radius:
x:
y:
3color:120
80
50
3012
3013
3014
33015
412 3012
...
...
High level program view RAM view
0
...
b
following
compilationb
object
b
object(Actual RAM locations of program variables are
run-time dependent, and thus the addresses shown
here are arbitrary examples.)
00
1
Virtual memory segments just before
the operation b.radius=17:
3012
17
0
1
......
120
80
17
0
1
2
30120
1
3
3012
17
0
1
argument pointer this
...
3
(this 0is nowalligned withRAM[3012])
...
Virtual memory segments just after
the operation b.radius=17:
argument pointer this
Handling Objects: establishing access to the object’s fields
Background: Suppose we have an object
named b of type Ball. A Ball has x,y
coordinates, a radius, and a color.
r r
Handling objects: Method Calls
General rule: each method call
foo.bar(v1,v2,...)
is translated into:
push foopush v1push v2...
call bar
class Complex {field int re, im; // Real & Img. parts
/** Constructs a new Complex number */constructor Complex new (int real, int img) {
let re = real;let im = img;return this;
}/** Multiplies this Complex number
by the given scalar */method void mult (int c) {
let re = re * c;let im = im * c;return;
}
...
}
class Foo {
...
function void bla() {
var Complex x;
...
let x = Complex.new (1,2);
do x.mult(5);
...
}
}
Jack code
push x
push 5
call mult
How to compile:
x.mult(5);
We will view this method call as:
mult(x,5)
…and generate the following code:
class Main {...function void foo(int k) {
var int x, y;var Array bar; // declare an array...// Construct the array:let bar = Array.new (10);...let bar[k] = 19;
}...do Main.foo(2); // Call the foo method...
Jack code
How to compile:
let bar = Array.new (10);
Generate call to Array constructor, which in turn calls Memory.alloc
Handling Arrays: declaration / construction
19
4315
4316
4317
4324
(bar array)
...4318
...
...
4315
...0
bar
x
y
2 k
(local 0)
(local 1)
(local 2)
(argument 0)
275
276
277
504
RAM state
...
Following
execution:
class Main {...function void foo(int k) {
var int x, y;var Array bar; // declare an array...// Construct the array:let bar = Array.new (10);...let bar[k] = 19;
}...do Main.foo(2); // Call the foo method...
Jack code
How to compile: let bar[k] = 19;
// bar[k]=19, or *(bar+k)=19push barpush kadd// Use a pointer to access x[k]pop addr // addr points to bar[k]push 19pop *addr // Set bar[k] to 19
VM Code (pseudo)
// bar[k]=19, or *(bar+k)=19push local 2push argument 0add// Use the that segment to access x[k]pop pointer 1push constant 19pop that 0
VM Code (actual)
19
4315
4316
4317
4324
(bar array)
...4318
...
...
4315
...0
bar
x
y
2 k
(local 0)
(local 1)
(local 2)
(argument 0)
275
276
277
504
RAM state, just after executing bar[k] = 19
...
Following
execution:
Handling arrays: accessing an array entry by its index
syntax
analysis
parse tree
Handling Expressions
((5+z)/-8)*(4+2)
Jack code
push 5
push z
add
push 8
neg
call div
push 4
push 2
add
call mult
code
generation
To generate VM code from a parse tree exp, use the following logic:
The codeWrite(exp) algorithm:
if exp is a constant n then output "push n"
if exp is a variable v then output "push v"
if exp is op(exp1) then codeWrite(exp1); output "op";
if exp is (exp1 op exp2) then codeWrite(exp1); codeWrite(exp2); output "op";
if exp is func (exp1, ..., expn) then codeWrite(exp1); ... codeWrite(exp1); output "call func";
Pseudo VM code
+
CompileExpression Example
void CompileExpression(string expNextChar)// we use expNextChar to let us know when we are done with the (op term)* cycle, e.g., ‘)’{
WriteXMLTag("expression");bool doneWithExpression = false;CompileTerm(); // first term on stackwhile (GetNextToken()){
if ((tokenizer.currentToken == "+") || (tokenizer.currentToken == "-") || (tokenizer.currentToken == "*") || (tokenizer.currentToken == "/") ||(tokenizer.currentToken == "&") || (tokenizer.currentToken == "|") || (tokenizer.currentToken == "<") || (tokenizer.currentToken == ">") ||(tokenizer.currentToken == "="))
{WriteXML("symbol", tokenizer.currentToken);string op = tokenizer.currentToken;CompileTerm(); // next term on stack, then process opif (op == "*") vmWriter.writeCall("Math.multiply", 2);else if (op == "/") vmWriter.writeCall("Math.divide", 2);else vmWriter.writeArithmetic(op);continue;
}else if (tokenizer.currentToken == expNextChar) // we’re done{
doneWithExpression = true;break;
}else ReportError((expNextChar + " or term (op Term)*"), tokenizer.currentToken);
}WriteXMLTag("/expression");if (doneWithExpression) WriteXML("symbol", tokenizer.currentToken);
}
Handling Program Flow
if (cond)
s1
else
s2
...
High-level code
VM code to compute and push !(cond)
if-goto L1
VM code for executing s1
goto L2
label L1
VM code for executing s2
label L2
...
Pseudo VM code
code
generation
while (cond)
s
...
High-level code
label L1
VM code to compute and push !(cond)
if-goto L2
VM code for executing s
goto L1
label L2
...
Pseudo VM code
code
generation
CompileWhileStatement Example
void ComplileWhileStatement(){
WriteXMLTag("whileStatement"); // only for Project 10WriteXML("keyword", "while"); // only for Project 10
string whileCondLabel = NextLabel("WHILE_EXP");string whileEndLabel = NextLabel("WHILE_END");
vmWriter.writeLabel(whileCondLabel);
expect("symbol", "(");CompileExpression(")"); // compute while conditionvmWriter.writeArithmetic("~"); // logically negate itvmWriter.writeIf(whileEndLabel);
expect("symbol", "{");// body of while loopCompileStatements(); vmWriter.writeGoto(whileCondLabel);vmWriter.writeLabel(whileEndLabel);
expect("symbol", "}");WriteXMLTag("/whileStatement"); // only for Project 10
}
Handling Labels Like the Book’s Compiler
private string NextLabel(string prefix){
int labelNumber;switch (prefix){
case "WHILE_EXP":WHILE_EXP_labelNumber += 1;labelNumber = WHILE_EXP_labelNumber;break;
case "WHILE_END":WHILE_END_labelNumber += 1;labelNumber = WHILE_END_labelNumber;break;
case "IF_TRUE":IF_TRUE_labelNumber += 1;labelNumber = IF_TRUE_labelNumber;break;
case "IF_FALSE":IF_FALSE_labelNumber += 1;labelNumber = IF_FALSE_labelNumber;break;
case "IF_END":IF_END_labelNumber += 1;labelNumber = IF_END_labelNumber;break;
default:labelNumber = 99999;break;
}return (prefix + (labelNumber.ToString()));
}
VMWriter Examples
public void writeLabel(string label)
{
VMOutFile.WriteLine("label {0}", label);
}
public void writeGoto(string label)
{
VMOutFile.WriteLine(" goto {0}", label);
}
public void writeIf(string label)
{
VMOutFile.WriteLine(" if-goto {0}", label);
Function Example
transfer 3transfer
// balance// sum
// this// sum
// balance
CompileReturn Example
void CompileReturnStatement(){
// Compiles <return-statement > := 'return' < expression >? ';'
WriteXMLTag("returnStatement"); // Chap 10 onlyWriteXML("keyword", "return"); // Chap 10 only
if (GetNextToken()){
if (tokenizer.currentToken == ";") //no return expression{
WriteXML("symbol", ";"); // Chap 10 onlyvmWriter.writePush("constant", 0); // see book pg. 235;
// void returns push 0 onto stack}else{
peekedAhead = true;CompileExpression(";"); // compile an expression or look for a ";"// leaving result as return value on stack
}vmWriter.writeReturn();
}else ReportError("return statement", "premature EOF");WriteXMLTag("/returnStatement"); // Chap 10 only
}
Special Handling of Methods and Constructors
…// when we get here we have processed all args and all var decs (if any)vmWriter.writeFunction(Working_Class + "." + Working_Subroutine,
symbolTable.VarCount(ST_KIND.VAR));
if (subKind == ST_KIND.METHOD_NAME){// Insert VM code that sets the base of the "this" seg. to the object passed as arg 0
vmWriter.writePush("argument", 0);vmWriter.writePop("pointer", 0);
}else if (subKind == ST_KIND.CONSTRUCTOR_NAME){// Allocate a memory block for the new object (and its field variables),// and set the base of the "this" segment to point to the new object's base// The Jack OS reportedly chokes on alloc(0), so always alloc at least one word.// if OS alloc gets fixed, change this
vmWriter.writePush("constant",Math.Max(symbolTable.VarCount(ST_KIND.FIELD),1));
vmWriter.writeCall("Memory.alloc", 1);vmWriter.writePop("pointer", 0);
}
Compilation Engine
The CompilationEngine effects the actual compilation output.
• It gets its input from the Jack Tokenizer and emits its parsed structure into an output filestream.
• The output is generated by a series of compilexxx() routines, one for every syntactic element xxx of the Jack grammar.
• The contract between these routines is that each compilexxx() routine should read the syntactic construct xxx from the input, advance() the tokenizer exactly beyond xxx, and output the parsing of xxx. Thus, compilexxx() may only be called if indeed xxx is the next syntactic element of the input.
• In the first version of the compiler (Project 10), this module emits a structured printout of the code, wrapped in XML tags (defined in the specs of Project 10). In the final version of the compiler, this module generates executable VM code (defined in the specs of Project 11).
• In both cases, the parsing logic and module API are exactly the same.
Compilation Engine (cont.)
CompilationEngine (cont.)
CompilationEngine (cont.)
Perspective
Jack simplifications that would be challenging to extend:
Limited primitive type system
No inheritance
No public class fields, e.g. must use r = c.getRadius()rather than r = c.radius
Jack simplifications that are easy to extend: :
Limited control structures, e.g. no for, switch, case, continue, break …
Cumbersome handling of char types, e.g. cannot use let x=‘c’
++, --, <=, >=, ==, etc.
Optimizations
Code, e.g., c=c+1 is translated inefficiently into push c, push 1, add, pop c.
Parallel processing
Many other possible improvements …