
Abdullah Sheneamer 2012

DCSPM: Develop and Compile Subset of PASCAL Language to MSIL

By

Abdullah Sheneamer

A project submitted to the Faculty of Graduate School of the

University of Colorado at Colorado Springs

in Partial Fulfillment of the Requirements

for the Degree of

Master of Science in Computer Science

Department of Computer Science

Fall 2012


© Copyright by Abdullah Sheneamer 2012

All Rights Reserved


This project for the Master of Science degree by

Abdullah Sheneamer

has been approved for the

Department of Computer Science

By

_______________________________________________________ Dr. Albert Glock, Advisor

_______________________________________________________ Dr. C. Edward Chow, Committee member

_______________________________________________________ Albert Brouillette, Committee member

_______________________________ Date


Abstract

The focus of this project is to design the Microsoft Intermediate Language (IL or MSIL) translation for the PASCAL language. The project aims to design a compiler that can compile a program written in a subset of PASCAL to MSIL, including the assignment statement, write-line instructions, the if statement, the if/else statement, the while statement, the for statement, the switch statement, the if logic statement, and the one-dimensional array. Because compilation time is important, we evaluated different implementations for their speed in the lexical analysis and parser phases, which can become a bottleneck. First, I built the first phase, lexical analysis, which reads the source code and produces tokens that are passed to the parser, the second phase. I built the MSIL generation for PASCAL inside the parser phase, so that when the parser finishes, the MSIL has been generated.


Acknowledgements

I would never have been able to finish my dissertation without the guidance of my committee members and the support of my family and my wife.

I would like to express my deepest gratitude to my advisor, Dr. Albert Glock, for his excellent guidance, care, and patience, and for providing me with an excellent atmosphere for doing research.

I offer my sincerest gratitude to Dr. Edward Chow, who let me experience research on practical issues beyond the textbooks, patiently corrected my writing, and gave me important questions and ideas through his comments on my proposal.

I would also like to thank Albert Brouillette for his interest in the success of my proposal and for the important questions and ideas he gave me through his comments on it.


Contents

1 Introduction
1.1 Motivation
2 Background
2.1 Overview of Compilation Process
2.2 History
3 Design
3.1 Introduction to Symbol Table and Lexical Analysis
3.1.1 Symbol Table Design
3.1.2 Lexical Analysis Design
3.2 Parser and MSIL (Microsoft Intermediate Language) of PASCAL Language Design
3.2.1 Introduction to Parser (Syntax Analysis)
3.2.2 Parser (Syntax Analysis) Design
3.2.3 Introduction to MSIL (Microsoft Intermediate Language)
3.2.4 Intermediate language Instructions
3.2.5 MSIL (Microsoft Intermediate Language) Design
3.2.6 Design Common Syntax Errors Table
4 Implementation
5 Improvements and Evaluations
5.1 Improvements
5.1.1 Lexical Analysis Improvement
5.1.2 Microsoft Intermediate Language (MSIL) of If Statement Improvement
5.2 Evaluations and performance
6 Lessons Learned
7 Future Works and conclusion
8 References
Appendix A: PASCAL Grammar BNF
Appendix B: Installing Visual C# 2010 Express Edition
Appendix C: How to use DCSPM Compiler


List of Figures

Figure 1: The compilation and execution process of PASCAL programs
Figure 2: Compilation process
Figure 3: A Compiler
Figure 4: Class Token in Lexical Analysis
Figure 5: State diagram for the lexical Analyzer (states 0,1,2)
Figure 6: State diagram for the lexical Analyzer (states 3, 4, 5)
Figure 7: Syntax Tree
Figure 8: Typical Data Structure for the given Syntax Tree
Figure 9: Steps in the top-down construction of Parse Tree
Figure 10: Method memory categories
Figure 11: Application Code using .NET
Figure 12: JIT Compilation
Figure 13: NET CLR
Figure 14: Array list data structure vs. Dictionary data structure
Figure 15: Parser phase results
Figure 16: unimproved and improved IF/Else MSIL results
Figure 17: Benchmark between size files of unimproved and improved IF/Else.il
Figure 18: How Branches of If/else statements logic works


List of Tables

Table 1: Part of Symbol Table
Table 2: Two Characters Tokens
Table 3: Array list data structure vs. Dictionary data structure
Table 4: Complexity of ArrayList vs. Dictionary
Table 5: Parser phase results
Table 6: benchmark between unimproved and improved IF/Else MSIL
Table 7: benchmark between unimproved and improved IF/Else.il files


1 Introduction

In the computer world, techniques evolve rapidly across theories, algorithms, programming languages, software systems, and software engineering.

“Programming languages are notations for describing computations to people and to

machines. The world as we know it depends on programming languages, because all the software

running on all the computers was written in some programming language. But, before a program

can be run, it first must be translated into a form in which it can be executed by a computer. The

software systems that do this translation are called compilers.” [6]

Fortunately, compilers allow programmers to write at a high level, while automated processing takes care of creating the machine-specific instructions. My project designs and creates a compiler that translates PASCAL source code into Microsoft Intermediate Language (MSIL). When source code is compiled to managed code in the .NET environment, the compiler translates the source into MSIL. MSIL includes instructions for loading, storing, initializing, and calling methods on objects, as well as instructions for arithmetic and logical operations. There is currently no PASCAL compiler that compiles to MSIL. The just-in-time (JIT) compiler converts the MSIL to CPU-specific code [1].

Compiling to MSIL has three advantages: 1) legacy PASCAL can now be run on modern machines, 2) MSIL is platform independent, and 3) JIT compilers can be optimized for specific machines and architectures. The JIT compiler can also perform aggressive optimizations specifically for the machine on which the code is running.

“Before you can run Microsoft intermediate language (MSIL), it must be converted by

a .NET Framework just-in-time (JIT) compiler to native code, which is CPU-specific code that

runs on the same computer architecture as the JIT compiler. Because the common language


runtime supplies a JIT compiler for each supported CPU architecture, developers can write a set

of MSIL that can be JIT-compiled and run on computers with different architectures. However,

your managed code will run only on a specific operating system if it calls platform-specific

native APIs, or a platform-specific class library.

JIT compilation takes into account the fact that some code might never get called during

execution. Rather than using time and memory to convert all the MSIL in a portable executable

(PE) file to native code, it converts the MSIL as needed during execution and stores the resulting

native code so that it is accessible for subsequent calls. The loader creates and attaches a stub to

each of a type's methods when the type is loaded. On the initial call to the method, the stub

passes control to the JIT compiler, which converts the MSIL for that method into native code and

modifies the stub to direct execution to the location of the native code. Subsequent calls of the

JIT-compiled method proceed directly to the native code that was previously generated, reducing

the time it takes to JIT-compile and run the code.

The runtime supplies another mode of compilation called install-time code generation.

The install-time code generation mode converts MSIL to native code just as the regular JIT

compiler does, but it converts larger units of code at a time, storing the resulting native code for

use when the assembly is subsequently loaded and run. When using install-time code generation,

the entire assembly that is being installed is converted into native code, taking into account what

is known about other assemblies that are already installed. The resulting file loads and starts

more quickly than it would have if it were being converted to native code by the standard JIT

option.

As part of compiling MSIL to native code, code must pass a verification process unless

an administrator has established a security policy that allows code to bypass verification.


Verification examines MSIL and metadata to find out whether the code is type safe, which

means that it only accesses the memory locations it is authorized to access. Type safety helps

isolate objects from each other and therefore helps protect them from inadvertent or malicious

corruption. It also provides assurance that security restrictions on code can be reliably enforced.

The runtime relies on the fact that the following statements are true for code that is verifiably

type safe:

A reference to a type is strictly compatible with the type being referenced.

Only appropriately defined operations are invoked on an object.

Identities are what they claim to be.

During the verification process, MSIL code is examined in an attempt to confirm that the

code can access memory locations and call methods only through properly defined types. For

example, code cannot allow an object's fields to be accessed in a manner that allows memory

locations to be overrun. Additionally, verification inspects code to determine whether the MSIL

has been correctly generated, because incorrect MSIL can lead to a violation of the type safety

rules. The verification process passes a well-defined set of type-safe code, and it passes only

code that is type safe. However, some type-safe code might not pass verification because of

limitations of the verification process, and some languages, by design, do not produce verifiably

type-safe code. If type-safe code is required by security policy and the code does not pass

verification, an exception is thrown when the code is run.” [12]


-Compilation process: takes PASCAL source code and produces MSIL. The PASCAL compiler includes lexical and syntax analysis and the creation of the symbol table. MSIL is created when compiling to managed code. MSIL is a CPU-independent set of instructions that can be efficiently converted to native code, as shown in Figure 2.

-Execution process: MSIL must be converted to CPU-specific code, usually by a just-in-time (JIT) compiler. Native code is code that is compiled to run on a particular processor (such as an Intel x86-class processor) and its instruction set.

[Figure 1 diagram. Compilation: PASCAL source → PASCAL Compiler → MSIL. Execution: MSIL → JIT Compiler → Native Code.]

Example PASCAL source shown in the figure:

    Program HelloWorld;
    Begin
      Writeln('Hello World');
    End.

MSIL shown in the figure:

    .method public static void Main() cil managed
    {
      .entrypoint
      .maxstack 1
      IL_00: ldstr "Hello World"
      IL_05: call void [mscorlib]System.Console::WriteLine(string)
      IL_10: ret
    } // end of method HelloWorld::Main

Figure 1: The compilation and execution process of PASCAL programs


1.1 Motivation

“During compilation of MSIL, the source code is translated into MSIL code rather than platform or processor-specific object code. MSIL is a CPU- and platform-independent instruction set that can be executed in any environment supporting the Common Language Infrastructure, such as the .NET runtime on Windows, or the cross-platform Mono runtime. In theory, this eliminates the need to distribute different executable files for different platforms and CPU types. MSIL code is verified for safety during runtime, providing better security and reliability than natively compiled executable files.” [13]

[Figure 2 diagram. Components shown: Source code of PASCAL, Lexical Analysis, Parser & MSIL, Error Handler, Symbol Table, and the generated MSIL (the same HelloWorld Main method as in Figure 1).]

Figure 2: Compilation process

Because there is currently no PASCAL compiler that compiles to MSIL, I designed the MSIL translation for a subset of the PASCAL language. Compiling to MSIL has the advantages that 1) legacy PASCAL can now be run on modern machines, 2) MSIL is platform independent, and 3) JIT compilers can be optimized for specific machines and architectures.

2 Background

2.1 Overview of Compilation Process

A compiler is a program that can read a program in one language (the source language) and translate it into an equivalent program in another language, as shown in Figure 3. An important role of the compiler is to report any errors in the source program that it detects during the translation process [6].

“Microsoft Intermediate Language (MSIL) is a language used as the output of a number

of compilers (C#, VB.NET, and so forth). The ILDasm (Intermediate Language Disassembler)

tool that ships with the .NET Framework SDK (FrameworkSDK\Bin\ildasm.exe) allows the user

to see MSIL code in human-readable format. By using this utility, we can open any .NET

executable file (EXE or DLL) and see MSIL code.

[Figure 3 diagram: Source Program → Compiler → Target Program]

Figure 3: A Compiler

The ILAsm  (Intermediate Language Assembler) tool generates an executable file from

the MSIL language. We can find this program in the WINNT\Microsoft.NET\Framework\

vn.nn.nn directory. Any PASCAL programmer starting with .NET development is interested in

what happens in the low level of the .NET Framework. Learning MSIL gives a user the chance

to understand some things that are hidden from a programmer working with PASCAL or another

language. Knowing MSIL gives more power to a .NET programmer. We never need to write

programs in MSIL directly, but in some difficult cases it is very useful to open the MSIL code in

ILDasm and see how things are done” [14].

2.2 History

“Pascal is an influential imperative and procedural programming language, designed in 1968–1969 and published in 1970 by Niklaus Wirth as a small and efficient language intended to encourage good programming practices using structured programming and data structuring. A derivative known as Object Pascal, designed for object-oriented programming, was developed in 1985.

Pascal, named in honor of the French mathematician and philosopher Blaise Pascal, was developed by Niklaus Wirth and based on the ALGOL programming language. Prior to his work on Pascal, Wirth had developed Euler and ALGOL W, and he later went on to develop the Pascal-like languages Modula-2 and Oberon.

Initially, Pascal was largely, but not exclusively, intended to teach students structured

programming. A generation of students used Pascal as an introductory language in undergraduate

courses. Variants of Pascal have also frequently been used for everything from research projects


to PC games and embedded systems. Newer Pascal compilers exist which are widely used” [15].

Grace Murray Hopper coined the term compiler in the early 1950s. Translation was

viewed as the “compilation” of a sequence of machine language subprograms selected from a

library. One of the first real compilers was the FORTRAN compiler of the late 1950s. It allowed

a programmer to use a problem-oriented source language. Ambitious “optimizations” were used

to produce efficient machine code, which was vital for early computers with quite limited

capabilities. Efficient use of machine resources is still an essential requirement for modern

compilers [16].

3 Design

3.1 Introduction to Symbol Table and Lexical Analysis

A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier (information about storage allocation, type, etc.). When the lexical analyzer detects an identifier in the source, the identifier is entered into the symbol table. However, its attributes are entered in the following phases. These attributes are also used in later phases.

The first phase of a compiler is called lexical analysis or scanning. The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.


3.1.1 Symbol Table Design

Every keyword is a token and has a unique integer code, as shown in Table 1:

Table 1: Part of Symbol Table

    Token Code    Keyword
    300           Begin
    323           If
    302           For
    305           Switch
    376           While

The identifier token has the code 256, the number token has the code 257, and every special character is a token whose integer code equals its ASCII number. Two-character tokens have unique token codes, as shown in Table 2:

Table 2: Two Characters Tokens

    Token Code    Two-Character Token
    406           !=
    407           ==
    408           <=
    409           >=
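The coding scheme above can be sketched as a small lookup function. This is a minimal illustration in Python rather than the project's own C# code; only the numeric codes come from Tables 1 and 2, while the function name `token_code`, the dictionaries, and the case-insensitive keyword match are assumptions made for the example. It expects a non-empty lexeme.

```python
# Hypothetical sketch of the token-code scheme; only the numeric codes
# listed in Tables 1 and 2 come from the text.
IDENTIFIER_CODE = 256
NUMBER_CODE = 257

KEYWORD_CODES = {"begin": 300, "if": 323, "for": 302, "switch": 305, "while": 376}
TWO_CHAR_CODES = {"!=": 406, "==": 407, "<=": 408, ">=": 409}


def token_code(lexeme):
    """Return the integer token code for a single non-empty lexeme."""
    lowered = lexeme.lower()              # assumption: keywords ignore case
    if lowered in KEYWORD_CODES:          # keywords have their own codes
        return KEYWORD_CODES[lowered]
    if lexeme in TWO_CHAR_CODES:          # two-character tokens
        return TWO_CHAR_CODES[lexeme]
    if lexeme[0].isdigit():               # all numbers share one code
        return NUMBER_CODE
    if len(lexeme) == 1 and not lexeme.isalnum() and lexeme not in "@_":
        return ord(lexeme)                # special character: its ASCII value
    return IDENTIFIER_CODE                # anything else is an identifier
```

For instance, `token_code(";")` gives 59, the ASCII value of the semicolon, while any ordinary name such as `rate` maps to 256.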


A token is an instance of the class shown in the figure below:

    class Token
    {
        public int code;
        public int attr;
    }

Figure 4: Class Token in Lexical Analysis

3.1.2 Lexical Analysis Design

After reading the next character from the input stream, the lexical analyzer acts according to its current state:

State 0: Identify the current token and decide the next state.

State 1: Handle identifiers and keywords.

State 2: Handle numbers.

State 3: Handle one-character or two-character tokens.

States 4, 5: Handle comments “\\” or “\*”: skip a line that starts with “\\”, or skip the data between “\*” and “*\”.

[Figure 5 state diagram, recovered as a list of transitions in the form "input / action":]

State 0 (INITIAL):
- Begin / 1- lexbuf = “”; 2- state = 0.
- WhiteSpace / no action.
- Letter OR @ OR _ / place it in lexbuf; go to state 1 (ID).
- Digit / place it in lexbuf; go to state 2 (NUM).

State 1 (ID):
- Letter OR Digit / place it in lexbuf.
- Anything else / 1- return the last char into the input stream; 2- search for lexbuf in the symbol table; 3- insert it as ID if not found, otherwise get the row number p; 4- build the token as [code = symtab[p, token], attr = p]; 5- enqueue the token and set lexbuf = “”.

State 2 (NUM):
- Digit / place it in lexbuf.
- Decimal point (.) / count the decimal points; if there is more than one in the number, indicate an error and continue to the next word; otherwise put the decimal point into lexbuf.
- Anything else / 1- return the last char into the input stream; 2- parse the contents of lexbuf into an int/float value; 3- build the token as [code: NUM, attr: value]; 4- enqueue the token and set lexbuf = “”.

Figure 5: State diagram for the lexical Analyzer (states 0,1,2)


[Figure 6 state diagram, recovered as a list of transitions in the form "input / action"; the transitions from state 0 to states 1 (ID) and 2 (NUM) are the same as in Figure 5:]

State 0 (INITIAL):
- Begin / 1- lexbuf = “”; 2- state = 0.
- WhiteSpace / no action.
- Anything else / place it in lexbuf; go to state 3 (ONE OR TWO CHAR).

State 3 (ONE OR TWO CHAR):
- Other character (completing a two-character token) / place it in lexbuf; get the code for the two-character token now in lexbuf; build the token [code = obtained code; attr = -1]; lexbuf = “”; state = 0; return the token to the parser.
- Unrelated character / return the last character into the input stream; build the token [code = ASCII(first char in lexbuf); attr = -1]; lexbuf = “”; state = 0; return the token to the parser.
- Sequence is “//” / state = 4.
- Sequence is “/*” / state = 5.

State 4 (Single line comment):
- New line / lexbuf = “”; state = 0.
- Anything else / no action.

State 5 (Multiple line comment):
- Sequence is “*/” / lexbuf = “”; state = 0.
- Anything else / no action.

Figure 6: State diagram for the lexical Analyzer (states 3, 4, 5)
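The six states can be sketched as a single loop over the input. The following is a simplified illustration in Python, not the project's C# implementation. The function name `tokenize`, the dictionary-based symbol table, and the use of the lexeme itself (instead of a symbol-table row number) as the identifier attribute are assumptions made for the example. Comment openers are detected directly in state 0 and are written as `//` and `/*`, following the state diagram. A second decimal point simply ends the number here, whereas the design above reports an error.

```python
IDENT, NUM = 256, 257
TWO_CHAR = {"!=": 406, "==": 407, "<=": 408, ">=": 409}


def tokenize(source, symtab):
    """Return a list of (code, attr) pairs. symtab maps a lowercase lexeme
    to its token code and is extended with new identifiers (a hypothetical
    stand-in for the symbol table of section 3.1.1)."""
    tokens, state, lexbuf, i = [], 0, "", 0
    text = source + "\n"                      # sentinel so the last token is flushed
    while i < len(text):
        ch = text[i]
        if state == 0:                        # INITIAL: classify the character
            if text.startswith("//", i):
                state, i = 4, i + 1
            elif text.startswith("/*", i):
                state, i = 5, i + 1
            elif ch.isalpha() or ch in "@_":
                state, lexbuf = 1, ch
            elif ch.isdigit():
                state, lexbuf = 2, ch
            elif not ch.isspace():
                state, lexbuf = 3, ch
        elif state == 1:                      # ID: identifiers and keywords
            if ch.isalnum():
                lexbuf += ch
            else:
                code = symtab.setdefault(lexbuf.lower(), IDENT)
                tokens.append((code, lexbuf))
                state, lexbuf, i = 0, "", i - 1   # return the last char to the input
        elif state == 2:                      # NUM: digits with at most one '.'
            if ch.isdigit() or (ch == "." and "." not in lexbuf):
                lexbuf += ch
            else:
                value = float(lexbuf) if "." in lexbuf else int(lexbuf)
                tokens.append((NUM, value))
                state, lexbuf, i = 0, "", i - 1
        elif state == 3:                      # one- or two-character token
            if lexbuf + ch in TWO_CHAR:
                tokens.append((TWO_CHAR[lexbuf + ch], -1))
            else:
                tokens.append((ord(lexbuf), -1))
                i -= 1                        # the second char starts a new token
            state, lexbuf = 0, ""
        elif state == 4:                      # single-line comment: skip to newline
            if ch == "\n":
                state = 0
        elif state == 5:                      # multi-line comment: skip to "*/"
            if text.startswith("*/", i):
                state, i = 0, i + 1
        i += 1
    return tokens
```

Calling `tokenize("x1 <= 60 // c", {})` returns `[(256, 'x1'), (408, -1), (257, 60)]`: the identifier, the two-character token with attribute -1, and the number with its value as attribute.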


3.2 Parser and MSIL (Microsoft Intermediate Language) of PASCAL Language Design

3.2.1 Introduction to Parser (Syntax Analysis)

The parser groups the stream of tokens into a hierarchical structure represented by a syntax tree. The syntax tree for the example token stream “position := initial + rate * 60”, and a typical data structure for it, are shown below:

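[Figure 7 diagram: syntax tree whose root is the assignment operator, with left child "position" and right child "+"; "+" has the children "initial" and "*"; "*" has the children "rate" and "60".]

Figure 7: Syntax Tree

[Figure 8 diagram: the same tree as linked records. The ":=", "+" and "*" nodes each hold two child pointers; the leaves are (id1, 1), (id2, 2), (id3, 3) and (num, 60).]

Figure 8: Typical Data Structure for the given Syntax Tree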
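The linked-record structure of Figure 8 can be sketched with two small record types. This is an illustrative Python sketch, not the project's data structure; the class names `Leaf` and `Node` and the helper `to_infix` are hypothetical, and the identifier leaves store the symbol-table row numbers 1 to 3 as in the figure.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    kind: str    # "id" or "num"
    value: int   # symbol-table row for an id, literal value for a num


@dataclass
class Node:
    op: str      # operator stored at an interior node
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]

# position := initial + rate * 60, with the three ids in rows 1, 2 and 3
tree = Node(":=", Leaf("id", 1),
            Node("+", Leaf("id", 2),
                 Node("*", Leaf("id", 3), Leaf("num", 60))))


def to_infix(t):
    """Render the tree back to text, to check its shape."""
    if isinstance(t, Leaf):
        return f"id{t.value}" if t.kind == "id" else str(t.value)
    return f"({to_infix(t.left)} {t.op} {to_infix(t.right)})"
```

Here `to_infix(tree)` yields `(id1 := (id2 + (id3 * 60)))`, which shows that the multiplication sits deepest in the tree and is therefore evaluated first.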


“Grammar is used throughout the parser to organize compiler front ends. A grammar naturally

describes the hierarchical structure of most programming language constructs such as the Pascal

Language. For example, an if- else statement in Pascal language can have the form

If (expression) statement else statement.

That is, an if-else statement is the concatenation of the keyword if , an opening parenthesis, an

expression, a closing parenthesis, a statement, the keyword else, and another statement. Using

the variable expr to denote an expression and variable stmt to denote a statement, this structuring

rule can be expressed as:

stmt → if ( expr ) stmt else stmt

in which the arrow may be read as “can have the form”. Such a rule is called a production. In a production, lexical elements like the keyword if and the parentheses are called terminals. Variables like expr and stmt represent sequences of terminals and are called nonterminals” [6].

3.2.2 Parser (Syntax Analysis) Design

Parsing is the process of determining whether a string of tokens can be generated by a grammar. To parse Pascal, it is sufficient to make a single left-to-right scan over the input, looking ahead one token at a time. Top-down parsing constructs the nodes of a parse tree starting at the root and proceeding towards the leaves, as in the simple example in Figure 9. To construct the parse tree, start at the root and repeatedly do the following two steps:

1- At node “n”, labeled with a nonterminal such as “OneDimArray”, select a production for that nonterminal and construct children at “n” for the symbols on the right side of the production.

2- Find the next node at which a subtree is to be constructed.


“Note: When starting with nonterminal OneDimArray at the root, we should use a production for

OneDimArray that starts with lookahead symbol array. The lookahead symbol always contains

the next token to be parsed in the input stream.” [6]

“When the node being considered in the parse tree is for a terminal, and the terminal

matches the look ahead symbol, then we advance in both the parse tree and the input. The next

token in the input becomes the new look ahead symbol, and the next child in the parse tree is

considered. When a node labeled with a nonterminal is considered, we repeat the process of

selecting a production for the nonterminal. In general, the selection of a production for a

nonterminal may involve trial-and-error. However, a method called “predictive parsing” is

simple and free from trial-and-error.” [6]
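The predictive parsing idea described above can be sketched for the OneDimArray production. This is a minimal illustration, not the DCSPM source: the token names and the PredictiveParser class are assumptions made for the example, and the real lexer uses numeric token codes.

```python
# Hypothetical token kinds; the real DCSPM lexer uses numeric codes instead.
ARRAY, LBRACKET, NUM, DOTDOT, RBRACKET, OF, INTEGER, REAL, EOF = (
    "array", "[", "num", "..", "]", "of", "integer", "real", "<eof>")


class PredictiveParser:
    """Recursive-descent parser for: array [ num .. num ] of (integer | real)."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + [EOF]
        self.pos = 0

    @property
    def lookahead(self):
        # The lookahead symbol is always the next unread token.
        return self.tokens[self.pos]

    def match(self, expected):
        # A terminal that matches the lookahead advances the input by one token.
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def one_dim_array(self):
        # Only one production exists, and it must start with 'array'.
        self.match(ARRAY)
        self.match(LBRACKET)
        self.match(NUM)
        self.match(DOTDOT)
        self.match(NUM)
        self.match(RBRACKET)
        self.match(OF)
        return ("OneDimArray", self.standard_type())

    def standard_type(self):
        # Two alternatives; one token of lookahead picks between them.
        if self.lookahead in (INTEGER, REAL):
            kind = self.lookahead
            self.match(kind)
            return ("StandardType", kind)
        raise SyntaxError(f"expected a standard type, found {self.lookahead!r}")


tree = PredictiveParser(
    [ARRAY, LBRACKET, NUM, DOTDOT, NUM, RBRACKET, OF, INTEGER]).one_dim_array()
print(tree)
```

No production is ever retried: each function decides what to do from the current lookahead token alone, which is what makes the method free from trial-and-error.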

<OneDimArray>
    array [ num dot dot num ] of <Standard Type>
                                     integer

Figure 9: Steps in the top-down construction of Parse Tree

The statements and the one-dimensional array grammar included in my project are:

1- Assignment statement: an assignment statement gives a value to a variable, such as x:=5; or a:=b+c div d-e, and is compiled to the intermediate language. The right-hand side is an arithmetic expression, that is, an expression using addition (+), subtraction (-), multiplication (*), and division (div). A single-mode arithmetic expression is an expression all of whose operands are of the same type (i.e., INTEGER, REAL or COMPLEX). Only INTEGER and REAL are covered in this project, so the values and variables in a single-mode arithmetic expression are either all integers or all real numbers.

<assignment statement> ::= <variable> := <expression>

2- The Pascal compiler is structured in such a way that a write or writeln statement containing more than one argument is compiled into several write statements with only one argument each. For writeln, these statements are followed by a statement that writes the end-of-line. For example, the writeln statement in the following program:

“ Program Write; Begin writeln('This writeln is compiled into MSIL '); End. ”
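The lowering of a multi-argument write or writeln can be sketched as follows. The function name split_write and the tuple names "write" and "write_eol" are illustrative assumptions, not identifiers from the DCSPM source.

```python
def split_write(statement, args):
    """Lower write/writeln with several arguments to one-argument writes.

    `statement` is "write" or "writeln"; `args` is the list of argument texts.
    """
    lowered = [("write", arg) for arg in args]
    if statement == "writeln":
        # writeln is followed by a statement that writes the end-of-line.
        lowered.append(("write_eol", None))
    return lowered


lowered = split_write("writeln", ["'x = '", "x"])
print(lowered)
```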

3- “if” Statement grammar:

<if statement> ::= if <expression> then <statement>

4- “if/Else” Statement grammar:

<if statement> ::= if <expression> then <statement> | if <expression> then <statement> else <statement>

5- “While” Statement grammar:

<while statement> ::= while <expression> do <statement>

6- “For” Statement grammar:

<for statement> ::= for <variable identifier> := <expression> to <expression> do <statement>

7- “Case” Statement grammar:

<Case> := Case id Of <case_element> End ‘;’ | empty

<case_element> := ‘’’ <case_label_list> ‘:’ <statement> ’;’ <statement> <case_element> | empty

<case_label_list> := <constant> ‘{‘ <case_label_list> | ‘,’ <constant> <case_label_list> | ’{‘

<constant> := ‘’’ | ’+’ | ’-‘ | id | num

8- “Array” structure grammar:

<OneDimArray> := array [ num .. num ] of <standard_type>

<standard_type> := integer | real

9- If logic statement grammar:

<IFLogic> := <ANDLOGIC> Or <expression_list> <IFLogic> | empty

<ANDLOGIC> := <expression_list> And <expression_list> <ANDLOGIC> | empty

<expression_list> := <expression> | ‘,’ <expression>

<expression> :=…….

3.2.3 Introduction to MSIL (Microsoft Intermediate Language)

MSIL is the Microsoft Intermediate Language. Every .NET-compatible language is compiled to MSIL, which the .NET Framework then JIT-compiles on the computer where the assembly is installed. The main purpose of this intermediate code is platform independence: once MSIL is available, it can run on any platform for which an appropriate run-time environment is installed, such as the CLR in the case of .NET.

MSIL is what the Pascal code gets compiled into, and it is what is sent to the JIT compiler when .NET programs are run. MSIL is a low-level language, and working with it directly gives fine-grained control over the generated program.

“All operations in MSIL are executed on the stack. When a function is called, its

parameters and local variables are allocated on the stack. Function code starting from this stack

state may push some values onto the stack, make operations with these values, and pop values

from the stack.

Execution of both MSIL commands and functions is done in three steps:

1. Push command operands or function parameters onto the stack.


2. Execute the MSIL command or call function. The command or function pops their

operands (parameters) from the stack and pushes onto the stack result (return value).

3. Read result from the stack”[14].
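The three steps above can be illustrated with a toy evaluation stack. This is a sketch under stated assumptions: only ldc.i4 and the four arithmetic instructions are modeled, and the run function is a hypothetical helper, not part of the CLR or of DCSPM.

```python
def _div_trunc(a, b):
    # Integer division that truncates toward zero, as IL div does for integers.
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def run(instructions):
    """Execute a tiny subset of stack instructions and return the final stack."""
    stack = []
    for op, *operand in instructions:
        if op == "ldc.i4":
            # Step 1: push an operand onto the stack.
            stack.append(operand[0])
        elif op in ("add", "sub", "mul", "div"):
            # Step 2: the command pops its operands and pushes the result.
            right = stack.pop()
            left = stack.pop()
            if op == "add":
                stack.append(left + right)
            elif op == "sub":
                stack.append(left - right)
            elif op == "mul":
                stack.append(left * right)
            else:
                stack.append(_div_trunc(left, right))
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack


# 2 + 3 * 4
result = run([("ldc.i4", 2), ("ldc.i4", 3), ("ldc.i4", 4), ("mul",), ("add",)])
# Step 3: read the result from the top of the stack.
print(result[-1])
```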

For example, consider this simple Pascal program:

“ Program HelloWorld; Begin Writeln (‘ Hello World’); End . ”

The output MSIL is:

// Metadata version: v4.0.30319
.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 ) // .z\V.4..
  .ver 2:0:0:0
}
.assembly HelloWorld
{
  .hash algorithm 0x00008004
  .ver 0:0:0:0
}
.module expression.dll
.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003 // WINDOWS_CUI
.corflags 0x00000001 // ILONLY
// Image base: 0x00820000
// =============== CLASS MEMBERS DECLARATION ===================
.class public auto ansi HelloWorld extends [mscorlib]System.Object
{
  .method public static void Main() cil managed
  {
    .entrypoint
    .maxstack 1
    IL_00: ldstr "Hello World"
    IL_05: call void [mscorlib]System.Console::WriteLine(string)
    IL_10: ret
  } // end of method HelloWorld::Main
  .method public specialname rtspecialname instance void .ctor() cil managed
  {
    .maxstack 2
    IL_00: ldarg.0
    IL_01: call instance void [mscorlib]System.Object::.ctor()
    IL_06: ret
  } // end of method HelloWorld::.ctor
} // end of class HelloWorld

What’s inside the Class Members Declaration :

.method : A method definition begins with the .method directive and can be defined at global

scope or within a class. The application entry point must be static, meaning an instance is not

required to call the method, and that is indicated by the static keyword. Declaring a global

method static seems redundant, but the ILAsm compiler complains if you omit the static keyword in some cases. The signature of the method, void Main(), indicates, as you would expect, that it does not return a value and takes zero arguments.

.entrypoint : The .entrypoint directive signals to the runtime that this method is the entry point

for the application. Only one method in the application can have this directive.

.maxstack : The .maxstack directive indicates how many stack slots the method expects to use.

For example, adding two numbers together involves pushing both numbers onto the stack and

then calling the add instruction which pops both numbers off the stack and pushes the result onto

the stack. In that example you will need two stack slots.
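One way to arrive at a .maxstack value is to track the stack depth instruction by instruction. The sketch below assumes straight-line code with no branches or calls, and the STACK_EFFECT table covers only the few instructions used here; it is an illustration, not how DCSPM or ILAsm computes the value.

```python
# Effect of each instruction on the evaluation stack: (values popped, values pushed).
STACK_EFFECT = {
    "ldc.i4": (0, 1),
    "ldloc": (0, 1),
    "ldstr": (0, 1),
    "stloc": (1, 0),
    "add": (2, 1),
    "sub": (2, 1),
    "mul": (2, 1),
    "div": (2, 1),
    "ret": (0, 0),
}


def max_stack(opcodes):
    """Return the deepest stack depth reached by a straight-line opcode list."""
    depth = 0
    deepest = 0
    for op in opcodes:
        pops, pushes = STACK_EFFECT[op]
        if depth < pops:
            raise ValueError(f"{op} needs {pops} value(s), stack holds {depth}")
        depth += pushes - pops
        deepest = max(deepest, depth)
    return deepest


# Adding two numbers: push, push, add.
print(max_stack(["ldc.i4", "ldc.i4", "add"]))
```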

Ldstr : The ldstr instruction pushes the string that is passed to the WriteLine method onto the

stack.

Call : The call instruction invokes the static WriteLine method on the System.Console class

from the mscorlib assembly. This is an example of a method declaration. It provides the full

signature of the WriteLine method (including the string argument) so that the runtime can

determine which overload of the WriteLine method to call.

Ret : The ret instruction returns execution to the caller. In the case of the entry point method,

this would bring your application to an end.

Also, some methods declare their local variables using the .locals directive, for example:

.locals ( int32 a, int32 b,…..)

3.2.4 Intermediate language Instructions

“When a method is executed, three categories of memory local to the method plus one

category of external memory are involved. All these categories represent typed data slots, not

simply an address interval as is the case in the unmanaged world. The external memory

manipulated from the method is the community of the fields the method accesses (except the

fields of value types belonging to the local categories). The local memory categories include an

argument table, a local variable table, and an evaluation stack. Figure 10 describes data transitions

between these categories. As you can see, all IL instructions resulting in data transfer have the

evaluation stack as a source or a destination, or both.

The argument and local variable tables have a static type which can be any of the types defined

in the .NET Framework and the application. The evaluation stack table holds different types at

different times during the course of the method execution. So, the same stack could be used for

different variables.

Figure 10: Method memory categories

IL instructions consist of an operation code (opcode), which for some instructions is

followed by an instruction parameter. Opcodes are either 1 byte or 2 bytes long.

Some of the IL instructions that I used in my project are described below.

3.2.4.1 Unconditional branching

Unconditional branching instructions take nothing from the evaluation stack and put nothing on it.

br <int32> (0x38). Branch <int32> bytes from the current position.

By default, the IL assembler does not automatically choose between long-parameter and

short-parameter forms. Thus, if you specify a short-parameter instruction and put the

target label farther away than the short parameter permits, the calculated offset is

truncated to 1 byte, and the IL assembler issues an error message.

br.s <int8> (0x2B). The short-parameter form of br.
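The difference between the two forms comes down to whether the branch offset fits in a signed byte. The sketch below chooses the form from the offset, which, as noted above, the IL assembler does not do by default; encode_branch is a hypothetical helper, and the offset is assumed to be already computed.

```python
def encode_branch(offset):
    """Return (mnemonic, encoded bytes) for an unconditional branch.

    Uses br.s (0x2B, 1-byte offset) when the offset fits in a signed byte,
    otherwise br (0x38, 4-byte little-endian offset).
    """
    if -128 <= offset <= 127:
        return "br.s", bytes([0x2B]) + offset.to_bytes(1, "little", signed=True)
    return "br", bytes([0x38]) + offset.to_bytes(4, "little", signed=True)


print(encode_branch(5))
print(encode_branch(200))
```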

3.2.4.2 Conditional Branching Instructions

brfalse (brnull, brzero) <int32> (0x39). Branch if <value> is 0. Here <value> is taken from the top of the stack.

brfalse.s (brnull.s, brzero.s) <int8> (0x2C). The short-parameter form of brfalse. I used brfalse.s in my project as an improvement in the if/else statement MSIL, which I will discuss later.

brtrue (brinst) <int32> (0x3A). Branch if <value> is nonzero.

brtrue.s (brinst.s) <int8> (0x2D). The short-parameter form of brtrue.

3.2.4.3 Comparative Branching Instructions

Comparative branching instructions take two values (<value1>, <value2>) from the evaluation stack and compare them according to the <condition> specified by the opcode. Not all combinations of types of <value1> and <value2> are valid. These are the ones I used in my project:

bgt.s <int8> (0x30). The short-parameter form of bgt.

blt.s <int8> (0x32). The short-parameter form of blt.

beq.s <int8> (0x2E). The short-parameter form of beq.

bne.un.s <int8> (0x33). The short-parameter form of bne.un.

ble.s <int8> (0x31). The short-parameter form of ble.

bge.un.s <int8> (0x34). The short-parameter form of bge.un.

3.2.4.4 Constant Loading

Constant loading instructions take at most one parameter (the constant to load) and load it on

the evaluation stack. The ILAsm syntax requires explicit specification of the constants (in other

words, you cannot use a variable or argument name), in decimal or hexadecimal form:

Some instructions have no parameters because the value to be loaded is specified by the opcode

itself.

Note that for integer and floating-point values, the slots of the evaluation stack are either 4- or 8-

bytes wide, so the constants being loaded are converted to the suitable size.

ldc.i4 <int32> (0x20). Load <int32> on the stack.

ldc.i4.s <int8> (0x1F). Load <int8> on the stack.

ldc.i4.m1 (ldc.i4.M1) (0x15). Load –1 on the stack.

ldc.i4.0 (0x16). Load 0.

ldc.i4.1 (0x17). Load 1.

ldc.i4.2 (0x18). Load 2.

ldc.i4.3 (0x19). Load 3.


ldc.i4.4 (0x1A). Load 4.

ldc.i4.5 (0x1B). Load 5.

ldc.i4.6 (0x1C). Load 6.

ldc.i4.7 (0x1D). Load 7.
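A code generator can pick the most compact of the constant loading forms listed above from the value alone. This is a sketch that uses only the instructions named in this section; load_int_constant is a hypothetical helper, not a DCSPM function.

```python
def load_int_constant(value):
    """Return the shortest listed ldc.i4 form that loads `value`."""
    if value == -1:
        return "ldc.i4.m1"
    if 0 <= value <= 7:
        # The constant is encoded in the opcode itself; no parameter needed.
        return f"ldc.i4.{value}"
    if -128 <= value <= 127:
        return f"ldc.i4.s {value}"
    if -2**31 <= value <= 2**31 - 1:
        return f"ldc.i4 {value}"
    raise ValueError(f"{value} does not fit in an int32")


for v in (-1, 5, 100, 1000):
    print(load_int_constant(v))
```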

3.2.4.5 Logical Condition Check Instructions

Logical condition check operations are similar to comparative branching instructions except

that they result not in branching but in putting the condition check result on the stack. The result

type is int32, and its value is equal to 1 if the condition checks and 0 otherwise; in other words,

logically the result is a Boolean value. The two operands being compared are taken from the

stack, and since no branching is performed, the condition check instructions have no parameters.

The logical condition check instructions are useful when you want to store the result of the

condition check for multiple use or for later use. If you need the condition check to decide only

once and on the spot whether you need to branch, you would be better off using a comparative

branching instruction.”[10]

ceq (0xFE 0x01). Check whether the two values on the stack are equal.

cgt (0xFE 0x02). Check whether the first value is greater than the second value. It’s the

stack we are working with, so the “second” value is the one on the top of the stack.

clt (0xFE 0x04). Check whether the first value is less than the second value.
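The behavior of the three check instructions, and the way a “less than or equal” test can be built from them, can be sketched as below. The helpers compare and less_or_equal are illustrative; the cgt, ldc.i4.0, ceq sequence is the one this project uses for “<=”.

```python
def compare(op, first, second):
    """Value that ceq, cgt or clt pushes: 1 if the condition holds, else 0.

    `second` is the value on top of the stack, `first` the one beneath it.
    """
    if op == "ceq":
        return int(first == second)
    if op == "cgt":
        return int(first > second)
    if op == "clt":
        return int(first < second)
    raise ValueError(f"unsupported check instruction: {op}")


def less_or_equal(first, second):
    # "<=" is the negation of ">":  cgt ; ldc.i4.0 ; ceq
    greater = compare("cgt", first, second)
    return compare("ceq", greater, 0)


print(less_or_equal(2, 3), less_or_equal(3, 3), less_or_equal(4, 3))
```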

3.2.4.6 Local Variable Loading

Local variable loading instructions are similar to argument loading instructions except that no

“invisible” items appear among the local variables, so local variable number 0 is always the first

one specified in the local variable signature.


ldloc <unsigned int16> (0xFE 0x0C). Load the value of local variable

number <unsigned int16> on the stack. Like the argument numbers, local variable

numbers can range from 0 to 65534 (0xFFFE). The value 65535, also admissible for

unsigned 2-byte integers, is excluded because otherwise the counter of local variables

would have to be 4 bytes wide. Limiting the number of the local variables, however

standardized, seems arbitrary and implementation specific, because the number of local

variables of a method is not stored in the metadata or in the method header, so this

limitation comes purely from one particular implementation of the JIT compiler.

ldloc.s <unsigned int8> (0x11). The short-parameter form of ldloc.

ldloc.0 (0x06). Load the value of local variable number 0 on the stack.

ldloc.1 (0x07). Load the value of local variable number 1 on the stack.

ldloc.2 (0x08). Load the value of local variable number 2 on the stack.

ldloc.3 (0x09). Load the value of local variable number 3 on the stack.”[10]
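The ldloc forms above differ only in how the local variable number is encoded. The following sketch produces the bytes for each form using the opcodes listed in this section; encode_ldloc is a hypothetical helper written for illustration.

```python
def encode_ldloc(index):
    """Encode a load of local variable `index` in the most compact form."""
    if not 0 <= index <= 0xFFFE:
        raise ValueError("local variable numbers range from 0 to 65534")
    if index <= 3:
        # ldloc.0 .. ldloc.3 are single-byte opcodes 0x06 .. 0x09.
        return bytes([0x06 + index])
    if index <= 0xFF:
        # ldloc.s takes an unsigned 1-byte variable number.
        return bytes([0x11, index])
    # ldloc takes an unsigned 2-byte variable number (little-endian).
    return bytes([0xFE, 0x0C]) + index.to_bytes(2, "little")


print(encode_ldloc(2).hex(), encode_ldloc(4).hex(), encode_ldloc(300).hex())
```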

3.2.5 MSIL (Microsoft Intermediate Language) Design

After the source code has been tokenized, the parsing phase commences. At the end of this stage, if the source code is syntactically valid, the compiler will have generated (1) an abstract syntax tree (AST) of the source code and (2) Microsoft Intermediate Language (MSIL). The parser phase starts with the Program() function, which matches the “program” keyword, an identifier, and “;”, and then calls the declaration() function and the compoundStatement() function. Each terminal is matched by calling the match() function, which checks every element of the source code for syntax errors and checks the validity of the entered token. When the token is valid, the parser calls the nextToken() function to read the next token. The MSIL generator then calls the newlabel() function, which sets up a new label, and then calls the emit() function to combine the new label with the opcode. The same procedure applies to the rest of the functions: the parser matches the valid tokens, and the MSIL generator sets up the new labels and combines them with the opcodes.
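The interplay of newlabel() and emit() can be sketched as follows. The ILEmitter class, its sequential two-digit labels, and the emit signature are assumptions made for the illustration; the actual DCSPM functions may number labels and format lines differently.

```python
class ILEmitter:
    """Collects labeled IL lines, one label per emitted instruction."""

    def __init__(self):
        self.count = 0   # number of labels handed out so far
        self.lines = []  # emitted "label: opcode operand" lines

    def newlabel(self):
        # Set up a new label of the form IL_00, IL_01, ...
        label = f"IL_{self.count:02d}"
        self.count += 1
        return label

    def emit(self, label, opcode, operand=""):
        # Combine the label with the opcode (and operand, if any).
        self.lines.append(f"{label}: {opcode} {operand}".rstrip())


# x := 5, assuming x lives in local slot 0
emitter = ILEmitter()
emitter.emit(emitter.newlabel(), "ldc.i4.5")
emitter.emit(emitter.newlabel(), "stloc.0")
print("\n".join(emitter.lines))
```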


3.1.5.1. Constructors:

Constructors are class methods that are executed when an object of a given type is created. Constructors have the same name as the class, and usually initialize the data members of the new object.

1- Ldloc table: used to save each variable together with its load-local location.

2- Stloc table: used to save each variable together with its store-local location.

3.2.5.1 Functions:

<program> ::= Program <identifier> ; <block> .

1. Program function:

<program> ::= Program <identifier> ; <declaration> < compoundStatement>.

2. Declaration function

< declaration> ::= <empty> | var <Identifier list> : <type>     

<Program>
    program ID ; <declaration> <CompoundStatements> .

<declaration>
    Var <Identifier list> : <type>

3. Identifier List function

When the IdentifierList() function matches an identifier, it retrieves the identifier's attribute from the symbol table and enqueues it, together with its ldloc and its stloc, in two queues. This helps later in MSIL generation, when the load and store instructions of an identifier have to be retrieved. For example:

program Sum();
var a,b,c;
begin
c:= a+b;
end; end.

MSIL:
IL_00: ldloc.0
IL_01: ldloc.1
IL_02: add
IL_03: stloc.2

<Identifier list> ::= ID <Identifier list> | , <Identifier list>

<Identifier list>
    ID: IDLdloc.Add( lookahead.attr, ldloc.jj); Stloc.Add( lookahead.attr, stloc.jj); jj++;
    <Identifier list> | , <Identifier list>
    ID <Identifier list> : <type> |

4. Type function

<type>
    Integer | real | <Standard type> ; <type>
    <OneDimArray> <type>

5. Standard Type function

<Standard type>
    Integer | real : Inqueue(“.locals init ( [1] int32, [2] int32,…)

6. Compound Statements function

After the Program function calls the Compound Statements function, the latter matches the “Begin” keyword and then calls the Statements List function for the other statements, such as the Writeline statement, the if statement, the if/else statement, etc. The MSIL for these statements is discussed shortly. Next, it matches the End that closes the Begin of the program, and then it emits “IL_####:” and the “ret” instruction, which returns from the method, possibly with a value.

<Program>
    <Compound_Statements>
        Begin <Statement List> End ; emit(IL_####, " ", "ret", "", "\n")

7. Statements List function

After the Compound Statements function calls the Statements List function, the Statements List function calls the Statement function, which parses a statement and compiles it to MSIL. After that, it matches the semicolon that ends the statement and then calls the Statements List function again for the next statement. This function is recursive because it calls itself.

8. Statement function

<Statements List>
    <Statement> ; <Statements List>

<Compound List>

<Statement>
    <expression> | Begin <Statements List> | If | While | For | Writeline | Case

expression
    <expression>
        <simple expression>
        <Newlabel>:
            “<”  : emit(“IL_##”,”clt”)
            “>”  : emit(“IL_##”,”cgt”)
            “<=” : emit(“IL_##”,”cgt”)
            “>=” : emit(“IL_##”,”clt”)
            “==” : emit(“IL_##”,”ceq”)
            “<>” : emit(“IL_##”,”ceq”)

simple expression
    <simple expression>
        <term>
        <Newlabel>:
            “+” : emit(“IL_##”,”add”)
            “-” : emit(“IL_##”,”sub”)
        <simple expression>

<term>
    <factor>
    <Newlabel>:
        “*” : emit(“IL_##”,”mul”)
        “/” : emit(“IL_##”,”div”)
    <term>

<factor>
    ( <IFlogic> )
    ID  : “*” : emit(“IL_##”,”ldloc.##”) <factor>
    NUM : “*” : emit(“IL_##”,”ldc.i4.Num”)

<IFlogic>
    <AndLogic>
    OR : <NewLabel>
        if “bge.s” : emit(IL_##,”blt.s IL_##”)
        else “ble.s” : emit(IL_##,”bgt.s IL_##”)
        else “blt.s” : emit(IL_##,”bge.s IL_##”)
        else “bgt.s” : emit(IL_##,”ble.s IL_##”)
        else “bne.un.s” : emit(IL_##,”bne.un.s IL_##”)
        else “beq.s” : emit(IL_##,”beq.s IL_##”)
        “brtrue.s” : emit(IL_##,”brtrue.s IL_##”)
    <expression list>
    <IFlogic>

<ANDlogic>
    <expression list>
    AND : <NewLabel>
        if “bge.s” : emit(IL_##,”bge.s IL_##”)
        else “ble.s” : emit(IL_##,”ble.s IL_##”)
        else “blt.s” : emit(IL_##,”blt.s IL_##”)
        else “bgt.s” : emit(IL_##,”bgt.s IL_##”)
        else “bne.un.s” : emit(IL_##,”beq.s IL_##”)
        else “beq.s” : emit(IL_##,”bne.un.s IL_##”)
        “brtrue.s” : emit(IL_##,”brtrue.s IL_##”)
    <expression list>
    <ANDlogic>

<expression list>
    ‘,’ : <expression>
    ID | NUM : <expression>

<expression>
    If LOGIC==false: <NewLabel> emit(IL_##,”ldc.i4.0”); emit(IL_##,”ceq”);
    if LOGIC == true: <NewLabel> emit(IL_##,”br.s IL_”count+3”);
    AND: <NewLabel> emit(IL_##,”ldc.i4.0”);
    OR: <NewLabel> emit(IL_##,”ldc.i4.1”);

If stat.
    <Statement>
    If IFLOGIC==true: <NewLabel> emit(IL_##,”br.s IL_count+3”); <NewLabel> emit(IL_##,”ceq”); <NewLabel> emit(IL_##,”ldc.i4.0”);
    <NewLabel> emit(IL_##,”stloc.ii.ToString”); <NewLabel> emit(IL_##,”ldloc.ii.Tostring”); ii++;
    <NewLabel> emit(IL_##,”brtrue.s IL_”BIF” ”);
    Then <statement> ;
    else : Brif =count; <NewLabel> emit(IL_##,”br.s IL_”BrIF” ”); BIF=count;
    <statement>

While stat.
    ( <Newlabel> emit(“IL_##”, “br.s IL_”ForwordLabel”” Brture=count; f= lookahead.code; d= lookahead.attr; )
    <Statement>
    If f==NUM; <Newlabel> emit(“IL_##,”ldc.i4.”+f.ToString())
    If f==ID; <Newlabel> emit(“IL_##,”ldc.i4.”+f.ToString())
    <Newlabel>:
        “<”  : emit(“IL_##”,”clt”);
        “>”  : emit(“IL_##”,”cgt”);
        “<=” : {emit(“IL_##”,”cgt”); emit(“IL_##,”ldc.i4.0”); emit(“IL_##,”ceq”);}
        “>=” : {emit(“IL_##”,”clt”); emit(“IL_##,”ldc.i4.0”); emit(“IL_##,”ceq”);}
        “==” : emit(“IL_##”,”ceq”);
        “<>” : emit(“IL_##”,”ceq”);
    <NewLabel> emit(IL_##,”brtrue.s IL_”Brtrue” ”);
    End ;
    Do Begin <statement list> ForwordLabel = count;
    <Newlabel> emit(“IL_##,”ldloc.”+d.ToString())
    <NewLabel> emit(IL_##,”stloc.ii.ToString”); <NewLabel> emit(IL_##,”ldloc.ii.Tostring”); ii++;

3.2.6 Design Common Syntax Errors Table

There are several types of error, with consequences ranging from deficiencies in the

formatting of the output to the calculation of wrong results. A compilation error (which prevents

the compiler from compiling the source code) is usually a syntax error but could be an error in

the compiler itself. A syntax error results when the source code does not obey the rules of the

language. The compiler generates error messages to help the programmer to fix the code. The

source code may compile to machine code which then fails upon execution. A run-time

error causes this situation. Potentially the most serious type of error occurs when the program

appears to be working but is performing faulty processing due to logic errors in the source code.

I classify the common errors as syntax errors.

I designed a set of errors that the DCSPM compiler recognizes. Every single-character token, such as ‘(’ or ‘)’, is identified by its ASCII code, and every keyword is identified by its token code in the symbol table; for example, the ‘program’ keyword has code 323 and the ‘do’ keyword has code 305. The match() method in the DCSPM compiler looks like this:

static void match(int t)
{
    // If the lookahead token is the expected one, consume it.
    if (lookahead.code == t)
    {
        lookahead = nextToken();
    }
    else
    {
        // Otherwise record which token was expected. Codes below 256 are
        // ASCII codes; codes from 300 up are keyword codes.
        switch (t)
        {
            case 40: Err += "\n" + "\n" + "Missing '('"; break;
            case 41: Err += "\n" + "\n" + " Missing ')'"; break;
            case 44: Err += "\n" + "\n" + "Missing ','"; break;
            case 46: Err += "\n" + "\n" + "Missing '.'"; break;
            case 58: Err += "\n" + "\n" + "Missing ':'"; break;
            case 59: Err += "\n" + "\n" + "Missing ';'"; break;
            case 91: Err += "\n" + "\n" + "Missing '['"; break;
            // ASCII code of ']' is 93 (92 is the backslash).
            case 93: Err += "\n" + "\n" + "Missing ']'"; break;
            case 300: Err += "\n" + "\n" + "Missing 'begin' "; break;
            case 305: Err += "\n" + "\n" + "Missing 'do' "; break;
            case 307: Err += "\n" + "\n" + "Missing 'then' "; break;
            case 308: Err += "\n" + "\n" + "Missing 'end' "; break;
            case 319: Err += "\n" + "\n" + "Missing 'of' "; break;
            case 323: Err += "\n" + "\n" + "Missing 'program'"; break;
            case 328: Err += "\n" + "\n" + "Missing 'to' "; break;
            case 331: Err += "\n" + "\n" + "Missing 'var' "; break;
            case 336: Err += "\n" + "\n" + "Missing 'integer'"; break;
            default: break;
        }
    }
}
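The same expected-token messages can also be kept in a lookup table rather than a switch. The sketch below is an alternative arrangement of the data, not the DCSPM implementation; the names EXPECTED and missing_message are hypothetical, and it assumes that ']' is identified by its ASCII code, 93.

```python
# Token code -> token text. Codes below 256 are ASCII codes of punctuation;
# codes from 300 up are the keyword codes used in the symbol table.
EXPECTED = {
    40: "(", 41: ")", 44: ",", 46: ".", 58: ":", 59: ";", 91: "[", 93: "]",
    300: "begin", 305: "do", 307: "then", 308: "end", 319: "of",
    323: "program", 328: "to", 331: "var", 336: "integer",
}


def missing_message(code):
    """Return the error text for an expected token, or '' if it is unknown."""
    text = EXPECTED.get(code)
    return f"Missing '{text}'" if text is not None else ""


# Sanity check: every punctuation entry really is keyed by its ASCII code.
punctuation_ok = all(ord(text) == code
                     for code, text in EXPECTED.items() if code < 256)
print(missing_message(59), missing_message(305), punctuation_ok)
```

A table like this makes a wrong code easy to spot, because the punctuation entries can be checked mechanically against their ASCII values.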

4 Implementation

DCSPM is programmed in Microsoft Visual C# Express 2010. Visual C# is documented in the MSDN Library, which you can install locally on your own computer or network, and which is also available on the Internet at http://msdn.microsoft.com/library.

“C# (pronounced "C sharp") is a programming language that is designed for building a variety of

applications that run on the .NET Framework. C# is simple, powerful, type-safe, and object-

oriented. The many innovations in C# enable rapid application development while retaining the

expressiveness and elegance of C-style languages.

Visual C# is an implementation of the C# language by Microsoft. Visual Studio supports Visual

C# with a full-featured code editor, compiler, project templates, designers, code wizards, a

powerful and easy-to-use debugger, and other tools. The .NET Framework class library provides


access to many operating system services and other useful, well-designed classes that speed up

the development cycle significantly” [18].

“This effectively reduces the refactoring capabilities of Visual C# Express to Renaming and

Extracting Methods. Developers state the reason of this removal as "to simplify the C# Express

user experience". However this created a controversy as some end users claim it is an important

feature, and instead of simplifying it cripples the user experience.

The ability to attach the debugger to an already-running process has also been removed,

hindering scenarios such as writing Windows services and re-attaching a debugger under

ASP.NET when errors under the original debugging session cause breakpoints to be ignored.

Additionally it has been observed that the express version requires that the time between builds

be greater than approximately 20 seconds. If a project is rapidly modified and rebuilt the target

will not be updated even though the source has been modified and saved.”[19]

 

The steps required to create a .NET application:

1. Application code is written using a .NET-compatible language such as C#.

2. That code is compiled into CIL, which is stored in an assembly, as shown in Figure 11.

Figure 11: Application Code using .NET

3. When this code is executed (either in its own right if it is an executable or when it is used from other code), it must first be compiled into native code using a JIT compiler, as shown in Figure 12.

4. The native code is executed in the context of the managed CLR, along with any other running applications or processes, as shown in Figure 13.

Figure 12: JIT Compilation

Figure 13: .NET CLR

Microsoft Visual C# is a programming environment used to create computer applications for the Microsoft Windows family of operating systems. It combines the C# language and the .NET Framework.

To test the MSIL, I generally used the MSIL Disassembler (Ildasm.exe) tool that is included with the .NET Framework SDK. Ildasm.exe parses any .NET Framework .exe or .dll assembly and shows the information in human-readable format. Ildasm.exe shows more than just the Microsoft intermediate language (MSIL) code: it also displays namespaces and types, including their interfaces. You can use Ildasm.exe to examine native .NET Framework assemblies, such as Mscorlib.dll, as well as .NET Framework assemblies provided by others or created yourself. Most .NET Framework developers will find Ildasm.exe indispensable. You can find this tool at FrameworkSDK\Bin\ildasm.exe on your computer, as explained in the Background section (Section 2).

ILAsm has an instruction set similar to that of a native assembly language. You can write ILAsm code in any text editor, such as Notepad, and then use the command-line compiler (ILAsm.exe) provided by the .NET Framework to compile it. ILAsm.exe is a command-line tool shipped with the .NET Framework and is located in the <windowsfolder>\Microsoft.NET\Framework\<version> folder. You can include this path in your PATH environment variable. When you compile an .IL file, the output is an exe with the same name as the .IL file. You can specify the output file name using the /output=<filename> switch, for example ILAsm Test.il /output=MyFile.exe. To run the output exe file, just type the name of the exe and press Return. The output will appear on the screen. [11]

When the .il file is compiled it needs the Fusion.dll file. “Fusion.dll is an assembly

manager module used with the .net framework of Microsoft. The Common Language

Runtime (CLR) contains a system component called the assembly manager that takes on

the responsibilities of storing assembly files in the Global Assembly Cache (GAC) and

loading them at run time when they are first used by an application. The Global

Assembly Cache is the central repository for assemblies installed on a Windows

machine. It provides a uniform, versioned and safe access of assemblies by their strong

assembly name. The assembly manager is loaded from the system component

fusion.dll.”[20]


5 Improvements and Evaluations

5.1 Improvements

5.1.1 Lexical Analysis Improvement

In the lexical analysis, the symbol table originally used the array list data structure, which does not always offer the best performance for a given task. The symbol table is a data structure in which each keyword and identifier in a program's source code is associated with information relating to its declaration or appearance in the source, such as its type, scope level, and sometimes its location.

public static ArrayList a = new ArrayList();

public static void insertKeyword()
{
    a.Add(new SymbolTable("Begin", 300));
    a.Add(new SymbolTable("And", 301));
    a.Add(new SymbolTable("Case", 302));
    a.Add(new SymbolTable("Const", 303));
    a.Add(new SymbolTable("Div", 304));
    . . .
}

Arrays provide random access to a sequential set of data. Dictionaries (or associative arrays) provide a map from a set of keys to a set of values. A dictionary-like type is usually built as a hash table, which provides very fast lookups on average (depending on the quality of the hashing algorithm). I found the dictionary faster than the array list when looking up keywords and identifiers in the symbol table. Array lists just store a set of objects that can be accessed by index, whereas dictionaries store pairs of objects. This makes arrays and lists more suitable when you have a group of objects in a set (prime numbers, colors, students, etc.), and dictionaries better suited for showing relationships between pairs of objects. The Dictionary class takes two generic type parameters, the first for the type of the key and the second for the type of the value. The following code snippet creates a Dictionary where keys are strings and values are integers.

public static Dictionary<string, int> a = new Dictionary<string, int>();

public static void insertKeyword()
{
    a.Add("Begin", 300);
    a.Add("And", 301);
    a.Add("Case", 302);
    a.Add("Const", 303);
    a.Add("Div", 304);
    . . . .
}

So the dictionary version does not need a separate struct type (SymbolTable) to represent its entries: each keyword string maps directly to its token code.
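The difference between the two lookups can be sketched as follows. This is a minimal illustration rather than the DCSPM source: the SymbolTable fields, the KeywordLookup class, the method names, and the case-insensitive comparison (chosen because Pascal keywords are not case sensitive) are all assumptions made for the example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for the SymbolTable entry type used with the array list.
public class SymbolTable
{
    public string Lexeme;
    public int Code;
    public SymbolTable(string lexeme, int code) { Lexeme = lexeme; Code = code; }
}

public static class KeywordLookup
{
    static readonly ArrayList list = new ArrayList();
    // Assumption: keywords are matched without regard to case.
    static readonly Dictionary<string, int> dict =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    static KeywordLookup()
    {
        string[] words = { "Begin", "And", "Case", "Const", "Div" };
        for (int i = 0; i < words.Length; i++)
        {
            list.Add(new SymbolTable(words[i], 300 + i));
            dict.Add(words[i], 300 + i);
        }
    }

    // Linear scan: every entry may be compared before a match or a miss.
    public static int FindInList(string lexeme)
    {
        foreach (SymbolTable entry in list)
        {
            if (string.Equals(entry.Lexeme, lexeme, StringComparison.OrdinalIgnoreCase))
                return entry.Code;
        }
        return -1; // not a keyword
    }

    // Hashed lookup: the cost does not grow with the number of keywords on average.
    public static int FindInDictionary(string lexeme)
    {
        int code;
        return dict.TryGetValue(lexeme, out code) ? code : -1;
    }

    public static void Main()
    {
        Console.WriteLine(FindInList("Case"));        // 302
        Console.WriteLine(FindInDictionary("case"));  // 302
        Console.WriteLine(FindInDictionary("foo"));   // -1
    }
}
```

Using TryGetValue avoids both a second hash lookup and the exception that the indexer throws for a missing key, which matters in a lexer because most identifiers are not keywords.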

5.1.2 Microsoft Intermediate Language (MSIL) of If Statement Improvement

We can optimize the MSIL of the "if statement" by removing an ldc.i4.0 instruction and a ceq instruction and replacing brtrue.s with brfalse.s, and still get the same results as before the optimization. We can see this in the sample code below.

Sample code:

int a = 0, b = 1, c = 2;
if (a == 1)
{
    a = b + c;
}

Improved MSIL of the code:

IL_0000: nop
IL_0001: ldc.i4.0
IL_0002: stloc.0
IL_0003: ldc.i4.1
IL_0004: stloc.1
IL_0005: ldc.i4.2
IL_0006: stloc.2
IL_0007: ldloc.0
IL_0008: ldc.i4.1
IL_0009: ceq
IL_000e: stloc.3
IL_000f: ldloc.3
IL_0010: brfalse.s IL_0018
IL_0012: nop
IL_0013: ldloc.1
IL_0014: ldloc.2
IL_0015: add
IL_0016: stloc.0
IL_0017: nop
IL_0018: ret

Instructions removed from the unimproved MSIL:

IL_000b: ldc.i4.0
IL_000c: ceq

We can remove these two instructions to save space and time. After removing them, we have to replace the instruction "brtrue.s IL_0018" at IL_0010 with "brfalse.s IL_0018".
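The rewrite works because comparing a value with zero (ldc.i4.0, ceq) and branching when that comparison is true is the same as branching when the original value is false. A minimal sketch of this as a peephole pass over a list of instruction strings is given below. It is not taken from the DCSPM compiler: the IfPeephole class and its Optimize method are hypothetical, labels are left out of the strings for simplicity, and the sketch assumes that the temporary local is read only by the branch and that no branch targets an instruction inside the matched pattern.

```csharp
using System;
using System.Collections.Generic;

public static class IfPeephole
{
    // Rewrites   ldc.i4.0 / ceq / stloc.N / ldloc.N / brtrue.s T
    // into                       stloc.N / ldloc.N / brfalse.s T
    public static List<string> Optimize(List<string> il)
    {
        var result = new List<string>();
        int i = 0;
        while (i < il.Count)
        {
            bool match =
                i + 4 < il.Count
                && il[i] == "ldc.i4.0"
                && il[i + 1] == "ceq"
                && il[i + 2].StartsWith("stloc.", StringComparison.Ordinal)
                && il[i + 3] == "ldloc." + il[i + 2].Substring("stloc.".Length)
                && il[i + 4].StartsWith("brtrue.s ", StringComparison.Ordinal);

            if (match)
            {
                result.Add(il[i + 2]);                       // keep the store
                result.Add(il[i + 3]);                       // keep the load
                result.Add("brfalse.s " + il[i + 4].Substring("brtrue.s ".Length));
                i += 5;                                      // skip the whole pattern
            }
            else
            {
                result.Add(il[i]);
                i++;
            }
        }
        return result;
    }

    public static void Main()
    {
        var il = new List<string> {
            "ldloc.0", "ldc.i4.1", "ceq",
            "ldc.i4.0", "ceq", "stloc.3", "ldloc.3", "brtrue.s IL_0018",
            "nop"
        };
        foreach (string line in Optimize(il))
            Console.WriteLine(line);
    }
}
```

In a real assembler pass the instruction offsets after the removed bytes would also have to be recomputed, which this sketch does not attempt.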


5.2 Evaluations and performance

This section describes the evaluation and performance of the DCSPM compiler. The first stage is the symbol table of the lexical analysis, the second stage is the parser phase, and the last part compares the unimproved and improved if/else MSIL results. I tested Pascal programs of 11, 22, 33, 44, 55, 66, 77, 88, and 99 lines in the lexical analysis phase, once using the array list data structure and once using the dictionary data structure. Table 3 shows the results for the array list and the dictionary.

The lexical analysis is consistently faster with the dictionary data structure than with the array list data structure, as the chart in Figure 14 shows.

Table 3: Array list data structure vs. Dictionary data structure

#Lines   Array List   Dictionary
11       7.7702 ms    6.0066 ms
22       7.8529 ms    6.5299 ms
33       7.9264 ms    6.6787 ms
44       8.0363 ms    6.9415 ms
55       8.4518 ms    7.1428 ms
66       8.4946 ms    7.2742 ms
77       8.6187 ms    7.2959 ms
88       8.9369 ms    7.4568 ms
99       9.2126 ms    7.5075 ms

Figure 14: Array list data structure vs. Dictionary data structure (chart of time in ms against lines of program, with one series for Array List and one for Dictionary)


“The Dictionary<TKey,TValue> is probably the most used associative container class.

The Dictionary<TKey,TValue> is the fastest class for associative lookups/inserts/deletes

because it uses a hash table under the covers. Because the keys are hashed, the key type

should correctly implement GetHashCode() and Equals() appropriately or you should

provide an external IEqualityComparer to the dictionary on construction. The

insert/delete/lookup time of items in the dictionary is amortized constant time - O(1) -

which means no matter how big the dictionary gets, the time it takes to find something

remains relatively constant. This is highly desirable for high-speed lookups. The only

downside is that the dictionary, by nature of using a hash table, is unordered, so you

cannot easily traverse the items in a Dictionary in order.” [23]

The differences between the dictionary and the array list are summarized in the quick reference table below. [23]

Collection | Ordering | Contiguous Storage? | Direct Access? | Lookup Efficiency | Manipulate Efficiency | Notes
Dictionary | Unordered | Yes | Via Key | Key: O(1) | O(1) | Best for high performance lookups.
ArrayList | User has precise control over element ordering | Yes | Via Index | O(n) | O(n) | Best for smaller lists

Table 4: Complexity of ArrayList vs. Dictionary

“ArrayList  resizes dynamically. As elements are added, it grows in capacity to accommodate

them. It is most often used in older C# programs. It stores a collection of elements of type object.

This makes casting necessary.” [24]


The second stage is the parser phase, which receives tokens from the lexical analysis through the nextToken() function. After completing the parser implementation, I tested it on Pascal programs of 11, 22, 33, 44, 55, 66, 77, 88, and 99 lines. Table 5 shows the results of testing.

As expected, the time goes up as the code gets more lines. I timed the parser phase with the Stopwatch class, as in the code below:

System.Diagnostics.Stopwatch watch = new System.Diagnostics.Stopwatch();
watch.Start();
Parser();
watch.Stop();
double elapsedMS = watch.ElapsedMilliseconds;
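Stopwatch.ElapsedMilliseconds returns a whole number of milliseconds, so sub-millisecond timings need Elapsed.TotalMilliseconds instead, and very short phases are usually averaged over several runs. The sketch below shows one way to do this. The PhaseTimer class, the AverageMilliseconds method, and the run count are hypothetical and are not part of the DCSPM source; the empty lambda stands in for a call to Parser().

```csharp
using System;
using System.Diagnostics;

public static class PhaseTimer
{
    // Runs the phase 'runs' times and returns the mean time per run in milliseconds.
    public static double AverageMilliseconds(Action phase, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException("runs");

        phase(); // warm-up call so JIT compilation is not included in the measurement

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            phase();
        }
        watch.Stop();

        // Elapsed.TotalMilliseconds is a double and keeps the fractional part.
        return watch.Elapsed.TotalMilliseconds / runs;
    }

    public static void Main()
    {
        // Assumption: replace the empty body with a call to the phase being measured.
        double ms = AverageMilliseconds(() => { }, 100);
        Console.WriteLine("Average time per run: " + ms + " ms");
    }
}
```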

Table 5: Parser phase results

# of code lines   Parser Phase (time, ms)
11                0.41876
22                1.1104
33                2.1496
44                3.4499
55                5.1268
66                6.719
77                8.8899
88                10.2701
99                13.3532

Figure 15: Parser phase results (chart of time in ms against number of lines of Pascal code)

I also tested the unimproved and improved if/else MSIL results, which the DCSPM compiler generates in an .il file. I created a batch file, timer.cmd, to measure the run time of the MSIL results:

@echo off
echo %time% < nul
cmd /c %1
echo %time% < nul

When ILAsm finishes compiling the .il file, it outputs an exe with the same name as the .il file. I then ran the command timer myfile.exe in cmd. I tested 11, 22, 33, 44, 55, 66, 77, 88, and 99 lines of Pascal code. Table 6 shows the benchmark between the unimproved and improved if/else MSIL results.

Table 6: benchmark between unimproved and improved IF/Else MSIL

Lines of Pascal Code   Unimproved MSIL code   Improved MSIL code
11                     12.6 ms                13.6 ms
22                     13.4 ms                11.6 ms
33                     12.8 ms                12.8 ms
44                     13.4 ms                10.4 ms
55                     14.4 ms                12.2 ms
66                     14.8 ms                12.8 ms
77                     15 ms                  13.2 ms
88                     15.2 ms                13.4 ms
99                     15.6 ms                13.9 ms

The improved if/else MSIL runs faster than the unimproved version in seven of the nine test programs, as shown in Figure 16 (the 11-line program was slower and the 33-line program took the same time), because the improved if/else MSIL contains less code than the unimproved if/else MSIL. Also, when DCSPM generates the improved if/else .il file, the file is the same size as or smaller than the unimproved if/else .il file, as shown in Figure 17.

Table 7: benchmark between unimproved and improved IF/Else.il files

Lines of Pascal code   Unimproved Size   Improved Size
11                     2 KB              2 KB
22                     3 KB              3 KB
33                     5 KB              5 KB
44                     7 KB              6 KB
55                     8 KB              8 KB
66                     10 KB             9 KB
77                     11 KB             11 KB
88                     12 KB             11 KB
99                     14 KB             13 KB

Figure 16: unimproved and improved IF/Else MSIL results (chart of time in ms against number of lines of Pascal code)

Figure 17: Benchmark between size files of unimproved and improved IF/Else.il (chart of size in KB against number of lines of Pascal code)

6 Lessons Learned

I started my research by reading books, papers, and e-books. I found tools that I can use to verify the MSIL my compiler generates. One is ildasm.exe, which converts IL to human readable code and is located at C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin. Another is ilasm.exe, which converts human readable code to IL and has an instruction set similar to that of a native assembly language. I write my code for ilasm in any text editor, such as Notepad, and then use the command line compiler (ilasm.exe) provided by the .NET Framework, which is located at C:\Windows\Microsoft.NET\Framework\v1.1.4322 or C:\Windows\Microsoft.NET\Framework\v2.0.50727.

In the parser phase, I faced some issues while programming the DCSPM compiler. First, in MSIL code every instruction has a label, and the label depends on the sizes of the preceding instructions, which vary: an instruction may take one byte, two bytes, or five bytes. I solved this issue with the function below:

public static string newLabel()
{
    string s = count1.ToString();
    str6 = "IL_" + s + ":";
    return str6;
}

This function generates the label numbers, but there is another issue: it generates decimal numbers, whereas MSIL labels have to be hexadecimal. I changed the function to generate hexadecimal numbers:

public static string newLabel()
{
    // "X" formats count1 as an uppercase hexadecimal number
    string hexValue = count1.ToString("X");
    str6 = "IL_" + hexValue + ":";
    return str6;
}
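The relation between instruction sizes and labels can be sketched as a small generator that keeps a running byte offset. This is an illustrative sketch, not the DCSPM implementation: the LabelGenerator class and the way the size is passed in are assumptions, since the text does not show how count1 is advanced. The sizes used in Main follow the CIL encoding, where ldc.i4.2 and stloc.0 take one byte, a short branch such as brtrue.s takes two bytes, and the long form br takes five bytes.

```csharp
using System;

public class LabelGenerator
{
    private int offset = 0; // byte offset of the next instruction

    // Returns the label of an instruction that occupies 'instructionSize' bytes
    // and advances the offset past it.
    public string NewLabel(int instructionSize)
    {
        if (instructionSize <= 0) throw new ArgumentOutOfRangeException("instructionSize");
        string label = "IL_" + offset.ToString("X") + ":";
        offset += instructionSize;
        return label;
    }

    public static void Main()
    {
        var labels = new LabelGenerator();
        Console.WriteLine(labels.NewLabel(1) + " ldc.i4.2");        // IL_0:
        Console.WriteLine(labels.NewLabel(1) + " stloc.0");         // IL_1:
        Console.WriteLine(labels.NewLabel(5) + " br IL_A");         // IL_2:
        Console.WriteLine(labels.NewLabel(2) + " brtrue.s IL_A");   // IL_7:
        Console.WriteLine(labels.NewLabel(1) + " nop");             // IL_9:
        Console.WriteLine(labels.NewLabel(1) + " ret");             // IL_A:
    }
}
```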


When I compile Pascal code to MSIL, a statement such as if/else contains branches that jump forward to later instructions, and it is difficult to know the label number of an instruction before it has been generated. For example:

Pascal Code:

program Example(input,output);
var a,b: integer;
begin
  a:=2;
  b:= 3;
  if(a>=2 or b<3) then
  begin
    a:=b+b;
  end
  else
  begin
    a:=b-b;
  end;
end;
end.

MSIL Code:

IL_0: ldc.i4.2
IL_1: stloc.0
IL_2: ldc.i4.3
IL_3: stloc.1
IL_4: ldloc.0
IL_6: ldc.i4.2
IL_7: bge.s IL_10      // if ldloc.0 is greater than or equal to ldc.i4.2, go to IL_10
IL_9: ldloc.1
IL_B: ldc.i4.3
IL_C: clt
IL_E: br.s IL_11       // just branch to IL_11
IL_10: ldc.i4.0
IL_11: stloc.2
IL_12: ldloc.2
IL_13: brtrue.s IL_20  // branch to IL_20 if the value is non-zero ("true")
IL_15: ldloc.1
IL_17: ldloc.1
IL_19: add
IL_1A: stloc.0
IL_1B: br IL_26        // just branch to IL_26
IL_20: ldloc.1
IL_22: ldloc.1
IL_24: sub
IL_25: stloc.0
IL_26: ret

Figure 18: How Branches of If/else statements logic works

The if/else logic in Figure 18 shows that, when programming this statement, I had to know the next label before parsing it. To solve this issue, I let the parser phase finish scanning all the Pascal code.
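Forward branches such as those to IL_10, IL_11, IL_20, and IL_26 need a target that is not yet known when the branch is emitted. One common way to handle this is a two-pass scheme: the first pass emits a placeholder in place of the target and records the real label once it is reached, and the second pass copies the instructions to a new queue with the placeholders filled in. The sketch below illustrates only the second pass. The ForwardLabels class, the Resolve method, and the "@else" and "@end" placeholder names are hypothetical, and the sketch assumes that no placeholder is a prefix of another.

```csharp
using System;
using System.Collections.Generic;

public static class ForwardLabels
{
    // Second pass: move instructions from the first queue to a new queue,
    // replacing each placeholder with the label recorded for it in the first pass.
    public static Queue<string> Resolve(Queue<string> pending,
                                        Dictionary<string, string> targets)
    {
        var resolved = new Queue<string>();
        while (pending.Count > 0)
        {
            string line = pending.Dequeue();
            foreach (KeyValuePair<string, string> target in targets)
            {
                line = line.Replace(target.Key, target.Value);
            }
            resolved.Enqueue(line);
        }
        return resolved;
    }

    public static void Main()
    {
        // First pass result (assumed): branch targets are still placeholders.
        var pending = new Queue<string>();
        pending.Enqueue("IL_13: brtrue.s @else");
        pending.Enqueue("IL_15: ldloc.1");
        pending.Enqueue("IL_1B: br @end");
        pending.Enqueue("IL_20: ldloc.1");
        pending.Enqueue("IL_26: ret");

        // Labels recorded when the parser reached the corresponding instructions.
        var targets = new Dictionary<string, string>();
        targets["@else"] = "IL_20";
        targets["@end"] = "IL_26";

        foreach (string line in Resolve(pending, targets))
            Console.WriteLine(line);
    }
}
```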

While the parser phase reads the Pascal code and generates MSIL code, it saves the code in a queue, and it saves in variables the labels of the instructions that are targets of forward branches, such as 10, 11, 20, and 26 in Figure 18. Once the Pascal code has been parsed and the MSIL has been saved in the queue, the forward labels are known. The parser then reads the MSIL code back from the queue, and I created a second queue to hold the MSIL code as the parser dequeues it: the parser dequeues the MSIL code instruction by instruction and enqueues it to the new queue until the old queue is empty. I applied the same approach for all branches in the other statements. I programmed a case statement that looks like this:

program SwitchStatement(input,output);
var a,b:integer;
begin
  a:=1;
  b:= 4;
  case a of
    1 : a:=a div b;
    2 : a:= b+a;
    3 : a:= a - b;
    4 : a:=b*a;
  end;
  writeln(a);
end;
end.

When I changed the case labels so that they are no longer consecutive, such as this:

program SwitchStatement(input,output);
var a,b:integer;
begin
  a:=5;
  b:= 4;
  case a of
    1 : a:=a div b;
    2 : a:= b+a;
    3 : a:= a - b;
    5 : a:=b*a;
  end;
  writeln(a);
end;
end.

Here, when I compiled this code to MSIL, it should be:

IL_000a: switch ( IL_0025, IL_002b, IL_0031, IL_003d, IL_0037)   // the fourth label (IL_003d) goes to the ret label
.
.
.
IL_003d: ret

My compiler's MSIL output for the case statement does not yet produce this, so I kept the case labels consecutive (1, 2, 3, 4, and so on) to get correct results. Solving this problem needs more time.
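One possible way to handle non-consecutive case labels is to build the jump table with one entry per value between the lowest and highest label and to point every missing value at the label that follows the case statement. The sketch below is an illustration of that idea, not something the DCSPM compiler does: the SwitchTable class and its Build method are hypothetical. Since the MSIL switch instruction indexes its table from zero, the sketch also assumes that the lowest case value is subtracted from the selector before the switch.

```csharp
using System;
using System.Collections.Generic;

public static class SwitchTable
{
    // caseLabels maps each case constant to the IL label of its body.
    // Returns a switch instruction with one target per value from the lowest
    // to the highest case constant; values without a case go to endLabel.
    public static string Build(SortedDictionary<int, string> caseLabels, string endLabel)
    {
        if (caseLabels.Count == 0) throw new ArgumentException("no cases", "caseLabels");

        var keys = new List<int>(caseLabels.Keys); // enumerated in ascending order
        int min = keys[0];
        int max = keys[keys.Count - 1];

        var targets = new List<string>();
        for (int value = min; value <= max; value++)
        {
            string label;
            targets.Add(caseLabels.TryGetValue(value, out label) ? label : endLabel);
        }
        return "switch ( " + string.Join(", ", targets.ToArray()) + ")";
    }

    public static void Main()
    {
        var cases = new SortedDictionary<int, string>();
        cases[1] = "IL_0025";
        cases[2] = "IL_002b";
        cases[3] = "IL_0031";
        cases[5] = "IL_0037";

        // The missing value 4 is mapped to the label after the case statement.
        Console.WriteLine(Build(cases, "IL_003d"));
    }
}
```

A table of this kind grows with the distance between the lowest and highest label, so widely spread case values are usually better served by a chain of comparisons.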

To test my code, I first used the DateTime and TimeSpan classes to measure its speed, like this:

DateTime Start = DateTime.Now;

lex();

TimeSpan Elapsed = DateTime.Now- Start;

speed = "Time Elapsed of Lexical Analysis: " + Elapsed.TotalMilliseconds + "ms";

“where you take the current DateTime, run the method whose performance you want to measure, take the current time again, and subtract the initial time to get a TimeSpan object representing the length of time your function took to execute.

Unfortunately, this method only gives a good measure of performance when the method you're measuring has a long run time (a second or longer), since DateTime.Now uses the system timer, which only has a resolution of about 10 milliseconds, meaning that if your method completes in less than 10 milliseconds, the elapsedMS variable above might return 0, telling you nothing about how long your method actually took to complete.

Luckily since .Net 2.0, there is a better alternative to DateTime.Now: the stopwatch class in the System.Diagnostics namespace. This class was, as the name implies, designed for performance measuring, and uses your computer's high-resolution performance counter, which usually has a resolution of less than one microsecond.”[25]

Rewriting the above code to use the Stopwatch class is easy:

System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();
stopwatch.Start();
lex();
stopwatch.Stop();
speed = "Time Elapsed of Lexical Analysis: " + stopwatch.Elapsed.TotalMilliseconds + "ms";

7 Future Works and Conclusion

Because the DCSPM compiler is relatively new, there is an enormous amount of work still to be done. Many statements and data structures of the Pascal language are not yet compiled to MSIL in this project, such as: complicated case statements, complicated nested if/else statement logic, the assert statement, exit statement, goto statement, repeat statement, next statement, two dimensional arrays, queues, and stacks. Also, this project implements only the integer and real types, so many other types remain to be done. The predeclared procedures are not implemented in this project either.

In conclusion, since there is currently no Pascal compiler which compiles to MSIL, this project focuses on generating MSIL code for the Pascal language. The DCSPM compiler is useful for running legacy Pascal code on modern machines, and its MSIL output is platform independent. MSIL code is verified for safety at runtime, and it can be executed in any environment that supports the CLI (Common Language Infrastructure). Working with MSIL also helps in understanding that the CLR is a stack-based machine, as others (e.g. the JVM) are at their core. It helps to understand what is going on, how the runtime handles memory, metadata, etc., and why some things work and others don't. The DCSPM compiler reads the Pascal code, scans it token by token, passes the tokens to the parser and MSIL phase, and generates the MSIL code of the Pascal program. Along the way I found my new best friend: the Intermediate Language Disassembler (ILDASM). ILDASM lets you see the pseudo assembly language for .NET, and it is the only way to see the who, what, when, where, and why of .NET. While I will probably never write major programs in Microsoft intermediate language (MSIL), knowing my way around the assembly language certainly helps.

I faced many problems; one of them is the one dimensional array. A one dimensional array has two cases when compiling to MSIL. In the first case, when the array has one or two elements, its MSIL looks like the MSIL of the other statements (if/else/while, etc.), as in Figure 19.

Figure 19: MSIL of One dimensional Array has one element

I designed this as I did the other statements: I added a function to the parser phase, called OneDimArray(), added its MSIL, and emitted it with the emit() function. The MSIL of a one dimensional array looks like the code below:

Pascal code: var a: array[1..2] of integer = (3,4);

.method private hidebysig static void Main(string[] args) cil managed
{
  .entrypoint
  // Code size 19 (0x13)
  .maxstack 3
  .locals init ([0] int32[] a, [1] int32[] CS$0$0000)
  IL_0000: nop
  IL_0001: ldc.i4.2
  IL_0002: newarr [mscorlib]System.Int32
  IL_0007: stloc.1
  IL_0008: ldloc.1
  IL_0009: ldc.i4.0
  IL_000a: ldc.i4.1
  IL_000b: stelem.i4
  IL_000c: ldloc.1
  IL_000d: ldc.i4.1
  IL_000e: ldc.i4.2
  IL_000f: stelem.i4
  IL_0010: ldloc.1
  IL_0011: stloc.0
  IL_0012: ret
} // end of method Program::Main

But when I examined in ILDASM a one dimensional array that has three or more elements, I got MSIL results different from those for one or two elements, as in Figure 20.

The Pascal code of a one dimensional array that has four elements is below:

program ArrayOneDim (input,output);
var a: array[1..4] of integer = (1,2,3,4);
begin
  writeln(a[2]);
end;
end.

So a one dimensional array that has three or more elements has a MANIFEST, a Test namespace, a Test.Program class, a <PrivateImplementationDetails> class, and a __StaticArrayInitTypeSize=16 value class. The code below shows how I designed the MSIL for a one dimensional array that has three or more elements.

// =============== CLASS MEMBERS DECLARATION =============

// ArrayOneDim is the name of the class; it changes depending on the name of the program.
.class public auto ansi ArrayOneDim
       extends [mscorlib]System.Object
{
  .method private hidebysig static void Main() cil managed
  {
    .entrypoint
    .maxstack 10
    .locals init ([0] int32 a, [1] bool CS$4$0000, [2] int32 CS$4$0001)
    IL_0: ldc.i4.4
    IL_1: newarr [mscorlib]System.Int32
    IL_5: dup    // presumably to preserve stack usage
    // The size is 16 because we have 4 elements, so 4*4=16.
    IL_6: ldtoken field valuetype '<PrivateImplementationDetails>'/'__StaticArrayInitTypeSize=16' '<PrivateImplementationDetails>'::'$$method0x6000001-1'
    IL_B: call void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, valuetype [mscorlib]System.RuntimeFieldHandle)
    IL_10: stloc.0
    IL_11: ldloc.0
    IL_12: ldc.i4.2
    IL_13: ldelem.i4
    IL_14: call void [mscorlib]System.Console::WriteLine(int32)
    IL_19: ret
  }
  // This part is usually the same whatever the one dimensional array looks like.
  .method public specialname rtspecialname instance void .ctor() cil managed
  {
    .maxstack 8
    IL_0000: ldarg.0
    IL_0001: call instance void [mscorlib]System.Object::.ctor()
    IL_0006: ret
  } // end of method ArrayOneDim::.ctor
} // end of class
.class private auto ansi '<PrivateImplementationDetails>' extends [mscorlib]System.Object
{
  .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
  .field static assembly valuetype '<PrivateImplementationDetails>'/'__StaticArrayInitTypeSize=16' '$$method0x6000001-1' at I_00002050
  .class explicit ansi sealed nested private '__StaticArrayInitTypeSize=16' extends [mscorlib]System.ValueType
  {
    .pack 1
    .size 16
  } // end of class '__StaticArrayInitTypeSize=16'
} // end of class '<PrivateImplementationDetails>'
// Elements of the one dimensional array in hexadecimal.
.data I_00002050 = bytearray (01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00)

Figure 20: MSIL of One dimensional Array has four elements
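The bytearray and the size in __StaticArrayInitTypeSize can both be derived from the array initializer: each 32-bit integer is written as four bytes, least significant byte first. The sketch below shows that derivation. The ArrayData class and its method names are hypothetical and are not part of the DCSPM source.

```csharp
using System;
using System.Text;

public static class ArrayData
{
    // Size in bytes of the initializer, as used in __StaticArrayInitTypeSize=<size>.
    public static int SizeInBytes(int[] values)
    {
        return values.Length * sizeof(int);
    }

    // Encodes each 32-bit integer as four little-endian bytes in hexadecimal,
    // separated by spaces, as they appear inside bytearray ( ... ).
    public static string ToByteArrayText(int[] values)
    {
        var text = new StringBuilder();
        foreach (int value in values)
        {
            uint bits = unchecked((uint)value);
            for (int i = 0; i < 4; i++)
            {
                if (text.Length > 0) text.Append(' ');
                text.Append(((bits >> (8 * i)) & 0xFF).ToString("X2"));
            }
        }
        return text.ToString();
    }

    public static void Main()
    {
        int[] a = { 1, 2, 3, 4 };
        Console.WriteLine(SizeInBytes(a));      // 16
        Console.WriteLine(ToByteArrayText(a));  // 01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
    }
}
```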


8 References

1. http://msdn.microsoft.com/en-us/library/c5tkafs1(v=vs.71).aspx
2. C# To Program by H. M. Deitel, P. J. Deitel, J. Listfield, T. R. Nieto, C. Yaeger, and M. Zlatkina.
3. Compiler Construction: Principles and Practice by Kenneth C. Louden.
4. Data Structure Using Java by D. S. Malik and P. S. Nair.
5. An Introduction to Formal Languages and Automata, Fourth Edition, by Peter Linz.
6. Compilers: Principles, Techniques and Tools by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. 1985.
7. Develop a Compiler in Java for a Compiler Design Course by Abdul Sattar and Torben Lorenzen.
8. Guide to Assembly Language [electronic resource]: A Concise Introduction by James T. Streib. London; New York: Springer, c2011.
9. Using a Stack Assembler Language in a Compiler Course by Dr. Gerald Wildenberg, St. John Fisher College, Rochester, NY; Bristol Polytechnic, England (1989-1990).
10. Expert .NET 2. IL assembler by Serge Lidin (Lidin, Serge, 1956-). Berkeley, CA.
11. http://www.codeproject.com/Articles/3778/Introduction-to-IL-Assembly-Language
12. http://msdn.microsoft.com/en-us/library/ht8ecch6(v=vs.71)
13. Pro C# 2008 and the .NET 3.5 Platform, Fourth Edition.
14. http://www.codeguru.com/csharp/.net/net_general/il/article.php/c4635/MSIL-Tutorial.htm
15. http://en.wikipedia.org/wiki/Pascal_(programming_language)
16. http://pages.cs.wisc.edu/~fischer/cs536.s08/lectures/Lecture02.4up.pdf
17. http://msdn.microsoft.com/en-us/library/system.collections.arraylist.aspx
18. http://msdn.microsoft.com/en-us/library/kx37x362.aspx
19. http://en.wikipedia.org/wiki/Microsoft_Visual_Studio_Express#Visual_C.23_Express
20. http://dll-repair-tools.com/dll-files/fusiondll-the-assembly-manager
21. http://www.learnvisualstudio.net/start-here/lesson-1-1-installing-visual-c-2010-express-edition/
22. http://www.seas.gwu.edu/~hchoi/teaching/cs160d/pascal.pdf
23. http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx
24. http://www.dotnetperls.com/arraylist

Appendix A:

PASCAL Grammar BNF. [22]

<program> ::= Program <identifier> ; <block> .
<block> ::= <variable declaration part> <procedure declaration part> <statement part>
<variable declaration part> ::= <empty> | var <variable declaration> ; { <variable declaration> ; }
<variable declaration> ::= <identifier> { , <identifier> } : <type>
<type> ::= <simple type>
<simple type> ::= <type identifier>
<type identifier> ::= <identifier>
<statement part> ::= <compound statement>
<compound statement> ::= begin <statement> { ; <statement> } end
<statement> ::= <simple statement> | <structured statement>
<simple statement> ::= <assignment statement> | <read statement> | <write statement> | <if statement> | <for statement>
<assignment statement> ::= <variable> := <expression>
<read statement> ::= read ( <input variable> { , <input variable> } )
<input variable> ::= <variable>
<write statement> ::= write ( <output value> { , <output value> } )
<output value> ::= <expression>
<structured statement> ::= <compound statement> | <if statement> | <while statement>
<if statement> ::= if <expression> then <statement> | if <expression> then <statement> else <statement>
<while statement> ::= while <expression> do <statement>
<for statement> ::= for <variable identifier> := <expression> to <expression> do <statement>
<expression> ::= <simple expression> | <simple expression> <relational operator> <simple expression>
<simple expression> ::= <sign> <term> { <adding operator> <term> }
<term> ::= <factor> { <multiplying operator> <factor> }
<factor> ::= <variable> | ( <expression> )
<relational operator> ::= = | <> | < | <= | >= | >
<adding operator> ::= + | -
<multiplying operator> ::= * | /
<variable> ::= <entire variable>
<entire variable> ::= <variable identifier>
<variable identifier> ::= <identifier>
<identifier> ::= <letter> { <letter or digit> }
<letter or digit> ::= <letter> | <digit>
<integer constant> ::= <digit> { <digit> }
<character constant> ::= '< any character other than ' >' | ''''
<letter> ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<special symbol> ::= + | - | * | = | <> | < | > | <= | >= | ( | ) | := | . | , | ; | : | if | then | else | of | while | do | begin | end | read | write | var |  | program | switch | for | to
<predefined identifier> ::= integer | Boolean