Testing Grammars For Top Down Parsers By Asma M Paracha, Frantisek F. Franek Dept. of Computing & Software McMaster University Hamilton, Ont

Testing Grammars For Top Down Parsers

By Asma M Paracha, Frantisek F. Franek

Dept. of Computing & Software

McMaster University, Hamilton, Ont.

Presentation Outline

Background Information

Purdom’s Algorithm

Implementation of Purdom’s Algorithm

Test cases for MACS

Conclusions & Future Work

Compiler Test Case

A compiler is a computer program that accepts a source program as input and produces either object code or error messages, depending on the source code.

A test case for a compiler should have:
1. A test case description
2. A source program
3. An expected output

Major problems:
1. Completeness of coverage
2. Infeasible size of test data

Testing Grammar

A grammar defines both a language and the basis for deriving the elements of that language.

A grammar is considered both a program and a specification.

Two important types of grammars:
1. Context-free grammars
2. Regular grammars

A grammar should be tested to verify that:
1. It defines the language for which it was written.
2. It is complete (every terminal and every rule is used).
3. It contains no ambiguity.
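The completeness check in item 2 can be sketched as a reachability pass over the rules. This is a minimal illustration, not the authors' tool: the function name `unused_symbols`, the `(lhs, rhs)` rule encoding and the toy grammar are all assumptions made for the example.

```python
def unused_symbols(start, rules, terminals):
    """Return (numbers of unreachable rules, terminals no reachable rule uses).

    rules is a list of (lhs, rhs) pairs, where rhs is a tuple of symbols;
    a rule's number is its index in the list (an assumed encoding).
    """
    nonterminals = {lhs for lhs, _ in rules}
    reachable = {start}
    changed = True
    while changed:  # grow the reachable set until it stops changing
        changed = False
        for lhs, rhs in rules:
            if lhs not in reachable:
                continue
            for sym in rhs:
                if sym in nonterminals and sym not in reachable:
                    reachable.add(sym)
                    changed = True
    dead_rules = [i for i, (lhs, _) in enumerate(rules) if lhs not in reachable]
    used = {sym for lhs, rhs in rules if lhs in reachable for sym in rhs}
    unused_terminals = sorted(set(terminals) - used)
    return dead_rules, unused_terminals


# Hypothetical toy grammar: rule 2 can never be reached from S.
TOY_RULES = [
    ("S", ("a", "B")),
    ("B", ("b",)),
    ("C", ("c",)),
]
DEAD_RULES, UNUSED_TERMINALS = unused_symbols("S", TOY_RULES, {"a", "b", "c"})
```

On the toy grammar the rule for C and the terminal c are reported as unused. Ambiguity (item 3) is not covered by this sketch.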

Context Free Grammars (CFG)

A CFG is a set of recursive rewriting rules used to generate strings in various patterns.

Components of a grammar:
1. N: Finite set of non-terminals
2. T: Finite set of terminals, disjoint from N
3. S: Initial symbol from N (starting symbol)
4. P: Finite set of production rules of the form n → m
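The four components can be written down directly as a small data structure. The sketch below is only an illustration; the class name `Grammar`, its field names and the `problems` check are assumptions, not part of the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A CFG as the tuple (N, T, S, P); field names are illustrative."""
    nonterminals: frozenset   # N
    terminals: frozenset      # T
    start: str                # S
    productions: tuple        # P, as (lhs, rhs) pairs with rhs a tuple of symbols

    def problems(self):
        """List violations of the constraints stated for N, T, S and P."""
        issues = []
        if self.nonterminals & self.terminals:
            issues.append("N and T overlap")
        if self.start not in self.nonterminals:
            issues.append("S is not in N")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                issues.append(f"left-hand side {lhs!r} is not in N")
            for sym in rhs:
                if sym not in symbols:
                    issues.append(f"unknown symbol {sym!r} in a rule for {lhs!r}")
        return issues


# Hypothetical example: S -> a S b | (empty)
GOOD = Grammar(frozenset({"S"}), frozenset({"a", "b"}), "S",
               (("S", ("a", "S", "b")), ("S", ())))
BAD = Grammar(frozenset({"S", "a"}), frozenset({"a"}), "T", ())
```

Here `GOOD.problems()` is empty, while `BAD.problems()` reports that N and T overlap and that the start symbol is not a non-terminal.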

There is no practical way to check the dynamic semantics of the language defined by a CFG.

Testing Parser

Parsing is the process of checking a given program against the grammar rules to determine whether or not it is syntactically correct.

Test data for a parser is a program that uses all the production rules of the underlying grammar.
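One way to make "uses all the production rules" checkable is to record which rule numbers each test program exercises and report the rules that never appear. The sketch below assumes such traces are available; the function name `uncovered_rules` and the sample traces are hypothetical.

```python
def uncovered_rules(rule_count, traces):
    """Return the rule numbers (0 .. rule_count - 1) that no trace uses."""
    used = set()
    for trace in traces:
        used.update(trace)
    return [number for number in range(rule_count) if number not in used]


# Hypothetical traces: the rule numbers two test programs exercised.
TRACES = [[0, 2, 3], [1]]
MISSING = uncovered_rules(5, TRACES)
```

With five rules and these two traces, rule 4 is the only one left uncovered.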

Testing the parser

Purdom’s Algorithm

Proposed by Purdom in 1972 for testing compilers, the algorithm automatically generates sentences from a grammar, using every production rule at least once.

The generated sentences are suitable for checking most of the compiler's code and tables.

It checks only the syntactic aspect; there is no guarantee of proper execution.

Generated sentences may be inconsistent with contextual constraints, e.g. variable declarations, use of identifiers, type checking.

Purdom’s algorithm verifies compiler correctness; it is not concerned with other aspects such as efficiency and performance.

Testing MACS Compiler

MACS is a simple object-oriented language very similar to C++ and Java. It is used in Franek's forthcoming book on compilers.

MACS uses both LALR and LL(1) grammars and has a C++ bottom-up parser built using Bison/Flex and a Java top-down parser built using JavaCC.

The MACS LL(1) grammar has 77 terminals, 90 non-terminals and 301 production rules.

Implementation of Purdom’s algorithm

Phase I (Shortest String Length)

SLEN: For each symbol, the length of the shortest terminal string derivable from it.

RLEN: For each production rule, the length of the shortest terminal string derivable from it.

SHORT: For each non-terminal, the number of the rule that leads to its shortest string derivation.
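The three Phase I tables can be filled by repeating one pass over the rules until nothing changes. The sketch below follows the descriptions on this slide (length counted in terminal symbols) and is not necessarily Purdom's exact bookkeeping; the function name, the `(lhs, rhs)` rule encoding and the toy grammar are assumptions.

```python
import math


def shortest_lengths(rules, terminals):
    """Compute SLEN, RLEN and SHORT by iterating to a fixed point.

    rules is a list of (lhs, rhs) pairs, where rhs is a tuple of symbols;
    a rule's number is its index in the list.
    """
    slen = {t: 1 for t in terminals}       # a terminal derives itself
    for lhs, _ in rules:
        slen[lhs] = math.inf               # unknown until some rule resolves it
    rlen = [math.inf] * len(rules)
    short = {}
    changed = True
    while changed:
        changed = False
        for number, (lhs, rhs) in enumerate(rules):
            length = sum(slen.get(sym, math.inf) for sym in rhs)
            if length < rlen[number]:
                rlen[number] = length
                changed = True
            if length < slen[lhs]:
                slen[lhs] = length
                short[lhs] = number        # best rule found so far for lhs
                changed = True
    return slen, rlen, short


# Hypothetical toy grammar, rules numbered 0 to 3.
TOY_RULES = [
    ("S", ("a", "B")),
    ("S", ("c",)),
    ("B", ("b", "C")),
    ("C", ("d",)),
]
SLEN, RLEN, SHORT = shortest_lengths(TOY_RULES, {"a", "b", "c", "d"})
```

For the toy grammar, S has shortest length 1 (through rule 1), B has 2 and C has 1, and the rule lengths are 3, 1, 2 and 1.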

Phase II (Shortest Derivation Length)

DLEN: For each non-terminal, the length of the shortest terminal string whose derivation uses it.

PREV: For each non-terminal, the number of the rule that introduces it in the shortest string derivation.
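Phase II can be sketched the same way: a non-terminal introduced by a rule inherits the length of the shortest sentence through the rule's left-hand side, adjusted for using that rule instead of the shortest one. The code below is an illustrative reading of the table descriptions above; the function name, the rule encoding, and the hand-computed SLEN and RLEN values for the toy grammar are assumptions.

```python
import math


def shortest_derivations(start, rules, slen, rlen):
    """Compute DLEN and PREV from the Phase I tables.

    rules is a list of (lhs, rhs) pairs; a rule's number is its index.
    """
    nonterminals = {lhs for lhs, _ in rules}
    dlen = {n: math.inf for n in nonterminals}
    prev = {}
    dlen[start] = slen[start]
    changed = True
    while changed:
        changed = False
        for number, (lhs, rhs) in enumerate(rules):
            if math.isinf(dlen[lhs]) or math.isinf(rlen[number]):
                continue                   # lhs not reached yet, or rule unusable
            # Shortest sentence through lhs, with this rule replacing its shortest one.
            candidate = dlen[lhs] - slen[lhs] + rlen[number]
            for sym in rhs:
                if sym in nonterminals and candidate < dlen[sym]:
                    dlen[sym] = candidate
                    prev[sym] = number
                    changed = True
    return dlen, prev


# Hypothetical toy grammar with its Phase I values worked out by hand.
TOY_RULES = [
    ("S", ("a", "B")),
    ("S", ("c",)),
    ("B", ("b", "C")),
    ("C", ("d",)),
]
TOY_SLEN = {"a": 1, "b": 1, "c": 1, "d": 1, "S": 1, "B": 2, "C": 1}
TOY_RLEN = [3, 1, 2, 1]
DLEN, PREV = shortest_derivations("S", TOY_RULES, TOY_SLEN, TOY_RLEN)
```

In the toy grammar the shortest sentence that uses B or C is "a b d" (length 3); B is introduced by rule 0 and C by rule 2. The start symbol has no PREV entry.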

Implementation of Purdom’s algorithm (Contd.)

Phase III (Generate Sentence)

ONST: The number of occurrences of each non-terminal on the stack.

ONCE: For each non-terminal, information on whether to use it and how to rewrite it. Contains one of the following values:
1. READY
2. UNSURE
3. FINISHED
4. INTEGER

MARK: For each rule, a boolean recording whether the rule has been used.

STACK: Symbols are pushed onto the stack, beginning with the starting symbol “S”. A terminal is popped whenever it reaches the top, and a non-terminal is rewritten by an appropriate rule.
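The stack discipline can be illustrated with a deliberately simplified generator. It is not Purdom's Phase III: it omits the ONCE and ONST bookkeeping and starts a new sentence for each rule that is still unmarked, so it may produce more sentences than the real algorithm. It assumes every non-terminal is reachable from the start symbol; the function name, the rule encoding, and the hand-computed SHORT and PREV tables are hypothetical.

```python
def cover_rules(start, rules, short, prev):
    """Generate sentences until every rule is marked as used (simplified)."""
    nonterminals = {lhs for lhs, _ in rules}
    mark = [False] * len(rules)
    sentences = []
    for target in range(len(rules)):
        if mark[target]:
            continue
        # Force the rules on the PREV path from the start symbol to the target.
        forced = {rules[target][0]: target}
        sym = rules[target][0]
        while sym != start:
            rule = prev[sym]
            sym = rules[rule][0]
            forced[sym] = rule
        stack, sentence = [start], []
        while stack:
            top = stack.pop()
            if top not in nonterminals:
                sentence.append(top)       # a terminal is popped and emitted
                continue
            rule = forced.pop(top, short[top])  # forced once, then shortest rule
            mark[rule] = True
            stack.extend(reversed(rules[rule][1]))  # leftmost symbol ends on top
        sentences.append(sentence)
    return sentences, mark


# Hypothetical toy grammar with SHORT and PREV worked out by hand.
TOY_RULES = [
    ("S", ("a", "B")),
    ("S", ("c",)),
    ("B", ("b", "C")),
    ("C", ("d",)),
]
TOY_SHORT = {"S": 1, "B": 2, "C": 3}
TOY_PREV = {"B": 0, "C": 2}
SENTENCES, MARK = cover_rules("S", TOY_RULES, TOY_SHORT, TOY_PREV)
```

For the toy grammar two sentences, "a b d" and "c", are enough to mark all four rules.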

Results

Columns: Purdom Generated Sentences, MACS Test Cases, Syntactically Correct, Semantically Correct. Each entry below gives the generated sentence, the MACS test case, and the X marks shown for it.

VOID CLASSNAME_DOT ID_LP CLASSNAME ID COMMA CONST CLASSNAME ID RP SEMICOL
void A.a( A a1, const A b);
X X

VOID CLASSNAME_DOT ID_LP BOOL ID RP SEMICOL
void A.a(bool b);
X X

PUBLIC VOID CLASSNAME_DOT COMMA CLASSNAME_DOT ID SEMICOL
public void A.a,A.b;

CLASS CLASSNAME SEMICOL
class A;
X X

CLASS ID LB RB
class a{}
X X

PUBLIC CONST VOID CLASSNAME_DOT ID SEMICOL
public const void A.a;
X

PUBLIC SHARED VOID CLASSNAME_DOT ID SEMICOL
public shared void A.a;
X

Conclusions

Parser testing is an area which needs more attention.

Purdom’s algorithm is a complete method for testing small grammars.

The generated test cases are combined with the semantic aspects of the language to perform compiler validation.

Future work includes testing the most advanced features of the MACS compiler.