Testing Grammars For Top Down Parsers
By Asma M Paracha, Frantisek F. Franek
Dept. of Computing & Software
McMaster University, Hamilton, Ont
Presentation Outline
Background Information
Purdom’s Algorithm
Implementation of Purdom’s Algorithm
Test cases for MACS
Conclusions & Future Work
Compiler Test Case
A compiler is a computer program that accepts a source program as input and produces either object code or error messages, depending on the source code.
A test case for a compiler should have:
1. A test case description
2. A source program
3. An expected output
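The three parts of a test case can be held in a small record. This is a minimal sketch, not the authors' test harness: the names `CompilerTestCase`, `passes` and `toy_compiler` are hypothetical, and `toy_compiler` only stands in for a real compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompilerTestCase:
    description: str      # what the test is meant to exercise
    source_program: str   # the input fed to the compiler
    expected_output: str  # object code or error message expected


def passes(case, compile_fn):
    """Run a compiler callable on the source and compare with the expectation."""
    return compile_fn(case.source_program) == case.expected_output


def toy_compiler(source):
    # Hypothetical stand-in for a real compiler: accepts only "class A;".
    return "ok" if source.strip() == "class A;" else "syntax error"


case = CompilerTestCase("empty class declaration", "class A;", "ok")
print(passes(case, toy_compiler))
```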
Major Problems
1. Completeness of coverage
2. Infeasible size of test data
Testing Grammar
A grammar defines a language and provides the basis for deriving elements of that language.
A grammar is considered both a program and a specification.
Two important types of grammars:
1. Context free grammars
2. Regular grammars
A grammar should be tested to verify that:
1. It defines the language for which it was written
2. It is complete (every terminal and every rule is used)
3. It is free of ambiguity
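The completeness check (every terminal and every rule is used) can be approximated by a reachability pass from the start symbol. The sketch below assumes productions are given as (left side, right side) pairs; the function name `unused_symbols` and the toy grammar are illustrative only.

```python
def unused_symbols(start, productions, terminals):
    """Return (indices of rules that can never be used, terminals never reached)."""
    reachable = {start}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs in reachable:
                for symbol in rhs:
                    if symbol not in reachable:
                        reachable.add(symbol)
                        changed = True
    dead_rules = [i for i, (lhs, _) in enumerate(productions) if lhs not in reachable]
    dead_terminals = sorted(t for t in terminals if t not in reachable)
    return dead_rules, dead_terminals


# Toy grammar: rule 2 (B -> c) is never reachable from S.
toy_rules = [("S", ("a", "A")), ("A", ("b",)), ("B", ("c",))]
print(unused_symbols("S", toy_rules, {"a", "b", "c"}))
```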
Context Free Grammars (CFG)
A CFG is a set of recursive rewriting rules used to generate strings in various patterns.
Components of a grammar:
1. N: Finite set of non-terminals
2. T: Finite set of terminals, disjoint from N
3. S: Initial symbol from N (starting symbol)
4. P: Finite set of production rules of the form n → m, with n in N and m a string of symbols from N and T
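The four components map directly onto a small data structure. The following is a sketch under assumptions: the class name `Grammar`, its `check` method and the toy grammar are hypothetical, and the toy grammar only borrows token names from the results shown later; it is not the MACS grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # N
    terminals: frozenset      # T
    start: str                # S
    productions: tuple        # P: (left side, right-side tuple) pairs

    def check(self):
        """Return a list of violations of the definition (empty if well formed)."""
        problems = []
        if self.nonterminals & self.terminals:
            problems.append("N and T overlap")
        if self.start not in self.nonterminals:
            problems.append("start symbol not in N")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                problems.append(f"left side {lhs!r} not in N")
            for symbol in rhs:
                if symbol not in symbols:
                    problems.append(f"unknown symbol {symbol!r}")
        return problems


toy = Grammar(
    nonterminals=frozenset({"Decl", "Body"}),
    terminals=frozenset({"CLASS", "ID", "SEMICOL", "LB", "RB"}),
    start="Decl",
    productions=(
        ("Decl", ("CLASS", "ID", "Body")),
        ("Body", ("SEMICOL",)),
        ("Body", ("LB", "RB")),
    ),
)
print(toy.check())
```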
There is no practical way to check the dynamic semantics of the language defined by a CFG.
Testing Parser
Parsing is the process of checking a given program against the grammar rules to determine whether or not it is syntactically correct.
Test data for a parser is a program that uses all the production rules of the underlying grammar.
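Whether a set of test programs uses all production rules can be checked by recording which rules each derivation applies. This sketch assumes rules are numbered from 0 and that each derivation is available as a list of rule numbers; `uncovered_rules` is a hypothetical helper.

```python
def uncovered_rules(rule_count, derivations):
    """Return the rule numbers that no derivation in the test data uses."""
    used = set()
    for derivation in derivations:
        used.update(derivation)
    return sorted(set(range(rule_count)) - used)


# Two test programs over a 4-rule grammar; rule 2 is never exercised.
print(uncovered_rules(4, [[0, 1], [0, 3]]))
```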
[Figure: Testing the parser]
Purdom’s Algorithm
Proposed by Purdom in 1972 for testing compilers; it automatically generates sentences from the grammar, using every production rule at least once.
The generated sentences exercise most of the compiler’s code and tables.
It checks only the syntactic aspect; there is no guarantee of proper execution.
Generated sentences may be inconsistent with contextual constraints, e.g. variable declarations, use of identifiers, type checking.
Purdom’s algorithm verifies compiler correctness and is not concerned with other aspects such as:
Efficiency
Performance
Testing MACS Compiler
MACS is a simple object-oriented language very similar to C++ and Java. It is used in Franek’s forthcoming book on compilers.
MACS uses both LALR and LL(1) grammars and has a C++ bottom-up parser built using Bison/Flex and a Java top-down parser built using JavaCC.
The MACS LL(1) grammar has 77 terminals, 90 non-terminals and 301 production rules.
Implementation of Purdom’s algorithm
Phase I (Shortest String Length)
SLEN: The length of the shortest terminal string derivable from each symbol
RLEN: The length of the shortest terminal string derivable from each production rule
SHORT: For every non-terminal, the number of the rule that leads to its shortest string derivation
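Phase I can be sketched as a fixed-point iteration over the rules. The code below is an illustration under assumptions, not the implementation described in these slides: length is counted in terminals only (each terminal counts 1), SHORT keeps a single rule per non-terminal, and the function name and example grammar are hypothetical.

```python
import math


def shortest_lengths(nonterminals, productions):
    """Compute SLEN (per non-terminal), RLEN (per rule) and SHORT.

    productions is a list of (lhs, rhs_tuple); any symbol that is not in
    nonterminals is treated as a terminal of length 1.
    """
    slen = {n: math.inf for n in nonterminals}
    rlen = [math.inf] * len(productions)
    short = {n: None for n in nonterminals}

    def symbol_length(symbol):
        return slen[symbol] if symbol in nonterminals else 1

    changed = True
    while changed:                      # repeat until no length improves
        changed = False
        for i, (lhs, rhs) in enumerate(productions):
            total = sum(symbol_length(s) for s in rhs)
            if total < rlen[i]:
                rlen[i] = total
                changed = True
            if total < slen[lhs]:
                slen[lhs] = total
                short[lhs] = i          # rule i gives the shortest string so far
                changed = True
    return slen, rlen, short


EXPR = [
    ("E", ("E", "PLUS", "T")),   # rule 0
    ("E", ("T",)),               # rule 1
    ("T", ("LP", "E", "RP")),    # rule 2
    ("T", ("ID",)),              # rule 3
]
slen, rlen, short = shortest_lengths({"E", "T"}, EXPR)
print(slen, rlen, short)
```

For this grammar both E and T can derive the single token ID, so their shortest length is 1, reached through rules 1 and 3.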
Phase II (Shortest Derivation Length)
DLEN: For each non-terminal, the length of the shortest terminal string that uses it in its derivation
PREV: The number of the rule that introduces the non-terminal in that shortest string derivation
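Phase II can be sketched in the same fixed-point style, reusing the Phase I tables. This is a hedged illustration: the update `DLEN[lhs] - SLEN[lhs] + RLEN[rule]` replaces the shortest expansion of the left side by the shortest expansion through the given rule; the function name and the toy grammar are hypothetical, and the SLEN/RLEN values are written out by hand for that toy grammar.

```python
import math


def derivation_lengths(start, nonterminals, productions, slen, rlen):
    """Compute DLEN and PREV from the Phase I tables SLEN and RLEN."""
    dlen = {n: math.inf for n in nonterminals}
    prev = {n: None for n in nonterminals}
    dlen[start] = slen[start]
    changed = True
    while changed:
        changed = False
        for i, (lhs, rhs) in enumerate(productions):
            if dlen[lhs] == math.inf:
                continue                      # left side not reached yet
            candidate = dlen[lhs] - slen[lhs] + rlen[i]
            for symbol in rhs:
                if symbol in nonterminals and candidate < dlen[symbol]:
                    dlen[symbol] = candidate
                    prev[symbol] = i          # rule i introduces this symbol
                    changed = True
    return dlen, prev


# Toy grammar: S -> a A b | c,  A -> d
RULES = [("S", ("a", "A", "b")), ("S", ("c",)), ("A", ("d",))]
SLEN = {"S": 1, "A": 1}     # shortest strings: "c" and "d"
RLEN = [3, 1, 1]            # "a d b", "c", "d"
dlen, prev = derivation_lengths("S", {"S", "A"}, RULES, SLEN, RLEN)
print(dlen, prev)
```

Here the shortest sentence that uses A is "a d b", so DLEN for A is 3 and PREV for A is rule 0.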
Implementation of Purdom’s algorithm (Contd..)
Phase III (Generate Sentence)
ONST: The number of occurrences of each non-terminal on the stack.
ONCE: For each non-terminal, information about whether to use it and how to rewrite it. Contains one of the following values:
1. READY
2. UNSURE
3. FINISHED
4. INTEGER
MARK: Whether a rule has been used or not. Contains boolean values.
STACK: Every symbol is pushed on the stack, starting with the starting symbol “S”. A terminal is popped whenever it reaches the top, and a non-terminal is rewritten by an appropriate rule.
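The stack-driven generation can be illustrated with a simplified sketch. It keeps MARK, STACK and SHORT, but it does not reproduce Purdom's ONCE/ONST/PREV bookkeeping: as a swapped-in simplification, one "steer" flag per sentence and a recomputed distance table guide the derivation toward a non-terminal that still has unused rules. It assumes every non-terminal can derive a terminal string; all names are hypothetical.

```python
import math


def generate_sentences(start, productions):
    """Generate sentences until every reachable rule has been used once."""
    nts = {lhs for lhs, _ in productions}
    rules_of = {n: [i for i, (l, _) in enumerate(productions) if l == n] for n in nts}

    # Phase I in miniature: SLEN and SHORT by fixed-point iteration.
    slen, short, changed = {n: math.inf for n in nts}, {}, True
    while changed:
        changed = False
        for i, (lhs, rhs) in enumerate(productions):
            total = sum(slen[s] if s in nts else 1 for s in rhs)
            if total < slen[lhs]:
                slen[lhs], short[lhs], changed = total, i, True

    mark = [False] * len(productions)            # MARK: rule already used?

    def distance():
        """Steps from each non-terminal to one that still has an unused rule."""
        dist = {n: 0 if any(not mark[i] for i in rules_of[n]) else math.inf
                for n in nts}
        changed = True
        while changed:
            changed = False
            for lhs, rhs in productions:
                best = min((dist[s] for s in rhs if s in nts), default=math.inf) + 1
                if best < dist[lhs]:
                    dist[lhs], changed = best, True
        return dist

    sentences = []
    while distance()[start] < math.inf:          # some rule is still unused
        stack, out = [(start, True)], []         # STACK of (symbol, steer flag)
        while stack:
            symbol, steer = stack.pop()
            if symbol not in nts:                # terminal on top: pop and emit
                out.append(symbol)
                continue
            unused = [i for i in rules_of[symbol] if not mark[i]]
            child = None
            if unused:
                rule = unused[0]                 # prefer a rule not used yet
            else:
                rule = short[symbol]             # otherwise finish quickly
                dist = distance() if steer else None
                if steer and dist[symbol] < math.inf:
                    # steer toward the closest non-terminal with unused rules
                    rule, child = min(
                        ((i, k) for i in rules_of[symbol]
                         for k, s in enumerate(productions[i][1]) if s in nts),
                        key=lambda ik: dist[productions[ik[0]][1][ik[1]]])
            mark[rule] = True
            rhs = productions[rule][1]
            for k in range(len(rhs) - 1, -1, -1):   # leftmost symbol ends on top
                stack.append((rhs[k], k == child))
        sentences.append(out)
    return sentences


EXPR = [("E", ("E", "PLUS", "T")), ("E", ("T",)),
        ("T", ("LP", "E", "RP")), ("T", ("ID",))]
for sentence in generate_sentences("E", EXPR):
    print(" ".join(sentence))
```

Only one stack entry per sentence carries the steer flag, which keeps a recursive rule from being chosen repeatedly; every other non-terminal falls back to its SHORT rule once its own rules are all marked.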
Results
Purdom Generated Sentence | MACS Test Case | Syntactically Correct | Semantically Correct
VOID CLASSNAME_DOT ID_LP CLASSNAME ID COMMA CONST CLASSNAME ID RP SEMICOL | void A.a( A a1, const A b); | X | X
VOID CLASSNAME_DOT ID_LP BOOL ID RP SEMICOL | void A.a(bool b); | X | X
PUBLIC VOID CLASSNAME_DOT COMMA CLASSNAME_DOT ID SEMICOL | public void A.a,A.b; | |
CLASS CLASSNAME SEMICOL | class A; | X | X
CLASS ID LB RB | class a{} | X | X
PUBLIC CONST VOID CLASSNAME_DOT ID SEMICOL | public const void A.a; | X |
PUBLIC SHARED VOID CLASSNAME_DOT ID SEMICOL | public shared void A.a; | X |
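Turning a generated token sequence into a MACS test case amounts to substituting a concrete lexeme for each token. The sketch below is an assumption about how that step could be done, not the authors' tool: the `LEXEMES` table and `to_source` are hypothetical, and the fixed identifier choices ("A", "a", "b") are only for illustration.

```python
# Hypothetical token-to-lexeme table; trailing spaces separate keywords.
LEXEMES = {
    "CLASS": "class ",
    "CLASSNAME": "A",
    "CLASSNAME_DOT": "A.",
    "VOID": "void ",
    "BOOL": "bool ",
    "ID": "b",
    "ID_LP": "a(",
    "RP": ")",
    "SEMICOL": ";",
}


def to_source(tokens):
    """Replace each token name in a space-separated sentence by its lexeme."""
    return "".join(LEXEMES[token] for token in tokens.split())


print(to_source("VOID CLASSNAME_DOT ID_LP BOOL ID RP SEMICOL"))
print(to_source("CLASS CLASSNAME SEMICOL"))
```

A fixed table like this yields syntactically valid text only; making the result semantically correct (declared names, matching types) needs additional context.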
Conclusions
Parser testing is an area that needs more attention.
Purdom’s algorithm is a complete method for testing small grammars.
The generated test cases are combined with the semantic aspects of the language to perform compiler validation.
Future work includes testing the more advanced features of the MACS compiler.