ANTLR4 and its testing
Sahil Sawhney, Software Consultant, Knoldus Software LLP
Agenda
Understanding Grammar
Parse tree
Process Of Parsing
Knowing ANTLR
Who uses ANTLR
Testing in ANTLR
Demonstration
What is grammar?
The set of rules that explains how words are used in a language
-Merriam Webster
For example, the use of "an" in English grammar
Why grammar?
To bring order out of chaos
And
we adore chaos because we love to produce order.
What type of grammar?
Context-free grammar (CFG)
It is defined by a quadruple (N, T, S, P), where
N is a finite set of non-terminal symbols (placeholders).
T is a finite set of terminal symbols, where N ∩ T = ∅.
S is the start symbol (it must be a non-terminal, S ∈ N).
P is a finite set of production rules of the form N → (N ∪ T)*.
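The quadruple can be written down directly as a small data type. The sketch below is only an illustration: the class name Grammar, its fields, and the palindromes() helper are hypothetical and not part of ANTLR. Symbols are assumed to be single characters, and an empty string stands for an empty right-hand side.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration type; it is not part of ANTLR.
final class Grammar {
    final Set<Character> nonTerminals;              // N
    final Set<Character> terminals;                 // T
    final char start;                               // S
    final Map<Character, List<String>> productions; // P: left side -> right sides

    Grammar(Set<Character> nonTerminals, Set<Character> terminals,
            char start, Map<Character, List<String>> productions) {
        // N and T must not share any symbol.
        for (char symbol : nonTerminals) {
            if (terminals.contains(symbol)) {
                throw new IllegalArgumentException("N and T overlap on '" + symbol + "'");
            }
        }
        // The start symbol must be a non-terminal.
        if (!nonTerminals.contains(start)) {
            throw new IllegalArgumentException("start symbol must be in N");
        }
        // Each rule has a non-terminal on the left and only symbols
        // from N or T on the right.
        for (Map.Entry<Character, List<String>> rule : productions.entrySet()) {
            if (!nonTerminals.contains(rule.getKey())) {
                throw new IllegalArgumentException("left side must be in N");
            }
            for (String rightSide : rule.getValue()) {
                for (char symbol : rightSide.toCharArray()) {
                    if (!nonTerminals.contains(symbol) && !terminals.contains(symbol)) {
                        throw new IllegalArgumentException("unknown symbol '" + symbol + "'");
                    }
                }
            }
        }
        this.nonTerminals = nonTerminals;
        this.terminals = terminals;
        this.start = start;
        this.productions = productions;
    }

    // Example grammar: palindromes over {a, b}; "" is the empty right-hand side.
    static Grammar palindromes() {
        return new Grammar(Set.of('S'), Set.of('a', 'b'), 'S',
                Map.of('S', List.of("aSa", "bSb", "a", "b", "")));
    }

    public static void main(String[] args) {
        Grammar g = palindromes();
        System.out.println(g.start + " -> " + String.join(" | ", g.productions.get('S')));
    }
}
```

The constructor rejects a grammar whose N and T overlap or whose start symbol is not a non-terminal, which mirrors the conditions in the definition.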
An Example
Consider the production rule for palindromes over the alphabet {a, b}.
S → aSa | bSb | a | b | ε
here,
S is the start symbol and the only non-terminal symbol
{a, b} is the set of terminal symbols
Example Cont...
Consider the string ababa
Corresponding Parse Tree
(read the terminal nodes from left to right)
S
├── a
├── S
│   ├── b
│   ├── S
│   │   └── a
│   └── b
└── a
What is a parse tree?
A parse tree for a grammar G is a tree where
The root is the start symbol for G
The interior nodes are the nonterminals of G
The leaf nodes are the terminal symbols of G
A string of terminals is valid with respect to a grammar only if at least one parse tree of that grammar yields the string when its leaf nodes are read from left to right.
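A minimal sketch of these properties, assuming the palindrome grammar over {a, b} used in the earlier example. The class ParseTreeDemo and its methods are hypothetical names for illustration; this is hand-written code, not what ANTLR generates. The root is S, interior nodes are S, and the leaves are terminals (an "<empty>" leaf marks an empty right-hand side).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not ANTLR output.
class ParseTreeDemo {
    static final class Node {
        final String symbol;
        final List<Node> children = new ArrayList<>();
        Node(String symbol) { this.symbol = symbol; }
        boolean isLeaf() { return children.isEmpty(); }
    }

    // Returns a parse tree for the input, or null if no parse tree exists.
    static Node parse(String input) {
        for (char c : input.toCharArray()) {
            if (c != 'a' && c != 'b') return null; // not a terminal of the grammar
        }
        return build(input, 0, input.length());
    }

    // Builds the tree for input[from, to) using S -> aSa | bSb | a | b | (empty).
    private static Node build(String s, int from, int to) {
        Node node = new Node("S");
        int length = to - from;
        if (length == 0) {                       // S -> (empty)
            node.children.add(new Node("<empty>"));
            return node;
        }
        if (length == 1) {                       // S -> a | b
            node.children.add(new Node(s.substring(from, to)));
            return node;
        }
        if (s.charAt(from) != s.charAt(to - 1)) return null;
        Node inner = build(s, from + 1, to - 1); // S -> aSa | bSb
        if (inner == null) return null;
        node.children.add(new Node(String.valueOf(s.charAt(from))));
        node.children.add(inner);
        node.children.add(new Node(String.valueOf(s.charAt(to - 1))));
        return node;
    }

    // Reads the leaf nodes from left to right.
    static String yield(Node node) {
        if (node.isLeaf()) return node.symbol.equals("<empty>") ? "" : node.symbol;
        StringBuilder text = new StringBuilder();
        for (Node child : node.children) text.append(yield(child));
        return text.toString();
    }

    // Renders the tree in a parenthesized, LISP-like form.
    static String toLisp(Node node) {
        if (node.isLeaf()) return node.symbol;
        StringBuilder text = new StringBuilder("(").append(node.symbol);
        for (Node child : node.children) text.append(' ').append(toLisp(child));
        return text.append(')').toString();
    }

    public static void main(String[] args) {
        Node tree = parse("ababa");
        System.out.println(toLisp(tree)); // prints (S a (S b (S a) b) a)
        System.out.println(yield(tree));  // prints ababa
    }
}
```

For a string such as abab, parse returns null because no parse tree of this grammar yields it, so the string is not valid.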
Finally, what is ANTLR?
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
From a grammar, ANTLR generates a parser that can build and walk parse trees (data structure representing how a grammar matches the input).
Now what is this parser?
A parser is a program that takes input in the form of a sequence of tokens or program instructions and usually builds a data structure in the form of a parse tree or an abstract syntax tree.
Input: ababa
Grammar rules: S → aSa | bSb | a | b | ε
"I am the parser"
Output parse tree:
S
├── a
├── S
│   ├── b
│   ├── S
│   │   └── a
│   └── b
└── a
3 stages of parsing
Lexical Analysis: produces tokens from the input character stream.
Syntactic Analysis: checks whether the generated tokens form a grammatically correct expression.
Semantic Parsing: if the expression is valid, a meaning is associated with it and the necessary actions are taken.
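The three stages can be sketched on a deliberately tiny language: sums of integers such as 1 + 20 + 3. Everything here (the class ThreeStages, the method names, and the rule sum : NUM ('+' NUM)*) is an assumption made for illustration; ANTLR-generated lexers and parsers are structured differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the three stages; not ANTLR output.
class ThreeStages {
    // Stage 1, lexical analysis: split the character stream into tokens.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '+') {
                tokens.add("+");
                i++;
            } else if (c >= '0' && c <= '9') {
                int start = i;
                while (i < input.length()
                        && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at position " + i);
            }
        }
        return tokens;
    }

    // Stage 2, syntactic analysis: check the rule sum : NUM ('+' NUM)*
    static boolean isWellFormed(List<String> tokens) {
        if (tokens.size() % 2 == 0) return false; // a valid sum has an odd token count
        for (int i = 0; i < tokens.size(); i++) {
            boolean expectNumber = i % 2 == 0;
            boolean isPlus = tokens.get(i).equals("+");
            if (expectNumber == isPlus) return false;
        }
        return true;
    }

    // Stage 3, semantic step: attach a meaning to a valid expression (its value).
    static long evaluate(List<String> tokens) {
        if (!isWellFormed(tokens)) {
            throw new IllegalArgumentException("not a valid sum: " + tokens);
        }
        long total = 0;
        for (int i = 0; i < tokens.size(); i += 2) {
            total += Long.parseLong(tokens.get(i));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("1 + 20 + 3");
        System.out.println(tokens);               // prints [1, +, 20, +, 3]
        System.out.println(isWellFormed(tokens)); // prints true
        System.out.println(evaluate(tokens));     // prints 24
    }
}
```

An input such as 1 + + 2 passes stage 1 but fails stage 2, so no meaning is ever assigned to it.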
ANTLR Cont...
In a nutshell,
the ANTLR tool converts grammars into programs (Java programs by default; other target languages are also supported) that recognize sentences in the language described by the grammar.
For example, given a grammar for JSON, the ANTLR tool generates a program that recognizes JSON input using some support classes from the ANTLR runtime library.
Input: MyGrammar.g4
"I am ANTLR, and the version is 4"
Generated output:
MyGrammar.tokens
MyGrammarBaseListener
MyGrammarBaseVisitor
MyGrammarLexer
MyGrammarLexer.tokens
MyGrammarListener
MyGrammarParser
MyGrammarVisitor
Here, ANTLR acts on the grammar and generates the corresponding Java files.
But why ANTLR?
ANTLR generates recursive-descent parsers (a type of top-down parser) and has good error reporting.
The parser generated by ANTLR is reasonably readable, which helps in debugging.
ANTLR is open source and has a number of users worldwide, so there is a reasonable chance that bugs will be identified and corrected.
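To show what "recursive descent" means, here is a small hand-written sketch in which each grammar rule becomes one method that calls the methods of the rules it refers to. The grammar (expr, term, factor), the class name RecursiveDescent, and the error messages are assumptions for illustration; the code ANTLR actually generates is more elaborate.

```java
// Hypothetical hand-written parser; it only illustrates the technique.
class RecursiveDescent {
    private final String input;
    private int pos = 0;

    RecursiveDescent(String input) { this.input = input; }

    // Parses and evaluates a complete expression, starting at the top rule.
    static long evaluate(String input) {
        RecursiveDescent parser = new RecursiveDescent(input);
        long value = parser.expr();
        parser.skipSpaces();
        if (parser.pos < input.length()) {
            throw parser.error("unexpected '" + input.charAt(parser.pos) + "'");
        }
        return value;
    }

    // expr : term ('+' term)*
    private long expr() {
        long value = term();
        while (accept('+')) value += term();
        return value;
    }

    // term : factor ('*' factor)*
    private long term() {
        long value = factor();
        while (accept('*')) value *= factor();
        return value;
    }

    // factor : INT | '(' expr ')'
    private long factor() {
        if (accept('(')) {
            long value = expr();
            if (!accept(')')) throw error("expected ')'");
            return value;
        }
        skipSpaces();
        int start = pos;
        while (pos < input.length()
                && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;
        }
        if (start == pos) throw error("expected a number or '('");
        return Long.parseLong(input.substring(start, pos));
    }

    // Consumes the expected character if it is next; otherwise leaves pos alone.
    private boolean accept(char expected) {
        skipSpaces();
        if (pos < input.length() && input.charAt(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
    }

    // Error reporting: every message carries the input position.
    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at position " + pos);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 + 3 * (4 + 1)")); // prints 17
    }
}
```

Because the call stack mirrors the grammar rules, a stack trace or a debugger session maps directly onto the rule being matched, which is one reason this style of parser is comparatively easy to debug.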
When to use ANTLR4?
DSL (Domain Specific Language)
Anyone care about ANTLR?
The following say YES WE DO :
Twitter search uses ANTLR for query parsing, with more than 2 billion queries a day
The NetBeans IDE parses C++ with ANTLR
Oracle uses ANTLR within the SQL Developer IDE and its migration tools
Knoldus uses ANTLR in their projects to meet DSL (domain-specific language) requirements
Any Alternatives?
The list is long. Some examples are :
CL-Yacc (Common Lisp)
Gold (C#, Java, Python, Visual Basic etc.)
Hime Parser Generator (C#, Java)
Coco/R (Ada, Pascal, Oberon, Ruby)
Yecc (Erlang)
etc.
Testing ANTLR4
ANTLR provides a flexible testing tool in the runtime library called TestRig.
It can display lots of information about how a recognizer (the auto-generated Java classes) matches input from a file or standard input.
TestRig uses Java reflection to invoke compiled recognizers.
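The reflection mechanism mentioned above can be sketched in plain Java. This is not TestRig's source code: DemoParser is a hypothetical stand-in for a generated parser, and invokeRule is an invented helper. It only shows the general idea of loading a class by name and calling a rule method by name, both supplied as strings at run time.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of reflective invocation; not TestRig itself.
class ReflectionSketch {
    // Stand-in for a compiled recognizer (assumption, not an ANTLR class).
    public static class DemoParser {
        private final String input;

        public DemoParser(String input) { this.input = input; }

        // Stand-in for a start-rule method.
        public String palindrome() {
            String reversed = new StringBuilder(input).reverse().toString();
            return reversed.equals(input) ? "match" : "no match";
        }
    }

    // Loads the class by name, creates an instance, and calls the named method.
    static Object invokeRule(String className, String ruleName, String input) {
        try {
            Class<?> parserClass = Class.forName(className);
            Object parser = parserClass.getConstructor(String.class).newInstance(input);
            Method rule = parserClass.getMethod(ruleName);
            return rule.invoke(parser);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "cannot invoke " + className + "." + ruleName, e);
        }
    }

    public static void main(String[] args) {
        String className = DemoParser.class.getName();
        System.out.println(invokeRule(className, "palindrome", "ababa")); // prints match
        System.out.println(invokeRule(className, "palindrome", "abab"));  // prints no match
    }
}
```

Because the class and method are looked up by name, the same tool can drive any compiled recognizer without being recompiled against it.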
References
https://github.com/antlr/antlr4/blob/master/doc/getting-started.md
https://www.javacodegeeks.com/2012/06/antlr-getting-started.html
https://en.wikipedia.org/wiki/Context-free_grammar
https://blog.knoldus.com/2016/04/29/testing-grammar-using-antlr4-testrig-grun/
Any Questions?
Thank You !!!