Languages and Compilers (SProg og Oversættere) Lecture 9 (3)
1 Languages and Compilers (SProg og Oversættere) Parsing.
-
Upload
lora-hardy -
Category
Documents
-
view
219 -
download
0
Transcript of 1 Languages and Compilers (SProg og Oversættere) Parsing.
![Page 1: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/1.jpg)
1
Languages and Compilers(SProg og Oversættere)
Parsing
![Page 2: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/2.jpg)
2
Parsing
– Describe the purpose of the parser
– Discuss top down vs. bottom up parsing
– Explain necessary conditions for construction of recursive decent parsers
– Discuss the construction of an RD parser from a grammar
![Page 3: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/3.jpg)
3
Top-down parsing
The cat sees a rat .The cat sees rat .
Sentence
Subject Verb Object .
Sentence
Noun
Subject
The
Noun
cat
Verb
sees a
Noun
Object
Noun
rat .
![Page 4: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/4.jpg)
4
Bottom up parsing
The cat sees a rat .The cat
Noun
Subject
sees
Verb
a rat
Noun
Object
.
Sentence
![Page 5: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/5.jpg)
5
Look-Ahead
Derivation
LL-Analyse (Top-Down)
Look-Ahead
Reduction
LR-Analyse (Bottom-Up)
Top-Down vs Bottom-Up parsing
![Page 6: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/6.jpg)
6
Development of Recursive Descent Parser
(1) Express grammar in EBNF
(2) Grammar Transformations: Left factorization and Left recursion elimination
(3) Create a parser class with– private variable currentToken– methods to call the scanner: accept and acceptIt
(4) Implement private parsing methods:– add private parseN method for each non terminal N
– public parse method that
• gets the first token form the scanner
• calls parseS (S is the start symbol of the grammar)
![Page 7: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/7.jpg)
7
Recursive Descent Parsing
Sentence ::= Subject Verb Object .Subject ::= I | a Noun | the Noun Object ::= me | a Noun | the NounNoun ::= cat | mat | ratVerb ::= like | is | see | sees
Sentence ::= Subject Verb Object .Subject ::= I | a Noun | the Noun Object ::= me | a Noun | the NounNoun ::= cat | mat | ratVerb ::= like | is | see | sees
Define a procedure parseN for each non-terminal N
private void parseSentence() ;private void parseSubject();private void parseObject(); private void parseNoun();private void parseVerb();
private void parseSentence() ;private void parseSubject();private void parseObject(); private void parseNoun();private void parseVerb();
![Page 8: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/8.jpg)
8
Recursive Descent Parsing
public class MicroEnglishParser {
private TerminalSymbol currentTerminal;
//Auxiliary methods will go here ...
//Parsing methods will go here ...}
public class MicroEnglishParser {
private TerminalSymbol currentTerminal;
//Auxiliary methods will go here ...
//Parsing methods will go here ...}
![Page 9: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/9.jpg)
9
Recursive Descent Parsing: Auxiliary Methods
public class MicroEnglishParser {
private TerminalSymbol currentTerminal
private void accept(TerminalSymbol expected) {if (currentTerminal matches expected) currentTerminal = next input terminal ;else report a syntax error
}
...}
public class MicroEnglishParser {
private TerminalSymbol currentTerminal
private void accept(TerminalSymbol expected) {if (currentTerminal matches expected) currentTerminal = next input terminal ;else report a syntax error
}
...}
![Page 10: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/10.jpg)
10
Recursive Descent Parsing: Parsing Methods
private void parseSentence() { parseSubject(); parseVerb(); parseObject(); accept(‘.’);}
private void parseSentence() { parseSubject(); parseVerb(); parseObject(); accept(‘.’);}
Sentence ::= Subject Verb Object .Sentence ::= Subject Verb Object .
![Page 11: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/11.jpg)
11
Recursive Descent Parsing: Parsing Methods
private void parseSubject() { if (currentTerminal matches ‘I’) accept(‘I’); else if (currentTerminal matches ‘a’) { accept(‘a’); parseNoun(); } else if (currentTerminal matches ‘the’) { accept(‘the’); parseNoun(); } else report a syntax error}
private void parseSubject() { if (currentTerminal matches ‘I’) accept(‘I’); else if (currentTerminal matches ‘a’) { accept(‘a’); parseNoun(); } else if (currentTerminal matches ‘the’) { accept(‘the’); parseNoun(); } else report a syntax error}
Subject ::= I | a Noun | the Noun Subject ::= I | a Noun | the Noun
![Page 12: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/12.jpg)
12
Recursive Descent Parsing: Parsing Methods
private void parseNoun() { if (currentTerminal matches ‘cat’) accept(‘cat’); else if (currentTerminal matches ‘mat’) accept(‘mat’); else if (currentTerminal matches ‘rat’) accept(‘rat’); else report a syntax error}
private void parseNoun() { if (currentTerminal matches ‘cat’) accept(‘cat’); else if (currentTerminal matches ‘mat’) accept(‘mat’); else if (currentTerminal matches ‘rat’) accept(‘rat’); else report a syntax error}
Noun ::= cat | mat | ratNoun ::= cat | mat | rat
![Page 13: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/13.jpg)
13
LL 1 Grammars
• The presented algorithm to convert EBNF into a parser does not work for all possible grammars.
• It only works for so called “LL 1” grammars.• Basically, an LL1 grammar is a grammar which can be
parsed with a top-down parser with a lookahead (in the input stream of tokens) of one token.
• What grammars are LL1?
How can we recognize that a grammar is (or is not) LL1? We can deduce the necessary conditions from the parser
generation algorithm. We can use a formal definition
![Page 14: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/14.jpg)
14
LL 1 Grammars
parse X* parse X*
while (currentToken.kind is in starters[X]) { parse X}
while (currentToken.kind is in starters[X]) { parse X}
parse X|Y parse X|Y
switch (currentToken.kind) { cases in starters[X]: parse X break; cases in starters[Y]: parse Y break; default: report syntax error }
switch (currentToken.kind) { cases in starters[X]: parse X break; cases in starters[Y]: parse Y break; default: report syntax error }
Condition: starters[X] and starters[Y] must be disjoint sets.
Condition: starters[X] and starters[Y] must be disjoint sets.
Condition: starters[X] must be disjoint from the set of tokens that can immediately follow X *
Condition: starters[X] must be disjoint from the set of tokens that can immediately follow X *
![Page 15: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/15.jpg)
15
Formal definition of LL(1)
A grammar G is LL(1) iff for each set of productions M ::= X1 | X2 | … | Xn :1. starters[X1], starters[X2], …, starters[Xn] are all pairwise disjoint 2. If Xi =>* ε then starters[Xj]∩ follow[X]=Ø, for 1≤j≤ n.i≠j
If G is ε-free then 1 is sufficient
![Page 16: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/16.jpg)
16
Converting EBNF into RD parsers
• The conversion of an EBNF specification into a Java implementation for a recursive descent parser is so “mechanical” that it can easily be automated!
=> JavaCC “Java Compiler Compiler”
![Page 17: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/17.jpg)
17
JavaCC and JJTree
![Page 18: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/18.jpg)
18
LR parsing
– The algorithm makes use of a stack.
– The first item on the stack is the initial state of a DFA
– A state of the automaton is a set of LR0/LR1 items.
– The initial state is constructed from productions of the form S:= • [, $] (where S is the start symbol of the CFG)
– The stack contains (in alternating) order:
• A DFA state
• A terminal symbol or part (subtree) of the parse tree being constructed
– The items on the stack are related by transitions of the DFA
– There are two basic actions in the algorithm:
• shift: get next input token
• reduce: build a new node (remove children from stack)
![Page 19: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/19.jpg)
19
JavaCUP: A LALR generator for Java
Grammar BNF-like Specification
JavaCUP
Java File: Parser Class
Uses Scanner to get TokensParses Stream of Tokens
Definition of tokens
Regular Expressions
JFlex
Java File: Scanner Class
Recognizes Tokens
Syntactic Analyzer
![Page 20: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/20.jpg)
20
Steps to build a compiler with SableCC
1. Create a SableCC specification file
2. Call SableCC3. Create one or more
working classes, possibly inherited from classes generated by SableCC
4. Create a Main class activating lexer, parser and working classes
5. Compile with Javac
![Page 21: 1 Languages and Compilers (SProg og Oversættere) Parsing.](https://reader034.fdocuments.in/reader034/viewer/2022042822/56649e395503460f94b2a96d/html5/thumbnails/21.jpg)
21
Hierarchy