A Verifiable, Executable SLR Parser Generator - CECS
Transcript of A Verifiable, Executable SLR Parser Generator - CECS
A Verifiable, Executable SLR Parser Generator
Aditi Barthwal
Supervisor: Dr. Michael Norrish
ESOP '09
WHY?
! Provide a mechanised theory of parsing! Parsers as external proof oracles! Part of bigger piece of work, that of providing
mechanised language theory in HOL
SLR Parsing! Simple Left-to-right Rightmost derivation (SLR)
! Bottom-up parser
! Parser has a stack and an input. The first k tokens of the input are the lookahead (one in this case).
! Possible actions" Shift: move the first input token to the top of the stack" Reduce: Choose a grammar rule X ! ABC; pop C, B, A from
the top of the stack; push X on to the stack
! Uses a DFA to parse the input stream
! Example, derivation of ab w.r.t. G: S ! AB, A ! a, B ! b" ab " Ab " AB " S
An example
! Initial grammar" E ! E + T | T
" T ! n | (E)
! Augmented grammar" S ! E$
" E ! E + T | T
" T ! n | (E)
DFA for the grammar
S ! • E$E ! • E + TE ! • TT ! • nT ! •(E)
E ! T•
1 4T
Initial grammar
! E ! E + T | T! T ! n | (E)
Augmented grammar
! S ! E$! E ! E + T | T! T ! n | (E)
DFA for the grammar
S ! • E$E ! • E + TE ! • TT ! • nT ! •(E)
T ! n•
S ! E•$ E ! E• + T
E ! E + •TT ! •nT ! • (E)
T ! (E•)E ! E• + T
T ! (•E)E ! • E + TE ! • TT ! • nT ! •(E)
T ! (E)•
E ! E + T•
E ! T•
6
1
5
98
72
34T T
En
n
n
(
)
+
T
(
E
Initial grammar
! E ! E + T | T
! T ! n | (E)Augmented grammar
! S ! E$
! E ! E + T | T
! T ! n | (E)
+
HOL Implementation
SLR(check DFA)!
PARSER
parsetoken
tokens ! language
SOME tree NONE
otherwise
augmented
grammar
SLR parser
Reject
NONE
tokens
generator
Implementation
! lrparse :: grammar ! symbol ! symbol ! symbol list
! ptree option
! sgoto :: grammar ! state ! symbol
! state
! moveDot :: state ! symbol ! state
! closure :: grammar ! state ! state
! reduce :: grammar ! state ! symbol
! rule list
! followSet :: grammar ! symbol
! symbol set
Properties of Interest
! SoundnessIf a DFA can be constructed for some grammar g and the parser is able to parse token stream t into a parse tree,
then t belongs in the language of g.
! CompletenessIf a DFA can be constructed for some grammar g and
token stream t belongs in the language of g, then the parser can parse t into a parse tree.
Strategy Framework - Soundness
! Soundness
" Show that stack has invariant that any tree on it is valid with respect to the grammar
! S ! A ! a w.r.t G: S ! A, A ! a, " i.e. a ! L(G)
State = Stack + Tree
a SAa
a
A
a
A
S
Shift Reduce Reduce
Strategy for Completeness Proof
! Completeness
" Finite progression when working backwards through rightmost derivation
" Number of machine steps = length of derivation +
# shift steps
Complexity of Completeness
! Proof
" Count steps to get to next sentential form
! Problem
" Need to know any other steps are not possible
! Solution
" SLR-ness of grammar = 'if other derivations do exist, grammar is not SLR'
! Corollary" SLR grammars are unambiguous
Executability
! Nullablenullable g sl = RTC (derives g) sl []
! Executable nullablenullableML g sn [] = T "
nullableML g sn (TS x :: t) = F "
nullableML g sn (NTS n :: t) =
if MEM (NTS n) sn then F
else EXISTS (nullableML g (NTS n :: sn))
(getRhs n (rules g)) "
nullableML g sn t
! N ! N ! [], ERROR! N ! [] , OK
Equivalence Proof
! #g sn l.nullableML g sn l
==> nullable g l
! #g sn l.nullable g l ==> (sn=[])
==> nullableML g sn l
SABBC then S A B B B C
. . . . . . .
. . . . . . .
[] [] [] [] [] [] []
And in case you thought it was easy!
Note:All derivations referred to below are, righmost derivations.The NFA M defined above has the property that !!(q,#) containsA ! $•%, iff A ! $•% is valid for #.We must show that each item A ! $•% contained in !(q,#) is valid for #. If: Suppose A ! $•% is valid for #. Then S =>* #1Aw => #1$%w, where #1$=#. If we can show that !(q,#1) contains A ! $•%, then by rule (3) we know that !(q,#) contains A ! $•%. We therefore proveby induction on the length of above derivation that !(q,#1) containsA ! •$%.The basis,one step, follows from rule (1). For the induction,consider the step in S =>* #1Aw in which the explicitly shownA was introduced. That is, write S =>* #1Aw as S =>* #2Bx => #2#3A#4x =>* #2#3Ayx,where #2#3=#1 and yx=w. Then by the inductive hypothesis applied to the derivationS =>* #2Bx => #2#3A#4x ,we know that B!•#3A#4 is in !(q,#2). By rule (3), B ! #3•A#4 is in !(q,#2#3),and by rule (2) A ! •$% is in !(q,#2#3). Since #2#3=#1, we have proved the inductive hypothesis.Rule (1) !(q0,&) = { S ! •$ | S ! $ is a production }Rule (2) !(A ! $•B%,&) = { B ! •# | B ! # is a production }Rule (3) !(A ! $•X%,X) = { A ! $X•% }