A Verifiable, Executable SLR Parser Generator - CECS

16
A Verifiable, Executable SLR Parser Generator Aditi Barthwal Supervisor: Dr. Michael Norrish ESOP '09

Transcript of A Verifiable, Executable SLR Parser Generator - CECS

A Verifiable, Executable SLR Parser Generator

Aditi Barthwal

Supervisor: Dr. Michael Norrish

ESOP '09

WHY?

! Provide a mechanised theory of parsing! Parsers as external proof oracles! Part of bigger piece of work, that of providing

mechanised language theory in HOL

SLR Parsing! Simple Left-to-right Rightmost derivation (SLR)

! Bottom-up parser

! Parser has a stack and an input. The first k tokens of the input are the lookahead (one in this case).

! Possible actions" Shift: move the first input token to the top of the stack" Reduce: Choose a grammar rule X ! ABC; pop C, B, A from

the top of the stack; push X on to the stack

! Uses a DFA to parse the input stream

! Example, derivation of ab w.r.t. G: S ! AB, A ! a, B ! b" ab " Ab " AB " S

An example

! Initial grammar" E ! E + T | T

" T ! n | (E)

! Augmented grammar" S ! E$

" E ! E + T | T

" T ! n | (E)

DFA for the grammar

S ! • E$E ! • E + TE ! • TT ! • nT ! •(E)

E ! T•

1 4T

Initial grammar

! E ! E + T | T! T ! n | (E)

Augmented grammar

! S ! E$! E ! E + T | T! T ! n | (E)

DFA for the grammar

S ! • E$E ! • E + TE ! • TT ! • nT ! •(E)

T ! n•

S ! E•$ E ! E• + T

E ! E + •TT ! •nT ! • (E)

T ! (E•)E ! E• + T

T ! (•E)E ! • E + TE ! • TT ! • nT ! •(E)

T ! (E)•

E ! E + T•

E ! T•

6

1

5

98

72

34T T

En

n

n

(

)

+

T

(

E

Initial grammar

! E ! E + T | T

! T ! n | (E)Augmented grammar

! S ! E$

! E ! E + T | T

! T ! n | (E)

+

HOL Implementation

SLR(check DFA)!

PARSER

parsetoken

tokens ! language

SOME tree NONE

otherwise

augmented

grammar

SLR parser

Reject

NONE

tokens

generator

Implementation

! lrparse :: grammar ! symbol ! symbol ! symbol list

! ptree option

! sgoto :: grammar ! state ! symbol

! state

! moveDot :: state ! symbol ! state

! closure :: grammar ! state ! state

! reduce :: grammar ! state ! symbol

! rule list

! followSet :: grammar ! symbol

! symbol set

Properties of Interest

! SoundnessIf a DFA can be constructed for some grammar g and the parser is able to parse token stream t into a parse tree,

then t belongs in the language of g.

! CompletenessIf a DFA can be constructed for some grammar g and

token stream t belongs in the language of g, then the parser can parse t into a parse tree.

Strategy Framework - Soundness

! Soundness

" Show that stack has invariant that any tree on it is valid with respect to the grammar

! S ! A ! a w.r.t G: S ! A, A ! a, " i.e. a ! L(G)

State = Stack + Tree

a SAa

a

A

a

A

S

Shift Reduce Reduce

Strategy for Completeness Proof

! Completeness

" Finite progression when working backwards through rightmost derivation

" Number of machine steps = length of derivation +

# shift steps

Complexity of Completeness

! Proof

" Count steps to get to next sentential form

! Problem

" Need to know any other steps are not possible

! Solution

" SLR-ness of grammar = 'if other derivations do exist, grammar is not SLR'

! Corollary" SLR grammars are unambiguous

Executability

! Nullablenullable g sl = RTC (derives g) sl []

! Executable nullablenullableML g sn [] = T "

nullableML g sn (TS x :: t) = F "

nullableML g sn (NTS n :: t) =

if MEM (NTS n) sn then F

else EXISTS (nullableML g (NTS n :: sn))

(getRhs n (rules g)) "

nullableML g sn t

! N ! N ! [], ERROR! N ! [] , OK

Equivalence Proof

! #g sn l.nullableML g sn l

==> nullable g l

! #g sn l.nullable g l ==> (sn=[])

==> nullableML g sn l

SABBC then S A B B B C

. . . . . . .

. . . . . . .

[] [] [] [] [] [] []

And in case you thought it was easy!

Note:All derivations referred to below are, righmost derivations.The NFA M defined above has the property that !!(q,#) containsA ! $•%, iff A ! $•% is valid for #.We must show that each item A ! $•% contained in !(q,#) is valid for #. If: Suppose A ! $•% is valid for #. Then S =>* #1Aw => #1$%w, where #1$=#. If we can show that !(q,#1) contains A ! $•%, then by rule (3) we know that !(q,#) contains A ! $•%. We therefore proveby induction on the length of above derivation that !(q,#1) containsA ! •$%.The basis,one step, follows from rule (1). For the induction,consider the step in S =>* #1Aw in which the explicitly shownA was introduced. That is, write S =>* #1Aw as S =>* #2Bx => #2#3A#4x =>* #2#3Ayx,where #2#3=#1 and yx=w. Then by the inductive hypothesis applied to the derivationS =>* #2Bx => #2#3A#4x ,we know that B!•#3A#4 is in !(q,#2). By rule (3), B ! #3•A#4 is in !(q,#2#3),and by rule (2) A ! •$% is in !(q,#2#3). Since #2#3=#1, we have proved the inductive hypothesis.Rule (1) !(q0,&) = { S ! •$ | S ! $ is a production }Rule (2) !(A ! $•B%,&) = { B ! •# | B ! # is a production }Rule (3) !(A ! $•X%,X) = { A ! $X•% }

Possible Improvements

! An efficient algorithm for computing DFA states

! Decidability of the parser" Terminates on all inputs, not just elements of

language