Approximating Context-Free Grammars for Parsing and ...
Transcript of Approximating Context-Free Grammars for Parsing and ...
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion
Approximating Context-FreeGrammars
for Parsing and Verification
Sylvain Schmitz
LORIA, INRIA Nancy - Grand Est
October 18, 2007
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionA Syntax Issue
Standard MLMilner et al. [1997]
datatype ’a option = NONE | SOME of ’a
fun filter pred l = let fun filterP (x::r, l) = case (pred x) of SOME y => filterP(r, y::l) | NONE => filterP(r, l) | filterP ([], l) = rev l in filterP (l, []) end
/** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0−262−63181−4. */
%token CASE "case"%token FUN "fun"%token MATCH "=>"%token OF "of"%token VID%start dec
%%
dec : "fun" fvalbind ;
fvalbind : sfvalbind | fvalbind ’|’ sfvalbind ;
sfvalbind : VID atpats ’=’ exp ;
exp : VID | "case" exp "of" match ;
match : mrule | match ’|’ mrule ;
mrule : pat "=>" exp ;
atpats : atpat | atpats atpat ;
atpat : VID ;
pat : VID atpat ;
%%
Executablecode
Programtext
Compiler
Language Specification
I MLtonI Moscow MLI Poly/MLI SML/NJ
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionA Syntax Issue
Standard MLMilner et al. [1997]
datatype ’a option = NONE | SOME of ’a
fun filter pred l = let fun filterP (x::r, l) = case (pred x) of SOME y => filterP(r, y::l) | NONE => filterP(r, l) | filterP ([], l) = rev l in filterP (l, []) end
datatype ’a option = NONE | SOME of ’a
fun filter pred l = let fun filterP (x::r, l) = case (pred x) of SOME y => filterP(r, y::l) | NONE => filterP(r, l) | filterP ([], l) = rev l in filterP (l, []) end Executable
codeProgram
text
I MLtonI Moscow MLI Poly/MLI SML/NJ
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionA Syntax Issue
Standard MLMilner et al. [1997]
datatype ’a option = NONE | SOME of ’a
fun filter pred l = let fun filterP (x::r, l) = case (pred x) of SOME y => filterP(r, y::l) | NONE => filterP(r, l) | filterP ([], l) = rev l in filterP (l, []) end
datatype ’a option = NONE | SOME of ’a
fun filter pred l = let fun filterP (x::r, l) = case (pred x) of SOME y => filterP(r, y::l) | NONE => filterP(r, l) | filterP ([], l) = rev l in filterP (l, []) end Executable
codeProgram
text
I MLtonI Moscow MLI Poly/MLI SML/NJ
Error: match.sml 9.25.
Syntax error: replacing EQUALOP with DARROW
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionA Syntax Issue
Standard MLMilner et al. [1997]
datatype ’a option = NONE | SOME of ’a
fun filter pred l = let fun filterP (x::r, l) = case (pred x) of SOME y => filterP(r, y::l) | NONE => filterP(r, l) | filterP ([], l) = rev l in filterP (l, []) end
datatype ’a option = NONE | SOME of ’a
fun filter pred l = let fun filterP (x::r, l) = case (pred x) of SOME y => filterP(r, y::l) | NONE => filterP(r, l) | filterP ([], l) = rev l in filterP (l, []) end Executable
codeProgram
text
I MLtonI Moscow MLI Poly/MLI SML/NJ
! Toplevel input:
! | filterP ([], l) = rev l
! ˆ
! Syntax error.
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionA Syntax Issue
Standard MLMilner et al. [1997]
datatype ’a option = NONE | SOME of ’a
fun filter pred l = let fun filterP (x::r, l) = case (pred x) of SOME y => filterP(r, y::l) | NONE => filterP(r, l) | filterP ([], l) = rev l in filterP (l, []) end
datatype ’a option = NONE | SOME of ’a
fun filter pred l = let fun filterP (x::r, l) = case (pred x) of SOME y => filterP(r, y::l) | NONE => filterP(r, l) | filterP ([], l) = rev l in filterP (l, []) end Executable
codeProgram
text
I MLtonI Moscow MLI Poly/MLI SML/NJ
Error: => expected but = was found
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionA Syntax Issue
Standard MLMilner et al. [1997]
datatype ’a option = NONE | SOME of ’a
fun filter pred l = let fun filterP (x::r, l) = case (pred x) of SOME y => filterP(r, y::l) | NONE => filterP(r, l) | filterP ([], l) = rev l in filterP (l, []) end
datatype ’a option = NONE | SOME of ’a
fun filter pred l = let fun filterP (x::r, l) = case (pred x) of SOME y => filterP(r, y::l) | NONE => filterP(r, l) | filterP ([], l) = rev l in filterP (l, []) end Executable
codeProgram
text
I MLtonI Moscow MLI Poly/MLI SML/NJ
stdIn:7.24-7.29 Error: syntax error:
deleting EQUALOP ID
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionA Syntax Issue
Parsers
datatype ’a option = NONE | SOME of ’a
fun filter pred l = let fun filterP (x::r, l) = case (pred x) of SOME y => filterP(r, y::l) | NONE => filterP(r, l) | filterP ([], l) = rev l in filterP (l, []) end
/** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0−262−63181−4. */
%token CASE "case"%token FUN "fun"%token MATCH "=>"%token OF "of"%token VID%start dec
%%
dec : "fun" fvalbind ;
fvalbind : sfvalbind | fvalbind ’|’ sfvalbind ;
sfvalbind : VID atpats ’=’ exp ;
exp : VID | "case" exp "of" match ;
match : mrule | match ’|’ mrule ;
mrule : pat "=>" exp ;
atpats : atpat | atpats atpat ;
atpat : VID ;
pat : VID atpat ;
%%
Compiler
Language Specification
Executablecode
Programtext
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionA Syntax Issue
Parsers
datatype ’a option = NONE | SOME of ’a
fun filter pred l = let fun filterP (x::r, l) = case (pred x) of SOME y => filterP(r, y::l) | NONE => filterP(r, l) | filterP ([], l) = rev l in filterP (l, []) end
/** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0−262−63181−4. */
%token CASE "case"%token FUN "fun"%token MATCH "=>"%token OF "of"%token VID%start dec
%%
dec : "fun" fvalbind ;
fvalbind : sfvalbind | fvalbind ’|’ sfvalbind ;
sfvalbind : VID atpats ’=’ exp ;
exp : VID | "case" exp "of" match ;
match : mrule | match ’|’ mrule ;
mrule : pat "=>" exp ;
atpats : atpat | atpats atpat ;
atpat : VID ;
pat : VID atpat ;
%%
Programtext
Executablecode
Language Specification
Compiler
Parser
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionA Syntax Issue
Parsers/** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0−262−63181−4. */
%token CASE "case"%token FUN "fun"%token MATCH "=>"%token OF "of"%token VID%start dec
%%
dec : "fun" fvalbind ;
fvalbind : sfvalbind | fvalbind ’|’ sfvalbind ;
sfvalbind : VID atpats ’=’ exp ;
exp : VID | "case" exp "of" match ;
match : mrule | match ’|’ mrule ;
mrule : pat "=>" exp ;
atpats : atpat | atpats atpat ;
atpat : VID ;
pat : VID atpat ;
%%
/** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0−262−63181−4. */
%token CASE "case"%token FUN "fun"%token MATCH "=>"%token OF "of"%token VID%start dec
%%
dec : "fun" fvalbind ;
fvalbind : sfvalbind | fvalbind ’|’ sfvalbind ;
sfvalbind : VID atpats ’=’ exp ;
exp : VID | "case" exp "of" match ;
match : mrule | match ’|’ mrule ;
mrule : pat "=>" exp ;
atpats : atpat | atpats atpat ;
atpat : VID ;
pat : VID atpat ;
%%
Parser
Context-freegrammar
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionA Syntax Issue
Parsers/** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0−262−63181−4. */
%token CASE "case"%token FUN "fun"%token MATCH "=>"%token OF "of"%token VID%start dec
%%
dec : "fun" fvalbind ;
fvalbind : sfvalbind | fvalbind ’|’ sfvalbind ;
sfvalbind : VID atpats ’=’ exp ;
exp : VID | "case" exp "of" match ;
match : mrule | match ’|’ mrule ;
mrule : pat "=>" exp ;
atpats : atpat | atpats atpat ;
atpat : VID ;
pat : VID atpat ;
%%
〈fvalbind〉
〈sfvalbind〉
〈atpats〉
| NONE => filterP(r, l) | filterP ([], l ) = rev l
〈exp〉
. . .
〈mrule〉
〈pat〉
〈match〉
〈exp〉
〈sfvalbind〉
〈fvalbind〉
〈exp〉
|N
ON
E=>
filterP
(r,l)|
filterP
([],l)=
revl
...
Context-freegrammar
Parsergenerator
Parsetree
Inputtokens
Parser
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionA Syntax Issue
Parsers/** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0−262−63181−4. */
%token CASE "case"%token FUN "fun"%token MATCH "=>"%token OF "of"%token VID%start dec
%%
dec : "fun" fvalbind ;
fvalbind : sfvalbind | fvalbind ’|’ sfvalbind ;
sfvalbind : VID atpats ’=’ exp ;
exp : VID | "case" exp "of" match ;
match : mrule | match ’|’ mrule ;
mrule : pat "=>" exp ;
atpats : atpat | atpats atpat ;
atpat : VID ;
pat : VID atpat ;
%%
〈fvalbind〉
〈sfvalbind〉
〈atpats〉
| NONE => filterP(r, l) | filterP ([], l ) = rev l
〈exp〉
. . .
〈mrule〉
〈pat〉
〈match〉
〈exp〉
〈sfvalbind〉
〈fvalbind〉
〈exp〉
|N
ON
E=>
filterP
(r,l)|
filterP
([],l)=
revl
...
Parser
Parsetree
Inputtokens
Parsergenerator
Context-freegrammar
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Conflicts
LALR(1) Parser Generator
I GNU Bisonstate 20
6 exp: "case" exp "of" match .
8 match: match . ’|’ mrule
’|’ shift, and go to state 24
’|’ [reduce using rule 6 (exp)]
I Restricted grammar class
CFG
LALR(1)
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Conflicts
LALR(1) Parser Generator
I GNU Bisonstate 20
6 exp: "case" exp "of" match .
8 match: match . ’|’ mrule
’|’ shift, and go to state 24
’|’ [reduce using rule 6 (exp)]
I Restricted grammar class
CFG
LALR(1)
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Conflicts
Dealing with ConflictsAn Objective Measure [Malloy et al., 2002] on a C# Grammar
0
100
200
300
400
500
600
700
2 4 6 8 10 12 14 16 18 20
LALR
(1)
conf
licts
Parser versions
’2002_malloy.data’ using 1:($2+$3)
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Conflicts
Dealing with ConflictsA Subjective Measure
Courtesy of http://www.phdcomics.com .
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Conflicts
Dealing with ConflictsA Subjective Measure
Courtesy of http://www.phdcomics.com .
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Conflicts
Dealing with ConflictsA Subjective Measure
Courtesy of http://www.phdcomics.com .
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionSolutions
State of the Art
I LR(k) [Knuth, 1965]
I LR-Regular [Culik andCohen, 1973]
I Generalized LR [Tomita,1986]
I Unambiguous CFGs[Cantor, 1962, Chomskyand Schutzenberger, 1963]
I Horizontal and verticalunambiguity test[Brabrand et al., 2007]
LALR(1)
LR(k)
CFG
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionSolutions
State of the Art
I LR(k) [Knuth, 1965]
I LR-Regular [Culik andCohen, 1973]
I Generalized LR [Tomita,1986]
I Unambiguous CFGs[Cantor, 1962, Chomskyand Schutzenberger, 1963]
I Horizontal and verticalunambiguity test[Brabrand et al., 2007]
CFG
LR-Regular
LR(k)
LALR(1)
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionSolutions
State of the Art
I LR(k) [Knuth, 1965]
I LR-Regular [Culik andCohen, 1973]
I Generalized LR [Tomita,1986]
I Unambiguous CFGs[Cantor, 1962, Chomskyand Schutzenberger, 1963]
I Horizontal and verticalunambiguity test[Brabrand et al., 2007]
CFG
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionSolutions
Ambiguity/** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0−262−63181−4. */
%token CASE "case"%token FUN "fun"%token MATCH "=>"%token OF "of"%token VID%start dec
%%
dec : "fun" fvalbind ;
fvalbind : sfvalbind | fvalbind ’|’ sfvalbind ;
sfvalbind : VID atpats ’=’ exp ;
exp : VID | "case" exp "of" match ;
match : mrule | match ’|’ mrule ;
mrule : pat "=>" exp ;
atpats : atpat | atpats atpat ;
atpat : VID ;
pat : VID atpat ;
%%
〈exp〉
〈exp〉
〈match〉
〈mrule〉〈mrule〉
〈match〉
〈exp〉
〈mrule〉
〈exp〉
〈match〉
〈pat〉
〈exp〉 〈pat〉 〈exp〉 〈mrule〉
〈match〉
〈mrule〉
〈match〉
〈exp〉
〈match〉
〈mrule〉
〈exp〉
ca
se
ao
fb=>
ca
se
bo
fc=>
c|
d=>
d
case a of b => case b of c => c | d=> d
case a of b => case b of c => c | d=> d
Context-freegrammar
Parsergenerator
Inputtokens
Parseforest
Parser
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionSolutions
Ambiguity/** * smlfvalbind.y * * Standard ML function declarations. * See _The_definition_of_standard_ML_, Milner et al., 1997, * ISBN 0−262−63181−4. */
%token CASE "case"%token FUN "fun"%token MATCH "=>"%token OF "of"%token VID%start dec
%%
dec : "fun" fvalbind ;
fvalbind : sfvalbind | fvalbind ’|’ sfvalbind ;
sfvalbind : VID atpats ’=’ exp ;
exp : VID | "case" exp "of" match ;
match : mrule | match ’|’ mrule ;
mrule : pat "=>" exp ;
atpats : atpat | atpats atpat ;
atpat : VID ;
pat : VID atpat ;
%%
〈exp〉
〈exp〉
〈match〉
〈mrule〉〈mrule〉
〈match〉
〈exp〉
〈mrule〉
〈exp〉
〈match〉
〈pat〉
〈exp〉 〈pat〉 〈exp〉 〈mrule〉
〈match〉
〈mrule〉
〈match〉
〈exp〉
〈match〉
〈mrule〉
〈exp〉
case a of b => case b of c => c | d=> d
case a of b => case b of c => c | d=> d
ca
se
ao
fb=>
ca
se
bo
fc=>
c|
d=>
d
Context-freegrammar
Parsergenerator
Parseforest
Inputtokens
Parser
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionSolutions
State of the Art
I LR(k) [Knuth, 1965]
I LR-Regular [Culik andCohen, 1973]
I Generalized LR [Tomita,1986]
I Unambiguous CFGs[Cantor, 1962, Chomskyand Schutzenberger, 1963]
I Horizontal and verticalunambiguity test[Brabrand et al., 2007]
CFG
UCFG
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionSolutions
State of the Art
I LR(k) [Knuth, 1965]
I LR-Regular [Culik andCohen, 1973]
I Generalized LR [Tomita,1986]
I Unambiguous CFGs[Cantor, 1962, Chomskyand Schutzenberger, 1963]
I Horizontal and verticalunambiguity test[Brabrand et al., 2007]
CFG
UCFG
LR-Regular
LR(k)
LALR(1)
Safe
Unsafe
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionSolutions
State of the Art
I LR(k) [Knuth, 1965]
I LR-Regular [Culik andCohen, 1973]
I Generalized LR [Tomita,1986]
I Unambiguous CFGs[Cantor, 1962, Chomskyand Schutzenberger, 1963]
I Horizontal and verticalunambiguity test[Brabrand et al., 2007]
CFG
UnsafeHVRU UCFG
Safe
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionSolutions
State of the Art
I LR(k) [Knuth, 1965]
I LR-Regular [Culik andCohen, 1973]
I Generalized LR [Tomita,1986]
I Unambiguous CFGs[Cantor, 1962, Chomskyand Schutzenberger, 1963]
I Horizontal and verticalunambiguity test[Brabrand et al., 2007]
LALR(1)
LR-Regular
CFG
UnsafeHVRU
Safe
LR(k)
UCFG
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionSolutions
Contributions
I Noncanonical parsingmethods [Szymanski andWilliams, 1976, Tai, 1979]
I Noncanonical LALR(1)
I Shift-Resolve
I Noncanonicalunambiguity test
I Framework for grammarapproximations
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionSolutions
Contributions
I Noncanonical parsingmethods [Szymanski andWilliams, 1976, Tai, 1979]
I Noncanonical LALR(1)
I Shift-Resolve
I Noncanonicalunambiguity test
I Framework for grammarapproximations
CFG
UCFG
LALR(1)
NLALR(1)
LR(k)
LR-Regular
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionSolutions
Contributions
I Noncanonical parsingmethods [Szymanski andWilliams, 1976, Tai, 1979]
I Noncanonical LALR(1)
I Shift-Resolve
I Noncanonicalunambiguity test
I Framework for grammarapproximations
LALR(1)
ShRe
LR(k)
LR-Regular
UCFG
CFG
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionSolutions
Contributions
I Noncanonical parsingmethods [Szymanski andWilliams, 1976, Tai, 1979]
I Noncanonical LALR(1)
I Shift-Resolve
I Noncanonicalunambiguity test
I Framework for grammarapproximations
LALR(1)
LR(k)
LR-RegularNU
HVRU UCFG
CFG
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionSolutions
Contributions
I Noncanonical parsingmethods [Szymanski andWilliams, 1976, Tai, 1979]
I Noncanonical LALR(1)
I Shift-Resolve
I Noncanonicalunambiguity test
I Framework for grammarapproximations
LALR(1)
LR(k)
LR-RegularNU
HVRU UCFG
CFG
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Approximations
Bracketed Grammars
G = 〈N, T , P, S〉, V = N ∪ T
〈dec〉 1−→ fun 〈fvalbind〉〈fvalbind〉 2−→ 〈sfvalbind〉〈fvalbind〉 3−→ 〈fvalbind〉 ′| ′ 〈sfvalbind〉〈sfvalbind〉 4−→ vid 〈atpats〉 = 〈exp〉
〈exp〉 5−→ case 〈exp〉 of 〈match〉〈match〉 6−→ 〈mrule〉〈match〉 7−→ 〈match〉 ′| ′ 〈mrule〉〈mrule〉 8−→ 〈pat〉 => 〈exp〉〈atpats〉 9−→ 〈atpat〉〈atpats〉 10−→ 〈atpats〉 〈atpat〉〈pat〉 11−→ vid 〈atpat〉
〈atpat〉 12−→ vid
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Approximations
Bracketed Grammars
Gb = 〈N, Tb, Pb, S〉, Vb = N ∪ Tb
〈dec〉 1−→ d1 fun 〈fvalbind〉 r1
〈fvalbind〉 2−→ d2 〈sfvalbind〉 r2
〈fvalbind〉 3−→ d3 〈fvalbind〉 ′| ′ 〈sfvalbind〉 r3
〈sfvalbind〉 4−→ d4 vid 〈atpats〉 = 〈exp〉 r4
〈exp〉 5−→ d5 case 〈exp〉 of 〈match〉 r5
〈match〉 6−→ d6 〈mrule〉 r6
〈match〉 7−→ d7 〈match〉 ′| ′ 〈mrule〉 r7
〈mrule〉 8−→ d8 〈pat〉 => 〈exp〉 r8
〈atpats〉 9−→ d9 〈atpat〉 r9
〈atpats〉 10−→ d10 〈atpats〉 〈atpat〉 r10
〈pat〉 11−→ d11 vid 〈atpat〉 r11
〈atpat〉 12−→ d12 vid r12
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Approximations
Positions
〈fvalbind〉
〈sfvalbind〉
〈fvalbind〉 〈sfvalbind〉’|’
〈exp〉=vid 〈atpats〉
d3 d2 〈sfvalbind〉 r2′| ′ ·d4 vid 〈atpats〉 = 〈exp〉 r4 r3
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Approximations
Position Graph ΓLeft-to-right Walks in Trees
〈fvalbind〉
〈sfvalbind〉
〈fvalbind〉 〈sfvalbind〉’|’
〈exp〉=vid 〈atpats〉
d4
d3 d2 〈sfvalbind〉 r2′| ′ d4· vid 〈atpats〉 = 〈exp〉 r4 r3
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Approximations
Position Graph ΓLeft-to-right Walks in Trees
〈fvalbind〉
〈sfvalbind〉
〈fvalbind〉 〈sfvalbind〉’|’
〈exp〉=vid 〈atpats〉
〈sfvalbind〉
d3 d2 〈sfvalbind〉 r2′| ′ d4 vid 〈atpats〉 = 〈exp〉 r4· r3
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Approximations
Position Graph ΓLeft-to-right Walks in Trees
〈fvalbind〉
〈sfvalbind〉
〈fvalbind〉 〈sfvalbind〉’|’
〈exp〉=vid 〈atpats〉
r3
d3 d2 〈sfvalbind〉 r2′| ′ d4 vid 〈atpats〉 = 〈exp〉 r4 r3·
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Approximations
Position Graph ΓLeft-to-right Walks in Trees
〈fvalbind〉
〈sfvalbind〉
〈fvalbind〉 〈sfvalbind〉’|’
〈exp〉=vid 〈atpats〉
r3d3
r4
d4
r2d2
. . .
. . . . . . . . .
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Approximations
Position Automaton Γ/≡
DefinitionΓ/≡ is the quotient of Γ by an equivalence relation ≡between positions.
Theorem (Language over-approximation)
L(Gb) ⊆ L(Γ/≡) ∩ T ∗b
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Approximations
Example: item0 Equivalence〈fvalbind〉
〈sfvalbind〉
〈fvalbind〉 〈sfvalbind〉’|’
r3d3
r2d2
d4
r4
= 〈exp〉
d4= 〈exp〉
r4
vid 〈atpats〉
vid 〈atpats〉
I equivalence class[〈sfvalbind〉 4−→vid 〈atpats〉· = 〈exp〉]
I LR(0) itemsI Γ/item0: nondeterministic LR(0) automaton
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Approximations
Example: item0 Equivalence
[〈sfvalbind〉4−→·vid 〈atpats〉 = 〈exp〉]
d4
[〈sfvalbind〉4−→vid·〈atpats〉 = 〈exp〉]
vid
〈atpats〉
[〈sfvalbind〉4−→vid 〈atpats〉· = 〈exp〉]
=
[〈sfvalbind〉4−→vid 〈atpats〉 = ·〈exp〉]
[〈sfvalbind〉4−→vid 〈atpats〉 = 〈exp〉·]
〈exp〉
d4
[〈fvalbind〉2−→·〈sfvalbind〉]
[〈fvalbind〉3−→〈fvalbind〉 ′| ′ 〈sfvalbind〉·]
[〈fvalbind〉3−→〈fvalbind〉 ′| ′·〈sfvalbind〉]
[〈fvalbind〉2−→〈sfvalbind〉·]
r4r4
I equivalence class[〈sfvalbind〉 4−→vid 〈atpats〉· = 〈exp〉]
I LR(0) itemsI Γ/item0: nondeterministic LR(0) automaton
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Approximations
Summary
I general framework for approximations
I applications:I parser construction
I ambiguity detection
I XML validation [Segoufin and Vianu, 2002]?
I symbolic supertagging [Boullier, 2003]?
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionGrammar Approximations
Summary
I general framework for approximations
I applications:I parser construction
I ambiguity detection
I XML validation [Segoufin and Vianu, 2002]?
I symbolic supertagging [Boullier, 2003]?
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParsing Principles
Shift-Resolve Parsing
I noncanonical
I k = 1 reduced lookahead symbol
I resolve = reduce + pushback: emulates abounded reduced lookahead without any presetbound
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParsing Principles
Shift-Resolve Parsing
I noncanonical
I k = 1 reduced lookahead symbol
I resolve = reduce + pushback: emulates abounded reduced lookahead without any presetbound
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParsing Example
Shift-Resolve Parse
| NONE => filterP(r, l) | filterP ([], l ) = rev l. . .
〈fvalbind〉
〈sfvalbind〉
〈exp〉〈atpats〉
〈exp〉
〈sfvalbind〉
〈fvalbind〉
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParsing Example
Shift-Resolve Parse
| NONE => filterP(r, l) | filterP ([], l ) = rev l. . .
〈fvalbind〉
〈sfvalbind〉
〈exp〉〈atpats〉
〈exp〉
〈sfvalbind〉
〈fvalbind〉
〈match〉
〈mrule〉
〈pat〉 〈exp〉
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParsing Example
Shift-Resolve Parse
| NONE => filterP(r, l) | filterP ([], l ) = rev l. . .
〈fvalbind〉
〈sfvalbind〉
〈exp〉〈atpats〉
〈exp〉
〈sfvalbind〉
〈fvalbind〉
〈match〉
〈mrule〉
〈pat〉 〈exp〉
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParsing Example
Shift-Resolve Parse
| NONE => filterP(r, l) | filterP ([], l ) = rev l. . .
〈fvalbind〉
〈exp〉
〈sfvalbind〉
〈fvalbind〉
〈match〉
〈mrule〉
〈pat〉 〈exp〉 〈exp〉〈atpats〉
〈sfvalbind〉
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParsing Example
Shift-Resolve Parse
| NONE => filterP(r, l) | filterP ([], l ) = rev l. . .
〈fvalbind〉
〈sfvalbind〉
〈fvalbind〉
〈mrule〉
〈pat〉 〈exp〉 〈atpats〉 〈exp〉
〈sfvalbind〉
〈match〉
〈exp〉
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParsing Example
Shift-Resolve Parse
| NONE => filterP(r, l) | filterP ([], l ) = rev l. . .
〈mrule〉
〈pat〉 〈exp〉 〈atpats〉 〈exp〉
〈match〉
〈exp〉
〈sfvalbind〉
〈sfvalbind〉
〈fvalbind〉
〈fvalbind〉
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Generating the Parser
1. position automaton
2. determinization by subset construction
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Subset ConstructionPrinciple
I di transitions denote traditional item closures
I ri transitions denote a phrase that should bereduced
I other transitions denote shifts
I items in the construction hold1. a state of the position automaton2. a parsing action3. a pushback length
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Subset ConstructionPrinciple
I di transitions denote traditional item closures
I ri transitions denote a phrase that should bereduced
I other transitions denote shifts
I items in the construction hold1. a state of the position automaton2. a parsing action3. a pushback length
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Subset ConstructionExample
〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0
〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Subset ConstructionExample
〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0
〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0
〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0
r5
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Subset ConstructionExample
〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0
〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0
〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈sfvalbind〉·, 5, 0
r4
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Subset ConstructionExample
〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0
〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0
〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0
〈dec〉−→fun 〈fvalbind〉·, 5, 0
S′−→〈dec〉·$, 5, 0
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Subset ConstructionExample
〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0
〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0
〈dec〉−→fun 〈fvalbind〉·, 5, 0
S′−→〈dec〉·$, 5, 0
〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Subset ConstructionExample
〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0
〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0
〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0
〈dec〉−→fun 〈fvalbind〉·, 5, 0
S′−→〈dec〉·$, 5, 0
〈fvalbind〉−→〈fvalbind〉 ’|’ ·〈sfvalbind〉, 5, 1
〈match〉−→〈match〉 ’|’ ·〈mrule〉, 0, 0
’|’
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Subset ConstructionExample
〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0
〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0
〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0
〈dec〉−→fun 〈fvalbind〉·, 5, 0
S′−→〈dec〉·$, 5, 0
〈fvalbind〉−→〈fvalbind〉 ’|’ ·〈sfvalbind〉, 5, 1
〈match〉−→〈match〉 ’|’ ·〈mrule〉, 0, 0
’|’
〈mrule〉−→·〈pat〉 => 〈exp〉, 0, 0d8
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Subset ConstructionExample
〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0
〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0
〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0
〈dec〉−→fun 〈fvalbind〉·, 5, 0
S′−→〈dec〉·$, 5, 0
〈fvalbind〉−→〈fvalbind〉 ’|’ ·〈sfvalbind〉, 5, 1
〈match〉−→〈match〉 ’|’ ·〈mrule〉, 0, 0
’|’
〈mrule〉−→·〈pat〉 => 〈exp〉, 0, 0
〈pat〉−→·vid 〈atpat〉, 0, 0
〈sfvalbind〉−→·vid 〈atpats〉 = 〈exp〉, 0, 0
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Construction Failure〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0
〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0
〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0
〈dec〉−→fun 〈fvalbind〉·, 5, 0
S′−→〈dec〉·$, 5, 0
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Construction Failure〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0
〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0
〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0
〈dec〉−→fun 〈fvalbind〉·, 5, 0
S′−→〈dec〉·$, 5, 0
〈mrule〉−→〈pat〉 ’|’ 〈exp〉·, 5, 0
r5
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Construction Failure〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0
〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈sfvalbind〉·, 5, 0
〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0
〈dec〉−→fun 〈fvalbind〉·, 5, 0
S′−→〈dec〉·$, 5, 0
〈mrule〉−→〈pat〉 ’|’ 〈exp〉·, 5, 0
〈match〉−→〈mrule〉·, 5, 0
〈match〉−→〈match〉· ’|’ 〈mrule〉, 5, 0
〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Complexity
I |Γ/≡|: size of the position automaton
|Γ/item0| = O(|G|)
I |A|: size of the parser: O(2|Γ/≡| |P|)
I parsing time complexity for input w: O(|w|)
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Complexity
I |Γ/≡|: size of the position automaton|Γ/item0| = O(|G|)
I |A|: size of the parser: O(2|Γ/≡| |P|)
I parsing time complexity for input w: O(|w|)
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Limitations
− incomparable with classical parsing techniques
+ subset construction mendable
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Limitations
− incomparable with classical parsing techniques
+ subset construction mendable
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction
Summary
I Shift Resolve parsers1. Large class of grammars accepted2. Unambiguity3. Linear time parsing
I 2-steps construction1. Simple2. Flexible
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion
Principles
I a bracketed sentence = a derivation tree
I ambiguity =more than one tree with the sameyield
d6d8d13 vid r13 =>d5 case d14 vid r14 of d7d6d8d13 vid r13 =>d14 vid r14r8r6′|′ d8d13 vid r13 =>d14 vid r14r8r7r5r8r6
d7d6d8d13 vid r13 =>d5 case d14 vid r14 of d7d8d13 vid r13 =>d14 vid r14r8r7r5r8r6′|′ d8d13 vid r13 =>d14 vid r14r8r7
I construct a FSA A such that L(Gb) ⊆ L(A), andlook for bracketed sentences with the same yield
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion
Principles
I a bracketed sentence = a derivation tree
I ambiguity =more than one tree with the sameyield
d6d8d13 vid r13 =>d5 case d14 vid r14 of d7d6d8d13 vid r13 =>d14 vid r14r8r6′|′ d8d13 vid r13 =>d14 vid r14r8r7r5r8r6
d7d6d8d13 vid r13 =>d5 case d14 vid r14 of d7d8d13 vid r13 =>d14 vid r14r8r7r5r8r6′|′ d8d13 vid r13 =>d14 vid r14r8r7
I construct a FSA A such that L(Gb) ⊆ L(A), andlook for bracketed sentences with the same yield
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection Conclusion
Principles
I a bracketed sentence = a derivation tree
I ambiguity =more than one tree with the sameyield
d6d8d13 vid r13 =>d5 case d14 vid r14 of d7d6d8d13 vid r13 =>d14 vid r14r8r6′|′ d8d13 vid r13 =>d14 vid r14r8r7r5r8r6
d7d6d8d13 vid r13 =>d5 case d14 vid r14 of d7d8d13 vid r13 =>d14 vid r14r8r7r5r8r6′|′ d8d13 vid r13 =>d14 vid r14r8r7
I construct a FSA A such that L(Gb) ⊆ L(A), andlook for bracketed sentences with the same yield
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionRegular Unambiguity
RU(≡)
I G is regular unambiguous for ≡ of finite index, ifthere does not exist wb , w ′
b in L(Γ/≡)∩ T ∗b withh(wb) = h(w ′
b)
I LR(0) * RU(item0)
I regular approximations are too weak
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionRegular Unambiguity
RU(≡)
I G is regular unambiguous for ≡ of finite index, ifthere does not exist wb , w ′
b in L(Γ/≡)∩ T ∗b withh(wb) = h(w ′
b)
I LR(0) * RU(item0)
I regular approximations are too weak
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Nonterminal Transitions
I SF(Gb) ⊆ L(Γ/≡)
I look for two different bracketed sentential formsin L(Γ/≡)
d6d8 〈pat〉 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 〈pat〉 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
I a nonterminal transition represents exactly itsderived context-free language
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Nonterminal Transitions
I SF(Gb) ⊆ L(Γ/≡)
I look for two different bracketed sentential formsin L(Γ/≡)
d6d8 〈pat〉 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 〈pat〉 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
I a nonterminal transition represents exactly itsderived context-free language
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Nonterminal Transitions
I SF(Gb) ⊆ L(Γ/≡)
I look for two different bracketed sentential formsin L(Γ/≡)
d6d8 〈pat〉 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 〈pat〉 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
I a nonterminal transition represents exactly itsderived context-free language
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Mutual Accessibility RelationsI between pairs of states of Γ/≡, (q1, q2)
I synchronized left-to-right walks from an initialpair (qs, qs)
d6d8 d14 vid r14 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 d14 vid r14 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
epsilon: mae
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Mutual Accessibility RelationsI between pairs of states of Γ/≡, (q1, q2)
I synchronized left-to-right walks from an initialpair (qs, qs)
d6d8 d14 vid r14 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 d14 vid r14 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
epsilon: mae
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Mutual Accessibility RelationsI between pairs of states of Γ/≡, (q1, q2)
I synchronized left-to-right walks from an initialpair (qs, qs)
d6d8 d14 vid r14 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 d14 vid r14 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
epsilon: mae
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Mutual Accessibility RelationsI between pairs of states of Γ/≡, (q1, q2)
I synchronized left-to-right walks from an initialpair (qs, qs)
d6d8 d14 vid r14 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 d14 vid r14 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
shift: mas
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Mutual Accessibility RelationsI between pairs of states of Γ/≡, (q1, q2)
I synchronized left-to-right walks from an initialpair (qs, qs)
d6d8 d14 vid r14 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 d14 vid r14 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
nothing!
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Mutual Accessibility RelationsI between pairs of states of Γ/≡, (q1, q2)
I synchronized left-to-right walks from an initialpair (qs, qs)
d6d8 〈pat〉 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 〈pat〉 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
shift: mas
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Mutual Accessibility RelationsI between pairs of states of Γ/≡, (q1, q2)
I synchronized left-to-right walks from an initialpair (qs, qs)
d6d8 〈pat〉 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 〈pat〉 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
conflict: mac
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Mutual Accessibility RelationsI between pairs of states of Γ/≡, (q1, q2)
I synchronized left-to-right walks from an initialpair (qs, qs)
d6d8 〈pat〉 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 〈pat〉 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
conflict: mac
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Mutual Accessibility RelationsI between pairs of states of Γ/≡, (q1, q2)
I synchronized left-to-right walks from an initialpair (qs, qs)
d6d8 〈pat〉 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 〈pat〉 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
conflict: mac
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Mutual Accessibility RelationsI between pairs of states of Γ/≡, (q1, q2)
I synchronized left-to-right walks from an initialpair (qs, qs)
d6d8 〈pat〉 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 〈pat〉 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
shift: mas
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Mutual Accessibility RelationsI between pairs of states of Γ/≡, (q1, q2)
I synchronized left-to-right walks from an initialpair (qs, qs)
d6d8 〈pat〉 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 〈pat〉 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
reduce: mar
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Mutual Accessibility RelationsI between pairs of states of Γ/≡, (q1, q2)
I synchronized left-to-right walks from an initialpair (qs, qs)
d6d8 〈pat〉 => d5 case 〈exp〉 of d7 〈match〉 ′| ′ 〈mrules〉r7r5r8r6
d7d6d8 〈pat〉 => d5 case 〈exp〉 of 〈match〉r5r8r6′| ′ 〈mrules〉 r7
conflict: mac
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
NU(≡)
I ma=mas ∪ mae ∪ mac ∪ mar
I G is noncanonically unambiguous if there doesnot exist a relation (qs, qs) ma∗ (qf, qf) that usesmac at some step
I Computation in O(|Γ/≡|2) in space
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Comparisons
I Regular Unambiguity RU(≡)
I Bounded-length detection schemes
I LR(k) and LR-Regular (LR(Π))
I Horizontal and vertical ambiguity (HVRU(≡))
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
Bounded-length detection[Gorn, 1963, Cheung and Uzgalis, 1995, Schroer, 2001, Jampana, 2005]
I generate sentencesI not conservativeI prefixm prevents from false positives in
sentences of length < m
I need to generate a2n+1 to find Gn4 ambiguous,
but Gn4 < NU(item0)
S−→A |Bna, A−→Aaa |a, B1−→aa, B2−→B1B1, . . . , Bn−→Bn−1Bn−1(Gn
4 )
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionNoncanonical Unambiguity
LR(k) and LR-Regular[Knuth, 1965, Hunt III et al., 1975, Culik and Cohen, 1973, Heilbrunner, 1983]
I conservative testsI define itemΠ s.t. LR(Π) ⊂ NU(itemΠ)
I need a LR(2n) test to prove Gn3 unambiguous,
but Gn3 ∈ NU(item0)
S−→A |Bn, A−→Aaa |a, B1−→aa, B2−→B1B1, . . . , Bn−→Bn−1Bn−1(Gn
3 )
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionExperimental Results
ImplementationI For the whole SML grammar:
I conflicts in the LALR(1) parsersml.y: conflicts: 223 shift/reduce, 35 reduce/reduce
I Our tool:89 potential ambiguities with LR(1) precision detected
I For the SML grammar fragment:2 potential ambiguities with LR(0) precision detected:
(match -> mrule . , match -> match . ’|’ mrule )
(match -> match . ’|’ mrule , match -> match ’|’ mrule . )
I NU(item1) correctly identifies 87% of ourunambiguous grammars—73% of thenon-LALR(1) ones
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionExperimental Results
Summary
I conservative ambiguity detection
I provably better than several other techniques
I also experimentally better
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionClosing Comments
Conclusion
I Main issues in parser development:I nondeterminismI ambiguity in particular
I Deterministic parsers for larger classes ofgrammars
I Ambiguity detection algorithm
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionFuture Work
Directions for Future Work
I Linear time parsing for NU(≡) grammars?
I Improved implementation
I Noncanonical languages
I Regular approximations
Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionFuture Work
Thanks!
References
Our IssueShift/Reduce Conflict
GNU Bison
state 20
6 exp: "case" exp "of" match .
8 match: match . ’|’ mrule
’|’ shift, and go to state 24
’|’ [reduce using rule 6 (exp)]
References
Our IssueShift/Reduce Conflict
Which action to choose?
of SOME y => filterP(r, y :: l ) | NONE => filterP(r, l)
〈pat〉
〈mrule〉
〈exp〉
〈match〉
. . .
〈match〉
〈pat〉
〈mrule〉
〈exp〉
〈fvalbind〉
References
Our IssueShift/Reduce Conflict
Which action to choose? Reduce?
of SOME y => filterP(r, y :: l ) | NONE => filterP(r, l)
〈pat〉
〈mrule〉
〈exp〉
〈match〉
. . .
〈exp〉
vid
〈sfvalbind〉
error!
〈sfvalbind〉
〈fvalbind〉
References
Our IssueShift/Reduce Conflict
Which action to choose? Shift?
of SOME y => filterP(r, y :: l ) | NONE => filterP(r, l)
〈pat〉
〈mrule〉
〈exp〉
〈match〉
〈match〉
. . .
〈exp〉〈pat〉
〈mrule〉
〈fvalbind〉
References
Our IssueShift/Reduce Conflict
Which action to choose?
| NONE => filterP(r, l) | filterP ([], l ) = rev l
〈exp〉〈pat〉
〈mrule〉
〈match〉
. . .
〈fvalbind〉
〈sfvalbind〉
〈exp〉〈atpats〉
〈exp〉
〈sfvalbind〉
〈fvalbind〉
References
Our IssueShift/Reduce Conflict
Which action to choose? Reduce?
| NONE => filterP(r, l) | filterP ([], l ) = rev l
〈exp〉〈pat〉
〈mrule〉
〈match〉
. . .
〈fvalbind〉
〈sfvalbind〉
〈exp〉〈atpats〉
〈exp〉
〈sfvalbind〉
〈fvalbind〉
References
Our IssueShift/Reduce Conflict
Which action to choose? Shift?
| NONE => filterP(r, l) | filterP ([], l ) = rev l
〈exp〉〈pat〉
〈mrule〉
〈match〉
. . .
〈pat〉
〈exp〉
〈sfvalbind〉
〈fvalbind〉
〈fvalbind〉
〈exp〉
〈match〉
error!
| NONE => filterP(r, l) | filterP ([], l ) = rev l
〈exp〉〈pat〉
〈mrule〉
〈match〉
. . .
〈pat〉
〈exp〉
〈sfvalbind〉
〈fvalbind〉
〈fvalbind〉
〈exp〉
〈match〉
error!
| NONE => filterP(r, l) | filterP ([], l ) = rev l
〈exp〉〈pat〉
〈mrule〉
〈match〉
. . .
〈pat〉
〈exp〉
〈sfvalbind〉
〈fvalbind〉
〈fvalbind〉
〈exp〉
〈match〉
error!
References
Unbounded Lookahead
〈sfvb〉 〈mrule〉
〈atpats〉
〈atpat〉 〈atpat〉
〈pat〉
vid
vid =>=
| |
... ...
... ...
References
LimitationsAmbiguity Report
I grambiguity [Brabrand et al., 2007]*** horizontal ambiguity at E[plus]: Exp <--> ’+’ Exp
ambiguous string: "x+x+x"
I ANTLRWorks [Parr, 2007]
References
Other Limitations
I memory requirements: a solution could be aNLALR test
I dynamic disambiguation: inverse problem,some means to deciding equivalence needed
References
H. J. S. Basten. Ambiguity detection methods forcontext-free grammars. Master’s thesis, Centrumvoor Wiskunde en Informatica, Universiteit vanAmsterdam, Aug. 2007.
P. Boullier. Supertagging: A non-statisticalparsing-based approach. In IWPT’03, pages 55–65,2003. URLftp://ftp.inria.fr/INRIA/Projects/Atoll/
Pierre.Boullier/supertaggeur final.pdf.C. Brabrand, R. Giegerich, and A. Møller. Analyzing
ambiguity of context-free grammars. In J. Holuband J. Zd’arek, editors, CIAA’07, 2007. URLhttp://www.brics.dk/∼brabrand/grambiguity/.To appear in Lecture Notes in Computer Science.
D. G. Cantor. On the ambiguity problem of Backussystems. J. ACM, 9(4):477–479, 1962. ISSN0004-5411. doi: 10.1145/321138.321145.
B. S. N. Cheung and R. C. Uzgalis. Ambiguity incontext-free grammars. In SAC’95, pages 272–276.ACM Press, 1995. ISBN 0-89791-658-1. doi:10.1145/315891.315991.
N. Chomsky and M. P. Schutzenberger. The algebraictheory of context-free languages. In P. Braffort andD. Hirshberg, editors, Computer Programmingand Formal Systems, Studies in Logic, pages118–161. North-Holland Publishing, 1963.
K. Culik and R. Cohen. LR-Regular grammars—anextension of LR(k) grammars. J. Comput. Syst.Sci., 7:66–96, 1973. ISSN 0022-0000.
C. Donnely and R. Stallman. Bison version 2.3, Sept.2006. URLhttp://www.gnu.org/software/bison/manual/.
S. Gorn. Detection of generative ambiguities incontext-free mechanical languages. J. ACM, 10(2):196–208, 1963. ISSN 0004-5411. doi:10.1145/321160.321168.
S. Heilbrunner. Tests for the LR-, LL-, andLC-Regular conditions. J. Comput. Syst. Sci., 27(1):1–13, 1983. ISSN 0022-0000. doi:10.1016/0022-0000(83)90026-0.
H. B. Hunt III, T. G. Szymanski, and J. D. Ullman. Onthe complexity of LR(k) testing. Commun. ACM,18(12):707–716, 1975. ISSN 0001-0782. doi:10.1145/361227.361232.
S. Jampana. Exploring the problem of ambiguity incontext-free grammars. Master’s thesis, OklahomaState University, July 2005. URLhttp://e-archive.library.okstate.edu/
dissertations/AAI1427836/.P. Klint and E. Visser. Using filters for the
disambiguation of context-free grammars. InG. Pighizzini and P. San Pietro, editors, ASMICSWorkshop on Parsing Theory, Technical Report126-1994, pages 89–100. Universita di Milano,1994. URL http://citeseer.ist.psu.edu/klint94using.html.
D. E. Knuth. On the translation of languages fromleft to right. Information and Control, 8(6):607–639, 1965. ISSN 0019-9958. doi:10.1016/S0019-9958(65)90426-2.
B. A. Malloy, J. F. Power, and J. T. Waldron. Applyingsoftware engineering techniques to parser design:the development of a C# parser. In SAICSIT’02,pages 75–82. SAICSIT, 2002. ISBN 1-58113-596-3.URL http://www.cs.nuim.ie/∼jpower/Research/Papers/2002/saicsit02.pdf.
S. McPeak and G. C. Necula. Elkhound: A fast,practical GLR parser generator. In E. Duesterwald,editor, CC’04, volume 2985 of Lecture Notes inComputer Science, pages 73–88. Springer, 2004.ISBN 3-540-21297-3. doi: 10.1007/b95956.
R. Milner, M. Tofte, R. Harper, and D. MacQueen.The definition of Standard ML. MIT Press, revisededition, 1997. ISBN 0-262-63181-4.
T. J. Parr. The Definitive ANTLR Reference: BuildingDomain-Specific Languages. The PragmaticProgrammers, 2007. ISBN 0-9787392-5-6.
F. W. Schroer. AMBER, an ambiguity checker forcontext-free grammars. Technical report,compilertools.net, 2001. URLhttp://accent.compilertools.net/Amber.html.
L. Segoufin and V. Vianu. Validating streaming XMLdocuments. In PODS’02, pages 53–64. ACM Press,2002. ISBN 1-58113-507-6. doi:10.1145/543613.543622.
T. G. Szymanski and J. H. Williams. Noncanonicalextensions of bottom-up parsing techniques.SIAM J. Comput., 5(2):231–250, 1976. ISSN0097-5397. doi: 10.1137/0205019.
K.-C. Tai. Noncanonical SLR(1) grammars. ACMTrans. Prog. Lang. Syst., 1(2):295–320, 1979. ISSN0164-0925. doi: 10.1145/357073.357083.
M. Tomita. Efficient Parsing for Natural Language.Kluwer Academic Publishers, 1986. ISBN0-89838-202-5.
M. van den Brand, J. Scheerder, J. J. Vinju, andE. Visser. Disambiguation filters for scannerlessgeneralized LR parsers. In R. N. Horspool, editor,CC’02, volume 2304 of Lecture Notes in ComputerScience, pages 143–158. Springer, 2002. ISBN3-540-43369-4. URL http://www.springerlink.com/content/03359k0cerupftfh/.
E. Visser. Syntax Definition for LanguagePrototyping. PhD thesis, Sept. 1997. URL http://citeseer.ist.psu.edu/visser97syntax.html.