The CYK Parsing Method
Chiyo Hotani
Tanya Petrova
CL2 Parsing Course
28 November, 2007
Overview
• CYK Recognition with a CF grammar
  • Basic Algorithm
  • Problems: unit rules, ε-rules
  • Recognition with a grammar in CNF
• CYK Parsing with CNF
  • Parsing with CNF
  • Recognition Table
• Chart Parsing
• Summary
  • Advantages and Disadvantages
  • Other remarks
Basic Algorithm of CYK Recognition (1)
Example Grammar:
A grammar describing numbers in scientific notation
Input: 32.5e+1
derivations of substrings of length 1
Basic Algorithm of CYK Recognition (2)
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Sign -> + | -
NumberS -> Integer | Real
Integer -> Digit | Integer Digit
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
derivations of substrings of length 1
Unit rules: rules of the form A -> B, where A and B are non-terminals. Chains of unit rules can occur in a derivation.
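The effect of unit-rule chains on the set computed for a substring can be sketched as a small fixed-point loop. This is a minimal illustration, not the presenters' code: the function name `unit_closure` and the list `UNIT_RULES` are hypothetical, and only the unit rules visible in the example grammar are included.

```python
# Unit rules of the example grammar, as (left-hand side, right-hand side) pairs.
UNIT_RULES = [
    ("NumberS", "Integer"),
    ("NumberS", "Real"),
    ("Integer", "Digit"),
]


def unit_closure(nonterminals):
    """Add every non-terminal that reaches a member of the set through unit rules."""
    result = set(nonterminals)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in UNIT_RULES:
            if rhs in result and lhs not in result:
                result.add(lhs)
                changed = True
    return result


# A single digit is derived by Digit, and through the chain also by Integer and NumberS.
print(sorted(unit_closure({"Digit"})))
```

The loop has to be repeated because one application can enable another: Digit gives Integer, and only then does Integer give NumberS.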
Basic Algorithm of CYK Recognition (3)
NumberS -> Integer | Real
Integer -> Digit | Integer Digit
Fraction -> . Integer
Scale -> e Sign Integer | Empty
Basic Algorithm of CYK Recognition (4)
NumberS -> Integer | Real
Real -> Integer Fraction Scale
Number does indeed derive 32.5e+1.
Basic Algorithm of CYK Recognition (5)
ε-rules
Basic Algorithm of CYK Recognition (6)
R_ε = { Empty, Scale }
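One way to arrive at the set of non-terminals that derive the empty string is another fixed-point computation. The sketch below is an illustration under assumptions: the dictionary `GRAMMAR` and the function `nullable` are hypothetical names, and the rule Empty -> ε does not appear in the transcript, so it is assumed here.

```python
# The example grammar: each non-terminal maps to a list of alternatives,
# and each alternative is a list of symbols.
GRAMMAR = {
    "NumberS": [["Integer"], ["Real"]],
    "Integer": [["Digit"], ["Integer", "Digit"]],
    "Real": [["Integer", "Fraction", "Scale"]],
    "Fraction": [[".", "Integer"]],
    "Scale": [["e", "Sign", "Integer"], ["Empty"]],
    "Digit": [[d] for d in "0123456789"],
    "Sign": [["+"], ["-"]],
    "Empty": [[]],  # assumption: Empty -> ε, written as an empty alternative
}


def nullable(grammar):
    """Return the set of non-terminals that can derive the empty string."""
    result = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in result:
                continue
            # An alternative is nullable when all of its symbols are nullable;
            # the empty alternative satisfies this trivially.
            if any(all(sym in result for sym in alt) for alt in alternatives):
                result.add(lhs)
                changed = True
    return result


print(sorted(nullable(GRAMMAR)))
```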
Sentence: z = z_1 z_2 ... z_n
s_{i,l}: the substring of z starting at position i, of length l:
s_{i,l} = z_i z_{i+1} ... z_{i+l-1}
R_{s_{i,l}}: the set of non-terminals deriving the substring s_{i,l}
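The substring notation translates directly into slicing, as long as the 1-based position i is converted to a 0-based index. A minimal sketch, with `substring` as a hypothetical helper name:

```python
def substring(z, i, l):
    """Return the substring of z that starts at 1-based position i and has length l."""
    return z[i - 1 : i - 1 + l]


z = "32.5e+1"

# Every (i, l) pair with 1 <= i and i + l - 1 <= n names one substring of z.
n = len(z)
all_substrings = {
    (i, l): substring(z, i, l)
    for l in range(1, n + 1)
    for i in range(1, n - l + 2)
}

print(substring(z, 2, 4))   # the substring starting at position 2, of length 4
print(len(all_substrings))  # n * (n + 1) / 2 index pairs
```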
A graphical presentation of substrings
Basic Algorithm of CYK Recognition (7)
CYK recognition with a grammar in CNF
Required restrictions:
• Eliminate ε-rules and unit rules
• Limit the maximum length of the RHS of a rule to 2
CNF
No ε-rules and no unit rules; all rules have one of the following two forms:
A -> a
A -> B C
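The two permitted rule forms can be checked mechanically. The sketch below is only an illustration: `is_cnf_rule` is a hypothetical helper, and a rule is represented as a list of right-hand-side symbols together with the set of non-terminals.

```python
def is_cnf_rule(rhs, nonterminals):
    """Check whether a right-hand side has the form a (one terminal) or B C (two non-terminals)."""
    if len(rhs) == 1:
        return rhs[0] not in nonterminals            # A -> a
    if len(rhs) == 2:
        return all(sym in nonterminals for sym in rhs)  # A -> B C
    return False  # empty right-hand sides and lengths above 2 are excluded


NONTERMINALS = {"NumberS", "Integer", "Real", "Fraction", "Scale", "Digit", "Sign", "Empty"}

print(is_cnf_rule(["Integer", "Digit"], NONTERMINALS))              # form A -> B C
print(is_cnf_rule(["Digit"], NONTERMINALS))                         # unit rule, rejected
print(is_cnf_rule(["Integer", "Fraction", "Scale"], NONTERMINALS))  # too long, rejected
```

Applied to the original example grammar, the check rejects the unit rules, the rule for Real, and the rule Fraction -> . Integer, which mixes a terminal with a non-terminal.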
Our example grammar in CNF
CYK Parsing with CNF
Building the recognition table
Input:
Our example grammar in CNF
input sentence: 32.5 e + 1
CYK Parsing with the CNF
Bottom row: read directly from the grammar (rules of the form A -> a)
Two ways to compute R_{s_{i,l}}:
check each right-hand side
compute possible right-hand sides from the recognition table
How this is done
Example: 2.5e (= s_{2,4})
1) N1 is not in R_{s_{2,1}} or R_{s_{2,2}}. N1 is a member of R_{s_{2,3}}, but Scale' is not a member of R_{s_{5,1}}.
2) R_{s_{2,4}} is the set of non-terminals that have a right-hand side A B where either:
• A in R_{s_{2,1}} and B in R_{s_{3,3}}, or
• A in R_{s_{2,2}} and B in R_{s_{4,2}}, or
• A in R_{s_{2,3}} and B in R_{s_{5,1}}
Possible combinations: N1 T2 or Number T2.
Our grammar has no such right-hand side, so nothing is added to R_{s_{2,4}}.
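The second method, combining entries already in the recognition table, can be sketched as follows. The CNF grammar was shown on a slide image and is not part of this transcript, so the rules in `TERMINAL_RULES` and `BINARY_RULES` are an assumed reconstruction that is consistent with the worked example above. The function names are hypothetical as well.

```python
from itertools import product

# Assumed CNF rules of the form A -> a, indexed by the terminal a.
TERMINAL_RULES = {d: {"Number", "Integer", "Digit"} for d in "0123456789"}
TERMINAL_RULES.update({".": {"T1"}, "e": {"T2"}, "+": {"Sign"}, "-": {"Sign"}})

# Assumed CNF rules of the form A -> B C, indexed by the pair (B, C).
BINARY_RULES = {
    ("Integer", "Digit"): {"Number", "Integer"},
    ("Integer", "Fraction"): {"Number", "N1"},
    ("N1", "Scale'"): {"Number"},
    ("T1", "Integer"): {"Fraction"},
    ("N2", "Integer"): {"Scale'"},
    ("T2", "Sign"): {"N2"},
}


def cyk_table(z):
    """Build the recognition table: table[i, l] holds the non-terminals deriving s_{i,l}."""
    n = len(z)
    table = {}
    # Bottom row: substrings of length 1, read directly from the rules A -> a.
    for i in range(1, n + 1):
        table[i, 1] = set(TERMINAL_RULES.get(z[i - 1], set()))
    # Longer substrings: try every split into a left part of length k and the rest.
    for l in range(2, n + 1):
        for i in range(1, n - l + 2):
            cell = set()
            for k in range(1, l):
                for b, c in product(table[i, k], table[i + k, l - k]):
                    cell |= BINARY_RULES.get((b, c), set())
            table[i, l] = cell
    return table


def recognizes(z, start="Number"):
    """Return True if the start symbol derives the whole input string."""
    return len(z) > 0 and start in cyk_table(z)[1, len(z)]


table = cyk_table("32.5e+1")
print(table[2, 3])  # non-terminals deriving "2.5"
print(table[2, 4])  # non-terminals deriving "2.5e"
print(recognizes("32.5e+1"))
```

Each cell is computed exactly once, from cells for strictly shorter substrings, which is why no repeated passes are needed here.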
Recognition table
[Figure: recognition table, indexed by start position i and substring length l]
As a result we find out that:
This process is much less complicated than the one we saw before
Reasons
• We do not have to repeat the process again and again until no new non-terminals are added to R_{s_{i,l}}
(the substrings we are dealing with are proper substrings and cannot be equal to the string we start with)
• We only have to find one place where the substring must be split into two: for a rule A -> B C, the split falls between B and C
Chart Parsing
A chart is just a recognition table.
A short retrospective of CYK
First: recognition table using the original grammar.
Then: transforming grammar to CNF.
A short retrospective of CYK cont.
CNF is useful for improving the efficiency, but it is actually a bit too restrictive
Disadvantage of CNF: Resulting recognition table lacks the
information we need to construct a derivation using the original grammar!
A short retrospective of CYK cont.
In the transformation process, some non-terminals were thrown away (non-productive).
Missing information could be added.
A short retrospective of CYK cont.
Result: almost the same recognition table.
Extra information on non-terminals.
Obtained in a simpler and much more efficient way.
Thank you
for your attention!