Languages That Are and Are Not Context-Free
Section 3.5
Wed, Oct 26, 2005
Regular vs. Context-Free Theorem: Every regular language is context-free. Proof:
Let L be regular. Given a DFA for L, add a stack, but do not use the stack. That is, change each DFA transition (p, a, q) to a PDA
transition ((p, a, e), (q, e)). The result is a PDA whose language is L. Therefore, L is context-free.
Closure under Union Theorem: Let L1 and L2 be CFLs. Then L1 ∪ L2 is
also a CFL. Proof:
Let L1 have grammar (V1, Σ1, R1, S1) and let L2 have
grammar (V2, Σ2, R2, S2).
Then L1 ∪ L2 has the grammar (V, Σ, R, S) where
Σ = Σ1 ∪ Σ2
V = V1 ∪ V2 ∪ {S}
S is a new start symbol
R = R1 ∪ R2 ∪ {S → S1 | S2}.
Proof, continued Therefore, L1 ∪ L2 is a CFL.
We must assume in the proof that
(V1 - Σ1) ∩ (V2 - Σ2) = ∅.
Why?
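One way to explore the question is to run the construction on two small grammars that happen to share the nonterminal S. The following is only a sketch: the dictionary encoding of grammars and the helper names rename, union_grammar, and generate are assumptions made for this illustration, and generate is a simple bounded enumerator that terminates for the small grammars used here, not a general-purpose tool.

```python
from collections import deque

# A grammar is a dict: nonterminal -> list of right-hand sides (tuples of symbols).
# Any symbol that is not a key of the dict is treated as a terminal.

def rename(grammar, start, suffix):
    """Append suffix to every nonterminal so that two grammars share none."""
    new_name = {nt: nt + suffix for nt in grammar}
    renamed = {
        new_name[nt]: [tuple(new_name.get(s, s) for s in rhs) for rhs in rules]
        for nt, rules in grammar.items()
    }
    return renamed, new_name[start]

def union_grammar(g1, s1, g2, s2, new_start="S"):
    """Rules R1 ∪ R2 ∪ {S -> S1 | S2}, after making the nonterminals disjoint."""
    g1, s1 = rename(g1, s1, "_1")
    g2, s2 = rename(g2, s2, "_2")
    combined = {**g1, **g2}
    combined[new_start] = [(s1,), (s2,)]
    return combined, new_start

def generate(grammar, start, max_len):
    """Terminal strings of length <= max_len, found by leftmost derivations.

    Forms with more than max_len terminals are pruned; this is enough for
    the small example grammars below but is not a general guarantee.
    """
    seen, found = set(), set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if there is one.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            found.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for s in new if s not in grammar)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

G1 = {"S": [("a", "S", "b"), ()]}   # generates a^n b^n
G2 = {"S": [("c", "S"), ()]}        # generates c*

union, start = union_grammar(G1, "S", G2, "S")
print(sorted(generate(union, start, 4)))

# Merging the rules without renaming lets the two grammars interfere.
naive = {"S": G1["S"] + G2["S"]}
print("acb" in generate(naive, "S", 4))
```

With the nonterminals renamed, only strings of L1 or of L2 are produced. Without renaming, the shared S allows the derivation S → aSb → acSb → acb, and acb belongs to neither language.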
Closure under Concatenation Theorem: Let L1 and L2 be CFLs. Then L1L2 is also
a CFL. Proof:
Let L1 have grammar (V1, Σ1, R1, S1) and let L2 have
grammar (V2, Σ2, R2, S2).
Then L1L2 has the grammar (V, Σ, R, S) where
Σ = Σ1 ∪ Σ2
V = V1 ∪ V2 ∪ {S}
S is a new start symbol
R = R1 ∪ R2 ∪ {S → S1S2}.
Proof, continued Therefore, L1L2 is a CFL.
Again, we must assume that
(V1 - Σ1) ∩ (V2 - Σ2) = ∅.
Closure under Kleene Star Theorem: Let L be a CFL. Then L* is also a CFL. Proof:
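Let L have grammar (V, Σ, R, S). Then L* has the grammar (V', Σ, R', S') where S' is a new start symbol, V' = V ∪ {S'}, and
R' = R ∪ {S' → e | SS'}.
A new start symbol is used because S may appear on the right-hand side of rules in R.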
Therefore, L* is a CFL.
Intersection of a Regular Language and a CFL. Theorem: The intersection of a CFL and a regular
language is a CFL. Proof (outline):
Use the cross product to construct the intersection of the PDA and the DFA.
Only one component uses the stack. Therefore, there is no complication. The cross product will function as a PDA.
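Intersection of a Regular Language and a CFL. More specifically, the transition (p, a, q) from the
DFA and the transition ((p', a, β), (q', γ)) from the PDA may be combined into
(((p, p'), a, β), ((q, q'), γ))
for the new PDA. A PDA transition ((p', e, β), (q', γ)) that reads no input becomes (((p, p'), e, β), ((p, q'), γ)) for every DFA state p, since the DFA does not move.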
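The cross product can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the tuple encoding of transitions, the functions product_pda and accepts, the convention that a PDA accepts in a final state with an empty stack, and the small example machines (a PDA for a^n b^n and a DFA that accepts strings with an even number of a's).

```python
def product_pda(dfa_delta, dfa_states, pda_transitions):
    """Combine DFA transitions (p, a) -> q with PDA transitions ((p', a, beta), (q', gamma))."""
    combined = []
    for (p2, a, beta), (q2, gamma) in pda_transitions:
        if a == "":
            # The PDA reads no input, so the DFA component stays where it is.
            for p1 in dfa_states:
                combined.append((((p1, p2), a, beta), ((p1, q2), gamma)))
        else:
            for (p1, sym), q1 in dfa_delta.items():
                if sym == a:
                    combined.append((((p1, p2), a, beta), ((q1, q2), gamma)))
    return combined

def accepts(transitions, start, finals, word):
    """Search all computations; accept in a final state with empty stack and no input left.

    The stack is a string with its top at the left. Its size is capped at
    len(word) so the search terminates; that cap suits the example PDA below
    and is an assumption, not a general property of PDAs.
    """
    seen = set()
    todo = [(start, 0, "")]
    while todo:
        config = todo.pop()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        if pos == len(word) and stack == "" and state in finals:
            return True
        for (p, a, beta), (q, gamma) in transitions:
            if p != state or not stack.startswith(beta):
                continue
            if a and (pos >= len(word) or word[pos] != a):
                continue
            new_stack = gamma + stack[len(beta):]
            if len(new_stack) <= len(word):
                todo.append((q, pos + len(a), new_stack))
    return False

# PDA for a^n b^n: push A for each a, switch to state f, pop A for each b.
pda = [
    (("s", "a", ""), ("s", "A")),
    (("s", "", ""), ("f", "")),
    (("f", "b", "A"), ("f", "")),
]
# DFA over {a, b} whose state is the parity of the number of a's read so far.
dfa = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}

prod = product_pda(dfa, [0, 1], pda)
for w in ["", "ab", "aabb", "aab"]:
    print(repr(w), accepts(prod, (0, "s"), {(0, "f")}, w))
```

Only the PDA component touches the stack, so the combined machine is an ordinary PDA whose states are pairs. In this example it should accept exactly the strings a^n b^n with n even.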
Complementation and Intersection The complement of a context-free language is not
necessarily context-free. The intersection of two context-free languages is not
necessarily context-free. Counterexamples will be given later.
The Concept behind the Pumping Lemma for CFLs The Pumping Lemma for CFLs will allow us to show
that some languages are not context-free. If a CFL contains a word w with a sufficiently long
derivation S ⇒* w, then some nonterminal A must appear more than once on a single path of the parse tree.
This is the Pigeonhole Principle.
The Concept behind the Pumping Lemma for CFLs That is, we have
S ⇒* uAz ⇒* uvAyz ⇒* uvxyz. Thus, A ⇒* vAy and A ⇒* x. We may repeat the derivation A ⇒* vAy as many
times as we like (including zero times), producing strings u v^n x y^n z, for any n ≥ 0.
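As a small illustration, take the grammar S → aSb | e and the word aabb. The derivation S ⇒ aSb ⇒ aaSbb ⇒ aabb repeats the nonterminal S, which gives u = a, v = a, x = e, y = b, z = b. The sketch below checks that repeating the middle step any number of times stays in the language; the function names pump and in_language are hypothetical helpers, not part of the lecture.

```python
def pump(u, v, x, y, z, n):
    """Build the string u v^n x y^n z."""
    return u + v * n + x + y * n + z

def in_language(s):
    """Membership test for {a^m b^m | m >= 0}."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * half + "b" * half

u, v, x, y, z = "a", "a", "", "b", "b"
for n in range(4):
    w = pump(u, v, x, y, z, n)
    print(n, w, in_language(w))
```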
The Length of a Path in a Parse Tree In a parse tree T, define a path to be
empty, or a sequence of nodes, starting at a node in the tree and
ending at one of its descendants, and including all of the nodes along the way.
The length of a path is 0, if the path is empty, or 1 less than the number of nodes in the path.
Height and Fanout The height of a parse tree is the length of the tree’s
longest path. Given a grammar G, the fanout of G, denoted φ(G), is
the largest number of symbols on the right side of any rule in G.
A Lemma for the Lemma Lemma: Let G be a CFG. The yield of any parse tree
of G of height h has length no greater than φ(G)^h. Proof:
The longest possible string is obtained if we always use a grammar rule with the maximum number of symbols on the right-hand side.
Therefore, if we apply grammar rules to each nonterminal in the string at most h times, then the length of the resulting string is at most φ(G)^h.
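The bound can be checked on concrete parse trees. The sketch below assumes a particular encoding (a tree is a pair of a symbol and a list of children, and a grammar is a dictionary of right-hand sides); the names fanout, height, yield_of, and full_tree are hypothetical helpers introduced only for this example, which uses the grammar S → SS | a.

```python
def fanout(grammar):
    """Largest number of symbols on the right side of any rule."""
    return max(len(rhs) for rules in grammar.values() for rhs in rules)

def height(tree):
    """Length of the longest path from the root down to a leaf."""
    symbol, children = tree
    return 0 if not children else 1 + max(height(c) for c in children)

def yield_of(tree):
    """Concatenation of the leaf labels, read left to right."""
    symbol, children = tree
    if not children:
        return symbol
    return "".join(yield_of(c) for c in children)

def full_tree(h):
    """A parse tree of height h for S -> SS | a that uses S -> SS as long as possible."""
    if h == 1:
        return ("S", [("a", [])])
    return ("S", [full_tree(h - 1), full_tree(h - 1)])

G = {"S": [("S", "S"), ("a",)]}
for h in range(1, 6):
    t = full_tree(h)
    print(h, len(yield_of(t)), fanout(G) ** height(t))
```

For this grammar the yield has length 2^(h-1), which stays below the bound φ(G)^h = 2^h because the last level of every path must use the rule S → a.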
The Pumping Lemma for CFLs The Pumping Lemma for CFLs: Let G = (V, Σ, R, S)
be a context-free grammar with φ(G) ≥ 2. Then any string w ∈ L(G) with length at least n = φ(G)^(|V - Σ| + 1) can be written as w = uvxyz for some strings u, v, x, y, z ∈ Σ* such that |v| > 0 or |y| > 0, |vxy| ≤ n, and u v^k x y^k z ∈ L(G) for every k ≥ 0.
The Pumping Lemma for CFLs Proof:
Let n = φ(G)^(|V - Σ| + 1). Let w ∈ L(G) with |w| ≥ n. Let T be a parse tree for w that uses the smallest number of
leaves possible (that is, minimize the number of empty strings). Let P be a path of maximum length in T. Since |w| ≥ n > φ(G)^|V - Σ| (taking φ(G) ≥ 2), the length of P is greater than |V - Σ|,
i.e., the length of P is at least |V - Σ| + 1. (Lemma) Therefore, the number of nodes on P is at least |V - Σ| + 2.
The Pumping Lemma for CFLs
Let P' be the last part of P consisting of exactly |V - Σ| + 2 nodes.
P' must contain exactly |V - Σ| + 1 nonterminals, since only its last node is a leaf. Therefore, at least one nonterminal must be repeated. Let A be the first nonterminal that is repeated as we follow
the path from the leaf back towards the root. Let T' be the subtree with root at the second-to-last
occurrence of A on the path P. If we remove T' from T, except for its root A, the result is a
parse tree for a string uAz.
The Pumping Lemma for CFLs
Let T'' be the subtree whose root node is the last occurrence of A on the path P.
T'' is a parse tree for a string x. If we remove T'' from T' except the root A, the result is a
parse tree for a string vAy. This parse tree may be attached at the leaf A in the tree T - T'
repeatedly as many times as we like (including zero times), creating parse trees for u v^k A y^k z for any k ≥ 0.
Finally, we re-attach T'' and get a parse tree for u v^k x y^k z.
The Pumping Lemma for CFLs
If v = e and y = e, then the part of the tree between the two occurrences of A could have been eliminated, producing a smaller parse tree for w.
We assumed that T was the smallest possible parse tree for w.
Therefore, v ≠ e or y ≠ e. The path from the second-to-last A to the last A and then to
the terminal has length at most |V - Σ| + 1. Therefore, the subtree T' yields no more than φ(G)^(|V - Σ| + 1)
terminals. (Lemma) Thus, |vxy| ≤ n.
Standard Example of a Non-CFL The language {a^n b^n c^n | n ≥ 0} is not context-free. Proof:
Suppose it is. Let n be the n of the Pumping Lemma. Let w = a^n b^n c^n. Then w = uvxyz where |v| > 0 or |y| > 0 and |vxy| ≤ n. Then vxy contains at most two different symbols. Suppose it contains only a's and b's (but no c's). Then v and y together contain at least one a or at least one b.
Standard Example of a Non-CFL
Say v and y together contain i a's and j b's, for some i and j, with i > 0 or j > 0.
Then uv^2xy^2z contains n + i a's and n + j b's, at least one of which is greater than n.
But uv^2xy^2z contains only n c's. Thus, uv^2xy^2z ∉ L. This is a contradiction. Therefore, this language is not context-free. The other case, where vxy contains b's and c's, but no a's, is
handled similarly.
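The case analysis can be checked by brute force for one small value of n. This is an illustration, not a proof: the sketch below fixes n = 4, tries every decomposition w = uvxyz allowed by the lemma, and tests only k = 0 and k = 2. The function names decompositions and survives_pumping are hypothetical, and the language {a^m b^m} is included only as a contrast where pumping does succeed.

```python
def in_anbncn(s):
    """Membership test for {a^m b^m c^m | m >= 0}."""
    m = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * m + "b" * m + "c" * m

def in_anbn(s):
    """Membership test for {a^m b^m | m >= 0}, used as a contrast."""
    m = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * m + "b" * m

def decompositions(w, n):
    """All (u, v, x, y, z) with w = uvxyz, |vxy| <= n, and v or y nonempty."""
    for i in range(len(w) + 1):                       # u = w[:i]
        for end in range(i, min(i + n, len(w)) + 1):  # vxy = w[i:end]
            for j in range(i, end + 1):               # v = w[i:j]
                for l in range(j, end + 1):           # x = w[j:l], y = w[l:end]
                    v, y = w[i:j], w[l:end]
                    if v or y:
                        yield w[:i], v, w[j:l], y, w[end:]

def survives_pumping(w, n, in_language, ks=(0, 2)):
    """True if some decomposition keeps u v^k x y^k z in the language for each k in ks."""
    return any(
        all(in_language(u + v * k + x + y * k + z) for k in ks)
        for u, v, x, y, z in decompositions(w, n)
    )

n = 4
print(survives_pumping("a" * n + "b" * n + "c" * n, n, in_anbncn))
print(survives_pumping("a" * n + "b" * n, n, in_anbn))
```

For a^4 b^4 c^4 no decomposition survives, in line with the argument above, while a^4 b^4 can be pumped with v = a and y = b.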
Example of a Non-CFL The language {a^m b^n c^m d^n | m, n ≥ 0} is not context-free. Proof:
Suppose that it is context-free. Let n be the n of the Pumping Lemma. Let w = a^n b^n c^n d^n. Complete the proof using the Pumping Lemma.
Example of a Non-CFL The language
L = {w ∈ Σ* | #a's = #b's = #c's}
is not context-free. Proof:
Suppose that it is context-free. Intersect it with L(a*b*c*), which is regular. The intersection is {a^n b^n c^n | n ≥ 0}, which is known to be
non-context-free. But the intersection of a CFL and a regular language is a CFL, which is a contradiction. Therefore, the language L is not context-free.
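The claim about the intersection can be spot-checked by enumerating short strings. The sketch below assumes Σ = {a, b, c} and uses Python's re module for the regular language a*b*c*; the function names equal_counts and intersection_up_to are hypothetical helpers for this example.

```python
import re
from itertools import product

ABC_STAR = re.compile(r"a*b*c*")

def equal_counts(w):
    """Membership test for L: equally many a's, b's, and c's."""
    return w.count("a") == w.count("b") == w.count("c")

def intersection_up_to(max_len):
    """Strings of length <= max_len over {a, b, c} that lie in L and in L(a*b*c*)."""
    words = (
        "".join(t)
        for length in range(max_len + 1)
        for t in product("abc", repeat=length)
    )
    return {w for w in words if equal_counts(w) and ABC_STAR.fullmatch(w)}

print(sorted(intersection_up_to(6), key=len))
```

Up to length 6, the strings that survive are exactly those of the form a^n b^n c^n with n = 0, 1, 2.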
Nonclosure Properties Theorem: The set of context-free languages is not
closed under intersection. Proof:
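Let L1 = {a^n b^n c^m | m, n ≥ 0} and let L2 = {a^m b^n c^n | m, n ≥ 0}.
Clearly, L1 and L2 are context-free: L1 is the concatenation of {a^n b^n | n ≥ 0} and L(c*), and L2 is the concatenation of L(a*) and {b^n c^n | n ≥ 0}.
However, L1 ∩ L2 = {a^n b^n c^n | n ≥ 0}, which is known to be non-context-free.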
Nonclosure Properties Theorem: The set of context-free languages is not
closed under complementation. Proof:
Suppose it were closed under complementation. Let L1 and L2 be context-free languages.
Then (L1' ∪ L2')' is also context-free, since the context-free languages are closed under union.
However, by DeMorgan’s Laws, this is L1 ∩ L2, which we now know is not necessarily context-free.
Example The language
L = {w ∈ Σ* | w ≠ uu for any u ∈ Σ*}
is context-free. The language
L′ = {w ∈ Σ* | w = uu for some u ∈ Σ*}
is not context-free.