Languages That Are and Are Not Context-Free

Section 3.5

Wed, Oct 26, 2005

Regular vs. Context-Free

Theorem: Every regular language is context-free.

Proof: Let L be regular. Given a DFA for L, add a stack, but do not use the stack. That is, change each DFA transition (p, a, q) to a PDA transition ((p, a, e), (q, e)). The result is a PDA whose language is L. Therefore, L is context-free.
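The construction in this proof is mechanical, so it can be sketched in a few lines of Python. The representation (transitions as tuples, the empty string "" standing for e) and the example DFA are assumptions made for illustration, not part of the original slides.

```python
def dfa_to_pda(dfa_transitions):
    """Turn each DFA transition (p, a, q) into a PDA transition
    ((p, a, e), (q, e)), which pops nothing and pushes nothing."""
    e = ""  # the empty string e
    return {((p, a, e), (q, e)) for (p, a, q) in dfa_transitions}

# Hypothetical DFA over {a, b} accepting strings with an even number of a's.
even_as = {
    ("even", "a", "odd"), ("odd", "a", "even"),
    ("even", "b", "even"), ("odd", "b", "odd"),
}
pda = dfa_to_pda(even_as)
```

States, start state, and final states carry over unchanged, which is why only the transitions are shown.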

Closure under Union

Theorem: Let L1 and L2 be CFLs. Then L1 ∪ L2 is also a CFL.

Proof: Let L1 have grammar (V1, Σ1, R1, S1) and let L2 have grammar (V2, Σ2, R2, S2). Then L1 ∪ L2 has the grammar (V, Σ, R, S), where

Σ = Σ1 ∪ Σ2,

V = V1 ∪ V2 ∪ {S},

S is a new start symbol,

R = R1 ∪ R2 ∪ {S → S1 | S2}.

Therefore, L1 ∪ L2 is a CFL.

We must assume in the proof that

(V1 – Σ1) ∩ (V2 – Σ2) = ∅.

Why?
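The union construction can be tried out on small grammars. The sketch below is an illustration under stated assumptions: a grammar is a dict from nonterminals to lists of right-hand sides (tuples of symbols), terminals are the symbols with no rules, and the helper names union_grammar and strings_up_to are made up here. The enumerator caps the length of sentential forms, which is enough for these examples but could miss strings in grammars with many e-rules.

```python
from collections import deque

def union_grammar(rules1, start1, rules2, start2, new_start="S"):
    # Side condition from the proof: the nonterminal sets must be disjoint.
    assert not (set(rules1) & set(rules2)), "rename nonterminals first"
    assert new_start not in rules1 and new_start not in rules2
    rules = {**rules1, **rules2}
    rules[new_start] = [(start1,), (start2,)]  # S -> S1 | S2
    return rules, new_start

def strings_up_to(rules, start, max_len):
    """Enumerate terminal strings of length <= max_len by breadth-first
    leftmost derivation."""
    seen, found = {(start,)}, set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in rules), None)
        if idx is None:  # no nonterminal left: a terminal string
            found.add("".join(form))
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for s in new if s not in rules)
            # Prune forms that can no longer yield a short enough string.
            if (terminals <= max_len and len(new) <= 2 * max_len + 2
                    and new not in seen):
                seen.add(new)
                queue.append(new)
    return found

g1 = {"S1": [("a", "S1", "b"), ()]}   # {a^n b^n | n >= 0}
g2 = {"S2": [("c", "S2"), ("c",)]}    # {c^n | n >= 1}
rules, start = union_grammar(g1, "S1", g2, "S2")
short_strings = strings_up_to(rules, start, 4)
```

The first assertion in union_grammar enforces the disjointness assumption; removing it and reusing a nonterminal name in both grammars is a quick way to explore the question posed above.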

Closure under Concatenation

Theorem: Let L1 and L2 be CFLs. Then L1L2 is also a CFL.

Proof: Let L1 have grammar (V1, Σ1, R1, S1) and let L2 have grammar (V2, Σ2, R2, S2). Then L1L2 has the grammar (V, Σ, R, S), where

Σ = Σ1 ∪ Σ2,

V = V1 ∪ V2 ∪ {S},

S is a new start symbol,

R = R1 ∪ R2 ∪ {S → S1S2}.

Therefore, L1L2 is a CFL.

Again, we must assume that

(V1 – Σ1) ∩ (V2 – Σ2) = ∅.

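Closure under Kleene Star

Theorem: Let L be a CFL. Then L* is also a CFL.

Proof: Let L have grammar (V, Σ, R, S). Then L* has the grammar (V', Σ, R', S'), where

S' is a new start symbol,

V' = V ∪ {S'},

R' = R ∪ {S' → e | S'S}.

A new start symbol is needed because S may occur on the right-hand side of a rule in R. For example, adding S → e | SS to S → aSb | ab would allow S ⇒ aSb ⇒ aSSb ⇒* aababb, which is not in L*.

Therefore, L* is a CFL.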

Intersection of a Regular Language and a CFL

Theorem: The intersection of a CFL and a regular language is a CFL.

Proof (outline): Use the cross product to construct the intersection of the PDA and the DFA. Only one component uses the stack, so there is no complication: the cross product functions as a PDA.

More specifically, the transition (p, a) → q from the DFA and the transition (p', a, β) → (q', γ) from the PDA may be combined into

((p, p'), a, β) → ((q, q'), γ)

for the new PDA. A PDA transition that reads no input (a = e) leaves the DFA component unchanged. The final states of the new PDA are the pairs in which both components are final.
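The combination of transitions can be sketched as follows. The data layout (a dict for the DFA, a set of tuples for the PDA, "" for e), the function name product_pda, and both example machines are assumptions for illustration only.

```python
def product_pda(dfa_delta, dfa_states, pda_transitions):
    """dfa_delta maps (p, a) -> q. pda_transitions holds tuples
    ((p2, a, beta), (q2, gamma)), with a == "" for moves that read no input."""
    result = set()
    for (p2, a, beta), (q2, gamma) in pda_transitions:
        if a == "":
            # No input is read, so the DFA component stays where it is.
            for p in dfa_states:
                result.add((((p, p2), "", beta), ((p, q2), gamma)))
        else:
            # Both machines read the same symbol a and move together.
            for (p, sym), q in dfa_delta.items():
                if sym == a:
                    result.add((((p, p2), a, beta), ((q, q2), gamma)))
    return result

# Hypothetical DFA: even number of a's over {a, b}.
dfa_delta = {("even", "a"): "odd", ("odd", "a"): "even",
             ("even", "b"): "even", ("odd", "b"): "odd"}
# Hypothetical PDA for {a^n b^n | n >= 0}: push A per a, pop A per b.
pda = {(("s", "a", ""), ("s", "A")),
       (("s", "", ""), ("f", "")),
       (("f", "b", "A"), ("f", ""))}
combined = product_pda(dfa_delta, {"even", "odd"}, pda)
```

Only the PDA component ever touches the stack, which matches the remark that the cross product causes no complication.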

Complementation and Intersection

The complement of a context-free language is not necessarily context-free.

The intersection of two context-free languages is not necessarily context-free.

Counterexamples will be given later.

The Concept behind the Pumping Lemma for CFLs

The Pumping Lemma for CFLs will allow us to show that some languages are not context-free.

If a CFL contains a word w with a sufficiently long derivation S ⇒* w, then some nonterminal A must appear more than once on a single path of the parse tree. This is the Pigeonhole Principle.

That is, we have

S ⇒* uAz ⇒* uvAyz ⇒* uvxyz.

Thus, A ⇒* vAy and A ⇒* x. We may repeat the derivation A ⇒* vAy as many times as we like (including zero times), producing strings uv^n x y^n z for any n ≥ 0.
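A small Python sketch makes the repetition concrete. It assumes the grammar S → aSb | e for {a^n b^n | n ≥ 0}, which is not from the slides, and takes A = S with S ⇒* aSb ⇒* aaSbb ⇒* aabb, so that u = a, v = a, x = e, y = b, z = b.

```python
def pump(u, v, x, y, z, n):
    """Build u v^n x y^n z."""
    return u + v * n + x + y * n + z

def in_anbn(w):
    """Membership test for {a^n b^n | n >= 0}."""
    half = len(w) // 2
    return w == "a" * half + "b" * half

# Repeating A =>* vAy zero or more times keeps the string in the language.
pumped = [pump("a", "a", "", "b", "b", n) for n in range(5)]
```

Here n = 0 drops the repeated part entirely and n = 1 gives back the original string aabb.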

The Length of a Path in a Parse Tree

In a parse tree T, define a path to be either empty or a sequence of nodes that starts at a node in the tree, ends at one of its descendants, and includes every node along the way.

The length of a path is 0 if the path is empty; otherwise it is 1 less than the number of nodes in the path.

Height and Fanout

The height of a parse tree is the length of the tree’s longest path.

Given a grammar G, the fanout of G, denoted φ(G), is the largest number of symbols on the right side of any rule in G.

A Lemma for the Lemma

Lemma: Let G be a CFG. The yield of any parse tree of G of height h has length no greater than φ(G)^h.

Proof: The longest possible string is obtained if we always use a grammar rule with the maximum number of symbols on the right-hand side. Therefore, if we apply grammar rules to each nonterminal in the string at most h times, then the length of the resulting string is at most φ(G)^h.
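The bound can be checked numerically on a small grammar. The sketch below assumes a grammar stored as a dict from nonterminals to lists of right-hand sides (tuples), with terminals being the symbols that have no rules; the function names and the example grammar S → SS | a | e are illustrative choices, not part of the slides.

```python
from functools import lru_cache

def fanout(rules):
    """Largest number of symbols on the right side of any rule."""
    return max(len(rhs) for alts in rules.values() for rhs in alts)

def max_yield(rules, symbol, height):
    """Longest yield of a parse tree rooted at symbol with height <= height,
    or None if no such tree exists."""
    @lru_cache(maxsize=None)
    def best(sym, h):
        if sym not in rules:   # a terminal is a leaf contributing one symbol
            return 1
        if h == 0:             # a nonterminal cannot be a leaf
            return None
        options = []
        for rhs in rules[sym]:
            parts = [best(s, h - 1) for s in rhs]
            if all(p is not None for p in parts):
                options.append(sum(parts))  # an e-rule contributes 0
        return max(options) if options else None
    return best(symbol, height)

rules = {"S": [("S", "S"), ("a",), ()]}
bounds = [(h, max_yield(rules, "S", h), fanout(rules) ** h) for h in range(1, 7)]
```

For this grammar the longest yield at height h is 2^(h-1), comfortably within the bound φ(G)^h = 2^h; the last level of the tree is spent turning each S into a terminal.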

The Pumping Lemma for CFLs

The Pumping Lemma for CFLs: Let G = (V, Σ, R, S) be a context-free grammar. Then any string w ∈ L(G) with length at least n = φ(G)^(|V – Σ| + 1) can be written as w = uvxyz for some strings u, v, x, y, z ∈ Σ* such that

|v| > 0 or |y| > 0,

|vxy| ≤ n, and

uv^k x y^k z ∈ L(G) for every k ≥ 0.

Proof:

Let n = φ(G)^(|V – Σ| + 1). Let w ∈ L(G) with |w| ≥ n.

Let T be a parse tree for w that uses the smallest number of leaves possible (that is, it minimizes the number of empty strings).

Let P be a path of maximum length in T.

Since |w| > φ(G)^|V – Σ| (taking φ(G) ≥ 2), the length of P is greater than |V – Σ|, i.e., the length of P is at least |V – Σ| + 1. (Lemma)

Therefore, the number of nodes on P is at least |V – Σ| + 2.

Let P' be the last part of P, consisting of exactly |V – Σ| + 2 nodes.

P' must contain exactly |V – Σ| + 1 nonterminals, since only its last node is a leaf. Therefore, at least one nonterminal must be repeated.

Let A be the first nonterminal that is repeated as we follow the path from the leaf back towards the root.

Let T' be the subtree with root at the second-to-last occurrence of A on the path P.

If we remove T' from T, except for its root A, the result is a parse tree for a string uAz.


Let T'' be the subtree whose root node is the last occurrence of A on the path P.

T'' is a parse tree for a string x.

If we remove T'' from T', except for its root A, the result is a parse tree for a string vAy.

This parse tree may be attached at the leaf A in the tree T – T' repeatedly, as many times as we like (including zero times), creating parse trees for uv^k A y^k z for any k ≥ 0.

Finally, we re-attach T'' and get a parse tree for uv^k x y^k z.


If v = e and y = e, then the part of T' outside T'' could have been eliminated, producing a smaller parse tree for w.

We assumed that T was the smallest possible parse tree for w.

Therefore, v ≠ e or y ≠ e.

The path from the second-to-last A to the last A and then to the terminal has length at most |V – Σ| + 1.

Therefore, the subtree T' represents no more than φ(G)^(|V – Σ| + 1) terminals. (Lemma)

Thus, |vxy| ≤ n.

Standard Example of a Non-CFL

The language L = {a^n b^n c^n | n ≥ 0} is not context-free.

Proof: Suppose it is. Let n be the n of the Pumping Lemma. Let w = a^n b^n c^n.

Then w = uvxyz where |v| > 0 or |y| > 0 and |vxy| ≤ n.

Since |vxy| ≤ n, vxy contains at most two different symbols. Suppose it contains only as and bs (but no cs).

Say v and y together contain i as and j bs, for some i and j with i > 0 or j > 0.

Then uv^2 x y^2 z contains n + i as and n + j bs, at least one of which is greater than n.

But uv^2 x y^2 z contains only n cs. Thus, uv^2 x y^2 z ∉ L. This is a contradiction. Therefore, this language is not context-free.

The other case, where vxy contains bs and cs, but no as, is handled similarly.
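The case analysis can be sanity-checked by brute force for one small value of n. The sketch below is only a check for a single assumed value (n = 4), not a proof; the function names are made up for illustration. It enumerates every decomposition w = uvxyz allowed by the lemma and confirms that pumping once (k = 2) always leaves the language.

```python
def in_L(w):
    """Membership test for {a^n b^n c^n | n >= 0}."""
    n = len(w) // 3
    return w == "a" * n + "b" * n + "c" * n

def decompositions(w, p):
    """All (u, v, x, y, z) with w = uvxyz, |vxy| <= p and |v| + |y| > 0."""
    for i in range(len(w) + 1):                       # u = w[:i]
        for end in range(i, min(i + p, len(w)) + 1):  # vxy = w[i:end]
            for j in range(i, end + 1):               # v = w[i:j]
                for k in range(j, end + 1):           # x = w[j:k], y = w[k:end]
                    if (j - i) + (end - k) > 0:
                        yield w[:i], w[i:j], w[j:k], w[k:end], w[end:]

p = 4  # assumed stand-in for the n of the Pumping Lemma
w = "a" * p + "b" * p + "c" * p
survivors = [d for d in decompositions(w, p)
             if in_L(d[0] + d[1] * 2 + d[2] + d[3] * 2 + d[4])]
```

An empty survivors list means no decomposition of this particular w can be pumped, which is the contradiction the proof derives in general.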

Example of a Non-CFL

The language {a^m b^n c^m d^n | m, n ≥ 0} is not context-free.

Proof: Suppose that it is context-free. Let n be the n of the Pumping Lemma. Let w = a^n b^n c^n d^n. Complete the proof using the Pumping Lemma.

Example of a Non-CFL

The language

L = {w ∈ Σ* | #as = #bs = #cs}

is not context-free.

Proof: Suppose that it is context-free. Intersect it with L(a*b*c*), which is regular. The intersection of a CFL and a regular language is a CFL, but here the intersection is {a^n b^n c^n | n ≥ 0}, which is known to be a non-CFL. Therefore, the language L is not context-free.
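The intersection can be observed directly on short strings. This sketch assumes Σ = {a, b, c} and uses Python's re module for the regular language a*b*c*; it enumerates every string up to length 6 and keeps those in both languages.

```python
import re
from itertools import product

ABC_ORDER = re.compile(r"a*b*c*")  # the regular language L(a*b*c*)

def equal_counts(w):
    """Membership test for L: equal numbers of as, bs, and cs."""
    return w.count("a") == w.count("b") == w.count("c")

def in_intersection(w):
    return equal_counts(w) and ABC_ORDER.fullmatch(w) is not None

found = {
    "".join(t)
    for length in range(7)
    for t in product("abc", repeat=length)
    if in_intersection("".join(t))
}
```

Up to length 6 the only strings that survive are those of the form a^n b^n c^n, in line with the claim in the proof.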

Nonclosure Properties

Theorem: The set of context-free languages is not closed under intersection.

Proof: Let L1 = {a^n b^n c^m | m, n ≥ 0} and let L2 = {a^m b^n c^n | m, n ≥ 0}.

Clearly, L1 and L2 are context-free: each is the concatenation of a CFL and a regular language.

However, L1 ∩ L2 = {a^n b^n c^n | n ≥ 0}, which is known to be non-context-free.

Nonclosure Properties

Theorem: The set of context-free languages is not closed under complementation.

Proof: Suppose it were closed under complementation. Let L1 and L2 be context-free languages.

Then, since CFLs are closed under union, (L1' ∪ L2')' is also context-free.

However, by DeMorgan’s Laws, this is L1 ∩ L2, which we now know is not necessarily context-free.

Example

The language

L = {w ∈ Σ* | w ≠ uu for any u ∈ Σ*}

is context-free. The language

L′ = {w ∈ Σ* | w = uu for some u ∈ Σ*}

is not context-free.
