CS 203: Introduction to Formal Languages and Automata


Page 1: CS 203: Introduction to Formal Languages and Automata

CS 203: Introduction to Formal Languages and Automata

Chapter 4

Properties of Regular Languages

These class notes are based on material from our textbook, An Introduction to Formal Languages and Automata, by Peter Linz, published by Jones and Bartlett Publishers, Inc., Sudbury, MA.

Page 2: CS 203: Introduction to Formal Languages and Automata

Properties of regular languages

• What happens when we perform operations on regular languages?
  – E.g., if we concatenate two regular languages, is the resulting language also regular?

• Can we decide whether a given language has a certain property or not?
  – E.g., can we tell if a certain language is finite or not?

• Can we tell whether a given language is regular or not?

Page 3: CS 203: Introduction to Formal Languages and Automata

Closure properties of regular languages

• Definition: A regular language is any language that is accepted by a finite automaton

• Theorem 4.1: The class of regular languages is closed under the following operations (that is, performing these operations on regular languages produces other regular languages):
  – Union
  – Concatenation
  – Kleene star
  – Complement
  – Intersection

Page 4: CS 203: Introduction to Formal Languages and Automata

Closure for union, concatenation, and Kleene star

• If L1 and L2 are regular languages, then there exist regular expressions r1 and r2 such that L1 = L(r1) and L2 = L(r2). By definition 3.1.2 in our text, r1 + r2, r1r2, and r1* are regular expressions, and:

L1 ∪ L2 = L(r1 + r2)

L1L2 = L(r1r2)

L1* = L(r1*)

Page 5: CS 203: Introduction to Formal Languages and Automata

Closure for union, concatenation, and Kleene star

Since languages represented by regular expressions are by definition regular, performing the operations of union, concatenation, and star-closure on regular languages produces regular languages.

We say that the class of regular languages is closed under union, concatenation, and Kleene star (star-closure).

Page 6: CS 203: Introduction to Formal Languages and Automata

So:

• The null language is regular

• The language consisting of the empty string, {λ}, is regular

• For each a in Σ, {a} is regular

• If L1 and L2 are regular:

L1 ∪ L2 is regular

L1L2 is regular

L1* is regular

Page 7: CS 203: Introduction to Formal Languages and Automata

Unions, Intersections, and Complements: Theorem 4.1, p. 100

Suppose that

M1 = (Q1, Σ, δ1, q1, F1) accepts language L1, and

M2 = (Q2, Σ, δ2, q2, F2) accepts language L2

Let M be an FA defined by M = (Q, Σ, δ, q0, F) where

Q = Q1 × Q2

q0 = (q1, q2)

and the transition function is defined by:

δ((p, q), a) = (δ1(p, a), δ2(q, a)),

for any p ∈ Q1, q ∈ Q2, and a ∈ Σ

Page 8: CS 203: Introduction to Formal Languages and Automata

Unions, Intersections, and Difference: Theorem 4.1, p. 100

Then:

1. If F = {(p, q) | p ∈ F1 or q ∈ F2}, M accepts the language L1 ∪ L2

2. If F = {(p, q) | p ∈ F1 and q ∈ F2}, M accepts the language L1 ∩ L2

3. If F = {(p, q) | p ∈ F1 and q ∉ F2}, M accepts the language L1 – L2
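
The product construction can be tried out directly. The following is a minimal Python sketch, assuming each machine is a complete DFA stored as a tuple (states, alphabet, transition dict, start state, final states). The names `product_dfa`, `EVEN_A`, and `ENDS_B` are illustrative choices and do not come from the textbook.

```python
from itertools import product

def run(delta, start, word):
    """Return the state reached from `start` after reading `word`."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state

def product_dfa(m1, m2, mode):
    """Build M with Q = Q1 x Q2; `mode` selects the accepting set F."""
    (q1s, sigma, d1, s1, f1), (q2s, _, d2, s2, f2) = m1, m2
    states = set(product(q1s, q2s))
    # delta((p, q), a) = (delta1(p, a), delta2(q, a))
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)])
             for (p, q) in states for a in sigma}
    keep = {
        "union": lambda p, q: p in f1 or q in f2,
        "intersection": lambda p, q: p in f1 and q in f2,
        "difference": lambda p, q: p in f1 and q not in f2,
    }[mode]
    finals = {(p, q) for (p, q) in states if keep(p, q)}
    return states, sigma, delta, (s1, s2), finals

def accepts(machine, word):
    _, _, delta, start, finals = machine
    return run(delta, start, word) in finals

# Hypothetical example machines over {a, b}:
# EVEN_A accepts strings with an even number of a's,
# ENDS_B accepts strings that end in b.
SIGMA = {"a", "b"}
EVEN_A = ({0, 1}, SIGMA,
          {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ENDS_B = ({0, 1}, SIGMA,
          {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1}, 0, {1})
```

For instance, `accepts(product_dfa(EVEN_A, ENDS_B, "intersection"), "aab")` checks whether aab has an even number of a's and also ends in b. Only the choice of F differs between the three cases; the states and transitions are shared.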

Page 9: CS 203: Introduction to Formal Languages and Automata

Theorem 4.1, p. 100

Proof:

For any x ∈ Σ* and any (p, q) ∈ Q:

δ*((p, q), x) = (δ1*(p, x), δ2*(q, x))

A string x is accepted by M iff

δ*((q1, q2), x) ∈ F

By our formula, this is true if and only if

(δ1*(q1, x), δ2*(q2, x)) ∈ F

Page 10: CS 203: Introduction to Formal Languages and Automata

Theorem 4.1, p. 100

Proof (continued):

For Case 1, this is equivalent to saying that:

δ1*(q1, x) ∈ F1 or δ2*(q2, x) ∈ F2

which is equivalent to

x ∈ L1 ∪ L2

Cases 2 and 3 are similar

Page 11: CS 203: Introduction to Formal Languages and Automata

Complement

Consider the special case in which L1 is all of Σ*. Here, L1 – L2 is actually L2’ (the complement of L2), so closure under complement follows from case 3 of Theorem 4.1.

Page 12: CS 203: Introduction to Formal Languages and Automata

Reversal

Theorem 4.2: The family of regular languages is closed under reversal.

Proof: If L is a regular language, construct an NFA with a single final state that accepts it. Now change the initial vertex into a final vertex, the final vertex into the initial vertex, and reverse the direction on all the edges. The modified NFA accepts w^R if and only if the original NFA accepts w.
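
The reversal construction is short enough to sketch in Python. This sketch assumes an NFA without λ-moves, stored as a set of (source, symbol, destination) edges with one start state and one final state. The function names and the example machine are hypothetical.

```python
def reverse_nfa(edges, start, final):
    """Flip every edge and swap the roles of the start and final states."""
    flipped = {(dst, sym, src) for (src, sym, dst) in edges}
    return flipped, final, start

def nfa_accepts(edges, start, final, word):
    """Simulate the NFA by tracking the set of states it could be in."""
    current = {start}
    for symbol in word:
        current = {dst for (src, sym, dst) in edges
                   if src in current and sym == symbol}
    return final in current

# Hypothetical NFA with a single final state; it accepts a b* c.
EDGES = {(0, "a", 1), (1, "b", 1), (1, "c", 2)}
START, FINAL = 0, 2

# The reversed machine should accept c b* a.
REVERSED = reverse_nfa(EDGES, START, FINAL)
```

Calling `nfa_accepts(*REVERSED, "cbba")` runs the reversed machine on the reversal of abbc.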

Page 13: CS 203: Introduction to Formal Languages and Automata

Homomorphism

Definition 4.1: A homomorphism is a substitution in which a single letter is replaced with a string. Formally, if Σ and Γ are alphabets, then a function

h : Σ → Γ*

is a homomorphism. It extends to strings letter by letter: if w = a1 a2 … an, then h(w) = h(a1) h(a2) … h(an).

If L is a language on Σ, then its homomorphic image is:

h(L) = {h(w) : w ∈ L}
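
As a small illustration, a homomorphism on a finite language can be computed letter by letter. The mapping `H` below (a to ab, b to bbc) and the function names are illustrative assumptions, chosen only to show the definition at work.

```python
def apply_homomorphism(h, word):
    """Replace each letter of `word` by its image string under h."""
    return "".join(h[symbol] for symbol in word)

def homomorphic_image(h, language):
    """Compute h(L) = {h(w) : w in L} for a finite language L."""
    return {apply_homomorphism(h, w) for w in language}

# Illustrative homomorphism from {a, b} to strings over {a, b, c}.
H = {"a": "ab", "b": "bbc"}
IMAGE = homomorphic_image(H, {"aa", "aba"})
```

Here `IMAGE` is the homomorphic image of the finite language {aa, aba}. For an infinite regular language, the same substitution would be applied to its regular expression instead.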

Page 14: CS 203: Introduction to Formal Languages and Automata

Homomorphism

Theorem 4.3: If L is a regular language, then its homomorphic image h(L) is also regular.

Thus the family of regular languages is closed under homomorphism.

Page 15: CS 203: Introduction to Formal Languages and Automata

Right quotient

To form the right quotient of L1 with L2, L1/L2, take all strings in L1 that have a suffix belonging to L2 and remove the suffix.

Example:

L1 = {ab, aab, aaab, aaaab}

L2 = {b}

L1/L2 = {a, aa, aaa, aaaa}
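
For finite languages, the right quotient can be computed by brute force. This is only a sketch under that finiteness assumption (the proof of closure for arbitrary regular languages works on automata instead), and `right_quotient` is a name chosen here.

```python
def right_quotient(l1, l2):
    """Strip every suffix from L2 off every string of L1 that ends with it."""
    return {w[:len(w) - len(s)] for w in l1 for s in l2 if w.endswith(s)}

L1 = {"ab", "aab", "aaab", "aaaab"}
L2 = {"b"}
QUOTIENT = right_quotient(L1, L2)
```

With the example sets above, `QUOTIENT` reproduces the set L1/L2 listed on the slide.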

Page 16: CS 203: Introduction to Formal Languages and Automata

Right quotient

Theorem 4.4: If L1 and L2 are regular languages, then L1/L2 is also regular. Thus the family of regular languages is closed under right quotient with another regular language.

Proof: By construction – see textbook, pp. 106-107.

Page 17: CS 203: Introduction to Formal Languages and Automata

The membership question

Given a language L and a string w, is w ∈ L?

A method for answering the membership question is called a membership algorithm. Is there a membership algorithm for regular languages?

Page 18: CS 203: Introduction to Formal Languages and Automata

The membership question

Theorem 4.5: Given a standard representation (i.e., a finite automaton, a regular expression, or a regular grammar) of any regular language L on Σ and any w ∈ Σ*, there exists an algorithm for determining whether w is in L.

Proof: Here is the algorithm:

1. If the standard representation of L is in the form of a regular expression, or a regular grammar, construct an equivalent FA.

2. Test w to see if it is accepted by the FA.
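
Step 2 of the algorithm amounts to running the automaton on w. A minimal sketch, assuming a complete DFA stored as a transition dictionary; the example machine (strings over {a, b} ending in ab) is hypothetical, and step 1 (converting a regular expression or grammar to an FA) is not shown.

```python
def dfa_accepts(delta, start, finals, word):
    """Follow one transition per input symbol, then check the final state."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals

# Hypothetical DFA accepting strings over {a, b} that end in "ab".
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
START, FINALS = 0, {2}
```

The test takes one table lookup per symbol of w, so it always terminates, which is what makes it a membership algorithm.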

Page 19: CS 203: Introduction to Formal Languages and Automata

The finiteness question

Theorem 4.6: Given a standard representation (i.e., a finite automaton, a regular expression, or a regular grammar) of any regular language L on Σ, there exists an algorithm for determining whether L is empty, finite, or infinite.

Page 20: CS 203: Introduction to Formal Languages and Automata

The finiteness question

Proof: Here is the algorithm:

1. If the standard representation of L is in the form of a regular expression, or a regular grammar, construct an equivalent FA.

2. If there is a simple path from the initial vertex to any final vertex, then the language is not empty.

3. Find all the vertices that are the base of a cycle. If any of these vertices is on a path from the initial to a final vertex, the language is infinite; otherwise, it is finite.
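
Steps 2 and 3 can be sketched as graph searches on the transition diagram. The version below is one possible reading, assuming a DFA given as a transition dictionary: it keeps only the states that lie on some path from the initial state to a final state and then looks for a cycle among them. The function names and example machines are hypothetical.

```python
def reachable(graph, sources):
    """Return every node reachable from `sources` (sources included)."""
    seen, stack = set(sources), list(sources)
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def classify(delta, start, finals):
    """Return 'empty', 'finite', or 'infinite' for the language of a DFA."""
    forward, backward = {}, {}
    for (src, _symbol), dst in delta.items():
        forward.setdefault(src, set()).add(dst)
        backward.setdefault(dst, set()).add(src)
    # States that lie on some path from the start state to a final state.
    useful = reachable(forward, [start]) & reachable(backward, finals)
    if not useful:
        return "empty"
    sub = {s: forward.get(s, set()) & useful for s in useful}
    # A useful state that can return to itself is the base of a cycle.
    for s in useful:
        if s in reachable(sub, sub[s]):
            return "infinite"
    return "finite"

# Hypothetical machines: strings ending in "ab", and the language {a}.
ENDS_AB = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1,
           (1, "b"): 2, (2, "a"): 1, (2, "b"): 0}
ONLY_A = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 2,
          (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
```

Note that the trap state 2 in `ONLY_A` has a loop but cannot reach a final state, so it does not make the language infinite.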

Page 21: CS 203: Introduction to Formal Languages and Automata

The “does L1 = L2” question

Theorem 4.7: Given standard representations of two regular languages L1 and L2, there exists an algorithm for determining whether or not L1 = L2.

Page 22: CS 203: Introduction to Formal Languages and Automata

The “does L1 = L2” question

Proof: Here is the algorithm:

1. Define a new language L3 = (L1 ∩ ~L2) ∪ (~L1 ∩ L2), where ~L denotes the complement of L.

2. L3 is regular (see previous closure proofs).

3. Therefore, we can find a DFA that accepts L3.

4. Use Theorem 4.6 to decide if L3 is empty.

5. L3 = ∅ iff L1 = L2 (exercise 8 in section 1.1 in the Linz textbook).

6. So L1 = L2 if L3 = ∅; otherwise, L1 ≠ L2.
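
The steps above can be sketched compactly for two complete DFAs over the same alphabet. Instead of building L3 explicitly, this sketch explores the reachable pairs of the product machine and reports a difference as soon as it reaches a pair in which exactly one component is final, which is the same as finding a string in L3. The tuple layout (transition dict, start, finals) and all names are assumptions made for the example.

```python
def equivalent(m1, m2, sigma):
    """Decide L(m1) == L(m2) by searching the product machine for a
    reachable pair that is accepting in exactly one component."""
    (d1, s1, f1), (d2, s2, f2) = m1, m2
    seen, stack = {(s1, s2)}, [(s1, s2)]
    while stack:
        p, q = stack.pop()
        if (p in f1) != (q in f2):
            return False  # some string is in L3, so L1 != L2
        for a in sigma:
            nxt = (d1[(p, a)], d2[(q, a)])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True  # L3 is empty

# Hypothetical machines over {a}: an even number of a's, counted
# mod 2 and mod 4, and "number of a's divisible by 4".
EVEN_MOD2 = ({(0, "a"): 1, (1, "a"): 0}, 0, {0})
EVEN_MOD4 = ({(i, "a"): (i + 1) % 4 for i in range(4)}, 0, {0, 2})
DIV_BY_4 = ({(i, "a"): (i + 1) % 4 for i in range(4)}, 0, {0})
```

The search visits at most |Q1| × |Q2| pairs, so it always terminates.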

Page 23: CS 203: Introduction to Formal Languages and Automata

The pigeonhole principle

• The “pigeonhole principle” states that if n + 1 items are placed into n pigeonholes, then at least 1 pigeonhole must end up with more than 1 item in it.

• In set notation: if f : A → B, |A| = n + 1, and |B| = n, then f cannot be one-to-one

Page 24: CS 203: Introduction to Formal Languages and Automata

Not all formal languages are regular

An automaton that accepts the language L = {a^k b^k | k ≥ 0} must count the number of a’s in each string to make sure there is an identical number of b’s. There is no limit on how high the automaton might need to count to accept a string in this language. But an automaton with finite memory can only count as high as the size of its memory.

This is an intuitive argument why this language is not regular. It is not a proof, however. To prove that a language is not regular, we use a mathematical result called the “pumping lemma for regular languages.”

Page 25: CS 203: Introduction to Formal Languages and Automata

4.3: The Pumping Lemma

• The Pumping Lemma is used to prove that a language is not regular

• How do we prove that a language L is regular?
  – Write a regular expression for it
  – Draw a Finite Automaton for it
  – Construct a regular grammar for it

Page 26: CS 203: Introduction to Formal Languages and Automata

Pumping Lemma

Theorem 4.8: Let L be a regular language. There exists a positive integer m such that any string w ∈ L with |w| ≥ m may be written as w = xyz, for some x, y, and z satisfying the following:

|xy| ≤ m,

|y| ≥ 1, and

xy^i z ∈ L for every i ≥ 0

Page 27: CS 203: Introduction to Formal Languages and Automata

Pumping Lemma

In other words, every sufficiently long string in L can be broken down into three parts in such a way that an arbitrary number of repetitions of the middle part yields another string in L. We say that the middle string is “pumped,” hence the term pumping lemma.

Page 28: CS 203: Introduction to Formal Languages and Automata

Based on the idea of loops

Given:

M = (Q, Σ, δ, q0, A), where |Q| = n, and

any string x where |x| ≥ n, then x must pass through a sequence of at least n + 1 states.

Suppose x = a1 a2 a3 ... an y. Then the sequence of n + 1 states

q0 = δ*(q0, λ)

q1 = δ*(q0, a1)

q2 = δ*(q0, a1 a2)

...

qn = δ*(q0, a1 ... an)

must contain some state at least twice, by the pigeonhole principle.

Page 29: CS 203: Introduction to Formal Languages and Automata

Example

x = a, so |x| = 1

Sequence of states = q0 q1

n = number of different states passed through = 2

[Figure: transition diagram with states q0 and q1 and edges labeled a and b]

Page 30: CS 203: Introduction to Formal Languages and Automata

Example

[Figure: transition diagram with states q0 and q1 and edges labeled a and b]

x = bba, so |x| = 3

Sequence of states = q0 q0 q0 q1

n = 2

Any string x where |x| ≥ n must have repeated a state

Page 31: CS 203: Introduction to Formal Languages and Automata

Pumping

• If a state is repeated one or more times, it means that there must be a loop in the transition diagram.

• If there is a loop, then it can be “pumped” to produce additional strings that belong to the language

Page 32: CS 203: Introduction to Formal Languages and Automata

Example

• If ba is in the language, and there are only 2 states in the automaton, then a, bba, bbba, bbbba, etc. are also in the language.

[Figure: transition diagram with states q0 and q1 and edges labeled a and b]

Page 33: CS 203: Introduction to Formal Languages and Automata

Example of a nonregular language

L = {0^i 1^i | i ≥ 0}

Is this regular? No. Why not? Intuitively: We can’t build a finite automaton to recognize it.

Why not?

Page 34: CS 203: Introduction to Formal Languages and Automata

Example of a nonregular language

L = {0^i 1^i | i ≥ 0}

Because the FA has no memory for past events except its states. Each state can tell you how you got to that state from the immediately previous state (i.e., the last character you processed), but, if there is a loop, it can’t remember the number of characters you processed up to that point.

Page 35: CS 203: Introduction to Formal Languages and Automata

Limits of an FA

Being in state q1 and having just read a 1 doesn’t tell you anything about how many 1’s have already been processed. The FA simply doesn’t have the memory needed to retain this information.

[Figure: transition diagram with states q0 and q1 and edges labeled 0 and 1]

Page 36: CS 203: Introduction to Formal Languages and Automata

Limits of an FA

Moreover, if you have a loop like this in an FA, the FA must accept any number of 1’s in the loop. There is no way to specify “exactly as many 1’s as 0’s” – this FA can accept 000111, but must also accept 0111, 00001, etc.

[Figure: transition diagram with states q0 and q1 and edges labeled 0 and 1]

Page 37: CS 203: Introduction to Formal Languages and Automata

Limits of an FA

Consequently, we can’t build an FA that can tell whether the number of 0’s that it saw at the beginning of the string exactly matches the number of 1’s at the end of the string.

But this is not a formal proof.

Page 38: CS 203: Introduction to Formal Languages and Automata

Proof idea

If a DFA has n states, then any path of length n must visit n + 1 states, and contains a cycle. (This is an application of the “pigeonhole principle.”)

[Figure: path through the DFA with segments labeled x, y, and z]

The part of the string read around the cycle (y) can be “pumped” to produce other strings in the language.

Page 39: CS 203: Introduction to Formal Languages and Automata

Proof idea again

• If an infinite language is regular, it is accepted by a DFA.

• The DFA has some finite number of states, say, m.

• Because the language is infinite, some strings must have length > m.

• For a string of length > m accepted by the DFA, a “walk” through the DFA must contain a cycle.

• Repeating the cycle an arbitrary number of times must yield another string accepted by the DFA.

Page 40: CS 203: Introduction to Formal Languages and Automata

Proof

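Suppose that q_i = q_{i+p}, where 0 ≤ i < i + p ≤ n

x = uvw, where

u = a_1 a_2 … a_i

v = a_{i+1} a_{i+2} … a_{i+p}

w = a_{i+p+1} a_{i+p+2} … a_n y

y = the part of x beyond its first n symbols

Remember that q_i = q_{i+p}, so reading v takes the machine from q_i back to q_i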

Page 41: CS 203: Introduction to Formal Languages and Automata

Proof

Assume L is accepted by a DFA with states labeled q0, q1, …, qn.

Now take a string w in L with |w| ≥ m = n + 1. To process w, the machine goes through a sequence of states, say,

q0, qi, qj, …, qf.

Since this sequence has exactly |w| + 1 entries, at least one state must be repeated, and this repetition starts no later than the nth move.

So the sequence of states must look like

q0, qi, qj, …, qr, …, qr, …, qf

indicating there must be substrings x, y, z of w such that

δ*(q0, x) = qr

δ*(qr, y) = qr

δ*(qr, z) = qf

with |xy| ≤ n + 1 = m and |y| ≥ 1

Page 42: CS 203: Introduction to Formal Languages and Automata

Proof (cont.)

From this it immediately follows that

δ*(q0, xz) = qf

as well as

δ*(q0, xy^2z) = qf,

δ*(q0, xy^3z) = qf,

and so on, completing the proof of the theorem
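
The proof is constructive, so the decomposition can be computed for a concrete machine. The sketch below assumes a complete DFA stored as a transition dictionary. It records when each state is first visited and cuts the string at the first repeated state. The names `pumpable_split` and `accepts` and the example DFA (strings ending in ab) are hypothetical.

```python
def accepts(delta, start, finals, word):
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals

def pumpable_split(delta, start, word):
    """Return (x, y, z) with word == x + y + z, where y labels the first
    loop in the run of the DFA, or None if no state repeats."""
    state = start
    first_seen = {start: 0}  # state -> number of symbols read on first visit
    for pos, symbol in enumerate(word, start=1):
        state = delta[(state, symbol)]
        if state in first_seen:
            i = first_seen[state]
            return word[:i], word[i:pos], word[pos:]
        first_seen[state] = pos
    return None

# Hypothetical 3-state DFA accepting strings over {a, b} that end in "ab".
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1,
         (1, "b"): 2, (2, "a"): 1, (2, "b"): 0}
START, FINALS = 0, {2}

x, y, z = pumpable_split(DELTA, START, "aab")
pumped = [x + y * i + z for i in range(5)]
```

Every string in `pumped` ends in the same state as the original string, so each one is accepted whenever aab is.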

Page 43: CS 203: Introduction to Formal Languages and Automata

How to use the pumping lemma

The Pumping Lemma describes a property that is possessed by every regular language. If we show that a language does not possess this property, we know that it is not regular.

The strategy is proof by contradiction. We assume that the language is regular, so that it has the property described by the pumping lemma, and then we show that this leads to a contradiction. It follows that the language is not regular.

Page 44: CS 203: Introduction to Formal Languages and Automata

Example

Example 4.7: The language L = {a^n b^n | n ≥ 0} is not regular.

The proof is by contradiction:

• If L is regular, it must be accepted by some DFA.

• Let m be the number of states of the DFA and consider some w ∈ L such that |w| ≥ m.

• By the pumping lemma, we can split w into three pieces, w = xyz, with |xy| ≤ m and |y| ≥ 1, such that for any i ≥ 0, the string xy^i z is in L.

• So let w = a^m b^m.

• Because |xy| ≤ m, y must consist entirely of a’s.

• But then xy^2z will contain more a’s than b’s, so it is not in L.

• This is a contradiction.
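
The key step of this argument, that no allowed split of a^m b^m survives pumping, can be checked by brute force for small m. This is a sanity check under the assumption that m is small, not a replacement for the proof, and the function names are chosen for illustration.

```python
def in_language(w):
    """Membership test for L = {a^n b^n | n >= 0}."""
    half = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * half + "b" * half

def splits(w, m):
    """Yield every split w = xyz with |xy| <= m and |y| >= 1."""
    for i in range(m):                 # i = |x|
        for j in range(i + 1, m + 1):  # j = |xy|
            yield w[:i], w[i:j], w[j:]

def every_split_fails(m):
    """True if pumping y twice leaves L for every allowed split of a^m b^m."""
    w = "a" * m + "b" * m
    return all(not in_language(x + y * 2 + z) for x, y, z in splits(w, m))

CHECKED = all(every_split_fails(m) for m in range(1, 8))
```

Because every allowed y lies inside the leading block of a's, pumping it always adds a's without adding b's, which is what the exhaustive check confirms for each m tried.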

Page 45: CS 203: Introduction to Formal Languages and Automata

Homework

Use the pumping lemma to show that the language of palindromes L = {w | w = w^R, w ∈ {a,b}*} is not regular.

Page 46: CS 203: Introduction to Formal Languages and Automata

Homework

Use the pumping lemma (plus some closure properties of regular languages) to show that the language L = {w ∈ {a,b}* | w contains equal numbers of a’s and b’s} is not regular.

Page 47: CS 203: Introduction to Formal Languages and Automata

Homework

Use the pumping lemma to show that the language L = {ww | w ∈ {a,b}*} is not regular.