chettinadtech.ac.inchettinadtech.ac.in/storage/14-07-14/14-07-14-13-37-31... · Web viewTo learn...


UNIT – 2

REGULAR EXPRESSIONS AND LANGUAGES

Objectives:

The objectives of this course are as follows:

• To learn about regular languages.

• To learn about the pumping lemma for regular languages.

• To learn about closure properties of regular languages.

• To learn about decision properties of regular languages.

• To learn about equivalence and minimization of finite automata.

Regular Languages:

Operations on Languages

• Let L, L1, L2 be subsets of Σ*

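• Concatenation: $L_1L_2 = \{xy \mid x \in L_1 \text{ and } y \in L_2\}$

• Concatenating a language with itself: $L^0 = \{\varepsilon\}$ and $L^i = LL^{i-1}$ for all $i \ge 1$

• Kleene closure: $L^* = \bigcup_{i \ge 0} L^i = L^0 \cup L^1 \cup L^2 \cup \dots$

• Positive closure: $L^+ = \bigcup_{i \ge 1} L^i = L^1 \cup L^2 \cup \dots$

• Question: Does $L^+$ contain ε?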

Kleene closure

Say, $L = L^1 = \{a, abc, ba\}$ over Σ = {a, b, c}.

Then $L^2 = LL = \{aa, aabc, aba, abca, abcabc, abcba, baa, baabc, baba\}$

$L^3 = \{a, abc, ba\} \cdot L^2$

$L^* = \{\varepsilon\} \cup L^1 \cup L^2 \cup L^3 \cup \dots$
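The operations above can be tried out on small finite languages. A minimal Python sketch, assuming a language is given as a finite set of strings with the empty string "" standing for ε; the helper names `concat`, `power` and `closure_upto` are illustrative, and because $L^*$ is infinite the closure is only collected up to a length bound.

```python
def concat(l1, l2):
    """L1L2 = {xy | x in L1 and y in L2}."""
    return {x + y for x in l1 for y in l2}


def power(lang, i):
    """L^0 = {ε} and L^i = L L^(i-1) for i >= 1 ("" plays the role of ε)."""
    result = {""}
    for _ in range(i):
        result = concat(lang, result)
    return result


def closure_upto(lang, max_len, positive=False):
    """Strings of L* (or of L+ when positive=True) whose length is at most max_len."""
    result = set() if positive else {""}  # L^0 = {ε} belongs to L* only
    frontier = {""}                       # strings still to be extended
    while frontier:
        # prepend one more word of L; the length bound keeps every set finite
        step = {x + y for x in lang for y in frontier if len(x + y) <= max_len}
        frontier = step - result          # extend only strings not seen before
        result |= step
    return result


L = {"a", "abc", "ba"}
print(sorted(power(L, 2)))        # the nine strings of L^2 listed above
print(sorted(closure_upto(L, 3)))
```

The same helpers answer the question above for concrete cases: ε is in $L^+$ exactly when ε is already in L, because a concatenation of one or more words is empty only if every word in it is empty.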

Regular Expressions

• Highlights:

– A regular expression is used to specify a language, and it does so precisely.

– Regular expressions are very intuitive.

– Regular expressions are very useful in a variety of contexts.

– Given a regular expression, an NFA-ε can be constructed from it automatically.

– Thus, so can an NFA, a DFA, and a corresponding program, all automatically!

Definition of a Regular Expression

• Let Σ be an alphabet. The regular expressions over Σ are:

– Ø represents the empty set { }

– ε represents the set {ε}

– a represents the set {a}, for any symbol a in Σ

• Let r and s be regular expressions that represent the sets R and S, respectively. Then:

– r + s represents the set R ∪ S (precedence 3, the lowest)

– rs represents the set RS (precedence 2)

– r* represents the set R* (highest precedence)

– (r) represents the set R (not an operator; it provides grouping)

• If r is a regular expression, then L(r) is used to denote the corresponding language.
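The inductive definition translates directly into a recursive data type, one class per case. A minimal Python sketch (the class and function names are illustrative); since L(r) may be infinite, `lang(e, k)` returns only the strings of L(r) of length at most k.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class EmptySet:        # Ø represents { }
    pass


@dataclass(frozen=True)
class Epsilon:         # ε represents {ε}
    pass


@dataclass(frozen=True)
class Symbol:          # a represents {a}
    a: str


@dataclass(frozen=True)
class Plus:            # r+s represents R ∪ S
    r: object
    s: object


@dataclass(frozen=True)
class Concat:          # rs represents RS
    r: object
    s: object


@dataclass(frozen=True)
class Star:            # r* represents R*
    r: object


def lang(e, k):
    """The strings of L(e) whose length is at most k."""
    if isinstance(e, EmptySet):
        return set()
    if isinstance(e, Epsilon):
        return {""}
    if isinstance(e, Symbol):
        return {e.a} if k >= 1 else set()
    if isinstance(e, Plus):
        return lang(e.r, k) | lang(e.s, k)
    if isinstance(e, Concat):
        return {x + y for x in lang(e.r, k) for y in lang(e.s, k) if len(x + y) <= k}
    if isinstance(e, Star):
        words = lang(e.r, k) - {""}          # ε adds nothing new to R*
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in words if len(x + y) <= k} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {e!r}")


def strings_upto(alphabet, k):
    """All strings over the alphabet of length 0..k (handy for comparisons)."""
    return {"".join(p) for n in range(k + 1) for p in product(alphabet, repeat=n)}
```

For example, `lang(Concat(Symbol("0"), Star(Plus(Symbol("0"), Symbol("1")))), 2)` gives the strings of 0(0 + 1)* of length at most 2.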

• Examples: Let Σ = {0, 1}.

(0 + 1)* All strings of 0’s and 1’s

0(0 + 1)* All strings of 0’s and 1’s, beginning with a 0

(0 + 1)*1 All strings of 0’s and 1’s, ending with a 1

(0 + 1)*0(0 + 1)* All strings of 0’s and 1’s containing at least one 0

(0 + 1)*0(0 + 1)*0(0 + 1)* All strings of 0’s and 1’s containing at least two 0’s

(0 + 1)*01*01* All strings of 0’s and 1’s containing at least two 0’s

(1 + 01*0)* All strings of 0’s and 1’s containing an even number of 0’s

1*(01*01*)* All strings of 0’s and 1’s containing an even number of 0’s

(1*01*0)*1* All strings of 0’s and 1’s containing an even number of 0’s
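The descriptions above can be spot-checked mechanically. A small Python sketch using the standard `re` module, in which union is written `|` instead of `+`; agreement on every string up to a fixed length is evidence for each claim, not a proof.

```python
import itertools
import re


def strings_upto(alphabet, k):
    """Every string over the alphabet of length 0..k."""
    for n in range(k + 1):
        for p in itertools.product(alphabet, repeat=n):
            yield "".join(p)


# Each example in Python re syntax, paired with the property claimed in the text.
EXAMPLES = [
    ("(0|1)*", lambda s: True),
    ("0(0|1)*", lambda s: s.startswith("0")),
    ("(0|1)*1", lambda s: s.endswith("1")),
    ("(0|1)*0(0|1)*", lambda s: s.count("0") >= 1),
    ("(0|1)*0(0|1)*0(0|1)*", lambda s: s.count("0") >= 2),
    ("(0|1)*01*01*", lambda s: s.count("0") >= 2),
    ("(1|01*0)*", lambda s: s.count("0") % 2 == 0),
    ("1*(01*01*)*", lambda s: s.count("0") % 2 == 0),
    ("(1*01*0)*1*", lambda s: s.count("0") % 2 == 0),
]


def check(max_len=8):
    """Return (pattern, string) pairs where a regex disagrees with its description."""
    bad = []
    for pattern, claim in EXAMPLES:
        for s in strings_upto("01", max_len):
            if (re.fullmatch(pattern, s) is not None) != claim(s):
                bad.append((pattern, s))
                break
    return bad


print(check())   # [] when every example matches its description up to length 8
```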

Equivalence of Regular Expressions and NFA-εs

• Note: Throughout the following, keep in mind that a string is accepted by an NFA-ε if there exists a path labeled by that string from the start state to a final state.

• Lemma 1: Let r be a regular expression. Then there exists an NFA-ε M such that L(M) = L(r). Furthermore, M has exactly one final state with no transitions out of it.

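• Proof: (by induction on the number of operators, denoted by OP(r), in r).

Basis: OP(r) = 0. Then r is Ø, ε, or a for some a in Σ. In each case M consists of a start state and a separate final state, with no transition between them for Ø, an ε-transition for ε, and a transition on a for a. Each such M has exactly one final state with no transitions out of it.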


Inductive Hypothesis: Suppose there exists a k ≥ 0 such that for any regular expression r where 0 ≤ OP(r) ≤ k, there exists an NFA-ε M such that L(M) = L(r). Furthermore, suppose that M has exactly one final state.

Inductive Step: Let r be a regular expression with k + 1 operators (OP(r) = k + 1), where k + 1 ≥ 1.

Case 1) r = r1 + r2

Since OP(r) = k + 1, it follows that 0 ≤ OP(r1), OP(r2) ≤ k. By the inductive hypothesis there exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and L(M2) = L(r2). Furthermore, both M1 and M2 have exactly one final state.

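Construct M as: add a new start state with ε-transitions to the start states of M1 and M2, and add ε-transitions from the final states of M1 and M2 to a single new final state.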

Case 2) r = r1r2

Since OP(r) = k + 1, it follows that 0 ≤ OP(r1), OP(r2) ≤ k. By the inductive hypothesis there exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and L(M2) = L(r2). Furthermore, both M1 and M2 have exactly one final state.

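Construct M as: use the start state of M1 as the start state of M, add an ε-transition from the final state of M1 to the start state of M2, and use the final state of M2 as the only final state of M.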

Case 3) r = r1*

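Since OP(r) = k + 1, it follows that 0 ≤ OP(r1) ≤ k. By the inductive hypothesis there exists an NFA-ε machine M1 such that L(M1) = L(r1). Furthermore, M1 has exactly one final state.

Construct M as: add a new start state s and a new final state f, with ε-transitions from s to the start state of M1, from s to f, from the final state of M1 back to the start state of M1, and from the final state of M1 to f.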


Example: r = 0(0+1)*

r = r1r2, where r1 = 0 and r2 = (0+1)*

r2 = r3*, where r3 = 0+1

r3 = r4 + r5, where r4 = 0 and r5 = 1
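The construction in the proof can be sketched in Python and applied to this example. This is a variant of Thompson's construction, assuming regular expressions are given as nested tuples such as `("concat", r1, r2)` (a hypothetical encoding); each case creates a fresh start state and a single final state with no transitions out of it, as Lemma 1 requires, although the exact state layout may differ from the figures of these notes. The example builds M for r = 0(0+1)* bottom-up from r4, r5, r3, r2 and r1, then simulates the NFA-ε on a few strings.

```python
from itertools import count


def build(r, trans, fresh):
    """Add an NFA-ε for r to `trans` and return its (start, final) states.
    trans holds triples (p, symbol, q); symbol None marks an ε-transition."""
    s, f = next(fresh), next(fresh)      # new start state and single final state
    kind = r[0]
    if kind == "empty":                  # Ø: no path from s to f
        pass
    elif kind == "eps":                  # ε
        trans.append((s, None, f))
    elif kind == "sym":                  # a single symbol a
        trans.append((s, r[1], f))
    elif kind == "union":                # r1 + r2
        for sub in r[1:]:
            s1, f1 = build(sub, trans, fresh)
            trans += [(s, None, s1), (f1, None, f)]
    elif kind == "concat":               # r1 r2
        s1, f1 = build(r[1], trans, fresh)
        s2, f2 = build(r[2], trans, fresh)
        trans += [(s, None, s1), (f1, None, s2), (f2, None, f)]
    elif kind == "star":                 # r1*
        s1, f1 = build(r[1], trans, fresh)
        trans += [(s, None, s1), (s, None, f), (f1, None, s1), (f1, None, f)]
    else:
        raise ValueError(f"unknown operator {kind!r}")
    return s, f


def eps_closure(states, trans):
    """All states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for (a, sym, b) in trans:
            if a == p and sym is None and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def accepts(r, w):
    """Build M for r and check whether some path spells w from start to final."""
    trans = []
    start, final = build(r, trans, count())
    current = eps_closure({start}, trans)
    for c in w:
        step = {b for (a, sym, b) in trans if a in current and sym == c}
        current = eps_closure(step, trans)
    return final in current


# r = 0(0+1)*, decomposed as in the example
r4, r5 = ("sym", "0"), ("sym", "1")
r3 = ("union", r4, r5)
r2 = ("star", r3)
r1 = ("sym", "0")
r = ("concat", r1, r2)
print([w for w in ["", "0", "01", "0110", "1", "10"] if accepts(r, w)])
# ['0', '01', '0110']
```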


Definitions Required for Converting a DFA to a Regular Expression

• Let M = (Q, Σ, δ, q1, F) be a DFA with state set Q = {q1, q2, …, qn}, and define:

$R_{i,j} = \{x \mid x \in \Sigma^* \text{ and } \delta(q_i, x) = q_j\}$

$R_{i,j}$ is the set of all strings that define a path in M from $q_i$ to $q_j$.

• Note that states have been numbered starting at 1, not 0!

Example:

$R_{2,3} = \{0, 001, 00101, 011, \dots\}$

$R_{1,4} = \{01, 00101, \dots\}$

$R_{3,3} = \{11, 100, \dots\}$

• Another definition:

$R^k_{i,j} = \{x \mid x \in \Sigma^*,\ \delta(q_i, x) = q_j, \text{ and there is no } u \text{ with } 1 \le |u| < |x|,\ x = uv, \text{ and } \delta(q_i, u) = q_p \text{ for some } p > k\}$

• In words: $R^k_{i,j}$ is the set of all strings that define a path in M from $q_i$ to $q_j$ that passes through no intermediate state numbered greater than k.

• Note that it may be true that i > k or j > k; only the intermediate states may not be numbered greater than k.

$R^4_{2,3} = \{0, 1000, 011, \dots\}$, $R^1_{2,3} = \{0\}$

111 is not in $R^4_{2,3}$; 111 is not in $R^1_{2,3}$

101 is not in $R^1_{2,3}$

$R^5_{2,3} = R_{2,3}$

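• Observations:

1) $R^n_{i,j} = R_{i,j}$

2) $R^{k-1}_{i,j}$ is a subset of $R^k_{i,j}$

3) $L(M) = \bigcup_{q_j \in F} R^n_{1,j} = \bigcup_{q_j \in F} R_{1,j}$

4) $R^0_{i,j}$ is easily computed from the DFA!

5) $R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})^* R^{k-1}_{k,j} \cup R^{k-1}_{i,j}$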

• Notes on 5): $R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})^* R^{k-1}_{k,j} \cup R^{k-1}_{i,j}$

• Consider paths represented by the strings in $R^k_{i,j}$:

• If x is a string in $R^k_{i,j}$, then no intermediate state numbered greater than k is passed through when processing x, and either:

– $q_k$ is not passed through, i.e., x is in $R^{k-1}_{i,j}$, or

– $q_k$ is passed through one or more times, i.e., x is in $R^{k-1}_{i,k} (R^{k-1}_{k,k})^* R^{k-1}_{k,j}$

• Lemma 2: Let M = (Q, Σ, δ, q1, F) be a DFA. Then there exists a regular expression r such that L(M) = L(r).

• Proof: First we will show (by induction on k) that for all i, j, and k, where 1 ≤ i, j ≤ n and 0 ≤ k ≤ n, there exists a regular expression r such that $L(r) = R^k_{i,j}$.

Basis: k=0

$R^0_{i,j}$ contains single symbols, one for each transition from $q_i$ to $q_j$, and possibly ε if i = j.

case 1) No transitions from $q_i$ to $q_j$ and i ≠ j

$r^0_{i,j} = \emptyset$

case 2) At least one (m ≥ 1) transition from $q_i$ to $q_j$ and i ≠ j

$r^0_{i,j} = a_1 + a_2 + a_3 + \dots + a_m$ where $\delta(q_i, a_p) = q_j$ for all 1 ≤ p ≤ m

case 3) No transitions from $q_i$ to $q_j$ and i = j

$r^0_{i,j} = \varepsilon$

case 4) At least one (m ≥ 1) transition from $q_i$ to $q_j$ and i = j

$r^0_{i,j} = a_1 + a_2 + a_3 + \dots + a_m + \varepsilon$ where $\delta(q_i, a_p) = q_j$ for all 1 ≤ p ≤ m

Inductive Hypothesis:

Suppose that $R^{k-1}_{i,j}$ can be represented by the regular expression $r^{k-1}_{i,j}$ for all 1 ≤ i, j ≤ n, and some k ≥ 1.

Inductive Step:

Consider $R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})^* R^{k-1}_{k,j} \cup R^{k-1}_{i,j}$. By the inductive hypothesis there exist regular expressions $r^{k-1}_{i,k}$, $r^{k-1}_{k,k}$, $r^{k-1}_{k,j}$, and $r^{k-1}_{i,j}$ generating $R^{k-1}_{i,k}$, $R^{k-1}_{k,k}$, $R^{k-1}_{k,j}$, and $R^{k-1}_{i,j}$, respectively. Thus, if we let

$r^k_{i,j} = r^{k-1}_{i,k} (r^{k-1}_{k,k})^* r^{k-1}_{k,j} + r^{k-1}_{i,j}$

then $r^k_{i,j}$ is a regular expression generating $R^k_{i,j}$, i.e., $L(r^k_{i,j}) = R^k_{i,j}$.

• Finally, if $F = \{q_{j_1}, q_{j_2}, \dots, q_{j_r}\}$, then

$r^n_{1,j_1} + r^n_{1,j_2} + \dots + r^n_{1,j_r}$ is a regular expression generating L(M). ∎

• Note: not only does this prove that the regular expressions generate the regular languages, but it also provides an algorithm for computing it!
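The inductive step is itself an algorithm (often called Kleene's algorithm). A minimal Python sketch, assuming the DFA's states are numbered 1..n with q1 the start state and δ given as a dictionary; regular expressions are built as Python `re` pattern strings, with None standing for Ø and "" for ε, so that the result can be tested with `re.fullmatch`. The expressions grow quickly with n, so this is only practical for small machines, and the example DFA is hypothetical.

```python
import re
from itertools import product


def union(r, s):
    """r + s, simplified when either side is Ø (None) or ε ("")."""
    if r is None:
        return s
    if s is None:
        return r
    if r == s:
        return r
    if r == "":
        return f"(?:{s})?"
    if s == "":
        return f"(?:{r})?"
    return f"(?:{r}|{s})"


def concat(*parts):
    """r1 r2 ...; anything concatenated with Ø is Ø, and ε ("") disappears."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def star(r):
    """r*, using Ø* = ε* = ε."""
    if r is None or r == "":
        return ""
    return f"(?:{r})*"


def dfa_to_regex(n, delta, finals):
    """States are 1..n with q1 the start state; delta maps (i, symbol) -> j."""
    # Basis k = 0: one symbol per transition from q_i to q_j, plus ε when i == j.
    r = {}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            e = "" if i == j else None
            for (p, a), q in sorted(delta.items()):
                if p == i and q == j:
                    e = union(e, re.escape(a))
            r[i, j] = e
    # Inductive step: r^k_{i,j} = r^{k-1}_{i,k} (r^{k-1}_{k,k})* r^{k-1}_{k,j} + r^{k-1}_{i,j}
    for k in range(1, n + 1):
        r = {(i, j): union(concat(r[i, k], star(r[k, k]), r[k, j]), r[i, j])
             for i in range(1, n + 1) for j in range(1, n + 1)}
    # Final answer: r^n_{1,j1} + r^n_{1,j2} + ... over the final states
    result = None
    for j in sorted(finals):
        result = union(result, r[1, j])
    return result


def run(delta, finals, w):
    """Simulate the DFA from q1."""
    q = 1
    for a in w:
        q = delta[q, a]
    return q in finals


def all_strings(k, alphabet="01"):
    """Every string over the alphabet of length at most k."""
    return ["".join(t) for n in range(k + 1) for t in product(alphabet, repeat=n)]


# Hypothetical example: strings over {0, 1} with an even number of 0's
even = {(1, "0"): 2, (1, "1"): 1, (2, "0"): 1, (2, "1"): 2}
pattern = dfa_to_regex(2, even, {1})
print(pattern)
```

The simplifications in `union`, `concat` and `star` only keep the output readable; dropping them would still give a correct, but much longer, expression.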


• All remaining columns are computed from the previous column using the formula.

$r^1_{2,3} = r^0_{2,1} (r^0_{1,1})^* r^0_{1,3} + r^0_{2,3} = 0(\varepsilon)^*1 + 1 = 01 + 1$

$r^2_{1,3} = r^1_{1,2} (r^1_{2,2})^* r^1_{2,3} + r^1_{1,3} = 0(\varepsilon + 00)^*(1 + 01) + 1 = 0^*1$
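The simplification $0(\varepsilon + 00)^*(1 + 01) + 1 = 0^*1$ in the last step can be spot-checked by comparing both expressions on every short string. A Python sketch using `re`, with union written `|` and ε written as an empty alternative; agreement up to a length bound is evidence, not a proof.

```python
import re
from itertools import product


def agree(p1, p2, alphabet="01", max_len=10):
    """True if the two patterns accept exactly the same strings of length <= max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            w = "".join(t)
            if (re.fullmatch(p1, w) is None) != (re.fullmatch(p2, w) is None):
                return False
    return True


# r^2_{1,3} = 0(ε + 00)*(1 + 01) + 1 compared with 0*1
print(agree("0(?:|00)*(?:1|01)|1", "0*1"))
```

Dropping the 01 term (so that only odd runs of 0's precede the final 1) makes the check fail, which shows that both terms of (1 + 01) are needed.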


• To complete the regular expression, we compute:

$r^3_{1,2} + r^3_{1,3}$

• Theorem: Let L be a language. Then there exists a regular expression r such that L = L(r) if and only if there exists a DFA M such that L = L(M).

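• Proof: (if) Suppose there exists a DFA M such that L = L(M). Then by Lemma 2 there exists a regular expression r such that L = L(r).

(only if) Suppose there exists a regular expression r such that L = L(r). Then by Lemma 1 there exists an NFA-ε M' such that L = L(M'), and converting M' to a DFA (eliminating ε-transitions and applying the subset construction) gives a DFA M such that L = L(M). ∎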

• Corollary: The regular expressions define the regular languages.

• Note: The conversion from a regular expression to a DFA and a program accepting L(r) is now complete, and fully automated!

Applications of Regular Expressions

1. Regular expressions in Unix

In the UNIX operating system, various commands use an extended regular expression language that provides shorthands for many common expressions. In it we can write character classes (a character class is a pattern that defines a set of characters and matches exactly one character from that set) to represent large sets of characters. There are some rules for forming these character classes:

The dot symbol (.) represents any character except newline.

The regular expression a+b+c+…+z is represented by [abc…z].

Within a character class, - can be used to define a set of characters in terms of a range. For example, a-z defines the set of lower-case letters and A-Z defines the set of upper-case letters. Some lexer generators accept the endpoints of a range in either order (so that both 0-9 and 9-0 define the set of digits), but many tools reject a reversed range, so it is safest to write ranges in ascending order.

If our expression involves characters such as minus, we can place the minus first or last to avoid confusion with the range specifier, e.g. [-.0-9]. In lex-style tools, special characters can be written as ordinary characters using the \ symbol, i.e. \ provides the usual escapes within character class brackets. Thus [[\]] matches either [ or ], because \ causes the first ] in the character class to be taken as a normal character rather than the closing bracket of the class. (In POSIX bracket expressions, by contrast, \ is an ordinary character, and ] is included by placing it first, as in []abc].)

Special notations (written inside brackets, e.g. [[:digit:]])

[:digit:] same as [0-9]

[:alpha:] same as [A-Za-z]

[:alnum:] same as [A-Za-z0-9]

Operators

| used in place of +

? 0 or 1 of: R? means 0 or 1 occurrence of R

+ 1 or more of: R+ means 1 or more occurrences of R

{n} n copies of: R{3} means RRR

^ complement of

If the first character after the opening bracket of a character class is ^, the set defined by the remainder of the class is complemented with respect to the computer's character set. Using this notation, the character class represented by ‘.’ can be described as [^\n]. If ^ appears anywhere in a class except as the first character, it is not considered to be an operator. Thus [^abc] matches any character except a, b, or c, but [a^bc] or [abc^] matches a, b, c, or ^.
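Python's `re` module is not a UNIX command, but its bracket and operator syntax follows the same conventions closely enough for quick experiments. A small sketch written as self-checking assertions; note that Python does not support the POSIX `[:digit:]` classes, so `[0-9]` is used instead.

```python
import re


def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# Character classes
assert matches(r"[a-z]+", "lexer")              # range of lower-case letters
assert matches(r"[-.0-9]+", "-3.14")            # '-' placed first is an ordinary character
assert matches(r"[\[\]]", "]")                  # escaped brackets inside a class
assert matches(r"[^abc]", "d")                  # ^ first: complement of {a, b, c}
assert not matches(r"[^abc]", "a")
assert matches(r"[a^bc]", "^")                  # ^ not first: an ordinary character
assert not matches(r".", "\n")                  # '.' behaves like [^\n] by default
# Operators
assert matches(r"colou?r", "color") and matches(r"colou?r", "colour")  # R?  0 or 1
assert matches(r"(ab)+", "abab") and not matches(r"(ab)+", "")         # R+  1 or more
assert matches(r"(ab){3}", "ababab")            # R{3} means RRR
assert matches(r"cat|dog", "dog")               # | in place of +
```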

When more than one expression can match the current character sequence, a choice is made as follows:

1. The longest match is preferred.

2. Among rules that match the same number of characters, the rule given first is preferred.

2. Lexical analysis

Compilers – in a nutshell

Purpose: translate a program in some language (the source language) into a lower-level language (the target language).

Phases:

Lexical Analysis:

Converts a sequence of characters into words, or tokens


Syntax Analysis:

Converts a sequence of tokens into a parse tree

Semantic Analysis:

Manipulates parse tree to verify symbol and type information

Intermediate Code Generation:

Converts parse tree into a sequence of intermediate code instructions

Optimization:

Manipulates intermediate code to produce a more efficient program

Final Code Generation:

Translates intermediate code into final (machine/assembly) code

Overview of Lexical Analysis

• Convert the character sequence into tokens; skip comments and whitespace

• Handle lexical errors

• Efficiency is crucial

• Tokens are specified as regular expressions, e.g. IDENTIFIER=[a-zA-Z][a-zA-Z0-9]*

• Lexical analyzers are implemented from these regular expressions (via finite automata).

A problem arises when more than one token could be recognized at once. Suppose the string else matches the regular expression for the keyword else as well as the expression for identifiers. Since both matches have the same length, the conflict is resolved by giving priority to the expression listed first.
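The two disambiguation rules (longest match first, then the earlier rule) can be sketched in Python as a small "maximal munch" tokenizer. The token names and rules below are hypothetical, chosen to reproduce the else versus identifier conflict; at each position every rule is tried, the longest match wins, and ties go to the rule listed first.

```python
import re

# Hypothetical token rules in priority order: on a tie, the earlier rule wins.
RULES = [
    ("ELSE", r"else"),
    ("IF", r"if"),
    ("IDENTIFIER", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("NUMBER", r"[0-9]+"),
    ("RELOP", r"<=|<|="),
    ("WHITESPACE", r"[ \t\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Longest match wins; among equally long matches the rule listed first wins."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, rx in COMPILED:
            m = rx.match(text, pos)
            # '>' (not '>=') keeps the earlier rule when lengths are equal
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"lexical error at position {pos}: {text[pos]!r}")
        if best_name != "WHITESPACE":      # skip whitespace
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("else elsewhere if x1 <= 42"))
```

Here "else" is a tie between ELSE and IDENTIFIER and goes to ELSE, while "elsewhere" is longer as an IDENTIFIER and so is not split into a keyword and a remainder.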


Regular Grammars

A grammar is a quadruple

G = (V, T, S, P) where

V is a finite set of variables

T is a finite set of symbols, called terminals

S is in V and is called the start symbol

P is a finite set of productions, which are rules of the form α → β

• where α and β are strings consisting of terminals and variables.

A grammar is said to be right-linear if every production in P is of the form

A → xB or

A → x

where A and B are variables (perhaps the same, perhaps the start symbol S) in V

and x is any string of terminal symbols (including the empty string λ)

An alternate (and better) definition of a right-linear grammar says that every production in P is of the form

A → aB or

A → a or

S → λ (to allow λ to be in the language)

where A and B are variables (perhaps the same, but B can't be S) in V

and a is any terminal symbol


A grammar is said to be left-linear if every production in P is of the form

A → Bx or

A → x

where A and B are variables (perhaps the same, perhaps the start symbol S) in V

and x is any string of terminal symbols (including the empty string λ)

The alternate definition of a left-linear grammar says that every production in P is of the form

A → Ba or

A → a or

S → λ



Any left-linear or right-linear grammar is called a regular grammar.

For brevity, we often write a set of productions such as

A → x1

A → x2

A → x3

As

A → x1 | x2 | x3

A derivation in grammar G is any sequence of strings over V ∪ T, connected by ⇒, that starts with S and ends with a string containing no variables, where each subsequent string is obtained from the previous one by applying a production in P.

S ⇒ x1 ⇒ x2 ⇒ x3 ⇒ . . . ⇒ xn

abbreviated as:

S ⇒* xn

We say that xn is a sentence of the language generated by G, L(G).

We say that the other x's are sentential forms.

L(G) = {w | w ∈ T* and S ⇒* w}

We call L(G) the language generated by G

L(G) is the set of all sentences over grammar G

Example 1

S → abS | a is an example of a right-linear grammar.

Can you figure out what language it generates?

L = {w ∈ {a,b}* | w consists of alternating a's and b's, begins with an a, and ends with an a}

L((ab)*a)

Example 2

S → Aab, A → Aab | aB, B → a is an example of a left-linear grammar.

Can you figure out what language it generates?


L = {w ∈ {a,b}* | w is aa followed by one or more copies of ab}

L(aaab(ab)*)
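The answers to both examples can be checked by enumerating derivations. A minimal Python sketch, assuming a grammar is written as a dictionary from a variable to its right-hand sides, with upper-case letters as variables and lower-case letters as terminals (a hypothetical encoding). It expands the leftmost variable breadth-first and prunes sentential forms with too many terminals; this terminates because every production in these two grammars adds at least one terminal.

```python
from collections import deque


def generate(productions, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`."""
    results, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        variables = [i for i, c in enumerate(form) if c.isupper()]
        if not variables:                 # no variables left: a sentence of L(G)
            results.add(form)
            continue
        i = variables[0]                  # expand the leftmost variable
        for rhs in productions[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if sum(c.islower() for c in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


example1 = {"S": ["abS", "a"]}                            # right-linear
example2 = {"S": ["Aab"], "A": ["Aab", "aB"], "B": ["a"]}  # left-linear
print(sorted(generate(example1, "S", 7), key=len))   # ['a', 'aba', 'ababa', 'abababa']
print(sorted(generate(example2, "S", 8), key=len))   # ['aaab', 'aaabab', 'aaababab']
```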

Regular Grammars and NFA's

We get a feel for this by example.

Let S → aA, A → abS | b.
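The usual construction behind this correspondence turns each variable into a state. A minimal Python sketch of one standard version, again assuming upper-case letters are variables and lower-case letters are terminals (a hypothetical encoding): a production A → xB becomes a chain of fresh states that reads x and ends in B, a production A → x reads x and ends in a single accepting state, and the resulting NFA is simulated directly. For S → aA, A → abS | b it should accept exactly the strings of (aab)*ab.

```python
ACCEPT = "<accept>"   # name of the single accepting state (assumed not to be a variable)


def grammar_to_nfa(productions):
    """Map (state, symbol) -> set of states; symbol "" marks an ε-move."""
    delta, fresh = {}, 0

    def add(p, a, q):
        delta.setdefault((p, a), set()).add(q)

    for head, bodies in productions.items():
        for body in bodies:
            if body and body[-1].isupper():          # A -> xB
                terminals, target = body[:-1], body[-1]
            else:                                    # A -> x
                terminals, target = body, ACCEPT
            state = head
            for a in terminals[:-1]:                 # chain of fresh states for x
                fresh += 1
                add(state, a, ("tmp", fresh))
                state = ("tmp", fresh)
            if terminals:
                add(state, terminals[-1], target)
            else:
                add(state, "", target)               # A -> B or A -> λ
    return delta


def nfa_accepts(delta, start, w):
    def close(states):
        seen, stack = set(states), list(states)
        while stack:
            for q in delta.get((stack.pop(), ""), ()):
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    current = close({start})
    for a in w:
        current = close({q for p in current for q in delta.get((p, a), ())})
    return ACCEPT in current


nfa = grammar_to_nfa({"S": ["aA"], "A": ["abS", "b"]})
print([w for w in ["ab", "aabab", "aabaabab", "", "aab", "abab"] if nfa_accepts(nfa, "S", w)])
# ['ab', 'aabab', 'aabaabab']
```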

Regular Grammars and Regular Expressions

Example: L(aab*a)

We can easily construct a regular grammar for this expression:

S → aA

A → aB

B → bB

B → a


Types of grammars:


Key Terms:

Introduction to regular operators, regular languages, precedence of regular operators, regular expressions, formal definition of regular expressions, equivalence of regular expressions and finite automata, theorem for conversion from a regular expression to an ε-NFA, applications of regular expressions, algebraic laws for regular expressions.

Multiple choice questions:

1. The regular expression 01* denotes

a) The language consisting of strings of length 2.

b) The language consisting of all strings that are a single 0 followed by any number of 1’s

c) The language consisting of all strings that is a single 0.

d) The language consisting of all strings that are a single 1 followed by any number of 0’s.

2. Union and concatenation are associative.

a) True b) False

3. The regular expression 10* denotes


b) The language consisting of all strings that are either a single 0 followed by any number of 1’s

c) The language consisting of all strings that is a single 0.


4. The Kleene closure is represented by

a) L b) L+ c) L* d) L1

5. In a regular expression L(E+F) is equal to

a) L(E) + L(F) b) L(E) U L(F) c) L(EUF) d) L(EnF)

6. Union of regular expressions is commutative.

a) True b) False


7. In a regular expression, L(E*) is equal to

a) L(E)* b) (L(E)*) c) (L(E))* d) all of the above

8. The regular expression operators are

a)union, intersection and concatenation

b) union, concatenation and closure

c) closure, intersection and concatenation

d) union, intersection and closure

9. Concatenation of regular expressions is commutative.

a) True b) False

10. The regular expression (10)* denotes


b) The language consisting of all strings that are either a single 0 followed by any number of 1’s

c) The language consisting of alternating strings that begin with 1 and end with 0.


11. Every language is a regular language.

a) True b) False

12. The inverse homomorphism of a regular language is regular

a) True b) False

13. The positive closure is represented by

a) L b) L+ c) L* d) L1

14. The regular expression for the set of strings that end with ‘1’ and have no substring ‘00’ is given by

a) (0+1)*0101(0+1)*

b) 11(1+0+0)*11

c) (1+01)*(10+11)*1

d) none

Closure property: When new languages are constructed from regular languages by certain operations, recognizers for the new languages can be built from recognizers for the original ones.

Decision Property: This property gives algorithms for answering important questions about automata.

Pumping Lemma for Regular Languages

• Pumping Lemma relates the size of string accepted with the number of states in a DFA

• Lemma (the pumping lemma): Let M be a DFA with |Q| = n states. If there exists a string x in L(M) such that |x| ≥ n, then there exists a way to write it as x = uvw, where u, v, and w are all in Σ* and:

– 1 ≤ |uv| ≤ n

– |v| ≥ 1

– the strings $uv^iw$ are also in L(M), for all i ≥ 0

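• Proof: Let $x = a_1a_2 \dots a_m$ where m ≥ n, x is in L(M), and $\delta(q_0, a_1a_2 \dots a_p) = q_{j_p}$. Processing x traces the path

$q_{j_0} \xrightarrow{a_1} q_{j_1} \xrightarrow{a_2} q_{j_2} \xrightarrow{a_3} \cdots \xrightarrow{a_m} q_{j_m}$, where m ≥ n and $q_{j_0}$ is $q_0$.

Consider the first n symbols, and the first n+1 states on the above path:

$q_{j_0} \xrightarrow{a_1} q_{j_1} \xrightarrow{a_2} \cdots \xrightarrow{a_n} q_{j_n}$

Since |Q| = n, it follows from the pigeon-hole principle that $j_s = j_t$ for some 0 ≤ s < t ≤ n, i.e., some state appears on this path twice (perhaps many states appear more than once, but at least one does).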

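• Let:

– $u = a_1 \dots a_s$

– $v = a_{s+1} \dots a_t$

• Since 0 ≤ s < t ≤ n and $uv = a_1 \dots a_t$, it follows that:

– 1 ≤ |v| and therefore 1 ≤ |uv|

– |uv| ≤ n and therefore 1 ≤ |uv| ≤ n

– In addition, let $w = a_{t+1} \dots a_m$.

– Since v leads from $q_{j_s}$ back to the same state $q_{j_t} = q_{j_s}$, this loop can be repeated any number of times or skipped. It follows that $uv^iw = a_1 \dots a_s (a_{s+1} \dots a_t)^i a_{t+1} \dots a_m$ is in L(M), for all i ≥ 0. ∎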


In other words, when processing the accepted string x, the loop was traversed once, but could have been traversed as many times as desired, and the corresponding strings would be accepted.
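The proof is constructive, so the decomposition can be computed. A minimal Python sketch, assuming a DFA given as a dictionary from (state, symbol) to state; the example machine (even number of a's over {a, b}) is hypothetical. `pump_split` follows the first n symbols of x, cuts at the first repeated state, and the check confirms that pumping v keeps the string in L(M).

```python
def accepts(delta, start, finals, w):
    q = start
    for a in w:
        q = delta[q, a]
    return q in finals


def pump_split(delta, start, x, n):
    """Split x = uvw as in the proof: follow the first n symbols of x and cut
    at the first state that repeats among the n + 1 states visited."""
    states = [start]
    for a in x[:n]:
        states.append(delta[states[-1], a])
    first_visit = {}
    for t, q in enumerate(states):
        if q in first_visit:           # q_{j_s} = q_{j_t} with s < t <= n
            s = first_visit[q]
            return x[:s], x[s:t], x[t:]
        first_visit[q] = t
    raise ValueError("no repeated state: x must have length >= n")


# Hypothetical DFA over {a, b}: state 0 = even number of a's (accepting), 1 = odd
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
u, v, w = pump_split(delta, 0, "abba", 2)
print(u, v, w)                                                        # a b ba
print(all(accepts(delta, 0, {0}, u + v * i + w) for i in range(5)))  # True
```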


Example (x = bbbab): three possible decompositions are

u = ε, v = b, w = bbab

u = b, v = b, w = bab

u = bb, v = b, w = ab

$(b)^i bbab$ is in L(M), for all i ≥ 0; $b(b)^i bab$ is in L(M), for all i ≥ 0

NonRegularity Example

• Theorem: The language:

$L = \{0^k1^k \mid k \ge 0\}$ (1)

is not regular.

• Proof: (by contradiction) Suppose that L is regular. Then there exists a DFA M such that:

L = L(M) (2)

We will show that M accepts some strings not in L, contradicting (2).

Suppose that M has n states, and consider a string $x = 0^m1^m$, where m >> n. By (1), x is in L. By (2), x is also in L(M); note that the machine accepts a language, not just a string.

Since |x| = 2m ≥ n, it follows from the pumping lemma that:

– x = uvw

– 1 ≤ |uv| ≤ n

– 1 ≤ |v|, and

– $uv^iw$ is in L(M), for all i ≥ 0

Since 1 ≤ |uv| ≤ n and n << m, it follows that 1 ≤ |uv| < m.

Also, since $x = 0^m1^m$, it follows that uv is a prefix of $0^m$.

In other words, $v = 0^j$, for some j ≥ 1.

Since $uv^iw$ is in L(M) for all i ≥ 0, taking i = c + 1 shows that $0^{m+cj}1^m$ is in L(M), for all c ≥ 1.

But by (1) and (2), $0^{m+cj}1^m$ is not in L(M), for any c ≥ 1, a contradiction. ∎

• Note that L basically corresponds to balanced parentheses.
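The case analysis in this proof can be replayed by brute force for a fixed n. The Python sketch below (function names illustrative) takes $x = 0^n1^n$, the smallest choice m = n that still gives |x| ≥ n, and checks that every split allowed by the pumping lemma leaves L when pumped down (i = 0) or up (i = 2). It illustrates the argument for small n; it does not replace the proof.

```python
def in_L(s):
    """Membership in L = {0^k 1^k | k >= 0}."""
    k = len(s) // 2
    return s == "0" * k + "1" * k


def every_split_fails(n):
    """For x = 0^n 1^n, check that each split x = uvw with |uv| <= n and |v| >= 1
    leaves L for i = 0 or i = 2, as the proof shows."""
    x = "0" * n + "1" * n
    for end_uv in range(1, n + 1):          # |uv| = end_uv <= n
        for start_v in range(end_uv):       # |v| = end_uv - start_v >= 1
            u, v, w = x[:start_v], x[start_v:end_uv], x[end_uv:]
            if all(in_L(u + v * i + w) for i in (0, 2)):
                return False                # this split could be pumped
    return True


print(all(every_split_fails(n) for n in range(1, 8)))   # True
```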


• Theorem: The language

$L = \{0^k1^k2^k \mid k \ge 0\}$ (1)

is not regular.

• Proof: (by contradiction) Suppose that L is regular. Then there exists a DFA M such that:

L = L(M) (2)

Suppose that M has n states, and consider a string $x = 0^m1^m2^m$, where m >> n. By (1), x is in L. By (2), x is also in L(M); note that the machine accepts a language, not just a string.

Since |x| = 3m ≥ n, it follows from the pumping lemma that x = uvw, 1 ≤ |uv| ≤ n, 1 ≤ |v|, and $uv^iw$ is in L(M), for all i ≥ 0.

Since 1 ≤ |uv| ≤ n and n << m, it follows that 1 ≤ |uv| < m.

Also, since $x = 0^m1^m2^m$, it follows that uv is a prefix of $0^m$, i.e., $v = 0^j$ for some j ≥ 1.

Since $uv^iw$ is in L(M), for all i ≥ 0, it follows that $0^{m+cj}1^m2^m$ is in L(M), for all c ≥ 1.

But by (1) and (2), $0^{m+cj}1^m2^m$ is not in L(M), for any c ≥ 1, a contradiction. ∎

• Note that the above proof is almost identical to the previous proof.


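• Theorem: The language

$L = \{0^m1^n2^{m+n} \mid m, n \ge 0\}$ (1)

is not regular.

• Proof: (by contradiction) Suppose that L is regular. Then there exists a DFA M such that:

L = L(M) (2)

Suppose that M has p states (we write p here because n is already used as an exponent in L), and consider a string $x = 0^m1^n2^{m+n}$, where m >> p. By (1), x is in L. By (2), x is also in L(M).

Since |x| ≥ m ≥ p, it follows from the pumping lemma that x = uvw, 1 ≤ |uv| ≤ p, 1 ≤ |v|, and $uv^iw$ is in L(M), for all i ≥ 0.

Since 1 ≤ |uv| ≤ p and p << m, it follows that 1 ≤ |uv| < m.

Also, since $x = 0^m1^n2^{m+n}$, it follows that uv is a prefix of $0^m$, i.e., $v = 0^j$ for some j ≥ 1.

Since $uv^iw$ is in L(M), for all i ≥ 0, it follows that $0^{m+cj}1^n2^{m+n}$ is in L(M), for all c ≥ 1. In other words, v can be “pumped” as many times as we like, and we still get a string in L(M).

But by (1) and (2), $0^{m+cj}1^n2^{m+n}$ is not in L(M), for any c ≥ 1, because a string of L with m + cj 0's and n 1's must end with $2^{m+cj+n}$, a contradiction. ∎

• Note that the above proof is almost identical to the previous proof.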


• Theorem: Let M = (Q, Σ, δ, q0, F) be a DFA. Then L(M) is finite iff |x| < |Q| for all x in L(M).

• Proof:(if) Suppose that |x| < |Q| for all x in L(M). Since the number of states |Q| and the number of input symbols |Σ| are both fixed, it follows that there are at most a finite number of strings of length less than |Q|. It follows that L(M) is finite (exercise: give an upper bound on the number of such strings).

(only if) By contradiction. Suppose that L(M) is finite, but that |x| ≥ |Q| for some x in L(M). From the pumping lemma it follows that x = uvw, |v| ≥ 1, and $uv^iw$ is in L(M) for all i ≥ 0. But then L(M) would be infinite, a contradiction.

• Theorem: Let M = (Q, Σ, δ, q0, F) be a DFA. Then L(M) is infinite iff there exists an x in L(M) such that |x| ≥ |Q|.

• Proof: (if) Suppose there exists an x in L(M) such that |x| ≥ |Q|. From the pumping lemma it follows that x = uvw, |v| ≥ 1, and $uv^iw$ is in L(M) for all i ≥ 0. Therefore L(M) is infinite.

(only if) By contradiction. Suppose that L(M) is infinite, but that there is no x in L(M) such that |x| ≥ |Q|. It follows that each x in L(M) has length less than |Q|. Since the number of states |Q| and the number of input symbols |Σ| are both fixed, there are at most a finite number of strings of length less than |Q|. It follows that L(M) is finite, a contradiction. ∎

• Note that the above also follows directly from the previous theorem.

• Theorem: Let M = (Q, Σ, δ, q0, F) be a DFA. Then L(M) is non-empty iff there exists an x in L(M) such that |x| < |Q|.

• Proof:

(if) Suppose there exists a string x in L(M) such that |x| < |Q|. Then clearly L(M) is non-empty.

(only if) By contradiction. Suppose that L(M) is non-empty, but that there exists no string x in L(M) such that |x| < |Q|. It follows that |y| ≥ n, where n = |Q|, for all y in L(M). Let z be a string of shortest length in L(M). Then |z| ≥ n and, by the pumping lemma, z = uvw, |v| ≥ 1, and $uv^iw$ is in L(M) for all i ≥ 0. But then $uv^0w = uw$ is in L(M) and:

|uw| = |z| − |v| ≤ |z| − 1 < |z|

Since uw is in L(M), it follows that z is not a string of shortest length in L(M), a contradiction.

• Corollary: Let M = (Q, Σ, δ, q0, F) be a DFA. Then there is an algorithm to determine if L(M) is empty.

• Proof:From the theorem it follows that if L(M) is non-empty then there exists a string x where |x| < n and n = |Q| such that M accepts x. We can try running M on each string of length < n to see if any are accepted. If one is accepted then L(M) is non-empty, otherwise L(M) is empty. Since the number of states |Q| and the number of input symbols |Σ| are both fixed, it follows that there are at most a finite number of strings (of length less than |Q|) that need to be tested. �

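• Note that a graph search from q0 works here too: L(M) is non-empty iff some final state is reachable from q0 in the transition graph (breadth-first search suffices, and Dijkstra’s algorithm also works).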
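Both emptiness tests can be sketched in a few lines of Python, assuming a total DFA given as a dictionary from (state, symbol) to state. `empty_by_enumeration` is the method of the proof (try every string shorter than n = |Q|), and `empty_by_search` is the reachability check from the note; the example machines are hypothetical.

```python
from collections import deque
from itertools import product


def empty_by_search(delta, start, finals, alphabet):
    """L(M) is empty iff no final state is reachable from the start state."""
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        if q in finals:
            return False
        for a in alphabet:
            r = delta[q, a]
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return True


def empty_by_enumeration(delta, start, finals, alphabet, n):
    """The method in the proof: try every string of length < n = |Q|."""
    for length in range(n):
        for w in product(alphabet, repeat=length):
            q = start
            for a in w:
                q = delta[q, a]
            if q in finals:
                return False
    return True


# Hypothetical DFA accepting strings that end in 01 (final state 3)
ends01 = {(1, "0"): 2, (1, "1"): 1, (2, "0"): 2, (2, "1"): 3, (3, "0"): 2, (3, "1"): 1}
print(empty_by_search(ends01, 1, {3}, "01"), empty_by_enumeration(ends01, 1, {3}, "01", 3))
# False False
```

The search visits each state once, while the enumeration tries up to $|\Sigma|^{n-1}$ strings of each length, so the graph search is the practical choice for larger machines.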

• Theorem: Let M = (Q, Σ, δ, q0, F) be a DFA and let n = |Q|. Then L(M) is infinite iff there exists an x in L(M) such that n ≤ |x| < 2n.

• Proof: (if) This follows from the pumping lemma, as in the previous theorems.

(only if, by contradiction) Suppose L(M) is infinite, but there is no string of length between n and 2n − 1 in the language accepted by the DFA M with n states. Since L(M) is infinite, it contains strings of length ≥ 2n. Let z be a shortest string accepted by M such that |z| ≥ 2n. Since |z| > n, by the pumping lemma z = uvw with 1 ≤ |v| ≤ n, because |uv| ≤ n. Then z′ = uw must also be accepted by M, and |z′| = |z| − |v| ≥ 2n − n = n, so |z′| cannot be < n. By assumption, |z′| ≥ 2n is not possible, as z is the shortest such string and |z| > |z′|. And we presumed that there is no string of length between n and 2n − 1, so n ≤ |z′| < 2n cannot be true either. Contradiction. ∎

• Corollary: Let M = (Q, Σ, δ, q0, F) be a DFA. Then there is an algorithm to determine if L(M) is infinite.

• Proof: (left as an exercise).
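One possible algorithm follows directly from the last theorem: test the finitely many strings whose length lies in [n, 2n). A minimal Python sketch under that reading, assuming a total DFA given as a dictionary; it is exponential in n, so it only suits small machines (a graph search for a cycle that is reachable from q0 and can reach a final state is the usual faster alternative). The example machine is hypothetical.

```python
from itertools import product


def accepts(delta, start, finals, w):
    q = start
    for a in w:
        q = delta[q, a]
    return q in finals


def is_infinite(delta, start, finals, alphabet, n):
    """By the theorem above: L(M) is infinite iff M accepts some x with n <= |x| < 2n."""
    return any(accepts(delta, start, finals, "".join(t))
               for length in range(n, 2 * n)
               for t in product(alphabet, repeat=length))


# Hypothetical DFA accepting only the string "a": 0 -a-> 1 -a-> 2 (dead)
only_a = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 2}
print(is_infinite(only_a, 0, {1}, "a", 3))      # False
```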

Closure Properties of Regular Languages

Union (the union of two regular languages is regular)

Intersection

Complement

Difference

Reversal

Star closure

Concatenation

Homomorphism

Inverse homomorphism

Closure under Union

Let L and M be regular.

Then these languages have regular expressions R and S, i.e. L = L(R) and M = L(S).

L ∪ M = L(R) ∪ L(S) = L(R + S)

Since R and S are regular expressions, R + S is also a regular expression.

This implies L ∪ M is also regular (regular languages are closed under union).

Closure under complementation

If L is regular, we want to prove that its complement L̄ (L bar), which is defined as Σ* − L, is also regular.

If there exists a DFA that accepts L̄, then the complement of L is also regular.

For this we construct such a DFA.

Constructing a DFA to accept L complement

Since L is regular, there exists a DFA that accepts it.

Let L = L(R).

Convert the regular expression R to an ε-NFA.

Convert the ε-NFA to a DFA (subset construction). The DFA must have a transition on every input symbol from every state; add a dead state if necessary.

Complement the accepting states of that DFA, i.e. the accepting states of the new DFA are exactly the states that were not accepting in the previous one (Q − F).

Turn the complement DFA into a regular expression.

Proving that the complement of a regular language is also regular

Let L = L(A) for some DFA A = (Q, Σ, δ, q0, F).

Then L̄ = L(B), where B = (Q, Σ, δ, q0, Q − F).

A and B are the same except that their accepting states are different.

Any string w is in L(B) if and only if $\hat{\delta}(q_0, w)$ is in Q − F, which occurs if and only if w is not in L(A). Hence L̄ is regular.
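The complement construction can be sketched in Python, assuming a DFA given as a set of states, a dictionary δ and a set of final states. Swapping the accepting states only works when δ is total, so the sketch first sends every missing move to a dead state (the name "dead" is an assumption); without that step, a string such as "b" that gets stuck in the example would be rejected by both A and B. The partial example DFA accepting only "ab" is hypothetical.

```python
from itertools import product


def complete(states, alphabet, delta, dead="dead"):
    """Return (states, delta) with a move on every symbol from every state,
    sending missing moves to a new dead state (its name is an assumption)."""
    states, delta = set(states), dict(delta)
    missing = [(q, a) for q in states for a in alphabet if (q, a) not in delta]
    if missing:
        states.add(dead)
        for q, a in missing:
            delta[q, a] = dead
        for a in alphabet:
            delta[dead, a] = dead
    return states, delta


def complement(states, alphabet, delta, start, finals):
    """B = (Q, Σ, δ, q0, Q - F), built from the completed DFA."""
    states, delta = complete(states, alphabet, delta)
    return states, delta, start, states - set(finals)


def accepts(delta, start, finals, w):
    q = start
    for a in w:
        if (q, a) not in delta:      # a partial DFA rejects when it gets stuck
            return False
        q = delta[q, a]
    return q in finals


# Hypothetical partial DFA over {a, b} accepting only "ab"
Q, S, D, F = {0, 1, 2}, "ab", {(0, "a"): 1, (1, "b"): 2}, {2}
Qc, Dc, q0, Fc = complement(Q, S, D, 0, F)
words = ["".join(t) for n in range(5) for t in product(S, repeat=n)]
print(all(accepts(Dc, q0, Fc, w) == (w != "ab") for w in words))   # True
```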

Closure under Intersection

If L is regular, then the complement of L is also regular.

If M is regular, then the complement of M is also regular.

The union of two regular languages has also been proved to be regular, and by De Morgan's law $L \cap M = \overline{\overline{L} \cup \overline{M}}$.

Hence the intersection of two regular languages is also regular.

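Special DFA construction for the intersection of two regular languages

Let $A_L = (Q_L, \Sigma, \delta_L, q_L, F_L)$ and $A_M = (Q_M, \Sigma, \delta_M, q_M, F_M)$ be DFAs for L and M; the alphabet Σ is assumed to be the same. The product DFA A has states $Q_L \times Q_M$, start state $(q_L, q_M)$, and accepting states $F_L \times F_M$.

The transition function $\delta = \delta_L \times \delta_M$ is such that δ((p, q), a) is the state of A formed by the pair of states reached on input a from state p (in $A_L$) and from state q (in $A_M$):

$\delta((p, q), a) = (\delta_L(p, a), \delta_M(q, a))$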
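The product construction can be sketched in Python, assuming each DFA is a tuple (states, delta, start, finals) with a total transition function. Only the choice of accepting pairs distinguishes intersection from union and difference, so the sketch takes a `rule` argument (a hypothetical parameter); the two example machines are hypothetical as well.

```python
from itertools import product


def product_dfa(A, B, alphabet, rule="and"):
    """A and B are (states, delta, start, finals) with total transition functions.
    rule "and" gives L ∩ M (accepting states F_L × F_M); "or" and "minus" reuse
    the same states and transitions for L ∪ M and L - M."""
    (QA, dA, sA, FA), (QB, dB, sB, FB) = A, B
    states = set(product(QA, QB))
    delta = {((p, q), a): (dA[p, a], dB[q, a]) for (p, q) in states for a in alphabet}
    keep = {
        "and": lambda p, q: p in FA and q in FB,
        "or": lambda p, q: p in FA or q in FB,
        "minus": lambda p, q: p in FA and q not in FB,
    }[rule]
    finals = {(p, q) for (p, q) in states if keep(p, q)}
    return states, delta, (sA, sB), finals


def accepts(dfa, w):
    states, delta, q, finals = dfa
    for a in w:
        q = delta[q, a]
    return q in finals


# Hypothetical DFAs over {0, 1}
EVEN0 = ({0, 1}, {(0, "0"): 1, (0, "1"): 0, (1, "0"): 0, (1, "1"): 1}, 0, {0})  # even number of 0's
END1 = ({0, 1}, {(0, "0"): 0, (0, "1"): 1, (1, "0"): 0, (1, "1"): 1}, 0, {1})   # ends with 1
both = product_dfa(EVEN0, END1, "01")
print(accepts(both, "0011"), accepts(both, "011"))   # True False
```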

Closure under Difference

L − M = the set of all strings that are in L and not in M,

i.e. $L - M = L \cap \overline{M}$.

Since M is regular, its complement $\overline{M}$ is regular, so the intersection of L and $\overline{M}$ is also regular.

Hence the difference of two regular languages is regular.

Homomorphism

A string homomorphism is a function on strings that works by substituting a particular string for each symbol.

Example: Let Σ = {0, 1}, w = 0011, and let h be the function with h(0) = ab and h(1) = ε.

Then h(w) = h(0)h(0)h(1)h(1) = ab ab ε ε

h(w)=abab

Homomorphism to a language

Homomorphism to a language is applied by applying it to each of the strings in the language

h(L) = {h(w) | w is in L}

Let L = L(10*1)

i.e. L = {11, 101, 1001, …}

Then h(L) = {εε, εabε, εababε, …} = {ε, ab, abab, …}, i.e. h(L) = L((ab)*)

If L is regular then h(L) is also regular

Let L = L(R) for a regular expression R.

Let E be a regular expression over Σ.

Then let h(E) be the expression obtained by replacing each symbol a of Σ in E by h(a).

Then h(R) defines the language h(L).
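Both halves of this argument can be sketched in Python: applying h to a string, and applying h to a regular expression by substituting each symbol. The sketch assumes expressions written in Python `re` syntax with single-character symbols plus ( ) | * only (a hypothetical restriction), and checks the example above: h(L(10*1)) should coincide with L((ab)*) on all short strings.

```python
import re
from itertools import product


def h_string(h, w):
    """h(a1 a2 ... an) = h(a1) h(a2) ... h(an)."""
    return "".join(h[a] for a in w)


def h_regex(h, pattern):
    """h(E): replace every alphabet symbol a in E by (h(a)).
    Assumes E uses single-character symbols plus ( ) | * only."""
    return "".join(f"(?:{re.escape(h[c])})" if c in h else c for c in pattern)


h = {"0": "ab", "1": ""}
print(h_string(h, "0011"))            # abab
print(h_regex(h, "10*1"))             # (?:)(?:ab)*(?:)
```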

Decision Properties of Regular language

A decision property for a class of languages is an algorithm that takes a formal description of a language (e.g., a DFA) and tells whether or not some property holds.

Example: Is language L empty?

You might imagine that the language is described informally, so if my description is “the empty language” then yes, otherwise no.

But the representation is a DFA (or a RE that you will convert to a DFA).

Can you tell if L(A) = ∅ for DFA A?

We might want a “smallest” representation for a language, e.g., a minimum-state DFA or a shortest RE.

If you can’t decide “Are these two languages the same?” (i.e., do two DFAs define the same language?), you can’t find a “smallest.”

Equivalence and Minimization of Automata

Testing whether two descriptors for regular languages are equivalent, in the sense that they define the same languages.

This gives a method to minimize a DFA

Definition: Equivalent states

Two states p and q are said to be equivalent if for all input strings w, $\hat{\delta}(p, w)$ is an accepting state if and only if $\hat{\delta}(q, w)$ is an accepting state.

If two states are not equivalent, then we say they are distinguishable, i.e. there is at least one string w for which exactly one of $\hat{\delta}(p, w)$ and $\hat{\delta}(q, w)$ is accepting.
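Distinguishable pairs can be computed with the table-filling algorithm, one standard way to apply this definition; merging each class of mutually equivalent states then gives the minimized DFA. A minimal Python sketch, assuming a total DFA given as a dictionary from (state, symbol) to state; the example is a hypothetical "even number of 0's" machine with a redundant copy (state 2) of state 0.

```python
from itertools import combinations


def distinguishable_pairs(states, alphabet, delta, finals):
    """Table-filling: first mark pairs where exactly one state is accepting
    (distinguished by w = ε), then mark {p, q} whenever some symbol a leads to
    an already-marked pair, until nothing changes."""
    pairs = [frozenset(pq) for pq in combinations(sorted(states), 2)]
    marked = {pq for pq in pairs if len(pq & set(finals)) == 1}
    changed = True
    while changed:
        changed = False
        for pq in pairs:
            if pq in marked:
                continue
            p, q = sorted(pq)
            for a in alphabet:
                succ = frozenset((delta[p, a], delta[q, a]))
                if succ in marked:          # a one-element succ is never marked
                    marked.add(pq)
                    changed = True
                    break
    return marked


def equivalent(p, q, states, alphabet, delta, finals):
    """p and q are equivalent iff they are never marked as distinguishable."""
    return p == q or frozenset((p, q)) not in distinguishable_pairs(states, alphabet, delta, finals)


# Hypothetical DFA for "even number of 0's" where state 2 duplicates state 0
delta = {(0, "0"): 1, (0, "1"): 2, (1, "0"): 2, (1, "1"): 1, (2, "0"): 1, (2, "1"): 0}
print(equivalent(0, 2, {0, 1, 2}, "01", delta, {0, 2}))   # True
print(equivalent(0, 1, {0, 1, 2}, "01", delta, {0, 2}))   # False
```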


Key Terms:

• Pumping Lemma relates the size of string accepted with the number of states in a DFA.

• Closure property: When new languages are constructed from regular languages by certain operations, recognizers for the new languages can be built from recognizers for the original ones.

• Decision Property: This property gives algorithms for answering important questions about automata.

• Equivalence and Minimization of Automata - Testing whether two descriptors for regular languages are equivalent, in the sense that they define the same languages. This gives a method to minimize a DFA.

Summary:

The properties of regular languages have been discussed.

The first property, the pumping lemma for regular languages, is used to prove that a language is not regular.

The closure properties and the decision properties follow; the decision properties help in testing the equivalence of automata.

Minimization helps in building smaller machines.
