UNIT II: Regular Expressions
Transcript of UNIT II: Regular Expressions
THEORY OF COMPUTATION
1
Jagadish Kashinath Kamble (M.Tech, IIT Kharagpur)
Assistant Professor,Dept. of Information Technology, Pune Institute of Computer Technology, Pune.
Regular ExpressionsUNIT II:
Regular Language Acceptor Finite
Automata
GeneratorRegular
Grammer
Right Linear Grammer
Left Linear Grammer
FA with Output
FA without Output
Representator
Regular
Expression
Regular Expressions
Regular expressions
describe regular languages
Example:
describes the language
*)( cba +
,...,,,,,*, bcaabcaabcabca =
Recursive Definition
,,
( )1
1
21
21
*
r
r
rr
rr
+
Are regular expressions
Primitive regular expressions:
2r1r
Given regular expressions
and
1) + : Union 2) . : Concatenation 3) * : Kleene Closure
{ }
{ }
{ a }
Languages of Regular Expressions
: language of regular expression
Example
( )rL r
( ) ,...,,,,,*)( bcaabcaabcacbaL =+
Definition (continued)
For regular expressions and1r 2r
( ) ( ) ( )2121 rLrLrrL =+
( ) ( ) ( )2121 rLrLrrL =
( ) ( )( )** 11 rLrL =
( )( ) ( )11 rLrL =
Example
Regular expression: ( ) *aba +
( )( )*abaL + ( )( ) ( )*aLbaL +=
( ) ( )*aLbaL +=
( ) ( )( ) ( )( )*aLbLaL =
( ) ( )*aba =
,...,,,, aaaaaaba =
,...,,,...,,, baababaaaaaa=
Equivalent Regular Expressions
Definition:
Regular expressions and
are equivalent if
1r 2r
)()( 21 rLrL =
• Regular Language
• ∅
• {ε}
• {a,b}*
• {aab}*{a,ab}
• {aa,bb}U {ab,ba}{aa,bb}*
• Regular Expression
• ∅
• {}
• (a+b)*
• (aab)*(a+ab)
• (aa+bb+(ab+ba)(aa+bb)*)
Find RE for the Language of strings of Length 2 over the alphabet {a,b}
Find RE for the Language of strings of Length at least 2 over the alphabet {a,b}
Find RE for the Language of strings of Length at most 2 over the alphabet {a,b}
Find RE for the Language of strings of even Length over the alphabet {a,b}
Find RE for the Language of strings of odd Length over the alphabet {a,b}
Find RE for the Language of strings which is divisible by 3 over the alphabet {a,b}
Find RE for the Language of strings = 2 mod 3 over the alphabet {a,b}
~
Find RE for the Language of strings of a’s exactly 2 over the alphabet {a,b}
• Find RE for the Language of strings of a’s atleast 2 over the alphabet {a,b}
• Find RE for the Language of strings of a’s atmost 2 over the alphabet {a,b}
• Find RE for the Language of strings of even length a’s over the alphabet {a,b}
• Find RE for the Language of strings which starts with a over the alphabet {a,b}
• Find RE for the Language of strings which ends with a over the alphabet {a,b}
• Find RE for the Language of strings which contains a over the alphabet {a,b}
• Find RE for the Language of strings which starts and ends with different symbol over the alphabet {a,b}
• Find RE for the Language of strings which starts and ends with Same symbol over the alphabet {a,b}
• Give RE for Following over the alphabet {0,1} (Aug-2015 6 Marks) – All binary strings with at least one 0
– All binary strings with at Most one 0
• Language of all strings containing substring 00
• The Language of Strings in {a, b}∗ Ending with b and Not Containing aa
• Find RE for the Language of strings which does not contains two a’s together over the alphabet {a,b}
Identities Related to Regular Expressions
• Given R, P, L, Q as regular expressions, the followingidentities hold −
• ∅* = ε
• ε* = ε
• RR* = R*R
• R*R* = R*
• (R*)* = R*
• RR* = R*R
• (PQ)*P =P(QP)*
• (a+b)* = (a*b*)* = (a*+b*)* = (a+b*)* = a*(ba*)*
• R + ∅ = ∅ + R = R (The identity for union)
• R ε = ε R = R (The identity for concatenation)
• ∅ L = L ∅ = ∅ (The annihilator for concatenation)
• R + R = R (Idempotent law)
• L (M + N) = LM + LN (Left distributive law)
• (M + N) L = ML + NL (Right distributive law)
• ε + RR* = ε + R*R = R*
Languages
Generated by
Regular Expressions
Regular
Languages
Languages
Generated by
Regular Expressions
Regular
Languages
Proof:
Proof - Part 1
r
)(rL
For any regular expression
the language is regular
Languages
Generated by
Regular Expressions
Regular
Languages
Proof by induction on the size of r
Induction Basis
Primitive Regular Expressions: ,,
Corresponding
NFAs
)()( 1 == LML
)(}{)( 2 LML ==
)(}{)( 3 aLaML ==
regular
languages
a
Inductive Hypothesis
Suppose
that for regular expressions and ,
and are regular languages1r 2r
)( 1rL )( 2rL
Using the regular closure of these operations,
we can construct recursively the NFA
that accepts
M)()( rLML =
Example:21 rrr +=
)()( 11 rLML =
)()( 22 rLML =
)()( rLML =
Regular Expression r1
1M
Single accepting state
NFA 2M
Single accepting state
Regular Expression r2
NFA
• Construct an NFA with null-moves, which accepts the language defined by:
• ((0 + 1)* 10 + (00)* (11)*)*
• 01[((10)+ + 111)* + 0]* 1
• (a / b)* ab.
Problems
Problems
• Construct DFA for the R.E l0 + (0 + ll)
• Construct DFA for regular expression (0 + l)*, (00 + ll)
Review
• Regular Expression
• Construction of RE from Language
• RE and DFA
• Conversion of RE to DFA
For any regular language there is
a regular expression with
Proof - Part 2
Languages
Generated by
Regular Expressions
Regular
Languages
L
r LrL =)(
We will convert an NFA that accepts
to a regular expression
L
Arden’s Theorem
In order to find out a regular expression of aFinite Automaton, we use Arden’s Theoremalong with the properties of regularexpressions.
Statement −
Let P, Q and R be the regular expressions.
If P does not contain null string, then
R = Q + RP has a unique solution that is
R = QP*
Proof −
R = Q + (Q + RP)P [After putting the value R = Q + RP]
= Q + QP + RPP
When we put the value of R recursively again and again, we get the following equation −
R = Q + QP + QP2 + QP3…..
R = Q (ε + P + P2 + P3 + …. )
R = QP* [As P* represents (ε + P + P2 + P3 + ….) ]
Hence, proved.
MethodStep 1 − Create equations as the following form for all the
states of the DFA having n states with initial state q1 (on incoming edges only and add ε to the initial state ).
q1 = q1R11 + q2R21 + … + qnRn1 + ε
q2 = q1R12 + q2R22 + … + qnRn2
…………………………
qn = q1R1n + q2R2n + … + qnRnn
Rij represents the set of labels of edges from qi to qj, if no such edge exists, then Rij = ∅
Step 2 − Solve these equations to get the equation for the final state in terms of Rij
Solution −
Here the initial state is q2 and the final state is q1.
The equations for the three states q1, q2, and q3 are as follows −
q1 = q1a + q3a + ε (ε move is because q1 is the initial state0
q2 = q1b + q2b + q3b
q3 = q2a
Now, we will solve these three equations −
q2 = q1b + q2b + q3b
= q1b + q2b + (q2a)b (Substituting value of q3)
= q1b + q2(b + ab)
= q1b (b + ab)* (Applying Arden’s Theorem)
q1 = q1a + q3a + ε
= q1a + q2aa + ε (Substituting value of q3)
= q1a + q1b(b + ab*)aa + ε (Substituting value of q2)
= q1(a + b(b + ab)*aa) + ε
= ε (a+ b(b + ab)*aa)*
= (a + b(b + ab)*aa)*
Hence, the regular expression is
(a + b(b + ab)*aa)*.
Solution −
Here the initial state is q1 and the final state is q2
Now we write down the equations −
q1 = q10 + ε
q2 = q11 + q20
q3 = q21 + q30 + q31
Now, we will solve these three equations −
q1 = ε0* [As, εR = R]
So, q1 = 0*
q2 = 0*1 + q20
So, q2 = 0*1(0)* [By Arden’s theorem]
Hence, the regular expression is
0*10*.
Review
• Regular Expression
• Construction of RE from Language
• RE and DFA
• Conversion of RE to DFA
• Conversion of DFA to RE
Construct Regular Expression for the following transitiondiagram using Arden’s theorem.(Nov 2014 4 Marks)
• From construct the equivalent
• Generalized Transition Graph
• in which transition labels are regular expressions
M
Example:
a
ba,
c
Ma
ba +
c
Corresponding
Generalized transition graph
State Elimination Method
q1 = q1a + q3a + ε
= q1a + q2aa + ε (Substituting value of q3)
= q1a + q1b(b + ab*)aa + ε (Substituting value of q2)
= q1(a + b(b + ab)*aa) + ε
= ε (a+ b(b + ab)*aa)*
= (a + b(b + ab)*aa)*
Hence, the regular expression is
(a + b(b + ab)*aa)*.
• Another Example:
ba +a
b
b
0q 1q 2q
ba,a
b
b
0q 1q 2q
b
bTransition labels
are regular
expressions
0q fq
1r
2r
3r4r
*)*(* 213421 rrrrrrr +=
LMLrL == )()(
The resulting regular expression:
By repeating the process until
two states are left, the resulting graph is
Initial graph Resulting graph
End of Proof-Part 2
When we say: We are given
a Regular Language
We mean:
L
Language is in a standard
representation
L
(DFA, NFA, or Regular Expression)
How can we prove that a language
is not regular?
L
Prove that there is no DFA or NFA or RE
that accepts L
Difficulty: this is not easy to prove
(since there is an infinite number of them)
Solution: use the Pumping Lemma !!!
The Pigeonhole Principle
...........
pigeons
pigeonholes
n
m
mn There is a pigeonhole
with at least 2 pigeons
Consider the walk of a “long’’ string:
1q 2q 3qa
b
4q
b
b
b
a a
a
aaaab
1q 2q 3q 2q 3q 4qa a a a b
A state is repeated in the walk of
(length at least 4)
aaaab
aaaab
1q 2q 3q 2q 3q 4qa a a a b
1q 2q 3q 4q
Pigeons:
Nests:(Automaton states)
Are more than
Walk of
The state is repeated as a result of
the pigeonhole principle
(walk states)
Repeated
state
Consider the walk of a “long’’ string:
1q 2q 3qa
b
4q
b
b
b
a a
a
aabb
1q 2q 3q 4q 4qa a b b
A state is repeated in the walk of
(length at least 4)
Due to the pigeonhole principle:
aabb
aabb
1q 2q 3q 4q
Automaton States
Pigeons:
Nests:(Automaton states)
Are more than
Walk of
The state is repeated as a result of
the pigeonhole principle
(walk states)1q 2q 3q 4q 4qa a b b
Repeated
state
iq...... ......
Repeated state
kw 21=
1 2 k
Walk of
iq.... iq.... ....1 2 ki j1+i 1+j
Arbitrary DFA
DFA of states#|| wIf ,
by the pigeonhole principle,
a state is repeated in the walk
In General:
1qzq
zq1q
w
mDFA of states#|| =w
1q 2q 1−mq mq
Walk of wPigeons:
Nests:(Automaton states)
Are
more
than
(walk states)
iq....iq.... ....
1q
iq.... ....
zq
A state is
repeated
Take an infinite regular language L
There exists a DFA that accepts L
mstates
(contains an infinite number of strings)
mw ||(number of
states of DFA)
then, at least one state is repeated
in the walk of w
q...... ......1 2 k
Take string with Lw
kw 21=
Walk in DFA of
Repeated state in DFA
Take to be the first state repeatedq
q....
w
There could be many states repeated
q.... ....
Second
occurrence
First
occurrence
Unique states
One dimensional projection of walk :
1 2 ki j1+i 1+j
Regular Languages: Grand Unification
)(
)()(
DFAsL
NFAsLsNFAL
=
=−
)()( RELFAL =
(Parallel Simulation)
(Rabin and Scott’s work)
(Collapsing graphs;
Structural Induction)
(S. Kleene’s work)
)()( RGLFAL = (Construction)
(Solving linear
equations) study later)()( RELRGL =
Pumping Lemma for Regular Languages
• It is a necessary condition.– Every regular language satisfies it.
– If a language violates it, it is not regular.• RL => PL not PL => not RL
• It is not a sufficient condition.– Not every non-regular language violates it.
• not RL =>? PL or not PL (no conclusion)
q.... q.... ....
Second
occurrence
First
occurrence
1 2 ki j1+i 1+j
wOne dimensional projection of walk :
ix 1= jiy 1+= kjz 1+=
xyzw =We can write
Observation: myx ||length number
of states
of DFA
Since, in no
state is repeated
(except q)
xy
Unique States
q...
x
y
...
1 i
1+ij
We do not care about the form of string z
q...
x
y
z
...
z may actually overlap with the paths of and x y
The string
is accepted
zyyx
q... ... ...
x z
Follow loop
2 times
Additional string:
y
...
1 ki
1+ij
1+j
The string
is accepted
zyyyx
q... ... ...
x z
Follow loop
3 times
Additional string:
y
...
1 ki
1+ij
1+j
The string
is accepted
zyx iIn General:
...,2,1,0=i
q... ... ...
x z
Follow loop
timesiy
...
1 ki
1+ij
1+j
The Pumping Lemma:
• Given a infinite regular language L
• there exists an integer m
• for any string with length Lw mw ||
• we can write zyxw =
• with andmyx || 1|| y
• such that: Lzyx i ...,2,1,0=i
(critical length)
)0 :(
0) || ( ) || (
)( :,,
|| :
Lzxyii
ymxy
wxyzzyx
mwLw
i
=
For all sufficiently long strings (w)
There exists non-null prefix (xy)
and substring (y)
For all repetitions of the substring (y),
we get strings in the language.
Observation:
Every language of finite size has to be regular
Therefore, every non-regular language
has to be of infinite size (contains an infinite number of strings)
(we can easily construct an NFA
that accepts every string in the language)
Suppose you want to prove that
An infinite language is not regular
1. Assume the opposite: is regular
2. The pumping lemma should hold for
3. Use the pumping lemma to obtain a
contradiction
L
L
L
4. Therefore, is not regular L
Explanation of Step 3: How to get a contradiction
2. Choose a particular string which satisfies
the length conditionLw
3. Write xyzw =
4. Show that Lzxyw i = for some 1i
5. This gives a contradiction, since from
pumping lemma Lzxyw i =
mw ||
1. Let be the critical length form L
Note: It suffices to show that
only one string
gives a contradiction
Lw
You don’t need to obtain
contradiction for every Lw
Theorem: The language }0:{ = nbaL nn
is not regular
Proof: Use the Pumping Lemma
Example of Pumping Lemma application
Assume for contradiction
that is a regular languageL
Since is infinite
we can apply the Pumping Lemma
L
}0:{ = nbaL nn
Let be the critical length for
Pick a string such that: w Lw
mw ||and length
mmbaw =We pick
m
}0:{ = nbaL nn
L
with lengths
From the Pumping Lemma:
1||,|| ymyx
babaaaaabaxyz mm ............==
mkay k = 1,
x y z
m m
we can write zyxbaw mm ==
Thus:
=w
From the Pumping Lemma:
Lbabaaaaaaazxy = ...............2
x y z
km+ m
Thus:
Lzyx 2
mmbazyx =
y
Lba mkm +
mkay k = 1,
Our assumption that
is a regular language is not true
L
Conclusion: L is not a regular language
Therefore:
END OF PROOF
The Pumping Lemma:
• Given a infinite regular language L
• there exists an integer (critical length)m
• for any string with length Lw mw ||
• we can write zyxw =
• with andmyx || 1|| y
• such that: Lzyx i ...,2,1,0=i
Assume for contradiction
that is a regular languageL
Since is infinite
we can apply the Pumping Lemma
L
*}:{ = vvvL R
mmmm abbaw =We pick
Let be the critical length for
Pick a string such that: w Lw
mw ||length
m
and
*}:{ = vvvL R
L
we can write: zyxabbaw mmmm ==
with lengths:
From the Pumping Lemma:
ababbabaaaaxyz ..................=
x y z
m m m m
1||,|| ymyx
mkay k = 1,Thus:
=w
From the Pumping Lemma:
Lababbabaaaaaazxy ∈.....................=2
x y z
km + m m m
y
Lzyx 2
Thus:
mmmm abbazyx =
Labba mmmkm +
mkay k = 1,
Our assumption that
is a regular language is not true
L
Conclusion: L is not a regular language
Therefore:
END OF PROOF
Assume for contradiction
that is a regular languageL
Since is infinite
we can apply the Pumping Lemma
L
}0,:{ = + lncbaL lnln
mmm cbaw 2=We pick
Let be the critical length of
Pick a string such that: w Lw
mw ||length
m
}0,:{ = + lncbaL lnln
and
L
We can write zyxcbaw mmm == 2
With lengths
From the Pumping Lemma:
cccbcabaaaaaxyz ..................=
x y z
m m m2
1||,|| ymyx
Thus:
=w
mkay k = 1,
From the Pumping Lemma:
Lcccbcabaaaxz = ...............
x z
km− m m2
mmm cbazyx 2=
Lxz
Thus: Lcba mmkm − 2
mkay k = 1,
Our assumption that
is a regular language is not true
L
Conclusion: L is not a regular language
Therefore:
END OF PROOF
Assume for contradiction
that is a regular languageL
Since is infinite
we can apply the Pumping Lemma
L
}0:{ ! = naL n
!maw =We pick
Let be the critical length of
Pick a string such that: w Lw
mw ||length
m
}0:{ ! = naL n
L
We can write zyxaw m == !
With lengths
From the Pumping Lemma:
aaaaaaaaaaaxyz m ...............! ==
x y z
m mm −!
1||,|| ymyx
mkay k = 1,Thus:
=w
From the Pumping Lemma:
Laaaaaaaaaaaazxy = ..................2
x y z
km+ mm −!
Thus:
!mazyx = mkay k = 1,
Lzyx 2
y
La km +!
Our assumption that
is a regular language is not true
L
Conclusion: L is not a regular language
Therefore:
END OF PROOF
Regular Languages
• If ∑ is an alphabet, the set R of regular languages over ∑ is defined as follows.
• 1. The language ∅ is an element of R, and for every a ∈ , the language {a} is in R.
• 2. For any two languages L1 and L2 in R, the three languages L1 ∪ L2, L1L2, and L1* are elements of R.
1L 2L
21LLConcatenation:
*1LStar:
21 LL Union:
Are regularLanguages
For regular languages and we will prove that:
1L
21 LL
Complement:
Intersection:
RL1
Reversal:
We say: Regular languages are closed under
21LLConcatenation:
*1LStar:
21 LL Union:
1L
21 LL
Complement:
Intersection:
RL1
Reversal:
a
b
b
aNFA
Equivalent NFA
a
b
b
a
A useful transformation: use one accept state
2 accept states
1 accept state
1LRegular language
( ) 11 LML =
1M
Single accepting state
NFA 2M
2L
Single accepting state
( ) 22 LML =
Regular language
NFA
Take two languages
Reverse
RL1
1M
NFA for
1M
1. Reverse all transitions
2. Make initial state accepting state and vice versa
1L
Complement
1. Take the DFA that accepts 1L
1M1L 1M1L
2. Make accepting states non-final, and vice-versa
DeMorgan’s Law: 2121 LLLL =
21 , LL regular
21 , LL regular
21 LL regular
21 LL regular
21 LL regular
1Lfor for 2LDFA
1M
DFA
2M
Construct a new DFA that accepts
Machine Machine
M 21 LL
M simulates in parallel and 1M 2M
Another Proof for Intersection Closure
iq
accept state
jp
accept states
New accept states
ji pq ,
kp
ki pq ,
1M 2MDFA DFA
MDFA
Both constituents must be accepting states
00, pq
Automaton for intersection
}{}{}{ ababbaL nn ==
10, pqa
21, pq
b
ab11, pq
20, pq
a
12, pq
22, pq
b
ba,
a
b
ba,
b
a
M simulates in parallel and 1M 2M
M accepts string w if and only if:
accepts string w1M
and accepts string w2M
)()()( 21 MLMLML =
Applications of Regular Expression
• Regular Expressions in Lexical Analysis
• Regular Expressions in Web Search Engines
• Regular Expressions in Software Engineering