Regular Expressions
Seungjin Choi
Department of Computer Science and Engineering
Pohang University of Science and Technology
77 Cheongam-ro, Nam-gu, Pohang 37673, [email protected]
Outline
- Regular expressions: one type of language-defining notation
- Regular expressions and finite automata
- Regular grammars (will be discussed in Lecture 4)
Operators
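- Union: $L \cup M = \{w \mid w \in L \text{ or } w \in M\}$.
- Concatenation: $LM = \{w \mid w = xy,\ x \in L,\ y \in M\}$.
- Powers: $L^0 = \{\varepsilon\}$, $L^1 = L$, $L^{n+1} = L L^n$.
- Kleene closure (star-closure): $L^* = L^0 \cup L^1 \cup L^2 \cup \cdots = \bigcup_{i=0}^{\infty} L^i$.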
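The four operators can be tried out on small finite languages. The following is a minimal Python sketch, not part of the original slides: languages are modeled as sets of strings, the empty string stands for ε, and the helper names (`union`, `concat`, `power`, `star_up_to`) are illustrative assumptions. Since L* is usually infinite, `star_up_to` only computes the finite union of the powers L^0 through L^k.

```python
EPS = ""  # the empty string plays the role of ε


def union(L, M):
    """L ∪ M: strings that belong to L or to M."""
    return L | M


def concat(L, M):
    """LM: every string xy with x in L and y in M."""
    return {x + y for x in L for y in M}


def power(L, n):
    """L^n, using L^0 = {ε} and L^(n+1) = L L^n."""
    result = {EPS}
    for _ in range(n):
        result = concat(L, result)
    return result


def star_up_to(L, k):
    """Finite approximation of L*: the union L^0 ∪ L^1 ∪ ... ∪ L^k."""
    result = set()
    for i in range(k + 1):
        result |= power(L, i)
    return result


L = {"a", "bc"}
M = {"b"}
print(sorted(union(L, M)))
print(sorted(concat(L, M)))
print(sorted(power(L, 2)))
print(sorted(star_up_to(L, 2)))
```

Applying `power` and `star_up_to` to the empty set is a quick way to experiment with the edge cases of these definitions.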
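$\emptyset^i$
Question: What are $\emptyset^0$, $\emptyset^i$, and $\emptyset^*$?
- $\emptyset^0 = \{\varepsilon\}$.
- $\emptyset^i$ for $i \ge 1$ is empty, since we cannot select any strings from the empty set.
- $\emptyset^* = \{\varepsilon\}$.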
Regular Expressions
- An FA (DFA or NFA) is a "blueprint" for constructing a machine recognizing a regular language. (machine-like description)
- A regular expression is a "user-friendly," declarative way of describing a regular language. (algebraic description)
- A regular expression involves a combination of:
  - strings of symbols from Σ
  - parentheses
  - operators (+, ·, ∗), denoting union, concatenation, and star-closure
Examples
- {a, b, c} (regular language) ⇐⇒ a + b + c (regular expression).
- (a + b · c)∗ represents the star-closure of {a} ∪ {bc}, which is {ε, a, bc, aa, abc, bca, bcbc, aaa, . . .}.
- 01∗ + 10∗: The language consisting of all strings that are either a single 0 followed by any number of 1's or a single 1 followed by any number of 0's.
- UNIX grep command
- UNIX Lex (lexical analyzer generator) and Flex (fast lex) tools
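Practical tools write the same expressions in a slightly different syntax. As a hedged illustration (not from the slides), the sketch below tests membership in L(01∗ + 10∗) with Python's standard `re` module, where union is written `|` instead of `+`. The helper name `in_language` is an assumption made for this example.

```python
import re

# Textbook 01* + 10* in Python's syntax, where | denotes union.
PATTERN = re.compile(r"01*|10*")


def in_language(w):
    """True if the whole string w belongs to L(01* + 10*)."""
    # fullmatch requires the entire string to match, not just a prefix.
    return PATTERN.fullmatch(w) is not None


for w in ["0", "0111", "1", "1000", "", "0110", "11"]:
    print(repr(w), in_language(w))
```

Using `fullmatch` rather than `search` matters here: the language is defined over whole strings, whereas `search` would accept any string that merely contains a matching substring.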
Inductive Definition of Regular Expressions
Definition. Let Σ be a given alphabet. Then
1. ∅, ε, and a ∈ Σ are all regular expressions. These are called primitive regular expressions.
2. If $r_1$ and $r_2$ are regular expressions, so are $r_1 + r_2$, $r_1 \cdot r_2$, $r_1^*$, and $(r_1)$.
3. A string is a regular expression if and only if it can be derived from primitive regular expressions by a finite number of applications of the rules described above.
Language L(r)
Definition. The language L(r) denoted by any regular expression r is defined by the following rules:
1. ∅ is a regular expression denoting the empty set.
2. ε is a regular expression denoting {ε}.
3. For every a ∈ Σ, a is a regular expression denoting {a}.
4. $L(r_1 + r_2) = L(r_1) \cup L(r_2)$.
5. $L(r_1 r_2) = L(r_1) L(r_2)$.
6. $L((r_1)) = L(r_1)$.
7. $L(r_1^*) = (L(r_1))^*$.
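The rules above translate almost line by line into a recursive function. The following Python sketch is an illustration under stated assumptions, not the lecture's own code: the class names (`Empty`, `Eps`, `Sym`, `Union`, `Concat`, `Star`) and the function `lang` are hypothetical, and because L(r) may be infinite, `lang(r, n)` returns only the strings of L(r) whose length is at most n. Parentheses need no node of their own, since the tree structure already records grouping (rule 6).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:      # the primitive expression ∅
    pass


@dataclass(frozen=True)
class Eps:        # the primitive expression ε
    pass


@dataclass(frozen=True)
class Sym:        # a single symbol a ∈ Σ
    ch: str


@dataclass(frozen=True)
class Union:      # r1 + r2
    left: object
    right: object


@dataclass(frozen=True)
class Concat:     # r1 r2
    left: object
    right: object


@dataclass(frozen=True)
class Star:       # r1*
    inner: object


def lang(r, n):
    """Return the strings of L(r) that have length at most n."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.ch} if n >= 1 else set()
    if isinstance(r, Union):
        return lang(r.left, n) | lang(r.right, n)
    if isinstance(r, Concat):
        return {x + y
                for x in lang(r.left, n)
                for y in lang(r.right, n)
                if len(x + y) <= n}
    if isinstance(r, Star):
        # Repeatedly append non-empty strings of L(r1) until nothing new fits.
        base = lang(r.inner, n) - {""}
        result = {""}
        frontier = {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")


# a* · (a + b), restricted to strings of length at most 3
r = Concat(Star(Sym("a")), Union(Sym("a"), Sym("b")))
print(sorted(lang(r, 3), key=lambda w: (len(w), w)))
```

Bounding the length keeps every set finite, so the star case terminates: each pass of the loop only adds strictly longer strings.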
Example
Exhibit the language L (a∗ · (a + b)) in set notation.
Solution.
L(a∗ · (a + b)) = L(a∗) L(a + b)
= (L(a))∗ (L(a) ∪ L(b))
= {ε, a, aa, aaa, . . .} · {a, b}
= {a, aa, aaa, . . . , b, ab, aab, . . .}.
Another Example
Given Σ = {a, b} and r = (a + b)∗(a + bb), find L(r).
$L(r) = L((a + b)^*)\, L(a + bb)$
$= (L(a + b))^*\, (L(a) \cup L(bb))$
$= (L(a) \cup L(b))^*\, (L(a) \cup (L(b))^2)$
$= \{a, b\}^* (\{a\} \cup \{bb\})$
$= \{a, b\}^* \{a, bb\}$
$= \{\varepsilon, a, b, aa, ab, ba, bb, aaa, aab, \ldots\}\{a, bb\}$
$= \{a, bb, aa, abb, ba, bbb, \ldots\}$.
(a + b)∗ represents any string of a’s and b’s.
{a}∗ = {ε, a, aa, . . .}.
More Examples
1. Given r = (aa)∗(bb)∗b, determine L(r).
The set of strings with an even number of a's followed by an odd number of b's, i.e.,
$L(r) = \{a^{2n} b^{2m+1} \mid n \ge 0,\ m \ge 0\}$.
2. For Σ = {0, 1}, give a regular expression r such that
L(r) = {w ∈ Σ∗ |w has at least one pair of consecutive zeros}.
r = (0 + 1)∗00(0 + 1)∗.
3. Find a regular expression for
L = {w ∈ {0, 1}∗ |w has no pair of consecutive zeros}.
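$r = \underbrace{(1^*011^*)^*(0 + \varepsilon)}_{\text{ending in 0}} + \underbrace{1^*(0 + \varepsilon)}_{\text{all 1's}}$.
Alternatively, we can view L as repetitions of the strings 1 and 01, possibly followed by a single 0, i.e., $r = (1 + 01)^*(0 + \varepsilon)$.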
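Answers like these can be sanity-checked by brute force over all short strings. The sketch below is an assumption-laden illustration rather than a proof: it rewrites the three expressions of examples 2 and 3 in Python `re` syntax (`|` for +, `0?` for 0 + ε) and compares each against the defining property for every binary string up to a chosen length. The names `all_strings` and `agree` are hypothetical.

```python
import re
from itertools import product

# Example 2: at least one pair of consecutive zeros.
HAS_00 = re.compile(r"(?:0|1)*00(?:0|1)*")
# Example 3, first answer: (1*011*)*(0 + ε) + 1*(0 + ε).
NO_00_FIRST = re.compile(r"(?:1*011*)*0?|1*0?")
# Example 3, second answer: (1 + 01)*(0 + ε).
NO_00_SECOND = re.compile(r"(?:1|01)*0?")


def all_strings(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)


def agree(max_len=8):
    """Check each pattern against the set definition on all short strings."""
    for w in all_strings("01", max_len):
        contains_00 = "00" in w
        if (HAS_00.fullmatch(w) is not None) != contains_00:
            return False
        if (NO_00_FIRST.fullmatch(w) is not None) != (not contains_00):
            return False
        if (NO_00_SECOND.fullmatch(w) is not None) != (not contains_00):
            return False
    return True


print(agree())
```

A passing check only covers strings up to the chosen length, so it supports the answers without replacing the argument that they are correct for all strings.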
Connection between Regular Expressions and Regular Languages
For every regular language there is a regular expression that denotes it, and every regular expression denotes a regular language.
Theorem. Let r be a regular expression. Then there exists some NFA that accepts L(r). Consequently, L(r) is a regular language.
Proof is shown in the next several slides!
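Proof. We begin with automata that accept the languages of the primitive regular expressions ∅, ε, and a ∈ Σ.
[Figure: three two-state NFAs with states q0 and q1. The NFA accepting ∅ has no transition from q0 to q1, the NFA accepting ε has an ε-transition, and the NFA accepting a has a transition labeled a. A fourth diagram shows M(r), a schematic NFA accepting L(r).]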
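[Figure: automaton for L(r1 + r2), built from M(r1) and M(r2) joined in parallel by four ε-transitions.]
[Figure: automaton for L(r1r2), built from M(r1) and M(r2) joined in series by three ε-transitions.]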
[Figure: automaton for $L(r_1^*)$, built from M(r1) and four ε-transitions.]
W.l.o.g., we consider an NFA with a single final state.
Just as we built automata for $L(r_1 + r_2)$, $L(r_1 r_2)$, and $L(r_1^*)$, we can build automata for arbitrary regular expressions. ∎
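The inductive proof doubles as an algorithm. The Python sketch below is a hedged illustration, not the slides' exact diagrams: it uses a standard Thompson-style ε-wiring that matches the transition counts of the constructions above (four ε-moves for union, three for concatenation, four for star) but whose precise layout is an assumption. Expressions are nested tuples, and the names `build`, `eps_closure`, `accepts`, and `R_EXAMPLE` are hypothetical. The test expression is r = (a + bb)∗(ba∗ + ε).

```python
from itertools import count


def build(r, edges, fresh):
    """Add an ε-NFA fragment for r to `edges`; return (start, final).

    Each edge is (source, label, target); label None means an ε-move.
    """
    kind = r[0]
    s, f = next(fresh), next(fresh)
    if kind == "empty":
        pass                                   # no path from s to f
    elif kind == "eps":
        edges.append((s, None, f))
    elif kind == "sym":
        edges.append((s, r[1], f))
    elif kind == "union":
        for sub in r[1:]:                      # fragments in parallel
            s1, f1 = build(sub, edges, fresh)
            edges.extend([(s, None, s1), (f1, None, f)])
    elif kind == "concat":                     # fragments in series
        s1, f1 = build(r[1], edges, fresh)
        s2, f2 = build(r[2], edges, fresh)
        edges.extend([(s, None, s1), (f1, None, s2), (f2, None, f)])
    elif kind == "star":
        s1, f1 = build(r[1], edges, fresh)
        # s -> f skips the fragment; f -> s allows another repetition.
        edges.extend([(s, None, s1), (f1, None, f),
                      (s, None, f), (f, None, s)])
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return s, f


def eps_closure(states, edges):
    """All states reachable from `states` using ε-moves only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(r, w):
    """Build the NFA for r and simulate it on the string w."""
    edges = []
    start, final = build(r, edges, count())
    current = eps_closure({start}, edges)
    for ch in w:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = eps_closure(moved, edges)
    return final in current


a, b = ("sym", "a"), ("sym", "b")
# r = (a + bb)*(ba* + ε)
R_EXAMPLE = ("concat",
             ("star", ("union", a, ("concat", b, b))),
             ("union", ("concat", b, ("star", a)), ("eps",)))

for w in ["", "abb", "bbb", "abbbaa", "bab", "abab"]:
    print(repr(w), accepts(R_EXAMPLE, w))
```

Every fragment has exactly one start state and one final state, which is what lets the cases compose without inspecting the inside of a sub-automaton.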
Example
Find an NFA which accepts L(r) where r = (a + bb)∗(ba∗ + ε).
Generalized Transition Graph
A generalized transition graph is a transition graph whose edges are labeled with regular expressions.
[Figure: generalized transition graph with edges labeled a, a + b, and c∗, accepting L(a∗ + a∗(a + b)c∗).]
Remove State q
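[Figure: before removal, states qi, q, and qj, with edges labeled a and c entering q, edges labeled b and d leaving q, and a self-loop labeled e on q. After removing state q, the remaining edges on qi and qj carry the labels ae∗b, ce∗d, ae∗d, and ce∗b.]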
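The removal step can be written as a small function on labeled edges. This Python sketch is an illustration under assumptions: the graph is a dictionary from (source, target) pairs to regular-expression strings in the slides' notation, the helper names `paren` and `remove_state` are hypothetical, and the orientation of the example edges (a from qi to q, b from q to qj, c from qj to q, d from q to qi) is inferred from the resulting labels rather than read off the lost figure.

```python
def paren(r):
    """Parenthesize any label longer than one symbol (conservative)."""
    return r if len(r) == 1 else f"({r})"


def remove_state(edges, q):
    """Return a new edge map with state q removed.

    For every pair of edges p --x--> q and q --y--> d, the bypass label
    x e* y is added from p to d, where e is the self-loop on q (if any).
    """
    loop = edges.get((q, q))
    star = f"{paren(loop)}*" if loop is not None else ""
    incoming = {p: x for (p, d), x in edges.items() if d == q and p != q}
    outgoing = {d: y for (s, d), y in edges.items() if s == q and d != q}
    # Keep every edge that does not touch q.
    result = {k: v for k, v in edges.items() if q not in k}
    for p, x in incoming.items():
        for d, y in outgoing.items():
            bypass = f"{paren(x)}{star}{paren(y)}"
            if (p, d) in result:
                result[(p, d)] = f"{result[(p, d)]}+{bypass}"  # union with old label
            else:
                result[(p, d)] = bypass
    return result


# Assumed orientation of the example's edges (see the lead-in).
graph = {
    ("qi", "q"): "a",
    ("q", "qj"): "b",
    ("qj", "q"): "c",
    ("q", "qi"): "d",
    ("q", "q"): "e",
}
for edge, label in sorted(remove_state(graph, "q").items()):
    print(edge, label)
```

The function never edits its input, so the same graph can be reduced in different state orders and the resulting expressions compared.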
Canonical Form
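[Figure: two-state generalized transition graph with initial state qi and final state qj: a loop labeled r1 on qi, an edge labeled r2 from qi to qj, an edge labeled r3 from qj to qi, and a loop labeled r4 on qj.]
The associated regular expression is $r = r_1^* r_2 (r_4 + r_3 r_1^* r_2)^*$.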
Regular Language and Regular Expression
Theorem. Let L be a regular language. Then there exists a regular expression r such that L = L(r).
Proof. Let M be an NFA that accepts L. W.l.o.g., we can assume that M has only one final state and that $q_0 \notin F$. We interpret the graph of M as a generalized transition graph and apply the state-removal construction illustrated in the previous slides, removing one state after another until we reach the canonical form. The regular expression for L then has the form given on the previous slide. ∎
Example
Find a regular expression for
$L = \{w \in \{a, b\}^* \mid n_a(w) \text{ is even and } n_b(w) \text{ is odd}\}$.
The associated DFA is given by
[Figure: DFA with four states EE, OE, OO, and EO, with transitions labeled a and b.]
The canonical graph is of the form
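[Figure: two-state graph on EE and EO. Loop on EE: aa + ab(bb)∗ba. Edge from EE to EO: b + ab(bb)∗a. Edge from EO to EE: b + a(bb)∗ba. Loop on EO: a(bb)∗a.]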
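Plugging the four labels into the canonical-form expression r = r1∗ r2 (r4 + r3 r1∗ r2)∗ gives a regular expression for L, which can be checked by brute force. The Python sketch below is a hedged illustration: it assumes EE is the initial state and EO the final state, assigns the labels to edges as obtained by removing OE and then OO, and writes them in Python `re` syntax. The names `canonical` and `check` are hypothetical.

```python
import re
from itertools import product


def canonical(r1, r2, r3, r4):
    """Build r1* r2 (r4 + r3 r1* r2)* as a Python regex string."""
    return (f"(?:{r1})*(?:{r2})"
            f"(?:(?:{r4})|(?:{r3})(?:{r1})*(?:{r2}))*")


# Labels of the canonical graph, with + written as | (assumed edge assignment).
R1 = r"aa|ab(?:bb)*ba"   # loop on EE
R2 = r"b|ab(?:bb)*a"     # edge EE -> EO
R3 = r"b|a(?:bb)*ba"     # edge EO -> EE
R4 = r"a(?:bb)*a"        # loop on EO

PATTERN = re.compile(canonical(R1, R2, R3, R4))


def check(max_len=8):
    """Compare the regex with the parity definition on all short strings."""
    for n in range(max_len + 1):
        for symbols in product("ab", repeat=n):
            w = "".join(symbols)
            expected = w.count("a") % 2 == 0 and w.count("b") % 2 == 1
            if (PATTERN.fullmatch(w) is not None) != expected:
                return False
    return True


print(check())
```

The check is exhaustive only up to the chosen length; it is meant to catch slips in the elimination steps, not to replace the proof of the theorem.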
Algebraic Laws: Commutativity and Associativity
- Commutative law for union: $r_1 + r_2 = r_2 + r_1$
- Concatenation is not commutative: in general, $r_1 r_2 \neq r_2 r_1$
- Associative law for union: $(r_1 + r_2) + r_3 = r_1 + (r_2 + r_3)$
- Associative law for concatenation: $(r_1 r_2) r_3 = r_1 (r_2 r_3)$
Algebraic Laws: Identities and Annihilators
- Identities (∅ for union, ε for concatenation)
∅ + r = r + ∅ = r
εr = rε = r
- Annihilator for concatenation (∅)
∅r = r∅ = ∅
Algebraic Laws: Distributive Laws
- Left distributive law of concatenation over union
$r_1(r_2 + r_3) = r_1 r_2 + r_1 r_3$
- Right distributive law of concatenation over union
$(r_2 + r_3) r_1 = r_2 r_1 + r_3 r_1$
Algebraic Laws: Idempotent Law
Definition. An operator is said to be idempotent if applying it to two copies of the same value yields that value.
- Common arithmetic operators are not idempotent: in general, x + x ≠ x and x × x ≠ x.
- In regular expressions, union is idempotent:
r + r = r
Algebraic Laws: Laws Involving Closures
- $(r^*)^* = r^*$
- $\emptyset^* = \varepsilon$
- $\varepsilon^* = \varepsilon$
- $r^+ = r r^* = r^* r$
- $r^* = r^+ + \varepsilon$
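Individual instances of the algebraic laws can be spot-checked by comparing the languages of both sides on all short strings. The Python sketch below is an illustration, not a proof: it uses the standard `re` module with `|` for union and `?` for "+ ε", fixes sample sub-expressions such as `ab` in place of r, and leaves out the laws involving ∅, which has no direct counterpart in this syntax. The names `language`, `same`, and `checks` are hypothetical.

```python
import re
from itertools import product


def language(pattern, alphabet="ab", max_len=6):
    """Strings over the alphabet, up to max_len, that fully match pattern."""
    rx = re.compile(pattern)
    words = ("".join(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n))
    return {w for w in words if rx.fullmatch(w)}


def same(p, q):
    """True if both patterns accept the same strings up to the length bound."""
    return language(p) == language(q)


checks = {
    "union is commutative": same(r"a|bb", r"bb|a"),
    "concatenation is not commutative": not same(r"ab", r"ba"),
    "union is associative": same(r"(a|b)|ab", r"a|(b|ab)"),
    "concatenation is associative": same(r"(ab)b", r"a(bb)"),
    "left distributive law": same(r"a(b|ab)", r"ab|aab"),
    "right distributive law": same(r"(b|ab)a", r"ba|aba"),
    "union is idempotent": same(r"ab|ab", r"ab"),
    "(r*)* = r*": same(r"((ab)*)*", r"(ab)*"),
    "r+ = rr* = r*r": same(r"(ab)+", r"ab(ab)*") and same(r"(ab)+", r"(ab)*ab"),
    "r* = r+ + ε": same(r"(ab)*", r"((ab)+)?"),
}

for law, holds in checks.items():
    print(f"{law}: {holds}")
```

Each entry tests one concrete instance, so a `True` here only illustrates a law; a `False` on a proposed identity, however, is a genuine counterexample.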