Download - Seungjin Choi - POSTECHmlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf · Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang

Regular Expressions

Seungjin Choi

Department of Computer Science and EngineeringPohang University of Science and Technology

77 Cheongam-ro, Nam-gu, Pohang 37673, [email protected]

1 / 27

Outline

I Regular expressions: One type of language-defining notation

I Regular expressions and finite automata

I Regular grammars (will be discussed in Lecture 4)

2 / 27

Operators

I Union:L ∪M = {w |w ∈ L or w ∈ M}.

I Concatenation:

LM = {w |w = xy , x ∈ L, y ∈ M}.

I Powers:L0 = {ε}, L1 = L, Ln+1 = LLn.

I Kleene Closure (star-closure):

L∗ = L0 ∪ L1 ∪ L2 · · · = ∪∞i=0Li .

3 / 27

∅i

Question: What are ∅0,∅i ,∅∗?

I ∅0 = {ε}.I ∅i , i ≥ 1 is empty since we cannot select any strings from the empty

set.

I ∅∗ = {ε}.

4 / 27

Regular Expressions

I A FA (DFA or NFA) is a ”blueprint” for constructing a machinerecognizing a regular language. (machine-like description)

I A regular expression is a ”user-friendly” declarative way of describinga regular language. (algebraic description)

I Involves a combination of:I strings of symbols from ΣI parenthesesI operators (+, ·, ∗). (union, concatenation, star-closure)

5 / 27

Examples

I {a, b, c} (regular language) ⇐⇒ a + b + c (regular expression).

I (a + b · c)∗ represents the star-closure of {a} ∪ {bc}, which is{ε, a, bc, abc, bca, bcbc, aaa, . . .}.

I 01∗ + 10∗: The language consisting of all strings that are either asingle 0 followed by any number of 1’s or a single 1 followed by anynumber of 0’s.

I UNIX grep command

I UNIX Lex (lexical analyzer generator) and Flex (fast

lex) tools

6 / 27

Inductive Definition of Regular Expressions

DefinitionLet Σ be a given alphabet. Then

1. ∅, ε, and a ∈ Σ are all regular expressions. These are called primitiveregular expressions.

2. If r1 and r2 are regular expressions, so are r1 + r2, r1 · r2, r∗1 , (r1).

3. A string is a regular expression if and only if it can be derived fromprimitive regular expressions by a finite number of applications of therules described above.

7 / 27

Language L(r)

DefinitionThe language L(r) denoted by any regular expression r , is defined by thefollowing rules:

1. ∅ is a regular expression denoting the empty set.

2. ε is a regular expression denoting {ε}.3. For every a ∈ Σ, a is a regular expression denoting {a}.4. L(r1 + r2) = L(r1) ∪ L(r2).

5. L(r1r2) = L(r1)L(r2).

6. L((r1)) = L(r1).

7. L(r∗1 ) = (L(r1))∗.

8 / 27

Example

Exhibit the language L (a∗ · (a + b)) in set notation.

Solution.

L (a∗ · (a + b)) = L(a∗)L(a + b)

= (L(a))∗ (L(a) ∪ L(b))

= {ε, a, aa, aaa, . . .} · {a, b}= {a, aa, aaa, . . . , b, ab, aab, . . .}.

9 / 27

Another Example

Given Σ = {a, b} and r = (a + b)∗(a + bb), find L(r).

L(r) = L ((a + b)∗) L(a + bb)

= (L(a + b))∗ (L(a) ∪ L(bb))

= (L(a) ∪ L(b))∗(L(a) ∪ L2(b)

)= {a, b}∗ ({a} ∪ {bb})= {a, b}∗{a, bb}= {ε, a, b, aa, ab, ba, bb, aaa, aab, . . .}{a, bb}= {a, bb, aa, abb, ba, bbb, . . .}.

(a + b)∗ represents any string of a’s and b’s.

{a}∗ = {ε, a, aa, . . .}.

10 / 27

More Examples

1. Given r = (aa)∗(bb)∗b, determine L(r).

The set of strings with an even number of a’s followed by an odd numberof b’s, i.e.,

L(r) = {a2nb2m+1 | n ≥ 0,m ≥ 0}.

2. For Σ = {0, 1}, give a regular expression r such that

L(r) = {w ∈ Σ∗ |w has at least one pair of consecutive zeros}.

r = (0 + 1)∗00(0 + 1)∗.

3. Find a regular expression for

L = {w ∈ {0, 1}∗ |w has no pair of consecutive zeros}.

r = (1∗011∗)∗ (0 + ε)︸︷︷︸ending in 0

+ 1∗(0 + ε)︸︷︷︸all 1′s

. Alternatively, we see L as the

repetitions of the strings 1 and 01, i.e., r = (1 + 01)∗(0 + ε).

11 / 27

Connection between Regular Expressions and RegularLanguages

For every regular language, there is a regular expression and vice versa.

TheoremLet r be a regular expression. Then, there exists some NFA that acceptsL(r). Consequently L(r) is a regular language.

Proof is shown in the next several slides!

12 / 27

ProofWe begin with automata that accept languages for simple regularexpressions ∅, ε, and a ∈ Σ.

q0 q1

q0 q1e

q0 q1a

M(r)

NFA accepts ∅

NFA accepts ε

NFA accepts a

NFA accepts L(r)

13 / 27

ǫ

ǫ

ǫ

ǫ

M(r1)

M(r2)

Automaton for L(r1 + r2)

ǫǫ ǫM(r1) M(r2)

∗

Automaton for L(r1r2)

14 / 27

ǫ

ǫǫ

ǫ

M(r1)

Automaton for L(r∗1 )

W.l.g we consider a NFA with a single final state.

Just like building automata for L(r1 + r2), L(r1r2), L(r∗1 ), we can buildautomata for arbitrary regular expressions. �

15 / 27

Example

Find an NFA which accepts L(r) where r = (a + bb)∗(ba∗ + ε).

16 / 27

Generalized Transition Graph

Generalized transition graph is a transition graph whose edges are labeledwith regular expressions.

c∗

a+ b

a

L (a∗ + a∗(a + b)c∗)

17 / 27

Remove State q

qi q qj

a b

cde

qi qj

ae∗b

ce∗d ce∗bae∗d

After removing state q

18 / 27

Canonical Form

qi qj

r1

r2

r3 r4

Associated regular expression is r = r∗1 r2 (r4 + r3r∗1 r2)∗.

19 / 27

Regular Language and Regular Expression

TheoremLet L be a regular language. Then there exists a regular expression r suchthat L = L(r).

Proof. Let M be an NFA that accepts L. W.l.g. we can assume that Mhas only one final state and that q0 /∈ F . We interpret the graph M as ageneralized transition graph and apply the construction (illustrated inprevious slides) to it. We use the method of removing state q. Wecontinue this process, removing one state after the other, until we reachthe canonical form. Then the regular expression is of the form in theprevious slide. �

20 / 27

ExampleFind a regular expression for

L = {w ∈ {a, b}∗ | na(w) is even and nb(w) is odd}.

The associate DFA is given by

EE OE

OOEO

a

a

a

a

b bbb

21 / 27

The canonical graph is of the form

EE EO

aa+ ab(bb)∗ba

b+ ab(bb)∗a

b+ a(bb)∗ba a(bb)∗a

22 / 27

Algebraic Laws: Commutativity and Associativity

I Commutative law for union: r1 + r2 = r2 + r1I Commutative law for concatenation: r1r2 6= r2r1I Associative law for union: (r1 + r2) + r3 = r1 + (r2 + r3)

I Associative law for concatenation: (r1r2)r3 = r1(r2r3)

23 / 27

Algebraic Laws: Identities and Annihilators

I Identities (∅ and ε)

∅ + r = r + ∅ = r

ε r = r ε = r

I Annihilator (∅)

∅ r = r ∅ = ∅

24 / 27

Algebraic Laws: Distributive Laws

I Left distributive lat of concatenation over union

r1(r2 + r3) = r1r2 + r1r3

I Right distributive law of concatenation over union

(r2 + r3)r1 = r2r1 + r3r1

25 / 27

Algebraic Laws: Idempotent Law

DefinitionAn operator is said to be idempotent if the result of applying it to two ofthe same values as arguments is that that value.

I Common arithmetic operators are not idempotent, i.e.,

x + x 6= x , x × x 6= x .

I In regular expressions, idempotent law for union

r + r = r

26 / 27

Algebraic Laws: Laws Involving Closures

I (r∗)∗ = r∗

I ∅∗ = ε

I ε∗ = ε

I r+ = rr∗ = r∗r

I r∗ = r+ + ε

27 / 27