Automata and Formal Languages - CM0081 Regular Expressions · Introduction Finite automata...

40
Automata and Formal Languages - CM0081 Regular Expressions Andrés Sicard-Ramírez Universidad EAFIT Semester 2018-2

Transcript of Automata and Formal Languages - CM0081 Regular Expressions · Introduction Finite automata...

Automata and Formal Languages - CM0081Regular Expressions

Andrés Sicard-Ramírez

Universidad EAFIT

Semester 2018-2

Introduction

Finite automataMachine-like descriptions of regular languages.

Regular expressionsAlgebraic description of regular languagesDeclarative (“user-friendly”) way to express the strings that belong tothe languageExample: (01)(01)∗ + (010)(010)∗Uses

Search commands (e.g. Grep)Lexical-analyzer generators (e.g. Lex and Alex)Domain specific languages (DSLs)

Regular Expressions 2/40

Introduction

Finite automataMachine-like descriptions of regular languages.

Regular expressionsAlgebraic description of regular languages

Declarative (“user-friendly”) way to express the strings that belong tothe languageExample: (01)(01)∗ + (010)(010)∗Uses

Search commands (e.g. Grep)Lexical-analyzer generators (e.g. Lex and Alex)Domain specific languages (DSLs)

Regular Expressions 3/40

Introduction

Finite automataMachine-like descriptions of regular languages.

Regular expressionsAlgebraic description of regular languagesDeclarative (“user-friendly”) way to express the strings that belong tothe language

Example: (01)(01)∗ + (010)(010)∗Uses

Search commands (e.g. Grep)Lexical-analyzer generators (e.g. Lex and Alex)Domain specific languages (DSLs)

Regular Expressions 4/40

Introduction

Finite automataMachine-like descriptions of regular languages.

Regular expressionsAlgebraic description of regular languagesDeclarative (“user-friendly”) way to express the strings that belong tothe languageExample: (01)(01)∗ + (010)(010)∗

UsesSearch commands (e.g. Grep)Lexical-analyzer generators (e.g. Lex and Alex)Domain specific languages (DSLs)

Regular Expressions 5/40

Introduction

Finite automataMachine-like descriptions of regular languages.

Regular expressionsAlgebraic description of regular languagesDeclarative (“user-friendly”) way to express the strings that belong tothe languageExample: (01)(01)∗ + (010)(010)∗Uses

Search commands (e.g. Grep)

Lexical-analyzer generators (e.g. Lex and Alex)Domain specific languages (DSLs)

Regular Expressions 6/40

Introduction

Finite automataMachine-like descriptions of regular languages.

Regular expressionsAlgebraic description of regular languagesDeclarative (“user-friendly”) way to express the strings that belong tothe languageExample: (01)(01)∗ + (010)(010)∗Uses

Search commands (e.g. Grep)Lexical-analyzer generators (e.g. Lex and Alex)

Domain specific languages (DSLs)

Regular Expressions 7/40

Introduction

Finite automataMachine-like descriptions of regular languages.

Regular expressionsAlgebraic description of regular languagesDeclarative (“user-friendly”) way to express the strings that belong tothe languageExample: (01)(01)∗ + (010)(010)∗Uses

Search commands (e.g. Grep)Lexical-analyzer generators (e.g. Lex and Alex)Domain specific languages (DSLs)

Regular Expressions 8/40

Operations on Languages

DefinitionLet 𝐿 and 𝑀 be languages. Then

Union 𝐿 ∪𝑀 = {𝑤 ∣ 𝑤 ∈ 𝐿 or 𝑤 ∈ 𝑀}

Concatenation 𝐿.𝑀 = {𝑤 ∣ 𝑤 = 𝑥𝑦, 𝑥 ∈ 𝐿 and 𝑦 ∈ 𝑀}

Powers 𝐿0 = {𝜀}𝐿1 = 𝐿

𝐿𝑘+1 = 𝐿.𝐿𝑘

Kleene closure 𝐿∗ =∞⋃𝑖=0

𝐿𝑖

Regular Expressions 9/40

Operations on Languages

ExamplesIf 𝐿 = {0, 1}, then 𝐿∗ consists of all strings of 0’s and 1’s and theempty word.

If 𝐿 = {0𝑛 ∣ 𝑛 ≥ 1}, then 𝐿∗ = 𝐿 ∪ {𝜀}.If 𝐿 = {0, 11}, then 𝐿∗ consists of the empty word and those stringsof 0’s and 1’s such that the 1’s come in pairs.∅0 = {𝜀}∅𝑖 = ∅, for 𝑖 ≥ 1∅∗ = {𝜀}

Regular Expressions 10/40

Operations on Languages

ExamplesIf 𝐿 = {0, 1}, then 𝐿∗ consists of all strings of 0’s and 1’s and theempty word.If 𝐿 = {0𝑛 ∣ 𝑛 ≥ 1}, then 𝐿∗ = 𝐿 ∪ {𝜀}.

If 𝐿 = {0, 11}, then 𝐿∗ consists of the empty word and those stringsof 0’s and 1’s such that the 1’s come in pairs.∅0 = {𝜀}∅𝑖 = ∅, for 𝑖 ≥ 1∅∗ = {𝜀}

Regular Expressions 11/40

Operations on Languages

ExamplesIf 𝐿 = {0, 1}, then 𝐿∗ consists of all strings of 0’s and 1’s and theempty word.If 𝐿 = {0𝑛 ∣ 𝑛 ≥ 1}, then 𝐿∗ = 𝐿 ∪ {𝜀}.If 𝐿 = {0, 11}, then 𝐿∗ consists of the empty word and those stringsof 0’s and 1’s such that the 1’s come in pairs.

∅0 = {𝜀}∅𝑖 = ∅, for 𝑖 ≥ 1∅∗ = {𝜀}

Regular Expressions 12/40

Operations on Languages

ExamplesIf 𝐿 = {0, 1}, then 𝐿∗ consists of all strings of 0’s and 1’s and theempty word.If 𝐿 = {0𝑛 ∣ 𝑛 ≥ 1}, then 𝐿∗ = 𝐿 ∪ {𝜀}.If 𝐿 = {0, 11}, then 𝐿∗ consists of the empty word and those stringsof 0’s and 1’s such that the 1’s come in pairs.∅0 = {𝜀}∅𝑖 = ∅, for 𝑖 ≥ 1∅∗ = {𝜀}

Regular Expressions 13/40

Syntax: What the Regular Expressions Are

DefinitionLet Σ be an alphabet. The regular expressions on Σ are inductivelydefined by:

Basis step: 𝜀 and ∅ are regular expressions. If 𝑎 ∈ Σ then a is aregular expression.Inductive step: Let 𝐸 and 𝐹 be regular expressions. Then 𝐸 + 𝐹 ,𝐸𝐹 , 𝐸∗ and (𝐸) are regular expressions.

Regular Expressions 14/40

Syntax: What the Regular Expressions Are

DefinitionLet Σ be an alphabet. The regular expressions on Σ are inductivelydefined by:

Basis step: 𝜀 and ∅ are regular expressions. If 𝑎 ∈ Σ then a is aregular expression.

Inductive step: Let 𝐸 and 𝐹 be regular expressions. Then 𝐸 + 𝐹 ,𝐸𝐹 , 𝐸∗ and (𝐸) are regular expressions.

Regular Expressions 15/40

Syntax: What the Regular Expressions Are

DefinitionLet Σ be an alphabet. The regular expressions on Σ are inductivelydefined by:

Basis step: 𝜀 and ∅ are regular expressions. If 𝑎 ∈ Σ then a is aregular expression.Inductive step: Let 𝐸 and 𝐹 be regular expressions. Then 𝐸 + 𝐹 ,𝐸𝐹 , 𝐸∗ and (𝐸) are regular expressions.

Regular Expressions 16/40

Semantics: What Language Denotes a Regular Expression

DefinitionLet 𝐸 be a regular expression. The language denoted by 𝐸, denotedby 𝐿(𝐸), is inductively defined by:

Basis step:𝐿(𝜀) = {𝜀},𝐿(∅) = ∅,𝐿(a) = {𝑎}.

Inductive step: Let 𝐿(𝐸) and 𝐿(𝐹) be the languages denoted by theregular expressions 𝐸 and 𝐹 , then

𝐿(𝐸 + 𝐹) = 𝐿(𝐸) ∪ 𝐿(𝐹),𝐿(𝐸𝐹) = 𝐿(𝐸)𝐿(𝐹),𝐿(𝐸∗) = (𝐿(𝐸))∗,𝐿((𝐸)) = 𝐿(𝐸).

Regular Expressions 17/40

Semantics: What Language Denotes a Regular Expression

DefinitionLet 𝐸 be a regular expression. The language denoted by 𝐸, denotedby 𝐿(𝐸), is inductively defined by:

Basis step:𝐿(𝜀) = {𝜀},𝐿(∅) = ∅,𝐿(a) = {𝑎}.

Inductive step: Let 𝐿(𝐸) and 𝐿(𝐹) be the languages denoted by theregular expressions 𝐸 and 𝐹 , then

𝐿(𝐸 + 𝐹) = 𝐿(𝐸) ∪ 𝐿(𝐹),𝐿(𝐸𝐹) = 𝐿(𝐸)𝐿(𝐹),𝐿(𝐸∗) = (𝐿(𝐸))∗,𝐿((𝐸)) = 𝐿(𝐸).

Regular Expressions 18/40

Semantics: What Language Denotes a Regular Expression

DefinitionLet 𝐸 be a regular expression. The language denoted by 𝐸, denotedby 𝐿(𝐸), is inductively defined by:

Basis step:𝐿(𝜀) = {𝜀},𝐿(∅) = ∅,𝐿(a) = {𝑎}.

Inductive step: Let 𝐿(𝐸) and 𝐿(𝐹) be the languages denoted by theregular expressions 𝐸 and 𝐹 , then

𝐿(𝐸 + 𝐹) = 𝐿(𝐸) ∪ 𝐿(𝐹),𝐿(𝐸𝐹) = 𝐿(𝐸)𝐿(𝐹),𝐿(𝐸∗) = (𝐿(𝐸))∗,𝐿((𝐸)) = 𝐿(𝐸).

Regular Expressions 19/40

Precedence of Operators

Order of precedence, from highest to lowest: (), ∗, . and +.The operators . and + are left-associative.

Example01∗ + 1 = (0(1∗)) + 1

≠ (01)∗ + 1≠ 0(1∗ + 1)

Regular Expressions 20/40

Precedence of Operators

Order of precedence, from highest to lowest: (), ∗, . and +.The operators . and + are left-associative.

Example01∗ + 1 = (0(1∗)) + 1

≠ (01)∗ + 1≠ 0(1∗ + 1)

Regular Expressions 21/40

Languages Denoted by Regular Expressions

Example𝐸 𝐿(𝐸)a + b 𝐿(a) ∪ 𝐿(b) = {𝑎} ∪ {𝑏} = {𝑎, 𝑏}

a∗ {𝜀, 𝑎, 𝑎𝑎, 𝑎𝑎𝑎,…}

(a + b)(a + b) 𝐿(a + b).𝐿(a + b) ={𝑎, 𝑏}.{𝑎, 𝑏} = {𝑎𝑎, 𝑎𝑏, 𝑏𝑎, 𝑏𝑏}

a + (ab)∗ {𝑎, 𝜀, 𝑎𝑏, 𝑎𝑏𝑎𝑏, 𝑎𝑏𝑎𝑏𝑎𝑏,…}

(0 + 1)∗01(0 + 1)∗ {𝑥01𝑦 ∣ 𝑥, 𝑦 ∈ {0, 1}∗}

ai(a1 + a2 +⋯+ an)∗ {𝑤 ∈ Σ∗ ∣ 𝑤 starts by 𝑎𝑖}

Regular Expressions 22/40

Languages Denoted by Regular Expressions

Example𝐸 𝐿(𝐸)a + b 𝐿(a) ∪ 𝐿(b) = {𝑎} ∪ {𝑏} = {𝑎, 𝑏}

a∗ {𝜀, 𝑎, 𝑎𝑎, 𝑎𝑎𝑎,…}

(a + b)(a + b) 𝐿(a + b).𝐿(a + b) ={𝑎, 𝑏}.{𝑎, 𝑏} = {𝑎𝑎, 𝑎𝑏, 𝑏𝑎, 𝑏𝑏}

a + (ab)∗ {𝑎, 𝜀, 𝑎𝑏, 𝑎𝑏𝑎𝑏, 𝑎𝑏𝑎𝑏𝑎𝑏,…}

(0 + 1)∗01(0 + 1)∗ {𝑥01𝑦 ∣ 𝑥, 𝑦 ∈ {0, 1}∗}

ai(a1 + a2 +⋯+ an)∗ {𝑤 ∈ Σ∗ ∣ 𝑤 starts by 𝑎𝑖}

Regular Expressions 23/40

Languages Denoted by Regular Expressions

Example𝐸 𝐿(𝐸)a + b 𝐿(a) ∪ 𝐿(b) = {𝑎} ∪ {𝑏} = {𝑎, 𝑏}

a∗ {𝜀, 𝑎, 𝑎𝑎, 𝑎𝑎𝑎,…}

(a + b)(a + b) 𝐿(a + b).𝐿(a + b) ={𝑎, 𝑏}.{𝑎, 𝑏} = {𝑎𝑎, 𝑎𝑏, 𝑏𝑎, 𝑏𝑏}

a + (ab)∗ {𝑎, 𝜀, 𝑎𝑏, 𝑎𝑏𝑎𝑏, 𝑎𝑏𝑎𝑏𝑎𝑏,…}

(0 + 1)∗01(0 + 1)∗ {𝑥01𝑦 ∣ 𝑥, 𝑦 ∈ {0, 1}∗}

ai(a1 + a2 +⋯+ an)∗ {𝑤 ∈ Σ∗ ∣ 𝑤 starts by 𝑎𝑖}

Regular Expressions 24/40

Languages Denoted by Regular Expressions

Example𝐸 𝐿(𝐸)a + b 𝐿(a) ∪ 𝐿(b) = {𝑎} ∪ {𝑏} = {𝑎, 𝑏}

a∗ {𝜀, 𝑎, 𝑎𝑎, 𝑎𝑎𝑎,…}

(a + b)(a + b) 𝐿(a + b).𝐿(a + b) ={𝑎, 𝑏}.{𝑎, 𝑏} = {𝑎𝑎, 𝑎𝑏, 𝑏𝑎, 𝑏𝑏}

a + (ab)∗ {𝑎, 𝜀, 𝑎𝑏, 𝑎𝑏𝑎𝑏, 𝑎𝑏𝑎𝑏𝑎𝑏,…}

(0 + 1)∗01(0 + 1)∗ {𝑥01𝑦 ∣ 𝑥, 𝑦 ∈ {0, 1}∗}

ai(a1 + a2 +⋯+ an)∗ {𝑤 ∈ Σ∗ ∣ 𝑤 starts by 𝑎𝑖}

Regular Expressions 25/40

Languages Denoted by Regular Expressions

Example𝐸 𝐿(𝐸)a + b 𝐿(a) ∪ 𝐿(b) = {𝑎} ∪ {𝑏} = {𝑎, 𝑏}

a∗ {𝜀, 𝑎, 𝑎𝑎, 𝑎𝑎𝑎,…}

(a + b)(a + b) 𝐿(a + b).𝐿(a + b) ={𝑎, 𝑏}.{𝑎, 𝑏} = {𝑎𝑎, 𝑎𝑏, 𝑏𝑎, 𝑏𝑏}

a + (ab)∗ {𝑎, 𝜀, 𝑎𝑏, 𝑎𝑏𝑎𝑏, 𝑎𝑏𝑎𝑏𝑎𝑏,…}

(0 + 1)∗01(0 + 1)∗ {𝑥01𝑦 ∣ 𝑥, 𝑦 ∈ {0, 1}∗}

ai(a1 + a2 +⋯+ an)∗ {𝑤 ∈ Σ∗ ∣ 𝑤 starts by 𝑎𝑖}

Regular Expressions 26/40

Languages Denoted by Regular Expressions

Example𝐸 𝐿(𝐸)a + b 𝐿(a) ∪ 𝐿(b) = {𝑎} ∪ {𝑏} = {𝑎, 𝑏}

a∗ {𝜀, 𝑎, 𝑎𝑎, 𝑎𝑎𝑎,…}

(a + b)(a + b) 𝐿(a + b).𝐿(a + b) ={𝑎, 𝑏}.{𝑎, 𝑏} = {𝑎𝑎, 𝑎𝑏, 𝑏𝑎, 𝑏𝑏}

a + (ab)∗ {𝑎, 𝜀, 𝑎𝑏, 𝑎𝑏𝑎𝑏, 𝑎𝑏𝑎𝑏𝑎𝑏,…}

(0 + 1)∗01(0 + 1)∗ {𝑥01𝑦 ∣ 𝑥, 𝑦 ∈ {0, 1}∗}

ai(a1 + a2 +⋯+ an)∗ {𝑤 ∈ Σ∗ ∣ 𝑤 starts by 𝑎𝑖}

Regular Expressions 27/40

Languages Denoted by Regular Expressions

ExampleWrite a regular expression for the language 𝐿 defined by

𝐿 = {𝑤 ∈ {0, 1}∗ ∣ 0 and 1 alternate in 𝑤}.

Solution: (01)∗ + (10)∗ + 0(10)∗ + 1(01)∗

Other solution: (𝜀 + 1)(01)∗(𝜀 + 0)

ExampleThe regular expression

(10 + 0)∗(𝜀 + 1)denotes the set of strings of 0’s and 1’s that have no two adjacent 1’s.

Regular Expressions 28/40

Languages Denoted by Regular Expressions

ExampleWrite a regular expression for the language 𝐿 defined by

𝐿 = {𝑤 ∈ {0, 1}∗ ∣ 0 and 1 alternate in 𝑤}.

Solution: (01)∗ + (10)∗ + 0(10)∗ + 1(01)∗

Other solution: (𝜀 + 1)(01)∗(𝜀 + 0)

ExampleThe regular expression

(10 + 0)∗(𝜀 + 1)denotes the set of strings of 0’s and 1’s that have no two adjacent 1’s.

Regular Expressions 29/40

Languages Denoted by Regular Expressions

ExampleWrite a regular expression for the language 𝐿 defined by

𝐿 = {𝑤 ∈ {0, 1}∗ ∣ 0 and 1 alternate in 𝑤}.

Solution: (01)∗ + (10)∗ + 0(10)∗ + 1(01)∗

Other solution: (𝜀 + 1)(01)∗(𝜀 + 0)

ExampleThe regular expression

(10 + 0)∗(𝜀 + 1)denotes the set of strings of 0’s and 1’s that have no two adjacent 1’s.

Regular Expressions 30/40

Languages Denoted by Regular Expressions

ExampleWrite a regular expression for the language 𝐿 defined by

𝐿 = {𝑤 ∈ {0, 1}∗ ∣ 0 and 1 alternate in 𝑤}.

Solution: (01)∗ + (10)∗ + 0(10)∗ + 1(01)∗

Other solution: (𝜀 + 1)(01)∗(𝜀 + 0)

ExampleThe regular expression

(10 + 0)∗(𝜀 + 1)denotes the set of strings of 0’s and 1’s that have no two adjacent 1’s.

Regular Expressions 31/40

Languages Denoted by Regular Expressions

ExampleWrite a regular expression for denoting the set of strings over Σ = {0, 1}not ending in 01.

Solution: 𝜀 + 0 + 1 + (0 + 1)∗(00 + 10 + 11)

Regular Expressions 32/40

Languages Denoted by Regular Expressions

ExampleWrite a regular expression for denoting the set of strings over Σ = {0, 1}not ending in 01.Solution: 𝜀 + 0 + 1 + (0 + 1)∗(00 + 10 + 11)

Regular Expressions 33/40

Libraries

RemarkTheoretical regular expressions ≠ practical regular expressions.

Some programming languages with support to regular expressions.NET, C, Haskell, Java, Mathematica, MATLAB and Perl.

Regular Expressions 34/40

Libraries

RemarkTheoretical regular expressions ≠ practical regular expressions.

Some programming languages with support to regular expressions.NET, C, Haskell, Java, Mathematica, MATLAB and Perl.

Regular Expressions 35/40

Algorithms

AlgorithmsSee the Haskell implementation of some algorithms on regular expressionsin the course homepage.

Regular Expressions 36/40

Applications

Some programs that use regular expressionsGrep: print lines matching a patternAwk: pattern scanning and processing languageSed: stream editor for filtering and transforming textAlex, Flex and Lex: lexical-analyzer generatorsEmacs and Vim: test editorsMySQL, Oracle and PostgreSQL: databases

Regular Expressions 37/40

Applications

ReadingApplications of Regular Expressions (Hopcroft, Motwani and Ullman 2007,§ 3.3).

𝐿+ def= 𝐿𝐿∗ (one or many times operator)

𝐿? def= 𝜀 + 𝐿 (zero or one time operator)

Regular Expressions 38/40

An Implementation: A Regular Expression Matcher

“Rob’s implementation itself is asuperb example of beautiful code:compact, elegant, efficient, anduseful. It’s one of the best examplesof recursion that I have ever seen.”

Brian Kernighan, p. 3.

Regular Expressions 39/40

References

Hopcroft, J. E., Motwani, R. and Ullman, J. D. (2007). Introduction toAutomata theory, Languages, and Computation. 3rd ed. Pearson Education(cit. on p. 38).

Regular Expressions 40/40