Regular Expressions Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida...
Regular Expressions
Prepared by
Manuel E. Bermúdez, Ph.D.
Associate Professor
University of Florida
Programming Language Translators
Regular Expressions
• A compact, easy-to-read language description.
• Uses operators to denote the language constructors described earlier, building “complex” languages from simple “atomic” ones.
Definition: A regular expression over an alphabet Σ is recursively defined as follows:
1. ø denotes the language ø.
2. ε denotes the language {ε}.
3. a denotes the language {a}, for all a ∈ Σ.
4. (P + Q) denotes L(P) ∪ L(Q), where P, Q are r.e.’s.
5. (PQ) denotes L(P)·L(Q), where P, Q are r.e.’s.
6. P* denotes L(P)*, where P is an r.e.
To prevent excessive parentheses, we assume left associativity, with the following operator precedence hierarchy, from most to least binding: *, ·, +
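To make the recursive definition concrete, here is a small Python sketch that computes the strings of L(r) up to a given length, with one case per rule. The tuple encoding of regular expressions is a hypothetical choice for illustration. The length bound is needed only because L(P*) is usually infinite.

```python
# Hypothetical encoding of regular expressions as nested tuples:
#   ("empty",)  ("eps",)  ("sym", "a")  ("alt", P, Q)  ("cat", P, Q)  ("star", P)

def lang(r, n):
    """Return the strings of L(r) whose length is at most n."""
    kind = r[0]
    if kind == "empty":          # rule 1: ø denotes the empty language
        return set()
    if kind == "eps":            # rule 2: ε denotes {ε}
        return {""}
    if kind == "sym":            # rule 3: a denotes {a}
        return {r[1]} if len(r[1]) <= n else set()
    if kind == "alt":            # rule 4: L(P) ∪ L(Q)
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":            # rule 5: L(P)·L(Q)
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n) if len(x + y) <= n}
    if kind == "star":           # rule 6: L(P)*, i.e. ε plus catenations of P-strings
        base = {y for y in lang(r[1], n) if y}   # empty pieces add nothing new
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")

# (0 + 1)*1, grouped according to the precedence rules above
zero_or_one = ("alt", ("sym", "0"), ("sym", "1"))
ends_in_1 = ("cat", ("star", zero_or_one), ("sym", "1"))
print(sorted(lang(ends_in_1, 2)))   # ['01', '1', '11']
```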
Examples:
(0 + 1)*: any string of 0’s and 1’s.
(0 + 1)*1: any string of 0’s and 1’s, ending with a 1.
1*01*: any string of 1’s with a single 0 inserted.
Letter (Letter + Digit)*: an identifier.
Digit Digit*: an integer.
Quote Char* Quote: a string. †
# Char* Eoln: a comment. †
{Char*}: another comment. †
† Assuming that Char does not contain quotes, eoln’s, or }.
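These examples can be tried with Python's `re` module, using the usual translation of + into `|`. This is a sketch: the definitions of Letter, Digit, Eoln, and Char are assumptions (the slides leave them abstract), with Char following the † note. Python patterns are much richer than the regular expressions defined above, but these particular ones use only union, catenation, and *.

```python
import re

# Slide notation -> Python `re`: "+" becomes "|"; catenation and * carry over unchanged.
# Assumptions (not from the slides): Letter = [A-Za-z], Digit = [0-9], Eoln = "\n",
# and Char = any character except a quote, an end-of-line, or "}" (per the † note).
LETTER, DIGIT, EOLN = "[A-Za-z]", "[0-9]", r"\n"
CHAR = r'[^"\n}]'

PATTERNS = {
    "binary": r"(0|1)*",
    "ends_in_1": r"(0|1)*1",
    "single_0": r"1*01*",
    "identifier": f"{LETTER}({LETTER}|{DIGIT})*",
    "integer": f"{DIGIT}{DIGIT}*",
    "string": f'"{CHAR}*"',
    "comment": f"#{CHAR}*{EOLN}",
    "brace_comment": r"\{" + CHAR + r"*\}",
}

def matches(name, text):
    """True if the whole text belongs to the language of the named pattern."""
    return re.fullmatch(PATTERNS[name], text) is not None

print(matches("ends_in_1", "0101"), matches("single_0", "1001"))  # True False
```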
Conversion from Right-linear grammars to regular expressions
Example:
S → aS
  → bR
  → ε
R → aS
What does S → aS mean? L(S) ⊇ {a}·L(S).
S → bR means L(S) ⊇ {b}·L(R).
S → ε means L(S) ⊇ {ε}.
Together, they mean that L(S) = {a}·L(S) + {b}·L(R) + {ε}, or S = aS + bR + ε.
Similarly, R → aS means R = aS.
Thus,
S = aS + bR + ε
R = aS
This is a system of simultaneous equations, in which the variables are the nonterminals.
Solving systems of simultaneous equations:
S = aS + bR + ε
R = aS
Back substitute R = aS:
S = aS + baS + ε
  = (a + ba)S + ε
Question: What to do with equations of the form X = αX + β?
Answer: β ⊆ L(X), so αβ ⊆ L(X), ααβ ⊆ L(X), αααβ ⊆ L(X), …
Thus α*β = L(X) (taking the smallest solution).
In our case,
S = (a + ba)S + ε = (a + ba)*ε = (a + ba)*
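A quick sanity check of this result, assuming nothing beyond the grammar itself: generate every string of length at most 8 that the grammar derives, and compare against the strings matching (a + ba)*, written `(a|ba)*` in Python syntax. The dictionary encoding of the grammar is a hypothetical choice. A bounded brute-force comparison like this can catch mistakes but does not prove equivalence.

```python
import re
from itertools import product

# The example grammar as a dict (hypothetical encoding; "" stands for ε):
#   S -> aS | bR | ε,   R -> aS
GRAMMAR = {"S": ["aS", "bR", ""], "R": ["aS"]}

def generate(grammar, start, max_len):
    """Terminal strings of length <= max_len derivable from start in a right-linear grammar."""
    results, seen, stack = set(), set(), [("", start)]
    while stack:
        item = stack.pop()
        if item in seen:                 # guards against cycles of unit productions
            continue
        seen.add(item)
        prefix, nt = item                # terminals so far, pending nonterminal (or None)
        if nt is None:
            results.add(prefix)
            continue
        for rhs in grammar[nt]:
            # Right-linear: only the last symbol of a right-hand side may be a nonterminal.
            body, nxt = (rhs[:-1], rhs[-1]) if rhs and rhs[-1] in grammar else (rhs, None)
            if len(prefix + body) <= max_len:
                stack.append((prefix + body, nxt))
    return results

def regex_strings(pattern, alphabet, max_len):
    """All strings over alphabet of length <= max_len that fully match pattern."""
    return {w for k in range(max_len + 1)
            for w in map("".join, product(alphabet, repeat=k))
            if re.fullmatch(pattern, w)}

# S = (a + ba)*  is written  (a|ba)*  in Python syntax.
print(generate(GRAMMAR, "S", 8) == regex_strings(r"(a|ba)*", "ab", 8))   # True
```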
Right-linear regular grammar → regular expression
1. A = α1 + α2 + … + αn if the productions for A are A → α1, A → α2, …, A → αn.
2. If an equation is of the form X = α, where X does not appear in α, then replace every occurrence of X with α in all other equations, and delete the equation X = α.
3. If an equation is of the form X = αX + β, where X does not occur in either α or β, then replace the equation with X = α*β.
Note: Some algebraic manipulations may be needed to obtain the form X = αX + β.
Important: Catenation is not commutative!!
Example:
S → a
  → bU
  → bR
R → abaU
  → U
U → aS
  → b
S = a + bU + bR
R = abaU + U = (aba + ε)U
U = aS + b
Back substitute R:
S = a + bU + b(aba + ε)U
U = aS + b
Back substitute U:
S = a + b(aS + b) + b(aba + ε)(aS + b)
  = a + baS + bb + babaaS + babab + baS + bb
  = (ba + babaa)S + (a + bb + babab)     (the repeated terms baS and bb are merged)
Therefore, S = (ba + babaa)*(a + bb + babab)
Summarizing:
[Figure: conversions among RGR, RGL, RE, NFA, DFA, and minimum DFA, with conversions marked “Done” or “Soon”]
Regular Expression → NFA
Recursively build the FSA, mimicking the structure of the regular expression. Each FSA built has one start state and one final state.
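Conversions:
• if ø: [Figure: start state 1 and final state 2, with no transition between them]
• if ε: [Figure: an ε-transition from state 1 to state 2]
• if a: [Figure: an a-transition from state 1 to state 2]
• if P + Q: [Figure: ε-transitions from a new start state 1 into the machines for P and Q, and from their final states into a new final state 2]
• if P·Q: [Figure: the machine for P joined to the machine for Q by an ε-transition, or with P's final state and Q's start state merged]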
• if P*: [Figure: the machine for P placed between states 1 and 2, connected by ε-transitions]
Example: (b (aba + ε) a)*
[Figures: the NFA for (b (aba + ε) a)* built bottom up, one subexpression at a time; the final machine has states 1 to 15]
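The bottom-up construction can be sketched in Python as below. This is a standard Thompson-style construction covering the ø, ε, a, P + Q, P·Q, and P* cases above; state numbering, and possibly minor details of the P* case, may differ from the slides' figures. The tuple encoding and function names are hypothetical. Acceptance is tested by simulating the NFA on sets of states, and the result is cross-checked against Python's `re`.

```python
import re
from itertools import count, product

# Hypothetical AST: ("empty",), ("eps",), ("sym", c), ("alt", P, Q), ("cat", P, Q), ("star", P).
# Edges are (source, label, target); label None means ε.

def build(r, edges, fresh):
    """Add edges for r; return (start, final). Every fragment has one start and one final state."""
    kind = r[0]
    if kind in ("empty", "eps", "sym"):
        s, f = next(fresh), next(fresh)
        if kind != "empty":                  # ø: two states and no edge at all
            edges.append((s, r[1] if kind == "sym" else None, f))
        return s, f
    if kind == "alt":                        # new start/final joined to P and Q by ε
        s, f = next(fresh), next(fresh)
        for sub in (r[1], r[2]):
            ps, pf = build(sub, edges, fresh)
            edges += [(s, None, ps), (pf, None, f)]
        return s, f
    if kind == "cat":                        # P's final --ε--> Q's start
        ps, pf = build(r[1], edges, fresh)
        qs, qf = build(r[2], edges, fresh)
        edges.append((pf, None, qs))
        return ps, qf
    if kind == "star":                       # skip P, enter P, or repeat P, all via ε
        s, f = next(fresh), next(fresh)
        ps, pf = build(r[1], edges, fresh)
        edges += [(s, None, ps), (pf, None, f), (s, None, f), (pf, None, ps)]
        return s, f
    raise ValueError(f"unknown node {kind!r}")

def eps_closure(states, edges):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, label, t in edges:
            if p == q and label is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def nfa_accepts(r, word):
    edges = []
    start, final = build(r, edges, count(1))
    current = eps_closure({start}, edges)
    for ch in word:
        current = eps_closure({t for p, label, t in edges if p in current and label == ch}, edges)
    return final in current

def seq(*parts):
    """Right-nested catenation helper: seq(a, b, c) = a(bc)."""
    return parts[0] if len(parts) == 1 else ("cat", parts[0], seq(*parts[1:]))

a, b = ("sym", "a"), ("sym", "b")
example = ("star", seq(b, ("alt", seq(a, b, a), ("eps",)), a))    # (b (aba + ε) a)*
print(nfa_accepts(example, "babaaba"))   # True: (babaa)(ba)

# Cross-check against Python's re on every string over {a, b} up to length 7.
words = ["".join(p) for k in range(8) for p in product("ab", repeat=k)]
print(all(nfa_accepts(example, w) == bool(re.fullmatch(r"(b(aba|)a)*", w)) for w in words))  # True
```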
Regular Expression → NFA (Algorithm 2)
Start with: [Figure: a single transition labeled E]
Apply rules: [Figures: rewriting rules for transitions labeled a*, a + b, and ab]
Algorithm 1:
• Builds FSA bottom up
• Good for machines
• Bad for humans
Algorithm 2:
• Builds FSA top down
• Bad for machines
• Good for humans
(Arguable.)
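Example (Algorithm 2): (a + b)* (aa + bb)
[Figures: successive expansions, from a single transition labeled (a + b)* (aa + bb), to transitions labeled (a + b)* and aa + bb, to an NFA with transitions on a, b, and ε]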
Example (Algorithm 2): ba(a + b)* ab
[Figure: the resulting NFA, with transitions on a, b, and ε]
Deterministic Finite-State Automata (DFA’s)
Definition: A deterministic FSA is defined just like an NFA, except that
δ: Q × Σ → Q, rather than δ: Q × (Σ ∪ {ε}) → 2^Q.
Thus, both an ε-transition and two transitions on the same symbol leaving the same state (e.g., two a-transitions) are impossible.
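In code, the difference shows up in the type of δ: for a DFA it maps a (state, symbol) pair to a single state, and there is no ε entry. A minimal sketch, using a DFA for (a + ba)* (the language of the earlier grammar example) built by hand for illustration; treating a missing entry as rejection is an assumption, matching the “---” entries in the tables below.

```python
# A hand-built DFA for (a + ba)*. δ maps (state, symbol) to ONE state; there is no ε key,
# so every move consumes a symbol. A missing entry means no move is possible.
DELTA = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0}
START, FINALS = 0, {0}

def dfa_accepts(delta, start, finals, word):
    state = start
    for ch in word:
        if (state, ch) not in delta:     # undefined transition: reject
            return False
        state = delta[(state, ch)]
    return state in finals

print([w for w in ["", "a", "ba", "bb", "aba", "ab"] if dfa_accepts(DELTA, START, FINALS, w)])
# ['', 'a', 'ba', 'aba']
```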
Every transition of a DFA consumes a symbol. Fortunately, DFA’s are just as powerful as NFA’s.
Theorem: For every NFA there exists an equivalent (accepting the same language) DFA.
Conversion from NFA’s to DFA’s:
• “Simulate” all moves of the NFA with the DFA.
• The start state of the DFA is the start state of the NFA (say, S), together with the states that are ε-reachable from S.
• Each state in the DFA is a subset of the set of states of the NFA; it captures the notion of being in “any one of” a number of states.
• New states in the DFA are constructed, starting from the start state, by calculating the set of NFA states reachable from an existing DFA state through one symbol followed by any ε-moves.
• The final states in the DFA are those that contain any final state of the NFA.
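The steps above are the subset construction. A Python sketch follows, assuming the NFA is a dictionary from (state, symbol) pairs to sets of states, with the key None standing for ε. The NFA used here is one reconstruction of the a*b + ba* machine in the example that follows (its figure did not survive), chosen so that its DFA matches that example's table.

```python
from collections import deque

def eps_closure(states, nfa):
    """All NFA states reachable from `states` by ε-moves alone (ε is the key None)."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for t in nfa.get((q, None), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def subset_construction(nfa, start, finals, alphabet):
    """nfa maps (state, symbol or None) -> set of states. Returns (delta, start, finals)
    of a DFA whose states are frozensets of NFA states."""
    d_start = eps_closure({start}, nfa)
    delta, seen, todo = {}, {d_start}, deque([d_start])
    while todo:
        S = todo.popleft()
        for c in alphabet:
            moved = {t for q in S for t in nfa.get((q, c), ())}
            if not moved:
                continue                     # undefined entry, shown as --- in the table
            T = eps_closure(moved, nfa)
            delta[(S, c)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return delta, d_start, {S for S in seen if S & finals}

# Reconstructed NFA for a*b + ba* (an assumption; final state 6).
NFA = {(1, None): {2}, (2, "a"): {2}, (2, None): {3}, (3, "b"): {6},
       (1, "b"): {4}, (4, None): {5}, (5, "a"): {5}, (5, None): {6}}
delta, d_start, d_finals = subset_construction(NFA, 1, {6}, "ab")
for (S, c), T in delta.items():
    print(sorted(S), c, sorted(T))
# [1, 2, 3] a [2, 3]
# [1, 2, 3] b [4, 5, 6]
# [2, 3] a [2, 3]
# [2, 3] b [6]
# [4, 5, 6] a [5, 6]
# [5, 6] a [5, 6]
```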
Example: a*b + ba*
[Figure: NFA for a*b + ba*, states 1 to 6]
DFA:
State        a        b
{1,2,3}      {2,3}    {4,5,6}
{2,3}        {2,3}    {6}
{4,5,6}      {5,6}    ---
{6}          ---      ---
{5,6}        {5,6}    ---
[Figure: the resulting DFA]
In general, if the NFA has N states, the DFA can have as many as 2^N states.
Example: ba (a + b)* ab
[Figure: NFA for ba(a + b)*ab, states 0 to 11]
DFA:
State               a                    b
{0}                 ---                  {1}
{1}                 {2,3,4,6,8,9}        ---
{2,3,4,6,8,9}       {3,4,5,6,8,9,10}     {3,4,6,7,8,9}
{3,4,5,6,8,9,10}    {3,4,5,6,8,9,10}     {3,4,6,7,8,9,11}
{3,4,6,7,8,9}       {3,4,5,6,8,9,10}     {3,4,6,7,8,9}
{3,4,6,7,8,9,11}    {3,4,5,6,8,9,10}     {3,4,6,7,8,9}
[Figure: the resulting DFA]
State Minimization
Theorem: Given a DFA M, there exists an equivalent DFA M’ that is minimal, i.e. no other equivalent DFA exists with fewer states than M’.
Definition: A partition of a set S is a set of subsets of S such that every element of S appears in exactly one of the subsets.
Example: S = {1, 2, 3, 4, 5}
Π1 = { {1, 2, 3, 4}, {5} }
Π2 = { {1, 2, 3}, {4}, {5} }
Π3 = { {1, 3}, {2}, {4}, {5} }
Note: Π2 is a refinement of Π1, and Π3 is a refinement of Π2.
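Both definitions are easy to check mechanically. This sketch assumes partitions are given as lists of Python sets over the same underlying set, and reproduces the example above.

```python
def is_partition(blocks, universe):
    """Every element of universe appears in exactly one block, and nothing else appears."""
    elements = [x for block in blocks for x in block]
    return len(elements) == len(set(elements)) and set(elements) == set(universe)

def is_refinement(finer, coarser):
    """Each block of `finer` lies entirely inside some block of `coarser`."""
    return all(any(set(b) <= set(c) for c in coarser) for b in finer)

S = {1, 2, 3, 4, 5}
P1 = [{1, 2, 3, 4}, {5}]
P2 = [{1, 2, 3}, {4}, {5}]
P3 = [{1, 3}, {2}, {4}, {5}]
print(all(is_partition(p, S) for p in (P1, P2, P3)))       # True
print(is_refinement(P2, P1), is_refinement(P3, P2))       # True True
print(is_refinement(P1, P2))                              # False
```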
Minimization Algorithm:
1. Remove all undefined transitions by introducing a TRAP state, i.e., a state from which no final state is reachable.
2. Partition all states into two groups (final states and non-final states).
3. Complete the “Next State” table for each group, by specifying transitions from group to group. Form the next partition: split groups in which Next State table entries differ. Repeat step 3 until no further splitting is possible.
4. Determine start and final states: the new start state is the group containing the old start state, and the new final states are the groups containing old final states.
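A Python sketch of steps 1 to 4 (Moore-style partition refinement) follows. It assumes δ is a dictionary from (state, symbol) to state; the TRAP label is hypothetical. The test DFA is a reconstruction of the example below from its Next State tables: where a table only names the group {1, 3}, the exact target is an assumption, as are the start state 1 and the final state 5. The resulting groups do not depend on those choices.

```python
def minimize(states, alphabet, delta, start, finals):
    """Partition refinement following steps 1-4. delta maps (state, symbol) -> state.
    Returns (blocks, new_delta, new_start, new_finals); new states are frozensets."""
    states, delta, finals = set(states), dict(delta), set(finals)
    # Step 1: send every undefined transition to a TRAP state (hypothetical name).
    if any((q, c) not in delta for q in states for c in alphabet):
        states.add("TRAP")
        for q in states:
            for c in alphabet:
                delta.setdefault((q, c), "TRAP")
    # Step 2: final vs. non-final states.
    blocks = [b for b in (frozenset(states & finals), frozenset(states - finals)) if b]
    # Step 3: split each group by its row of the "Next State" table; repeat until stable.
    while True:
        group = {q: i for i, b in enumerate(blocks) for q in b}
        refined = []
        for b in blocks:
            rows = {}
            for q in b:
                row = tuple(group[delta[(q, c)]] for c in alphabet)
                rows.setdefault(row, set()).add(q)
            refined.extend(frozenset(g) for g in rows.values())
        if len(refined) == len(blocks):      # no group was split
            break
        blocks = refined
    # Step 4: new start/final states are the groups containing old start/final states.
    block_of = {q: b for b in blocks for q in b}
    new_delta = {(block_of[q], c): block_of[delta[(q, c)]] for q in states for c in alphabet}
    return blocks, new_delta, block_of[start], {block_of[f] for f in finals}

# Reconstructed example DFA (assumptions noted above).
EX = {(1, "a"): 2, (1, "b"): 3, (2, "a"): 2, (2, "b"): 4, (3, "a"): 2,
      (3, "b"): 3, (4, "a"): 2, (4, "b"): 5, (5, "a"): 2, (5, "b"): 3}
blocks, new_delta, new_start, new_finals = minimize({1, 2, 3, 4, 5}, "ab", EX, 1, {5})
print(sorted(sorted(b) for b in blocks))   # [[1, 3], [2], [4], [5]]
```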
Example:
Π0 = { {1, 2, 3, 4}, {5} }
State   a            b
1       {1,2,3,4}    {1,2,3,4}
2       {1,2,3,4}    {1,2,3,4}
3       {1,2,3,4}    {1,2,3,4}
4       {1,2,3,4}    {5}
5       {1,2,3,4}    {1,2,3,4}
[Figure: the DFA being minimized, states 1 to 5, with transitions on a and b]
Split {4} from group {1, 2, 3, 4}.
Π1 = { {1, 2, 3}, {4}, {5} }
State   a          b
1       {1,2,3}    {1,2,3}
2       {1,2,3}    {4}
3       {1,2,3}    {1,2,3}
4       {1,2,3}    {5}
5       {1,2,3}    {1,2,3}
Split {2} from group {1, 2, 3}.
[Figure: the DFA being minimized]
Π2 = { {1, 3}, {2}, {4}, {5} }
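State   a      b
1       {2}    {1,3}
3       {2}    {1,3}
2       {2}    {4}
4       {2}    {5}
5       {2}    {1,3}
No more splitting is possible: this is the minimal DFA.
[Figure: minimal DFA with states {1,3}, 2, 4, and 5, and transitions on a and b]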
Summary of Regular Languages
• Smallest class in the Chomsky hierarchy.
• Appropriate for lexical analysis.
• Four representations: RGR, RGL, RE and FSA.
• All four are equivalent; there are algorithms to perform transformations among them.
• Various advantages and disadvantages among these four, for language designer, implementor, and user.
• FSA’s can be made deterministic, and minimal.