automata7.ppt


Transcript of automata7.ppt

Page 1: automata7.ppt

241-303 Discrete Maths: Automata/7 1

Discrete Maths

Recognising input using:
– automata: a graph-based technique
– regular expressions: an algebraic technique equivalent to automata

241-303, Semester 1 2009-2010

7. Automata and Regular Expressions

Page 2: automata7.ppt

Overview

1. Introduction to Automata
2. Representing Automata
3. The ‘aeiou’ Automaton
4. Generating Output
5. Bounce Filter Example
6. Deterministic and Nondeterministic Automata

Page 3: automata7.ppt

7. ‘washington’ Partial Anagrams
8. Regular Expressions
9. UNIX Regular Expressions
10. From REs to Automata
11. More Information

Page 4: automata7.ppt

1. Introduction to Automata

A finite state automaton represents a problem as a series of states and transitions between the states:
– the automaton starts in an initial state
– input causes a transition from the current state to another
– a state may be accepting
   – the automaton can terminate successfully when it enters an accepting state (if it wants to)

Page 5: automata7.ppt

1.1. An Example

The ‘even-odd’ Automaton

[Diagram: two states, evenA (the start state) and oddA. Each state loops to itself on b; a moves evenA to oddA and oddA back to evenA.]

The states are the ovals.
The transitions are the arrows
– labelled with the input that ‘triggers’ them
The ‘oddA’ state is accepting.

Page 6: automata7.ppt

Execution Sequence

Input: b a b a a

Input read    Move to State
(start)       evenA    (initial state)
b             evenA
a             oddA
b             oddA
a             evenA
a             oddA     (stops since no more input)

The automaton could choose to terminate earlier, while it is in oddA.

Page 7: automata7.ppt

1.2. Why are Automata Useful?

Automata are a very good way of modeling finite-state systems which change state due to input. Examples (very different applications):
– text editors, compilers, UNIX tools like grep
– communications protocols
– digital hardware components
   – e.g. adders, RAM

Page 8: automata7.ppt

2. Representing Automata

Automata have a mathematical basis which allows them to be analysed, e.g.:
– prove that they accept correct input
– prove that they do not accept incorrect input

Automata can be manipulated to simplify them, and they can be automatically converted into code.

Page 9: automata7.ppt

2.1. A Mathematical Coding

We can represent an automaton in terms of sets and mathematical functions.

The ‘even-odd’ automaton is:

  startSet = { evenA }
  acceptSet = { oddA }

  nextState(evenA, b) => evenA
  nextState(evenA, a) => oddA
  nextState(oddA, b) => oddA
  nextState(oddA, a) => evenA

Page 10: automata7.ppt

Analysis of the mathematical form can show that the ‘even-odd’ automaton only accepts strings which:
– contain an odd number of ‘a’s
– e.g. babaa abb abaab aabba aaaaba …
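The mathematical coding maps almost directly onto a lookup table. The following is only a sketch under stated assumptions: the names next_state and even_odd_accepts are hypothetical (they are not from the slides), and the whole input is passed in as a C string rather than read with getchar().

```c
#include <assert.h>
#include <stddef.h>

enum state { evenA, oddA };

/* nextState as a table: rows are states, columns are inputs (0 = 'a', 1 = 'b'). */
static const enum state next_state[2][2] = {
    /* evenA */ { oddA,  evenA },
    /* oddA  */ { evenA, oddA  },
};

/* Returns 1 if s ends in the accepting state oddA, 0 if it ends in evenA,
   and -1 if s contains a character other than 'a' or 'b'. */
int even_odd_accepts(const char *s)
{
    assert(s != NULL);
    enum state cur = evenA;              /* startSet = { evenA } */
    for (; *s != '\0'; s++) {
        if (*s != 'a' && *s != 'b')
            return -1;                   /* no transition for this input */
        cur = next_state[cur][*s == 'b'];
    }
    return cur == oddA;                  /* acceptSet = { oddA } */
}
```

Calling even_odd_accepts() on each of the example strings above should report acceptance, since each contains an odd number of ‘a’s.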

Page 11: automata7.ppt

2.2. Automaton in Code

It is easy to (automatically) translate an automaton into code, but ...
– an automaton graph does not contain all the details needed for a program

The main extra coding issues:
– what to do when we enter an accepting state?
– what to do when the input cannot be processed?
   – e.g. abzz is entered

Page 12: automata7.ppt

Encoding the ‘even-odd’ Automaton

  #include <stdio.h>
  #include <stdlib.h>    // for exit() in nextState()

  enum state {evenA, oddA};       // possible states

  enum state currState = evenA;   // start state
  int isAccepting = 0;            // false
  int ch;

  while ((ch = getchar()) != EOF) {
    currState = nextState(currState, ch);
    isAccepting = acceptable(currState);
  }
  if (isAccepting)                // accepting state only used at end of input
    printf("accepted\n");
  else
    printf("not accepted\n");

Page 13: automata7.ppt

  enum state nextState(enum state s, int ch)
  {
    if ((s == evenA) && (ch == 'b')) return evenA;
    if ((s == evenA) && (ch == 'a')) return oddA;
    if ((s == oddA) && (ch == 'b')) return oddA;
    if ((s == oddA) && (ch == 'a')) return evenA;

    printf("Illegal Input");      // simple handling of incorrect input
    exit(1);
  }

Page 14: automata7.ppt

  int acceptable(enum state s)
  {
    if (s == oddA)
      return 1;    // oddA is an accepting state
    return 0;
  }

Page 15: automata7.ppt

3. The ‘aeiou’ Automaton

What English words contain the five vowels (a, e, i, o, u) in order?

Some words that match:
– abstemious
– facetious
– sacrilegious

Page 16: automata7.ppt

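3.1. Automaton Graph

[Diagram: states 0 to 5 in a chain, with 0 as the start state. The transitions 0 -> 1, 1 -> 2, 2 -> 3, 3 -> 4 and 4 -> 5 are labelled a, e, i, o and u. States 0 to 4 loop to themselves on L - a, L - e, L - i, L - o and L - u respectively.]

L = all letters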

Page 17: automata7.ppt

3.2. Execution Sequence (1)

Input: f a c e t i o u s

Input read    Move to State
(start)       0
f             0
a             1
c             1

Page 18: automata7.ppt

Input: f a c e t i o u s

Input read    Move to State
e             2
t             2
i             3
o             4
u             5    (the automaton can terminate here; no need to process more input)

Page 19: automata7.ppt

Execution Sequence (2)

Input: a n d r e w

Input read    Move to State
(start)       0
a             1
n             1
d             1

Page 20: automata7.ppt

Input: a n d r e w

Input read    Move to State
r             1
e             2
w             2    (and end of input means failure)

Page 21: automata7.ppt

3.3. Translation to Code

  #include <stdio.h>
  #include <stdlib.h>    // for exit() in nextState()

  enum state {S0, S1, S2, S3, S4, S5};   // poss. states 0 to 5

  enum state currState = S0;      // start state
  int isAccepting = 0;            // false
  int ch;

  // stop processing when the accepting state is entered
  while (((ch = getchar()) != EOF) && !isAccepting) {
    currState = nextState(currState, ch);
    isAccepting = acceptable(currState);
  }
  if (isAccepting)
    printf("accepted\n");
  else
    printf("not accepted\n");

Page 22: automata7.ppt

  enum state nextState(enum state s, int ch)
  {
    if (s == 0) {
      if (ch == 'a') return 1;
      else return 0;    // input is L-a
    }
    if (s == 1) {
      if (ch == 'e') return 2;
      else return 1;    // input is L-e
    }
    if (s == 2) {
      if (ch == 'i') return 3;
      else return 2;    // input is L-i
    }
    // ... continued on the next page

Page 23: automata7.ppt

    // ... continued from the previous page
    if (s == 3) {
      if (ch == 'o') return 4;
      else return 3;    // input is L-o
    }
    if (s == 4) {
      if (ch == 'u') return 5;
      else return 4;    // input is L-u
    }

    printf("Illegal Input");    // simple handling of incorrect input
    exit(1);
  } // end of nextState()

Page 24: automata7.ppt

  int acceptable(enum state s)
  {
    if (s == 5)
      return 1;    // 5 is an accepting state
    return 0;
  }

Page 25: automata7.ppt

4. Generating Output

One possible extension to the basic automaton idea is to allow output:
– when a transition is ‘triggered’ there can be optional output as well

Automata which generate output are sometimes called Finite State Machines (FSMs).

Page 26: automata7.ppt

4.1. ‘even-odd’ with Output

[Diagram: the ‘even-odd’ automaton as before, except that the a transition from evenA to oddA is now labelled a/1.]

When the ‘a’ transition is triggered out of the evenA state, then a ‘1’ is output.

Page 27: automata7.ppt

4.2. Mathematical Coding

Add an ‘output’ mathematical function to the automaton representation:

  output( evenA, a ) => 1

Page 28: automata7.ppt

4.3. Extending the C Coding

The while loop for ‘even-odd’ will become:

    :
  while ((ch = getchar()) != EOF) {
    output(currState, ch);
    currState = nextState(currState, ch);
    isAccepting = acceptable(currState);
  }
    :

Page 29: automata7.ppt

The output() C function:

  void output(enum state s, int ch)
  {
    if ((s == evenA) && (ch == 'a'))
      putchar('1');
  }

Page 30: automata7.ppt

5. Bounce Filter Example

A signal processing problem:
– a stream of 1’s and 0’s is ‘smoothed’ by the filter so that:
   – a single 0 surrounded by 1’s becomes a 1:
       ...111101111... => ...111111111...
   – a single 1 surrounded by 0’s becomes a 0:
       ...000010000... => ...000000000...

This kind of filtering is used in image processing to reduce ‘noise’.

Page 31: automata7.ppt

5.1. The ‘bounce’ Automaton

[Diagram: four states, a and b on the left side, c and d on the right side, with a as the start state. The eight transitions are labelled input/output: 0/0, 1/0, 1/1, 0/0, 0/0, 1/1, 0/1, 1/1.]

Page 32: automata7.ppt

Notes

There is no accepting state
– the code will simply terminate at EOF

The ‘a’ and ‘b’ states (left side) mostly have transitions that output ‘0’s.

The ‘c’ and ‘d’ states (right side) mostly have transitions that output ‘1’s.

Page 33: automata7.ppt

5.2. Execution Sequence

Input: 0 1 0 1 1 0 1

Input read    Move to State    Output
(start)       a
0             a                0
1             b                0
0             a                0

Page 34: automata7.ppt

Input: 0 1 0 1 1 0 1

Input read    Move to State    Output
1             b                0
1             c                1    (moved to right hand side)
0             d                1
1             c                1

Page 35: automata7.ppt

5.3. I/O Behaviour

Input:  0 1 0 1 1 0 1
Output: 0 0 0 0 1 1 1

Isolated ‘noise’ bits in the input are smoothed away in the output.

It takes 2 bits of the same type before the automaton realises that it has a new bit sequence rather than a ‘noise’ bit.
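The ‘bounce’ automaton can be sketched in C as a table of (next state, output) pairs. This is an illustration only: the table below is inferred from the execution sequence and the transition labels in section 5.1, and the entry for state d on input 0 is an assumption, since the trace never exercises it. The names moves and bounce_filter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum bstate { A, B, C, D };

struct move { enum bstate next; char out; };

/* moves[state][bit]: inferred from the execution sequence.
   The D-on-0 entry is an assumption (the trace does not reach it). */
static const struct move moves[4][2] = {
    /* A */ { { A, '0' }, { B, '0' } },
    /* B */ { { A, '0' }, { C, '1' } },
    /* C */ { { D, '1' }, { C, '1' } },
    /* D */ { { A, '0' }, { C, '1' } },
};

/* Filters the '0'/'1' string in into out, which must have room for
   strlen(in) + 1 bytes. Returns 0 on success, or -1 if in contains
   any other character. There is no accepting state. */
int bounce_filter(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    enum bstate cur = A;                 /* start state */
    for (; *in != '\0'; in++) {
        if (*in != '0' && *in != '1')
            return -1;
        struct move m = moves[cur][*in - '0'];
        *out++ = m.out;                  /* output belongs to the transition */
        cur = m.next;
    }
    *out = '\0';
    return 0;
}
```

Feeding it the input 0101101 reproduces the output 0000111 shown above.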

Page 36: automata7.ppt

6. Deterministic and Nondeterministic Automata

We have been writing deterministic automata so far:
– for an input read by a state there is at most one transition that can be fired
   – e.g. state ‘S’ can process input ‘a’ and ‘w’, and fails for anything else

[Diagram: state S with two outgoing transitions, labelled a and w.]

Page 37: automata7.ppt

Nondeterministic Automata

A nondeterministic (ND) automaton can have 2 or more transitions with the same label leaving a state.

Problem: if state S sees input ‘x’, then which transition should it use?

[Diagram: state S with one outgoing transition labelled a and two labelled x, leading to the states T, U and V.]

Page 38: automata7.ppt

6.1. The ‘man’ Automaton

Accept all strings that contain “man”
– this is hard to write as a deterministic automaton. The following has bugs:

[Diagram, marked WRONG: states 0 to 3, with 0 as the start state. State 0 loops on L - m; the transitions 0 -> 1, 1 -> 2 and 2 -> 3 are labelled m, a and n; transitions labelled L - a and L - n lead back from the later states.]

Page 39: automata7.ppt

The input string command will get stuck at state 0:

  0 -c-> 0 -o-> 0 -m-> 1 -m-> 0 -a-> 0 -n-> 0 -d-> 0

The problem starts at the second m, which sends the automaton from state 1 back to state 0.

Page 40: automata7.ppt

6.2. A ND Automaton Solution

[Diagram: states 0 to 3, with 0 as the start state. State 0 loops on L; the transitions 0 -> 1, 1 -> 2 and 2 -> 3 are labelled m, a and n.]

It is nondeterministic because an ‘m’ input in state 0 can be dealt with by two transitions:
– a transition back to state 0, or
– a transition to state 1

Page 41: automata7.ppt

Processing command input:

  Stay in state 0:        0 -c-> 0 -o-> 0 -m-> 0 -m-> 0 -a-> 0 -n-> 0 -d-> 0
  Branch at the 1st m:    0 -m-> 1, then m: fail, reject the input
  Branch at the 2nd m:    0 -m-> 1 -a-> 2 -n-> 3 (accepting state)

Page 42: automata7.ppt

6.3. Executing a ND Automaton

It is difficult to code ND automata in conventional languages, such as C.

Two different coding approaches:
– 1. When an input arrives, execute all transitions in parallel. See which succeeds.
– 2. When an input arrives, try one transition. If it leads to failure then backtrack and try another transition.
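Approach 1 can be imitated in C by tracking the set of all states the ND automaton could be in, one bit per state. The following is a rough sketch for the ‘man’ automaton of section 6.2, not the method used in the slides: the names step and contains_man are hypothetical, and every character is treated as belonging to L.

```c
#include <assert.h>
#include <stddef.h>

/* One bit per state: a set bit means "some path is currently in this state". */
enum { S0 = 1, S1 = 2, S2 = 4, S3 = 8 };

/* Follows every transition that ch allows from every active state. */
static unsigned step(unsigned active, char ch)
{
    unsigned next = 0;
    if (active & S0) {
        next |= S0;                      /* the L loop on state 0 */
        if (ch == 'm')
            next |= S1;                  /* the competing m transition */
    }
    if ((active & S1) && ch == 'a')
        next |= S2;
    if ((active & S2) && ch == 'n')
        next |= S3;
    return next;                         /* paths with no transition die */
}

/* Returns 1 as soon as any path enters accepting state 3, else 0. */
int contains_man(const char *s)
{
    assert(s != NULL);
    unsigned active = S0;                /* only the start state is active */
    for (; *s != '\0'; s++) {
        active = step(active, *s);
        if (active & S3)
            return 1;
    }
    return 0;
}
```

For the input command, the path that branches at the second m reaches state 3, so contains_man("command") reports success without any backtracking.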

Page 43: automata7.ppt

Approach (1) in Parlog

A concurrent logic programming language.

  state0([X|Rest]) :- state0(Rest) : true.
  state0([m|Rest]) :- state1(Rest) : true.

  state1([a|Rest]) :- state2(Rest).

  state2([n|Rest]).

The two state0 clauses are tested concurrently.

Call:
  ?- state0([c,o,m,m,a,n,d]).

Page 44: automata7.ppt

Approach (2) in Prolog

A sequential logic programming language.

  nextState(0, _, 0).        % the nondeterministic part:
  nextState(0, 'm', 1).      % an 'm' in state 0 matches both clauses
  nextState(1, 'a', 2).
  nextState(2, 'n', 3).

  nda(State, [Ch|Input]) :-
      nextState(State, Ch, NewState),
      nda(NewState, Input).
  nda(3, _).                 % accepting state; any remaining input is ignored

Call:
  ?- nda(0, [c,o,m,m,a,n,d]).

Page 45: automata7.ppt

6.4. Why use ND Automata?

With nondeterminism, some problems are easier to solve/model.

Nondeterminism is common in some application areas, such as AI, graph search, and compilers.

Page 46: automata7.ppt

It is possible to translate a ND automaton into a (larger, complex) deterministic one.

In mathematical terms, ND automata and deterministic automata are equivalent
– they can be used to model all the same problems

Page 47: automata7.ppt

7. ‘washington’ Partial Anagrams

Find all the words which can be made from the letters in “washington”.

There are over 240 words. Some of the 7-letter words:
– agonist
– goatish
– showing
– washing

Page 48: automata7.ppt

7.1. A Two Stage Process

1. Select all the words from a dictionary (e.g. /usr/share/dict/words on calvin) which use the letters in “washington”
– use a deterministic automaton

2. Delete the words which use the “washington” letters too many times (e.g. “hash”)
– use a nondeterministic automaton

Page 49: automata7.ppt

7.2. Stage 1: Deterministic Automaton

Send each word in the dictionary through the automaton:

[Diagram: states 0 and 1, with 0 as the start state. State 0 loops on S; a newline moves it to state 1.]

S = {w,a,s,h,i,n,g,t,o}

If state 1 is reached, then the word is passed to stage 2.

Page 50: automata7.ppt

For example, “hash\n” is accepted:

  0 -h-> 0 -a-> 0 -s-> 0 -h-> 0 -\n-> 1

Page 51: automata7.ppt

7.3. Stage 2: ND Automaton

Check if a word uses a “washington” letter too often:
– e.g. delete “hash”

The ND automaton succeeds if a word uses too many letters.

Then the program will not output the word.

Page 52: automata7.ppt

Checking each Letter

There are 9 different letters in “washington”.

Nine deterministic automata can be used to detect if the given word has:
– more than 1 ‘a’
– more than 1 ‘g’
– ...
– more than 2 ‘n’s

Page 53: automata7.ppt

Check for more than 1 ‘a’

[Diagram: states 0, 1 and 2, with 0 as the start state. States 0 and 1 each loop on L - a; the transitions 0 -> 1 and 1 -> 2 are both labelled a.]

If this succeeds then the program will not output the word.
– e.g. ‘nana’

Page 54: automata7.ppt

Checking all the Letters at Once

The 9 deterministic automata can be applied to the same word at the same time.

Combine the 9 deterministic automata to create a single nondeterministic automaton.

Page 55: automata7.ppt

Nondeterministic Checking

[Diagram, part 1: start state 0 loops on L. Three branches leave it:
– two a's: 0 -a-> 1, state 1 loops on L - a, 1 -a-> 2
– two g's: 0 -g-> 3, state 3 loops on L - g, 3 -g-> 4
– two h's: 0 -h-> 5, state 5 loops on L - h, 5 -h-> 6]

Page 56: automata7.ppt

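[Diagram, part 2: further branches from state 0:
– two i's: 0 -i-> 7, state 7 loops on L - i, 7 -i-> 8
– three n's: 0 -n-> 9, state 9 loops on L - n, 9 -n-> 10, state 10 loops on L - n, 10 -n-> 11
– two o's: 0 -o-> 12, state 12 loops on L - o, 12 -o-> 13]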

Page 57: automata7.ppt

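[Diagram, part 3: the remaining branches from state 0:
– two s's: 0 -s-> 14, state 14 loops on L - s, 14 -s-> 15
– two t's: 0 -t-> 16, state 16 loops on L - t, 16 -t-> 17
– two w's: 0 -w-> 18, state 18 loops on L - w, 18 -w-> 19]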

Page 58: automata7.ppt

Processing “hash”

Reaching an accepting state means that the program will not output “hash”.

[Diagram: the tree of states reached while reading h, a, s, h. One path stays in state 0 throughout. The first h also opens a path into state 5, the a a path into state 1, and the s a path into state 14. On the second h, the path waiting in state 5 moves to the accepting state 6.]

Page 59: automata7.ppt

7.4. UNIX Coding

Stages 0,1,2, piped together:

  tr A-Z a-z < /usr/share/dict/words |
    grep '^[washingto]*$' |
    egrep -v 'a.*a|g.*g|h.*h|i.*i|n.*n.*n|o.*o|s.*s|t.*t|w.*w'

The call to tr translates all the words taken from the dictionary into lower case.

[Diagram: /usr/share/dict/words -> tr -> grep -> egrep -v]
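For comparison, the same filtering can be written in C without automata, by counting letters. Note that this swaps in a different technique (a per-letter budget) for the two automata stages; it is a sketch only, and the name is_partial_anagram is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if word can be built from the letters of "washington",
   using each letter no more often than it occurs there, else 0.
   Expects lower-case ASCII input, as produced by the tr stage. */
int is_partial_anagram(const char *word)
{
    assert(word != NULL);
    int budget[26] = {0};
    for (const char *p = "washington"; *p != '\0'; p++)
        budget[*p - 'a']++;              /* n gets 2, the other 8 letters get 1 */

    for (; *word != '\0'; word++) {
        if (*word < 'a' || *word > 'z')
            return 0;                    /* not a lower-case letter at all */
        if (--budget[*word - 'a'] < 0)
            return 0;                    /* letter absent, or used too often */
    }
    return 1;
}
```

A word such as washing passes, while hash is rejected because it needs two h's.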

Page 60: automata7.ppt

8. Regular Expressions (REs)

REs are an algebraic way of specifying how to recognise input
– ‘algebraic’ means that the recognition pattern is defined using RE operands and operators

REs are equivalent to automata
– REs and automata can be used on all the same problems

Page 61: automata7.ppt

8.1. REs in grep

grep searches input lines, a line at a time.
If the line contains a string that matches grep's RE (pattern), then the line is output.

[Diagram: input lines (e.g. from a file) -> grep "RE" -> output matching lines (e.g. to a file)]

Example input lines:
  hello andy
  my name is andy
  my bye byhe

Page 62: automata7.ppt

Examples

grep "and"
  Input:              Output:
  hello andy          hello andy
  my name is andy     my name is andy
  my bye byhe

grep -E "an|my"       ("|" means "or")
  Input:              Output:
  hello andy          hello andy
  my name is andy     my name is andy
  my bye byhe         my bye byhe

Page 63: automata7.ppt

grep "hel*"           ("*" means "0 or more")
  Input:              Output:
  hello andy          hello andy
  my name is andy     my bye byhe
  my bye byhe

Page 64: automata7.ppt

8.2. Why use REs?

They are very useful for expressing patterns that recognise textual input.

For example, REs are used in:
– editors
– compilers
– web-based search engines
– communication protocols

Page 65: automata7.ppt

8.3. The RE Language

A RE defines a pattern which recognises (matches) a set of strings
– e.g. a RE can be defined that recognises the strings { aa, aba, abba, abbba, abbbba, … }

These recognisable strings are sometimes called the RE’s language.

Page 66: automata7.ppt

RE Operands

There are 4 basic kinds of operands:
– characters (e.g. ‘a’, ‘1’, ‘(’)
– the symbol ε (means an empty string ‘’)
– the symbol {} (means the empty set)
– variables, which can be assigned a RE
   – variable = RE

Page 67: automata7.ppt

RE Operators

There are three basic operators:
– union ‘|’
– concatenation
– closure *

Page 68: automata7.ppt

Union

S | T
– this RE can use the S or T RE to match strings

Example REs:
  a | b        matches strings {a, b}
  a | b | c    matches strings {a, b, c}

Page 69: automata7.ppt

Concatenation

S T
– this RE will use the S RE followed by the T RE to match against strings

Example REs:
  a b          matches the string { ab }
  w | (a b)    matches the strings {w, ab}

Page 70: automata7.ppt

What strings are matched by the RE (a | ab ) (c | bc)?

Equivalent to:
  {a, ab} followed by {c, bc}
  => {ac, abc, abc, abbc}
  => {ac, abc, abbc}

Page 71: automata7.ppt

Closure

S*
– this RE can use the S RE 0 or more times to match against strings

Example RE:
  a*    matches the strings:
        {ε, a, aa, aaa, aaaa, aaaaa, ... }
        (ε is the empty string)

Page 72: automata7.ppt

8.4. REs for C Identifiers

We define two RE variables, letter and digit:

  letter = A | B | C | D ... Z |
           a | b | c | d .... z

  digit = 0 | 1 | 2 | ... 9

ident is defined using letter and digit:

  ident = letter ( letter | digit )*

Page 73: automata7.ppt

Strings matched by ident include:
  ab345    w    h5g

Strings not matched:
  2    $abc    ****
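The ident RE can be checked by hand-written C that mirrors its structure: one letter, then a loop for ( letter | digit )*. This is a minimal sketch; the function names is_letter, is_digit and is_ident are hypothetical, and the code follows the slide's definition exactly (so it assumes ASCII and, like the RE above, does not allow an underscore).

```c
#include <assert.h>
#include <stddef.h>

static int is_letter(char c)   /* letter = A | B | ... | Z | a | b | ... | z */
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static int is_digit(char c)    /* digit = 0 | 1 | ... | 9 */
{
    return c >= '0' && c <= '9';
}

/* Returns 1 if s matches  ident = letter ( letter | digit )* , else 0. */
int is_ident(const char *s)
{
    assert(s != NULL);
    if (!is_letter(*s))
        return 0;              /* must start with a letter (rejects "" too) */
    for (s++; *s != '\0'; s++)
        if (!is_letter(*s) && !is_digit(*s))
            return 0;          /* the closure part allows nothing else */
    return 1;
}
```

It accepts ab345, w and h5g, and rejects 2, $abc and ****.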

Page 74: automata7.ppt

9. UNIX Regular Expressions

Different UNIX tools use slightly different extensions of the basic RE notation
– vi, awk, sed, grep, egrep, etc.

Extra features include:
– character classes
– line start ‘^’ and end ‘$’ symbols
– the wild card symbol ‘.’
– additional operators, R? and R+

Page 75: automata7.ppt

9.1. Character Classes

The character class [a1 a2 ... an] stands for a1 | a2 | ... | an

a1-an stands for the set of characters between a1 and an
– e.g. [A-Z] [a-z0-9]

Page 76: automata7.ppt

9.2. Line Start and End

The ‘^’ matches the beginning of the line, ‘$’ matches the end
– e.g.
    grep '^andr' /usr/share/dict/words
    grep '^[washingto]*$' /usr/share/dict/words

Page 77: automata7.ppt

Example as a Diagram

[Diagram: /usr/share/dict/words -> grep "^andr" -> matching lines]

  Input (/usr/share/dict/words):    Output:
  A                                 androgen
  A's                               androgen's
  AOL                               androgynous
  AOL's                             android
    :                               android's
    :                               androids

Page 78: automata7.ppt

9.3. Wild Card Symbol

The ‘.’ stands for any character except the newline
– e.g.
    grep '^a..b.$' chapter1.txt
    grep 't.*t.*t' manual

Page 79: automata7.ppt

[Diagram: /usr/share/dict/words -> grep "^a..b.$" -> matching lines]

  Input (/usr/share/dict/words):    Output:
  A                                 adobe
  A's                               alibi
  AOL                               ameba
  AOL's
    :

Page 80: automata7.ppt

9.4. R? and R+

R? stands for ε | R (0 or 1 R)

R+ stands for R | RR | RRR | ...
which can also be written as R R*
– one or more occurrences of R

Page 81: automata7.ppt

9.5. Operator Precedence

The operators *, +, and ? have the highest precedence.
Then comes concatenation.
Union ‘|’ is the lowest precedence.

Example:
– a | bc? means a | (b(c?)), and matches the strings {a, b, bc}
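The precedence example can be tried out from C with the POSIX regex functions regcomp(), regexec() and regfree() from <regex.h>. This is a sketch for a POSIX system; the wrapper name re_matches is hypothetical. The pattern is written as ^(a|bc?)$ so that the anchors apply to the whole union rather than to one side of it.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s contains a match for the POSIX extended RE pattern,
   0 if it does not, and -1 if the pattern fails to compile.
   Put ^ and $ in the pattern to force a whole-string match. */
int re_matches(const char *pattern, const char *s)
{
    assert(pattern != NULL && s != NULL);
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int found = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

With the pattern "^(a|bc?)$" the strings a, b and bc match, while ac does not, which is what the grouping a | (b(c?)) predicts; a grouping of (a|b)c? would have accepted ac.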

Page 82: automata7.ppt

10. From REs to Automata

The translation uses a special kind of ND automaton which uses ε-transitions. Automata of this type are sometimes called ε-NFAs.

The translation steps are:
– RE => ε-NFA
– ε-NFA => ND automaton
– ND automaton => deterministic automaton
– deterministic automaton => code


10.1. ε-NFAs

An ε-NFA allows a transition to use an ε label.

A transition using an ε label can be triggered without having to match any input.


ε-NFA Example

a*b | b*a is accepted by the following ε-NFA:

[Diagram: ε-NFA with states 1 to 6, a start arrow, and transitions labelled a and b. Annotation: "nondeterminism occurs here". Example input: "bbba"]


10.2. RE to ε-NFA

The resulting ε-NFA has:
– one start state and one accepting state
– at most two transitions out of any state

The construction uses standard automata ‘pieces’ corresponding to RE operands and operators.

The pieces are put together based on an expression tree for the RE.


Automata Pieces for RE Operands

[Diagram: automaton for a character x, with a start arrow and a transition labelled x]

[Diagram: automaton for ε, with a start arrow]

[Diagram: automaton for {}, with a start arrow. This automaton does not accept any strings.]


Automata Pieces for RE Operators

Union S | T:

[Diagram: the union piece, combining the automata for S and T behind a single start arrow]


Concatenation S T:

[Diagram: the concatenation piece, with the automaton for S followed by the automaton for T]


Closure S*:

[Diagram: the closure piece, built around the automaton for S]
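The pieces for operands and operators can be sketched in code. The version below is an assumption-laden illustration, not the slides' own method: the ε arrows are those of the usual textbook construction (the arrows in the diagrams are not reproduced here), and all names (Piece, char, union, concat, star, accepts) are hypothetical. Each piece keeps one start and one accepting state, and no state ends up with more than two outgoing transitions.

```python
from itertools import count

EPS = None        # label used for ε-transitions
_ids = count()    # source of fresh state numbers

class Piece:
    """An ε-NFA fragment with one start state and one accepting state."""
    def __init__(self, start, accept, edges):
        self.start = start
        self.accept = accept
        self.edges = edges          # list of (source, label, target)

def char(x):
    s, a = next(_ids), next(_ids)
    return Piece(s, a, [(s, x, a)])

def union(S, T):
    s, a = next(_ids), next(_ids)
    extra = [(s, EPS, S.start), (s, EPS, T.start),
             (S.accept, EPS, a), (T.accept, EPS, a)]
    return Piece(s, a, S.edges + T.edges + extra)

def concat(S, T):
    link = [(S.accept, EPS, T.start)]
    return Piece(S.start, T.accept, S.edges + T.edges + link)

def star(S):
    s, a = next(_ids), next(_ids)
    extra = [(s, EPS, S.start), (S.accept, EPS, a),
             (S.accept, EPS, S.start), (s, EPS, a)]
    return Piece(s, a, S.edges + extra)

def closure(states, edges):
    """All states reachable from `states` using ε-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(piece, text):
    """Simulate the ε-NFA by tracking every state it could be in."""
    current = closure({piece.start}, piece.edges)
    for ch in text:
        moved = {dst for src, label, dst in piece.edges
                 if src in current and label == ch}
        current = closure(moved, piece.edges)
    return piece.accept in current

# a | bc*, assembled bottom-up in the order an expression tree suggests.
nfa = union(char("a"), concat(char("b"), star(char("c"))))
print([w for w in ["", "a", "b", "bc", "bccc", "ac", "c"] if accepts(nfa, w)])
```

The state numbers produced by this sketch will not match the numbering used on the slides, since they are handed out in construction order.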


10.3. Translating a | bc*

The first step in building the automaton is to draw a | bc* as an expression tree:

      |
     / \
    a   .      <- the concatenate symbol
       / \
      b   *
          |
          c


Translate the 3 leaves

Automaton for a: start -> 1 --a--> 2

Automaton for b: start -> 4 --b--> 5

Automaton for c: start -> 7 --c--> 8


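Automaton for c*

[Diagram: automaton for c* with states 6, 7, 8 and 9, a start arrow into state 6, and a transition labelled c between states 7 and 8]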


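Automaton for bc*

[Diagram: automaton for bc* with states 4 to 9, a start arrow, a transition labelled b between states 4 and 5, and a transition labelled c between states 7 and 8]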


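Final Automaton for a | bc*

[Diagram: the complete ε-NFA for a | bc*, with states 0 to 9, a start arrow, and transitions labelled a (between states 1 and 2), b (between states 4 and 5) and c (between states 7 and 8)]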


10.4. From ε-NFA to ND Automaton

The ε-transitions can be removed by combining the states that use them.

If we are in a state S with outgoing ε-transitions, then we are also in any state that can be reached from S by following those ε-transitions.
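The set of states we are "also in" can be computed by a simple search over ε-transitions. The sketch below uses the state numbers of the lower branch of a | bc*, but the ε arrows listed in EPS_EDGES are an assumption based on the usual construction, and the function name epsilon_closure is hypothetical.

```python
# Assumed ε-transitions for the lower branch of a | bc*:
# each key maps to the states reachable by a single ε step.
EPS_EDGES = {0: [4], 5: [6], 6: [7, 9], 8: [7, 9], 9: [3]}

def epsilon_closure(state, eps_edges):
    """Return every state reachable from `state` via ε-transitions alone."""
    reached = {state}
    frontier = [state]
    while frontier:
        q = frontier.pop()
        for nxt in eps_edges.get(q, []):
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return reached

for s in (0, 5, 8):
    print(s, sorted(epsilon_closure(s, EPS_EDGES)))
```

Under these assumed arrows, being in state 0 also means being in state 4, which is in line with the combined state labelled 0,4 in the worked example.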


Example: simplify the lower branch of a|bc*

[Diagram sequence: starting from the lower branch (states 0, 4, 5, 6, 7, 8, 9 and 3, with transitions labelled b and c), the states joined by ε-transitions are combined step by step. The combined-state labels that appear along the way include 3,9; 0,4; 6,9,3; 5,6,9,3; 8,9,3; and 7,8,9,3. The result has two states, labelled 0,4 and 5,6,7,8,9,3, with transitions labelled b and c. Simplifying the labels gives states 0 and 5.]


All of a|bc* simplified:

[Diagram: automaton with states 0, 2 and 5, a start arrow into state 0, and transitions labelled a, b and c]

This also happens to be a deterministic automaton, so the translation is finished.
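The last translation step, deterministic automaton => code, can be sketched as a table-driven loop. The transition table below is an assumption inferred from the RE a | bc* and the state numbers 0, 2 and 5: an a-transition from 0 to 2, a b-transition from 0 to 5, and a c-transition looping on 5, with 2 and 5 accepting. The names TRANSITIONS, START, ACCEPTING and accepts are hypothetical.

```python
# Assumed transition table for the simplified automaton of a | bc*.
TRANSITIONS = {
    (0, "a"): 2,
    (0, "b"): 5,
    (5, "c"): 5,
}
START = 0
ACCEPTING = {2, 5}

def accepts(text):
    """Run the deterministic automaton over text, one character at a time."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:       # no transition for this input: reject
            return False
    return state in ACCEPTING

for w in ["a", "b", "bcc", "", "ac", "ab"]:
    print(repr(w), accepts(w))
```

Because the automaton is deterministic, the loop only ever tracks a single current state, unlike a simulation of the ε-NFA.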


11. More Information

Johnsonbaugh, R. 1997. Discrete Mathematics, Prentice Hall, chapter 10.