CSCI 3130: Automata theory and formal languages

30
CSCI 3130: Automata theory and formal languages Andrej Bogdanov http://www.cse.cuhk.edu.hk/ ~andrejb/csc3130 The Chinese University of Hong Kong Text search and closure properties Fall 2010

description

Fall 2010. The Chinese University of Hong Kong. CSCI 3130: Automata theory and formal languages. Text search and closure properties. Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130. Text search. The program grep. grep -E regexp file. - PowerPoint PPT Presentation

Transcript of CSCI 3130: Automata theory and formal languages

CSCI 3130: Automata theory and formal languages

Andrej Bogdanov

http://www.cse.cuhk.edu.hk/~andrejb/csc3130

The Chinese University of Hong Kong

Text search and closure properties

Fall 2010

Text search

The program grep

grep -E regexp file

Searches for the occurrence of patterns matching a regular expression

[ab][12] {a1, a2, b1, b2} concatenation

cat|12 {cat, 12} union

[abc] {a, b, c} shorthand for a|b|c

[ab]{2} {aa, ab, ba, bb} {n} copies

(ab)* {, ab, abab, ...} star

[ab]? {, a, b} zero or one

(cat)+ {cat, catcat, ...} one or more

Searching with grep

grep –E `savou?r` words

outsavorsavorsavoredsavorersavorilysavorinesssavoringlysavorlesssavorous

savorsomesavorysavourunsavoredunsavoredlyunsavorednessunsavorilyunsavorinessunsavory

Words containing savor or savour

grep –E `[ab]{5}` words

grabbable

Words with 5 consecutive a or b

grep –E `zo+zo+` words

zoozoo

More grep commands

$ end of line

. any symbol

\< beginning of line

[a-z] anything in a range

s u f f e r

r e g u l a r

s t a r

n f aa

l

en

1

2

3

4

5

1 If a DFA is too hard, I do an ...

2 CSCI 3130 homework makes me ...

3 If a DFA likes it, it is ...

4 $10000000 = $10?

5 I study 3130 hard because it will make me ...

grep –E `\<.ff.u..t` words

how do you look for...

grep –E `([aeiouy].*){10}` words

grep –E `\<cat.*cat` words

grep –E `\<[^AEIOUYaeiouy]*$` words

Words with at least ten vowels?

Words that start in cat and have another cat

Words without any vowels? [^R] does not contain R

grep –E `\<[^AEIOUYaeiouy]* ([aeiouy][^AEIOUYaeiouy]*){10}$` words

Words with exactly ten vowels?

How grep (could) work

regularexpression

NFA NFAwithout

DFA

text file input

[ab]?, a+, (cat){3} not allowed allowed

input handling matches wholelooks for pattern

output accept/reject finds pattern

differences in class in grep

Implementation of grep

• How do you handle expressions like:

[ab]? zero or one

(cat)+ one or more

a{3} {n} copies

→ ()|[ab]R? → |R

→ (cat)(cat)*R+ → RR*

→ aaaR{n} → RR...R

n times

[^aeiouy] not containing any ?

Closure properties

Example

• The language L of strings that end in 101 is regular

• How about the language L of strings that do not end in 101?

(0+1)*101

Example

• Hint: w does not end in 101 if and only if it ends in:

or it has length 0, 1, or 2

• So L can be described by the regular expression

000, 001, 010, 011, 100, 110 or 111

(0+1)*(000+001+010+010+100+110+111) + + (0 + 1) + (0 + 1)(0 + 1)

Complement

• The complement L of a language L is the set of all strings that are not in L

• Examples ( = {0, 1})– L1 = all strings that end in 101

– L1 = all strings that do not end in 101 = all strings end in 000, …, 111 or have length 0, 1, or 2

– L2 = 1* = {, 1, 11, 111, …}

– L2 = all strings that contain at least one 0 = (0 + 1)*0(0 + 1)*

Example

• The language L of strings that contain 101 is regular

• How about the language L of strings that do not contain 101?

(0+1)*101(0+1)*

You can write a regular expression,but it is a lot of work!

Closure under complement

• To argue this, we can use any of the equivalent definitions for regular languages:

• The DFA definition will be most convenient– We assume L has a DFA, and show L also has a

DFA

regularexpression

DFANFA

If L is a regular language, so is L.

Arguing closure under complement

• Suppose L is regular, then it has a DFA M

• Now consider the DFA M’ with the accepting and rejecting states of M reversed

accepts L

accepts strings not in L

this is exactly L

Food for thought

• Can we do the same thing with an NFA?

1 0

0, 1

q q q (0+1)*10

1 0

0, 1

q q q (0+1)*

NO!

Intersection

• The intersection L L’ is the set of strings that are in both L and L’

• Examples:

• If L, L’ are regular, is L L’ also regular?

L = (0 + 1)*11 L’ = 1* L L’ =

L = (0 + 1)*10 L’ = 1* L L’ =

1*11

Closure under intersection

If L and L’ are regular languages, so is L L’.

regularexpression

DFANFA

• To argue this, we can use any of the equivalent definitions for regular languages:

Suppose L and L’ have DFAs, call them M and M’

Goal: Construct a DFA (or NFA) for L L’

An example

L = even number of 0s

r0 r1

1 10

0

Ms0 s1

0 01

1

M’

L’ = odd number of 1s

L L’ = even number of 0s and odd number of 1s

r0, s1

1

1

1

1

0 0 0 0

r0, s0

r1, s1r1, s0

Closure under intersection

M and M’ DFA for L L’

states Q = {r1, ..., rn}

Q’ = {s1, ..., sn’}

Q × Q’ = {(r1, s1), (r1, s2), ..., (r2, s1), ..., (rn, sn’)}

Whenever M is in state ri and M’ is in state sj,the DFA for L L’ will be in state (ri, sj)

start state

acceptingstates

ri for Msj for M’

(ri, sj)

F for MF’ for M’

F × F’ = {(ri, sj): ri ∈ F, sj ∈ F’}

Closure under intersection

M and M’ DFA for L L’

transitions

si’ sj’

a

ri, si’ rj, sj’ain M

in M’

ri rj

a

Reversal

• The reversal wR of a string w is w written backwards

• The reversal LR of a language L is the language obtained by reversing all its strings

w = cave wR = evac

L = {cat, dog} LR = {tac, god}

Reversal of regular languages

• L = all strings that end in 101 is regular

• How about LR?

• This is the language of all strings beginning in 101

• Yes, because it is represented by

(0+1)*101

101(0+1)*

Closure under reversal

• How do we argue?

If L is a regular language, so is LR.

regularexpression

DFANFA

Arguing closure under reversal

• Take a regular expression E for L

• We will show how to reverse E

• A regular expression can be of the following types:– The special symbols and – Alphabet symbols like a, b– The union, concatenation, or star of simpler

expressions

Proof of closure under reversal

regular expression E

a (alphabet symbol)

E1 + E2

reversal ER

E1E2

E1*

a

E1R + E2

R

E2RE1

R

(E1R)*

A question

If L is regular, is LDUP also regular?

regularexpression

DFANFA?

LDUP = {ww: w L} L = {cat, dog}Ex.

LDUP = {catcat, dogdog}

A question

• Let’s try with regular expression:

• Let’s try with NFA:

q0 q1NFA for L NFA for L

LDUP = LLL = {a, b}

LDUP = {aa, bb}

LL = {aa, ab, ba, bb}

An example

• Let’s try to design an NFA for LDUP

L = 0*1 is regular

LDUP = {11, 0101, 001001, 00010001, ...}

= {0n10n1: n ≥ 0}

LDUP = {1, 01, 001, 0001, ...}

An example

LDUP = {11, 0101, 001001, 00010001, ...}

= {0n10n1: n ≥ 0}

0 0 0 0

0001

1

001

1

01

1

1

1