ICS 482 Natural Language Processing Regular Expression and Finite Automata - 2
1 Introduction to Finite Automata Fundamentals Deterministic finite automata Representations of...
alban-norris -
1
Introduction to Finite Automata
Fundamentals
Deterministic finite automata
Representations of automata
Definition of a language
Proof of language content
2
Review: What is the difference between S ≠ T and S ∩ T = ∅?
Recursive definitions use the thing being defined as part of the definition.
Example: recursive definition of a tree
Basis: a single node is a tree
IH: assume that T1 … Tk are trees
Induction: connect T1 … Tk to the node that is the root of the recursively defined tree
3
Review: FA Recognizing Strings Ending in “ing”
[Figure: state diagram with states “empty”, “Saw i”, “Saw in”, “Saw ing”; arcs labeled “i”, “n”, “g”, “Not i”, “Not i or n”, “Not i or g”; Start arrow.]
Double circle denotes “finish” or “accepting” state
We can get the next character in any state. Where we go depends on which character.
What is wrong with this FA?
Rules for deterministic finite automata
Finite automata move from state to state in response to input controls
Deterministic finite automata (DFA) are always in a unique state
Each input determines one and only one state to which the automaton will transition from its current state
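The rule above can be checked mechanically. The following is a minimal sketch, assuming the arcs of a diagram are written as (state, input, next state) triples; the function name `violations` and the two-state example are hypothetical, not taken from the slides.

```python
from collections import defaultdict

def violations(states, alphabet, arcs):
    """Return (state, symbol, targets) entries that break the DFA rule.

    The rule: every (state, symbol) pair must lead to exactly one state.
    """
    targets = defaultdict(set)
    for q, a, p in arcs:
        targets[(q, a)].add(p)
    return [(q, a, sorted(targets[(q, a)]))
            for q in sorted(states) for a in sorted(alphabet)
            if len(targets[(q, a)]) != 1]

# Hypothetical example: state "s" has two arcs on "i",
# and state "t" has no arc on "i".
arcs = [("s", "i", "s"), ("s", "i", "t"), ("s", "x", "s"), ("t", "x", "s")]
print(violations({"s", "t"}, {"i", "x"}, arcs))
```

An empty result means the diagram obeys the rule; each reported entry shows either a missing arc (no targets) or a nondeterministic choice (several targets).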
4
[Figure: state diagram with states “Ready” and “Sending”; arcs labeled “data in”, “done”, “timeout”; Start arrow.]
5
Review: FA Recognizing Strings Ending in “ing”
[Figure: state diagram with states “empty”, “Saw i”, “Saw in”, “Saw ing”; arcs labeled “i”, “n”, “g”, “Not i”, “Not i or n”, “Not i or g”, “Any letter”; Start arrow.]
Does this example obey the rules?
How many strings ending in “ing” will this FA find in the input?
6
Review: FA Recognizing Strings Ending in “ing”
[Figure: state diagram with states “empty”, “Saw i”, “Saw in”, “Saw ing”; arcs labeled “i”, “n”, “g”, “Not i”, “Not i or n”, “Not i or g”, “Any letter”; Start arrow.]
How do I change this FA so that it only accepts input strings ending in “ing”?
Central concepts of automata theory
Alphabets: sets of symbols
Strings: lists of symbols from an alphabet
Languages: sets of strings from the same alphabet
7
8
Alphabets
Any finite set of symbols
Usually denoted by Σ
Examples:
Σ = set of all ASCII characters
Σ = {0,1} binary alphabet
Σ = {a,b,…,z} lower case letters
9
Sets of Strings
A set of strings over alphabet Σ is a set of lists, each element of which is a member of Σ
ε denotes the empty string (length = 0)
Σ* denotes the set of all strings over Σ
{0,1}* = {ε,0,1,00,01,10,11,000,…}
Σ^k denotes strings of length k over a specified alphabet
Σ^0 = {ε} for any alphabet
Σ^+ denotes non-empty strings
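The notation above can be made concrete by listing strings. This is a small sketch, assuming symbols are single characters; the helper names `strings_of_length` and `strings_up_to` are illustrative only.

```python
from itertools import product

def strings_of_length(alphabet, k):
    """All strings of length k over the alphabet (the set written Σ^k)."""
    return ["".join(t) for t in product(sorted(alphabet), repeat=k)]

def strings_up_to(alphabet, n):
    """A finite prefix of Σ*: all strings of length 0..n, shortest first."""
    return [w for k in range(n + 1) for w in strings_of_length(alphabet, k)]

print(strings_of_length({"0", "1"}, 0))  # Σ^0 holds only the empty string
print(strings_up_to({"0", "1"}, 2))
```

Σ* itself is infinite, so only a prefix of it can be listed; dropping the first element of that list gives the corresponding prefix of Σ^+.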
10
More about Strings
Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ …
Σ and Σ^1 give different meaning to the same entity
Σ denotes an alphabet
Σ^1 denotes strings of unit length over alphabet Σ
If x and y are strings, then xy is a string formed by their concatenation
11
Languages
A language is a subset of the strings over some alphabet Σ
L = Σ* is the language of all strings from Σ
L can be empty (L = ∅)
L = {ε} is not empty. It contains the empty string
See text p31 for examples from binary alphabet
12
Using DFAs to define languages
Elements in the formalism for defining languages:
1. A finite set of states (Q, typically).
2. An input alphabet (Σ, typically).
3. A transition function (δ, typically).
4. A start state in Q (q0, typically).
5. A set of final states F ⊆ Q (F, typically). “Final” and “accepting” are synonyms.
The DFA A = (Q, Σ, δ, q0, F) is a “five tuple”
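One possible way to hold the five tuple in a program is sketched below. The field names, the dictionary encoding of δ, and the one-state example machine are assumptions made for illustration.

```python
from typing import NamedTuple

class DFA(NamedTuple):
    states: frozenset    # Q
    alphabet: frozenset  # Σ
    delta: dict          # δ stored as {(state, symbol): next_state}
    start: str           # q0
    finals: frozenset    # F

def is_well_formed(m):
    """Check q0 ∈ Q, F ⊆ Q, and that δ is a total function Q × Σ → Q."""
    all_pairs = {(q, a) for q in m.states for a in m.alphabet}
    return (m.start in m.states
            and m.finals <= m.states
            and set(m.delta) == all_pairs
            and set(m.delta.values()) <= m.states)

# Hypothetical one-state machine over {0,1} that accepts every string.
m = DFA(frozenset({"q0"}), frozenset("01"),
        {("q0", "0"): "q0", ("q0", "1"): "q0"}, "q0", frozenset({"q0"}))
print(is_well_formed(m))
```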
13
The Transition Function
Takes two arguments: a state and an input symbol.
δ(q, a) = the state that the DFA goes to when it is in state q and input a is received.
In graph representation, δ(q, a) = p is shown by an arc from state q to state p labeled by a
The set of strings “accepted” by A = (Q, Σ, δ, q0, F) is its language
How do we determine if A = (Q, Σ, δ, q0, F) accepts a string?
Let a1, a2, …, an denote a finite string
Locate q0 in A using “start”
Find states q1, q2, …, qn such that δ(q_{i-1}, a_i) = q_i
If qn ∈ F then the string is accepted, otherwise the string is rejected
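The acceptance procedure above can be sketched as a short loop. The dictionary encoding of δ and the example machine (strings over {0,1} ending in 1, with made-up state names "z" and "o") are assumptions for illustration.

```python
def accepts(delta, start, finals, w):
    """Follow q0, q1, ..., qn with δ(q_{i-1}, a_i) = q_i; accept iff qn ∈ F."""
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in finals

# Hypothetical DFA: "z" = last symbol was not 1, "o" = last symbol was 1.
delta = {("z", "0"): "z", ("z", "1"): "o",
         ("o", "0"): "z", ("o", "1"): "o"}

print(accepts(delta, "z", {"o"}, "0101"))
print(accepts(delta, "z", {"o"}, "10"))
```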
14
15
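DFA that accepts all strings without two consecutive 1’s
[Figure: state diagram with states A (start), B, C. Arcs: A on 0 to A, A on 1 to B, B on 0 to A, B on 1 to C, C on 0,1 to C.]
A: no 11’s and ends in 0
B: no 11’s and ends in 1
C: consecutive 1’s have been seen.
L = ({A,B,C}, {0,1}, δ, A, {A,B})
3 possible terminations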
16
Transition Table: δ(q, a) is the element in row q of column a
        0   1
→ *A    A   B
  *B    A   C
   C    C   C
Final states starred. Arrow for start state.
17
Using graph to answer: Is string in language?
Is string 101 in the language of the DFA below?
[Figure: DFA for strings with no consecutive 1’s (states A, B, C; start at A).]
Start at A.
18
Is string in language 2
Is string 101 in the language of the DFA below?
[Figure: DFA for strings with no consecutive 1’s (states A, B, C; start at A).]
Follow arc labeled 1.
19
Is string in language 3
Is string 101 in the language of the DFA below?
[Figure: DFA for strings with no consecutive 1’s (states A, B, C; start at A).]
Then arc labeled 0 from current state B.
20
Is string in language 4
Is string 101 in the language of the DFA below?
[Figure: DFA for strings with no consecutive 1’s (states A, B, C; start at A).]
Finally arc labeled 1 from current state A. Result is an accepting state, so 101 is in the language.
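The walk through the graph for 101 can be reproduced as a trace of visited states. This sketch encodes the transition table of the no-consecutive-1’s DFA as a dictionary; the names `DELTA`, `FINALS`, and `trace` are illustrative.

```python
# Transition table of the DFA for strings without two consecutive 1's.
DELTA = {("A", "0"): "A", ("A", "1"): "B",
         ("B", "0"): "A", ("B", "1"): "C",
         ("C", "0"): "C", ("C", "1"): "C"}
FINALS = {"A", "B"}

def trace(w, start="A"):
    """Return the list of states visited while reading w, start included."""
    path = [start]
    for a in w:
        path.append(DELTA[(path[-1], a)])
    return path

path = trace("101")
print(path)                 # ['A', 'B', 'A', 'B']
print(path[-1] in FINALS)   # True: 101 is in the language
```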
Extended Transition Function: Delta-hat
We defined δ(q, a) as the state that the DFA goes to when it is in state q and input a is received.
We want an extended transition function δ̂(q, w) defined as the state a DFA is in after it processes string w starting from state q
This extended transition function allows a language to be defined by L = {w | δ̂(q0, w) ∈ F}
21
22
Recursive definition of Delta-hat
Write w = xa where a is the last symbol in w
Basis: δ̂(q, ε) = q, where ε is the empty string
Basis: δ̂(q, a) = δ(q, a)
Induction: δ̂(q, xa) = δ(δ̂(q, x), a)
If δ̂(q, x) = p then δ̂(q, w) = δ(p, a)
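The recursive definition translates almost line for line into a recursive function. This is a sketch under the assumption that δ is stored as a dictionary keyed by (state, symbol); the table used is the no-consecutive-1’s DFA from the earlier slides.

```python
def delta_hat(delta, q, w):
    """Extended transition function, following the recursive definition."""
    if w == "":                      # Basis: delta_hat(q, ε) = q
        return q
    x, a = w[:-1], w[-1]             # write w = xa, a is the last symbol
    return delta[(delta_hat(delta, q, x), a)]   # Induction step

DELTA = {("A", "0"): "A", ("A", "1"): "B",
         ("B", "0"): "A", ("B", "1"): "C",
         ("C", "0"): "C", ("C", "1"): "C"}

print(delta_hat(DELTA, "A", "101"))  # B
print(delta_hat(DELTA, "B", "011"))  # C
```

The second basis case, δ̂(q, a) = δ(q, a), needs no separate branch here: it follows from the induction step with x = ε.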
23
Use δ̂(q, xa) = δ(δ̂(q, x), a) to decompose delta-hat to nested deltas
        0   1
A*      A   B
B*      A   C
C       C   C
δ̂(B, 011) = δ(δ̂(B, 01), 1) = δ(δ(δ̂(B, 0), 1), 1) = δ(δ(A, 1), 1) = δ(B, 1) = C
011 not accepted
For convenience of slide preparation, I will use delta-underline to mean delta-hat
24
Quiz #2 Monday 9-8-14 on material in lecture “finite automata 2” and text pp28-52
25
Prove: Σ (j = 1 to n) j = n(n+1)/2
Base case n = 1: Σ (j = 1 to 1) j = 1(1+1)/2
Review: What method is used in this proof? If S(?) then S(?)
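As a worked illustration of one inductive step for this sum, here is the "if S(n) then S(n+1)" form, where S(n) is the statement that the sum of the first n integers equals n(n+1)/2. This is a sketch of one standard way to write the step, not necessarily the wording expected on the quiz.

```latex
\begin{align*}
\sum_{j=1}^{n+1} j &= \Bigl(\sum_{j=1}^{n} j\Bigr) + (n+1) \\
                   &= \frac{n(n+1)}{2} + (n+1) && \text{by the IH } S(n) \\
                   &= \frac{(n+1)(n+2)}{2}     && \text{which is } S(n+1)
\end{align*}
```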
26
Review: Induction on integers
• Methods “if S(n) then S(n+1)” and “if S(n-1) then S(n)” are equivalent, but “if S(n+1) then S(n)” is inconsistent with the induction principle.
• What you assume is your IH. Make sure it is true.
• I’m not accepting the “identity” method (text pp20-21) because it does not generalize to other forms of induction.
27
Review: Language of a DFA
Automata of all kinds define languages.
If A = (Q, Σ, δ, q0, F) is an automaton, L(A) is its language.
L(A) = the set of strings w such that δ̂(q0, w) is in F.
28
        0   1
A*      A   B
B*      A   C
C       C   C
Formal proof of language content
Delta_hat and transition table make it easy to test if a string is in L(DFA)
Formal proof of language content is a problem in equality of sets
29
S = L(DFA)
T = strings of 0’s and 1’s with no consecutive 1’s
To prove S = T, we need to prove:
If w is in S, then w is in T
If w is in T, then w is in S
Proof of “If w is in S, then w is in T” is by induction on the length of the string |w|
Proof of language content
[Figure: DFA for strings with no consecutive 1’s (states A, B, C; start at A).]
30
Since the DFA has 2 accepting states, “w is in S = L(DFA)” has 2 cases
• If δ̂(A, w) = A (string ends with 0)
• If δ̂(A, w) = B (string ends with 1)
[Figure: DFA for strings with no consecutive 1’s (states A, B, C; start at A). A: no 11’s and ends in 0. B: no 11’s and ends in 1.]
31
Prove: if w in L(DFA) then w has no 11’s
Basis: |w| = 0; i.e., w = ε
δ̂(A, ε) = A by definition (ε in S = L(DFA))
Since ε has no 1’s, ε is in T
Basis is true
Induction: |w| > 0, write w = xa, where a is the last symbol of w.
Inductive hypothesis: if x in L(DFA) then x has no 11’s
32
IH: “if x in L(DFA) then x has no 11’s” has 2 cases
Inductive hypothesis:
If δ̂(A, x) = A, then x has no consecutive 1’s and ends in 0.
If δ̂(A, x) = B, then x has no consecutive 1’s and ends in a single 1.
[Figure: DFA for strings with no consecutive 1’s (states A, B, C; start at A).]
33
If δ̂(A, xa) = A then δ̂(A, x) must be A or B and a must be 0 (see DFA above). By the IH, x has no 11’s and ends in 0 or 1. Thus, w has no 11’s and ends in 0.
If δ̂(A, xa) = B then δ̂(A, x) must be A and a must be 1 (see DFA above). By IH (case 1), x has no 11’s and ends in 0. Thus, w has no 11’s and ends in 1.
[Figure: DFA for strings with no consecutive 1’s (states A, B, C; start at A).]
Induction: w = xa
34
Prove: if w has no 11’s then w in L(DFA)
Contrapositive: If w is not accepted by the DFA (i.e. δ̂(A, w) = C) then w has 11’s.
If δ̂(A, w) = C then w = x1y, where δ̂(A, x) = B and y is the tail of w that follows what gets to C the first time.
If δ̂(A, x) = B then x = z1 for some z. Thus w = z11y and has 11.
[Figure: DFA for strings with no consecutive 1’s (states A, B, C; start at A).]
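A proof covers all strings, but a finite check can catch a wrong DFA or a wrong claim early. The sketch below compares S = L(DFA) with T = strings containing no "11" for every binary string up to a chosen length; the names `in_S`, `in_T`, and `first_disagreement` are illustrative, and passing the check is evidence, not a proof.

```python
from itertools import product

DELTA = {("A", "0"): "A", ("A", "1"): "B",
         ("B", "0"): "A", ("B", "1"): "C",
         ("C", "0"): "C", ("C", "1"): "C"}
FINALS = {"A", "B"}

def in_S(w):
    """Membership in S = L(DFA): run the DFA from A and test the last state."""
    q = "A"
    for a in w:
        q = DELTA[(q, a)]
    return q in FINALS

def in_T(w):
    """Membership in T: strings of 0's and 1's with no consecutive 1's."""
    return "11" not in w

def first_disagreement(max_len):
    """Return a string on which S and T differ, or None if there is none."""
    for k in range(max_len + 1):
        for t in product("01", repeat=k):
            w = "".join(t)
            if in_S(w) != in_T(w):
                return w
    return None

print(first_disagreement(10))  # None
```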
35
Exercise 2.2.2 p53: prove δ̂(q, xy) = δ̂(δ̂(q, x), y)
Recall: xy is the concatenation of strings x and y
When a is the last character in the string, true by definition of delta_hat: δ̂(q, xa) = δ(δ̂(q, x), a)
Base case is true
Complete proof by induction on |y|
36
Prove δ̂(q, xy) = δ̂(δ̂(q, x), y) by induction on |y|
Let y = za
δ̂(δ̂(q, x), y) = δ̂(δ̂(q, x), za); let δ̂(q, x) = p
= δ̂(p, za)
= δ(δ̂(p, z), a)   (def of δ̂, text p49)
= δ(δ̂(δ̂(q, x), z), a)   (substitution for p)
Assume δ̂(q, xz) = δ̂(δ̂(q, x), z) for strings shorter than y
Since |z| < |y|,
δ̂(δ̂(q, x), y) = δ(δ̂(δ̂(q, x), z), a) = δ(δ̂(q, xz), a)
= δ̂(q, xza)   (def of δ̂, text p49)
= δ̂(q, xy)   (substitution for za)
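The identity can also be spot-checked on a concrete machine before proving it. This sketch uses the recursive definition of delta-hat and the no-consecutive-1’s DFA from earlier slides, and tests every state with every pair of short strings; the function names are illustrative and the check is not a substitute for the induction.

```python
from itertools import product

DELTA = {("A", "0"): "A", ("A", "1"): "B",
         ("B", "0"): "A", ("B", "1"): "C",
         ("C", "0"): "C", ("C", "1"): "C"}

def delta_hat(q, w):
    """Recursive definition: delta_hat(q, ε) = q, delta_hat(q, xa) = δ(delta_hat(q, x), a)."""
    if w == "":
        return q
    return DELTA[(delta_hat(q, w[:-1]), w[-1])]

def strings_up_to(n):
    """All binary strings of length 0..n."""
    return ["".join(t) for k in range(n + 1) for t in product("01", repeat=k)]

def identity_holds(max_len):
    """Check delta_hat(q, xy) == delta_hat(delta_hat(q, x), y) for short x, y."""
    ws = strings_up_to(max_len)
    return all(delta_hat(q, x + y) == delta_hat(delta_hat(q, x), y)
               for q in "ABC" for x in ws for y in ws)

print(identity_holds(4))  # True
```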
37
Review: Assignment 2
Due 9-10-14
Exercise 2.2.9a, text p54
Given: A = (Q, Σ, δ, q0, {qf}) and δ(q0, a) = δ(qf, a) for all a in Σ
Prove: δ̂(q0, w) = δ̂(qf, w) for all w with |w| > 0 using induction on the length of the string.
Your proof must include the following:
(1) truth of base case
(2) statement of inductive hypothesis
(3) application of inductive hypothesis
Hint: let w = xa and use the definition of delta_hat
38
Exercise 2.2.9b, text p54
Given: A = (Q, Σ, δ, q0, {qf}) and δ̂(q0, w) = δ̂(qf, w) iff w in L(A) with |w| > 0
Prove by induction on integers that if |x| > 0 and x is in L(A) then x^k (k > 0 concatenations of x) is also in L(A)
Your proof must include the following:
(1) truth of base cases k=1 and k=2
(2) statement of inductive hypothesis k>3
(3) application of inductive hypothesis to show that if S(k-1) then S(k)
Note: the proof is about membership in L(A)
Hint: use the result from exercise 2.2.2 in text p53
39
Review: designing finite automata
Exercise 2.2.4a: A = ({A,B,C}, {0,1}, δ, A, {C})
L(A) = set of input strings ending in 00
Graph A
Exercise 2.2.10: A =
        0   1
→ A     A   B
  *B    B   A
L(A) = set of input strings with an odd number of 1’s
Prove by induction on length of string that an input string with an odd number of 1’s is accepted by A
40
Assignment #3 due 9-19-14
Exercise 2.2.10: DFA =
        0   1
→ A     A   B
  *B    B   A
L(DFA) = set of input strings with an odd number of 1’s
Prove by induction on length of string that
δ̂(A, w) = A if w contains an even number of 1’s
δ̂(A, w) = B if w contains an odd number of 1’s
Hints:
Basis includes |w| = 0 and |w| = 1
If w = xa, then a can be zero or one
If w = xa and a = 0 then x and w contain the same number of 1’s
41
More hints on Assignment #3
Exercise 2.2.10: DFA =
        0   1
→ A     A   B
  *B    B   A
Define parity(w) as whether w contains an even or odd number of ones.
Prove by induction on |w|
If parity(w) = even, then δ̂(A, w) = A
If parity(w) = odd, then δ̂(A, w) = B
Basis includes w = {ε, 0, 1}. For each case, your proof must ask “what is parity(w)?” and “what is δ̂(A, w)?”.
Induction with w = xa has 4 cases: parity(x) = even, a = {0,1} and parity(x) = odd, a = {0,1}. For each case, your proof must ask the same questions as in the Basis.
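The two questions in the hints ("what is parity(w)?" and "what is δ̂(A, w)?") can be asked by a program for every short string. This sketch encodes the table from Exercise 2.2.10 and lists any string where the two answers disagree; it is a sanity check on the claim, not the induction proof the assignment requires, and the helper names are illustrative.

```python
from itertools import product

# Transition table from Exercise 2.2.10: start state A, final state B.
DELTA = {("A", "0"): "A", ("A", "1"): "B",
         ("B", "0"): "B", ("B", "1"): "A"}

def delta_hat(q, w):
    """State reached from q after reading w."""
    for a in w:
        q = DELTA[(q, a)]
    return q

def parity(w):
    """'even' or 'odd' according to the number of 1's in w."""
    return "even" if w.count("1") % 2 == 0 else "odd"

def counterexamples(max_len):
    """Strings where parity(w) and delta_hat(A, w) disagree with the claim."""
    expected = {"even": "A", "odd": "B"}
    return ["".join(t)
            for k in range(max_len + 1)
            for t in product("01", repeat=k)
            if delta_hat("A", "".join(t)) != expected[parity("".join(t))]]

print(counterexamples(8))  # []
```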
42
Regular Languages
A language L is regular if it consists entirely of strings accepted by some DFA.
The DFA must accept only the strings in L, no others.
Some languages LNR are not regular.
No DFA exists that accepts all the strings in LNR and no others.
43
Example of LNR
L1 = {0^n 1^n | n ≥ 1}
L1 is the set of strings consisting of n 0’s followed by n 1’s, such that n is at least 1.
Thus, L1 = {01, 0011, 000111,…}
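Membership in L1 is easy to test in a general-purpose language, because the program can count without bound. A DFA has only a fixed, finite set of states to remember with, which is the intuition behind L1 not being regular. The function name `in_L1` is illustrative.

```python
def in_L1(w):
    """True iff w is n 0's followed by n 1's for some n >= 1."""
    n = len(w) // 2
    return n >= 1 and w == "0" * n + "1" * n

for w in ["01", "0011", "000111", "", "001", "10", "0101"]:
    print(repr(w), in_L1(w))
```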
44
Another LNR
L2 = {w | w in {(, )}* and w is balanced}
The alphabet consists of the parenthesis symbols ’(’ and ’)’.
Balanced parentheses are those that can appear in an arithmetic expression.
(), ()(), (()), (()()), etc
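A balanced-parentheses test can be sketched with a single depth counter. The counter can grow without limit as nesting deepens, which no fixed finite set of DFA states can track. The function name `is_balanced` is illustrative, and treating the empty string as balanced is an assumption of this sketch.

```python
def is_balanced(w):
    """True iff w over {'(', ')'} is balanced (empty string counted as balanced)."""
    depth = 0
    for c in w:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        else:
            return False        # symbol outside the alphabet
        if depth < 0:
            return False        # a ')' appeared before its matching '('
    return depth == 0

for w in ["()", "()()", "(())", "(()())", ")(", "(()"]:
    print(w, is_balanced(w))
```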
45
Many Languages are Regular
We will discuss 4 ways to describe a Regular Language:
DFA’s
Non-deterministic FA’s
Non-deterministic FA’s with ε-transitions
Regular expressions
Will show that all are equivalent
46