Building Finite-State Machines


Page 1: Building Finite-State Machines

600.465 - Intro to NLP - J. Eisner 1

Building Finite-State Machines

Page 2: Building Finite-State Machines

Finite-State Toolkits

In these slides, we’ll use Xerox’s regexp notation.
Their tool is XFST; a free version is called FOMA.

Usage:
  Enter a regular expression; it builds an FSA or FST.
  Now type in an input string.
    FSA: It tells you whether it’s accepted.
    FST: It tells you all the output strings (if any).
    Can also invert the FST to let you map outputs to inputs.

Could hook it up to other NLP tools that need finite-state processing of their input or output.

There are other tools for weighted FSMs (Thrax, OpenFST).

Page 3: Building Finite-State Machines

Common Regular Expression Operators (in XFST notation)

        concatenation             EF
* +     iteration                 E*, E+
|       union                     E | F
&       intersection              E & F
~ \ -   complementation, minus    ~E, \x, F-E
.x.     crossproduct              E .x. F
.o.     composition               E .o. F
.u      upper (input) language    E.u   “domain”
.l      lower (output) language   E.l   “range”

Page 4: Building Finite-State Machines

Common Regular Expression Operators (in XFST notation)

        concatenation             EF

EF = {ef: e ∈ E, f ∈ F}

ef denotes the concatenation of 2 strings. EF denotes the concatenation of 2 languages.

To pick a string in EF, pick e ∈ E and f ∈ F and concatenate them.

To find out whether w ∈ EF, look for at least one way to split w into two “halves,” w = ef, such that e ∈ E and f ∈ F.

A language is a set of strings. It is a regular language if there exists an FSA that accepts all the strings in the language, and no other strings.

If E and F denote regular languages, then so does EF.

(We will have to prove this by finding the FSA for EF!)
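The membership test for EF described above can be sketched in a few lines of Python. This is only an illustration: the languages are passed in as membership predicates, and the toy sets E and F are hypothetical, not from the slides.

```python
from typing import Callable

def in_concat(w: str, in_e: Callable[[str], bool], in_f: Callable[[str], bool]) -> bool:
    """Return True if w can be split as w = ef with e in E and f in F."""
    # Try every split point, including the empty prefix and the empty suffix.
    return any(in_e(w[:i]) and in_f(w[i:]) for i in range(len(w) + 1))

# Hypothetical toy languages, chosen only for illustration.
E = {"a", "ab"}
F = {"c", "bc"}

print(in_concat("abc", E.__contains__, F.__contains__))  # "a" + "bc" or "ab" + "c"
print(in_concat("ba", E.__contains__, F.__contains__))   # no split works
```

A real toolkit answers the same question by running w through the FSA for EF instead of trying every split.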

Page 5: Building Finite-State Machines

Common Regular Expression Operators (in XFST notation)

* +     iteration                 E*, E+

E* = {e1e2 … en: n ≥ 0, e1 ∈ E, …, en ∈ E}

To pick a string in E*, pick any number of strings in E and concatenate them.

To find out whether w ∈ E*, look for at least one way to split w into 0 or more sections, e1e2 … en, all of which are in E.

E+ = {e1e2 … en: n > 0, e1 ∈ E, …, en ∈ E} = EE*
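As a rough sketch of the "split w into 0 or more sections" test, the following Python uses dynamic programming over prefixes of w. The function names and the toy language E are assumptions made for illustration.

```python
from typing import Callable

def in_star(w: str, in_e: Callable[[str], bool]) -> bool:
    """Return True if w = e1 e2 ... en for some n >= 0 with every ei in E."""
    # ok[i] is True when the prefix w[:i] can be split into sections from E.
    ok = [False] * (len(w) + 1)
    ok[0] = True  # n = 0: the empty string is always in E*
    for j in range(1, len(w) + 1):
        ok[j] = any(ok[i] and in_e(w[i:j]) for i in range(j))
    return ok[len(w)]

def in_plus(w: str, in_e: Callable[[str], bool]) -> bool:
    """E+ = E E*: one section from E followed by zero or more sections."""
    return any(in_e(w[:i]) and in_star(w[i:], in_e) for i in range(len(w) + 1))

E = {"ab", "c"}  # hypothetical toy language
print(in_star("abcab", E.__contains__))  # ab | c | ab
print(in_star("", E.__contains__))       # zero sections
print(in_plus("", E.__contains__))       # needs at least one section
```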

Page 6: Building Finite-State Machines

Common Regular Expression Operators (in XFST notation)

|       union                     E | F

E | F = {w: w ∈ E or w ∈ F} = E ∪ F

To pick a string in E | F, pick a string from either E or F.

To find out whether w ∈ E | F, check whether w ∈ E or w ∈ F.

Page 7: Building Finite-State Machines

Common Regular Expression Operators (in XFST notation)

&       intersection              E & F

E & F = {w: w ∈ E and w ∈ F} = E ∩ F

To pick a string in E & F, pick a string from E that is also in F.

To find out whether w ∈ E & F, check whether w ∈ E and w ∈ F.

Page 8: Building Finite-State Machines

Common Regular Expression Operators (in XFST notation)

~ \ -   complementation, minus    ~E, \x, F-E

~E = {e: e ∉ E} = Σ* - E

E - F = {e: e ∈ E and e ∉ F} = E & ~F

\E = Σ - E (any single character not in E)

Σ is the set of all letters; so Σ* is the set of all strings.

Page 9: Building Finite-State Machines

Regular Expressions

A language is a set of strings. It is a regular language if there exists an FSA that accepts all the strings in the language, and no other strings.

If E and F denote regular languages, then so do EF, etc.

Regular expression: EF*|(F & G)+

Syntax: [Figure: parse tree with | at the root. One child is concat, whose children are E and * (with child F). The other child is +, whose child is &, with children F and G.]

Semantics: Denotes a regular language. As usual, can build semantics compositionally bottom-up. E, F, G must be regular languages. As a base case, e denotes {e} (a language containing a single string), so ef*|(f&g)+ is regular.

Page 10: Building Finite-State Machines

Regular Expressions for Regular Relations

A language is a set of strings. It is a regular language if there exists an FSA that accepts all the strings in the language, and no other strings.

If E and F denote regular languages, then so do EF, etc.

A relation is a set of pairs; here, pairs of strings. It is a regular relation if there exists an FST that accepts all the pairs in the relation, and no other pairs.

If E and F denote regular relations, then so do EF, etc.

EF = {(ef, e’f’): (e,e’) ∈ E, (f,f’) ∈ F}

Can you guess the definitions for E*, E+, E | F, E & F when E and F are regular relations?

Surprise: E & F isn’t necessarily regular in the case of relations; so not supported.

Page 11: Building Finite-State Machines

Common Regular Expression Operators (in XFST notation)

.x.     crossproduct              E .x. F

E .x. F = {(e,f): e ∈ E, f ∈ F}

Combines two regular languages into a regular relation.

Page 12: Building Finite-State Machines

Common Regular Expression Operators (in XFST notation)

.o.     composition               E .o. F

E .o. F = {(e,f): ∃m. (e,m) ∈ E, (m,f) ∈ F}

Composes two regular relations into a regular relation.

As we’ve seen, this generalizes ordinary function composition.

Page 13: Building Finite-State Machines

Common Regular Expression Operators (in XFST notation)

.u      upper (input) language    E.u   “domain”

E.u = {e: ∃m. (e,m) ∈ E}

Page 14: Building Finite-State Machines

Common Regular Expression Operators (in XFST notation)

.l      lower (output) language   E.l   “range”
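For finite relations, the set definitions of .x., .o., .u and .l can be tried out directly with Python sets of pairs. This is a sketch of the definitions only, not of how a toolkit builds the machines; the function names and sample data are made up for illustration.

```python
def crossproduct(E, F):
    """E .x. F: pair every string of E with every string of F."""
    return {(e, f) for e in E for f in F}

def compose(R, S):
    """R .o. S: keep (e, f) when some middle string m links (e, m) and (m, f)."""
    return {(e, f) for (e, m) in R for (m2, f) in S if m == m2}

def upper(R):
    """R.u: the input side ("domain")."""
    return {e for (e, _) in R}

def lower(R):
    """R.l: the output side ("range")."""
    return {m for (_, m) in R}

# Hypothetical finite relations.
R = crossproduct({"a", "b"}, {"x"})
S = {("x", "1"), ("y", "2")}

print(sorted(compose(R, S)))   # pairs linked through the middle string "x"
print(sorted(upper(S)))
print(sorted(lower(compose(R, S))))
```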

Page 15: Building Finite-State Machines

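Function from strings to ...

              Acceptors (FSAs)    Transducers (FSTs)
Unweighted    {false, true}       strings
Weighted      numbers             (string, num) pairs

[Figure: one small example machine per cell. Arc labels: a, c, ε (unweighted acceptor); a:x, c:z, ε:y (unweighted transducer); a/.5, c/.7, ε/.5 with final weight .3 (weighted acceptor); a:x/.5, c:z/.7, ε:y/.5 with final weight .3 (weighted transducer).]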

Page 16: Building Finite-State Machines

How to implement?

        concatenation             EF
* +     iteration                 E*, E+
|       union                     E | F
~ \ -   complementation, minus    ~E, \x, E-F
&       intersection              E & F
.x.     crossproduct              E .x. F
.o.     composition               E .o. F
.u      upper (input) language    E.u   “domain”
.l      lower (output) language   E.l   “range”

slide courtesy of L. Karttunen (modified)

Page 17: Building Finite-State Machines

Concatenation

[Figure: machine followed by machine = concatenated machine]

example courtesy of M. Mohri

Page 18: Building Finite-State Machines

Union

[Figure: machine | machine = union machine]

example courtesy of M. Mohri

Page 19: Building Finite-State Machines

Closure (this example has outputs too)

[Figure: machine* = closure machine]

example courtesy of M. Mohri

The loop creates (red machine)+. Then we add a state to get ε | (red machine)+.

Why do it this way? Why not just make state 0 final?

Page 20: Building Finite-State Machines

Upper language (domain)

[Figure: transducer.u = acceptor for its upper language]

similarly construct lower language .l

also called input & output languages

example courtesy of M. Mohri

Page 21: Building Finite-State Machines

Reversal

[Figure: machine.r = reversed machine]

example courtesy of M. Mohri

Page 22: Building Finite-State Machines

Inversion

[Figure: transducer.i = inverted transducer]

example courtesy of M. Mohri

Page 23: Building Finite-State Machines


Complementation

Given a machine M, represent all strings not accepted by M

Just change final states to non-final and vice-versa

Works only if machine has been determinized and completed first (why?)
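A small Python sketch may help with the "(why?)" about completion. It assumes the machine is already deterministic, stored as a dict from (state, symbol) to next state; the sink-state name and the toy machine are hypothetical. Determinization itself is not shown.

```python
def accepts(delta, start, finals, w):
    """Run a deterministic machine; a missing arc means rejection."""
    q = start
    for ch in w:
        if (q, ch) not in delta:
            return False
        q = delta[(q, ch)]
    return q in finals

def complete(delta, states, alphabet, sink="SINK"):
    """Add a non-final sink state so every state has an arc for every symbol."""
    full = dict(delta)
    all_states = set(states) | {sink}
    for q in all_states:
        for ch in alphabet:
            full.setdefault((q, ch), sink)
    return full, all_states

def complement(delta, states, start, finals, alphabet):
    """Complete the machine, then swap final and non-final states."""
    full, all_states = complete(delta, states, alphabet)
    return full, all_states, start, all_states - set(finals)

# Hypothetical machine M over {a, b} that accepts exactly "ab".
delta = {(0, "a"): 1, (1, "b"): 2}
states, start, finals, alphabet = {0, 1, 2}, 0, {2}, "ab"

c_delta, c_states, c_start, c_finals = complement(delta, states, start, finals, alphabet)
print(accepts(c_delta, c_start, c_finals, "b"))   # "b" is not accepted by M

# Flipping finals WITHOUT completing first: "b" has no path, so it is still rejected.
naive_finals = states - finals
print(accepts(delta, start, naive_finals, "b"))
```

The sink state collects every string that falls off the original machine, which is why it must exist (and become final) before the flip.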

Page 24: Building Finite-State Machines

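Intersection

example adapted from M. Mohri

[Figure: two weighted acceptors and their intersection (machine 1 & machine 2 = result).
 Machine 1: states 0, 1, 2/0.8; arcs fat/0.5, pig/0.3, eats/0, sleeps/0.6.
 Machine 2: states 0, 1, 2/0.5; arcs fat/0.2, pig/0.4, eats/0.6, sleeps/1.3.
 Result: states 0,0; 0,1; 1,1; 2,0/0.8; 2,2/1.3; arcs fat/0.7, pig/0.7, eats/0.6, sleeps/1.9.]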

Page 25: Building Finite-State Machines

Intersection

[Figure: the same two machines and their intersection as on the previous slide.]

Paths 0→0→1→2 and 0→1→1→0 both accept fat pig eats. So must the new machine: along path 0,0 → 0,1 → 1,1 → 2,0.
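The product construction behind this example can be sketched in Python. The arc weights come from the figure; the arc endpoints are read off the path descriptions on these slides, and the final weight 0 for state 0 of the second machine is an assumption (it makes state 2,0 come out as 0.8). Weights are added, as in the figure.

```python
from collections import deque

def intersect(arcs1, final1, arcs2, final2, start=(0, 0)):
    """Product construction. arcs: {state: [(label, weight, next_state)]}."""
    arcs, final = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        q = queue.popleft()
        q1, q2 = q
        arcs[q] = []
        if q1 in final1 and q2 in final2:
            final[q] = final1[q1] + final2[q2]
        for a, w1, r1 in arcs1.get(q1, []):
            for b, w2, r2 in arcs2.get(q2, []):
                if a == b:  # both machines must read the same word
                    r = (r1, r2)
                    arcs[q].append((a, w1 + w2, r))
                    if r not in seen:
                        seen.add(r)
                        queue.append(r)
    return arcs, final

def path_weight(arcs, final, words, start=(0, 0)):
    """Weight of the accepting path for words (unique in this example), else None."""
    q, total = start, 0.0
    for word in words:
        steps = [(w, r) for a, w, r in arcs.get(q, []) if a == word]
        if not steps:
            return None
        w, q = steps[0]
        total += w
    return total + final[q] if q in final else None

# Machine 1 and machine 2 (topology inferred from the slides' path descriptions).
arcs1 = {0: [("fat", 0.5, 0), ("pig", 0.3, 1)],
         1: [("eats", 0.0, 2), ("sleeps", 0.6, 2)]}
final1 = {2: 0.8}
arcs2 = {0: [("fat", 0.2, 1)],
         1: [("pig", 0.4, 1), ("eats", 0.6, 0), ("sleeps", 1.3, 2)]}
final2 = {0: 0.0, 2: 0.5}  # final weight of state 0 is an assumption

arcs, final = intersect(arcs1, final1, arcs2, final2)
print(sorted(arcs))  # the reachable paired states
print(path_weight(arcs, final, ["fat", "pig", "eats"]))
```

Only paired states reachable from 0,0 are built, which is why the result has five states rather than all nine pairs.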

Page 26: Building Finite-State Machines

Intersection

[Figure: the same two machines with their fat arcs highlighted (fat/0.5, fat/0.2); the result so far has states 0,0 and 0,1 joined by fat/0.7.]

Paths 0→0 and 0→1 both accept fat. So must the new machine: along path 0,0 → 0,1.

Page 27: Building Finite-State Machines

Intersection

[Figure: the same two machines with their pig arcs highlighted (pig/0.3, pig/0.4); the result so far has states 0,0; 0,1; 1,1 with arcs fat/0.7 and pig/0.7.]

Paths 0→1 and 1→1 both accept pig. So must the new machine: along path 0,1 → 1,1.

Page 28: Building Finite-State Machines

Intersection

[Figure: the same two machines with their sleeps arcs highlighted (sleeps/0.6, sleeps/1.3); the result now also has the arc sleeps/1.9 into state 2,2/1.3.]

Paths 1→2 and 1→2 both accept sleeps. So must the new machine: along path 1,1 → 2,2.

Page 29: Building Finite-State Machines

Intersection

[Figure: the same two machines with their eats arcs highlighted (eats/0, eats/0.6); the completed result now also has the arc eats/0.6 into state 2,0/0.8.]

Page 30: Building Finite-State Machines

What Composition Means

[Figure: f maps ab?d to abcd, abed, abjd, ... with weights 3, 2, 6; g maps those strings onward with weights 4, 2, 8.]

Page 31: Building Finite-State Machines

What Composition Means

Relation composition: f ∘ g

[Figure: f ∘ g maps ab?d directly to the outputs of g, with weights 3+4, 2+2, 6+8.]

Page 32: Building Finite-State Machines

Relation = set of pairs

f = {ab?d ↦ abcd, ab?d ↦ abed, ab?d ↦ abjd, …}

[Figure: g drawn as a set of pairs whose inputs are abcd, abed, abed, …; weights as before (f: 3, 2, 6; g: 4, 2, 8).]

g does not contain any pair of the form abjd ↦ …

Page 33: Building Finite-State Machines

Relation = set of pairs

f = {ab?d ↦ abcd, ab?d ↦ abed, ab?d ↦ abjd, …}

[Figure: g as a set of pairs with inputs abcd, abed, abed, …, and f ∘ g as a set of pairs that all have input ab?d; weights shown: 4, 2, ..., 8.]

f ∘ g = {x ↦ z: ∃y (x ↦ y ∈ f and y ↦ z ∈ g)}, where x, y, z are strings

Page 34: Building Finite-State Machines

Intersection vs. Composition

Intersection:   pig/0.3  &  pig/0.4  =  pig/0.7
                (arcs 0→1 and 0→1 give an arc from state 0,1 to state 1,1)

Composition:    Wilbur:pig/0.3  .o.  pig:pink/0.4  =  Wilbur:pink/0.7
                (arcs 0→1 and 0→1 give an arc from state 0,1 to state 1,1)

Page 35: Building Finite-State Machines

Intersection vs. Composition

Intersection mismatch:   pig/0.3  &  elephant/0.4
                         (the labels differ, so the arc pig/0.7 from 0,1 to 1,1 is not built)

Composition mismatch:    Wilbur:pig/0.3  .o.  elephant:gray/0.4
                         (pig does not match elephant, so the arc Wilbur:gray/0.7 from 0,1 to 1,1 is not built)

Page 36: Building Finite-State Machines

Composition

[Figure: transducer .o. transducer = composed transducer]

example courtesy of M. Mohri

Page 37: Building Finite-State Machines

Composition

[Figure: transducer .o. transducer = composed transducer]

a:b .o. b:b = a:b

Page 38: Building Finite-State Machines

Composition

[Figure: transducer .o. transducer = composed transducer]

a:b .o. b:a = a:a

Page 39: Building Finite-State Machines

Composition

[Figure: transducer .o. transducer = composed transducer]

a:b .o. b:a = a:a

Page 40: Building Finite-State Machines

Composition

[Figure: transducer .o. transducer = composed transducer]

b:b .o. b:a = b:a

Page 41: Building Finite-State Machines

Composition

[Figure: transducer .o. transducer = composed transducer]

a:b .o. b:a = a:a

Page 42: Building Finite-State Machines

Composition

[Figure: transducer .o. transducer = composed transducer]

a:a .o. a:b = a:b

Page 43: Building Finite-State Machines

Composition

[Figure: transducer .o. transducer = composed transducer]

b:b .o. a:b = nothing (since intermediate symbol doesn’t match)

Page 44: Building Finite-State Machines

Composition

[Figure: transducer .o. transducer = composed transducer]

b:b .o. b:a = b:a

Page 45: Building Finite-State Machines

Composition

[Figure: transducer .o. transducer = composed transducer]

a:b .o. a:b = a:b

Page 46: Building Finite-State Machines

Composition in Dyna


start = &pair( start1, start2 ).

final(&pair(Q1,Q2)) :- final1(Q1), final2(Q2).

edge(U, L, &pair(Q1,Q2), &pair(R1,R2)) min= edge1(U, Mid, Q1, R1) + edge2(Mid, L, Q2, R2).
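For readers who do not know Dyna, here is a rough Python rendering of the three rules above. It is a sketch, not a translation endorsed by the slides: edges are stored as a dict from (upper, lower, source, target) to weight, `min=` becomes an explicit min, and ε arcs are ignored. The data layout and the Wilbur example machine encoding are assumptions for illustration.

```python
def compose(edges1, start1, final1, edges2, start2, final2):
    """edges: {(upper, lower, q, r): weight}. Returns (edges, start, final)."""
    # start = &pair(start1, start2).
    start = (start1, start2)
    # final(&pair(Q1,Q2)) :- final1(Q1), final2(Q2).
    final = {(q1, q2) for q1 in final1 for q2 in final2}
    # edge(U, L, pair, pair) min= edge1(U, Mid, Q1, R1) + edge2(Mid, L, Q2, R2).
    edges = {}
    for (u, mid1, q1, r1), w1 in edges1.items():
        for (mid2, l, q2, r2), w2 in edges2.items():
            if mid1 == mid2:  # the intermediate symbol must match
                key = (u, l, (q1, q2), (r1, r2))
                edges[key] = min(edges.get(key, float("inf")), w1 + w2)
    return edges, start, final

# Wilbur:pig/0.3 composed with pig:pink/0.4, then with a mismatching arc.
e1 = {("Wilbur", "pig", 0, 1): 0.3}
e2 = {("pig", "pink", 0, 1): 0.4}
e3 = {("elephant", "gray", 0, 1): 0.4}

edges, start, final = compose(e1, 0, {1}, e2, 0, {1})
print(edges)                                  # one Wilbur:pink arc
print(compose(e1, 0, {1}, e3, 0, {1})[0])     # no arcs: pig does not match elephant
```

Unlike the breadth-first product construction, this version pairs every edge with every edge, so it can build arcs between paired states that are not reachable from the start.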

Page 47: Building Finite-State Machines

Relation = set of pairs

f = {ab?d ↦ abcd, ab?d ↦ abed, ab?d ↦ abjd, …}

[Figure: g as a set of pairs with inputs abcd, abed, abed, …, and f ∘ g as a set of pairs that all have input ab?d; weights shown: 4, 2, ..., 8.]

f ∘ g = {x ↦ z: ∃y (x ↦ y ∈ f and y ↦ z ∈ g)}, where x, y, z are strings

Page 48: Building Finite-State Machines

3 Uses of Set Composition:

Feed string into Greek transducer:
  {abedabed} .o. Greek = {abed,abed}
  {abed} .o. Greek = {abed,abed}
  [{abed} .o. Greek].l = {}

Feed several strings in parallel:
  {abcd, abed} .o. Greek = {abcd,abed,abed}
  [{abcd,abed} .o. Greek].l = {,,}

Filter result via No = {,,…}
  {abcd,abed} .o. Greek .o. No = {abcd,abed}

Page 49: Building Finite-State Machines

What are the “basic” transducers?

The operations on the previous slides combine transducers into bigger ones.

But where do we start?

  a:ε for each input letter a
  ε:x for each output letter x

Q: Do we also need a:x? How about ε:ε?

[Figure: two one-arc machines, labeled a:ε and ε:x]

Page 50: Building Finite-State Machines

Some Xerox Extensions

$         containment
=>        restriction
-> @->    replacement

Make it easier to describe complex languages and relations without extending the formal power of finite-state systems.

slide courtesy of L. Karttunen (modified)

Page 51: Building Finite-State Machines

Containment

$[ab*c]

“Must contain a substring that matches ab*c.”

Accepts xxxacyy
Rejects bcba

Equivalent expression: ?* [ab*c] ?*

[Figure: the corresponding machine, with arcs labeled a, b, c and a,b,c,?]

Warning: ? in regexps means “any character at all.” But ? in machines means “any character not explicitly mentioned anywhere in the machine.”

Page 52: Building Finite-State Machines

Restriction

a => b _ c

“Any a must be preceded by b and followed by c.”

Accepts bacbbacde
Rejects baca

Equivalent expression: ~[~[?* b] a ?*] & ~[?* a ~[c ?*]]
  ~[?* b] a ?*     contains a not preceded by b
  ?* a ~[c ?*]     contains a not followed by c

[Figure: the corresponding machine, with arcs labeled a, b, c, ?]

slide courtesy of L. Karttunen (modified)
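The meaning of a => b _ c can be checked with a direct scan in Python. This is a plain-string sketch of the constraint itself, not of the machine XFST compiles; the function name is made up.

```python
def satisfies_restriction(w: str) -> bool:
    """True if every 'a' in w is immediately preceded by 'b' and followed by 'c'."""
    for i, ch in enumerate(w):
        if ch == "a":
            preceded = i > 0 and w[i - 1] == "b"
            followed = i + 1 < len(w) and w[i + 1] == "c"
            if not (preceded and followed):
                return False
    return True

print(satisfies_restriction("bacbbacde"))  # the slide's accepted string
print(satisfies_restriction("baca"))       # the slide's rejected string
```

A string with no a at all satisfies the restriction vacuously, just as it does in the equivalent expression.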

Page 53: Building Finite-State Machines

Replacement

a b -> b a

“Replace ‘ab’ by ‘ba’.”

Transduces abcdbaba to bacdbbaa

Equivalent expression: [~$[a b] [[a b] .x. [b a]]]* ~$[a b]

[Figure: the corresponding transducer, with arcs labeled a, b, ?, a:b, b:a]

slide courtesy of L. Karttunen (modified)

Page 54: Building Finite-State Machines

Replacement is Nondeterministic

a b -> b a | x

“Replace ‘ab’ by ‘ba’ or ‘x’, nondeterministically.”

Transduces abcdbaba to {bacdbbaa, bacdbxa, xcdbbaa, xcdbxa}

Page 55: Building Finite-State Machines

Replacement is Nondeterministic

[ a b -> b a | x ] .o. [ x => _ c ]

“Replace ‘ab’ by ‘ba’ or ‘x’, nondeterministically.” Then keep only the outputs in which every x is followed by c.

Of the candidates {bacdbbaa, bacdbxa, xcdbbaa, xcdbxa} for abcdbaba, only bacdbbaa and xcdbbaa survive; bacdbxa and xcdbxa are filtered out because they contain an x that is not followed by c.
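The two steps on these slides, nondeterministic replacement followed by a restriction filter, can be imitated on plain strings. The sketch below assumes the pattern cannot overlap itself (true for "ab"), so `str.split` finds exactly the occurrences that must be replaced; the function names are hypothetical.

```python
from itertools import product

def replace_all(text, pattern, choices):
    """All outputs of replacing every occurrence of pattern by some choice."""
    pieces = text.split(pattern)        # text between the occurrences
    n = len(pieces) - 1                 # number of occurrences
    results = set()
    for combo in product(choices, repeat=n):
        out = pieces[0]
        for rep, piece in zip(combo, pieces[1:]):
            out += rep + piece
        results.add(out)
    return results

def every_x_before_c(s):
    """The filter x => _ c: every x must be immediately followed by c."""
    return all(s[i + 1:i + 2] == "c" for i, ch in enumerate(s) if ch == "x")

outputs = replace_all("abcdbaba", "ab", ["ba", "x"])
print(sorted(outputs))
print(sorted(s for s in outputs if every_x_before_c(s)))
```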

Page 56: Building Finite-State Machines

Replacement is Nondeterministic

a b | b | b a | a b a -> x

applied to “aba”

Four overlapping substrings match; we haven’t told it which one to replace so it chooses nondeterministically.

Input:    a b a    a b a    a b a    a b a
Output:   a x a    a x      x a      x

slide courtesy of L. Karttunen (modified)

Page 57: Building Finite-State Machines

More Replace Operators

Optional replacement: a b (->) b a

Directed replacement guarantees a unique result by constraining the factorization of the input string by
  Direction of the match (rightward or leftward)
  Length (longest or shortest)

slide courtesy of L. Karttunen

Page 58: Building Finite-State Machines

@-> Left-to-right, Longest-match Replacement

a b | b | b a | a b a @-> x

applied to “aba”

[Figure: the four candidate factorizations of a b a, with outputs a x a, a x, x a, x; the left-to-right, longest match is a b a, giving x.]

@->   left-to-right, longest match
@>    left-to-right, shortest match
->@   right-to-left, longest match
>@    right-to-left, shortest match

slide courtesy of L. Karttunen
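A left-to-right, longest-match policy is easy to state procedurally. The Python below is a sketch of that policy for a finite list of literal patterns; it is not how XFST compiles @->, and the function name is an assumption.

```python
def replace_longest_leftmost(text, patterns, replacement):
    """Scan left to right; at each position replace the longest matching pattern."""
    by_length = sorted(patterns, key=len, reverse=True)  # try longest first
    out, i = [], 0
    while i < len(text):
        for p in by_length:
            if p and text.startswith(p, i):
                out.append(replacement)
                i += len(p)          # resume after the match: no overlaps
                break
        else:
            out.append(text[i])      # nothing matches here: copy one character
            i += 1
    return "".join(out)

print(replace_longest_leftmost("aba", ["ab", "b", "ba", "aba"], "x"))
```

Sorting the alternatives by increasing length instead would give the left-to-right, shortest-match behavior of @>.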

Page 59: Building Finite-State Machines

Using “…” for marking

a|e|i|o|u -> [ ... ]

p o t a t o
p[o]t[a]t[o]

Note: actually have to write as -> %[ ... %] or -> "[" ... "]" since [] are parens in the regexp language

[Figure: the corresponding transducer, with arcs labeled 0:[, [, 0:], ], ?, a, e, i, o, u]

slide courtesy of L. Karttunen (modified)

Page 60: Building Finite-State Machines

Using “…” for marking

a|e|i|o|u -> [ ... ]

p o t a t o
p[o]t[a]t[o]

Which way does the FST transduce potatoe?

p o t a t o e                 p o t a t o e
p[o]t[a]t[o][e]      vs.      p[o]t[a]t[o e]

How would you change it to get the other answer?

[Figure: the same transducer as on the previous slide]

slide courtesy of L. Karttunen (modified)
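The two bracketings of potatoe can be reproduced with Python’s `re` module, as an analogy only: Python regexes are not XFST, and the function names are made up. Marking each single vowel corresponds to the pattern a|e|i|o|u; marking maximal vowel runs corresponds to matching one or more vowels with a longest-match policy.

```python
import re

def mark_vowels(word: str) -> str:
    """Bracket every single vowel separately."""
    return re.sub(r"[aeiou]", r"[\g<0>]", word)

def mark_vowel_runs(word: str) -> str:
    """Bracket each maximal run of vowels as one unit."""
    return re.sub(r"[aeiou]+", r"[\g<0>]", word)

print(mark_vowels("potato"))
print(mark_vowels("potatoe"))
print(mark_vowel_runs("potatoe"))
```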

Page 61: Building Finite-State Machines

Example: Finnish Syllabification

define C [ b | c | d | f ...
define V [ a | e | i | o | u ];

[C* V+ C*] @-> ... "-" || _ [C V]

“Insert a hyphen after the longest instance of the C* V+ C* pattern in front of a C V pattern.”

s t r u k t u r a l i s m i
s t r u k - t u - r a - l i s - m i

why?

slide courtesy of L. Karttunen
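The syllabification rule can be approximated with a Python regex, where the right context _ [C V] becomes a lookahead. This is a sketch under assumptions: C is taken to be every lowercase ASCII letter that is not in V (the slide elides the full definition), and Python’s greedy matching happens to coincide with longest match for this particular pattern.

```python
import re

V = "aeiou"
C = "bcdfghjklmnpqrstvwxyz"  # assumed: all other lowercase ASCII letters

# C* V+ C* followed (without consuming it) by C V.
PATTERN = re.compile(rf"[{C}]*[{V}]+[{C}]*(?=[{C}][{V}])")

def syllabify(word: str) -> str:
    """Insert a hyphen after each match of the pattern, scanning left to right."""
    return PATTERN.sub(lambda m: m.group(0) + "-", word)

print(syllabify("strukturalismi"))
```

The lookahead is what stops the final C* from swallowing the consonant that must begin the next syllable.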

Page 62: Building Finite-State Machines

Conditional Replacement

A -> B      Replacement
L _ R       Context

The relation that replaces A by B between L and R leaving everything else unchanged.

Sources of complexity:
  Replacements and contexts may overlap
  Alternative ways of interpreting “between left and right.”

slide courtesy of L. Karttunen

Page 63: Building Finite-State Machines

Hand-Coded Example: Parsing Dates

Best result:
  Today is [Tuesday, July 25, 2000].

Bad results:
  Today is Tuesday, [July 25, 2000].
  Today is [Tuesday, July 25], 2000.
  Today is Tuesday, [July 25], 2000.
  Today is [Tuesday], July 25, 2000.

Need left-to-right, longest-match constraints.

slide courtesy of L. Karttunen

Page 64: Building Finite-State Machines

Source code: Language of Dates

Day = Monday | Tuesday | ... | Sunday

Month = January | February | ... | December

Date = 1 | 2 | 3 | ... | 3 1

Year = %0To9 (%0To9 (%0To9 (%0To9))) - %0?*        (from 1 to 9999)

AllDates = Day | (Day ", ") Month " " Date (", " Year)

slide courtesy of L. Karttunen

Page 65: Building Finite-State Machines

Object code: All Dates from 1/1/1 to 12/31/9999

[Figure: the compiled FSA. Arcs are labeled with day names (Mon, Tue, Wed, Thu, Fri, Sat, Sun), month names (Jan through Dec), commas, and digit sets such as 123456789 and 0123456789. The arc drawn with the seven day names actually represents 7 arcs, each labeled by a string.]

13 states, 96 arcs
29 760 007 date expressions

slide courtesy of L. Karttunen

Page 66: Building Finite-State Machines

Parser for Dates

AllDates @-> "[DT " ... "]"        (@-> is the Xerox left-to-right replacement operator)

Compiles into an unambiguous transducer (23 states, 332 arcs).

Today is [DT Tuesday, July 25, 2000] because yesterday was [DT Monday] and it was [DT July 24] so tomorrow must be [DT Wednesday, July 26] and not [DT July 27] as it says on the program.

slide courtesy of L. Karttunen (modified)
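The date marker can be imitated with Python’s `re` module. This is an analogy, not the XFST transducer: the day and month names elided with "..." on the source-code slide are filled in with the standard English names, and the alternatives are ordered so that Python’s leftmost matching picks the longest date at each position. All identifiers are made up for illustration.

```python
import re

DAY = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
MONTH = ("January|February|March|April|May|June|July|August|"
         "September|October|November|December")
DATE = r"3[01]|[12][0-9]|[1-9]"   # 1 to 31, two-digit forms tried first
YEAR = r"[1-9][0-9]{0,3}"         # 1 to 9999, no leading zero

# AllDates = Day | (Day ", ") Month " " Date (", " Year)
# The long alternative is listed first so a bare day name is only a fallback.
ALL_DATES = re.compile(
    rf"(?:(?:{DAY}), )?(?:{MONTH}) (?:{DATE})(?:, (?:{YEAR}))?|(?:{DAY})"
)

def mark_dates(text: str) -> str:
    """Wrap every date expression in [DT ...], scanning left to right."""
    return ALL_DATES.sub(lambda m: f"[DT {m.group(0)}]", text)

sentence = ("Today is Tuesday, July 25, 2000 because yesterday was Monday "
            "and it was July 24 so tomorrow must be Wednesday, July 26 "
            "and not July 27 as it says on the program.")
print(mark_dates(sentence))
```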

Page 67: Building Finite-State Machines

Problem of Reference

Valid dates
  Tuesday, July 25, 2000
  Tuesday, February 29, 2000
  Monday, September 16, 1996

Invalid dates
  Wednesday, April 31, 1996
  Thursday, February 29, 1900
  Tuesday, July 26, 2000

slide courtesy of L. Karttunen

Page 68: Building Finite-State Machines

Refinement by Intersection

[Figure: ValidDates drawn as the intersection of AllDates with three constraints: WeekdayDate, MaxDaysInMonth, LeapYears.]

MaxDaysInMonth (uses =>, the Xerox contextual restriction operator):
  " 31" => Jan|Mar|May|… _
  " 30" => Jan|Mar|Apr|… _
  Q: Why do these rules start with spaces? (And is it enough?)

LeapYears:
  Feb 29, => _ …
  Q: Why does this rule end with a comma?
  Q: Can we write the whole rule?

slide courtesy of L. Karttunen (modified)

Page 69: Building Finite-State Machines

Defining Valid Dates

AllDates & MaxDaysInMonth & LeapYears & WeekdayDates = ValidDates

AllDates: 13 states, 96 arcs; 29 760 007 date expressions
ValidDates: 805 states, 6472 arcs; 7 307 053 date expressions

slide courtesy of L. Karttunen

Page 70: Building Finite-State Machines

Parser for Valid and Invalid Dates

[AllDates - ValidDates] @-> "[ID " ... "]",
ValidDates @-> "[VD " ... "]"

Comma creates a single FST that does left-to-right longest match against either pattern.

Today is [VD Tuesday, July 25, 2000],      (valid date)
not [ID Tuesday, July 26, 2000].           (invalid date)

2688 states, 20439 arcs

slide courtesy of L. Karttunen