5.11.2002 CSA3050: NLP Algorithms 1
CSA3050: NLP Algorithms
Finite State Transducers for Morphological Parsing
Résumé
• FSAs recognise exactly the regular languages.
• FSTs compute exactly the regular relations (sets of string pairs; the two sides of such a relation are regular languages).
• FSTs are like FSAs, but each arc carries a complex label: a pair of symbols rather than a single symbol.
• We can use FSTs to transduce between the surface and lexical levels.
Dotted Pair Notation
1) FSA recogniser for "fox":
f o x
2) FST transducers for fox/fox; goose/geese:
f:f o:o x:x
g:g o:e o:e s:s e:e
Dotted Pair Notation (2)
• By convention, x:y pairs lexical symbol x with surface symbol y
• Within FSTs we often encounter "default pairs" of the form x:x. These are usually abbreviated to "x".
g o:e o:e s e
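The notation above can be sketched in a few lines of Python. This is a minimal illustration, not a full FST: it handles only a single linear path, and the helper names `parse_pairs` and `generate` are invented here for the example.

```python
def parse_pairs(spec):
    """Turn 'g o:e o:e s e' into [('g','g'), ('o','e'), ...].

    A bare symbol 'x' is read as the default pair x:x.
    """
    pairs = []
    for token in spec.split():
        lexical, _, surface = token.partition(":")
        pairs.append((lexical, surface if ":" in token else lexical))
    return pairs


def generate(pairs, lexical):
    """Map a lexical string to its surface string, or None if the path rejects it."""
    if [lex for lex, _ in pairs] != list(lexical):
        return None
    return "".join(surf for _, surf in pairs)


GOOSE = parse_pairs("g o:e o:e s e")
print(generate(GOOSE, "goose"))  # geese
print(generate(GOOSE, "fox"))    # None
```

Reading the left-hand symbols of each pair gives the lexical string, and reading the right-hand symbols gives the surface string.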
FSA for Number Inflection
How can we augment this to produce an analysis?
3 Steps
1. Create a transducer Tnum for noun number inflection. This will add number and category information given word classes as input.
2. Create a transducer Tstems mapping words to word classes.
3. Hook the two together.
Tnum example
“lexical”: reg-noun-stem +N +PL
“intermediate”: reg-noun-stem ^ s #
1. Tnum: Noun Number Inflection
• multi-character symbols
• morpheme boundary ^
• word boundary #
Tstems example
“intermediate”: reg-noun-stem #
“surface”: (stem letters) #
Tstems paths: f:f o:o x:x (fox); d:d o:o g:g (dog)
Tstems example
“intermediate”: irreg-pl-noun-form #
“surface”: (word letters) #
Tstems paths: m o:i u:ε s:c e (mouse/mice); s h e e p (sheep)
2. Tstems Lexicon
Hooking Together
• There are two ways to hook the two transducers together
• Cascading: feeding the output of one transducer into the input of the other and running them in series.
• Composition: composing the two transducers together mathematically to create a third, equivalent transducer.
Hooking Together: cascading
lexical: reg-noun-stem +N +PL
(Tnum)
intermediate: reg-noun-stem ^ s #
(Tstems)
surface: fox / dog, then s #
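A cascade of this kind can be sketched as two Python functions run in series. This is a toy model under stated assumptions, not a real transducer implementation: each transducer is a plain function on symbol lists, the stem list and the names `t_num`, `t_stems` and `cascade` are invented for illustration, and the boundary symbols ^ and # are simply passed through by Tstems (the slide does not make clear whether ^ survives to this level).

```python
REG_NOUN_STEMS = ["fox", "dog"]  # assumed toy lexicon


def t_num(lexical):
    """Tnum: reg-noun-stem +N +PL  ->  reg-noun-stem ^ s #"""
    if lexical == ["reg-noun-stem", "+N", "+PL"]:
        return ["reg-noun-stem", "^", "s", "#"]
    return None  # anything else is rejected in this sketch


def t_stems(intermediate):
    """Tstems: replace the word class by each stem that belongs to it."""
    if not intermediate or intermediate[0] != "reg-noun-stem":
        return []
    rest = intermediate[1:]
    return [list(stem) + rest for stem in REG_NOUN_STEMS]


def cascade(lexical):
    """Run Tnum, then feed its output into Tstems."""
    intermediate = t_num(lexical)
    if intermediate is None:
        return []
    return ["".join(symbols) for symbols in t_stems(intermediate)]


print(cascade(["reg-noun-stem", "+N", "+PL"]))  # ['fox^s#', 'dog^s#']
```

The intermediate string exists only at run time here; composition would instead build one machine that never materialises it.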
Composition of Relations
• Let R and S be binary relations.
• The composition of R and S, written R ∘ S, is defined as:
• for all a and c, (a,c) ∈ R ∘ S if and only if there is some b such that (a,b) ∈ R and (b,c) ∈ S
• Transducers can also be composed
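For finite relations the definition can be tried out directly. The sketch below represents a relation as a Python set of pairs; the function name `compose` and the sample data are made up for the example, and real FST composition works on the machines themselves rather than on enumerated pairs.

```python
def compose(r, s):
    """Return {(a, c) : there is some b with (a, b) in r and (b, c) in s}."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


R = {(1, "a"), (2, "b")}
S = {("a", "x"), ("a", "y"), ("c", "z")}
print(sorted(compose(R, S)))  # [(1, 'x'), (1, 'y')]
```

The pair (2, "b") contributes nothing because S has no pair starting with "b", and ("c", "z") is never reached because R produces no "c".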
Tnum ∘ Tstems
English Spelling Rules
• consonant doubling: beg / begging
• y replacement: try/tries
• k insertion: panic/panicked
• e deletion: make/making
• e insertion: watch/watches
• Each rule can be stated in more detail ...
e Insertion Rule
• Insert an e on the surface tape just when the lexical tape has a morpheme ending in x, s, z, or ch and the next and final morpheme is -s
• Stated formally: ε → e / [x|s|z|ch] ^ __ s #
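The effect of the rule on intermediate strings can be imitated with a regular-expression substitution. This is only a stand-in for the FST, not the transducer itself; the function name is invented, and removing the boundary symbols ^ and # at the end is an assumption about how the surface string is finally produced.

```python
import re

# Left context: x, s, z or ch followed by the morpheme boundary ^.
# Right context (lookahead): the suffix s followed by the word boundary #.
E_INSERTION = re.compile(r"(x|s|z|ch)\^(?=s#)")


def apply_e_insertion(intermediate):
    """Insert e where the rule's context matches, then drop boundary symbols."""
    with_e = E_INSERTION.sub(r"\1^e", intermediate)
    return with_e.replace("^", "").replace("#", "")


print(apply_e_insertion("fox^s#"))    # foxes
print(apply_e_insertion("watch^s#"))  # watches
print(apply_e_insertion("dog^s#"))    # dogs
```

When the context does not match, as with dog^s#, the string passes through unchanged apart from the boundary symbols.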
e insertion over 3 levels
The rule corresponds to the mapping between surface and intermediate levels
e insertion as an FST
Incorporating Spelling Rules
• Spelling rules, each corresponding to an FST, can be run in parallel provided that they are "aligned".
• The set of spelling rules is positioned between the surface level and the intermediate level.
• Parallel execution of FSTs can be carried out:
– by simulation: in this case FSTs must first be aligned.
– by first constructing a single FST corresponding to their intersection.
Putting it all together
Execution of the FSTi takes place in parallel.
Kaplan and Kay
The Xerox View
FSTi are aligned but separate
FSTi intersected together
Operations over FSTs
• We can perform operations over FSTs which yield other FSTs.
– Inversion
– Union
– Composition
• The inversion of T, written T⁻¹, simply computes the inverse mapping to T.
Inversion
T: lexical c a t ^ PL → surface c a t s
T⁻¹: surface c a t s → lexical c a t ^ PL
Inversion
• To invert a transducer
– we switch the order of the complex symbols, i.e. every i:o becomes o:i
– or we leave the transducer alone, and slightly change the parsing algorithm.
• Practical consequences:
– Transducer is reversible
– We can use exactly the same transducer to perform either analysis or generation.
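Inversion by swapping the symbols in each pair can be sketched as follows. The sketch covers a single linear path only; the names `invert` and `transduce` are invented, and the particular pairing of ^ with ε and PL with s is an assumption made so that the cat/cats example lines up.

```python
EPS = ""  # the empty string stands in for epsilon


def invert(pairs):
    """Swap every i:o into o:i."""
    return [(o, i) for (i, o) in pairs]


def transduce(pairs, symbols):
    """Follow one linear path; return the output symbols or None on mismatch."""
    symbols = list(symbols)
    pos, out = 0, []
    for inp, outp in pairs:
        if inp != EPS:  # epsilon on the input side consumes nothing
            if pos >= len(symbols) or symbols[pos] != inp:
                return None
            pos += 1
        if outp != EPS:  # epsilon on the output side emits nothing
            out.append(outp)
    return out if pos == len(symbols) else None


# T maps lexical "c a t ^ PL" to surface "c a t s".
T = [("c", "c"), ("a", "a"), ("t", "t"), ("^", EPS), ("PL", "s")]

print(transduce(T, ["c", "a", "t", "^", "PL"]))  # ['c', 'a', 't', 's']
print(transduce(invert(T), "cats"))              # ['c', 'a', 't', '^', 'PL']
```

The same list of pairs serves generation and, once inverted, analysis, which is the reversibility noted above.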
Closure Properties of FSTs
Relations computed by FSTs are
– closed under
• inversion
• union
• composition
– not closed (in general) under
• intersection. However intersection is possible provided that we restrict the class of transducers.
• complementation
• subtraction
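A standard textbook counterexample (not taken from these slides) shows why intersection fails in general. Let R1 pair aⁿ with bⁿc* and R2 pair aⁿ with b*cⁿ; each is computable by an FST, but their intersection pairs aⁿ with bⁿcⁿ, and the set of strings bⁿcⁿ is not a regular language. The sketch below only enumerates bounded finite slices of the two relations to make the pattern visible; the function names and the bound are arbitrary.

```python
def r1(limit):
    """Finite slice of R1 = {(a^n, b^n c^m)}."""
    return {("a" * n, "b" * n + "c" * m) for n in range(limit) for m in range(limit)}


def r2(limit):
    """Finite slice of R2 = {(a^n, b^m c^n)}."""
    return {("a" * n, "b" * m + "c" * n) for n in range(limit) for m in range(limit)}


both = r1(4) & r2(4)
for pair in sorted(both):
    print(pair)
# ('', '')
# ('a', 'bc')
# ('aa', 'bbcc')
# ('aaa', 'bbbccc')
```

Only the pairs with equal numbers of b's and c's survive, which is exactly the pattern a finite-state device cannot enforce for unbounded n.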