Human Language Technology
![Page 1: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/1.jpg)
Human Language Technology
Finite State Transducers
![Page 2: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/2.jpg)
October 2009 HLT: finite state transducers 2
Acknowledgement
• Material in this lecture derived/copied in part from
– Richard Sproat CL46 Lectures
– Lauri Karttunen LSA lectures 2005
– Shuly Wintner 2008 Malta
![Page 3: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/3.jpg)
October 2009 HLT: finite state transducers 3
Three Key Concepts
Finite State Transducers
Regular Relations
Computational Morphology
![Page 5: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/5.jpg)
October 2009 HLT: finite state transducers 5
A Regular Set
ababababababababababababababab..
L1
![Page 6: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/6.jpg)
October 2009 HLT: finite state transducers 6
Two Regular Sets
ababababababababababababababab..
bababababababababababababababa..
L1 L2
![Page 7: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/7.jpg)
October 2009 HLT: finite state transducers 7
A Regular Relation L1 x L2
ababababababababababababababab..
bababababababababababababababa..
L1 L2
or {("ab","ba"), ("abab","baba"),...}
![Page 8: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/8.jpg)
October 2009 HLT: finite state transducers 8
Some closure properties for regular relations
• Concatenation [R1 R2]
• Power (R^n)
• Reversal
• Inversion (R^-1)
• Composition: R1 ○ R2
![Page 9: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/9.jpg)
October 2009 HLT: finite state transducers 9
Concatenation and Power
Concatenation
R1 = {("a","b")}
R2 = {("c","d")}
[R1 R2] = {("ac","bd")}
Power
R1+ = {("a","b"),("aa","bb"), ("aaa","bbb"), ...}
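The two operations above can be sketched for finite relations stored as Python sets of string pairs. The function names `concat` and `power` are illustrative assumptions, not part of xfst; this is a minimal sketch, not how a finite-state toolkit implements them.

```python
from itertools import product

def concat(r1, r2):
    """[R1 R2]: concatenate every pair of R1 with every pair of R2, side by side."""
    return {(x1 + x2, y1 + y2) for (x1, y1), (x2, y2) in product(r1, r2)}

def power(r, n):
    """R^n for n >= 1: R concatenated with itself n times."""
    result = r
    for _ in range(n - 1):
        result = concat(result, r)
    return result

R1 = {("a", "b")}
R2 = {("c", "d")}
print(concat(R1, R2))  # {('ac', 'bd')}
print(power(R1, 3))    # {('aaa', 'bbb')}
```

R1+ is then the union of `power(R1, n)` over all n ≥ 1, which is infinite and so cannot be listed as a finite set.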
![Page 10: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/10.jpg)
October 2009 HLT: finite state transducers 10
Composition
• R1 ○ R2 denotes the composition of relations R1 and R2.
• Definition
If R1 contains <x,y>
and R2 contains <y,z>
then R1 ○ R2 contains <x,z>
• R1 and R2 must be relations. If either is just a language, it is assumed to abbreviate the identity relation.
• R1 ○ R2 is written [R1 .o. R2] in xfst
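For finite relations, the definition above can be sketched directly in Python. The helper names `compose` and `identity` are assumptions made for illustration; `identity` mirrors the convention that a language abbreviates the identity relation on its strings.

```python
def identity(language):
    """Lift a language (set of strings) to the identity relation on it."""
    return {(w, w) for w in language}

def compose(r1, r2):
    """R1 o R2: pair x with z whenever R1 has (x, y) and R2 has (y, z)."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

R1 = {("a", "b"), ("aa", "bb")}
R2 = {("b", "c"), ("bb", "cc")}
print(compose(R1, R2))               # pairs ('a', 'c') and ('aa', 'cc')
print(compose(identity({"a"}), R1))  # {('a', 'b')}
```

Composing with `identity({"a"})` restricts R1 to the inputs found in the language, which is how a plain language behaves on one side of a composition.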
![Page 11: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/11.jpg)
October 2009 HLT: finite state transducers 11
Closure Properties of Regular Languages and Relations
| Operation | Regular Languages | Regular Relations |
|---|---|---|
| Union | yes | yes |
| Concatenation | yes | yes |
| Iteration | yes | yes |
| Intersection | yes | no |
| Subtraction | yes | no |
| Complementation | yes | no |
| Composition | n/a | yes |
![Page 12: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/12.jpg)
October 2009 HLT: finite state transducers 12
Morphology as a Regular Relation
surface language: cat, cats, mice, lives, ...
lexical language: cat, cat+N+PL, mouse+N+PL, life+N+PL, live+V+3SING, ...
or {("cat","cat"),("cats","cat+N+PL"),...}
![Page 13: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/13.jpg)
October 2009 HLT: finite state transducers 13
Part-of-Speech Tagging
• I know some new tricks
• PRON V DET ADJ N
• said the Cat in the Hat
• V DET N P DET N
![Page 14: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/14.jpg)
October 2009 HLT: finite state transducers 14
Singular-to-plural mapping:
• cat hat ox child mouse sheep
• cats hats oxen children mice sheep
![Page 15: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/15.jpg)
October 2009 HLT: finite state transducers 15
Three Key Concepts
Finite State Transducers
Regular Relations
Computational Morphology
![Page 16: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/16.jpg)
October 2009 HLT: finite state transducers 16
FSA
a
Used for
• Recognition
• Generation
![Page 17: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/17.jpg)
October 2009 HLT: finite state transducers 17
Finite State Transducers
• A finite state transducer (FST) is essentially a finite state automaton (FSA) that works on two (or more) tapes.
• The most common way to think about transducers is as a kind of translating machine which works by reading from one tape and writing onto the other.
![Page 18: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/18.jpg)
October 2009 HLT: finite state transducers 18
FST Definition
• A 2-way FST is a quintuple (K, s, F, Σi × Σo, δ) where
• Σi, Σo are input and output alphabets
• K is a finite set of states
• s ∈ K is an initial state
• F ⊆ K are final states
• δ is a transition relation of type K × Σi × Σo × K
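As a rough illustration, the quintuple can be written down as a small Python data structure. The class name `FST`, its field names, and the helper `accepts_pair` are assumptions for this sketch; the two alphabets are stored as separate fields rather than as one product, and transitions carry exactly one symbol on each tape (no empty-string arcs).

```python
from typing import FrozenSet, NamedTuple, Tuple

class FST(NamedTuple):
    states: FrozenSet[int]                       # K
    start: int                                   # s, a member of K
    finals: FrozenSet[int]                       # F, a subset of K
    in_alphabet: FrozenSet[str]                  # input alphabet
    out_alphabet: FrozenSet[str]                 # output alphabet
    delta: FrozenSet[Tuple[int, str, str, int]]  # (state, input, output, state)

def accepts_pair(fst, upper, lower):
    """True if (upper, lower) is in the relation; lengths must match here."""
    if len(upper) != len(lower):
        return False
    current = {fst.start}
    for a, b in zip(upper, lower):
        current = {q2 for (q1, i, o, q2) in fst.delta
                   if q1 in current and i == a and o == b}
    return bool(current & fst.finals)

# A two-state machine with a single a:b transition.
simple = FST(frozenset({0, 1}), 0, frozenset({1}),
             frozenset({"a"}), frozenset({"b"}),
             frozenset({(0, "a", "b", 1)}))
print(accepts_pair(simple, "a", "b"))  # True
print(accepts_pair(simple, "a", "a"))  # False
```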
![Page 19: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/19.jpg)
October 2009 HLT: finite state transducers 19
FST
a
Used for
• Recognition
• Generation
• Translation
b
upper tape
lower tape
![Page 20: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/20.jpg)
October 2009 HLT: finite state transducers 20
A Very Simple Transducer
a
b
Relation { ("a","b") }
Notation a:b encodes the transition
![Page 21: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/21.jpg)
October 2009 HLT: finite state transducers 21
A Very Simple Transducer
a
b
a:b
also written as
![Page 22: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/22.jpg)
October 2009 HLT: finite state transducers 22
A Very Simple Transducer
a
b
a:b
upper side
lower side
![Page 23: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/23.jpg)
October 2009 HLT: finite state transducers 23
Symbol Pairs
• Symbols vs. symbol pairs
– In general, no distinction is made in xfst between
a the language {“a”}
a:a the identity relation {(“a”, “a”)}
a
![Page 24: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/24.jpg)
October 2009 HLT: finite state transducers 24
A (more interesting) Transducer
• Relation
{ ("a","b"), ("aa","bb"), ...}
• Notation: a:b*
• N.B. with this notation a and b must be single symbols
1
a:b
![Page 25: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/25.jpg)
October 2009 HLT: finite state transducers 25
Transducers Have Several Modes of Operation
• generation mode: It writes on both tapes: a string of a's on one tape and a string of b's on the other. Both strings have the same length.
• recognition mode: It accepts when the word on the first tape consists of exactly as many a's as the word on the second tape consists of b's.
• translation mode (left to right): It reads a's from the first tape and, for every a that it reads, writes a b onto the second tape.
• translation mode (right to left): It reads b's from the second tape and, for every b that it reads, writes an a onto the first tape.
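The four modes can be tried out on the a:b* transducer with a short Python sketch. The encoding below (a set of arcs `ARCS`, one state `0` that is both initial and final) and the function names are assumptions chosen for illustration.

```python
# Hypothetical encoding: (source state, upper symbol, lower symbol, target state).
ARCS = {(0, "a", "b", 0)}
START, FINALS = 0, {0}

def translate(word, left_to_right=True):
    """Translation mode: read one tape, collect everything written on the other."""
    configs = {(START, "")}
    for symbol in word:
        step = set()
        for state, written in configs:
            for source, upper, lower, target in ARCS:
                read, write = (upper, lower) if left_to_right else (lower, upper)
                if source == state and read == symbol:
                    step.add((target, written + write))
        configs = step
    return {written for state, written in configs if state in FINALS}

def recognize(first, second):
    """Recognition mode: is the pair of tapes in the relation?"""
    return second in translate(first)

def generate(max_len):
    """Generation mode: list the pairs with at most max_len symbols per tape."""
    pairs, frontier = [], [(START, "", "")]
    for _ in range(max_len + 1):
        pairs.extend((u, l) for q, u, l in frontier if q in FINALS)
        frontier = [(q2, u + up, l + lo) for q, u, l in frontier
                    for (q1, up, lo, q2) in ARCS if q1 == q]
    return pairs

print(translate("aaa"))                      # {'bbb'}
print(translate("bb", left_to_right=False))  # {'aa'}
print(recognize("aa", "bb"))                 # True
print(generate(2))                           # [('', ''), ('a', 'b'), ('aa', 'bb')]
```

The same arc table serves all four modes; only the choice of which tape is read and which is written changes.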
![Page 26: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/26.jpg)
October 2009 HLT: finite state transducers 26
The Basic Idea
• Morphology is regular
• Morphology is finite state
![Page 27: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/27.jpg)
October 2009 HLT: finite state transducers 27
Morphology is Regular
• The relation between the surface forms of a language and the corresponding lexical forms can be described as a regular relation, e.g.
{ ("leaf+N+Pl","leaves"),("hang+V+Past","hung"),...}
• Regular relations are closed under operations such as concatenation, iteration, union, and composition.
• Complex regular relations can be derived from simpler relations.
![Page 28: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/28.jpg)
October 2009 HLT: finite state transducers 28
Morphology is finite-state
• A regular relation can be defined using the metalanguage of regular expressions.
• [{talk} | {walk} | {work}]
  [%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}];
• A regular expression can be compiled into a finite-state transducer that implements the relation computationally.
![Page 29: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/29.jpg)
October 2009 HLT: finite state transducers 29
Compilation
Regular expression
[{talk} | {walk} | {work}]
[%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}];
Finite-state transducer
[Figure: the compiled transducer, running from the initial state to the final state. Stem arcs: t, w, a, o, l, r, k. Suffix arcs: +Base:ε, +3rdSg:s, +Progr:i ε:n ε:g, +Past:e ε:d.]
![Page 30: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/30.jpg)
October 2009 HLT: finite state transducers 30
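Generation
work+3rdSg --> works
[Figure: the transducer from the Compilation slide, with stem arcs written as symbol pairs (t:t, w:w, a:a, o:o, l:l, r:r, k:k) and suffix arcs +Base:ε, +3rdSg:s, +Progr:i ε:n ε:g, +Past:e ε:d.]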
![Page 31: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/31.jpg)
October 2009 HLT: finite state transducers 31
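Analysis
talked --> talk+Past
[Figure: the transducer from the Compilation slide, with stem arcs written as symbol pairs (t:t, w:w, a:a, o:o, l:l, r:r, k:k) and suffix arcs +Base:ε, +3rdSg:s, +Progr:i ε:n ε:g, +Past:e ε:d.]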
![Page 32: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/32.jpg)
October 2009 HLT: finite state transducers 32
XFST Demo 2
start xfst
% xfst
xfst[0]:
compile a regular expression
xfst[0]: regex [{talk} | {walk} | {work}] [%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}];
apply the result
xfst[1]: apply up walked
walk+Past
xfst[1]: apply down talk+SgGen3
talks
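For readers without xfst at hand, the behaviour of the demo can be imitated in Python by enumerating the (finite) relation explicitly. This is only a sketch of what `apply up` and `apply down` compute, not of how xfst works internally; `apply_up` and `apply_down` are hypothetical helper names.

```python
STEMS = ["talk", "walk", "work"]
SUFFIXES = {"+Base": "", "+SgGen3": "s", "+Progr": "ing", "+Past": "ed"}

# Every (lexical form, surface form) pair described by the regular expression.
RELATION = {(stem + tag, stem + ending)
            for stem in STEMS
            for tag, ending in SUFFIXES.items()}

def apply_up(surface):
    """Analysis: surface form to lexical forms."""
    return sorted(lex for lex, surf in RELATION if surf == surface)

def apply_down(lexical):
    """Generation: lexical form to surface forms."""
    return sorted(surf for lex, surf in RELATION if lex == lexical)

print(apply_up("walked"))         # ['walk+Past']
print(apply_down("talk+SgGen3"))  # ['talks']
```

Enumeration only works because this relation has twelve pairs; a compiled transducer represents the same relation without listing it.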
![Page 33: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/33.jpg)
October 2009 HLT: finite state transducers 33
Lexical transducer
inflected form: veut
citation form and inflection codes: vouloir +IndP +SG +P3
[Figure: finite-state transducer pairing v o u l o i r +IndP +SG +P3 with v e u t]
• Bidirectional: generation or analysis
• Compact and fast
• Comprehensive systems have been built for over 40 languages:
– English, German, Dutch, French, Italian, Spanish, Portuguese, Finnish, Russian, Turkish, Japanese, Korean, Basque, Greek, Arabic, Hebrew, Bulgarian, …
![Page 34: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/34.jpg)
October 2009 HLT: finite state transducers 34
How lexical transducers are made
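[Figure: a lexicon regular expression (morphotactics) and rule regular expressions (alternations) go through the compiler to give a lexicon FST and rule FSTs, which are combined by composition into a lexical transducer (a single FST). Example pair: f a t +Adj +Comp and f a t t e r.]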
![Page 35: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/35.jpg)
October 2009 HLT: finite state transducers 35
Sequential Model
An ordered sequence of rewrite rules (Chomsky & Halle ‘68) can be modeled by a cascade of finite-state transducers (Johnson ‘72, Kaplan & Kay ‘81).
[Figure: a cascade fst 1, fst 2, ..., fst n linking the lexical form to the surface form through intermediate forms]
![Page 36: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/36.jpg)
October 2009 HLT: finite state transducers 36
Parallel Model
A set of parallel two-level rules (constraints) compiled into finite-state automata interpreted as transducers (Koskenniemi ‘83).
[Figure: fst 1, fst 2, ..., fst n side by side between the lexical form and the surface form]
![Page 37: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/37.jpg)
October 2009 HLT: finite state transducers 37
Sequential vs. Parallel rules
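[Figure: on one side, rule 1 ... rule n applied in sequence from lexical form through intermediate forms to surface form (Chomsky & Halle 1968), combined by "compose"; on the other side, rule 1, rule 2, ..., rule n applied in parallel between lexical form and surface form (Koskenniemi 1983), combined by "intersect". Both yield a single FST.]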
![Page 38: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/38.jpg)
October 2009 HLT: finite state transducers 38
Sequential vs. Parallel Rules
• Sequential rules are combined by means of composition.
• Advantage: FSTs are closed under composition
• Disadvantage: the result is sensitive to the order of the rules
• Parallel rules are combined by means of intersection
• In general, FSTs are not closed under intersection.
• … but FSTs without ε-transitions are closed under intersection.
![Page 39: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/39.jpg)
October 2009 HLT: finite state transducers 39
Crossproduct
• A .x. B The relation that maps every string in A to every string in B, and vice versa
• A:B Same as [A .x. B].
[Figure: a path with arcs a:x, b:y, c:0]
a b c .x. x y
[a b c] : [x y]
{abc}:{xy}
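A small Python sketch of the crossproduct on finite languages follows. The function `crossproduct` is an illustrative name; `arc_labels` shows one possible symbol-by-symbol alignment, padding the shorter string with `0` as in the arcs a:x b:y c:0, and is an assumption about the alignment rather than a statement of what xfst guarantees.

```python
from itertools import product, zip_longest

def crossproduct(a, b):
    """A .x. B: pair every string of A with every string of B."""
    return set(product(a, b))

def arc_labels(upper, lower):
    """Align two strings symbol by symbol, padding the shorter one with 0."""
    return [f"{u}:{l}" for u, l in zip_longest(upper, lower, fillvalue="0")]

print(crossproduct({"abc"}, {"xy"}))  # {('abc', 'xy')}
print(arc_labels("abc", "xy"))        # ['a:x', 'b:y', 'c:0']
```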
![Page 40: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/40.jpg)
October 2009 HLT: finite state transducers 40
Composition
• A .o. B The relation C such that if A maps x to y and B maps y to z, C maps x to z.
Example: {abc} .o. [a:A | b:B | c:C | d:D]*
[Figure: the path a b c composed with a one-state transducer looping on a:A, b:B, c:C, d:D gives the path a:A b:B c:C]
![Page 41: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/41.jpg)
October 2009 HLT: finite state transducers 41
Transducers are not closed under intersection
[Figure: T1 with arcs c:a and ε:b; T2 with arcs ε:a and c:b]
T1(c^n) = { a^n b^m | m ≥ 0 }
T2(c^n) = { a^m b^n | m ≥ 0 }
(T1 ∩ T2)(c^n) = { a^n b^n }
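The counterexample can be checked on bounded samples in Python. The functions `t1` and `t2` below are hypothetical stand-ins that list the outputs of each transducer for the input c^n, with m capped at `max_m` so the sets stay finite.

```python
def t1(n, max_m):
    """Outputs of T1 on c^n: n a's followed by any number m of b's (m <= max_m)."""
    return {"a" * n + "b" * m for m in range(max_m + 1)}

def t2(n, max_m):
    """Outputs of T2 on c^n: any number m of a's (m <= max_m) followed by n b's."""
    return {"a" * m + "b" * n for m in range(max_m + 1)}

n = 3
print(t1(n, 6) & t2(n, 6))  # {'aaabbb'}
```

Only a^n b^n survives the intersection for each n. Since { a^n b^n } is not a regular language, the intersected relation cannot be a regular relation, even though T1 and T2 both are.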
![Page 42: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/42.jpg)
October 2009 HLT: finite state transducers 42
Xerox RE Operators
• $ containment
• => restriction
• -> replacement
– Make it easier to describe complex languages and relations without extending the formal power of finite-state systems.
![Page 43: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/43.jpg)
October 2009 HLT: finite state transducers 43
Containment
$a
[?* a ?*]
[Figure: automaton with arcs labelled a and ?]
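In Python's `re` syntax, the pattern `[?* a ?*]` corresponds roughly to `.*a.*` matched against the whole string. The sketch below is an analogy for intuition, assuming `?` is read as "any single symbol"; `in_containment_language` is a hypothetical helper name.

```python
import re

# ".*a.*" with DOTALL: any symbols, then an a, then any symbols.
CONTAINS_A = re.compile(r".*a.*", re.DOTALL)

def in_containment_language(s):
    """True if s belongs to the language of strings that contain an a."""
    return CONTAINS_A.fullmatch(s) is not None

print(in_containment_language("banana"))  # True
print(in_containment_language("xyz"))     # False
```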
![Page 44: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/44.jpg)
October 2009 HLT: finite state transducers 44
Restriction
a => b _ c
“Any a must be preceded by b and followed by c.”
Equivalent expression: ~[~[?* b] a ?*] & ~[?* a ~[c ?*]]
[Figure: automaton with arcs labelled ?, a, b, c]
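The restriction and its equivalent expression can be compared with two small Python predicates. Both function names are assumptions made for this sketch: the first states the condition directly, the second follows the two negated conjuncts of the equivalent expression.

```python
def satisfies_restriction(s):
    """Every a is immediately preceded by b and immediately followed by c."""
    return all(i > 0 and s[i - 1] == "b" and i + 1 < len(s) and s[i + 1] == "c"
               for i, ch in enumerate(s) if ch == "a")

def via_equivalent_expression(s):
    """~[~[?* b] a ?*] & ~[?* a ~[c ?*]] read as two forbidden situations."""
    # Forbidden: an a whose left context does not end in b.
    bad_left = any(ch == "a" and not s[:i].endswith("b")
                   for i, ch in enumerate(s))
    # Forbidden: an a whose right context does not start with c.
    bad_right = any(ch == "a" and not s[i + 1:].startswith("c")
                    for i, ch in enumerate(s))
    return not bad_left and not bad_right

print(satisfies_restriction("xbacx"))  # True
print(satisfies_restriction("bax"))    # False
```

Strings with no a at all satisfy the restriction vacuously, which is why the language is described with complements rather than with a positive pattern.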
![Page 45: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/45.jpg)
October 2009 HLT: finite state transducers 45
Replacement
a b -> b a
“Replace ‘ab’ by ‘ba’.”
Equivalent expression: [[~$[a b] [[a b] .x. [b a]]]* ~$[a b]]
[Figure: transducer with arcs labelled a:b, b:a, a, b, ?]
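The shape of the equivalent expression (stretches without "ab", alternating with "ab" mapped to "ba") can be mirrored in Python. `replace_ab` is a hypothetical name; the sketch relies on the fact that "ab" cannot overlap with itself, so the split into stretches is unique.

```python
def replace_ab(s):
    """Map every 'ab' to 'ba', leaving the stretches in between unchanged."""
    pieces = s.split("ab")    # the stretches that contain no 'ab'
    return "ba".join(pieces)  # each removed 'ab' comes back as 'ba'

print(replace_ab("abab"))  # baba
print(replace_ab("aab"))   # aba
```

The replacement is applied to the input in one pass: an "ab" that only appears in the output, as in "aab" becoming "aba", is not rewritten again.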
![Page 46: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/46.jpg)
October 2009 HLT: finite state transducers 46
Replacement + Marking
a|e|i|o|u -> %[ ... %]
p o t a t o
p[o]t[a]t[o]
[Figure: transducer with arcs labelled 0:[, 0:], a, e, i, o, u, [, ], ?]
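The marking rule has a close analogue in Python's `re.sub`, where `\g<0>` plays the role of the `...` placeholder for the matched text. This is an analogy only; `mark_vowels` is a hypothetical name.

```python
import re

def mark_vowels(s):
    """Wrap every vowel in square brackets, leaving other symbols unchanged."""
    return re.sub(r"[aeiou]", r"[\g<0>]", s)

print(mark_vowels("potato"))  # p[o]t[a]t[o]
```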
![Page 47: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/47.jpg)
October 2009 HLT: finite state transducers 47
Conditional Replacement
A -> B (replacement), L _ R (context)
The relation that replaces A by B between L and R leaving everything else unchanged.
![Page 48: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/48.jpg)
October 2009 HLT: finite state transducers 48
Sequential application
N -> m / _ p
p -> m / m _
k a N p a n
k a m p a n
k a m m a n
![Page 49: Human Language Technology](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813050550346895d95fca0/html5/thumbnails/49.jpg)
October 2009 HLT: finite state transducers 49
Sequential application in detail
[Figure: two transducers applied in sequence. The first has states 0, 1, 2 and arcs labelled N:m, N, p, m, ?; the second has states 0, 1 and arcs labelled p:m, m, p, ?.]
k a N p a n
k a m p a n
k a m m a n
0 0 0 2 0 0 0
0 0 0 1 0 0 0
October 2009 HLT: finite state transducers 50
Composition
[Figure: the single transducer obtained by composing the two rules, with states 0 to 3 and arcs labelled N:m, p:m, N, m, p, ?.]
k a N p a n
k a m m a n
0 0 0 3 0 0 0