
29.10.2002 CSA3050: NLP Algorithms 1

CSA3050: NL Algorithms

• Introduction to English Morphology

• Finite State Transducers


Acknowledgement

For further details see Jurafsky & Martin Ch.3


Morphology

• Morphology is the study of how word parts combine to form whole words.
• Several different dimensions:
– Orthographic: rules for combining strings of characters.
– Syntactic: effect on syntactic category.
– Semantic: effect on meaning.


Examples of Morphological Processes

• Affixation
– prefix
– suffix
– circumfix: German ge + stem + t, e.g. sagen, gesagt
– infix: unbloodylikely
• Vowel change: swim/swam
• Consonant change: send/sent


Inflectional/Derivational Morphology

• Inflectional: +s plural, +ed past
– category preserving
– productive: always applies (esp. new words, e.g. fax)
– systematic: same semantic effect
• Derivational: +ment
– category changing: escape+ment
– not completely productive: *detractment
– not completely systematic: catchment


English Inflectional Morphology

• Applies to nouns, verbs and adjectives only
• Number of inflections relatively small
• Nouns
– Plural, Possessive
• Verbs
– Verb forms
• Adjectives
– Comparison


Noun Inflections

            Regular            Irregular
Singular    cat    church      mouse   ox
Plural      cats   churches    mice    oxen
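The split between regular (cat/cats, church/churches) and irregular (mouse/mice, ox/oxen) plurals can be sketched as an exception lexicon consulted before a couple of simplified spelling rules. The names `pluralize` and `IRREGULAR_PLURALS` are hypothetical, and the rules are deliberately incomplete (they would wrongly give knifes, for example).

```python
# Hypothetical exception lexicon for irregular nouns (taken from the table above).
IRREGULAR_PLURALS = {"mouse": "mice", "ox": "oxen"}

VOWELS = "aeiou"


def pluralize(noun: str) -> str:
    """Return the plural of a noun: lexicon lookup first, then simplified rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Sibilant endings take -es: church -> churches, box -> boxes.
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Consonant + y becomes -ies: city -> cities (but day -> days).
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # Default case: just add -s.
    return noun + "s"


for n in ["cat", "church", "mouse", "ox", "city", "day"]:
    print(n, "->", pluralize(n))
```

Checking the lexicon before the rules is what lets mouse escape the default +s.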


Regular Verb Inflections

stem                     walk      merge      try      map
-s form                  walks     merges     tries    maps
-ing participle          walking   merging    trying   mapping
-ed participle or past   walked    merged     tried    mapped
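The spelling changes visible in this table (e-deletion in merging, y to i in tries/tried, consonant doubling in mapping/mapped) can be approximated with a few string rules. This is a hypothetical sketch: the function names are invented, and the doubling test only looks at one-vowel stems ending consonant-vowel-consonant and ignores stress, so it is a heuristic rather than a full account of English spelling.

```python
VOWELS = "aeiou"


def _doubles(stem: str) -> bool:
    """Heuristic: a one-vowel stem ending consonant-vowel-consonant doubles
    its final consonant (map -> mapping). Stress is ignored."""
    return (len(stem) >= 3
            and sum(ch in VOWELS for ch in stem) == 1
            and stem[-1] not in VOWELS + "wxy"
            and stem[-2] in VOWELS
            and stem[-3] not in VOWELS)


def _consonant_y(stem: str) -> bool:
    """True for stems like 'try' that end in consonant + y."""
    return len(stem) > 1 and stem.endswith("y") and stem[-2] not in VOWELS


def s_form(stem: str) -> str:
    if stem.endswith(("s", "x", "z", "ch", "sh")):
        return stem + "es"              # catch -> catches
    if _consonant_y(stem):
        return stem[:-1] + "ies"        # try -> tries
    return stem + "s"                   # walk -> walks


def ing_form(stem: str) -> str:
    if stem.endswith("e") and not stem.endswith("ee"):
        return stem[:-1] + "ing"        # e-deletion: merge -> merging
    if _doubles(stem):
        return stem + stem[-1] + "ing"  # doubling: map -> mapping
    return stem + "ing"                 # walk -> walking, try -> trying


def ed_form(stem: str) -> str:
    if stem.endswith("e"):
        return stem + "d"               # merge -> merged
    if _consonant_y(stem):
        return stem[:-1] + "ied"        # try -> tried
    if _doubles(stem):
        return stem + stem[-1] + "ed"   # map -> mapped
    return stem + "ed"                  # walk -> walked


for v in ["walk", "merge", "try", "map"]:
    print(v, s_form(v), ing_form(v), ed_form(v))
```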


Irregular Verb Inflections

stem              eat      catch      cut      go
-s form           eats     catches    cuts     goes
-ing participle   eating   catching   cutting  going
Past              ate      caught     cut      went
-ed participle    eaten    caught     cut      gone
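Irregular past and participle forms cannot be derived by rule, so one common approach is to list them in the lexicon and consult that list before any regular rule. A minimal sketch, with a hypothetical `IRREGULAR` table built from the rows above and a deliberately naive regular fallback (stem + ed, no spelling rules):

```python
# Hypothetical exception lexicon: stem -> (past, -ed participle).
IRREGULAR = {
    "eat": ("ate", "eaten"),
    "catch": ("caught", "caught"),
    "cut": ("cut", "cut"),
    "go": ("went", "gone"),
}


def past(stem: str) -> str:
    # Lexicon lookup first; otherwise a naive regular rule.
    return IRREGULAR[stem][0] if stem in IRREGULAR else stem + "ed"


def past_participle(stem: str) -> str:
    return IRREGULAR[stem][1] if stem in IRREGULAR else stem + "ed"


print(past("go"), past_participle("go"), past("walk"))
```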


Morphological Parsing

Input word: cats → Morphological Parser → Output analysis: cat + PL

• Output is a string of morphemes
• Reversibility?


Morphological Parsing: Examples

Input word   Output morphemes
cats         cat +N +PL
cat          cat +N +SG
cities       city +N +PL
walks        walk +V +3SG
cook         cook +N +SG  or  cook +V
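The analyses in this table, including the ambiguity of cook, can be mimicked by a toy lexicon-driven analyzer that returns a list of candidate analyses. The stem lists and the function `analyze` are hypothetical and cover only the words above; because walk is listed here only as a verb stem, walks receives just the +V +3SG reading.

```python
# Hypothetical mini-lexicon of stems.
NOUN_STEMS = {"cat", "city", "cook"}
VERB_STEMS = {"walk", "cook"}


def analyze(word: str) -> list:
    """Return every analysis licensed by the toy lexicon (possibly several)."""
    analyses = []
    if word in NOUN_STEMS:
        analyses.append(f"{word} +N +SG")
    if word in VERB_STEMS:
        analyses.append(f"{word} +V")
    if word.endswith("ies") and word[:-3] + "y" in NOUN_STEMS:
        analyses.append(f"{word[:-3]}y +N +PL")    # cities -> city +N +PL
    elif word.endswith("s"):
        stem = word[:-1]
        if stem in NOUN_STEMS:
            analyses.append(f"{stem} +N +PL")      # cats -> cat +N +PL
        if stem in VERB_STEMS:
            analyses.append(f"{stem} +V +3SG")     # walks -> walk +V +3SG
    return analyses


for w in ["cats", "cat", "cities", "walks", "cook"]:
    print(w, analyze(w))
```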


Morphemes

• A morpheme is a theoretical construct ...
• ... but it has a practical use
• Choice of morpheme vocabulary: theoretical and practical motivation
• Distinction between an underlying morpheme and its realisation
• The string of morphemes could be turned into another representation later


Morphological Parsing Requires

1. Lexicon: list of stems and affixes + related information (e.g. syntactic category)

2. Morphotactics: a model of ordering constraints over morphemes (e.g. the fact that +s comes after the stem, not before).

3. Correspondences between input and output strings

4. Spelling Rules: city + s → cities
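A spelling rule can be thought of as a rewrite applied to an intermediate form that still marks the morpheme boundary. A minimal sketch using Python's `re` module, assuming `+` as the boundary symbol and two hypothetical rules (consonant + y before +s becomes ies; a sibilant before +s takes es), after which any remaining boundaries are deleted:

```python
import re


def apply_spelling_rules(intermediate: str) -> str:
    """Map a form such as 'city+s' to its surface spelling (hypothetical rules)."""
    # Consonant + y + boundary + s  ->  consonant + ies   (city+s -> cities)
    s = re.sub(r"([^aeiou])y\+s$", r"\1ies", intermediate)
    # Sibilant + boundary + s  ->  sibilant + es   (church+s -> churches)
    s = re.sub(r"(s|x|z|ch|sh)\+s$", r"\1es", s)
    # Delete any remaining morpheme boundaries  (cat+s -> cats)
    return s.replace("+", "")


for form in ["city+s", "church+s", "cat+s", "day+s"]:
    print(form, "->", apply_spelling_rules(form))
```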


Lexicon

• The lexicon is generally divided into sublexicons
– Stem Lexicon
  • Noun Stems
  • Verb Stems
  • etc.
– Suffix Lexicon
– Prefix Lexicon
• These can all be represented as FSAs
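Because each sublexicon is a finite list of strings, it can be compiled into a letter-by-letter FSA. The sketch below builds a deterministic, trie-shaped automaton; `build_fsa`, `accepts` and the word list are hypothetical, and a trie is generally not the minimal automaton for the list.

```python
from typing import Dict, Iterable, Set, Tuple

Transitions = Dict[Tuple[int, str], int]


def build_fsa(words: Iterable[str]) -> Tuple[Transitions, Set[int]]:
    """Build a trie-shaped DFA accepting exactly `words`; state 0 is the start."""
    transitions: Transitions = {}
    finals: Set[int] = set()
    next_state = 1
    for word in words:
        state = 0
        for ch in word:
            if (state, ch) not in transitions:
                transitions[(state, ch)] = next_state  # add a new arc and state
                next_state += 1
            state = transitions[(state, ch)]
        finals.add(state)  # the state reached after the last letter is final
    return transitions, finals


def accepts(fsa: Tuple[Transitions, Set[int]], word: str) -> bool:
    transitions, finals = fsa
    state = 0
    for ch in word:
        if (state, ch) not in transitions:
            return False
        state = transitions[(state, ch)]
    return state in finals


noun_stems = build_fsa(["cat", "city", "church"])  # hypothetical noun-stem sublexicon
print(accepts(noun_stems, "city"), accepts(noun_stems, "ci"))
```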


FSA for Sublexicon Fragment

[Figure: letter-by-letter FSA for a sublexicon fragment; transition labels include t, h, e, s, i, a, o]


FSA for Morphotactics for Noun Inflection
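Noun morphotactics can be sketched as an FSA over morpheme classes rather than letters, in the style of J&M Ch. 3: a regular noun stem may be followed by the plural suffix, while irregular singular and plural forms stand alone. The state names, class names and sublexicon contents below are illustrative assumptions.

```python
# Hypothetical morpheme-class FSA for English noun inflection.
# q0: start; q1: after a regular noun stem; q2: after a complete inflected form.
TRANSITIONS = {
    ("q0", "reg-noun"): "q1",
    ("q0", "irreg-sg-noun"): "q2",
    ("q0", "irreg-pl-noun"): "q2",
    ("q1", "plural"): "q2",
}
FINAL_STATES = {"q1", "q2"}  # a bare regular stem (cat) is also a complete word

# Illustrative sublexicons: morpheme class -> members.
SUBLEXICONS = {
    "reg-noun": {"cat", "church", "city"},
    "irreg-sg-noun": {"mouse", "ox"},
    "irreg-pl-noun": {"mice", "oxen"},
    "plural": {"-s"},
}


def accepts(morphemes) -> bool:
    """True if the morpheme sequence is allowed by the noun morphotactics."""
    state = "q0"
    for m in morphemes:
        next_state = None
        for (src, cls), dst in TRANSITIONS.items():
            if src == state and m in SUBLEXICONS[cls]:
                next_state = dst
                break
        if next_state is None:
            return False  # no arc for this morpheme from the current state
        state = next_state
    return state in FINAL_STATES
```

Here accepts(["cat", "-s"]) holds while accepts(["mice", "-s"]) does not; orthographic details such as church + -s giving churches are left to the spelling rules.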


Morphotactics for Verb Inflection


Input/Output Correspondences

• Problem: how to specify the correspondence between the input word and the output analysis.
• Given: both input and output are strings.
• Two-level morphology (Koskenniemi 1983) proposes
– Surface Tape (words)
– Lexical Tape (concatenation of morphemes)
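One way to picture the two tapes for cats is as a sequence of lexical:surface symbol pairs, with ε wherever one side has nothing. The sketch below assumes one particular alignment (+N realised as ε, +PL as s) and simply reads both tapes off the pairs; the names `CATS` and `tapes` are hypothetical.

```python
EPS = "ε"

# Assumed alignment of the lexical and surface tapes for "cats".
CATS = [("c", "c"), ("a", "a"), ("t", "t"), ("+N", EPS), ("+PL", "s")]


def tapes(pairs):
    """Return (lexical tape, surface tape), dropping ε on each side."""
    lexical = "".join(lex for lex, _ in pairs if lex != EPS)
    surface = "".join(surf for _, surf in pairs if surf != EPS)
    return lexical, surface


print(tapes(CATS))  # ('cat+N+PL', 'cats')
```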


Two-Level Model

The automaton used to perform the mapping between these levels is the finite state transducer (FST).


Basic FS Transducer

• Each transition of a transducer is labelled with a pair of symbols
• Input symbols are matched against the lower-side symbols on transitions.
• If analysis succeeds, return the string of upper-side symbols.

[Figure: a single transition labelled with an input symbol and an output symbol]
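The matching procedure described here can be sketched for the simplest case: a deterministic transducer with no ε transitions, stored as a map from (state, lower symbol) to (upper symbol, next state). The one-state machine that rewrites a as b and copies b is a made-up example, and `transduce` is a hypothetical name.

```python
# (state, lower-side symbol) -> (upper-side symbol, next state)
TRANSITIONS = {
    (0, "a"): ("b", 0),
    (0, "b"): ("b", 0),
}
FINALS = {0}


def transduce(word, transitions=TRANSITIONS, finals=FINALS, start=0):
    """Match input symbols against lower-side labels; on success return the
    upper-side string, otherwise None."""
    state, output = start, []
    for sym in word:
        if (state, sym) not in transitions:
            return None               # no matching transition: analysis fails
        upper, state = transitions[(state, sym)]
        output.append(upper)
    return "".join(output) if state in finals else None


print(transduce("abba"), transduce("abc"))
```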


Morphological Analysis

{ ("CATS", "CAT+N+PL"),
  ("CAT", "CAT+N+SG") }

[Figure: FST for this relation; its arcs carry the symbols C, A, T, S, +N and +PL]
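A transducer for this two-pair relation needs ε on the lower side, because +N, +SG and +PL have no letters in the surface word. The sketch below is one possible machine (its states and arcs are assumptions, not read off the figure) together with a recursive search that returns every upper-side string for a given input.

```python
EPS = ""  # the empty string stands for ε

# state -> list of (lower symbol, upper symbol, next state);
# lower = surface word, upper = lexical analysis (hypothetical machine).
ARCS = {
    0: [("C", "C", 1)],
    1: [("A", "A", 2)],
    2: [("T", "T", 3)],
    3: [(EPS, "+N", 4)],
    4: [("S", "+PL", 5), (EPS, "+SG", 5)],
}
FINALS = {5}


def analyze(word, state=0, pos=0, out=""):
    """Return all upper-side strings on paths that consume `word` and end in a
    final state. Assumes the machine has no ε-cycles (true here)."""
    results = []
    if pos == len(word) and state in FINALS:
        results.append(out)
    for lower, upper, nxt in ARCS.get(state, []):
        if lower == EPS:
            results += analyze(word, nxt, pos, out + upper)      # consume nothing
        elif pos < len(word) and word[pos] == lower:
            results += analyze(word, nxt, pos + 1, out + upper)  # consume one symbol
    return results


print(analyze("CATS"), analyze("CAT"))
```

Because the search explores every arc, the same code would return several analyses for an ambiguous word.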


FST Formal Definition

• States, initial state, final states: same as FSA
• Alphabets I and O are input and output alphabets, not necessarily disjoint.
• FST alphabet Σ ⊆ I × O, i.e. a set of complex symbols i:o
• Transition function δ(q, i:o) defines the state q' that ensues when the machine is in state q and encounters the complex symbol i:o.
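Collecting these components gives a tuple presentation of an FST. The name T and the ordering of the tuple are notational choices, not taken from the slide; a nondeterministic FST would let δ return a set of states rather than a single q'.

```latex
T = (Q, \Sigma, q_0, F, \delta), \qquad
\Sigma \subseteq I \times O, \qquad
\delta(q, i{:}o) = q' \in Q
```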


FST Alphabet Example

I = O = {c, a, t, ε}

Σ = I × O:
c:c   c:a   c:t   c:ε
t:c   t:a   t:t   t:ε
a:c   a:a   a:t   a:ε
ε:c   ε:a   ε:t   ε:ε
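The sixteen complex symbols in this example are the Cartesian product of the two alphabets, which `itertools.product` computes directly. The variable names below are illustrative.

```python
from itertools import product

EPS = "ε"
I = ["c", "a", "t", EPS]  # input alphabet from the example
O = ["c", "a", "t", EPS]  # output alphabet from the example

# Every complex symbol i:o with i in I and o in O.
SIGMA = [f"{i}:{o}" for i, o in product(I, O)]

print(len(SIGMA))  # 16
print(SIGMA[:4])   # ['c:c', 'c:a', 'c:t', 'c:ε']
```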


Summary

• Morphological processing can be handled by finite state machinery.
• Finite State Transducers are formally very similar to Finite State Automata.
• They are formally equivalent to regular relations, i.e. sets of pairs of strings (the natural extension of regular languages, which are sets of strings).