Finite State Machinery - I Fundamentals Recognisers and Transducers.

Finite State Machinery - I

• Fundamentals

• Recognisers and Transducers

Reference Outline

• Websites
– Xerox: www.xrce.xerox.com/research/mltt/fst/
– Groningen: grid.let.rug.nl/~vannoord/FSA/fsa.html
– AT&T: www.research.att.com/sw/tools/fsm

• Books/Collections
– Karttunen & Oflazer (2000)
– Jurafsky & Martin (2000)
– Hopcroft and Ullman (1979)
– Roche and Schabes (1997)

• Classic Articles
– Kaplan and Kay (1994)
– Koskenniemi (1983)
– Johnson (1972)

• Tools
– Van Noord et al.
– Mohri et al.
– Daciuk.
– Karttunen & Beesley

Acknowledgements to

• Lauri Karttunen, Ken Beesley and colleagues at Xerox.

• Most materials in this tutorial are from their website.

• Forthcoming book: Finite State Morphology – Xerox Tools and Techniques.

FS Motivation

• The Chomsky hierarchy of language classes is based on classes of descriptive notation, and also on associated classes of machine.

• Chomsky (1957) dismissed FS grammars, and associated machinery, as fundamentally inadequate for the description of NL.

Embedding

• The basic problem is not that sentences can grow to arbitrary length; it is that the description of a syntactic constituent may embed any other constituent, including the sentence itself.

The dog bit the cat.

The dog that the man saw bit the cat.

The dog that the man that the horse kicked saw bit the cat.

etc

On the other hand ...

• Plenty of language just ain't like that.

• Words
– Orthographic spelling.
– Phonological spelling.
– Morphology.

• Fixed expression types (e.g. dates).

• Gross constituent structures (e.g. the big, bad, blue wolf).

Recent Application Areas for FS Technology Include

• POS Tagging

• Spell Checking

• Information Extraction

• Speech Recognition

• Text to Speech

• Spoken Dialogue

• Parsing

Recognition of Italian Words

• The coke machine recognises words in the coke machine language.

• The following machine recognises two words in Italian.

• Recognition mechanism is language independent.

[Figure: recogniser network with arcs C-A-S-A and I-N-Q-U-E, apparently for the words CASA and CINQUE]

The Process of Analysis

• Start in the initial state and at the first symbol of the word.

• If there is an arc labelled with that symbol, the machine transitions to the next state, and the symbol is consumed.

• The process continues with successive symbols until .....

The Process of Analysis

One or more of these conditions holds:

• A. A final state is reached.

• B. All symbols are consumed.

• C. There are no transitions out of a state for the current symbol.

– If both A and B, analysis succeeds and the word is recognised.
– Otherwise recognition fails.
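The analysis procedure can be sketched in a few lines of Python. This is a minimal illustration, not the Xerox implementation: the dictionary encoding, the state numbers and the function name `recognise` are assumptions made for the example, and the network is assumed to accept CASA and CINQUE.

```python
def recognise(transitions, start, finals, word):
    """Return True if `word` is accepted by the network."""
    state = start
    for symbol in word:
        arcs = transitions.get(state, {})
        if symbol not in arcs:
            return False  # condition C: no arc for the current symbol
        state = arcs[symbol]  # follow the arc; the symbol is consumed
    # Condition B holds here; succeed only if condition A holds as well.
    return state in finals


# Hypothetical state numbering for a network accepting CASA and CINQUE.
TRANSITIONS = {
    0: {"C": 1},
    1: {"A": 2, "I": 5},
    2: {"S": 3},
    3: {"A": 4},
    5: {"N": 6},
    6: {"Q": 7},
    7: {"U": 8},
    8: {"E": 4},
}
FINALS = {4}

print(recognise(TRANSITIONS, 0, FINALS, "CASA"))
print(recognise(TRANSITIONS, 0, FINALS, "CAS"))
```

Nothing in `recognise` refers to Italian: swapping in a different transition table changes the language but not the mechanism.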

Success and Failure

[Figure: recogniser network with arcs spelling CASA, CINQUE and LENTE]

Test words: LE; CASA; CINQUANTA; LENTEMENTE
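The four test words can be run through a small script that reports why each one succeeds or fails. This is a sketch under assumptions: the network is taken to accept CASA, CINQUE and LENTE, and it is built as a simple trie with hypothetical helper names (`build_network`, `diagnose`), so unlike the drawn network it does not share final states.

```python
def build_network(words):
    """Build a trie-shaped network: dict state -> {symbol: next_state}."""
    transitions = {0: {}}
    finals = set()
    for word in words:
        state = 0
        for symbol in word:
            if symbol not in transitions[state]:
                new_state = len(transitions)
                transitions[new_state] = {}
                transitions[state][symbol] = new_state
            state = transitions[state][symbol]
        finals.add(state)  # the state reached at the end of a word is final
    return transitions, finals


def diagnose(transitions, finals, word):
    """Run the recogniser from state 0 and describe the outcome."""
    state = 0
    for position, symbol in enumerate(word):
        if symbol not in transitions[state]:
            return f"fail: no arc for {symbol!r} at position {position}"
        state = transitions[state][symbol]
    if state in finals:
        return "success"
    return "fail: input consumed in a non-final state"


# Assumed word list for the network in the figure.
NETWORK, FINALS = build_network(["CASA", "CINQUE", "LENTE"])
for w in ["LE", "CASA", "CINQUANTA", "LENTEMENTE"]:
    print(w, "->", diagnose(NETWORK, FINALS, w))
```

Under these assumptions only CASA succeeds: LE runs out of input in a non-final state, while CINQUANTA and LENTEMENTE each reach a state with no arc for the next symbol.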

Transducers

• Recognisers either accept or reject a word.

• Although this is useful, networks can actually return more substantial information.

• This is achieved by providing networks with the ability to write as well as to read.

Basic Transducer

• Each transition of a transducer is labelled with a pair of symbols rather than with a single symbol.

• Analysis proceeds as before, except that input symbols are matched against the lower-side symbols on transitions.

• If analysis succeeds, return the string of upper-side symbols on the path to the final state.
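A basic transducer can be sketched by storing a symbol pair on each arc. The encoding below, with arcs as (upper, lower, target) triples, is an assumption for illustration, as are the state numbers; the sketch also assumes at most one arc per state matches a given lower-side symbol. The network maps the surface form CASE to the lexical form CASA.

```python
def analyse(arcs, start, finals, surface):
    """Match `surface` against lower-side symbols.

    Returns the string of upper-side symbols, or None if analysis fails.
    """
    state = start
    upper_out = []
    for symbol in surface:
        for upper, lower, target in arcs.get(state, []):
            if lower == symbol:
                upper_out.append(upper)  # write the upper-side symbol
                state = target
                break
        else:
            return None  # no arc whose lower side matches the symbol
    return "".join(upper_out) if state in finals else None


# Hypothetical arcs as (upper, lower, target) triples.
ARCS = {
    0: [("C", "C", 1)],
    1: [("A", "A", 2)],
    2: [("S", "S", 3)],
    3: [("A", "E", 4)],
}
FINALS = {4}

print(analyse(ARCS, 0, FINALS, "CASE"))
```

The only difference from a recogniser is the `upper_out` list: the machine reads one side of each pair and writes the other.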

Confusing Terminology

• Lower side = surface side.

• Upper side = "deep" side.

• Analysis proceeds from lower to upper.

• Synthesis (generation) proceeds from upper to lower.

Lexical Transducers

• In common parlance, a transducer is a device which converts one form of energy into another, e.g. a microphone converts from sound to electrical signals.

• Next we look at lexical transducers which convert one string of symbols into another.

Lexical Transducer Example

Lexical string: C A S A

Surface string: C A S E

• Input: CASE

• Output: CASA

Morphological Analysis

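[Figure: transducer whose upper side reads C O N T A R E +V +1P +SG and whose lower side reads C O N T O, with ε on the remaining lower-side arcs]

• Input: CONTO

• Output: CONTARE +V +1P +SG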

Remarks

• ε stands for "epsilon". During analysis, epsilon transitions are taken freely without consuming any input.

• Note also single symbols with multi-character print names (e.g. +SG).

• The order of these symbols, and the choice of infinitive as baseform, is determined by linguists.
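Epsilon arcs and multi-character symbols can both be handled in a short sketch. The alignment of upper and lower symbols below is an assumption (the exact pairing in the original figure is not recoverable), as are the encoding and the state numbers. The empty string stands for ε, symbols such as +SG are single list items, and the network is assumed to contain no ε-cycles.

```python
EPS = ""  # lower-side epsilon: the arc consumes no input


def analyse(arcs, start, finals, surface):
    """Return every upper-side symbol list whose lower side spells `surface`."""
    results = []

    def walk(state, pos, output):
        if pos == len(surface) and state in finals:
            results.append(output)
        for upper, lower, target in arcs.get(state, []):
            if lower == EPS:
                # Epsilon arc: taken freely, input position unchanged.
                walk(target, pos, output + [upper])
            elif pos < len(surface) and surface[pos] == lower:
                walk(target, pos + 1, output + [upper])

    walk(start, 0, [])
    return results


# Assumed alignment for CONTARE +V +1P +SG over CONTO.
ARCS = {
    0: [("C", "C", 1)],
    1: [("O", "O", 2)],
    2: [("N", "N", 3)],
    3: [("T", "T", 4)],
    4: [("A", "O", 5)],
    5: [("R", EPS, 6)],
    6: [("E", EPS, 7)],
    7: [("+V", EPS, 8)],
    8: [("+1P", EPS, 9)],
    9: [("+SG", EPS, 10)],
}
FINALS = {10}

for analysis in analyse(ARCS, 0, FINALS, "CONTO"):
    print(" ".join(analysis))
```

Because the search collects every successful path, a network with several analyses for one surface form would return all of them.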

Exercise

• The word "conto" in Italian is also a masculine noun meaning (a) story and (b) bank account.

• Draw the corresponding 2-level networks.

• How can the different meanings be incorporated into the same network?

Conto +N +SG

[Figure: transducer whose upper side reads C O N T O +N +SG and whose lower side reads C O N T O]

• Input: CONTO

• Output: CONTO +N +SG

Synthesis

• Transducers are reversible. This means that the same network can be used to perform the inverse transduction.

• The process of synthesis is the inverse of analysis.
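One way to picture reversibility is as swapping the two labels on every arc. The sketch below assumes arcs are stored as hypothetical (upper, lower, target) triples keyed by state; the function name `invert` is likewise only illustrative.

```python
def invert(arcs):
    """Swap upper and lower labels on every arc; states are unchanged."""
    return {
        state: [(lower, upper, target) for upper, lower, target in outgoing]
        for state, outgoing in arcs.items()
    }


# Hypothetical network pairing lexical CASA with surface CASE.
ARCS = {
    0: [("C", "C", 1)],
    1: [("A", "A", 2)],
    2: [("S", "S", 3)],
    3: [("A", "E", 4)],
}

print(invert(ARCS)[3])
```

Inverting twice gives back the original network, which is why no separate machine is needed for the opposite direction.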

The Process of Synthesis

• Start at the start state and at the beginning of the input string.

• Match the input symbols against the upper-side symbols of the arcs, consuming symbols until a final state is reached.

• If successful, return the string of lower-side symbols (else nothing).
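The synthesis steps can be sketched as the mirror image of analysis. As before, this is an illustration under assumptions: arcs are (upper, lower, target) triples, the empty string stands for ε, the alignment of CONTARE +V +1P +SG with CONTO is assumed, and at most one arc per state is assumed to match a given upper-side symbol. The input is a list because symbols such as +SG are single units.

```python
EPS = ""  # epsilon on the lower side: nothing is written


def synthesise(arcs, start, finals, lexical):
    """Match `lexical` against upper-side symbols.

    Returns the lower-side string, or None if synthesis fails.
    """
    state = start
    surface = []
    for symbol in lexical:
        for upper, lower, target in arcs.get(state, []):
            if upper == symbol:
                if lower != EPS:
                    surface.append(lower)  # epsilons are ignored on output
                state = target
                break
        else:
            return None  # no arc whose upper side matches the symbol
    return "".join(surface) if state in finals else None


# Assumed alignment for CONTARE +V +1P +SG over CONTO.
ARCS = {
    0: [("C", "C", 1)],
    1: [("O", "O", 2)],
    2: [("N", "N", 3)],
    3: [("T", "T", 4)],
    4: [("A", "O", 5)],
    5: [("R", EPS, 6)],
    6: [("E", EPS, 7)],
    7: [("+V", EPS, 8)],
    8: [("+1P", EPS, 9)],
    9: [("+SG", EPS, 10)],
}
FINALS = {10}

LEXICAL = ["C", "O", "N", "T", "A", "R", "E", "+V", "+1P", "+SG"]
print(synthesise(ARCS, 0, FINALS, LEXICAL))
```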

Morphological Synthesis

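[Figure: the same transducer as for morphological analysis, with upper side C O N T A R E +V +1P +SG and lower side C O N T O]

• Input: CONTARE +V +1P +SG

• Output: CONTO

• N.B. ε symbols are ignored on output.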

Analysis and Synthesis

• Upper Side Language (Lexical Strings).

• Lower Side Language (Surface Strings).

• Transducer maps between the two.

• However large the lexical transducer may become, analysis and synthesis are performed by the same language-independent matching techniques.