Week 9: resources for globalisation
  Finish spell checkers
  Machine Translation (MT)
    The ‘decoding’ paradigm
    Ambiguity
    Translation models
    Interlingua and First Order Predicate Calculus
    Human involvement
    Historical note
Spelling dictionaries
Implementing spelling identification and correction algorithm
  STAGE 1: compare each string in the document with a list of legal strings; if there is no corresponding string in the list, mark it as misspelled
  STAGE 2: generate a list of candidates
    Apply any single transformation to the typo string
    Filter the list by checking against a dictionary
  STAGE 3: assign probability values to each candidate in the list
  STAGE 4: select the best candidate
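Stages 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the course's implementation: the slide only says "any single transformation", so the four edit types used here (deletion, transposition, substitution, insertion) are an assumption, and the tiny DICTIONARY and the typo "acress" are made up for the example.

```python
import string

# Hypothetical toy dictionary; a real checker would load a full word list.
DICTIONARY = {"across", "actress", "acres", "access", "caress", "cress"}


def is_misspelled(word):
    """STAGE 1: a string is marked as misspelled if it is not a legal string."""
    return word not in DICTIONARY


def single_edits(typo):
    """STAGE 2a: every string one transformation away from the typo.

    Assumed transformations: delete, transpose, substitute, insert.
    """
    letters = string.ascii_lowercase
    splits = [(typo[:i], typo[i:]) for i in range(len(typo) + 1)]
    deletions = {a + b[1:] for a, b in splits if b}
    transpositions = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutions = {a + ch + b[1:] for a, b in splits if b for ch in letters}
    insertions = {a + ch + b for a, b in splits for ch in letters}
    return deletions | transpositions | substitutions | insertions


def candidates(typo):
    """STAGE 2b: filter the generated strings against the dictionary."""
    return sorted(single_edits(typo) & DICTIONARY)


if is_misspelled("acress"):
    print(candidates("acress"))
```

With this toy dictionary every entry happens to be exactly one transformation away from "acress", so all six words come back as candidates.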
Spelling dictionaries: STAGE 3
  Prior probability
    Given all the words in English, is this candidate more likely to be what the typist meant than that candidate?
    P(c) = count(c) / N, where count(c) is the number of times candidate c occurs in a corpus and N is the number of words in the corpus
  Likelihood
    Given the possible errors, or transformations, how likely is it that error y has operated on candidate x to produce the typo?
    P(t|c), calculated using a corpus of errors, or transformations
  Bayesian rule
    Get the product of the prior probability and the likelihood: P(c) × P(t|c)
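The ranking step can be sketched as below. All numbers are invented for illustration: CORPUS_SIZE, WORD_COUNTS and LIKELIHOOD are hypothetical stand-ins for counts from a real corpus and a real corpus of errors, and the function names are not from the slides.

```python
# Hypothetical corpus statistics, for illustration only.
CORPUS_SIZE = 1_000_000                      # N: number of words in the corpus
WORD_COUNTS = {"actress": 1000, "across": 8000, "acres": 3000}

# Hypothetical P(t|c): how likely each candidate is to be mistyped as the typo.
LIKELIHOOD = {"actress": 0.0001, "across": 0.00001, "acres": 0.00003}


def prior(candidate):
    """P(c) = count(c) / N."""
    return WORD_COUNTS[candidate] / CORPUS_SIZE


def score(candidate):
    """Product of prior and likelihood: P(c) x P(t|c)."""
    return prior(candidate) * LIKELIHOOD[candidate]


def best_candidate(candidate_list):
    """STAGE 4: select the candidate with the highest score."""
    return max(candidate_list, key=score)


print(best_candidate(["actress", "across", "acres"]))
```

Note that "across" has the highest prior here but does not win, because its assumed likelihood is much lower; the product is what decides the ranking.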
Spelling dictionaries: non-word errors
Implementing spelling identification and correction algorithm
  STAGE 1: identify misspelled words
  STAGE 2: generate list of candidates
  STAGE 3a: rank candidates for probability
  STAGE 3b: select best candidate
  Implement: noisy channel model, Bayesian Rule
Resources for Globalisation: Machine translation
The ‘decoding’ paradigm
  Assumes a one-to-one relation between source symbol and target symbol
  In practice the relation can be:
    one-to-many (homonymy)
    one-to-many (hypernym → hyponyms)
    many-to-one (hyponyms → hypernym)
Machine translation
The ‘decoding’ paradigm
  one-to-many (homonymy)
    bank → Ufer, Bank (German)
  one-to-many (hypernym → hyponyms)
    brother → otooto, oniisan (Japanese)
    blue → синий, голубой (Russian)
  many-to-one (hyponyms → hypernym)
    hill, mountain → Berg (German)
    learn, teach → leren (Dutch)
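A small sketch of why these examples break the one-to-one assumption. The word pairs are the ones listed above; the LEXICON layout, the language codes and the function names are hypothetical choices made for this illustration.

```python
# Word-level lexicon built from the examples above.
# Keys are (source word, target language code); values are possible translations.
LEXICON = {
    ("bank", "de"): ["Ufer", "Bank"],
    ("brother", "ja"): ["otooto", "oniisan"],
    ("blue", "ru"): ["синий", "голубой"],
    ("hill", "de"): ["Berg"],
    ("mountain", "de"): ["Berg"],
    ("learn", "nl"): ["leren"],
    ("teach", "nl"): ["leren"],
}


def decode(word, target):
    """Symbol-for-symbol 'decoding': only works when exactly one option exists."""
    options = LEXICON[(word, target)]
    if len(options) != 1:
        raise ValueError(f"{word!r} is one-to-many in {target}: {options}")
    return options[0]


def back_translations(target_word, target):
    """All source words that map onto the same target word (many-to-one)."""
    return sorted(
        word
        for (word, lang), options in LEXICON.items()
        if lang == target and target_word in options
    )


print(decode("hill", "de"))
print(back_translations("Berg", "de"))
```

Decoding "hill" succeeds, but the distinction between "hill" and "mountain" is lost on the way back, and decoding "bank" fails outright because context is needed to choose between the two German words.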
Machine translation and globalisation
Ambiguity
  ‘I made her duck’
  “The possibility of interpreting an expression in two or more distinct ways” (Collins English Dictionary)
Machine translation
Ambiguity
  The challenge of the translation depends on the level of ambiguity that arises
  This depends on the closeness of the source and target languages with respect to the following:
    vocabulary: homonyms
    grammar: structural ambiguity
    conceptual structure: specificity ambiguity, lexical gaps
Machine translation
Pragmatic approach
  Aim for a rough translation, a ‘gist’ translation
    Used for multi-lingual information retrieval
  Involve human translators in the process: computer-aided translation
Machine translation
Translation models: Transfer model
  English: ‘the dog bit my friend’
  Hindi: kutte-ne mere dost-ko kata
  Gloss: dog my friend bit
  Alter the grammatical structure of the source language to make it adhere to the grammatical structure of the target language
  Use transformation rules
    Analysis process (source)
    Transfer process (‘bridge’)
    Generation process (target)
  Problem: each source-target pair will need its own unique set of transformation rules
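The three processes can be sketched on the slide's example sentence. This is a toy under strong assumptions: the analysis step only handles clauses of the exact shape "determiner noun verb possessive noun", the lexicon holds only the words from the example, and attaching the case markers -ne and -ko to the noun phrases is an assumption about how the transliteration should be segmented.

```python
# Toy English-to-Hindi lexicon, taken from the example sentence only.
LEXICON = {"dog": "kutte", "my": "mere", "friend": "dost", "bit": "kata"}

# Assumed case markers added on the target side.
MARKERS = {"subject": "-ne", "object": "-ko", "verb": ""}


def analyse(sentence):
    """Analysis (source): split a 'DET N V POSS N' clause into labelled parts."""
    _det, subj, verb, poss, obj = sentence.split()
    return [("subject", [subj]), ("verb", [verb]), ("object", [poss, obj])]


def transfer(constituents):
    """Transfer ('bridge'): reorder subject-verb-object into subject-object-verb."""
    roles = dict(constituents)
    return [(role, roles[role]) for role in ("subject", "object", "verb")]


def generate(constituents):
    """Generation (target): look up each word and attach the case marker."""
    phrases = []
    for role, words in constituents:
        phrase = " ".join(LEXICON[word] for word in words)
        phrases.append(phrase + MARKERS[role])
    return " ".join(phrases)


def translate(sentence):
    return generate(transfer(analyse(sentence)))


print(translate("the dog bit my friend"))
```

The reordering rule in transfer() is specific to this language pair, which is the problem noted above: a different source-target pair would need its own rule set.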
Machine translation
Translation models: Inter-lingua model
  Extract the meaning from the source string
  Give it a language-independent representation, i.e. an interlingua
  The translation process takes the interlingua as its input
  Multiple translation processes take the same input for multiple target language outputs
Machine translation
Translation models: What is the inter-lingua?
  For words, some sort of semantic analysis, e.g.
    Interlingua: (GO, BY-FOOT)   (GO, BY-TRANSPORT)
    Russian:     идти            ехать
    English:     go              go
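A word-level interlingua along these lines might be sketched as follows. The concept tuples and word pairs come from the example above; the table names, language codes and functions are hypothetical.

```python
# Analysis: source word -> language-independent concept.
ANALYSIS = {
    ("ru", "идти"): ("GO", "BY-FOOT"),
    ("ru", "ехать"): ("GO", "BY-TRANSPORT"),
}

# Generation: one table per target language, all reading the same concepts.
GENERATION = {
    "en": {("GO", "BY-FOOT"): "go", ("GO", "BY-TRANSPORT"): "go"},
    "ru": {("GO", "BY-FOOT"): "идти", ("GO", "BY-TRANSPORT"): "ехать"},
}


def to_interlingua(lang, word):
    """Extract the meaning of a source word as a concept tuple."""
    return ANALYSIS[(lang, word)]


def from_interlingua(concept, lang):
    """Generate the target word for a concept."""
    return GENERATION[lang][concept]


def translate(word, source, target):
    return from_interlingua(to_interlingua(source, word), target)


print(translate("ехать", "ru", "en"))
```

Adding a new target language only needs a new GENERATION table. Going from English "go" into the interlingua is deliberately left out, since the word alone does not say which of the two concepts is meant.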
Machine translation and globalisation
Translation models: What is the inter-lingua?
  For sentences, a logical language, e.g. First Order Predicate Calculus
Meaning representation
Goal:
  1. The semantic representation must give you a one-to-one mapping to non-linguistic knowledge of the world
  2. The representation must be expressive, i.e. handle different types of data
Meaning representation: First Order Predicate Calculus
  Computationally tractable
  Objects (terms)
  Properties of objects
  Relations amongst objects
  Predicate argument structure
  Large composite representations
  Logical connectives
Meaning representation: First Order Predicate Calculus
Object: referred to uniquely by a term
  constant, e.g. SurreyUniversity
  function, e.g. LocationOf(SurreyUniversity)
  variable
Meaning representation: First Order Predicate Calculus
Relations amongst objects
  Predicates: “symbols that refer to, or name, the relations that hold among some fixed number of objects” (J & M)
    Educates(SurreyUniversity, Citizens): two-place predicate
  Predicates can also specify the category of an object
    University(SurreyUniversity): one-place predicate
Meaning representation: First Order Predicate Calculus
Properties / parts of objects
  Functions: LocationOf(SurreyUniversity)
Composite representations through predicates and functions
  Near(LocationOf(SurreyUniversity), LocationOf(Cathedral))
Meaning representation: First Order Predicate Calculus
Logical connectives
  Combine basic representations to form larger, more complex representations, e.g. the ∧ operator = ‘and’
  Educates(SurreyUniversity, Citizens) ∧ ¬Remunerates(SurreyUniversity, Staff)
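The pieces of First Order Predicate Calculus introduced so far (constants, functions, predicates, ∧ and ¬) can be sketched as small data structures evaluated against a model. The class names, the FUNCTIONS and FACTS tables and the place names "Loc1" and "Loc2" are all hypothetical, and the truth values in the model are arbitrary; variables and quantifiers are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Func:
    name: str
    arg: object


@dataclass(frozen=True)
class Pred:
    name: str
    args: tuple


@dataclass(frozen=True)
class Not:
    body: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


# Hypothetical model of the world: what functions return and which facts hold.
FUNCTIONS = {
    ("LocationOf", "SurreyUniversity"): "Loc1",
    ("LocationOf", "Cathedral"): "Loc2",
}
FACTS = {
    ("University", ("SurreyUniversity",)),
    ("Educates", ("SurreyUniversity", "Citizens")),
    ("Near", ("Loc1", "Loc2")),
}


def denote(term):
    """Map a term (constant or function application) to the object it names."""
    if isinstance(term, Const):
        return term.name
    if isinstance(term, Func):
        return FUNCTIONS[(term.name, denote(term.arg))]
    raise TypeError(f"not a term: {term!r}")


def holds(formula):
    """Evaluate a formula against FACTS."""
    if isinstance(formula, Pred):
        return (formula.name, tuple(denote(a) for a in formula.args)) in FACTS
    if isinstance(formula, Not):
        return not holds(formula.body)
    if isinstance(formula, And):
        return holds(formula.left) and holds(formula.right)
    raise TypeError(f"not a formula: {formula!r}")


SURREY = Const("SurreyUniversity")
NEAR = Pred("Near", (Func("LocationOf", SURREY), Func("LocationOf", Const("Cathedral"))))
SLIDE_FORMULA = And(
    Pred("Educates", (SURREY, Const("Citizens"))),
    Not(Pred("Remunerates", (SURREY, Const("Staff")))),
)

print(holds(NEAR), holds(SLIDE_FORMULA))
```

Anything not listed in FACTS is treated as false, which is a simplifying assumption of this sketch rather than part of the calculus itself.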
Machine translation and globalisation: change of priorities
  1954: IBM and Georgetown University, first MT demo
    Goal: ‘perfect’ translation
  1966: Automatic Language Processing Advisory Committee (ALPAC) report: damning of that goal
  Post ALPAC
    Goal: rough translation, involve human element
  Current situation: online translation, e.g. Babel Fish, descendant of SYSTRAN, whose goal was rough translation
  Journal of Machine Translation
Next week
  Globalisation as an industry
  SDL and the SDLX-TRADOS globalisation application