THE MATHEMATICS OF STATISTICAL MACHINE TRANSLATION Sriraman M Tallam



Sriraman Tallam April 22, 2023 2

The Problem

The problem of machine translation is discussed.

Five statistical models are proposed for the translation process.
• Algorithms for estimating their parameters are described.

For the learning process, pairs of sentences that are translations of one another are used.

Previous work shows statistical methods to be useful in achieving linguistically interesting goals.

• A natural extension: matching up words within pairs of aligned sentences.

Results show the power of statistical methods in extracting linguistically interesting correlations.


Statistical Translation

Warren Weaver first suggested the use of statistical techniques for machine translation. [Weaver 1955]

Fundamental Equation for Machine Translation

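$$\Pr(\mathbf{e} \mid \mathbf{f}) = \frac{\Pr(\mathbf{e})\,\Pr(\mathbf{f} \mid \mathbf{e})}{\Pr(\mathbf{f})}$$

$$\hat{\mathbf{e}} = \operatorname*{argmax}_{\mathbf{e}} \Pr(\mathbf{e})\,\Pr(\mathbf{f} \mid \mathbf{e})$$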


Statistical Translation

The view taken is that a person writing a French sentence, even a native speaker, first conceives an English sentence and then mentally translates it.

• Machine translation’s goal is to find that English sentence.

The equation summarizes the three computational challenges presented by statistical translation.
• Language model probability estimation - Pr(e)
• Translation model probability estimation - Pr(f|e)
• Search problem - maximizing their product
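The three challenges can be made concrete with a minimal sketch of the search step. The candidate sentences and every probability below are made-up illustrative values, not figures from the slides, and a real decoder would search a far larger space than a fixed candidate list.

```python
import math

# Hypothetical toy tables (illustrative numbers only).
LANGUAGE_MODEL = {            # Pr(e)
    "what could we have done ?": 0.006,
    "what we could have done ?": 0.001,
    "what can we do ?": 0.008,
}
TRANSLATION_MODEL = {         # Pr(f | e) for one fixed French sentence f
    "what could we have done ?": 0.20,
    "what we could have done ?": 0.20,
    "what can we do ?": 0.01,
}

def decode(candidates):
    """Return the candidate e maximizing Pr(e) * Pr(f | e), scored in log space."""
    def score(e):
        return math.log(LANGUAGE_MODEL[e]) + math.log(TRANSLATION_MODEL[e])
    return max(candidates, key=score)

best = decode(list(LANGUAGE_MODEL))
print(best)
```

Pr(f) is left out of the score because it is the same for every candidate e, which is why the argmax only needs the product of the two model probabilities.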

Why not reverse the translation models?
• Class discussion!


Alignments

What is a translation?
• A pair of strings that are translations of one another
• (Qu’ aurions-nous pu faire ? | What could we have done ?)

What is an alignment ?


Alignments

The mapping in an alignment can range from one-to-one to many-to-many.

The alignment in the figure is expressed as
• (Le programme a été mis en application | And the(1) program(2) has(3) been(4) implemented(5,6,7)).

This alignment, though acceptable, has a lower probability.
• (Le programme a été mis en application | And(1,2,3,4,5,6,7) the program has been implemented).

A(e,f) is the set of alignments of (f|e).
• If e has length l and f has length m, there are $2^{lm}$ alignments in all.
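As a rough illustration, an alignment can be stored as a set of connections, and the count of possible alignments follows from each of the l·m connections being either present or absent. The representation below (pairs of 1-based English and French positions) is an assumption made for this sketch.

```python
french = "Le programme a été mis en application".split()
english = "And the program has been implemented".split()

# Connections as (English position i, French position j), both 1-based,
# read off the notation the(1) program(2) has(3) been(4) implemented(5,6,7).
alignment = {(2, 1), (3, 2), (4, 3), (5, 4), (6, 5), (6, 6), (6, 7)}

def count_alignments(l, m):
    """Each of the l * m possible connections is either present or absent."""
    return 2 ** (l * m)

l, m = len(english), len(french)
total = count_alignments(l, m)

# List the connections in French word order.
for i, j in sorted(alignment, key=lambda c: c[1]):
    print(f"{english[i - 1]} -> {french[j - 1]}")
print(total)
```

Note that "And" has no connection in this alignment, which the set representation expresses simply by the absence of any pair with i = 1.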


Cepts

What is a cept?
• To express the fact that each word is related to a concept: in a figurative sense, a sentence is a web of concepts woven together.
• The cepts in the example are The, poor, and don’t have any money.
• There is the notion of an empty cept.


Translation Models

Five translation models have been developed.

Each model is a recipe for computing Pr(f|e), which is called the likelihood of the translation (f,e).

The likelihood is a function of many parameters (!).

The idea is to guess values for these parameters and to apply the EM algorithm iteratively.


Translation Models

Models 1 and 2.
• All possible lengths of the French string are equally likely.
• In Model 1, all connections for each French position are equally likely.
• In Model 2, connection probabilities are more realistic.
• These models very often lead to unsatisfactory alignments.

Models 3, 4 and 5.
• No assumptions on the length of the French string.
• Models 3 and 4 make more realistic assumptions on the connection probabilities.
• Models 1 - 4 are a stepping stone for the training of Model 5.
• Start with Model 1 for initial estimates and pipe through Models 2 - 5.


Translation Models

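The likelihood of f | e is

$$\Pr(\mathbf{f} \mid \mathbf{e}) = \sum_{\mathbf{a}} \Pr(\mathbf{f}, \mathbf{a} \mid \mathbf{e})$$

summed over all elements a of A(e,f). Then,

$$\Pr(\mathbf{f}, \mathbf{a} \mid \mathbf{e}) = \Pr(m \mid \mathbf{e}) \prod_{j=1}^{m} \Pr(a_j \mid a_1^{j-1}, f_1^{j-1}, m, \mathbf{e})\, \Pr(f_j \mid a_1^{j}, f_1^{j-1}, m, \mathbf{e})$$

• Choose the length of the French string given the English.
• For each French word position, choose the alignment, given previous alignments and words.
• Choose the identity of the word at this position given our knowledge of the previous alignments and words.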


Model 1

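Assumptions

We assume Pr(m|e) is independent of e and m:

$$\epsilon \equiv \Pr(m \mid \mathbf{e})$$

• All reasonable lengths of the French string are equally likely.

Also, $\Pr(a_j \mid a_1^{j-1}, f_1^{j-1}, m, \mathbf{e})$ depends only on l.
• All connections are equally likely, and for a word there are (l + 1) connections, so this quantity is equal to $(l + 1)^{-1}$.

$t(f_j \mid e_{a_j}) \equiv \Pr(f_j \mid a_1^{j}, f_1^{j-1}, m, \mathbf{e})$ is called the translation probability of $f_j$ given $e_{a_j}$.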


Model 1

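The joint likelihood function for Model 1 is

$$\Pr(\mathbf{f}, \mathbf{a} \mid \mathbf{e}) = \frac{\epsilon}{(l+1)^m} \prod_{j=1}^{m} t(f_j \mid e_{a_j})$$

for j = 1 … m, with each $a_j$ ranging over 0 … l (0 denotes the empty cept), where $\epsilon$ is the constant value of Pr(m|e). Therefore,

$$\Pr(\mathbf{f} \mid \mathbf{e}) = \frac{\epsilon}{(l+1)^m} \sum_{a_1=0}^{l} \cdots \sum_{a_m=0}^{l} \prod_{j=1}^{m} t(f_j \mid e_{a_j})$$

subject to

$$\sum_{f} t(f \mid e) = 1 \quad \text{for each } e.$$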
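The Model 1 likelihood can be checked numerically on a tiny example. The sketch below sums the joint likelihood over every alignment by brute force and compares it with the well-known rearrangement of the same sum as a product over French positions. The translation table and the value of epsilon are made-up assumptions, and the table is only a partial one (it lists just the French words used here).

```python
from itertools import product

EPSILON = 1.0  # assumed constant value of Pr(m | e)

# Hypothetical partial translation table t(f | e); "NULL" is the empty cept e_0.
T = {
    ("le", "NULL"): 0.2, ("programme", "NULL"): 0.1,
    ("le", "the"): 0.7, ("programme", "the"): 0.1,
    ("le", "program"): 0.1, ("programme", "program"): 0.8,
}

def joint(f, a, e):
    """Pr(f, a | e) = eps / (l+1)^m * prod_j t(f_j | e_{a_j})."""
    l, m = len(e) - 1, len(f)          # e[0] is NULL, so l excludes it
    p = EPSILON / (l + 1) ** m
    for fj, aj in zip(f, a):
        p *= T[(fj, e[aj])]
    return p

def likelihood_brute_force(f, e):
    """Sum Pr(f, a | e) over all (l+1)^m alignments with a_j in 0..l."""
    return sum(joint(f, a, e) for a in product(range(len(e)), repeat=len(f)))

def likelihood_factored(f, e):
    """Same quantity, written as a product over j of sums over i."""
    l, m = len(e) - 1, len(f)
    p = EPSILON / (l + 1) ** m
    for fj in f:
        p *= sum(T[(fj, ei)] for ei in e)
    return p

e = ["NULL", "the", "program"]
f = ["le", "programme"]
brute = likelihood_brute_force(f, e)
fact = likelihood_factored(f, e)
print(brute, fact)
```

The factored form needs only m·(l+1) table lookups instead of (l+1)^m terms, which is what makes exact training of Model 1 practical.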


Model 1

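Technique of Lagrange multipliers: maximize Pr(f|e) subject to $\sum_{f} t(f \mid e) = 1$ for each e, with one multiplier $\lambda_e$ per English word.

EM algorithm is applied repeatedly.

The expected number of times e connects to f in the translation (f|e) is

$$c(f \mid e; \mathbf{f}, \mathbf{e}) = \sum_{\mathbf{a}} \Pr(\mathbf{a} \mid \mathbf{e}, \mathbf{f}) \sum_{j=1}^{m} \delta(f, f_j)\, \delta(e, e_{a_j})$$

where the outer sum runs over the set of $a_j$ (all alignments) and $\delta$ is the Kronecker delta. Each iteration re-estimates

$$t(f \mid e) = \lambda_e^{-1}\, c(f \mid e; \mathbf{f}, \mathbf{e})$$

with $\lambda_e$ acting as the normalizing constant.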
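The repeated EM step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the presenter's implementation: the three-sentence corpus is invented, the function name `train_model1` is hypothetical, and the expected counts are collected position by position, which for Model 1 gives the same counts as summing over all alignments.

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """EM for Model 1. corpus is a list of (french_words, english_words)."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    # Uniform initial guess for t(f | e); "NULL" plays the role of e_0.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f | e)
        total = defaultdict(float)   # sum over f of c(f | e), the normalizer
        for fs, es in corpus:
            es = ["NULL"] + es
            for f in fs:
                # E-step: posterior probability that f connects to each e.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalize so that sum_f t(f | e) = 1 for each e.
        t = defaultdict(float, {fe: c / total[fe[1]] for fe, c in count.items()})
    return t

# Invented toy corpus of (French, English) sentence pairs.
corpus = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("maison bleue".split(), "blue house".split()),
]
t = train_model1(corpus)
for f in ["la", "maison", "fleur"]:
    print(f, round(t[(f, "the")], 3))
```

Even on this tiny corpus, the probability mass for "the" shifts toward "la", the only French word that co-occurs with it in both of its sentences.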


Model 1 -> Model 2

Model 1 does not take into account where words appear in either string

• All connections are equally probable

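In Model 2, alignment probabilities are introduced,

$$a(a_j \mid j, m, l) \equiv \Pr(a_j \mid a_1^{j-1}, f_1^{j-1}, m, l)$$

which satisfy the constraints

$$\sum_{i=0}^{l} a(i \mid j, m, l) = 1$$

for each triple jml.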


Model 2

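The likelihood function now is

$$\Pr(\mathbf{f} \mid \mathbf{e}) = \epsilon \sum_{a_1=0}^{l} \cdots \sum_{a_m=0}^{l} \prod_{j=1}^{m} t(f_j \mid e_{a_j})\, a(a_j \mid j, m, l)$$

and the cost function is

$$h(t, a, \lambda, \mu) \equiv \epsilon \sum_{a_1=0}^{l} \cdots \sum_{a_m=0}^{l} \prod_{j=1}^{m} t(f_j \mid e_{a_j})\, a(a_j \mid j, m, l) - \sum_{e} \lambda_e \Big( \sum_{f} t(f \mid e) - 1 \Big) - \sum_{j} \mu_{jml} \Big( \sum_{i} a(i \mid j, m, l) - 1 \Big)$$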
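A small numerical sketch may help show what the alignment probabilities add. It evaluates the Model 2 likelihood in its per-position form (a product over French positions of sums over English positions) and compares a uniform alignment table, which reduces to Model 1, against one that favors aligning position j with position j. All tables and the function name `model2_likelihood` are assumptions invented for this example.

```python
def model2_likelihood(f, e, t, a, epsilon=1.0):
    """Pr(f | e) = eps * prod_j sum_i t(f_j | e_i) * a(i | j, m, l).

    e[0] is the empty cept; a maps (i, j, m, l) to a probability.
    """
    l, m = len(e) - 1, len(f)
    p = epsilon
    for j, fj in enumerate(f, start=1):
        p *= sum(t[(fj, e[i])] * a[(i, j, m, l)] for i in range(l + 1))
    return p

# Hypothetical toy parameters.
e = ["NULL", "the", "program"]
f = ["le", "programme"]
t = {
    ("le", "NULL"): 0.2, ("programme", "NULL"): 0.1,
    ("le", "the"): 0.7, ("programme", "the"): 0.1,
    ("le", "program"): 0.1, ("programme", "program"): 0.8,
}
l, m = len(e) - 1, len(f)
positions = [(i, j) for i in range(l + 1) for j in range(1, m + 1)]

# Uniform alignment probabilities: Model 2 collapses to Model 1.
uniform = {(i, j, m, l): 1.0 / (l + 1) for i, j in positions}
# Alignment probabilities that prefer English position i == French position j.
diagonal = {(i, j, m, l): (0.8 if i == j else 0.1) for i, j in positions}

p_uniform = model2_likelihood(f, e, t, uniform)
p_diagonal = model2_likelihood(f, e, t, diagonal)
print(p_uniform, p_diagonal)
```

With these numbers the position-aware table assigns a higher likelihood to the pair, because the word order of the two strings happens to match.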


Fertility and Tablet

The fertility of an English word is the number of French words it is connected to - $\Phi_i$

Each English word translates to a set of French words called the Tablet - $T_i$

The collection of Tablets is the Tableau - T.

The final French string is a permutation of the words in the Tableau - $\Pi$
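To make the terms concrete, the sketch below derives tablets and fertilities from an alignment, using the "Le programme a été mis en application" example from the Alignments slides. The list-based alignment encoding (one English position per French word, 0 for the empty cept) and the function name are assumptions made for illustration.

```python
def tableau_from_alignment(f, e, a):
    """Build tablets and fertilities from an alignment.

    a[j] is the 1-based English position for the French word f[j],
    with 0 standing for the empty cept. Results are indexed 0..l.
    """
    tablets = [[] for _ in range(len(e) + 1)]
    for j, i in enumerate(a):
        tablets[i].append(f[j])
    fertilities = [len(tablet) for tablet in tablets]
    return tablets, fertilities

english = "And the program has been implemented".split()
french = "Le programme a été mis en application".split()
# the(1) program(2) has(3) been(4) implemented(5,6,7)
a = [2, 3, 4, 5, 6, 6, 6]

tablets, fertilities = tableau_from_alignment(french, english, a)
for word, tablet, phi in zip(["NULL"] + english, tablets, fertilities):
    print(word, phi, tablet)
```

Here "implemented" has fertility 3 and "And" has fertility 0, while the French string itself is one particular ordering of all the words held in the tableau.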


Joint Likelihood of a Tableau and Permutation

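The joint likelihood of a Tableau and Permutation is

$$\Pr(\tau, \pi \mid \mathbf{e}) = \prod_{i=1}^{l} \Pr(\phi_i \mid \phi_1^{i-1}, \mathbf{e})\; \Pr(\phi_0 \mid \phi_1^{l}, \mathbf{e}) \times \prod_{i=0}^{l} \prod_{k=1}^{\phi_i} \Pr(\tau_{ik} \mid \tau_{i1}^{k-1}, \tau_0^{i-1}, \phi_0^{l}, \mathbf{e}) \times \prod_{i=1}^{l} \prod_{k=1}^{\phi_i} \Pr(\pi_{ik} \mid \pi_{i1}^{k-1}, \pi_1^{i-1}, \tau_0^{l}, \phi_0^{l}, \mathbf{e}) \times \prod_{k=1}^{\phi_0} \Pr(\pi_{0k} \mid \pi_{01}^{k-1}, \pi_1^{l}, \tau_0^{l}, \phi_0^{l}, \mathbf{e})$$

where $\phi_i$ is the fertility of the ith English word, $\tau_{ik}$ is the kth word of its tablet, and $\pi_{ik}$ is the position that word takes in the French string, and

$$\Pr(\mathbf{f}, \mathbf{a} \mid \mathbf{e}) = \sum_{(\tau, \pi) \in \langle \mathbf{f}, \mathbf{a} \rangle} \Pr(\tau, \pi \mid \mathbf{e})$$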


Model 3

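Assumptions

The fertility probability of an English word depends only on the word:

$$n(\phi \mid e_i) \equiv \Pr(\Phi_i = \phi \mid \phi_1^{i-1}, \mathbf{e})$$

The translation probability is

$$t(f \mid e_i) \equiv \Pr(T_{ik} = f \mid \tau_{i1}^{k-1}, \tau_0^{i-1}, \phi_0^{l}, \mathbf{e})$$

The distortion probability is

$$d(j \mid i, m, l) \equiv \Pr(\Pi_{ik} = j \mid \pi_{i1}^{k-1}, \pi_1^{i-1}, \tau_0^{l}, \phi_0^{l}, \mathbf{e})$$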


Model 3

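The likelihood function for Model 3 is now

$$\Pr(\mathbf{f} \mid \mathbf{e}) = \sum_{a_1=0}^{l} \cdots \sum_{a_m=0}^{l} \binom{m - \phi_0}{\phi_0} p_0^{\,m - 2\phi_0}\, p_1^{\,\phi_0} \prod_{i=1}^{l} \phi_i!\; n(\phi_i \mid e_i) \prod_{j=1}^{m} t(f_j \mid e_{a_j})\, d(j \mid a_j, m, l)$$

where $\phi_0$ is the fertility of the empty cept, $p_1$ is the probability of generating a French word from it, and $p_0 = 1 - p_1$.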


Deficiency of Model 3

The fertility of word i does not depend on the fertility of previous words.

• Does not always concentrate its probability on events of interest.

This deficiency is no serious problem.

It might decrease the probability of all well-formed strings by a constant factor.


Model 4

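Allowing phrases in the English string to move and be translated as units in the French string.
• Model 3 doesn’t account for this well, because of the word-by-word movement.

The distortion probabilities become

$$d_1(j - \odot_{i-1} \mid A(e_{[i-1]}), B(f_j)) \quad \text{and} \quad d_{>1}(j - \pi_{[i]k-1} \mid B(f_j))$$

for the first word and the later words of a cept respectively, where $\odot_{i-1}$ is the center of the previous cept and A and B are functions of the English and French words.

Using this, they account for facts such as an adjective appearing before a noun in English and the reverse in French. - THIS IS GOOD!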


Model 4

For example, implemented produces mis en application, all occurring together, whereas not produces ne pas, which occurs with a word in between.

So, $d_{>1}(2 \mid B(\text{pas}))$ is relatively large when compared to $d_{>1}(2 \mid B(\text{en}))$.

Models 3 and 4 are both deficient. Words can be placed before the first position or beyond the last position in the French string. Model 5 removes this deficiency.


Model 5

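They define $v_j$ to be the number of vacancies up to and including position j just before forming the words of the ith cept.

And, this gives rise to the following distortion probability equation for the first word of the cept,

$$\Pr(\Pi_{[i]1} = j \mid \pi_1^{[i]-1}, \tau_0^{l}, \phi_0^{l}, \mathbf{e}) = d_1(v_j \mid B(f_j), v_{\odot_{i-1}}, v_m - \phi_{[i]} + 1)\,(1 - \delta(v_j, v_{j-1}))$$

where $\odot_{i-1}$ is the center of the previous cept and the final factor is zero unless position j is vacant.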

Model 5 is powerful but must be used in tandem with the other 4 models.


Results


Changing Viterbi Alignments with Iterations


Key Points from Results

Words like nodding have a large fertility because they don’t slip gracefully into French.

Words like should do not have a fertility greater than one, but they translate into many different possible words, so their translation probability is spread more widely.

Words like the sometimes have zero fertility, since English prefers an article in some places where French does not.