Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. ·...

102
1 Yulia Tsvetkov Algorithms for NLP IITP, Fall 2019 Lecture 21: Machine Translation I

Transcript of Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. ·...

Page 1: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

1

Yulia Tsvetkov

Algorithms for NLP

IITP, Fall 2019

Lecture 21: Machine Translation I

Page 2: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Machine Translation

Page 3: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation
Page 4: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

from Dream of the Red Chamber Cao Xue Qin (1792)

Page 5: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation
Page 6: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

English: leg, foot, paw

French: jambe, pied, patte, etape

Page 7: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

ChallengesAmbiguities

▪ words▪ morphology▪ syntax▪ semantics▪ pragmatics

Gaps in data

▪ availability of corpora▪ commonsense knowledge

+Understanding of context, connotation, social norms, etc.

Page 8: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Classical Approaches to MT

▪ Direct translation: word-by-word dictionary translation▪ Transfer approaches: word dictionary + rules

morphological analysis

lexical transfer using bilingual dictionary

local reordering

morphological generation

source language text

target language text

Page 9: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation
Page 10: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Levels of Transfer

Page 11: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Levels of Transfer

▪ Syntactic transfer

Page 12: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Levels of Transfer

▪ Syntactic transfer

Page 13: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Levels of Transfer

▪ Syntactic transfer

Page 14: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Levels of Transfer

▪ Semantic transfer

Page 15: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Levels of Transfer

Page 16: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Levels of Transfer

Page 17: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Levels of Transfer

Page 18: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

The Vauquois Triangle (1968)

Page 19: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation
Page 20: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Levels of Transfer: The Vauquois triangle

Page 21: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation
Page 22: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Statistical approaches

Page 23: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Statistical MT

Modeling correspondences between languages

Sentence-aligned parallel corpus:

Yo lo haré mañanaI will do it tomorrow

Hasta prontoSee you soon

Hasta prontoSee you around

Yo lo haré prontoNovel Sentence

I will do it soon

I will do it around

See you tomorrow

Machine translation system:

Model of translation

Page 24: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Research Problems

▪ How can we formalize the process of learning to translate from examples?

▪ How can we formalize the process of finding translations for new inputs?

▪ If our model produces many outputs, how do we find the best one?

▪ If we have a gold standard translation, how can we tell if our output is good or bad?

Page 25: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Two Views of MT

Page 26: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

MT as Code Breaking

Page 27: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

27

The Noisy-Channel Model

source W Anoisy channel

Page 28: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

28

source W Anoisy channel

decoder observed w a

best

The Noisy-Channel Model

Page 29: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

29

source W Anoisy channel

decoder observed w a

best

▪ We want to predict a sentence given acoustics:

The Noisy-Channel Model

Page 30: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

30

▪ We want to predict a sentence given acoustics:

▪ The noisy-channel approach:

The Noisy-Channel Model

Page 31: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

31

▪ The noisy-channel approach:

channel model source model

source W Anoisy channel

decoderobserved

w abest

The Noisy-Channel Model

Page 32: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

32

▪ The noisy-channel approach:

source W Anoisy channel

decoderobserved

w abest

Prior

Acoustic model (HMMs)Translation model

Likelihood

Language model: Distributions over sequences of words (sentences)

The Noisy-Channel Model

Page 33: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Noisy Channel Model

Page 34: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

MT as Direct Modeling

▪ one model does everything▪ trained to reproduce a corpus of translations

Page 35: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Two Views of MT

▪ Code breaking (aka the noisy channel, Bayes rule)▪ I know the target language ▪ I have example translations texts (example enciphered data)

▪ Direct modeling (aka pattern matching)▪ I have really good learning algorithms and a bunch of example

inputs (source language sentences) and outputs (target language translations)

Page 36: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Which is better?

▪ Noisy channel -▪ easy to use monolingual target language data▪ search happens under a product of two models (individual models

can be simple, product can be powerful)▪ obtaining probabilities requires renormalizing

▪ Direct model -▪ directly model the process you care about▪ model must be very powerful

Page 37: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Where are we in 2019?

▪ Direct modeling is where most of the action is▪ Neural networks are very good at generalizing and conceptually

very simple▪ Inference in “product of two models” is hard

▪ Noisy channel ideas are incredibly important and still play a big role in how we think about translation

Page 38: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Two Views of MT

Page 39: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Noisy Channel: Phrase-Based MT

fesource phrase

target phrase

translation features

Translation Model

f

Language Model

e f

Reranking Model

feature weights

Parallel corpus

Monolingual corpus

Held-out parallel corpus

Page 40: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Neural MT: Conditional Language Modeling

http://opennmt.net/

Page 41: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

A common problem

Both models must assign probabilities to how a sentence in one language translates into a sentence in another language.

Page 42: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Learning from Data

Page 43: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Parallel corpora

Page 44: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation
Page 45: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation
Page 46: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation
Page 47: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Mining parallel data from microblogs Ling et al. 2013

Page 48: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

http://opus.nlpl.eu

Page 49: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation
Page 50: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

▪ There is a lot more monolingual data in the world than translated data

▪ Easy to get about 1 trillion words of English by crawling the web

▪ With some work, you can get 1 billion translated words of English-French▪ What about Japanese-Turkish?

Page 51: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Phrase-Based MT

fesource phrase

target phrase

translation features

Translation Model

f

Language Model

e f

Reranking Model

feature weights

Parallel corpus

Monolingual corpus

Held-out parallel corpus

Page 52: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Phrase-Based System Overview

Sentence-aligned corpus

cat ||| chat ||| 0.9 the cat ||| le chat ||| 0.8dog ||| chien ||| 0.8 house ||| maison ||| 0.6 my house ||| ma maison ||| 0.9language ||| langue ||| 0.9 …

Phrase table(translation model)Word alignments

Many slides and examples from Philipp Koehn or John DeNero

Page 53: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Word Alignment Models

Page 54: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Lexical Translation

▪ How do we translate a word? Look it up in the dictionary

Haus — house, building, home, household, shell

▪ Multiple translations▪ some more frequent than others▪ different word senses, different registers, different inflections (?)▪ house, home are common

▪ shell is specialized (the Haus of a snail is a shell)

Page 55: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

How common is each?

Look at a parallel corpus (German text along with English translation)

Page 56: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Estimate Translation Probabilities

Maximum likelihood estimation

Page 57: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

▪ Goal: a model▪ where e and f are complete English and Foreign sentences

Lexical Translation

Page 58: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Lexical Translation

▪ Goal: a model ▪ where e and f are complete English and Foreign sentences▪ Lexical translation makes the following assumptions:▪ Each word in e

i in e is generated from exactly one word in f

▪ Thus, we have an alignment ai that indicates which word e

i “came

from”, specifically it came from fai

▪ Given the alignments a, translation decisions are conditionally independent of each other and depend only on the aligned source word f

ai

Page 59: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Lexical Translation

▪ Putting our assumptions together, we have:

Page 60: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

The Alignment Function

▪ Alignments can be visualized in by drawing links between two sentences, and they are represented as vectors of positions:

Page 61: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Reordering

▪ Words may be reordered during translation.

Page 62: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Word Dropping

▪ A source word may not be translated at all

Page 63: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Word Insertion

▪ Words may be inserted during translation ▪ English just does not have an equivalent▪ But it must be explained - we typically assume every source

sentence contains a NULL token

Page 64: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

One-to-many Translation

▪ A source word may translate into more than one target word

Page 65: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Many-to-one Translation

▪ More than one source word may not translate as a unit in lexical translation

Page 66: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1

▪ Generative model: break up translation process into smaller steps

▪ Simplest possible lexical translation model▪ Additional assumptions▪ All alignment decisions are independent▪ The alignment distribution for each a

i is uniform over all source

words and NULL

Page 67: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1

▪ Translation probability▪ for a foreign sentence f = (f

1, ..., f

lf ) of length l

f

▪ to an English sentence e = (e1, ..., e

le ) of length l

e

▪ with an alignment of each English word ej to a foreign word f

i

according to the alignment function a : j → i

▪ parameter ϵ is a normalization constant

Page 68: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Example

Page 69: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Learning Lexical Translation Models

We would like to estimate the lexical translation probabilities t(e|f) from a parallel corpus

▪ ... but we do not have the alignments

▪ Chicken and egg problem▪ if we had the alignments,

→ we could estimate the parameters of our generative model (MLE)

▪ if we had the parameters,

→ we could estimate the alignments

Page 70: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

EM Algorithm

▪ Incomplete data▪ if we had complete data, we could estimate the model▪ if we had the model, we could fill in the gaps in the data

▪ Expectation Maximization (EM) in a nutshell

1. initialize model parameters (e.g. uniform, random)

2. assign probabilities to the missing data

3. estimate model parameters from completed data

4. iterate steps 2–3 until convergence

Page 71: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

EM Algorithm

▪ Initial step: all alignments equally likely▪ Model learns that, e.g., la is often aligned with the

Page 72: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

EM Algorithm

▪ After one iteration▪ Alignments, e.g., between la and the are more likely

Page 73: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

EM Algorithm

▪ After another iteration▪ It becomes apparent that alignments, e.g., between fleur and

flower are more likely (pigeon hole principle)

Page 74: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

EM Algorithm

▪ Convergence▪ Inherent hidden structure revealed by EM

Page 75: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

EM Algorithm

▪ Parameter estimation from the aligned corpus

Page 76: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM

EM Algorithm consists of two steps

▪ Expectation-Step: Apply model to the data▪ parts of the model are hidden (here: alignments)▪ using the model, assign probabilities to possible values

▪ Maximization-Step: Estimate model from data▪ take assigned values as fact▪ collect counts (weighted by lexical translation probabilities)▪ estimate model from counts

▪ Iterate these steps until convergence

Page 77: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM

▪ We need to be able to compute:▪ Expectation-Step: probability of alignments▪ Maximization-Step: count collection

Page 78: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM

t-table

Page 79: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM

t-table

Page 80: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM

t-table

Page 81: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM

Applying the chain rule:

t-table

Page 82: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM: Expectation Step

Page 83: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM: Expectation Step

Page 84: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

The Trick

Page 85: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM: Expectation Step

Page 86: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM: Expectation Step

E-step

t-table

Page 87: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM: Maximization Step

Page 88: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM: Maximization Step

E-step

M-step

t-table

Page 89: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM: Maximization Step

Page 90: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM: Maximization Step

Update t-table:

p(the|la) = c(the|la)/c(la)

E-step

M-step

t-table

Page 91: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1 and EM: Pseudocode

Page 92: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Convergence

Page 93: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1

▪ Generative model: break up translation process into smaller steps

▪ Simplest possible lexical translation model▪ Additional assumptions▪ All alignment decisions are independent▪ The alignment distribution for each a

i is uniform over all source

words and NULL

Page 94: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

IBM Model 1

▪ Translation probability▪ for a foreign sentence f = (f

1, ..., f

lf ) of length l

f

▪ to an English sentence e = (e1, ..., e

le ) of length l

e

▪ with an alignment of each English word ej to a foreign word f

i

according to the alignment function a : j → i

▪ parameter ϵ is a normalization constant

Page 95: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Example

Page 96: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Evaluating Alignment Models

▪ How do we measure quality of a word-to-word model?

▪ Method 1: use in an end-to-end translation system▪ Hard to measure translation quality▪ Option: human judges▪ Option: reference translations (NIST, BLEU)▪ Option: combinations (HTER)▪ Actually, no one uses word-to-word models alone as TMs

▪ Method 2: measure quality of the alignments produced▪ Easy to measure▪ Hard to know what the gold alignments should be▪ Often does not correlate well with translation quality (like perplexity in LMs)

Page 97: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Alignment Error Rate

Page 98: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Alignment Error Rate

Page 99: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Alignment Error Rate

Page 100: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Alignment Error Rate

Page 101: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Alignment Error Rate

Page 102: Algorithms for NLPdemo.clab.cs.cmu.edu/algo4nlp19/slides/FA19_IITP_lecture... · 2019. 11. 19. · Phrase-Based MT e f source phrase target phrase translation features Translation

Problems with Lexical Translation

▪ Complexity -- exponential in sentence length▪ Weak reordering -- the output is not fluent▪ Many local decisions -- error propagation