Machine Translation and Sequence-to-sequence Models


Transcript of Machine Translation and Sequence-to-sequence Models

Page 1: Machine Translation and Sequence-to-sequence Models


Machine Translation and Sequence-to-sequence Models

Graham Neubig

Carnegie Mellon University, CS 11-731

http://phontron.com/class/mtandseq2seq2017/

Page 2: Machine Translation and Sequence-to-sequence Models


What is Machine Translation?

kare wa ringo wo tabeta .

He ate an apple .

Page 3: Machine Translation and Sequence-to-sequence Models


What are Sequence-to-sequence Models?

● Machine translation: kare wa ringo wo tabeta → he ate an apple
● Tagging: he ate an apple → PRN VBD DET PP
● Dialog: he ate an apple → good, he needs to slim down
● Speech recognition: (speech) → he ate an apple
● And just about anything...: 1010000111101 → 00011010001101

Page 4: Machine Translation and Sequence-to-sequence Models


Why MT as a Representative?

● Useful! Global MT market expected to reach $983.3 million by 2022 (source: Grand View Research)
● Imperfect... (source: The Register)

Page 5: Machine Translation and Sequence-to-sequence Models


MT and Machine Learning

● Big Data! Billions of words for major languages … but little for others
● Well-defined, Difficult Problem! Use for algorithms, math, etc.
● Algorithms Widely Applicable!

Page 6: Machine Translation and Sequence-to-sequence Models


MT and Linguistics

Source: 트레이나 베이커는 좋은 사람이니까요
MT output: Baker yinikkayo tray or a good man
Correct: Trina Baker is a good person

● Morphology! 이니까요 is a variant of 이다 (to be)
● Syntax! Should keep the subject together
● Semantics! "Trina" is probably not a man...
● ... and so much more!

Page 7: Machine Translation and Sequence-to-sequence Models


Class Organization

Page 8: Machine Translation and Sequence-to-sequence Models


Class Format

● Before class:
  ● Read the assigned material
  ● Ask questions via web (Piazza/email)
● In class:
  ● Take a small quiz about the material
  ● Discussion/questions
  ● Pseudo-code walk
  ● Programming (TAs/instructor will supervise)

Page 9: Machine Translation and Sequence-to-sequence Models


Assignments

● Assignment 1: Create a neural sequence-to-sequence modeling system. Turn in code to run it, and a short 1-2 page report.
● Assignment 2: Create a symbolic sequence-to-sequence modeling system. Similarly turn in code/report.
● Final project: Come up with an interesting new idea and test it.

Page 10: Machine Translation and Sequence-to-sequence Models


Assignment Instructions

● Bring your computer to every class and make a GitHub account.
● We recommend you implement in the following libraries:
  ● DyNet: for neural networks (C++ or Python)
  ● OpenFST: for transducers, if you use them (C++)
  ● pyfst: for transducers in Python
● It is OK to work in small groups of up to 3, particularly for the final project. If you do so, please use a shared git repository and commit the code that you write, and in reports note who did what part of the project.

Page 11: Machine Translation and Sequence-to-sequence Models


Class Grading

● Short quizzes: 20%

● Assignment 1: 20%

● Assignment 2: 20%

● Final Project: 40%

Page 12: Machine Translation and Sequence-to-sequence Models


Class Plan

1. Introduction (Today): 1 class
2. Language Models: 4 classes
3. Neural MT: 2 classes
4. Evaluation/Data: 2 classes
5. Symbolic MT: 4 classes
6. Algorithms: 2 classes
7. Advanced Topics: 11 classes
8. Final Project Discussion: 2 classes

Due dates marked on the schedule: Assignment #1, Assignment #2, final project.

Page 13: Machine Translation and Sequence-to-sequence Models


Guest Lectures

● Bob Frederking: Knowledge-based Translation
● LP Morency: Something Multi-modal (date TBD)

Page 14: Machine Translation and Sequence-to-sequence Models


Statistical Machine Translation

Page 15: Machine Translation and Sequence-to-sequence Models


Statistical Machine Translation

F = kare wa ringo wo tabeta .
E = He ate an apple .

Probability model: P(E|F;Θ), where Θ are the parameters

Page 16: Machine Translation and Sequence-to-sequence Models


Problems in SMT

● Modeling: How do we define P(E|F;Θ)?
● Learning: How do we learn Θ?
● Search: Given F, how do we find the highest scoring translation? E' = argmax_E P(E|F;Θ)
● Evaluation: Given E' and a human reference E, how do we determine how good E' is?
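As a toy illustration of the search problem, the sketch below picks E' = argmax_E P(E|F;Θ) from a hand-made candidate list; the candidates and probabilities are invented stand-ins for a real model, which could never enumerate all translations.

```python
# Toy illustration of E' = argmax_E P(E|F; Theta) over a made-up candidate set.
def search(F, candidates, prob):
    """Return the candidate translation E with the highest P(E|F)."""
    return max(candidates, key=lambda E: prob(E, F))

F = "kare wa ringo wo tabeta ."
candidates = ["He ate an apple .", "An apple ate he .", "He ate a pen ."]
toy_probs = {"He ate an apple .": 0.80, "An apple ate he .": 0.05, "He ate a pen .": 0.15}
prob = lambda E, F: toy_probs[E]    # stand-in for P(E|F; Theta)

print(search(F, candidates, prob))  # -> He ate an apple .
```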

Page 17: Machine Translation and Sequence-to-sequence Models


Part 1: Neural Models

Page 18: Machine Translation and Sequence-to-sequence Models


Language Models 1: n-gram Language Models

Given multiple candidates, which is most likely as an English sentence?

E1 = he ate an apple
E2 = he ate an apples
E3 = he insulted an apple
E4 = preliminary orange orange

● Definition of language modeling
● Count-based n-gram language models
● Evaluating language models
● Implement: n-gram language model
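A minimal sketch of a count-based bigram LM with add-alpha smoothing, trained on a made-up three-sentence corpus and scoring the candidates above (not the assignment's reference implementation):

```python
# Count-based bigram LM with add-alpha smoothing on a toy corpus.
from collections import defaultdict
import math

train = ["he ate an apple", "he ate an orange", "she ate an apple"]
vocab = {w for s in train for w in s.split()} | {"</s>"}

bigram, unigram = defaultdict(int), defaultdict(int)
for s in train:
    words = ["<s>"] + s.split() + ["</s>"]
    for prev, w in zip(words, words[1:]):
        bigram[(prev, w)] += 1
        unigram[prev] += 1

def log_prob(sentence, alpha=1.0):
    """Sum of log P(w_i | w_{i-1}) with add-alpha smoothing."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log((bigram[(prev, w)] + alpha) / (unigram[prev] + alpha * len(vocab)))
               for prev, w in zip(words, words[1:]))

for E in ["he ate an apple", "he ate an apples", "he insulted an apple", "preliminary orange orange"]:
    print(f"{log_prob(E):8.3f}  {E}")   # the first candidate should score highest
```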

Page 19: Machine Translation and Sequence-to-sequence Models


Language Models 2: Log-linear Language Models

● Log-linear language models
● Stochastic gradient descent
● Features for language modeling
● Implement: Log-linear language model

Example (scoring the next word after "giving a"): s = w_{2,giving} + w_{1,a} + b

word   w_{2,giving}   w_{1,a}    b      s
a          3.0         -6.0     -0.2   -3.2
the        2.5         -5.1     -0.3   -2.9
talk      -0.2          0.2      1.0    1.0
gift       0.1          0.1      2.0    2.2
hat        1.2          0.6     -1.2    0.6
...        ...          ...      ...    ...
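The same scoring can be written in a few lines of NumPy: sum the weight vectors of the active context features plus the bias, then apply a softmax to turn the scores into probabilities (the numbers are taken from the example above).

```python
# Log-linear next-word scoring for the context "giving a".
import numpy as np

words = ["a", "the", "talk", "gift", "hat"]
w_prev2_giving = np.array([ 3.0,  2.5, -0.2, 0.1,  1.2])  # feature: word two back is "giving"
w_prev1_a      = np.array([-6.0, -5.1,  0.2, 0.1,  0.6])  # feature: previous word is "a"
b              = np.array([-0.2, -0.3,  1.0, 2.0, -1.2])  # bias

s = w_prev2_giving + w_prev1_a + b      # scores: [-3.2, -2.9, 1.0, 2.2, 0.6]
p = np.exp(s - s.max())
p /= p.sum()                            # softmax over the candidate words
for word, score, prob in zip(words, s, p):
    print(f"{word:5s} s = {score:5.1f}  P = {prob:.3f}")   # "gift" comes out on top
```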

Page 20: Machine Translation and Sequence-to-sequence Models


Language Models 3: Neural Networks and Feed-forward LMs

● Neural networks and back-propagation
● Feed-forward neural language models
● Mini-batch training
● Implement: Feed-forward LM

[Figure: a feed-forward LM predicting each word of "<s> <s> this is a pen </s>"]
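A minimal NumPy sketch of one feed-forward LM prediction step: embed the two previous words, apply a tanh hidden layer, and softmax over the vocabulary. Sizes and the untrained random weights are purely illustrative.

```python
# One forward pass of a toy feed-forward LM in NumPy.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<s>", "this", "is", "a", "pen", "</s>"]
V, D, H = len(vocab), 8, 16

E  = rng.normal(scale=0.1, size=(V, D))      # word embeddings
W1 = rng.normal(scale=0.1, size=(H, 2 * D))  # hidden layer over two context words
b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(V, H))      # output layer
b2 = np.zeros(V)

def next_word_probs(prev2, prev1):
    """P(w | prev2, prev1) for every word w in the vocabulary."""
    x = np.concatenate([E[vocab.index(prev2)], E[vocab.index(prev1)]])
    h = np.tanh(W1 @ x + b1)
    s = W2 @ h + b2
    p = np.exp(s - s.max())
    return p / p.sum()

# Distribution over the first word of the sentence, given the context "<s> <s>".
print(dict(zip(vocab, next_word_probs("<s>", "<s>").round(3))))
```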

Page 21: Machine Translation and Sequence-to-sequence Models


Language Models 4: Recurrent LMs

● Recurrent neural networks
● Vanishing gradient and LSTMs/GRUs
● Regularization and dropout
● Implement: Recurrent neural network LM

[Figure: a recurrent LM predicting each word of "<s> <s> this is a pen </s>"]

Page 22: Machine Translation and Sequence-to-sequence Models


Neural MT 1: Encoder-decoder Models

[Figure: an encoder-decoder model reading "this is a pen </s>" and generating "kore wa pen desu </s>"]

● Encoder-decoder models
● Searching for hypotheses
● Mini-batched training
● Implement: Encoder-decoder model
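A sketch of greedy decoding with an encoder-decoder model: `encode` and `decode_step` are hypothetical stand-ins for a trained model (e.g. one built in DyNet); the point is the loop that feeds each predicted word back in until `</s>` is produced.

```python
# Greedy decoding loop around hypothetical encode/decode_step functions.
def greedy_decode(src_words, encode, decode_step, max_len=100):
    """Generate a translation one word at a time, always taking the best word."""
    state = encode(src_words)                     # encode the whole source sentence
    output, prev = [], "<s>"
    for _ in range(max_len):
        state, probs = decode_step(state, prev)   # probs: {word: P(word | ...)}
        prev = max(probs, key=probs.get)          # greedy choice
        if prev == "</s>":
            break
        output.append(prev)
    return output

# Toy "model" that ignores its input and emits a canned sentence, just to run the loop.
canned = ["kore", "wa", "pen", "desu", "</s>"]
encode = lambda src: 0
decode_step = lambda state, prev: (state + 1, {canned[min(state, len(canned) - 1)]: 1.0})
print(greedy_decode("this is a pen".split(), encode, decode_step))  # ['kore', 'wa', 'pen', 'desu']
```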

Page 23: Machine Translation and Sequence-to-sequence Models


Neural MT 2: Attentional Models

● Attention in its various varieties
● Unknown word replacement
● Attention improvements, coverage models
● Implement: Attentional model

[Figure: an attentional model: source encodings g_1,...,g_4 of "kouen wo okonaimasu </s>" are weighted by attention scores a_1,...,a_4 and combined with the decoder state h_{i-1} to compute h_i and P(e_i | F, e_1, ..., e_{i-1})]
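A sketch of one attentional step using dot-product attention (one common variant, not necessarily the one in the figure): weights over the source encodings are computed from the decoder state, and their weighted sum becomes the context vector used to predict e_i. Random vectors stand in for a trained encoder/decoder.

```python
# One attention computation over toy source encodings.
import numpy as np

rng = np.random.default_rng(0)
src_words = ["kouen", "wo", "okonaimasu", "</s>"]
H = 8
G = rng.normal(size=(len(src_words), H))   # source encodings g_1..g_4
h_prev = rng.normal(size=H)                # decoder state h_{i-1}

scores = G @ h_prev                        # score(h_{i-1}, g_j) = g_j . h_{i-1}
a = np.exp(scores - scores.max())
a /= a.sum()                               # attention weights a_1..a_4
context = a @ G                            # weighted sum of the source encodings

print({w: round(float(w_a), 3) for w, w_a in zip(src_words, a)})
# `context` and h_{i-1} would then feed the prediction of P(e_i | F, e_1..e_{i-1}).
```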

Page 24: Machine Translation and Sequence-to-sequence Models


Data and Evaluation

Page 25: Machine Translation and Sequence-to-sequence Models


Creating Data

● Preprocessing
● Document harvesting and crowdsourcing
● Other tasks: dialog, captioning
● Implement: Find/preprocess data

Page 26: Machine Translation and Sequence-to-sequence Models


Evaluation

Source: taro ga hanako wo otozureta

Candidate                         Adequate?   Fluent?   Better than?
A: Taro visited Hanako            ○           ○         B, C
B: the Taro visited the Hanako    ○           ☓         C
C: Hanako visited Taro            ☓           ○

● Human evaluation
● Automatic evaluation
● Significance tests and meta-evaluation
● Implement: BLEU and measure correlation
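A compact sketch of corpus-level BLEU with a single reference (geometric mean of clipped n-gram precisions times a brevity penalty); real evaluation should use a standard implementation such as sacreBLEU, this is just the definition written out.

```python
# Corpus-level BLEU (single reference, up to 4-grams).
import math
from collections import Counter

def ngrams(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(hyps, refs, max_n=4):
    match, total = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ng, r_ng = ngrams(h, n), ngrams(r, n)
            match[n - 1] += sum(min(c, r_ng[g]) for g, c in h_ng.items())  # clipped counts
            total[n - 1] += max(len(h) - n + 1, 0)
    if min(match) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return bp * math.exp(log_prec)

print(bleu(["taro visited hanako ."], ["taro visited hanako ."]))                   # 1.0
print(bleu(["the taro visited the hanako ."], ["taro visited hanako ."], max_n=2))  # ~0.52 (bigram BLEU)
```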

Page 27: Machine Translation and Sequence-to-sequence Models


Symbolic Translation Models

Page 28: Machine Translation and Sequence-to-sequence Models


Symbolic Methods 1: IBM Models

● The IBM/HMM models
● The EM algorithm
● Finding word alignments
● Implement: Word alignment

[Figure: word alignment between "太郎 が 花子 を 訪問 した 。" and "taro visited hanako ."]
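A compact sketch of IBM Model 1 EM training on a made-up toy corpus (no NULL word or other refinements): the E-step collects expected alignment counts under the current t(e|f), and the M-step renormalizes them.

```python
# IBM Model 1 EM on a toy parallel corpus.
from collections import defaultdict

corpus = [("he ate an apple",  "kare wa ringo wo tabeta"),
          ("he ate an orange", "kare wa orenji wo tabeta"),
          ("she ate an apple", "kanojo wa ringo wo tabeta")]
pairs = [(e.split(), f.split()) for e, f in corpus]
e_vocab = {w for e_words, _ in pairs for w in e_words}

t = defaultdict(float)                    # t(e|f), initialized uniformly
for e_words, f_words in pairs:
    for f in f_words:
        for e in e_words:
            t[(e, f)] = 1.0 / len(e_vocab)

for _ in range(20):                       # EM iterations
    count, total = defaultdict(float), defaultdict(float)
    for e_words, f_words in pairs:
        for e in e_words:
            z = sum(t[(e, f)] for f in f_words)
            for f in f_words:
                c = t[(e, f)] / z         # E-step: P(e aligned to f | sentence)
                count[(e, f)] += c
                total[f] += c
    for e, f in count:                    # M-step: t(e|f) = c(e,f) / c(f)
        t[(e, f)] = count[(e, f)] / total[f]

# Content words should come out right; function words stay ambiguous in Model 1.
for f in ["kare", "kanojo", "ringo", "orenji"]:
    best = max(e_vocab, key=lambda e: t[(e, f)])
    print(f"{f:7s} -> {best:7s} (t = {t[(best, f)]:.2f})")
```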

Page 29: Machine Translation and Sequence-to-sequence Models


Symbolic Methods 2: Monotonic Symbolic Models

● Models for sequence transduction
● The Viterbi algorithm
● Weighted finite-state transducers
● Implement: A part-of-speech tagger

he ate an apple → PRN VBD DET PP
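A sketch of the Viterbi algorithm for a toy HMM tagger with hand-set, purely illustrative probabilities (a real tagger would estimate them from a treebank; the toy tagset uses NN where the slide writes PP).

```python
# Viterbi decoding for a toy HMM POS tagger.
import math

tags = ["PRN", "VBD", "DET", "NN"]
trans = {("<s>", "PRN"): 0.8, ("<s>", "DET"): 0.2,
         ("PRN", "VBD"): 0.9, ("PRN", "NN"): 0.1,
         ("VBD", "DET"): 0.7, ("VBD", "NN"): 0.3,
         ("DET", "NN"): 0.9, ("DET", "PRN"): 0.1,
         ("NN", "VBD"): 0.5, ("NN", "NN"): 0.5}
emit = {("PRN", "he"): 1.0,
        ("VBD", "ate"): 0.8, ("VBD", "insulted"): 0.2,
        ("DET", "an"): 1.0,
        ("NN", "apple"): 0.6, ("NN", "pen"): 0.4}

def viterbi(words):
    # best[i][tag] = (log prob of the best path ending in tag at position i, previous tag)
    best = [{"<s>": (0.0, None)}]
    for i, w in enumerate(words):
        best.append({})
        for tag in tags:
            e = emit.get((tag, w), 1e-10)
            best[i + 1][tag] = max(
                (lp + math.log(trans.get((prev, tag), 1e-10)) + math.log(e), prev)
                for prev, (lp, _) in best[i].items())
    tag = max(best[-1], key=lambda tag: best[-1][tag][0])   # best final tag
    path = [tag]
    for i in range(len(words), 1, -1):                      # follow back-pointers
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi("he ate an apple".split()))   # ['PRN', 'VBD', 'DET', 'NN']
```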

Page 30: Machine Translation and Sequence-to-sequence Models


Symbolic Methods 3: Phrase-based MT

F = watashi wa CMU de kouen wo okonaimasu .
E = I will give a talk at CMU .

[Figure: the sentence pair segmented into phrase pairs (watashi wa ↔ I, kouen ↔ a talk, wo okonaimasu ↔ will give, CMU de ↔ at CMU, . ↔ .), which are reordered to produce the translation]

● Phrase extraction and scoring
● Reordering models
● Phrase-based decoding
● Implement: Phrase extraction or decoding
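A sketch of consistent phrase pair extraction from a word alignment. The alignment links below are a plausible hand alignment for the example above, and the usual extension to unaligned boundary words is left out (so e.g. "kouen" pairs with "talk" rather than "a talk").

```python
# Consistent phrase pair extraction from a hand alignment.
F = "watashi wa CMU de kouen wo okonaimasu .".split()
E = "I will give a talk at CMU .".split()
align = {(0, 0), (2, 6), (3, 5), (4, 4), (6, 2), (7, 7)}   # (source index, target index)

def extract_phrases(F, E, align, max_len=4):
    phrases = []
    for f1 in range(len(F)):
        for f2 in range(f1, min(f1 + max_len, len(F))):
            e_points = [e for (f, e) in align if f1 <= f <= f2]
            if not e_points:
                continue
            e1, e2 = min(e_points), max(e_points)
            if e2 - e1 + 1 > max_len:
                continue
            # Consistency: no alignment link may connect the target span to a
            # source word outside the chosen source span.
            if any(e1 <= e <= e2 and not f1 <= f <= f2 for (f, e) in align):
                continue
            phrases.append((" ".join(F[f1:f2 + 1]), " ".join(E[e1:e2 + 1])))
    return phrases

for f_phrase, e_phrase in extract_phrases(F, E, align):
    print(f"{f_phrase:20s} <-> {e_phrase}")
```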

Page 31: Machine Translation and Sequence-to-sequence Models


Symbolic Methods 4: Tree-based MT

● Graphs and hyper-graphs
● Synchronous context free grammars
● Tree substitution grammars
● Implement: Search over hyper-graphs

[Figure: a hyper-graph over "CMU de kouen wo okonaimasu": span nodes such as PP0-1 and VP2-5 are combined bottom-up with synchronous rules (e.g. x1 x2 / x2 at x1) to yield "give a talk at CMU"]

Page 32: Machine Translation and Sequence-to-sequence Models


Algorithms

Page 33: Machine Translation and Sequence-to-sequence Models


Algorithms 1: Search

● Beam search and cube pruning
● Hypothesis recombination
● Future costs, A* search
● Implement: Beam search

[Figure: a search graph of partial hypotheses, each recording the covered words (●/○) and the last word produced (e.g. "1, yes", "2, is", "3, or"); X marks pruned paths]
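A generic beam-search sketch: keep only the `beam_size` best partial hypotheses at each step. `expand` is a hypothetical stand-in for a model that proposes and scores possible next words given a partial hypothesis.

```python
# Generic beam search over a toy next-word table.
import math

def beam_search(expand, beam_size=3, max_len=10):
    beam = [(0.0, ["<s>"])]                    # (log probability, words so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for lp, words in beam:
            for word, p in expand(words):      # P(next word | words)
                hyp = (lp + math.log(p), words + [word])
                (finished if word == "</s>" else candidates).append(hyp)
        if not candidates:
            break
        beam = sorted(candidates, reverse=True)[:beam_size]   # prune to the beam
    return max(finished)[1] if finished else max(beam)[1]

# Toy bigram table standing in for a real model's predictions.
table = {"<s>": [("yes", 0.6), ("no", 0.4)],
         "yes": [("it", 0.7), ("</s>", 0.3)],
         "no":  [("it", 0.9), ("</s>", 0.1)],
         "it":  [("is", 0.8), ("</s>", 0.2)],
         "is":  [("</s>", 1.0)]}
expand = lambda words: table[words[-1]]
print(beam_search(expand))   # ['<s>', 'yes', 'it', 'is', '</s>']
```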

Page 34: Machine Translation and Sequence-to-sequence Models


Algorithms 2: Parameter Optimization

● Loss functions
● Deciding the hypothesis space
● Optimization criteria
● Implement: Optimization of NMT or PBMT

Example: total score = 0.2*LM + 0.3*TM + 0.5*RM

Candidate                        LM   TM   RM   Total
○ Taro visited Hanako            -4   -3   -1   -2.2  (highest)
☓ the Taro visited the Hanako    -5   -4   -1   -2.7
☓ Hanako visited Taro            -2   -3   -2   -2.3
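The arithmetic behind the table, as a few lines of Python: each hypothesis's total is a weighted sum of its feature scores, and tuning means choosing weights so that the correct translation comes out on top.

```python
# Weighted linear combination of feature scores, reproducing the table above.
weights = {"LM": 0.2, "TM": 0.3, "RM": 0.5}
hyps = [("Taro visited Hanako",         {"LM": -4, "TM": -3, "RM": -1}),
        ("the Taro visited the Hanako", {"LM": -5, "TM": -4, "RM": -1}),
        ("Hanako visited Taro",         {"LM": -2, "TM": -3, "RM": -2})]

for text, feats in hyps:
    total = sum(weights[k] * v for k, v in feats.items())
    print(f"{total:5.1f}  {text}")
# -2.2 for the correct hypothesis beats -2.7 and -2.3; a different weight
# setting could easily rank one of the wrong hypotheses first.
```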

Page 35: Machine Translation and Sequence-to-sequence Models


Advanced Topics

Page 36: Machine Translation and Sequence-to-sequence Models


Other Sequence-to-sequence Tasks

● Case studies about task-specific models
● Consistency constraints in tagging
● Diversity objectives in dialog
● Dynamic programming in speech
● Implement: Try models on other tasks

he ate an apple → PRN VBD DET PP
he ate an apple → good, he needs to slim down
(speech) → he ate an apple

Page 37: Machine Translation and Sequence-to-sequence Models


Ensembling/System Combination

● Ensembles and distillation
● Post-hoc hypothesis combination
● Reranking
● Implement: Ensembled decoding

[Figure: combining the predictions of Model 1 + Model 2]
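The simplest form of ensembled decoding averages the models' next-word distributions before choosing; the two "models" below are fixed toy distributions standing in for trained systems.

```python
# Averaging the next-word distributions of two toy models.
def ensemble(distributions):
    """Average a list of {word: probability} distributions."""
    words = set().union(*distributions)
    return {w: sum(d.get(w, 0.0) for d in distributions) / len(distributions) for w in words}

model1 = {"apple": 0.5, "apples": 0.4, "orange": 0.1}
model2 = {"apple": 0.4, "apples": 0.1, "pen": 0.5}
combined = ensemble([model1, model2])
print(max(combined, key=combined.get), combined)   # the ensemble prefers "apple" (0.45)
```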

Page 38: Machine Translation and Sequence-to-sequence Models


Hybrid Neural-symbolic Models

● Symbolic models with neural components
● Neural models with symbolic components
● Implement: Lexicons in NMT or neural feature functions

[Figure: lexicon/phrase entries such as watashi wa ↔ I, CMU de ↔ at CMU]

Page 39: Machine Translation and Sequence-to-sequence Models


Multi-lingual and Multi-task Learning

● Learning for multiple tasks
● Learning for multiple languages
● Implement: A multi-lingual neural system

[Figure: the same meaning in multiple languages: hello / こんにちは / hola]

Page 40: Machine Translation and Sequence-to-sequence Models


Subword Models

● Character models
● Subword models
● Morphology models
● Implement: Subword splitting

Example: reconstructed → re+ construct+ ed
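A sketch of the core loop of BPE-style subword learning: repeatedly merge the most frequent adjacent symbol pair in a word-frequency table (the tiny "corpus" counts here are made up for illustration).

```python
# Byte-pair-encoding style merges on a toy word-frequency table.
from collections import Counter

vocab = {tuple("reconstructed"): 2,
         tuple("construct"):     5,
         tuple("constructed"):   4,
         tuple("redo"):          3}

def most_frequent_pair(vocab):
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge(vocab, pair):
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

for _ in range(8):                                 # learn 8 merge operations
    vocab = merge(vocab, most_frequent_pair(vocab))
print(list(vocab))   # the shared stem "construct" has become a single symbol
```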

Page 41: Machine Translation and Sequence-to-sequence Models


Document Level Models

● Document level modeling
● Document level evaluation
● Stream decoding
● Implement: Document level measures

Page 42: Machine Translation and Sequence-to-sequence Models


For Next Class

Page 43: Machine Translation and Sequence-to-sequence Models


Homework

● Read n-gram language modeling materials

● Get software working on your machine (doing it all at once may be more efficient):
  ● By Thursday 1/19: Python
  ● By Tuesday 1/24: NumPy
  ● By Thursday 1/26: DyNet