Machine Translation and Sequence-to-sequence Models


Transcript of Machine Translation and Sequence-to-sequence Models

Page 1: Machine Translation and Sequence-to-sequence Models


Machine Translation and Sequence-to-sequence Models

Graham Neubig

Carnegie Mellon University, CS 11-731

http://phontron.com/class/mtandseq2seq2017/

Page 2: Machine Translation and Sequence-to-sequence Models


What is Machine Translation?

kare wa ringo wo tabeta .

He ate an apple .

Page 3: Machine Translation and Sequence-to-sequence Models


What are Sequence-to-sequence Models?

● Machine translation: kare wa ringo wo tabeta → he ate an apple
● Tagging: he ate an apple → PRN VBD DET PP
● Dialog: he ate an apple → good, he needs to slim down
● Speech recognition: (speech) → he ate an apple
● And just about anything...: 1010000111101 → 00011010001101

Page 4: Machine Translation and Sequence-to-sequence Models


Why MT as a Representative?

● Useful! Global MT market expected to reach $983.3 million by 2022 (source: Grand View Research)
● Imperfect... (source: The Register)

Page 5: Machine Translation and Sequence-to-sequence Models


MT and Machine Learning

● Big Data! Billions of words for major languages … but little for others
● Well-defined, Difficult Problem! Use for algorithms, math, etc.
● Algorithms Widely Applicable!

Page 6: Machine Translation and Sequence-to-sequence Models


MT and Linguistics

Source: 트레이나 베이커는 좋은 사람이니까요
MT output: Baker yinikkayo tray or a good man
Correct: Trina Baker is a good person

● Morphology! 이니까요 is a variant of 이다 (to be)
● Syntax! Should keep the subject together
● Semantics! "Trina" is probably not a man...
● ... and so much more!

Page 7: Machine Translation and Sequence-to-sequence Models


Class Organization

Page 8: Machine Translation and Sequence-to-sequence Models


Class Format

● Before class:
  ● Read the assigned material
  ● Ask questions via web (Piazza/email)
● In class:
  ● Take a small quiz about the material
  ● Discussion/questions
  ● Pseudo-code walk
  ● Programming (TAs/instructor will supervise)

Page 9: Machine Translation and Sequence-to-sequence Models


Assignments

● Assignment 1: Create a neural sequence-to-sequence modeling system. Turn in code to run it, and a short 1-2 page report.
● Assignment 2: Create a symbolic sequence-to-sequence modeling system. Similarly turn in code/report.
● Final project: Come up with an interesting new idea and test it.

Page 10: Machine Translation and Sequence-to-sequence Models


Assignment Instructions

● Bring your computer to every class and make a GitHub account.
● We recommend you implement in the following libraries:
  ● DyNet: for neural networks (C++ or Python)
  ● OpenFST: for transducers, if you use them (C++)
  ● pyfst: for transducers in Python
● It is OK to work in small groups of up to 3, particularly for the final project. If you do so, please use a shared git repository and commit the code that you write, and in reports note who did what part of the project.

Page 11: Machine Translation and Sequence-to-sequence Models


Class Grading

● Short quizzes: 20%

● Assignment 1: 20%

● Assignment 2: 20%

● Final Project: 40%

Page 12: Machine Translation and Sequence-to-sequence Models


Class Plan

1. Introduction (Today): 1 class
2. Language Models: 4 classes
3. Neural MT: 2 classes
4. Evaluation/Data: 2 classes
5. Symbolic MT: 4 classes
6. Algorithms: 2 classes
7. Advanced Topics: 11 classes
8. Final Project Discussion: 2 classes

Due dates marked on the schedule: Assignment #1, Assignment #2, final project.

Page 13: Machine Translation and Sequence-to-sequence Models


Guest Lectures

● Bob Frederking: Knowledge-based Translation
● LP Morency: Something Multi-modal (date TBD)

Page 14: Machine Translation and Sequence-to-sequence Models


Statistical Machine Translation

Page 15: Machine Translation and Sequence-to-sequence Models


Statistical Machine Translation

F = kare wa ringo wo tabeta .
E = He ate an apple .

Probability model: P(E|F;Θ), where Θ are the parameters

Page 16: Machine Translation and Sequence-to-sequence Models


Problems in SMT

● Modeling: How do we define P(E|F;Θ)?
● Learning: How do we learn Θ?
● Search: Given F, how do we find the highest scoring translation? E' = argmax_E P(E|F;Θ)
● Evaluation: Given E' and a human reference E, how do we determine how good E' is?
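As a toy illustration of the search problem, the sketch below picks E' = argmax_E P(E|F;Θ) from a hand-made candidate list; the candidates and probabilities are invented stand-ins for a real model, which could never enumerate all translations.

```python
# Toy illustration of E' = argmax_E P(E|F; Theta) over a made-up candidate set.
def search(F, candidates, prob):
    """Return the candidate translation E with the highest P(E|F)."""
    return max(candidates, key=lambda E: prob(E, F))

F = "kare wa ringo wo tabeta ."
candidates = ["He ate an apple .", "An apple ate he .", "He ate a pen ."]
toy_probs = {"He ate an apple .": 0.80, "An apple ate he .": 0.05, "He ate a pen .": 0.15}
prob = lambda E, F: toy_probs[E]    # stand-in for P(E|F; Theta)

print(search(F, candidates, prob))  # -> He ate an apple .
```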

Page 17: Machine Translation and Sequence-to-sequence Models


Part 1: Neural Models

Page 18: Machine Translation and Sequence-to-sequence Models


Language Models 1: n-gram Language Models

Given multiple candidates, which is most likely as an English sentence?

E1 = he ate an apple
E2 = he ate an apples
E3 = he insulted an apple
E4 = preliminary orange orange

● Definition of language modeling
● Count-based n-gram language models
● Evaluating language models
● Implement: n-gram language model
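A minimal sketch of a count-based bigram LM with add-alpha smoothing, trained on a made-up three-sentence corpus and scoring the candidates above (not the assignment's reference implementation):

```python
# Count-based bigram LM with add-alpha smoothing on a toy corpus.
from collections import defaultdict
import math

train = ["he ate an apple", "he ate an orange", "she ate an apple"]
vocab = {w for s in train for w in s.split()} | {"</s>"}

bigram, unigram = defaultdict(int), defaultdict(int)
for s in train:
    words = ["<s>"] + s.split() + ["</s>"]
    for prev, w in zip(words, words[1:]):
        bigram[(prev, w)] += 1
        unigram[prev] += 1

def log_prob(sentence, alpha=1.0):
    """Sum of log P(w_i | w_{i-1}) with add-alpha smoothing."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log((bigram[(prev, w)] + alpha) / (unigram[prev] + alpha * len(vocab)))
               for prev, w in zip(words, words[1:]))

for E in ["he ate an apple", "he ate an apples", "he insulted an apple", "preliminary orange orange"]:
    print(f"{log_prob(E):8.3f}  {E}")   # the first candidate should score highest
```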

Page 19: Machine Translation and Sequence-to-sequence Models


Language Models 2: Log-linear Language Models

● Log-linear language models
● Stochastic gradient descent
● Features for language modeling
● Implement: Log-linear language model

Example (scoring the next word after "giving a"): s = w_{2,giving} + w_{1,a} + b

word   w_{2,giving}   w_{1,a}    b      s
a          3.0         -6.0     -0.2   -3.2
the        2.5         -5.1     -0.3   -2.9
talk      -0.2          0.2      1.0    1.0
gift       0.1          0.1      2.0    2.2
hat        1.2          0.6     -1.2    0.6
...        ...          ...      ...    ...
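The same scoring can be written in a few lines of NumPy: sum the weight vectors of the active context features plus the bias, then apply a softmax to turn the scores into probabilities (the numbers are taken from the example above).

```python
# Log-linear next-word scoring for the context "giving a".
import numpy as np

words = ["a", "the", "talk", "gift", "hat"]
w_prev2_giving = np.array([ 3.0,  2.5, -0.2, 0.1,  1.2])  # feature: word two back is "giving"
w_prev1_a      = np.array([-6.0, -5.1,  0.2, 0.1,  0.6])  # feature: previous word is "a"
b              = np.array([-0.2, -0.3,  1.0, 2.0, -1.2])  # bias

s = w_prev2_giving + w_prev1_a + b      # scores: [-3.2, -2.9, 1.0, 2.2, 0.6]
p = np.exp(s - s.max())
p /= p.sum()                            # softmax over the candidate words
for word, score, prob in zip(words, s, p):
    print(f"{word:5s} s = {score:5.1f}  P = {prob:.3f}")   # "gift" comes out on top
```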

Page 20: Machine Translation and Sequence-to-sequence Models


Language Models 3: Neural Networks and Feed-forward LMs

● Neural networks and back-propagation
● Feed-forward neural language models
● Mini-batch training
● Implement: Feed-forward LM

[Figure: a feed-forward LM predicting each word of "<s> <s> this is a pen </s>"]
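A minimal NumPy sketch of one feed-forward LM prediction step: embed the two previous words, apply a tanh hidden layer, and softmax over the vocabulary. Sizes and the untrained random weights are purely illustrative.

```python
# One forward pass of a toy feed-forward LM in NumPy.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<s>", "this", "is", "a", "pen", "</s>"]
V, D, H = len(vocab), 8, 16

E  = rng.normal(scale=0.1, size=(V, D))      # word embeddings
W1 = rng.normal(scale=0.1, size=(H, 2 * D))  # hidden layer over two context words
b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(V, H))      # output layer
b2 = np.zeros(V)

def next_word_probs(prev2, prev1):
    """P(w | prev2, prev1) for every word w in the vocabulary."""
    x = np.concatenate([E[vocab.index(prev2)], E[vocab.index(prev1)]])
    h = np.tanh(W1 @ x + b1)
    s = W2 @ h + b2
    p = np.exp(s - s.max())
    return p / p.sum()

# Distribution over the first word of the sentence, given the context "<s> <s>".
print(dict(zip(vocab, next_word_probs("<s>", "<s>").round(3))))
```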

Page 21: Machine Translation and Sequence-to-sequence Models


Language Models 4: Recurrent LMs

● Recurrent neural networks
● Vanishing gradient and LSTMs/GRUs
● Regularization and dropout
● Implement: Recurrent neural network LM

[Figure: a recurrent LM predicting each word of "<s> <s> this is a pen </s>"]

Page 22: Machine Translation and Sequence-to-sequence Models


Neural MT 1: Encoder-decoder Models

[Figure: an encoder-decoder model reading "this is a pen </s>" and generating "kore wa pen desu </s>"]

● Encoder-decoder models
● Searching for hypotheses
● Mini-batched training
● Implement: Encoder-decoder model
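A sketch of greedy decoding with an encoder-decoder model: `encode` and `decode_step` are hypothetical stand-ins for a trained model (e.g. one built in DyNet); the point is the loop that feeds each predicted word back in until `</s>` is produced.

```python
# Greedy decoding loop around hypothetical encode/decode_step functions.
def greedy_decode(src_words, encode, decode_step, max_len=100):
    """Generate a translation one word at a time, always taking the best word."""
    state = encode(src_words)                     # encode the whole source sentence
    output, prev = [], "<s>"
    for _ in range(max_len):
        state, probs = decode_step(state, prev)   # probs: {word: P(word | ...)}
        prev = max(probs, key=probs.get)          # greedy choice
        if prev == "</s>":
            break
        output.append(prev)
    return output

# Toy "model" that ignores its input and emits a canned sentence, just to run the loop.
canned = ["kore", "wa", "pen", "desu", "</s>"]
encode = lambda src: 0
decode_step = lambda state, prev: (state + 1, {canned[min(state, len(canned) - 1)]: 1.0})
print(greedy_decode("this is a pen".split(), encode, decode_step))  # ['kore', 'wa', 'pen', 'desu']
```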

Page 23: Machine Translation and Sequence-to-sequence Models


Neural MT 2: Attentional Models

● Attention in its various varieties
● Unknown word replacement
● Attention improvements, coverage models
● Implement: Attentional model

[Figure: an attentional model: source encodings g_1,...,g_4 of "kouen wo okonaimasu </s>" are weighted by attention scores a_1,...,a_4 and combined with the decoder state h_{i-1} to compute h_i and P(e_i | F, e_1, ..., e_{i-1})]
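A sketch of one attentional step using dot-product attention (one common variant, not necessarily the one in the figure): weights over the source encodings are computed from the decoder state, and their weighted sum becomes the context vector used to predict e_i. Random vectors stand in for a trained encoder/decoder.

```python
# One attention computation over toy source encodings.
import numpy as np

rng = np.random.default_rng(0)
src_words = ["kouen", "wo", "okonaimasu", "</s>"]
H = 8
G = rng.normal(size=(len(src_words), H))   # source encodings g_1..g_4
h_prev = rng.normal(size=H)                # decoder state h_{i-1}

scores = G @ h_prev                        # score(h_{i-1}, g_j) = g_j . h_{i-1}
a = np.exp(scores - scores.max())
a /= a.sum()                               # attention weights a_1..a_4
context = a @ G                            # weighted sum of the source encodings

print({w: round(float(w_a), 3) for w, w_a in zip(src_words, a)})
# `context` and h_{i-1} would then feed the prediction of P(e_i | F, e_1..e_{i-1}).
```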

Page 24: Machine Translation and Sequence-to-sequence Models


Data and Evaluation

Page 25: Machine Translation and Sequence-to-sequence Models


Creating Data

● Preprocessing
● Document harvesting and crowdsourcing
● Other tasks: dialog, captioning
● Implement: Find/preprocess data

Page 26: Machine Translation and Sequence-to-sequence Models


Evaluation

Source: taro ga hanako wo otozureta

Candidate                         Adequate?   Fluent?   Better than?
A: Taro visited Hanako            ○           ○         B, C
B: the Taro visited the Hanako    ○           ☓         C
C: Hanako visited Taro            ☓           ○

● Human evaluation
● Automatic evaluation
● Significance tests and meta-evaluation
● Implement: BLEU and measure correlation
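A compact sketch of corpus-level BLEU with a single reference (geometric mean of clipped n-gram precisions times a brevity penalty); real evaluation should use a standard implementation such as sacreBLEU, this is just the definition written out.

```python
# Corpus-level BLEU (single reference, up to 4-grams).
import math
from collections import Counter

def ngrams(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(hyps, refs, max_n=4):
    match, total = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ng, r_ng = ngrams(h, n), ngrams(r, n)
            match[n - 1] += sum(min(c, r_ng[g]) for g, c in h_ng.items())  # clipped counts
            total[n - 1] += max(len(h) - n + 1, 0)
    if min(match) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return bp * math.exp(log_prec)

print(bleu(["taro visited hanako ."], ["taro visited hanako ."]))                   # 1.0
print(bleu(["the taro visited the hanako ."], ["taro visited hanako ."], max_n=2))  # ~0.52 (bigram BLEU)
```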

Page 27: Machine Translation and Sequence-to-sequence Models


Symbolic Translation Models

Page 28: Machine Translation and Sequence-to-sequence Models


Symbolic Methods 1: IBM Models

● The IBM/HMM models
● The EM algorithm
● Finding word alignments
● Implement: Word alignment

[Figure: word alignment between "太郎 が 花子 を 訪問 した 。" and "taro visited hanako ."]
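A compact sketch of IBM Model 1 EM training on a made-up toy corpus (no NULL word or other refinements): the E-step collects expected alignment counts under the current t(e|f), and the M-step renormalizes them.

```python
# IBM Model 1 EM on a toy parallel corpus.
from collections import defaultdict

corpus = [("he ate an apple",  "kare wa ringo wo tabeta"),
          ("he ate an orange", "kare wa orenji wo tabeta"),
          ("she ate an apple", "kanojo wa ringo wo tabeta")]
pairs = [(e.split(), f.split()) for e, f in corpus]
e_vocab = {w for e_words, _ in pairs for w in e_words}

t = defaultdict(float)                    # t(e|f), initialized uniformly
for e_words, f_words in pairs:
    for f in f_words:
        for e in e_words:
            t[(e, f)] = 1.0 / len(e_vocab)

for _ in range(20):                       # EM iterations
    count, total = defaultdict(float), defaultdict(float)
    for e_words, f_words in pairs:
        for e in e_words:
            z = sum(t[(e, f)] for f in f_words)
            for f in f_words:
                c = t[(e, f)] / z         # E-step: P(e aligned to f | sentence)
                count[(e, f)] += c
                total[f] += c
    for e, f in count:                    # M-step: t(e|f) = c(e,f) / c(f)
        t[(e, f)] = count[(e, f)] / total[f]

# Content words should come out right; function words stay ambiguous in Model 1.
for f in ["kare", "kanojo", "ringo", "orenji"]:
    best = max(e_vocab, key=lambda e: t[(e, f)])
    print(f"{f:7s} -> {best:7s} (t = {t[(best, f)]:.2f})")
```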

Page 29: Machine Translation and Sequence-to-sequence Models


Symbolic Methods 2: Monotonic Symbolic Models

● Models for sequence transduction
● The Viterbi algorithm
● Weighted finite-state transducers
● Implement: A part-of-speech tagger

he ate an apple → PRN VBD DET PP
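A sketch of the Viterbi algorithm for a toy HMM tagger with hand-set, purely illustrative probabilities (a real tagger would estimate them from a treebank; the toy tagset uses NN where the slide writes PP).

```python
# Viterbi decoding for a toy HMM POS tagger.
import math

tags = ["PRN", "VBD", "DET", "NN"]
trans = {("<s>", "PRN"): 0.8, ("<s>", "DET"): 0.2,
         ("PRN", "VBD"): 0.9, ("PRN", "NN"): 0.1,
         ("VBD", "DET"): 0.7, ("VBD", "NN"): 0.3,
         ("DET", "NN"): 0.9, ("DET", "PRN"): 0.1,
         ("NN", "VBD"): 0.5, ("NN", "NN"): 0.5}
emit = {("PRN", "he"): 1.0,
        ("VBD", "ate"): 0.8, ("VBD", "insulted"): 0.2,
        ("DET", "an"): 1.0,
        ("NN", "apple"): 0.6, ("NN", "pen"): 0.4}

def viterbi(words):
    # best[i][tag] = (log prob of the best path ending in tag at position i, previous tag)
    best = [{"<s>": (0.0, None)}]
    for i, w in enumerate(words):
        best.append({})
        for tag in tags:
            e = emit.get((tag, w), 1e-10)
            best[i + 1][tag] = max(
                (lp + math.log(trans.get((prev, tag), 1e-10)) + math.log(e), prev)
                for prev, (lp, _) in best[i].items())
    tag = max(best[-1], key=lambda tag: best[-1][tag][0])   # best final tag
    path = [tag]
    for i in range(len(words), 1, -1):                      # follow back-pointers
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi("he ate an apple".split()))   # ['PRN', 'VBD', 'DET', 'NN']
```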

Page 30: Machine Translation and Sequence-to-sequence Models


Symbolic Methods 3: Phrase-based MT

F = watashi wa CMU de kouen wo okonaimasu .
E = I will give a talk at CMU .

[Figure: the sentence pair segmented into phrase pairs (watashi wa ↔ I, kouen ↔ a talk, wo okonaimasu ↔ will give, CMU de ↔ at CMU, . ↔ .), which are reordered to produce the translation]

● Phrase extraction and scoring
● Reordering models
● Phrase-based decoding
● Implement: Phrase extraction or decoding
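A sketch of consistent phrase pair extraction from a word alignment. The alignment links below are a plausible hand alignment for the example above, and the usual extension to unaligned boundary words is left out (so e.g. "kouen" pairs with "talk" rather than "a talk").

```python
# Consistent phrase pair extraction from a hand alignment.
F = "watashi wa CMU de kouen wo okonaimasu .".split()
E = "I will give a talk at CMU .".split()
align = {(0, 0), (2, 6), (3, 5), (4, 4), (6, 2), (7, 7)}   # (source index, target index)

def extract_phrases(F, E, align, max_len=4):
    phrases = []
    for f1 in range(len(F)):
        for f2 in range(f1, min(f1 + max_len, len(F))):
            e_points = [e for (f, e) in align if f1 <= f <= f2]
            if not e_points:
                continue
            e1, e2 = min(e_points), max(e_points)
            if e2 - e1 + 1 > max_len:
                continue
            # Consistency: no alignment link may connect the target span to a
            # source word outside the chosen source span.
            if any(e1 <= e <= e2 and not f1 <= f <= f2 for (f, e) in align):
                continue
            phrases.append((" ".join(F[f1:f2 + 1]), " ".join(E[e1:e2 + 1])))
    return phrases

for f_phrase, e_phrase in extract_phrases(F, E, align):
    print(f"{f_phrase:20s} <-> {e_phrase}")
```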

Page 31: Machine Translation and Sequence-to-sequence Models


Symbolic Methods 4: Tree-based MT

● Graphs and hyper-graphs
● Synchronous context free grammars
● Tree substitution grammars
● Implement: Search over hyper-graphs

[Figure: a hyper-graph over "CMU de kouen wo okonaimasu": span nodes such as PP0-1 and VP2-5 are combined bottom-up with synchronous rules (e.g. x1 x2 / x2 at x1) to yield "give a talk at CMU"]

Page 32: Machine Translation and Sequence-to-sequence Models


Algorithms

Page 33: Machine Translation and Sequence-to-sequence Models


Algorithms 1: Search

● Beam search and cube pruning
● Hypothesis recombination
● Future costs, A* search
● Implement: Beam search

[Figure: a search graph of partial hypotheses, each recording the covered words (●/○) and the last word produced (e.g. "1, yes", "2, is", "3, or"); X marks pruned paths]
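A generic beam-search sketch: keep only the `beam_size` best partial hypotheses at each step. `expand` is a hypothetical stand-in for a model that proposes and scores possible next words given a partial hypothesis.

```python
# Generic beam search over a toy next-word table.
import math

def beam_search(expand, beam_size=3, max_len=10):
    beam = [(0.0, ["<s>"])]                    # (log probability, words so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for lp, words in beam:
            for word, p in expand(words):      # P(next word | words)
                hyp = (lp + math.log(p), words + [word])
                (finished if word == "</s>" else candidates).append(hyp)
        if not candidates:
            break
        beam = sorted(candidates, reverse=True)[:beam_size]   # prune to the beam
    return max(finished)[1] if finished else max(beam)[1]

# Toy bigram table standing in for a real model's predictions.
table = {"<s>": [("yes", 0.6), ("no", 0.4)],
         "yes": [("it", 0.7), ("</s>", 0.3)],
         "no":  [("it", 0.9), ("</s>", 0.1)],
         "it":  [("is", 0.8), ("</s>", 0.2)],
         "is":  [("</s>", 1.0)]}
expand = lambda words: table[words[-1]]
print(beam_search(expand))   # ['<s>', 'yes', 'it', 'is', '</s>']
```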

Page 34: Machine Translation and Sequence-to-sequence Models


Algorithms 2: Parameter Optimization

● Loss functions
● Deciding the hypothesis space
● Optimization criteria
● Implement: Optimization of NMT or PBMT

Example: total score = 0.2*LM + 0.3*TM + 0.5*RM

Candidate                        LM   TM   RM   Total
○ Taro visited Hanako            -4   -3   -1   -2.2  (highest)
☓ the Taro visited the Hanako    -5   -4   -1   -2.7
☓ Hanako visited Taro            -2   -3   -2   -2.3
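The arithmetic behind the table, as a few lines of Python: each hypothesis's total is a weighted sum of its feature scores, and tuning means choosing weights so that the correct translation comes out on top.

```python
# Weighted linear combination of feature scores, reproducing the table above.
weights = {"LM": 0.2, "TM": 0.3, "RM": 0.5}
hyps = [("Taro visited Hanako",         {"LM": -4, "TM": -3, "RM": -1}),
        ("the Taro visited the Hanako", {"LM": -5, "TM": -4, "RM": -1}),
        ("Hanako visited Taro",         {"LM": -2, "TM": -3, "RM": -2})]

for text, feats in hyps:
    total = sum(weights[k] * v for k, v in feats.items())
    print(f"{total:5.1f}  {text}")
# -2.2 for the correct hypothesis beats -2.7 and -2.3; a different weight
# setting could easily rank one of the wrong hypotheses first.
```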

Page 35: Machine Translation and Sequence-to-sequence Models


Advanced Topics

Page 36: Machine Translation and Sequence-to-sequence Models


Other Sequence-to-sequence Tasks

● Case studies about task-specific models
● Consistency constraints in tagging
● Diversity objectives in dialog
● Dynamic programming in speech
● Implement: Try models on other tasks

he ate an apple → PRN VBD DET PP
he ate an apple → good, he needs to slim down
(speech) → he ate an apple

Page 37: Machine Translation and Sequence-to-sequence Models


Ensembling/System Combination

● Ensembles and distillation
● Post-hoc hypothesis combination
● Reranking
● Implement: Ensembled decoding

[Figure: combining the predictions of Model 1 + Model 2]
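The simplest form of ensembled decoding averages the models' next-word distributions before choosing; the two "models" below are fixed toy distributions standing in for trained systems.

```python
# Averaging the next-word distributions of two toy models.
def ensemble(distributions):
    """Average a list of {word: probability} distributions."""
    words = set().union(*distributions)
    return {w: sum(d.get(w, 0.0) for d in distributions) / len(distributions) for w in words}

model1 = {"apple": 0.5, "apples": 0.4, "orange": 0.1}
model2 = {"apple": 0.4, "apples": 0.1, "pen": 0.5}
combined = ensemble([model1, model2])
print(max(combined, key=combined.get), combined)   # the ensemble prefers "apple" (0.45)
```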

Page 38: Machine Translation and Sequence-to-sequence Models


Hybrid Neural-symbolic Models

● Symbolic models with neural components
● Neural models with symbolic components
● Implement: Lexicons in NMT or neural feature functions

[Figure: lexicon/phrase entries such as watashi wa ↔ I, CMU de ↔ at CMU]

Page 39: Machine Translation and Sequence-to-sequence Models


Multi-lingual and Multi-task Learning

● Learning for multiple tasks
● Learning for multiple languages
● Implement: A multi-lingual neural system

[Figure: the same meaning in multiple languages: hello / こんにちは / hola]

Page 40: Machine Translation and Sequence-to-sequence Models


Subword Models

● Character models
● Subword models
● Morphology models
● Implement: Subword splitting

Example: reconstructed → re+ construct+ ed
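A sketch of the core loop of BPE-style subword learning: repeatedly merge the most frequent adjacent symbol pair in a word-frequency table (the tiny "corpus" counts here are made up for illustration).

```python
# Byte-pair-encoding style merges on a toy word-frequency table.
from collections import Counter

vocab = {tuple("reconstructed"): 2,
         tuple("construct"):     5,
         tuple("constructed"):   4,
         tuple("redo"):          3}

def most_frequent_pair(vocab):
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge(vocab, pair):
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

for _ in range(8):                                 # learn 8 merge operations
    vocab = merge(vocab, most_frequent_pair(vocab))
print(list(vocab))   # the shared stem "construct" has become a single symbol
```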

Page 41: Machine Translation and Sequence-to-sequence Models


Document Level Models

● Document level modeling
● Document level evaluation
● Stream decoding
● Implement: Document level measures

Page 42: Machine Translation and Sequence-to-sequence Models


For Next Class

Page 43: Machine Translation and Sequence-to-sequence Models


Homework

● Read n-gram language modeling materials

● Get software working on your machine (doing it all at once may be more efficient):
  ● By Thursday 1/19: Python
  ● By Tuesday 1/24: NumPy
  ● By Thursday 1/26: DyNet