Machine Translation and Sequence-to-sequence Models
Transcript of Machine Translation and Sequence-to-sequence Models
Graham Neubig
Carnegie Mellon University, CS 11-731
http://phontron.com/class/mtandseq2seq2017/
What is Machine Translation?
kare wa ringo wo tabeta .
He ate an apple .
What are Sequence-to-sequence Models?
Machine translation: kare wa ringo wo tabeta → he ate an apple
Tagging: he ate an apple → PRN VBD DET PP
Dialog: he ate an apple → good, he needs to slim down
Speech recognition: (speech) → he ate an apple
And just about anything...: 1010000111101 → 00011010001101
Why MT as a Representative?
Useful! Global MT market expected to reach $983.3 million by 2022 (Sources: The Register, Grand View Research)
Imperfect...
MT and Machine Learning
Big Data! Billions of words for major languages … but little for others
Well-defined, Difficult Problem! Use for algorithms, math, etc.
Algorithms Widely Applicable!
MT and Linguistics
트레이나 베이커는 좋은 사람이니까요
Baker yinikkayo tray or a good man
Morphology! 이니까요 is a variant of 이다 (to be)
Syntax! should keep subject together
Trina Baker is a good person
Semantics! “Trina” is probably not a man...
… and so much more!
Class Organization
Class Format
● Before class:
  ● Read the assigned material
  ● Ask questions via web (piazza/email)
● In class:
  ● Take a small quiz about the material
  ● Discussion/questions
  ● Pseudo-code walk
  ● Programming (TAs/instructor will supervise)
Assignments
● Assignment 1: Create a neural sequence-to-sequence modeling system. Turn in code to run it, and a short 1-2 page report.
● Assignment 2: Create a symbolic sequence-to-sequence modeling system. Similarly turn in code/report.
● Final project: Come up with an interesting new idea and test it.
Assignment Instructions
● Bring your computer to every class and make a GitHub account.
● We recommend you implement in the following libraries:
  ● DyNet: for neural networks (C++ or Python)
  ● OpenFST: for transducers, if you use them (C++)
  ● pyfst: for transducers in Python
● It is OK to work in small groups of up to 3, particularly for the final project. If you do so, please use a shared git repository and commit the code that you write, and in reports note who did what part of the project.
Class Grading
● Short quizzes: 20%
● Assignment 1: 20%
● Assignment 2: 20%
● Final Project: 40%
Class Plan
1. Introduction (Today): 1 class
2. Language Models: 4 classes
3. Neural MT: 2 classes
4. Evaluation/Data: 2 classes
5. Symbolic MT: 4 classes
6. Algorithms: 2 classes
7. Advanced Topics: 11 classes
8. Final Project Discussion: 2 classes
#1 Due
#2 Due
Project Due
Guest Lectures
● Bob Frederking: Knowledge-based Translation
● LP Morency: Something Multi-modal (Date TBD)
Statistical Machine Translation
Statistical Machine Translation
F = kare wa ringo wo tabeta .
E = He ate an apple .
Probability model: P(E|F;Θ), where Θ are the parameters
Problems in SMT
● Modeling: How do we define P(E|F;Θ)?
● Learning: How do we learn Θ?
● Search: Given F, how do we find the highest-scoring translation E' = argmax_E P(E|F;Θ)?
● Evaluation: Given E' and a human reference E, how do we determine how good E' is?
Part 1: Neural Models
Language Models 1: n-gram Language Models
Given multiple candidates, which is most likely as an English sentence?
E1 = he ate an apple
E2 = he ate an apples
E3 = he insulted an apple
E4 = preliminary orange orange
● Definition of language modeling
● Count-based n-gram language models
● Evaluating language models
● Implement: n-gram language model
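The count-based idea above can be sketched in a few lines of Python: estimate bigram probabilities by maximum likelihood and use them to rank candidate sentences. This is a minimal sketch without smoothing or unknown-word handling; the toy corpus is made up for illustration.

```python
from collections import defaultdict

def train_bigram_lm(corpus):
    """Estimate bigram probabilities P(w_i | w_{i-1}) by maximum likelihood."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in corpus:
        words = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    probs = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        probs[prev] = {w: c / total for w, c in nexts.items()}
    return probs

def sentence_prob(probs, sent):
    """Product of bigram probabilities; 0.0 if any bigram is unseen."""
    words = ["<s>"] + sent.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= probs.get(prev, {}).get(word, 0.0)
    return p

corpus = ["he ate an apple", "he ate an orange", "she ate an apple"]
lm = train_bigram_lm(corpus)
```

Without smoothing, any sentence containing an unseen bigram (e.g. "he insulted an apple") gets probability zero, which is why real n-gram LMs add smoothing.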
Language Models 2: Log-linear Language Models
● Log-linear language models
● Stochastic gradient descent
● Features for language modeling
● Implement: Log-linear language model
[Figure: per-word scores s computed by summing context feature weight vectors (w_1,a for previous word "a", w_2,giving for second-previous word "giving") plus bias b, over a vocabulary (a, the, talk, gift, hat, ...)]
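A minimal sketch of the scoring step in the figure: each active context feature contributes a weight vector over the vocabulary, the contributions are summed into scores s, and a softmax turns scores into probabilities. The feature names and weight values below are invented for illustration.

```python
import math

def loglinear_scores(features, weights):
    """Score each vocabulary word by summing weights of active features."""
    vocab = set()
    for feat in features:
        vocab.update(weights.get(feat, {}))
    return {w: sum(weights.get(feat, {}).get(w, 0.0) for feat in features)
            for w in vocab}

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores.values())
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

# Hypothetical features: previous word "a", second-previous word "giving"
weights = {
    "prev=a":       {"talk": 1.2, "gift": 0.8, "the": -2.0},
    "prev2=giving": {"talk": 0.9, "gift": 0.5, "the": -1.0},
}
probs = softmax(loglinear_scores(["prev=a", "prev2=giving"], weights))
```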
Language Models 3: Neural Networks and Feed-forward LMs
● Neural networks and back-propagation
● Feed-forward neural language models
● Mini-batch training
● Implement: Feed-forward LM
<s> <s> this is a pen </s>
Language Models 4: Recurrent LMs
● Recurrent neural networks
● Vanishing Gradient and LSTMs/GRUs
● Regularization and dropout
● Implement: Recurrent neural network LM
<s> <s> this is a pen </s>
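The recurrence itself can be sketched without any framework: each step mixes the current input with the previous hidden state, so the state can carry information about the whole history. The toy weights below are made up; a real recurrent LM learns them (and a softmax output layer, omitted here) by back-propagation.

```python
import math

def rnn_step(x, h, W_xh, W_hh, b):
    """One vanilla recurrent step: h' = tanh(W_xh @ x + W_hh @ h + b)."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    pre = [a + c + d for a, c, d in zip(matvec(W_xh, x), matvec(W_hh, h), b)]
    return [math.tanh(p) for p in pre]

# Toy 2-dimensional state and inputs (weights invented for illustration)
W_xh = [[0.5, 0.0], [0.0, 0.5]]
W_hh = [[0.1, 0.2], [0.2, 0.1]]
b = [0.0, 0.0]
h = [0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0]]:   # process a 2-step input sequence
    h = rnn_step(x, h, W_xh, W_hh, b)
```

The tanh squashing keeps the state bounded, but repeated multiplication by W_hh is also what causes the vanishing-gradient problem that LSTMs/GRUs address.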
Neural MT 1: Encoder-decoder Models
● Encoder-decoder Models
● Searching for hypotheses
● Mini-batched training
● Implement: Encoder-decoder model
[Figure: an encoder-decoder translating "kore wa pen desu </s>" into "this is a pen </s>"]
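One way to sketch the "searching for hypotheses" step: greedy decoding emits the most probable next word at each position until the end-of-sentence symbol. The `next_dist` argument below stands in for a trained encoder-decoder's predictive distribution; the lookup table is a made-up stand-in, not a real model.

```python
def greedy_decode(next_dist, src, max_len=10, eos="</s>"):
    """Greedy decoding: at each step, emit the most probable next word
    given the source and the output so far (a minimal sketch)."""
    out = []
    for _ in range(max_len):
        dist = next_dist(src, out)          # P(next word | src, out)
        word = max(dist, key=dist.get)
        if word == eos:
            break
        out.append(word)
    return out

# A hypothetical "model" that memorizes one translation, step by step
table = {0: {"this": 0.9, "kore": 0.1}, 1: {"is": 0.8, "a": 0.2},
         2: {"a": 0.7, "pen": 0.3}, 3: {"pen": 0.9, "</s>": 0.1},
         4: {"</s>": 0.99, "pen": 0.01}}
result = greedy_decode(lambda src, out: table[len(out)], "kore wa pen desu")
```

Greedy search can miss globally better hypotheses, which motivates the beam search covered later in the class.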
Neural MT 2: Attentional Models
● Attention in its various varieties
● Unknown word replacement
● Attention improvements, coverage models
● Implement: Attentional model
[Figure: for input "kouen wo okonaimasu </s>", encoder states g_1,...,g_4 are weighted by attention scores a_1,...,a_4 and combined with decoder states h_{i-1}, h_i to compute P(e_i | F, e_1,...,e_{i-1})]
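The core computation can be sketched as dot-product attention (one common variety; the lecture covers several): score each encoder state against a decoder query, softmax the scores into weights a_i, and take the weighted sum of the values as a context vector. The toy vectors below are invented for illustration.

```python
import math

def attention(query, keys, values):
    """Dot-product attention: softmax over query-key scores, then a
    weighted sum of the values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # stable softmax
    z = sum(exps)
    weights = [e / z for e in exps]
    context = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(len(values[0]))]
    return weights, context

# Toy encoder states; the query matches the second state most closely
keys = values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attention([0.0, 2.0], keys, values)
```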
Data and Evaluation
Creating Data
● Preprocessing
● Document harvesting and crowdsourcing
● Other tasks: dialog, captioning
● Implement: Find/preprocess data
Evaluation
taro ga hanako wo otozureta
             Taro visited Hanako | the Taro visited the Hanako | Hanako visited Taro
Adequate?            ○           |              ○              |          ☓
Fluent?              ○           |              ☓              |          ○
Better?                          |             B, C            |          C
● Human evaluation
● Automatic evaluation
● Significance tests and meta-evaluation
● Implement: BLEU and measure correlation
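A sketch of the BLEU computation to be implemented: the geometric mean of clipped n-gram precisions times a brevity penalty. Real BLEU is usually corpus-level and adds details (smoothing, multiple references) that are omitted in this minimal sentence-level version.

```python
import math
from collections import Counter

def bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped n-gram
    precisions times a brevity penalty (no smoothing)."""
    hyp, ref = hypothesis.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i+n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref) - n + 1))
        # clipped matches: each hypothesis n-gram counts at most as often
        # as it appears in the reference
        matched = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        if matched == 0:
            return 0.0
        log_prec_sum += math.log(matched / total)
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))   # brevity penalty
    return bp * math.exp(log_prec_sum / max_n)

score = bleu("taro visited hanako .", "taro visited hanako .")
```

Note how "Hanako visited Taro" matches every unigram of the reference yet scores zero here: n-gram overlap rewards fluency-preserving order, not just word choice.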
Symbolic Translation Models
Symbolic Methods 1: IBM Models
● The IBM/HMM models
● The EM algorithm
● Finding word alignments
● Implement: Word alignment
太郎 が 花子 を 訪問 した 。
taro visited hanako .
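The EM algorithm for IBM Model 1 fits in a short sketch: the E-step distributes each target word's count over the source words in proportion to the current t(e|f), and the M-step renormalizes the counts. This omits the NULL word and other details of the full model; the toy sentence pairs are made up.

```python
from collections import defaultdict
from itertools import product

def ibm_model1(pairs, iterations=10):
    """EM for IBM Model 1 translation probabilities t(e|f)."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    e_vocab = {e for _, es in pairs for e in es}
    # uniform initialization
    t = {(e, f): 1.0 / len(e_vocab) for e, f in product(e_vocab, f_vocab)}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for fs, es in pairs:
            for e in es:
                z = sum(t[(e, f)] for f in fs)   # normalize over source words
                for f in fs:
                    c = t[(e, f)] / z            # expected alignment count
                    count[(e, f)] += c
                    total[f] += c
        t = {(e, f): count[(e, f)] / total[f] if total[f] > 0 else 0.0
             for e, f in t}
    return t

pairs = [(["ringo"], ["apple"]), (["ringo", "wo"], ["an", "apple"])]
t = ibm_model1(pairs)
```

Even on two toy sentence pairs, co-occurrence statistics let EM pull t(apple|ringo) above the competing hypotheses.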
Symbolic Methods 2: Monotonic Symbolic Models
● Models for sequence transduction
● The Viterbi algorithm
● Weighted finite-state transducers
● Implement: A part-of-speech tagger
he ate an apple → PRN VBD DET PP
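The Viterbi algorithm for the tagger above can be sketched as dynamic programming over tag sequences: at each word, keep only the best-scoring path ending in each tag. The transition and emission probabilities below are hand-set toy values, not a trained model.

```python
import math

def viterbi(words, tags, log_trans, log_emit, start="<s>"):
    """Viterbi decoding for an HMM tagger: best tag sequence under
    transition and emission log-probabilities."""
    # best[t] = (score of best path ending in tag t, that path)
    best = {t: (log_trans.get((start, t), -math.inf) +
                log_emit.get((t, words[0]), -math.inf), [t]) for t in tags}
    for word in words[1:]:
        new_best = {}
        for t in tags:
            score, path = max(
                (prev_score + log_trans.get((pt, t), -math.inf) +
                 log_emit.get((t, word), -math.inf), path)
                for pt, (prev_score, path) in best.items())
            new_best[t] = (score, path + [t])
        best = new_best
    return max(best.values())[1]

lp = math.log
log_trans = {("<s>", "PRN"): lp(0.9), ("<s>", "VBD"): lp(0.1),
             ("PRN", "VBD"): lp(0.9), ("PRN", "PRN"): lp(0.1),
             ("VBD", "PRN"): lp(0.5), ("VBD", "VBD"): lp(0.5)}
log_emit = {("PRN", "he"): lp(0.9), ("VBD", "ate"): lp(0.9),
            ("PRN", "ate"): lp(0.01), ("VBD", "he"): lp(0.01)}
tags = viterbi(["he", "ate"], ["PRN", "VBD"], log_trans, log_emit)
```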
Symbolic Methods 3: Phrase-based MT
F = watashi wa CMU de kouen wo okonaimasu .
E = I will give a talk at CMU .
Phrase pairs: watashi wa → I, CMU de → at CMU, kouen → a talk, wo okonaimasu → will give, . → .
● Phrase extraction and scoring
● Reordering models
● Phrase-based decoding
● Implement: Phrase extraction or decoding
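Phrase extraction from a word alignment can be sketched with the standard consistency check: a phrase pair is extracted only when no alignment link crosses its boundary. Unaligned-word handling and other refinements are omitted, and the example alignment is a hand-made fragment of the sentence pair above.

```python
def extract_phrases(f_len, e_len, alignment, max_len=3):
    """Extract phrase pairs consistent with a word alignment: no link may
    connect a word inside the phrase to a word outside it."""
    phrases = set()
    for f1 in range(f_len):
        for f2 in range(f1, min(f1 + max_len, f_len)):
            # target positions linked to the source span [f1, f2]
            es = [e for f, e in alignment if f1 <= f <= f2]
            if not es:
                continue
            e1, e2 = min(es), max(es)
            if e2 - e1 >= max_len:
                continue
            # consistent iff every link into [e1, e2] comes from [f1, f2]
            if all(f1 <= f <= f2 for f, e in alignment if e1 <= e <= e2):
                phrases.add(((f1, f2), (e1, e2)))
    return phrases

# Alignment for: "kouen wo okonaimasu" <-> "will give a talk"
# kouen-a, kouen-talk, wo-will, okonaimasu-will, okonaimasu-give
alignment = {(0, 2), (0, 3), (1, 0), (2, 0), (2, 1)}
phrases = extract_phrases(3, 4, alignment)
```

Note that "wo" alone cannot be extracted: its target word "will" is also linked to "okonaimasu", so only the larger pair "wo okonaimasu → will give" is consistent.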
Symbolic Methods 4: Tree-based MT
● Graphs and hyper-graphs
● Synchronous context free grammars
● Tree substitution grammars
● Implement: Search over hyper-graphs
[Figure: a hyper-graph over "CMU de kouen wo okonaimasu" with spans such as PP0-1, PP2-3, VP2-5, and VP0-5, combined by rules like "x2 at x1" to yield "give a talk at CMU"]
Algorithms
Algorithms 1: Search
● Beam search and cube pruning
● Hypothesis recombination
● Future costs, A* search
● Implement: Beam search
[Figure: a search lattice over partial hypotheses ("yes", "is", "or", "true", </s>), with beam search pruning low-scoring paths (marked X)]
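A generic beam search sketch: expand every hypothesis on the beam, then prune back to the k lowest-cost candidates. The toy graph and costs are invented for illustration; in MT the states would be partial translations scored by the model.

```python
def beam_search(expand, start, steps, beam_size=2):
    """Generic beam search: keep the best `beam_size` partial hypotheses
    at every step. `expand(state)` yields (next_state, step_cost) pairs."""
    beam = [(0.0, [start])]                      # (total cost, path)
    for _ in range(steps):
        candidates = []
        for cost, path in beam:
            for nxt, step_cost in expand(path[-1]):
                candidates.append((cost + step_cost, path + [nxt]))
        beam = sorted(candidates)[:beam_size]    # prune to the beam
    return beam[0][1]                            # lowest-cost hypothesis

# Toy search space with hypothetical costs
graph = {"A": [("B", 1.0), ("C", 3.0)],
         "B": [("D", 5.0), ("E", 2.0)],
         "C": [("D", 1.0), ("E", 1.0)],
         "D": [("F", 1.0)], "E": [("F", 1.0)]}
path = beam_search(lambda s: graph.get(s, []), "A", 3)
```

With a finite beam the search is approximate: a prefix that looks expensive early can be pruned even if it would have led to the cheapest full hypothesis, which is what future-cost estimates try to correct.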
Algorithms 2: Parameter Optimization
● Loss functions
● Deciding the hypothesis space
● Optimization criteria
● Implement: Optimization of NMT or PBMT
                                 LM      TM      RM      weighted total
○ Taro visited Hanako           0.2*-4  0.3*-3  0.5*-1  = -2.2 (highest)
☓ the Taro visited the Hanako   0.2*-5  0.3*-4  0.5*-1  = -2.7
☓ Hanako visited Taro           0.2*-2  0.3*-3  0.5*-2  = -2.3
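The table above is just a weighted linear combination of feature scores (language model, translation model, reordering model), which takes one line to compute. The weights and feature values below are those from the example; optimization is about choosing the weights so the correct hypothesis comes out on top.

```python
def model_score(features, weights):
    """Linear model score: weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))

weights = [0.2, 0.3, 0.5]                       # LM, TM, RM weights
hyps = {"Taro visited Hanako": [-4, -3, -1],
        "the Taro visited the Hanako": [-5, -4, -1],
        "Hanako visited Taro": [-2, -3, -2]}
scores = {h: model_score(f, weights) for h, f in hyps.items()}
best = max(scores, key=scores.get)
```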
Advanced Topics
Other Sequence-to-sequence Tasks
● Case studies about task-specific models
● Consistency constraints in tagging
● Diversity objectives in dialog
● Dynamic programming in speech
● Implement: Try models on other tasks
he ate an apple → PRN VBD DET PP
he ate an apple → good, he needs to slim down
(speech) → he ate an apple
Ensembling/System Combination
● Ensembles and distillation
● Post-hoc hypothesis combination
● Reranking
● Implement: Ensembled decoding
Model 1 + Model 2
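The simplest way to combine two models at decoding time is linear interpolation of their next-word distributions (uniform weights here; log-linear combination and other schemes are also common). The two toy distributions are invented for illustration.

```python
def ensemble_next_word(distributions):
    """Average next-word distributions from several models
    (linear interpolation with uniform weights)."""
    vocab = set().union(*distributions)
    k = len(distributions)
    return {w: sum(d.get(w, 0.0) for d in distributions) / k for w in vocab}

# Two hypothetical models disagree; the ensemble splits the difference
p1 = {"apple": 0.7, "apples": 0.3}
p2 = {"apple": 0.5, "orange": 0.5}
avg = ensemble_next_word([p1, p2])
```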
Hybrid Neural-symbolic Models
● Symbolic models with neural components
● Neural models with symbolic components
● Implement: Implement lexicons in NMT or neural feature functions
[Figure: phrase pairs such as "watashi wa → I" and "CMU de → at CMU"]
Multi-lingual and Multi-task Learning
● Learning for multiple tasks
● Learning for multiple languages
● Implement: Implement a multi-lingual neural system
hello ↔ こんにちは ↔ hola
Subword Models
● Character models
● Subword models
● Morphology models
● Implement: Implement subword splitting
reconstructed
re+ construct+ ed
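Subword splitting can be sketched with byte-pair encoding: repeatedly merge the most frequent adjacent symbol pair in the training vocabulary. This is a minimal sketch of the learning step (end-of-word markers and other details of full BPE are simplified), on a made-up toy word list.

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn byte-pair-encoding merges: repeatedly merge the most
    frequent adjacent symbol pair."""
    vocab = Counter(tuple(w) for w in words)   # each word as a symbol tuple
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # apply the merge everywhere in the vocabulary
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i+1]) == best:
                    merged.append(symbols[i] + symbols[i+1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab

merges, vocab = bpe_merges(["low", "low", "lower", "lowest"], 2)
```

Frequent words end up as single units while rare words stay split into smaller pieces, which is what keeps the vocabulary compact.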
Document Level Models
● Document level modeling
● Document level evaluation
● Stream decoding
● Implement: Document level measures
For Next Class
Homework
● Read n-gram language modeling materials
● Get software working on your machine (doing all at once may be more efficient?)
  ● By Thursday 1/19: Python
  ● By Tuesday 1/24: Numpy
  ● By Thursday 1/26: DyNet