Bridging the Gap: Machine Translation for Lesser Resourced Languages
Transcript of Bridging the Gap: Machine Translation for Lesser Resourced Languages
Christian Monson, Ariadna Font Llitjós, Lori Levin, Alon Lavie, Alison Alvarez, Roberto Aranovich, Jaime Carbonell,
Robert Frederking, Erik Peterson, Kathrin Probst
2
Inupiaq: 100s of Speakers
Quechua: 6 Million Speakers
Mapudungun: 900,000 Speakers
Katrina: 100s of Speakers
3
Machine Translation (MT)
Source Language
Target Language
4
Machine Translation (MT)
Source Language
Target Language
Direct: Statistical MT, Example Based MT
5
Machine Translation (MT)
Source Language
Morphological Analysis + Syntactic Parsing
Text Generation
Target Language
Transfer: Rule Based MT
Direct: Statistical MT, Example Based MT
6
Machine Translation (MT)
Source Language
Morphological Analysis + Syntactic Parsing
Semantic Analysis
Interlingua
Sentence Planning
Text Generation
Target Language
Transfer: Rule Based MT
Direct: Statistical MT, Example Based MT
7
Machine Translation (MT)
Source Language
Morphological Analysis + Syntactic Parsing
Semantic Analysis
Interlingua
Text Generation
Target Language
Transfer: Rule Based MT
Direct: Statistical MT, Example Based MT
+ High quality
- Expertise intensive development cycle
8
Machine Translation (MT)
Source Language
Morphological Analysis + Syntactic Parsing
Semantic Analysis
Interlingua
Text Generation
Target Language
Transfer: Rule Based MT
Direct: Statistical MT, Example Based MT
+ Short development time
- Requires large bilingual corpus
9
Machine Translation (MT)
Source Language
Morphological Analysis + Syntactic Parsing
Semantic Analysis
Interlingua
Text Generation
Target Language
Transfer: Rule Based MT
Direct: Statistical MT, Example Based MT
Our Approach
10
Machine Translation (MT)
Source Language
Morphological Analysis + Syntactic Parsing
Semantic Analysis
Interlingua
Text Generation
Target Language
Transfer: Rule Based MT
Direct: Statistical MT, Example Based MT
+ High quality
- Expertise intensive development cycle
11
Machine Translation (MT)
Source Language
Morphological Analysis + Syntactic Parsing
Semantic Analysis
Interlingua
Text Generation
Target Language
Automate the development of deep-analysis MT
+ High quality
- Expertise intensive development cycle
12
Our Position
Linguistic Structure and Bilingual Informants
help automate the development of deep-analysis machine translation systems
13
Sub-Problems
1. Morphology Induction
2. Syntax Refinement
14
Morphology Induction
1. Linguistic Structure
2. Bilingual Informants
16
Paradigms Organize Morphology
Mapudungun verb suffix slots (Ø = no suffix):
Loc: pa, pu, Ø
Asp: tu, ka, Ø
Hab: ke, Ø
Mode: pe, Ø
Report: (ü)rke, Ø
Pol / Mood: la, ki, nu, Ø
Tense: a, fu, afu, Ø
Obj Agr: fi, Ø
Subj Agr / Mood: (ü)n, li, chi, yu, …
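To make the idea of slots concrete, here is a small sketch that treats each column of the slide as a slot with a fixed set of fillers and enumerates the suffix sequences they allow. It is only an illustration: the slot order is the one read off the slide, subject agreement is left out because its list is elided with "…", and co-occurrence restrictions and sound changes between suffixes are ignored.

```python
from itertools import product

# Slots and fillers as read from the slide; "" stands for Ø (no suffix).
# Subject agreement is omitted because the slide's list ends in "…".
SLOTS = [
    ("Loc", ["pa", "pu", ""]),
    ("Asp", ["tu", "ka", ""]),
    ("Hab", ["ke", ""]),
    ("Mode", ["pe", ""]),
    ("Report", ["(ü)rke", ""]),
    ("Pol/Mood", ["la", "ki", "nu", ""]),
    ("Tense", ["a", "fu", "afu", ""]),
    ("Obj Agr", ["fi", ""]),
]


def suffix_sequences(slots):
    """Yield every way of choosing one filler per slot, in slot order."""
    for choice in product(*(fillers for _, fillers in slots)):
        # Drop the empty (Ø) fillers so only overt suffixes remain.
        yield [suffix for suffix in choice if suffix]


sequences = list(suffix_sequences(SLOTS))
print(len(sequences))              # 2304 = 3 * 3 * 2 * 2 * 2 * 4 * 4 * 2
print(["la", "fi"] in sequences)   # True
```

Eight small slots already license thousands of suffix sequences, which is why organizing suffixes into paradigms is more compact than listing whole word forms.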
17
Paradigm Discovery in 3 Steps
1. Search out partial paradigms in a network of candidates
2. Cluster overlapping partial paradigms
3. Filter the clusters, keeping the largest clusters most likely to model true paradigms
e.er.erá.ido.ieron.ió (28 stems): deb, escog, ofrec, reconoc, vend, ...
e.ido.ieron.ir.irá.ió (28 stems): asist, dirig, exig, ocurr, sufr, ...
e.erá.ido.ieron.ió (28 stems): deb, escog, ...
e.er.ido.ieron.ió (46 stems): deb, parec, recog, ...
e.ido.ieron.irá.ió (28 stems): asist, dirig, ...
e.ido.ieron.ir.ió (39 stems): asist, bat, sal, ...
e.er.erá.ieron.ió (32 stems): deb, padec, romp, ...
e.ido.ieron.ió (86 stems): asist, deb, hund, ...
e.erá.ieron.ió (32 stems): deb, padec, ...
er.ido.ieron.ió (58 stems): ascend, ejerc, recog, ...
ido.ieron.ir.ió (44 stems): interrump, sal, ...
azar.e.ido.ieron.ir.ió (1 stem): sal
A portion of a Spanish paradigm candidate network
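The nodes of such a network pair a set of candidate suffixes with the stems that occur with all of them. The following is a toy sketch of that pairing only, not the authors' search, clustering, or filtering procedure; the word list, the suffix list, and the function names are made up for illustration.

```python
from collections import defaultdict


def stem_suffix_table(words, suffixes):
    """Map each candidate stem to the set of candidate suffixes seen with it."""
    table = defaultdict(set)
    for word in words:
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) > len(suffix):
                table[word[: -len(suffix)]].add(suffix)
    return dict(table)


def stems_supporting(table, scheme):
    """Return the stems that occur with every suffix in the scheme."""
    scheme = set(scheme)
    return sorted(stem for stem, seen in table.items() if scheme <= seen)


# Hypothetical miniature corpus vocabulary.
WORDS = [
    "debe", "deber", "debido", "debieron", "debió",
    "vende", "vender", "vendido", "vendieron", "vendió",
    "asiste", "asistir", "asistido", "asistieron", "asistió",
]
SUFFIXES = ["e", "er", "ir", "ido", "ieron", "ió"]

table = stem_suffix_table(WORDS, SUFFIXES)
print(stems_supporting(table, ["e", "er", "ido", "ieron", "ió"]))  # ['deb', 'vend']
print(stems_supporting(table, ["e", "ido", "ieron", "ió"]))        # ['asist', 'deb', 'vend']
```

As in the slide, dropping a suffix from the set can only keep or increase the number of supporting stems, which is what links more specific candidates to more general ones in the network.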
18
Morpho Challenge 2007
Unsupervised Morphology Induction Competition
English
• 3rd Place Overall
• Bested the Strong Baseline Morfessor (Creutz, 2006)
German
• 1st Place when Combined with Morfessor
No Mapudungun yet
Agglutinative sequences of suffixes coming soon
20
Our Machine Translation Architecture
INPUT TEXT
21
Our Machine Translation Architecture
INPUT TEXT
Morphology Analysis
Morphology Analysis Lexicon
22
Our Machine Translation Architecture
INPUT TEXT
Morphology Analysis
Morphology Analysis Lexicon
Machine Translation System
Grammar & Lexicon
23
Our Machine Translation Architecture
INPUT TEXT
Morphology Analysis
Morphology Analysis Lexicon
Machine Translation System
Grammar & Lexicon
Morphology Generation
Morphology Generation Lexicon
24
Our Machine Translation Architecture
INPUT TEXT
Morphology Analysis
Morphology Analysis Lexicon
Machine Translation System
Grammar & Lexicon
Morphology Generation
Morphology Generation Lexicon
OUTPUT TEXT
27
Sub-Problems
1. Morphology Induction
2. Syntax Refinement
28
Syntax Refinement
1. Linguistic Structure
2. Bilingual Informants
30
Mapudungun
pelafiñ Maria
Spanish
No vi a María
English
I didn’t see Maria
Linguistic Structure: Syntax
31
Mapudungun
pelafiñ Maria
pe -la -fi -ñ Maria
see -neg -3.obj -1.subj.indicative Maria
Spanish
No vi a María
neg see.1.subj.past.indicative acc Maria
English
I didn’t see Maria
Linguistic Structure: Syntax
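The segmentation and gloss of pelafiñ can be reproduced with a tiny lookup-based sketch. The two dictionaries contain only the morphemes and glosses shown on the slide; the greedy longest-match strategy and the function names are assumptions for illustration, not a description of the authors' morphological analyzer.

```python
# Morphemes and glosses exactly as given on the slide.
ROOTS = {"pe": "see"}
SUFFIXES = {"la": "neg", "fi": "3.obj", "ñ": "1.subj.indicative"}


def segment(word, roots=ROOTS, suffixes=SUFFIXES):
    """Split a word into one root plus suffixes, longest match first."""
    by_length = sorted(suffixes, key=len, reverse=True)
    for root in sorted(roots, key=len, reverse=True):
        if not word.startswith(root):
            continue
        morphemes, rest = [root], word[len(root):]
        while rest:
            match = next((s for s in by_length if rest.startswith(s)), None)
            if match is None:
                break
            morphemes.append(match)
            rest = rest[len(match):]
        if not rest:  # the whole word was consumed
            return morphemes
    return None


def gloss(morphemes, roots=ROOTS, suffixes=SUFFIXES):
    """Gloss a segmented word in the slide's 'root -suffix -suffix' style."""
    return " -".join([roots[morphemes[0]]] + [suffixes[m] for m in morphemes[1:]])


morphemes = segment("pelafiñ")
print(morphemes)         # ['pe', 'la', 'fi', 'ñ']
print(gloss(morphemes))  # see -neg -3.obj -1.subj.indicative
```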
32
V
pe
pe-la-fi-ñ Maria
33
V
pe
pe-la-fi-ñ Maria
VSuff
la
negation = +
34
V
pe
pe-la-fi-ñ Maria
VSuff
la
VSuffG
Pass all features up
35
V
pe
pe-la-fi-ñ Maria
VSuff
la
VSuffG VSuff
fi
object person = 3
36
V
pe
pe-la-fi-ñ Maria
VSuff
la
VSuffG VSuff
fi
VSuffG
Pass all features up from both children
37
V
pe
pe-la-fi-ñ Maria
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
person = 1, number = sg, mood = ind
38
V
pe
pe-la-fi-ñ Maria
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
Pass all features up from both children
VSuffG
39
V
pe
pe-la-fi-ñ Maria
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
Pass all features up from both children
VSuffG
V
Check that:
1) negation = +
2) tense is undefined
40
V
pe
pe-la-fi-ñ Maria
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
VSuffG
V NP
N
Maria
N
person = 3, number = sg, human = +
41
V
pe
pe-la-fi-ñ Maria
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
VSuffG
NP
N
Maria
N
S
V
Check that NP is human = +
Pass features up from V
VP
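The bottom-up analysis in the preceding slides amounts to merging feature sets as the tree is built and checking constraints at certain nodes. A minimal sketch follows, assuming plain dictionaries for feature structures; the feature values are the ones shown on the slides, while the function names and the decision to return None on a failed check are illustrative choices.

```python
def unify(*feature_sets):
    """Merge flat feature dictionaries, failing on conflicting values."""
    merged = {}
    for features in feature_sets:
        for name, value in features.items():
            if merged.get(name, value) != value:
                raise ValueError(f"feature clash on {name!r}")
            merged[name] = value
    return merged


# Features contributed by each suffix, as annotated on the slides.
SUFFIX_FEATURES = {
    "la": {"negation": "+"},
    "fi": {"object person": 3},
    "ñ": {"person": 1, "number": "sg", "mood": "ind"},
}
MARIA = {"person": 3, "number": "sg", "human": "+"}


def parse_verb(suffixes):
    """Build VSuffG by passing all suffix features up, then apply the V check."""
    vsuffg = {}
    for suffix in suffixes:
        vsuffg = unify(vsuffg, SUFFIX_FEATURES[suffix])
    # Check shown on the slide: negation = + and tense undefined.
    if vsuffg.get("negation") != "+" or "tense" in vsuffg:
        return None
    return vsuffg


def parse_vp(verb, noun_phrase):
    """Check that the NP is human = +, then pass the features up from V."""
    if verb is None or noun_phrase.get("human") != "+":
        return None
    return dict(verb)


vp = parse_vp(parse_verb(["la", "fi", "ñ"]), MARIA)
print(vp)
```

Other grammar rules would cover verbs that fail these particular checks; the sketch models only the single path walked through on the slides.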
42
V
pe
Transfer to Spanish: Top-Down
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
VSuffG
NP
N
Maria
N
S
V
VP
S
VP
43
V
pe
Transfer to Spanish: Top-Down
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
VSuffG
NP
N
Maria
N
S
V
VP
S
VP
V “a” NP
Pass all features to Spanish side
44
V
pe
Transfer to Spanish: Top-Down
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
VSuffG
NP
N
Maria
N
S
V
VP
S
VP
V “a” NP
Pass all features down
45
V
pe
Transfer to Spanish: Top-Down
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
VSuffG
NP
N
Maria
N
S
V
VP
S
VP
V “a” NP
Pass object features down
46
V
pe
Transfer to Spanish: Top-Down
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
VSuffG
NP
N
Maria
N
S
V
VP
S
Accusative marker on objects is introduced because human = +
VP
V “a” NP
47
V
pe
Transfer to Spanish: Top-Down
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
VSuffG
NP
N
Maria
N
S
V
VP
S
VP
V “a” NP
VP::VP [VBar NP] -> [VBar "a" NP]
(
  (X1::Y1)
  (X2::Y3)
  ((X2 type) = (*NOT* personal))
  ((X2 human) =c +)
  (X0 = X1)
  ((X0 object) = X2)
  (Y0 = X0)
  ((Y0 object) = (X0 object))
  (Y1 = Y0)
  (Y3 = (Y0 object))
  ((Y1 objmarker person) = (Y3 person))
  ((Y1 objmarker number) = (Y3 number))
  ((Y1 objmarker gender) = (Y3 gender))
)
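For readers unfamiliar with the rule notation, the effect of this VP rule can be paraphrased in a few lines of Python. This is an interpretation, not the transfer engine: it assumes X-variables are source-side constituents and Y-variables target-side ones, reads `=c` as a check that the value is already present, reads `(*NOT* personal)` as blocking the rule for that type, and models feature structures as plain dictionaries. The function name and return format are hypothetical.

```python
def apply_vp_rule(x1, x2):
    """x1: features of the source VBar, x2: features of the source NP.
    Returns the target-side constituents, or None if the rule is blocked."""
    if x2.get("type") == "personal":   # ((X2 type) = (*NOT* personal))
        return None
    if x2.get("human") != "+":         # ((X2 human) =c +)
        return None
    x0 = dict(x1, object=dict(x2))     # (X0 = X1), ((X0 object) = X2)
    y0 = dict(x0)                      # (Y0 = X0)
    y1 = dict(y0)                      # (Y1 = Y0)
    y3 = dict(y0["object"])            # (Y3 = (Y0 object))
    # Copy the object's agreement features onto the verb's object marker.
    y1["objmarker"] = {
        name: y3[name] for name in ("person", "number", "gender") if name in y3
    }
    # Target side: VBar, the inserted accusative marker "a", then NP.
    return [("VBar", y1), ("a", None), ("NP", y3)]


verb = {"negation": "+", "person": 1, "number": "sg", "mood": "ind"}
maria = {"person": 3, "number": "sg", "human": "+"}

result = apply_vp_rule(verb, maria)
print([label for label, _ in result])   # ['VBar', 'a', 'NP']
print(result[0][1]["objmarker"])        # {'person': 3, 'number': 'sg'}
```

An object that is not marked human = + makes the function return None, mirroring how the rule fails to apply and leaves such objects to other rules.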
48
V
pe
Transfer to Spanish: Top-Down
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
VSuffG
NP
N
Maria
N
S
V
VP
S
VP
V “a” NP
“no” V
Pass person, number, and mood features to Spanish Verb
Assign tense = past
49
V
pe
Transfer to Spanish: Top-Down
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
VSuffG
NP
N
Maria
N
S
V
VP
S
VP
V “a” NP
“no” V
Introduced because negation = +
50
V
pe
Transfer to Spanish: Top-Down
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
VSuffG
NP
N
Maria
N
S
V
VP
S
VP
V “a” NP
“no” V
ver
51
V
pe
Transfer to Spanish: Top-Down
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
VSuffG
NP
N
Maria
N
S
V
VP
S
VP
V “a” NP
“no” V
ver → vi
person = 1, number = sg, mood = indicative, tense = past
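The last few steps on the Spanish side (introducing "no", assigning past tense, and inflecting ver as vi) can be sketched as a lookup in a small generation lexicon. The three lexicon entries are standard Spanish forms of ver; the table layout, the function name, and the use of "indicative" as the mood value on both sides are assumptions made for this example.

```python
# Toy morphology generation lexicon:
# (lemma, person, number, mood, tense) -> inflected form.
GENERATION_LEXICON = {
    ("ver", 1, "sg", "indicative", "past"): "vi",
    ("ver", 3, "sg", "indicative", "past"): "vio",
    ("ver", 1, "sg", "indicative", "present"): "veo",
}


def realize_verb(lemma, source_features):
    """Return the Spanish verb tokens for features passed over from the source verb."""
    key = (
        lemma,
        source_features["person"],
        source_features["number"],
        source_features["mood"],
        "past",  # the slides assign tense = past on the Spanish side
    )
    tokens = [GENERATION_LEXICON[key]]
    if source_features.get("negation") == "+":
        tokens.insert(0, "no")  # introduced because negation = +
    return tokens


features = {"person": 1, "number": "sg", "mood": "indicative", "negation": "+"}
print(realize_verb("ver", features))  # ['no', 'vi']
```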
52
V
pe
Transfer to Spanish: Top-Down
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
VSuffG
NP
N
Maria
N
S
V
VP
S
VP
V “a” NP
“no” V
vi
N
María
N
Pass features over to Spanish side
53
V
pe
I didn’t see Maria
VSuff
la
VSuffG VSuff
fi
VSuffG VSuff
ñ
VSuffG
NP
N
Maria
N
S
V
VP
S
VP
V “a” NP
“no” V
vi
N
María
N
54
Syntax Refinement
1. Linguistic Structure
2. Bilingual Informants
55
Syntax Refinement Architecture
INPUT TEXT
Morphology Analysis
Morphology Analysis Lexicon
Run-Time MT System
Grammar & Lexicon
Morphology Generation
Morphology Generation Lexicon
OUTPUT TEXT
56
Syntax Refinement Architecture
INPUT TEXT
Morphology Analysis
Run-Time MT System
Grammar & Lexicon
Morphology Generation
OUTPUT TEXT
Online Translation Correction Tool
Rule Refinement
57
Syntax Refinement Architecture
INPUT TEXT
Morphology Analysis
Run-Time MT System
Grammar & Lexicon
Online Translation Correction Tool
Rule Refinement
58
Syntax Refinement Architecture
INPUT TEXT
Morphology Analysis
Run-Time MT System
Grammar & Lexicon
Morphology Generation
OUTPUT TEXT
Online Translation Correction Tool
Rule Refinement
59
Children played a game
62
The children played a game
63
VP
Det
NP
NP
N
niños
N
VP
S
PolP
V
jugaron
V
un
N
juego
N
Refining the Grammar
64
VP
Det
NP
NP
N
niños
N
VP
S
PolP
V
jugaron
V
un
N
juego
N
los
Refining the Grammar
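The correction illustrated here, adding "los" in front of "niños", can be detected automatically by aligning the system output with the informant's corrected sentence. Below is a minimal sketch using Python's standard `difflib`; it only locates the edit, and how such an edit is then mapped onto a change in the grammar is not covered. The function name and the whitespace tokenization are assumptions.

```python
from difflib import SequenceMatcher


def correction_edits(mt_output, corrected):
    """List the token-level edits that turn the MT output into the correction."""
    a, b = mt_output.split(), corrected.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # (edit type, position in MT output, removed tokens, added tokens)
            edits.append((tag, i1, a[i1:i2], b[j1:j2]))
    return edits


print(correction_edits("niños jugaron un juego", "los niños jugaron un juego"))
# [('insert', 0, [], ['los'])]
```

The single edit is an insertion at position 0, in front of the subject noun, which matches the determiner added to the tree on the slide.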
66
Syntax Refinement Summary
• Increases translation quality on unseen data
  – English-Spanish experiments (Font Llitjós et al., 2007, MT Summit)
• Generalizes to a Mapudungun-Spanish machine translation system
67
Overall Summary
Linguistic Structure and Bilingual Informants
help automate the development of deep-analysis machine translation systems:
Morphology Induction and Syntax Refinement
68
Thank You!