Optimizing Statistical Machine Translation for Text Simplification
TACL paper @ ACL Aug-9-2016
Wei Xu (UPenn ➞ Ohio State University)
Courtney Napoles (JHU)
Ellie Pavlick (UPenn)
Quanze Chen (UPenn ➞ University of Washington)
Chris Callison-Burch (UPenn)
Text Simplification (A Monolingual Machine Translation Problem)
★ for children, disabled, non-native speakers …
★ for other NLP tasks (MT, summarization …)
Our Paper Last Year (TACL ’15)
Problems in Current Text Simplification Research (TACL ’15)
Data: Simple Wikipedia was not that simple ➞ The Newsela Corpus! (very high quality, freely available)
Evaluation: no reliable automatic metric
System: not much paraphrasing (often use Moses as a black-box)
Wei Xu, Chris Callison-Burch, Courtney Napoles. “Problems in Current Text Simplification Research: New Data Can Help” TACL (2015)
This Paper and This Talk (TACL ’16)
Data: Simple Wikipedia was not that simple ➞ Crowdsourcing: 3.5k sentences, 8 references
Evaluation: no reliable automatic metric ➞ SARI, the 1st metric that works
System: not much paraphrasing ➞ state-of-the-art paraphrasing for simplicity
Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch. “Optimizing Statistical Machine Translation for Text Simplification” in TACL (2016)
Data: Crowdsourcing, 3.5k sentences, 8 references
★ paraphrasing! (little deletion; no splitting)
★ simple quality control (see our NAACL 2015 paper)
★ freely available
Wiki: Jeddah is the principal gateway to Mecca, Islam’s holiest city, which able-bodied Muslims are required to visit at least once in their lifetime.
Turk #1: Jeddah is the main entrance to Mecca, the holiest city in Islam, which all healthy Muslims need to visit at least once in their life.
Our System: Jeddah is the main gateway to Mecca, Islam’s holiest city, which sound Muslims have to visit at least once in their life.
Evaluation: SARI, the 1st metric that works
Venn diagram of Input (I), System output (O), Human references (R)
keep: O ∩ R ∩ I
add: O ∩ R ∩ not I
del: I ∩ not O ∩ not R
$\mathrm{SARI} = d_1 F_{\mathrm{add}} + d_2 F_{\mathrm{keep}} + d_3 P_{\mathrm{del}}$, with $d_1 = d_2 = d_3 = 1/3$
★ lightweight and tunable
★ SARI: “the BLEU for simplification”
★ take input into account!
★ use F-measure
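The keep/add/del sets and the weighted sum above can be illustrated with a minimal sketch. It is a simplification, not the released SARI code: it assumes whitespace tokenization, unigrams only, and treats a word as "in the references" if any reference contains it, so the scores will differ from the released implementation. The function name `sari_unigram` and the example sentences are made up for illustration, and scoring an empty denominator as 0.0 is an assumption of this sketch.

```python
def _ratio(numerator, denominator):
    # Assumption of this sketch: an empty denominator scores 0.0.
    return len(numerator) / len(denominator) if denominator else 0.0


def _f1(precision, recall):
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def sari_unigram(source, output, references, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Simplified, unigram-only, set-based version of the SARI idea."""
    I = set(source.split())                                # input words
    O = set(output.split())                                # system output words
    R = set().union(*(ref.split() for ref in references))  # words in any reference

    add_good = (O & R) - I    # add:  O ∩ R ∩ not I
    keep_good = O & R & I     # keep: O ∩ R ∩ I
    del_good = I - O - R      # del:  I ∩ not O ∩ not R

    f_add = _f1(_ratio(add_good, O - I), _ratio(add_good, R - I))
    f_keep = _f1(_ratio(keep_good, O & I), _ratio(keep_good, R & I))
    p_del = _ratio(del_good, I - O)   # precision only, as on the slide

    d1, d2, d3 = weights
    return d1 * f_add + d2 * f_keep + d3 * p_del


# Hypothetical example: one lexical substitution that the references agree with.
source = "he applauded the new law"
references = ["he praised the law", "he praised the new law"]
changed = sari_unigram(source, "he praised the new law", references)
unchanged = sari_unigram(source, source, references)
print(changed, unchanged)
```

Under these assumptions, copying the input unchanged earns no credit for additions or deletions, so it scores well below the output that makes the substitution the references agree on.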
System: state-of-the-art paraphrasing for simplicity
★ In-depth adaptation of statistical MT (Joshua)
• tuning data (small parallel corpus)
• tunable metric (PRO pairwise ranking optimization)
• PPDB (no large parallel corpus needed!)
• simplification-specific features
PPDB (4+ million rules)
Lexical: applauded ⟶ praised
Phrasal: generally acknowledged ⟶ widely accepted
Syntactic: NNP’s JJ legislation ⟶ the JJ law of NNP
[Figure: an input sentence and three candidate simplifications, scored SARI = 0.21, SARI = 0.33, and SARI = 0.15]
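To make the role of the paraphrase rules concrete, here is a toy sketch of how lexical and phrasal rules like the two shown above could expand one input into several candidate simplifications. It is not how Joshua decodes: it uses plain substring replacement, ignores syntactic rules and feature weights, and the input sentence and the name `candidate_simplifications` are invented for illustration.

```python
from itertools import combinations

# The two non-syntactic example rules from the slide.
RULES = {
    "applauded": "praised",
    "generally acknowledged": "widely accepted",
}


def candidate_simplifications(sentence, rules):
    """Apply every subset of the applicable rules to the sentence."""
    applicable = [(src, tgt) for src, tgt in rules.items() if src in sentence]
    candidates = []
    for size in range(len(applicable) + 1):
        for subset in combinations(applicable, size):
            rewritten = sentence
            for src, tgt in subset:
                # Crude substring replacement; a real system matches phrases
                # against the parsed input instead.
                rewritten = rewritten.replace(src, tgt)
            candidates.append(rewritten)
    return candidates


# Hypothetical input sentence, not taken from the talk.
sentence = "Critics applauded the generally acknowledged plan."
cands = candidate_simplifications(sentence, RULES)
for cand in cands:
    print(cand)
```

A tuned system would then score each candidate with its feature weights; the point of tuning towards SARI is that those weights are set to prefer the candidates with higher SARI.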
Human Evaluation
[Bar chart: Grammar, Meaning, and Simplicity+ ratings on a 0 to 4 scale for Simple Wikipedia, Turk, and the automatic simplification systems Moses + reranking (Wubben et al. 2012), Joshua -BLEU-, and Joshua -SARI-]
tuning towards BLEU leaves input unchanged (8 references). Why? Why? Why?
Automatic Metrics (Spearman’s rho correlation w/ human ratings)
          Grammar   Meaning   Simplicity
BLEU      0.59      0.70      0.15
FKBLEU    0.35      0.41      0.24
SARI      0.34      0.40      0.34
this is not necessarily a bad thing. Why? Why? Why?
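For readers unfamiliar with the statistic in the table, Spearman’s rho is the Pearson correlation computed on ranks. The sketch below shows one way to compute it with the standard library only; the metric scores and human ratings in the example are invented, and the helper names are not from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    # Note: undefined (division by zero) if either list is constant.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: metric scores vs. human simplicity ratings for 4 outputs.
metric_scores = [0.10, 0.40, 0.30, 0.90]
human_ratings = [1, 3, 2, 4]
rho = spearman_rho(metric_scores, human_ratings)
print(rho)
```

Because only ranks matter, the metric and the human scale do not need to share units, which is why the same statistic can compare BLEU, FKBLEU, and SARI against 0 to 4 ratings.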
BLEU correlates w/ Grammar
[Scatter plot: Grammar vs. BLEU (0 to 1); annotation: high Grammar, high BLEU]
If a system leaves input unchanged, it gets perfect Grammar, perfect Meaning, and very high BLEU score, because there are multiple references (unlike bilingual translation).
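The effect of multiple references on an unchanged input can be seen with BLEU’s clipped (modified) unigram precision alone. The sketch below computes only that one component, not full BLEU (no higher-order n-grams, no brevity penalty), and the sentences are invented for illustration.

```python
from collections import Counter


def modified_unigram_precision(output, references):
    """Clipped unigram precision: each output word counts at most as often
    as it appears in the single reference that contains it most."""
    out_counts = Counter(output.split())
    max_ref_counts = Counter()
    for ref in references:
        for token, count in Counter(ref.split()).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)
    clipped = sum(min(count, max_ref_counts[token])
                  for token, count in out_counts.items())
    total = sum(out_counts.values())
    return clipped / total if total else 0.0


# Hypothetical input and references; each reference changes a different word.
source = "the principal gateway to the city"
references = [
    "the main gateway to the city",
    "the principal entrance to the city",
    "the principal gateway to the town",
]
one_ref = modified_unigram_precision(source, references[:1])
all_refs = modified_unigram_precision(source, references)
print(one_ref, all_refs)
```

Each reference paraphrases a different part of the input, so together they cover every input word and the unchanged input matches fully, even though it simplifies nothing.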
BLEU is not for Simplification
[Scatter plots: Simplicity vs. BLEU (0 to 1) and Grammar vs. BLEU (0 to 1); annotations: low Simplicity, high BLEU; high Grammar, high BLEU]
If a system leaves input unchanged, it gets perfect Grammar, perfect Meaning, and very high BLEU score. But it is not good simplification!
SARI is way better than BLEU
[Scatter plots: Grammar vs. BLEU (0 to 1) and Grammar vs. SARI (0.12 to 0.6); annotations: low BLEU, high Grammar; low SARI, high Grammar]
high Grammar could be bad simplification
SARI is way better than BLEU
[Scatter plots: Simplicity vs. BLEU (0 to 1) and Simplicity vs. SARI (0.12 to 0.6); annotations: low Simplicity, high BLEU; low Simplicity, high SARI]
low Simplicity must be bad simplification
Machine Translation
                                     Bilingual         Monolingual (= Paraphrase)
naturally available parallel text    a lot             much less
sensitive to error                   less              more
objective                            straightforward   sophisticated
standard metric                      BLEU etc.         SARI and ?
Thank you
Sponsor: National Science Foundation
Data and code available at https://cocoxu.github.io/
Questions?
I am looking for PhD students and post-docs. Email me! Wei Xu @ Ohio State University
Newsela Dataset (TACL ’15)
Wei Xu, Chris Callison-Burch, Courtney Napoles. “Problems in Current Text Simplification Research: New Data Can Help” TACL (2015)
every article at 5 levels of simplification
total 50k+ sentences
written by trained editors
comes with comprehension quizzes