Optimizing Statistical Machine Translation for Text Simpliﬁcation · Optimizing Statistical...

Optimizing Statistical Machine Translation for Text Simplification

TACL paper @ ACL Aug-9-2016

Wei XuCourtney Napoles

Ellie Pavlich Quanze Chen

Chris Callison-Burch

UPenn ➞ Ohio State University JHU UPenn UPenn ➞ University of Washington UPenn

Text Simplification (A Monolingual Machine Translation Problem)

★ for children, disabled, non-native speakers … ★ for other NLP tasks (MT, summarization …)

Our Paper Last Year (TACL ’15)

Problems in Current Text Simplification Research (TACL ’15)

Data Evaluation System

not muchparaphrasing

no reliableautomatic metric

Simple Wikipediawas not that simple

The Newsela Corpus!very high qualityfreely available

(often use Moses as a black-box)

Wei Xu, Chris Callison-Burch, Courtney Napoles. “Problems in Current Text Simplification Research: New Data Can Help” TACL (2015)

This Paper and This Talk (TACL ’16)





Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch. “Optimizing Statistical Machine Translation for Simplification” in TACL (2016)






Crowdsourcing3.5k sentences

8 referencesWei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch. “Optimizing Statistical Machine Translation for Simplification” in

TACL (2016)







8 references

SARI the 1st metric

that worksWei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch. “Optimizing Statistical Machine Translation for Simplification” in

TACL (2016)







8 references

SARI the 1st metric

that works

state-of-the-artparaphrasingfor simplicity


DataCrowdsourcing3.5k sentences

8 references


★ paraphrasing! (little deletion; no splitting) ★ simple quality control (see our NAACL 2015 paper) ★ freely available


8 references


★ paraphrasing! (little deletion; no splitting) ★ simple quality control (see our NAACL 2015 paper) ★ freely available


8 references

Wiki Jeddah is the principal gateway to Mecca, Islam’s holiest city, which able-bodied Muslims are required to visit at least once in their lifetime.

Turk #1 Jeddah is the main entrance to Mecca, the holiest city in Islam, which all healthy Muslims need to visit at least once in their life.

Our System

Jeddah is the main gateway to Mecca, Islam’s holiest city, which sound Muslims have to visit at least once in their life.


EvaluationSARI

the 1st metric that works


EvaluationSARI


Input

SystemoutputHuman

references

keepO ∩ R ∩ I

addO ∩ R ∩ not I

delI ∩ not O ∩ not R


EvaluationSARI


Input

SystemoutputHuman

references

keepO ∩ R ∩ I



★ light-weighted and tunable ★ SARI — “the BLEU for simplification” ★ take input into account!★ use F-measure


EvaluationSARI


Input

SystemoutputHuman

references

keepO ∩ R ∩ I



SARI = d1Fadd + d2Fkeep + d3Pdel

d1 = d2 = d3 = 1/3

★ light-weighted and tunable ★ SARI — “the BLEU for simplification” ★ take input into account!★ use F-measure


Systemstate-of-the-artparaphrasingfor simplicity




★ In-depth adaptation of statistical MT (Joshua) • tuning data (small parallel corpus) • tunable metric (PRO pairwise ranking optimization) • PPDB (no large parallel corpus needed!) • simplification-specific features


★ In-depth adaptation of statistical MT (Joshua) • tuning data (small parallel corpus) • tunable metric (PRO pairwise ranking optimization) • PPDB (no large parallel corpus needed!) • simplification-specific features

Lexical applauded ⟶ praised

Phrasal generally acknowledged ⟶ widely accepted

Syntactic NNP’s JJ legislation ⟶ the JJ law of NNPWei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch. “Optimizing Statistical Machine Translation for Simplification” in

TACL (2016)

Input:



Syntactic NNP’s JJ legislation ⟶ the JJ law of NNP

PPDB (4+ million rules)


Input:

Candidate Simplifications:



Syntactic NNP’s JJ legislation ⟶ the JJ law of NNP

SARI = 0.21

SARI = 0.33

PPDB (4+ million rules)

SARI = 0.15Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch. “Optimizing Statistical Machine Translation for Simplification” in

TACL (2016)

Human Evaluation

0

1

2

3

4

Simple Wikipedia

Turk (Wubben et al. 2012) Joshua -BLEU-

Joshua -SARI-

Grammar Meaning Simplicity+

Automatic Simplification

tuning towards BLEU leaves input unchanged(8 references)


Human Evaluation

0

1

2

3

4

Simple Wikipedia

Turk Moses + reranking(Wubben et al. 2012)

Joshua -BLEU-

Joshua -SARI-




Human Evaluation

0

1

2

3

4

Simple Wikipedia

Turk Moses + reranking(Wubben et al. 2012)

Joshua -BLEU-

Joshua -SARI-



tuning towards BLEU leaves input unchangedWhy? Why? Why?Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch. “Optimizing Statistical Machine Translation for Simplification” in

TACL (2016)

Automatic Metrics (Spearman’s rho correlation w/ human ratings)

Grammar Meaning Simplicity

BLEU 0.59 0.70 0.15

FKBLEU 0.35 0.41 0.24

SARI 0.34 0.40 0.34


Automatic Metrics (Spearman’s rho correlation w/ human ratings)

Grammar Meaning Simplicity

BLEU 0.59 0.70 0.15

FKBLEU 0.35 0.41 0.24

SARI 0.34 0.40 0.34

this is not necessarily a bad thingWhy? Why? Why?


Gra

mm

ar

BLEU

0 0.25 0.5 0.75 1

BLEU correlates w/ Grammar

If a system leaves input unchanged, it gets

perfect Grammar perfect Meaning

and very high BLEU score.because there are multiple references

(unlike bilingual translation)


high Grammar high BLEU

Sim

plic

ity

0 0.25 0.5 0.75 1

BLEU is not for SimplificationG

ram

mar

BLEU

0 0.25 0.5 0.75 1

If a system leaves input unchanged, it gets

perfect Grammar perfect Meaning

and very high BLEU score.But it is not good simplification!

low Simplicity high BLEU


high Grammar high BLEU

Gra

mm

ar

BLEU

0 0.25 0.5 0.75 1

SARI is way better than BLEU

Gra

mm

arSARI

0.12 0.24 0.36 0.48 0.6

high Grammarcould be bad simplification

low SARI high Grammar


low BLEU high Grammar

Sim

plic

ity

BLEU

0 0.25 0.5 0.75 1

SARI is way better than BLEU

Sim

plic

itySARI

0.12 0.24 0.36 0.48 0.6

low Simplicitymust be bad simplification

low Simplicityhigh BLEU


low Simplicityhigh SARI

Machine Translation

Bilingual(Paraphrase =)Monolingual

naturally availableparallel text a lot much less

sensitive to error less more

objective straightforward sophisticated

standard metric BLEU etc. SARI and ?


Thank you

Sponsor: National Science FoundationData and code available at https://cocoxu.github.io/

Questions?

I am looking for PhD students and post-docs. Email me! Wei Xu @ Ohio State University


Newsela Dataset (TACL ’15)

Wei Xu, Chris Callison-Burch, Courtney Napoles. “Problems in Current Text Simplification Research: New Data Can Help” TACL (2015)

every article at 5 levels of simplification total 50k+ sentences, written by trained editors, comes with comprehension quizzes

Optimizing Statistical Machine Translation for Text Simpliﬁcation · Optimizing Statistical...

Documents

Transcript of Optimizing Statistical Machine Translation for Text Simpliﬁcation · Optimizing Statistical...