Using Parallel Features in Parsing of Machine-Translated ...

Rudolf Rosa, Ondřej Dušek, David Mareček, Martin Popel{rosa,odusek,marecek,popel}@ufal.mff.cuni.cz

Using Parallel Featuresin Parsing

of Machine-Translated Sentencesfor Correction of Grammatical Errors

Charles University in PragueFaculty of Mathematics and Physics

Institute of Formal and Applied Linguistics

SSST, Jeju, 12th July 2012

Parsing of SMT Outputs

can be useful in many applications automatic classification of translation errors automatic correction of translation errors (Depfix) confidence estimation, multilingual question

answering...

✔ we have the source sentence available Can we use it to help parsing?

✗ SMT outputs noisy (errors in fluency, grammar...) parsers trained on gold standard treebanks Can we adapt parser to noisy sentences?

MST Parser

Maximum Spanning Tree dependency parser by Ryan McDonald

(1) Words and Tags

RudolphNNP

relaxesVBZ

abroadRB

#rootwords = nodes

(2) (Nearly) Complete Graph

RudolphNNP

relaxesVBZ

abroadRB

#rootall possible edges =

directed edges

(3) Assign Edge Weights

RudolphNNP

relaxesVBZ

abroadRB

-7 -1659325

1490 1154

-263 24

Margin InfusedRelaxed Algorithm

(MIRA)

edge weight =

sum ofedge featuresweights

(4) Maximum Spanning Tree

RudolphNNP

relaxesVBZ

abroadRB

-7 -1659325

1490 1154

-263 24

non-projective trees:Chu-Liu-Edmonds algorithm

(projective trees:

Eisner algorithm)

(5) Unlabeled Dependency Tree

RudolphNNP

relaxesVBZ

abroadRB

#rootdependency tree =

maximumspanning tree

(6) Labeled Dependency Tree

RudolphNNP

relaxesVBZ

abroadRB

Predicate

Subject Adverbial

labels asignedby a second stagelabeler

RUR Parser

reimplementation of MST Parser (so far only) first-order, non-projective

adapted for SMT outputs parsing parallel features ”worsening” the training treebank

English-to-Czech SMT

Czech language highly flective

4 genders, 2 numbers, 7 cases, 3 persons... Czech grammar requires agreement in related words

word order relatively free: word order errors not crucial

Phrase-Based SMT often makes inflection errors:➔ Rudolph's car is black.

✗ Rudolfova/fem auto/neut je černý/masc.

✔ Rudolfovo/neut auto/neut je černé/neut.

Parser Training Data

Prague Czech-English Dependency Treebank parallel treebank 50k sentences, 1.2M words morphological tags, surface syntax, deep syntax word alignment

Parallel Features

word alignment (using GIZA++) additional features (if aligned node exists):

aligned tag (NNS, VBD...) aligned dependency label (Subject, Attribute...) aligned edge existence (0/1)

Parallel Features Example

RudolfNN M S 1

relaxujeVB S 3

zahraničíNN N S 6

RudolphNNP

relaxesVBZ

abroadRB

AdvSubj

Worsening the Treebank

treebank used for training contains correct sentences

SMT output is noisy grammatical errors incorrect word order missing/superfluous words …

let's introduce similar errors into the treebank! so far, we have only tried inflection errors

Worsen (1): Apply SMT

translate English side of PCEDT to Czech by an SMT system (we used Moses)

now we have (e.g.): Gold English

Rudolph's car is black. Gold Czech

RudolfovoNEUT

autoNEUT

je černéNEUT

SMT Czech Rudolfova

FEM auto

NEUT je černý

Worsen (2): Align SMT to Gold

align SMT Czech to Gold Czech Monolingual Greedy Aligner

alignment link score = linear combination of: similarity of word forms (or lemmas) similarity of morphological tags (fine-grained) similarity of positions in the sentence indication whether preceding/following words aligned

repeat: align best scoring pair until below threshold no training: weights and threshold set manually

Worsen (3): Create Error Model

for each tag: estimate probabilities of SMT system using an

incorrect tag instead of the correct tag(Maximum Likelihood Estimate)

Czech tagset: fine-grained morphological tags part-of-speech, gender, number, case, person,

tense, voice... 1500 different tags in training data

Worsen (3): Error Model

Adjective, Masculine, Plural, Instrumental case(AAMP7), e.g. lingvistickými (linguistic)➔ 0.2 Adjective, Masculine, Singular, Nominative case

➔ e.g. lingvistický➔ 0.1 Adjective, Masculine, Plural, Nominative case

➔ e.g. lingvističtí➔ 0.1 Adjective, Neuter, Singular, Accusative case

➔ e.g. lingvistické

… altogether 2000 such change rules

Worsen (4): Apply Error Model

take Gold Czech for each word:

assign a new tag randomly sampled according to Tag Error Model

generate a new word form rule-based generator, generates even unseen forms new_form = generate_form(lemma, tag) || old_form

→ get Worsened Czech use resulting Gold English-Worsened Czech

parallel treebank to train the parser

Direct Evaluation by Inspection

manual inspection of several parse trees comparing baseline and adapted parser ouputs

examples of improvements: subject identification even if not in nominative case adjective-noun dependence identification even if

agreement violated (gender, number, case)

hard to do reliably trying to find a correct parse tree for an (often)

incorrect sentence – not well defined

Indirect Evaluation: in Depfix

rule-based grammar correction of SMT outputs input = aligned, tagged and parsed sentences:

target (Czech) sentence – to be corrected source (English) sentence – additional information

applies 20 correction rules: noun – adjective agreement (gender, number, case) subject – predicate agreement (gender, number) preposition – noun agreement (case) …

Depfix: Rudolph's Car

RudolphNNP

RudolfovaAA fem

autoNN neutAtr Atr

RudolfovoAA neut

autoNN neut

Adjective – Noun Agreement

Indirect Evaluation Results

differences in Depfix corrections evaluated by humans: better / worse / indefinite

three different parsers RUR + parallel features + worsened treebank – original McDonald's MST Parser RUR – our baseline setup

RUR + parallel features + worsened treebankbetter worse indefinite51% 30% 18%

RUR 54% 28% 18%

Conclusion

SMT outputs often hard to parse RUR parser – adapted to parsing SMT outputs

parallel features (tag, dep. label, edge existence) worsening the training treebank (tag error model)

outputs of English-to-Czech translation evaluated in Depfix

SMT errors correction system

Future Work

more sophisticated parallel features more experiments on worsening more languages

parallel tagging

Thank you for your attention

For this presentation and other information, visit:

http://ufal.mff.cuni.cz/~rosa/depfix/

Rudolf Rosa, Ondřej Dušek, David Mareček, Martin Popel{rosa,odusek,marecek,popel}@ufal.mff.cuni.cz

Charles University in PragueFaculty of Mathematics and Physics

Institute of Formal and Applied Linguistics

Using Parallel Features in Parsing of Machine-Translated ...

Documents

Transcript of Using Parallel Features in Parsing of Machine-Translated ...

An Arabic Semantic Parser and Meaning AnalyzerBottom-up chart parsing, Top-down chart parsing, Top-Down Parsing with Recursive Transition Networks and Recursive Descent Parsing [1].

Volume One Parallel Fusekiostencible.free.fr/Go Books English/Modern.Joseki.and.Fuseki...MODERN JOSEKI AND FUSEKI Volume One Parallel Fuseki by Sakata Eio, Honinbo-Judan translated

October 2008csa3180: Setence Parsing Algorithms 1 1 CSA350: NLP Algorithms Sentence Parsing I The Parsing Problem Parsing as Search Top Down/Bottom Up.

BAB 6: Contex-Free Grammar & Parsing - ocw.upj.ac.id€¦ · Bab 6: Context-Free Grammar & Parsing Agenda. • Context-Free Grammar • Top-Down Parsing • Buttom-Up Parsing Contex-Free

Parsing and Semantics/ Jan. 2006/ page 1 / for Xerox internal use only Xerox Incremental Parsing Parsing And Semantics.

Parsing in Parallel on Multiple Cores and GPUs

Parsing parallel evolution: ecological divergence and differential …people.earth.yale.edu/.../Hull/Manousakietal_molecol_12.pdf · 2020-01-07 · Parsing parallel evolution: ecological

Syntactic Analysis Operator-Precedence Parsing Recursive-Descent Parsing

Learning for semantic parsing using statistical syntactic parsing techniques

Parsing: Top-Down vs. Bottom-Up Parsing Algorithms Partial Parsing

Detecting Grammatical Errors in Machine Translation Output ... · detecting grammatical errors in Dutch machine-translated text, using dependency parsing and treebank querying. We

Top Down Parsing, Predictive Parsing

Massively Parallel Parsing: A Strongly Interactive Model of Natural

Parsing III (Top-down parsing: recursive descent & LL(1) )

Parallel Parsing with Attributes and Not: Technical Discussion W.Pasman, 4.6.7.

ParPaRaw: Massively Parallel Parsing of Delimiter ... · 1.We present an approach to massively parallel parsing of delimiter-separated data formats that is designed for scalability

Bare-Bones Dependency Parsing - Uppsala Universitynivre/docs/BareBones.pdf · I Parsing methods for bare-bones dependency parsing I Chart parsing techniques ... [Kuhlmann and Satta

Parsing. 2301373Chapter 4 Parsing2 Outline Top-down v.s. Bottom-up Top-down parsing Recursive-descent parsing LL(1) parsing LL(1) parsing algorithm First.

Dependency Parsing - Oregon State Universityclasses.engr.oregonstate.edu/.../notes/dependency-parsing-lecture.pdf · Outline Introduction Dependency Parsing Formal definition Parsing

Weighted Parsing, Probabilistic Parsing