Post on 20-Jan-2016
Statistical Machine Translation with Rule Based Re-ordering of Source Sentences
Amit Sangodkar, Vasudevan N, Om P. Damani (CSE, IIT Bombay)
Motivation
Combining linguistic knowledge with Statistical Machine Translation.
Can re-ordering source language sentences as per the target language improve the alignment?
Example
English: Many Bengali poets have sung songs in praise of this land.
Hindi:
Re-order: Many Bengali poets this land of praise in songs sung have
Translation Architecture
Dependency Parser
Many Bengali poets have sung songs in praise of this land.
Output of Stanford Parser:
  amod (poets-3, Many-1)
  nn (poets-3, Bengali-2)
  nsubj (sung-5, poets-3)
  aux (sung-5, have-4)
  dobj (sung-5, songs-6)
  prep_in (sung-5, praise-8)
  det (land-11, this-10)
  prep_of (praise-8, land-11)
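The parser output above is plain text, so a later stage needs it as structured tuples. The following is a minimal sketch of such a reader, assuming the `relation (head-index, dependent-index)` layout shown on the slide; the `Dependency` type and `parse_dependencies` function are hypothetical names introduced for illustration and are not part of the Stanford Parser.

```python
import re
from typing import List, NamedTuple


class Dependency(NamedTuple):
    """One typed dependency, e.g. nsubj (sung-5, poets-3)."""
    relation: str
    head: str
    head_index: int
    dependent: str
    dependent_index: int


# Matches entries such as "prep_in (sung-5, praise-8)".
_DEP_PATTERN = re.compile(r"(\w+)\s*\(\s*(\S+)-(\d+)\s*,\s*(\S+)-(\d+)\s*\)")


def parse_dependencies(text: str) -> List[Dependency]:
    """Extract every typed dependency found in the given text."""
    deps = []
    for match in _DEP_PATTERN.finditer(text):
        rel, head, head_idx, dep, dep_idx = match.groups()
        deps.append(Dependency(rel, head, int(head_idx), dep, int(dep_idx)))
    return deps


PARSER_OUTPUT = """
amod (poets-3, Many-1)
nn (poets-3, Bengali-2)
nsubj (sung-5, poets-3)
aux (sung-5, have-4)
dobj (sung-5, songs-6)
prep_in (sung-5, praise-8)
det (land-11, this-10)
prep_of (praise-8, land-11)
"""

dependencies = parse_dependencies(PARSER_OUTPUT)
```

Keeping the word indices makes it possible to tell repeated words apart, which the word form alone cannot do.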
Tree Processing
Handling Auxiliary Verbs
  remove and postfix to their respective verb
  e.g. aux(sung, have) → sung_have
Handling Prepositions/Conjunctions
  extract the preposition from the relation and attach to parent/child
  e.g. prep_in(sung, praise) → prep(sung, praise_in)
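The two tree-processing rules can be sketched as a small transformation over (relation, head, dependent) triples. This is an illustrative sketch only: `process_tree` is a hypothetical name, the extracted preposition or conjunction is always attached to the child (the slide says "parent/child" but only shows the child case), and words are used as node identifiers, which assumes no word occurs twice in the sentence.

```python
def process_tree(deps):
    """Apply the auxiliary-verb and preposition/conjunction rules.

    deps is a list of (relation, head, dependent) triples.
    Returns a new list of triples with renamed nodes.
    """
    rename = {}      # original word -> word with postfixed material
    remaining = []

    # Rule 1: drop aux relations and postfix the auxiliary to its verb.
    for rel, head, child in deps:
        if rel == "aux":
            rename[head] = rename.get(head, head) + "_" + child
        else:
            remaining.append((rel, head, child))

    # Rule 2: move the word out of prep_X / conj_X and onto the child
    # (assumption: always the child, as in the slide's example).
    stripped = []
    for rel, head, child in remaining:
        if rel.startswith(("prep_", "conj_")):
            rel, word = rel.split("_", 1)
            rename[child] = rename.get(child, child) + "_" + word
        stripped.append((rel, head, child))

    # Apply the collected renames to both ends of every relation.
    return [(rel, rename.get(h, h), rename.get(c, c)) for rel, h, c in stripped]


raw = [
    ("amod", "poets", "Many"),
    ("nn", "poets", "Bengali"),
    ("nsubj", "sung", "poets"),
    ("aux", "sung", "have"),
    ("dobj", "sung", "songs"),
    ("prep_in", "sung", "praise"),
    ("det", "land", "this"),
    ("prep_of", "praise", "land"),
]
modified = process_tree(raw)
```

For the example sentence this yields the nodes sung_have, praise_in and land_of used in the re-ordering illustration.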
Modified Dependency Tree
Re-ordering
  Parent-Child Positioning
  Prioritizing the Relations
Re-ordering (Parent-Child Positioning)
Parent before child: conj (conjunction), appos (apposition), advcl (adverbial clause), ccomp (clausal complement), rcmod (relative clause modifier)
  e.g. "John cried because he fell" gives advcl(cry, fell). In Hindi, cry is ordered before fell.
Child before parent: nsubj (subject), dobj (object)
  e.g. "Ram eats mango" gives dobj(eat, mango). In Hindi, mango is ordered before eat.
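The positioning rule for a single parent-child pair can be expressed as a lookup on the relation name. The sketch below covers only the relations listed on this slide; `order_pair` is a hypothetical helper, and raising an error for unlisted relations is a choice made here, not something the slides specify.

```python
# Relations taken from the slide; any other relation is not covered here.
PARENT_BEFORE_CHILD = {"conj", "appos", "advcl", "ccomp", "rcmod"}
CHILD_BEFORE_PARENT = {"nsubj", "dobj"}


def order_pair(relation, parent, child):
    """Return [first, second] for one parent-child pair."""
    if relation in PARENT_BEFORE_CHILD:
        return [parent, child]
    if relation in CHILD_BEFORE_PARENT:
        return [child, parent]
    raise ValueError(f"no positioning rule listed for relation {relation!r}")


advcl_order = order_pair("advcl", "cry", "fell")   # parent first
dobj_order = order_pair("dobj", "eat", "mango")    # child first
```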
Re-ordering (Relation Priority)
Deciding the order in case of multiple children
Priority among relation pairs
Illustration - Re-ordering
Input Dependency Tree:
  sung_have
    nsubj: poets
      amod: Many
      nn: Bengali
    prep: praise_in
      prep: land_of
        det: this
    dobj: songs
Output, built up step by step:
  Many
  Many Bengali
  Many Bengali poets
  Many Bengali poets this
  Many Bengali poets this land of
  Many Bengali poets this land of praise in
  Many Bengali poets this land of praise in songs
  Many Bengali poets this land of praise in songs sung have
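The walk-through can be reproduced with a short recursive traversal. This is a sketch under stated assumptions, not the authors' rule table: the priority list `RELATION_ORDER` is inferred only from the output of this one example (nsubj before prep before dobj, amod before nn), every relation not in the parent-first set is treated as child-before-parent, and `reorder` and the `TREE` layout are hypothetical names.

```python
# Relations where the parent precedes the child (from the positioning slide).
PARENT_FIRST = {"conj", "appos", "advcl", "ccomp", "rcmod"}

# Assumed priority among sibling relations, inferred from this example only.
RELATION_ORDER = ["det", "amod", "nn", "nsubj", "prep", "dobj"]
RANK = {rel: i for i, rel in enumerate(RELATION_ORDER)}

# Modified dependency tree: parent -> list of (relation, child).
TREE = {
    "sung_have": [("nsubj", "poets"), ("prep", "praise_in"), ("dobj", "songs")],
    "poets": [("amod", "Many"), ("nn", "Bengali")],
    "praise_in": [("prep", "land_of")],
    "land_of": [("det", "this")],
}


def reorder(node, tree):
    """Return the words of the subtree rooted at node in re-ordered sequence."""
    # Sort siblings by relation priority; unknown relations go last.
    children = sorted(tree.get(node, []),
                      key=lambda rc: RANK.get(rc[0], len(RANK)))
    before, after = [], []
    for relation, child in children:
        words = reorder(child, tree)
        if relation in PARENT_FIRST:
            after.extend(words)
        else:
            before.extend(words)
    # A merged node such as "sung_have" is emitted as "sung have".
    return before + node.split("_") + after


reordered = " ".join(reorder("sung_have", TREE))
print(reordered)
```

Because each subtree is emitted as a contiguous block, the phrase "this land of praise in" stays together ahead of the object and the verb, as in the slide's final output.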
Experimental Setup
Procedure
  Train Moses on the training data with a 6-gram language model
  Tune Moses using the development data
  Decode the testing data using the trained Moses
This procedure is carried out on both the pure (original) data and the re-ordered data.
Results
Translation Example - I
Actual:
Baseline:
Re-ordered:
Translation Example - II
Actual:
Baseline: deliverance
Re-ordered:
Conclusion
Using linguistic knowledge appears to improve SMT quality.
The applicability of the BLEU score in this context needs to be investigated.
Acknowledgements
We acknowledge the Department of IT (DIT), Government of India and the English-to-Indian Languages (EILMT) consortium for making the EILMT tourism dataset available.
IIIT Data Set: Data acquired during the DARPA TIDES MT project 2003 and later refined at LTRC, IIIT-H.
References
[Hieu2008] Hieu Hoang, Philipp Koehn. Design of the Moses Decoder for Statistical Machine Translation. ACL Workshop on Software Engineering, Testing, and Quality Assurance for NLP, 2008.
[Marie2006] Marie-Catherine de Marneffe, Bill MacCartney, Christopher D. Manning. Generating Typed Dependency Parses from Phrase Structure Parses. In Proceedings of LREC-06, 2006.
[Manual2008] Stanford Dependencies Manual. Available at http://nlp.stanford.edu/software/dependencies_manual.pdf
[Moses] Moses Tutorial. Available at http://www.statmt.org/moses/?n=Moses.Tutorial
[Singh2007] Smriti Singh, Mrugunk Dalal, Vishal Vachhani, Pushpak Bhattacharyya, Om P. Damani. Hindi Generation from Interlingua (UNL). Machine Translation Summit XI, 2007.