
Enriching low resource Statistical Machine Translation using induced bilingual lexicons

Uso de léxicos bilingües inducidos para el enriquecimiento de un sistema de traducción automática estadística de pocos recursos

Han Jingyi, Núria Bel
Universitat Pompeu Fabra
Roc Boronat, 138, 08018, Barcelona
{jingyi.han, nuria.bel}@upf.edu

Abstract: In this work we present an experiment for enriching a Statistical Machine Translation (SMT) phrase table with automatically created bilingual word pairs. The bilingual lexicon is induced with a supervised classifier trained using a joint representation of word embeddings (WE) and Brown clusters (BC) of translation equivalent word pairs as features. The classifier reaches a 0.94 F-score, and the MT experiment results show an improvement of up to +0.70 BLEU over a low resource Chinese-Spanish phrase-based SMT baseline, demonstrating that bad entries delivered by the classifier are well handled.
Keywords: Machine translation, phrase table expansion, bilingual lexicon induction, Natural language processing

Resumen: En este artículo presentamos un método para ampliar la tabla de frases de un traductor automático estadístico con entradas bilingües creadas automáticamente con un clasificador supervisado. El clasificador es entrenado con una representación vectorial en la que se concatenan el vector distribuido (Word Embeddings, WE) y una representación de agrupaciones de Brown (Brown clusters, BC) de 2 palabras equivalentes de traducción. El clasificador alcanza una F1 de 0,94 y el resultado de la evaluación del sistema de traducción automática entre chino y español muestra una mejora de hasta +0,70 BLEU, demostrando que las malas traducciones producidas por el clasificador son bien controladas por el sistema de traducción.
Palabras clave: Traducción automática, Expansión de vocabulario, Inducción de léxicos bilingües, Procesamiento del lenguaje natural

1 Introduction

Parallel corpora are one of the key resources that allow Statistical Machine Translation (SMT) to learn translation correspondences at the level of words, phrases and treelets. Although parallel data are nowadays widely available for well-resourced language pairs such as English-Spanish and English-French, parallel corpora are still scarce or even non-existent for most other language pairs. With little or no parallel data, translation quality degrades to the point of making SMT unusable.

Many studies (Fung, 1995; Chiao and Zweigenbaum, 2002; Yu and Tsujii, 2009) attempt to alleviate the parallel data shortage by using comparable corpora, which are still not readily available for many language pairs. Monolingual corpora, on the contrary, are being created at an astonishing rate. Therefore, in this work, we propose to extend an SMT translation model by augmenting the phrase table with bilingual entries automatically learned from not necessarily related monolingual corpora. The bilingual lexicon was delivered by a Support Vector Machine (SVM) classifier trained using a joint representation of the word embeddings and Brown clusters of translation equivalents as features.

The main contributions of this paper are: (1) We present a supervised approach to automatically generate bilingual lexicons out of unrelated monolingual corpora with only a small quantity of translation training examples. (2) We show that enriching an SMT phrase table using all the results, including the errors delivered by the classifier, is indeed a simple and effective solution.

Procesamiento del Lenguaje Natural, Revista nº 59, septiembre de 2017, pp. 91-98. Recibido 30-03-2017, revisado 16-05-2017, aceptado 22-05-2017.

ISSN 1135-5948 © 2017 Sociedad Española para el Procesamiento del Lenguaje Natural

The rest of the paper is structured as follows: section 2 reviews previous work related to our approach; section 3 describes our supervised bilingual lexicon learning method; section 4 sets out the experimental framework; section 5 reports our test results; and section 6 gives the final conclusions of the work.

2 Related work

The use of monolingual resources to enrich translation models has been proposed by different researchers. For instance, Turchi and Ehrmann (2011), Mirkin et al. (2009) and Marton, Callison-Burch, and Resnik (2009) used morphological dictionaries and paraphrasing techniques to expand phrase tables with more inflected forms and lexical variants. Another line of work exploits graph propagation-based methods to generate new translations for unknown words. For instance, Razmara et al. (2013) proposed to induce lexicons by constructing a graph over source language monolingual text. Nodes with related meanings were connected, and nodes that had translations in the phrase table were annotated with target side translations and their feature values. A graph propagation algorithm was then used to propagate translations from labeled nodes to unlabeled nodes. They obtained an increase of up to 0.46 BLEU compared to their French-English baseline. Similarly, Saluja et al. (2014) presented a semi-supervised graph-based approach for generating new translation rules that leverages bilingual and monolingual data. However, all these methods generate new translation options by relying on the existing knowledge of a baseline phrase table.

In order to create new entries, Irvine and Callison-Burch (2013) used a log-linear classifier trained on various signals of translation equivalence (e.g., contextual similarity, temporal similarity, orthographic similarity and topic similarity) to induce word translation pairs from monolingual corpora. Irvine and Callison-Burch (2014) used these induced resources to expand the SMT phrase table. Since much noise was introduced, 30 monolingually-derived signals had to be applied as further translation table features to prune the new phrase pairs. Experiments were conducted on two different language pairs, achieving an improvement of +1.10 BLEU for Spanish-English and +0.55 BLEU for Hindi-English.

The challenge of bilingual lexicon induction from monolingual data has been of long-standing interest. The first work in this area, by Rapp (1995), was based on the hypothesis that translation equivalents in two languages have similar distributional profiles or co-occurrence patterns. Following this idea, Koehn and Knight (2002), Haghighi et al. (2008) and Schafer and Yarowsky (2002) combined context information and other monolingual features (e.g., relative frequency and orthographic substrings) of source and target language words to learn translation pairs from monolingual corpora. Recently, several works (Mikolov, Le, and Sutskever, 2013a; Vulic and Moens, 2015; Vulic and Korhonen, 2016; Chandar et al., 2014; Wang et al., 2016) proposed cross-lingual word embedding strategies to map words from a source language vector space to a target language vector space, and also demonstrated their effective application to bilingual lexicon induction.

The approach presented here is similar to the end-to-end experiments of Irvine and Callison-Burch (2014) and Irvine and Callison-Burch (2016), but to generate bilingual lexica, instead of using a large variety of monolingual signals to learn and prune new phrase pairs, our method trains an SVM classifier using WE vectors (Mikolov et al., 2013b) together with BC information (Brown et al., 1992) as features. To evaluate the impact of our bilingual lexica on SMT, we conducted our experiment on Chinese (ZH)-Spanish (ES). Although these are two of the most widely spoken languages in the world, to the best of our knowledge, they still suffer from the parallel data shortage problem. There are no direct SMT systems for this language pair, only rule-based ones (Costa-Jussa and Centelles, 2014; Costa-Jussa and Centelles, 2016), which still lack a lot of coverage.

3 Approach

In this section, we describe a simple approach to improve the performance of a low resource SMT system by augmenting the phrase table with new translation pairs generated from monolingual data. We treat bilingual lexicon generation as a binary classification problem: given a source word, the classifier predicts whether a target language word is its translation or not. Our classifier was trained with a seed lexicon of one thousand correct translation pairs (Section 4.1). We first used the concatenated WE of the source and target word as features to train an SVM binary classifier, following Han and Bel (2016). The trained model was then used to find possible translations for a given source word among the whole target language vocabulary.
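The pair representation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the embedding lookup tables `zh_vec` and `es_vec` and the toy vectors are hypothetical stand-ins for the trained 200-dimensional word embeddings.

```python
# Sketch: a candidate pair is represented by concatenating the source and
# target word embeddings, and an SVM decides right/no translation.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
zh_vec = {"演员": rng.normal(size=200)}           # hypothetical source embeddings
es_vec = {"actor": rng.normal(size=200),
          "tiempo": rng.normal(size=200)}          # hypothetical target embeddings

def pair_features(src, tgt):
    # Concatenate both 200-d embeddings into one 400-d pair vector.
    return np.concatenate([zh_vec[src], es_vec[tgt]])

X = np.array([pair_features("演员", "actor"),      # positive example
              pair_features("演员", "tiempo")])    # random negative example
y = np.array([1, 0])                               # right translation / no translation

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([pair_features("演员", "actor")]))
```

In practice the classifier is trained on the seed lexicon and then queried for every candidate target word of a given source word.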

However, the first results showed that some words were wrongly considered the translation of many different source words without being related to them in any meaningful way. This could be a consequence of the 'hubness problem' reported by Radovanovic et al. (2010). To improve the performance of our classifier, we decided to add a BC representation to our WE features, since Birch, Durrani, and Koehn (2013), Matthews et al. (2014), Tackstrom, McDonald, and Uszkoreit (2012) and Agerri and Rigau (2016) demonstrated that word clustering provides relevant information for cross-lingual tasks. Observing our data, semantically related words in the source monolingual corpus are grouped into the same class, while their translations likewise belong to a corresponding class in the target monolingual corpus. For instance, in our ZH monolingual corpus, 演员 (actor) and 记者 (journalist) belong to the cluster 011111110110, while their translations actor and periodista are both grouped into the corresponding cluster 11010100 in the ES monolingual corpus. Therefore, we added the BC of the source and target words as additional features, with the intention of helping the classifier rule out semantically unrelated target candidates to some extent.

To visualize the impact of using BC, in Figure 1 we plot the geometric arrangement of 6K word pairs (right translation and no translation1) represented by WE vectors alone and with additional BC information in a 3-dimensional space. Each point represents a word pair, since we concatenate the features of the source and target words. The change in the distribution of right translation and no translation pairs demonstrates that the joint representation does encode relevant information for the classification.

1 The definition of right translation and no translation is given in section 4.1.

Figure 1: Distributed representations of 6K word pairs (1K right translation and 5K no translation) with WE of 400 dimensions and with the combination of WE and BC of 800 dimensions. We used PCA to project the high dimensional vector representations down into a 3-dimensional space

4 Experimental setup

In this section, we describe the experimental settings for evaluating our approach. The outline of our experiments is: (i) generating the positive and negative training word pair lists; (ii) obtaining the corresponding word embedding vectors and (iii) Brown clusters from monolingual corpora; (iv) concatenating the representation features of each source word and its translation equivalent (or random words for negative instances); (v) training an SVM classifier using the concatenated representations; (vi) producing new translation word pairs from monolingual corpora using the trained classifier; and (vii) training the SMT system with the available parallel corpora plus the newly acquired translation word pairs.

4.1 Classifier datasets

To obtain the positive training set (right translation), a translation list was produced by first randomly extracting a list of about 1K nouns, verbs and adjectives2 (frequency range from 10 to 100K) from the ZH monolingual corpus. These randomly selected words were then translated from ZH to ES using the on-line Google Translator and manually revised.

To build the negative training set (no translation), we randomly selected non-related words from the monolingual corpus of each language and randomly combined them, with a ratio of 5 negative instances for each positive one3. The data set was split for training and testing: 1K positive and 5K negative word pairs for training; 300 positive and 1.5K negative word pairs for testing.
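The 1:5 positive/negative construction can be sketched as below. The seed pairs and word lists are toy stand-ins for the 1K manually revised ZH-ES translations; the sampling scheme is an illustrative assumption consistent with the description above.

```python
# Sketch: build positives from the seed lexicon and 5 random negatives
# per positive by pairing each source word with unrelated target words.
import random

random.seed(0)
seed_pairs = [("演员", "actor"), ("记者", "periodista"), ("时间", "tiempo")]
es_words = [t for _, t in seed_pairs]

positives = [(s, t, 1) for s, t in seed_pairs]
negatives = []
for s, t in seed_pairs:
    for _ in range(5):                         # 5 negatives per positive
        wrong = random.choice([w for w in es_words if w != t])
        negatives.append((s, wrong, 0))

dataset = positives + negatives
print(len(positives), len(negatives))
```

With the real 1K seed lexicon this yields the 1K positive and 5K negative training pairs described above.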

4.2 Word embedding

The monolingual corpora used for learning WE and BC were the Chinese Wikipedia Dump corpus4 (149M words) and the Spanish Wikipedia corpus5 (130M words, 2006 dump). WE were created with the Continuous Bag-of-words (CBOW) method as implemented in the word2vec6 tool, because it is faster and more suitable for large datasets (Mikolov, Le, and Sutskever, 2013a). To train the CBOW models we used the following parameters: window size 8, minimum word frequency 5, and 200 dimensions for both source and target vectors.

4.3 Brown clustering representation

Brown clusters7 were induced from the same monolingual corpora that were used for WE. We set c=200 for computational cost savings, although it might perform better with a larger number of clusters. In order to include BC in the word pair representations, instead of directly using the bit path, we used one-hot encoding. More specifically, 400 binary features were added to the concatenated WE vectors: 200 for each word, where each component represents one of the 200 word clusters for the source and target word, respectively.

2 For the PoS tagging of all corpora, we used the Stanford PoS Tagger (Toutanova, Dan Klein, and Singer, 2003).
3 We chose this unbalanced ratio to approach the actual distribution of the data to classify, since there will be many more no translation than right translation pairs.
4 https://archive.org/details/zhwiki_20100610
5 http://hdl.handle.net/10230/20047
6 https://code.google.com/archive/p/word2vec/
7 https://github.com/percyliang/brown-cluster

4.4 SVM Classifier

We built and tested an SVM8 classifier on ZH-ES using the datasets described in Section 4.1 for three word categories: noun, adjective and verb. The evaluation was twofold: we performed a 10-fold cross-validation with the training set, and we tested the model again on a held-out test set.
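The twofold evaluation can be sketched with scikit-learn (the paper used the WEKA implementation; the random features below stand in for the 800-d WE+BC pair representations):

```python
# Sketch: 10-fold cross-validation on the training set plus a held-out test,
# mirroring the double evaluation described above.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 800))            # toy 800-d pair representations
y = rng.integers(0, 2, size=600)           # right translation vs no translation

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = SVC()
cv_scores = cross_val_score(clf, X_tr, y_tr, cv=10)   # 10-fold CV accuracies
held_out = clf.fit(X_tr, y_tr).score(X_te, y_te)      # held-out accuracy
print(cv_scores.mean(), held_out)
```

On the random toy data both scores hover near chance; on the real WE+BC features the paper reports F1 scores up to 0.94.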

4.5 Phrase-based SMT setup

Our SMT system was built using the Moses phrase-based MT framework (Koehn et al., 2007). We used mgiza (Gao and Vogel, 2008) to align the parallel corpora and KenLM (Heafield, 2011) to train a 3-gram language model. We applied the standard phrase-based MT feature sets, including direct and inverse phrase and lexical translation probabilities. The reordering score was produced by a lexicalized reordering model (Koehn et al., 2005). The 'Good Turing' parameter9 was applied in order to reduce overestimated translation probabilities, since the parallel corpus contained many unigram phrase pairs provided by our classifier. For the evaluation, we used the BLEU metric (Papineni et al., 2002).

The parallel corpora used to train and test the SMT system were: Chinese-Spanish OpenSubtitles201310 (1M sentences) for training; a TAUS translation memory11 (2K sentences) and the UN corpus12 (2K sentences) for testing. To train the language model, we combined the Spanish Wikipedia corpus mentioned in Section 4.2 with the OpenSubtitles2013 target corpus.

The classifier was used to deliver, for each of about 3K selected source words (the most frequent words that were not present in the baseline phrase table), all the possible translation candidates found in combination with the 30K target words of the same PoS (for computational savings). All word pairs classified as right translation were then appended to the existing parallel corpora for training a new SMT system. Figure 2 shows the generation and integration of the new translation pairs.

8 As implemented in WEKA (Hall et al., 2009).
9 http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases
10 http://opus.lingfil.uu.se/OpenSubtitles2013.php
11 http://www.tauslabs.com/
12 http://opus.lingfil.uu.se/UN.php
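The generation step can be sketched as a nested loop over source words and same-PoS target candidates. Everything here is hypothetical scaffolding: `clf`, `pair_features` and the word lists stand in for the trained classifier, the WE+BC representation, and the 3K×30K candidate space.

```python
# Sketch: for each unseen source word, score all same-PoS target candidates
# and collect pairs classified as right translation; these are then appended
# to the parallel training data as one-word "sentence pairs".
def expand_parallel_corpus(src_words, tgt_words, clf, pair_features):
    new_pairs = []
    for s in src_words:                    # ~3K frequent OOV source words
        for t in tgt_words:                # ~30K same-PoS target words
            if clf(pair_features(s, t)) == 1:
                new_pairs.append((s, t))
    return new_pairs

# Toy run with a classifier that accepts one known pair.
gold = {("演员", "actor")}
pairs = expand_parallel_corpus(
    ["演员"], ["actor", "tiempo"],
    clf=lambda f: 1 if f in gold else 0,
    pair_features=lambda s, t: (s, t))
print(pairs)                               # [('演员', 'actor')]
```

Note that all accepted pairs are kept, including classifier errors; the language model is left to select among the competing candidates at decoding time.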

Figure 2: Algorithm for the generation and integration of supervised bilingual lexicons

5 Experimental results

We present here the evaluation results of the classifier and their impact on our low resource ZH-ES SMT system.

5.1 Results on bilingual lexicon induction

Table 1 shows the evaluation results of our classifier trained with WE and with the combination of WE and BC, in terms of precision (P), recall (R) and F1-measure (F1).

                 10-fold cross-validation    Held-out test set
                 P      R      F1            P      R      F1
WE       Yes    0.937  0.919  0.928         0.926  0.870  0.890
         No     0.984  0.988  0.986         0.976  0.987  0.981
WE+BC    Yes    0.955  0.935  0.945         0.955  0.920  0.937
         No     0.987  0.991  0.989         0.985  0.992  0.988

Table 1: Test results of the ZH-ES classifier trained with WE and with WE+BC

The evaluation results show that the classifiers are capable of finding the correct translation among all the candidates with the same PoS in the target monolingual corpus in most cases. With the classifier trained only using WE, we already obtained a precision and recall of 0.926 and 0.87, respectively, for right translation. To explore the relation between the performance of the classifier and the number of training instances, Figure 3 plots the learning curves (F1 and kappa value) over different percentages of positive training instances, from 100 (10%) to 900 (90%), with corresponding negative instances from 500 to 4500. It shows that the classifier achieved stable and good results with around 50% of the training instances.

Figure 3: Learning curves over different percentages of the training data for Chinese and Spanish

However, the classifier trained using only WE was not efficient in the following cases:

(i) Candidates affected by the hubness problem. After an error analysis, we realized that a small group of target words were repeatedly assigned as possible translations of many different source words, such as parte ('part'), nombre ('name') and tiempo ('time').

(ii) Semantically related candidates. Words that always occur in similar contexts or nearby tended to confuse the classifier and prevent it from making the right decision. For instance, the classifier assigned both turista ('tourist') and turismo ('tourism') as possible translations of the source word 旅游业 ('tourism').

After adding BC, both precision and recall improved, as shown in Table 1, demonstrating that BC indeed provided relevant information for ruling out many wrong translation candidates. In terms of accuracy, with BC the performance improved from 96.8 to 97.6, resulting in a considerable reduction of the number of word pairs classified as right translation. Note that 88.65M word pairs were presented to the classifier, from 2,955 source words combined with 30K target words. The WE classifier labeled 7% of the word pairs as right translation, while the WE+BC classifier labeled only 2.7%.

In order to verify that the classifier was not simply learning that particular BCs were associated with the right or wrong translation categories, we checked the distribution of the clusters in both categories: 57 different clusters were present in both positive and negative examples in the training data set, and 23 in the test set.

5.2 Evaluation on SMT translation table expansion

Table 2 shows the experimental results of the SMT system trained using the enriched parallel corpora. The system was tested on two different test sets (described in Section 4.5) and measured by the BLEU metric and the Out of Vocabulary (OOV) rate.

                      TAUS              UN
Setup                 BLEU   OOV        BLEU   OOV
Baseline              8.80   9.6%       10.81  6.8%
Baseline + 3K SBL     9.58   8.7%       11.42  5.9%

Table 2: BLEU and OOV test results of the baseline and the system developed with our supervised bilingual lexica (SBL)

According to the results shown in Table 2, with the new translation candidates given by our classifier, the performance of the SMT system improved over the baseline by up to +0.70 and +0.61 BLEU, and the OOV13 rate of the baseline system was reduced by around 0.9% on both test sets.
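The OOV rate used alongside BLEU can be sketched as the fraction of test tokens absent from the translation model's source vocabulary. The helper name and toy data below are illustrative, not the paper's implementation.

```python
# Sketch: OOV rate = share of test-set tokens not covered by the
# translation model's source-side vocabulary.
def oov_rate(test_sentences, vocab):
    tokens = [tok for sent in test_sentences for tok in sent]
    oov = sum(1 for tok in tokens if tok not in vocab)
    return oov / len(tokens)

vocab = {"文化", "多样性"}                 # toy source vocabulary
test = [["文化", "多样性", "支助"]]        # "支助" is unseen
print(round(oov_rate(test, vocab), 3))    # 0.333
```

Lowering this rate is precisely what appending the induced word pairs to the training corpus achieves.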

Table 3 shows several examples of translation outputs after adding the bilingual lexica, compared to the results of the baseline SMT system. Note that although all possible translation candidates delivered by the classifier are included, the SMT system is able to find the right translation, thus improving the quality of the translations with respect to OOVs, as expected.

13 The OOV words were generated as shown in http://www.statmt.org/moses/?n=Advanced.OOVs

Source: 文化多样性 (cultural diversity)
Reference: diversidad cultural
Baseline: 多样性. a la cultura
Baseline+SBL: diversidad cultural

Source: 负面影响 (negative impact)
Reference: consecuencias negativas
Baseline: la negativo
Baseline+SBL: un impacto negativo

Source: 继续支助 (continue supporting)
Reference: continúen apoyando
Baseline: seguir 支助
Baseline+SBL: estado manteniendo

Table 3: Translation examples of our SMT baseline and the system with acquired lexicons

6 Conclusions

This paper described a supervised approach to automatically learning bilingual lexicons from monolingual corpora in order to improve the performance of a Chinese to Spanish SMT system. Our experiment shows that an improvement of +0.7 BLEU is achieved even though an average of 800 translation pairs per source word was added to the existing parallel corpus. The high recall of our classifier ensures that more reliable translation candidates are introduced to the SMT system, and the language model component is able to handle the selection of the correct one, hence delivering better translation output. To further improve the classification performance, our future work includes combining our model with other models separately trained on multiple monolingual features using ensemble learning.

7 Acknowledgments

Han Jingyi was supported by the FI-DGR grant program of the Generalitat de Catalunya.

References

Agerri, R. and G. Rigau. 2016. Robust multilingual named entity recognition with shallow semi-supervised features. Artificial Intelligence, pages 63–82.

Birch, A., N. Durrani, and P. Koehn. 2013. Edinburgh SLT and MT system description for the IWSLT 2013 evaluation. In Proceedings of the 10th International Workshop on Spoken Language Translation, pages 40–48.

Brown, P. F., P. V. Desouza, R. L. Mercer, V. J. D. Pietra, and J. C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics, pages 467–479.

Chandar, S., S. Lauly, H. Larochelle, M. Khapra, B. Ravindran, V. C. Raykar, and A. Saha. 2014. An autoencoder approach to learning bilingual word representations. In Advances in Neural Information Processing Systems, pages 1853–1861.

Chiao, Y.-C. and P. Zweigenbaum. 2002. Looking for candidate translational equivalents in specialized, comparable corpora. In Proceedings of the 19th International Conference on Computational Linguistics, pages 1208–1212.

Costa-Jussa, M. R. and J. Centelles. 2014. Chinese-to-Spanish rule-based machine translation system. In Proceedings of the EACL Workshop on Hybrid Approaches to Translation (HyTra).

Costa-Jussa, M. R. and J. Centelles. 2016. Description of the Chinese-to-Spanish rule-based machine translation system developed using a hybrid combination of human annotation and statistical techniques. ACM Transactions on Asian and Low-Resource Language Information Processing.

Fung, P. 1995. Compiling bilingual lexicon entries from a non-parallel English-Chinese corpus. In Proceedings of the Third Workshop on Very Large Corpora, pages 173–183.

Gao, Q. and S. Vogel. 2008. Parallel implementations of word alignment tool. In Software Engineering, Testing, and Quality Assurance for Natural Language Processing, pages 49–57.

Haghighi, A., P. Liang, T. Berg-Kirkpatrick, and D. Klein. 2008. Learning bilingual lexicons from monolingual corpora. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 771–779.

Han, J. and N. Bel. 2016. Towards producing bilingual lexica from monolingual corpora. In Proceedings of the International Conference on Language Resources and Evaluation, pages 2222–2227.

Heafield, K. 2011. KenLM: Faster and smaller language model queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 187–197.

Irvine, A. and C. Callison-Burch. 2013. Supervised bilingual lexicon induction with multiple monolingual signals. In Proceedings of HLT-NAACL '13, pages 518–523.

Irvine, A. and C. Callison-Burch. 2014. Hallucinating phrase translations for low resource MT. In Proceedings of the Conference on Computational Natural Language Learning, pages 160–170.

Irvine, A. and C. Callison-Burch. 2016. End-to-end statistical machine translation with zero or small parallel texts. Natural Language Engineering, pages 517–548.

Koehn, P., A. Axelrod, A. B. Mayne, C. Callison-Burch, M. Osborne, and D. Talbot. 2005. Edinburgh system description for the 2005 IWSLT speech translation evaluation. In Proceedings of the MT Summit, pages 79–86.

Koehn, P., H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pages 177–180.

Koehn, P. and K. Knight. 2002. Learning a translation lexicon from monolingual corpora. In Proceedings of the ACL Workshop on Unsupervised Lexical Acquisition, pages 9–16.

Marton, Y., C. Callison-Burch, and P. Resnik. 2009. Improved statistical machine translation using monolingually-derived paraphrases. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 381–390.

Matthews, A., W. Ammar, A. Bhatia, W. Feely, G. Hanneman, E. Schlinger, S. Swayamdipta, Y. Tsvetkov, A. Lavie, and C. Dyer. 2014. The CMU machine translation systems at WMT 2014. In Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 142–149.

Mikolov, T., K. Chen, G. Corrado, and J. Dean. 2013b. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Mikolov, T., Q. V. Le, and I. Sutskever. 2013a. Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168.

Mirkin, S., L. Specia, N. Cancedda, I. Dagan, M. Dymetman, and I. Szpektor. 2009. Source-language entailment modeling for translating unknown terms. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 791–799.

Papineni, K., S. Roukos, T. Ward, and W.-J. Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 311–318.

Radovanovic, M., A. Nanopoulos, and M. Ivanovic. 2010. Hubs in space: Popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research, 11.

Rapp, R. 1995. Identifying word translations in non-parallel texts. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 320–322.

Razmara, M., M. Siahbani, G. Haffari, and A. Sarkar. 2013. Graph propagation for paraphrasing out-of-vocabulary words in statistical machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Saluja, A., H. Hassan, K. Toutanova, and C. Quirk. 2014. Graph-based semi-supervised learning of translation models from monolingual data. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 676–686.

Schafer, C. and D. Yarowsky. 2002. Inducing translation lexicons via diverse similarity measures and bridge languages. In Proceedings of the Conference on Natural Language Learning, pages 1–7.

Toutanova, K., D. Klein, C. Manning, and Y. Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of HLT-NAACL '03, pages 252–259.

Turchi, M. and M. Ehrmann. 2011. Knowledge expansion of a statistical machine translation system using morphological resources. Research Journal on Computer Science and Computer Engineering with Application (Polibits).

Tackstrom, O., R. McDonald, and J. Uszkoreit. 2012. Cross-lingual word clusters for direct transfer of linguistic structure. In Proceedings of HLT-NAACL '12.

Vulic, I. and A. Korhonen. 2016. On the role of seed lexicons in learning bilingual word embeddings. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 247–257.

Vulic, I. and M.-F. Moens. 2015. Bilingual word embeddings from non-parallel document-aligned data applied to bilingual lexicon induction. In Proceedings of the Annual Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing, pages 719–725.

Wang, R., H. Zhao, S. Ploux, B.-L. Lu, M. Utiyama, and E. Sumita. 2016. A novel bilingual word embedding method for lexical translation using bilingual sense clique. arXiv preprint arXiv:1607.08692.

Yu, K. and J. Tsujii. 2009. Bilingual dictionary extraction from Wikipedia. In Proceedings of the Twelfth Machine Translation Summit, pages 379–386.
