Extraction of domain-specific bilingual lexicon from comparable corpora
compositional translation and ranking
Estelle Delpech (1), Béatrice Daille (1), Emmanuel Morin (1), Claire Lemaire (2, 3)
(1) LINA, Université de Nantes; (2) GREMUTS, Université de Grenoble; (3) Lingua et Machina
COLING'12, 10/12/12, Mumbai, India
Outline
1 Context
2 Translation method
3 Ranking method
4 Results of experiments
5 Future work
Context: comparable corpora for Computer-Aided Translation
Aim: provide domain-specific bilingual lexicons to translators when no parallel data is available
⇒ Comparable corpora:
- A set of texts in languages L1 and L2 that are not translations of one another but deal with the same subject matter, so that translation pairs can still be extracted
Motivations for compositional translation
Usual context-based methods [Fung, 1997]:
- 51% to 88% precision on the top 20 candidates with specialized corpora [Daille and Morin, 2005]
⇒ lexicons difficult for translators to use [Delpech, 2011]
Compositional translation:
- 81% to 94% precision on Top 1 [Robitaille et al., 2006, Cartoni, 2009, Morin and Daille, 2009]
- More than 60% of terms in technical and scientific domains are morphologically complex [Namer and Baud, 2007]
- Outperforms context-based approaches for the translation of terms with compositional meaning [Morin and Daille, 2009]
Compositional translation
Compositionality
"the meaning of the whole is a function of the meaning of the parts" [Keenan and Faltz, 1985, 24-25]
Input: "ab"
Decompose: {a, b}
Translate: {α, β}
Reorder: {αβ, βα}
Select: αβ
Output: "αβ"
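The four steps above can be sketched in a few lines of Python. This is only an illustration of the general scheme, not the authors' implementation: the function name, the toy lexicon, and the use of a set of attested target terms for the selection step are all assumptions made for the example.

```python
from itertools import permutations, product

def compositional_translate(term, decompose, lexicon, attested):
    """Decompose, translate, reorder, then keep only attested candidates."""
    parts = decompose(term)                          # "ab" -> ["a", "b"]
    options = [lexicon.get(p, []) for p in parts]    # translations of each part
    selected = []
    for choice in product(*options):                 # one translation per part
        for order in permutations(choice):           # every ordering of the parts
            candidate = " ".join(order)
            if candidate in attested and candidate not in selected:
                selected.append(candidate)           # selection step (assumed: attestation)
    return selected

# Hypothetical toy data matching the abstract example on the slide.
toy_lexicon = {"a": ["α"], "b": ["β"]}
toy_attested = {"α β"}
print(compositional_translate("ab", list, toy_lexicon, toy_attested))
```

If any part has no entry in the lexicon, the product is empty and no candidate is returned, which matches the idea that a compositional translation needs every component to be translatable.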
Related work
Applied to phrases, decomposed into words [Robitaille et al., 2006, Morin and Daille, 2009]
- rate of evaporation → taux d'évaporation
Applied to words, decomposed into morphemes [Cartoni, 2009, Harastani et al., 2012]
- cardiology → cardiologie
- ricostruire → rebuild
⇒ No approach links bound morphemes to words:
- -cyto- → cellule 'cell'
- cytotoxic → toxique pour les cellules 'toxic to the cells'
Selection and ranking methods
Select translations that occur in target texts / Web [Morin and Daille, 2009]
Select most frequent translation [Grefenstette, 1999]
Compare contexts [Garera and Yarowsky, 2008]
ML: binary classifier [Baldwin and Tanaka, 2004]
⇒ Combination of criteria
⇒ ML: learning-to-rank algorithms (IR)
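To make the idea of combining criteria concrete, here is a minimal sketch that ranks candidates by a weighted sum of criterion values. It is not the ranking model of the talk: the criterion names, the values, and the weights are invented for illustration, and in a learning-to-rank setting the weights would be learned from annotated data rather than set by hand.

```python
def rank(candidates, weights):
    """Sort candidate translations by a weighted sum of their criteria, best first."""
    def score(features):
        return sum(weights[name] * value for name, value in features.items())
    return sorted(candidates, key=lambda c: score(candidates[c]), reverse=True)

# Hypothetical candidates with invented, normalised criterion values.
toy_candidates = {
    "toxique pour les cellules": {"frequency": 0.8, "context_similarity": 0.6},
    "cellule toxique": {"frequency": 0.1, "context_similarity": 0.2},
}
toy_weights = {"frequency": 0.5, "context_similarity": 1.0}  # hand-set, not learned

print(rank(toy_candidates, toy_weights))
```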
Translation process overview
Input: "non-cytotoxic"
Decompose: {non, cyto, toxic}
Concatenate: {non, cyto, toxic}, {noncyto, toxic}, {non, cytotoxic}, {noncytotoxic}
Translate: {non, cellule, toxique}, {non, cyto, toxique}, {non, cellule, toxicité}, {non, cyto, toxicité}
Reorder: {non, toxique, cellule}, {non, cellule, toxique}, {cellule, toxique, non}
Concatenate: {non, toxique, cellule}, {nontoxique, cellule}, {non, toxiquecellule}, {nontoxiquecellule}
Match: {non, toxique, cellule}
Output: "non toxique pour les cellules" 'non toxic to the cells'
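The candidate-generation steps of this overview (concatenate, translate, reorder, concatenate) can be sketched as follows. The sketch assumes a toy lexicon copied from the slide's example and generates every permutation in the reorder step, which may be more than the actual system does. The final Match step against the target corpus is not shown, and the names `concatenations`, `generate` and `LEXICON` are hypothetical.

```python
from itertools import permutations, product

# Toy lexicon mirroring the example above (assumption: a real system would
# use bilingual dictionaries and morpheme translation tables).
LEXICON = {
    "non": ["non"],
    "cyto": ["cellule", "cyto"],
    "toxic": ["toxique", "toxicité"],
}

def concatenations(components):
    """Yield every way of merging adjacent components, keeping their order."""
    n = len(components)
    if n == 0:
        return
    for mask in range(2 ** (n - 1)):
        groups = [components[0]]
        for i in range(1, n):
            if (mask >> (i - 1)) & 1:       # bit set: glue component i to the previous group
                groups[-1] += components[i]
            else:
                groups.append(components[i])
        yield tuple(groups)

def generate(components, lexicon=LEXICON):
    """Concatenate, translate, reorder and concatenate again (no matching)."""
    results = set()
    for groups in concatenations(components):
        options = [lexicon.get(g) for g in groups]
        if not all(options):                # some group has no translation: skip it
            continue
        for translated in product(*options):
            for order in permutations(translated):
                results.update(concatenations(order))
    return results

candidates = generate(["non", "cyto", "toxic"])
print(("non", "toxique", "cellule") in candidates)
```

A sequence of n components has 2^(n-1) concatenations, so the candidate set grows quickly. That growth is one reason a matching or ranking step is needed afterwards.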
7 / 31
![Page 35: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/35.jpg)
ContextTranslation method
Ranking methodResults of experiments
Future work
Translation process overview
Input : ”non-cytotoxic”
Decompose {non, cyto, toxic}Concatenate {non, cyto, toxic} , {noncyto, toxic}, {non,
cytotoxic} , {noncytotoxic}Translate {non, cellule, toxique}, {non, cyto, toxique},
{non, cellule, toxicite}, {non, cyto, toxicite}Reorder {non, toxique, cellule}, {non, cellule, toxique},
{cellule, toxique, non}Concatenate {non, toxique, cellule}, {nontoxique, cellule},
{non, toxiquecellule}, {nontoxiquecellule}Match {non, toxique, cellule}
Output : ”non toxique pour les cellules” ’non toxic to thecells’
7 / 31
![Page 36: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/36.jpg)
ContextTranslation method
Ranking methodResults of experiments
Future work
Translation process overview
Input : ”non-cytotoxic”
Decompose {non, cyto, toxic}
Concatenate {non, cyto, toxic} , {noncyto, toxic}, {non,cytotoxic} , {noncytotoxic}
Translate {non, cellule, toxique}, {non, cyto, toxique},{non, cellule, toxicite}, {non, cyto, toxicite}
Reorder {non, toxique, cellule}, {non, cellule, toxique},{cellule, toxique, non}
Concatenate {non, toxique, cellule}, {nontoxique, cellule},{non, toxiquecellule}, {nontoxiquecellule}
Match {non, toxique, cellule}Output : ”non toxique pour les cellules” ’non toxic to thecells’
7 / 31
![Page 37: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/37.jpg)
ContextTranslation method
Ranking methodResults of experiments
Future work
Translation process overview
Input : ”non-cytotoxic”
Decompose {non, cyto, toxic}Concatenate {non, cyto, toxic} , {noncyto, toxic}, {non,
cytotoxic} , {noncytotoxic}
Translate {non, cellule, toxique}, {non, cyto, toxique},{non, cellule, toxicite}, {non, cyto, toxicite}
Reorder {non, toxique, cellule}, {non, cellule, toxique},{cellule, toxique, non}
Concatenate {non, toxique, cellule}, {nontoxique, cellule},{non, toxiquecellule}, {nontoxiquecellule}
Match {non, toxique, cellule}Output : ”non toxique pour les cellules” ’non toxic to thecells’
7 / 31
![Page 38: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/38.jpg)
ContextTranslation method
Ranking methodResults of experiments
Future work
Translation process overview
Input : ”non-cytotoxic”
Decompose {non, cyto, toxic}Concatenate {non, cyto, toxic} , {noncyto, toxic}, {non,
cytotoxic} , {noncytotoxic}
Translate {non, cellule, toxique}, {non, cyto, toxique},{non, cellule, toxicite}, {non, cyto, toxicite}
Reorder {non, toxique, cellule}, {non, cellule, toxique},{cellule, toxique, non}
Concatenate {non, toxique, cellule}, {nontoxique, cellule},{non, toxiquecellule}, {nontoxiquecellule}
Match {non, toxique, cellule}Output : ”non toxique pour les cellules” ’non toxic to thecells’
7 / 31
![Page 39: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/39.jpg)
ContextTranslation method
Ranking methodResults of experiments
Future work
Translation process overview
Input : ”non-cytotoxic”
Decompose {non, cyto, toxic}Concatenate {non, cyto, toxic} , {noncyto, toxic}, {non,
cytotoxic} , {noncytotoxic}Translate {non, cellule, toxique}, {non, cyto, toxique},
{non, cellule, toxicite}, {non, cyto, toxicite}
Reorder {non, toxique, cellule}, {non, cellule, toxique},{cellule, toxique, non}
Concatenate {non, toxique, cellule}, {nontoxique, cellule},{non, toxiquecellule}, {nontoxiquecellule}
Match {non, toxique, cellule}Output : ”non toxique pour les cellules” ’non toxic to thecells’
7 / 31
![Page 40: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/40.jpg)
ContextTranslation method
Ranking methodResults of experiments
Future work
Translation process overview
Input : ”non-cytotoxic”
Decompose {non, cyto, toxic}Concatenate {non, cyto, toxic} , {noncyto, toxic}, {non,
cytotoxic} , {noncytotoxic}Translate {non, cellule, toxique}, {non, cyto, toxique},
{non, cellule, toxicite}, {non, cyto, toxicite}
Reorder {non, toxique, cellule}, {non, cellule, toxique},{cellule, toxique, non}
Concatenate {non, toxique, cellule}, {nontoxique, cellule},{non, toxiquecellule}, {nontoxiquecellule}
Match {non, toxique, cellule}Output : ”non toxique pour les cellules” ’non toxic to thecells’
7 / 31
![Page 41: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/41.jpg)
ContextTranslation method
Ranking methodResults of experiments
Future work
Translation process overview
Input : ”non-cytotoxic”
Decompose {non, cyto, toxic}Concatenate {non, cyto, toxic} , {noncyto, toxic}, {non,
cytotoxic} , {noncytotoxic}Translate {non, cellule, toxique}, {non, cyto, toxique},
{non, cellule, toxicite}, {non, cyto, toxicite}Reorder {non, toxique, cellule}, {non, cellule, toxique},
{cellule, toxique, non}
Concatenate {non, toxique, cellule}, {nontoxique, cellule},{non, toxiquecellule}, {nontoxiquecellule}
Match {non, toxique, cellule}Output : ”non toxique pour les cellules” ’non toxic to thecells’
7 / 31
![Page 42: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/42.jpg)
ContextTranslation method
Ranking methodResults of experiments
Future work
Translation process overview
Input : ”non-cytotoxic”
Decompose {non, cyto, toxic}Concatenate {non, cyto, toxic} , {noncyto, toxic}, {non,
cytotoxic} , {noncytotoxic}Translate {non, cellule, toxique}, {non, cyto, toxique},
{non, cellule, toxicite}, {non, cyto, toxicite}Reorder {non, toxique, cellule}, {non, cellule, toxique},
{cellule, toxique, non}
Concatenate {non, toxique, cellule}, {nontoxique, cellule},{non, toxiquecellule}, {nontoxiquecellule}
Match {non, toxique, cellule}Output : ”non toxique pour les cellules” ’non toxic to thecells’
7 / 31
![Page 43: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/43.jpg)
ContextTranslation method
Ranking methodResults of experiments
Future work
Translation process overview
Input : ”non-cytotoxic”
Decompose {non, cyto, toxic}Concatenate {non, cyto, toxic} , {noncyto, toxic}, {non,
cytotoxic} , {noncytotoxic}Translate {non, cellule, toxique}, {non, cyto, toxique},
{non, cellule, toxicite}, {non, cyto, toxicite}Reorder {non, toxique, cellule}, {non, cellule, toxique},
{cellule, toxique, non}Concatenate {non, toxique, cellule}, {nontoxique, cellule},
{non, toxiquecellule}, {nontoxiquecellule}
Match {non, toxique, cellule}Output : ”non toxique pour les cellules” ’non toxic to thecells’
7 / 31
![Page 47: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/47.jpg)
Decomposition
non-cytotoxic → {non, cyto, toxic}
Split the source term into minimal components with heuristic rules:
- split on hyphens
- match substrings of the source term with:
  - a list of morphemes
  - a list of lexical items
- respect some length constraints on the substrings
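The decomposition rules can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `min_len` threshold of 3 characters, and the choice of preferring the segmentation with the most parts are all assumptions made for the example.

```python
from functools import lru_cache


def split_chunk(chunk, known, min_len=3):
    """Segment `chunk` into known substrings, preferring the most parts.

    Falls back to the unsplit chunk when no full segmentation exists.
    `min_len` is a hypothetical length constraint on each substring.
    """
    @lru_cache(maxsize=None)
    def best(start):
        if start == len(chunk):
            return ()
        candidates = []
        for end in range(start + min_len, len(chunk) + 1):
            piece = chunk[start:end]
            if piece in known:
                rest = best(end)
                if rest is not None:
                    candidates.append((piece,) + rest)
        return max(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else [chunk]


def decompose(term, morphemes, lexicon, min_len=3):
    """Split on hyphens, then match substrings against both word lists."""
    known = {m.lower() for m in morphemes} | {w.lower() for w in lexicon}
    components = []
    for chunk in term.lower().split("-"):
        components.extend(split_chunk(chunk, known, min_len))
    return components


print(decompose("non-cytotoxic", {"cyto"}, {"non", "toxic", "cytotoxic"}))
```

With these toy lists, "cytotoxic" is split into "cyto" and "toxic" even though it is itself a lexical item, because the sketch favors minimal components.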
![Page 52: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/52.jpg)
Concatenation
Generate all possible concatenations of the minimal components
Increases the chances of matching the components with entries of the dictionaries
{non, cyto, toxic} → {non, cyto, ∅}
{non, cytotoxic} → {non, cytotoxique}
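One way to enumerate the concatenations is to decide, for each boundary between adjacent components, whether to glue or not. The sketch below assumes components keep their order at this stage; the function name is hypothetical.

```python
from itertools import product


def concatenations(components):
    """Return every way of gluing adjacent components, order preserved.

    For n components there are 2 ** (n - 1) results, one per choice of
    "glue / do not glue" at each of the n - 1 boundaries.
    """
    if not components:
        return []
    results = []
    for glue in product([False, True], repeat=len(components) - 1):
        merged = [components[0]]
        for comp, join in zip(components[1:], glue):
            if join:
                merged[-1] += comp      # glue onto the previous piece
            else:
                merged.append(comp)     # start a new piece
        results.append(merged)
    return results


for variant in concatenations(["non", "cyto", "toxic"]):
    print(variant)
```

For the three components of "non-cytotoxic" this yields the four variants listed in the process overview.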
![Page 55: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/55.jpg)
Translation with direct dictionary look-up
Bilingual dictionary for lexical items:
- toxic → toxique
Morpheme translation table for bound morphemes:
- allow bound-to-free morpheme translation equivalence
- -cyto- → -cyto-, cellule
{-cyto-, toxic} → {-cyto-, toxique}, {cellule, toxique}
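The look-up step amounts to a Cartesian product over the translations of each component. The sketch below uses plain dictionaries as stand-ins for the bilingual dictionary and the morpheme table. Discarding a decomposition as soon as one component has no translation is an assumption of this example.

```python
from itertools import product


def translate(components, dictionary, morpheme_table):
    """Combine the translations of each component in every possible way.

    `dictionary` and `morpheme_table` map a source component to a list of
    target components. Returns [] if any component is untranslatable.
    """
    options = []
    for comp in components:
        candidates = morpheme_table.get(comp, []) + dictionary.get(comp, [])
        if not candidates:
            return []
        options.append(candidates)
    return [list(combo) for combo in product(*options)]


dictionary = {"toxic": ["toxique"]}
morpheme_table = {"-cyto-": ["-cyto-", "cellule"]}
print(translate(["-cyto-", "toxic"], dictionary, morpheme_table))
```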
![Page 59: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/59.jpg)
Translation with variation
Morphological lexicon
- toxic → toxique → toxicité 'toxicity'
Synonyms
- toxic → toxique → vénéneux 'poisonous'
{-cyto-, toxic} → {-cyto-, toxicité}, {-cyto-, vénéneux}, {cellule, toxicité}, {cellule, vénéneux}
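Variation can be sketched as a second look-up applied to each direct translation. In this illustration every candidate is tagged with the resource that produced it, which is a design choice of the example (it makes a later reliability score easy to compute), not something stated in the slides. The resource labels are hypothetical, and unlike the slide's example set the function also returns the direct translation.

```python
def candidate_translations(component, dictionary, morph_families, synonyms):
    """Return (translation, resource) pairs for one source component.

    `morph_families` and `synonyms` map a target word to related target
    words; both are looked up from the direct dictionary translation.
    """
    results = []
    for direct in dictionary.get(component, []):
        results.append((direct, "dictionary"))
        for variant in morph_families.get(direct, []):
            results.append((variant, "morphology"))
        for variant in synonyms.get(direct, []):
            results.append((variant, "synonyms"))
    return results


print(candidate_translations(
    "toxic",
    dictionary={"toxic": ["toxique"]},
    morph_families={"toxique": ["toxicité"]},
    synonyms={"toxique": ["vénéneux"]},
))
```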
![Page 63: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/63.jpg)
Reordering
No translation patterns or reordering rules
Permute the translated components:
{cellule, toxique} → {cellule, toxique}, {toxique, cellule}
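Since no reordering rules are used, the step reduces to enumerating permutations. A minimal sketch with the standard library follows; the removal of duplicate orders (which only matters when a component is repeated) is an addition of this example.

```python
from itertools import permutations


def reorderings(components):
    """Return all distinct orderings of the translated components."""
    seen = set()
    ordered = []
    for perm in permutations(components):
        if perm not in seen:
            seen.add(perm)
            ordered.append(list(perm))
    return ordered


print(reorderings(["cellule", "toxique"]))
```

The number of orderings grows factorially with the number of components, so this brute-force approach is only practical for the short component lists produced by decomposition.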
![Page 66: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/66.jpg)
Concatenation
Recreate target words by generating all possible concatenations of the components:
{toxique, cellule} → {toxique cellule}, {toxiquecellule}
![Page 68: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/68.jpg)
Selection
Match target words with the words of the target corpus
Allow at most 3 stop words between two words
{toxique cellule} → "toxique pour les cellules" 'toxic to the cells'
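The matching rule can be sketched over a tokenized corpus. The example assumes the corpus has already been lemmatized (so that "cellules" is compared as "cellule"); the slides do not say how inflected forms are handled, and the function name and stop-word list are hypothetical.

```python
def find_matches(target_words, tokens, stop_words, max_gap=3):
    """Return (start, end) spans where `target_words` occur in order,
    with at most `max_gap` stop words between consecutive words."""
    if not target_words:
        return []
    spans = []
    for start, token in enumerate(tokens):
        if token != target_words[0]:
            continue
        pos = start
        matched = True
        for word in target_words[1:]:
            nxt = pos + 1
            gap = 0
            # Skip over stop words, but never more than max_gap of them.
            while (nxt < len(tokens) and gap < max_gap
                   and tokens[nxt] in stop_words and tokens[nxt] != word):
                nxt += 1
                gap += 1
            if nxt < len(tokens) and tokens[nxt] == word:
                pos = nxt
            else:
                matched = False
                break
        if matched:
            spans.append((start, pos + 1))
    return spans


tokens = ["non", "toxique", "pour", "le", "cellule"]
print(find_matches(["toxique", "cellule"], tokens, {"pour", "le"}))
```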
![Page 73: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/73.jpg)
Target term frequency
Number of occurrences of the target term divided by the total number of occurrences in the target texts
$$\mathrm{Freq}(t) = \frac{\mathrm{occ}(t)}{N}$$
![Page 75: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/75.jpg)
Context similarity measure
Corresponds to context-based approaches
Collect words co-occurring with the source and target terms in a window of 5 words
Normalize co-occurrences with the log-likelihood ratio
Compare contexts with weighted Jaccard
$$\mathrm{Cont}(s, t) = \frac{\sum_{w \in s \cap t} \min(c(s, w), c(t, w))}{\sum_{w \in s \cup t} \max(c(s, w), c(t, w))}$$
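The weighted Jaccard comparison can be sketched over two context vectors stored as dictionaries. The example assumes both vectors are already expressed over the same vocabulary (in context-based approaches the source vector is typically translated first; the slides do not detail this step) and that the association weights are non-negative.

```python
def weighted_jaccard(ctx_s, ctx_t):
    """Weighted Jaccard similarity between two context vectors.

    `ctx_s` and `ctx_t` map a context word w to its association weight
    c(s, w) or c(t, w). A word missing from a vector has weight 0.
    """
    shared = ctx_s.keys() & ctx_t.keys()
    union = ctx_s.keys() | ctx_t.keys()
    numerator = sum(min(ctx_s[w], ctx_t[w]) for w in shared)
    denominator = sum(max(ctx_s.get(w, 0.0), ctx_t.get(w, 0.0)) for w in union)
    return numerator / denominator if denominator else 0.0


source_ctx = {"cell": 2.0, "dose": 1.0}
target_ctx = {"cell": 1.0, "tumor": 3.0}
print(weighted_jaccard(source_ctx, target_ctx))
```

In this toy case the numerator is min(2, 1) = 1 and the denominator is 2 + 1 + 3 = 6, giving 1/6.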
![Page 80: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/80.jpg)
Part-of-speech translation probability
Probability that a source term with part of speech A translates to a target term with part of speech B
$$\mathrm{Pos}(s, t) = P(\mathrm{pos}(t) \mid \mathrm{pos}(s)) = P(B \mid A)$$
Acquired from POS-tagged parallel corpora [Tiedemann, 2009] with word alignment software AnyMalign [Lardrilleux, 2008]
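Given a list of aligned (source POS, target POS) pairs, the conditional probabilities can be estimated by relative frequency. The slides do not state which estimator was used, so the maximum-likelihood estimate below is an assumption, as are the tag names; the alignment itself is taken as given.

```python
from collections import Counter


def pos_translation_probs(aligned_pos_pairs):
    """Estimate P(target POS | source POS) by relative frequency.

    `aligned_pos_pairs` is a list of (source_pos, target_pos) tuples,
    one per aligned word or term pair.
    """
    pair_counts = Counter(aligned_pos_pairs)
    source_counts = Counter(src for src, _ in aligned_pos_pairs)
    probs = {}
    for (src, tgt), count in pair_counts.items():
        probs.setdefault(src, {})[tgt] = count / source_counts[src]
    return probs


pairs = [("ADJ", "ADJ")] * 3 + [("ADJ", "NOUN")]
print(pos_translation_probs(pairs))
```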
![Page 83: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/83.jpg)
Resources reliability score
Some translation resources might give more reliable translations than others
- e.g. bilingual dictionary > synonyms
- score = mean of the reliability of the resources used for translating the components
$$\mathrm{Reso}(t = \{c_1, \ldots, c_n\}) = \frac{\sum_{i=1}^{n} \mathrm{resource\_reliability}(c_i)}{n}$$
Tuned on training data
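The score is a plain mean over the resources used, one per translated component. In the sketch below the reliability values are made up for illustration; the slides only say that they are tuned on training data.

```python
def reso(resources_used, reliability):
    """Mean reliability of the resources used to translate each component.

    `resources_used` lists one resource name per component;
    `reliability` maps a resource name to its (tuned) reliability.
    """
    if not resources_used:
        raise ValueError("at least one component is required")
    return sum(reliability[r] for r in resources_used) / len(resources_used)


# Hypothetical reliabilities, not values from the experiments.
reliability = {"dictionary": 1.0, "morphemes": 0.75, "synonyms": 0.5}
print(reso(["dictionary", "synonyms"], reliability))
```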
![Page 87: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/87.jpg)
Combination
Linear combination of the 4 criteria: Frequency, Context, Part-of-speech translation probability and Resources reliability
$$\mathrm{Combi}(t, s) = \mathrm{Freq}(t) + \mathrm{Cont}(s, t) + \mathrm{Pos}(s, t) + \mathrm{Reso}(t)$$
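Ranking with the combined score can be sketched as an unweighted sum followed by a sort. The candidate records and their score values below are invented for the example, and the sketch assumes the four scores are on comparable scales.

```python
def combi(scores):
    """Unweighted sum of the four criteria for one candidate translation."""
    return scores["freq"] + scores["cont"] + scores["pos"] + scores["reso"]


def rank(candidates):
    """Sort candidate translations by decreasing combined score."""
    return sorted(candidates, key=lambda c: combi(c["scores"]), reverse=True)


# Hypothetical candidates and scores, for illustration only.
candidates = [
    {"term": "cellule toxique",
     "scores": {"freq": 0.125, "cont": 0.25, "pos": 0.5, "reso": 0.75}},
    {"term": "non toxique pour les cellules",
     "scores": {"freq": 0.25, "cont": 0.5, "pos": 0.75, "reso": 1.0}},
]
for candidate in rank(candidates):
    print(candidate["term"], combi(candidate["scores"]))
```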
![Page 89: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/89.jpg)
Machine learning
Learning-to-rank algorithms used in IR for ranking documents
Tried 3 algorithms implemented in the RankLib software¹
- AdaRank [Li and Xu, 2007]
- Coordinate Ascent [Metzler and Croft, 2000]
- LambdaMART [Wu et al., 2010]
Features: Freq, Cont, Pos, Reso
¹ http://people.cs.umass.edu/~vdang/ranklib.html
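To feed the four features to a learning-to-rank tool, each candidate translation has to be serialized as one training line. The sketch below writes the LETOR/SVMlight-style layout (`label qid:ID 1:v 2:v ... # comment`) that RankLib is generally documented to read; treating each source term as one query, the feature order, and the 0/1 labels are assumptions of this example, and the exact format should be checked against the RankLib documentation.

```python
def to_ranklib_line(label, query_id, features, comment=""):
    """Format one candidate as a LETOR-style line.

    `features` is the list [Freq, Cont, Pos, Reso]; feature ids start at 1.
    `query_id` groups the candidates generated for the same source term.
    """
    feats = " ".join(f"{i}:{v:.6f}" for i, v in enumerate(features, start=1))
    line = f"{label} qid:{query_id} {feats}"
    return f"{line} # {comment}" if comment else line


# Hypothetical feature values for one correct candidate.
print(to_ranklib_line(1, 7, [0.5, 0.25, 0.75, 1.0], "non toxique pour les cellules"))
```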
![Page 97: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/97.jpg)
Corpora
English → French, German
breast cancer
≈ 400k words per language
![Page 101: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/101.jpg)
Lexicons
Morpheme translation table (hand-crafted)
General language dictionary (Xelda)
Synonyms (Xelda)
Domain-specific dictionary: cognates extracted from corpus [Hauer and Kondrak, 2011]
Morphological families [Porter, 1980]
![Page 107: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/107.jpg)
Training and evaluation datasets
EVALUATION ≈ 100 source terms
- source terms in UMLS meta-thesaurus with translation(s) in target texts
TRAINING ≈ 600 source terms
- source terms for which a translation could be generated and whose translation(s) is in the target texts
- generated translations were scored manually
⇒ evaluation and training datasets are disjoint
⇒ source terms are morphologically complex words with no translation in dictionary
![Page 115: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/115.jpg)
Results for translation generation
| | EN → FR | EN → DE |
|---|---|---|
| # source terms | 126 | 90 |
| # at least 1 translation | 86 (68%) | 56 (62%) |
| # at least 1 translation | 86 | 56 |
| 1 trans. in UMLS | 68 (79%) | 40 (71%) |
| 1 trans. in UMLS or judged correct | 81 (94%) | 51 (91%) |
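The percentages in the table above can be recomputed from the raw counts. The short sketch below does so; the key names (`source_terms`, `translated`, and so on) are labels chosen here for illustration, and the choice of denominators (all source terms for the second row, translated terms for the last two rows) is inferred from the figures rather than stated on the slide.

```python
# Counts copied from the table above.
counts = {
    "EN-FR": {"source_terms": 126, "translated": 86,
              "in_umls": 68, "in_umls_or_correct": 81},
    "EN-DE": {"source_terms": 90, "translated": 56,
              "in_umls": 40, "in_umls_or_correct": 51},
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


def summarize(row):
    """Recompute the three percentages reported for one language pair."""
    return {
        # Share of source terms with at least one generated translation.
        "coverage": percent(row["translated"], row["source_terms"]),
        # The next two are relative to the translated terms only.
        "in_umls": percent(row["in_umls"], row["translated"]),
        "in_umls_or_correct": percent(row["in_umls_or_correct"],
                                      row["translated"]),
    }


summary = {pair: summarize(row) for pair, row in counts.items()}
for pair, values in summary.items():
    print(pair, values)
```

The recomputed values match the slide only when the last two rows are divided by the number of translated terms (86 and 56), which is why the table repeats that row without percentages.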
![Page 116: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/116.jpg)
Results for translation ranking
| | EN → FR | EN → DE | Average |
|---|---|---|---|
| Random | .83 | .80 | .815 |
| Freq | .92 | .84 | .88 |
| Cont | .90 | .82 | .86 |
| Pos | .88 | .91 | .895 |
| Reso | .92 | .82 | .87 |
| Combination | .93 | .89 | .91 |
| ML AdaRank | .90 | .84 | .87 |
| ML CoordAsc | .93 | .89 | .91 |
| ML LambdaMart | .86 | .88 | .87 |

Table: Top1 translation in UMLS or judged correct
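As an illustration of the Top1 measure reported above, the sketch below computes the fraction of source terms whose first-ranked candidate is accepted. It is a minimal sketch, not the authors' evaluation code: the function name `top1_precision`, the rankings, and the reference translations "réduction du risque" and "patient hospitalisé" are hypothetical; the other pairs reuse examples from the slides.

```python
def top1_precision(ranked_candidates, accepted):
    """Fraction of source terms whose best-ranked candidate is accepted.

    ranked_candidates: dict mapping a source term to its candidate
        translations, best first.
    accepted: dict mapping a source term to the set of translations found
        in the reference or judged correct.
    """
    hits = sum(
        1
        for term, cands in ranked_candidates.items()
        if cands and cands[0] in accepted.get(term, set())
    )
    return hits / len(ranked_candidates)


# Hypothetical rankings for four source terms.
ranked = {
    "cardiotoxicity": ["toxicité cardiaque", "cardiaque toxicité"],
    "risk-reducing": ["risque de réduire", "réduction du risque"],
    "pathophysiological": ["physiopathologique"],
    "in-patient": ["pas malade", "patient hospitalisé"],
}
# Hypothetical accepted translations for the same terms.
accepted = {
    "cardiotoxicity": {"toxicité cardiaque"},
    "risk-reducing": {"réduction du risque"},
    "pathophysiological": {"physiopathologique"},
    "in-patient": {"patient hospitalisé"},
}

score = top1_precision(ranked, accepted)
print(score)
```

In this toy data two of the four first-ranked candidates are accepted, so the score is 0.5. The function assumes at least one source term; an empty input would need a guard against division by zero.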
![Page 117: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/117.jpg)
Silence analysis
Missing translation in resources (≈30%)
Target term is not compositional (≈30%)
- breastfeeding → allaitement (FR), stillen (DE)
Lexical divergence (≈20%)
- radiosensitivity → Strahlentoleranz, sensitivity ≠ Toleranz
Additional elements (≈13%)
- postpartum → postpartalperiod
![Page 122: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/122.jpg)
Error analysis
Problems in word reordering
- self-examination → untersuchung selbst ’examination self’
Wrong or inappropriate translations
- in-patient → pas malade ’not ill’
  - in → “inside” → inside patient
  - in → “inverse” → not a patient
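Both error types can be reproduced with a toy generator that translates each component independently and then tries every ordering. This is only a sketch of the general idea under stated assumptions: the table `TRANSLATIONS` and the function `candidates` are hypothetical, the French entries were chosen here for illustration, and the authors' actual resources and generation procedure are not shown on this slide.

```python
from itertools import permutations, product

# Hypothetical toy translation table (EN -> FR), not the authors' table.
TRANSLATIONS = {
    "in": ["dans", "pas"],             # "inside" reading vs. "inverse" reading
    "patient": ["patient", "malade"],
}


def candidates(components):
    """Return every combination of component translations in every order."""
    options = [TRANSLATIONS[c] for c in components]
    result = set()
    for choice in product(*options):        # one translation per component
        for order in permutations(choice):  # every word order
            result.add(" ".join(order))
    return sorted(result)


cands = candidates(["in", "patient"])
print(cands)
```

With two readings of "in", two translations of "patient", and two word orders, the generator produces eight candidates, including "pas malade". Nothing at this stage distinguishes the intended "inside" reading from the "inverse" one, so the choice falls entirely on the ranking step.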
![Page 125: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/125.jpg)
Impact of fertile translations
| | EN → FR | EN → DE |
|---|---|---|
| exact translations | 21% | 10% |
| wrong translations | 50% | 80% |

Table: % of fertile translations

German (Germanic language): tendency to agglutination
- oestrogen-independent → Östrogen-unabhängige
French (Romance language): creates phrases more easily
- oestrogen-independent → indépendant des œstrogènes
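A fertile translation can be flagged with a simple word count, assuming the usual definition that the target term has more words than the source term. In the sketch below, the function name `is_fertile` and the whitespace tokenisation (so that a hyphenated term counts as one word) are assumptions made for illustration; the term pairs come from the slides.

```python
def is_fertile(source_term, target_term):
    """True when the target has more whitespace-separated words than the source."""
    return len(target_term.split()) > len(source_term.split())


pairs = [
    ("cardiotoxicity", "toxicité cardiaque"),
    ("pathophysiological", "physiopathologique"),
    ("oestrogen-independent", "indépendant des œstrogènes"),
    ("oestrogen-independent", "Östrogen-unabhängige"),
]

flags = [is_fertile(source, target) for source, target in pairs]
for (source, target), fertile in zip(pairs, flags):
    print(f"{source} -> {target}: {'fertile' if fertile else 'non-fertile'}")
```

Under this tokenisation the French phrase for "oestrogen-independent" is fertile while the agglutinated German form is not, which matches the contrast drawn between the two languages.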
![Page 128: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/128.jpg)
Future work
Improve quality of linguistic resources
- morphological derivation rules instead of stemming
- use of a thesaurus
Try translation patterns on top of permutations
Try learning morpheme translation equivalences from
- cognates
- bilingual dictionaries
- out-of-domain parallel data
![Page 129: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/129.jpg)
Thank you for your attention.
[email protected]@univ-nantes.fr
[email protected]@lingua-et-machina.com
![Page 130: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/130.jpg)
ADDITIONAL SLIDES
![Page 131: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/131.jpg)
Exact translations
Non-fertile:
- pathophysiological → physiopathologique
- overactive → überaktiv
Fertile:
- cardiotoxicity → toxicité cardiaque ’cardiac toxicity’
- mastectomy → ablation der brust ’ablation of the breast’
![Page 132: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/132.jpg)
Morphological variants
Non-fertile:
- dosimetry → dosimétrique ’dosimetric’
- radiosensitivity → strahlenempfindlich ’radiosensitive’
Fertile:
- milk-producing → production de lait ’production of milk’
- self-examination → selbst untersuchen ’self examine’
![Page 133: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/133.jpg)
Inexact but semantically related
Non-fertile:
- oncogene → oncogenèse ’oncogenesis’
- breakthrough → durchbrechen ’break’
Fertile:
- chemoradiotherapy → chemotherapie oder strahlen ’chemotherapy or radiation’
- treatable → pouvoir le traiter ’can treat it’
![Page 134: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/134.jpg)
Wrong translations
Non-fertile:
- immunoscore → immunomarquer ’immunostain’
- check-in → unkontrollieren ’uncontrolled’
Fertile:
- bloodstream → fliessen mehr blut ’more blood flow’
- risk-reducing → risque de réduire ’risk of reducing’
![Page 135: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/135.jpg)
References I
Baldwin, T. and Tanaka, T. (2004).
Translation by machine of complex nominals. In Proceedings of the ACL 2004 Workshop on Multiword Expressions: Integrating Processing, pages 24–31, Barcelona, Spain.
Bo, L. and Gaussier, E. (2010).
Improving corpus comparability for bilingual lexicon extraction from comparable corpora. In 23rd International Conference on Computational Linguistics, pages 23–27, Beijing, China.
Cartoni, B. (2009).
Lexical morphology in machine translation: A feasibility study. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 130–138, Athens, Greece.
Daille, B. and Morin, E. (2005).
French-English terminology extraction from comparable corpora. In Proceedings, 2nd International Joint Conference on Natural Language Processing, volume 3651 of Lecture Notes in Computer Sciences, pages 707–718, Jeju Island, Korea. Springer.
Delpech, E. (2011).
Evaluation of terminologies acquired from comparable corpora: an application perspective. In Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011), volume 11 of NEALT Proceedings Series, pages 66–73, Riga, Latvia. Pedersen B.S., Nespore G., Skadina I.
Fung, P. (1997).
Finding terminology translations from non-parallel corpora. Pages 192–202, Hong Kong.
Garera, N. and Yarowsky, D. (2008).
Translating compounds by learning component gloss translation via multiple languages. In Proceedings of the 3rd International Joint Conference on Natural Language Processing, volume 1, pages 403–410, Hyderabad, India.
![Page 136: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/136.jpg)
References II
Grefenstette, G. (1999).
The world wide web as a resource for example-based machine translation tasks. ASLIB’99 Translating and the Computer, 21.
Harastani, R., Daille, B., and Morin, E. (2012).
Neoclassical compound alignments from comparable corpora. In Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text Processing, volume 2, pages 72–82, New Delhi, India.
Hauer, B. and Kondrak, G. (2011).
Clustering semantically equivalent words into cognate sets in multilingual lists. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 865–873, Chiang Mai, Thailand.
Keenan, E. L. and Faltz, L. M. (1985).
Boolean semantics for natural language. D. Reidel, Dordrecht, Holland.
Lardilleux, A. (2008).
A truly multilingual, high coverage, accurate, yet simple, sub-sentential alignment method.
Li, H. and Xu, J. (2007).
AdaRank: A boosting algorithm for information retrieval. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 391–398, Amsterdam, The Netherlands.
Metzler, D. and Croft, W. B. (2000).
Linear feature-based models for information retrieval. Information Retrieval, 10(3):257–274.
![Page 137: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking](https://reader033.fdocuments.in/reader033/viewer/2022052522/554e9a68b4c90573338b5349/html5/thumbnails/137.jpg)
References III
Morin, E. and Daille, B. (2009).
Compositionality and lexical alignment of multi-word terms. In Language Resources and Evaluation (LRE), volume 44 of Multiword expression: hard going or plain sailing, pages 79–95. P. Rayson, S. Piao, S. Sharoff, S. Evert, B. Villada Moiron, Springer Netherlands edition.
Morin, E. and Daille, B. (2010).
Compositionality and lexical alignment of multi-word terms. In Rayson, P., Piao, S., Sharoff, S., Evert, S., and Villada Moiron, B., editors, Language Resources and Evaluation (LRE), volume 44 of Multiword expression: hard going or plain sailing, pages 79–95. Springer Netherlands.
Namer, F. and Baud, R. (2007).
Defining and relating biomedical terms: Towards a cross-language morphosemantics-based system. International Journal of Medical Informatics, 76(2-3):226–33.
Porter, M. F. (1980).
An algorithm for suffix stripping. Program, 14(3):130–137.
Robitaille, X., Sasaki, X., Tonoike, M., Sato, S., and Utsuro, S. (2006).
Compiling French-Japanese terminologies from the web. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, pages 225–232, Trento, Italy.
Tiedemann, J. (2009).
News from opus - a collection of multilingual parallel corpora with tools and interfaces.
Wu, Q., Burges, J. C., Svore, K., and Gao, J. (2010).
Adapting boosting for information retrieval measures. Journal of Information Retrieval, 13(3):254–270.