Cross-Lingual Image Search on the Web
Kobi Reiter, Stephen Soderland, Oren Etzioni
Turing Center, Computer Science and Engineering
University of Washington
2
Limitations to Monolingual Image Search
1. Limited-Resource Languages: Slovenian query ‘grenivka’ (grapefruit)
Results: only 9 images of grapefruit
3
Limitations to Monolingual Image Search
1. Limited-Resource Languages: Slovenian query ‘grenivka’ (grapefruit)
2. Cross-Cultural Images: search for images of ‘food’ in different cultures
4
Zulu query: ‘ukudla’ (food)
5
Limitations to Monolingual Image Search
1. Limited-Resource Languages: Slovenian query ‘grenivka’ (grapefruit)
2. Cross-Cultural Images: search for images of ‘food’ in different cultures
3. Cross-Lingual Homonyms: the Hungarian word for tooth is ‘fog’
Results: misty weather, not teeth
6
Limitations to Monolingual Image Search
1. Limited-Resource Languages: Slovenian query ‘grenivka’ (grapefruit)
2. Cross-Cultural Images: search for images of ‘food’ in different cultures
3. Cross-Lingual Homonyms: search for images with the Hungarian word for tooth
4. Word Sense Ambiguity: search in English for spring (flexible coil)
7
English query: ‘spring’
8
Solution: PanImages
• PanImages builds a translation graph from several dictionaries
• User specifies a language and a query term
• PanImages presents possible translations
• User selects one or more translations
• PanImages sends the translated query to Google Images
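The workflow in these bullets can be sketched in a few lines. This is only an illustration, not the PanImages implementation: the toy `TRANSLATIONS` table, the function names, and the way the selected translations are joined into one query string are all assumptions made for the example.

```python
# Toy stand-in for the translation graph: (word, language) -> candidate translations.
# The table contents and all names here are illustrative assumptions.
TRANSLATIONS = {
    ("grenivka", "Slovenian"): [("grapefruit", "English"), ("pamplemousse", "French")],
}

def possible_translations(word, language):
    """Present the possible translations for the user's query term."""
    return TRANSLATIONS.get((word, language), [])

def build_query(selected):
    """Combine the selected translations into one search query string."""
    return " OR ".join(word for word, _language in selected)

# The user specifies a language and a query term.
candidates = possible_translations("grenivka", "Slovenian")
# The user selects one or more translations (here, all of them).
query = build_query(candidates)
print(query)  # grapefruit OR pamplemousse
```

The resulting string is what would be handed to the image search engine; how PanImages actually encodes the request is not described on the slide.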
9
Select from 50 source languages
Automatic word completion
10
PanImages for Slovenian ‘grenivka’
Over 40,000 images for ‘grapefruit’ or ‘pamplemousse’
11
Search: English ‘Spring’
12
Senses for ‘Spring’
Different word senses found in the translation graph
13
Select the Intended Sense of ‘spring’
14
Translated query for ‘spring’
French ‘ressort’ is unambiguous
15
Outline of Talk
• Overview of PanImages
• Building a translation graph
  – Merging entries from multiple dictionaries
  – Computing translation probabilities
• Experimental Results
• Conclusions
16
PanImages Architecture
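[Figure: PanImages architecture. The PanImages compiler builds the translation graph from dictionaries. The PanImages query processor interacts with the user in three steps: 1. query, 2. translations, 3. select translation. The translated query is then sent out and images are returned to the user.]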
17
Input from Machine Readable Dictionaries
• Multilingual dictionaries:
  – Each entry has translations in multiple languages
  – Distinguishes different senses of the word
  – “Wiktionaries” for 171 languages created by Web volunteers: www.wiktionary.org
  – Esperanto dictionary: purl.org/net/voko/revo
• Bilingual dictionaries:
  – Each entry has translations into a single language
  – May mix together different senses of the word
  – freedict.org has 64 open source dictionaries
18
Translation Graph
• Nodes in the graph are ordered pairs (word, language)
• Edges in the graph indicate translations between words
• Each edge is labeled with a word sense ID
[Figure: edges from ‘spring’ (English) taken from an English dictionary, labeled with sense IDs 1 and 2, leading to printemps (French), primavera (Spanish), ressort (French), пружина (Russian), …]
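A minimal sketch of this data structure, assuming an adjacency map keyed by (word, language) pairs. The names `graph`, `add_edge` and `translations` are hypothetical, edges are treated as undirected for simplicity, and the assignment of sense 1 to the season and sense 2 to the flexible coil is an assumption about how the figure is labeled.

```python
from collections import defaultdict

# Node: an ordered pair (word, language).
# Edge: a translation between two nodes, labeled with a word sense ID.
graph = defaultdict(set)  # node -> set of (neighbor, sense_id)

def add_edge(node_a, node_b, sense_id):
    """Add a translation edge (stored in both directions) with a sense ID."""
    graph[node_a].add((node_b, sense_id))
    graph[node_b].add((node_a, sense_id))

spring = ("spring", "English")
# Assumed labeling: sense 1 = the season, sense 2 = the flexible coil.
add_edge(spring, ("printemps", "French"), 1)
add_edge(spring, ("primavera", "Spanish"), 1)
add_edge(spring, ("ressort", "French"), 2)
add_edge(spring, ("пружина", "Russian"), 2)

def translations(node, sense_id):
    """Return the neighbors of `node` reached over edges with `sense_id`."""
    return sorted(n for n, s in graph[node] if s == sense_id)

print(translations(spring, 2))
```

Keeping the sense ID on the edge rather than on the node is what lets one word, such as ‘spring’, take part in several unrelated senses.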
19
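Merging Multilingual Dictionaries (English Dictionary)
[Figure: translation graph for ‘spring’ (English) from the English dictionary. Edges labeled with sense IDs 1 and 2 lead to printemps (French), primavera (Spanish), an Arabic word (lost in extraction), udaherri (Basque), vzmet (Slovenian), пружина (Russian), ressort (French), veer (Dutch), …]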
20
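Merging Multilingual Dictionaries (English and French Dictionaries)
[Figure: the same graph after adding the French dictionary. A new sense ID 3 appears alongside sense IDs 1 and 2, and a new node koanga (Maori) joins printemps (French), primavera (Spanish), the Arabic word (lost in extraction), udaherri (Basque), vzmet (Slovenian), пружина (Russian), veer (Dutch), ressort (French), …]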
21
Merging Multilingual Dictionaries (English and French Dictionaries)
[Figure: the same graph with a further sense ID 4 and a new node рысора (Belarusian), alongside sense IDs 1, 2 and 3 and the nodes printemps (French), primavera (Spanish), the Arabic word (lost in extraction), koanga (Maori), udaherri (Basque), vzmet (Slovenian), пружина (Russian), ressort (French), veer (Dutch), …]
22
Adding Bilingual Dictionaries
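[Figure: spring (English), printemps (French), primavera (Spanish), udaherri (Basque), … connected by edges with sense ID 1. A new edge with sense ID 2, from a Vietnamese-English dictionary, connects xuân (Vietnamese) to spring (English).]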
23
Adding Bilingual Dictionaries
[Figure: the same graph with a further edge, sense ID 3, from a Vietnamese-French dictionary, connecting xuân (Vietnamese) to printemps (French).]
24
Inferring Word Sense Equivalence
• Compute prob(s_i = s_j), where s_i and s_j are word senses
• Case 1: s_i and s_j are each from
  – multilingual dictionaries
  – that distinguish word senses
• Case 2: s_i or s_j is from either
  – bilingual dictionaries
  – or dictionaries that mix together word senses
25
Word Sense Equivalence
Multilingual dictionaries: prob(s_i = s_j) is proportional to the degree of overlap between nodes(s_i) and nodes(s_j), where nodes(s) is the set of nodes with edges labeled s.
\[
prob(s_i = s_j) = \max\left( \frac{|nodes(s_i) \cap nodes(s_j)| + \beta}{|nodes(s_i)| + \alpha},\ \frac{|nodes(s_i) \cap nodes(s_j)| + \beta}{|nodes(s_j)| + \alpha} \right)
\]
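The overlap score can be sketched as below. The function name and the example node sets are hypothetical, and the placement of the constants α and β (β added to the overlap count, α added to the set size) is an assumption about how the slide's formula reads; with both set to zero the function reduces to a plain overlap ratio.

```python
def sense_equivalence(nodes_i, nodes_j, alpha=0.0, beta=0.0):
    """Overlap-based score for prob(s_i = s_j).

    nodes_i, nodes_j: sets of (word, language) nodes with edges labeled s_i / s_j.
    alpha, beta: constants from the slide; their placement here is an assumption.
    """
    if not nodes_i or not nodes_j:
        return 0.0
    overlap = len(nodes_i & nodes_j)
    ratio_i = (overlap + beta) / (len(nodes_i) + alpha)
    ratio_j = (overlap + beta) / (len(nodes_j) + alpha)
    return max(ratio_i, ratio_j)

# Illustrative node sets for one sense from each of two dictionaries.
english_sense = {("spring", "English"), ("printemps", "French"),
                 ("primavera", "Spanish"), ("udaherri", "Basque")}
french_sense = {("spring", "English"), ("printemps", "French"),
                ("primavera", "Spanish"), ("koanga", "Maori")}

print(sense_equivalence(english_sense, french_sense))  # 0.75
```

Taking the maximum of the two ratios means a small entry that is fully contained in a larger one still scores highly.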
26
Word Sense Equivalence
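Bilingual dictionaries (or not sense distinguished):
\[
prob(s_i = s_j \mid \exists s_k : clique(s_i, s_j, s_k))
\]
Estimate this probability empirically. (prob = 0.85)
clique(s_i, s_j, s_k): a triangle from three dictionary entries
[Figure: a triangle formed by xuân (Vietnamese), spring (English) and printemps (French), with edges labeled s_i, s_j and s_k; primavera (Spanish) and further nodes are attached to the graph.]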
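One way to picture the clique test is a small triangle search over dictionary entries. This is a sketch under stated assumptions: the entry format, the function name `triangles`, and which edge carries which label are hypothetical; only the 0.85 estimate comes from the slide.

```python
from itertools import combinations

CLIQUE_PROB = 0.85  # empirical estimate quoted on the slide

# Each entry: (sense_id, node_a, node_b). IDs and sources are illustrative.
entries = [
    ("s_i", ("xuân", "Vietnamese"), ("spring", "English")),
    ("s_j", ("xuân", "Vietnamese"), ("printemps", "French")),
    ("s_k", ("spring", "English"), ("printemps", "French")),
]

def triangles(entries):
    """Yield triples of sense IDs whose edges form a triangle over three nodes."""
    for a, b, c in combinations(entries, 3):
        edges = {frozenset(e[1:]) for e in (a, b, c)}
        nodes = set().union(*edges)
        # Three distinct two-node edges over exactly three nodes form a triangle.
        if len(nodes) == 3 and len(edges) == 3 and all(len(e) == 2 for e in edges):
            yield (a[0], b[0], c[0])

# Every pair of sense IDs inside a triangle gets the empirical probability.
equivalence = {
    frozenset(pair): CLIQUE_PROB
    for tri in triangles(entries)
    for pair in combinations(tri, 2)
}
print(equivalence[frozenset({"s_i", "s_j"})])  # 0.85
```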
27
Computing Translation Probabilities
• Probability decreases each time the word sense ID changes.
• Probability increases with multiple distinct paths.
[Figure: the merged translation graph around ‘spring’ (English), with printemps (French), primavera (Spanish), an Arabic word (lost in extraction), koanga (Maori), udaherri (Basque) and пружина (Russian); edges carry sense IDs 1, 2 and 3.]
28
Computing Translation Probabilities
• CLISE can find translations that are not in any single source dictionary
• Translation probability decreases with each transition to a new word sense ID
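[Figure: path P from n_1 to n_k, passing through n_2, …, n_{k-1} over edges labeled s_1, s_2, …, s_{k-1}]
\[
pathp(n_1, n_k, s, P) = \max_{i \in 1,\dots,|P|} prob(s = s_i) \times \prod_{i \in 1,\dots,|P|-1} prob(s_i = s_{i+1})
\]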
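A sketch of the path probability, reading the slide's formula as the best match between the target sense s and any sense ID on the path, multiplied by the equivalence probabilities of consecutive sense IDs. The helper `sense_prob` and the 0.75 value in its table are hypothetical stand-ins for the sense equivalence estimates.

```python
from math import prod

# Hypothetical sense-equivalence table; identical IDs are always equivalent.
EQUIVALENCE = {frozenset({1, 3}): 0.75}

def sense_prob(a, b):
    """Return prob(a = b) from the toy table (0.0 when no evidence is known)."""
    return 1.0 if a == b else EQUIVALENCE.get(frozenset({a, b}), 0.0)

def path_probability(sense, path_senses):
    """pathp for a path whose edges carry the sense IDs in `path_senses`."""
    # Best match between the target sense and any sense ID on the path.
    match = max(sense_prob(sense, s) for s in path_senses)
    # Each change of sense ID along the path multiplies in a factor <= 1.
    chain = prod(sense_prob(a, b) for a, b in zip(path_senses, path_senses[1:]))
    return match * chain

print(path_probability(1, [1, 1]))  # 1.0  (no change of sense ID)
print(path_probability(1, [1, 3]))  # 0.75 (one transition, from ID 1 to ID 3)
```

A path that stays on one sense ID keeps probability 1.0 in this toy setup, while each transition to a new ID can only lower it, matching the second bullet.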
29
Probability from Multiple Paths
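• Probability that n_1 is translated as n_k in sense s increases when there are multiple paths between n_1 and n_k
[Figure: several paths from n_1 to n_k, one of them through n_2, with edges labeled s_1, s_2 and s_3]
“Noisy-or” probability model:
\[
prob(n_1, n_k, s) = 1 - \prod_{P \in distinctP} \left(1 - pathp(n_1, n_k, s, P)\right)
\]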
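The noisy-or combination is short enough to show directly. The function name and the example path probabilities are made up for illustration.

```python
from math import prod

def noisy_or(path_probs):
    """Combine the probabilities of distinct paths with a noisy-or model."""
    # The translation fails only if every individual path fails.
    return 1.0 - prod(1.0 - p for p in path_probs)

# Two distinct paths between the same pair of nodes (illustrative values).
print(noisy_or([0.75]))       # 0.75
print(noisy_or([0.75, 0.5]))  # 0.875
```

Adding a second path raises the combined probability above either path alone, which is the behavior the bullet describes.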
30
Outline of Talk
• Overview of PanImages
• Building a translation graph
  – Merging entries from multiple dictionaries
  – Computing translation probabilities
• Experimental Results
• Conclusions
31
Graph Statistics
• Translation graph from 17 dictionaries:
  – English Wiktionary: 19,500 words with translations
  – French Wiktionary: 12,700 words with translations
  – Esperanto dictionary: 23,000 words with translations
  – 14 bilingual dictionaries: average 90,000 words each
• Graph has:
  – 1.4 million words
  – 957 languages
  – 60 languages have over 1,000 words
32
Experiment 1: Translation paths in Graph
• Evaluate translations for language pairs:
  – English to Russian
  – English to Hebrew
  – Turkish to Russian
• Select 1,000 random English (Turkish) words from the graph
• Compare number of words translated and precision
  – Baseline is direct translation
  – Multilingual dictionaries vs. all dictionaries
  – Effect of path length
33
Results of Experiment 1
                   Direct          Multilingual     All Dictionaries   All Dictionaries
                   (length = 1)    (length <= 2)    (length <= 2)      (length <= 4)
                   words   P       words   P        words   P          words   P
English-Russian    385     0.91
English-Hebrew     87      0.93
Turkish-Russian    67      0.92
Direct translations: same sense ID in a multilingual dictionary
– precision is about 0.92 (rather than 1.0) because of parsing errors (inconsistent dictionary formats)
34
Results of Experiment 1
                   Direct          Multilingual     All Dictionaries   All Dictionaries
                   (length = 1)    (length <= 2)    (length <= 2)      (length <= 4)      Gain from
                   words   P       words   P        words   P          words   P          Direct
English-Russian    385     0.91    417     0.82                                           1.08 x
English-Hebrew     87      0.93    124     0.82                                           1.43 x
Turkish-Russian    67      0.92    75      0.86                                           1.12 x
Adding paths with 2 sense IDs between multilingual dictionaries:
– modest gain in number of words translated
– precision in the mid 0.80s
35
Results of Experiment 1
                   Direct          Multilingual     All Dictionaries   All Dictionaries
                   (length = 1)    (length <= 2)    (length <= 2)      (length <= 4)      Gain from
                   words   P       words   P        words   P          words   P          Direct
English-Russian    385     0.91    417     0.82     513     0.81                          1.33 x
English-Hebrew     87      0.93    124     0.82     157     0.79                          1.80 x
Turkish-Russian    67      0.92    75      0.86     211     0.80                          3.15 x
Adding paths of length 2 from bilingual dictionaries:
– precision still around 0.80
– large gain in number of words translated
– biggest gain from adding Turkish bilingual dictionaries
36
Results of Experiment 1
                   Direct          Multilingual     All Dictionaries   All Dictionaries
                   (length = 1)    (length <= 2)    (length <= 2)      (length <= 4)      Gain from
                   words   P       words   P        words   P          words   P          Direct
English-Russian    385     0.91    417     0.82     513     0.81       516     0.71       1.34 x
English-Hebrew     87      0.93    124     0.82     157     0.79       177     0.64       2.03 x
Turkish-Russian    67      0.92    75      0.86     211     0.80       236     0.72       3.52 x
Adding paths with more than 2 sense IDs:
– only a small further gain in words translated
– sharp drop in precision
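The "Gain from Direct" column is the number of words translated in a given setting divided by the number translated directly. A quick check using the word counts from the table (the dictionary and function names are only for this example):

```python
# Words translated, taken from the results table.
direct = {"English-Russian": 385, "English-Hebrew": 87, "Turkish-Russian": 67}
all_len4 = {"English-Russian": 516, "English-Hebrew": 177, "Turkish-Russian": 236}

def gain(pair):
    """Ratio of words translated with all dictionaries (length <= 4) to direct."""
    return all_len4[pair] / direct[pair]

for pair in direct:
    print(f"{pair}: {gain(pair):.2f} x")
# English-Russian: 1.34 x
# English-Hebrew: 2.03 x
# Turkish-Russian: 3.52 x
```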
37
Experiment 2: Image Search
• 10 concepts with distinctive images: ant, clown, fig, lake, sky, train, eat, run, happy, tired
• 100 random non-English terms from the translation graph
  – 10 terms for each concept
• Compare results of Google Image search
  – using the non-English term as the search query
  – using the PanImages translation into English
• Metrics:
  – Number of results
  – Precision of first 15 pages of results (18 results per page)
38
Results of Experiment 2
For “33 minor languages” (Danish, Dutch, Greek, Lithuanian, …)
• Increases number of correct results by 75% on first 270 results
• Increases average precision by 27%
[Figure: precision (0 to 0.6) plotted against number of correct results (0 to 100), comparing the untranslated query with the PanImages translation]
39
Future Work
• Increase precision of translation paths
  – Cleaner parsing of dictionaries
  – More accurate word sense equivalence probabilities
• PanImages Web page in user’s choice of language
• Word sense glosses in user’s language
• More dictionaries to increase coverage of graph
40
Conclusions
• PanImages: a fully implemented cross-lingual image search system for the Web: www.cs.washington.edu/research/panimages
• PanImages boosts recall and raises precision for minor language search queries
• We introduced the translation graph
  – Combines multiple machine-readable dictionaries
  – Probabilistic word-sense merging across dictionaries
  – Infers translations not found in any source dictionary
41
Thank you!