Cross-Lingual Image Search on the Web

Kobi Reiter, Stephen Soderland, Oren Etzioni
Turing Center, Computer Science and Engineering, University of Washington

Page 1: CrossLingual Image Search on the Web

Cross-Lingual Image Search on the Web

Kobi Reiter, Stephen Soderland, Oren Etzioni

Turing Center, Computer Science and Engineering

University of Washington

Page 2: CrossLingual Image Search on the Web

Limitations to Monolingual Image Search

1. Limited Resource Languages
   Slovenian query ‘grenivka’ (grapefruit)

Results: only 9 images of grapefruit

Page 3: CrossLingual Image Search on the Web

Limitations to Monolingual Image Search

1. Limited Resource Languages
   Slovenian query ‘grenivka’ (grapefruit)

2. Cross-Cultural Images
   Search for images of ‘food’ in different cultures

Page 4: CrossLingual Image Search on the Web


Zulu query: ‘ukudla’ (food)

Page 5: CrossLingual Image Search on the Web

Limitations to Monolingual Image Search

1. Limited Resource Languages
   Slovenian query ‘grenivka’ (grapefruit)

2. Cross-Cultural Images
   Search for images of ‘food’ in different cultures

3. Cross-lingual homonyms
   Hungarian word for tooth is ‘fog’
   Results: misty weather, not teeth

Page 6: CrossLingual Image Search on the Web

Limitations to Monolingual Image Search

1. Limited Resource Languages
   Slovenian query ‘grenivka’ (grapefruit)

2. Cross-Cultural Images
   Search for images of ‘food’ in different cultures

3. Cross-lingual homonyms
   Search for images with Hungarian word for tooth

4. Word Sense Ambiguity
   Search in English for spring (flexible coil)

Page 7: CrossLingual Image Search on the Web


English query: ‘spring’

Page 8: CrossLingual Image Search on the Web

Solution: PanImages

• PanImages builds a translation graph from several dictionaries

• User specifies a language and a query term
• PanImages presents possible translations
• User selects one or more translations
• PanImages sends translated query to Google Images

Page 9: CrossLingual Image Search on the Web

Select from 50 source languages

Automatic word completion

Page 10: CrossLingual Image Search on the Web

PanImages for Slovenian ‘grenivka’

Over 40,000 images for ‘grapefruit’ or ‘pamplemousse’

Page 11: CrossLingual Image Search on the Web


Search: English ‘Spring’

Page 12: CrossLingual Image Search on the Web

Senses for ‘Spring’

Different word senses found in the translation graph

Page 13: CrossLingual Image Search on the Web

Select the Intended Sense of ‘spring’

Page 14: CrossLingual Image Search on the Web

Translated query for ‘spring’

French ‘ressort’ is unambiguous

Page 15: CrossLingual Image Search on the Web

Outline of Talk

• Overview of PanImages
• Building a translation graph
  – Merging entries from multiple dictionaries
  – Computing translation probabilities
• Experimental Results
• Conclusions

Page 16: CrossLingual Image Search on the Web

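PanImages Architecture

[Figure: the dictionaries feed the PanImages compiler, which builds the translation graph. The user sends (1) a query to the PanImages query processor, receives (2) translations from the translation graph, and (3) selects a translation; the translated query is then used to retrieve images for the user.]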

Page 17: CrossLingual Image Search on the Web

Input from Machine Readable Dictionaries

• Multilingual dictionaries:
  – Each entry has translations in multiple languages
  – Distinguishes different senses of the word
  – “Wiktionaries” for 171 languages created by Web volunteers: www.wiktionary.org
  – Esperanto dictionary: purl.org/net/voko/revo

• Bilingual dictionaries:
  – Each entry has translations into a single language
  – May mix together different senses of the word
  – freedict.org has 64 open source dictionaries

Page 18: CrossLingual Image Search on the Web

Translation Graph

• Nodes in the graph are ordered pairs (word, language)
• Edges in the graph indicate translations between words
• Each edge is labeled with a word sense ID

[Figure: edges from ‘spring’ (English) taken from an English dictionary. Edges labeled 1 lead to printemps (French) and primavera (Spanish); edges labeled 2 lead to ressort (French) and пружина (Russian); further edges are elided.]
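One way such a graph could be stored is sketched below in Python. This is only an illustration: the class name `TranslationGraph`, its methods, and the choice of undirected edges are assumptions, not the PanImages implementation. The nodes and sense IDs follow the slide's ‘spring’ example.

```python
from collections import defaultdict


class TranslationGraph:
    """Hypothetical container: nodes are (word, language) pairs and
    every edge carries a word sense ID."""

    def __init__(self):
        # node -> set of (neighbor node, sense ID)
        self.edges = defaultdict(set)

    def add_edge(self, a, b, sense_id):
        # Assumption: translation edges are undirected.
        self.edges[a].add((b, sense_id))
        self.edges[b].add((a, sense_id))

    def translations(self, node, sense_id=None):
        """Neighbors of `node`, optionally restricted to one sense ID."""
        return sorted(n for n, s in self.edges.get(node, ())
                      if sense_id is None or s == sense_id)

    def nodes_for_sense(self, sense_id):
        """Set of nodes that have at least one edge labeled `sense_id`."""
        return {n for n, out in self.edges.items()
                if any(s == sense_id for _, s in out)}


graph = TranslationGraph()
spring = ("spring", "English")
graph.add_edge(spring, ("printemps", "French"), 1)   # season
graph.add_edge(spring, ("primavera", "Spanish"), 1)
graph.add_edge(spring, ("ressort", "French"), 2)     # flexible coil
graph.add_edge(spring, ("пружина", "Russian"), 2)

print(graph.translations(spring, sense_id=2))
# [('ressort', 'French'), ('пружина', 'Russian')]
```

Keeping the sense ID on the edge rather than on the node lets the same node, such as spring (English), take part in several senses at once.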

Page 19: CrossLingual Image Search on the Web

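Merging Multilingual Dictionaries (English Dictionary)

[Figure: translation graph for ‘spring’ (English) from the English dictionary. Sense ID 1 groups spring (English) with printemps (French), primavera (Spanish), an Arabic word (not preserved in this transcript), udaherri (Basque), …; sense ID 2 groups spring (English) with vzmet (Slovenian), пружина (Russian), ressort (French), veer (Dutch), …]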

Page 20: CrossLingual Image Search on the Web

Merging Multilingual Dictionaries (English and French Dictionaries)

[Figure: the same graph after adding an entry from the French dictionary. New edges labeled with sense ID 3 appear among the sense 1 words (spring, printemps, primavera, the Arabic word, udaherri, …) and add koanga (Maori); the sense 2 group (vzmet, пружина, ressort, veer, …) is unchanged.]

Page 21: CrossLingual Image Search on the Web

Merging Multilingual Dictionaries (English and French Dictionaries)

[Figure: the graph after adding a further French dictionary entry. New edges labeled with sense ID 4 appear among the sense 2 words (vzmet, пружина, ressort, veer, …) and add рысора (Belarusian); the edges labeled 1 and 3 remain as before.]

Page 22: CrossLingual Image Search on the Web

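Adding Bilingual Dictionaries

[Figure: spring (English), printemps (French), primavera (Spanish), and udaherri (Basque) are joined by edges with sense ID 1. xuân (Vietnamese) is attached by an edge with sense ID 2, taken from a Vietnamese-English dictionary.]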

Page 23: CrossLingual Image Search on the Web

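Adding Bilingual Dictionaries

[Figure: the same graph with a second edge to xuân (Vietnamese), labeled with sense ID 3 and taken from a Vietnamese-French dictionary, in addition to the sense ID 2 edge and the sense ID 1 edges among spring (English), printemps (French), primavera (Spanish), and udaherri (Basque).]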

Page 24: CrossLingual Image Search on the Web

Inferring Word Sense Equivalence

• Compute $\mathrm{prob}(s_i = s_j)$, where $s_i$ and $s_j$ are word senses

• Case 1: $s_i$ and $s_j$ are each from
  – multilingual dictionaries
  – that distinguish word senses

• Case 2: $s_i$ or $s_j$ is from either
  – bilingual dictionaries
  – or dictionaries that mix together word senses

Page 25: CrossLingual Image Search on the Web

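Word Sense Equivalence

Multilingual dictionaries: $\mathrm{prob}(s_i = s_j)$ is proportional to the degree of overlap between $\mathrm{nodes}(s_i)$ and $\mathrm{nodes}(s_j)$, where $\mathrm{nodes}(s)$ is the set of nodes with edges labeled $s$.

$$\mathrm{prob}(s_i = s_j) = \max\left( \frac{|\mathrm{nodes}(s_i) \cap \mathrm{nodes}(s_j)| + \alpha}{|\mathrm{nodes}(s_i)| + \beta},\ \frac{|\mathrm{nodes}(s_i) \cap \mathrm{nodes}(s_j)| + \alpha}{|\mathrm{nodes}(s_j)| + \beta} \right)$$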
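As a rough illustration of the overlap measure, the sketch below takes the larger of the two overlap ratios between the node sets of two senses, with α added to each numerator and β to each denominator. The function name `sense_equivalence`, the default values of `alpha` and `beta`, and the example node sets are assumptions; the slide does not say how α and β are chosen.

```python
def sense_equivalence(nodes_i, nodes_j, alpha=0.0, beta=0.0):
    """Overlap-based estimate that two sense IDs denote the same sense.

    nodes_i, nodes_j: sets of (word, language) nodes whose edges carry
    sense s_i and sense s_j respectively.
    alpha, beta: adjustment constants (values here are placeholders).
    """
    if not nodes_i or not nodes_j:
        return 0.0
    shared = len(nodes_i & nodes_j)
    return max((shared + alpha) / (len(nodes_i) + beta),
               (shared + alpha) / (len(nodes_j) + beta))


# Hypothetical node sets for an English and a French dictionary entry.
english_sense = {("spring", "English"), ("printemps", "French"),
                 ("primavera", "Spanish"), ("udaherri", "Basque")}
french_sense = {("spring", "English"), ("printemps", "French"),
                ("primavera", "Spanish"), ("koanga", "Maori")}

print(sense_equivalence(english_sense, french_sense))  # 0.75
```

Taking the maximum means a small entry that is fully contained in a larger one still scores highly, which suits dictionaries of very different sizes.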

Page 26: CrossLingual Image Search on the Web

Word Sense Equivalence

Bilingual dictionaries (or not sense distinguished):

$$\mathrm{prob}(s_i = s_j = s_k \mid \exists\, \mathrm{clique}(s_i, s_j, s_k))$$

Estimate this probability empirically. (prob = 0.85)

$\mathrm{clique}(s_i, s_j, s_k)$: a triangle from three dictionary entries

[Figure: xuân (Vietnamese), spring (English), printemps (French), and primavera (Spanish), with edges labeled $s_i$, $s_j$, and $s_k$ forming a triangle.]
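A triangle test of this kind could be sketched as follows. The edge map, the function name `triangle_senses`, and the sense IDs in the example are illustrative assumptions; only the 0.85 estimate comes from the slide.

```python
from itertools import combinations

CLIQUE_PROB = 0.85  # empirical estimate quoted on the slide


def triangle_senses(edges, a, b, c):
    """Return the sense IDs on the three edges among a, b, c,
    or None if the three nodes do not form a triangle.

    edges: dict mapping frozenset({node, node}) -> sense ID
    (a simplification: one sense ID per node pair).
    """
    pairs = [frozenset(p) for p in combinations((a, b, c), 2)]
    if all(p in edges for p in pairs):
        return tuple(edges[p] for p in pairs)
    return None


xuan = ("xuân", "Vietnamese")
spring = ("spring", "English")
printemps = ("printemps", "French")

edges = {
    frozenset((xuan, spring)): 2,       # Vietnamese-English dictionary
    frozenset((xuan, printemps)): 3,    # Vietnamese-French dictionary
    frozenset((spring, printemps)): 1,  # English dictionary entry
}

senses = triangle_senses(edges, xuan, spring, printemps)
if senses is not None:
    print(senses, "treated as equivalent with probability", CLIQUE_PROB)
```

In this example the three edges come from three different dictionary entries, so their sense IDs (2, 3, 1) would be treated as one sense with probability 0.85.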

Page 27: CrossLingual Image Search on the Web

Computing Translation Probabilities

• Probability decreases each time the word sense ID changes.
• Probability increases with multiple distinct paths.

[Figure: translation graph around spring (English) with printemps (French), primavera (Spanish), an Arabic word (not preserved in this transcript), koanga (Maori), udaherri (Basque), and пружина (Russian); edges are labeled with sense IDs 1, 2, and 3.]

Page 28: CrossLingual Image Search on the Web

Computing Translation Probabilities

• CLISE can find translations that are not in any single source dictionary

• Translation probability decreases with each transition to a new word sense ID

Path $P$ from $n_1$ to $n_k$: nodes $n_1, n_2, \ldots, n_{k-1}, n_k$ joined by edges with sense IDs $s_1, s_2, \ldots, s_{k-1}$.

$$\mathrm{pathp}(n_1, n_k, s, P) = \max_{i \in 1..|P|} \mathrm{prob}(s = s_i) \times \prod_{i \in 1..|P|-1} \mathrm{prob}(s_i = s_{i+1})$$
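The path score can be illustrated with a short sketch, assuming it is the best match between the requested sense and any sense ID on the path, multiplied by the sense equivalence probabilities of consecutive edges. The names `path_probability`, `sense_prob`, and `SENSE_PROB`, and the value 0.75, are hypothetical.

```python
from math import prod

# Hypothetical sense equivalence estimates, keyed by unordered pairs of IDs.
SENSE_PROB = {frozenset((1, 3)): 0.75}


def sense_prob(a, b):
    """prob(a = b): 1.0 for the same ID, otherwise a table lookup."""
    if a == b:
        return 1.0
    return SENSE_PROB.get(frozenset((a, b)), 0.0)


def path_probability(path_senses, target_sense):
    """Score one path whose edges carry the sense IDs in `path_senses`."""
    # Best match between the requested sense and any sense on the path.
    anchor = max(sense_prob(target_sense, s) for s in path_senses)
    # Every change of sense ID along the path multiplies in a factor <= 1.
    chain = prod(sense_prob(a, b)
                 for a, b in zip(path_senses, path_senses[1:]))
    return anchor * chain


print(path_probability([1, 1], target_sense=1))  # 1.0
print(path_probability([1, 3], target_sense=1))  # 0.75
```

A path that stays on one sense ID keeps probability 1.0, while each transition to a different sense ID can only lower the score.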

Page 29: CrossLingual Image Search on the Web

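Probability from Multiple Paths

• Probability that $n_1$ is translated as $n_k$ in sense $s$ increases when there are multiple paths between $n_1$ and $n_k$

[Figure: two distinct paths from $n_1$ to $n_k$, one of them through $n_2$; edges are labeled $s_1$, $s_2$, and $s_3$.]

“Noisy-or” probability model:

$$\mathrm{prob}(n_1, n_k, s) = 1 - \prod_{P \in \mathrm{distinctP}} \left(1 - \mathrm{pathp}(n_1, n_k, s, P)\right)$$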
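The noisy-or combination is small enough to show directly: the translation fails only if every distinct path fails, so the combined probability is one minus the product of the per-path failure probabilities. The function name `noisy_or` and the example path probabilities are illustrative assumptions.

```python
from math import prod


def noisy_or(path_probs):
    """Combine per-path probabilities for distinct paths between
    the same pair of nodes in the same sense."""
    return 1.0 - prod(1.0 - p for p in path_probs)


# Two hypothetical distinct paths with probabilities 0.75 and 0.5.
print(noisy_or([0.75, 0.5]))  # 0.875
# A single path is returned unchanged.
print(noisy_or([0.75]))       # 0.75
```

Each additional distinct path can only raise the combined value, which matches the slide's point that multiple paths increase the translation probability.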

Page 30: CrossLingual Image Search on the Web

Outline of Talk

• Overview of PanImages
• Building a translation graph
  – Merging entries from multiple dictionaries
  – Computing translation probabilities
• Experimental Results
• Conclusions

Page 31: CrossLingual Image Search on the Web

Graph Statistics

• Translation graph from 17 dictionaries:
  – English Wiktionary: 19,500 words with translations
  – French Wiktionary: 12,700 words with translations
  – Esperanto dictionary: 23,000 words with translations
  – 14 bilingual dictionaries: average 90,000 words each

• Graph has:
  – 1.4 million words
  – 957 languages
  – 60 languages have over 1,000 words

Page 32: CrossLingual Image Search on the Web

Experiment 1: Translation paths in Graph

• Evaluate translations for language pairs:
  – English - Russian
  – English - Hebrew
  – Turkish - Russian

• Select random 1,000 English (Turkish) words from graph
• Compare number of words translated, precision
  – Baseline is Direct translation
  – Multilingual dictionaries vs. All dictionaries
  – Effect of path length

Page 33: CrossLingual Image Search on the Web

Results of Experiment 1

                  Direct         Multilingual    All Dictionaries   All Dictionaries
                  (length = 1)   (length <= 2)   (length <= 2)      (length <= 4)
                  words  P       words  P        words  P           words  P
English-Russian   385    0.91
English-Hebrew     87    0.93
Turkish-Russian    67    0.92

Direct translations:
  - same sense ID in multilingual dictionary
  - precision 0.92 due to parsing errors (inconsistent dictionary formats)

Page 34: CrossLingual Image Search on the Web

Results of Experiment 1

                  Direct         Multilingual    All Dictionaries   All Dictionaries   Gain from
                  (length = 1)   (length <= 2)   (length <= 2)      (length <= 4)      Direct
                  words  P       words  P        words  P           words  P
English-Russian   385    0.91    417    0.82                                           1.08 x
English-Hebrew     87    0.93    124    0.82                                           1.43 x
Turkish-Russian    67    0.92     75    0.86                                           1.12 x

Adding paths with 2 sense IDs between multilingual dictionaries:
  - modest gain in number of words translated
  - precision in mid 0.80’s

Page 35: CrossLingual Image Search on the Web

Results of Experiment 1

                  Direct         Multilingual    All Dictionaries   All Dictionaries   Gain from
                  (length = 1)   (length <= 2)   (length <= 2)      (length <= 4)      Direct
                  words  P       words  P        words  P           words  P
English-Russian   385    0.91    417    0.82     513    0.81                           1.33 x
English-Hebrew     87    0.93    124    0.82     157    0.79                           1.80 x
Turkish-Russian    67    0.92     75    0.86     211    0.80                           3.15 x

Adding paths of length 2 from bilingual dictionaries:
  - precision still above 0.80
  - large gain in number of words translated
  - biggest gain from adding Turkish bilingual dictionaries

Page 36: CrossLingual Image Search on the Web

Results of Experiment 1

                  Direct         Multilingual    All Dictionaries   All Dictionaries   Gain from
                  (length = 1)   (length <= 2)   (length <= 2)      (length <= 4)      Direct
                  words  P       words  P        words  P           words  P
English-Russian   385    0.91    417    0.82     513    0.81        516    0.71        1.34 x
English-Hebrew     87    0.93    124    0.82     157    0.79        177    0.64        2.03 x
Turkish-Russian    67    0.92     75    0.86     211    0.80        236    0.72        3.52 x

Adding paths with more than 2 sense IDs:
  - only a small further gain in words translated
  - sharp drop in precision
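The “Gain from Direct” figures appear to be the number of words translated divided by the number translated directly. The short check below reproduces the final column from the word counts reported for direct translation and for all dictionaries with path length <= 4; the variable names are illustrative.

```python
# Words translated, as reported in the Experiment 1 results.
direct = {"English-Russian": 385, "English-Hebrew": 87, "Turkish-Russian": 67}
all_len4 = {"English-Russian": 516, "English-Hebrew": 177, "Turkish-Russian": 236}

# Gain = words translated with longer paths / words translated directly.
gains = {pair: round(all_len4[pair] / direct[pair], 2) for pair in direct}

for pair, gain in gains.items():
    print(f"{pair}: {gain:.2f} x")
# English-Russian: 1.34 x
# English-Hebrew: 2.03 x
# Turkish-Russian: 3.52 x
```

The same ratio applied to the other columns gives the gains shown on the earlier result slides, for example 513 / 385 ≈ 1.33 for English-Russian with all dictionaries at length <= 2.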

Page 37: CrossLingual Image Search on the Web

Experiment 2: Image Search

• 10 concepts with distinctive images: ant, clown, fig, lake, sky, train, eat, run, happy, tired

• 100 random non-English terms from translation graph
  – 10 terms for each concept

• Compare results of Google Image search
  – using non-English term as search query
  – using PanImages translation into English

• Metrics:
  – Number of results
  – Precision of first 15 pages of results (18 results per page)

Page 38: CrossLingual Image Search on the Web

Results of Experiment 2

For “33 minor languages” (Danish, Dutch, Greek, Lithuanian, …)
• Increases number of correct results by 75% on first 270 results
• Increases average precision by 27%

[Figure: precision (y-axis, 0 to 0.6) against correct results (x-axis, 0 to 100) for two curves: untranslated query and PanImages translation.]

Page 39: CrossLingual Image Search on the Web

Future Work

• Increase precision of translation paths
  – Cleaner parsing of dictionaries
  – More accurate word sense equivalence probabilities

• PanImages Web page in user’s choice of language
• Word sense glosses in user’s language
• More dictionaries to increase coverage of graph

Page 40: CrossLingual Image Search on the Web

Conclusions

• PanImages: a fully implemented cross-lingual image search system for the Web: www.cs.washington.edu/research/panimages

• PanImages boosts recall and raises precision for minor language search queries

• We introduced the translation graph
  – Combines multiple machine readable dictionaries
  – Probabilistic word-sense merging across dictionaries
  – Infers translations not found in any source dictionary

Page 41: CrossLingual Image Search on the Web


Thank you!