Combining Text and Image Queries at ImageCLEF2005: A Corpus-Based Relevance-Feedback Approach

Yih-Cheng Chang
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, Taiwan

Hsin-Hsi Chen
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, Taiwan

Wen-Cheng Lin
Department of Medical Informatics
Tzu Chi University
Hualien, Taiwan

ImageCLEF 2005

NTU NLPL 2

Why Combine Text and Image Queries in Cross-Language Image Retrieval?

Text-based image retrieval
- Translation errors in cross-language image retrieval
- Annotation errors in automatic annotation
- Easy to capture semantic meaning
- Easy to construct a textual query

Content-based image retrieval (CBIR)
- Semantic meaning is hard to represent
- Requires finding or drawing example images
- Avoids translation in cross-language image retrieval
- Annotation is not necessary

How to Combine Text and Image Features in Cross-Language Image Retrieval?

- Parallel approach: conduct text-based and content-based retrieval separately and merge the retrieval results
- Pipeline approach: use textual or visual information to perform the initial retrieval, then employ the other feature to filter out irrelevant images
- Transformation-based approach: mine the relations between images and text, and employ the mined relations to transform textual information into visual information, and vice versa

Approach at ImageCLEF 2004

- Automatically transform textual queries into visual representations
- Mine the relationships between text and images
  - Divide an image into several smaller parts
  - Link the words in the caption to the corresponding parts
  - Analogous to word alignment in a sentence-aligned parallel corpus
  - Build a transmedia dictionary
- Transform a textual query into a visual one using the transmedia dictionary
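The mining step above can be illustrated with a deliberately simple co-occurrence model. This is a sketch under stated assumptions, not the alignment model used in the 2004 system: it presumes each training image has already been reduced to a set of discrete block labels (hypothetical names such as B01), and it estimates P(block | word) from how often a caption word and a block label appear in the same training pair. The function names and the parameters `top_n` and `min_prob` are hypothetical.

```python
from collections import Counter, defaultdict


def build_transmedia_dictionary(pairs):
    """Estimate P(block | word) from (caption_words, block_labels) pairs.

    Plain co-occurrence counting; a hypothetical stand-in for a real
    word-to-region alignment model.
    """
    cooc = defaultdict(Counter)
    for words, blocks in pairs:
        for word in set(words):
            for block in set(blocks):
                cooc[word][block] += 1
    dictionary = {}
    for word, counts in cooc.items():
        total = sum(counts.values())
        dictionary[word] = {b: c / total for b, c in counts.items()}
    return dictionary


def text_to_visual_query(query_words, dictionary, top_n=2, min_prob=0.0):
    """Collect up to top_n block labels per query word with P >= min_prob."""
    visual = []
    for word in query_words:
        # Rank blocks by descending probability, ties broken by label.
        ranked = sorted(dictionary.get(word, {}).items(),
                        key=lambda kv: (-kv[1], kv[0]))
        for block, prob in ranked[:top_n]:
            if prob >= min_prob and block not in visual:
                visual.append(block)
    return visual


# Toy training data: caption words paired with made-up block labels.
training_pairs = [
    (["mare", "foal", "field"], ["B01", "B02"]),
    (["mare", "hill"], ["B01", "B03"]),
    (["field", "slope"], ["B02", "B04"]),
]
transmedia = build_transmedia_dictionary(training_pairs)
visual_query = text_to_visual_query(["mare"], transmedia, top_n=1)
print(visual_query)
```

In this toy data "mare" co-occurs with B01 in both of its captions, so B01 is the block chosen to represent it.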

System at ImageCLEF2004

[Figure: system architecture diagram. Components shown: training collection (images, image captions), text-image correlation learning, transmedia dictionary; source language textual query, query translation, language resources, target language textual query, text-based image retrieval, textual index; query transformation, visual query, content-based image retrieval, visual index; target collection (images, image captions); result merging; retrieved images.]

Learning Correlation

[Figure: an example image with the caption "Mare and foal in field, slopes of Clatto Hill, Fife". Caption words: hill, mare, foal, field, slope. Segmentation yields image blocks B01, B02, B03, B04.]

Text-Based Image Retrieval at ImageCLEF2004

Run       Query Translation             Backward Transliteration   Mean Average Precision
WCO       WCO                           No                         0.2920
WCO+NT    WCO                           Yes                        0.3276
F2hf      First-two-highest-frequency   No                         0.4015
F2hf+NT   First-two-highest-frequency   Yes                        0.4395
Mono      -                             -                          0.6304

Using similarity-based backward transliteration improves performance.

F2hf+NT reaches 69.71% of the monolingual (Mono) mean average precision.

Cross-Language Experiments at ImageCLEF2004

Query Type                                                      Mean Average Precision
Textual Query (F2hf+NT)                                         0.4395
Generated Visual Query (18 topics)                              0.0110 (poor)
Textual Query + Generated Visual Query (N+V+A, n=30, t=0.02)    0.4441

+0.46%: insignificant performance increase

Analyses of These Approaches

Parallel approach and pipeline approach
- Simple and useful
- Do not employ the relations between visual and textual features

Transformation-based approach
- Textual and visual queries can be translated into each other using the relations between visual and textual features
- Hard to learn all relations between all visual and textual features
- Degree of ambiguity of the relations is usually high

Our Approach at ImageCLEF2005: A Corpus-Based Relevance Feedback Method

A corpus-based relevance feedback approach
- Initiate a content-based retrieval
- Treat the retrieved images and their text descriptions as aligned documents
- Adopt a corpus-based method to select key terms from the text descriptions, and generate a new query
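The feedback cycle above can be sketched as follows. The slides do not spell out the corpus-based term-selection method, so this sketch assumes a simple stand-in: each term is scored by the number of top-ranked captions that contain it, weighted by a smoothed inverse document frequency over the whole collection. The function names, the stopword list, and the scoring formula are all hypothetical.

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase a caption and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def select_feedback_terms(top_captions, all_captions, k=3):
    """Return the k terms that best characterize the top-ranked captions.

    score(term) = (top captions containing term)
                  * log((N + 1) / (collection document frequency + 1))
    This tf-idf-style score is an assumption of the sketch.
    """
    n_docs = len(all_captions)
    collection_df = Counter()
    for caption in all_captions:
        collection_df.update(set(tokenize(caption)))
    feedback_df = Counter()
    for caption in top_captions:
        feedback_df.update(set(tokenize(caption)))
    scores = {
        term: count * math.log((n_docs + 1) / (collection_df[term] + 1))
        for term, count in feedback_df.items()
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


collection = [
    "Air-force personnel boarding aeroplane at military air base",
    "Twin engined passenger aeroplane on grassy airfield",
    "Mare and foal in field",
    "Fishing boats in harbour",
    "Golfers on the links",
    "Cathedral ruins and harbour",
]
top_ranked = collection[:2]  # captions of the images returned by the visual run
new_query = select_feedback_terms(top_ranked, collection, k=1)
print(new_query)
```

The selected terms would then be issued as a textual query against the caption index, which is what turns a purely visual starting point into a text retrieval run.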

Fundamental Concepts of a Corpus-Based Relevance Feedback Approach

[Figure: example images are submitted to the CBIR system; the text descriptions of the retrieved images yield a textual query, which is sent to the text-based retrieval system to produce the textual retrieval result.]

Text descriptions of two retrieved images:

3511 Aux Squadron off to Germany Air-force personnel boarding aeroplane at military air base; technician cyclist and aircraft hangars in background. July 1952 George Middlemass Cowie Fife Scotland GMC-4-29-16 pc/mb

"Grouville Bay" Jersey Airways. Twin engined passenger aeroplane on grassy airfield with aircraft hangars in background. Registered 2 July 1935 J Valentine & Co Jersey Channel Isles JV-G2933 pc/mb

Textual query generated for the topic "Aircraft on the ground": aeroplane, military air base, airfield, ...

[Figure: system flow diagram. Components shown: Chinese query, query translation, English query, text-based retrieval system, textual run; example image, CBIR system (VIPER system), visual retrieval result, initial visual run; pseudo relevance feedback, textual query (English description), feedback run; normalized merge, final retrieval result, show images to user; image database (images with English descriptions).]
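The normalized merge step in the system flow can be sketched as below. The slides do not give the normalization or the weights, so this sketch assumes that each run's scores are divided by that run's top score and that the two runs are combined by an equally weighted sum. The function names and the weight parameters are hypothetical.

```python
def normalize(scores):
    """Scale a run's scores so that its top score becomes 1.0."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {doc: 0.0 for doc in scores}
    return {doc: s / top for doc, s in scores.items()}


def normalized_merge(run_a, run_b, weight_a=0.5, weight_b=0.5):
    """Merge two runs (dicts of image id -> score) by a weighted sum.

    An image missing from one run contributes 0.0 for that run.
    """
    a, b = normalize(run_a), normalize(run_b)
    merged = {
        doc: weight_a * a.get(doc, 0.0) + weight_b * b.get(doc, 0.0)
        for doc in set(a) | set(b)
    }
    # Highest merged score first; ties are broken by image id.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy scores on different scales, as two retrieval systems would produce.
textual_run = {"img1": 12.0, "img2": 6.0}
feedback_run = {"img2": 0.8, "img3": 0.4}
ranking = normalized_merge(textual_run, feedback_run)
print(ranking)
```

Normalizing first matters because the two systems score on unrelated scales; without it the run with the larger raw scores would dominate the merged list.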

Bilingual Ad hoc Retrieval Task

- 28,133 photographs from St. Andrews University Library’s photographic collection
- The collection is in English and the queries are in different languages
  - In our experiments, queries are in Chinese
- All images are accompanied by a textual description written in English by librarians working at St. Andrews Library
- The test set contains 28 topics, and each topic has a text description and an example image

An Example: An Image and Its Description

An Example: A Topic in Chinese

[Figure: an example topic with a Chinese title and an English title.]

Some Models in Formal Runs

Experiment Results at ImageCLEF2005

[Figure: average precision (%) per query for the 28 topics. x-axis: query (1 to 28); y-axis: average precision (%), 0 to 120. Series: CE, EE, EX, CE+EX, EE+EX. Annotated gains: +25.96%, +15.78%, +11.01%.]

Performance: EE+EX > CE+EX ≈ EE > EX > CE > Visual run

Lessons Learned

- Compared with the initial visual retrieval, average precision increased from 8.29% to 34.25% after the feedback cycle.
- Combining textual and visual information can improve performance.

Example: Aircraft on the Ground ( )

[Figure: top retrieved images for the text-only monolingual run and the text-only cross-lingual run.]

The top 2 images in the cross-lingual run are non-relevant because of a query translation problem: clear ( ), above ( ), floor ( ).

Example: Aircraft on the Ground (after integration)

[Figure: top retrieved images for the Text (monolingual) + Visual run.]

The Text+Visual run is better than the monolingual run because it expands the query with useful words, e.g., aeroplane, military air base, airfield.

ImageCLEF2004 vs. ImageCLEF2005

- Text-based IR (monolingual case): 0.6304 (2004) vs. 0.3952 (2005)
  - This year's topics are a little harder
- Text+Image IR (monolingual case): 0.6591 (2004) vs. 0.5053 (2005)
- Text+Image IR (cross-lingual case): 0.4441 (2004) vs. 0.3977 (2005)
  - 70.45% vs. 100.63% of the monolingual text-based IR performance
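The two percentages above are consistent with dividing each year's cross-lingual Text+Image score by the same year's monolingual text-based score. That reading is inferred from the numbers on the slide rather than stated there, and it can be checked directly:

```python
# Mean average precision values as given on the slide.
mono_text = {"2004": 0.6304, "2005": 0.3952}
cross_text_image = {"2004": 0.4441, "2005": 0.3977}

# Cross-lingual Text+Image performance as a percentage of monolingual text IR.
relative = {
    year: round(100 * cross_text_image[year] / mono_text[year], 2)
    for year in mono_text
}
print(relative)  # {'2004': 70.45, '2005': 100.63}
```

A value above 100% for 2005 means the cross-lingual run with visual feedback slightly exceeded the monolingual text-only run.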

Automatic Annotation Task

- The automatic annotation task in ImageCLEF 2005 can be seen as a classification task, since each image can only be annotated with one word (i.e., a category).
- We propose several methods to measure the similarity between a test image and a category, and a test image is classified into the most similar category.
- The methods we propose use the same image features but different classification approaches.

Image Feature Extraction

- Resize images to 256 x 256 pixels
- Segment each image into 32 x 32 blocks (each block is 8 x 8 pixels)
- Compute the average gray value of each block to construct a vector with 1,024 elements
- The similarity between two images is measured by the cosine formula
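The feature extraction steps above can be sketched in plain Python. This sketch assumes the image has already been resized to 256 x 256 and converted to grayscale, and that it is given as a list of 256 rows of 256 numbers; the resizing step itself is not shown. The function and constant names are hypothetical.

```python
import math

BLOCK = 8   # block edge length in pixels
GRID = 32   # blocks per row and per column (256 / 8)


def block_gray_features(image):
    """Return the 1,024 block means of a 256 x 256 grayscale image."""
    if len(image) != 256 or any(len(row) != 256 for row in image):
        raise ValueError("expected a 256 x 256 grayscale image")
    features = []
    for by in range(GRID):
        for bx in range(GRID):
            # Sum the 64 pixels of block (by, bx), then average them.
            total = sum(
                image[by * BLOCK + dy][bx * BLOCK + dx]
                for dy in range(BLOCK)
                for dx in range(BLOCK)
            )
            features.append(total / (BLOCK * BLOCK))
    return features


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Two synthetic images: one dark on its left half, one dark on its top half.
dark_left = [[0 if x < 128 else 200 for x in range(256)] for _ in range(256)]
dark_top = [[0 if y < 128 else 200 for _ in range(256)] for y in range(256)]
feat_left = block_gray_features(dark_left)
feat_top = block_gray_features(dark_top)
similarity = cosine_similarity(feat_left, feat_top)
print(len(feat_left), similarity)
```

The two synthetic images share one bright quadrant out of the two bright quadrants each has, so their similarity comes out at about one half.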

Some Models and Experimental Results

- NTU-annotate05-1NN: Baseline model. It uses the 1-NN method to classify each image.
- NTU-annotate05-Top2: Computes the similarity between a test image and a category using the top 2 nearest images in each category, and classifies the test image into the most similar category.
- NTU-annotate05-SC: Training data is clustered using the k-means algorithm (k=1000). We compute the centroid of each category in each cluster, and classify a test image into the category of the nearest centroid.
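The 1NN and Top2 models above can be sketched with one function. The slides do not say how the two nearest similarities in a category are combined, so averaging them is an assumption of this sketch; the SC model (k-means plus per-cluster centroids) is not covered here. The function names and the toy vectors are hypothetical.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_top_k(test_vec, training, k=1):
    """Assign test_vec to the category whose k most similar training
    images have the highest mean similarity to it.

    k=1 is plain 1-NN; k=2 mimics the Top2 idea. Averaging the top-k
    similarities is an assumption of this sketch.
    """
    sims = defaultdict(list)
    for vec, category in training:
        sims[category].append(cosine_similarity(test_vec, vec))

    def category_score(category):
        best = sorted(sims[category], reverse=True)[:k]
        return sum(best) / len(best)

    # Iterate categories in sorted order so ties resolve deterministically.
    return max(sorted(sims), key=category_score)


# Toy 2-D feature vectors; real features would have 1,024 elements.
training = [
    ([1.0, 0.0], "A"),
    ([0.0, 1.0], "A"),
    ([0.8, 0.6], "B"),
    ([0.6, 0.8], "B"),
]
test_image = [1.0, 0.0]
label_1nn = classify_top_k(test_image, training, k=1)
label_top2 = classify_top_k(test_image, training, k=2)
print(label_1nn, label_top2)
```

In this toy data the single nearest neighbour belongs to category A, while category B is closer on average over its two nearest images, so the two models disagree.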

Conclusion: Bilingual Ad hoc Retrieval Task

- An approach that combines textual and image features is proposed for Chinese-English image retrieval.
  - A corpus-based feedback cycle from CBIR
- Compared with the performance of monolingual IR (0.3952), integrating visual and textual queries achieves better performance in cross-language image retrieval (0.3977).
  - Resolves part of the translation errors
- The integration of visual and textual queries also improves the performance of monolingual IR from 0.3952 to 0.5053.
  - Provides more information
- The improvement is the best among all the groups.
  - 78.2% of the best monolingual text retrieval

Conclusion: Automatic Annotation Task

- A feature extraction algorithm is proposed, and several classification approaches are explored under the same image features.
- The 1-NN and top-2 approaches, which have error rates of 21.7%, outperform the centroid-based approach (error rate 22.5%).
- Our method is about 9 percentage points worse than the best-performing group (error rate 12.6%), but is better than most of the groups in this task.

Thank You and Comments
