Fast Query Point Movement Techniques with Relevance...

Fast Query Point Movement Techniques withRelevance Feedback for Content-based Image

Retrieval

Danzhou Liu, Kien A. Hua, Khanh Vu, and Ning Yu

School of Computer ScienceUniversity of Central FloridaOrlando, Florida 32816, USA

{dzliu, kienhua, khanh, nyu}@cs.ucf.eduhttp://www.cs.ucf.edu/~dzliu

Abstract. Target search in content-based image retrieval (CBIR) sys-tems refers to finding a specific (target) image such as a particular reg-istered logo or a specific historical photograph. Existing techniques weredesigned around query refinement based on relevance feedback, sufferfrom slow convergence, and do not even guarantee to find intended tar-gets. To address those limitations, we propose several efficient querypoint movement methods. We theoretically prove that our approach isable to reach any given target image with fewer iterations in the worstand average cases. Extensive experiments in simulated and realistic en-vironments show that our approach significantly reduces the number ofiterations and improves overall retrieval performance. The experimentsalso confirm that our approach can always retrieve intended targets evenwith poor selection of initial query points and can be employed to im-prove the effectiveness and efficiency of existing CBIR systems.

1 Introduction

Content-based image retrieval (CBIR) has received much research attention inthe last decade, motivated by the immensely growing amount of multimediadata. Many CBIR systems have recently been developed, including QBIC [5],MARS [11, 14], Blobworld [2], PicHunter [4], and others [15, 18, 20, 21]. In a typ-ical CBIR system, low-level visual image features (e.g., color, texture and shape)are automatically extracted for image descriptions and indexing purposes. Tosearch for desirable images, a user presents an image as an example of similarity.The system then returns a set of similar images based on the extracted features.In CBIR systems with relevance feedback, the user can mark returned images aspositive or negative, which are fed back into the system as a new, refined queryfor the next round of retrieval. The process is repeated until the user is satisfiedwith the query result. Relevance feedback helps bridge the semantic gap betweenthe descriptive limitations of low-level features and human perception of similar-ity [16]. Such systems achieve high effectiveness for many practical applications[6].

There are two general types of search: target search and category search [4,6]. The goal of target search is to find a specific (target) image (e.g., a registeredlogo, a historical photograph or a painting), which can be determined basedon low-level features. The goal of category search is to retrieve a particularsemantic class or genre of images (e.g. scenery images or skyscrapers). Targetsearch corresponds to known-item search in information retrieval; category searchcorresponds to high-precision search. Due to semantic gaps, images in a semanticcategory might scatter in several clusters in low-level feature space. To retrievea semantic class, category search is normally decomposed into several targetsearches, in which representatives of the clusters are located. The representativesare then used to retrieve the members of the clusters. Efficient target searchtechniques are therefore essential for both target search and category search.Hence, we focus on target search in this paper.

Existing target search techniques allow the re-retrieval of checked imageswhen they fall in the search range. This leads to a host of major disadvantages:

– Local maximum traps. Since query points in relevance feedback systemshave to move through many regions before reaching a target, it is possiblethat they get trapped in one of these regions. Figure 1 illustrates a possiblescenario. As a result of a 3-NN search at ps, the system returns points p1 andp2, in addition to query point ps (s and t respectively denote the startingquery point ps and the target point pt). Since both p1 and p2 are relevant,the refined query point pr is their centroid and the anchor of the next 3-NNsearch. However, the system will retrieve exactly the same set, from whichpoints p1 and p2 are again selected. In other words, the system can neverget out because the retrieval set is saturated with the k checked images.Although, the system can escape with a larger k, it is difficult to guess aproper threshold (up to k = 14 in this example). Consequently, we mightnot even know a local maximum trap is occurring.

– No guarantee that returned images are the most relevant. This is due tolocal maximum traps and thereby no guarantee to find the target image.

– Slow convergence. The centroid of the relevant points is typically selected asthe anchor of refined queries. This, coupling with possible retrieval of alreadyvisited images, prevents aggressive movement of search (see Figure 2, wherek = 3). Slow convergence also implies that users must spend more time withthe system, refining intermediate queries.

– High resource requirements. These overheads are the results of slow conver-gence, local maximum traps and larger intermediate results.

To address the above limitations, we propose four target search methods:naıve random scan (NRS), local neighboring movement (LNM), neighboringdivide and conquer (NDC), and global divide and conquer (GDC) methods.All these methods are built around a common strategy: they do not retrievechecked images (i.e., shrink the search space). Furthermore, NDC and GDC ex-ploit Voronoi diagrams to aggressively prune the search space and move towardstarget images. We theoretically prove that the convergence speeds of GDC and

NDC are much faster than those of NRS and recent methods. Results of exten-sive experiments confirm our complexity analysis and show the superiority ofour techniques in both the simulated and realistic environments. A preliminarydesign based on heuristics was presented in [10]. This paper introduces theo-ries and formal proofs to support the proposed techniques, and presents moreextensive experiments.

The remaining of the paper is organized as follows. In Section 2, we surveyrecent works on target search. Section 3 presents in detail our proposed methodsfor target search. Section 4 describes our performance experiments and discussesthe results. Finally, we conclude the paper and highlight our future researchdirections in Section 5.

2

s 1

t

r

Cluster 1 Cluster 2

Fig. 1. Local maximum trap

t

s

t

Fig. 2. Slow convergence

2 Related Work

In this section, we survey existing techniques for target search. We also reviewcategory search techniques because they are closely related. Category search

techniques can be used for target search if we assume the desired category hasonly one target image.

Two well-known techniques for target search were proposed in QBIC [5] andPicHunter [4]. IBM’s QBIC system allows users to compose queries based onvisual image features such as color percentage, color layout, and texture presentin the target image, and ranks retrieved images according to those criteria. Toachieve good results, users are required to compose queries with an adequateknowledge of the targets’ properties, which is normally a difficult and time-consuming process for unskilled users. To lessen the burden on users, PicHunterproposes to predict query’s intents using a Bayesian-based relevance feedbacktechnique to guide query refinement and target search. PicHunter’s performance,however, depends on the consistency of users’ behavior and the accuracy of theprediction algorithm. In addition, both QBIC and PicHunter do not guaranteeto find target images and suffer local maximum traps.

Techniques for category search can be divided into two groups: single-pointand multipoint movement techniques. A technique is classified as a single-pointmovement technique if the refined query Qr at each iteration consists of onlyone query point. Otherwise, it is a multi-point movement technique. Typicalquery shapes of single-point movement and multi-point movement techniques areshown in Figures 3 and 4 where the contours represent equi-similarity surfaces.Single-point movement techniques, such as MARS [11, 14] and MindReader [8],construct a single query point, which is close to relevant images and away fromirrelevant ones. MARS uses a weighted distance (producing shapes as shownin Figure 3.2), where each dimension weight is inversely proportional to thestandard deviation of the relevant images’ feature values in that dimension. Therationale is that a small variation among the values is more likely to expressrestrictions on the feature, and thereby should carry a higher weight. On theother hand, a large variation indicates this dimension is not significant in thequery, thus should assume a low weight. MindReader achieves better results byusing a generalized weighted distance, see Figure 3.3 for its shape.

1. Original 2. Dimension Weighting 3. Generalized Weighting

Fig. 3. Single-points movement query shapes

In multipoint movement techniques such as Query expansion [3], Qcluster [9],and Query Decomposition [7], multiple query points are used to define the idealspace that is most likely to contain relevant results. Query expansion groupsquery points into clusters and chooses their centroids as Qr’s representatives ,

1. Convex Shape 2. Concave Shape

Fig. 4. Multiple point movement query shapes

see Figure 4.1. The distance of a point to Qr is defined as a weighted sum ofindividual distances to those representatives. The weights are proportional to thenumber of relevant objects in the clusters. Thus, Query expansion treats localclusters differently, compared to the equal treatment in single-points movementtechniques.

In some queries, clusters are too far apart for a unified, all-encompassingcontour to be effective; separate contours can yield more selective retrieval. Thisobservation motivated Qcluster to employ an adaptive classification and cluster-merging method to determine optimal contour shapes for complex queries. Qclus-ter supports disjunctive queries, where similarity to any of the query points isconsidered as good, see Figure 4.2. To handle disjunctive queries both in vec-tor space and in arbitrary metric space, a technique was proposed in FALCON[22]. It uses an aggregate distance function to estimate the (dis)similarity of anobject to a set of desirable images. To handle semantic gaps better, we recentlyproposed a query decomposition technique [7]. In general, the above categorysearch techniques do not guarantee to find target images and still suffer slowconvergence, local maximum traps and high computation overhead.

To avoid local maximum traps and their problems, our methods will ignore allchecked images. They will be discussed in the order of their sophistication in thenext section. The most complex, GDC, is based on the single-point movementmethod, which proves to converge faster than multipoint movement mehthods.It employs Voronoi diagrams to prune irrelevant images, assisting users in queryrefinement and enabling fast convergence.

3 Target Search Methods

In this section, we present the four proposed target search methods. Again, thegoals of our target search methods are avoiding local maximum traps, achievingfast convergence, reducing resource requirements, and guaranteeing to find targetimages. Reconsidering already checked images is one of the several shortcomingsof existing techniques that causes local maximum trap and slow convergence. Theidea of leaving out checked images is our motivation for a new design principle.We assume that users are able to accurately identify the most relevant image outof the returned images, and the most relevant image is the closest image to thetarget image among the returned ones. We will also discuss the steps to ensure

our target search system less sensitive to users’ inaccurate relevance feedback inSection 4.

The ultimate goal of target search is to find target images. Thus if targetimages were not found, the final precision and recall would be zero. In CBIRwith relevance feedback, the traditional recall and precision can be computedfor individual iteration. For target search, we can use the so-called ‘aggregate’recall and precision: if after several, say i, iterations the target image is found,the average precision and recall are 1/(i · k) and 1/i, where k is the numberof images retrieved at each iteration. In short, the number of iterations to findtarget images is not only the most significant measure of efficiency but also themost significant indicator of precision and recall. Therefore, we use the numberof iterations as the major measure for theoretical analysis and experimentalevaluation of the four proposed target search methods.

Initial Random Images or Query Result Images

Relevance Feedback Refined Query Q r

Terminate Find Target

Not Find Target Target Search

Methods

Evaluate Q r

User Evaluate

Fig. 5. Overview of the target search systems

A query is defined as Q = 〈nQ, PQ,WQ, DQ,S, k〉, where nQ denotes thenumber of query points in Q, PQ the set of nQ query points in the search spaceS, WQ the set of weights associated with PQ, DQ the distance function, andk the number of points to be retrieved in each iteration (see Figure 5). Forsingle point movement techniques, nQ = 1; for multipoint movement techniques,nQ > 1; and nQ = 0 signifies that the query is to randomly retrieve k points inS. This definition is a generalized version of Q = 〈nQ, PQ,WQ, DQ〉 defined in[3], where the search space is assumed to be the whole database for every search.In our generalized definition, S is included to account for the dynamic changeof search space, which is usually reduced after each iteration. Let Qs denote thestarting query, Qr a refined query at a feedback iteration, Qt a target querywhich results in the retrieval of the intended target, and Sk the query result set.

3.1 Naıve Random Scan Method

The NRS method randomly retrieves k different images at a time until the userfinds the target image or the remaining set is exhausted, see Figure 6. Specifically,at each iteration, a set of k random images are retrieved from the candidate (i.e.unchecked) set S′ for relevance feedback (lines 2 and 6), and S′ is then reduced

NAIVERANDOMSCAN(S, k)

Input:set of images Snumber of retrieved images at each iteration kOutput:target image pt

01 Qs ← 〈0, PQ, WQ, DQ, S, k〉02 Sk ← EVALUATEQUERY(Qs) /* randomly retrieve k points in S */03 S′ ← S− Sk

04 while user does not find pt in Sk do05 Qr ← 〈0, PQ, WQ, DQ, S′, k〉06 Sk ← EVALUATEQUERY(Qr) /* randomly retrieve k points in S′ */07 S′ ← S′ − Sk

08 enddo09 return pt

Fig. 6. Naıve Random Scan Method

by k (lines 3 and 7). Clearly, the naıve scan algorithm does not suffer localmaximum traps and is able to locate the target image after some finite number ofiterations. In the best case, NRS takes one iteration, while the worst case requires⌈|S|k

⌉. On average NRS can find the target in

⌈∑d |S|k ei=1 i/

⌈|S|k

⌉⌉=

⌈(⌈|S|k

⌉+ 1)/2

⌉

iterations. In other words, NRS takes O(|S|) to reach the target point. Therefore,NRS is only suitable for a small database set.

3.2 Local Neighboring Movement Method

Existing techniques allow already checked images to be reconsidered, which leadsto several major drawbacks as mentioned in Section 1. We apply our non-re-retrieval strategy to one such method, such as MindReader [8], to produce theLNM method. LNM is similar to NRS except lines 5 and 6 as follows:

05 Qr ← 〈nQ, PQ, WQ, DQ, S′, k〉 based on the user’s relevance feedback06 Sk ← EVALUATEQUERY(Qr) /* perform a k-NN query in S′ */

Specifically, Qr is constructed such that it moves towards neighboring relevantpoints and away from irrelevant ones, and k-NN query is now evaluated againstS′ instead of S (lines 5 and 6). When LNM encounters a local maximum trap,it enumerates neighboring points of the query, and selects the one closest to thetarget. Therefore, LNM can overcome local maximum traps, although it couldtake many iterations to do so.

Again, one iteration is required in the best case. To simplify the followingworst-case and average-case complexity analysis, we assume that S is uniformlydistributed in the n-dimensional hypercube and the distance between two nearestpoints is a unit.

Theorem 1. For LNM, the worst and average cases are⌈√

n n√|S|/dlog2n ke

⌉

and⌈(√

n n√|S|

dlog2n ke + 1)/2⌉, respectively, assuming S is uniformly distributed.

Proof. The hypercube’s edge length is n√|S|−1, and the diagonal’s

√n( n

√|S|−1).

Let the distance between the initial query point and the target point be l, thenl ≤ √

n( n√|S| − 1) <

√n n

√|S|. Note that the expected radius for k-NN search

in S is r = dlog2n ke. Since S′ ⊂ S, k-NN search in LNM requires a radiuslarger than r, but less than 2r. In other words, at each iteration, LNM movestowards the target image at an average speed of cr where 1 ≤ c < 2. It fol-lows that the number of iterations needed to reach the target is dl/(cdlog2n ke)e,which is bounded by

⌈√n n

√|S|/dlog2n ke

⌉. Then, the worst and average cases

are⌈√

n n√|S|/dlog2n ke

⌉and

⌈(√

n n√|S|

dlog2n ke + 1)/2⌉, respectively.

If data were arbitrarily distributed, then the worst case could be as high asNRS’s, i.e.

⌈|S|k

⌉iterations (e.g., when all points are on a line). In summary, in

the worst case LNM could take anywhere from O( n√|S|) to O(|S|).

3.3 Neighboring Divide and Conquer Method

Although LNM can overcome local maximum traps, it does so inefficiently, takingmany iterations and in the process returning numerous false hits. To speed upconvergence, we propose to use Voronoi diagrams [13] in NDC to reduce searchspace. The Voronoi diagram approach finds the nearest neighbors of a givenquery point by locating the Voronoi cell containing the query point. Specifically,NDC searches for the target as follows, see Figure 7. From the starting queryQs, k points are randomly retrieved (line 2). Then the Voronoi region V Ri isinitially set to the minimum bounding box of S (line 3). In the while loop,NDC first determines the Voronoi seed set Sk+1 (lines 6 to 10) and pi, the mostrelevant point in Sk+1 according to the user’s relevance feedback (line 11). Next,it constructs a Voronoi diagram V D inside V Ri using Sk+1 (line 12). The Voronoicell region containing pi in V D is now the new V Ri (line 13). Because only V Ri

can contain the target (as proved in Theorem 2), we can safely prune out theother Voronoi cell regions. To continue the search in V Ri, NDC constructs a k-NN query using pi as the anchor point (line 15), and evaluates it (line 16). Theprocedure is repeated until the target pt is found. When NDC encounters a localmaximum trap, it employs Voronoi diagrams to aggressively prune the searchspace and move towards the target image, thus significantly speeding up theconvergence. Therefore, NDC can overcome local maximum traps and achievefast convergence. We prove the following invariant.

Theorem 2. The target point is always contained inside or on an edge (surface)of V Ri, the Voronoi cell region enclosing the most relevant point pi.

NEIGHBORINGDIVIDECONQUER(S, k)

Input:set of images Snumber of retrieved images at each iteration kOutput:target image pt

01 Qs ← 〈0, PQ, WQ, DQ, S, k〉02 Sk ← EVALUATEQUERY(Qs) /* randomly retrieve k points in S */03 V Ri ← the minimum bounding box of S04 iter ← 105 while user does not find pt in Sk do06 if iter 6= 1 then07 Sk+1 ← Sk + {pi}08 else09 Sk+1 ← Sk

10 endif11 pi ← the most relevant point ∈ Sk+1

12 construct a Voronoi diagram V D inside V Ri using points in Sk+1 asVoronoi seeds

13 V Ri ← the Voronoi cell region associated with the Voronoi seed pi

in V D14 S′ ← such points ∈ S that are inside V Ri except pi

15 Qr ← 〈1, {pi}, WQ, DQ, S′, k〉16 Sk ← EVALUATEQUERY(Qr) /* perform a k-NN query in S′ */17 iter ← iter + 118 enddo19 return pt

Fig. 7. Neighboring Divide and Conquer Method

Proof. Theorem 2 can be proved by contradiction. First, note that according tothe properties of the Voronoi cell construction, if V Ri contains the most relevantpoint (i.e. the closest point) pi to the target point pt, its seed pi is the nearestneighbor of pt among Sk+1. Suppose pt is inside V Rj , i 6= j. Then there exitsanother point in Sk+1 closer to pt than pi, a contradiction.

Figure 8 explains how NDC approaches the target. In the first iteration,Sk = {p1, p2, ps} is randomly picked by the system, assuming k = 3. The useridentifies ps as pi (the most relevant point in Sk). NDC then constructs a Voronoidiagram based on those three points in Sk+1 = Sk, partitioning the search spaceinto three regions. According to Theorem 2, the target must be in V Ri. NDCthus ignoring the other two regions, performs a k-NN query anchored at ps

and retrieves Sk = {p3, p4, p5}, the three closest points inside V Ri. Again, theuser correctly identifies p5 as the most relevant point in Sk+1 = {ps, p3, p4, p5}.The system constructs a Voronoi diagram and searches only the Voronoi cell

associated with p5. The search continues and, finally, at the fourth iteration, thetarget point is reached as the result of a k-NN query of p6, the most relevantpoint in {p5, p6, p7, p8} retrieved in the third iteration. We now determine theworst-case complexity for NDC, assuming that S is uniformly distributed.

Theorem 3. Starting from any point in S, NDC can reach any target point inO(logk |S|) iterations.

Proof. At the first iteration, S is divided into k Voronoi cells. Since the pointsare uniformly distributed from which k points are randomly sampled, each V R

is expected to contain⌈|S|k

⌉points. According to Theorem 2, we only need to

search one V R, which contains about⌈|S|k

⌉points. In the second iteration, the

searched V R contains⌈( |S|k − 1)/k

⌉' ⌈|S|/k2

⌉points. In the ith iteration, each

V R contains about⌈|S|ki

⌉points. Since |S|

ki ≥ 1, NDC will stop by i ≤ logk |S|.Hence, NDC reaches the target point in no more than O(logk |S|) iterations.

When S is arbitrarily distributed, the worst case could take up to⌈ S

k

⌉iter-

ations (e.g., all points are on a line), the same as that of NRS. In other words,NDC could still require O(|S|) iterations to reach the target point in the worstcase.

2

3

1

7

8

t 6 4

5

1

3 s

Fig. 8. Example of NDC

3.4 Global Divide and Conquer Method

To reduce the number of iterations in the worst case in NDC, we propose theGDC method. Instead of using a query point and its neighboring points toconstruct a Voronoi diagram, GDC uses the query point and k points randomlysampled from V Ri. Specifically, GDC replaces lines 15 and 16 in NDC with:

15 Qr ← 〈0, PQ, WQ, DQ, S′, k〉16 Sk ← EVALUATEQUERY(Qr) /* randomly retrieve k points in S′ */

Similar to NDC, when encountering a local maximum trap, GDC employsVoronoi diagrams as well to aggressively prune the search space and move to-wards the target image, thus significantly speeding up the convergence. There-fore, GDC can overcome local maximum traps and achieve fast convergence.

1

2

6

3

t

4

5

s

Fig. 9. Example of GDC

Figure 9 shows how the target could be located according to GDC. In thefirst iteration, Sk = {p1, p2, ps} is the result of k = 3 randomly sampled points, ofwhich ps is picked as pi. Next, GDC constructs a Voronoi diagram and searchesthe V R enclosing ps. At the second iteration, Sk+1 = {ps, p4, p5, p6} and p5 isthe most relevant point pi. In the third and final iteration, the target point islocated; GDC takes 3 iterations to reach the target point. We prove that theworst case for GDC is bounded by O(logk |S|).

Theorem 4. Starting from an initial point in S, GDC can reach any targetpoint in O(logk |S|) iterations.

Proof. We will focus our attention on the size of V R at each iteration, keepingin mind that points are randomly sampled for Voronoi diagram construction.Thus, at the first iteration, the searched V R contains

⌈|S|k

⌉points; at the second

iteration, it contains⌈

|S|k·(k+1)

⌉points; and so on. At the ith iteration, it contains⌈

|S|k·(k+1)i−1

⌉points. Because |S|

k·(k+1)i−1 > 1, that is, it requires that i < logk |S|.In other words, GDC can reach any target point in no more than O(logk |S|)iterations.

Theorem 4 implies that for arbitrarily distributed datasets, GDC convergesfaster than NDC in general, although NDC might be as fast as GDC in certainqueries, e.g., if the starting query point is close to the target point. In the previous

example (Figure 8), NDC could also take three iterations, instead of four, to reachthe target point if the initial k points were the same as in Figure 9, as opposedto Figure 8.

For simplicity, we assume that users accurately pick the most relevant imageout of the returned images for each iteration in the above discussion. In practice,however, this cannot be easily achieved by users, and typically users can pickseveral relevant images instead of one in target search systems. Therefore, we canconstruct, in each iteration, a single query point that is the weighted centroid ofall the picked relevant images as in MARS and MindReader.

4 Experiments

In this section, we present the experimental results in both simulated and real-istic environments. Our dataset consists of more than 68,040 images from theCOREL library. There are a total of 37 visual image features in three maingroups: colors (9 features) [19], texture (10 features) [17], and edge structure(18 features) [23]. Our experiments were run on Sun UltraSPARC with 2GBmemory. All the data resided in memory.

4.1 Simulated Experiments

In these experiments, we evaluated the performances of MARS [11, 14], Min-dReader [8], and Qcluster [9] against our techniques (NRS’s results are omittedsince its performance can be statistically predicted). The performance metricsof interest are average total visited images, precision, recall, computation timeand the number of iterations (average, maximum, minimum, and their variance)needed for each method to retrieve an intended target. These were measured ask takes different values in {5, 15, 30, 50, 75, 100}. There were 100 pairs of startingpoints-target points selected randomly for the experiments.

In order to avoid the effects of user’s subjective and inconsistent behaviors,relevance feedback was simulated; the point in the retrieval set that is closestto the target point is automatically selected as the most relevant point. To savecomputation overhead for NDC and GDC, we constructed the Voronoi regionV Ri containing the most relevant point instead of the whole Voronoi diagram,and approximated V Ri by its minimum boundary box if V Ri contains too manysurfaces.

To illustrate common problems (slow convergence, local maximum traps,etc.) with existing approaches, we demonstrate that MARS, MindReader andQcluster have poor false hit ratios for small k. Figure 10 shows that when k issmall, their performance is affected by local maximum traps, i.e., their false hitratios are very high. Even for a fairly large k, false hits remain very high. Forexample, when k = 100, MARS’s false hit ratio is about 20% and Qcluster’sexceeds 40%, while the best performer MindReader is just below the 20% mark.As a result, users of these techniques have to examine a large number of returnedimages, but might not find their intended targets.

5 15 30 50 75 1000

0.2

0.4

0.6

0.8

1

k

fals

e hi

t rat

io

MARSMindReaderQcluster

Fig. 10. False Hit Ratio

5 15 30 50 75 1000

5

10

15

20

25

k

aver

gae

itera

tions

LNMNDCGDC

Fig. 11. Average Iterations

5 15 30 50 75 1000

10

20

30

40

50

60

k

max

imum

iter

atio

ns

LNMNDCGDC

Fig. 12. Maximum Iterations

5 15 30 50 75 1002

3

4

5

6

7

k

min

imum

iter

atio

ns

LNMNDCGDC

Fig. 13. Minimum Iterations

5 15 30 50 75 1000

2

4

6

8

10

12

k

stan

dard

dev

iatio

n

LNMNDCGDC

Fig. 14. Standard Deviation ofIterations

5 15 30 50 75 1000

0.1

0.2

0.3

0.4

0.5

k

aver

age

aggr

egat

e re

call

LNMNDCGDC

Fig. 15. Average Aggregate Re-call

5 15 30 50 75 1000

0.005

0.01

0.015

0.02

0.025

0.03

k

aver

age

aggr

egat

e pr

ecis

ion

LNMNDCGDC

Fig. 16. Average AggregatePrecision

5 15 30 50 75 1000

50

100

150

200

250

300

350

400

k

aver

age

tota

l che

cked

imag

es

LNMNDCGDC

Fig. 17. Average Total CheckedImages

5 15 30 50 75 1000.2

0.4

0.6

0.8

1

1.2

1.4

1.6

k

aver

age

tota

l cpu

tim

e (s

econ

ds) LNM

NDCGDC

Fig. 18. CPU Time

In the experiments that produced the number of iterations, we had to makesure that the compared techniques could successfully reach the intended targets.We thus used LNM in place of MindReader (LNM is an improved version ofMindReader, see Section 3). The experimental results for LNM, NDC and GDCare shown in Figures 11 to 18. They show that NDC and GDC perform more ef-ficiently when k is small, with GDC being slightly better than NDC. Specifically,when k = 5, the average numbers of iterations for LNM, NDC and GDC (seeFigure 11) are roughly 21, 10 and 7, respectively (compared to 68040

5 = 13608 it-erations in NRS); the maximum numbers are 58, 20 and 11, respectively (see Fig-ure 12); and the minimum numbers are 7, 4 and 4, respectively (see Figure 13).The results also confirm our analysis of GDC complexity (see Figure 11): GDCcan reach the target point in O(logk |S|) = (log5 68040) = 6.9141 ' 7 iterations.

The standard deviations of the iterations are shown in Figure 14. GDC andNDC are much more stable than LNM, with GDC’s slightly more uniform thanNDC’s. This indicates that GDC and NDC can achieve fast convergence evenwith a poor selection of initial query points.

The average ‘aggregate’ recalls and precisions, defined in Section 3, are shownin Figures 15 and 16 respectively. Again, experimental results show that NDCand GDC achieve better retrieval effectiveness (precision and recall) when k issmall compared to LNM, with GDC being slightly better than NDC.

The average total checked images for LNM, NDC, and GDC in the experi-ments are plotted in Figure 17. The figure shows that GDC and NDC examinefewer than half of the total checked images of LNM (compared to 68040

2 = 34020images need to be checked in NRS). In terms of CPU time, GDC is the mostefficient, although the difference is smaller as k increases (see Figure 18). Thisis because NDC and GDC take some computation overhead to construct V Ri,while LNM requires more iterations and associated computation time for ad-justing the generalized distance function. Overall, GDC and NDC significantlyoutperform LNM, with GDC slightly outdoing NDC.

4.2 Realistic Experiments

In simulated experiments, the most relevant points were assumed to be accu-rately selected among the returned points. In practice, however, this cannotbe easily achieved by human evaluators, unless the most relevant images aredistinctly stood out. To evaluate our methods’ performance in realistic envi-ronments, we have developed an image retrieval system based on ImageGrouper[12]. Our prototype, shown in Figure 19, allows users to pose queries by draggingand grouping multiple relevant images on the work space, choose discriminativevisual features, and select one of the three retrieval methods (LNM, NDC andGDC). It also allows users to rollback inaccurate relevance feedback in the pre-vious iteration. Thus, for instance, if there are several relevant images, the usercan group them together to form a query, and if he reaches a dead-end withoutfinding the target image, he can rollback.

We trained 20 graduate students how to use the target search system andasked them to find 36 given target images from different semantic categories.

Fig. 19. Target Search GUI Interface

In Figure 20, we show the results for the given 36 target images with k = 50.Two images, race cars and an ancient building, averagely took more iterationsthan the others to retrieve, mainly because many similar images exist in thecollection. Even so, only 7 iterations on average were needed to locate them.

Users’ inaccurate relevance feedback is a major issue for almost all CBIRsystems with relevant feedback. We have taken steps to ensure our system is lesssensitive to users’ inaccurate relevance feedback, in design and in implementa-tion. In the experimental study, our system monitored users’ feedback and wascapable of detecting inconsistent behavior (in the NDC and GDC algorithms,query points are selected following a general direction toward the target, i.e. inthe active Vonoroi cell). Our prototype (see Figure 19) allows users to backtracktheir selections if missteps are made. The results were excellent overall, indicatedby the successful finding of the intended targets. Of course users’ inaccurate rel-evance feedback is a difficult problem but our results are encouraging.

5 Conclusions

In this paper, we proposed four target search methods using relevance feedbackfor content-based image retrieval systems. Our research was motivated by the ob-servation that revisiting of checked images can cause many drawbacks includinglocal maximum traps and slow convergence. Our methods outperform existingtechniques including MARS (employing feature weighting), MindReader (em-ploying better feature weighting), and Qcluster (employing probabilistic mod-els). All our methods are able to find the target images, with NDC and GDCconverging faster than NRS and LNM (which represents an improved version ofMindReader). Again, the number of iterations to find target images is not onlythe most significant measure of efficiency but also the most significant indica-tor of precision and recall. Simulated experiments have shown that NDC and

Average Iterations: 6 Average Iterations: 5 Average Iterations: 5 Average Iterations: 7 Average Iterations: 5 Average Iterations: 4






Fig. 20. GUI results with k=50

GDC work more efficiently and effectively when k is smaller, and GDC achiev-ing O(logk |S|) iterations is slightly better than NDC. Experiments with ourprototype show that our approach can achieve fast convergence (i.e. O(logk |S|)iterations) even in the realistic environments.

We outline below some ongoing research to improve the system’s perfor-mance:

– Evaluating different visual features and deducting rules for identifying themost discriminative features of image queries so that higher accuracy ofrelevance feedback can be achieved.

– Adopting the idea of ostensive relevance feedback [1], where the checkedimages are used to refine the query, and the length of time since an imagewas checked is used in a decay function to modulate the impact of thosealready checked images.

– Extending our methods to support category search. Recall that, due to se-mantic gaps, images in a semantic class might scatter in several clusters inthe feature space. Category search can then be executed in the form of mul-tiple target searches. Each target search is to find a representative image ofa cluster, which will be the anchor of a k-NN query to find other images inthe respective cluster.

– Extending our methods to support video target search. Efficient video targetsearch technique (i.e. finding specific scenes in videos) is also an essential toolfor video retrieval applications.

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructivecomments and suggestions on a previous draft.

References

1. P. Browne and A. F. Smeaton. Video Information Retrieval Using Objects andOstensive Relevance Feedback. In Proceedings of the ACM Symposium on AppliedComputing (SAC), pages 1084–1090, 2004.

2. C. Carson, S. Belongie, H. Greenspan, and J. Malik. Blobworld: image segmenta-tion using expectation-maximization and its application to image querying. IEEETransactions on Pattern Analysis and Machine Intelligence, 24(8):1026–1038, 2002.

3. K. Chakrabarti, O.-B. Michael, S. Mehrotra, and K. Porkaew. Evaluating refinedqueries in top-k retrieval systems. IEEE Transactions on Knowledge and DataEngineering, 16(2):256–270, 2004.

4. I. J. Cox, M. L. Miller, T. P. Minka, T. V. Papathomas, and P. N. Yianilos.The Bayesian image retrieval system, PicHunter: theory, implementation, and psy-chophysical experiments. IEEE Transactions on Image Processing, 9(1):20–37,2000.

5. M. Flickner, H. S. Sawhney, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner,D. Lee, D. Petkovic, D. Steele, and P. Yanker. Query by image and video content:The QBIC system. IEEE Computer, 28(9):23–32, 1995.

6. T. Gevers and A. Smeulders. Content-based image retrieval: An overview. InG. Medioni and S. B. Kang, editors, Emerging Topics in Computer Vision. PrenticeHall, 2004.

7. K. A. Hua, N. Yu, and D. Liu. Query Decomposition: A Multiple NeighborhoodApproach to Relevance Feedback Processing in Content-based Image Retrieval. InProceedings of the IEEE ICDE Conference, 2006.

8. Y. Ishikawa, R. Subramanya, and C. Faloutsos. MindReader: Querying databasesthrough multiple examples. In Proceedings of the 24th VLDB Conference, pages218–227, 1998.

9. D.-H. Kim and C.-W. Chung. Qcluster: relevance feedback using adaptive clus-tering for content-based image retrieval. In Proceedings of the ACM SIGMODConference, pages 599–610, 2003.

10. D. Liu, K. A. Hua, K. Vu, and N. Yu. Efficient Target Search with Relevance Feed-back for Large CBIR Systems. In Proceedings of the 21st Annual ACM Symposiumon Applied Computing, 2006.

11. O.-B. Michael and S. Mehrotra. Relevance feedback techniques in the MARS imageretrieval systems. Multimedia Systems, (9):535–547, 2004.

12. M. Nakazato, L. Manola, and T. S. Huang. ImageGrouper: a group-oriented userinterface for content-based image retrieval and digital image arrangement. Journalof Visual Languages and Computing, 14(4):363–386, 2003.

13. F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction.Springer-Verlag, New York Inc., 1985.

14. Y. Rui, T. Huang, M. Ortega, and S. Mehrotra. Relevance feedback: A power toolfor interactive content-based image retrieval. IEEE Transactions on Circuits andSystems for Video Technology, 8(5):644–655, 1998.

15. H. T. Shen, B. C. Ooi, and X. Zhou. Towards effective indexing for very largevideo sequence database. In Proceedings of the ACM SIGMOD Conference, pages730–741, 2005.

16. A. W. M. Smeulders, M. Worring, A. G. S. Santini, and R. Jain. Content-based im-age retrieval at the end of the early years. IEEE Transactions on Pattern Analysisand Machine Intelligence, 22(12):1349–1380, 2000.

17. J. R. Smith and S.-F. Chang. Transform features for texture classification anddiscrimination in large image databases. In Proceedings of the International Con-ference on Image Processing, pages 407–411, 1994.

18. J. R. Smith and S.-F. Chang. VisualSEEk: A fully automated content-based imagequery system. In Proceedings of the 4th ACM Multimedia Conference, pages 87–98,1996.

19. M. A. Stricker and M. Orengo. Similarity of color images. In Proceedings of Storageand Retrieval for Image and Video Databases (SPIE), pages 381–392, 1995.

20. K. Vu, K. A. Hua, and W. Tavanapong. Image retrieval based on regions of interest.IEEE Transactions on Knowledge and Data Engineering, 15(4):1045–1049, 2003.

21. J. Z. Wang, J. Li, and G. Wiederhold. SIMPLIcity: Semantics-sensitive integratedmatching for picture libraries. IEEE Transactions on Pattern Analysis and Ma-chine Intelligence, 23(9):947–963, 2001.

22. L. Wu, C. Faloutsos, K. Sycara, and T. R. Payne. FALCON: feedback adaptiveloop for content-based retrieval. In Proceedings of the 26th VLDB Conference,pages 297–306, 2000.

23. X. S. Zhou and T. S. Huang. Edge-based structural features for content-basedimage retrieval. Pattern Recognition Letters, 22(5):457–468, 2001.

Fast Query Point Movement Techniques with Relevance...

Documents

Transcript of Fast Query Point Movement Techniques with Relevance...