
Pathfinder Networks for Content Based Image Retrieval Based On Automated Shape Feature Discovery

Rishi Mukhopadhyay, Aiyesha Ma, Ishwar K. Sethi

Intelligent Information Engineering Laboratory
Department of Computer Science and Engineering
Oakland University
Rochester, MI 48309

[email protected], [email protected], [email protected]

Abstract

In this paper, we present a computer-assisted image browsing system based on Pathfinder Networks. Similarity of images to one another is determined through a proposed method of automatic shape feature discovery. Local features are generated by clustering small (on the order of 10 by 10 pixels) binary image blocks culled from the edge analysis of images in the database and using the cluster means as the local feature detectors. The clustering method for the binary image blocks is based on the Hausdorff metric of distance between sets of points. Relationships between local features then determine the similarity between images. Pathfinder Networks are then used to visually represent similarity between images. The results are presented on a database containing three categories of images.

1. Introduction

Over the last ten years, multimedia databases, especially those found on the internet, have grown in size at an astronomical rate. While search methods for hypertext (e.g. Google) are robust enough to return semantically meaningful results in response to a query, there is a lack of efficient, semantically meaningful search methods for multimedia data such as images, video and sound. Content based image retrieval, in particular, has been a major research focus since the mid-nineties. A review of the field in [16] cited over 200 papers. In brief, the field of content based image retrieval seeks to find semantically meaningful methods to index, browse and query large image databases. Much of the early work in the seventies concentrated on hand-annotating the image databases and then searching by means of standard text-based query methods. However, two major problems have kept this method impractical. The first is that there is no consistent way to annotate large image databases, owing to viewer subjectivity among other concerns. The second is pragmatic: given the vast explosion in the amount of multimedia data that a given user has to sort through, hand annotation is prohibitively expensive in time and labor [13].

Instead, research has been focussed lately on automated and semi-automated methods of searching large image databases. Many methods rely on the extraction of color or texture descriptors and organizing that data to determine the similarity of images to one another. Most image database search systems focus on some subset of aiding the user in browsing the database, searching the database by an example image, or categorizing the images and allowing the user to search the categories [16].

In this paper, we propose a system of computer aided browsing for image databases based on the Pathfinder Network [15]. Additionally, we introduce a clustering algorithm based on the Hausdorff distance metric for clustering binary images of line segments culled from the edge analysis of images in a given database. These cluster centers are used as local shape feature descriptors. The images in the database are then decomposed into feature vectors and this description is used to calculate the similarity of the images in the database to one another.

The rest of the paper is organized as follows: section 2 describes our proposed image browsing system, section 3 describes our proposed method of automated feature extraction, section 4 discusses our results, and sections 5 and 6 discuss our conclusions and proposals for future research.


Sixth IEEE International Symposium on Multimedia Software Engineering (ISMSE 2004), December 2004, Miami, Florida

2. Proposed Image Retrieval System

Our proposed system organizes the database of images into a graph where the edges are links, similar in concept to hypertext links, and the nodes are the images in the database. The edge lengths between nodes are determined by a similarity metric based on the association of local features. To compute this similarity metric, we propose a method of automated feature discovery based on the clustering of binary image blocks culled from the edge analysis of the images in the database. The images are then decomposed and represented by local features. Co-occurrence matrices of feature occurrences are then calculated for each image, and the Euclidean distance between the co-occurrence matrices is used to estimate the perceptual distance between the images. A diagram of the system is shown in figure 1.

2.1. Pathfinder Networks

Pathfinder Networks, invented by Roger W. Schvaneveldt, take a distance matrix between a set of nodes (often concepts) and assemble a graph that tries to keep the semantically meaningful links and disregard the rest [15]. The Pathfinder Network works by deleting links that violate the triangle inequality ($W(a, b) \le W(a, c) + W(c, b)$) within a certain radius from each node. It accepts two parameters, q and r. The q parameter indicates how many links away from each node to check for violations of the triangle inequality. The r parameter is the input to the Minkowski r-metric, $W(P) = \left(\sum_{i=1}^{k} w_i^r\right)^{1/r}$, which calculates the distance between nodes in the graph that are connected by a path P with link weights $w_1, \ldots, w_k$.
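The pruning rule can be illustrated with a short sketch. It is not the authors' implementation: it assumes a symmetric distance matrix with a zero diagonal, handles only the unrestricted case q = n - 1 (the setting later used in the experiments), and the function name `pathfinder_links` is hypothetical. A direct link is kept only when no indirect path has a smaller weight under the Minkowski r-metric; for r = ∞ the weight of a path is its largest link.

```python
import math

def pathfinder_links(dist, r=math.inf):
    """Links kept by a Pathfinder network with q = n - 1 (illustrative sketch)."""
    n = len(dist)
    # best[i][j]: smallest path weight between i and j under the r-metric,
    # found with a Floyd-Warshall style relaxation.
    best = [list(row) for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if math.isinf(r):
                    via_k = max(best[i][k], best[k][j])
                else:
                    via_k = (best[i][k] ** r + best[k][j] ** r) ** (1.0 / r)
                if via_k < best[i][j]:
                    best[i][j] = via_k
    # A direct link survives only if no indirect path is shorter
    # (a small relative tolerance absorbs rounding for finite r).
    return {(i, j)
            for i in range(n) for j in range(i + 1, n)
            if dist[i][j] <= best[i][j] * (1 + 1e-9)}
```

For three nodes with distances d(0,1) = 1, d(1,2) = 1 and d(0,2) = 1.5, the link (0, 2) is dropped for r = ∞, because the path through node 1 has weight max(1, 1) = 1, but kept for r = 1, where that path has weight 2.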

Once we have a system to automatically generate a distance matrix for the database, we can use the Pathfinder network to generate a graph structure between the images, with the edges representing hypertext links. In this way, the user can browse the database, getting progressively closer to the target image by clicking on images increasingly similar to the one sought. In fact, Pathfinder networks have been used in a number of applications in text information retrieval [15], and in image retrieval based on color information [2].

2.2. Image-to-image distance metric

Our current work in image retrieval is focussed on extracting shape information from images to determine their similarity to one another. We have developed a system for automatically extracting local shape descriptors, which is explained in section 3 and is a further development of work presented in [11]. Once the images in the database are decomposed into a description in terms of these local feature descriptors, we calculate a co-occurrence matrix between local features for each image.

A co-occurrence matrix is a method of taking a set of data and extracting certain structural information from it. Given an enumerated set of features (in our case, local shape descriptors), the co-occurrence matrix of radius r for image I, $M_I$, has the property that $M_I(i, j)$ is the number of times that feature i occurs within a radius r from feature j in the image I. Once these co-occurrence matrices have been calculated, a simple Euclidean distance between the co-occurrence matrices for the images in the database is used to calculate the distance matrix for use by the Pathfinder network. This approach is especially attractive since it could easily be extended to incorporate other types of descriptors, such as the color and texture descriptors that have already been developed in the image processing literature.
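As a rough illustration of this step, the sketch below counts co-occurrences on a grid of feature labels and compares two matrices. Several details are assumptions rather than statements from the paper: each image is represented as a 2-D grid with one feature index per block, the radius is measured in blocks using Euclidean distance between grid positions, and the names `cooccurrence` and `matrix_distance` are hypothetical.

```python
import math

def cooccurrence(labels, n_features, radius):
    """M[i][j] = number of times feature i lies within `radius` of feature j.

    `labels` is a 2-D list holding one feature index per block (an assumed
    representation of the decomposed image).
    """
    rows, cols = len(labels), len(labels[0])
    M = [[0] * n_features for _ in range(n_features)]
    for y in range(rows):
        for x in range(cols):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # a block is not its own neighbour
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < rows and 0 <= xx < cols
                    if inside and math.hypot(dy, dx) <= radius:
                        M[labels[yy][xx]][labels[y][x]] += 1
    return M

def matrix_distance(A, B):
    """Euclidean distance between two co-occurrence matrices."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, B)
                         for a, b in zip(row_a, row_b)))
```

Calling `matrix_distance` on every pair of images would then fill the distance matrix handed to the Pathfinder network.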

3. Local Shape Descriptors

Automated feature extraction from images has been a major research area in the pattern recognition and content based image retrieval literature. Most of this work focusses on texture feature extraction, such as [17], and color feature extraction, such as [18]. Stan's work in color feature detection [19] employed a vector quantization technique over blocks harvested from the images in the database. Once the images had been decomposed into the codebook representation, a number of methods were used to utilize this information for the clustering, display, and search of large image databases. Early work in local feature detection in binary images includes perceptron learning of a set of features through examples and training [14].

Figure 1. Diagram of System

In order to extract local shape information from the images in the database, our algorithm first normalizes the images to 256 by 256 pixels (cropping if necessary), and then runs the normalized database through a Laplacian of Gaussian edge intensity filter to find the edges in each picture. It then decomposes these edge images into blocks on the order of 10 by 10 pixels. Once we have these binary images of local line segments, we cluster the blocks and use the cluster means as our local feature detectors. These cluster means form a codebook, or set of local feature descriptors, where an individual cluster mean is referred to as a codeword.
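The block decomposition can be sketched as follows. This is only an illustration under stated assumptions: the edge image is already available as a 2-D list of 0/1 values, the blocks are non-overlapping tiles, leftover rows and columns that do not fill a whole tile are dropped, and empty tiles are skipped. The paper does not specify these details, and the name `edge_blocks` is hypothetical. Each block is returned as a list of edge-pixel coordinates, which suits a point-set distance.

```python
def edge_blocks(edge_image, block_size):
    """Yield (block_row, block_col, points) for every non-empty tile.

    `points` lists the (row, col) positions of edge pixels relative to the
    top-left corner of the tile.
    """
    height, width = len(edge_image), len(edge_image[0])
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            points = [(y, x)
                      for y in range(block_size)
                      for x in range(block_size)
                      if edge_image[top + y][left + x]]
            if points:  # tiles without edge pixels carry no shape information
                yield top // block_size, left // block_size, points
```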

Clustering of the blocks was performed by a k-means clustering approach using clustroid averaging. While there has been work involving the use of mean squared error distances with Euclidean averaging on the vector quantization of binary image blocks, these measures capture information about pixel distribution rather than the shape of binary image blocks containing line segments [11]. Instead, the blocks were normalized by aligning their centroids, and then the Hausdorff metric was used to determine the distance between the blocks. The Hausdorff metric was chosen because it treats the binary images as a set of points rather than a light intensity distribution:

$$H(A,B) = \max\big(h(A,B),\, h(B,A)\big)$$

where

$$h(A,B) = \max_{a \in A} \, \min_{b \in B} \lVert a - b \rVert$$
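A minimal sketch of the distance H and of clustroid selection follows. It assumes each block is a list of (row, col) points and takes the clustroid to be the cluster member with the smallest total distance to the other members; that choice of criterion is an assumption, since the text does not spell out how the clustroid is picked. The function names are hypothetical.

```python
import math

def directed_hausdorff(A, B):
    """h(A, B): largest distance from a point of A to its nearest point of B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

def clustroid(members, distance):
    """Member with the smallest summed distance to all members (assumed rule)."""
    return min(members,
               key=lambda m: sum(distance(m, other) for other in members))
```

In a k-means style loop, `clustroid` would take the place of the usual arithmetic mean when the cluster representatives are updated.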

Because the blocks were normalized with respect to their centroids, the resulting clustering technique is roughly invariant to translation while still discriminating between differently oriented and shaped line segments. To deal with noise, two methods were employed. First, in each block, any line segment whose length was below a given threshold (0.4 * l, where l is the length of the block in pixels) was filtered out. Second, rather than take the maximum in the formula for h(A,B), we took the 80th percentile. This method of taking a partial Hausdorff distance in noisy, real-world situations is common in applications of the Hausdorff metric such as [6] and [5].
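The centroid alignment and the percentile variant might look like the sketch below. The rank convention is an assumption: the 80th percentile is taken as the K-th smallest nearest-neighbour distance with K = ceil(0.8 * n). The helper names are hypothetical, and the line-segment length filter is not shown.

```python
import math

def center(points):
    """Translate a point set so that its centroid sits at the origin."""
    cy = sum(p[0] for p in points) / len(points)
    cx = sum(p[1] for p in points) / len(points)
    return [(y - cy, x - cx) for y, x in points]

def partial_directed(A, B, fraction=0.8):
    """Ranked (partial) version of h(A, B): a percentile instead of the max."""
    nearest = sorted(min(math.dist(a, b) for b in B) for a in A)
    # Assumed convention: K-th smallest value with K = ceil(fraction * n).
    rank = max(1, math.ceil(fraction * len(nearest) - 1e-9))
    return nearest[rank - 1]

def partial_hausdorff(A, B, fraction=0.8):
    """Partial Hausdorff distance between centroid-aligned point sets."""
    A, B = center(A), center(B)
    return max(partial_directed(A, B, fraction),
               partial_directed(B, A, fraction))
```

With five points and `fraction=0.8`, a single outlying pixel is ignored, whereas `fraction=1.0` reduces to the ordinary directed distance.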

4. Results

4.1. Datasets and parameters

To test our proposed system of image retrieval, we constructed a set of 75 images that fell into roughly 3 categories. Thirty were face images taken from [1], 20 were building images taken from a clip art collection, and 25 were stills from episodes of a TV show that generally contained a mix of manmade objects and people. Our system was then run with 27 combinations of parameters: block sizes of 8, 12 and 16 were clustered into codebooks of size 16, 24 and 32, and co-occurrence matrices of radius 2, 3 and 4 were then calculated for the representation of each image under each of the 9 codebooks. For the Pathfinder Network, the parameters r = ∞ and q = 74 were used.

4.2. Pathfinder Network trees

Pathfinder networks for each of the 27 combinations of parameters were generated. Owing to the difficulty of displaying the networks in a small space, one Pathfinder network was selected to be displayed in detail here in figure 2. Four points in the overall diagram are labelled, and enlargements of the regions around these points of interest are displayed. Another Pathfinder network with a different set of parameters is shown in figure 3. For more Pathfinder network figures please visit http://iielab-secs.secs.oakland.edu.

In figure 2, the region around point A shows the central cluster of face images, which transitions into TV frames (mixtures of people and objects) and then into a few building images. The region between A and B mostly contains a large cluster of face images, which transitions off to the sides into TV images and building images. The region around C shows two branches of TV images coming off the face image cluster and transitioning into building images on the right, as displayed in the region around D. Occasionally there is noise, but for the most part neighboring images tend to be similar to one another.

4.3. Block clustering

The cluster separations in table 1 for each codebook were calculated by taking the ratio between the average distance between each cluster mean and the other cluster means and the average distance between that cluster mean and each cluster member. For each codebook, this ratio was averaged over each cluster, and the standard deviation was also calculated. A large ratio indicates better cluster separation.
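The ratio described here can be written out as a short sketch. The function names are hypothetical, the distance function is passed in by the caller, every cluster is assumed to have a non-zero average intra-cluster distance, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics

def separation_ratios(means, clusters, distance):
    """Inter-cluster to intra-cluster distance ratio for each cluster."""
    ratios = []
    for i, (mean, members) in enumerate(zip(means, clusters)):
        # Average distance from this cluster mean to the other cluster means.
        inter = statistics.mean(distance(mean, other)
                                for j, other in enumerate(means) if j != i)
        # Average distance from this cluster mean to its own members.
        intra = statistics.mean(distance(mean, m) for m in members)
        ratios.append(inter / intra)
    return ratios

def separation_summary(means, clusters, distance):
    """Average ratio and its (population) standard deviation for a codebook."""
    ratios = separation_ratios(means, clusters, distance)
    return statistics.mean(ratios), statistics.pstdev(ratios)
```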

The codebooks generated by each of the 9 combinations of parameters are shown in table 2. Each of these codebooks represents the set of local features generated by the clustering algorithm for use by our image similarity metric. One codeword was always automatically designated as empty, so, for example, the codebooks of size 16 would have 15 non-empty codewords. Some sample clusters and their representative codewords are shown in table 3.

[Table 2 (images not reproduced): columns are codebooks of 16, 24 and 32 codewords; rows are block sizes 8 by 8, 12 by 12 and 16 by 16.]

Table 2. Discovered local shape features: codebooks of size 16, 24 and 32 for blocks of size 8 by 8, 12 by 12 and 16 by 16

Table 3. Some sample clusters for 12 by 12 blocks are shown beneath their representative codeword. From left to right: codebook size 16, cluster 2; codebook size 24, cluster 18; codebook size 32, cluster 9

5. Conclusion

In summary, in this paper we presented a method for visualizing and organizing image databases based on Pathfinder Networks. Image similarity was determined by the relationships between local features. This was computed by comparing co-occurrence matrices of automatically generated local shape features. The automated local shape feature generation was based on a new method of clustering binary image blocks culled from the edge analysis of the image database. For the automatic feature generation by clustering, we experimented with three block sizes and three codebook sizes.

In general, codebooks of size 32 had the best cluster separation, but had the drawback that several of the clusters would often be similar to one another.

Figure 2. The Pathfinder network for 16 by 16 blocks with a codebook of size 24 and co-occurrence radius 4 is shown at the top, with points A, B, C and D labelled. The remaining four images display portions of the Pathfinder network enlarged to demonstrate the similarity of adjacent images in the network.

Figure 3. The Pathfinder network for 8 by 8 blocks with a codebook of size 16 and co-occurrence radius 4

block size       8 by 8                 12 by 12               16 by 16
codebook size    16     24     32       16     24     32       16     24     32
average          3.46   3.74   7.52     3.24   3.99   5.02     3.24   3.625  4.37
std. dev.        1.63   1.92   12.94    1.23   2.29   2.92     1.43   1.85   2.64

Table 1. Average inter-cluster to intra-cluster ratios.

As a result, the Pathfinder Networks based on codebooks of size 32 tended not to discriminate between images as well as those based on codebooks of size 16 and 24.

In general, codebooks based on blocks of size 8 by 8 and 12 by 12 produced the best Pathfinder Networks, although the cluster separation ratios were all well within the same range (especially considering how large the standard deviations tended to be).

Unfortunately, because perceptual judgments vary too greatly from viewer to viewer to rank the resultant Pathfinder Networks directly, there is no good way to rank the relative success of the 27 combinations of parameters in creating trees useful for content based image retrieval unless the system is implemented and user feedback collected. At present, however, several research areas would have to be further explored before this work could be combined with other techniques to create a robust image retrieval system.

6. Future Research

The work presented in this paper is simply one of the first few steps necessary to build a robust image retrieval system. In future research we would want to work on incorporating local color feature descriptors and local texture descriptors in addition to our local shape descriptors to better discriminate between different types of images.

Another area we would like to investigate is the use of a hierarchy of Pathfinder networks to represent a large image database. After first clustering the images (possibly using clustroid averaging on the Euclidean distance between the co-occurrence matrices), the cluster means could be organized by means of a Pathfinder network. Clicking on a cluster mean would then display a Pathfinder network between the images in the cluster or, perhaps, between the cluster means of the subclusters of that cluster.

Also, we would like to investigate a potential improvement to our clustering method. Since larger codebooks often contain redundant clusters whose members closely resemble each other, we would like to implement a two-tiered system of clustering that first uses k-means clustering to generate 32 clusters, owing to its low computational complexity, and then uses hierarchical clustering to merge clusters.

7. Acknowledgements

The work presented in this paper was conducted through a Research Experience for Undergraduates (REU) program at Oakland University. Funding for the program was provided by the National Science Foundation, Ford Motor Company, and Daimler Chrysler. We would also like to thank our advisor, Dr. Ishwar K. Sethi, and the co-PI of the program, Dr. Fatma Mili. We would also like to acknowledge the help of the entire Intelligent Information Engineering Laboratory.

References

[1] Database of face images from ftp://ftp.iam.unibe.ch.

[2] C. Chen, G. Gagaudakis, and P. Rosin. Similarity-based image browsing. Proceedings of the 16th IFIP World Computer Congress, Aug. 2000.

[3] P. Franti and T. Kaukoranta. Binary vector quantizer design using soft centroids. Signal Processing: Image Communication, 14:677–681, 1999.

[4] A. Gersho. On the structure of vector quantizers. IEEE Transactions on Information Theory, 28(2):157–166, Mar. 1982.

[5] D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge. Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9):850–863, Sept. 1993.

[6] D. P. Huttenlocher and W. J. Rucklidge. A multi-resolution technique for comparing images using the Hausdorff distance. Proceedings of the IEEE Computer Vision and Pattern Recognition Conference, pages 705–706, 1993.

[7] Q. Iqbal and J. K. Aggarwal. Applying perceptual grouping to content-based image retrieval: Building images. Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pages 42–48, June 1999.

[8] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3), Sept. 1999.

[9] Y. Linde, A. Buzo, and R. M. Gray. An algorithm for vector quantizer design. IEEE Transactions on Communications, 28(1):84–95, Jan. 1980.

[10] S. P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, Mar. 1982.

[11] A. Ma, R. Mukhopadhyay, and I. K. Sethi. Hausdorff metric based vector quantization of binary images. The Proceedings of The 2003 International Conference on Information and Knowledge Engineering, June 2003.

[12] N. M. Nasrabadi and R. A. King. Image coding using vector quantization: A review. IEEE Transactions on Communications, 36(8):957–971, Aug. 1988.

[13] Y. Rui, T. S. Huang, and S.-F. Chang. Image retrieval: Past, present, and future. International Symposium on Multimedia Information Processing, 1997.

[14] D. E. Rumelhart and D. Zipser. Feature discovery by competitive learning. In Rumelhart, McClelland, and the PDP Research Group, editors, Parallel and Distributed Processing, volume 1, chapter 5. The MIT Press, 1986.

[15] R. W. Schvaneveldt, editor. Pathfinder Associative Networks, Studies in Knowledge Organization. Ablex Publishing Corporation, 1990.

[16] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content-based image retrieval and the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), Dec. 2000.

[17] J. R. Smith and S.-F. Chang. Automated binary texture feature sets for image retrieval. Proceedings of the International Conference on Acoustics Speech and Signal Processing, May 1996.

[18] D. Stan and I. K. Sethi. Image retrieval using a hierarchy of clusters. In Lecture Notes in Computer Science: Advances in Pattern Recognition, pages 377–388. Springer Verlag Ltd., 2001.

[19] D. Stan and I. K. Sethi. eid: A system for exploration of image databases. Information Processing and Management Journal, 39/3, May 2003.

[20] L. Zhu, A. Rao, and A. Zhang. Advanced feature extraction for keyblock-based image retrieval. ACM Multimedia Workshop, pages 179–182, 2000.
