Building and Using a Semantivisual Image Hierarchy

Li-Jia Li, Yongwhan Lim, Li Fei-Fei, Chong Wang, David M. Blei

BUILDING AND USING A SEMANTIVISUAL IMAGE HIERARCHY

CVPR, 2010

• Introduction
• Building the hierarchy
    • Graphical model
    • Learning
• Semantivisual image hierarchy
    • Implementation
    • Visualizing the semantivisual hierarchy
    • Quantitative evaluation
• Application
    • Annotation
    • Labeling
    • Classification

OUTLINE

A meaningful image hierarchy can make organizing, browsing, and searching images more convenient and effective.

Good image hierarchies can serve as a knowledge ontology for end tasks such as image retrieval, annotation, or classification.

• Language-based
• Low-level visual feature based

INTRODUCTION

Use a multi-modal model to represent images and textual tags on the semantivisual hierarchy

Each image is associated with a path of the hierarchy, where the image regions can be assigned to different nodes of the path

BUILDING THE HIERARCHY

• Each image is decomposed into a set of over-segmented regions R = [R_1, …, R_r, …, R_N]

• Each of the N regions is characterized by four appearance features

Graphical model

Each image-text pair (R, W) is assigned to a path C_c = [C_c1, …, C_cl, …, C_cL]
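To make the path notation concrete, here is a minimal sketch of a tree in which an image-text pair owns one root-to-leaf path and each of its regions points at one level of that path. The class name `Node`, the helper `path_to`, and the toy node names are illustrative assumptions, not taken from the slides.

```python
class Node:
    """One node of the hierarchy (hypothetical structure for illustration)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def path_to(node):
    """Return the root-to-node path [C_1, ..., C_L] as a list of nodes."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path[::-1]


# Toy hierarchy with made-up node names.
root = Node("root")
animal = root.add_child("animal")
dog = animal.add_child("dog")

# One image-text pair is assigned the path ending at "dog" (L = 3 levels).
path = path_to(dog)

# Each of the N = 4 regions is assigned to a level of that path.
region_levels = [0, 2, 2, 1]
region_nodes = [path[level].name for level in region_levels]
```

Storing only a level index per region keeps the region assignment valid whenever the image is moved to a different path of the same depth.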


Learning the semantivisual image hierarchy

Given: a set of unorganized images and the user tags associated with them.

Gibbs sampling: samples the concept index Z, the coupling variable S, and the path C.

Sampling Z depends on:
1) the likelihood of the region appearance
2) the likelihood of the tags associated with this region
3) the concept indices of the other regions in the same image-text pair
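The three factors above can be combined as in the following sketch, which multiplies them in log space, normalizes, and draws one concept index. The function name, the smoothing constant `alpha`, and the way the third factor is modeled (counts of the other regions at each level) are assumptions made for illustration; the slides do not give the exact conditional.

```python
import math
import random


def sample_concept_index(log_appearance, log_tag, level_counts, alpha=1.0, rng=random):
    """Draw a level index for one region from a product of three factors.

    log_appearance[l]: log-likelihood of the region appearance at level l
    log_tag[l]:        log-likelihood of the region's tags at level l
    level_counts[l]:   number of other regions of the same image-text pair
                       currently assigned to level l
    alpha:             hypothetical smoothing constant (an assumption)
    """
    log_weights = [
        la + lt + math.log(n + alpha)
        for la, lt, n in zip(log_appearance, log_tag, level_counts)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_weights)
    probs = [math.exp(w - top) for w in log_weights]
    total = sum(probs)
    probs = [p / total for p in probs]
    z = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return z, probs


# Toy numbers for a path with three levels.
z, probs = sample_concept_index(
    log_appearance=[-1.0, -2.0, -3.0],
    log_tag=[-1.0, -1.0, -1.0],
    level_counts=[3, 1, 0],
    rng=random.Random(0),
)
```

Working in log space avoids underflow when the appearance and tag likelihoods are products over many features.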



Sampling S: its conditional distribution depends solely on the likelihood of the tag.

Sampling C: influenced by the previous arrangement of the hierarchy and the likelihood of the image-text pair.

Conditional distribution of C ∝ (prior probability induced by the nCRP) × (likelihood of the image-text pair)
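The prior-times-likelihood structure of the path update can be sketched as follows. The prior uses the standard nested Chinese restaurant process rule: at each level an existing child is chosen in proportion to the number of other images passing through it, and a new child in proportion to a concentration parameter. The function names, the parameter name `gamma`, the toy counts, and the likelihood values are all assumptions for illustration.

```python
def crp_choice_probs(child_counts, gamma):
    """CRP probabilities at one node: existing children vs. a new child."""
    total = sum(child_counts.values()) + gamma
    probs = {child: n / total for child, n in child_counts.items()}
    probs["<new>"] = gamma / total
    return probs


def ncrp_path_prior(tree_counts, path, gamma):
    """Prior of a root-to-leaf path as a product of per-level CRP choices.

    tree_counts: {parent_name: {child_name: count of other images}}
    path:        list of node names starting at the root
    A child name unknown to its parent is treated as a newly created node.
    """
    prob = 1.0
    for parent, child in zip(path, path[1:]):
        probs = crp_choice_probs(tree_counts.get(parent, {}), gamma)
        prob *= probs.get(child, probs["<new>"])
    return prob


def normalize(weights):
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


# Toy counts: 4 other images under the root, 3 of them under "animal".
tree_counts = {
    "root": {"animal": 3, "scene": 1},
    "animal": {"dog": 2, "cat": 1},
}
prior_existing = ncrp_path_prior(tree_counts, ["root", "animal", "dog"], gamma=1.0)
prior_new = ncrp_path_prior(tree_counts, ["root", "vehicle", "car"], gamma=1.0)

# Made-up likelihoods of the image-text pair under each candidate path.
posterior = normalize({
    "existing": prior_existing * 0.01,
    "new": prior_new * 0.03,
})
```

In this toy case the new path wins despite its smaller prior because its assumed likelihood is three times higher, which is how the sampler can grow the hierarchy when existing paths explain an image poorly.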

• 4000 user-uploaded images and 538 unique user tags.

• Each image is divided into small patches of 10×10 pixels.

• Each patch is assigned to a codeword in a codebook of 500 visual words obtained by K-means.

• Four region codebooks are obtained: color (HSV histogram), location, texture, and normalized SIFT histogram.

• To speed up learning, we initialize the levels in a path according to tf-idf score.

• We obtain a hierarchy of 121 nodes, 4 levels, and 53 paths.
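The patch-to-codeword step can be illustrated with a small sketch that assigns each patch descriptor to its nearest codeword and counts the assignments. It assumes the codebook has already been learned by K-means and uses squared Euclidean distance; the function names, the three-entry toy codebook (the slides use 500 codewords), and the two-dimensional descriptors are illustrative assumptions.

```python
from collections import Counter


def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sqdist(descriptor, codebook[k]))


def codeword_histogram(descriptors, codebook):
    """Count how many patch descriptors fall on each codeword."""
    counts = Counter(nearest_codeword(d, codebook) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(codebook))]


# Toy codebook and patch descriptors (made-up values).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
patches = [(0.1, -0.2), (0.9, 1.2), (4.0, 6.0), (1.1, 0.8)]
hist = codeword_histogram(patches, codebook)
```

The resulting histogram is a fixed-length description of an image or region regardless of how many patches it contains.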

A SEMANTIVISUAL IMAGE HIERARCHY -- Implementation

A SEMANTIVISUAL IMAGE HIERARCHY -- Visualizing the Semantivisual Hierarchy

• General-to-specific relationship

• Purely visual information cannot provide a meaningful image hierarchy

• A purely language-based hierarchy would miss close connections

Good clustering of images that share similar concepts: images along the same path should be annotated with more or less similar tags.

Good hierarchical structure given a path: images and their associated tags at different levels of the path should demonstrate good general-to-specific relationships.

A SEMANTIVISUAL IMAGE HIERARCHY -- A Quantitative Evaluation of Image Hierarchies

A path of L levels is selected from the hierarchy.

Given our learned image ontology, we can propose a hierarchical annotation of an unlabeled query image.

nCRP does not perform well on sparse tag words. Its proposed hierarchy has many words assigned to the root node, resulting in very few paths.

A simple clustering algorithm such as KNN cannot find a good association between the test images and the training images in our challenging dataset with large visual diversity.

In contrast, our model learns an accurate association of visual and text data simultaneously.

APPLICATION -- Hierarchical Annotation of Image

Serving as an image and text knowledge ontology, our semantivisual hierarchy and model can be used for image labeling without a hierarchical relation.

APPLICATION -- Image Labeling

Collect the top 5 predicted words of each image
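Collecting the top 5 predicted words amounts to ranking the vocabulary by a per-image score and keeping the first five entries, as in this sketch. The function name `top_k_words` and the scores are made up for illustration; how the model actually scores each word is not specified in the slides.

```python
def top_k_words(word_scores, k=5):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    ranked = sorted(word_scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical per-word scores for one test image.
scores = {
    "sky": 0.30, "water": 0.22, "tree": 0.15, "boat": 0.12,
    "beach": 0.09, "car": 0.07, "dog": 0.05,
}
labels = top_k_words(scores, k=5)
```

The alphabetical tie-break keeps the output deterministic when two words receive the same score.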

Our model captures the hierarchical structure of images and tags!

Another 4000 images are held out as test images.

APPLICATION -- Image Classification

By encoding semantic meaning into the hierarchy, our semantivisual hierarchy delivers a more descriptive structure, which could be helpful for classification.

Use images and their tags to construct a meaningful hierarchy that organizes images in a general-to-specific structure.

Our quantitative evaluation by human subjects shows that our hierarchy is more meaningful and accurate than others.

CONCLUSION
