Accounting for the relative importance of objects in image retrieval
![Page 1: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/1.jpg)
ACCOUNTING FOR THE RELATIVE IMPORTANCE OF OBJECTS IN IMAGE RETRIEVAL
Sung Ju Hwang and Kristen Grauman
University of Texas at Austin
![Page 2: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/2.jpg)
Image retrieval
Query image
Image Database
Image 1
Image 2
Image k
Content-based retrieval from an image database…
![Page 3: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/3.jpg)
Relative importance of objects
Query image
Image Database
Which image is more relevant to the query?
?
![Page 4: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/4.jpg)
Relative importance of objects
Query image
cow
bird
water
cow
bird
water
Image Database
cow
fence
mud
Which image is more relevant to the query?
?
sky
![Page 5: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/5.jpg)
Relative importance of objects
An image can contain many different objects, but some are more “important” than others.
sky
water
mountain
architecture
bird
cow
![Page 6: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/6.jpg)
Relative importance of objects
Some objects are background
sky
water
mountain
architecture
bird
cow
![Page 7: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/7.jpg)
Relative importance of objects
Some objects are less salient
sky
water
mountain
architecture
bird
cow
![Page 8: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/8.jpg)
Relative importance of objects
Some objects are more prominent or perceptually define the scene
sky
water
mountain
architecture
bird
cow
![Page 9: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/9.jpg)
Our goal
Goal: Retrieve those images that share important objects with the query image.
versus
How to learn a representation that accounts for this?
![Page 10: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/10.jpg)
Idea: image tags as importance cue
The order in which a person assigns tags provides implicit cues about each object's importance to the scene.
TAGS: Cow, Birds, Architecture, Water, Sky
![Page 11: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/11.jpg)
Idea: image tags as importance cue
TAGS: Cow, Birds, Architecture, Water, Sky
The order in which a person assigns tags provides implicit cues about each object's importance to the scene.
Learn this connection to improve cross-modal retrieval and content-based image retrieval (CBIR).
![Page 12: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/12.jpg)
Related work
Previous work using tagged images focuses on the noun ↔ object correspondence.
Duygulu et al. 02, Fergus et al. 05, Li et al. 09, Berg et al. 04
Lavrenko et al. 2003, Monay & Gatica-Perez 2003, Barnard et al. 2004, Schroff et al. 2007, Gupta & Davis 2008, …
Related work building richer image representations from “two-view” text+image data:
Bekkerman & Jeon 07, Qi et al. 09, Quack et al. 08, Quattoni et al. 07, Yakhnenko & Honavar 09, …
Gupta et al. 08, Blaschko & Lampert 08, Hardoon et al. 04
![Page 13: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/13.jpg)
Approach overview: Building the image database
Tagged training images, with example tag lists:
Cow, Grass
Horse, Grass
Car, House, Grass, Sky
…
Extract visual and tag-based features
Learn projections from each feature space into a common “semantic space”
![Page 14: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/14.jpg)
Approach overview: Retrieval from the database
• Image-to-image retrieval
• Image-to-tag auto annotation
• Tag-to-image retrieval
Untagged query image
Tag list query: Cow, Tree, Grass
Image database
Retrieved images
Retrieved tag-list: Cow, Tree
![Page 15: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/15.jpg)
Dual-view semantic space
Visual features and tag-lists are two views generated by the same concept.
Semantic space
![Page 16: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/16.jpg)
Learning mappings to semantic space
Canonical Correlation Analysis (CCA): choose projection directions that maximize the correlation of views projected from the same instance.
Semantic space: new common feature space
View 1
View 2
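To make the CCA idea concrete, here is a minimal sketch of linear CCA in NumPy. It is not the authors' implementation: the function name `linear_cca`, the ridge term `reg`, and the synthetic two-view data are all assumptions chosen for illustration. The sketch whitens each view and takes the SVD of the whitened cross-covariance, whose singular values are the canonical correlations.

```python
import numpy as np

def linear_cca(X, Y, reg=1e-6):
    """Linear CCA for paired views (hypothetical helper, for illustration).

    X: (n, dx) array, Y: (n, dy) array; row i of X and row i of Y
    describe the same instance. Returns projection matrices for each
    view and the canonical correlations in decreasing order.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Covariances, with a small ridge term (an assumption) for stability.
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Inverse square root of a symmetric positive-definite matrix.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the correlations.
    U, corrs, Vt = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    Wx = Kx @ U      # columns are projection directions for view 1
    Wy = Ky @ Vt.T   # columns are projection directions for view 2
    return Wx, Wy, corrs

# Synthetic data: both views observe a shared latent "concept" z plus noise.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
Y = np.hstack([rng.normal(size=(500, 1)), -z + 0.1 * rng.normal(size=(500, 1))])
Wx, Wy, corrs = linear_cca(X, Y)
print(corrs)  # the leading correlation should be near 1 for this data
```

Projecting both views with the leading columns of `Wx` and `Wy` places paired instances close together, which is the property the semantic space relies on.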
![Page 17: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/17.jpg)
Kernel Canonical Correlation Analysis
Linear CCA
Given paired data, select projection directions so as to maximize the correlation between the two projected views.
Kernel CCA
Given a pair of kernel functions, one per view, optimize the same objective, but with projections in kernel space.
[Akaho 2001, Fyfe et al. 2001, Hardoon et al. 2004]
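The slide's equations did not survive extraction. For reference, the standard formulation found in the cited literature can be written as below; the symbols ($w_x$, $w_y$, $C_{xy}$, $\alpha$, $\beta$, $K_x$, $K_y$) are notation assumed here, not necessarily the slide's own, and practical implementations usually add regularization that is omitted for brevity.

```latex
% Linear CCA: paired data \{(x_i, y_i)\}_{i=1}^{N}, covariances C_{xx}, C_{yy}, C_{xy}
\max_{w_x,\, w_y} \;
\frac{w_x^{\top} C_{xy}\, w_y}
     {\sqrt{w_x^{\top} C_{xx}\, w_x}\;\sqrt{w_y^{\top} C_{yy}\, w_y}}

% Kernel CCA: kernels k_x(x_i, x_j) = \phi_x(x_i)^{\top}\phi_x(x_j),
%             k_y(y_i, y_j) = \phi_y(y_i)^{\top}\phi_y(y_j),
% with directions expressed in the span of the training data:
w_x = \sum_{i=1}^{N} \alpha_i\, \phi_x(x_i), \qquad
w_y = \sum_{i=1}^{N} \beta_i\, \phi_y(y_i)

\max_{\alpha,\, \beta} \;
\frac{\alpha^{\top} K_x K_y\, \beta}
     {\sqrt{\alpha^{\top} K_x^{2}\, \alpha}\;\sqrt{\beta^{\top} K_y^{2}\, \beta}}
```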
![Page 18: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/18.jpg)
Semantic space
Building the kernels for each view
Word frequency, rank kernels
Visual kernels
![Page 19: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/19.jpg)
Visual features
Color Histogram: captures the HSV color distribution
Gist [Torralba et al.]: captures the total scene structure
Visual Words: captures local appearance (k-means on DoG+SIFT)
Average the component χ² kernels to build a single visual kernel.
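The kernel-averaging step can be sketched as follows. This is an illustrative sketch only: the exponentiated χ² form, the bandwidth heuristic, and the names `chi2_kernel` and `average_kernel` are assumptions, since the slide does not give the exact kernel definition or parameters.

```python
import numpy as np

def chi2_kernel(A, B, gamma=None, eps=1e-10):
    """Exponentiated chi-square kernel between rows of A and rows of B.

    A: (n, d) and B: (m, d) arrays of non-negative histogram features.
    """
    # Pairwise chi-square distance: 0.5 * sum_d (a_d - b_d)^2 / (a_d + b_d)
    diff = A[:, None, :] - B[None, :, :]
    summ = A[:, None, :] + B[None, :, :]
    dist = 0.5 * np.sum(diff ** 2 / (summ + eps), axis=2)
    if gamma is None:
        # Assumed bandwidth heuristic: inverse of the mean pairwise distance.
        gamma = 1.0 / max(dist.mean(), eps)
    return np.exp(-gamma * dist)

def average_kernel(feature_sets):
    """Average the per-feature chi-square kernels into a single kernel."""
    return np.mean([chi2_kernel(F, F) for F in feature_sets], axis=0)

# Toy stand-ins for color-histogram, Gist, and visual-word features of 5 images.
rng = np.random.default_rng(0)
color = rng.dirichlet(np.ones(8), size=5)
gist = rng.dirichlet(np.ones(16), size=5)
words = rng.dirichlet(np.ones(32), size=5)
K_visual = average_kernel([color, gist, words])
print(K_visual.shape)
```

Averaging keeps the result a valid kernel, because a non-negative combination of positive semi-definite kernels is itself positive semi-definite.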
![Page 20: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/20.jpg)
Tag features
Word Frequency
Traditional bag-of-(text)words
Tag list: Cow, Bird, Water, Architecture, Mountain, Sky
tag count
Cow 1
Bird 1
Water 1
Architecture 1
Mountain 1
Sky 1
Car 0
Person 0
![Page 21: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/21.jpg)
Tag features
Absolute Rank
Absolute rank in this image’s tag-list
Tag list: Cow, Bird, Water, Architecture, Mountain, Sky
tag value
Cow 1
Bird 0.63
Water 0.50
Architecture 0.43
Mountain 0.39
Sky 0.36
Car 0
Person 0
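The absolute-rank values listed above are consistent with scoring a tag at rank r as 1 / log2(1 + r) and giving absent words 0. The sketch below computes both the word-frequency and absolute-rank features under that reading; the vocabulary and function names are hypothetical and exist only for this example.

```python
import math

# Hypothetical vocabulary, taken from the words shown on the slide.
VOCAB = ["Cow", "Bird", "Water", "Architecture", "Mountain", "Sky", "Car", "Person"]

def word_frequency(tags, vocab=VOCAB):
    """Bag-of-words feature: how often each vocabulary word occurs in the tag list."""
    return [tags.count(w) for w in vocab]

def absolute_rank(tags, vocab=VOCAB):
    """Score each word by 1 / log2(1 + r), where r is its 1-based rank in the list."""
    feats = []
    for w in vocab:
        if w in tags:
            r = tags.index(w) + 1  # rank of the word's first occurrence
            feats.append(1.0 / math.log2(1 + r))
        else:
            feats.append(0.0)      # absent words get zero
    return feats

tags = ["Cow", "Bird", "Water", "Architecture", "Mountain", "Sky"]
print(word_frequency(tags))
# [1, 1, 1, 1, 1, 1, 0, 0]
print([round(v, 2) for v in absolute_rank(tags)])
# [1.0, 0.63, 0.5, 0.43, 0.39, 0.36, 0.0, 0.0]
```

Unlike the word-frequency feature, the absolute-rank feature changes when the same tags are listed in a different order, which is what lets it carry the importance cue.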
![Page 22: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/22.jpg)
Tag features
Relative Rank
Percentile rank obtained from the rank distribution of that word in all tag-lists.
Tag list: Cow, Bird, Water, Architecture, Mountain, Sky
tag value
Cow 0.9
Bird 0.6
Water 0.8
Architecture 0.5
Mountain 0.8
Sky 0.8
Car 0
Person 0
Average the component χ² kernels to build a single tag kernel.
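One way to compute a percentile-style relative rank is sketched below. The slide does not spell out the exact percentile definition, so this uses the common "count strictly later ranks plus half the ties" convention as an assumption; `rank_history` and `relative_rank` are hypothetical names, and the training lists are made up.

```python
from collections import defaultdict

def rank_history(training_tag_lists):
    """Collect, for every word, the 1-based ranks at which it was observed."""
    hist = defaultdict(list)
    for tags in training_tag_lists:
        for pos, word in enumerate(tags, start=1):
            hist[word].append(pos)
    return hist

def relative_rank(tags, vocab, hist):
    """Percentile of a word's rank here relative to its ranks in other lists.

    A word that appears earlier than it usually does scores close to 1;
    a word that appears later than usual scores close to 0.
    """
    feats = []
    for w in vocab:
        seen = hist.get(w, [])
        if w not in tags or not seen:
            feats.append(0.0)
            continue
        r = tags.index(w) + 1
        later = sum(1 for p in seen if p > r)   # occurrences ranked after r
        ties = sum(1 for p in seen if p == r)   # occurrences at the same rank
        feats.append((later + 0.5 * ties) / len(seen))
    return feats

training = [["Cow", "Sky"], ["Sky", "Cow"], ["Sky", "Water", "Cow"]]
hist = rank_history(training)
print(relative_rank(["Cow", "Sky"], ["Cow", "Sky", "Water"], hist))
```

In this toy example "Cow" is listed first even though it usually appears later, so it scores high, while "Sky" appears later than usual and scores low.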
![Page 23: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/23.jpg)
Recap: Building the image database
Semantic space
Visual feature space, tag feature space
![Page 24: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/24.jpg)
Experiments
We compare the retrieval performance of our method with two baselines:
Query image
1st retrieved image
Visual-Only Baseline
Query image
1st retrieved image
Words+Visual Baseline
[Hardoon et al. 2004, Yakhnenko et al. 2009]
KCCA semantic space
![Page 25: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/25.jpg)
Evaluation
We use Normalized Discounted Cumulative Gain at top K (NDCG@K) to evaluate retrieval performance.
Doing well in the top ranks is more important.
Normalization: sum of all the scores for the perfect ranking
Reward term: score for the pth ranked example
[Kekalainen & Jarvelin, 2002]
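As a worked example of the metric, the sketch below uses one common form of NDCG@K: a gain of 2^s − 1 for a reward s, discounted by the log of the rank, and normalized by the score of the perfect ranking. The exact gain and discount used on the slide did not survive extraction, so this form and the function names are assumptions.

```python
import math

def dcg_at_k(scores, k):
    """Discounted cumulative gain over the first k reward scores."""
    return sum((2 ** s - 1) / math.log2(1 + p)
               for p, s in enumerate(scores[:k], start=1))

def ndcg_at_k(retrieved_scores, all_scores, k):
    """NDCG@K: DCG of the retrieved order divided by DCG of the perfect order.

    retrieved_scores: reward of each retrieved item, in retrieval order.
    all_scores: rewards of every database item, used to build the perfect ranking.
    """
    ideal = sorted(all_scores, reverse=True)
    z = dcg_at_k(ideal, k)  # normalization term
    return dcg_at_k(retrieved_scores, k) / z if z > 0 else 0.0

rewards = [1.0, 0.5, 0.0]
print(ndcg_at_k([1.0, 0.5, 0.0], rewards, k=3))  # perfect order
print(ndcg_at_k([0.0, 0.5, 1.0], rewards, k=3))  # worst order scores lower
```

Because the discount grows with rank, placing the most relevant image first contributes more than placing it third, matching the point that the top ranks matter most.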
![Page 26: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/26.jpg)
Evaluation
We present the NDCG@K score using two different reward terms:
Object presence/scale (terms: scale, presence): rewards similarity of the query’s objects/scales and those in the retrieved image(s).
Ordered tag similarity (terms: relative rank, absolute rank): rewards similarity of the query’s ground truth tag ranks and those in the retrieved image(s).
Example tag lists: Cow, Tree, Grass; Person, Cow, Tree, Fence, Grass
![Page 27: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/27.jpg)
Dataset
LabelMe
6352 images (Database: 3799 images, Query: 2553 images)
Scene-oriented
Contains the ordered tag lists via labels added
56 unique taggers
~23 tags/image
Pascal
9963 images (Database: 5011 images, Query: 4952 images)
Object-central
Tag lists obtained on Mechanical Turk
758 unique taggers
~5.5 tags/image
![Page 28: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/28.jpg)
Image-to-image retrieval
We want to retrieve images most similar to the given query image in terms of object importance.
Untagged query image
Visual kernel space
Tag-list kernel space
Image database
Retrieved images
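The retrieval step can be sketched as a nearest-neighbor search in the learned semantic space. In this sketch the learned coefficients `alpha`, the helper names, and the use of cosine similarity as the ranking measure are all assumptions made for illustration; the slide does not state the similarity measure.

```python
import numpy as np

def project(K_query_train, alpha):
    """Map a query into the semantic space.

    K_query_train: (q, n) kernel values between queries and training items.
    alpha: (n, d) coefficients, assumed to have been learned by kernel CCA.
    """
    return K_query_train @ alpha

def retrieve(query_vec, database_vecs, top_k=5):
    """Rank database items by cosine similarity to the query in semantic space."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    D = database_vecs / (np.linalg.norm(database_vecs, axis=1, keepdims=True) + 1e-12)
    sims = D @ q
    order = np.argsort(-sims)[:top_k]  # indices of the most similar items first
    return order, sims[order]

# Toy semantic-space coordinates for three database images and one query.
database = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([2.0, 0.1])
order, sims = retrieve(query, database, top_k=3)
print(order)
# [0 2 1]
```

The same `retrieve` call covers tag-to-image retrieval if the query is projected from the tag-list view instead of the visual view, since both views land in the same space.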
![Page 29: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/29.jpg)
Image-to-image retrieval results
[Figure: retrieved images for a query image, comparing our method, Words + Visual, and Visual only]
![Page 30: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/30.jpg)
Image-to-image retrieval results
[Figure: retrieved images for a query image, comparing our method, Words + Visual, and Visual only]
![Page 31: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/31.jpg)
Image-to-image retrieval results
Our method better retrieves images that share the query’s important objects, by both measures.
Retrieval accuracy measured by object+scale similarity
Retrieval accuracy measured by ordered tag-list similarity
39% improvement
![Page 32: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/32.jpg)
Tag-to-image retrieval
We want to retrieve the images that are best described by the given tag list.
Query tags: Cow, Person, Tree, Grass
Tag-list kernel space
Visual kernel space
Image database
Retrieved images
![Page 33: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/33.jpg)
Tag-to-image retrieval results
Our method better respects the importance cues implied by the user’s keyword query.
31% improvement
![Page 34: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/34.jpg)
Image-to-tag auto annotation
We want to annotate the query image with ordered tags that best describe the scene.
Untagged query image
Visual kernel space
Tag-list kernel space
Image database
Output tag-lists:
Cow, Tree, Grass
Cow, Grass
Field, Cow, Fence
![Page 35: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/35.jpg)
Image-to-tag auto annotation results
Example output tag lists:
Boat, Person, Water, Sky, Rock
Bottle, Knife, Napkin, Light, Fork
Person, Tree, Car, Chair, Window
Tree, Boat, Grass, Water, Person
Method k=1 k=3 k=5 k=10
Visual-only 0.0826 0.1765 0.2022 0.2095
Word+Visual 0.0818 0.1712 0.1992 0.2097
Ours 0.0901 0.1936 0.2230 0.2335
k = number of nearest neighbors used
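One plausible way to turn k nearest neighbors into an ordered tag list is sketched below: average the neighbors' rank-based tag features and sort the words by the averaged score. This merging rule, the function name `annotate`, and the toy vocabulary are assumptions for illustration; the slide only states that k neighbors are used.

```python
import numpy as np

def annotate(neighbor_rank_features, vocab, max_tags=5):
    """Build an ordered tag list from the tag features of the k nearest neighbors.

    neighbor_rank_features: (k, |vocab|) array, one row of rank-based tag
    scores per neighbor (0 for absent words).
    """
    mean_scores = np.mean(neighbor_rank_features, axis=0)
    order = np.argsort(-mean_scores, kind="stable")  # highest averaged score first
    return [vocab[i] for i in order[:max_tags] if mean_scores[i] > 0]

vocab = ["Cow", "Tree", "Grass", "Field", "Fence"]
# Rank scores (1 / log2(1 + rank)) for three hypothetical neighbor tag lists:
neighbors = np.array([
    [1.00, 0.63, 0.50, 0.00, 0.00],  # Cow, Tree, Grass
    [1.00, 0.00, 0.63, 0.00, 0.00],  # Cow, Grass
    [0.63, 0.00, 0.00, 1.00, 0.50],  # Field, Cow, Fence
])
print(annotate(neighbors, vocab, max_tags=3))
# ['Cow', 'Grass', 'Field']
```

Words that neighbors mention early and often rise to the top of the output list, so the predicted order reflects importance rather than mere presence.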
![Page 36: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/36.jpg)
Implicit tag cues as localization prior
Training: Learn object-specific connection between localization parameters and implicit tag features.
Testing: Given a novel image, localize objects based on both tags and appearance.
Implicit tag features give P(location, scale | tags), which is used alongside the object detector.
Example tag lists from the slide's images:
Woman, Table, Mug, Ladder
Mug, Key, Keyboard, Toothbrush, Pen, Photo, Post-it
Computer, Poster, Desk, Screen, Mug, Poster
Mug, Eiffel
Desk, Mug, Office
Mug, Coffee
[Hwang & Grauman, CVPR 2010]
![Page 37: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/37.jpg)
Conclusion
• We want to learn what is implied (beyond objects present) by how a human provides tags for an image.
• Approach requires minimal supervision to learn the connection between importance conveyed by tags and visual features.
• Consistent gains over:
  • content-based visual search
  • tag+visual approach that disregards importance
![Page 38: Accounting for the relative importance of objects in image retrieval](https://reader030.fdocuments.in/reader030/viewer/2022020111/56812d40550346895d924576/html5/thumbnails/38.jpg)
THANK YOU