80 million tiny images: a large dataset for non-parametric object and scene
recognition
CS 4763 Multimedia Systems
Spring 2008
Motivation
There are billions of images available online, which is a dense sampling of the visual world. Can we use them effectively?
Existing datasets contain 10^2 to 10^4 images spread over a few different classes.
Questions to address
How large must a dataset be to perform recognition robustly?
What is the smallest image resolution that still gives reliable classification performance?
Low dimensional image representation
32 × 32 color images contain enough information for scene recognition, object detection and segmentation.
Low dimensional image representation (Cont.)
Scene recognition
Low dimensional image representation (Cont.)
Segmentation of 32 × 32 images
Low dimensional image representation (Cont.)
The objects below cannot be recognized without knowledge of their context.
Low dimensional image representation (Cont.)
Conclusion for low resolution representation:
A 32 × 32 color image contains enough information for scene recognition, object detection and segmentation.
Low dimensional image representation (Cont.)
Conclusion for low resolution representation:
Working with millions of low-resolution images is practical in terms of both storage capacity and processing cost during retrieval.
Example: 256 × 256 × 3 bytes = 192 KB / image
1 million images take about 192 GB.
32 × 32 × 3 bytes = 3 KB / image
1 million images take about 3 GB.
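The storage figures above can be checked with a few lines of Python. This is a minimal sketch that assumes uncompressed images with one byte per color channel, 1 KB = 1024 bytes, and the slide's rounding of 1 GB to one million KB; the function names are illustrative only.

```python
def raw_size_bytes(width, height, channels=3, bytes_per_channel=1):
    """Uncompressed size of one image in bytes."""
    return width * height * channels * bytes_per_channel


def dataset_size_gb(n_images, width, height):
    """Total raw size in GB, with 1 KB = 1024 bytes and 1 GB = 10**6 KB
    (the same rounding the slide appears to use)."""
    kb_per_image = raw_size_bytes(width, height) / 1024
    return n_images * kb_per_image / 1_000_000


if __name__ == "__main__":
    for side in (256, 32):
        kb = raw_size_bytes(side, side) / 1024
        gb = dataset_size_gb(1_000_000, side, side)
        print(f"{side} x {side}: {kb:.0f} KB per image, {gb:.0f} GB per million images")
```

Halving the side length three times (256 to 32) shrinks the storage by a factor of 64, which is what makes a collection of tens of millions of images manageable.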
A large dataset of 32 × 32 images (Cont.)
Collection procedure [Russell et al. 2008]
Where -- the internet; images were collected from 7 independent image search engines.
What -- the images returned by the search engines when queried with non-abstract nouns.
How --
A large dataset of 32 × 32 images (Cont.)
Statistics of the tiny images in the database
Statistics of very low resolution images (Cont.)
Statistics of very low resolution images (Cont.)
Impact on performance: logarithmic
Similarity metric: Dshift
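The slides name Dshift but do not define it. As a rough illustration only, the sketch below assumes it behaves like a shift-tolerant sum of squared differences (SSD): each pixel of one image may match any pixel of the other within a small window. The function names, the window radius of 2, and the edge padding are assumptions made for this example, not the authors' definition.

```python
import numpy as np


def ssd(a, b):
    """Sum of squared differences between two equally shaped images."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff ** 2))


def shift_tolerant_ssd(a, b, radius=2):
    """For each pixel of `a`, take the smallest squared difference to any
    pixel of `b` inside a (2*radius+1) x (2*radius+1) window, then sum."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    h, w = a.shape[:2]
    # Replicate the border of b so every shifted view stays in bounds.
    pad = [(radius, radius), (radius, radius)] + [(0, 0)] * (b.ndim - 2)
    b_pad = np.pad(b, pad, mode="edge")
    best = np.full((h, w), np.inf)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = b_pad[dy:dy + h, dx:dx + w]
            d = (a - shifted) ** 2
            if d.ndim == 3:
                d = d.sum(axis=2)  # combine color channels per pixel
            best = np.minimum(best, d)
    return float(best.sum())
```

Because the zero shift is always among the candidates, this distance never exceeds the plain SSD, so small misalignments between otherwise similar images are penalized less.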
Experiments – person detection
Person detection: does the image contain a person or not?
Existing detection: face detection, head and shoulders, profile faces
Experiments (Cont.) – person detection
Experiments (Cont.) -- Person localization
Similarity measure: Dshift
Number of nearest neighbors: 80
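A non-parametric detector of this kind can be sketched as a brute-force nearest-neighbor search followed by a vote. The example below is a simplified illustration: it uses plain SSD as a stand-in for Dshift, and the function names, the label strings, and the scoring rule (fraction of neighbors labeled "person") are assumptions, not the authors' exact procedure.

```python
import numpy as np


def k_nearest(query, database, k=80):
    """Indices of the k database images closest to `query` under SSD,
    ordered from nearest to farthest."""
    db = database.reshape(len(database), -1).astype(np.float64)
    q = query.reshape(-1).astype(np.float64)
    dists = np.sum((db - q) ** 2, axis=1)
    k = min(k, len(db))
    # argpartition selects the k smallest without sorting everything.
    idx = np.argpartition(dists, k - 1)[:k]
    return idx[np.argsort(dists[idx])]


def person_score(query, database, labels, k=80):
    """Fraction of the k nearest neighbors whose label is 'person'."""
    idx = k_nearest(query, database, k)
    return sum(labels[i] == "person" for i in idx) / len(idx)
```

A query would then be flagged as containing a person when `person_score` exceeds a threshold chosen on held-out data; the threshold itself is not given in the slides.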
Experiments – Scene recognition
Scene recognition: retrieving the images whose semantic meaning is “location”
Experiments (Cont.) – Scene recognition
High voting for “location”
Low voting for “location”
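One way to picture voting for a broad concept such as “location” is to let each retrieved neighbor vote for its own label and for every ancestor of that label in a semantic hierarchy. The sketch below assumes that scheme; the tiny `PARENT` table is a hypothetical stand-in for a WordNet-style hierarchy, and all names in it are illustrative.

```python
from collections import Counter

# Hypothetical toy hierarchy (child -> parent); a real system would
# take these relations from a lexical database such as WordNet.
PARENT = {
    "street": "location",
    "beach": "location",
    "kitchen": "location",
    "location": "entity",
    "dog": "animal",
    "animal": "entity",
}


def ancestors(label):
    """The label itself followed by every ancestor up to the root."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def vote(neighbor_labels):
    """Each neighbor votes once for its label and once per ancestor."""
    votes = Counter()
    for label in neighbor_labels:
        votes.update(ancestors(label))
    return votes


def location_score(neighbor_labels):
    """Share of neighbors whose votes reach the 'location' node."""
    return vote(neighbor_labels)["location"] / len(neighbor_labels)
```

Under this assumption, an image whose neighbors are mostly streets, beaches and kitchens gets a high vote for “location” even if no single specific label dominates.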
Conclusion
The authors' experiments show that 32 × 32 is the minimum color image resolution for reliable object and scene recognition.
The dataset of 79 million images provides a reasonable density over the manifold of natural images.
With the huge dataset and a semantic voting scheme, the approach performs well in person detection, person localization and scene recognition.
References
1. B. C. Russell, A. Torralba, K. Murphy, W. T. Freeman. LabelMe: a database and web-based tool for image annotation. Intl. J. Computer Vision, 77(1-3):157-173, 2008.
2. C. Fellbaum. WordNet: An Electronic Lexical Database. Bradford Books, 1998.