Visual Object Recognition
![Page 1: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/1.jpg)
Visual Object Recognition
Rob Fergus
Courant Institute,
New York University
http://cs.nyu.edu/~fergus/icml_tutorial/
![Page 2: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/2.jpg)
Agenda
• Introduction
• Bag-of-words models
• Visual words with spatial location
• Part-based models
• Discriminative methods
• Segmentation and recognition
• Recognition-based image retrieval
• Datasets & Conclusions
![Page 3: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/3.jpg)
Li Fei-Fei, Princeton
Rob Fergus, NYU
Antonio Torralba, MIT
Recognizing and Learning Object Categories: Year 2007
http://people.csail.mit.edu/torralba/shortCourseRLOC
![Page 4: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/4.jpg)
Agenda
• Introduction
• Bag-of-words models
• Visual words with spatial location
• Part-based models
• Discriminative methods
• Segmentation and recognition
• Recognition-based image retrieval
• Datasets & Conclusions
![Page 5: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/5.jpg)
So what does object recognition involve?
![Page 6: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/6.jpg)
Classification: are there street-lights in the image?
![Page 7: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/7.jpg)
Detection: localize the street-lights in the image
![Page 8: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/8.jpg)
Object categorization
mountain
building
tree
banner
vendor
people
street lamp
![Page 9: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/9.jpg)
Scene and context categorization
• outdoor
• city
• …
![Page 10: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/10.jpg)
Application: Assisted driving
[Figure: detected pedestrians (Ped) and car (Car), plotted with both axes in meters]
Lane detection
Pedestrian and car detection
• Collision warning systems with adaptive cruise control
• Lane departure warning systems
• Rear object detection systems
![Page 11: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/11.jpg)
Application: Computational photography
![Page 12: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/12.jpg)
Application: Improving online search
Query: STREET
Organizing photo collections
![Page 13: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/13.jpg)
Challenges 1: viewpoint variation
Michelangelo 1475-1564
![Page 14: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/14.jpg)
Challenges 2: scale
![Page 15: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/15.jpg)
Challenges 3: illumination
slide credit: S. Ullman
![Page 16: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/16.jpg)
Challenges 4: background clutter
Bruegel, 1564
![Page 17: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/17.jpg)
Challenges 5: occlusion
http://lh5.ggpht.com/_wJc6t2hDl2M/RrL7Gh6sS7I/AAAAAAAAAYY/n3xaHc2opls/DSC00633.JPG
![Page 18: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/18.jpg)
Challenges 6: deformation
http://img.timeinc.net/time/asia/magazine/2007/1112/racehorse_1112.jpg
![Page 19: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/19.jpg)
History: single object recognition
Object 1 Object 2
Object 3
![Page 20: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/20.jpg)
Single object recognition history: Geometric methods
David Lowe [1985]
Rothwell et al. [1992]
![Page 21: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/21.jpg)
Single object recognition history: Appearance-based methods
• Murase & Nayar, 1995
• Schmid & Mohr, 1997
• Lowe et al., 1999, 2003
• Mahamud and Hebert, 2000
• Ferrari et al., 2004
• Rothganger et al., 2004
• Moreels and Perona, 2005
• …
![Page 22: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/22.jpg)
Challenges 7: intra-class variation
Shoe class
Instance 1
Instance 2
Instance 3
![Page 23: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/23.jpg)
History: early object categorization
![Page 24: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/24.jpg)
• Fischler, Elschlager, 1973
• Turk and Pentland, 1991
• Belhumeur, Hespanha, & Kriegman, 1997
• Rowley & Kanade, 1998
• Schneiderman & Kanade, 2004
• Viola and Jones, 2000
• Heisele et al., 2001
• Amit and Geman, 1999
• LeCun et al., 1998
• Belongie and Malik, 2002
• DeCoste and Scholkopf, 2002
• Simard et al., 2003
• Poggio et al., 1993
• Agarwal and Roth, 2002
• Schneiderman & Kanade, 2004
• …
![Page 25: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/25.jpg)
![Page 26: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/26.jpg)
Three main issues
• Representation
– How to represent an object category
• Learning
– How to form the classifier, given training data
• Recognition
– How the classifier is to be used on novel data
![Page 27: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/27.jpg)
Representation
– Generative / discriminative / hybrid
![Page 28: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/28.jpg)
Representation
– Generative / discriminative / hybrid
– Appearance only or location and appearance
![Page 29: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/29.jpg)
Representation
– Generative / discriminative / hybrid
– Appearance only or location and appearance
– Invariances
• Viewpoint
• Illumination
• Occlusion
• Scale
• Deformation
• Clutter
• etc.
![Page 30: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/30.jpg)
Representation
– Generative / discriminative / hybrid
– Appearance only or location and appearance
– Invariances
– Part-based or global with sub-window
![Page 31: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/31.jpg)
Representation
– Generative / discriminative / hybrid
– Appearance only or location and appearance
– Invariances
– Parts or global w/sub-window
– Use set of features or each pixel in image
![Page 32: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/32.jpg)
Learning
– Unclear how to model categories, so learn rather than manually specify
![Page 33: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/33.jpg)
Learning
– Unclear how to model categories, so learn rather than manually specify
– Methods of training: generative vs. discriminative
![Page 34: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/34.jpg)
Learning
– Unclear how to model categories, so learn rather than manually specify
– Methods of training: generative vs. discriminative
– Level of supervision
• Manual segmentation; bounding box; image labels; noisy labels
Contains a motorbike
![Page 35: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/35.jpg)
Learning
– Unclear how to model categories, so learn rather than manually specify
– Methods of training: generative vs. discriminative
– Level of supervision
• Manual segmentation; bounding box; image labels; noisy labels
– Training images:
• Issue of over-fitting (typically limited training data)
• Negative images for discriminative methods
![Page 36: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/36.jpg)
Learning
– Unclear how to model categories, so learn rather than manually specify
– Methods of training: generative vs. discriminative
– Level of supervision
• Manual segmentation; bounding box; image labels; noisy labels
– Training images:
• Issue of over-fitting (typically limited training data)
• Negative images for discriminative methods
– Priors
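To make the "generative vs. discriminative" distinction concrete, here is a minimal sketch on toy 1-D data. It is not taken from the tutorial: the feature values, the choice of one Gaussian per class, and all function names are illustrative assumptions. The generative model fits p(x | class) and a class prior, then classifies with Bayes' rule; the discriminative model fits p(class | x) directly with logistic regression.

```python
import math

def fit_generative(xs, ys):
    """Generative: fit one 1-D Gaussian per class plus a class prior."""
    model = {}
    for c in (0, 1):
        pts = [x for x, y in zip(xs, ys) if y == c]
        mean = sum(pts) / len(pts)
        var = sum((p - mean) ** 2 for p in pts) / len(pts) + 1e-6  # avoid zero variance
        model[c] = (mean, var, len(pts) / len(xs))
    return model

def predict_generative(model, x):
    """Pick the class maximizing log p(x | c) + log p(c)."""
    def log_joint(c):
        mean, var, prior = model[c]
        log_lik = -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return log_lik + math.log(prior)
    return max((0, 1), key=log_joint)

def fit_discriminative(xs, ys, lr=0.1, steps=2000):
    """Discriminative: fit p(y=1 | x) = sigmoid(w*x + b) by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict_discriminative(params, x):
    """Classify by the sign of the learned linear score."""
    w, b = params
    return 1 if w * x + b > 0 else 0

# Hypothetical toy features: class 0 (background) vs. class 1 (object).
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
gen = fit_generative(xs, ys)
disc = fit_discriminative(xs, ys)
```

Note that only the discriminative fit needs the negative (class 0) examples to define its decision boundary, which is one reason the slide lists negative images as a training-data concern for discriminative methods.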
![Page 37: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/37.jpg)
Recognition
– Scale / orientation range to search over
– Speed
– Context
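The link between search range and speed can be illustrated with a sliding-window enumeration. This is a sketch under assumed values: the 64×128 base window, the scale set, and the stride fraction are hypothetical choices, not parameters from the tutorial. Every extra scale (or orientation) adds a full pass over the image, so the number of classifier evaluations grows with the range searched.

```python
def sliding_windows(img_w, img_h, base=(64, 128), scales=(1.0, 1.5, 2.0), stride_frac=0.25):
    """Yield (x, y, w, h) candidate windows over a range of scales.

    base, scales and stride_frac are illustrative assumptions.
    """
    for s in scales:
        w, h = int(base[0] * s), int(base[1] * s)
        if w > img_w or h > img_h:
            continue  # window at this scale does not fit in the image
        step_x = max(1, int(w * stride_frac))
        step_y = max(1, int(h * stride_frac))
        for y in range(0, img_h - h + 1, step_y):
            for x in range(0, img_w - w + 1, step_x):
                yield (x, y, w, h)

# Each window would be passed to the classifier; here we only count them.
windows = list(sliding_windows(320, 240))
dense = list(sliding_windows(320, 240, stride_frac=0.125))
```

Halving the stride roughly quadruples the number of windows at each scale, which is the speed cost the slide refers to.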
![Page 38: Visual Object Recognition](https://reader038.fdocuments.in/reader038/viewer/2022110215/568155a4550346895dc38273/html5/thumbnails/38.jpg)
Recognition
– Context enables pruning of detector output
Hoiem, Efros, Hebert, 2006
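One simple way context can prune detections is a ground-plane size check. The sketch below is a heavily simplified illustration, not the probabilistic model of the cited work: it assumes a roughly level camera at a known height, objects standing on the ground plane, and a known horizon row. The function name, the 1.6 m camera height, and the plausible height range are all hypothetical. Under those assumptions, a detection's real-world height is approximately its pixel height times the camera height, divided by the distance in pixels from its bottom edge to the horizon.

```python
def prune_by_context(detections, horizon_v, camera_height_m=1.6, height_range_m=(1.2, 2.2)):
    """Keep detections whose implied real-world height is plausible.

    Each detection is (top_v, bottom_v, score), in pixel rows with v increasing
    downward. All default values are illustrative assumptions.
    """
    kept = []
    for top_v, bottom_v, score in detections:
        below_horizon = bottom_v - horizon_v
        if below_horizon <= 0:
            continue  # bottom edge at or above the horizon: not on the ground plane
        height_m = (bottom_v - top_v) * camera_height_m / below_horizon
        if height_range_m[0] <= height_m <= height_range_m[1]:
            kept.append((top_v, bottom_v, score))
    return kept

# Hypothetical pedestrian detector output, with the horizon at row 120.
detections = [
    (100, 200, 0.9),  # implied height 2.0 m: plausible
    (40, 60, 0.8),    # floating above the horizon: rejected
    (130, 140, 0.6),  # implied height 0.8 m: too small for a pedestrian
    (110, 170, 0.5),  # implied height 1.92 m: plausible
]
kept = prune_by_context(detections, horizon_v=120)
```

The detector score plays no role in this filter; in practice one would more likely combine score and geometric plausibility rather than apply a hard cut.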