CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model...
Transcript of CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model...
![Page 1: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/1.jpg)
CS 395T – Visual Recognition
Context and Scenes
Aron Yu
Oct 5, 2012
![Page 2: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/2.jpg)
Context in Detection
Image Credit: A. Torralba, K.P. Murphy, W.T. Freeman – Using Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization
Myopic View of
Local Objects
2
![Page 3: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/3.jpg)
Using the Forest to See the Trees
Exploiting Context for Visual Object Detection and Localization
A. Torralba, K. P. Murphy, W. T. Freeman
3
![Page 4: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/4.jpg)
Overview
Past Approach
◦ associate objects with other
objects in the image
◦ presence of pedestrians given
presence of cars
Holistic Approach
◦ associate objects with the scene
category as a single entity
◦ presence of pedestrians given
street scene
Image Credit: A. Olivia, A. Torralba. – Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope 4
![Page 5: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/5.jpg)
Goals
Use global features to obtain contextual prior
for object categories
Develop a probabilistic framework for
combining local (bottom-up) and global (top-
down) features
Target Problems
◦ object presence detection (is there a car?)
◦ object localization (where is the car?)
5
![Page 6: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/6.jpg)
Global Feature - GIST
6 Image Idea: http://cybertron.cg.tu-berlin.de/pdci11ws/scene_completion/implementation.html
Polar Form
Spatial
Envelope . . . Vectorize
![Page 7: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/7.jpg)
Local Features
Preview of next week’s paper on multiclass and
multiview object detection
◦ local feature boosting using part-model
7
Car Model Screen Model
Image Credit: http://people.csail.mit.edu/torralba/shortCourseRLOC/CVPR2007_part3.ppt
![Page 8: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/8.jpg)
Local Features
8 Image Credit: http://people.csail.mit.edu/torralba/shortCourseRLOC/CVPR2007_part3.ppt
*
=
= *
=
![Page 9: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/9.jpg)
8-Scene Dataset
Image Credit: A. Olivia, A. Torralba. – Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope 9
Coast Fields Forests Mountains
Highways Streets Inside City Skyscrapers
![Page 10: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/10.jpg)
Object Presence
Is the object in the image?
◦ binary classifier
◦ prob. of object presence given gist
How many objects are in the image?
◦ categorize the scene from gist (quantization)
◦ prob. of having n objects given scene
10 Image Credit: A. Olivia, A. Torralba. – Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
![Page 11: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/11.jpg)
Object Localization
Where are the objects?
◦ local feature descriptors
◦ confidence score 𝑐𝑖𝑡 per region 𝑖
◦ use top D (~10) most confident regions for evaluation
11
?
?
?
?
?
Image Credit: A. Torralba, K.P. Murphy, W.T. Freeman – Using Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization
![Page 12: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/12.jpg)
Object Localization
Location trimming using gist
◦ mixture of experts model
◦ predict most likely vertical location
◦ “mask out” unlikely regions for individual objects
12 Image Credit: A. Torralba, K.P. Murphy, W.T. Freeman – Using Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization
![Page 13: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/13.jpg)
Integrated Model
Combine global and local features
Without location trimming
Prob. of object being present given confidence
scores and gist
◦ find the number of objects present using gist (global)
◦ show that many confidence scores (local)
13
presence of object
given gist
confidence scores
given presence of
object
![Page 14: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/14.jpg)
Integrated Model
With location trimming
Add a location term 𝑙𝑖𝑡 given presence of object
and gist
◦ suppress or boost confidence scores according to
location of confidence region
14
location given
presence of object
and gist
![Page 15: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/15.jpg)
Toy Demo
15
![Page 16: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/16.jpg)
16
Toy Demo
![Page 17: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/17.jpg)
17
Toy Demo
![Page 18: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/18.jpg)
Results
2688 images with 8 scenes
◦ half for training, half for testing
Focused solely on car identification
Integrated model is better than local features only
18 Image Credit: A. Torralba, K.P. Murphy, W.T. Freeman – Using Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization
Local Integrated
confidence boost
![Page 19: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/19.jpg)
Results
Improves precision but not recall
◦ removes false-positives
Context oracle doesn’t improve the performance
as much for localization 19
![Page 20: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/20.jpg)
Evaluation - Strength
Probabilistic information fusion
Boost confidence of probable regions
◦ suppress confidence of non-probable regions
Location priming makes intuitive sense
Better performance than with only local features
20 Image Credit: A. Torralba, K.P. Murphy, W.T. Freeman – Using Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization
![Page 21: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/21.jpg)
Evaluation - Weakness
Tested with only cars
Boost false positives within probable regions
75% accuracy on scene detector
◦ better than object detector but not perfect
Still relies heavily on object detector accuracy
21 Image Credit: A. Torralba, K.P. Murphy, W.T. Freeman – Using Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization
![Page 22: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/22.jpg)
Evaluation - Weakness
Scenes with less spatial regularity
◦ suppress true positives within non-probable regions
22 Image Credit: A. Torralba, K.P. Murphy, W.T. Freeman – Using Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization
![Page 23: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/23.jpg)
Summary
Overall thoughts
◦ successfully incorporated scene information into a
probabilistic model
◦ scene context helps to reduce false-positives
◦ localization is much harder than presence detection
◦ object detector accuracy is still crucial
Extension
◦ more datasets, more objects
◦ multiple objects in the same image
23
![Page 24: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/24.jpg)
Multi-Class Segmentation with Relative Location Prior
S. Gould, J. Rodgers, D. Cohen, G. Elidan, D. Koller
24
![Page 25: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/25.jpg)
Overview
Multiclass image segmentation
◦ classify all pixels in an image
◦ leverage context information for spatial relationships
25 Image Credit: S. Gould, J. Rodgers, D. Cohen, G. Elidan, D. Koller – Multi-class Segmentation with Relative Location Prior
![Page 26: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/26.jpg)
Conditional Random Fields
Discriminative undirected probabilistic model
◦ pair-wise neighbors, no long range dependencies
Node potentials
◦ object class likelihood
Pair-wise potentials
◦ label smoothness preference
26 Image Credit: S. Gould, J. Rodgers, D. Cohen, G. Elidan, D. Koller – Multi-class Segmentation with Relative Location Prior
![Page 27: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/27.jpg)
Relative Location Prior
Encodes relative location between object classes
◦ conditioned on offset
Each superpixel votes for the most likely label
for all other superpixels
27 Image Credit: S. Gould, J. Rodgers, D. Cohen, G. Elidan, D. Koller – Multi-class Segmentation with Relative Location Prior
𝑝(𝑐𝑖 = 𝑐𝑎𝑟|𝑐𝑗 = 𝑟𝑜𝑎𝑑, 𝐷 𝑆𝑖 , 𝑆𝑗 )
𝑆𝑗
𝑆𝑖
?
![Page 28: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/28.jpg)
Relative Location Prior
Encodes relative location between object classes
◦ conditioned on offset
Each superpixel votes for the most likely label
for all other superpixels
28 Image Credit: S. Gould, J. Rodgers, D. Cohen, G. Elidan, D. Koller – Multi-class Segmentation with Relative Location Prior
𝑝(𝑐𝑖 = 𝑐𝑎𝑟|𝑐𝑗 = 𝑟𝑜𝑎𝑑, 𝐷 𝑆𝑖 , 𝑆𝑗 )
Car!
Tree!
Car!
Building!
?
![Page 29: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/29.jpg)
Relative Location Prior
29 Image Credit: S. Gould, J. Rodgers, D. Cohen, G. Elidan, D. Koller – Multi-class Segmentation with Relative Location Prior
![Page 30: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/30.jpg)
Connections
Probabilistic framework
Leveraging contextual relationships
◦ location priming vs. relative location prior
Sensitive to first stage errors
◦ scene detector vs. appearance classifier
30
Found it!
![Page 31: CS 395T – Visual Recognition Context and Scenescv-fall2012/slides/aron-paper.pdfIntegrated model is better than local features only 18 Image Credit: A. Torralba, K.P. Murphy, W.T.](https://reader036.fdocuments.in/reader036/viewer/2022081406/5f14c388b044924926184f0e/html5/thumbnails/31.jpg)
References
[1] “Using the Forest to see the Trees: Exploiting Context for Visual
Object Detection and Localization” A. Torralba, K.P. Murphy, W.T.
Freeman (CACM 2009)
[2] “Modeling the Shape of the Scene: A Holistic Representation of the
Spatial Envelope” A. Oliva, A. Torralba (IJCV 2001)
[3] “Sharing Visual Features for Multiclass and Multiview Object
Detection” A. Torralba, K.P. Murphy, W.T. Freeman (PAMI 2007)
[4] “Multi-class Segmentation with Relative Location Prior” S. Gould, J.
Rodgers, D. Cohen, G. Elidan, D. Koller (IJCV 2008)
[5] “Conditional Random Fields: Probabilistic Models for Segmenting
and Labeling Sequence Data” J. Lafferty, A. McCallum, F. Pereira.
(ICML 2001)
31