![Page 1: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/1.jpg)
Contexts and 3D Scenes
Computer Vision
Jia-Bin Huang, Virginia Tech
Many slides from D. Hoiem
![Page 2: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/2.jpg)
Administrative stuff
• Final project presentation
  • Nov 30th 3:30 PM – 4:45 PM
• Grading
  • Three senior graders (30%)
  • Peer reviews (70%)
• Presentation
  • 2.5 mins per group
![Page 3: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/3.jpg)
Context in Recognition
•Objects usually are surrounded by a scene that can provide context in the form of nearby objects, surfaces, scene category, geometry, etc.
![Page 4: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/4.jpg)
Context provides clues for function
•What is this?
These examples from Antonio Torralba
![Page 5: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/5.jpg)
Context provides clues for function
•What is this?
•Now can you tell?
![Page 6: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/6.jpg)
Sometimes context is the major component of recognition
•What is this?
![Page 7: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/7.jpg)
Sometimes context is the major component of recognition
•What is this?
•Now can you tell?
![Page 8: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/8.jpg)
More Low-Res
•What are these blobs?
![Page 9: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/9.jpg)
More Low-Res
•The same pixels! (a car)
![Page 10: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/10.jpg)
There are many types of context
• Local pixels
  • window, surround, image neighborhood, object boundary/shape, global image statistics
• 2D Scene Gist
  • global image statistics
• 3D Geometric
  • 3D scene layout, support surface, surface orientations, occlusions, contact points, etc.
• Semantic
  • event/activity depicted, scene category, objects present in the scene and their spatial extents, keywords
• Photogrammetric
  • camera height and orientation, focal length, lens distortion, radiometric response function
• Illumination
  • sun direction, sky color, cloud cover, shadow contrast, etc.
• Geographic
  • GPS location, terrain type, land use category, elevation, population density, etc.
• Temporal
  • nearby frames of video, photos taken at similar times, videos of similar scenes, time of capture
• Cultural
  • photographer bias, dataset selection bias, visual cliches, etc.
from Divvala et al. CVPR 2009
![Page 11: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/11.jpg)
Cultural context
Jason Salavon: http://salavon.com/SpecialMoments/Newlyweds.shtml
![Page 12: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/12.jpg)
Cultural context
Andrew Gallagher: http://chenlab.ece.cornell.edu/people/Andy/projectpage_names.html
“Mildred and Lisa”: Who is Mildred? Who is Lisa?
![Page 13: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/13.jpg)
Cultural context
Andrew Gallagher: http://chenlab.ece.cornell.edu/people/Andy/projectpage_names.html
Age given Appearance Age given Name
![Page 14: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/14.jpg)
Spatial layout is especially important
1. Context for recognition
![Page 15: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/15.jpg)
Spatial layout is especially important
1. Context for recognition
![Page 16: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/16.jpg)
Spatial layout is especially important
1. Context for recognition
2. Scene understanding
![Page 17: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/17.jpg)
Spatial layout is especially important
1. Context for recognition
2. Scene understanding
3. Many direct applications
   a) Assisted driving
   b) Robot navigation/interaction
   c) 2D to 3D conversion for 3D TV
   d) Object insertion
![Page 18: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/18.jpg)
Spatial Layout: 2D vs. 3D?
![Page 19: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/19.jpg)
Context in Image Space
[Kumar Hebert 2005][Torralba Murphy Freeman 2004]
[He Zemel Carreira-Perpiñán 2004]
![Page 20: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/20.jpg)
But object relations are in 3D…
Close
Not Close
![Page 21: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/21.jpg)
How to represent scene space?
![Page 22: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/22.jpg)
Wide variety of possible representations
Figs from Hoiem - Savarese 2011 book
![Page 23: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/23.jpg)
Figs from Hoiem - Savarese 2011 book
![Page 24: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/24.jpg)
Figs from Hoiem - Savarese 2011 book
![Page 25: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/25.jpg)
Key Trade-offs
• Level of detail: rough “gist”, or detailed point cloud?
  • Precision vs. accuracy
  • Difficulty of inference
• Abstraction: depth at each pixel, or ground planes and walls?
  • What is it for: e.g., metric reconstruction vs. navigation
![Page 26: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/26.jpg)
Low detail, Low/Med abstraction
Holistic Scene Space: “Gist”
Oliva & Torralba 2001
Torralba & Oliva 2002
![Page 27: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/27.jpg)
High detail, Low abstraction
Depth Map
Saxena, Chung & Ng 2005, 2007
![Page 28: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/28.jpg)
Medium detail, High abstraction
Hedau Hoiem Forsyth 2009
Room as a Box
![Page 29: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/29.jpg)
Med-High detail, High abstraction
Guo Zou Hoiem 2015
Complete 3D Layout
![Page 30: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/30.jpg)
Examples of spatial layout estimation
• Surface layout
  • Application to 3D reconstruction
• The room as a box
  • Application to object recognition
![Page 31: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/31.jpg)
Surface Layout: describe 3D surfaces with geometric classes
• Sky
• Vertical
  • Planar (Left/Center/Right)
  • Non-Planar Porous
  • Non-Planar Solid
• Support
![Page 32: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/32.jpg)
The challenge
?
?
?
![Page 33: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/33.jpg)
Our World is Structured
Abstract World Our World
Image Credit (left): F. Cunin and M.J. Sailor, UCSD
![Page 34: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/34.jpg)
Learn the Structure of the World
…
Training Images
![Page 35: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/35.jpg)
Infer the most likely interpretation
Unlikely Likely
![Page 36: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/36.jpg)
Geometry estimation as recognition
Region
Features: Color, Texture, Perspective, Position
Training Data …
Surface Geometry Classifier
Vertical, Planar
![Page 37: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/37.jpg)
Use a variety of image cues
Vanishing points, lines
Color, texture, image location
Texture gradient
![Page 38: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/38.jpg)
Surface Layout Algorithm
Hoiem Efros Hebert (2007)
Input Image
Segmentation
Features: Perspective, Color, Texture, Position
Training Data …
Trained Region Classifier
Surface Labels
![Page 39: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/39.jpg)
Surface Layout Algorithm
Hoiem Efros Hebert (2007)
Input Image
Multiple Segmentations
Features: Perspective, Color, Texture, Position
Training Data …
Trained Region Classifier
Confidence-Weighted Predictions
Final Surface Labels
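The "Multiple Segmentations" and "Confidence-Weighted Predictions" steps can be illustrated with a small sketch. This is a simplification under stated assumptions, not the method of Hoiem, Efros and Hebert: each segmentation assigns every pixel to a region, a (hypothetical) region classifier has already produced a confidence per label for each region, and the per-pixel confidences are simply summed across segmentations. The function name and data layout are made up for illustration.

```python
def combine_segmentations(segmentations, region_confidences, labels):
    """Pick the best label per pixel by summing region confidences.

    segmentations: list of 2D lists; segmentations[k][r][c] is the region id
        of pixel (r, c) in segmentation k.
    region_confidences: list of dicts; region_confidences[k][region_id] maps
        label -> confidence from an (assumed) region classifier.
    labels: list of all label names.
    Returns a 2D list with the highest-scoring label for each pixel.
    """
    rows, cols = len(segmentations[0]), len(segmentations[0][0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            totals = {label: 0.0 for label in labels}
            # Every segmentation votes with the confidence of the region
            # that contains this pixel.
            for seg, conf in zip(segmentations, region_confidences):
                region = seg[r][c]
                for label in labels:
                    totals[label] += conf[region].get(label, 0.0)
            out_row.append(max(labels, key=lambda name: totals[name]))
        result.append(out_row)
    return result


# Toy example: a 1x3 image with two different segmentations.
seg_a = [[0, 0, 1]]
seg_b = [[0, 1, 1]]
conf_a = {0: {"support": 0.9, "vertical": 0.1},
          1: {"support": 0.2, "vertical": 0.8}}
conf_b = {0: {"support": 0.8, "vertical": 0.2},
          1: {"support": 0.3, "vertical": 0.7}}
final_labels = combine_segmentations(
    [seg_a, seg_b], [conf_a, conf_b], ["support", "vertical"])
```

The middle pixel falls into different regions in the two segmentations, so its label depends on which region the classifier is more confident about. That is the reason for using several segmentations rather than committing to one.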
![Page 40: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/40.jpg)
Surface Description Result
![Page 41: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/41.jpg)
Results
Input Image Ground Truth Our Result
![Page 42: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/42.jpg)
Results
Input Image Ground Truth Our Result
![Page 43: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/43.jpg)
Results
Input Image Ground Truth Our Result
![Page 44: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/44.jpg)
Failures: Reflections, Rare Viewpoint
Input Image Ground Truth Our Result
![Page 45: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/45.jpg)
Average Accuracy
Main Class: 88%
Subclasses: 61%
![Page 46: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/46.jpg)
Automatic Photo Popup
Labeled Image
Fit Ground-Vertical Boundary with Line Segments
Form Segments into Polylines
Cut and Fold
Final Pop-up Model
[Hoiem Efros Hebert 2005]
![Page 47: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/47.jpg)
Automatic Photo Popup
![Page 48: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/48.jpg)
![Page 49: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/49.jpg)
Mini-conclusions
•Can learn to predict surface geometry from a single image
![Page 50: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/50.jpg)
Tour into picture
Adobe After Effects
![Page 51: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/51.jpg)
Interpretation of indoor scenes
![Page 52: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/52.jpg)
Vision = assigning labels to pixels?
Lamp
Wall
Sofa
Floor
Floor
Table
![Page 53: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/53.jpg)
Vision = interpreting within physical space
Wall
Sofa
Floor
Table
![Page 54: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/54.jpg)
Physical space needed for affordance
Could I stand over here?
Is this a good place to sit?
Walkable path
Can I put my cup here?
![Page 55: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/55.jpg)
Physical space needed for recognition
Apparent shape depends strongly on viewpoint
![Page 56: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/56.jpg)
Physical space needed for recognition
![Page 57: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/57.jpg)
Physical space needed to predict appearance
![Page 58: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/58.jpg)
Physical space needed to predict appearance
![Page 59: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/59.jpg)
Key challenges
• How to represent the physical space?
  • Requires seeing beyond the visible
• How to estimate the physical space?
  • Requires simplified models
  • Requires learning from examples
![Page 60: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/60.jpg)
Our Box Layout
• Room is an oriented 3D box
  • Three vanishing points specify orientation
  • Two pairs of sampled rays specify position/size
Hedau Hoiem Forsyth, ICCV 2009
![Page 61: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/61.jpg)
Our Box Layout
• Room is an oriented 3D box
  • Three vanishing points (VPs) specify orientation
  • Two pairs of sampled rays specify position/size
Another box consistent with the same vanishing points
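Since vanishing points fix the box orientation, it helps to see how one can be computed. A minimal sketch, assuming two image line segments that come from parallel 3D edges: in homogeneous coordinates the line through two points is their cross product, and the intersection of two lines is again a cross product. Practical systems estimate VPs robustly from many detected edges; the function names here are illustrative only.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(seg1, seg2):
    """Intersect the supporting lines of two segments.

    Each segment is a pair of (x, y) endpoints. Returns (x, y), or None
    when the lines are parallel in the image (VP at infinity).
    """
    x, y, w = cross(line_through(*seg1), line_through(*seg2))
    if abs(w) < 1e-12:
        return None
    return (x / w, y / w)


# Two edges that converge toward the right of the image.
vp = vanishing_point(((0.0, 0.0), (2.0, 1.0)), ((0.0, 10.0), (2.0, 9.0)))
```

Rays cast from such a point in different directions are the "sampled rays" that the box layout uses to fix position and size.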
![Page 62: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/62.jpg)
Image Cues for Box Layout
• Straight edges
  • Edges on floor/wall surfaces are usually oriented towards VPs
  • Edges on objects might mislead
• Appearance of visible surfaces
  • Floor, wall, ceiling, object labels should be consistent with box
left wall right wall
floor objects
![Page 63: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/63.jpg)
Box Layout Algorithm
1. Detect edges
2. Estimate 3 orthogonal vanishing points
3. Apply region classifier to label pixels with visible surfaces
   • Boosted decision trees on region based on color, texture, edges, position
4. Generate box candidates by sampling pairs of rays from VPs
5. Score each box based on edges and pixel labels
   • Learn score via structured learning
6. Jointly refine box layout and pixel labels to get final estimate
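Steps 4 and 5 amount to an enumerate-and-score search. The sketch below shows only that control flow, under simplifying assumptions: each ray is represented by its angle around a vanishing point, a candidate is two rays from one VP plus two rays from another, and the scoring function is a placeholder supplied by the caller (the slides say the real score is learned via structured learning from edges and pixel labels). All names are hypothetical.

```python
from itertools import combinations, product


def generate_box_candidates(angles_vp1, angles_vp2):
    """Enumerate candidates as ((a1, a2), (b1, b2)).

    (a1, a2) is a pair of ray angles from the first VP and (b1, b2) a pair
    from the second VP.
    """
    return list(product(combinations(angles_vp1, 2),
                        combinations(angles_vp2, 2)))


def best_box(candidates, score_fn):
    """Return the candidate with the highest score under score_fn."""
    return max(candidates, key=score_fn)


# Toy run: three sampled rays from one VP, two from the other.
candidates = generate_box_candidates([10.0, 20.0, 30.0], [100.0, 110.0])


def toy_score(candidate):
    """Placeholder score preferring rays near 10 and 30 degrees."""
    (a1, a2), _ = candidate
    return -abs(a1 - 10.0) - abs(a2 - 30.0)


chosen = best_box(candidates, toy_score)
```

Because the number of candidates grows with the square of the rays sampled per VP, the sampling density is a trade-off between coverage and scoring cost.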
![Page 64: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/64.jpg)
Evaluation
• Dataset: 308 indoor images
  • Train with 204 images, test with 104 images
![Page 65: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/65.jpg)
Experimental results
Detected Edges Surface Labels Box Layout
Detected Edges Surface Labels Box Layout
![Page 66: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/66.jpg)
Experimental results
Detected Edges Surface Labels Box Layout
Detected Edges Surface Labels Box Layout
![Page 67: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/67.jpg)
Experimental results
• Joint reasoning of surface label / box layout helps
  • Pixel error: 26.5% → 21.2%
  • Corner error: 7.4% → 6.3%
•Similar performance for cluttered and uncluttered rooms
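As a rough illustration of the pixel error metric, the sketch below assumes it is the fraction of pixels whose predicted surface label differs from the ground-truth label. That definition is an assumption made for this example; the slides do not spell it out, and the corner error is not covered here.

```python
def pixel_error(predicted, ground_truth):
    """Fraction of pixels whose labels disagree.

    predicted, ground_truth: 2D lists of labels with identical shape.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("label maps must have the same number of rows")
    total = 0
    wrong = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        if len(pred_row) != len(gt_row):
            raise ValueError("label maps must have the same row lengths")
        for p, g in zip(pred_row, gt_row):
            total += 1
            if p != g:
                wrong += 1
    if total == 0:
        raise ValueError("label maps are empty")
    return wrong / total


pred = [["floor", "wall"], ["floor", "floor"]]
truth = [["floor", "wall"], ["wall", "floor"]]
err = pixel_error(pred, truth)
```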
![Page 68: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/68.jpg)
Mini-Conclusions
• Can fit a 3D box to the room's boundaries from one image
  • Robust to occluding objects
  • Decent accuracy, but still much room for improvement
![Page 69: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/69.jpg)
Using room layout to improve object detection
Box layout helps
1. Predict the appearance of objects, because they are often aligned with the room
2. Predict the position and size of objects, due to physical constraints and size consistency
2D Bed Detection 3D Bed Detection with Scene Geometry
Hedau, Hoiem, Forsyth, ECCV 2010, CVPR 2012
![Page 70: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/70.jpg)
Search for objects in room coordinates
Hedau Hoiem Forsyth (2010)
Recover Room Coordinates Rectify Features to Room Coordinates
Rectified Sliding Windows
![Page 71: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/71.jpg)
Reason about 3D room and bed space
Joint Inference with Priors
• Beds close to walls
• Beds within room
• Consistent bed/wall size
• Two objects cannot occupy the same space
Hedau Hoiem Forsyth (2010)
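The priors above become simple geometric tests once the room and the objects are expressed in room coordinates. A minimal sketch, assuming axis-aligned boxes given as a (min corner, max corner) pair with x and y horizontal and z vertical; the published method performs joint inference rather than hard checks like these, and the helper names are invented for illustration.

```python
def inside(box, room):
    """True if box lies entirely within room ("beds within room")."""
    return all(room[0][i] <= box[0][i] and box[1][i] <= room[1][i]
               for i in range(3))


def overlap(a, b):
    """True if two boxes share volume ("two objects cannot occupy the same space")."""
    return all(a[0][i] < b[1][i] and b[0][i] < a[1][i] for i in range(3))


def distance_to_nearest_wall(box, room):
    """Smallest horizontal gap between box and a wall ("beds close to walls")."""
    gaps = []
    for i in (0, 1):  # horizontal axes only
        gaps.append(box[0][i] - room[0][i])
        gaps.append(room[1][i] - box[1][i])
    return min(gaps)


# A 4 m x 5 m x 3 m room and three bed hypotheses (metres).
room = ((0.0, 0.0, 0.0), (4.0, 5.0, 3.0))
bed_1 = ((0.1, 1.0, 0.0), (2.1, 2.6, 0.6))
bed_2 = ((1.5, 2.0, 0.0), (3.5, 3.6, 0.6))   # collides with bed_1
bed_3 = ((2.5, 3.0, 0.0), (3.9, 4.9, 0.6))   # compatible with bed_1
```

In a soft formulation, each test would contribute a penalty to a hypothesis score instead of rejecting it outright.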
![Page 72: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/72.jpg)
3D Bed Detection from an Image
![Page 73: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/73.jpg)
Generic boxy object detection
Hedau et al. 2012
![Page 74: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/74.jpg)
Generic boxy object detection
![Page 75: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/75.jpg)
Generic boxy object detection
![Page 76: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/76.jpg)
SUN RGB-D
SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite, CVPR 2015
![Page 77: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/77.jpg)
ObjectNet3D
ObjectNet3D: A Large Scale Database for 3D Object Recognition, ECCV 2016
![Page 78: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/78.jpg)
Mini-Conclusions
•Simple room box layout helps detect objects by predicting appearance and constraining position
•We can search for objects in 3D space and directly evaluate on 3D localization
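One way to evaluate 3D localization is volumetric intersection-over-union between a predicted and a ground-truth box. The sketch below assumes axis-aligned boxes given as `(min_corner, max_corner)` tuples. The slides do not specify the metric or any threshold, and real benchmarks may use oriented boxes, so `iou_3d` is an illustrative helper only.

```python
def volume(box):
    """Volume of an axis-aligned box ((x0, y0, z0), (x1, y1, z1))."""
    (x0, y0, z0), (x1, y1, z1) = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0) * max(0.0, z1 - z0)


def iou_3d(a, b):
    """Intersection-over-union of two axis-aligned 3D boxes."""
    (ax0, ay0, az0), (ax1, ay1, az1) = a
    (bx0, by0, bz0), (bx1, by1, bz1) = b
    # Overlap extent along each axis, clamped at zero when boxes are disjoint.
    dx = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    dy = max(0.0, min(ay1, by1) - max(ay0, by0))
    dz = max(0.0, min(az1, bz1) - max(az0, bz0))
    inter = dx * dy * dz
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0 else 0.0
```

A detection would then count as correct when `iou_3d(predicted, ground_truth)` exceeds a chosen threshold.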
![Page 79: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/79.jpg)
Predicting complete models from RGBD
Key idea: create complete 3D scene hypothesis that is consistent with observed depth and appearance
Guo Hoiem Zou 2015
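A minimal sketch of the "consistent with observed depth" part of the key idea, assuming each scene hypothesis can be rendered to a depth map and compared with the sensor depth pixel by pixel. The scoring rule, the `tol` tolerance, and the function names are assumptions for illustration, not the method of Guo, Hoiem, and Zou. Appearance consistency is not modelled here.

```python
def depth_agreement(rendered, observed, tol=0.05):
    """Fraction of valid observed pixels whose rendered depth is within tol.

    Depth maps are lists of rows; None in `observed` marks a sensor hole.
    """
    agree = 0
    valid = 0
    for r_row, o_row in zip(rendered, observed):
        for r, o in zip(r_row, o_row):
            if o is None:  # no measurement, so the hypothesis is unconstrained
                continue
            valid += 1
            if abs(r - o) <= tol:
                agree += 1
    return agree / valid if valid else 0.0


def best_hypothesis(hypotheses, observed, tol=0.05):
    """Return the name of the hypothesis that best explains the observed depth.

    `hypotheses` maps a name to that hypothesis's rendered depth map.
    """
    return max(hypotheses,
               key=lambda name: depth_agreement(hypotheses[name], observed, tol))
```

Because pixels with missing depth are skipped, a complete hypothesis is free to fill in occluded or unmeasured regions without being penalized.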
![Page 80: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/80.jpg)
Overview of approach
[Figure: pipeline diagram. Labels: RGB-D Input; Object Proposals; Layout Proposals; Exemplar Region Retrieval (Retrieved Region, Source 3D Model); 3D Model Fitting (Transferred Model); Composing; Annotated Scene]
![Page 81: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/81.jpg)
Example result (fully automatic)
![Page 82: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/82.jpg)
[Figure panels: Original Image; Manual Segmentation; Composition with Manual Segmentation; Auto Proposal; Composition with Auto Proposal; Ground Truth Annotation]
![Page 83: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/83.jpg)
[Figure panels: Original Image; Manual Segmentation; Composition with Manual Segmentation; Auto Proposal; Composition with Auto Proposal; Ground Truth Annotation]
![Page 85: Contexts and 3D Scenes - Virginia Techjbhuang/teaching/ece... · Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem. ... Robot navigation/interaction](https://reader036.fdocuments.in/reader036/viewer/2022081402/5f0ac94e7e708231d42d5615/html5/thumbnails/85.jpg)
Things to remember
•Objects should be interpreted in the context of the surrounding scene
• Many types of context to consider
•Spatial layout is an important part of scene interpretation, but many open problems
• How to represent space?
• How to learn and infer spatial models?
• Important to see beyond the visible
•Consider trade-off of abstraction vs. precision