Recovering Surface Layout from a Single Image D. Hoiem, A.A. Efros, M. Hebert Robotics Institute,...

Post on 31-Dec-2015


Recovering Surface Layout from a Single Image

D. Hoiem, A.A. Efros, M. Hebert
Robotics Institute, CMU

Presenter: Derek Hoiem
CS 598, Spring 2009

Jan 29, 2009

Why worry about 3d scenes?

Reason 1: We may want to interact with the scene

Navigation Manipulation

Reason 2: We need context

2D Object Detection

What the 2D Detector Sees

Computers need context too

True Detections

Missed

False Detections

Local Detector: [Dalal-Triggs 2005]

Context in Image Space

[Kumar Hebert 2005] [Torralba Murphy Freeman 2004]

[He Zemel Carreira-Perpiñán 2004]

We need 3d info to reason about 3d relationships

Close

Not Close

How to represent scene space?

Holistic Scene Space: “Gist”

Oliva & Torralba 2001

Torralba & Oliva 2002

How to represent scene space?

Depth Map

Saxena, Chung & Ng 2005, 2007

Gibson’s Surface Layout

slide from Aude Oliva

• Gibson: “The elementary impressions of a visual world are those of surface and edge.” The Perception of the Visual World (1950)
• Focus on texture gradients

Surface Layout (Gibson cont.)

slide from Aude Oliva

Marr’s 2½-D Sketch

Figs from Aude Oliva slide

Surface Layout (this paper)

Goal: Label each image region with one of 7 geometric classes:
• Support
• Vertical
– Planar: facing Left, Center, Right
– Non-planar: Solid (X), Porous or wiry (O)
• Sky
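The seven labels above could be encoded as an enumeration with a lookup back to the three main classes. This is only a minimal sketch: the names `GeometricClass` and `main_class` and the string values are illustrative assumptions, not taken from the slides.

```python
from enum import Enum


class GeometricClass(Enum):
    """The 7 geometric labels; identifiers and string values are illustrative."""
    SUPPORT = "support"
    PLANAR_LEFT = "vertical/planar/left"
    PLANAR_CENTER = "vertical/planar/center"
    PLANAR_RIGHT = "vertical/planar/right"
    NONPLANAR_SOLID = "vertical/non-planar/solid"
    NONPLANAR_POROUS = "vertical/non-planar/porous"
    SKY = "sky"


def main_class(label: GeometricClass) -> str:
    """Collapse a label to its main class: 'support', 'vertical', or 'sky'."""
    return label.value.split("/")[0]
```

Keeping the subclass inside the value string makes it easy to score a prediction at either the main-class or the subclass level.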

Our Main Challenge

• Recovering 3D geometry from single 2D projection

• Infinite number of possible solutions!

Our World is Structured

Abstract World Our World

Image Credit (left): F. Cunin and M.J. Sailor, UCSD

Most Early Work Tried to Manually Specify the Structure

• Hansen & Riseman 1978 (VISIONS)
• Barrow & Tenenbaum 1978 (Intrinsic Images)
• Brooks 1979 (ACRONYM)
• Marr 1982 (2½-D Sketch)

Guzman 1968

Ohta & Kanade 1978

Learn the Structure of the World

Infer Most Likely Scene

Unlikely Likely

1. Use All Available Cues

Vanishing points, lines

Color, texture, image location

Texture gradient

2. Get Good Spatial Support

50x50 Patch

Image Segmentation

• Single segmentation won’t work

• Solution: multiple segmentations

For each segment:

- Get P(good segment | data) P(label | good segment, data)

Labeling Segments

Image Labeling

Labeled Segmentations

Labeled Pixels

$$P(\text{label} \mid \text{data}) = \sum_{\text{segments}} P(\text{good segment} \mid \text{data}) \, P(\text{label} \mid \text{good segment}, \text{data})$$
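The sum over segments can be illustrated per pixel: every segment, from any of the multiple segmentations, that covers a pixel contributes its label likelihood weighted by how likely the segment is to be good. The sketch below is a hedged illustration, not the authors' code. The dictionary keys (`"pixels"`, `"p_good"`, `"p_label"`) and the final per-pixel normalization are assumptions made here for clarity.

```python
def pixel_label_confidence(segmentations, num_pixels, labels):
    """Accumulate P(good segment | data) * P(label | good segment, data)
    over every segment that contains each pixel.

    segmentations: list of segmentations; each is a list of segments, and
    each segment is a dict with hypothetical keys:
        "pixels":  iterable of pixel indices covered by the segment
        "p_good":  P(good segment | data)
        "p_label": dict mapping label -> P(label | good segment, data)
    """
    conf = [{lab: 0.0 for lab in labels} for _ in range(num_pixels)]
    for segmentation in segmentations:
        for seg in segmentation:
            for pix in seg["pixels"]:
                for lab in labels:
                    conf[pix][lab] += seg["p_good"] * seg["p_label"][lab]
    # Assumption: normalize so each covered pixel's confidences sum to 1.
    for pixel_conf in conf:
        total = sum(pixel_conf.values())
        if total > 0:
            for lab in labels:
                pixel_conf[lab] /= total
    return conf
```

Because a poor segment carries a low `p_good`, its label estimate is down-weighted rather than discarded, which is how several imperfect segmentations can still yield usable pixel labels.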

[Figure: example decision trees with Yes/No branches on questions such as Gray?, High in Image?, Many Long Lines?, Very High Vanishing Point?, Smooth?, Green?, Blue?]

Decision Trees + Adaboost

Ground Vertical Sky

Collins et al. 2002
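To make the idea of boosting yes/no questions concrete, here is a small sketch that swaps in plain discrete AdaBoost over single-question stumps. It is not the boosted-tree variant used in the work, and the function names (`train_stump`, `adaboost`, `predict`) and the toy features are assumptions for illustration only.

```python
import math


def train_stump(X, y, w):
    """Pick the (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[f] > thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def adaboost(X, y, rounds=5):
    """Discrete AdaBoost over one-question stumps; labels y are in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, thr, polarity = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, polarity))
        # Up-weight the examples this stump got wrong, then renormalize.
        preds = [polarity if row[f] > thr else -polarity for row in X]
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if row[f] > thr else -pol) for a, f, thr, pol in ensemble)
    return 1 if score > 0 else -1
```

With toy features such as `[height_in_image, blueness]` and a label of +1 for sky, each stump plays the role of one question like "High in Image?" or "Blue?".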

Surface Confidence Maps

P(Support) P(Vertical) P(Sky)

P(Planar Left) P(Planar Center) P(Planar Right)

P(Non-Planar Porous) P(Non-Planar Solid)

Test Image

Experiments: Input Image

Experiments: Ground Truth

Experiments: Our Result

Surface Estimates: Outdoor

Input Image Ground Truth Our Result

Avg. Accuracy

Main Class: 88%

Subclass: 62%

Surface Estimates: Paintings

Input Image Our Result

Surface Estimates: Indoor

Avg. Accuracy

Main Class: 93%

Subclass: 76%

Input Image Ground Truth Our Result

Failures: Reflections and Shadows

Input Image Our Result

Average Accuracy

Main Class: 88%

Subclasses: 61%

Importance of Many Cues

| | All | Position Only | Color Only | Texture Only | Perspective Only |
|---|---|---|---|---|---|
| Main | 88% | 83% | 72% | 80% | 68% |
| Subclass | 61% | 43% | 43% | 55% | 52% |

| | All | All But Position | All But Color | All But Texture | All But Perspective |
|---|---|---|---|---|---|
| Main | 88% | 84% | 87% | 87% | 88% |
| Subclass | 61% | 60% | 60% | 58% | 57% |

Spatial Support Matters

Automatic Photo Popup

Labeled Image

Fit Ground-Vertical Boundary with Line Segments

Form Segments into Polylines

Cut and Fold

Final Pop-up Model

[Hoiem Efros Hebert 2005]
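The boundary-fitting step of the pop-up pipeline can be sketched on a small label grid. This is a simplified illustration under stated assumptions: it fits a single least-squares line rather than multiple line segments and polylines, and the function names and the string labels `"vertical"` and `"support"` are hypothetical.

```python
def ground_vertical_boundary(labels):
    """Return (column, row) points where a 'vertical' pixel sits directly
    above a 'support' pixel in a 2D grid of label strings."""
    points = []
    rows, cols = len(labels), len(labels[0])
    for c in range(cols):
        for r in range(rows - 1):
            if labels[r][c] == "vertical" and labels[r + 1][c] == "support":
                points.append((c, r + 1))  # first support row below the wall
    return points


def fit_line(points):
    """Least-squares fit of row = slope * column + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x
```

The fitted line marks where a vertical surface would be "folded" up from the ground plane; a fuller version would fit several segments and join them into polylines.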

video

Surfaces Not Enough – Need Occlusion Reasoning

Image Surface Labels 3D Model

Surfaces + Occlusions + Objects = Better 3D Models

[Diagram: three components (Surfaces, Occlusions, Objects and Viewpoint) and the information exchanged among them: Support; Surface Maps; Depth, Boundaries; Horizon, Object Maps]

Viewpoint/Size Reasoning

video 2

Contributions
• General principles
– Learn the structure of the world
– Use all available cues
– Spatial support matters
– Use redundancy to deal with unreliable processes (segmentation)

• Results include entire spread of failure and success

• First work to convincingly demonstrate single-view reconstruction

Criticisms
• Still just 2D pattern recognition?

• Not clear how to generalize to arbitrary 3D angles

• Restricted to visible portion of scene

• Coarse layout: not clear if applicable to personal space or object shapes

Ideas for improvement

• Try improving features (e.g., add bag of words)

• Extend to characterize object shapes?

• Combine this surface-based layout with depth estimates from Saxena et al.

Discussion
• Use for context (Eamon)
• Multiple segmentations (Duan, Sanketh)
• Subcategories (Duan, Sanketh)
• Global info, use of object knowledge (Binbin)
• Combination with multiview cues (Mani)
• Landmarks (Gang)

Thank you

Things to cover when you present

• Background
• Overview of method
• Results
• Things you like
• Things you don’t
• Ideas for improvement
• Address bulletin board postings