Extracting Simple Verb Frames from Images

Toward Holistic Scene Understanding. Prof. Daphne Koller Research Group, Stanford University. Geremy Heitz, DARPA CLLR Workshop, December 2, 2008.

Transcript of Extracting Simple Verb Frames from Images

Page 1: Extracting Simple Verb Frames from Images

Extracting Simple Verb Frames from Images

Toward Holistic Scene Understanding

Prof. Daphne Koller Research Group

Stanford University

Geremy Heitz

DARPA CLLR Workshop

December 2, 2008

Page 2: Extracting Simple Verb Frames from Images

Grand Goal: Scene Understanding

“man wearing a backpack, smoking a cigarette, walking a dog on a sidewalk”

Man

Dog

Backpack

Cigarette

“A cow walking through the grass on a pasture by the sea”

Page 3: Extracting Simple Verb Frames from Images

Understanding Verb Frames

“a man is walking on a sidewalk”

Primitives: Objects, Parts, Surfaces, Regions

Interactions: Context, Actions

Methods exist to extract each of these, but we need to extract them more accurately and extract them all at once.

Modeling verb frames requires understanding the interactions between primitives, which fit well into the framework of graphical models.

Man

Dog

Backpack

Cigarette

Building

Sidewalk

“a dog is walking on a sidewalk”

Frame: to walk

Page 4: Extracting Simple Verb Frames from Images

Outline

Extracting the Primitives

Qualitative 3D Scene Layout

Modeling Relationships

Learning Frames

Refined Characterization of Objects

Page 5: Extracting Simple Verb Frames from Images

Computer View of a “Scene”

BUILDING

ROAD

STREET

SCENE

Page 6: Extracting Simple Verb Frames from Images

Object Detection

= Car

= Person

= Motorcycle

= Boat

= Sheep

= Cow

Detection Window W

Score(W) > 0.5
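The thresholding step on this slide can be sketched as follows. This is a minimal illustration, assuming each candidate window already carries a score in [0, 1] from some detector. The `Window` type, the `keep_detections` function, and the sample scores are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Window:
    """A candidate detection window (hypothetical structure for illustration)."""
    x: int
    y: int
    width: int
    height: int
    label: str
    score: float  # detector confidence, assumed to lie in [0, 1]


def keep_detections(windows: List[Window], threshold: float = 0.5) -> List[Window]:
    """Keep only the windows W with Score(W) > threshold."""
    return [w for w in windows if w.score > threshold]


# Made-up scores, purely for illustration.
candidates = [
    Window(10, 20, 64, 128, "Person", 0.91),
    Window(200, 40, 120, 60, "Car", 0.48),
    Window(300, 150, 80, 60, "Cow", 0.50),
]
detections = keep_detections(candidates)
```

The comparison is strict, matching Score(W) > 0.5 on the slide, so a window scoring exactly 0.5 is rejected.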

Page 7: Extracting Simple Verb Frames from Images

Finding the Primitives Jointly

SEASIDE PASTURE

GRASS

SKY

Grass = Flat

Sky = Far

FG = Vertical

40% Grass, 30% Sky…

1 cow, 2 boats…

[Heitz et al., NIPS 2008a]

Page 8: Extracting Simple Verb Frames from Images

Results – TAS Model

Contextual Detector

Base Detector

[Heitz et al., ECCV 2008]

Page 9: Extracting Simple Verb Frames from Images

Qualitative 3D Scene Layout

Primitives imply a certain 3D layout of the scene, although absolute depth may not be preserved.

For example:

Sky is a far, vertical plane

Water and road are horizontal planes

Objects “pop up” from the image
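The examples on this slide amount to a lookup from region class to qualitative geometry. A minimal sketch, assuming region labels arrive as plain strings. The table name, the function name, and the returned strings are hypothetical choices for this illustration, and classes not mentioned on the slide are reported as "unknown".

```python
from typing import Dict

# Qualitative geometry for the region classes named on the slide.
# The string values are hypothetical names chosen for this sketch.
QUALITATIVE_GEOMETRY: Dict[str, str] = {
    "sky": "far vertical plane",
    "water": "horizontal plane",
    "road": "horizontal plane",
}


def qualitative_geometry(region_label: str, is_object: bool = False) -> str:
    """Return a qualitative 3D description for a labeled region.

    Detected objects are treated as "pop-ups" from the image.
    Region classes not listed in the table are reported as "unknown".
    """
    if is_object:
        return "pop-up"
    return QUALITATIVE_GEOMETRY.get(region_label.lower(), "unknown")


sky_geometry = qualitative_geometry("Sky")
cow_geometry = qualitative_geometry("cow", is_object=True)
```

Only relative, qualitative layout is encoded here. No absolute depth is estimated.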

Page 10: Extracting Simple Verb Frames from Images

Modeling Relationships

Beside

In front of

On

We have explored how to model 2D relationships

We should be able to extend this to 3D relationships

[Gould et al., IJCV 2008]

[Heitz et al., ECCV 2008]
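To make the three 2D relationships concrete, here is a minimal hand-written heuristic over bounding boxes. It is not the learned model from the cited papers. The `Box` type, the `relation_2d` function, the pixel tolerance, and the rule for "in front of" (overlapping boxes, with the nearer one reaching lower in the image) are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in image coordinates (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


def overlap_1d(a_min: float, a_max: float, b_min: float, b_max: float) -> float:
    """Length of the overlap between intervals [a_min, a_max] and [b_min, b_max]."""
    return max(0.0, min(a_max, b_max) - max(a_min, b_min))


def relation_2d(a: Box, b: Box, tolerance: float = 10.0) -> str:
    """Heuristically name the relation of box a to box b."""
    h_overlap = overlap_1d(a.left, a.right, b.left, b.right)
    v_overlap = overlap_1d(a.top, a.bottom, b.top, b.bottom)
    # "on": a's bottom edge sits near b's top edge and they share horizontal extent.
    if h_overlap > 0 and abs(a.bottom - b.top) <= tolerance:
        return "on"
    # "beside": the boxes share vertical extent but are horizontally disjoint.
    if v_overlap > 0 and h_overlap == 0:
        return "beside"
    # "in front of": the boxes overlap and a reaches lower in the image than b.
    if h_overlap > 0 and v_overlap > 0 and a.bottom > b.bottom:
        return "in front of"
    return "none"


cup_on_table = relation_2d(Box(40, 50, 60, 100), Box(0, 100, 200, 150))
dog_beside_person = relation_2d(Box(60, 50, 100, 100), Box(0, 0, 50, 100))
```

Extending this to 3D relationships would replace the image-plane tests with tests on the recovered scene layout.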

Page 11: Extracting Simple Verb Frames from Images

Outline

Extracting the Primitives

Qualitative 3D Scene Layout

Modeling Relationships

Learning Frames

Refined Characterization of Objects

Page 12: Extracting Simple Verb Frames from Images

Learning Semantics: Verb Frames

Given primitives, rough layout, and relationships, let’s learn subjects, verbs, and objects for frames:

The [S] [V] the [O].

[S], [O]

CAR, ROAD, COW, GRASS, PERSON, APPLE

[V]

WALKS ON, EATS, DRIVES ON, JUMPS OVER, THROWS…

Page 13: Extracting Simple Verb Frames from Images

The CAR DRIVES ON the ROAD
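The frame template "The [S] [V] the [O]" can be represented as a small data structure. A minimal sketch: the vocabularies are copied from the slide, which ends its verb list with an ellipsis, so they are not exhaustive. The `VerbFrame` class and its `render` method are hypothetical names for this illustration.

```python
from dataclasses import dataclass

# Vocabularies as listed on the slide; the slide's verb list is open-ended.
SUBJECTS_AND_OBJECTS = {"CAR", "ROAD", "COW", "GRASS", "PERSON", "APPLE"}
VERBS = {"WALKS ON", "EATS", "DRIVES ON", "JUMPS OVER", "THROWS"}


@dataclass(frozen=True)
class VerbFrame:
    """A simple subject-verb-object frame (hypothetical structure)."""
    subject: str
    verb: str
    obj: str

    def __post_init__(self) -> None:
        # Reject slot fillers that are outside the known vocabularies.
        if self.subject not in SUBJECTS_AND_OBJECTS:
            raise ValueError(f"unknown subject: {self.subject}")
        if self.verb not in VERBS:
            raise ValueError(f"unknown verb: {self.verb}")
        if self.obj not in SUBJECTS_AND_OBJECTS:
            raise ValueError(f"unknown object: {self.obj}")

    def render(self) -> str:
        """Fill the template 'The [S] [V] the [O]'."""
        return f"The {self.subject} {self.verb} the {self.obj}"


frame = VerbFrame("CAR", "DRIVES ON", "ROAD")
sentence = frame.render()
```

Learning frames then means deciding, from the detected primitives and their relationships, which slot fillers are supported by the image.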

Page 14: Extracting Simple Verb Frames from Images

Refined Characterization

We need to know that the white stick is a cigarette…

and where the man’s mouth is…

in order to determine that he’s smoking.

Page 15: Extracting Simple Verb Frames from Images

Refined Object Characterization

Set of “keypoint” landmarks

Outline shape defined by connecting contour

[Heitz et al., NIPS 2008b, IJCV in submission]

Page 16: Extracting Simple Verb Frames from Images

Results

Giraffe

Llama

Rhino

Page 17: Extracting Simple Verb Frames from Images

Mammals

[Heitz et al., NIPS 2008b, IJCV in submission]

Eating, Standing

Running, Standing

Page 18: Extracting Simple Verb Frames from Images

Activity Recognition

Eating

Drinking

1) Localize the landmarks of the cow, including the head.

2) Extract histogram of “stuff” in a window around the head landmark.

Grass, Cow

3) Make a decision

Eating
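Steps 2 and 3 above can be sketched as follows, assuming step 1 has already produced a head landmark and that a per-pixel "stuff" label map is available. The function names, the window size, and the grass-versus-water decision rule are hypothetical. The talk does not specify how the final decision is made.

```python
from collections import Counter
from typing import Dict, List, Tuple


def stuff_histogram(label_map: List[List[str]],
                    center: Tuple[int, int],
                    half_size: int) -> Dict[str, float]:
    """Normalized histogram of labels in a square window around center.

    center is (row, col) and is assumed to lie inside label_map.
    The window is clipped at the image border.
    """
    row, col = center
    rows = range(max(0, row - half_size), min(len(label_map), row + half_size + 1))
    cols = range(max(0, col - half_size), min(len(label_map[0]), col + half_size + 1))
    counts = Counter(label_map[r][c] for r in rows for c in cols)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def classify_activity(histogram: Dict[str, float]) -> str:
    """Hypothetical rule: grass near the head means eating, water means drinking."""
    grass = histogram.get("grass", 0.0)
    water = histogram.get("water", 0.0)
    if grass == 0.0 and water == 0.0:
        return "unknown"
    return "eating" if grass >= water else "drinking"


# Toy 5x5 label map, made up for illustration.
label_map = [
    ["sky", "sky", "sky", "sky", "sky"],
    ["cow", "cow", "cow", "sky", "sky"],
    ["cow", "cow", "cow", "cow", "sky"],
    ["grass", "grass", "grass", "cow", "grass"],
    ["grass", "grass", "grass", "grass", "grass"],
]
head_landmark = (3, 3)  # (row, col), assumed output of step 1
histogram = stuff_histogram(label_map, head_landmark, half_size=1)
activity = classify_activity(histogram)
```

In practice the hand-written rule would likely be replaced by a classifier trained on such histograms.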

Page 19: Extracting Simple Verb Frames from Images

Activity Recognition with People

Running, Walking, Standing, Hitting

Pose of the person is one of the important factors.

We also need to recognize the objects the person interacts with.

Page 20: Extracting Simple Verb Frames from Images

How far can we take this?

Front legs off ground = Jumping

Ball near hands = Throwing

Apple near mouth = Eating
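The three cues above read naturally as rules over landmark and object positions. A minimal sketch, assuming 2D image coordinates with y growing downward. The landmark names, the `radius` and `lift` thresholds, and the `infer_action` function are hypothetical, and a real system would presumably learn such cues rather than hard-code them.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y grows downward


def near(p: Point, q: Point, radius: float) -> bool:
    """True if the two points are within radius pixels of each other."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= radius


def infer_action(landmarks: Dict[str, Point],
                 objects: Dict[str, Point],
                 ground_y: Optional[float] = None,
                 radius: float = 20.0,
                 lift: float = 5.0) -> str:
    """Apply the three cues from the slide in a fixed, arbitrary order."""
    # Apple near mouth = Eating
    if "apple" in objects and "mouth" in landmarks:
        if near(objects["apple"], landmarks["mouth"], radius):
            return "eating"
    # Ball near hands = Throwing
    if "ball" in objects and "hands" in landmarks:
        if near(objects["ball"], landmarks["hands"], radius):
            return "throwing"
    # Front legs off ground = Jumping (legs sit above the ground line by more than lift)
    if ground_y is not None and "front_legs" in landmarks:
        if ground_y - landmarks["front_legs"][1] > lift:
            return "jumping"
    return "unknown"


eating = infer_action({"mouth": (100.0, 50.0)}, {"apple": (105.0, 55.0)})
jumping = infer_action({"front_legs": (80.0, 150.0)}, {}, ground_y=200.0)
```

Each rule depends on the refined characterization of objects from the earlier slides: without a located mouth, hands, or legs, none of the cues can be evaluated.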

Page 21: Extracting Simple Verb Frames from Images

Does phased learning help?

Cartoon/Caricature

Exaggerates the most salient features of the object class.

Simple BG

Real object with no confusing clutter.

Cluttered BG

Object in standard pose on natural background.

Articulated

Once we have built a strong appearance model, can we learn complicated articulations?

Page 22: Extracting Simple Verb Frames from Images

Our Related Papers

G. Elidan, B. Packer, G. Heitz, and D. Koller. Convex Point Estimation using Undirected Bayesian Transfer Hierarchies. UAI, 2008.

S. Gould, J. Rodgers, D. Cohen, G. Elidan, and D. Koller. Multi-Class Segmentation with Relative Location Prior. IJCV, 2008.

S. Gould, P. Baumstarck, M. Quigley, A. Ng, and D. Koller. Integrating Visual and Range Data for Robotic Object Detection. ECCV Workshop M2SFA2, 2008.

G. Heitz and D. Koller. Learning Spatial Context: Using Stuff to Find Things. ECCV, 2008.

G. Heitz, S. Gould, A. Saxena, and D. Koller. Cascaded Classification Models: Combining Models for Holistic Scene Understanding. NIPS, 2008a.

G. Heitz, G. Elidan, B. Packer, and D. Koller. Shape-based Object Localization for Descriptive Classification. NIPS, 2008b.