Extracting Simple Verb Frames from Images
Toward Holistic Scene Understanding
Prof. Daphne Koller Research Group
Stanford University
Geremy Heitz
DARPA CLLR Workshop
December 2, 2008
Grand Goal: Scene Understanding
“man wearing a backpack, smoking a cigarette, walking a dog on a sidewalk”
Man
Dog
Backpack
Cigarette
“A cow walking through the grass on a pasture by the sea”
Understanding Verb Frames
“a man is walking on a sidewalk”
Primitives: objects, parts, surfaces, regions
Interactions: context, actions
Methods exist to extract each of these, but we need both to extract them more accurately and to extract them all at once.
Modeling verb frames requires understanding the interactions between primitives, which fit well into the framework of graphical models.
Man
Dog
Backpack
Cigarette
Building
Sidewalk
“a dog is walking on a sidewalk”
Frame: to walk
Outline
Extracting the Primitives
Qualitative 3D Scene Layout
Modeling Relationships
Learning Frames
Refined Characterization of Objects
Computer View of a “Scene”
BUILDING
ROAD
STREET
SCENE
Object Detection
= Car
= Person
= Motorcycle
= Boat
= Sheep
= Cow
Detection Window W
Score(W) > 0.5
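The detection rule on this slide (report window W whenever Score(W) > 0.5) can be sketched as a single-scale sliding-window scan. This is a minimal illustration under stated assumptions: `toy_score`, the window size, and the stride are hypothetical stand-ins for a trained classifier and its settings, not the detectors used in this work.

```python
from typing import Callable, List, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_window_detect(
    img_w: int, img_h: int,
    win_w: int, win_h: int, stride: int,
    score: Callable[[Window], float],
    threshold: float = 0.5,
) -> List[Tuple[Window, float]]:
    """Score every window position and keep those above the threshold."""
    detections = []
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            w = (x, y, win_w, win_h)
            s = score(w)
            if s > threshold:  # Score(W) > 0.5 => report a detection
                detections.append((w, s))
    return detections


def toy_score(w: Window) -> float:
    """Hypothetical scorer: confident only for windows near (40, 20)."""
    x, y, _, _ = w
    return 0.9 if abs(x - 40) <= 5 and abs(y - 20) <= 5 else 0.1


dets = sliding_window_detect(100, 60, 20, 20, 10, toy_score)
print(dets)
```

A practical detector would also scan several window scales and apply non-maximum suppression to merge overlapping hits; both are omitted here to keep the threshold rule visible.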
Finding the Primitives Jointly
SEASIDE
PASTURE
GRASS
SKY
Grass = Flat
Sky = Far
FG = Vertical
40% Grass, 30% Sky…
1 cow, 2 boats…
[Heitz et al., NIPS 2008a]
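The idea of finding primitives jointly with a cascade of classifiers (NIPS 2008a) can be illustrated with a two-tier sketch: tier-1 models run independently, and each tier-2 model sees its own features plus every tier-1 output. The task names, feature values, and the fixed linear scorers below are hypothetical placeholders for trained models; this shows only the data flow, not the paper's actual models or training procedure.

```python
from typing import Callable, Dict, List

Features = List[float]
Model = Callable[[Features], float]


def linear(weights: List[float]) -> Model:
    """Toy stand-in for a trained classifier: a fixed linear score."""
    return lambda x: sum(w * v for w, v in zip(weights, x))


def run_cascade(
    features: Dict[str, Features],
    tier1: Dict[str, Model],
    tier2: Dict[str, Model],
) -> Dict[str, float]:
    """Two-tier cascade: tier-2 models see their own features plus all tier-1 outputs."""
    # Tier 1: each task runs independently on its own features.
    first = {task: model(features[task]) for task, model in tier1.items()}
    # Tier 2: append tier-1 outputs (in sorted task-name order) to each task's features.
    context = [first[t] for t in sorted(first)]
    return {task: model(features[task] + context) for task, model in tier2.items()}


features = {"scene": [0.8], "detect": [0.4]}
tier1 = {"scene": linear([1.0]), "detect": linear([0.5])}
# Tier-2 inputs are [own feature, detect output, scene output].
tier2 = {"scene": linear([1.0, 0.0, 0.0]), "detect": linear([0.5, 0.0, 1.0])}
out = run_cascade(features, tier1, tier2)
print(out)
```

In this toy setting the detection score rises once the tier-2 detector can use the scene classifier's output, which is the kind of contextual benefit a joint model aims for.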
Results – TAS Model
Contextual Detector
Base Detector
[Heitz et al., ECCV 2008]
Qualitative 3D Scene Layout
Primitives imply a qualitative 3D layout of the scene, even though absolute depth may not be preserved.
For example:
Sky is a far, vertical plane.
Water and road are horizontal planes.
Objects “pop up” from the image.
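One way to turn these examples into a qualitative layout is a label-to-geometry lookup plus a relative depth ordering for objects that pop up from the ground. The sketch below assumes each object stands on a shared ground plane, so an object whose bottom edge sits lower in the image is closer to the camera; this is a common heuristic we adopt for illustration, not necessarily the method used here. The `GEOMETRY` table and the box format are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical label-to-geometry table following the examples on the slide.
GEOMETRY: Dict[str, str] = {
    "sky": "far vertical plane",
    "water": "horizontal plane",
    "road": "horizontal plane",
    "grass": "horizontal plane",
}

# (name, x, y_top, width, height); image rows grow downward.
Box = Tuple[str, int, int, int, int]


def pop_up_order(objects: List[Box]) -> List[str]:
    """Order objects nearest-first, assuming they all rest on one ground plane.

    Only relative depth is recovered; absolute distances are not.
    """
    def contact_row(b: Box) -> int:
        _, _, y_top, _, h = b
        return y_top + h  # row where the object meets the ground

    return [b[0] for b in sorted(objects, key=contact_row, reverse=True)]


print(GEOMETRY["sky"])
print(pop_up_order([("boat", 60, 30, 20, 10), ("cow", 10, 50, 30, 20)]))
```

Because only the ordering is returned, the sketch matches the slide's point that the layout is qualitative: absolute depth is never estimated.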
Modeling Relationships
Beside
In front of
On
We have explored how to model 2D relationships
We should be able to extend this to 3D relationships
[Gould et al., IJCV 2008]
[Heitz et al., ECCV 2008]
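To make the 2D relationships concrete, here is a hand-written sketch that labels a pair of bounding boxes as “on”, “beside”, or “other”. It is only an illustration: the cited work learns relationships (for example, a learned relative location prior in Gould et al.) rather than using fixed rules, and the tolerance `tol` and box convention below are assumptions. “In front of” is left out because, as the slide notes, it needs the 3D extension.

```python
from typing import Tuple

# (x_min, y_min, x_max, y_max); image rows grow downward.
Box = Tuple[float, float, float, float]


def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def relation_2d(a: Box, b: Box, tol: float = 5.0) -> str:
    """Hypothetical rules for the 2D relation of box a relative to box b."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    h_overlap = overlap(ax0, ax1, bx0, bx1)
    v_overlap = overlap(ay0, ay1, by0, by1)
    # "on": a's bottom edge rests near b's top edge and they overlap horizontally.
    if h_overlap > 0 and abs(ay1 - by0) <= tol:
        return "on"
    # "beside": horizontally disjoint but sharing some vertical extent.
    if h_overlap == 0 and v_overlap > 0:
        return "beside"
    return "other"


table = (0.0, 10.0, 50.0, 30.0)
print(relation_2d((10.0, 0.0, 20.0, 10.0), table))   # cup resting on the table
print(relation_2d((60.0, 10.0, 80.0, 30.0), table))  # chair next to the table
```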
Learning Semantics: Verb Frames
Given primitives, rough layout, and relationships, let’s learn subjects, verbs, and objects for frames of the form: The [S] [V] the [O].
[S], [O]
CAR, ROAD, COW
GRASS, PERSON, APPLE
…
[V]
WALKS ON, EATS
DRIVES ON, JUMPS OVER
THROWS…
The CAR DRIVES ON the ROAD
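Filling the frame “The [S] [V] the [O]” can be sketched as choosing the best-scoring (subject, verb, object) template that is consistent with the detected primitives and their relationships. The templates, required relations, and weights below are hypothetical and hand-set; in the proposed approach they would be learned from data, and a real model would score soft evidence rather than check hard constraints.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical templates: (subject, verb, object, required relation, weight).
Template = Tuple[str, str, str, str, float]
TEMPLATES: List[Template] = [
    ("CAR", "DRIVES ON", "ROAD", "on", 2.0),
    ("PERSON", "WALKS ON", "ROAD", "on", 1.5),
    ("COW", "WALKS ON", "GRASS", "on", 1.0),
    ("COW", "EATS", "GRASS", "touching", 1.2),
]


def best_frame(detected: Set[str], relations: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Return the highest-weight frame whose classes and relation match the scene."""
    candidates = [
        (w, s, v, o)
        for s, v, o, rel, w in TEMPLATES
        if s in detected and o in detected and relations.get((s, o)) == rel
    ]
    if not candidates:
        return None
    _, s, v, o = max(candidates)
    return f"The {s} {v} the {o}"


scene = {"CAR", "ROAD", "PERSON"}
rels = {("CAR", "ROAD"): "on", ("PERSON", "ROAD"): "beside"}
print(best_frame(scene, rels))
```

Here the person is beside the road rather than on it, so only the car template survives and the sketch reproduces the frame shown on the slide.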
Refined Characterization
We need to know that the white stick is a cigarette…
and where the man’s mouth is…
in order to determine that he’s smoking.
Refined Object Characterization
Set of “keypoint” landmarks
Outline shape defined by the connecting contour
[Heitz et al., NIPS 2008b, IJCV in submission]
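The representation on this slide, a set of keypoint landmarks whose connecting contour gives the outline, can be sketched as an ordered list of named points closed into a polygon. The `LandmarkShape` class and the landmark names are hypothetical, and joining landmarks with straight segments is a simplifying assumption; the cited work localizes these landmarks with a learned model, which is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class LandmarkShape:
    """Hypothetical container: named keypoint landmarks listed in contour order."""
    names: List[str]
    points: List[Point]

    def contour(self) -> List[Tuple[Point, Point]]:
        """Closed outline: join each landmark to the next, wrapping around to the first."""
        n = len(self.points)
        return [(self.points[i], self.points[(i + 1) % n]) for i in range(n)]

    def area(self) -> float:
        """Area enclosed by the straight-segment contour (shoelace formula)."""
        s = 0.0
        for (x0, y0), (x1, y1) in self.contour():
            s += x0 * y1 - x1 * y0
        return abs(s) / 2.0

    def landmark(self, name: str) -> Point:
        """Look up a landmark (e.g. the head) by name."""
        return self.points[self.names.index(name)]


shape = LandmarkShape(["head", "back", "tail", "feet"],
                      [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)])
print(shape.landmark("head"), shape.area())
```

Named landmarks are what later steps rely on: once the head is localized, the region around it can be inspected to recognize an activity.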
Results
Giraffe
Llama
Rhino
Mammals
[Heitz et al., NIPS 2008b, IJCV in submission]
Eating
Standing
Running
Standing
Activity Recognition
Eating
Drinking
1) Localize the landmarks of the cow, including the head.
2) Extract a histogram of “stuff” in a window around the head landmark.
Grass
Cow
3) Make a decision.
Eating
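The three-step recipe can be sketched as follows, assuming a per-pixel “stuff” label map is already available and the head landmark has already been localized (step 1). The window radius, the label names, and the threshold rule in `decide` are hypothetical; in particular, mapping water near the head to drinking is our illustrative assumption, and a trained classifier would replace the hand-written rule.

```python
from collections import Counter
from typing import Dict, List, Tuple


def stuff_histogram(label_map: List[List[str]], head: Tuple[int, int],
                    radius: int) -> Dict[str, float]:
    """Step 2: normalized histogram of region labels in a square window around the head."""
    row, col = head
    counts = Counter()
    for r in range(max(0, row - radius), min(len(label_map), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(label_map[r]), col + radius + 1)):
            counts[label_map[r][c]] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def decide(hist: Dict[str, float], threshold: float = 0.5) -> str:
    """Step 3: hypothetical decision rule standing in for a trained classifier."""
    if hist.get("grass", 0.0) > threshold:
        return "eating"
    if hist.get("water", 0.0) > threshold:
        return "drinking"
    return "unknown"


label_map = [
    ["grass", "grass", "cow"],
    ["grass", "grass", "cow"],
    ["grass", "grass", "grass"],
]
h = stuff_histogram(label_map, head=(1, 1), radius=1)  # head from step 1 (assumed given)
print(h, decide(h))
```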
Activity Recognition with People
Running
Walking
Standing
Hitting
The person’s pose is one of the important factors.
We also need to recognize the objects the person interacts with.
How far can we take this?
Front legs off ground = Jumping
Ball near hands = Throwing
Apple near mouth = Eating
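The three rules above can be written directly as predicates over landmark and object positions. This sketch assumes landmarks and objects have already been localized as 2D points; the landmark names (`front_left_foot`, `hand`, `mouth`), the `ground_y` row, the distance threshold, and the order in which rules are checked are all hypothetical choices for illustration.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (x, y); image rows grow downward


def near(p: Point, q: Point, dist: float) -> bool:
    """True if two points are within `dist` pixels of each other."""
    return math.dist(p, q) <= dist


def infer_activity(landmarks: Dict[str, Point], objects: Dict[str, Point],
                   ground_y: float, near_dist: float = 10.0) -> Optional[str]:
    """Apply the slide's rules in a fixed (hypothetical) priority order."""
    # Front legs off ground = Jumping: both front feet clearly above the ground row.
    feet = [landmarks[k] for k in ("front_left_foot", "front_right_foot") if k in landmarks]
    if feet and all(y < ground_y - near_dist for _, y in feet):
        return "jumping"
    # Ball near hands = Throwing.
    if "ball" in objects and "hand" in landmarks and near(objects["ball"], landmarks["hand"], near_dist):
        return "throwing"
    # Apple near mouth = Eating.
    if "apple" in objects and "mouth" in landmarks and near(objects["apple"], landmarks["mouth"], near_dist):
        return "eating"
    return None


print(infer_activity({"mouth": (50.0, 20.0)}, {"apple": (53.0, 24.0)}, ground_y=100.0))
```

Hard-coded rules like these break down quickly as activities multiply, which is part of the motivation for learning the frames rather than writing them by hand.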
Does phased learning help?
Cartoon/Caricature
Exaggerates the most salient features of the object class.
Simple BG
Real object with no confusing clutter.
Cluttered BG
Object in standard pose on natural background.
Articulated
Once we have built a strong appearance model, can we learn complicated articulations?
Our Related Papers
G. Elidan, B. Packer, G. Heitz, and D. Koller. Convex Point Estimation using Undirected Bayesian Transfer Hierarchies. UAI, 2008.
S. Gould, J. Rodgers, D. Cohen, G. Elidan, and D. Koller. Multi-Class Segmentation with Relative Location Prior. IJCV, 2008.
S. Gould, P. Baumstarck, M. Quigley, A. Ng, and D. Koller. Integrating Visual and Range Data for Robotic Object Detection. ECCV Workshop M2SFA2, 2008.
G. Heitz and D. Koller. Learning Spatial Context: Using Stuff to Find Things. ECCV, 2008.
G. Heitz, S. Gould, A. Saxena, and D. Koller. Cascaded Classification Models: Combining Models for Holistic Scene Understanding. NIPS, 2008.
G. Heitz, G. Elidan, B. Packer, and D. Koller. Shape-based Object Localization for Descriptive Classification. NIPS, 2008.