Representation, Segmentation and Matching of 3D Visual Shapes ...
Representation and Detection of Shapes in Images
Transcript of Representation and Detection of Shapes in Images
![Page 1: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/1.jpg)
Representation and Detection
of Shapes in Images
Pedro F. Felzenszwalb
Department of Computer Science
University of Chicago
![Page 2: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/2.jpg)
Introduction
Study of shape is a recurring theme in computer vision.
• It is important for object recognition.
• Useful for model-based segmentation.
1
![Page 3: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/3.jpg)
Talk outline
1. Representation of objects using triangulated polygons.
2. Finding a non-rigid object in an image.
3. Learning a non-rigid shape model from examples.
4. Shape grammar for modeling generic objects.
2
![Page 4: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/4.jpg)
Triangulated polygon representation
• Consider two-dimensional objects with piecewise-smooth
boundaries and no holes.
• Approximate object using a simple polygon P .
• A triangulation is a decomposition of P into triangles defined
by non-crossing line segments connecting vertices of P .
object polygon triangulation
3
![Page 5: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/5.jpg)
Constrained Delauney triangulation
Natural decomposition of object into parts, closely related to the
medial axis transform.
Definition.The constrained Delauney triangulation contains the
edge ab if a is visible to b and there is a circle through a and b
that contains no vertex c visible to ab.
4
![Page 6: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/6.jpg)
Structural properties
There are two graphs associated with a triangulated polygon.
Dual graph of triangulated simple polygon is a tree.
Graphical structure of triangulation is a 2-tree.
5
![Page 7: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/7.jpg)
2-trees
A 2-tree is a graph defined by a set of “triangles” (3-cliques)
connected along edges in a tree structure.
Every 2-tree admits a perfect elimination order :
After eliminating the first i
vertices, the next one is in a
single triangle.
4
6 11
9
0
5
2
1
3
78
10
6
![Page 8: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/8.jpg)
Two-dimensional shape
How does a triangulation help describe the shape of a polygon?
We say to objects have the same shape if they are related by a
similarity transformation (translation, rotation, scale change).
Different objects with the same shape.
7
![Page 9: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/9.jpg)
Shape of triangulated polygons
Say we have an object defined by the location of n vertices V ,
and G = (V, E) is a 2-tree.
We can pick any shape for each “triangle” in G and obtain a
unique shape for the object.
1
2
0
3
0 1
2
2 1
3+
0 1
2
3=
⇒ Shape of object is a point in M1 × · · · × Mn−2, where each M
is a space of triangle shapes.
8
![Page 10: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/10.jpg)
Deforming triangulated polygons
(x1, . . . , xi, . . . , xn−2) → (x1, . . . , x′i, . . . , xn−2)
The rabbit ear can be bent by changing
the shape of a single triangle.
9
![Page 11: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/11.jpg)
Finding non-rigid objects in images
10
![Page 12: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/12.jpg)
Deformable template matching
Find “optimal” map from a template to the image.
f : →
Quality of f depends on
• how much the template is deformed.
• correlation between the deformed template and image data.
11
![Page 13: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/13.jpg)
Prior work
Most methods are based on local search techniques and depend
on initialization near the right answer.
[Grenander et al.] Deformable boundary models.
[Widrow] Rubber masks.
[Cootes et al.] Active shape and appearance models.
Few methods are based on global optimization.
[Amit, Kong] Sparse landmarks.
[Coughlan et al.] Open curves.
12
![Page 14: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/14.jpg)
Major challenges
• Represent both the boundary and the interior of objects.
• Capture natural shape deformations.
• Efficient matching algorithms:
– Search for the optimal deformation - global minimum of
cost function.
– Initialization-free, invariant to rigid motions and scale.
13
![Page 15: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/15.jpg)
Matching triangulated polygons
• Let T be a triangulation of a simple polygon P .
• Consider continuous maps f :P →R2
that are affine when restricted to each triangle.
– f takes triangles in the model to triangles in the image.
– f is defined by where it sends the vertices of P .
• Quality of f is given by a sum of costs per triangle,
C(f, I) =∑
t∈T
Ct(ft, I)
14
![Page 16: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/16.jpg)
Example cost function
• Deformation cost for each triangle.
• Shape boundary is attracted to high gradient areas.
C(f, I) =∑
t∈T
def(ft) − λ∫∂P
‖(∇I ◦ f)(s) × f ′(s)‖
‖f ′(s)‖ds
def(ft) measures how far ft is from a similarity transformation.
15
![Page 17: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/17.jpg)
Combinatorial optimization
• Restrict f(vi) to be a location li in a grid G.
• Dynamic programming algorithm using elimination order.
• Running time is O(n|G|3), where n is number of vertices.
At step i, find optimal location for
vi as a function of locations of two
other vertices.
vi
ba
16
![Page 18: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/18.jpg)
Matching results
17
![Page 19: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/19.jpg)
Matching results
18
![Page 20: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/20.jpg)
Matching results
Nosy images.
Multiple instances.
19
![Page 21: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/21.jpg)
Contrast with local search method
Initialization:
Result:
20
![Page 22: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/22.jpg)
Learning models
Given multiple examples of an object,
• Pick a common triangulation.
• Learn shape model for each triangle (mean and variance).
a b c
21
![Page 23: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/23.jpg)
Procrustes analysis
Given sample object configurations {X1, . . . , Xm}.
Assume each configuration comes from a mean by perturbation
ǫ and similarity transformation g,
Xj = gj(µ + ǫj)
Procrustes mean shape:
µ̂ = argmin||µ||=1
mingj
m∑j=1
||µ − gjXj||2
22
![Page 24: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/24.jpg)
Learning triangulated models
• Use Procrustes analysis for each possible triangle.
– Local instead of global rigidity assumption.
• Select triangulation that can best represent examples.
– There is a cost associated with each triangle.
(corresponding to how rigid it is)
– Can use dynamic programming to select optimal one.
23
![Page 25: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/25.jpg)
Local versus global rigidity assumption
a b
Procrustes mean Triangulated model
24
![Page 26: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/26.jpg)
Hands
A few of a total of 40 samples of hands from multiple people.
25
![Page 27: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/27.jpg)
Typical deformations of learned model
Random samples from the prior model for hands.
26
![Page 28: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/28.jpg)
Shape grammar
Finding objects in images without using specific models.
• Build a generic shape model to capture
properties of “typical” objects.
• Gestalt laws: continuity, smoothness,
closure, symmetry, etc.
27
![Page 29: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/29.jpg)
Shape grammar
• Define a stochastic growth process that generates triangu-
lated polygons using a context free grammar.
• The grammar can generate any triangulated polygon, but it
tends to generate shapes with certain properties.
– Gives a generic model for objects.
– Captures which are good interpretations of a scene.
28
![Page 30: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/30.jpg)
Shape tokens
t0 t1 t2
0
0
01
1
1 1
112
• t0 corresponds to ends of branches.
• sequences of t1 correspond to branches.
• t2 connects multiple branches together.
29
![Page 31: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/31.jpg)
Growing a shape
• A root triangle of type i is selected with probability pi.
• Each dotted edge “grows” into a new triangle.
• Repeat until there are no more dotted edges.
• For each triangle type there is a distribution over its shape.
30
![Page 32: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/32.jpg)
Structure
• Growth process always terminates when p2 < p0.
• Expected number of triangles is E[n] = 2/(p0 − p2).
• Expected number of branching points is E[j] = 2p2/(p0−p2).
• Together E[n] and E[j] define p0, p1 and p2.
• Typically p1 ≫ p0, p2.
31
![Page 33: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/33.jpg)
Geometry
If t1 is skinny and isosceles,
• shapes have smooth boundaries almost everywhere.
• each branch tends to have axial symmetry.
32
![Page 34: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/34.jpg)
Random shapes
33
![Page 35: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/35.jpg)
Finding objects in images
• Grammar generates random objects, without taking into ac-
count image data.
• Look for triangulated polygons that align with image features
and would likely be generated by the grammar.
– These are good hypotheses for objects in the scene.
• This process generates possible interpretations of a scene
and a separate process can verify each one.
34
![Page 36: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/36.jpg)
Example results
35
![Page 37: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/37.jpg)
Example results
36
![Page 38: Representation and Detection of Shapes in Images](https://reader033.fdocuments.in/reader033/viewer/2022051714/587c97ea1a28ab6b028bd481/html5/thumbnails/38.jpg)
Summary
• Representation of objects using triangulated polygons.
– Detecting deformable objects.
– Learning deformable shape models from examples.
– Detecting generic objects.
• Future work:
– Richer grammars, with intermediate structure.
– Shape classification.
37