06 robot vision


Transcript of 06 robot vision

Page 1: 06 robot vision

Robot Vision

Chapter 6.

Page 2: 06 robot vision

Introduction

Computer vision

– Endowing machines with the means to "see"

Creating an image of a scene and extracting features from it

– A very difficult problem for machines

Several different scenes can produce identical images.

Images can be noisy.

The image cannot be directly "inverted" to reconstruct the scene.

Page 3: 06 robot vision

Human Vision (1)

Page 4: 06 robot vision

Human Vision (2)

Page 5: 06 robot vision

Human Vision (3)

Page 6: 06 robot vision

Steering an Automobile

ALVINN system [Pomerleau 1991, 1993]

– Uses an artificial neural network

Used a 30×32 TV image as input (960 input nodes)

5 hidden nodes

30 output nodes

– Training regime: trained "on-the-fly"

A human driver drives the car, and his actual steering angles are taken as the correct labels for the corresponding inputs.

Shifted and rotated copies of the images were also used for training.

– ALVINN has driven for 120 consecutive kilometers at speeds of up to 100 km/h.
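Below is a minimal NumPy sketch of an ALVINN-style forward pass, just to make the 960-5-30 architecture concrete. The weights, activation functions, and steering decoding are placeholders, not Pomerleau's trained network.

```python
import numpy as np

def alvinn_forward(image, W1, b1, W2, b2):
    """One forward pass of an ALVINN-style network: a 30x32 image in,
    30 steering-direction activations out."""
    x = image.reshape(960)        # flatten the 30x32 input retina (960 inputs)
    h = np.tanh(W1 @ x + b1)      # 5 hidden units (activation choice assumed)
    return np.tanh(W2 @ h + b2)   # 30 output units, one per steering direction

# Random placeholder weights, only to exercise the code.
rng = np.random.default_rng(0)
W1, b1 = 0.01 * rng.normal(size=(5, 960)), np.zeros(5)
W2, b2 = 0.01 * rng.normal(size=(30, 5)), np.zeros(30)

frame = rng.random((30, 32))      # stand-in for one TV frame
steering = alvinn_forward(frame, W1, b1, W2, b2)
print("steering bin:", int(np.argmax(steering)))
```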

Page 7: 06 robot vision

Steering an Automobile – ALVINN

Page 8: 06 robot vision

Two stages of Robot Vision (1)

Finding objects in the scene

– Looking for "edges" in the image. Edge: a part of the image across which the image intensity or some other property of the image changes abruptly.

– Attempting to segment the image into regions. Region: a part of the image in which the image intensity or some other property of the image changes only gradually.

Page 9: 06 robot vision

Two stages of Robot Vision (2)

Image processing stage

– Transforms the original image into one that is more amenable to the scene analysis stage.

– Involves various filtering operations that help reduce noise, accentuate edges, and find regions.

Scene analysis stage

– Attempts to create an iconic or a feature-based description of the original scene, providing the task-specific information.

Page 10: 06 robot vision

Two stages of Robot Vision (3)

The scene analysis stage produces task-specific information.

– If only the disposition of the blocks is important, an appropriate iconic model can be (C B A FLOOR).

– If it is important to determine whether there is another block on top of the block labeled C, an adequate description will include the value of a feature, CLEAR_C.

Page 11: 06 robot vision

Averaging (1)

The original image can be represented as an m×n array of numbers. The numbers represent the light intensities at corresponding points in the image.

Certain irregularities in the image can be smoothed by an averaging operation.

The averaging operation involves sliding an averaging window all over the image array.

Page 12: 06 robot vision

Averaging (2)

The smoothing operation thickens broad lines and eliminates thin lines and small details.

The averaging window is centered at each pixel, and the weighted sum of all the pixel numbers within the averaging window is computed. This sum then replaces the original value at that pixel.
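As an illustration, here is a minimal NumPy version of the sliding-window average just described, using equal weights (a Gaussian-weighted window, as on the next slide, is the other common choice). The border handling is one assumed policy among several possible.

```python
import numpy as np

def smooth(image, k=3):
    """Slide a k-by-k averaging window over the image; each pixel is
    replaced by the mean of the window centered on it."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")   # replicate borders (assumed policy)
    out = np.empty(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

rng = np.random.default_rng(0)
img = np.ones((5, 5)) + rng.normal(0, 0.5, (5, 5))  # a flat patch plus noise
print(smooth(img))   # the smoothed values cluster much closer to 1
```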

Page 13: 06 robot vision

Averaging (3)

A common function used for smoothing is a Gaussian of two dimensions.

Convolving an image with a Gaussian is equivalent to finding the solution to a diffusion equation when the initial condition is given by the image intensity field.
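For reference, the two-dimensional Gaussian in question, with σ controlling the amount of smoothing:

```latex
G(x, y) \;=\; \frac{1}{2\pi\sigma^2}\,
\exp\!\left(-\,\frac{x^2 + y^2}{2\sigma^2}\right)
```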

Page 14: 06 robot vision


Averaging (4)

Page 15: 06 robot vision

Edge enhancement (1)

Edge: any boundary between parts of the image with markedly different values of some property.

Edges are often related to important object properties.

Edges in the image occur at places where the second derivative of the image intensity crosses zero.
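A one-dimensional sketch of that statement: across a step edge, the discrete second derivative of the intensity swings from positive to negative, and the edge sits at the zero crossing. The intensity profile below is made up for illustration.

```python
import numpy as np

profile = np.array([10, 10, 20, 40, 50, 50], dtype=float)  # a ramp-like step edge
second = np.diff(profile, n=2)                  # discrete second derivative
crossings = np.where(np.diff(np.sign(second)) != 0)[0]
print(second)     # [ 10.  10. -10. -10.]
print(crossings)  # the zero crossing marks the middle of the edge
```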

Page 16: 06 robot vision


Edge enhancement (2)

Page 17: 06 robot vision

Combining Edge Enhancement with Averaging (1)

Edge enhancement alone would tend to emphasize noise elements along with enhancing edges.

To be less sensitive to noise, both operations are needed (first averaging, then edge enhancement).

We can convolve a one-dimensional image with the second derivative of a Gaussian curve to combine both operations in a single filter.

Page 18: 06 robot vision

Combining Edge Enhancement with Averaging (2)

The Laplacian is a second-derivative-type operation that enhances edges of any orientation.

The Laplacian of the two-dimensional Gaussian function looks like an upside-down hat, often called a sombrero function.

The entire averaging/edge-finding operation can be achieved by convolving the image with the sombrero function (called Laplacian filtering).
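In closed form, the Laplacian of the two-dimensional Gaussian given earlier (the sombrero, up to sign convention) is:

```latex
\nabla^2 G(x, y) \;=\; \frac{x^2 + y^2 - 2\sigma^2}{2\pi\sigma^6}\,
\exp\!\left(-\,\frac{x^2 + y^2}{2\sigma^2}\right)
```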

Page 19: 06 robot vision

6.4.4 Finding Regions

Another method for processing an image is to find "regions".

Finding regions is the complement of finding outlines: a region's boundary is an outline, and an outline encloses a region.

Page 20: 06 robot vision

A region of the image

A region is homogeneous. Either of the following can serve as the homogeneity property:

– The difference in intensity values of pixels in the region is no more than some threshold ε.

– A polynomial surface of degree k can be fitted to the intensity values of pixels in the region with maximum error less than ε.

For no two adjacent regions is it the case that the union of all the pixels in these two regions satisfies the homogeneity property.

Each region corresponds to a world object or a meaningful part of one.

Page 21: 06 robot vision

Split-and-merge method

1. The algorithm begins with just one candidate region, the whole image.

2. Until no more splits need be made: each candidate region that does not satisfy the homogeneity property is split into four equal-sized candidate regions.

3. Adjacent candidate regions are merged if the union of their pixels satisfies the homogeneity property.
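A minimal Python sketch of the split phase (steps 1 and 2), assuming a square power-of-two image and a max-minus-min homogeneity test with an assumed threshold; a real implementation would follow with the merge pass of step 3.

```python
import numpy as np

def homogeneous(block, eps=10.0):
    """Homogeneity test from the previous slide: the spread of intensity
    values in the region is at most some threshold eps (value assumed)."""
    return block.max() - block.min() <= eps

def split(image, x, y, size, regions):
    """Step 2: recursively split non-homogeneous regions into quadrants."""
    block = image[y:y + size, x:x + size]
    if size == 1 or homogeneous(block):
        regions.append((x, y, size))          # record a candidate region
        return
    half = size // 2
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        split(image, x + dx, y + dy, half, regions)

image = np.zeros((8, 8))
image[2:6, 2:6] = 100.0                       # a bright square on a dark field
regions = []
split(image, 0, 0, 8, regions)                # step 1: start with the whole image
print(regions)                                # quadtree leaves; merging comes next
```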

Page 22: 06 robot vision


Page 23: 06 robot vision

Regions Found by Split-and-Merge for a Grid-World Scene (from Fig. 6.12)

Page 24: 06 robot vision

"Cleaning up" the regions found by the split-and-merge method

Eliminating very small regions (some of which are transitions between larger regions).

Straightening bounding lines.

Taking into account the known shapes of objects likely to be in the scene.

Page 25: 06 robot vision

6.4.5 Using Image Attributes Other Than Intensity

Image attributes other than intensity homogeneity

Visual texture

Fine-grained variation of the surface reflectivity of objects

Ex) a field of grass, a section of carpet, foliage in a tree, the fur of animals

These reflectivity variations in objects cause similar fine-grained structure in the image intensity.

Page 26: 06 robot vision

Methods for analyzing texture

Structural methods

– Represent regions in the image by a tessellation (a mosaic-like pattern) of primitive "texels," small shapes comprising black and white parts.

Statistical methods

– Based on the idea that image texture is best described by a probability distribution for the intensity values over regions of the image.

– Ex) for an image of a grassy field in which the blades of grass are oriented vertically: a probability distribution that peaks for thin, vertically oriented regions of high intensity, separated by regions of low intensity.
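As a sketch of the statistical idea, the code below compares two patches by their normalized intensity histograms. This captures first-order statistics only; detecting oriented structure like vertical grass blades would need second-order (joint or directional) statistics.

```python
import numpy as np

def intensity_histogram(region, bins=16):
    """Normalized histogram of intensity values over a region."""
    hist, _ = np.histogram(region, bins=bins, range=(0, 256), density=True)
    return hist

rng = np.random.default_rng(1)
grass = rng.normal(100, 40, (32, 32)).clip(0, 255)  # high-variance texture patch
sky = rng.normal(200, 5, (32, 32)).clip(0, 255)     # nearly uniform patch
# A large histogram distance suggests two different textures.
print(np.abs(intensity_histogram(grass) - intensity_histogram(sky)).sum())
```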

Page 27: 06 robot vision

Other attributes

If we had a direct way to measure the range from the camera to objects in the scene, we could produce a "range image" and look for abrupt range differences.

– Range image: each pixel value represents the distance from the corresponding point in the scene to the camera.

Motion and color are other usable attributes.

Page 28: 06 robot vision

6.5 Scene Analysis (1)

Scene analysis

– Extracting from the image the needed information about the scene

– Requires either additional images (for stereo vision) or general information about the kinds of scenes, since the scene-to-image transformation is many-to-one.

The required knowledge can be

– very general or quite specific

– explicit or implicit

Page 29: 06 robot vision

6.5 Scene Analysis (2)

Knowledge of surface reflectivity characteristics and of the shading of intensity in the image gives information about the shape of smooth objects in the scene.

Iconic scene analysis

– Builds a model of the scene or parts of the scene

Feature-based scene analysis

– Extracts the features of the scene needed by the task

– Also called task-oriented or purposive vision

Page 30: 06 robot vision

6.5.1 Interpreting Lines and Curves in the Image

Interpreting the line drawing

– Association between scene properties and the components of a line drawing

Trihedral vertex polyhedra

The scene is assumed to contain only planar surfaces, arranged such that no more than three surfaces intersect at a point.

Page 31: 06 robot vision

Three kinds of edges in Trihedral vertex polyhedra (1/2)

There are only three ways in which two planes can intersect in a scene edge.

– Occlude

One kind of edge is formed by two planes, with one of them occluding the other.

Labeled in Fig. 6.15 with arrows (→).

The arrowhead points along the edge such that the surface doing the occluding is to the right of the arrow.

Page 32: 06 robot vision

Three kinds of edges in Trihedral vertex polyhedra (2/2)

– Blade

Two planes can intersect such that both planes are visible in the scene.

The two surfaces form a convex edge.

Labeled with pluses (+).

– Fold

The edge is concave.

Labeled with minuses (−).

Page 33: 06 robot vision


Labels for Lines at Junctions

Page 34: 06 robot vision

Line-labeling scene analysis (1/2)

1. Label all of the junctions in the image as V, W, Y, or T junctions, according to the shapes of the junctions in the image.

Page 35: 06 robot vision

Line-labeling scene analysis (2/2)

2. Assign +, −, or arrow (→) labels to the lines in the image.

An image line that connects two junctions must have a consistent labeling.

If there is no consistent labeling, either

there must have been some error in converting the image into a line drawing, or

the scene must not have been one of trihedral polyhedra.

This is a constraint satisfaction problem.
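The constraint satisfaction view can be made concrete with a small backtracking search. The catalog of physically legal junction labelings (the Huffman-Clowes tables for V, W, Y, and T junctions) is assumed to be supplied in `legal`; the label alphabet below, with '>' and '<' standing for the two occlusion directions, is just one possible encoding.

```python
def label_lines(lines, junctions, legal):
    """Backtracking search for a consistent line labeling.
    lines     : list of line ids
    junctions : {junction id: (junction type, [ids of lines meeting there])}
    legal     : {junction type: set of allowed label tuples} -- assumed given
    Labels: '+' convex, '-' concave, '>'/'<' the two occlusion directions."""

    def consistent(assign):
        for jtype, jlines in junctions.values():
            labels = tuple(assign.get(line) for line in jlines)
            if None in labels:
                continue                      # junction not fully labeled yet
            if labels not in legal[jtype]:
                return False                  # violates the junction catalog
        return True

    def search(assign, rest):
        if not rest:
            return assign                     # every line labeled consistently
        line = rest[0]
        for label in ('+', '-', '>', '<'):
            assign[line] = label
            if consistent(assign):
                solution = search(assign, rest[1:])
                if solution is not None:
                    return solution
            del assign[line]
        return None   # failure: bad line drawing, or a non-trihedral scene

    return search({}, list(lines))
```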

Page 36: 06 robot vision

6.5.2 Model-Based Vision (1/2)

If we knew that the scene contained a parallelepiped (as in Figure 6.15), we could attempt to fit a projection of a parallelepiped to components of an image of this scene.

Generalized cylinders as building blocks for model construction

Each cylinder has 9 parameters.

Page 37: 06 robot vision

Model-Based Vision (2/2)

An example: rough scene reconstruction of a human figure

– Hierarchical representation

– Each cylinder in the model can be articulated into a set of smaller cylinders

Page 38: 06 robot vision

6.6 Stereo Vision and Depth Information

Depth information can be obtained using stereo vision, which is based on triangulation calculations using two (or more) images.

Some depth information can be extracted from a single image.

– The analysis of texture in the image can indicate that some elements in the scene are closer than others.

– More precise depth information: if we know that a perceived object is on the floor and we know the camera's height above the floor, we can calculate the distance to the object (the next slide works this out).

Page 39: 06 robot vision


Depth Calculation from a Single Image
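The figure itself is not reproduced in this transcript, but the calculation it illustrates is short. Assuming the camera is at height h above the floor and the line of sight to the point where the object meets the floor is depressed by an angle θ below the horizontal, the horizontal distance to the object is:

```latex
d \;=\; \frac{h}{\tan\theta}
```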

Page 40: 06 robot vision

Stereo Vision

Stereo vision uses triangulation.

Two lenses whose centers are separated by a baseline, b.

The image points of a scene point at distance d are created by these lenses.

The angles of these image points from the lens centers are α and β.

The optical axes are parallel, the image planes are coplanar, and the scene point is in the same plane as that formed by the two parallel optical axes.

Page 41: 06 robot vision


Triangulation in Stereo Vision
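The figure is not reproduced here. With the symbols from the previous slide (the angle names α and β were lost in transcription and are reconstructed), and assuming the angles are measured from each optical axis toward the scene point, the triangulation works out to:

```latex
d \;=\; \frac{b}{\tan\alpha \,+\, \tan\beta}
```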

Page 42: 06 robot vision

The main complication

In scenes containing more than one point, it must be established which pair of points in the two images corresponds to the same scene point.

For a pixel in one image, we must be able to identify the corresponding pixel in the other image: the correspondence problem.

Page 43: 06 robot vision

Techniques for the correspondence problem

Geometric analysis reveals that we need only search along one dimension (the epipolar line).

One-dimensional searches can be implemented by cross-correlation of the two image intensity profiles along corresponding epipolar lines.

We do not have to find correspondences between individual pairs of image points; we can instead do so between pairs of larger image components, such as lines.
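A minimal sketch of the one-dimensional search along corresponding epipolar lines, using raw cross-correlation of intensity windows; real systems add normalization, a principled search range, and subpixel refinement.

```python
import numpy as np

def best_disparity(left_row, right_row, window=9, max_d=32):
    """For each pixel in the left scanline, find the offset d whose
    window in the right scanline correlates best (the correspondence)."""
    half = window // 2
    disparities = np.zeros(len(left_row), dtype=int)
    for i in range(half, len(left_row) - half):
        patch = left_row[i - half:i + half + 1]
        best, best_score = 0, -np.inf
        for d in range(min(max_d, i - half + 1)):    # assumed search range
            cand = right_row[i - d - half:i - d + half + 1]
            score = float(patch @ cand)              # raw cross-correlation
            if score > best_score:
                best, best_score = d, score
        disparities[i] = best
    return disparities

rng = np.random.default_rng(2)
left = rng.random(200)
right = np.roll(left, -4)                 # simulate a uniform 4-pixel disparity
d = best_disparity(left, right)
print(np.bincount(d[20:180]).argmax())    # mode of recovered disparities: 4
```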

Page 44: 06 robot vision

Assignments

Pages 111–112

– Ex. 6.2, Ex. 6.3