
Image Acquisition

http://homepages.inf.ed.ac.uk/rbf/IAPR/researchers/D2PAGES/d2tut.htm


The first stage of any vision system is the image acquisition stage.

After the image has been obtained, various methods of processing can be applied to the image to perform the many different vision tasks required today.

However, if the image has not been acquired satisfactorily then the intended tasks may not be achievable, even with the aid of some form of image enhancement.

2D Image Input

The basic two-dimensional image is a monochrome (greyscale) image which has been digitised.

An image can be described as a two-dimensional light intensity function f(x,y), where x and y are spatial coordinates and the value of f at any point (x, y) is proportional to the brightness or grey value of the image at that point.

A digitised image is one where:

spatial and greyscale values have been made discrete;

intensity is measured across a regularly spaced grid in the x and y directions;

intensities are sampled to 8 bits (256 values).

For computational purposes, we may think of a digital image as a two-dimensional array where x and y index an image point. Each element in the array is called a pixel (picture element). See Figs. 1 and 2.
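
As a minimal sketch of this array view (Python/NumPy is used here purely for illustration; the array size and grey values are made up):

    import numpy as np

    # A digital image as a two-dimensional array of 8-bit grey values (0-255).
    # The values are arbitrary, purely for illustration.
    image = np.array([[ 12,  40,  98, 200],
                      [ 15,  60, 120, 210],
                      [ 20,  80, 150, 230],
                      [ 25, 100, 180, 255]], dtype=np.uint8)

    rows, cols = image.shape     # extent of the sampling grid in y and x
    pixel = image[2, 1]          # grey value of the pixel at row 2, column 1
    print(rows, cols, pixel)     # -> 4 4 80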

Fig.1 Greyscale image and highlighted region

Fig.2 Pixel values in highlighted region

2D Input Devices

TV Camera or Vidicon Tube

A first choice for a two-dimensional image input device may be a television camera -- output is a video signal:

Image focused onto a photoconductive target.

Target scanned line by line horizontally by an electron beam

An electric current is produced as the beam passes over the target.

Current proportional to the intensity of light at each point.

Tap current to give a video signal.

This form of device has several disadvantages.

Limited resolution

-- finite number of scan lines (about 625) and frame rate (30 or 60 frames per second)

Distortion

-- unwanted persistence between one frame and the next

Non-linear video output with respect to light intensity.

Non-flat target on tube.

CCD Camera

By far the most popular two-dimensional imaging device is the charge-coupled device (CCD) camera.

Single IC device

Consists of an array of photosensitive cells

each cell produces an electric current dependent on the incident light falling on it.

Video Signal Output

Less geometric distortion

More linear video output.

Frame Stores

Video Signal must be digitised.

A device known as a frame store or frame grabber usually performs this task. It:

Digitises the incoming video signal

Samples signal into discrete pixels at appropriate intervals -- line by line.

Quantises each sample to an 8-bit digital value.

Stores the sampled frame in its own memory.

Frame easily transferred to computer memory or a file.
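
A minimal sketch of the sampling and quantisation performed by a frame grabber (the function name, the synthetic "video line" and the pixel count are assumptions for illustration, not a real device interface):

    import numpy as np

    def grab_line(video_line, n_pixels):
        """Sample a continuous line signal (a function of position 0..1)
        at n_pixels evenly spaced points and quantise each sample to 8 bits."""
        positions = np.linspace(0.0, 1.0, n_pixels)
        samples = np.array([video_line(p) for p in positions])   # sampling
        samples = np.clip(samples, 0.0, 1.0)
        return np.round(samples * 255).astype(np.uint8)          # quantisation

    # Example: a synthetic "video line" whose brightness varies sinusoidally.
    line = grab_line(lambda p: 0.5 + 0.5 * np.sin(2 * np.pi * p), n_pixels=8)
    print(line)   # eight 8-bit pixel values for this line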

3D imaging

The 3D Image -- Depth Maps

The simplest and most convenient way of representing and storing the depth measurements taken from a scene is a depth map.

A depth map is a two-dimensional array where the x and y distance information corresponds to the rows and columns of the array as in an ordinary image, and the corresponding depth readings (z values) are stored in the array's elements (pixels).

A depth map is like a greyscale image except that the z information (e.g. a 32-bit floating point value) replaces the intensity information.
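
A minimal sketch of this representation (the array size and depth readings are made up for illustration):

    import numpy as np

    # A depth map stored the same way as a greyscale image, except that each
    # element holds a depth (z) reading as a 32-bit float rather than an
    # 8-bit grey value. The values below are arbitrary.
    depth_map = np.array([[1.52, 1.51, 1.49, 1.48],
                          [1.50, 1.20, 1.19, 1.47],
                          [1.49, 1.18, 1.17, 1.46],
                          [1.48, 1.47, 1.45, 1.44]], dtype=np.float32)

    z = depth_map[1, 2]   # depth reading at row 1, column 2 (same indexing as an image)
    print(z)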

Fig.3 Artificial depth maps

Fig.4 Real depth maps

Why use 3D data?

A 3D image has many advantages over its 2D counterpart:

Explicit Geometry

--

2D images give only limited information about the physical shape and size of an object in a scene.

3D images express the geometry in terms of three-dimensional coordinates.

e.g. the size (and shape) of an object in a scene can be straightforwardly computed from its three-dimensional coordinates.

Recent technological advances ( e.g. in camera optics, CCD cameras and laser rangefinders) have made the production of reliable and accurate three-dimensional depth data possible.

Consequently many three-dimensional data acquisition systems have been developed.

Introduction to Stereo Imaging -- Theory

Let us consider a simplified approach to the mathematics of the problem in order to aid understanding of the tasks involved.

We will consider a set-up using two cameras in stereo -- other methods that involve stereo are similar.

Let's consider a simplified optical set up:

Fig.5 A simplified stereo imaging system

Fig.5 shows:

2 cameras with their optical axes parallel and separated by a distance d.

The line connecting the camera lens centres is called the baseline.

Let baseline be perpendicular to the line of sight of the cameras.

Let the x axis of the three-dimensional world coordinate system be parallel to the baseline

let the origin O of this system be mid-way between the lens centres.

Consider a point (x,y,z), in three-dimensional world coordinates, on an object.

Let this point have image coordinates (x_l, y_l) and (x_r, y_r) in the left and right image planes of the respective cameras.

Let f be the focal length of both cameras, the perpendicular distance between the lens centre and the image plane. Then, by similar triangles:

x_l / f = (x + d/2) / z,   x_r / f = (x - d/2) / z,   y_l / f = y_r / f = y / z

Solving for (x,y,z) gives:

x = d (x_l + x_r) / (2 (x_l - x_r)),   y = d (y_l + y_r) / (2 (x_l - x_r)),   z = d f / (x_l - x_r)

The quantity (x_l - x_r) which appears in each of the above equations is called the disparity.
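
As a concrete illustration of these relations (the function name and the numerical values are assumptions, not taken from the notes):

    import numpy as np

    def stereo_to_world(xl, yl, xr, yr, d, f):
        """Recover (x, y, z) world coordinates from matched image coordinates
        (xl, yl) and (xr, yr), camera separation d and focal length f,
        using the similar-triangle relations above."""
        disparity = xl - xr
        if disparity == 0:
            raise ValueError("zero disparity: point is at infinity")
        x = d * (xl + xr) / (2.0 * disparity)
        y = d * (yl + yr) / (2.0 * disparity)
        z = d * f / disparity
        return x, y, z

    # Example with made-up numbers: cameras 100 mm apart, focal length 50 mm.
    print(stereo_to_world(xl=2.0, yl=1.0, xr=1.0, yr=1.0, d=100.0, f=50.0))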

There are several practical problems with this set up:

Depth can be measured accurately for near objects but not for far-away objects. Normally, d and f are fixed, and since distance is inversely proportional to disparity, while disparity can only be measured in pixel differences, depth resolution falls off rapidly with distance.

Disparity is proportional to the camera separation d. This implies that if we have a fixed error in determining the disparity then the accuracy of depth determination will increase with d.

However as the camera separation becomes large difficulties arise in correlating the two camera images.

In order to measure the depth of a point it must be visible to both cameras and we must also be able to identify this point in both images.

As the camera separation increases so do the differences in the scene as recorded by each camera.

Thus it becomes increasingly difficult to match corresponding points in the images.

This problem is known as the stereo correspondence problem.

Methods of Acquisition

Laser Ranging Systems

Laser ranging works on the principle that the surface of the object reflects laser light back towards a receiver which then measures the time (or phase difference) between transmission and reception in order to calculate the depth.
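
As a small worked example of this principle (the pulse timing value is made up for illustration):

    # Time-of-flight depth from a laser rangefinder: the pulse travels to the
    # surface and back, so the depth is half the round-trip distance.
    C = 299_792_458.0           # speed of light in m/s

    def depth_from_time_of_flight(round_trip_seconds):
        return C * round_trip_seconds / 2.0

    print(depth_from_time_of_flight(6.67e-9))   # ~1 metre for a ~6.67 ns round trip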

Most laser rangefinders:

Work at long distances (greater than )

Consequently their depth resolution is inadequate for detailed vision tasks.

Shorter range systems exist but still have an inadequate depth resolution (1cm at best) for most practical industrial vision purposes.

Structured Light Methods

Basic idea:

Project patterns of light (grids, stripes, elliptical patterns etc.) onto an object.

Surface shapes are then deduced from the distortions of the patterns produced on the object's surface.

Knowing relevant camera and projector geometry, depth can be inferred by triangulation.

Many methods have been developed using this approach.

Major advantage -- simple to use.

Low spatial resolution -- patterns become sparser with distance.

Some close range (4cm) sensors exist with good depth resolution (around 0.05mm) but have very narrow field of view and close range of operation.

Moire Fringe Methods

The essence of the method is that a grating is projected onto an object and an image is formed in the plane of some reference grating as shown in Fig.6.

The image then interferes with the reference grating to form Moire fringe contour patterns which appear as dark and light stripes, as demonstrated by Fig.7. Analysis of the patterns then gives accurate descriptions of changes in depth and hence shape.

NOTE: Ambiguities arise in interrogating the fringe patterns.

It is not possible to determine whether adjacent contours are higher or lower in depth.

Resolve by moving one of the gratings and taking multiple Moire images.

Reference grating can also be omitted and its effect can be simulated in software.

Moire fringe methods are capable of producing very accurate depth data (resolution to within about 10 microns) but the methods have certain drawbacks.

Methods are relatively computationally expensive.

Surfaces at a large angle are sometimes unmeasurable -- the fringes become too dense.

Shape from Shading Methods

Methods based on shape from shading employ photometric stereo techniques to produce depth measurements.

Using a single camera, two or more images are taken of an object in a fixed position but under different lighting conditions.

By studying the changes in brightness over a surface and employing constraints in the orientation of surfaces, certain depth information may be calculated.
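
To make this concrete, a minimal photometric-stereo sketch under strong simplifying assumptions (Lambertian surface, three known light directions, no shadows or noise; the light directions and intensities below are made up): for a single pixel, the measured intensities I satisfy I = L n, where L stacks the light-direction vectors and n is the albedo-scaled surface normal, so n can be recovered by solving a small linear system.

    import numpy as np

    # Light directions (approximately unit length), one per image, as rows.
    L = np.array([[0.0,   0.0,   1.0],     # light from directly above
                  [0.707, 0.0,   0.707],   # light from the right
                  [0.0,   0.707, 0.707]])  # light from the front

    I = np.array([0.9, 0.75, 0.6])         # made-up intensities at one pixel

    g = np.linalg.solve(L, I)              # g = albedo * surface normal
    albedo = np.linalg.norm(g)
    normal = g / albedo                    # unit surface normal at this pixel
    print(albedo, normal)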

Methods based on these techniques are not suited for general three-dimensional depth data acquisition:

Methods are sensitively dependent on the illumination and surface reflectance properties of objects present in the scene.

Methods only work well on objects with uniform surface texture.

It is difficult to infer absolute depth, and only surface orientation is easily inferred.

Methods are mostly used when it is desired to extract surface shape information.

Passive Stereoscopic Methods

Stereoscopy as a technique for measuring range by triangulation to selected locations in a scene imaged by two cameras has already been introduced -- further details on general stereo configurations can be found in the books.

The primary computational problem of stereoscopy is to find the correspondence of various points in the two images.

This requires:

Reliable extraction of certain features (such as edges or points) from both images

Matching of corresponding features between images.

Both of these tasks are non-trivial and computationally complex.

Passive stereo may not produce depth maps within a reasonable time.

The depth data produced is typically sparse, since high-level features such as edges are used rather than every pixel.

NOTE:

Problems in finding and accurately locating features in each image can be hard.

Care needed not to introduce errors.

Depth measurements accurate to a few millimetres.

One such passive stereo vision system is TINA developed at Sheffield University.

Active Stereoscopic Methods

This Section describes the active stereoscopic subsystem which provides the three-dimensional data to our system for automatically inspecting mechanical parts.

NOTE: Whilst this Section considers some specific active stereo problems, many of the other issues discussed are not specific to any particular three-dimensional data acquisition technique, and will be of general interest.

The main components of the Vision System are illustrated by the schematic diagram in Fig.8.

The vision system consists of:

a matched pair of high sensitivity CCD cameras,

a laser scanner, all mounted on an optical bench to reduce vibration.

Initially the cameras of the system must be calibrated in order to:

determine their 3D position relative to some world coordinate system,

determine the focal length and lens distortion of each camera (plus lens etc.).

Camera Calibration is described in my book.

Depth maps are extracted from the scene by:

Moving the laser stripe across the scene to obtain a series of vertical columns of pixels

Triangulating pixels to give the required dense depth map. The depth of a point is measured as the distance from one of the cameras, chosen as the master camera.

Knowing the relevant geometry and optical properties of the cameras, the depth map is constructed using the following method:

Fig.9 Measuring a depth value

1. For each vertical stripe of laser light form an image of the stripe in the pair of frames from each camera.

2. For each row in the master camera image, search until the stripe is found at point P(i,j), say.

3. Form a three-dimensional line l passing through the centre of the master camera and P(i,j).

4. Construct the epipolar line, which is the projection of the line l into the image formed by the other camera. Do this by projecting two arbitrary points on l into the image and constructing a line between the two projected points.

5. Search along the epipolar line for the laser stripe. If it is found, at a point Q say, proceed to Step 6.

6. Find the point on line l which corresponds to Q. Calculate the (x,y,z) coordinates of this point, and store the z value at position (i,j) corresponding to x and y in the depth map.

The position of this point is easily found by projecting a line from the centre of the secondary camera passing through Q. The intersection of this line with l gives the coordinates of the point.

The depth map is formed by using a world coordinate system fixed on the master camera, with its origin at the centre of the master camera.
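
The geometric core of Steps 3-6 is intersecting the line l with the line through the secondary camera and Q. A minimal sketch of that intersection step (in practice the two lines rarely meet exactly, so a least-squares closest point is used here; the camera centres and directions are made-up numbers):

    import numpy as np

    def closest_point_between_rays(c1, d1, c2, d2):
        """Return the midpoint of the shortest segment joining the rays
        c1 + t*d1 and c2 + s*d2 (camera centre + viewing direction).
        This least-squares 'intersection' stands in for the intersection
        of l with the line through the secondary camera and Q."""
        d1 = d1 / np.linalg.norm(d1)
        d2 = d2 / np.linalg.norm(d2)
        # Solve for t and s minimising |(c1 + t*d1) - (c2 + s*d2)|^2.
        A = np.array([[d1 @ d1, -d1 @ d2],
                      [d1 @ d2, -d2 @ d2]])
        b = np.array([(c2 - c1) @ d1, (c2 - c1) @ d2])
        t, s = np.linalg.solve(A, b)
        return (c1 + t * d1 + c2 + s * d2) / 2.0

    # Made-up example: master camera at the origin looking towards the stripe
    # point, secondary camera offset along x.
    c1, d1 = np.array([0.0, 0.0, 0.0]), np.array([0.1, 0.0, 1.0])
    c2, d2 = np.array([0.2, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])
    print(closest_point_between_rays(c1, d1, c2, d2))   # (x, y, z) of the surface point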

Fig.10 Depth Map/Image Overlay

Image processing

Image processing is in many cases concerned with taking one array of pixels as input and producing another array of pixels as output which in some way represents an improvement to the original array.

For example, this processing

may remove noise,

improve the contrast of the image,

remove blurring caused by movement of the camera during image acquisition,

correct for geometrical distortions caused by the lens.

We will not be considering every image processing technique in this section.

Many such techniques are dealt with in Professor Batchelor's companion course.

Many books, such as Gonzalez and Woods, are devoted to this subject.

Image processing methods may be broadly divided into

Real space methods -- which work by directly processing the input pixel array.

Fourier space methods -- which work by first deriving a new representation of the input data by performing a Fourier transform; this representation is then processed, and finally an inverse Fourier transform is performed on the resulting data to give the final output image.
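
A minimal sketch of this Fourier-space pipeline (the low-pass mask and the fraction of frequencies kept are arbitrary choices for illustration):

    import numpy as np

    def fourier_lowpass(image, keep_fraction=0.1):
        F = np.fft.fft2(image)              # forward 2D Fourier transform
        F = np.fft.fftshift(F)              # put low frequencies at the centre
        rows, cols = image.shape
        mask = np.zeros_like(F)
        r, c = int(rows * keep_fraction), int(cols * keep_fraction)
        mask[rows//2 - r:rows//2 + r, cols//2 - c:cols//2 + c] = 1
        F = F * mask                        # processing step: keep only low frequencies
        F = np.fft.ifftshift(F)
        return np.real(np.fft.ifft2(F))     # inverse transform -> output image

    noisy = np.random.rand(64, 64)          # stand-in for a noisy input image
    smoothed = fourier_lowpass(noisy, keep_fraction=0.1)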

Fourier Methods

Let's consider a 1D Fourier transform example:

Consider a complicated sound such as the noise of a car horn. We can describe this sound in two related ways:

sample the amplitude of the sound many times a second, which gives an approximation to the sound as a function of time.

analyse the sound in terms of the pitches of the notes, or frequencies, which make the sound up, recording the amplitude of each frequency.

Similarly, brightness along a line can be recorded as a set of values measured at equally spaced distances, or, equivalently, as a set of spatial frequency values.

Each of these frequency values is referred to as a frequency component.

An image is a two-dimensional array of pixel measurements on a uniform grid.

This information can be described in terms of a two-dimensional grid of spatial frequencies.

A given frequency component now specifies the contribution made by data changing at specified spatial frequencies in the x and y directions.

What do frequencies mean in an image?

If an image has large values at high frequency components then the data is changing rapidly on a short distance scale. e.g. a page of text

If the image has large low frequency components then the large scale features of the picture are more important. e.g. a single fairly simple object which occupies most of the image.

Smoothing Noise

The idea with noise smoothing is to reduce various spurious effects of a local nature in the image, caused perhaps by

noise in the image acquisition system,

noise arising as a result of transmission of the image, for example from a space probe utilising a low-power transmitter.

The smoothing can be done either by considering the real space image, or its Fourier transform.

Real Space Smoothing Methods
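
As an illustration, a minimal real-space smoothing sketch (a plain 3x3 mean filter chosen here as the simplest example; it is not the only, or necessarily the best, real-space method):

    import numpy as np

    def mean_filter_3x3(image):
        """Replace each pixel by the average of the 3x3 window centred on it,
        which suppresses isolated noisy values."""
        image = image.astype(np.float64)
        padded = np.pad(image, 1, mode="edge")           # replicate border pixels
        out = np.zeros_like(image)
        rows, cols = image.shape
        for i in range(rows):
            for j in range(cols):
                out[i, j] = padded[i:i+3, j:j+3].mean()  # average over the window
        return out

    noisy = np.random.rand(32, 32)
    smoothed = mean_filter_3x3(noisy)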

Extracting Edges from Images

Many edge extraction techniques can be broken up into two distinct phases:

Finding pixels in the image where edges are likely to occur by looking for discontinuities in gradients.

Candidate points for edges in the image are usually referred to as edge points, edge pixels, or edgels.

Linking these edge points in some way to produce descriptions of edges in terms of lines, curves etc.

Each phase in turn will be discussed in the following Sections.
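
As an illustration of the first phase, a minimal sketch of gradient-based edge-point detection (the Sobel masks are a standard choice; the threshold and test image are made up):

    import numpy as np

    SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    SOBEL_Y = SOBEL_X.T

    def edge_points(image, threshold=1.0):
        """Mark pixels whose gradient magnitude exceeds a threshold as edgels."""
        image = image.astype(float)
        rows, cols = image.shape
        magnitude = np.zeros_like(image)
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                window = image[i-1:i+2, j-1:j+2]
                gx = np.sum(window * SOBEL_X)        # horizontal gradient
                gy = np.sum(window * SOBEL_Y)        # vertical gradient
                magnitude[i, j] = np.hypot(gx, gy)   # gradient magnitude
        return magnitude > threshold                 # boolean map of edge points

    # Example: a synthetic image with a vertical step edge down the middle.
    img = np.zeros((8, 8)); img[:, 4:] = 1.0
    print(edge_points(img, threshold=1.0).astype(int))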